| .TH HTML 3 |
| .SH NAME |
| parsehtml, |
| printitems, |
| validitems, |
| freeitems, |
| freedocinfo, |
| dimenkind, |
| dimenspec, |
| targetid, |
| targetname, |
| fromStr, |
| toStr |
| \- HTML parser |
| .SH SYNOPSIS |
| .nf |
| .PP |
| .ft L |
| #include <u.h> |
| #include <libc.h> |
| #include <html.h> |
| .ft P |
| .PP |
| .ta \w'\fLToken* 'u |
| .B |
| Item* parsehtml(uchar* data, int datalen, Rune* src, int mtype, |
| .B |
| int chset, Docinfo** pdi) |
| .PP |
| .B |
| void printitems(Item* items, char* msg) |
| .PP |
| .B |
| int validitems(Item* items) |
| .PP |
| .B |
| void freeitems(Item* items) |
| .PP |
| .B |
| void freedocinfo(Docinfo* d) |
| .PP |
| .B |
| int dimenkind(Dimen d) |
| .PP |
| .B |
| int dimenspec(Dimen d) |
| .PP |
| .B |
| int targetid(Rune* s) |
| .PP |
| .B |
| Rune* targetname(int targid) |
| .PP |
| .B |
| uchar* fromStr(Rune* buf, int n, int chset) |
| .PP |
| .B |
| Rune* toStr(uchar* buf, int n, int chset) |
| .SH DESCRIPTION |
| .PP |
| This library implements a parser for HTML 4.0 documents. |
| The parsed HTML is converted into an intermediate representation that |
| describes how the formatted HTML should be laid out. |
| .PP |
| .I Parsehtml |
| parses an entire HTML document contained in the buffer |
| .I data |
| and having length |
| .IR datalen . |
| The URL of the document should be passed in as |
| .IR src . |
| .I Mtype |
| is the media type of the document, which should be either |
| .B TextHtml |
| or |
| .BR TextPlain . |
| The character set of the document is described in |
| .IR chset , |
| which can be one of |
| .BR US_Ascii , |
| .BR ISO_8859_1 , |
| .B UTF_8 |
| or |
| .BR Unicode . |
| The return value is a linked list of |
| .B Item |
| structures, described in detail below. |
| As a side effect, |
| .BI * pdi |
| is set to point to a newly created |
| .B Docinfo |
| structure, containing information pertaining to the entire document. |
| .PP |
| The library expects two allocation routines to be provided by the |
| caller, |
| .B emalloc |
| and |
| .BR erealloc . |
| These routines are analogous to the standard malloc and realloc routines, |
| except that they should not return if the memory allocation fails. |
| In addition, |
| .B emalloc |
| is required to zero the memory. |
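| .PP |
| A minimal sketch of the two routines in standard C follows. |
| It is an illustration under stated assumptions, not the library's own code: |
| this page does not give the prototypes the library expects, so the |
| .B "unsigned long" |
| size arguments and the choice to report failure on standard error and exit |
| are assumptions. |

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate n zeroed bytes; never returns if the allocation fails.
 * ASSUMPTION: the unsigned long size argument is a guess at the
 * prototype, which this page does not specify. */
void*
emalloc(unsigned long n)
{
	void *p;

	/* calloc zeroes the memory; ask for at least one byte */
	p = calloc(1, n ? n : 1);
	if(p == NULL){
		fprintf(stderr, "emalloc: out of memory\n");
		exit(1);
	}
	return p;
}

/* Resize p to n bytes; never returns if the allocation fails.
 * Any newly added bytes are left uninitialized. */
void*
erealloc(void *p, unsigned long n)
{
	p = realloc(p, n ? n : 1);
	if(p == NULL){
		fprintf(stderr, "erealloc: out of memory\n");
		exit(1);
	}
	return p;
}
```

| Because neither routine returns on failure, code written against them |
| can use the result without testing it for nil. |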
| .PP |
| For debugging purposes, |
| .I printitems |
| may be called to display the contents of an item list; individual items may |
| be printed using the |
| .B %I |
| print verb, installed on the first call to |
| .IR parsehtml . |
| .I validitems |
| traverses the item list, checking that all of the pointers are valid. |
| It returns |
| .B 1 |
| if everything is ok, and |
| .B 0 |
| if an error was found. |
| Normally, one would not call these routines directly. |
| Instead, one sets the global variable |
| .I dbgbuild |
| and the library calls them automatically. |
| One can also set |
| .IR warn , |
| to cause the library to print a warning whenever it finds a problem with the |
| input document, and |
| .IR dbglex , |
| to print debugging information in the lexer. |
| .PP |
| When an item list is finished with, it should be freed with |
| .IR freeitems . |
| Then, |
| .I freedocinfo |
| should be called on the pointer returned in |
| .BI * pdi\f1. |
| .PP |
| .I Dimenkind |
| and |
| .I dimenspec |
| are provided to interpret the |
| .B Dimen |
| type, as described in the section |
| .IR "Dimension Specifications" . |
| .PP |
| Frame target names are mapped to integer ids via a global, permanent mapping. |
| To find the value for a given name, call |
| .IR targetid , |
| which allocates a new id if the name hasn't been seen before. |
| The name of a given, known id may be retrieved using |
| .IR targetname . |
| The library predefines |
| .BR FTtop , |
| .BR FTself , |
| .B FTparent |
| and |
| .BR FTblank . |
| .PP |
| The library handles all text as Unicode strings (type |
| .BR Rune* ). |
| Character set conversion is provided by |
| .I fromStr |
| and |
| .IR toStr . |
| .I FromStr |
| takes |
| .I n |
| Unicode characters from |
| .I buf |
| and converts them to the character set described by |
| .IR chset . |
| .I ToStr |
| takes |
| .I n |
| bytes from |
| .IR buf , |
| interpreted as belonging to character set |
| .IR chset , |
| and converts them to a Unicode string. |
| Both routines null-terminate the result, and use |
| .B emalloc |
| to allocate space for it. |
| .SS Items |
| The return value of |
| .I parsehtml |
| is a linked list of variant structures, |
| with the generic portion described by the following definition: |
| .PP |
| .EX |
| .ta 6n +\w'Genattr* 'u |
| typedef struct Item Item; |
| struct Item |
| { |
| Item* next; |
| int width; |
| int height; |
| int ascent; |
| int anchorid; |
| int state; |
| Genattr* genattr; |
| int tag; |
| }; |
| .EE |
| .PP |
| The field |
| .B next |
| points to the successor in the linked list of items, while |
| .BR width , |
| .BR height , |
| and |
| .B ascent |
| are intended for use by the caller as part of the layout process. |
| .BR Anchorid , |
| if non-zero, gives the integer id assigned by the parser to the anchor that |
| this item is in (see section |
| .IR Anchors ). |
| .B State |
| is a collection of flags and values described as follows: |
| .PP |
| .EX |
| .ta 6n +\w'IFindentshift = 'u |
| enum |
| { |
| IFbrk = 0x80000000, |
| IFbrksp = 0x40000000, |
| IFnobrk = 0x20000000, |
| IFcleft = 0x10000000, |
| IFcright = 0x08000000, |
| IFwrap = 0x04000000, |
| IFhang = 0x02000000, |
| IFrjust = 0x01000000, |
| IFcjust = 0x00800000, |
| IFsmap = 0x00400000, |
| IFindentshift = 8, |
| IFindentmask = (255<<IFindentshift), |
| IFhangmask = 255 |
| }; |
| .EE |
| .PP |
| .B IFbrk |
| is set if a break is to be forced before placing this item. |
| .B IFbrksp |
| is set if a 1 line space should be added to the break (in which case |
| .B IFbrk |
| is also set). |
| .B IFnobrk |
| is set if a break is not permitted before the item. |
| .B IFcleft |
| is set if left floats should be cleared (that is, if the list of pending left floats should be placed) |
| before this item is placed, and |
| .B IFcright |
| is set for right floats. |
| In both cases, IFbrk is also set. |
| .B IFwrap |
| is set if the line containing this item is allowed to wrap. |
| .B IFhang |
| is set if this item hangs into the left indent. |
| .B IFrjust |
| is set if the line containing this item should be right justified, |
| and |
| .B IFcjust |
| is set for center justified lines. |
| .B IFsmap |
| is used to indicate that an image is a server-side map. |
| The low 8 bits, represented by |
| .BR IFhangmask , |
| indicate the current hang into left indent, in tenths of a tabstop. |
| The next 8 bits, represented by |
| .B IFindentmask |
| and |
| .BR IFindentshift , |
| indicate the current indent in tab stops. |
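| .PP |
| The packed fields described above can be unpacked with a few small helpers. |
| The following is a sketch in standard C, not part of the library: |
| the helper names |
| .BR stateindent , |
| .B statehang |
| and |
| .B statebreak |
| are hypothetical, and the flag bits are written as unsigned |
| .B #define |
| constants (an assumption made for portability) rather than as enum members. |

```c
#include <assert.h>

/* Values copied from the enum in this section.  The flag bits are
 * unsigned #defines here because 0x80000000 does not fit in a
 * signed int enum constant in standard C. */
#define IFbrk	0x80000000u
#define IFbrksp	0x40000000u
enum
{
	IFindentshift = 8,
	IFindentmask = (255<<IFindentshift),
	IFhangmask = 255
};

/* Current indent in tab stops (bits 8-15 of state). */
unsigned int
stateindent(unsigned int state)
{
	return (state & IFindentmask) >> IFindentshift;
}

/* Current hang into the left indent, in tenths of a tab stop
 * (bits 0-7 of state). */
unsigned int
statehang(unsigned int state)
{
	return state & IFhangmask;
}

/* Non-zero if a break is to be forced before placing the item. */
int
statebreak(unsigned int state)
{
	return (state & IFbrk) != 0;
}
```

| For example, a state word built from |
| .BR IFbrk , |
| an indent of 3 and a hang of 15 yields 3 from |
| .B stateindent |
| and 15 from |
| .BR statehang . |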
| .PP |
| The field |
| .B genattr |
| is an optional pointer to an auxiliary structure, described in the section |
| .IR "Generic Attributes" . |
| .PP |
| Finally, |
| .B tag |
| describes which variant type this item has. |
| It can have one of the values |
| .BR Itexttag , |
| .BR Iruletag , |
| .BR Iimagetag , |
| .BR Iformfieldtag , |
| .BR Itabletag , |
| .B Ifloattag |
| or |
| .BR Ispacertag . |
| For each of these values, there is an additional structure defined, which |
| includes Item as an unnamed initial substructure, and then defines additional |
| fields. |
| .PP |
| Items of type |
| .B Itexttag |
| represent a piece of text, using the following structure: |
| .PP |
| .EX |
| .ta 6n +\w'Rune* 'u |
| struct Itext |
| { |
| Item; |
| Rune* s; |
| int fnt; |
| int fg; |
| uchar voff; |
| uchar ul; |
| }; |
| .EE |
| .PP |
| Here |
| .B s |
| is a null-terminated Unicode string of the actual characters making up this text item, |
| .B fnt |
| is the font number (described in the section |
| .IR "Font Numbers" ), |
| and |
| .B fg |
| is the RGB encoded color for the text. |
| .B Voff |
| measures the vertical offset from the baseline; subtract |
| .B Voffbias |
| to get the actual value (negative values represent a displacement down the page). |
| The field |
| .B ul |
| is the underline style: |
| .B ULnone |
| if no underline, |
| .B ULunder |
| for conventional underline, and |
| .B ULmid |
| for strike-through. |
| .PP |
| Items of type |
| .B Iruletag |
| represent a horizontal rule, as follows: |
| .PP |
| .EX |
| .ta 6n +\w'Dimen 'u |
| struct Irule |
| { |
| Item; |
| uchar align; |
| uchar noshade; |
| int size; |
| Dimen wspec; |
| }; |
| .EE |
| .PP |
| Here |
| .B align |
| is the alignment specification (described in the section |
| .IR "Alignment Specifications" ), |
| .B noshade |
| is set if the rule should not be shaded, |
| .B size |
| is the height of the rule (as set by the size attribute), |
| and |
| .B wspec |
| is the desired width (see section |
| .IR "Dimension Specifications" ). |
| .PP |
| Items of type |
| .B Iimagetag |
| describe embedded images, for which the following structure is defined: |
| .PP |
| .EX |
| .ta 6n +\w'Iimage* 'u |
| struct Iimage |
| { |
| Item; |
| Rune* imsrc; |
| int imwidth; |
| int imheight; |
| Rune* altrep; |
| Map* map; |
| int ctlid; |
| uchar align; |
| uchar hspace; |
| uchar vspace; |
| uchar border; |
| Iimage* nextimage; |
| }; |
| .EE |
| .PP |
| Here |
| .B imsrc |
| is the URL of the image source, |
| .B imwidth |
| and |
| .BR imheight , |
| if non-zero, contain the specified width and height for the image, |
| and |
| .B altrep |
| is the text to use as an alternative to the image, if the image is not displayed. |
| .BR Map , |
| if set, points to a structure describing an associated client-side image map. |
| .B Ctlid |
| is reserved for use by the application, for handling animated images. |
| .B Align |
| encodes the alignment specification of the image. |
| .B Hspace |
| contains the number of pixels to pad the image with on either side, and |
| .B Vspace |
| the padding above and below. |
| .B Border |
| is the width of the border to draw around the image. |
| .B Nextimage |
| points to the next image in the document (the head of this list is |
| .BR Docinfo.images ). |
| .PP |
| For items of type |
| .BR Iformfieldtag , |
| the following structure is defined: |
| .PP |
| .EX |
| .ta 6n +\w'Formfield* 'u |
| struct Iformfield |
| { |
| Item; |
| Formfield* formfield; |
| }; |
| .EE |
| .PP |
| This adds a single field, |
| .BR formfield , |
| which points to a structure describing a field in a form, described in section |
| .IR Forms . |
| .PP |
| For items of type |
| .BR Itabletag , |
| the following structure is defined: |
| .PP |
| .EX |
| .ta 6n +\w'Table* 'u |
| struct Itable |
| { |
| Item; |
| Table* table; |
| }; |
| .EE |
| .PP |
| .B Table |
| points to a structure describing the table, described in the section |
| .IR Tables . |
| .PP |
| For items of type |
| .BR Ifloattag , |
| the following structure is defined: |
| .PP |
| .EX |
| .ta 6n +\w'Ifloat* 'u |
| struct Ifloat |
| { |
| Item; |
| Item* item; |
| int x; |
| int y; |
| uchar side; |
| uchar infloats; |
| Ifloat* nextfloat; |
| }; |
| .EE |
| .PP |
| The field |
| .B item |
| points to a single item (either a table or an image) that floats (the text of the |
| document flows around it), and |
| .B side |
| indicates the margin that this float sticks to; it is either |
| .B ALleft |
| or |
| .BR ALright . |
| .B X |
| and |
| .B y |
| are reserved for use by the caller; these are typically used for the coordinates |
| of the top of the float. |
| .B Infloats |
| is used by the caller to keep track of whether it has placed the float. |
| .B Nextfloat |
| is used by the caller to link together all of the floats that it has placed. |
| .PP |
| For items of type |
| .BR Ispacertag , |
| the following structure is defined: |
| .PP |
| .EX |
| .ta 6n +\w'Item; 'u |
| struct Ispacer |
| { |
| Item; |
| int spkind; |
| }; |
| .EE |
| .PP |
| .B Spkind |
| encodes the kind of spacer, and may be one of |
| .B ISPnull |
| (zero height and width), |
| .B ISPvline |
| (takes on height and ascent of the current font), |
| .B ISPhspace |
| (has the width of a space in the current font) and |
| .B ISPgeneral |
| (for all other purposes, such as between markers and lists). |
| .SS Generic Attributes |
| .PP |
| The genattr field of an item, if non-nil, points to a structure that holds |
| the values of attributes not specific to any particular |
| item type, as they occur on a wide variety of underlying HTML tags. |
| The structure is as follows: |
| .PP |
| .EX |
| .ta 6n +\w'SEvent* 'u |
| typedef struct Genattr Genattr; |
| struct Genattr |
| { |
| Rune* id; |
| Rune* class; |
| Rune* style; |
| Rune* title; |
| SEvent* events; |
| }; |
| .EE |
| .PP |
| Fields |
| .BR id , |
| .BR class , |
| .B style |
| and |
| .BR title , |
| when non-nil, contain values of correspondingly named attributes of the HTML tag |
| associated with this item. |
| .B Events |
| is a linked list of events (with corresponding scripted actions) associated with the item: |
| .PP |
| .EX |
| .ta 6n +\w'SEvent* 'u |
| typedef struct SEvent SEvent; |
| struct SEvent |
| { |
| SEvent* next; |
| int type; |
| Rune* script; |
| }; |
| .EE |
| .PP |
| Here, |
| .B next |
| points to the next event in the list, |
| .B type |
| is one of |
| .BR SEonblur , |
| .BR SEonchange , |
| .BR SEonclick , |
| .BR SEondblclick , |
| .BR SEonfocus , |
| .BR SEonkeypress , |
| .BR SEonkeyup , |
| .BR SEonload , |
| .BR SEonmousedown , |
| .BR SEonmousemove , |
| .BR SEonmouseout , |
| .BR SEonmouseover , |
| .BR SEonmouseup , |
| .BR SEonreset , |
| .BR SEonselect , |
| .B SEonsubmit |
| or |
| .BR SEonunload , |
| and |
| .B script |
| is the text of the associated script. |
| .SS Dimension Specifications |
| .PP |
| Some structures include a dimension specification, used where |
| a number can be followed by a |
| .B % |
| or a |
| .B * |
| to indicate |
| percentage of total or relative weight. |
| This is encoded using the following structure: |
| .PP |
| .EX |
| .ta 6n +\w'int 'u |
| typedef struct Dimen Dimen; |
| struct Dimen |
| { |
| int kindspec; |
| }; |
| .EE |
| .PP |
| Separate kind and spec values are extracted using |
| .I dimenkind |
| and |
| .IR dimenspec . |
| .I Dimenkind |
| returns one of |
| .BR Dnone , |
| .BR Dpixels , |
| .B Dpercent |
| or |
| .BR Drelative . |
| .B Dnone |
| means that no dimension was specified. |
| In all other cases, |
| .I dimenspec |
| should be called to find the absolute number of pixels, the percentage of total, |
| or the relative weight. |
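| .PP |
| As an illustration of how a caller might consume these values, the sketch |
| below resolves a dimension to a pixel count. |
| It is standard C written for this page, not library code. |
| The bit layout of |
| .B kindspec |
| is not documented here, so the encoding, the constant values, the |
| stand-in bodies of |
| .I dimenkind |
| and |
| .IR dimenspec , |
| and the helpers |
| .B makedimen |
| and |
| .B dimenpixels |
| are all assumptions. |

```c
#include <assert.h>

typedef struct Dimen Dimen;
struct Dimen
{
	int kindspec;
};

/* HYPOTHETICAL encoding, for this sketch only: the kind occupies
 * bits 29-30 and the spec the low 29 bits.  The real values live
 * in html.h and may differ. */
enum
{
	Dnone = 0,
	Dpixels = 1<<29,
	Dpercent = 2<<29,
	Drelative = 3<<29,
	Dkindmask = 3<<29,
	Dspecmask = (1<<29)-1
};

/* Stand-ins for the library routines of the same names. */
int
dimenkind(Dimen d)
{
	return d.kindspec & Dkindmask;
}

int
dimenspec(Dimen d)
{
	return d.kindspec & Dspecmask;
}

/* Hypothetical constructor, used only to build test values. */
Dimen
makedimen(int kind, int spec)
{
	Dimen d;

	d.kindspec = kind | (spec & Dspecmask);
	return d;
}

/* Resolve d to pixels.  avail is the total width a percentage
 * refers to; fallback is returned for Dnone and for Drelative,
 * which cannot be resolved without the sibling weights. */
int
dimenpixels(Dimen d, int avail, int fallback)
{
	switch(dimenkind(d)){
	case Dpixels:
		return dimenspec(d);
	case Dpercent:
		return avail * dimenspec(d) / 100;
	default:
		return fallback;
	}
}
```

| Relative weights are deliberately left to the caller, since dividing space |
| among them requires all of the sibling dimensions at once. |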
| .SS Background Specifications |
| .PP |
| It is possible to set the background of the entire document, and also |
| for some parts of the document (such as tables). |
| This is encoded as follows: |
| .PP |
| .EX |
| .ta 6n +\w'Rune* 'u |
| typedef struct Background Background; |
| struct Background |
| { |
| Rune* image; |
| int color; |
| }; |
| .EE |
| .PP |
| .BR Image , |
| if non-nil, is the URL of an image to use as the background. |
| If this is nil, |
| .B color |
| is used instead, as the RGB value for a solid fill color. |
| .SS Alignment Specifications |
| .PP |
| Certain items have alignment specifiers taken from the following |
| enumerated type: |
| .PP |
| .EX |
| .ta 6n |
| enum |
| { |
| ALnone = 0, ALleft, ALcenter, ALright, ALjustify, |
| ALchar, ALtop, ALmiddle, ALbottom, ALbaseline |
| }; |
| .EE |
| .PP |
| These values correspond to the various alignment types named in the HTML 4.0 |
| standard. |
| If an item has an alignment of |
| .B ALleft |
| or |
| .BR ALright , |
| the library automatically encapsulates it inside a float item. |
| .PP |
| Tables, and the various rows, columns and cells within them, have a more |
| complex alignment specification, composed of separate vertical and |
| horizontal alignments: |
| .PP |
| .EX |
| .ta 6n +\w'uchar 'u |
| typedef struct Align Align; |
| struct Align |
| { |
| uchar halign; |
| uchar valign; |
| }; |
| .EE |
| .PP |
| .B Halign |
| can be one of |
| .BR ALnone , |
| .BR ALleft , |
| .BR ALcenter , |
| .BR ALright , |
| .B ALjustify |
| or |
| .BR ALchar . |
| .B Valign |
| can be one of |
| .BR ALnone , |
| .BR ALmiddle , |
| .BR ALbottom , |
| .B ALtop |
| or |
| .BR ALbaseline . |
| .SS Font Numbers |
| .PP |
| Text items have an associated font number (the |
| .B fnt |
| field), which is encoded as |
| .BR style*NumSize+size . |
| Here, |
| .B style |
| is one of |
| .BR FntR , |
| .BR FntI , |
| .B FntB |
| or |
| .BR FntT , |
| for roman, italic, bold and typewriter font styles, respectively, and |
| .B size |
| is one of |
| .BR Tiny , |
| .BR Small , |
| .BR Normal , |
| .B Large |
| or |
| .BR Verylarge . |
| The total number of possible font numbers is |
| .BR NumFnt , |
| and the default font number is |
| .B DefFnt |
| (which is roman style, normal size). |
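| .PP |
| The encoding can be illustrated with a short sketch in standard C. |
| The numeric values below are assumptions: they take the styles and sizes |
| to be numbered from zero in the order listed above, which this page does |
| not state. |
| .B NumStyle |
| and the helper names |
| .BR fontnum , |
| .B fontstyle |
| and |
| .B fontsize |
| are likewise hypothetical. |

```c
#include <assert.h>

/* ASSUMED values: numbered from 0 in the order the man page lists
 * them.  Check html.h for the real definitions. */
enum { FntR, FntI, FntB, FntT, NumStyle };
enum { Tiny, Small, Normal, Large, Verylarge, NumSize };
enum
{
	NumFnt = NumStyle*NumSize,
	DefFnt = FntR*NumSize + Normal
};

/* Combine a style and a size into a font number. */
int
fontnum(int style, int size)
{
	return style*NumSize + size;
}

/* Recover the style from a font number. */
int
fontstyle(int fnt)
{
	return fnt / NumSize;
}

/* Recover the size from a font number. */
int
fontsize(int fnt)
{
	return fnt % NumSize;
}
```

| Division and remainder by |
| .B NumSize |
| invert the encoding, so a caller can index a table of loaded fonts by |
| font number and still recover the style and size when needed. |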
| .SS Document Info |
| .PP |
| Global information about an HTML page is stored in the following structure: |
| .PP |
| .EX |
| .ta 6n +\w'DestAnchor* 'u |
| typedef struct Docinfo Docinfo; |
| struct Docinfo |
| { |
| // stuff from HTTP headers, doc head, and body tag |
| Rune* src; |
| Rune* base; |
| Rune* doctitle; |
| Background background; |
| Iimage* backgrounditem; |
| int text; |
| int link; |
| int vlink; |
| int alink; |
| int target; |
| int chset; |
| int mediatype; |
| int scripttype; |
| int hasscripts; |
| Rune* refresh; |
| Kidinfo* kidinfo; |
| int frameid; |
| |
| // info needed to respond to user actions |
| Anchor* anchors; |
| DestAnchor* dests; |
| Form* forms; |
| Table* tables; |
| Map* maps; |
| Iimage* images; |
| }; |
| .EE |
| .PP |
| .B Src |
| gives the URL of the original source of the document, |
| and |
| .B base |
| is the base URL. |
| .B Doctitle |
| is the document's title, as set by a |
| .B <title> |
| element. |
| .B Background |
| is as described in the section |
| .IR "Background Specifications" , |
| and |
| .B backgrounditem |
| is set to be an image item for the document's background image (if given as a URL), |
| or else nil. |
| .B Text |
| gives the default foreground text color of the document, |
| .B link |
| the unvisited hyperlink color, |
| .B vlink |
| the visited hyperlink color, and |
| .B alink |
| the color for highlighting hyperlinks (all in 24-bit RGB format). |
| .B Target |
| is the default target frame id. |
| .B Chset |
| and |
| .B mediatype |
| are as for the |
| .I chset |
| and |
| .I mtype |
| parameters to |
| .IR parsehtml . |
| .B Scripttype |
| is the type of any scripts contained in the document, and is always |
| .BR TextJavascript . |
| .B Hasscripts |
| is set if the document contains any scripts. |
| Scripting is currently unsupported. |
| .B Refresh |
| is the contents of a |
| .B "<meta http-equiv=Refresh ...>" |
| tag, if any. |
| .B Kidinfo |
| is set if this document is a frameset (see section |
| .IR Frames ). |
| .B Frameid |
| is this document's frame id. |
| .PP |
| .B Anchors |
| is a list of hyperlinks contained in the document, |
| and |
| .B dests |
| is a list of hyperlink destinations within the page (see the following section for details). |
| .BR Forms , |
| .B tables |
| and |
| .B maps |
| are lists of the various forms, tables and client-side maps contained |
| in the document, as described in subsequent sections. |
| .B Images |
| is a list of all the image items in the document. |
| .SS Anchors |
| .PP |
| The library builds two lists for all of the |
| .B <a> |
| elements (anchors) in a document. |
| Each anchor is assigned a unique anchor id within the document. |
| For anchors which are hyperlinks (the |
| .B href |
| attribute was supplied), the following structure is defined: |
| .PP |
| .EX |
| .ta 6n +\w'Anchor* 'u |
| typedef struct Anchor Anchor; |
| struct Anchor |
| { |
| Anchor* next; |
| int index; |
| Rune* name; |
| Rune* href; |
| int target; |
| }; |
| .EE |
| .PP |
| .B Next |
| points to the next anchor in the list (the head of this list is |
| .BR Docinfo.anchors ). |
| .B Index |
| is the anchor id; each item within this hyperlink is tagged with this value |
| in its |
| .B anchorid |
| field. |
| .B Name |
| and |
| .B href |
| are the values of the correspondingly named attributes of the anchor |
| (in particular, href is the URL to go to). |
| .B Target |
| is the value of the target attribute (if provided) converted to a frame id. |
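| .PP |
| Given an item's |
| .B anchorid |
| field, the matching hyperlink can be found by a linear search of the list. |
| The sketch below is standard C written for this page; the |
| .B Anchor |
| structure is cut down to the two fields the search reads, and |
| .B findanchor |
| is a hypothetical helper, not a library routine. |

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the library's Anchor structure. */
typedef struct Anchor Anchor;
struct Anchor
{
	Anchor *next;
	int index;
};

/* Return the anchor whose id is anchorid, or NULL if there is
 * none.  An anchorid of zero means the item is not in an anchor. */
Anchor*
findanchor(Anchor *list, int anchorid)
{
	Anchor *a;

	if(anchorid == 0)
		return NULL;
	for(a = list; a != NULL; a = a->next)
		if(a->index == anchorid)
			return a;
	return NULL;
}
```

| A caller would pass |
| .B Docinfo.anchors |
| as the list and then follow the |
| .B href |
| of the returned anchor. |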
| .PP |
| Destinations within the document (anchors with the name attribute set) |
| are held in the |
| .B Docinfo.dests |
| list, using the following structure: |
| .PP |
| .EX |
| .ta 6n +\w'DestAnchor* 'u |
| typedef struct DestAnchor DestAnchor; |
| struct DestAnchor |
| { |
| DestAnchor* next; |
| int index; |
| Rune* name; |
| Item* item; |
| }; |
| .EE |
| .PP |
| .B Next |
| is the next element of the list, |
| .B index |
| is the anchor id, |
| .B name |
| is the value of the name attribute, and |
| .B item |
| points to the item within the parsed document that should be considered |
| to be the destination. |
| .SS Forms |
| .PP |
| Any forms within a document are kept in a list, headed by |
| .BR Docinfo.forms . |
| The elements of this list are as follows: |
| .PP |
| .EX |
| .ta 6n +\w'Formfield* 'u |
| typedef struct Form Form; |
| struct Form |
| { |
| Form* next; |
| int formid; |
| Rune* name; |
| Rune* action; |
| int target; |
| int method; |
| int nfields; |
| Formfield* fields; |
| }; |
| .EE |
| .PP |
| .B Next |
| points to the next form in the list. |
| .B Formid |
| is a serial number for the form within the document. |
| .B Name |
| is the value of the form's name or id attribute. |
| .B Action |
| is the value of any action attribute. |
| .B Target |
| is the value of the target attribute (if any) converted to a frame target id. |
| .B Method |
| is one of |
| .B HGet |
| or |
| .BR HPost . |
| .B Nfields |
| is the number of fields in the form, and |
| .B fields |
| is a linked list of the actual fields. |
| .PP |
| The individual fields in a form are described by the following structure: |
| .PP |
| .EX |
| .ta 6n +\w'Formfield* 'u |
| typedef struct Formfield Formfield; |
| struct Formfield |
| { |
| Formfield* next; |
| int ftype; |
| int fieldid; |
| Form* form; |
| Rune* name; |
| Rune* value; |
| int size; |
| int maxlength; |
| int rows; |
| int cols; |
| uchar flags; |
| Option* options; |
| Item* image; |
| int ctlid; |
| SEvent* events; |
| }; |
| .EE |
| .PP |
| Here, |
| .B next |
| points to the next field in the list. |
| .B Ftype |
| is the type of the field, which can be one of |
| .BR Ftext , |
| .BR Fpassword , |
| .BR Fcheckbox , |
| .BR Fradio , |
| .BR Fsubmit , |
| .BR Fhidden , |
| .BR Fimage , |
| .BR Freset , |
| .BR Ffile , |
| .BR Fbutton , |
| .B Fselect |
| or |
| .BR Ftextarea . |
| .B Fieldid |
| is a serial number for the field within the form. |
| .B Form |
| points back to the form containing this field. |
| .BR Name , |
| .BR value , |
| .BR size , |
| .BR maxlength , |
| .B rows |
| and |
| .B cols |
| each contain the values of corresponding attributes of the field, if present. |
| .B Flags |
| contains per-field flags, of which |
| .B FFchecked |
| and |
| .B FFmultiple |
| are defined. |
| .B Image |
| is only used for fields of type |
| .BR Fimage ; |
| it points to an image item containing the image to be displayed. |
| .B Ctlid |
| is reserved for use by the caller, typically to store a unique id |
| of an associated control used to implement the field. |
| .B Events |
| is the same as the corresponding field of the generic attributes |
| associated with the item containing this field. |
| .B Options |
| is only used by fields of type |
| .BR Fselect ; |
| it consists of a list of possible options that may be selected for that |
| field, using the following structure: |
| .PP |
| .EX |
| .ta 6n +\w'Option* 'u |
| typedef struct Option Option; |
| struct Option |
| { |
| Option* next; |
| int selected; |
| Rune* value; |
| Rune* display; |
| }; |
| .EE |
| .PP |
| .B Next |
| points to the next element of the list. |
| .B Selected |
| is set if this option is to be displayed initially. |
| .B Value |
| is the value to send when the form is submitted if this option is selected. |
| .B Display |
| is the string to display on the screen for this option. |
| .SS Tables |
| .PP |
| The library builds a list of all the tables in the document, |
| headed by |
| .BR Docinfo.tables . |
| Each element of this list has the following format: |
| .PP |
| .EX |
| .ta 6n +\w'Tablecell*** 'u |
| typedef struct Table Table; |
| struct Table |
| { |
| Table* next; |
| int tableid; |
| Tablerow* rows; |
| int nrow; |
| Tablecol* cols; |
| int ncol; |
| Tablecell* cells; |
| int ncell; |
| Tablecell*** grid; |
| Align align; |
| Dimen width; |
| int border; |
| int cellspacing; |
| int cellpadding; |
| Background background; |
| Item* caption; |
| uchar caption_place; |
| Lay* caption_lay; |
| int totw; |
| int toth; |
| int caph; |
| int availw; |
| Token* tabletok; |
| uchar flags; |
| }; |
| .EE |
| .PP |
| .B Next |
| points to the next element in the list of tables. |
| .B Tableid |
| is a serial number for the table within the document. |
| .B Rows |
| is an array of row specifications (described below) and |
| .B nrow |
| is the number of elements in this array. |
| Similarly, |
| .B cols |
| is an array of column specifications, and |
| .B ncol |
| the size of this array. |
| .B Cells |
| is a list of all cells within the table (structure described below) |
| and |
| .B ncell |
| is the number of elements in this list. |
| Note that a cell may span multiple rows and/or columns, thus |
| .B ncell |
| may be smaller than |
| .BR nrow*ncol . |
| .B Grid |
| is a two-dimensional array of cells within the table; the cell |
| at row |
| .B i |
| and column |
| .B j |
| is |
| .BR Table.grid[i][j] . |
| A cell that spans multiple rows and/or columns will |
| be referenced by |
| .B grid |
| multiple times, but it occurs only once in |
| .BR cells . |
| .B Align |
| gives the alignment specification for the entire table, |
| and |
| .B width |
| gives the requested width as a dimension specification. |
| .BR Border , |
| .B cellspacing |
| and |
| .B cellpadding |
| give the values of the corresponding attributes for the table, |
| and |
| .B background |
| gives the requested background for the table. |
| .B Caption |
| is a linked list of items to be displayed as the caption of the |
| table, either above or below depending on whether |
| .B caption_place |
| is |
| .B ALtop |
| or |
| .BR ALbottom . |
| Most of the remaining fields are reserved for use by the caller, |
| except |
| .BR tabletok , |
| which is reserved for internal use. |
| The type |
| .B Lay |
| is not defined by the library; the caller can provide its |
| own definition. |
| .PP |
| The |
| .B Tablecol |
| structure is defined for use by the caller. |
| The library ensures that the correct number of these |
| is allocated, but leaves them blank. |
| The fields are as follows: |
| .PP |
| .EX |
| .ta 6n +\w'Point 'u |
| typedef struct Tablecol Tablecol; |
| struct Tablecol |
| { |
| int width; |
| Align align; |
| Point pos; |
| }; |
| .EE |
| .PP |
| The rows in the table are specified as follows: |
| .PP |
| .EX |
| .ta 6n +\w'Background 'u |
| typedef struct Tablerow Tablerow; |
| struct Tablerow |
| { |
| Tablerow* next; |
| Tablecell* cells; |
| int height; |
| int ascent; |
| Align align; |
| Background background; |
| Point pos; |
| uchar flags; |
| }; |
| .EE |
| .PP |
| .B Next |
| is only used during parsing; it should be ignored by the caller. |
| .B Cells |
| provides a list of all the cells in a row, linked through their |
| .B nextinrow |
| fields (see below). |
| .BR Height , |
| .B ascent |
| and |
| .B pos |
| are reserved for use by the caller. |
| .B Align |
| is the alignment specification for the row, and |
| .B background |
| is the background to use, if specified. |
| .B Flags |
| is used by the parser; ignore this field. |
| .PP |
| The individual cells of the table are described as follows: |
| .PP |
| .EX |
| .ta 6n +\w'Background 'u |
| typedef struct Tablecell Tablecell; |
| struct Tablecell |
| { |
| Tablecell* next; |
| Tablecell* nextinrow; |
| int cellid; |
| Item* content; |
| Lay* lay; |
| int rowspan; |
| int colspan; |
| Align align; |
| uchar flags; |
| Dimen wspec; |
| int hspec; |
| Background background; |
| int minw; |
| int maxw; |
| int ascent; |
| int row; |
| int col; |
| Point pos; |
| }; |
| .EE |
| .PP |
| .B Next |
| is used to link together the list of all cells within a table |
| .RB ( Table.cells ), |
| whereas |
| .B nextinrow |
| is used to link together all the cells within a single row |
| .RB ( Tablerow.cells ). |
| .B Cellid |
| provides a serial number for the cell within the table. |
| .B Content |
| is a linked list of the items to be laid out within the cell. |
| .B Lay |
| is reserved for the user to describe how these items have |
| been laid out. |
| .B Rowspan |
| and |
| .B colspan |
| are the number of rows and columns spanned by this cell, |
| respectively. |
| .B Align |
| is the alignment specification for the cell. |
| .B Flags |
| is some combination of |
| .BR TFparsing , |
| .B TFnowrap |
| and |
| .B TFisth |
| or'd together. |
| Here |
| .B TFparsing |
| is used internally by the parser, and should be ignored. |
| .B TFnowrap |
| means that the contents of the cell should not be |
| wrapped if they don't fit the available width; |
| rather, the table should be expanded if need be |
| (this is set when the nowrap attribute is supplied). |
| .B TFisth |
| means that the cell was created by the |
| .B <th> |
| element (rather than the |
| .B <td> |
| element), |
| indicating that it is a header cell rather than a data cell. |
| .B Wspec |
| provides a suggested width as a dimension specification, |
| and |
| .B hspec |
| provides a suggested height in pixels. |
| .B Background |
| gives a background specification for the individual cell. |
| .BR Minw , |
| .BR maxw , |
| .B ascent |
| and |
| .B pos |
| are reserved for use by the caller during layout. |
| .B Row |
| and |
| .B col |
| give the indices of the row and column of the top left-hand |
| corner of the cell within the table grid. |
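| .PP |
| Because a spanning cell appears in |
| .B grid |
| more than once, code that walks the grid usually wants to act on each cell |
| only at its top left-hand corner. |
| The following sketch in standard C shows one way to do that. |
| The structures are cut-down stand-ins holding only the fields used, |
| .B countcells |
| is a hypothetical helper, and the nil check on grid entries is a |
| precaution rather than documented behavior. |

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the library structures. */
typedef struct Tablecell Tablecell;
struct Tablecell
{
	int rowspan;
	int colspan;
	int row;	/* row of the top left-hand corner */
	int col;	/* column of the top left-hand corner */
};

typedef struct Table Table;
struct Table
{
	int nrow;
	int ncol;
	Tablecell ***grid;
};

/* Count the distinct cells in t by visiting every grid position
 * but counting a cell only where its corner lies. */
int
countcells(Table *t)
{
	int i, j, n;
	Tablecell *c;

	n = 0;
	for(i = 0; i < t->nrow; i++)
		for(j = 0; j < t->ncol; j++){
			c = t->grid[i][j];
			if(c != NULL && c->row == i && c->col == j)
				n++;
		}
	return n;
}
```

| For a two by two table whose first row is a single cell spanning both |
| columns, the walk visits four grid positions and counts three cells. |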
| .SS Client-side Maps |
| .PP |
| The library builds a list of client-side maps, headed by |
| .BR Docinfo.maps , |
| and having the following structure: |
| .PP |
| .EX |
| .ta 6n +\w'Rune* 'u |
| typedef struct Map Map; |
| struct Map |
| { |
| Map* next; |
| Rune* name; |
| Area* areas; |
| }; |
| .EE |
| .PP |
| .B Next |
| points to the next element in the list, |
| .B name |
| is the name of the map (used to bind it to an image), and |
| .B areas |
| is a list of the areas within the image that comprise the map, |
| using the following structure: |
| .PP |
| .EX |
| .ta 6n +\w'Dimen* 'u |
| typedef struct Area Area; |
| struct Area |
| { |
| Area* next; |
| int shape; |
| Rune* href; |
| int target; |
| Dimen* coords; |
| int ncoords; |
| }; |
| .EE |
| .PP |
| .B Next |
| points to the next element in the map's list of areas. |
| .B Shape |
| describes the shape of the area, and is one of |
| .BR SHrect , |
| .B SHcircle |
| or |
| .BR SHpoly . |
| .B Href |
| is the URL associated with this area in its role as |
| a hypertext link, and |
| .B target |
| is the target frame it should be loaded in. |
| .B Coords |
| is an array of coordinates for the shape, and |
| .B ncoords |
| is the size of this array (number of elements). |
| .SS Frames |
| .PP |
| If the |
| .B Docinfo.kidinfo |
| field is set, the document is a frameset. |
| In this case, it is typical for |
| .I parsehtml |
| to return nil, as a document which is a frameset should have no actual |
| items that need to be laid out (such will appear only in subsidiary documents). |
| Items may nevertheless be returned for a malformed document; the caller |
| should check for this and free any such items. |
| .PP |
| The |
| .B Kidinfo |
| structure itself reflects the fact that framesets can be nested within a document. |
| It is defined as follows: |
| .PP |
| .EX |
| .ta 6n +\w'Kidinfo* 'u |
| typedef struct Kidinfo Kidinfo; |
| struct Kidinfo |
| { |
| Kidinfo* next; |
| int isframeset; |
| |
| // fields for "frame" |
| Rune* src; |
| Rune* name; |
| int marginw; |
| int marginh; |
| int framebd; |
| int flags; |
| |
| // fields for "frameset" |
| Dimen* rows; |
| int nrows; |
| Dimen* cols; |
| int ncols; |
| Kidinfo* kidinfos; |
| Kidinfo* nextframeset; |
| }; |
| .EE |
| .PP |
| .B Next |
| is only used if this structure is part of a containing frameset; it points to the next |
| element in the list of children of that frameset. |
| .B Isframeset |
| is set when this structure represents a frameset; if clear, it is an individual frame. |
| .PP |
| Some fields are used only for framesets. |
| .B Rows |
| is an array of dimension specifications for rows in the frameset, and |
| .B nrows |
| is the length of this array. |
| .B Cols |
| is the corresponding array for columns, of length |
| .BR ncols . |
| .B Kidinfos |
| points to a list of components contained within this frameset, each |
| of which may be a frameset or a frame. |
| .B Nextframeset |
| is only used during parsing, and should be ignored. |
| .PP |
| The remaining fields are used if the structure describes a frame, not a frameset. |
| .B Src |
| provides the URL for the document that should be initially loaded into this frame. |
| Note that this may be a relative URL, in which case it should be interpreted |
| using the containing document's URL as the base. |
| .B Name |
| gives the name of the frame, typically supplied via a name attribute in the HTML. |
| If no name was given, the library allocates one. |
| .BR Marginw , |
| .B marginh |
| and |
| .B framebd |
| are the values of the marginwidth, marginheight and frameborder attributes, respectively. |
| .B Flags |
| can contain some combination of the following: |
| .B FRnoresize |
| (the frame had the noresize attribute set, and the user should not be allowed to resize it), |
| .B FRnoscroll |
| (the frame should not have any scroll bars), |
| .B FRhscroll |
| (the frame should have a horizontal scroll bar), |
| .B FRvscroll |
| (the frame should have a vertical scroll bar), |
| .B FRhscrollauto |
| (the frame should be automatically given a horizontal scroll bar if its contents |
| would not otherwise fit), and |
| .B FRvscrollauto |
| (the frame gets a vertical scrollbar only if required). |
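| .PP |
| Since framesets nest, the structure is naturally processed by recursion. |
| The sketch below, in standard C, counts the individual frames beneath a |
| .B Kidinfo |
| node. |
| The structure is cut down to the three fields the walk reads, and |
| .B countframes |
| is a hypothetical helper, not a library routine. |

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the library's Kidinfo structure. */
typedef struct Kidinfo Kidinfo;
struct Kidinfo
{
	Kidinfo *next;		/* next sibling in the containing frameset */
	int isframeset;
	Kidinfo *kidinfos;	/* children, if this is a frameset */
};

/* Count the individual frames in the list starting at k,
 * descending into any nested framesets. */
int
countframes(Kidinfo *k)
{
	int n;

	n = 0;
	for(; k != NULL; k = k->next){
		if(k->isframeset)
			n += countframes(k->kidinfos);
		else
			n++;
	}
	return n;
}
```

| A caller would start the walk at |
| .BR Docinfo.kidinfo ; |
| the same traversal shape serves for loading each frame's |
| .B src |
| document. |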
| .SH SOURCE |
| .B \*9/src/libhtml |
| .SH SEE ALSO |
| .IR fmt (1) |
| .PP |
| W3C World Wide Web Consortium, |
| ``HTML 4.01 Specification''. |
| .SH BUGS |
| The entire HTML document must be loaded into memory before |
| any of it can be parsed. |