Legato
GoFiler Legato Script Reference

 

Legato v 1.5e

Application v 5.25b

  

 

Chapter Eleven: SGML Functions (continued)

11.3 The SGML Object

11.3.1 General

Legato provides two levels of Standard Generalized Markup Language (SGML) support. The SGML umbrella covers HTML, XML, XBRL and a number of other SGML-style formats.

The most basic level of SGML parsing is covered in the General Functions area, which allows for simple parsing and processing (the GetTagElement, GetTagAttributes and GetTagTextContent functions). This set of routines is not very robust and parses solely on a syntactic level. However, the routines are very convenient when combined with the general word parse functions WordParseCreate and WordParseGetWord to quickly and easily look at SGML-style data.

The SGML Object, on the other hand, is designed to deal with a wide variety of input data and also to process poorly formed or poorly structured data. The SGML Object operates on three levels: reading or parsing, element data access and validating, and writing. It works side by side with File/Document data, which may be in the form of an Edit View, File Object or simple string. If an SGML Object is set up with a file or string, it creates its own Mapped Text Object. It can also be directly linked to an existing Mapped Text Object.

 

The SGML Parser allows data to be read from a number of sources, all of which are transcribed into a Mapped Text Object and referenced on a line and character basis. A Document Type Definition (DTD) is automatically attached to the SGML Object for testing and formatting. Four types of data are gathered during a parse operation: tags, entities, words and spaces. The SGMLNextItem function moves forward and reads until one of the four items is gathered and returned. Alternatively, if a script is only interested in the coding, the SGMLNextElement function skips spaces, words and entities, collectively known as Parsed Character Data (PCDATA), and simply returns the next tag. The parsing range can be the full file or restricted to a specific range. Parsing can also be restarted at any point within the file. Note that if a tag write operation is performed, the parse position is automatically set to the end of the written tag.
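The difference between item-level and element-level parsing can be illustrated with a small conceptual sketch. It is written in Python rather than Legato and does not reproduce the SGML Object's implementation or signatures; the names Item, next_items and next_elements are hypothetical, and the tag pattern is deliberately naive (it does not handle a ">" inside a quoted attribute value).

```python
import re
from typing import Iterator, NamedTuple


class Item(NamedTuple):
    kind: str   # "tag", "entity", "word" or "space"
    text: str   # the raw text of the item
    start: int  # offset of the first character
    end: int    # offset one past the last character


# Alternatives are tried in order: tags and entities before spaces and words.
# A stray "<" or "&" that starts neither a tag nor an entity is kept as a word.
_ITEM_RE = re.compile(
    r"(?P<tag><[^>]*>)"
    r"|(?P<entity>&#?\w+;)"
    r"|(?P<space>\s+)"
    r"|(?P<word>[^<&\s]+|[<&])"
)


def next_items(text: str, pos: int = 0) -> Iterator[Item]:
    """Yield every item (tag, entity, word, space) starting at offset pos."""
    while pos < len(text):
        m = _ITEM_RE.match(text, pos)
        yield Item(m.lastgroup, m.group(), m.start(), m.end())
        pos = m.end()


def next_elements(text: str, pos: int = 0) -> Iterator[Item]:
    """Yield only tags, skipping the PCDATA (words, entities, spaces)."""
    return (item for item in next_items(text, pos) if item.kind == "tag")
```

Passing a non-zero pos corresponds loosely to restarting the parse at an arbitrary point; the start and end offsets stand in for the position and size information the parser keeps for each item.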

When an element is parsed, it is automatically further analyzed and placed into the SGML Element Class (see below). Each attribute or property can then be accessed, deleted or altered via functions such as SGMLGetParameter, SGMLDeleteParameter and SGMLPutParameter. For ease of parsing HTML, CSS inline styles are also loaded and treated in the same manner as HTML attributes. In other words, the ‘STYLE’ attribute is processed into individual properties that can be accessed in the same manner as attributes.

Elements, attributes and properties are all defined and stored in the DTD and, for certain well-known DTDs such as HTML, have predefined token values which are also defined in the SDK and described in Appendix A — Legato SDK Standard Definitions. For standard W3C defined elements, attributes and properties, SDK defined token values can be referenced with the following standard prefixes:

HT_ — HTML token, for example HT_TABLE or HT_P for “<TABLE>” and “<P>”

HA_ — HTML attribute, for example, HA_ALIGN or HA_SIZE for “ALIGN=” or “SIZE=”

CP_ — CSS property, for example, CP_TEXT_ALIGN or CP_MARGIN_TOP for “text-align” or “margin-top”

Note that anywhere a dash ‘-’ is used in a name, it is replaced with an underbar ‘_’.
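The naming rule above can be expressed as a short conceptual sketch. It is written in Python rather than Legato, and the helper sdk_token_name is hypothetical: it only derives the name of the SDK definition from a W3C name, not the numeric token value, and it does not confirm that a given definition actually exists in the SDK.

```python
# Prefixes taken from the list above; the "kind" labels are illustrative only.
PREFIXES = {"element": "HT_", "attribute": "HA_", "property": "CP_"}


def sdk_token_name(kind: str, name: str) -> str:
    """Build the SDK definition name: prefix + upper case, dashes to underbars."""
    return PREFIXES[kind] + name.upper().replace("-", "_")
```

For example, under this rule the CSS property "margin-top" maps to CP_MARGIN_TOP and the element "table" maps to HT_TABLE.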

The DTD Object can also contain specific validations for attributes, tag nesting and tag order. As noted above, a Mapped Text Object can be linked to a DTD Object. This is particularly important for Edit Windows, since the DTD Object, once created, is constantly reused. In addition, depending on the Edit View and SGML creation method, the DTD Object will scan the header of the file to locate a DTDTYPE or xmlns schema namespace reference.

A DTD can be ad hoc or simply not used. To a certain extent, this cripples the SGML Object and significantly reduces the validation that can be performed by the object. However, by using element and attribute names as strings, a program can take advantage of the parsing action and functions.

For XML parsing, elements and attributes are always case-sensitive, including camel case notation. For HTML (except XHTML), case is not relevant except when writing: during a write, the case matches the settings in the DTD.

Each item type parsed by the SGML Object also has characteristics such as its position and size stored. These can be referenced for later use or used to write data back to the source location.

Finally, the default size of an SGML element is approximately 4,096 characters and 48 total attributes/properties.

11.3.2 SGML Element Class

Throughout this section, the Element Class will be referenced. This is an essential part of the SGML Object that consists of three main components: an element token (and namespace), a parameter list and a heap. The element token defines the type of tag, such as <TABLE> or <CAT>. The parameter list is a list of all discrete attributes and CSS properties. List items can be read, altered and deleted. Finally, the heap is a storage area for arrays and strings that serve the parameter list.

The Element Class is reset when an item is parsed or when the SGMLResetElement function is called. It is loaded when a tag is parsed or when information is added using functions such as SGMLSetParameter. When loading during a parse operation using SGMLNextElement or SGMLNextItem, the list of parameters is validated and disassembled as required. For example, a STYLE attribute is broken into CSS properties, and CSS shorthand properties are further broken into their individual properties. So,

<P STYLE="margin: 0; font: 10pt/12pt Times New Roman, Times, Serif">

Translates to:

Element P

margin-left: 0

margin-top: 0

margin-right: 0

margin-bottom: 0

font-family: Times New Roman, Times, Serif

font-size: 10pt

line-height: 12pt
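The disassembly shown above can be approximated with a short conceptual sketch. It is written in Python rather than Legato and is not the SGML Object's implementation: the function names are hypothetical, only the margin shorthand and the simple "size/line-height family" form of the font shorthand are handled, and the second value of the font shorthand is stored under the standard CSS name line-height.

```python
def expand_margin(value: str) -> dict:
    """Expand a 1 to 4 value margin shorthand into its four sides."""
    parts = value.split()
    if len(parts) == 1:
        top = right = bottom = left = parts[0]
    elif len(parts) == 2:
        top = bottom = parts[0]
        right = left = parts[1]
    elif len(parts) == 3:
        top, bottom = parts[0], parts[2]
        right = left = parts[1]
    elif len(parts) == 4:
        top, right, bottom, left = parts
    else:
        raise ValueError(f"bad margin shorthand: {value!r}")
    return {"margin-left": left, "margin-top": top,
            "margin-right": right, "margin-bottom": bottom}


def expand_font(value: str) -> dict:
    """Expand 'size[/line-height] family' only; other forms are not handled."""
    size_part, _, family = value.partition(" ")
    size, _, line_height = size_part.partition("/")
    props = {"font-family": family.strip(), "font-size": size}
    if line_height:
        props["line-height"] = line_height
    return props


def expand_style(style: str) -> dict:
    """Break a STYLE attribute value into individual CSS properties."""
    props = {}
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skip empty or malformed declarations
        name, _, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "margin":
            props.update(expand_margin(value))
        elif name == "font":
            props.update(expand_font(value))
        else:
            props[name] = value
    return props
```

Calling expand_style with the STYLE value from the example tag yields the seven individual properties listed above.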

When the element class is rendered as a tag, the shorthand and CSS inline styles are processed to write a uniform tag. This does mean a tag can be parsed and written differently depending on the arrangements and errors.

As part of the SGML Object, an error log is also maintained and is populated during the tag-to-element conversion.

11.3.3 Parsing

When parsing, items are retrieved as four different types: tags, words, character entities and spaces. Spaces are counted separately as space (0x20), tab (0x09), returns (0x0D), new lines (0x0A) and page breaks (0x0C). With the exception of the preformatted PRE element, all spaces count as a single justifiable word space.
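The treatment of spaces described above can be sketched conceptually as follows. The sketch is written in Python rather than Legato; count_spaces and collapse_spaces are hypothetical helpers, not SGML Object functions, and the preformatted flag simply stands in for being inside a PRE element.

```python
import re

# The five space characters named above, keyed by character.
SPACE_NAMES = {
    " ": "space",        # 0x20
    "\t": "tab",         # 0x09
    "\r": "return",      # 0x0D
    "\n": "new line",    # 0x0A
    "\f": "page break",  # 0x0C
}


def count_spaces(run: str) -> dict:
    """Count each kind of space character in a run of spaces."""
    counts = {}
    for ch in run:
        name = SPACE_NAMES[ch]
        counts[name] = counts.get(name, 0) + 1
    return counts


def collapse_spaces(text: str, preformatted: bool = False) -> str:
    """Outside preformatted text, treat any run of spaces as one word space."""
    if preformatted:
        return text
    return re.sub(r"[ \t\r\n\f]+", " ", text)
```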

The parse action always accumulates properties such as x/y position, type, parsing results (errors) and the text of the item. Tags are further analyzed and broken into components as part of the SGML Element Class.

11.3.4 SGML Object Functions

Basic Support (not using the SGML Class):

GetTagAttributes — Gets the attributes of an SGML tag.

GetTagElement — Gets the element portion of an SGML tag with optional namespace.

GetTagTextContent — Gets the text/PCDATA content up to the next SGML tag.

SGML Object Control Functions:

SGMLCreate — Creates an SGML Object and optionally attaches a specific object. Use CloseHandle to release.

SGMLClearAttributes — Clears only attributes from the specified SGML Object’s element.

SGMLClearParameters — Clears all parameters from the specified SGML Object’s element.

SGMLGetDTDType — Returns the DTD type code for the specified SGML Object.

SGMLResetElement — Resets the content of the element and optionally adds an element.

SGMLSetCommentMode — Sets the mode for processing SGML comments.

SGMLSetHTMLDTD — Sets up a predefined well-known Document Type Definition.

SGMLSetFile — Sets a file or URL into an existing SGML Object.

SGMLSetHandle — Sets an Edit or Mapped Text Object, or edit window handle, into an existing SGML Object.

SGMLSetString — Sets a string of data into an existing SGML Object.

SGMLSetUTFEncoding — Sets or reads the UTF-8 flag for an SGML Object.

SGML Parsing Functions:

SGMLGetElementString — Returns the current element as a string value.

SGMLGetElementToken — Returns the current element as a token value.

SGMLGetNamespaceString — Returns the current element namespace as a string value.

SGMLGetNamespaceToken — Returns the current element namespace as a token value.

SGMLErrorsToLog — Transfers parsing errors to log.

SGMLFindClosingElement — Scans to find closing element and gathers code or text while parsing.

SGMLFindElement — Finds a specific element with optional starting position.

SGMLGetCharacterValue — Gets the character value for the current item.

SGMLGetItemPosition — Gets the position for the last parsed item.

SGMLGetItemPosEX — Gets the ending X position of the last parsed item.

SGMLGetItemPosEY — Gets the ending Y position of the last parsed item.

SGMLGetItemPosSX — Gets the starting X position of the last parsed item.

SGMLGetItemPosSY — Gets the starting Y position of the last parsed item.

SGMLGetItemResult — Gets the parse result flags.

SGMLGetItemSize — Gets the size of a parsed item.

SGMLGetItemType — Gets the last parsed item type.

SGMLIsEmptyElement — Tests parsed element as an empty element (no content expected).

SGMLNextElement — Gets the next element (skips spaces, text, entities) at the current or specified parse position.

SGMLNextItem — Gets the next item at the current or specified parse position.

SGMLNormalizeErrors — Removes attributes and properties containing errors.

SGMLNormalizeToCSS — Promotes HTML attributes to CSS as appropriate.

SGMLSetDataRange — Sets (restricts) the data range. Default end position is end of file (object).

SGMLSetPosition — Sets the parse position in terms of X/Y.

SGML Writing Functions:

SGMLGetSegment — Gets segment based on finding the close element.

SGMLToString — Converts the currently stored tag (element and attributes) to a string.

SGMLWriteTag — Creates the tag and writes the data to the object managed by the destination handle.

SGML Parameters/Properties Functions:

SGMLDeleteParameter — Deletes a specified parameter.

SGMLGetParameter — Gets a specified parameter (attribute/CSS) as a string.

SGMLGetParameters — Returns all the parameters in the form of a string list.

SGMLGetParameterValue — Gets a specified parameter as a pvalue.

SGMLPutParameter — Either puts a parameter value into the SGML Element Class, deletes the parameter if empty, or ignores the parameter if the value is mixed.

SGMLSetParameter — Sets a parameter into the SGML Element Class. If the parameter already exists, it is overwritten.

SGMLSetElement — Sets the element.
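The difference between setting and putting a parameter, as described in the list above, can be illustrated with a conceptual sketch. It is written in Python rather than Legato and models the Element Class as a plain dictionary; the Element class and its methods are hypothetical and do not reflect the actual signatures, return codes or heap handling of the SGML Object, and the string "mixed" is used as the only mixed marker.

```python
from dataclasses import dataclass, field

MIXED = "mixed"  # stand-in for a mixed value


@dataclass
class Element:
    token: str = ""
    namespace: str = ""
    parameters: dict = field(default_factory=dict)

    def set_parameter(self, name: str, value: str) -> None:
        """Always store the value, overwriting any existing parameter."""
        self.parameters[name] = value

    def put_parameter(self, name: str, value: str) -> None:
        """Store the value, delete the parameter if empty, ignore if mixed."""
        if value == MIXED:
            return
        if value == "":
            self.parameters.pop(name, None)
            return
        self.parameters[name] = value

    def delete_parameter(self, name: str) -> None:
        """Remove the parameter if it is present."""
        self.parameters.pop(name, None)
```

The put form is convenient when values come from a user interface where a field may be left untouched (mixed) or cleared (empty), while the set form is the direct assignment.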

SGML Value Functions:

SGMLAddValues — Adds pvalues a and b.

SGMLCharEntityToValue — Converts the specific entity as decimal, hex or name to a character value.

SGMLDivideValue — Divides pvalue a by a floating-point divisor.

SGMLFloatToValue — Converts a floating-point to a pvalue.

SGMLIntegerToValue — Converts a simple integer to a pvalue of certain units.

SGMLIsDataAvailable — Tests for data being available (not default, mixed, empty or zero).

SGMLIsImplied — Tests value for being implied (as empty, default or PT_IMPLIED).

SGMLIsMeasurement — Tests value for being a measurement type.

SGMLIsMixed — Tests value as mixed (PT_MIXED or “mixed”).

SGMLIsWholeUnits — Tests that a measurement has no fractional values.

SGMLMultiplyValue — Multiplies pvalue a times a floating-point value.

SGMLNormalizeUnits — Normalizes or converts the units from data to base and returns a normalized value.

SGMLPixelsToValue — Converts pixels to a pvalue (value * 100 | PT_PX).

SGMLStringToValue — Converts a string to pvalue type.

SGMLSubtractValues — Subtracts value b from a. Values must be of the same unit type.

SGMLTWIPSToValue — Converts twips to a pvalue with optional rounding flag.

SGMLValueToData — Removes the unit type and extends the sign of a value.

SGMLValueToInteger — Removes the unit type, adds sign as required and adjusts ranging.

SGMLValueToPixels — Converts value to pixels (with optional box and em size for percentage and em units).

SGMLValueToString — Converts a pvalue to a string.

SGMLValueToTWIPS — Converts value to twips (with optional box and em size for percentage and em units).
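The descriptions above suggest that a pvalue packs a scaled number together with a unit code: pixels are stored as value * 100 combined with PT_PX, the data can be recovered by removing the unit type and extending the sign, and subtraction requires matching units. The following conceptual sketch, in Python rather than Legato, shows one way such a packing could work. The bit layout (a 24-bit signed data field with the unit code above it) and the numeric unit codes are assumptions made purely for illustration and are not taken from the SDK.

```python
# ASSUMED layout, for illustration only: low 24 bits hold the value in
# hundredths (two's complement); the bits above hold the unit code.
DATA_BITS = 24
DATA_MASK = (1 << DATA_BITS) - 1
PT_PX = 0x01 << DATA_BITS  # hypothetical unit code for pixels
PT_PT = 0x02 << DATA_BITS  # hypothetical unit code for points


def pixels_to_value(pixels: float) -> int:
    """Pack pixels as (value * 100 | PT_PX)."""
    return (round(pixels * 100) & DATA_MASK) | PT_PX


def unit_of(pvalue: int) -> int:
    """Return only the unit bits of a packed value."""
    return pvalue & ~DATA_MASK


def value_to_data(pvalue: int) -> int:
    """Remove the unit type and extend the sign of the data field."""
    data = pvalue & DATA_MASK
    if data & (1 << (DATA_BITS - 1)):
        data -= 1 << DATA_BITS
    return data


def subtract_values(a: int, b: int) -> int:
    """Subtract b from a; both must carry the same unit type."""
    if unit_of(a) != unit_of(b):
        raise ValueError("values must be of the same unit type")
    return ((value_to_data(a) - value_to_data(b)) & DATA_MASK) | unit_of(a)
```

Storing hundredths in an integer keeps measurements exact to two decimal places and lets the unit travel with the number, which is why arithmetic helpers need to check or convert units before combining values.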

SGML DTD Functions:

SGMLAttributeToString — Converts the attribute token to a string.

SGMLElementToString — Converts the element token to a string.

SGMLGetDTDType — Returns the DTD type code for the specified SGML Object.

SGMLSetHTMLDTD — Sets up a predefined well-known Document Type Definition.

SGMLStringToAttribute — Converts the string version of an attribute name to a token value.

SGMLStringToElement — Converts the string version of an element name to a token value.