Legato

GoFiler Legato Script Reference

 

Legato v 1.5e

Application v 5.25b

  

 

Chapter Five: General Functions (continued)

5.8 Word Parsing

5.8.1 Overview

The Word Parse Object supports a series of functions specifically tailored to “parse”, or process, textual data. The general-purpose parser runs in three principal modes: text, tags and program. Text mode provides word parsing for reading general text, tags mode is tailored to XML, HTML or SGML tags and character entities, and program mode is tailored to typical program or script text.

In text mode, the parser stops on word spaces, returns (line endings) and punctuation, and returns the text collected up to that point. Any further analysis is up to the script.
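To make the text-mode stopping rules concrete, here is a minimal Python model. It is not the Legato API, and two details are assumptions rather than documented behavior: whitespace (including line endings) only separates items and is never returned, and each punctuation character (per Python's `string.punctuation`) is returned as its own one-character item.

```python
import string


def text_mode_words(text):
    """Model of text-mode parsing: stop on spaces, line endings and punctuation.

    Assumptions (not documented Legato behavior): whitespace is a separator
    only, and every punctuation character is returned as a separate item.
    """
    items = []
    current = ""
    for ch in text:
        if ch.isspace() or ch in string.punctuation:
            if current:                  # close the word in progress
                items.append(current)
                current = ""
            if ch in string.punctuation:
                items.append(ch)         # punctuation becomes its own item
        else:
            current += ch
    if current:                          # flush the final word
        items.append(current)
    return items


print(text_mode_words("Load a page, then wait."))
# ['Load', 'a', 'page', ',', 'then', 'wait', '.']
```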

Tag mode is a basic SGML parser. It performs no robust checking or error recovery and has no provisions for DTD support; for more expansive and robust parsing, use the SGML Object. Because tag mode is a simple parser, it does not handle content spanning multiple lines unless a complete buffer, including its line endings, is handed to the parse object.

Tag mode can be combined with SGML and tag functions such as GetTagElement or GetTagAttributes to get the element and attributes of a parsed tag.
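The following Python sketch models the kind of item stream tag mode is tailored for. It assumes, without confirmation from this reference, that a whole tag and a character entity each come back as one item, with the text between them as separate runs. `tag_element` is a hypothetical helper that only illustrates what GetTagElement is for; it is not that function's implementation.

```python
import re

# Hypothetical item pattern: a whole tag, a character entity, a run of
# plain text, or a stray '<' or '&' that does not start a tag or entity.
_TAG_ITEM = re.compile(r"<[^>]*>|&#?\w+;|[^<&]+|[<&]")


def tag_mode_items(text):
    """Split SGML-style text into tags, character entities and text runs."""
    return _TAG_ITEM.findall(text)


def tag_element(tag):
    """Return the element name of a tag, e.g. '<p class="x">' gives 'p'."""
    match = re.match(r"</?\s*([\w:.-]+)", tag)
    return match.group(1) if match else ""


for item in tag_mode_items('<p class="x">Hi &amp; bye</p>'):
    print(repr(item), tag_element(item) if item.startswith("<") else "")
```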

Two program modes are provided. The basic program mode stops on common program syntax such as “==”, “+” and “=”. The program group mode additionally allows grouped items, such as data in parentheses, brackets and quotes, to be kept together.
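A Python model can show the difference between the two program modes. The operator list below is a hypothetical stop list, since the actual Legato set is not given here. Longer operators are tried first so “==” is not split into two “=” items. With `group=True`, balanced parentheses and brackets, and quoted strings, come back as one item, which is one plausible reading of the grouping behavior described above.

```python
# Hypothetical stop list; the actual Legato operator set is not given here.
OPERATORS = ["==", "!=", "<=", ">=", "&&", "||", "+", "-", "*", "/",
             "=", "<", ">", "!", ";", ",", "(", ")", "[", "]", "{", "}"]
# Try longer operators first so "==" is not split into "=" "=".
OPS_LONGEST_FIRST = sorted(OPERATORS, key=len, reverse=True)
CLOSERS = {"(": ")", "[": "]"}
QUOTES = "\"'"


def _group_end(text, start):
    """Return the index just past the group that opens at text[start]."""
    opener = text[start]
    if opener in QUOTES:
        end = text.find(opener, start + 1)
        return len(text) if end < 0 else end + 1
    expected = [CLOSERS[opener]]          # stack of closers still owed
    i = start + 1
    while i < len(text) and expected:
        ch = text[i]
        if ch in QUOTES:                  # skip quoted text inside a group
            i = _group_end(text, i)
            continue
        if ch in CLOSERS:
            expected.append(CLOSERS[ch])
        elif ch == expected[-1]:
            expected.pop()
        i += 1
    return i


def program_items(text, group=False):
    """Split program text into words and operators (group=True keeps groups whole)."""
    items, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        if group and (ch in CLOSERS or ch in QUOTES):
            end = _group_end(text, i)
        else:
            op = next((o for o in OPS_LONGEST_FIRST if text.startswith(o, i)), None)
            if op:
                end = i + len(op)
            else:
                end = i
                while end < len(text) and (text[end].isalnum() or text[end] == "_"):
                    end += 1
                end = max(end, i + 1)     # unknown symbol: take one character
        items.append(text[i:end])
        i = end
    return items


print(program_items("a==b+1"))
print(program_items('if (a == (b + 1)) x = "y z";', group=True))
```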

Finally, an object notation parse mode allows object names to be parsed and separated. This mode uses a much smaller set of stop delimiters. To support loose JSON object names, it allows a number of programming delimiters inside names. For example, “object.name” is broken into “object”, “.” and “name”, while “object.my-cat” is broken into “object”, “.” and “my-cat”, without stopping on the ‘-’ character.
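The two examples above can be reproduced with a small Python model. The stop set of “.”, “[” and “]” plus whitespace is an assumption chosen for illustration; the real Legato delimiter list is not given in this section. Everything else, including ‘-’, stays inside the name.

```python
# Hypothetical minimal stop set for object notation; not Legato's actual list.
OBJECT_DELIMITERS = set(".[]")


def object_items(text):
    """Split an object path into names and delimiters; '-' stays inside names."""
    items, current = [], ""
    for ch in text:
        if ch in OBJECT_DELIMITERS or ch.isspace():
            if current:                   # close the name in progress
                items.append(current)
                current = ""
            if not ch.isspace():
                items.append(ch)          # delimiters are returned as items
        else:
            current += ch
    if current:
        items.append(current)
    return items


print(object_items("object.my-cat"))   # ['object', '.', 'my-cat']
```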

Word parsing is meant to be lightweight and fast, so it can be used to move quickly through large amounts of data.

5.8.2 Basic Operation

The general steps are as follows:

–  Create (get handle)

–  Load/Set Data

–  Iterate, getting words/items until End of Data (EOD).
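The create, load and iterate cycle can be modeled in Python as follows. `WordParseModel` is a hypothetical stand-in whose method names only mirror WordParseSetData and WordParseGetWord; items here are simply whitespace-separated words, and an empty string marks the end of data, as in the Legato loop shown later in this section.

```python
class WordParseModel:
    """Hypothetical Python stand-in for the create / set data / get word cycle."""

    def __init__(self):
        self._data = ""
        self._pos = 0

    def set_data(self, data):
        # Store the data and restart at position 0. Python strings are
        # immutable, so later changes to the caller's variable cannot
        # affect the stored text (mirroring the object's internal copy).
        self._data = str(data)
        self._pos = 0

    def get_word(self):
        # Skip leading whitespace, then collect characters up to the next space.
        data, i = self._data, self._pos
        while i < len(data) and data[i].isspace():
            i += 1
        start = i
        while i < len(data) and not data[i].isspace():
            i += 1
        self._pos = i
        return data[start:i]              # "" signals end of data (EOD)


parser = WordParseModel()                 # 1. create
for line in ["first line here", "second line"]:
    parser.set_data(line)                 # 2. load data (the object is reused)
    word = parser.get_word()
    while word != "":                     # 3. iterate until EOD
        print(word)
        word = parser.get_word()
```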

New data can be repeatedly loaded to the same object to process multiple buffers or lines. After completion, the Word Parse Object handle should be closed.

As each item is parsed, its leading spaces and statistics are stored. For example, the caller can check whether the item had leading spaces and can even retrieve the raw space string.

Other routines return additional information about each parse result, such as the starting position of the item and the current parse position. The word/item buffer is limited to a maximum of 4,096 bytes.
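The per-item statistics can be pictured as a record kept for each parsed item. In this Python sketch the field names only loosely correspond to WordParseGetSpace, WordParseGetSpaceSize, WordParseHasSpace, WordParseGetStartX and WordParseGetEndX; treating the end position as exclusive is an assumption, and the 4,096-byte item limit is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ParsedItem:
    word: str
    space: str       # raw leading space string (cf. WordParseGetSpace)
    start_x: int     # zero-based start of the item (cf. WordParseGetStartX)
    end_x: int       # assumed exclusive end of the item (cf. WordParseGetEndX)

    @property
    def space_size(self):     # cf. WordParseGetSpaceSize
        return len(self.space)

    @property
    def has_space(self):      # cf. WordParseHasSpace
        return self.space != ""


def parse_with_stats(text):
    """Split on whitespace, recording each item's leading space and position."""
    items, i = [], 0
    while i < len(text):
        space_start = i
        while i < len(text) and text[i].isspace():
            i += 1
        start = i
        while i < len(text) and not text[i].isspace():
            i += 1
        if start < i:             # trailing space alone is not an item
            items.append(ParsedItem(text[start:i], text[space_start:start], start, i))
    return items


for item in parse_with_stats("ab  cd"):
    print(item.word, item.space_size, item.has_space, item.start_x, item.end_x)
```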

Once the source data is set, the source variable can be changed or released. The Word Parse Object makes an internal copy of the data.

5.8.3 Setting Up a Parse Operation

The first action is to create a word parse object and retrieve a handle. That handle is then used in subsequent operations to move through the text and examine each parsed item.

For example:

        handle          hWP;
        string          s1, s2;
        int             rc, flags, spaces, count, pos;
        
        s1 = "My favorite pastime is waiting for my browser to load a page.";

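        // Create the parse object (default mode: text)
        hWP = WordParseCreate();
        if (hWP == NULL_HANDLE) {
          MessageBox('x', "Error on handle");
          exit;
          }
        // Load the data; the object keeps its own copy
        WordParseSetData(hWP, s1);
        // Read words until an empty string signals the end of data
        s2 = WordParseGetWord(hWP);
        while (s2 != "") {
          count++;
          // Statistics for the word just parsed
          pos = WordParseGetPosition(hWP);
          flags = WordParseGetResult(hWP);
          spaces = WordParseGetSpaceSize(hWP);
          AddMessage("   %3d %3d %08X %3d :%s:", count, pos, flags, spaces, s2);
          s2 = WordParseGetWord(hWP);
          }
        // Release the parse object
        CloseHandle(hWP);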

In this case, the parse object is created with the default mode (text). A string is loaded into the parse object, and then each successive word is retrieved along with certain attributes.

Functions are provided to retrieve and change the parsing position. In addition, a parse object can be reused any number of times, provided the parse mode remains the same.
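One use of getting and setting the position is to “push back” a word: read it, then set the next parse position to the start of that word so the following read returns it again. This Python model is hypothetical; it assumes the position getter reports the zero-based start of the last parsed word and the setter fixes where the next parse begins, following the descriptions of WordParseGetPosition and WordParseSetPosition. It also shows the object being reused with new data.

```python
class PositionedParser:
    """Hypothetical whitespace word parser with get/set position (not the Legato API)."""

    def __init__(self, data=""):
        self.set_data(data)

    def set_data(self, data):
        # Loading new data resets both positions, so the object can be reused.
        self._data, self._next, self._last = data, 0, -1

    def get_word(self):
        data, i = self._data, self._next
        while i < len(data) and data[i].isspace():
            i += 1
        start = i
        while i < len(data) and not data[i].isspace():
            i += 1
        self._last, self._next = start, i
        return data[start:i]              # "" at end of data

    def get_position(self):
        # Zero-based X position of the last parsed word (cf. WordParseGetPosition).
        return self._last

    def set_position(self, x):
        # Zero-based X position for the next parse (cf. WordParseSetPosition).
        self._next = x


p = PositionedParser("set x = 5")
first = p.get_word()                      # "set"
p.set_position(p.get_position())          # push "set" back
again = p.get_word()                      # "set" is read again
p.set_data("new buffer")                  # reuse the same object
```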

5.8.4 Word Parse Functions

Object Control:

WordParseCreate — Creates a Word Parse Object with options.

WordParseSetData — Sets a buffer (line, etc.) into a parse object.

Item Parse:

WordParseGetWord — Parses the next item or word and returns it as a string.

WordParseGetPosition — Gets the zero-based X position of the last parsed word.

WordParseSetPosition — Sets the zero-based X position for the next parse action.

Item Statistics:

WordParseGetEndX — Gets the ending zero-based X position of the last item.

WordParseGetResult — Returns flags from last word parsed.

WordParseGetSpace — Gets the leading space string from the last word parsed.

WordParseGetSpaceSize — Gets leading space as count from the last word parsed.

WordParseGetStartX — Gets the starting zero-based X position of the last item.

WordParseHasSpace — Tests for leading space while parsing the last word.

Related Functions:

GetListItemType — Scans a word and returns its characteristics as a list item.

GetNumericType — Scans a number (word) and returns its characteristics.

GetNthWord — Returns a word at a specific position within a string.

GetWordType — Gets the type of word and word characteristics of a string.

WordsToArray — Parses a string and returns an array of words.