Legato

GoFiler Legato Script Reference

 

Legato v 1.4j

Application v 5.22b

  

 

Chapter Five: General Functions (continued)

GetWordType Function

Overview

The GetWordType function analyzes the content of a single word and returns its type and attributes.

Syntax/Parameters

Syntax

dword = GetWordType ( string data );

Parameters

data

A string containing a word with no spaces.

Return Value

Returns a dword containing the attributes from the word scan.

Remarks

The GetWordType function scans the content of data and returns a composite bitwise value with the type and attributes of the word:

  Definition   Bitwise   Description
  Item Types        
    WT_TYPE_ITEM_MASK   0x000F0000   Item Type Mask
    WT_TYPE_UNKNOWN   0x00000000   Unknown Value
    WT_TYPE_WORD   0x00010000   Word (dog, cat, monkey)
    WT_TYPE_NUMBER   0x00020000   Number
    WT_TYPE_NUMBER_SERIAL   0x00030000   Serial Number (12, 63)
    WT_TYPE_LEADER   0x00040000   Leader Line
    WT_TYPE_RULER   0x00050000   Ruler (possible or dash, nil)
    WT_TYPE_CURRENCY_LEADER   0x00060000   Opening Currency “$  1,121”
    WT_TYPE_NIL   0x00070000   Nil or Compound Nil “--(a)” or “—” or “$-”
    WT_TYPE_DATE   0x00080000   Date “12/12/12”, “12.12.12”, “23:22” or ISO
  Word Variations        
    WT_WORD_MASK   0x00700000   Word Type Mask
  Types        
    WT_WORD_UNKNOWN   0x00000000   Unknown or General Word Type
    WT_WORD_LOWER   0x00100000   Lower Case Word
    WT_WORD_UPPER   0x00200000   Upper Case Word
    WT_WORD_INITIAL   0x00300000   Initial Capital
  Word Flags        
    WT_WORD_TRAIL_MASK   0x000000FF   Trailing Punctuation Character (in low byte)
    WT_WORD_TRAIL_PUNCTUATION   0x00800000   Has Trailing Punctuation (character in low byte)
    WT_WORD_QUOTED   0x01000000   Word Quoted (can be partial)
    WT_WORD_IN_HOLE   0x02000000   Word has Parentheses or Brackets
    WT_WORD_LEADER_TRAIL   0x04000000   Word has a Trailing Leader Line
  Lexicon        
    WT_WORD_LEXICON_MASK   0x70000000   Lexicon Mask
    WT_WORD_DATE_MONTH   0x10000000   Word is in Month Lexicon
    WT_WORD_DATE_DAY   0x20000000   Word is in Day Lexicon
    WT_WORD_HONORIFIC   0x30000000   Word is in Honorific Lexicon
  Number Variations        
    WT_NUMBER_ALIGN_MASK   0x000000FF   Alignment Position at Size
  Types        
    WT_NUMBER_MASK   0x00700000   Number Type Mask
    WT_NUMBER_UNKNOWN   0x00000000   Unknown Type
    WT_NUMBER_YEAR   0x00100000   Number is Year (1900-2099)
    WT_NUMBER_DAY   0x00200000   Number is Day (1-31)
    WT_NUMBER_FORMATTED   0x00300000   Number is Formatted
    WT_NUMBER_LIST   0x00400000   Part of a List (1-99 with trail)
  Number Flags        
    WT_NUMBER_NEGATIVE   0x01000000   Negative Number (000) or -000
    WT_NUMBER_IN_HOLE   0x02000000   Negative Number in Parentheses (000)
    WT_NUMBER_FOOTNOTE   0x04000000   Has Footnote
    WT_NUMBER_CURRENCY   0x08000000   Has Currency
    WT_NUMBER_PERCENT   0x10000000   Has Percent
    WT_NUMBER_IN_HOLE_ERROR   0x20000000   Error in Parenthetical
    WT_NUMBER_BAD_FORMAT   0x40000000   Bad Format (characters, not structure)
  Leader Variation        
    WT_LEADER_SIZE_MASK   0x00000FFF   Leader Size Mask (character in bottom)
  Ruler Variations        
    WT_RULER_MASK   0x00700000   Ruler Type Mask (drawing character is in the lower 8 bits)
    WT_RULER_CHARACTER   0x000000FF   Mask for Ruler Character
  Ruler Types        
    WT_RULER_MIXED   0x00000000   Of Indeterminate Type
    WT_RULER_SUBTOTAL   0x00100000   Subtotal Type
    WT_RULER_TOTAL   0x00200000   Total Type
  Ruler Flags        
    WT_RULER_DASH   0x01000000   Possible Connecting Dash
  Date Variations        
    WT_DATE_MASK   0x0F000000   Date Code Mask
    WT_DATE_AS_GENERAL   0x00000000   Date as Any Type (short mm/yy not supported)
    WT_DATE_ISO_8601   0x01000000   Date as ISO (in part, with or without time)
    WT_DATE_TIME_ONLY   0x02000000   A Time with Optional AM/PM
  Unknown Word Data        
    WT_UNKNOWN_ALPHA   0x0000000F   Alpha Count
    WT_UNKNOWN_NUMERIC   0x000000F0   Numeric Count
    WT_UNKNOWN_CURRENCY   0x00000300   Currency Count (4)
    WT_UNKNOWN_PUNCTUATION   0x00000C00   Sentence Punctuation Count (4)
    WT_UNKNOWN_COMMA_PERIOD   0x00003000   Comma Period Count
    WT_UNKNOWN_GROUP   0x0000C000   Parenthesis/Brace Group
    WT_UNKNOWN_QUOTE   0x00300000   Quote Character Count
    WT_UNKNOWN_FOOTNOTE   0x00C00000   Footnote Type Characters
    WT_UNKNOWN_RULE   0x03000000   Rule Character Count
    WT_UNKNOWN_ELLIPSE   0x0C000000   Ellipsis Count
    WT_UNKNOWN_OTHER   0x30000000   Other Count

 

The item type can be filtered by ANDing the result with the WT_TYPE_ITEM_MASK value:

code = GetWordType(word);
switch (code & WT_TYPE_ITEM_MASK) {
  case WT_TYPE_UNKNOWN:
    break;
  case WT_TYPE_WORD:
    break;
  case WT_TYPE_NUMBER:
    break;
  case WT_TYPE_NUMBER_SERIAL:
    break;
  case WT_TYPE_LEADER:
    break;
  case WT_TYPE_RULER:
    break;
  case WT_TYPE_CURRENCY_LEADER:
    break;
  case WT_TYPE_NIL:
    break;
  case WT_TYPE_DATE:
    break;
  }

Each case section can then count or act upon the details of the item.
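As a rough sketch of what a case section might do, the fragment below decodes a few of the type-specific fields from the table. Note that the same bit can mean different things depending on the item type (0x01000000 is WT_WORD_QUOTED for a word but WT_NUMBER_NEGATIVE for a number), so the item type should be checked before any flag is tested. The function name describe_word is hypothetical, and AddMessage is assumed to accept a printf-style format string; neither is defined in this section.

```legato
// Hypothetical helper: log selected attributes of a single word.
int describe_word(string word) {

    dword code;
    dword style;

    code = GetWordType(word);
    switch (code & WT_TYPE_ITEM_MASK) {
      case WT_TYPE_WORD:
        // The case style is a field, so compare it after masking.
        style = code & WT_WORD_MASK;
        if (style == WT_WORD_INITIAL) {
          AddMessage("%s: word with initial capital", word);
          }
        // Flags are single bits and are tested individually.
        if ((code & WT_WORD_TRAIL_PUNCTUATION) != 0) {
          AddMessage("%s: trailing punctuation, character code %d", word, code & WT_WORD_TRAIL_MASK);
          }
        break;
      case WT_TYPE_NUMBER:
        if ((code & WT_NUMBER_MASK) == WT_NUMBER_YEAR) {
          AddMessage("%s: number in the year range", word);
          }
        if ((code & WT_NUMBER_NEGATIVE) != 0) {
          AddMessage("%s: negative number", word);
          }
        break;
      }
    return ERROR_NONE;
    }
```

Fields such as WT_WORD_MASK hold one value out of several and are compared with ==, while flags such as WT_NUMBER_NEGATIVE are independent bits and are tested against zero.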

The GetWordType function is useful for aggregating information from a text stream to perform high-level analysis. For example, a line of text can be parsed, information accumulated, and the first and last word data examined to determine the probability of the line being a heading, part of a paragraph, or perhaps a row of a table.
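The line-level aggregation described above might be sketched as follows. This is an illustration only: the function name classify_line, the counters, the return codes, and the thresholds are all hypothetical, and ExplodeString and ArrayGetAxisDepth are assumed to split a string on a delimiter and to return the number of elements in an array, respectively. A practical heuristic would weigh many more attributes.

```legato
// Hypothetical helper. Returns 1 for a probable table row, 2 for a
// probable heading, and 0 otherwise (arbitrary codes for this example).
int classify_line(string line) {

    string words[];
    dword code, type, first, last;
    int count, ix, n_items, n_words, n_values;

    words = ExplodeString(line, " ");          // assumed: split on spaces
    count = ArrayGetAxisDepth(words);          // assumed: element count
    n_items = 0; n_words = 0; n_values = 0;
    first = 0; last = 0;
    ix = 0;
    while (ix < count) {
      if (words[ix] != "") {                   // skip gaps from repeated spaces
        code = GetWordType(words[ix]);
        type = code & WT_TYPE_ITEM_MASK;
        if (n_items == 0) { first = code; }    // remember first item
        last = code;                           // and the most recent item
        n_items++;
        if (type == WT_TYPE_WORD) { n_words++; }
        if ((type == WT_TYPE_NUMBER) || (type == WT_TYPE_NIL)) { n_values++; }
        }
      ix++;
      }
    if (n_items == 0) { return 0; }
    // Two or more numeric or nil values suggest a row of a table.
    if (n_values >= 2) { return 1; }
    // All words, a capitalized first word, and no trailing punctuation
    // on the last word suggest a heading.
    if (n_words == n_items) {
      if (((first & WT_WORD_MASK) != WT_WORD_LOWER) &&
          ((last & WT_WORD_TRAIL_PUNCTUATION) == 0)) {
        return 2;
        }
      }
    return 0;
    }
```

The word flags on first and last are only examined once every item on the line is known to be a word, because those bit positions carry other meanings for numbers and rulers.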

Analysis is performed at a coarse level: the types of characters in the word are counted, and the counts are then run through logic to classify the word. For example, if one or two dashes are present without text, the content is considered a “nil” value, as would be seen in a table. Three dashes, on the other hand, are considered a possible rule or visual aid.
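The dash behavior described above could be checked with a fragment like the one below. It only reports what GetWordType returns, so no particular output is promised here. The function name report_dashes is hypothetical, and AddMessage is assumed to accept a printf-style format string.

```legato
// Hypothetical helper: report how a run of dashes is classified.
int report_dashes(string text) {

    dword type;

    type = GetWordType(text) & WT_TYPE_ITEM_MASK;
    if (type == WT_TYPE_NIL) {
      AddMessage("'%s' was classified as a nil value", text);
      }
    else if (type == WT_TYPE_RULER) {
      AddMessage("'%s' was classified as a ruler", text);
      }
    else {
      AddMessage("'%s' was classified as item type 0x%08X", text, type);
      }
    return ERROR_NONE;
    }
```

Calling report_dashes("--") and then report_dashes("---") would be one way to compare the two cases against the description above.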

Other functions, such as the GetNumericType and the GetListItemType functions, can return more details regarding a number.

The word should be passed to the GetWordType function without spaces. If the Word Parse Object is employed in WP_GENERAL mode, the words it returns are suitable for analysis by GetWordType. See the WordParseCreate function.
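A word parser loop feeding GetWordType might look like the sketch below. The signatures used here for WordParseCreate, WordParseGetWord, and CloseHandle are assumptions and should be checked against their own reference pages, as should the assumption that an empty string marks the end of the data. The function name scan_words is hypothetical.

```legato
// Hypothetical helper: log the composite type code for each word in a line.
int scan_words(string line) {

    handle hWP;
    string word;
    dword code;

    hWP = WordParseCreate(WP_GENERAL, line);   // assumed: mode, then source text
    word = WordParseGetWord(hWP);              // assumed: "" when no words remain
    while (word != "") {
      code = GetWordType(word);
      AddMessage("%s: 0x%08X", word, code);
      word = WordParseGetWord(hWP);
      }
    CloseHandle(hWP);                          // assumed: releases the parser
    return ERROR_NONE;
    }
```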

Related Functions

Platform Support

Go13, Go16, GoFiler Complete, GoFiler Corporate, GoFiler, GoFiler Lite, GoXBRL

Legato IDE, Legato Basic