Legato

GoFiler Legato Script Reference

 

Legato v 1.4j

Application v 5.22b

  

 

Chapter Six: File Functions (continued)

6.6 Mapped Text Objects

6.6.1 Overview

Editing text is the foundation of many computer applications, and there are many ways to access and manage textual data and the editing process. As a user changes text, characters and lines of data must be altered, deleted, or inserted, which in turn can require large swaths of data to be shifted to accommodate the changes. Actions like these can become extremely cumbersome and time consuming. In addition, there are times a script might want to randomly access data within a text file, say by line. If there is a large chunk of text in memory or a file and a script needs to go to line 3017, the program would have to count line endings from the start of the data until it reached the 3017th line.

One method of improving access is to treat a text file like a database, which is exactly what a Mapped Text Object (MTO) does. Each line is indexed, so any line can be reached directly without scanning the data that precedes it.
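The difference between counting line endings and consulting an index can be sketched in a few lines. The Python below only illustrates the general idea; it is not Legato code and not the MTO's actual implementation, and the names `build_line_index` and `read_line` are hypothetical.

```python
def build_line_index(data: bytes) -> list[int]:
    """Scan the data once and record the byte offset where each line starts."""
    starts = [0]
    for pos, byte in enumerate(data):
        if byte == 0x0A:                 # a line feed ends the current line
            starts.append(pos + 1)
    return starts


def read_line(data: bytes, starts: list[int], index: int) -> bytes:
    """Fetch line `index` (zero-based) directly, without counting endings."""
    begin = starts[index]
    end = starts[index + 1] if index + 1 < len(starts) else len(data)
    return data[begin:end].rstrip(b"\r\n")


data = b"alpha\r\nbeta\r\ngamma\r\n"
starts = build_line_index(data)
print(read_line(data, starts, 1))        # b'beta'
```

The scan happens once, at mapping time; after that, reaching line 3017 costs the same as reaching line 1.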

Another benefit of using a map scheme is that the program can carry additional functionality and information. Two beneficial features of the MTO are the capacity to undo and redo actions and to perform file recovery. An MTO is used in every edit window within GoFiler, even ones based on binary data.

An MTO consists of six major components:

Data Management — This section deals with opening and mapping files or strings. It also manages saving or exporting. Saving causes all changes to be written and the file remapped, while exporting merely writes each line to a specified unrelated file. Saving will also manage backup files depending on the application’s settings.

Segment Processing — Segment processing is the highest level of data access. It allows for textual data to be treated as x/y to x/y segments. Segment access can also operate on a transactional basis, allowing Undo and Redo information to be stored and played back.

Meta Data Management — Meta data, including editing statistics, caller data, and line by line flags, can be managed through the MTO. Meta data also includes information such as what windows, if any, are associated with the MTO.

Entry Point Table (EPT) — The EPT is essentially the database index of the file on a line-by-line basis. Changes to the data at this level are not tracked. EPT access functions can also manage tab characters, handling native versus realized data.

File Recovery Journal (FRJ) — The FRJ is closely related to the Undo operation. As segments are modified, recovery records are added.

Dirty List Management (DLM) — The DLM is used to aid in the processing of data changes in the background.

6.6.2 Entry Point Table

The heart of the MTO is the Entry Point Table (EPT). Each line in a file will have an entry in the EPT. The caller can access data via segments or directly line-by-line.

As lines are revised, the table is updated to point to a temporary area with the latest data.
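A minimal sketch of that redirection, written in Python purely for illustration: each table entry records where its line currently lives, and revising a line repoints the entry at a working buffer instead of touching the source. The class `TinyMap`, the location codes, and the entry layout are all hypothetical and are not taken from the MTO implementation.

```python
ORIGINAL, WORKING = 0, 1                    # hypothetical location codes


class TinyMap:
    def __init__(self, source: bytes):
        self.source = source
        self.working = bytearray()          # temporary area for revised lines
        self.entries = []                   # one (location, offset, length) per line
        offset = 0
        for line in source.split(b"\n"):
            self.entries.append((ORIGINAL, offset, len(line)))
            offset += len(line) + 1         # step over the line feed

    def read_line(self, index: int) -> bytes:
        location, offset, length = self.entries[index]
        store = self.source if location == ORIGINAL else self.working
        return bytes(store[offset:offset + length])

    def replace_line(self, index: int, text: bytes) -> None:
        # New data is appended to the working area; the source is never changed.
        self.entries[index] = (WORKING, len(self.working), len(text))
        self.working += text


m = TinyMap(b"one\ntwo\nthree")
m.replace_line(1, b"TWO!")
print(m.read_line(0), m.read_line(1), m.read_line(2))   # b'one' b'TWO!' b'three'
```

In this sketch every revision appends to the working buffer, so heavy editing without saving makes that buffer grow even when the document itself does not.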

During the initial mapping operation, the first two bytes of the file are examined to determine if the content is Unicode. Unicode is byte ordered as “little endian” or “big endian” depending on the source system, meaning that each 16-bit word is stored either least significant byte first or most significant byte first. This is largely a function of the source system’s underlying CPU and operating system. While loading Unicode, each line is checked for characters above 0xFF (more than 8 bits) and a flag is set to indicate the presence of 16-bit data. It is important to note that the current class does not support writing Unicode and Legato does not presently support 16-bit strings.
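The check can be pictured as follows. This Python sketch assumes the two leading bytes are a standard byte order mark (FF FE for little endian, FE FF for big endian); the text above does not say exactly what the MTO looks for, so treat that as an assumption. The function name `sniff_unicode` is hypothetical.

```python
def sniff_unicode(data: bytes):
    """Return (codec, has_wide); codec is None when the content is 8-bit."""
    if data[:2] == b"\xff\xfe":
        codec = "utf-16-le"              # least significant byte first
    elif data[:2] == b"\xfe\xff":
        codec = "utf-16-be"              # most significant byte first
    else:
        return None, False
    text = data[2:].decode(codec)
    has_wide = any(ord(ch) > 0xFF for ch in text)   # needs more than 8 bits
    return codec, has_wide


print(sniff_unicode(b"\xff\xfeH\x00i\x00"))   # ('utf-16-le', False)
print(sniff_unicode(b"\xfe\xff\x20\x1c"))     # ('utf-16-be', True)
print(sniff_unicode(b"plain text"))           # (None, False)
```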

If Unicode is not detected, the content is treated as 8-bit, with characters below 0x20 (the space character) treated as control characters. This is normally ANSI, ASCII, or some variation such as UTF-8. Regardless of whether the data is 8-bit or 16-bit, return (0x0D) and line feed (0x0A) characters are checked and a determination is made on how to process line endings. Again, depending on the source system and software, there are various combinations. The line ending mode is generally detected automatically and compensated for, even with mixed modes. Tab characters (0x09) are also detected so that the caller can decide how to process them. Any zero bytes in the stream will result in the content being marked as binary. The caller can then determine how to process the data.
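The kinds of checks described above can be sketched as a single pass over 8-bit data. This is an illustrative Python model only; the function `classify` and the shape of its result are hypothetical and say nothing about how the MTO stores these findings.

```python
def classify(data: bytes) -> dict:
    """Report line ending styles, tab presence, and binary content."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf        # bare returns
    lf = data.count(b"\n") - crlf        # bare line feeds
    found = (("CRLF", crlf), ("CR", cr), ("LF", lf))
    return {
        "binary": b"\x00" in data,       # any zero byte marks content as binary
        "tabs": b"\t" in data,
        "endings": {name for name, count in found if count},
    }


print(classify(b"a\r\nb\nc\td"))
print(classify(b"\x89PNG\x00\x00"))
```

The first call reports mixed CRLF and LF endings plus a tab; the second is flagged as binary because of the zero bytes.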

As a compromise to save space for each EPT record, some limits are in place. First, the location address is up to 30 bits in size (the top two bits are used for location control). Therefore, the maximum size of the mapped source and of the working data in memory is limited to 1 GB, or 1,073,741,823 bytes, each. This in turn means an MTO cannot handle files larger than 1 GB, and due to the restriction on the working data, files between 500 MB and 1 GB may encounter issues in edit tracking: if the file is manipulated enough without saving, the working data limit can be exceeded. This is not generally a problem and we have not encountered that limit. Second, the width of a line is limited to 1,048,575 bytes. Larger lines can be mapped and controlled; this will be discussed later. Again, well-formed text and code should not hit these limits.
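Both limits are one less than a power of two, which the following Python arithmetic confirms. The 30-bit address with two control bits on top comes from the text; reading the line width limit as a 20-bit field is an inference from the number itself, and the `pack` and `unpack` helpers show an assumed layout, not the actual record format.

```python
ADDRESS_BITS = 30
MAX_ADDRESS = (1 << ADDRESS_BITS) - 1      # largest 30-bit value
MAX_LINE_WIDTH = (1 << 20) - 1             # largest 20-bit value (inferred width)


def pack(location: int, address: int) -> int:
    """Place a 2-bit location code above a 30-bit address (assumed layout)."""
    if not 0 <= location <= 3 or not 0 <= address <= MAX_ADDRESS:
        raise ValueError("value does not fit")
    return (location << ADDRESS_BITS) | address


def unpack(word: int) -> tuple[int, int]:
    """Split a packed word back into (location, address)."""
    return word >> ADDRESS_BITS, word & MAX_ADDRESS


print(f"{MAX_ADDRESS:,}", f"{MAX_LINE_WIDTH:,}")   # 1,073,741,823 1,048,575
print(hex(pack(1, 5)), unpack(pack(1, 5)))         # 0x40000005 (1, 5)
```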

Each entry also includes meta data for tracking line level changes, markers or any other data the caller desires. However, the meta data is in binary bits with a limited number of positions for each entry.

The EPT also contains “attribute” data. This information is not presently available to scripts. It is used to represent font, style and other object reference data.

During a save operation the data to which the map points is transferred back to the file or to a new file. That file is mapped and becomes the ‘Original File’.

6.6.3 Supporting the Application Frame

As mentioned above, all edit windows (and some non-edit windows) rely on an MTO as their primary file. MTOs can stand alone, but it is important to understand the larger scheme if a Legato script is to effectively access and manipulate data in the larger sandbox.

All edit windows also rely on an intermediate class known as the “Edit Object.” The MTO is not aware (and in fact does not care) where your caret is, what is selected on the screen, what your viewport position is, etc. This is where the Edit Object comes in. Edit Objects only exist when a user or some action causes a change or inspection of data. Many MTO functions will work with either an MTO handle or an Edit Object handle. Since Edit Objects carry positional information, they are unique to the window and are destroyed when the editing session has been completed. Note that an MDI edit window can have many views; all of them will point to a single MTO shared by the views.

A number of functions exist to help access Edit Objects:

handle = GetActiveEditObject ( );

The GetActiveEditObject function can be used with a hook to get the current or active edit window’s Edit Object. Note that using this function in a script running from the IDE will result in the script window’s Edit Object being returned. A specific window’s edit object can be accessed using:

handle = GetEditObject ( handle hwTarget | int index | string name );

Likewise, a handle to the MTO can be retrieved:

handle = GetMappedTextObject ( handle hwTarget | int index | string name );

Both of the above functions take a window handle, an edit window index number, or a filename as an input parameter. If the functions fail, the returned handle value will be NULL_HANDLE.

6.6.4 File Locking

Since the EPT references a file throughout the document edit life cycle, the source file cannot be changed without being remapped. As a consequence, all files are locked by necessity, although read sharing is allowed. Read-only files can be opened, but the save operation is prohibited. Note that the data within the file is not updated until the file is saved.

6.6.5 Opening a File

As shown above, an active MTO handle can be retrieved for an edit window. But what if you want to have your own MTO? Then we open or create one:

handle = OpenMappedTextFile ( string name, [dword mode] );

On success, OpenMappedTextFile returns a handle. The name can be any filename and path, including HTTP references, although HTTP references are treated as read-only. On failure, use the GetLastError function to retrieve a formatted error code. The optional mode parameter specifies how to open the file:

Definition   Bitwise   Description
MFC_OPEN_READ   0x00000008   Open as read-only share
MFC_ALLOW_READ_ONLY   0x00000010   Open allow read-only
MFC_NO_CACHE   0x00000080   Do not read cache (HTTP)
MFC_RECOVERY_TRACKING   0x00001000   Use recovery tracking
MFC_UNDO_TRACKING   0x00002000   Keep undo information
MFC_ALLOW_TABS   0x00004000   Allow and process tabs, by default tabs are converted and formatted to spaces
MFC_ALLOW_SHARE_TRACKING   0x00010000   Allow file share tracking

 

MFC stands for Mapped File Control. Bits can be logically ORed to combine options.
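Combining and testing these bits works the same way in any language with bitwise operators. The sketch below uses Python only to show the arithmetic; the constant values are copied from the table above, and in a Legato script the predefined names would be used directly.

```python
MFC_OPEN_READ            = 0x00000008
MFC_ALLOW_READ_ONLY      = 0x00000010
MFC_NO_CACHE             = 0x00000080
MFC_RECOVERY_TRACKING    = 0x00001000
MFC_UNDO_TRACKING        = 0x00002000
MFC_ALLOW_TABS           = 0x00004000
MFC_ALLOW_SHARE_TRACKING = 0x00010000

# OR the options together to build a single mode value.
mode = MFC_UNDO_TRACKING | MFC_ALLOW_TABS
print(f"0x{mode:08X}")                   # 0x00006000

# AND with a single flag to test whether it is set.
print(bool(mode & MFC_ALLOW_TABS))       # True
print(bool(mode & MFC_OPEN_READ))        # False
```

Because every option occupies its own bit, the options can be combined in any order without interfering with one another.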

To create a new file and MTO, use the CreateMappedTextFile function:

handle = CreateMappedTextFile ( string name );

The name is any valid URN to which the user has create/read/write privileges. Finally, an empty MTO can be created, or an MTO can be created from a string, using the CreateMappedTextString function:

handle = CreateMappedTextString ( string data );

When a script is ready to commit the changes, the MappedTextSave function can be called:

int = MappedTextSave ( handle hObject, [string filename], [dword flags] );

The returned int will contain a formatted error code, and ERROR_NONE indicates success. An optional filename can be provided to save to a different location. If the MTO was created from a string, the filename parameter must be supplied. Finally, the optional flags parameter specifies how to manage the save process:

  Definition   Bitwise   Description
  MSF_BAK_FILE_LIMIT   0x0000000F   Limit of Journal Files
  MSF_BAK_FILE_AS_JOURNAL   0x00000100   Perform Backup as Journal Files (name (01).bak)
  MSF_BAK_FILE_HIDDEN   0x00000200   Make Backup Files Hidden
  MSF_NO_BAK_FILE   0x00000400   No Backup File
  MSF_NO_NEWLINES   0x00010000   No 0x0A New Line Codes
  MSF_WRITE_ATTRIBUTES   0x00020000   Write Attributes to File
  MSF_OVERRIDE_READ_ONLY   0x00040000   Override Read-Only Setting
  MSF_NO_FILE_SAVE_NOTIFY   0x00100000   Do Not Notify Application of Change (this only applies to objects associated with one or more windows)

 

MSF stands for Mapped Save Flags. Bits can be logically ORed to combine options. Please note that if the MTO is associated with a window, the menu function should be used to perform a save operation since the view may also perform various tasks as part of the save process, including reading information from the view to be saved in the file.
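Unlike the other entries, MSF_BAK_FILE_LIMIT has the value 0x0000000F, which looks like a four-bit mask rather than a single flag. One plausible reading, and it is only an assumption not confirmed by the table, is that the low four bits carry the number of journal backup files to keep. Under that assumption a flags value could be assembled as in this Python sketch, where `save_flags` is a hypothetical helper:

```python
MSF_BAK_FILE_LIMIT      = 0x0000000F    # assumed: mask for a count in bits 0-3
MSF_BAK_FILE_AS_JOURNAL = 0x00000100
MSF_BAK_FILE_HIDDEN     = 0x00000200


def save_flags(journal_count: int, hidden: bool = False) -> int:
    """Build a flags value carrying a journal file count (assumed encoding)."""
    if not 0 <= journal_count <= MSF_BAK_FILE_LIMIT:
        raise ValueError("count must fit in the low four bits")
    flags = MSF_BAK_FILE_AS_JOURNAL | journal_count
    if hidden:
        flags |= MSF_BAK_FILE_HIDDEN
    return flags


flags = save_flags(5, hidden=True)
print(f"0x{flags:08X}")                 # 0x00000305
print(flags & MSF_BAK_FILE_LIMIT)       # 5
```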

To export data while not affecting the MTO state, use the MappedTextExport function:

int = MappedTextExport ( handle hObject, string filename, [dword flags] );

The same parameters apply except the only flag that is active is the MSF_NO_NEWLINES flag.

Another way to retrieve data from an MTO is to request a string by using the MappedTextToString function:

string = MappedTextToString ( handle hObject, [boolean newline] );

This function will return a string containing the contents of the MTO. By default, lines are separated by single return characters (0x0D).

There are two ways to modify data within an MTO: non-transaction based line level and transaction based segment level. All MTOs supporting windows will be transaction based.

6.6.6 Non-Transaction Line Access

A number of functions are provided to perform basic line level tasks like read, replace, insert, and delete. In many cases, even if using segments (discussed below), reading lines can be desirable. All line functions expect that the line of text does not contain return or new line codes.

To begin with, it might be good to know how many lines are in a file or MTO.

int = GetLineCount ( handle hObject );

The GetLineCount function will return the total lines within an MTO or Edit Object. To directly read a line, the ReadLine function is used:

string = ReadLine ( handle hObject, [int index | position] );

This function returns a string, which can be empty if the line has no data or if the function encountered an error. Use the IsError or GetLastError functions to check for an error or get an error code. The first parameter, hObject, references the object from which the line will be read. For this discussion, the parameter would contain a handle to an MTO or an Edit Object. When using the ReadLine function with an MTO, the zero-based index parameter is required. The MTO will automatically retrieve the data from the file or temporary area depending on whether or not the data was previously modified. If the source was Unicode, it will be converted to 8-bit ANSI. Note that Legato does not presently support 16-bit wide character processing for Unicode. Also note that the ReadLine function works with more object types than the MTO, such as a string pool.

The flip side of the ReadLine function is the ReplaceLine function. It allows data to be written back to the MTO. In the prototypes below, hObject is refined as hMappedText.

int = ReplaceLine ( handle hMappedText, int index, string data );

The function returns a formatted error code on failure, but if used properly, it returns ERROR_NONE. The index parameter must specify an existing line. To add data, the InsertLine function must be used:

int = InsertLine ( handle hMappedText, int index, string text );

Again, the function returns a standard result. The InsertLine function inserts the new line prior to the specified index parameter. The index can be specified as -1 to append to the end of the map. This relieves the script from having to get the last index position and calculate the write position to append data.

Finally, lines can be deleted using the DeleteLine function:

int = DeleteLine ( handle hMappedText, int index, [int count] );

If the count parameter is not provided, then a single line is deleted.
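The indexing rules for these functions can be modeled with an ordinary list of strings. The Python below is only a model of the documented behavior (replace requires an existing index, insert places the line before the index, -1 appends, delete takes an optional count); `insert_line` and `delete_line` are hypothetical stand-ins, not the Legato functions.

```python
def insert_line(lines: list[str], index: int, text: str) -> None:
    if index == -1:
        lines.append(text)            # -1 appends to the end of the map
    else:
        lines.insert(index, text)     # otherwise insert before `index`


def delete_line(lines: list[str], index: int, count: int = 1) -> None:
    del lines[index:index + count]    # a single line unless count is given


lines = ["a", "b", "c"]
lines[1] = "B"                        # replace: the index must already exist
insert_line(lines, 0, "top")
insert_line(lines, -1, "end")
delete_line(lines, 1, 2)
print(lines)                          # ['top', 'c', 'end']
```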

6.6.7 Transaction Based Segment Access

Rather than accessing data by lines, higher level functions are provided to access and alter data by segments or regions. In these cases, there’s no need to specify a line. Regions are defined as x/y-x/y segments. This method supports undo operations and file recovery. Segment X positions are always native, that is, tab characters are counted as one position. There are functions to aid in handling translation of tab positions.

To read a segment, use the ReadSegment function:

string = ReadSegment ( handle hObject, [int sx, int sy, int ex, int ey] );

The function returns a string, which can be empty on failure. Use the GetLastError function to check for errors. Line breaks are represented by returns only (0x0D). The segment parameters are optional only when using an Edit Object since an Edit Object can contain a selected area. If omitted in that case, whatever is selected is returned.

To replace a segment, use the WriteSegment function:

int = WriteSegment ( handle hObject, string data, int sx, int sy, [int ex, int ey] );
int = WriteSegment ( handle hObject, string data );

These functions have essentially three flavors: (i) for an Edit Object, the position can be omitted to replace the selected area or insert at the caret; (ii) for both object types, the sx and sy parameters can be specified to insert at that position; and finally, (iii) a complete segment can be specified.

To insert lines, either include more line endings in the data to be written or omit the ending position of the segment.
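Segment coordinates can be pictured with a small model in which the document is a list of lines, x is a character position within a line, and y is a zero-based line number. This Python sketch assumes the end position is exclusive and that line breaks travel as a single return (0x0D), as described for ReadSegment; the helper names are hypothetical and no undo information is recorded.

```python
def read_segment(lines, sx, sy, ex, ey):
    """Return the text from (sx, sy) up to, not including, (ex, ey)."""
    if sy == ey:
        return lines[sy][sx:ex]
    middle = lines[sy + 1:ey]
    return "\r".join([lines[sy][sx:], *middle, lines[ey][:ex]])


def write_segment(lines, data, sx, sy, ex=None, ey=None):
    """Replace the segment with data; with no end position, insert at (sx, sy)."""
    if ex is None:
        ex, ey = sx, sy
    merged = lines[sy][:sx] + data + lines[ey][ex:]
    lines[sy:ey + 1] = merged.split("\r")   # returns in the data create new lines


lines = ["Hello world", "second line", "third"]
print(repr(read_segment(lines, 6, 0, 6, 1)))   # 'world\rsecond'
write_segment(lines, "big\rnew ", 6, 0)
print(lines)   # ['Hello big', 'new world', 'second line', 'third']
```

The insert grew the document by one line because the written data contained one return, which matches the note above about line endings in the data.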

6.6.8 Other Things to Note

Some other line based functions are as follows:

int     = GetLineSize ( handle hObject, int index, [boolean realized] );
boolean = IsBlankLine ( handle hObject, int index );
int     = MoveToNonBlankLine ( handle hMappedText, int index, [boolean backward] );
int     = NativeToRealized ( string reference | [handle hMappedText, int line], int position );
int     = RealizedToNative ( string reference | [handle hMappedText, int line], int position );

What the GetLineSize function does is likely obvious except for the realized flag. When set to TRUE, tabs are expanded to the default and then the size of the line is returned. The IsBlankLine and MoveToNonBlankLine functions help manage empty lines (which includes lines with only white space). Finally, the NativeToRealized and RealizedToNative functions calculate positions based on tabs within text.
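The distinction between native and realized positions can be illustrated with a short calculation. This Python sketch assumes tabs advance to the next multiple of a fixed tab size and uses 8 as a placeholder default; the actual default and expansion rule used by Legato are not stated here, and the lowercase function names are hypothetical models rather than the Legato functions.

```python
def native_to_realized(text: str, position: int, tab_size: int = 8) -> int:
    """Column reached after expanding tabs in the first `position` characters."""
    column = 0
    for ch in text[:position]:
        if ch == "\t":
            column += tab_size - (column % tab_size)   # jump to next tab stop
        else:
            column += 1
    return column


def realized_to_native(text: str, column: int, tab_size: int = 8) -> int:
    """First native position whose realized column reaches `column`."""
    for position in range(len(text) + 1):
        if native_to_realized(text, position, tab_size) >= column:
            return position
    return len(text)


line = "a\tbc\td"
print(native_to_realized(line, 2))    # 8
print(native_to_realized(line, 5))    # 16
print(realized_to_native(line, 9))    # 3
```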

As mentioned at the start of this section, MTOs support various kinds of meta data. Named text fields can be added or read using the SetObjectMetaData and GetObjectMetaData functions. While the utility of this type of meta data may seem limited when using an MTO to access a file, it can be invaluable when working with edit windows. Setting meta data allows scripts to be aware of other script actions or to carry information asynchronously.

Other meta data: 

dword  = GetMappedTextEncoding ( handle hMappedText );
string = GetMappedTextEncodingString ( handle hMappedText );
dword  = GetMappedTextFileType ( handle hMappedText );
string = GetMappedTextFilename ( handle hMappedText );

As a file is mapped, the GetMappedTextEncoding and GetMappedTextEncodingString functions can be used to determine if the source is Unicode and what type. The file type and name can also be retrieved.

6.6.9 Related Functions

Since many functions are shared across different objects, not all MTO functions are contained in this section.

Object Control

CreateMappedTextFile — Creates a file and then opens it as a Mapped Text Object.

CreateMappedTextString — Creates a Mapped Text Object based on a string.

GetMappedTextEncoding — Returns the encoding for a Mapped Text Object.

GetMappedTextEncodingString — Returns the encoding for a Mapped Text Object as a string.

GetMappedTextFilename — Returns the name of the file associated with a Mapped Text Object.

GetMappedTextFileType — Returns the file type code associated with a Mapped Text Object.

GetObjectMetaData — Gets a named meta data item from the specified object with optional title and section.

MappedTextExport — Exports a Mapped Text Object to the specified file.

MappedTextGetEditWindow — Retrieves an edit window handle associated with a Mapped Text Object.

MappedTextSave — Saves a Mapped Text Object (remaps to new saved file).

OpenMappedTextFile — Opens the specified file as a Mapped Text Object.

SetObjectMetaData — Sets a named meta data item to the specified object with optional title and section.

Line Level Data Access

DeleteLine — Deletes one or more lines from a Mapped Text Object.

GetAbsoluteFilePosition — Translates X and Y position to the absolute file position within a Mapped Text Object.

GetLineCount — Gets the number of lines in a Mapped Text or Edit Object.

GetLineSize — Gets the size of a line in a Mapped Text or Edit Object as native or optional realized bytes.

InsertLine — Inserts a line of text into a Mapped Text Object.

IsBlankLine — Tests a line within a Mapped Text Object for being blank (white space does not count as text).

MoveToNonBlankLine — Looks forward until a non-blank line is encountered within a Mapped Text Object.

ReadLine — Reads the next line of data from a Basic File or Pool Object, or, reads a specified line from a Mapped Text or Edit Object.

ReplaceLine — Replaces the text for the specified line in a Mapped Text Object.

Transaction Based Data Access

ReadSegment — Reads a segment of data from a Mapped Text or Edit Object.

WriteSegment — Writes a segment of string data to a Mapped Text or Edit Object at an optional specified position.

Tab Characters and Processing

NativeToRealized — Translates an X position for a string or a line within a Mapped Text Object to a realized position.

RealizedToNative — Translates an X position for a string or line within a Mapped Text Object to a native position.