The ExplodeString function allows a single string to be cut into smaller pieces and placed into an array. Conversely, the ImplodeArray function allows an array to be assembled into a single string. Both functions use what is commonly referred to as ‘glue’ (a delimiter) to perform the disassembly and reassembly.
This blog post will cover using these functions, and will provide a real world example of their utility. It should be noted that as with many programming tasks, there are multiple ways to approach a problem, each with their respective pros and cons. This article is biased to using strings and arrays.
Strings To and From Files
Let’s begin by loading data into a string variable. Many times data is either collected from a file or an internet source. Two functions directly copy the contents of a file into or out of a string:
string = FileToString ( string name );
and
int = StringToFile ( string data, string name );
In both functions, the name parameter is a fully qualified filename. In the case of the FileToString function, the name could also be an HTTP URL. This is not so for the StringToFile function because HTTP does not support a direct file write operation.
The FileToString function will return an empty string on error but it could also return an empty string if the source file is empty or contains binary data starting with a zero. The IsError and GetLastError functions can be used to determine if an error occurred and the nature of the error. The source file is opened using the lowest privileges and sharing possible and is immediately closed after the data has been copied. The resulting string can be large. In the example provided, we are opening an EDGAR archive index which is in excess of 30mb. For extremely large files, other methods, such as the OpenMappedTextFile function, would be more appropriate.
On the other hand, the StringToFile function writes the string data verbatim to a specified file. If the file already exists, it is overwritten. The function returns a formatted error code, so it is not necessary to call the GetLastError function. The IsError function is fine to check the function’s return value for an error.
Translating Strings to Arrays
Once we have loaded data into the string using the FileToString function, let’s suppose that we want to access that data in sections, and those sections of text are delimited by line endings. Exploding a string takes the content of the string and finds segments by a delimiter, which are in turn returned as a list or single-dimension array.
string[] = ExplodeString ( string data, [string delimiter], [int limit] );
The data parameter would typically be from a source, such as the returned string from the FileToString function as described above. If the data is to be split by lines, not specifying the delimiter parameter causes any type of line ending to be used as a delimiter.
Other delimiters can also be used, for example:
string s1;
string items[];
int ix, size;
s1 = "The quick brown fox jumped over the lazy dog's back";
items = ExplodeString(s1, " ");
size = ArrayGetAxisDepth(items);
for (ix = 0; ix < size; ix++) {
AddMessage("%d - '%s'", ix, items[ix]);
}
creates the following in the default log:
0 - 'The'
1 - 'quick'
2 - 'brown'
3 - 'fox'
4 - 'jumped'
5 - 'over'
6 - 'the'
7 - 'lazy'
8 - 'dog's'
9 - 'back'
As a side note, the ExplodeString function is not suitable to parse CSV data since there are multiple methods of delimiting and escaping the data. Finally, the ExplodeString function has an optional parameter called limit. The limit parameter allows two different methods of capturing only part of the data. In one mode, the limit will simply add the requested number of items. In the other mode, the function will add the items and place the unprocessed data after the limit as the last item in the array.
On the flipside, the ImplodeArray function will glue an array together into a string. If we add the code below to end of the above script:
s1 = ImplodeArray(items, "+");
AddMessage(s1);
and run the script again, the following will be added to the log:
The+quick+brown+fox+jumped+over+the+lazy+dog's+back
The function prototype is as follows:
string = ImplodeArray ( string[] array, [string glue] );
The glue is optional and, if omitted, will be set to “\r\n” as conventional line endings. In the above example, we used a ‘+’ symbol. It is worth noting that the function does not encode source data to avoid conflicts with the glue string. So take care in the choice of glue and the content of the array.
Translating a Formatted String
Another useful function that takes a string and makes an array from the contents is the FormattedTextToArray function. This function grabs multiple segments and dumps the result into a returned list. It assumes that the source data is a formatted text string with fixed field positions.
Field positions are supplied as a list of integers. The function prototype is as follows:
string[] = FormattedTextToArray ( string data, int[] positions );
The data parameter should not contain control characters, such as tabs and line endings. If the source contains tab characters, they should be expanded using the ExpandTabs function. Fields are identified by the positions array, which should contain integers identifying the ending position of each segment. The first position starts at index 0. As an example similar to the one above, we can read formatted data:
string s1;
string items[];
int pos[];
int ix, size;
s1 = "Peanut Butter 167cal 15g 6g 7g";
pos[0] = 26; pos[1] = 33; pos[2] = 42; pos[3] = 52; pos[4] = 0;
items = FormattedTextToArray(s1, pos);
size = ArrayGetAxisDepth(items);
for (ix = 0; ix < size; ix++) {
AddMessage("%d - '%s'", ix, items[ix]);
}
and send it to the log:
0 - 'Peanut Butter'
1 - ' 167cal'
2 - ' 15g'
3 - ' 6g'
4 - ' 7g'
Note that the resulting data is trimmed for trailing spaces but not leading spaces. (After Legato version 4.16a, both leading and trailing spaces are trimmed.) The last position can be set to some value beyond the end of the longest line or zero to indicate to go to the maximum left position. As we will see later, this function can be useful for processing large lists of data. It could also be used to parse an EDGAR ASCII table that has been marked with EDGAR table tags.
Putting Exploded and Formatted Strings Together
Below is a script that will download a quarterly EDGAR archive index, parse through the lines, create a CSV file, and then open the file in Data View.
// -- Variables
string s1;
string lines[];
string items[];
string table[][];
int pos[20];
int ix, size;
int cx, cols;
int rx;
int rc;
// -- Initialize
ProgressOpen("Get SEC Quarterly Index", 0);
pos[0] = 62; table[0][0] = "Company Name";
pos[1] = 74; table[0][1] = "Form Type";
pos[2] = 86; table[0][2] = "CIK";
pos[3] = 98; table[0][3] = "Date Filed";
pos[4] = 0; table[0][4] = "File Name";
rx = 1;
// -- Get the Archive Data
ProgressSetStatus("Downloading IDX File");
s1 = FileToString("https://www.sec.gov/Archives/edgar/full-index/2016/QTR3/company.idx");
if (IsError()) {
rc = GetLastError();
MessageBox('x', "Error loading idx file (%08X) ", rc);
exit;
}
lines = ExplodeString(s1);
size = ArrayGetAxisDepth(lines);
// -- Line loop (start at data)
ProgressSetStatus("Translating");
ix = 10;
while (ix < size) {
ProgressUpdate(ix, size);
items = FormattedTextToArray(lines[ix], pos);
table[rx][0] = items[0];
table[rx][1] = items[1];
table[rx][2] = items[2];
table[rx][3] = items[3];
table[rx][4] = items[4];
ix++; rx++;
}
// -- Write Out File
ProgressSetStatus("Write File");
s1 = GetTempFile(); // (change file as desired)
CSVWriteTable(table, s1); // (will be deleted on app reopen)
ProgressClose();
DataViewOpenFile(s1);
This script can essentially be broken into four parts: (i) download data from the SEC into a variable; (ii) explode it into an array; (iii) translate each line into table entries; and, (iv) write the data to CSV and open it in Data View. Overall run time will be from 30 to 60 seconds (or more) depending on your internet connection speed and the speed of your computer. The index file can be downloaded locally and the script can point to the download while experimenting to avoid repeatedly downloading the 30mb index file.
Positions and table headings are set in the beginning section of the script (marked in the comments as “Initialize”). They are coded side by side to illustrate the position and column relationships. You will also note that a progress window has been added to show the user how things are moving along. (I hate programs that present the spinning wheel of death with no information as to what is happening.)
The FileToString function points directly to the SEC URL for the index to download. The code checks to see if the download succeeded and then continues to convert the string to an array with the ExplodeString function.
Now that we have an array of data in the lines variable, we can start processing the body of the index data. In this case, we assume that the data starts at row 10. A more sophisticated approach could be taken to locate the index body, which would be preferable should the SEC change the format of the data. This is also true of the column positions.
The FormattedTextToArray function is called on each line of the data from the lines variable. Note that the intermediate variable items is used to translate into the table. (Legato does not presently support partial array transfers such as having the table row equal the function.)
Finally, upon completion, the entire table variable is written to a temporary file using the CSVWriteTable function and the file is then opened in Data View using the DataViewOpenFile function. Note that the temporary file will be deleted by the application the next time it starts up as part of its house keeping. Obviously the resulting file could be written to any location or even a location selected by the user location via the BrowseSaveFile function.
Conclusion
In next week’s blog, we will continue our discussion with parameters and arrays.