HTML is the primary document type submitted to the SEC now. Every filing needs to have a primary document, and that primary document is almost always an HTML file. Because EDGAR HTML is a very specific type of HTML, you need special tools to create and convert these documents. Software packages like GoFiler exist to make this process as easy as possible, but in the end it still involves some level of user interaction to convert the document from a source file, make sure it converted alright, fix any issues, proof it, and either submit it or send it back to the client. Well, using Legato, we can cut a few steps out of that by making an HTML template that’s filled in with data populated from a data file!
Friday, September 08. 2017
LDC #51: Streamlining Document Creation with Templates
Templates are not a new concept, but it does require a bit of modification and tweaking to get it to work exactly right. This week we’ll be looking at how to make a template version of an SC 13D filing. The work flow for the user making an SC 13D filing would look ideally like this:
1) | User opens a template CSV file in Excel and fills out some data fields. This is just raw data so therefore has no formatting. | |
2) | User opens GoFiler, chooses File->New, selects the EDGAR HTML using Form Template option, and picks “Script 13D”. | |
3) | GoFiler prompts the user for the data file created in step 1. | |
4) | GoFiler then takes the user data from the CSV file, puts it into a template file, and opens the finished product. |
This task is going to require several files to be added or modified. All of these files are available for download here. A brief rundown of all the files we’re going to need to modify is:
File | File Name | Description | Install Location | |
1) | BuildTemplate.ls | The file called by the update menu function, which builds the completed template. | Scripts/Extensions | |
2) | ForceFieldUpdate.ms | The script hooked on post process of FILE_NEW. If the file contains the text “%%%UPDATE_ME%%%” then this script runs the update menu function. | Scripts/Extensions | |
3) | Script13DWrapper.htm | The template file containing the UPDATE_ME directive, which triggers ForceFieldUpdate.ms, which in turn calls BuildTemplate.ls to open ScriptTemplate13D.htm. | Templates | |
4) | ScriptTemplate13D.htm | The main template file that looks like our SC13D filing. It contains all the wrapped directives that will be replaced with data by our script. | Templates | |
5) | userhtml.csv | File that needs to be modified in order to add “Script 13D” to the top of GoFiler’s New EDGAR HTML Form with Template menu. | Templates | |
6) | template.csv | The actual data used in building our file. This is a sample of what the file may look like after a user fills it out. The user is prompted for this file. | — |
The biggest advantage using a template has over the normal process is that the user doesn’t really need to know anything about HTML, GoFiler, or EDGAR. All they need to know is how to enter data into a spreadsheet using Excel. Of course, there are a few things going on behind the scenes to make this all happen, and some things need to be set up for this to work. First, we need to add a new custom template onto the “EDGAR HTML using Form Template” menu. The steps to add a template are covered in a blog post here. The file userhtml.csv that needs to be modified is described in more detail in that post. The template we’re adding is going to be called “Script13DWrapper.htm”. This is a very basic template file with the contents below:
<!-- Field: Include-Text; File:c:\Program Files (x86)\GoFiler Complete\Scripts\Extensions\BuildTemplate.ls; Date: 2017%2D02%2D16T12:29:24; Size: 1628; Options: AllowEdit; Name: build_template --> %%%UPDATE_ME%%% This is going to be replaced on file open, and cannot be edited. <!-- Field: /Include-Text -->
This is an HTML file that contains nothing except an Include-Text field, naming a script file as the file to include. This script file will be named “BuildTemplate.ls” and will need to be in the GoFiler installation directory at the specified location for it to work. When a user updates the fields on this document, the script file will be called, and it can build the composed template file to return back to the user. The contents “%%%UPDATE_ME%%%” of the field are going to be used as a directive because we want the update to trigger automatically instead of requiring the user be required to push a button. To achieve this, we’re going to need a script named “ForceFieldUpdate.ms”. This script file will run on the post-process of the FILE_NEW action, and will check the contents of the file. If the file is HTML and contains the text “%%%UPDATE_ME%%%” on the expected line, then it will run the function DOCUMENT_FIELD_UPDATE, which will force the update of the fields. This will in turn automatically build our file.
This may seem complex, but in actuality it’s not very difficult to step through. The contents of the “ForceFieldUpdate.ms” script are:
#define TEMPLATE_DIRECTIVE_LINE 2 #define TEMPLATE_DIRECTIVE "%%%UPDATE_ME%%%" int run(int f_id, string mode); int setup(){ string script; script = GetScriptFilename(); MenuSetHook("FILE_NEW",script,"run"); return ERROR_NONE; } int run(int f_id, string mode){ handle hWindow; handle mText; dword w_type; string line; if(mode=="postprocess" && FindInString(mode,"error")<0){ hWindow = GetActiveEditWindow(); w_type = GetEditWindowType(hWindow); w_type &= EDX_TYPE_ID_MASK; if (w_type != EDX_TYPE_PSG_PAGE_VIEW){ return ERROR_NONE; } mText = GetMappedTextObject(hWindow); if (GetLineCount(mText)<TEMPLATE_DIRECTIVE_LINE){ return ERROR_NONE; } line = ReadLine(mText,TEMPLATE_DIRECTIVE_LINE-1); if (FindInString(line,TEMPLATE_DIRECTIVE)>(-1)){ RunMenuFunction("DOCUMENT_FIELD_UPDATE"); } } return ERROR_NONE; } int main(){ setup(); return ERROR_NONE; }
This is a very simple script file that contains three small functions. The two defined values specify the line number and text in the HTML file that represents the command directive. The setup and main functions are almost identical to other scripts in past blog posts. They simply hook the script into the “FILE_NEW” menu item, which is what is called to create a new template file. The run function first checks to make sure it’s running post-process mode and that there was no error in the creation process. If there was an error, or it’s preprocess, the function returns. Next, we get a handle to the active edit window with the GetActiveEditWindow function, and we check the file type with the GetEditWindowType function. If it’s not EDX_TYPE_PSG_PAGE_VIEW, again the function returns. We should only operate on new HTML files in Page View.
Lastly, the run function gets the Mapped Text Object for the open window and checks the line count. If it’s less than TEMPLATE_DIRECTIVE_LINE, it returns, because there aren’t enough lines in the file to check. Otherwise, it checks the line to see if it has our specified directive. If so, the function runs our DOCUMENT_FIELD_UPDATE function, which starts the template script “BuildTemplate.ls”.
The file “BuildTemplate.ls” does all of our heavy lifting here. It actually reads the CSV data file and replaces things in our 13D template file. This script looks like this:
#define CHECKED_BOX "☒" #define UNCHECKED_BOX "☐" #define TEMPLATE_NAME "ScriptTemplate13D.htm" #define DIRECTIVE_WRAP "%%%" #define ERR_OUT "<P>Error creating template, cannot build 13D from CSV.</P>" #define DATA_ROWS 31 string build_template(); boolean is_checkbox(string field); string build_template() { int rc; int ix; string data[][]; string item; string directive; string checkbox; string csv, file, template, output; csv = BrowseOpenFile("Select CSV Data File","*.csv|*.csv"); rc = GetLastError(); if (rc==ERROR_CANCEL){ return ERR_OUT; } if (IsFile(csv)==false || FindInString(csv,".csv")<0){ MessageBox('x',"Invalid File Selected, try a different file. Error:%0x",rc); return ERR_OUT; } data = CSVReadTable(csv); rc = GetLastError(); if (IsError(rc)){ MessageBox('x',"Cannot read selected file. Error: %0x",rc); return ERR_OUT; } if (ArrayGetAxisDepth(data)!=DATA_ROWS){ MessageBox('x',"The selected CSV file is not properly formatted."); return ERR_OUT; } template = GetApplicationExecuteFolder(); template = AddPaths(template,"Templates\\"+ TEMPLATE_NAME); output = FileToString(template); rc = GetLastError(); if (IsError(rc)){ MessageBox('x',"Cannot read file %s. Error: %0x",template,rc); return ERR_OUT; } for (ix = 0; ix<DATA_ROWS; ix++){ directive = data[ix][0]; item = ANSITextToXML(data[ix][1]); if (is_checkbox(directive)){ if (MakeLowerCase(item) == "x"){ item = CHECKED_BOX; } else{ item = UNCHECKED_BOX; } } directive = DIRECTIVE_WRAP+directive+DIRECTIVE_WRAP; output = ReplaceInString(output, directive, item); } return output; } boolean is_checkbox(string field){ int ix; string checkboxes[]; int size; checkboxes[0] = "COVERCHECKBOX"; checkboxes[1] = "TABLE2A"; checkboxes[2] = "TABLE2B"; checkboxes[3] = "TABLE5"; checkboxes[4] = "TABLE12"; size = ArrayGetAxisDepth(checkboxes); for (ix=0;ix<size;ix++){ if (checkboxes[ix]==field){ return true; } } return false; }
First we have a couple of defined values. CHECKED_BOX and UNCHECKED_BOX are character entities at are going to be used for checked and unchecked checkboxes, respectively. TEMPLATE_NAME is the name of the actual 13D Template file into which we’re going to replace values. It must be located in the Templates folder of the GoFiler installation directory for this script to work. DIRECTIVE_WRAP defines the characters that are prefixed and suffixed to the name of a directive in the template file. This ensures that we’re replacing only things we want to replace. If, for example, you have a directive named “Date”, you can’t do a find/replace on the word “Date” and expect things to work out very well. So, if we want to replace something, we wrap it with DIRECTIVE_WRAP to ensure it cannot possibly be in the file for any reason besides being replaced by our script.
ERR_OUT defines a paragraph that is returned if the template for some reason cannot be successfully filled out. DATA_ROWS is the number of data rows we’re expecting in our CSV file. If the file has a different number of rows, then we can’t use it.
#define CHECKED_BOX "☒" #define UNCHECKED_BOX "☐" #define TEMPLATE_NAME "ScriptTemplate13D.htm" #define DIRECTIVE_WRAP "%%%" #define ERR_OUT "<P>Error creating template, cannot build 13D from CSV.</P>" #define DATA_ROWS 31
This script is very simple and only has two functions. The smaller one, is_checkbox, is designed to be quickly alterable if we need to add or remove checkboxes. It defines an array named checkboxes with no predefined size. Then it sets string values into it to represent the names of the fields that are going to be checkboxes in our template. It then takes a string parameter passed to it (field), and checks to see if any of the strings in the array match it. If so, then it returns true, because it’s a checkbox. If not, it returns false.
boolean is_checkbox(string field){ int ix; string checkboxes[]; int size; checkboxes[0] = "COVERCHECKBOX"; checkboxes[1] = "TABLE2A"; checkboxes[2] = "TABLE2B"; checkboxes[3] = "TABLE5"; checkboxes[4] = "TABLE12"; size = ArrayGetAxisDepth(checkboxes); for (ix=0;ix<size;ix++){ if (checkboxes[ix]==field){ return true; } } return false; }
The main function is called build_template. When the update is triggered by our ForceFieldUpdate script, this is the function that is called. It is responsible for building our template and returning it as a string, which the update will place into our template wrapper. The first thing it needs to do is use the BrowseOpenFile function to prompt the user to enter a CSV data file. If the operation is cancelled, it returns our ERR_OUT string. If the file is not a valid file or it doesn’t end in a .CSV extension, it displays a message box and then returns ERR_OUT. Now that we have a valid filename, we can try to read the file into a CSV data table with the CSVReadTable function. If that fails, we display an error, and return ERR_OUT. If the file is read to a data structure successfully but it has the wrong number of rows, then we display a different error message and return ERR_OUT.
Once we reach this point, we can try to open our template file we’re going to use as a base for this new document. We get the path where the template should be with the GetApplicationExecuteFolder function. We add the “Templates” folder to that, as well as TEMPLATE_NAME (which is our SC 13D template). Now we try to open the template with the FileToString function. If that fails, it means the template doesn’t exist or the script encountered some other problem, so we can again display an error and return ERR_OUT. If we have the data and we have the template, then we can start actually building our file.
string build_template() { int rc; int ix; string data[][]; string item; string directive; string checkbox; string csv, file, template, output; csv = BrowseOpenFile("Select CSV Data File","*.csv|*.csv"); rc = GetLastError(); if (rc==ERROR_CANCEL){ return ERR_OUT; } if (IsFile(csv)==false || FindInString(csv,".csv")<0){ MessageBox('x',"Invalid File Selected, try a different file. Error:%0x",rc); return ERR_OUT; } data = CSVReadTable(csv); rc = GetLastError(); if (IsError(rc)){ MessageBox('x',"Cannot read selected file. Error: %0x",rc); return ERR_OUT; } if (ArrayGetAxisDepth(data)!=DATA_ROWS){ MessageBox('x',"The selected CSV file is not properly formatted."); return ERR_OUT; } template = GetApplicationExecuteFolder(); template = AddPaths(template,"Templates\\"+ TEMPLATE_NAME); output = FileToString(template); rc = GetLastError(); if (IsError(rc)){ MessageBox('x',"Cannot read file %s. Error: %0x",template,rc); return ERR_OUT; }
For each item in our data table, we want to read the directive we’re replacing from the data file. Then we want to get the new text, using the ANSITextToXML function to convert any characters (like ‘&’) to their character entities (like ‘&’) so they are encoded properly. If the directive we’re replacing is a checkbox, we test if it’s an ‘X’ on the data sheet, and if so the item to use in the template will be a CHECKED_BOX character entity. Otherwise, it will be an UNCHECKED_BOX character entity. Then we wrap the directive we’re replacing with the DIRECTIVE_WRAP define value, so the value matches the value in the template file. After that, we can just run the ReplaceInString function to replace all instances of the directive with the new item. Once all items have been handled, we can return the output, which is then displayed on the screen.
for (ix = 0; ix<DATA_ROWS; ix++){ directive = data[ix][0]; item = ANSITextToXML(data[ix][1]); if (is_checkbox(directive)){ if (MakeLowerCase(item) == "x"){ item = CHECKED_BOX; } else{ item = UNCHECKED_BOX; } } directive = DIRECTIVE_WRAP+directive+DIRECTIVE_WRAP; output = ReplaceInString(output, directive, item); } return output; }
This script could be improved even further. Right now, it only works for single table SC 13D filings, though some have multiple tables of information. The CSV format could be modified to allow for multiple tables, and the BuildTemplate.ls file could handle those multiple tables by just reading a separate template file and inserting multiple tables instead of just replacing the values in a single one. It could also be modified to allow for multiple signatures or to work for other forms like SC 13G or really anything that has a standardized template. There’s a lot that could be done to extend scripts like this, so this example is only the tip of the iceberg.
Steven Horowitz has been working for Novaworks for over five years as a technical expert with a focus on EDGAR HTML and XBRL. Since the creation of the Legato language in 2015, Steven has been developing scripts to improve the GoFiler user experience. He is currently working toward a Bachelor of Sciences in Software Engineering at RIT and MCC. |
Additional Resources
Legato Script Developers LinkedIn Group
Primer: An Introduction to Legato