When using GoFiler to do filings that have series and classes in them, using the library function is a lot faster than manually adding each series and each class one at a time. The GoFiler series/class library also has the ability to look up a single CIK at a time and download the series and class information associated with it from the SEC. But what if you have a large list of CIKs, and want to add all of them to the series and classes library at once? Well, using Legato, we can run a lookup on each CIK in the input list and, with the response, build a library file which we can then import into GoFiler.
Friday, November 03. 2017
LDC #58: Bulk Importing CIKs into a Series and Classes Library
This particular script relies on the SDK function EDGARLookupCIK to retrieve information from the SEC. In writing this, we discovered that some CIKs on the EDGAR system cause a syntax error when being parsed. That’s going to be corrected in GoFiler 4.20b, which is scheduled for release on 11/20/2017. Until then, this script can still be used, but it may fail to load some CIK values.
Scripts like this are generally run once and then are not needed again. Also our example script is much simpler than most other scripts because it’s designed to be run from within the IDE only. This means we don’t really need to define function names, hook things into GoFiler’s menus, etc. Note that if you find you are making use of this script (or a script like it) more often, this functionality can be added.
Our script will take an input data file, which must be in CSV format, and add new columns to it to make it into a series/class library for importing into GoFiler. This example assumes that you have a data file with three columns: Trust Name, CIK, and CCC. They will need to be in that order as well. The CCC can be left blank; that simply means the library won’t have the CCC for that trust in it. A sample data file might have the contents:
Trust Name, CIK, CCC Steven's Trust Company, 0000123456, fakec1k@ Steven's Other Trust, 0000654321, b@dcik11
Sample script:
/********************************************************************************************************/ // buildserieslibrary.ls // // Date: 10-31-2017 // // Purpose: Reads a CSV file to an array, looks up all CIKs in file, and writes an output file with // information about every CIK in the file. // // Author: Steven Horowitz /********************************************************************************************************/ int num_entries; /* number of entries to look up */ string s_id; /* series ID */ string c_id; /* class ID */ string filename; /* name of data file */ string output_filename; /* name of output file */ string response[]; /* response from SEC */ string CIK; /* CIK looking up */ string data_table[][]; /* max lines in data file is 1000 */ string output_table[][]; /* output table */ string series_list[]; /* list of series in a cik */ string class_list[]; /* list of classes in a series */ string class_key; /* key for class */ string FYE; /* fiscal year end */ int ox; /* output counter */ int ix; /* counter */ int sx,cx; /* series and class counter */ int ssize,csize; /* series and class array sizes */ int rc; /* response code */ int o_size; /* ciks added to output */ /* */ filename = BrowseOpenFile("Pick CSV of CIKs"); /* choose file to open */ if (IsError()){ /* if error */ return; /* return */ } /* */ output_filename = BrowseSaveFile("Output File"); /* set output path */ if (IsError()){ /* if error */ return; /* return */ } /* */ data_table = CSVReadTable(filename); /* read file to CSV */ output_table[0][0]="Trust Name"; output_table[0][1]="CIK"; output_table[0][2]="CCC"; output_table[0][3]="SID"; /* set column headings */ output_table[0][4]="CID"; /* set column headings */ output_table[0][5]="Series Name"; /* set column headings */ output_table[0][6]="Class Name"; /* set column headings */ output_table[0][7]="FYE"; /* set column headings */ output_table[0][8]="Ticker"; /* set column headings */ ix=1; /* start on first row */ ox = 1; /* start on first output row */ num_entries = ArrayGetAxisDepth(data_table); /* get num entries in the CSV file */ ProgressOpen("Looking up CIK's"); /* open progress bar */ while (ix<num_entries){ /* while we have rows left to go */ ProgressSetStatus("Entry %d of %d",ix,num_entries); /* set status of progress */ ProgressUpdate(ix,num_entries); /* update progress bar */ ArrayClear(response); /* clear out response */ CIK = data_table[ix][1]; /* get CIK from Data table */ if (CIK!=""){ /* if CIK is not blank */ AddMessage("Looking up CIK: %s",CIK); /* add logg message */ LogIndent(); /* indent the log */ response = EDGARLookupCIK(CIK); /* run the lookup */ rc = GetLastError(); /* get error */ if (IsError(rc)){ /* if we couldn't find the CIK */ LogSetMessageType(LOG_ERROR); /* set message type to error */ AddMessage("CIK %s error: %0x",CIK,rc); /* display error message */ LogSetMessageType(LOG_NONE); /* set message type to none */ } /* */ FYE = response["FYE"]; /* get FYE */ if (IsRegexMatch(FYE,"\\d{4}")){ /* if FYE is 4 digits */ FYE=GetStringSegment(FYE,0,2)+"/"+GetStringSegment(FYE,2);/* add slash to FYE */ } /* */ series_list = ExplodeString(response["SeriesList"]," "); /* explode series list into array */ ssize = ArrayGetAxisDepth(series_list); /* get size of series */ if (ssize==0){ /* if no series */ LogSetMessageType(LOG_WARNING); /* set message type */ AddMessage(" No Series Found"); /* no series found */ LogSetMessageType(LOG_NONE); /* set message type */ } /* */ else{ /* if we have at least 1 entry */ o_size ++; /* increment output counter */ } /* */ for(sx=0;sx<ssize;sx++){ /* for each series */ s_id = series_list[sx]; /* get series id */ class_list = ExplodeString(response[s_id+"_Classes"]," ");/* get list of classes */ csize = ArrayGetAxisDepth(class_list); /* get size of class list */ for(cx=0;cx<csize;cx++){ /* for each class */ c_id = class_list[cx]; /* get the series id */ class_key = s_id+"_"+c_id; /* basic key for class */ output_table[ox][0] = data_table[ix][0]; /* set name for row */ output_table[ox][1] = data_table[ix][1]; /* set cik for row */ output_table[ox][2] = data_table[ix][2]; /* set ccc for row */ output_table[ox][3] = s_id; /* series ID */ output_table[ox][4] = c_id; /* class ID */ output_table[ox][5] = response[s_id+"_Name"]; /* get series name */ output_table[ox][6] = response[class_key+"_Name"]; /* get class name */ output_table[ox][7] = FYE; /* set fiscal year end */ output_table[ox][8] = response[class_key+"_Ticker"]; /* get ticker symbol */ ox++; /* increment output counter */ } /* */ } /* */ LogOutdent(); /* */ } ix++; /* increment counter */ } CSVWriteTable(output_table,output_filename); /* write output */ AddMessage("Added %d CIKs to the output file.",o_size); /* log output size */ ProgressClose();
The first thing our script needs to do is ask the user for a file to open. We can use the BrowseOpenFile function to do this. We should check the result as well using the IsError function, because if the user cancelled the operation or there was some other error we want the script to stop executing right now. We can do the same thing to get the location in which the file should be saved by using the BrowseSaveFile function. Then, with the CSVReadTable function, we can read the contents of the table into our table variable data_table.
/* */ filename = BrowseOpenFile("Pick CSV of CIKs"); /* choose file to open */ if (IsError()){ /* if error */ return; /* return */ } /* */ output_filename = BrowseSaveFile("Output File"); /* set output path */ if (IsError()){ /* if error */ return; /* return */ } /* */ data_table = CSVReadTable(filename); /* read file to CSV */
Next we need to set headings up on our output_table variable. Once the headings are set up in the first row, we can initialize our input counter, ix, and our output counter, ox, to start on row 1. Please note as stated above the CSV file has a specific column order and also includes a heading row. After that, we can retrieve the number of entries in our data table by using the ArrayGetAxisDepth function, open up a progress window with the ProgressOpen function, and loop through all of our input entries with a while loop.
output_table[0][0]="Trust Name"; output_table[0][1]="CIK"; output_table[0][2]="CCC"; output_table[0][3]="SID"; /* set column headings */ output_table[0][4]="CID"; /* set column headings */ output_table[0][5]="Series Name"; /* set column headings */ output_table[0][6]="Class Name"; /* set column headings */ output_table[0][7]="FYE"; /* set column headings */ output_table[0][8]="Ticker"; /* set column headings */ ix=1; /* start on first row */ ox = 1; /* start on first output row */ num_entries = ArrayGetAxisDepth(data_table); /* get num entrires in the CSV file */ ProgressOpen("Looking up CIK's"); /* open progress bar */ while (ix<num_entries){ /* while we have rows left to go */
Before each entry, we want to use the ProgressSetsStatus and ProgressUpdate functions to update our progress array. The ArrayClear function clears the last response back from the SEC. If our next CIK in the list isn’t blank, we can run the EDGARLookupCIK function to look the CIK up on the SEC’s EDGAR system. If we got an error, we need to log it. Otherwise, we need to obtain the FYE from the response. If it’s just four digits, we should add a front slash to make it acceptable to the GoFiler library format. The response from EDGAR is structured in such a way that the series ID and class ID will be used as keys in the response array. So if there is series S000000001 and class C000000001, the response array will have a value with the key “S000000001_C000000001_Name” which is the name of the class. It will also have a value with the key “S000000001_Name” with the name of the series. To make it easier to find things, there’s also a “SeriesList” value that contains a list of all the series in this CIK, separated by blank spaces. This lets us use the ExplodeString function to get an array of all series, on which we can then use the ArrayGetAxisDepth function to get the number of series associated with this CIK.
ProgressSetStatus("Entry %d of %d",ix,num_entries); /* set status of progress */ ProgressUpdate(ix,num_entries); /* update progress bar */ ArrayClear(response); /* clear out response */ CIK = data_table[ix][1]; /* get CIK from Data table */ if (CIK!=""){ /* if CIK is not blank */ AddMessage("Looking up CIK: %s",CIK); /* add logg message */ LogIndent(); /* indent the log */ response = EDGARLookupCIK(CIK); /* run the lookup */ rc = GetLastError(); /* get error */ if (IsError(rc)){ /* if we couldn't find the CIK */ LogSetMessageType(LOG_ERROR); /* set message type to error */ AddMessage("CIK %s error: %0x",CIK,rc); /* display error message */ LogSetMessageType(LOG_NONE); /* set message type to none */ } /* */ FYE = response["FYE"]; /* get FYE */ if (IsRegexMatch(FYE,"\\d{4}")){ /* if FYE is 4 digits */ FYE=GetStringSegment(FYE,0,2)+"/"+GetStringSegment(FYE,2);/* add slash to FYE */ } /* */ series_list = ExplodeString(response["SeriesList"]," "); /* explode series list into array */ ssize = ArrayGetAxisDepth(series_list); /* get size of series */
Let’s go back to our loop. If we have no series for this CIK, we need to log that. Otherwise, we can increment the o_size counter, which is the output size, or number of CIKs included in our output file. Then for each series, we can get the series ID and explode the class list for each series just like we exploded the series list. This is always going to be in the response array under a key made by appending “_Classes” to the series ID. Once we have that, we can iterate over every class with another for loop. Every series on the EDGAR system will always have at least one class.
if (ssize==0){ /* if no series */ LogSetMessageType(LOG_WARNING); /* set message type */ AddMessage(" No Series Found"); /* no series found */ LogSetMessageType(LOG_NONE); /* set message type */ } /* */ else{ /* if we have at least 1 entry */ o_size ++; /* increment output counter */ } /* */ for(sx=0;sx<ssize;sx++){ /* for each series */ s_id = series_list[sx]; /* get series id */ class_list = ExplodeString(response[s_id+"_Classes"]," ");/* get list of classes */ csize = ArrayGetAxisDepth(class_list); /* get size of class list */ for(cx=0;cx<csize;cx++){ /* for each class */
For each class, we can get the class ID, build a key value for our series, and use that to fill in our output table. Then we can increment the row counter of our output table, outdent the log back to normal, and increment the input counter so we know we’re on the next row next time the while loop goes around.
c_id = class_list[cx]; /* get the series id */ class_key = s_id+"_"+c_id; /* basic key for class */ output_table[ox][0] = data_table[ix][0]; /* set name for row */ output_table[ox][1] = data_table[ix][1]; /* set cik for row */ output_table[ox][2] = data_table[ix][2]; /* set ccc for row */ output_table[ox][3] = s_id; /* series ID */ output_table[ox][4] = c_id; /* class ID */ output_table[ox][5] = response[s_id+"_Name"]; /* get series name */ output_table[ox][6] = response[class_key+"_Name"]; /* get class name */ output_table[ox][7] = FYE; /* set fiscal year end */ output_table[ox][8] = response[class_key+"_Ticker"]; /* get ticker symbol */ ox++; /* increment output counter */ } /* */ } /* */ LogOutdent(); /* */ } ix++; /* increment counter */ }
Finally, we can write the file out, add a message to the log, and close our progress bar.
CSVWriteTable(output_table,output_filename); /* write output */ AddMessage("Added %d CIKs to the output file.",o_size); /* log output size */ ProgressClose();
Legato is very handy for quick, one off scripts like this. If a series/class library is lost or corrupted, you can easily rebuild it with a master list of the CIK/CCC codes you need. The information could have been added back in one entry at a time, but it’s a lot faster when you can write a script to automate the process.
Steven Horowitz has been working for Novaworks for over five years as a technical expert with a focus on EDGAR HTML and XBRL. Since the creation of the Legato language in 2015, Steven has been developing scripts to improve the GoFiler user experience. He is currently working toward a Bachelor of Sciences in Software Engineering at RIT and MCC. |
Additional Resources
Legato Script Developers LinkedIn Group
Primer: An Introduction to Legato