Friday, May 05. 2017
LDC #33: XBRL Merger, Part 3
Last week we extended our XBRL Merger script so it could export files and compare the exported files to see whether they could be merged. This week, we’ll take it a step further by reading the XBRL instance files into data structures, adding some debug information, and adding a progress bar to the run function.
This week also introduces the concept of debug print messages. Often when writing a script, you will encounter an issue where you’re not sure what the actual value of a variable is. Depending on the integrated development environment (IDE) being used to write the code, you can sometimes step through the code with breakpoints and inspect the value of each variable. While we’re going to be adding that feature to GoFiler’s Legato IDE, for now we find it helpful to insert function calls that print variables to a console. By routing these through one special function, you can show or hide all debug information very easily. See the section on the debug_message function’s code below for more.
The new script is below, but here is a quick summary of changes from last week:
1) New global variables and defines have been added.
2) The run function now compares file sizes to find the bigger instance file, has a progress bar, and has some debug information.
3) The debug_message function has been added. It displays some debug information we found helpful while figuring out why things weren’t working.
4) The print_tables function has been added. We added it to make sure the data was being read in correctly. It is a debug function that has no bearing on the actual functionality of the script.
5) The read_file_contents function has been added. This is the main function added this week. It reads the instance file and maps every context, unit, and fact to a data structure in our script.
#include "XBRLMerger.rc"

#define EXPORT_FOLDER_PREFIX "Merge"
#define CONTEXT_START "<xbrli:context"
#define CONTEXT_END "</xbrli:context>"
#define UNIT_START "<xbrli:unit"
#define UNIT_END "</xbrli:unit>"
#define FACT_IDENTIFIER "contextRef"
#define ERROR_WARN (ERROR_SOFT | 0x00005555)
#define FILE_ONE 1
#define FILE_TWO 2
#define OTHER 999
#define DEBUG true
#define DEBUG_MSG_MAX 150

string edit_windows[][];
int bigger_file;
int namesize_dif;
string FileOneContexts[][];
string FileTwoContexts[][];
string FileOneUnits[][];
string FileTwoUnits[][];
string FileOneFacts[];
string FileTwoFacts[];
string FileOne, FileTwo;
handle FileOneWindow;
handle FileTwoWindow;

int run (int f_id, string mode);
int validate_file (string file, string display);
string export_file (string foldersuffix, string file, handle window);
string get_export_folder (string file, string foldersuffix);
int compare_xbrl (string instance,string f1folder, string f2folder);
int compare_filesizes (string f1, string f2);
int read_file_contents (int file, string instance);
int clear_folder (string path);
int debug_message (string msg);
int print_tables ();

/****************************************/
int setup() {  /* Called from Application Startup */
/****************************************/
    string fnScript;  /* Us */
    string item[10];  /* Menu Item */
    int rc;  /* Return Code */

    /* ** Add Menu Item */
    /* * Common */
    item["Class"] = "Extension";  /* Function Class */
    /* o Define Function */
    item["Code"] = "XBRL_MERGE";  /* Function Code */
    item["MenuText"] = "&Merge XFR Files";  /* Menu Text */
    item["Description"] = "<B>Merge XBRL Files</B>";  /* description */
    item["Description"].= "\r\rMerges two XBRL Instance Files.";  /* description */
    /* o Check for Existing */
    rc = MenuFindFunctionID(item["Code"]);  /* Look for existing */
    if (IsNotError(rc)) {  /* Was already added */
        return ERROR_NONE;  /* Exit */
    }  /* end error */
    /* o Registration */
    rc = MenuAddFunction(item);  /* Add the item */
    if (IsError(rc)) {  /* Could not be added */
        return ERROR_NONE;  /* Exit */
    }  /* end error */
    fnScript = GetScriptFilename();  /* Get the script filename */
    MenuSetHook(item["Code"], fnScript, "run");  /* Set the Hook */
    return ERROR_NONE;  /* Return value (does not matter) */
}  /* end setup */

/****************************************/
int run(int f_id, string mode){  /* main run loop */
/****************************************/
    string errmsg;  /* an error message */
    string master_folder;  /* folder for master files */
    string merge_output;  /* the output folder for merge */
    string f1folder,f2folder;  /* folders that were exported to */
    string f1instance,f2instance;  /* instance files of exported XFRs */
    qword f1namesize, f2namesize;  /* sizes of filenames */
    int sizeres;  /* result of filesize compare */
    int rc;  /* result */

    bigger_file = OTHER;  /* set default value for bigger file */
    if (mode!="preprocess"){  /* if not in preprocess */
        return ERROR_NONE;  /* return no error */
    }
    edit_windows = EnumerateEditWindows();  /* get open edit windows */
    rc = DialogBox("MergeXBRLDlg", "merge_");  /* open selector dialog */
    if (IsError(rc)==true){  /* if the user didn't press OK */
        CloseHandle(FileOneWindow);  /* close handle */
        CloseHandle(FileTwoWindow);  /* close handle */
        return rc;  /* return */
    }

    ProgressOpen("XBRL Merger");  /* open progress */
    ProgressSetStatus("Exporting Files");  /* set status */
    ProgressUpdate(1,10);  /* update progress */
    f1folder = get_export_folder(FileOne,"One");  /* get the file 1 folder */
    errmsg = GetLastErrorMessage();  /* get the last error */
    if (f1folder == "" ){  /* if we cannot get the folder */
        MessageBox('x',"Cannot create folder for export. %s",errmsg);  /* display error message */
        return ERROR_EXIT;  /* return with error */
    }
    f2folder = get_export_folder(FileTwo,"Two");  /* get the file 2 folder */
    errmsg = GetLastErrorMessage();  /* get the last error */
    if (f2folder == ""){  /* if we cannot get a folder */
        MessageBox('x',"Cannot create folder for export. %s",errmsg);  /* display error message */
        return ERROR_EXIT;  /* return with error */
    }
    f1instance = export_file("One",FileOne, FileOneWindow);  /* export the first file */
    f2instance = export_file("Two",FileTwo, FileTwoWindow);  /* export the second file */
    ProgressSetStatus("Comparing XBRL files");  /* progress set status */
    ProgressUpdate(2,10);  /* update progress status */
    if (f1instance=="" || f2instance==""){  /* test if either instance is blank */
        MessageBox('x',"Unable to export XFR files. "+errmsg);  /* display error message */
        return ERROR_EXIT;  /* return with error */
    }
    rc = compare_xbrl(f1instance,f1folder,f2folder);  /* check if the files can be merged */
    errmsg = GetLastErrorMessage();  /* get the last error message */
    if (IsError(rc)){  /* if there was a problem */
        if (rc==ERROR_WARN){  /* maybe not fatal, ask user to cont. */
            rc = YesNoBox('q',"File size mismatch, result may have errors. "+
                          errmsg+" Continue?");  /* ask user */
            if (rc!=IDYES){  /* if the user didn't press yes */
                return ERROR_EXIT;  /* return with error */
            }
        }
        else{  /* if the error is definitely fatal */
            MessageBox('x',"Files are not compatible to merge. %s", errmsg);  /* display error */
            return rc;  /* return error code */
        }
    }
    master_folder = f1folder;  /* set default master folder */
    if (bigger_file == FILE_TWO){  /* if file 2 is bigger */
        master_folder = f2folder;  /* file 2 is the master folder */
    }
    if (bigger_file == OTHER){  /* if we don't know which file is bigger */
        sizeres = compare_filesizes(AddPaths(f1folder,f1instance),
                                    AddPaths(f2folder,f2instance));  /* get the result of the size compare */
        bigger_file = FILE_ONE;  /* set bigger file */
        if (sizeres == FILE_TWO){  /* if file two is bigger */
            bigger_file = FILE_TWO;  /* set file two as bigger file */
            master_folder = f2folder;  /* set file two as master file */
        }
    }
    debug_message(FormatString("Master File: %d",bigger_file));  /* add debug message */
    debug_message(FormatString("Master Folder: %s",master_folder));  /* add debug message */
    merge_output = f1folder;  /* set output folder */
    if (bigger_file == FILE_TWO){  /* if file two is bigger */
        merge_output = f2folder;  /* set output folder to file two */
    }
    merge_output = AddPaths(merge_output,"../"+
                            EXPORT_FOLDER_PREFIX+"Combined");  /* set path for new merged folder */
    if (IsFolder(merge_output)==false){  /* if output folder doesn't exist */
        CreateFolder(merge_output);  /* create our output folder */
    }
    rc = GetLastError();  /* get the last error */
    if (IsError(rc)){  /* if we cannot create our folder */
        MessageBox('x',"Folder %s does not exist and cannot be created. "+
                   "Error: %0x", merge_output, rc);  /* display error */
        return ERROR_EXIT;  /* return with error */
    }
    rc = clear_folder(merge_output);  /* make sure it's empty */
    if (IsError(rc)){  /* check if clear_folder worked */
        MessageBox('x',"Cannot delete files in folder %s. Error: %0x",
                   merge_output,rc);  /* display error */
        return rc;  /* return with error */
    }
    ProgressSetStatus("Reading Files");  /* update status */
    ProgressUpdate(4,10);  /* update progress */
    read_file_contents(FILE_ONE,AddPaths(f1folder,f1instance));  /* read file one contents */
    read_file_contents(FILE_TWO,AddPaths(f2folder,f2instance));  /* read file two contents */
    if (DEBUG){  /* if debugging */
        print_tables();  /* print tables */
    }
    ProgressSetStatus("Copying Files");  /* update status */
    ProgressUpdate(7,10);  /* update progress */
    // TODO - COPY FILES //
    // TODO - WRITE INSTANCE FILES //
    ProgressUpdate(10,10);  /* update progress */
    ProgressClose();  /* close progress */
    MessageBox('i',"Files Merged Successfully, see folder %s",
               merge_output);  /* display message to user */
    CloseHandle(FileOneWindow);  /* close window handle */
    CloseHandle(FileTwoWindow);  /* close window handle */
    return ERROR_NONE;
}

/****************************************/
int clear_folder(string path){  /* delete all files from given folder */
/****************************************/
    string filenames[];  /* filenames to delete */
    int num_files;  /* number of files */
    int rc;  /* result */
    int ix;  /* counter */

    path = AddPathDelimiter(path);  /* ensure path ends in slash */
    filenames = EnumerateFiles(path+"*.*");  /* get all files in folder */
    num_files = ArrayGetAxisDepth(filenames);  /* get the number of files in folder */
    for (ix=0;ix<num_files;ix++){  /* for each file in folder */
        rc = DeleteFile(AddPaths(path,filenames[ix]));  /* delete it */
        if (IsError(rc)){  /* if we couldn't delete it */
            return rc;  /* return the error code */
        }
    }
    return ERROR_NONE;  /* return no error */
}

/****************************************/
int read_file_contents(int file, string instance){  /* read the contents of the instance */
/****************************************/
    handle instancefile;  /* file we're reading */
    int ix;  /* counter */
    int num_contexts;  /* number of contexts */
    int num_units;  /* number of units */
    int num_facts;  /* number of facts */
    string key;  /* key to a table row */
    string line;  /* a line of the file we're reading */
    string contexts[][];  /* table of contexts */
    string units[][];  /* table of units */
    string facts[];  /* array of facts */

    instancefile = OpenFile(instance, FO_READ);  /* open the file to read */
    line = ReadLine(instancefile);  /* read the next line of the file */
    while (line!=""){  /* while we have a next line */
        if (FindInString(line,CONTEXT_START)>(-1)){  /* if we are starting a context */
            key = MD5CreateDigest(line,GetStringLength(line));  /* store key for context array */
            contexts[key][0]=line;  /* store first line of context in array */
            ix = 1;  /* initialize ix as row 1 of table */
            while (line!="" && FindInString(line,CONTEXT_END)<0){  /* loop until we end context */
                line = ReadLine(instancefile);  /* read the next line */
                contexts[key][ix] = line;  /* store next line of context */
                ix++;  /* increment counter */
            }
        }
        if (FindInString(line,UNIT_START)>(-1)){  /* if we are starting a unit */
            key = MD5CreateDigest(line,GetStringLength(line));  /* store key for unit array */
            units[key][0]=line;  /* store first line of unit in array */
            ix = 1;  /* initialize ix as row 1 of table */
            while (line!="" && FindInString(line,UNIT_END)<0){  /* loop until we end unit */
                line = ReadLine(instancefile);  /* read the next line */
                units[key][ix] = line;  /* store next line of unit */
                ix++;  /* increment counter */
            }
        }
        if (FindInString(line,FACT_IDENTIFIER)>(-1)){  /* if we have a fact */
            facts[MD5CreateDigest(line,GetStringLength(line))]=line;  /* store the fact in our table */
        }
        line = ReadLine(instancefile);  /* read the next line of the file */
    }
    num_contexts = ArrayGetAxisDepth(contexts);  /* get number of contexts */
    num_units = ArrayGetAxisDepth(units);  /* get number of units */
    num_facts = ArrayGetAxisDepth(facts);  /* get number of facts */
    debug_message(FormatString("File: %d\r\nContexts: %d\r\nUnits: "+
        "%d\r\nFacts: %d",file,num_contexts,num_units,num_facts));  /* display message */
    CloseHandle(instancefile);  /* close the open file */
    switch(file){  /* switch on what file we're reading */
        case FILE_ONE:  /* if it's file one */
            FileOneContexts = contexts;  /* store contexts */
            FileOneUnits = units;  /* store units */
            FileOneFacts = facts;  /* store facts */
            break;
        case FILE_TWO:  /* if it's file two */
            FileTwoContexts = contexts;  /* store contexts */
            FileTwoUnits = units;  /* store units */
            FileTwoFacts = facts;  /* store facts */
            break;
    }
    return ERROR_NONE;
}

/****************************************/
int compare_filesizes(string f1, string f2){  /* compare two file sizes */
/****************************************/
    qword f1size;  /* file 1 size */
    qword f2size;  /* file 2 size */
    int sizedif;  /* difference in sizes */

    debug_message("Comparing "+f1+" to "+f2);  /* debug message */
    debug_message(FormatString("Name size dif: %d",namesize_dif));  /* debug message */
    f1size = GetFileSize(f1);  /* get size of file 1 */
    debug_message(FormatString("F1 Size: %d",f1size));  /* debug message */
    f2size = GetFileSize(f2);  /* get size of file 2 */
    debug_message(FormatString("F2 Size: %d",f2size));  /* debug message */
    sizedif = f1size-f2size;  /* get difference in sizes */
    if ((Absolute(sizedif) - namesize_dif) == 0){  /* if files are same sized */
        debug_message("Files are equal in size.");  /* debug message */
        return OTHER;  /* return other */
    }
    else{  /* if they are not the same size */
        if (f1size>f2size){  /* if file one is bigger */
            debug_message("File One is bigger.");  /* add debug message */
            return FILE_ONE;  /* return file one */
        }  /* otherwise */
        debug_message("File Two is bigger.");  /* debug message */
        return FILE_TWO;  /* return file two */
    }
}

/****************************************/
int compare_xbrl(string instance,string f1folder, string f2folder){  /* test if 2 XBRL files can be merged */
/****************************************/
    int ix;  /* loop counter */
    int num_files;  /* number of files in folder */
    int sizeres;  /* result of getting size dif */
    string f1filepath;  /* path to a file in folder 1 */
    string f2filepath;  /* path to a file in folder 2 */
    string f1files[];  /* files in folder one */

    f1folder = AddPathDelimiter(f1folder);  /* ensure path ends in a slash */
    f1files = EnumerateFiles(f1folder+"*.*");  /* get filenames in folder one */
    num_files = ArrayGetAxisDepth(f1files);  /* number of files in folder one */
    for (ix=0;ix<num_files;ix++){  /* for each file in folder one */
        f1filepath = AddPaths(f1folder,f1files[ix]);  /* get path to file in folder one */
        f2filepath = AddPaths(f2folder,f1files[ix]);  /* get path to file in folder two */
        if(IsFile(f2filepath)==false){  /* if this file doesn't exist in f2 */
            SetLastError(ERROR_EXIT,"File "+f2filepath+" does not exist");  /* set error message */
            return ERROR_EXIT;  /* return with error */
        }
        sizeres = compare_filesizes(f1filepath,f2filepath);  /* get the file size result */
        if(sizeres!=OTHER){  /* if the sizes don't match up */
            if (f1files[ix]!=instance){  /* and this isn't the instance file */
                bigger_file = sizeres;  /* store the bigger file */
                SetLastError(ERROR_WARN,"File "+f1files[ix]+" does not "+
                             "match.");  /* set error */
                return ERROR_WARN;  /* return error */
            }
        }
    }
    return ERROR_NONE;  /* return without error */
}

/****************************************/
string export_file(string foldersuffix,string file, handle window){  /* export the XFR file */
/****************************************/
    string path;  /* path to output file */
    string response;  /* response from export */
    string filenames[];  /* filenames */
    string cmd;  /* command to export */
    int rc;  /* response code */
    int ix;  /* counter */
    int num_files;  /* number of files */

    path = get_export_folder(file,foldersuffix);  /* get folder to export to */
    if (path == ""){  /* if the export folder is blank */
        return "";  /* return an error */
    }
    cmd = "NoQuery: TRUE; Path: " + path;  /* generate command string */
    if (IsWindowHandleValid(window)){  /* check if file is already open */
        ActivateEditWindow(window);  /* activate edit window */
        RunMenuFunction("XBRL_EXPORT",cmd);  /* export the file */
    }
    else{
        RunMenuFunction("FILE_OPEN","Filename:"+file);  /* open file */
        RunMenuFunction("XBRL_EXPORT",cmd);  /* export the file */
    }
    response = GetMenuFunctionResponse();  /* response from the export */
    return GetParameter(response,"Instance");  /* return the name of the instance file */
}

/****************************************/
string get_export_folder(string file, string foldersuffix){  /* build the path to the output folder */
/****************************************/
    int rc;  /* return code */
    string folder;  /* folder for XFR file */
    string newfolder;  /* new folder for exporting to */

    folder = GetFilePath(file);  /* get folder */
    newfolder = EXPORT_FOLDER_PREFIX+foldersuffix;  /* build name of new folder */
    newfolder = AddPaths(folder,newfolder);  /* build full path to newfolder */
    if (IsFolder(newfolder)==false){  /* if folder doesn't exist */
        rc = CreateFolder(newfolder);  /* try to create the folder */
        if (IsError(rc)){  /* did it create OK? */
            return "";  /* return a blank string */
        }
    }
    rc = clear_folder(newfolder);  /* clear the export folder */
    if (IsError(rc)){  /* if we couldn't clear the export */
        SetLastError(rc,"Cannot delete files in folder "+newfolder);  /* set error */
        return "";  /* return */
    }
    return newfolder;  /* return the new folder path */
}

/****************************************/
int main(){  /* main function */
/****************************************/
    string s1;  /* General */

    s1 = GetScriptParent();  /* Get the parent */
    if (s1 == "LegatoIDE") {  /* Is run from the IDE (debug) */
        setup();  /* run setup */
        run(0,"preprocess");  /* run as though hooked */
    }  /* end IDE run */
    return ERROR_NONE;
}

/****************************************/
int merge_load(){  /* Setup Action */
/****************************************/
    string file_one,file_two;  /* old file paths */

    file_one = GetSetting("XBRLMerge","File One");  /* get path to file one */
    file_two = GetSetting("XBRLMerge","File Two");  /* get path to file two */

    EditSetText(XBRL_ONE_TEXT,file_one);  /* set edit text */
    EditSetText(XBRL_TWO_TEXT,file_two);  /* set edit text */
    return ERROR_NONE;
}

/****************************************/
int merge_action(int c_id, int c_ac) {  /* Control Action */
/****************************************/
    string s1;  /* General */

    /* ** Control Actions */
    /* * Reset */
    if (c_id == XBRL_RESET){  /* if resetting */
        EditSetText(XBRL_ONE_TEXT,"");  /* reset text of box */
        PutSetting("XBRLMerge","File One","");  /* reset setting file */
        EditSetText(XBRL_TWO_TEXT,"");  /* reset text of box */
        PutSetting("XBRLMerge","File Two","");  /* reset setting file */
    }
    /* * Browse for XBRL 1 */
    if (c_id == XBRL_ONE_BROWSE) {  /* Control ID (button) */
        s1 = EditGetText(XBRL_ONE_TEXT);  /* Get the current path */
        s1 = BrowseOpenFile("Select First XBRL File","*.xfr|*.xfr", s1);  /* Browse for the file */
        if (s1 != "") {  /* Returned a value (OK) */
            EditSetText(XBRL_ONE_TEXT, s1);  /* Set the new path */
            PutSetting("XBRLMerge","File One",s1);  /* store setting for later */
        }  /* end has string */
        return ERROR_NONE;  /* Done */
    }  /* end browse */
    /* * Browse for XBRL 2 */
    if (c_id == XBRL_TWO_BROWSE) {  /* Control ID (button) */
        s1 = EditGetText(XBRL_TWO_TEXT);  /* Get the current path */
        if (s1 == "") {  /* Empty, pick up source */
            s1 = EditGetText(XBRL_ONE_TEXT);  /* Get the current path */
        }  /* end has string */
        s1 = BrowseOpenFile("Select Second XBRL File","*.xfr|*.xfr",s1);  /* Browse for the file */
        if (s1 != "") {  /* Returned a value (OK) */
            EditSetText(XBRL_TWO_TEXT, s1);  /* Set the new path */
            PutSetting("XBRLMerge","File Two",s1);  /* store setting for later */
        }  /* end has string */
        return ERROR_NONE;  /* Done */
    }  /* end browse */
    return ERROR_NONE;  /* Exit no error */
}  /* end routine */

/****************************************/
int merge_validate(){  /* Validate Action */
/****************************************/
    int valid_one,valid_two;  /* validations of files */
    string file_one,file_two;  /* file paths */

    file_one = EditGetText(XBRL_ONE_TEXT);  /* get path to file one */
    file_two = EditGetText(XBRL_TWO_TEXT);  /* get path to file two */
    if (file_one == "" || file_two == ""){  /* if either file is blank */
        MessageBox('x',"Two files must be selected.");  /* make sure two files are selected */
        return ERROR_EXIT;  /* exit with error */
    }
    if (file_one == file_two){  /* test if same file */
        MessageBox('x',"You cannot merge a file into itself.");  /* display error message */
        return ERROR_EXIT;  /* return with error */
    }
    valid_one = validate_file(file_one,"One");  /* validate file_one */
    if (IsWindowHandleValid(FileTwoWindow)){  /* if our validate set a file handle */
        FileOneWindow = FileTwoWindow;  /* store handle as file one */
        FileTwoWindow = NULL_HANDLE;  /* clear file two window */
    }
    valid_two = validate_file(file_two,"Two");  /* validate file_two */
    if (valid_one==valid_two && valid_two==ERROR_NONE){  /* if both validates returned ERROR_NONE */
        return ERROR_NONE;  /* return no error */
    }
    return ERROR_EXIT;  /* exit with an error */
}

/****************************************/
int validate_file(string file, string display){  /* validate an individual file */
/****************************************/
    int rc;  /* return code from our file */
    int depth;  /* number of windows open */
    int ix;  /* array index counter */

    depth = ArrayGetAxisDepth(edit_windows);  /* get number of windows open */
    if (IsFile(file)){  /* check if file is a file */
        if (CanAccessFile(file,FO_WRITE)){  /* check if we can write to the file */
            if (MakeLowerCase(GetExtension(file))==".xfr"){  /* make sure file ends in .xfr */
                return ERROR_NONE;  /* return no error */
            }
            else{  /* if file doesn't end in .xfr */
                MessageBox('x',"File "+display+" must be an XFR file.");  /* display error message */
            }
        }
        else{  /* if we cannot write to file */
            for (ix = 0; ix<depth; ix++){  /* scan all open windows */
                if (edit_windows[ix]["Filename"] == file){  /* if the file is already open */
                    FileTwoWindow=MakeHandle(edit_windows[ix]["ClientHandle"]);  /* get the handle to it */
                    rc = GetLastError();  /* check if we got an error */
                    if (IsError(rc)){  /* if we have an error */
                        MessageBox('x',"Cannot create handle, error %0x",rc);  /* display error message */
                    }
                    else{  /* if we don't have an error */
                        return ERROR_NONE;  /* the file is already open, return */
                    }
                }
            }
            MessageBox('x',"Cannot open File "+display);  /* give error message */
        }
    }
    else{  /* if the file does not exist */
        MessageBox('x',"File "+display+" does not exist.");  /* give error message */
    }  /* if we haven't exited yet */
    return ERROR_EXIT;  /* return an error */
}

/****************************************/
int debug_message(string message){  /* add a message to the debug log */
/****************************************/
    if (DEBUG){  /* if debugging */
        if(GetStringLength(message)>DEBUG_MSG_MAX){  /* if message is longer than the max */
            message = GetStringSegment(message,0,DEBUG_MSG_MAX);  /* keep only the first DEBUG_MSG_MAX chars */
        }
        AddMessage(message);  /* log the message */
    }
    return ERROR_NONE;
}

/****************************************/
int print_tables(){  /* debug function to print tables */
/****************************************/
    int num_rows;  /* num rows */
    int ix;  /* counter */

    debug_message("Unit Table 1:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileOneUnits);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash: "+ArrayGetKeyName(FileOneUnits,ix));  /* add a debug message */
        debug_message("Unit: "+FileOneUnits[ix][0]);  /* print out info */
    }
    debug_message("Unit Table 2:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileTwoUnits);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash: "+ArrayGetKeyName(FileTwoUnits,ix));  /* add a debug message */
        debug_message("Unit: "+FileTwoUnits[ix][0]);  /* print out info */
    }
    debug_message("*****************************");  /* add spacer */
    debug_message("Context Table 1:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileOneContexts);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash : "+ArrayGetKeyName(FileOneContexts,ix));  /* add a debug message */
        debug_message("Context: "+FileOneContexts[ix][0]);  /* print out info */
    }
    debug_message("Context Table 2:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileTwoContexts);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash : "+ArrayGetKeyName(FileTwoContexts,ix));  /* add a debug message */
        debug_message("Context: "+FileTwoContexts[ix][0]);  /* print out info */
    }
    debug_message("*****************************");  /* add spacer */
    debug_message("Fact Table 1:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileOneFacts);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash: "+ArrayGetKeyName(FileOneFacts,ix));  /* add a debug message */
        debug_message("Fact: "+FileOneFacts[ix]);  /* print out info */
    }
    debug_message("Fact Table 2:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileTwoFacts);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash: "+ArrayGetKeyName(FileTwoFacts,ix));  /* add a debug message */
        debug_message("Fact: "+FileTwoFacts[ix]);  /* print out info */
    }
    return ERROR_NONE;
}

/****************************************/
int merge_ok(){  /* OK Action */
/****************************************/
    int f1namesize,f2namesize;  /* sizes of the names of xfr files */
    string f1,f2;  /* file names chosen by user */

    FileOne = EditGetText(XBRL_ONE_TEXT);  /* get path to file one */
    FileTwo = EditGetText(XBRL_TWO_TEXT);  /* get path to file two */

    f1 = GetFilename(FileOne);  /* get the name of the first xfr file */
    f2 = GetFilename(FileTwo);  /* get the name of the second xfr file */
    f1namesize = GetStringLength(EncodeURIComponent(f1));  /* size of file 1 name */
    f2namesize = GetStringLength(EncodeURIComponent(f2));  /* size of file 2 name */
    namesize_dif = Absolute(f1namesize-f2namesize);  /* get size of dif in name sizes */
    return ERROR_NONE;  /* return that the user pressed OK */
}
This week, we thought it would be helpful to list only the new defines and explain what each one is used for. The four defines CONTEXT_START, CONTEXT_END, UNIT_START, and UNIT_END are string segments that occur at the start or end of a context or unit in an XBRL file exported from GoFiler. If we’re parsing an instance file and encounter the text in CONTEXT_START, for example, we know the line begins a new context definition. If we encounter a line that contains CONTEXT_END, we know the context has ended. The two unit defines work the same way. We used defines so that they can be changed quickly if the XBRL written out by GoFiler ever changes. The define FACT_IDENTIFIER is similar: it is a string segment we expect to find only in a fact definition, so a line of XML that contains it is treated as a fact.
Theoretically, the string segment contextRef could appear in a line other than a fact, for example if the XBRL has a unit named “contextRef” or a context whose ID contains “contextRef”. To get around this, we check whether the line is a context or unit first. If it is neither and it contains contextRef, we can assume the line is a fact.
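The order of those checks can be sketched outside of Legato. The following is a minimal Python illustration, not part of the script: the helper name classify_line is hypothetical, and it assumes one XML tag per line, as in GoFiler's exports.

```python
# Python sketch of the check order described above (not Legato code).
# The marker strings mirror the script's defines; classify_line is a
# hypothetical helper introduced only for this illustration.
CONTEXT_START = "<xbrli:context"
UNIT_START = "<xbrli:unit"
FACT_IDENTIFIER = "contextRef"


def classify_line(line):
    """Return 'context', 'unit', 'fact', or 'other' for one line of XML."""
    # Contexts and units are tested first, so an ID that happens to contain
    # "contextRef" is never mistaken for a fact.
    if CONTEXT_START in line:
        return "context"
    if UNIT_START in line:
        return "unit"
    if FACT_IDENTIFIER in line:
        return "fact"
    return "other"


print(classify_line('<xbrli:context id="contextRef_2016">'))
print(classify_line('<us-gaap:Assets contextRef="AsOf2016-12-31" unitRef="USD">1</us-gaap:Assets>'))
```

The first call is classified as a context even though its ID contains the fact marker, which is the case the paragraph above guards against.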
These defines are all based on values that appear in XBRL instance files generated by GoFiler and GoXBRL. Not all instance files will have the same formatting, so if you’re trying to merge files from a different XBRL vendor, these defines may need to be altered. The script also assumes that each XML tag is on its own line. This is the style GoFiler uses, so it’s a safe assumption to make when the files are always written out by GoFiler. If you try to merge files where the entire context is on a single line, for example, it will cause unpredictable (and probably bad) behavior.
The define DEBUG is a true/false value. It is used by the new function debug_message. If true, this function will print out debug messages. If false, the debug_message function does nothing. This is useful for printing out values during the execution of the code so we can see exactly what is going on. The define DEBUG_MSG_MAX sets the maximum number of characters to be printed out by a debug message.
We also have six new global variables this week. FileOneContexts and FileTwoContexts are two-dimensional string arrays. These will be used as a sort of hash table to store contexts. Each row will have a key that will be a hashed value of the first line of that context. Each column will be a separate line of the context. So the first two columns of this table might look something like this for a pair of standard contexts:
Key | Column 1 | Column 2 |
8e2715c249422f.... | <xbrli:context id="AsOf2016-12-31"> | <xbrli:entity> |
e48505dadb74b.... | <xbrli:context id="AsOf2015-12-31"> | <xbrli:entity> |
Using this data structure, we can quickly reference individual lines of a context, and we can easily compare contexts from one file to the other by checking whether the hashed key value for a context exists in both tables. The variables FileOneUnits and FileTwoUnits are structured in exactly the same way. FileOneFacts and FileTwoFacts are different, however: a fact only needs one column, so each of them can be a one-dimensional hash table.
#define CONTEXT_START "<xbrli:context"
#define CONTEXT_END "</xbrli:context>"
#define UNIT_START "<xbrli:unit"
#define UNIT_END "</xbrli:unit>"
#define FACT_IDENTIFIER "contextRef"
#define DEBUG true
#define DEBUG_MSG_MAX 150

string FileOneContexts[][];
string FileTwoContexts[][];
string FileOneUnits[][];
string FileTwoUnits[][];
string FileOneFacts[];
string FileTwoFacts[];
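To make the shape of the context tables concrete, here is a small Python sketch of the same idea, offered as an illustration under assumptions rather than as the script's implementation. The names read_contexts and make_context are hypothetical, Python's hashlib.md5 stands in for Legato's MD5CreateDigest (the digest format may differ), and the sample lines are invented.

```python
import hashlib

# Marker strings mirror the script's defines.
CONTEXT_START = "<xbrli:context"
CONTEXT_END = "</xbrli:context>"


def read_contexts(lines):
    """Map md5(first line of a context) -> list of that context's lines."""
    table = {}
    it = iter(lines)
    for line in it:
        if CONTEXT_START not in line:
            continue
        block = [line]
        # Keep reading until the closing tag (or the end of the input).
        while CONTEXT_END not in line:
            line = next(it, None)
            if line is None:
                break
            block.append(line)
        key = hashlib.md5(block[0].encode("utf-8")).hexdigest()
        table[key] = block
    return table


def make_context(context_id):
    """Build an invented four-line context for the demonstration."""
    return [
        '<xbrli:context id="%s">' % context_id,
        "<xbrli:entity>",
        "</xbrli:entity>",
        "</xbrli:context>",
    ]


file_one = make_context("AsOf2016-12-31") + make_context("AsOf2015-12-31")
file_two = make_context("AsOf2016-12-31") + make_context("AsOf2014-12-31")

table_one = read_contexts(file_one)
table_two = read_contexts(file_two)

# A context is shared when its key exists in both tables.
shared = table_one.keys() & table_two.keys()
only_in_two = table_two.keys() - table_one.keys()
print(len(shared), "shared,", len(only_in_two), "only in file two")
```

Because the key is derived only from the first line, two contexts with the same opening tag but different bodies would share a key; that matches the keying described above and is worth keeping in mind when comparing files.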
The first new function this week is debug_message. It’s a very basic function. It checks to see if the define DEBUG is turned on (true). If so, the function checks the debug message length. If the message is longer than the maximum allowed length, it truncates the message to that length. Either way, it then logs the message with AddMessage. Using a function like this instead of scattering MessageBox or AddMessage calls through your script is very convenient because you can turn off all debug messages by just switching a define to false. Otherwise, you would have to go over the entire file to make sure all extra messages were removed.
/****************************************/
int debug_message(string message){  /* add a message to the debug log */
/****************************************/
    if (DEBUG){  /* if debugging */
        if(GetStringLength(message)>DEBUG_MSG_MAX){  /* if message is longer than 150 */
            message = GetStringSegment(message,0,DEBUG_MSG_MAX);  /* keep only the first 150 chars */
        }
        AddMessage(message);  /* log the message */
    }
    return ERROR_NONE;
}
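The same switchable-logging pattern carries over to other languages. As a rough Python sketch (not Legato, and with print standing in for AddMessage as an assumption), the flag and the length cap could look like this; the debug keyword argument is an addition made only so the on/off behavior is easy to demonstrate.

```python
# Python sketch of the debug_message pattern (not the Legato function itself).
DEBUG = True          # flip to False to silence every debug message
DEBUG_MSG_MAX = 150   # maximum number of characters to log


def debug_message(message, debug=DEBUG):
    """Log a truncated message when debugging; return what was logged, else None."""
    if not debug:
        return None
    text = message[:DEBUG_MSG_MAX]  # slicing leaves shorter messages unchanged
    print(text)                     # stand-in for Legato's AddMessage
    return text


debug_message("Master File: 1")
debug_message("x" * 500)              # logged as 150 characters
debug_message("hidden", debug=False)  # nothing is logged
```

Keeping the check inside the function means call sites never need their own if (DEBUG) test.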
Next is the print_tables function. We added this function for the sole purpose of making sure the hash tables look like we expect them to. If the tables match the expected output, we know that the hash table creation is working properly and that the tables can be used next week to find unique facts, units, and contexts in the instance files. First, the function prints out the message “Unit Table 1” using the debug_message function defined above. Then it gets the number of units in the FileOneUnits table and loops through them with a for loop, printing out the hash value for each row (by using the ArrayGetKeyName SDK function). Finally, the function prints the first column of that row, so we can see what the first line of the unit was. This is then repeated for the hash table FileTwoUnits. The exact same process is repeated for contexts and for facts. This way, all of the hash tables are printed out, and we can verify that they are indeed being created with the expected data. The function currently prints only the hash for each row and the first column in the row, but if needed, it could be modified to print every column. That was not really necessary for this script, so we left it out.
/****************************************/
int print_tables(){  /* debug function to print tables */
/****************************************/
    int num_rows;  /* num rows */
    int ix;  /* counter */

    debug_message("Unit Table 1:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileOneUnits);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash: "+ArrayGetKeyName(FileOneUnits,ix));  /* add a debug message */
        debug_message("Unit: "+FileOneUnits[ix][0]);  /* print out info */
    }
    debug_message("Unit Table 2:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileTwoUnits);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash: "+ArrayGetKeyName(FileTwoUnits,ix));  /* add a debug message */
        debug_message("Unit: "+FileTwoUnits[ix][0]);  /* print out info */
    }
    debug_message("*****************************");  /* add spacer */
    debug_message("Context Table 1:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileOneContexts);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash : "+ArrayGetKeyName(FileOneContexts,ix));  /* add a debug message */
        debug_message("Context: "+FileOneContexts[ix][0]);  /* print out info */
    }
    debug_message("Context Table 2:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileTwoContexts);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash : "+ArrayGetKeyName(FileTwoContexts,ix));  /* add a debug message */
        debug_message("Context: "+FileTwoContexts[ix][0]);  /* print out info */
    }
    debug_message("*****************************");  /* add spacer */
    debug_message("Fact Table 1:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileOneFacts);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash: "+ArrayGetKeyName(FileOneFacts,ix));  /* add a debug message */
        debug_message("Fact: "+FileOneFacts[ix]);  /* print out info */
    }
    debug_message("Fact Table 2:");  /* add a message */
    num_rows = ArrayGetAxisDepth(FileTwoFacts);  /* get num rows */
    for(ix=0;ix<num_rows;ix++){  /* for each row */
        debug_message("Hash: "+ArrayGetKeyName(FileTwoFacts,ix));  /* add a debug message */
        debug_message("Fact: "+FileTwoFacts[ix]);  /* print out info */
    }
    return ERROR_NONE;
}
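The six loops in print_tables all do the same thing, so one possible variation is a single helper that takes a title and a table. The Python sketch below illustrates that idea only; dump_table, its emit parameter, and the sample rows are hypothetical and do not appear in the script.

```python
def dump_table(title, table, emit=print):
    """Emit the key and the first stored line of every row in one table."""
    emit(title + ":")
    for key, value in table.items():
        # Contexts and units store a list of lines; facts store a single line.
        first = value[0] if isinstance(value, list) else value
        emit("Hash: " + key)
        emit("First line: " + first)

# Hypothetical sample rows; real keys would be MD5 digests.
units_one = {"k1": ['<xbrli:unit id="USD">',
                    "<xbrli:measure>iso4217:USD</xbrli:measure>",
                    "</xbrli:unit>"]}
facts_one = {"k2": '<us-gaap:Assets contextRef="AsOf2015-12-31" unitRef="USD">100</us-gaap:Assets>'}

lines = []
dump_table("Unit Table 1", units_one, emit=lines.append)
dump_table("Fact Table 1", facts_one, emit=lines.append)
```

Passing the output function in as a parameter keeps the helper easy to test, and in a real script it could simply be the debug_message function.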
This is the real “meat and potatoes” function this week, where most of our new work is done. The function read_file_contents takes only two parameters: the number of the file we’re reading (this is intended to be the define FILE_ONE or FILE_TWO) and the path to its instance file. The first thing we do is open that file with the OpenFile function and read its first line with the ReadLine SDK function. While the line isn’t blank, we iterate over each line. In that loop, the first thing to check is whether the line contains the define CONTEXT_START. If it does, this is the first line of a context, so we can start mapping that context. To map a context, we must first create a key for this row of our hash table. We can do this with the MD5CreateDigest SDK function. This function takes two parameters: the string to be hashed and the length of that string. Creating a hash gives us a unique value for this row, so we can compare the contents of file one and file two very quickly. As with all hash tables, there is technically a chance that two different lines will give us identical values out of this function. However, with a 128-bit MD5 digest it would take on the order of 2^64 (roughly 18 quintillion) different lines before a collision becomes likely. The likelihood is therefore small enough to be considered a non-issue here.
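The figure of roughly 18 quintillion corresponds to the birthday bound for a 128-bit digest such as MD5. As a rough check, the Python sketch below uses the standard approximation p ≈ 1 - exp(-k² / 2^(n+1)) for k hashed items and an n-bit hash; the function name collision_probability is hypothetical, and the formula assumes digests behave like uniformly random values.

```python
import math

def collision_probability(items: int, hash_bits: int = 128) -> float:
    """Birthday-bound estimate: p ~= 1 - exp(-k^2 / 2^(bits + 1))."""
    exponent = -(items ** 2) / float(2 ** (hash_bits + 1))
    return -math.expm1(exponent)  # expm1 keeps precision for tiny probabilities

p_at_bound = collision_probability(2 ** 64)       # about 0.39
p_million_lines = collision_probability(10 ** 6)  # far below 1e-26
```

Under this approximation, even a million distinct lines leaves the collision chance vanishingly small, while at 2^64 lines it rises to roughly 39 percent.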
After we get our key value, we can use it to store the first line of our context in the function’s local table, contexts. Then, as long as the current line is not blank and does not contain the define CONTEXT_END, we keep reading lines and adding them as additional columns for this row of our hash table.
/****************************************/
int read_file_contents(int file, string instance){  /* read the contents of the instance */
/****************************************/
    handle instancefile;  /* file we're reading */
    int ix;  /* counter */
    int num_contexts;  /* number of contexts */
    int num_units;  /* number of units */
    int num_facts;  /* number of facts */
    string key;  /* key to a table row */
    string line;  /* a line of the file we're reading */
    string contexts[][];  /* table of contexts */
    string units[][];  /* table of units */
    string facts[];  /* array of facts */

    instancefile = OpenFile(instance, FO_READ);  /* open the file to read */
    line = ReadLine(instancefile);  /* read the next line of the file */
    while (line!=""){  /* while we have a next line */
        if (FindInString(line,CONTEXT_START)>(-1)){  /* if we are starting a context */
            key = MD5CreateDigest(line,GetStringLength(line));  /* store key for context array */
            contexts[key][0]=line;  /* store first line of context in array */
            ix = 1;  /* initialize ix as row 1 of table */
            while (line!="" && FindInString(line,CONTEXT_END)<0){  /* loop until we end context */
                line = ReadLine(instancefile);  /* read the next line */
                contexts[key][ix] = line;  /* store next line of context */
                ix++;  /* increment counter */
            }
        }
Now that we have our contexts stored, we can do the exact same for our units. This block works exactly the same way as the contexts block, except it triggers if the line contains UNIT_START and ends when our loop hits UNIT_END (or the file ends). If the line isn’t a context and it isn’t a unit, then we can check if it contains our FACT_IDENTIFIER define. Just in case the XBRL file has a context or a unit that contains the string segment FACT_IDENTIFIER, we check it last. If it is a fact, then we can just add the appropriate row to our hash table facts. Then we can read the next line and loop all over again until the file ends.
        if (FindInString(line,UNIT_START)>(-1)){  /* if we are starting a unit */
            key = MD5CreateDigest(line,GetStringLength(line));  /* store key for unit array */
            units[key][0]=line;  /* store first line of unit in array */
            ix = 1;  /* initialize ix as column 1 of table */
            while (line!="" && FindInString(line,UNIT_END)<0){  /* loop until we end unit */
                line = ReadLine(instancefile);  /* read the next line */
                units[key][ix] = line;  /* store next line of unit */
                ix++;  /* increment counter */
            }
        }
        if (FindInString(line,FACT_IDENTIFIER)>(-1)){  /* if we have a fact */
            facts[MD5CreateDigest(line,GetStringLength(line))]=line;  /* store the fact in our table */
        }
        line = ReadLine(instancefile);  /* read the next line of the file */
    }
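The whole read loop can be sketched compactly in Python. This is an illustration of the approach, not a port: it assumes the instance text is already in memory, uses hex MD5 digests as keys, and, unlike the Legato loop (which stops when ReadLine returns an empty string), it runs to the end of the text. The names md5_key, read_block, and read_instance and the sample XBRL are hypothetical.

```python
import hashlib

CONTEXT_START, CONTEXT_END = "<xbrli:context", "</xbrli:context>"
UNIT_START, UNIT_END = "<xbrli:unit", "</xbrli:unit>"
FACT_IDENTIFIER = "contextRef"

def md5_key(line):
    """Row key: hex MD5 digest of the line (an assumed key format)."""
    return hashlib.md5(line.encode("utf-8")).hexdigest()

def read_block(first_line, lines, end_marker):
    """Collect lines from first_line through the one containing end_marker."""
    block = [first_line]
    line = first_line
    while end_marker not in line:
        line = next(lines, None)
        if line is None:  # text ended before the block closed
            break
        block.append(line)
    return block

def read_instance(text):
    """Map every context, unit, and fact in the text to a keyed table."""
    contexts, units, facts = {}, {}, {}
    lines = iter(text.splitlines())
    for line in lines:
        if CONTEXT_START in line:
            contexts[md5_key(line)] = read_block(line, lines, CONTEXT_END)
        elif UNIT_START in line:
            units[md5_key(line)] = read_block(line, lines, UNIT_END)
        elif FACT_IDENTIFIER in line:  # checked last, as in the script
            facts[md5_key(line)] = line
    return contexts, units, facts

sample = """<xbrli:context id="AsOf2015-12-31">
<xbrli:entity>
</xbrli:context>
<xbrli:unit id="USD">
<xbrli:measure>iso4217:USD</xbrli:measure>
</xbrli:unit>
<us-gaap:Assets contextRef="AsOf2015-12-31" unitRef="USD">100</us-gaap:Assets>"""

contexts, units, facts = read_instance(sample)
```

Sharing one iterator between the outer loop and read_block means the lines inside a context or unit are consumed by the block and never re-examined as possible facts.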
Once we have the information loaded into our data structures, we can get the number of contexts, units, and facts by using the ArrayGetAxisDepth function. This isn’t really required, but we thought it would be helpful to put a debug message here that describes the size of these data structures for each file. Then we can close our file and save this instance information to our global variables. We need to switch variables depending on which file it is. If file is FILE_ONE, we set the file one data structures (FileOneContexts, FileOneUnits, and FileOneFacts) to our local tables (contexts, units, and facts). If it’s FILE_TWO, we set the file two data structures instead.
    num_contexts = ArrayGetAxisDepth(contexts);  /* get number of contexts */
    num_units = ArrayGetAxisDepth(units);  /* get number of units */
    num_facts = ArrayGetAxisDepth(facts);  /* get number of facts */
    debug_message(FormatString("File: %d\r\nContexts: %d\r\nUnits: "+  /* display message */
        "%d\r\nFacts: %d",file,num_contexts,num_units,num_facts));  /* display message */
    CloseHandle(instancefile);  /* close the open file */
    switch(file){  /* switch on what file we're reading */
        case FILE_ONE:  /* if it's file one */
            FileOneContexts = contexts;  /* store contexts */
            FileOneUnits = units;  /* store units */
            FileOneFacts = facts;  /* store facts */
            break;  /* break */
        case FILE_TWO:  /* if it's file two */
            FileTwoContexts = contexts;  /* store contexts */
            FileTwoUnits = units;  /* store units */
            FileTwoFacts = facts;  /* store facts */
            break;  /* break */
    }
    return ERROR_NONE;
}
The run function also needs to be modified. The first difference here is that right when the function starts, we need to set the bigger_file global variable to the defined value for OTHER. We need to figure out which file is bigger so we know which one to merge into the other (the smaller goes into the bigger). So if this value is still set to OTHER later, we know that we need to actually check which one is bigger.
/****************************************/
int run(f_id,mode){  /* main run loop */
/****************************************/
    string errmsg;  /* an error message */
    string master_folder;  /* folder for master files */
    string merge_output;  /* the output folder for merge */
    string f1folder,f2folder;  /* folders that were exported to */
    string f1instance,f2instance;  /* instance files of exported XFRs */
    qword f1namesize, f2namesize;  /* sizes of filenames */
    int sizeres;  /* result of filesize compare */
    int rc;  /* result */

    bigger_file = OTHER;  /* set default value for bigger file */
    if (mode!="preprocess"){  /* if not in preprocess */
        return ERROR_NONE;  /* return no error */
    }
    edit_windows = EnumerateEditWindows();  /* get open edit windows */
    rc = DialogBox("MergeXBRLDlg", "merge_");  /* open selector dialog */
    if (IsError(rc)==true){  /* if the user didn't press OK */
        CloseHandle(FileOneWindow);  /* close handle */
        CloseHandle(FileTwoWindow);  /* close handle */
        return rc;  /* return */
    }
The next difference is the ProgressOpen function. This opens a progress window. This window can be modified with the ProgressSetStatus SDK function to update the message it shows. The ProgressUpdate function, when given two numbers, sets the window to show a percentage. So calling the ProgressUpdate function with the parameters 1 and 10 sets it to 10% finished.
    ProgressOpen("XBRL Merger");  /* open progress */
    ProgressSetStatus("Exporting Files");  /* set status */
    ProgressUpdate(1,10);  /* update progress */
    f1folder = get_export_folder(FileOne,"One");  /* get the file 1 folder */
    errmsg = GetLastErrorMessage();  /* get the last error */
    if (f1folder == "" ){  /* if we cannot get the folder */
        MessageBox('x',"Cannot create folder for export. %s",errmsg);  /* display error message */
        return ERROR_EXIT;  /* return with error */
    }
    f2folder = get_export_folder(FileTwo,"Two");  /* get the file 2 folder */
    errmsg = GetLastErrorMessage();  /* get the last error */
    if (f2folder == ""){  /* if we cannot get a folder */
        MessageBox('x',"Cannot create folder for export. %s",errmsg);  /* display error message */
        return ERROR_EXIT;  /* return with error */
    }
    f1instance = export_file("One",FileOne, FileOneWindow);  /* export the first file */
    f2instance = export_file("Two",FileTwo, FileTwoWindow);  /* export the second file */
    ProgressSetStatus("Comparing XBRL files");  /* progress set status */
    ProgressUpdate(2,10);  /* update progress status */
    if (f1instance=="" || f2instance==""){  /* test if either instance is blank */
        MessageBox('x',"Unable to export XFR files. "+errmsg);  /* display error message */
        return ERROR_EXIT;  /* return with error */
    }
    rc = compare_xbrl(f1instance,f1folder,f2folder);  /* check if the files can be merged */
    errmsg = GetLastErrorMessage();  /* get the last error message */
    if (IsError(rc)){  /* if there was a problem */
        if (rc==ERROR_WARN){  /* maybe not fatal, ask user to cont. */
            rc = YesNoBox('q',"File size mismatch, result may have errors."+  /* ask user */
                errmsg+" Continue?");  /* ask user */
            if (rc!=IDYES){  /* if the user didn't press yes */
                return ERROR_EXIT;  /* return with error */
            }
        }
        else{  /* if the error is definitely fatal */
            MessageBox('x',"Files are not compatible to merge. %s", errmsg);  /* display error */
            return rc;  /* return error code */
        }
    }
This is the new main section of the run function. First, we need to set a default master (bigger) folder, so we set that to f1folder. Then, if bigger_file is FILE_TWO, we change the master folder to f2folder. Remember from last week that bigger_file is set by the compare_xbrl function when it compares the two exported filesets. If the files were of equal size, then bigger_file is still equal to OTHER. In this case, we must use our compare_filesizes function to compare the sizes of the instance files, which were the only files not checked by the compare_xbrl function because they are expected to be different sizes. Depending on the result, we set bigger_file to FILE_ONE or FILE_TWO and master_folder to f1folder or f2folder. After all that, we put out two debug messages specifying the master file (the value of bigger_file) and the master folder. For example, if file one is bigger, the master file should be FILE_ONE and master_folder should be “MergeOne”. If there is a mismatch and the master file is FILE_ONE but master_folder is “MergeTwo”, then we know there is a serious problem with our logic. Using debug messages like this can help pinpoint issues in a script.
    master_folder = f1folder;  /* set default master folder */
    if (bigger_file == FILE_TWO){  /* if file 2 is bigger */
        master_folder = f2folder;  /* if file 2 is the master folder */
    }
    if (bigger_file == OTHER){  /* if we don't know which file is bigger*/
        sizeres = compare_filesizes(AddPaths(f1folder,f1instance),  /* get result of size compare */
            AddPaths(f2folder,f2instance));  /* get the result of the size compare */
        bigger_file = FILE_ONE;  /* set bigger file */
        if (sizeres == FILE_TWO){  /* if file two is bigger */
            bigger_file = FILE_TWO;  /* set file two as bigger file */
            master_folder = f2folder;  /* set file two as master file */
        }
    }
    debug_message(FormatString("Master File: %d",bigger_file));  /* add debug message */
    debug_message(FormatString("Master Folder: %s",master_folder));  /* add debug message */
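The tie-breaking logic can be sketched in Python as follows. This is a minimal illustration under stated assumptions: pick_master and the folders lookup are hypothetical names, and os.path.getsize stands in for the script's compare_filesizes function, whose internals are not shown in this post.

```python
import os
import tempfile

FILE_ONE, FILE_TWO, OTHER = 1, 2, 999

def pick_master(bigger_file, f1_instance, f2_instance):
    """Return FILE_ONE or FILE_TWO; break an OTHER tie by instance file size."""
    if bigger_file in (FILE_ONE, FILE_TWO):
        return bigger_file  # already decided by the fileset comparison
    if os.path.getsize(f2_instance) > os.path.getsize(f1_instance):
        return FILE_TWO
    return FILE_ONE  # file one is the default, as in the script

with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "one.xml")
    large = os.path.join(tmp, "two.xml")
    with open(small, "w") as f:
        f.write("<a/>")
    with open(large, "w") as f:
        f.write("<a/>" * 10)
    tie_result = pick_master(OTHER, small, large)
    known_result = pick_master(FILE_ONE, small, large)

# Deriving the folder from the file number through one lookup keeps them in sync.
folders = {FILE_ONE: "MergeOne", FILE_TWO: "MergeTwo"}
master_folder = folders[tie_result]
```

Looking the folder up from the chosen file number is one way to rule out the mismatch the debug messages are meant to catch.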
Now that we know which file is bigger, we need to build our output folder. We can start by setting our merge_output variable to a default value of f1folder. If the bigger_file variable is FILE_TWO, then our merge_output folder should be f2folder. We can then complete our merge_output folder by using the AddPaths function to combine it with a folder name made from our defined EXPORT_FOLDER_PREFIX value and the suffix “Combined”. Now that we have a folder name, we can check if it already exists with the IsFolder function. If not, we need to run the CreateFolder function to make it. We can then use the GetLastError function to see if there was an error creating the folder, and if so, we need to display an error message and exit. Once we know the folder exists, we can use our clear_folder function to empty it out, and then use the IsError function to make sure it actually emptied out correctly.
    merge_output = f1folder;  /* set output folder */
    if (bigger_file == FILE_TWO){  /* if file two is bigger */
        merge_output = f2folder;  /* set output folder to file two */
    }
    merge_output = AddPaths(merge_output,"../"+  /* set path for new merged folder */
        EXPORT_FOLDER_PREFIX+"Combined");  /* set path for new merged folder */
    if (IsFolder(merge_output)==false){  /* if output folder doesn't exist */
        CreateFolder(merge_output);  /* create our output folder */
    }
    rc = GetLastError();  /* get the last error */
    if (IsError(rc)){  /* if we cannot create our folder */
        MessageBox('x',"Folder %s does not exist and cannot be created. "+  /* display error */
            "Error: %0x", merge_output,rc);  /* display error */
        return ERROR_EXIT;  /* return with error */
    }
    rc = clear_folder(merge_output);  /* make sure it's empty */
    if (IsError(rc)){  /* check if clear_folder worked */
        MessageBox('x',"Cannot delete files in folder %s. Error: %0x",  /* display error */
            merge_output,rc);  /* display error */
        return rc;  /* return with error */
    }
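For comparison, here is a minimal Python sketch of the same create-then-empty step. It assumes the combined folder sits beside the master folder, as the "../" in the path suggests; prepare_output_folder is a hypothetical name, and Python's exceptions take the place of the GetLastError and IsError checks.

```python
import os
import shutil
import tempfile

EXPORT_FOLDER_PREFIX = "Merge"

def prepare_output_folder(master_folder):
    """Create (if needed) and empty the sibling 'MergeCombined' folder."""
    parent = os.path.dirname(os.path.abspath(master_folder))
    output = os.path.join(parent, EXPORT_FOLDER_PREFIX + "Combined")
    os.makedirs(output, exist_ok=True)  # raises OSError if it cannot be created
    for name in os.listdir(output):
        path = os.path.join(output, name)
        if os.path.isdir(path):
            shutil.rmtree(path)  # remove leftover subfolders
        else:
            os.remove(path)      # remove leftover files
    return output

with tempfile.TemporaryDirectory() as tmp:
    master = os.path.join(tmp, "MergeOne")
    os.makedirs(master)
    stale = os.path.join(tmp, "MergeCombined")
    os.makedirs(stale)
    with open(os.path.join(stale, "old.xml"), "w") as f:
        f.write("stale")
    out = prepare_output_folder(master)
    out_name = os.path.basename(out)
    remaining = os.listdir(out)
```

Emptying the folder before every run means a failed or partial earlier merge cannot leave stale files mixed in with the new output.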
After we know we have our output folder and we know it’s empty, we can update our progress bar with the ProgressSetStatus and ProgressUpdate functions. Then we call our read_file_contents function once for each file to parse the contents of our two instance files. After that, we check if we’re running in debug mode, and if so, run our print_tables debug function so we can see the output from all our tables. Once we do that, we update our progress again. We put in a couple of placeholder “TODO” comments next, because this is where we’re going to add the code to copy our fileset to the output folder and write out our merged instance file. After these actions are complete, we update our progress to finished and close the progress bar. Then we display a success message and close the handles to our file windows.
    ProgressSetStatus("Reading Files");  /* update status */
    ProgressUpdate(4,10);  /* update progress */
    read_file_contents(FILE_ONE,AddPaths(f1folder,f1instance));  /* read file one contents */
    read_file_contents(FILE_TWO,AddPaths(f2folder,f2instance));  /* read file two contents */
    if (DEBUG){  /* if debugging */
        print_tables();  /* print tables */
    }
    ProgressSetStatus("Copying Files");  /* update status */
    ProgressUpdate(7,10);  /* update progress */
    // TODO - COPY FILES //
    // TODO - WRITE INSTANCE FILES //
    ProgressUpdate(10,10);  /* update progress */
    ProgressClose();  /* close progress */
    MessageBox('i',"Files Merged Successfully, see folder %s",  /* display message to user */
        merge_output);  /* display message to user */
    CloseHandle(FileOneWindow);  /* close window handle */
    CloseHandle(FileTwoWindow);  /* close window handle */
    return ERROR_NONE;
}
That concludes our script for this week. So far, it asks the user to pick a pair of files, opens or switches to those files if they are already open, exports them to new folders, compares them to figure out which is bigger, and then reads the contents of the instance files into six hash tables we will use to write out the merged output file next week. It may not look like much got done, but reading input from files and storing it in an intelligent way so it can be used later can be a difficult task programmatically.
Steven Horowitz has been working for Novaworks for over five years as a technical expert with a focus on EDGAR HTML and XBRL. Since the creation of the Legato language in 2015, Steven has been developing scripts to improve the GoFiler user experience. He is currently working toward a Bachelor of Sciences in Software Engineering at RIT and MCC.