With the recent news about the SEC data breach, people are looking at their own internal security practices to make sure they are handling their data responsibly and doing everything possible to prevent a breach of their own. In the financial document preparation and submission industry, one of the most common pieces of private data passed back and forth through email is a filer’s CCC (CIK Confirmation Code). To file a document on behalf of a company, you only need its CIK (Central Index Key, which is publicly available) and its CCC (which is not supposed to be publicly available). It follows that if an unauthorized party obtains a CCC, they could fraudulently file with the SEC on a company’s behalf. So it’s in your best interest to keep your CCC secret!
Friday, September 29, 2017
LDC #54: Protecting Your CCCs With Encryption
By default, GoFiler stores CCCs as plain text in project files, since that is the format the SEC expects during the filing process. GoFiler also creates a copy of every document sent to the SEC, and that copy contains the CCC in plain text as well. To reduce the risk of unauthorized disclosure, these CCCs can be redacted or encrypted after you’re finished with a project. Normally this would mean opening the .gfp or .xml files and removing the data by hand as part of closing out a project. The process is much easier when you have a Legato script that can target a folder and automatically redact or encrypt all the CCCs in it.
Our script this week asks for a folder and an obfuscation method (either permanently redacting the CCC or encrypting it for later decryption and use). It then recursively iterates over every file in the selected folder, looking for GoFiler or EDGAR XML files to obfuscate. It’s important to note that the script detailed here only works for files that have the CCC on the same line as the SGML tag in the XML or project file. This is the case for all projects and XML files created by GoFiler but may not be true for all files that contain EDGAR information. If your organization uses different EDGAR software, the script could be modified to use the more comprehensive SGML parser built into Legato instead of the Mapped Data Object and Word Parse Object this script uses.
Our script:
//
//
//      GoFiler Legato Script - Obfuscate CCCs
//      ------------------------------------------
//
//      Rev     09-29-2017
//
//      (c) 2017 Novaworks, LLC -- All rights reserved.
//
//      Place this and its companion file (.rc) into the Scripts folder of the application. GoFiler must be
//      restarted to have the menu functions added.
//
//      Notes:  Runs from menu hook, prompts user for a folder and obfuscation method, then obscures the CCC
//              in project and XML files it detects in the directory it was pointed at.
//
//

#include "ObfuscateCCC.rc"

#define CCC_REGEX "[#|\\*|\\$|@|a-zA-Z|\\d]{8}"
#define FIELD_WRAP "<!-- EncryptedCCC: %s -->"
#define REDACTED_STRING "<REDACTED>"
#define MD_ENCRYPT 1
#define MD_REDACT 2

void run_obfuscate(int f_id, string mode);  /* Call from Hook Processor */
boolean obfuscate_file(string file, handle log);  /* obfuscates a file. */
boolean is_obfuscatable(string filetype);  /* checks if the filetype is recognized */
string obfuscate_string(string input);  /* obfuscates the string */

int obfuscate_mode;

/****************************************/
void setup(){  /* setup - adds function to menu */
/****************************************/
    string obfuscate[];  /* menu options array */
    string fn;  /* script filename */

    fn = GetScriptFilename();  /* gets the filename of the script */
    obfuscate["Code"] = "OBFUSCATE_CCC";  /* menu code of the function */
    obfuscate["MenuText"] = "Obfuscate CCC";  /* menu description of the function */
    obfuscate["Description"]="<b>Obfuscate CCC</b> Obfuscate CCC ";  /* long description of the function */
    obfuscate["Description"]+="by either redacting or encrypting them.";  /* long description of the function */
    obfuscate["SmallBitmap"] = "IMAGE_ICON";  /* image icon to use */

    MenuAddFunction(obfuscate);  /* adds menu function to tools menu */

    MenuSetHook(obfuscate["Code"],fn,"run_obfuscate");  /* add run function to menu function */
}

/****************************************/
void main(){  /* main - primary program entry pt */
/****************************************/
    if (GetScriptParent()=="LegatoIDE"){  /* if running within legato IDE */
        setup();  /* setup hook on menu */
        run_obfuscate(1,"preprocess");  /* run the actual function */
    }
}

/****************************************/
void run_obfuscate(int f_id, string mode){  /* run_obfuscate - obfuscate CCC */
/****************************************/
    int rc;  /* return code */
    handle log;  /* log handle */
    string msg;  /* dialog message */
    string target;  /* target folder to obscure */
    string files[];  /* array of files to obscure */
    string file_path;  /* path to current file */
    string extension;  /* extension of a file */
    int ix,numfiles,modified;  /* counter, num files, modified files */

    if (mode!="preprocess"){  /* if not running in preprocess */
        return;  /* return and exit */
    }

    rc = DialogBox("ObfuscateDlg", "obfuscate_");  /* open the options dialog */
    if (rc != ERROR_NONE){  /* if the dialog didn't exit normally */
        return;  /* return and exit */
    }

    target = GetSetting("Options","Last Folder");  /* get the target folder to obscure */
    ProgressOpen("Obfuscating CCCs");  /* open a progress bar */
    ProgressSetStatus("Discovering Files");  /* set the message on progress bar */
    files = EnumerateFiles(AddPaths(target,"*.gfp;*.xml"),  /* enumerate files in target folder */
        FOLDER_USE_PROGRESS |  /* set flag to use progress bar */
        (FOLDER_LOAD_FOLDER_NAMES | FOLDER_LOAD_RECURSE));  /* set flag to recurse and load folders */
    numfiles = ArrayGetAxisDepth(files);  /* get the number of files in the array */
    msg = "Discovered %d Possible EDGAR Files. ";  /* set the dialog message */
    msg+= "Obfuscate CCC Codes in Files?";  /* set the dialog message */
    rc = YesNoBox('q',msg,numfiles);  /* display query to user */
    if (rc!=IDYES){  /* if user did anything but press yes */
        return;  /* return and exit */
    }
    ProgressSetStatus("Obfuscating (Press 'ESC' to stop)");  /* set message on progress */
    log = LogCreate("Obfuscate CCCs");  /* create a log */
    for (ix=0;ix<numfiles;ix++){  /* for each file in folder */
        if (ix%10 == 0){  /* on every 10th file */
            ProgressSetStatus(2,"File %d of %d",ix,numfiles);  /* update the progress message */
        }
        rc = ProgressUpdate(ix,numfiles);  /* update the progress bar */
        if (IsError(rc)){  /* if the user cancelled it */
            ProgressClose();  /* close the progress bar */
            msg = "Operation stopped. %d files modified.";  /* set message to user */
            MessageBox('i',msg,modified);  /* display message */
            LogDisplay(log);  /* display the log */
            return;
        }
        file_path = AddPaths(target,files[ix]);  /* get the path to the current file */
        if (ClipFileExtension(file_path)!=""){  /* if the file has an extension */
            if (obfuscate_file(file_path,log)== true){  /* if we made a change in the file */
                modified++;  /* increment number of modified files */
            }
        }
    }
    ProgressClose();  /* close the progress bar */
    AddMessage(log,"Modified %d Files.",modified);  /* display number of files modified. */
    LogDisplay(log);  /* display the log */
}

/****************************************/
boolean is_obfuscatable(string filetype){  /* is_obfuscatable - checks filetype */
/****************************************/
    switch (filetype){  /* switch on the filetype */
        case "FT_XML_SECTION_16":  /* if section 16 */
            return true;  /* return true */
        case "FT_XML_FORM_13H":  /* if form 13h */
            return true;  /* return true */
        case "FT_XML_FORM_C":  /* if form c */
            return true;  /* return true */
        case "FT_XML_FORM_13F":  /* if 13f */
            return true;  /* return true */
        case "FT_XML_FORM_D":  /* if form d */
            return true;  /* return true */
        case "FT_XML_FORM_MA":  /* if form ma */
            return true;  /* return true */
        case "FT_XML_FORM_N_MFP":  /* if form nmfp */
            return true;  /* return true */
        case "FT_XML_FORM_N_SAR":  /* if nsar */
            return true;  /* return true */
        case "FT_XML_EDGAR":  /* if normal EDGAR XML */
            return true;  /* return true */
        case "FT_XFDL":  /* if old school XFDL */
            return true;  /* return true */
        case "FT_GFP_3X_ELO":  /* if GoFiler EDGARLinkOnline */
            return true;  /* return true */
        case "FT_GFP_3X_13H":  /* if 13H Project File */
            return true;  /* return true */
        case "FT_GFP_3X_13F":  /* if 13F Project File */
            return true;  /* return true */
        case "FT_GFP_3X_MA":  /* if MA Project File */
            return true;  /* return true */
        case "FT_GOFILER_PROJECT_3X":  /* if other GoFiler 3.x project file */
            return true;  /* return true */
        case "FT_GOFILER_PROJECT":  /* if old GoFiler Project File */
            return true;  /* return true */
    }
    return false;  /* return false */
}

/****************************************/
boolean obfuscate_file(string path, handle log){  /* obfuscate_file - Obscures ccc in file */
/****************************************/
    handle file;  /* handle to file mapped data obj */
    handle wp;  /* word parser handle */
    string line;  /* a single line of the file */
    string ccc;  /* the CCC in the file */
    boolean modified;  /* did we modify the file? */
    boolean in_tag;  /* is ccc part of an SGML tag? */
    string filetype;  /* filetype of the file as string */
    string msg;  /* message to log */
    int lines_mod;  /* number of lines modified */
    int ix,size,rc;  /* counter, file length, return code */

    modified = false;  /* modified defaults to false */
    lines_mod = 0;  /* number of lines modified */
    filetype = GetFileTypeString(path);  /* get filetype of file */
    if (is_obfuscatable(filetype)==false){  /* if not a recognized obscurable file */
        return false;  /* return false (file not modified) */
    }
    file = OpenMappedTextFile(path);  /* open handle to map data object */
    if (IsValidHandle(file)==false){  /* if we could not get a handle */
        MessageBox('x',"Cannot open file %s",path);  /* display error message */
        return false;  /* return false (file not modified) */
    }
    size = GetLineCount(file);  /* get the length of the file */
    for(ix=0;ix<size;ix++){  /* for each line in the file */
        line = ReadLine(file,ix);  /* read the line */
        if (FindInString(line,"ccc")>0){  /* if the line contains the chars 'ccc' */
            wp = WordParseCreate(WP_SGML_TAG,line);  /* open handle to Wordparser on line */
            ccc = WordParseGetWord(wp);  /* get the first word */
            in_tag = false;  /* reset in_tag to false */
            while (ccc!=""){  /* while the word isn't blank */
                if (FindInString(ccc,"ccc")>0 && IsSGMLTag(ccc)){  /* if CCC is in string, and it's SGML */
                    in_tag = true;  /* ccc is in an SGML tag */
                }
                if (in_tag && IsRegexMatch(ccc,CCC_REGEX)){  /* if the word matches CCC regex */
                    line = ReplaceInString(line,ccc,obfuscate_string(ccc));  /* replace the ccc with obscured one */
                    ReplaceLine(file,ix,line);  /* replace the line in the mapped data obj */
                    lines_mod++;  /* increment number of lines modified */
                    modified = true;  /* mark file as modified */
                }
                ccc = WordParseGetWord(wp);  /* get the next word in the word parser */
            }
        }
    }
    if (modified == true){  /* if we modified the file */
        msg = "Obfuscated file: %s";  /* set message to user */
        AddMessage(log,msg,path);  /* add message to log */
        msg = " CCCs Changed: %d";  /* set message to user */
        AddMessage(log,msg,lines_mod);  /* add message to log */
        MappedTextSave(file,path);  /* save the file */
    }
    CloseHandle(file);  /* close handle to file */
    CloseHandle(wp);  /* close the word parser */
    return modified;  /* return modified status of file (t/f) */
}

/****************************************/
string obfuscate_string(string input){  /* obfuscate_string - obscure ccc */
/****************************************/
    if (obfuscate_mode == MD_REDACT){  /* if we're in mark redacted mode */
        return REDACTED_STRING;  /* return the default redacted string. */
    }
    else{  /* if not in redact mode */
        return FormatString(FIELD_WRAP,EncryptSettingsString(input));  /* return encrypted CCC */
    }
}

/****************************************/
void obfuscate_load(){  /* called on dialog load, populate dlg */
/****************************************/
    string target;  /* target folder to obscure */

    target = GetSetting("Options","Last Folder");  /* get the last folder modified */
    EditSetText(TARGET,target);  /* set the text field on the dialog */
}

/****************************************/
void obfuscate_action(int control, int action){  /* called when user presses anything */
/****************************************/
    string target;  /* target folder to obscure */

    if (control==BROWSE){  /* if user pressed browse */
        target = EditGetText(TARGET);  /* get the target text from dialog */
        target = BrowseFolder("Select Folder to Obfuscate", target);  /* open folder browse for user */
    }
    if (target==""){  /* if the user didn't pick a valid folder */
        return;  /* return */
    }
    EditSetText(TARGET,target);  /* otherwise, set selected folder val */
}

/****************************************/
int obfuscate_validate(){  /* called when user presses 'ok' on dlg */
/****************************************/
    string target;  /* target folder to obscure */
    string msg;  /* message back to user */
    int redact, encrypt;  /* radio button statuses */
    int rc;  /* return code */

    target = EditGetText(TARGET);  /* get folder from dialog */
    if (target == "" || IsPath(target)==false){  /* if it's not a valid path */
        msg = "Please choose a valid folder location and try again.";  /* set response to user */
        MessageBox('x',msg);  /* display error to user */
        return ERROR_EXIT;  /* return with error */
    }
    redact = CheckboxGetState(REDACT);  /* get status of redact radio button */
    encrypt = CheckboxGetState(ENCRYPT);  /* get status of encrypt radio button */
    if (redact == encrypt){  /* if the buttons are the same status */
        MessageBox('x',"Please choose an obfuscation mode");  /* display an error */
        return ERROR_EXIT;  /* return with error */
    }
    if (redact == BST_CHECKED){  /* if redact is checked off */
        msg = "CCC codes are NOT recoverable after redacting. Continue?";  /* make sure user actually means it */
        rc = YesNoBox('x',msg);  /* by displaying a message */
        if (rc!=IDYES){  /* if they didn't press yes */
            return rc;  /* return their response */
        }
        obfuscate_mode = MD_REDACT;  /* store user selection */
    }
    else{  /* if mode is not redacted */
        obfuscate_mode = MD_ENCRYPT;  /* store mode as encrypt */
    }  /* if no errors returned in validation */
    PutSetting("Options","Last Folder",target);  /* store last folder obscured */
    return ERROR_NONE;  /* return no error */
}

#beginresource

IMAGE_ICON BITMAP
{
'42 4D E8 04 00 00 00 00 00 00 36 00 00 00 28 00'
'00 00 14 00 00 00 14 00 00 00 01 00 18 00 00 00'
'00 00 B2 04 00 00 12 0B 00 00 12 0B 00 00 00 00'
'00 00 00 00 00 00 CC 00 FF CC 00 FF CC 00 FF CC'
'00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00'
'FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF'
'CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC'
'00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00'
'FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF'
'CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC'
'00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00'
'FF CC 00 FF 72 72 72 72 72 72 72 72 72 72 72 72'
'72 72 72 72 72 72 72 72 72 72 72 72 72 72 72 72'
'72 72 72 72 72 72 72 72 CC 00 FF CC 00 FF CC 00'
'FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF'
'72 72 72 CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE'
'C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF 72 72'
'72 7E 7E 7E 72 72 72 CC 00 FF CC 00 FF CC 00 FF'
'CC 00 FF CC 00 FF CC 00 FF CC 00 FF 72 72 72 CE'
'C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2'
'FF CE C2 FF CE C2 FF CE C2 FF 72 72 72 CE C2 FF'
'7E 7E 7E 72 72 72 CC 00 FF CC 00 FF CC 00 FF CC'
'00 FF CC 00 FF CC 00 FF 72 72 72 CE C2 FF CE C2'
'FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF'
'CE C2 FF CE C2 FF 72 72 72 CE C2 FF CE C2 FF 7E'
'7E 7E 72 72 72 CC 00 FF CC 00 FF CC 00 FF CC 00'
'FF CC 00 FF 72 72 72 CE C2 FF CE C2 FF 38 5A D6'
'CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE'
'C2 FF 72 72 72 72 72 72 72 72 72 72 72 72 72 72'
'72 CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF'
'72 72 72 CE C2 FF CE C2 FF CE C2 FF 38 5A D6 7A'
'8E D4 CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2'
'FF CE C2 FF CE C2 FF CE C2 FF 72 72 72 CC 00 FF'
'CC 00 FF CC 00 FF CC 00 FF CC 00 FF 72 72 72 CE'
'C2 FF CE C2 FF CE C2 FF 7A 8E D4 38 5A D6 38 5A'
'D6 7A 8E D4 CE C2 FF CE C2 FF CE C2 FF CE C2 FF'
'CE C2 FF CE C2 FF 72 72 72 CC 00 FF CC 00 FF CC'
'00 FF CC 00 FF CC 00 FF 72 72 72 CE C2 FF CE C2'
'FF CE C2 FF CE C2 FF 38 5A D6 38 5A D6 38 5A D6'
'7A 8E D4 CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE'
'C2 FF 72 72 72 CC 00 FF CC 00 FF CC 00 FF CC 00'
'FF CC 00 FF 72 72 72 CE C2 FF CE C2 FF CE C2 FF'
'CE C2 FF 7A 8E D4 38 5A D6 38 5A D6 38 5A D6 7A'
'8E D4 CE C2 FF CE C2 FF CE C2 FF CE C2 FF 72 72'
'72 CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF'
'72 72 72 CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE'
'C2 FF 7A 8E D4 38 5A D6 38 5A D6 38 5A D6 7A 8E'
'D4 CE C2 FF CE C2 FF CE C2 FF 72 72 72 CC 00 FF'
'CC 00 FF CC 00 FF CC 00 FF CC 00 FF 72 72 72 CE'
'C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2'
'FF 7A 8E D4 38 5A D6 38 5A D6 38 5A D6 7A 8E D4'
'CE C2 FF CE C2 FF 72 72 72 CC 00 FF CC 00 FF CC'
'00 FF CC 00 FF CC 00 FF 72 72 72 CE C2 FF CE C2'
'FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF'
'7A 8E D4 38 5A D6 38 5A D6 38 5A D6 7A 8E D4 CE'
'C2 FF 7A 7A 7A CC 00 FF CC 00 FF CC 00 FF CC 00'
'FF CC 00 FF 72 72 72 CE C2 FF CE C2 FF CE C2 FF'
'CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF 7A'
'8E D4 38 5A D6 38 5A D6 38 5A D6 7A 8E D4 72 72'
'72 CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF'
'72 72 72 CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE'
'C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF 7A 8E'
'D4 38 5A D6 38 5A D6 38 5A D6 7A 8E D4 CC 00 FF'
'CC 00 FF CC 00 FF CC 00 FF CC 00 FF 72 72 72 CE'
'C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2'
'FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF 7A 8E D4'
'38 5A D6 38 5A D6 38 5A D6 7A 8E D4 CC 00 FF CC'
'00 FF CC 00 FF CC 00 FF 72 72 72 CE C2 FF CE C2'
'FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF CE C2 FF'
'CE C2 FF CE C2 FF CE C2 FF CE C2 FF 7A 8E D4 38'
'5A D6 38 5A D6 38 5A D6 7A 8E D4 CC 00 FF CC 00'
'FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF'
'CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC'
'00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00'
'FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF'
'CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC'
'00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00'
'FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF CC 00 FF'
'CC 00 FF CC 00 FF 00 00 '
}

#endresource
In addition to all the code above, this script also uses a resource file (.rc) to display a dialog to the user. This file must be named 'ObfuscateCCC.rc', matching the #include line at the top of the script, in order for the script to work properly. The dialog file must be placed in the same folder as the script file. The dialog allows the user to select a folder in which to operate and an obfuscation method: either redaction or encryption. We discuss the Legato functions attached to the dialog’s operation later on. The dialog file contents are:
#define ENCRYPT 202
#define BROWSE_FILES 102
#define BROWSE 102
#define TARGET 101
#define REDACT 201

ObfuscateDlg DIALOG 0, 0, 215, 90
EXSTYLE WS_EX_DLGMODALFRAME
STYLE DS_3DLOOK | DS_MODALFRAME | DS_SETFONT | WS_CAPTION | WS_VISIBLE | WS_POPUP | WS_SYSMENU
CAPTION "Obfuscate CCCs"
FONT 8, "MS Sans Serif"
{
    CONTROL "Mode", -1, "STATIC", SS_LEFT | WS_CHILD | WS_VISIBLE, 6, 4, 26, 8
    CONTROL "", -1, "static", SS_ETCHEDFRAME, 28, 9, 184, 1
    CONTROL "Redact CCC Codes", REDACT, "BUTTON", BS_AUTORADIOBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 12, 18, 80, 8
    CONTROL "Encrypt CCC Codes", ENCRYPT, "BUTTON", BS_AUTORADIOBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 100, 18, 83, 8
    CONTROL "", -1, "static", SS_ETCHEDFRAME, 5, 67, 207, 1
    CONTROL "Start", IDOK, "BUTTON", BS_DEFPUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 97, 72, 50, 14
    CONTROL "Cancel", IDCANCEL, "BUTTON", BS_PUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 157, 72, 50, 14
    CONTROL "", TARGET, "edit", ES_LEFT | ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 12, 47, 144, 12, 0
    CONTROL "Browse...", BROWSE, "button", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 158, 46, 50, 14, 0
    CONTROL "Target Folder", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE, 6, 31, 54, 8, 0
    CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 52, 36, 160, 1, 0
}
This script has five defined values in it. CCC_REGEX is a regular expression that looks for CCC-like text. This expression matches any eight characters that are letters, numbers, or allowable CCC symbols (‘*’, ‘$’, ‘@’, or ‘#’). (As written, the character class also contains ‘|’, which most regular expression dialects treat as a literal character inside brackets rather than as alternation, so a pipe would be accepted as well.) The expression could match non-CCC things too, but combined with the other checks we do, we can be confident we’re only altering CCCs. FIELD_WRAP is used to wrap around encrypted CCC values. GoFiler 4.20b and later will be able to open project files and XML files with encrypted CCCs and handle them normally. GoFiler 4.20a and earlier will just ignore them. REDACTED_STRING is the value that is placed into the file if the CCC is being redacted instead of encrypted. This effectively deletes the CCC from all files and makes it unrecoverable, so it should only be used if you don’t ever want to recover the CCC values. MD_ENCRYPT and MD_REDACT are there to make the code easier to read, so we don’t have statements like “if (mode==1)”, which would be much harder for someone reading the code to understand than the equivalent statement “if (mode==MD_ENCRYPT)”.
#define CCC_REGEX "[#|\\*|\\$|@|a-zA-Z|\\d]{8}"
#define FIELD_WRAP "<!-- EncryptedCCC: %s -->"
#define REDACTED_STRING "<REDACTED>"
#define MD_ENCRYPT 1
#define MD_REDACT 2
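To see what the CCC pattern accepts, the expression can be tried outside GoFiler. The sketch below is Python rather than Legato, and it assumes the match is applied to the whole word (whether Legato's IsRegexMatch behaves that way is not confirmed here). The TIGHTENED pattern and the is_ccc_like helper are hypothetical names introduced only for this illustration.

```python
import re

# Pattern copied from the script's CCC_REGEX define (C-string escaping removed).
ORIGINAL = re.compile(r"[#|\*|\$|@|a-zA-Z|\d]{8}")

# Hypothetical tightened variant: same symbols, without '|' inside the class.
TIGHTENED = re.compile(r"[#*$@a-zA-Z\d]{8}")


def is_ccc_like(word: str, pattern: re.Pattern = TIGHTENED) -> bool:
    """Return True when the whole word is exactly eight allowed characters.

    Assumption: a full-word match is intended; this is not taken from Legato docs.
    """
    return pattern.fullmatch(word) is not None


if __name__ == "__main__":
    print(is_ccc_like("abc#1234"))            # True
    print(is_ccc_like("ab|c1234", ORIGINAL))  # True: '|' is literal inside [...]
    print(is_ccc_like("ab|c1234"))            # False
    print(is_ccc_like("document"))            # True: any 8-letter word matches
```

The last call shows why the script does not rely on the pattern alone: an ordinary eight-letter word passes, so the surrounding tag check is what keeps the replacement limited to CCC fields.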
Our setup function is very similar to the other setup functions we’ve used in the past in this blog. The only difference here is the use of the return type void. Starting in GoFiler 4.20b, all user-defined Legato functions must have a declared return type, and they must return a value of that type. So if int were used as the return type, we’d need to add a return of an int at the bottom of the function. Because this function just sets up our script and doesn’t actually return anything, we want to use void instead of returning a useless value.
/****************************************/
void setup(){  /* setup - adds function to menu */
/****************************************/
    string obfuscate[];  /* menu options array */
    string fn;  /* script filename */

    fn = GetScriptFilename();  /* gets the filename of the script */
    obfuscate["Code"] = "OBFUSCATE_CCC";  /* menu code of the function */
    obfuscate["MenuText"] = "Obfuscate CCC";  /* menu description of the function */
    obfuscate["Description"]="<b>Obfuscate CCC</b> Obfuscate CCC ";  /* long description of the function */
    obfuscate["Description"]+="by either redacting or encrypting them.";  /* long description of the function */
    obfuscate["SmallBitmap"] = "IMAGE_ICON";  /* image icon to use */

    MenuAddFunction(obfuscate);  /* adds menu function to tools menu */

    MenuSetHook(obfuscate["Code"],fn,"run_obfuscate");  /* add run function to menu function */
}
The main function is also void, because it doesn’t need to return anything. This version of main includes an if statement that checks to see if the parent script is the LegatoIDE, which means we’re running it in the development environment. If it’s in this environment, it will run setup and then execute the run_obfuscate function, which is the primary function of the script. Otherwise, it does nothing. This makes it a lot easier to debug because we can run the function in the development environment with breakpoints and logging that’s easier to use.
/****************************************/
void main(){  /* main - primary program entry pt */
/****************************************/
    if (GetScriptParent()=="LegatoIDE"){  /* if running within legato IDE */
        setup();  /* setup hook on menu */
        run_obfuscate(1,"preprocess");  /* run the actual function */
    }
}
The run_obfuscate function is the primary function of the script, the one that actually does all the work. The return type is void, so instead of returning ERROR_NONE or something similar, we just return with no value. This can happen right away if the script is not running in preprocess mode. If it passes that check, it opens the dialog box specified in the resource file to ask the user which folder should be obscured. If the dialog exits in a state other than ERROR_NONE, the function returns and ends. Otherwise, we continue and get the target folder from the settings file. Next we open a progress bar, and after that we can use the EnumerateFiles function to get a list of all files in the target directory and store them in an array. EnumerateFiles uses our progress bar to show how many files are being searched. The search can take quite a bit of time if it has to go through a lot of folders, so a progress bar is a good way to show the user the program is still working and not stuck.
After we have our list of files, we can retrieve the number of files, alert the user to the extent of possible changes, and ask whether they want the script to continue. If they choose anything other than yes, the function returns and exits.
/****************************************/
void run_obfuscate(int f_id, string mode){  /* run_obfuscate - obfuscate CCC */
/****************************************/
    int rc;  /* return code */
    handle log;  /* log handle */
    string msg;  /* dialog message */
    string target;  /* target folder to obscure */
    string files[];  /* array of files to obscure */
    string file_path;  /* path to current file */
    string extension;  /* extension of a file */
    int ix,numfiles,modified;  /* counter, num files, modified files */

    if (mode!="preprocess"){  /* if not running in preprocess */
        return;  /* return and exit */
    }

    rc = DialogBox("ObfuscateDlg", "obfuscate_");  /* open the options dialog */
    if (rc != ERROR_NONE){  /* if the dialog didn't exit normally */
        return;  /* return and exit */
    }

    target = GetSetting("Options","Last Folder");  /* get the target folder to obscure */
    ProgressOpen("Obfuscating CCCs");  /* open a progress bar */
    ProgressSetStatus("Discovering Files");  /* set the message on progress bar */
    files = EnumerateFiles(AddPaths(target,"*.gfp;*.xml"),  /* enumerate files in target folder */
        FOLDER_USE_PROGRESS |  /* set flag to use progress bar */
        (FOLDER_LOAD_FOLDER_NAMES | FOLDER_LOAD_RECURSE));  /* set flag to recurse and load folders */
    numfiles = ArrayGetAxisDepth(files);  /* get the number of files in the array */
    msg = "Discovered %d Possible EDGAR Files. ";  /* set the dialog message */
    msg+= "Obfuscate CCC Codes in Files?";  /* set the dialog message */
    rc = YesNoBox('q',msg,numfiles);  /* display query to user */
    if (rc!=IDYES){  /* if user did anything but press yes */
        return;  /* return and exit */
    }
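For readers who want to preview which files a run would touch before pointing the script at a folder, the discovery step can be approximated outside GoFiler. This is a rough Python sketch, not the Legato EnumerateFiles call: find_candidates and PATTERNS are hypothetical names, and it only mirrors the "*.gfp;*.xml" recursive search described above.

```python
from pathlib import Path

# Same two extensions the script passes to EnumerateFiles.
PATTERNS = ("*.gfp", "*.xml")


def find_candidates(target: str) -> list[Path]:
    """Recursively collect files under target whose names match PATTERNS."""
    root = Path(target)
    found: list[Path] = []
    for pattern in PATTERNS:
        # rglob walks every subfolder; is_file() drops directories that
        # happen to have a matching name.
        found.extend(p for p in root.rglob(pattern) if p.is_file())
    return sorted(found)


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    files = find_candidates(folder)
    print(f"Discovered {len(files)} possible EDGAR files under {folder}")
```

Note that this only filters by extension. The script goes further and checks the detected file type of each candidate before changing anything.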
Next, we update the status, create a log, and start iterating over each file in our array of filenames. Every tenth file, we use the ProgressSetStatus function to change the status display of our progress bar. For each file, we update the progress and check the return code. If the user pressed ‘ESC’ on the dialog, the ProgressUpdate function will return an error, so we can exit the script gracefully by closing the progress window, displaying a message to the user, displaying the log, and returning. Otherwise, we keep working on the file.
Using the AddPaths function, we get the fully qualified path to the target file. If the file extension of our file isn’t blank (for example, we might have a folder name, not a file name) we can pass the name of the file to our obfuscate_file function, which actually processes the file. If that function returns true, it means the file was modified, so we can increment our modified files counter. Once we’ve done this for every file, we can close progress, add a message to the log, and display it.
    ProgressSetStatus("Obfuscating (Press 'ESC' to stop)");  /* set message on progress */
    log = LogCreate("Obfuscate CCCs");  /* create a log */
    for (ix=0;ix<numfiles;ix++){  /* for each file in folder */
        if (ix%10 == 0){  /* on every 10th file */
            ProgressSetStatus(2,"File %d of %d",ix,numfiles);  /* update the progress message */
        }
        rc = ProgressUpdate(ix,numfiles);  /* update the progress bar */
        if (IsError(rc)){  /* if the user cancelled it */
            ProgressClose();  /* close the progress bar */
            msg = "Operation stopped. %d files modified.";  /* set message to user */
            MessageBox('i',msg,modified);  /* display message */
            LogDisplay(log);  /* display the log */
            return;
        }
        file_path = AddPaths(target,files[ix]);  /* get the path to the current file */
        if (ClipFileExtension(file_path)!=""){  /* if the file has an extension */
            if (obfuscate_file(file_path,log)== true){  /* if we made a change in the file */
                modified++;  /* increment number of modified files */
            }
        }
    }
    ProgressClose();  /* close the progress bar */
    AddMessage(log,"Modified %d Files.",modified);  /* display number of files modified. */
    LogDisplay(log);  /* display the log */
}
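The control flow of this loop (status text every tenth file, a cancel check on every file, a running count of modified files) is independent of GoFiler and can be sketched on its own. The Python below is only an illustration of that shape: process_all and its handle, cancelled, and report callbacks are hypothetical stand-ins for obfuscate_file, the ProgressUpdate error check, and the progress/log calls.

```python
from typing import Callable


def process_all(paths: list[str],
                handle: Callable[[str], bool],
                cancelled: Callable[[], bool],
                report: Callable[[str], None] = print) -> int:
    """Run handle() on each path and return how many reported a change."""
    modified = 0
    total = len(paths)
    for ix, path in enumerate(paths):
        if ix % 10 == 0:
            # Refresh the status text only every tenth file.
            report(f"File {ix} of {total}")
        if cancelled():
            # Stand-in for ProgressUpdate returning an error on ESC.
            report(f"Operation stopped. {modified} files modified.")
            return modified
        if handle(path):
            modified += 1
    report(f"Modified {modified} Files.")
    return modified


if __name__ == "__main__":
    demo = [f"file{i}.xml" for i in range(25)]
    # Pretend every file whose name contains '3' had a CCC to obscure.
    process_all(demo, handle=lambda p: "3" in p, cancelled=lambda: False)
```

Checking for cancellation before handling each file means a stop request never interrupts a file halfway through, which matches how the script only tests the return code between files.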
is_obfuscatable is a very basic function. It simply takes a name of a file type and returns true if it’s something that should be obscured by this script. It does this by using a long switch statement. These values are all defined in the Legato SDK.
/****************************************/
boolean is_obfuscatable(string filetype){  /* is_obfuscatable - checks filetype */
/****************************************/
    switch (filetype){  /* switch on the filetype */
        case "FT_XML_SECTION_16":  /* if section 16 */
            return true;  /* return true */
        case "FT_XML_FORM_13H":  /* if form 13h */
            return true;  /* return true */
        case "FT_XML_FORM_C":  /* if form c */
            return true;  /* return true */
        case "FT_XML_FORM_13F":  /* if 13f */
            return true;  /* return true */
        case "FT_XML_FORM_D":  /* if form d */
            return true;  /* return true */
        case "FT_XML_FORM_MA":  /* if form ma */
            return true;  /* return true */
        case "FT_XML_FORM_N_MFP":  /* if form nmfp */
            return true;  /* return true */
        case "FT_XML_FORM_N_SAR":  /* if nsar */
            return true;  /* return true */
        case "FT_XML_EDGAR":  /* if normal EDGAR XML */
            return true;  /* return true */
        case "FT_XFDL":  /* if old school XFDL */
            return true;  /* return true */
        case "FT_GFP_3X_ELO":  /* if GoFiler EDGARLinkOnline */
            return true;  /* return true */
        case "FT_GFP_3X_13H":  /* if 13H Project File */
            return true;  /* return true */
        case "FT_GFP_3X_13F":  /* if 13F Project File */
            return true;  /* return true */
        case "FT_GFP_3X_MA":  /* if MA Project File */
            return true;  /* return true */
        case "FT_GOFILER_PROJECT_3X":  /* if other GoFiler 3.x project file */
            return true;  /* return true */
        case "FT_GOFILER_PROJECT":  /* if old GoFiler Project File */
            return true;  /* return true */
    }
    return false;  /* return false */
}
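The switch statement is effectively a membership test against a fixed list of type names. As a comparison only, here is how the same check looks as a set lookup in Python. The type strings are copied from the switch above; OBFUSCATABLE_TYPES is a hypothetical name, and this is not a suggestion that Legato offers an equivalent construct.

```python
# File type names copied from the script's switch statement.
OBFUSCATABLE_TYPES = frozenset({
    "FT_XML_SECTION_16", "FT_XML_FORM_13H", "FT_XML_FORM_C", "FT_XML_FORM_13F",
    "FT_XML_FORM_D", "FT_XML_FORM_MA", "FT_XML_FORM_N_MFP", "FT_XML_FORM_N_SAR",
    "FT_XML_EDGAR", "FT_XFDL",
    "FT_GFP_3X_ELO", "FT_GFP_3X_13H", "FT_GFP_3X_13F", "FT_GFP_3X_MA",
    "FT_GOFILER_PROJECT_3X", "FT_GOFILER_PROJECT",
})


def is_obfuscatable(filetype: str) -> bool:
    """Return True when the detected file type is one the script should process."""
    return filetype in OBFUSCATABLE_TYPES


if __name__ == "__main__":
    print(is_obfuscatable("FT_XML_EDGAR"))  # True
    print(is_obfuscatable("FT_HTML"))       # False
```

Either form makes the same point: the script works from an explicit allow-list, so a file type that is not named here is never modified.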
The function obfuscate_file handles the file passed to it by the run_obfuscate function. The first thing this function does is get the file type of the file using the GetFileTypeString function. It then passes the type to is_obfuscatable to check whether it needs to do anything with this file. If not, it returns false. Otherwise, it continues on. Next, it opens the file as a Mapped Data Object using the OpenMappedTextFile function. It then checks to make sure the file opened correctly by using the IsValidHandle function. If the open operation failed, the function returns false. Otherwise, we get the number of lines in the file and loop through all lines. For each line, we check if the characters “ccc” appear in the line. If they do, we create a Word Parse Object using the WordParseCreate function to help make sense of the line. While we have a next word in the line (the WordParseGetWord function returns an empty string when there’s no next word), we check to see if the “ccc” character string is in the word and if that word is an SGML tag. If this is true, it means that we’re probably parsing a line that contains our CCC value, so we set the in_tag flag to true.
If the in_tag flag has been set to true, and the word we’re parsing matches our CCC_REGEX regular expression, we’ve got a CCC to replace. We use the ReplaceInString function and obfuscate_string to build a new line. Then we use the ReplaceLine function to place our new line into our file. Afterward, we increment the number of lines modified, set the modified flag to true, and move on to the next word in our Word Parser. Once we’ve looked at every line in the file, if the file was modified, we add messages to the log naming the file and the number of CCCs changed, and use the MappedTextSave function to save our changes. Then we close the handles we opened. Finally, we return our modified flag.
                                                                        /****************************************/
boolean obfuscate_file(string path, handle log){                        /* obfuscate_file - Obscures ccc in file */
                                                                        /****************************************/
    handle file;                                                        /* handle to file mapped data obj */
    handle wp;                                                          /* word parser handle */
    string line;                                                        /* a single line of the file */
    string ccc;                                                         /* the CCC in the file */
    boolean modified;                                                   /* did we modify the file? */
    boolean in_tag;                                                     /* is ccc part of an SGML tag? */
    string filetype;                                                    /* filetype of the file as string */
    string msg;                                                         /* message to log */
    int lines_mod;                                                      /* number of lines modified */
    int ix,size,rc;                                                     /* counter, file length, return code */

    modified = false;                                                   /* modified defaults to false */
    lines_mod = 0;                                                      /* number of lines modified */
    filetype = GetFileTypeString(path);                                 /* get filetype of file */
    if (is_obfuscatable(filetype)==false){                              /* if not a recognized obscurable file */
        return false;                                                   /* return false (file not modified) */
    }
    file = OpenMappedTextFile(path);                                    /* open handle to mapped data object */
    if (IsValidHandle(file)==false){                                    /* if we could not get a handle */
        MessageBox('x',"Cannot open file %s",path);                     /* display error message */
        return false;                                                   /* return false (file not modified) */
    }
    size = GetLineCount(file);                                          /* get the length of the file */
    for(ix=0;ix<size;ix++){                                             /* for each line in the file */
        line = ReadLine(file,ix);                                       /* read the line */
        if (FindInString(line,"ccc")>0){                                /* if the line contains the chars 'ccc' */
            wp = WordParseCreate(WP_SGML_TAG,line);                     /* open handle to word parser on line */
            ccc = WordParseGetWord(wp);                                 /* get the first word */
            in_tag = false;                                             /* reset in_tag to false */
            while (ccc!=""){                                            /* while the word isn't blank */
                if (FindInString(ccc,"ccc")>0 && IsSGMLTag(ccc)){       /* if CCC is in string, and it's SGML */
                    in_tag = true;                                      /* ccc is in an SGML tag */
                }
                if (in_tag && IsRegexMatch(ccc,CCC_REGEX)){             /* if the word matches CCC regex */
                    line = ReplaceInString(line,ccc,obfuscate_string(ccc));  /* replace the ccc with obscured one */
                    ReplaceLine(file,ix,line);                          /* replace the line in the mapped data obj */
                    lines_mod++;                                        /* increment number of lines modified */
                    modified = true;                                    /* mark file as modified */
                }
                ccc = WordParseGetWord(wp);                             /* get the next word in the word parser */
            }
            CloseHandle(wp);                                            /* close this line's word parser */
        }
    }
    if (modified == true){                                              /* if we modified the file */
        msg = "Obfuscated file: %s";                                    /* set message to user */
        AddMessage(log,msg,path);                                       /* add message to log */
        msg = " CCCs Changed: %d";                                      /* set message to user */
        AddMessage(log,msg,lines_mod);                                  /* add message to log */
        MappedTextSave(file,path);                                      /* save the file */
    }
    CloseHandle(file);                                                  /* close handle to file */
    return modified;                                                    /* return modified status of file (t/f) */
}
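For readers who want to experiment with the scanning logic outside GoFiler, here is a minimal Python sketch of the same idea. It is not Legato and not the script's actual implementation. The CCC_REGEX pattern (eight characters drawn from letters, digits, and @ # $ *) and the simple tokenizer are assumptions standing in for the script's own CCC_REGEX define and the Word Parse Object, so the real behavior may differ.

```python
import re

# Assumption: a CCC is 8 characters from letters, digits, and @ # $ *.
# The script's real CCC_REGEX is defined elsewhere and may differ.
CCC_REGEX = re.compile(r"[A-Za-z0-9@#$*]{8}")

# Assumption: a crude stand-in for the Word Parse Object. It splits a line
# into SGML-style tags (<...>) and runs of text between them.
TOKEN_REGEX = re.compile(r"<[^>]*>|[^<\s]+")


def obfuscate_lines(lines, obfuscate):
    """Return (new_lines, changed_count).

    A value is replaced only when it looks like a CCC and follows a tag
    containing 'ccc' on the same line, mirroring the in_tag flag.
    """
    out = []
    changed = 0
    for line in lines:
        if "ccc" in line:                      # cheap pre-filter per line
            in_tag = False
            for word in TOKEN_REGEX.findall(line):
                is_tag = word.startswith("<") and word.endswith(">")
                if is_tag and "ccc" in word:
                    in_tag = True              # a ccc tag has been seen
                elif in_tag and not is_tag and CCC_REGEX.fullmatch(word):
                    line = line.replace(word, obfuscate(word))
                    changed += 1
        out.append(line)
    return out, changed


sample = [
    "<cik>0001234567</cik>",
    "<ccc>abc#1234</ccc>",
    "accc note abc#1234",                      # contains 'ccc' but no tag
]
result, count = obfuscate_lines(sample, lambda value: "[REDACTED]")
```

Only the second sample line changes, because it is the only one where a CCC-shaped value follows a tag containing “ccc”. As in the Legato script, a value that sits on a different line from its tag would be missed.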
The obfuscate_string function is pretty simple. It takes an input string and, depending on the value of the global obfuscate_mode variable, does one of two things: it returns our REDACTED_STRING define, or it encrypts the CCC with the EncryptSettingsString function, wraps the result with our FIELD_WRAP define, and returns that. For security purposes, the output of EncryptSettingsString cannot be reversed by the end user: GoFiler can decrypt it for internal use, but it will not decrypt the value and hand it back to you.
                                                                        /****************************************/
string obfuscate_string(string input){                                  /* obfuscate_string - obscure ccc */
                                                                        /****************************************/
    if (obfuscate_mode == MD_REDACT){                                   /* if we're in mark redacted mode */
        return REDACTED_STRING;                                         /* return the default redacted string. */
    }
    else{                                                               /* if not in redact mode */
        return FormatString(FIELD_WRAP,EncryptSettingsString(input));   /* return encrypted CCC */
    }
}
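The two modes can be illustrated with a small Python sketch. Everything here is a stand-in: the REDACTED_STRING and FIELD_WRAP values are hypothetical placeholders (the script defines its own), and placeholder_encrypt uses Base64 purely so the example runs. Base64 is an encoding, not encryption, and is not what EncryptSettingsString does.

```python
import base64
from enum import Enum


class Mode(Enum):
    REDACT = "redact"
    ENCRYPT = "encrypt"


REDACTED_STRING = "[REDACTED]"   # hypothetical value, not the script's define
FIELD_WRAP = "ENC(%s)"           # hypothetical wrapper format


def placeholder_encrypt(value: str) -> str:
    """Stand-in for EncryptSettingsString. Base64 is NOT encryption."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def obfuscate_string(value: str, mode: Mode, encrypt=placeholder_encrypt) -> str:
    """Return the redaction marker, or the wrapped output of encrypt()."""
    if mode is Mode.REDACT:
        return REDACTED_STRING           # original value is discarded
    return FIELD_WRAP % encrypt(value)   # wrapped so it can be recognized later


redacted = obfuscate_string("abc#1234", Mode.REDACT)
wrapped = obfuscate_string("abc#1234", Mode.ENCRYPT)
```

Unlike the Legato version, this sketch passes the mode as a parameter rather than reading a global, which makes the function easier to test in isolation.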
The last three functions, obfuscate_load, obfuscate_validate, and obfuscate_action, are all dialog functions. When the DialogBox function is called by the run_obfuscate function, it’s passed the prefix “obfuscate_”. This means these functions are called at certain points in the dialog’s cycle. obfuscate_load is called prior to the load process, and it gets the last folder from the settings file and enters it into the dialog for the user. The obfuscate_action function is called when a user does anything with the dialog. The “Start” and “Cancel” buttons have their own predefined actions, so we don’t have to handle those. The only action we care about, then, is the user pressing “Browse…”, so we check if the control is the Browse button. If so, the function opens the folder browser for the user to pick a folder. If the user has picked a valid folder, it sets the value into the dialog.
The obfuscate_validate function is called when the user presses the “Start” button. This happens without us having to do anything. It’s a predefined behavior of the dialog class when IDOK is registered from the “Start” button. The validation function checks to make sure the selected path isn’t blank and actually exists on the machine. It then ensures exactly one of the two radio buttons was selected so we have an obfuscation method to use. If the user chose to redact, we warn that redaction is permanent and ask for confirmation; if the user doesn’t answer yes, we return that response instead of continuing. Then we set the global obfuscate_mode value for later use, store the folder to go through in the settings file, and return without error.
                                                                        /****************************************/
void obfuscate_load(){                                                  /* called on dialog load, populate dlg */
                                                                        /****************************************/
    string target;                                                      /* target folder to obscure */

    target = GetSetting("Options","Last Folder");                       /* get the last folder modified */
    EditSetText(TARGET,target);                                         /* set the text field on the dialog */
}

                                                                        /****************************************/
void obfuscate_action(int control, int action){                         /* called when user presses anything */
                                                                        /****************************************/
    string target;                                                      /* target folder to obscure */

    if (control==BROWSE){                                               /* if user pressed browse */
        target = EditGetText(TARGET);                                   /* get the target text from dialog */
        target = BrowseFolder("Select Folder to Obfuscate", target);    /* open folder browse for user */
    }
    if (target==""){                                                    /* if the user didn't pick a valid folder */
        return;                                                         /* return */
    }
    EditSetText(TARGET,target);                                         /* otherwise, set selected folder val */
}

                                                                        /****************************************/
int obfuscate_validate(){                                               /* called when user presses 'ok' on dlg */
                                                                        /****************************************/
    string target;                                                      /* target folder to obscure */
    string msg;                                                         /* message back to user */
    int redact, encrypt;                                                /* radio button statuses */
    int rc;                                                             /* return code */

    target = EditGetText(TARGET);                                       /* get folder from dialog */
    if (target == "" || IsPath(target)==false){                         /* if it's not a valid path */
        msg = "Please choose a valid folder location and try again.";   /* set response to user */
        MessageBox('x',msg);                                            /* display error to user */
        return ERROR_EXIT;                                              /* return with error */
    }
    redact = CheckboxGetState(REDACT);                                  /* get status of redact radio button */
    encrypt = CheckboxGetState(ENCRYPT);                                /* get status of encrypt radio button */
    if (redact == encrypt){                                             /* if the buttons are the same status */
        MessageBox('x',"Please choose an obfuscation mode");            /* display an error */
        return ERROR_EXIT;                                              /* return with error */
    }
    if (redact == BST_CHECKED){                                         /* if redact is checked */
        msg = "CCC codes are NOT recoverable after redacting. Continue?";  /* make sure user actually means it */
        rc = YesNoBox('x',msg);                                         /* by displaying a message */
        if (rc!=IDYES){                                                 /* if they didn't press yes */
            return rc;                                                  /* return their response */
        }
        obfuscate_mode = MD_REDACT;                                     /* store user selection */
    }
    else{                                                               /* if mode is not redacted */
        obfuscate_mode = MD_ENCRYPT;                                    /* store mode as encrypt */
    }                                                                   /* if no errors returned in validation */
    PutSetting("Options","Last Folder",target);                         /* store last folder obscured */
    return ERROR_NONE;                                                  /* return no error */
}
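The order of checks in the validation routine can be sketched in Python as follows. This is an illustration only: the validate function, its return shape, and the confirm callback are hypothetical, and os.path.isdir is assumed to play the role that IsPath plays in the Legato script.

```python
import os


def validate(target, redact_checked, encrypt_checked, confirm):
    """Return (mode, error). Exactly one of the two is None.

    confirm is a callable that shows a yes/no prompt and returns True
    for yes; it stands in for the dialog's YesNoBox call.
    """
    # 1. The folder must be non-blank and must exist.
    if not target or not os.path.isdir(target):
        return None, "Please choose a valid folder location and try again."
    # 2. Exactly one mode must be selected (both or neither is an error).
    if redact_checked == encrypt_checked:
        return None, "Please choose an obfuscation mode"
    # 3. Redaction is permanent, so ask before committing to it.
    if redact_checked:
        if not confirm("CCC codes are NOT recoverable after redacting. Continue?"):
            return None, "cancelled"
        return "redact", None
    return "encrypt", None


here = os.getcwd()  # any folder that is known to exist
blank = validate("", True, False, lambda message: True)
both = validate(here, True, True, lambda message: True)
declined = validate(here, True, False, lambda message: False)
accepted = validate(here, True, False, lambda message: True)
encrypting = validate(here, False, True, lambda message: True)
```

Comparing the two button states for equality catches both invalid cases in one test, which is the same trick the Legato code uses with `redact == encrypt`.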
This script does a good job obscuring large batches of files in a given folder. Because the EnumerateFiles function searches recursively, you could just point this script at your completed jobs folder and let ’er rip. All CCC codes inside documents within that folder will be encrypted. If the jobs don’t need to be recovered ever, you could also redact the CCCs instead of encrypting them and never have to worry about them again.
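To get a feel for what a recursive run over a jobs folder would touch, the traversal can be approximated in Python. This is a rough sketch and not how EnumerateFiles works internally. Filtering by the .gfp and .xml extensions is an assumption that stands in for the script's GetFileTypeString check, and the demo helper and its folder names are invented for illustration.

```python
import os
import tempfile

# Assumption: extensions approximate the script's file-type check.
CANDIDATE_EXTENSIONS = {".gfp", ".xml"}


def find_candidates(root):
    """Yield every file under root, at any depth, with a candidate extension."""
    for folder, _subdirs, names in os.walk(root):
        for name in sorted(names):
            if os.path.splitext(name)[1].lower() in CANDIDATE_EXTENSIONS:
                yield os.path.join(folder, name)


def demo():
    """Build a throwaway folder tree and list the files a run would visit."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "2017", "job1"))
        for rel in (
            "a.gfp",
            os.path.join("2017", "job1", "b.xml"),
            os.path.join("2017", "notes.txt"),   # skipped: not a candidate
        ):
            with open(os.path.join(root, rel), "w") as handle:
                handle.write("")
        return sorted(os.path.relpath(p, root) for p in find_candidates(root))


visited = demo()
```

Listing the candidates first, before changing anything, is a cheap way to confirm the chosen folder contains only finished jobs, which matters most when redacting.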
To increase the security of GoFiler when it comes to CCC values even further, more scripts could be added going forward. For example, a post-process script could trigger after a live or test filing to obscure the CCCs in the created files. A post-process hook could also be added to FILE_LIVE that, after a successful filing, closes the project file and encrypts it, since the job has been live filed and is presumably done. There are a lot of things Legato can do to make your process more secure and give you some peace of mind that everything that can be done to protect your data is being done.
Steven Horowitz has been working for Novaworks for over five years as a technical expert with a focus on EDGAR HTML and XBRL. Since the creation of the Legato language in 2015, Steven has been developing scripts to improve the GoFiler user experience. He is currently working toward a Bachelor of Sciences in Software Engineering at RIT and MCC.