Friday, May 24, 2019
LDC #137: Creating background process to convert HTML files in Legato
Last week, in LDC #136, Josh discussed how we can use the ConvertFile function to convert files from their base format into HTML. This week, we’re going to take that idea one step further and create a script that can start and stop a background process that monitors a folder for files to convert. If it finds a file that hasn’t been converted, or has been modified since it was last converted, the background process will convert the file into HTML for us automatically. This effectively creates a “hot” folder that is monitored for jobs to do.
When we’re creating a background process like this, it’s important to consider that multiple copies of GoFiler could potentially be running, and it’s possible for a user to try to run more than one background process to monitor a single folder. To counter this, I decided to use a “lock file”. When the script runs in the background, it grabs a handle to the lock file. While that file is held, no second copy of this script can be run on that folder, because it will fail to delete the lock file to create its own. This does mean that if you run the “start” function on this script twice in a row, you may see a “starting” message and a “failed” message at the same time, because you are effectively trying to start two scripts at once. The first will run and start successfully; the second will fail to start and give an error message. This could be resolved by using session variables instead of a lock file to ensure only one process can run at a time, but since the second process simply terminates with an error while the first keeps running normally, I don’t think it’s a problem in our case.
This script is also going to be broken into two files. The file “autoconvert.ms” is the menu hook file, that sets up the menu options to start and stop the script, and actually creates the background process that runs and monitors the folder. The file “monitor.ls” defines that background process, and handles the actual conversions. The two files can communicate with each other by using GoFiler session variables. The run function in “autoconvert.ms” sets a session value to true, and the stop function sets that value to false. The background process will run every second, and each time it checks that value. If the value has been switched to false, then it knows the user wants to end the monitor, so it can stop executing.
In the “autoconvert.ms” file, the main and setup functions are basically identical to the main and setup functions in most scripts, the main difference being that setup defines two hooks (run and stop) instead of one. Let’s start by looking at the run function, which triggers the monitor to start.
    int run(int f_id, string mode){
        string lockfile;
        string contents;
        string folder;
        string last_folder;
        handle hLockfile;
        int rc;

        if(mode != "preprocess"){
            return ERROR_NONE;
        }
        last_folder = GetSetting("History","Target");
        folder = BrowseFolder("Select Folder to Monitor",last_folder);
        rc = GetLastError();
        if(IsError(rc)){
            return rc;
        }
        PutSetting("History","Target",folder);
The run function starts by making sure the mode is preprocess or it returns, to ensure the script only runs once. Then, it gets the last folder selected from a settings file, and asks the user to browse to the folder to monitor. If the user hit cancel or there was an error of any kind, the script just exits here. Otherwise, it stores the folder in a settings file so it can be the base folder it opens to next time, and continues executing the script.
        // ensure only one person is running this script on a folder at the same time.
        lockfile = AddPaths(folder,LOCKFILE_FN);
        if(DoesFileExist(lockfile)){
            rc = DeleteFile(lockfile);
            if(IsError(rc)){
                contents = FileToString(lockfile);
                MessageBox('x',"Cannot start monitor, folder currently being monitored by %s",contents);
                return ERROR_EXIT;
            }
        }
        StringToFile(GetUserName(),lockfile);
        SetSessionValue("Monitor","true");
        RunBackgroundScript("Monitor.ls","monitor",lockfile);
        return ERROR_NONE;
    }
Now we can handle the lock file. We can get a path to it with AddPaths, and then check to see if the file actually exists. If the file exists, we can try to delete it. If that fails, then it means something is holding the file open, so we can display an error message to the user saying who is using it. If we were able to delete it, or there was no file there in the first place, we can write out our username as the contents of our lock file. After we’ve written out the lock file, we can set the “Monitor” session value to “true”, and start running the background script with RunBackgroundScript. We need to pass the background script the “lockfile” path, so we can use that to figure out the path of the folder we’re monitoring.
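Earlier I mentioned that session variables could be used to stop a second start. As a rough sketch of that idea (it is not part of the script in this post), a small helper could check the “Monitor” value before run does anything else. The function name and the message text are made up for illustration, and I’m assuming the session value is only visible inside one GoFiler session, so the lock file would still be what protects against other copies of GoFiler:

```legato
// Hypothetical helper, not in the original script: reports whether this
// session has already flagged a monitor as running.
boolean monitor_already_running(){
    if (GetSessionString("Monitor") == "true"){
        return true;
    }
    return false;
}

// Possible use near the top of run(), after the "preprocess" check:
//     if (monitor_already_running()){
//         MessageBox('x',"A monitor is already running in this session.");
//         return ERROR_EXIT;
//     }
```

The trade-off is that the value stays “true” if the monitor exits on an error without resetting it, so a guard like this would block a restart until the stop function is run once. That is one reason the lock file approach is a bit more forgiving.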
    int stop(int f_id, string mode){
        SetSessionValue("Monitor","false");
        return ERROR_NONE;
    }
The second hooked function in “autoconvert.ms” is the stop function. It’s probably one of the simplest hooked functions I’ve ever written: all it does is set the “Monitor” session value to “false”, so that our background thread will see that and stop running the next time it checks. Then it can just return, because it doesn’t have to do anything else.
Let’s take a look at the “monitor.ls” file now. It only has a single function, monitor, which is the function that is called by the RunBackgroundScript function in our run function above. While our monitor is running, it can report activity back to the user through the MessageBox function. When a script is running in the background like this, the message boxes work a bit differently: they don’t pause script execution like they normally would. This means that if a message box pops up saying there’s an error, the script will continue to execute and move on, unlike in a normal script. In this case, I don’t think it’s a problem, but if you were writing a script that required user input from a YesNoBox or some other message box type other than the default notification, there would certainly be issues.
    int monitor(string lockfile){

        // ... variable declarations omitted ...

        MessageBox("Monitor Starting");
        my_lockfile = lockfile;
        hLockfile = OpenFile(my_lockfile, FO_SHARE_READ);
        rc = GetLastError();
        if(IsError(rc)){
            MessageBox('x',"Cannot open file %s",lockfile);
            return ERROR_EXIT;
        }
The very first thing our monitor function should do is alert the user that it is starting with a MessageBox. Then it gets the lock file name and stores it as a global variable in case another function needs to use it. As the script is now, that isn’t needed, but it could still be useful if we need to modify it later. Then, we need to try to open the lock file. We don’t actually care about the contents, but if we can’t open the lock file, it must mean that someone else is already running this script, so we need to return an error and exit. If we can grab the lock file, then nothing else has a hold on it and nothing else is monitoring this folder, so we can go ahead and continue executing our script.
        path = GetFilePath(my_lockfile);
        outbox = AddPaths(path,OUTBOX_FN);
        if(IsFolder(outbox) == false){
            rc = CreateFolder(outbox);
            if(IsError(rc)){
                MessageBox('x',"Cannot create outbox");
                return ERROR_EXIT;
            }
        }
We can’t very well just convert our file and dump everything into the same folder, so we need to ensure an out folder exists. If the out folder doesn’t exist, we can create it. If it doesn’t exist and we can’t create it, we most likely don’t have write access to this folder, so we should just return an error and exit our script.
        while(true){
            files = EnumerateFiles(AddPaths(path,"*.doc;*.docx"), FOLDER_LOAD_NO_HIDDEN);
            size = ArrayGetAxisDepth(files);
            for(ix=0 ; ix<size ; ix++){
                // create our sub output folder
                output_fn = ClipFileExtension(files[ix]);
                out_subfolder = AddPaths(outbox,output_fn);
                if(IsFolder(out_subfolder) == false){
                    rc = CreateFolder(out_subfolder);
                    if(IsError(rc)){
                        MessageBox('x',"Cannot create sub folder");
                        return ERROR_EXIT;
                    }
                }
Now we’re getting into the meat of our function. I decided to use an infinite loop here, since I want this function to run forever until GoFiler exits or the user tells this script to stop. It’s important to be VERY CAREFUL when using intentional infinite loops to define an exit condition so it doesn’t go on indefinitely. Also ensure that the loop has a call to Sleep, otherwise it will eat up all of your CPU resources and make your computer run much slower. The first thing we need to do in our loop is to get a list of all doc and docx files in the root folder we’re monitoring, using the EnumerateFiles function.
Using EnumerateFiles repeatedly can be demanding on the hard drive, or on the network if using a shared drive, so the ultimate requirements of the script must be considered when deciding to use this function here. In this case, we’re looking at monitoring a folder that a user is currently working in, converting Word documents into HTML files as they are edited and updated. This intended use case means the folder our script is looking at contains only a couple of files for a single project, so using EnumerateFiles isn’t very expensive. If we were writing a script to monitor several directories, it would probably be best to move the Word files out of the directory being scanned, so we don’t repeatedly request a huge list of files from the file system. Whenever you’re writing a script (or software in general), it’s important to make sure what you’re writing will be efficient enough to meet its requirements.
After we have our list of files we can get the size of the list, and iterate over each item in it. For each doc or docx file, we need to create a sub folder in our output folder. Initially I was going to just dump everything into the same folder, but that’s not really a good idea if you have two or more Word docs that have images. Because images are numbered automatically they will just mix together and get confusing. So having separate sub folders makes a lot of sense. We can use ClipFileExtension on our Word file’s name to get an appropriate sub folder name, and if that folder doesn’t exist we can create it with CreateFolder.
                // get paths to in and out files
                o_path = AddPaths(out_subfolder, output_fn + ".htm");
                f_path = AddPaths(path,files[ix]);
                // get file modified times
                time = GetFileModifiedTime(f_path);
                c_time = GetFileModifiedTime(o_path);
                //MessageBox("word file - %d\r\nhtml file - %d",time,c_time);
                // if this file is newer than the last conversion
                if (time > c_time){
Once our sub folder is created within our output folder, we can get the paths to our output file (o_path) and our input file (f_path). Next, we can get the last modified time for each of these files. This is really important, because we don’t want to convert the Word file each time our script runs, that would take a lot of system resources for no real good reason. If the converted HTML file is newer than the Word file, it must mean we converted it and have made no changes to the Word file since the last conversion, so there’s no reason to convert it again. If we haven’t created a conversion yet, GetFileModifiedTime will return a -1 value for modified time, which is great, because our Word file will always have a later time value than -1, so our if statement comparing the times should always be true, even if the converted file doesn’t exist. If our converted file is older than our Word file, then we have to continue on to convert it. Otherwise, we can just skip over this file and go back for the next.
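To make that decision a little easier to read, the check could be pulled into its own function. This is only a sketch: needs_conversion is a name I made up, and the extra DoesFileExist test is my own addition so the result doesn’t depend on what GetFileModifiedTime returns for a missing file.

```legato
// Hypothetical helper, not in the original script: decides whether the Word
// file at "source" should be converted to the HTML file at "target".
boolean needs_conversion(string source, string target){
    qword s_time;
    qword t_time;

    // no previous conversion exists, so we always convert
    if (DoesFileExist(target) == false){
        return true;
    }
    // convert only when the Word file changed after the last conversion
    s_time = GetFileModifiedTime(source);
    t_time = GetFileModifiedTime(target);
    if (s_time > t_time){
        return true;
    }
    return false;
}
```

Inside the file loop, the call would look like needs_conversion(f_path, o_path), replacing the direct comparison of the two times.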
                    // delete images in the sub folder to avoid duplicate images
                    sub_files = EnumerateFiles(AddPaths(out_subfolder,"*.jpg;*.gif"));
                    // use a separate counter (rx) here and leave size alone, so the
                    // outer file loop's ix and size are not overwritten
                    for(rx=0 ; rx<ArrayGetAxisDepth(sub_files) ; rx++){
                        DeleteFile(AddPaths(out_subfolder,sub_files[rx]));
                    }
                    // run the conversion
                    rc = ConvertFile(f_path,o_path);
                    if(IsError(rc)){
                        MessageBox('x',"Cannot convert file %s, error %0x", f_path, rc);
                    }
                }
            }
At this point in our script, we’ve encountered a Word file that either hasn’t been converted yet, or is an older conversion and the Word file has been changed. The first thing to do is to enumerate all images in the sub folder, and delete them. If we don’t do this, then GoFiler will just keep adding new images to the folder, which can get confusing fast. So we can iterate over all enumerated images, and delete each file. If the DeleteFile fails, it doesn’t really matter, since GoFiler will just create new images anyway, so I don’t think we need to check the success or failure of this particular function. After we’ve deleted the images (or at least tried to), we can use ConvertFile to create a new HTML file out of this Word document. If the conversion fails, we can show an error message, but it doesn’t mean all conversions will fail, so the monitor doesn’t end, it just keeps going onto the next file to try it.
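Because the image cleanup is a loop nested inside the loop over Word files, it needs counters of its own. If it shared ix and size with the outer loop, the outer loop would lose its place after the first conversion. One way to make that separation obvious is to move the cleanup into a small function with local variables. The sketch below is an illustration with a made-up name (clear_images), built only from the calls already used in this script:

```legato
// Hypothetical helper, not in the original script: deletes the jpg and gif
// files in one output sub folder and returns how many files it tried to delete.
int clear_images(string folder){
    string images[];
    int count;
    int rx;

    images = EnumerateFiles(AddPaths(folder,"*.jpg;*.gif"));
    count = ArrayGetAxisDepth(images);
    for(rx=0 ; rx<count ; rx++){
        // a failed delete is ignored on purpose, as in the main script
        DeleteFile(AddPaths(folder,images[rx]));
    }
    return count;
}
```

With this in place, the body of the if statement would just call clear_images(out_subfolder) before ConvertFile.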
            // check if monitor has been disabled
            status = GetSessionString("Monitor");
            if(status=="false" || status==""){
                MessageBox("Monitor shutting down.");
                CloseHandle(hLockfile);
                rc = DeleteFile(my_lockfile);
                if(IsError(rc)){
                    MessageBox("Cannot delete lockfile %s",my_lockfile);
                }
                return ERROR_NONE;
            }
            Sleep(POLLRATE);
        }
    }
After the monitor has run through all files in the folder, we need to check to see if the user has told it to stop. We can use GetSessionString to get our “Monitor” value, and if it’s “false” or has been reset, we can close our handle on the lock file, delete the lock file, and then return. If it’s not time to exit our script, we need to use the Sleep function so our script will wait for a defined amount of time (1000 ms in this case) before running again to check for more files to convert. I only put a single check in this script, at the end of the execution loop, to test if it’s time to stop running. That is fine if you only have 2-3 files in the folder you’re working on, but keep in mind that for extremely large folders containing a lot of Word documents, the script doesn’t check if it’s time to stop until everything is converted. This means you could tell the script to stop running, and it would not actually stop for several minutes, until it has finished all jobs in the folder. This is OK in the current script, within its intended use case, but it’s something to keep in mind if you want to expand it for other uses.
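If a quicker stop mattered, one option would be to test the session value once per file as well. The following is a sketch under that assumption, with a made-up helper name (stop_requested); it repeats the same test the script already performs after the loop:

```legato
// Hypothetical helper, not in the original script: true when the user has
// asked the monitor to stop, or when the session value has been reset.
boolean stop_requested(){
    string status;

    status = GetSessionString("Monitor");
    if (status == "false" || status == ""){
        return true;
    }
    return false;
}

// Possible use at the top of the for loop body in monitor():
//     if (stop_requested()){
//         break;      // leave the file loop; the check after it shuts down
//     }
```

Leaving the file loop early simply drops into the existing shutdown check, so the lock file handling stays in one place. A conversion that is already in progress would still have to finish first.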
Using lock files and session variables, we can effectively create multiple different programs that work together to do a useful task. This is a pretty simple example of that, being only a single background process running, but it demonstrates the concept and adds a nice feature to GoFiler, to monitor a single folder for new files to convert, that it doesn’t ship with by default. It could certainly be expanded to handle other file types besides Word docs, or to apply different settings to different conversions based on the folder, or to monitor multiple folders, but this is a pretty solid way to start.
Here are both script files in their entirety:
    /*****************************************
            Automatic conversion example
            ------------------
            Revision:
                05-22-19 SCH  Auto converting files in a hot folder
    ******************************************/

    #define LOCKFILE_FN         "Monitor.dat"
    #define FOLDER_IN_USE_ERR   "This folder is being monitored already by user %s"

    int setup ();
    int run (int f_id, string mode);
    int stop (int f_id, string mode);
    int main ();

    int main(){
        string s1;

        s1 = GetScriptParent();
        if(s1 == "LegatoIDE"){
            setup();
        }
        return ERROR_NONE;
    }

    int setup(){
        string s1;
        string params[];

        params["Code"] = "BEGIN_MONITOR_FOLDER";
        params["Description"] = "Begin monitoring a target folder for word docs";
        params["MenuText"] = "Start Monitor Folder";
        MenuAddFunction(params);
        s1 = GetScriptFilename();
        MenuSetHook(params["Code"],s1,"run");

        params["Code"] = "STOP_MONITOR_FOLDER";
        params["Description"] = "Stop monitoring a target folder for word docs";
        params["MenuText"] = "End Monitor Folder";
        MenuAddFunction(params);
        MenuSetHook(params["Code"],s1,"stop");
        return ERROR_NONE;
    }

    int stop(int f_id, string mode){
        SetSessionValue("Monitor","false");
        return ERROR_NONE;
    }

    int run(int f_id, string mode){
        string lockfile;
        string contents;
        string folder;
        string last_folder;
        handle hLockfile;
        int rc;

        if(mode != "preprocess"){
            return ERROR_NONE;
        }
        last_folder = GetSetting("History","Target");
        folder = BrowseFolder("Select Folder to Monitor",last_folder);
        rc = GetLastError();
        if(IsError(rc)){
            return rc;
        }
        PutSetting("History","Target",folder);

        // ensure only one person is running this script on a folder at the same time.
        lockfile = AddPaths(folder,LOCKFILE_FN);
        if(DoesFileExist(lockfile)){
            rc = DeleteFile(lockfile);
            if(IsError(rc)){
                contents = FileToString(lockfile);
                MessageBox('x',"Cannot start monitor, folder currently being monitored by %s",contents);
                return ERROR_EXIT;
            }
        }
        StringToFile(GetUserName(),lockfile);
        SetSessionValue("Monitor","true");
        RunBackgroundScript("Monitor.ls","monitor",lockfile);
        return ERROR_NONE;
    }

    /*************************************
            Monitor
            ------------------
            Revision:
                05-22-19 SCH  Auto converting files in a hot folder
    **************************************/

    #define POLLRATE    1000
    #define OUTBOX_FN   "Out"

    string my_lockfile;
    handle hLockfile;

    int monitor(string lockfile);

    int monitor(string lockfile){
        int ix,rx;
        int size;
        int rc;
        string path;
        string files[];
        string sub_files[];
        string f_path;
        string o_path;
        string output_fn;
        string out_subfolder;
        string status;
        string outbox;
        qword time;
        qword c_time;

        MessageBox("Monitor Starting");
        my_lockfile = lockfile;
        hLockfile = OpenFile(my_lockfile, FO_SHARE_READ);
        rc = GetLastError();
        if(IsError(rc)){
            MessageBox('x',"Cannot open file %s",lockfile);
            return ERROR_EXIT;
        }
        path = GetFilePath(my_lockfile);
        outbox = AddPaths(path,OUTBOX_FN);
        if(IsFolder(outbox) == false){
            rc = CreateFolder(outbox);
            if(IsError(rc)){
                MessageBox('x',"Cannot create outbox");
                return ERROR_EXIT;
            }
        }
        while(true){
            files = EnumerateFiles(AddPaths(path,"*.doc;*.docx"), FOLDER_LOAD_NO_HIDDEN);
            size = ArrayGetAxisDepth(files);
            for(ix=0 ; ix<size ; ix++){
                // create our sub output folder
                output_fn = ClipFileExtension(files[ix]);
                out_subfolder = AddPaths(outbox,output_fn);
                if(IsFolder(out_subfolder) == false){
                    rc = CreateFolder(out_subfolder);
                    if(IsError(rc)){
                        MessageBox('x',"Cannot create sub folder");
                        return ERROR_EXIT;
                    }
                }
                // get paths to in and out files
                o_path = AddPaths(out_subfolder, output_fn + ".htm");
                f_path = AddPaths(path,files[ix]);
                // get file modified times
                time = GetFileModifiedTime(f_path);
                c_time = GetFileModifiedTime(o_path);
                //MessageBox("word file - %d\r\nhtml file - %d",time,c_time);
                // if this file is newer than the last conversion
                if (time > c_time){
                    // delete images in the sub folder to avoid duplicate images
                    sub_files = EnumerateFiles(AddPaths(out_subfolder,"*.jpg;*.gif"));
                    // use a separate counter (rx) here and leave size alone, so the
                    // outer file loop's ix and size are not overwritten
                    for(rx=0 ; rx<ArrayGetAxisDepth(sub_files) ; rx++){
                        DeleteFile(AddPaths(out_subfolder,sub_files[rx]));
                    }
                    // run the conversion
                    rc = ConvertFile(f_path,o_path);
                    if(IsError(rc)){
                        MessageBox('x',"Cannot convert file %s, error %0x", f_path, rc);
                    }
                }
            }
            // check if monitor has been disabled
            status = GetSessionString("Monitor");
            if(status=="false" || status==""){
                MessageBox("Monitor shutting down.");
                CloseHandle(hLockfile);
                rc = DeleteFile(my_lockfile);
                if(IsError(rc)){
                    MessageBox("Cannot delete lockfile %s",my_lockfile);
                }
                return ERROR_NONE;
            }
            Sleep(POLLRATE);
        }
    }
Steven Horowitz has been working for Novaworks for over five years as a technical expert with a focus on EDGAR HTML and XBRL. Since the creation of the Legato language in 2015, Steven has been developing scripts to improve the GoFiler user experience. He is currently working toward a Bachelor of Science in Software Engineering at RIT and MCC.
Additional Resources
Legato Script Developers LinkedIn Group
Primer: An Introduction to Legato