LDC #91: Find and Replace HTML Script Template
Friday, June 29. 2018
A pretty common task in any document editing environment is running find and replace operations. If a character is repeated throughout a document and you want to replace every instance of it with a different character, a find and replace is the fastest way to do it. But what happens if you need to run the same operations many times on different documents? You could open Find and Replace and type the information in each time, but it’s often much easier to write a small Legato function that you can run from the Tools menu to perform a common find and replace operation. In previous blog posts, I’ve written similar scripts to replace Wingdings characters with character entities and to replace certain inline tags with other inline tags. For this week’s blog script, I took those previous scripts and made a more generic version that anyone can easily modify to do different find and replace operations.
You don’t need to know the ins and outs of Legato to modify this script, but we’re going to go over how it works in depth anyway. Modifying the script to do any find and replace you want is easy, but understanding what’s going on, at least in a general sense, is always a good idea. This script iterates over every HTML element in your document and, for each element it finds, runs a find and replace operation on the content of that element. This means it only looks at the content of block tags such as paragraphs, tables, and divisions. It does not find and replace the tags themselves, so if you want to replace all paragraphs with divisions, this script will not work as written (though it could be modified to do so). If you want to replace some characters with others, or Wingdings characters with character entities, this script is a great starting point, because that only modifies the content of paragraphs that already exist. Let’s take a look at the script, starting with the setup function.
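To picture what “only the content” means, here is a simplified model of the whole loop written in C rather than Legato. It does not use GoFiler’s SGML parser. Instead, it scans a one-line HTML string, skips HTML and BODY, and treats everything between an element’s opening tag and its first matching closing tag as that element’s content. Nested elements are therefore handled as part of their parent, which is how the script appears to behave because SGMLFindClosingElement advances the parse position to the closing tag. The tag text itself is never searched. The helper names `replace_all` and `replace_in_contents`, and the 15-character limit on tag names, are assumptions of this sketch.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Model of ReplaceInString: returns a new string with every non-overlapping
   occurrence of find (must be non-empty) replaced by repl. Caller frees. */
static char *replace_all(const char *src, const char *find, const char *repl)
{
    size_t flen = strlen(find), rlen = strlen(repl), count = 0;
    for (const char *p = strstr(src, find); p != NULL; p = strstr(p + flen, find))
        count++;
    char *out = malloc(strlen(src) + count * rlen - count * flen + 1);
    if (out == NULL)
        return NULL;
    char *w = out;
    const char *r = src, *hit;
    while ((hit = strstr(r, find)) != NULL) {
        memcpy(w, r, (size_t)(hit - r));       /* text before the match */
        w += hit - r;
        memcpy(w, repl, rlen);                 /* the replacement */
        w += rlen;
        r = hit + flen;
    }
    strcpy(w, r);                              /* text after the last match */
    return out;
}

/* Runs find/replace on the content of each element in a one-line HTML string,
   never on the tags. Returns a new string (caller frees; NULL if memory runs
   out) and stores the number of edited elements in *edited. */
char *replace_in_contents(const char *html, const char *find, const char *repl,
                          int *edited)
{
    size_t len = strlen(html), i = 0;
    char *doc = malloc(len + 1), *lt;
    if (doc == NULL)
        return NULL;
    memcpy(doc, html, len + 1);
    *edited = 0;
    while ((lt = strchr(doc + i, '<')) != NULL) {
        size_t start = (size_t)(lt - doc), n = 0;
        char name[16], closing[19];
        while (n < sizeof name - 1 && isalnum((unsigned char)lt[1 + n])) {
            name[n] = lt[1 + n];
            n++;
        }
        name[n] = '\0';
        /* closing tags ('/' gives n == 0), HTML and BODY are not processed */
        if (n == 0 || strcmp(name, "HTML") == 0 || strcmp(name, "BODY") == 0) {
            i = start + 1;
            continue;
        }
        char *gt = strchr(lt, '>');
        if (gt == NULL)
            break;
        size_t sx = (size_t)(gt - doc) + 1;    /* content starts after '>' */
        closing[0] = '<';
        closing[1] = '/';
        memcpy(closing + 2, name, n);
        closing[2 + n] = '>';
        closing[3 + n] = '\0';
        char *close = strstr(doc + sx, closing);
        if (close == NULL) {                   /* no closing tag: move on */
            i = sx;
            continue;
        }
        size_t ex = (size_t)(close - doc);     /* content ends at "</NAME>" */
        doc[ex] = '\0';                        /* terminate content temporarily */
        int found = strstr(doc + sx, find) != NULL;
        char *newc = found ? replace_all(doc + sx, find, repl) : NULL;
        doc[ex] = '<';                         /* restore the closing tag */
        if (found) {
            if (newc == NULL) { free(doc); return NULL; }
            size_t clen = strlen(newc), tail = strlen(doc + ex);
            char *next = malloc(sx + clen + tail + 1);
            if (next == NULL) { free(newc); free(doc); return NULL; }
            memcpy(next, doc, sx);             /* everything through the opening tag */
            memcpy(next + sx, newc, clen);     /* the new content */
            memcpy(next + sx + clen, doc + ex, tail + 1); /* closing tag onward */
            free(newc);
            free(doc);
            doc = next;
            ex = sx + clen;
            (*edited)++;
        }
        i = ex;                                /* resume at the closing tag */
    }
    return doc;
}
```

For example, running this with find `"P"` and replace `"Q"` on `<P>P</P>` produces `<P>Q</P>`, because the P in the tag names is never searched.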
This setup function is pretty much like the others we’ve talked about. In this case, though, the Code, MenuText, and Description values in the item array are just placeholders and should be replaced with values that describe what the function actually does. Other than that, this function can be left alone.
/****************************************/
int setup() {                                           /* Called from Application Startup */
/****************************************/
    string fnScript;                                    /* Us */
    string item[10];                                    /* Menu Item */
    int rc;                                             /* Return Code */
                                                        /* */
    item["Code"] = "EXTENSION_REPLACE_EXAMPLE";         /* Function Code */
    item["MenuText"] = "&Replace Example";              /* Menu Text */
    item["Description"] = "<B>Replace Example</B> ";    /* Description (long) */
    item["Description"]+= "\r\rExample of Replace Function"; /* * description */
    fnScript = GetScriptFilename();                     /* Get the script filename */
    MenuAddFunction(item);                              /* add the function to the menu */
    MenuSetHook(item["Code"], fnScript, "run");         /* Set the Test Hook */
    return ERROR_NONE;                                  /* Return value (does not matter) */
}                                                       /* end setup */
The run function is the main function, called by the menu hook from the Tools menu. Like all the run functions we’ve written so far, it checks the mode parameter to make sure it’s running in preprocess mode before doing anything else. It starts by getting the active edit window with GetActiveEditWindow, then checks the window type to ensure it is an HTML Page View window. Once we have an HTML window, we can get its Edit Object and create an SGML parser from that object.
/****************************************/
void run(int f_id, string mode) {                       /* call from hook */
/****************************************/
    string find, replace;                               /* segment of text */
    int replaced;                                       /* number of replaced items */
    handle window;                                      /* window handle */
    handle sgml;                                        /* sgml parser */
    dword w_type;                                       /* window type */
    handle edit_obj;                                    /* current object of text */
                                                        /* */
    if (mode != "preprocess") {                         /* check mode */
        return;                                         /* return */
    }                                                   /* */
    window = GetActiveEditWindow();                     /* get the active edit window */
    w_type = GetEditWindowType(window);                 /* get the window type */
    w_type &= EDX_TYPE_ID_MASK;                         /* keep only the type ID bits */
    if (w_type != EDX_TYPE_PSG_PAGE_VIEW) {             /* if it's not page view */
        MessageBox('x', "This function can only be run on an HTML file."); /* display error */
        return;                                         /* stop here */
    }                                                   /* */
    edit_obj = GetEditObject(window);                   /* get current selected edit object */
    sgml = SGMLCreate(edit_obj);                        /* create SGML object */
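The masking step is easy to skim past. The window type value presumably carries other bits besides the type ID, which is why the script ANDs it with EDX_TYPE_ID_MASK before comparing it to EDX_TYPE_PSG_PAGE_VIEW. Here is a small C sketch of the same idea. The constant values and the flag bit are hypothetical stand-ins, not GoFiler’s real definitions, and `is_page_view` is an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration only; GoFiler's real EDX_TYPE_ID_MASK
   and EDX_TYPE_PSG_PAGE_VIEW constants may differ. */
#define TYPE_ID_MASK    0x0000FFFFu
#define TYPE_PAGE_VIEW  0x00000021u
#define FLAG_READ_ONLY  0x00010000u   /* hypothetical flag bit above the mask */

/* True when the type ID part of w_type is page view, whatever flags are set. */
bool is_page_view(uint32_t w_type)
{
    return (w_type & TYPE_ID_MASK) == TYPE_PAGE_VIEW;
}
```

Comparing the raw value without the mask would fail as soon as any bit outside the type ID is set, which is the situation the mask guards against.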
Now that our objects are created, we can run our find and replace operations. I marked this section with comments to show where you can change what is found and what it is replaced with. Define a “find” and a “replace” string, then call the find_replace function. The function returns the number of elements it edited, so we add that value to the running total in replaced. After the two find and replace operations (add more if you want), a message box tells the user how many objects were edited, or that nothing was found to replace.
    /* ******************************************* begin edit area **********************************************/
    find = " ";                                         /* set find string */
    replace = " ";                                      /* set replace string */
    replaced += find_replace(edit_obj, find, replace, sgml); /* execute a find / replace */
                                                        /* */
    find = "<FONT STYLE=\"font-family: Wingdings\">x</FONT>"; /* set find string */
    replace = "☒";                                      /* set replace string */
    replaced += find_replace(edit_obj, find, replace, sgml); /* execute a find / replace */
                                                        /* */
    /* ******************************************* end edit area ************************************************/
    if (replaced != 0) {                                /* if replaced isn't zero */
        MessageBox('i', "Edited %d objects in the file.", replaced); /* display message */
    }                                                   /* */
    else {                                              /* if there is nothing replaced */
        MessageBox('i', "Found nothing to replace.");   /* display message */
    }                                                   /* */
}                                                       /* */
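If your list of replacements grows beyond a couple of pairs, one option (not what the script above does) is to keep the pairs in a table and loop over it, adding each pass’s count to the total just as run adds up the find_replace results. The C sketch below models that. `replace_all` stands in for ReplaceInString, assuming it replaces every occurrence; element contents are plain strings instead of a GoFiler edit object; and `struct pair` and `apply_pairs` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Model of ReplaceInString (assumed to replace every occurrence): returns a
   new string with each non-overlapping occurrence of find (non-empty)
   replaced by repl. Caller frees. */
char *replace_all(const char *src, const char *find, const char *repl)
{
    size_t flen = strlen(find), rlen = strlen(repl), count = 0;
    for (const char *p = strstr(src, find); p != NULL; p = strstr(p + flen, find))
        count++;
    char *out = malloc(strlen(src) + count * rlen - count * flen + 1);
    if (out == NULL)
        return NULL;
    char *w = out;
    const char *r = src, *hit;
    while ((hit = strstr(r, find)) != NULL) {
        memcpy(w, r, (size_t)(hit - r));
        w += hit - r;
        memcpy(w, repl, rlen);
        w += rlen;
        r = hit + flen;
    }
    strcpy(w, r);
    return out;
}

/* Hypothetical table entry: one find string and its replacement. */
struct pair {
    const char *find;
    const char *replace;
};

/* Applies each pair, in order, to every element's content. Element strings
   must be heap-allocated; edited ones are swapped for new allocations.
   Returns the sum of edited-element counts, like replaced in run(). */
int apply_pairs(char *elements[], size_t n_elements,
                const struct pair *pairs, size_t n_pairs)
{
    int replaced = 0;
    for (size_t p = 0; p < n_pairs; p++) {
        for (size_t i = 0; i < n_elements; i++) {
            if (strstr(elements[i], pairs[p].find) == NULL)
                continue;                      /* nothing to change here */
            char *updated = replace_all(elements[i], pairs[p].find, pairs[p].replace);
            if (updated == NULL)
                continue;                      /* out of memory: leave as is */
            free(elements[i]);
            elements[i] = updated;
            replaced++;
        }
    }
    return replaced;
}
```

Because the total is summed per pair, an element touched by two different pairs counts twice, the same way it would when run adds up two separate find_replace calls.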
The find_replace function does most of the work in the script. It takes an edit object handle, the find and replace strings, and a handle to the SGML parser as inputs, and it executes the find and replace operation on the file. Note that it returns the number of block objects that were edited, not the total number of occurrences that were replaced.
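To make that distinction concrete, here is a C sketch that counts the same input both ways. It works on plain strings rather than SGML elements, and `count_occurrences` and `count_edited_elements` are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/* Counts non-overlapping occurrences of find (must be non-empty) in text. */
size_t count_occurrences(const char *text, const char *find)
{
    size_t n = 0, len = strlen(find);
    if (len == 0)
        return 0;
    for (const char *p = strstr(text, find); p != NULL; p = strstr(p + len, find))
        n++;
    return n;
}

/* Models find_replace's return value: one per element whose content holds
   find at least once, no matter how many occurrences it contains. */
size_t count_edited_elements(const char *contents[], size_t n_elements,
                             const char *find)
{
    size_t edited = 0;
    for (size_t i = 0; i < n_elements; i++)
        if (strstr(contents[i], find) != NULL)
            edited++;
    return edited;
}
```

With the contents `"a x b x"`, `"none"`, and `"x"`, two elements are edited but three occurrences are replaced, so the message box in run reports 2.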
The first thing the function does is set the position of the SGML parser back to the start of the file, in case something previously used the parser. Then it grabs the first element with SGMLNextElement and enters a while loop, which runs until the parser returns an empty string at the end of the document. If the element contains “<HTML” or “<BODY”, we grab the next element and continue the loop without processing it. We want the actual block elements inside the body, not the body itself; if we processed BODY, its entire content would be treated as the content of a single element.
/****************************************/
int find_replace(handle edit_obj, string find, string replace, handle sgml) { /* execute a find and replace */
/****************************************/
    int ix, ex, ey, sx, sy;                             /* counter and positions */
    string contents, segment;                           /* string segment */
                                                        /* */
    SGMLSetPosition(sgml, 0, 0);                        /* reset position */
    segment = SGMLNextElement(sgml);                    /* get the next element */
    while (segment != "") {                             /* while not at the end of the doc */
        if (FindInString(segment, "<HTML") > (-1)) {    /* if this is an HTML tag */
            segment = SGMLNextElement(sgml);            /* get the next element */
            continue;                                   /* go back for next tag */
        }                                               /* */
        if (FindInString(segment, "<BODY") > (-1)) {    /* if this is a BODY tag */
            segment = SGMLNextElement(sgml);            /* get the next element */
            continue;                                   /* go back for next tag */
        }                                               /* */
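The two skip checks can also be kept in a list, which turns skipping another tag into a one-line change. This C sketch mirrors them with `strstr`; `skipped_tags` and `should_skip` are hypothetical names. Note that `strstr` is case-sensitive, so a lowercase `<body>` would not be skipped here. Whether FindInString behaves the same way by default is worth checking in the Legato documentation if your files use lowercase tags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Tags the loop should not process; mirrors the "<HTML" and "<BODY" checks. */
static const char *const skipped_tags[] = { "<HTML", "<BODY" };

/* True if segment contains any skipped tag (case-sensitive substring test). */
bool should_skip(const char *segment)
{
    for (size_t i = 0; i < sizeof skipped_tags / sizeof skipped_tags[0]; i++)
        if (strstr(segment, skipped_tags[i]) != NULL)
            return true;
    return false;
}
```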
Next we get the start position of the area to replace. Since we’re only replacing the content, that area starts where the current opening tag ends, so we grab the end position of the current SGML tag. Then we use the SGMLFindClosingElement function to advance the parse position to the closing tag and to get the content of the element. If the content contains the string we’re looking for, we get the end position of the content from the start position of the closing tag. Then we run ReplaceInString on the content to do the actual replacement. All that’s left is to write the new content out with WriteSegment, reset the parser position with SGMLSetPosition, and increment the number of elements we’ve edited.
        sx = SGMLGetItemPosEX(sgml);                    /* get start x (end of opening tag) */
        sy = SGMLGetItemPosEY(sgml);                    /* get start y (end of opening tag) */
        contents = SGMLFindClosingElement(sgml, SP_FCE_CODE_AS_IS); /* get the content of the element */
        if (FindInString(contents, find) > (-1)) {      /* if the target exists in the string */
            ex = SGMLGetItemPosSX(sgml);                /* get end x (start of closing tag) */
            ey = SGMLGetItemPosSY(sgml);                /* get end y (start of closing tag) */
            contents = ReplaceInString(contents, find, replace); /* get new content */
            WriteSegment(edit_obj, contents, sx, sy, ex, ey);    /* write new string out */
            SGMLSetPosition(sgml, ex, ey);              /* set position */
            ix++;                                       /* increment counter */
        }                                               /* */
        segment = SGMLNextElement(sgml);                /* get the next element */
    }                                                   /* */
    return ix;                                          /* return number of edited elements */
}                                                       /* */
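One detail worth keeping in mind when adapting this loop: if the content sits on one line and the find and replace strings differ in length, WriteSegment shifts the closing tag, so the old ex position no longer marks its start. The C sketch below models WriteSegment on a single line of text, assuming zero-based, half-open positions (an assumption; GoFiler’s own position conventions may differ), and reports where the closing tag ends up. `splice` is a hypothetical name, and this sketch says nothing about how GoFiler’s parser itself reacts to SGMLSetPosition after an edit.

```c
#include <stdlib.h>
#include <string.h>

/* One-line model of WriteSegment(edit_obj, contents, sx, sy, ex, ey): returns
   a new buffer equal to doc with [sx, ex) replaced by content (requires
   sx <= ex <= strlen(doc)). Stores the index where the text that followed the
   range, such as the closing tag, now begins in *new_ex. Caller frees. */
char *splice(const char *doc, size_t sx, size_t ex, const char *content,
             size_t *new_ex)
{
    size_t doc_len = strlen(doc), clen = strlen(content);
    char *out = malloc(doc_len - (ex - sx) + clen + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, doc, sx);                 /* text through the opening tag */
    memcpy(out + sx, content, clen);      /* replacement content */
    strcpy(out + sx + clen, doc + ex);    /* closing tag and the rest */
    *new_ex = sx + clen;
    return out;
}
```

Replacing the content `a b` in `<P>a b</P>` (positions 3 to 6) with `a&nbsp;b` moves the closing tag from index 6 to index 11, so a position saved before the write would land inside the new content.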
Here's a complete copy of our script file:
//
//
//      GoFiler Legato Script - Find Replace
//      ------------------------------------------
//
//      Rev     06/29/2018


void run          (int f_id, string mode);
int  find_replace (handle edit_obj, string find, string replace, handle sgml);

/****************************************/
int setup() {                                           /* Called from Application Startup */
/****************************************/
    string fnScript;                                    /* Us */
    string item[10];                                    /* Menu Item */
    int rc;                                             /* Return Code */
                                                        /* */
    item["Code"] = "EXTENSION_REPLACE_EXAMPLE";         /* Function Code */
    item["MenuText"] = "&Replace Example";              /* Menu Text */
    item["Description"] = "<B>Replace Example</B> ";    /* Description (long) */
    item["Description"]+= "\r\rExample of Replace Function"; /* * description */
    fnScript = GetScriptFilename();                     /* Get the script filename */
    MenuAddFunction(item);                              /* add the function to the menu */
    MenuSetHook(item["Code"], fnScript, "run");         /* Set the Test Hook */
    return ERROR_NONE;                                  /* Return value (does not matter) */
}                                                       /* end setup */

/****************************************/
void run(int f_id, string mode) {                       /* call from hook */
/****************************************/
    string find, replace;                               /* segment of text */
    int replaced;                                       /* number of replaced items */
    handle window;                                      /* window handle */
    handle sgml;                                        /* sgml parser */
    dword w_type;                                       /* window type */
    handle edit_obj;                                    /* current object of text */
                                                        /* */
    if (mode != "preprocess") {                         /* check mode */
        return;                                         /* return */
    }                                                   /* */
    window = GetActiveEditWindow();                     /* get the active edit window */
    w_type = GetEditWindowType(window);                 /* get the window type */
    w_type &= EDX_TYPE_ID_MASK;                         /* keep only the type ID bits */
    if (w_type != EDX_TYPE_PSG_PAGE_VIEW) {             /* if it's not page view */
        MessageBox('x', "This function can only be run on an HTML file."); /* display error */
        return;                                         /* stop here */
    }                                                   /* */
    edit_obj = GetEditObject(window);                   /* get current selected edit object */
    sgml = SGMLCreate(edit_obj);                        /* create SGML object */
    /* ******************************************* begin edit area **********************************************/
    find = " ";                                         /* set find string */
    replace = " ";                                      /* set replace string */
    replaced += find_replace(edit_obj, find, replace, sgml); /* execute a find / replace */
                                                        /* */
    find = "<FONT STYLE=\"font-family: Wingdings\">x</FONT>"; /* set find string */
    replace = "☒";                                      /* set replace string */
    replaced += find_replace(edit_obj, find, replace, sgml); /* execute a find / replace */
                                                        /* */
    /* ******************************************* end edit area ************************************************/
    if (replaced != 0) {                                /* if replaced isn't zero */
        MessageBox('i', "Edited %d objects in the file.", replaced); /* display message */
    }                                                   /* */
    else {                                              /* if there is nothing replaced */
        MessageBox('i', "Found nothing to replace.");   /* display message */
    }                                                   /* */
}                                                       /* */

/****************************************/
int find_replace(handle edit_obj, string find, string replace, handle sgml) { /* execute a find and replace */
/****************************************/
    int ix, ex, ey, sx, sy;                             /* counter and positions */
    string contents, segment;                           /* string segment */
                                                        /* */
    SGMLSetPosition(sgml, 0, 0);                        /* reset position */
    segment = SGMLNextElement(sgml);                    /* get the next element */
    while (segment != "") {                             /* while not at the end of the doc */
        if (FindInString(segment, "<HTML") > (-1)) {    /* if this is an HTML tag */
            segment = SGMLNextElement(sgml);            /* get the next element */
            continue;                                   /* go back for next tag */
        }                                               /* */
        if (FindInString(segment, "<BODY") > (-1)) {    /* if this is a BODY tag */
            segment = SGMLNextElement(sgml);            /* get the next element */
            continue;                                   /* go back for next tag */
        }                                               /* */
        sx = SGMLGetItemPosEX(sgml);                    /* get start x (end of opening tag) */
        sy = SGMLGetItemPosEY(sgml);                    /* get start y (end of opening tag) */
        contents = SGMLFindClosingElement(sgml, SP_FCE_CODE_AS_IS); /* get the content of the element */
        if (FindInString(contents, find) > (-1)) {      /* if the target exists in the string */
            ex = SGMLGetItemPosSX(sgml);                /* get end x (start of closing tag) */
            ey = SGMLGetItemPosSY(sgml);                /* get end y (start of closing tag) */
            contents = ReplaceInString(contents, find, replace); /* get new content */
            WriteSegment(edit_obj, contents, sx, sy, ex, ey);    /* write new string out */
            SGMLSetPosition(sgml, ex, ey);              /* set position */
            ix++;                                       /* increment counter */
        }                                               /* */
        segment = SGMLNextElement(sgml);                /* get the next element */
    }                                                   /* */
    return ix;                                          /* return number of edited elements */
}                                                       /* */

/****************************************/
int main() {                                            /* main method */
/****************************************/
    setup();                                            /* run the setup */
    return ERROR_NONE;                                  /* return */
}                                                       /* */
This script is meant to be a template. From here, you can modify the file and add any find and replace operations that would be a normal part of your editing process. This means your users can simply run a function instead of manually running find and replace operations in Code View. The script itself could also be modified further to run find and replace operations on the block tags themselves instead of just their content, but this is a good starting point.
Steven Horowitz has been working for Novaworks for over five years as a technical expert with a focus on EDGAR HTML and XBRL. Since the creation of the Legato language in 2015, Steven has been developing scripts to improve the GoFiler user experience. He is currently working toward a Bachelor of Science in Software Engineering at RIT and MCC.