I had an interesting issue crop up this week: a client had a file that had a 21 inch indent on a paragraph, causing problems when the file was printed to PDF. When GoFiler converts your Word file to HTML, if there are a lot of spaces or tabs on a line, it can cause GoFiler to interpret it as having a large indent level. Normally this isn’t a big deal, but if the indent ends up as large as this, it can cause rendering issues. So, I was asked if there was a way we could test for this, and figured it would be an interesting problem for a blog post.
Friday, November 01. 2019
LDC #159: Validating HTML Indents and Margins With Legato
At first glance, this seems like a simple task... just open the file and look at the text for large indent levels. After thinking about it for a little while though, it gets a bit more complicated. The initial problem was an indent level of 21 inches... well what if it was 10000px? It would be just as problematic, but it’s in a different unit of measure, so that rules out just looking for something like “21in” in the file. We’re going to have to actually parse the SGML code, get the values, convert them into a standardized unit, and compare that standard unit against a defined value to determine if it’s considered “too large”.
The standard unit we will be converting to and comparing against is called a “TWIP”. The name is an abbreviation for a twentieth of a point, and since a point is 1/72 of an inch, a single TWIP is 1/1,440 of an inch. We’re looking for what can reasonably be called abnormally large indents, so I figured a 5 inch indent would be a good place to start. That means we need a defined value in the validator for 5 inches in TWIPs, which is 5 × 1,440 = 7,200 TWIPs. With that bit of math out of the way, let’s get into checking out our script.
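The arithmetic can be checked with a short sketch. It is written in plain C rather than Legato so it can be compiled outside GoFiler, and the helper names are made up for illustration. The pixel conversion assumes the CSS reference of 96 pixels per inch, which may not match the conversion GoFiler applies internally.

```c
#define TWIPS_PER_POINT 20   /* a TWIP is 1/20 of a point */
#define POINTS_PER_INCH 72   /* a point is 1/72 of an inch */
#define TWIPS_PER_INCH  (TWIPS_PER_POINT * POINTS_PER_INCH)   /* 1440 */

/* Convert a whole number of inches to TWIPs (hypothetical helper). */
long inches_to_twips(long inches) {
    return inches * TWIPS_PER_INCH;
}

/* Convert CSS pixels to TWIPs, ASSUMING 96 px per inch (hypothetical helper). */
long css_px_to_twips(long px) {
    return px * TWIPS_PER_INCH / 96;
}
```

Under these assumptions the 5 inch threshold is 7,200 TWIPs, the 21 inch indent from the client file is 30,240 TWIPs, and a 10000px indent works out to 150,000 TWIPs, so both problem values land well above the threshold once they share a unit.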
/****************************************/
/* Call from Hook Processor             */
/****************************************/
int run(int f_id, string mode, handle window) {

    ... omitted declarations ...

    if (mode!="postprocess"){                                     /* if not postprocess                   */
      return ERROR_NONE;                                          /* return without error                 */
      }
    if(IsWindowHandleValid(window) == false){                     /* if the window is not valid           */
      edit_window = GetActiveEditWindow();                        /* get handle to active edit window     */
      }
    else{                                                         /* if passed a valid window handle      */
      edit_window = window;                                       /* set edit window to passed window hdl */
      }
    if(IsError(edit_window)){                                     /* if there is no usable edit window    */
      return ERROR_NONE;                                          /* return                               */
      }
    type = GetEditWindowType(edit_window) & EDX_TYPE_ID_MASK;     /* get the type of the window           */
    if (type!=EDX_TYPE_PSG_PAGE_VIEW && type!=EDX_TYPE_PSG_TEXT_VIEW){ /* make sure type is HTML or Code  */
      return ERROR_NONE;                                          /* return without error                 */
      }
The run function is called from the hook processor set up in the setup function, which is identical to the setup functions in other blog posts, so it is not covered again here. We need to ensure we’re running after the validate menu function has already run, so if the mode is not postprocess we can just exit. The function is only passed a valid window when it is run from the Legato IDE. If it is not passed a valid window, it grabs the current active window as the window to validate. If that fails, it returns, because it can’t do anything without an edit window. The final step before we can actually begin processing is to test our window type. Unless we’re in page view or code view, we’re going to need to return, because this function is designed to work only with those two window types.
    errors = 0;                                                   /* reset errors                         */
    ix = 0;                                                       /* reset counter                        */
    log = LogCreate("Validate spacing");                          /* create a log file                    */
    AddMessage(log,"Validating spacing...");                      /* add start message to log             */
    LogIndent(log);                                               /* indent the log                       */
    text = GetEditObject(edit_window);                            /* get text of window                   */
    LogSetWindow(log, edit_window);                               /* set the edit window for the log      */
    sgml = SGMLCreate(text);                                      /* get SGML object                      */
    element = SGMLNextElement(sgml);                              /* get the first SGML element           */
Now that we’ve ensured we’re running in postprocess mode on a valid window, we can reset our error and ix variables, and create a log file. The log gets a start message added, and we can indent it to make it look a little neater. We can grab the edit object then, since we’ll need that for our SGML parser, and use LogSetWindow to set our current window as the target of the log file. Setting the target of a log means when a user clicks on an error message in the log, it can go to a specific position in the file. This is very handy for validation like we’re doing here. Once we have our log set up, we can create our SGML parser, and get the first SGML element.
    while (element!=""){                                          /* for all SGML elements                */
      props = CSSGetProperties(sgml);                             /* get properties of element            */
      sx = SGMLGetItemPosSX(sgml);                                /* get start pos x                      */
      sy = SGMLGetItemPosSY(sgml);                                /* get start pos y                      */
      ex = SGMLGetItemPosEX(sgml);                                /* get end pos x                        */
      ey = SGMLGetItemPosEY(sgml);                                /* get end pos y                        */
      errors+= test_val(props["text-indent"], log, sx,sy,ex,ey);  /* test if error with text-indent       */
      errors+= test_val(props["margin"], log, sx,sy,ex,ey);       /* test if error with margin            */
      errors+= test_val(props["margin-top"], log, sx,sy,ex,ey);   /* test if error with margin-top        */
      errors+= test_val(props["margin-bottom"], log, sx,sy,ex,ey);/* test if error with margin-bottom     */
      errors+= test_val(props["margin-left"], log, sx,sy,ex,ey);  /* test if error with margin-left       */
      errors+= test_val(props["margin-right"], log, sx,sy,ex,ey); /* test if error with margin-right      */
      element = SGMLNextElement(sgml);                            /* get the next SGML element            */
      }
Since our SGMLNextElement function will only return a blank string if there is no next element or on an error, when it returns a non-empty string we can assume we got an element. Using the CSSGetProperties function, we can get an array of the CSS properties of the current element, and we can get the start and end positions of this element as well. With the properties and location known, we can then pass each CSS property we want to test to our test_val sub function. Let’s take a look at that function now, and come back to the run function at the end.
/****************************************/
/* test the given value                 */
/****************************************/
int test_val(string value, handle log, int sx, int sy, int ex, int ey){
    qword               margin;                                   /* margin value                         */
    int                 rc;                                       /* return code                          */

    if(value == ""){                                              /* if no value provided                 */
      return 0;                                                   /* return 0 for no error                */
      }
    else{                                                         /* if given a value                     */
      margin = SGMLStringToValue(value);                          /* get the indent val                   */
      rc = GetLastError();                                        /* get the error                        */
      if(IsError(rc)){                                            /* if error                             */
        LogSetMessageType(LOG_ERROR);                             /* set to error type                    */
        LogSetPosition(sx,sy,ex,ey);                              /* set position of message              */
        AddMessage(log,"Cannot convert measurement value %s",value); /* add conversion message            */
        LogSetMessageType(LOG_NONE);                              /* set to no error type                 */
        return 1;                                                 /* increment error count                */
        }
For each test we run, the test_val function is called. If it’s passed a blank value, it can just return zero, indicating there were no problems. An incoming blank value means that the element does not have the property we’re testing. If we actually have a value though, we can then convert the string value to a qword PVALUE. A PVALUE is a structured parameter value that can represent measurements. We’re going to use this format as an intermediate value, which we can then convert to TWIPs and compare to our defined value. If the SGMLStringToValue function encounters an error though, it means GoFiler doesn’t understand the unit of measurement (maybe it’s a syntax error in the source file). In that case we need to set the log message type to error, set the position of the error in the log with LogSetPosition so users can click on the error to go to the appropriate location, and log the error message. Then we can return 1, to increment the number of errors in the file.
      margin = SGMLValueToTWIPS(margin);                          /* convert to TWIPS                     */
      if(margin > SPACE_THRESHOLD){                               /* if the margin is too big             */
        LogSetMessageType(LOG_WARNING);                           /* set type to warning                  */
        LogSetPosition(sx,sy,ex,ey);                              /* set position of message              */
        AddMessage(log, "Found margin size %d, line %d",          /* display message                      */
                        margin,sy+1);
        LogSetMessageType(LOG_NONE);                              /* reset message type                   */
        return 1;                                                 /* increment error count                */
        }
      return 0;                                                   /* return no error                      */
      }
    }
If the SGMLStringToValue function didn’t generate an error though, we can then use SGMLValueToTWIPS to convert it into a TWIPs value, and directly compare it to our defined value. If the value exceeds our threshold, we can go ahead and set the log type, set the log position, and log the error. With the error logged, we can then return 1 to increment the number of errors in the file. If the number doesn’t exceed our threshold, it means it’s just a normal margin, so we can return 0, to prevent the number of errors from increasing. Now that our test function has returned, we can go back to the run function, and pick up after all tests have been run.
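The same decision flow (blank value, unreadable value, oversized value, normal value) can be sketched outside GoFiler. The following is plain C, not Legato, and it does not use or reproduce SGMLStringToValue or SGMLValueToTWIPS. The function names, the result codes, the short list of supported units, and the 96 pixels per inch figure are all assumptions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

#define SPACE_THRESHOLD 7200.0   /* 5 inches in TWIPs, as in the script */

/* Hypothetical result codes for this sketch. */
enum check_result { CHECK_OK = 0, CHECK_TOO_LARGE = 1, CHECK_UNPARSABLE = 2 };

/* Parse a CSS length such as "21in" or "10000px" into TWIPs.
   Returns 0 on success, -1 if the number or the unit is not recognized. */
int css_length_to_twips(const char *value, double *twips) {
    char *unit;
    double number = strtod(value, &unit);
    double per_unit;

    if (unit == value) return -1;                    /* no leading number */
    if      (strcmp(unit, "in") == 0) per_unit = 1440.0;
    else if (strcmp(unit, "pt") == 0) per_unit = 20.0;
    else if (strcmp(unit, "px") == 0) per_unit = 15.0;   /* ASSUMES 96 px per inch */
    else if (strcmp(unit, "cm") == 0) per_unit = 1440.0 / 2.54;
    else if (strcmp(unit, "mm") == 0) per_unit = 1440.0 / 25.4;
    else return -1;                                  /* unknown or missing unit */
    *twips = number * per_unit;
    return 0;
}

/* Mirror the order of tests in test_val: blank, unparsable, then too large. */
enum check_result check_spacing(const char *value) {
    double twips;
    if (value == NULL || *value == '\0') return CHECK_OK;   /* property not set */
    if (css_length_to_twips(value, &twips) != 0) return CHECK_UNPARSABLE;
    return twips > SPACE_THRESHOLD ? CHECK_TOO_LARGE : CHECK_OK;
}
```

As in the script, the comparison is strictly greater than, so a value of exactly 5in passes, while 21in and 10000px are both flagged. A value with a unit the sketch does not know, such as 21furlongs, is reported as unparsable rather than silently ignored.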
    if(errors > 0 ){                                              /* if we have errors                    */
      MessageBox('i',ISSUES_WRN);                                 /* warn user                            */
      LogDisplay(log);                                            /* display the log                      */
      }
    CloseHandle(sgml);                                            /* close the SGML object                */
    CloseHandle(text);                                            /* close the text object                */
    CloseHandle(edit_window);                                     /* close the edit window handle         */
    CloseHandle(log);                                             /* close handle to log                  */
    return ERROR_NONE;                                            /* return no error                      */
    }                                                             /* end run                              */
All that’s left to do now is check the number of errors. If we have more than zero, we display a message to the user and display the log file. Either way, we then close the handles created during execution. Finally we just return ERROR_NONE, to exit the script.
This script doesn’t correct the errors but it allows the user to find them quickly in the file which reduces the time spent during the proofing and review process.
This validation as it is now simply tests the indent and margins of the objects, since that was the area where errors were found in the file I was testing this on. However, it’s entirely possible for padding to also have problems. This script is easily expandable to test more than just the six properties it’s given. All you would have to do is add additional lines of code to the run function where the tests are called, with different CSS property names. The complete script is included below without commentary:
//
//
//      GoFiler Legato Script - Validate Spacing
//      ------------------------------------------
//
//      Rev     11/01/2019
//
//      (c) 2019 Novaworks, LLC -- All rights reserved.
//
//      Detects if a paragraph has unusually large indent spacing

#define         SPACE_THRESHOLD         7200    /* in TWIPS, 7200 = 5 inches */
#define         ISSUES_WRN              "Found unusually large indents in file, see information view for more details."

    int         run             (int f_id, string mode, handle window);
    int         test_val        (string value, handle log, int sx, int sy, int ex, int ey);

/****************************************/
/* Called from Application Startup      */
/****************************************/
int setup() {
    string              fnScript;                                 /* Us                                   */
    int                 rc;                                       /* Return Code                          */

    fnScript = GetScriptFilename();                               /* Get the script filename              */
    MenuSetHook("EDGAR_VALIDATE", fnScript, "run");               /* Set the Test Hook                    */
    MenuSetHook("REVIEW_DISPLAY_ERRORS ", fnScript, "run");       /* Set the Test Hook                    */
    return ERROR_NONE;                                            /* Return value                         */
    }                                                             /* end setup                            */

/****************************************/
/* Initialize from Hook Processor       */
/****************************************/
void main() {
    string              windows[][];                              /* array of window info                 */
    int                 ix,size;                                  /* counter variables                    */

    if(GetScriptParent()=="LegatoIDE"){                           /* if running in IDE mode               */
      windows = EnumerateEditWindows();                           /* get open windows                     */
      size = ArrayGetAxisDepth(windows);                          /* get number of open windows           */
      for(ix=0; ix<size; ix++){                                   /* for each window                      */
        if(IsInString(windows[ix]["Filename"],".htm")){           /* if it's an HTML file window          */
          run(0,"postprocess",MakeHandle(windows[ix]["ClientHandle"])); /* run the process on the window  */
          }
        }
      }
    setup();                                                      /* Add to the menu                      */
    }                                                             /* end main                             */

/****************************************/
/* Call from Hook Processor             */
/****************************************/
int run(int f_id, string mode, handle window) {
    handle              edit_window;                              /* active edit window                   */
    handle              text;                                     /* mapped text object                   */
    handle              sgml;                                     /* sgml object of page view             */
    handle              log;                                      /* log file                             */
    string              props[];                                  /* css properties of element            */
    string              element;                                  /* current SGML element                 */
    int                 sx,sy,ex,ey;                              /* positions of text                    */
    int                 errors;                                   /* number of errors                     */
    int                 ix;                                       /* counter                              */
    int                 rc;                                       /* return code                          */
    dword               type;                                     /* type of window                       */

    if (mode!="postprocess"){                                     /* if not postprocess                   */
      return ERROR_NONE;                                          /* return without error                 */
      }
    if(IsWindowHandleValid(window) == false){                     /* if the window is not valid           */
      edit_window = GetActiveEditWindow();                        /* get handle to active edit window     */
      }
    else{                                                         /* if passed a valid window handle      */
      edit_window = window;                                       /* set edit window to passed window hdl */
      }
    if(IsError(edit_window)){                                     /* if there is no usable edit window    */
      return ERROR_NONE;                                          /* return                               */
      }
    type = GetEditWindowType(edit_window) & EDX_TYPE_ID_MASK;     /* get the type of the window           */
    if (type!=EDX_TYPE_PSG_PAGE_VIEW && type!=EDX_TYPE_PSG_TEXT_VIEW){ /* make sure type is HTML or Code  */
      return ERROR_NONE;                                          /* return without error                 */
      }
    errors = 0;                                                   /* reset errors                         */
    ix = 0;                                                       /* reset counter                        */
    log = LogCreate("Validate spacing");                          /* create a log file                    */
    AddMessage(log,"Validating spacing...");                      /* add start message to log             */
    LogIndent(log);                                               /* indent the log                       */
    text = GetEditObject(edit_window);                            /* get text of window                   */
    LogSetWindow(log, edit_window);                               /* set the edit window for the log      */
    sgml = SGMLCreate(text);                                      /* get SGML object                      */
    element = SGMLNextElement(sgml);                              /* get the first SGML element           */
    while (element!=""){                                          /* for all SGML elements                */
      props = CSSGetProperties(sgml);                             /* get properties of element            */
      sx = SGMLGetItemPosSX(sgml);                                /* get start pos x                      */
      sy = SGMLGetItemPosSY(sgml);                                /* get start pos y                      */
      ex = SGMLGetItemPosEX(sgml);                                /* get end pos x                        */
      ey = SGMLGetItemPosEY(sgml);                                /* get end pos y                        */
      errors+= test_val(props["text-indent"], log, sx,sy,ex,ey);  /* test if error with text-indent       */
      errors+= test_val(props["margin"], log, sx,sy,ex,ey);       /* test if error with margin            */
      errors+= test_val(props["margin-top"], log, sx,sy,ex,ey);   /* test if error with margin-top        */
      errors+= test_val(props["margin-bottom"], log, sx,sy,ex,ey);/* test if error with margin-bottom     */
      errors+= test_val(props["margin-left"], log, sx,sy,ex,ey);  /* test if error with margin-left       */
      errors+= test_val(props["margin-right"], log, sx,sy,ex,ey); /* test if error with margin-right      */
      element = SGMLNextElement(sgml);                            /* get the next SGML element            */
      }
    if(errors > 0 ){                                              /* if we have errors                    */
      MessageBox('i',ISSUES_WRN);                                 /* warn user                            */
      LogDisplay(log);                                            /* display the log                      */
      }
    CloseHandle(sgml);                                            /* close the SGML object                */
    CloseHandle(text);                                            /* close the text object                */
    CloseHandle(edit_window);                                     /* close the edit window handle         */
    CloseHandle(log);                                             /* close handle to log                  */
    return ERROR_NONE;                                            /* return no error                      */
    }                                                             /* end run                              */

/****************************************/
/* test the given value                 */
/****************************************/
int test_val(string value, handle log, int sx, int sy, int ex, int ey){
    qword               margin;                                   /* margin value                         */
    int                 rc;                                       /* return code                          */

    if(value == ""){                                              /* if no value provided                 */
      return 0;                                                   /* return 0 for no error                */
      }
    else{                                                         /* if given a value                     */
      margin = SGMLStringToValue(value);                          /* get the indent val                   */
      rc = GetLastError();                                        /* get the error                        */
      if(IsError(rc)){                                            /* if error                             */
        LogSetMessageType(LOG_ERROR);                             /* set to error type                    */
        LogSetPosition(sx,sy,ex,ey);                              /* set position of message              */
        AddMessage(log,"Cannot convert measurement value %s",value); /* add conversion message            */
        LogSetMessageType(LOG_NONE);                              /* set to no error type                 */
        return 1;                                                 /* increment error count                */
        }
      margin = SGMLValueToTWIPS(margin);                          /* convert to TWIPS                     */
      if(margin > SPACE_THRESHOLD){                               /* if the margin is too big             */
        LogSetMessageType(LOG_WARNING);                           /* set type to warning                  */
        LogSetPosition(sx,sy,ex,ey);                              /* set position of message              */
        AddMessage(log, "Found margin size %d, line %d",          /* display message                      */
                        margin,sy+1);
        LogSetMessageType(LOG_NONE);                              /* reset message type                   */
        return 1;                                                 /* increment error count                */
        }
      return 0;                                                   /* return no error                      */
      }
    }
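On the point about extending the checks to padding: one alternative to adding more test_val lines is to keep the property names in a list and loop over it. The sketch below shows that idea in plain C rather than Legato. The css_property struct, the CHECKED list, and the too_large_in_inches stand-in for test_val are all hypothetical names invented for this illustration, and the stand-in only understands values given in inches.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One CSS property of an element, e.g. { "margin-left", "21in" }. */
struct css_property { const char *name; const char *value; };

/* Properties to validate. The six from the script come first; the padding
   entries are the hypothetical additions. */
static const char *const CHECKED[] = {
    "text-indent", "margin", "margin-top", "margin-bottom",
    "margin-left", "margin-right",
    "padding", "padding-top", "padding-bottom", "padding-left", "padding-right"
};

/* Stand-in for test_val: flags only values in inches that exceed 5in. */
int too_large_in_inches(const char *value) {
    char *unit;
    double number = strtod(value, &unit);
    return unit != value && strcmp(unit, "in") == 0 && number > 5.0;
}

/* Look a property up by name; an absent property yields "". */
static const char *find_value(const struct css_property *props, size_t count,
                              const char *name) {
    for (size_t i = 0; i < count; i++)
        if (strcmp(props[i].name, name) == 0) return props[i].value;
    return "";
}

/* Run the checker over every listed property and total the problems. */
int count_issues(const struct css_property *props, size_t count,
                 int (*check)(const char *value)) {
    int issues = 0;
    for (size_t i = 0; i < sizeof CHECKED / sizeof CHECKED[0]; i++)
        issues += check(find_value(props, count, CHECKED[i]));
    return issues;
}
```

With this layout, covering another property is a one-word change to the list instead of a new call line, and the per-property comments can no longer drift out of sync with the property being tested.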
Steven Horowitz has been working for Novaworks for over five years as a technical expert with a focus on EDGAR HTML and XBRL. Since the creation of the Legato language in 2015, Steven has been developing scripts to improve the GoFiler user experience. He is currently working toward a Bachelor of Sciences in Software Engineering at RIT and MCC.
Additional Resources
Legato Script Developers LinkedIn Group
Primer: An Introduction to Legato