In this article, we will be discussing error codes and error processing while making the complex simpler and taking the mystery out of error processing.
Friday, February 10. 2017
LDC #21: Understanding and Processing Errors
What Is an Error Anyway?
There are a few things in programming life that would seem intuitively objective but are not. Error detection and processing definitely falls into this category, where events that would seem obvious to test for one programmer are completely ignored by another. Countless books are available on the subject and there is certainly a lot of information available on the web. One thing is for sure: the more you look into error processing, the deeper the rabbit hole can become.
Probably 50% of the code of a well-written program is code for error processing. This is particularly true when dealing with external input over which the programmer has no control. Further, failure to correctly identify and process an error can lead to difficulty determining the nature of the problem or, even worse, corrupted or missing data.
This blog will focus not so much on the art of testing and recovering from errors but rather the tools and information to help the Legato programmer deal with errors.
Let’s start by looking at the types of errors. On the input side, there is simple user data entered and validated. If the data is in error, the program should detect the error and make some sort of intelligent report back to the user. For example, if the user enters a bad filename or path, the program should notify him or her. The response might be a friendly message box explaining the error. Even better, after the program reports the error and the user presses ‘ok’ in the message box, the underlying dialog code could set the keyboard focus on the offending value.
Another error might be something along the lines of loading an internal data file but the file could not be found. In that case, there may be little the end user can do to remedy the situation, short of reinstalling the program or contacting whatever technical support is available.
This leads to the other end of the spectrum on errors. For example, consider a serious run-time parameter error that stops a program from continuing execution. Of course, some errors are beyond graceful recovery or even the control of the running program.
Another aspect of error detection and recovery is whether the program is left in a known and well-defined state after the error. For example, while collecting and validating the contents of a dialog, if we store the information by replacing global variable data and then encounter a user data error, we have effectively corrupted the state of the program. One might regard this as trivial, but if the user then presses ‘cancel’ on the dialog, they have unknowingly changed certain data. Oops!
While debugging and problem solving, failure to detect errors can be disastrous. As another example, loading a data file and not checking for errors, placing the expected data into another object, and then relying on that erroneous data for further work can lead to an afternoon of befuddled curse words. One can find themselves chasing their tail for hours when proper error checking and reporting would immediately indicate the nature of the problem.
Finally, there are myriad styles of detecting and processing errors. Many are dependent on or limited by the underlying language and, like a lot of things, can sometimes be abused by programmers.
To sum up, every action in a program is subject to a range of errors both in seriousness and recoverability. When programming, one should consider the likelihood of the error and what to do to correct it. By the way, sometimes stopping program execution cold is fine and can help later in locating and correcting an issue.
Program Exceptions, the Disappearing Act, and Internal Errors
Legato is an interpreted language. As such, it is exceedingly rare to have an error in a script cause a ‘trap’ or exception. If this occurs on a persistent or repeatable basis, contact technical support. An exception results when a program attempts to access memory or CPU instructions that either do not exist, are outside of the program’s memory scope, or are not allowed at the current level of execution privilege. A more insidious problem is when Windows decides that your program is done running either because it executes erroneous code to ‘exit’ or has an unrecoverable error, such as getting stuck in a paint loop (program paints and generates another paint command while painting or never validates the initial paint request). Some of these conditions can occur within a script and, to the extent practical, are documented within related functions and procedures.
Another commonly seen error for all Windows programs is ‘program not responding’, which results from a program not returning from or responding to Windows in a timely manner. Getting stuck in an infinite loop can result in this condition. While working within the GoFiler IDE, pressing Ctrl+Break will force a script to stop executing.
When internal program errors are displayed, please report them to technical support.
Common Methods of Reporting API Errors
Throughout the history of programming, there have been many methods of reporting API errors from system, library, or user functions. Sometimes the conventions are considerably different between platforms and languages, and sometimes they are mixed within the same paradigm. A good example is the return value of 0. In ANSI C/C++, many functions return 0 as success, while in the Windows API many functions return 0 on failure. For example, take the ‘rename’ (or equivalent) function: on success it returns 0 in C/C++, a non-zero value in the Win32 SDK, and TRUE (1) in PHP. (To add to the confusion, the Windows SDK defines ERROR_SUCCESS as 0, which applies to error codes, not return values.)
Within Legato, all SDK functions that return error values return ERROR_NONE on success (similar to 0 with the top bit as zero) or a formatted error code on failure. Boolean functions return TRUE (1) or FALSE (0) depending on the tested condition.
The formatted error code method allows for a wealth of information to be returned. For example, a file copy can return not only the fact that an error occurred, but also if it was a file error, whether it was the source or destination file, and the Windows error code.
It is also not uncommon to see -1 as an error indicator, particularly when searching for something. For example, the InString function can return a character position for the index of the matching character or -1 if a match could not be made. As an aside, not checking the return value for an error (-1) and then indexing an array with that negative value will result in a runtime error with the script terminating.
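The -1 convention can be sketched outside of Legato. The following is a minimal C illustration, not Legato SDK code: find_substring and char_at_match are hypothetical helpers, and the position convention shown is C's, not necessarily InString's. The point is only that the result is tested before it is used as an index.

```c
#include <string.h>

/* Hypothetical stand-in for a search that reports "not found" as -1. */
static int find_substring(const char *text, const char *pattern) {
    const char *hit = strstr(text, pattern);
    return hit ? (int)(hit - text) : -1;
}

/* Returns the character at the match position, or '\0' when nothing matched. */
static char char_at_match(const char *text, const char *pattern) {
    int pos = find_substring(text, pattern);
    if (pos < 0) {
        return '\0';   /* never index with the error value */
    }
    return text[pos];
}
```

Calling char_at_match with a pattern that does not occur returns the sentinel instead of reading before the start of the string.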
Legato also features a last error code and message. Every SDK function will reset and subsequently set the last error value. The only exceptions are SDK functions that check the last error. For example, the GetLastError function will report the last error but not alter its state.
Here are the basic rules for return values in Legato:
– Functions that return a numeric value (marked as int or dword data types) return either a positive number, including zero, or a formatted error code with the 32nd bit set. Note that -1 will be interpreted as an error in this paradigm. Also note that functions that return 64-bit values will require the programmer to check the last error.
– Functions that return a string value will return an empty string on failure. However, an empty string can be valid in some cases. Therefore, the last error value must be checked, particularly if the return value is critical. For example, the FileToString function can return an empty string meaning either the file could not be loaded or the file was empty. The only way to be sure is to examine the last error code.
– Functions that return an array will return an empty set on failure. This means all dimensions will have no defined elements (the function ArrayGetAxisDepth will return zero). Again, checking the last error code is the easiest method to determine if there was a failure.
– Functions that return a boolean value will return FALSE (0) on both a false condition and on failure. So, like string return values, the last error code must be checked and may contain additional information. For example, when the IsFile function returns FALSE, the last error code will contain the error resulting in the false condition.
– Functions that return a handle will return NULL_HANDLE or 0 on failure. The Windows operating system also uses INVALID_HANDLE_VALUE, the equivalent of -1. However, for all general Legato SDK calls, 0 is considered an invalid handle. Note that some handles can have the 32nd bit set, so formatted error codes are never returned as handle values.
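The ambiguity described in the string rule above (an empty result that may or may not be a failure) can be modeled in a few lines. This is a hedged C sketch rather than Legato: lookup_text, last_call_failed, and the last_error variable are hypothetical stand-ins for an SDK function and the last error value, and only the constant values are taken from this article.

```c
#include <stdint.h>
#include <string.h>

#define ERROR_NONE 0x00000000u
#define ERROR_BIT  0x80000000u
#define ERROR_SOFT 0x80000000u

static uint32_t last_error = ERROR_NONE;   /* models the last error value */

/* Hypothetical lookup: returns the stored text, or "" when the key is unknown. */
static const char *lookup_text(const char *key) {
    static const char *keys[]  = { "title", "notes" };
    static const char *texts[] = { "Report", "" };   /* "notes" is legitimately empty */
    last_error = ERROR_NONE;                         /* reset on every call */
    for (size_t i = 0; i < 2; i++) {
        if (strcmp(keys[i], key) == 0) {
            return texts[i];
        }
    }
    last_error = ERROR_SOFT;                         /* failure: record an error */
    return "";
}

/* Nonzero when the most recent call failed. */
static int last_call_failed(void) {
    return (last_error & ERROR_BIT) != 0;
}
```

Both lookup_text("notes") and lookup_text("missing") return an empty string; only last_call_failed() tells them apart, which is the same reason the last error code must be examined after a call such as FileToString.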
Always read the documentation for the function being used to be sure of the return value.
User defined functions can follow the above rules, but the rules are not enforced by the programming environment. To set the last error code or message, programs can use the SetLastError function prior to returning.
Be advised that the return value is also used by the application. This is particularly true when returning from procedures and hooks.
Testing For an Error
Legato has four functions to deal with the last error or testing a numeric value as an error. The first is the GetLastError function. It will return the last error’s error code. While rare, some SDK functions may also use the last error code to return additional information about the success of the operation. An example would be the HTTPGetString function that will set the HTTP response code as the last error code even on success. Many internet-related messages will also set the last error message, which may contain data from the connected server.
The IsError and IsNotError functions test the provided parameter, the last error, or both, depending on the data type supplied. If no parameter is supplied, only the last error code is tested. If the value is numeric, the numeric value is tested. For other return values, both the data and the last error are tested.
The Formatted Error Code
As mentioned above, Legato employs a formatted error code to return error information from most SDK functions. The error structure and defined values can be used for non-SDK functions. We recommend using the codes for all functions that can return an error. A series of predefined SDK terms starting with the prefix ERROR_ are used to define all the masks and types for error bit interpretation.
An error code can convey a wealth of information if you know how to interpret the data.
Generally, the top bit of the 32-bit dword will indicate an error condition. For the most part, functions and programs should avoid returning an integer value with the 32nd bit set. Most SDK functions do not use this bit so as to avoid any issues with interpreting a return value as an error versus a good result. (As a side note, -1 is interpreted as an error since all bits are set, including the 32nd bit.)
The two top bits indicate an error condition and class. There are three possible conditions: (i) ERROR_BIT not set, meaning no error; (ii) ERROR_BIT set and 31st bit not set, which is a soft error (designated as ERROR_SOFT); and (iii) both bits set, which is a hard error (designated as ERROR_FATAL). Having the 31st bit set without the 32nd bit set is not considered an error. Both bits are covered by the ERROR_CLASS_MASK mask.
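The three conditions can be written as a small classifier. This is a minimal sketch in C, not Legato; the constant values are the ones listed in this article's table, while the error_class enum and the classify function are names invented for the illustration.

```c
#include <stdint.h>

/* Values as listed in the article's table. */
#define ERROR_CLASS_MASK 0xC0000000u
#define ERROR_SOFT       0x80000000u
#define ERROR_FATAL      0xC0000000u

typedef enum { CLASS_NONE, CLASS_SOFT, CLASS_FATAL } error_class;

/* Classify a return code using only its two top bits. */
static error_class classify(uint32_t rc) {
    switch (rc & ERROR_CLASS_MASK) {
        case ERROR_SOFT:  return CLASS_SOFT;    /* 32nd bit only */
        case ERROR_FATAL: return CLASS_FATAL;   /* 32nd and 31st bits */
        default:          return CLASS_NONE;    /* 32nd bit clear, including 0x40000000 */
    }
}
```

Under this scheme 0x82000000 classifies as soft and 0xC1000000 as fatal, and a value of -1 (all bits set) lands in the fatal class simply because both top bits are set.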
Soft errors are a class of errors that are generally considered recoverable or are related to natural operation. If a user presses ‘Cancel’ on a dialog box, for example, ERROR_CANCEL will be returned. This is a soft error with the type set as cancel.
On the other hand, fatal errors are exactly that: fatal. Something really bad happened, like running out of memory, a file no longer being available because a drive was unmounted, or perhaps an internal error. Fatal errors are generally not recoverable and require some specific action by the user, including restarting the program or computer.
We will skip the middle bits (24-17) for the moment and talk about the lower bits. For many errors, this will contain a code or some additional information. A good example is a non-fatal file error, generally designated as class and type ERROR_FILE. The lower bits of the code will usually contain error details, such as ‘File Not Found’ or ‘Path Not Found’, defined in the Windows and Legato SDKs as ERROR_FILE_NOT_FOUND and ERROR_PATH_NOT_FOUND, respectively. Common Windows SDK errors are also defined in the Legato SDK. To test for a condition such as this (assuming rc contains our formatted error code):
if ((rc & ERROR_CODE_MASK) == ERROR_FILE_NOT_FOUND) { ... do something ... }
Other SDK functions make use of the code to indicate which parameter was in error or the position of the parameter that contained the error.
Moving back to the center bits, they can convey additional information. For example, when working with multiple files, data type values are provided: ERROR_DT_SOURCE and ERROR_DT_DESTINATION. Using these can help to determine which file parameter is the offending parameter. ERROR_CANCEL can sometimes have the ERROR_CANCEL_NON_ELECTIVE bit set, indicating that the operation was cancelled because of a condition rather than an action. These are combined into ERROR_CANCEL_AUTO. The other bit is ERROR_REPORTED, which indicates the error has been logged or displayed to the user.
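To make the field layout concrete, here is a hedged C sketch (not Legato) that splits a formatted code into its parts. The masks are copied from this article's table; the error_fields struct and decode_error function are hypothetical, and the way the sample value is composed is an assumption about how such a code might look, not a statement of what any SDK function returns.

```c
#include <stdint.h>

/* Values as listed in the article's table. */
#define ERROR_MASK            0xFF000000u
#define ERROR_FILE            0x85000000u
#define ERROR_REPORTED        0x00800000u
#define ERROR_CODE_TYPE_MASK  0x00400000u
#define ERROR_CT_WINDOWS      0x00400000u
#define ERROR_DATA_TYPE_MASK  0x00300000u
#define ERROR_DT_SOURCE       0x00100000u
#define ERROR_DT_DESTINATION  0x00200000u
#define ERROR_CODE_MASK       0x0000FFFFu

typedef struct {
    uint32_t type;       /* class and type, e.g. ERROR_FILE */
    uint32_t data_type;  /* one of the ERROR_DT_* values */
    uint32_t code;       /* detail code from the low word */
    int      reported;   /* nonzero when ERROR_REPORTED is set */
    int      windows;    /* nonzero when the detail code is a Windows API code */
} error_fields;

/* Split a formatted error code into its fields with the table's masks. */
static error_fields decode_error(uint32_t rc) {
    error_fields f;
    f.type      = rc & ERROR_MASK;
    f.data_type = rc & ERROR_DATA_TYPE_MASK;
    f.code      = rc & ERROR_CODE_MASK;
    f.reported  = (rc & ERROR_REPORTED) != 0;
    f.windows   = (rc & ERROR_CODE_TYPE_MASK) == ERROR_CT_WINDOWS;
    return f;
}
```

For instance, decoding ERROR_FILE | ERROR_CT_WINDOWS | ERROR_DT_DESTINATION | 3 (that is, 0x85600003) yields a file error on the destination parameter with Windows code 3, which Windows defines as ERROR_PATH_NOT_FOUND.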
Finally, there is the message information mask or area, which comprises two bits in the top byte. It is used for non-error return values, such as ERROR_MESSAGE and ERROR_MESSAGE_OK. They are actually the same thing, but depending on the action, they may be interpreted differently. The former is used by many menu-related functions to indicate a menu function has been translated to a new code, and this is returned in the lower word. The latter is used by many window message functions to indicate success in processing since a plain window message will return 0 if it cannot be processed or the window cannot be reached.
The following table provides an overview of the various error classes, how they are used, and when to use them:
Definition | Bits | Description | |
---|---|---|---|
Control | |||
ERROR_MASK | 0xFF000000 | Error Code Mask | |
ERROR_CLASS_MASK | 0xC0000000 | Type Error Code Mask | |
ERROR_BIT | 0x80000000 | All Errors Must Have Bit Set | |
Code Types | |||
ERROR_CODE_TYPE_MASK | 0x00400000 | Error Code Type Mask | |
ERROR_CT_LOCAL | 0x00000000 | Code is Local (default) | |
ERROR_CT_WINDOWS | 0x00400000 | Code is Windows API Code | |
Optional Report Information | |||
ERROR_REPORTED | 0x00800000 | Error was Reported/Recorded (by default, all fatal errors are reported at the point of the error) | |
Data Types (apply to parameters) | |||
ERROR_DATA_TYPE_MASK | 0x00300000 | Error Data Type Mask | |
ERROR_DT_GENERAL | 0x00000000 | General Error (default) | |
ERROR_DT_SOURCE | 0x00100000 | Applies to Source Data | |
ERROR_DT_DESTINATION | 0x00200000 | Applies to Destination Data | |
Cancel Expansion | |||
ERROR_CANCEL_MASK | 0x00300000 | Mask for Cancel Type | |
ERROR_CANCEL_ELECTIVE | 0x00000000 | Cancelled At Request of User | |
ERROR_CANCEL_NON_ELECTIVE | 0x00100000 | Cancelled Because of Condition | |
No Error | |||
ERROR_NONE | 0x00000000 | No Error | |
ERROR_NONE_MASK | 0x000FFFFF | No Error Return Value Mask | |
ERROR_MESSAGE_OK | 0x20000000 | No Error (Result for Message) | |
ERROR_NO_REPORT | 0x00000000 | Error not Reported (semantic definition) | |
Inter-Window Messages | |||
ERROR_MESSAGE | 0x20000000 | Error/Result is Message | |
Non-Fatal Class Errors | |||
ERROR_SOFT | 0x80000000 | Class (Soft Error) | |
ERROR_EOD | 0x81000000 | End of Data | |
ERROR_CANCEL | 0x82000000 | Operation was Cancelled | |
ERROR_OVERFLOW | 0x83000000 | Value or String Overflow | |
ERROR_SYNTAX | 0x84000000 | Value or String Syntax Error | |
ERROR_FILE | 0x85000000 | File Windows API Error (with type) | |
ERROR_FUNCTION_NOT_SUPPORTED | 0x86000000 | Function Not Supported | |
ERROR_RANGE | 0x87000000 | Parameter Out of Range | |
ERROR_REMOTE | 0x88000000 | Error from Remote System (Cloud) | |
ERROR_EXIT | 0x89000000 | Function Requests Exit/No error | |
ERROR_CONTEXT | 0x8A000000 | The Context Was Not Correct | |
ERROR_TIME_OUT | 0x8B000000 | A Timeout Occurred in a Routine | |
Fatal Class Errors | |||
ERROR_FATAL | 0xC0000000 | Class (Non-Specific Fatal Error) | |
ERROR_MEMORY | 0xC1000000 | Error Allocating or Locking Memory | |
ERROR_FILE_IO | 0xC2000000 | File Error, Read/Write/Position | |
ERROR_FILE_INTERNAL | 0xC3000000 | File Error, Internal File | |
ERROR_FILE_EXTERNAL | 0xC4000000 | File Error, External File (user) | |
ERROR_WINDOWS_API | 0xC5000000 | Windows API Error (with type) | |
ERROR_PARAMETER | 0xC6000000 | An Invalid Parameter Was Passed | |
ERROR_RESOURCE | 0xC7000000 | Resource Could Not be Found | |
ERROR_CONDITION | 0xC8000000 | Invalid Condition Existed in Routine | |
Error Details | |||
ERROR_CODE_MASK | 0x0000FFFF | Error Code Mask | |
ERROR_FATAL_LOCAL | 0xC0000000 | Fatal Error with a Local Code | |
ERROR_SOFT_LOCAL | 0x80000000 | Soft Error with a Local Code | |
ERROR_CANCEL_AUTO | 0x82100000 | Non-Elective Cancelled (Condition) |
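One detail worth noticing in the table is that ERROR_DATA_TYPE_MASK and ERROR_CANCEL_MASK cover the same bits (0x00300000), so those bits have to be read in light of the error type. The following is a small C sketch of that idea, not Legato code; the constants come from the table and the two helper functions are hypothetical.

```c
#include <stdint.h>

/* Values as listed in the table above. */
#define ERROR_MASK                 0xFF000000u
#define ERROR_CANCEL               0x82000000u
#define ERROR_CANCEL_MASK          0x00300000u
#define ERROR_CANCEL_NON_ELECTIVE  0x00100000u
#define ERROR_CANCEL_AUTO          0x82100000u

/* Nonzero when the code's class and type say "operation was cancelled". */
static int is_cancel(uint32_t rc) {
    return (rc & ERROR_MASK) == ERROR_CANCEL;
}

/* Nonzero only for a cancel raised by a condition rather than by the user. */
static int is_auto_cancel(uint32_t rc) {
    return is_cancel(rc) &&
           (rc & ERROR_CANCEL_MASK) == ERROR_CANCEL_NON_ELECTIVE;
}
```

Checking the type first matters: 0x85100000 has the same middle bit set as ERROR_CANCEL_AUTO, but there it marks a file error on the source data, so is_auto_cancel returns zero for it.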
By convention, fatal errors are always reported within the application, usually as internal errors and usually with a STOP icon. Many Legato functions have an internal ‘silent’ flag set within the host application to avoid reporting errors directly to the user. This keeps an automated script from getting hung up waiting for user input. The script programmer can control the reporting and response.
Testing For Success
Always doing the following is a bad idea:
if (rc != ERROR_NONE) { ... do something ... }
The reason for this is simple: The error condition is truly indicated by ERROR_BIT. While most functions do return just plain 0 or ERROR_NONE on success, some do not and you will be opening the door to problems.
This is more to the point:
if ((rc & ERROR_BIT) != 0) { ... do something ... }
or,
if ((rc & ERROR_MASK) != ERROR_NONE) { ... do something ... }
And this is even easier and more effective:
if (IsError(rc)) { ... do something ... }
or even,
if (IsError()) { ... do something ... }
Since all SDK functions set the last error code, the last example will test only the last error condition.
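The difference between comparing against ERROR_NONE and testing ERROR_BIT can be checked with a few values. This is a C sketch of the numeric test only, under the assumption that the bit test is what matters; it does not reproduce how Legato's IsError is implemented, and naive_is_error and is_error are hypothetical names.

```c
#include <stdint.h>

/* Values as listed in the article's table. */
#define ERROR_NONE        0x00000000u
#define ERROR_BIT         0x80000000u
#define ERROR_MESSAGE_OK  0x20000000u

/* The comparison the article warns against: any nonzero value counts as an error. */
static int naive_is_error(uint32_t rc) {
    return rc != ERROR_NONE;
}

/* The bit test: only values with the 32nd bit set count as errors. */
static int is_error(uint32_t rc) {
    return (rc & ERROR_BIT) != 0;
}
```

With these definitions, ERROR_MESSAGE_OK and an ordinary positive result such as 42 are both misreported by naive_is_error but pass is_error, while a formatted code such as 0x85000002 and a value of -1 are flagged by both.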
Reporting Errors To the User
For scripts running in the application desktop, the normal method of reporting an error is to use a message box. Windows conventionally uses three icons: stop, exclamation, and info.
It is not uncommon to see programs using the stop icon for noncritical messages. This is wrong. Stop should be reserved for when the ca-ca really hits the fan, so to speak. The exclaim icon should be used for most messages. Both messages will also give a specific sound. The information icon is used less frequently, usually for messages that are not errors.
One of the most common errors to process occurs when there is a problem with a user specified file. For example, the user specifies the source file for a data conversion action and it cannot be opened.
handle hFile;
string name;
int rc;

name = "C:\\No Path\\My File.txt";
hFile = OpenFile(name);
if (IsError(hFile)) {
    rc = GetLastError();
    if ((rc & ERROR_CODE_MASK) == ERROR_FILE_NOT_FOUND) {
        MessageBox('x', "Could not find file %s", name);
    }
    else {
        MessageBox('x', "Could not open file %s\r\r%s", name, TranslateWindowsError(rc));
    }
    return rc;
}
The handle is checked with the IsError function and the code is retrieved with the GetLastError function. A special condition is tested for the file not being found and that is reported in a more friendly manner while other errors are displayed using a general box with the error code being translated to a string message using the TranslateWindowsError function.
A simple function combines the error reporting above into one operation:
handle hFile;
string name;
int rc;

name = "C:\\No Path\\My File.txt";
hFile = OpenFile(name);
if (IsError(hFile)) {
    rc = GetLastError();
    ReportFileError(name, rc);
    return rc;
}
The ReportFileError function creates a more friendly message by looking at common problems, like file sharing and path errors.
Let’s look at a variation when validating input from a dialog:
name = EditGetText(MY_FILE_INPUT);
hFile = OpenFile(name);
if (IsError(hFile)) {
    ReportFileError(name, GetLastError());
    return ERROR_SOFT | MY_FILE_INPUT;
}
In this case, we are retrieving the name of the file from an edit control on a dialog page. After checking the error, we report it to the user and then return the ID of the offending control. The dialog validate procedure will automatically stop validating and refocus the keyboard to the dialog control specified by MY_FILE_INPUT. Note the shorthand cheat we used in avoiding the need to use rc for the error or return code as we have before. Remember to use caution with this approach because any SDK function executed prior to calling the GetLastError function will reset the last error code.
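The return value ERROR_SOFT | MY_FILE_INPUT works because the control ID rides in the low word of a soft error. The following C sketch (not Legato) shows that packing and unpacking. The value 201 for MY_FILE_INPUT is made up for the illustration, and both helper functions are hypothetical; how the dialog code actually reads the value is not shown in this article.

```c
#include <stdint.h>

/* Values as listed in the article's table. */
#define ERROR_SOFT       0x80000000u
#define ERROR_CODE_MASK  0x0000FFFFu

#define MY_FILE_INPUT    201u   /* hypothetical control ID, for illustration only */

/* Build a soft error that carries the offending control's ID in the low word. */
static uint32_t validation_failure(uint32_t control_id) {
    return ERROR_SOFT | (control_id & ERROR_CODE_MASK);
}

/* Recover the control ID from such a code. */
static uint32_t failing_control(uint32_t rc) {
    return rc & ERROR_CODE_MASK;
}
```

Here validation_failure(MY_FILE_INPUT) produces 0x800000C9, and failing_control applied to that value gives back 201.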
Conclusion
Hopefully this information can help you build better programs and understand how to detect and report errors. By the way, when you see a nasty blue screen or some other Windows message like 0xC0000005, you can get a basic idea that it means something fatal (0xC0000000) happened, like an access rights violation (0x00000005) (which is most likely ERROR_ACCESS_DENIED).
Until the next blog, I wish you ‘ERROR_SUCCESS’ in Windows-speak or, in Legato, ‘ERROR_NONE’.
Scott Theis is the President of Novaworks and the principal developer of the Legato scripting language. He has extensive expertise with EDGAR, HTML, XBRL, and other programming languages.