LDC #21: Understanding and Processing Errors

Friday, February 10, 2017

In this article, we will be discussing error codes and error processing while making the complex simpler and taking the mystery out of error processing.

What Is an Error Anyway?

There are a few things in programming life that would seem intuitively objective but are not. Error detection and processing definitely fall into this category: conditions that one programmer would consider obvious to test for are completely ignored by another. Countless books are available on the subject and there is certainly a lot of information available on the web. One thing is for sure: the more you look into error processing, the deeper the rabbit hole can become.

Probably 50% of the code of a well-written program is code for error processing. This is particularly true when dealing with external input over which the programmer has no control. Further, failure to correctly identify and process an error can lead to difficulty determining the nature of the problem or, even worse, corrupted or missing data.

This blog will focus not so much on the art of testing and recovering from errors but rather on the tools and information to help the Legato programmer deal with errors.

Let’s start by looking at the types of errors. On the input side, there is simple user data entered and validated. If the data is in error, the program should detect the error and make some sort of intelligent report back to the user. For example, if the user enters a bad filename or path, the program should notify him or her. The response might be a friendly message box explaining the error. Even better, after the program reports the error and the user presses ‘ok’ in the message box, the underlying dialog code could set the keyboard focus on the offending value.

Another error might be something along the lines of failing to load an internal data file because the file could not be found. In that case, there may be little the end user can do to remedy the situation, short of reinstalling the program or contacting whatever technical support is available.

This leads to the other end of the spectrum on errors. For example, consider a serious run-time parameter error that stops a program from continuing execution. Of course, some errors are beyond graceful recovery or even the control of the running program.

Another aspect of error detection and recovery is whether the program is left in a known and well-defined state after the error. For example, while collecting and validating the contents of a dialog, if we store the information by replacing global variable data and then encounter a user data error, we have effectively corrupted the state of the program. One might regard this as trivial, but if the user then presses ‘cancel’ on the dialog, they have unknowingly changed certain data. Oops!

While debugging and problem solving, failure to detect errors can be disastrous. As another example, loading a data file and not checking for errors, placing the expected data into another object, and then relying on that erroneous data for further work can lead to an afternoon of befuddled curse words. One can find themselves chasing their tail for hours when proper error checking and reporting would immediately indicate the nature of the problem.

Finally, there are myriad styles of detecting and processing errors. Many are dependent on or limited by the underlying language and, like a lot of things, can sometimes be abused by programmers.

To sum up, every action in a program is subject to a range of errors both in seriousness and recoverability. When programming, one should consider the likelihood of the error and what to do to correct it. By the way, sometimes stopping program execution cold is fine and can help later in locating and correcting an issue.

Program Exceptions, the Disappearing Act, and Internal Errors

Legato is an interpreted language. As such, it is exceedingly rare to have an error in a script cause a ‘trap’ or exception. If this occurs on a persistent or repeatable basis, contact technical support. An exception results when a program attempts to access memory or CPU instructions that either do not exist, are outside of the program’s memory scope, or are not allowed at the current level of execution privilege. A more insidious problem is when Windows decides that your program is done running either because it executes erroneous code to ‘exit’ or has an unrecoverable error, such as getting stuck in a paint loop (program paints and generates another paint command while painting or never validates the initial paint request). Some of these conditions can occur within a script and, to the extent practical, are documented within related functions and procedures.

Another commonly seen error for all Windows programs is ‘program not responding’, which results from a program not returning from or responding to Windows in a timely manner. Getting stuck in an infinite loop can result in this condition. While working within the GoFiler IDE, pressing Ctrl+Break will force a script to stop executing.

When internal program errors are displayed, please report them to technical support.

Common Methods of Reporting API Errors

Throughout the history of programming, there have been many methods of reporting API errors from system, library, or user functions. Sometimes the conventions are considerably different between platforms and languages, and sometimes they are mixed within the same paradigm. A good example is the return value of 0. In ANSI C/C++, many functions return 0 as success, while in the Windows API many functions return 0 on failure. For example, take the ‘rename’ (or equivalent) function: on success it returns 0 in C/C++, non-zero in the Win32 SDK, and TRUE (1) in PHP. (To add to the confusion, the Windows SDK defines ERROR_SUCCESS as 0, which applies to error codes, not return values.)

Within Legato, all SDK functions that return error values return ERROR_NONE on success (essentially 0, with the top bit clear) or a formatted error code on failure. Boolean functions return TRUE (1) or FALSE (0) depending on the tested condition.

The formatted error code method allows for a wealth of information to be returned. For example, a file copy can return not only the fact that an error occurred, but also if it was a file error, whether it was the source or destination file, and the Windows error code.

It is also not uncommon to see -1 as an error indicator, particularly when searching for something. For example, the InString function can return a character position for the index of the matching character or -1 if a match could not be made. As an aside, not checking the return value for an error (-1) and then indexing an array with that negative value will result in a runtime error with the script terminating.
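The guard described above can be sketched in a few lines. This is plain C rather than Legato, and find_char and char_after_equals are hypothetical helpers written only to mirror the "index or -1" convention; they are not SDK functions.

```c
#include <string.h>

/* Hypothetical stand-in for a search routine that follows the convention
   above: index of the first match, or -1 when nothing matches. */
static int find_char(const char *text, char wanted)
{
    const char *hit = strchr(text, wanted);
    return hit ? (int)(hit - text) : -1;
}

/* Returns the character that follows the first '=' in text, or '\0' when
   there is no '=' (or nothing after it). */
static char char_after_equals(const char *text)
{
    int pos = find_char(text, '=');
    if (pos < 0) {
        return '\0';        /* no match: never index with the -1 */
    }
    return text[pos + 1];   /* at worst this is the terminating '\0' */
}
```

The point is only the ordering: the search result is tested before it is ever used as an index.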

Legato also features a last error code and message. Every SDK function will reset and subsequently set the last error value. The only exceptions are SDK functions that check the last error. For example, the GetLastError function will report the last error but not alter its state.

Here are the basic rules for return values in Legato:

– Functions that return a numeric value (marked as int or dword data types) will return either a non-negative number or a formatted error code with the 32nd bit set. Note that -1 will be interpreted as an error in this paradigm. Also note that functions that return 64-bit values will require the programmer to check the last error.

– Functions that return a string value will return an empty string on failure. However, an empty string can be valid in some cases. Therefore, the last error value must be checked, particularly if the return value is critical. For example, the FileToString function can return an empty string meaning either the file could not be loaded or the file was empty. The only way to be sure is to examine the last error code.

– Functions that return an array will return an empty set on failure. This means all dimensions will have no defined elements (the function ArrayGetAxisDepth will return zero). Again, checking the last error code is the easiest method to determine if there was a failure.

– Functions that return a boolean value will return FALSE (0) on both a false condition and on failure. So, like string return values, the last error code must be checked and may contain additional information. For example, when the IsFile function returns FALSE, the last error code will contain the error resulting in the false condition.

– Functions that return a handle will return NULL_HANDLE or 0 on failure. The Windows operating system also uses INVALID_HANDLE_VALUE, the equivalent of -1. However, for all general Legato SDK calls, 0 is considered an invalid handle. Note that some handles can have the 32nd bit set, so formatted error codes are never returned as handle values.

Always read the documentation for the function being used to be sure of the return value.
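To make the string rule concrete, here is a small model in plain C (not Legato). Everything in it is an assumption made for illustration: g_last_error stands in for the last error code, load_text is a hypothetical loader that knows only two made-up file names, and the failure value is composed by hand from ERROR_FILE and the Windows code 2 (file not found).

```c
#include <stdint.h>
#include <string.h>

#define ERROR_NONE 0x00000000u
#define ERROR_BIT  0x80000000u
#define ERROR_FILE 0x85000000u

/* Stand-in for the last error code (an assumption of this sketch). */
static uint32_t g_last_error = ERROR_NONE;

/* Hypothetical loader: returns "" for an empty file AND for a failure,
   so the return value alone cannot tell the two cases apart. */
static const char *load_text(const char *name)
{
    g_last_error = ERROR_NONE;              /* reset on every call */
    if (strcmp(name, "notes.txt") == 0) return "hello";
    if (strcmp(name, "empty.txt") == 0) return "";
    g_last_error = ERROR_FILE | 2u;         /* 2 = file not found */
    return "";
}

/* The only reliable failure test for the string result. */
static int load_failed(void)
{
    return (g_last_error & ERROR_BIT) != 0;
}
```

Both "empty.txt" and a missing file come back as an empty string; only the last error code separates them.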

User defined functions can follow the above rules, but the rules are not enforced by the programming environment. To set the last error code or message, programs can use the SetLastError function prior to returning.

Be advised that the return value is also used by the application. This is particularly true when returning from procedures and hooks.

Testing For an Error

Legato has four functions for dealing with the last error or testing a numeric value as an error. The first is the GetLastError function, which returns the last error’s code. While rare, some SDK functions may also use the last error code to return additional information about the success of the operation. An example would be the HTTPGetString function that will set the HTTP response code as the last error code even on success. Many internet-related functions will also set the last error message, which may contain data from the connected server.

The IsError and IsNotError functions test the provided parameter, the last error, or both, depending on the data type supplied. If no parameter is supplied, only the last error code is tested. If the value is numeric, the numeric value is tested. For other return values, both the data and the last error are tested.

The Formatted Error Code

As mentioned above, Legato employs a formatted error code to return error information from most SDK functions. The error structure and defined values can be used for non-SDK functions. We recommend using the codes for all functions that can return an error. A series of predefined SDK terms starting with the prefix ERROR_ are used to define all the masks and types for error bit interpretation.

An error code can convey a wealth of information if you know how to interpret the data.

Bitmap description of a formatted error code.

Generally, the top bit of the 32-bit dword will indicate an error condition. For the most part, functions and programs should avoid returning an integer value with the 32nd bit set. Most SDK functions do not use this bit so as to avoid any issues with interpreting a return value as an error versus a good result. (As a side note, -1 is interpreted as an error since all bits are set, including the 32nd bit.)

The two top bits indicate an error condition and class (two bits shown in red). There are three possible conditions: (i) ERROR_BIT not set, meaning no error; (ii) ERROR_BIT set and 31st bit not set, which is a soft error (designated as ERROR_SOFT); and, (iii) both bits set, which is a hard error (designated as ERROR_FATAL). Having the 31st bit set without the 32nd bit set is not considered an error. Both bits are covered by the ERROR_CLASS_MASK mask.
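The three conditions can be sketched as a small classifier. This is plain C rather than Legato; the constant values are copied from the table in this article, while the enum and the classify function are hypothetical names used only for the illustration.

```c
#include <stdint.h>

#define ERROR_CLASS_MASK 0xC0000000u
#define ERROR_SOFT       0x80000000u
#define ERROR_FATAL      0xC0000000u
#define ERROR_CANCEL     0x82000000u   /* a soft error  */
#define ERROR_MEMORY     0xC1000000u   /* a fatal error */

typedef enum { CLASS_NONE, CLASS_SOFT, CLASS_FATAL } error_class;

/* Looks only at the top two bits of the code. */
static error_class classify(uint32_t rc)
{
    switch (rc & ERROR_CLASS_MASK) {
    case ERROR_FATAL: return CLASS_FATAL;  /* both bits set              */
    case ERROR_SOFT:  return CLASS_SOFT;   /* 32nd bit only              */
    default:          return CLASS_NONE;   /* neither, or 31st bit alone */
    }
}
```

Note that a value of -1 (0xFFFFFFFF) has both top bits set, so by its bits alone it lands in the fatal class.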

Soft errors are a class of errors that are generally considered recoverable or are related to natural operation. If a user presses ‘Cancel’ on a dialog box, for example, ERROR_CANCEL will be returned. This is a soft error with the type set as cancel.

On the other hand, fatal errors are exactly that: fatal. Something really bad happened, like running out of memory, a file no longer being available because a drive was unmounted, or perhaps an internal error. Fatal errors are generally not recoverable and require some specific action by the user, including restarting the program or computer.

We will skip the middle bits (24-17) for the moment and talk about the lower bits. For many errors, these will contain a code or some additional information. A good example is a non-fatal file error, generally designated as class and type ERROR_FILE. The lower bits of the code will usually contain error details, such as ‘File Not Found’ or ‘Path Not Found’, defined in the Windows and Legato SDKs as ERROR_FILE_NOT_FOUND and ERROR_PATH_NOT_FOUND, respectively. Common Windows SDK errors are also defined in the Legato SDK. To test for a condition such as this (assuming rc contains our formatted error code):

if ((rc & ERROR_CODE_MASK) == ERROR_FILE_NOT_FOUND) { ... do something ... }

Other SDK functions make use of the code to indicate which parameter was in error or the position of the parameter that contained the error.
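A user-defined function can follow the same idea. The sketch below is plain C, not Legato, and percent_done is a hypothetical helper: it returns a normal non-negative result, or ERROR_RANGE with the position of the offending parameter in the low word. Numbering the parameters from 1 is an assumption of this sketch, not a documented SDK convention.

```c
#include <stdint.h>

#define ERROR_BIT       0x80000000u
#define ERROR_RANGE     0x87000000u   /* Parameter Out of Range */
#define ERROR_CODE_MASK 0x0000FFFFu

/* Hypothetical helper: percentage (0..100) of done out of total.
   On bad input the low word carries the 1-based parameter position. */
static uint32_t percent_done(int done, int total)
{
    if (total <= 0) {
        return ERROR_RANGE | 2u;              /* second parameter is bad */
    }
    if (done < 0 || done > total) {
        return ERROR_RANGE | 1u;              /* first parameter is bad  */
    }
    /* 64-bit intermediate so done * 100 cannot overflow */
    return (uint32_t)(((int64_t)done * 100) / total);
}
```

A caller tests the top bit first and only then reads the low word, for example (rc & ERROR_CODE_MASK) == 2 to learn that total was the problem.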

Moving back to the center bits, they can convey additional information. For example, when working with multiple files, data type values are provided: ERROR_DT_SOURCE and ERROR_DT_DESTINATION. Using these can help to determine which file parameter is the offending parameter. ERROR_CANCEL can sometimes have the ERROR_CANCEL_NON_ELECTIVE bit set, indicating that the operation was cancelled because of a condition rather than an action. These are combined into ERROR_CANCEL_AUTO. The other bit is ERROR_REPORTED, which indicates the error has been logged or displayed to the user.
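Pulling the pieces apart is just a matter of applying the masks. The following is a sketch in plain C, not Legato. The mask values are copied from the table in this article, but the error_fields structure, the decode function, and SAMPLE_COPY_ERROR are hypothetical, and the sample value is composed by hand rather than captured from a real call.

```c
#include <stdint.h>

#define ERROR_MASK            0xFF000000u
#define ERROR_CODE_TYPE_MASK  0x00400000u
#define ERROR_CT_WINDOWS      0x00400000u
#define ERROR_REPORTED        0x00800000u
#define ERROR_DATA_TYPE_MASK  0x00300000u
#define ERROR_DT_DESTINATION  0x00200000u
#define ERROR_CODE_MASK       0x0000FFFFu
#define ERROR_FILE            0x85000000u

/* Hand-built sample: file error, Windows code 3 (path not found),
   applying to the destination file. Equals 0x85600003. */
#define SAMPLE_COPY_ERROR \
    (ERROR_FILE | ERROR_CT_WINDOWS | ERROR_DT_DESTINATION | 3u)

typedef struct {
    uint32_t type;            /* class and type, e.g. ERROR_FILE        */
    uint32_t data_type;       /* general, source, or destination        */
    int      is_windows_code; /* low word is a Windows API code         */
    int      reported;        /* already logged or shown to the user    */
    uint32_t code;            /* detail code in the low word            */
} error_fields;

static error_fields decode(uint32_t rc)
{
    error_fields f;
    f.type            = rc & ERROR_MASK;
    f.data_type       = rc & ERROR_DATA_TYPE_MASK;
    f.is_windows_code = (rc & ERROR_CODE_TYPE_MASK) == ERROR_CT_WINDOWS;
    f.reported        = (rc & ERROR_REPORTED) != 0;
    f.code            = rc & ERROR_CODE_MASK;
    return f;
}
```

Keep in mind that the data type bits and the cancel expansion bits share the same positions, so they should be read according to the error type.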

Finally, there is the message information mask or area, which comprises two bits in the top byte. This can be used as non-error return values, such as ERROR_MESSAGE and ERROR_MESSAGE_OK. They are actually the same thing, but depending on the action, they may be interpreted differently. The former is used by many menu-related functions to indicate a menu function has been translated to a new code, and this is returned in the lower word. The latter is used by many window message functions to indicate success in processing since a plain window message will return 0 if it cannot be processed or the window cannot be reached.

The following table provides an overview of the various error classes, how they are used, and when to use them:

Definition		Bits	Description
Control
	ERROR_MASK	0xFF000000	Error Code Mask
	ERROR_CLASS_MASK	0xC0000000	Type Error Code Mask
	ERROR_BIT	0x80000000	All Errors Must Have Bit Set
Code Types
	ERROR_CODE_TYPE_MASK	0x00400000	Error Code Type Mask
	ERROR_CT_LOCAL	0x00000000	Code is Local (default)
	ERROR_CT_WINDOWS	0x00400000	Code is Windows API Code
Optional Report Information
	ERROR_REPORTED	0x00800000	Error was Reported/Recorded (by default, all fatal errors are reported at the point of the error)
Data Types (apply to parameters)
	ERROR_DATA_TYPE_MASK	0x00300000	Data Type Mask
	ERROR_DT_GENERAL	0x00000000	General Error (default)
	ERROR_DT_SOURCE	0x00100000	Applies to Source Data
	ERROR_DT_DESTINATION	0x00200000	Applies to Destination Data
Cancel Expansion
	ERROR_CANCEL_MASK	0x00300000	Mask for Cancel Type
	ERROR_CANCEL_ELECTIVE	0x00000000	Cancelled At Request of User
	ERROR_CANCEL_NON_ELECTIVE	0x00100000	Cancelled Because of Condition
No Error
	ERROR_NONE	0x00000000	No Error
	ERROR_NONE_MASK	0x000FFFFF	No Error Return Value Mask
	ERROR_MESSAGE_OK	0x20000000	No Error (Result for Message)
	ERROR_NO_REPORT	0x00000000	Error not Reported (semantic definition)
Inter-Window Messages
	ERROR_MESSAGE	0x20000000	Error/Result is Message
Non-Fatal Class Errors
	ERROR_SOFT	0x80000000	Class (Soft Error)
	ERROR_EOD	0x81000000	End of Data
	ERROR_CANCEL	0x82000000	Operation was Cancelled
	ERROR_OVERFLOW	0x83000000	Value or String Overflow
	ERROR_SYNTAX	0x84000000	Value or String Syntax Error
	ERROR_FILE	0x85000000	File Windows API Error (with type)
	ERROR_FUNCTION_NOT_SUPPORTED	0x86000000	Function Not Supported
	ERROR_RANGE	0x87000000	Parameter Out of Range
	ERROR_REMOTE	0x88000000	Error from Remote System (Cloud)
	ERROR_EXIT	0x89000000	Function Requests Exit/No error
	ERROR_CONTEXT	0x8A000000	The Context Was Not Correct
	ERROR_TIME_OUT	0x8B000000	A Timeout Occurred in a Routine
Fatal Class Errors
	ERROR_FATAL	0xC0000000	Class (Non-Specific Fatal Error)
	ERROR_MEMORY	0xC1000000	Error Allocating or Locking Memory
	ERROR_FILE_IO	0xC2000000	File Error, Read/Write/Position
	ERROR_FILE_INTERNAL	0xC3000000	File Error, Internal File
	ERROR_FILE_EXTERNAL	0xC4000000	File Error, External File (user)
	ERROR_WINDOWS_API	0xC5000000	Windows API Error (with type)
	ERROR_PARAMETER	0xC6000000	An Invalid Parameter Was Passed
	ERROR_RESOURCE	0xC7000000	Resource Could Not be Found
	ERROR_CONDITION	0xC8000000	Invalid Condition Existed in Routine
Error Details
	ERROR_CODE_MASK	0x0000FFFF	Error Code Mask
	ERROR_FATAL_LOCAL	0xC0000000	Fatal Error with a Local Code
	ERROR_SOFT_LOCAL	0x80000000	Soft Error with a Local Code
	ERROR_CANCEL_AUTO	0x82100000	Non-Elective Cancelled (Condition)

By convention, fatal errors are always reported within the application, usually as internal errors and usually with a STOP icon. Many Legato functions have an internal ‘silent’ flag set within the host application to avoid reporting errors directly to the user. This keeps an automated script from getting hung up waiting for user input. The script programmer can control the reporting and response.

Testing For Success

Always doing the following is a bad idea:

if (rc != ERROR_NONE) { ... do something ... }

The reason for this is simple: The error condition is truly indicated by ERROR_BIT. While most functions do return just plain 0 or ERROR_NONE on success, some do not and you will be opening the door to problems.

This is more to the point:

if ((rc & ERROR_BIT) != 0) { ... do something ... }

or,

if ((rc & ERROR_CLASS_MASK) != ERROR_NONE) { ... do something ... }

And this is even easier and more effective:

if (IsError(rc)) { ... do something ... }

or even,

if (IsError()) { ... do something ... }

Since all SDK functions set the last error code, the last example will test only the last error condition.
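To see why comparing against ERROR_NONE misfires, here is a small comparison in plain C (not Legato). The constants come from the table in this article; naive_is_error and has_error_bit are hypothetical helpers that model only the numeric test, not the SDK's IsError function.

```c
#include <stdint.h>

#define ERROR_NONE       0x00000000u
#define ERROR_BIT        0x80000000u
#define ERROR_MESSAGE_OK 0x20000000u
#define ERROR_CANCEL     0x82000000u

/* The test the article warns against: anything non-zero is an "error". */
static int naive_is_error(uint32_t rc)
{
    return rc != ERROR_NONE;
}

/* The test the article recommends: only the top bit matters. */
static int has_error_bit(uint32_t rc)
{
    return (rc & ERROR_BIT) != 0;
}
```

A function that succeeds and returns a count of 5, or a message function that returns ERROR_MESSAGE_OK, trips the naive test but not the bit test, while ERROR_CANCEL and -1 are caught by both.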

Reporting Errors To the User

For scripts running in the application desktop, the normal method of reporting an error is to use a message box. Windows conventionally uses three icons: stop, exclamation, and info:

An example of a 'stop' error message box.

An example of an 'exclamation' error message box.

An example of an 'information' message box.

It is not uncommon to see programs using the stop icon for noncritical messages. This is wrong. Stop should be reserved for when the ca-ca really hits the fan, so to speak. The exclaim icon should be used for most messages. Both messages will also give a specific sound. The information icon is used less frequently, usually for messages that are not errors.

One of the most common errors to process occurs when there is a problem with a user-specified file. For example, the user specifies the source file for a data conversion action and it cannot be opened.

handle     hFile;
string     name;
int        rc;

name = "C:\\No Path\\My File.txt";

hFile = OpenFile(name);
if (IsError(hFile)) {
  rc = GetLastError();
  if ((rc & ERROR_CODE_MASK) == ERROR_FILE_NOT_FOUND) {
    MessageBox('x', "Could not find file %s", name);
    }
  else {
    MessageBox('x', "Could not open file %s\r\r%s", name, TranslateWindowsError(rc));
    }
  return rc;
  }

The handle is checked with the IsError function and the code is retrieved with the GetLastError function. A special condition is tested for the file not being found and that is reported in a more friendly manner while other errors are displayed using a general box with the error code being translated to a string message using the TranslateWindowsError function.

A simple function combines the error reporting above into one operation:

handle      hFile;
string      name;
int         rc;

name = "C:\\No Path\\My File.txt";

hFile = OpenFile(name);
if (IsError(hFile)) {
  rc = GetLastError();
  ReportFileError(name, rc);
  return rc;
  }

The ReportFileError function creates a more friendly message by looking at common problems, like file sharing and path errors:

An example of ReportFileError message box showing 'file not found'.

or, a different path:

An example of ReportFileError message box showing a general windows error.

Let’s look at a variation when validating input from a dialog:

name = EditGetText(MY_FILE_INPUT);
hFile = OpenFile(name);
if (IsError(hFile)) {
  ReportFileError(name, GetLastError());
  return ERROR_SOFT | MY_FILE_INPUT;
  }

In this case, we are retrieving the name of the file from an edit control on a dialog page. After checking the error, we report it to the user and then return the ID of the offending control. The dialog validate procedure will automatically stop validating and refocus the keyboard to the dialog control specified by MY_FILE_INPUT. Note the shorthand cheat we used in avoiding the need to use rc for the error or return code as we have before. Remember to use caution with this approach because any SDK function executed prior to calling the GetLastError function will reset the last error code.

Conclusion

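Hopefully this information can help you build better programs and understand how to detect and report errors. By the way, when you see a nasty blue screen or some other Windows message like 0xC0000005, you can get a basic idea of what it means by reading it the same way: the top bits (0xC0000000) say something fatal happened, and the low code (0x00000005) points to an access problem. (Strictly speaking, 0xC0000005 is the Windows status code STATUS_ACCESS_VIOLATION, an invalid memory access, rather than the Win32 error ERROR_ACCESS_DENIED, which is plain 5.)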

Until the next blog, I wish you ‘ERROR_SUCCESS’ in Windows-speak or, in Legato, ‘ERROR_NONE’.

Scott Theis is the President of Novaworks and the principal developer of the Legato scripting language. He has extensive expertise with EDGAR, HTML, XBRL, and other programming languages.