Scripts can share a data pool that transcends the life of a script and can therefore be shared by multiple other scripts. There are other methods of storing what we usually call “meta” data, but session data is specifically designed to allow scripts to communicate and share information over time and threads. In this blog, we will explore session data and, as a side discussion, atomic operations.
Friday, January 25. 2019
LDC #120: Quickly Sharing and Managing Data Between Scripts
Introduction
There are a number of use cases that require sharing data among scripts. This could include simple operations used by a hook to say “hey, I did this” or “this is my result”. Another area where sharing information can be useful is coordinating operations or splitting up work amongst background scripts. There are a couple of other ways to do this, such as using program settings or writing and reading control files. These are slow but, at least with program settings, they are thread safe.
Legato has a Session Data Manager, and a series of API functions are provided to read data from it and write data to it. The Session Data Manager is shared by all scripts running within a specific application instance. Data is named and can be grouped. Setting and getting data is facilitated through a handful of functions.
Session Data belongs to the application instance. If another instance of the application is opened, the new instance’s session data is isolated from the first instance. All scripts in the instance can access the data, and the data lives until the Session Data Manager is reset or the application instance is shut down.
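As a minimal sketch of that round trip, a value can be stored by name and read back by any script in the same application instance. The name "LastResult" and its text are made up for illustration; only SetSessionValue, GetSessionString and AddMessage are taken from the scripts in this post.

```legato
int main() {
    string s1;

    // Writer side: publish a value under a name ("LastResult" is hypothetical)
    SetSessionValue("LastResult", "hook completed");

    // Reader side: this call could just as well live in a different script
    // that runs later in the same application instance
    s1 = GetSessionString("LastResult");
    AddMessage("LastResult = %s", s1);
    return 0;
}
```

Because the entry outlives the script, running only the reader half afterward should still find the value until it is deleted or the manager is reset.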
The manager consists of the internal string pool tool used by Pool Object functions within Legato. Value names are stored in an index, which is used to track string segments in the larger pool. A new name is added to the index each time data is set using a number of different functions. The index entry and its associated string pool segment will remain until the manager is reset or the entry is deleted.
Meta data is part of each index entry for debugging purposes. This includes a time stamp based on the system tick count and the last handle to modify that entry.
Session data has been demonstrated in a number of previous blogs such as Dave’s RSS Monitor and Running in the Background Part 1 and Part 2. This blog will perform a deep dive into the API functions and the atomic aspects of managing session data.
Space Management and Performance
For typical operations, programmers generally don’t have to worry about space management and performance. The Session Data Manager is designed for light use without a lot of data turnover. As such, programmers should avoid repeatedly storing large strings or thousands of index entries. Optimization is planned for a future release.
Sample Scripts
To demonstrate how session data can be used to communicate between scripts, we will be working with a set of test scripts that essentially perform a shell game using session variables. The first script, “Shell Swap Main.ls”, creates the session variable names and then runs multiple threads, “Shell Swap Worker.ls”. Each thread then attempts to capture the shell and logs the swap.
Shell Swap Main, the script that you run directly, begins by setting the named session values Collisions, Owner, Stop, Swaps and Shell. This serves either to create the names, or if they have already been created, set them to a known state. Collisions is used to track errors, Owner is the handle of the thread that currently owns the shell, Stop is used to force all threads to stop and exit, Swaps is the number of times the shell changed hands, and finally Shell is the actual shell.
The game works as follows: a background thread attempts to grab the Shell. If it succeeds, the Shell session value is reset, Swaps is incremented, and the shell is put back for the next grab. Essentially, each background thread repeatedly attempts to grab the shell, adds tracking data, sleeps a little, and then releases the shell.
“Shell Swap Main.ls” is as follows:
//
//      Shell Swap
//      ----------
//
//      This program manages and monitors the swap workers.
//

// Parameters
#define RUN_TIME        20000
#define RUN_THREADS     12

int main() {

    handle      workers[RUN_THREADS];
    string      s1, s2;
    int         ix, et, cc, l_v;
    int         swaps;
    int         rc;

    // Set Named Values to Known State
    SetSessionValue("Collisions", 0);
    SetSessionValue("Owner", "");
    SetSessionValue("Stop", FALSE);
    SetSessionValue("Swaps", 0);
    SetSessionValue("Shell", TRUE);             // Must set value for 'Shell'

    SetUnwindFunction("unwind_threads");

    ProgressOpen("Critical Section Test", 0);

    // Run Background Threads
    s1 = GetScriptFolder() + "Shell Swap Worker.ls";
    ix = 0;
    while (ix < RUN_THREADS) {
        workers[ix] = RunBackgroundScript(s1);
        if (IsError(workers[ix])) {
            MessageBox("Error %08X starting script", GetLastError());
            exit;
        }
        ix++;
    }

    // Monitors for x Seconds
    et = GetElapsedTime();
    while (et < RUN_TIME) {
        if (l_v != ((RUN_TIME - et) / 100)) {
            s1 = GetSessionString("Owner");
            s2 = GetSessionString("Shell");
            ProgressUpdate(et, RUN_TIME);
            l_v = (RUN_TIME - et) / 100;
            ProgressSetStatus(1, "%d of %d secs", et / 1000, RUN_TIME / 1000);
            ProgressSetStatus(2, "Handle: %s Token: %s", s1, s2);
        }
        Sleep(50);
        et = GetElapsedTime();
    }

    // Stop Workers
    AddMessage("Script Thread Activity:");
    SetSessionValue("Stop", TRUE);
    ix = 0;
    while (ix < RUN_THREADS) {
        rc = WaitForObject(workers[ix], 10000);
        if (IsError(rc)) {
            MessageBox("Error %08X waiting for %d script", rc, ix);
            exit;
        }
        s1 = FormatString("0x%08X", workers[ix]);
        s2 = GetSessionString(s1);
        AddMessage(" %2d %s - %s", ix, s1, s2);
        swaps += TextToInteger(s2);
        ix++;
    }

    // Test and Display Results
    s1 = GetSessionString("Shell");
    s2 = GetSessionString("Swaps");
    cc = GetSessionInteger("Collisions");
    if ((s1 != "1") || (cc != 0)) {
        MessageBox('x', "FAIL! FAIL! FAIL!\r\r%d Collisions", cc);
    }
    else {
        AddMessage("Pass - %s/%d swaps", s2, swaps);
    }
    return 0;
}

// in the event of the run-time error
void unwind_threads() {
    SetSessionValue("Stop", TRUE);
}
Continuing with the main program, there are three code sections after the initialization. The first runs the specified number of background threads as directed by the RUN_THREADS define. Note that as the number of threads exceeds the available processor cores and hyper threads, your computer may chatter and be intermittently unresponsive as the number of threads using a mutex expands. We’ll talk about this more later.
The second section loops for the time specified by the RUN_TIME define (in milliseconds). The loop refreshes the displayed handle and token roughly every tenth of a second. Note that this cannot monitor each swap; it only shows the condition at the time of the observation.
The third section stops the threads and waits for each worker thread to exit using the WaitForObject function. This actually completes very quickly once the Stop session value is set.
Finally, the last bit looks at the result.
An unwind function is attached to set the stop flag in the event of some error or run time error. Otherwise, in the event of an error, the worker threads could pile up and consume the computer’s processor. Note that the unwind operation is not always executed if you stop the script using the IDE Ctrl+Break key. Running the script again will stop the workers (it may also result in a fail message box since the data and workers were left in an undefined state). The worker thread, as shown below, also will timeout to accommodate a failure.
The code for the worker threads, “Shell Swap Worker.ls”, is as follows:
//
//      Atomic 'Test and Set' Test -- Shell Swap Worker
//      -----------------------------------------------
//
//      This must be run in the background by Shell Swap Main.ls
//

#define THREAD_SAFE                             // Removing demonstrates failure

handle      hThread;
string      s1, s2;
int         rc, r, stop, count;

// Setup
if (IsScriptInBackground() == FALSE) {
    MessageBox('x', "Script must be called as background script.");
    exit;
}
hThread = GetThreadHandle();

// Loop and Try to Capture Shell
stop = GetSessionInteger("Stop");
while ((stop == 0) && (count < 10000)) {

    if (GetElapsedTime() > 60000) { break; }    // Keep from running too long

    // ** Test/Swap Section

#ifndef THREAD_SAFE
    // Non-Thread Safe 'Test and Set'
    rc = GetSessionInteger("Shell");
    if (rc == FALSE) {
        stop = GetSessionInteger("Stop");
        continue;
    }
    SetSessionValue("Shell", FALSE);
#endif

#ifdef THREAD_SAFE
    // Thread Safe
    rc = TestAndSetSessionValue("Shell", TASSV_MATCH, TRUE, FALSE);
    if (rc == FALSE) {
        stop = GetSessionInteger("Stop");
        continue;
    }
#endif

    // ** Critical Section
    // -- Only one thread can be here at any time

    // Update Swaps
    r = GetSessionInteger("Swaps");
    r++;
    SetSessionValue("Swaps", r);

    // Update Thread Stats
    s1 = FormatString("0x%08X", hThread);
    SetSessionValue("Owner", s1);
    count++;
    SetSessionValue(s1, count);

    // Sleep to simulate work
    r = Random() % 10;
    r *= 10;
    Sleep(r);

    // Check to make sure we still own shell
    s2 = GetSessionString("Owner");
    if (s1 != s2) {
        r = GetSessionInteger("Collisions");
        SetSessionValue("Collisions", r+1);
    }

    // Pass Back Shell
    SetSessionValue("Shell", TRUE);
    stop = GetSessionInteger("Stop");
}
The first thing you will notice is a define called THREAD_SAFE. This is used to illustrate what happens when the programs are run in a manner that is not thread safe. Again, we’ll talk more about that in a bit. Our define is followed by a little “idiot” test to verify that the script is being run as a background thread.
The main loop runs until one of three things happens: the session variable Stop is set to TRUE, the worker has swapped 10000 times, or more than 60 seconds have elapsed since the worker thread started. The guts of the loop are divided into two parts: the Test/Swap and the Critical Section.
Test/Swap has two versions of code to get the shell token, the first not being thread safe and the second being thread safe. By default, the second is employed. If you rename or comment out THREAD_SAFE, the main test will fail because the test and set for the shell are not atomic. There is more on that in the section below on atomic operations.
Assuming we get the shell token, the code enters the Critical Section, i.e., only one worker should be executing this code at any given time. It increments Swaps, makes a string version of the thread handle and places it in Owner, increments its count and sets it into a session variable named by its thread handle. The script then sleeps for a random amount of time, pretending to do work to emulate the randomness of various tasks. It then retrieves the Owner and compares it to its handle string. If they do not match, an error occurred in the process because another worker thread modified the owner. Finally, it places the token, TRUE, into Shell. The Stop flag is retrieved and the loop continues.
Copy each of these scripts, put them into a folder, and name them “Shell Swap Main.ls” and “Shell Swap Worker.ls”, respectively. Note that the script containing the worker thread must be named correctly since it is directly referenced from the main program. When you press F7 (run), the progress window appears showing a 20 second swap.
Upon completion, the log lists each worker thread handle value with its swap count, along with the total swaps. If there are any conflicts, a message box will appear.
How is Data Accessed
Since session data is shared among scripts, programmers must be cognizant that their actions can potentially interfere with other scripts. When running in a closed environment where all the scripts are known and controlled by a single developer, conflicts are less likely to occur. However, in a larger environment, there may be many scripts running and relying on session data. In these cases, simply resetting the session data could be disastrous.
As shown in the Introduction, data is stored in a shared pool using named entries in an internal index. The default and required key is the name string. To help organize the data, to separate differing scripts, and to avoid conflicts, optional group and section strings can be used. The strings are internally limited to 128 characters.
When a value is set, tested, incremented or decremented, an entry is added and the name, group, and section are registered. The index entry exists until deleted or the session manager is reset.
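The sketch below shows one way the optional group and section strings might be used to keep a script's entries apart from everyone else's. The group "Filer", section "Queue" and name "Pending" are hypothetical; the parameter order follows the prototypes given in this post.

```legato
int main() {
    int pending;

    // Hypothetical group "Filer" and section "Queue" keep the name "Pending"
    // from colliding with another script's "Pending"
    SetSessionValue("Filer", "Queue", "Pending", 5);

    pending = GetSessionInteger("Filer", "Queue", "Pending");
    AddMessage("Pending items: %d", pending);

    // Remove the index entry once it is no longer needed
    DeleteSessionValue("Filer", "Queue", "Pending");
    return 0;
}
```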
Introduction to Atomic Operations
Our sample is almost as much about multiple thread operations as it is about session data. If you engage in complex programming and wish to be able to perform coordinated operations in the background, you need to understand this aspect of programming. This section discusses only one aspect of atomic operations.
During linear operations, your program will run only in a single thread. That thread will own and control all of its data components. As soon as you start background threads that share data, you may run into a problem in terms of when a particular thread “owns” a particular state or variable. Under the hood, the program cannot control exactly when it will be run and when it will be interrupted and paused for another operation. This is further complicated in today’s multi-core processor environments where threads may not be sharing execution but actually processing at the same time. What this means is that one thread can test a variable and, in the time it takes to load, test and then set the value, another thread may already have set the variable to something else. Thus both threads think they have permission or have the latest version of the data. In low-level assembly language, this may only take a handful of processor cycles, but there is still a chance of a conflict. In a higher-level interpreted language like Legato, a thousand instructions may be processed to perform such an action, so the chance of a conflict rises. Further, when there is a conflict, the results can be unpredictable and difficult to duplicate.
This is a well-worn, age-old programming difficulty that is solved by atomic operations. An atomic operation is one that cannot be interrupted in a manner that might cause a problem. For Legato session data, this involves the TestAndSetSessionValue function.
int = TestAndSetSessionValue ( [string group], [string section], string name,
dword flags, param match, param data );
The function has four main parameters: the name of the value, the test scheme as set by flags, match pattern, and data to which to set the name value in the event a match occurs.
The flags parameter can be either:
TASSV_MATCH (0x00000000) — The match string must match.
TASSV_NO_MATCH (0x00000001) — The match string does not match.
The match and data parameters can be a string or integer-based value, but both must be the same type. If an integer type is provided, the match is performed based on the translated string value of the integer.
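To make the two flags concrete, here is a small sketch using string values. The name "State" and the values "idle", "busy" and "done" are invented for illustration, and the return value is interpreted the same way the worker script does (FALSE meaning the test failed and nothing was set), which is an assumption rather than a documented guarantee.

```legato
int main() {
    int rc;

    SetSessionValue("State", "idle");           // hypothetical name and values

    // TASSV_MATCH: change "idle" to "busy" only if the value is still "idle"
    rc = TestAndSetSessionValue("State", TASSV_MATCH, "idle", "busy");
    if (rc == FALSE) {
        AddMessage("State was not idle, nothing changed");
    }
    else {
        AddMessage("State claimed");
    }

    // TASSV_NO_MATCH: set "done" only if the value is not already "done"
    rc = TestAndSetSessionValue("State", TASSV_NO_MATCH, "done", "done");
    if (rc == FALSE) {
        AddMessage("State was already done");
    }
    else {
        AddMessage("State marked done by this script");
    }
    return 0;
}
```

The second call is a common pattern for "first one wins" flags: many scripts can attempt it, but only the script that actually changes the value sees a successful result.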
Under the hood, all session data API functions employ a mutex such that once a script enters the API function, all other processing must wait. This is a blanket operation such that any test and set operation performed will be locked out until the previous function completes. In addition, other session variable functions use a thread safe mutex in their underlying operations to avoid issues with session table management and corruption. But without test and set, the action of examination and setting data cannot be performed safely.
There are two other functions that operate on a macro atomic level: the DecrementSessionValue and IncrementSessionValue functions.
int = DecrementSessionValue ( [string group], [string section], string name );
int = IncrementSessionValue ( [string group], [string section], string name );
Each of these will adjust an integer value and then return the initial value.
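As a hedged illustration, the get, add and set sequence that the worker uses for Swaps could be collapsed into a single call. The variable names are arbitrary, and the exact output is not shown since it depends on the behavior described above.

```legato
int main() {
    int before, after;

    SetSessionValue("Swaps", 0);                // start from a known value

    // One atomic step instead of GetSessionInteger, r++, SetSessionValue
    before = IncrementSessionValue("Swaps");    // per the text, the prior value
    after = GetSessionInteger("Swaps");

    AddMessage("Swaps was %d and is now %d", before, after);
    return 0;
}
```

In the sample worker this matters less because Swaps is only touched inside the critical section, but a counter updated from unprotected code would need the atomic form.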
As mentioned in the script description about chattering, there is a downside to atomic operation and mutex usage. While the Session Data Manager is internally checking with a critical section, many other operations within the computer are momentarily locked out. If you set the number of threads to 100, the session manager will spend a lot of time repeating the ‘test and set’ and subsequently internally performing low level interrupt prevention. During that time the script is essentially locking out all other operations in the system. Also, the demonstration is deliberately inefficient: when the shell test fails, it immediately tries again rather than waiting.
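One possible way to soften that contention is to sleep briefly whenever the test and set fails. The sketch below is a variation on the worker loop, not the original code; the RETRY_SLEEP define and its 10 millisecond value are assumptions chosen only for illustration, and it expects the main script to have set Shell and Stop.

```legato
#define RETRY_SLEEP     10                      // hypothetical back-off, in milliseconds

int main() {
    int rc, stop;

    stop = GetSessionInteger("Stop");
    while (stop == 0) {

        if (GetElapsedTime() > 60000) { break; }    // Keep from running too long

        rc = TestAndSetSessionValue("Shell", TASSV_MATCH, TRUE, FALSE);
        if (rc == FALSE) {
            Sleep(RETRY_SLEEP);                 // wait instead of retrying at once
            stop = GetSessionInteger("Stop");
            continue;
        }

        // ... critical section work goes here ...

        SetSessionValue("Shell", TRUE);         // pass back the shell
        stop = GetSessionInteger("Stop");
    }
    return 0;
}
```

The trade-off is latency: a waiting thread may notice a free shell up to RETRY_SLEEP milliseconds late, in exchange for far fewer trips through the session mutex.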
Principal Functions
Session Data API functions share a common parameter structure in that the group and section are optional. Note that it is the total number of strings in the ID group that determines whether a parameter is used. For example, if only one ID string is supplied, the group and section parameters are considered default values and the provided string is taken as the name.
The following is a summary of the session data API functions.
DecrementSessionValue — Decrements a session value on a thread-safe basis and returns the result.
int = DecrementSessionValue ( [string group], [string section], string name );
The return value is the value prior to the decrement action. The action is atomic so only one thread can decrement the value at any given time. When starting a process, the program should set an initial value using the SetSessionValue function. If the name value does not exist, it is created and set to zero.
DeleteSessionValue — Deletes a named session value.
int = DeleteSessionValue ( [string group], [string section], string name );
EnumerateSessionData — Enumerates all named session values and properties.
string[][] = EnumerateSessionData ( );
The string table has the following column key names:
Key Name | Description
group | Group name, if provided.
section | Section name, if provided.
name | Value name.
data | Data being stored. For signed integers, a string representation of a decimal value. For unsigned integers, a string representation of a hexadecimal value.
handle | A hexadecimal representation of the handle for the script that last updated the entry.
time_stamp | A decimal representation of the time stamp, taken from the system tick count.
This is example code to enumerate all session data:
int         ix, size;
string      data[][];

AddMessage("EnumerateSessionData() - ");

data = EnumerateSessionData();
size = ArrayGetAxisDepth(data, AXIS_ROW);
if (size == 0) {
    AddMessage(" No Data");
    return 0;
}

AddMessage("Item Group Section Name Handle Time Data");
AddMessage("---- ------------ ------------ ------------ ---------- ---------- -----------------------------------------------------------------------");
while (ix < size) {
    AddMessage(" %3d %-12s %-12s %-12s %-11s %12s : %s", ix,
               data[ix]["group"], data[ix]["section"], data[ix]["name"],
               data[ix]["handle"], data[ix]["time_stamp"], data[ix]["data"]);
    ix++;
}
After our main script has been run, the above code can be run to dump the session data.
Note that our script leaves a number of session entries behind. The thread handles will likely never be reused, so if the main script is run multiple times, it will leave a trail of discarded entries. Since indexing is not presently optimized, it is a good idea to delete entries after they are no longer needed. Besides that, it is a best practice to be a good neighbor and clean up. You can add the code below to the end of the main script to delete the worker entries:
// Delete Worker Entries
AddMessage("Delete Worker Data");
ix = 0;
while (ix < RUN_THREADS) {
    s1 = FormatString("0x%08X", workers[ix]);
    DeleteSessionValue(s1);
    AddMessage(" %2d deleted - %s", ix, s1);
    ix++;
}
GetSessionPoolSize — Returns the overall size of the application session pool.
int = GetSessionPoolSize ( );
The returned value is the next offset position to write into the session data pool. Since not all space is recovered during repeated operations, the value represents the total amount of space used. Total allocated bytes will be slightly higher since they are rounded up to the next block size.
Getting the pool size can be used as a debugging tool to determine both the amount of space required for the pool as well as the amount of space trashing.
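A rough way to use this for debugging is to sample the pool size around an operation and compare. In this sketch the name "PoolProbe" and its data are placeholders, and no particular result is implied since the amount of space recovered on delete is not specified here.

```legato
int main() {
    int before, after;

    before = GetSessionPoolSize();

    // Store and then delete a throwaway entry ("PoolProbe" is hypothetical)
    SetSessionValue("PoolProbe", "some test data");
    DeleteSessionValue("PoolProbe");

    after = GetSessionPoolSize();
    AddMessage("Pool offset before: %d  after: %d", before, after);
    return 0;
}
```

If the second number keeps climbing across repeated runs, the script is consuming pool space that is not being reclaimed.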
GetSessionInteger — Gets a session value as an integer.
long = GetSessionInteger ( [string group], [string section], string name );
The stored string value is expected to be in signed decimal form. If that value cannot be translated, a syntax error will be set in the last error value.
GetSessionString — Gets a session value as a string.
string = GetSessionString ( [string group], [string section], string name );
GetSessionWord — Gets a session value as an unsigned word.
qword = GetSessionWord ( [string group], [string section], string name );
The stored session string value is expected to be in hexadecimal form. If that value cannot be translated, a syntax error will be set in the last error value.
IncrementSessionValue — Increments a session value on a thread safe basis and returns the result.
int = IncrementSessionValue ( [string group], [string section], string name );
The return value is the value prior to the increment action. This action is atomic, so only one thread can increment the value at any given time. When starting a process, the program should set an initial value using the SetSessionValue function. If the name value does not exist, it is created and set to zero.
ResetSessionData — Resets all session data (deletes all named values).
int = ResetSessionData ( );
The reset function clears all session data and consolidates all allocated pool data. Resetting session data should be done with caution since it can affect the operation of unrelated scripts.
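A gentler alternative, sketched below, is to enumerate the entries and delete only those belonging to your own group. This assumes the entries were stored with both a group and a section, so that the enumerated strings can be passed straight back to DeleteSessionValue; the group name "Filer" is hypothetical.

```legato
int main() {
    string data[][];
    string group;
    int ix, size, removed;

    group = "Filer";                            // hypothetical group owned by this script
    removed = 0;

    data = EnumerateSessionData();
    size = ArrayGetAxisDepth(data, AXIS_ROW);
    ix = 0;
    while (ix < size) {
        // Only touch entries registered under our own group
        if (data[ix]["group"] == group) {
            DeleteSessionValue(data[ix]["group"], data[ix]["section"], data[ix]["name"]);
            removed++;
        }
        ix++;
    }
    AddMessage("Removed %d entries from group %s", removed, group);
    return 0;
}
```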
SetSessionValue — Sets a named session value.
int = SetSessionValue ( [string group], [string section], string name, param data );
The data parameter is stored according to the following rules:
– When a string is specified for the data parameter, it is added verbatim to the session data pool.
– Signed numeric types, such as int, are converted to string decimal values.
– Unsigned data types, such as word or dword, are converted to string hexadecimal.
– A literal numeric value is treated as a signed integer (this also includes TRUE and FALSE.)
In addition to the data value, the entry is also time stamped with the system tick count and script handle.
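The storage rules can be observed by storing the same number through a signed and an unsigned variable and then reading back the raw strings. The names "AsSigned" and "AsUnsigned" are made up, and the exact text of the stored strings is deliberately not shown since the formatting details are not specified here.

```legato
int main() {
    int     n;
    dword   w;
    qword   q;
    string  s1, s2;

    n = 255;
    w = 255;
    SetSessionValue("AsSigned", n);             // stored as a decimal string
    SetSessionValue("AsUnsigned", w);           // stored as a hexadecimal string

    // Look at the raw stored strings
    s1 = GetSessionString("AsSigned");
    s2 = GetSessionString("AsUnsigned");
    AddMessage("signed stored as: %s", s1);
    AddMessage("unsigned stored as: %s", s2);

    // Read each value back with the getter that matches its stored form
    n = GetSessionInteger("AsSigned");
    q = GetSessionWord("AsUnsigned");
    if ((n == 255) && (q == 255)) {
        AddMessage("Both values round tripped");
    }
    return 0;
}
```

Mixing the getters, for example calling GetSessionInteger on a value stored as hexadecimal, is the kind of mismatch that would be expected to raise the syntax error described for those functions.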
TestAndSetSessionValue — Tests and sets for a session value on a thread safe basis and returns the result.
int = TestAndSetSessionValue ( [string group], [string section], string name,
dword flags, param match, param data );
Ownership and timestamp information is only altered on a positive match condition.
Conclusion
Session data can be used to hold data between separate runs of the same script, to store information shared by multiple scripts, and to help coordinate background threads. It is worth noting, as shown above, that any script can enumerate application session data, and as such, it is a good idea to encrypt information such as passwords and user IDs or any other sensitive data.
Scott Theis is the President of Novaworks and the principal developer of the Legato scripting language. He has extensive expertise with EDGAR, HTML, XBRL, and other programming languages.