Testing
The introduction of the memory model and atomic operations raises some interesting testing issues. It's important that any compiler changes can be tested in a way that meets these requirements:
- Changes are tested for conformance.
- Tests are integrated into the regular test suite to prevent regressions.
- Test suite run time does not increase in any significant way.
- Tests work on a minimal or single-processor system.
- Tests work with and without optimization.
Specifically, the following need to be tested:
- Turning on the various memory-model flags disables the required optimizations.
- Turning off those flags results in no codegen impact.
- The implementation of atomics is both correct and thread-safe.
Scanning output files seems fairly unsafe, and it is difficult to prove that such a scan checks the right thing, so this is the approach I've come up with:
How to test
This approach uses GDB to control how the testcase is executed. For the record, an inferior function call is when GDB directly calls a user function in the program outside of normal execution.
- The test framework will check for the existence of a suitable GDB, and if it is present, enable a set of gdb-driven testcases.
- These testcases consist of a function to be tested and some auxiliary routines to verify results.
- The testcase is then compiled and GDB is invoked with a script to run the object file.
- The script sets a breakpoint at entry to the routine to be tested, and then steps through the test routine a single instruction at a time.
- Before and after each single-instruction step, the supplied auxiliary routines are called as inferior function calls to set up or verify the side effects of the tested routine on an instruction-by-instruction basis.
- Once the function to be tested is completed, the test program is allowed to finish, and the final results are verified by another inferior function call.
Conceptually, this turns GDB into a task scheduler in which the test routine is one process which is always given a single instruction time slice. The inferior function calls around this single step "time slice" then act as other processes which can modify or verify shared memory to make sure the results are always as expected.
With a well-constructed test case, this approach lets us properly test the thread safety of routines at the instruction level without actually being multi-threaded, and be sure we have coverage with a single testcase run (as opposed to running 2 threads 1,000,000 times and hoping that any problem would eventually show up).
This can also be used to verify that an optimization was or was not performed based on how the inferior function calls can see and change shared memory between each instruction.
I'm currently using a generic GDB run script for all tests, which looks something like:
    break main
    run
    set $ret = 0
    while fini != 1
      call other_threads()
      stepi
      set $ret |= step_verify()
    end
    set $ret |= final_verify()
    continue
    quit $ret
The final script will be a bit different. The while loop on 'fini' keeps executing until a routine called done() is called in the user program. The testcase provides the three routines other_threads(), step_verify(), and final_verify() to do whatever modifying/verifying is required for the test.
A few examples to help indicate how this works.
This is just a sample of the types of things that this testing harness utilizing GDB can accomplish. It should be possible to test all aspects of conformance of the memory model and atomics to the standard eventually, and identify regressions as they occur.
The testcases themselves should execute very quickly, and should add very little time to the overall test run.
When a non-compliance error is found, a testcase for it can easily be added. Initially, testcases will be added to help identify optimizations which need modifying.
This approach of simulating other threads should also be adaptable to test other threading situations, such as verifying that a linked list insertion is thread-safe or that a hash table lookup is thread-safe while deletions are being performed.