Finding miscompilations on large testcases
Suppose you have a program consisting of multiple .o files and you know which GCC pass is making an invalid transformation. In this case it is possible to automatically find the number of invalid transformation, and then to generate dump files with/without invalid transformation or set gdb breakpoint just before the invalid transformation happens.
The first step is to identify the miscompiled .o file. To do this, produce a set of corresponding .o files that are correctly compiled (by disabling the offending pass). After collecting good and bad .o files into separate directories, adjust and run the script below to find a single .o file that is miscompiled by performing a binary search. Alternatively, delta tool can be used to "reduce" the set of potentially miscompiled .o files.
Binary search script
Expanded version of a binary search script from from gcc/dbgcnt.def
#!/usr/bin/env bash usage () { cat <<EOF Perform a binary search over a range of integer values to find the smallest value Ncrit that makes the supplied command fail. Supplied command is presumed to return 0 on success and 1 on failure or vice versa and to fail only for all N >= Ncrit > LowerBound (so that binary search is possible). Usage: " $0 " [-l lower bound] [-u upper bound] [-i initial guess] <command> -l LB: lower bound (default 0) -u UB: upper bound (default not specified) -i IG: initial guess for UB-LB if UB not specified (default 100) EOF } while getopts "l:u:i:" opt do case $opt in l) LOWER="$OPTARG";; u) UPPER="$OPTARG";; i) UPPER_GUESS="$OPTARG";; ?) usage; exit 3;; esac done shift $[$OPTIND - 1] CMD="$@" if [ -z "$CMD" ]; then usage exit 3 fi LOWER=${LOWER:-0} UPPER_GUESS=${UPPER_GUESS:-100} $CMD $LOWER OK_VALUE=$? echo "$CMD returns $OK_VALUE on success" if [ -z "$UPPER" ]; then # find the upper bound UPPER=$(($UPPER_GUESS + $LOWER)) $CMD $UPPER while [ $? -eq $OK_VALUE ]; do UPPER=$[$UPPER * 10] $CMD $UPPER done echo "Found upper bound: " $UPPER fi LOWER=$[$LOWER + 1] # Loop invariant of binary search: $LOWER-1 is OK, $UPPER is "bad" while [ $UPPER -gt $LOWER ]; do MID=$[($UPPER + $LOWER) / 2] $CMD $MID if [ $? -eq $OK_VALUE ]; then LOWER=$[$MID + 1] else UPPER=$MID fi done echo "$LOWER is the first failing value" $CMD -v $LOWER
The following two helper scripts can be used as the last argument (<command>) to the bisection script above.
Example helper script for finding miscompiled .o file
Suppose you have collected correct .o files and potentially miscompiled .o files into directories ./good_o/ and ./bad_o, respectively. Then you can use the following script as the judging command for the bisection script above:
#!/usr/bin/env bash CC=${CC:-"/path/to/gcc"} CFLAGS=${CFLAGS:-"-O2"} GOOD=./good_o BAD=./bad_o # When passed -v N, print the name of file number N if [ $1 = "-v" ]; then echo -n "Miscompiled file: " ls -1 $BAD/*.o | head -n $2 | tail -n 1 exit 0 fi # Copy first $1 bad object files into current directory for o in $(ls -1 $BAD/*.o | head -n $1); do cp $o . done # Copy all but first $1 good object files for o in $(ls -1 $GOOD/*.o | tail -n +$[$1 + 1]); do cp $o . done # Link the executable $CC $CFLAGS *.o # Adjust execution line to match the character of miscompilation ./a.out
For example, if miscompiled program aborts almost immediately, while the correct runs for more than a second, the last line would look like:
./a.out & sleep 1 kill %1
Supply correct arguments to invocation of the tested executable (and files it opens at runtime, if any).
Using debug counters to find the bad transformation
Debug counters are documented in dbgcnt.def. In short, at each invocation of dbg_cnt function, the value of the counter passed as argument is incremented, and if it reaches the user-supplied limit, dbg_cnt returns false. Performing a binary search on debug counter values allows to quickly find the number of transformation that produces incorrect code.
Adding a debug counter includes the following steps:
- Decide what transformation the new counter would control
- Declare the new counter by adding a new stanza in dbgcnt.def
- #include "dbgcnt.h" in the file implementing the transformation
- Guard applying the transformation with the condition "dbg_cnt (new_counter_name)"
Again, adjust and use the following script with the binary search script to find the number of invalid transformation
#!/usr/bin/env bash CC=${CC:-"/path/to/gcc"} CFLAGS=${CFLAGS:-"-O2"} COUNTER=${COUNTER:-"dbg_counter_name"} SRC=${SRC:-"filename.c"} OBJ=${OBJ:-"filename.o"} # When passed -v N, print compiler invocation if [ $1 = "-v" ]; then echo "Correct invocation: " echo $CC $CFLAGS -fdbg-cnt=$COUNTER:$[$2 - 1] -o $OBJ $SRC echo "Miscompiling invocation: " echo $CC $CFLAGS -fdbg-cnt=$COUNTER:$2 -o $OBJ $SRC exit 0 fi # Recompile the miscompiled .o from source $CC $CFLAGS -fdbg-cnt=$COUNTER:$1 -o $OBJ $SRC # Relink the executable $CC $CFLAGS *.o # Adjust the following line to match the character of miscompilation ./a.out
You can then copy-and-paste the correct/miscompiling command lines printed by the last invocation of this helper script to produce GCC dumps with minimal differences or to attach GDB on the miscompiling transformation (set 'breakpoint' on dbg_cnt function and 'ignore' it counter-1 number of times).