Finding miscompilations on large testcases
Suppose you have a program consisting of multiple .o files and you know which GCC pass is making an invalid transformation. In this case it is possible to automatically find the number of the invalid transformation, and then to generate dump files with and without that transformation, or to set a GDB breakpoint just before it happens.
The first step is to identify the miscompiled .o file. To do this, produce a set of corresponding .o files that are correctly compiled (by disabling the offending pass). After collecting the good and bad .o files into separate directories, adjust and run the script below, which performs a binary search to find a single miscompiled .o file. Alternatively, the delta tool can be used to "reduce" the set of potentially miscompiled .o files.
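Building the two sets of object files can be scripted. The following is a minimal sketch, not part of the original scripts: the function name build_object_sets and the variable PASS_OFF are made up for illustration, and -fno-tree-vrp only stands in for whichever option disables the pass you suspect. It assumes all sources are C files with distinct base names.

```bash
#!/usr/bin/env bash
# Sketch: build two parallel sets of object files, bad_o/ with the suspect
# pass enabled and good_o/ with it disabled.
# PASS_OFF is an assumption: replace it with the flag that disables your pass.
CC=${CC:-gcc}
CFLAGS=${CFLAGS:--O2}
PASS_OFF=${PASS_OFF:--fno-tree-vrp}

build_object_sets () {
    mkdir -p good_o bad_o || return 1
    local src obj
    for src in "$@"; do
        # foo/bar.c -> bar.o (sources with equal base names would collide)
        obj=$(basename "${src%.c}").o
        $CC $CFLAGS -c "$src" -o "bad_o/$obj" || return 1
        $CC $CFLAGS $PASS_OFF -c "$src" -o "good_o/$obj" || return 1
    done
}

# Example call: build_object_sets src/*.c
```

Both directories end up with the same file names in the same order, which is what the helper script for finding the miscompiled .o file relies on.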
Binary search script
An expanded version of the binary search script from gcc/dbgcnt.def:
#!/usr/bin/env bash
usage () {
cat <<EOF
Perform a binary search over a range of integer values to find the smallest
value Ncrit that makes the supplied command fail. Supplied command is
presumed to return 0 on success and 1 on failure or vice versa and to
fail only for all N >= Ncrit > LowerBound (so that binary search is possible).
Usage: $0 [-l lower bound] [-u upper bound] [-i initial guess] <command>
-l LB: lower bound (default 0)
-u UB: upper bound (default not specified)
-i IG: initial guess for UB-LB if UB not specified (default 100)
EOF
}
while getopts "l:u:i:" opt
do
case $opt in
l) LOWER="$OPTARG";;
u) UPPER="$OPTARG";;
i) UPPER_GUESS="$OPTARG";;
?) usage; exit 3;;
esac
done
shift $[$OPTIND - 1]
CMD="$@"
if [ -z "$CMD" ]; then
usage
exit 3
fi
LOWER=${LOWER:-0}
UPPER_GUESS=${UPPER_GUESS:-100}
$CMD $LOWER
OK_VALUE=$?
echo "$CMD returns $OK_VALUE on success"
if [ -z "$UPPER" ]; then
# find the upper bound
UPPER=$(($UPPER_GUESS + $LOWER))
$CMD $UPPER
while [ $? -eq $OK_VALUE ]; do
UPPER=$[$UPPER * 10]
$CMD $UPPER
done
echo "Found upper bound: " $UPPER
fi
LOWER=$[$LOWER + 1]
# Loop invariant of binary search: $LOWER-1 is OK, $UPPER is "bad"
while [ $UPPER -gt $LOWER ]; do
MID=$[($UPPER + $LOWER) / 2]
$CMD $MID
if [ $? -eq $OK_VALUE ]; then
LOWER=$[$MID + 1]
else
UPPER=$MID
fi
done
echo "$LOWER is the first failing value"
$CMD -v $LOWER
The following two helper scripts can be used as the last argument (<command>) to the bisection script above.
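Before writing a real helper, it can help to see the contract the bisection script expects from its command: called with N, it reports success or failure through its exit status, and called with -v N, it prints a summary. The toy judge below is only an illustration; the name toy_judge and the threshold 37 are invented.

```bash
#!/usr/bin/env bash
# Toy judging command: "fails" for every N >= FIRST_BAD, mimicking a
# miscompilation that first shows up at N = 37.
# FIRST_BAD is a made-up value for illustration only.
FIRST_BAD=${FIRST_BAD:-37}

toy_judge () {
    # The bisection script finishes by calling the command as "-v N"
    if [ "$1" = "-v" ]; then
        echo "First failing value reported by the search: $2"
        return 0
    fi
    # Exit status 0 (success) below the threshold, 1 (failure) from it on
    [ "$1" -lt "$FIRST_BAD" ]
}
```

Saved as an executable script whose last line is toy_judge "$@" and passed as <command> with the default lower bound 0, this should make the search settle on 37 as the first failing value.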
Example helper script for finding miscompiled .o file
Suppose you have collected the correct .o files and the potentially miscompiled .o files into the directories ./good_o/ and ./bad_o/, respectively. Then you can use the following script as the judging command for the bisection script above:
#!/usr/bin/env bash
CC=${CC:-"/path/to/gcc"}
CFLAGS=${CFLAGS:-"-O2"}
GOOD=./good_o
BAD=./bad_o
# When passed -v N, print the name of file number N
if [ $1 = "-v" ]; then
echo -n "Miscompiled file: "
ls -1 $BAD/*.o | head -n $2 | tail -n 1
exit 0
fi
# Copy first $1 bad object files into current directory
for o in $(ls -1 $BAD/*.o | head -n $1); do
cp $o .
done
# Copy all but first $1 good object files
for o in $(ls -1 $GOOD/*.o | tail -n +$[$1 + 1]); do
cp $o .
done
# Link the executable
$CC $CFLAGS *.o
# Adjust execution line to match the character of miscompilation
./a.out
For example, if the miscompiled program aborts almost immediately, while the correct one runs for more than a second, the last line would look like:
./a.out & sleep 1; kill %1
Supply the correct arguments to the invocation of the tested executable (and any files it opens at runtime).
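As an alternative to the kill %1 idiom, the same "still running after a second" check can be written with the timeout utility from GNU coreutils, which exits with status 124 when the time limit is reached. This is a sketch under that assumption; the function name survives_for is made up.

```bash
#!/usr/bin/env bash
# Succeed (return 0) only if the command is still running after SECS seconds.
# Relies on GNU coreutils `timeout`, which exits with status 124 when it has
# to stop the command because the time limit was reached.
survives_for () {
    local secs=$1
    shift
    timeout "$secs" "$@" >/dev/null 2>&1
    [ $? -eq 124 ]
}

# Possible last line of the helper script: survives_for 1 ./a.out
```

A program that exits early, whether cleanly or by aborting, counts as a failure here, so this only fits cases where the correct program is known to outlive the time limit.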
Using debug counters to find the bad transformation
Debug counters are documented in dbgcnt.def. In short, each call to the dbg_cnt function increments the counter passed as its argument, and once the counter reaches the user-supplied limit, dbg_cnt returns false. A binary search on debug counter values quickly finds the number of the transformation that produces incorrect code.
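The effect of the limit can be pictured with a small shell model. This is not GCC's implementation, only a simplified sketch that assumes the first LIMIT calls return true and every later call returns false; count_applied is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Simplified model of a debug counter (not GCC's actual code).
# Assumption: the first LIMIT calls return true, later calls return false.
dbg_limit=0
dbg_count=0

dbg_cnt () {
    dbg_count=$((dbg_count + 1))
    [ "$dbg_count" -le "$dbg_limit" ]
}

# Offer TOTAL candidate transformations under a given limit and report
# how many of them were actually applied.
count_applied () {
    dbg_limit=$1
    dbg_count=0
    local total=$2 applied=0 i=1
    while [ "$i" -le "$total" ]; do
        if dbg_cnt; then
            applied=$((applied + 1))   # the guarded transformation runs
        fi
        i=$((i + 1))
    done
    echo "$applied"
}

count_applied 3 5   # prints 3
```

Under this model, raising the limit by one enables exactly one more transformation, which is why the smallest limit that produces a failing program points at a single transformation.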
Adding a debug counter includes the following steps:
- Decide what transformation the new counter would control
- Declare the new counter by adding a new stanza in dbgcnt.def
- #include "dbgcnt.h" in the file implementing the transformation
- Guard applying the transformation with the condition "dbg_cnt (new_counter_name)"
Again, adjust the following script and use it with the binary search script to find the number of the invalid transformation:
#!/usr/bin/env bash
CC=${CC:-"/path/to/gcc"}
CFLAGS=${CFLAGS:-"-O2"}
COUNTER=${COUNTER:-"dbg_counter_name"}
SRC=${SRC:-"filename.c"}
OBJ=${OBJ:-"filename.o"}
# When passed -v N, print compiler invocation
if [ $1 = "-v" ]; then
echo "Correct invocation: "
echo $CC $CFLAGS -fdbg-cnt=$COUNTER:$[$2 - 1] -c -o $OBJ $SRC
echo "Miscompiling invocation: "
echo $CC $CFLAGS -fdbg-cnt=$COUNTER:$2 -c -o $OBJ $SRC
exit 0
fi
# Recompile the miscompiled .o from source (-c: compile only, do not link)
$CC $CFLAGS -fdbg-cnt=$COUNTER:$1 -c -o $OBJ $SRC
# Relink the executable
$CC $CFLAGS *.o
# Adjust the following line to match the character of miscompilation
./a.out
You can then copy and paste the correct/miscompiling command lines printed by the last invocation of this helper script to produce GCC dumps with minimal differences, or to attach GDB at the miscompiling transformation (set a breakpoint on the dbg_cnt function and 'ignore' it counter-1 times).
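The GDB part of this recipe can be written down as a command file. The sketch below only prints the commands; gdb_commands_for is a made-up name, and it assumes a fresh GDB session in which the breakpoint on dbg_cnt is assigned number 1.

```bash
#!/usr/bin/env bash
# Print GDB commands that stop at the N-th call of dbg_cnt.
# Assumes a fresh GDB session, so the new breakpoint gets number 1.
gdb_commands_for () {
    local n=$1
    echo "break dbg_cnt"
    # Skip the first N-1 hits so that execution stops at hit number N
    echo "ignore 1 $((n - 1))"
    echo "run"
}

gdb_commands_for 42
```

The output can be saved to a file and loaded with gdb -x. GDB has to run the compiler proper (for C, cc1) rather than the gcc driver, and if other debug counters are also checked during the same compilation, their calls to dbg_cnt hit the same breakpoint, so the ignore count may need adjusting.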