Bug 39330 - Bad program bahavior with -O2 optimization in plain C
Summary: Bad program bahavior with -O2 optimization in plain C
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: c (show other bugs)
Version: 4.2.4
: P3 major
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-01 05:01 UTC by Sabi
Modified: 2009-03-01 08:07 UTC (History)
2 users (show)

See Also:
Host: i486-linux-gnu
Target: i486-linux-gnu
Build: i486-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Testcase with debug information and explanation (2.58 KB, text/x-csrc)
2009-03-01 05:05 UTC, Sabi
Details
Testcase smaller and simpler (585 bytes, text/x-csrc)
2009-03-01 05:06 UTC, Sabi
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sabi 2009-03-01 05:01:31 UTC
There is a bug in the gcc compiler for the C code that changes the behavior of a simple program with -O2 optimizations, but not with -O1 or -O0.

The attached code is a small implementation of the qsort algorithm over an array of 5 elements of 64bits; compared using only 32bits of those 64bits. This simple code exposes a bug in the compiler in the first call to this function.

The problem COULD be with a variable optimized out caching and old value.

There are two test cases. One with debug information (printf of some values) and other smaller, "minimal" test case. In both test cases, the qsort algorithm should sort a vector of five elements: 5 4 3 2 1. In the first iteration the algorithm divides the vector in "1" (left part), "2" (pivot) and "3 4 5" (right part, already in the right order). There is a detailed explanation (a the end of this description) that explains why this is an incorrect behavior of the compiler.

The problem is that in the second iteration of a while loop, a comparation "vcp < vp" where vcp is 1 and vp is 2 get "false" as result, probably because the vcp variable was optimized, and in the first iteration it's value was 5.

Some strange things:
* A printf("%d", vcp); at the right place fixes the bug, probably because vcp is not cached in a registry anymore. 

I can confirm the same bug in a x86_64 architecture with the same version (ubuntu 8.04.2).

/** Explanation:
 * There are 3 pointers: bp, cp and ep.
 * The parameters b and e define a buffer in memory (begin and end).
 * There are 2 "values": vp and vcp.
 *  - vp is the value of the "pivot" and is written only once in the function
 *  - vcp is the "value" of the cp pointer (not "*cp" !).
 * A "value" is a variable of type "tipoval" which is used to compare elements
 * of type "tipo".
 * In this case, we have a datatype of 8bytes (tipo), and we compare them
 * by the first 4bytes (tipoval) value. 
 *
 * For the first call:
 * The difference e-b is 5 (five uint64 elements)
 * vp = 2; (as shown in the debug output)
 * bp = b; cp = b; and ep = e; at the begin [1].
 *
 * At the point [1]: (shown in debug output)
 *  * cp = b+0, bp = b+0, and ep = b+5
 *  * vector "values" are 5 4 3 2 1
 *
 * Since the "value" of ep-1 is 1 (the last value of the vector),
 * the while loop between [1] and [2] executes no iteration. "2 < 1" is false.
 * Until here the compiled program is ok.
 *
 * At the point [2]: (shown in debug output)
 *  * the state is the same as in [1].
 *
 * Then, we enter in the second while loop for the first time:
 * "while(cp < ep)" since cp=b+0 and ep=b+5.
 * At the first iteration, vcp = 5 (the first "value" in the vector).
 * Here we compare three options: [3] vcp < vp; [4] vcp == vp; or in other case [5] vcp > vp.
 * Since vcp == 5 and vp == 2, we enter in the third case ( Ref [5] ).
 * ep is decremented and now ep = b+4, and the elements at positions 0 and 4 swapped
 * (remember, cp is b+0). The new "values" of vector order is 1 4 3 2 5
 * >> Note: The invariant here is: "Everything between [ep , e) (as a range left
 *          closed but right open) is strictly greater than the pivot vp.
 * 
 * The while loop inside the [5] case is identical to the first one in the
 * function. Again it doesn't enter because the "value" of "ep-1" is VL(b+3)
 * which is now 2, the same as the pivot vp. Then, "2 < 2" is false.
 * We exit from the if case [5] and iterate again the main while loop, back
 * to point [2].
 *
 * We enter again in the main while loop at point [2] because cp == b+0
 * and ep = b+4.
 * Remembering, vector "values" are 1 4 3 2 5; so now vcp is 1.
 * >> Note: Until here I can check the correctness of the binary file with a debugger
 *    (gdb), but "print vcp" fails because "No symbol vcp in current context."
 *    Optimizations here have done something.
 *    Take in mind that from the las iteration, cp was not change, but *cp was
 *    swapped with the last value of the vector.
 * Backing to the code, vcp = 1 and vp = 2, so we are in the case [3] (vcp < vp)
 * In this case, no value is swapped because bp == cp, but both bp and cp are
 * incremented.
 * >> Note: The invariant here is all the values in [b, bp) a strictly lower than
 *          the pivot vp.
 *
 * BUT! That's not what the code does. HERE IS THE BUG (or at least, the symptom).
 *
 * The code apparently enters in the case[5], since ep is decremented according
 * the debugger. Indeed, cp at the end of the main while loop (Ref [6]) is still
 * cp=b, so the never reaches the cases [3] or [4], because in both cases cp is
 * incremented.
 *
 * IMHO, the problem is that vcp was not properly updated in the second iteration
 * and it probably stil is 5.
 *
 * Work-around:
 *
 *  * If you put a "printf("%u\n", vcp);" inside the main while loop,
 *    the bug is not expressed and the program works well.
 * 
 *  * If you remove the __attribute__((always_inline)) for the function
 *    VL, again the bug is not expressed.
 *
 *  * If you replace the function VL with a #define, the bug is also
 *    present. This doesn't fix the problem.
 *
 *  * If you compile with -O1 or -O0 the bug is NOT present.
 *
**/
Comment 1 Sabi 2009-03-01 05:05:53 UTC
Created attachment 17379 [details]
Testcase with debug information and explanation

Compile with:
gcc -save-temps -O2 -Wall -Werror xqsort.c -o xqsort
Comment 2 Sabi 2009-03-01 05:06:32 UTC
Created attachment 17380 [details]
Testcase smaller and simpler
Comment 3 Sabi 2009-03-01 05:08:26 UTC
Comment on attachment 17380 [details]
Testcase smaller and simpler

Compile with:
gcc -save-temps -O2 -Wall -Werror xqsort-small.c -o xqsort-small

Run with:
./xqsort-small && echo ok

If compiled with -O1 , the echo command must be executed.
Comment 4 Eric Botcazou 2009-03-01 08:07:06 UTC
> There is a bug in the gcc compiler for the C code that changes the behavior of
> a simple program with -O2 optimizations, but not with -O1 or -O0.

-O2 enables -fstrict-aliasing so the code must be written in ISO C as far as aliasing is concerned, otherwise the behavior is undefined.

Your code is not written in ISO C:

#define VL(X) (*((uint32*)(X)))

is a direct violation of ISO C.  See the -fstrict-aliasing entry in the GCC manual or the ISO standard.