I had originally posted this on gcc-help because I wasn't sure it was an actual compiler bug or undefined behavior. Ian Lance Taylor replied that he didn't see any undefined behavior. So I'm reporting it now as a bug. Here's the original message: http://gcc.gnu.org/ml/gcc-help/2011-04/msg00476.html But I'll repeat it below: -------------------- Hi all, I believe I found a wrong-code bug. The problem triggers when using gcc-4.5.1, 4.5.2 or 4.5.3, but not when using 4.4.5 or 4.7.0 (snapshot 20110419). It also only triggers with certain optimization levels/flags. I wonder if this is a known problem and already fixed in 4.7.0, or that the problem still exists but for some reason doesn't trigger in 4.7.0 (I couldn't easily find something in bugzilla). Below is a reduced test-case that shows the problem. I tried, but I couldn't get it smaller than these 4 files (combined about 60 lines). While reducing this problem I realized that it *might* not be a compiler bug, but undefined behaviour with the usage of __restrict in Buffer::read(). What I wanted to express there is that the memory write done by memcpy() can never overwrite the member variable 'p'. At the moment I still believe it's a compiler bug, but I'm not 100% sure anymore. So is this a compiler bug or undefined behavior in my program? In case of the latter I would appreciate if someone could explain what the problem is and maybe suggest a way to fix it. Thanks. Wouter BTW: The code for gcc-4.7.0 is correct but contains some useless extra instructions (which I tried to avoid with __restrict). I'd also appreciate hints on how to improve the generate code. I do realize that the code in this reduced test-case may look a bit silly and that suggestions to optimize the code may be hard because of this. /// FooBar.hh ///// struct Loader; struct FooBar { void load(Loader& l); char c1, c2; }; /// Loader.hh ///// #include <cstring> struct Buffer { Buffer(const char* data) : p(data) {} void read(void* __restrict out) __restrict { memcpy(out, p, 1); ++p; } const char* p; }; template<typename Derived> struct Base { void load2(char& t) { Derived& self = static_cast<Derived&>(*this); self.load1(t); } int dummy; }; struct Loader : Base<Loader> { Loader(const char* data) : buffer(data) {} void load1(char& t) { buffer.read(&t); } Buffer buffer; }; /// FooBar.cc ///// #include "FooBar.hh" #include "Loader.hh" #include <cstdio> void FooBar::load(Loader& l) { l.load1(c1); //printf("This print hides the bug\n"); l.load2(c2); } /// main.cc /////// #include "FooBar.hh" #include "Loader.hh" #include <cstdio> int main() { char data[2] = { 3, 5 }; Loader loader(data); FooBar fb; fb.load(loader); if ((fb.c1 == 3) && (fb.c2 == 5)) { printf("Ok\n"); } else { printf("Wrong!\n"); } } > g++ --version g++ (GCC) 4.5.3 20110423 (prerelease) > uname -a Linux argon 2.6.35-28-generic #49-Ubuntu SMP Tue Mar 1 14:39:03 UTC 2011 x86_64 GNU/Linux > g++ -O3 FooBar.cc -c > g++ -O3 main.cc -c > g++ -o bug FooBar.o main.o > ./bug Wrong! > objdump -d FooBar.o (gcc-4.5.3 prerelease) mov 0x8(%rsi),%rdx lea 0x8(%rsi),%rax movzbl (%rdx),%edx mov %dl,(%rdi) mov 0x8(%rsi),%rdx <-- WRONG: still uses original value of Buffer::p addq $0x1,(%rax) <-- it is only increased here (for the 1st time) movzbl (%rdx),%edx mov %dl,0x1(%rdi) addq $0x1,(%rax) retq > objdump -d FooBar.o (gcc-4.7.0 20110419) mov 0x8(%rsi),%rax movzbl (%rax),%edx mov %dl,(%rdi) lea 0x1(%rax),%rdx <-- correct, but I know this is not mov %rdx,0x8(%rsi) <-- required for my application movzbl 0x1(%rax),%edx add $0x2,%rax mov %dl,0x1(%rdi) mov %rax,0x8(%rsi) retq
The bug is that we somehow think after inlining void FooBar::load(Loader&) (struct FooBar * const this, struct Loader & l) { ... <bb 2>: D.3062_2 = &this_1(D)->c1; D.3079_9 = &l_3(D)->buffer; this_11 = (struct Buffer * const restrict) D.3079_9; D.3083_12 = this_11->p; memcpy (D.3062_2, D.3083_12, 1); D.3083_13 = this_11->p; D.3082_14 = D.3083_13 + 1; this_11->p = D.3082_14; D.3063_4 = &this_1(D)->c2; D.3064_5 = &l_3(D)->D.2415; self_15 = (struct Loader &) D.3064_5; D.3087_16 = &self_15->buffer; this_17 = (struct Buffer * const restrict) D.3087_16; D.3091_18 = this_17->p; memcpy (D.3063_4, D.3091_18, 1); D.3091_19 = this_17->p; D.3090_20 = D.3091_19 + 1; this_17->p = D.3090_20; return; this_11 and this_17 only point to (different) objects. The restrict machinery should be robust against inlining effects because it should inherhit the arguments points-to set as well. In this case we have l_3(D), points-to non-local, points-to vars: { } self_15, points-to non-local, points-to vars: { } though, and we explicitly ignore those: /* If both pointers are restrict-qualified try to disambiguate with restrict information. */ if (TYPE_RESTRICT (TREE_TYPE (ptr1)) && TYPE_RESTRICT (TREE_TYPE (ptr2)) && !pt_solutions_same_restrict_base (&pi1->pt, &pi2->pt)) return false; /* Return true if both points-to solutions PT1 and PT2 for two restrict qualified pointers are possibly based on the same pointer. */ bool pt_solutions_same_restrict_base (struct pt_solution *pt1, struct pt_solution *pt2) { /* If we deal with points-to solutions of two restrict qualified pointers solely rely on the pointed-to variable bitmap intersection. For two pointers that are based on each other the bitmaps will intersect. */ if (pt1->vars_contains_restrict && pt2->vars_contains_restrict) { gcc_assert (pt1->vars && pt2->vars); return bitmap_intersect_p (pt1->vars, pt2->vars); } return true; } because all incoming restrict qualified pointers also point to NONLOCAL (so restrict would be useless if we don't ignore that fact). 4.4 handles restrict in a completely different way. 4.6 and 4.7 manage to make more must-aliasing appearant which avoids this issue to manifest in a miscompilation, but the issue itself is latent.
The following fixes it (partially, global vars need similar treatment) at the cost of extra decls and points-to bits. We have to give what we point to a name, not only rely in the nonlocal glob. Index: gcc/tree-ssa-structalias.c =================================================================== --- gcc/tree-ssa-structalias.c (revision 172817) +++ gcc/tree-ssa-structalias.c (working copy) @@ -4727,8 +4727,27 @@ intra_create_variable_infos (void) } for (p = get_vi_for_tree (t); p; p = p->next) - if (p->may_have_pointers) - make_constraint_from (p, nonlocal_id); + { + if (p->may_have_pointers) + { + struct constraint_expr lhsc, rhsc; + tree heapvar; + heapvar = create_tmp_var_raw (TREE_TYPE (TREE_TYPE (t)), + "PARM_PT"); + DECL_EXTERNAL (heapvar) = 1; + get_var_ann (heapvar)->is_heapvar = 1; + add_referenced_var (heapvar); + lhsc.var = p->id; + lhsc.type = SCALAR; + lhsc.offset = 0; + rhsc.var = get_vi_for_tree (heapvar)->id; + rhsc.type = ADDRESSOF; + rhsc.offset = 0; + process_constraint (new_constraint (lhsc, rhsc)); + + make_constraint_from (p, nonlocal_id); + } + } if (POINTER_TYPE_P (TREE_TYPE (t)) && TYPE_RESTRICT (TREE_TYPE (t))) make_constraint_from_restrict (get_vi_for_tree (t), "PARM_RESTRICT"); it would be nice if we could avoid allocating decls for such things (in principle we could simply allocate a DECL_UID only).
The same issue in a more complicated form hits trunk in PR49279.
Fixed on trunk and the 4.6 branch sofar.
Really a dup, the fix should be possible to backport (on it). *** This bug has been marked as a duplicate of bug 49279 ***