Bug 48764 - [4.5 Regression] wrong-code bug in gcc-4.5.x, related to __restrict
Summary: [4.5 Regression] wrong-code bug in gcc-4.5.x, related to __restrict
Status: RESOLVED DUPLICATE of bug 49279
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.5.3
: P2 normal
Target Milestone: 4.5.4
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on:
Blocks: restrict 49279
  Show dependency treegraph
 
Reported: 2011-04-25 18:46 UTC by Wouter Vermaelen
Modified: 2012-01-03 12:19 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work: 4.4.4, 4.6.0, 4.6.2, 4.7.0
Known to fail: 4.5.2
Last reconfirmed: 2011-04-26 10:43:12


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Wouter Vermaelen 2011-04-25 18:46:47 UTC
I had originally posted this on gcc-help because I wasn't sure it was an actual compiler bug or undefined behavior. Ian Lance Taylor replied that he didn't see any undefined behavior. So I'm reporting it now as a bug.

Here's the original message:
   http://gcc.gnu.org/ml/gcc-help/2011-04/msg00476.html
But I'll repeat it below:


--------------------


Hi all,

I believe I found a wrong-code bug. The problem triggers when using
gcc-4.5.1, 4.5.2 or 4.5.3, but not when using 4.4.5 or 4.7.0 (snapshot
20110419). It also only triggers with certain optimization levels/flags.
I wonder if this is a known problem and already fixed in 4.7.0, or that
the problem still exists but for some reason doesn't trigger in 4.7.0
(I couldn't easily find something in bugzilla).


Below is a reduced test-case that shows the problem. I tried, but I
couldn't get it smaller than these 4 files (combined about 60 lines).


While reducing this problem I realized that it *might* not be a compiler
bug, but undefined behaviour with the usage of __restrict in
Buffer::read(). What I wanted to express there is that the memory write
done by memcpy() can never overwrite the member variable 'p'. At the
moment I still believe it's a compiler bug, but I'm not 100% sure
anymore.


So is this a compiler bug or undefined behavior in my program? In case
of the latter I would appreciate if someone could explain what the
problem is and maybe suggest a way to fix it.


Thanks.

Wouter


BTW: The code for gcc-4.7.0 is correct but contains some useless extra
instructions (which I tried to avoid with __restrict). I'd also appreciate
hints on how to improve the generate code.
I do realize that the code in this reduced test-case may look a bit silly
and that suggestions to optimize the code may be hard because of this.






/// FooBar.hh /////

struct Loader;
struct FooBar {
	void load(Loader& l);
	char c1, c2;
};




/// Loader.hh /////

#include <cstring>

struct Buffer {
	Buffer(const char* data) : p(data) {}
	void read(void* __restrict out) __restrict {
		memcpy(out, p, 1);
		++p;
	}
	const char* p;
};


template<typename Derived> struct Base {
	void load2(char& t) {
		Derived& self = static_cast<Derived&>(*this);
		self.load1(t);
	}
	int dummy;
};


struct Loader : Base<Loader> {
	Loader(const char* data) : buffer(data) {}
	void load1(char& t) { buffer.read(&t); }
	Buffer buffer;
};




/// FooBar.cc /////

#include "FooBar.hh"
#include "Loader.hh"
#include <cstdio>


void FooBar::load(Loader& l)
{
	l.load1(c1);
	//printf("This print hides the bug\n");
	l.load2(c2);
}




/// main.cc ///////

#include "FooBar.hh"
#include "Loader.hh"
#include <cstdio>


int main()
{
	char data[2] = { 3, 5 };
	Loader loader(data);
	FooBar fb;
	fb.load(loader);


	if ((fb.c1 == 3) && (fb.c2 == 5)) {
		printf("Ok\n");
	} else {
		printf("Wrong!\n");
	}
}




> g++ --version
g++ (GCC) 4.5.3 20110423 (prerelease)


> uname -a
Linux argon 2.6.35-28-generic #49-Ubuntu SMP Tue Mar 1 14:39:03 UTC 2011 x86_64 GNU/Linux


> g++ -O3 FooBar.cc -c
> g++ -O3 main.cc -c
> g++ -o bug FooBar.o main.o


> ./bug
Wrong!






> objdump -d FooBar.o  (gcc-4.5.3 prerelease)
  mov    0x8(%rsi),%rdx
  lea    0x8(%rsi),%rax
  movzbl (%rdx),%edx
  mov    %dl,(%rdi)
  mov    0x8(%rsi),%rdx  <-- WRONG: still uses original value of Buffer::p
  addq   $0x1,(%rax)     <-- it is only increased here (for the 1st time)
  movzbl (%rdx),%edx
  mov    %dl,0x1(%rdi)
  addq   $0x1,(%rax)
  retq



> objdump -d FooBar.o  (gcc-4.7.0 20110419)
  mov    0x8(%rsi),%rax
  movzbl (%rax),%edx
  mov    %dl,(%rdi)
  lea    0x1(%rax),%rdx   <-- correct, but I know this is not
  mov    %rdx,0x8(%rsi)   <-- required for my application
  movzbl 0x1(%rax),%edx
  add    $0x2,%rax
  mov    %dl,0x1(%rdi)
  mov    %rax,0x8(%rsi)
  retq
Comment 1 Richard Biener 2011-04-26 10:43:12 UTC
The bug is that we somehow think after inlining

void FooBar::load(Loader&) (struct FooBar * const this, struct Loader & l)
{
...
<bb 2>:
  D.3062_2 = &this_1(D)->c1;
  D.3079_9 = &l_3(D)->buffer;
  this_11 = (struct Buffer * const restrict) D.3079_9;
  D.3083_12 = this_11->p;
  memcpy (D.3062_2, D.3083_12, 1);
  D.3083_13 = this_11->p;
  D.3082_14 = D.3083_13 + 1;
  this_11->p = D.3082_14;
  D.3063_4 = &this_1(D)->c2;
  D.3064_5 = &l_3(D)->D.2415;
  self_15 = (struct Loader &) D.3064_5;
  D.3087_16 = &self_15->buffer;
  this_17 = (struct Buffer * const restrict) D.3087_16;
  D.3091_18 = this_17->p;
  memcpy (D.3063_4, D.3091_18, 1);
  D.3091_19 = this_17->p;
  D.3090_20 = D.3091_19 + 1;
  this_17->p = D.3090_20;
  return;

this_11 and this_17 only point to (different) objects.  The restrict
machinery should be robust against inlining effects because it should
inherhit the arguments points-to set as well.  In this case we have

l_3(D), points-to non-local, points-to vars: { }
self_15, points-to non-local, points-to vars: { }

though, and we explicitly ignore those:

  /* If both pointers are restrict-qualified try to disambiguate
     with restrict information.  */
  if (TYPE_RESTRICT (TREE_TYPE (ptr1))
      && TYPE_RESTRICT (TREE_TYPE (ptr2))
      && !pt_solutions_same_restrict_base (&pi1->pt, &pi2->pt))
    return false;


/* Return true if both points-to solutions PT1 and PT2 for two restrict
   qualified pointers are possibly based on the same pointer.  */

bool
pt_solutions_same_restrict_base (struct pt_solution *pt1,
                                 struct pt_solution *pt2)
{
  /* If we deal with points-to solutions of two restrict qualified
     pointers solely rely on the pointed-to variable bitmap intersection.
     For two pointers that are based on each other the bitmaps will
     intersect.  */
  if (pt1->vars_contains_restrict
      && pt2->vars_contains_restrict)
    {
      gcc_assert (pt1->vars && pt2->vars);
      return bitmap_intersect_p (pt1->vars, pt2->vars);
    }

  return true;
}

because all incoming restrict qualified pointers also point to NONLOCAL
(so restrict would be useless if we don't ignore that fact).

4.4 handles restrict in a completely different way.

4.6 and 4.7 manage to make more must-aliasing appearant which avoids
this issue to manifest in a miscompilation, but the issue itself is
latent.
Comment 2 Richard Biener 2011-04-26 11:17:59 UTC
The following fixes it (partially, global vars need similar treatment) at the
cost of extra decls and points-to bits.
We have to give what we point to a name, not only rely in the nonlocal glob.

Index: gcc/tree-ssa-structalias.c
===================================================================
--- gcc/tree-ssa-structalias.c  (revision 172817)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -4727,8 +4727,27 @@ intra_create_variable_infos (void)
        }
 
       for (p = get_vi_for_tree (t); p; p = p->next)
-       if (p->may_have_pointers)
-         make_constraint_from (p, nonlocal_id);
+       {
+         if (p->may_have_pointers)
+           {
+             struct constraint_expr lhsc, rhsc;
+             tree heapvar;
+             heapvar = create_tmp_var_raw (TREE_TYPE (TREE_TYPE (t)),
+                                           "PARM_PT");
+             DECL_EXTERNAL (heapvar) = 1;
+             get_var_ann (heapvar)->is_heapvar = 1;
+             add_referenced_var (heapvar);
+             lhsc.var = p->id;
+             lhsc.type = SCALAR;
+             lhsc.offset = 0;
+             rhsc.var = get_vi_for_tree (heapvar)->id;
+             rhsc.type = ADDRESSOF;
+             rhsc.offset = 0;
+             process_constraint (new_constraint (lhsc, rhsc));
+
+             make_constraint_from (p, nonlocal_id);
+           }
+       }
       if (POINTER_TYPE_P (TREE_TYPE (t))
          && TYPE_RESTRICT (TREE_TYPE (t)))
        make_constraint_from_restrict (get_vi_for_tree (t), "PARM_RESTRICT");

it would be nice if we could avoid allocating decls for such things
(in principle we could simply allocate a DECL_UID only).
Comment 3 Richard Biener 2011-06-06 14:06:06 UTC
The same issue in a more complicated form hits trunk in PR49279.
Comment 4 Richard Biener 2011-10-10 11:08:20 UTC
Fixed on trunk and the 4.6 branch sofar.
Comment 5 Richard Biener 2012-01-03 12:19:07 UTC
Really a dup, the fix should be possible to backport (on it).

*** This bug has been marked as a duplicate of bug 49279 ***