Bug 48702 - [4.6 Regression] optimization regression with gcc-4.6 on x86_64-unknown-linux-gnu
Summary: [4.6 Regression] optimization regression with gcc-4.6 on x86_64-unknown-linux...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: 4.6.1
Assignee: Richard Biener
URL:
Keywords: wrong-code
: 49144 (view as bug list)
Depends on: 49235
Blocks:
  Show dependency treegraph
 
Reported: 2011-04-20 14:11 UTC by Mariah Lenox
Modified: 2011-06-06 10:18 UTC (History)
4 users (show)

See Also:
Host: x86_64-unknown-linux-gnu
Target: x86_64-unknown-linux-gnu
Build: x86_64-unknown-linux-gnu
Known to work: 4.5.1, 4.6.1, 4.7.0
Known to fail: 4.6.0
Last reconfirmed: 2011-04-20 14:28:27


Attachments
patch (989 bytes, patch)
2011-04-21 10:48 UTC, Richard Biener
Details | Diff
alternative patch for 4.6 (841 bytes, patch)
2011-05-25 10:19 UTC, Richard Biener
Details | Diff
alternative patch for 4.6, with backports (2.24 KB, patch)
2011-05-26 11:05 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mariah Lenox 2011-04-20 14:11:52 UTC
/* optimization regression with gcc-4.6.0 on x86_64-unknown-linux-gnu

% gcc-4.6.0 -O2 -o foo foo.c
% foo
% 1
%
% gcc-4.6.0 -O1 -o foo foo.c
% foo
% 4
%
% gcc-4.5.1 -O2 -o foo foo.c
% foo
% 4

% gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-4.6.0/x86_64-Linux-core2-fc/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /usr/local/gcc-4.6.0/src/gcc-4.6.0/configure --enable-languages=c,c++,fortran --with-gnu-as --with-gnu-as=/usr/local/binutils-2.21/x86_64-Linux-core2-fc-gcc-4.5.1-rh/bin/as --with-gnu-ld --with-ld=/usr/local/binutils-2.21/x86_64-Linux-core2-fc-gcc-4.5.1-rh/bin/ld --with-gmp=/usr/local/mpir-2.3.0/x86_64-Linux-core2-fc-gcc-4.5.1-rh --with-mpfr=/usr/local/mpfr-3.0.0/x86_64-Linux-core2-fc-mpir-2.3.0-gcc-4.5.1-rh --with-mpc=/usr/local/mpc-0.9/x86_64-Linux-core2-fc-mpir-2.3.0-mpfr-3.0.0-gcc-4.5.1-rh --prefix=/usr/local/gcc-4.6.0/x86_64-Linux-core2-fc
Thread model: posix
gcc version 4.6.0 (GCC) 
%

*/

#include  <stdio.h>

#define LEN 4  

void unpack(int  array[LEN])
{
int ii, val;

val = 1;
for (ii = 0; ii < LEN; ii++) {
  array[ii] = val % 2;
  val = val / 2;
}

return;
}

int  pack(int  array[LEN])
{
int ans, ii;

ans = 0;
for (ii = LEN-1; ii >= 0; ii--) {
  ans = 2 * ans + array[ii];
}

return ans;
}

int  foo()
{
int temp, ans;
int array[LEN];

unpack(array);
temp = array[0];
array[0] = array[2];
array[2] = temp;
ans = pack(array);
return ans;
}

int main(void)
{
int val;

val = foo();
printf("%d\n", val);

return 0; 
}
Comment 1 Richard Biener 2011-04-20 14:28:27 UTC
Confirmed.  Seems to work on trunk.

The following fails at -O1:

extern void abort (void);

#define LEN 4

static inline void unpack(int  array[LEN])
{
  int ii, val;
  val = 1;
  for (ii = 0; ii < LEN; ii++) {
      array[ii] = val % 2;
      val = val / 2;
  }
}

static inline int  pack(int  array[LEN])
{
  int ans, ii;
  ans = 0;
  for (ii = LEN-1; ii >= 0; ii--) {
      ans = 2 * ans + array[ii];
  }
  return ans;
}

int __attribute__((noinline))
foo()
{
  int temp, ans;
  int array[LEN];
  unpack(array);
  temp = array[0];
  array[0] = array[2];
  array[2] = temp;
  ans = pack(array);
  return ans;
}

int main(void)
{
  int val;
  val = foo();
  if (val != 4)
    abort ();
  return 0;
}
Comment 2 Richard Biener 2011-04-20 14:38:51 UTC
DSE2 kills the stores of the shuffle

temp = array[0];
array[0] = array[2];
array[2] = temp;

likely being confused about the loops use.  Mine.
Comment 3 Richard Biener 2011-04-20 15:08:32 UTC
I have a patch that makes it fail on trunk as well.  IVOPTs generates

  for (p = &a; p != &a - 3; --p)
    *(p + 3) = ...

and alias analysis doesn't like this invalid pointer.
Comment 4 Richard Biener 2011-04-21 10:48:34 UTC
Created attachment 24066 [details]
patch

Patch I am testing.
Comment 5 Richard Biener 2011-04-21 11:05:09 UTC
Before the patch:

        movq    %rsp, %rdx
        leaq    -16(%rsp), %rcx
        movl    $0, %eax
.L5:
        addl    %eax, %eax
        addl    12(%rdx), %eax
        subq    $4, %rdx
        cmpq    %rcx, %rdx
        jne     .L5

after the patch:

        movl    $0, %edx
        movl    $0, %eax
.L5:
        addl    %eax, %eax
        addl    12(%rsp,%rdx), %eax
        subq    $4, %rdx
        cmpq    $-16, %rdx
        jne     .L5

not 100% clear which to prefer.

Alternatively alias-analysis can be dumbed down, never exploiting knowledge
of a constant offset added to a pointer before dereferencing it
(thus, ignoring the constant offset operand in MEM_REFs and TARGET_MEM_REFs).
FYI, the disambiguation that triggers is

  /* If only one reference is based on a variable, they cannot alias if
     the pointer access is beyond the extent of the variable access.
     (the pointer base cannot validly point to an offset less than zero
     of the variable).
     They also cannot alias if the pointer may not point to the decl.  */
  if ((TREE_CODE (base1) != TARGET_MEM_REF
       || (!TMR_INDEX (base1) && !TMR_INDEX2 (base1)))
      && !ranges_overlap_p (MAX (0, offset1p), -1, offset2p, max_size2))
    return false;

thus, for "the pointer base cannod validly point to an offset less than
zero" say that this only holds for ptr + constant offset, not ptr in
isolation.
Comment 6 Zdenek Dvorak 2011-04-21 11:49:36 UTC
I think we rely on the assumption that pointer arithmetics satisfies the restrictions of C standard (i.e., that the pointer is within a single memory object) on more places than just this one.  So, if the code produced by ivopts is indeed

for (p = &a; p != &a - 3; --p)
  *(p + 3) = ...

this could potentially cause other problems other than with alias analysis.

But, I would expect the code from ivopts to be rather

for (p = &a; ...; p = (pointer) ((unsigned) p - 1)
  *((pointer) ((unsigned) p + 3)) = ...

Which should be a valid code.  Is that the case? If so, fixing alias analysis would be the proper solution.
Comment 7 Richard Biener 2011-04-21 12:32:04 UTC
IVOPTs generates

  ivtmp.25_24 = (long unsigned int) &array;
  array.26_26 = (long unsigned int) &array;
  D.2769_27 = array.26_26 + 0x0fffffffffffffff0;

<bb 3>:
  # ans_21 = PHI <ans_16(4), 0(2)>
  # ivtmp.25_20 = PHI <ivtmp.25_19(4), ivtmp.25_24(2)>
  D.2741_10 = ans_21 * 2;
  D.2767_25 = (void *) ivtmp.25_20;
  D.2737_15 = MEM[(int *)D.2767_25 + 12B];
  ans_16 = D.2741_10 + D.2737_15;
  ivtmp.25_19 = ivtmp.25_20 - 4;
  if (ivtmp.25_19 != D.2769_27)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

thus, the addition of the constant offset does not happen in a separate
unsigned long computation but is (of course) folded into the (TARGET_)MEM_REF.

What now exposes this kind of bugs is that we 1) prefer to generate MEM_REFs
instead of equivalent TARGET_MEM_REFs, 2) do not completely give up on
TARGET_MEM_REFs in alias analysis.

I am somewhat sympathetic to treating indirect (TARGET_)MEM_REFs as opaque,
only looking at the final pointer that is dereferenced, not at the
pieces of the address computation.  We'd retain the case where two
such derefrences differ only in a constant offset.

I don't think that any pass interprets the address computation that is
implicit in a memory refrence in any way at the moment.
Comment 8 Richard Biener 2011-04-21 12:39:10 UTC
(In reply to comment #7)
> IVOPTs generates
> 
>   ivtmp.25_24 = (long unsigned int) &array;
>   array.26_26 = (long unsigned int) &array;
>   D.2769_27 = array.26_26 + 0x0fffffffffffffff0;
> 
> <bb 3>:
>   # ans_21 = PHI <ans_16(4), 0(2)>
>   # ivtmp.25_20 = PHI <ivtmp.25_19(4), ivtmp.25_24(2)>
>   D.2741_10 = ans_21 * 2;
>   D.2767_25 = (void *) ivtmp.25_20;

Actually D.2767_25 is already the problem as it can point before &array.

>   D.2737_15 = MEM[(int *)D.2767_25 + 12B];

Here we assume that the memory reference happens only to array[3] or beyond,
as D.2767_25 is assumed to at least point to &array[0] but never to &array[-1].

Other passes could derive similar info for D.2767_25 given that points-to
analysis computes it points somewhere into array.
Comment 9 rakdver@kam.mff.cuni.cz 2011-04-21 12:56:20 UTC
>   ivtmp.25_24 = (long unsigned int) &array;
>   array.26_26 = (long unsigned int) &array;
>   D.2769_27 = array.26_26 + 0x0fffffffffffffff0;
> 
> <bb 3>:
>   # ans_21 = PHI <ans_16(4), 0(2)>
>   # ivtmp.25_20 = PHI <ivtmp.25_19(4), ivtmp.25_24(2)>
>   D.2741_10 = ans_21 * 2;
>   D.2767_25 = (void *) ivtmp.25_20;
>   D.2737_15 = MEM[(int *)D.2767_25 + 12B];
>   ans_16 = D.2741_10 + D.2737_15;
>   ivtmp.25_19 = ivtmp.25_20 - 4;
>   if (ivtmp.25_19 != D.2769_27)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> 
> <bb 4>:
>   goto <bb 3>;

So the computation of the induction variable is performed in (long unsigned int),
which should be safe.

> I am somewhat sympathetic to treating indirect (TARGET_)MEM_REFs as opaque,
> only looking at the final pointer that is dereferenced, not at the
> pieces of the address computation.  We'd retain the case where two
> such derefrences differ only in a constant offset.
> 
> I don't think that any pass interprets the address computation that is
> implicit in a memory refrence in any way at the moment.

We definitely should decide on and document the precise semantics of (TARGET_)MEM_REFs.
One possibility is to give no guarantees on the computations (i.e., treat them as opaque);
this is easy for ivopts, but possibly removes some useful information (at least for MEM_REFs,
I would not do that).  On the other hand, if we decide to enforce the restrictions as for
the pointer arithmetics, we should also say e.g. in what way are the parts of the address
computation in TARGET_MEM_REFs associated, as that may make a difference.
Comment 10 rguenther@suse.de 2011-04-21 13:05:54 UTC
On Thu, 21 Apr 2011, rakdver at kam dot mff.cuni.cz wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702
> 
> --- Comment #9 from rakdver at kam dot mff.cuni.cz <rakdver at kam dot mff.cuni.cz> 2011-04-21 12:56:20 UTC ---
> >   ivtmp.25_24 = (long unsigned int) &array;
> >   array.26_26 = (long unsigned int) &array;
> >   D.2769_27 = array.26_26 + 0x0fffffffffffffff0;
> > 
> > <bb 3>:
> >   # ans_21 = PHI <ans_16(4), 0(2)>
> >   # ivtmp.25_20 = PHI <ivtmp.25_19(4), ivtmp.25_24(2)>
> >   D.2741_10 = ans_21 * 2;
> >   D.2767_25 = (void *) ivtmp.25_20;
> >   D.2737_15 = MEM[(int *)D.2767_25 + 12B];
> >   ans_16 = D.2741_10 + D.2737_15;
> >   ivtmp.25_19 = ivtmp.25_20 - 4;
> >   if (ivtmp.25_19 != D.2769_27)
> >     goto <bb 4>;
> >   else
> >     goto <bb 5>;
> > 
> > <bb 4>:
> >   goto <bb 3>;
> 
> So the computation of the induction variable is performed in (long unsigned
> int),
> which should be safe.
> 
> > I am somewhat sympathetic to treating indirect (TARGET_)MEM_REFs as opaque,
> > only looking at the final pointer that is dereferenced, not at the
> > pieces of the address computation.  We'd retain the case where two
> > such derefrences differ only in a constant offset.
> > 
> > I don't think that any pass interprets the address computation that is
> > implicit in a memory refrence in any way at the moment.
> 
> We definitely should decide on and document the precise semantics of
> (TARGET_)MEM_REFs.
> One possibility is to give no guarantees on the computations (i.e., treat them
> as opaque);
> this is easy for ivopts, but possibly removes some useful information (at least
> for MEM_REFs,
> I would not do that).  On the other hand, if we decide to enforce the
> restrictions as for
> the pointer arithmetics, we should also say e.g. in what way are the parts of
> the address
> computation in TARGET_MEM_REFs associated, as that may make a difference.

What we lose when we treat them as opaque is disambiguating the load of i
in

int i;
int foo (int *p) { i = 0; *(p + 4) = 1; return i; }

where *(p + 4) is MEM[p, 16] where we see that p + 16 != &i for any
valid (in the C sense) pointer p.

That would indeed be not so nice.

Richard.
Comment 11 rakdver@kam.mff.cuni.cz 2011-04-21 13:34:29 UTC
> > > I am somewhat sympathetic to treating indirect (TARGET_)MEM_REFs as opaque,
> > > only looking at the final pointer that is dereferenced, not at the
> > > pieces of the address computation.  We'd retain the case where two
> > > such derefrences differ only in a constant offset.
> > > 
> > > I don't think that any pass interprets the address computation that is
> > > implicit in a memory refrence in any way at the moment.
> > 
> > We definitely should decide on and document the precise semantics of
> > (TARGET_)MEM_REFs.
> > One possibility is to give no guarantees on the computations (i.e., treat them
> > as opaque);
> > this is easy for ivopts, but possibly removes some useful information (at least
> > for MEM_REFs,
> > I would not do that).  On the other hand, if we decide to enforce the
> > restrictions as for
> > the pointer arithmetics, we should also say e.g. in what way are the parts of
> > the address
> > computation in TARGET_MEM_REFs associated, as that may make a difference.
> 
> What we lose when we treat them as opaque is disambiguating the load of i
> in
> 
> int i;
> int foo (int *p) { i = 0; *(p + 4) = 1; return i; }
> 
> where *(p + 4) is MEM[p, 16] where we see that p + 16 != &i for any
> valid (in the C sense) pointer p.
> 
> That would indeed be not so nice.

what about always interpreting the computations in the type of the base of MEM_REF
(allowing non-pointers as the base)?  So both of the following forms would be valid:

1)
int *p;

MEM[p + 16]
which is equivalent to *(p POINTER_PLUS 16)

2)
int *p;
unsigned v = (unsigned) p;

MEM[v + 16]
equivalent to *(int *) (v + 16)

The form 1) would be default, and contains the information needed by alias analysis.
The form 2) would only be generated by optimizations that are unable to decide whether
  the pointer arithmetics form is valid (ivopts), and would be treated as opaque by
  alias analysis.
Comment 12 davidxl 2011-04-21 16:22:03 UTC
(In reply to comment #3)
> I have a patch that makes it fail on trunk as well.  IVOPTs generates
> 
>   for (p = &a; p != &a - 3; --p)
>     *(p + 3) = ...
> 
> and alias analysis doesn't like this invalid pointer.


I wonder why ivopt does not select the iv candidate whose base is &a+3.
Comment 13 Richard Biener 2011-05-12 10:52:54 UTC
(In reply to comment #12)
> (In reply to comment #3)
> > I have a patch that makes it fail on trunk as well.  IVOPTs generates
> > 
> >   for (p = &a; p != &a - 3; --p)
> >     *(p + 3) = ...
> > 
> > and alias analysis doesn't like this invalid pointer.
> 
> 
> I wonder why ivopt does not select the iv candidate whose base is &a+3.

I think that one is not in the list of initial candidates.

I think for that sake we would want to add stripped &base + object size (+ 1?)
(if we know it) as IV candidate iff iv->step is negative, iff iv->step is
positive continue to add &base.

I don't like the alias oracle fixups too much, but I guess we have to
benchmark both alternatives.
Comment 14 davidxl 2011-05-17 17:17:11 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #3)
> > > I have a patch that makes it fail on trunk as well.  IVOPTs generates
> > > 
> > >   for (p = &a; p != &a - 3; --p)
> > >     *(p + 3) = ...
> > > 
> > > and alias analysis doesn't like this invalid pointer.
> > 
> > 
> > I wonder why ivopt does not select the iv candidate whose base is &a+3.
> 
> I think that one is not in the list of initial candidates.
> 
> I think for that sake we would want to add stripped &base + object size (+ 1?)
> (if we know it) as IV candidate iff iv->step is negative, iff iv->step is
> positive continue to add &base.
> 
> I don't like the alias oracle fixups too much, but I guess we have to
> benchmark both alternatives.

The candidate is actually there:

candidate 8
  var_before ivtmp.13
  var_after ivtmp.13
  incremented before exit test
  type long unsigned int
  base (long unsigned int) &MEM[(int *)&array][3]{lb: 0 sz: 4}
  step 0x0fffffffffffffffc
  base object (void *) &array

The following patch fixes the problem. Is it ok?

David


--- tree-ssa-loop-ivopts.c      (revision 173278)
+++ tree-ssa-loop-ivopts.c      (working copy)
@@ -3968,7 +3968,7 @@ get_computation_cost_at (struct ivopts_d
                          int *inv_expr_id)
 {
   tree ubase = use->iv->base, ustep = use->iv->step;
-  tree cbase, cstep;
+  tree cbase, cstep, cbase_strip;
   tree utype = TREE_TYPE (ubase), ctype;
   unsigned HOST_WIDE_INT cstepi, offset = 0;
   HOST_WIDE_INT ratio, aratio;
@@ -4026,6 +4026,13 @@ get_computation_cost_at (struct ivopts_d
   if (!constant_multiple_of (ustep, cstep, &rat))
     return infinite_cost;
 
+  cbase_strip = STRIP_NOPS (cbase);
+  /* Avoid confusing aliaser.  */
+  if (TREE_CODE (cbase_strip) == ADDR_EXPR
+      && TREE_CODE (TREE_OPERAND (cbase_strip, 0)) == VAR_DECL
+      && (HOST_WIDE_INT) cstepi < 0)
+    return infinite_cost;
+
   if (double_int_fits_in_shwi_p (rat))
     ratio = double_int_to_shwi (rat);
   else
Comment 15 rakdver@kam.mff.cuni.cz 2011-05-17 19:26:18 UTC
Hi,

> The following patch fixes the problem. Is it ok?

as a heuristic, this probably makes sense.  Still, it does
not fix the problem, just masks it and makes it harder to reproduce,

Zdenek

> David
> 
> 
> --- tree-ssa-loop-ivopts.c      (revision 173278)
> +++ tree-ssa-loop-ivopts.c      (working copy)
> @@ -3968,7 +3968,7 @@ get_computation_cost_at (struct ivopts_d
>                           int *inv_expr_id)
>  {
>    tree ubase = use->iv->base, ustep = use->iv->step;
> -  tree cbase, cstep;
> +  tree cbase, cstep, cbase_strip;
>    tree utype = TREE_TYPE (ubase), ctype;
>    unsigned HOST_WIDE_INT cstepi, offset = 0;
>    HOST_WIDE_INT ratio, aratio;
> @@ -4026,6 +4026,13 @@ get_computation_cost_at (struct ivopts_d
>    if (!constant_multiple_of (ustep, cstep, &rat))
>      return infinite_cost;
> 
> +  cbase_strip = STRIP_NOPS (cbase);
> +  /* Avoid confusing aliaser.  */
> +  if (TREE_CODE (cbase_strip) == ADDR_EXPR
> +      && TREE_CODE (TREE_OPERAND (cbase_strip, 0)) == VAR_DECL
> +      && (HOST_WIDE_INT) cstepi < 0)
> +    return infinite_cost;
> +
>    if (double_int_fits_in_shwi_p (rat))
>      ratio = double_int_to_shwi (rat);
>    else
> 
> -- 
> Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug.
Comment 16 davidxl 2011-05-17 19:57:01 UTC
This is not really heuristic -- it prevents compiler from generating code in ivopt that violates the aliasing assumption. 

David

(In reply to comment #15)
> Hi,
> 
> > The following patch fixes the problem. Is it ok?
> 
> as a heuristic, this probably makes sense.  Still, it does
> not fix the problem, just masks it and makes it harder to reproduce,
> 
> Zdenek
> 
> > David
> > 
> > 
> > --- tree-ssa-loop-ivopts.c      (revision 173278)
> > +++ tree-ssa-loop-ivopts.c      (working copy)
> > @@ -3968,7 +3968,7 @@ get_computation_cost_at (struct ivopts_d
> >                           int *inv_expr_id)
> >  {
> >    tree ubase = use->iv->base, ustep = use->iv->step;
> > -  tree cbase, cstep;
> > +  tree cbase, cstep, cbase_strip;
> >    tree utype = TREE_TYPE (ubase), ctype;
> >    unsigned HOST_WIDE_INT cstepi, offset = 0;
> >    HOST_WIDE_INT ratio, aratio;
> > @@ -4026,6 +4026,13 @@ get_computation_cost_at (struct ivopts_d
> >    if (!constant_multiple_of (ustep, cstep, &rat))
> >      return infinite_cost;
> > 
> > +  cbase_strip = STRIP_NOPS (cbase);
> > +  /* Avoid confusing aliaser.  */
> > +  if (TREE_CODE (cbase_strip) == ADDR_EXPR
> > +      && TREE_CODE (TREE_OPERAND (cbase_strip, 0)) == VAR_DECL
> > +      && (HOST_WIDE_INT) cstepi < 0)
> > +    return infinite_cost;
> > +
> >    if (double_int_fits_in_shwi_p (rat))
> >      ratio = double_int_to_shwi (rat);
> >    else
> > 
> > -- 
> > Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
> > ------- You are receiving this mail because: -------
> > You are on the CC list for the bug.
Comment 17 rakdver@kam.mff.cuni.cz 2011-05-17 20:18:39 UTC
> This is not really heuristic -- it prevents compiler from generating code in
> ivopt that violates the aliasing assumption. 

no, it does not.  For example, you only consider the case whent the base of the memory
reference is addr_expr, and when the step is negative.  However, ivopts may generate
"pointers" outside of the memory objects in other cases as well; and if those get used in
mem_refs (where the computations are currently interpreted according to the pointer
arithmetics rules, for the purpose of alias analysis), this leads to misscompilations.
Comment 18 davidxl 2011-05-17 20:40:41 UTC
(In reply to comment #17)
> > This is not really heuristic -- it prevents compiler from generating code in
> > ivopt that violates the aliasing assumption. 
> 
> no, it does not. 

Does, but may not fully do.

> For example, you only consider the case whent the base of the
> memory
> reference is addr_expr, and when the step is negative.  However, ivopts may
> generate
> "pointers" outside of the memory objects in other cases as well;


Those would be bugs too -- do you have such examples? 

David

> and if those
> get used in
> mem_refs (where the computations are currently interpreted according to the
> pointer
> arithmetics rules, for the purpose of alias analysis), this leads to
> misscompilations.
Comment 19 rguenther@suse.de 2011-05-18 08:47:31 UTC
On Tue, 17 May 2011, rakdver at kam dot mff.cuni.cz wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702
> 
> --- Comment #15 from rakdver at kam dot mff.cuni.cz <rakdver at kam dot mff.cuni.cz> 2011-05-17 19:26:18 UTC ---
> Hi,
> 
> > The following patch fixes the problem. Is it ok?
> 
> as a heuristic, this probably makes sense.  Still, it does
> not fix the problem, just masks it and makes it harder to reproduce,

Looks similar to my original workaround, no?

We can actually use something like the aliasing non-pointer base
Zdenek mentioned upthread.  TARGET_MEM_REF has two index operands
(where usually TMR_INDEX2 is NULL of TMR_BASE is non-constant).
So we could build a TARGET_MEM_REF based off TMR_BASE 0B and
move the non-pointer base to TMR_INDEX2.  The oracle then should
not be able to disambiguate anything (and also no points-to
info would be available, which probably doesn't make this the
very very best idea either).

Richard.
Comment 20 davidxl 2011-05-18 15:51:06 UTC
(In reply to comment #19)
> On Tue, 17 May 2011, rakdver at kam dot mff.cuni.cz wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702
> > 
> > --- Comment #15 from rakdver at kam dot mff.cuni.cz <rakdver at kam dot mff.cuni.cz> 2011-05-17 19:26:18 UTC ---
> > Hi,
> > 
> > > The following patch fixes the problem. Is it ok?
> > 
> > as a heuristic, this probably makes sense.  Still, it does
> > not fix the problem, just masks it and makes it harder to reproduce,
> 
> Looks similar to my original workaround, no?
> 
> We can actually use something like the aliasing non-pointer base
> Zdenek mentioned upthread.  TARGET_MEM_REF has two index operands
> (where usually TMR_INDEX2 is NULL of TMR_BASE is non-constant).
> So we could build a TARGET_MEM_REF based off TMR_BASE 0B and
> move the non-pointer base to TMR_INDEX2.  The oracle then should
> not be able to disambiguate anything (and also no points-to
> info would be available, which probably doesn't make this the
> very very best idea either).
> 
> Richard.

This is at the cost of making aliasing info conservative which should at least be modeled in the cost. Given that the alternative iv candidate exist (which does not require such treatment), discarding the troublesome candidate seems reasonable to me.

David
Comment 21 davidxl 2011-05-19 05:02:29 UTC
Before a better fix is found, is the proposed patch ok? If yes, I will do more testing and submit to gcc-patches@

David


(In reply to comment #19)
> On Tue, 17 May 2011, rakdver at kam dot mff.cuni.cz wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702
> > 
> > --- Comment #15 from rakdver at kam dot mff.cuni.cz <rakdver at kam dot mff.cuni.cz> 2011-05-17 19:26:18 UTC ---
> > Hi,
> > 
> > > The following patch fixes the problem. Is it ok?
> > 
> > as a heuristic, this probably makes sense.  Still, it does
> > not fix the problem, just masks it and makes it harder to reproduce,
> 
> Looks similar to my original workaround, no?
> 
> We can actually use something like the aliasing non-pointer base
> Zdenek mentioned upthread.  TARGET_MEM_REF has two index operands
> (where usually TMR_INDEX2 is NULL of TMR_BASE is non-constant).
> So we could build a TARGET_MEM_REF based off TMR_BASE 0B and
> move the non-pointer base to TMR_INDEX2.  The oracle then should
> not be able to disambiguate anything (and also no points-to
> info would be available, which probably doesn't make this the
> very very best idea either).
> 
> Richard.
Comment 22 Richard Biener 2011-05-25 10:19:12 UTC
Created attachment 24354 [details]
alternative patch for 4.6

This is another patch - it avoids creating MEM_REFs for base addresses that
we do not know whether they are always in bounds of the object it points to.
Instead we force the use of TARGET_MEM_REFs there and dumb down the alias
oracle to not assume that base address is within bounds.

Sounds like the most reasonable solution for 4.6 to me.
Comment 23 Richard Biener 2011-05-25 10:33:24 UTC
*** Bug 49144 has been marked as a duplicate of this bug. ***
Comment 24 Richard Biener 2011-05-25 13:58:20 UTC
(In reply to comment #22)
> Created attachment 24354 [details]
> alternative patch for 4.6
> 
> This is another patch - it avoids creating MEM_REFs for base addresses that
> we do not know whether they are always in bounds of the object it points to.
> Instead we force the use of TARGET_MEM_REFs there and dumb down the alias
> oracle to not assume that base address is within bounds.
> 
> Sounds like the most reasonable solution for 4.6 to me.

Requires staging on trunk, there are multiple small fixes that need backporting
and one last cleanup (and fix) that I had pending needs to be finished first.

But I'll continue in this direction.
Comment 25 Richard Biener 2011-05-26 10:01:33 UTC
I have benchmarked SPEC2k6 on the 4.6 branch with the patch (and required
backports) with -O3 -ffast-math -funroll-loops, Base is unpatched, Peak
is patched.  Average of three runs on a Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz, the noise in the tests is +-1s.  So from a performance point
the patch doesn't seem to have problems.

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench    9770        334       29.2 *    9770        337       29.0 *  
401.bzip2        9650        467       20.7 *    9650        459       21.0 *  
403.gcc          8050        289       27.9 *    8050        289       27.8 *  
429.mcf          9120        302       30.2 *    9120        304       30.0 *  
445.gobmk       10490        413       25.4 *   10490        413       25.4 *  
456.hmmer        9330        375       24.9 *    9330        374       24.9 *  
458.sjeng       12100        446       27.1 *   12100        446       27.1 *  
462.libquantum  20720        332       62.4 *   20720        332       62.4 *  
464.h264ref     22130        540       41.0 *   22130        540       41.0 *  
471.omnetpp      6250        300       20.9 *    6250        305       20.5 *  
473.astar        7020        373       18.8 *    7020        373       18.8 *  
483.xalancbmk    6900        225       30.7 *    6900        226       30.5 *  
 Est. SPECint_base2006                 28.3
 Est. SPECint2006                                                      28.3

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        347       39.2 *   13590        347       39.2 *  
416.gamess      19580        740       26.5 *   19580        738       26.5 *  
433.milc         9180        382       24.0 *    9180        381       24.1 *  
434.zeusmp       9100        336       27.1 *    9100        336       27.1 *  
435.gromacs      7140        318       22.4 *    7140        318       22.4 *  
436.cactusADM   11950        422       28.3 *   11950        426       28.1 *  
437.leslie3d     9400        316       29.8 *    9400        315       29.9 *  
444.namd         8020        397       20.2 *    8020        398       20.2 *  
447.dealII      11440        315       36.3 *   11440        314       36.5 *  
450.soplex       8340        220       37.9 *    8340        220       37.9 *  
453.povray       5320        186       28.6 *    5320        187       28.4 *  
454.calculix     8250        365       22.6 *    8250        365       22.6 *  
459.GemsFDTD    10610        344       30.9 *   10610        340       31.2 *  
465.tonto        9840        380       25.9 *    9840        378       26.0 *  
470.lbm         13740        254       54.0 *   13740        255       53.9 *  
481.wrf         11170        351       31.8 *   11170        352       31.8 *  
482.sphinx3     19490        483       40.4 *   19490        485       40.2 *  
 Est. SPECfp_base2006                  30.0
 Est. SPECfp2006                                                       30.0
Comment 26 Richard Biener 2011-05-26 11:05:18 UTC
Created attachment 24362 [details]
alternative patch for 4.6, with backports

That's what I have successfully bootstrapped and tested and SPEC tested.
Comment 27 Richard Biener 2011-05-26 13:01:50 UTC
Author: rguenth
Date: Thu May 26 13:01:48 2011
New Revision: 174282

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=174282
Log:
2011-05-26  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/48702
	* tree-ssa-address.c (create_mem_ref_raw): Create MEM_REFs
	only when we know the base address is within bounds.
	* tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Do not
	assume the base address of TARGET_MEM_REFs is in bounds.

	* gcc.dg/torture/pr48702.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr48702.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-address.c
    trunk/gcc/tree-ssa-alias.c
Comment 28 Richard Biener 2011-05-26 13:03:46 UTC
Fixed on trunk sofar.
Comment 29 Richard Biener 2011-06-06 10:13:27 UTC
Author: rguenth
Date: Mon Jun  6 10:13:23 2011
New Revision: 174688

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=174688
Log:
2011-06-06  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/48702
	* tree-ssa-address.c (create_mem_ref_raw): Create MEM_REFs
	only when we know the base address is within bounds.
	* tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Do not
	assume the base address of TARGET_MEM_REFs is in bounds.
	(indirect_refs_may_alias_p): Fix TARGET_MEM_REF without index tests.

	* gcc.dg/torture/pr48702.c: New testcase.

	Backport from mainline
	2011-05-31  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/49235
	* tree-ssa-address.c (gen_addr_rtx): Ignore base if it is const0_rtx.
	(create_mem_ref_raw): Create MEM_REF even if base is INTEGER_CST.

	* gcc.dg/pr49235.c: New test.

Added:
    branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/pr49235.c
    branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/torture/pr48702.c
Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_6-branch/gcc/tree-ssa-address.c
    branches/gcc-4_6-branch/gcc/tree-ssa-alias.c
Comment 30 Richard Biener 2011-06-06 10:18:30 UTC
Fixed.