Bug 91137 - [7 Regression] Wrong code with -O3
Summary: [7 Regression] Wrong code with -O3
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 10.0
: P2 normal
Target Milestone: 7.5
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks: yarpgen
  Show dependency treegraph
 
Reported: 2019-07-10 17:53 UTC by Vsevolod Livinskii
Modified: 2021-11-01 23:07 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 10.0, 6.5.0, 7.5.0, 8.3.1, 9.1.1
Known to fail: 7.1.0, 7.3.0, 8.3.0, 9.1.0
Last reconfirmed: 2019-07-11 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vsevolod Livinskii 2019-07-10 17:53:21 UTC
GCC produces wrong code with –O3.

Reproducer:

#include <stdio.h>

long long a;
unsigned b;
int c[70];
int d[70][70];
int e;

void f(long long *g, int p2) { *g = p2; }

void fn2() {
  for (int j = 0; j < 70; j++) {
    for (int i = 0; i < 70; i++) {
      if (b)
        c[i] = 0;
      for (int l = 0; l < 70; l++)
        d[i][1] = d[l][i];
    }
    for (int k = 0; k < 70; k++)
      e = c[0];
  }
}

int main() {
  b = 5;
  for (int j = 0; j < 70; ++j)
    c[j] = 2075593088;
  fn2();
  f(&a, e);
  printf("%lld\n", a);
}

Error:
>$ gcc -O3 repr.c ; ./a.out
2075593088
>$ gcc -O0 repr.c ; ./a.out
0

GCC version:
gcc version 10.0.0 (Rev: 273261)
This error can also be reproduced with gcc version 7.3.1
Comment 1 Richard Biener 2019-07-11 07:49:46 UTC
Confirmed.  Happens with -O2 -fno-inline already.
Comment 2 Richard Biener 2019-07-11 08:04:34 UTC
And it is IVOPTs generating a strange IV:

  _44 = (unsigned long) &d;
  ivtmp.22_43 = _44 + 19600;
  # ivtmp.22_13 = PHI <ivtmp.22_43(9), ivtmp.22_42(7)>
...

  _50 = -_44;
  MEM[symbol: c, base: _50, index: ivtmp.22_13, offset: -19600B] = 0;

somehow confusing RTL alias analysis.  -fno-gcse hides the issue.

We expand this to

;; MEM[symbol: c, base: _50, index: ivtmp.22_13, offset: -19600B] = 0;

(insn 13 12 14 (parallel [
            (set (reg:DI 99)
                (neg:DI (reg:DI 95 [ _44 ])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":15 -1
     (nil))

(insn 14 13 15 (set (reg/f:DI 100)
        (const:DI (plus:DI (symbol_ref:DI ("c") [flags 0x2]  <var_decl 0x7ff98b1d3b40 c>)
                (const_int -19600 [0xffffffffffffb370])))) "t.c":15 -1
     (nil))

(insn 15 14 0 (set (mem:SI (plus:DI (plus:DI (reg:DI 99)
                    (reg:DI 91 [ ivtmp.22 ]))
                (const:DI (plus:DI (symbol_ref:DI ("c") [flags 0x2]  <var_decl 0x7ff98b1d3b40 c>)
                        (const_int -19600 [0xffffffffffffb370])))) [2 MEM[symbol: c, base: _50, index: ivtmp.22_13, offset: -19600B]+0 S4 A32])
        (const_int 0 [0])) "t.c":15 -1
     (nil))

and I can very well imagine we're getting confused by find_base_term
logic here.

There's logic in IVOPTs to not generate IVs based on two different
objects but somehow it doesn't trigger here.
Comment 3 Richard Biener 2019-07-11 08:36:50 UTC
I think the issue is

inv_expr 5:     ((signed long) &d + 19600) - (signed long) &c

I hope Bin can investigate this.  In the end we can only fight the RTL
alias analysis issue with heuristics here though...

We're fearing compile-time issues from a fix like the following, but
ultimatively find_base_term is known broken for a _looooong_ time.

Index: gcc/alias.c
===================================================================
--- gcc/alias.c (revision 273355)
+++ gcc/alias.c (working copy)
@@ -2003,16 +2003,20 @@ find_base_term (rtx x, vec<std::pair<cse
           term is from a pointer or is a named object or a special address
           (like an argument or stack reference), then use it for the
           base term.  */
-       rtx base = find_base_term (tmp1, visited_vals);
-       if (base != NULL_RTX
-           && ((REG_P (tmp1) && REG_POINTER (tmp1))
-                || known_base_value_p (base)))
-         return base;
-       base = find_base_term (tmp2, visited_vals);
-       if (base != NULL_RTX
-           && ((REG_P (tmp2) && REG_POINTER (tmp2))
-                || known_base_value_p (base)))
-         return base;
+       rtx base1 = find_base_term (tmp1, visited_vals);
+       if (!(base1 != NULL_RTX
+             && ((REG_P (tmp1) && REG_POINTER (tmp1))
+                 || known_base_value_p (base1))))
+         base1 = NULL_RTX;
+       rtx base2 = find_base_term (tmp2, visited_vals);
+       if (!(base2 != NULL_RTX
+             && ((REG_P (tmp2) && REG_POINTER (tmp2))
+                 || known_base_value_p (base2))))
+         base2 = NULL_RTX;
+       if (base1 && !base2)
+         return base1;
+       if (!base1 && base2)
+         return base2;
 
        /* We could not determine which of the two operands was the
           base register and which was the index.  So we can determine
Comment 4 Richard Biener 2019-07-11 08:52:26 UTC
(In reply to Richard Biener from comment #3)
> I think the issue is
> 
> inv_expr 5:     ((signed long) &d + 19600) - (signed long) &c
> 
> I hope Bin can investigate this.  In the end we can only fight the RTL
> alias analysis issue with heuristics here though...
> 
> We're fearing compile-time issues from a fix like the following, but
> ultimatively find_base_term is known broken for a _looooong_ time.

Note the patch is only extra heuristics since a correct fix would be
conservatively determining whether any argument contains a base value.
See PR49330 for all the glory details.

> Index: gcc/alias.c
> ===================================================================
> --- gcc/alias.c (revision 273355)
> +++ gcc/alias.c (working copy)
> @@ -2003,16 +2003,20 @@ find_base_term (rtx x, vec<std::pair<cse
>            term is from a pointer or is a named object or a special address
>            (like an argument or stack reference), then use it for the
>            base term.  */
> -       rtx base = find_base_term (tmp1, visited_vals);
> -       if (base != NULL_RTX
> -           && ((REG_P (tmp1) && REG_POINTER (tmp1))
> -                || known_base_value_p (base)))
> -         return base;
> -       base = find_base_term (tmp2, visited_vals);
> -       if (base != NULL_RTX
> -           && ((REG_P (tmp2) && REG_POINTER (tmp2))
> -                || known_base_value_p (base)))
> -         return base;
> +       rtx base1 = find_base_term (tmp1, visited_vals);
> +       if (!(base1 != NULL_RTX
> +             && ((REG_P (tmp1) && REG_POINTER (tmp1))
> +                 || known_base_value_p (base1))))
> +         base1 = NULL_RTX;
> +       rtx base2 = find_base_term (tmp2, visited_vals);
> +       if (!(base2 != NULL_RTX
> +             && ((REG_P (tmp2) && REG_POINTER (tmp2))
> +                 || known_base_value_p (base2))))
> +         base2 = NULL_RTX;
> +       if (base1 && !base2)
> +         return base1;
> +       if (!base1 && base2)
> +         return base2;
>  
>         /* We could not determine which of the two operands was the
>            base register and which was the index.  So we can determine
Comment 5 bin cheng 2019-07-11 09:37:17 UTC
Will try to find some time this WE, sorry for delaying.
Comment 6 bin cheng 2019-07-15 11:49:18 UTC
(In reply to Richard Biener from comment #2)
> 
> and I can very well imagine we're getting confused by find_base_term
> logic here.
> 
> There's logic in IVOPTs to not generate IVs based on two different
> objects but somehow it doesn't trigger here.

Hmm, it's because determine_base_object failed to identify the `base_object` for IV because it has non-pointer type:
IV struct:
  SSA_NAME:     _32
  Type: unsigned long
  Base: (unsigned long) &d + 19600
  Step: 4
  Biv:  N
  Overflowness wrto loop niter: Overflow

And we have short-circuit in determine_base_object:

static tree
determine_base_object (tree expr)
{
  enum tree_code code = TREE_CODE (expr);
  tree base, obj;

  /* If this is a pointer casted to any type, we need to determine
     the base object for the pointer; so handle conversions before
     throwing away non-pointer expressions.  */
  if (CONVERT_EXPR_P (expr))
    return determine_base_object (TREE_OPERAND (expr, 0));

  if (!POINTER_TYPE_P (TREE_TYPE (expr)))
    return NULL_TREE;

The IV is generated from inner loop ivopts as we rewrite using unsigned type.

Any suggestion how to fix this?
Comment 7 rguenther@suse.de 2019-07-15 12:01:02 UTC
On Mon, 15 Jul 2019, amker at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91137
> 
> --- Comment #6 from bin cheng <amker at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #2)
> > 
> > and I can very well imagine we're getting confused by find_base_term
> > logic here.
> > 
> > There's logic in IVOPTs to not generate IVs based on two different
> > objects but somehow it doesn't trigger here.
> 
> Hmm, it's because determine_base_object failed to identify the `base_object`
> for IV because it has non-pointer type:
> IV struct:
>   SSA_NAME:     _32
>   Type: unsigned long
>   Base: (unsigned long) &d + 19600
>   Step: 4
>   Biv:  N
>   Overflowness wrto loop niter: Overflow
> 
> And we have short-circuit in determine_base_object:
> 
> static tree
> determine_base_object (tree expr)
> {
>   enum tree_code code = TREE_CODE (expr);
>   tree base, obj;
> 
>   /* If this is a pointer casted to any type, we need to determine
>      the base object for the pointer; so handle conversions before
>      throwing away non-pointer expressions.  */
>   if (CONVERT_EXPR_P (expr))
>     return determine_base_object (TREE_OPERAND (expr, 0));
> 
>   if (!POINTER_TYPE_P (TREE_TYPE (expr)))
>     return NULL_TREE;
> 
> The IV is generated from inner loop ivopts as we rewrite using unsigned type.
> 
> Any suggestion how to fix this?

I think we need to elide this check and make the following code
more powerful which includes actually handling PLUS/MINUS_EXPR.
There's also ptr + (unsigned)&ptr to be considered for the
POINTER_PLUS_EXPR case - thus we cannot simply only search the
pointer chain.

Which then leads us into exponential behavior if this is asked
on SCEV results which may have tree sharing, thus we need some
'visited' machinery.  In the end I think we should re-do
it like I re-did contains_abnormal_ssa_name_p, use
walk_tree_without_duplicates.  Btw, what should happen if the
walk discovers two bases are used in the expression?
Comment 8 bin cheng 2019-07-15 12:12:56 UTC
(In reply to rguenther@suse.de from comment #7)
> On Mon, 15 Jul 2019, amker at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91137
> > 
> > --- Comment #6 from bin cheng <amker at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #2)
> > > 
> > > and I can very well imagine we're getting confused by find_base_term
> > > logic here.
> > > 
> > > There's logic in IVOPTs to not generate IVs based on two different
> > > objects but somehow it doesn't trigger here.
> > 
> > Hmm, it's because determine_base_object failed to identify the `base_object`
> > for IV because it has non-pointer type:
> > IV struct:
> >   SSA_NAME:     _32
> >   Type: unsigned long
> >   Base: (unsigned long) &d + 19600
> >   Step: 4
> >   Biv:  N
> >   Overflowness wrto loop niter: Overflow
> > 
> > And we have short-circuit in determine_base_object:
> > 
> > static tree
> > determine_base_object (tree expr)
> > {
> >   enum tree_code code = TREE_CODE (expr);
> >   tree base, obj;
> > 
> >   /* If this is a pointer casted to any type, we need to determine
> >      the base object for the pointer; so handle conversions before
> >      throwing away non-pointer expressions.  */
> >   if (CONVERT_EXPR_P (expr))
> >     return determine_base_object (TREE_OPERAND (expr, 0));
> > 
> >   if (!POINTER_TYPE_P (TREE_TYPE (expr)))
> >     return NULL_TREE;
> > 
> > The IV is generated from inner loop ivopts as we rewrite using unsigned type.
> > 
> > Any suggestion how to fix this?
> 
> I think we need to elide this check and make the following code
> more powerful which includes actually handling PLUS/MINUS_EXPR.
> There's also ptr + (unsigned)&ptr to be considered for the
> POINTER_PLUS_EXPR case - thus we cannot simply only search the
> pointer chain.
> 
> Which then leads us into exponential behavior if this is asked
> on SCEV results which may have tree sharing, thus we need some
> 'visited' machinery.  In the end I think we should re-do
Will work on a patch.  One thing I am unclear about is the ptr + (unsigned)&ptr stuff, when would we have this? Could you provide an example please?

> it like I re-did contains_abnormal_ssa_name_p, use
> walk_tree_without_duplicates.  Btw, what should happen if the
> walk discovers two bases are used in the expression?
I guess it depends on why there are multiple bases in the first place?
Comment 9 rguenther@suse.de 2019-07-15 12:20:54 UTC
On Mon, 15 Jul 2019, amker at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91137
> 
> --- Comment #8 from bin cheng <amker at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #7)
> > On Mon, 15 Jul 2019, amker at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91137
> > > 
> > > --- Comment #6 from bin cheng <amker at gcc dot gnu.org> ---
> > > (In reply to Richard Biener from comment #2)
> > > > 
> > > > and I can very well imagine we're getting confused by find_base_term
> > > > logic here.
> > > > 
> > > > There's logic in IVOPTs to not generate IVs based on two different
> > > > objects but somehow it doesn't trigger here.
> > > 
> > > Hmm, it's because determine_base_object failed to identify the `base_object`
> > > for IV because it has non-pointer type:
> > > IV struct:
> > >   SSA_NAME:     _32
> > >   Type: unsigned long
> > >   Base: (unsigned long) &d + 19600
> > >   Step: 4
> > >   Biv:  N
> > >   Overflowness wrto loop niter: Overflow
> > > 
> > > And we have short-circuit in determine_base_object:
> > > 
> > > static tree
> > > determine_base_object (tree expr)
> > > {
> > >   enum tree_code code = TREE_CODE (expr);
> > >   tree base, obj;
> > > 
> > >   /* If this is a pointer casted to any type, we need to determine
> > >      the base object for the pointer; so handle conversions before
> > >      throwing away non-pointer expressions.  */
> > >   if (CONVERT_EXPR_P (expr))
> > >     return determine_base_object (TREE_OPERAND (expr, 0));
> > > 
> > >   if (!POINTER_TYPE_P (TREE_TYPE (expr)))
> > >     return NULL_TREE;
> > > 
> > > The IV is generated from inner loop ivopts as we rewrite using unsigned type.
> > > 
> > > Any suggestion how to fix this?
> > 
> > I think we need to elide this check and make the following code
> > more powerful which includes actually handling PLUS/MINUS_EXPR.
> > There's also ptr + (unsigned)&ptr to be considered for the
> > POINTER_PLUS_EXPR case - thus we cannot simply only search the
> > pointer chain.
> > 
> > Which then leads us into exponential behavior if this is asked
> > on SCEV results which may have tree sharing, thus we need some
> > 'visited' machinery.  In the end I think we should re-do
> Will work on a patch.  One thing I am unclear about is the ptr + (unsigned)&ptr
> stuff, when would we have this? Could you provide an example please?

Can't provide a complete example but maybe sth like

int a,b;
void foo (int *p, int *q)
{
 p += (uintptr_t)&a - (uintptr_t)&b;
 for (; p != q; ++p)
   *p = 1;
}

> > it like I re-did contains_abnormal_ssa_name_p, use
> > walk_tree_without_duplicates.  Btw, what should happen if the
> > walk discovers two bases are used in the expression?
> I guess it depends on why there are multiple bases in the first place?

See above, in reality there's only one base but later on RTL we
lose the distinction between pointers and integers and thus are
easily fooled.
Comment 10 bin cheng 2019-07-18 08:38:40 UTC
Author: amker
Date: Thu Jul 18 08:38:09 2019
New Revision: 273570

URL: https://gcc.gnu.org/viewcvs?rev=273570&root=gcc&view=rev
Log:
        PR tree-optimization/91137
        * tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
        (tree_ssa_iv_optimize_init, alloc_iv, tree_ssa_iv_optimize_finalize):
        Init, use and fini the above new field.
        (determine_base_object_1): New function.
        (determine_base_object): Reimplement using walk_tree.

gcc/testsuite
        PR tree-optimization/91137
        * gcc.c-torture/execute/pr91137.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr91137.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-ivopts.c
Comment 11 bin cheng 2019-07-21 00:56:37 UTC
Hi, suppose this patch should be backported to 8/7 if no further issues.
Comment 12 Richard Biener 2019-07-22 10:47:09 UTC
(In reply to bin cheng from comment #11)
> Hi, suppose this patch should be backported to 8/7 if no further issues.

Yes, please do the backport to GCC 9 reasonably quickly and 8/7 later if
the GCC 9 one doesn't show any issues.
Comment 13 bin cheng 2019-07-24 01:29:04 UTC
Author: amker
Date: Wed Jul 24 01:28:33 2019
New Revision: 273754

URL: https://gcc.gnu.org/viewcvs?rev=273754&root=gcc&view=rev
Log:
        Backport from mainline
        2019-07-18  Bin Cheng  <bin.linux@linux.alibaba.com>

        PR tree-optimization/91137
        * tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
        (tree_ssa_iv_optimize_init, alloc_iv, tree_ssa_iv_optimize_finalize):
        Init, use and fini the above new field.
        (determine_base_object_1): New function.
        (determine_base_object): Reimplement using walk_tree.

gcc/testsuite
        2019-07-18  Bin Cheng  <bin.linux@linux.alibaba.com>

        PR tree-optimization/91137
        * gcc.c-torture/execute/pr91137.c: New test.

Added:
    branches/gcc-9-branch/gcc/testsuite/gcc.c-torture/execute/pr91137.c
Modified:
    branches/gcc-9-branch/gcc/ChangeLog
    branches/gcc-9-branch/gcc/testsuite/ChangeLog
    branches/gcc-9-branch/gcc/tree-ssa-loop-ivopts.c
Comment 14 bin cheng 2019-08-30 11:03:19 UTC
Author: amker
Date: Fri Aug 30 11:02:48 2019
New Revision: 275064

URL: https://gcc.gnu.org/viewcvs?rev=275064&root=gcc&view=rev
Log:
	Backport from mainline
	2019-07-18  Bin Cheng  <bin.linux@linux.alibaba.com>

	PR tree-optimization/91137
	* tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
	(tree_ssa_iv_optimize_init, alloc_iv, tree_ssa_iv_optimize_finalize):
	Init, use and fini the above new field.
	(determine_base_object_1): New function.
	(determine_base_object): Reimplement using walk_tree.

	2019-07-18  Bin Cheng  <bin.linux@linux.alibaba.com>

	PR tree-optimization/91137
	* gcc.c-torture/execute/pr91137.c: New test.

Added:
    branches/gcc-8-branch/gcc/testsuite/gcc.c-torture/execute/pr91137.c
Modified:
    branches/gcc-8-branch/gcc/ChangeLog
    branches/gcc-8-branch/gcc/testsuite/ChangeLog
    branches/gcc-8-branch/gcc/tree-ssa-loop-ivopts.c
Comment 15 bin cheng 2019-09-02 10:11:16 UTC
Author: amker
Date: Mon Sep  2 10:10:44 2019
New Revision: 275304

URL: https://gcc.gnu.org/viewcvs?rev=275304&root=gcc&view=rev
Log:
	Backport from mainline
	2019-07-18  Bin Cheng  <bin.linux@linux.alibaba.com>

	PR tree-optimization/91137
	* tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
	(tree_ssa_iv_optimize_init, alloc_iv, tree_ssa_iv_optimize_finalize):
	Init, use and fini the above new field.
	(determine_base_object_1): New function.
	(determine_base_object): Reimplement using walk_tree.

	2019-07-18  Bin Cheng  <bin.linux@linux.alibaba.com>

	PR tree-optimization/91137
	* gcc.c-torture/execute/pr91137.c: New test.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.c-torture/execute/pr91137.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
    branches/gcc-7-branch/gcc/tree-ssa-loop-ivopts.c
Comment 16 Richard Biener 2019-11-14 11:39:50 UTC
Fixed.