GCC produces wrong code with –O3. Reproducer: #include <stdio.h> long long a; unsigned b; int c[70]; int d[70][70]; int e; void f(long long *g, int p2) { *g = p2; } void fn2() { for (int j = 0; j < 70; j++) { for (int i = 0; i < 70; i++) { if (b) c[i] = 0; for (int l = 0; l < 70; l++) d[i][1] = d[l][i]; } for (int k = 0; k < 70; k++) e = c[0]; } } int main() { b = 5; for (int j = 0; j < 70; ++j) c[j] = 2075593088; fn2(); f(&a, e); printf("%lld\n", a); } Error: >$ gcc -O3 repr.c ; ./a.out 2075593088 >$ gcc -O0 repr.c ; ./a.out 0 GCC version: gcc version 10.0.0 (Rev: 273261) This error can also be reproduced with gcc version 7.3.1
Confirmed. Happens with -O2 -fno-inline already.
And it is IVOPTs generating a strange IV: _44 = (unsigned long) &d; ivtmp.22_43 = _44 + 19600; # ivtmp.22_13 = PHI <ivtmp.22_43(9), ivtmp.22_42(7)> ... _50 = -_44; MEM[symbol: c, base: _50, index: ivtmp.22_13, offset: -19600B] = 0; somehow confusing RTL alias analysis. -fno-gcse hides the issue. We expand this to ;; MEM[symbol: c, base: _50, index: ivtmp.22_13, offset: -19600B] = 0; (insn 13 12 14 (parallel [ (set (reg:DI 99) (neg:DI (reg:DI 95 [ _44 ]))) (clobber (reg:CC 17 flags)) ]) "t.c":15 -1 (nil)) (insn 14 13 15 (set (reg/f:DI 100) (const:DI (plus:DI (symbol_ref:DI ("c") [flags 0x2] <var_decl 0x7ff98b1d3b40 c>) (const_int -19600 [0xffffffffffffb370])))) "t.c":15 -1 (nil)) (insn 15 14 0 (set (mem:SI (plus:DI (plus:DI (reg:DI 99) (reg:DI 91 [ ivtmp.22 ])) (const:DI (plus:DI (symbol_ref:DI ("c") [flags 0x2] <var_decl 0x7ff98b1d3b40 c>) (const_int -19600 [0xffffffffffffb370])))) [2 MEM[symbol: c, base: _50, index: ivtmp.22_13, offset: -19600B]+0 S4 A32]) (const_int 0 [0])) "t.c":15 -1 (nil)) and I can very well imagine we're getting confused by find_base_term logic here. There's logic in IVOPTs to not generate IVs based on two different objects but somehow it doesn't trigger here.
I think the issue is inv_expr 5: ((signed long) &d + 19600) - (signed long) &c I hope Bin can investigate this. In the end we can only fight the RTL alias analysis issue with heuristics here though... We're fearing compile-time issues from a fix like the following, but ultimatively find_base_term is known broken for a _looooong_ time. Index: gcc/alias.c =================================================================== --- gcc/alias.c (revision 273355) +++ gcc/alias.c (working copy) @@ -2003,16 +2003,20 @@ find_base_term (rtx x, vec<std::pair<cse term is from a pointer or is a named object or a special address (like an argument or stack reference), then use it for the base term. */ - rtx base = find_base_term (tmp1, visited_vals); - if (base != NULL_RTX - && ((REG_P (tmp1) && REG_POINTER (tmp1)) - || known_base_value_p (base))) - return base; - base = find_base_term (tmp2, visited_vals); - if (base != NULL_RTX - && ((REG_P (tmp2) && REG_POINTER (tmp2)) - || known_base_value_p (base))) - return base; + rtx base1 = find_base_term (tmp1, visited_vals); + if (!(base1 != NULL_RTX + && ((REG_P (tmp1) && REG_POINTER (tmp1)) + || known_base_value_p (base1)))) + base1 = NULL_RTX; + rtx base2 = find_base_term (tmp2, visited_vals); + if (!(base2 != NULL_RTX + && ((REG_P (tmp2) && REG_POINTER (tmp2)) + || known_base_value_p (base2)))) + base2 = NULL_RTX; + if (base1 && !base2) + return base1; + if (!base1 && base2) + return base2; /* We could not determine which of the two operands was the base register and which was the index. So we can determine
(In reply to Richard Biener from comment #3) > I think the issue is > > inv_expr 5: ((signed long) &d + 19600) - (signed long) &c > > I hope Bin can investigate this. In the end we can only fight the RTL > alias analysis issue with heuristics here though... > > We're fearing compile-time issues from a fix like the following, but > ultimatively find_base_term is known broken for a _looooong_ time. Note the patch is only extra heuristics since a correct fix would be conservatively determining whether any argument contains a base value. See PR49330 for all the glory details. > Index: gcc/alias.c > =================================================================== > --- gcc/alias.c (revision 273355) > +++ gcc/alias.c (working copy) > @@ -2003,16 +2003,20 @@ find_base_term (rtx x, vec<std::pair<cse > term is from a pointer or is a named object or a special address > (like an argument or stack reference), then use it for the > base term. */ > - rtx base = find_base_term (tmp1, visited_vals); > - if (base != NULL_RTX > - && ((REG_P (tmp1) && REG_POINTER (tmp1)) > - || known_base_value_p (base))) > - return base; > - base = find_base_term (tmp2, visited_vals); > - if (base != NULL_RTX > - && ((REG_P (tmp2) && REG_POINTER (tmp2)) > - || known_base_value_p (base))) > - return base; > + rtx base1 = find_base_term (tmp1, visited_vals); > + if (!(base1 != NULL_RTX > + && ((REG_P (tmp1) && REG_POINTER (tmp1)) > + || known_base_value_p (base1)))) > + base1 = NULL_RTX; > + rtx base2 = find_base_term (tmp2, visited_vals); > + if (!(base2 != NULL_RTX > + && ((REG_P (tmp2) && REG_POINTER (tmp2)) > + || known_base_value_p (base2)))) > + base2 = NULL_RTX; > + if (base1 && !base2) > + return base1; > + if (!base1 && base2) > + return base2; > > /* We could not determine which of the two operands was the > base register and which was the index. So we can determine
Will try to find some time this WE, sorry for delaying.
(In reply to Richard Biener from comment #2) > > and I can very well imagine we're getting confused by find_base_term > logic here. > > There's logic in IVOPTs to not generate IVs based on two different > objects but somehow it doesn't trigger here. Hmm, it's because determine_base_object failed to identify the `base_object` for IV because it has non-pointer type: IV struct: SSA_NAME: _32 Type: unsigned long Base: (unsigned long) &d + 19600 Step: 4 Biv: N Overflowness wrto loop niter: Overflow And we have short-circuit in determine_base_object: static tree determine_base_object (tree expr) { enum tree_code code = TREE_CODE (expr); tree base, obj; /* If this is a pointer casted to any type, we need to determine the base object for the pointer; so handle conversions before throwing away non-pointer expressions. */ if (CONVERT_EXPR_P (expr)) return determine_base_object (TREE_OPERAND (expr, 0)); if (!POINTER_TYPE_P (TREE_TYPE (expr))) return NULL_TREE; The IV is generated from inner loop ivopts as we rewrite using unsigned type. Any suggestion how to fix this?
On Mon, 15 Jul 2019, amker at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91137 > > --- Comment #6 from bin cheng <amker at gcc dot gnu.org> --- > (In reply to Richard Biener from comment #2) > > > > and I can very well imagine we're getting confused by find_base_term > > logic here. > > > > There's logic in IVOPTs to not generate IVs based on two different > > objects but somehow it doesn't trigger here. > > Hmm, it's because determine_base_object failed to identify the `base_object` > for IV because it has non-pointer type: > IV struct: > SSA_NAME: _32 > Type: unsigned long > Base: (unsigned long) &d + 19600 > Step: 4 > Biv: N > Overflowness wrto loop niter: Overflow > > And we have short-circuit in determine_base_object: > > static tree > determine_base_object (tree expr) > { > enum tree_code code = TREE_CODE (expr); > tree base, obj; > > /* If this is a pointer casted to any type, we need to determine > the base object for the pointer; so handle conversions before > throwing away non-pointer expressions. */ > if (CONVERT_EXPR_P (expr)) > return determine_base_object (TREE_OPERAND (expr, 0)); > > if (!POINTER_TYPE_P (TREE_TYPE (expr))) > return NULL_TREE; > > The IV is generated from inner loop ivopts as we rewrite using unsigned type. > > Any suggestion how to fix this? I think we need to elide this check and make the following code more powerful which includes actually handling PLUS/MINUS_EXPR. There's also ptr + (unsigned)&ptr to be considered for the POINTER_PLUS_EXPR case - thus we cannot simply only search the pointer chain. Which then leads us into exponential behavior if this is asked on SCEV results which may have tree sharing, thus we need some 'visited' machinery. In the end I think we should re-do it like I re-did contains_abnormal_ssa_name_p, use walk_tree_without_duplicates. Btw, what should happen if the walk discovers two bases are used in the expression?
(In reply to rguenther@suse.de from comment #7) > On Mon, 15 Jul 2019, amker at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91137 > > > > --- Comment #6 from bin cheng <amker at gcc dot gnu.org> --- > > (In reply to Richard Biener from comment #2) > > > > > > and I can very well imagine we're getting confused by find_base_term > > > logic here. > > > > > > There's logic in IVOPTs to not generate IVs based on two different > > > objects but somehow it doesn't trigger here. > > > > Hmm, it's because determine_base_object failed to identify the `base_object` > > for IV because it has non-pointer type: > > IV struct: > > SSA_NAME: _32 > > Type: unsigned long > > Base: (unsigned long) &d + 19600 > > Step: 4 > > Biv: N > > Overflowness wrto loop niter: Overflow > > > > And we have short-circuit in determine_base_object: > > > > static tree > > determine_base_object (tree expr) > > { > > enum tree_code code = TREE_CODE (expr); > > tree base, obj; > > > > /* If this is a pointer casted to any type, we need to determine > > the base object for the pointer; so handle conversions before > > throwing away non-pointer expressions. */ > > if (CONVERT_EXPR_P (expr)) > > return determine_base_object (TREE_OPERAND (expr, 0)); > > > > if (!POINTER_TYPE_P (TREE_TYPE (expr))) > > return NULL_TREE; > > > > The IV is generated from inner loop ivopts as we rewrite using unsigned type. > > > > Any suggestion how to fix this? > > I think we need to elide this check and make the following code > more powerful which includes actually handling PLUS/MINUS_EXPR. > There's also ptr + (unsigned)&ptr to be considered for the > POINTER_PLUS_EXPR case - thus we cannot simply only search the > pointer chain. > > Which then leads us into exponential behavior if this is asked > on SCEV results which may have tree sharing, thus we need some > 'visited' machinery. In the end I think we should re-do Will work on a patch. One thing I am unclear about is the ptr + (unsigned)&ptr stuff, when would we have this? Could you provide an example please? > it like I re-did contains_abnormal_ssa_name_p, use > walk_tree_without_duplicates. Btw, what should happen if the > walk discovers two bases are used in the expression? I guess it depends on why there are multiple bases in the first place?
On Mon, 15 Jul 2019, amker at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91137 > > --- Comment #8 from bin cheng <amker at gcc dot gnu.org> --- > (In reply to rguenther@suse.de from comment #7) > > On Mon, 15 Jul 2019, amker at gcc dot gnu.org wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91137 > > > > > > --- Comment #6 from bin cheng <amker at gcc dot gnu.org> --- > > > (In reply to Richard Biener from comment #2) > > > > > > > > and I can very well imagine we're getting confused by find_base_term > > > > logic here. > > > > > > > > There's logic in IVOPTs to not generate IVs based on two different > > > > objects but somehow it doesn't trigger here. > > > > > > Hmm, it's because determine_base_object failed to identify the `base_object` > > > for IV because it has non-pointer type: > > > IV struct: > > > SSA_NAME: _32 > > > Type: unsigned long > > > Base: (unsigned long) &d + 19600 > > > Step: 4 > > > Biv: N > > > Overflowness wrto loop niter: Overflow > > > > > > And we have short-circuit in determine_base_object: > > > > > > static tree > > > determine_base_object (tree expr) > > > { > > > enum tree_code code = TREE_CODE (expr); > > > tree base, obj; > > > > > > /* If this is a pointer casted to any type, we need to determine > > > the base object for the pointer; so handle conversions before > > > throwing away non-pointer expressions. */ > > > if (CONVERT_EXPR_P (expr)) > > > return determine_base_object (TREE_OPERAND (expr, 0)); > > > > > > if (!POINTER_TYPE_P (TREE_TYPE (expr))) > > > return NULL_TREE; > > > > > > The IV is generated from inner loop ivopts as we rewrite using unsigned type. > > > > > > Any suggestion how to fix this? > > > > I think we need to elide this check and make the following code > > more powerful which includes actually handling PLUS/MINUS_EXPR. > > There's also ptr + (unsigned)&ptr to be considered for the > > POINTER_PLUS_EXPR case - thus we cannot simply only search the > > pointer chain. > > > > Which then leads us into exponential behavior if this is asked > > on SCEV results which may have tree sharing, thus we need some > > 'visited' machinery. In the end I think we should re-do > Will work on a patch. One thing I am unclear about is the ptr + (unsigned)&ptr > stuff, when would we have this? Could you provide an example please? Can't provide a complete example but maybe sth like int a,b; void foo (int *p, int *q) { p += (uintptr_t)&a - (uintptr_t)&b; for (; p != q; ++p) *p = 1; } > > it like I re-did contains_abnormal_ssa_name_p, use > > walk_tree_without_duplicates. Btw, what should happen if the > > walk discovers two bases are used in the expression? > I guess it depends on why there are multiple bases in the first place? See above, in reality there's only one base but later on RTL we lose the distinction between pointers and integers and thus are easily fooled.
Author: amker Date: Thu Jul 18 08:38:09 2019 New Revision: 273570 URL: https://gcc.gnu.org/viewcvs?rev=273570&root=gcc&view=rev Log: PR tree-optimization/91137 * tree-ssa-loop-ivopts.c (struct ivopts_data): New field. (tree_ssa_iv_optimize_init, alloc_iv, tree_ssa_iv_optimize_finalize): Init, use and fini the above new field. (determine_base_object_1): New function. (determine_base_object): Reimplement using walk_tree. gcc/testsuite PR tree-optimization/91137 * gcc.c-torture/execute/pr91137.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr91137.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-loop-ivopts.c
Hi, suppose this patch should be backported to 8/7 if no further issues.
(In reply to bin cheng from comment #11) > Hi, suppose this patch should be backported to 8/7 if no further issues. Yes, please do the backport to GCC 9 reasonably quickly and 8/7 later if the GCC 9 one doesn't show any issues.
Author: amker Date: Wed Jul 24 01:28:33 2019 New Revision: 273754 URL: https://gcc.gnu.org/viewcvs?rev=273754&root=gcc&view=rev Log: Backport from mainline 2019-07-18 Bin Cheng <bin.linux@linux.alibaba.com> PR tree-optimization/91137 * tree-ssa-loop-ivopts.c (struct ivopts_data): New field. (tree_ssa_iv_optimize_init, alloc_iv, tree_ssa_iv_optimize_finalize): Init, use and fini the above new field. (determine_base_object_1): New function. (determine_base_object): Reimplement using walk_tree. gcc/testsuite 2019-07-18 Bin Cheng <bin.linux@linux.alibaba.com> PR tree-optimization/91137 * gcc.c-torture/execute/pr91137.c: New test. Added: branches/gcc-9-branch/gcc/testsuite/gcc.c-torture/execute/pr91137.c Modified: branches/gcc-9-branch/gcc/ChangeLog branches/gcc-9-branch/gcc/testsuite/ChangeLog branches/gcc-9-branch/gcc/tree-ssa-loop-ivopts.c
Author: amker Date: Fri Aug 30 11:02:48 2019 New Revision: 275064 URL: https://gcc.gnu.org/viewcvs?rev=275064&root=gcc&view=rev Log: Backport from mainline 2019-07-18 Bin Cheng <bin.linux@linux.alibaba.com> PR tree-optimization/91137 * tree-ssa-loop-ivopts.c (struct ivopts_data): New field. (tree_ssa_iv_optimize_init, alloc_iv, tree_ssa_iv_optimize_finalize): Init, use and fini the above new field. (determine_base_object_1): New function. (determine_base_object): Reimplement using walk_tree. 2019-07-18 Bin Cheng <bin.linux@linux.alibaba.com> PR tree-optimization/91137 * gcc.c-torture/execute/pr91137.c: New test. Added: branches/gcc-8-branch/gcc/testsuite/gcc.c-torture/execute/pr91137.c Modified: branches/gcc-8-branch/gcc/ChangeLog branches/gcc-8-branch/gcc/testsuite/ChangeLog branches/gcc-8-branch/gcc/tree-ssa-loop-ivopts.c
Author: amker Date: Mon Sep 2 10:10:44 2019 New Revision: 275304 URL: https://gcc.gnu.org/viewcvs?rev=275304&root=gcc&view=rev Log: Backport from mainline 2019-07-18 Bin Cheng <bin.linux@linux.alibaba.com> PR tree-optimization/91137 * tree-ssa-loop-ivopts.c (struct ivopts_data): New field. (tree_ssa_iv_optimize_init, alloc_iv, tree_ssa_iv_optimize_finalize): Init, use and fini the above new field. (determine_base_object_1): New function. (determine_base_object): Reimplement using walk_tree. 2019-07-18 Bin Cheng <bin.linux@linux.alibaba.com> PR tree-optimization/91137 * gcc.c-torture/execute/pr91137.c: New test. Added: branches/gcc-7-branch/gcc/testsuite/gcc.c-torture/execute/pr91137.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/testsuite/ChangeLog branches/gcc-7-branch/gcc/tree-ssa-loop-ivopts.c
Fixed.