Testcase: (compiled with -O2 at least) int f(int a, int b) { if (a > 0x7FFFFFF0) return 0; if (b > 0x7FFFFFF0) return 0; int c = (a - 20) + (b - 20); return c > 0x7FFFFFF0; } GCC 4.1.2 and 4.3.0 (snapshot from 2006-12-17) optimizes the whole function to a single "return 0;". This would be correct if the function was actually written with "c = a + b - 40" under a non-overflow assumption. GCC could indeed deduce that c is no bigger than 0x7FFFFFFF - 40. But as the function was originally written, this property does not hold any longer. For example, a = 0x7FFFFFF0 and b = 41 will not cause any overflow during computations, and the last conditional shall hence evaluate to true. The problem is that GCC performs VRP with C language semantic (undefined behavior on overflow) on code that is no longer the input as written by the user; so this semantic is not valid at that point. The user input should not have undergone a transformation based on associativity. Tested with Debian packages. GCC 3.3.6, 3.4.6, and 4.0.4 generate correct code. GCC 4.1.2 and 4.3.0 generates wrong code. As the expression "a + b - 40" is generated early, I suppose any GCC with VRP would produce wrong code for this testcase.
Is it really a middle-end issue? It could also be seen as a front-end issue, as it does produce "a + b - 40", doesn't it? If the front-end had given "(a - 20) + (b - 20)" to the middle-end, then the correct ranges would have been computed. When adjusting the testcase in such a way, GCC does generate correct code: int f(int a, int b) { if (a > 0x7FFFFFF0) return 0; if (b > 0x7FFFFFF0) return 0; int d = a - 20; int e = b - 20; int c = d + e; return c > 0x7FFFFFF0; }
(In reply to comment #1) > Is it really a middle-end issue? Yes because fold is part of the middle-end that is most likely causing this being translated into "a + b - 40". And this is the reason why the way wrote it the second way works is because fold does not see the full expression so it does not do the folding.
This most likely comes from the " associate:" case in fold_binary where we reassociate the PLUS_EXPR and MINUS_EXPR and then fold the -20 + -20 into -40.
Just for the sake of completeness. Wrong code is also generated when addition and multiplication are mixed, because of distributivity: int f(int a) { if (a > 0x7FFFFFF0) return 0; int b = (a - 20) * 2; return b > 0x7FFFFFF0; }
Do the parenthesis matter in C? They do matter in say Fortran, but in C I think (a - 20) + (b - 20) can be evaluated as (a + b) + (-20 + -20) or a - 20 - 20 + b etc.
(In reply to comment #5) > Do the parenthesis matter in C? They do matter in say Fortran, but in C I > think > (a - 20) + (b - 20) can be evaluated as (a + b) + (-20 + -20) or a - 20 - 20 + > b > etc. In fact, a-20 + c will be valid even if a+c overflows but (a-20) + c does not. There are comments in the standard somewhere which makes a mention of this.
From C99, 5.1.2.3/14: 14 EXAMPLE 6 To illustrate the grouping behavior of expressions, in the following fragment int a, b; /* ... */ a = a + 32760 + b + 5; the expression statement behaves exactly the same as a = (((a + 32760) + b) + 5); due to the associativity and precedence of these operators. Thus, the result of the sum (a + 32760) is next added to b, and that result is then added to 5 which results in the value assigned to a. On a machine in which overflows produce an explicit trap and in which the range of values representable by an int is [−32768, +32767], the implementation cannot rewrite this expression as a = ((a + b) + 32765); since if the values for a and b were, respectively, −32754 and −15, the sum a + b would produce a trap while the original expression would not; nor can the expression be rewritten either as a = ((a + 32765) + b); or a = (a + (b + 32765)); since the values for a and b might have been, respectively, 4 and −8 or −17 and 12. However, on a machine in which overflow silently generates some value and where positive and negative overflows cancel, the above expression statement can be rewritten by the implementation in any of the above ways because the same result will occur. That is most explict thing about overflow and groupping. In C, every expression has an implicate parenthesises.
Subject: Re: [4.1/4.2/4.3 Regression] Wrong variable ranges due to constant folding "jakub at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes: | Do the parenthesis matter in C? Yes, see paragraphes 11 dowward in the section 5.1.2.3. | They do matter in say Fortran, but in C I | think | (a - 20) + (b - 20) can be evaluated as (a + b) + (-20 + -20) or a - 20 - 20 + | b while that is permitted by the fortran standard, it is my understanding that *existing* practice has fortran compilers refrain from messing with parenthesis. However, this has become a middle-end issue; so I guess I must voice the concern for languages where unconditional re-association is not permitted. The C and C++ languages are example of those. -- Gaby
Subject: Re: [4.1/4.2/4.3 Regression] Wrong variable ranges due to constant folding "pinskia at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes: [...] | That is most explict thing about overflow and groupping. In C, every | expression has an implicate parenthesises. Yes. Note. however, that if people don't insist on signed integer arithmetic overflow being undefined, but intead define it as wrapping, they can reassociate and as they wish and the result won't change. -- Gaby
Mine.
Subject: Bug 30364 Author: rguenth Date: Wed Feb 28 21:56:41 2007 New Revision: 122414 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122414 Log: 2007-02-28 Richard Guenther <rguenther@suse.de> PR middle-end/30364 * fold-const.c (fold_binary): Do not associate expressions with more than one variable for integer types that do not wrap. * gcc.dg/torture/pr30364-1.c: New testcase. * gcc.dg/torture/pr30364-2.c: Likewise. * gcc.dg/torture/pr30364-3.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/torture/pr30364-1.c trunk/gcc/testsuite/gcc.dg/torture/pr30364-2.c trunk/gcc/testsuite/gcc.dg/torture/pr30364-3.c Modified: trunk/gcc/ChangeLog trunk/gcc/fold-const.c trunk/gcc/testsuite/ChangeLog
Fixed on the mainline.
Subject: Bug 30364 Author: rguenth Date: Mon Mar 5 13:15:25 2007 New Revision: 122548 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122548 Log: 2007-03-05 Richard Guenther <rguenther@suse.de> Backport from mainline: 2007-02-28 Richard Guenther <rguenther@suse.de> PR middle-end/30364 * fold-const.c (fold_binary): Do not associate expressions with more than one variable for integer types that do not wrap. * gcc.dg/torture/pr30364-1.c: New testcase. * gcc.dg/torture/pr30364-2.c: Likewise. * gcc.dg/torture/pr30364-3.c: Likewise. Added: branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-1.c - copied unchanged from r122414, trunk/gcc/testsuite/gcc.dg/torture/pr30364-1.c branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-2.c - copied unchanged from r122414, trunk/gcc/testsuite/gcc.dg/torture/pr30364-2.c branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-3.c - copied unchanged from r122414, trunk/gcc/testsuite/gcc.dg/torture/pr30364-3.c Modified: branches/gcc-4_2-branch/gcc/ChangeLog branches/gcc-4_2-branch/gcc/fold-const.c branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
And the 4.2 branch.
Subject: Bug 30364 Author: rguenth Date: Thu Mar 15 18:09:26 2007 New Revision: 122956 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122956 Log: 2007-03-15 Richard Guenther <rguenther@suse.de> Backport from mainline 2007-02-28 Richard Guenther <rguenther@suse.de> PR middle-end/30364 * fold-const.c (fold_binary): Do not associate expressions with more than one variable for integer types that do not wrap. * gcc.dg/torture/pr30364-1.c: New testcase. * gcc.dg/torture/pr30364-2.c: Likewise. * gcc.dg/torture/pr30364-3.c: Likewise. Added: branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/torture/pr30364-1.c - copied unchanged from r122548, branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-1.c branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/torture/pr30364-2.c - copied unchanged from r122548, branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-2.c branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/torture/pr30364-3.c - copied unchanged from r122548, branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-3.c Modified: branches/gcc-4_1-branch/gcc/ChangeLog branches/gcc-4_1-branch/gcc/fold-const.c branches/gcc-4_1-branch/gcc/testsuite/ChangeLog
Fixed.
Does this make sense to do this for POINTER_TYPE_P in 4.1/4.2, which reassociate in many other places? I belive in 4.3 that changed with POINTER_PLUS_EXPR stuff. struct A { int i; int v[10000]; }; int *foo (struct A *a, int x) { return &a->v[x - 2147483646]; } in 4.2/4.1 gives: foo (a, x) { <bb 2>: return &a->v[0] - 4294967288B + (int *) ((unsigned int) x * 4); } so it will overflow (well, underflow). E.g. on Linux kernel code this commit causes some stack size usage regressions.
Maybe - I was just conservative here to not possibly introduce new overflow with the re-association as elsewhere we may rely on the undefinedness of overflow. Another way to fix this particular regression is to disable the decomposing of &a->v[x - 2147483646] to pointer arithmetic in the C frontend here: /* For &x[y], return x+y */ if (TREE_CODE (arg) == ARRAY_REF) { tree op0 = TREE_OPERAND (arg, 0); if (!c_mark_addressable (op0)) return error_mark_node; return build_binary_op (PLUS_EXPR, (TREE_CODE (TREE_TYPE (op0)) == ARRAY_TYPE ? array_to_pointer_conversion (op0) : op0), TREE_OPERAND (arg, 1), 1); } like I tried a couple of times.