Bug 30364 - [4.1 Regression] Wrong variable ranges due to constant folding
Summary: [4.1 Regression] Wrong variable ranges due to constant folding
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.1.2
: P1 blocker
Target Milestone: 4.1.3
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2007-01-04 09:16 UTC by Guillaume Melquiond
Modified: 2007-08-24 16:40 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.1.2
Last reconfirmed: 2007-02-21 14:24:36


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Guillaume Melquiond 2007-01-04 09:16:35 UTC
Testcase: (compiled with -O2 at least)

int f(int a, int b)
{
  if (a > 0x7FFFFFF0) return 0;
  if (b > 0x7FFFFFF0) return 0;

  int c = (a - 20) + (b - 20);
  return c > 0x7FFFFFF0;
}

GCC 4.1.2 and 4.3.0 (snapshot from 2006-12-17) optimizes the whole function to a single "return 0;". This would be correct if the function was actually written with "c = a + b - 40" under a non-overflow assumption. GCC could indeed deduce that c is no bigger than 0x7FFFFFFF - 40.

But as the function was originally written, this property does not hold any longer. For example, a = 0x7FFFFFF0 and b = 41 will not cause any overflow during computations, and the last conditional shall hence evaluate to true.

The problem is that GCC performs VRP with C language semantic (undefined behavior on overflow) on code that is no longer the input as written by the user; so this semantic is not valid at that point. The user input should not have undergone a transformation based on associativity.

Tested with Debian packages. GCC 3.3.6, 3.4.6, and 4.0.4 generate correct code. GCC 4.1.2 and 4.3.0 generates wrong code. As the expression "a + b - 40" is generated early, I suppose any GCC with VRP would produce wrong code for this testcase.
Comment 1 Guillaume Melquiond 2007-01-04 10:15:15 UTC
Is it really a middle-end issue?

It could also be seen as a front-end issue, as it does produce "a + b - 40", doesn't it? If the front-end had given "(a - 20) + (b - 20)" to the middle-end, then the correct ranges would have been computed. When adjusting the testcase in such a way, GCC does generate correct code:

int f(int a, int b)
{
  if (a > 0x7FFFFFF0) return 0;
  if (b > 0x7FFFFFF0) return 0;

  int d = a - 20;
  int e = b - 20;
  int c = d + e;
  return c > 0x7FFFFFF0;
}
Comment 2 Andrew Pinski 2007-01-04 10:23:19 UTC
(In reply to comment #1)
> Is it really a middle-end issue?
Yes because fold is part of the middle-end that is most likely causing this being translated into "a + b - 40".

And this is the reason why the way wrote it the second way works is because fold does not see the full expression so it does not do the folding.
Comment 3 Andrew Pinski 2007-01-04 10:26:23 UTC
This most likely comes from the "    associate:" case in fold_binary where we reassociate the PLUS_EXPR and MINUS_EXPR and then fold the -20 + -20 into -40.
Comment 4 Guillaume Melquiond 2007-01-04 11:25:32 UTC
Just for the sake of completeness. Wrong code is also generated when addition and multiplication are mixed, because of distributivity:

int f(int a)
{
  if (a > 0x7FFFFFF0) return 0;
  int b = (a - 20) * 2;
  return b > 0x7FFFFFF0;
}
Comment 5 Jakub Jelinek 2007-01-05 19:06:22 UTC
Do the parenthesis matter in C?  They do matter in say Fortran, but in C I think
(a - 20) + (b - 20) can be evaluated as (a + b) + (-20 + -20) or a - 20 - 20 + b
etc.
Comment 6 Andrew Pinski 2007-01-05 19:58:35 UTC
(In reply to comment #5)
> Do the parenthesis matter in C?  They do matter in say Fortran, but in C I
> think
> (a - 20) + (b - 20) can be evaluated as (a + b) + (-20 + -20) or a - 20 - 20 +
> b
> etc.

In fact, a-20 + c will be valid even if a+c overflows but (a-20) + c does not.  There are comments in the standard somewhere which makes a mention of this. 
Comment 7 Andrew Pinski 2007-01-05 20:32:37 UTC
From C99, 5.1.2.3/14:
14 EXAMPLE 6 To illustrate the grouping behavior of expressions, in the following fragment
int a, b;
/* ... */
a = a + 32760 + b + 5;
the expression statement behaves exactly the same as
a = (((a + 32760) + b) + 5);
due to the associativity and precedence of these operators. Thus, the result of the sum (a + 32760) is
next added to b, and that result is then added to 5 which results in the value assigned to a. On a machine in
which overflows produce an explicit trap and in which the range of values representable by an int is
[−32768, +32767], the implementation cannot rewrite this expression as
a = ((a + b) + 32765);
since if the values for a and b were, respectively, −32754 and −15, the sum a + b would produce a trap
while the original expression would not; nor can the expression be rewritten either as
a = ((a + 32765) + b);
or
a = (a + (b + 32765));
since the values for a and b might have been, respectively, 4 and −8 or −17 and 12. However, on a machine
in which overflow silently generates some value and where positive and negative overflows cancel, the
above expression statement can be rewritten by the implementation in any of the above ways because the
same result will occur.

That is most explict thing about overflow and groupping.  In C, every expression has an implicate parenthesises.
Comment 8 Gabriel Dos Reis 2007-01-05 21:06:42 UTC
Subject: Re:  [4.1/4.2/4.3 Regression] Wrong variable ranges due to constant folding

"jakub at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:

| Do the parenthesis matter in C?

Yes, see paragraphes 11 dowward in the section 5.1.2.3.

|  They do matter in say Fortran, but in C I
| think
| (a - 20) + (b - 20) can be evaluated as (a + b) + (-20 + -20) or a - 20 - 20 +
| b

while that is permitted by the fortran standard, it is my understanding
that *existing* practice has fortran compilers refrain from messing
with parenthesis.

However, this has become a middle-end issue; so I guess I must voice
the concern for languages where unconditional re-association is not
permitted.  The C and C++ languages are example of those.


-- Gaby
Comment 9 Gabriel Dos Reis 2007-01-05 21:09:11 UTC
Subject: Re:  [4.1/4.2/4.3 Regression] Wrong variable ranges due to constant folding

"pinskia at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:


[...]

| That is most explict thing about overflow and groupping.  In C, every
| expression has an implicate parenthesises.

Yes.

Note. however, that if people don't insist on signed integer
arithmetic overflow being undefined, but intead define it as wrapping,
they can reassociate and as they wish and the result won't change.

-- Gaby
Comment 10 Richard Biener 2007-02-21 14:24:36 UTC
Mine.
Comment 11 Richard Biener 2007-02-28 21:56:53 UTC
Subject: Bug 30364

Author: rguenth
Date: Wed Feb 28 21:56:41 2007
New Revision: 122414

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122414
Log:
2007-02-28  Richard Guenther  <rguenther@suse.de>

	PR middle-end/30364
	* fold-const.c (fold_binary): Do not associate expressions
	with more than one variable for integer types that do not wrap.

	* gcc.dg/torture/pr30364-1.c: New testcase.
	* gcc.dg/torture/pr30364-2.c: Likewise.
	* gcc.dg/torture/pr30364-3.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr30364-1.c
    trunk/gcc/testsuite/gcc.dg/torture/pr30364-2.c
    trunk/gcc/testsuite/gcc.dg/torture/pr30364-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/testsuite/ChangeLog

Comment 12 Richard Biener 2007-02-28 23:03:43 UTC
Fixed on the mainline.
Comment 13 Richard Biener 2007-03-05 13:15:43 UTC
Subject: Bug 30364

Author: rguenth
Date: Mon Mar  5 13:15:25 2007
New Revision: 122548

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122548
Log:
2007-03-05  Richard Guenther  <rguenther@suse.de>

	Backport from mainline:
	2007-02-28  Richard Guenther  <rguenther@suse.de>

	PR middle-end/30364
	* fold-const.c (fold_binary): Do not associate expressions
	with more than one variable for integer types that do not wrap.

	* gcc.dg/torture/pr30364-1.c: New testcase.
	* gcc.dg/torture/pr30364-2.c: Likewise.
	* gcc.dg/torture/pr30364-3.c: Likewise.

Added:
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-1.c
      - copied unchanged from r122414, trunk/gcc/testsuite/gcc.dg/torture/pr30364-1.c
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-2.c
      - copied unchanged from r122414, trunk/gcc/testsuite/gcc.dg/torture/pr30364-2.c
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-3.c
      - copied unchanged from r122414, trunk/gcc/testsuite/gcc.dg/torture/pr30364-3.c
Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/fold-const.c
    branches/gcc-4_2-branch/gcc/testsuite/ChangeLog

Comment 14 Richard Biener 2007-03-05 13:16:57 UTC
And the 4.2 branch.
Comment 15 Richard Biener 2007-03-15 18:09:44 UTC
Subject: Bug 30364

Author: rguenth
Date: Thu Mar 15 18:09:26 2007
New Revision: 122956

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122956
Log:
2007-03-15  Richard Guenther  <rguenther@suse.de>

        Backport from mainline
        2007-02-28  Richard Guenther  <rguenther@suse.de>

	PR middle-end/30364
	* fold-const.c (fold_binary): Do not associate expressions
	with more than one variable for integer types that do not wrap.

	* gcc.dg/torture/pr30364-1.c: New testcase.
	* gcc.dg/torture/pr30364-2.c: Likewise.
	* gcc.dg/torture/pr30364-3.c: Likewise.

Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/torture/pr30364-1.c
      - copied unchanged from r122548, branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-1.c
    branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/torture/pr30364-2.c
      - copied unchanged from r122548, branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-2.c
    branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/torture/pr30364-3.c
      - copied unchanged from r122548, branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-3.c
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/fold-const.c
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog

Comment 16 Richard Biener 2007-03-15 18:10:03 UTC
Fixed.
Comment 17 Jakub Jelinek 2007-08-24 16:33:06 UTC
Does this make sense to do this for POINTER_TYPE_P in 4.1/4.2, which reassociate
in many other places?
I belive in 4.3 that changed with POINTER_PLUS_EXPR stuff.

struct A { int i; int v[10000]; };
int *foo (struct A *a, int x)
{
  return &a->v[x - 2147483646];
}

in 4.2/4.1 gives:

foo (a, x)
{
<bb 2>:
  return &a->v[0] - 4294967288B + (int *) ((unsigned int) x * 4);

}

so it will overflow (well, underflow).

E.g. on Linux kernel code this commit causes some stack size usage regressions.
Comment 18 Richard Biener 2007-08-24 16:40:39 UTC
Maybe - I was just conservative here to not possibly introduce new overflow
with the re-association as elsewhere we may rely on the undefinedness of overflow.

Another way to fix this particular regression is to disable the decomposing
of &a->v[x - 2147483646] to pointer arithmetic in the C frontend here:

      /* For &x[y], return x+y */
      if (TREE_CODE (arg) == ARRAY_REF)
        {
          tree op0 = TREE_OPERAND (arg, 0);
          if (!c_mark_addressable (op0))
            return error_mark_node;
          return build_binary_op (PLUS_EXPR,
                                  (TREE_CODE (TREE_TYPE (op0)) == ARRAY_TYPE
                                   ? array_to_pointer_conversion (op0)
                                   : op0),
                                  TREE_OPERAND (arg, 1), 1);
        }

like I tried a couple of times.