Bug 30364 - [4.1 Regression] Wrong variable ranges due to constant folding
: [4.1 Regression] Wrong variable ranges due to constant folding
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: middle-end
: 4.1.2
: P1 blocker
: 4.1.3
Assigned To: Richard Guenther
:
: wrong-code
:
:
  Show dependency treegraph
 
Reported: 2007-01-04 09:16 UTC by Guillaume Melquiond
Modified: 2007-08-24 16:40 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.1.2
Last reconfirmed: 2007-02-21 14:24:36


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Guillaume Melquiond 2007-01-04 09:16:35 UTC
Testcase: (compiled with -O2 at least)

int f(int a, int b)
{
  if (a > 0x7FFFFFF0) return 0;
  if (b > 0x7FFFFFF0) return 0;

  int c = (a - 20) + (b - 20);
  return c > 0x7FFFFFF0;
}

GCC 4.1.2 and 4.3.0 (snapshot from 2006-12-17) optimizes the whole function to
a single "return 0;". This would be correct if the function was actually
written with "c = a + b - 40" under a non-overflow assumption. GCC could indeed
deduce that c is no bigger than 0x7FFFFFFF - 40.

But as the function was originally written, this property does not hold any
longer. For example, a = 0x7FFFFFF0 and b = 41 will not cause any overflow
during computations, and the last conditional shall hence evaluate to true.

The problem is that GCC performs VRP with C language semantic (undefined
behavior on overflow) on code that is no longer the input as written by the
user; so this semantic is not valid at that point. The user input should not
have undergone a transformation based on associativity.

Tested with Debian packages. GCC 3.3.6, 3.4.6, and 4.0.4 generate correct code.
GCC 4.1.2 and 4.3.0 generates wrong code. As the expression "a + b - 40" is
generated early, I suppose any GCC with VRP would produce wrong code for this
testcase.
Comment 1 Guillaume Melquiond 2007-01-04 10:15:15 UTC
Is it really a middle-end issue?

It could also be seen as a front-end issue, as it does produce "a + b - 40",
doesn't it? If the front-end had given "(a - 20) + (b - 20)" to the middle-end,
then the correct ranges would have been computed. When adjusting the testcase
in such a way, GCC does generate correct code:

int f(int a, int b)
{
  if (a > 0x7FFFFFF0) return 0;
  if (b > 0x7FFFFFF0) return 0;

  int d = a - 20;
  int e = b - 20;
  int c = d + e;
  return c > 0x7FFFFFF0;
}
Comment 2 Andrew Pinski 2007-01-04 10:23:19 UTC
(In reply to comment #1)
> Is it really a middle-end issue?
Yes because fold is part of the middle-end that is most likely causing this
being translated into "a + b - 40".

And this is the reason why the way wrote it the second way works is because
fold does not see the full expression so it does not do the folding.
Comment 3 Andrew Pinski 2007-01-04 10:26:23 UTC
This most likely comes from the "    associate:" case in fold_binary where we
reassociate the PLUS_EXPR and MINUS_EXPR and then fold the -20 + -20 into -40.
Comment 4 Guillaume Melquiond 2007-01-04 11:25:32 UTC
Just for the sake of completeness. Wrong code is also generated when addition
and multiplication are mixed, because of distributivity:

int f(int a)
{
  if (a > 0x7FFFFFF0) return 0;
  int b = (a - 20) * 2;
  return b > 0x7FFFFFF0;
}
Comment 5 Jakub Jelinek 2007-01-05 19:06:22 UTC
Do the parenthesis matter in C?  They do matter in say Fortran, but in C I
think
(a - 20) + (b - 20) can be evaluated as (a + b) + (-20 + -20) or a - 20 - 20 +
b
etc.
Comment 6 Andrew Pinski 2007-01-05 19:58:35 UTC
(In reply to comment #5)
> Do the parenthesis matter in C?  They do matter in say Fortran, but in C I
> think
> (a - 20) + (b - 20) can be evaluated as (a + b) + (-20 + -20) or a - 20 - 20 +
> b
> etc.

In fact, a-20 + c will be valid even if a+c overflows but (a-20) + c does not. 
There are comments in the standard somewhere which makes a mention of this. 
Comment 7 Andrew Pinski 2007-01-05 20:32:37 UTC
From C99, 5.1.2.3/14:
14 EXAMPLE 6 To illustrate the grouping behavior of expressions, in the
following fragment
int a, b;
/* ... */
a = a + 32760 + b + 5;
the expression statement behaves exactly the same as
a = (((a + 32760) + b) + 5);
due to the associativity and precedence of these operators. Thus, the result of
the sum (a + 32760) is
next added to b, and that result is then added to 5 which results in the value
assigned to a. On a machine in
which overflows produce an explicit trap and in which the range of values
representable by an int is
[−32768, +32767], the implementation cannot rewrite this expression as
a = ((a + b) + 32765);
since if the values for a and b were, respectively, −32754 and −15,
the sum a + b would produce a trap
while the original expression would not; nor can the expression be rewritten
either as
a = ((a + 32765) + b);
or
a = (a + (b + 32765));
since the values for a and b might have been, respectively, 4 and −8 or
−17 and 12. However, on a machine
in which overflow silently generates some value and where positive and negative
overflows cancel, the
above expression statement can be rewritten by the implementation in any of the
above ways because the
same result will occur.

That is most explict thing about overflow and groupping.  In C, every
expression has an implicate parenthesises.
Comment 8 Gabriel Dos Reis 2007-01-05 21:06:42 UTC
Subject: Re:  [4.1/4.2/4.3 Regression] Wrong variable ranges due to constant
folding

"jakub at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:

| Do the parenthesis matter in C?

Yes, see paragraphes 11 dowward in the section 5.1.2.3.

|  They do matter in say Fortran, but in C I
| think
| (a - 20) + (b - 20) can be evaluated as (a + b) + (-20 + -20) or a - 20 - 20
+
| b

while that is permitted by the fortran standard, it is my understanding
that *existing* practice has fortran compilers refrain from messing
with parenthesis.

However, this has become a middle-end issue; so I guess I must voice
the concern for languages where unconditional re-association is not
permitted.  The C and C++ languages are example of those.


-- Gaby
Comment 9 Gabriel Dos Reis 2007-01-05 21:09:11 UTC
Subject: Re:  [4.1/4.2/4.3 Regression] Wrong variable ranges due to constant
folding

"pinskia at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:


[...]

| That is most explict thing about overflow and groupping.  In C, every
| expression has an implicate parenthesises.

Yes.

Note. however, that if people don't insist on signed integer
arithmetic overflow being undefined, but intead define it as wrapping,
they can reassociate and as they wish and the result won't change.

-- Gaby
Comment 10 Richard Guenther 2007-02-21 14:24:36 UTC
Mine.
Comment 11 Richard Guenther 2007-02-28 21:56:53 UTC
Subject: Bug 30364

Author: rguenth
Date: Wed Feb 28 21:56:41 2007
New Revision: 122414

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122414
Log:
2007-02-28  Richard Guenther  <rguenther@suse.de>

    PR middle-end/30364
    * fold-const.c (fold_binary): Do not associate expressions
    with more than one variable for integer types that do not wrap.

    * gcc.dg/torture/pr30364-1.c: New testcase.
    * gcc.dg/torture/pr30364-2.c: Likewise.
    * gcc.dg/torture/pr30364-3.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr30364-1.c
    trunk/gcc/testsuite/gcc.dg/torture/pr30364-2.c
    trunk/gcc/testsuite/gcc.dg/torture/pr30364-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/testsuite/ChangeLog
Comment 12 Richard Guenther 2007-02-28 23:03:43 UTC
Fixed on the mainline.
Comment 13 Richard Guenther 2007-03-05 13:15:43 UTC
Subject: Bug 30364

Author: rguenth
Date: Mon Mar  5 13:15:25 2007
New Revision: 122548

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122548
Log:
2007-03-05  Richard Guenther  <rguenther@suse.de>

    Backport from mainline:
    2007-02-28  Richard Guenther  <rguenther@suse.de>

    PR middle-end/30364
    * fold-const.c (fold_binary): Do not associate expressions
    with more than one variable for integer types that do not wrap.

    * gcc.dg/torture/pr30364-1.c: New testcase.
    * gcc.dg/torture/pr30364-2.c: Likewise.
    * gcc.dg/torture/pr30364-3.c: Likewise.

Added:
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-1.c
      - copied unchanged from r122414,
trunk/gcc/testsuite/gcc.dg/torture/pr30364-1.c
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-2.c
      - copied unchanged from r122414,
trunk/gcc/testsuite/gcc.dg/torture/pr30364-2.c
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-3.c
      - copied unchanged from r122414,
trunk/gcc/testsuite/gcc.dg/torture/pr30364-3.c
Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/fold-const.c
    branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
Comment 14 Richard Guenther 2007-03-05 13:16:57 UTC
And the 4.2 branch.
Comment 15 Richard Guenther 2007-03-15 18:09:44 UTC
Subject: Bug 30364

Author: rguenth
Date: Thu Mar 15 18:09:26 2007
New Revision: 122956

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122956
Log:
2007-03-15  Richard Guenther  <rguenther@suse.de>

        Backport from mainline
        2007-02-28  Richard Guenther  <rguenther@suse.de>

    PR middle-end/30364
    * fold-const.c (fold_binary): Do not associate expressions
    with more than one variable for integer types that do not wrap.

    * gcc.dg/torture/pr30364-1.c: New testcase.
    * gcc.dg/torture/pr30364-2.c: Likewise.
    * gcc.dg/torture/pr30364-3.c: Likewise.

Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/torture/pr30364-1.c
      - copied unchanged from r122548,
branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-1.c
    branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/torture/pr30364-2.c
      - copied unchanged from r122548,
branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-2.c
    branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/torture/pr30364-3.c
      - copied unchanged from r122548,
branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/torture/pr30364-3.c
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/fold-const.c
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog
Comment 16 Richard Guenther 2007-03-15 18:10:03 UTC
Fixed.
Comment 17 Jakub Jelinek 2007-08-24 16:33:06 UTC
Does this make sense to do this for POINTER_TYPE_P in 4.1/4.2, which
reassociate
in many other places?
I belive in 4.3 that changed with POINTER_PLUS_EXPR stuff.

struct A { int i; int v[10000]; };
int *foo (struct A *a, int x)
{
  return &a->v[x - 2147483646];
}

in 4.2/4.1 gives:

foo (a, x)
{
<bb 2>:
  return &a->v[0] - 4294967288B + (int *) ((unsigned int) x * 4);

}

so it will overflow (well, underflow).

E.g. on Linux kernel code this commit causes some stack size usage regressions.
Comment 18 Richard Guenther 2007-08-24 16:40:39 UTC
Maybe - I was just conservative here to not possibly introduce new overflow
with the re-association as elsewhere we may rely on the undefinedness of
overflow.

Another way to fix this particular regression is to disable the decomposing
of &a->v[x - 2147483646] to pointer arithmetic in the C frontend here:

      /* For &x[y], return x+y */
      if (TREE_CODE (arg) == ARRAY_REF)
        {
          tree op0 = TREE_OPERAND (arg, 0);
          if (!c_mark_addressable (op0))
            return error_mark_node;
          return build_binary_op (PLUS_EXPR,
                                  (TREE_CODE (TREE_TYPE (op0)) == ARRAY_TYPE
                                   ? array_to_pointer_conversion (op0)
                                   : op0),
                                  TREE_OPERAND (arg, 1), 1);
        }

like I tried a couple of times.