Bug 49419 - [4.6/4.7 regression] gcc -O2 miscompiles gp2c
Summary: [4.6/4.7 regression] gcc -O2 miscompiles gp2c
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: 4.6.1
Assignee: Not yet assigned to anyone
Keywords: wrong-code
Depends on:
Reported: 2011-06-15 11:11 UTC by Matthias Klose
Modified: 2011-06-16 18:18 UTC (History)
2 users (show)

See Also:
Known to work: 4.4.6, 4.5.3
Known to fail: 4.6.0, 4.7.0
Last reconfirmed: 2011-06-15 16:02:59

gcc47-vrp.patch (682 bytes, patch)
2011-06-16 18:18 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Matthias Klose 2011-06-15 11:11:09 UTC
[forwarded from http://bugs.debian.org/630567]

fails with -O2/-O3, works with -O1/-Os with 20110611 4.6 branch, 20110531 trunk, apparently on {powerpc,mips,mipsel,s390}-linux.

gcc-4.6 miscompiles gp2c on several platforms including powerpc, see 

On powerpc (32 bit):
The function 'listostack' in src/utils.c is miscompiled with -O2.
Please find a testcase.

gcc-4.6 -O2 -Wall gccbug.c -o gccbug && ./gccbug
zsh: segmentation fault  ./gccbug
gcc-4.6 -O1 -Wall gccbug.c -o gccbug && ./gccbug
0: 1
1: 2

#include <stdio.h>
#include <stdlib.h>

typedef struct
  enum {LEAF=0, NODE=1} f;            /*node function*/
  int x;                              /*node left child*/
  int y;                              /*node right child*/
} node; 

node *tree;

int listtostack(int n, int f, int *stack, int nbmax)
  int x,i,nb;
  if (n==-1) return 0;
  for(x=n,i=0;tree[x].f==f && i<nbmax;x=tree[x].x,i++);
  if (i==nbmax)
  return nb;

#define STACKSZ 3

int main(void)
  int stack[STACKSZ];
  int i,nb;
  tree = (node *)calloc(STACKSZ,sizeof(node));
  tree[0].f = NODE;
  tree[0].x = 1;
  tree[0].y = 2;
    printf("%d: %d\n",i,stack[i]);
  return 0;
Comment 1 Mikael Pettersson 2011-06-15 11:54:27 UTC
Same breakage also seen on m68k-linux.
Comment 2 Jakub Jelinek 2011-06-15 11:55:40 UTC
Confirmed, seems VRP2 removes changes the
loop into endless loop.  Looking into it.
Comment 3 Jakub Jelinek 2011-06-15 12:19:00 UTC
So, i_47 which is the i value on entry of the second loop is determined to be
[0, +INF(OVF)] and something goes wrong afterwards.  The bug will be there, but
I should note that VRP should easily prove that i never overflows.
Simpler testcases for that:
void bar (void);

void foo (int x, int y)
  if (x < y)
      if (x == __INT_MAX__)
        bar ();

void baz (int x, int y, int z)
  if (x < y || x < z)
      if (x == __INT_MAX__)
        bar ();

Here IMNSHO vrp should optimize the call to bar away, as when x is smaller than some variable of the same type, it can't be the maximum of that type, because even if the other number is __INT_MAX__, x must be smaller than that.
Comment 4 Jakub Jelinek 2011-06-15 12:29:51 UTC
Perhaps even for:
void bar (void);
int test (int);

int fn (int x, int y)
  int i;
  for (i = 0; i < y && test (i); i++)
  if (i == y)
  if (i == __INT_MAX__)
    bar ();

we should be able to determine that bar can't be called.  In the loop body
i is known to be i < y, after the loop i <= y but the i == y test rules out the equality again, so we end up again with i < y which is certainly i < __INT_MAX__.
Comment 5 Jakub Jelinek 2011-06-15 14:59:50 UTC
Anyway, the bug is elsewhere, in particular I'd say the bug is that we use the normal SSA_NAME as init value of the second loop instead of the SSA_NAME initialized from ASSERT_EXPR for that.

The interesting stmts are:
  # iD.1254_47 = PHI <[pr49419.c : 15:67] iD.1254_70(6), [pr49419.c : 15:67] iD.1254_69(8), 0(3), 0(21)>
  iD.1254_86 = ASSERT_EXPR <iD.1254_47, iD.1254_47 != nbmaxD.1250_17(D)>;
  iD.1254_85 = ASSERT_EXPR <iD.1254_86, iD.1254_86 > 0>;
  # iD.1254_50 = PHI <[pr49419.c : 19:60] iD.1254_40(18), iD.1254_85(12)>
and the second loop sees i_85 as the initial value.  Now, first the i_47
setter is walked, with just one of the edges with value 0 executable, so i_47
has [0, 0] range there.  Then i_86 is walked, and as var's range is VR_RANGE,
while limit range is VR_ANTI_RANGE, surprisingly symbolic range
~[nbmax_17(D), nbmax_17(D)] wins.  I'd think [0, 0] would be much better to return, both because it is not symbolic, is quite narrow and is range instead of anti range.  After a while, i_50 stmt is walked several times, including using
adjust_range_with_scev, which uses i_47 instead of i_85 as init for some reason.
At that point it sees it has [0, 0] range and so computes something from that.
After a while, i_47 stmt is visited several times, last time resulting into
[0, +INF(OVF)] range (see other comments that it ought to be improved, but uninteresting for this bug). So, the PHI setting i_47 is returning
SSA_PROP_INTERESTING, which results in all succ edges of that bb to be marked for revisiting.  We revisit i_86 stmt, but that again returns the same
(uninteresting) ~[nbmax_17(D), nbmax_17(D)] range, which also means that
it is SSA_PROP_NOT_INTERESTING.  That bb contains just:
  [pr49419.c : 18:6] nbD.1255_24 = iD.1254_86 + 1;
  [pr49419.c : 19:3] if (iD.1254_86 > 0)
after it, but as i_86's VR didn't change, no other VRs in that bb change either,
which means none of the succ bb's of this bb are queued for revisiting.
So, I think we should arrange for the actual current ASSERT_EXPR SSA_NAME to be used as init during scev if possible, instead of the original one, and furthermore I believe extract_range_from_assert_expr should do a better job.
Comment 6 Jakub Jelinek 2011-06-16 07:45:22 UTC
Author: jakub
Date: Thu Jun 16 07:45:17 2011
New Revision: 175092

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175092
	PR tree-optimization/49419
	* tree-vrp.c (execute_vrp): Call init_range_assertions
	before estimate_numbers_of_iterations, call
	free_number_of_iterations_estimates before calling

	* gcc.c-torture/execute/pr49419.c: New test.

Comment 7 Jakub Jelinek 2011-06-16 07:54:46 UTC
Author: jakub
Date: Thu Jun 16 07:54:43 2011
New Revision: 175095

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175095
	PR tree-optimization/49419
	* tree-vrp.c (execute_vrp): Call init_range_assertions
	before estimate_numbers_of_iterations, call
	free_number_of_iterations_estimates before calling

	* gcc.c-torture/execute/pr49419.c: New test.

Comment 8 Jakub Jelinek 2011-06-16 08:17:46 UTC
The bug should be fixed now, though VRP should be cerrtainly improved in various ways, including choosing better vr during extract_range_from_assert, trying to derive numeric ranges from symbolic ranges x - 1 maximums or x + 1 minimums,
and IMHO adjust_range_for_scev could use number_of_latch_executions to see if
the number is simple enough and a smaller iteration count can be derived from
its value ranges or, e.g. if it is the init SSA_NAME itself or that plus/minus a constant, it could determine it doesn't decrease until -INF but only until 0 +- constant.
Comment 9 Jakub Jelinek 2011-06-16 18:18:55 UTC
Created attachment 24548 [details]

So, I've tried this patch for extract_range_from_assert.
It bootstrapped/regtested on x86_64-linux and i686-linux, but regressed
FAIL: g++.dg/tree-ssa/pr18178.C scan-tree-dump-times vrp1 "if " 1
(just the vrp1 dump checking and code removal, the assembly is the same at the end).
The VR is just different (- is vanilla, + is with the patch).
 Value ranges after VRP:
-i_1: [0, +INF(OVF)]
+i_1: [0, +INF]
 a_3(D): VARYING
 D.2084_4: VARYING
 D.2090_7: VARYING
@@ -681,10 +927,10 @@ D.2094_11: VARYING
 i_12: [1, +INF(OVF)]
 D.2077_13: VARYING
-i_19: [-INF, D.2084_4 + -1]  EQUIVALENCES: { i_1 } (1 elements)
-i_20: [0, +INF]  EQUIVALENCES: { i_1 i_19 } (2 elements)
+i_19: [0, +INF]  EQUIVALENCES: { } (0 elements)
+i_20: [0, +INF]  EQUIVALENCES: { i_19 } (1 elements)
 a_21: ~[0B, 0B]  EQUIVALENCES: { a_3(D) } (1 elements)
-D.2084_22: [i_19 + 1, +INF]  EQUIVALENCES: { D.2084_4 } (1 elements)
+D.2084_22: [1, +INF]  EQUIVALENCES: { D.2084_4 } (1 elements)
 D.2077_23: ~[0B, 0B]  EQUIVALENCES: { D.2077_13 } (1 elements)

but resulted in:
 Folding statement: if (i_20 >= D.2084_22)
-Folding predicate i_20 >= D.2084_22 to 0
-Folded into: if (0 != 0)
+Not folded

So, not sure if I should push this anyway (with adjusting/removing the testcase), or not.