Bug 46693 - incorrect code generation with -O2 optimization enabled
Summary: incorrect code generation with -O2 optimization enabled
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.5.2
Importance: P3 normal
Target Milestone: 4.5.3
Assignee: Ramana Radhakrishnan
URL:
Keywords: wrong-code
Duplicates: 46882
Depends on:
Blocks:
 
Reported: 2010-11-28 15:46 UTC by Sergey Matyukevich
Modified: 2012-01-12 20:31 UTC

See Also:
Host: i686-linux
Target: arm-gnueabi
Build: i686-linux
Known to work: 4.3.5, 4.4.6
Known to fail: 4.5.1
Last reconfirmed: 2010-12-03 09:42:32


Attachments
testcase (222 bytes, text/x-c)
2010-11-28 15:48 UTC, Sergey Matyukevich
dumpspecs (2.18 KB, text/plain)
2010-11-28 15:48 UTC, Sergey Matyukevich
gcc version (646 bytes, text/plain)
2010-11-28 15:49 UTC, Sergey Matyukevich

Description Sergey Matyukevich 2010-11-28 15:46:41 UTC

    
Comment 1 Sergey Matyukevich 2010-11-28 15:48:05 UTC
Created attachment 22554
testcase

code example
Comment 2 Sergey Matyukevich 2010-11-28 15:48:48 UTC
Created attachment 22555
dumpspecs

gcc dumpspecs
Comment 3 Sergey Matyukevich 2010-11-28 15:49:27 UTC
Created attachment 22556
gcc version
Comment 4 Richard Biener 2010-11-28 15:59:13 UTC
Works for me on x86_64-darwin.
Comment 5 Sergey Matyukevich 2010-11-28 16:12:47 UTC
I am using an ARM GCC cross-toolchain created with OpenEmbedded. The toolchain version and its dumpspecs are available in the attachments. Wrong code is generated for the attached simple testcase when I use the -O2 optimization option. With no optimization options, or with -O0, the code is generated correctly. Running the testcase is straightforward: compile the example with -O2 and without it, then run both binaries (either on a suitable ARM machine or under qemu). The two binaries produce different outputs (OK and WRONG).
Comment 6 Mikael Pettersson 2010-11-28 22:36:37 UTC
This test case works for me on arm-linux-gnueabi with gcc-4.3.5 and gcc-4.4.6, but fails with current gcc-4.5.2 and gcc-4.6.
Comment 7 Ramana Radhakrishnan 2010-12-03 09:42:32 UTC
Confirmed.
Comment 8 Richard Biener 2010-12-10 13:29:12 UTC
*** Bug 46882 has been marked as a duplicate of this bug. ***
Comment 9 Ramana Radhakrishnan 2010-12-13 10:53:31 UTC
On trunk, vrp converts the IR from:


<bb 3>:
  c_5 = D.2034_4;
  D.2026_6 = D.2034_4 == 13;
  D.2027_7 = D.2034_4 == 10;
  D.2028_8 = D.2026_6 || D.2027_7;
  if (D.2028_8 != 0)
    goto <bb 7>;
  else
    goto <bb 4>;

<bb 4>:
  D.2030_9 = D.2034_4 <= 31;
  D.2031_10 = D.2034_4 != 9;
  D.2032_11 = D.2030_9 && D.2031_10;
  if (D.2032_11 != 0)
    goto <bb 7>;
  else
    goto <bb 5>;

<bb 5>:
  str_12 = str_1 + 1;

<bb 6>:
  # str_1 = PHI <str_3(D)(2), str_12(5)>
  D.2034_4 = *str_1;
  if (D.2034_4 != 0)
    goto <bb 3>;
  else
    goto <bb 7>;

<bb 7>:
  # D.2033_2 = PHI <0(4), 1(6), 0(3)>
  return D.2033_2;


to

<bb 3>:
  c_5 = D.2034_4;
  D.2026_6 = D.2034_4 == 13;
  D.2027_7 = D.2034_4 == 10;
  D.2028_8 = D.2026_6 | D.2027_7;
  if (D.2028_8 != 0)
    goto <bb 7>;
  else
    goto <bb 4>;

<bb 4>:
  D.2030_9 = D.2034_4 <= 31;
  D.2031_10 = D.2034_4 != 9;
  D.2032_11 = D.2030_9 & D.2031_10;
  if (D.2032_11 != 0)
    goto <bb 7>;
  else
    goto <bb 5>;

<bb 5>:
  str_12 = str_1 + 1;

<bb 6>:
  # str_1 = PHI <str_3(D)(2), str_12(5)>
  D.2034_4 = *str_1;
  if (D.2034_4 != 0)
    goto <bb 3>;
  else
    goto <bb 7>;

<bb 7>:
  # D.2033_2 = PHI <0(4), 1(6), 0(3)>
  return D.2033_2;

After a while, ifcombine comes along, removes basic block 5, and merges blocks 3 and 4 into one basic block, because it thinks that the two comparisons can be optimized to 1. Its dump reports:

optimizing two comparisons to 1
Merging blocks 3 and 4
Removing basic block 5

It then converts the function to this form:

<bb 2>:
  goto <bb 4>;


<bb 3>:
  D.2026_6 = D.2034_4 == 13;
  D.2027_7 = D.2034_4 == 10;
  D.2028_8 = D.2026_6 | D.2027_7;
  D.2030_9 = D.2034_4 <= 31;
  D.2031_10 = D.2034_4 != 9;
  D.2032_11 = D.2030_9 & D.2031_10;
  goto <bb 5>;

<bb 4>:
Invalid sum of incoming frequencies 873, should be 10000
  # str_1 = PHI <str_3(D)(2)>
  D.2034_4 = *str_1;
  if (D.2034_4 != 0)
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 5>:
Invalid sum of incoming frequencies 10000, should be 873
  # D.2033_2 = PHI <0(3), 1(4)>
  return D.2033_2;
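
For readers without the attachment, the dumps above can be read back into C roughly as follows. This is only a sketch reconstructed from the GIMPLE, not the attached testcase: the function names, the unsigned char type, and the helper check_divergence are assumptions made for illustration. The second function models the control flow that remains after ifcombine, where the loop is gone and any non-empty string returns 0.

```c
#include <assert.h>

/* Hypothetical reconstruction of the source behind the GIMPLE dump.
 * The name and the unsigned char type are assumptions; the attached
 * testcase is not reproduced in this report. */
int all_text_chars(const unsigned char *str)
{
    while (*str != 0) {             /* bb 6: loop test */
        unsigned char c = *str;     /* bb 3: c_5 */
        if (c == 13 || c == 10)     /* bb 3: leave with 0 */
            return 0;
        if (c <= 31 && c != 9)      /* bb 4: leave with 0 */
            return 0;
        str++;                      /* bb 5 */
    }
    return 1;                       /* PHI argument 1(6) */
}

/* Model of the control flow left after ifcombine: blocks 3 and 4 are
 * merged, block 5 is removed, and the back edge no longer exists. */
int after_ifcombine_model(const unsigned char *str)
{
    if (*str != 0)
        return 0;                   /* PHI argument 0(3) */
    return 1;                       /* PHI argument 1(4) */
}

/* Checks that the two versions disagree on a plain non-empty string
 * and agree on the empty string. */
void check_divergence(void)
{
    const unsigned char text[] = "abc";
    const unsigned char empty[] = "";

    assert(all_text_chars(text) == 1);
    assert(after_ifcombine_model(text) == 0);
    assert(all_text_chars(empty) == after_ifcombine_model(empty));
}
```

Calling check_divergence() from a small main is enough to see the difference: under these assumptions the original loop accepts "abc", while the transformed control flow rejects every non-empty input.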
Comment 10 Ramana Radhakrishnan 2010-12-13 14:11:13 UTC
(In reply to comment #4)
> Works for me on x86_64-darwin.

Fails for me on x86_64-linux with trunk as of today.

Ramana
Comment 11 Ramana Radhakrishnan 2010-12-14 09:27:58 UTC
So the problem, on trunk at least, seems to be in gimple-fold:

maybe_fold_or_comparisons and friends fold all the comparisons into a single boolean node of 1, where ideally the result of the basic block should be reduced to the (c <= 31) && (c != 9) check. The other 2 equality comparisons are superfluous.
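
The claim about the reduced check can be tested by brute force. The sketch below assumes the tested value is an 8-bit unsigned character (an assumption, since the testcase is not shown here) and uses hypothetical helper names. It compares the combined exit condition from blocks 3 and 4 with the reduced check, and counts the values for which the condition is false, which are the values for which a fold to constant 1 would be wrong.

```c
#include <assert.h>

/* Exit condition as written across bb 3 and bb 4 of the dump. */
int exits_full(unsigned c)
{
    return (c == 13 || c == 10) || (c <= 31 && c != 9);
}

/* Reduced condition: the two equality tests are already implied. */
int exits_reduced(unsigned c)
{
    return c <= 31 && c != 9;
}

/* Number of byte values where the two conditions disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (unsigned c = 0; c <= 255; c++)
        if (exits_full(c) != exits_reduced(c))
            mismatches++;
    return mismatches;
}

/* Number of byte values for which the exit condition is false. */
int count_non_exits(void)
{
    int n = 0;
    for (unsigned c = 0; c <= 255; c++)
        if (!exits_full(c))
            n++;
    return n;
}

/* The reduction is exact, and the condition is not constant. */
void check_fold(void)
{
    assert(count_mismatches() == 0);
    assert(count_non_exits() > 0);
}
```

Under these assumptions the reduced check matches the full condition for every byte value, while tab (9) and every value above 31 make the condition false, so it cannot legitimately become a constant 1.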

I won't be able to look at this for a couple of days - hence unassigning myself.


The problem for this file goes away with -fno-tree-vrp, but that's a heavyweight workaround.



cheers
Ramana
Comment 12 Jakub Jelinek 2010-12-14 10:06:52 UTC
Maybe dup of PR46909?
Comment 13 Ramana Radhakrishnan 2010-12-14 12:07:54 UTC
(In reply to comment #12)
> Maybe dup of PR46909?

I can verify that the fix for PR46909 fixes the issue on trunk.

I'm looking into 4.5 branch right now.


cheers
Ramana
Comment 14 Ramana Radhakrishnan 2010-12-14 12:13:00 UTC
It doesn't seem to fail for me with the RC for GCC 4.5.2

with -march=armv5te -mthumb
     -march=armv5te
     -march=armv7-a
     -march=armv7-a -mthumb

at -O2, -O3 and -Os
Comment 15 Andrew Pinski 2012-01-12 20:31:24 UTC
Fixed, so closing as such.