Bug 52734 - [4.7/4.8 Regression] Incorrect optimization of uClibc sbrk()
Summary: [4.7/4.8 Regression] Incorrect optimization of uClibc sbrk()
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.7.0
: P2 normal
Target Milestone: 4.7.1
Assignee: Not yet assigned to anyone
Keywords: wrong-code
Depends on:
Reported: 2012-03-27 05:16 UTC by Michael Deutschmann
Modified: 2019-06-14 19:16 UTC (History)
5 users (show)

See Also:
Host: i386-pc-linux-gnu
Target: i386-pc-linux-gnu
Build: i386-pc-linux-gnu
Known to work: 4.6.3
Known to fail:
Last reconfirmed: 2012-03-27 00:00:00

Simplified code analogous to the sbrk problem (99 bytes, text/plain)
2012-03-27 05:16 UTC, Michael Deutschmann
adjusted test case (174 bytes, text/plain)
2012-03-27 07:42 UTC, Mikael Pettersson
Typescript of Michael running Mikael's version of the testcase (799 bytes, text/plain)
2012-03-27 07:57 UTC, Michael Deutschmann
Assembly output of Mikael's testcase, from Michael's computer (351 bytes, text/plain)
2012-03-27 08:05 UTC, Michael Deutschmann
Tentative patch (1.85 KB, patch)
2012-03-27 13:19 UTC, Tom de Vries
Details | Diff
pr52734.2.patch (1.16 KB, patch)
2012-04-12 16:25 UTC, Tom de Vries
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Deutschmann 2012-03-27 05:16:36 UTC
Created attachment 27005 [details]
Simplified code analogous to the sbrk problem

GCC 4.7.0 miscompiles uClibc 0.9.33 producing an unusable library.  The problem is in uClibc's sbrk() implementation -- GCC optimizes away sbrk's attempt to save a global variable before a called function changes it.  Basically, that function (brk) is wrongly treated as an __attribute__((pure)) function.

The above text is to help anyone searching for the bug.  I'm attaching a distilled case showing the same problem.  The function ccc() is supposed to return the value of bbb before the second call to aaa(), but when compiled with -Os it returns the value of bbb afterwards.

This is on i386-pc-linux-uclibc.  GCC 4.6.3 did not have the problem.
Comment 1 Richard Biener 2012-03-27 07:35:27 UTC
The testcase does not reproduce the issue for me on i?86-linux.  Are you
sure the testcase reproduces the issue for you?  Can you attach the wrong
assembly that is produced and the output of the compiler command producing
it when you append -v to it?
Comment 2 Mikael Pettersson 2012-03-27 07:42:46 UTC
Created attachment 27008 [details]
adjusted test case

This test case (adjusted to __builtin_abort() on error) fails with gcc-4.7-20120324 with -Os/-O2/-O3 on x86_64-pc-linux-gnu, but works with -O0/-O1, and works with gcc-4.6.3 and any -O option.
Comment 3 Mikael Pettersson 2012-03-27 07:45:49 UTC
Here's the bogus assembly code for ccc() with gcc-4.7 and -Os:

        .globl  ccc
        .type   ccc, @function
        call    aaa
        testl   %eax, %eax
        je      .L3
        movl    bbb(%rip), %eax
        call    aaa
        testl   %eax, %eax
        je      .L5
        xorl    %eax, %eax
        .size   ccc, .-ccc

Note how it fails to read bbb into a local before the second call to aaa.
Comment 4 Mikael Pettersson 2012-03-27 07:53:52 UTC
Not x86-specific, gcc-4.7 fails on arm-linux-gnueabi and m68k-linux too.
Comment 5 Michael Deutschmann 2012-03-27 07:57:02 UTC
Created attachment 27009 [details]
Typescript of Michael running Mikael's version of the testcase
Comment 6 Michael Deutschmann 2012-03-27 08:05:06 UTC
Created attachment 27011 [details]
Assembly output of Mikael's testcase, from Michael's computer

Here's the assembly output you requested.  The "-v" data is in the typescript file added separately.

I've used the modified version of the case submitted by Mikael, rather than my original file.

Notice that "bbb" appears in the assembly only twice.  The first time is the increment in Mikael's aaa implementation, and the second is on a code path that goes straight to the end of the function.  Nowhere in the code is the load of ddd from bbb.
Comment 7 Richard Biener 2012-03-27 08:24:27 UTC
Comment 8 Richard Biener 2012-03-27 09:21:51 UTC
Caused by tail merging.  -fno-tree-tail-merge fixes it.
Comment 9 Tom de Vries 2012-03-27 13:05:09 UTC
Blocks 3 and 5, with successor 6 are considered equal and merged.
  # BLOCK 3 freq:6102
  # PRED: 2 [61.0%]  (true,exec)
  # VUSE <.MEMD.1734_10>
  dddD.1710_3 = bbbD.1703;
  goto <bb 6>;
  # SUCC: 6 [100.0%]  (fallthru,exec)

  # BLOCK 5 freq:2378
  # PRED: 4 [61.0%]  (false,exec)
  # SUCC: 6 [100.0%]  (fallthru,exec)

  # BLOCK 6 freq:10000
  # PRED: 3 [100.0%]  (fallthru,exec) 7 [100.0%]  (fallthru) 5 [100.0%]  (fallthru,exec)
  # dddD.1710_1 = PHI <dddD.1710_3(3), 0(7), dddD.1710_4(5)>
  # .MEMD.1734_8 = PHI <.MEMD.1734_10(3), .MEMD.1734_11(7), .MEMD.1734_11(5)>
  # VUSE <.MEMD.1734_8>
  return dddD.1710_1;
  # SUCC: EXIT [100.0%] 

Tail merge considers 2 blocks equal if the effect at the tail is equal, meaning:
- the sequence of side effects produced by each block is equal
- the value phis are equal

There are no side effects in block 3 and 5, and the phi alternatives of dddD.1710_1 for 3 (dddD.1710_3)  and 5 (dddD.1710_4)  are proven equal by gvn.

The problem is that changing the (4->5) edge into a (4->3) edge changes the value of dddD.1710_3, because block 4 contains a store that affects the load in block 3.
  # BLOCK 4 freq:3898
  # PRED: 2 [39.0%]  (false,exec)
  # VUSE <.MEMD.1734_10>
  dddD.1710_4 = bbbD.1703;
  # .MEMD.1734_11 = VDEF <.MEMD.1734_10>
  # USE = nonlocal null 
  # CLB = nonlocal null 
  D.1724_5 = aaaD.1705 ();
  if (D.1724_5 != 0)
    goto <bb 7>;
    goto <bb 5>;
  # SUCC: 7 [39.0%]  (true,exec) 5 [61.0%]  (false,exec)
Comment 10 Tom de Vries 2012-03-27 13:19:12 UTC
Created attachment 27014 [details]
Tentative patch

The key element of this patch is:
+  if (gvn_used && vuse1 != vuse2)
+    return;

If gvn was use to prove two bbs equal, and the incoming vuses are not equal, don't do the substitution. This is a bit too restrictive, I'm pondering about a more precise fix.
Comment 11 Tom de Vries 2012-03-27 17:04:54 UTC
> Created attachment 27014 [details]
> Tentative patch

Bootstrapped and reg-tested on x86_64, no issues found.
Comment 12 joerg.jungermann 2012-04-10 22:40:41 UTC
We at Freetz[1] had serveral problems compiling a running uclibc 0.9.3[23] with gcc 4.7.0. -fno-tree-tail-merge did fix this, so I applied the patch from #c10.

[1] http://freetz.org/ticket/1752
Comment 13 Sedat Dilek 2012-04-10 22:57:15 UTC
Just some remarks:

Freetz is an OSS project supporting router (platforms) of a popular German company. The target-system is mostly MIPS(EL). The majority of our developers build on AMD64 hosts (32bit-toolchain).
The full patchset for gcc-4.7.0 can be found in [1].
Jörg provided his adapted/backported patch in [2].
Thank you for testing!

[0] http://freetz.org/
[1] http://freetz.org/browser/trunk/toolchain/make/target/gcc/4.7.0/
[2] http://freetz.org/attachment/ticket/1752/r8889-gcc-4.7-fix.patch
Comment 14 Richard Biener 2012-04-11 10:12:24 UTC
(In reply to comment #11)
> > Created attachment 27014 [details]
> > Tentative patch
> Bootstrapped and reg-tested on x86_64, no issues found.

Can you post it for review then?
Comment 15 Tom de Vries 2012-04-12 16:25:28 UTC
Created attachment 27142 [details]

Another tentative patch. Also not precise, but a simpler approach.

Now testing.
Comment 16 Tom de Vries 2012-04-13 19:32:27 UTC
Fixed in trunk: r186418
Fixed in 4.7 branch: r186424
Comment 17 Sedat Dilek 2012-04-13 19:38:24 UTC
Thanks, Tom!
Comment 18 Jackie Rosen 2014-02-16 13:16:12 UTC Comment hidden (spam)