Bug 61144 - [4.9/5 Regression] Invalid optimizations for extern vars with local weak definitions
Summary: [4.9/5 Regression] Invalid optimizations for extern vars with local weak defi...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: ipa (show other bugs)
Version: 4.9.0
: P2 normal
Target Milestone: 4.9.2
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2014-05-11 01:29 UTC by Rich Felker
Modified: 2014-10-05 04:56 UTC (History)
10 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.8.2
Known to fail: 4.9.0
Last reconfirmed: 2014-05-13 00:00:00


Attachments
proposed patch (277 bytes, patch)
2014-05-20 20:28 UTC, Rich Felker
Details | Diff
updated patch (712 bytes, patch)
2014-07-14 16:13 UTC, Alexander Monakov
Details | Diff
patch I am testing (510 bytes, patch)
2014-07-30 09:14 UTC, Jan Hubicka
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Rich Felker 2014-05-11 01:29:56 UTC
GCC 4.8.2 introduced a bug in optimization (present at all -O levels except -O0) in the presence of extern objects with weak definitions local to the translation unit. The bug is also present in 4.9.0. The following minimal testcase shows the problem:

static int dummy = 0;
extern int foo __attribute__((__weak__, __alias__("dummy")));
int bar() { if (foo) return 1; return 0; }

This should produce a nontrivial bar which can conditionally return 0 or 1 depending on the contents of foo. Instead, on gcc 4.8.2 and 4.9.0, it produces a function which always returns 0.

GCC versions with this bug have been reported to produce a seriously broken libc.a/libc.so for musl libc (e.g. fflush(NULL) fails to flush stdout).

Removing static above causes the symptom to go away, so presumably GCC is wrongly transferring knowledge that "dummy" is static onto "foo", and thereby assuming "foo" is not externally reachable/modifiable. IIRC clang/LLVM had the same bug a couple years back and fixed it; it looks like GCC has newly introduced it.

I don't know what component this should be marked as; sadly there's no "unknown" option so I just put "c". It should presumably be on one of the optimization ones, but I'm not sure which.
Comment 1 Andrew Pinski 2014-05-11 03:20:28 UTC
dummy is not weak or even extern.  foo is but it is an alias to dummy which means it is also static rather than extern or weak.  if dummy was an alias to foo which was weak and extern then GCC would not optimize the code.
Comment 2 Rich Felker 2014-05-11 03:32:11 UTC
dummy is relevant because there is no reference to dummy except the weak alias. foo is extern and the reference is to foo. It has a weak definition local to the translation unit (provided by the weak alias) but this can be overridden. GCC has handled this correctly ever since 2.x; 4.8.2 is the first version to mess it up.
Comment 3 Rich Felker 2014-05-11 03:32:54 UTC
That should have read "dummy is irrelevant".
Comment 4 Rich Felker 2014-05-11 19:05:03 UTC
Some users have been unable to reproduce the issue with gcc 4.8.2. Is it possible that building with/without isl/cloog libraries (or other third-party optimization-related libaries or plugins?) could affect this, and is there any good way to tell if gcc was built to use such things?
Comment 5 Andrew Pinski 2014-05-11 19:41:43 UTC
A weak alias to a non global variable does not make sense.  As it can never be overridden.
Comment 6 Rich Felker 2014-05-11 20:05:31 UTC
The alias is global and it can be overridden (in the case of the test case, by any external object named "foo" in any other translation unit). The static in the definition of dummy does NOT apply to the alias foo, and never has. This is easy to verify, and it's frustrating that you're attacking this bug report which is clearly a regression from a standpoint of not understanding the semantics of the alias attribute.
Comment 7 James Cloos 2014-05-12 04:18:24 UTC
This seems to be a post-4.8.2 change.

(At least, it doesn’t show up on the Debian and Gentoo versions of 4.8.2.)

It does show up on Deb’s version of 4.9 and their compile from the 4.10 branch.
Comment 8 Rich Felker 2014-05-12 12:36:50 UTC
Further investigation suggests that the real gcc 4.8.2 is not affected; I was mislead by the fact that Debian is shipping as "gcc-4.8_4.8.2-21" an svn snapshot that's actually post-4.8.2. So 4.9.0 seems to be the only official release that's affected.
Comment 9 Jody Lee Bruchon 2014-05-12 13:23:18 UTC
For my gcc versions (x86_64) compiled from release sources, I have the following for this testcase, with and without static, using -O2:

=== gcc 4.8.2, with and without static ===

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.8.2/configure --prefix=/usr --mandir=/usr/man --libexecdir=/usr/lib --enable-languages=c,c++,objc --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-shared --disable-nls --with-x=no --with-system-zlib --disable-multilib --disable-bootstrap --disable-debug
Thread model: posix
gcc version 4.8.2 (GCC)

$ objdump -dr test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <bar>:
   0:   8b 15 00 00 00 00       mov    0x0(%rip),%edx        # 6 <bar+0x6>
                        2: R_X86_64_PC32        foo-0x4
   6:   31 c0                   xor    %eax,%eax
   8:   85 d2                   test   %edx,%edx
   a:   0f 95 c0                setne  %al
   d:   c3                      retq

=== gcc 4.9.0 ===

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.9.0/configure --prefix=/usr --mandir=/usr/man --libexecdir=/usr/lib --enable-languages=c,c++,objc --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-shared --disable-nls --with-x=no --with-system-zlib --disable-multilib --disable-debug
Thread model: posix
gcc version 4.9.0 (GCC)

=== With static (4.9.0) ===

$ objdump -dr test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <bar>:
   0:   31 c0                   xor    %eax,%eax
   2:   c3                      retq

=== Without static (4.9.0) ===

$ objdump -dr test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <bar>:
   0:   8b 15 00 00 00 00       mov    0x0(%rip),%edx        # 6 <bar+0x6>
                        2: R_X86_64_PC32        foo-0x4
   6:   31 c0                   xor    %eax,%eax
   8:   85 d2                   test   %edx,%edx
   a:   0f 95 c0                setne  %al
   d:   c3                      retq
Comment 10 James Cloos 2014-05-12 13:59:51 UTC
My tests on debian sid with GCC: (Debian 4.8.2-21) 4.8.2 do not replicate the bug.

On debian sid, only 4.9 and gcc-snapshot (the 4.10 branch) show the bug.
Comment 11 Rich Felker 2014-05-12 15:16:25 UTC
Adding __attribute__((__used__)) to the static object suppresses the symptom in case that helps to isolate what's causing it.
Comment 12 Rich Felker 2014-05-12 17:42:47 UTC
Furthermore, __builtin_constant_p(dummy) wrongly returns 1, even though dummy is modifiable externally via foo (assuming foo is not replaced by a strong definition elsewhere). Perhaps this should be filed as a separate bug since it's also user-visible, but I think the cause is the same.

On the other hand, according to my tests on gcc.godbolt.org, the incorrect value for __builtin_constant_p predates the optimization bug. This could mean they're separate bugs, but I suspect the later changes that cause the optimization bug were "correct" except that they're using the wrong concept of "is this expression constant?" from the earlier bug.
Comment 13 Rich Felker 2014-05-12 19:02:02 UTC
I've added the related issue 61159.
Comment 14 Marc Glisse 2014-05-12 20:31:36 UTC
This seems related to PR 59948 where Honza says the code is "really broken".
Comment 15 Rich Felker 2014-05-13 03:26:40 UTC
Can you clarify? As far as I can tell, the other bug is a missed optimization and this is an overly-aggressive, incorrect optimization.
Comment 16 Marc Glisse 2014-05-13 08:45:12 UTC
(In reply to Rich Felker from comment #15)
> Can you clarify? As far as I can tell, the other bug is a missed
> optimization and this is an overly-aggressive, incorrect optimization.

The same test is wrong, it optimizes cases it shouldn't and doesn't optimize cases it should (I am not familiar enough to tell, just trusting you), goes together. Having testcases in both directions will be useful when Jan (or someone else) rewrites it.
Comment 17 Richard Biener 2014-05-13 11:19:29 UTC
Confirmed for 4.9.0.  Note that I agree that the testcase looks suspicious,
but weak aliases have odd behavior.
Comment 18 Rich Felker 2014-05-14 05:08:15 UTC
Any ideas on how to reliably test for this bug without having to run programs (breaks cross-compiling) or use additional tools (nm, objdump, etc. which we don't already depend on) to detect it? Since this issue was raised in relation to musl libc we're holding up the pending release trying to decide how best to work around it. My hope is to simply reject the broken version(s) with a message to try -fno-toplevel-reorder, and I could just hard-code 4.9.0 as the only broken version if folks on the GCC side are willing to make sure this gets fixed in time for 4.9.1, but obviously having a test for the bug would be preferable.
Comment 19 Rich Felker 2014-05-20 08:11:17 UTC
Here is the commit that seems to have introduced the bug:

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=df8d3e8981a99e264b49876f0f5064bdb30ac981

In the function ctor_for_folding, the following erroneous logic appears:

+  /* Non-readonly alias of readonly variable is also de-facto readonly,
+     because the variable itself is in readonly section.  
+     We also honnor READONLY flag on alias assuming that user knows
+     what he is doing.  */
+  if (!TREE_READONLY (decl) && !TREE_READONLY (real_decl))
+    return error_mark_node;

This treats the value of an alias as a compile-time constant if either the alias itself or the alias target has TREE_READONLY being true. Replacing the && with || seems to make the problem go away in my test case, and makes a bit more sense (perhaps that was the original intent?), but it's only sufficient if TREE_READONLY is always false for weak aliases (since they can always be overridden by a strong symbol from another translation unit, even if the alias is const-qualified). I'm not sure where the value of TREE_READONLY is set for aliases yet but I'll keep looking...
Comment 20 Rich Felker 2014-05-20 08:31:50 UTC
On further investigation, it looks like the code I cited deals with strong aliases as well as weak ones, and in the strong alias case, the strong folding behavior might be desirable. A better fix seems to be adding an explicit check for weak aliases (DECL_WEAK(decl)) when an alias is found and returning error_mark_node in that case.

Note that prior to the above-mentioned commit, the !TREE_READONLY(decl) case was always treated as non-foldable, so there seems to have been no subtlety to avoiding errors with weak aliases. But the new code performs much more aggressive constant folding and thus needs to avoid stepping on weak aliases.
Comment 21 Rich Felker 2014-05-20 20:28:03 UTC
Created attachment 32830 [details]
proposed patch

patch is generated against the revision that introduced this bug.
Comment 22 Rich Felker 2014-06-26 13:35:52 UTC
> Richard Biener <rguenth at gcc dot gnu.org> changed:
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>            Priority|P3                          |P2

Can we please make this P1 rather than P2? It is a wrong-code
regression and only present in a single release, so by the bug
management guidelines (https://gcc.gnu.org/bugs/management.html) I
think it should be P1.

There is already a working patch, test-case, and discussion on the
gcc-patches list so I don't see what's blocking fixing it before the
next release.
Comment 23 patrick 2014-07-02 12:22:40 UTC
(In reply to Rich Felker from comment #22)
> > Richard Biener <rguenth at gcc dot gnu.org> changed:
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >            Priority|P3                          |P2
> 
> Can we please make this P1 rather than P2? It is a wrong-code
> regression and only present in a single release, so by the bug
> management guidelines (https://gcc.gnu.org/bugs/management.html) I
> think it should be P1.
> 
> There is already a working patch, test-case, and discussion on the
> gcc-patches list so I don't see what's blocking fixing it before the
> next release.

It looks like this bug has been recently fixed in the trunk (no testsuite addition however).  I'm not sure what its status is on the release branches, though.
Comment 24 Rich Felker 2014-07-02 14:47:01 UTC
On Wed, Jul 02, 2014 at 12:22:40PM +0000, patrick at parcs dot ath.cx wrote:
> It looks like this bug has been recently fixed in the trunk (no testsuite
> addition however).  I'm not sure what its status is on the release branches,
> though.

Per discussion on gcc-patches, the issue for this exact test case went
away due to related changes not explicitly aimed at fixing this bug.
However, if dummy is const-qualified, the bug supposedly remains. I
have not yet tested with trunk to confirm this; I can do that if
necessary.

Here is the new test case:

static const int dummy = 0;
extern const int foo __attribute__((__weak__, __alias__("dummy")));
int bar() { if (foo) return 1; return 0; }

Even with the const qualifier on both dummy and foo, foo should not be
considered for constant folding because it's replaceable by alternate
definitions which may provide different values.
Comment 25 Alexander Monakov 2014-07-14 16:11:45 UTC
I have tried to bootstrap and regtest the patch (to be attached) that reflects the latest comments on the mailing list (except the concern raised by Jan about C++ optimization).  Unfortunately the patch regresses the following simple C++ testcase, not folding the conditional access to the constant (even at -O3), but not emitting the constant either and thus causing an undefined reference at link time.  Uncommenting the commented line allows folding but also emits the definition of the static member, increasing .rodata size.

It seems odd to me that decl_replaceable_p(z::cst) is true, and that for instance TREE_PUBLIC(z::cst) is true.  Any help here please?

struct z {
  static const int cst = 1;
};

// const int z::cst;

int foo(int cond)
{
  return cond ? z::cst : 0;
}
Comment 26 Alexander Monakov 2014-07-14 16:13:07 UTC
Created attachment 33120 [details]
updated patch
Comment 27 Jakub Jelinek 2014-07-16 13:29:23 UTC
GCC 4.9.1 has been released.
Comment 28 Rich Felker 2014-07-16 18:16:51 UTC
I'm quite disappointed that another release was made with this bug present. I've been trying to find a workaround, but even at -O0 the invalid constant folding optimizations are made. Is there anything I can do short of blacklisting 4.9.0 and 4.9.1 (either hard-coded or via detecting the bug)?
Comment 29 Jan Hubicka 2014-07-30 09:14:22 UTC
Created attachment 33209 [details]
patch I am testing

Sorry for taking so long on this issue - I had busy time and there is no really good answer to this problem.  C++ require constant variables to be folded, ELF interposition interpreted in full generality prevents folding of many symbols.  We behave in C++ way for years and I think it should stay a default, so I think we can special case the user specified WEAK attribute - i.e. do not give up on PIC exported symbols nor C++ COMDATs as we already special case COMDAT.

I hope this won't break non-ELF targets, like AIX - will give it a try.
Comment 30 Rich Felker 2014-07-31 22:56:45 UTC
I don't really understand how weak aliases come into play for implementing C++ features, but if their semantics are not identical to the necessary C++ semantics, the compiler should just distinguish between the relevant C++ semantics and weak alias semantics rather than trying to force two semantically different concepts to use the same feature in ways that breaks either one or the other.

Also, was C++ behavior broken prior to 4.9.0, or is there some other reason that weak aliases worked without breaking C++ mandatory constant folding prior to 4.9.0?
Comment 31 Jan Hubicka 2014-10-05 04:52:56 UTC
Author: hubicka
Date: Sun Oct  5 04:52:19 2014
New Revision: 215896

URL: https://gcc.gnu.org/viewcvs?rev=215896&root=gcc&view=rev
Log:

	PR ipa/61144
	* varpool.c (ctor_for_folding): Do not fold WEAK symbols.
	* gcc.dg/tree-ssa/pr61144.c: New testcase.

Added:
    branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/tree-ssa/pr61144.c
Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_9-branch/gcc/varpool.c
Comment 32 Jan Hubicka 2014-10-05 04:54:37 UTC
Fixed now both on branch and mainline.
Comment 33 Jan Hubicka 2014-10-05 04:55:37 UTC
Fixed now both on branch and mainline.
Comment 34 Jan Hubicka 2014-10-05 04:56:51 UTC
Author: hubicka
Date: Sun Oct  5 04:56:14 2014
New Revision: 215897

URL: https://gcc.gnu.org/viewcvs?rev=215897&root=gcc&view=rev
Log:

	PR ipa/61144
	* gcc.dg/tree-ssa/pr61144.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr61144.c
Modified:
    trunk/gcc/testsuite/ChangeLog