GCC 4.8.2 introduced a bug in optimization (present at all -O levels except -O0) in the presence of extern objects with weak definitions local to the translation unit. The bug is also present in 4.9.0. The following minimal testcase shows the problem: static int dummy = 0; extern int foo __attribute__((__weak__, __alias__("dummy"))); int bar() { if (foo) return 1; return 0; } This should produce a nontrivial bar which can conditionally return 0 or 1 depending on the contents of foo. Instead, on gcc 4.8.2 and 4.9.0, it produces a function which always returns 0. GCC versions with this bug have been reported to produce a seriously broken libc.a/libc.so for musl libc (e.g. fflush(NULL) fails to flush stdout). Removing static above causes the symptom to go away, so presumably GCC is wrongly transferring knowledge that "dummy" is static onto "foo", and thereby assuming "foo" is not externally reachable/modifiable. IIRC clang/LLVM had the same bug a couple years back and fixed it; it looks like GCC has newly introduced it. I don't know what component this should be marked as; sadly there's no "unknown" option so I just put "c". It should presumably be on one of the optimization ones, but I'm not sure which.
dummy is not weak or even extern. foo is but it is an alias to dummy which means it is also static rather than extern or weak. if dummy was an alias to foo which was weak and extern then GCC would not optimize the code.
dummy is relevant because there is no reference to dummy except the weak alias. foo is extern and the reference is to foo. It has a weak definition local to the translation unit (provided by the weak alias) but this can be overridden. GCC has handled this correctly ever since 2.x; 4.8.2 is the first version to mess it up.
That should have read "dummy is irrelevant".
Some users have been unable to reproduce the issue with gcc 4.8.2. Is it possible that building with/without isl/cloog libraries (or other third-party optimization-related libaries or plugins?) could affect this, and is there any good way to tell if gcc was built to use such things?
A weak alias to a non global variable does not make sense. As it can never be overridden.
The alias is global and it can be overridden (in the case of the test case, by any external object named "foo" in any other translation unit). The static in the definition of dummy does NOT apply to the alias foo, and never has. This is easy to verify, and it's frustrating that you're attacking this bug report which is clearly a regression from a standpoint of not understanding the semantics of the alias attribute.
This seems to be a post-4.8.2 change. (At least, it doesn’t show up on the Debian and Gentoo versions of 4.8.2.) It does show up on Deb’s version of 4.9 and their compile from the 4.10 branch.
Further investigation suggests that the real gcc 4.8.2 is not affected; I was mislead by the fact that Debian is shipping as "gcc-4.8_4.8.2-21" an svn snapshot that's actually post-4.8.2. So 4.9.0 seems to be the only official release that's affected.
For my gcc versions (x86_64) compiled from release sources, I have the following for this testcase, with and without static, using -O2: === gcc 4.8.2, with and without static === $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../gcc-4.8.2/configure --prefix=/usr --mandir=/usr/man --libexecdir=/usr/lib --enable-languages=c,c++,objc --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-shared --disable-nls --with-x=no --with-system-zlib --disable-multilib --disable-bootstrap --disable-debug Thread model: posix gcc version 4.8.2 (GCC) $ objdump -dr test.o test.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <bar>: 0: 8b 15 00 00 00 00 mov 0x0(%rip),%edx # 6 <bar+0x6> 2: R_X86_64_PC32 foo-0x4 6: 31 c0 xor %eax,%eax 8: 85 d2 test %edx,%edx a: 0f 95 c0 setne %al d: c3 retq === gcc 4.9.0 === $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../gcc-4.9.0/configure --prefix=/usr --mandir=/usr/man --libexecdir=/usr/lib --enable-languages=c,c++,objc --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-shared --disable-nls --with-x=no --with-system-zlib --disable-multilib --disable-debug Thread model: posix gcc version 4.9.0 (GCC) === With static (4.9.0) === $ objdump -dr test.o test.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <bar>: 0: 31 c0 xor %eax,%eax 2: c3 retq === Without static (4.9.0) === $ objdump -dr test.o test.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <bar>: 0: 8b 15 00 00 00 00 mov 0x0(%rip),%edx # 6 <bar+0x6> 2: R_X86_64_PC32 foo-0x4 6: 31 c0 xor %eax,%eax 8: 85 d2 test %edx,%edx a: 0f 95 c0 setne %al d: c3 retq
My tests on debian sid with GCC: (Debian 4.8.2-21) 4.8.2 do not replicate the bug. On debian sid, only 4.9 and gcc-snapshot (the 4.10 branch) show the bug.
Adding __attribute__((__used__)) to the static object suppresses the symptom in case that helps to isolate what's causing it.
Furthermore, __builtin_constant_p(dummy) wrongly returns 1, even though dummy is modifiable externally via foo (assuming foo is not replaced by a strong definition elsewhere). Perhaps this should be filed as a separate bug since it's also user-visible, but I think the cause is the same. On the other hand, according to my tests on gcc.godbolt.org, the incorrect value for __builtin_constant_p predates the optimization bug. This could mean they're separate bugs, but I suspect the later changes that cause the optimization bug were "correct" except that they're using the wrong concept of "is this expression constant?" from the earlier bug.
I've added the related issue 61159.
This seems related to PR 59948 where Honza says the code is "really broken".
Can you clarify? As far as I can tell, the other bug is a missed optimization and this is an overly-aggressive, incorrect optimization.
(In reply to Rich Felker from comment #15) > Can you clarify? As far as I can tell, the other bug is a missed > optimization and this is an overly-aggressive, incorrect optimization. The same test is wrong, it optimizes cases it shouldn't and doesn't optimize cases it should (I am not familiar enough to tell, just trusting you), goes together. Having testcases in both directions will be useful when Jan (or someone else) rewrites it.
Confirmed for 4.9.0. Note that I agree that the testcase looks suspicious, but weak aliases have odd behavior.
Any ideas on how to reliably test for this bug without having to run programs (breaks cross-compiling) or use additional tools (nm, objdump, etc. which we don't already depend on) to detect it? Since this issue was raised in relation to musl libc we're holding up the pending release trying to decide how best to work around it. My hope is to simply reject the broken version(s) with a message to try -fno-toplevel-reorder, and I could just hard-code 4.9.0 as the only broken version if folks on the GCC side are willing to make sure this gets fixed in time for 4.9.1, but obviously having a test for the bug would be preferable.
Here is the commit that seems to have introduced the bug: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=df8d3e8981a99e264b49876f0f5064bdb30ac981 In the function ctor_for_folding, the following erroneous logic appears: + /* Non-readonly alias of readonly variable is also de-facto readonly, + because the variable itself is in readonly section. + We also honnor READONLY flag on alias assuming that user knows + what he is doing. */ + if (!TREE_READONLY (decl) && !TREE_READONLY (real_decl)) + return error_mark_node; This treats the value of an alias as a compile-time constant if either the alias itself or the alias target has TREE_READONLY being true. Replacing the && with || seems to make the problem go away in my test case, and makes a bit more sense (perhaps that was the original intent?), but it's only sufficient if TREE_READONLY is always false for weak aliases (since they can always be overridden by a strong symbol from another translation unit, even if the alias is const-qualified). I'm not sure where the value of TREE_READONLY is set for aliases yet but I'll keep looking...
On further investigation, it looks like the code I cited deals with strong aliases as well as weak ones, and in the strong alias case, the strong folding behavior might be desirable. A better fix seems to be adding an explicit check for weak aliases (DECL_WEAK(decl)) when an alias is found and returning error_mark_node in that case. Note that prior to the above-mentioned commit, the !TREE_READONLY(decl) case was always treated as non-foldable, so there seems to have been no subtlety to avoiding errors with weak aliases. But the new code performs much more aggressive constant folding and thus needs to avoid stepping on weak aliases.
Created attachment 32830 [details] proposed patch patch is generated against the revision that introduced this bug.
> Richard Biener <rguenth at gcc dot gnu.org> changed: > What |Removed |Added > ---------------------------------------------------------------------------- > Priority|P3 |P2 Can we please make this P1 rather than P2? It is a wrong-code regression and only present in a single release, so by the bug management guidelines (https://gcc.gnu.org/bugs/management.html) I think it should be P1. There is already a working patch, test-case, and discussion on the gcc-patches list so I don't see what's blocking fixing it before the next release.
(In reply to Rich Felker from comment #22) > > Richard Biener <rguenth at gcc dot gnu.org> changed: > > What |Removed |Added > > ---------------------------------------------------------------------------- > > Priority|P3 |P2 > > Can we please make this P1 rather than P2? It is a wrong-code > regression and only present in a single release, so by the bug > management guidelines (https://gcc.gnu.org/bugs/management.html) I > think it should be P1. > > There is already a working patch, test-case, and discussion on the > gcc-patches list so I don't see what's blocking fixing it before the > next release. It looks like this bug has been recently fixed in the trunk (no testsuite addition however). I'm not sure what its status is on the release branches, though.
On Wed, Jul 02, 2014 at 12:22:40PM +0000, patrick at parcs dot ath.cx wrote: > It looks like this bug has been recently fixed in the trunk (no testsuite > addition however). I'm not sure what its status is on the release branches, > though. Per discussion on gcc-patches, the issue for this exact test case went away due to related changes not explicitly aimed at fixing this bug. However, if dummy is const-qualified, the bug supposedly remains. I have not yet tested with trunk to confirm this; I can do that if necessary. Here is the new test case: static const int dummy = 0; extern const int foo __attribute__((__weak__, __alias__("dummy"))); int bar() { if (foo) return 1; return 0; } Even with the const qualifier on both dummy and foo, foo should not be considered for constant folding because it's replaceable by alternate definitions which may provide different values.
I have tried to bootstrap and regtest the patch (to be attached) that reflects the latest comments on the mailing list (except the concern raised by Jan about C++ optimization). Unfortunately the patch regresses the following simple C++ testcase, not folding the conditional access to the constant (even at -O3), but not emitting the constant either and thus causing an undefined reference at link time. Uncommenting the commented line allows folding but also emits the definition of the static member, increasing .rodata size. It seems odd to me that decl_replaceable_p(z::cst) is true, and that for instance TREE_PUBLIC(z::cst) is true. Any help here please? struct z { static const int cst = 1; }; // const int z::cst; int foo(int cond) { return cond ? z::cst : 0; }
Created attachment 33120 [details] updated patch
GCC 4.9.1 has been released.
I'm quite disappointed that another release was made with this bug present. I've been trying to find a workaround, but even at -O0 the invalid constant folding optimizations are made. Is there anything I can do short of blacklisting 4.9.0 and 4.9.1 (either hard-coded or via detecting the bug)?
Created attachment 33209 [details] patch I am testing Sorry for taking so long on this issue - I had busy time and there is no really good answer to this problem. C++ require constant variables to be folded, ELF interposition interpreted in full generality prevents folding of many symbols. We behave in C++ way for years and I think it should stay a default, so I think we can special case the user specified WEAK attribute - i.e. do not give up on PIC exported symbols nor C++ COMDATs as we already special case COMDAT. I hope this won't break non-ELF targets, like AIX - will give it a try.
I don't really understand how weak aliases come into play for implementing C++ features, but if their semantics are not identical to the necessary C++ semantics, the compiler should just distinguish between the relevant C++ semantics and weak alias semantics rather than trying to force two semantically different concepts to use the same feature in ways that breaks either one or the other. Also, was C++ behavior broken prior to 4.9.0, or is there some other reason that weak aliases worked without breaking C++ mandatory constant folding prior to 4.9.0?
Author: hubicka Date: Sun Oct 5 04:52:19 2014 New Revision: 215896 URL: https://gcc.gnu.org/viewcvs?rev=215896&root=gcc&view=rev Log: PR ipa/61144 * varpool.c (ctor_for_folding): Do not fold WEAK symbols. * gcc.dg/tree-ssa/pr61144.c: New testcase. Added: branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/tree-ssa/pr61144.c Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/testsuite/ChangeLog branches/gcc-4_9-branch/gcc/varpool.c
Fixed now both on branch and mainline.
Author: hubicka Date: Sun Oct 5 04:56:14 2014 New Revision: 215897 URL: https://gcc.gnu.org/viewcvs?rev=215897&root=gcc&view=rev Log: PR ipa/61144 * gcc.dg/tree-ssa/pr61144.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr61144.c Modified: trunk/gcc/testsuite/ChangeLog