Created attachment 53798 [details] Source Code g++ -g Record.cpp -c -fPIC -Os -flto g++ -g -shared Record.o -fno-strict-aliasing -o librecord.so g++ main.cpp -L. -lrecord LD_LIBRARY_PATH=$PWD ./a.out This sometimes prints 1 and sometimes prints 0. If I remove the -fno-strict-aliasing flag of the second g++ command, or add it to the first g++ command, it always prints 1. There is no violation of the strict-aliasing rule in Record.cpp, at least none that I could spot. If you are afraid of non-reproducibility, fear not, the generated assembly is the same every time, and is obviously wrong: The method Combined<OtherClass>::clashy() is missing a call to get_const objdump librecord.so --disassemble=_ZN8CombinedI10OtherClassE6clashyEv.isra.0 000000000000113e <_ZN8CombinedI10OtherClassE6clashyEv.isra.0>: 113e: 48 83 ec 38 sub $0x38,%rsp 1142: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax 1149: 00 00 114b: 48 89 44 24 28 mov %rax,0x28(%rsp) 1150: 31 c0 xor %eax,%eax 1152: 48 8d 74 24 18 lea 0x18(%rsp),%rsi 1157: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi 115c: e8 b9 ff ff ff call 111a <_Z13compare_pairsIP10OtherClassEbRKSt4pairIiT_ES6_> 1161: 48 8b 54 24 28 mov 0x28(%rsp),%rdx 1166: 64 48 2b 14 25 28 00 sub %fs:0x28,%rdx 116d: 00 00 116f: 74 05 je 1176 <_ZN8CombinedI10OtherClassE6clashyEv.isra.0+0x38> 1171: e8 ca fe ff ff call 1040 <__stack_chk_fail@plt> 1176: 48 83 c4 38 add $0x38,%rsp 117a: c3 ret The almost identical Combined<SomeClass>::clashy() gets compiled correctly: objdump librecord.so --disassemble=_ZN8CombinedI9SomeClassE6clashyEv.isra.0 000000000000117c <_ZN8CombinedI9SomeClassE6clashyEv.isra.0>: 117c: 48 83 ec 38 sub $0x38,%rsp 1180: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax 1187: 00 00 1189: 48 89 44 24 28 mov %rax,0x28(%rsp) 118e: 31 c0 xor %eax,%eax 1190: 48 8d 74 24 18 lea 0x18(%rsp),%rsi 1195: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi 119a: e8 97 ff ff ff call 1136 <_ZN8CombinedI9SomeClassE9get_constEv.constprop.0> 119f: 48 89 54 24 20 mov %rdx,0x20(%rsp) 11a4: 89 44 24 18 mov %eax,0x18(%rsp) 11a8: e8 89 ff ff ff call 1136 <_ZN8CombinedI9SomeClassE9get_constEv.constprop.0> 11ad: 48 89 54 24 10 mov %rdx,0x10(%rsp) 11b2: 89 44 24 08 mov %eax,0x8(%rsp) 11b6: e8 5f ff ff ff call 111a <_Z13compare_pairsIP10OtherClassEbRKSt4pairIiT_ES6_> 11bb: 48 8b 54 24 28 mov 0x28(%rsp),%rdx 11c0: 64 48 2b 14 25 28 00 sub %fs:0x28,%rdx 11c7: 00 00 11c9: 74 05 je 11d0 <_ZN8CombinedI9SomeClassE6clashyEv.isra.0+0x54> 11cb: e8 70 fe ff ff call 1040 <__stack_chk_fail@plt> 11d0: 48 83 c4 38 add $0x38,%rsp 11d4: c3 ret $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /build/gcc/src/gcc/configure --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 12.2.0 (GCC) $ cat main.cpp void other_func(); int main() { other_func(); }
Here is the unincluded version of the source: #include <utility> template <typename C> bool __attribute__((noinline)) compare_pairs(const std::pair<int, C> &lhs, const std::pair<int, C> &rhs) { return lhs.first == rhs.first && lhs.second == rhs.second; } template <typename B> struct Combined { std::pair<int, B *> __attribute__((noinline)) get_const() { return std::make_pair(123, nullptr); } bool __attribute__((noinline)) clashy() { return compare_pairs(get_const(), get_const()); } }; class SomeClass {}; class OtherClass {}; void some_func() { Combined<SomeClass> myvar; __builtin_printf("%i\n", myvar.clashy()); } void other_func() { Combined<OtherClass> myvar; __builtin_printf("%i\n", myvar.clashy()); }
Here is a single file example which just needs -Os -flto -fno-strict-aliasing: template <class T> struct pair { int first; T second; }; template <typename C> [[gnu::optimize("strict-aliasing")]] bool __attribute__((noinline)) compare_pairs(const pair<C> &lhs, const pair<C> &rhs) { return lhs.first == rhs.first && lhs.second == rhs.second; } template <typename B> struct Combined { pair<B *> __attribute__((noinline)) get_const() { return pair<B *>{123, nullptr}; } [[gnu::optimize("strict-aliasing")]] bool __attribute__((noinline)) clashy() { return compare_pairs(get_const(), get_const()); } }; class SomeClass {}; class OtherClass {}; [[gnu::optimize("strict-aliasing")]] [[gnu::used]] void some_func() { Combined<SomeClass> myvar; __builtin_printf("%i\n", myvar.clashy()); } [[gnu::optimize("strict-aliasing")]] void other_func() { Combined<OtherClass> myvar; __builtin_printf("%i\n", myvar.clashy()); } [[gnu::optimize("O0")]] int main() { other_func(); }
Created attachment 53799 [details] Testcase compile with -O2 -fno-strict-aliasing -flto New self contained testcase.
doesn't reproduce with -fwhole-program instead of -flto or with -flto-partition=max added. -fno-ipa-modref avoids the issue.
Also related to ICF, where -fno-ipa-icf fixes that as well. Lemme take a look.
Running that under valgrind I see it started with r12-656-ga564da506f52be66: ==13643== Conditional jump or move depends on uninitialised value(s) ==13643== at 0x401148: bool compare_pairs<SomeClass*>(pair<SomeClass*> const&, pair<SomeClass*> const&) (pr107467-2.C:12) ==13643== by 0x401176: Combined<OtherClass>::clashy() [clone .isra.0] (pr107467-2.C:23) ==13643== by 0x4011B2: other_func() (pr107467-2.C:40) ==13643== by 0x4011ED: main (pr107467-2.C:48) So I guess it might be something in between DSE and aliasing info investigated by IPA modref. @Honza: Can you take a look?
mine.
> diff -u a.ltrans0.ltrans.115t.vrp1 a.ltrans0.ltrans.116t.dse2 ... __attribute__((noinline, optimize ("strict-aliasing"))) bool clashy.isra () { @@ -90,8 +64,6 @@ bool _1; <bb 2> [local count: 1073741824]: - D.5191 = get_const.isra (); - D.5190 = get_const.isra (); _1 = compare_pairs (&D.5190, &D.5191); so it's ICFed compare_pairs having modref TBAA info that makes the stores dead. I suppose ICF needs to reset / alter the modref summaries?
> > so it's ICFed compare_pairs having modref TBAA info that makes the > stores dead. I suppose ICF needs to reset / alter the modref summaries? Well, matching that ICF does should be enough to verify that modref sumaries are same. Modref only does what would happen after inlining and ICF should be safe WRT both modref and inlining.
GCC 12.3 is being released, retargeting bugs to GCC 12.4.
GCC 12.4 is being released, retargeting bugs to GCC 12.5.
This seems to work on trunk now.
(In reply to Sam James from comment #12) > This seems to work on trunk now. Fixed on trunk by r15-626-ga5b3721c06646b for PR115110, then see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115110#c8 which refers to r15-512-g9b7cad5884f21c which sounds a lot like the fix for this.
Mine for backporting then (I've verified this fixes the attached single-file testcase).
Fixed for 14.3
We should add the testcase as well (I can do it).
(In reply to Sam James from comment #16) > We should add the testcase as well (I can do it). Posted: https://inbox.sourceware.org/gcc-patches/2a3d484399534a1449ac3a29d6089ed62b839a5a.1729885673.git.sam@gentoo.org/
The master branch has been updated by Sam James <sjames@gcc.gnu.org>: https://gcc.gnu.org/g:4e09ae37dbe0a10f48490214f50ff733cc92280a commit r15-4723-g4e09ae37dbe0a10f48490214f50ff733cc92280a Author: Sam James <sam@gentoo.org> Date: Mon Oct 21 12:11:42 2024 +0100 testsuite: add testcase for fixed PR107467 PR107467 ended up being fixed by the fix for PR115110, but let's add the testcase on top. gcc/testsuite/ChangeLog: PR tree-optimization/107467 PR middle-end/115110 * g++.dg/lto/pr107467_0.C: New test.
The releases/gcc-14 branch has been updated by Sam James <sjames@gcc.gnu.org>: https://gcc.gnu.org/g:2f0d109bd871d11b5b93468f271aa6dc34ef88d8 commit r14-10845-g2f0d109bd871d11b5b93468f271aa6dc34ef88d8 Author: Sam James <sam@gentoo.org> Date: Mon Oct 21 12:11:42 2024 +0100 testsuite: add testcase for fixed PR107467 PR107467 ended up being fixed by the fix for PR115110, but let's add the testcase on top. gcc/testsuite/ChangeLog: PR tree-optimization/107467 PR middle-end/115110 * g++.dg/lto/pr107467_0.C: New test.
The releases/gcc-14 branch has been updated by Sam James <sjames@gcc.gnu.org>: https://gcc.gnu.org/g:6247aae35faaf04de48c5f3d9c4e4af6f7e3789c commit r14-10847-g6247aae35faaf04de48c5f3d9c4e4af6f7e3789c Author: Sam James <sam@gentoo.org> Date: Mon Oct 21 12:11:42 2024 +0100 testsuite: add testcase for fixed PR107467 PR107467 ended up being fixed by the fix for PR115110, but let's add the testcase on top. gcc/testsuite/ChangeLog: PR tree-optimization/107467 PR middle-end/115110 * g++.dg/lto/pr107467_0.C: New test. (cherry picked from commit 4e09ae37dbe0a10f48490214f50ff733cc92280a)