Bug 107467 - [12/13 Regression] Miscompilation involving -Os, flto, and -fno-strict-aliasing since r12-656-ga564da506f52be66
Summary: [12/13 Regression] Miscompilation involving -Os, flto, and -fno-strict-aliasi...
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.2.0
: P2 normal
Target Milestone: 12.5
Assignee: Richard Biener
URL:
Keywords: lto, wrong-code
Depends on:
Blocks:
 
Reported: 2022-10-30 22:10 UTC by Volker Weißmann
Modified: 2024-10-28 17:07 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 10.1.0, 11.1.0, 11.3.0, 14.2.1, 15.0
Known to fail: 12.1.0, 14.2.0
Last reconfirmed: 2023-06-06 00:00:00


Attachments
Source Code (17.45 KB, text/plain)
2022-10-30 22:10 UTC, Volker Weißmann
Details
Testcase compile with -O2 -fno-strict-aliasing -flto (409 bytes, text/plain)
2022-10-30 23:39 UTC, Andrew Pinski
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Volker Weißmann 2022-10-30 22:10:16 UTC
Created attachment 53798 [details]
Source Code

g++ -g Record.cpp -c -fPIC -Os -flto
g++ -g -shared Record.o -fno-strict-aliasing -o librecord.so
g++ main.cpp -L. -lrecord
LD_LIBRARY_PATH=$PWD ./a.out

This sometimes prints 1 and sometimes prints 0.
If I remove the -fno-strict-aliasing flag of the second g++ command, or add it to the first g++ command, it always prints 1.

There is no violation of the strict-aliasing rule in Record.cpp, at least none that I could spot.

If you are afraid of non-reproducibility, fear not, the generated assembly is the same every time, and is obviously wrong:

The method Combined<OtherClass>::clashy() is missing a call to get_const

objdump librecord.so --disassemble=_ZN8CombinedI10OtherClassE6clashyEv.isra.0
000000000000113e <_ZN8CombinedI10OtherClassE6clashyEv.isra.0>:
    113e:	48 83 ec 38          	sub    $0x38,%rsp
    1142:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
    1149:	00 00 
    114b:	48 89 44 24 28       	mov    %rax,0x28(%rsp)
    1150:	31 c0                	xor    %eax,%eax
    1152:	48 8d 74 24 18       	lea    0x18(%rsp),%rsi
    1157:	48 8d 7c 24 08       	lea    0x8(%rsp),%rdi
    115c:	e8 b9 ff ff ff       	call   111a <_Z13compare_pairsIP10OtherClassEbRKSt4pairIiT_ES6_>
    1161:	48 8b 54 24 28       	mov    0x28(%rsp),%rdx
    1166:	64 48 2b 14 25 28 00 	sub    %fs:0x28,%rdx
    116d:	00 00 
    116f:	74 05                	je     1176 <_ZN8CombinedI10OtherClassE6clashyEv.isra.0+0x38>
    1171:	e8 ca fe ff ff       	call   1040 <__stack_chk_fail@plt>
    1176:	48 83 c4 38          	add    $0x38,%rsp
    117a:	c3                   	ret


The almost identical Combined<SomeClass>::clashy() gets compiled correctly:

objdump librecord.so --disassemble=_ZN8CombinedI9SomeClassE6clashyEv.isra.0
000000000000117c <_ZN8CombinedI9SomeClassE6clashyEv.isra.0>:
    117c:	48 83 ec 38          	sub    $0x38,%rsp
    1180:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
    1187:	00 00 
    1189:	48 89 44 24 28       	mov    %rax,0x28(%rsp)
    118e:	31 c0                	xor    %eax,%eax
    1190:	48 8d 74 24 18       	lea    0x18(%rsp),%rsi
    1195:	48 8d 7c 24 08       	lea    0x8(%rsp),%rdi
    119a:	e8 97 ff ff ff       	call   1136 <_ZN8CombinedI9SomeClassE9get_constEv.constprop.0>
    119f:	48 89 54 24 20       	mov    %rdx,0x20(%rsp)
    11a4:	89 44 24 18          	mov    %eax,0x18(%rsp)
    11a8:	e8 89 ff ff ff       	call   1136 <_ZN8CombinedI9SomeClassE9get_constEv.constprop.0>
    11ad:	48 89 54 24 10       	mov    %rdx,0x10(%rsp)
    11b2:	89 44 24 08          	mov    %eax,0x8(%rsp)
    11b6:	e8 5f ff ff ff       	call   111a <_Z13compare_pairsIP10OtherClassEbRKSt4pairIiT_ES6_>
    11bb:	48 8b 54 24 28       	mov    0x28(%rsp),%rdx
    11c0:	64 48 2b 14 25 28 00 	sub    %fs:0x28,%rdx
    11c7:	00 00 
    11c9:	74 05                	je     11d0 <_ZN8CombinedI9SomeClassE6clashyEv.isra.0+0x54>
    11cb:	e8 70 fe ff ff       	call   1040 <__stack_chk_fail@plt>
    11d0:	48 83 c4 38          	add    $0x38,%rsp
    11d4:	c3                   	ret


$ gcc -v                                                                          
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (GCC) 

$ cat main.cpp
void other_func();
int main() { other_func(); }
Comment 1 Andrew Pinski 2022-10-30 23:22:52 UTC
Here is the unincluded version of the source:
#include <utility>

template <typename C>
bool __attribute__((noinline))
compare_pairs(const std::pair<int, C> &lhs, const std::pair<int, C> &rhs) {
  return lhs.first == rhs.first && lhs.second == rhs.second;
}

template <typename B> struct Combined {
  std::pair<int, B *> __attribute__((noinline)) get_const() {
    return std::make_pair(123, nullptr);
  }
  bool __attribute__((noinline)) clashy() {
    return compare_pairs(get_const(), get_const());
  }
};

class SomeClass {};
class OtherClass {};

void some_func() {
  Combined<SomeClass> myvar;
  __builtin_printf("%i\n", myvar.clashy());
}

void other_func() {
  Combined<OtherClass> myvar;
  __builtin_printf("%i\n", myvar.clashy());
}
Comment 2 Andrew Pinski 2022-10-30 23:36:28 UTC
Here is a single file example which just needs -Os -flto -fno-strict-aliasing:

template <class T>
struct pair
{
    int first;
    T second;
};

template <typename C>
[[gnu::optimize("strict-aliasing")]]
bool __attribute__((noinline))
compare_pairs(const pair<C> &lhs, const pair<C> &rhs) {
  return lhs.first == rhs.first && lhs.second == rhs.second;
}

template <typename B> struct Combined {
  pair<B *> 
__attribute__((noinline)) get_const() {
    return pair<B *>{123, nullptr};
  }
[[gnu::optimize("strict-aliasing")]]
  bool 
__attribute__((noinline)) clashy() {
    return compare_pairs(get_const(), get_const());
  }
};

class SomeClass {};
class OtherClass {};

[[gnu::optimize("strict-aliasing")]]
[[gnu::used]]
void some_func() {
  Combined<SomeClass> myvar;
  __builtin_printf("%i\n", myvar.clashy());
}

[[gnu::optimize("strict-aliasing")]]
void other_func() {
  Combined<OtherClass> myvar;
  __builtin_printf("%i\n", myvar.clashy());
}

[[gnu::optimize("O0")]]
int main()
{
  other_func();
}
Comment 3 Andrew Pinski 2022-10-30 23:39:19 UTC
Created attachment 53799 [details]
Testcase compile with -O2 -fno-strict-aliasing -flto

New self contained testcase.
Comment 4 Richard Biener 2022-11-05 10:29:51 UTC
doesn't reproduce with -fwhole-program instead of -flto or with -flto-partition=max added.  -fno-ipa-modref avoids the issue.
Comment 5 Martin Liška 2022-11-21 13:23:19 UTC
Also related to ICF, where -fno-ipa-icf fixes that as well. Lemme take a look.
Comment 6 Martin Liška 2022-12-28 10:07:49 UTC
Running that under valgrind I see it started with r12-656-ga564da506f52be66:

==13643== Conditional jump or move depends on uninitialised value(s)
==13643==    at 0x401148: bool compare_pairs<SomeClass*>(pair<SomeClass*> const&, pair<SomeClass*> const&) (pr107467-2.C:12)
==13643==    by 0x401176: Combined<OtherClass>::clashy() [clone .isra.0] (pr107467-2.C:23)
==13643==    by 0x4011B2: other_func() (pr107467-2.C:40)
==13643==    by 0x4011ED: main (pr107467-2.C:48)

So I guess it might be something in between DSE and aliasing info investigated by IPA modref.

@Honza: Can you take a look?
Comment 7 Jan Hubicka 2023-01-04 17:29:47 UTC
mine.
Comment 8 Richard Biener 2023-01-13 11:47:40 UTC
> diff -u a.ltrans0.ltrans.115t.vrp1 a.ltrans0.ltrans.116t.dse2
...
 __attribute__((noinline, optimize ("strict-aliasing")))
 bool clashy.isra ()
 {
@@ -90,8 +64,6 @@
   bool _1;
 
   <bb 2> [local count: 1073741824]:
-  D.5191 = get_const.isra ();
-  D.5190 = get_const.isra ();
   _1 = compare_pairs (&D.5190, &D.5191);

so it's ICFed compare_pairs having modref TBAA info that makes the
stores dead.  I suppose ICF needs to reset / alter the modref summaries?
Comment 9 Jan Hubicka 2023-01-13 22:08:36 UTC
> 
> so it's ICFed compare_pairs having modref TBAA info that makes the
> stores dead.  I suppose ICF needs to reset / alter the modref summaries?

Well, matching that ICF does should be enough to verify that modref
sumaries are same.  Modref only does what would happen after inlining
and ICF should be safe WRT both modref and inlining.
Comment 10 Richard Biener 2023-05-08 12:25:48 UTC
GCC 12.3 is being released, retargeting bugs to GCC 12.4.
Comment 11 Richard Biener 2024-06-20 09:10:41 UTC
GCC 12.4 is being released, retargeting bugs to GCC 12.5.
Comment 12 Sam James 2024-10-19 08:50:41 UTC
This seems to work on trunk now.
Comment 13 Sam James 2024-10-19 09:43:19 UTC
(In reply to Sam James from comment #12)
> This seems to work on trunk now.

Fixed on trunk by r15-626-ga5b3721c06646b for PR115110, then see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115110#c8 which refers to r15-512-g9b7cad5884f21c which sounds a lot like the fix for this.
Comment 14 Richard Biener 2024-10-21 07:54:45 UTC
Mine for backporting then (I've verified this fixes the attached single-file testcase).
Comment 15 Richard Biener 2024-10-21 10:50:11 UTC
Fixed for 14.3
Comment 16 Sam James 2024-10-21 10:52:48 UTC
We should add the testcase as well (I can do it).
Comment 17 Sam James 2024-10-25 19:49:05 UTC
(In reply to Sam James from comment #16)
> We should add the testcase as well (I can do it).

Posted: https://inbox.sourceware.org/gcc-patches/2a3d484399534a1449ac3a29d6089ed62b839a5a.1729885673.git.sam@gentoo.org/
Comment 18 GCC Commits 2024-10-28 17:05:32 UTC
The master branch has been updated by Sam James <sjames@gcc.gnu.org>:

https://gcc.gnu.org/g:4e09ae37dbe0a10f48490214f50ff733cc92280a

commit r15-4723-g4e09ae37dbe0a10f48490214f50ff733cc92280a
Author: Sam James <sam@gentoo.org>
Date:   Mon Oct 21 12:11:42 2024 +0100

    testsuite: add testcase for fixed PR107467
    
    PR107467 ended up being fixed by the fix for PR115110, but let's
    add the testcase on top.
    
    gcc/testsuite/ChangeLog:
            PR tree-optimization/107467
            PR middle-end/115110
    
            * g++.dg/lto/pr107467_0.C: New test.
Comment 19 GCC Commits 2024-10-28 17:06:16 UTC
The releases/gcc-14 branch has been updated by Sam James <sjames@gcc.gnu.org>:

https://gcc.gnu.org/g:2f0d109bd871d11b5b93468f271aa6dc34ef88d8

commit r14-10845-g2f0d109bd871d11b5b93468f271aa6dc34ef88d8
Author: Sam James <sam@gentoo.org>
Date:   Mon Oct 21 12:11:42 2024 +0100

    testsuite: add testcase for fixed PR107467
    
    PR107467 ended up being fixed by the fix for PR115110, but let's
    add the testcase on top.
    
    gcc/testsuite/ChangeLog:
            PR tree-optimization/107467
            PR middle-end/115110
    
            * g++.dg/lto/pr107467_0.C: New test.
Comment 20 GCC Commits 2024-10-28 17:07:00 UTC
The releases/gcc-14 branch has been updated by Sam James <sjames@gcc.gnu.org>:

https://gcc.gnu.org/g:6247aae35faaf04de48c5f3d9c4e4af6f7e3789c

commit r14-10847-g6247aae35faaf04de48c5f3d9c4e4af6f7e3789c
Author: Sam James <sam@gentoo.org>
Date:   Mon Oct 21 12:11:42 2024 +0100

    testsuite: add testcase for fixed PR107467
    
    PR107467 ended up being fixed by the fix for PR115110, but let's
    add the testcase on top.
    
    gcc/testsuite/ChangeLog:
            PR tree-optimization/107467
            PR middle-end/115110
    
            * g++.dg/lto/pr107467_0.C: New test.
    
    (cherry picked from commit 4e09ae37dbe0a10f48490214f50ff733cc92280a)