Bug 107467 - [12/13/14 Regression] Miscompilation involing -Os , -flto and -fno-strict-aliasing since r12-656-ga564da506f52be66
Summary: [12/13/14 Regression] Miscompilation involing -Os , -flto and -fno-strict-ali...
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.2.0
: P2 normal
Target Milestone: 12.4
Assignee: Jan Hubicka
URL:
Keywords: lto, wrong-code
Depends on:
Blocks:
 
Reported: 2022-10-30 22:10 UTC by Volker Weißmann
Modified: 2023-06-06 23:07 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work: 10.1.0, 11.1.0, 11.3.0
Known to fail: 12.1.0
Last reconfirmed: 2023-06-06 00:00:00


Attachments
Source Code (17.45 KB, text/plain)
2022-10-30 22:10 UTC, Volker Weißmann
Details
Testcase compile with -O2 -fno-strict-aliasing -flto (409 bytes, text/plain)
2022-10-30 23:39 UTC, Andrew Pinski
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Volker Weißmann 2022-10-30 22:10:16 UTC
Created attachment 53798 [details]
Source Code

g++ -g Record.cpp -c -fPIC -Os -flto
g++ -g -shared Record.o -fno-strict-aliasing -o librecord.so
g++ main.cpp -L. -lrecord
LD_LIBRARY_PATH=$PWD ./a.out

This sometimes prints 1 and sometimes prints 0.
If I remove the -fno-strict-aliasing flag of the second g++ command, or add it to the first g++ command, it always prints 1.

There is no violation of the strict-aliasing rule in Record.cpp, at least none that I could spot.

If you are afraid of non-reproducibility, fear not, the generated assembly is the same every time, and is obviously wrong:

The method Combined<OtherClass>::clashy() is missing a call to get_const

objdump librecord.so --disassemble=_ZN8CombinedI10OtherClassE6clashyEv.isra.0
000000000000113e <_ZN8CombinedI10OtherClassE6clashyEv.isra.0>:
    113e:	48 83 ec 38          	sub    $0x38,%rsp
    1142:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
    1149:	00 00 
    114b:	48 89 44 24 28       	mov    %rax,0x28(%rsp)
    1150:	31 c0                	xor    %eax,%eax
    1152:	48 8d 74 24 18       	lea    0x18(%rsp),%rsi
    1157:	48 8d 7c 24 08       	lea    0x8(%rsp),%rdi
    115c:	e8 b9 ff ff ff       	call   111a <_Z13compare_pairsIP10OtherClassEbRKSt4pairIiT_ES6_>
    1161:	48 8b 54 24 28       	mov    0x28(%rsp),%rdx
    1166:	64 48 2b 14 25 28 00 	sub    %fs:0x28,%rdx
    116d:	00 00 
    116f:	74 05                	je     1176 <_ZN8CombinedI10OtherClassE6clashyEv.isra.0+0x38>
    1171:	e8 ca fe ff ff       	call   1040 <__stack_chk_fail@plt>
    1176:	48 83 c4 38          	add    $0x38,%rsp
    117a:	c3                   	ret


The almost identical Combined<SomeClass>::clashy() gets compiled correctly:

objdump librecord.so --disassemble=_ZN8CombinedI9SomeClassE6clashyEv.isra.0
000000000000117c <_ZN8CombinedI9SomeClassE6clashyEv.isra.0>:
    117c:	48 83 ec 38          	sub    $0x38,%rsp
    1180:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
    1187:	00 00 
    1189:	48 89 44 24 28       	mov    %rax,0x28(%rsp)
    118e:	31 c0                	xor    %eax,%eax
    1190:	48 8d 74 24 18       	lea    0x18(%rsp),%rsi
    1195:	48 8d 7c 24 08       	lea    0x8(%rsp),%rdi
    119a:	e8 97 ff ff ff       	call   1136 <_ZN8CombinedI9SomeClassE9get_constEv.constprop.0>
    119f:	48 89 54 24 20       	mov    %rdx,0x20(%rsp)
    11a4:	89 44 24 18          	mov    %eax,0x18(%rsp)
    11a8:	e8 89 ff ff ff       	call   1136 <_ZN8CombinedI9SomeClassE9get_constEv.constprop.0>
    11ad:	48 89 54 24 10       	mov    %rdx,0x10(%rsp)
    11b2:	89 44 24 08          	mov    %eax,0x8(%rsp)
    11b6:	e8 5f ff ff ff       	call   111a <_Z13compare_pairsIP10OtherClassEbRKSt4pairIiT_ES6_>
    11bb:	48 8b 54 24 28       	mov    0x28(%rsp),%rdx
    11c0:	64 48 2b 14 25 28 00 	sub    %fs:0x28,%rdx
    11c7:	00 00 
    11c9:	74 05                	je     11d0 <_ZN8CombinedI9SomeClassE6clashyEv.isra.0+0x54>
    11cb:	e8 70 fe ff ff       	call   1040 <__stack_chk_fail@plt>
    11d0:	48 83 c4 38          	add    $0x38,%rsp
    11d4:	c3                   	ret


$ gcc -v                                                                          
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (GCC) 

$ cat main.cpp
void other_func();
int main() { other_func(); }
Comment 1 Andrew Pinski 2022-10-30 23:22:52 UTC
Here is the unincluded version of the source:
#include <utility>

template <typename C>
bool __attribute__((noinline))
compare_pairs(const std::pair<int, C> &lhs, const std::pair<int, C> &rhs) {
  return lhs.first == rhs.first && lhs.second == rhs.second;
}

template <typename B> struct Combined {
  std::pair<int, B *> __attribute__((noinline)) get_const() {
    return std::make_pair(123, nullptr);
  }
  bool __attribute__((noinline)) clashy() {
    return compare_pairs(get_const(), get_const());
  }
};

class SomeClass {};
class OtherClass {};

void some_func() {
  Combined<SomeClass> myvar;
  __builtin_printf("%i\n", myvar.clashy());
}

void other_func() {
  Combined<OtherClass> myvar;
  __builtin_printf("%i\n", myvar.clashy());
}
Comment 2 Andrew Pinski 2022-10-30 23:36:28 UTC
Here is a single file example which just needs -Os -flto -fno-strict-aliasing:

template <class T>
struct pair
{
    int first;
    T second;
};

template <typename C>
[[gnu::optimize("strict-aliasing")]]
bool __attribute__((noinline))
compare_pairs(const pair<C> &lhs, const pair<C> &rhs) {
  return lhs.first == rhs.first && lhs.second == rhs.second;
}

template <typename B> struct Combined {
  pair<B *> 
__attribute__((noinline)) get_const() {
    return pair<B *>{123, nullptr};
  }
[[gnu::optimize("strict-aliasing")]]
  bool 
__attribute__((noinline)) clashy() {
    return compare_pairs(get_const(), get_const());
  }
};

class SomeClass {};
class OtherClass {};

[[gnu::optimize("strict-aliasing")]]
[[gnu::used]]
void some_func() {
  Combined<SomeClass> myvar;
  __builtin_printf("%i\n", myvar.clashy());
}

[[gnu::optimize("strict-aliasing")]]
void other_func() {
  Combined<OtherClass> myvar;
  __builtin_printf("%i\n", myvar.clashy());
}

[[gnu::optimize("O0")]]
int main()
{
  other_func();
}
Comment 3 Andrew Pinski 2022-10-30 23:39:19 UTC
Created attachment 53799 [details]
Testcase compile with -O2 -fno-strict-aliasing -flto

New self contained testcase.
Comment 4 Richard Biener 2022-11-05 10:29:51 UTC
doesn't reproduce with -fwhole-program instead of -flto or with -flto-partition=max added.  -fno-ipa-modref avoids the issue.
Comment 5 Martin Liška 2022-11-21 13:23:19 UTC
Also related to ICF, where -fno-ipa-icf fixes that as well. Lemme take a look.
Comment 6 Martin Liška 2022-12-28 10:07:49 UTC
Running that under valgrind I see it started with r12-656-ga564da506f52be66:

==13643== Conditional jump or move depends on uninitialised value(s)
==13643==    at 0x401148: bool compare_pairs<SomeClass*>(pair<SomeClass*> const&, pair<SomeClass*> const&) (pr107467-2.C:12)
==13643==    by 0x401176: Combined<OtherClass>::clashy() [clone .isra.0] (pr107467-2.C:23)
==13643==    by 0x4011B2: other_func() (pr107467-2.C:40)
==13643==    by 0x4011ED: main (pr107467-2.C:48)

So I guess it might be something in between DSE and aliasing info investigated by IPA modref.

@Honza: Can you take a look?
Comment 7 Jan Hubicka 2023-01-04 17:29:47 UTC
mine.
Comment 8 Richard Biener 2023-01-13 11:47:40 UTC
> diff -u a.ltrans0.ltrans.115t.vrp1 a.ltrans0.ltrans.116t.dse2
...
 __attribute__((noinline, optimize ("strict-aliasing")))
 bool clashy.isra ()
 {
@@ -90,8 +64,6 @@
   bool _1;
 
   <bb 2> [local count: 1073741824]:
-  D.5191 = get_const.isra ();
-  D.5190 = get_const.isra ();
   _1 = compare_pairs (&D.5190, &D.5191);

so it's ICFed compare_pairs having modref TBAA info that makes the
stores dead.  I suppose ICF needs to reset / alter the modref summaries?
Comment 9 Jan Hubicka 2023-01-13 22:08:36 UTC
> 
> so it's ICFed compare_pairs having modref TBAA info that makes the
> stores dead.  I suppose ICF needs to reset / alter the modref summaries?

Well, matching that ICF does should be enough to verify that modref
sumaries are same.  Modref only does what would happen after inlining
and ICF should be safe WRT both modref and inlining.
Comment 10 Richard Biener 2023-05-08 12:25:48 UTC
GCC 12.3 is being released, retargeting bugs to GCC 12.4.