[528] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/11.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap --prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.1 20210414 (experimental) [master revision 006783f4b16:29da9c11552:0589be0c59767cf4cbb0ef0e7d918cf6aa3d606c] (GCC)
[529] %
[529] % gcctk -O2 -S -o O2.s small.c
[530] % gcctk -O3 -S -o O3.s small.c
[531] %
[531] % wc O2.s O3.s
 101  229 1393 O2.s
 145  329 2038 O3.s
 246  558 3431 total
[532] %
[532] % grep foo O2.s
[533] % grep foo O3.s
        call    foo
[534] %
[534] % cat small.c
extern void foo(void);
volatile int a, b, h;
int *c, d[4], f, i, j;
long g;
static unsigned e() {
  int k;
  while (b) {
    for (k = 0; k < 4; k++) {
      d[k] && a;
      h = j ? 0 : i;
    }
    *c = 0;
  }
  return 0;
}
int main() {
  for (f = 0; f < 5; f++)
    g = 1;
  if (!e() ^ g)
    foo();
  return 0;
}
Confirmed. At -O2, PRE manages to optimize away the call to foo. The difference starts at cunrolli, where -O3 unrolls the loop but -O2 does not; disabling cunrolli (for example with -fdisable-tree-cunrolli) restores the optimization.
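For reference, the call to foo is dead because e() always returns 0 and g is always 1 once the loop in main has run, so the condition is (!0) ^ 1 == 0. The following is only a reduced sketch of that reasoning, not the original testcase: the body of the while loop is dropped, foo is replaced by a hypothetical counter (foo_calls), and run_once is a made-up helper standing in for main.

```c
/* Sketch only: mirrors the control flow of small.c, loop body removed. */

static volatile int b;   /* zero-initialized, as in small.c */
static long g;
static int f;
static int foo_calls;    /* hypothetical counter replacing the extern foo() */

static void foo(void) { foo_calls++; }

/* Same shape as e() in the report: the loop cannot change the result. */
static unsigned e(void)
{
    while (b) {
        /* body omitted in this sketch; it does not affect the return value */
    }
    return 0;
}

/* Hypothetical stand-in for main(): returns how often foo() was called. */
static int run_once(void)
{
    foo_calls = 0;
    for (f = 0; f < 5; f++)
        g = 1;
    if (!e() ^ g)        /* parsed as (!e()) ^ g, i.e. 1 ^ 1 == 0 */
        foo();
    return foo_calls;
}
```

Calling run_once() from a small main should report zero calls to foo, which is the fact the -O2 pipeline proves and the -O3 pipeline misses here.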
This seems to be fixed in GCC 12.
It looks like the jump threading differences between GCC 11 and GCC 12 are what fix this.