GCC with LTO seems to hoist a memory read above the null check that guards it. It only seems to reproduce with LTO, so please excuse posting multiple files.

Compile command:

    gcc -flto -O2 -fno-strict-aliasing one.c two.c three.c four.c

//--------------- one.c --------------------------------
typedef unsigned long VALUE;

__attribute__ ((cold))
void rb_check_type(VALUE, int);

static VALUE
repro(VALUE dummy, VALUE hash)
{
    if (hash == 0) {
        rb_check_type(hash, 1);
    }
    else if (*(long *)hash) {
        rb_check_type(hash, 1);
    }

    return *(long *)hash;
}

static VALUE (*that)(VALUE dummy, VALUE hash) = repro;

int
main(int argc, char **argv)
{
    argc--;
    that(0, argc);

    rb_check_type(argc, argc);
}
//------------ end of one.c ----------------------------

//------------ two.c -----------------------------------
typedef unsigned long VALUE;

__attribute__ ((noreturn)) void rexc_raise(VALUE mesg);

VALUE rb_donothing(VALUE klass);

static void
funexpected_type(VALUE x, int xt, int t)
{
    rexc_raise(rb_donothing(0));
}

__attribute__ ((cold))
void
rb_check_type(VALUE x, int t)
{
    int xt;

    if (x == 0) {
        funexpected_type(x, xt, t);
    }
}
//------------- end of two.c ---------------------------

//------------ three.c ---------------------------------
typedef unsigned long VALUE;

static void thing(void) {}
static void (*ptr)(void) = &thing;

VALUE
rb_donothing(VALUE klass)
{
    ptr();
    return 0;
}
//-------- end of three.c ------------------------------

//-------- four.c --------------------------------------
typedef unsigned long VALUE;

__attribute__((noreturn))
void
rexc_raise(VALUE mesg)
{
    __builtin_exit(42);
}
//------------- end of four.c --------------------------

The code for repro() reads from memory before doing the check for zero:

   0x00000000004011a0 <+0>:     sub    $0x18,%rsp
=> 0x00000000004011a4 <+4>:     mov    (%rsi),%rax
   0x00000000004011a7 <+7>:     test   %rsi,%rsi
   0x00000000004011aa <+10>:    je     0x401051 <repro.cold>
   0x00000000004011b0 <+16>:    test   %rax,%rax
   0x00000000004011b3 <+19>:    jne    0x401067 <repro.cold+22>
   0x00000000004011b9 <+25>:    add    $0x18,%rsp
   0x00000000004011bd <+29>:    ret

Here is the output of gcc -v. I'm using the 11.2.0 Docker Hub image.

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-linux-gnu/11.2.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: /usr/src/gcc/configure --build=x86_64-linux-gnu --disable-multilib --enable-languages=c,c++,fortran,go
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.2.0 (GCC)
Works for me on the trunk:

repro:
.LFB9:
        .cfi_startproc
        subq    $24, %rsp
        testq   %rsi, %rsi
        je      .L14
        movq    (%rsi), %rax
        testq   %rax, %rax
        jne     .L15
.L10:
        addq    $24, %rsp
...
.L14:
        xorl    %edi, %edi
        call    rb_check_type.isra.0
        movq    0, %rax
        jmp     .L10
.L15:
        movq    %rsi, %rdi
        movq    %rax, 8(%rsp)
        call    rb_check_type.isra.0
        movq    8(%rsp), %rax
        jmp     .L10
(In reply to Andrew Pinski from comment #1)
> Works for me on the trunk:

I almost want to say this was fixed by PR 101373.

Before PRE we had:

  if (hash_6(D) == 0)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 3> [count: 0]:
  rb_check_type.isra (0);
  goto <bb 6>; [0.00%]

  <bb 4> [local count: 1073741824]:
  _1 = (long int *) hash_6(D);
  _2 = *_1;
  if (_2 != 0)
    goto <bb 5>; [0.00%]
  else
    goto <bb 6>; [100.00%]

  <bb 5> [count: 0]:
  rb_check_type.isra (hash_6(D));

  <bb 6> [local count: 1073741824]:
  _3 = (long int *) hash_6(D);
  _4 = *_3;

PRE is able to figure out that _3 and _1 are the same, and even that *_3 and *_1 would be the same. The catch is that rb_check_type.isra (hash_6(D)) may not return, depending on its argument, even though it is otherwise a "pure" function.
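To make the hazard concrete, here is a minimal single-file sketch of the same control flow. It is not the testcase from this PR: check_type(), repro_model(), run_repro() and the jmp_buf named bail are hypothetical names, and longjmp stands in for the exit(42) path so that the non-returning case can be observed without terminating the process.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical escape hatch standing in for the exit() call in four.c. */
static jmp_buf bail;

/* May not return: leaves via longjmp when x is null. */
static void check_type(const long *x)
{
    if (x == NULL)
        longjmp(bail, 1);
}

/* Same shape as repro(). For a null hash, check_type() never returns,
   so the final load is never reached in the source program. Hoisting
   that load above the branch would make it execute (and fault) first. */
static long repro_model(const long *hash)
{
    if (hash == NULL)
        check_type(hash);
    else if (*hash)
        check_type(hash);
    return *hash;
}

/* Returns 1 and stores the result if repro_model() returned normally,
   0 if check_type() bailed out before the final load. */
static int run_repro(const long *hash, long *out)
{
    if (setjmp(bail))
        return 0;
    *out = repro_model(hash);
    return 1;
}
```

With a correct compiler, run_repro(NULL, &out) returns 0 without ever touching memory through the null pointer, which is the behaviour the hoisted load in the 11.2.0 output breaks.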
(In reply to Andrew Pinski from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > Works for me on the trunk:
>
> I almost want to say this was fixed by PR 101373.

Yes, I can confirm that it was fixed with r12-2254-gfedcf3c476aff753.
@Richi: Can we close it as dup?
(In reply to Martin Liška from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > Works for me on the trunk:
> >
> > I almost want to say this was fixed by PR 101373.
>
> Yes, I can confirm that it was fixed with r12-2254-gfedcf3c476aff753.
> @Richi: Can we close it as dup?

We should at least put this as a testcase. I suspect it is a regression from when -fcode-hoisting was added to GCC too.
(In reply to Martin Liška from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > Works for me on the trunk:
> >
> > I almost want to say this was fixed by PR 101373.
>
> Yes, I can confirm that it was fixed with r12-2254-gfedcf3c476aff753.
> @Richi: Can we close it as dup?

Yes, can you add the testcase?
> Yes, can you add the testcase?

Sure.
Is there any chance that this fix could be backported to 11 or is it too risky?
(In reply to Marek Polacek from comment #7)
> Is there any chance that this fix could be backported to 11 or is it too
> risky?

To fix this bug it should be enough to backport the following part:

        * tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping
        references when the BB may not return.

I'll check and do that.
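As a rough illustration of what that ChangeLog entry describes, the rule can be modelled like this. It is a toy model only, not the code in tree-ssa-pre.c: struct expr, struct block, their fields and prune_model() are all hypothetical and far simpler than PRE's real value-numbering data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a PRE expression. */
struct expr {
    bool is_mem_ref;   /* expression reads memory */
    bool may_trap;     /* the read could fault, e.g. *p with p possibly null */
    bool clobbered;    /* a store in the block may change the value read */
};

/* Hypothetical stand-in for a basic block. */
struct block {
    bool may_not_return;   /* contains a call that might not return */
};

/* Compacts exprs[0..n) in place, dropping every memory reference that
   must not be moved across bb. Returns the number of entries kept. */
static size_t prune_model(const struct block *bb, struct expr *exprs, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        struct expr e = exprs[i];
        bool remove = false;

        if (e.is_mem_ref) {
            if (e.clobbered)
                remove = true;    /* rule that existed before the fix */
            else if (bb->may_not_return && e.may_trap)
                remove = true;    /* rule modelled on the ChangeLog entry */
        }
        if (!remove)
            exprs[kept++] = e;
    }
    return kept;
}
```

In this model the load of *hash in repro() is a trapping memory reference and the block calling rb_check_type may not return, so the load is pruned and can no longer be hoisted above the call.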
I also have a testcase for the testsuite.
The releases/gcc-11 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0

commit r11-8875-gee875b63b22e30a0dcb4b05f7532c2c416ba6cd0
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Aug 17 08:38:35 2021 +0200

    tree-optimization/101868 - avoid PRE of trapping mems across calls

    This backports a fix for the omission of a check of trapping mems
    when hoisting them across calls that might not return. This was
    originally done as part of a fix to handle const functions that throw
    properly.

    2021-08-17  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/101373
            PR tree-optimization/101868
            * tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping
            references when the BB may not return.

            * gcc.dg/lto/pr101868_0.c: New testcase.
            * gcc.dg/lto/pr101868_1.c: Likewise.
            * gcc.dg/lto/pr101868_2.c: Likewise.
            * gcc.dg/lto/pr101868_3.c: Likewise.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:3ed779689631ff8f398dcde06d5efa2a3c43ef27

commit r12-2943-g3ed779689631ff8f398dcde06d5efa2a3c43ef27
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Aug 17 11:23:06 2021 +0200

    tree-optimization/101868 - avoid PRE of trapping mems across calls

    This adds the testcase from the fix for the PR.

    2021-08-17  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/101868
            * gcc.dg/lto/pr101868_0.c: New testcase.
            * gcc.dg/lto/pr101868_1.c: Likewise.
            * gcc.dg/lto/pr101868_2.c: Likewise.
            * gcc.dg/lto/pr101868_3.c: Likewise.
The releases/gcc-10 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:95a95ec274cd0ec125ce48ab002fad4e400e345b

commit r10-10206-g95a95ec274cd0ec125ce48ab002fad4e400e345b
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Aug 17 08:38:35 2021 +0200

    tree-optimization/101868 - avoid PRE of trapping mems across calls

    This backports a fix for the omission of a check of trapping mems
    when hoisting them across calls that might not return. This was
    originally done as part of a fix to handle const functions that throw
    properly.

    2021-08-17  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/101373
            PR tree-optimization/101868
            * tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping
            references when the BB may not return.

            * gcc.dg/lto/pr101868_0.c: New testcase.
            * gcc.dg/lto/pr101868_1.c: Likewise.
            * gcc.dg/lto/pr101868_2.c: Likewise.
            * gcc.dg/lto/pr101868_3.c: Likewise.

    (cherry picked from commit ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0)
The releases/gcc-9 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:2498de689b735422ef71d93e2afe7ae3e6988bb3

commit r9-9818-g2498de689b735422ef71d93e2afe7ae3e6988bb3
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Aug 17 08:38:35 2021 +0200

    tree-optimization/101868 - avoid PRE of trapping mems across calls

    This backports a fix for the omission of a check of trapping mems
    when hoisting them across calls that might not return. This was
    originally done as part of a fix to handle const functions that throw
    properly.

    2021-08-17  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/101373
            PR tree-optimization/101868
            * tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping
            references when the BB may not return.

            * gcc.dg/lto/pr101868_0.c: New testcase.
            * gcc.dg/lto/pr101868_1.c: Likewise.
            * gcc.dg/lto/pr101868_2.c: Likewise.
            * gcc.dg/lto/pr101868_3.c: Likewise.

    (cherry picked from commit ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0)
Fixed.
The master branch has been updated by Dimitar Dimitrov <dimitar@gcc.gnu.org>:

https://gcc.gnu.org/g:b1d0d3520e96802dee37e8fc1c56e19c13d598b1

commit r13-1257-gb1d0d3520e96802dee37e8fc1c56e19c13d598b1
Author: Dimitar Dimitrov <dimitar@dinux.eu>
Date:   Sun May 15 17:30:52 2022 +0300

    testsuite: Remove reliance on argc in lto/pr101868_0.c

    Some embedded targets do not pass any argv arguments. When argc is
    zero, this causes spurious failures for lto/pr101868_0.c. Fix by
    following the strategy in r0-114701-g2c49569ecea56d. Use a volatile
    variable instead of argc to inject a runtime value into the test.

    I validated the following:
      - No changes in testresults for x86_64-pc-linux-gnu.
      - The spurious failures are fixed for PRU target.
      - lto/pr101868_0.c still fails on x86_64-pc-linux-gnu, if the
        PR/101868 fix (r12-2254-gfedcf3c476aff7) is reverted.

            PR tree-optimization/101868

    gcc/testsuite/ChangeLog:

            * gcc.dg/lto/pr101868_0.c (zero): New volatile variable.
            (main): Use it instead of argc.

    Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
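For readers unfamiliar with the volatile trick, the difference between the two ways of injecting a runtime zero can be sketched as follows. Only the variable name zero comes from the ChangeLog above. The helpers from_argc() and from_volatile() are hypothetical and are not part of the committed testcase.

```c
typedef unsigned long VALUE;

/* Old approach: mirrors the "argc--" in the original main(). The result
   is 0 only if the runtime starts the program with argc == 1. */
static VALUE from_argc(int argc)
{
    argc--;
    return (VALUE) argc;
}

/* New approach: the compiler must load a volatile object at run time,
   so it cannot constant-fold the value, yet the value is 0 on every
   target regardless of how main() is invoked. */
static volatile int zero = 0;

static VALUE from_volatile(void)
{
    return (VALUE) zero;
}
```

On a target that passes argc == 0, from_argc() yields a non-zero VALUE, so the test would take the pointer-dereferencing path with a bogus address. That is the kind of spurious failure the commit message describes.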