gcc trunk, rev250140, x86_64. Test case has no undefined behavior, but -O2 produces incorrect result. > cat f.cpp #include <iostream> unsigned long long int ll = 0; unsigned long long int ull1 = 1ULL; unsigned long long int ull2 = 12008284144813806346ULL; unsigned long long int ull3; void foo() { ll = -5597998501375493990LL; ll = unsigned(5677365550390624949L - ll) - (ull1 > 0); ull3 = unsigned(2067854353L << (((ll + -2129105131L) ^ 10280750144413668236ULL) - 10280750143997242009ULL)) >> ((2873442921854271231ULL | ull2) - 12098357307243495419ULL); } int main() { foo(); std::cout << ull3 << " (expected 3998784)\n"; return 0; } > g++ f.cpp -O0 -o out; ./out 3998784 (expected 3998784) > g++ f.cpp -O2 -o out; ./out 3493659712 (expected 3998784)
Confirmed, started with r225463.
The referred commit enabled new extzv<mode> and insv<mode> patterns, and in the past, it exposed a bunch of bugs in the generic RTL part. Let me analyse what is going on here.
Following is a c testcase: --cut here-- unsigned long long int ll = 0; unsigned long long int ull1 = 1ULL; unsigned long long int ull2 = 12008284144813806346ULL; unsigned long long int ull3; void foo () { ll = -5597998501375493990LL; ll = (5677365550390624949L - ll) - (ull1 > 0); ull3 = (unsigned int) (2067854353L << (((ll + -2129105131L) ^ 10280750144413668236ULL) - 10280750143997242009ULL)) >> ((2873442921854271231ULL | ull2) - 12098357307243495419ULL); } int main () { foo (); printf ("%llu expected 3998784)\n", ull3); printf ("%x expected 3d0440)\n", ull3); return 0; } --cut here-- The problem is in combine pass, which combines following sequence in the main function: (insn 17 16 18 2 (parallel [ (set (reg:SI 114) (ior:SI (subreg:SI (reg:DI 113 [ ull2 ]) 0) (const_int -8651009 [0xffffffffff7bfeff]))) (clobber (reg:CC 17 flags)) ]) "pr81423.c":15 415 {*iorsi_1} (expr_list:REG_DEAD (reg:DI 113 [ ull2 ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) (insn 18 17 19 2 (parallel [ (set (reg:SI 115) (plus:SI (reg:SI 114) (const_int 5 [0x5]))) (clobber (reg:CC 17 flags)) ]) "pr81423.c":16 217 {*addsi_1} (expr_list:REG_DEAD (reg:SI 114) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) (insn 19 18 20 2 (parallel [ (set (reg:SI 116) (lshiftrt:SI (subreg:SI (reg:DI 111) 0) (subreg:QI (reg:SI 115) 0))) (clobber (reg:CC 17 flags)) ]) "pr81423.c":15 544 {*lshrsi3_1} (expr_list:REG_DEAD (reg:SI 115) (expr_list:REG_DEAD (reg:DI 111) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))))) (insn 20 19 21 2 (set (reg:DI 103 [ _20 ]) (zero_extend:DI (reg:SI 116))) "pr81423.c":15 131 {*zero_extendsidi2} (expr_list:REG_DEAD (reg:SI 116) (nil))) Trying 17, 18, 19 -> 20: Failed to match this instruction: (set (reg:DI 103 [ _20 ]) (zero_extract:DI (reg:DI 111) (const_int 32 [0x20]) (const_int 4 [0x4]))) Failed to match this instruction: (set (reg:DI 103 [ _20 ]) (and:DI (lshiftrt:DI (reg:DI 111) (const_int 4 [0x4])) (const_int 4294967295 [0xffffffff]))) Successfully matched this instruction: (set (reg:DI 116) (ashift:DI (reg:DI 111) (const_int 28 [0x1c]))) Successfully matched this instruction: (set (reg:DI 103 [ _20 ]) (lshiftrt:DI (reg:DI 116) (const_int 32 [0x20]))) allowing combination of insns 17, 18, 19 and 20 original costs 6 + 4 + 4 + 1 = 15 replacement costs 4 + 4 = 8 deferring deletion of insn with uid = 18. deferring deletion of insn with uid = 17. deferring deletion of insn with uid = 16. modifying insn i2 19: {r116:DI=r111:DI<<0x1c;clobber flags:CC;} REG_UNUSED flags:CC REG_DEAD r111:DI deferring rescan insn with uid = 19. modifying insn i3 20: {r103:DI=r116:DI 0>>0x20;clobber flags:CC;} REG_UNUSED flags:CC REG_DEAD r116:DI deferring rescan insn with uid = 20. Please note that we have in the original sequence: (insn 19 18 20 2 (parallel [ (set (reg:SI 116) (lshiftrt:SI (subreg:SI (reg:DI 111) 0) (subreg:QI (reg:SI 115) 0))) (clobber (reg:CC 17 flags)) ]) "pr81423.c":15 544 {*lshrsi3_1} (expr_list:REG_DEAD (reg:SI 115) (expr_list:REG_DEAD (reg:DI 111) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))))) in effect, this insn is shifting value, truncated to 32 bits, from rax 0x1ed03d04400 2117482857472 by 4 via: 0x0000000000400557 <+71>: add $0x5,%ecx 0x000000000040055a <+74>: shr %cl,%eax resulting in: rax 0x3d0440 3998784 rcx 0x4 4 However, combine tries to create: (set (reg:DI 103 [ _20 ]) (zero_extract:DI (reg:DI 111) (const_int 32 [0x20]) (const_int 4 [0x4]))) trying to extract 32bit value from position 4. This is not correct, due to 32bit left shift in the sequence, the combined insn should zero-extract 28 bits only.
CC maintainer.
There are a few issues here. 1) I cannot reproduce it: a trunk x86_64-linux compiler stores to memory in that insn 20 (the testcase from comment 3). 2) 17, 18 -> 19 should already match. Instead we get Trying 17, 18 -> 19: Failed to match this instruction: (parallel [ (set (reg:SI 114) (lshiftrt:SI (subreg:SI (reg:DI 109) 0) (plus:QI (subreg:QI (ior:SI (subreg:SI (reg:DI 111 [ ull2 ]) 0) (const_int -8651009 [0xffffffffff7bfeff])) 0) (const_int 5 [0x5])))) (clobber (reg:CC 17 flags)) ]) If the subreg:QI there was simplified properly (to just 0xff) the plus:QI could be simplified as well. 3) The reported issue. Confirmed.
(In reply to Segher Boessenkool from comment #5) > There are a few issues here. > > 1) I cannot reproduce it: a trunk x86_64-linux compiler stores to > memory in that insn 20 (the testcase from comment 3). Just re-checked with todays SVN. There are two (insn 20) RTXes; one in the function itself, the second one (the problematic one) in the inlined copy in the main function.
Author: segher Date: Wed Jul 19 19:28:41 2017 New Revision: 250363 URL: https://gcc.gnu.org/viewcvs?rev=250363&root=gcc&view=rev Log: simplify-rtx: The truncation of an IOR can have all bits set (PR81423) ... if it is an IOR with a constant with all bits set in the mode that is truncated to, for example. Handle that case. PR rtl-optimization/81423 * simplify-rtx.c (simplify_truncation): Handle truncating an IOR with a constant that is -1 in the truncated to mode. Modified: trunk/gcc/ChangeLog trunk/gcc/simplify-rtx.c
Author: segher Date: Wed Jul 19 19:31:26 2017 New Revision: 250365 URL: https://gcc.gnu.org/viewcvs?rev=250365&root=gcc&view=rev Log: combine: Fix for PR81423 We here have an AND of a SUBREG of an LSHIFTRT. If that SUBREG is paradoxical, the extraction we form is the length of the size of the inner mode, which includes some bits that should not be in the result. Just give up in that case. PR rtl-optimization/81423 * combine.c (make_compound_operation_int): Don't try to optimize the AND of a SUBREG of an LSHIFTRT if that SUBREG is paradoxical. Modified: trunk/gcc/ChangeLog trunk/gcc/combine.c
Author: segher Date: Wed Aug 9 21:05:45 2017 New Revision: 251004 URL: https://gcc.gnu.org/viewcvs?rev=251004&root=gcc&view=rev Log: Testcase for PR81423 gcc/testsuite/ PR rtl-optimization/81423 * gcc.c-torture/execute/pr81423.c: New testcase. Modified: trunk/gcc/testsuite/ChangeLog
Author: segher Date: Wed Aug 9 21:52:30 2017 New Revision: 251011 URL: https://gcc.gnu.org/viewcvs?rev=251011&root=gcc&view=rev Log: This time with the file added. Testcase for PR81423 gcc/testsuite/ PR rtl-optimization/81423 * gcc.c-torture/execute/pr81423.c: New testcase. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr81423.c
Fixed on trunk.
Author: aldyh Date: Wed Sep 13 16:44:48 2017 New Revision: 252366 URL: https://gcc.gnu.org/viewcvs?rev=252366&root=gcc&view=rev Log: Testcase for PR81423 gcc/testsuite/ PR rtl-optimization/81423 * gcc.c-torture/execute/pr81423.c: New testcase. Modified: branches/range-gen2/gcc/testsuite/ChangeLog
Author: aldyh Date: Wed Sep 13 16:46:39 2017 New Revision: 252374 URL: https://gcc.gnu.org/viewcvs?rev=252374&root=gcc&view=rev Log: This time with the file added. Testcase for PR81423 gcc/testsuite/ PR rtl-optimization/81423 * gcc.c-torture/execute/pr81423.c: New testcase. Added: branches/range-gen2/gcc/testsuite/gcc.c-torture/execute/pr81423.c
Assuming this is fixed on the trunk now.
Apologies if I'm wrong and just making noise but doesn't the test case involve a left shift of a value greater than the bit width of the left operand and therefore invoke UB? ll = -5597998501375493990LL; ll = (5677365550390624949L - ll) - (ull1 > 0); //ull3 = (unsigned int) // (2067854353L << // (((ll + -2129105131L) ^ 10280750144413668236ULL) - // 10280750143997242009ULL)) >> ((2873442921854271231ULL | ull2) // - 12098357307243495419ULL); // Above broken down: unsigned long long int t1 = ll + -2129105131L; unsigned long long int t2 = t1 ^ 10280750144413668236ULL; unsigned long long int t3 = t2 - 10280750143997242009ULL; // t3 = 9523455470476460042 !!! long t4 = 2067854353L << t3; // UB!! unsigned int t5 = (unsigned int)t4; unsigned long long int t6 = 2873442921854271231ULL | ull2; unsigned long long int t7 = t6 - 12098357307243495419ULL; ull3 = t5 >> t7; Fixed test case would be: --- begin --- unsigned long long int ll = 0; unsigned long long int ull1 = 1ULL; unsigned long long int ull2 = 12008284144813806346ULL; unsigned long long int ull3; void foo () { ll = -5597998501375493990LL; ll = (5677365550390624949L - ll) - (ull1 > 0); ull3 = (unsigned int) (2067854353L << (((ll + -2129105131L) ^ 10280750144413668236ULL) + 17089282532945401191ULL)) >> ((2873442921854271231ULL | ull2) - 12098357307243495419ULL); } int main () { foo (); printf ("%llu expected 3998784)\n", ull3); printf ("%llx expected 3d0440)\n", ull3); return 0; } --- end --- This still demonstrates the bug.
ll = -5597998501375493990LL; // result is 2595916314 here. ll = unsigned(5677365550390624949L - ll) - (ull1 > 0); So: // t1 is 466811183 unsigned long long int t1 = ll + -2129105131L; // t2 is 10280750143997242019 unsigned long long int t2 = t1 ^ 10280750144413668236ULL; // t3 is 10 unsigned long long int t3 = t2 - 10280750143997242009ULL; So the test case looks correct to me. UBSAN also doesn't complain.
Author: jakub Date: Sat Oct 14 08:34:11 2017 New Revision: 253749 URL: https://gcc.gnu.org/viewcvs?rev=253749&root=gcc&view=rev Log: PR rtl-optimization/81423 * gcc.c-torture/execute/pr81423.c (foo): Add missing cast. Change L suffixes to LL. (main): Punt if either long long isn't 64-bit or int isn't 32-bit. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.c-torture/execute/pr81423.c
(In reply to Davin McCall from comment #15) > Apologies if I'm wrong and just making noise but doesn't the test case > involve a left shift of a value greater than the bit width of the left > operand and therefore invoke UB? The test as committed invokes UB, because the #c3 transcription into C omitted the cast unsigned(5677365550390624949L - ll) from the original testcase. Fixed thusly, I've additionally replaced all spots using L suffixes with LL for consistent behavior on 32-bit targets, and add a test that int is 32-bit, for say 16-bit int and 64-bit long long the testcase would have UB again (as well as end up with a different than expected value anyway, as 3998784 doesn't fit into 16-bit int.
GCC 6 branch is being closed
Fixed in GCC 8.