Created attachment 47737 [details] reduced testcase Output: $ powerpc64le-unknown-linux-gnu-gcc -Og testcase.c -static $ ./a.out qemu: uncaught target signal 6 (Aborted) - core dumped Aborted The function foo() collapses to just return 0. $ powerpc64le-unknown-linux-gnu-gcc -v Using built-in specs. COLLECT_GCC=/repo/gcc-trunk/binary-latest-powerpc64le/bin/powerpc64le-unknown-linux-gnu-gcc COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r10-6322-g6693911f069-checking-yes-rtl-df-extra-powerpc64le/bin/../libexec/gcc/powerpc64le-unknown-linux-gnu/10.0.1/lto-wrapper Target: powerpc64le-unknown-linux-gnu Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --with-sysroot=/usr/powerpc64le-unknown-linux-gnu --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=powerpc64le-unknown-linux-gnu --with-ld=/usr/bin/powerpc64le-unknown-linux-gnu-ld --with-as=/usr/bin/powerpc64le-unknown-linux-gnu-as --disable-libsanitizer --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r10-6322-g6693911f069-checking-yes-rtl-df-extra-powerpc64le Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 10.0.1 20200129 (experimental) (GCC)
I think this code is undefined ....
The reason why I say that is because the shiffter/rotater can be the full 64bit value or a smaller value. For PowerPC, it is the full 64bit value.
Just to quote x = x >> __builtin_bswap64 (-a) | x << (-__builtin_bswap64 (-a) & 31); with a == ~0. The right shift argument is > 31 which makes its behavior undefined. Does adding a & 31 help?
(In reply to Andrew Pinski from comment #1) > I think this code is undefined .... Thank you for having a look. -fsanitize=undefined doesn't complain. x = x >> __builtin_bswap64 (-a) | x << (-__builtin_bswap64 (-a) & 31); a == 0, so -a == 0, so bswap(-a) == 0, so -bswap(-a) == 0, so everything should be OK (in the non-reduced testcase, there were & 31 in the right-hand operand of both shifts, so it was always defined) (In reply to Richard Biener from comment #3) > Just to quote > > x = x >> __builtin_bswap64 (-a) | x << (-__builtin_bswap64 (-a) & 31); > > with a == ~0. The right shift argument is > 31 which makes its behavior > undefined. Does adding a & 31 help? a == 0, x == ~0
Created attachment 47738 [details] only partially reduced testcase, with always defined shifts It also needs more flags to reproduce: -Os -fno-expensive-optimizations -fno-forward-propagate -fno-inline -fno-ipa-vrp -fno-tree-bit-ccp -fno-tree-ter (not all of those are needed; switching to -Og might help to reduce the number of needed flags) I've just deleted my gcc repo + binaries, so I can't test it atm.
(In reply to Zdenek Sojka from comment #5) > Created attachment 47738 [details] > only partially reduced testcase, with always defined shifts > > It also needs more flags to reproduce: > -Os -fno-expensive-optimizations -fno-forward-propagate -fno-inline > -fno-ipa-vrp -fno-tree-bit-ccp -fno-tree-ter > > (not all of those are needed; switching to -Og might help to reduce the > number of needed flags) > > I've just deleted my gcc repo + binaries, so I can't test it atm. Actually, only -Og is enough even for that testcase: $ powerpc64le-unknown-linux-gnu-gcc -Og testcase-min1.c -static -w && ./a.out 00000000000000000000000000000000 (wrong) $ powerpc64le-unknown-linux-gnu-gcc -O0 testcase-min1.c -static -w && ./a.out 00000000000000010000000000000000 (correct)
Still bisecting it, but I'd say the bug is in expand_binop: /* If we were trying to rotate, and that didn't work, try rotating the other direction before falling back to shifts and bitwise-or. */ if (((binoptab == rotl_optab && (icode = optab_handler (rotr_optab, mode)) != CODE_FOR_nothing) || (binoptab == rotr_optab && (icode = optab_handler (rotl_optab, mode)) != CODE_FOR_nothing)) && is_int_mode (mode, &int_mode)) { optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab); rtx newop1; unsigned int bits = GET_MODE_PRECISION (int_mode); if (CONST_INT_P (op1)) newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1)); else if (targetm.shift_truncation_mask (int_mode) == bits - 1) newop1 = negate_rtx (GET_MODE (op1), op1); else newop1 = expand_binop (GET_MODE (op1), sub_optab, gen_int_mode (bits, GET_MODE (op1)), op1, NULL_RTX, unsignedp, OPTAB_DIRECT); temp = expand_binop_directly (icode, int_mode, otheroptab, op0, newop1, target, unsignedp, methods, last); if (temp) return temp; } The above is wrong if op1 is or might be 0 and targetm.shift_truncation_mask (int_mode) != bits - 1, because for original valid rotate by 0 it will create invalid rotate in the other direction by bits which is out of bounds rotation count. So, I'd say we should either mask the result of subtraction with bits - 1, or perhaps better do a negate + and, i.e. -op1 & (bits - 1).
And the original testcase is perfectly valid, so I don't see why we are looking for another one.
Bisection shows this started with my r8-3850-g39382c092ee9bb5c00330 change. If I change the testcase slightly (still -Og): unsigned a; unsigned foo (unsigned x) { unsigned y = __builtin_bswap64 (-a); x = x >> y | x << (-y & 31); x >>= 31; return x; } int main () { unsigned x = foo (~0); if (x != 1) __builtin_abort (); return 0; } it started in between r225210 and r227605, but have trouble building cross compilers in that range. Anyway, what happens is with the out of bounds rotate we get: Trying 13 -> 16: 13: r129:SI=r132:DI#0<-<0x20 REG_DEAD r132:DI 16: r123:DI=r129:SI<0 REG_DEAD r129:SI Successfully matched this instruction: (set (reg/v:DI 123 [ <retval> ]) (const_int 0 [0])) during combine. So, perhaps we could also change simplify-rtx.c to punt if it is out of bounds rather than trying to optimize anything.
Created attachment 47740 [details] gcc10-pr93505.patch Untested combiner fix. IMHO even when we fix expand_binop we want it anyway, because we don't want to invoke UB in the compiler, even on UB in the testcase.
As for the expand_binop bug, if I fix it like: --- gcc/optabs.c.jj 2020-01-12 11:54:36.690409230 +0100 +++ gcc/optabs.c 2020-01-30 16:05:43.520649234 +0100 @@ -1226,16 +1226,22 @@ expand_binop (machine_mode mode, optab b unsigned int bits = GET_MODE_PRECISION (int_mode); if (CONST_INT_P (op1)) - newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1)); + newop1 + = gen_int_shift_amount (int_mode, (bits - UINTVAL (op1)) % bits); else if (targetm.shift_truncation_mask (int_mode) == bits - 1) newop1 = negate_rtx (GET_MODE (op1), op1); - else - newop1 = expand_binop (GET_MODE (op1), sub_optab, - gen_int_mode (bits, GET_MODE (op1)), op1, + else if (pow2p_hwi (bits)) + newop1 = expand_binop (GET_MODE (op1), and_optab, + negate_rtx (GET_MODE (op1), op1), + gen_int_mode (bits - 1, GET_MODE (op1)), NULL_RTX, unsignedp, OPTAB_DIRECT); + else + newop1 = NULL_RTX; - temp = expand_binop_directly (icode, int_mode, otheroptab, op0, newop1, - target, unsignedp, methods, last); + temp = NULL_RTX; + if (newop1) + temp = expand_binop_directly (icode, int_mode, otheroptab, op0, newop1, + target, unsignedp, methods, last); if (temp) return temp; } then unfortunately given -O2 on unsigned int bar (unsigned int x, int y) { return x >> y | x << (-y & 31); } powerpc64le-linux emitted assembly changes like: bar: .LFB1: .cfi_startproc - subfic 4,4,32 + neg 4,4 + rlwinm 4,4,0,27,31 rotlw 3,3,4 blr Given unsigned int baz (unsigned int x, int y) { y &= 31; return x >> y | x << (-y & 31); } the change is: bar: .LFB1: .cfi_startproc + neg 4,4 rlwinm 4,4,0,27,31 - subfic 4,4,32 rotlw 3,3,4 blr and so I'd say even if we just don't fix expand_binop, this shows an optimization opportunity for the rs6000 backend if the rotlw instruction only uses bottom 5 bits from the last operand.
(In reply to Jakub Jelinek from comment #10) > Created attachment 47740 [details] > gcc10-pr93505.patch > > Untested combiner fix. IMHO even when we fix expand_binop we want it > anyway, because we don't want to invoke UB in the compiler, even on UB in > the testcase. This patch is approved for trunk and any wanted backports (if it passes regression testing ;-) )
(In reply to Jakub Jelinek from comment #11) > and so I'd say even if we just don't fix expand_binop, this shows an > optimization opportunity for the rs6000 backend > if the rotlw instruction only uses bottom 5 bits from the last operand. Yes, but there is no easy way to express that. For shifts the mask is one bit *more* (in the GPRs; it is not in vectors). But we could do more in the .md, sure.
(In reply to Segher Boessenkool from comment #13) > (In reply to Jakub Jelinek from comment #11) > > and so I'd say even if we just don't fix expand_binop, this shows an > > optimization opportunity for the rs6000 backend > > if the rotlw instruction only uses bottom 5 bits from the last operand. > > Yes, but there is no easy way to express that. For shifts the mask is one > bit *more* (in the GPRs; it is not in vectors). But we could do more in > the .md, sure. But for rotlw the mask is exactly the precision? If yes, there could be something like: (define_insn "rotl<mode>3_cntmask" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (rotate:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") (and:SI (match_operand:SI 2 "reg_or_cint_operand" "rn") (match_operand:SI 3 "const_int_operand" "n")))] "INTVAL (operands[3]) == GET_MODE_PRECISION (<MODE>mode)" "rotl<wd>%I2 %0,%1,%<hH>2" [(set_attr "type" "shift") (set_attr "maybe_var_shift" "yes")]) (though, there are various patterns with rotate and something around it, so perhaps it would need to be done for those too).
(In reply to Jakub Jelinek from comment #14) > (In reply to Segher Boessenkool from comment #13) > > (In reply to Jakub Jelinek from comment #11) > > > and so I'd say even if we just don't fix expand_binop, this shows an > > > optimization opportunity for the rs6000 backend > > > if the rotlw instruction only uses bottom 5 bits from the last operand. > > > > Yes, but there is no easy way to express that. For shifts the mask is one > > bit *more* (in the GPRs; it is not in vectors). But we could do more in > > the .md, sure. > > But for rotlw the mask is exactly the precision? > If yes, there could be something like: > (define_insn "rotl<mode>3_cntmask" > [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > (rotate:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") > (and:SI (match_operand:SI 2 "reg_or_cint_operand" "rn") > (match_operand:SI 3 "const_int_operand" "n")))] > "INTVAL (operands[3]) == GET_MODE_PRECISION (<MODE>mode)" > "rotl<wd>%I2 %0,%1,%<hH>2" > [(set_attr "type" "shift") > (set_attr "maybe_var_shift" "yes")]) > (though, there are various patterns with rotate and something around it, so > perhaps it would need to be done for those too). Obviously not GET_MODE_PRECISION but GET_MODE_MASK.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:56b92750f83724177d2c6eae30c208e935a56a37 commit r10-6358-g56b92750f83724177d2c6eae30c208e935a56a37 Author: Jakub Jelinek <jakub@redhat.com> Date: Thu Jan 30 21:28:17 2020 +0100 combine: Punt on out of range rotate counts [PR93505] What happens on this testcase is with the out of bounds rotate we get: Trying 13 -> 16: 13: r129:SI=r132:DI#0<-<0x20 REG_DEAD r132:DI 16: r123:DI=r129:SI<0 REG_DEAD r129:SI Successfully matched this instruction: (set (reg/v:DI 123 [ <retval> ]) (const_int 0 [0])) during combine. So, perhaps we could also change simplify-rtx.c to punt if it is out of bounds rather than trying to optimize anything. Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't have out of bounds counts or introduce targetm.rotate_truncation_mask, we should truncate the argument instead of punting. Punting is better for backports though. 2020-01-30 Jakub Jelinek <jakub@redhat.com> PR middle-end/93505 * combine.c (simplify_comparison) <case ROTATE>: Punt on out of range rotate counts. * gcc.c-torture/compile/pr93505.c: New test.
The wrong-code is now fixed on the trunk. What we want to do further depends on whether we want to make ROTATE/ROTATERT imply truncation (always or only on selected targets), let's defer that part to PR93512.
PR93512 is marked as enhancement, but if we don't fix this it is a regression.
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:329475795c6eeaa2b122672091c9119b9d6c5564 commit r9-8217-g329475795c6eeaa2b122672091c9119b9d6c5564 Author: Jakub Jelinek <jakub@redhat.com> Date: Thu Jan 30 21:28:17 2020 +0100 combine: Punt on out of range rotate counts [PR93505] What happens on this testcase is with the out of bounds rotate we get: Trying 13 -> 16: 13: r129:SI=r132:DI#0<-<0x20 REG_DEAD r132:DI 16: r123:DI=r129:SI<0 REG_DEAD r129:SI Successfully matched this instruction: (set (reg/v:DI 123 [ <retval> ]) (const_int 0 [0])) during combine. So, perhaps we could also change simplify-rtx.c to punt if it is out of bounds rather than trying to optimize anything. Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't have out of bounds counts or introduce targetm.rotate_truncation_mask, we should truncate the argument instead of punting. Punting is better for backports though. 2020-01-30 Jakub Jelinek <jakub@redhat.com> PR middle-end/93505 * combine.c (simplify_comparison) <case ROTATE>: Punt on out of range rotate counts. * gcc.c-torture/compile/pr93505.c: New test.
Fixed for 9.3+ too.
The releases/gcc-8 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:a73ee641c3d2ca729bdf55225afd881f57bf4d96 commit r8-10013-ga73ee641c3d2ca729bdf55225afd881f57bf4d96 Author: Jakub Jelinek <jakub@redhat.com> Date: Fri Feb 14 15:46:13 2020 +0100 combine: Punt on out of range rotate counts [PR93505] What happens on this testcase is with the out of bounds rotate we get: Trying 13 -> 16: 13: r129:SI=r132:DI#0<-<0x20 REG_DEAD r132:DI 16: r123:DI=r129:SI<0 REG_DEAD r129:SI Successfully matched this instruction: (set (reg/v:DI 123 [ <retval> ]) (const_int 0 [0])) during combine. So, perhaps we could also change simplify-rtx.c to punt if it is out of bounds rather than trying to optimize anything. Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't have out of bounds counts or introduce targetm.rotate_truncation_mask, we should truncate the argument instead of punting. Punting is better for backports though. 2020-01-30 Jakub Jelinek <jakub@redhat.com> PR middle-end/93505 * combine.c (simplify_comparison) <case ROTATE>: Punt on out of range rotate counts. * gcc.c-torture/compile/pr93505.c: New test.
Fixed for 8.4+ too.