Bug 93505 - [8 Regression] wrong code or ICE with __builtin_bswap64() and rotation at -Og
Summary: [8 Regression] wrong code or ICE with __builtin_bswap64() and rotation at -Og
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 10.0
: P2 normal
Target Milestone: 8.4
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks: 93512
  Show dependency treegraph
 
Reported: 2020-01-30 06:32 UTC by Zdenek Sojka
Modified: 2020-02-14 17:31 UTC (History)
6 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: powerpc64{,le}-unknown-linux-gnu
Build:
Known to work:
Known to fail: 10.0, 8.3.1, 9.2.1
Last reconfirmed: 2020-01-30 00:00:00


Attachments
reduced testcase (154 bytes, text/plain)
2020-01-30 06:32 UTC, Zdenek Sojka
Details
only partially reduced testcase, with always defined shifts (446 bytes, text/plain)
2020-01-30 08:15 UTC, Zdenek Sojka
Details
gcc10-pr93505.patch (718 bytes, patch)
2020-01-30 14:38 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zdenek Sojka 2020-01-30 06:32:42 UTC
Created attachment 47737 [details]
reduced testcase

Output:
$ powerpc64le-unknown-linux-gnu-gcc -Og testcase.c -static
$ ./a.out 
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted

The function foo() collapses to just return 0.

$ powerpc64le-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-powerpc64le/bin/powerpc64le-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r10-6322-g6693911f069-checking-yes-rtl-df-extra-powerpc64le/bin/../libexec/gcc/powerpc64le-unknown-linux-gnu/10.0.1/lto-wrapper
Target: powerpc64le-unknown-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --with-sysroot=/usr/powerpc64le-unknown-linux-gnu --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=powerpc64le-unknown-linux-gnu --with-ld=/usr/bin/powerpc64le-unknown-linux-gnu-ld --with-as=/usr/bin/powerpc64le-unknown-linux-gnu-as --disable-libsanitizer --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r10-6322-g6693911f069-checking-yes-rtl-df-extra-powerpc64le
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.0.1 20200129 (experimental) (GCC)
Comment 1 Andrew Pinski 2020-01-30 07:53:21 UTC
I think this code is undefined ....
Comment 2 Andrew Pinski 2020-01-30 07:54:41 UTC
The reason why I say that is because the shiffter/rotater can be the full 64bit value or a smaller value.  For PowerPC, it is the full 64bit value.
Comment 3 Richard Biener 2020-01-30 08:02:28 UTC
Just to quote

  x = x >> __builtin_bswap64 (-a) | x << (-__builtin_bswap64 (-a) & 31);

with a == ~0.  The right shift argument is > 31 which makes its behavior
undefined.  Does adding a & 31 help?
Comment 4 Zdenek Sojka 2020-01-30 08:09:51 UTC
(In reply to Andrew Pinski from comment #1)
> I think this code is undefined ....

Thank you for having a look.
-fsanitize=undefined doesn't complain.


x = x >> __builtin_bswap64 (-a) | x << (-__builtin_bswap64 (-a) & 31);

a == 0, so -a == 0, so bswap(-a) == 0, so -bswap(-a) == 0, so everything should be OK
(in the non-reduced testcase, there were & 31 in the right-hand operand of both shifts, so it was always defined)

(In reply to Richard Biener from comment #3)
> Just to quote
> 
>   x = x >> __builtin_bswap64 (-a) | x << (-__builtin_bswap64 (-a) & 31);
> 
> with a == ~0.  The right shift argument is > 31 which makes its behavior
> undefined.  Does adding a & 31 help?

a == 0, x == ~0
Comment 5 Zdenek Sojka 2020-01-30 08:15:51 UTC
Created attachment 47738 [details]
only partially reduced testcase, with always defined shifts

It also needs more flags to reproduce:
-Os -fno-expensive-optimizations -fno-forward-propagate -fno-inline -fno-ipa-vrp -fno-tree-bit-ccp -fno-tree-ter

(not all of those are needed; switching to -Og might help to reduce the number of needed flags)

I've just deleted my gcc repo + binaries, so I can't test it atm.
Comment 6 Zdenek Sojka 2020-01-30 10:33:08 UTC
(In reply to Zdenek Sojka from comment #5)
> Created attachment 47738 [details]
> only partially reduced testcase, with always defined shifts
> 
> It also needs more flags to reproduce:
> -Os -fno-expensive-optimizations -fno-forward-propagate -fno-inline
> -fno-ipa-vrp -fno-tree-bit-ccp -fno-tree-ter
> 
> (not all of those are needed; switching to -Og might help to reduce the
> number of needed flags)
> 
> I've just deleted my gcc repo + binaries, so I can't test it atm.

Actually, only -Og is enough even for that testcase:

$ powerpc64le-unknown-linux-gnu-gcc -Og testcase-min1.c -static -w && ./a.out 
00000000000000000000000000000000
(wrong)

$ powerpc64le-unknown-linux-gnu-gcc -O0 testcase-min1.c -static -w && ./a.out 
00000000000000010000000000000000
(correct)
Comment 7 Jakub Jelinek 2020-01-30 10:44:16 UTC
Still bisecting it, but I'd say the bug is in expand_binop:
  /* If we were trying to rotate, and that didn't work, try rotating
     the other direction before falling back to shifts and bitwise-or.  */
  if (((binoptab == rotl_optab
        && (icode = optab_handler (rotr_optab, mode)) != CODE_FOR_nothing)
       || (binoptab == rotr_optab
           && (icode = optab_handler (rotl_optab, mode)) != CODE_FOR_nothing))
      && is_int_mode (mode, &int_mode))
    {
      optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab);
      rtx newop1;
      unsigned int bits = GET_MODE_PRECISION (int_mode);
    
      if (CONST_INT_P (op1))
        newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
      else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
        newop1 = negate_rtx (GET_MODE (op1), op1);
      else
        newop1 = expand_binop (GET_MODE (op1), sub_optab,
                               gen_int_mode (bits, GET_MODE (op1)), op1,
                               NULL_RTX, unsignedp, OPTAB_DIRECT);

      temp = expand_binop_directly (icode, int_mode, otheroptab, op0, newop1,
                                    target, unsignedp, methods, last);
      if (temp)
        return temp;
    }

The above is wrong if op1 is or might be 0 and targetm.shift_truncation_mask (int_mode) != bits - 1,
because for original valid rotate by 0 it will create invalid rotate in the other direction by bits
which is out of bounds rotation count.
So, I'd say we should either mask the result of subtraction with bits - 1, or perhaps better
do a negate + and, i.e. -op1 & (bits - 1).
Comment 8 Jakub Jelinek 2020-01-30 10:44:45 UTC
And the original testcase is perfectly valid, so I don't see why we are looking for another one.
Comment 9 Jakub Jelinek 2020-01-30 13:51:58 UTC
Bisection shows this started with my r8-3850-g39382c092ee9bb5c00330 change.
If I change the testcase slightly (still -Og):
unsigned a;

unsigned
foo (unsigned x)
{
  unsigned y = __builtin_bswap64 (-a);
  x = x >> y | x << (-y & 31);
  x >>= 31;
  return x;
}

int
main ()
{
  unsigned x = foo (~0);
  if (x != 1)
    __builtin_abort ();
  return 0;
}
it started in between r225210 and r227605, but have trouble building cross compilers in that range.
Anyway, what happens is with the out of bounds rotate we get:
Trying 13 -> 16:
   13: r129:SI=r132:DI#0<-<0x20
      REG_DEAD r132:DI
   16: r123:DI=r129:SI<0
      REG_DEAD r129:SI
Successfully matched this instruction:
(set (reg/v:DI 123 [ <retval> ])
    (const_int 0 [0]))
during combine.  So, perhaps we could also change simplify-rtx.c to punt if it is out of bounds rather than trying to optimize anything.
Comment 10 Jakub Jelinek 2020-01-30 14:38:44 UTC
Created attachment 47740 [details]
gcc10-pr93505.patch

Untested combiner fix.  IMHO even when we fix expand_binop we want it anyway, because we don't want to invoke UB in the compiler, even on UB in the testcase.
Comment 11 Jakub Jelinek 2020-01-30 15:20:06 UTC
As for the expand_binop bug, if I fix it like:
--- gcc/optabs.c.jj	2020-01-12 11:54:36.690409230 +0100
+++ gcc/optabs.c	2020-01-30 16:05:43.520649234 +0100
@@ -1226,16 +1226,22 @@ expand_binop (machine_mode mode, optab b
       unsigned int bits = GET_MODE_PRECISION (int_mode);
 
       if (CONST_INT_P (op1))
-	newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
+	newop1
+	  = gen_int_shift_amount (int_mode, (bits - UINTVAL (op1)) % bits);
       else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
         newop1 = negate_rtx (GET_MODE (op1), op1);
-      else
-        newop1 = expand_binop (GET_MODE (op1), sub_optab,
-			       gen_int_mode (bits, GET_MODE (op1)), op1,
+      else if (pow2p_hwi (bits))
+        newop1 = expand_binop (GET_MODE (op1), and_optab,
+			       negate_rtx (GET_MODE (op1), op1),
+			       gen_int_mode (bits - 1, GET_MODE (op1)),
 			       NULL_RTX, unsignedp, OPTAB_DIRECT);
+      else
+	newop1 = NULL_RTX;
 
-      temp = expand_binop_directly (icode, int_mode, otheroptab, op0, newop1,
-				    target, unsignedp, methods, last);
+      temp = NULL_RTX;
+      if (newop1)
+	temp = expand_binop_directly (icode, int_mode, otheroptab, op0, newop1,
+				      target, unsignedp, methods, last);
       if (temp)
 	return temp;
     }
then unfortunately given -O2 on
unsigned int
bar (unsigned int x, int y)
{
  return x >> y | x << (-y & 31);
}
powerpc64le-linux emitted assembly changes like:
 bar:
 .LFB1:
 	.cfi_startproc
-	subfic 4,4,32
+	neg 4,4
+	rlwinm 4,4,0,27,31
 	rotlw 3,3,4
 	blr

Given
unsigned int
baz (unsigned int x, int y)
{
  y &= 31;
  return x >> y | x << (-y & 31);
}
the change is:
 bar:
 .LFB1:
 	.cfi_startproc
+	neg 4,4
 	rlwinm 4,4,0,27,31
-	subfic 4,4,32
 	rotlw 3,3,4
 	blr
and so I'd say even if we just don't fix expand_binop, this shows an optimization opportunity for the rs6000 backend
if the rotlw instruction only uses bottom 5 bits from the last operand.
Comment 12 Segher Boessenkool 2020-01-30 16:09:37 UTC
(In reply to Jakub Jelinek from comment #10)
> Created attachment 47740 [details]
> gcc10-pr93505.patch
> 
> Untested combiner fix.  IMHO even when we fix expand_binop we want it
> anyway, because we don't want to invoke UB in the compiler, even on UB in
> the testcase.

This patch is approved for trunk and any wanted backports (if it passes
regression testing ;-) )
Comment 13 Segher Boessenkool 2020-01-30 16:12:39 UTC
(In reply to Jakub Jelinek from comment #11)
> and so I'd say even if we just don't fix expand_binop, this shows an
> optimization opportunity for the rs6000 backend
> if the rotlw instruction only uses bottom 5 bits from the last operand.

Yes, but there is no easy way to express that.  For shifts the mask is one
bit *more* (in the GPRs; it is not in vectors).  But we could do more in
the .md, sure.
Comment 14 Jakub Jelinek 2020-01-30 16:20:14 UTC
(In reply to Segher Boessenkool from comment #13)
> (In reply to Jakub Jelinek from comment #11)
> > and so I'd say even if we just don't fix expand_binop, this shows an
> > optimization opportunity for the rs6000 backend
> > if the rotlw instruction only uses bottom 5 bits from the last operand.
> 
> Yes, but there is no easy way to express that.  For shifts the mask is one
> bit *more* (in the GPRs; it is not in vectors).  But we could do more in
> the .md, sure.

But for rotlw the mask is exactly the precision?
If yes, there could be something like:
(define_insn "rotl<mode>3_cntmask"
  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
        (rotate:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
                    (and:SI (match_operand:SI 2 "reg_or_cint_operand" "rn")
                            (match_operand:SI 3 "const_int_operand" "n")))]
  "INTVAL (operands[3]) == GET_MODE_PRECISION (<MODE>mode)"
  "rotl<wd>%I2 %0,%1,%<hH>2"
  [(set_attr "type" "shift")
   (set_attr "maybe_var_shift" "yes")])
(though, there are various patterns with rotate and something around it, so perhaps it would need to be done for those too).
Comment 15 Jakub Jelinek 2020-01-30 16:20:50 UTC
(In reply to Jakub Jelinek from comment #14)
> (In reply to Segher Boessenkool from comment #13)
> > (In reply to Jakub Jelinek from comment #11)
> > > and so I'd say even if we just don't fix expand_binop, this shows an
> > > optimization opportunity for the rs6000 backend
> > > if the rotlw instruction only uses bottom 5 bits from the last operand.
> > 
> > Yes, but there is no easy way to express that.  For shifts the mask is one
> > bit *more* (in the GPRs; it is not in vectors).  But we could do more in
> > the .md, sure.
> 
> But for rotlw the mask is exactly the precision?
> If yes, there could be something like:
> (define_insn "rotl<mode>3_cntmask"
>   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>         (rotate:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
>                     (and:SI (match_operand:SI 2 "reg_or_cint_operand" "rn")
>                             (match_operand:SI 3 "const_int_operand" "n")))]
>   "INTVAL (operands[3]) == GET_MODE_PRECISION (<MODE>mode)"
>   "rotl<wd>%I2 %0,%1,%<hH>2"
>   [(set_attr "type" "shift")
>    (set_attr "maybe_var_shift" "yes")])
> (though, there are various patterns with rotate and something around it, so
> perhaps it would need to be done for those too).

Obviously not GET_MODE_PRECISION but GET_MODE_MASK.
Comment 16 GCC Commits 2020-01-30 20:32:22 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:56b92750f83724177d2c6eae30c208e935a56a37

commit r10-6358-g56b92750f83724177d2c6eae30c208e935a56a37
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Jan 30 21:28:17 2020 +0100

    combine: Punt on out of range rotate counts [PR93505]
    
    What happens on this testcase is with the out of bounds rotate we get:
    Trying 13 -> 16:
       13: r129:SI=r132:DI#0<-<0x20
          REG_DEAD r132:DI
       16: r123:DI=r129:SI<0
          REG_DEAD r129:SI
    Successfully matched this instruction:
    (set (reg/v:DI 123 [ <retval> ])
        (const_int 0 [0]))
    during combine.  So, perhaps we could also change simplify-rtx.c to punt
    if it is out of bounds rather than trying to optimize anything.
    Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't
    have out of bounds counts or introduce targetm.rotate_truncation_mask,
    we should truncate the argument instead of punting.
    Punting is better for backports though.
    
    2020-01-30  Jakub Jelinek  <jakub@redhat.com>
    
    	PR middle-end/93505
    	* combine.c (simplify_comparison) <case ROTATE>: Punt on out of range
    	rotate counts.
    
    	* gcc.c-torture/compile/pr93505.c: New test.
Comment 17 Jakub Jelinek 2020-01-31 14:47:32 UTC
The wrong-code is now fixed on the trunk.  What we want to do further depends on whether we want to make ROTATE/ROTATERT imply truncation (always or only on selected targets), let's defer that part to PR93512.
Comment 18 Segher Boessenkool 2020-02-03 14:46:01 UTC
PR93512 is marked as enhancement, but if we don't fix this it is a regression.
Comment 19 GCC Commits 2020-02-13 22:32:24 UTC
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:329475795c6eeaa2b122672091c9119b9d6c5564

commit r9-8217-g329475795c6eeaa2b122672091c9119b9d6c5564
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Jan 30 21:28:17 2020 +0100

    combine: Punt on out of range rotate counts [PR93505]
    
    What happens on this testcase is with the out of bounds rotate we get:
    Trying 13 -> 16:
       13: r129:SI=r132:DI#0<-<0x20
          REG_DEAD r132:DI
       16: r123:DI=r129:SI<0
          REG_DEAD r129:SI
    Successfully matched this instruction:
    (set (reg/v:DI 123 [ <retval> ])
        (const_int 0 [0]))
    during combine.  So, perhaps we could also change simplify-rtx.c to punt
    if it is out of bounds rather than trying to optimize anything.
    Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't
    have out of bounds counts or introduce targetm.rotate_truncation_mask,
    we should truncate the argument instead of punting.
    Punting is better for backports though.
    
    2020-01-30  Jakub Jelinek  <jakub@redhat.com>
    
    	PR middle-end/93505
    	* combine.c (simplify_comparison) <case ROTATE>: Punt on out of range
    	rotate counts.
    
    	* gcc.c-torture/compile/pr93505.c: New test.
Comment 20 Jakub Jelinek 2020-02-13 22:48:53 UTC
Fixed for 9.3+ too.
Comment 21 GCC Commits 2020-02-14 16:38:30 UTC
The releases/gcc-8 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a73ee641c3d2ca729bdf55225afd881f57bf4d96

commit r8-10013-ga73ee641c3d2ca729bdf55225afd881f57bf4d96
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Feb 14 15:46:13 2020 +0100

    combine: Punt on out of range rotate counts [PR93505]
    
    What happens on this testcase is with the out of bounds rotate we get:
    Trying 13 -> 16:
       13: r129:SI=r132:DI#0<-<0x20
          REG_DEAD r132:DI
       16: r123:DI=r129:SI<0
          REG_DEAD r129:SI
    Successfully matched this instruction:
    (set (reg/v:DI 123 [ <retval> ])
        (const_int 0 [0]))
    during combine.  So, perhaps we could also change simplify-rtx.c to punt
    if it is out of bounds rather than trying to optimize anything.
    Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't
    have out of bounds counts or introduce targetm.rotate_truncation_mask,
    we should truncate the argument instead of punting.
    Punting is better for backports though.
    
    2020-01-30  Jakub Jelinek  <jakub@redhat.com>
    
    	PR middle-end/93505
    	* combine.c (simplify_comparison) <case ROTATE>: Punt on out of range
    	rotate counts.
    
    	* gcc.c-torture/compile/pr93505.c: New test.
Comment 22 Jakub Jelinek 2020-02-14 17:31:51 UTC
Fixed for 8.4+ too.