[PATCH] Fix *r<noxa>sbg_sidi_srl pattern (PR target/89369)

Mon Feb 18 11:17:00 GMT 2019

On 16.02.19 19:38, Jakub Jelinek wrote:
> Hi!
> 
> The following patch fixes wrong-code on the following testcase extracted
> from pseudo-RNG with e.g. -march=zEC12 -O2.
> The problem is in the instruction emitted by the *r<noxa>sbg_sidi_srl
> patterns.  We have in *.final correct:
> (insn 67 65 68 (parallel [
>             (set (reg:SI 1 %r1 [189])
>                 (xor:SI (subreg:SI (zero_extract:DI (reg/v:DI 11 %r11 [orig:89 th ] [89])
>                             (const_int 32 [0x20])
>                             (const_int 24 [0x18])) 4)
>                     (reg:SI 1 %r1 [187])))
>             (clobber (reg:CC 33 %cc))
>         ]) "pr89369.c":44:73 1419 {*rxsbg_sidi_srl}
>      (expr_list:REG_DEAD (reg/v:DI 11 %r11 [orig:89 th ] [89])
>         (expr_list:REG_UNUSED (reg:CC 33 %cc)
>             (nil))))
> which is effectively (reg:SI %r1) ^= (unsigned) ((reg:DI %r11) >> 8).
> But the pattern emits rxsbg	%r1,%r11,40,63,56 which is effectively
> (reg:SI %r1) ^= ((unsigned) ((reg:DI %r11) >> 8) & 0xffffff)
> or equivalently
> (reg:SI %r1) ^= ((reg:SI %r11) >> 8).  Even in the pattern one can see
> that it wants to extract exactly 32 bits always, no matter what the shift
> count is.  Fixed by always emitting 32,63,(32+pos from zero extract).
> On that pr89369.c testcase, the patch also changes
> -	rxsbg	%r12,%r9,64,63,32
> -	rxsbg	%r12,%r1,64,63,32
> +	rxsbg	%r12,%r9,32,63,32
> +	rxsbg	%r12,%r1,32,63,32
> and
> -	rxsbg	%r1,%r3,64,63,32
> +	rxsbg	%r1,%r3,32,63,32
> which are all with zero_extract with len 32 and pos 0, so again, it wants
> to extract the low 32 bits.  I3 64 larger than I4 63 is just weird.
> The patch also changes the instructions emitted in rXsbg_mode_sXl.c:
> -	rosbg	%r2,%r3,34,63,62
> +	rosbg	%r2,%r3,32,63,62
> and
> -	rxsbg	%r2,%r3,34,63,62
> +	rxsbg	%r2,%r3,32,63,62
> Here, it is
> __attribute__ ((noinline)) unsigned int
> rosbg_si_srl (unsigned int a, unsigned int b)
> {
>   return a | (b >> 2);
> }
> __attribute__ ((noinline)) unsigned int
> rxsbg_si_srl (unsigned int a, unsigned int b)
> {
>   return a ^ (b >> 2);
> }
> so from quick POV, one might think 34,63,62 is better, as we want to or in
> just the 30 bits from b.  Both should actually work the same though, because
> (subreg/s/v:SI (reg/v:DI 64 [ b+-4 ]) 4) - the b argument is passed zero
> extended and so it doesn't really matter how many bits we extract, as long
> as it is 30 or more.  If I try instead:
> __attribute__ ((noinline, noipa)) unsigned int
> rosbg_si_srl (unsigned int a, unsigned long long b)
> {
>   return a | ((unsigned) b >> 2);
> }
> __attribute__ ((noinline, noipa)) unsigned int
> rxsbg_si_srl (unsigned int a, unsigned long long b)
> {
>   return a ^ ((unsigned) b >> 2);
> }
> then both the unpatched and patched compiler emit properly
>         rosbg   %r2,%r3,34,63,62
> and
>         rxsbg   %r2,%r3,34,63,62
> through a different pattern, because in that case we must not rely on the
> upper 32 bits of b being zero.
> 
> In addition to this change, the patch adds a cleanup, there is no reason to
> use a static buffer in each instruction and increase global state, we can
> just tweak the arguments and let the caller deal with it.  That is something
> normally done in other parts of the s390.md as well.  As small CONST_INTs
> are hashed, it shouldn't increase compile time memory.
> 
> Bootstrapped/regtested on s390x-linux, ok for trunk?
> 
> 2019-02-16  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/89369
> 	* config/s390/s390.md (*r<noxa>sbg_<mode>_srl_bitmask,
> 	*r<noxa>sbg_<mode>_sll, *r<noxa>sbg_<mode>_srl): Don't construct
> 	pattern in a temporary buffer.
> 	(*r<noxa>sbg_sidi_srl): Likewise.  Always use 32 as I3 rather
> 	than 64-operands[2].
> 
> 	* gcc.c-torture/execute/pr89369.c: New test.
> 	* gcc.target/s390/md/rXsbg_mode_sXl.c (rosbg_si_srl,
> 	rxsbg_si_srl): Expect last 3 operands 32,63,62 rather than
> 	34,63,62.

Ok. Thanks!

Andreas