[PATCH] Fix *r<noxa>sbg_sidi_srl pattern (PR target/89369)
Andreas Krebbel
krebbel@linux.ibm.com
Mon Feb 18 11:17:00 GMT 2019
On 16.02.19 19:38, Jakub Jelinek wrote:
> Hi!
>
> The following patch fixes wrong-code on the following testcase extracted
> from pseudo-RNG with e.g. -march=zEC12 -O2.
> The problem is in the instruction emitted by the *r<noxa>sbg_sidi_srl
> patterns. We have in *.final correct:
> (insn 67 65 68 (parallel [
> (set (reg:SI 1 %r1 [189])
> (xor:SI (subreg:SI (zero_extract:DI (reg/v:DI 11 %r11 [orig:89 th ] [89])
> (const_int 32 [0x20])
> (const_int 24 [0x18])) 4)
> (reg:SI 1 %r1 [187])))
> (clobber (reg:CC 33 %cc))
> ]) "pr89369.c":44:73 1419 {*rxsbg_sidi_srl}
> (expr_list:REG_DEAD (reg/v:DI 11 %r11 [orig:89 th ] [89])
> (expr_list:REG_UNUSED (reg:CC 33 %cc)
> (nil))))
> which is effectively (reg:SI %r1) ^= (unsigned) ((reg:DI %r11) >> 8).
> But the pattern emits rxsbg %r1,%r11,40,63,56 which is effectively
> (reg:SI %r1) ^= ((unsigned) ((reg:DI %r11) >> 8) & 0xffffff)
> or equivalently
> (reg:SI %r1) ^= ((reg:SI %r11) >> 8). Even in the pattern one can see
> that it wants to extract exactly 32 bits always, no matter what the shift
> count is. Fixed by always emitting 32,63,(32+pos from zero extract).
> On that pr89369.c testcase, the patch also changes
> - rxsbg %r12,%r9,64,63,32
> - rxsbg %r12,%r1,64,63,32
> + rxsbg %r12,%r9,32,63,32
> + rxsbg %r12,%r1,32,63,32
> and
> - rxsbg %r1,%r3,64,63,32
> + rxsbg %r1,%r3,32,63,32
> which are all with zero_extract with len 32 and pos 0, so again, it wants
> to extract the low 32 bits. I3 64 larger than I4 63 is just weird.
> The patch also changes the instructions emitted in rXsbg_mode_sXl.c:
> - rosbg %r2,%r3,34,63,62
> + rosbg %r2,%r3,32,63,62
> and
> - rxsbg %r2,%r3,34,63,62
> + rxsbg %r2,%r3,32,63,62
> Here, it is
> __attribute__ ((noinline)) unsigned int
> rosbg_si_srl (unsigned int a, unsigned int b)
> {
> return a | (b >> 2);
> }
> __attribute__ ((noinline)) unsigned int
> rxsbg_si_srl (unsigned int a, unsigned int b)
> {
> return a ^ (b >> 2);
> }
> so from quick POV, one might think 34,63,62 is better, as we want to or in
> just the 30 bits from b. Both should actually work the same though, because
> (subreg/s/v:SI (reg/v:DI 64 [ b+-4 ]) 4) - the b argument is passed zero
> extended and so it doesn't really matter how many bits we extract, as long
> as it is 30 or more. If I try instead:
> __attribute__ ((noinline, noipa)) unsigned int
> rosbg_si_srl (unsigned int a, unsigned long long b)
> {
> return a | ((unsigned) b >> 2);
> }
> __attribute__ ((noinline, noipa)) unsigned int
> rxsbg_si_srl (unsigned int a, unsigned long long b)
> {
> return a ^ ((unsigned) b >> 2);
> }
> then both the unpatched and patched compiler emit properly
> rosbg %r2,%r3,34,63,62
> and
> rxsbg %r2,%r3,34,63,62
> through a different pattern, because in that case we must not rely on the
> upper 32 bits of b being zero.
>
> In addition to this change, the patch adds a cleanup, there is no reason to
> use a static buffer in each instruction and increase global state, we can
> just tweak the arguments and let the caller deal with it. That is something
> normally done in other parts of the s390.md as well. As small CONST_INTs
> are hashed, it shouldn't increase compile time memory.
>
> Bootstrapped/regtested on s390x-linux, ok for trunk?
>
> 2019-02-16 Jakub Jelinek <jakub@redhat.com>
>
> PR target/89369
> * config/s390/s390.md (*r<noxa>sbg_<mode>_srl_bitmask,
> *r<noxa>sbg_<mode>_sll, *r<noxa>sbg_<mode>_srl): Don't construct
> pattern in a temporary buffer.
> (*r<noxa>sbg_sidi_srl): Likewise. Always use 32 as I3 rather
> than 64-operands[2].
>
> * gcc.c-torture/execute/pr89369.c: New test.
> * gcc.target/s390/md/rXsbg_mode_sXl.c (rosbg_si_srl,
> rxsbg_si_srl): Expect last 3 operands 32,63,62 rather than
> 34,63,62.
Ok. Thanks!
Andreas
More information about the Gcc-patches
mailing list