[Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
luoxhu at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Jun 21 02:29:59 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866
--- Comment #8 from luoxhu at gcc dot gnu.org ---
(In reply to Jens Seifert from comment #7)
> Regarding vec_revb for vector unsigned int. I agree that
> revb:
> .LFB0:
> .cfi_startproc
> vspltish %v1,8
> vspltisw %v0,-16
> vrlh %v2,%v2,%v1
> vrlw %v2,%v2,%v0
> blr
>
> works. But in this case, I would prefer the vperm approach assuming that the
> loaded constant for the permute vector can be re-used multiple times.
> But please get rid of the xxlnor 32,32,32. That does not make sense after
> loading a constant. Change the constant that need to be loaded.
The xxlnor is an LE-specific requirement (it is not generated when building
with -mbig). On LE we need to turn the indices {0,1,2,3} into {31,30,29,28}
before vperm uses them; without that step the result is incorrect:
6| 0x0000000010000630 <+16>: lvx v0,0,r9
7+> 0x0000000010000634 <+20>: xxlnor vs32,vs32,vs32
8| 0x0000000010000638 <+24>: vperm v2,v2,v2,v0
9| 0x000000001000063c <+28>: blr
(gdb)
0x0000000010000634 in revb ()
2: /x $vs34.uint128 = 0x42345678323456782234567812345678
5: /x $vs32.uint128 = 0xc0d0e0f08090a0b0405060700010203
(gdb) si
0x0000000010000638 in revb ()
2: /x $vs34.uint128 = 0x42345678323456782234567812345678
5: /x $vs32.uint128 = 0xf3f2f1f0f7f6f5f4fbfaf9f8fffefdfc
(gdb) si
0x000000001000063c in revb ()
2: /x $vs34.uint128 = 0x78563442785634327856342278563412
5: /x $vs32.uint128 = 0xf3f2f1f0f7f6f5f4fbfaf9f8fffefdfc
Quoted from the ISA:
vperm VRT,VRA,VRB,VRC
vsrc.qword[0] ← VSR[VRA+32]
vsrc.qword[1] ← VSR[VRB+32]
do i = 0 to 15
   index ← VSR[VRC+32].byte[i].bit[3:7]
   VSR[VRT+32].byte[i] ← src.byte[index]
end
Let the source vector be the concatenation of the
contents of VSR[VRA+32] followed by the contents of
VSR[VRB+32].
For each integer value i from 0 to 15, do the following.
Let index be the value specified by bits 3:7 of byte
element i of VSR[VRC+32].
The contents of byte element index of src are
placed into byte element i of VSR[VRT+32].
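
The quoted pseudocode and the gdb trace above can be cross-checked with a
small C model. This is only a sketch, not GCC code: the names vreg,
vperm_model, xxlnor_model, from_words and trace_matches are made up for
illustration, and byte[0] is assumed to be the leftmost byte of the value as
gdb prints it in $vsNN.uint128. Since vperm only looks at bits 3:7 of each
control byte, complementing a control byte x selects index 31 - x.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit register: byte[0] is the leftmost
   (most significant) byte of the uint128 value shown by gdb. */
typedef struct { uint8_t byte[16]; } vreg;

/* Build a register from four 32-bit words, leftmost word first. */
static vreg from_words(uint32_t w0, uint32_t w1, uint32_t w2, uint32_t w3)
{
    uint32_t w[4] = { w0, w1, w2, w3 };
    vreg r;
    for (int i = 0; i < 16; i++)
        r.byte[i] = (uint8_t)(w[i / 4] >> (24 - 8 * (i % 4)));
    return r;
}

/* vperm VRT,VRA,VRB,VRC following the quoted ISA pseudocode. */
static vreg vperm_model(vreg vra, vreg vrb, vreg vrc)
{
    uint8_t src[32];
    vreg vrt;
    memcpy(src, vra.byte, 16);        /* src = VRA followed by VRB */
    memcpy(src + 16, vrb.byte, 16);
    for (int i = 0; i < 16; i++) {
        unsigned index = vrc.byte[i] & 0x1f;  /* bits 3:7 = low five bits */
        vrt.byte[i] = src[index];
    }
    return vrt;
}

/* xxlnor XT,XA,XB: bitwise NOR; with XA == XB this is a complement. */
static vreg xxlnor_model(vreg a, vreg b)
{
    vreg t;
    for (int i = 0; i < 16; i++)
        t.byte[i] = (uint8_t)~(a.byte[i] | b.byte[i]);
    return t;
}

/* Replays the register values from the gdb session; returns 1 if the
   model reproduces the final value of vs34. */
static int trace_matches(void)
{
    vreg v2  = from_words(0x42345678u, 0x32345678u, 0x22345678u, 0x12345678u);
    vreg ctl = from_words(0x0c0d0e0fu, 0x08090a0bu, 0x04050607u, 0x00010203u);
    vreg want = from_words(0x78563442u, 0x78563432u, 0x78563422u, 0x78563412u);

    vreg nctl = xxlnor_model(ctl, ctl);
    vreg got  = vperm_model(v2, v2, nctl);
    return memcmp(got.byte, want.byte, 16) == 0;
}
```

Calling trace_matches() from a main function should return 1. Feeding the
uncomplemented control vector to vperm_model instead only reorders the four
words and leaves the bytes inside each word unswapped, which is the incorrect
result mentioned above.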