[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]
Segher Boessenkool
segher@kernel.crashing.org
Thu Sep 24 19:36:28 GMT 2020
Hi!
On Thu, Sep 24, 2020 at 04:55:21PM +0200, Richard Biener wrote:
> Btw, on x86_64 the following produces sth reasonable:
>
> #define N 32
> typedef int T;
> typedef T V __attribute__((vector_size(N)));
> V setg (V v, int idx, T val)
> {
>   V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
>   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
>   v = (v & ~mask) | (valv & mask);
>   return v;
> }
>
>         vmovd %edi, %xmm1
>         vpbroadcastd %xmm1, %ymm1
>         vpcmpeqd .LC0(%rip), %ymm1, %ymm2
>         vpblendvb %ymm2, %ymm1, %ymm0, %ymm0
>         ret
>
> I'm quite sure you could do sth similar on power?
This only allows inserting aligned elements. Which is probably fine
of course (we don't allow elements that straddle vector boundaries
either, anyway).
And yes, we can do that :-)
That should be (the version above splats idx into valv and never uses val)
#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));
V setg (V v, int idx, T val)
{
  V valv = (V){val, val, val, val, val, val, val, val};
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  v = (v & ~mask) | (valv & mask);
  return v;
}
after which I get (-march=znver2)
setg:
        vmovd %edi, %xmm1
        vmovd %esi, %xmm2
        vpbroadcastd %xmm1, %ymm1
        vpbroadcastd %xmm2, %ymm2
        vpcmpeqd .LC0(%rip), %ymm1, %ymm1
        vpandn %ymm0, %ymm1, %ymm0
        vpand %ymm2, %ymm1, %ymm1
        vpor %ymm0, %ymm1, %ymm0
        ret
.LC0:
        .long 0
        .long 1
        .long 2
        .long 3
        .long 4
        .long 5
        .long 6
        .long 7
and for powerpc (changing it to 16B vectors, -mcpu=power9) it is
setg:
        addis 9,2,.LC0@toc@ha
        mtvsrws 32,5
        mtvsrws 33,6
        addi 9,9,.LC0@toc@l
        lxv 45,0(9)
        vcmpequw 0,0,13
        xxsel 34,34,33,32
        blr
.LC0:
        .long 0
        .long 1
        .long 2
        .long 3
(We can generate that 0..3 vector without doing loads; I guess x86 can
do that as well?  But it takes more than one insn to do.  Of course,
*with* the load we have to set up the memory address first as well, heh.)
For power8 it becomes (we need to splat in separate insns):
setg:
        addis 9,2,.LC0@toc@ha
        mtvsrwz 32,5
        mtvsrwz 33,6
        addi 9,9,.LC0@toc@l
        lxvw4x 45,0,9
        xxspltw 32,32,1
        xxspltw 33,33,1
        vcmpequw 0,0,13
        xxsel 34,34,33,32
        blr
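For reference, a minimal sketch of what the 16-byte variant used for the
powerpc output might look like.  The mail only shows the 32-byte source,
so this is a reconstruction under that assumption, and the names V4,
setg4, setg4_check are hypothetical.  The check routine compares the
mask-based insert against a plain variable-index store, and also shows
that an out-of-range index leaves the vector untouched (no lane matches,
so the mask is all zeros).

```c
#include <assert.h>

typedef int T;
/* Assumed 16-byte vector type: 4 lanes of int (not shown in the mail).  */
typedef T V4 __attribute__((vector_size(16)));

/* Mask-based insert, same shape as setg above but with 4 lanes.  */
V4 setg4 (V4 v, int idx, T val)
{
  V4 valv = (V4){val, val, val, val};
  V4 idxv = (V4){idx, idx, idx, idx};
  /* Lane idx compares equal and becomes all-ones; the others are 0.  */
  V4 mask = ((V4){0, 1, 2, 3} == idxv);
  v = (v & ~mask) | (valv & mask);
  return v;
}

/* Hypothetical self-check: compare with a scalar element store.  */
void setg4_check (void)
{
  V4 v = {10, 20, 30, 40};

  for (int idx = 0; idx < 4; idx++)
    {
      V4 expect = v;
      expect[idx] = 99;               /* variable-index store */
      V4 got = setg4 (v, idx, 99);
      for (int i = 0; i < 4; i++)
        assert (got[i] == expect[i]);
    }

  /* Out-of-range index: no lane matches, so nothing is inserted.  */
  V4 same = setg4 (v, 7, 99);
  for (int i = 0; i < 4; i++)
    assert (same[i] == v[i]);
}
```

Calling setg4_check () from a small main and building with GCC (or any
compiler that implements the vector_size extension) should run through
without an assertion failure.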
Segher