[Bug target/97194] optimize vector element set/extract at variable position
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Sep 24 14:39:47 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for set with T == int and N == 32 we could generate

        vmovd   %edi, %xmm1                     # idx
        vpbroadcastd    %xmm1, %ymm1
        vpcmpeqd        .LC0(%rip), %ymm1, %ymm2        # mask = (lane == idx)
        vmovd   %esi, %xmm1                     # val
        vpbroadcastd    %xmm1, %ymm1
        vpblendvb       %ymm2, %ymm1, %ymm0, %ymm0
        ret
.LC0:
        .long   0
        .long   1
        .long   2
        .long   3
        .long   4
        .long   5
        .long   6
        .long   7

i.e., with GCC generic vectors

V setg (V v, int idx, T val)
{
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  v = (v & ~mask) | (valv & mask);
  return v;
}
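
For anyone who wants to try the compare-and-blend idea outside the compiler, here is a minimal self-contained sketch. The typedefs and the name set_lane are assumptions made for illustration (the comment above only uses V and T); the index and the value are broadcast into separate vectors so that the selected lane receives val.

```c
/* Sketch only: T, V and set_lane are illustrative names.  Assumes GCC (or
   clang) generic vector extensions; 32 bytes of int gives 8 lanes.  */
typedef int T;
typedef T V __attribute__ ((vector_size (32)));

V
set_lane (V v, int idx, T val)
{
  V iota = (V){0, 1, 2, 3, 4, 5, 6, 7};
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  /* A vector comparison yields -1 (all bits set) in lanes that match
     and 0 in all other lanes.  */
  V mask = (iota == idxv);
  /* Keep v where the mask is clear, take val where it is set.  */
  return (v & ~mask) | (valv & mask);
}
```

Calling set_lane (v, 3, 99) on {10, 11, ..., 17} replaces only lane 3. Note that with this formulation an out-of-range idx matches no lane and leaves v unchanged, which differs from v[idx] = val, where an out-of-range index is undefined.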
There's ongoing patch iteration on the mailing list adding variable-index
vec_set expanders for powerpc (and the related middle-end changes). The
question is whether the optabs code should try several strategies itself
or whether the target should make the choice (probably the better option).
There may also be a more efficient way to generate {0, 1, 2, 3...}.