[Bug target/85538] kortest for 32 and 64 bit masks incorrectly uses k0
kretz at kde dot org
gcc-bugzilla@gcc.gnu.org
Fri Apr 27 08:51:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538
--- Comment #3 from Matthias Kretz <kretz at kde dot org> ---
Some more observations:
1. The instruction sequence:
kmovq %k1,-0x8(%rsp)
vmovq -0x8(%rsp),%xmm1
vmovq %xmm1,%rax
kmovq %rax,%k0
should be a simple `kmovq %k1,%k0` instead.
2. Adding `asm("");` before the compare intrinsic makes the problem go away.
3. Using inline asm in place of the kortest intrinsic shows the same preference
for using the k0 register. Test case:
void bad(__m512i x, __m512i y) {
    auto k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
    asm("kmovq %0,%%rax" :: "k"(k));
}
4. The following test case still unnecessarily prefers k0, but does so with a
nicer `kmovq %k1,%0`:
auto almost_good(__m512i x, __m512i y) {
    auto k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
    asm("kmovq %0, %0" : "+k"(k));
    return k;
}
(cf. https://godbolt.org/g/hZTga4 for 2, 3 and 4)