[Bug target/85538] kortest for 32 and 64 bit masks incorrectly uses k0

kretz at kde dot org gcc-bugzilla@gcc.gnu.org
Fri Apr 27 08:51:00 GMT 2018


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538

--- Comment #3 from Matthias Kretz <kretz at kde dot org> ---
Some more observations:

1. The instruction sequence:

    kmovq %k1,-0x8(%rsp)
    vmovq -0x8(%rsp),%xmm1
    vmovq %xmm1,%rax
    kmovq %rax,%k0

   should be a simple `kmovq %k1,%k0` instead.

2. Adding `asm("");` before the compare intrinsic makes the problem go away.

3. Using inline asm in place of the kortest intrinsic shows the same preference
for using the k0 register. Test case:

    void bad(__m512i x, __m512i y) {
        auto k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
        asm("kmovq %0,%%rax" :: "k"(k));
    }

4. The following test case still unnecessarily prefers k0, but does so with a
nicer `kmovq %k1,%0`:

    auto almost_good(__m512i x, __m512i y) {
        auto k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
        asm("kmovq %0, %0" : "+k"(k));
        return k;
    }

(cf. https://godbolt.org/g/hZTga4 for 2, 3 and 4)
