[Bug target/106453] New: Redundant zero extension after crc32q

Wed Jul 27 09:55:09 GMT 2022

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106453

            Bug ID: 106453
           Summary: Redundant zero extension after crc32q
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

On 64-bit x86, a straightforward use of the SSE 4.2 crc32 instruction looks like

#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

uint32_t f(uint32_t c, uint64_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c = _mm_crc32_u64(c, p[i]);
    return c;
}

At the ISA level, the crc32q instruction takes 64-bit operands, and the
resulting assembly is (gcc -O2 -msse4.2):

f:
        mov     eax, edi
        test    rdx, rdx
        je      .L1
        lea     rdx, [rsi+rdx*8]
.L3:
        mov     eax, eax
        add     rsi, 8
        crc32   rax, QWORD PTR [rsi-8]
        cmp     rdx, rsi
        jne     .L3
.L1:
        ret

Note the zero-extension of 'eax' inside the loop ('mov eax, eax'), which is
usually not move-eliminated since the destination is the same as the source.

The crc32q instruction zero-extends its 32-bit result into rax (it also ignores
the high 32 bits when reading the destination operand), so I think it should be
possible to model the zero extension in the .md pattern, allowing the explicit
extension to be eliminated.
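
To spell out the property being relied on, here is a rough software model of
crc32q in C. It is only a sketch: the name crc32q_model is made up for
illustration, and the bitwise loop assumes the instruction performs a plain
CRC-32C update (reflected polynomial 0x82F63B78, no initial or final
inversion). The point is just that the high half of the destination never
affects the result, and that the result always fits in 32 bits.

```c
#include <stdint.h>

/* Hypothetical model of crc32q (not part of GCC or of the testcase).
   Assumes a raw CRC-32C update with reflected polynomial 0x82F63B78. */
static uint64_t crc32q_model(uint64_t dest, uint64_t src)
{
    /* Only the low 32 bits of the destination are read. */
    uint32_t crc = (uint32_t)dest;

    /* Feed the eight source bytes, least significant byte first. */
    for (int byte = 0; byte < 8; byte++) {
        crc ^= (uint8_t)(src >> (8 * byte));
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }

    /* The 32-bit result is zero-extended to 64 bits on return. */
    return crc;
}
```

With this model, garbage in the upper half of the destination makes no
difference, which is why an explicit 'mov eax, eax' before each crc32q should
not be needed.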

A source-level workaround is to use a 64-bit variable in the loop, so that the
extension happens just once, before the loop.
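
A minimal sketch of that workaround, assuming the same intrinsic as in the
testcase. The name f_wide is hypothetical, and the target attribute is only
there so the snippet builds without -msse4.2; it can be dropped when compiling
with the flags shown earlier.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of f() that keeps the running CRC in a 64-bit
   variable. The attribute is an assumption made for self-containment. */
__attribute__((target("sse4.2")))
uint32_t f_wide(uint32_t c, uint64_t *p, size_t n)
{
    uint64_t c64 = c;   /* zero-extended once, before the loop */
    for (size_t i = 0; i < n; i++)
        c64 = _mm_crc32_u64(c64, p[i]);
    return (uint32_t)c64;   /* the result fits in the low 32 bits */
}
```

Since _mm_crc32_u64 both takes and returns a 64-bit value, the loop body no
longer narrows to 32 bits and widens again on each iteration.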