[Bug target/106453] New: Redundant zero extension after crc32q
amonakov at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed Jul 27 09:55:09 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106453
Bug ID: 106453
Summary: Redundant zero extension after crc32q
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
On 64-bit x86, a straightforward use of the SSE 4.2 crc32 instruction looks like:

    #include <immintrin.h>
    #include <stdint.h>

    uint32_t f(uint32_t c, uint64_t *p, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            c = _mm_crc32_u64(c, p[i]);
        return c;
    }

On the ISA level, the crc32q instruction takes 64-bit operands, and the
resulting assembly is (gcc -O2 -msse4.2):

    f:
            mov     eax, edi
            test    rdx, rdx
            je      .L1
            lea     rdx, [rsi+rdx*8]
    .L3:
            mov     eax, eax
            add     rsi, 8
            crc32   rax, QWORD PTR [rsi-8]
            cmp     rdx, rsi
            jne     .L3
    .L1:
            ret

Note the zero-extension of 'eax' inside the loop ('mov eax, eax'), which is
usually not move-eliminated since the destination is the same as the source.
The crc32q instruction zero-extends rax from the 32-bit result (it also ignores
the high 32 bits when reading the destination operand), so I think it should be
possible to model the zero extension in the .md pattern, which would allow the
explicit extension to be eliminated.
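
To make the semantics above concrete, here is a rough software model of the
64-bit form of the instruction. It is only an illustrative sketch: the name
crc32q_model is hypothetical, and the model assumes the usual bit-reflected
CRC-32C formulation (constant 0x82F63B78, source bytes consumed least
significant first, no pre- or post-inversion). The two points that matter for
this report are the truncation on input and the zero-extended return value.

```c
#include <stdint.h>

/* Hypothetical software model of "crc32 r64, r/m64" (not GCC code).
   Assumes reflected CRC-32C with constant 0x82F63B78 and no inversion. */
uint64_t crc32q_model(uint64_t dest, uint64_t src)
{
    /* Only the low 32 bits of the destination operand are read. */
    uint32_t crc = (uint32_t)dest;

    for (int byte = 0; byte < 8; byte++) {
        /* Fold in the source one byte at a time, least significant first. */
        crc ^= (uint32_t)((src >> (8 * byte)) & 0xFFu);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }

    /* The 32-bit result is written back zero-extended to 64 bits. */
    return (uint64_t)crc;
}
```

Under this model, garbage in the upper half of the accumulator never reaches
the result, and the upper half of the result is always zero, which is why the
'mov eax, eax' in the loop has no effect on the computed value.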
A source-level workaround is using a 64-bit variable in the loop, so the
extension happens just once before the loop.
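
A minimal sketch of that workaround, for illustration. The name f_wide is
hypothetical, and the target attribute is just a convenience assumed here so
the snippet builds without -msse4.2 on the command line; it is not part of
the original testcase. The accumulator is widened once on entry and narrowed
once on return, so the loop body carries a 64-bit value throughout.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Workaround sketch: keep the running CRC in a 64-bit variable.
   The target attribute is an assumption made for standalone compilation. */
__attribute__((target("sse4.2")))
uint32_t f_wide(uint32_t c, const uint64_t *p, size_t n)
{
    uint64_t acc = c;                  /* zero-extended once, before the loop */

    for (size_t i = 0; i < n; i++)
        acc = _mm_crc32_u64(acc, p[i]); /* 64-bit in, 64-bit out: no narrowing */

    return (uint32_t)acc;              /* truncated once, after the loop */
}
```

Whether the compiler then drops the in-loop 'mov eax, eax' should be checked
in the generated assembly for the GCC version in use.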