[Bug target/87317] New: Missed optimisation: merging VMOVQ with operations that only use the low 8 bytes
thiago at kde dot org
gcc-bugzilla@gcc.gnu.org
Sat Sep 15 06:22:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87317
Bug ID: 87317
Summary: Missed optimisation: merging VMOVQ with operations
that only use the low 8 bytes
Product: gcc
Version: 8.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: thiago at kde dot org
Target Milestone: ---
Test:
#include <immintrin.h>
int f(void *ptr)
{
    __m128i data = _mm_loadl_epi64((__m128i *)ptr);
    data = _mm_cvtepu8_epi16(data);
    return _mm_cvtsi128_si32(data);
}
GCC generates (-march=haswell or -march=skylake):
        vmovq       (%rdi), %xmm0
        vpmovzxbw   %xmm0, %xmm0
        vmovd       %xmm0, %eax
        ret
Note that the VPMOVZXBW instruction reads only the low 8 bytes of its source,
even when the source is a memory operand, so the separate VMOVQ load can be
folded into it. Both Clang and ICC generate:
        vpmovzxbw   (%rdi), %xmm0
        vmovd       %xmm0, %eax
        retq
Similarly for:
void f(void *dst, void *ptr)
{
    __m128i data = _mm_cvtsi32_si128(*(int*)ptr);
    data = _mm_cvtepu8_epi32(data);
    _mm_storeu_si128((__m128i*)dst, data);
}
GCC:
        vmovd       (%rsi), %xmm0
        vpmovzxbd   %xmm0, %xmm0
        vmovups     %xmm0, (%rdi)
        ret
Clang and ICC:
        vpmovzxbd   (%rsi), %xmm0
        vmovdqu     %xmm0, (%rdi)
        retq
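For reference, here is a rough scalar sketch in plain C of what both tests
compute, to make explicit that only 8 (respectively 4) bytes of memory are
ever read, which is why the load can be folded into the zero-extension. The
names model_f_bw, model_f_bd and model_demo are hypothetical and exist only
for this illustration; this is a model of the expected semantics, not of
anything GCC does internally.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of the first test: read exactly 8 bytes, zero-extend each byte to
   16 bits (VPMOVZXBW), then return the low 32 bits, i.e. words 0 and 1. */
int model_f_bw(const void *ptr)
{
    uint8_t bytes[8];
    uint16_t words[8];

    memcpy(bytes, ptr, sizeof bytes);      /* the only memory read: 8 bytes */
    for (int i = 0; i < 8; i++)
        words[i] = bytes[i];               /* zero-extend u8 -> u16 */

    return (int)((uint32_t)words[0] | ((uint32_t)words[1] << 16));
}

/* Model of the second test: read exactly 4 bytes, zero-extend each byte to
   32 bits (VPMOVZXBD), then store all 16 bytes of the result. */
void model_f_bd(void *dst, const void *ptr)
{
    uint8_t bytes[4];
    uint32_t dwords[4];

    memcpy(bytes, ptr, sizeof bytes);      /* the only memory read: 4 bytes */
    for (int i = 0; i < 4; i++)
        dwords[i] = bytes[i];              /* zero-extend u8 -> u32 */

    memcpy(dst, dwords, sizeof dwords);    /* unaligned 16-byte store */
}

/* Small usage example on a known buffer. */
void model_demo(void)
{
    const uint8_t src[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    uint32_t out[4];

    assert(model_f_bw(src) == 0x00020001);
    model_f_bd(out, src);
    assert(out[0] == 1 && out[1] == 2 && out[2] == 3 && out[3] == 4);
}
```

Neither model depends on bytes beyond the 8 or 4 it copies, which matches the
memory-operand forms that Clang and ICC emit.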
Other instructions that read only part of their source operand might benefit
from the same folding. AVX-512 instructions with a memory operand and a
constant OpMask might be candidates too.