This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/85324] New: missing constant propagation on SSE/AVX conversion intrinsics
- From: "kretz at kde dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 10 Apr 2018 14:48:30 +0000
- Subject: [Bug target/85324] New: missing constant propagation on SSE/AVX conversion intrinsics
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85324
Bug ID: 85324
Summary: missing constant propagation on SSE/AVX conversion intrinsics
Product: gcc
Version: 8.0.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
The following test case shows that GCC does not propagate constants through
the SSE/AVX conversion intrinsics:
#include <x86intrin.h>
template <class T> using V [[gnu::vector_size(16)]] = T;
// missed optimization:
auto a1() { return 1 + (V< int>)_mm_cvttps_epi32(_mm_set1_ps(1.f)); }
auto b1() { return 1 + (V< long>)_mm_cvttps_epi64(_mm_set1_ps(1.f)); }
auto c1() { return 1 + (V< int>)_mm_cvttpd_epi32(_mm_set1_pd(1.)); }
auto d1() { return 1 + (V< long>)_mm_cvttpd_epi64(_mm_set1_pd(1.)); }
auto e1() { return 1 + (V<short>)_mm_cvtepi32_epi16(_mm_set1_epi32(1)); }
The resulting asm (compiled with `-O3 -march=skylake-avx512 -std=c++17`) is:
a1():
        vcvttps2dq .LC0(%rip), %xmm0
        vpaddd %xmm0, %xmm0, %xmm0
        ret
b1():
        vcvttps2qq .LC0(%rip), %xmm0
        vpaddq %xmm0, %xmm0, %xmm0
        ret
c1():
        vmovdqa64 .LC1(%rip), %xmm0
        vcvttpd2dqx .LC5(%rip), %xmm1
        vpaddd %xmm0, %xmm1, %xmm0
        ret
d1():
        vcvttpd2qq .LC5(%rip), %xmm0
        vpaddq %xmm0, %xmm0, %xmm0
        ret
e1():
        vmovdqa64 .LC7(%rip), %xmm1
        vmovdqa64 .LC1(%rip), %xmm0
        vpmovdw %xmm0, %xmm0
        vpaddw %xmm1, %xmm0, %xmm0
        ret
Each function should compile to a single load of a constant. (A wrapper using
__builtin_constant_p can work around the problem; cf. https://godbolt.org/g/8dta7B)
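
For illustration, here is a minimal sketch of what such a wrapper might look like for the a1() case. It is not copied from the linked example: the names v4si, cvtt_ps_epi32 and a1_wrapped are hypothetical, and whether GCC actually folds the constant path depends on the compiler version and optimization level. The plain casts also only match the intrinsic for inputs that are representable as int.

```cpp
#include <x86intrin.h>

// 16-byte vector of four ints, analogous to V<int> in the test case.
typedef int v4si __attribute__((vector_size(16)));

// Hypothetical wrapper (name and shape are assumptions, not taken from the
// linked example): truncating float -> int32 conversion.
inline v4si cvtt_ps_epi32(__m128 x) {
    // If the compiler can prove every lane is a compile-time constant,
    // convert lane by lane with plain casts, which it is able to fold.
    // The casts truncate toward zero like the intrinsic, but only for
    // values that fit in an int.
    if (__builtin_constant_p(x[0]) && __builtin_constant_p(x[1]) &&
        __builtin_constant_p(x[2]) && __builtin_constant_p(x[3])) {
        v4si r = {static_cast<int>(x[0]), static_cast<int>(x[1]),
                  static_cast<int>(x[2]), static_cast<int>(x[3])};
        return r;
    }
    // Otherwise fall back to the intrinsic.
    return (v4si)_mm_cvttps_epi32(x);
}

// Counterpart of a1() from the test case, routed through the wrapper.
inline v4si a1_wrapped() { return 1 + cvtt_ps_epi32(_mm_set1_ps(1.f)); }
```

Both branches return the same value for in-range inputs, so the wrapper is safe to use even when the __builtin_constant_p test evaluates to false (for example at -O0).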