This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug target/85324] New: missing constant propagation on SSE/AVX conversion intrinsics

From: "kretz at kde dot org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Tue, 10 Apr 2018 14:48:30 +0000
Subject: [Bug target/85324] New: missing constant propagation on SSE/AVX conversion intrinsics
Auto-submitted: auto-generated

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85324

            Bug ID: 85324
           Summary: missing constant propagation on SSE/AVX conversion
                    intrinsics
           Product: gcc
           Version: 8.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---

The following test case shows that GCC does not constant-fold the SSE/AVX
conversion intrinsics, even when every operand is a compile-time constant:

#include <x86intrin.h>

template <class T> using V [[gnu::vector_size(16)]] = T;

// missed optimization:
auto a1() { return 1 + (V<  int>)_mm_cvttps_epi32(_mm_set1_ps(1.f)); }
auto b1() { return 1 + (V< long>)_mm_cvttps_epi64(_mm_set1_ps(1.f)); }
auto c1() { return 1 + (V<  int>)_mm_cvttpd_epi32(_mm_set1_pd(1.)); }
auto d1() { return 1 + (V< long>)_mm_cvttpd_epi64(_mm_set1_pd(1.)); }
auto e1() { return 1 + (V<short>)_mm_cvtepi32_epi16(_mm_set1_epi32(1)); }

The resulting asm (compiled with `-O3 -march=skylake-avx512 -std=c++17`) is:
a1():
  vcvttps2dq .LC0(%rip), %xmm0
  vpaddd %xmm0, %xmm0, %xmm0
  ret
b1():
  vcvttps2qq .LC0(%rip), %xmm0
  vpaddq %xmm0, %xmm0, %xmm0
  ret
c1():
  vmovdqa64 .LC1(%rip), %xmm0
  vcvttpd2dqx .LC5(%rip), %xmm1
  vpaddd %xmm0, %xmm1, %xmm0
  ret
d1():
  vcvttpd2qq .LC5(%rip), %xmm0
  vpaddq %xmm0, %xmm0, %xmm0
  ret
e1():
  vmovdqa64 .LC7(%rip), %xmm1
  vmovdqa64 .LC1(%rip), %xmm0
  vpmovdw %xmm0, %xmm0
  vpaddw %xmm1, %xmm0, %xmm0
  ret

Each function should compile to a single load of a constant (e.g. a1() should
just return a vector of all 2s). A wrapper using __builtin_constant_p can work
around it; cf. https://godbolt.org/g/8dta7B
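
For readers who cannot open the link, here is a rough sketch of what such a
wrapper might look like. It is not the code behind the godbolt link. The names
V4i, cvtt_ps_epi32 and a1_workaround are made up for illustration, and only the
SSE2 float-to-int case is covered. The sketch assumes the inputs fit in an int,
since a C++ float-to-int cast is undefined for out-of-range values whereas the
instruction returns 0x80000000.

```cpp
#include <emmintrin.h>  // SSE2: __m128, _mm_cvttps_epi32, _mm_set1_ps

// Same type as V<int> in the test case, spelled as a plain typedef.
typedef int V4i __attribute__((vector_size(16)));

// Hypothetical wrapper (the name is not from the report).
// If every lane is known at compile time after inlining, convert with plain
// scalar casts, which the optimizer can fold. Otherwise use the intrinsic.
// Assumption: lane values fit in an int (see the note above the code).
inline V4i cvtt_ps_epi32(__m128 x) {
    if (__builtin_constant_p(x[0]) && __builtin_constant_p(x[1]) &&
        __builtin_constant_p(x[2]) && __builtin_constant_p(x[3])) {
        // Casting float to int truncates toward zero, like cvttps2dq.
        return V4i{(int)x[0], (int)x[1], (int)x[2], (int)x[3]};
    }
    return (V4i)_mm_cvttps_epi32(x);
}

// Counterpart of a1() from the test case, using the wrapper.
V4i a1_workaround() { return 1 + cvtt_ps_epi32(_mm_set1_ps(1.f)); }
```

Without optimization __builtin_constant_p evaluates to 0, so the wrapper simply
falls back to the intrinsic; both paths return the same values for in-range
inputs. Whether the constant path is actually taken depends on inlining and
the compiler version.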

Follow-Ups:
- [Bug target/85324] missing constant propagation on SSE/AVX conversion intrinsics
  - From: rguenth at gcc dot gnu.org
