This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/85572] New: faster code for absolute value of __v2di
- From: "kretz at kde dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 30 Apr 2018 09:33:34 +0000
- Subject: [Bug target/85572] New: faster code for absolute value of __v2di
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85572
Bug ID: 85572
Summary: faster code for absolute value of __v2di
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
The absolute value for 64-bit integer SSE vectors is only optimized when
AVX512VL is available. Test case (`-O2 -ffast-math` and one of -mavx512vl,
-msse4, or -msse2):
#include <x86intrin.h>
__v2di abs(__v2di x) {
  return x < 0 ? -x : x;
}
With SSE4.2 (enabled by -msse4, required for pcmpgtq) I suggest:
abs(long long __vector(2)):
        pxor    %xmm1, %xmm1
        pcmpgtq %xmm0, %xmm1
        pxor    %xmm1, %xmm0
        psubq   %xmm1, %xmm0
        ret
In C++:
  auto neg = reinterpret_cast<__v2di>(x < 0);
  return (x ^ neg) - neg;
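The reason `(x ^ neg) - neg` works may be easier to see on a single lane. The following is only a scalar sketch of that identity, not GCC's implementation; the names `conditional_negate`, `abs_lane` and `self_check` are hypothetical and chosen for illustration.

```cpp
#include <cassert>

// Hypothetical scalar helper: neg must be 0 (keep x) or -1 (negate x).
long long conditional_negate(long long x, long long neg) {
    // neg == -1: x ^ -1 == ~x, and ~x - (-1) == ~x + 1 == -x (two's complement)
    // neg ==  0: x ^  0 ==  x, and  x - 0    ==  x
    return (x ^ neg) - neg;
}

// Scalar analogue of one vector lane: build the mask from the sign, then apply it.
long long abs_lane(long long x) {
    long long neg = -static_cast<long long>(x < 0);  // -1 if x is negative, else 0
    return conditional_negate(x, neg);
}

// Small self-check; LLONG_MIN is avoided because its absolute value is not representable.
void self_check() {
    assert(conditional_negate(5, -1) == -5);
    assert(conditional_negate(5, 0) == 5);
    assert(abs_lane(-42) == 42);
    assert(abs_lane(17) == 17);
}
```

In the vector code the mask is computed for both lanes at once, so no branch or blend is needed.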
Without SSE4:
abs(long long __vector(2)):
        movdqa  %xmm0, %xmm2
        pxor    %xmm1, %xmm1
        psrlq   $63, %xmm2
        psubq   %xmm2, %xmm1
        pxor    %xmm1, %xmm0
        paddq   %xmm2, %xmm0
        ret
In C++:
  auto neg = -reinterpret_cast<__v2di>(reinterpret_cast<__v2du>(x) >> 63);
  return (x ^ neg) - neg;
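To check that the compare-based and shift-based formulations agree with each other and with the scalar result, something like the sketch below could be used. It assumes GCC or Clang vector extensions; `v2di` and `v2du` are local stand-ins for the `__v2di` and `__v2du` typedefs from `<x86intrin.h>`, and `abs_cmp`, `abs_shift`, `matches_llabs` and `self_check` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>

// Assumption: GCC/Clang vector extensions. v2di and v2du are local stand-ins
// for the __v2di and __v2du typedefs that <x86intrin.h> provides.
typedef long long v2di __attribute__((vector_size(16)));
typedef unsigned long long v2du __attribute__((vector_size(16)));

// Compare-based variant: the mask comes from a signed compare against zero.
v2di abs_cmp(v2di x) {
    v2di neg = reinterpret_cast<v2di>(x < 0);  // -1 in negative lanes, else 0
    return (x ^ neg) - neg;
}

// Shift-based variant: the mask is built from the sign bit alone.
v2di abs_shift(v2di x) {
    // Logical shift leaves 0 or 1 per lane; negating gives 0 or -1.
    v2di neg = -reinterpret_cast<v2di>(reinterpret_cast<v2du>(x) >> 63);
    return (x ^ neg) - neg;
}

// Hypothetical helper: true if both variants match std::llabs in each lane.
// Do not pass LLONG_MIN; its absolute value is not representable.
bool matches_llabs(long long a, long long b) {
    v2di x = {a, b};
    v2di c = abs_cmp(x);
    v2di s = abs_shift(x);
    return c[0] == std::llabs(a) && c[1] == std::llabs(b) &&
           s[0] == std::llabs(a) && s[1] == std::llabs(b);
}

// Small self-check over mixed-sign inputs.
void self_check() {
    assert(matches_llabs(-5, 7));
    assert(matches_llabs(0, -1));
    assert(matches_llabs(123456789012345LL, -123456789012345LL));
}
```

Compiling this with `-O2` and one of `-msse2` or `-msse4` and inspecting the assembly is one way to compare what the compiler emits for each variant against the sequences suggested in this report.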
Related issue for scalars: #67510