This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug middle-end/86999] New: Incorrect code generation and missing optimization with -fno-signed-zeros.
- From: "asd0025 at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 17 Aug 2018 17:03:03 +0000
- Subject: [Bug middle-end/86999] New: Incorrect code generation and missing optimization with -fno-signed-zeros.
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86999
Bug ID: 86999
Summary: Incorrect code generation and missing optimization with -fno-signed-zeros.
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: asd0025 at gmail dot com
Target Milestone: ---
Consider the following trivial example (https://godbolt.org/g/5ms6Bf):
#include <limits.h>

typedef float v4f __attribute__((vector_size(16)));
typedef int v4i __attribute__((vector_size(16)));

v4f foo(v4f n, v4f p)
{
    return n * p + p;  // candidate for a single fused multiply-add
}

// Flip the sign bit of lane i when bit i of N is set, using the SSE builtin.
template <int N> v4f __neg1(v4f a)
{
    v4i v = {((N & 1) ? INT_MIN : 0), ((N & 2) ? INT_MIN : 0),
             ((N & 4) ? INT_MIN : 0), ((N & 8) ? INT_MIN : 0)};
    return __builtin_ia32_xorps(a, (v4f)v);
}

// Same mask, applied with the generic vector ^ operator instead.
template <int N> v4f __neg2(v4f a)
{
    v4i v = {((N & 1) ? INT_MIN : 0), ((N & 2) ? INT_MIN : 0),
             ((N & 4) ? INT_MIN : 0), ((N & 8) ? INT_MIN : 0)};
    return (v4f)((v4i)a ^ v);
}

v4f neg1C(v4f a)
{
    return __neg1<0x0C>(a);  // negate lanes 2 and 3
}

v4f neg2C(v4f a)
{
    return __neg2<0x0C>(a);  // negate lanes 2 and 3
}
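The intended effect of __neg1<0x0C> and __neg2<0x0C> can be modelled one lane at a time: lane i has its IEEE 754 sign bit flipped when bit i of N is set. The sketch below is an illustration, not part of the report; the name neg_lanes is hypothetical, and it uses portable std::array and std::memcpy instead of GCC vector types so it runs on any target.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of __neg2<N>: flip the sign bit of lane i
// whenever bit i of N is set, leaving all other lanes untouched.
template <int N>
std::array<float, 4> neg_lanes(std::array<float, 4> a)
{
    std::array<float, 4> out{};
    for (int i = 0; i < 4; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &a[i], sizeof bits);  // reinterpret float as raw bits
        if (N & (1 << i))
            bits ^= 0x80000000u;                 // same bit pattern as INT_MIN
        std::memcpy(&out[i], &bits, sizeof bits);
    }
    return out;
}
```

With N = 0x0C only bits 2 and 3 are set, so an input of {1, 2, 3, 4} should come back as {1, 2, -3, -4}; a correct neg1C or neg2C must behave the same way.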
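On GCC 7.x and 8.x with -fno-signed-zeros (also implied by other flags, e.g. -Ofast), the code for foo() is not optimal on FMA-capable hardware: the multiply and the add are emitted as two separate instructions instead of being fused: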
foo(float __vector(4), float __vector(4)):
        vmulps xmm0, xmm0, xmm1
        vaddps xmm0, xmm0, xmm1
        ret
With -fsigned-zeros (the default), they are fused into a single FMA instruction:
foo(float __vector(4), float __vector(4)):
        vfmadd132ps xmm0, xmm1, xmm1
        ret
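With n in xmm0 and p in xmm1, vfmadd132ps xmm0, xmm1, xmm1 computes xmm0 * xmm1 + xmm1, i.e. n * p + p, with a single rounding per lane, whereas the vmulps/vaddps pair rounds once after the multiply and again after the add. The sketch below models the fused form lane by lane with std::fma; foo_fused is a hypothetical helper written for illustration, not code from the report.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical scalar model of the fused foo(): each lane is
// n[i] * p[i] + p[i], computed exactly and then rounded once (std::fma).
std::array<float, 4> foo_fused(const std::array<float, 4>& n,
                               const std::array<float, 4>& p)
{
    std::array<float, 4> r{};
    for (int i = 0; i < 4; ++i)
        r[i] = std::fma(n[i], p[i], p[i]);
    return r;
}
```

For inputs whose products are exactly representable in float, the fused and unfused forms give identical results; they can differ only when n * p itself has to be rounded.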
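Incorrect code is generated only on GCC 8.x with -fno-signed-zeros: neg1C() compiles to a bare ret, so its argument is returned unchanged and lanes 2 and 3 are never negated: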
neg1C(float __vector(4)):
        ret
With -fsigned-zeros, or with GCC 7.x, the expected xor against a sign-bit mask is emitted:
neg1C(float __vector(4)):
        vxorps xmm0, xmm0, XMMWORD PTR .LC1[rip]
        ret
.LC1:
        .long 0
        .long 0
        .long 2147483648
        .long 2147483648
Note, however, that when the bitwise ^ operator is used instead of __builtin_ia32_xorps(), the generated code is correct in all cases:
neg2C(float __vector(4)):
        vxorps xmm0, xmm0, XMMWORD PTR .LC1[rip]
        ret
.LC1:
        .long 0
        .long 0
        .long 2147483648
        .long 2147483648
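Each .long 2147483648 in .LC1 is 0x80000000, the INT_MIN bit pattern from the source, and read as a float that pattern is -0.0f: it compares equal to +0.0f but differs from it in the sign bit, and XORing a float with it flips that float's sign. One possible explanation, which the report itself does not confirm, is that with the builtin the mask is seen as a float vector, and once -fno-signed-zeros lets -0.0f be treated like +0.0f the xorps looks like a no-op. The sketch below only demonstrates the bit-level facts; float_from_bits and xor_sign are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as an IEEE 754 float.
float float_from_bits(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: flip the sign bit of x, as one masked lane of the
// vxorps with .LC1 does.
float xor_sign(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits ^= 0x80000000u;  // 2147483648, the value stored in .LC1 lanes 2 and 3
    return float_from_bits(bits);
}
```

Because -0.0f == 0.0f holds under ordinary comparison, only the sign bit distinguishes the mask from an all-zero vector, which is exactly the information -fno-signed-zeros allows the compiler to disregard for floating-point values.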