I have a test-case from graphene package that started to fail with r273194: $ cat matrix.c #include <xmmintrin.h> #define GRAPHENE_ALIGN16 __attribute__((aligned(16))) typedef int v4si __attribute__((vector_size(16))); typedef float v4sf __attribute__((vector_size(16))); typedef __m128 graphene_simd4f_t; typedef struct { int f[4]; } graphene_simd4f_uif_t; __m128 graphene_simd4x4f_inverse(__m128 r1_sum) { const graphene_simd4f_uif_t __pnpn = { {0x00000000, 0x80000000, 0x00000000, 0x80000000}}; return (graphene_simd4f_t) _mm_xor_ps((r1_sum), _mm_load_ps(__pnpn.f)); } int main() { __m128 a = { -1.0f, -1.0f, -1.0f, -1.0f }; if (graphene_simd4x4f_inverse (a)[1] != 1.0) __builtin_abort (); return 0; } $ gcc -O2 matrix.c -ffast-math && ./a.out matrix.c: In function ‘graphene_simd4x4f_inverse’: matrix.c:17:69: warning: passing argument 1 of ‘_mm_load_ps’ from incompatible pointer type [-Wincompatible-pointer-types] 17 | return (graphene_simd4f_t) _mm_xor_ps((r1_sum), _mm_load_ps(__pnpn.f)); | ~~~~~~^~ | | | const int * In file included from matrix.c:1: /home/marxin/bin/gcc2/lib64/gcc/x86_64-pc-linux-gnu/10.0.0/include/xmmintrin.h:925:27: note: expected ‘const float *’ but argument is of type ‘const int *’ 925 | _mm_load_ps (float const *__P) | ~~~~~~~~~~~~~^~~ Aborted (core dumped) But without -ffast-math it's fine: $ gcc -O2 matrix.c && ./a.out matrix.c: In function ‘graphene_simd4x4f_inverse’: matrix.c:17:69: warning: passing argument 1 of ‘_mm_load_ps’ from incompatible pointer type [-Wincompatible-pointer-types] 17 | return (graphene_simd4f_t) _mm_xor_ps((r1_sum), _mm_load_ps(__pnpn.f)); | ~~~~~~^~ | | | const int * In file included from matrix.c:1: /home/marxin/bin/gcc2/lib64/gcc/x86_64-pc-linux-gnu/10.0.0/include/xmmintrin.h:925:27: note: expected ‘const float *’ but argument is of type ‘const int *’ 925 | _mm_load_ps (float const *__P) | ~~~~~~~~~~~~~^~~

clang seems to be happy with -ffast-math: $ clang -O2 matrix.c -ffast-math && ./a.out matrix.c:17:63: warning: incompatible pointer types passing 'int const[4]' to parameter of type 'const float *' [-Wincompatible-pointer-types] return (graphene_simd4f_t) _mm_xor_ps((r1_sum), _mm_load_ps(__pnpn.f)); ^~~~~~~~ /usr/lib64/clang/9.0.0/include/xmmintrin.h:1723:26: note: passing argument to parameter '__p' here _mm_load_ps(const float *__p) ^ 1 warning generated.

Testcase already failing with GCC 8 (but not GCC 7 appearantly): #include <xmmintrin.h> typedef int v4si __attribute__((vector_size(16))); typedef float v4sf __attribute__((vector_size(16))); __m128 graphene_simd4x4f_inverse(__m128 r1_sum) { v4si temi = (v4si) {0x00000000, 0x80000000, 0x00000000, 0x80000000}; v4sf temf = (v4sf) temi; return (__m128) _mm_xor_ps((r1_sum), temf); } int main() { __m128 a = { -1.0f, -1.0f, -1.0f, -1.0f }; if (graphene_simd4x4f_inverse (a)[1] != 1.0) __builtin_abort (); return 0; } somehow we are "normalizing" FP numbers in constant folding, losing the signed zeros. That worked in GCC 7. Guess we can bisect that.

Slightly cleaned up testcase: #include <xmmintrin.h> struct S { int f[4]; }; __m128 foo (__m128 x) { const struct S a = { {0x00000000, 0x80000000, 0x00000000, 0x80000000}}; return _mm_xor_ps (x, _mm_load_ps ((float *) a.f)); } int main () { __m128 a = { -1.0f, -1.0f, -1.0f, -1.0f }; if (foo (a)[1] != 1.0) __builtin_abort (); return 0; }

Started with r255474.

Even more reduced: #include <xmmintrin.h> __m128 foo (__m128 x) { int f[4] __attribute__((aligned (16))) = { 0x00000000, 0x80000000, 0x00000000, 0x80000000 }; return _mm_xor_ps (x, *(__m128 *) f); } int main () { __m128 a = { -1.0f, -1.0f, -1.0f, -1.0f }; if (foo (a)[1] != 1.0) __builtin_abort (); return 0; } Though, with -fno-signed-zeros, we say that the sign of a zero isn't significant, but for this testcase it is very much significant. So, maybe invalid?

Possibly the ->equal_p () use in vector-builder elides the -0.0 since it may appear "equal" to 0.0?

(In reply to Jakub Jelinek from comment #5) > Even more reduced: > #include <xmmintrin.h> > > __m128 > foo (__m128 x) > { > int f[4] __attribute__((aligned (16))) > = { 0x00000000, 0x80000000, 0x00000000, 0x80000000 }; > return _mm_xor_ps (x, *(__m128 *) f); > } > > int > main () > { > __m128 a = { -1.0f, -1.0f, -1.0f, -1.0f }; > if (foo (a)[1] != 1.0) > __builtin_abort (); > return 0; > } > > Though, with -fno-signed-zeros, we say that the sign of a zero isn't > significant, but for this testcase it is very much significant. > So, maybe invalid? Well, clearly the testcase simply loads a vector which we do not expect to alter bit-patterns. The issue is that besides the new vector encoder stuff, that native_interpret_expr can end up "normalizing" a FP number as well. Note that various intrinsic docs suggest to use the matching xor_ps variant if the main work is on float vectors while xor_pd when working on integer vectors.

Previously, in PR 86999 I pointed out to the reporter that it was okay for gcc to turn a vector constructor with negative zeros to a trivial all-positive-zeros constructor under -fno-signed-zeros, and nobody contradicted me at the time. I think the documentation needs to be clarified if that's not the intent, right now I cannot for sure deduce from the manual what exactly the optimizations may or may not do when constant propagation or such produces a "negative zero" value.

Mine.

The vector builder calls operand_equal_p, which in turn does: case REAL_CST: if (real_identical (&TREE_REAL_CST (arg0), &TREE_REAL_CST (arg1))) return true; if (!HONOR_SIGNED_ZEROS (arg0)) { /* If we do not distinguish between signed and unsigned zero, consider them equal. */ if (real_zerop (arg0) && real_zerop (arg1)) return true; } return false; So, what could be done is temporarily enable flag_signed_zeros in the vector builder.

On Tue, 3 Dec 2019, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92768 > > Jakub Jelinek <jakub at gcc dot gnu.org> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |jakub at gcc dot gnu.org > > --- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > The vector builder calls operand_equal_p, which in turn does: > case REAL_CST: > if (real_identical (&TREE_REAL_CST (arg0), &TREE_REAL_CST (arg1))) > return true; > > > if (!HONOR_SIGNED_ZEROS (arg0)) > { > /* If we do not distinguish between signed and unsigned zero, > consider them equal. */ > if (real_zerop (arg0) && real_zerop (arg1)) > return true; > } > return false; > So, what could be done is temporarily enable flag_signed_zeros in the vector > builder. Alternatively add another flag to operand_equal_p to say whether exact literal equality is asked for.

(In reply to rguenther@suse.de from comment #11) > Alternatively add another flag to operand_equal_p to say whether > exact literal equality is asked for. That is fine with me. Though, as I said on IRC, it can work then by accident, but might break any time, e.g. won't VN if it sees with -fno-signed-zeros: _2 = { 0.f, 0.f, 0.f, 0.f }; use (_2); ... _5 = { 0.f, -0.f, 0.f, -0.f }; use (_5); happily replace _5 with _2, or anything else that uses operand_equal_p and won't pass this new magic flag?

On December 3, 2019 4:09:12 PM GMT+01:00, "jakub at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92768 > >--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> --- >(In reply to rguenther@suse.de from comment #11) >> Alternatively add another flag to operand_equal_p to say whether >> exact literal equality is asked for. > >That is fine with me. Though, as I said on IRC, it can work then by >accident, >but might break any time, e.g. won't VN if it sees with >-fno-signed-zeros: > _2 = { 0.f, 0.f, 0.f, 0.f }; > use (_2); > ... > _5 = { 0.f, -0.f, 0.f, -0.f }; > use (_5); >happily replace _5 with _2, or anything else that uses operand_equal_p >and >won't pass this new magic flag? Yes.

So I think it's unfortunate that we break testcases like this using _mm_xor_ps with -ffast-math since users expect the mask to not be treated as signed zero/zero. The error here obviously lies in the use of FP vectors for arguments to _mm_xor_ps but I have no idea how we can rectify users (correct) expectation here. As Jakub says fixing the regression doesn't really solve the issue in general. Still I'd not expect middle-end core-infrastructure like the vector builder to strip off signs from zeros.

For SSE2+, the code can of course use _mm_xor_si128 instead and _mm_castsi128_ps/_mm_castps_si128, but for SSE that is not an option. And not really sure what can be done there, the _mm_xor_ps arguments are given, and even if we e.g. change the xorps builtin to take integral vectors, there still would be casts to the floating point vectors and still risk of the optimizers with -fno-signed-zeros to optimize that away.

Author: rsandifo Date: Thu Dec 5 14:20:38 2019 New Revision: 279002 URL: https://gcc.gnu.org/viewcvs?rev=279002&root=gcc&view=rev Log: Check for bitwise identity when encoding VECTOR_CSTs (PR 92768) This PR shows that we weren't checking for bitwise-identical values when trying to encode a VECTOR_CST, so -0.0 was treated the same as 0.0 for -fno-signed-zeros. The patch adds a new OEP flag to select that behaviour. 2019-12-05 Richard Sandiford <richard.sandiford@arm.com> gcc/ PR middle-end/92768 * tree-core.h (OEP_BITWISE): New flag. * fold-const.c (operand_compare::operand_equal_p): Handle it. * tree-vector-builder.h (tree_vector_builder::equal_p): Pass it. gcc/testsuite/ PR middle-end/92768 * gcc.dg/pr92768.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr92768.c Modified: trunk/gcc/ChangeLog trunk/gcc/fold-const.c trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-core.h trunk/gcc/tree-vector-builder.h

Author: jakub Date: Thu Dec 5 23:53:09 2019 New Revision: 279024 URL: https://gcc.gnu.org/viewcvs?rev=279024&root=gcc&view=rev Log: PR tree-optimization/92768 * gcc.dg/pr92768.c: Add -w -Wno-psabi to dg-options. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/pr92768.c

Fixed on the trunk.