Bug 92768 - [8/9 Regression] Maybe a wrong code for vector constants
Summary: [8/9 Regression] Maybe a wrong code for vector constants
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 10.0
: P3 normal
Target Milestone: 8.4
Assignee: rsandifo@gcc.gnu.org
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2019-12-03 12:55 UTC by Martin Liška
Modified: 2019-12-06 17:50 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 7.5.0
Known to fail: 10.0, 8.3.0, 9.2.0
Last reconfirmed: 2019-12-03 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Liška 2019-12-03 12:55:00 UTC
I have a test-case from graphene package that started to fail with r273194:

$ cat matrix.c
#include <xmmintrin.h>

#define GRAPHENE_ALIGN16  __attribute__((aligned(16)))

typedef int v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));

typedef __m128 graphene_simd4f_t;
typedef struct {
  int f[4];
} graphene_simd4f_uif_t;

__m128
graphene_simd4x4f_inverse(__m128 r1_sum)
{
  const graphene_simd4f_uif_t __pnpn = { {0x00000000, 0x80000000, 0x00000000, 0x80000000}};
  return (graphene_simd4f_t) _mm_xor_ps((r1_sum), _mm_load_ps(__pnpn.f));
}

int main()
{
  __m128 a = { -1.0f, -1.0f, -1.0f, -1.0f };
  if (graphene_simd4x4f_inverse (a)[1] != 1.0)
    __builtin_abort ();

  return 0;
}

$ gcc -O2 matrix.c -ffast-math && ./a.out 
matrix.c: In function ‘graphene_simd4x4f_inverse’:
matrix.c:17:69: warning: passing argument 1 of ‘_mm_load_ps’ from incompatible pointer type [-Wincompatible-pointer-types]
   17 |   return (graphene_simd4f_t) _mm_xor_ps((r1_sum), _mm_load_ps(__pnpn.f));
      |                                                               ~~~~~~^~
      |                                                                     |
      |                                                                     const int *
In file included from matrix.c:1:
/home/marxin/bin/gcc2/lib64/gcc/x86_64-pc-linux-gnu/10.0.0/include/xmmintrin.h:925:27: note: expected ‘const float *’ but argument is of type ‘const int *’
  925 | _mm_load_ps (float const *__P)
      |              ~~~~~~~~~~~~~^~~
Aborted (core dumped)

But without -ffast-math it's fine:

$ gcc -O2 matrix.c  && ./a.out 
matrix.c: In function ‘graphene_simd4x4f_inverse’:
matrix.c:17:69: warning: passing argument 1 of ‘_mm_load_ps’ from incompatible pointer type [-Wincompatible-pointer-types]
   17 |   return (graphene_simd4f_t) _mm_xor_ps((r1_sum), _mm_load_ps(__pnpn.f));
      |                                                               ~~~~~~^~
      |                                                                     |
      |                                                                     const int *
In file included from matrix.c:1:
/home/marxin/bin/gcc2/lib64/gcc/x86_64-pc-linux-gnu/10.0.0/include/xmmintrin.h:925:27: note: expected ‘const float *’ but argument is of type ‘const int *’
  925 | _mm_load_ps (float const *__P)
      |              ~~~~~~~~~~~~~^~~
Comment 1 Martin Liška 2019-12-03 12:55:58 UTC
clang seems to be happy with -ffast-math:

$ clang -O2 matrix.c -ffast-math && ./a.out 
matrix.c:17:63: warning: incompatible pointer types passing 'int const[4]' to parameter of type 'const float *' [-Wincompatible-pointer-types]
  return (graphene_simd4f_t) _mm_xor_ps((r1_sum), _mm_load_ps(__pnpn.f));
                                                              ^~~~~~~~
/usr/lib64/clang/9.0.0/include/xmmintrin.h:1723:26: note: passing argument to parameter '__p' here
_mm_load_ps(const float *__p)
                         ^
1 warning generated.
Comment 2 Richard Biener 2019-12-03 13:07:47 UTC
Testcase already failing with GCC 8 (but not GCC 7 appearantly):

#include <xmmintrin.h>

typedef int v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));

__m128
graphene_simd4x4f_inverse(__m128 r1_sum)
{
  v4si temi = (v4si) {0x00000000, 0x80000000, 0x00000000, 0x80000000};
  v4sf temf = (v4sf) temi;
  return (__m128) _mm_xor_ps((r1_sum), temf);
}

int main()
{
  __m128 a = { -1.0f, -1.0f, -1.0f, -1.0f };
  if (graphene_simd4x4f_inverse (a)[1] != 1.0)
    __builtin_abort ();

  return 0;
}

somehow we are "normalizing" FP numbers in constant folding, losing the signed zeros.  That worked in GCC 7.

Guess we can bisect that.
Comment 3 Jakub Jelinek 2019-12-03 13:11:27 UTC
Slightly cleaned up testcase:
#include <xmmintrin.h>

struct S { int f[4]; };

__m128
foo (__m128 x)
{
  const struct S a = { {0x00000000, 0x80000000, 0x00000000, 0x80000000}};
  return _mm_xor_ps (x, _mm_load_ps ((float *) a.f));
}

int
main ()
{
  __m128 a = { -1.0f, -1.0f, -1.0f, -1.0f };
  if (foo  (a)[1] != 1.0)
    __builtin_abort ();

  return 0;
}
Comment 4 Martin Liška 2019-12-03 13:17:44 UTC
Started with r255474.
Comment 5 Jakub Jelinek 2019-12-03 13:18:24 UTC
Even more reduced:
#include <xmmintrin.h>

__m128
foo (__m128 x)
{
  int f[4] __attribute__((aligned (16)))
    = { 0x00000000, 0x80000000, 0x00000000, 0x80000000 };
  return _mm_xor_ps (x, *(__m128 *) f);
}

int
main ()
{
  __m128 a = { -1.0f, -1.0f, -1.0f, -1.0f };
  if (foo (a)[1] != 1.0)
    __builtin_abort ();
  return 0;
}

Though, with -fno-signed-zeros, we say that the sign of a zero isn't significant, but for this testcase it is very much significant.
So, maybe invalid?
Comment 6 Richard Biener 2019-12-03 13:19:19 UTC
Possibly the ->equal_p () use in vector-builder elides the -0.0 since it
may appear "equal" to 0.0?
Comment 7 Richard Biener 2019-12-03 13:22:17 UTC
(In reply to Jakub Jelinek from comment #5)
> Even more reduced:
> #include <xmmintrin.h>
> 
> __m128
> foo (__m128 x)
> {
>   int f[4] __attribute__((aligned (16)))
>     = { 0x00000000, 0x80000000, 0x00000000, 0x80000000 };
>   return _mm_xor_ps (x, *(__m128 *) f);
> }
> 
> int
> main ()
> {
>   __m128 a = { -1.0f, -1.0f, -1.0f, -1.0f };
>   if (foo (a)[1] != 1.0)
>     __builtin_abort ();
>   return 0;
> }
> 
> Though, with -fno-signed-zeros, we say that the sign of a zero isn't
> significant, but for this testcase it is very much significant.
> So, maybe invalid?

Well, clearly the testcase simply loads a vector which we do not expect
to alter bit-patterns.

The issue is that besides the new vector encoder stuff, that
native_interpret_expr can end up "normalizing" a FP number as well.

Note that various intrinsic docs suggest to use the matching
xor_ps variant if the main work is on float vectors while xor_pd
when working on integer vectors.
Comment 8 Alexander Monakov 2019-12-03 13:46:44 UTC
Previously, in PR 86999 I pointed out to the reporter that it was okay for gcc to turn a vector constructor with negative zeros to a trivial all-positive-zeros constructor under -fno-signed-zeros, and nobody contradicted me at the time.

I think the documentation needs to be clarified if that's not the intent, right now I cannot for sure deduce from the manual what exactly the optimizations may or may not do when constant propagation or such produces a "negative zero" value.
Comment 9 rsandifo@gcc.gnu.org 2019-12-03 13:47:08 UTC
Mine.
Comment 10 Jakub Jelinek 2019-12-03 13:51:46 UTC
The vector builder calls operand_equal_p, which in turn does:
      case REAL_CST:
        if (real_identical (&TREE_REAL_CST (arg0), &TREE_REAL_CST (arg1)))
          return true;
      
      
        if (!HONOR_SIGNED_ZEROS (arg0))
          {
            /* If we do not distinguish between signed and unsigned zero,
               consider them equal.  */
            if (real_zerop (arg0) && real_zerop (arg1))
              return true;
          }
        return false;
So, what could be done is temporarily enable flag_signed_zeros in the vector builder.
Comment 11 rguenther@suse.de 2019-12-03 14:39:08 UTC
On Tue, 3 Dec 2019, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92768
> 
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jakub at gcc dot gnu.org
> 
> --- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> The vector builder calls operand_equal_p, which in turn does:
>       case REAL_CST:
>         if (real_identical (&TREE_REAL_CST (arg0), &TREE_REAL_CST (arg1)))
>           return true;
> 
> 
>         if (!HONOR_SIGNED_ZEROS (arg0))
>           {
>             /* If we do not distinguish between signed and unsigned zero,
>                consider them equal.  */
>             if (real_zerop (arg0) && real_zerop (arg1))
>               return true;
>           }
>         return false;
> So, what could be done is temporarily enable flag_signed_zeros in the vector
> builder.

Alternatively add another flag to operand_equal_p to say whether
exact literal equality is asked for.
Comment 12 Jakub Jelinek 2019-12-03 15:09:12 UTC
(In reply to rguenther@suse.de from comment #11)
> Alternatively add another flag to operand_equal_p to say whether
> exact literal equality is asked for.

That is fine with me.  Though, as I said on IRC, it can work then by accident, but might break any time, e.g. won't VN if it sees with -fno-signed-zeros:
  _2 = { 0.f, 0.f, 0.f, 0.f };
  use (_2);
  ...
  _5 = { 0.f, -0.f, 0.f, -0.f };
  use (_5);
happily replace _5 with _2, or anything else that uses operand_equal_p and won't pass this new magic flag?
Comment 13 rguenther@suse.de 2019-12-03 17:05:53 UTC
On December 3, 2019 4:09:12 PM GMT+01:00, "jakub at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92768
>
>--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
>(In reply to rguenther@suse.de from comment #11)
>> Alternatively add another flag to operand_equal_p to say whether
>> exact literal equality is asked for.
>
>That is fine with me.  Though, as I said on IRC, it can work then by
>accident,
>but might break any time, e.g. won't VN if it sees with
>-fno-signed-zeros:
>  _2 = { 0.f, 0.f, 0.f, 0.f };
>  use (_2);
>  ...
>  _5 = { 0.f, -0.f, 0.f, -0.f };
>  use (_5);
>happily replace _5 with _2, or anything else that uses operand_equal_p
>and
>won't pass this new magic flag?

Yes.
Comment 14 Richard Biener 2019-12-04 09:55:29 UTC
So I think it's unfortunate that we break testcases like this using _mm_xor_ps
with -ffast-math since users expect the mask to not be treated as signed zero/zero.  The error here obviously lies in the use of FP vectors for arguments
to _mm_xor_ps but I have no idea how we can rectify users (correct) expectation
here.  As Jakub says fixing the regression doesn't really solve the issue in general.

Still I'd not expect middle-end core-infrastructure like the vector builder
to strip off signs from zeros.
Comment 15 Jakub Jelinek 2019-12-04 10:06:56 UTC
For SSE2+, the code can of course use _mm_xor_si128 instead and _mm_castsi128_ps/_mm_castps_si128, but for SSE that is not an option.
And not really sure what can be done there, the _mm_xor_ps arguments are given, and even if we e.g. change the xorps builtin to take integral vectors, there still would be casts to the floating point vectors and still risk of the optimizers with -fno-signed-zeros to optimize that away.
Comment 16 rsandifo@gcc.gnu.org 2019-12-05 14:21:09 UTC
Author: rsandifo
Date: Thu Dec  5 14:20:38 2019
New Revision: 279002

URL: https://gcc.gnu.org/viewcvs?rev=279002&root=gcc&view=rev
Log:
Check for bitwise identity when encoding VECTOR_CSTs (PR 92768)

This PR shows that we weren't checking for bitwise-identical values
when trying to encode a VECTOR_CST, so -0.0 was treated the same as
0.0 for -fno-signed-zeros.  The patch adds a new OEP flag to select
that behaviour.

2019-12-05  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR middle-end/92768
	* tree-core.h (OEP_BITWISE): New flag.
	* fold-const.c (operand_compare::operand_equal_p): Handle it.
	* tree-vector-builder.h (tree_vector_builder::equal_p): Pass it.

gcc/testsuite/
	PR middle-end/92768
	* gcc.dg/pr92768.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr92768.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-core.h
    trunk/gcc/tree-vector-builder.h
Comment 17 Jakub Jelinek 2019-12-05 23:53:41 UTC
Author: jakub
Date: Thu Dec  5 23:53:09 2019
New Revision: 279024

URL: https://gcc.gnu.org/viewcvs?rev=279024&root=gcc&view=rev
Log:
	PR tree-optimization/92768
	* gcc.dg/pr92768.c: Add -w -Wno-psabi to dg-options.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/pr92768.c
Comment 18 Jakub Jelinek 2019-12-06 17:50:53 UTC
Fixed on the trunk.