This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug middle-end/86999] New: Incorrect code generation and missing optimization with -fno-signed-zeros.

From: "asd0025 at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Fri, 17 Aug 2018 17:03:03 +0000
Subject: [Bug middle-end/86999] New: Incorrect code generation and missing optimization with -fno-signed-zeros.
Auto-submitted: auto-generated

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86999

            Bug ID: 86999
           Summary: Incorrect code generation and missing optimization
                    with -fno-signed-zeros.
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: asd0025 at gmail dot com
  Target Milestone: ---

Consider the following trivial example (https://godbolt.org/g/5ms6Bf):

  #include <limits.h>

  typedef float v4f __attribute__((vector_size(16)));
  typedef int v4i __attribute__((vector_size(16)));

  v4f foo(v4f n, v4f p)
  {
     return n * p + p;
  }

  template <int N> v4f __neg1(v4f a)
  {
    v4i v = {((N & 1) ? INT_MIN : 0), ((N & 2) ? INT_MIN : 0),
             ((N & 4) ? INT_MIN : 0), ((N & 8) ? INT_MIN : 0)};
    return __builtin_ia32_xorps(a, (v4f)v);
  }

  template <int N> v4f __neg2(v4f a)
  {
    v4i v = {((N & 1) ? INT_MIN : 0), ((N & 2) ? INT_MIN : 0),
             ((N & 4) ? INT_MIN : 0), ((N & 8) ? INT_MIN : 0)};
    return (v4f)((v4i)a ^ v);
  }

  v4f neg1C(v4f a)
  {
    return __neg1<0x0C>(a);
  }

  v4f neg2C(v4f a)
  {
    return __neg2<0x0C>(a);
  }

On GCC 7.x/8.x with -fno-signed-zeros (or with flags that imply it, e.g.
-Ofast), foo() is not optimal on FMA-capable hardware:

  foo(float __vector(4), float __vector(4)):
        vmulps  xmm0, xmm0, xmm1
        vaddps  xmm0, xmm0, xmm1
        ret

With -fsigned-zeros:

  foo(float __vector(4), float __vector(4)):
        vfmadd132ps     xmm0, xmm1, xmm1
        ret
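
For reference, a minimal scalar sketch of the two forms of the expression in
foo(). The function names below are hypothetical and not part of the report;
std::fma stands in for the fused instruction, so this only illustrates the
arithmetic and says nothing about which instructions a given compiler emits.

```cpp
#include <cassert>
#include <cmath>

// Scalar analogue of foo(): written as a separate multiply and add.
float mul_then_add(float n, float p) { return n * p + p; }

// The same expression written as one fused multiply-add: n * p + p.
float fused(float n, float p) { return std::fma(n, p, p); }

// Hypothetical self-check on a case that is exact in float, so both
// forms must agree regardless of rounding: 2 * 3 + 3 = 9.
void check_small_case()
{
    assert(mul_then_add(2.0f, 3.0f) == 9.0f);
    assert(fused(2.0f, 3.0f) == 9.0f);
}
```

In general the two forms can differ in the last bit, because the fused form
rounds once; the inputs in the check are chosen so that both results are exact.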

Incorrect code is generated for neg1C(), but only on GCC 8.x with
-fno-signed-zeros; the xor is dropped entirely:

  neg1C(float __vector(4)):
        ret

With -fsigned-zeros or with GCC 7.x:

  neg1C(float __vector(4)):
        vxorps  xmm0, xmm0, XMMWORD PTR .LC1[rip]
        ret
  .LC1:
        .long   0
        .long   0
        .long   2147483648
        .long   2147483648

Note, however, that when the bitwise xor operator is used instead of
__builtin_ia32_xorps(), the generated code is correct in all cases:

  neg2C(float __vector(4)):
        vxorps  xmm0, xmm0, XMMWORD PTR .LC1[rip]
        ret
  .LC1:
        .long   0
        .long   0
        .long   2147483648
        .long   2147483648
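
The intended result of the <0x0C> instantiations can be modelled without vector
extensions. The sketch below is not from the report: flip_signs and
check_mask_0x0C are hypothetical names, and it assumes a 32-bit IEEE 754 float,
where xoring 0x80000000 (the bit pattern of INT_MIN) into a lane flips its sign.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Portable model of the lane-wise sign flip: xor the sign bit into every
// lane whose bit is set in the mask N (bit 0 selects lane 0, and so on).
template <int N>
std::array<float, 4> flip_signs(std::array<float, 4> a)
{
    for (int lane = 0; lane < 4; ++lane) {
        if (N & (1 << lane)) {
            std::uint32_t bits;
            std::memcpy(&bits, &a[lane], sizeof bits);  // float -> bits
            bits ^= 0x80000000u;                        // toggle sign bit
            std::memcpy(&a[lane], &bits, sizeof bits);  // bits -> float
        }
    }
    return a;
}

// Hypothetical self-check: mask 0x0C must negate lanes 2 and 3 only.
void check_mask_0x0C()
{
    const std::array<float, 4> in = {1.0f, 2.0f, 3.0f, 4.0f};
    const std::array<float, 4> r = flip_signs<0x0C>(in);
    assert(r[0] == 1.0f && r[1] == 2.0f);
    assert(r[2] == -3.0f && r[3] == -4.0f);
}
```

A neg1C() that compiles to a bare ret returns its argument unchanged, which
this model would only produce for a mask of 0.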

Follow-Ups:
- [Bug middle-end/86999] Incorrect code generation and missing optimization with -fno-signed-zeros.
  - From: amonakov at gcc dot gnu.org
- [Bug middle-end/86999] Incorrect code generation and missing optimization with -fno-signed-zeros.
  - From: asd0025 at gmail dot com
- [Bug middle-end/86999] Incorrect code generation and missing optimization with -fno-signed-zeros.
  - From: glisse at gcc dot gnu.org
- [Bug tree-optimization/86999] missed FMA optimization with -fassociative-math
  - From: rguenth at gcc dot gnu.org
