This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] [x86] Don't use builtins for SIMD shifts

From: Marc Glisse <marc dot glisse at inria dot fr>
To: Richard Biener <richard dot guenther at gmail dot com>
Cc: Allan Sandfeld Jensen <linux at carewolf dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>, Uros Bizjak <ubizjak at gmail dot com>
Date: Tue, 13 Dec 2016 10:09:49 +0100 (CET)
Subject: Re: [PATCH] [x86] Don't use builtins for SIMD shifts
Authentication-results: sourceware.org; auth=none
References: <201612101827.41276.linux@carewolf.com> <alpine.DEB.2.20.1612101833060.2039@laptop-mg.saclay.inria.fr> <alpine.DEB.2.20.1612101852430.2039@laptop-mg.saclay.inria.fr> <201612101918.26034.linux@carewolf.com> <alpine.DEB.2.20.1612101919580.2039@laptop-mg.saclay.inria.fr> <CAFiYyc2YHCaLgnLFRaq2zdMBJUZySCjKpKDXSKM+9=vPZJ3-0w@mail.gmail.com>

On Tue, 13 Dec 2016, Richard Biener wrote:

On Sat, Dec 10, 2016 at 7:59 PM, Marc Glisse <marc.glisse@inria.fr> wrote:

On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote:

On Saturday 10 December 2016, Marc Glisse wrote:


On Sat, 10 Dec 2016, Marc Glisse wrote:


On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote:


Replaces the definitions of the shift intrinsics with GCC extension
syntax to
allow GCC to reason about what the instructions does. Tests are added
to
ensure the intrinsics still produce the right instructions, and that a
few basic optimizations now work.



I don't think we can do it in such a straightforward way. Those
intrinsics are well defined for any shift argument, while operator<< and
operator>> are considered undefined for many input values. I believe you
may need to use an unsigned type for the lhs of left shifts, and write a
<< (b & 31) to match the semantics for the rhs (I haven't checked
Intel's doc). Which might require adding a few patterns in sse.md to
avoid generating code for that AND.



Oups, apparently I got that wrong, got confused with the scalar case.
Left
shift by more than precision is well defined for vectors, but it gives 0,
it doesn't take the shift count modulo the precision. Which is even
harder
to explain to gcc (maybe ((unsigned)b<=31)?a<<b:0...). Yes, the way we
model operations in gcc is often inconvenient.

There was a similar issue for LLVM (https://reviews.llvm.org/D3353).

Well, it is undefined behaviour by the C standard, but is it also
undefined
inside GCC (since this specifically uses a GCC extension)? I would assume
it
just produces 0.



In the optimizers, we tend to handle vectors the same as scalars. I don't
remember if we have any optimization that takes advantage of the limited
range for the second operand of shifts (mostly we disable optimizations when
there is a risk they might produce a shift amount larger than prec), but I
don't think we have specified the behavior for larger-than-prec shifts, so
it would be dangerous. Note that depending on the platform, gcc may lower
vector operations to smaller vectors or even scalars, which makes it
inconvenient to have different semantics for vectors and scalars (not
impossible, we would for instance need to teach the lowering pass that vec1
<< vec2 lowers to { ((unsigned)vec2[0]<=31)?vec1[0]<<vec2[0]:0,
((unsigned)vec2[1]<=31)?vec1[1]<<vec2[1]:0 }). I think on some platforms,
shifting vectors does use the modulo behavior, or handles negative shift
values as shifts in the other direction, we can't look at just one target.

We could have a middle-end policy of vec << 1234 is not undefined (we cannot
assume it doesn't happen) but we don't know what it is defined to, so we
avoid any optimization that may have it as input or output. That might mean
disabling all shift optimizations though, which would be counter productive.

The non-immediate case is simpler then, because it produces
the instructions which will inherently return the right thing.



It will only generate the instructions you expect if it hasn't already
optimized it to something else, which might not even involve a shift
anymore.

I expect reviewers (I am only commenting) will need convincing arguments
that this is safe (I could be wrong, maybe they'll find reasons that I
missed that make it obviously safe).


I agree that for all the reasons you stated it is important to have
vector (and complex!)
integer ops behave the same with respect to overflow.

That we do not yet take advantage of out-of-bound shifts is merely a
missed optimization.

For Allan, I guess this means that ix86_fold_builtin /ix86_gimple_fold_builtin is where constant folding and combiningconsecutive shifts could be implemented. I agree that it is sad we have toduplicate a lot of code between regular shifts and the builtins.


--
Marc Glisse

References:
- Re: [PATCH] [x86] Don't use builtins for SIMD shifts
  - From: Marc Glisse
- Re: [PATCH] [x86] Don't use builtins for SIMD shifts
  - From: Marc Glisse
- Re: [PATCH] [x86] Don't use builtins for SIMD shifts
  - From: Marc Glisse
- Re: [PATCH] [x86] Don't use builtins for SIMD shifts
  - From: Richard Biener

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]