This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] [x86] Don't use builtins for SIMD shifts


On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote:

On Saturday 10 December 2016, Marc Glisse wrote:
On Sat, 10 Dec 2016, Marc Glisse wrote:
On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote:
Replaces the definitions of the shift intrinsics with GCC extension
syntax to
allow GCC to reason about what the instructions does. Tests are added to
ensure the intrinsics still produce the right instructions, and that a
few basic optimizations now work.

I don't think we can do it in such a straightforward way. Those
intrinsics are well defined for any shift argument, while operator<< and
operator>> are considered undefined for many input values. I believe you
may need to use an unsigned type for the lhs of left shifts, and write a
<< (b & 31) to match the semantics for the rhs (I haven't checked
Intel's doc). Which might require adding a few patterns in sse.md to
avoid generating code for that AND.

Oups, apparently I got that wrong, got confused with the scalar case. Left
shift by more than precision is well defined for vectors, but it gives 0,
it doesn't take the shift count modulo the precision. Which is even harder
to explain to gcc (maybe ((unsigned)b<=31)?a<<b:0...). Yes, the way we
model operations in gcc is often inconvenient.

There was a similar issue for LLVM (https://reviews.llvm.org/D3353).

Well, it is undefined behaviour by the C standard, but is it also undefined
inside GCC (since this specifically uses a GCC extension)? I would assume it
just produces 0.

In the optimizers, we tend to handle vectors the same as scalars. I don't remember if we have any optimization that takes advantage of the limited range for the second operand of shifts (mostly we disable optimizations when there is a risk they might produce a shift amount larger than prec), but I don't think we have specified the behavior for larger-than-prec shifts, so it would be dangerous. Note that depending on the platform, gcc may lower vector operations to smaller vectors or even scalars, which makes it inconvenient to have different semantics for vectors and scalars (not impossible, we would for instance need to teach the lowering pass that vec1 << vec2 lowers to { ((unsigned)vec2[0]<=31)?vec1[0]<<vec2[0]:0, ((unsigned)vec2[1]<=31)?vec1[1]<<vec2[1]:0 }). I think on some platforms, shifting vectors does use the modulo behavior, or handles negative shift values as shifts in the other direction, we can't look at just one target.

We could have a middle-end policy of vec << 1234 is not undefined (we cannot assume it doesn't happen) but we don't know what it is defined to, so we avoid any optimization that may have it as input or output. That might mean disabling all shift optimizations though, which would be counter productive.

The non-immediate case is simpler then, because it produces
the instructions which will inherently return the right thing.

It will only generate the instructions you expect if it hasn't already optimized it to something else, which might not even involve a shift anymore.

I expect reviewers (I am only commenting) will need convincing arguments that this is safe (I could be wrong, maybe they'll find reasons that I missed that make it obviously safe).

--
Marc Glisse


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]