[AArch64] Fix integer vabs intrinsics
James Greenhalgh
james.greenhalgh@arm.com
Fri May 2 09:21:00 GMT 2014
On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
> > undefined/impossible, the neon intrinsics vabs intrinsics should behave as
> > the hardware. That is to say, the pseudo-code sequence:
>
>
> Only for signed integer types. You should be able to use an unsigned
> integer type here instead.
If anything, I think that puts us in a worse position. The issue that
inspires this patch is that GCC will happily fold:
t1 = ABS_EXPR (x)
t2 = GE_EXPR (t1, 0)
to
t2 = TRUE
Surely an unsigned integer type is going to suffer the same fate? Certainly I
can imagine somewhere in the compiler there being a fold path for:
(unsigned >= 0) == TRUE
> >
> > a = vabs_s8 (vdup_n_s8 (-128));
> > assert (a >= 0);
> >
> > does not hold. As in hardware
> >
> > abs (-128) == -128
> >
> > Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
> > it. In fact, we have to be even more careful than that, and keep the integer
> > vabs intrinsics as an unspec in the back end.
>
> No it is not. The mistake is to use signed integer types here. Just
> add a conversion to an unsigned integer vector and it will work
> correctly.
> In fact the ABS rtl code is not undefined for the overflow.
Here we are covering ourselves against a seperate issue. For auto-vectorized
code we want the SABD combine patterns to kick in whenever sensible. For
intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
So in this case, the combine would be erroneous. Likewise SABA.
Thanks,
James
More information about the Gcc-patches
mailing list