This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [AArch64] Fix integer vabs intrinsics



> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> 
>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>> <james.greenhalgh@arm.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as
>>> the hardware. That is to say, the pseudo-code sequence:
>> 
>> 
>> Only for signed integer types.  You should be able to use an unsigned
>> integer type here instead.
> 
> If anything, I think that puts us in a worse position.

Not if you cast it back. 


> The issue that
> inspires this patch is that GCC will happily fold:
> 
>  t1 = ABS_EXPR (x)
>  t2 = GE_EXPR (t1, 0)
> 
> to
> 
>  t2 = TRUE
> 
> Surely an unsigned integer type is going to suffer the same fate? Certainly I
> can imagine somewhere in the compiler there being a fold path for:

Yes but if add a cast from the unsigned type to the signed type gcc does not optimize that. If it does it is a bug since the overflow is defined there. 

> 
>  (unsigned >= 0) == TRUE
> 
>>> 
>>>  a = vabs_s8 (vdup_n_s8 (-128));
>>>  assert (a >= 0);
>>> 
>>> does not hold. As in hardware
>>> 
>>>  abs (-128) == -128
>>> 
>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
>>> it. In fact, we have to be even more careful than that, and keep the integer
>>> vabs intrinsics as an unspec in the back end.
>> 
>> No it is not.  The mistake is to use signed integer types here.  Just
>> add a conversion to an unsigned integer vector and it will work
>> correctly.
>> In fact the ABS rtl code is not undefined for the overflow.
> 
> Here we are covering ourselves against a seperate issue. For auto-vectorized
> code we want the SABD combine patterns to kick in whenever sensible. For
> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
> 
>  vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
> 
> So in this case, the combine would be erroneous. Likewise SABA.

This sounds like it would problematic for unsigned types  and not just for vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8 instead. Since in rtl overflow and underflow is defined to be wrapping. 

Thanks,
Andrew Pinski

> 
> Thanks,
> James


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]