[ARM] PR66791: Replace builtins in vshl_n

Richard Earnshaw Richard.Earnshaw@foss.arm.com
Thu Jul 22 11:58:31 GMT 2021



On 22/07/2021 12:32, Prathamesh Kulkarni wrote:
> On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw
> <Richard.Earnshaw@foss.arm.com> wrote:
>>
>>
>>
>> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote:
>>> Hi,
>>> The attached patch removes calls to builtins from vshl_n intrinsics,
>>> and replacing them
>>> with left shift operator. The patch passes bootstrap+test on
>>> arm-linux-gnueabihf.
>>>
>>> Altho, I noticed, that the patch causes 3 extra registers to spill
>>> using << instead
>>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of
>>> intrinsics ?
>>> Before patch, the shift operation was performed by call to
>>> __builtin_neon_vshl<type> (__a, __b)
>>> and now it's inlined to __a << __b, which might result in increased
>>> register pressure ?
>>>
>>> Thanks,
>>> Prathamesh
>>>
>>
>>
>> You're missing a ChangeLog for the patch.
> Sorry, updated in this patch.
>>
>> However, I'm not sure about this.  The register shift form of VSHL
>> performs a right shift if the value is negative, which is UB if you
>> write `<<` instead.
>>
>> Have I missed something here?
> Hi Richard,
> According to this article:
> https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N
> For vshl_n, the shift amount is always in the non-negative range for all types.
> 
> I tried using vshl_n_s32 (a, -1), and the compiler emitted following diagnostic:
> foo.c: In function ‘main’:
> foo.c:17:1: error: constant -1 out of range 0 - 31
>     17 | }
>        | ^
> 

It does do that now, but that's because the intrinsic expansion does 
some bounds checking; when you remove the call into the back-end 
intrinsic that will no-longer happen.

I think with this change various things are likely:

- We'll no-longer reject non-immediate values, so users will be able to 
write

         int b = 5;
	vshl_n_s32 (a, b);

   which will expand to a vdup followed by the register form.

- we'll rely on the front-end diagnosing out-of range shifts

- code of the form

	int b = -1;
	vshl_n_s32 (a, b);

   will probably now go through without any errors, especially at low 
optimization levels.  It may end up doing what the user wanted, but it's 
definitely a change in behaviour - and perhaps worse, the compiler might 
diagnose the above as UB and silently throw some stuff away.

It might be that we need to insert some form of static assertion that 
the second argument is a __builtin_constant_p().

R.

> So, is the attached patch OK ?

> 
> Thanks,
> Prathamesh
>>
>> R.


More information about the Gcc-patches mailing list