[PATCH V2] aarch64: Add bfloat16 vldN_lane_bf16 + vldNq_lane_bf16 intrinsics

Mon Oct 26 16:18:21 GMT 2020

Richard Sandiford <richard.sandiford@arm.com> writes:

> Andrea Corallo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Hi all,
>>
>> Here is the second version of the patch implementing the bfloat16_t
>> NEON-related load intrinsics: vld2_lane_bf16, vld2q_lane_bf16,
>> vld3_lane_bf16, vld3q_lane_bf16, vld4_lane_bf16, vld4q_lane_bf16.
>>
>> This version narrows the testcases further so that they do not cause
>> regressions for the arm backend, where these intrinsics are not yet present.
>>
>> Please refer to:
>> ACLE <https://developer.arm.com/docs/101028/latest>
>> ISA  <https://developer.arm.com/docs/ddi0596/latest>
>
> The intrinsics are documented to require +bf16, but it looks like this
> makes the bf16 forms available without that.  (This is enforced indirectly,
> by complaining that the intrinsic wrapper can't be inlined into a caller
> that uses incompatible target flags.)
>
> Perhaps we should keep the existing intrinsics where they are and
> just move the #undefs to the end, similarly to __aarch64_vget_lane_any.
>
> Thanks,
> Richard

Hi Richard,

thanks for reviewing.  I was wondering whether it wouldn't be better to wrap
the new intrinsic definitions in the appropriate target pragma, so that the
macro definition stays narrowly scoped.  WDYT?
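For illustration, a minimal sketch of the pattern under discussion: one generator macro (in the spirit of __aarch64_vget_lane_any) stamps out a lane-load helper per element type, and the #undef sits after the last instantiation.  The macro name DEFINE_LD2_LANE and the ld2_lane_* helpers are hypothetical, plain arrays stand in for NEON vector-tuple types, and a portable type stands in for bfloat16_t.  The pragma lines shown only in the comment are an approximation of how arm_neon.h gates bf16 code; the exact target string may differ.

```c
#include <assert.h>

/* Hypothetical generator macro: each expansion defines one
   "load two elements into lane LANE of two vectors" helper.
   Plain 2x4 arrays stand in for NEON vector-tuple types.  */
#define DEFINE_LD2_LANE(etype, suffix)                                  \
  static inline void                                                    \
  ld2_lane_##suffix (etype dst[2][4], const etype *src, int lane)       \
  {                                                                     \
    assert (lane >= 0 && lane < 4);                                     \
    dst[0][lane] = src[0]; /* first element goes to vector 0 */         \
    dst[1][lane] = src[1]; /* second element goes to vector 1 */        \
  }

/* Baseline instantiations, available without extra target flags.  */
DEFINE_LD2_LANE (float, f32)
DEFINE_LD2_LANE (int, s32)

/* Feature-gated region.  In arm_neon.h the bf16 forms would be
   bracketed by something like:
     #pragma GCC push_options
     #pragma GCC target ("arch=armv8.2-a+bf16")
     ...instantiations...
     #pragma GCC pop_options
   so that they still require +bf16.  A portable type stands in here.  */
DEFINE_LD2_LANE (short, s16)

/* Only after the last use is the helper macro undefined.  */
#undef DEFINE_LD2_LANE
```

Keeping the #undef at the very end lets the gated instantiations reuse the same macro, while the pragma bracket (rather than the macro's position in the header) is what restricts the bf16 forms to +bf16 callers.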

Thanks

  Andrea