[PATCH] ARM PR68620 (ICE with FP16 on armeb)

Alan Lawrence alan.lawrence@foss.arm.com
Fri Jan 22 17:07:00 GMT 2016


On 20/01/16 21:10, Christophe Lyon wrote:
> On 19 January 2016 at 15:51, Alan Lawrence <alan.lawrence@foss.arm.com> wrote:
>> On 19/01/16 11:15, Christophe Lyon wrote:
>>
>>>>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
>>>>> neon_vdup_nv8hf instead of updating the VX iterator because I thought
>>>>> it was not desirable to impact neon_vrev32<mode>.
>>>>
>>>>
>>>> Well, the same instruction will suffice for vrev32'ing vectors of HF just
>>>> as
>>>> well as vectors of HI, so I think I'd argue that's harmless enough. To
>>>> gain the
>>>> benefit, we'd need to update arm_evpc_neon_vrev with a few new cases,
>>>> though.
>>>>
>>> Since this is more intrusive, I'd rather leave that part for later. OK?
>>
>>
>> Sure.
>>
>>>>> +#ifdef __ARM_BIG_ENDIAN
>>>>> +  /* Here, 3 is (4-1) where 4 is the number of lanes. This is also the
>>>>> +     right value for vectors with 8 lanes.  */
>>>>> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
>>>>> +#else
>>>>> +#define __arm_lane(__vec, __idx) __idx
>>>>> +#endif
>>>>> +
>>>>
>>>>
>>>> Looks right, but sounds... my concern here is that I'm hoping at some
>>>> point we
>>>> will move the *other* vget/set_lane intrinsics to use GCC vector
>>>> extensions
>>>> too. At which time (unlike __aarch64_lane which can be used everywhere)
>>>> this
>>>> will be the wrong formula. Can we name (and/or comment) it to avoid
>>>> misleading
>>>> anyone? The key characteristic seems to be that it is for vectors of
>>>> 16-bit
>>>> elements only.
>>>>
>>> I'm not sure I follow here. Looking at the patterns for
>>> neon_vget_lane<mode>_*internal in neon.md,
>>> I can see 2 flavours: one for VD, one for VQ2. The latter uses "halfelts".
>>>
>>> Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq),
>>> that would be similar to the aarch64 ones (by computing the number of
>>> lanes of the input vector), but the "q" one would use half the total
>>> number of lanes instead?
>>
>>
>> That works for me! Something like:
>>
>> #define __arm_lane(__vec, __idx) (NUM_LANES(__vec) - 1 - __idx)
>> #define __arm_laneq(__vec, __idx) ((__idx & (NUM_LANES(__vec)/2)) + (NUM_LANES(__vec)/2 - 1 - (__idx & (NUM_LANES(__vec)/2 - 1))))
>> // or equivalently
>> #define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1))
>>
>> Alternatively I'd been thinking
>>
>> #define __arm_lane_32xN(__idx) __idx ^ 1
>> #define __arm_lane_16xN(__idx) __idx ^ 3
>> #define __arm_lane_8xN(__idx) __idx ^ 7
>>
>> Bear in mind PR64893 that we had on AArch64 :-(
>>
>
> Here is a new version, based on the comments above.
> I've also removed the addition of arm_fp_ok effective target since I
> added that in my other testsuite patch.
>
> OK now?

Yes. I realize my worry about PR64893 doesn't apply here since we pass constant 
lane numbers / vector bounds into __builtin_arm_lane_check. Thanks!

--Alan

>
> Thanks,
>
> Christophe
>
>> Cheers, Alan


