[PATCH] ARM PR68620 (ICE with FP16 on armeb)

Alan Lawrence alan.lawrence@foss.arm.com
Tue Jan 19 14:51:00 GMT 2016

On 19/01/16 11:15, Christophe Lyon wrote:

>>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
>>> neon_vdup_nv8hf instead of updating the VX iterator because I thought
>>> it was not desirable to impact neon_vrev32<mode>.
>> Well, the same instruction will suffice for vrev32'ing vectors of HF just as
>> well as vectors of HI, so I think I'd argue that's harmless enough. To gain the
>> benefit, we'd need to update arm_evpc_neon_vrev with a few new cases, though.
> Since this is more intrusive, I'd rather leave that part for later. OK?


>>> +#ifdef __ARM_BIG_ENDIAN
>>> +  /* Here, 3 is (4-1) where 4 is the number of lanes. This is also the
>>> +     right value for vectors with 8 lanes.  */
>>> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
>>> +#else
>>> +#define __arm_lane(__vec, __idx) __idx
>>> +#endif
>>> +
>> Looks right, but sounds... my concern here is that I'm hoping at some point we
>> will move the *other* vget/set_lane intrinsics to use GCC vector extensions
>> too. At which time (unlike __aarch64_lane which can be used everywhere) this
>> will be the wrong formula. Can we name (and/or comment) it to avoid misleading
>> anyone? The key characteristic seems to be that it is for vectors of 16-bit
>> elements only.
> I'm not to follow, here. Looking at the patterns for
> neon_vget_lane<mode>_*internal in neon.md,
> I can see 2 flavours: one for VD, one for VQ2. The latter uses "halfelts".
> Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq),
> that would be similar to the aarch64 ones (by computing the number of
> lanes of the input vector), but the "q" one would use half the total
> number of lanes instead?

That works for me! Sthg like:

#define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx
#define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) + 
(NUM_LANES(__vec)/2 - __idx)
//or similarly
#define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1))

Alternatively I'd been thinking

#define __arm_lane_32xN(__idx) __idx ^ 1
#define __arm_lane_16xN(__idx) __idx ^ 3
#define __arm_lane_8xN(__idx) __idx ^ 7

Bear in mind PR64893 that we had on AArch64 :-(

Cheers, Alan

More information about the Gcc-patches mailing list