Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)


On 18 January 2016 at 20:01, Alan Lawrence <alan.lawrence@arm.com> wrote:
> Thanks for working on this, Christophe, and sorry I missed the PR. You got
> further in fixing more things than I did though :). A couple of comments:
>
>> For the vec_set<mode>_internal and neon_vld1_dup<mode> patterns, I
>> switched to an existing iterator which already had the needed
>> V4HF/V8HF (so I switched to VD_LANE and VQ2).
>
> It's a separate issue, and I hadn't done this either, but looking again - I
> don't see any reason why we shouldn't apply VD->VD_LANE to the vec_extract
> standard name pattern too. (At present looks like we have vec_extractv8hf but no
> vec_extractv4hf ?)
>
OK, I'll add that to my patch.

>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
>> neon_vdup_nv8hf instead of updating the VX iterator because I thought
>> it was not desirable to impact neon_vrev32<mode>.
>
> Well, the same instruction will suffice for vrev32'ing vectors of HF just as
> well as vectors of HI, so I think I'd argue that's harmless enough. To gain the
> benefit, we'd need to update arm_evpc_neon_vrev with a few new cases, though.
>
Since this is more intrusive, I'd rather leave that part for later. OK?

>> @@ -5252,12 +5252,22 @@ vget_lane_s32 (int32x2_t __a, const int __b)
>>     were marked always-inline so there were no call sites, the declaration
>>     would nonetheless raise an error.  Hence, we must use a macro instead.  */
>>
>> +  /* For big-endian, GCC's vector indices are the opposite way around
>> +     to the architectural lane indices used by Neon intrinsics.  */
>
> Not quite the opposite way around, as you take into account yourself! 'Reversed
> within each 64 bits', perhaps?
>
OK, I'll try to rephrase that.
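
For concreteness, "reversed within each 64 bits" for 16-bit elements means
the following mapping between architectural lane numbers and GCC vector
subscripts (illustration only, not part of the patch; plain C, runnable
anywhere):

/* On big-endian, the subscript GCC uses for __vec[i] and the
   architectural lane used by the intrinsics differ by a reversal
   within each 64-bit chunk.  With 16-bit elements there are four
   lanes per 64 bits, so the mapping is __idx ^ 3, and the same XOR
   is right for both 4-lane (D-register) and 8-lane (Q-register)
   vectors:

     architectural lane: 0 1 2 3 | 4 5 6 7
     GCC subscript:      3 2 1 0 | 7 6 5 4  */
#include <stdio.h>

int
main (void)
{
  for (int lane = 0; lane < 8; lane++)
    printf ("arch lane %d <-> GCC subscript %d\n", lane, lane ^ 3);
  return 0;
}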

>> +#ifdef __ARM_BIG_ENDIAN
>> +  /* Here, 3 is (4-1) where 4 is the number of lanes. This is also the
>> +     right value for vectors with 8 lanes.  */
>> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
>> +#else
>> +#define __arm_lane(__vec, __idx) __idx
>> +#endif
>> +
>
> Looks right, but sounds... my concern here is that I'm hoping at some point we
> will move the *other* vget/set_lane intrinsics to use GCC vector extensions
> too. At which time (unlike __aarch64_lane which can be used everywhere) this
> will be the wrong formula. Can we name (and/or comment) it to avoid misleading
> anyone? The key characteristic seems to be that it is for vectors of 16-bit
> elements only.
>
I'm not sure I follow here. Looking at the patterns for
neon_vget_lane<mode>_*internal in neon.md,
I can see two flavours: one for VD, one for VQ2. The latter uses "halfelts".

Do you prefer that I create two macros (say __arm_lane and __arm_laneq),
similar to the aarch64 ones (which compute the number of lanes of the
input vector), but with the "q" one using half the total number of
lanes instead?
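
Following the aarch64 scheme, the two macros could look roughly like this
(a sketch only; the __ARM_NUM_LANES helper name is modelled on
__AARCH64_NUM_LANES and is purely illustrative). Deriving the XOR mask from
the lane count would also address the concern above about a fixed mask only
being right for vectors of 16-bit elements:

/* Sketch: compute the lane count from the vector's size, as the
   aarch64 macros do.  __arm_lane is for 64-bit (D-register) vectors;
   __arm_laneq is for 128-bit (Q-register) vectors, where the reversal
   applies within each 64-bit half, i.e. to half the total lanes.  */
#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
#ifdef __ARM_BIG_ENDIAN
#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES (__vec) - 1))
#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES (__vec) / 2 - 1))
#else
#define __arm_lane(__vec, __idx) __idx
#define __arm_laneq(__vec, __idx) __idx
#endif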

>> @@ -5334,7 +5344,7 @@ vgetq_lane_s32 (int32x4_t __a, const int __b)
>>      ({                                               \
>>        float16x8_t __vec = (__v);             \
>>        __builtin_arm_lane_check (8, __idx);   \
>> -      float16_t __res = __vec[__idx];                \
>> +      float16_t __res = __vec[__arm_lane(__vec, __idx)];     \
>
> In passing - the function name in the @@ header is of course misleading; this is #define vgetq_lane_f16 (and likewise for the later hunks)
>
> Thanks, Alan
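
With a Q-specific macro like the __arm_laneq sketched above, the
vgetq_lane_f16 define from the hunk would read roughly as follows
(illustrative only, reconstructed from the quoted context):

#define vgetq_lane_f16(__v, __idx)                         \
  __extension__                                            \
    ({                                                     \
      float16x8_t __vec = (__v);                           \
      __builtin_arm_lane_check (8, __idx);                 \
      float16_t __res = __vec[__arm_laneq (__vec, __idx)]; \
      __res;                                               \
    })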

