This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)
- From: Alan Lawrence <alan dot lawrence at arm dot com>
- To: christophe dot lyon at linaro dot org
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Mon, 18 Jan 2016 19:01:51 +0000
- Subject: Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)
- Authentication-results: sourceware.org; auth=none
- References: <CAKdteOZbgTw44UarkL9zmmZBd3_gGPof9v0m8mPH-xu9=8yqqQ at mail dot gmail dot com>
Thanks for working on this, Christophe, and sorry I missed the PR. You got
further in fixing more things than I did though :). A couple of comments:
> For the vec_set<mode>_internal and neon_vld1_dup<mode> patterns, I
> switched to an existing iterator which already had the needed
> V4HF/V8HF (so I switched to VD_LANE and VQ2).
It's a separate issue, and I hadn't done this either, but looking again, I
don't see any reason why we shouldn't apply the VD->VD_LANE change to the
vec_extract standard name pattern too. (At present it looks like we have
vec_extractv8hf but no vec_extractv4hf?)
> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
> neon_vdup_nv8hf instead of updating the VX iterator because I thought
> it was not desirable to impact neon_vrev32<mode>.
Well, the same instruction will suffice for vrev32'ing vectors of HF just as
well as vectors of HI, so I think I'd argue that's harmless enough. To gain the
benefit, we'd need to update arm_evpc_neon_vrev with a few new cases, though.
> @@ -5252,12 +5252,22 @@ vget_lane_s32 (int32x2_t __a, const int __b)
> were marked always-inline so there were no call sites, the declaration
> would nonetheless raise an error. Hence, we must use a macro instead. */
>
> + /* For big-endian, GCC's vector indices are the opposite way around
> + to the architectural lane indices used by Neon intrinsics. */
Not quite the opposite way around, as you take into account yourself! 'Reversed
within each 64 bits', perhaps?
> +#ifdef __ARM_BIG_ENDIAN
> + /* Here, 3 is (4-1) where 4 is the number of lanes. This is also the
> + right value for vectors with 8 lanes. */
> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
> +#else
> +#define __arm_lane(__vec, __idx) __idx
> +#endif
> +
Looks right. My concern here is that I'm hoping at some point we will move the
*other* vget/set_lane intrinsics to use GCC vector extensions too, at which
point (unlike __aarch64_lane, which can be used everywhere) this will be the
wrong formula. Can we name (and/or comment) it to avoid misleading anyone? The
key characteristic seems to be that it is for vectors of 16-bit elements only.
> @@ -5334,7 +5344,7 @@ vgetq_lane_s32 (int32x4_t __a, const int __b)
> ({ \
> float16x8_t __vec = (__v); \
> __builtin_arm_lane_check (8, __idx); \
> - float16_t __res = __vec[__idx]; \
> + float16_t __res = __vec[__arm_lane(__vec, __idx)]; \
In passing: the function name in the @@ header is of course misleading; this is #define vgetq_lane_f16 (and similarly for the later hunks).
Thanks, Alan