This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)
- From: Alan Lawrence <alan dot lawrence at arm dot com>
- To: christophe dot lyon at linaro dot org
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Mon, 18 Jan 2016 19:01:51 +0000
- Subject: Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)
- Authentication-results: sourceware.org; auth=none
- References: <CAKdteOZbgTw44UarkL9zmmZBd3_gGPof9v0m8mPH-xu9=8yqqQ at mail dot gmail dot com>
Thanks for working on this, Christophe, and sorry I missed the PR. You got
further in fixing more things than I did though :). A couple of comments:
> For the vec_set<mode>_internal and neon_vld1_dup<mode> patterns, I
> switched to an existing iterator which already had the needed
> V4HF/V8HF (so I switched to VD_LANE and VQ2).
It's a separate issue, and I hadn't done this either, but looking again, I
don't see any reason why we shouldn't apply the VD->VD_LANE change to the
vec_extract standard name pattern too. (At present it looks like we have
vec_extractv8hf but no vec_extractv4hf?)
> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
> neon_vdup_nv8hf instead of updating the VX iterator because I thought
> it was not desirable to impact neon_vrev32<mode>.
Well, the same instruction will suffice for vrev32'ing vectors of HF just as
well as vectors of HI, so I think I'd argue that's harmless enough. To gain the
benefit, we'd need to update arm_evpc_neon_vrev with a few new cases, though.
> @@ -5252,12 +5252,22 @@ vget_lane_s32 (int32x2_t __a, const int __b)
> were marked always-inline so there were no call sites, the declaration
> would nonetheless raise an error. Hence, we must use a macro instead. */
>
> + /* For big-endian, GCC's vector indices are the opposite way around
> + to the architectural lane indices used by Neon intrinsics. */
Not quite the opposite way around, as you take into account yourself! 'Reversed
within each 64 bits', perhaps?
> +#ifdef __ARM_BIG_ENDIAN
> + /* Here, 3 is (4-1) where 4 is the number of lanes. This is also the
> + right value for vectors with 8 lanes. */
> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
> +#else
> +#define __arm_lane(__vec, __idx) __idx
> +#endif
> +
Looks right. My concern here is that I'm hoping at some point we will move the
*other* vget/set_lane intrinsics to use GCC vector extensions too, at which
point (unlike __aarch64_lane, which can be used everywhere) this will be the
wrong formula. Can we name (and/or comment) it to avoid misleading anyone? The
key characteristic seems to be that it is for vectors of 16-bit elements only.
> @@ -5334,7 +5344,7 @@ vgetq_lane_s32 (int32x4_t __a, const int __b)
> ({ \
> float16x8_t __vec = (__v); \
> __builtin_arm_lane_check (8, __idx); \
> - float16_t __res = __vec[__idx]; \
> + float16_t __res = __vec[__arm_lane(__vec, __idx)]; \
In passing: the function name in the @@ header is of course misleading; this is #define vgetq_lane_f16 (and similarly for the later hunks).
Thanks, Alan