This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][AArch64] Improve code generation for float16 vector code
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Alan Lawrence <Alan dot Lawrence at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 8 Sep 2015 09:21:08 +0100
- Subject: Re: [PATCH][AArch64] Improve code generation for float16 vector code
- References: <20150904095450 dot GB18679 at arm dot com> <1441631341-6599-1-git-send-email-alan dot lawrence at arm dot com>
On Mon, Sep 07, 2015 at 02:09:01PM +0100, Alan Lawrence wrote:
> On 04/09/15 13:32, James Greenhalgh wrote:
> > In that case, these should be implemented as inline assembly blocks. As it
> > stands, the code generation for these intrinsics will be very poor with this
> > patch applied.
> >
> > I'm going to hold off OKing this until I see a follow-up to fix the code
> > generation, either replacing those particular intrinsics with inline asm,
> > or doing the more comprehensive fix in the back-end.
> >
> > Thanks,
> > James
>
> In that case, here is the follow-up now ;). This fixes each of the following
> functions to generate a single instruction followed by ret:
> * vld1_dup_f16, vld1q_dup_f16
> * vset_lane_f16, vsetq_lane_f16
> * vget_lane_f16, vgetq_lane_f16
> * For IN of type either float16x4_t or float16x8_t, and constant C:
>     return (float16x4_t) {in[C], in[C], in[C], in[C]};
> * Similarly,
>     return (float16x8_t) {in[C], in[C], in[C], in[C], in[C], in[C], in[C], in[C]};
> (These correspond intuitively to what one might expect for "vdup_lane_f16",
> "vdup_laneq_f16", "vdupq_lane_f16" and "vdupq_laneq_f16" intrinsics,
> although such intrinsics do not actually exist.)
>
> This patch does not deal with equivalents to vdup_n_s16 and other intrinsics
> that load immediates, rather than using elements of pre-existing vectors.
What is code generation like for these, then? If I remember correctly, it
was the vdup_n_f16 implementation that looked most objectionable before.
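
For readers following the thread, the source-level shapes under discussion can be sketched with GCC's generic vector extensions. This is an illustrative sketch only, not code from the patch: it assumes float elements as a stand-in for float16 so that it builds on any host, and the names f32x4, LANE, dup_lane, dup_n, get_lane and set_lane are hypothetical.

```c
/* Portable sketch using GCC/Clang generic vector extensions.
   Assumption: float stands in for float16 so this builds on any host.  */
typedef float f32x4 __attribute__ ((vector_size (16)));

#define LANE 1  /* hypothetical constant lane index, "C" in the thread */

/* Broadcast one lane of an existing vector (the "dup_lane" shape).  */
f32x4
dup_lane (f32x4 in)
{
  return (f32x4) {in[LANE], in[LANE], in[LANE], in[LANE]};
}

/* Broadcast a scalar (the "dup_n" shape, which the quoted text says
   the patch does not deal with).  */
f32x4
dup_n (float x)
{
  return (f32x4) {x, x, x, x};
}

/* Read one lane (the shape behind vget_lane).  */
float
get_lane (f32x4 in)
{
  return in[LANE];
}

/* Write one lane and return the updated vector (the shape behind
   vset_lane).  */
f32x4
set_lane (float x, f32x4 in)
{
  in[LANE] = x;
  return in;
}
```

On AArch64 the real code would use float16x4_t or float16x8_t from arm_neon.h; the claim in the thread is that, with the patch, each of the lane-based shapes compiles to a single instruction followed by ret.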
> I'd welcome thoughts/opinions on what testcase would be appropriate.
> Correctness of all the intrinsics is already tested by the advsimd-intrinsics
> testsuite, and the only way I can see to verify code generation, is to
> scan-assembler looking for particular instructions; do we wish to see more
> scan-assembler tests?
I think these are fine without a test case; as you say, correctness is
already handled elsewhere.
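
For reference, a scan-assembler test of the kind asked about might look roughly like the sketch below. It is not part of the patch. The function name dup_lane1, the vec4 typedef, the choice of -O2 and the exact instruction pattern are all assumptions, and the non-AArch64 fallback typedef is there only so the sketch compiles on other hosts; a real testsuite file would not need it.

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */

#if defined (__aarch64__)
#include <arm_neon.h>
typedef float16x4_t vec4;
#else
/* Fallback so the sketch still compiles elsewhere; not part of a real test.  */
typedef float vec4 __attribute__ ((vector_size (16)));
#endif

/* Broadcast lane 1 of IN to every lane of the result.  */
vec4
dup_lane1 (vec4 in)
{
  return (vec4) {in[1], in[1], in[1], in[1]};
}

/* Assumed expected output on AArch64: exactly one dup from lane 1.  */
/* { dg-final { scan-assembler-times {dup\tv[0-9]+\.4h, v[0-9]+\.h\[1\]} 1 } } */
```

A test like this pins down the exact instruction and operand syntax, so it would need updating whenever the preferred code sequence changes.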
> Bootstrapped + check-gcc on aarch64-none-linux-gnu.
OK,
Thanks,
James
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>,
> aarch64_dup_lane<mode>, aarch64_dup_lane_<vswap_width_name><mode>,
> aarch64_simd_vec_set<mode>, vec_set<mode>, vec_perm_const<mode>,
> vec_init<mode>, *aarch64_simd_ld1r<mode>, vec_extract<mode>): Add
> V4HF and V8HF variants to iterator.
>
> * config/aarch64/aarch64.c (aarch64_evpc_dup): Add V4HF and V8HF cases.
>
> * config/aarch64/iterators.md (VDQF_F16): New.
> (VSWAP_WIDTH, vswap_width_name): Add V4HF and V8HF cases.