This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics
- From: Sameera Deshpande <sameera dot deshpande at linaro dot org>
- To: James Greenhalgh <james dot greenhalgh at arm dot com>
- Cc: Christophe Lyon <christophe dot lyon at linaro dot org>, gcc-patches <gcc-patches at gcc dot gnu dot org>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, nd <nd at arm dot com>, Sudi Das <Sudi dot Das at arm dot com>
- Date: Tue, 22 May 2018 21:50:04 +0530
- Subject: Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics
- References: <CAAdirjxgB2dzBN973aesZCBVsv-N20dkrzC8Yoer0Jtb+zZ5ZA@mail.gmail.com> <CAKdteOZqEDT2yNM7jpjCEULFJXcqxnamT1T5jzEJvVaxzUbtkA@mail.gmail.com> <CAAdirjwDY1fXfG1hc4Bv94+g6a9ONOeUjEKqz5vhf5pU9vvbXQ@mail.gmail.com> <CAKdteOaEdhm8CCpBr_vavbWEgbTPm_iux4KjJAv7Bz5_5kkW=A@mail.gmail.com> <20180413143421.GA30909@arm.com> <CAAdirjwrwYBm2Gx_2N_gA-NDMrkD0h++xgbH8f5oHRFvP0tZyg@mail.gmail.com> <20180413145119.GB30909@arm.com> <CAAdirjxVH5jUHPBtpU5i7s2HuyBeS6NX=jNA2Zbz+rF1K5E-ZA@mail.gmail.com> <20180522155641.GD36840@arm.com>
On Tue 22 May, 2018, 9:26 PM James Greenhalgh, <james.greenhalgh@arm.com>
wrote:
> On Mon, Apr 30, 2018 at 06:35:11PM -0500, Sameera Deshpande wrote:
> > On 13 April 2018 at 20:21, James Greenhalgh <james.greenhalgh@arm.com>
> wrote:
> > > On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote:
> > >> On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, <
> james.greenhalgh@arm.com<mailto:james.greenhalgh@arm.com>> wrote:
> > >> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
> > >> > Hi,
> > >> >
> > >> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande <
> sameera.deshpande@linaro.org<mailto:sameera.deshpande@linaro.org>>:
> > >> > > Hi Christophe,
> > >> > >
> > >> > > Please find attached the updated patch with testcases.
> > >> > >
> > >> > > Ok for trunk?
> > >> >
> > >> > Thanks for the update.
> > >> >
> > >> > Since the new intrinsics are only available on aarch64, you want to
> > >> > prevent the tests from running on arm.
> > >> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the
> two targets.
> > >> > There are several examples on how to do that in that directory.
> > >> >
> > >> > I have also noticed that the tests fail at execution on aarch64_be.
> > >>
> > >> I think this is important to fix. We don't want the big-endian target
> to have
> > >> failing implementations of the Neon intrinsics. What is the nature of
> the
> > >> failure?
> > >>
> > >> From what I can see, nothing in the patch prevents using these
> intrinsics
> > >> on big-endian, so either the intrinsics behaviour is wrong (we have a
> wrong
> > >> code bug), or the testcase expected behaviour is wrong.
> > >>
> > >> I don't think disabling the test for big-endian is the right fix. We
> should
> > >> either fix the intrinsics, or fix the testcase.
> > >>
> > >> Thanks,
> > >> James
> > >>
> > >> Hi James,
> > >>
> > >> As the tests assume the little endian order of elements while
> checking the
> > >> results, the tests are failing for big endian targets. So, the
> failures are
> > >> not because of intrinsic implementations, but because of the testcase.
> > >
> > > The testcase is a little hard to follow through the macros, but why
> would
> > > this be the case?
> > >
> > > ld1 is deterministic on big and little endian for which elements will
> be
> > > loaded from memory, as is st1.
> > >
> > > My expectation would be that:
> > >
> > > int __attribute__ ((noinline))
> > > test_vld_u16_x3 ()
> > > {
> > > uint16_t data[3 * 3];
> > > uint16_t temp[3 * 3];
> > > uint16x4x3_t vectors;
> > > int i,j;
> > > for (i = 0; i < 3 * 3; i++)
> > > data [i] = (uint16_t) 3*i;
> > > asm volatile ("" : : : "memory");
> > > vectors = vld1_u16_x3 (data);
> > > vst1_u16 (temp, vectors.val[0]);
> > > vst1_u16 (&temp[3], vectors.val[1]);
> > > vst1_u16 (&temp[3 * 2], vectors.val[2]);
> > > asm volatile ("" : : : "memory");
> > > for (j = 0; j < 3 * 3; j++)
> > > if (temp[j] != data[j])
> > > return 1;
> > > return 0;
> > > }
> > >
> > > would work equally well for big- or little-endian.
> > >
> > > I think this is more likely to be an intrinsics implementation bug.
> > >
> > > Thanks,
> > > James
> > >
> >
> > Hi James,
> >
> > Please find attached the updated patch, which now passes for little as
> > well as big endian.
> > Ok for trunk?
>
>
> OK.
>
> Thanks,
> James
>
> >
> > --
> > - Thanks and regards,
> > Sameera D.
> >
> > gcc/Changelog:
> >
> > 2018-05-01 Sameera Deshpande <sameera.deshpande@linaro.org>
> >
> >
> > * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
> > (st1x2): Likewise.
> > (st1x3): Likewise.
> > * config/aarch64/aarch64-simd.md
> > (aarch64_ld1x3<VALLDIF:mode>): New pattern.
> > (aarch64_ld1_x3_<mode>): Likewise
> > (aarch64_st1x2<VALLDIF:mode>): Likewise
> > (aarch64_st1_x2_<mode>): Likewise
> > (aarch64_st1x3<VALLDIF:mode>): Likewise
> > (aarch64_st1_x3_<mode>): Likewise
> > * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
> > (vld1_s8_x3): Likewise.
> > (vld1_u16_x3): Likewise.
> > (vld1_s16_x3): Likewise.
> > (vld1_u32_x3): Likewise.
> > (vld1_s32_x3): Likewise.
> > (vld1_u64_x3): Likewise.
> > (vld1_s64_x3): Likewise.
> > (vld1_f16_x3): Likewise.
> > (vld1_f32_x3): Likewise.
> > (vld1_f64_x3): Likewise.
> > (vld1_p8_x3): Likewise.
> > (vld1_p16_x3): Likewise.
> > (vld1_p64_x3): Likewise.
> > (vld1q_u8_x3): Likewise.
> > (vld1q_s8_x3): Likewise.
> > (vld1q_u16_x3): Likewise.
> > (vld1q_s16_x3): Likewise.
> > (vld1q_u32_x3): Likewise.
> > (vld1q_s32_x3): Likewise.
> > (vld1q_u64_x3): Likewise.
> > (vld1q_s64_x3): Likewise.
> > (vld1q_f16_x3): Likewise.
> > (vld1q_f32_x3): Likewise.
> > (vld1q_f64_x3): Likewise.
> > (vld1q_p8_x3): Likewise.
> > (vld1q_p16_x3): Likewise.
> > (vld1q_p64_x3): Likewise.
> > (vst1_s64_x2): Likewise.
> > (vst1_u64_x2): Likewise.
> > (vst1_f64_x2): Likewise.
> > (vst1_s8_x2): Likewise.
> > (vst1_p8_x2): Likewise.
> > (vst1_s16_x2): Likewise.
> > (vst1_p16_x2): Likewise.
> > (vst1_s32_x2): Likewise.
> > (vst1_u8_x2): Likewise.
> > (vst1_u16_x2): Likewise.
> > (vst1_u32_x2): Likewise.
> > (vst1_f16_x2): Likewise.
> > (vst1_f32_x2): Likewise.
> > (vst1_p64_x2): Likewise.
> > (vst1q_s8_x2): Likewise.
> > (vst1q_p8_x2): Likewise.
> > (vst1q_s16_x2): Likewise.
> > (vst1q_p16_x2): Likewise.
> > (vst1q_s32_x2): Likewise.
> > (vst1q_s64_x2): Likewise.
> > (vst1q_u8_x2): Likewise.
> > (vst1q_u16_x2): Likewise.
> > (vst1q_u32_x2): Likewise.
> > (vst1q_u64_x2): Likewise.
> > (vst1q_f16_x2): Likewise.
> > (vst1q_f32_x2): Likewise.
> > (vst1q_f64_x2): Likewise.
> > (vst1q_p64_x2): Likewise.
> > (vst1_s64_x3): Likewise.
> > (vst1_u64_x3): Likewise.
> > (vst1_f64_x3): Likewise.
> > (vst1_s8_x3): Likewise.
> > (vst1_p8_x3): Likewise.
> > (vst1_s16_x3): Likewise.
> > (vst1_p16_x3): Likewise.
> > (vst1_s32_x3): Likewise.
> > (vst1_u8_x3): Likewise.
> > (vst1_u16_x3): Likewise.
> > (vst1_u32_x3): Likewise.
> > (vst1_f16_x3): Likewise.
> > (vst1_f32_x3): Likewise.
> > (vst1_p64_x3): Likewise.
> > (vst1q_s8_x3): Likewise.
> > (vst1q_p8_x3): Likewise.
> > (vst1q_s16_x3): Likewise.
> > (vst1q_p16_x3): Likewise.
> > (vst1q_s32_x3): Likewise.
> > (vst1q_s64_x3): Likewise.
> > (vst1q_u8_x3): Likewise.
> > (vst1q_u16_x3): Likewise.
> > (vst1q_u32_x3): Likewise.
> > (vst1q_u64_x3): Likewise.
> > (vst1q_f16_x3): Likewise.
> > (vst1q_f32_x3): Likewise.
> > (vst1q_f64_x3): Likewise.
> > (vst1q_p64_x3): Likewise.
> >
> > gcc/testsuite/Changelog:
> >
> > 2018-05-01 Sameera Deshpande <sameera.deshpande@linaro.org>
> >
> > * gcc.target/aarch64/advsimd-intrinsics/vld1x3.c: New test for
> > vld1x3 intrinsics for aarch64.
> > * gcc.target/aarch64/advsimd-intrinsics/vst1x2.c: New test for
> > vst1x2 intrinsics for aarch64.
> > * gcc.target/aarch64/advsimd-intrinsics/vst1x3.c: New test for
> > vst1x3 intrinsics for aarch64.
>
> Thanks James!