This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [AArch64_be] Fix vtbl[34] and vtbx4
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Christophe Lyon <christophe dot lyon at linaro dot org>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 8 Oct 2015 10:12:30 +0100
- Subject: Re: [AArch64_be] Fix vtbl[34] and vtbx4
- References: <CAKdteObt_dP63aqn3eH6mHiK5zXP+Y_rL+DfN55D=WfK_4cVGw at mail dot gmail dot com> <20151007150941 dot GA31205 at arm dot com> <CAKdteOYBU7y-z0J5d9ijU+O=DZPkLTPjjiRyhD8ywHoa4K5QPw at mail dot gmail dot com>
On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
> On 7 October 2015 at 17:09, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
> >
> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
> > directly (as in the inline asm versions you replace)?
> >
> I just followed the pattern used for vtbx3.
>
> > This sequence does make sense for vtbx3.
> In fact, I don't see why vtbx3 and vtbx4 should be different?
The difference between TBL and TBX is in their handling of a request to
select an out-of-range value. For TBL this returns zero; for TBX this
returns the value which was already in the destination register.
Because the byte-vectors used by the TBX instruction in AArch64 are 128-bit
(so two of them together allow selecting elements in the range 0..31), and
vtbx3 needs to emulate the AArch32 behaviour of picking elements from 3x64-bit
vectors (allowing elements in the range 0..23), we need to manually check for
values which would have been out-of-range on AArch32 but are in range for
AArch64, and handle them appropriately. For vtbx4, on the other hand,
2x128-bit registers give the range 0..31 and 4x64-bit registers give
the range 0..31, so we don't need the special masked handling.
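To make the distinction concrete, here is a rough scalar model in plain C.
It is only a sketch of the semantics described above, not the code in
arm_neon.h or the patch: the model_* function names are made up for
illustration, and zero-padding the 24-byte table out to 32 bytes is an
assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of TBL: an out-of-range index yields zero. */
static void model_tbl(uint8_t *dst, const uint8_t *table, size_t table_len,
                      const uint8_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = idx[i] < table_len ? table[idx[i]] : 0;
}

/* Scalar model of TBX: an out-of-range index leaves dst[i] untouched. */
static void model_tbx(uint8_t *dst, const uint8_t *table, size_t table_len,
                      const uint8_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (idx[i] < table_len)
            dst[i] = table[idx[i]];
}

/* vtbx3 model: the 3x64-bit table (24 bytes) sits in 2x128-bit registers
   (32 bytes), so indices 24..31 are in range for the hardware lookup but
   must behave as out-of-range.  Look up with TBL, then keep the old
   destination byte wherever idx >= 24 (the "masked handling").  */
static void model_vtbx3(uint8_t dst[8], const uint8_t table24[24],
                        const uint8_t idx[8])
{
    uint8_t padded[32] = {0};   /* assumption: upper 8 bytes are zero */
    uint8_t looked_up[8];

    memcpy(padded, table24, 24);
    model_tbl(looked_up, padded, sizeof padded, idx, 8);
    for (size_t i = 0; i < 8; i++)
        if (idx[i] < 24)        /* in range on AArch32 */
            dst[i] = looked_up[i];
}

/* vtbx4 model: 4x64 bits is exactly 2x128 bits, so the in-range set is
   0..31 either way and a plain TBX already has the right behaviour.  */
static void model_vtbx4(uint8_t dst[8], const uint8_t table32[32],
                        const uint8_t idx[8])
{
    model_tbx(dst, table32, 32, idx, 8);
}
```

With an index of 24, model_vtbx3 keeps the byte already in the destination,
whereas model_vtbx4 replaces it with table byte 24; for indices of 32 and
above both leave the destination alone.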
You can find the suggested instruction sequences for the Neon intrinsics
in this document:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
> >> /* vtrn */
> >>
> >> __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
> >> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> >> index b8a45d1..dfbd9cd 100644
> >> --- a/gcc/config/aarch64/iterators.md
> >> +++ b/gcc/config/aarch64/iterators.md
> >> @@ -100,6 +100,8 @@
> >> ;; All modes.
> >> (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
> >>
> >> +(define_mode_iterator V8Q [V8QI])
> >> +
> >
> > This can be dropped if you use VAR1 in aarch64-builtins.c.
> >
> > Thanks for working on this, with your patch applied, the only
> > remaining intrinsics I see failing for aarch64_be are:
> >
> > vqtbl2_*8
> > vqtbl2q_*8
> > vqtbl3_*8
> > vqtbl3q_*8
> > vqtbl4_*8
> > vqtbl4q_*8
> >
> > vqtbx2_*8
> > vqtbx2q_*8
> > vqtbx3_*8
> > vqtbx3q_*8
> > vqtbx4_*8
> > vqtbx4q_*8
> >
> Quite possibly. Which tests are you looking at? Since these are
> aarch64-specific, they are not part of the
> tests I added (advsimd-intrinsics). Do you mean
> gcc.target/aarch64/table-intrinsics.c?
Sorry, yes I should have given a reference. I'm running with a variant of
a testcase from the LLVM test-suite repository:
SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c
This has an execute test for most of the intrinsics specified for AArch64.
It needs some modification to cover the intrinsics we don't implement yet.
Thanks,
James