This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [AArch64_be] Fix vtbl[34] and vtbx4
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Christophe Lyon <christophe dot lyon at linaro dot org>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 12 Oct 2015 14:30:23 +0100
- Subject: Re: [AArch64_be] Fix vtbl[34] and vtbx4
- References: <CAKdteObt_dP63aqn3eH6mHiK5zXP+Y_rL+DfN55D=WfK_4cVGw at mail dot gmail dot com> <20151007150941 dot GA31205 at arm dot com> <CAKdteOYBU7y-z0J5d9ijU+O=DZPkLTPjjiRyhD8ywHoa4K5QPw at mail dot gmail dot com> <20151008091230 dot GA13098 at arm dot com> <CAKdteOawhToG=aw7sYYvHva4EiW46a2EDWiA9hW8GtbAqmNRkQ at mail dot gmail dot com>
On Fri, Oct 09, 2015 at 05:16:05PM +0100, Christophe Lyon wrote:
> On 8 October 2015 at 11:12, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> > On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
> >> On 7 October 2015 at 17:09, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> >> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
> >> >
> >> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
> >> > directly (as in the inline asm versions you replace)?
> >> >
> >> I just followed the pattern used for vtbx3.
> >>
> >> > This sequence does make sense for vtbx3.
> >> In fact, I don't see why vtbx3 and vtbx4 should be different?
> >
> > The difference between TBL and TBX is in their handling of a request to
> > select an out-of-range value. For TBL this returns zero, for TBX this
> > returns the value which was already in the destination register.
> >
> > Because the byte-vectors used by the TBX instruction in AArch64 are 128-bit
> > (so two of them together allow selecting elements in the range 0-31), and
> > vtbx3 needs to emulate the AArch32 behaviour of picking elements from 3x64-bit
> > vectors (allowing elements in the range 0-23), we need to manually check for
> > values which would have been out-of-range on AArch32, but are not out
> > of range for AArch64 and handle them appropriately. For vtbx4 on the other
> > hand, 2x128-bit registers give the range 0..31 and 4x64-bit registers give
> > the range 0..31, so we don't need the special masked handling.
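
[Illustration: the range argument above can be modelled in plain scalar C,
without any Neon intrinsics. This is only a sketch of the semantics, not the
code in arm_neon.h or in the patch. All function names (model_tbl_32,
model_tbx_32, model_vtbx3, model_vtbx4) are hypothetical, and the 32-byte
tables stand in for two 128-bit registers.]

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of TBL over a 32-byte table (two 128-bit registers):
   an out-of-range index yields zero. */
static void model_tbl_32(uint8_t out[8], const uint8_t table[32],
                         const uint8_t idx[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (idx[i] < 32) ? table[idx[i]] : 0;
}

/* Scalar model of TBX over the same table: an out-of-range index
   leaves the destination byte unchanged. */
static void model_tbx_32(uint8_t dest[8], const uint8_t table[32],
                         const uint8_t idx[8])
{
    for (int i = 0; i < 8; i++)
        if (idx[i] < 32)
            dest[i] = table[idx[i]];
}

/* vtbx3: the AArch32 table is 3x64-bit = 24 bytes, so indices 24..31
   must behave as out-of-range even though a 32-byte lookup would
   accept them.  Hence the explicit check (a mask and select in a
   vector implementation). */
static void model_vtbx3(uint8_t dest[8], const uint8_t table24[24],
                        const uint8_t idx[8])
{
    uint8_t padded[32] = {0};   /* 24 table bytes plus 8 bytes of padding */
    uint8_t looked_up[8];

    memcpy(padded, table24, 24);
    model_tbl_32(looked_up, padded, idx);
    for (int i = 0; i < 8; i++)
        if (idx[i] < 24)        /* keep dest for 24..255 */
            dest[i] = looked_up[i];
}

/* vtbx4: 4x64-bit = 32 bytes, the same range as 2x128-bit, so a plain
   TBX already has the required behaviour. */
static void model_vtbx4(uint8_t dest[8], const uint8_t table32[32],
                        const uint8_t idx[8])
{
    model_tbx_32(dest, table32, idx);
}
```

[With an index of 24, model_vtbx3 leaves the destination byte alone, while
model_vtbx4 fetches table byte 24; for indices of 32 and above both leave the
destination untouched.]
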
> >
> > You can find the suggested instruction sequences for the Neon intrinsics
> > in this document:
> >
> > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
> >
>
> Hi James,
>
> Please find attached an updated version which hopefully addresses your comments.
> Tested on aarch64-none-elf and aarch64_be-none-elf using the Foundation Model.
>
> OK?
Looks good to me,
Thanks,
James