This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [SLP/AArch64] Fix unpack handling for big-endian SVE

From: Richard Sandiford <richard dot sandiford at linaro dot org>
To: Richard Biener <richard dot guenther at gmail dot com>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Earnshaw <richard dot earnshaw at arm dot com>, james dot greenhalgh at arm dot com, Marcus Shawcroft <marcus dot shawcroft at arm dot com>
Date: Fri, 26 Jan 2018 15:04:36 +0000
Subject: Re: [SLP/AArch64] Fix unpack handling for big-endian SVE
Authentication-results: sourceware.org; auth=none
References: <87h8r8og4o.fsf@linaro.org> <CAFiYyc2r7=762xSQpxf9yY=Cr9BTVYLeSR8Me91feXV5_kHXag@mail.gmail.com>

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jan 26, 2018 at 3:21 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks
>> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered
>> lanes.  This meant that both the SVE patterns and the handling of
>> fully-masked loops were wrong.
>>
>> The patch deals with that by making sure that all vec_unpack* optabs
>> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate
>> define_insn.  This in turn meant that we can get rid of the duplication
>> between the signed and unsigned patterns for predicates.  (We provide
>> implementations of both the signed and unsigned optabs because the sign
>> doesn't matter for predicates: every element contains only one
>> significant bit.)
>>
>> Also, the float unpacks need to unpack one half of the input vector,
>> but the unpacked upper bits are "don't care".  There are two obvious
>> ways of handling that: use an unpack (filling with zeros) or use a ZIP
>> (filling with a duplicate of the low bits).  The code previously used
>> unpacks, but the sequence involved a subreg that is semantically an
>> element reverse on big-endian targets.  Using the ZIP patterns avoids
>> that, and at the moment there's no reason to prefer one over the other
>> for performance reasons, so the patch switches to ZIP unconditionally.
>> As the comment says, it would be easy to optimise this later if UUNPK
>> turns out to be better for some implementations.
>>
>> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.
>> I think the tree-vect-loop-manip.c part is obvious, but OK for the
>> AArch64 bits?
>>
>> Richard
>>
>>
>> 2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>
>>
>> gcc/
>>         * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):
>>         Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR
>>         for big-endian.
>>         * config/aarch64/iterators.md (hi_lanes_optab): New int attribute.
>>         * config/aarch64/aarch64-sve.md
>>         (*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to...
>>         (aarch64_sve_<perm_insn><perm_hilo><mode>): ...this.
>>         (*extend<mode><Vwide>2): Rename to...
>>         (aarch64_sve_extend<mode><Vwide>2): ...this.
>>         (vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand,
>>         renaming the old pattern to...
>>         (aarch64_sve_punpk<perm_hilo>_<mode>): ...this.  Only define
>>         unsigned packs.
>>         (vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a
>>         define_expand, renaming the old pattern to...
>>         (aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this.
>>         (*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete.
>>         (vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into
>>         account when deciding which SVE instruction the optab should use.
>>         (vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise.
>>
>> gcc/testsuite/
>>         * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather
>>         than unpacks.
>>         * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
>>         * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
>>
>> Index: gcc/tree-vect-loop-manip.c
>> ===================================================================
>> --- gcc/tree-vect-loop-manip.c  2018-01-13 18:02:00.948360274 +0000
>> +++ gcc/tree-vect-loop-manip.c  2018-01-26 14:00:15.717916957 +0000
>> @@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se
>>         {
>>           tree src = src_rgm->masks[i / 2];
>>           tree dest = dest_rgm->masks[i];
>> -         tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR
>> +         tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)
>> +                           ? VEC_UNPACK_HI_EXPR
>>                             : VEC_UNPACK_LO_EXPR);
>
> This looks broken anyways -- the element oder on GIMPLE is the same
> for little and big-endian.  I see two other BYTES_BIG_ENDIAN uses in
> tree-vect-*, not sure when they crept in but they are all broken.

Are you sure?  Everything apart from this new code seems consistently
to treat UNPACK_HI as "high part of the register" rather than
"high memory address/lane number".  I think changing it now would
break powerpc64 and mips.

Thanks,
Richard

Follow-Ups:
- Re: [SLP/AArch64] Fix unpack handling for big-endian SVE
  - From: Richard Biener

References:
- [SLP/AArch64] Fix unpack handling for big-endian SVE
  - From: Richard Sandiford
- Re: [SLP/AArch64] Fix unpack handling for big-endian SVE
  - From: Richard Biener

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]