This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [SLP/AArch64] Fix unpack handling for big-endian SVE
- From: Richard Sandiford <richard dot sandiford at linaro dot org>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Earnshaw <richard dot earnshaw at arm dot com>, james dot greenhalgh at arm dot com, Marcus Shawcroft <marcus dot shawcroft at arm dot com>
- Date: Fri, 26 Jan 2018 15:04:36 +0000
- Subject: Re: [SLP/AArch64] Fix unpack handling for big-endian SVE
- Authentication-results: sourceware.org; auth=none
- References: <87h8r8og4o.fsf@linaro.org> <CAFiYyc2r7=762xSQpxf9yY=Cr9BTVYLeSR8Me91feXV5_kHXag@mail.gmail.com>
Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jan 26, 2018 at 3:21 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks
>> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered
>> lanes. This meant that both the SVE patterns and the handling of
>> fully-masked loops were wrong.
>>
>> The patch deals with that by making sure that all vec_unpack* optabs
>> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate
>> define_insn. This in turn meant that we can get rid of the duplication
>> between the signed and unsigned patterns for predicates. (We provide
>> implementations of both the signed and unsigned optabs because the sign
>> doesn't matter for predicates: every element contains only one
>> significant bit.)
>>
>> Also, the float unpacks need to unpack one half of the input vector,
>> but the unpacked upper bits are "don't care". There are two obvious
>> ways of handling that: use an unpack (filling with zeros) or use a ZIP
>> (filling with a duplicate of the low bits). The code previously used
>> unpacks, but the sequence involved a subreg that is semantically an
>> element reverse on big-endian targets. Using the ZIP patterns avoids
>> that, and at the moment there's no reason to prefer one over the other
>> for performance reasons, so the patch switches to ZIP unconditionally.
>> As the comment says, it would be easy to optimise this later if UUNPK
>> turns out to be better for some implementations.
>>
>> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.
>> I think the tree-vect-loop-manip.c part is obvious, but OK for the
>> AArch64 bits?
>>
>> Richard
>>
>>
>> 2018-01-26 Richard Sandiford <richard.sandiford@linaro.org>
>>
>> gcc/
>> * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):
>> Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR
>> for big-endian.
>> * config/aarch64/iterators.md (hi_lanes_optab): New int attribute.
>> * config/aarch64/aarch64-sve.md
>> (*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to...
>> (aarch64_sve_<perm_insn><perm_hilo><mode>): ...this.
>> (*extend<mode><Vwide>2): Rename to...
>> (aarch64_sve_extend<mode><Vwide>2): ...this.
>> (vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand,
>> renaming the old pattern to...
>> (aarch64_sve_punpk<perm_hilo>_<mode>): ...this. Only define
>> unsigned packs.
>> (vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a
>> define_expand, renaming the old pattern to...
>> (aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this.
>> (*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete.
>> (vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into
>> account when deciding which SVE instruction the optab should use.
>> (vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise.
>>
>> gcc/testsuite/
>> * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather
>> than unpacks.
>> * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
>> * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
>>
>> Index: gcc/tree-vect-loop-manip.c
>> ===================================================================
>> --- gcc/tree-vect-loop-manip.c 2018-01-13 18:02:00.948360274 +0000
>> +++ gcc/tree-vect-loop-manip.c 2018-01-26 14:00:15.717916957 +0000
>> @@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se
>> {
>> tree src = src_rgm->masks[i / 2];
>> tree dest = dest_rgm->masks[i];
>> - tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR
>> + tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)
>> + ? VEC_UNPACK_HI_EXPR
>> : VEC_UNPACK_LO_EXPR);
>
> This looks broken anyways -- the element oder on GIMPLE is the same
> for little and big-endian. I see two other BYTES_BIG_ENDIAN uses in
> tree-vect-*, not sure when they crept in but they are all broken.
Are you sure? Everything apart from this new code seems consistently
to treat UNPACK_HI as "high part of the register" rather than
"high memory address/lane number". I think changing it now would
break powerpc64 and mips.
Thanks,
Richard