This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch AArch64] Simplify reduc_plus_scal_v2[sd]f sequence
- From: Marcus Shawcroft <marcus dot shawcroft at gmail dot com>
- To: James Greenhalgh <james dot greenhalgh at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 17 May 2016 14:36:29 +0100
- Subject: Re: [Patch AArch64] Simplify reduc_plus_scal_v2[sd]f sequence
- Authentication-results: sourceware.org; auth=none
- References: <1463476002-513-1-git-send-email-james dot greenhalgh at arm dot com> <CAFqB+Px3moXAhPPLvWQGdj=wJsAszHsP2UEWHEZZwCsEAy=vDA at mail dot gmail dot com> <20160517110244 dot GA3885 at arm dot com>
On 17 May 2016 at 12:02, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> On Tue, May 17, 2016 at 11:32:36AM +0100, Marcus Shawcroft wrote:
>> On 17 May 2016 at 10:06, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>> >
>> > Hi,
>> >
>> > This is just a simplification. It probably makes life easier for register
>> > allocation in some corner cases, and it seems the right thing to do. We don't
>> > use the internal version elsewhere, so we're safe to delete it and change
>> > the types.
>> >
>> > OK?
>> >
>> > Bootstrapped on AArch64 with no issues.
>>
>> Help me understand why this is OK for BE?
>
> The reduc_plus_scal_<mode> pattern wants to take a vector and return a scalar
> value representing the sum of the lanes of that vector. We want to go
> from V2DFmode to DFmode.
>
> The architectural instruction FADDP writes to a scalar value in the low
> bits of the register, leaving zeroes in the upper bits.
>
> i.e.
>
> faddp d0, v1.2d
>
>   128             64                      0
>    |      0x0      |   v1.d[0] + v1.d[1]  |
>
> In the current implementation, we use the
> aarch64_reduc_plus_internal<mode> pattern, which treats the result of
> FADDP as a vector of two elements. We then need an extra step to extract
> the correct scalar value from that vector. From GCC's point of view the lane
> containing the result is either lane 0 (little-endian) or lane 1
> (big-endian), which is why the current code is endian dependent. The extract
> operation will always be a NOP move from architectural bits 0-63 to
> architectural bits 0-63 - but we never elide the move as future passes can't
> be certain that the upper bits are zero (they come out of an UNSPEC so
> could be anything).
>
> However, this is all unnecessary. FADDP does exactly what we want,
> regardless of endianness; we just need to model the instruction as writing
> the scalar value in the first place, which is what this patch wires up.
>
> We probably just missed this optimization in the migration from the
> reduc_splus optabs (which required a vector return value) to the
> reduc_plus_scal optabs (which require a scalar return value).
>
> Does that help?
Yep. Thanks. OK to commit.
/Marcus
> Thanks,
> James
>
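
For readers of the archive, the lane-numbering argument quoted above can be sketched as a small C model. This is only an illustration, not GCC code: the type `v2df_reg` and the functions `arch_lane`, `faddp_model`, `reduc_plus_via_vector` and `reduc_plus_scalar` are hypothetical names, and the lane mapping assumes that big-endian GCC lane numbers are simply the reverse of the architectural ones, which matches the two-lane case described in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a 128-bit register holding two doubles, indexed by
   ARCHITECTURAL lane: arch[0] is bits 0-63, arch[1] is bits 64-127.  */
typedef struct { double arch[2]; } v2df_reg;

/* Assumed mapping from a GCC lane number to an architectural lane:
   identical on little-endian, reversed on big-endian.  */
static int arch_lane(bool big_endian, int gcc_lane, int nunits)
{
    assert(gcc_lane >= 0 && gcc_lane < nunits);
    return big_endian ? nunits - 1 - gcc_lane : gcc_lane;
}

/* Model of `faddp d0, v1.2d`: the sum lands in bits 0-63 and
   bits 64-127 are zero, whatever the endianness.  */
static v2df_reg faddp_model(v2df_reg v)
{
    v2df_reg r = { { v.arch[0] + v.arch[1], 0.0 } };
    return r;
}

/* Old modelling: treat the FADDP result as a two-element vector, then
   extract the GCC lane that holds the sum (lane 0 on LE, lane 1 on BE).  */
static double reduc_plus_via_vector(v2df_reg v, bool big_endian)
{
    v2df_reg r = faddp_model(v);
    int gcc_lane = big_endian ? 1 : 0;
    /* Both choices resolve to architectural lane 0, so the extract is a
       NOP move in hardware but still endian dependent in the model.  */
    return r.arch[arch_lane(big_endian, gcc_lane, 2)];
}

/* New modelling: FADDP is described as producing the scalar directly,
   so no lane number (and no endianness) is involved.  */
static double reduc_plus_scalar(v2df_reg v)
{
    return faddp_model(v).arch[0];
}
```

Calling `reduc_plus_via_vector` with either endianness and calling `reduc_plus_scalar` give the same value for the same input; the difference is only that the scalar form never has to name a lane.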