This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [Patch AArch64] Simplify reduc_plus_scal_v2[sd]f sequence


On 17 May 2016 at 12:02, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> On Tue, May 17, 2016 at 11:32:36AM +0100, Marcus Shawcroft wrote:
>> On 17 May 2016 at 10:06, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>> >
>> > Hi,
>> >
>> > This is just a simplification; it probably makes life easier for register
>> > allocation in some corner cases and seems the right thing to do. We don't
>> > use the internal version elsewhere, so we're safe to delete it and change
>> > the types.
>> >
>> > OK?
>> >
>> > Bootstrapped on AArch64 with no issues.
>>
>> Help me understand why this is OK for BE?
>
> The reduc_plus_scal_<mode> pattern wants to take a vector and return a scalar
> value representing the sum of the lanes of that vector. We want to go
> from V2DFmode to DFmode.
>
> The architectural instruction FADDP writes to a scalar value in the low
> bits of the register, leaving zeroes in the upper bits.
>
> i.e.
>
>         faddp  d0, v1.2d
>
> 128                 64                    0
>  |    0x0            | v1.d[0] + v1.d[1]  |
>
> In the current implementation, we use the
> aarch64_reduc_plus_internal<mode> pattern, which treats the result of
> FADDP as a vector of two elements. We then need an extra step to extract
> the correct scalar value from that vector. From GCC's point of view the lane
> containing the result is either lane 0 (little-endian) or lane 1
> (big-endian), which is why the current code is endian dependent. The extract
> operation will always be a NOP move from architectural bits 0-63 to
> architectural bits 0-63 - but we never elide the move as future passes can't
> be certain that the upper bits are zero (they come out of an UNSPEC so
> could be anything).
>
> However, this is all unnecessary. FADDP does exactly what we want,
> regardless of endianness; we just need to model the instruction as writing
> the scalar value in the first place, which is what this patch wires up.
>
> We probably just missed this optimization in the migration from the
> reduc_splus optabs (which required a vector return value) to the
> reduc_plus_scal optabs (which require a scalar return value).
>
> Does that help?


Yep. Thanks. OK to commit. /Marcus

> Thanks,
> James
>
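
The endianness argument above can be illustrated with a small C model. This
is only a sketch, not code from the patch or from GCC: the vreg type and all
function names are hypothetical, and the lane mapping for a two-element vector
(GCC lane n corresponds to architectural element 1 - n on big-endian) is
assumed from the description in the message. It shows that extracting the
"endian-dependent" lane always reads architectural bits 0-63, so modelling
FADDP as writing a scalar gives the same value on both endiannesses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register: bits[0] holds
   architectural bits 0-63, bits[1] holds architectural bits 64-127.  */
typedef struct { uint64_t bits[2]; } vreg;

/* Bit-exact conversions between double and its 64-bit representation.  */
static uint64_t dbits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double bdbl(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Model of "faddp d0, v1.2d": the sum lands in bits 0-63 and the
   upper 64 bits are zeroed.  */
static vreg model_faddp(vreg src)
{
  vreg dst;
  dst.bits[0] = dbits(bdbl(src.bits[0]) + bdbl(src.bits[1]));
  dst.bits[1] = 0;
  return dst;
}

/* Assumed mapping from a GCC lane number to an architectural element
   index for a two-element vector.  */
static int arch_index_of_gcc_lane(int lane, int big_endian)
{
  return big_endian ? 2 - 1 - lane : lane;
}

/* Old scheme: treat the FADDP result as a vector, then extract the lane
   GCC believes holds the result (lane 0 on LE, lane 1 on BE).  */
static double reduc_via_extract(vreg src, int big_endian)
{
  vreg v = model_faddp(src);
  int lane = big_endian ? 1 : 0;
  return bdbl(v.bits[arch_index_of_gcc_lane(lane, big_endian)]);
}

/* New scheme: model FADDP as producing the scalar directly.  */
static double reduc_scalar(vreg src)
{
  return bdbl(model_faddp(src).bits[0]);
}

/* Small self-check: both schemes agree for either endianness.  */
static void faddp_model_self_check(void)
{
  vreg v = { { dbits(1.5), dbits(2.25) } };
  assert(reduc_via_extract(v, 0) == reduc_scalar(v));
  assert(reduc_via_extract(v, 1) == reduc_scalar(v));
}
```

In this model the lane extract is always a read of bits[0], which matches the
observation that the extract is a NOP move from architectural bits 0-63 to
architectural bits 0-63.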

