[PATCH][AArch64] Expand DImode constant stores to two SImode stores when profitable

Kyrill Tkachov kyrylo.tkachov@foss.arm.com
Thu Nov 10 09:04:00 GMT 2016


Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00040.html

Andrew, do you have any objections to this version?
Thanks,
Kyrill

On 01/11/16 15:21, Kyrill Tkachov wrote:
>
> On 31/10/16 11:54, Kyrill Tkachov wrote:
>>
>> On 24/10/16 17:15, Andrew Pinski wrote:
>>> On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov
>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>> Hi all,
>>>>
>>>> When storing a 64-bit immediate that has equal bottom and top halves we
>>>> currently synthesize the repeating 32-bit pattern twice and perform a
>>>> single X-store. With this patch we synthesize the 32-bit pattern once
>>>> into a W register and store that twice using an STP. This reduces
>>>> codesize bloat from synthesising the same constant multiple times at
>>>> the expense of converting a store to a store-pair.
>>>> It will only trigger if we can save two or more instructions, so it will
>>>> only transform:
>>>>          mov     x1, 49370
>>>>          movk    x1, 0xc0da, lsl 32
>>>>          str     x1, [x0]
>>>>
>>>> into:
>>>>
>>>>          mov     w1, 49370
>>>>          stp     w1, w1, [x0]
>>>>
>>>> when optimising for -Os, whereas it will always transform a 4-insn synthesis
>>>> sequence into a two-insn sequence + STP (see comments in the patch).
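
[Illustration, not part of the patch] The "equal bottom and top halves"
condition described above can be sketched in C. This is a minimal sketch
under stated assumptions: has_repeating_halves and naive_mov_count are
hypothetical names, and the count simply assumes one MOV/MOVK per non-zero
16-bit chunk. GCC's real immediate synthesis considers more encodings, and
the actual profitability check lives in the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the low and high 32-bit halves of VAL
   are identical, e.g. 0x0000c0da0000c0da.  */
static bool has_repeating_halves(uint64_t val)
{
    return (uint32_t)val == (uint32_t)(val >> 32);
}

/* Hypothetical, simplified cost model: one MOV/MOVK per non-zero
   16-bit chunk in the low BITS bits of VAL, with a minimum of one
   instruction.  This is an assumption for illustration only.  */
static int naive_mov_count(uint64_t val, int bits)
{
    int count = 0;
    for (int shift = 0; shift < bits; shift += 16)
        if ((val >> shift) & 0xffff)
            count++;
    return count ? count : 1;
}
```

For the constant in the example above, the 64-bit value needs two chunks
under this model while its 32-bit half needs one.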
>>>>
>>>> This patch triggers already but will trigger more with the store merging
>>>> pass that I'm working on since that will generate more of these
>>>> repeating 64-bit constants.
>>>> This helps improve codegen on 456.hmmer where store merging can
>>>> sometimes create very complex repeating constants and target-specific
>>>> expand needs to break them down.
>>>
>>> Doing STP might be worse on ThunderX 1 than the mov/movk.  Or this
>>> might cause an ICE with -mcpu=thunderx; I can't remember if the check
>>> for slow unaligned store pair word is with the pattern or not.
>>
>> I can't get it to ICE with -mcpu=thunderx.
>> The restriction is just on the STP forming code in the sched-fusion hooks AFAIK.
>>
>>> Basically the rule is
>>> 1) If 4 byte aligned, then it is better to do two str.
>>> 2) If 8 byte aligned, then doing stp is good
>>> 3) Otherwise it is better to do two str.
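
[Illustration, not part of the patch] The three-case rule above reduces to
a single test on the known alignment of the destination. The sketch below
only restates that rule in C. The names store_choice and choose_store are
hypothetical, the alignment is assumed to be known in bytes (0 meaning
unknown), and none of this is the GCC sched-fusion code.

```c
/* Hypothetical outcome of the rule: two separate STRs or one STP.  */
enum store_choice { TWO_STR, ONE_STP };

/* Restates the rule as given in the thread for ThunderX:
   8-byte-aligned destinations prefer STP, everything else
   (4-byte aligned, less aligned, or unknown) prefers two STRs.
   ALIGN_BYTES is the known alignment in bytes; 0 means unknown.  */
static enum store_choice choose_store(unsigned align_bytes)
{
    if (align_bytes != 0 && align_bytes % 8 == 0)
        return ONE_STP;
    return TWO_STR;
}
```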
>>
>> Ok, then I'll make the function just emit two stores and depend on the sched-fusion
>> machinery to fuse them into an STP when appropriate since that has the logic that
>> takes thunderx into account.
>>
>
> Here it is.
> I've confirmed that it emits two STRs for 4 byte aligned stores when -mtune=thunderx
> and still generates STP for the other tunings, though now sched-fusion is responsible for
> merging them, which is ok by me.
>
> Bootstrapped and tested on aarch64.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
>
> 2016-11-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * config/aarch64/aarch64.md (mov<mode>): Call
>     aarch64_split_dimode_const_store on DImode constant stores.
>     * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
>     New prototype.
>     * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>     function.
>
> 2016-11-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>     * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
>
>>
>>
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>>
>>>> Ok for trunk?
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>> 2016-10-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>      * config/aarch64/aarch64.md (mov<mode>): Call
>>>>      aarch64_split_dimode_const_store on DImode constant stores.
>>>>      * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
>>>>      New prototype.
>>>>      * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>>>>      function.
>>>>
>>>> 2016-10-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>      * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>>>>      * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
>>
>


