[PATCH][AArch64] Expand DImode constant stores to two SImode stores when profitable

Kyrill Tkachov kyrylo.tkachov@foss.arm.com
Mon Oct 31 14:26:00 GMT 2016


On 31/10/16 13:42, Richard Earnshaw (lists) wrote:
> On 31/10/16 11:54, Kyrill Tkachov wrote:
>> On 24/10/16 17:15, Andrew Pinski wrote:
>>> On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov
>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>> Hi all,
>>>>
>>>> When storing a 64-bit immediate that has equal bottom and top halves we
>>>> currently
>>>> synthesize the repeating 32-bit pattern twice and perform a single
>>>> X-store.
>>>> With this patch we synthesize the 32-bit pattern once into a W
>>>> register and
>>>> store
>>>> that twice using an STP. This reduces codesize bloat from
>>>> synthesising the
>>>> same
>>>> constant multiple times at the expense of converting a store to a
>>>> store-pair.
>>>> It will only trigger if we can save two or more instructions, so it will
>>>> only transform:
>>>>           mov     x1, 49370
>>>>           movk    x1, 0xc0da, lsl 32
>>>>           str     x1, [x0]
>>>>
>>>> into:
>>>>
>>>>           mov     w1, 49370
>>>>           stp     w1, w1, [x0]
>>>>
>>>> when optimising for -Os, whereas it will always transform a 4-insn
>>>> synthesis
>>>> sequence into a two-insn sequence + STP (see comments in the patch).
>>>>
>>>> This patch triggers already but will trigger more with the store merging
>>>> pass
>>>> that I'm working on since that will generate more of these repeating
>>>> 64-bit
>>>> constants.
>>>> This helps improve codegen on 456.hmmer where store merging can
>>>> sometimes
>>>> create very
>>>> complex repeating constants and target-specific expand needs to break
>>>> them
>>>> down.
>>> Doing STP might be worse on ThunderX 1 than the mov/movk.  Or this
>>> might cause an ICE with -mcpu=thunderx; I can't remember if the check
>>> for slow unaligned store pair word is with the pattern or not.
>> I can't get it to ICE with -mcpu=thunderx.
>> The restriction is just on the STP forming code in the sched-fusion
>> hooks AFAIK.
>>
>>> Basically the rule is
>>> 1) if 4 byte aligned, then it is better to do two str.
>>> 2) If 8 byte aligned, then doing stp is good
>>> 3) Otherwise it is better to do two str.
>> Ok, then I'll make the function just emit two stores and depend on the
>> sched-fusion
>> machinery to fuse them into an STP when appropriate since that has the
>> logic that
>> takes thunderx into account.
> If the mode is DImode (ie the pattern is 'movdi', then surely we must
> have a 64-bit aligned store.

I don't think that's guaranteed. At least the Standard Names documentation
doesn't mention it. I've gone through an example with gdb where store merging
produces an unaligned store and the gen_movdi expander is called with a source
of (const_int 8589934593 [0x200000001]) and a destination of:
(mem:DI (reg/v/f:DI 73 [ a ]) [1 MEM[(int *)a_2(D)]+0 S8 A32])

i.e. a 32-bit aligned DImode MEM.

Thanks,
Kyrill

> R.
>
>> Thanks for the info.
>> Kyrill
>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>>
>>>> Ok for trunk?
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>> 2016-10-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       * config/aarch64/aarch64.md (mov<mode>): Call
>>>>       aarch64_split_dimode_const_store on DImode constant stores.
>>>>       * config/aarch64/aarch64-protos.h
>>>> (aarch64_split_dimode_const_store):
>>>>       New prototype.
>>>>       * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>>>>       function.
>>>>
>>>> 2016-10-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>>>>       * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.



More information about the Gcc-patches mailing list