This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Ping. https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00040.html Andrew, do you have any objections to this version? Thanks, Kyrill On 01/11/16 15:21, Kyrill Tkachov wrote:
On 31/10/16 11:54, Kyrill Tkachov wrote:On 24/10/16 17:15, Andrew Pinski wrote:On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:Hi all, When storing a 64-bit immediate that has equal bottom and top halves we currently synthesize the repeating 32-bit pattern twice and perform a single X-store. With this patch we synthesize the 32-bit pattern once into a W register and store that twice using an STP. This reduces codesize bloat from synthesising the same constant multiple times at the expense of converting a store to a store-pair. It will only trigger if we can save two or more instructions, so it will only transform: mov x1, 49370 movk x1, 0xc0da, lsl 32 str x1, [x0] into: mov w1, 49370 stp w1, w1, [x0] when optimising for -Os, whereas it will always transform a 4-insn synthesis sequence into a two-insn sequence + STP (see comments in the patch). This patch triggers already but will trigger more with the store merging pass that I'm working on since that will generate more of these repeating 64-bit constants. This helps improve codegen on 456.hmmer where store merging can sometimes create very complex repeating constants and target-specific expand needs to break them down.Doing STP might be worse on ThunderX 1 than the mov/movk. Or this might cause an ICE with -mcpu=thunderx; I can't remember if the check for slow unaligned store pair word is with the pattern or not.I can't get it to ICE with -mcpu=thunderx. The restriction is just on the STP forming code in the sched-fusion hooks AFAIK.Basically the rule is 1) if 4 byte aligned, then it is better to do two str. 2) If 8 byte aligned, then doing stp is good 3) Otherwise it is better to do two str.Ok, then I'll make the function just emit two stores and depend on the sched-fusion machinery to fuse them into an STP when appropriate since that has the logic that takes thunderx into account.Here it is. I've confirmed that it emits to STRs for 4 byte aligned stores when -mtune=thunderx and still generates STP for the other tunings, though now sched-fusion is responsible for merging them, which is ok by me. Bootstrapped and tested on aarch64. Ok for trunk? Thanks, Kyril 2016-11-01 Kyrylo Tkachov <kyrylo.tkachov@arm.com> * config/aarch64/aarch64.md (mov<mode>): Call aarch64_split_dimode_const_store on DImode constant stores. * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store): New prototype. * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New function. 2016-11-01 Kyrylo Tkachov <kyrylo.tkachov@arm.com> * gcc.target/aarch64/store_repeating_constant_1.c: New test. * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.Thanks, AndrewBootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill 2016-10-24 Kyrylo Tkachov <kyrylo.tkachov@arm.com> * config/aarch64/aarch64.md (mov<mode>): Call aarch64_split_dimode_const_store on DImode constant stores. * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store): New prototype. * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New function. 2016-10-24 Kyrylo Tkachov <kyrylo.tkachov@arm.com> * gcc.target/aarch64/store_repeating_constant_1.c: New test. * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |