This is the mail archive of the mailing list for the GCC project.

[GCC RFC]A new and simple pass merging paired load store instructions

Targets like ARM and AArch64 support double-word load/store instructions,
and these instructions are generally faster than the corresponding two
single loads/stores.  GCC currently uses peephole2 to merge paired
loads/stores into a single instruction, which has a disadvantage: it can
only handle simple cases in which the two instructions appear consecutively
in the instruction stream, and it is too weak to handle cases in which the
two loads/stores are separated by other, unrelated instructions.

Here is a new GCC pass that looks through each basic block and merges
paired loads/stores even if they are not adjacent to each other.  The
algorithm is pretty simple:
1) In an initialization pass over the instruction stream, it collects the
relevant memory access information for each instruction.
2) It iterates over each basic block and tries to find a possible partner
instruction for each memory access instruction.  While doing so, it checks
dependencies between the two candidate instructions and also records
information on how to pair them.  To avoid quadratic behavior, it
introduces a new parameter, max-merge-paired-loadstore-distance, and sets
its default value to 4, which is large enough to catch the major part of
the opportunities on ARM/cortex-a15.
3) For each candidate pair, it calls the back end's hook to do
target-dependent checks and merges the two instructions if possible.
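
To make the three steps concrete, here is a minimal, self-contained sketch
in C.  It is not the code from the attached patch: struct insn, can_pair,
find_pairs, accept_all and WORD are hypothetical names invented for
illustration, the instruction model is far simpler than GCC's RTL, and the
"distance" is assumed to be the number of instruction positions between the
two accesses.  The dependency check is deliberately conservative.

```c
#include <assert.h>
#include <stdbool.h>

#define WORD 4  /* assumed size in bytes of one single-word access */

enum kind { OTHER, LOAD, STORE };

/* Hypothetical, simplified instruction record (step 1 would fill these in).  */
struct insn {
  enum kind kind;
  int base;    /* LOAD/STORE: base register */
  int offset;  /* LOAD/STORE: byte offset from the base register */
  int data;    /* LOAD/STORE: register loaded or stored */
  int def;     /* OTHER: register written, or -1 */
  int use;     /* OTHER: register read, or -1 */
};

/* Stand-in for the back end's hook of step 3.  */
typedef bool (*target_hook) (const struct insn *, const struct insn *);

static bool accept_all (const struct insn *a, const struct insn *b)
{
  (void) a; (void) b;
  return true;
}

static bool touches (const struct insn *in, int reg)
{
  return in->def == reg || in->use == reg;
}

/* Step 2: can v[i] and v[j] (i < j) be paired?  Conservative: any memory
   access in between, or any instruction in between that reads or writes a
   register involved in the pair, blocks the merge.  */
static bool can_pair (const struct insn *v, int i, int j)
{
  const struct insn *a = &v[i], *b = &v[j];
  if (a->kind == OTHER || a->kind != b->kind || a->base != b->base)
    return false;
  int delta = b->offset - a->offset;
  if (delta != WORD && delta != -WORD)
    return false;
  /* Two loads must not write the same register or clobber the base.  */
  if (a->kind == LOAD && (a->data == b->data || a->data == a->base))
    return false;
  for (int k = i + 1; k < j; k++)
    if (v[k].kind != OTHER
        || touches (&v[k], a->base)
        || touches (&v[k], a->data)
        || touches (&v[k], b->data))
      return false;
  return true;
}

/* Scan one basic block of N instructions.  partner[i] receives the index
   of the instruction paired with i, or -1.  Looking at most WINDOW
   positions ahead keeps the search linear in N for a fixed WINDOW.
   Returns the number of pairs found.  */
static int find_pairs (const struct insn *v, int n, int window,
                       target_hook hook, int *partner)
{
  assert (window >= 1);
  int count = 0;
  for (int i = 0; i < n; i++)
    partner[i] = -1;
  for (int i = 0; i < n; i++)
    {
      if (partner[i] != -1)
        continue;
      for (int j = i + 1; j < n && j - i <= window; j++)
        if (partner[j] == -1 && can_pair (v, i, j) && hook (&v[i], &v[j]))
          {
            partner[i] = j;
            partner[j] = i;
            count++;
            break;
          }
    }
  return count;
}
```

In this model, calling find_pairs with a window of 1 only considers
consecutive instructions, which resembles the adjacency limitation described
for peephole2 above, while a window of 4 also finds pairs separated by
unrelated instructions.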

Though the parameter is set to 4, on miscellaneous benchmarks this pass can
merge numerous opportunities in addition to the ones already merged by
peephole2 (roughly as many again as the peepholed ones).  GCC bootstrap
also confirms this finding.

There is still an open issue about when we should run this new pass.  Though
register renaming is disabled by default now, I put this pass after it,
because renaming can resolve some false dependencies and thus benefit this
pass.  Another finding is that the pass can capture a lot more opportunities
if it runs after sched2, but I am not sure whether it would mess up the
scheduling results that way.

So, any comments about this?


2014-05-15  Bin Cheng  <>
	* common.opt (flag_merge_paired_loadstore): New option.
	* merge-paired-loadstore.c: New file.
	* Support new file.
	* config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
	(load_latency_expanded_p, arm_merge_paired_loadstore): New function.
	* doc/invoke.texi (-fmerge-paired-loadstore): New.
	(max-merge-paired-loadstore-distance): New.
	* doc/tm.texi: Regenerated.
	* target.def (merge_paired_loadstore): New.
	* tree-pass.h (make_pass_merge_paired_loadstore): New decl.
	* passes.def (pass_merge_paired_loadstore): New pass.
	* timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.

2014-05-15  Bin Cheng  <>

	* New test.

Attachment: merge-paired-loadstore-20140515.txt
Description: Text document
