This is the mail archive of the gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [GCC RFC]A new and simple pass merging paired load store instructions
- From: Oleg Endo <oleg dot endo at t-online dot de>
- To: "bin.cheng" <bin dot cheng at arm dot com>
- Cc: "<gcc-patches at gcc dot gnu dot org>" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 15 May 2014 12:31:14 +0200
- Subject: Re: [GCC RFC]A new and simple pass merging paired load store instructions
- Authentication-results: sourceware.org; auth=none
- References: <004d01cf700e$ef1e30e0$cd5a92a0$ at arm dot com>
On 15 May 2014, at 09:26, "bin.cheng" <email@example.com> wrote:
> Targets like ARM and AArch64 support double-word load/store instructions,
> and these instructions are generally faster than the corresponding two
> separate loads/stores. GCC currently uses peephole2 to merge paired
> loads/stores into a single instruction, which has a disadvantage: it can
> only handle simple cases where the two instructions appear consecutively in
> the instruction stream, and it is too weak to handle cases in which the two
> loads/stores are separated by other, unrelated instructions.
> Here is a new GCC pass that looks through each basic block and merges
> paired loads/stores even if they are not adjacent to each other. The
> algorithm is pretty simple:
> 1) In an initialization pass over the instruction stream, it collects
> relevant memory access information for each instruction.
> 2) It iterates over each basic block and tries to find a possible paired
> instruction for each memory access instruction. While doing so, it checks
> dependencies between the two candidate instructions and also records
> information indicating how to pair them. To avoid quadratic behavior of
> the algorithm, it introduces a new parameter,
> max-merge-paired-loadstore-distance, and sets its default value to 4, which
> is large enough to catch the major part of the opportunities on
> ARM/cortex-a15.
> 3) For each candidate pair, it calls the back end's hook to do a
> target-dependent check and merges the two instructions if possible.
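For readers following along, steps 1) and 2) can be sketched roughly as below. This is a standalone illustration and not code from the patch: the Insn model, the helper names, the reading of "distance" as the index difference within a block, and the very conservative dependency check (no alias analysis) are all assumptions made for the example. Step 3), the target hook, is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical, simplified instruction model (not GCC's RTL).
enum class Kind { Load, Store, Other };

struct Insn {
  Kind kind;
  int def;                // register written, or -1 if none
  std::vector<int> uses;  // registers read
  int base;               // base address register (memory accesses only)
  int offset;             // byte offset from base (memory accesses only)
};

Insn make_load(int dst, int base, int off) {
  return Insn{Kind::Load, dst, {base}, base, off};
}
Insn make_store(int src, int base, int off) {
  return Insn{Kind::Store, -1, {base, src}, base, off};
}
Insn make_other(int def, std::vector<int> uses) {
  return Insn{Kind::Other, def, std::move(uses), -1, 0};
}

static bool reads(const Insn &insn, int reg) {
  return reg != -1 &&
         std::find(insn.uses.begin(), insn.uses.end(), reg) != insn.uses.end();
}

static bool is_mem(const Insn &insn) { return insn.kind != Kind::Other; }

// Can bb[j] be moved up to sit right after bb[i]?  Deliberately conservative:
// with no alias analysis, an intervening memory access blocks the move
// whenever either access is a store.
static bool can_move_up(const std::vector<Insn> &bb, std::size_t i,
                        std::size_t j) {
  const Insn &first = bb[i];
  const Insn &second = bb[j];
  if (reads(second, first.def)) return false;  // second depends on first
  if (first.def != -1 && first.def == second.def) return false;
  for (std::size_t k = i + 1; k < j; ++k) {
    const Insn &mid = bb[k];
    if (reads(second, mid.def)) return false;                     // RAW
    if (reads(mid, second.def)) return false;                     // WAR
    if (second.def != -1 && mid.def == second.def) return false;  // WAW
    if (is_mem(mid) &&
        (mid.kind == Kind::Store || second.kind == Kind::Store))
      return false;  // possible memory conflict
  }
  return true;
}

// Returns index pairs (i, j), i < j, of accesses in one basic block that
// look mergeable.  Only the next max_distance instructions are searched,
// which keeps the work linear in the block size.
std::vector<std::pair<std::size_t, std::size_t>>
find_pairs(const std::vector<Insn> &bb, std::size_t max_distance,
           int word_size = 4) {
  std::vector<std::pair<std::size_t, std::size_t>> pairs;
  std::vector<bool> paired(bb.size(), false);
  for (std::size_t i = 0; i < bb.size(); ++i) {
    if (!is_mem(bb[i]) || paired[i]) continue;
    std::size_t last = std::min(bb.size() - 1, i + max_distance);
    for (std::size_t j = i + 1; j <= last; ++j) {
      if (paired[j] || bb[j].kind != bb[i].kind) continue;
      if (bb[j].base != bb[i].base) continue;
      if (std::abs(bb[j].offset - bb[i].offset) != word_size) continue;
      if (!can_move_up(bb, i, j)) continue;
      pairs.emplace_back(i, j);
      paired[i] = paired[j] = true;
      break;
    }
  }
  return pairs;
}
```

With two word loads from the same base separated by one unrelated instruction, find_pairs(bb, 4) reports the pair, which is exactly the case a peephole over consecutive insns cannot see.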
> Though the parameter is set to 4, on miscellaneous benchmarks this pass
> finds numerous merge opportunities beyond those already merged by peephole2
> (about as many opportunities as peephole2 itself catches). A GCC
> bootstrap confirms this finding.
This is interesting. E.g. on SH there are insns to load/store SFmode pairs. However, these insns require a mode switch and have some constraints on register usage. So in the SH case the load/store pairing would need to be done before reg alloc and before mode switching.
> Yet there is an open issue about when we should run this new pass. Though
> register renaming is disabled by default now, I put this pass after it,
> because renaming can resolve some false dependencies and thus benefit this
> pass. Another finding is that it can capture a lot more opportunities if it
> runs after sched2, but I am not sure whether it would mess up the scheduling
> results that way.
How about the following?
Instead of adding new hooks and inserting the pass into the general pass list, make the new pass class take the necessary callback functions directly. Then targets can just instantiate the pass, passing their implementations of the callbacks, and insert the pass object into the pass list at the place that fits best for the target.
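A rough, standalone sketch of that idea follows. None of it is GCC code: Insn, MergePairedPass, PassEntry and insert_pass_after are made-up names for illustration, the pass list is modelled as a plain vector, and the pass body only merges adjacent instructions to keep the example short.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Placeholder instruction type (hypothetical, not GCC's rtx_insn).
struct Insn {
  std::string text;
};

// The pass owns no target knowledge; everything target specific comes in
// through the two callbacks given to the constructor.
class MergePairedPass {
 public:
  using CanMerge = std::function<bool(const Insn &, const Insn &)>;
  using Merge = std::function<Insn(const Insn &, const Insn &)>;

  MergePairedPass(CanMerge can_merge, Merge merge)
      : can_merge_(std::move(can_merge)), merge_(std::move(merge)) {}

  // Simplified body: merges adjacent instructions the target accepts.
  void run(std::vector<Insn> &bb) const {
    std::vector<Insn> out;
    for (std::size_t i = 0; i < bb.size(); ++i) {
      if (i + 1 < bb.size() && can_merge_(bb[i], bb[i + 1])) {
        out.push_back(merge_(bb[i], bb[i + 1]));
        ++i;  // skip the instruction that was merged away
      } else {
        out.push_back(bb[i]);
      }
    }
    bb = std::move(out);
  }

 private:
  CanMerge can_merge_;
  Merge merge_;
};

// Hypothetical pass list entry: a name plus something to run.
struct PassEntry {
  std::string name;
  std::function<void(std::vector<Insn> &)> run;
};

// Lets a target place its pass instance after a pass of its choosing.
// Returns false if the anchor pass is not in the list.
bool insert_pass_after(std::vector<PassEntry> &passes,
                       const std::string &anchor, PassEntry entry) {
  for (std::size_t i = 0; i < passes.size(); ++i) {
    if (passes[i].name == anchor) {
      passes.insert(passes.begin() + static_cast<std::ptrdiff_t>(i) + 1,
                    std::move(entry));
      return true;
    }
  }
  return false;
}
```

A target such as SH would then construct MergePairedPass with its own callbacks and call insert_pass_after with an anchor that runs before reg alloc and mode switching, while ARM could pick a late anchor; the generic pass list stays untouched.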
> So, any comments about this?
> 2014-05-15 Bin Cheng <firstname.lastname@example.org>
> * common.opt (flag_merge_paired_loadstore): New option.
> * merge-paired-loadstore.c: New file.
> * Makefile.in: Support new file.
> * config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
> (load_latency_expanded_p, arm_merge_paired_loadstore): New functions.
> * params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
> * doc/invoke.texi (-fmerge-paired-loadstore): New.
> (max-merge-paired-loadstore-distance): New.
> * doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
> * doc/tm.texi: Regenerated.
> * target.def (merge_paired_loadstore): New.
> * tree-pass.h (make_pass_merge_paired_loadstore): New decl.
> * passes.def (pass_merge_paired_loadstore): New pass.
> * timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.
> 2014-05-15 Bin Cheng <email@example.com>
> * gcc.target/arm/merge-paired-loadstore.c: New test.