This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Hi,

Targets like ARM and AArch64 support double-word load/store instructions, and these are generally faster than the corresponding two single loads/stores. GCC currently uses peephole2 to merge paired loads/stores into a single instruction, which has a disadvantage: it can only handle simple cases in which the two instructions are adjacent in the instruction stream, and it cannot handle cases in which unrelated instructions sit between the two loads/stores.

This patch adds a new GCC pass that looks through each basic block and merges paired loads/stores even when they are not adjacent to each other. The algorithm is pretty simple:

1) In an initialization pass over the instruction stream, it collects the relevant memory access information for each instruction.

2) It iterates over each basic block and, for each memory access instruction, tries to find a possible paired instruction. While doing so, it checks dependencies between the two candidate instructions and records the information describing how to pair them. To avoid quadratic behavior, it introduces a new parameter, max-merge-paired-loadstore-distance, with a default value of 4, which is large enough to catch the majority of opportunities on ARM/Cortex-A15.

3) For each candidate pair, it calls the back-end's hook to do the target-dependent check and merges the two instructions if possible.

Even with the parameter set to 4, on miscellaneous benchmarks this pass merges numerous opportunities beyond those already merged by peephole2 (about as many as peephole2 itself merges). GCC bootstrap confirms this finding.

There is still an open issue about when we should run this new pass. Although register renaming is disabled by default now, I put this pass after it, because renaming can resolve some false dependencies and thus benefit this pass.
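
For readers without the attachment at hand, the windowed search described in the numbered steps above can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: `struct toy_insn`, `find_pair`, `can_move_up` and `WORD_SIZE` are hypothetical names, instructions are modelled with at most one written and one read register, non-memory instructions are assumed not to touch memory, and the distance limit is assumed to mean "at most N instructions after the first access". There is no alias analysis, so any intervening store blocks the merge.

```c
#include <assert.h>
#include <stdbool.h>

#define WORD_SIZE 4   /* assumed width of one single load/store, in bytes */

/* Hypothetical toy instruction model, not GCC's RTL.  */
enum toy_kind { TOY_OTHER, TOY_LOAD, TOY_STORE };

struct toy_insn
{
  enum toy_kind kind;
  int base;    /* address base register (memory accesses only) */
  int offset;  /* byte offset from base (memory accesses only) */
  int def;     /* register written, or -1 if none */
  int use;     /* register read besides base, or -1 if none */
};

static bool
writes (const struct toy_insn *x, int reg)
{
  return reg >= 0 && x->def == reg;
}

static bool
reads (const struct toy_insn *x, int reg)
{
  return reg >= 0
	 && (x->use == reg || (x->kind != TOY_OTHER && x->base == reg));
}

/* Conservatively decide whether insns[j] may be moved up next to insns[i].  */
static bool
can_move_up (const struct toy_insn *insns, int i, int j)
{
  const struct toy_insn *second = &insns[j];

  /* The first access must not clobber the second one's base or result.  */
  if (writes (&insns[i], second->base) || writes (&insns[i], second->def))
    return false;

  for (int k = i + 1; k < j; k++)
    {
      const struct toy_insn *mid = &insns[k];
      /* MID must not change any input of SECOND.  */
      if (writes (mid, second->base) || writes (mid, second->use))
	return false;
      /* MID must not read or overwrite the register SECOND writes.  */
      if (reads (mid, second->def) || writes (mid, second->def))
	return false;
      /* No alias analysis: a store blocks everything, and a load blocks
	 moving a store.  */
      if (mid->kind == TOY_STORE
	  || (mid->kind == TOY_LOAD && second->kind == TOY_STORE))
	return false;
    }
  return true;
}

/* Return the index of a partner for insns[i] within MAX_DISTANCE
   instructions, or -1 if there is none.  */
int
find_pair (const struct toy_insn *insns, int n, int i, int max_distance)
{
  assert (max_distance > 0 && i >= 0 && i < n);
  const struct toy_insn *first = &insns[i];
  if (first->kind == TOY_OTHER)
    return -1;

  for (int j = i + 1; j < n && j - i <= max_distance; j++)
    {
      const struct toy_insn *second = &insns[j];
      if (second->kind == first->kind
	  && second->base == first->base
	  && (second->offset == first->offset + WORD_SIZE
	      || second->offset == first->offset - WORD_SIZE)
	  && can_move_up (insns, i, j))
	return j;
    }
  return -1;
}
```

Bounding the inner loop by the distance limit keeps the work per basic block linear in the number of instructions. In the real pass, a candidate found this way would still have to pass the target hook's check before being merged.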
Another finding is that the pass captures many more opportunities if it runs after sched2, but I am not sure whether it would mess up the scheduling results that way.

So, any comments about this?

Thanks,
bin

2014-05-15  Bin Cheng  <bin.cheng@arm.com>

	* common.opt (flag_merge_paired_loadstore): New option.
	* merge-paired-loadstore.c: New file.
	* Makefile.in: Support new file.
	* config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
	(load_latency_expanded_p, arm_merge_paired_loadstore): New functions.
	* params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
	* doc/invoke.texi (-fmerge-paired-loadstore): New.
	(max-merge-paired-loadstore-distance): New.
	* doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
	* doc/tm.texi: Regenerated.
	* target.def (merge_paired_loadstore): New.
	* tree-pass.h (make_pass_merge_paired_loadstore): New decl.
	* passes.def (pass_merge_paired_loadstore): New pass.
	* timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.

gcc/testsuite/ChangeLog
2014-05-15  Bin Cheng  <bin.cheng@arm.com>

	* gcc.target/arm/merge-paired-loadstore.c: New test.
Attachment: merge-paired-loadstore-20140515.txt
Description: Text document