This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Hi,

GCC's IVO currently handles every IV use independently, which the cases reported in PR65447 show to be the wrong approach. The rationale is:

1) Lots of address type IVs refer to the same memory object, share a similar base, and have the same step. We should handle these IVs as a group in order to maximize CSE opportunities and to prefer the reg+offset addressing mode.
2) GCC's IVO algorithm is expensive and is only run when the candidate set is small enough. By grouping uses of the same family, we can decrease the number of both uses and candidates. Before this patch, the number of candidates for PR65447 is too big to run the expensive IVO algorithm, resulting in bad assembly code on targets like AArch64 and MIPS.
3) Even for cases where the assembly code isn't improved, we still get a compilation time benefit with this patch.
4) This is a prerequisite for enabling auto-increment support in IVO on AArch64.

For now, this is only done for address type IVs; in the future I may extend it to general IVs too.

For AArch64: benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are noticeably improved by this patch. A couple of cases from spec2k/fp appear to regress. I looked into the generated assembly code and can confirm the regressions are false alarms except for one case (189.lucas). For that case, I think it's another issue exposed by this patch (GCC failed to CSE the candidate setup code, resulting in a bloated loop header). Anyway, I also fine-tuned the patch to minimize the impact.

For AArch32, this patch seems to improve spec2kfp too, but I didn't look deeply into it. I guess the reason is that it makes life easier for auto-increment support in IVO.

One defect of this patch is that the computation of the max offset in compute_max_addr_offset is basically borrowed from get_address_cost. The comment there says we should find a better way to compute all this information, and people have also complained that this part of the code needs refactoring. I don't have a good solution to that yet, though I did try my best to keep compute_max_addr_offset simple.
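To make the grouping idea from point 1) concrete, here is a minimal standalone sketch in C. It is not the code in the patch: the struct, the function names (toy_use, toy_group_uses) and the way the maximum offset is passed in as a plain number are all assumptions made for illustration. It sorts address uses by (base, step, offset), puts uses with the same base and step into one group, and starts a new group when a use lies further than max_offset from the first use of the current group, which is one plausible reading of why a maximum reg+offset range matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of an address-type IV use: base + step * i + offset.
   All names here are hypothetical, not taken from tree-ssa-loop-ivopts.c.  */
struct toy_use {
  int base_id;   /* stand-in for the base of the memory object */
  long step;     /* per-iteration increment */
  long offset;   /* constant offset from the base */
  int group;     /* output: group id assigned by toy_group_uses */
};

static int cmp_long(long a, long b) { return (a > b) - (a < b); }

/* Order uses by base, then step, then offset.  */
static int toy_use_cmp(const void *pa, const void *pb)
{
  const struct toy_use *a = pa, *b = pb;
  if (a->base_id != b->base_id)
    return (a->base_id > b->base_id) - (a->base_id < b->base_id);
  if (a->step != b->step)
    return cmp_long(a->step, b->step);
  return cmp_long(a->offset, b->offset);
}

/* Sort USES in place and assign group ids.  Uses share a group when they
   have the same base and step and lie within MAX_OFFSET of the first use
   of that group.  Returns the number of groups.  */
static int toy_group_uses(struct toy_use *uses, size_t n, long max_offset)
{
  assert(max_offset >= 0);
  if (n == 0)
    return 0;
  qsort(uses, n, sizeof *uses, toy_use_cmp);

  int group = 0;
  size_t first = 0;            /* index of the current group's first use */
  uses[0].group = 0;
  for (size_t i = 1; i < n; i++) {
    int same_family = uses[i].base_id == uses[first].base_id
                      && uses[i].step == uses[first].step;
    if (!same_family || uses[i].offset - uses[first].offset > max_offset) {
      group++;                 /* start a new group at this use */
      first = i;
    }
    uses[i].group = group;
  }
  return group + 1;
}
```

In this toy model, only one candidate needs to be considered per group rather than per use, which is the source of the reduction in uses and candidates described in point 2).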
I believe this is a generally wanted change. Bootstrapped and tested on x86_64 and AArch64. Is it OK?

2015-05-08  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/65447
	* tree-ssa-loop-ivopts.c (struct iv_use): New fields.
	(dump_use, dump_uses): Support dumping of sub uses.
	(record_use): New parameters to support sub uses.  Remove call to
	dump_use.
	(record_sub_use, record_group_use): New functions.
	(compute_max_addr_offset, split_all_small_groups): New functions.
	(group_address_uses, rewrite_use_address): New functions.
	(strip_offset): New declaration.
	(find_interesting_uses_address): Call record_group_use.
	(add_candidate): New assertion.
	(infinite_cost_p): Move definition forward.
	(add_costs): Check INFTY cost and return immediately.
	(get_computation_cost_at): Clear setup cost and dependent bitmap
	for sub uses.
	(determine_use_iv_cost_address): Compute cost for sub uses.
	(rewrite_use_address_1): Rename from old rewrite_use_address.
	(free_loop_data): Free sub uses.
	(tree_ssa_iv_optimize_loop): Call group_address_uses.

gcc/testsuite/ChangeLog
2015-05-08  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/65447
	* gcc.dg/tree-ssa/pr65447.c: New test.
Attachment:
j4766-20150507.txt
Description: Text document