This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Hi, This patch is to fix PR66388. It's an old issue but recently became worse after my scev overflow change. IVOPT now can only compute iv use with candidate which has at least same type precision. See below code: if (TYPE_PRECISION (utype) > TYPE_PRECISION (ctype)) { /* We do not have a precision to express the values of use. */ return infinite_cost; } This is not always true. It's possible to compute with a candidate of smaller precision if it has enough stepping periods to express the iv use. Just as code in iv_elimination. Well, since now we have iv no_overflow information, we can use that to prove it's safe. Actually I am thinking about improving iv elimination with overflow information too. So this patch relaxes the constraint to allow computation of uses with smaller precision candidates. Benchmark data shows several cases in spec2k6 are obviously improved on aarch64: 400.perlbench 2.32% 445.gobmk 0.86% 456.hmmer 11.72% 464.h264ref 1.93% 473.astar 0.75% 433.milc -1.49% 436.cactusADM 6.61% 444.namd -0.76% I looked into assembly code of 456.hmmer&436.cactusADM, and can confirm hot loops are reduced. Also perf data could confirm the improvement in 456.hmmer. I looked into 433.milc and found most hot functions are not affected by this patch. But I do observe two kinds of regressions described as below: A) For some loops, auto-increment addressing mode is generated before this patch, but "base + index<<scale" is generated after. I don't worry about this too much because auto-increment support in IVO hasn't been enabled on AArch64 yet. On the contrary, we should worry that auto-increment support is too aggressive in IVO, resulting in auto-increment addressing mode generated where it shouldn't. I suspect the regression we monitored before is caused by such kind of reason. B) This patch enables computation of 64 bits address iv use with 32 bits biv candidate. So there will be a sign extension before the candidate can be used in memory reference as an index. I already increased the cost by 2 for such biv candidates but there still be some peculiar cases... Decreasing cost in determine_iv_cost for biv candidates makes this worse. It does that to make debugging simpler, nothing to do with performance. Bootstrap and test on x86_64. It fixes failure of pr49781-1.c. Unfortunately, it introduces new failure of g++.dg/debug/dwarf2/deallocator.C. I looked into the test and found with this patch, the loop is transformed into a shape that can be later eliminated(because it can be proved never loop back?). We can further discuss if it's this patch's problem or the case should be tuned. Also bootstrap and test on aarch64. So what's your opinion? Thanks, bin 2015-07-16 Bin Cheng <bin.cheng@arm.com> PR tree-optimization/66388 * tree-ssa-loop-ivopts.c (dump_iv): Dump no_overflow info. (add_candidate_1): New parameter. Use unsigned type when iv overflows. Pass no_overflow to alloc_iv. (add_autoinc_candidates, add_candidate): New parameter. Pass no_overflow to add_candidate_1. (add_candidate): Ditto. (add_iv_candidate_for_biv, add_iv_candidate_for_use): Pass iv's no_overflow info to add_candidate and add_candidate_1. (get_computation_aff, get_computation_cost_at): Handle candidate with smaller precision than iv use. gcc/testsuite/ChangeLog 2015-07-16 Bin Cheng <bin.cheng@arm.com> PR tree-optimization/66388 * gcc.dg/tree-ssa/pr66388.c: New test.
Attachment:
j5612-20150716.txt
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |