This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Hi, This patch is a new approach to fix PR66388. IVO today computes iv_use with iv_cand which has at least same type precision as the use. On 64bit platforms like AArch64, this results in different iv_cand created for each address type iv_use, and register pressure increased. As a matter of fact, the BIV should be used for all iv_uses in some of these cases. It is a latent bug but recently getting worse because of overflow changes. The original approach at https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the issue except it conflict with IV elimination. Seems to me it is impossible to mitigate the contradiction. This new approach fixes the issue by adding sizetype iv_cand for BIVs directly. In cases if the original BIV is preferred, the sizetype iv_cand will be chosen. As for code generation, the sizetype iv_cand has the same effect as the original BIV. Actually, it's better because BIV needs to be explicitly extended to sizetype to be used in address expression on most targets. One shortage of this approach is it may introduce more iv candidates. To minimize the impact, this patch does sophisticated code analysis and adds sizetype candidate for BIV only if it is used as index. Moreover, it avoids to add candidate of the original type if the BIV is only used as index. Statistics for compiling spec2k6 shows increase of candidate number is modest and can be ignored. There are two more patches following to fix corner cases revealed by this one. In together they bring obvious perf improvement for spec26k/int on aarch64. Spec2k6/int 400.perlbench 3.44% 445.gobmk -0.86% 456.hmmer 14.83% 458.sjeng 2.49% 462.libquantum -0.79% GEOMEAN 1.68% There is also about 0.36% improvement for spec2k6/fp, mostly because of case 436.cactusADM. I believe it can be further improved, but that should be another patch. I also collected benchmark data for x86_64. Spec2k6/fp is not affected. As for spec2k6/int, though the geomean is improved slightly, 400.perlbench is regressed by ~3%. I can see BIVs are chosen for some loops instead of address candidates. Generally, the loop header will be simplified because iv elimination with BIV is simpler; the number of instructions in loop body isn't changed. I suspect the regression comes from different addressing modes. With BIV, complex addressing mode like [base + index << scale + disp] is used, rather than [base + disp]. I guess the former has more micro-ops, thus more expensive. This guess can be confirmed by manually suppressing the complex addressing mode with higher address cost. Now the problem becomes why overall cost of BIV is computed lower while the actual cost is higher. I noticed for most affected loops, loop header is bloated because of iv elimination using the old address candidate. The bloated loop header results in much higher cost than BIV. As a result, BIV is preferred. I also noticed the bloated loop header generally can be simplified (I have a following patch for this). After applying the local patch, the old address candidate is chosen, and most of regression is recovered. Conclusion is I think loop header bloated issue should be blamed for the regression, and it can be resolved. Bootstrap and test on x64_64 and aarch64. It fixes failure of gcc.target/i386/pr49781-1.c, without new breakage. So what do you think? Thanks, bin 2015-08-31 Bin Cheng <bin.cheng@arm.com> * tree-affine.c (aff_combination_expand): New parameters. (tree_to_aff_combination_expand): Ditto. * tree-affine.h (aff_combination_expand): New declaration. (tree_to_aff_combination_expand): Ditto. * tree-ssa-loop-ivopts.c (struct iv, iv_cand): New fields. (dump_iv): Dump no_overflow information. (alloc_iv): Initialize new field for struct iv. (struct expand_data): New struct for affine combination expanding. (stop_expand): New callback func for affine combination expanding. (find_deriving_biv_for_iv, record_biv_for_address_use): New functions. (idx_find_step): Call new functions above. (find_depends, add_candidate): New paramter. (add_iv_candidate_for_biv): Add sizetype cand for BIV. (get_computation_aff): Simplify convertion of cand for BIV. (get_computation_cost_at): Step cand's base if necessary.
Attachment:
pr66388_1-20150824.txt
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |