This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Hi,

This patch series rewrites parts of IVOPTs. The change consists of the parts described below:

A) New cost computation model. Currently, a large amount of code tries to understand tree expressions and estimate their computation cost. The model was designed long ago for generic tree expressions. In order to process generic expressions (even address expressions of array/memory references), it has code for too many corner cases. The first problem is that it is practically impossible to handle all complicated expressions, even with complicated logic in functions like get_computation_cost_at, difference_cost, ptr_difference_cost, get_address_cost and so on. The second problem is that it is hard to keep the cost model consistent among special cases. As special cases have been added from time to time, the model is no longer unified. There are cases where a correct cost leads to bad code or, conversely, a wrong cost leads to good code. Finally, it is also difficult to add code for new cases.

This patch introduces a new cost computation model based on tree affine. Tree expressions are lowered to aff_tree, which is usually a simple arithmetic operation. Code handling special cases is no longer necessary, which simplifies the implementation considerably. It is also easier to compute consistent costs among different expressions using tree affine, which gives us a unified cost model.

This change is implemented in:
  [PATCH rewrite-cost-computation-*.txt]

B) When rewriting both nonlinear iv_use and address iv_use, the current code does bad association by mixing the computation of invariant and induction parts. This introduces inconsistency between cost computation and code generation, because the costs of the invariant and induction parts are computed separately. It also prevents loop invariants from being hoisted out of the loop. This change fixes the issue by re-associating the invariant and induction parts separately for both nonlinear and address iv_use.

This change is implemented in two patches:
  [PATCH nonlinear-iv_use-rewrite-*.txt]
  [PATCH address-iv_use-rewrite-*.txt]

C) The current implementation shares the same register pressure computation with the RTL loop invariant pass. It has difficulty handling loop nests (especially large ones), and quite often generates too many candidates (especially for outer loops). This change introduces a new register pressure estimation. The basic idea is to differentiate the (hot) innermost loop from outer loops. For the (possibly hot) innermost loop, more registers are allowed as long as the overall register pressure stays within the number of registers available on the target.

This change is implemented in the patches below:
  [PATCH record-newly-used-inv_var-*.txt]
  [PATCH skip-non_int-phi-reg-pressure-*.txt]
  [PATCH ivopt-reg_pressure-model-*.txt]

D) Other small refactors and improvements. These will be described in each patch's review message.

E) Patches that allow better induction variable optimization for vectorized loops. These patches are blocked at the moment because the current IVOPTs implementation can generate worse code on targets with limited addressing mode support.
  [PATCH range_info-for-vect_loop-niters-*.txt]
  [PATCH pr69710-*.txt]

As a bonus, issues like PR53090/PR71361 are now fixed, with better code generation than what the two PRs were expecting.

I collected spec2k6 data on my local AArch64 and X86_64 machines. Overall, FP improves by 1% on both machines, while INT mainly remains neutral. I think part of the improvement comes from IVOPTs itself, and the rest comes from the opportunities enabled by E). It would also be great if other targets could run some benchmarks with this patch series, in case of any performance breakage.

The patch series is bootstrapped and tested on X86_64 and AArch64. No real regression was found, though some tests do need further adjustment.

To start, this is the first patch of the series. It simply handles TRUNCATE between tieable modes in rtx_cost. Since no additional instruction is needed for such a truncate, rtx_cost simply returns a cost of 0. Is it OK?

Thanks,
bin

2017-04-11  Bin Cheng  <bin.cheng@arm.com>

	* rtlanal.c (rtx_cost): Handle TRUNCATE between tieable modes.
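For readers unfamiliar with the re-association described in part B, the idea can be sketched with a toy affine combination in plain C. This is only an illustration under stated assumptions: it is not GCC's aff_tree, and the names toy_term, toy_affine, toy_invariant_part and toy_induction_part are hypothetical. The point is that once an expression is written as offset + sum(coef * var), the loop-invariant terms can be summed separately from the induction terms, so the invariant sum can be computed once outside the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TERMS 8

/* One term coef * value of a hypothetical affine combination. */
struct toy_term {
    long coef;
    long value;        /* current value of the variable */
    bool is_induction; /* true if the variable changes every iteration */
};

/* offset + sum(coef_i * var_i).  Loosely modeled on the idea of an
   affine combination; this is NOT GCC's aff_tree. */
struct toy_affine {
    long offset;
    size_t n;
    struct toy_term terms[TOY_MAX_TERMS];
};

/* Constant offset plus all loop-invariant terms.  This part does not
   change inside the loop, so it could be computed once before it. */
long toy_invariant_part(const struct toy_affine *a)
{
    assert(a->n <= TOY_MAX_TERMS);
    long sum = a->offset;
    for (size_t i = 0; i < a->n; i++)
        if (!a->terms[i].is_induction)
            sum += a->terms[i].coef * a->terms[i].value;
    return sum;
}

/* Sum of the terms that depend on induction variables.  Only this
   part needs to be recomputed (or stepped) in every iteration. */
long toy_induction_part(const struct toy_affine *a)
{
    assert(a->n <= TOY_MAX_TERMS);
    long sum = 0;
    for (size_t i = 0; i < a->n; i++)
        if (a->terms[i].is_induction)
            sum += a->terms[i].coef * a->terms[i].value;
    return sum;
}
```

The full value of the expression is toy_invariant_part(&a) + toy_induction_part(&a). Keeping the two sums apart mirrors the separation that part B describes between the invariant and induction computations.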
Attachment: 0001-no_cost-for-tieable-type-truncate-20170220.txt
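The rule this first patch describes (a TRUNCATE between tieable modes costs nothing) can be sketched as a small self-contained C model. This is not the patch itself and does not use GCC's internal types. The names toy_mode, toy_modes_tieable and toy_truncate_cost are hypothetical, and the tieability rule used here (both modes are integer modes that fit in one 64-bit register) is an assumption for illustration only, since real targets define their own rule.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a machine mode: its size in bits and
   whether it lives in the integer register file. */
struct toy_mode {
    unsigned bits;
    bool is_int;
};

/* ASSUMED tieability rule for this toy target: two integer modes are
   tieable when both fit in a single 64-bit register.  Real targets
   decide this themselves. */
bool toy_modes_tieable(struct toy_mode a, struct toy_mode b)
{
    return a.is_int && b.is_int && a.bits <= 64 && b.bits <= 64;
}

/* Cost of truncating a value from mode `from` to mode `to`.
   A truncate between tieable modes needs no extra instruction, so
   it is free; otherwise charge one instruction (assumed unit cost). */
int toy_truncate_cost(struct toy_mode to, struct toy_mode from)
{
    if (toy_modes_tieable(to, from))
        return 0;
    return 1;
}
```

With this model, truncating a 64-bit integer to 32 bits is free, while truncating from a 128-bit mode is charged one unit because the two modes are not tieable under the assumed rule.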