This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
This patch adds vectorizer cost-model testing for the SPU and defines the SPU-specific costs used by the cost model. While tuning the cost model for the SPU I also found a few places where it can be a little more accurate, so this patch also includes several small fixes and enhancements to the cost model itself:

- Use TARG_SCALAR_TO_VEC_COST for the reduction initialization cost (instead of TARG_VEC_STMT_COST).
- Use TARG_VEC_TO_SCALAR_COST for the reduction finalization cost (instead of TARG_VEC_STMT_COST).
- Have vect_estimate_min_profitable_iters return the threshold it computed minus 1 (instead of returning min_profitable_iters itself). The returned value is later used in the condition "if (niters <= min_profitable_iters) then skip the vectorized loop", but a loop that runs exactly min_profitable_iters iterations is already expected to be profitable, so returning the value unchanged was too conservative.
- Differentiate between the costs of different scalar stmts instead of using cost 1 for all of them (even this is probably not enough; we will probably want a finer differentiation eventually).
- Add a target builtin that allows targets to add any additional global costs.
- When the number of prologue/epilogue iterations is unknown, we currently assume the worst (i.e. VF-1). This patch changes that to (VF-1)/2, which should be "statistically" closer to reality, with a small bias towards vectorizing. We could consider offering different levels of conservativeness in the cost model, controlled by a user-specified parameter, which would determine, among other things, whether we estimate the prologue/epilogue iteration count as (VF-1)/2, VF/2, or VF-1 (most conservative).
About the SPU-specific part: these are the costs I currently set (based on tuning with only one benchmark suite; I'm sure they can be refined further):

- scalar load: 2 (rationale: load + rotate)
- aligned vector load: 1
- unaligned vector load: 2 (rationale: load + shuffle)
- scalar store: 10 (rationale: when doing a scalar store, it takes about 10 cycles for the stqd to start, because it is preceded by a load + shuffle sequence)
- targetm.vectorize.builtin_vectorization_cost: adds the latency of a mispredicted branch (19) to the cost of choosing the scalar version of the loop (i.e. the cost of following the mispredicted path when skipping the vectorized loop).
- branch cost: 6 (rationale: somewhere between the latency of a correctly predicted branch (1) and the latency of a mispredicted branch (19), relative to the latency of other insns (2-7), i.e. some kind of "average" over {1, 19/7, 19/2}).
- all other costs: 1 per insn.

At least in the benchmark suite I was using, it was almost always better to vectorize on the SPU (which is not surprising). Interestingly, in the few cases where vectorizing was not profitable (when the loop count was very small), it was usually better to fall through to the slower vectorized version of the loop than to jump to the scalar version and pay the cost of a mispredicted branch (which is very painful on the SPU, since it has no dynamic branch prediction).

Bootstrapped with vectorization enabled, with and without the cost model, and tested on the vectorizer testcases, on powerpc-linux and i386-linux. Also built for the SPU and tested the vectorizer testcases on the SPU.

Committed to autovect-branch. To be submitted to mainline after the freeze.

Dorit

	* target.h (builtin_vectorization_cost): Add new target builtin.
	* target-def.h (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New.
	* tree-vectorizer.h (TARG_SCALAR_STMT_COST): New.
	(TARG_SCALAR_LOAD_COST, TARG_SCALAR_STORE_COST): New.
	* tree-vect-analyze.c (vect_analyze_slp_instance): Initialize uninitialized variables.
	* tree-vect-transform.c (cost_for_stmt): New function.
	(vect_estimate_min_profitable_iters): Call cost_for_stmt instead of using cost 1 for all scalar stmts. Be less conservative when estimating the number of prologue/epilogue iterations. Call targetm.vectorize.builtin_vectorization_cost. Return min_profitable_iters-1.
	(vect_model_reduction_cost): Use TARG_SCALAR_TO_VEC_COST for initialization cost instead of TARG_VEC_STMT_COST. Use TARG_VEC_TO_SCALAR_COST instead of TARG_VEC_STMT_COST for reduction epilogue code.
	* config/spu/spu.c (spu_builtin_vectorization_cost): New.
	(TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Implement.
	* config/spu/spu.h (TARG_COND_BRANCH_COST, TARG_SCALAR_STMT_COST):
	(TARG_SCALAR_LOAD_COST, TARG_SCALAR_STORE_COST, TARG_VEC_STMT_COST):
	(TARG_VEC_TO_SCALAR_COST, TARG_SCALAR_TO_VEC_COST, TARG_VEC_LOAD_COST):
	(TARG_VEC_UNALIGNED_LOAD_COST, TARG_VEC_STORE_COST): Define.

	* gcc.dg/vect/no-scevccp-outer-18.c: Fix dg-final check: works only for vect_interleave targets.
	* gcc.dg/vect/no-scevccp-outer-19.c: Fix dg-final check: works only for vect_unpack targets.
	* gcc.dg/vect/no-scevccp-outer-21.c: Likewise.
	* gcc.dg/vect/no-scevccp-outer-16.c: Likewise.
	* gcc.dg/vect/no-scevccp-outer-17.c: Likewise.
	* gcc.dg/vect/vect-outer-2.c: Fix dg-final check: vect_intfloat_cvt.
	* gcc.dg/vect/costmodel/spu/spu-costmodel-vect.exp: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-outer-fir.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-fast-math-vect-pr29925.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-31a.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-31b.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-31c.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-31d.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-iv-9.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-33.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-76a.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-76b.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-76c.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-68a.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-68b.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-68c.c: New.
	* gcc.dg/vect/costmodel/spu/costmodel-vect-68d.c: New.
	* lib/target-supports.exp (check_effective_target_vect_int_mul): Add spu.

(See attached file: costmodelfixes.autovect.txt)
Attachment: costmodelfixes.autovect.txt (text document)