This is the mail archive of the
`gcc-patches@gcc.gnu.org`
mailing list for the GCC project.


- *From*: Dorit Nuzman <DORIT at il dot ibm dot com>
- *To*: gcc-patches at gcc dot gnu dot org
- *Cc*: Andrew_Pinski at PlayStation dot Sony dot Com
- *Date*: Thu, 5 Jul 2007 22:36:11 +0300
- *Subject*: [patch] more vectorizer costmodel fixes/improvements + spu costs (needs review)

This patch brings over the following patches from autovect-branch:
http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01981.html
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00225.html

I need approval for the bits outside the vectorizer (target.h, target-def.h), and for the changes in the spu port.

"
This patch adds vectorizer cost model testing for the SPU, and also defines the target-specific cost-model costs for the SPU. While tuning the cost model for the SPU I also found a few places where we can be a little more accurate, so this patch also includes several small fixes/enhancements to the cost model itself:

- Use TARG_SCALAR_TO_VEC_COST for the reduction initialization cost (instead of TARG_VEC_STMT_COST).

- Use TARG_VEC_TO_SCALAR_COST for the reduction finalization cost (instead of TARG_VEC_STMT_COST).

- In computing the cost of the reduction epilogue there was a bug which caused us to never take the path of the epilogue that uses vector shifts (instead we always took the path of the epilogue that uses scalar operations). This caused us to estimate higher costs for reduction epilogues on powerpc (in one testcase the cost was computed to be 32, instead of 10). With this fix, a couple of loops in the powerpc costmodel testsuite get vectorized when they didn't before.

- Have vect_estimate_min_profitable_iters return the threshold it computed minus 1 (instead of returning min_profitable_iters), because it is later used in the following condition: "if (niters <= min_profitable_iters) then skip the vectorized loop". min_profitable_iters is already expected to be profitable, so without the adjustment we are being too conservative.

- Differentiate between the costs of different scalar stmts instead of using a cost of 1 for all scalar stmts (even this is probably not enough; we probably want a finer differentiation eventually).

- Add a target-builtin to allow targets to add any additional global costs.

- When we don't know the number of prologue/epilogue iterations we currently assume the worst (i.e. VF-1). Instead, this patch changes it to use (VF-1)/2, which is supposed to be "statistically" closer to reality, with a small bias towards vectorizing. We can consider having different levels of conservativeness in the cost model, according to a user-specified parameter, which would affect, among other things, whether we estimate the prologue/epilogue iteration count to be (VF-1)/2, or VF/2, or VF-1 (most conservative).

About the SPU-specific stuff: these are the costs that I currently set (based on tuning on only one benchmark suite; I'm sure this can be further refined):

- scalar load: 2 (rationale: load + rotate)
- aligned vector load: 1
- unaligned vector load: 2 (rationale: load + shuffle)
- scalar store: 10 (rationale: it takes about 10 cycles for the stqd to start when doing a scalar store, because it's preceded by a load + shuffle sequence)
- targetm.vectorization_cost: adds the latency of a mispredicted branch (19) to the cost of choosing the scalar version of the loop (the cost of following the mispredicted path when skipping the vectorized loop).
- branch cost: 6 (rationale: somewhere between the latency of a correctly predicted branch (1) and the latency of an incorrectly predicted branch (19), relative to the latency of other insns (2-7), i.e. some kind of "average" over {1,19/7,19/2}).
- all other costs: 1 per insn
"

Bootstrapped with vectorization enabled and tested on the vectorizer testcases on i386-linux. Also bootstrapped on powerpc-linux (without fortran, because of http://gcc.gnu.org/ml/gcc/2007-07/msg00038.html). Also built for the SPU and tested the vectorizer testcases on the SPU.

:ADDPATCH target-builtin,spu:

thanks,
dorit

* target.h (builtin_vectorization_cost): Add new target builtin.
* target-def.h (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New.
* tree-vectorizer.h (TARG_SCALAR_STMT_COST): New.
(TARG_SCALAR_LOAD_COST, TARG_SCALAR_STORE_COST): New.
* tree-vect-analyze.c (vect_analyze_slp_instance): Initialize uninitialized variables.
* tree-vect-transform.c (cost_for_stmt): New function.
(vect_estimate_min_profitable_iters): Call cost_for_stmt instead of using cost 1 for all scalar stmts. Be less conservative when estimating the number of prologue/epilogue iterations. Call targetm.vectorize.builtin_vectorization_cost. Return min_profitable_iters-1.
(vect_model_reduction_cost): Use TARG_SCALAR_TO_VEC_COST for initialization cost instead of TARG_VEC_STMT_COST. Use TARG_VEC_TO_SCALAR_COST instead of TARG_VEC_STMT_COST for reduction epilogue code. Fix epilogue cost computation.
* config/spu/spu.c (spu_builtin_vectorization_cost): New.
(TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Implement.
* config/spu/spu.h (TARG_COND_BRANCH_COST, TARG_SCALAR_STMT_COST):
(TARG_SCALAR_LOAD_COST, TARG_SCALAR_STORE_COST, TARG_VEC_STMT_COST):
(TARG_VEC_TO_SCALAR_COST, TARG_SCALAR_TO_VEC, TARG_VEC_LOAD_COST):
(TARG_VEC_UNALIGNED_LOAD_COST, TARG_VEC_STORE_COST): Define.

* gcc.dg/vect/costmodel/ppc/costmodel-vect-reduc-1char.c: Loops now get vectorized.
* gcc.dg/vect/costmodel/i386/costmodel-vect-reduc-1char.c: Loops now get vectorized.
* gcc.dg/vect/costmodel/spu/spu-costmodel-vect.exp: New.
* gcc.dg/vect/costmodel/spu/costmodel-fast-math-vect-pr29925.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-31a.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-31b.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-31c.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-31d.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-iv-9.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-33.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-76a.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-76b.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-76c.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-68a.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-68b.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-68c.c: New.
* gcc.dg/vect/costmodel/spu/costmodel-vect-68d.c: New.
* lib/target-supports.exp (check_effective_target_vect_int_mul): Add spu.

(See attached file: costmodelfixes2.txt)
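
For readers following the description, two of the changes (the prologue/epilogue iteration estimate and the "minus 1" threshold) can be illustrated with a small standalone sketch. This is not the code from the patch: the enum and the function names `estimate_peel_iters`, `runtime_threshold` and `use_vector_loop` are hypothetical, and the use of truncating integer division for (VF-1)/2 and VF/2 is an assumption.

```c
#include <assert.h>

/* Hypothetical illustration only; none of these names exist in GCC.
   The three modes mirror the levels of conservativeness mentioned in
   the message: VF-1 (most conservative), VF/2, and (VF-1)/2.  */
enum peel_estimate
{
  PEEL_WORST_CASE,   /* VF - 1     */
  PEEL_HALF_VF,      /* VF / 2     */
  PEEL_AVERAGE       /* (VF - 1)/2 */
};

/* Estimated number of scalar iterations executed in a prologue or
   epilogue when the real count is unknown at compile time.
   Assumption: integer (truncating) division.  */
static int
estimate_peel_iters (int vf, enum peel_estimate mode)
{
  assert (vf >= 1);
  switch (mode)
    {
    case PEEL_WORST_CASE:
      return vf - 1;
    case PEEL_HALF_VF:
      return vf / 2;
    case PEEL_AVERAGE:
    default:
      return (vf - 1) / 2;
    }
}

/* The guard quoted in the message is
     "if (niters <= threshold) then skip the vectorized loop".
   Since min_profitable_iters iterations are already expected to be
   profitable, the threshold handed to that guard is one less.  */
static int
runtime_threshold (int min_profitable_iters)
{
  return min_profitable_iters - 1;
}

/* Nonzero when the guard would take the vectorized loop.  */
static int
use_vector_loop (int niters, int min_profitable_iters)
{
  return niters > runtime_threshold (min_profitable_iters);
}
```

With a vectorization factor of 4, the three modes give 3, 2 and 1 estimated peel iterations respectively, and a loop with exactly min_profitable_iters iterations passes the guard instead of being sent to the scalar version.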

**Attachment**: costmodelfixes2.txt
