This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

[patch] more vectorizer costmodel fixes/improvements + spu costs (needs review)

From: Dorit Nuzman <DORIT at il dot ibm dot com>
To: gcc-patches at gcc dot gnu dot org
Cc: Andrew_Pinski at PlayStation dot Sony dot Com
Date: Thu, 5 Jul 2007 22:36:11 +0300
Subject: [patch] more vectorizer costmodel fixes/improvements + spu costs (needs review)

This patch brings over the following patches from autovect-branch:
http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01981.html
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00225.html

I need approval for the bits outside the vectorizer (target.h,
target-def.h), and for the changes in the spu port.

"
This patch adds vectorizer cost model testing for the SPU, and also defines
the cost-model target-specific costs for the spu. While tuning the
cost-model for the SPU I also found a few places where we can be a little
more accurate, so this patch also includes several small fixes/enhancements
to the cost-model itself:

- Use TARG_SCALAR_TO_VEC_COST for the reduction initialization cost
(instead of TARG_VEC_STMT_COST).

- Use TARG_VEC_TO_SCALAR_COST for the reduction finalization cost (instead
of TARG_VEC_STMT_COST).

- In computing the cost of the reduction epilogue there was a bug which
caused us to never take the path of the epilogue that uses vector-shifts
(instead we always took the path of the epilogue that uses scalar
operations). This caused us to estimate higher costs for reduction
epilogues on powerpc (in one testcase the cost was computed to be 32
instead of 10). With this fix, a couple of loops in the powerpc costmodel
testsuite now get vectorized when they didn't before.

- Have vect_estimate_min_profitable_iters return the threshold it computed
minus 1 (instead of returning min_profitable_iters), because the value is
later used in the following condition:
      "if (niters <= min_profitable_iters) then skip the vectorized loop"
Since a loop with exactly min_profitable_iters iterations is already
expected to be profitable, returning min_profitable_iters itself was too
conservative.

- Differentiate between the costs of different scalar stmts instead of
using a cost of 1 for all scalar stmts (even this is probably not enough;
we will probably want a finer differentiation eventually).

- Add a target-builtin to allow targets to add any additional global costs.

- When we don't know the number of prologue/epilogue iterations we
currently assume the worst (i.e. VF-1). Instead, this patch changes it to
use (VF-1)/2, which is supposed to be "statistically" closer to reality,
with a small bias towards vectorizing. We can consider having different
levels of conservativeness in the cost model, according to a user-specified
parameter, which would affect, among other things, whether we estimate the
prologue/epilogue iteration count to be (VF-1)/2, or VF/2, or VF-1 (most
conservative).
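
To make the last two threshold-related items concrete, here is a minimal
sketch in C. It is not the code from the patch: the function names are
hypothetical, and only the formulas ((VF-1)/2, the "niters <= threshold"
guard, and returning min_profitable_iters - 1) are taken from the
description above.

```c
#include <assert.h>

/* Hypothetical helper: estimate of the prologue/epilogue iteration count
   when it is not known at compile time.  The description above replaces
   the worst case (vf - 1) with (vf - 1) / 2.  */
static int
estimate_peel_iters (int vf)
{
  assert (vf >= 1);
  return (vf - 1) / 2;
}

/* Hypothetical helper: the value handed to the runtime guard.  The guard
   skips the vectorized loop when niters <= threshold, so the threshold
   has to be one less than the first profitable iteration count.  */
static int
guard_threshold (int min_profitable_iters)
{
  return min_profitable_iters - 1;
}

/* The guard as quoted above, inverted: nonzero means the vectorized
   loop is executed.  */
static int
takes_vectorized_loop (int niters, int threshold)
{
  return niters > threshold;
}
```

With an illustrative min_profitable_iters of 10, a loop running exactly 10
iterations is skipped if the guard is given 10, but is vectorized if the
guard is given guard_threshold (10), i.e. 9.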

About the SPU specific stuff - these are the costs that I currently set
(based on tuning only on one benchmark suite, I'm sure this can be further
refined):

- scalar load: 2 (rationale: load + rotate)
- aligned vector load: 1
- unaligned vector load: 2 (rationale: load + shuffle)
- scalar store: 10 (rationale: it takes about 10 cycles for the stqd to
start when doing a scalar store because it's preceded by a load + shuffle
sequence)
- targetm.vectorization_cost: adds the latency of a mispredicted branch
(19) to the cost of choosing the scalar version of the loop (the cost of
following the mispredicted path when skipping the vectorized loop).
- branch cost: 6 (rationale: somewhere between the latency of a correctly
predicted branch (1) and the latency of an incorrectly predicted branch
(19) relative to the latency of other insns (2-7), i.e. some kind of
"average" over {1,19/7,19/2}).
- all other costs: 1 per insn
"
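
For readers who want the SPU numbers in one place, the following is a
small standalone model of the costs listed above, written in C. It is only
a sketch: the enum, the function names and the macro are hypothetical and
do not reproduce the definitions in config/spu/spu.h or config/spu/spu.c.
Only the numeric values (2, 1, 2, 10, 6, 19, and 1 for everything else)
come from the description.

```c
#include <assert.h>

/* Hypothetical cost kinds, loosely mirroring the categories listed
   in the description.  */
enum sketch_cost_kind
{
  SK_SCALAR_STMT,
  SK_SCALAR_LOAD,
  SK_SCALAR_STORE,
  SK_VEC_STMT,
  SK_VEC_LOAD,
  SK_VEC_UNALIGNED_LOAD,
  SK_VEC_STORE,
  SK_COND_BRANCH,
  SK_NUM_KINDS
};

/* Latency of a mispredicted branch, as stated in the description.  */
#define SKETCH_MISPREDICT_LATENCY 19

/* Return the per-insn cost for KIND using the values quoted above.  */
static int
sketch_spu_cost (enum sketch_cost_kind kind)
{
  assert ((int) kind < SK_NUM_KINDS);
  switch (kind)
    {
    case SK_SCALAR_LOAD:        return 2;   /* load + rotate */
    case SK_VEC_LOAD:           return 1;   /* aligned vector load */
    case SK_VEC_UNALIGNED_LOAD: return 2;   /* load + shuffle */
    case SK_SCALAR_STORE:       return 10;  /* load + shuffle before stqd */
    case SK_COND_BRANCH:        return 6;
    default:                    return 1;   /* all other costs */
    }
}

/* Model of the extra global cost: the mispredicted-branch latency is
   charged to the scalar version of the loop.  */
static int
sketch_scalar_path_cost (int scalar_loop_cost)
{
  return scalar_loop_cost + SKETCH_MISPREDICT_LATENCY;
}
```

For example, under this model a scalar loop whose statements sum to a cost
of 40 would be compared against the vectorized version as if it cost 59.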

Bootstrapped with vectorization enabled and tested on the vectorizer
testcases on i386-linux. Also bootstrapped on powerpc-linux (without
fortran, because of http://gcc.gnu.org/ml/gcc/2007-07/msg00038.html).
Also built for the SPU and tested the vectorizer testcases on the SPU.

:ADDPATCH target-builtin,spu:

thanks,
dorit

        * target.h (builtin_vectorization_cost): Add new target builtin.
        * target-def.h (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New.

        * tree-vectorizer.h (TARG_SCALAR_STMT_COST): New.
        (TARG_SCALAR_LOAD_COST, TARG_SCALAR_STORE_COST): New.
        * tree-vect-analyze.c (vect_analyze_slp_instance): Initialize
        uninitialized variables.
        * tree-vect-transform.c (cost_for_stmt): New function.
        (vect_estimate_min_profitable_iters): Call cost_for_stmt instead of
        using cost 1 for all scalar stmts. Be less conservative when
        estimating the number of prologue/epilogue iterations. Call
        targetm.vectorize.builtin_vectorization_cost. Return
        min_profitable_iters-1.
        (vect_model_reduction_cost): Use TARG_SCALAR_TO_VEC_COST for
        initialization cost instead of TARG_VEC_STMT_COST. Use
        TARG_VEC_TO_SCALAR_COST instead of TARG_VEC_STMT_COST for reduction
        epilogue code. Fix epilogue cost computation.

        * config/spu/spu.c (spu_builtin_vectorization_cost): New.
        (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Implement.
        * config/spu/spu.h (TARG_COND_BRANCH_COST, TARG_SCALAR_STMT_COST):
        (TARG_SCALAR_LOAD_COST, TARG_SCALAR_STORE_COST, TARG_VEC_STMT_COST):
        (TARG_VEC_TO_SCALAR_COST, TARG_SCALAR_TO_VEC, TARG_VEC_LOAD_COST):
        (TARG_VEC_UNALIGNED_LOAD_COST, TARG_VEC_STORE_COST): Define.

        * gcc.dg/vect/costmodel/ppc/costmodel-vect-reduc-1char.c: Loops now
        get vectorized.
        * gcc.dg/vect/costmodel/i386/costmodel-vect-reduc-1char.c: Loops now
        get vectorized.

        * gcc.dg/vect/costmodel/spu/spu-costmodel-vect.exp: New.
        * gcc.dg/vect/costmodel/spu/costmodel-fast-math-vect-pr29925.c:
        New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-31a.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-31b.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-31c.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-31d.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-iv-9.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-33.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-76a.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-76b.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-76c.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-68a.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-68b.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-68c.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-68d.c: New.

        * lib/target-supports.exp (check_effective_target_vect_int_mul):
        Add spu.

(See attached file: costmodelfixes2.txt)

Attachment: costmodelfixes2.txt
Description: Text document

Follow-Ups:
- Re: [patch] more vectorizer costmodel fixes/improvements + spu costs (needs review)
  - From: Andrew Pinski
