This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[patch] [autovect] vectorizer cost-model: spu testcases and costs + fixes


This patch adds vectorizer cost model testing for the SPU, and also defines
the cost-model target-specific costs for the spu. While tuning the
cost-model for the SPU I also found a few places where we can be a little
more accurate, so this patch also includes several small fixes/enhancements
to the cost-model itself:

- Use TARG_SCALAR_TO_VEC_COST for the reduction initialization cost
(instead of TARG_VEC_STMT_COST).

- Use TARG_VEC_TO_SCALAR_COST for the reduction finalization cost (instead
of TARG_VEC_STMT_COST).

- Have vect_estimate_min_profitable_iters return the threshold it computed
minus 1 (instead of returning min_profitable_iters), because the way it is
used later is in the following condition:
      "if (niters <= min_profitable_iters) then skip the vectorized loop"
(whereas min_profitable_iters is already expected to be profitable, so we
are being too conservative).

- Differentiate between the costs of different scalar stmts instead of
using a cost 1 for all scalar stmts (even this is probably not enough - we
probably want to be able to have a finer differentiation eventually).

- Add a target-builtin to allow targets to add any additional global costs.

- When we don't know the number of prologue/epilogue iterations we
currently assume the worst (i.e. VF-1). Instead, this patch changes it to
use (VF-1)/2, which is supposed to be "statistically" closer to reality,
with a small bias towards vectorizing. We can consider having different
levels of conservativeness to the cost model, according to a user specified
parameter, which would affect, among other things, whether we estimate the
prologue/epilogue iteration count to be (VF-1)/2, or VF/2, or VF-1 (most
conservative).
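
To make two of the points above concrete, here is a minimal sketch in C. It
is not code from the patch: the function and enum names are hypothetical, and
the use of truncating integer division for (VF-1)/2 and VF/2 is an
assumption. The first part shows why returning min_profitable_iters - 1
matters for the "niters <= threshold" guard; the second shows the three
candidate estimates for an unknown prologue/epilogue iteration count.

```c
#include <stdbool.h>

/* Hypothetical guard mirroring the condition quoted above: the
   vectorized loop is skipped when niters <= threshold.  */
bool
skips_vectorized_loop (int niters, int threshold)
{
  return niters <= threshold;
}

/* Old behaviour: the threshold is min_profitable_iters itself, so a
   loop with exactly min_profitable_iters iterations is skipped.  */
int
threshold_before_fix (int min_profitable_iters)
{
  return min_profitable_iters;
}

/* New behaviour: the threshold is one less, so a loop with exactly
   min_profitable_iters iterations is vectorized.  */
int
threshold_after_fix (int min_profitable_iters)
{
  return min_profitable_iters - 1;
}

/* Hypothetical conservativeness levels for the peel estimate.  */
enum peel_estimate
{
  PEEL_BIAS_TO_VECTORIZE, /* (VF-1)/2, the value this patch uses.  */
  PEEL_HALF_VF,           /* VF/2.  */
  PEEL_WORST_CASE         /* VF-1, the previous assumption.  */
};

/* Estimated prologue/epilogue iterations when the real count is
   unknown.  Integer division truncates (an assumption).  */
int
estimate_peel_iters (int vf, enum peel_estimate level)
{
  switch (level)
    {
    case PEEL_BIAS_TO_VECTORIZE:
      return (vf - 1) / 2;
    case PEEL_HALF_VF:
      return vf / 2;
    default:
      return vf - 1;
    }
}
```

With VF = 4 the three levels give 1, 2 and 3 peeled iterations. With a
computed min_profitable_iters of 8, a loop with exactly 8 iterations is
skipped under the old threshold and vectorized under the new one.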

About the SPU specific stuff - these are the costs that I currently set
(based on tuning only on one benchmark suite, I'm sure this can be further
refined):

- scalar load: 2 (rationale: load + rotate)
- aligned vector load: 1
- unaligned vector load: 2 (rationale: load + shuffle)
- scalar store: 10 (rationale: it takes about 10 cycles for the stqd to
start when doing a scalar store because it's preceded by a load + shuffle
sequence)
- targetm.vectorize.builtin_vectorization_cost: adds the latency of a
mispredicted branch (19) to the cost of choosing the scalar version of the
loop (the cost of following the mispredicted path when skipping the
vectorized loop).
- branch cost: 6 (rationale: somewhere between the latency of a correctly
predicted branch (1) and the latency of an incorrectly predicted branch
(19) relative to the latency of other insns (2-7), i.e. some kind of
"average" over {1,19/7,19/2}).
- all other costs - 1 per insn
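
As a rough illustration of how these numbers add up, the sketch below
charges a simple copy loop (a[i] = b[i]) with the costs listed above. The
SKETCH_* names and both functions are hypothetical and are not the macros or
code from the patch. The sketch also assumes aligned accesses, an iteration
count that is a multiple of VF, and a vector store cost of 1 (taken from
"all other costs"), and it ignores loop overhead and branches.

```c
/* Hypothetical names; the values are the SPU costs listed above.  */
enum
{
  SKETCH_SCALAR_LOAD_COST = 2,
  SKETCH_VEC_LOAD_COST = 1,
  SKETCH_VEC_UNALIGNED_LOAD_COST = 2,
  SKETCH_SCALAR_STORE_COST = 10,
  SKETCH_COND_BRANCH_COST = 6,
  SKETCH_OTHER_INSN_COST = 1
};

/* Cost of niters scalar iterations of a[i] = b[i]:
   one scalar load plus one scalar store per iteration.  */
int
scalar_copy_cost (int niters)
{
  return niters * (SKETCH_SCALAR_LOAD_COST + SKETCH_SCALAR_STORE_COST);
}

/* Cost of the vectorized loop body for the same copy: one aligned
   vector load plus one vector store per VF scalar iterations.
   Assumes niters is a multiple of vf.  */
int
vector_copy_cost (int niters, int vf)
{
  return (niters / vf) * (SKETCH_VEC_LOAD_COST + SKETCH_OTHER_INSN_COST);
}
```

For 16 iterations and VF = 4 this gives a scalar cost of 192 against a
vector body cost of 8, which is dominated by the expensive scalar store.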

At least in the specific benchmark suite I was playing with, it was almost
always better to vectorize on the SPU (which is not surprising). The
interesting thing was that in the few cases when it's not profitable to
vectorize (when the loop count was very small), it was usually better to
fall through to the slower vectorized version of the loop than to jump to
the scalar version of the loop and pay the cost of a mispredicted branch
(which is very painful on the SPU because it has no branch prediction).
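
The trade-off described here can be sketched as a comparison in which the
scalar path is charged the misprediction latency. This is a simplification
under stated assumptions, not the logic in the patch: the function name and
any loop costs passed to it are hypothetical, and only the figure 19 comes
from the cost list above.

```c
#include <stdbool.h>

/* Latency of a mispredicted branch on the SPU, as quoted above.  */
#define SKETCH_MISPREDICT_LATENCY 19

/* Returns true when falling through to the vectorized loop is no worse
   than jumping to the scalar loop, once the scalar path is charged for
   the mispredicted branch taken to reach it.  Both loop costs are in
   the same abstract units as the cost model.  */
bool
prefer_fallthrough_to_vector (int scalar_loop_cost, int vector_loop_cost)
{
  return vector_loop_cost
         <= scalar_loop_cost + SKETCH_MISPREDICT_LATENCY;
}
```

For example, a vectorized loop costing 30 still wins over a scalar loop
costing 24, because the scalar path effectively costs 24 + 19 = 43.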

Bootstrapped with vectorization enabled,
with and without the cost model,
and tested on the vectorizer testcases,
on powerpc-linux and i386-linux.
Also built for the SPU and tested the vectorizer testcases on the SPU.

Committed to autovect-branch.
To be submitted to mainline after the freeze.

Dorit

        * target.h (builtin_vectorization_cost): Add new target builtin.
        * target-def.h (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New.

        * tree-vectorizer.h (TARG_SCALAR_STMT_COST): New.
        (TARG_SCALAR_LOAD_COST, TARG_SCALAR_STORE_COST): New.
        * tree-vect-analyze.c (vect_analyze_slp_instance): Initialize
        uninitialized variables.
        * tree-vect-transform.c (cost_for_stmt): New function.
        (vect_estimate_min_profitable_iters): Call cost_for_stmt instead of
        using cost 1 for all scalar stmts. Be less conservative when
        estimating the number of prologue/epilogue iterations. Call
        targetm.vectorize.builtin_vectorization_cost. Return
        min_profitable_iters-1.
        (vect_model_reduction_cost): Use TARG_SCALAR_TO_VEC_COST for
        initialization cost instead of TARG_VEC_STMT_COST. Use
        TARG_VEC_TO_SCALAR_COST instead of TARG_VEC_STMT_COST for reduction
        epilogue code.

        * config/spu/spu.c (spu_builtin_vectorization_cost): New.
        (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Implement.
        * config/spu/spu.h (TARG_COND_BRANCH_COST, TARG_SCALAR_STMT_COST):
        (TARG_SCALAR_LOAD_COST, TARG_SCALAR_STORE_COST, TARG_VEC_STMT_COST):
        (TARG_VEC_TO_SCALAR_COST, TARG_SCALAR_TO_VEC_COST, TARG_VEC_LOAD_COST):
        (TARG_VEC_UNALIGNED_LOAD_COST, TARG_VEC_STORE_COST): Define.

        * gcc.dg/vect/no-scevccp-outer-18.c: Fix dg-final check: works only
        for vect_interleave targets.
        * gcc.dg/vect/no-scevccp-outer-19.c: Fix dg-final check: works only
        for vect_unpack targets.
        * gcc.dg/vect/no-scevccp-outer-21.c: Likewise.
        * gcc.dg/vect/no-scevccp-outer-16.c: Likewise.
        * gcc.dg/vect/no-scevccp-outer-17.c: Likewise.
        * gcc.dg/vect/vect-outer-2.c: Fix dg-final check: vect_intfloat_cvt.

        * gcc.dg/vect/costmodel/spu/spu-costmodel-vect.exp: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-outer-fir.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-fast-math-vect-pr29925.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-31a.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-31b.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-31c.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-31d.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-iv-9.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-33.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-76a.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-76b.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-76c.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-68a.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-68b.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-68c.c: New.
        * gcc.dg/vect/costmodel/spu/costmodel-vect-68d.c: New.

        * lib/target-supports.exp (check_effective_target_vect_int_mul):
        Add spu.

(See attached file: costmodelfixes.autovect.txt)

