This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[patch][vectorizer, SPU, PPC] Support load permutation in loop-aware SLP


Hi,

The current loop-aware SLP scheme starts from a group of adjacent stores and
follows use-def chains until getting to a group of loads. The loads must be
adjacent and their order must match the order of the stores, i.e., no
permutations are currently allowed.
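
To make the restriction concrete, here is a minimal sketch of two loops. The
function names, element type and constants are made up for illustration and
do not come from the patch or its testcases. The first loop has adjacent
stores fed by adjacent loads in the same order; the second has the same
stores but swaps the order of the loads, which is the kind of permutation the
current scheme does not allow.

```c
#include <stddef.h>

/* Adjacent stores, fed by adjacent loads in the same order as the
   stores: the shape the current scheme expects.  */
void
slp_same_order (unsigned *out, const unsigned *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      out[4 * i + 0] = in[4 * i + 0] + 1;
      out[4 * i + 1] = in[4 * i + 1] + 2;
      out[4 * i + 2] = in[4 * i + 2] + 3;
      out[4 * i + 3] = in[4 * i + 3] + 4;
    }
}

/* Same adjacent stores, but the loads feeding them are pairwise
   swapped, so the load order no longer matches the store order.  */
void
slp_permuted (unsigned *out, const unsigned *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      out[4 * i + 0] = in[4 * i + 1] + 1;
      out[4 * i + 1] = in[4 * i + 0] + 2;
      out[4 * i + 2] = in[4 * i + 3] + 3;
      out[4 * i + 3] = in[4 * i + 2] + 4;
    }
}
```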

This patch adds support for a specific type of load permutation, along
with general support for load permutations in SLP. It aims to vectorize RGB
to YUV conversion, which can be viewed as {y, u, v} = M * {r, g, b}, where M
is a matrix of constant coefficients, and the calculation is performed in a
single-nested loop:
  for i
    yi = M00 * ri + M01 * gi + M02 * bi
    ui = M10 * ri + M11 * gi + M12 * bi
    vi = M20 * ri + M21 * gi + M22 * bi
The required load permutation transforms the rgb stream into {r,r,r},
{g,g,g} and {b,b,b} vectors (ignoring vector size for simplicity).
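
For reference, the loop above might look as follows in C. This is only a
sketch: the function name, the interleaved array layout, the element type and
the placeholder coefficient values are assumptions for illustration, not
taken from the patch or its testcases.

```c
#include <stddef.h>

/* Placeholder coefficients; real RGB-to-YUV values would differ.  */
#define M00 1
#define M01 2
#define M02 3
#define M10 4
#define M11 5
#define M12 6
#define M20 7
#define M21 8
#define M22 9

/* in[] holds n interleaved r,g,b triples; out[] receives n interleaved
   y,u,v triples.  The three stores per iteration are adjacent, and each
   multiplication reads the same r, g or b element three times.  */
void
rgb_to_yuv (unsigned *out, const unsigned *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned r = in[3 * i + 0];
      unsigned g = in[3 * i + 1];
      unsigned b = in[3 * i + 2];

      out[3 * i + 0] = M00 * r + M01 * g + M02 * b;
      out[3 * i + 1] = M10 * r + M11 * g + M12 * b;
      out[3 * i + 2] = M20 * r + M21 * g + M22 * b;
    }
}
```

Written this way, the group of three stores is the SLP seed, and the operand
of each multiplication is one load repeated across the group, which is why
vectors of the form {r,r,r}, {g,g,g} and {b,b,b} are needed.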

The SLP analysis detects such cases: all the loads in the same SLP node
must access the same memory location, and all the SLP nodes that contain
loads must form a group of adjacent memory accesses. The transformation
phase generates vector permutations of the input vectors with
compiler-generated masks, depending on the data type, the vectorization
factor and the size of the SLP nodes.
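
One way to picture the mask generation is the sketch below. It is not the
code from the attached patch; the helper names (perm_stream_index,
perm_fill_mask) and the indexing convention are assumptions. Indices are
positions in the concatenated stream of loaded vectors, so a real two-input
permute instruction would additionally need them rebased to the pair of
vectors it actually reads. For a group of size G, element j of the permuted
vector for node k selects stream position G * (j / G) + k.

```c
#include <assert.h>
#include <stddef.h>

/* Stream position selected by element J of the permuted output for
   load node NODE, where GROUP_SIZE loads are interleaved in memory.
   Hypothetical helper, for illustration only.  */
size_t
perm_stream_index (size_t group_size, size_t node, size_t j)
{
  assert (group_size > 0 && node < group_size);
  return group_size * (j / group_size) + node;
}

/* Fill MASK (NUNITS entries) for the VEC_NO-th output vector of NODE.  */
void
perm_fill_mask (size_t group_size, size_t node, size_t nunits,
                size_t vec_no, size_t *mask)
{
  for (size_t e = 0; e < nunits; e++)
    mask[e] = perm_stream_index (group_size, node, vec_no * nunits + e);
}
```

With a group size of 3 and 4 elements per vector, the r node (node 0) gets
the masks {0,0,0,3}, {3,3,6,6} and {6,9,9,9} under this convention, which
corresponds to {r0,r0,r0,r1}, {r1,r1,r2,r2} and {r2,r3,r3,r3}.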

Bootstrapped with vectorization enabled on ppc-linux and tested on Cell SPU
and ppc-linux.
O.K. for mainline?

Thanks,
Ira

ChangeLog:

      * target.h (struct vectorize): Add new target builtin.
      * tree-vectorizer.h (enum slp_load_perm_type): New.
      (struct _slp_tree): Add new field loads_perm_type.
      (struct _slp_instance): Add new field same_perm_nodes.
      (SLP_INSTANCE_SAME_PERM_NODES): New.
      (SLP_TREE_LOADS_PERM_TYPE, TARG_VEC_PERMUTE_COST): New.
      (vectorizable_load): Add argument.
      (vect_transform_slp_perm_load): New.
      * tree-vect-analyze.c (vect_analyze_operations): Add an argument to
      vectorizable_load.
      (vect_build_slp_tree): Add new argument. Allow load permutations for
      the case when all the loads in the same SLP node access the same
      memory location.
      (vect_analyze_slp_instance): In case of same location loads check
      that the loads from different nodes form an interleaving chain. Sort
      the nodes according to the chain.
      * target-def.h (TARGET_VECTORIZE_BUILTIN_VEC_PERM): New.
      * tree-vect-transform.c (vect_transform_stmt): Add new argument.
      (vectorizable_store): Allow number of created vectors to be greater
      than the size of an interleaving group. Don't go along the
      interleaving chain for SLP.
      (vect_create_mask_and_perm): New function.
      (vect_get_mask_element, vect_transform_slp_perm_load): Likewise.
      (vectorizable_load): Allocate DR_CHAIN according to the number of
      generated vectors. Don't keep the created vector statements in the
      node if permutation is required. Call vect_transform_slp_perm_load to
      generate the permutation.
      (vect_transform_stmt): Add new argument. Call vectorizable_load with
      additional argument. Don't wait for other stores in case of SLP.
      (vect_schedule_slp_instance): Add new argument. Calculate the number
      of vector statements. In case of loads from the same location,
      allocate vectorized statements structure for all the related SLP
      nodes. Call vect_transform_stmt with additional argument.
      (vect_schedule_slp): Remove one argument. Move number of vector
      statements calculation to vect_schedule_slp_instance.
      (vect_transform_loop): Call vect_transform_stmt and vect_schedule_slp
      with correct arguments.
      * config/spu/spu.c (spu_builtin_vec_perm): New.
      (TARGET_VECTORIZE_BUILTIN_VEC_PERM): Redefine.
      * config/spu/spu.h (TARG_VEC_PERMUTE_COST): Define.
      * config/rs6000/rs6000.c (rs6000_builtin_vec_perm): New.
      (TARGET_VECTORIZE_BUILTIN_VEC_PERM): Redefine.

testsuite/ChangeLog:

      * lib/target-supports.exp (check_effective_target_vect_perm): New.
      * gcc.dg/vect/slp-perm-1.c: New testcase.
      * gcc.dg/vect/slp-perm-2.c: Likewise.
      * gcc.dg/vect/slp-perm-3.c: Likewise.
      * gcc.dg/vect/slp-perm-4.c: Likewise.
      * gcc.dg/vect/slp-perm-5.c: Likewise.
      * gcc.dg/vect/slp-perm-6.c: Likewise.
      * gcc.dg/vect/slp-perm-7.c: Likewise.
      * gcc.dg/vect/slp-perm-8.c: Likewise.
      * gcc.dg/vect/slp-perm-9.c: Likewise.

(See attached file: slp-perm.txt)(See attached file: tests.txt)



Attachment: slp-perm.txt
Description: Text document

Attachment: tests.txt
Description: Text document

