This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Don't use permutes for single-element accesses (PR83753)


On Tue, Jan 9, 2018 at 10:59 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> After cunrolling the inner loop, the remaining loop in the testcase
> has a single 32-bit access and a group of 64-bit accesses.  We first
> try to vectorise at 128 bits (VF 4), but decide not to for cost reasons.
> We then try with 64 bits (VF 2) instead.  This means that the group
> of 64-bit accesses uses a single-element vector, which is deliberately
> supported as of r251538.  We then try to create "permutes" for these
> single-element vectors and fall foul of:
>
>               for (i = 0; i < 6; i++)
>                 sel[i] += exact_div (nelt, 2);
>
> in vect_grouped_store_supported, since nelt==1.
>
> Maybe we shouldn't even be trying to vectorise statements in the
> single-element case, and instead just copy the scalar statement
> for each member of the group.  But until then, this patch treats
> non-strided grouped accesses as VMAT_CONTIGUOUS if no permutation
> is necessary.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.

Richard.

> Richard
>
>
> 2018-01-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         PR tree-optimization/83753
>         * tree-vect-stmts.c (get_group_load_store_type): Use VMAT_CONTIGUOUS
>         for non-strided grouped accesses if the number of elements is 1.
>
> gcc/testsuite/
>         PR tree-optimization/83753
>         * gcc.dg/torture/pr83753.c: New test.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2018-01-09 15:46:34.439449019 +0000
> +++ gcc/tree-vect-stmts.c       2018-01-09 18:15:53.481983778 +0000
> @@ -1849,10 +1849,16 @@ get_group_load_store_type (gimple *stmt,
>           && (can_overrun_p || !would_overrun_p)
>           && compare_step_with_zero (stmt) > 0)
>         {
> -         /* First try using LOAD/STORE_LANES.  */
> -         if (vls_type == VLS_LOAD
> -             ? vect_load_lanes_supported (vectype, group_size)
> -             : vect_store_lanes_supported (vectype, group_size))
> +         /* First cope with the degenerate case of a single-element
> +            vector.  */
> +         if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
> +           *memory_access_type = VMAT_CONTIGUOUS;
> +
> +         /* Otherwise try using LOAD/STORE_LANES.  */
> +         if (*memory_access_type == VMAT_ELEMENTWISE
> +             && (vls_type == VLS_LOAD
> +                 ? vect_load_lanes_supported (vectype, group_size)
> +                 : vect_store_lanes_supported (vectype, group_size)))
>             {
>               *memory_access_type = VMAT_LOAD_STORE_LANES;
>               overrun_p = would_overrun_p;
> Index: gcc/testsuite/gcc.dg/torture/pr83753.c
> ===================================================================
> --- /dev/null   2018-01-08 18:48:58.045015662 +0000
> +++ gcc/testsuite/gcc.dg/torture/pr83753.c      2018-01-09 18:15:53.480983817 +0000
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mcpu=xgene1" { target aarch64*-*-* } } */
> +
> +typedef struct {
> +  int m1[10];
> +  double m2[10][8];
> +} blah;
> +
> +void
> +foo (blah *info) {
> +  int i, d;
> +
> +  for (d=0; d<10; d++) {
> +    info->m1[d] = 0;
> +    info->m2[d][0] = 1;
> +    for (i=1; i<8; i++)
> +      info->m2[d][i] = 2;
> +  }
> +}
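
For readers following the quoted explanation, here is a minimal sketch in C of the arithmetic and the decision order it describes. It is a simplified model, not GCC's code: the enum, `vector_lanes`, `exact_div_ok` and `choose_access` are hypothetical names introduced for illustration, and the final "permute" step is an assumption about what follows the hunk shown in the diff. It shows why a 64-bit vector holds only one 64-bit element, why dividing that element count by 2 cannot be exact, and how checking for the single-element case first avoids the permute path.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the access kinds named in the patch
   (VMAT_ELEMENTWISE, VMAT_CONTIGUOUS, VMAT_LOAD_STORE_LANES).  The
   permute kind is an assumption about the code after the hunk.  */
enum access_kind
{
  ACCESS_ELEMENTWISE,
  ACCESS_CONTIGUOUS,
  ACCESS_LOAD_STORE_LANES,
  ACCESS_CONTIGUOUS_PERMUTE
};

/* Number of elements of ELEM_BITS bits in a vector of VECTOR_BITS bits.  */
unsigned
vector_lanes (unsigned vector_bits, unsigned elem_bits)
{
  return vector_bits / elem_bits;
}

/* Model of an exact division: true only when B divides A evenly.  */
bool
exact_div_ok (unsigned a, unsigned b)
{
  return b != 0 && a % b == 0;
}

/* Model of the decision order after the patch: the single-element
   case is handled first, so the later steps are never reached for
   NELT == 1.  */
enum access_kind
choose_access (unsigned nelt, bool lanes_supported, bool permute_supported)
{
  enum access_kind kind = ACCESS_ELEMENTWISE;

  /* Degenerate case: a one-element vector needs no permutation.  */
  if (nelt == 1)
    kind = ACCESS_CONTIGUOUS;

  /* Otherwise try load/store lanes.  */
  if (kind == ACCESS_ELEMENTWISE && lanes_supported)
    kind = ACCESS_LOAD_STORE_LANES;

  /* Assumed follow-on step: permutes, which need NELT / 2 to be exact.  */
  if (kind == ACCESS_ELEMENTWISE && permute_supported
      && exact_div_ok (nelt, 2))
    kind = ACCESS_CONTIGUOUS_PERMUTE;

  return kind;
}
```

In this model, `vector_lanes (128, 32)` gives 4 and `vector_lanes (64, 32)` gives 2, matching the VF values in the quoted text, while `vector_lanes (64, 64)` gives 1. `exact_div_ok (1, 2)` is false, which corresponds to the nelt==1 failure, and `choose_access (1, ...)` returns `ACCESS_CONTIGUOUS` regardless of the other arguments.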

