This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Don't use permutes for single-element accesses (PR83753)
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Sandiford <richard dot sandiford at linaro dot org>
- Date: Wed, 10 Jan 2018 14:02:15 +0100
- Subject: Re: Don't use permutes for single-element accesses (PR83753)
- References: <87lgh6bsmr.fsf@linaro.org>
On Tue, Jan 9, 2018 at 10:59 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> After cunrolling the inner loop, the remaining loop in the testcase
> has a single 32-bit access and a group of 64-bit accesses. We first
> try to vectorise at 128 bits (VF 4), but decide not to for cost reasons.
> We then try with 64 bits (VF 2) instead. This means that the group
> of 64-bit accesses uses a single-element vector, which is deliberately
> supported as of r251538. We then try to create "permutes" for these
> single-element vectors and fall foul of:
>
>   for (i = 0; i < 6; i++)
>     sel[i] += exact_div (nelt, 2);
>
> in vect_grouped_store_supported, since nelt==1.
>
> Maybe we shouldn't even be trying to vectorise statements in the
> single-element case, and instead just copy the scalar statement
> for each member of the group. But until then, this patch treats
> non-strided grouped accesses as VMAT_CONTIGUOUS if no permutation
> is necessary.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?
Ok.
Richard.
> Richard
>
>
> 2018-01-09 Richard Sandiford <richard.sandiford@linaro.org>
>
> gcc/
>         PR tree-optimization/83753
>         * tree-vect-stmts.c (get_group_load_store_type): Use VMAT_CONTIGUOUS
>         for non-strided grouped accesses if the number of elements is 1.
>
> gcc/testsuite/
>         PR tree-optimization/83753
>         * gcc.dg/torture/pr83753.c: New test.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c 2018-01-09 15:46:34.439449019 +0000
> +++ gcc/tree-vect-stmts.c 2018-01-09 18:15:53.481983778 +0000
> @@ -1849,10 +1849,16 @@ get_group_load_store_type (gimple *stmt,
>                && (can_overrun_p || !would_overrun_p)
>                && compare_step_with_zero (stmt) > 0)
>              {
> -              /* First try using LOAD/STORE_LANES. */
> -              if (vls_type == VLS_LOAD
> -                  ? vect_load_lanes_supported (vectype, group_size)
> -                  : vect_store_lanes_supported (vectype, group_size))
> +              /* First cope with the degenerate case of a single-element
> +                 vector. */
> +              if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
> +                *memory_access_type = VMAT_CONTIGUOUS;
> +
> +              /* Otherwise try using LOAD/STORE_LANES. */
> +              if (*memory_access_type == VMAT_ELEMENTWISE
> +                  && (vls_type == VLS_LOAD
> +                      ? vect_load_lanes_supported (vectype, group_size)
> +                      : vect_store_lanes_supported (vectype, group_size)))
>                  {
>                    *memory_access_type = VMAT_LOAD_STORE_LANES;
>                    overrun_p = would_overrun_p;
> Index: gcc/testsuite/gcc.dg/torture/pr83753.c
> ===================================================================
> --- /dev/null 2018-01-08 18:48:58.045015662 +0000
> +++ gcc/testsuite/gcc.dg/torture/pr83753.c 2018-01-09 18:15:53.480983817 +0000
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mcpu=xgene1" { target aarch64*-*-* } } */
> +
> +typedef struct {
> +  int m1[10];
> +  double m2[10][8];
> +} blah;
> +
> +void
> +foo (blah *info) {
> +  int i, d;
> +
> +  for (d=0; d<10; d++) {
> +    info->m1[d] = 0;
> +    info->m2[d][0] = 1;
> +    for (i=1; i<8; i++)
> +      info->m2[d][i] = 2;
> +  }
> +}
> +}
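
For readers following the thread, the failure and the patched order of
checks can be sketched in standalone C. This is only an illustrative
model, not GCC code: exact_div_ok, choose_access and the ACCESS_* names
are hypothetical stand-ins, and the permute fallback after the lanes
check is assumed from the description in the quoted message rather than
shown in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the vectoriser's memory access kinds.
   Only the spirit of the names follows the patch.  */
enum access_kind {
  ACCESS_ELEMENTWISE,
  ACCESS_CONTIGUOUS,
  ACCESS_LOAD_STORE_LANES,
  ACCESS_CONTIGUOUS_PERMUTE
};

/* Model of an exact division: it only succeeds when B divides A.
   With A == 1 and B == 2 (the nelt == 1 case) it cannot succeed.  */
static bool
exact_div_ok (unsigned a, unsigned b, unsigned *quot)
{
  if (b == 0 || a % b != 0)
    return false;
  *quot = a / b;
  return true;
}

/* Simplified model of the patched order of checks for a non-strided
   grouped access.  NELT is the number of vector elements; the two
   flags stand in for the target support queries.  The permute step is
   an assumption about what follows the quoted hunk.  */
static enum access_kind
choose_access (unsigned nelt, bool lanes_supported, bool permute_supported)
{
  enum access_kind kind = ACCESS_ELEMENTWISE;

  /* Degenerate case first: a single-element vector needs no permute.  */
  if (nelt == 1)
    kind = ACCESS_CONTIGUOUS;

  /* Otherwise prefer load/store lanes when the target has them.  */
  if (kind == ACCESS_ELEMENTWISE && lanes_supported)
    kind = ACCESS_LOAD_STORE_LANES;

  /* Only now would permutes be considered; this is the path that
     divides nelt by 2.  */
  if (kind == ACCESS_ELEMENTWISE && permute_supported)
    kind = ACCESS_CONTIGUOUS_PERMUTE;

  return kind;
}
```

In this model, choose_access (1, ...) settles on ACCESS_CONTIGUOUS before
the permute check is reached, so the exact division of 1 by 2 is never
attempted.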