This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [RFC] S/390: Alignment peeling prolog generation

From: "Bin.Cheng" <amker dot cheng at gmail dot com>
To: Robin Dapp <rdapp at linux dot vnet dot ibm dot com>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
Date: Tue, 11 Apr 2017 15:57:29 +0100
Subject: Re: [RFC] S/390: Alignment peeling prolog generation
Authentication-results: sourceware.org; auth=none
References: <0296a54f-cb8d-d9b8-380a-9cc553dbb6da@linux.vnet.ibm.com>

On Tue, Apr 11, 2017 at 3:38 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> Hi,
>
> when looking at various vectorization examples on s390x I noticed that
> we still peel vf/2 iterations for alignment even though vectorization
> costs of unaligned loads and stores are the same as normal loads/stores.
>
> A simple example is
>
> void foo(int *restrict a, int *restrict b, unsigned int n)
> {
>   for (unsigned int i = 0; i < n; i++)
>     {
>       b[i] = a[i] * 2 + 1;
>     }
> }
>
> which gets peeled unless __builtin_assume_aligned (a, 8) is used.
>
> In tree-vect-data-refs.c there are several checks that involve costs in
> the peeling decision, none of which seems to suffice in this case. For a
> loop with only read DRs there is a check that has been triggering (i.e.
> disable peeling) since we implemented the vectorization costs.
>
> Here, we have DR_MISALIGNMENT (dr) == -1 for all DRs but the costs
> should still dictate to never peel. I attached a tentative patch for
> discussion which fixes the problem by checking the costs for npeel = 0
> and npeel = vf/2 after ensuring we support all misalignments. Is there a
> better way and place to do it? Are we missing something somewhere else
> that would preclude the peeling from happening?
>
> This is not intended for stage 4 obviously :)
Hi Robin,
It seems Richi added code like the following, which compares the costs of
aligned and unaligned loads and only peels if it is beneficial:

      /* In case there are only loads with different unknown misalignments, use
         peeling only if it may help to align other accesses in the loop or
         if it may help improving load bandwidth when we'd end up using
         unaligned loads.  */
      tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
      if (!first_store
          && !STMT_VINFO_SAME_ALIGN_REFS (
                  vinfo_for_stmt (DR_STMT (dr0))).length ()
          && (vect_supportable_dr_alignment (dr0, false)
              != dr_unaligned_supported
              || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
                  == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
        do_peeling = false;

I think similar code can be added for the store case too.
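
To make the idea concrete, here is a minimal standalone sketch of a cost
check that covers both loads and stores. It is a toy model, not GCC code:
`enum access_kind`, `struct target_costs`, `peeling_may_help` and
`should_peel` are hypothetical names invented for illustration, and the
real decision in tree-vect-data-refs.c involves more conditions (supported
misalignments, same-alignment references, and so on).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a target's vectorization cost table.
   These are NOT GCC's real types; they only model the comparison.  */
enum access_kind
{
  ALIGNED_LOAD,
  UNALIGNED_LOAD,
  ALIGNED_STORE,
  UNALIGNED_STORE
};

struct target_costs
{
  int cost[4];   /* Indexed by enum access_kind.  */
};

/* Peeling for alignment can only pay off for one kind of access if the
   unaligned variant is strictly more expensive than the aligned one.  */
static bool
peeling_may_help (const struct target_costs *t, bool is_store)
{
  enum access_kind aligned = is_store ? ALIGNED_STORE : ALIGNED_LOAD;
  enum access_kind unaligned = is_store ? UNALIGNED_STORE : UNALIGNED_LOAD;
  return t->cost[unaligned] > t->cost[aligned];
}

/* Assumed policy for this sketch: peel only if at least one kind of
   access that actually occurs in the loop would get cheaper.  */
static bool
should_peel (const struct target_costs *t, bool has_loads, bool has_stores)
{
  return (has_loads && peeling_may_help (t, false))
         || (has_stores && peeling_may_help (t, true));
}
```

With a cost table where all four entries are equal, as described for s390x
above, should_peel returns false for a loop like foo that has both loads
and stores.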

Thanks,
bin
>
> Regards
>  Robin

Follow-Ups:
- Re: [RFC] S/390: Alignment peeling prolog generation
  - From: Robin Dapp
- Re: [RFC] S/390: Alignment peeling prolog generation
  - From: Richard Biener

References:
- [RFC] S/390: Alignment peeling prolog generation
  - From: Robin Dapp
