This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [RFC][tree-vect]PR 88915: Further vectorize second loop when versioning


On Mon, 15 Jul 2019, Andre Vieira (lists) wrote:

> 
> 
> On 12/07/2019 11:19, Richard Biener wrote:
> > On Thu, 11 Jul 2019, Andre Vieira (lists) wrote:
> > 
> > 
> > I think for code-size reasons it would make sense to do it like
> > 
> >    if (iterations_check_for_lowest_VF ())
> >      {
> >        if (alias_check_for_highest_VF ())
> >          {
> >            vectorized_for_highest_VF ();
> >            vectorized epilogues;
> >          }
> >      }
> > 
> > and have the vectorized_for_highest_VF loop skipped, falling through
> > to the vectorized epilogues, when the number of iterations is too
> > small to enter it.
> 
> Are you suggesting we only make the distinction between highest and lowest VF?
> Why not something like:
> 
> if (alias_check_for_highest_VF ())
> {
>   if (iterations_check_VF_0 ())
>     goto VF_0;
>   else if (iterations_check_VF_1 ())
>     goto VF_1;
>   else if (iterations_check_VF_2 ())
>     goto VF_2;
>   ...
> VF_0:
>  vectorized_for_vf_0();
> VF_1:
>  vectorized_for_vf_1();
> VF_2:
>  vectorized_for_vf_2();
> ...
> }
> else
> {
>   goto scalar_loop;
> }

I think it will actually do it this way via the epilogue vectorization
path.
 
> I'll go have a look at how to best do this. The benefit of the earlier
> approach is that it was able to use a lot of the existing vectorizer code to
> get it done.
>
> I have code that can split the condition and alias checks in
> 'vect_loop_versioning'. For this approach I am considering keeping that bit of
> code and seeing if I can patch up the checks after vectorizing the epilogue
> further. Initially I will probably go with a "hacked up" way of passing
> down the bb with the iteration check and splitting its false edge every time we
> vectorize the epilogue further. Will keep you posted on progress. If you have any
> pointers/tips they are most welcome!

My idea was to somehow make the vectorizer believe we have a prologue
loop, so that it creates the check on the number of vectorized
iterations and the branch around the main (highest VF) vectorized loop.
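A rough C model of that check and branch, for a loop like a[i] += b[i],
may help. It is only a sketch under stated assumptions: a main vector
loop at VF 8 and a single vectorized epilogue at VF 4 (both hypothetical
choices), vector iterations modelled as inner loops over the lanes, and
the versioning (alias) check assumed to have passed already. The
function name add_arrays_versioned is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical VFs: a main loop at 8 lanes, one epilogue at 4.  */
enum { MAIN_VF = 8, EPIL_VF = 4 };

/* a[i] += b[i] for i < n, laid out as main vector loop, vectorized
   epilogue, scalar epilogue.  Vector iterations are modelled as inner
   loops over the lanes.  Assumes the alias/versioning check already
   passed.  Returns a bitmask: bit 0 set if the main loop ran, bit 1
   set if the vectorized epilogue ran.  */
static unsigned
add_arrays_versioned (int *a, const int *b, size_t n)
{
  unsigned ran = 0;
  size_t i = 0;

  /* Check on the number of main-loop vector iterations: when it is
     zero, branch around the main loop straight to the epilogue.  */
  size_t main_iters = n / MAIN_VF;
  if (main_iters != 0)
    {
      ran |= 1u;
      for (size_t v = 0; v < main_iters; v++, i += MAIN_VF)
        for (size_t lane = 0; lane < MAIN_VF; lane++)
          a[i + lane] += b[i + lane];
    }

  /* Vectorized epilogue.  Fewer than MAIN_VF = 2 * EPIL_VF iterations
     remain here, so it runs at most once.  */
  size_t epil_iters = (n - i) / EPIL_VF;
  assert (epil_iters <= 1);
  if (epil_iters != 0)
    {
      ran |= 2u;
      for (size_t v = 0; v < epil_iters; v++, i += EPIL_VF)
        for (size_t lane = 0; lane < EPIL_VF; lane++)
          a[i + lane] += b[i + lane];
    }

  /* Scalar loop for the last (fewer than EPIL_VF) iterations.  */
  for (; i < n; i++)
    a[i] += b[i];

  return ran;
}
```

With n = 21 both vector stages run (16 + 4 iterations, 1 scalar); with
n = 7 the main loop is branched around and only the epilogue and the
scalar loop run.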

> > 
> > The advantage is that this would just use the epilogue vectorization
> > code and it would avoid excessive code growth if you have many
> > VFs to consider (on x86 we now have 8 byte, 16 byte, 32 byte and
> > 64 byte vectors...).  The disadvantage is of course that a small
> > number of loops will not enter the vector code at all - namely
> > those that would pass the alias check for lowest_VF but not the
> > one for highest_VF.  I'm sure this isn't a common situation and
> > in quite a number of cases we formulate the alias check in a way
> > that it isn't dependent on the VF anyways.
> 
> The code growth is indeed a factor and I can see the argument for choosing
> this approach over the other. Cases of such specific overlaps are most likely
> oddities rather than the common situation.

Yeah, it also looks simplest to me (and a motivation to enable
epilogue vectorization by default).
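To make the disadvantage discussed above concrete: some runtime alias
checks do depend on the VF. As a hypothetical example, take
a[i] += b[i] where a and b point into the same int array and a starts d
elements after b, so a[i] is the same location as b[i + d]. The scalar
loop stores a[i] before iteration i + d loads b[i + d]; a vector
iteration of vf lanes with 0 < d < vf performs that load before the
store. The sketch below (alias_check_ok is a made-up name, and int-aligned
pointers into one array are assumed) shows a distance that passes the
check for a low VF but fails it for a high one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VF-dependent runtime alias check for a[i] += b[i],
   assuming a and b are int-aligned pointers into the same array.
   d is the distance from b to a in elements.  A zero or backward
   distance is harmless; a forward distance is safe only if no lane
   loads an element that an earlier lane of the same vector iteration
   should have stored first, i.e. d >= vf.  */
static int
alias_check_ok (const int *a, const int *b, ptrdiff_t vf)
{
  assert (vf > 0);
  intptr_t d = ((intptr_t) a - (intptr_t) b) / (intptr_t) sizeof (int);
  return d <= 0 || d >= vf;
}
```

With VFs of 16 and 4, a distance of 4 passes the lowest-VF check but
fails the highest-VF one, so with only the highest-VF check emitted
such a loop would run entirely in the scalar version. A check that the
full address ranges touched by the loop are disjoint, by contrast, does
not involve the VF at all.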

> > There's also possibly
> > an extra branch for the case the highest_VF loop isn't entered
> > (unless there already was a prologue loop).
> I don't understand this one; can you elaborate?

The branch around the main vectorized loop I talked about above.
So I'd fool the versioning condition to use the lowest VF for
the iteration count checking and use the code that handles
zero-trip iteration count for the vector loop unconditionally.
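Why the lowest VF is enough for the versioning iteration check can be
seen with a small model. It assumes each stage (the main loop, then the
vectorized epilogues in decreasing VF order) runs as many whole vectors
as fit in the remaining iterations and is skipped when none fit. Then
any niters >= the lowest VF enters at least one vector stage: if every
earlier stage were skipped, the full count would reach the lowest-VF
stage. The function names and VF values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: vfs[0] is the main-loop VF, vfs[1..nvfs-1] are
   vectorized epilogue VFs in decreasing order.  Each stage runs
   remaining / vfs[k] vector iterations (zero-trip if that is 0).
   Returns how many vector stages ran; *scalar_left receives the
   iterations left for the scalar loop.  */
static size_t
vector_stages_entered (unsigned niters, const unsigned *vfs, size_t nvfs,
                       unsigned *scalar_left)
{
  size_t entered = 0;
  unsigned remaining = niters;
  for (size_t k = 0; k < nvfs; k++)
    {
      assert (vfs[k] > 0);
      unsigned nvec = remaining / vfs[k];
      if (nvec != 0)
        entered++;
      remaining -= nvec * vfs[k];
    }
  *scalar_left = remaining;
  return entered;
}

/* Versioning condition on the iteration count, using the lowest VF
   (the last entry) instead of the main loop's VF.  */
static int
niter_version_check (unsigned niters, const unsigned *vfs, size_t nvfs)
{
  assert (nvfs > 0);
  return niters >= vfs[nvfs - 1];
}
```

For VFs {16, 8, 2}, niters = 5 skips the first two stages and runs the
VF-2 epilogue, leaving 1 scalar iteration.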

In some way this makes checking the niter condition in the versioning
check pointless (at least if we have a really low lowest VF, like on
x86-64 where it will likely be 2), so we may want to elide it
completely?  For the check to be "correct" we'd also need to compute
the lowest VF at which a vectorized epilogue is still profitable (on
x86 those will run once or never, but we can also end up with, say,
main AVX512 vectorization and a single vectorized epilogue with SSE2
if we somehow figure 256-bit AVX vectorization isn't profitable for it;
we can also end up with a non-vectorizable epilogue).  So, given how we
currently vectorize epilogues, we may want to keep track of the
location of the versioning niter check so that we can "patch it up"
later, after (not) vectorizing the epilogue(s).
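One way to picture the "patch up" step: record where the versioning
niter threshold lives when the check is emitted, start it at the main
loop's VF, and lower it once it is known which epilogues were actually
vectorized and profitable. The struct and function names below are
hypothetical; this only models the bookkeeping, not GCC's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of an emitted versioning check: the constant
   threshold in "if (niters >= threshold)" that may be rewritten.  */
struct niter_check
{
  unsigned threshold;
};

/* Emit the check with the conservative main-loop VF.  */
static void
emit_niter_check (struct niter_check *chk, unsigned main_vf)
{
  assert (main_vf > 0);
  chk->threshold = main_vf;
}

/* After trying to vectorize the epilogues: vfs[k] is the k-th candidate
   epilogue VF and vectorized[k] says whether that epilogue was
   vectorized and deemed profitable.  Lower the threshold to the
   smallest VF that actually has vector code; otherwise leave it.  */
static void
patch_niter_check (struct niter_check *chk, const unsigned *vfs,
                   const bool *vectorized, size_t n)
{
  for (size_t k = 0; k < n; k++)
    if (vectorized[k] && vfs[k] < chk->threshold)
      chk->threshold = vfs[k];
}
```

For 32-bit elements, an AVX512 main loop (VF 16) with only an SSE2
epilogue (VF 4) would end with a threshold of 4, while a
non-vectorizable epilogue would leave it at 16.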

Richard.

