This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC][tree-vect]PR 88915: Further vectorize second loop when versioning




On 12/07/2019 11:19, Richard Biener wrote:
On Thu, 11 Jul 2019, Andre Vieira (lists) wrote:


I think for code-size reason it would make sense to do it like

   if (iterations_check_for_lowest_VF ())
     {
       if (alias_check_for_highest_VF ())
         {
           vectorized_for_highest_VF ();
           vectorized_epilogues ();
         }
     }

and make the vectorized_for_highest_VF loop skipped, falling through
to the vectorized epilogues, when the number of iterations isn't
enough to hit it.
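
The control flow suggested above can be sketched as a small C model. This is only an illustration, not vectorizer output: the VFs of 8 and 4, the names run_versioned and stage_counts, and the caller-supplied alias flag are all assumptions made for the example, and the inner scalar loops merely stand in for vector operations.

```c
#include <stddef.h>

/* Hypothetical VFs: one wide main loop and one narrower epilogue.  */
enum { HIGHEST_VF = 8, LOWEST_VF = 4 };

/* How many iterations each stage handled.  */
struct stage_counts
{
  size_t main_vf;
  size_t epilogue_vf;
  size_t scalar;
};

/* Model of the nested checks.  The alias check result is passed in by the
   caller (an assumption; a real check would compare addresses).  */
struct stage_counts
run_versioned (int *dst, const int *src, size_t n, int alias_ok_for_highest_vf)
{
  struct stage_counts c = { 0, 0, 0 };
  size_t i = 0;

  if (n >= LOWEST_VF)                   /* iterations check for lowest VF */
    {
      if (alias_ok_for_highest_vf)      /* alias check for highest VF */
        {
          /* Main loop: runs zero times when n < HIGHEST_VF, so control
             falls through to the epilogue.  */
          for (; i + HIGHEST_VF <= n; i += HIGHEST_VF)
            {
              for (size_t k = 0; k < HIGHEST_VF; k++)
                dst[i + k] = src[i + k] + 1;
              c.main_vf += HIGHEST_VF;
            }

          /* Vectorized epilogue at the lower VF.  */
          for (; i + LOWEST_VF <= n; i += LOWEST_VF)
            {
              for (size_t k = 0; k < LOWEST_VF; k++)
                dst[i + k] = src[i + k] + 1;
              c.epilogue_vf += LOWEST_VF;
            }
        }
    }

  /* Scalar loop: the remainder, or everything if a check failed.  */
  for (; i < n; i++)
    {
      dst[i] = src[i] + 1;
      c.scalar++;
    }
  return c;
}
```

With n = 6 and a passing alias check, the main loop is skipped and the epilogue handles four iterations, leaving two for the scalar loop. With a failing alias check, all iterations go to the scalar loop even though a lower VF might have been safe.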

Are you suggesting we only distinguish between the highest and lowest VF? Why not something like:

if (alias_check_for_highest_VF ())
{
  if (iterations_check_VF_0 ())
    goto VF_0;
  else if (iterations_check_VF_1 ())
    goto VF_1;
  else if (iterations_check_VF_2 ())
    goto VF_2;
  ...
VF_0:
 vectorized_for_vf_0();
VF_1:
 vectorized_for_vf_1();
VF_2:
 vectorized_for_vf_2();
...
}
else
{
  goto scalar_loop;
}
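
As a rough, compilable model of the dispatch above, the following C sketch only counts how many iterations each stage would take rather than doing real work. The VFs of 16, 8 and 4, the name dispatch_by_vf and the handled array are hypothetical. The pseudocode does not say what happens when no iterations check passes, so jumping to the scalar loop in that case is an assumption.

```c
#include <stddef.h>

/* Hypothetical VFs for VF_0, VF_1 and VF_2.  */
enum { VF0 = 16, VF1 = 8, VF2 = 4 };

/* Fills handled[0..2] with the iterations taken by each vector stage and
   returns the number of iterations left for the scalar loop.  */
size_t
dispatch_by_vf (size_t n, int alias_ok_for_highest_vf, size_t handled[3])
{
  size_t i = 0;
  handled[0] = handled[1] = handled[2] = 0;

  if (!alias_ok_for_highest_vf)
    goto scalar_loop;

  /* Jump to the widest stage whose iterations check passes.  */
  if (n >= VF0)
    goto VF_0;
  else if (n >= VF1)
    goto VF_1;
  else if (n >= VF2)
    goto VF_2;
  else
    goto scalar_loop;   /* assumption: not specified in the pseudocode */

VF_0:
  for (; i + VF0 <= n; i += VF0)
    handled[0] += VF0;
VF_1:                   /* each stage falls through to the narrower one */
  for (; i + VF1 <= n; i += VF1)
    handled[1] += VF1;
VF_2:
  for (; i + VF2 <= n; i += VF2)
    handled[2] += VF2;

scalar_loop:
  return n - i;
}
```

For n = 30 this takes 16, 8 and 4 iterations in the three stages and leaves 2 for the scalar loop; for n = 9 it enters directly at VF_1.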

I'll go have a look at how best to do this. The benefit of the earlier approach was that it could reuse a lot of the existing vectorizer code.

I have code that can split the condition and alias checks in 'vect_loop_versioning'. For this approach I am considering keeping that bit of code and seeing whether I can patch up the checks after vectorizing the epilogue further. Initially I think I will just go with a "hacked up" approach: pass down the bb containing the iteration check and split its false edge every time we vectorize the epilogue further. I will keep you posted on progress. Any pointers or tips are most welcome!


The advantage is that this would just use the epilogue vectorization
code and it would avoid excessive code growth if you have many
VFs to consider (on x86 we now have 8 byte, 16 byte, 32 byte and
64 byte vectors...).  The disadvantage is of course that a small
number of loops will not enter the vector code at all - namely
those that would pass the alias check for lowest_VF but not the
one for highest_VF.  I'm sure this isn't a common situation and
in quite a number of cases we formulate the alias check in a way
that it isn't dependent on the VF anyways.
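
To make the "passes for lowest_VF but not for highest_VF" case concrete, here is a toy dependence-distance model in C. It is not how GCC formulates its runtime alias checks; the loop a[i + dist] = a[i] + 1 and the names alias_check_ok and blocked_matches_scalar are assumptions chosen for the example. A block of vf loads followed by vf stores stands in for one vector iteration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { N = 32, MAX_DIST = 8 };

/* Toy check: a store that lands 'dist' elements ahead of the load is safe
   for a given VF if it is not ahead at all, or at least VF ahead.  */
bool
alias_check_ok (ptrdiff_t dist, size_t vf)
{
  return dist <= 0 || (size_t) dist >= vf;
}

/* Runs a[i + dist] = a[i] + 1 once element by element and once in blocks
   of vf (all loads of a block before its stores) and compares the results.
   Requires dist <= MAX_DIST, vf <= MAX_DIST and vf dividing N.  */
bool
blocked_matches_scalar (size_t dist, size_t vf)
{
  int scalar[N + MAX_DIST] = { 0 };
  int blocked[N + MAX_DIST] = { 0 };
  int tmp[MAX_DIST];

  for (size_t i = 0; i < N; i++)
    scalar[i + dist] = scalar[i] + 1;

  for (size_t i = 0; i < N; i += vf)
    {
      for (size_t k = 0; k < vf; k++)   /* "vector load" */
        tmp[k] = blocked[i + k];
      for (size_t k = 0; k < vf; k++)   /* "vector store" */
        blocked[i + k + dist] = tmp[k] + 1;
    }

  return memcmp (scalar, blocked, sizeof scalar) == 0;
}
```

In this model a distance of 4 passes the check for a VF of 4 but fails it for a VF of 8, and the blocked execution agrees with the scalar one only in the former case.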

The code growth is indeed a factor and I can see the argument for choosing this approach over the other. Cases of such specific overlaps are most likely oddities rather than the common situation.



There's also possibly
an extra branch for the case the highest_VF loop isn't entered
(unless there already was a prologue loop).

I don't understand this one, can you elaborate?

Cheers,
Andre

