This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC][tree-vect]PR 88915: Further vectorize second loop when versioning




On 30/07/2019 13:16, Andre Vieira (lists) wrote:
Hi Richard,

I believe this is in line with what you were explaining to me earlier. The one thing I haven't quite done here is the jump around for if there is no prolog peeling. Though I think skip_vectors introduces the jumping we need.

The main gist of it is:
1) if '--param vect-epilogues-nomask=1' is passed we refrain from adding the versioning threshold check when versioning a loop at first, 2) later we allow the vectorizer to use skip_vectors to basically check niters to be able to skip the vectorized loop on to the right epilogue, 3) after vectorizing the epilogue, we check whether this was a versioned loop and whether we skipped adding the versioning threshold, if so we add a threshold based on the last VF

There is a caveat here, if we don't vectorize the epilogue, because say our architecture doesn't know how, we will end up with a regression. Let's say we have a loop for which our target can vectorize for 32x but not 16x, we get:

if (alias & threshold_for_16x ())
{
   if (niters > 31)
     vectorized_31 ();
   else
     scalar ();
}
else
  scalar ();

Imagine our iteration count is 18, all we hit is the scalar loop, but now we go through 1x alias and 2x niters. Whereas potentially we could have done with just 1x niters.

A mitigation for this could be to only add the threshold check if we actually vectorized the last loop, I believe a: 'if (LOOP_VINFO_VECTORIZABLE_P (loop_vinfo)) return NULL;' inside that new "else" in vect_transform_loop would do the trick. Though then we will still have that extra alias check... > I am going to hack around in the back-end to "fake" a situation like this and see what happens.

Right should have quickly tried this before sending the email, what happens is actually vect_transform_loop never gets called because try_vectorize_loop_1 will recognize it can't vectorize the epilogue. So we end up with the "mitigation" result I suggested, where we simply don't have a versioning threshold which might still not be ideal.


Another potential issue arises when the profitability threshold obtained from the cost model isn't guaranteed to be either LE or EQ to the versioning threshold. In that case there are two separate checks, which now we no longer will attempt to fold.

And I just realized I don't take the cost model threshold into account when creating the new threshold check, I guess if it is ordered_p we should again take the max there between the new threshold check and the cost threshold for the new check.


I am trying to find a way to test and benchmark these changes. Unfortunately I am having trouble doing this for AVX512 as I find that the old '--param vect_epilogues_nomask=1' already causes wrong codegen in SPEC2017 for the gcc and perlbench benchmarks.


Cheers,
Andre

On 19/07/2019 13:38, Andre Vieira (lists) wrote:


On 19/07/2019 12:35, Richard Biener wrote:
On Fri, 19 Jul 2019, Andre Vieira (lists) wrote:



On 15/07/2019 11:54, Richard Biener wrote:
On Mon, 15 Jul 2019, Andre Vieira (lists) wrote:



On 12/07/2019 11:19, Richard Biener wrote:
On Thu, 11 Jul 2019, Andre Vieira (lists) wrote:


I have code that can split the condition and alias checks in
'vect_loop_versioning'. For this approach I am considering keeping that
bit of
code and seeing if I can patch up the checks after vectorizing the
epilogue
further. I think initially I will just go with a "hacked up" way of
passing
down the bb with the iteration check and split the false edge every time
we
vectorize it further. Will keep you posted on progress. If you have any
pointers/tips they are most welc ome!

I thought to somehow force the idea that we have a prologue loop
to the vectorizer so it creates the number-of-vectorized iterations
check and branch around the main (highest VF) vectorized loop.


Hmm I think I may have skimmed over this earlier. I am reading it now and am
not entirely sure what you mean by this. Specifically the "number of
vectorized iterations" or how a prologue loop plays a role in this.

When there is no prologue then the versioning condition currently
ensures we always enter the vector loop.  Only if there is a prologue
is there a check and branch eventually skipping right to the epilogue.
If the versioning condition (now using a lower VF) doesn't guarantee
this we have to add this jump-around.

Right, I haven't looked at the prologue path yet. I had a quick look and can't find where this branch skipping to the epilogue is constructed.  I will take a better look after I got my current example to work.


I guess this sheds some light on the comment above. And it definitely implies we would need to know the lowest VF when creating this condition. Which is
tricky.

We can simply use the smallest vector size supported by the target to
derive it from the actual VF, no?

So I could wait to introduce this check after all epilogue vectorization is done, back track to the last niter check and duplicate that in the outer loop.

What I didn't want to do was use the smallest possible vector size for the target because I was under the impression that does not necessarily correspond to the smallest VF we may have for a loop, as the vectorizer may have decided not to vectorize for that vector size because of costs? If it I can assume this never happens, that once it starts to vectorize epilogues that it will keep vectorizing them for any vector size it knows off then yeah I can use that.


I'm not sure I understand - why would you have any check not inside
the outer loop?  Yes, we now eventually hoist versioning checks
but the VF checks for the individual variants should be around
the vectorized loop itself (so not really part of the versioning check).

Yeah I agree. I was just explaining what I had done wrong now.

Cheers,
Andre

PS: I often find myself having to patch the DOMINATOR information, sometimes its easy to, but sometimes it can get pretty complicated. I wonder whether it would make sense to write something that traverses a loop and corrects this,
if it doesn't exist already.

There's iterate-fix-dominators, but unless you create new edges/blocks
manually rather than doing split-block/redirect-edge which should do
dominator updating for you.

Ah I was doing everything manually after having some bad experiences with lv_add_condition_to_bb.  I will have a look at those thanks!

Cheers,
Andre


Richard.



Richard.





Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]