This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC][tree-vect]PR 88915: Further vectorize second loop when versioning


On Tue, 30 Jul 2019, Andre Vieira (lists) wrote:

> 
> 
> On 30/07/2019 13:16, Andre Vieira (lists) wrote:
> > Hi Richard,
> > 
> > I believe this is in line with what you were explaining to me earlier. The
> > one thing I haven't quite done here is the jump around for if there is no
> > prolog peeling. Though I think skip_vectors introduces the jumping we need.
> > 
> > The main gist of it is:
> > 1) if '--param vect-epilogues-nomask=1' is passed we refrain from adding the
> > versioning threshold check when versioning a loop at first,
> > 2) later we allow the vectorizer to use skip_vectors to basically check
> > niters to be able to skip the vectorized loop on to the right epilogue,
> > 3) after vectorizing the epilogue, we check whether this was a versioned
> > loop and whether we skipped adding the versioning threshold, if so we add a
> > threshold based on the last VF
> > 
> > There is a caveat here, if we don't vectorize the epilogue, because say our
> > architecture doesn't know how, we will end up with a regression.
> > Let's say we have a loop for which our target can vectorize for 32x but not
> > 16x, we get:
> > 
> > if (alias & threshold_for_16x ())
> > {
> >    if (niters > 31)
> >      vectorized_31 ();
> >    else
> >      scalar ();
> > }
> > else
> >   scalar ();
> > 
> > Imagine our iteration count is 18, all we hit is the scalar loop, but now we
> > go through 1x alias and 2x niters. Whereas potentially we could have done
> > with just 1x niters.

True.  Note we should swap the checks to

  if (threshold_for_16x && alias)

> > A mitigation for this could be to only add the threshold check if we
> > actually vectorized the last loop, I believe a:
> > 'if (LOOP_VINFO_VECTORIZABLE_P (loop_vinfo)) return NULL;' inside that new
> > "else" in vect_transform_loop would do the trick. Though then we will still
> > have that extra alias check... >
> > I am going to hack around in the back-end to "fake" a situation like this
> > and see what happens.
> 
> Right should have quickly tried this before sending the email, what happens is
> actually vect_transform_loop never gets called because try_vectorize_loop_1
> will recognize it can't vectorize the epilogue. So we end up with the
> "mitigation" result I suggested, where we simply don't have a versioning
> threshold which might still not be ideal.

I think the main issue is how we have implemented epilogue vectorization.
Ideally when vectorizing the loop () we'd recognize all VFs we can handle
and thus know beforehand.  I think that's already done for the sake
of openmp simd now so doing this when epilogue vectorization is enabled
as well wouldn't be too bad - so then we know, at the time we do the
versioning, what and how many vectorized epilogues we create.  See
vect_analyze_loop and the loop over vector sizes.

> > 
> > Another potential issue arises when the profitability threshold obtained
> > from the cost model isn't guaranteed to be either LE or EQ to the versioning
> > threshold. In that case there are two separate checks, which now we no
> > longer will attempt to fold.
> 
> And I just realized I don't take the cost model threshold into account when
> creating the new threshold check, I guess if it is ordered_p we should again
> take the max there between the new threshold check and the cost threshold for
> the new check.

Also see https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01397.html for
a patch that computes costs of each possible VF, with that we could
compute a better combined estimate for minimum number of iters
(also considering the extra jumps to reach the main VF/2 loop).

> > 
> > I am trying to find a way to test and benchmark these changes. Unfortunately
> > I am having trouble doing this for AVX512 as I find that the old '--param
> > vect_epilogues_nomask=1' already causes wrong codegen in SPEC2017 for the
> > gcc and perlbench benchmarks.

There's a reason it is not enabled by default.  But I'd like to
fix bugs it has so can you possibly reduce testcases and file
bugs for it?

Thanks,
Richard.

> > 
> > 
> > Cheers,
> > Andre
> > 
> > On 19/07/2019 13:38, Andre Vieira (lists) wrote:
> > > 
> > > 
> > > On 19/07/2019 12:35, Richard Biener wrote:
> > > > On Fri, 19 Jul 2019, Andre Vieira (lists) wrote:
> > > > 
> > > > > 
> > > > > 
> > > > > On 15/07/2019 11:54, Richard Biener wrote:
> > > > > > On Mon, 15 Jul 2019, Andre Vieira (lists) wrote:
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > On 12/07/2019 11:19, Richard Biener wrote:
> > > > > > > > On Thu, 11 Jul 2019, Andre Vieira (lists) wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > I have code that can split the condition and alias checks in
> > > > > > > 'vect_loop_versioning'. For this approach I am considering keeping
> > > > > > > that
> > > > > > > bit of
> > > > > > > code and seeing if I can patch up the checks after vectorizing the
> > > > > > > epilogue
> > > > > > > further. I think initially I will just go with a "hacked up" way
> > > > > > > of
> > > > > > > passing
> > > > > > > down the bb with the iteration check and split the false edge
> > > > > > > every time
> > > > > > > we
> > > > > > > vectorize it further. Will keep you posted on progress. If you
> > > > > > > have any
> > > > > > > pointers/tips they are most welc ome!
> > > > > > 
> > > > > > I thought to somehow force the idea that we have a prologue loop
> > > > > > to the vectorizer so it creates the number-of-vectorized iterations
> > > > > > check and branch around the main (highest VF) vectorized loop.
> > > > > > 
> > > > > 
> > > > > Hmm I think I may have skimmed over this earlier. I am reading it now
> > > > > and am
> > > > > not entirely sure what you mean by this. Specifically the "number of
> > > > > vectorized iterations" or how a prologue loop plays a role in this.
> > > > 
> > > > When there is no prologue then the versioning condition currently
> > > > ensures we always enter the vector loop.  Only if there is a prologue
> > > > is there a check and branch eventually skipping right to the epilogue.
> > > > If the versioning condition (now using a lower VF) doesn't guarantee
> > > > this we have to add this jump-around.
> > > 
> > > Right, I haven't looked at the prologue path yet. I had a quick look and
> > > can't find where this branch skipping to the epilogue is constructed.  I
> > > will take a better look after I got my current example to work.
> > > 
> > > > > 
> > > > > I guess this sheds some light on the comment above. And it definitely
> > > > > implies
> > > > > we would need to know the lowest VF when creating this condition.
> > > > > Which is
> > > > > tricky.
> > > > 
> > > > We can simply use the smallest vector size supported by the target to
> > > > derive it from the actual VF, no?
> > > 
> > > So I could wait to introduce this check after all epilogue vectorization
> > > is done, back track to the last niter check and duplicate that in the
> > > outer loop.
> > > 
> > > What I didn't want to do was use the smallest possible vector size for the
> > > target because I was under the impression that does not necessarily
> > > correspond to the smallest VF we may have for a loop, as the vectorizer
> > > may have decided not to vectorize for that vector size because of costs?
> > > If it I can assume this never happens, that once it starts to vectorize
> > > epilogues that it will keep vectorizing them for any vector size it knows
> > > off then yeah I can use that.
> > > 
> > > 
> > > > I'm not sure I understand - why would you have any check not inside
> > > > the outer loop?  Yes, we now eventually hoist versioning checks
> > > > but the VF checks for the individual variants should be around
> > > > the vectorized loop itself (so not really part of the versioning check).
> > > 
> > > Yeah I agree. I was just explaining what I had done wrong now.
> > > > 
> > > > > Cheers,
> > > > > Andre
> > > > > 
> > > > > PS: I often find myself having to patch the DOMINATOR information,
> > > > > sometimes
> > > > > its easy to, but sometimes it can get pretty complicated. I wonder
> > > > > whether it
> > > > > would make sense to write something that traverses a loop and corrects
> > > > > this,
> > > > > if it doesn't exist already.
> > > > 
> > > > There's iterate-fix-dominators, but unless you create new edges/blocks
> > > > manually rather than doing split-block/redirect-edge which should do
> > > > dominator updating for you.
> > > 
> > > Ah I was doing everything manually after having some bad experiences with
> > > lv_add_condition_to_bb.  I will have a look at those thanks!
> > > 
> > > Cheers,
> > > Andre
> > > 
> > > > 
> > > > Richard.
> > > > 
> > > > > 
> > > > > > 
> > > > > > Richard.
> > > > > > 
> > > > > 
> > > > 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]