[PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

Richard Biener rguenther@suse.de
Tue Sep 29 06:45:06 GMT 2020


On Mon, 28 Sep 2020, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of Tamar
> > Christina
> > Sent: Monday, September 28, 2020 3:56 PM
> > To: Richard Biener <rguenther@suse.de>
> > Cc: nd <nd@arm.com>; gcc-patches@gcc.gnu.org; ook@ucw.cz
> > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > scaffolding.
> > 
> > Hi Richi,
> > 
> > Thanks for the review!
> > 
> > Just some answers to your questions:
> > 
> > > -----Original Message-----
> > > From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> > > Behalf Of Richard Biener
> > > Sent: Monday, September 28, 2020 1:37 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz
> > > Subject: Re: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > > scaffolding.
> > >
> > > On Fri, 25 Sep 2020, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch adds the basic infrastructure for doing pattern matching
> > > > on SLP
> > > trees.
> > > > This is done immediately after the SLP tree creation because it can
> > > > change the shape of the tree in radical ways and so we would like to
> > > > do it before any analysis is performed on the tree.
> > > >
> > > > A new file tree-vect-slp-patterns.c is added which contains all the
> > > > code for pattern matching on SLP trees.
> > > >
> > > > This cover letter is short because the changes are heavily commented.
> > > >
> > > > All pattern matchers need to implement the abstract type
> > > VectPatternMatch.
> > > > The VectSimplePatternMatch abstract class provides some default
> > > > functionality for pattern matchers that need to rebuild nodes.
> > > >
> > > > The pattern matcher requires if replacing a statement in a node,
> > > > that ALL statements be replaced.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > +    gcall *build ()
> > > +    {
> > > +      stmt_vec_info stmt_info;
> > > +
> > >
> > > please define functions out-of-line (apart from the 1-liners)
> > >
> > > +      /* We have to explicitly mark the old statement as unused
> > > + because
> > > during
> > > +        statement analysis the original and new pattern statement may
> > > require
> > > +        different level of unrolling.  As an example add/sub when
> > > vectorized
> > > +        without a pattern requires 4 copies, whereas with a
> > > + COMPLEX_ADD
> > > pattern
> > > +        this only requires 2 copies and the two statement will be
> > > + treated
> > > as
> > > +        hand unrolled.  That means that the analysis won't happen as
> > > it'll find
> > > +        a mismatch.  So we don't analyze the old statement and if we
> > > + end
> > > up
> > > +        needing it, e.g. SLP fails then we have to quickly re-analyze it.
> > > */
> > > +      STMT_VINFO_RELEVANT (stmt_info) = vect_unused_in_scope;
> > > +      STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
> > > +      STMT_VINFO_RELATED_STMT (call_stmt_info) = stmt_info;
> > >
> > > so this means all uses have to be inside the pattern as otherwise
> > > there may be even non-SLP uses.  vect_mark_pattern_stmts supports
> > > detecting patterns of patterns, I suppose the two-phase analysis for
> > > SLP patterns does not support this right now?
> > >
> > > +      SLP_TREE_CODE (this->m_node) = gimple_expr_code (call_stmt);;
> > >
> > > double ;, just make it CALL_EXPR literally (or leave it ERROR_MARK)
> > >
> > > You seem to do in-place changing of the SLP node you match off?
> > 
> > Yes since this would allow me to change the root node as well, though
> > thinking about it I can probably do it by passing it as a reference which then
> > would allow me to re-use vect_create_new_slp_node which is probably
> > preferable.
> > 
> > >
> > > @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance (vec_info *vinfo,
> > >                               &tree_size, bst_map);
> > >    if (node != NULL)
> > >      {
> > > +      /* Temporarily allow add_stmt calls again.  */
> > > +      vinfo->stmt_vec_info_ro = false;
> > > +
> > > +      /* See if any patterns can be found in the constructed SLP tree
> > > +        before we do any analysis on it.  */
> > > +      vect_match_slp_patterns (node, vinfo, group_size, &max_nunits,
> > > +                              matches, &npermutes, &tree_size,
> > > + bst_map);
> > > +
> > > +      /* After this no more add_stmt calls are allowed.  */
> > > +      vinfo->stmt_vec_info_ro = true;
> > > +
> > >
> > > I think this is a bit early to match patterns - I'd defer it to the
> > > point where all entries into the same SLP subgraph are analyzed, thus
> > > somewhere at the end of vect_analyze_slp loop over all instances and
> > > match patterns?  That way phases are more clearly separated.
> > 
> > That would probably work, my only worry is that the SLP analysis itself may
> > fail and bail out at
> > 
> > 	  /* If the loads and stores can be handled with load/store-lane
> > 	     instructions do not generate this SLP instance.  */
> > 	  if (is_a <loop_vec_info> (vinfo)
> > 	      && loads_permuted
> > 	      && dr && vect_store_lanes_supported (vectype, group_size,
> > false))

Ah, that piece of code.  Yeah, I'm repeatedly running into it as well - 
it's a bad hack that stands in the way all the time :/

I guess we should try moving this upward like to
vect_analyze_loop_2 right before

  /* Check the SLP opportunities in the loop, analyze and build SLP trees.  
*/
  ok = vect_analyze_slp (loop_vinfo, *n_stmts);
  if (!ok)
    return ok;

and there check whether all grouped loads and stores can be handled
with store-/load-lanes (and there are no SLP reduction chains?) in
which case do not try to attempt SLP at all.  Because the testcases
this check was supposed to change were all-load/store-lane or
all SLP so the mixed case is probably not worth special casing.

Since load-/store-lanes is an arm speciality I tried to only touch
this fragile part with a ten-foot pole ;)  CCing Richard, if he
acks the above I can produce a patch.

> > Which in the initial tree may be true, but in the patterned tree may not be.
> > In the previous revision of the patch you had suggested I return a boolean
> > which can be used to cancel such checks.  Would that be the preferred
> > approach?



> > >
> > > Note that fiddling with vinfo->stmt_vec_info_ro is a bit ugly, maybe
> > > add a -
> > > >add_pattern_stmt (gimple *pattern_stmt, stmt_vec_info
> > > orig_stmt) variant that also sets STMT_VINFO_RELATED_STMT but doesn't
> > > check !stmt_vec_info_ro.  That could be used from tree-vect-patterns.c
> > > as well and we could set stmt_vec_info_ro earlier.
> > >
> > > +  VectPattern *pattern = patt_fn (node, vinfo);  uint8_t n =
> > > + pattern->get_arity ();
> > > +
> > > +  if (group_size % n != 0)
> > > +    {
> > > +      delete pattern;
> > >
> > > seems to require VectPattern allocation even upon failure, I suggest
> > > to return NULL then to avoid excessive allocations.
> > >
> > > +      if (!pattern->matches (stmt_infos, i))
> > > +       {
> > > +         /* We can only do replacements for entire groups, we must
> > > replace all
> > > +            statements in a node as the argument list/children may
> > > + not
> > > have
> > > +            equal height then.  Operations that don't rewrite the
> > > arguments
> > > +            may be safe to do, so perhaps paramatrise it.  */
> > > +
> > > +         found_p = false;
> > >
> > > I find it a bit ugly to iterate over "unrolls" in the machinery rather
> > > than the individual pattern matcher which might have an easier and in
> > > particular cheaper job here.  Since you require
> > > all lanes to match the same pattern anyway.   Not sure if your
> > > later patches support say, mixing complex add with different rotate in
> > > the same SLP node.
> > 
> > It does, as the constraint only applies to one pattern matcher class handling
> > the entire node.
> > 
> > An example of such case is
> > 
> > node 0x531a1f0 (max_nunits=2, refcnt=2)
> >  stmt 0 *_9 = _10;
> >  stmt 1 *_15 = _16;
> >  stmt 2 *_25 = _26;
> >  stmt 3 *_31 = _32;
> >  children 0x531a980
> > node 0x531a980 (max_nunits=2, refcnt=2)
> >  stmt 0 slp_patt_112 = .COMPLEX_ADD_ROT90 (_4, _14);  stmt 1 slp_patt_111
> > = .COMPLEX_ADD_ROT90 (_12, _8);  stmt 2 slp_patt_110
> > = .COMPLEX_ADD_ROT270 (_20, _30);  stmt 3 slp_patt_109
> > = .COMPLEX_ADD_ROT270 (_28, _24);  lane permutation { 0[0] 1[1] 1[2] 0[3] }
> > children 0x5310680 0x530e040 node 0x5310680 (max_nunits=2, refcnt=4)
> > stmt 0 _4 = *_3;  stmt 1 _12 = *_11;  stmt 2 _20 = *_19;  stmt 3 _28 = *_27;
> > load permutation { 0 1 2 3 } node 0x530e040 (max_nunits=2, refcnt=2)  stmt 0
> > _14 = *_13;  stmt 1 _8 = *_7;  stmt 2 _30 = *_29;  stmt 3 _24 = *_23;  load
> > permutation { 0 1 2 3 }
> > 
> > though looking at the resulting assembly the code is incorrect,
> > 
> > .L6:
> >         ldr     q1, [x1, x3]
> >         ldr     q0, [x0, x3]
> >         fcadd   v0.2d, v0.2d, v1.2d, #270
> >         str     q0, [x2, x3]
> >         ldr     q1, [x5, x3]
> >         ldr     q0, [x6, x3]
> >         fcadd   v0.2d, v0.2d, v1.2d, #270
> >         str     q0, [x4, x3]
> >         add     x3, x3, 32
> >         cmp     x3, 1600
> >         bne     .L6
> >         ret
> > 
> > Which I assume is because SLP_TREE_REPRESENTATIVE is pointing to the
> > rotate 270?

Or maybe due to the lane permutations.

> > > Note the ultimate idea in the end is that a SLP node can, of course,
> > > be split into two [but at this point the vector type / unroll factor
> > > is not final so general splitting at vector boundary is not desired yet].
> > > The split can be undone for consumers by inserting a VEC_PERM node
> > > (which should semantically be a concat + select)
> > >
> > > +      tree type = gimple_expr_type (STMT_VINFO_STMT (stmt_info));
> > > +      tree vectype = get_vectype_for_scalar_type (vinfo, type, node);
> > >
> > > use
> > >
> > >       tree vectype = SLP_TREE_VECTYPE (node);
> > >
> > > generally avoid looking at scalar stmts, iff then look at
> > > SLP_TREE_REPRESENTATIVE - all lanes have uniform operations applied to
> > > (but the scalar stmts may not appear to do so!  the scalar stmts
> > > merely stand for their 'def').
> > >
> > > +  /* Perform recursive matching, it's important to do this after
> > > + matching
> > > things
> > > +     in the current node as the matches here may re-order the nodes
> > > + below
> > > it.
> > > +     As such the pattern that needs to be subsequently match may change.
> > > */
> > > +
> > > +  if (SLP_TREE_CHILDREN (node).exists ()) {
> > > +    slp_tree child;
> > > +    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > > +      found_rec_p |= vect_match_slp_patterns_2 (child, vinfo, group_size,
> > > +                                               patt_fn, max_nunits,
> > > matches,
> > > +                                               npermutes, tree_size,
> > > bst_map);
> > > +  }
> > >
> > >
> > > you definitely need a visited set - you are walking a graph and nodes
> > > can appear along multiple paths!
> > >
> > > +      vect_mark_slp_stmts_relevant (node);
> > >
> > > that walks the whole subgraph but if you need to do anything you at
> > > most want to touch the node itself, no?
> > >
> > > To make patterns-of-patterns viable you need to do all parts of the
> > > walk in post-order.  What breaks if you do ->matches/->validate in
> > > post-order?  I think that would be more future-proof.
> > 
> > You lose the ability to match the longest pattern. As an example the complex
> > add and complex fma patterns overlap. Right now I can try matching the fma
> > first and then add.
> > But doing it in post order the fma woud never match as the subtree would be
> > too small and the add would always match.
> > 
> 
> Oops, I forgot that this new version tries the same pattern over the 
> entire tree. So it should work. You would only lose the ability to 
> navigate by SSA name, but it doesn't need that anyway..
> 
> I'll make that change and see.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> > Aside from that it makes it very difficult to rebuild the subtrees as the SSA
> > names have changed (since build Is already done in post order), So right now
> > I can use e.g. _3, _4 etc, however if the patterns have already been applied I
> > would need to know what their replacements are since build () would
> > replace them and you lose the ability to navigate by SSA name.
> > 
> > Regards,
> > Tamar
> > 
> > >
> > > Otherwise this looks like an OK overall design.
> > >
> > > Thanks for working on it!
> > >
> > > Richard.
> > >
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* Makefile.in (tree-vect-slp-patterns.o): New.
> > > > 	* doc/passes.texi: Update documentation.
> > > > 	* tree-vect-slp.c (vect_match_slp_patterns_2,
> > > vect_match_slp_patterns):
> > > > 	New.
> > > > 	(vect_analyze_slp_instance): Call pattern matcher.
> > > > 	* tree-vectorizer.h (class VectPatternMatch, class VectPattern): New.
> > > > 	* tree-vect-slp-patterns.c: New file.
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > Nuernberg, Germany; GF: Felix Imend
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


More information about the Gcc-patches mailing list