[PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.
Tamar Christina
Tamar.Christina@arm.com
Mon Sep 28 17:24:03 GMT 2020
> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of Tamar
> Christina
> Sent: Monday, September 28, 2020 3:56 PM
> To: Richard Biener <rguenther@suse.de>
> Cc: nd <nd@arm.com>; gcc-patches@gcc.gnu.org; ook@ucw.cz
> Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> scaffolding.
>
> Hi Richi,
>
> Thanks for the review!
>
> Just some answers to your questions:
>
> > -----Original Message-----
> > From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> > Behalf Of Richard Biener
> > Sent: Monday, September 28, 2020 1:37 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz
> > Subject: Re: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > scaffolding.
> >
> > On Fri, 25 Sep 2020, Tamar Christina wrote:
> >
> > > Hi All,
> > >
> > > This patch adds the basic infrastructure for doing pattern matching
> > > on SLP
> > trees.
> > > This is done immediately after the SLP tree creation because it can
> > > change the shape of the tree in radical ways and so we would like to
> > > do it before any analysis is performed on the tree.
> > >
> > > A new file tree-vect-slp-patterns.c is added which contains all the
> > > code for pattern matching on SLP trees.
> > >
> > > This cover letter is short because the changes are heavily commented.
> > >
> > > All pattern matchers need to implement the abstract type
> > VectPatternMatch.
> > > The VectSimplePatternMatch abstract class provides some default
> > > functionality for pattern matchers that need to rebuild nodes.
> > >
> > > The pattern matcher requires if replacing a statement in a node,
> > > that ALL statements be replaced.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> >
> > + gcall *build ()
> > + {
> > + stmt_vec_info stmt_info;
> > +
> >
> > please define functions out-of-line (apart from the 1-liners)
> >
> > + /* We have to explicitly mark the old statement as unused
> > + because
> > during
> > + statement analysis the original and new pattern statement may
> > require
> > + different level of unrolling. As an example add/sub when
> > vectorized
> > + without a pattern requires 4 copies, whereas with a
> > + COMPLEX_ADD
> > pattern
> > + this only requires 2 copies and the two statement will be
> > + treated
> > as
> > + hand unrolled. That means that the analysis won't happen as
> > it'll find
> > + a mismatch. So we don't analyze the old statement and if we
> > + end
> > up
> > + needing it, e.g. SLP fails then we have to quickly re-analyze it.
> > */
> > + STMT_VINFO_RELEVANT (stmt_info) = vect_unused_in_scope;
> > + STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
> > + STMT_VINFO_RELATED_STMT (call_stmt_info) = stmt_info;
> >
> > so this means all uses have to be inside the pattern as otherwise
> > there may be even non-SLP uses. vect_mark_pattern_stmts supports
> > detecting patterns of patterns, I suppose the two-phase analysis for
> > SLP patterns does not support this right now?
> >
> > + SLP_TREE_CODE (this->m_node) = gimple_expr_code (call_stmt);;
> >
> > double ;, just make it CALL_EXPR literally (or leave it ERROR_MARK)
> >
> > You seem to do in-place changing of the SLP node you match off?
>
> Yes since this would allow me to change the root node as well, though
> thinking about it I can probably do it by passing it as a reference which then
> would allow me to re-use vect_create_new_slp_node which is probably
> preferable.
>
> >
> > @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance (vec_info *vinfo,
> > &tree_size, bst_map);
> > if (node != NULL)
> > {
> > + /* Temporarily allow add_stmt calls again. */
> > + vinfo->stmt_vec_info_ro = false;
> > +
> > + /* See if any patterns can be found in the constructed SLP tree
> > + before we do any analysis on it. */
> > + vect_match_slp_patterns (node, vinfo, group_size, &max_nunits,
> > + matches, &npermutes, &tree_size,
> > + bst_map);
> > +
> > + /* After this no more add_stmt calls are allowed. */
> > + vinfo->stmt_vec_info_ro = true;
> > +
> >
> > I think this is a bit early to match patterns - I'd defer it to the
> > point where all entries into the same SLP subgraph are analyzed, thus
> > somewhere at the end of vect_analyze_slp loop over all instances and
> > match patterns? That way phases are more clearly separated.
>
> That would probably work, my only worry is that the SLP analysis itself may
> fail and bail out at
>
> /* If the loads and stores can be handled with load/store-lane
> instructions do not generate this SLP instance. */
> if (is_a <loop_vec_info> (vinfo)
> && loads_permuted
> && dr && vect_store_lanes_supported (vectype, group_size,
> false))
>
> Which in the initial tree may be true, but in the patterned tree may not be.
> In the previous revision of the patch you had suggested I return a boolean
> which can be used to cancel such checks. Would that be the preferred
> approach?
>
> >
> > Note that fiddling with vinfo->stmt_vec_info_ro is a bit ugly, maybe
> > add a -
> > >add_pattern_stmt (gimple *pattern_stmt, stmt_vec_info
> > orig_stmt) variant that also sets STMT_VINFO_RELATED_STMT but doesn't
> > check !stmt_vec_info_ro. That could be used from tree-vect-patterns.c
> > as well and we could set stmt_vec_info_ro earlier.
> >
> > + VectPattern *pattern = patt_fn (node, vinfo); uint8_t n =
> > + pattern->get_arity ();
> > +
> > + if (group_size % n != 0)
> > + {
> > + delete pattern;
> >
> > seems to require VectPattern allocation even upon failure, I suggest
> > to return NULL then to avoid excessive allocations.
> >
> > + if (!pattern->matches (stmt_infos, i))
> > + {
> > + /* We can only do replacements for entire groups, we must
> > replace all
> > + statements in a node as the argument list/children may
> > + not
> > have
> > + equal height then. Operations that don't rewrite the
> > arguments
> > + may be safe to do, so perhaps paramatrise it. */
> > +
> > + found_p = false;
> >
> > I find it a bit ugly to iterate over "unrolls" in the machinery rather
> > than the individual pattern matcher which might have an easier and in
> > particular cheaper job here. Since you require
> > all lanes to match the same pattern anyway. Not sure if your
> > later patches support say, mixing complex add with different rotate in
> > the same SLP node.
>
> It does, as the constraint only applies to one pattern matcher class handling
> the entire node.
>
> An example of such case is
>
> node 0x531a1f0 (max_nunits=2, refcnt=2)
> stmt 0 *_9 = _10;
> stmt 1 *_15 = _16;
> stmt 2 *_25 = _26;
> stmt 3 *_31 = _32;
> children 0x531a980
> node 0x531a980 (max_nunits=2, refcnt=2)
> stmt 0 slp_patt_112 = .COMPLEX_ADD_ROT90 (_4, _14); stmt 1 slp_patt_111
> = .COMPLEX_ADD_ROT90 (_12, _8); stmt 2 slp_patt_110
> = .COMPLEX_ADD_ROT270 (_20, _30); stmt 3 slp_patt_109
> = .COMPLEX_ADD_ROT270 (_28, _24); lane permutation { 0[0] 1[1] 1[2] 0[3] }
> children 0x5310680 0x530e040 node 0x5310680 (max_nunits=2, refcnt=4)
> stmt 0 _4 = *_3; stmt 1 _12 = *_11; stmt 2 _20 = *_19; stmt 3 _28 = *_27;
> load permutation { 0 1 2 3 } node 0x530e040 (max_nunits=2, refcnt=2) stmt 0
> _14 = *_13; stmt 1 _8 = *_7; stmt 2 _30 = *_29; stmt 3 _24 = *_23; load
> permutation { 0 1 2 3 }
>
> though looking at the resulting assembly the code is incorrect,
>
> .L6:
> ldr q1, [x1, x3]
> ldr q0, [x0, x3]
> fcadd v0.2d, v0.2d, v1.2d, #270
> str q0, [x2, x3]
> ldr q1, [x5, x3]
> ldr q0, [x6, x3]
> fcadd v0.2d, v0.2d, v1.2d, #270
> str q0, [x4, x3]
> add x3, x3, 32
> cmp x3, 1600
> bne .L6
> ret
>
> Which I assume is because SLP_TREE_REPRESENTATIVE is pointing to the
> rotate 270?
>
> > Note the ultimate idea in the end is that a SLP node can, of course,
> > be split into two [but at this point the vector type / unroll factor
> > is not final so general splitting at vector boundary is not desired yet].
> > The split can be undone for consumers by inserting a VEC_PERM node
> > (which should semantically be a concat + select)
> >
> > + tree type = gimple_expr_type (STMT_VINFO_STMT (stmt_info));
> > + tree vectype = get_vectype_for_scalar_type (vinfo, type, node);
> >
> > use
> >
> > tree vectype = SLP_TREE_VECTYPE (node);
> >
> > generally avoid looking at scalar stmts, iff then look at
> > SLP_TREE_REPRESENTATIVE - all lanes have uniform operations applied to
> > (but the scalar stmts may not appear to do so! the scalar stmts
> > merely stand for their 'def').
> >
> > + /* Perform recursive matching, it's important to do this after
> > + matching
> > things
> > + in the current node as the matches here may re-order the nodes
> > + below
> > it.
> > + As such the pattern that needs to be subsequently match may change.
> > */
> > +
> > + if (SLP_TREE_CHILDREN (node).exists ()) {
> > + slp_tree child;
> > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > + found_rec_p |= vect_match_slp_patterns_2 (child, vinfo, group_size,
> > + patt_fn, max_nunits,
> > matches,
> > + npermutes, tree_size,
> > bst_map);
> > + }
> >
> >
> > you definitely need a visited set - you are walking a graph and nodes
> > can appear along multiple paths!
> >
> > + vect_mark_slp_stmts_relevant (node);
> >
> > that walks the whole subgraph but if you need to do anything you at
> > most want to touch the node itself, no?
> >
> > To make patterns-of-patterns viable you need to do all parts of the
> > walk in post-order. What breaks if you do ->matches/->validate in
> > post-order? I think that would be more future-proof.
>
> You lose the ability to match the longest pattern. As an example the complex
> add and complex fma patterns overlap. Right now I can try matching the fma
> first and then add.
> But doing it in post order the fma woud never match as the subtree would be
> too small and the add would always match.
>
Oops, I forgot that this new version tries the same pattern over the entire tree. So it should work.
You would only lose the ability to navigate by SSA name, but it doesn't need that anyway..
I'll make that change and see.
Thanks,
Tamar
> Aside from that it makes it very difficult to rebuild the subtrees as the SSA
> names have changed (since build Is already done in post order), So right now
> I can use e.g. _3, _4 etc, however if the patterns have already been applied I
> would need to know what their replacements are since build () would
> replace them and you lose the ability to navigate by SSA name.
>
> Regards,
> Tamar
>
> >
> > Otherwise this looks like an OK overall design.
> >
> > Thanks for working on it!
> >
> > Richard.
> >
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > * Makefile.in (tree-vect-slp-patterns.o): New.
> > > * doc/passes.texi: Update documentation.
> > > * tree-vect-slp.c (vect_match_slp_patterns_2,
> > vect_match_slp_patterns):
> > > New.
> > > (vect_analyze_slp_instance): Call pattern matcher.
> > > * tree-vectorizer.h (class VectPatternMatch, class VectPattern): New.
> > > * tree-vect-slp-patterns.c: New file.
> > >
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imend
More information about the Gcc-patches
mailing list