[PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.
Richard Biener
rguenther@suse.de
Tue Sep 29 06:45:06 GMT 2020
On Mon, 28 Sep 2020, Tamar Christina wrote:
>
>
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of Tamar
> > Christina
> > Sent: Monday, September 28, 2020 3:56 PM
> > To: Richard Biener <rguenther@suse.de>
> > Cc: nd <nd@arm.com>; gcc-patches@gcc.gnu.org; ook@ucw.cz
> > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > scaffolding.
> >
> > Hi Richi,
> >
> > Thanks for the review!
> >
> > Just some answers to your questions:
> >
> > > -----Original Message-----
> > > From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> > > Behalf Of Richard Biener
> > > Sent: Monday, September 28, 2020 1:37 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz
> > > Subject: Re: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > > scaffolding.
> > >
> > > On Fri, 25 Sep 2020, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch adds the basic infrastructure for doing pattern matching
> > > > on SLP
> > > trees.
> > > > This is done immediately after the SLP tree creation because it can
> > > > change the shape of the tree in radical ways and so we would like to
> > > > do it before any analysis is performed on the tree.
> > > >
> > > > A new file tree-vect-slp-patterns.c is added which contains all the
> > > > code for pattern matching on SLP trees.
> > > >
> > > > This cover letter is short because the changes are heavily commented.
> > > >
> > > > All pattern matchers need to implement the abstract type
> > > VectPatternMatch.
> > > > The VectSimplePatternMatch abstract class provides some default
> > > > functionality for pattern matchers that need to rebuild nodes.
> > > >
> > > > The pattern matcher requires if replacing a statement in a node,
> > > > that ALL statements be replaced.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > + gcall *build ()
> > > + {
> > > + stmt_vec_info stmt_info;
> > > +
> > >
> > > please define functions out-of-line (apart from the 1-liners)
> > >
> > > + /* We have to explicitly mark the old statement as unused
> > > + because
> > > during
> > > + statement analysis the original and new pattern statement may
> > > require
> > > + different level of unrolling. As an example add/sub when
> > > vectorized
> > > + without a pattern requires 4 copies, whereas with a
> > > + COMPLEX_ADD
> > > pattern
> > > + this only requires 2 copies and the two statement will be
> > > + treated
> > > as
> > > + hand unrolled. That means that the analysis won't happen as
> > > it'll find
> > > + a mismatch. So we don't analyze the old statement and if we
> > > + end
> > > up
> > > + needing it, e.g. SLP fails then we have to quickly re-analyze it.
> > > */
> > > + STMT_VINFO_RELEVANT (stmt_info) = vect_unused_in_scope;
> > > + STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
> > > + STMT_VINFO_RELATED_STMT (call_stmt_info) = stmt_info;
> > >
> > > so this means all uses have to be inside the pattern as otherwise
> > > there may be even non-SLP uses. vect_mark_pattern_stmts supports
> > > detecting patterns of patterns, I suppose the two-phase analysis for
> > > SLP patterns does not support this right now?
> > >
> > > + SLP_TREE_CODE (this->m_node) = gimple_expr_code (call_stmt);;
> > >
> > > double ;, just make it CALL_EXPR literally (or leave it ERROR_MARK)
> > >
> > > You seem to do in-place changing of the SLP node you match off?
> >
> > Yes since this would allow me to change the root node as well, though
> > thinking about it I can probably do it by passing it as a reference which then
> > would allow me to re-use vect_create_new_slp_node which is probably
> > preferable.
> >
> > >
> > > @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance (vec_info *vinfo,
> > > &tree_size, bst_map);
> > > if (node != NULL)
> > > {
> > > + /* Temporarily allow add_stmt calls again. */
> > > + vinfo->stmt_vec_info_ro = false;
> > > +
> > > + /* See if any patterns can be found in the constructed SLP tree
> > > + before we do any analysis on it. */
> > > + vect_match_slp_patterns (node, vinfo, group_size, &max_nunits,
> > > + matches, &npermutes, &tree_size,
> > > + bst_map);
> > > +
> > > + /* After this no more add_stmt calls are allowed. */
> > > + vinfo->stmt_vec_info_ro = true;
> > > +
> > >
> > > I think this is a bit early to match patterns - I'd defer it to the
> > > point where all entries into the same SLP subgraph are analyzed, thus
> > > somewhere at the end of vect_analyze_slp loop over all instances and
> > > match patterns? That way phases are more clearly separated.
> >
> > That would probably work, my only worry is that the SLP analysis itself may
> > fail and bail out at
> >
> > /* If the loads and stores can be handled with load/store-lane
> > instructions do not generate this SLP instance. */
> > if (is_a <loop_vec_info> (vinfo)
> > && loads_permuted
> > && dr && vect_store_lanes_supported (vectype, group_size,
> > false))
Ah, that piece of code. Yeah, I'm repeatedly running into it as well -
it's a bad hack that stands in the way all the time :/
I guess we should try moving this upward like to
vect_analyze_loop_2 right before
/* Check the SLP opportunities in the loop, analyze and build SLP trees.
*/
ok = vect_analyze_slp (loop_vinfo, *n_stmts);
if (!ok)
return ok;
and there check whether all grouped loads and stores can be handled
with store-/load-lanes (and there are no SLP reduction chains?) in
which case do not try to attempt SLP at all. Because the testcases
this check was supposed to change were all-load/store-lane or
all SLP so the mixed case is probably not worth special casing.
Since load-/store-lanes is an arm speciality I tried to only touch
this fragile part with a ten-foot pole ;) CCing Richard, if he
acks the above I can produce a patch.
> > Which in the initial tree may be true, but in the patterned tree may not be.
> > In the previous revision of the patch you had suggested I return a boolean
> > which can be used to cancel such checks. Would that be the preferred
> > approach?
> > >
> > > Note that fiddling with vinfo->stmt_vec_info_ro is a bit ugly, maybe
> > > add a -
> > > >add_pattern_stmt (gimple *pattern_stmt, stmt_vec_info
> > > orig_stmt) variant that also sets STMT_VINFO_RELATED_STMT but doesn't
> > > check !stmt_vec_info_ro. That could be used from tree-vect-patterns.c
> > > as well and we could set stmt_vec_info_ro earlier.
> > >
> > > + VectPattern *pattern = patt_fn (node, vinfo); uint8_t n =
> > > + pattern->get_arity ();
> > > +
> > > + if (group_size % n != 0)
> > > + {
> > > + delete pattern;
> > >
> > > seems to require VectPattern allocation even upon failure, I suggest
> > > to return NULL then to avoid excessive allocations.
> > >
> > > + if (!pattern->matches (stmt_infos, i))
> > > + {
> > > + /* We can only do replacements for entire groups, we must
> > > replace all
> > > + statements in a node as the argument list/children may
> > > + not
> > > have
> > > + equal height then. Operations that don't rewrite the
> > > arguments
> > > + may be safe to do, so perhaps paramatrise it. */
> > > +
> > > + found_p = false;
> > >
> > > I find it a bit ugly to iterate over "unrolls" in the machinery rather
> > > than the individual pattern matcher which might have an easier and in
> > > particular cheaper job here. Since you require
> > > all lanes to match the same pattern anyway. Not sure if your
> > > later patches support say, mixing complex add with different rotate in
> > > the same SLP node.
> >
> > It does, as the constraint only applies to one pattern matcher class handling
> > the entire node.
> >
> > An example of such case is
> >
> > node 0x531a1f0 (max_nunits=2, refcnt=2)
> > stmt 0 *_9 = _10;
> > stmt 1 *_15 = _16;
> > stmt 2 *_25 = _26;
> > stmt 3 *_31 = _32;
> > children 0x531a980
> > node 0x531a980 (max_nunits=2, refcnt=2)
> > stmt 0 slp_patt_112 = .COMPLEX_ADD_ROT90 (_4, _14); stmt 1 slp_patt_111
> > = .COMPLEX_ADD_ROT90 (_12, _8); stmt 2 slp_patt_110
> > = .COMPLEX_ADD_ROT270 (_20, _30); stmt 3 slp_patt_109
> > = .COMPLEX_ADD_ROT270 (_28, _24); lane permutation { 0[0] 1[1] 1[2] 0[3] }
> > children 0x5310680 0x530e040 node 0x5310680 (max_nunits=2, refcnt=4)
> > stmt 0 _4 = *_3; stmt 1 _12 = *_11; stmt 2 _20 = *_19; stmt 3 _28 = *_27;
> > load permutation { 0 1 2 3 } node 0x530e040 (max_nunits=2, refcnt=2) stmt 0
> > _14 = *_13; stmt 1 _8 = *_7; stmt 2 _30 = *_29; stmt 3 _24 = *_23; load
> > permutation { 0 1 2 3 }
> >
> > though looking at the resulting assembly the code is incorrect,
> >
> > .L6:
> > ldr q1, [x1, x3]
> > ldr q0, [x0, x3]
> > fcadd v0.2d, v0.2d, v1.2d, #270
> > str q0, [x2, x3]
> > ldr q1, [x5, x3]
> > ldr q0, [x6, x3]
> > fcadd v0.2d, v0.2d, v1.2d, #270
> > str q0, [x4, x3]
> > add x3, x3, 32
> > cmp x3, 1600
> > bne .L6
> > ret
> >
> > Which I assume is because SLP_TREE_REPRESENTATIVE is pointing to the
> > rotate 270?
Or maybe due to the lane permutations.
> > > Note the ultimate idea in the end is that a SLP node can, of course,
> > > be split into two [but at this point the vector type / unroll factor
> > > is not final so general splitting at vector boundary is not desired yet].
> > > The split can be undone for consumers by inserting a VEC_PERM node
> > > (which should semantically be a concat + select)
> > >
> > > + tree type = gimple_expr_type (STMT_VINFO_STMT (stmt_info));
> > > + tree vectype = get_vectype_for_scalar_type (vinfo, type, node);
> > >
> > > use
> > >
> > > tree vectype = SLP_TREE_VECTYPE (node);
> > >
> > > generally avoid looking at scalar stmts, iff then look at
> > > SLP_TREE_REPRESENTATIVE - all lanes have uniform operations applied to
> > > (but the scalar stmts may not appear to do so! the scalar stmts
> > > merely stand for their 'def').
> > >
> > > + /* Perform recursive matching, it's important to do this after
> > > + matching
> > > things
> > > + in the current node as the matches here may re-order the nodes
> > > + below
> > > it.
> > > + As such the pattern that needs to be subsequently match may change.
> > > */
> > > +
> > > + if (SLP_TREE_CHILDREN (node).exists ()) {
> > > + slp_tree child;
> > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > > + found_rec_p |= vect_match_slp_patterns_2 (child, vinfo, group_size,
> > > + patt_fn, max_nunits,
> > > matches,
> > > + npermutes, tree_size,
> > > bst_map);
> > > + }
> > >
> > >
> > > you definitely need a visited set - you are walking a graph and nodes
> > > can appear along multiple paths!
> > >
> > > + vect_mark_slp_stmts_relevant (node);
> > >
> > > that walks the whole subgraph but if you need to do anything you at
> > > most want to touch the node itself, no?
> > >
> > > To make patterns-of-patterns viable you need to do all parts of the
> > > walk in post-order. What breaks if you do ->matches/->validate in
> > > post-order? I think that would be more future-proof.
> >
> > You lose the ability to match the longest pattern. As an example the complex
> > add and complex fma patterns overlap. Right now I can try matching the fma
> > first and then add.
> > But doing it in post order the fma woud never match as the subtree would be
> > too small and the add would always match.
> >
>
> Oops, I forgot that this new version tries the same pattern over the
> entire tree. So it should work. You would only lose the ability to
> navigate by SSA name, but it doesn't need that anyway..
>
> I'll make that change and see.
Thanks,
Richard.
> Thanks,
> Tamar
>
> > Aside from that it makes it very difficult to rebuild the subtrees as the SSA
> > names have changed (since build Is already done in post order), So right now
> > I can use e.g. _3, _4 etc, however if the patterns have already been applied I
> > would need to know what their replacements are since build () would
> > replace them and you lose the ability to navigate by SSA name.
> >
> > Regards,
> > Tamar
> >
> > >
> > > Otherwise this looks like an OK overall design.
> > >
> > > Thanks for working on it!
> > >
> > > Richard.
> > >
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * Makefile.in (tree-vect-slp-patterns.o): New.
> > > > * doc/passes.texi: Update documentation.
> > > > * tree-vect-slp.c (vect_match_slp_patterns_2,
> > > vect_match_slp_patterns):
> > > > New.
> > > > (vect_analyze_slp_instance): Call pattern matcher.
> > > > * tree-vectorizer.h (class VectPatternMatch, class VectPattern): New.
> > > > * tree-vect-slp-patterns.c: New file.
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > Nuernberg, Germany; GF: Felix Imend
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend
More information about the Gcc-patches
mailing list