This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [patch] Loop-aware SLP 1/5

From: Dorit Nuzman <DORIT at il dot ibm dot com>
To: Ira Rosen <IRAR at il dot ibm dot com>
Cc: gcc-patches at gcc dot gnu dot org
Date: Sat, 18 Aug 2007 20:39:06 +0300
Subject: Re: [patch] Loop-aware SLP 1/5

Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:04:37:

> This is the first part of loop-aware SLP patch. This part adds SLP
> data structures and their initialization.
> It also adds an SLP argument for vectorizable_... functions (e.g.,
> vectorizable_store).
>
> During SLP analysis a computation tree that represents an SLP
> instance (a sequence of groups of SLPable stmts) in the loop is
> built. Each node contains a group of scalar stmts that can be packed
> together into vector stmt(s), the vector stmt(s) (after the
> transformation phase), and tree information. We keep this data in
> struct _slp_tree.
>
> SLP instance info (the root to its computation tree, the size of
> groups of scalar stmts packed together, required unrolling factor)
> is represented by struct _slp_instance. For each loop, we keep a VEC
> of its SLP instances.
>

+/* The type of vectorization that can be applied to the stmt: regular
loop-base
+   vectorization, pure SLP (the stmt is a part of SLP instances and does
not
+   have uses outside SLP instances) or hybrid SLP and loop-based (the stmt
is
+   a part of SLP instance and also must be loop-based vectorized, since it
has
+   uses outside SLP sequences).  */
+enum slp_vect_type {
+  loop_vect = 0,
+  pure_slp,
+  hybrid
+};
+

Something that confused me, especially when I went over patch 2/5, is that
you use similar terms to those we used in our summit paper/presentation,
but with a somewhat different meaning:

I thought "pure slp" was meant to convey that we exploit only
intra-iteration parallelism in the loop; i.e., the loop can be vectorized
without doing any unrolling, cause we don't pack together stmts from
different iterations, only within a single iteration. This is a special
case of what you call "pure slp" in the patch - in the patch it means that
all the stmts in the loop can be SLPed, be it with or without unrolling.

"Hybrid slp" I thought was supposed to mean that we exploit both
intra-iteration and inter-iteration parallelism (e.g. VS=4 and the
slp-group-size is 2, in which case we don't have enough parallelism within
an iteration, so we obtain the rest of the parallelism from subsequent
iterations by unrolling the loop by 2. In the patch this case may still be
classified as "pure_slp"). "Hybrid" in the patch is just a special case of
intra+inter iteration parallelism, in which we use both SLP (with or
without unrolling) and loop vectorization (i.e. unrolling by VF).

I understand that the terms "pure" and "hybrid" as defined in the paper (to
classify where the different sources of the parallelism are coming from)
are maybe not so useful for the actual implementation. I wonder if it
wouldn't be better to try to avoid mixing up terms and just come up with
different names for the patch (since we can't change the paper...)? Maybe
something like:

vect_pure_slp     -->  vect_all_slp?

and

+enum slp_vect_type {
+  loop_vect = 0,
+  pure_slp,      --> only_slp_vect?
+  hybrid         --> slp_and_loop_vect?
+};

(In fact, we'd probably need another value to indicate that no
vectorization is applicable to a stmt, to support partial vectorization in
the future: loop_vect, only_slp_vect, slp_and_loop_vect, no_vect).

What do you think? (I'm not saying I necessarily like the names above
better, I just want to avoid confusion. I don't know which naming scheme is
more intuitive to people. I leave the decision to you).

Other than that this patch is ok. I'd appreciate if you could also consider
adding the following documentation:

* writing something about 'vec_stmts_size' where it's defined in
tree-vectorizer.h, just so it's clear why it's needed and why it's not
equal to the group size.

* maybe an example for the data that these fields would hold for a specific
example (if it makes sense)

* just to make sure I understand correctly:

+  /* All interleaving chains of stores in the loop, represented by the
first
+     stmt in the chain.  */
+  VEC(tree, heap) *strided_stores;
+
+  /* All SLP instances in the loop.  */
+  VEC(slp_instance, heap) *slp_instances;
+

would it be correct to say that currently the set of "roots" of all the
"slp_instances" is a subgroup of the "strided_stores" set?

+  /* The unrolling factor needed to SLP the loop.  */
+  unsigned slp_unrolling_factor;
 } *loop_vec_info;

would it be correct to say that in the case of "pure slp" (as defined in
the paper) "slp_unrolling_factor"=1?

thanks,
dorit

> Thanks,
> Ira
>

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]