This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH GCC][13/13]Distribute loop with loop versioning under runtime alias check


On Fri, Jun 23, 2017 at 12:30 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Jun 20, 2017 at 10:22 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Mon, Jun 12, 2017 at 6:03 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>> Hi,
>>> This is the main patch rewriting loop distribution in order to handle hmmer.
>>> It improves loop distribution by versioning loop under runtime alias check conditions.
>>> As described in comments, the patch basically implements distribution in the following
>>> steps:
>>>
>>>      1) Seed partitions with specific type statements.  For now we support
>>>         two types seed statements: statement defining variable used outside
>>>         of loop; statement storing to memory.
>>>      2) Build reduced dependence graph (RDG) for loop to be distributed.
>>>         The vertices (RDG:V) model all statements in the loop and the edges
>>>         (RDG:E) model flow and control dependencies between statements.
>>>      3) Apart from RDG, compute data dependencies between memory references.
>>>      4) Starting from seed statement, build up partition by adding depended
>>>         statements according to RDG's dependence information.  Partition is
>>>         classified as parallel type if it can be executed paralleled; or as
>>>         sequential type if it can't.  Parallel type partition is further
>>>         classified as different builtin kinds if it can be implemented as
>>>         builtin function calls.
>>>      5) Build partition dependence graph (PG) based on data dependencies.
>>>         The vertices (PG:V) model all partitions and the edges (PG:E) model
>>>         all data dependencies between every partitions pair.  In general,
>>>         data dependence is either compilation time known or unknown.  In C
>>>         family languages, there exists quite amount compilation time unknown
>>>         dependencies because of possible alias relation of data references.
>>>         We categorize PG's edge to two types: "true" edge that represents
>>>         compilation time known data dependencies; "alias" edge for all other
>>>         data dependencies.
>>>      6) Traverse subgraph of PG as if all "alias" edges don't exist.  Merge
>>>         partitions in each strong connected component (SCC) correspondingly.
>>>         Build new PG for merged partitions.
>>>      7) Traverse PG again and this time with both "true" and "alias" edges
>>>         included.  We try to break SCCs by removing some edges.  Because
>>>         SCCs by "true" edges are all fused in step 6), we can break SCCs
>>>         by removing some "alias" edges.  It's NP-hard to choose optimal
>>>         edge set, fortunately simple approximation is good enough for us
>>>         given the small problem scale.
>>>      8) Collect all data dependencies of the removed "alias" edges.  Create
>>>         runtime alias checks for collected data dependencies.
>>>      9) Version loop under the condition of runtime alias checks.  Given
>>>         loop distribution generally introduces additional overhead, it is
>>>         only useful if vectorization is achieved in distributed loop.  We
>>>         version loop with internal function call IFN_LOOP_DIST_ALIAS.  If
>>>         no distributed loop can be vectorized, we simply remove distributed
>>>         loops and recover to the original one.
>>>
>>> Also, there are some more to improve in the future (which isn't difficult I think):
>>>    TODO:
>>>      1) We only distribute innermost loops now.  This pass should handle loop
>>>         nests in the future.
>>>      2) We only fuse partitions in SCC now.  A better fusion algorithm is
>>>         desired to minimize loop overhead, maximize parallelism and maximize
>>>
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>> Trivial updated due to changes in previous patches.  Also fixed issues
>> mentioned by Kugan.
> Rebased V3 for changes in previous patches.  Bootstap and test on
> x86_64 and aarch64.

why is ldist-12.c no longer distributed?  your comment says it doesn't expose
more "parallelism" but the point is to reduce memory bandwith requirements
which it clearly does.

Likewise for -13.c, -14.c.  -4.c may be a questionable case but the wording
of "parallelism" still confuses me.

Can you elaborate on that.  Now onto the patch:

+   Loop distribution is the dual of loop fusion.  It separates statements
+   of a loop (or loop nest) into multiple loops (or loop nests) with the
+   same loop header.  The major goal is to separate statements which may
+   be vectorized from those that can't.  This pass implements distribution
+   in the following steps:

misses the goal of being a memory stream optimization, not only a vectorization
enabler.  distributing a loop can also reduce register pressure.

You introduce ldist_alias_id in struct loop (probably in 01/n which I
didn't look
into yet).  If you don't use that please introduce it separately.

+             /* Be conservative.  If data references are not well analyzed,
+                or the two data references have the same base address and
+                offset, add dependence and consider it alias to each other.
+                In other words, the dependence can not be resolved by
+                runtime alias check.  */
+             if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2)
+                 || !DR_OFFSET (dr1) || !DR_OFFSET (dr2)
+                 || !DR_INIT (dr1) || !DR_INIT (dr2)
+                 || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP (dr1))
+                 || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP (dr2))
+                 || res == 0)

ISTR a helper that computes whether we can handle a runtime alias check for
a specific case?

+  /* Depend on vectorizer to fold IFN_LOOP_DIST_ALIAS.  */
+  if (flag_tree_loop_vectorize)
+    {

so at this point I'd condition the whole runtime-alias check generating
on flag_tree_loop_vectorize.  You seem to support versioning w/o
that here but in other places disable versioning w/o flag_tree_loop_vectorize.
That looks somewhat inconsistent...

+  /* Don't version loop if any partition is builtin.  */
+  for (i = 0; partitions->iterate (i, &partition); ++i)
+    {
+      if (partition->kind != PKIND_NORMAL)
+       break;
+    }

why's that?  Do you handle the case where only a subset of partitions
require a runtime alias check to be generated?  Thus from a loop
generate three loops, only two of them being versioned?  The
complication would be that both the runtime alias and non-alias
versions would be "transformed".  Or do we rely on recursively
distributing the distribution result, thus if we have partitions that
can be handled w/o runtime alias check fuse the remaining partitions
and recurse on those?

Otherwise the patch looks good to me.

Thanks,
Richard.

> Thanks,
> bin
>>
>> Thanks,
>> bin
>>
>> 2017-06-17  Bin Cheng  <bin.cheng@arm.com>
>>
>>     * tree-loop-distribution.c: Add general explanantion on the pass.
>>     (pg_add_dependence_edges): New parameter.  handle alias data
>>     dependence specially and record it in the parameter if asked.
>>     (struct pg_vdata, pg_edata, pg_edge_callback_data): New structs.
>>     (init_partition_graph_vertices, add_partition_graph_edge): New.
>>     (pg_skip_alias_edge, free_partition_graph_edata_cb): New.
>>     (free_partition_graph_vdata, build_partition_graph): New.
>>     (sort_partitions_by_post_order, merge_dep_scc_partitions): New.
>>     (pg_collect_alias_ddrs, break_alias_scc_partitions): New.
>>     (data_ref_segment_size, latch_dominated_by_data_ref): New.
>>     (compute_alias_check_pairs, version_loop_by_alias_check): New.
>>     (version_for_distribution_p, finalize_partitions): New.
>>     (distribute_loop): Handle alias data dependence specially.  Factor
>>     out loop fusion code as functions and call these functions.
>>
>> gcc/testsuite/ChangeLog
>> 2017-06-17  Bin Cheng  <bin.cheng@arm.com>
>>
>>     * gcc.dg/tree-ssa/ldist-4.c: Adjust test string.
>>     * gcc.dg/tree-ssa/ldist-12.c: Ditto.
>>     * gcc.dg/tree-ssa/ldist-13.c: Ditto.
>>     * gcc.dg/tree-ssa/ldist-14.c: Ditto.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]