[PATCH GCC][5/7]Extend loop distribution for two-level innermost loop nest

Richard Biener richard.guenther@gmail.com
Wed Oct 11 12:24:00 GMT 2017


On Wed, Oct 11, 2017 at 2:05 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Oct 9, 2017 at 2:48 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>> Hi,
>>> For now the distribution pass only handles the innermost loop.  This patch extends the pass
>>> to cover the two-level innermost loop nest.  It also refactors the code in pass_loop_distribution::execute
>>> for readability.  Note that I restrict it to a 2-level loop nest on purpose because of the high
>>> cost of data dependence computation.  Some compile-time optimizations, like reusing
>>> the data reference finding and data dependence computation, would require a rewrite of this
>>> pass like the proposed loop interchange implementation.  But that's another task.
>>>
>>> This patch introduces a temporary TODO for loop nest builtin partitions, which is covered
>>> by the next two patches.
>>>
>>> With this patch, the kernel loop in bwaves can now be distributed and is thus exposed for further
>>> interchange.  This patch adds a new test for matrix multiplication and adjusts the
>>> test strings of existing tests.
>>> Bootstrapped and tested as part of the patch set on x86_64 and AArch64.  Is it OK?
>>
>> @@ -714,9 +719,11 @@ ssa_name_has_uses_outside_loop_p (tree def, loop_p loop)
>>
>>    FOR_EACH_IMM_USE_FAST (use_p, imm_iter, def)
>>      {
>> -      gimple *use_stmt = USE_STMT (use_p);
>> -      if (!is_gimple_debug (use_stmt)
>> -         && loop != loop_containing_stmt (use_stmt))
>> +      if (is_gimple_debug (USE_STMT (use_p)))
>> +       continue;
>> +
>> +      basic_block use_bb = gimple_bb (USE_STMT (use_p));
>> +      if (use_bb == NULL || !flow_bb_inside_loop_p (loop, use_bb))
>>         return true;
>>
>> use_bb should never be NULL.
> Done.
>>
>> +      /* Don't support loop nest distribution under runtime alias check
>> +        since it's not likely to enable many vectorization opportunities.  */
>> +      if (loop->inner)
>> +       {
>> +         merge_dep_scc_partitions (rdg, &partitions, false);
>> +       }
>>
>> extra {}
> Done.
>>
>> +      /* Support loop nest distribution enclosing current innermost loop.
>> +        For the moment, we only support the innermost two-level loop nest.  */
>> +      if (flag_tree_loop_distribution
>> +         && outer->num > 0 && outer->inner == loop && loop->next == NULL
>>
>> The canonical check for is-this-non-root is loop_outer (outer) instead
>> of outer->num > 0.
> Done.
>>
>> +         && single_exit (outer)
>>
>> Not sure how exits are counted, but if the inner loop also exits the
>> outer loop, do we correctly handle/reject this case?
> I tend to believe this can be handled if it's not rejected by the
> niters/exit condition, but I am not very sure about this.
>>
>> -      if (nb_generated_loops + nb_generated_calls > 0)
>> -       {
>> -         changed = true;
>> -         dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>> -                          loc, "Loop %d distributed: split to %d loops "
>> -                          "and %d library calls.\n",
>> -                          num, nb_generated_loops, nb_generated_calls);
>> +         if (nb_generated_loops + nb_generated_calls > 0)
>> +           {
>> +             changed = true;
>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>> +                              loc, "Loop%s %d distributed: split to %d loops "
>> +                              "and %d library calls.\n",
>> +                              loop_nest_p ? " nest" : "", loop->num,
>> +                              nb_generated_loops, nb_generated_call
>> ...
>>
>> can you adjust the printfs to say "loop nest distributed" in case we distributed
>> a nest?
> Done.
>>
>> Can you rewrite the iteration over the nest so it would theoretically support
>> arbitrarily deep perfect nests?  Thus simply initialize loop_nest_p less
>> cleverly...
> Done.  I factored it out as a function "prepare_perfect_loop_nest".  I also
> tested the updated patch by enabling full loop nest distribution; there is no
> failure in bootstrap, regression tests, or SPEC benchmarks.  Of course, the
> final patch still only supports the 2-level innermost loop nest.
>
> Is this OK?

Ok.

Thanks,
Richard.
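
To illustrate the nest walk discussed above (the checks outer->inner == loop,
loop->next == NULL, loop_outer (outer) and single_exit (outer)), here is a
minimal standalone sketch.  It is not the code of prepare_perfect_loop_nest
from the patch: struct toy_loop, toy_nest_root and max_depth are hypothetical
names invented for this example, and only the structural conditions quoted in
this thread are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loop tree node; this is NOT GCC's struct loop.  */
struct toy_loop {
    struct toy_loop *outer;   /* enclosing loop, NULL for the tree root */
    struct toy_loop *inner;   /* first child loop, NULL if innermost */
    struct toy_loop *next;    /* next sibling loop, NULL if none */
    bool single_exit;         /* models the result of a single_exit () query */
};

/* Walk outward from INNERMOST and return the outermost loop of the nest
   that satisfies the structural checks, limited to MAX_DEPTH levels.
   With max_depth == 2 this models the "two-level innermost loop nest"
   restriction; a larger value models arbitrarily deep nests.  */
struct toy_loop *toy_nest_root(struct toy_loop *innermost, int max_depth)
{
    assert(innermost != NULL && innermost->inner == NULL);

    struct toy_loop *loop = innermost;
    int depth = 1;
    while (depth < max_depth) {
        struct toy_loop *outer = loop->outer;
        /* Stop if there is no enclosing loop or it is the tree root
           (the analogue of loop_outer (outer) being NULL).  */
        if (outer == NULL || outer->outer == NULL)
            break;
        /* Stop if OUTER contains any loop other than LOOP.  */
        if (outer->inner != loop || loop->next != NULL)
            break;
        /* Stop if OUTER has more than one exit.  */
        if (!outer->single_exit)
            break;
        loop = outer;
        depth++;
    }
    return loop;
}
```

Keeping the depth limit as a parameter means the two-level restriction is a
policy choice at the call site rather than something baked into the walk.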

> Thanks,
> bin
> 2017-10-04  Bin Cheng  <bin.cheng@arm.com>
>
>     * tree-loop-distribution.c: Adjust the general comment.
>     (NUM_PARTITION_THRESHOLD): New macro.
>     (ssa_name_has_uses_outside_loop_p): Support loop nest distribution.
>     (classify_partition): Skip builtin pattern of loop nest's inner loop.
>     (merge_dep_scc_partitions): New parameter ignore_alias_p and use it
>     in call to build_partition_graph.
>     (finalize_partitions): New parameter.  Make loop distribution more
>     conservative by fusing more partitions.
>     (distribute_loop): Don't do runtime alias check in case of loop nest
>     distribution.
>     (find_seed_stmts_for_distribution): New function.
>     (prepare_perfect_loop_nest): New function.
>     (pass_loop_distribution::execute): Refactor code finding seed stmts
>     and loop nest into above functions.  Support loop nest distribution.
>     Adjust dump information accordingly.
>
> gcc/testsuite/ChangeLog
> 2017-10-04  Bin Cheng  <bin.cheng@arm.com>
>
>     * gcc.dg/tree-ssa/ldist-7.c: Adjust test string.
>     * gcc.dg/tree-ssa/ldist-16.c: Ditto.
>     * gcc.dg/tree-ssa/ldist-25.c: Ditto.
>     * gcc.dg/tree-ssa/ldist-33.c: New test.
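
As an illustration of what distributing a two-level loop nest means, the
following hand-written sketch shows a matrix multiplication kernel before and
after the zeroing statement is split from the accumulation.  It is not the
content of ldist-33.c; the function names and N are made up for the example,
and whether the pass performs exactly this split on such a kernel is an
assumption.

```c
#include <assert.h>
#include <string.h>

#define N 8

/* Before: for each i, the two-level (j, k) nest both zeroes c[i][j]
   and accumulates into it.  */
void matmul_fused(double c[N][N], double a[N][N], double b[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            c[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}

/* After: the j loop is distributed into a zeroing loop and a separate
   (j, k) accumulation nest.  The zeroing of every c[i][j] still happens
   before any accumulation into it, so the result is unchanged.  */
void matmul_distributed(double c[N][N], double a[N][N], double b[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            c[i][j] = 0.0;
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
    }
}

/* Self-check: both versions must produce bit-identical results, since
   each element sees the same operations in the same order.  */
void check_equivalence(void)
{
    static double a[N][N], b[N][N], c1[N][N], c2[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = i + 2 * j;
            b[i][j] = i - j;
        }
    matmul_fused(c1, a, b);
    matmul_distributed(c2, a, b);
    assert(memcmp(c1, c2, sizeof c1) == 0);
}
```

In the distributed form the accumulation nest stands on its own, which is the
kind of shape the thread describes as "exposed for further interchange".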


