This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution


On Wed, Jun 14, 2017 at 11:25 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Jun 14, 2017 at 10:15 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Jun 14, 2017 at 9:53 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>> Hi,
>>>>> During the work I ran into a latent bug for distributing.  For the moment we sort statements
>>>>> in dominance order, but that's not enough because basic blocks may be sorted in reverse order
>>>>> of execution flow.  This results in wrong data dependence direction later.  This patch fixes
>>>>> the issue by sorting in topological order.
>>>>>
>>>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>>
>>>> I suppose you are fixing
>>>>
>>>> static int
>>>> pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir,
>>>>                          vec<data_reference_p> drs1,
>>>>                          vec<data_reference_p> drs2)
>>>> {
>>>> ...
>>>>         /* Re-shuffle data-refs to be in dominator order.  */
>>>>         if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1))
>>>>             > rdg_vertex_for_stmt (rdg, DR_STMT (dr2)))
>>>>           {
>>>>             std::swap (dr1, dr2);
>>>>             this_dir = -this_dir;
>>>>           }
>>>>
>>>> but then for stmts that are not "ordered" by RPO or DOM like
>>>>
>>>>    if (flag)
>>>>      ... = dr1;
>>>>    else
>>>>      ... = dr2;
>>>>
>>>> this doesn't avoid spurious swaps?  Also the code was basically
>>> No, this is mainly for below case:
>>>   if (flag)
>>>     {
>>>       partition1: arr[i] = x;
>>>     }
>>>   partition2: arr[i] = y;
>>>
>>> function pg_add_dependence_edges is like:
>>>     /* Re-shuffle data-refs to be in dominator order. */
>>>     if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1))
>>>         > rdg_vertex_for_stmt (rdg, DR_STMT (dr2)))
>>>       {
>>>         std::swap (dr1, dr2);
>>>         this_dir = -this_dir;
>>>       }
>>>     //...
>>>     else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE)
>>>       {
>>>          if (DDR_REVERSED_P (ddr))
>>>            {
>>>              std::swap (dr1, dr2);
>>>              this_dir = -this_dir;
>>>            }
>>>          /* Known dependences can still be unordered througout the
>>>             iteration space, see gcc.dg/tree-ssa/ldist-16.c.  */
>>>          if (DDR_NUM_DIST_VECTS (ddr) != 1)
>>>            this_dir = 2;
>>>          /* If the overlap is exact preserve stmt order.  */
>>>          else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1))
>>>            ;
>>>          else
>>>            {
>>>              /* Else as the distance vector is lexicographic positive
>>>                 swap the dependence direction.  */
>>>              this_dir = -this_dir;
>>>            }
>>>       }
>>> For non-ZERO distance vector dependence, the second part always
>>> computes src->dest dependence info correctly, as well as edge
>>> direction of PG.  For ZERO distance vector dependence, we rely on the
>>> swap part (thus topological order) to get correct dependence
>>> direction.  For mentioned case, the two data references are unordered
>>> in dominance relation, but ordered in RPO.  This is why DOM is not
>>> enough.  For if-then-else case, the order actually doesn't matter, and
>>> the references are unordered in either dominance relation or RPO.
>>> Specifically, src->dest is always computed correctly for non-ZERO
>>> distance vector cases, no matter <dr1, dr2> or <dr2, dr1> is passed to
>>> data dependence analyzer.  As for ZERO distance vector (exact
>>> overlap), the order doesn't matter either because they control
>>> dependent on the same condition.  We can simply assume an arbitrary
>>> order for it.
>>
>> Ok, if it only is an issue for zero-distance then yes, I agree.
> Yeah, so for distribution, I think we can further simplify
> pg_add_dependence_edge because swap is only necessary for
> zero-distance cases.
>>
>>>> copied from tree-data-refs.c:find_data_references_in_loop which
>>>> does iterate over get_loop_body_in_dom_order as well.  So isn't the
>>>> issue latent there as well?
>>> In theory maybe.  In practice, this is not a problem at all since loop
>>> distribution is the only one handles control dependence so far.
>>
>> You mean the only one running into the bogus BB ordering case.
>> I don't see how handling control dependences factor in here.
> Yes, that's what I meant.  The ordering issue doesn't exist if there
> is no control dependence between basic blocks in loop body I think.  I
> didn't realize the autopar pass.
>>
>>>>
>>>> That said, what about those "unordered" stmts?  I suppose
>>>> dependence analysis still happily computes a dependence
>>>> distance but in reality we'd have to consider both execution
>>>> orders?
>>> As explained, there is no need to consider both orders.  GCC doesn't
>>> really support control dependence parallelization?
>>
>> I think autopar supports an arbitrary CFG inside the loops but as it
>> will never split them it won't change stmt ordering for zero-distance.
>>
>> That said, if dependence info can be incorrect if applied to a loop
>> we should fixup tree-data-refs.c as well.  It might be useful
>> to make get_loop_body_in_rpo_order available then (and eventually
>> all _in_dom_order users can work with rpo order as well thus we
>> can replace it entirely as a second step).
> I can work on that as a followup patch.  Is it OK?

Yes.

Thanks,
Richard.

> Thanks,
> bin
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> bin
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> bin
>>>>> 2017-06-07  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>         * tree-loop-distribution.c (bb_top_order_index): New.
>>>>>         (bb_top_order_index_size, bb_top_order_cmp): New.
>>>>>         (stmts_from_loop): Use topological order.
>>>>>         (pass_loop_distribution::execute): Compute topological order for.
>>>>>         basic blocks.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]