This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution
On Wed, Jun 14, 2017 at 11:25 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Jun 14, 2017 at 10:15 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Jun 14, 2017 at 9:53 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>> Hi,
>>>>> During the work I ran into a latent bug for distributing. For the moment we sort statements
>>>>> in dominance order, but that's not enough because basic blocks may be sorted in reverse order
>>>>> of execution flow. This results in wrong data dependence direction later. This patch fixes
>>>>> the issue by sorting in topological order.
>>>>>
>>>>> Bootstrap and test on x86_64 and AArch64. Is it OK?
>>>>
>>>> I suppose you are fixing
>>>>
>>>> static int
>>>> pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir,
>>>> vec<data_reference_p> drs1,
>>>> vec<data_reference_p> drs2)
>>>> {
>>>> ...
>>>> /* Re-shuffle data-refs to be in dominator order. */
>>>> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1))
>>>> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2)))
>>>> {
>>>> std::swap (dr1, dr2);
>>>> this_dir = -this_dir;
>>>> }
>>>>
>>>> but then for stmts that are not "ordered" by RPO or DOM like
>>>>
>>>> if (flag)
>>>> ... = dr1;
>>>> else
>>>> ... = dr2;
>>>>
>>>> this doesn't avoid spurious swaps? Also the code was basically
>>> No, this is mainly for below case:
>>> if (flag)
>>> {
>>> partition1: arr[i] = x;
>>> }
>>> partition2: arr[i] = y;
>>>
>>> function pg_add_dependence_edges is like:
>>> /* Re-shuffle data-refs to be in dominator order. */
>>> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1))
>>> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2)))
>>> {
>>> std::swap (dr1, dr2);
>>> this_dir = -this_dir;
>>> }
>>> //...
>>> else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE)
>>> {
>>> if (DDR_REVERSED_P (ddr))
>>> {
>>> std::swap (dr1, dr2);
>>> this_dir = -this_dir;
>>> }
>>> /* Known dependences can still be unordered througout the
>>> iteration space, see gcc.dg/tree-ssa/ldist-16.c. */
>>> if (DDR_NUM_DIST_VECTS (ddr) != 1)
>>> this_dir = 2;
>>> /* If the overlap is exact preserve stmt order. */
>>> else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1))
>>> ;
>>> else
>>> {
>>> /* Else as the distance vector is lexicographic positive
>>> swap the dependence direction. */
>>> this_dir = -this_dir;
>>> }
>>> }
>>> For non-ZERO distance vector dependence, the second part always
>>> computes src->dest dependence info correctly, as well as edge
>>> direction of PG. For ZERO distance vector dependence, we rely on the
>>> swap part (thus topological order) to get correct dependence
>>> direction. For mentioned case, the two data references are unordered
>>> in dominance relation, but ordered in RPO. This is why DOM is not
>>> enough. For if-then-else case, the order actually doesn't matter, and
>>> the references are unordered in either dominance relation or RPO.
>>> Specifically, src->dest is always computed correctly for non-ZERO
>>> distance vector cases, no matter <dr1, dr2> or <dr2, dr1> is passed to
>>> data dependence analyzer. As for ZERO distance vector (exact
>>> overlap), the order doesn't matter either because they control
>>> dependent on the same condition. We can simply assume an arbitrary
>>> order for it.
>>
>> Ok, if it only is an issue for zero-distance then yes, I agree.
> Yeah, so for distribution, I think we can further simplify
> pg_add_dependence_edge because swap is only necessary for
> zero-distance cases.
>>
>>>> copied from tree-data-refs.c:find_data_references_in_loop which
>>>> does iterate over get_loop_body_in_dom_order as well. So isn't the
>>>> issue latent there as well?
>>> In theory maybe. In practice, this is not a problem at all since loop
>>> distribution is the only one handles control dependence so far.
>>
>> You mean the only one running into the bogus BB ordering case.
>> I don't see how handling control dependences factor in here.
> Yes, that's what I meant. The ordering issue doesn't exist if there
> is no control dependence between basic blocks in loop body I think. I
> didn't realize the autopar pass.
>>
>>>>
>>>> That said, what about those "unordered" stmts? I suppose
>>>> dependence analysis still happily computes a dependence
>>>> distance but in reality we'd have to consider both execution
>>>> orders?
>>> As explained, there is no need to consider both orders. GCC doesn't
>>> really support control dependence parallelization?
>>
>> I think autopar supports an arbitrary CFG inside the loops but as it
>> will never split them it won't change stmt ordering for zero-distance.
>>
>> That said, if dependence info can be incorrect if applied to a loop
>> we should fixup tree-data-refs.c as well. It might be useful
>> to make get_loop_body_in_rpo_order available then (and eventually
>> all _in_dom_order users can work with rpo order as well thus we
>> can replace it entirely as a second step).
> I can work on that as a followup patch. Is it OK?
Yes.
Thanks,
Richard.
> Thanks,
> bin
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> bin
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> bin
>>>>> 2017-06-07 Bin Cheng <bin.cheng@arm.com>
>>>>>
>>>>> * tree-loop-distribution.c (bb_top_order_index): New.
>>>>> (bb_top_order_index_size, bb_top_order_cmp): New.
>>>>> (stmts_from_loop): Use topological order.
>>>>> (pass_loop_distribution::execute): Compute topological order for.
>>>>> basic blocks.