This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization


On Fri, Dec 15, 2017 at 1:19 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 1:35 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Fri, Dec 15, 2017 at 12:09 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Fri, Dec 15, 2017 at 11:55 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>> Hi,
>>>>> As explained in the PR, given below test case:
>>>>> int a[8][10] = { [2][5] = 4 }, c;
>>>>>
>>>>> int
>>>>> main ()
>>>>> {
>>>>>   short b;
>>>>>   int i, d;
>>>>>   for (b = 4; b >= 0; b--)
>>>>>     for (c = 0; c <= 6; c++)
>>>>>       a[c + 1][b + 2] = a[c][b + 1];
>>>>>   for (i = 0; i < 8; i++)
>>>>>     for (d = 0; d < 10; d++)
>>>>>       if (a[i][d] != (i == 3 && d == 6) * 4)
>>>>>         __builtin_abort ();
>>>>>   return 0;
>>>>>
>>>>> the loop nest is illegal for vectorization without reversing inner loop.  The issue
>>>>> is in data dependence checking of vectorizer, I believe the mentioned revision just
>>>>> exposed this.  Previously the vectorization is skipped because of unsupported memory
>>>>> operation.  The outer loop vectorization unrolls the outer loop into:
>>>>>
>>>>>   for (b = 4; b > 0; b -= 4)
>>>>>   {
>>>>>     for (c = 0; c <= 6; c++)
>>>>>       a[c + 1][6] = a[c][5];
>>>>>     for (c = 0; c <= 6; c++)
>>>>>       a[c + 1][5] = a[c][4];
>>>>>     for (c = 0; c <= 6; c++)
>>>>>       a[c + 1][4] = a[c][3];
>>>>>     for (c = 0; c <= 6; c++)
>>>>>       a[c + 1][3] = a[c][2];
>>>>>   }
>>>>> Then four inner loops are fused into:
>>>>>   for (b = 4; b > 0; b -= 4)
>>>>>   {
>>>>>     for (c = 0; c <= 6; c++)
>>>>>     {
>>>>>       a[c + 1][6] = a[c][5];  // S1
>>>>>       a[c + 1][5] = a[c][4];  // S2
>>>>>       a[c + 1][4] = a[c][3];
>>>>>       a[c + 1][3] = a[c][2];
>>>>>     }
>>>>>   }
>>>>
>>>> Note that they are not really "fused" but they are interleaved.  With
>>>> GIMPLE in mind
>>>> that makes a difference, you should get the equivalent of
>>>>
>>>>    for (c = 0; c <= 6; c++)
>>>>      {
>>>>        tem1 = a[c][5];
>>>>        tem2 = a[c][4];
>>>>        tem3 = a[c][3];
>>>>        tem4 = a[c][2];
>>>>        a[c+1][6] = tem1;
>>>>        a[c +1][5] = tem2;
>>>>         a[c+1][4] = tem3;
>>>>         a[c+1][3] = tem4;
>>>>      }
>>> Yeah, I will double check if this abstract breaks the patch and how.
>> Hmm, I think this doesn't break it, well at least for part of the
>> analysis, because it is loop carried (backward) dependence goes wrong,
>> interleaving or not with the same iteration doesn't matter here.
>
> I think the idea is that forward dependences are always fine (negative distance)
> to vectorize.  But with backward dependences we have to adhere to max_vf.
>
> It looks like for outer loop vectorization we only look at the distances in the
> outer loop but never at inner ones.  But here the same applies but isn't that
> independend on the distances with respect to the outer loop?
>
> But maybe I'm misunderstanding how "distances" work here.
Hmm, I am not sure I understand "distance" correctly.  With
description as in book like "Optimizing compilers for Modern
Architectures", distance is "# of iteration of sink ref - # of
iteration of source ref".  Given below example:
  for (i = 0; i < N; ++i)
    {
      x = arr[idx_1];  // S1
      arr[idx_2] = x;  // S2
    }
if S1 is source ref, distance = idx_2 - idx_1, and distance > 0.  Also
this is forward dependence.  For example, idx_1 is i + 1 and idx_2 is
i;
If S2 is source ref, distance = idx_1 - idx_2, and distance < 0.  Also
this is backward dependence.  For example idx_1 is i and idx_2 is i +
1;

In GCC, we always try to subtract idx_2 from idx_1 first in computing
classic distance, we could result in negative distance in case of
backward dependence.  When this happens at dependence carried level,
we manually reverse it.  When this happens at inner level loop, we
simply keep the negative distance.  And it's meaningless to talk about
forward/backward given dependence is carried at outer level loop.

Outer loop vectorization is interesting.  The problematic case has
backward dependence carried by outer loop.  Because we don't check
dist vs. max_vf for such dep, the unrolled references could have outer
loop index equal to each other, as in a[c][5] and a[c+1][5].  So it's
like we have outer loop index resolved as equal.  Now it has
dependence only if c == c' + 1.  I think previous comment on fusion
still hold for this and we now need to check backward dependence
between the two reference at inner level loop.  I guess it's a bit
trick to model dependence between a[c][5] and a[c+1][5] using the
original references and dependence.  I think it's okay to do that
because distance of a[c][5] and a[c+1][5] is what we computed
previously for the original references at inner level loop.

Not sure if I have missed something important.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
>>>
>>>>
>>>>> The loop fusion needs to meet the dependence requirement.  Basically, GCC's data
>>>>> dependence analyzer does not model dep between references in sibling loops, but
>>>>> in practice, fusion requirement can be checked by analyzing all data references
>>>>> after fusion, and there is no backward data dependence.
>>>>>
>>>>> Apparently, the requirement is violated because we have backward data dependence
>>>>> between references (a[c][5], a[c+1][5]) in S1/S2.  Note, if we reverse the inner
>>>>> loop, the outer loop would become legal for vectorization.
>>>>>
>>>>> This patch fixes the issue by enforcing dependence check.  It also adds two tests
>>>>> with one shouldn't be vectorized and the other should.  Bootstrap and test on x86_64
>>>>> and AArch64.  Is it OK?
>>>>
>>>> I think you have identified the spot where things go wrong but I'm not
>>>> sure you fix the
>>>> problem fully.  The spot you pacth is (loop is the outer loop):
>>>>
>>>>   loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
>>>> ...
>>>>   FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
>>>>     {
>>>>       int dist = dist_v[loop_depth];
>>>> ...
>>>>       if (dist > 0 && DDR_REVERSED_P (ddr))
>>>>         {
>>>>           /* If DDR_REVERSED_P the order of the data-refs in DDR was
>>>>              reversed (to make distance vector positive), and the actual
>>>>              distance is negative.  */
>>>>           if (dump_enabled_p ())
>>>>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>>>                              "dependence distance negative.\n");
>>>>
>>>> where you add
>>>>
>>>> +         /* When doing outer loop vectorization, we need to check if there is
>>>> +            backward dependence at inner loop level if dependence at the outer
>>>> +            loop is reversed.  See PR81740 for more information.  */
>>>> +         if (nested_in_vect_loop_p (loop, DR_STMT (dra))
>>>> +             || nested_in_vect_loop_p (loop, DR_STMT (drb)))
>>>> +           {
>>>> +             unsigned inner_depth = index_in_loop_nest (loop->inner->num,
>>>> +                                                        DDR_LOOP_NEST (ddr));
>>>> +             if (dist_v[inner_depth] < 0)
>>>> +               return true;
>>>> +           }
>>>>
>>>> but I don't understand how the dependence direction with respect to the
>>>> outer loop matters here.
>>> If the direction wrto outer loop is positive by itself, i.e,
>>> reversed_p equals to false, then dist is checked against max_vf.  In
>>> this case, it's not possible to have references refer to the same
>>> object?
>>> On the other hand, dist is not checked at all for reversed case.
>>> Maybe an additional check "dist < max_vf" can relax the patch a bit.
>>>>
>>>> Given there's DDR_REVERSED on the outer loop distance what does that
>>>> mean for the inner loop distance given the quite non-obvious code handling
>>>> this case in tree-data-ref.c:
>>>>
>>>>       /* Verify a basic constraint: classic distance vectors should
>>>>          always be lexicographically positive.
>>>>
>>>>          Data references are collected in the order of execution of
>>>>          the program, thus for the following loop
>>>>
>>>>          | for (i = 1; i < 100; i++)
>>>>          |   for (j = 1; j < 100; j++)
>>>>          |     {
>>>>          |       t = T[j+1][i-1];  // A
>>>>          |       T[j][i] = t + 2;  // B
>>>>          |     }
>>>>
>>>>          references are collected following the direction of the wind:
>>>>          A then B.  The data dependence tests are performed also
>>>>          following this order, such that we're looking at the distance
>>>>          separating the elements accessed by A from the elements later
>>>>          accessed by B.  But in this example, the distance returned by
>>>>          test_dep (A, B) is lexicographically negative (-1, 1), that
>>>>          means that the access A occurs later than B with respect to
>>>>          the outer loop, ie. we're actually looking upwind.  In this
>>>>          case we solve test_dep (B, A) looking downwind to the
>>>>          lexicographically positive solution, that returns the
>>>>          distance vector (1, -1).  */
>>>>       if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr)))
>>>>         {
>>>>           lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
>>>>           if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
>>>>             return false;
>>>>           compute_subscript_distance (ddr);
>>>>           if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b,
>>>>                                             &index_carry))
>>>>             return false;
>>>>           save_dist_v (ddr, save_v);
>>>>           DDR_REVERSED_P (ddr) = true;
>>>>
>>>> that is, what does dist_v[inner_depth] < 0 tell us here?  Does it tell us
>>>> that the dependence direction is reverse of the outer loop?
>>> Hmm, this part of code is a bit confusing for me.  So we always
>>> compute classic distance by sorting two data references in lexico
>>> order, which could be the opposite given we collect references by dom
>>> order.  IIUC, the negative dist at inner loop means a backward
>>> dependence for that index/loop.  It's not related to outer loop or the
>>> reversed_p flag.
>>>
>>> Thanks,
>>> bin
>>>>
>>>> CCing Micha who might remember some more details from unroll-and-jam.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> bin
>>>>> 2017-12-15  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>         PR tree-optimization/81740
>>>>>         * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): In case
>>>>>         of outer loop vectorization, check backward dependence at inner loop
>>>>>         if dependence at outer loop is reversed.
>>>>>
>>>>> gcc/testsuite
>>>>> 2017-12-15  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>         PR tree-optimization/81740
>>>>>         * gcc.dg/vect/pr81740-1.c: New test.
>>>>>         * gcc.dg/vect/pr81740-2.c: Refine test.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]