[PATCH PR81740]Enforce dependence check for outer loop vectorization

Bin.Cheng amker.cheng@gmail.com
Fri Dec 15 12:09:00 GMT 2017


On Fri, Dec 15, 2017 at 11:55 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> As explained in the PR, given below test case:
>> int a[8][10] = { [2][5] = 4 }, c;
>>
>> int
>> main ()
>> {
>>   short b;
>>   int i, d;
>>   for (b = 4; b >= 0; b--)
>>     for (c = 0; c <= 6; c++)
>>       a[c + 1][b + 2] = a[c][b + 1];
>>   for (i = 0; i < 8; i++)
>>     for (d = 0; d < 10; d++)
>>       if (a[i][d] != (i == 3 && d == 6) * 4)
>>         __builtin_abort ();
>>   return 0;
>>
>> the loop nest is illegal for vectorization without reversing inner loop.  The issue
>> is in data dependence checking of vectorizer, I believe the mentioned revision just
>> exposed this.  Previously the vectorization is skipped because of unsupported memory
>> operation.  The outer loop vectorization unrolls the outer loop into:
>>
>>   for (b = 4; b > 0; b -= 4)
>>   {
>>     for (c = 0; c <= 6; c++)
>>       a[c + 1][6] = a[c][5];
>>     for (c = 0; c <= 6; c++)
>>       a[c + 1][5] = a[c][4];
>>     for (c = 0; c <= 6; c++)
>>       a[c + 1][4] = a[c][3];
>>     for (c = 0; c <= 6; c++)
>>       a[c + 1][3] = a[c][2];
>>   }
>> Then four inner loops are fused into:
>>   for (b = 4; b > 0; b -= 4)
>>   {
>>     for (c = 0; c <= 6; c++)
>>     {
>>       a[c + 1][6] = a[c][5];  // S1
>>       a[c + 1][5] = a[c][4];  // S2
>>       a[c + 1][4] = a[c][3];
>>       a[c + 1][3] = a[c][2];
>>     }
>>   }
>
> Note that they are not really "fused" but they are interleaved.  With
> GIMPLE in mind
> that makes a difference, you should get the equivalent of
>
>    for (c = 0; c <= 6; c++)
>      {
>        tem1 = a[c][5];
>        tem2 = a[c][4];
>        tem3 = a[c][3];
>        tem4 = a[c][2];
>        a[c+1][6] = tem1;
>        a[c +1][5] = tem2;
>         a[c+1][4] = tem3;
>         a[c+1][3] = tem4;
>      }
Yeah, I will double check if this abstract breaks the patch and how.

>
>> The loop fusion needs to meet the dependence requirement.  Basically, GCC's data
>> dependence analyzer does not model dep between references in sibling loops, but
>> in practice, fusion requirement can be checked by analyzing all data references
>> after fusion, and there is no backward data dependence.
>>
>> Apparently, the requirement is violated because we have backward data dependence
>> between references (a[c][5], a[c+1][5]) in S1/S2.  Note, if we reverse the inner
>> loop, the outer loop would become legal for vectorization.
>>
>> This patch fixes the issue by enforcing dependence check.  It also adds two tests
>> with one shouldn't be vectorized and the other should.  Bootstrap and test on x86_64
>> and AArch64.  Is it OK?
>
> I think you have identified the spot where things go wrong but I'm not
> sure you fix the
> problem fully.  The spot you pacth is (loop is the outer loop):
>
>   loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
> ...
>   FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
>     {
>       int dist = dist_v[loop_depth];
> ...
>       if (dist > 0 && DDR_REVERSED_P (ddr))
>         {
>           /* If DDR_REVERSED_P the order of the data-refs in DDR was
>              reversed (to make distance vector positive), and the actual
>              distance is negative.  */
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                              "dependence distance negative.\n");
>
> where you add
>
> +         /* When doing outer loop vectorization, we need to check if there is
> +            backward dependence at inner loop level if dependence at the outer
> +            loop is reversed.  See PR81740 for more information.  */
> +         if (nested_in_vect_loop_p (loop, DR_STMT (dra))
> +             || nested_in_vect_loop_p (loop, DR_STMT (drb)))
> +           {
> +             unsigned inner_depth = index_in_loop_nest (loop->inner->num,
> +                                                        DDR_LOOP_NEST (ddr));
> +             if (dist_v[inner_depth] < 0)
> +               return true;
> +           }
>
> but I don't understand how the dependence direction with respect to the
> outer loop matters here.
If the direction wrto outer loop is positive by itself, i.e,
reversed_p equals to false, then dist is checked against max_vf.  In
this case, it's not possible to have references refer to the same
object?
On the other hand, dist is not checked at all for reversed case.
Maybe an additional check "dist < max_vf" can relax the patch a bit.
>
> Given there's DDR_REVERSED on the outer loop distance what does that
> mean for the inner loop distance given the quite non-obvious code handling
> this case in tree-data-ref.c:
>
>       /* Verify a basic constraint: classic distance vectors should
>          always be lexicographically positive.
>
>          Data references are collected in the order of execution of
>          the program, thus for the following loop
>
>          | for (i = 1; i < 100; i++)
>          |   for (j = 1; j < 100; j++)
>          |     {
>          |       t = T[j+1][i-1];  // A
>          |       T[j][i] = t + 2;  // B
>          |     }
>
>          references are collected following the direction of the wind:
>          A then B.  The data dependence tests are performed also
>          following this order, such that we're looking at the distance
>          separating the elements accessed by A from the elements later
>          accessed by B.  But in this example, the distance returned by
>          test_dep (A, B) is lexicographically negative (-1, 1), that
>          means that the access A occurs later than B with respect to
>          the outer loop, ie. we're actually looking upwind.  In this
>          case we solve test_dep (B, A) looking downwind to the
>          lexicographically positive solution, that returns the
>          distance vector (1, -1).  */
>       if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr)))
>         {
>           lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
>           if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
>             return false;
>           compute_subscript_distance (ddr);
>           if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b,
>                                             &index_carry))
>             return false;
>           save_dist_v (ddr, save_v);
>           DDR_REVERSED_P (ddr) = true;
>
> that is, what does dist_v[inner_depth] < 0 tell us here?  Does it tell us
> that the dependence direction is reverse of the outer loop?
Hmm, this part of code is a bit confusing for me.  So we always
compute classic distance by sorting two data references in lexico
order, which could be the opposite given we collect references by dom
order.  IIUC, the negative dist at inner loop means a backward
dependence for that index/loop.  It's not related to outer loop or the
reversed_p flag.

Thanks,
bin
>
> CCing Micha who might remember some more details from unroll-and-jam.
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> 2017-12-15  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/81740
>>         * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): In case
>>         of outer loop vectorization, check backward dependence at inner loop
>>         if dependence at outer loop is reversed.
>>
>> gcc/testsuite
>> 2017-12-15  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/81740
>>         * gcc.dg/vect/pr81740-1.c: New test.
>>         * gcc.dg/vect/pr81740-2.c: Refine test.



More information about the Gcc-patches mailing list