This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Bin Cheng <Bin dot Cheng at arm dot com>, Michael Matz <matz at suse dot de>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Fri, 15 Dec 2017 12:55:31 +0100
- Subject: Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization
- Authentication-results: sourceware.org; auth=none
- References: <DB5PR0801MB27422285F855F93CFAEA210BE70B0@DB5PR0801MB2742.eurprd08.prod.outlook.com>
On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> As explained in the PR, given below test case:
> int a[8][10] = { [2][5] = 4 }, c;
>
> int
> main ()
> {
> short b;
> int i, d;
> for (b = 4; b >= 0; b--)
> for (c = 0; c <= 6; c++)
> a[c + 1][b + 2] = a[c][b + 1];
> for (i = 0; i < 8; i++)
> for (d = 0; d < 10; d++)
> if (a[i][d] != (i == 3 && d == 6) * 4)
> __builtin_abort ();
> return 0;
>
> the loop nest is illegal for vectorization without reversing inner loop. The issue
> is in data dependence checking of vectorizer, I believe the mentioned revision just
> exposed this. Previously the vectorization is skipped because of unsupported memory
> operation. The outer loop vectorization unrolls the outer loop into:
>
> for (b = 4; b > 0; b -= 4)
> {
> for (c = 0; c <= 6; c++)
> a[c + 1][6] = a[c][5];
> for (c = 0; c <= 6; c++)
> a[c + 1][5] = a[c][4];
> for (c = 0; c <= 6; c++)
> a[c + 1][4] = a[c][3];
> for (c = 0; c <= 6; c++)
> a[c + 1][3] = a[c][2];
> }
> Then four inner loops are fused into:
> for (b = 4; b > 0; b -= 4)
> {
> for (c = 0; c <= 6; c++)
> {
> a[c + 1][6] = a[c][5]; // S1
> a[c + 1][5] = a[c][4]; // S2
> a[c + 1][4] = a[c][3];
> a[c + 1][3] = a[c][2];
> }
> }
Note that they are not really "fused" but they are interleaved. With
GIMPLE in mind
that makes a difference, you should get the equivalent of
for (c = 0; c <= 6; c++)
{
tem1 = a[c][5];
tem2 = a[c][4];
tem3 = a[c][3];
tem4 = a[c][2];
a[c+1][6] = tem1;
a[c +1][5] = tem2;
a[c+1][4] = tem3;
a[c+1][3] = tem4;
}
> The loop fusion needs to meet the dependence requirement. Basically, GCC's data
> dependence analyzer does not model dep between references in sibling loops, but
> in practice, fusion requirement can be checked by analyzing all data references
> after fusion, and there is no backward data dependence.
>
> Apparently, the requirement is violated because we have backward data dependence
> between references (a[c][5], a[c+1][5]) in S1/S2. Note, if we reverse the inner
> loop, the outer loop would become legal for vectorization.
>
> This patch fixes the issue by enforcing dependence check. It also adds two tests
> with one shouldn't be vectorized and the other should. Bootstrap and test on x86_64
> and AArch64. Is it OK?
I think you have identified the spot where things go wrong but I'm not
sure you fix the
problem fully. The spot you pacth is (loop is the outer loop):
loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
...
FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
{
int dist = dist_v[loop_depth];
...
if (dist > 0 && DDR_REVERSED_P (ddr))
{
/* If DDR_REVERSED_P the order of the data-refs in DDR was
reversed (to make distance vector positive), and the actual
distance is negative. */
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"dependence distance negative.\n");
where you add
+ /* When doing outer loop vectorization, we need to check if there is
+ backward dependence at inner loop level if dependence at the outer
+ loop is reversed. See PR81740 for more information. */
+ if (nested_in_vect_loop_p (loop, DR_STMT (dra))
+ || nested_in_vect_loop_p (loop, DR_STMT (drb)))
+ {
+ unsigned inner_depth = index_in_loop_nest (loop->inner->num,
+ DDR_LOOP_NEST (ddr));
+ if (dist_v[inner_depth] < 0)
+ return true;
+ }
but I don't understand how the dependence direction with respect to the
outer loop matters here.
Given there's DDR_REVERSED on the outer loop distance what does that
mean for the inner loop distance given the quite non-obvious code handling
this case in tree-data-ref.c:
/* Verify a basic constraint: classic distance vectors should
always be lexicographically positive.
Data references are collected in the order of execution of
the program, thus for the following loop
| for (i = 1; i < 100; i++)
| for (j = 1; j < 100; j++)
| {
| t = T[j+1][i-1]; // A
| T[j][i] = t + 2; // B
| }
references are collected following the direction of the wind:
A then B. The data dependence tests are performed also
following this order, such that we're looking at the distance
separating the elements accessed by A from the elements later
accessed by B. But in this example, the distance returned by
test_dep (A, B) is lexicographically negative (-1, 1), that
means that the access A occurs later than B with respect to
the outer loop, ie. we're actually looking upwind. In this
case we solve test_dep (B, A) looking downwind to the
lexicographically positive solution, that returns the
distance vector (1, -1). */
if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr)))
{
lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
return false;
compute_subscript_distance (ddr);
if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b,
&index_carry))
return false;
save_dist_v (ddr, save_v);
DDR_REVERSED_P (ddr) = true;
that is, what does dist_v[inner_depth] < 0 tell us here? Does it tell us
that the dependence direction is reverse of the outer loop?
CCing Micha who might remember some more details from unroll-and-jam.
Thanks,
Richard.
> Thanks,
> bin
> 2017-12-15 Bin Cheng <bin.cheng@arm.com>
>
> PR tree-optimization/81740
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): In case
> of outer loop vectorization, check backward dependence at inner loop
> if dependence at outer loop is reversed.
>
> gcc/testsuite
> 2017-12-15 Bin Cheng <bin.cheng@arm.com>
>
> PR tree-optimization/81740
> * gcc.dg/vect/pr81740-1.c: New test.
> * gcc.dg/vect/pr81740-2.c: Refine test.