[PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices
Richard Biener
rguenther@suse.de
Tue Jan 12 11:22:00 GMT 2016
On Tue, 12 Jan 2016, Tom de Vries wrote:
> Hi,
>
> This patch fixes PR69110, a wrong-code bug in autopar.
>
>
> I.
>
> consider testcase test.c:
> ...
> #define N 1000
>
> unsigned int i = 0;
>
> static void __attribute__((noinline, noclone))
> foo (void)
> {
> unsigned int z;
>
> for (z = 0; z < N; ++z)
> ++i;
> }
>
> extern void abort (void);
>
> int
> main (void)
> {
> foo ();
> if (i != N)
> abort ();
>
> return 0;
> }
> ...
>
> When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the test
> fails:
> ...
> $ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd
> -P)//install/lib64 -fno-tree-loop-im
> $ ./a.out
> Aborted (core dumped)
> $
> ...
>
>
> II.
>
> Before parloops, at ivcanon we have the loop body:
> ...
> <bb 3>:
> # z_10 = PHI <z_7(4), 0(2)>
> # ivtmp_12 = PHI <ivtmp_2(4), 1000(2)>
> i.1_4 = i;
> _5 = i.1_4 + 1;
> i = _5;
> z_7 = z_10 + 1;
> ivtmp_2 = ivtmp_12 - 1;
> if (ivtmp_2 != 0)
> goto <bb 4>;
> else
> goto <bb 5>;
> ...
>
> There's a loop-carried dependency in i, that is, the read from i in iteration
> z == 1 depends on the write to i in iteration z == 0. So the loop cannot be
> parallelized. The test-case fails because parloops still parallelizes the
> loop.
>
>
> III.
>
> Since the loop carried dependency is in-memory, it is not handled by the code
> analyzing reductions, since that code ignores the virtual phi.
>
> So, AFAIU, this loop carried dependency should be handled by the dependency
> testing in loop_parallel_p. And loop_parallel_p returns true for this loop.
>
> A comment in loop_parallel_p reads: "Check for problems with dependences. If
> the loop can be reversed, the iterations are independent."
>
> AFAIU, the loop order can actually be reversed. But, it cannot be executed in
> parallel.
>
> So from this perspective, it seems in this case the comment matches the check,
> but the check is not sufficient.
>
>
> IV.
>
> OTOH, if we replace the declaration of i with i[1], and replace the references
> of i with i[0], we see that loop_parallel_p fails. So the loop_parallel_p
> check in this case seems sufficient, and there's something else that causes
> the check to fail in this case.
>
> The difference is in the generated data ref:
> - in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to
> vector with a single element: access function 0.
> - in the 'i' case, we set DR_ACCESS_FNS to NULL.
>
> This difference causes different handling in the dependency generation, in
> particular in add_distance_for_zero_overlaps which has no effect for the 'i'
> case because DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of the NULL
> access_fns of both the source and sink data refs).
>
> From this perspective, it seems that the loop_parallel_p check is sufficient,
> and that dr_analyze_indices shouldn't return a NULL access_fns for 'i'.
>
>
> V.
>
> When compiling with graphite using -floop-parallelize-all --param
> graphite-min-loops-per-function=1, we find:
> ...
> [scop-detection-fail] Graphite cannot handle data-refs in stmt:
> # VUSE <.MEM_11>
> i.1_4 = i;
> ...
>
> The function scop_detection::stmt_has_simple_data_refs_p returns false because
> of the code recently added for PR66980 at r228357:
> ...
> int nb_subscripts = DR_NUM_DIMENSIONS (dr);
>
> if (nb_subscripts < 1)
> {
> free_data_refs (drs);
> return false;
> }
> ...
>
> [ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ]
>
> This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis has
> failed'.
>
> From this perspective, it seems that the dependence handling should bail out
> once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or DR_ACCESS_FNS ==
> 0).
>
>
> VI.
>
> This test-case used to pass in 4.6 because in find_data_references_in_stmt we
> had:
> ...
> /* FIXME -- data dependence analysis does not work correctly for
> objects with invariant addresses in loop nests. Let us fail
> here until the problem is fixed. */
> if (dr_address_invariant_p (dr) && nest)
> {
> free_data_ref (dr);
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file,
> "\tFAILED as dr address is invariant\n");
> ret = false;
> break;
> }
> ...
>
> That FIXME was removed in the fix for PR46787, at r175704.
>
> The test-case fails in 4.8, and I guess from there onwards.
>
>
> VII.
>
> The attached patch fixes the problem by returning a zero access function for
> 'i' in dr_analyze_indices.
>
> [ But I can also imagine a solution similar to the graphite fix:
> ...
> @@ -3997,6 +3999,12 @@ find_data_references_in_stmt
> dr = create_data_ref (nest, loop_containing_stmt (stmt),
> ref->ref, stmt, ref->is_read);
> gcc_assert (dr != NULL);
> + if (DR_NUM_DIMENSIONS (dr) == 0)
> + {
> + datarefs->release ();
> + return false;
> + }
> +
> datarefs->safe_push (dr);
> }
> references.release ();
> ...
>
> I'm not familiar enough with the dependency analysis code to know where
> exactly this should be fixed. ]
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?
>
> OK for release branches?
Bah - attachement [please post patches inline]
So without quote ;)
Doesnt' the same issue apply to
> unsigned int *p;
>
> static void __attribute__((noinline, noclone))
> foo (void)
> {
> unsigned int z;
>
> for (z = 0; z < N; ++z)
> ++(*p);
> }
thus when we have a MEM_REF[p_1]? SCEV will not analyze
its evolution to a POLYNOMIAL_CHREC and thus access_fns will
be NULL again.
I think avoiding a NULL access_fns is ok but it should be done
unconditionally, not only for the DECL_P case.
Thanks,
Richard.
More information about the Gcc-patches
mailing list