[PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

Tue Jan 12 11:22:00 GMT 2016

On Tue, 12 Jan 2016, Tom de Vries wrote:

> Hi,
> 
> This patch fixes PR69110, a wrong-code bug in autopar.
> 
> 
> I.
> 
> consider testcase test.c:
> ...
> #define N 1000
> 
> unsigned int i = 0;
> 
> static void __attribute__((noinline, noclone))
> foo (void)
> {
>   unsigned int z;
> 
>   for (z = 0; z < N; ++z)
>     ++i;
> }
> 
> extern void abort (void);
> 
> int
> main (void)
> {
>   foo ();
>   if (i != N)
>     abort ();
> 
>   return 0;
> }
> ...
> 
> When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the test
> fails:
> ...
> $ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd
> -P)//install/lib64 -fno-tree-loop-im
> $ ./a.out
> Aborted (core dumped)
> $
> ...
> 
> 
> II.
> 
> Before parloops, at ivcanon we have the loop body:
> ...
>   <bb 3>:
>   # z_10 = PHI <z_7(4), 0(2)>
>   # ivtmp_12 = PHI <ivtmp_2(4), 1000(2)>
>   i.1_4 = i;
>   _5 = i.1_4 + 1;
>   i = _5;
>   z_7 = z_10 + 1;
>   ivtmp_2 = ivtmp_12 - 1;
>   if (ivtmp_2 != 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> ...
> 
> There's a loop-carried dependency in i, that is, the read from i in iteration
> z == 1 depends on the write to i in iteration z == 0. So the loop cannot be
> parallelized. The test-case fails because parloops still parallelizes the
> loop.
> 
> 
> III.
> 
> Since the loop carried dependency is in-memory, it is not handled by the code
> analyzing reductions, since that code ignores the virtual phi.
> 
> So, AFAIU, this loop carried dependency should be handled by the dependency
> testing in loop_parallel_p. And loop_parallel_p returns true for this loop.
> 
> A comment in loop_parallel_p reads: "Check for problems with dependences.  If
> the loop can be reversed, the iterations are independent."
> 
> AFAIU, the loop order can actually be reversed. But, it cannot be executed in
> parallel.
> 
> So from this perspective, it seems in this case the comment matches the check,
> but the check is not sufficient.
> 
> 
> IV.
> 
> OTOH, if we replace the declaration of i with i[1], and replace the references
> of i with i[0], we see that loop_parallel_p fails.  So the loop_parallel_p
> check in this case seems sufficient, and there's something else that causes
> the check to fail in this case.
> 
> The difference is in the generated data ref:
> - in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to
>   vector with a single element: access function 0.
> - in the 'i' case, we set DR_ACCESS_FNS to NULL.
> 
> This difference causes different handling in the dependency generation, in
> particular in add_distance_for_zero_overlaps which has no effect for the 'i'
> case because  DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of the NULL
> access_fns of both the source and sink data refs).
> 
> From this perspective, it seems that the loop_parallel_p check is sufficient,
> and that dr_analyze_indices shouldn't return a NULL access_fns for 'i'.
> 
> 
> V.
> 
> When compiling with graphite using -floop-parallelize-all --param
> graphite-min-loops-per-function=1, we find:
> ...
> [scop-detection-fail] Graphite cannot handle data-refs in stmt:
> # VUSE <.MEM_11>
> i.1_4 = i;
> ...
> 
> The function scop_detection::stmt_has_simple_data_refs_p returns false because
> of the code recently added for PR66980 at r228357:
> ...
>       int nb_subscripts = DR_NUM_DIMENSIONS (dr);
> 
>       if (nb_subscripts < 1)
> 	{
>           free_data_refs (drs);
>           return false;
>         }
> ...
> 
> [ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ]
> 
> This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis has
> failed'.
> 
> From this perspective, it seems that the dependence handling should bail out
> once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or DR_ACCESS_FNS ==
> 0).
> 
> 
> VI.
> 
> This test-case used to pass in 4.6 because in find_data_references_in_stmt we
> had:
> ...
>       /* FIXME -- data dependence analysis does not work correctly for
>          objects with invariant addresses in loop nests.  Let us fail
>          here until the problem is fixed.  */
>       if (dr_address_invariant_p (dr) && nest)
> 	{
>           free_data_ref (dr);
>           if (dump_file && (dump_flags & TDF_DETAILS))
>             fprintf (dump_file,
>                      "\tFAILED as dr address is invariant\n");
>           ret = false;
>           break;
> 	}
> ...
> 
> That FIXME was removed in the fix for PR46787, at r175704.
> 
> The test-case fails in 4.8, and I guess from there onwards.
> 
> 
> VII.
> 
> The attached patch fixes the problem by returning a zero access function for
> 'i' in dr_analyze_indices.
> 
> [ But I can also imagine a solution similar to the graphite fix:
> ...
> @@ -3997,6 +3999,12 @@ find_data_references_in_stmt
>        dr = create_data_ref (nest, loop_containing_stmt (stmt),
>                             ref->ref, stmt, ref->is_read);
>        gcc_assert (dr != NULL);
> +      if (DR_NUM_DIMENSIONS (dr) == 0)
> +       {
> +         datarefs->release ();
> +         return false;
> +       }
> +
>        datarefs->safe_push (dr);
>      }
>    references.release ();
> ...
> 
> I'm not familiar enough with the dependency analysis code to know where
> exactly this should be fixed. ]
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk?
> 
> OK for release branches?

Bah - attachement [please post patches inline]

So without quote ;)

Doesnt' the same issue apply to 

> unsigned int *p;
>       
> static void __attribute__((noinline, noclone))
> foo (void)
> {         
>   unsigned int z;
>   
>   for (z = 0; z < N; ++z)
>     ++(*p);
> }

thus when we have a MEM_REF[p_1]?  SCEV will not analyze
its evolution to a POLYNOMIAL_CHREC and thus access_fns will
be NULL again.

I think avoiding a NULL access_fns is ok but it should be done
unconditionally, not only for the DECL_P case.

Thanks,
Richard.