[gomp4, committed] Fix parallelization for fortran oacc kernels tests

Richard Biener rguenther@suse.de
Thu Jun 18 11:50:00 GMT 2015


On Thu, 18 Jun 2015, Tom de Vries wrote:

> On 18/06/15 12:48, Richard Biener wrote:
> > On Thu, 18 Jun 2015, Tom de Vries wrote:
> > 
> > > Hi,
> > > 
> > > I ran into a problem with fortran loops in oacc kernels regions not being
> > > parallelized, after introducing transform_to_exit_first_loop_alt.
> > > 
> > > For gfortran.dg/goacc/kernels-loop.f95, we get:
> > > ...
> > > #pragma omp target oacc_parallel num_gangs(1)
> > > ...
> > > instead of the desired num_gangs (32).
> > > 
> > > transform_to_exit_first_loop_alt fails because nit is _135, where nit is
> > > defined by:
> > > ...
> > > *_105 = 0;
> > > D__lsm.27_50 = *_105;
> > > _32 = (unsigned int) D__lsm.27_50;
> > > _135 = 1023 - _32;
> > > ...
> > > 
> > > pass_fre would manage to propagate the '*_105 = 0' assignment. But in the
> > > current pass order, pass_fre is run before pass_lim, which is where this
> > > pattern is introduced:
> > > ...
> > >                NEXT_PASS (pass_ch_oacc_kernels);
> > >                NEXT_PASS (pass_fre);
> > >                NEXT_PASS (pass_tree_loop_init);
> > >                NEXT_PASS (pass_lim);
> > >                NEXT_PASS (pass_copy_prop);
> > >                NEXT_PASS (pass_scev_cprop);
> > >                NEXT_PASS (pass_parallelize_loops_oacc_kernels);
> > >                NEXT_PASS (pass_expand_omp_ssa);
> > >                NEXT_PASS (pass_tree_loop_done);
> > > ...
> > > 
> > > The patch moves pass_fre to the location of pass_copy_prop, replacing
> > > pass_copy_prop.
> > > Furthermore, it adds scans to the Fortran test cases to make sure they get
> > > properly parallelized.
> > 
> > You may now figure out that LIM needs FRE to detect equal memory
> > references to apply store-motion.  But maybe the issues oacc
> > lowering introduces are limited and under your control.
> > 
> 
> To show the context of the pass group, after this commit the pass group looks
> like this:
> ...
>           NEXT_PASS (pass_sra_early);
>           NEXT_PASS (pass_build_ealias);
>           NEXT_PASS (pass_fre);
>           NEXT_PASS (pass_oacc_kernels);
>           PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
>               NEXT_PASS (pass_ch_oacc_kernels);
>               NEXT_PASS (pass_tree_loop_init);
>               NEXT_PASS (pass_lim);
>               NEXT_PASS (pass_tree_loop_done);
>               NEXT_PASS (pass_fre);
>               NEXT_PASS (pass_tree_loop_init);
>               NEXT_PASS (pass_scev_cprop);
>               NEXT_PASS (pass_parallelize_loops_oacc_kernels);
>               NEXT_PASS (pass_expand_omp_ssa);
>               NEXT_PASS (pass_tree_loop_done);
>           POP_INSERT_PASSES ()
>           NEXT_PASS (pass_merge_phi);
>           NEXT_PASS (pass_dse);
>           NEXT_PASS (pass_cd_dce);
> ...
> In other words, the pass group is run directly after pass_fre.
> 
> When I move pass_fre from directly before the pass group to directly after it, I
> start seeing the failure mode you describe.

Yes, it really depends on what kind of changes pass_oacc_kernels
makes (though pass_ch_oacc_kernels, which is loop-header copying?, may
also make relevant changes enabling LIM/store-motion after FRE cleanup
if there is a loop nest involved).

Richard.

> Thanks,
> - Tom
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)


