This is the mail archive of the mailing list for the GCC project.


Re: [gomp4] openacc kernels directive support

On 06-08-14 17:10, Tom de Vries wrote:
The place after build_ealias is early enough to be before the lto-stream
write/read. I don't see how we can do this earlier. Before ealias, there's no
alias info, and one of the loops fails to be recognized as parallel.
Furthermore, pass_ch, pass_ccp, pass_lim_aux and pass_parloops are written to
work on cfg/ssa code, which we don't have at omp_low/omp_exp time.

Slight correction: we do have cfg at omp_exp time.

We could insert a pass-group here that only deals with functions that have the
kernels directive, and do the auto-par thing in a pass_oacc_kernels (which
should share the majority of the infrastructure with the parloops pass):
           NEXT_PASS (pass_build_ealias);
           INSERT_PASSES_AFTER/WITHIN (passes_oacc_kernels)
              NEXT_PASS (pass_ch);
              NEXT_PASS (pass_ccp);
              NEXT_PASS (pass_lim_aux);
              NEXT_PASS (pass_oacc_par);
           POP_INSERT_PASSES ()

Any comments, ideas or suggestions ?

I've experimented with implementing this on top of gomp-4_0-branch, and I ran into PR46032.

PR46032 is about a vectorization failure in a function split off by omp parallelization. The vectorization fails due to aliasing constraints in the split-off function, which are not present in the original code.

In the gomp-4_0-branch, the code marked by the openacc kernels directive is split off during omp_expand. The generated code has the same additional aliasing constraints, and in pass_oacc_par the parallelization fails.
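
To make the aliasing issue concrete, here is a minimal hand-written sketch in C. It is not the code the compiler generates: the names (original, outlined_fn, struct omp_data_s, results_match, self_test) are hypothetical, and the record layout is only an approximation of how an outlined region receives its data. It assumes the region's arrays are handed to the split-off function as pointers stored in a record.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 1024

/* Original form: 'a' and 'b' are distinct named objects, so the
   compiler can see directly that the store and the load do not alias.  */
static unsigned int a[N];
static unsigned int b[N];

/* In the real case this loop would sit under '#pragma acc kernels'.  */
void original (void)
{
  for (size_t i = 0; i < N; i++)
    a[i] = b[i] + 1;
}

/* Hypothetical stand-in for the record passed to the split-off function.  */
struct omp_data_s
{
  unsigned int *a;
  unsigned int *b;
  size_t n;
};

/* Hand-written approximation of the split-off function.  The arrays are
   now reached through pointers loaded from the record, so without
   points-to information for those fields the compiler has to assume that
   the store to d->a[i] may alias the load from d->b[i].  */
void outlined_fn (void *arg)
{
  struct omp_data_s *d = arg;
  for (size_t i = 0; i < d->n; i++)
    d->a[i] = d->b[i] + 1;
}

/* Both forms compute the same values; only the alias information that
   is visible inside the loop's function differs.  Returns 1 on a match.  */
int results_match (void)
{
  static unsigned int a2[N];
  for (size_t i = 0; i < N; i++)
    b[i] = (unsigned int) i * 3u;

  original ();

  struct omp_data_s d = { a2, b, N };
  outlined_fn (&d);

  return memcmp (a, a2, sizeof a) == 0;
}

/* Usage example.  */
void self_test (void)
{
  assert (results_match ());
}
```

The point of the sketch is only that the second loop is semantically identical to the first, but the information needed to prove independence of the accesses now lives outside the function that contains the loop.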

PR46032 contains a tentative patch by Richard Biener, which applies cleanly on top of 4.6 (I haven't yet reached a level of understanding of tree-ssa-structalias.c to be able to resolve the conflict in intra_create_variable_infos when applying on 4.7). The tentative patch involves running ipa-pta, which is a pass that runs after the point where we write out the lto stream. I'm not sure whether it makes sense to run the ipa-pta pass as part of the pass_oacc_kernels pass list.

I see three ways of continuing from here:
- take the tentative patch and make it work, including running ipa-pta during
- same, but try somehow to manage without running ipa-pta.
- try to postpone splitting of the function until the end of pass_oacc_par.

Some advice on how to continue from here would be *highly* appreciated. My hunch atm is to investigate the last option.

- Tom
