[gomp4] Preserve NVPTX "reconvergence" points
Bernd Schmidt
bernds@codesourcery.com
Fri Jun 19 13:07:00 GMT 2015
On 06/19/2015 02:25 PM, Jakub Jelinek wrote:
> Emitting PTX specific code from current ompexp is highly undesirable of
> course, but I must say I'm not a big fan of keeping the GOMP_* gimple trees
> around for too long either, they've never meant to be used in low gimple,
> and even all the early optimization passes could screw them up badly,
The idea is not to keep them around for very long, but I think there's
no reason why they couldn't survive a while longer. Between ompexpand
and the end of build_ssa_passes, we have (ignoring things like chkp and
ubsan which can just be turned off for offloaded functions if necessary):
  NEXT_PASS (pass_ipa_free_lang_data);
  NEXT_PASS (pass_ipa_function_and_variable_visibility);
  NEXT_PASS (pass_fixup_cfg);
  NEXT_PASS (pass_init_datastructures);
  NEXT_PASS (pass_build_ssa);
  NEXT_PASS (pass_early_warn_uninitialized);
  NEXT_PASS (pass_nothrow);
Nothing in there strikes me as particularly problematic if we can make
things like GIMPLE_OMP_FOR survive into-ssa - which I think I did in my
patch. Besides, the OpenACC kernels path generates them in SSA form
anyway during parloops so one could make the argument that this is a
step towards better consistency.
> they are also very much OpenMP or OpenACC specific, rather than representing
> language neutral behavior, so there is a problem that you'd need M x N
> different expansions of those constructs, which is not really maintainable
> (M being number of supported offloading standards, right now 2, and N
> number of different offloading devices (host, XeonPhi, PTX, HSA, ...)).
Well, that's a problem we have anyway, independent of how we implement
all these devices and standards. I don't see how that's relevant to the
discussion.
> I wonder why struct loop flags and other info together with function
> attributes and/or cgraph flags and other info aren't sufficient for the
> OpenACC needs.
> Have you or Thomas looked what we're doing for OpenMP simd / Cilk+ simd?
> Why can't the execution model (normal, vector-single and worker-single)
> be simply attributes on functions or cgraph node flags and the kind of
> #acc loop simply be flags on struct loop, like already OpenMP simd
> / Cilk+ simd is?
We haven't looked at Cilk+ or anything like that. You suggest using
attributes and flags, but at what point do you intend to lower the IR
so that it actually represents what's going on?
> The vector level parallelism is something where on the host/host_noshm/XeonPhi
> (dunno about HSA) you want vectorization to happen, and that is already
> implemented in the vectorizer pass, implementing it again elsewhere is
> highly undesirable. For PTX the implementation is of course different,
> and the vectorizer is likely not the right pass to handle them, but why
> can't the same struct loop flags be used by the pass that handles the
> conditionalization of execution for the 2 of the 3 above modes?
Agreed on wanting the vectorizer to handle things for "normal" machines;
that is one of the motivations for pushing the lowering past the offload
LTO writeout stage. The problem with OpenACC on GPUs is that the
predication really changes the CFG and the data flow - I fear
unpredictable effects if we let any optimizers run before lowering
OpenACC to the point where we actually represent what's going on in the
function.
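To illustrate the kind of change I mean, here is a rough sketch in plain C that simulates a "vector-single" region on the host. It is not what the compiler emits; LANES, region_body and run_vector_single are names made up for this example. The point is only that the lowered form contains a branch and a broadcast that do not exist in the code as written.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* hypothetical vector length, chosen for illustration */

/* The region as written in the source: one straight-line computation. */
static int region_body(int input)
{
    return input * 2 + 1;
}

/* The same region after predication, simulated lane by lane.
   Only lane 0 executes the body (a conditional that was not in the
   source), and the result is then copied to every lane (data flow that
   was not in the source either). */
static void run_vector_single(int input, int out[LANES])
{
    int shared = 0;

    assert(out != NULL);

    for (size_t lane = 0; lane < LANES; lane++) {
        if (lane == 0)                 /* new CFG edge from predication */
            shared = region_body(input);
    }

    /* Reconvergence point: all lanes continue from here together. */
    for (size_t lane = 0; lane < LANES; lane++)
        out[lane] = shared;            /* broadcast of lane 0's value */
}
```

An optimizer that sees only the unpredicated form knows nothing about the extra branch or the broadcast, which is why I would rather lower before such passes run.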
Bernd