Re: [gomp4] Preserve NVPTX "reconvergence" points
- From: Bernd Schmidt <bernds at codesourcery dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Thomas Schwinge <thomas at codesourcery dot com>, <gcc-patches at gcc dot gnu dot org>, Nathan Sidwell <nathan at codesourcery dot com>, Julian Brown <julian at codesourcery dot com>
- Date: Fri, 19 Jun 2015 15:03:38 +0200
- Subject: Re: [gomp4] Preserve NVPTX "reconvergence" points
- Authentication-results: sourceware.org; auth=none
- References: <20150528150635 dot 7bd5db23 at octopus> <20150528142011 dot GN10247 at tucnak dot redhat dot com> <87pp5kg3js dot fsf at schwinge dot name> <20150528150802 dot GO10247 at tucnak dot redhat dot com> <5583E68A dot 9020608 at codesourcery dot com> <20150619122557 dot GO10247 at tucnak dot redhat dot com>
On 06/19/2015 02:25 PM, Jakub Jelinek wrote:
> Emitting PTX-specific code from current ompexp is highly undesirable of
> course, but I must say I'm not a big fan of keeping the GOMP_* gimple trees
> around for too long either; they were never meant to be used in low gimple,
> and even all the early optimization passes could screw them up badly,
The idea is not to keep them around for very long, but I think there's
no reason why they couldn't survive a while longer. Between ompexpand
and the end of build_ssa_passes, we have (ignoring things like chkp and
ubsan which can just be turned off for offloaded functions if necessary):
NEXT_PASS (pass_ipa_free_lang_data);
NEXT_PASS (pass_ipa_function_and_variable_visibility);
NEXT_PASS (pass_fixup_cfg);
NEXT_PASS (pass_init_datastructures);
NEXT_PASS (pass_build_ssa);
NEXT_PASS (pass_early_warn_uninitialized);
NEXT_PASS (pass_nothrow);
Nothing in there strikes me as particularly problematic if we can make
things like GIMPLE_OMP_FOR survive into-SSA, which I think I did in my
patch. Besides, the OpenACC kernels path generates them in SSA form
anyway during parloops, so one could argue that this is a
step towards better consistency.
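For concreteness, a minimal loop of the kind being discussed (assuming the
plain OpenACC spelling) looks roughly like this:

  #pragma acc parallel loop
  for (i = 0; i < n; i++)
    a[i] = b[i] + c[i];

The only point is that the GIMPLE_OMP_FOR (and related GOACC statements)
generated for such a loop would still be present, now in SSA form, while
the passes listed above run.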
> they are also very much OpenMP or OpenACC specific, rather than representing
> language-neutral behavior, so there is a problem that you'd need M x N
> different expansions of those constructs, which is not really maintainable
> (M being the number of supported offloading standards, right now 2, and N the
> number of different offloading devices (host, XeonPhi, PTX, HSA, ...)).
Well, that's a problem we have anyway, independent of how we implement
all these devices and standards. I don't see how that's relevant to the
discussion.
> I wonder why struct loop flags and other info together with function
> attributes and/or cgraph flags and other info aren't sufficient for the
> OpenACC needs.
> Have you or Thomas looked at what we're doing for OpenMP simd / Cilk+ simd?
> Why can't the execution model (normal, vector-single and worker-single)
> simply be attributes on functions or cgraph node flags, and the kind of
> #acc loop simply be flags on struct loop, the way OpenMP simd
> / Cilk+ simd already is?
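For reference, the struct loop flags in question are roughly the following
fields from cfgloop.h (paraphrased, not an exact excerpt):

  struct loop
  {
    ...
    /* Maximum number of iterations that may be executed concurrently,
       e.g. from #pragma omp simd safelen(N).  */
    int safelen;
    /* Vectorize this loop even if the cost model says otherwise; set
       for #pragma omp simd / Cilk Plus simd loops.  */
    bool force_vectorize;
    /* Identifier referenced by the GOMP_SIMD_* internal functions for
       simd loops.  */
    tree simduid;
    ...
  };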
We haven't looked at Cilk+ or anything like that. You suggest using
attributes and flags, but at what point do you intend to actually lower
the IR to represent what's going on?
> The vector-level parallelism is something where on the host/host_noshm/XeonPhi
> (dunno about HSA) you want vectorization to happen, and that is already
> implemented in the vectorizer pass; implementing it again elsewhere is
> highly undesirable. For PTX the implementation is of course different,
> and the vectorizer is likely not the right pass to handle it, but why
> can't the same struct loop flags be used by the pass that handles the
> conditionalization of execution for 2 of the 3 above modes?
Agreed on wanting the vectorizer to handle things for "normal" machines;
that is one of the motivations for pushing the lowering past the offload
LTO writeout stage. The problem with OpenACC on GPUs is that the
predication really changes the CFG and the data flow: I fear
unpredictable effects if we let any optimizers run before lowering
OpenACC to the point where we actually represent what's going on in the
function.
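To make the predication point concrete, here is a rough sketch of what a
worker-single region turns into (not the actual nvptx lowering; "tid", "f",
"broadcast_to_all" and "do_parallel_work" are placeholders for the real
thread-id test, the guarded code and the shared-memory/shuffle broadcast):

  int x = 0;
  if (tid == 0)                /* worker-single: only the master thread
                                  executes the single-threaded region */
    x = f (a);
  x = broadcast_to_all (x);    /* reconvergence point: the other threads
                                  must receive the master's value */
  if (x > 0)                   /* every thread now takes the same branch */
    do_parallel_work (x);

Each such broadcast adds control and data dependencies that do not exist in
the unlowered IL, which is why letting the usual optimizers loose on it
first seems risky.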
Bernd