This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [gomp4] Preserve NVPTX "reconvergence" points
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Julian Brown <julian at codesourcery dot com>
- Cc: Bernd Schmidt <bernds at codesourcery dot com>, Thomas Schwinge <thomas at codesourcery dot com>, gcc-patches at gcc dot gnu dot org, Nathan Sidwell <nathan at codesourcery dot com>
- Date: Mon, 22 Jun 2015 20:27:59 +0200
- Subject: Re: [gomp4] Preserve NVPTX "reconvergence" points
- Authentication-results: sourceware.org; auth=none
- References: <20150528150635 dot 7bd5db23 at octopus> <20150528142011 dot GN10247 at tucnak dot redhat dot com> <87pp5kg3js dot fsf at schwinge dot name> <20150528150802 dot GO10247 at tucnak dot redhat dot com> <5583E68A dot 9020608 at codesourcery dot com> <20150619122557 dot GO10247 at tucnak dot redhat dot com> <20150622145549 dot 481d4549 at octopus> <20150622142456 dot GZ10247 at tucnak dot redhat dot com> <20150622184810 dot 76fba1c2 at octopus>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Mon, Jun 22, 2015 at 06:48:10PM +0100, Julian Brown wrote:
> In vector-single or worker-single mode, divergence of threads within a
> warp or a CTA is controlled by broadcasting the controlling expression
> of conditional branches to the set of "inactive" threads, so each of
> those follows along with the active thread. So you only get
> potentially-problematic thread divergence when workers or vectors are
> operating in partitioned mode.
>
> So, for instance, a made-up example:
>
> #pragma acc parallel
> {
> #pragma acc loop gang
> for (i = 0; i < N; i++))
> {
> #pragma acc loop worker
> for (j = 0; j < M; j++)
> {
> if (j < M / 2)
> /* stmt 1 */
> else
> /* stmt 2 */
> }
>
> /* reconvergence point: thread barrier */
>
> [...]
> }
> }
>
> Here "stmt 1" and "stmt 2" execute in worker-partitioned, vector-single
> mode. With "early lowering", the reconvergence point can be
> inserted at the end of the loop, and abnormal edges (etc.) can be used
> to ensure that the CFG does not get changed in such a way that there is
> no longer a unique point at which the loop threads reconverge.
>
> With "late lowering", it's no longer obvious to me if that can still be
> done.
Why? The loop still has an exit edge (if there is no break/return/throw out of
the loop which I bet is not allowed), so you just insert the reconvergence
point at the exit edge from the loop.
For the "late lowering", I said it is up for benchmarking/investigation
where it would be best placed, it doesn't have to be after the loop passes,
there are plenty of optimization passes even before those. But once you turn
many of the SSA_NAMEs in a function into (ab) ssa vars, many optimizations
just give up.
And, if you really want to avoid certain loop optimizations, you have always
the possibility to e.g. wrap certain statement in the loop in internal
function (e.g. the loop condition) or something similar to make the passes
more careful about those loops and make it easier to lower it later.
Jakub