This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [gomp4] Preserve NVPTX "reconvergence" points


On Mon, Jun 22, 2015 at 06:48:10PM +0100, Julian Brown wrote:
> In vector-single or worker-single mode, divergence of threads within a
> warp or a CTA is controlled by broadcasting the controlling expression
> of conditional branches to the set of "inactive" threads, so each of
> those follows along with the active thread. So you only get
> potentially-problematic thread divergence when workers or vectors are
> operating in partitioned mode.
> 
> So, for instance, a made-up example:
> 
> #pragma acc parallel
> {
>   #pragma acc loop gang
>   for (i = 0; i < N; i++))
>   {
>     #pragma acc loop worker
>     for (j = 0; j < M; j++)
>     {
>       if (j < M / 2)
>         /* stmt 1 */
>       else
>         /* stmt 2 */
>     }
> 
>     /* reconvergence point: thread barrier */
> 
>     [...]
>   }
> }
> 
> Here "stmt 1" and "stmt 2" execute in worker-partitioned, vector-single
> mode. With "early lowering", the reconvergence point can be
> inserted at the end of the loop, and abnormal edges (etc.) can be used
> to ensure that the CFG does not get changed in such a way that there is
> no longer a unique point at which the loop threads reconverge.
> 
> With "late lowering", it's no longer obvious to me if that can still be
> done.

Why?  The loop still has an exit edge (if there is no break/return/throw out of
the loop which I bet is not allowed), so you just insert the reconvergence
point at the exit edge from the loop.
For the "late lowering", I said it is up for benchmarking/investigation
where it would be best placed, it doesn't have to be after the loop passes,
there are plenty of optimization passes even before those.  But once you turn
many of the SSA_NAMEs in a function into (ab) ssa vars, many optimizations
just give up.
And, if you really want to avoid certain loop optimizations, you have always
the possibility to e.g. wrap certain statement in the loop in internal
function (e.g. the loop condition) or something similar to make the passes
more careful about those loops and make it easier to lower it later.

	Jakub


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]