[gomp4] Vector-single predication

Jakub Jelinek jakub@redhat.com
Thu May 21 12:23:00 GMT 2015


On Thu, May 21, 2015 at 01:42:11PM +0200, Bernd Schmidt wrote:
> This uses the patch I committed yesterday which introduces warp broadcasts
> to implement the vector-single predication needed for OpenACC. Outside a
> loop with vector parallelism, only one of the threads representing a vector
> must execute, the others follow along. So we skip the real work in each
> basic block for the inactive threads, then broadcast the direction to take
> in the control flow graph from the active one, and jump as a group.
> 
> This will get extended with similar functionality for worker-single. Julian
> is working on some patches on top of that to ensure the later optimizers
> don't destroy the control flow - we really need the threads to reconverge
> and perform the broadcast/jump in lockstep.
> 
> Committed on gomp-4_0-branch.

What do you do with function calls?
Do you call them only in the (tid.x & 31) == 0 threads (then they can't use
vectorization), or in all threads? The latter is an ABI change: the callees
would need to know whether they are called this way and, if so, handle it
similarly (skip all the real work, except for function calls, when
(tid.x & 31) != 0, unless it is a vectorized region).
Or does OpenACC restrict this to statements directly in the constructs
(rather than anywhere in the region)?
I haven't seen any accompanying testcases for this, so it is unclear to me
how you express this in OpenACC.
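
To make the two options concrete, here is a minimal sketch in plain C that
simulates one 32-lane warp on the CPU. It is not the code the patch
generates. The struct, the function names and the extra vector_single
argument are all hypothetical, and the broadcast is only a stand-in for a
real warp broadcast.

```c
#include <stdbool.h>

#define WARP_SIZE 32   /* (tid & 31) selects the lane within a warp */

/* Hypothetical bookkeeping, only used to observe the simulation. */
struct warp {
    int work_done[WARP_SIZE];     /* times each lane ran the "real work" */
    int branch_taken[WARP_SIZE];  /* direction each lane followed */
};

/* Stand-in for a warp broadcast: every lane receives lane 0's value. */
static int broadcast_from_lane0(const int values[WARP_SIZE])
{
    return values[0];
}

/* One basic block in vector-single mode: only lane 0 does the real work,
   then the branch direction is broadcast and all lanes jump together. */
static void run_block_vector_single(struct warp *w, int input)
{
    int cond[WARP_SIZE] = {0};

    for (int tid = 0; tid < WARP_SIZE; tid++) {
        if ((tid & 31) == 0) {        /* active lane */
            w->work_done[tid]++;
            cond[tid] = input > 0;    /* condition computed by lane 0 only */
        }
        /* inactive lanes skip the real work */
    }

    int dir = broadcast_from_lane0(cond);
    for (int tid = 0; tid < WARP_SIZE; tid++)
        w->branch_taken[tid] = dir;   /* all lanes follow the same edge */
}

/* Second option from the question: the call is made in all lanes, so the
   callee must be told how it was called. The extra argument is the
   (assumed) ABI change. */
static void callee_all_lanes(struct warp *w, int tid, bool vector_single)
{
    if (vector_single && (tid & 31) != 0)
        return;                       /* skip the real work */
    w->work_done[tid]++;
}
```

With the first option the call would simply sit inside the
(tid & 31) == 0 branch of run_block_vector_single, so the callee never sees
the other lanes and cannot use them for vectorization.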

	Jakub


