This is the mail archive of the mailing list for the GCC project.

Re: [gomp4] Vector-single predication

On Thu, May 21, 2015 at 01:42:11PM +0200, Bernd Schmidt wrote:
> This uses the patch I committed yesterday which introduces warp broadcasts
> to implement the vector-single predication needed for OpenACC. Outside a
> loop with vector parallelism, only one of the threads representing a vector
> must execute, the others follow along. So we skip the real work in each
> basic block for the inactive threads, then broadcast the direction to take
> in the control flow graph from the active one, and jump as a group.
> This will get extended with similar functionality for worker-single. Julian
> is working on some patches on top of that to ensure the later optimizers
> don't destroy the control flow - we really need the threads to reconverge
> and perform the broadcast/jump in lockstep.
> Committed on gomp-4_0-branch.
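
A minimal sketch of the scheme described above, written as a plain C
simulation of one 32-lane warp rather than as PTX or as the code the patch
actually generates. The names (struct lane, warp_broadcast,
run_vector_single) and the counting loop are hypothetical, chosen only to
show the skip / broadcast / jump-as-a-group pattern:

```c
#include <assert.h>

#define WARP_SIZE 32

/* Hypothetical per-lane state; not taken from the patch. */
struct lane {
    int counter;   /* lane-private variable updated by the "real work" */
    int work_done; /* how many times this lane executed the real work */
};

/* Simulated warp broadcast: every lane receives the value of src_lane. */
static void warp_broadcast(int vals[WARP_SIZE], int src_lane)
{
    assert(src_lane >= 0 && src_lane < WARP_SIZE);
    int v = vals[src_lane];
    for (int i = 0; i < WARP_SIZE; i++)
        vals[i] = v;
}

/* Simulate "do { counter++; } while (counter < n);" in vector-single mode.
   Returns how many times the warp executed the basic block as a group. */
static int run_vector_single(struct lane warp[WARP_SIZE], int n)
{
    int iterations = 0;
    for (;;) {
        int take_branch[WARP_SIZE] = {0};

        /* Basic block body: only the active lane does the real work and
           evaluates the branch condition; inactive lanes skip it. */
        for (int lane = 0; lane < WARP_SIZE; lane++) {
            if ((lane & 31) != 0)
                continue;
            warp[lane].counter++;
            warp[lane].work_done++;
            take_branch[lane] = warp[lane].counter < n;
        }

        /* Broadcast the direction from the active lane; afterwards every
           lane holds the same value, so the warp jumps in lockstep. */
        warp_broadcast(take_branch, 0);
        iterations++;
        if (!take_branch[WARP_SIZE - 1])
            break;
    }
    return iterations;
}
```

With n = 5 the simulated warp goes around the loop five times as a group,
but only lane 0 ever touches its counter; the other lanes merely follow the
broadcast branch direction.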

What do you do with function calls?
Do you call them only in the (tid.x & 31) == 0 threads (then they can't use
vectorization), or in all threads (then it is an ABI change: the callees
would need to know whether they are called this way and, if so, handle it
similarly, i.e. skip all the real work, except for function calls, for
(tid.x & 31) != 0, unless it is a vectorized region)?
Or does OpenACC restrict this to statements directly in the constructs
(rather than anywhere in the region)?
I haven't seen any accompanying testcases for this, so it is unclear to me
how you express this in OpenACC.
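
To make the two alternatives concrete, here is a rough C simulation of one
warp calling a function that has some scalar work followed by a
32-element loop. Everything in it is an assumption made for illustration:
the callee, the extra vector_single argument standing in for the ABI
change, and the lanes_present parameter, which exists only so the
simulation can tell how many lanes entered the call.

```c
#include <assert.h>
#include <stdbool.h>

#define WARP_SIZE 32

struct stats {
    int scalar_runs;              /* times the scalar work was executed */
    int elems_by_lane[WARP_SIZE]; /* loop elements handled by each lane */
};

/* Hypothetical callee. vector_single models the extra information the
   callee would need under the "call in all threads" convention. */
static void callee(int lane, int lanes_present, bool vector_single,
                   struct stats *s)
{
    assert(lanes_present >= 1 && lanes_present <= WARP_SIZE);

    /* Scalar work: in vector-single mode only the active lane runs it. */
    if (!vector_single || lane == 0)
        s->scalar_runs++;

    /* Loop that could be vectorized: elements are strided across the
       lanes that actually entered the callee. */
    for (int i = lane; i < WARP_SIZE; i += lanes_present)
        s->elems_by_lane[lane]++;
}

/* Alternative 1: only the (tid.x & 31) == 0 thread makes the call. */
static void call_from_lane0_only(struct stats *s)
{
    callee(0, 1, false, s);
}

/* Alternative 2: every thread makes the call; the callee predicates. */
static void call_from_all_lanes(struct stats *s)
{
    for (int lane = 0; lane < WARP_SIZE; lane++)
        callee(lane, WARP_SIZE, true, s);
}
```

In both cases the scalar work runs once per warp. With the first
alternative lane 0 ends up processing all 32 loop elements by itself; with
the second each lane takes one element, at the cost of the callee having to
know how it was called.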

