This is the mail archive of the
mailing list for the GCC project.
Re: [gomp4] Vector-single predication
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Bernd Schmidt <bernds at codesourcery dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 21 May 2015 13:57:00 +0200
- Subject: Re: [gomp4] Vector-single predication
- References: <555DC493 dot 2050208 at codesourcery dot com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Thu, May 21, 2015 at 01:42:11PM +0200, Bernd Schmidt wrote:
> This uses the patch I committed yesterday which introduces warp broadcasts
> to implement the vector-single predication needed for OpenACC. Outside a
> loop with vector parallelism, only one of the threads representing a vector
> must execute, the others follow along. So we skip the real work in each
> basic block for the inactive threads, then broadcast the direction to take
> in the control flow graph from the active one, and jump as a group.
> This will get extended with similar functionality for worker-single. Julian
> is working on some patches on top of that to ensure the later optimizers
> don't destroy the control flow - we really need the threads to reconverge
> and perform the broadcast/jump in lockstep.
> Committed on gomp-4_0-branch.
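
The scheme quoted above can be sketched as a host-side simulation in plain C. This is only an illustration under stated assumptions, not the code from the patch: the 32-lane warp size, the helper names `warp_broadcast` and `run_vector_single_block`, the condition `input > 10`, and the branch target numbers 1 and 2 are all hypothetical. It shows one basic block in which only lane 0 does the real work, the branch direction is then broadcast, and every lane takes the same edge.

```c
#include <assert.h>
#include <stdbool.h>

#define WARP_SIZE 32   /* assumed warp width, matching the (tid.x & 31) test */

/* Hypothetical stand-in for a warp broadcast: every lane ends up
   with the value held by lane 0. */
static void warp_broadcast(bool value[WARP_SIZE])
{
    for (int lane = 1; lane < WARP_SIZE; lane++)
        value[lane] = value[0];
}

/* Simulates one vector-single basic block that ends in a conditional
   branch.  Stores the branch target chosen by each lane in target[]
   and returns how many lanes performed the real work. */
static int run_vector_single_block(int input, int target[WARP_SIZE])
{
    bool taken[WARP_SIZE] = { false };
    int lanes_working = 0;

    for (int lane = 0; lane < WARP_SIZE; lane++) {
        if ((lane & (WARP_SIZE - 1)) != 0)
            continue;                 /* inactive lane: skip the real work */
        taken[lane] = input > 10;     /* active lane evaluates the condition */
        lanes_working++;
    }

    /* The active lane tells the others which way to go. */
    warp_broadcast(taken);

    for (int lane = 0; lane < WARP_SIZE; lane++) {
        assert(taken[lane] == taken[0]);     /* the group jumps together */
        target[lane] = taken[lane] ? 1 : 2;  /* 1 = taken, 2 = fall through */
    }
    return lanes_working;
}
```

Calling `run_vector_single_block(42, target)` leaves the same target in every slot of `target` while reporting a single working lane, which is the property the later optimizers must not break.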
What do you do with function calls?
Do you call them just in the (tid.x & 31) == 0 threads (then they can't use
vectorization), or in all threads (then it is an ABI change: they
would need to know whether they are called this way and, depending on that,
handle it similarly, i.e. skip all the real work, except for function calls, for
(tid.x & 31) != 0, unless it is a vectorized region)?
Or is OpenACC restricting this to statements in the constructs directly
(rather than anywhere in the region)?
I haven't seen any accompanying testcases for this, so it is unclear to me how
you express this in OpenACC.
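
To make the two calling options in the question concrete, here is a small host-side simulation in C. It is a sketch under assumptions, not anything taken from the branch: the `struct call_stats` counters, the function names, and the extra `vector_single` flag (standing in for whatever mechanism would tell a callee how it was called) are all hypothetical. It only counts how many of the 32 lanes enter the callee and how many do real work in each case.

```c
#include <assert.h>
#include <stdbool.h>

#define WARP_SIZE 32   /* assumed warp width, matching the (tid.x & 31) test */

struct call_stats {
    int lanes_entered;   /* lanes that executed the call at all */
    int lanes_working;   /* lanes that did the callee's real work */
};

/* Callee compiled with today's ABI: it has no idea how it was called. */
static void callee_plain(struct call_stats *s)
{
    s->lanes_entered++;
    s->lanes_working++;
}

/* Hypothetical callee with a changed ABI: it is told whether the caller
   is in vector-single mode and predicates its own work accordingly. */
static void callee_aware(int lane, bool vector_single, struct call_stats *s)
{
    assert(lane >= 0 && lane < WARP_SIZE);
    s->lanes_entered++;
    if (vector_single && (lane & (WARP_SIZE - 1)) != 0)
        return;          /* inactive lane just follows along */
    s->lanes_working++;
}

/* First option: only the (lane & 31) == 0 lane makes the call, so the
   callee never has the other lanes available for vector work. */
static struct call_stats call_from_lane0_only(void)
{
    struct call_stats s = { 0, 0 };
    for (int lane = 0; lane < WARP_SIZE; lane++)
        if ((lane & (WARP_SIZE - 1)) == 0)
            callee_plain(&s);
    return s;
}

/* Second option: every lane makes the call and the callee sorts it out. */
static struct call_stats call_from_all_lanes(bool vector_single)
{
    struct call_stats s = { 0, 0 };
    for (int lane = 0; lane < WARP_SIZE; lane++)
        callee_aware(lane, vector_single, &s);
    return s;
}
```

In the first variant one lane enters and works; in the second all 32 lanes enter, and either one or all of them work depending on the flag, which is where the ABI question comes from.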