This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [gomp4] Vector-single predication
- From: Julian Brown <julian at codesourcery dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Bernd Schmidt <bernds at codesourcery dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 21 May 2015 14:05:12 +0100
- Subject: Re: [gomp4] Vector-single predication
- Authentication-results: sourceware.org; auth=none
- References: <555DC493 dot 2050208 at codesourcery dot com> <20150521115700 dot GT1751 at tucnak dot redhat dot com>
On Thu, 21 May 2015 13:57:00 +0200
Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 21, 2015 at 01:42:11PM +0200, Bernd Schmidt wrote:
> > This uses the patch I committed yesterday which introduces warp
> > broadcasts to implement the vector-single predication needed for
> > OpenACC. Outside a loop with vector parallelism, only one of the
> > threads representing a vector must execute, the others follow
> > along. So we skip the real work in each basic block for the
> > inactive threads, then broadcast the direction to take in the
> > control flow graph from the active one, and jump as a group.
> >
> > This will get extended with similar functionality for
> > worker-single. Julian is working on some patches on top of that to
> > ensure the later optimizers don't destroy the control flow - we
> > really need the threads to reconverge and perform the
> > broadcast/jump in lockstep.
> >
> > Committed on gomp-4_0-branch.
>
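[Editorial illustration, not part of the original message.] The scheme described above (skip the real work in inactive threads, broadcast the branch direction from the active one, then jump as a group) can be sketched as a plain C simulation of one 32-lane warp. This is only a rough model under stated assumptions: `warp_broadcast` and `run_block_vector_single` are hypothetical names, the loop over lanes stands in for hardware threads running in lockstep, and none of it is taken from the committed patch.

```c
#define WARP_SIZE 32

/* Hypothetical stand-in for a hardware warp broadcast: every lane
   receives the value currently held by lane `src`. */
static void warp_broadcast(int values[WARP_SIZE], int src)
{
    int v = values[src];
    for (int lane = 0; lane < WARP_SIZE; lane++)
        values[lane] = v;
}

/* Simulate one basic block in vector-single mode. Only lane 0 does
   the real work and evaluates the branch condition (here, an
   arbitrary example test `input > 10`). The result is then broadcast
   so that all lanes take the same branch. Returns the number of lanes
   that executed the real work. */
static int run_block_vector_single(int input, int taken[WARP_SIZE])
{
    int work_done = 0;

    for (int lane = 0; lane < WARP_SIZE; lane++) {
        taken[lane] = 0;              /* inactive lanes compute nothing */
        if (lane == 0) {              /* the single active lane */
            work_done++;              /* "real work" happens once */
            taken[lane] = (input > 10);
        }
    }

    /* All lanes reconverge here and learn which way to jump. */
    warp_broadcast(taken, 0);
    return work_done;
}
```

Calling `run_block_vector_single(42, taken)` in this model performs the work once and leaves every entry of `taken` set to 1, so the whole warp would follow the same edge in the control flow graph.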
> What do you do with function calls?
> Do you call them just in the (tid.x & 31) == 0 threads (then they
> can't use vectorization), or for all threads (then it is an ABI
> change: they would need to know whether they are called this way
> and, depending on that, handle it similarly, skipping all the real
> work except for function calls when (tid.x & 31) != 0, unless it is
> a vectorized region)? Or is OpenACC restricting this to statements
> in the constructs directly (rather than anywhere in the region)?
OpenACC handles function calls specially: it calls them "routines",
which come in several kinds (gang, worker, vector or seq) that
determine where they can be invoked from. The plan is that all threads
will call such routines, and that some threads will then be "neutered"
as appropriate within the routines themselves.
That's not actually implemented yet, though.
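[Editorial illustration, not part of the original message.] Since the message says this plan is not implemented yet, the following is only a guess at its general shape, written as a host-side C simulation. Every thread calls the routine, and the routine itself skips the real work in lanes that should stay idle. The names `routine_increment` and `call_from_all_threads` are hypothetical, and the predicate simply reuses the `(tid.x & 31)` test quoted earlier in the thread.

```c
#define WARP_SIZE 32

/* Hypothetical routine that every thread calls. Lanes other than
   lane 0 of each warp are "neutered": they enter the routine but
   return before doing any real work. */
static void routine_increment(unsigned tid_x, int *counter)
{
    if ((tid_x & (WARP_SIZE - 1)) != 0)
        return;                   /* neutered lane: skip the real work */
    (*counter)++;                 /* real work, once per warp */
}

/* Simulate `num_threads` threads all calling the routine, and return
   how many times the real work was performed. */
static int call_from_all_threads(unsigned num_threads)
{
    int counter = 0;
    for (unsigned tid_x = 0; tid_x < num_threads; tid_x++)
        routine_increment(tid_x, &counter);
    return counter;
}
```

In this model, 64 threads (two warps) calling the routine perform the real work twice, once per warp, which is the behaviour the neutering is meant to guarantee without the caller having to know how the routine is invoked.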
Julian