This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [gomp4] Vector-single predication

From: Julian Brown <julian at codesourcery dot com>
To: Jakub Jelinek <jakub at redhat dot com>
Cc: Bernd Schmidt <bernds at codesourcery dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
Date: Thu, 21 May 2015 14:05:12 +0100
Subject: Re: [gomp4] Vector-single predication
Authentication-results: sourceware.org; auth=none
References: <555DC493 dot 2050208 at codesourcery dot com> <20150521115700 dot GT1751 at tucnak dot redhat dot com>

On Thu, 21 May 2015 13:57:00 +0200
Jakub Jelinek <jakub@redhat.com> wrote:

> On Thu, May 21, 2015 at 01:42:11PM +0200, Bernd Schmidt wrote:
> > This uses the patch I committed yesterday which introduces warp
> > broadcasts to implement the vector-single predication needed for
> > OpenACC. Outside a loop with vector parallelism, only one of the
> > threads representing a vector must execute, the others follow
> > along. So we skip the real work in each basic block for the
> > inactive threads, then broadcast the direction to take in the
> > control flow graph from the active one, and jump as a group.
> > 
> > This will get extended with similar functionality for
> > worker-single. Julian is working on some patches on top of that to
> > ensure the later optimizers don't destroy the control flow - we
> > really need the threads to reconverge and perform the
> > broadcast/jump in lockstep.
> > 
> > Committed on gomp-4_0-branch.
> 
> What do you do with function calls?
> Do you call them just in the (tid.x & 31) == 0 threads (then they
> can't use vectorization), or for all threads (then it is an ABI
> change: they would need to know whether they are called this way and,
> depending on that, handle it similarly, i.e. skip all the real work
> except for function calls for (tid.x & 31) != 0, unless it is a
> vectorized region)? Or is OpenACC restricting this to statements in
> the constructs directly (rather than anywhere in the region)?
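
As a rough illustration of the vector-single scheme described in the
quoted message (not the code on gomp-4_0-branch), the C sketch below
simulates one warp sequentially on the host. It assumes 32 lanes, to
match the (tid.x & 31) test above. The names WARP_SIZE, warp_broadcast
and run_vector_single are made up for this example, and the "jump as a
group" step is modelled by one shared branch decision.

```c
#include <assert.h>

#define WARP_SIZE 32  /* assumption: 32 lanes, as implied by (tid.x & 31) */

/* Hypothetical stand-in for a warp broadcast: every lane receives the
   value held by the source lane. */
static int warp_broadcast(const int lane_value[WARP_SIZE], int src_lane)
{
    return lane_value[src_lane];
}

/* Simulates a small loop executed in vector-single mode.
   Returns how many times the real work ran; trips[] receives the
   number of loop iterations each lane went through. */
static int run_vector_single(int n, int trips[WARP_SIZE])
{
    int work_done = 0;
    int counter = 0;          /* state touched only by the active lane */
    int cond[WARP_SIZE];

    assert(n > 0);
    for (int lane = 0; lane < WARP_SIZE; lane++)
        trips[lane] = 0;

    for (;;) {
        /* Basic block body: inactive lanes skip the real work. */
        for (int lane = 0; lane < WARP_SIZE; lane++) {
            cond[lane] = 0;
            if ((lane & (WARP_SIZE - 1)) == 0) {
                work_done++;
                counter++;
                cond[lane] = counter < n;  /* branch condition */
            }
        }

        /* Broadcast the branch direction from the active lane, then
           every lane takes the same edge of the control flow graph. */
        int take_back_edge = warp_broadcast(cond, 0);
        for (int lane = 0; lane < WARP_SIZE; lane++)
            trips[lane]++;
        if (!take_back_edge)
            break;
    }
    return work_done;
}
```

Calling run_vector_single(5, trips) performs the real work five times
in total, while every one of the 32 simulated lanes still walks the
loop five times, which is the "follow along" behaviour.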

OpenACC handles function calls specially: it calls them "routines", which
come in several sorts (gang, worker, vector or seq) that determine where
they can be invoked from. The plan is that all threads will call such
routines, and then some threads will be "neutered" as appropriate within
the routines themselves.

That's not actually implemented yet, though.
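
Purely as a sketch of that plan, and not of any existing code, the
idea might look like the host-side C simulation below. Everything in
it is an assumption made for illustration: the enum, the function
names, and the simplification that a seq routine keeps only lane 0
active while a vector routine lets every lane work. Gang and worker
routines are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define WARP_SIZE 32  /* assumption: one vector maps to a 32-lane warp */

/* Hypothetical subset of the routine sorts. */
enum routine_level { ROUTINE_SEQ, ROUTINE_VECTOR };

/* Decides inside the routine whether this lane does real work. */
static bool lane_is_active(enum routine_level level, int lane)
{
    assert(lane >= 0 && lane < WARP_SIZE);
    if (level == ROUTINE_VECTOR)
        return true;                       /* all lanes participate */
    return (lane & (WARP_SIZE - 1)) == 0;  /* only lane 0 works */
}

/* Routine body entered by every lane; neutered lanes return at once. */
static void routine_body(enum routine_level level, int lane,
                         int out[WARP_SIZE])
{
    if (!lane_is_active(level, lane))
        return;
    out[lane] += 1;  /* stands in for the real work */
}

/* Simulates the call site: all lanes call the routine. Returns the
   number of lanes that actually did work. */
static int call_from_all_lanes(enum routine_level level,
                               int out[WARP_SIZE])
{
    int active = 0;
    for (int lane = 0; lane < WARP_SIZE; lane++)
        out[lane] = 0;
    for (int lane = 0; lane < WARP_SIZE; lane++)
        routine_body(level, lane, out);
    for (int lane = 0; lane < WARP_SIZE; lane++)
        active += out[lane];
    return active;
}
```

The point of putting the check in the callee is that the call site
looks the same for every lane, so the caller does not need to know
which lanes the routine will neuter.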

Julian

Follow-Ups:
- Re: [gomp4] Vector-single predication
  - From: Jakub Jelinek

References:
- [gomp4] Vector-single predication
  - From: Bernd Schmidt
- Re: [gomp4] Vector-single predication
  - From: Jakub Jelinek
