This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant


On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> > It's easy to address: just terminate threads 1-31 if the linked image has
> > no SIMD regions, like my pre-simd libgomp was doing.
> 
> Well, can't, say, the linked image in one shared library call a function
> in another linked image in another shared library?  Or is that just not
> supported for PTX?  I believe Xeon Phi supports that.

I meant the PTX linked (post PTX-JIT link) image, so regardless of support,
it's not an issue.  E.g. check early in gomp_nvptx_main if .weak
__nvptx_has_simd != 0.  It would only break if there was dlopen on PTX.
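
A minimal host-side sketch of that check, in C.  It assumes the marker is
tested by address (null when no object in the image defines it); the
message does not say whether the address or the value is meant.  The
helper names image_has_simd and lane_should_run are hypothetical, and
lane_should_run only stands in for the early test in gomp_nvptx_main.

```c
/* Assumed marker: defined only by objects that contain a SIMD region.
   Declared weak so an image without any definition still links, with
   the symbol's address resolving to null.  */
extern char __nvptx_has_simd __attribute__((weak));

/* Hypothetical helper: nonzero if some object in the linked image
   defined the marker.  */
static int image_has_simd(void)
{
  return &__nvptx_has_simd != 0;
}

/* Hypothetical stand-in for the early check: lane 0 always keeps
   running, lanes 1-31 keep running only if the image has a SIMD
   region somewhere.  */
static int lane_should_run(int lane)
{
  return lane == 0 || image_has_simd();
}
```

Built on its own, with nothing defining the marker, image_has_simd()
returns 0 and only lane 0 is kept.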

> If each linked image is self-contained, then that is probably a good idea,
> but still you could have a single simd region somewhere and lots of other
> target regions that don't use simd, or cases where only a small amount of
> time is spent in a simd region and this wouldn't help in that case.

Should we actually be much concerned about optimizing this case, which
is unlikely to run faster than the host CPU in the first place?

> If the addressables are handled through soft stack, then the rest is mostly
> just SSA_NAMEs you can see on the edges of the SIMT region, that really
> shouldn't be that expensive to broadcast or reduce back.

That's not enough: you have to reach the SIMD region entry in threads 1-31,
which means they need to execute all preceding control flow like thread 0,
which means they need to compute controlling predicates like thread 0.
(OpenACC broadcasts controlling predicates at branches.)
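
To illustrate the point, here is a sequential C simulation of one warp,
not actual nvptx codegen.  It assumes only lane 0 has executed the code
before the branch, so only lane 0 holds a valid predicate; lanes 1-31 are
modelled as holding 0.  The names broadcast_from_lane0 and
lanes_reaching_simd are hypothetical, and on real hardware the copy loop
would be a warp shuffle rather than a loop.

```c
#define WARP_SIZE 32

/* Simulated broadcast: every lane receives lane 0's predicate.  */
static void broadcast_from_lane0(int pred[WARP_SIZE])
{
  for (int lane = 1; lane < WARP_SIZE; lane++)
    pred[lane] = pred[0];
}

/* Count the lanes that get past a branch on "n > 0" guarding the SIMD
   region entry, when only lane 0 knows n.  */
static int lanes_reaching_simd(int n_lane0, int do_broadcast)
{
  int pred[WARP_SIZE] = {0};    /* lanes 1-31: no valid predicate */
  int reached = 0;

  pred[0] = n_lane0 > 0;        /* only lane 0 ran the preceding code */
  if (do_broadcast)
    broadcast_from_lane0(pred);

  for (int lane = 0; lane < WARP_SIZE; lane++)
    if (pred[lane])             /* the branch guarding the region */
      reached++;
  return reached;
}
```

Without the broadcast, lanes_reaching_simd(4, 0) counts only lane 0;
with it, lanes_reaching_simd(4, 1) counts all 32 lanes.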

Alexander

