[gomp4] Preserve NVPTX "reconvergence" points

Bernd Schmidt bernds@codesourcery.com
Fri Jun 19 10:44:00 GMT 2015


On 05/28/2015 05:08 PM, Jakub Jelinek wrote:

> I understand it is more work, I'd just like to ask that when designing stuff
> for the OpenACC offloading you (plural) try to take the other offloading
> devices and host fallback into account.

The problem is that many of the transformations we need to do are really 
GPU specific, and with the current structure of omplow/ompexp they are 
being done in the host compiler. The offloading scheme we decided on 
does not give us the means to write out multiple versions of an 
offloaded function where each target gets a different one. For that 
reason I think we should postpone these lowering decisions until we're 
in the accel compiler, where they could be controlled by target hooks, 
and over the last two weeks I've been doing some experiments to see how 
that could be achieved.

The basic idea is to delay expanding the inner regions of an OpenACC 
target region during ompexp, write out offload LTO (almost) immediately 
afterwards, and then have another ompexp phase which runs on the accel 
compiler to take the offloaded function to its final form. The first 
attempt really did write LTO immediately afterwards, before moving to the 
SSA phase. It seems that this could be made to work, but the pass manager 
and LTO code rather expects that what is being read in is in SSA form 
already. Also, some offloaded code is produced by OpenACC kernels 
expansion much later in the compilation, so with this approach we have 
an inconsistency where functions we get back from LTO are at very 
different levels of lowering.
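
As a toy illustration of the scheme described above, the following sketch models the two expansion phases. It is not GCC code: the function names, the hook table, and the dictionary representation are all made up for the example, and the JSON string merely stands in for the offload LTO stream. The point it tries to show is that a single written-out copy of the offloaded function can still end up lowered differently per target if the inner regions are expanded only after it is read back in.

```python
import json

# Hypothetical per-target lowering hooks. These names are illustrative
# assumptions and do not correspond to real GCC target hooks.
TARGET_HOOKS = {
    "nvptx": lambda region: "predicated-" + region["kind"],
    "host": lambda region: "serial-" + region["kind"],
}


def host_expand(function):
    """First phase (host compiler): keep the inner regions unexpanded."""
    return {
        "name": function["name"],
        "lowered": False,
        "regions": list(function["regions"]),
    }


def write_offload_stream(function):
    """Stand-in for writing offload LTO: one copy shared by every target."""
    return json.dumps(function)


def accel_expand(stream, target):
    """Second phase (accel compiler): lower inner regions via the target hook."""
    function = json.loads(stream)
    hook = TARGET_HOOKS[target]
    function["regions"] = [hook(region) for region in function["regions"]]
    function["lowered"] = True
    return function


# An offloaded function with two inner regions (names are placeholders).
fn = {
    "name": "offloaded_fn",
    "regions": [{"kind": "worker-loop"}, {"kind": "vector-loop"}],
}

stream = write_offload_stream(host_expand(fn))
nvptx_fn = accel_expand(stream, "nvptx")
host_fn = accel_expand(stream, "host")
```

In this model the host phase makes no target-specific decision at all, which is the property the delayed expansion is meant to provide; the real difficulty, as described above, lies in what form the function must be in when it is written out.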

The next attempt was to run the into-ssa passes after ompexpand, and 
only then write things out. I've changed the gimple representation of 
some OMP statements (primarily gimple_omp_for) so that they are 
relatively normal statements with operands that can be transformed into 
SSA form. As for which form is easier to work with: I believe some of the 
transformations we have to do could benefit from being in SSA, but on 
the other hand the OpenACC predication code has given me some trouble. 
I'm still not completely convinced that the update_ssa call I've 
added will actually do the right thing after we've mucked up the CFG.

I'm appending a proof-of-concept patch. This is intended to show the 
general outline of what I have in mind, rather than pass the testsuite. 
It's good enough to compile some of the OpenACC testcases (let's say 
worker-single-3 if you need one). Let me know what you think.


Bernd

-------------- next part --------------
A non-text attachment was scrubbed...
Name: offload-early.diff
Type: text/x-patch
Size: 60689 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150619/33aed8dc/attachment.bin>

