[gomp4] Preserve NVPTX "reconvergence" points
Bernd Schmidt
bernds@codesourcery.com
Fri Jun 19 10:44:00 GMT 2015
On 05/28/2015 05:08 PM, Jakub Jelinek wrote:
> I understand it is more work, I'd just like to ask that when designing stuff
> for the OpenACC offloading you (plural) try to take the other offloading
> devices and host fallback into account.
The problem is that many of the transformations we need to do are really
GPU specific, and with the current structure of omplow/ompexp they are
being done in the host compiler. The offloading scheme we decided on
does not give us the means to write out multiple versions of an
offloaded function where each target gets a different one. For that
reason I think we should postpone these lowering decisions until we're
in the accel compiler, where they could be controlled by target hooks,
and over the last two weeks I've been doing some experiments to see how
that could be achieved.
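
To make the target-hook idea concrete, here is a toy model rather than GCC code. Every name in it (OffloadRegion, TARGET_HOOKS, accel_lower, the "tid == 0" guard) is hypothetical and chosen only for illustration. The point is that the host writes out one abstract form of the region, and each accel compiler picks its own lowering through a hook.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class OffloadRegion:
    """Hypothetical stand-in for an offloaded function that is not yet lowered."""
    name: str
    body: List[str]  # abstract statements, identical for every target


def lower_for_gpu(region: OffloadRegion) -> List[str]:
    # Assumption: a GPU-like target guards single-threaded code with a predicate.
    return [f"if (tid == 0) {{ {stmt} }}" for stmt in region.body]


def lower_for_host(region: OffloadRegion) -> List[str]:
    # Host fallback: no device-specific transformation is needed.
    return list(region.body)


# Hypothetical hook table: one lowering function per target.
TARGET_HOOKS: Dict[str, Callable[[OffloadRegion], List[str]]] = {
    "nvptx": lower_for_gpu,
    "host": lower_for_host,
}


def accel_lower(region: OffloadRegion, target: str) -> List[str]:
    """Runs in the (modelled) accel compiler, after the region was read back."""
    return TARGET_HOOKS[target](region)
```

The same OffloadRegion is handed to every target, so nothing GPU-specific has to be decided while the host compiler is still running.
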
The basic idea is to delay expanding the inner regions of an OpenACC
target region during ompexp, write out offload LTO (almost) immediately
afterwards, and then have another ompexp phase which runs on the accel
compiler to take the offloaded function to its final form. The first
attempt really did write LTO immediately after, before moving to SSA
form. It seems that this could be made to work, but the pass manager
and LTO code rather expects that what is being read in is in SSA form
already. Also, some offloaded code is produced by OpenACC kernels
expansion much later in the compilation, so with this approach we have
an inconsistency where functions we get back from LTO are at very
different levels of lowering.
The next attempt was to run the into-ssa passes after ompexpand, and
only then write things out. I've changed the gimple representation of
some OMP statements (primarily gimple_omp_for) so that they are
relatively normal statements with operands that can be transformed into
SSA form. As for which form is easier to work with, I believe some of the
transformations we have to do could benefit from being in SSA, but on
the other hand the OpenACC predication code has given me some trouble.
I've still not completely convinced myself that the update_ssa call I've
added will actually do the right thing after we've mucked up the CFG.
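
The ordering of the second attempt can be sketched as a toy pipeline. This is only a model under stated assumptions: functions are plain dicts, JSON stands in for the offload LTO stream, and all function names (host_ompexp, into_ssa, write_offload, accel_ompexp) are made up for the illustration and are not the names of GCC passes.

```python
import json


def host_ompexp(fn):
    """Host phase: expand the outer target region, leave inner OMP statements alone."""
    out = dict(fn)
    out["stmts"] = [
        {"kind": "outlined_target"} if s["kind"] == "omp_target" else dict(s)
        for s in fn["stmts"]
    ]
    return out


def into_ssa(fn):
    """Model of the into-ssa passes: only records that the function is now in SSA form."""
    out = dict(fn)
    out["ssa"] = True
    return out


def write_offload(fn):
    """Stand-in for streaming the function out for the accel compiler."""
    return json.dumps(fn)


def accel_ompexp(blob, lower_inner):
    """Accel phase: read the function back and lower the remaining OMP statements."""
    fn = json.loads(blob)
    if not fn["ssa"]:
        raise ValueError("the accel side expects SSA form")
    fn["stmts"] = [
        lower_inner(s) if s["kind"].startswith("omp_") else s
        for s in fn["stmts"]
    ]
    return fn
```

In this model the inner omp_for statement survives the host phase and the write-out unchanged, and only the accel-side call decides what it becomes.
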
I'm appending a proof-of-concept patch. This is intended to show the
general outline of what I have in mind, rather than pass the testsuite.
It's good enough to compile some of the OpenACC testcases (let's say
worker-single-3 if you need one). Let me know what you think.
Bernd
-------------- next part --------------
A non-text attachment was scrubbed...
Name: offload-early.diff
Type: text/x-patch
Size: 60689 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150619/33aed8dc/attachment.bin>