This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Ping Re: [gomp4] Dumping gimple for offload.


On 11/20/2013 10:36 AM, Jakub Jelinek wrote:
> On Wed, Nov 20, 2013 at 10:34:30AM +0100, Richard Biener wrote:
>> On Tue, Nov 19, 2013 at 10:58 AM, Ilya Tocar <tocarip.intel@gmail.com> wrote:
>>> On 14 Nov 11:27, Richard Biener wrote:
>>>>> +  /* Set when symbol needs to be dumped for lto/offloading.  */
>>>>> +  unsigned need_dump : 1;
>>>>> +
>>>>
>>>> That's very non-descriptive.  What's "offloading"?  But yes, something
>>>> like this is what I was asking for.
>>>
>>> I've changed it into:
>>> Set when symbol needs to be dumped into LTO bytecode for LTO,
>>> or in pragma omp target case, for separate compilation targeting
>>> a different architecture.
>>>
>>> Ok for gomp4 branch now?
>>
>> Works for me.  I'll let branch maintainers decide if it follows whatever
>> is done there (I haven't found time to follow stuff here).
> 
> Ok then.

We've been working on similar patches for our OpenACC project. The goal
is to have functions generated during omp-low that will ultimately
execute on a ptx target, write them out using LTO infrastructure and
read them back in using a nvptx-none lto1.

Unfortunately, with multiple teams working in the same area there's
obviously going to be some measure of duplication. What I'd like to do
is to post a snapshot of what I currently have, to show the general
ideas and hopefully get some discussion of what the final picture should
look like. The next few mails in reply to this one will contain patches
that work towards the following general outline. I've been trying to
keep this flexible enough that it will be suitable not just for the
OpenACC work but also for whatever else people want to achieve in this
area.

1. New configure options are added, --enable-accelerator and
--enable-as-accelerator-for. The names are certainly up for discussion.
These allow the compiler to know which target combinations are
available. The host compiler is configured with
--enable-accelerator, and the offload/accelerator compiler is configured
with both options (mostly to ensure they both agree on the spelling of
the accelerator target name).
2. Using --enable-as-accelerator-for= changes the install paths, so that
the accelerator compilers end up in (for example)
   bin/x86_64-linux-gnu-accel-nvptx-gcc-4.9.0
   libexec/x86_64-linux-gnu/accel/nvptx/4.9.0/lto1
which should keep them separate in case a target can be used both as a
normal target and as an accelerator.
3. Some machinery is added to build the accelerator gcc directly in the
same tree as the host compiler, in a separate "accel-gcc" subdir. This
works for nvptx because that target doesn't even want to build a libgcc.
It may not be suitable for other accelerators if they want to build
target libraries, but otherwise I think it would be a nice convenience.
However, building separately should work fine as well as long as the
right options are used for configuring all the involved compilers.
4. We add a vector of target machines to the compiler. Normally this is
just initialized to the single machine for which the compiler is
configured, but when e.g. OpenACC with an accelerator is enabled, the
accelerator machine is added to that list. It should cope fine with
multiple different accelerator devices.
5. There's a new DECL_TARGET which refers to this list of target
machines. It's set when creating a child function from e.g. "#pragma acc
parallel".
6. ipa_write_summaries iterates over DECL_TARGET machines to write out
LTO for each of them. LTO sections for a different target get a separate
prefix encoding the machine name, e.g. ".gnu.tlto_nvptx_...".
7. lto_wrapper recognizes them and calls the various gcc drivers as
needed. This is where the series ends, and this step is still incomplete.
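
To make the section naming in steps 6 and 7 concrete, here is a minimal
sketch of how a per-machine prefix might be built on the writing side and
recognized again on the lto_wrapper side. It is not code from the patch
series: the function names accel_section_prefix and accel_section_machine
are hypothetical, the host prefix ".gnu.lto_" is assumed to stay as it is
today, and machine names are assumed not to contain an underscore.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch series: build the LTO
   section name prefix for MACHINE into BUF.  A NULL MACHINE stands for
   the host and yields the ordinary ".gnu.lto_" prefix (assumed unchanged).
   Returns 0 on success, -1 if BUF is too small.  */
int
accel_section_prefix (char *buf, size_t len, const char *machine)
{
  int n;

  assert (buf != NULL);
  if (machine == NULL)
    n = snprintf (buf, len, ".gnu.lto_");
  else
    n = snprintf (buf, len, ".gnu.tlto_%s_", machine);

  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Hypothetical counterpart for the lto_wrapper side: if NAME is a section
   carrying LTO data for another machine, copy that machine's name into
   MACHINE and return 1; otherwise return 0.  Assumes machine names
   contain no underscore.  */
int
accel_section_machine (const char *name, char *machine, size_t len)
{
  static const char prefix[] = ".gnu.tlto_";
  const char *start, *end;
  size_t n;

  assert (name != NULL && machine != NULL);
  if (strncmp (name, prefix, sizeof prefix - 1) != 0)
    return 0;

  start = name + sizeof prefix - 1;
  end = strchr (start, '_');
  if (end == NULL || end == start)
    return 0;

  n = (size_t) (end - start);
  if (n >= len)
    return 0;

  memcpy (machine, start, n);
  machine[n] = '\0';
  return 1;
}
```

With these, a section written for "nvptx" would start with
".gnu.tlto_nvptx_", and the wrapper could use the recovered machine name to
pick the matching accelerator driver. Whether the real implementation keys
on the machine name in this way is an open design question, not something
this sketch settles.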

As mentioned, this patch series is still incomplete and has rough edges,
but I hope it will generate discussion. Further details that will need
to be addressed are (among others) option handling between compilers for
different targets, and slightly rewriting the incoming gimple to be
valid for the target (nvptx requires variables to go into various
different address spaces).

The patches I'll send assume that the present patch from this thread has
been reverted, but otherwise they should apply to current gomp-4_0-branch.

Thoughts, comments? Does anyone have a good name for these accelerator
targets or output targets, something that avoids the overloaded word
"target" (I was thinking "destination machine" maybe)?


Bernd

