This is the mail archive of the
mailing list for the GCC project.
Re: Offloading GSOC 2015
- From: guray ozen <guray dot ozen at gmail dot com>
- To: Thomas Schwinge <thomas at codesourcery dot com>
- Cc: tobias dot burnus at physik dot fu-berlin dot de, gcc at gcc dot gnu dot org, Jakub Jelinek <jakub at redhat dot com>
- Date: Thu, 12 Mar 2015 12:23:02 +0100
- Subject: Re: Offloading GSOC 2015
- References: <CA+ga0G7z+xsO8LB8oc0yv9VHFPpryaH1T2rHOudky-it3Wnu3Q at mail dot gmail dot com> <87wq2n66gj dot fsf at kepler dot schwinge dot homeip dot net>
How can I proceed with submitting an official proposal? Which topics
is GCC interested in?
So far, I have been trying to influence the evolution of the OpenMP 4.0
accelerator model. To sum up my small achievements so far:
- Using shared memory efficiently
--- I allowed array privatization for the private/firstprivate clauses
of the teams and distribute directives.
--- But it is not possible to use private/firstprivate for big arrays,
because a whole big array does not fit in shared memory.
--- That's why I added the dist_private([CHUNK] var-list) and
dist_firstprivate([CHUNK] var-list) clauses, in order to use shared
memory for big arrays. Briefly, they do not put the whole array into
shared memory; they put a chunk of the array into shared memory, and
each block deals with its own chunk.
--- I added dist_lastprivate([CHUNK] var-list). In OpenMP 4.0 there is
no lastprivate clause for these directives, since there is no way to
synchronize among GPU blocks. My clause, however, does not need
synchronization, because it works on chunks: each block copies back
only its own chunk, so I can re-collect the data from shared memory.
(you can see its animation on slide page
- Extension of the device clause
--- I treat the target directive as a task. Since my implementation is
based on OmpSs, the OmpSs runtime can manage these tasks.
--- I no longer require an integer to be passed to the device()
clause, so the runtime automatically manages multiple GPUs. (The OmpSs
runtime already does this.)
--- Device-to-device data transfer also became available. (Normally
there is no way to do this in OpenMP.)
(You can see its animation on slide page 10.)
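To make the chunking idea more concrete, here is a minimal host-side C sketch, assuming a simple "double every element" kernel. Each outer-loop iteration plays one GPU block: it loads only its own CHUNK into a small block-local buffer (standing in for shared memory, as dist_firstprivate would), works on it, and writes it back (as dist_lastprivate would). Because the chunks are disjoint, blocks never need to synchronize. The function name dist_chunked_double and the SHARED_CAP limit are hypothetical and are not MACC's generated code; dist_private would be the same pattern without the initial load.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-block "shared memory" capacity, in elements. */
#define SHARED_CAP 256

/* Simulates dist_firstprivate/dist_lastprivate-style chunking on the host.
 * Each iteration of the outer loop plays the role of one GPU block that
 * owns elements [b * chunk, min((b + 1) * chunk, n)) of the array.
 * Returns 0 on success, -1 if chunk is 0 or does not fit in SHARED_CAP. */
static int dist_chunked_double(double *a, size_t n, size_t chunk)
{
    if (chunk == 0 || chunk > SHARED_CAP)
        return -1;

    size_t num_blocks = (n + chunk - 1) / chunk; /* ceiling division */
    for (size_t b = 0; b < num_blocks; b++) {
        double shared[SHARED_CAP]; /* stands in for a __shared__ buffer */
        size_t begin = b * chunk;
        size_t len = (n - begin < chunk) ? n - begin : chunk;

        /* dist_firstprivate-like: load only this block's chunk. */
        memcpy(shared, a + begin, len * sizeof shared[0]);

        /* Work on the block-private chunk (here: double every element). */
        for (size_t i = 0; i < len; i++)
            shared[i] *= 2.0;

        /* dist_lastprivate-like: write the chunk back. Chunks are disjoint,
         * so no synchronization between blocks is needed. */
        memcpy(a + begin, shared, len * sizeof shared[0]);
    }
    return 0;
}
```

On a real GPU the outer loop would be the distribution over teams and the inner loop would be spread over the threads of a block; the last block simply gets a shorter chunk when n is not a multiple of CHUNK.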
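As a rough illustration of letting the runtime place target tasks when device() is omitted, the following C sketch honours an explicit device number and spreads the remaining tasks over the available GPUs round-robin. The struct target_task type, the DEVICE_AUTO sentinel and the round-robin policy are all hypothetical placeholders; a real runtime such as OmpSs applies its own scheduling policy.

```c
#include <stddef.h>

/* Hypothetical sentinel meaning "no device() clause was given". */
#define DEVICE_AUTO (-1)

/* Hypothetical descriptor of one target region treated as a task. */
struct target_task {
    int requested_device; /* value passed to device(), or DEVICE_AUTO */
    int assigned_device;  /* filled in by assign_devices() */
};

/* Placeholder placement policy: explicit device() values are honoured,
 * all other tasks are spread over num_devices GPUs in submission order.
 * num_devices must be positive. */
static void assign_devices(struct target_task *tasks, size_t ntasks,
                           int num_devices)
{
    int next = 0;
    for (size_t i = 0; i < ntasks; i++) {
        if (tasks[i].requested_device != DEVICE_AUTO) {
            tasks[i].assigned_device = tasks[i].requested_device;
        } else {
            tasks[i].assigned_device = next;
            next = (next + 1) % num_devices;
        }
    }
}
```

Once tasks carry an assigned device, a data transfer between two tasks on different GPUs becomes a device-to-device copy that the runtime can schedule, which plain OpenMP 4.0 offers no way to express.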
Additionally, I am currently working on two topics:
1 - How to take advantage of dynamic parallelism.
--- While doing this, I am comparing dynamic parallelism with creating
extra threads in advance instead of launching a new kernel, because DP
causes overhead and sometimes requires communication between child and
parent threads (for example, when a reduction occurs, the only way to
communicate is through global memory).
2 - Trying to find advantages of dynamic compilation. (It is already
available on the OpenCL side; on the NVIDIA side it was just announced
with NVRTC runtime compilation in CUDA 7.0.)
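One way to read "creating extra threads in advance" is loop flattening, sketched below in plain C with sequential loops standing in for GPU threads. Instead of each parent o launching a child kernel with inner_n threads, outer_n * inner_n threads are created up front and each recovers its (parent, child) coordinates from its flat id. Partial results go to a global-memory buffer, and a second pass reduces them per parent, mirroring the reduction through global memory mentioned above. The function name flattened_row_sums and the a[o] * b[i] workload are hypothetical choices for illustration.

```c
#include <stddef.h>

/* Host-side simulation of the "create extra threads in advance" alternative
 * to dynamic parallelism. Instead of parent o launching a child kernel with
 * inner_n threads, outer_n * inner_n threads are created up front and each
 * recovers its (parent, child) coordinates from its flat id.
 * global_buf must hold outer_n * inner_n elements; sums must hold outer_n. */
static void flattened_row_sums(const int *a, size_t outer_n,
                               const int *b, size_t inner_n,
                               long *global_buf, long *sums)
{
    size_t total = outer_n * inner_n;

    /* Pass 1: one simulated thread per flat id writes to global memory. */
    for (size_t tid = 0; tid < total; tid++) {
        size_t o = tid / inner_n; /* which parent item */
        size_t i = tid % inner_n; /* which child iteration */
        global_buf[tid] = (long)a[o] * b[i];
    }

    /* Pass 2: per-parent reduction, reading the partial results back from
     * global memory (the memory space visible across all blocks). */
    for (size_t o = 0; o < outer_n; o++) {
        long s = 0;
        for (size_t i = 0; i < inner_n; i++)
            s += global_buf[o * inner_n + i];
        sums[o] = s;
    }
}
```

The trade-off is that flattening avoids child-kernel launch overhead but requires inner_n to be known (or bounded) before launch, whereas dynamic parallelism can size each child launch at run time.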
2015-03-11 13:53 GMT+01:00 Thomas Schwinge <email@example.com>:
> On Tue, 3 Mar 2015 16:16:21 +0100, guray ozen <firstname.lastname@example.org> wrote:
>> I finished my master's at Barcelona Supercomputing Center and started
>> my PhD. My master's thesis was on OpenMP 4.0 code generation for GPU
>> accelerators, and I am still working on it.
>> Last year I presented my research compiler MACC, which is based on
>> the OmpSs runtime (http://pm.bsc.es/), at IWOMP'14. You can check my
>> paper and a related paper here
>> As far as I know, GCC 5 will come with OpenMP 4.0 and OpenACC
>> offloading. I am wondering whether there are any projects related to
>> code generation within GSoC 2015, because when I checked the to-do
>> list about offloading, I couldn't find any. Or what am I supposed to do?
> The idea that you propose seems like a fine project for GSoC --
> definitely there'll be enough work to be done. ;-)
> Somebody from the GCC side needs to step up as a mentor.