Give me advice on GSoC OpenMP

Sho Nakatani dev.laysakura@gmail.com
Sun Apr 3 10:26:00 GMT 2011


From: Richard Guenther <richard.guenther@gmail.com>
Date: Sun, 3 Apr 2011 09:28:49 +0200

> On Sat, Apr 2, 2011 at 10:53 AM, Sho Nakatani <dev.laysakura@gmail.com> wrote:
>> Hi!
>>
>> I'm Sho Nakatani, a student of the University of Tokyo, Japan.
>> I'd like to tackle GSoC this year!
>> I'm trying to speed up the OpenMP implementation in GCC.
>>
>> The following graph shows that OpenMP in GCC is much slower than in the Intel C Compiler.
>> https://github.com/laysakura/GCC-OpenMP-Speedup/raw/master/img/task-gcc-vs-icc.png
>>
>> Here is the code I used to measure the execution time.
>> https://github.com/laysakura/GCC-OpenMP-Speedup/blob/master/test/openmp-fibonacci.c
>>
>> I compiled it with the following commands:
>>
>>    gcc -O3 -fopenmp -o openmp-fibonacci-gcc openmp-fibonacci.c
>>    icc -O3 -openmp -o openmp-fibonacci-icc openmp-fibonacci.c
>>
>> After that, I executed them on a machine with 32 AMD CPUs (each has 4 cores).
>>
>>
>> Currently, I'm planning to change the algorithm of the `task' primitive in `libgomp'.
>> This plan is of course for GSoC but also for my graduation thesis.
>> My teacher has some ideas for a better algorithm (but I haven't learned them yet).
>>
>> Is there any advice from the members of the GCC ML?
>> Anything is OK.
>>
>> Although I know some about C programming and I have implemented a very small
>> C compiler myself, I'm quite new to GCC.
>>
>> I welcome advice on how to get accepted by GSoC, too :-)
> 
> What is your fibonacci testcase trying to measure?  It looks like it is
> measuring thread creation/switching time only.

I tried to show that GOMP's implementation of OpenMP tasks is slow.
Users of OpenMP normally expect the execution time to decrease
as the number of CPU cores increases.
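
To make the discussion concrete, here is a minimal sketch of a task-parallel
Fibonacci of the kind such a testcase typically uses. It is not the code from
the linked repository; the names fib_task and fib_parallel are hypothetical.
Each task does almost no work of its own, so the run time is dominated by the
cost of creating and scheduling tasks.

```c
/* Hypothetical sketch; not the code from the linked repository. */

/* Compute fib(n), spawning one OpenMP task per recursive call. */
static long fib_task(int n)
{
    long x, y;

    if (n < 2)
        return n;

    /* x and y must be shared so the child tasks can write the results back. */
    #pragma omp task shared(x)
    x = fib_task(n - 1);
    #pragma omp task shared(y)
    y = fib_task(n - 2);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return x + y;
}

/* Start a thread team and let a single thread create the root of the task tree. */
long fib_parallel(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

Built with "gcc -O3 -fopenmp", fib_parallel(n) can be called from main and
timed for different values of OMP_NUM_THREADS. Without -fopenmp the pragmas
are ignored and the function runs serially.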

Of course, the graph alone is not enough to show that the `task'
implementation is the cause of the slowness. So I'm now trying to show
how threads are created and which cores they run on, like this:

<fib(3) thread0,core0>
|                      \
<fib(2) thread1,core1>  <fib(1) thread3,core3>
|                   \     |
|                    \    <fib(0) thread4,core4>
|                     \
|                      \
<fib(1) thread2,core2>  <fib(0) thread5,core5>
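
One possible way to collect the data for such a tree is sketched below. It is
only an assumption about how the tracing could be done: the helper names
(current_thread, current_core, fib_traced, fib_traced_root) are hypothetical,
and sched_getcpu() is a glibc/Linux call, so the core number is reported as -1
elsewhere. Every call logs its argument, the OpenMP thread number and the core
it is running on.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for sched_getcpu() in glibc */
#endif
#include <sched.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* OpenMP thread number, or 0 when built without -fopenmp. */
static int current_thread(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Core the caller is running on; -1 where sched_getcpu() is unavailable. */
static int current_core(void)
{
#ifdef __linux__
    return sched_getcpu();
#else
    return -1;
#endif
}

/* Task-parallel fib that logs one line per call, indented by tree depth. */
static int fib_traced(int n, int depth)
{
    int x, y;

    fprintf(stderr, "%*s<fib(%d) thread%d,core%d>\n",
            depth * 2, "", n, current_thread(), current_core());

    if (n < 2)
        return n;

    #pragma omp task shared(x)
    x = fib_traced(n - 1, depth + 1);
    #pragma omp task shared(y)
    y = fib_traced(n - 2, depth + 1);
    #pragma omp taskwait
    return x + y;
}

/* Hypothetical entry point: one thread creates the root task. */
int fib_traced_root(int n)
{
    int result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_traced(n, 0);
    }
    return result;
}
```

The lines are printed in the order the tasks happen to run, not in tree order,
so the depth indentation is only a hint; rebuilding the exact tree would need
a parent identifier in each line as well.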


Then I'll compare the trees created by gcc and icc, and point out
that icc's implementation of OpenMP Task uses Lazy Task Creation while
gcc's does not.

> 
> Richard.
> 

--
Sho Nakatani


