This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] _Cilk_for for C and C++
- From: Jakub Jelinek <jakub at redhat dot com>
- To: "Iyer, Balaji V" <balaji dot v dot iyer at intel dot com>
- Cc: Jason Merrill <jason at redhat dot com>, "'Jeff Law'" <law at redhat dot com>, "'Aldy Hernandez'" <aldyh at redhat dot com>, "'gcc-patches at gcc dot gnu dot org'" <gcc-patches at gcc dot gnu dot org>, "'rth at redhat dot com'" <rth at redhat dot com>
- Date: Fri, 24 Jan 2014 20:41:44 +0100
- Subject: Re: [PATCH] _Cilk_for for C and C++
- Authentication-results: sourceware.org; auth=none
- References: <52CC6657 dot 3000500 at redhat dot com> <BF230D13CA30DD48930C31D4099330003A4B866E at FMSMSX101 dot amr dot corp dot intel dot com> <20140107212911 dot GA892 at tucnak dot redhat dot com> <BF230D13CA30DD48930C31D4099330003A4B86D5 at FMSMSX101 dot amr dot corp dot intel dot com> <20140108173106 dot GJ892 at tucnak dot redhat dot com> <BF230D13CA30DD48930C31D4099330003A4B8C5B at FMSMSX101 dot amr dot corp dot intel dot com> <52D81708 dot 3010700 at redhat dot com> <BF230D13CA30DD48930C31D4099330003A4BCDE0 at FMSMSX101 dot amr dot corp dot intel dot com> <20140123101239 dot GC892 at tucnak dot redhat dot com> <BF230D13CA30DD48930C31D4099330003A4BE610 at FMSMSX101 dot amr dot corp dot intel dot com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Thu, Jan 23, 2014 at 04:38:53PM +0000, Iyer, Balaji V wrote:
> This is how I started to think of it at first, but then when I thought about it ... in _Cilk_for unlike the #pragma simd's for, the for statement - not the body - (e.g. "_Cilk_for (int ii = 0; ii < 10; ii++") doesn't really do anything nor does it belong in the child function. It is really mostly used to calculate the loop count and capture step-size and starting point.
>
> The child function has its own loop that will have a step size of 1 regardless of your step size. You use the step-size to find the correct spot. Let me give you an example:
>
> _Cilk_for (int ii = 0; ii < 10; ii = ii + 2)
> {
> Array [ii] = 5;
> }
>
> This is translated to the following (assume grain is something that the user input):
>
> data_ptr.start = 0;
> data_ptr.end = 10;
> data_ptr.step_size = 2;
> __cilkrts_cilk_for_32 (child_function, &data_ptr, (10-0)/2, grain);
>
> Child_function (void *data_ptr, int high, int low)
> {
> for (xx = low; xx < high; xx++)
> {
> Tmp_var = (xx * data_ptr->step_size) + data_ptr->start;
> // Note: if the _Cilk_for was (ii = 9; ii >= 0; ii -= 2), we would have something like this:
> // Tmp_var = data_ptr->end - (xx * data_ptr->step_size)
> // The for-loop above won't change.
> Array[Tmp_var] = 5;
> }
> }
This isn't really much different from
#pragma omp parallel for schedule(runtime, N)
(i.e. the combined construct). When it is combined, we also don't emit a
call to GOMP_parallel but to some other function, to which we pass the
number of iterations and chunk size (== grain in Cilk+ terminology). The
only (minor) difference is that for OpenMP, once the child function has
handled the whole low ... high range, it doesn't exit but calls a function
to give it the next pair of low/high, and only when that function tells it
there is no further work to do does it return. But the Cilk+ case is clearly
the same thing, just with an implicit indication that there is no further
work in the current function.
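
To make the comparison concrete, here is a minimal sketch in plain C of the
two child-function shapes, using the Array[ii] = 5 example quoted above. The
names mock_cilk_for, mock_next_chunk and struct loop_data are hypothetical
stand-ins invented for illustration; they are not the libgomp or Cilk Plus
runtime entry points, and the mock "runtimes" hand out chunks serially.

```c
#include <stdbool.h>

/* Hypothetical closure data; not the layout either runtime really uses. */
struct loop_data { int start; int step; int *array; };

/* Cilk+-style child: handles one [low, high) range of normalized
   iterations (step 1), maps each back to the user index, then returns. */
static void cilk_style_child(void *p, int low, int high)
{
    struct loop_data *d = p;
    for (int xx = low; xx < high; xx++)
        d->array[xx * d->step + d->start] = 5;
}

/* Mock runtime for the Cilk+ shape: calls the child once per grain-sized
   chunk. Returning from the child is the implicit "no more work here". */
static void mock_cilk_for(void (*child)(void *, int, int), void *p,
                          int count, int grain)
{
    for (int low = 0; low < count; low += grain) {
        int high = low + grain < count ? low + grain : count;
        child(p, low, high);
    }
}

/* Mock work queue for the OpenMP shape (assumed, illustrative only). */
static int next_iter, total_iters, chunk_size;

static bool mock_next_chunk(int *low, int *high)
{
    if (next_iter >= total_iters)
        return false;                 /* explicit "no further work" */
    *low = next_iter;
    *high = next_iter + chunk_size < total_iters
            ? next_iter + chunk_size : total_iters;
    next_iter = *high;
    return true;
}

/* OpenMP-style child: same loop body, but it keeps asking for the next
   low/high pair and returns only when told there is nothing left. */
static void omp_style_child(void *p)
{
    int low, high;
    while (mock_next_chunk(&low, &high))
        cilk_style_child(p, low, high);
}

/* Runs both shapes over ii = 0; ii < 10; ii += 2 and returns 1 if they
   wrote 5 to exactly the same (even) elements, 0 otherwise. */
static int compare_schemes(void)
{
    int a[10] = {0}, b[10] = {0};
    struct loop_data da = {0, 2, a}, db = {0, 2, b};
    int count = (10 - 0) / 2;         /* iteration count, as in the quote */

    mock_cilk_for(cilk_style_child, &da, count, 2);

    next_iter = 0; total_iters = count; chunk_size = 2;
    omp_style_child(&db);

    for (int i = 0; i < 10; i++)
        if (a[i] != b[i] || a[i] != (i % 2 == 0 ? 5 : 0))
            return 0;
    return 1;
}
```

Calling compare_schemes() from a main() exercises both shapes; the per-chunk
loop body is shared, and only the way the next low/high pair arrives differs.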
So, I'd strongly prefer that you swap the parallel with Cilk_for, just set
the flag that the two are combined, as OpenMP already does for tons of
constructs, and treat them together during expansion.
Jakub