[PATCH] _Cilk_for for C and C++

Jakub Jelinek jakub@redhat.com
Fri Jan 24 19:41:00 GMT 2014


On Thu, Jan 23, 2014 at 04:38:53PM +0000, Iyer, Balaji V wrote:
> 	This is how I started to think of it at first, but then when I thought about it more... in _Cilk_for, unlike the for of #pragma simd, the for statement itself - not the body - (e.g. "_Cilk_for (int ii = 0; ii < 10; ii++)") doesn't really do anything, nor does it belong in the child function. It is mostly used to calculate the loop count and to capture the step size and starting point.
> 
> 	The child function has its own loop that will have a step size of 1 regardless of your step size. You use the step-size to find the correct spot. Let me give you an example:
> 
> _Cilk_for (int ii = 0; ii < 10; ii = ii  + 2)
> {
> 	Array [ii] = 5;
> }
> 
> This is translated to the following (assume grain is something that the user input):
> 
> data_ptr.start = 0;
> data_ptr.end = 10;
> data_ptr.step_size = 2;
> __cilkrts_cilk_for_32 (child_function, &data_ptr, (10-0)/2, grain);
> 
> Child_function (void *data_ptr, int high, int low)
> {
> 	for (xx = low; xx < high; xx++) 
> 	 {
> 		Tmp_var = (xx * data_ptr->step_size) + data_ptr->start;
> 		// Note: if the _Cilk_for was (ii = 9; ii >= 0; ii -= 2), we would have something like this:
> 		// Tmp_var = data_ptr->end - (xx * data_ptr->step_size)
> 		// The for-loop above won't change.  
> 		Array[Tmp_var] = 5;
> 	}
> }

This isn't really much different from
#pragma omp parallel for schedule(dynamic, N)
(i.e. the combined construct). When the two are combined, we also don't
emit a call to GOMP_parallel, but to a different function to which we pass
the number of iterations and the chunk size (== grain in Cilk+ terminology).
The only (minor) difference is that in OpenMP, once the child function has
handled a whole low ... high range, it doesn't exit; instead it calls a
function that gives it the next pair of low/high, and it returns only when
that function reports there is no further work to do.  But the Cilk+ case is
clearly the same thing, with "no further work" simply implicit: the child
function handles only the range it was given and returns.

So I'd strongly prefer that you swap the parallel and the _Cilk_for, just set
the flag marking the two as combined (as OpenMP already does for tons of
constructs), and treat them together during expansion.

	Jakub


