[PATCH] _Cilk_for for C and C++

Iyer, Balaji V balaji.v.iyer@intel.com
Fri Jan 24 20:33:00 GMT 2014



> -----Original Message-----
> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Friday, January 24, 2014 2:42 PM
> To: Iyer, Balaji V
> Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: Re: [PATCH] _Cilk_for for C and C++
> 
> On Thu, Jan 23, 2014 at 04:38:53PM +0000, Iyer, Balaji V wrote:
> > 	This is how I started to think of it at first, but then when I thought
> > about it ... in _Cilk_for, unlike the #pragma simd's for, the for statement - not
> > the body - (e.g. "_Cilk_for (int ii = 0; ii < 10; ii++)") doesn't really do anything,
> > nor does it belong in the child function. It is really mostly used to calculate the
> > loop count and capture the step size and starting point.
> >
> > 	The child function has its own loop that will have a step size of 1
> > regardless of your step size. You use the step size to find the correct spot.
> > Let me give you an example:
> >
> > _Cilk_for (int ii = 0; ii < 10; ii = ii  + 2) {
> > 	Array [ii] = 5;
> > }
> >
> > This is translated to the following (assume grain is something that the user
> > input):
> >
> > data_ptr.start = 0;
> > data_ptr.end = 10;
> > data_ptr.step_size = 2;
> > __cilkrts_cilk_for_32 (child_function, &data_ptr, (10-0)/2, grain);
> >
> > Child_function (void *data_ptr, int high, int low) {
> > 	for (xx = low; xx < high; xx++)
> > 	 {
> > 		Tmp_var = (xx * data_ptr->step_size) + data_ptr->start;
> > 		// Note: if the _Cilk_for was (ii = 9; ii >= 0; ii -= 2), we would
> > 		// have something like this:
> > 		// Tmp_var = data_ptr->end - (xx * data_ptr->step_size)
> > 		// The for-loop above won't change.
> > 		Array[Tmp_var] = 5;
> > 	}
> > }
> 
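For readers who want to run the transformation quoted above, here is a minimal serial sketch in C. It is an illustration under stated assumptions, not the code the patch generates: serial_cilk_for_32, struct cilk_for_data and trip_count are hypothetical names, the (low, high) parameter order of the child function is a choice made for this sketch, and the stand-in runner simply walks the range in grain-sized pieces where a real runtime would split it and run the pieces in parallel. The trip count rounds up so that ranges not divisible by the step (e.g. ii < 9 with a step of 2) still get their last iteration.

```c
/* Hypothetical capture record; field names follow the quoted example. */
struct cilk_for_data {
    int start;      /* initial value of the user's loop variable */
    int step_size;  /* user's step, assumed positive in this sketch */
    int *array;     /* data touched by the loop body */
};

/* Child function: walks a dense [low, high) range with step 1 and maps
   each index back to the value the user's loop variable would have. */
static void child_function(void *data, unsigned low, unsigned high)
{
    struct cilk_for_data *d = data;
    for (unsigned xx = low; xx < high; xx++) {
        int ii = (int)xx * d->step_size + d->start;
        d->array[ii] = 5;
    }
}

/* Serial stand-in for the runtime entry point (assumption: the real
   runtime divides [0, count) among workers; this just iterates). */
static void serial_cilk_for_32(void (*body)(void *, unsigned, unsigned),
                               void *data, unsigned count, unsigned grain)
{
    if (grain == 0)
        grain = 1;
    for (unsigned low = 0; low < count; low += grain) {
        unsigned high = (count - low < grain) ? count : low + grain;
        body(data, low, high);
    }
}

/* Trip count of "for (i = start; i < end; i += step)" with step > 0,
   rounded up so a partial final step still counts as an iteration. */
static unsigned trip_count(int start, int end, int step)
{
    if (end <= start)
        return 0;
    return (unsigned)((end - start + step - 1) / step);
}
```

With an int array of 10 zeros and a record of {0, 2, array}, calling serial_cilk_for_32 (child_function, &record, trip_count (0, 10, 2), grain) stores 5 into the even-indexed elements only, whatever positive grain is used.
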
> This isn't really much different from
> #pragma omp parallel for schedule(runtime, N) (i.e. the combined
> construct). When it is combined, we also don't emit a call to GOMP_parallel
> but to some other function to which we pass the number of iterations and
> chunk size (== grain in Cilk+ terminology). The only (minor) difference is that
> for OpenMP, when you have handled the whole low ... high range, the child
> function doesn't exit, but calls a function to give it the next
> pair of low/high, and only when that function tells it there is no further work
> to do does it return.  But the Cilk+ case is clearly the same thing, just with
> an implicit indication that there is no further work in the current function.
> 
> So, I'd strongly prefer if you swap the parallel with Cilk_for, just set the flag
> that the two are combined like OpenMP already has for tons of constructs,
> and during expansion you just treat it together.
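
The difference between the two child-function shapes described in the quoted text can be sketched as follows. This is a single-threaded illustration only: next_chunk, struct chunk_state and struct loop_data are hypothetical stand-ins rather than libgomp or Cilk runtime interfaces, and a real runtime would hand out ranges atomically across threads.

```c
#include <stdbool.h>

/* Hypothetical shared scheduling state (chunk is assumed positive). */
struct chunk_state {
    long next;   /* first iteration not yet handed out */
    long end;    /* total number of iterations */
    long chunk;  /* chunk size (grain in Cilk+ terminology) */
};

/* Hypothetical stand-in for the "give me the next low/high pair" call.
   Returns false once there is no further work. */
static bool next_chunk(struct chunk_state *s, long *low, long *high)
{
    if (s->next >= s->end)
        return false;
    *low = s->next;
    *high = (s->end - s->next < s->chunk) ? s->end : s->next + s->chunk;
    s->next = *high;
    return true;
}

struct loop_data {
    struct chunk_state sched;
    int start;
    int step_size;
    int *array;
};

/* OpenMP-style child: keeps asking for ranges and returns only when
   the scheduler reports that nothing is left. */
static void omp_style_child(void *data)
{
    struct loop_data *d = data;
    long low, high;
    while (next_chunk(&d->sched, &low, &high))
        for (long xx = low; xx < high; xx++)
            d->array[xx * d->step_size + d->start] = 5;
}

/* Cilk-style child: receives one range, handles it, and returns;
   returning is the implicit "no further work" signal. */
static void cilk_style_child(void *data, long low, long high)
{
    struct loop_data *d = data;
    for (long xx = low; xx < high; xx++)
        d->array[xx * d->step_size + d->start] = 5;
}
```

The inner loop is identical in both functions; only the way a range reaches the child differs, which is the similarity the quoted paragraph points out.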

Hi Jakub,
	What you are suggesting here would require a significant rewrite of the code. This version of _Cilk_for works, and it shares a significant amount of work with the OMP routines, as requested by other GCC developers. Given the time constraints, let's try to get this version accepted so that the feature will be available to users, and we will look into moving toward your suggestion when phase 1 opens again.

Thanks,

Balaji V. Iyer.


> 
> 	Jakub


