RFC: simd enabled functions (omp declare simd / elementals)

Richard Biener richard.guenther@gmail.com
Mon Nov 4 10:37:00 GMT 2013


On Fri, Nov 1, 2013 at 4:04 AM, Aldy Hernandez <aldyh@redhat.com> wrote:
> Hello gentlemen.  I'm CCing all of you, because each of you can provide
> valuable feedback to various parts of the compiler which I touch.  I have
> sprinkled love notes with your names throughout the post :).
>
> This is a patch against the gomp4 branch.  It provides initial support for
> simd-enabled functions which are "#pragma omp declare simd" in the OpenMP
> world and elementals in Cilk Plus nomenclature.  The parsing bits for OpenMP
> are already in trunk, but they are silently ignored.  This patch aims to
> remedy the situation.  The Cilk Plus parsing bits, OTOH, are not ready, but
> could trivially be adapted to use this infrastructure (see below).
>
> I would like to at least get this into the gomp4 branch for now, because I
> am accumulating far too many changes locally.
>
> The main idea is that for a simd annotated function, we can create one or
> more cloned vector variants of a scalar function that can later be used by
> the vectorizer.
>
> For a simple example with multiple returns...
>
> #pragma omp declare simd simdlen(4) notinbranch
> int foo (int a, int b)
> {
>   if (a == b)
>     return 555;
>   else
>     return 666;
> }
>
> ...we would generate with this patch (unoptimized):

Just a quick question, queueing the thread for later review (aww, queue
back at >100 threads again - I can't get any work done anymore :().

What does #pragma omp declare simd guarantee about memory
side-effects and memory use in the function?  That is, unless the
function can be safely annotated with the 'const' attribute the
whole thing is useless for the vectorizer.

Thanks,
Richard.

> foo.simdclone.0 (vector(4) int simd.4, vector(4) int simd.5)
> {
>   unsigned int iter.6;
>   int b.3[4];
>   int a.2[4];
>   int retval.1[4];
>   int _3;
>   int _5;
>   int _6;
>   vector(4) int _7;
>
>   <bb 2>:
>   a.2 = VIEW_CONVERT_EXPR<int[4]>(simd.4);
>   b.3 = VIEW_CONVERT_EXPR<int[4]>(simd.5);
>   iter.6_12 = 0;
>
>   <bb 3>:
>   # iter.6_9 = PHI <iter.6_12(2), iter.6_14(6)>
>   _5 = a.2[iter.6_9];
>   _6 = b.3[iter.6_9];
>   if (_5 == _6)
>     goto <bb 5>;
>   else
>     goto <bb 4>;
>
>   <bb 4>:
>
>   <bb 5>:
>   # _3 = PHI <555(3), 666(4)>
>   retval.1[iter.6_9] = _3;
>   iter.6_14 = iter.6_9 + 1;
>   if (iter.6_14 < 4)
>     goto <bb 6>;
>   else
>     goto <bb 7>;
>
>   <bb 6>:
>   goto <bb 3>;
>
>   <bb 7>:
>   _7 = VIEW_CONVERT_EXPR<vector(4) int>(retval.1);
>   return _7;
>
> }
>
> The new loop is properly created and annotated with loop->force_vect=true
> and loop->safelen set.
>
> A possible use may be:
>
> int array[1000];
> void bar ()
> {
>   int i;
>   for (i=0; i < 1000; ++i)
>     array[i] = foo(i, 123);
> }
>
> In which case, we would use the simd clone if available:
>
> bar ()
> {
>   vector(4) int vect_cst_.21;
>   vector(4) int vect_i_6.20;
>   vector(4) int * vectp_array.19;
>   vector(4) int * vectp_array.18;
>   vector(4) int vect_cst_.17;
>   vector(4) int vect__4.16;
>   vector(4) int vect_vec_iv_.15;
>   vector(4) int vect_cst_.14;
>   vector(4) int vect_cst_.13;
>   int stmp_var_.12;
>   int i;
>   unsigned int ivtmp_1;
>   int _4;
>   unsigned int ivtmp_7;
>   unsigned int ivtmp_20;
>   unsigned int ivtmp_21;
>
>   <bb 2>:
>   vect_cst_.13_8 = { 0, 1, 2, 3 };
>   vect_cst_.14_2 = { 4, 4, 4, 4 };
>   vect_cst_.17_13 = { 123, 123, 123, 123 };
>   vectp_array.19_15 = &array;
>   vect_cst_.21_5 = { 1, 1, 1, 1 };
>   goto <bb 4>;
>
>   <bb 3>:
>
>   <bb 4>:
>   # i_9 = PHI <i_6(3), 0(2)>
>   # ivtmp_1 = PHI <ivtmp_7(3), 1000(2)>
>   # vect_vec_iv_.15_11 = PHI <vect_vec_iv_.15_12(3), vect_cst_.13_8(2)>
>   # vectp_array.18_16 = PHI <vectp_array.18_17(3), vectp_array.19_15(2)>
>   # ivtmp_20 = PHI <ivtmp_21(3), 0(2)>
>   vect_vec_iv_.15_12 = vect_vec_iv_.15_11 + vect_cst_.14_2;
>   vect__4.16_14 = foo.simdclone.0 (vect_vec_iv_.15_11, vect_cst_.17_13);
>   _4 = 0;
>   MEM[(int *)vectp_array.18_16] = vect__4.16_14;
>   vect_i_6.20_19 = vect_vec_iv_.15_11 + vect_cst_.21_5;
>   i_6 = i_9 + 1;
>   ivtmp_7 = ivtmp_1 - 1;
>   vectp_array.18_17 = vectp_array.18_16 + 16;
>   ivtmp_21 = ivtmp_20 + 1;
>   if (ivtmp_21 < 250)
>     goto <bb 3>;
>   else
>     goto <bb 5>;
>
>   <bb 5>:
>   return;
>
> }
>
> That's the idea.
>
> Some of the ABI issues still need to be resolved (mangling for avx-512, what
> to do with non x86 architectures, what (if any) default clones will be
> created when no vector length is specified, etc etc), but the main
> functionality can be seen above.
>
> Uniform and linear parameters (which are passed as scalars) are still not
> handled.  Also, Jakub mentioned that with the current vectorizer we probably
> can't make good use of the inbranch/masked clones.  I have a laundry list of
> missing things prepended by // FIXME if anyone is curious.
>
> I'd like some feedback from y'all in your respective areas, since this
> touches a few places besides OpenMP.  For instance...
>
> [Honza] Where do you suggest I place a list of simd clones for a particular
> (scalar) function?  Right now I have added a simdclone_of field in
> cgraph_node and am (temporarily) serially scanning all functions in
> get_simd_clone().  This is obviously inefficient.  I didn't know whether to
> use the current next_sibling_clone/etc fields or create my own.  I tried
> using clone_of, and that caused some havoc so I'd like some feedback.
>
> [Martin] I have adapted the ipa_parm_adjustment infrastructure to allow
> adding new arguments out of the blue like you mentioned was missing in
> ipa-prop.h.  I have also added support for creating vectors of arguments.
> Could you take a look at my changes to ipa-prop.[ch]?
>
> [Martin] I need to add new arguments in the case of inbranch clones, which
> add an additional vector with a mask as the last argument:  For the
> following:
>
> #pragma omp declare simd simdlen(4) inbranch
> int foo (int a)
> {
>   return a + 1234;
> }
>
> ...we would generate a clone with:
>
> vector(4) int
> foo.simdclone.0 (vector(4) int simd.4, vector(4) int mask.5)
>
> I thought it best to enhance ipa_modify_formal_parameters() and associated
> machinery than to add the new argument ad-hoc.  We already have enough ways
> of doing tree and cgraph versioning in the compiler ;-).
>
> [Richi] I would appreciate feedback on the vectorizer and the infrastructure
> as a whole.  Do keep in mind that this is a work in progress :).
>
> [Balaji] This patch would provide the infrastructure that can be used by the
> Cilk Plus elementals.  When this is complete, all that would be missing is
> the parser.  You would have to tag the original function with "omp declare
> simd" and "cilk plus elemental" attributes.  See simd_clone_clauses_extract.
>
> [Jakub/rth]: As usual, valuable feedback on OpenMP and everything else is
> greatly appreciated.
>
> Oh yeah, there are many more changes that would ideally be needed in the
> vectorizer.
>
> Fire away!



More information about the Gcc-patches mailing list