This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: OMP4/cilkplus: simd clone function mangling

On 09/27/13 09:23, Jakub Jelinek wrote:
On Thu, Sep 26, 2013 at 02:31:33PM -0500, Aldy Hernandez wrote:
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42806,6 +42806,43 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
    return val;

+/* Return the default vector mangling ISA code when none is specified
+   in a `processor' clause.  */
+static char
+ix86_cilkplus_default_vector_mangling_isa_code (struct cgraph_node *clone
+  return 'x';

I think rth was suggesting using vecsize_mangle, vecsize_modifier or something else,
instead of ISA, because it won't represent the ISA on all targets.
It is just some magic letter used in mangling of the simd functions.

I thought he was only talking about the local vecsize_mangle() function, not the target hooks. Fair enough, I have changed all the ISA references when they can be replaced with *mangle* or something similar.

+  /* To distinguish from an OpenMP simd clone, Cilk Plus functions to
+     be cloned have a distinctive artificial label in addition to "omp
+     declare simd".  */
+  bool cilk_clone = flag_enable_cilkplus
+    && lookup_attribute ("cilk plus elemental",
+			 DECL_ATTRIBUTES (new_node->symbol.decl));

Formatting.  I'd say it should be
   bool cilk_clone
     = (flag_enable_cilkplus
        && lookup_attribute ("cilk plus elemental",
	  DECL_ATTRIBUTES (new_node->symbol.decl)));

+  if (cilk_clone)
+    remove_attribute ("cilk plus elemental",
+		      DECL_ATTRIBUTES (new_node->symbol.decl));

I think it doesn't make sense to remove the attribute.


+  pretty_printer vars_pp;

Do you really need two different pretty printers?

Whoops, fixed.  Nice catch.

Can't you just print "_ZGV%c%c%d into pp (is pp_printf
that cheap, wouldn't it be better to pp_string (&pp, "_ZGV"),
2 pp_character + one pp_decimal_int?), and then do the loop over
the args, which right now writes into vars_pp and finally
pp_underscore and pp_string the normally mangled name?

pp_printf() would be cheap. It's only used for a few cloned functions in a compilation unit. I like printf. It's pretty and clean. Not using it, is like saving sex for your old age ;-). But just to keep you happy, I changed maintainers are free to live their celibate monk lives as they see fit :).

+/* Create a simd clone of OLD_NODE and return it.  */
+static struct cgraph_node *
+simd_clone_create (struct cgraph_node *old_node)
+  struct cgraph_node *new_node;
+  new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false,
+					 NULL, NULL, "simdclone");

My understanding of how IPA cloning etc. works is that you first
set up various data structures describing how you change the arguments
and only then actually do cgraph_function_versioning which already during
the copying will do some of the transformations of the IL.
But perhaps those transformations are too complicated to describe for
tree-inline.c to make them for you.

Sure, we can worry about that when we're actually emitting the actual clones (as discussed below), and when we start adapting the vectorizer.

+  tree attr = lookup_attribute ("omp declare simd",
+				DECL_ATTRIBUTES (node->symbol.decl));
+  if (!attr)
+    return;
+  do
+    {
+      struct cgraph_node *new_node = simd_clone_create (node);
+      bool inbranch_clause;
+      simd_clone_clauses_extract (new_node, TREE_VALUE (attr),
+				  &inbranch_clause);
+      simd_clone_compute_isa_and_simdlen (new_node);
+      simd_clone_mangle (node, new_node);

As discussed on IRC, I was hoping that for OpenMP simd and selected
targets (e.g. i?86-linux and x86_64-linux) we could do better than that,
creating not just one or two clones as we do for Cilk+ where one can
select which CPU (and thus ISA) he wants to build the clones for, but
creating clones for all ISAs, and just based on command line options
either emit just one of them as the really optimized one and the others
just as thunks that would just call other simd clone functions or the
normal function possibly several times.

The thunk sounds like a good idea, long term. How about we start by emitting all the variants up-front and then we can optimize these cases later?

I was thinking, in the absence of a `simdlen' clause, we can provide a target hook that returns a vector of (struct { int hw_vector_size; char vecsize_mangle }) which would gives us the different clone variants we should generate. If the user provides `simdlen', we can continue generating just one clone (or two with *inbranch) with the present generic algorithm in simd_clone_compute_vecsize_and_simdlen().

Or do you have any other ideas? But I'd like to leave the thunking business after we get the general infrastructure working.


Attachment: curr
Description: Text document

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]