User directed Function Multiversioning via Function Overloading (issue5752064)

Sriraman Tallam tmsriram@google.com
Mon Jun 4 19:01:00 GMT 2012


Hi,

   I am attaching an updated patch for function multiversioning, which
brings in a number of changes.

* As suggested by Richard earlier, I have made cgraph aware of
function versions. All nodes of function versions are chained, and the
dispatcher bodies are created on demand while building cgraph edges.
The dispatcher body is created if and only if there is a call or
reference to a versioned function. Previously, I maintained the list
of versions separately in a hash map; all of that is gone now.
* The file multiversion.c now holds helper routines that are used
in the context of function versioning. There are no new passes and no
new globals.
* More tests, updated existing tests.
* Fixed lots of bugs.
* Updated patch description.

Patch attached. Patch also available for review at
http://codereview.appspot.com/5752064

Please let me know what you think,

Thanks,
-Sri.


On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi H.J,
>
>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
> not needed as they are mutually exclusive, any order should be fine.
>
> Patch also available for review here:  http://codereview.appspot.com/5752064
>
> Thanks,
> -Sri.
>
> On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi H.J.,
>>>
>>>   I have updated the patch to improve the dispatching method like we
>>> discussed. Each feature gets a priority now, and the dispatching is
>>> done in priority order. Please see i386.c for the changes.
>>>
>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>
>>
>> I think you need 3 tests:
>>
>> 1.  Only with ISA.
>> 2.  Only with arch
>> 3.  Mixed with ISA and arch
>>
>> since test mixed ISA and arch may hide issues with ISA only or arch only.
>>
>> --
>> H.J.
-------------- next part --------------

Overview of the patch which adds front-end support to specify function versions.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));  /* Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));  /* Specialized for core2 and ssse3 */

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The calls to foo in main, one direct
and one via a pointer, are calls to the multi-versioned function foo, which is
dispatched to the right foo at run-time.

What does the patch do?

* Tracking decls that correspond to function versions of a function
name, say "foo":

When the front-end sees more than one decl for "foo", with at least one decl
tagged with a "target" attribute, it marks them as function versions. To
prevent duplicate definition errors with other versions of "foo", the
"decls_match" function in cp/decl.c is made to return false when 2 decls have
the same signature but different target attributes. As a result, all function
versions of "foo" are added to the overload list of "foo".

* Change the assembler names of the function versions.

The front-end changes the assembler names of the function versions by appending
the sorted list of args to "target" to the assembler name of "foo". For example,
the assembler name of "void foo () __attribute__ ((target ("sse4")))" will
become _Z3foov.sse4.
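
The renaming can be sketched in isolation. The block below is only an
illustration modelled on sorted_attr_string and version_assembler_name in the
attached multiversion.c: '=' is rewritten to '_', the comma-separated arguments
are sorted, joined with '_', and appended after a '.'. The helper name
versioned_name is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Hypothetical helper (not in the patch): return a malloc'd string
   "BASE.SUFFIX", where SUFFIX is the sorted, '_'-joined list of the
   comma-separated arguments in TARGET, with '=' rewritten to '_'.
   The caller frees the result.  */
static char *
versioned_name (const char *base, const char *target)
{
  size_t len = strlen (target);
  char *copy = malloc (len + 1);
  char **args = malloc ((len + 1) * sizeof *args);
  char *out = malloc (strlen (base) + len + 2);
  size_t n = 0, i;
  char *tok;

  assert (copy && args && out);
  strcpy (copy, target);
  for (i = 0; i < len; i++)
    if (copy[i] == '=')
      copy[i] = '_';

  /* Split on commas, then sort so that argument order does not matter.  */
  for (tok = strtok (copy, ","); tok; tok = strtok (NULL, ","))
    args[n++] = tok;
  qsort (args, n, sizeof *args, cmp_str);

  sprintf (out, "%s.", base);
  for (i = 0; i < n; i++)
    {
      if (i > 0)
        strcat (out, "_");
      strcat (out, args[i]);
    }

  free (args);
  free (copy);
  return out;
}
```

Because the arguments are sorted first, target("avx,popcnt") and
target("popcnt,avx") map to the same suffix under this scheme.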

* Overload resolution:

Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in cp/call.c. Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph data structures. Each version of foo is chained in a
doubly-linked list with the default function as the first element.  This allows
any pass to access all the semantically identical versions. Also, a dispatcher
decl is created; calls go to this decl, which at run-time dispatches to the
right function version.

Also, in joust, resolution of a call to a multiversioned function is made to
return the most specialized version.  This is the version that will be checked
for dispatching first, and it is determined by the target.  If the caller can
inline this function version, a direct call is made to it rather than going
through the dispatcher. When a direct call cannot be made, a call to the
dispatcher function is created.

* Creating the dispatcher body.

The dispatcher body, called the resolver, is made only when there is a call to a
multiversioned function dispatcher or the address of a multiversioned function
is taken. It is generated during build_cgraph_edges for a call, or during
cgraph_mark_address_taken_node for a pointer reference.
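
As a rough illustration of the chaining and the on-demand resolver, here is a
toy model in plain C. It does not use the real cgraph types: struct fn_node,
append_version, count_versions, get_resolver and the literal "foo.resolver" are
all made up for this sketch. Only the idea is taken from the patch: the
dispatcher points at the default version, versions are doubly linked, and the
resolver is built once on first use.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a cgraph node; not the real struct cgraph_node.  */
struct fn_node
{
  const char *name;
  struct fn_node *prev_version;  /* NULL for the default version.  */
  struct fn_node *next_version;  /* For a dispatcher: the default version.  */
  const char *resolver;          /* Dispatcher only; NULL until built.  */
};

/* Counts how many times a resolver body was actually generated.  */
static int resolvers_built;

/* Append version V to the end of the chain that starts at HEAD.  */
static void
append_version (struct fn_node *head, struct fn_node *v)
{
  struct fn_node *tail = head;
  while (tail->next_version)
    tail = tail->next_version;
  tail->next_version = v;
  v->prev_version = tail;
  v->next_version = NULL;
}

/* Number of versions reachable from DISPATCHER, default included.  */
static int
count_versions (const struct fn_node *dispatcher)
{
  int n = 0;
  const struct fn_node *v;
  for (v = dispatcher->next_version; v; v = v->next_version)
    n++;
  return n;
}

/* Build the resolver on first request only; later requests reuse it.  */
static const char *
get_resolver (struct fn_node *dispatcher)
{
  assert (dispatcher->next_version != NULL);
  if (dispatcher->resolver == NULL)
    {
      dispatcher->resolver = "foo.resolver";  /* placeholder body */
      resolvers_built++;
    }
  return dispatcher->resolver;
}
```

In this model, a translation unit that never calls foo and never takes its
address never calls get_resolver, so no resolver is emitted for it.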

* Dispatch ordering.

The order in which the function versions are checked during dispatch is based
on a priority value assigned to the ISA that each version targets. More
specialized versions are checked for dispatching first.  This mitigates the
ambiguity that can arise when more than one function version is valid for
execution on a particular platform.  This is not a perfect solution; in the
future, the user should be allowed to assign a dispatching priority value to
each version.
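
A minimal sketch of priority-ordered dispatch, assuming each version carries a
set of required features and a numeric priority. The feature bits, the priority
numbers and the names struct version and dispatch are invented for the example;
the actual ordering is whatever the target hook in i386.c implements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical feature bits; the real checks live in the target hook.  */
enum { FEAT_POPCNT = 1 << 0, FEAT_SSSE3 = 1 << 1, FEAT_AVX = 1 << 2 };

struct version
{
  const char *name;
  unsigned required;  /* Features the version needs; 0 for the default.  */
  int priority;       /* Larger is checked first; 0 for the default.  */
};

/* qsort comparator: highest priority first.  */
static int
by_priority_desc (const void *a, const void *b)
{
  const struct version *va = a, *vb = b;
  return (vb->priority > va->priority) - (vb->priority < va->priority);
}

/* Return the name of the first version, in priority order, whose required
   features are all present in CPU_FEATURES.  The default version requires
   nothing, so it is the fallback.  */
static const char *
dispatch (struct version *v, size_t n, unsigned cpu_features)
{
  size_t i;

  assert (n > 0);
  qsort (v, n, sizeof *v, by_priority_desc);
  for (i = 0; i < n; i++)
    if ((v[i].required & ~cpu_features) == 0)
      return v[i].name;
  return NULL;
}
```

On a machine that supports both avx+popcnt and ssse3, both specialized versions
are valid; the priority order alone decides which one runs.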


	* doc/tm.texi.in (TARGET_DISPATCH_VERSION): New hook description.
	(TARGET_COMPARE_VERSIONS): New hook description.
	* doc/tm.texi: Regenerate.
	* cgraphbuild.c (build_cgraph_edges): Generate body of multiversion
	function dispatcher.
	* c-family/c-common.c (handle_target_attribute): Always keep target
	attributes tagged.
	* target.def (dispatch_version): New target hook.
	(compare_versions): New hook.
	* cgraph.c (cgraph_mark_address_taken_node): Generate body of multiversion
	function dispatcher.
	* cgraph.h (cgraph_node): New members dispatcher_fndecl, resolver_fndecl,
	prev_function_version, next_function_version, dispatcher_function.
	(is_default_function_version): New function.
	(mark_function_as_version): New function.
	(has_different_version_attributes): New function.
	(function_target_attribute): New function.
	(build_dispatcher_for_function_versions): New function.
	(build_resolver_for_function_versions): New function.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* multiversion.c: New file.
	* testsuite/g++.dg/mv1.C: New test.
	* testsuite/g++.dg/mv2.C: New test.
	* testsuite/g++.dg/mv3.C: New test.
	* testsuite/g++.dg/mv4.C: New test.
	* cp/class.c (add_method): Change assembler names of function
	versions.
	(resolve_address_of_overloaded_function): Save all function
	version candidates. Create dispatcher decl and return address of
	dispatcher instead.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. 
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/error.c (dump_exception_spec): Dump assembler name for function
	versions.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c (check_classfn): Check attributes of versioned functions
	for match.
	* cp/call.c (build_new_function_call): Check if versioned functions
	have a default version.
	(build_over_call): Make calls to multiversioned functions
	to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the most
	specialized function version win.
	(tourney): Generate dispatcher decl for function versions.
	* cp/mangle.c (write_unqualified_name): Use assembler name for
	versioned functions.
	* Makefile.in: Add multiversion.o.
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_compare_versions): New function.
	(feature_compare): New function.
	(ix86_dispatch_version): New function.
	(TARGET_DISPATCH_VERSION): New macro.
	(TARGET_COMPARE_VERSIONS): New macro.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 187817)
+++ gcc/doc/tm.texi	(working copy)
@@ -10997,6 +10997,21 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_COMPARE_VERSIONS (tree @var{decl1}, tree @var{decl2})
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 187817)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -10877,6 +10877,21 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_DISPATCH_VERSION
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
+@hook TARGET_COMPARE_VERSIONS
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/cgraphbuild.c
===================================================================
--- gcc/cgraphbuild.c	(revision 187817)
+++ gcc/cgraphbuild.c	(working copy)
@@ -288,7 +288,6 @@ mark_store (gimple stmt, tree t, void *data)
      }
   return false;
 }
-
 /* Create cgraph edges for function calls.
    Also look for functions and variables having addresses taken.  */
 
@@ -316,6 +315,20 @@ build_cgraph_edges (void)
 	      int freq = compute_call_stmt_bb_frequency (current_function_decl,
 							 bb);
 	      decl = gimple_call_fndecl (stmt);
+	      /* If a call to a multiversioned function dispatcher is found,
+		 generate the body to dispatch the right function
+		 at run-time.  */
+	      if (decl && cgraph_get_node (decl)
+		  && cgraph_get_node (decl)->dispatcher_function)
+		{
+		  tree resolver_decl;
+		  struct cgraph_node *curr_node = cgraph_get_node (decl);
+		  gcc_assert (curr_node->next_function_version);
+		  resolver_decl
+		    = build_resolver_for_function_versions (curr_node);
+		  gcc_assert (resolver_decl);
+		}
+
 	      if (decl)
 		cgraph_create_edge (node, cgraph_get_create_node (decl),
 				    stmt, bb->count, freq);
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 187817)
+++ gcc/c-family/c-common.c	(working copy)
@@ -8246,9 +8246,15 @@ handle_target_attribute (tree *node, tree name, tr
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
-  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
-						      flags))
-    *no_add_attrs = true;
+  else
+    {
+      /* When a target attribute is invalid, it may also be because the
+	 target for the compilation unit and the attribute match.  For
+         instance, target attribute "xxx" is invalid when -mxxx is used.
+         When used with multiversioning, removing the attribute can lead
+         to duplicate definitions.  So, keep the attribute tagged.  */
+      targetm.target_option.valid_attribute_p (*node, name, args, flags);
+    }
 
   return NULL_TREE;
 }
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 187817)
+++ gcc/target.def	(working copy)
@@ -1249,6 +1249,24 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to generate the dispatching code for calls to multi-versioned
+   functions.  DISPATCH_DECL is the function that will have the dispatching
+   logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
+   basic block in DISPATCH_DECL which will contain the code.  */
+DEFHOOK
+(dispatch_version,
+ "",
+ int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
+
+/* Target hook to compare the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+DEFHOOK
+(compare_versions,
+ "",
+ int, (tree decl1, tree decl2), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c	(revision 187817)
+++ gcc/cgraph.c	(working copy)
@@ -1278,6 +1278,14 @@ cgraph_mark_address_taken_node (struct cgraph_node
   node->symbol.address_taken = 1;
   node = cgraph_function_or_thunk_node (node, NULL);
   node->symbol.address_taken = 1;
+  /* If the address of a multiversioned function dispatcher is taken,
+     generate the body to dispatch the right function at run-time.  This
+     is needed as the address can be used to do an indirect call.  */
+  if (node->dispatcher_function)
+    {
+      gcc_assert (node->next_function_version);
+      build_resolver_for_function_versions (node);
+    }
 }
 
 /* Return local info for the compiled function.  */
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 187817)
+++ gcc/cgraph.h	(working copy)
@@ -220,6 +220,19 @@ struct GTY(()) cgraph_node {
   struct cgraph_node *prev_sibling_clone;
   struct cgraph_node *clones;
   struct cgraph_node *clone_of;
+
+  /* If this node corresponds to a function version, this points
+     to the dispatcher function.  */
+  tree dispatcher_fndecl;
+  /* If this node is a dispatcher for function versions, this points
+     to resolver function.  */
+  tree resolver_fndecl;
+  /* Chains all the semantically identical function versions.  The
+     first function in this chain is the default function.  */
+  struct cgraph_node *prev_function_version;
+  /* If this node is a dispatcher for function versions, this also points
+     to the default function version.  */
+  struct cgraph_node *next_function_version;
   /* For functions with many calls sites it holds map from call expression
      to the edge to speed up cgraph_edge function.  */
   htab_t GTY((param_is (struct cgraph_edge))) call_site_hash;
@@ -271,6 +284,7 @@ struct GTY(()) cgraph_node {
   /* ?? We should be able to remove this.  We have enough bits in
      cgraph to calculate it.  */
   unsigned tm_clone : 1;
+  unsigned dispatcher_function : 1;
 };
 
 typedef struct cgraph_node *cgraph_node_ptr;
@@ -636,6 +650,22 @@ void cgraph_rebuild_references (void);
 int compute_call_stmt_bb_frequency (tree, basic_block bb);
 void record_references_in_initializer (tree, bool);
 
+/* In multiversion.c  */
+/* Returns true if DECL is a function version and is the default version.  */
+bool is_default_function_version (tree decl);
+void mark_function_as_version (tree);
+/* Returns true if the "target" attribute strings of DECL1 and DECL2
+   don't match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+/* Return the target attribute if decl is FUNCTION_DECL. */
+tree function_target_attribute (const tree decl);
+/* Builds the dispatcher decl for function versions in VEC.  */
+tree build_dispatcher_for_function_versions (VEC (tree,heap) *vec);
+/* Builds the resolver function which picks the right function version at
+   run-time.  NODE is the cgraph node of the dispatcher which points to
+   the various function versions to be resolved.  */
+tree build_resolver_for_function_versions (struct cgraph_node *node);
+
 /* In ipa.c  */
 bool symtab_remove_unreachable_nodes (bool, FILE *);
 cgraph_node_set cgraph_node_set_new (void);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 187817)
+++ gcc/tree.h	(working copy)
@@ -3534,6 +3534,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3578,8 +3584,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/multiversion.c
===================================================================
--- gcc/multiversion.c	(revision 0)
+++ gcc/multiversion.c	(revision 0)
@@ -0,0 +1,572 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This file contains routines for handling multiversioned functions.
+
+   Function versions are created by using the same function signature but
+   also tagging attribute "target" to specify the platform type for which
+   the version must be executed.  Here is an example:
+
+   int foo ()
+   {
+     printf ("Execute as default");
+     return 0;
+   }
+
+   int  __attribute__ ((target ("arch=corei7")))
+   foo ()
+   {
+     printf ("Execute for corei7");
+     return 0;
+   }
+   
+   int main ()
+   {
+     return foo ();
+   } 
+
+   The call to foo in main is replaced with a call to a dispatcher function
+   that contains the resolver code to call the correct function version at
+   run-time.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "params.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "gimple-pretty-print.h"
+#include "target.h"
+#include "tree-flow.h"
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 don't match.  */
+
+bool
+has_different_version_attributes (const tree decl1, const tree decl2)
+{
+  tree attr1, attr2;
+  char *c1, *c2;
+  bool ret = false;
+
+  if (TREE_CODE (decl1) != FUNCTION_DECL
+      || TREE_CODE (decl2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = function_target_attribute (decl1);
+  attr2 = function_target_attribute (decl2);
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  c1 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
+  c2 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
+
+  if (strcmp (c1, c2) != 0)
+     ret = true;
+
+  free (c1);
+  free (c2);
+
+  return ret;
+}
+
+/* If this decl corresponds to a function and has "target" attribute,
+   append the attribute string to its assembler name.  */
+
+static void
+version_assembler_name (const tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+  
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    return;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated\n");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported\n");
+
+  version_attr = function_target_attribute (decl);
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+
+  assembler_name_tree = get_identifier (assembler_name);
+
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
+  SET_DECL_RTL (decl, NULL);
+}
+
+void
+mark_function_as_version (const tree decl)
+{
+  if (DECL_FUNCTION_VERSIONED (decl))
+    return;
+  DECL_FUNCTION_VERSIONED (decl) = 1;
+  version_assembler_name (decl);
+}
+
+/* Returns the "target" attribute of DECL if DECL is a FUNCTION_DECL,
+   returns NULL otherwise.  */
+
+tree
+function_target_attribute (const tree decl)
+{
+  if (TREE_CODE (decl) == FUNCTION_DECL)
+    return lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  return NULL;
+}
+
+/* Returns true if decl is multi-versioned and DECL is the default function,
+   that is it is not tagged with "target" attribute.  */
+
+bool
+is_default_function_version (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && (function_target_attribute (decl) == NULL_TREE));
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   make_unique is true, append the full path name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function,  DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool is_uniq = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  else if (TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  gimple_register_cfg_hooks ();
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_gimple_any);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  cgraph_create_function_alias (dispatch_decl, decl);
+  return decl;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE points
+   to the dispatcher decl whose body will be created.  */
+
+tree 
+build_resolver_for_function_versions (struct cgraph_node *node)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+
+  if (node->resolver_fndecl)
+    return node->resolver_fndecl;
+
+  default_ver_decl = node->next_function_version->symbol.decl;
+  resolver_decl = make_resolver_func (default_ver_decl,
+			  		    node->symbol.decl,
+					    &empty_bb);
+  node->resolver_fndecl = resolver_decl;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+  current_function_decl = resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (versn = node->next_function_version; versn;
+       versn = versn->next_function_version)
+    {
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->symbol.decl))
+        error_at (DECL_SOURCE_LOCATION (versn->symbol.decl),
+		  "Virtual function multiversioning not supported");
+      VEC_safe_push (tree, heap, fn_ver_vec, versn->symbol.decl);
+    }
+
+  gcc_assert (targetm.dispatch_version);
+  targetm.dispatch_version (resolver_decl, fn_ver_vec, &empty_bb);
+
+  rebuild_cgraph_edges (); 
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return resolver_decl;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher.
+   Return the decl created.  */
+
+static tree
+make_dispatcher_decl (const tree decl)
+{
+  tree func_decl;
+  char *func_name, *resolver_name;
+  tree fn_type, func_type;
+  bool is_uniq = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    is_uniq = true;
+
+  func_name = make_name (decl, "ifunc", is_uniq);
+  resolver_name = make_name (decl, "resolver", is_uniq);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  func_type = build_function_type (TREE_TYPE (fn_type),
+				    TYPE_ARG_TYPES (fn_type));
+  
+  func_decl = build_fn_decl (func_name, func_type);
+  TREE_USED (func_decl) = 1;
+  DECL_CONTEXT (func_decl) = NULL_TREE;
+  DECL_INITIAL (func_decl) = error_mark_node;
+  DECL_ARTIFICIAL (func_decl) = 1;
+  /* Mark this func as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (func_decl) = 1;
+  /* This will be an IFUNC, and IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (func_decl) = 1;
+
+  return func_decl;  
+}
+
+tree
+build_dispatcher_for_function_versions (VEC (tree,heap) *fn_ver_vec)
+{
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_node *curr_node = NULL;
+  int ix;
+  tree ele;
+  tree dispatch_decl = NULL;
+
+  gcc_assert (fn_ver_vec != NULL);
+
+  /* Find the default version.  */
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      if (is_default_function_version (ele))
+	{
+	  default_node = cgraph_get_create_node (ele);
+	  break;
+	}
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (!default_node)
+    return NULL;
+
+  if (default_node->dispatcher_fndecl)
+    return default_node->dispatcher_fndecl;
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+  /* Right now, the dispatching is done via ifunc.  */
+  dispatch_decl = make_dispatcher_decl (default_node->symbol.decl); 
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
+  default_node->dispatcher_fndecl = dispatch_decl;
+  curr_node = cgraph_get_create_node (dispatch_decl);
+  gcc_assert (curr_node);
+  curr_node->dispatcher_function = 1;
+  cgraph_mark_address_taken_node (default_node);
+
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      node = cgraph_get_create_node (ele);
+      gcc_assert (node != NULL && DECL_FUNCTION_VERSIONED (ele));
+      if (node == default_node)
+	continue;
+      gcc_assert (function_target_attribute (ele) != NULL_TREE);
+      if (curr_node->next_function_version)
+ 	{
+	  node->next_function_version = curr_node->next_function_version;
+	  curr_node->next_function_version->prev_function_version = node;
+	}
+      curr_node->next_function_version = node;
+      node->prev_function_version = curr_node;
+      node->dispatcher_fndecl = dispatch_decl;
+    }
+
+  /* The default version should be the first node.  */
+  default_node->next_function_version = curr_node->next_function_version;
+  curr_node->next_function_version->prev_function_version = default_node;
+  curr_node->next_function_version = default_node;
+  
+  return dispatch_decl; 
+}
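
[Editor's sketch, not part of the patch.]  The chaining done by
build_dispatcher_for_function_versions above can be modelled in isolation.
The following is a minimal sketch under stated assumptions: struct ver_node
and chain_versions are hypothetical stand-ins for struct cgraph_node and
the loop in the patch, and only the next/prev links are modelled.  Unlike
the patch, the sketch guards against an empty version list and also sets
the default version's prev link.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgraph_node: only the two chain
   fields used by the patch are modelled.  */
struct ver_node
{
  const char *name;
  struct ver_node *next;
  struct ver_node *prev;
};

/* Link DISPATCHER, DEFAULT_VER and the N nodes in OTHERS.  Every
   non-default version is pushed right behind the dispatcher, so the
   non-default versions end up in reverse input order; the default
   version is then placed in front of them.  */
static void
chain_versions (struct ver_node *dispatcher, struct ver_node *default_ver,
                struct ver_node **others, size_t n)
{
  size_t i;

  assert (dispatcher != NULL && default_ver != NULL);

  for (i = 0; i < n; i++)
    {
      struct ver_node *node = others[i];
      if (dispatcher->next)
        {
          node->next = dispatcher->next;
          dispatcher->next->prev = node;
        }
      dispatcher->next = node;
      node->prev = dispatcher;
    }

  /* The default version should be the first node after the dispatcher.
     The NULL guard and the prev assignment are additions of this sketch.  */
  default_ver->next = dispatcher->next;
  if (dispatcher->next)
    dispatcher->next->prev = default_ver;
  dispatcher->next = default_ver;
  default_ver->prev = dispatcher;
}
```

With versions passed in the order {sse, avx}, the resulting chain is
dispatcher, default, avx, sse.  build_resolver_for_function_versions walks
the chain from node->next_function_version, so it sees the default version
first.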
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 187817)
+++ gcc/cgraphunit.c	(working copy)
@@ -940,7 +940,7 @@ cgraph_analyze_functions (void)
 
 	      for (edge = cnode->callees; edge; edge = edge->next_callee)
 		if (edge->callee->local.finalized)
-		  enqueue_node ((symtab_node)edge->callee);
+                 enqueue_node ((symtab_node)edge->callee);
 
 	      /* If decl is a clone of an abstract function, mark that abstract
 		 function so that we don't release its body. The DECL_INITIAL() of that
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,119 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+
+  int val = foo ();
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
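
[Editor's sketch, not part of the patch.]  The dispatch order that mv2.C
asserts can be restated as a small table walk.  This is only a model of
the expected result, assuming the priority order implied by the test
(avx2 highest, mmx lowest); the isa_bit values, mv2_order and expected_foo
are hypothetical names, and the real resolver queries the CPU with
__builtin_cpu_supports rather than taking a bit mask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits standing in for __builtin_cpu_supports.  */
enum isa_bit
{
  ISA_MMX    = 1 << 0,
  ISA_SSE    = 1 << 1,
  ISA_SSE2   = 1 << 2,
  ISA_SSE3   = 1 << 3,
  ISA_SSSE3  = 1 << 4,
  ISA_SSE4_1 = 1 << 5,
  ISA_SSE4_2 = 1 << 6,
  ISA_POPCNT = 1 << 7,
  ISA_AVX    = 1 << 8,
  ISA_AVX2   = 1 << 9,
  ISA_ALL    = (1 << 10) - 1
};

/* Versions from highest to lowest dispatch priority, paired with the
   value the matching foo version in mv2.C returns.  */
static const struct { unsigned bit; int ret; } mv2_order[] =
{
  { ISA_AVX2, 1 },   { ISA_AVX, 2 },    { ISA_POPCNT, 3 },
  { ISA_SSE4_2, 4 }, { ISA_SSE4_1, 5 }, { ISA_SSSE3, 6 },
  { ISA_SSE3, 7 },   { ISA_SSE2, 8 },   { ISA_SSE, 9 },
  { ISA_MMX, 10 }
};

/* Return the value foo () is expected to produce on a CPU whose
   features are given by the bit mask SUPPORTED.  The first supported
   entry in priority order wins; the default version returns 0.  */
static int
expected_foo (unsigned supported)
{
  size_t i;

  assert ((supported & ~(unsigned) ISA_ALL) == 0);

  for (i = 0; i < sizeof mv2_order / sizeof mv2_order[0]; i++)
    if (supported & mv2_order[i].bit)
      return mv2_order[i].ret;
  return 0;
}
```

For example, a CPU modelled as ISA_MMX | ISA_SSE | ISA_SSE2 selects the
sse2 version, which returns 8, matching the else-if ladder in main above.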
Index: gcc/testsuite/g++.dg/mv4.C
===================================================================
--- gcc/testsuite/g++.dg/mv4.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv4.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Test case to check if the compiler generates an error message
+   when the default version of a multiversioned function is absent
+   and its pointer is taken.  */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2" } */
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  return (p)();
+}
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,202 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and
+   check that dispatching still follows the order of priority.  */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("sse3")));
+int foo () __attribute__((target("sse2")));
+int foo () __attribute__((target("sse")));
+int foo () __attribute__((target("avx")));
+int foo () __attribute__((target("sse4.2")));
+int foo () __attribute__((target("popcnt")));
+int foo () __attribute__((target("sse4.1")));
+int foo () __attribute__((target("ssse3")));
+int foo () __attribute__((target("mmx")));
+int foo () __attribute__((target("avx2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 10);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 11);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 12);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 13);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 14);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 15);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 16);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 17);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 18);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 12;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 13;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 15;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 16;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 17;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 14;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 18;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 11;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
Index: gcc/testsuite/g++.dg/mv3.C
===================================================================
--- gcc/testsuite/g++.dg/mv3.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv3.C	(revision 0)
@@ -0,0 +1,37 @@
+/* Test case to check if a call to a multiversioned function
+   is replaced with a direct call to the particular version when
+   the most specialized version's target attributes match the
+   caller.  
+  
+   In this program, foo is multiversioned but there is no default
+   function.  This is an error if the call has to go through a
+   dispatcher.  However, the call to foo in bar can be replaced
+   with a direct call to the popcnt version of foo.  Hence, this
+   test should pass.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2" } */
+
+
+/* No default version: foo has only the sse and popcnt versions.  */
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target ("popcnt")))
+bar ()
+{
+  return foo ();
+}
+
+int main ()
+{
+  return bar ();
+}
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 187817)
+++ gcc/cp/class.c	(working copy)
@@ -1093,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (function_target_attribute (fn)
+		  || function_target_attribute (method))
+	      && has_different_version_attributes (fn, method))
+ 	    {
+	      mark_function_as_version (fn);
+	      mark_function_as_version (method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -6863,6 +6876,7 @@ resolve_address_of_overloaded_function (tree targe
   tree matches = NULL_TREE;
   tree fn;
   tree target_fn_type;
+  VEC (tree, heap) *fn_ver_vec = NULL;
 
   /* By the time we get here, we should be seeing only real
      pointer-to-member types, not the internal POINTER_TYPE to
@@ -6927,9 +6941,19 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
+	  /* See if there's a match.   For functions that are multi-versioned,
+	     all the versions match.  */
 	  if (same_type_p (target_fn_type, static_fn_type (fn)))
-	    matches = tree_cons (fn, NULL_TREE, matches);
+	    {
+	      matches = tree_cons (fn, NULL_TREE, matches);
+	      /* If versioned, push all possible versions into a vector.  */
+	      if (DECL_FUNCTION_VERSIONED (fn))
+		{
+		  if (fn_ver_vec == NULL)
+		   fn_ver_vec = VEC_alloc (tree, heap, 2);
+		  VEC_safe_push (tree, heap, fn_ver_vec, fn); 
+		}
+	    }
 	}
     }
 
@@ -7024,10 +7048,15 @@ resolve_address_of_overloaded_function (tree targe
       tree match;
 
       fn = TREE_PURPOSE (matches);
-      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-	if (!decls_match (fn, TREE_PURPOSE (match)))
-	  break;
 
+      /* For multi-versioned functions, more than one match is just fine.  */
+      if (DECL_FUNCTION_VERSIONED (fn))
+	match = NULL_TREE;
+      else
+        for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	  if (!decls_match (fn, TREE_PURPOSE (match)))
+	    break;
+
       if (match)
 	{
 	  if (flags & tf_error)
@@ -7090,6 +7119,28 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time. Also, the function address is kept
+     unique.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      tree dispatcher_decl;
+      gcc_assert (fn_ver_vec != NULL);
+      dispatcher_decl = build_dispatcher_for_function_versions (fn_ver_vec);
+      if (!dispatcher_decl)
+	{
+	  error_at (input_location, "Pointer to a multiversioned function"
+		    " without a default is not allowed");
+	  return error_mark_node;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      mark_used (fn);
+      VEC_free (tree, heap, fn_ver_vec);
+      fn = dispatcher_decl;
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 187817)
+++ gcc/cp/decl.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "cgraph.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -973,6 +974,19 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && has_different_version_attributes (newdecl, olddecl))
+	{
+	  /* One of the decls could be the default without the "target"
+	     attribute. Set it to be a versioned function here.  */
+	  mark_function_as_version (newdecl);
+	  mark_function_as_version (olddecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1504,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2280,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    DECL_FUNCTION_VERSIONED (newdecl) = 1;
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -14043,7 +14066,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 187817)
+++ gcc/cp/error.c	(working copy)
@@ -1534,8 +1534,15 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 187817)
+++ gcc/cp/semantics.c	(working copy)
@@ -3784,8 +3784,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 187817)
+++ gcc/cp/decl2.c	(working copy)
@@ -675,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !has_different_version_attributes (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 187817)
+++ gcc/cp/call.c	(working copy)
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "cgraph.h"
 
 /* The various kinds of conversion.  */
 
@@ -3905,6 +3906,16 @@ build_new_function_call (tree fn, VEC(tree,gc) **a
     {
       if (complain & tf_error)
 	{
+	  /* If the call is to a multiversioned function without
+	     a default version, overload resolution will fail.  */
+	  if (candidates
+	      && TREE_CODE (candidates->fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (candidates->fn))
+	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
+		      "Call to multiversioned function %<%D(%A)%> with"
+		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
+		      build_tree_list_vec (*args));
+
 	  if (!any_viable_p && candidates && ! candidates->next
 	      && (TREE_CODE (candidates->fn) == FUNCTION_DECL))
 	    return cp_build_function_call_vec (candidates->fn, args, complain);
@@ -6829,6 +6840,30 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For calls to a multi-versioned function, overload resolution
+     returns the function with the highest target priority, that is,
+     the version that will be checked for dispatching first.  If this
+     version is inlinable, a direct call can be made; otherwise it
+     should go through the dispatcher.  */
+
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+    {
+      tree dispatcher_decl = NULL;
+      struct cgraph_node *node = cgraph_get_node (fn);
+      if (node != NULL)
+        dispatcher_decl = cgraph_get_node (fn)->dispatcher_fndecl;
+      if (dispatcher_decl == NULL)
+	{
+	  error_at (input_location, "Call to multiversioned function"
+		    " without a default is not allowed");
+	  return NULL;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      gcc_assert (dispatcher_decl != NULL);
+      fn = dispatcher_decl;
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8086,6 +8121,29 @@ joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, make the version with
+     the most specialized target attributes (the highest priority) win.
+     This version will be checked for dispatching first.  If this version
+     can be inlined into the caller, the front-end will simply make a
+     direct call to this function.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      ||(TREE_CODE (cand2->fn) == FUNCTION_DECL
+	 && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {
+      /* Both functions must be marked versioned.  */
+      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
+		  && DECL_FUNCTION_VERSIONED (cand2->fn));
+
+      /* Always make the version with the higher priority, more
+	 specialized, win.  */
+      if (targetm.compare_versions (cand1->fn, cand2->fn) >= 0)
+	return 1;
+      else
+	return -1;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
@@ -8431,6 +8489,20 @@ tourney (struct z_candidate *candidates, tsubst_fl
   int fate;
   int champ_compared_to_predecessor = 0;
 
+  /* For multiversioned functions, aggregate all the versions here for
+     generating the dispatcher body later if necessary.  */
+
+  if (DECL_FUNCTION_VERSIONED (candidates->fn))
+    {
+      VEC (tree, heap) *fn_ver_vec = NULL;
+      struct z_candidate *ver = candidates;
+      fn_ver_vec = VEC_alloc (tree, heap, 2);
+      for (;ver; ver = ver->next)
+        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
+      build_dispatcher_for_function_versions (fn_ver_vec);
+      VEC_free (tree, heap, fn_ver_vec);
+    }
+
   /* Walk through the list once, comparing each current champ to the next
      candidate, knocking out a candidate or two with each comparison.  */
 
Index: gcc/cp/mangle.c
===================================================================
--- gcc/cp/mangle.c	(revision 187817)
+++ gcc/cp/mangle.c	(working copy)
@@ -1245,7 +1245,12 @@ write_unqualified_name (const tree decl)
     {
       MANGLE_TRACE_TREE ("local-source-name", decl);
       write_char ('L');
-      write_source_name (DECL_NAME (decl));
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_ASSEMBLER_NAME_SET_P (decl))
+	write_source_name (DECL_ASSEMBLER_NAME (decl));
+      else
+	write_source_name (DECL_NAME (decl));
       /* The default discriminator is 1, and that's all we ever use,
 	 so there's no code to output one here.  */
     }
@@ -1260,7 +1265,14 @@ write_unqualified_name (const tree decl)
                && LAMBDA_TYPE_P (type))
         write_closure_type_name (type);
       else
-        write_source_name (DECL_NAME (decl));
+	{
+	  if (TREE_CODE (decl) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (decl)
+	      && DECL_ASSEMBLER_NAME_SET_P (decl))
+	    write_source_name (DECL_ASSEMBLER_NAME (decl));
+	  else
+	    write_source_name (DECL_NAME (decl));
+	}
     }
 }
 
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 187817)
+++ gcc/Makefile.in	(working copy)
@@ -1297,6 +1297,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiversion.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3042,6 +3043,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
 cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 187817)
+++ gcc/config/i386/i386.c	(working copy)
@@ -27626,6 +27626,473 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check if any integer is zero:
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   For now, only one target argument ("arch=" or "<-m>xxx") is allowed.
+   It returns the priority value for this version decl.  If PREDICATE_LIST
+   is not NULL, it stores the list of cpu features that need to be checked
+   before dispatching this function.  */
+
+static unsigned int
+get_builtin_code_for_version (tree decl, tree *predicate_list)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+  unsigned int priority = 0;
+
+  /* Priority of i386 features, greater value is higher priority.  This is
+     used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+	
+      if (predicate_list && arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return 0;
+	}
+    
+      if (predicate_list)
+	{
+          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          /* For a C string literal the length includes the trailing NUL.  */
+          predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+          predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				       predicate_chain);
+	}
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      if (predicate_list)
+		{
+		  predicate_arg = build_string_literal (
+				  strlen (feature_list[i].name) + 1,
+				  feature_list[i].name);
+		  predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					       predicate_chain);
+		}
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > priority)
+		priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (predicate_list && i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return 0;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_list && predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return 0;
+    }
+  else if (predicate_list)
+    {
+      predicate_chain = nreverse (predicate_chain);
+      *predicate_list = predicate_chain;
+    }
+
+  return priority; 
+}
+
+/* This compares the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+
+static int
+ix86_compare_versions (tree decl1, tree decl2)
+{
+  unsigned int priority1 = 0;
+  unsigned int priority2 = 0;
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl1)) != NULL)
+    priority1 = get_builtin_code_for_version (decl1, NULL);
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl2)) != NULL)
+    priority2 = get_builtin_code_for_version (decl2, NULL);
+
+  return (int)priority1 - (int)priority2;
+}
+
+/* Comparator for qsort.  Orders function versions by descending
+   dispatch priority.  */
+
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
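+  const function_version_info *c1 = (const function_version_info *) v1;
+  const function_version_info *c2 = (const function_version_info *) v2;
+  /* Compare rather than subtract, so the result does not depend on
+     converting an unsigned difference to int.  */
+  return (c2->dispatch_priority > c1->dispatch_priority)
+         - (c2->dispatch_priority < c1->dispatch_priority);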
+}
+
+/* This is the target hook to generate the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS are the function choices for
+   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+ix86_dispatch_version (tree dispatch_decl,
+		       void *fndecls_p,
+		       basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    xmalloc ((num_versions - 1) * sizeof (struct _function_version_info));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __builtin_cpu_init here.  */
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      priority = get_builtin_code_for_version (version_decl,
+	 			               &predicate_chain);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
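+      /* Store entries densely.  Indexing by IX would leave an uninitialized
+         hole for every skipped version, which qsort would then read.  */
+      function_version_info [actual_versions].version_decl = version_decl;
+      function_version_info [actual_versions].predicate_chain
+        = predicate_chain;
+      function_version_info [actual_versions].dispatch_priority = priority;
+      actual_versions++;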
+    }
+
+  /* Sort the versions according to descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute, which one should be dispatched?  In future, allow the user
+     to specify a dispatch priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -39571,6 +40038,12 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_DISPATCH_VERSION
+#define TARGET_DISPATCH_VERSION ix86_dispatch_version
+
+#undef TARGET_COMPARE_VERSIONS
+#define TARGET_COMPARE_VERSIONS ix86_compare_versions
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
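The dispatch order built by ix86_dispatch_version can be illustrated outside the compiler. The following is only a minimal standalone sketch, not code from the patch: the feature bits, struct version_info, compare_desc and select_version are all hypothetical names, and a bit mask stands in for the __builtin_cpu_supports / __builtin_cpu_is predicate chain. It shows the idea of sorting versions by descending priority and taking the first one whose requirements are met, with the default version as the fallback.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical feature bits, for illustration only.  */
enum { F_SSE3 = 1 << 0, F_SSE4_2 = 1 << 1, F_AVX = 1 << 2 };

/* Hypothetical stand-in for the per-version record in the patch.  */
struct version_info
{
  const char *name;        /* Label of this version.  */
  unsigned int required;   /* Feature bits that must all be present.  */
  unsigned int priority;   /* A greater value is checked earlier.  */
};

/* qsort comparator: descending priority.  */
static int
compare_desc (const void *v1, const void *v2)
{
  const struct version_info *c1 = (const struct version_info *) v1;
  const struct version_info *c2 = (const struct version_info *) v2;
  return (c2->priority > c1->priority) - (c2->priority < c1->priority);
}

/* Sort VERSIONS in place, then return the first entry whose required
   features are all set in CPU_FEATURES.  Return NULL when no version
   qualifies, which corresponds to falling through to the default.  */
static const struct version_info *
select_version (struct version_info *versions, size_t n,
                unsigned int cpu_features)
{
  size_t i;

  qsort (versions, n, sizeof versions[0], compare_desc);
  for (i = 0; i < n; ++i)
    if ((versions[i].required & cpu_features) == versions[i].required)
      return &versions[i];
  return NULL;
}
```

With versions for sse3, sse4.2 and avx at priorities 1, 2 and 3, a CPU mask holding only SSE3 and SSE4.2 selects the sse4.2 entry, and an empty mask selects nothing, so the caller would use the default.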

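The way add_condition_to_bb combines several predicate results with MIN_EXPR and then tests the result against zero can also be sketched in plain C. This is an illustrative sketch under the assumption that every predicate returns a non-negative int (positive when the feature is present, zero otherwise); all_predicates_hold is a hypothetical helper, not part of the patch.

```c
#include <stddef.h>

/* Fold the N predicate results in RESULTS with a running minimum and
   report whether all of them were positive.  Assumes every result is
   non-negative, so the minimum is zero exactly when some predicate
   failed.  An empty list holds trivially, mirroring the unconditional
   return emitted for a NULL predicate chain.  */
static int
all_predicates_hold (const int *results, size_t n)
{
  size_t i;
  int and_var;

  if (n == 0)
    return 1;

  and_var = results[0];
  for (i = 1; i < n; ++i)
    and_var = results[i] < and_var ? results[i] : and_var;

  /* Same test as the GT_EXPR against integer_zero_node.  */
  return and_var > 0;
}
```

The minimum acts as a logical AND here only because of the non-negativity assumption; with results that may be negative, a real logical AND would be needed instead.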
