This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [google] Patch to support calling multi-versioned functions via new GCC builtin. (issue4440078)
Hi,
I had reverted this patch from google/main because it failed to
boot-strap with java enabled. I have fixed that bug. I have attached
the new patch. The only changes I made from the old version are in
file
gcc/mversn-dispatch.c :
=============================================================================================
347,349d346
< cfun->curr_properties |=
< (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
< PROP_ssa);
865c862
< e = make_edge (new_bb, EXIT_BLOCK_PTR, 0);
---
> e = make_edge (new_bb, EXIT_BLOCK_PTR, EDGE_FALLTHRU);
1507c1504
< gimple_call_set_lhs (feature_test_call, ssa_result_var);
---
> gimple_assign_set_lhs (feature_test_call, ssa_result_var);
1558c1555
< init_stmt = gimple_build_assign (tmp_var, build_zero_cst (return_type));
---
> init_stmt = gimple_build_assign (tmp_var, integer_zero_node);
1708c1705
< if (strstr (IDENTIFIER_POINTER (DECL_NAME (call_decl)),
---
> if (strstr (cgraph_node_name (cgraph_node (call_decl)),
==============================================================================================
I now checked that it boot-straps with Java. All tests pass. Is this
ok to submit to google/main ?
* tree-pass.h (pass_tree_convert_builtin_dispatch): New pass.
(pass_ipa_multiversion_dispatch): New pass.
* builtin-types.def (BT_PTR_FN_INT): New pointer type.
(BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for __builtin_dispatch.
* builtins.def (BUILT_IN_DISPATCH): New builtin to
support multi-version calls.
* mversn-dispatch.c: New file.
* timevar.de (TV_MVERSN_DISPATCH): New time var.
* common.opt (fclone-hot-version-paths): New flag.
* Makefile.in (mversn-dispatch.o): New rule.
* passes.c (init_optimization_passes): Add the new
multi-version and dispatch passes to the pass list.
* params.def (PARAM_NUMBER_OF_MVERSN_CLONES): Define.
(PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define.
* doc/invoke.texi (mversn-clone-depth): Document.
(num-mversn-clones): Document.
(fclone-hot-version-paths): Document.
* testsuite/gcc.dg/mversn7.c: New test.
* testsuite/gcc.dg/mversn4.c: New test.
* testsuite/gcc.dg/mversn4.h: New test.
* testsuite/gcc.dg/mversn4a.c: New test.
* testsuite/gcc.dg/torture/mversn1.c: New test.
* testsuite/gcc.dg/mversn2.c: New test.
* testsuite/gcc.dg/mversn6.c: New test.
* testsuite/gcc.dg/mversn3.c: New test.
* testsuite/g++.dg/mversn8.C: New test.
* testsuite/g++.dg/mversn10a.C: New test.
* testsuite/g++.dg/mversn14a.C: New test.
* testsuite/g++.dg/tree-prof/mversn13.C: New test.
* testsuite/g++.dg/tree-prof/mversn15.C: New test.
* testsuite/g++.dg/tree-prof/mversn15a.C: New test.
* testsuite/g++.dg/mversn9.C: New test.
* testsuite/g++.dg/mversn10.C: New test.
* testsuite/g++.dg/mversn12.C: New test.
* testsuite/g++.dg/mversn14.C: New test.
* testsuite/g++.dg/mversn16.C: New test.
* testsuite/g++.dg/torture/mversn11.C: New test.
* testsuite/g++.dg/torture/mversn5.C: New test.
* testsuite/g++.dg/torture/mversn5.h: New test.
* testsuite/g++.dg/torture/mversn5a.C: New test.
* c-family/c-common.c (handle_version_selector_attribute): New function.
(c_common_attribute_table): New attribute "version_selector".
Thanks,
-Sri.
On Mon, May 2, 2011 at 1:38 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Thanks. I committed this patch.
>
> -Sri.
>
>
> On Mon, May 2, 2011 at 12:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>> Ok. There may be more correctness related comments -- but those can be
>> addressed when available. For trunk, you need to address issues such
>> as multi-way dispatch.
>>
>> Thanks,
>>
>> David
>>
>> On Mon, May 2, 2011 at 12:11 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi,
>>>
>>> ?I want to submit this patch to google/main to make sure it is
>>> available for our internal use at Google in order to materialize some
>>> optimization opportunities. Let us continue this dicussion as I make
>>> changes and submit this for review for trunk.
>>>
>>> Thanks,
>>> -Sri.
>>>
>>>
>>> On Mon, May 2, 2011 at 9:41 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>> On Mon, May 2, 2011 at 2:11 AM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Fri, Apr 29, 2011 at 6:23 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>> Here is the background for this feature:
>>>>>>
>>>>>> 1) People relies on function multi-version to explore hw features and
>>>>>> squeeze performance, but there is no standard ways of doing so, either
>>>>>> a) using indirect function calls with function pointers set at program
>>>>>> initialization; b) using manual dispatch at each callsite; b) using
>>>>>> features like IFUNC. ?The dispatch mechanism needs to be promoted to
>>>>>> the language level and becomes the first class citizen;
>>>>>
>>>>> You are not doing that, you are inventing a new (crude) GCC extension.
>>>>
>>>> To capture the high level semantics and prevent user from lowering the
>>>> dispatch calls into forms compiler can not recognize, language
>>>> extension is the way to go.
>>>>
>>>>>
>>>>
>>>>>> See above. Multi-way dispatch can be added similarly.
>>>>>
>>>>> Not with the specified syntax. ?So you'd end up with _two_
>>>>> language extensions. ?That's bad (and unacceptable, IMHO).
>>>>
>>>> This part can be improved.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> That's a nice idea, but why not simply process all functions with
>>>>>>> a const attribute and no arguments this way? ?IMHO
>>>>>>>
>>>>>>> int testsse (void) __attribute__((const));
>>>>>>>
>>>>>>> is as good and avoids the new attribute completely. ?The pass would
>>>>>>> also benefit other uses of this idiom (giving a way to have global
>>>>>>> dynamic initializers in C). ?The functions may not be strictly 'const'
>>>>>>> in a way the compiler autodetects this attribute but it presents
>>>>>>> exactly the promises to the compiler that allow this optimization.
>>>>>>>
>>>>>>> Thus, please split this piece out into a separate patch and use
>>>>>>> the const attribute.
>>>>>>
>>>>>>
>>>>>> Sounds good.
>>>>>>
>>>>
>>>>>>
>>>>>> 1) the function selection may happen in a different function;
>>>>>
>>>>> Well, sure. ?I propose to have the above as lowered form. ?If people
>>>>> deliberately obfuscate code it's their fault. ?Your scheme simply
>>>>> makes that obfuscation impossible (or less likely possible). ?So
>>>>> 1) is not a valid argument.
>>>>
>>>> Lowering too early may defeat the purpose because
>>>>
>>>> 1) the desired optimization may not happen subject to many compiler
>>>> heuristic changes;
>>>> 2) it has other side effects such as wrong estimation of function size
>>>> which may impact inlining
>>>> 3) it limits the lowering into one form which may not be ideal ?--
>>>> with builtin_dispatch, after hoisting optimization, the lowering can
>>>> use more efficient IFUNC scheme, for instance.
>>>>
>>>>>
>>>>>
>>>>> My point is that such optimization is completely independent of
>>>>> that dispatch thing. ?The above could as well be a selection to
>>>>> use different input arrays, one causing alias analysis issues
>>>>> and one not.
>>>>>
>>>>> Thus, a __builtin_dispatch centric optimization pass is the wrong
>>>>> way to go.
>>>>
>>>> I agree that many things can implemented in different ways, but a high
>>>> level standard builtin_dispatch mechanism doing hoisting
>>>> interprocedcurally is cleaner and simpler and targeted for function
>>>> multi-versioning -- it does not depend on/rely on later phase's
>>>> heuristic tunings to make the right things to happen. Function MV
>>>> deserves this level of treatment as it will become more and more
>>>> important for some users (e.g., Google).
>>>>
>>>>>
>>>>> Now, with FDO I'd expect the foo is inlined into bar (because foo
>>>>> is deemed hot),
>>>>
>>>> That is a myth -- the truth is that there are other heuristics which
>>>> can prevent this from happening.
>>>>
>>>>> then you only need to deal with loop unswitching,
>>>>> which should be easy to drive from FDO.
>>>>
>>>> Same here -- the loop body may not be well formed/recognized. The loop
>>>> nests may not be perfectly nested, or other unswitching heuristics may
>>>> block it from happening. ?This is the common problem form many other
>>>> things that get lowered too early. It is cleaner to make the high
>>>> level transformation first in IPA, and let unswitching dealing with
>>>> intra-procedural optimization.
>>>>
>>>> Thanks,
>>>>
>>>> David
>>>>
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Note that we already have special FDO support for indirect to direct
>>>>>>> call promotion, so that might work already.
>>>>>>>
>>>>>>> Thus, with all this, __builtin_dispatch would be more like syntactic
>>>>>>> sugar to avoid writing above switch statement yourself. ?If you restrict
>>>>>>> that sugar to a constant number of candidates it can be replaced by
>>>>>>> a macro (or several ones, specialized for the number of candidates).
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> ?For the call graph that looks like this before cloning :
>>>>>>>>
>>>>>>>> func_cold ----> func_hot ----> findOnes ----> IsPopCntSupported ----> popcnt
>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?|
>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-----> no_popcnt
>>>>>>>>
>>>>>>>> ?where popcnt and no_popcnt are the multi-versioned functions, then after cloning :
>>>>>>>>
>>>>>>>> func_cold ---->IsPopCntSupported ----> func_hot_clone0 ----> findOnes_clone0 ----> popcnt
>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?|
>>>>>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ----> func_hot_clone1 ----> findOnes_clone1 ----> no_popcnt
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ?Flags : -fclone-hot-version-paths does function unswitching via cloning.
>>>>>>>> ? ? ? ? ?--param=num-mversn-clones=<num> allows to specify the number of
>>>>>>>> ? ? ? ? ?functions that should be cloned.
>>>>>>>> ? ? ? ? ?--param=mversn-clone-depth=<num> allows to specify the length of
>>>>>>>> ? ? ? ? ?the call graph path that should be cloned. ?num = 0 implies only
>>>>>>>> ? ? ? ? ?leaf node that contains the __builtin_dispatch statement must be
>>>>>>>> ? ? ? ? ?cloned.
>>>>>>>> ============================================================================================
>>>>>>>>
>>>>>>>> Google ref 45947, 45986
>>>>>>>>
>>>>>>>> 2011-04-28 ?Sriraman Tallam ?<tmsriram@google.com>
>>>>>>>>
>>>>>>>> ? ? ? ?* c-family/c-common.c ? (revision 173122) (handle_version_selector_attribute): New function.
>>>>>>>> ? ? ? ?(c_common_attribute_table): New attribute "version_selector".
>>>>>>>> ? ? ? ?* tree-pass.h ? (revision 173122) (pass_tree_convert_builtin_dispatch): New pass.
>>>>>>>> ? ? ? ?(pass_ipa_multiversion_dispatch): New pass.
>>>>>>>> ? ? ? ?* testsuite/gcc.dg/mversn7.c ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/gcc.dg/mversn4.c ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/gcc.dg/mversn4.h ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/gcc.dg/mversn4a.c ? (revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/gcc.dg/torture/mversn1.c ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/gcc.dg/mversn2.c ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/gcc.dg/mversn6.c ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/gcc.dg/mversn3.c ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/mversn8.C ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/mversn10a.C ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/mversn14a.C ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/tree-prof/mversn13.C (revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/tree-prof/mversn15.C (revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/tree-prof/mversn15a.C ? ? ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/mversn9.C ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/mversn10.C ? (revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/mversn12.C ? (revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/mversn14.C ? (revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/mversn16.C ? (revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/torture/mversn11.C ? (revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/torture/mversn5.C ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/torture/mversn5.h ? ?(revision 0): New test case.
>>>>>>>> ? ? ? ?* testsuite/g++.dg/torture/mversn5a.C ? (revision 0): New test case.
>>>>>>>> ? ? ? ?* builtin-types.def ? ? (revision 173122) (BT_PTR_FN_INT): New pointer type.
>>>>>>>> ? ? ? ?(BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for __builtin_dispatch.
>>>>>>>> ? ? ? ?* builtins.def ?(revision 173122) (BUILT_IN_DISPATCH): New builtin to
>>>>>>>> ? ? ? ?support multi-version calls.
>>>>>>>> ? ? ? ?* mversn-dispatch.c ? ? (revision 0): New file.
>>>>>>>> ? ? ? ?* timevar.def ? (revision 173122) (TV_MVERSN_DISPATCH): New time var.
>>>>>>>> ? ? ? ?* common.opt ? ?(revision 173122) (fclone-hot-version-paths): New flag.
>>>>>>>> ? ? ? ?* Makefile.in ? (revision 173122) (mversn-dispatch.o): New rule.
>>>>>>>> ? ? ? ?* passes.c ? ? ?(revision 173122) (init_optimization_passes): Add the new
>>>>>>>> ? ? ? ?multi-version and dispatch passes to the pass list.
>>>>>>>> ? ? ? ?* params.def ? ?(revision 173122) (PARAM_NUMBER_OF_MVERSN_CLONES): Define.
>>>>>>>> ? ? ? ?(PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define.
>>>>>>>> ? ? ? ?* doc/invoke.texi ? ? ? (revision 173122) (mversn-clone-depth): Document.
>>>>>>>> ? ? ? ?(num-mversn-clones): Document.
>>>>>>>> ? ? ? ?(fclone-hot-version-paths): Document.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 173381)
+++ doc/invoke.texi (working copy)
@@ -343,7 +343,8 @@ Objective-C and Objective-C++ Dialects}.
-falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
-fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
-fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
--fcheck-data-deps -fcombine-stack-adjustments -fconserve-stack @gol
+-fcheck-data-deps -fclone-hot-version-paths @gol
+-fcombine-stack-adjustments -fconserve-stack @gol
-fcompare-elim -fcprop-registers -fcrossjumping @gol
-fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
-fcx-limited-range @gol
@@ -8283,6 +8284,10 @@ used in one place: in @file{reorg.c}, instead of g
branch is most likely to take, the @samp{REG_BR_PROB} values are used to
exactly determine which path is taken more often.
+@item -fclone-hot-version-paths
+When multi-version calls are made using @samp{__builtin_dispatch}, this flag
+enables cloning and hoisting of hot multiversioned paths.
+
@item -fprofile-values
@opindex fprofile-values
If combined with @option{-fprofile-arcs}, it adds code so that some
@@ -8544,6 +8549,16 @@ by the compiler will be investigated. To those fu
be applied.
The default value is 40.
+@item mversn-clone-depth
+When using @option{-fclone-hot-version-paths}, hot function paths are multi-
+versioned via cloning. This parameter specifies the maximum length of the
+call graph path that can be cloned. The default value is 2.
+
+@item num-mversn-clones
+When using @option{-fclone-hot-version-paths}, hot function paths are multi-
+versioned via cloning. This parameter specifies the maximum number of
+functions that can be cloned. The default value is 10.
+
@item large-function-insns
The limit specifying really large functions. For functions larger than this
limit after inlining, inlining is constrained by
Index: c-family/c-common.c
===================================================================
--- c-family/c-common.c (revision 173381)
+++ c-family/c-common.c (working copy)
@@ -315,6 +315,7 @@ static tree check_case_value (tree);
static bool check_case_bounds (tree, tree, tree *, tree *);
static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
+static tree handle_version_selector_attribute (tree *, tree, tree, int, bool *);
static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
static tree handle_common_attribute (tree *, tree, tree, int, bool *);
static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
@@ -603,6 +604,8 @@ const unsigned int num_c_common_reswords =
const struct attribute_spec c_common_attribute_table[] =
{
/* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler } */
+ { "version_selector", 0, 0, true, false, false,
+ handle_version_selector_attribute },
{ "packed", 0, 0, false, false, false,
handle_packed_attribute },
{ "nocommon", 0, 0, true, false, false,
@@ -5786,6 +5789,37 @@ handle_packed_attribute (tree *node, tree name, tr
return NULL_TREE;
}
+/* Handle a "version_selector attribute".
+ Functions are marked with attribute "version_selector" only if
+ they are run-time constants. Example of such functions would
+ be those that test if a particular feature is available on a
+ particular architecture. Such function must return a positive
+ integer. For two-way functions, those that test if a feature
+ is present or not must return 1 or 0 respectively. */
+
+static tree
+handle_version_selector_attribute (tree *node, tree name,
+ tree ARG_UNUSED (args),
+ int ARG_UNUSED (flags),
+ bool *no_add_attrs)
+{
+ if (TREE_CODE (*node) == FUNCTION_DECL)
+ {
+ /* Check that the return type is integer. */
+ gcc_assert (TREE_CODE (TREE_TYPE (TREE_TYPE (*node)))
+ == INTEGER_TYPE);
+ if (dump_file)
+ fprintf (dump_file, "%s is a version_selector function\n",
+ IDENTIFIER_POINTER (DECL_NAME (*node)));
+ }
+ else
+ {
+ warning (OPT_Wattributes, "%qE attribute ignored", name);
+ *no_add_attrs = true;
+ }
+ return NULL_TREE;
+}
+
/* Handle a "nocommon" attribute; arguments as in
struct attribute_spec.handler. */
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 173381)
+++ tree-pass.h (working copy)
@@ -448,6 +448,7 @@ extern struct gimple_opt_pass pass_warn_unused_res
extern struct gimple_opt_pass pass_split_functions;
extern struct gimple_opt_pass pass_feedback_split_functions;
extern struct gimple_opt_pass pass_threadsafe_analyze;
+extern struct gimple_opt_pass pass_tree_convert_builtin_dispatch;
/* IPA Passes */
extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
@@ -472,6 +473,7 @@ extern struct ipa_opt_pass_d pass_ipa_lto_wpa_fixu
extern struct ipa_opt_pass_d pass_ipa_lto_finish_out;
extern struct ipa_opt_pass_d pass_ipa_profile;
extern struct ipa_opt_pass_d pass_ipa_cdtor_merge;
+extern struct simple_ipa_opt_pass pass_ipa_multiversion_dispatch;
extern struct gimple_opt_pass pass_all_optimizations;
extern struct gimple_opt_pass pass_cleanup_cfg_post_optimizing;
Index: testsuite/gcc.dg/mversn7.c
===================================================================
--- testsuite/gcc.dg/mversn7.c (revision 0)
+++ testsuite/gcc.dg/mversn7.c (revision 0)
@@ -0,0 +1,47 @@
+/* This test checks if cloning and dispatching works correctly with
+ a motivating example. The function problem calculates the sum of
+ numbers from 1 to 10 in two different ways. Hence, after cloning
+ both clones should return 55, which means main returns 0 if function
+ "problem" gets inlined.
+ This example also shows the benefits of function
+ unswitching. Without cloning, the loop will be done. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
+
+int foo (int i)
+{
+ return i;
+}
+
+int bar (int i)
+{
+ return (11 - i);
+}
+
+/* This calculates the sum of numbers from 1 to 10 in 2 different ways. */
+int __attribute__ ((hot))
+problem ()
+{
+ int ret = 0;
+ int j = 1;
+ for (j = 1; j<=10; j++)
+ ret += __builtin_dispatch (featureTest, (void *)foo, (void *)bar, j);
+ return ret;
+}
+
+int __attribute__ ((hot))
+main ()
+{
+ return problem() - 55;
+}
+
+
+/* { dg-final { scan-tree-dump "return 55" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/mversn4.c
===================================================================
--- testsuite/gcc.dg/mversn4.c (revision 0)
+++ testsuite/gcc.dg/mversn4.c (revision 0)
@@ -0,0 +1,29 @@
+/* Check that static feature test functions are correctly handled. This
+ test case also needs mversn4a.c. extern_func calls foo and returns 0.*/
+
+/* { dg-do run } */
+/* { dg-additional-sources "mversn4a.c" } */
+/* { dg-options "-O2" } */
+
+#include "mversn4.h"
+
+extern int extern_func ();
+
+int foo ()
+{
+ return 0;
+}
+
+int bar ()
+{
+ return 1;
+}
+
+int __attribute__ ((cold))
+main ()
+{
+ int a = 1, b = 1;
+ a = extern_func ();
+ b = __builtin_dispatch (featureTest, (void *)bar, (void *) foo);
+ return a * b;
+}
Index: testsuite/gcc.dg/mversn4.h
===================================================================
--- testsuite/gcc.dg/mversn4.h (revision 0)
+++ testsuite/gcc.dg/mversn4.h (revision 0)
@@ -0,0 +1,5 @@
+static int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
Index: testsuite/gcc.dg/mversn4a.c
===================================================================
--- testsuite/gcc.dg/mversn4a.c (revision 0)
+++ testsuite/gcc.dg/mversn4a.c (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "mversn4.h"
+
+extern int foo ();
+extern int bar ();
+
+
+int extern_func ()
+{
+ return __builtin_dispatch (featureTest, (void *)foo, (void *) bar);
+}
Index: testsuite/gcc.dg/torture/mversn1.c
===================================================================
--- testsuite/gcc.dg/torture/mversn1.c (revision 0)
+++ testsuite/gcc.dg/torture/mversn1.c (revision 0)
@@ -0,0 +1,31 @@
+/* Check that __builtin_dispatch gets converted and the executable runs fine.
+ Since featureTest () returns 1, foo should be called and return 0. Cloning
+ and Hoisting is not turned on here. */
+/* { dg-do run } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
+
+int __attribute__ ((cold))
+foo ()
+{
+ return 0;
+}
+
+int __attribute__ ((cold))
+bar ()
+{
+ return 1;
+}
+
+int __attribute__ ((cold))
+main ()
+{
+ int a, b;
+ a = __builtin_dispatch (featureTest, (void *)foo, (void *) bar);
+ b = __builtin_dispatch (featureTest, (void *)bar, (void *) foo);
+ return a * b;
+}
Index: testsuite/gcc.dg/mversn2.c
===================================================================
--- testsuite/gcc.dg/mversn2.c (revision 0)
+++ testsuite/gcc.dg/mversn2.c (revision 0)
@@ -0,0 +1,47 @@
+/* This checks if cloning works correctly. Since dispatch and fn1 are hot, they
+ should be cloned. main should not be cloned.*/
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
+
+int __attribute__ ((cold))
+foo ()
+{
+ return 0;
+}
+
+int __attribute__ ((cold))
+bar ()
+{
+ return 1;
+}
+
+int __attribute__ ((hot))
+dispatch ()
+{
+ return __builtin_dispatch (featureTest, (void *)foo, (void *) bar);
+}
+int __attribute__ ((hot))
+fn1 ()
+{
+ return dispatch ();
+}
+
+int __attribute__ ((cold))
+main ()
+{
+ return fn1 ();
+}
+
+/* { dg-final { scan-tree-dump "fn1_clone_1" "optimized" } } */
+/* { dg-final { scan-tree-dump "dispatch_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump "dispatch_clone_1" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "main_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "main_clone_1" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/mversn6.c
===================================================================
--- testsuite/gcc.dg/mversn6.c (revision 0)
+++ testsuite/gcc.dg/mversn6.c (revision 0)
@@ -0,0 +1,29 @@
+/* Check that __builtin_dispatch gets converted and the executable runs fine.
+ when featureTest () is not marked with "version_selector". foo should be
+ called and return 0. Cloning and Hoisting is also done. */
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+int
+featureTest ()
+{
+ return 1;
+}
+
+int __attribute__ ((cold))
+foo ()
+{
+ return 0;
+}
+
+int __attribute__ ((cold))
+bar ()
+{
+ return 1;
+}
+
+int __attribute__ ((cold))
+main ()
+{
+ return __builtin_dispatch (featureTest, (void *)foo, (void *) bar);
+}
Index: testsuite/gcc.dg/mversn3.c
===================================================================
--- testsuite/gcc.dg/mversn3.c (revision 0)
+++ testsuite/gcc.dg/mversn3.c (revision 0)
@@ -0,0 +1,30 @@
+/* Simple check if parameters are passed correctly. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
+
+int __attribute__ ((cold))
+foo (int a)
+{
+ if (a == 1729)
+ return 0;
+ return 1;
+}
+
+int __attribute__ ((cold))
+bar (int a)
+{
+ return 1;
+}
+
+int __attribute__ ((cold))
+main ()
+{
+ return __builtin_dispatch (featureTest, (void *)foo, (void *) bar, 1729);
+}
Index: testsuite/g++.dg/mversn8.C
===================================================================
--- testsuite/g++.dg/mversn8.C (revision 0)
+++ testsuite/g++.dg/mversn8.C (revision 0)
@@ -0,0 +1,45 @@
+/* Call the caller of __builtin_dispatch indirectly. Specify the
+ feature test function as a function pointer. Make sure cloning
+ still happens. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
+
+int
+foo ()
+{
+ return 0;
+}
+
+int
+bar ()
+{
+ return 1;
+}
+
+int __attribute__ ((hot))
+dispatch ()
+{
+ int (*funcp)() = featureTest;
+ int ret = __builtin_dispatch (funcp, (void *)foo, (void *)bar);
+ return ret;
+}
+
+int __attribute__ ((hot))
+main (int argc, char **argv)
+{
+ int (*ptr)() = dispatch;
+ return (*ptr)();
+}
+
+/* { dg-final { scan-tree-dump "dispatchv_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump "dispatchv_clone_1" "optimized" } } */
+/* { dg-final { scan-tree-dump "main_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump "main_clone_1" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/g++.dg/mversn10a.C
===================================================================
--- testsuite/g++.dg/mversn10a.C (revision 0)
+++ testsuite/g++.dg/mversn10a.C (revision 0)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+static int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
+
+static int foo ()
+{
+ return 0;
+}
+
+static int bar ()
+{
+ return 1;
+}
+
+static int __attribute__ ((noinline))
+dispatch ()
+{
+ __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+ return 0;
+}
+
+int
+fn2 ()
+{
+ for (int i = 0; i < 1000; i++)
+ dispatch ();
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "dispatchv_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump "dispatchv_clone_1" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/g++.dg/mversn14a.C
===================================================================
--- testsuite/g++.dg/mversn14a.C (revision 0)
+++ testsuite/g++.dg/mversn14a.C (revision 0)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
Index: testsuite/g++.dg/tree-prof/mversn13.C
===================================================================
--- testsuite/g++.dg/tree-prof/mversn13.C (revision 0)
+++ testsuite/g++.dg/tree-prof/mversn13.C (revision 0)
@@ -0,0 +1,37 @@
+/* Make sure -fprofile-generate and -fprofile-use work fine. */
+
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+static int glob = 0;
+int __attribute__ ((version_selector))
+featureTest ()
+{
+ return glob;
+}
+
+int bar (int i)
+{
+ if (i > 500)
+ return 2 * i;
+ return 3 * i;
+}
+
+int foo (int i)
+{
+ bar (i);
+}
+
+int
+dispatch ()
+{
+ int ret = 0;
+ for (int i = 0; i < 1000; i++)
+ ret += __builtin_dispatch (featureTest, (void *)foo, (void *)bar, i);
+ return ret;
+}
+
+int main ()
+{
+ int val = dispatch ();
+ return val > 10000000;
+}
Index: testsuite/g++.dg/tree-prof/mversn15.C
===================================================================
--- testsuite/g++.dg/tree-prof/mversn15.C (revision 0)
+++ testsuite/g++.dg/tree-prof/mversn15.C (revision 0)
@@ -0,0 +1,23 @@
+/* Make sure LIPO works correctly. dispatch is defined in mversn15a.C. It either
+ calls foo or bar and both returns 1. So, the value of ret is always 1000.
+ After cross-module inlining, main must return 0. */
+
+/* { dg-additional-sources "mversn15a.C" } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fripa -fdump-tree-optimized" } */
+
+extern int foo ();
+extern int bar ();
+extern int dispatch ();
+
+int
+main ()
+{
+ int ret = 0;
+ for (int i = 1; i <= 1000; i++)
+ ret += dispatch ();
+ return ret - 1000;
+}
+
+/* { dg-final-use { scan-tree-dump "main_clone" "optimized" } } */
+/* { dg-final-use { scan-tree-dump "return 0" "optimized" } } */
+/* { dg-final-use { cleanup-tree-dump "optimized" } } */
Index: testsuite/g++.dg/tree-prof/mversn15a.C
===================================================================
--- testsuite/g++.dg/tree-prof/mversn15a.C (revision 0)
+++ testsuite/g++.dg/tree-prof/mversn15a.C (revision 0)
@@ -0,0 +1,26 @@
+/* { dg-options "-O2 -fclone-hot-version-paths -fripa" } */
+/* { dg-additional-sources "mversn15.C" } */
+
+#include <stdio.h>
+
+inline int
+ __attribute__ ((version_selector))
+featureTest()
+{
+ return 1;
+}
+
+int foo ()
+{
+ return 1;
+}
+
+int bar ()
+{
+ return 1;
+}
+
+int dispatch ()
+{
+ return __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+}
Index: testsuite/g++.dg/mversn9.C
===================================================================
--- testsuite/g++.dg/mversn9.C (revision 0)
+++ testsuite/g++.dg/mversn9.C (revision 0)
@@ -0,0 +1,30 @@
+/* Two __builtin_dispatch calls in different basic blocks. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
+
+int
+foo ()
+{
+ return 0;
+}
+
+int
+bar ()
+{
+ return 1;
+}
+
+int main (int argc, char **argv)
+{
+ if (argc)
+ return __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+ else
+ return (__builtin_dispatch (featureTest, (void *)bar, (void *)foo) - 1);
+}
Index: testsuite/g++.dg/mversn10.C
===================================================================
--- testsuite/g++.dg/mversn10.C (revision 0)
+++ testsuite/g++.dg/mversn10.C (revision 0)
@@ -0,0 +1,54 @@
+/* The purpose of this test is to check if the attributes of the clone
+ correctly shadow the parent function. In this case, function "dispatch"
+ is cloned but is a static function. The other file, "mversn10a.C" also
+ has a dispatch function that is cloned. So, if the attributes of the
+ clone are not correct the linker will complain. */
+
+/* { dg-do run } */
+/* { dg-additional-sources "mversn10a.C" } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+static int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
+
+static int foo ()
+{
+ return 0;
+}
+
+static int bar ()
+{
+ return 1;
+}
+
+static int __attribute__ ((noinline))
+dispatch ()
+{
+ __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+ return 0;
+}
+
+int
+fn1 ()
+{
+ for (int i = 0; i < 1000; i++)
+ dispatch ();
+ return 0;
+}
+
+extern int fn2 ();
+
+int __attribute__ ((hot))
+main ()
+{
+ fn1 ();
+ fn2 ();
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "dispatchv_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump "dispatchv_clone_1" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/g++.dg/mversn12.C
===================================================================
--- testsuite/g++.dg/mversn12.C (revision 0)
+++ testsuite/g++.dg/mversn12.C (revision 0)
@@ -0,0 +1,43 @@
+/* Check if everything is fine if the versioned functions are static
+ member functions. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+int __attribute__ ((version_selector))
+featureTest()
+{
+ return 1;;
+}
+
+class TestClass
+{
+ public:
+ static int foo ()
+ {
+ return 0;
+ }
+
+ static int bar ()
+ {
+ return 1;
+ }
+
+ int dispatch ()
+ {
+ int a = __builtin_dispatch (featureTest,
+ (void*)(&TestClass::foo),
+ (void*)(&TestClass::bar));
+ int b = __builtin_dispatch (featureTest,
+ (void*)(&TestClass::bar),
+ (void*)(&TestClass::foo));
+ return a * b;
+ }
+};
+
+int
+main ()
+{
+ TestClass c1;
+ return c1.dispatch ();
+}
Index: testsuite/g++.dg/mversn14.C
===================================================================
--- testsuite/g++.dg/mversn14.C (revision 0)
+++ testsuite/g++.dg/mversn14.C (revision 0)
@@ -0,0 +1,26 @@
+/* Check if everything is fine when the feature test body is in a different
+ module that does not have a __builtin-dispatch". Requires mversn14a.C. */
+
+/* { dg-do run } */
+/* { dg-additional-sources "mversn14a.C" } */
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+int __attribute__ ((version_selector))
+featureTest ();
+
+int
+foo ()
+{
+ return 0;
+}
+
+int
+bar ()
+{
+ return 1;
+}
+
+int main ()
+{
+ return __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+}
Index: testsuite/g++.dg/mversn16.C
===================================================================
--- testsuite/g++.dg/mversn16.C (revision 0)
+++ testsuite/g++.dg/mversn16.C (revision 0)
@@ -0,0 +1,39 @@
+/* dispatch is cloned. Make sure the double is returned to main correctly. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+#include "assert.h"
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
+
+static int foo ()
+{
+ return 0;
+}
+
+static int bar ()
+{
+ return 1;
+}
+
+double
+dispatch ()
+{
+ __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+ return 2.536;
+}
+
+int main ()
+{
+ double d = dispatch ();
+ assert (d == 2.536);
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "dispatchv_clone" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/g++.dg/torture/mversn11.C
===================================================================
--- testsuite/g++.dg/torture/mversn11.C (revision 0)
+++ testsuite/g++.dg/torture/mversn11.C (revision 0)
@@ -0,0 +1,40 @@
+/* Check if parameters are passed correctly to the versioned function. */
+
+/* { dg-do run } */
+
+#include <stdio.h>
+#include <assert.h>
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
+
+int __attribute__ ((noinline))
+foo (int n1, double n2, char n3)
+{
+ assert (n1 == 10);
+ assert (n2 == 20.0);
+ assert (n3 == 'c');
+ return n1;
+}
+
+
+int __attribute__ ((noinline))
+bar (int n1, double n2, char n3)
+{
+ assert (n1 == 10);
+ assert (n2 == 20.0);
+ assert (n3 == 'c');
+ return (n1 - 10);
+}
+
+int main ()
+{
+ int a = __builtin_dispatch (featureTest, (void *)foo, (void *)bar,
+ 10, 20.0, 'c');
+ int b = __builtin_dispatch (featureTest, (void *)bar, (void *)foo,
+ 10, 20.0, 'c');
+ return a * b;
+}
Index: testsuite/g++.dg/torture/mversn5.C
===================================================================
--- testsuite/g++.dg/torture/mversn5.C (revision 0)
+++ testsuite/g++.dg/torture/mversn5.C (revision 0)
@@ -0,0 +1,28 @@
+/* Check that global inline feature test functions are correctly handled. This
+ test case also needs mversn5a.c. extern_func calls foo and returns 0.*/
+
+/* { dg-do run } */
+/* { dg-additional-sources "mversn5a.C" } */
+
+#include "mversn5.h"
+
+extern int extern_func ();
+
+int foo ()
+{
+ return 0;
+}
+
+int bar ()
+{
+ return 1;
+}
+
+int __attribute__ ((cold))
+main ()
+{
+ int a = 1, b = 1;
+ a = extern_func ();
+ b = __builtin_dispatch (featureTest, (void *)bar, (void *) foo);
+ return a * b;
+}
Index: testsuite/g++.dg/torture/mversn5.h
===================================================================
--- testsuite/g++.dg/torture/mversn5.h (revision 0)
+++ testsuite/g++.dg/torture/mversn5.h (revision 0)
@@ -0,0 +1,5 @@
+inline int __attribute__ ((version_selector))
+featureTest ()
+{
+ return 1;
+}
Index: testsuite/g++.dg/torture/mversn5a.C
===================================================================
--- testsuite/g++.dg/torture/mversn5a.C (revision 0)
+++ testsuite/g++.dg/torture/mversn5a.C (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+#include "mversn5.h"
+
+extern int foo ();
+extern int bar ();
+
+
+int extern_func ()
+{
+ return __builtin_dispatch (featureTest, (void *)foo, (void *) bar);
+}
Index: builtin-types.def
===================================================================
--- builtin-types.def (revision 173381)
+++ builtin-types.def (working copy)
@@ -475,3 +475,7 @@ DEF_FUNCTION_TYPE_VAR_5 (BT_FN_INT_INT_INT_INT_INT
DEF_POINTER_TYPE (BT_PTR_FN_VOID_VAR, BT_FN_VOID_VAR)
DEF_FUNCTION_TYPE_3 (BT_FN_PTR_PTR_FN_VOID_VAR_PTR_SIZE,
BT_PTR, BT_PTR_FN_VOID_VAR, BT_PTR, BT_SIZE)
+
+DEF_POINTER_TYPE (BT_PTR_FN_INT, BT_FN_INT)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR,
+ BT_INT, BT_PTR_FN_INT, BT_PTR, BT_PTR)
Index: builtins.def
===================================================================
--- builtins.def (revision 173381)
+++ builtins.def (working copy)
@@ -760,6 +760,9 @@ DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON,
true, true, true, ATTR_NOTHROW_LEAF_LIST, false,
!targetm.have_tls)
+/* Multiversioning builtin dispatch hook. */
+DEF_GCC_BUILTIN (BUILT_IN_DISPATCH, "dispatch", BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR, ATTR_NULL)
+
/* Exception support. */
DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume")
DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup")
Index: mversn-dispatch.c
===================================================================
--- mversn-dispatch.c (revision 0)
+++ mversn-dispatch.c (revision 0)
@@ -0,0 +1,1766 @@
+/* Mulitversion Dispatch Pass.
+ Copyright (C) 2011 Free Software Foundation, Inc.
+ Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3. If not see
+<http://www.gnu.org/licenses/>. */
+
+
+/* This pass processes __builtin_dispatch calls to call multi-versioned
+ functions. Only two versions are supported now. Example use :
+
+ int popcnt_sse4(unsigned int x) __attribute__((__target__("popcnt")));
+ int popcnt_sse4(unsigned int x)
+ {
+ int count = __builtin_popcount(x);
+ return count;
+ }
+
+ int popcnt(unsigned int x) __attribute__((__target__("no-popcnt")));
+ int popcnt(unsigned int x)
+ {
+ int count = __builtin_popcount(x);
+ return count;
+ }
+
+ int testsse() __attribute__((version_selector));
+ int main ()
+ {
+ ...
+ ret = __builtin_dispatch (testsse, (void*)popcnt_sse4, (void*)popcnt, 25);
+ ...
+ }
+
+ There are two passes that are run to achieve multi-versioning.
+ "pass_ipa_multiversion_dispatch" is an ipa pass that decides which functions
+ have to be cloned and hoists the feature-test calls appropriately. This
+ pass can be enabled with the flag "-fclone-hot-version-paths" and disabled
+ with "-fno-clone-hot-version-paths".
+
+ "pass_tree_convert_builtin_dispatch" does the lowering. It is a
+ function-level pass. Functions marked with attribute "version_selector" are
+ also handled by this pass. This pass is always on.
+
+ How to use __builtin_dispatch ?
+ -----------------------------
+
+ __builtin_dispatch takes 3 mandatory arguments :
+
+ __builtin_dispatch (arg1, arg2, arg3, <arg4>, <arg5>, ...);
+
+ arg1 is the pointer to the feature-test function.
+ arg2 is the ( void *) cast pointer to the versioned function that is
+ executed when the feature test returns 1.
+ arg3 is the ( void *) cast pointer to the versioned function that is
+ executed when the feature test returns 0.
+ arg4, arg5, ... are optional. They are the arguments to the versioned
+ functions. Both versions must accept the same number of arguments.
+ The __builtin_dispatch function returns the value returned by the
+ versioned function that gets executed. The versioned function arg2
+ is executed when the feature_test function arg1 returns 1 and arg3
+ is executed when the feature_test function arg1 returns 0. arg1
+ could be marked as a "version_selector" function if it is a pure
+ function with no side-effects, returns a constant at run-time and
+ can be evaluated at any point in the execution.
+
+ When to use the "version_selector" attribute ?
+ -----------------------------------------------
+
+ Functions are marked with attribute "version_selector" only if
+ they are run-time constants. Example of such functions would
+ be those that test if a specific feature is available on a
+ particular architecture. Such functions must return a positive
+ integer. For two-way functions, those that test if a feature
+ is present or not must return 1 or 0 respectively.
+
+
+ The code is organized into five parts. The first part has the functionality
+ to detect and handle functions marked with attribute "version_selector". The
+ second part is the analysis phase where we find calls to __builtin_dispatch
+ and mark all functions that are hot and have a call-graph path to a
+ __builtin_dispatch call. The third part decides which functions
+ to clone. This is based on the number of clones that have to be created for
+ the functions marked in the analysis phase. Only two clones are allowed for
+ a function currently. The fourth part is where the actual cloning happens.
+ The fifth part contains the implementation to lower the __builtin_dispatch
+ calls.
+
+ Flags : -fclone-hot-version-paths does function unswitching via cloning.
+ --param=num-mversn-clones=<num> allows to specify the number of
+ functions that should be cloned.
+ --param=mversn-clone-depth=<num> allows to specify the length of
+ the call graph path that should be cloned. num = 0 implies only
+ leaf node that contains the __builtin_dispatch statement must be
+ cloned. */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tree-inline.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "timevar.h"
+#include "params.h"
+#include "fibheap.h"
+#include "intl.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "tree-flow.h"
+#include "rtl.h"
+#include "ipa-prop.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "dbgcnt.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "vecprim.h"
+#include "gimple-pretty-print.h"
+
+typedef struct cgraph_node* NODEPTR;
+DEF_VEC_P (NODEPTR);
+DEF_VEC_ALLOC_P (NODEPTR, heap);
+
+/* Store the decl of __builtin_dispatch */
+static tree builtin_function_decl = NULL;
+
+/* Hash to map name to a decl. Used for variables and functions. */
+static htab_t name_decl_htab = NULL;
+
+/* Hashtable helpers for name_decl_htab. */
+
+static hashval_t
+name_decl_htab_hash_descriptor (const void *p)
+{
+ const_tree t = (const_tree) p;
+ const char *name
+ = (IDENTIFIER_POINTER (DECL_NAME (t)));
+ return htab_hash_string(name);
+}
+
+/* Hashtable helper for name_decl_htab. */
+
+static int
+name_decl_htab_eq_descriptor (const void *p1, const void *p2)
+{
+ const_tree t1 = (const_tree) p1;
+ const char *c1 = IDENTIFIER_POINTER (DECL_NAME (t1));
+ const char *c2 = (const char *)p2;
+
+ return (strcmp (c1, c2) == 0);
+}
+
+/* Return true if NODE is a hot function. It is a hot function
+ if its execution frequency is determined to be hot or
+ if any of its incoming or outgoing call-graph edges is hot. */
+
+static bool
+hot_function_p (struct cgraph_node *node)
+{
+ struct cgraph_edge *edge;
+
+ if (node->frequency == NODE_FREQUENCY_HOT)
+ return true;
+
+ for (edge = node->callees; edge; edge = edge->next_callee)
+ if (cgraph_maybe_hot_edge_p (edge))
+ return true;
+
+ for (edge = node->callers; edge; edge = edge->next_caller)
+ if (cgraph_maybe_hot_edge_p (edge))
+ return true;
+
+ return false;
+}
+
+/* Return the number of arguments that a function has. */
+
+static int
+function_args_count (tree fntype)
+{
+ function_args_iterator args_iter;
+ tree t;
+ int num = 0;
+
+ if (fntype)
+ {
+ FOREACH_FUNCTION_ARGS(fntype, t, args_iter)
+ {
+ num++;
+ }
+ }
+
+ return num;
+}
+
+/* Return the variable name (global/constructor) to use for the
+ version_selector function with name of DECL by appending SUFFIX. */
+
+static char *
+make_name (tree decl, const char *suffix)
+{
+ char *global_var_name;
+ int name_len;
+ const char *name;
+
+ name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+ name_len = strlen (name) + strlen (suffix) + 2;
+ global_var_name = (char *) xmalloc (name_len);
+ snprintf (global_var_name, name_len, "%s_%s", name, suffix);
+ return global_var_name;
+}
+
+/* Code for handling version_selector attribute functions. Such functions are
+ run-time constants and need to be executed only once. They are hoisted
+ to a static constructor and their result is stored in a global.
+ */
+
+
+/* This function returns the global variable / constructor name created
+ for feature-test functions marked with attribute "version_selector".
+ The name returned is the DECL name appended with
+ "version_selector_global" for the variable and
+ "version_selector_constructor" for the constructor. */
+
+static char*
+make_feature_test_global_name (tree decl, bool is_constructor)
+{
+ if (is_constructor)
+ return make_name (decl, "version_selector_constructor");
+
+ return make_name (decl, "version_selector_global");
+}
+
+/* This function creates a new VAR_DECL with attributes set
+ using the parameters. PUBLIK corresponds to TREE_PUBLIC,
+ EXTERNAL corresponds to DECL_EXTERNAL and comdat is
+ for DECL_ONE_ONLY. The global variable will have the
+ same status as the version_selector function.*/
+
+static tree
+allocate_new_var (const char *name, int publik,
+ int external, int comdat)
+{
+ tree new_global_var;
+ struct varpool_node *vnode;
+
+ new_global_var = build_decl (UNKNOWN_LOCATION,
+ VAR_DECL,
+ get_identifier (name),
+ integer_type_node);
+
+ DECL_EXTERNAL (new_global_var) = external;
+ TREE_STATIC (new_global_var) = 1;
+ TREE_PUBLIC (new_global_var) = publik;
+ DECL_INITIAL (new_global_var) = 0;
+ DECL_ARTIFICIAL (new_global_var) = 1;
+ DECL_PRESERVE_P (new_global_var) = 1;
+
+ if (comdat)
+ make_decl_one_only (new_global_var, DECL_ASSEMBLER_NAME (new_global_var));
+ assemble_variable (new_global_var, 0, 0, 0);
+
+ vnode = varpool_node (new_global_var);
+ gcc_assert (vnode != NULL);
+ /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
+ lto-streamer-out.c. */
+ vnode->finalized = 1;
+
+ return new_global_var;
+}
+
+/* Make a new constructor function here to call a feature-test function
+ and set its body to CONSTRUCTOR_BODY. Its public and comdat
+ attributes are set from the parameters, PUBLIK, and COMDAT.
+ VERSION_SELECTOR_VAR is the global decl that saves the result of the
+ feature-test function in the constructor. */
+
+static tree
+make_constructor_function (char *name, gimple constructor_body, int publik,
+ int comdat, tree version_selector_var)
+{
+ tree decl, type, t;
+ gimple_seq seq;
+ basic_block new_bb;
+ tree old_current_function_decl;
+
+ type = build_function_type_list (void_type_node, NULL_TREE);
+
+ if (dump_file)
+ fprintf (dump_file, "Name of new constructor function = %s\n", name);
+
+ decl = build_fn_decl (name, type);
+
+ DECL_NAME (decl) = get_identifier (name);
+ SET_DECL_ASSEMBLER_NAME (decl, DECL_NAME (decl));
+ gcc_assert (cgraph_node (decl) != NULL);
+
+ TREE_USED (decl) = 1;
+ DECL_ARTIFICIAL (decl) = 1;
+ DECL_IGNORED_P (decl) = 0;
+ TREE_PUBLIC (decl) = publik;
+ DECL_UNINLINABLE (decl) = 1;
+ DECL_EXTERNAL (decl) = 0;
+ DECL_CONTEXT (decl) = NULL_TREE;
+ DECL_INITIAL (decl) = make_node (BLOCK);
+ DECL_STATIC_CONSTRUCTOR (decl) = 1;
+ TREE_READONLY (decl) = 0;
+ DECL_PURE_P (decl) = 0;
+
+ if (comdat)
+ make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+
+ /* Build result decl and add to function_decl. */
+ t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, void_type_node);
+ DECL_ARTIFICIAL (t) = 1;
+ DECL_IGNORED_P (t) = 1;
+ DECL_RESULT (decl) = t;
+
+ gimplify_function_tree (decl);
+
+ /* Build CFG for this function. */
+
+ old_current_function_decl = current_function_decl;
+ push_cfun (DECL_STRUCT_FUNCTION (decl));
+ current_function_decl = decl;
+ init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+ cfun->curr_properties |=
+ (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+ PROP_ssa);
+ new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+ make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+
+ /* XXX: Not sure if the edge commented below is necessary. If I add this
+ edge, it fails in gimple_verify_flow_info in tree-cfg.c in condition :
+ " if (e->flags & EDGE_FALLTHRU)"
+ during -fprofile-generate.
+ Otherwise, it is fine. Deleting this edge does not break anything.
+ Commenting this so that it is clear I am intentionally not doing this.*/
+ /* make_edge (new_bb, EXIT_BLOCK_PTR, EDGE_FALLTHRU); */
+
+ seq = gimple_seq_alloc_with_stmt (constructor_body);
+
+ set_bb_seq (new_bb, seq);
+ gimple_set_bb (constructor_body, new_bb);
+
+ /* Set the lexical block of the constructor body. Fails the inliner
+ other wise. */
+ gimple_set_block (constructor_body, DECL_INITIAL (decl));
+
+ /* This call is very important if this pass runs when the IR is in
+ SSA form. It breaks things in strange ways otherwise. */
+ init_tree_ssa (DECL_STRUCT_FUNCTION (decl));
+ add_referenced_var (version_selector_var);
+
+ cgraph_add_new_function (decl, true);
+ cgraph_call_function_insertion_hooks (cgraph_node (decl));
+ cgraph_mark_needed_node (cgraph_node (decl));
+
+ if (dump_file)
+ dump_function_to_file (decl, dump_file, TDF_BLOCKS);
+
+ pop_cfun ();
+ current_function_decl = old_current_function_decl;
+ return decl;
+}
+
+/* If the current function is marked with attribute
+ "version_selector" then it is the predicate (feature-test) function
+ for multi-versioning. Call this function in a constructor and assign
+ the return value to a global variable.
+ The constructor's name is the decl name suffixed
+ "version_selector_constructor" and the global variable's name is the
+ decl name suffixed with "version_selector_global"
+
+ For example, feature-test function isSSE4 marked with attribute
+ version_selector is converted to
+
+ void isSSE4_version_selector_constructor ()
+ {
+ isSSE4_version_selector_global = isSSE4 ();
+ }
+
+ This function returns the decl of the global variable.
+
+ THIS_DECL is the function decl of the "version_selector" function.
+ */
+
+static tree
+handle_version_selector_attr_function (tree this_decl)
+{
+ char *global_var_name;
+ tree version_selector_var = NULL;
+ void **slot;
+
+ gcc_assert (!flag_lto);
+
+ if (dump_file)
+ fprintf (dump_file, "Creating constructor/global for function %s\n",
+ IDENTIFIER_POINTER (DECL_NAME (this_decl)));
+
+ global_var_name = make_feature_test_global_name (this_decl,
+ false);
+
+ slot = htab_find_slot_with_hash (name_decl_htab, global_var_name,
+ htab_hash_string (global_var_name),
+ INSERT);
+ if (*slot == NULL)
+ {
+ if (dump_file)
+ fprintf (dump_file, "Creating global variable %s\n",
+ global_var_name);
+ *slot = allocate_new_var (global_var_name,
+ TREE_PUBLIC (this_decl),
+ DECL_EXTERNAL (this_decl),
+ DECL_ONE_ONLY (this_decl));
+ }
+ else
+ {
+ free (global_var_name);
+ return (tree) *slot;
+ }
+
+ version_selector_var = (tree) *slot;
+
+ /* If the feature-test function is not external, create a constructor and
+ call this function in the constructor. */
+
+ if (!DECL_EXTERNAL (this_decl))
+ {
+ char *constructor_name;
+ gimple constructor_body;
+ tree constructor_decl;
+
+ constructor_name
+ = make_feature_test_global_name (this_decl, true);
+
+ constructor_body = gimple_build_call (this_decl, 0);
+
+ gimple_call_set_lhs (constructor_body, version_selector_var);
+
+ if (dump_file)
+ print_gimple_stmt (dump_file, constructor_body, 0, TDF_VOPS);
+
+ constructor_decl =
+ make_constructor_function (constructor_name, constructor_body,
+ TREE_PUBLIC (this_decl),
+ DECL_ONE_ONLY (this_decl),
+ version_selector_var);
+
+ gcc_assert (constructor_decl != NULL_TREE);
+ free (constructor_name);
+ }
+
+ free (global_var_name);
+ return version_selector_var;
+}
+
+/* Start Analysis phase. Mark all functions that are hot and have a call-graph
+ path to a __builtin_dispatch call. */
+
+/* This function returns the address of the feature test function.
+ If the address of the function is saved to a temporary,
+ this function traverses the gimple statements before BUILTIN_STMT
+ and finds an assignment whose rhs is the feature test function.
+ If the feature test function is specified as a function pointer
+ whose function value is unknown, this funcition returns NULL. */
+
+static tree
+find_version_selector_func_addr (gimple builtin_stmt)
+{
+ tree cond_func_addr = NULL;
+ gimple def_stmt = NULL;
+
+ cond_func_addr = gimple_call_arg (builtin_stmt, 0);
+
+ gcc_assert (TREE_CODE (cond_func_addr) == ADDR_EXPR
+ || TREE_CODE (cond_func_addr) == SSA_NAME);
+
+ if (TREE_CODE (cond_func_addr) == ADDR_EXPR)
+ return cond_func_addr;
+
+ /* TREE_CODE (cond_func_addr) == SSA_NAME
+ This means a new function pointer variable is created and assigned the
+ address of the feature-test function. Traverse the statements backwards
+ and find the assignment to get the RHS. */
+
+ def_stmt = SSA_NAME_DEF_STMT (cond_func_addr);
+
+ gcc_assert (def_stmt
+ && gimple_assign_lhs (def_stmt) == cond_func_addr);
+
+ cond_func_addr = gimple_assign_rhs1 (def_stmt);
+
+ /* If the cond_func_addr is still not an ADDR_EXPR, it means that the
+ feature-test function is specified as a pointer. In this case, we
+ return NULL, since the feature-test function decl is not known. */
+
+ if (cond_func_addr == NULL
+ || TREE_CODE (cond_func_addr) != ADDR_EXPR)
+ return NULL;
+
+ /* If the operand of the ADDR_EXPR is not a function_decl, return NULL
+ as this still means the feature-test function is specified as a
+ function pointer. */
+
+ if (TREE_CODE (TREE_OPERAND (cond_func_addr, 0)) != FUNCTION_DECL)
+ return NULL;
+
+ return cond_func_addr;
+}
+
+/* Finds the gimple calls to __builtin_dispatch in function pointed
+ to by the call graph NODE and populates the vector VEC. Returns
+ true if at least one statement was found where the feature test
+ function is marked as "version_selector". Otherwise, there is no
+ question of hoisting it. */
+
+static bool
+is_builtin_dispatch_stmt_present (struct cgraph_node *node,
+ VEC (tree,heap) **vec)
+{
+ struct cgraph_edge *edge;
+ bool present = false;
+
+ gcc_assert (!flag_lto);
+
+ for (edge = node->callees; edge; edge = edge->next_callee)
+ {
+ if (edge->callee->decl == builtin_function_decl)
+ {
+ tree cond_func_decl;
+ tree cond_func_addr;
+ gcc_assert (*vec != NULL);
+ cond_func_addr = find_version_selector_func_addr (edge->call_stmt);
+
+ if (cond_func_addr == NULL)
+ continue;
+
+ cond_func_decl = TREE_OPERAND (cond_func_addr, 0);
+
+ /* Do not consider for hoisting if "version_selector" attribute is
+ not set. */
+ if (lookup_attribute ("version_selector",
+ DECL_ATTRIBUTES (cond_func_decl)) == NULL)
+ {
+ if (dump_file)
+ {
+ fprintf (dump_file, "Not hoisting builtin_dispatch as "
+ "feature_test function not version_selector :\n");
+ print_gimple_stmt (dump_file, edge->call_stmt, 0, TDF_VOPS);
+ }
+ continue;
+ }
+
+ present = true;
+ VEC_safe_push (tree, heap, *vec, cond_func_decl);
+ }
+ }
+ return present;
+}
+
+/* Updates the list of feature-test function decls reaching the cgraph
+ function NODE. */
+
+static void
+update_reachable_decls_list (struct cgraph_node *node,
+ VEC (tree, heap) *predicate_decls)
+{
+ VEC (tree, heap) **decl_list = NULL;
+ tree cond_func_decl;
+ int ix;
+
+ if (node->aux == NULL)
+ {
+ decl_list = (VEC (tree, heap) **) xmalloc (sizeof (VEC (tree, heap) *));
+ *decl_list = VEC_alloc (tree, heap, 1);
+ node->aux = decl_list;
+ }
+ else
+ decl_list = (VEC (tree, heap) **) node->aux;
+
+ for (ix = 0; VEC_iterate (tree, predicate_decls, ix, cond_func_decl); ++ix)
+ VEC_safe_push (tree, heap, *decl_list, cond_func_decl);
+}
+
+/* Propagate the __builtin_dispatch stmt (s) called from node to its
+ callers, PREDICATE_DECLS is the decls list of the predicate functions. */
+
+static unsigned int
+mark_reachable_functions (struct cgraph_node *this_node,
+ VEC (tree, heap) *predicate_decls)
+{
+ VEC (NODEPTR, heap) *work_list;
+ VEC (int, heap) *depth_list;
+ struct cgraph_edge *e;
+ htab_t node_htab = NULL;
+ void **slot = NULL;
+
+ /* Use a work-list style algorithm to mark functions in any call-graph
+ path to the current function. */
+
+ work_list = VEC_alloc (NODEPTR, heap, 8);
+ depth_list = VEC_alloc (int, heap, 8);
+
+ VEC_safe_push (NODEPTR, heap, work_list, this_node);
+ VEC_safe_push (int, heap, depth_list, 0);
+
+ node_htab = htab_create (10, htab_hash_pointer,
+ htab_eq_pointer, NULL);
+
+ slot = htab_find_slot (node_htab, this_node, INSERT);
+
+ gcc_assert (*slot == NULL);
+ *slot = this_node;
+
+ while (!VEC_empty (NODEPTR, work_list))
+ {
+ struct cgraph_node *node = VEC_pop (NODEPTR, work_list);
+ int depth = VEC_pop (int, depth_list);
+
+ if (dump_file)
+ fprintf (dump_file, "%s has a depth = %d callgraph path to %s\n",
+ cgraph_node_name (node), depth,
+ cgraph_node_name (this_node));
+
+ update_reachable_decls_list (node, predicate_decls);
+
+ gcc_assert (node->aux != NULL);
+
+ if (depth >= PARAM_VALUE (PARAM_MVERSN_CLONE_CGRAPH_DEPTH))
+ {
+ if (dump_file)
+ fprintf (dump_file, "Not propogating __builtin_dispatch... "
+ "maximum cloning depth = %d reached\n",
+ PARAM_VALUE (PARAM_MVERSN_CLONE_CGRAPH_DEPTH));
+ continue;
+ }
+
+ for (e = node->callers; e; e = e->next_caller)
+ {
+ slot = htab_find_slot (node_htab, e->caller, INSERT);
+ if (*slot != NULL)
+ continue;
+ *slot = e->caller;
+ if (!hot_function_p (e->caller))
+ continue;
+
+ VEC_safe_push (NODEPTR, heap, work_list, e->caller);
+ VEC_safe_push (int, heap, depth_list, (depth + 1));
+ }
+ }
+
+ htab_delete (node_htab);
+ VEC_free (NODEPTR, heap, work_list);
+ VEC_free (int, heap, depth_list);
+ return 0;
+}
+
+/* Scan the call graph and detect hot functions that have __builtin_dispatch
+ calls. Then, propogate this information to its callers. Returns true if
+ a suitable __builtin_dispatch was found. */
+
+static bool
+perform_analysis_phase (void)
+{
+ struct cgraph_node *node;
+ VEC(tree, heap) *builtin_predicates_vec = NULL;
+ bool flag = false;
+
+ builtin_predicates_vec = VEC_alloc (tree, heap, 1);
+
+ for (node = cgraph_nodes; node; node = node->next)
+ {
+ /* if the body of this decl is from outside, do nothing. */
+ if (DECL_EXTERNAL (node->decl))
+ continue;
+
+ if (!hot_function_p (node))
+ continue;
+
+ if (!is_builtin_dispatch_stmt_present (node, &builtin_predicates_vec))
+ continue;
+
+ if (dump_file)
+ {
+ fprintf (dump_file, "%s calls __builtin_dispatch atleast once.\n",
+ cgraph_node_name (node));
+
+ fprintf (dump_file, "%s is a hot function, consider cloning ...\n",
+ cgraph_node_name (node));
+ }
+
+ flag = true;
+ mark_reachable_functions (node, builtin_predicates_vec);
+ VEC_truncate (tree, builtin_predicates_vec, 0);
+ }
+
+ VEC_free (tree, heap, builtin_predicates_vec);
+ return flag;
+}
+
+/* End Analysis phase. */
+
+/* Decide Cloning Phase.
+
+ In this phase, we go through each function and decide if it should be
+ cloned or not. */
+
+/* This function counts the number of unique decls in the DECL_LIST.*/
+
+static int
+count_predicate_functions (VEC (tree,heap) *decl_list)
+{
+ int ix;
+ int count = 0;
+ tree cond_func_decl = NULL;
+ htab_t dup_decl_htab = NULL;
+
+ if (VEC_length (tree, decl_list) == 1)
+ return 1;
+
+ dup_decl_htab = htab_create (2, htab_hash_pointer, htab_eq_pointer, NULL);
+
+ for (ix = 0; VEC_iterate (tree, decl_list, ix, cond_func_decl); ++ix)
+ {
+ void **slot = NULL;
+ slot = htab_find_slot (dup_decl_htab, cond_func_decl, INSERT);
+
+ if (*slot != NULL)
+ continue;
+ count++;
+ *slot = cond_func_decl;
+ }
+
+ htab_delete (dup_decl_htab);
+ return count;
+}
+
+/* This function decides which functions to clone based on the number of
+ feature_test decls reaching it. Currently, only one feature_test decl
+ is allowed. */
+
+static bool
+decide_cloning_phase (void)
+{
+ struct cgraph_node *node;
+ int count;
+ bool run_cloning_phase = false;
+ int num_funcs_cloned = 0;
+
+ for (node = cgraph_nodes; node; node = node->next)
+ {
+ tree cond_func_decl = NULL;
+ VEC (tree, heap) *vec;
+ if (node->aux == NULL)
+ continue;
+
+ if (num_funcs_cloned >= PARAM_VALUE (PARAM_NUMBER_OF_MVERSN_CLONES))
+ {
+ if (dump_file)
+ fprintf (dump_file, "Reached cloning limit specified "
+ "by \"num-mversn-clones\" for %s\n",
+ cgraph_node_name (node));
+
+ free (node->aux);
+ node->aux = NULL;
+ continue;
+ }
+
+ vec = *(VEC (tree,heap) **) node->aux;
+ count = count_predicate_functions (vec);
+ gcc_assert (count >= 1);
+ cond_func_decl = VEC_index (tree, vec, 0);
+ gcc_assert (cond_func_decl != NULL);
+ VEC_free (tree, heap, vec);
+ free (node->aux);
+ node->aux = NULL;
+
+ if (count > 1)
+ {
+ if (dump_file)
+ fprintf (dump_file, "%s has %d predicates, Not cloning for > 1\n",
+ cgraph_node_name (node), count);
+ continue;
+ }
+ /* Set the node's aux value to be that of the predicate decl. */
+ node->aux = cond_func_decl;
+ run_cloning_phase = true;
+ num_funcs_cloned++;
+ }
+ return run_cloning_phase;
+}
+
+/* End Decide Cloning Phase. */
+
+/* Cloning Phase. */
+
+/* Deletes all basic-blocks and leaves function with :
+ ENTRY_BLOCK ---> (new empty basic block) ---> EXIT_BLOCK
+*/
+
+static basic_block
+empty_function_body (tree fndecl)
+{
+ basic_block bb, new_bb;
+ edge e;
+ tree old_current_function_decl;
+
+ old_current_function_decl = current_function_decl;
+ push_cfun (DECL_STRUCT_FUNCTION (fndecl));
+ current_function_decl = fndecl;
+
+ clear_edges ();
+ for (bb = ENTRY_BLOCK_PTR; bb != NULL;)
+ {
+ basic_block bb_next;
+ bb_next = bb->next_bb;
+ if (bb != EXIT_BLOCK_PTR
+ && bb != ENTRY_BLOCK_PTR)
+ {
+ if (bb_seq (bb) != NULL)
+ {
+ gimple_stmt_iterator i;
+ for (i = gsi_start_bb (bb); !gsi_end_p (i);)
+ {
+ gimple stmt = gsi_stmt (i);
+ unlink_stmt_vdef (stmt);
+ gsi_remove (&i, true);
+ release_defs (stmt);
+ }
+ }
+ bb->il.gimple = NULL;
+ bb->prev_bb = NULL;
+ bb->next_bb = NULL;
+ SET_BASIC_BLOCK (bb->index, NULL);
+ n_basic_blocks--;
+ }
+ bb = bb_next;
+ }
+ ENTRY_BLOCK_PTR->next_bb = EXIT_BLOCK_PTR;
+ new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+ e = make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+ gcc_assert (e != NULL);
+ /* XXX:Is this edge necessary ? */
+ e = make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+ gcc_assert (e != NULL);
+
+ current_function_decl = old_current_function_decl;
+ pop_cfun ();
+ return new_bb;
+}
+
+/* Takes function with decl ORIG_FNDECL and clones it. The
+ name of the clone is the original name suffixed with
+ NAME_SUFFIX. Code is adapted from cgraph_function_versioning
+ in cgraphunit.c */
+
+static tree
+clone_function (tree orig_fndecl, const char *name_suffix)
+{
+ tree new_decl;
+ char *new_name;
+ struct cgraph_node *new_version;
+ struct cgraph_node *old_version;
+ void **slot;
+ tree old_current_function_decl;
+
+ new_name = make_name (orig_fndecl, name_suffix);
+ new_decl = copy_node (orig_fndecl);
+
+
+ slot = htab_find_slot_with_hash (name_decl_htab, new_name,
+ htab_hash_string (new_name), INSERT);
+
+ gcc_assert (*slot == NULL);
+ *slot = new_decl;
+
+ /* Code adapted from cgraph_function_versioning in cgraphuinit.c */
+
+ new_version = cgraph_node (new_decl);
+ old_version = cgraph_node (orig_fndecl);
+
+ new_version->local = old_version->local;
+ new_version->global = old_version->global;
+ new_version->rtl = old_version->rtl;
+ new_version->reachable = true;
+ new_version->count = old_version->count;
+
+ /* Set the name of the new function. */
+ DECL_NAME (new_decl) = get_identifier (new_name);
+ SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
+ SET_DECL_RTL (new_decl, NULL);
+
+ tree_function_versioning (orig_fndecl, new_decl, NULL /*tree_map*/,
+ false, NULL /*args_to_skip*/,
+ NULL /* blocks_to_copy */ ,
+ NULL /* new_entry */);
+
+
+ old_current_function_decl = current_function_decl;
+ push_cfun (DECL_STRUCT_FUNCTION (new_decl));
+ current_function_decl = new_decl;
+
+ TREE_READONLY (new_decl) = TREE_READONLY (orig_fndecl);
+ TREE_STATIC (new_decl) = TREE_STATIC (orig_fndecl);
+ TREE_USED (new_decl) = TREE_USED (orig_fndecl);
+ DECL_ARTIFICIAL (new_decl) = 1;
+ DECL_IGNORED_P (new_decl) = 0;
+ TREE_PUBLIC (new_decl) = TREE_PUBLIC (orig_fndecl);
+ DECL_CONTEXT (new_decl) = DECL_CONTEXT (orig_fndecl);
+
+ DECL_EXTERNAL (new_decl) = DECL_EXTERNAL (orig_fndecl);
+ DECL_COMDAT (new_decl) = DECL_COMDAT (orig_fndecl);
+ DECL_COMDAT_GROUP (new_decl) = DECL_COMDAT_GROUP (orig_fndecl);
+ DECL_VIRTUAL_P (new_decl) = DECL_VIRTUAL_P (orig_fndecl);
+ DECL_WEAK (new_decl) = DECL_WEAK (orig_fndecl);
+
+ /* Always inline the clones. Why are we cloning otherwise? */
+ DECL_DECLARED_INLINE_P (new_decl) = 1;
+ DECL_UNINLINABLE (new_decl) = 0;
+ new_version->local.externally_visible
+ = old_version->local.externally_visible;
+ new_version->local.local
+ = old_version->local.local;
+
+ new_version->analyzed = true;
+ new_version->lowered = true;
+
+ if (dump_file)
+ dump_function_to_file (new_decl, dump_file, TDF_BLOCKS);
+
+ cgraph_add_new_function (new_decl, true);
+
+ cgraph_call_function_insertion_hooks (new_version);
+ cgraph_mark_needed_node (new_version);
+
+ pop_cfun ();
+ current_function_decl = old_current_function_decl;
+
+ return new_decl;
+}
+
+/* This function populates the vector *VEC with the args in the gimple
+ call statement STMT. SKIP_ARGS is the number of args to skip.*/
+
+static void
+get_function_args (gimple stmt, int num_args, VEC (tree, heap) **vec,
+ int skip_args)
+{
+ int i;
+
+ if (num_args == 0) return;
+
+ *vec = VEC_alloc (tree, heap, num_args);
+ /* The number of args in a function is 1 plus the actual number of
+ args. Also, there are 3 special args reserved, so the first arg
+ starts from 3. */
+ for (i = 0; i <= num_args - 2; ++i)
+ VEC_quick_push (tree, *vec, gimple_call_arg (stmt, (skip_args + i)));
+}
+
+/* Given ret = __builtin_dispatch (pred, fn1, fn2, arg1, ....)
+ get ret = fn1 (arg1, ...) or ret = fn2 (arg1, ....)
+ depending on the value of SIDE == 0 or 1. */
+
+static gimple
+make_specialized_call_from_builtin (gimple builtin_stmt, int side)
+{
+ tree func_addr;
+ int num_func_args = 0;
+ VEC (tree, heap) *nargs = NULL;
+ tree lhs_stmt;
+ gimple specialized_call_stmt;
+
+ if (side == 0)
+ func_addr = gimple_call_arg (builtin_stmt, 1);
+ else
+ func_addr = gimple_call_arg (builtin_stmt, 2);
+
+ num_func_args
+ = function_args_count (TREE_TYPE (TREE_OPERAND (func_addr, 0)));
+
+ get_function_args (builtin_stmt, num_func_args, &nargs, 3);
+
+ specialized_call_stmt = gimple_build_call_vec (func_addr, nargs);
+
+ lhs_stmt = gimple_call_lhs (builtin_stmt);
+
+ if (lhs_stmt != NULL_TREE)
+ gimple_call_set_lhs (specialized_call_stmt, lhs_stmt);
+
+ if (nargs != NULL)
+ VEC_free (tree, heap, nargs);
+
+ return specialized_call_stmt;
+}
+
+/* Given a call (GENERIC_STMT) to a function that is cloned, substitute
+ with a call to the correct clone. */
+
+static gimple
+make_specialized_call_to_clone (gimple generic_stmt, int side)
+{
+ tree new_decl;
+ char *new_name;
+ tree generic_fndecl;
+ gimple specialized_call_stmt;
+ void **slot;
+ int num_func_args;
+ tree lhs_stmt;
+ VEC (tree, heap) *nargs= NULL;
+
+ generic_fndecl = gimple_call_fndecl (generic_stmt);
+ gcc_assert (generic_fndecl != NULL);
+
+ if (side == 0)
+ new_name = make_name (generic_fndecl, "clone_0");
+ else
+ new_name = make_name (generic_fndecl, "clone_1");
+
+ slot = htab_find_slot_with_hash (name_decl_htab, new_name,
+ htab_hash_string (new_name), NO_INSERT);
+ gcc_assert (slot != NULL);
+ new_decl = (tree) *slot;
+ gcc_assert (new_decl);
+
+ num_func_args = function_args_count (TREE_TYPE (generic_fndecl));
+ get_function_args (generic_stmt, num_func_args, &nargs, 0);
+ specialized_call_stmt = gimple_build_call_vec (new_decl, nargs);
+
+ lhs_stmt = gimple_call_lhs (generic_stmt);
+
+ if (lhs_stmt != NULL_TREE)
+ gimple_call_set_lhs (specialized_call_stmt, lhs_stmt);
+
+ if (nargs != NULL)
+ VEC_free (tree, heap, nargs);
+
+ return specialized_call_stmt;
+}
+
+/* Returns true if STMT is a call to __builtin_dispatch and its
+ predicate feature-test function is marked with attribute
+ "version_selector". */
+
+static bool
+is_builtin_with_predicate_version_selector (gimple stmt)
+{
+ tree cond_func_addr, cond_func_decl;
+
+ gcc_assert (!flag_lto);
+
+ if (gimple_call_fndecl (stmt) != builtin_function_decl)
+ return false;
+
+ cond_func_addr = find_version_selector_func_addr (stmt);
+
+ if (cond_func_addr == NULL)
+ return false;
+
+ cond_func_decl = TREE_OPERAND (cond_func_addr, 0);
+ if (lookup_attribute ("version_selector",
+ DECL_ATTRIBUTES (cond_func_decl)) != NULL)
+ return true;
+
+ return false;
+}
+
+/* Find calls to __builtin_dispatch or to functions that are versioned
+ in CLONE_DECL and substitute the call with the correct version based
+ on the value of SIDE. */
+
+static void
+specialize_call (tree clone_decl, int side)
+{
+ basic_block bb;
+ tree old_current_function_decl;
+
+ old_current_function_decl = current_function_decl;
+ push_cfun (DECL_STRUCT_FUNCTION (clone_decl));
+ current_function_decl = clone_decl;
+
+ /* Iterate over call edges and find out if there is
+ a call to __builtin_dispatch or a cloned function.
+ We cannot iterate over call graph edges as there are
+ no edges for the clones yet. */
+
+ FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (clone_decl))
+ {
+ gimple_stmt_iterator gsi;
+ for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ tree callee_decl;
+ struct cgraph_node *callee_node;
+ gimple specialized_call_stmt;
+ gimple stmt = gsi_stmt (gsi);
+
+ if (!is_gimple_call (stmt))
+ continue;
+
+ callee_decl = gimple_call_fndecl (stmt);
+
+ if (callee_decl == NULL)
+ continue;
+
+ callee_node = cgraph_node (callee_decl);
+
+ /* For a __builtin_dispatch stmt, only specialize if
+ version_selector attribute is set. Otherwise, it is
+ not hoisted, so no specialization. */
+
+ if (is_builtin_with_predicate_version_selector (stmt))
+ {
+ specialized_call_stmt =
+ make_specialized_call_from_builtin (stmt, side);
+ }
+ else if (callee_node->aux != NULL)
+ {
+ specialized_call_stmt =
+ make_specialized_call_to_clone (stmt, side);
+ }
+ else
+ continue;
+
+ if (dump_file)
+ {
+ fprintf (dump_file, "Specialize stmt : \n");
+ print_gimple_stmt (dump_file, stmt, 0, TDF_VOPS);
+ fprintf (dump_file, "Specialized stmt : \n");
+ print_gimple_stmt (dump_file, specialized_call_stmt,
+ 0, TDF_VOPS);
+ }
+
+ gimple_set_block (specialized_call_stmt, gimple_block (stmt));
+ gsi_insert_before_without_update (&gsi, specialized_call_stmt,
+ GSI_SAME_STMT);
+
+
+ unlink_stmt_vdef (stmt);
+ gsi_remove (&gsi, true);
+ mark_symbols_for_renaming (specialized_call_stmt);
+
+ /* After removing make sure gsi is set correctly to not skip
+ a statememt. */
+ gsi = gsi_for_stmt (specialized_call_stmt);
+ }
+ }
+ current_function_decl = old_current_function_decl;
+ pop_cfun ();
+}
+
+/* When a function is version cloned, its body is replaced to call one
+ of the versions with the feature-test function acting as a predicate.
+ This is done with __builtin_dispatch which is later expanded. */
+
+static gimple
+make_builtin_call_to_clones (tree orig_fndecl, tree clone_0_addr,
+ tree clone_1_addr, tree cond_func_addr)
+{
+ gimple new_builtin_call;
+ VEC(tree, heap) *vargs = VEC_alloc (tree, heap, 4);
+ tree arg;
+
+ VEC_quick_push (tree, vargs, cond_func_addr);
+ VEC_quick_push (tree, vargs, clone_0_addr);
+ VEC_quick_push (tree, vargs, clone_1_addr);
+
+ for (arg = DECL_ARGUMENTS (orig_fndecl); arg; arg = TREE_CHAIN (arg))
+ {
+ VEC_safe_push (tree, heap, vargs, arg);
+ /* Again, this add_referenced_var is very very important. It broke
+ a build where a cloned function's arguments where never
+ referenced. Missing this statement in places asserts at
+ tree-dfa.c:589, in function referenced_var_lookup at
+ "gcc_assert (h || uid == 0);" and is very difficult to triage. */
+ add_referenced_var (arg);
+ }
+
+ new_builtin_call = gimple_build_call_vec (builtin_function_decl, vargs);
+ mark_symbols_for_renaming (new_builtin_call);
+
+
+ if (dump_file)
+ print_gimple_stmt (dump_file, new_builtin_call, 0, TDF_VOPS);
+
+ VEC_free (tree, heap, vargs);
+
+ return new_builtin_call;
+}
+
+/* This clones a dispatch function whose callee-graph path has a function
+ which calls __builtin_dispatch. This function is cloned and the
+ original function branches to the right clone. */
+
+static int
+clone_and_dispatch_function (struct cgraph_node *orig_node, tree *clone_0,
+ tree *clone_1)
+{
+ tree clone_0_decl, clone_1_decl;
+ gimple new_builtin_call = NULL;
+ gimple new_return_stmt = NULL;
+ gimple_seq seq = NULL;
+ basic_block new_bb;
+ tree orig_fndecl;
+ tree return_var = NULL;
+ tree return_type;
+ tree old_current_function_decl;
+
+ old_current_function_decl = current_function_decl;
+ orig_fndecl = orig_node->decl;
+ push_cfun (DECL_STRUCT_FUNCTION (orig_fndecl));
+ current_function_decl = orig_fndecl;
+
+ /* Make 2 clones for true and false function. */
+ clone_0_decl = clone_function (orig_fndecl, "clone_0");
+ clone_1_decl = clone_function (orig_fndecl, "clone_1");
+ *clone_0 = clone_0_decl;
+ *clone_1 = clone_1_decl;
+
+ new_bb = empty_function_body (orig_fndecl);
+
+ new_builtin_call = make_builtin_call_to_clones (
+ orig_fndecl,
+ build_fold_addr_expr (clone_0_decl),
+ build_fold_addr_expr (clone_1_decl),
+ build_fold_addr_expr ((tree)orig_node->aux));
+
+ return_type = TREE_TYPE (TREE_TYPE (orig_fndecl));
+
+ if (!TREE_ADDRESSABLE (return_type) && COMPLETE_TYPE_P (return_type))
+ {
+ tree tmp_var;
+ tmp_var = create_tmp_var (return_type, NULL);
+ add_referenced_var (tmp_var);
+ return_var = make_ssa_name (tmp_var, new_builtin_call);
+ gimple_call_set_lhs (new_builtin_call, return_var);
+ }
+
+ mark_symbols_for_renaming (new_builtin_call);
+ new_return_stmt = gimple_build_return (return_var);
+ mark_symbols_for_renaming (new_return_stmt);
+ gimple_seq_add_stmt (&seq, new_builtin_call);
+ gimple_seq_add_stmt (&seq, new_return_stmt);
+ set_bb_seq (new_bb, seq);
+ gimple_set_bb (new_builtin_call, new_bb);
+ gimple_set_bb (new_return_stmt, new_bb);
+
+ gimple_set_block (new_builtin_call, DECL_INITIAL (orig_fndecl));
+ gimple_set_block (new_return_stmt, DECL_INITIAL (orig_fndecl));
+
+ if (dump_file)
+ dump_function_to_file (orig_fndecl, dump_file, TDF_BLOCKS);
+
+ /* This update_ssa is necessary here for the following reason. SSA uses
+ a global syms_to_rename bitmap that stores syms that must be renamed.
+ So, if we accumulate the syms from one function in IPA but move to
+ a different function without updating SSA, then we could be
+ accumulating syms from many functions. This would assert in
+ referenced_var_lookup because the hashtab storing the syms is
+ function local. This is horrible. gcc-4.6 makes this bitmap a
+ global. */
+ update_ssa (TODO_update_ssa);
+
+ compute_inline_parameters (cgraph_node (orig_fndecl));
+ DECL_DECLARED_INLINE_P (orig_fndecl) = 1;
+ DECL_UNINLINABLE (orig_fndecl) = 0;
+ current_function_decl = old_current_function_decl;
+ pop_cfun ();
+ return 0;
+}
+
+/* Clone all functions marked for cloning by the earlier phase. */
+
+static void
+perform_cloning_phase (void)
+{
+ struct cgraph_node *node;
+ int ix;
+ VEC (tree, heap) *cloned_decl_list = NULL;
+ tree cloned_decl = NULL;
+
+ cloned_decl_list = VEC_alloc (tree, heap, 2);
+
+ /* First clone, then specialize the clones. */
+ for (node = cgraph_nodes; node; node = node->next)
+ {
+ tree clone_0_decl, clone_1_decl;
+ if (node->aux == NULL)
+ continue;
+ if (dump_file)
+ {
+ fprintf (dump_file, "%s will be cloned\n", cgraph_node_name (node));
+ dump_function_to_file (node->decl, dump_file, TDF_BLOCKS);
+ }
+ clone_and_dispatch_function (node, &clone_0_decl, &clone_1_decl);
+ VEC_safe_push (tree, heap, cloned_decl_list, clone_0_decl);
+ VEC_safe_push (tree, heap, cloned_decl_list, clone_1_decl);
+ continue;
+ }
+
+ /* Specialize the clones now. */
+ for (ix = 0; VEC_iterate (tree, cloned_decl_list, ix, cloned_decl); ++ix)
+ {
+ int which_clone = ix % 2;
+ specialize_call (cloned_decl, which_clone);
+ }
+
+ VEC_free (tree, heap, cloned_decl_list);
+}
+
+/* End Cloning phase. */
+
+/* Checks if there is atleast one call to __builtin_dispatch. */
+
+static bool
+find_builtin_decl (void)
+{
+ struct cgraph_node *node;
+ for (node = cgraph_nodes; node; node = node->next)
+ {
+ if (strstr (cgraph_node_name (node), "__builtin_dispatch") != NULL)
+ {
+ builtin_function_decl = node->decl;
+ return true;
+ }
+ }
+ return false;
+}
+
+/* Set the aux fields of all nodes and edges in the call graph to be NULL. */
+
+static void
+cleanup_aux_field (void)
+{
+ struct cgraph_node *node;
+ struct cgraph_edge *edge;
+
+ for (node = cgraph_nodes; node; node = node->next)
+ {
+ node->aux = NULL;
+ for (edge = node->callees; edge; edge = edge->next_callee)
+ edge->aux = NULL;
+ }
+}
+
+/* Main driver function. It scans the __builtin_dispatch calls and
+ figures out which functions to clone. It then clones the functions. */
+
+static unsigned int
+builtin_dispatch_ipa_clone (void)
+{
+ cleanup_aux_field ();
+
+ /* Allocate hashtab mapping name to decl. */
+ name_decl_htab = htab_create (10, name_decl_htab_hash_descriptor,
+ name_decl_htab_eq_descriptor, NULL);
+
+ /* Turn it on for O1 and above. At -O0, there is a SSA alias bug
+ with create_tmp_var. Cloning and hoisting is not necessary at
+ -O0 anyways. Also, guard it with the flag
+ "-fclone-hot-version-paths".
+ Disabled for LTO as it needs more work. */
+ if (optimize == 0
+ || profile_arc_flag
+ || !flag_clone_hot_version_paths
+ || flag_lto)
+ return 0;
+
+ if (!find_builtin_decl ())
+ return 0;
+
+ gcc_assert (builtin_function_decl != NULL);
+
+ if (!perform_analysis_phase ())
+ {
+ cleanup_aux_field ();
+ return 0;
+ }
+
+ if (decide_cloning_phase ())
+ perform_cloning_phase ();
+
+ cleanup_aux_field ();
+
+ return 0;
+}
+
+static bool
+gate_handle_builtin_dispatch (void)
+{
+ return true;
+}
+
+struct simple_ipa_opt_pass pass_ipa_multiversion_dispatch =
+{
+ {
+ SIMPLE_IPA_PASS,
+ "multiversion_dispatch", /* name */
+ gate_handle_builtin_dispatch, /* gate */
+ builtin_dispatch_ipa_clone, /* execute */
+ NULL, /* sub */
+ NULL, /* next */
+ 0, /* static_pass_number */
+ TV_MVERSN_DISPATCH, /* tv_id */
+ 0, /* properties_required */
+ PROP_cfg, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_dump_func | /* todo_flags_finish */
+ TODO_update_ssa
+ }
+};
+
+/* Lowering of the __builtin_dispatch calls. */
+
+
+/* This function converts STMT which is a __builtin_dispatch
+ call of the form :
+ ret = __builtin_dispatch (predicate, foo, bar, arg1, ...)
+ into :
+ var_1 = predicate
+ if (var_1)
+ var_2 = foo (arg1, ...);
+ else
+ var_3 = bar (arg1, ...);
+ var_4 = phi (var_2, var_3)
+ ret = var_4
+
+ var_? are ssa names for variable var.
+*/
+
+static unsigned int
+convert_builtin_dispatch (gimple stmt)
+{
+ tree cond_func_addr, if_func_addr, else_func_addr;
+ tree cond_func_decl = NULL;
+ gimple if_part, else_part, if_else_stmt;
+ basic_block bb1, bb2, bb3, bb4;
+ gimple bb1end, bb2end, bb3end;
+ edge e12, e13, e23, e24, e34;
+ VEC(tree, heap) *nargs = NULL;
+ int num_func_args = 0, i;
+ tree version_selector_var;
+ tree lhs_result;
+ gimple_stmt_iterator gsi;
+ basic_block bb;
+ gimple feature_test_call = NULL;
+ tree tmp_var = NULL;
+ gimple init_stmt = NULL;
+ tree ssa_if_name, ssa_else_name;
+ gimple phinode = NULL;
+ tree tmp_result_var, ssa_result_var;
+
+ gsi = gsi_for_stmt (stmt);
+ bb = gsi_bb (gsi);
+
+ cond_func_addr = find_version_selector_func_addr (stmt);
+ if (cond_func_addr != NULL)
+ {
+ cond_func_decl = TREE_OPERAND (cond_func_addr, 0);
+ gcc_assert (cond_func_decl);
+ }
+
+ if (dump_file)
+ {
+ fprintf (dump_file, "Before Converting __builtin_dispatch :\n");
+ dump_function_to_file (current_function_decl, dump_file, TDF_BLOCKS);
+ }
+
+ if_func_addr = gimple_call_arg (stmt, 1);
+ else_func_addr = gimple_call_arg (stmt, 2);
+
+ tmp_result_var = create_tmp_var (integer_type_node, NULL);
+ add_referenced_var (tmp_result_var);
+
+ if (flag_lto
+ || cond_func_decl == NULL
+ || lookup_attribute ("version_selector",
+ DECL_ATTRIBUTES (cond_func_decl)) == NULL)
+ {
+ tree arg = gimple_call_arg (stmt, 0);
+ /* This means the feature-test function is not set with attribute
+ version_selector or it is a function pointer or in LTO. So,
+ explicitly call it. */
+ feature_test_call = gimple_build_call (arg, 0);
+ ssa_result_var = make_ssa_name (tmp_result_var, feature_test_call);
+ gimple_call_set_lhs (feature_test_call, ssa_result_var);
+ mark_symbols_for_renaming (feature_test_call);
+ version_selector_var = ssa_result_var;
+ }
+ else
+ {
+ /* Get the global corresponding to the "version_selector" function. */
+ version_selector_var
+ = handle_version_selector_attr_function (cond_func_decl);
+ gcc_assert (version_selector_var);
+ add_referenced_var (version_selector_var);
+ feature_test_call = gimple_build_assign (tmp_result_var,
+ version_selector_var);
+ ssa_result_var = make_ssa_name (tmp_result_var, feature_test_call);
+ gimple_assign_set_lhs (feature_test_call, ssa_result_var);
+ mark_symbols_for_renaming (feature_test_call);
+ version_selector_var = ssa_result_var;
+ }
+
+ if_else_stmt = gimple_build_cond (GT_EXPR,
+ version_selector_var,
+ integer_zero_node,
+ NULL_TREE, NULL_TREE);
+
+ mark_symbols_for_renaming (if_else_stmt);
+
+ num_func_args = function_args_count (
+ TREE_TYPE (TREE_OPERAND (if_func_addr, 0)));
+
+ nargs = VEC_alloc (tree, heap, num_func_args);
+
+ /* The arguments to the feature test function start from the 4th argument
+ in __builtin_dispatch. The first 3 arguments are mandatory. */
+
+ for (i = 0; i <= num_func_args - 2; ++i)
+ VEC_quick_push (tree, nargs,
+ gimple_call_arg (stmt, (3 + i)));
+
+ if_part = gimple_build_call_vec (if_func_addr, nargs);
+ else_part = gimple_build_call_vec (else_func_addr, nargs);
+
+ lhs_result = gimple_call_lhs (stmt);
+
+ if (lhs_result != NULL_TREE)
+ {
+ tree ssa_var;
+ tree return_type;
+ return_type = TREE_TYPE (lhs_result);
+ tmp_var = create_tmp_var (return_type, NULL);
+ add_referenced_var (tmp_var);
+
+ init_stmt = gimple_build_assign (tmp_var, build_zero_cst (return_type));
+ ssa_var = make_ssa_name (tmp_var, init_stmt);
+ gimple_assign_set_lhs (init_stmt, ssa_var);
+ mark_symbols_for_renaming (init_stmt);
+
+ ssa_if_name = make_ssa_name (tmp_var, init_stmt);
+ ssa_else_name = make_ssa_name (tmp_var, init_stmt);
+ gimple_call_set_lhs (if_part, ssa_if_name);
+ gimple_call_set_lhs (else_part, ssa_else_name);
+ }
+ mark_symbols_for_renaming (if_part);
+ mark_symbols_for_renaming (else_part);
+
+ /* Set the lexical block to be the same as the dispatch call. */
+ gcc_assert (feature_test_call);
+ gimple_set_block (feature_test_call, gimple_block (stmt));
+
+ if (init_stmt)
+ gimple_set_block (init_stmt, gimple_block (stmt));
+
+ gimple_set_block (if_else_stmt, gimple_block (stmt));
+ gimple_set_block (if_part, gimple_block (stmt));
+ gimple_set_block (else_part, gimple_block (stmt));
+
+ gsi_insert_before_without_update (&gsi, feature_test_call, GSI_SAME_STMT);
+ gimple_set_bb (feature_test_call, bb);
+
+ if (init_stmt)
+ {
+ gsi_insert_before_without_update (&gsi, init_stmt,
+ GSI_SAME_STMT);
+ gimple_set_bb (init_stmt, bb);
+ }
+
+ gsi_insert_before_without_update (&gsi, if_else_stmt, GSI_SAME_STMT);
+ gsi_insert_before_without_update (&gsi, if_part, GSI_SAME_STMT);
+ gsi_insert_before_without_update (&gsi, else_part, GSI_SAME_STMT);
+
+ /* Remove the builtin_dispatch call after the expansion. */
+ unlink_stmt_vdef (stmt);
+ gsi_remove (&gsi, true);
+
+ bb1end = if_else_stmt;
+ bb2end = if_part;
+ bb3end = else_part;
+ bb1 = bb;
+ e12 = split_block (bb1, bb1end);
+ bb2 = e12->dest;
+ e23 = split_block (bb2, bb2end);
+ bb3 = e23->dest;
+ e34 = split_block (bb3, bb3end);
+ bb4 = e34->dest;
+
+ e12->flags &= ~EDGE_FALLTHRU;
+ e12->flags |= EDGE_TRUE_VALUE;
+ e13 = make_edge (bb1, bb3, EDGE_FALSE_VALUE);
+ gcc_assert (e13);
+ e24 = make_edge (bb2, bb4, EDGE_FALLTHRU);
+ gcc_assert (e24);
+ remove_edge (e23);
+
+ if (tmp_var)
+ {
+ gimple assign_stmt;
+ phinode = create_phi_node (tmp_var, bb4);
+ add_phi_arg (phinode, ssa_if_name, e24, UNKNOWN_LOCATION);
+ add_phi_arg (phinode, ssa_else_name, e34, UNKNOWN_LOCATION);
+ mark_symbols_for_renaming (phinode);
+ gcc_assert (lhs_result);
+ assign_stmt
+ = gimple_build_assign (lhs_result, gimple_phi_result (phinode));
+ mark_symbols_for_renaming (assign_stmt);
+ gsi = gsi_start_bb (bb4);
+ gsi_insert_before_without_update (&gsi, assign_stmt, GSI_SAME_STMT);
+ gimple_set_bb (assign_stmt, bb4);
+ }
+
+ if (dump_file)
+ {
+ fprintf (dump_file, "Converted __builtin_dispatch :\n");
+ dump_function_to_file (current_function_decl, dump_file, TDF_BLOCKS);
+ }
+
+ return 0;
+}
+
+/* This function does two things.
+
+ 1) For a feature-test function marked with attribute "version_selector",
+ it creates a constructor that calls the feature-test function and a
+ global that holds the result. The global's result will be used
+ to lower any __builtin_dispatch statement that refers to this feature
+ test function. The __builtin_dispatch statement and the feature test
+ function can be in different modules.
+
+ 2) It lowers __builtin_dispatch statements. */
+
+static unsigned int
+do_convert_builtin_dispatch (void)
+{
+ basic_block bb;
+ gimple_stmt_iterator gsi;
+ VEC (gimple, heap) *builtin_stmt_list = NULL;
+ int ix;
+ gimple builtin_stmt;
+
+ /* Allocate hashtab mapping name to decl. */
+ if (name_decl_htab == NULL)
+ name_decl_htab = htab_create (10, name_decl_htab_hash_descriptor,
+ name_decl_htab_eq_descriptor, NULL);
+
+ /* Look for functions with attribute "version_selector" and make a
+ constructor which calls the function and saves the result in a
+ global. Disabled for LTO as it needs more work. */
+
+ if (!flag_lto
+ && lookup_attribute ("version_selector",
+ DECL_ATTRIBUTES (current_function_decl)) != NULL)
+ {
+ if (dump_file)
+ fprintf (dump_file, "Function with version_selector attribute found :"
+ " %s. Making constructor for it.\n",
+ current_function_name ());
+
+ handle_version_selector_attr_function (current_function_decl);
+ /* Assume there are no __builtin_dispatch calls in feature test
+ functions. So it is safe to return. */
+ return 0;
+ }
+
+ /* Find and lower __builtin_dispatch calls. */
+
+ FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (current_function_decl))
+ {
+ for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ gimple stmt = gsi_stmt (gsi);
+ tree call_decl;
+
+ if (!is_gimple_call (stmt))
+ continue;
+
+ call_decl = gimple_call_fndecl (stmt);
+
+ if (call_decl == NULL)
+ continue;
+
+ if (DECL_NAME (call_decl) == NULL_TREE)
+ continue;
+
+ if (strstr (IDENTIFIER_POINTER (DECL_NAME (call_decl)),
+ "__builtin_dispatch") == NULL)
+ continue;
+
+ if (dump_file)
+ {
+ fprintf (dump_file, "Converting __builtin_dispatch stmt in:%s\n",
+ current_function_name ());
+ print_gimple_stmt (dump_file, stmt, 0, TDF_VOPS);
+ }
+
+ if (builtin_stmt_list == NULL)
+ builtin_stmt_list = VEC_alloc (gimple, heap, 2);
+
+ gcc_assert (builtin_stmt_list != NULL);
+ VEC_safe_push (gimple, heap, builtin_stmt_list, stmt);
+ }
+ }
+
+ if (!builtin_stmt_list)
+ return 0;
+
+ for (ix = 0; VEC_iterate (gimple, builtin_stmt_list, ix, builtin_stmt);
+ ++ix)
+ convert_builtin_dispatch (builtin_stmt);
+
+ compute_inline_parameters (cgraph_node (current_function_decl));
+
+ VEC_free (gimple, heap, builtin_stmt_list);
+
+ return 0;
+}
+
+static bool
+gate_convert_builtin_dispatch (void)
+{
+ return true;
+}
+
+struct gimple_opt_pass pass_tree_convert_builtin_dispatch =
+{
+ {
+ GIMPLE_PASS,
+ "convert_builtin_dispatch", /* name */
+ gate_convert_builtin_dispatch, /* gate */
+ do_convert_builtin_dispatch, /* execute */
+ NULL, /* sub */
+ NULL, /* next */
+ 0, /* static_pass_number */
+ TV_MVERSN_DISPATCH, /* tv_id */
+ PROP_cfg, /* properties_required */
+ PROP_cfg, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_dump_func | /* todo_flags_finish */
+ TODO_cleanup_cfg | TODO_dump_cgraph |
+ TODO_update_ssa | TODO_verify_ssa
+ }
+};
Index: timevar.def
===================================================================
--- timevar.def (revision 173381)
+++ timevar.def (working copy)
@@ -106,6 +106,7 @@ DEFTIMEVAR (TV_LEX , "lexical analysis")
DEFTIMEVAR (TV_PARSE , "parser")
DEFTIMEVAR (TV_NAME_LOOKUP , "name lookup")
DEFTIMEVAR (TV_INLINE_HEURISTICS , "inline heuristics")
+DEFTIMEVAR (TV_MVERSN_DISPATCH , "multiversion dispatch")
DEFTIMEVAR (TV_INTEGRATION , "integration")
DEFTIMEVAR (TV_TREE_GIMPLIFY , "tree gimplify")
DEFTIMEVAR (TV_TREE_EH , "tree eh")
Index: common.opt
===================================================================
--- common.opt (revision 173381)
+++ common.opt (working copy)
@@ -2205,6 +2205,10 @@ ftree-builtin-call-dce
Common Report Var(flag_tree_builtin_call_dce) Init(0) Optimization
Enable conditional dead code elimination for builtin calls
+fclone-hot-version-paths
+Common Report Var(flag_clone_hot_version_paths) Init(0)
+Enable cloning and hoisting of hot multiversioned paths
+
fwhole-program
Common Report Var(flag_whole_program) Init(0) Optimization
Perform whole program optimizations
Index: Makefile.in
===================================================================
--- Makefile.in (revision 173381)
+++ Makefile.in (working copy)
@@ -1304,6 +1304,7 @@ OBJS-common = \
mcf.o \
mode-switching.o \
modulo-sched.o \
+ mversn-dispatch.o \
omega.o \
omp-low.o \
optabs.o \
@@ -3100,6 +3101,11 @@ implicit-zee.o : implicit-zee.c $(CONFIG_H) $(SYST
$(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
$(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
$(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+mversn-dispatch.o : mversn-dispatch.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+ $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+ $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+ $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+ $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H)
gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
$(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
$(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: passes.c
===================================================================
--- passes.c (revision 173381)
+++ passes.c (working copy)
@@ -796,6 +796,15 @@ init_optimization_passes (void)
}
NEXT_PASS (pass_ipa_increase_alignment);
NEXT_PASS (pass_ipa_matrix_reorg);
+ NEXT_PASS (pass_ipa_multiversion_dispatch);
+ {
+ struct opt_pass **p = &pass_ipa_multiversion_dispatch.pass.sub;
+ NEXT_PASS (pass_tree_convert_builtin_dispatch);
+ /* Rebuilding cgraph edges is necessary as the above passes change
+ the call graph. Otherwise, future optimizations use the old
+ call graph and make wrong decisions sometimes.*/
+ NEXT_PASS (pass_rebuild_cgraph_edges);
+ }
NEXT_PASS (pass_ipa_lower_emutls);
*p = NULL;
Index: params.def
===================================================================
--- params.def (revision 173381)
+++ params.def (working copy)
@@ -950,6 +950,17 @@ DEFPARAM (CXX_MAX_NAMESPACES_FOR_DIAGNOSTIC_HELP,
"name lookup fails",
1000, 0, 0)
+DEFPARAM (PARAM_NUMBER_OF_MVERSN_CLONES,
+ "num-mversn-clones",
+ "maximum number of functions to be cloned while doing "
+ "multiversioning",
+ 10, 0, 1000)
+
+DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH,
+ "mversn-clone-depth",
+ "maximum length of the call graph path to be cloned "
+ "while doing multiversioning",
+ 2, 0, 5)
/*
Local variables:
mode:c