This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [google] Patch to support calling multi-versioned functions via new GCC builtin. (issue4440078)


Here is the background for this feature:

1) People relies on function multi-version to explore hw features and
squeeze performance, but there is no standard ways of doing so, either
a) using indirect function calls with function pointers set at program
initialization; b) using manual dispatch at each callsite; b) using
features like IFUNC.  The dispatch mechanism needs to be promoted to
the language level and becomes the first class citizen;
2) Most importantly, the above non standard approaches block
interprocedural optimizations such as inlining. With the introduction
of buitlin_dispatch and the clear semantics, compiler can do more
aggressive optimization.

Multi-way dispatch will be good, but most of the use cases we have
seen is 2-way -- a fast path + a default path.

Who are the targeted consumer of the feature?

1) For developers who like to MV function manually;
2) For user directed function cloning

   e.g,
    int foo (...) __attribute__((clone ("sse")):

3) Automatic function cloning determined by compiler.


>
> I am not sure that combining the function choice and its invocation
> to a single builtin is good. ?First, you use variadic arguments for
> the actual function arguments which will cause vararg promotions
> to apply and thus it will be impossible to dispatch to functions
> taking single precision float values (and dependent on ABI details
> many more similar issues). ?Second, you restrict yourself to
> only two versions of a function - that looks quite limited and this
> restriction is not easy to lift with your proposed interface.

See above. Multi-way dispatch can be added similarly.


>
> That's a nice idea, but why not simply process all functions with
> a const attribute and no arguments this way? ?IMHO
>
> int testsse (void) __attribute__((const));
>
> is as good and avoids the new attribute completely. ?The pass would
> also benefit other uses of this idiom (giving a way to have global
> dynamic initializers in C). ?The functions may not be strictly 'const'
> in a way the compiler autodetects this attribute but it presents
> exactly the promises to the compiler that allow this optimization.
>
> Thus, please split this piece out into a separate patch and use
> the const attribute.


Sounds good.

>
>> ?What happens with cloning, -fclone-hot-version-paths ?
>> ?-----------------------------------------------------
>
> Now, here you lost me somewhat, because I didn't look into the
> patch details and I am missing an example on how the lowered
> IL looks before that cloning. ?So for now I suppose this
> -fclone-hot-version-paths
> is to expose direct calls to the IL. ?If you would lower __builtin_dispatch
> to a style like
>
> ?int sel = selector ();
> ?switch (sel)
> ? ? {
> ? ? case 0:
> ? ? ? fn = popcnt;
> ? ? ?break;
> ? ? case 1:
> ? ? ? fn = popcnt_sse4;
> ? ? ? break;
> ? ? }
> ? res = (*fn) (25);
>
> then rather than a new pass specialized for __builtin_dispatch existing
> optimization passes that are able to do tracing like VRP and DOM
> via jump-threading or the tracer pass should be able to do this
> optimization for you. ?If they do not use FDO in a good way it is better
> to improve them for this case instead of writing a new pass.

What you describe may not work

1) the function selection may happen in a different function;
2) Compiler will need to hoist the selection into the program
initializer to avoid overhead

As an example of why dispatch hoisting and call chain cloning is needed:

void foo();
void bar();

void doit_v1();
void doit_v2();
bool check_v () __attribute__((const));

int test();


void bar()
{
    ....
    for (.....)
     {
         foo();
         ....
     }
}

void foo()
{
   ...
   for (...)
   {
      __builtin_dispatch(check_v, doit_v1, doit_v2,...);
     ...
   }
}


int test ()
{
    ..
   bar ();
}


The feature testing and dispatch is embedded in a 2-deep loop nest
crossing function boundaries. The call paths test --> bar --> foo
needs to be cloned. This is done by hoisting dispatch up the call
chain -- it ends up :


void bar_v1()
{
   ....
   for (..)
    {
      foo_v1 ();
    }
 ..
}

void bar_v2 ()
{
    ...
    for (..)
    {
      foo_v2();
    }
   ..
}

void foo_v1 ()
{
   ..
   for ()
    {
      doit_v1()
    }
}

void foo_v2 ()
{
    ..
    for ()
    {
       doit_v2()
    }
}

int test ()
{
     __builtin_dispatch(check_v, bar_v1, bar_v2);
    ..

}



Thanks,

David



>
> Note that we already have special FDO support for indirect to direct
> call promotion, so that might work already.
>
> Thus, with all this, __builtin_dispatch would be more like syntactic
> sugar to avoid writing above switch statement yourself. ?If you restrict
> that sugar to a constant number of candidates it can be replaced by
> a macro (or several ones, specialized for the number of candidates).
>
> Richard.
>
>> ?For the call graph that looks like this before cloning :
>>
>> func_cold ----> func_hot ----> findOnes ----> IsPopCntSupported ----> popcnt
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?|
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-----> no_popcnt
>>
>> ?where popcnt and no_popcnt are the multi-versioned functions, then after cloning :
>>
>> func_cold ---->IsPopCntSupported ----> func_hot_clone0 ----> findOnes_clone0 ----> popcnt
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?|
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ----> func_hot_clone1 ----> findOnes_clone1 ----> no_popcnt
>>
>>
>>
>>
>> ?Flags : -fclone-hot-version-paths does function unswitching via cloning.
>> ? ? ? ? ?--param=num-mversn-clones=<num> allows to specify the number of
>> ? ? ? ? ?functions that should be cloned.
>> ? ? ? ? ?--param=mversn-clone-depth=<num> allows to specify the length of
>> ? ? ? ? ?the call graph path that should be cloned. ?num = 0 implies only
>> ? ? ? ? ?leaf node that contains the __builtin_dispatch statement must be
>> ? ? ? ? ?cloned.
>> ============================================================================================
>>
>> Google ref 45947, 45986
>>
>> 2011-04-28 ?Sriraman Tallam ?<tmsriram@google.com>
>>
>> ? ? ? ?* c-family/c-common.c ? (revision 173122) (handle_version_selector_attribute): New function.
>> ? ? ? ?(c_common_attribute_table): New attribute "version_selector".
>> ? ? ? ?* tree-pass.h ? (revision 173122) (pass_tree_convert_builtin_dispatch): New pass.
>> ? ? ? ?(pass_ipa_multiversion_dispatch): New pass.
>> ? ? ? ?* testsuite/gcc.dg/mversn7.c ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/gcc.dg/mversn4.c ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/gcc.dg/mversn4.h ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/gcc.dg/mversn4a.c ? (revision 0): New test case.
>> ? ? ? ?* testsuite/gcc.dg/torture/mversn1.c ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/gcc.dg/mversn2.c ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/gcc.dg/mversn6.c ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/gcc.dg/mversn3.c ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/mversn8.C ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/mversn10a.C ?(revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/mversn14a.C ?(revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/tree-prof/mversn13.C (revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/tree-prof/mversn15.C (revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/tree-prof/mversn15a.C ? ? ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/mversn9.C ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/mversn10.C ? (revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/mversn12.C ? (revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/mversn14.C ? (revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/mversn16.C ? (revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/torture/mversn11.C ? (revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/torture/mversn5.C ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/torture/mversn5.h ? ?(revision 0): New test case.
>> ? ? ? ?* testsuite/g++.dg/torture/mversn5a.C ? (revision 0): New test case.
>> ? ? ? ?* builtin-types.def ? ? (revision 173122) (BT_PTR_FN_INT): New pointer type.
>> ? ? ? ?(BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for __builtin_dispatch.
>> ? ? ? ?* builtins.def ?(revision 173122) (BUILT_IN_DISPATCH): New builtin to
>> ? ? ? ?support multi-version calls.
>> ? ? ? ?* mversn-dispatch.c ? ? (revision 0): New file.
>> ? ? ? ?* timevar.def ? (revision 173122) (TV_MVERSN_DISPATCH): New time var.
>> ? ? ? ?* common.opt ? ?(revision 173122) (fclone-hot-version-paths): New flag.
>> ? ? ? ?* Makefile.in ? (revision 173122) (mversn-dispatch.o): New rule.
>> ? ? ? ?* passes.c ? ? ?(revision 173122) (init_optimization_passes): Add the new
>> ? ? ? ?multi-version and dispatch passes to the pass list.
>> ? ? ? ?* params.def ? ?(revision 173122) (PARAM_NUMBER_OF_MVERSN_CLONES): Define.
>> ? ? ? ?(PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define.
>> ? ? ? ?* doc/invoke.texi ? ? ? (revision 173122) (mversn-clone-depth): Document.
>> ? ? ? ?(num-mversn-clones): Document.
>> ? ? ? ?(fclone-hot-version-paths): Document.
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]