This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] New attribute to create target clones


On Fri, Oct 9, 2015 at 9:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >Of course it also depends what you inline into function. You can have
>> >
>> >bar() target(-mavx) {fancy avx code}
>> >foobar() { ...... if (avx) bar();}
>> >foo() ctarget(-mavx,-mno-avx) {....foobar();....}

"no-" targets are not supported

>> >
>> >Now if you compile with -mavx and because ctarget takes effect only after inlining,
>> >at inlining time the target attributes will match and we can edn up inline bar->foobar->foo.
>> >After that we multiversion foo and drop AVX flag we will likely get ICE at expansion
>> >time.
>> But isn't that avoided by fixing up the call graph so that all calls
>> to the affected function are going through the dispatcher?  Or is
>> that happening too late?
>
> There is dispatcher only for foo that is the root of the callgarph tree.
> When inlining we compare target attributes for match (in can_inline_edge_p).
> We do not compare ctarget attributes.  Expanding ctarget to target early would
> avoid need for ctarget handling.
Currently inlining is disabled for functions with target_clone attribute:

if call from versioned function is call of multiversion function:

foo.avx2()
{
  ...
  bar();
  ...
}

bar dispatcher is better than inlining:

foo.avx2()
{
  ...
  bar.ifunc();
  ...
}

That is better from my point of view as:
1. multiversioning is going to be used mostly for target specific
optimizations like vectorization, scheduling... For these cases
inlininng not giving much. One of good examples is moving hot
vectorizable loop into the function and add target_clones attribute.
2. Inlining algorithm when we have not equal chains of architectures
in target_clones attribute are complex:
__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
__attribute__((target_clones("ssse3","arch=atom","arch=slm","default")))
3. If for one version inline occurs, but for another not we can get
unexpected (from user point) performance difference

Anyway I'll add the test you've mentioned to the patch.

The other big question is how all this deal with OpenMP clauses.
For now both target and target_clones attributes disable #pragma omp
effect inserting scalar ifunc call into loop.

Thanks,
Evgeny

>
>>
>> >
>> >>
>> >>
>> >>Since we're going through a dispatcher I don't think we're allowed
>> >>to change the ABI across the clones.
>> >
>> >If the dispatcher was something like
>> >
>> >switch(value)
>> >{
>> >   case 1: foo_ver1(param); break;
>> >   case 2: foo_ver2(param); break;
>> >}
>> >then it would be possible to change ABI if we can prove that param=0 in ipa-cp.
>> >If the dispatcher uses indirect calls, then we probably want to consider
>> >foo_ver* as having address taken and thus not local.
>> Yea, I guess that could work.  For some reason I kept thinking about
>> indirect calls, but that shouldn't be the case here.
>>
>> That argues further that setting TREE_PUBLIC on the target specific
>> implementations is wrong.
>
> Yep, if the function doesn't need to be public then the whole block of code
> tampering with visibility and other flags can go.  We probably also don't need
> to care about giving clone some fixed assembler name as clonning will invent
> .target_clone.XYZ name itself.
>
> Honza
>>
>>
>> Jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]