This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] New attribute to create target clones
- From: Evgeny Stupachenko <evstupac at gmail dot com>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: Jeff Law <law at redhat dot com>, Bernd Schmidt <bschmidt at redhat dot com>, Bernd Schmidt <bernds_cb1 at t-online dot de>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 9 Oct 2015 22:57:29 +0300
- Subject: Re: [PATCH] New attribute to create target clones
- Authentication-results: sourceware.org; auth=none
- References: <56000538 dot 5000800 at t-online dot de> <5601AEDB dot 4000100 at redhat dot com> <5601C375 dot 5050706 at redhat dot com> <CAOvf_xwUK1k=zx=f7eAWs4xs2J4KBCZGzeom_a5SfdkfF_gqSQ at mail dot gmail dot com> <CAOvf_xy6Kq_AGArurgxYC65K=LMNt6_gXHK15VFsoCd64F+TTA at mail dot gmail dot com> <5616BD37 dot 2000307 at redhat dot com> <20151008192349 dot GB90964 at kam dot mff dot cuni dot cz> <5616C9CA dot 2010702 at redhat dot com> <20151008213644 dot GD5527 at kam dot mff dot cuni dot cz> <5617FD1D dot 9010406 at redhat dot com> <20151009182753 dot GA7750 at kam dot mff dot cuni dot cz>
On Fri, Oct 9, 2015 at 9:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >Of course it also depends what you inline into function. You can have
>> >
>> >bar() target(-mavx) {fancy avx code}
>> >foobar() { ...... if (avx) bar();}
>> >foo() ctarget(-mavx,-mno-avx) {....foobar();....}
"no-" targets are not supported
>> >
>> >Now if you compile with -mavx and because ctarget takes effect only after inlining,
>> >at inlining time the target attributes will match and we can edn up inline bar->foobar->foo.
>> >After that we multiversion foo and drop AVX flag we will likely get ICE at expansion
>> >time.
>> But isn't that avoided by fixing up the call graph so that all calls
>> to the affected function are going through the dispatcher? Or is
>> that happening too late?
>
> There is dispatcher only for foo that is the root of the callgarph tree.
> When inlining we compare target attributes for match (in can_inline_edge_p).
> We do not compare ctarget attributes. Expanding ctarget to target early would
> avoid need for ctarget handling.
Currently inlining is disabled for functions with target_clone attribute:
if call from versioned function is call of multiversion function:
foo.avx2()
{
...
bar();
...
}
bar dispatcher is better than inlining:
foo.avx2()
{
...
bar.ifunc();
...
}
That is better from my point of view as:
1. multiversioning is going to be used mostly for target specific
optimizations like vectorization, scheduling... For these cases
inlininng not giving much. One of good examples is moving hot
vectorizable loop into the function and add target_clones attribute.
2. Inlining algorithm when we have not equal chains of architectures
in target_clones attribute are complex:
__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
__attribute__((target_clones("ssse3","arch=atom","arch=slm","default")))
3. If for one version inline occurs, but for another not we can get
unexpected (from user point) performance difference
Anyway I'll add the test you've mentioned to the patch.
The other big question is how all this deal with OpenMP clauses.
For now both target and target_clones attributes disable #pragma omp
effect inserting scalar ifunc call into loop.
Thanks,
Evgeny
>
>>
>> >
>> >>
>> >>
>> >>Since we're going through a dispatcher I don't think we're allowed
>> >>to change the ABI across the clones.
>> >
>> >If the dispatcher was something like
>> >
>> >switch(value)
>> >{
>> > case 1: foo_ver1(param); break;
>> > case 2: foo_ver2(param); break;
>> >}
>> >then it would be possible to change ABI if we can prove that param=0 in ipa-cp.
>> >If the dispatcher uses indirect calls, then we probably want to consider
>> >foo_ver* as having address taken and thus not local.
>> Yea, I guess that could work. For some reason I kept thinking about
>> indirect calls, but that shouldn't be the case here.
>>
>> That argues further that setting TREE_PUBLIC on the target specific
>> implementations is wrong.
>
> Yep, if the function doesn't need to be public then the whole block of code
> tampering with visibility and other flags can go. We probably also don't need
> to care about giving clone some fixed assembler name as clonning will invent
> .target_clone.XYZ name itself.
>
> Honza
>>
>>
>> Jeff