Function multiversioning question
Jonathan Wakely
jwakely.gcc@gmail.com
Thu Oct 25 12:11:00 GMT 2018
On Thu, 25 Oct 2018 at 12:46, Martin Reinecke
<martin@mpa-garching.mpg.de> wrote:
>
> Hi,
>
> I'm trying to use gcc's "target_clones" attribute for some functions in
> a performance-critical library. These functions use gcc builtins and
> choose between different sets (standard code, SSE2, AVX) depending on
> the predefined macros __SSE2__ and __AVX__.
> Unfortunately these macros apparently are not set by the compiler when
> it compiles for the individual targets.
>
> Consider the code below:
>
> #include <stdio.h>
>
> __attribute__((target_clones("avx","sse2","default")))
> void foo(void)
> {
> #if defined(__AVX__)
>     printf("AVX\n");
> #elif defined(__SSE2__)
>     printf("SSE2\n");
> #else
>     printf("nothing special\n");
> #endif
> }
>
> int main(void)
> {
>     foo();
>     return 0;
> }
>
> Compiling and running this on an AVX-capable CPU prints "SSE2", where I
> would have hoped to see "AVX".
Macros are defined during preprocessing, and the preprocessor doesn't
know anything about the target_clones attribute. When the compiler
sees the attribute it can't go back in time and alter the result of
earlier preprocessing.
> Is there a way to achieve what I have in mind?
If you want three different implementations of the function, I think
you need to write three separate functions, because target_clones
builds every clone from the same preprocessed function body. Or do
runtime checks for the CPU features inside the function, but that
seems suboptimal.