Bug 47895 - usage of __attribute__ ((__target__ ("xyz"))) with builtins
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.6.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on:
Reported: 2011-02-25 13:32 UTC by vincenzo Innocente
Modified: 2011-02-26 09:55 UTC (History)
CC List: 0 users

Description vincenzo Innocente 2011-02-25 13:32:51 UTC
I would like to generate code for multiple targets from the same source when using builtins.
I think that this issue has been discussed before for instance in

I have code like the example below that compiles only with -mavx.
In that case the compiler uses AVX instructions for all functions, including the one "targeted" for sse3,
whereas I would like to obtain an object file that I can run on multiple platforms.
This problem occurs only when builtins are used: standard C code is emitted correctly according to the target, provided that the minimal -m option is used.

Is there any preprocessor flag to "activate" all intrinsics and builtins in x86intrin.h?


#include <x86intrin.h>

float  __attribute__ ((__target__ ("sse3"))) sum3(float const * __restrict__ x, float const * __restrict__ y, float const * __restrict__ z) {
  __m128 sum = _mm_setzero_ps();
  for (int i=0; i!=1024; i+=4)
    sum  += _mm_add_ps(_mm_loadu_ps(z+i),
                       _mm_mul_ps(_mm_loadu_ps(x+i),_mm_loadu_ps(y+i)) );
  /* two horizontal adds leave the total in every element */
  sum = _mm_hadd_ps(sum,sum);
  sum = _mm_hadd_ps(sum,sum);
  float ret;
  _mm_store_ss(&ret,sum);
  return ret;
}

float  __attribute__ ((__target__ ("avx"))) sumv(float const * __restrict__ x, float const * __restrict__ y, float const * __restrict__ z) {
  __m256 sum = _mm256_setzero_ps();
  for (int i=0; i!=1024; i+=8)
    sum  += _mm256_add_ps(_mm256_loadu_ps(z+i),
                          _mm256_mul_ps(_mm256_loadu_ps(x+i),_mm256_loadu_ps(y+i)) );
  /* _mm256_hadd_ps adds within each 128-bit lane, so two passes leave
     one partial sum per lane (elements 0 and 4) */
  sum = _mm256_hadd_ps(sum,sum);
  sum = _mm256_hadd_ps(sum,sum);
  float ret[8];
  _mm256_storeu_ps(ret,sum);
  return ret[0]+ret[4];
}
Comment 1 Richard Biener 2011-02-25 14:44:27 UTC
A much easier and more portable way is to split your source into multiple
compilation units and compile each with the appropriate flags.
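
A rough sketch of what this split could look like is given here. It is an illustration under assumptions, not something taken from the report: the file names, the function names (sum_sse3, sum_avx, sum_generic, select_sum, sum) and the -m flags in the comments are all hypothetical, and the three files are shown as one listing so that it compiles as is. Plain C loops stand in for the intrinsics bodies. The run-time check uses __builtin_cpu_supports, which newer GCC releases provide on x86 but which may not exist in the 4.6.0 release this bug was filed against.

```c
/* Sketch only. Hypothetical files are shown in one listing; the file names,
   function names and -m flags in the comments are assumptions. */
#include <stddef.h>

/* ---- sum_sse3.c, built with something like: gcc -O2 -msse3 -c ---- */
/* Plain C stands in for the _mm_* intrinsics body. */
float sum_sse3(const float *x, const float *y, const float *z, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += z[i] + x[i] * y[i];
    return s;
}

/* ---- sum_avx.c, built with something like: gcc -O2 -mavx -c ---- */
/* Plain C stands in for the _mm256_* intrinsics body. */
float sum_avx(const float *x, const float *y, const float *z, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += z[i] + x[i] * y[i];
    return s;
}

/* ---- sum_generic.c, built with baseline flags ---- */
float sum_generic(const float *x, const float *y, const float *z, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += z[i] + x[i] * y[i];
    return s;
}

/* ---- dispatch.c, built with baseline flags; in a real split it would
   see the kernels above only through prototypes in a shared header ---- */
typedef float (*sum_fn)(const float *, const float *, const float *, size_t);

/* Pick the widest kernel that the running CPU reports support for. */
sum_fn select_sum(void) {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return sum_avx;
    if (__builtin_cpu_supports("sse3"))
        return sum_sse3;
    return sum_generic;
}

/* Single entry point for callers. */
float sum(const float *x, const float *y, const float *z, size_t n) {
    return select_sum()(x, y, z, n);
}
```

Only the dispatcher is compiled with baseline flags, so the object file for each kernel is the only place where the wider instructions can appear.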
Comment 2 vincenzo Innocente 2011-02-26 09:55:03 UTC
I find that the solution with multiple files shifts the problem to the build system, which is not necessarily easier in all projects, and makes maintenance more difficult, as more files need to be tracked for each algorithm.
I would much prefer a solution that is fully confined to the source code, without involving the configuration and build system.

In any case, at the moment there is a clear imbalance between plain C code, for which a single compilation unit with multiple functions for different "targets" does work, and code exploiting builtins, for which __attribute__ ((__target__ ("xyz"))) is not effective.
I consider this behavior a defect.
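
For contrast, the plain C case described above as working might look like the following sketch. The function names sum_c_sse3 and sum_c_avx are hypothetical, and the loop is a scalar stand-in for the computation in the original example. Both functions live in one compilation unit, and each carries its own target attribute instead of relying on a file-wide -m option.

```c
#include <stddef.h>

/* Plain C only, no intrinsics. Function names are hypothetical. */

/* The compiler may use instructions up to SSE3 in this function. */
__attribute__((__target__("sse3")))
float sum_c_sse3(const float *x, const float *y, const float *z, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += z[i] + x[i] * y[i];
    return s;
}

/* Same loop; here the compiler is allowed to emit AVX instructions. */
__attribute__((__target__("avx")))
float sum_c_avx(const float *x, const float *y, const float *z, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += z[i] + x[i] * y[i];
    return s;
}
```

A caller would still need to confirm that the running CPU supports AVX before calling sum_c_avx, since the function body may contain AVX instructions.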