This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics


The implementation of the SSE intrinsics for x86 and x86-64 in gcc is tied to
the use of an appropriate -m option, such as -mssse3 or -mavx.  This is
different from what icc does, and it prevents code from being written in the
most natural form.  This is nothing new in gcc 4.4; it has been the behavior of
gcc forever, as far as I can see.  But the introduction of AVX in particular
brings this problem to the foreground.

As an example, assume I want to write a vector class with the usual operations.
 I can write code like this:

#ifdef __AVX__
vec<float,N> operator+(vec<float,N> &a, vec<float,N> &b) {
  ... use AVX intrinsics ...
}
#elif defined __SSE4_1__
vec<float,N> operator+(vec<float,N> &a, vec<float,N> &b) {
  ... use SSE4 intrinsics ...
}
#elif defined __SSE2__
vec<float,N> operator+(vec<float,N> &a, vec<float,N> &b) {
  ... use SSE2 intrinsics ...
}
#else
vec<float,N> operator+(vec<float,N> &a, vec<float,N> &b) {
  ... generic implementation ...
}
#endif
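
To make the branches concrete, a single variant might look roughly like the
following (a minimal sketch; the vec4f wrapper, its fixed width of four floats,
and the member name d are made up purely for illustration and are not part of
the original example):

#include <emmintrin.h>   // SSE2 intrinsics (pulls in the SSE header as well)

// Hypothetical 4-wide float vector, only to show what an SSE2 branch could do.
struct vec4f {
  __m128 d;
};

// SSE2 variant of operator+, selected at compile time by the #ifdef chain above.
static inline vec4f operator+(const vec4f &a, const vec4f &b) {
  vec4f r;
  r.d = _mm_add_ps(a.d, b.d);   // packed single-precision add
  return r;
}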

But this means, of course, that the binary has to be compiled for every single
target and the correct one has to be chosen.  This is neither attractive nor
practical.  Chances are that only a generic implementation will be available.

It would be better to have a self-optimizing implementation:

vec<float,N> operator+(vec<float,N> &a, vec<float,N> &b) {
  if (AVX is available)
    ... use AVX intrinsics ...
  else if (SSE4 is available)
    ... use SSE4 intrinsics ...
  else if (SSE2 is available)
    ... use SSE2 intrinsics ...
  else
    ... generic implementation ...
}
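
Written out with real intrinsics, the kind of function I want to write looks
roughly like this (a sketch using a plain array add instead of the vec class so
that it is self-contained; have_avx() and have_sse2() stand for some runtime
CPU detection via cpuid and are placeholders, not an existing API):

#include <immintrin.h>   // one header for the AVX, SSE4, SSE2, ... intrinsics
#include <cstddef>

bool have_avx();    // placeholder runtime checks (e.g. implemented with cpuid)
bool have_sse2();

// Desired form: one translation unit, compiled for a baseline ISA, selecting
// the best instruction set at run time.
void add_arrays(float *dst, const float *a, const float *b, std::size_t n) {
  std::size_t i = 0;
  if (have_avx()) {
    for (; i + 8 <= n; i += 8)        // 8 floats per AVX iteration
      _mm256_storeu_ps(dst + i,
                       _mm256_add_ps(_mm256_loadu_ps(a + i),
                                     _mm256_loadu_ps(b + i)));
  } else if (have_sse2()) {
    for (; i + 4 <= n; i += 4)        // 4 floats per SSE2 iteration
      _mm_storeu_ps(dst + i,
                    _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
  }
  for (; i < n; ++i)                  // scalar tail / generic fallback
    dst[i] = a[i] + b[i];
}

This is exactly what gcc currently rejects: the _mm256_* calls do not compile
unless the whole file is built with -mavx.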

This is possible with icc.  It is not possible with gcc at the moment.  With
gcc I would have to split the implementations of all the variants into
individual files and then call these implementations from the template function
shown above.  Even if, as in this case, that might be doable (though terribly
inconvenient), there are situations where it is really impractical or
impossible.
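
The split-file workaround with current gcc would look roughly like this (a
sketch, not a recommendation): each ISA-specific variant goes into its own file
compiled with the matching -m flag, and the dispatching function only reaches
it through an extern declaration.

// add_avx.cc -- must be compiled separately, with -mavx
#include <immintrin.h>
#include <cstddef>

void add_arrays_avx(float *dst, const float *a, const float *b, std::size_t n) {
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8)
    _mm256_storeu_ps(dst + i,
                     _mm256_add_ps(_mm256_loadu_ps(a + i),
                                   _mm256_loadu_ps(b + i)));
  for (; i < n; ++i)
    dst[i] = a[i] + b[i];
}

// add.cc -- compiled for the baseline ISA; contains only the dispatch
#include <cstddef>

void add_arrays_avx(float *, const float *, const float *, std::size_t);
void add_arrays_generic(float *, const float *, const float *, std::size_t);
bool have_avx();   // placeholder runtime check, as above

void add_arrays(float *dst, const float *a, const float *b, std::size_t n) {
  if (have_avx())
    add_arrays_avx(dst, a, b, n);       // lives in the file built with -mavx
  else
    add_arrays_generic(dst, a, b, n);   // generic fallback, defined elsewhere
}

For a single free function this is merely tedious; for a templated operator in
a header-only vector class it does not scale.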


The problem is that to be able to use the AVX intrinsics the compiler has to be
passed -mavx (all other extensions are implied by -mavx).   But this flag has
another consequence: the compiler will now take advantage of the new AVX
instructions and emit them for unrelated code not associated with intrinsics
(e.g., an inlined memset implementation).  The result is that such a binary
will fail to run on anything but an AVX-enabled machine.
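
A trivial illustration of the "unrelated code" problem: the following function
contains no intrinsics at all, yet once the file is built with -mavx gcc is
free to use AVX instructions for it (for example when inlining or vectorizing
the memset), so the binary can die with an illegal-instruction fault on a
pre-AVX CPU even if the AVX intrinsics elsewhere are never reached at run time.

#include <cstring>
#include <cstddef>

// Plain, portable C++ -- no intrinsics anywhere.
void clear_buffer(float *buf, std::size_t n) {
  std::memset(buf, 0, n * sizeof(float));   // may be expanded with AVX stores under -mavx
}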


In icc the -mavx flag exclusively controls code generation (i.e., whether AVX
is used for inlined memset etc.).  The SSE intrinsics and all the associated
data types are _always_ defined as soon as <immintrin.h> is included.
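
Schematically, the icc model described here means that merely including the
header makes the declarations available, while the -m flags only govern what
the compiler emits for ordinary code (a sketch of the desired semantics, not of
anything gcc currently accepts):

#include <immintrin.h>   // under the icc model: always declares __m256, _mm256_add_ps, ...

// Compiled with a baseline such as -msse2: the compiler emits no AVX for
// ordinary code, but the programmer may still call the AVX intrinsics behind
// an explicit runtime check, taking responsibility for where they execute.
void add8(float *dst, const float *a, const float *b) {
  _mm256_storeu_ps(dst, _mm256_add_ps(_mm256_loadu_ps(a),
                                      _mm256_loadu_ps(b)));
}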


This means the example code above could be compiled with an -m parameter for
the minimum ISA to be supported, and the AVX, SSE4, ... intrinsics would still
be available.


gcc should follow icc's way of handling the intrinsics.  Since all this
intrinsic business comes from icc, I consider this a bug in gcc's
implementation rather than an enhancement request.


-- 
           Summary: Non-optimal (or wrong) implementation of SSE intrinsics
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: drepper at redhat dot com
GCC target triplet: i?86-* x86_64-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840

