This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: PATCH: __nodebug__ attribute for use on SSE intrinsic wrappers
On Jul 29, 2005, at 2:27 PM, Daniel Berlin wrote:
On Fri, 2005-07-29 at 12:57 -0700, Devang Patel wrote:
On Jul 29, 2005, at 12:52 PM, Daniel Berlin wrote:
But it's not a function!
It's vec_add.
We've already solved the problem for vec_add. Stuart is referring to
"_mm_add_epi8".
So go back and put s/vec_add/_mm_add_epi8/ in my emails :)
BTW, I should probably point out that Intel defines these as functions
too in their compiler (see icc 9.0, emmintrin.h, for example):
/*
* Integer intrinsics
*/
/****************************************************/
/* NAME : _mm_add_epi8 */
/* DESCRIPTION : Adds 16 unsigned or signed 8-bit */
/* integers in a to b */
/****************************************************/
_MM_INLINE_COMMAND static __m128i _mm_add_epi8(__m128i a, __m128i b)
{
    __m128i ret;
    int i;
    for (i=0; i<16; i++)
        ret.m128i.b[i] = a.m128i.b[i] + b.m128i.b[i];
    return ret;
}
If you think that "vec_add()" is the issue, you're right. However,
the folks that use these intrinsics don't think that way, and it's a
bit arrogant for GCC to force the issue as it does. Our users'
opinions should count for something...
If we followed every bad implementation choice someone else ever
made, we'd be in serious trouble.
We're already in trouble, due to our /own/ implementation choice.
Which, btw, is the same implementation choice taken by, for example,
Intel, who invented these things.
--Dan
No. The function you're referencing is a "compatibility hack,"
provided in case anyone wants to recompile a module containing SSE
instructions for a chip that doesn't have SSE hardware (e.g. a '486,
or an Itanium). I can imagine why Intel would supply such a routine;
GCC doesn't attempt to support SSE intrinsics for non-SSE hardware.
(And I hope GCC never does. :-)
In fact, Intel has thoughtfully provided /three/ implementations of
_mm_add_epi8(). One is the scalar version you've included above;
another is in "sse2mmx.h" (in my ICC 8.0 installation), and seems to
be targeting MMX hardware (e.g. a Pentium II). Note it replaces the
scalar loop above with two 64-bit MMX intrinsics:
__inline __m128i
_mm_add_epi8(__m128i a, __m128i b)
{
    __m128i x;
    x.v64[0] = _mm_add_pi8(a.v64[0], b.v64[0]);
    x.v64[1] = _mm_add_pi8(a.v64[1], b.v64[1]);
    return x;
}
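To make the relationship between the scalar loop and the two-half MMX version concrete, here is a portable sketch. It does not use Intel's headers: the type vec128 and the functions add_epi8_scalar, add_pi8_emulated, add_epi8_halves and halves_match_scalar are hypothetical names invented for illustration, and the 64-bit MMX add is only emulated with a byte loop. Under those assumptions it shows that splitting the 16 bytes into two groups of 8 gives the same wrapping result as the 16-iteration loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m128i: 16 independent byte lanes. */
typedef struct { uint8_t b[16]; } vec128;

/* Scalar fallback: one lane per iteration, wrapping modulo 256. */
vec128 add_epi8_scalar(vec128 a, vec128 b)
{
    vec128 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)(a.b[i] + b.b[i]);
    return r;
}

/* Emulates a 64-bit packed byte add (the role _mm_add_pi8 plays). */
void add_pi8_emulated(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}

/* Two-half version: low 8 lanes, then high 8 lanes. */
vec128 add_epi8_halves(vec128 a, vec128 b)
{
    vec128 r;
    add_pi8_emulated(&r.b[0], &a.b[0], &b.b[0]);
    add_pi8_emulated(&r.b[8], &a.b[8], &b.b[8]);
    return r;
}

/* Returns 1 when both versions produce identical bytes. */
int halves_match_scalar(void)
{
    vec128 a, b;
    for (int i = 0; i < 16; i++) {
        a.b[i] = (uint8_t)(250 + i);  /* some lanes overflow on purpose */
        b.b[i] = (uint8_t)(i * 3);
    }
    vec128 s = add_epi8_scalar(a, b);
    vec128 h = add_epi8_halves(a, b);
    assert(s.b[2] == 2);              /* 252 + 6 = 258, wraps to 2 */
    return memcmp(s.b, h.b, sizeof s.b) == 0;
}
```

Because each byte lane is independent, no carry crosses the boundary between the two halves, which is why the split is safe.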
The last implementation seems to exist only as an ICC builtin; the
only reference I can find for it is this line in emmintrin.h:
extern __m128i _mm_add_epi8(__m128i a, __m128i b);
For reference, here is GCC's implementation:
static __inline __m128i __attribute__((__always_inline__))
_mm_add_epi8 (__m128i __A, __m128i __B)
{
  return (__m128i)__builtin_ia32_paddb128 ((__v16qi)__A, (__v16qi)__B);
}
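As a rough illustration of how user code typically reaches such a wrapper, the sketch below adds two 16-byte buffers. The helper name add_16_bytes is hypothetical. It assumes the compiler defines __SSE2__ when SSE2 is enabled (as GCC does) and falls back to a plain loop otherwise; the loads and stores use the unaligned forms so the buffers need no special alignment.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for 16 bytes, wrapping modulo 256. */
void add_16_bytes(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16])
{
    assert(dst && a && b);
#if defined(__SSE2__)
    /* With SSE2 available, the always_inline wrapper should reduce to a PADDB. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_add_epi8(va, vb));
#else
    /* Scalar fallback for targets without SSE2. */
    for (int i = 0; i < 16; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
#endif
}
```

From the user's point of view the call to _mm_add_epi8() here is a single operation, not a function to step into, which is the perspective argued for in this thread.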
I tried using _mm_add_epi8() with ICC and this testcase:
#include <emmintrin.h>
__m128i x, y, z;
void trial()
{
  z = _mm_add_epi8 (x, y);
}
Below is what happened with and without -g on a Linux/x86 box. The
diagnostics refer to an Intel-supplied header file:
------------------------
stuart@citroen:~$ /opt/intel_cc_80/bin/icc -S trial.c -o trial.nodebug.s
/opt/intel_cc_80/include/xmmintrin.h(434): (col. 10)warning #963: no EMMS instruction before return
/opt/intel_cc_80/include/xmmintrin.h(419): (col. 10)warning #963: no EMMS instruction before return
/opt/intel_cc_80/include/xmmintrin.h(387): (col. 10)warning #963: no EMMS instruction before return
/opt/intel_cc_80/include/xmmintrin.h(368): (col. 10)warning #963: no EMMS instruction before return
stuart@citroen:~$ /opt/intel_cc_80/bin/icc -S -g trial.c -o trial.debug.s
/opt/intel_cc_80/include/xmmintrin.h(368): (col. 10)warning #963: no EMMS instruction before return
/opt/intel_cc_80/include/xmmintrin.h(387): (col. 10)warning #963: no EMMS instruction before return
/opt/intel_cc_80/include/xmmintrin.h(419): (col. 10)warning #964: no EMMS instruction before call
/opt/intel_cc_80/include/xmmintrin.h(434): (col. 10)warning #964: no EMMS instruction before call
/opt/intel_cc_80/include/xmmintrin.h(463): (col. 10)warning #963: no EMMS instruction before return
stuart@citroen:~$ wc trial*.s
1554 3381 39259 trial.debug.s
254 760 11067 trial.nodebug.s
1808 4141 50326 total
stuart@citroen:~$ grep add trial*.s
trial.debug.s: paddb %xmm1, %xmm0 #7.7
trial.nodebug.s: paddb y, %xmm0 #7.7
stuart@citroen:~$
------------------------
I haven't explored what happens if you're targeting a non-SSE CPU,
but when ICC compiles _mm_add_epi8() into a PADDB instruction (as GCC
does), there's /no/ debug info for the intrinsic function. BTW, ICC
for Linux/x86 is using DWARF.
stuart