This is the mail archive of the
mailing list for the GCC project.
Re: SSE and SSE2 intrinsics
> I'm having a bit of trouble working out how to get gcc-3.1 to compile code
> written using Intel's intrinsics. I'm using the snapshot of 2001-12-31,
> built under SuSE7.3, with the current release version of binutils;
> compiling -msse2 -march=pentium4 does work, does produce SSE2 instructions,
> and sadly produces (for every test case I've tried) executables which are
> 15% or so slower than compiling without any -m directives.
Can you send me the testcases?
I am not sure what you are shooting for. -msse2 -march=pentium4 just enables
the SSE2 builtins. Currently gcc attempts to use SSE instructions for moves of
floating point values, which, due to a register allocator misfeature, results
in lousy code. I do have a patch to avoid that behaviour, which can be viewed
as a bug.
To get some benefit, you need to either use the intrinsics, in which case the
code will not compile without those -m options, or use -mfpmath=sse to enable
the use of SSE instructions for floating point. That should improve the
performance of FP code, but not through parallelization.
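
As a rough illustration of the second option: ordinary scalar floating point
code like the sketch below needs no source changes, only different flags. The
function name dot and the build lines in the comment are assumptions made for
this example; the option is written as released GCC versions spell it,
-mfpmath=sse.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Plain scalar floating point code, no intrinsics involved.
 * Hypothetical build lines for comparing the two FP code generators
 * (file name and optimisation level are assumptions):
 *   gcc -O2 -march=pentium4 -mfpmath=387 dot.c    x87 register stack
 *   gcc -O2 -march=pentium4 -mfpmath=sse dot.c    scalar SSE/SSE2
 * Either way the loop runs one element at a time; nothing here asks
 * the compiler to process several values per instruction.
 */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Timing the same source built both ways is probably the simplest way to see
whether the SSE floating point path helps on a given test case.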
> I worked out I should do typedef int __m128i ((attribute V4SI)) to get the
> right type, but I couldn't find in the documentation any indication of what
> I should use in gcc where I'd use _mm_cmpgt_epi8 in VC++; indeed, even
> looking through i386.md I couldn't find an instruction-definition for the
> SSE2 extended-MMX compare instructions.
> I was somehow expecting a file that I could #include to get Intel intrinsics
> recognised, but I couldn't find one even by grepping through the source
> tree. If you tell me what pattern I should be following, writing such a file
> should be grunt-work given the MSVC help file listing what the intrinsics
> are supposed to do, and I'd be prepared to do that for the sake of getting
> SSE2 working nicely when I'm under Linux.
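
For readers trying this with a later compiler, here is a minimal sketch of the
typedef and the byte compare asked about above. It uses GCC's generic vector
extension (vector_size) rather than a mode attribute, so it assumes GCC 4.7 or
later, or Clang; it is not something the 2001-12-31 snapshot is known to
accept. The names v16qi, cmpgt_epi8 and count_greater are hypothetical. Later
GCC releases also ship an <emmintrin.h> header that declares _mm_cmpgt_epi8
itself.

```c
#include <assert.h>
#include <string.h>

/* Sixteen signed 8-bit lanes in one 128-bit vector (GCC/Clang extension). */
typedef signed char v16qi __attribute__((vector_size(16)));

/*
 * Lane-wise signed "greater than", the operation _mm_cmpgt_epi8 performs:
 * each result lane is all ones (-1) where a > b, and 0 elsewhere.
 */
static v16qi cmpgt_epi8(v16qi a, v16qi b)
{
    return (v16qi)(a > b);
}

/* Hypothetical helper: count the lanes where a[i] > b[i]. */
int count_greater(const signed char a[16], const signed char b[16])
{
    v16qi va, vb, mask;
    int count = 0;

    assert(a != NULL && b != NULL);
    memcpy(&va, a, sizeof va);   /* copy in, so no alignment is assumed */
    memcpy(&vb, b, sizeof vb);
    mask = cmpgt_epi8(va, vb);

    for (int i = 0; i < 16; i++)
        count += (mask[i] != 0);
    return count;
}
```

Whether the comparison is compiled to a single SSE2 compare instruction depends
on the target flags (for example -msse2); without them the compiler is free to
lower it to scalar code with the same result.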