This is the mail archive of the
mailing list for the GCC project.
Re: SSE and SSE2 intrinsics
> > I'm having a bit of trouble working out how to get gcc-3.1 to compile code
> > written using Intel's intrinsics. I'm using the snapshot of 2001-12-31,
> > built under SuSE7.3, with the current release version of binutils;
> > compiling with -msse2 -march=pentium4 does work, does produce SSE2 code,
> > and sadly produces (for every test case I've tried) executables which are
> > 15% or so slower than compiling without any -m directives.
> Can you send me the testcases?
I'm afraid they're "I tried compiling these bits of the huge source
distribution of the FSL MRI image analyser with -march=pentium4 and
without"; I haven't produced small examples, though I'll have a go at that
this evening if you want.
The behaviour I noticed from comparing gprof -l output was values being
transferred from FP to SSE registers just to use the CVTTSD2SI instruction, and
[unsigned char a,b,e; int c,d]
a = (a<b ? a:b);
if (c<d) e=(b>e?b:e);
being compiled with byte-sized registers without -march=pentium4 and with
dword-sized ones otherwise, and suddenly becoming the hottest spot in the
code. In both cases it was compiled in conditional-jump-over-one-instruction
style, where I had half expected cmov.
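
A reduced test case for the byte min/max pattern might look something like
the sketch below. The struct and function names are hypothetical and not
taken from the FSL sources; only the two statements in the body come from
the fragment above.

```c
/* Hypothetical reduced test case: names are illustrative, not from FSL. */
typedef struct {
    unsigned char a;
    unsigned char e;
} byte_pair;

/* Mirrors the two statements quoted in the message. */
byte_pair minmax_step(unsigned char a, unsigned char b, unsigned char e,
                      int c, int d)
{
    byte_pair r;
    r.a = (a < b ? a : b);        /* unsigned byte minimum */
    r.e = e;
    if (c < d)
        r.e = (b > e ? b : e);    /* unsigned byte maximum, only when c < d */
    return r;
}
```

Compiling this with -O2 -S, once with and once without -march=pentium4,
and comparing the two assembly listings should show whether byte or dword
registers are used and whether a cmov is emitted.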
> I am not sure what you are shooting for; -msse2 -march=pentium4 just
> enables the presence of SSE2 builtins.
Oh. I had expected -march=pentium4 to do what -mfp-math=sse does -- at
least, that's the behaviour I saw on the Intel compiler. I've collected the
20020204 snapshot, and will comment more in a couple of days when I've had
some time to play -- my P4 system is at home and my only Net connection is at
college, so I carry snapshots back and forth on my Windows laptop.
Err, does -mfp-math=sse also use SSE2 for DF-mode operations, or do I need
to set it to sse2 for those? And is it documented anywhere? -- Google shows
no uses of the word on the Web.
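
For reference, released GCC spells this option -mfpmath=sse. Its
documentation says that double-precision arithmetic only goes through the
SSE unit when SSE2 is also enabled (for example with -msse2 or
-march=pentium4), and that with plain SSE doubles still use the 387. A small
DF-mode function such as the sketch below (the function name is
hypothetical) is one way to check what a given snapshot does:

```c
/* Hypothetical probe for DF-mode code generation.
 * Assumption: with SSE2 scalar math enabled, the multiply can become MULSD
 * and the truncating cast can become CVTTSD2SI; otherwise x87 code is
 * expected. Inspect the -S output rather than trusting this comment. */
int scale_to_int(double x, double factor)
{
    return (int)(x * factor);   /* C casts truncate toward zero */
}
```

The result is the same either way; only the instructions chosen differ.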
> To get some benefits, you need to either use the intrinsics, and then the
> code would not compile w/o those -m options, or use -mfp-math=sse
> to enable use of SSE instructions for floating point, which should improve
> performance of FP code, but not due to use of parallelization.
Indeed; I already have some code which relies on the ICC intrinsics, which
I'd rather like to be able to compile with the normal gcc tools.
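
As an illustration of the kind of intrinsics code in question, here is a
minimal sketch. It assumes a compiler that ships <emmintrin.h> and defines
__SSE2__ when SSE2 is enabled; the helper name add_doubles is hypothetical,
and the scalar loop is there so the file still builds without -msse2.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */
#endif

/* Hypothetical helper: out[i] = x[i] + y[i] for n doubles. */
void add_doubles(const double *x, const double *y, double *out, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    /* Two doubles per iteration; unaligned loads and stores, so no
     * alignment requirement is placed on the caller. */
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);
        __m128d vy = _mm_loadu_pd(y + i);
        _mm_storeu_pd(out + i, _mm_add_pd(vx, vy));
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is not enabled. */
    for (; i < n; i++)
        out[i] = x[i] + y[i];
}
```

The packed path is where any gain from parallel execution would come from,
as opposed to the scalar SSE code that the FP-math option selects.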