This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug inline-asm/29756] New: SSE intrinsics hard to use without redundant temporaries appearing
- From: "timday at bottlenose dot demon dot co dot uk" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 7 Nov 2006 22:22:56 -0000
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
I've been adapting a simple 4-float vector class from some old code to use SSE
via the intrinsic functions. It seems to be quite hard to keep the generated
assembly code from being diluted by apparently redundant spills of
intermediate results to the stack.
In the assembly produced from the file to be attached, compare the code
generated for matrix44f::transform_good and matrix44f::transform_bad.
The former is 20 instructions and apparently optimal. However, I only arrived
at it by prodding the latter version of the function (which does exactly the
same thing, expressed more naturally, but compiles to 32 instructions) until
the stack temporaries went away. It would be nice if both versions of the
function generated optimal code, and there doesn't seem to be any particular
reason they shouldn't.
The assembly for both versions contains the same, expected numbers of shuffle,
multiply and add instructions; the excess all seems to involve extra stack
temporaries.
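
For readers without the attachment, the contrast described above can be illustrated with a minimal sketch. This is not the attached code: the vec4f wrapper, the splat helper, the column-major layout and the names transform_natural and transform_flat are all hypothetical, chosen only to show a "natural" operator-based style next to a hand-flattened style that works on raw __m128 values. Both perform four shuffles, four multiplies and three adds.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_*_ps

// Hypothetical 4-float vector wrapper (an assumption, not the reporter's class).
struct vec4f {
    __m128 v;
    vec4f() : v(_mm_setzero_ps()) {}
    explicit vec4f(__m128 r) : v(r) {}
    vec4f(float x, float y, float z, float w) : v(_mm_setr_ps(x, y, z, w)) {}
    // Read one lane back out (unaligned store, so no alignment requirement).
    float operator[](int i) const {
        float t[4];
        _mm_storeu_ps(t, v);
        return t[i];
    }
};

inline vec4f operator*(const vec4f& a, const vec4f& b) {
    return vec4f(_mm_mul_ps(a.v, b.v));
}
inline vec4f operator+(const vec4f& a, const vec4f& b) {
    return vec4f(_mm_add_ps(a.v, b.v));
}

// Broadcast lane I of a into all four lanes (hypothetical helper).
template <int I>
inline vec4f splat(const vec4f& a) {
    return vec4f(_mm_shuffle_ps(a.v, a.v, _MM_SHUFFLE(I, I, I, I)));
}

// Hypothetical 4x4 matrix stored as four columns.
struct matrix44f {
    vec4f col[4];

    // "Natural" style: wrapper operators create class-typed temporaries.
    vec4f transform_natural(const vec4f& p) const {
        return col[0] * splat<0>(p) + col[1] * splat<1>(p)
             + col[2] * splat<2>(p) + col[3] * splat<3>(p);
    }

    // "Flattened" style: the same arithmetic, in the same order, on raw __m128.
    vec4f transform_flat(const vec4f& p) const {
        const __m128 x = _mm_shuffle_ps(p.v, p.v, _MM_SHUFFLE(0, 0, 0, 0));
        const __m128 y = _mm_shuffle_ps(p.v, p.v, _MM_SHUFFLE(1, 1, 1, 1));
        const __m128 z = _mm_shuffle_ps(p.v, p.v, _MM_SHUFFLE(2, 2, 2, 2));
        const __m128 w = _mm_shuffle_ps(p.v, p.v, _MM_SHUFFLE(3, 3, 3, 3));
        __m128 r = _mm_mul_ps(col[0].v, x);
        r = _mm_add_ps(r, _mm_mul_ps(col[1].v, y));
        r = _mm_add_ps(r, _mm_mul_ps(col[2].v, z));
        r = _mm_add_ps(r, _mm_mul_ps(col[3].v, w));
        return vec4f(r);
    }
};
```

Compiling a file like this with something along the lines of g++ -O2 -msse -S and comparing the two member functions is one way to look for the extra stack traffic; whether this particular sketch reproduces the 20 versus 32 instruction difference will depend on the compiler version and flags.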
[I'm not sure what the "triplet" codes on this form are.
I'm using the gcc in Debian Etch; gcc --version shows "gcc (GCC) 4.1.2 20060901
(prerelease) (Debian 4.1.1-13)", and the platform is a Pentium3. Sorry if
"inline-asm" is a completely inappropriate component to assign this to.]
--
Summary: SSE intrinsics hard to use without redundant temporaries appearing
Product: gcc
Version: 4.1.2
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: inline-asm
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: timday at bottlenose dot demon dot co dot uk
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756