This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/19680] sub-optimial register allocation with sse
- From: "tbptbp at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 31 Jan 2005 20:18:59 -0000
- Subject: [Bug rtl-optimization/19680] sub-optimial register allocation with sse
- References: <20050128233416.19680.tbptbp@gmail.com>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Additional Comments From tbptbp at gmail dot com 2005-01-31 20:18 -------
-fno-gcse is a godsend: instant speedup, and most of the silliness when inlining
is gone.
Now I've applied both your patches, and while they're promising, they also
trigger their own nastiness: gcc is so fond of memory inputs that it dumps
values on the stack only to use them as memory operands a few instructions
later (well, that's my interpretation :).
For example,
4010c3: 0f 28 6c 13 30 movaps 0x30(%ebx,%edx,1),%xmm5
4010c8: 0f 28 f9 movaps %xmm1,%xmm7
4010cb: 0f 28 cb movaps %xmm3,%xmm1
4010ce: 0f 29 6c 24 10 movaps %xmm5,0x10(%esp)
4010d3: 0f 59 ce mulps %xmm6,%xmm1
4010d6: 0f 59 c4 mulps %xmm4,%xmm0
4010d9: 0f 28 6c 16 30 movaps 0x30(%esi,%edx,1),%xmm5
4010de: 0f 59 5c 24 10 mulps 0x10(%esp),%xmm3
or
40119d: 0f c2 c1 01 cmpltps %xmm1,%xmm0
4011a1: 0f 29 04 24 movaps %xmm0,(%esp)
4011a5: 0f 28 c5 movaps %xmm5,%xmm0
4011a8: 0f c2 c1 01 cmpltps %xmm1,%xmm0
4011ac: 0f 28 c8 movaps %xmm0,%xmm1
4011af: 0f 56 0c 24 orps (%esp),%xmm1
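The source behind these listings isn't shown, so the following is only a portable scalar model of what the second listing computes. It assumes AT&T operand order, where `cmpltps %xmm1,%xmm0` sets each 32-bit lane of %xmm0 to all-ones when %xmm0 < %xmm1 and to zero otherwise. All names (`v4mask`, `cmplt_ps`, `or_ps`, `either_below`) are hypothetical. The result is (xmm0 < xmm1) | (xmm5 < xmm1); the first mask is the value that gets spilled to (%esp) and then read back as the memory operand of `orps`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an SSE mask register: four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } v4mask;

/* Model of `cmpltps %xmm1,%xmm0` (AT&T order): a lane is all-ones when
   x < y, all-zeros otherwise (a NaN operand compares false). */
static v4mask cmplt_ps(const float x[4], const float y[4]) {
    v4mask m;
    for (int i = 0; i < 4; i++)
        m.lane[i] = (x[i] < y[i]) ? UINT32_C(0xFFFFFFFF) : 0u;
    return m;
}

/* Model of `orps`: bitwise OR of two masks, lane by lane. */
static v4mask or_ps(v4mask a, v4mask b) {
    v4mask r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] | b.lane[i];
    return r;
}

/* What the second listing computes: (a < lim) | (b < lim), where a, b and
   lim stand in for the incoming %xmm0, %xmm5 and %xmm1. The first mask is
   the one the compiler stored to (%esp) before computing the second. */
static v4mask either_below(const float a[4], const float b[4],
                           const float lim[4]) {
    return or_ps(cmplt_ps(a, lim), cmplt_ps(b, lim));
}
```

Only two masks are live at once here, so nothing in the computation itself forces the first mask out to memory.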
Other than those quirks, it looks better to me.
Just to be sure, I've tried the patched version on my app, and it's slower than
the unpatched version (both with -fno-gcse).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680