This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/19680] sub-optimial register allocation with sse
- From: "tbptbp at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 30 Jan 2005 18:40:46 -0000
- Subject: [Bug rtl-optimization/19680] sub-optimial register allocation with sse
- References: <20050128233416.19680.tbptbp@gmail.com>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Additional Comments From tbptbp at gmail dot com 2005-01-30 18:40 -------
Yes, that's not a win per se, but even with those "unrolled" address computations,
icc's encodings generally end up tighter, i.e.:
gcc:
40114d: c1 e1 04 shl $0x4,%ecx
401150: 8d 41 30 lea 0x30(%ecx),%eax
...
40115a: 0f 58 0c 07 addps (%edi,%eax,1),%xmm1
...
40116f: 0f 58 04 0f addps (%edi,%ecx,1),%xmm0
icc:
4236e4: 03 ed add %ebp,%ebp
4236e6: 0f 28 64 ef 30 movaps 0x30(%edi,%ebp,8),%xmm4
4236eb: 0f 28 0c ef movaps (%edi,%ebp,8),%xmm1
A small win (and it's hard to follow, since they schedule things very differently and
gcc touches the stack a lot more), but it could be even better if shifting were
allowed. And in such a lengthy loop, decoding bandwidth is scarce. If only gcc weren't
so greedily trying to precompute indexes and offsets...
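To make the two listings easier to compare, here is a minimal C sketch of the address arithmetic each one performs. It assumes, as the 16-byte shift suggests, that the loop walks an array of 16-byte SSE vectors; the function names are hypothetical and only mirror the instructions quoted above. Both forms compute base + 16*i + 0x30. Since x86 addressing can only scale an index by 1, 2, 4 or 8, a 16-byte stride needs one extra step either way: gcc does it with a shl plus a separate lea, while icc doubles the index and folds the remaining *8 and the displacement into the memory operand.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each element is one SSE vector of 4 floats (16 bytes). */
static_assert(sizeof(float[4]) == 16, "one SSE vector is 16 bytes");

/* gcc-style (hypothetical name): shift the index once, then add the
   displacement with a separate lea before the memory access. */
uint32_t addr_shl_lea(uint32_t base, uint32_t i, uint32_t disp)
{
    uint32_t scaled = i << 4;        /* shl $0x4,%ecx       */
    uint32_t offset = scaled + disp; /* lea 0x30(%ecx),%eax */
    return base + offset;            /* (%edi,%eax,1)       */
}

/* icc-style (hypothetical name): the hardware scale tops out at 8,
   so double the index first and let the addressing mode apply *8
   and the displacement. */
uint32_t addr_add_scale8(uint32_t base, uint32_t i, uint32_t disp)
{
    uint32_t doubled = i + i;         /* add %ebp,%ebp     */
    return base + doubled * 8 + disp; /* 0x30(%edi,%ebp,8) */
}
```

The results are identical; the difference lies only in instruction count and encoding size, which is what matters for decode bandwidth in a long loop body.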
Could you tell me why gcc feels obliged to make a local copy of the hit_t
structure on the stack and then update both the copy and the original?
Ideally I'd like it not to try to outsmart me :) (or maybe I'm missing something
obvious).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680