[Bug rtl-optimization/19680] sub-optimial register allocation with sse
tbptbp at gmail dot com
gcc-bugzilla@gcc.gnu.org
Sun Jan 30 18:41:00 GMT 2005
------- Additional Comments From tbptbp at gmail dot com 2005-01-30 18:40 -------
Yes, that's not a win per se, but even with those "unrolled" address computations, icc's
encodings generally end up tighter, i.e.:
gcc:
40114d: c1 e1 04 shl $0x4,%ecx
401150: 8d 41 30 lea 0x30(%ecx),%eax
...
40115a: 0f 58 0c 07 addps (%edi,%eax,1),%xmm1
...
40116f: 0f 58 04 0f addps (%edi,%ecx,1),%xmm0
icc:
4236e4: 03 ed add %ebp,%ebp
4236e6: 0f 28 64 ef 30 movaps 0x30(%edi,%ebp,8),%xmm4
4236eb: 0f 28 0c ef movaps (%edi,%ebp,8),%xmm1
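To show that the two listings reach the same memory, here is a minimal C sketch that replays each instruction sequence as plain integer arithmetic. It assumes that gcc's %ecx and icc's %ebp hold the same element index on entry and that %edi holds the same array base in both. The listings elide those setup instructions, so this pairing is an assumption. uint32_t is used to mimic the modulo-2^32 address arithmetic of 32-bit code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: gcc's %ecx and icc's %ebp both start as element index i,
   and %edi is the same base pointer in both listings. */

/* gcc: shl $0x4,%ecx ; lea 0x30(%ecx),%eax ; addps (%edi,%eax,1) */
uint32_t gcc_addr_hi(uint32_t edi, uint32_t i) {
    uint32_t ecx = i << 4;      /* shl $0x4,%ecx       */
    uint32_t eax = ecx + 0x30;  /* lea 0x30(%ecx),%eax */
    return edi + eax * 1;       /* (%edi,%eax,1)       */
}

/* gcc: addps (%edi,%ecx,1) */
uint32_t gcc_addr_lo(uint32_t edi, uint32_t i) {
    uint32_t ecx = i << 4;      /* same shifted index  */
    return edi + ecx * 1;       /* (%edi,%ecx,1)       */
}

/* icc: add %ebp,%ebp ; movaps 0x30(%edi,%ebp,8) */
uint32_t icc_addr_hi(uint32_t edi, uint32_t i) {
    uint32_t ebp = i + i;           /* add %ebp,%ebp     */
    return edi + ebp * 8 + 0x30;    /* 0x30(%edi,%ebp,8) */
}

/* icc: movaps (%edi,%ebp,8) */
uint32_t icc_addr_lo(uint32_t edi, uint32_t i) {
    uint32_t ebp = i + i;           /* doubled index     */
    return edi + ebp * 8;           /* (%edi,%ebp,8)     */
}

/* Checks that both schemes compute identical addresses for i in [0, n). */
void check_range(uint32_t edi, uint32_t n) {
    for (uint32_t i = 0; i < n; i++) {
        assert(gcc_addr_hi(edi, i) == icc_addr_hi(edi, i));
        assert(gcc_addr_lo(edi, i) == icc_addr_lo(edi, i));
    }
}
```

Both compute base + 16*i (+ 0x30). gcc folds the whole multiply into a shift and a separate lea, while icc doubles the index once and lets the SIB scale of 8 plus the disp8 do the rest inside the memory operand.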
Small win (and it's hard to follow, as they schedule things very differently and
gcc touches the stack a lot more), but it could be even better if shifting were
allowed. And in such a lengthy loop, decoding bandwidth is scarce. If only gcc weren't
so greedily trying to precompute indexes and offsets...
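One possible reading of the shifting remark, assumed here rather than stated in the comment: the x86 SIB byte can only scale an index by 1, 2, 4 or 8, so a 16-byte stride (one SSE vector) cannot be encoded directly and the index must be pre-scaled once. That is what icc's `add %ebp,%ebp` does, whereas gcc's `shl $0x4` pre-scales the full 16 and uses scale 1. The helper `split_stride` and the type `addr_split` below are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: how an element stride is split between an explicit
   pre-multiply of the index and the SIB scale field (2 bits: 1/2/4/8). */
typedef struct {
    uint32_t prescale; /* applied to the index beforehand (shl, add, lea) */
    uint32_t scale;    /* encoded in the SIB byte: 1, 2, 4 or 8 */
} addr_split;

/* Hypothetical helper: use the largest encodable SIB scale and leave the
   remainder of the stride to an explicit pre-multiply. */
addr_split split_stride(uint32_t stride) {
    /* This sketch only handles nonzero power-of-two strides. */
    assert(stride != 0 && (stride & (stride - 1)) == 0);
    addr_split s;
    s.scale = stride < 8 ? stride : 8;
    s.prescale = stride / s.scale;
    return s;
}
```

For a 16-byte stride this yields prescale 2 and scale 8, matching icc's doubling followed by `(%edi,%ebp,8)`.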
Could you tell me why gcc feels obliged to make a local copy of the hit_t
structure on the stack and then update both the copy and the original?
Ideally I'd like it not to try to outsmart me :) (or maybe I'm missing something
obvious).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680