[Bug rtl-optimization/19680] sub-optimial register allocation with sse

Sun Jan 30 18:41:00 GMT 2005

------- Additional Comments From tbptbp at gmail dot com  2005-01-30 18:40 -------
Yes, that's not a win per se, but even with those "unrolled" address
computations icc's encodings generally end up tighter, i.e.:
gcc:
  40114d:       c1 e1 04                shl    $0x4,%ecx
  401150:       8d 41 30                lea    0x30(%ecx),%eax
...
  40115a:       0f 58 0c 07             addps  (%edi,%eax,1),%xmm1
...
  40116f:       0f 58 04 0f             addps  (%edi,%ecx,1),%xmm0

icc:
  4236e4:       03 ed                   add    %ebp,%ebp
  4236e6:       0f 28 64 ef 30          movaps 0x30(%edi,%ebp,8),%xmm4
  4236eb:       0f 28 0c ef             movaps (%edi,%ebp,8),%xmm1

Small win (and it's hard to follow, as they schedule things very differently and
gcc touches the stack a lot more), but it could be even better if shifting was
allowed. And in such a lengthy loop, decoding bandwidth is scarce. If only gcc
weren't so greedily trying to precompute indexes and offsets...
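
To make the comparison concrete, here is a rough sketch in C of the two address
computations from the listings above. It assumes both compilers start from the
same element index and that %edi holds the base pointer; that is my reading of
the disassembly, not something the listings prove. The names addr_gcc,
addr_icc, check_equivalent, base, i and disp are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* gcc-style: scale the index up front, fold the displacement into a
 * second register with lea, then use a plain (base,index,1) operand.
 * All arithmetic is 32-bit to mimic ia32 registers. */
static uint32_t addr_gcc(uint32_t base, uint32_t i, uint32_t disp)
{
    uint32_t ecx = i << 4;       /* shl $0x4,%ecx                */
    uint32_t eax = ecx + disp;   /* lea disp(%ecx),%eax          */
    return base + eax;           /* addps (%edi,%eax,1),%xmmN    */
}

/* icc-style: only double the index, and let the addressing mode do the
 * remaining *8 scaling and the displacement. */
static uint32_t addr_icc(uint32_t base, uint32_t i, uint32_t disp)
{
    uint32_t ebp = i + i;        /* add %ebp,%ebp                */
    return base + ebp * 8 + disp; /* movaps disp(%edi,%ebp,8),%xmmN */
}

/* Checks that both forms agree for indexes 0..n-1 with the two
 * displacements seen in the listings (0 and 0x30).
 * Returns the number of comparisons made. */
static int check_equivalent(uint32_t base, uint32_t n)
{
    int checked = 0;
    for (uint32_t i = 0; i < n; i++) {
        assert(addr_gcc(base, i, 0x00) == addr_icc(base, i, 0x00));
        assert(addr_gcc(base, i, 0x30) == addr_icc(base, i, 0x30));
        checked += 2;
    }
    return checked;
}
```

Both forms produce the same address; the difference is only whether the scaling
and the 0x30 offset live in separate instructions or inside the memory operand.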

Could you tell me why gcc feels obliged to make a local copy of the hit_t
structure on the stack and then update both the copy and the original?
Ideally I'd like it not to try to outsmart me :) (or maybe I'm missing something
obvious).

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680