This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/31396] Inline code performance much worse than out-of-line
- From: "ubizjak at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 4 Apr 2007 08:21:01 -0000
- Subject: [Bug rtl-optimization/31396] Inline code performance much worse than out-of-line
- References: <bug-31396-14334@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #8 from ubizjak at gmail dot com 2007-04-04 09:21 -------
The difference is in the CALLER_SAVE_PROFITABLE condition. The pseudo that holds
sum is referenced 6 times. When only one foo() is called, the default
CALLER_SAVE_PROFITABLE condition causes the RA to allocate a call-clobbered register
(fp and xmm regs are all call-clobbered for x86 targets). When two calls to
foo() are present, the default heuristic
#define CALLER_SAVE_PROFITABLE(REFS, CALLS) (4 * (CALLS) < (REFS))
spills the pseudo to memory, as the RA does not consider the fact that the pseudo
is used inside the loop.
The default heuristic is _wrong_. When a pseudo is accessed inside a loop, a
call-clobbered register should be allocated, no matter how many calls it
crosses.
This can be confirmed by changing the "double" keyword to "int" in the example of
comment #7. gcc now chooses the ebx register (call-preserved) and the loop compiles
to the expected tight sequence:
test:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $4, %esp
movl data, %edx
movl (%edx), %eax
leal 123(%eax), %ebx
movl $2, %eax
.L2:
addl -4(%edx,%eax,4), %ebx
addl $1, %eax
cmpl $5, %eax
jne .L2
call foo
call foo
movl %ebx, %eax
addl $4, %esp
popl %ebx
popl %ebp
ret
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31396