This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/31396] Inline code performance much worse than out-of-line



------- Comment #8 from ubizjak at gmail dot com  2007-04-04 09:21 -------
The difference is in the CALLER_SAVE_PROFITABLE condition. The pseudo that holds
sum is referenced 6 times. When only one foo() is called, the default
CALLER_SAVE_PROFITABLE condition causes the RA to allocate a call-clobbered
register (the fp and xmm registers are all call-clobbered on x86 targets). When
two calls to foo() are present, the default heuristic

#define CALLER_SAVE_PROFITABLE(REFS, CALLS)  (4 * (CALLS) < (REFS))

pushes the pseudo to memory, because 4*2 < 6 is false (with a single call,
4*1 < 6 holds). The RA does not take into account that the pseudo is used
inside the loop.

The default heuristic is _wrong_. When a pseudo is accessed inside a loop, a
call-clobbered register should be allocated, no matter how many calls it
crosses.

This can be confirmed by changing the "double" keyword to "int" in the example
from comment #7. gcc now chooses the ebx register (call-preserved), and the loop
compiles to the expected tight sequence:

test:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        subl    $4, %esp
        movl    data, %edx
        movl    (%edx), %eax
        leal    123(%eax), %ebx
        movl    $2, %eax
.L2:
        addl    -4(%edx,%eax,4), %ebx
        addl    $1, %eax
        cmpl    $5, %eax
        jne     .L2
        call    foo
        call    foo
        movl    %ebx, %eax
        addl    $4, %esp
        popl    %ebx
        popl    %ebp
        ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31396

