[Bug target/45937] New: unnecessary push/pop to reserve stack memory

carrot at google dot com gcc-bugzilla@gcc.gnu.org
Fri Oct 8 03:48:00 GMT 2010


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45937

           Summary: unnecessary push/pop to reserve stack memory
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: carrot@google.com
                CC: carrot@google.com
              Host: i686-linux
            Target: arm-eabi
             Build: i686-linux


Created attachment 21995
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=21995
test case

When the attached source code is compiled with options -march=armv7-a -mthumb
-Os, gcc generates:


tt:
        push    {r4, r5, r6, r7, lr}
        sub     sp, sp, #20
        mov     r5, r2
        ldr     r4, [sp, #40]
        cbz     r4, .L1
        movs    r2, #20
        muls    r2, r1, r2
        adds    r6, r3, r2
        ldr     r3, [r3, r2]
        cbz     r3, .L1
        ldr     lr, [r6, #8]
        ldr     r7, [r6, #12]
        ldr     r3, [r6, #16]
        ldr     r2, [r6, #4]
        ldr     r6, .L5
        str     lr, [sp, #0]
        cmp     r3, #0
        it      eq
        moveq   r3, r6
        str     r7, [sp, #4]
        str     r3, [sp, #8]
        mov     r3, r5
        blx     r4
.L1:
        add     sp, sp, #20
        pop     {r4, r5, r6, r7, pc}


Notice that this function uses only 12 bytes of stack memory to pass
parameters, but it allocates 20 bytes, and the other 8 bytes are never used.
So the function prologue and epilogue can be rewritten as follows, which saves
2 instructions. The three extra registers in the push reserve the 12 bytes.

tt:
        push    {r1, r2, r3, r4, r5, r6, r7, lr}
        ...
        pop     {r1, r2, r3, r4, r5, r6, r7, pc}


The root cause of this problem is that stack memory is allocated and aligned
separately for the outgoing arguments and for the callee-saved registers. In
function expand_call(), 12 bytes are needed and 16 bytes are allocated to
align to 8 bytes. In function arm_get_frame_offsets(), 20 bytes are needed and
24 bytes are allocated to save registers. So this function needs 40 bytes of
stack, which exceeds the capability of push/pop, and extra sub/add
instructions are needed to adjust sp. In fact the function uses only 32 bytes
of stack and no data element is 8-byte aligned, so a simple push/pop should be
enough.
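
As a rough check of the arithmetic above, here is a small sketch in Python. It
is not GCC code: the names align_up, outgoing_args and saved_regs are made up
for illustration, and the 8-byte boundary is simply the alignment mentioned in
this report. It compares padding each area separately with padding the two
areas once, together.

```python
WORD = 4  # bytes per ARM core register


def align_up(n, boundary=8):
    """Round n up to the next multiple of boundary."""
    return (n + boundary - 1) // boundary * boundary


outgoing_args = 12        # three words stored at [sp, #0], [sp, #4], [sp, #8]
saved_regs = 5 * WORD     # r4, r5, r6, r7, lr

# Behaviour described in this report: each area is padded on its own.
separate = align_up(outgoing_args) + align_up(saved_regs)

# Hypothetical alternative: pad once, over the sum of both areas.
combined = align_up(outgoing_args + saved_regs)

# Frame size with separate padding, with combined padding, and the number
# of registers a single push would need to cover the combined frame.
print(separate, combined, combined // WORD)  # prints: 40 32 8
```

With combined padding the frame is 32 bytes, which is exactly the 8 registers
in the push {r1, r2, r3, r4, r5, r6, r7, lr} form suggested earlier.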


