For each function that uses the stack for parameter passing, GCC generally first stores lr and decrements sp, and before returning it increments sp and loads lr into pc. If only a few words of stack are needed, GCC could reserve them in a way that is cheaper as far as code size is concerned: save some arbitrary extra registers together with lr on the stack, so that sp is adjusted implicitly.

--- c example ---
// arm-elf-gcc -S -g0 -Os -o param-stack.s param-stack.c
void func (int* a, int* b);
void foo ()
{
  int a=6, b=7;
  func(&a, &b);
}

--- asm code ---
foo:
	mov	ip, sp
	stmfd	sp!, {fp, ip, lr, pc}	<- OLD
	mov	r3, #6
	sub	fp, ip, #4
	sub	sp, sp, #8	<- OLD
	sub	r0, fp, #16
	str	r3, [fp, #-16]
	sub	r1, fp, #20
	add	r3, r3, #1
	str	r3, [fp, #-20]
	bl	func
	ldmea	fp, {fp, sp, pc}

--- possible solution ---
foo:
	mov	ip, sp
	stmfd	sp!, {r1, r2, fp, ip, lr, pc}	<- NEW
	mov	r3, #6
	sub	fp, ip, #4
	sub	r0, fp, #16
	str	r3, [fp, #-16]
	sub	r1, fp, #20
	add	r3, r3, #1
	str	r3, [fp, #-20]
	bl	func
	ldmea	fp, {fp, sp, pc}
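To make the stack arithmetic behind the proposed sequence easier to check, here is a minimal C sketch of the address calculation. It assumes 32-bit registers and a full-descending stack, where stmfd stores the lowest-numbered register at the lowest address. The helper names (sp_after_stmfd, stmfd_slot, frame_pointer) are hypothetical and exist only for this illustration; they are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value of sp after "stmfd sp!, {...}" has pushed
   nregs 32-bit registers onto a full-descending stack. */
static uint32_t sp_after_stmfd(uint32_t sp_before, unsigned nregs)
{
    return sp_before - 4u * nregs;
}

/* Hypothetical helper: address of the index-th register in the list,
   where index 0 is the lowest-numbered register (lowest address). */
static uint32_t stmfd_slot(uint32_t sp_before, unsigned nregs, unsigned index)
{
    assert(index < nregs);
    return sp_after_stmfd(sp_before, nregs) + 4u * index;
}

/* fp as computed by "mov ip, sp" followed by "sub fp, ip, #4". */
static uint32_t frame_pointer(uint32_t sp_on_entry)
{
    return sp_on_entry - 4u;
}
```

With an arbitrary entry value of sp = 0x1000, pushing four registers and then subtracting 8 leaves sp at the same address as pushing six registers, and the slots occupied by the extra r1 and r2 are exactly the locations addressed as [fp, #-20] and [fp, #-16].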
Confirmed with mainline (20030825).
Not reconfirmed for almost a year. Is this still an issue?
Undoubtedly. But I don't see much prospect of this being changed any time soon. It would require too much co-operation between the mid and back-ends.
As far as I understand, the instruction stream is one instruction shorter, but the larger stmfd performs two extra memory writes in order to adjust the stack. This optimization is therefore only of interest at '-Os'. In general it will slow the code down: a data write and an instruction fetch generally cost about the same, and this trades two extra writes for one fewer fetch.
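The trade-off can be tallied from the two listings in the original report. The sketch below only counts instructions and stack writes for foo; it is not a timing model, and the struct and function names are hypothetical. It assumes ARM mode, where every instruction occupies 4 bytes.

```c
#include <assert.h>

/* Hypothetical tally for one version of foo. */
struct cost {
    unsigned instructions;  /* ARM-mode instructions, 4 bytes each */
    unsigned stack_writes;  /* 32-bit words written to the stack */
};

/* Original sequence: 12 instructions; stmfd stores 4 registers,
   and the two str instructions store a and b. */
static const struct cost old_foo = { 12u, 4u + 2u };

/* Proposed sequence: 11 instructions; stmfd stores 6 registers,
   and the two str instructions are unchanged. */
static const struct cost new_foo = { 11u, 6u + 2u };

static unsigned code_bytes(struct cost c)
{
    return 4u * c.instructions;
}

/* Bytes of code saved by going from 'before' to 'after'. */
static unsigned bytes_saved(struct cost before, struct cost after)
{
    assert(code_bytes(before) >= code_bytes(after));
    return code_bytes(before) - code_bytes(after);
}

/* Additional stack writes incurred by going from 'before' to 'after'. */
static unsigned extra_writes(struct cost before, struct cost after)
{
    assert(after.stack_writes >= before.stack_writes);
    return after.stack_writes - before.stack_writes;
}
```

Under these assumptions the proposed sequence saves 4 bytes of code (48 versus 44) at the price of 2 additional word writes per call.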
I'm not sure exactly when this was fixed, but certainly it was some time ago. At least, gcc-4.6 appears to implement this optimization at -Os.

--- asm code ---
foo:
	@ Function supports interworking.
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 0, uses_anonymous_args = 0
	stmfd	sp!, {r0, r1, r2, lr}
	mov	r3, #7
	mov	r2, #6
	mov	r0, sp
	add	r1, sp, #4
	stmia	sp, {r2, r3}
	bl	func
	ldmfd	sp!, {r1, r2, r3, lr}
	bx	lr