Compiling the attached source code with options -march=armv5te -mthumb -Os -fno-exceptions, gcc generates:

        push    {r4, lr}
        sub     sp, sp, #8
        add     r4, sp, #4      // redundant
        mov     r0, r4          // add r0, sp, 4
        bl      _ZN1XC1Ev
        mov     r0, r4          // add r0, sp, 4
        bl      _Z3barP1X
        mov     r0, r4          // add r0, sp, 4
        bl      _ZN1XD1Ev
        add     sp, sp, #8
        @ sp needed for prologue
        pop     {r4, pc}

As mentioned in the comments, the CSE is redundant. We can recompute the value of (sp + 4) each time we want it instead of keeping it in r4. That saves one instruction.
Created attachment 18120 [details] test case
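The attachment itself is not reproduced in this thread. Going by the mangled symbols in the listing (_ZN1XC1Ev is X::X(), _Z3barP1X is bar(X*), _ZN1XD1Ev is X::~X()), the test case presumably has roughly the shape sketched below. The function name foo, the member v, the recording vector and all function bodies are assumptions added so the sketch links and can be checked on its own. In the real test case the constructor, destructor and bar would most likely be declared only, so that gcc cannot inline them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: addresses received by each callee, in call order.
static std::vector<const void*> seen;

struct X {
    X();
    ~X();
    int v;  // assumed member, only there to give X some storage
};

// Assumed bodies; the report only needs the declarations.
X::X() : v(0) { seen.push_back(this); }
X::~X() { seen.push_back(this); }

void bar(X* p) {
    assert(p != nullptr);
    seen.push_back(p);
    p->v = 1;
}

// The function whose code is discussed: three calls in a row,
// each taking the address of the same stack object as first argument.
void foo() {
    X x;      // call to X::X() with &x
    bar(&x);  // call to bar(X*) with &x
}             // call to X::~X() with &x

// Returns true if all three callees received the same address.
bool same_address_for_all_calls() {
    seen.clear();
    foo();
    return seen.size() == 3 && seen[0] == seen[1] && seen[1] == seen[2];
}
```

All three calls take the address of the same stack slot, which is why CSE sees one common subexpression (sp + 4) and keeps it in a call-saved register.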
This looks like one of those rematerialization problems albeit with the stack pointer this time.
Is there a C test case? Can you add objdump of the gcc-generated asm and the fixed asm to show the impact on code size? (/me is surprised that 3*"add r0,sp,4" is smaller than 1*"add r0,sp,4"+3*"mov r0,r4"... Thumb is amazing :-)
(In reply to comment #3)
> Is there a C test case? Can you add objdump of the gcc-generated asm and the
> fixed asm to show the impact on code size? (/me is surprised that 3*"add
> r0,sp,4" is smaller than 1*"add r0,sp,4"+3*"mov r0,r4"... Thumb is amazing :-)

The length of add r0,sp,4 and mov r0,r4 is the same for Thumb1 (16 bits).

I suppose the ideal generated code would be something like this, modulo errors with stack alignment in the prologue and the epilogue. We also don't need r4 in that case :). So we save a load and a store as well as one instruction overall: smaller and faster by one instruction, with reduced register usage.

        push    {lr}
        sub     sp, sp, #12     @ 8 byte stack alignment
        add     r0, sp, #4
        bl      _ZN1XC1Ev
        add     r0, sp, #4
        bl      _Z3barP1X
        add     r0, sp, #4
        bl      _ZN1XD1Ev
        add     sp, sp, #12     @ 8 byte stack alignment
        @ sp needed for prologue
        pop     {pc}
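On the code-size question, a quick tally of the two sequences may help. This is a back-of-the-envelope sketch, not objdump output. It assumes the usual Thumb-1 encodings on ARMv5TE: every instruction in these sequences is 2 bytes except bl, which is 4 bytes. The type and function names are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed Thumb-1 encoding sizes in bytes.
constexpr std::size_t kNarrow = 2;  // push, pop, sub, add, mov
constexpr std::size_t kBl = 4;      // bl is a pair of 16-bit halfwords

// Hypothetical summary of an instruction sequence.
struct Sequence {
    std::size_t narrow;  // number of 16-bit instructions
    std::size_t bl;      // number of bl instructions
};

constexpr std::size_t bytes(Sequence s) {
    return s.narrow * kNarrow + s.bl * kBl;
}

// gcc output: push, sub, add r4, 3 x mov, add sp, pop, plus 3 x bl.
constexpr Sequence generated{8, 3};
// Proposed: push, sub, 3 x add r0, add sp, pop, plus 3 x bl.
constexpr Sequence proposed{7, 3};

static_assert(bytes(generated) == 28, "8 * 2 + 3 * 4");
static_assert(bytes(proposed) == 26, "7 * 2 + 3 * 4");

// Bytes saved by rematerializing sp + 4 instead of keeping it in r4.
std::size_t saved_bytes() {
    assert(bytes(generated) >= bytes(proposed));  // guard against underflow
    return bytes(generated) - bytes(proposed);
}
```

Under those assumptions the saving is one 16-bit instruction, i.e. 2 bytes, on top of no longer pushing and popping r4.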
PowerPC has the same issue. Instructions on PPC are all the same size, so 3 adds are better than one add plus 3 register moves.

Here is a C example:

int f(int *a);
int g(int *a);
int h(int *a);

void hh(void)
{
  int t;
  f(&t);
  g(&t);
  h(&t);
}

--- CUT ---

Most RISC targets will have the same issue, as most of them have fixed-length instructions. I bet this has to do with hard registers.
Before CSE:

(insn 13 2 5 2 t.c:8 (set (reg:SI 119)
        (plus:SI (reg/f:SI 113 sfp)
            (const_int 8 [0x8]))) -1 (nil))

(insn 5 13 6 2 t.c:8 (set (reg:SI 3 3)
        (reg:SI 119)) 332 {*movsi_internal1} (nil))

...

(insn 14 6 7 2 t.c:9 (set (reg:SI 120)
        (plus:SI (reg/f:SI 113 sfp)
            (const_int 8 [0x8]))) -1 (nil))

(insn 7 14 8 2 t.c:9 (set (reg:SI 3 3)
        (reg:SI 120)) 332 {*movsi_internal1} (nil))

And then after CSE, we remove the adds, but we should have moved the adds into the move.
Subject: Bug 40615

Author: bernds
Date: Fri Jun 4 12:44:01 2010
New Revision: 160260

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=160260
Log:
	PR rtl-optimization/39871
	PR rtl-optimization/40615
	PR rtl-optimization/42500
	PR rtl-optimization/42502
	* ira.c (init_reg_equiv_memory_loc): New function.
	(ira): Call it twice.
	* reload.h (calculate_elim_costs_all_insns): Declare.
	* ira-costs.c: Include "reload.h".
	(regno_equiv_gains): New static variable.
	(init_costs): Allocate it.
	(finish_costs): Free it.
	(ira_costs): Call calculate_elim_costs_all_insns.
	(find_costs_and_classes): Take estimated elimination costs
	into account.
	(ira_adjust_equiv_reg_cost): New function.
	* ira.h (ira_adjust_equiv_reg_cost): Declare it.
	* reload1.c (init_eliminable_invariants, free_reg_equiv,
	elimination_costs_in_insn, note_reg_elim_costly): New static functions.
	(elim_bb): New static variable.
	(reload): Move code out of here into init_eliminable_invariants and
	free_reg_equiv.  Call them.
	(calculate_elim_costs_all_insns): New function.
	(eliminate_regs_1): Declare.  Add extra arg FOR_COSTS;
	all callers changed.  If FOR_COSTS is true, don't call alter_reg,
	but call note_reg_elim_costly if we turned a valid memory address
	into an invalid one.
	* Makefile.in (ira-costs.o): Depend on reload.h.

testsuite/
	PR rtl-optimization/39871
	PR rtl-optimization/40615
	PR rtl-optimization/42500
	PR rtl-optimization/42502
	* gcc.target/arm/eliminate.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/arm/eliminate.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/ira-costs.c
    trunk/gcc/ira.c
    trunk/gcc/ira.h
    trunk/gcc/reload.h
    trunk/gcc/reload1.c
    trunk/gcc/testsuite/ChangeLog
Fixed.