Compile the attached source code with options -Os -fpic -mthumb, gcc generates following code for the constructor: push {r4, r5, r6, r7, lr} .LCFI2: .pad #12 sub sp, sp, #12 .LCFI3: str r3, [sp] add r3, sp, #32 //A, B ldrb r3, [r3] //A, B ldr r5, .L14 mov r4, r0 mov r7, r2 mov r6, r1 str r3, [sp, #4] //A bl _ZN1AC2Ev ldr r3, .L14+4 .LPIC2: add r5, pc ldr r3, [r5, r3] add r3, r3, #8 mov r1, sp //A str r3, [r4] add r2, r1, #4 //A, C mov r3, r4 add r3, r3, #42 ldrb r1, [r2] //A, C strb r7, [r3] add r3, r3, #1 // C strb r1, [r3] //A, C cmp r6, #0 beq .L9 ldr r2, [sp] mov r3, #2 cmp r2, #0 bne .L12 mov r3, #1 .L12: strh r3, [r4, #40] b .L11 .L9: strh r6, [r4, #40] .L11: add sp, sp, #12 mov r0, r4 @ sp needed for prologue pop {r4, r5, r6, r7} pop {r1} bx r1 1. The instructions marked A move a byte from parameter to the object body, but it first moves that parameter to stack [sp, 4], then move the value from [sp,4] to the target object. The move to [sp, 4] is redundant. 2. The Instructions marked B load a byte from the stack, then store the whole word into another place on the stack. Since later instructions load a byte from the new space, we can simply move the whole word at the first place, so instructions marked as B can be reduced to one instruction "ldr r3, [sp, 32]" 3. Instructions marked C can be simplified to ldrb r1, [r1, 4] strb r1, [r3, 1] 4. Why not use "pop {r4, r5, r6, r7, pc}" as the epilogue?
Created attachment 19194 [details] test case
(In reply to comment #0) > Compile the attached source code with options -Os -fpic -mthumb, gcc generates > following code for the constructor: Could you do some more analysis of where this is coming from using rtl / tree dumps ?
> Could you do some more analysis of where this is coming from using rtl / tree > dumps ? > Everything in tree dump is normal. After rtl expand, every parameter is loaded into a pseudo register at the entry of the function, including the parameter passed in from stack. So we get the following rtl code: (insn 2 19 3 2 src/./testC.cpp:19 (set (reg/f:SI 135 [ this ]) (reg:SI 0 r0 [ this ])) -1 (nil)) (insn 3 2 4 2 src/./testC.cpp:19 (set (reg/v:SI 136 [ p1 ]) (reg:SI 1 r1 [ p1 ])) -1 (nil)) (insn 4 3 5 2 src/./testC.cpp:19 (set (reg/v:SI 137 [ p2 ]) (reg:SI 2 r2 [ p2 ])) -1 (nil)) (insn 5 4 6 2 src/./testC.cpp:19 (set (reg/v:SI 138 [ p3 ]) (reg:SI 3 r3 [ p3 ])) -1 (nil)) (insn 6 5 7 2 src/./testC.cpp:19 (set (reg/f:SI 141) (reg/f:SI 128 virtual-incoming-args)) -1 (nil)) (insn 7 6 8 2 src/./testC.cpp:19 (set (reg:SI 142) (zero_extend:SI (mem/c/i:QI (reg/f:SI 141) [7 p4+0 S1 A32]))) -1 (nil)) (insn 8 7 9 2 src/./testC.cpp:19 (set (reg:QI 140) (subreg:QI (reg:SI 142) 0)) -1 (nil)) (insn 9 8 10 2 src/./testC.cpp:19 (set (reg:QI 144) (reg:QI 140)) -1 (nil)) (insn 10 9 11 2 src/./testC.cpp:19 (set (reg:SI 143) (ashift:SI (subreg:SI (reg:QI 144) 0) (const_int 24 [0x18]))) -1 (nil)) (insn 11 10 12 2 src/./testC.cpp:19 (set (reg/v:SI 139 [ p4 ]) (lshiftrt:SI (reg:SI 143) (const_int 24 [0x18]))) -1 (expr_list:REG_EQUAL (zero_extend:SI (reg:QI 140)) (nil))) ... // Other codes (insn 32 31 33 3 src/./testC.cpp:19 (set (reg:QI 155) (subreg/s/u:QI (reg/v:SI 139 [ p4 ]) 0)) -1 (nil)) (insn 33 32 34 3 src/./testC.cpp:19 (set (mem/s:QI (reg/f:SI 154) [7 this_1(D)->f3+0 S1 A8]) (reg:QI 155)) -1 (nil)) When it get to register allocation, it is simplified to: (insn 2 19 3 2 src/./testC.cpp:19 (set (reg/f:SI 135 [ this ]) (reg:SI 0 r0 [ this ])) 167 {*thumb1_movsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0 [ this ]) (nil))) (insn 3 2 4 2 src/./testC.cpp:19 (set (reg/v:SI 136 [ p1 ]) (reg:SI 1 r1 [ p1 ])) 167 {*thumb1_movsi_insn} (expr_list:REG_DEAD (reg:SI 1 r1 [ p1 ]) (nil))) (insn 4 3 5 2 src/./testC.cpp:19 (set (reg/v:SI 137 [ p2 ]) (reg:SI 2 r2 [ p2 ])) 167 {*thumb1_movsi_insn} (expr_list:REG_DEAD (reg:SI 2 r2 [ p2 ]) (nil))) (insn 5 4 6 2 src/./testC.cpp:19 (set (reg/v:SI 138 [ p3 ]) (reg:SI 3 r3 [ p3 ])) 167 {*thumb1_movsi_insn} (expr_list:REG_DEAD (reg:SI 3 r3 [ p3 ]) (nil))) (insn 6 5 7 2 src/./testC.cpp:19 (set (reg/f:SI 141) (reg/f:SI 26 afp)) 167 {*thumb1_movsi_insn} (nil)) (note 7 6 10 2 NOTE_INSN_DELETED) (note 10 7 11 2 NOTE_INSN_DELETED) (insn 11 10 12 2 src/./testC.cpp:19 (set (reg/v:SI 139 [ p4 ]) (zero_extend:SI (mem/c/i:QI (reg/f:SI 141) [7 p4+0 S1 A32]))) 146 {*thumb1_zero_extendqisi2} (expr_list:REG_DEAD (reg/f:SI 141) (nil))) ... // Other codes (insn 33 31 34 2 src/./testC.cpp:19 (set (mem/s:QI (reg/f:SI 154) [7 this_1(D)->f3+0 S1 A8]) (subreg/s/u:QI (reg/v:SI 139 [ p4 ]) 0)) 178 {*thumb1_movqi_insn} (expr_list:REG_DEAD (reg/f:SI 154) (expr_list:REG_DEAD (reg/v:SI 139 [ p4 ]) (nil)))) Due to the high register pressure, reg 139 was chosen as the spilled variable, so we get a value moved from parameter stack space to spilled stack space. The instructions marked C are introduced by register spill and reload, need to do forward propagation after RA?
Subject: Bug 42235 Author: bernds Date: Fri Jul 16 02:09:03 2010 New Revision: 162240 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162240 Log: PR target/42235 * function.c (record_hard_reg_sets): New static function. (assign_parm_setup_reg): If an optab for extending exists and the generated code clobbbers no hard regs, emit the insn directly and create a REG_EQUIV note. Modified: trunk/gcc/ChangeLog trunk/gcc/function.c
Subject: Bug 42235 Author: bernds Date: Fri Jul 16 23:47:46 2010 New Revision: 162270 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162270 Log: PR target/42235 * postreload.c (reload_cse_move2add): Return bool, true if anything. changed. All callers changed. (move2add_use_add2_insn): Likewise. (move2add_use_add3_insn): Likewise. (reload_cse_regs): If reload_cse_move2add changed anything, rerun reload_combine. (RELOAD_COMBINE_MAX_USES): Bump to 16. (last_jump_ruid): New static variable. (struct reg_use): New members CONTAINING_MEM and RUID. (reg_state): New members ALL_OFFSETS_MATCH and REAL_STORE_RUID. (reload_combine_split_one_ruid, reload_combine_split_ruids, reload_combine_purge_insn_uses, reload_combine_closest_single_use reload_combine_purge_reg_uses_after_ruid, reload_combine_recognize_const_pattern): New static functions. (reload_combine_recognize_pattern): Verify that ALL_OFFSETS_MATCH is true for our reg and that we have available index regs. (reload_combine_note_use): New args RUID and CONTAINING_MEM. All callers changed. Use them to initialize fields in struct reg_use. (reload_combine): Initialize last_jump_ruid. Be careful when to take PREV_INSN of the scanned insn. Update REAL_STORE_RUID fields. Call reload_combine_recognize_const_pattern. (reload_combine_note_store): Update REAL_STORE_RUID field. * gcc.target/arm/pr42235.c: New test. Added: trunk/gcc/testsuite/gcc.target/arm/pr42235.c Modified: trunk/gcc/ChangeLog trunk/gcc/postreload.c trunk/gcc/testsuite/ChangeLog