Bug 42235 - redundant memory move from parameter space to spill space
Summary: redundant memory move from parameter space to spill space
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.5.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2009-12-01 09:01 UTC by Carrot
Modified: 2010-02-10 19:28 UTC (History)
3 users (show)

See Also:
Host: i686-linux
Target: arm-eabi
Build: i686-linux
Known to work:
Known to fail:
Last reconfirmed: 2009-12-09 16:55:46


Attachments
test case (177 bytes, text/plain)
2009-12-01 09:01 UTC, Carrot
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Carrot 2009-12-01 09:01:20 UTC
Compile the attached source code with options -Os -fpic -mthumb, gcc generates following code for the constructor:

        push    {r4, r5, r6, r7, lr}
.LCFI2:
        .pad #12
        sub     sp, sp, #12
.LCFI3:
        str     r3, [sp]
        add     r3, sp, #32    //A,  B
        ldrb    r3, [r3]       //A,  B
        ldr     r5, .L14
        mov     r4, r0
        mov     r7, r2
        mov     r6, r1
        str     r3, [sp, #4]   //A
        bl      _ZN1AC2Ev
        ldr     r3, .L14+4
.LPIC2:
        add     r5, pc
        ldr     r3, [r5, r3]
        add     r3, r3, #8
        mov     r1, sp        //A
        str     r3, [r4]
        add     r2, r1, #4    //A, C
        mov     r3, r4
        add     r3, r3, #42
        ldrb    r1, [r2]      //A, C
        strb    r7, [r3]
        add     r3, r3, #1    //   C
        strb    r1, [r3]      //A, C
        cmp     r6, #0
        beq     .L9
        ldr     r2, [sp]
        mov     r3, #2
        cmp     r2, #0
        bne     .L12
        mov     r3, #1
.L12:
        strh    r3, [r4, #40]
        b       .L11
.L9:
        strh    r6, [r4, #40]
.L11:
        add     sp, sp, #12
        mov     r0, r4
        @ sp needed for prologue
        pop     {r4, r5, r6, r7}
        pop     {r1}
        bx      r1

1. The instructions marked A move a byte from parameter to the object body, but it first moves that parameter to stack [sp, 4], then move the value from [sp,4] to the target object. The move to [sp, 4] is redundant.

2. The Instructions marked B load a byte from the stack, then store the whole word into another place on the stack. Since later instructions load a byte from the new space, we can simply move the whole word at the first place, so instructions marked as B can be reduced to one instruction "ldr r3, [sp, 32]"

3. Instructions marked C can be simplified to
     ldrb r1, [r1, 4]
     strb r1, [r3, 1]

4. Why not use "pop {r4, r5, r6, r7, pc}" as the epilogue?
Comment 1 Carrot 2009-12-01 09:01:41 UTC
Created attachment 19194 [details]
test case
Comment 2 Ramana Radhakrishnan 2009-12-01 18:26:46 UTC
(In reply to comment #0)
> Compile the attached source code with options -Os -fpic -mthumb, gcc generates
> following code for the constructor:

Could you do some more analysis of where this is coming from using rtl / tree dumps ? 
Comment 3 Carrot 2009-12-02 07:00:29 UTC
> Could you do some more analysis of where this is coming from using rtl / tree
> dumps ? 
> 
Everything in tree dump is normal.

After rtl expand, every parameter is loaded into a pseudo register at the entry of the function, including the parameter passed in from stack. So we get the following rtl code:

(insn 2 19 3 2 src/./testC.cpp:19 (set (reg/f:SI 135 [ this ])
        (reg:SI 0 r0 [ this ])) -1 (nil))

(insn 3 2 4 2 src/./testC.cpp:19 (set (reg/v:SI 136 [ p1 ])
        (reg:SI 1 r1 [ p1 ])) -1 (nil))

(insn 4 3 5 2 src/./testC.cpp:19 (set (reg/v:SI 137 [ p2 ])
        (reg:SI 2 r2 [ p2 ])) -1 (nil))

(insn 5 4 6 2 src/./testC.cpp:19 (set (reg/v:SI 138 [ p3 ])
        (reg:SI 3 r3 [ p3 ])) -1 (nil))

(insn 6 5 7 2 src/./testC.cpp:19 (set (reg/f:SI 141)
        (reg/f:SI 128 virtual-incoming-args)) -1 (nil))

(insn 7 6 8 2 src/./testC.cpp:19 (set (reg:SI 142)
        (zero_extend:SI (mem/c/i:QI (reg/f:SI 141) [7 p4+0 S1 A32]))) -1 (nil))

(insn 8 7 9 2 src/./testC.cpp:19 (set (reg:QI 140)
        (subreg:QI (reg:SI 142) 0)) -1 (nil))

(insn 9 8 10 2 src/./testC.cpp:19 (set (reg:QI 144)
        (reg:QI 140)) -1 (nil))

(insn 10 9 11 2 src/./testC.cpp:19 (set (reg:SI 143)
        (ashift:SI (subreg:SI (reg:QI 144) 0)
            (const_int 24 [0x18]))) -1 (nil))

(insn 11 10 12 2 src/./testC.cpp:19 (set (reg/v:SI 139 [ p4 ])
        (lshiftrt:SI (reg:SI 143)
            (const_int 24 [0x18]))) -1 (expr_list:REG_EQUAL (zero_extend:SI (reg:QI 140))
        (nil)))

...  // Other codes

(insn 32 31 33 3 src/./testC.cpp:19 (set (reg:QI 155)
        (subreg/s/u:QI (reg/v:SI 139 [ p4 ]) 0)) -1 (nil))

(insn 33 32 34 3 src/./testC.cpp:19 (set (mem/s:QI (reg/f:SI 154) [7 this_1(D)->f3+0 S1 A8])
        (reg:QI 155)) -1 (nil))


When it get to register allocation, it is simplified to:


(insn 2 19 3 2 src/./testC.cpp:19 (set (reg/f:SI 135 [ this ])
        (reg:SI 0 r0 [ this ])) 167 {*thumb1_movsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0 [ this ])
        (nil)))

(insn 3 2 4 2 src/./testC.cpp:19 (set (reg/v:SI 136 [ p1 ])
        (reg:SI 1 r1 [ p1 ])) 167 {*thumb1_movsi_insn} (expr_list:REG_DEAD (reg:SI 1 r1 [ p1 ])
        (nil)))

(insn 4 3 5 2 src/./testC.cpp:19 (set (reg/v:SI 137 [ p2 ])
        (reg:SI 2 r2 [ p2 ])) 167 {*thumb1_movsi_insn} (expr_list:REG_DEAD (reg:SI 2 r2 [ p2 ])
        (nil)))

(insn 5 4 6 2 src/./testC.cpp:19 (set (reg/v:SI 138 [ p3 ])
        (reg:SI 3 r3 [ p3 ])) 167 {*thumb1_movsi_insn} (expr_list:REG_DEAD (reg:SI 3 r3 [ p3 ])
        (nil)))

(insn 6 5 7 2 src/./testC.cpp:19 (set (reg/f:SI 141)
        (reg/f:SI 26 afp)) 167 {*thumb1_movsi_insn} (nil))

(note 7 6 10 2 NOTE_INSN_DELETED)

(note 10 7 11 2 NOTE_INSN_DELETED)

(insn 11 10 12 2 src/./testC.cpp:19 (set (reg/v:SI 139 [ p4 ])
        (zero_extend:SI (mem/c/i:QI (reg/f:SI 141) [7 p4+0 S1 A32]))) 146 {*thumb1_zero_extendqisi2} (expr_list:REG_DEAD (reg/f:SI 141)
        (nil)))

...  // Other codes

(insn 33 31 34 2 src/./testC.cpp:19 (set (mem/s:QI (reg/f:SI 154) [7 this_1(D)->f3+0 S1 A8])
        (subreg/s/u:QI (reg/v:SI 139 [ p4 ]) 0)) 178 {*thumb1_movqi_insn} (expr_list:REG_DEAD (reg/f:SI 154)
        (expr_list:REG_DEAD (reg/v:SI 139 [ p4 ])
            (nil))))

Due to the high register pressure, reg 139 was chosen as the spilled variable, so we get a value moved from parameter stack space to spilled stack space.

The instructions marked C are introduced by register spill and reload, need to do forward propagation after RA?
Comment 4 Bernd Schmidt 2010-07-16 02:09:24 UTC
Subject: Bug 42235

Author: bernds
Date: Fri Jul 16 02:09:03 2010
New Revision: 162240

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162240
Log:
	PR target/42235
	* function.c (record_hard_reg_sets): New static function.
	(assign_parm_setup_reg): If an optab for extending exists and the
	generated code clobbbers no hard regs, emit the insn directly and
	create a REG_EQUIV note.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/function.c

Comment 5 Bernd Schmidt 2010-07-16 23:48:03 UTC
Subject: Bug 42235

Author: bernds
Date: Fri Jul 16 23:47:46 2010
New Revision: 162270

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162270
Log:
	PR target/42235
	* postreload.c (reload_cse_move2add): Return bool, true if anything.
	changed.  All callers changed.
	(move2add_use_add2_insn): Likewise.
	(move2add_use_add3_insn): Likewise.
	(reload_cse_regs): If reload_cse_move2add changed anything, rerun
	reload_combine.
	(RELOAD_COMBINE_MAX_USES): Bump to 16.
	(last_jump_ruid): New static variable.
	(struct reg_use): New members CONTAINING_MEM and RUID.
	(reg_state): New members ALL_OFFSETS_MATCH and REAL_STORE_RUID.
	(reload_combine_split_one_ruid, reload_combine_split_ruids,
	reload_combine_purge_insn_uses, reload_combine_closest_single_use
	reload_combine_purge_reg_uses_after_ruid,
	reload_combine_recognize_const_pattern): New static functions.
	(reload_combine_recognize_pattern): Verify that ALL_OFFSETS_MATCH
	is true for our reg and that we have available index regs.
	(reload_combine_note_use): New args RUID and CONTAINING_MEM.  All
	callers changed.  Use them to initialize fields in struct reg_use.
	(reload_combine): Initialize last_jump_ruid.  Be careful when to
	take PREV_INSN of the scanned insn.  Update REAL_STORE_RUID fields.
	Call reload_combine_recognize_const_pattern.
	(reload_combine_note_store): Update REAL_STORE_RUID field.

	* gcc.target/arm/pr42235.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/arm/pr42235.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/postreload.c
    trunk/gcc/testsuite/ChangeLog