Compile the attached test case with options -mthumb -Os -fpic, gcc generates: goo: push {r3, r4, r5, lr} ldr r4, .L7 ldr r3, .L7+4 // A .LPIC0: add r4, pc ldr r3, [r4, r3] // B ldr r3, [r3] mov r5, r0 ldr r0, [r3] cmp r0, #0 beq .L2 mov r1, r5 bl foo .L2: ldr r3, [r5] mov r0, #0 cmp r3, #0 beq .L3 ldr r2, .L7+4 // C ldr r2, [r4, r2] // D ldr r2, [r2, #4] ldr r2, [r2] cmp r3, r2 beq .L3 ldr r0, [r3] .L3: @ sp needed for prologue pop {r3, r4, r5} pop {r1} bx r1 .L7: .word _GLOBAL_OFFSET_TABLE_-(.LPIC0+4) .word gObj(GOT) Notice instructions A,B,C,D, they load the address of global variable gObj twice. When compiled with options -mthumb -O2 -fpic, gcc generates: goo: push {r4, r5, r6, lr} ldr r4, .L8 ldr r5, .L8+4 // E .LPIC0: add r4, pc ldr r3, [r4, r5] // F ldr r3, [r3] mov r6, r0 ldr r0, [r3] cmp r0, #0 bne .L7 .L2: ldr r3, [r6] mov r0, #0 cmp r3, #0 beq .L3 ldr r2, [r4, r5] // G ldr r2, [r2, #4] // H ldr r2, [r2] cmp r2, r3 beq .L3 ldr r0, [r3] .L3: @ sp needed for prologue pop {r4, r5, r6} pop {r1} bx r1 .L7: mov r1, r6 bl foo b .L2 .L9: .align 2 .L8: .word _GLOBAL_OFFSET_TABLE_-(.LPIC0+4) .word gObj(GOT) Instructions E,F,G do the same thing, but with one less memory load instruction. It uses the same number of instructions. -Os should do the same optimization. Actually -O2 result is still not optimal. If we store the result of F into r4, we can directly use r4 in instruction H, so G can also be removed.
Created attachment 19388 [details] test case
> instruction. It uses the same number of instructions. -Os should do the same It uses the same number of registers.
Is this for Thumb1 or Thumb2 ? Can you check with trunk today ?
It is for thumb1, there should be another parameter that I missed -march=armv5te. It still exists in today's trunk.
Elimination of subsequent calculations of PIC addresses should be handled in code hoisting optimization. However, there are two problems that inhibit the optimization: 1. ARM backend outputs calculation of a PIC address as two instructions (load GOT offset from constant pool and then load PIC address from GOT) and hoist only handles expressions contained in a single_set(). 2. Hoisting algorithm misses many opportunities for expression hoisting to basic blocks that contain calculation of the expression. I.e., expr from bb4 will not be hoisted to bb2 even though it is trivially profitable: bb2: expr condjump bb4 bb3: <no expr> jump bb5 bb4: expr bb5: I'm testing patches to the ARM backend and code hoisting pass which fix the above problems. The generated code calculates address of the global variable only once: goo: push {r3, r4, r5, lr} ldr r3, .L6 ldr r2, .L6+4 .LPIC0: add r3, pc ldr r5, [r3, r2] mov r4, r0 ldr r3, [r5] ldr r0, [r3] cmp r0, #0 beq .L2 mov r1, r4 bl foo .L2: ldr r3, [r4] mov r0, #0 cmp r3, #0 beq .L3 ldr r2, [r5] cmp r3, r2 beq .L3 ldr r0, [r3] .L3: @ sp needed for prologue pop {r3, r4, r5, pc} .L7: .align 2 .L6: .word _GLOBAL_OFFSET_TABLE_-(.LPIC0+4) .word gObj(GOT)
Subject: Bug 42495 Author: mkuvyrkov Date: Tue Jul 27 19:34:55 2010 New Revision: 162590 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162590 Log: PR rtl-optimization/40956 PR target/42495 PR middle-end/42574 * gcse.c (compute_code_hoist_vbeinout): Consider more expressions for hoisting. (hoist_code): Count occurences in current block too. Modified: trunk/gcc/ChangeLog trunk/gcc/gcse.c
Subject: Bug 42495 Author: mkuvyrkov Date: Tue Jul 27 19:38:10 2010 New Revision: 162592 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162592 Log: PR target/42495 PR middle-end/42574 * gcse.c (hoist_expr_reaches_here_p): Remove excessive check. Modified: trunk/gcc/ChangeLog trunk/gcc/gcse.c
Subject: Bug 42495 Author: mkuvyrkov Date: Tue Jul 27 19:42:15 2010 New Revision: 162594 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162594 Log: PR target/42495 PR middle-end/42574 * config/arm/arm.c (thumb1_size_rtx_costs): Add cost for "J" constants. * config/arm/arm.md (define_split "J", define_split "K"): Make IRA/reload friendly. Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c trunk/gcc/config/arm/arm.md
Subject: Bug 42495 Author: mkuvyrkov Date: Tue Jul 27 19:44:51 2010 New Revision: 162595 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162595 Log: PR target/42495 PR middle-end/42574 * config/arm/arm.c (legitimize_pic_address): Use gen_calculate_pic_address pattern to emit calculation of PIC address. (will_be_in_index_register): New function. (arm_legitimate_address_outer_p, thumb2_legitimate_address_p,) (thumb1_legitimate_address_p): Use it provided !strict_p. * config/arm/arm.md (calculate_pic_address): New expand and split. Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c trunk/gcc/config/arm/arm.md
Subject: Bug 42495 Author: mkuvyrkov Date: Tue Jul 27 19:48:15 2010 New Revision: 162597 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162597 Log: PR target/42495 PR middle-end/42574 * basic-block.h (get_dominated_to_depth): Declare. * dominance.c (get_dominated_to_depth): New function, use get_all_dominated_blocks as a base. (get_all_dominated_blocks): Use get_dominated_to_depth. * gcse.c (occr_t, VEC (occr_t, heap)): Define. (hoist_exprs): Remove. (alloc_code_hoist_mem, free_code_hoist_mem): Update. (compute_code_hoist_vbeinout): Add debug print outs. (hoist_code): Partially rewrite, simplify. Use get_dominated_to_depth. * params.def (PARAM_MAX_HOIST_DEPTH): New parameter to avoid quadratic behavior. * params.h (MAX_HOIST_DEPTH): New macro. * doc/invoke.texi (max-hoist-depth): Document. Modified: trunk/gcc/ChangeLog trunk/gcc/basic-block.h trunk/gcc/doc/invoke.texi trunk/gcc/dominance.c trunk/gcc/gcse.c trunk/gcc/params.def trunk/gcc/params.h
Subject: Bug 42495 Author: mkuvyrkov Date: Tue Jul 27 21:06:31 2010 New Revision: 162600 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162600 Log: PR rtl-optimization/40956 PR target/42495 PR middle-end/42574 * gcc.target/arm/pr40956.c, gcc.target/arm/pr42495.c, * gcc.target/arm/pr42574.c: Add tests. Added: trunk/gcc/testsuite/gcc.target/arm/pr40956.c trunk/gcc/testsuite/gcc.target/arm/pr42495.c trunk/gcc/testsuite/gcc.target/arm/pr42574.c Modified: trunk/gcc/testsuite/ChangeLog
Should be fixed now by the above patch series.