Bug 42495 - redundant memory load
Summary: redundant memory load
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.5.0
: P3 enhancement
Target Milestone: ---
Assignee: Maxim Kuvyrkov
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2009-12-25 07:51 UTC by Carrot
Modified: 2010-07-27 21:11 UTC (History)
3 users (show)

See Also:
Host: i686-linux
Target: arm-eabi
Build: i686-linux
Known to work:
Known to fail:
Last reconfirmed: 2010-06-08 10:41:43


Attachments
test case (194 bytes, text/x-csrc)
2009-12-25 07:51 UTC, Carrot
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Carrot 2009-12-25 07:51:06 UTC
Compile the attached test case with options -mthumb -Os -fpic, gcc generates:

goo:
        push    {r3, r4, r5, lr}
        ldr     r4, .L7
        ldr     r3, .L7+4        // A
.LPIC0:
        add     r4, pc
        ldr     r3, [r4, r3]     // B
        ldr     r3, [r3]
        mov     r5, r0
        ldr     r0, [r3]
        cmp     r0, #0
        beq     .L2
        mov     r1, r5
        bl      foo
.L2:
        ldr     r3, [r5]
        mov     r0, #0
        cmp     r3, #0
        beq     .L3
        ldr     r2, .L7+4         // C
        ldr     r2, [r4, r2]      // D
        ldr     r2, [r2, #4]
        ldr     r2, [r2]
        cmp     r3, r2
        beq     .L3
        ldr     r0, [r3]
.L3:
        @ sp needed for prologue
        pop     {r3, r4, r5}
        pop     {r1}
        bx      r1
.L7:
        .word   _GLOBAL_OFFSET_TABLE_-(.LPIC0+4)
        .word   gObj(GOT)


Notice instructions A,B,C,D, they load the address of global variable gObj twice.

When compiled with options -mthumb -O2 -fpic, gcc generates:

goo:
        push    {r4, r5, r6, lr}
        ldr     r4, .L8
        ldr     r5, .L8+4      // E
.LPIC0:
        add     r4, pc
        ldr     r3, [r4, r5]   // F
        ldr     r3, [r3]
        mov     r6, r0
        ldr     r0, [r3]
        cmp     r0, #0
        bne     .L7
.L2:
        ldr     r3, [r6]
        mov     r0, #0
        cmp     r3, #0
        beq     .L3
        ldr     r2, [r4, r5]     // G
        ldr     r2, [r2, #4]     // H
        ldr     r2, [r2]
        cmp     r2, r3
        beq     .L3
        ldr     r0, [r3]
.L3:
        @ sp needed for prologue
        pop     {r4, r5, r6}
        pop     {r1}
        bx      r1
.L7:
        mov     r1, r6
        bl      foo
        b       .L2
.L9:
        .align  2
.L8:
        .word   _GLOBAL_OFFSET_TABLE_-(.LPIC0+4)
        .word   gObj(GOT)

Instructions E,F,G do the same thing, but with one less memory load instruction. It uses the same number of instructions. -Os should do the same optimization.

Actually -O2 result is still not optimal. If we store the result of F into r4, we can directly use r4 in instruction H, so G can also be removed.
Comment 1 Carrot 2009-12-25 07:51:29 UTC
Created attachment 19388 [details]
test case
Comment 2 Carrot 2009-12-25 07:52:44 UTC
> instruction. It uses the same number of instructions. -Os should do the same

It uses the same number of registers.
Comment 3 Ramana Radhakrishnan 2010-03-19 15:12:32 UTC
Is this for Thumb1 or Thumb2 ? Can you check with trunk today ? 
Comment 4 Carrot 2010-03-21 09:18:44 UTC
It is for thumb1, there should be another parameter that I missed -march=armv5te. It still exists in today's trunk.
Comment 5 Maxim Kuvyrkov 2010-06-08 10:41:43 UTC
Elimination of subsequent calculations of PIC addresses should be handled in code hoisting optimization.

However, there are two problems that inhibit the optimization:

1. ARM backend outputs calculation of a PIC address as two instructions (load GOT offset from constant pool and then load PIC address from GOT) and hoist only handles expressions contained in a single_set().

2. Hoisting algorithm misses many opportunities for expression hoisting to basic blocks that contain calculation of the expression.  I.e., expr from bb4 will not be hoisted to bb2 even though it is trivially profitable:

bb2:
  expr
  condjump bb4
bb3:
  <no expr>
  jump bb5
bb4:
  expr
bb5:

I'm testing patches to the ARM backend and code hoisting pass which fix the above problems.  The  generated code calculates address of the global variable only once:

goo:
	push	{r3, r4, r5, lr}
	ldr	r3, .L6
	ldr	r2, .L6+4
.LPIC0:
	add	r3, pc
	ldr	r5, [r3, r2]
	mov	r4, r0
	ldr	r3, [r5]
	ldr	r0, [r3]
	cmp	r0, #0
	beq	.L2
	mov	r1, r4
	bl	foo
.L2:
	ldr	r3, [r4]
	mov	r0, #0
	cmp	r3, #0
	beq	.L3
	ldr	r2, [r5]
	cmp	r3, r2
	beq	.L3
	ldr	r0, [r3]
.L3:
	@ sp needed for prologue
	pop	{r3, r4, r5, pc}
.L7:
	.align	2
.L6:
	.word	_GLOBAL_OFFSET_TABLE_-(.LPIC0+4)
	.word	gObj(GOT)
Comment 6 Maxim Kuvyrkov 2010-07-27 19:35:23 UTC
Subject: Bug 42495

Author: mkuvyrkov
Date: Tue Jul 27 19:34:55 2010
New Revision: 162590

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162590
Log:
	PR rtl-optimization/40956
	PR target/42495
	PR middle-end/42574
	* gcse.c (compute_code_hoist_vbeinout): Consider more expressions
	for hoisting.
	(hoist_code): Count occurences in current block too.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gcse.c

Comment 7 Maxim Kuvyrkov 2010-07-27 19:38:28 UTC
Subject: Bug 42495

Author: mkuvyrkov
Date: Tue Jul 27 19:38:10 2010
New Revision: 162592

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162592
Log:
	PR target/42495
	PR middle-end/42574
	* gcse.c (hoist_expr_reaches_here_p): Remove excessive check.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gcse.c

Comment 8 Maxim Kuvyrkov 2010-07-27 19:42:33 UTC
Subject: Bug 42495

Author: mkuvyrkov
Date: Tue Jul 27 19:42:15 2010
New Revision: 162594

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162594
Log:
	PR target/42495
	PR middle-end/42574
	* config/arm/arm.c (thumb1_size_rtx_costs): Add cost for "J" constants.
	* config/arm/arm.md (define_split "J", define_split "K"): Make
	IRA/reload friendly.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c
    trunk/gcc/config/arm/arm.md

Comment 9 Maxim Kuvyrkov 2010-07-27 19:45:11 UTC
Subject: Bug 42495

Author: mkuvyrkov
Date: Tue Jul 27 19:44:51 2010
New Revision: 162595

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162595
Log:
	PR target/42495
	PR middle-end/42574
	* config/arm/arm.c (legitimize_pic_address): Use
	gen_calculate_pic_address pattern to emit calculation of PIC address.
	(will_be_in_index_register): New function.
	(arm_legitimate_address_outer_p, thumb2_legitimate_address_p,)
	(thumb1_legitimate_address_p): Use it provided !strict_p.
	* config/arm/arm.md (calculate_pic_address): New expand and split.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c
    trunk/gcc/config/arm/arm.md

Comment 10 Maxim Kuvyrkov 2010-07-27 19:48:32 UTC
Subject: Bug 42495

Author: mkuvyrkov
Date: Tue Jul 27 19:48:15 2010
New Revision: 162597

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162597
Log:
	PR target/42495
	PR middle-end/42574
	* basic-block.h (get_dominated_to_depth): Declare.
	* dominance.c (get_dominated_to_depth): New function, use
	get_all_dominated_blocks as a base.
	(get_all_dominated_blocks): Use get_dominated_to_depth.

	* gcse.c (occr_t, VEC (occr_t, heap)): Define.
	(hoist_exprs): Remove.
	(alloc_code_hoist_mem, free_code_hoist_mem): Update.
	(compute_code_hoist_vbeinout): Add debug print outs.
	(hoist_code): Partially rewrite, simplify.  Use get_dominated_to_depth.

	* params.def (PARAM_MAX_HOIST_DEPTH): New parameter to avoid
	quadratic behavior.
	* params.h (MAX_HOIST_DEPTH): New macro.
	* doc/invoke.texi (max-hoist-depth): Document.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/basic-block.h
    trunk/gcc/doc/invoke.texi
    trunk/gcc/dominance.c
    trunk/gcc/gcse.c
    trunk/gcc/params.def
    trunk/gcc/params.h

Comment 11 Maxim Kuvyrkov 2010-07-27 21:06:56 UTC
Subject: Bug 42495

Author: mkuvyrkov
Date: Tue Jul 27 21:06:31 2010
New Revision: 162600

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162600
Log:
	PR rtl-optimization/40956
	PR target/42495
	PR middle-end/42574
	* gcc.target/arm/pr40956.c, gcc.target/arm/pr42495.c,
	* gcc.target/arm/pr42574.c: Add tests.

Added:
    trunk/gcc/testsuite/gcc.target/arm/pr40956.c
    trunk/gcc/testsuite/gcc.target/arm/pr42495.c
    trunk/gcc/testsuite/gcc.target/arm/pr42574.c
Modified:
    trunk/gcc/testsuite/ChangeLog

Comment 12 Maxim Kuvyrkov 2010-07-27 21:11:17 UTC
Should be fixed now by the above patch series.