Summary: | Simplify global variable's address loading with option -fpic | | |
---|---|---|---|
Product: | gcc | Reporter: | Carrot <carrot> |
Component: | target | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | NEW --- | | |
Severity: | enhancement | CC: | carrot, gcc-bugs, rearnsha, siarhei.siamashka, stephen.clarke |
Priority: | P3 | Keywords: | missed-optimization |
Version: | 4.5.0 | | |
Target Milestone: | --- | | |
Host: | i686-linux | Target: | arm-eabi |
Build: | i686-linux | Known to work: | |
Known to fail: | | Last reconfirmed: | 2010-02-20 12:41:03 |
Description
Carrot
2010-02-20 08:28:31 UTC
This optimization uses one less register (the register holding the GOT base). To get this benefit, the ideal place for it is before register allocation. Usually the expand pass generates instructions to load a global variable's address from its GOT entry for each access of the global variable. Later cse/gcse passes can remove many of them. In order to model the cost precisely, this optimization should be placed after some of the cse/gcse passes. So what is the best place for this optimization? Is there an existing pass that can be enhanced with this optimization? Or should I add a new pass?

Doesn't this belong in the linker as a relaxation? This would solve the reloc problem in the process.

(In reply to comment #2)
> Doesn't this belong in the linker as a relaxation? This would solve the reloc
> problem in the process.
>
The GNU linker already supports R_ARM_GOT_PREL, and the new relocation (GOT_PREL) has been added to trunk gas. So the offset can be represented as

    i(GOT_PREL)+(.-(.LPIC1+4))

Now my question is: where is the best place to insert this enhancement?

My best guess is that this optimization should be done late, for instance in the machine-dependent reorg pass. I don't see any place to hook this earlier.

The problem is that reload should be able to "spill" pseudos containing your GOT addresses and re-compute them from the given constants rather than consuming a stack slot to hold the computed value. This means that the number of GOT address loads may vary until after reload, so any size estimate you compute earlier can be off.

The downside to doing it after reload is that you will have committed to saving and restoring arm_pic_register in the prologue and epilogue. Given that ARM uses ldm/stm, this ought not to impact your code size often, but it will in the extreme case of a leaf function with no other saved registers.

I guess you'll have to experiment with your implementation to see what gives the best results on a large body of code.
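The GOT_PREL expression mentioned in this thread, i(GOT_PREL)+(.-(.LPIC1+4)), can be checked with plain address arithmetic. The following is only a sketch: it assumes the usual definition of R_ARM_GOT_PREL as GOT(S) + A - P, assumes Thumb code where pc reads as the instruction address plus 4, and uses made-up addresses and hypothetical helper names (`got_prel`, `literal_word`, `runtime_got_entry_address`) that do not come from GCC or binutils.

```c
#include <stdint.h>

/* Hypothetical addresses, chosen only for illustration. */
#define GOT_ENTRY_I  0x00021000u  /* address of the GOT entry for i */
#define LPIC1        0x00008010u  /* address of the instruction that adds pc */
#define LITERAL      0x00008020u  /* address of the literal-pool word ('.') */
#define PC_BIAS      4u           /* assumption: Thumb, pc reads as address + 4 */

/* Assumed relocation formula: GOT(S) + A - P, where P is the place
   being relocated and A is the addend stored there by the assembler. */
static uint32_t got_prel(uint32_t got_entry, uint32_t addend, uint32_t place)
{
    return got_entry + addend - place;
}

/* Value the linker would leave in the literal word for
   i(GOT_PREL) + (. - (.LPIC1 + 4)). */
static uint32_t literal_word(void)
{
    uint32_t addend = LITERAL - (LPIC1 + PC_BIAS);
    return got_prel(GOT_ENTRY_I, addend, LITERAL);
}

/* What the generated code computes at run time:
   the literal word plus pc as observed at .LPIC1. */
static uint32_t runtime_got_entry_address(void)
{
    uint32_t pc_at_lpic1 = LPIC1 + PC_BIAS;
    return literal_word() + pc_at_lpic1;
}
```

Under these assumptions the place P cancels out, so the literal word holds the distance from the pc value at .LPIC1 to the GOT entry, and adding pc yields the GOT entry address without a separate GOT base register.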
(In reply to comment #4)
> I guess you'll have to experiment with your implementation to
> see what gives the best results on a large body of code.
>
I will experiment on CSiBE.

Some experiment results:

Compile CSiBE with options -Os -fpic -mthumb -fno-short-enums

without this optimization: 2830665
simplify-got before ra: 2825737
simplify-got after ra: 2826853

So this optimization should be done before RA.

(In reply to comment #6)
> Some experiment results:
>
> Compile CSiBE with options -Os -fpic -mthumb -fno-short-enums
>
> without this optimization: 2830665
> simplify-got before ra: 2825737
> simplify-got after ra: 2826853

These numbers are the sum of each line in the file result-size.csv.

For arm instruction set, could you fold pc into the indexing to save an instruction?

foo:
        ldr     r3, .L2                 // C
.LPIC0:
        ldr     r3, [r3,pc]             // C
        @ sp needed for prologue
        ldr     r2, [r3]
        str     r0, [r3]
        mov     r0, r2
        bx      lr
.L3:
        .align  2
.L2:
        .word   ABS_ADDRESS_OF_GOT_ENTRY_FOR_i -(.LPIC0+4)      // C

On Thu, 2010-10-14 at 16:33 +0000, stephen.clarke at st dot com wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43129
>
> --- Comment #8 from Stephen Clarke <stephen.clarke at st dot com> 2010-10-14 16:32:56 UTC ---
> For arm instruction set, could you fold pc into the indexing
> to save an instruction?
>
> foo:
> ldr r3, .L2 // C
> .LPIC0:
> ldr r3, [r3,pc] // C
You'll find that the ARM-ARM thinks that PC in any of the 3 locations in this instruction form is *unpredictable*. Thus this form of the instruction should not be used.
cheers
Ramana
OK, I can see that the ARM ARM states that for Rm == PC it's unpredictable. But for Rn == PC, I can only see that it's unpredictable if W is 1 or P is 0 (I am looking at encoding A1). So I am struggling to understand that:

        ldr     r3, [pc,r3]

is unpredictable. Forgive me if I made a mistake, my knowledge of ARM is a little rusty.
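The reading of encoding A1 given in the comment above can be written down as a small check. This is a sketch of that reading only, not an authoritative statement of the ARM ARM: it covers just the two conditions discussed in this thread (Rm == PC, and Rn == PC with writeback, where writeback means P is 0 or W is 1) and ignores the other cases the manual lists. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of a 32-bit instruction word. */
static unsigned bits(uint32_t insn, unsigned hi, unsigned lo)
{
    return (insn >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Assumed field layout for LDR (register), ARM encoding A1:
   P = bit 24, W = bit 21, Rn = bits 19..16, Rm = bits 3..0.
   Only the conditions discussed in the thread are modelled. */
static bool ldr_reg_a1_is_unpredictable(uint32_t insn)
{
    unsigned p  = bits(insn, 24, 24);
    unsigned w  = bits(insn, 21, 21);
    unsigned rn = bits(insn, 19, 16);
    unsigned rm = bits(insn, 3, 0);
    bool wback  = (p == 0) || (w == 1);

    if (rm == 15)
        return true;            /* pc used as the index register */
    if (rn == 15 && wback)
        return true;            /* pc used as a base with writeback */
    return false;
}
```

With this reading, 0xE793300F (ldr r3, [r3, pc]) is flagged because Rm is pc, while 0xE79F3003 (ldr r3, [pc, r3]) is not, since it uses plain offset addressing with P = 1 and W = 0.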