This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Where can I put the optimization of got for arm back end at?


Only the base address of the GOT is loaded once, in the function prologue.
The address of each individual global variable, however, is loaded from
the GOT again on every access to that variable after the expand pass. For
example, compile the following function with the options
-Os -fpic -mthumb:

extern int i;
int foo(int j)
{

  int t = i;
  i = j;
  return t;
}

After the expand pass I got:
...

(insn 6 4 7 2 src/./static_pic.c:5 (set (reg:SI 136)
        (unspec:SI [
                (const:SI (unspec:SI [
                            (const:SI (plus:SI (unspec:SI [
                                            (const_int 0 [0x0])
                                        ] 21)
                                    (const_int 4 [0x4])))
                        ] 24))
            ] 3)) -1 (nil))

(insn 7 6 8 2 src/./static_pic.c:5 (set (reg:SI 136)
        (unspec:SI [
                (reg:SI 136)
                (const_int 4 [0x4])
                (const_int 0 [0x0])
            ] 4)) -1 (nil))

(insn 8 7 2 2 src/./static_pic.c:5 (use (reg:SI 136)) -1 (nil))

(insn 2 8 3 2 src/./static_pic.c:3 (set (reg/v:SI 135 [ j ])
        (reg:SI 0 r0 [ j ])) -1 (nil))

(note 3 2 5 2 NOTE_INSN_FUNCTION_BEG)

(note 5 3 9 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(insn 9 5 10 3 src/./static_pic.c:5 (set (reg:SI 138)
        (unspec:SI [
                (symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
            ] 3)) -1 (nil))

(insn 10 9 11 3 src/./static_pic.c:5 (set (reg/f:SI 137)
        (mem/u/c:SI (plus:SI (reg:SI 136)
                (reg:SI 138)) [0 S4 A32])) -1 (expr_list:REG_EQUAL
(symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
        (nil)))

(insn 11 10 12 3 src/./static_pic.c:5 (set (reg/v:SI 133 [ t ])
        (mem/c/i:SI (reg/f:SI 137) [2 i+0 S4 A32])) -1 (nil))

(insn 12 11 13 3 src/./static_pic.c:6 (set (reg:SI 140)
        (unspec:SI [
                (symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
            ] 3)) -1 (nil))

(insn 13 12 14 3 src/./static_pic.c:6 (set (reg/f:SI 139)
        (mem/u/c:SI (plus:SI (reg:SI 136)
                (reg:SI 140)) [0 S4 A32])) -1 (expr_list:REG_EQUAL
(symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
        (nil)))

(insn 14 13 15 3 src/./static_pic.c:6 (set (mem/c/i:SI (reg/f:SI 139)
[2 i+0 S4 A32])
        (reg/v:SI 135 [ j ])) -1 (nil))

(insn 15 14 16 3 src/./static_pic.c:6 (set (reg:SI 134 [ <retval> ])
        (reg/v:SI 133 [ t ])) -1 (nil))

...

Insns 9 and 10 load the address of the global variable i for the first
access. Insns 12 and 13 load the address of i again for the second access.

After the cse1 pass I got:

(insn 6 4 7 2 src/./static_pic.c:5 (set (reg:SI 136)
        (unspec:SI [
                (const:SI (unspec:SI [
                            (const:SI (plus:SI (unspec:SI [
                                            (const_int 0 [0x0])
                                        ] 21)
                                    (const_int 4 [0x4])))
                        ] 24))
            ] 3)) 169 {pic_load_addr_thumb1} (nil))

(insn 7 6 8 2 src/./static_pic.c:5 (set (reg:SI 136)
        (unspec:SI [
                (reg:SI 136)
                (const_int 4 [0x4])
                (const_int 0 [0x0])
            ] 4)) 170 {pic_add_dot_plus_four} (nil))

(insn 8 7 2 2 src/./static_pic.c:5 (use (reg:SI 136)) -1 (nil))

(insn 2 8 3 2 src/./static_pic.c:3 (set (reg/v:SI 135 [ j ])
        (reg:SI 0 r0 [ j ])) 167 {*thumb1_movsi_insn} (nil))

(note 3 2 9 2 NOTE_INSN_FUNCTION_BEG)

(insn 9 3 10 2 src/./static_pic.c:5 (set (reg:SI 138)
        (unspec:SI [
                (symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
            ] 3)) 169 {pic_load_addr_thumb1} (nil))

(insn 10 9 11 2 src/./static_pic.c:5 (set (reg/f:SI 137)
        (mem/u/c:SI (plus:SI (reg:SI 136)
                (reg:SI 138)) [0 S4 A32])) 167 {*thumb1_movsi_insn}
(expr_list:REG_EQUAL (symbol_ref:SI ("i") [flags 0xc0]  <var_decl
0x7f9e42227000 i>)
        (nil)))

(insn 11 10 12 2 src/./static_pic.c:5 (set (reg/v:SI 133 [ t ])
        (mem/c/i:SI (reg/f:SI 137) [2 i+0 S4 A32])) 167
{*thumb1_movsi_insn} (nil))

(insn 12 11 13 2 src/./static_pic.c:6 (set (reg:SI 140)
        (reg:SI 138)) 167 {*thumb1_movsi_insn} (nil))

(insn 13 12 14 2 src/./static_pic.c:6 (set (reg/f:SI 139)
        (reg/f:SI 137)) 167 {*thumb1_movsi_insn} (expr_list:REG_EQUAL
(symbol_ref:SI ("i") [flags 0xc0]  <var_decl 0x7f9e42227000 i>)
        (nil)))

(insn 14 13 15 2 src/./static_pic.c:6 (set (mem/c/i:SI (reg/f:SI 137)
[2 i+0 S4 A32])
        (reg/v:SI 135 [ j ])) 167 {*thumb1_movsi_insn} (nil))

(insn 15 14 19 2 src/./static_pic.c:6 (set (reg:SI 134 [ <retval> ])
        (reg/v:SI 133 [ t ])) 167 {*thumb1_movsi_insn} (nil))

(insn 19 15 22 2 src/./static_pic.c:8 (set (reg/i:SI 0 r0)
        (reg/v:SI 133 [ t ])) 167 {*thumb1_movsi_insn} (nil))

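Now the address of the global variable i is loaded only once, by insns 9
and 10. The later access to i reuses the result of insn 10 (reg 137):
insn 13 has become a plain register copy, and the store in insn 14
addresses memory through reg 137 directly.

So the optimization is better placed after some cse/gcse passes, once
these redundant loads have been removed.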
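
To make the difference between the two dumps concrete, here is a rough C
model of what they compute. It is only a sketch: the got array,
i_storage, and all function names are made up for illustration, and the
real GOT is laid out by the linker rather than by C code.

```c
#include <assert.h>

/* Hypothetical stand-ins: a one-entry "GOT" holding the address of i. */
static int i_storage;
static int *got[1] = { &i_storage };

/* After expand: the address of i is fetched from the GOT for every
   access (insns 9/10, then again insns 12/13). */
int foo_after_expand(int j)
{
  int *addr1 = got[0];   /* insns 9 and 10 */
  int t = *addr1;        /* insn 11 */
  int *addr2 = got[0];   /* insns 12 and 13 */
  *addr2 = j;            /* insn 14 */
  return t;
}

/* After cse1: the second fetch is gone and the store reuses the
   address already held in reg 137. */
int foo_after_cse1(int j)
{
  int *addr = got[0];    /* insns 9 and 10 */
  int t = *addr;         /* insn 11 */
  *addr = j;             /* insn 14, through reg 137 */
  return t;
}

/* Both versions must return the old value of i and store j into it. */
void check_same_behaviour(void)
{
  i_storage = 1;
  assert(foo_after_expand(2) == 1 && i_storage == 2);
  i_storage = 1;
  assert(foo_after_cse1(2) == 1 && i_storage == 2);
}
```

The point of the model is just that a cost estimate made right after
expand would count two GOT loads for i, while one made after cse1 counts
one.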

On Mon, Apr 5, 2010 at 9:24 PM, Paul Yuan <yingbo.com@gmail.com> wrote:
> I remember that the GOT address is loaded only once, in the function
> prologue. It is not cse/gcse that removes the two load insns. For ARM,
> the GOT address is loaded into the sl register.
>
> So simplify_GOT should precede register allocation. Otherwise the
> compiler cannot exploit the freed register. I suggest that simplify_GOT
> be integrated into the expand pass, where we can consider different
> targets and the speed/size trade-off.
>
>
> On Fri, Apr 2, 2010 at 12:06 PM, Carrot Wei <carrot@google.com> wrote:
>>
>> This is really a good question!
>>
>> Consider the requirement of this optimization.
>>
>> 1. There should be at least 2 methods to load a global variable's
>> address from GOT. Usually it means using different relocation types.
>>
>> 2. By default all global variable accesses use the same method.
>>
>> 3. In some cases (fewer than X global variable accesses) method A is
>> better, in other cases method B is better.
>>
>> With these constraints a simplify_GOT optimization pass is applicable.
>> But these constraints are too weak. The new optimization pass can do
>> almost nothing except call a target-specific hook. I suspect such a
>> pass is acceptable.
>>
>> We can also add more constraints:
>>
>> 4. If we can restrict method A as following: first load the base
>> address of GOT into a register pic_reg, then the real global
>> variable's address is loaded as
>>       load offset_reg, the offset from GOT base to the GOT entry
>>       load address, [pic_reg + offset_reg]
>>
>> With this constraint the new pass knows there is a special register
>> pic_reg, so it can look for and count all uses of pic_reg. If all uses
>> are method A and the count is more than the target-specific threshold,
>> then the uses can be rewritten as method B. The method detection and
>> rewriting should be target-specific.
>>
>> I don't know how other targets handle global address access with
>> -fpic, or how many targets satisfy these 4 constraints.
>>
>> thanks
>> Guozhi
>>
>> On Fri, Apr 2, 2010 at 4:31 AM, Steven Bosscher <stevenb.gcc@gmail.com>
>> wrote:
>> > On Thu, Apr 1, 2010 at 8:10 PM, Andrew Haley <aph@redhat.com> wrote:
>> >> On 28/03/10 15:45, Carrot Wei wrote:
>> >>> Hi
>> >>>
>> >>> The detailed description of the optimization is at
>> >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43129. This is an ARM
>> >>> specific optimization.
>> >>>
>> >>> This optimization uses one less register (the register holding the
>> >>> GOT base). To get this benefit, the ideal place for it is before
>> >>> register allocation.
>> >>>
>> >>> Usually expand pass generates instructions to load global variable's
>> >>> address from GOT entry for each access of the global variable. Later
>> >>> cse/gcse passes can remove many of them. In order to precisely model
>> >>> the cost, this optimization should be put after some cse/gcse passes.
>> >>>
>> >>> So what is the best place for this optimization? Is there an
>> >>> existing pass that can be enhanced with this optimization? Or
>> >>> should I add a new pass?
>> >>
>> >> The obvious place is machine-dependent reorg, which is a very late
>> >> pass.
>> >
>> > Yes, and after register allocation, i.e. too late for Guozhi.
>> >
>> > Basically there is no place right now to stuff a pass like that.
>> > Question is: Is this optimization really, reallyreallyreally so target
>> > specific that a target-independent pass is not the better option?
>> >
>> > Ciao!
>> > Steven
>> >
>
>
>
> --
> Regards,
> Paul Yuan
>
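
The decision rule in constraint 4 of Guozhi's quoted message (count the
uses of pic_reg, check that all of them are method A, compare against a
target-specific threshold) could be sketched roughly as below. This is
not GCC code: the enum, the function name, and the idea of passing the
uses as a plain array are assumptions made for illustration, and the
direction of the comparison simply follows the wording of the quoted
text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of one use of pic_reg. */
enum pic_use_kind {
  PIC_USE_METHOD_A,  /* load offset_reg; load address, [pic_reg + offset_reg] */
  PIC_USE_OTHER      /* any use the pass does not recognise */
};

/* Return true when every use of pic_reg is a method A access and the
   number of uses is more than the (target-specific) threshold. */
bool should_rewrite_to_method_b(const enum pic_use_kind *uses, size_t n,
                                size_t threshold)
{
  assert(uses != NULL || n == 0);
  for (size_t k = 0; k < n; k++)
    if (uses[k] != PIC_USE_METHOD_A)
      return false;  /* an unrecognised use: leave everything alone */
  return n > threshold;
}
```

In a real pass both the classification of each use and the threshold
would have to come from the target, which is why the counts are only
meaningful after cse/gcse has removed the redundant loads.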

