[PATCH, ARM] Add a new target hook to compute the frame layout
Bernd Edlinger
bernd.edlinger@hotmail.de
Sat May 6 18:16:00 GMT 2017
On 05/05/17 16:59, Richard Earnshaw (lists) wrote:
> On 05/09/16 17:43, Bernd Edlinger wrote:
>> Hi Richard,
>>
>> what do you think of this patch, is it OK (with the suggested wording)?
>>
>
> Bernd,
>
> Apologies, this seems to have fallen through a crack.
>
> I'm happy with this. Does it still apply?
>
Yes, only in a few places the context lines changed slightly.
So I attached a refreshed patch for your reference.
> If so, I suggest applying it after a 24-hour cooling off period for any
> final comments.
>
Fine, then I will apply it on monday evening, unless someone objects.
Thanks
Bernd.
> R.
>
>>
>> Thanks
>> Bernd.
>>
>> On 08/05/16 16:06, Richard Earnshaw (lists) wrote:
>>> On 05/08/16 13:49, Bernd Edlinger wrote:
>>>> On 08/05/16 11:29, Richard Earnshaw (lists) wrote:
>>>>> On 04/08/16 22:16, Bernd Edlinger wrote:
>>>>>> Hi,
>>>>>>
>>>>>> this patch introduces a new target hook that allows the target's
>>>>>> INITIAL_ELIMINATION_OFFSET function to use cached values instead of
>>>>>> re-computing the frame layout every time.
>>>>>>
>>>>>> I have updated the documentation a bit and hope it is clearer this time.
>>>>>>
>>>>>> It still needs a review by ARM port maintainers.
>>>>>>
>>>>>> If the ARM port maintainers find this patch useful, that would be fine.
>>>>>>
>>>>>
>>>>> I need to look into this more, but my first thought was that the
>>>>> documentation is confusing, or there is a real problem in this patch.
>>>>>
>>>>
>>>> Thanks for your quick response.
>>>>
>>>> The documentation is actually the most difficult part for me.
>>>>
>>>>> As I understand it the frame has to be re-laid out each time the
>>>>> contents of the frame changes (an extra register becomes live or another
>>>>> spill slot is needed). So saying that it is laid out once can't be
>>>>> right if (as I read it initially) you mean 'once per function' since I
>>>>> think it needs to be 'once for each time the frame contents changes'.
>>>>>
>>>>> Of course, things might be a bit different with LRA compared with
>>>>> reload, but I strongly suspect this is never going to be an 'exactly
>>>>> once per function' operation.
>>>>>
>>>>
>>>> Right. It will be done 2 or 3 times for each function.
>>>> LRA and reload behave identical in that respect.
>>>>
>>>> But each time reload changes something in the input data the
>>>> INITIAL_EMIMINATION_OFFSET is called several times, and the results
>>>> have to be consistent in each iteration.
>>>>
>>>> The frame layout function has no way to know if the frame layout
>>>> is expected to change or not.
>>>>
>>>> Many targets use reload_completed as an indication when the frame layout
>>>> may not change at all, but that is only an approximation.
>>>>
>>>>> Can you clarify your meaning in the documentation please?
>>>>>
>>>>
>>>> I meant 'once' in the sense of 'once for each time the frame contents
>>>> change'.
>>>>
>>>> Thus I'd change that sentence to:
>>>>
>>>> "This target hook allows the target to compute the frame layout once for
>>>> each time the frame contents change and make use of the cached frame
>>>> layout in @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing it
>>>> on every invocation. This is particularly useful for targets that have
>>>> an expensive frame layout function. Implementing this callback is
>>>> optional."
>>>>
>>>
>>> Thanks, that's pretty much what I expected would be the case.
>>>
>>> Could I suggest:
>>>
>>> This target hook is called once each time the frame layout needs to be
>>> recalculated. The calculations can be cached by the target and can then
>>> be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the
>>> layout on every invocation of that hook. This is particularly useful
>>> for targets that have an expensive frame layout function. Implementing
>>> this callback is optional.
>>>
>>> R.
>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>>>
>>>>
>>>>> R.
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>>>
>>>>>> On 06/21/16 23:29, Jeff Law wrote:
>>>>>>> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>>
>>>>>>>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>>>>>>>> it is necessary to call this function several times with
>>>>>>>> different register combinations.
>>>>>>>> Most targets use a cached data structure that describes the
>>>>>>>> exact frame layout of the current function.
>>>>>>>>
>>>>>>>> It is safe to skip the computation when reload_completed = true,
>>>>>>>> and most targets do that already.
>>>>>>>>
>>>>>>>> However while reload is doing its work, it is not clear when to
>>>>>>>> do the computation and when not. This results in unnecessary
>>>>>>>> work. Computing the frame layout can be a simple function or an
>>>>>>>> arbitrarily complex one, that walks all instructions of the current
>>>>>>>> function for instance, which is more or less the common case.
>>>>>>>>
>>>>>>>>
>>>>>>>> This patch adds a new optional target hook that can be used
>>>>>>>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>>>>>>>> into a O(n) computation part, and a O(1) result function.
>>>>>>>>
>>>>>>>> The patch implements a compute_frame_layout target hook just
>>>>>>>> for ARM in the moment, to show the principle.
>>>>>>>> Other targets may also implement that hook, if it seems appropriate.
>>>>>>>>
>>>>>>>>
>>>>>>>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>> OK for trunk?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Bernd.
>>>>>>>>
>>>>>>>>
>>>>>>>> changelog-frame-layout.txt
>>>>>>>>
>>>>>>>>
>>>>>>>> 2016-06-16 Bernd Edlinger <bernd.edlinger@hotmail.de>
>>>>>>>>
>>>>>>>> * target.def (compute_frame_layout): New optional target hook.
>>>>>>>> * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>>>>> * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>>>>> * lra-eliminations.c (update_reg_eliminate): Call
>>>>>>>> compute_frame_layout
>>>>>>>> target hook.
>>>>>>>> * reload1.c (verify_initial_elim_offsets): Likewise.
>>>>>>>> * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>>>>> (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>>>>> (arm_get_frame_offsets): Split up into this ...
>>>>>>>> (arm_compute_frame_layout): ... and this function.
>>>>>>> The ARM maintainers would need to chime in on the ARM specific changes
>>>>>>> though.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Index: gcc/target.def
>>>>>>>> ===================================================================
>>>>>>>> --- gcc/target.def (Revision 233176)
>>>>>>>> +++ gcc/target.def (Arbeitskopie)
>>>>>>>> @@ -5245,8 +5245,19 @@ five otherwise. This is best for most machines.",
>>>>>>>> unsigned int, (void),
>>>>>>>> default_case_values_threshold)
>>>>>>>>
>>>>>>>> -/* Retutn true if a function must have and use a frame pointer. */
>>>>>>> s/Retutn/Return
>>>>>>>
>>>>>>>> +/* Optional callback to advise the target to compute the frame
>>>>>>>> layout. */
>>>>>>>> DEFHOOK
>>>>>>>> +(compute_frame_layout,
>>>>>>>> + "This target hook is called immediately before reload wants to call\n\
>>>>>>>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the
>>>>>>>> frame\n\
>>>>>>>> +layout instead of re-computing it on every invocation. This is
>>>>>>>> particularly\n\
>>>>>>>> +useful for targets that have an O(n) frame layout function.
>>>>>>>> Implementing\n\
>>>>>>>> +this callback is optional.",
>>>>>>>> + void, (void),
>>>>>>>> + hook_void_void)
>>>>>>> So the docs say "immediately before", but that's not actually reality in
>>>>>>> lra-eliminations. I think you can just say "This target hook is called
>>>>>>> before reload or lra-eliminations calls
>>>>>>> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>>>>>>>
>>>>>>>
>>>>>>> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?
>>>>>>>
>>>>>>> I'm OK with this conceptually. I think you need a minor doc update and
>>>>>>> OK from the ARM maintainers before it can be installed though.
>>>>>>>
>>>>>>> jeff
>>>>>>>
>>>>>>> changelog-frame-layout.txt
>>>>>>>
>>>>>>>
>>>>>>> 2016-06-16 Bernd Edlinger <bernd.edlinger@hotmail.de>
>>>>>>>
>>>>>>> * target.def (compute_frame_layout): New optional target hook.
>>>>>>> * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>>>> * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>>>> * lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
>>>>>>> target hook.
>>>>>>> * reload1.c (verify_initial_elim_offsets): Likewise.
>>>>>>> * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>>>> (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>>>> (arm_get_frame_offsets): Split up into this ...
>>>>>>> (arm_compute_frame_layout): ... and this function.
>>>>>>>
>>>>>>> patch-frame-layout.diff
>>>>>>>
>>>>>>>
>>>>>>> Index: gcc/config/arm/arm.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/config/arm/arm.c (revision 239144)
>>>>>>> +++ gcc/config/arm/arm.c (working copy)
>>>>>>> @@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
>>>>>>> static bool arm_needs_doubleword_align (machine_mode, const_tree);
>>>>>>> static int arm_compute_static_chain_stack_bytes (void);
>>>>>>> static arm_stack_offsets *arm_get_frame_offsets (void);
>>>>>>> +static void arm_compute_frame_layout (void);
>>>>>>> static void arm_add_gc_roots (void);
>>>>>>> static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
>>>>>>> unsigned HOST_WIDE_INT, rtx, rtx, int, int);
>>>>>>> @@ -663,6 +664,9 @@ static const struct attribute_spec arm_attribute_t
>>>>>>> #undef TARGET_SCALAR_MODE_SUPPORTED_P
>>>>>>> #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
>>>>>>>
>>>>>>> +#undef TARGET_COMPUTE_FRAME_LAYOUT
>>>>>>> +#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
>>>>>>> +
>>>>>>> #undef TARGET_FRAME_POINTER_REQUIRED
>>>>>>> #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
>>>>>>>
>>>>>>> @@ -3880,6 +3884,10 @@ use_simple_return_p (void)
>>>>>>> {
>>>>>>> arm_stack_offsets *offsets;
>>>>>>>
>>>>>>> + /* Note this function can be called before or after reload. */
>>>>>>> + if (!reload_completed)
>>>>>>> + arm_compute_frame_layout ();
>>>>>>> +
>>>>>>> offsets = arm_get_frame_offsets ();
>>>>>>> return offsets->outgoing_args != 0;
>>>>>>> }
>>>>>>> @@ -19370,7 +19378,7 @@ arm_compute_static_chain_stack_bytes (void)
>>>>>>>
>>>>>>> /* Compute a bit mask of which registers need to be
>>>>>>> saved on the stack for the current function.
>>>>>>> - This is used by arm_get_frame_offsets, which may add extra registers. */
>>>>>>> + This is used by arm_compute_frame_layout, which may add extra registers. */
>>>>>>>
>>>>>>> static unsigned long
>>>>>>> arm_compute_save_reg_mask (void)
>>>>>>> @@ -20926,12 +20934,25 @@ any_sibcall_could_use_r3 (void)
>>>>>>> alignment. */
>>>>>>>
>>>>>>>
>>>>>>> +/* Return cached stack offsets. */
>>>>>>> +
>>>>>>> +static arm_stack_offsets *
>>>>>>> +arm_get_frame_offsets (void)
>>>>>>> +{
>>>>>>> + struct arm_stack_offsets *offsets;
>>>>>>> +
>>>>>>> + offsets = &cfun->machine->stack_offsets;
>>>>>>> +
>>>>>>> + return offsets;
>>>>>>> +}
>>>>>>> +
>>>>>>> +
>>>>>>> /* Calculate stack offsets. These are used to calculate register elimination
>>>>>>> offsets and in prologue/epilogue code. Also calculates which registers
>>>>>>> should be saved. */
>>>>>>>
>>>>>>> -static arm_stack_offsets *
>>>>>>> -arm_get_frame_offsets (void)
>>>>>>> +static void
>>>>>>> +arm_compute_frame_layout (void)
>>>>>>> {
>>>>>>> struct arm_stack_offsets *offsets;
>>>>>>> unsigned long func_type;
>>>>>>> @@ -20943,19 +20964,6 @@ any_sibcall_could_use_r3 (void)
>>>>>>>
>>>>>>> offsets = &cfun->machine->stack_offsets;
>>>>>>>
>>>>>>> - /* We need to know if we are a leaf function. Unfortunately, it
>>>>>>> - is possible to be called after start_sequence has been called,
>>>>>>> - which causes get_insns to return the insns for the sequence,
>>>>>>> - not the function, which will cause leaf_function_p to return
>>>>>>> - the incorrect result.
>>>>>>> -
>>>>>>> - to know about leaf functions once reload has completed, and the
>>>>>>> - frame size cannot be changed after that time, so we can safely
>>>>>>> - use the cached value. */
>>>>>>> -
>>>>>>> - if (reload_completed)
>>>>>>> - return offsets;
>>>>>>> -
>>>>>>> /* Initially this is the size of the local variables. It will translated
>>>>>>> into an offset once we have determined the size of preceding data. */
>>>>>>> frame_size = ROUND_UP_WORD (get_frame_size ());
>>>>>>> @@ -21022,7 +21030,7 @@ any_sibcall_could_use_r3 (void)
>>>>>>> {
>>>>>>> offsets->outgoing_args = offsets->soft_frame;
>>>>>>> offsets->locals_base = offsets->soft_frame;
>>>>>>> - return offsets;
>>>>>>> + return;
>>>>>>> }
>>>>>>>
>>>>>>> /* Ensure SFP has the correct alignment. */
>>>>>>> @@ -21098,8 +21106,6 @@ any_sibcall_could_use_r3 (void)
>>>>>>> offsets->outgoing_args += 4;
>>>>>>> gcc_assert (!(offsets->outgoing_args & 7));
>>>>>>> }
>>>>>>> -
>>>>>>> - return offsets;
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> Index: gcc/doc/tm.texi
>>>>>>> ===================================================================
>>>>>>> --- gcc/doc/tm.texi (revision 239144)
>>>>>>> +++ gcc/doc/tm.texi (working copy)
>>>>>>> @@ -3693,6 +3693,14 @@ registers. This macro must be defined if @code{EL
>>>>>>> defined.
>>>>>>> @end defmac
>>>>>>>
>>>>>>> +@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void)
>>>>>>> +This target hook allows the target to compute the frame layout once and
>>>>>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}
>>>>>>> +instead of re-computing it on every invocation. This is particularly
>>>>>>> +useful for targets that have an expensive frame layout function.
>>>>>>> +Implementing this callback is optional.
>>>>>>> +@end deftypefn
>>>>>>> +
>>>>>>> @node Stack Arguments
>>>>>>> @subsection Passing Function Arguments on the Stack
>>>>>>> @cindex arguments on stack
>>>>>>> Index: gcc/doc/tm.texi.in
>>>>>>> ===================================================================
>>>>>>> --- gcc/doc/tm.texi.in (revision 239144)
>>>>>>> +++ gcc/doc/tm.texi.in (working copy)
>>>>>>> @@ -3227,6 +3227,8 @@ registers. This macro must be defined if @code{EL
>>>>>>> defined.
>>>>>>> @end defmac
>>>>>>>
>>>>>>> +@hook TARGET_COMPUTE_FRAME_LAYOUT
>>>>>>> +
>>>>>>> @node Stack Arguments
>>>>>>> @subsection Passing Function Arguments on the Stack
>>>>>>> @cindex arguments on stack
>>>>>>> Index: gcc/lra-eliminations.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/lra-eliminations.c (revision 239144)
>>>>>>> +++ gcc/lra-eliminations.c (working copy)
>>>>>>> @@ -1203,6 +1203,10 @@ update_reg_eliminate (bitmap insns_with_changed_of
>>>>>>> struct lra_elim_table *ep, *ep1;
>>>>>>> HARD_REG_SET temp_hard_reg_set;
>>>>>>>
>>>>>>> +#ifdef ELIMINABLE_REGS
>>>>>>> + targetm.compute_frame_layout ();
>>>>>>> +#endif
>>>>>>> +
>>>>>>> /* Clear self elimination offsets. */
>>>>>>> for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>>> self_elim_offsets[ep->from] = 0;
>>>>>>> Index: gcc/reload1.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/reload1.c (revision 239144)
>>>>>>> +++ gcc/reload1.c (working copy)
>>>>>>> @@ -3831,6 +3831,7 @@ verify_initial_elim_offsets (void)
>>>>>>> {
>>>>>>> struct elim_table *ep;
>>>>>>>
>>>>>>> + targetm.compute_frame_layout ();
>>>>>>> for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>>> {
>>>>>>> INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
>>>>>>> @@ -3855,6 +3856,7 @@ set_initial_elim_offsets (void)
>>>>>>> struct elim_table *ep = reg_eliminate;
>>>>>>>
>>>>>>> #ifdef ELIMINABLE_REGS
>>>>>>> + targetm.compute_frame_layout ();
>>>>>>> for (; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>>> {
>>>>>>> INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset);
>>>>>>> Index: gcc/target.def
>>>>>>> ===================================================================
>>>>>>> --- gcc/target.def (revision 239144)
>>>>>>> +++ gcc/target.def (working copy)
>>>>>>> @@ -5269,8 +5269,19 @@ five otherwise. This is best for most machines.",
>>>>>>> unsigned int, (void),
>>>>>>> default_case_values_threshold)
>>>>>>>
>>>>>>> -/* Retutn true if a function must have and use a frame pointer. */
>>>>>>> +/* Optional callback to advise the target to compute the frame layout. */
>>>>>>> DEFHOOK
>>>>>>> +(compute_frame_layout,
>>>>>>> + "This target hook allows the target to compute the frame layout once and\n\
>>>>>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}\n\
>>>>>>> +instead of re-computing it on every invocation. This is particularly\n\
>>>>>>> +useful for targets that have an expensive frame layout function.\n\
>>>>>>> +Implementing this callback is optional.",
>>>>>>> + void, (void),
>>>>>>> + hook_void_void)
>>>>>>> +
>>>>>>> +/* Return true if a function must have and use a frame pointer. */
>>>>>>> +DEFHOOK
>>>>>>> (frame_pointer_required,
>>>>>>> "This target hook should return @code{true} if a function must have and use\n\
>>>>>>> a frame pointer. This target hook is called in the reload pass. If its return\n\
>>>>>
>>>>
>>>
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: changelog-frame-layout.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20170506/1cf69149/attachment.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-frame-layout.diff
Type: text/x-patch
Size: 6915 bytes
Desc: patch-frame-layout.diff
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20170506/1cf69149/attachment.bin>
More information about the Gcc-patches
mailing list