This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
PING: [Updated, PATCH] i386: Avoid stack realignment if possible
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Uros Bizjak <ubizjak at gmail dot com>, Jakub Jelinek <jakub at redhat dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 1 Sep 2017 11:48:58 -0700
- Subject: PING: [Updated, PATCH] i386: Avoid stack realignment if possible
- Authentication-results: sourceware.org; auth=none
On Sun, Aug 13, 2017 at 3:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Aug 07, 2017 at 08:58:49AM -0700, H.J. Lu wrote:
>> On Tue, Jul 25, 2017 at 7:54 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> > On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >>>> On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>> >>>>> On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> >>>>> > Hi!
>> >>>>> >
>> >>>>> > Honza recently changed the i?86 backend, so that it often doesn't
>> >>>>> > do -maccumulate-outgoing-args by default on x86_64.
>> >>>>> > Unfortunately, on some of the here included testcases this regressed
>> >>>>> > quite a bit the generated code. As AVX vectors are used, the dynamic
>> >>>>> > realignment code needs to assume e.g. that some of them will need to be
>> >>>>> > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>> >>>>> > need_drap early as well. But in when emitting the prologue/epilogue,
>> >>>>> > if need_drap is set, we don't perform the optimization for leaf functions
>> >>>>> > which have zero size stack frame, thus we end up with uselessly doing
>> >>>>> > dynamic stack realignment, setting up DRAP that nothing uses and later on
>> >>>>> > restore everything back.
>> >>>>> >
>> >>>>> > This patch improves it, if the DRAP register isn't live at the start of
>> >>>>> > entry bb successor and we aren't going to realign the stack, we don't
>> >>>>> > need DRAP at all, and even if we need DRAP register, that can't be the sole
>> >>>>> > reason for doing stack realignment, the prologue code is able to set up DRAP
>> >>>>> > even without dynamic stack realignment.
>> >>>>> >
>> >>>>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> >>>>> >
>> >>>>> > 2013-12-20 Jakub Jelinek <jakub@redhat.com>
>> >>>>> >
>> >>>>> > PR target/59501
>> >>>>> > * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
>> >>>>> > if !crtl->stack_realign_needed.
>> >>>>> > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
>> >>>>> > and stack_realign_needed will be false, clear drap_reg and need_drap.
>> >>>>> > Optimize leaf functions that don't need stack frame even if
>> >>>>> > crtl->need_drap.
>> >>>>> >
>> >>>>> > * gcc.target/i386/pr59501-1.c: New test.
>> >>>>> > * gcc.target/i386/pr59501-1a.c: New test.
>> >>>>> > * gcc.target/i386/pr59501-2.c: New test.
>> >>>>> > * gcc.target/i386/pr59501-2a.c: New test.
>> >>>>> > * gcc.target/i386/pr59501-3.c: New test.
>> >>>>> > * gcc.target/i386/pr59501-3a.c: New test.
>> >>>>> > * gcc.target/i386/pr59501-4.c: New test.
>> >>>>> > * gcc.target/i386/pr59501-4a.c: New test.
>> >>>>> > * gcc.target/i386/pr59501-5.c: New test.
>> >>>>> > * gcc.target/i386/pr59501-6.c: New test.
>> >
>> > LGTM, assuming Jakub is OK with the patch.
>> >
>> > Thanks,
>> > Uros.
>>
>> Jakub, can you take a look at this:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>>
>
> Here is the updated patch to fix
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81769
>
> OK for trunk?
>
> Thanks.
>
> H.J.
> ---
> ix86_finalize_stack_frame_flags has been extended to eliminate frame
> pointer when the new stack frame isn't needed with and without
> -maccumulate-outgoing-args as well as -fomit-frame-pointer. Since stack
> access with larger alignment may be optimized out, to decide if stack
> realignment is needed, we need to not only check for stack frame access,
> but also verify the alignment of stack frame access. Since alignment of
> memory access via arg_pointer is set up by caller, not by callee, we
> should find the maximum stack alignment from the stack frame access
> instructions via stack pointer and frame pointrer to avoid stack
> realignment when stack alignment needed is less than incoming stack
> boundary.
>
> gcc/
>
> PR target/59501
> PR target/81624
> PR target/81769
> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
> realign stack if stack alignment needed is less than incoming
> stack boundary.
>
> gcc/testsuite/
>
> PR target/59501
> PR target/81624
> PR target/81769
> * gcc.target/i386/pr59501-4a.c: Remove xfail.
> * gcc.target/i386/pr81769-1a.c: New test.
> * gcc.target/i386/pr81769-1b.c: Likewise.
> * gcc.target/i386/pr81769-2.c: Likewise.
> ---
> gcc/config/i386/i386.c | 143 ++++++++++++++++++-----------
> gcc/testsuite/gcc.target/i386/pr59501-4a.c | 2 +-
> gcc/testsuite/gcc.target/i386/pr81769-1a.c | 21 +++++
> gcc/testsuite/gcc.target/i386/pr81769-1b.c | 7 ++
> gcc/testsuite/gcc.target/i386/pr81769-2.c | 21 +++++
> 5 files changed, 138 insertions(+), 56 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/pr81769-1a.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr81769-1b.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr81769-2.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1d88e4f247a..2161f79b999 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -14237,6 +14237,11 @@ ix86_finalize_stack_frame_flags (void)
> add_to_hard_reg_set (&set_up_by_prologue, Pmode, ARG_POINTER_REGNUM);
> add_to_hard_reg_set (&set_up_by_prologue, Pmode,
> HARD_FRAME_POINTER_REGNUM);
> +
> + /* The preferred stack alignment is the minimum stack alignment. */
> + unsigned int stack_alignment = crtl->preferred_stack_boundary;
> + bool require_stack_frame = false;
> +
> FOR_EACH_BB_FN (bb, cfun)
> {
> rtx_insn *insn;
> @@ -14245,79 +14250,107 @@ ix86_finalize_stack_frame_flags (void)
> && requires_stack_frame_p (insn, prologue_used,
> set_up_by_prologue))
> {
> - if (crtl->stack_realign_needed != stack_realign)
> - recompute_frame_layout_p = true;
> - crtl->stack_realign_needed = stack_realign;
> - crtl->stack_realign_finalized = true;
> - if (recompute_frame_layout_p)
> - ix86_compute_frame_layout ();
> - return;
> + require_stack_frame = true;
> +
> + if (stack_realign)
> + {
> + /* Find the maximum stack alignment. */
> + subrtx_iterator::array_type array;
> + FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
> + if (MEM_P (*iter)
> + && (reg_mentioned_p (stack_pointer_rtx,
> + *iter)
> + || reg_mentioned_p (frame_pointer_rtx,
> + *iter)))
> + {
> + unsigned int alignment = MEM_ALIGN (*iter);
> + if (alignment > stack_alignment)
> + stack_alignment = alignment;
> + }
> + }
> }
> }
>
> - /* If drap has been set, but it actually isn't live at the start
> - of the function, there is no reason to set it up. */
> - if (crtl->drap_reg)
> + if (require_stack_frame)
> {
> - basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
> - if (! REGNO_REG_SET_P (DF_LR_IN (bb), REGNO (crtl->drap_reg)))
> + /* Stack frame is required. If stack alignment needed is less
> + than incoming stack boundary, don't realign stack. */
> + stack_realign = incoming_stack_boundary < stack_alignment;
> + if (!stack_realign)
> {
> - crtl->drap_reg = NULL_RTX;
> - crtl->need_drap = false;
> + crtl->max_used_stack_slot_alignment
> + = incoming_stack_boundary;
> + crtl->stack_alignment_needed
> + = incoming_stack_boundary;
> }
> }
> else
> - cfun->machine->no_drap_save_restore = true;
> -
> - frame_pointer_needed = false;
> - stack_realign = false;
> - crtl->max_used_stack_slot_alignment = incoming_stack_boundary;
> - crtl->stack_alignment_needed = incoming_stack_boundary;
> - crtl->stack_alignment_estimated = incoming_stack_boundary;
> - if (crtl->preferred_stack_boundary > incoming_stack_boundary)
> - crtl->preferred_stack_boundary = incoming_stack_boundary;
> - df_finish_pass (true);
> - df_scan_alloc (NULL);
> - df_scan_blocks ();
> - df_compute_regs_ever_live (true);
> - df_analyze ();
> -
> - if (flag_var_tracking)
> {
> - /* Since frame pointer is no longer available, replace it with
> - stack pointer - UNITS_PER_WORD in debug insns. */
> - df_ref ref, next;
> - for (ref = DF_REG_USE_CHAIN (HARD_FRAME_POINTER_REGNUM);
> - ref; ref = next)
> + /* If drap has been set, but it actually isn't live at the
> + start of the function, there is no reason to set it up. */
> + if (crtl->drap_reg)
> {
> - rtx_insn *insn = DF_REF_INSN (ref);
> - /* Make sure the next ref is for a different instruction,
> - so that we're not affected by the rescan. */
> - next = DF_REF_NEXT_REG (ref);
> - while (next && DF_REF_INSN (next) == insn)
> - next = DF_REF_NEXT_REG (next);
> -
> - if (DEBUG_INSN_P (insn))
> + basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
> + if (! REGNO_REG_SET_P (DF_LR_IN (bb),
> + REGNO (crtl->drap_reg)))
> + {
> + crtl->drap_reg = NULL_RTX;
> + crtl->need_drap = false;
> + }
> + }
> + else
> + cfun->machine->no_drap_save_restore = true;
> +
> + frame_pointer_needed = false;
> + stack_realign = false;
> + crtl->max_used_stack_slot_alignment = incoming_stack_boundary;
> + crtl->stack_alignment_needed = incoming_stack_boundary;
> + crtl->stack_alignment_estimated = incoming_stack_boundary;
> + if (crtl->preferred_stack_boundary > incoming_stack_boundary)
> + crtl->preferred_stack_boundary = incoming_stack_boundary;
> + df_finish_pass (true);
> + df_scan_alloc (NULL);
> + df_scan_blocks ();
> + df_compute_regs_ever_live (true);
> + df_analyze ();
> +
> + if (flag_var_tracking)
> + {
> + /* Since frame pointer is no longer available, replace it with
> + stack pointer - UNITS_PER_WORD in debug insns. */
> + df_ref ref, next;
> + for (ref = DF_REG_USE_CHAIN (HARD_FRAME_POINTER_REGNUM);
> + ref; ref = next)
> {
> - bool changed = false;
> - for (; ref != next; ref = DF_REF_NEXT_REG (ref))
> + rtx_insn *insn = DF_REF_INSN (ref);
> + /* Make sure the next ref is for a different instruction,
> + so that we're not affected by the rescan. */
> + next = DF_REF_NEXT_REG (ref);
> + while (next && DF_REF_INSN (next) == insn)
> + next = DF_REF_NEXT_REG (next);
> +
> + if (DEBUG_INSN_P (insn))
> {
> - rtx *loc = DF_REF_LOC (ref);
> - if (*loc == hard_frame_pointer_rtx)
> + bool changed = false;
> + for (; ref != next; ref = DF_REF_NEXT_REG (ref))
> {
> - *loc = plus_constant (Pmode,
> - stack_pointer_rtx,
> - -UNITS_PER_WORD);
> - changed = true;
> + rtx *loc = DF_REF_LOC (ref);
> + if (*loc == hard_frame_pointer_rtx)
> + {
> + *loc = plus_constant (Pmode,
> + stack_pointer_rtx,
> + -UNITS_PER_WORD);
> + changed = true;
> + }
> }
> + if (changed)
> + df_insn_rescan (insn);
> }
> - if (changed)
> - df_insn_rescan (insn);
> }
> }
> - }
>
> - recompute_frame_layout_p = true;
> + recompute_frame_layout_p = true;
> + }
> }
>
> if (crtl->stack_realign_needed != stack_realign)
> diff --git a/gcc/testsuite/gcc.target/i386/pr59501-4a.c b/gcc/testsuite/gcc.target/i386/pr59501-4a.c
> index 5c3cb683a2e..908c7f457b6 100644
> --- a/gcc/testsuite/gcc.target/i386/pr59501-4a.c
> +++ b/gcc/testsuite/gcc.target/i386/pr59501-4a.c
> @@ -5,4 +5,4 @@
> #include "pr59501-3a.c"
>
> /* Verify no dynamic realignment is performed. */
> -/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr81769-1a.c b/gcc/testsuite/gcc.target/i386/pr81769-1a.c
> new file mode 100644
> index 00000000000..8ebe7292184
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr81769-1a.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -mavx" } */
> +
> +typedef int v8si __attribute__ ((vector_size (32)));
> +typedef unsigned long long int u64 __attribute__ ((aligned(64)));
> +
> +
> +void
> +#ifndef __x86_64__
> +__attribute__((regparm(3)))
> +#endif
> +foo (u64 *idx, v8si *out_start, v8si *regions)
> +{
> + if (*idx < 20 ) {
> + v8si base = regions[*idx];
> + *out_start = base;
> + }
> +}
> +
> +/* Verify no dynamic realignment is performed. */
> +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr81769-1b.c b/gcc/testsuite/gcc.target/i386/pr81769-1b.c
> new file mode 100644
> index 00000000000..6505a5f0074
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr81769-1b.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -mavx -fno-omit-frame-pointer" } */
> +
> +#include "pr81769-1a.c"
> +
> +/* Verify no dynamic realignment is performed. */
> +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr81769-2.c b/gcc/testsuite/gcc.target/i386/pr81769-2.c
> new file mode 100644
> index 00000000000..e020db20227
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr81769-2.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -mavx -fno-omit-frame-pointer" } */
> +
> +typedef unsigned long long int u64 __attribute__ ((aligned(64)));
> +
> +void
> +#ifndef __x86_64__
> +__attribute__((regparm(3)))
> +#endif
> +foo (u64 *idx, unsigned int *out_start, unsigned int *out_end,
> + unsigned int *regions)
> +{
> + if (*idx < 20 ) {
> + unsigned int base = regions[*idx];
> + *out_start = base;
> + *out_end = base;
> + }
> +}
> +
> +/* Verify no dynamic realignment is performed. */
> +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" } } */
> --
> 2.13.4
>
PING.
--
H.J.