On Wed, May 25, 2016 at 02:30:54PM +0100, Dominik Vogt wrote:
> On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:
> > Version two of the patch including a test case.
> >
> > On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
> > > On 04/29/2016 04:12 PM, Dominik Vogt wrote:
> > > >The attached patch removes excess stack space allocation with
> > > >alloca in some situations. Plese check the commit message in the
> > > >patch for details.
> >
> > > However, I would strongly recommend some tests, even if they are
> > > target specific. You can always copy pr36728-1 into the s390x
> > > directory and look at size of the generated stack. Simliarly for
> > > pr50938 for x86.
> >
> > However, x86 uses the "else" branch in round_push, i.e. it uses
> > "virtual_preferred_stack_boundary_rtx" to calculate the number of
> > bytes to add for stack alignment. That value is unknown at the
> > time round_push is called, so the test case fails on such targets,
> > and I've no idea how to fix this properly.
>
> Third version of the patch with the suggested cleanup in the first
> patch and the functional stuff in the second one. The first patch
> is based on Jeff's draft with the change suggested by Eric and
> more cleanup added by me.
This is the updated funtional patch. Re-tested with limited
effort, i.e. tested and bootstrapped on s390x biarch (but did not
look for performance regressions compared to version 2 of the
patch).
Ciao
Dominik ^_^ ^_^
-- Dominik Vogt IBM Germany
0002-v3-ChangeLog
gcc/ChangeLog
* explow.c (round_push): Use know adjustment.
(allocate_dynamic_stack_space): Pass known adjustment to round_push.
gcc/testsuite/ChangeLog
* gcc.dg/pr50938.c: New test.
0002-v3-Drop-excess-size-used-for-run-time-allocated-stack-v.patch
From 4296d353e1d153b5b5ee435a44cae6117bf2fff0 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Fri, 29 Apr 2016 08:36:59 +0100
Subject: [PATCH 2/2] Drop excess size used for run time allocated stack
variables.
The present calculation sometimes led to more stack memory being used than
necessary with alloca. First, (STACK_BOUNDARY -1) would be added to the
allocated size:
size = plus_constant (Pmode, size, extra);
size = force_operand (size, NULL_RTX);
Then round_push was called and added another (STACK_BOUNDARY - 1) before
rounding down to a multiple of STACK_BOUNDARY. On s390x this resulted in
adding 14 before rounding down for "x" in the test case pr36728-1.c.
round_push() now takes an argument to inform it about what has already been
added to size.
---
gcc/explow.c | 45 +++++++++++++++++++++---------------
gcc/testsuite/gcc.dg/pr50938.c | 52 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+), 18 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/pr50938.c
diff --git a/gcc/explow.c b/gcc/explow.c
index 09a0330..85596e2 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -949,24 +949,30 @@ anti_adjust_stack (rtx adjust)
}
/* Round the size of a block to be pushed up to the boundary required
- by this machine. SIZE is the desired size, which need not be constant. */
+ by this machine. SIZE is the desired size, which need not be constant.
+ ALREADY_ADDED is the number of units that have already been added to SIZE
+ for other alignment reasons.
+*/
static rtx
-round_push (rtx size)
+round_push (rtx size, int already_added)
{
- rtx align_rtx, alignm1_rtx;
+ rtx align_rtx, add_rtx;
if (!SUPPORTS_STACK_ALIGNMENT
|| crtl->preferred_stack_boundary == MAX_SUPPORTED_STACK_ALIGNMENT)
{
int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
+ int add;
if (align == 1)
return size;
+ add = (align > already_added) ? align - already_added - 1 : 0;
+
if (CONST_INT_P (size))
{
- HOST_WIDE_INT new_size = (INTVAL (size) + align - 1) / align * align;
+ HOST_WIDE_INT new_size = (INTVAL (size) + add) / align * align;
if (INTVAL (size) != new_size)
size = GEN_INT (new_size);