This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall



On 01/05/15 19:51, Jeff Law wrote:
On 04/28/2015 03:54 AM, Kyrill Tkachov wrote:

On 27/04/15 21:13, Jeff Law wrote:
On 04/21/2015 11:33 AM, Kyrill Tkachov wrote:
On 21/04/15 15:09, Jeff Law wrote:
On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
   From reading config/stormy16/stormy-abi it seems to me that we
don't
pass arguments partially in stormy16, so this code would never be
called
there. That leaves pa as the potential problematic target.
I don't suppose there's an easy way to test on pa? My checkout of
binutils
doesn't seem to include a sim target for it.
No simulator, no machines in the testfarm, the box I had access to via
parisc-linux.org seems dead and my ancient PA overheats well before a
bootstrap could complete.  I often regret knowing about the backwards
way many things were done on the PA because it makes me think about
cases that only matter on dead architectures.
So what should be the action plan here? I can't add an assert on
positive result as a negative result is valid.

We want to catch the case where this would cause trouble on
pa, or change the patch until we're confident that it's fine
for pa.

That being said, reading the documentation of STACK_GROWS_UPWARD
and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
where this would cause trouble on pa.

Is the problem that in the function:

+/* Add SIZE to X and check whether it's greater than Y.
+   If it is, return the constant amount by which it's greater or
smaller.
+   If the two are not statically comparable (for example, X and Y
contain
+   different registers) return -1.  This is used in expand_push_insn to
+   figure out if reading SIZE bytes from location X will end up reading
from
+   location Y.  */
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -1;
+
+  return INTVAL (sub);
+}

for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
so the function should something like the following?
So I had to go back and compile some simple examples.

References to outgoing arguments will be SP relative. References to the
incoming arguments will be ARGP relative.  And that brings me to the
another issue.  Isn't X in this context the incoming argument slot and
the destination an outgoing argument slot?

If so, the approach of memory_load_overlap simply won't work on a target
with calling conventions like the PA.  And you might really want to
consider punting for these kind of calling conventions

Ok, thanks for the guidance.
How about this? This patch disables sibcall optimisation when
encountering a partial argument when ARGS_GROW_DOWNWARD &&
!STACK_GROWS_DOWNWARD.
Hopefully this shouldn't harm codegen on parisc if, as you say, it's
rare to have
partial arguments anyway on PA due to the large number of argument regs.

I tested this on arm and bootstrapped on x86_64.
I am now going through the process of getting access to a Debian PA
machine to
give it a test there (thanks Dave!)

Ok if testing comes clean?

Thanks,
Kyrill

2015-04-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65358
     * calls.c (expand_call): Cancel sibcall optimisation when encountering
     partial argument on targets with ARGS_GROW_DOWNWARD and
     !STACK_GROWS_DOWNWARD.
     * expr.c (memory_load_overlap): New function.
     (emit_push_insn): When pushing partial args to the stack would
     clobber the register part load the overlapping part into a pseudo
     and put it into the hard reg after pushing.

2015-04-28  Honggyu Kim  <hong.gyu.kim@lge.com>

     PR target/65358
     * gcc.dg/pr65358.c: New test.
The more I think about this, the more I think it's an ugly can of worms and maybe we should just disable sibcalls for partial arguments.  I doubt it's a big performance issue in general.

We already have quite a bit of code in calls.c to detect cases with partial argument overlap for the
explicit purpose of allowing sibcalls when partial arguments occur in the general case. However, that
code only detects when a partial argument overlaps with other arguments in a call. In this PR the
partial argument overlaps with itself. It would be a shame to disable sibcalls for all partial arguments
when there is already infrastructure in place to handle them.



In addition to the argument/stack direction stuff, I've been pondering the stack/frame/arg pointer issues. Your approach assumes that the incoming and outgoing areas are always referenced off the same base register. If they aren't, then the routine returns no overlap.

But we'd need to consider the case where we have a reference to the arg or frame pointer which later gets rewritten into a stack pointer relative address.

Is it too late at the point were you do the checks to reject the sibling call? If not, then maybe the overlap routine should return a tri-state. No overlap, overlap, don't know. The last would be used when the two addresses use a different register.

Ok, here is my attempt at that. The overlap functions returns -2 when it cannot staticall compare the
two pointers (i.e. when the base registers are different) and the caller then disables sibcalls.
The code in calls.c that calls this code will undo any emitted instructions in the meantime if sibcall
optimisation fails. This required me to change the type of emit_push_insn to bool and add an extra
parameter, so this patch touches a bit more code than the original version.

Bootstrapped on x86_64 and tested on arm. The testcase in this PR still performs a sibcall correctly on arm.

What do you think of this?

Thanks,
Kyrill


2015-05-11  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    PR target/65358
    * expr.c (memory_load_overlap): New function.
    (emit_push_insn): When pushing partial args to the stack would
    clobber the register part load the overlapping part into a pseudo
    and put it into the hard reg after pushing.  Change return type
    to bool.  Add bool argument.
    * expr.h (emit_push_insn): Change return type to bool.
    Add bool argument.
    * calls.c (expand_call): Cancel sibcall optimisation when encountering
    partial argument on targets with ARGS_GROW_DOWNWARD and
    !STACK_GROWS_DOWNWARD.
    (emit_library_call_value_1): Update callsite of emit_push_insn.
    (store_one_arg): Likewise.


2015-05-11  Honggyu Kim  <hong.gyu.kim@lge.com>

    PR target/65358
    * gcc.dg/pr65358.c: New test.


Jeff



commit 5b596f10846b6d3b143442a306801c8262d8b10a
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Mar 18 13:42:37 2015 +0000

    [expr.c] PR 65358 Avoid clobbering partial argument during sibcall

diff --git a/gcc/calls.c b/gcc/calls.c
index caa7d60..81ef2c9 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -3225,6 +3225,13 @@ expand_call (tree exp, rtx target, int ignore)
 	    {
 	      rtx_insn *before_arg = get_last_insn ();
 
+	     /* On targets with weird calling conventions (e.g. PA) it's
+		hard to ensure that all cases of argument overlap between
+		stack and registers work.  Play it safe and bail out.  */
+#if defined (ARGS_GROW_DOWNWARD) && !defined (STACK_GROWS_DOWNWARD)
+	      sibcall_failure = 1;
+	      break;
+#endif
 	      if (store_one_arg (&args[i], argblock, flags,
 				 adjusted_args_size.var != 0,
 				 reg_parm_stack_space)
@@ -4270,7 +4277,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 			  partial, reg, 0, argblock,
 			  GEN_INT (argvec[argnum].locate.offset.constant),
 			  reg_parm_stack_space,
-			  ARGS_SIZE_RTX (argvec[argnum].locate.alignment_pad));
+			  ARGS_SIZE_RTX (argvec[argnum].locate.alignment_pad), false);
 
 	  /* Now mark the segment we just used.  */
 	  if (ACCUMULATE_OUTGOING_ARGS)
@@ -4880,10 +4887,11 @@ store_one_arg (struct arg_data *arg, rtx argblock, int flags,
 
       /* This isn't already where we want it on the stack, so put it there.
 	 This can either be done with push or copy insns.  */
-      emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), NULL_RTX,
+      if (!emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), NULL_RTX,
 		      parm_align, partial, reg, used - size, argblock,
 		      ARGS_SIZE_RTX (arg->locate.offset), reg_parm_stack_space,
-		      ARGS_SIZE_RTX (arg->locate.alignment_pad));
+		      ARGS_SIZE_RTX (arg->locate.alignment_pad), true))
+	sibcall_failure = 1;
 
       /* Unless this is a partially-in-register argument, the argument is now
 	 in the stack.  */
@@ -4988,7 +4996,7 @@ store_one_arg (struct arg_data *arg, rtx argblock, int flags,
       emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), size_rtx,
 		      parm_align, partial, reg, excess, argblock,
 		      ARGS_SIZE_RTX (arg->locate.offset), reg_parm_stack_space,
-		      ARGS_SIZE_RTX (arg->locate.alignment_pad));
+		      ARGS_SIZE_RTX (arg->locate.alignment_pad), false);
 
       /* Unless this is a partially-in-register argument, the argument is now
 	 in the stack.
diff --git a/gcc/expr.c b/gcc/expr.c
index 25aa11f..712fa0b 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4121,12 +4121,35 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
 }
 #endif
 
+/* If reading SIZE bytes from X will end up reading from
+   Y return the number of bytes that overlap.  Return -1
+   if there is no overlap or -2 if we can't determing
+   (for example when X and Y have different base registers).  */
+
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -2;
+
+  HOST_WIDE_INT val = INTVAL (sub);
+
+  return IN_RANGE (val, 1, size) ? val : -1;
+}
+
 /* Generate code to push X onto the stack, assuming it has mode MODE and
    type TYPE.
    MODE is redundant except when X is a CONST_INT (since they don't
    carry mode info).
    SIZE is an rtx for the size of data to be copied (in bytes),
    needed only if X is BLKmode.
+   Return true if successful.  May return false if asked to push a
+   partial argument during a sibcall optimisation (as specified by
+   SIBCALL_P) and the incoming and outgoing pointers cannot be shown
+   to not overlap.
 
    ALIGN (in bits) is maximum alignment we can assume.
 
@@ -4152,11 +4175,11 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
    for arguments passed in registers.  If nonzero, it will be the number
    of bytes required.  */
 
-void
+bool
 emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 		unsigned int align, int partial, rtx reg, int extra,
 		rtx args_addr, rtx args_so_far, int reg_parm_stack_space,
-		rtx alignment_pad)
+		rtx alignment_pad, bool sibcall_p)
 {
   rtx xinner;
   enum direction stack_direction
@@ -4179,6 +4202,10 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   xinner = x;
 
+  int nregs = partial / UNITS_PER_WORD;
+  rtx *tmp_regs = NULL;
+  int overlapping = 0;
+
   if (mode == BLKmode
       || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode)))
     {
@@ -4309,6 +4336,43 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	     PARM_BOUNDARY.  Assume the caller isn't lying.  */
 	  set_mem_align (target, align);
 
+	  /* If part should go in registers and pushing to that part would
+	     overwrite some of the values that need to go into regs, load the
+	     overlapping values into temporary pseudos to be moved into the hard
+	     regs at the end after the stack pushing has completed.
+	     We cannot load them directly into the hard regs here because
+	     they can be clobbered by the block move expansions.
+	     See PR 65358.  */
+
+	  if (partial > 0 && reg != 0 && mode == BLKmode
+	      && GET_CODE (reg) != PARALLEL)
+	    {
+	      overlapping = memory_load_overlap (XEXP (x, 0), temp, partial);
+	      if (overlapping > 0)
+	        {
+		  gcc_assert (overlapping % UNITS_PER_WORD == 0);
+		  overlapping /= UNITS_PER_WORD;
+
+		  tmp_regs = XALLOCAVEC (rtx, overlapping);
+
+		  for (int i = 0; i < overlapping; i++)
+		    tmp_regs[i] = gen_reg_rtx (word_mode);
+
+		  for (int i = 0; i < overlapping; i++)
+		    emit_move_insn (tmp_regs[i],
+				    operand_subword_force (target, i, mode));
+	        }
+	      else if (overlapping == -1)
+		overlapping = 0;
+	      /* Could not determine whether there is overlap.
+	         Fail the sibcall.  */
+	      else
+		{
+		  overlapping = 0;
+		  if (sibcall_p)
+		    return false;
+		}
+	    }
 	  emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
 	}
     }
@@ -4363,12 +4427,13 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	 has a size a multiple of a word.  */
       for (i = size - 1; i >= not_stack; i--)
 	if (i >= not_stack + offset)
-	  emit_push_insn (operand_subword_force (x, i, mode),
+	  if (!emit_push_insn (operand_subword_force (x, i, mode),
 			  word_mode, NULL_TREE, NULL_RTX, align, 0, NULL_RTX,
 			  0, args_addr,
 			  GEN_INT (args_offset + ((i - not_stack + skip)
 						  * UNITS_PER_WORD)),
-			  reg_parm_stack_space, alignment_pad);
+			  reg_parm_stack_space, alignment_pad, sibcall_p))
+	    return false;
     }
   else
     {
@@ -4411,9 +4476,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	}
     }
 
-  /* If part should go in registers, copy that part
-     into the appropriate registers.  Do this now, at the end,
-     since mem-to-mem copies above may do function calls.  */
+  /* Move the partial arguments into the registers and any overlapping
+     values that we moved into the pseudos in tmp_regs.  */
   if (partial > 0 && reg != 0)
     {
       /* Handle calls that pass values in multiple non-contiguous locations.
@@ -4421,9 +4485,15 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
       if (GET_CODE (reg) == PARALLEL)
 	emit_group_load (reg, x, type, -1);
       else
-	{
+        {
 	  gcc_assert (partial % UNITS_PER_WORD == 0);
-	  move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode);
+	  move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode);
+
+	  for (int i = 0; i < overlapping; i++)
+	    emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg)
+						    + nregs - overlapping + i),
+			    tmp_regs[i]);
+
 	}
     }
 
@@ -4432,6 +4502,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   if (alignment_pad && args_addr == 0)
     anti_adjust_stack (alignment_pad);
+
+  return true;
 }
 
 /* Return X if X can be used as a subtarget in a sequence of arithmetic
diff --git a/gcc/expr.h b/gcc/expr.h
index 867852e..5fcc13f 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -218,8 +218,8 @@ extern rtx emit_move_resolve_push (machine_mode, rtx);
 extern rtx push_block (rtx, int, int);
 
 /* Generate code to push something onto the stack, given its mode and type.  */
-extern void emit_push_insn (rtx, machine_mode, tree, rtx, unsigned int,
-			    int, rtx, int, rtx, rtx, int, rtx);
+extern bool emit_push_insn (rtx, machine_mode, tree, rtx, unsigned int,
+			    int, rtx, int, rtx, rtx, int, rtx, bool);
 
 /* Expand an assignment that stores the value of FROM into TO.  */
 extern void expand_assignment (tree, tree, bool);
diff --git a/gcc/testsuite/gcc.dg/pr65358.c b/gcc/testsuite/gcc.dg/pr65358.c
new file mode 100644
index 0000000..ba89fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr65358.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct pack
+{
+  int fine;
+  int victim;
+  int killer;
+};
+
+int __attribute__ ((__noinline__, __noclone__))
+bar (int a, int b, struct pack p)
+{
+  if (a != 20 || b != 30)
+    __builtin_abort ();
+  if (p.fine != 40 || p.victim != 50 || p.killer != 60)
+    __builtin_abort ();
+  return 0;
+}
+
+int __attribute__ ((__noinline__, __noclone__))
+foo (int arg1, int arg2, int arg3, struct pack p)
+{
+  return bar (arg2, arg3, p);
+}
+
+int main (void)
+{
+  struct pack p = { 40, 50, 60 };
+
+  (void) foo (10, 20, 30, p);
+  return 0;
+}

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]