This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] [4.2/4.1] PR/32004, tree-ssa caused in/out asm constraints to often need reloads


Paolo Bonzini wrote:
This patch is the official submission of the 4.1/4.2 version of my patch to fix PR32004. The PR, an ice-on-valid caused by in/out asm constraints, is similar to PR21291 in spirit: in fact, the PR21291 test case did fail on 4.3 before my patch; on 4.1/4.2 instead a different test case is necessary, which was not reported by a user.

The patch is structured as a separate pass for simplicity (it is easier to read the code) and to turn it off on functions where it cannot have any effect.

The bulk of the backported patch is identical to the original, the only exception being the updating of dataflow. On 4.3 the new df API is used, while on 4.1/4.2 I use update_life_info_in_dirty_blocks.

UPDATE_LIFE_GLOBAL_RM_NOTES and PROP_DEATH_NOTES are probably overkill but I used them for conservativeness in order to make sure dataflow info is valid. Unfortunately, my original backport missed PROP_REG_INFO, which is an important item as local and global register allocation are the primary users of this information and (by design) no other pass runs between the new pass and the register allocator.

Honestly, it's not any kind of latent bug. It was my mistake, and all I can say is that the new dataflow spoiled me. :-) The way flow.c worked (it required each pass to update what it changed) is the exact opposite of the mindset of df-branch (each pass specifies the information it needs). This is why we need conservativeness in calling update_life_info_in_dirty_blocks; the patch I attach is as conservative as it can be.

The patch is already there on 4.1 and 4.2, together with the fix to include PROP_REG_INFO. The question is whether to leave it.

Pros:
- PR32004 is P1 at least according to the RM

Cons:
- the pass cannot be turned off with a command line flag

I won't put the simplicity (apparent or real) of the fix among the pros or cons: the problem I encountered meant that apparently it wasn't that simple, but I must say that the fix is a witness of how stupid is the cause of the failure.

Eric, who is an RTL maintainer, has already said that he opposes the fix to go on the 4.1 branch.

Are there any middle-end/global maintainers who are in favor, possibly two? And equally, are there any other maintainers who agree with Eric?

I can't believe it. Here is the patch. :-(


Paolo
2007-07-06  Paolo Bonzini  <bonzini@gnu.org>

	PR middle-end/32004
	* function.c (match_asm_constraints_1, rest_of_match_asm_constraints,
	pass_match_asm_constraints): New.
	* passes.c (init_optimization_passes): Add new pass.
	* stmt.c (expand_asm_operands): Set cfun->has_asm_statement.
	* function.h (struct function): Add has_asm_statement bit.
	(current_function_has_asm_statement): New.
	* tree-pass.h (pass_match_asm_constraints): New.

Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 126418)
+++ tree-pass.h	(revision 126419)
@@ -335,7 +335,7 @@ extern struct tree_opt_pass pass_life;
 extern struct tree_opt_pass pass_combine;
 extern struct tree_opt_pass pass_if_after_combine;
 extern struct tree_opt_pass pass_partition_blocks;
-extern struct tree_opt_pass pass_partition_blocks;
+extern struct tree_opt_pass pass_match_asm_constraints;
 extern struct tree_opt_pass pass_regmove;
 extern struct tree_opt_pass pass_split_all_insns;
 extern struct tree_opt_pass pass_mode_switching;
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 126418)
+++ ChangeLog	(revision 126419)
@@ -1,3 +1,14 @@
+2007-07-06  Paolo Bonzini  <bonzini@gnu.org>
+
+	PR middle-end/32004
+	* function.c (match_asm_constraints_1, rest_of_match_asm_constraints,
+	pass_match_asm_constraints): New.
+	* passes.c (init_optimization_passes): Add new pass.
+	* stmt.c (expand_asm_operands): Set cfun->has_asm_statement.
+	* function.h (struct function): Add has_asm_statement bit.
+	(current_function_has_asm_statement): New.
+	* tree-pass.h (pass_match_asm_constraints): New.
+
 2007-07-06  Uros Bizjak  <ubizjak@gmail.com>
 
 	PR rtl-optimization/32450
Index: testsuite/gcc.target/i386/pr21291.c
===================================================================
--- testsuite/gcc.target/i386/pr21291.c	(revision 126418)
+++ testsuite/gcc.target/i386/pr21291.c	(revision 126419)
@@ -1,3 +1,8 @@
+/* The asm has 2 "r" in/out operands, 1 earlyclobber "r" output, 1 "r"
+   input and 2 fixed "r" clobbers (eax and edx), so there are a total of
+   6 registers that must not conflict.  Add to that the PIC register,
+   the frame pointer, and the stack pointer, and we've run out of
+   registers on 32-bit targets.  */
 /* { dg-do compile } */
 /* { dg-options "-O" } */
 
Index: function.c
===================================================================
--- function.c	(revision 126418)
+++ function.c	(revision 126419)
@@ -5597,6 +5597,137 @@ struct tree_opt_pass pass_leaf_regs =
   0,                                    /* todo_flags_finish */
   0                                     /* letter */
 };
+
 
+/* This mini-pass fixes fall-out from SSA in asm statements that have
+   in-out constraints.  Say you start with 
+
+     orig = inout;
+     asm ("": "+mr" (inout));
+     use (orig);
+
+   which is transformed very early to use explicit output and match operands:
+
+     orig = inout;
+     asm ("": "=mr" (inout) : "0" (inout));
+     use (orig);
+
+   Or, after SSA and copyprop,
+
+     asm ("": "=mr" (inout_2) : "0" (inout_1));
+     use (inout_1);
+
+   Clearly inout_2 and inout_1 can't be coalesced easily anymore, as
+   they represent two separate values, so they will get different pseudo
+   registers during expansion.  Then, since the two operands need to match
+   per the constraints, but use different pseudo registers, reload can
+   only register a reload for these operands.  But reloads can only be
+   satisfied by hardregs, not by memory, so we need a register for this
+   reload, just because we are presented with non-matching operands.
+   So, even though we allow memory for this operand, no memory can be
+   used for it, just because the two operands don't match.  This can
+   cause reload failures on register-starved targets.
+
+   So it's a symptom of reload not being able to use memory for reloads
+   or, alternatively it's also a symptom of both operands not coming into
+   reload as matching (in which case the pseudo could go to memory just
+   fine, as the alternative allows it, and no reload would be necessary).
+   We fix the latter problem here, by transforming
+
+     asm ("": "=mr" (inout_2) : "0" (inout_1));
+
+   back to
+
+     inout_2 = inout_1;
+     asm ("": "=mr" (inout_2) : "0" (inout_2));  */
+
+static void
+match_asm_constraints_1 (rtx insn, rtx *p_sets, int noutputs)
+{
+  int i;
+  rtx op = SET_SRC (p_sets[0]);
+  int ninputs = ASM_OPERANDS_INPUT_LENGTH (op);
+  rtvec inputs = ASM_OPERANDS_INPUT_VEC (op);
+
+  for (i = 0; i < ninputs; i++)
+    {
+      rtx input, output, insns;
+      const char *constraint = ASM_OPERANDS_INPUT_CONSTRAINT (op, i);
+      char *end;
+      int match;
+
+      match = strtoul (constraint, &end, 10);
+      if (end == constraint)
+	continue;
+
+      gcc_assert (match < noutputs);
+      output = SET_DEST (p_sets[match]);
+      input = RTVEC_ELT (inputs, i);
+      if (rtx_equal_p (output, input)
+	  || (GET_MODE (input) != VOIDmode
+	      && GET_MODE (input) != GET_MODE (output)))
+	continue;
+
+      start_sequence ();
+      emit_move_insn (copy_rtx (output), input);
+      RTVEC_ELT (inputs, i) = copy_rtx (output);
+      insns = get_insns ();
+      end_sequence ();
+
+      emit_insn_before (insns, insn);
+    }
+}
+
+static void
+rest_of_match_asm_constraints (void)
+{
+  basic_block bb;
+  rtx insn, pat, *p_sets;
+  int noutputs;
+
+  if (!cfun->has_asm_statement)
+    return;
+
+  FOR_EACH_BB (bb)
+    {
+      FOR_BB_INSNS (bb, insn)
+	{
+	  if (!INSN_P (insn))
+	    continue;
+
+	  pat = PATTERN (insn);
+	  if (GET_CODE (pat) == PARALLEL)
+	    p_sets = &XVECEXP (pat, 0, 0), noutputs = XVECLEN (pat, 0);
+	  else if (GET_CODE (pat) == SET)
+	    p_sets = &PATTERN (insn), noutputs = 1;
+	  else
+	    continue;
+
+	  if (GET_CODE (*p_sets) == SET
+	      && GET_CODE (SET_SRC (*p_sets)) == ASM_OPERANDS)
+	    match_asm_constraints_1 (insn, p_sets, noutputs);
+	 }
+    }
+
+  update_life_info_in_dirty_blocks (UPDATE_LIFE_GLOBAL_RM_NOTES,
+                                    PROP_DEATH_NOTES | PROP_REG_INFO);
+}
+
+struct tree_opt_pass pass_match_asm_constraints =
+{
+  "asmcons",				/* name */
+  NULL,					/* gate */
+  rest_of_match_asm_constraints,	/* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  0,					/* tv_id */
+  0,                                    /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func,                       /* todo_flags_finish */
+  0                                     /* letter */
+};
 
 #include "gt-function.h"
Index: function.h
===================================================================
--- function.h	(revision 126418)
+++ function.h	(revision 126419)
@@ -408,6 +408,9 @@ struct function GTY(())
   /* Nonzero if function being compiled has nonlocal gotos to parent
      function.  */
   unsigned int has_nonlocal_goto : 1;
+  
+  /* Nonzero if function being compiled has an asm statement.  */
+  unsigned int has_asm_statement : 1;
 
   /* Nonzero if the current function is a thunk, i.e., a lightweight
      function implemented by the output_mi_thunk hook) that just
@@ -506,6 +509,7 @@ extern int trampolines_created;
 #define current_function_epilogue_delay_list (cfun->epilogue_delay_list)
 #define current_function_has_nonlocal_label (cfun->has_nonlocal_label)
 #define current_function_has_nonlocal_goto (cfun->has_nonlocal_goto)
+#define current_function_has_asm_statement (cfun->has_asm_statement)
 
 #define return_label (cfun->x_return_label)
 #define naked_return_label (cfun->x_naked_return_label)
Index: passes.c
===================================================================
--- passes.c	(revision 126418)
+++ passes.c	(revision 126419)
@@ -639,6 +639,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_split_all_insns);
   NEXT_PASS (pass_mode_switching);
   NEXT_PASS (pass_recompute_reg_usage);
+  NEXT_PASS (pass_match_asm_constraints);
   NEXT_PASS (pass_sms);
   NEXT_PASS (pass_sched);
   NEXT_PASS (pass_local_alloc);
Index: stmt.c
===================================================================
--- stmt.c	(revision 126418)
+++ stmt.c	(revision 126419)
@@ -1079,6 +1079,7 @@ expand_asm_operands (tree string, tree o
     if (real_output_rtx[i])
       emit_move_insn (real_output_rtx[i], output_rtx[i]);
 
+  cfun->has_asm_statement = 1;
   free_temp_slots ();
 }
 

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]