This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] [4.2/4.1] PR/32004, tree-ssa caused in/out asm constraints to often need reloads
- From: Paolo Bonzini <bonzini at gnu dot org>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Eric Botcazou <ebotcazou at libertysurf dot fr>, Mark Mitchell <mark at codesourcery dot com>
- Date: Mon, 09 Jul 2007 17:28:53 -0400
- Subject: Re: [PATCH] [4.2/4.1] PR/32004, tree-ssa caused in/out asm constraints to often need reloads
- References: <4692A204.1020603@gnu.org>
Paolo Bonzini wrote:
This patch is the official submission of the 4.1/4.2 version of my patch
to fix PR32004. The PR, an ice-on-valid caused by in/out asm
constraints, is similar to PR21291 in spirit: in fact, the PR21291 test
case did fail on 4.3 before my patch; on 4.1/4.2 instead a different
test case is necessary, which was not reported by a user.
The patch is structured as a separate pass for simplicity (it is easier
to read the code) and to turn it off on functions where it cannot have
any effect.
The bulk of the backported patch is identical to the original, the only
exception being the updating of dataflow. On 4.3 the new df API is
used, while on 4.1/4.2 I use update_life_info_in_dirty_blocks.
UPDATE_LIFE_GLOBAL_RM_NOTES and PROP_DEATH_NOTES are probably overkill
but I used them for conservativeness in order to make sure dataflow info
is valid. Unfortunately, my original backport missed PROP_REG_INFO,
which is an important item as local and global register allocation are
the primary users of this information and (by design) no other pass runs
between the new pass and the register allocator.
Honestly, it's not any kind of latent bug. It was my mistake, and all I
can say is that the new dataflow spoiled me. :-) The way flow.c worked
(it required each pass to update what it changed) is the exact opposite
of the mindset of df-branch (each pass specifies the information it
needs). This is why we need conservativeness in calling
update_life_info_in_dirty_blocks; the patch I attach is as conservative
as it can be.
The patch is already there on 4.1 and 4.2, together with the fix to
include PROP_REG_INFO. The question is whether to leave it.
Pros:
- PR32004 is P1 at least according to the RM
Cons:
- the pass cannot be turned off with a command line flag
I won't put the simplicity (apparent or real) of the fix among the pros
or cons: the problem I encountered meant that apparently it wasn't that
simple, but I must say that the fix is a witness of how stupid is the
cause of the failure.
Eric, who is an RTL maintainer, has already said that he opposes the fix
to go on the 4.1 branch.
Are there any middle-end/global maintainers who are in favor, possibly
two? And equally, are there any other maintainers who agree with Eric?
I can't believe it. Here is the patch. :-(
Paolo
2007-07-06 Paolo Bonzini <bonzini@gnu.org>
PR middle-end/32004
* function.c (match_asm_constraints_1, rest_of_match_asm_constraints,
pass_match_asm_constraints): New.
* passes.c (init_optimization_passes): Add new pass.
* stmt.c (expand_asm_operands): Set cfun->has_asm_statement.
* function.h (struct function): Add has_asm_statement bit.
(current_function_has_asm_statement): New.
* tree-pass.h (pass_match_asm_constraints): New.
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 126418)
+++ tree-pass.h (revision 126419)
@@ -335,7 +335,7 @@ extern struct tree_opt_pass pass_life;
extern struct tree_opt_pass pass_combine;
extern struct tree_opt_pass pass_if_after_combine;
extern struct tree_opt_pass pass_partition_blocks;
-extern struct tree_opt_pass pass_partition_blocks;
+extern struct tree_opt_pass pass_match_asm_constraints;
extern struct tree_opt_pass pass_regmove;
extern struct tree_opt_pass pass_split_all_insns;
extern struct tree_opt_pass pass_mode_switching;
Index: ChangeLog
===================================================================
--- ChangeLog (revision 126418)
+++ ChangeLog (revision 126419)
@@ -1,3 +1,14 @@
+2007-07-06 Paolo Bonzini <bonzini@gnu.org>
+
+ PR middle-end/32004
+ * function.c (match_asm_constraints_1, rest_of_match_asm_constraints,
+ pass_match_asm_constraints): New.
+ * passes.c (init_optimization_passes): Add new pass.
+ * stmt.c (expand_asm_operands): Set cfun->has_asm_statement.
+ * function.h (struct function): Add has_asm_statement bit.
+ (current_function_has_asm_statement): New.
+ * tree-pass.h (pass_match_asm_constraints): New.
+
2007-07-06 Uros Bizjak <ubizjak@gmail.com>
PR rtl-optimization/32450
Index: testsuite/gcc.target/i386/pr21291.c
===================================================================
--- testsuite/gcc.target/i386/pr21291.c (revision 126418)
+++ testsuite/gcc.target/i386/pr21291.c (revision 126419)
@@ -1,3 +1,8 @@
+/* The asm has 2 "r" in/out operands, 1 earlyclobber "r" output, 1 "r"
+ input and 2 fixed "r" clobbers (eax and edx), so there are a total of
+ 6 registers that must not conflict. Add to that the PIC register,
+ the frame pointer, and the stack pointer, and we've run out of
+ registers on 32-bit targets. */
/* { dg-do compile } */
/* { dg-options "-O" } */
Index: function.c
===================================================================
--- function.c (revision 126418)
+++ function.c (revision 126419)
@@ -5597,6 +5597,137 @@ struct tree_opt_pass pass_leaf_regs =
0, /* todo_flags_finish */
0 /* letter */
};
+
+/* This mini-pass fixes fall-out from SSA in asm statements that have
+ in-out constraints. Say you start with
+
+ orig = inout;
+ asm ("": "+mr" (inout));
+ use (orig);
+
+ which is transformed very early to use explicit output and match operands:
+
+ orig = inout;
+ asm ("": "=mr" (inout) : "0" (inout));
+ use (orig);
+
+ Or, after SSA and copyprop,
+
+ asm ("": "=mr" (inout_2) : "0" (inout_1));
+ use (inout_1);
+
+ Clearly inout_2 and inout_1 can't be coalesced easily anymore, as
+ they represent two separate values, so they will get different pseudo
+ registers during expansion. Then, since the two operands need to match
+ per the constraints, but use different pseudo registers, reload can
+ only register a reload for these operands. But reloads can only be
+ satisfied by hardregs, not by memory, so we need a register for this
+ reload, just because we are presented with non-matching operands.
+ So, even though we allow memory for this operand, no memory can be
+ used for it, just because the two operands don't match. This can
+ cause reload failures on register-starved targets.
+
+ So it's a symptom of reload not being able to use memory for reloads
+ or, alternatively it's also a symptom of both operands not coming into
+ reload as matching (in which case the pseudo could go to memory just
+ fine, as the alternative allows it, and no reload would be necessary).
+ We fix the latter problem here, by transforming
+
+ asm ("": "=mr" (inout_2) : "0" (inout_1));
+
+ back to
+
+ inout_2 = inout_1;
+ asm ("": "=mr" (inout_2) : "0" (inout_2)); */
+
+static void
+match_asm_constraints_1 (rtx insn, rtx *p_sets, int noutputs)
+{
+ int i;
+ rtx op = SET_SRC (p_sets[0]);
+ int ninputs = ASM_OPERANDS_INPUT_LENGTH (op);
+ rtvec inputs = ASM_OPERANDS_INPUT_VEC (op);
+
+ for (i = 0; i < ninputs; i++)
+ {
+ rtx input, output, insns;
+ const char *constraint = ASM_OPERANDS_INPUT_CONSTRAINT (op, i);
+ char *end;
+ int match;
+
+ match = strtoul (constraint, &end, 10);
+ if (end == constraint)
+ continue;
+
+ gcc_assert (match < noutputs);
+ output = SET_DEST (p_sets[match]);
+ input = RTVEC_ELT (inputs, i);
+ if (rtx_equal_p (output, input)
+ || (GET_MODE (input) != VOIDmode
+ && GET_MODE (input) != GET_MODE (output)))
+ continue;
+
+ start_sequence ();
+ emit_move_insn (copy_rtx (output), input);
+ RTVEC_ELT (inputs, i) = copy_rtx (output);
+ insns = get_insns ();
+ end_sequence ();
+
+ emit_insn_before (insns, insn);
+ }
+}
+
+static void
+rest_of_match_asm_constraints (void)
+{
+ basic_block bb;
+ rtx insn, pat, *p_sets;
+ int noutputs;
+
+ if (!cfun->has_asm_statement)
+ return;
+
+ FOR_EACH_BB (bb)
+ {
+ FOR_BB_INSNS (bb, insn)
+ {
+ if (!INSN_P (insn))
+ continue;
+
+ pat = PATTERN (insn);
+ if (GET_CODE (pat) == PARALLEL)
+ p_sets = &XVECEXP (pat, 0, 0), noutputs = XVECLEN (pat, 0);
+ else if (GET_CODE (pat) == SET)
+ p_sets = &PATTERN (insn), noutputs = 1;
+ else
+ continue;
+
+ if (GET_CODE (*p_sets) == SET
+ && GET_CODE (SET_SRC (*p_sets)) == ASM_OPERANDS)
+ match_asm_constraints_1 (insn, p_sets, noutputs);
+ }
+ }
+
+ update_life_info_in_dirty_blocks (UPDATE_LIFE_GLOBAL_RM_NOTES,
+ PROP_DEATH_NOTES | PROP_REG_INFO);
+}
+
+struct tree_opt_pass pass_match_asm_constraints =
+{
+ "asmcons", /* name */
+ NULL, /* gate */
+ rest_of_match_asm_constraints, /* execute */
+ NULL, /* sub */
+ NULL, /* next */
+ 0, /* static_pass_number */
+ 0, /* tv_id */
+ 0, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_dump_func, /* todo_flags_finish */
+ 0 /* letter */
+};
#include "gt-function.h"
Index: function.h
===================================================================
--- function.h (revision 126418)
+++ function.h (revision 126419)
@@ -408,6 +408,9 @@ struct function GTY(())
/* Nonzero if function being compiled has nonlocal gotos to parent
function. */
unsigned int has_nonlocal_goto : 1;
+
+ /* Nonzero if function being compiled has an asm statement. */
+ unsigned int has_asm_statement : 1;
/* Nonzero if the current function is a thunk, i.e., a lightweight
function implemented by the output_mi_thunk hook) that just
@@ -506,6 +509,7 @@ extern int trampolines_created;
#define current_function_epilogue_delay_list (cfun->epilogue_delay_list)
#define current_function_has_nonlocal_label (cfun->has_nonlocal_label)
#define current_function_has_nonlocal_goto (cfun->has_nonlocal_goto)
+#define current_function_has_asm_statement (cfun->has_asm_statement)
#define return_label (cfun->x_return_label)
#define naked_return_label (cfun->x_naked_return_label)
Index: passes.c
===================================================================
--- passes.c (revision 126418)
+++ passes.c (revision 126419)
@@ -639,6 +639,7 @@ init_optimization_passes (void)
NEXT_PASS (pass_split_all_insns);
NEXT_PASS (pass_mode_switching);
NEXT_PASS (pass_recompute_reg_usage);
+ NEXT_PASS (pass_match_asm_constraints);
NEXT_PASS (pass_sms);
NEXT_PASS (pass_sched);
NEXT_PASS (pass_local_alloc);
Index: stmt.c
===================================================================
--- stmt.c (revision 126418)
+++ stmt.c (revision 126419)
@@ -1079,6 +1079,7 @@ expand_asm_operands (tree string, tree o
if (real_output_rtx[i])
emit_move_insn (real_output_rtx[i], output_rtx[i]);
+ cfun->has_asm_statement = 1;
free_temp_slots ();
}