This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



patch to improve register preferencing in IRA and to *remove regmove* pass


  Tomorrow I'd like to commit the following patch.

  The patch removes the regmove pass.  This was discussed at the RA
BOF of the GNU Cauldron this summer.  The regmove pass makes a lot of
RTL transformations in the wrong place (too early), without knowledge
of the situation from the RA point of view (e.g. it generates reloads
for insn operands that only have some probability of being matched).
Some transformations are tried but never done (I've checked this on a
hundred C and Fortran files with >700K lines of code overall on 4
platforms: x86/x86-64, arm, ppc64, and s390).  Some transformations
are doubtful from the RA point of view (e.g. they increase register
pressure by making pseudo live ranges longer).

  I've found only one useful transformation in the regmove pass:

      dst = src                                    dst = src (src dies)
      ...      no dst or src modification  =>      src changed to dst
      src dies                                     ...

      It is a kind of value numbering technique that decreases
      register pressure by removing one live range.  The
      transformation is not triggered frequently (about 30 times in
      all of combine.c) and its effect is quite small, but it does no
      harm either.

      So I added the code to IRA practically without changes (it would
      be nice to do this in a more general way, e.g. not only in BB
      scope -- but I am not sure that it would make a visible
      improvement and I have no time to try this currently).
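
      The transformation above can be illustrated with a minimal
      sketch on a toy basic block.  This is not the code added to IRA:
      struct insn, NO_REG, last_use_after, set_between_p and
      shorten_live_ranges are all hypothetical names, and the sketch
      assumes that no register is live out of the block, so the last
      read of src in the block is the point where src dies.

```c
#include <assert.h>

#define NO_REG (-1)

/* Toy insn (hypothetical): dst = op (src1, src2).  A copy uses only
   src1 and has is_copy set.  */
struct insn
{
  int dst, src1, src2;
  int is_copy;
};

/* Return the index of the last insn after FROM that reads REG, or -1.  */
static int
last_use_after (const struct insn *bb, int n, int from, int reg)
{
  int last = -1;

  for (int i = from + 1; i < n; i++)
    if (bb[i].src1 == reg || bb[i].src2 == reg)
      last = i;
  return last;
}

/* Return nonzero if an insn in the range (FROM, TO] writes REG.  */
static int
set_between_p (const struct insn *bb, int from, int to, int reg)
{
  for (int i = from + 1; i <= to; i++)
    if (bb[i].dst == reg)
      return 1;
  return 0;
}

/* For each copy dst = src whose source dies later in the block with no
   intervening write to dst or src, rename the later reads of src to
   dst, so that src dies at the copy.  Assumes no register is live out
   of the block.  Return the number of renamed operands.  */
static int
shorten_live_ranges (struct insn *bb, int n)
{
  int renamed = 0;

  assert (n >= 0);
  for (int c = 0; c < n; c++)
    {
      int dst = bb[c].dst, src = bb[c].src1, death;

      if (! bb[c].is_copy || dst == src)
	continue;
      death = last_use_after (bb, n, c, src);
      if (death < 0
	  || set_between_p (bb, c, death, dst)
	  || set_between_p (bb, c, death, src))
	continue;
      for (int i = c + 1; i <= death; i++)
	{
	  if (bb[i].src1 == src)
	    bb[i].src1 = dst, renamed++;
	  if (bb[i].src2 == src)
	    bb[i].src2 = dst, renamed++;
	}
    }
  return renamed;
}
```

      For a block "r1 = r0; r2 = op (r0, r1); r3 = op (r0, r2)" the
      two later reads of r0 are renamed to r1, and r0 no longer
      overlaps with r1 after the copy.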

  Still, to achieve the same code performance without the regmove
pass, I needed to improve the code in IRA that implicitly replaces the
removed regmove transformations:

  o improving hard reg preferences.  As you know, RTL code can contain
    explicit hard regs.  Currently, a hard register in RTL code affects
    the costs of hard regs only for the pseudos involved in the hard
    register occurrence.  E.g.

     p1 = hr or
     p1 = op (hr, ...) and hr has some probability of being matched with p1.

    But if we have additionally

     p2 = p1

    the occurrence of hr does not affect the cost of hr for p2.  It
    can happen that p2 is assigned another hard register before p1 is
    assigned (most probably to hr).

    So we need a mechanism to propagate the hr preference to other
    pseudos too.

    Actually we already have such a mechanism: it propagates the
    preference for a hard register assigned to a pseudo to other
    pseudos connected to it through (one or more) copies.  It was
    implemented by Richard Sandiford.  So I implemented an analogous
    approach for hard registers explicitly occurring in RTL.

  o improving preference propagation of hard registers occurring in RTL
    and assigned to connected pseudos.  Let us look at the situation:

       p1 - p2 - p3,  where '-' means a copy

    and we are assigning p1, p2, p3 in that order.
    When we've just assigned hr1 to p1, we propagate the hr1
    preference to p2 and p3.  When we come to p2, we cannot assign
    hr1 for some reason and have to assign hr2.  In the current
    preference implementation p3 still has the hr1 preference, which
    is wrong.

    I implemented undoing preference propagation for such situations.

  o Currently IRA generates copies too aggressively for operands that
    might be matched, so I rewrote this code to generate copies more
    accurately.
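
  The propagation and its undoing for the p1 - p2 - p3 chain can be
sketched as follows.  This is a simplified model, not the code from
the patch: struct copy, struct record, propagate_pref and
undo_stale_prefs are hypothetical names, the bonus is taken as the
copy frequency alone (the real cost computation is more involved),
and HOP_DIVISOR = 4 is an assumed attenuation factor.  The patch
itself keeps per-allocno records of a hard register and a divisor
(struct update_cost_record); the sketch logs the applied deltas
directly instead.

```c
#include <assert.h>

#define N_ALLOCNOS 3
#define N_HARD_REGS 4
#define HOP_DIVISOR 4	/* Assumed attenuation per copy hop.  */
#define MAX_RECORDS 16

/* A copy (move) between two pseudos, weighted by execution frequency.  */
struct copy { int first, second, freq; };

/* One remembered cost change, so that it can be undone later.  */
struct record { int via, target, hard_regno, delta; };

/* cost[a][hr]: the lower the value, the more attractive hr is for a.  */
static int cost[N_ALLOCNOS][N_HARD_REGS];
static struct record records[MAX_RECORDS];
static int n_records;

/* Make HARD_REGNO cheaper for every pseudo reachable from START through
   copies.  Each hop divides the bonus by HOP_DIVISOR.  */
static void
propagate_pref (const struct copy *copies, int n_copies,
		int start, int hard_regno)
{
  int queue[N_ALLOCNOS], divisor[N_ALLOCNOS], from[N_ALLOCNOS];
  int queued[N_ALLOCNOS] = { 0 };
  int head = 0, tail = 0;

  queue[tail] = start, divisor[tail] = 1, from[tail] = -1, tail++;
  queued[start] = 1;
  while (head < tail)
    {
      int a = queue[head], d = divisor[head], f = from[head];

      head++;
      for (int i = 0; i < n_copies; i++)
	{
	  int other, delta;

	  if (copies[i].first == a)
	    other = copies[i].second;
	  else if (copies[i].second == a)
	    other = copies[i].first;
	  else
	    continue;
	  if (other == f)	/* Do not go back along the copy chain.  */
	    continue;
	  delta = copies[i].freq / d;
	  if (delta == 0)
	    continue;
	  cost[other][hard_regno] -= delta;
	  assert (n_records < MAX_RECORDS);
	  records[n_records++] = (struct record) { a, other, hard_regno, delta };
	  if (! queued[other])
	    {
	      queued[other] = 1;
	      queue[tail] = other, divisor[tail] = d * HOP_DIVISOR;
	      from[tail] = a, tail++;
	    }
	}
    }
}

/* VIA ended up in ASSIGNED_REGNO.  Undo the bonuses that were passed on
   through VIA for any other hard register.  */
static void
undo_stale_prefs (int via, int assigned_regno)
{
  for (int i = 0; i < n_records; i++)
    if (records[i].via == via && records[i].hard_regno != assigned_regno
	&& records[i].delta != 0)
      {
	cost[records[i].target][records[i].hard_regno] += records[i].delta;
	records[i].delta = 0;	/* Mark the record as consumed.  */
      }
}
```

  With allocnos 0, 1, 2 standing for p1, p2, p3 and two copies of
frequency 16, propagating hard register 1 from p1 lowers its cost by
16 for p2 and by 4 for p3.  If p2 then gets hard register 2,
undo_stale_prefs (1, 2) restores the original cost of hard register 1
for p3.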

  All these improvements were necessary to remove the most frequent
regmove optimization: satisfying matching constraints by generating a
reload insn, or by pseudo coalescing if one matching pseudo dies in
the insn.  Again, generating a reload insn is sometimes unnecessary,
as there may be other alternatives without matching constraints and
we don't know which alternative will finally be used.  The second
transformation (coalescing) is done without the overall register
pressure picture and can result in worse spilling as it makes live
ranges longer.

  The changes in the testsuite are necessary as IRA/LRA now generate
different code (more accurately, better code, as register shuffle
moves are removed in each case).

  So this patch removes a lot of code, decreases compilation time
(e.g. valgrind lackey reports about 0.4% fewer executed insns when
compiling GCC combine.i with -O2), generates code with about the same
performance (the best improvement I saw is a 0.5% SPEC2000
improvement on x86_64 in -O3 mode on a Haswell processor) and about
the same average code size for SPEC2000 (the differences are in the
hundredths of a percent range).

  It is a big change and I hope there are no serious objections to
this.  If somebody has them, please express them or inform me.

  Thanks, Vlad.

2013-10-28  Vladimir Makarov  <vmakarov@redhat.com>

	* regmove.c: Remove.
	* tree-pass.h (make_pass_regmove): Remove.
	* timevar.def (TV_REGMOVE): Remove.
	* passes.def (pass_regmove): Remove.
	* opts.c (default_options_table): Remove entry for regmove.
	* doc/passes.texi: Remove regmove pass description.
	* doc/invoke.texi (-foptimize-register-move, -fregmove): Remove
	options.
	(-fdump-rtl-regmove): Ditto.
	* common.opt (foptimize-register-move, fregmove): Remove.
	* Makefile.in (OBJS): Remove regmove.o.
	* ira-int.h (struct ira_allocno_pref, ira_pref_t): New structure
	and type.
	(struct ira_allocno): New member allocno_prefs.
	(ALLOCNO_PREFS): New macro.
	(ira_prefs, ira_prefs_num): New external vars.
	(ira_setup_alts, ira_get_dup_out_num, ira_debug_pref): New
	prototypes.
	(ira_debug_prefs, ira_debug_allocno_prefs, ira_create_pref):
	Ditto.
	(ira_add_allocno_pref, ira_remove_pref, ira_remove_allocno_prefs):
	Ditto.
	(ira_add_allocno_copy_to_list): Remove prototype.
	(ira_swap_allocno_copy_ends_if_necessary): Ditto.
	(ira_pref_iterator): New type.
	(ira_pref_iter_init, ira_pref_iter_cond): New functions.
	(FOR_EACH_PREF): New macro.
	* ira.c (commutative_constraint_p): Move from ira-conflicts.c.
	(ira_get_dup_out_num): Ditto. Rename from get_dup_num.  Modify the
	code.
	(ira_setup_alts): New function.
	(decrease_live_ranges_number): New function.
	(ira): Call the above function.
	* ira-build.c (ira_prefs, ira_prefs_num): New global vars.
	(ira_create_allocno): Initialize allocno prefs.
	(pref_pool, pref_vec): New static vars.
	(initiate_prefs, find_allocno_pref, ira_create_pref): New
	functions.
	(add_allocno_pref_to_list, ira_add_allocno_pref, print_pref): Ditto.
	(ira_debug_pref, print_prefs, ira_debug_prefs): Ditto.
	(print_allocno_prefs, ira_debug_allocno_prefs, finish_pref): Ditto.
	(ira_remove_pref, ira_remove_allocno_prefs, finish_prefs): Ditto.
	(ira_add_allocno_copy_to_list): Make static.  Rename to
	add_allocno_copy_to_list.
	(ira_swap_allocno_copy_ends_if_necessary): Make static.  Rename to
	swap_allocno_copy_ends_if_necessary.
	(remove_unnecessary_allocnos, remove_low_level_allocnos): Call
	ira_remove_allocno_prefs.
	(ira_flattening): Ditto.
	(ira_build): Call initiate_prefs, print_prefs.
	(ira_destroy): Call finish_prefs.
	* ira-color.c (struct update_cost_record): New.
	(struct allocno_color_data): Add new member update_cost_records.
	(update_cost_record_pool): New static var.
	(init_update_cost_records, get_update_cost_record): New functions.
	(free_update_cost_record_list, finish_update_cost_records): Ditto.
	(struct update_cost_queue_elem): Add member from.
	(initiate_cost_update): Call init_update_cost_records.
	(finish_cost_update): Call finish_update_cost_records.
	(queue_update_cost, get_next_update_cost): Add new param from.
	(update_allocno_cost, update_costs_from_allocno): New functions.
	(update_costs_from_prefs): Ditto.
	(update_copy_costs): Rename to update_costs_from_copies.
	(restore_costs_from_copies): New function.
	(update_conflict_hard_regno_costs): Don't go back.
	(assign_hard_reg): Call restore_costs_from_copies.  Add printing
	more debug info.
	(pop_allocnos): Add printing more debug info.
	(color_allocnos): Remove prefs for conflicting hard regs.
	Call update_costs_from_prefs.
	* ira-conflicts.c (commutative_constraint_p): Move to ira.c.
	(get_dup_num): Rename, modify, and move to ira.c.
	(process_regs_for_copy): Add prefs.
	(add_insn_allocno_copies): Put src as first arg of
	process_regs_for_copy.  Remove dead code.  Call ira_setup_alts.
	* ira-costs.c (record_reg_classes): Modify and move code into
	record_operands_costs.
	(find_costs_and_classes): Create prefs for the hard reg of small
	reg class.
	

2013-10-29  Vladimir Makarov  <vmakarov@redhat.com>

	* gcc.target/i386/fma_double_3.c: Use pattern for
	scan-assembler-times instead of just one insn name.
	* gcc.target/i386/fma_double_5.c: Ditto.
	* gcc.target/i386/fma_float_3.c: Ditto.
	* gcc.target/i386/fma_float_5.c: Ditto.
	* gcc.target/i386/l_fma_double_1.c: Ditto.
	* gcc.target/i386/l_fma_double_2.c: Ditto.
	* gcc.target/i386/l_fma_double_3.c: Ditto.
	* gcc.target/i386/l_fma_double_4.c: Ditto.
	* gcc.target/i386/l_fma_double_5.c: Ditto.
	* gcc.target/i386/l_fma_double_6.c: Ditto.
	* gcc.target/i386/l_fma_float_1.c: Ditto.
	* gcc.target/i386/l_fma_float_2.c: Ditto.
	* gcc.target/i386/l_fma_float_3.c: Ditto.
	* gcc.target/i386/l_fma_float_4.c: Ditto.
	* gcc.target/i386/l_fma_float_5.c: Ditto.
	* gcc.target/i386/l_fma_float_6.c: Ditto.


Index: Makefile.in
===================================================================
--- Makefile.in	(revision 204148)
+++ Makefile.in	(working copy)
@@ -1327,7 +1327,6 @@ OBJS = \
 	reg-stack.o \
 	regcprop.o \
 	reginfo.o \
-	regmove.o \
 	regrename.o \
 	regstat.o \
 	reload.o \
Index: common.opt
===================================================================
--- common.opt	(revision 204148)
+++ common.opt	(working copy)
@@ -1591,10 +1591,6 @@ fopt-info-
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fopt-info[-<type>=filename]	Dump compiler optimization details
 
-foptimize-register-move
-Common Report Var(flag_regmove) Optimization
-Do the full register move optimization pass
-
 foptimize-sibling-calls
 Common Report Var(flag_optimize_sibling_calls) Optimization
 Optimize sibling and tail recursive calls
@@ -1729,10 +1725,6 @@ freg-struct-return
 Common Report Var(flag_pcc_struct_return,0) Optimization
 Return small aggregates in registers
 
-fregmove
-Common Report Var(flag_regmove) Optimization
-Enables a register move optimization
-
 frename-registers
 Common Report Var(flag_rename_registers) Init(2) Optimization
 Perform a register renaming optimization pass
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 204148)
+++ doc/invoke.texi	(working copy)
@@ -388,13 +388,13 @@ Objective-C and Objective-C++ Dialects}.
 -fno-inline -fno-math-errno -fno-peephole -fno-peephole2 @gol
 -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol
 -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol
--fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls @gol
+-fomit-frame-pointer -foptimize-sibling-calls @gol
 -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol
 -fprefetch-loop-arrays -fprofile-report @gol
 -fprofile-correction -fprofile-dir=@var{path} -fprofile-generate @gol
 -fprofile-generate=@var{path} @gol
 -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
--freciprocal-math -free -fregmove -frename-registers -freorder-blocks @gol
+-freciprocal-math -free -frename-registers -freorder-blocks @gol
 -freorder-blocks-and-partition -freorder-functions @gol
 -frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol
 -frounding-math -fsched2-use-superblocks -fsched-pressure @gol
@@ -5822,10 +5822,6 @@ Dump after post-reload optimizations.
 @opindex fdump-rtl-pro_and_epilogue
 Dump after generating the function prologues and epilogues.
 
-@item -fdump-rtl-regmove
-@opindex fdump-rtl-regmove
-Dump after the register move pass.
-
 @item -fdump-rtl-sched1
 @itemx -fdump-rtl-sched2
 @opindex fdump-rtl-sched1
@@ -6738,7 +6734,6 @@ also turns on the following optimization
 -foptimize-sibling-calls @gol
 -fpartial-inlining @gol
 -fpeephole2 @gol
--fregmove @gol
 -freorder-blocks  -freorder-functions @gol
 -frerun-cse-after-loop  @gol
 -fsched-interblock  -fsched-spec @gol
@@ -7262,20 +7257,6 @@ registers after writing to their lower 3
 
 Enabled for x86 at levels @option{-O2}, @option{-O3}.
 
-@item -foptimize-register-move
-@itemx -fregmove
-@opindex foptimize-register-move
-@opindex fregmove
-Attempt to reassign register numbers in move instructions and as
-operands of other simple instructions in order to maximize the amount of
-register tying.  This is especially helpful on machines with two-operand
-instructions.
-
-Note @option{-fregmove} and @option{-foptimize-register-move} are the same
-optimization.
-
-Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.
-
 @item -fira-algorithm=@var{algorithm}
 Use the specified coloring algorithm for the integrated register
 allocator.  The @var{algorithm} argument can be @samp{priority}, which
Index: doc/passes.texi
===================================================================
--- doc/passes.texi	(revision 204148)
+++ doc/passes.texi	(working copy)
@@ -791,14 +791,6 @@ RTL expressions for the instructions by
 result using algebra, and then attempts to match the result against
 the machine description.  The code is located in @file{combine.c}.
 
-@item Register movement
-
-This pass looks for cases where matching constraints would force an
-instruction to need a reload, and this reload would be a
-register-to-register move.  It then attempts to change the registers
-used by the instruction to avoid the move instruction.  The code is
-located in @file{regmove.c}.
-
 @item Mode switching optimization
 
 This pass looks for instructions that require the processor to be in a
@@ -837,11 +829,6 @@ them on the stack.  This is done in seve
 
 @itemize @bullet
 @item
-Register move optimizations.  This pass makes some simple RTL code
-transformations which improve the subsequent register allocation.  The
-source file is @file{regmove.c}.
-
-@item
 The integrated register allocator (@acronym{IRA}).  It is called
 integrated because coalescing, register live range splitting, and hard
 register preferencing are done on-the-fly during coloring.  It also
Index: ira-build.c
===================================================================
--- ira-build.c	(revision 204148)
+++ ira-build.c	(working copy)
@@ -79,6 +79,13 @@ int ira_objects_num;
 /* Map a conflict id to its conflict record.  */
 ira_object_t *ira_object_id_map;
 
+/* Array of references to all allocno preferences.  The order number
+   of the preference corresponds to the index in the array.  */
+ira_pref_t *ira_prefs;
+
+/* Size of the previous array.  */
+int ira_prefs_num;
+
 /* Array of references to all copies.  The order number of the copy
    corresponds to the index in the array.  Removed copies have NULL
    element value.  */
@@ -515,6 +522,7 @@ ira_create_allocno (int regno, bool cap_
   ALLOCNO_BAD_SPILL_P (a) = false;
   ALLOCNO_ASSIGNED_P (a) = false;
   ALLOCNO_MODE (a) = (regno < 0 ? VOIDmode : PSEUDO_REGNO_MODE (regno));
+  ALLOCNO_PREFS (a) = NULL;
   ALLOCNO_COPIES (a) = NULL;
   ALLOCNO_HARD_REG_COSTS (a) = NULL;
   ALLOCNO_CONFLICT_HARD_REG_COSTS (a) = NULL;
@@ -1163,6 +1171,195 @@ finish_allocnos (void)
 
 
 
+/* Pools for allocno preferences.  */
+static alloc_pool pref_pool;
+
+/* Vec containing references to all created preferences.  It is a
+   container of array ira_prefs.  */
+static vec<ira_pref_t> pref_vec;
+
+/* The function initializes data concerning allocno prefs.  */
+static void
+initiate_prefs (void)
+{
+  pref_pool
+    = create_alloc_pool ("prefs", sizeof (struct ira_allocno_pref), 100);
+  pref_vec.create (get_max_uid ());
+  ira_prefs = NULL;
+  ira_prefs_num = 0;
+}
+
+/* Return pref for A and HARD_REGNO if any.  */
+static ira_pref_t
+find_allocno_pref (ira_allocno_t a, int hard_regno)
+{
+  ira_pref_t pref;
+
+  for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = pref->next_pref)
+    if (pref->allocno == a && pref->hard_regno == hard_regno)
+      return pref;
+  return NULL;
+}
+
+/* Create and return pref with given attributes A, HARD_REGNO, and FREQ.  */
+ira_pref_t
+ira_create_pref (ira_allocno_t a, int hard_regno, int freq)
+{
+  ira_pref_t pref;
+
+  pref = (ira_pref_t) pool_alloc (pref_pool);
+  pref->num = ira_prefs_num;
+  pref->allocno = a;
+  pref->hard_regno = hard_regno;
+  pref->freq = freq;
+  pref_vec.safe_push (pref);
+  ira_prefs = pref_vec.address ();
+  ira_prefs_num = pref_vec.length ();
+  return pref;
+}
+
+/* Attach a pref PREF to the corresponding allocno.  */
+static void
+add_allocno_pref_to_list (ira_pref_t pref)
+{
+  ira_allocno_t a = pref->allocno;
+
+  pref->next_pref = ALLOCNO_PREFS (a);
+  ALLOCNO_PREFS (a) = pref;
+}
+
+/* Create (or update frequency if the pref already exists) the pref of
+   allocno A preferring HARD_REGNO with frequency FREQ.  */
+void
+ira_add_allocno_pref (ira_allocno_t a, int hard_regno, int freq)
+{
+  ira_pref_t pref;
+
+  if (freq <= 0)
+    return;
+  if ((pref = find_allocno_pref (a, hard_regno)) != NULL)
+    {
+      pref->freq += freq;
+      return;
+    }
+  pref = ira_create_pref (a, hard_regno, freq);
+  ira_assert (a != NULL);
+  add_allocno_pref_to_list (pref);
+}
+
+/* Print info about PREF into file F.  */
+static void
+print_pref (FILE *f, ira_pref_t pref)
+{
+  fprintf (f, "  pref%d:a%d(r%d)<-hr%d@%d\n", pref->num,
+	   ALLOCNO_NUM (pref->allocno), ALLOCNO_REGNO (pref->allocno),
+	   pref->hard_regno, pref->freq);
+}
+
+/* Print info about PREF into stderr.  */
+void
+ira_debug_pref (ira_pref_t pref)
+{
+  print_pref (stderr, pref);
+}
+
+/* Print info about all prefs into file F.  */
+static void
+print_prefs (FILE *f)
+{
+  ira_pref_t pref;
+  ira_pref_iterator pi;
+
+  FOR_EACH_PREF (pref, pi)
+    print_pref (f, pref);
+}
+
+/* Print info about all prefs into stderr.  */
+void
+ira_debug_prefs (void)
+{
+  print_prefs (stderr);
+}
+
+/* Print info about prefs involving allocno A into file F.  */
+static void
+print_allocno_prefs (FILE *f, ira_allocno_t a)
+{
+  ira_pref_t pref;
+
+  fprintf (f, " a%d(r%d):", ALLOCNO_NUM (a), ALLOCNO_REGNO (a));
+  for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = pref->next_pref)
+    fprintf (f, " pref%d:hr%d@%d", pref->num, pref->hard_regno, pref->freq);
+  fprintf (f, "\n");
+}
+
+/* Print info about prefs involving allocno A into stderr.  */
+void
+ira_debug_allocno_prefs (ira_allocno_t a)
+{
+  print_allocno_prefs (stderr, a);
+}
+
+/* The function frees memory allocated for PREF.  */
+static void
+finish_pref (ira_pref_t pref)
+{
+  ira_prefs[pref->num] = NULL;
+  pool_free (pref_pool, pref);
+}
+
+/* Remove PREF from the list of allocno prefs and free memory for
+   it.  */
+void
+ira_remove_pref (ira_pref_t pref)
+{
+  ira_pref_t cpref, prev;
+
+  if (internal_flag_ira_verbose > 1 && ira_dump_file != NULL)
+    fprintf (ira_dump_file, " Removing pref%d:hr%d@%d\n",
+	     pref->num, pref->hard_regno, pref->freq);
+  for (prev = NULL, cpref = ALLOCNO_PREFS (pref->allocno);
+       cpref != NULL;
+       prev = cpref, cpref = cpref->next_pref)
+    if (cpref == pref)
+      break;
+  ira_assert (cpref != NULL);
+  if (prev == NULL)
+    ALLOCNO_PREFS (pref->allocno) = pref->next_pref;
+  else
+    prev->next_pref = pref->next_pref;
+  finish_pref (pref);
+}
+
+/* Remove all prefs of allocno A.  */
+void
+ira_remove_allocno_prefs (ira_allocno_t a)
+{
+  ira_pref_t pref, next_pref;
+
+  for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = next_pref)
+    {
+      next_pref = pref->next_pref;
+      finish_pref (pref);
+    }
+  ALLOCNO_PREFS (a) = NULL;
+}
+
+/* Free memory allocated for all prefs.  */
+static void
+finish_prefs (void)
+{
+  ira_pref_t pref;
+  ira_pref_iterator pi;
+
+  FOR_EACH_PREF (pref, pi)
+    finish_pref (pref);
+  pref_vec.release ();
+  free_alloc_pool (pref_pool);
+}
+
+
+
 /* Pools for copies.  */
 static alloc_pool copy_pool;
 
@@ -1235,8 +1432,8 @@ ira_create_copy (ira_allocno_t first, ir
 }
 
 /* Attach a copy CP to allocnos involved into the copy.  */
-void
-ira_add_allocno_copy_to_list (ira_copy_t cp)
+static void
+add_allocno_copy_to_list (ira_copy_t cp)
 {
   ira_allocno_t first = cp->first, second = cp->second;
 
@@ -1264,8 +1461,8 @@ ira_add_allocno_copy_to_list (ira_copy_t
 
 /* Make a copy CP a canonical copy where number of the
    first allocno is less than the second one.  */
-void
-ira_swap_allocno_copy_ends_if_necessary (ira_copy_t cp)
+static void
+swap_allocno_copy_ends_if_necessary (ira_copy_t cp)
 {
   ira_allocno_t temp;
   ira_copy_t temp_cp;
@@ -1305,8 +1502,8 @@ ira_add_allocno_copy (ira_allocno_t firs
   cp = ira_create_copy (first, second, freq, constraint_p, insn,
 			loop_tree_node);
   ira_assert (first != NULL && second != NULL);
-  ira_add_allocno_copy_to_list (cp);
-  ira_swap_allocno_copy_ends_if_necessary (cp);
+  add_allocno_copy_to_list (cp);
+  swap_allocno_copy_ends_if_necessary (cp);
   return cp;
 }
 
@@ -2305,6 +2502,7 @@ remove_unnecessary_allocnos (void)
 		     map to avoid info propagation of subsequent
 		     allocno into this already removed allocno.  */
 		  a_node->regno_allocno_map[regno] = NULL;
+		  ira_remove_allocno_prefs (a);
 		  finish_allocno (a);
 		}
 	    }
@@ -2388,7 +2586,10 @@ remove_low_level_allocnos (void)
 #endif
 	}
       else
-	finish_allocno (a);
+	{
+	  ira_remove_allocno_prefs (a);
+	  finish_allocno (a);
+	}
     }
   if (merged_p)
     ira_rebuild_start_finish_chains ();
@@ -3105,6 +3306,7 @@ ira_flattening (int max_regno_before_emi
 	  if (internal_flag_ira_verbose > 4 && ira_dump_file != NULL)
 	    fprintf (ira_dump_file, "      Remove a%dr%d\n",
 		     ALLOCNO_NUM (a), REGNO (allocno_emit_reg (a)));
+	  ira_remove_allocno_prefs (a);
 	  finish_allocno (a);
 	  continue;
 	}
@@ -3131,8 +3333,8 @@ ira_flattening (int max_regno_before_emi
       ira_assert
 	(ALLOCNO_LOOP_TREE_NODE (cp->first) == ira_loop_tree_root
 	 && ALLOCNO_LOOP_TREE_NODE (cp->second) == ira_loop_tree_root);
-      ira_add_allocno_copy_to_list (cp);
-      ira_swap_allocno_copy_ends_if_necessary (cp);
+      add_allocno_copy_to_list (cp);
+      swap_allocno_copy_ends_if_necessary (cp);
     }
   rebuild_regno_allocno_maps ();
   if (ira_max_point != ira_max_point_before_emit)
@@ -3220,6 +3422,7 @@ ira_build (void)
   df_analyze ();
   initiate_cost_vectors ();
   initiate_allocnos ();
+  initiate_prefs ();
   initiate_copies ();
   create_loop_tree_nodes ();
   form_loop_tree ();
@@ -3265,6 +3468,8 @@ ira_build (void)
     }
   if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
     print_copies (ira_dump_file);
+  if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
+    print_prefs (ira_dump_file);
   if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL)
     {
       int n, nr, nr_big;
@@ -3304,6 +3509,7 @@ void
 ira_destroy (void)
 {
   finish_loop_tree_nodes ();
+  finish_prefs ();
   finish_copies ();
   finish_allocnos ();
   finish_cost_vectors ();
Index: ira-color.c
===================================================================
--- ira-color.c	(revision 204148)
+++ ira-color.c	(working copy)
@@ -88,6 +88,17 @@ struct allocno_hard_regs_node
   allocno_hard_regs_node_t parent, first, prev, next;
 };
 
+/* Info about changing hard reg costs of an allocno.  */
+struct update_cost_record
+{
+  /* Hard regno for which we changed the cost.  */
+  int hard_regno;
+  /* Divisor used when we changed the cost of HARD_REGNO.  */
+  int divisor;
+  /* Next record for given allocno.  */
+  struct update_cost_record *next;
+};
+
 /* To decrease footprint of ira_allocno structure we store all data
    needed only for coloring in the following structure.  */
 struct allocno_color_data
@@ -126,6 +137,11 @@ struct allocno_color_data
   int hard_regs_subnodes_start;
   /* The length of the previous array. */
   int hard_regs_subnodes_num;
+  /* Records about updating allocno hard reg costs from copies.  If
+     the allocno did not get expected hard register, these records are
+     used to restore original hard reg costs of allocnos connected to
+     this allocno by copies.  */
+  struct update_cost_record *update_cost_records;
 };
 
 /* See above.  */
@@ -1113,6 +1129,53 @@ setup_profitable_hard_regs (void)
 /* This page contains functions used to choose hard registers for
    allocnos.  */
 
+/* Pool for update cost records.  */
+static alloc_pool update_cost_record_pool;
+
+/* Initiate update cost records.  */
+static void
+init_update_cost_records (void)
+{
+  update_cost_record_pool
+    = create_alloc_pool ("update cost records",
+			 sizeof (struct update_cost_record), 100);
+}
+
+/* Return new update cost record with given params.  */
+static struct update_cost_record *
+get_update_cost_record (int hard_regno, int divisor,
+			struct update_cost_record *next)
+{
+  struct update_cost_record *record;
+
+  record = (struct update_cost_record *) pool_alloc (update_cost_record_pool);
+  record->hard_regno = hard_regno;
+  record->divisor = divisor;
+  record->next = next;
+  return record;
+}
+
+/* Free memory for all records in LIST.  */
+static void
+free_update_cost_record_list (struct update_cost_record *list)
+{
+  struct update_cost_record *next;
+
+  while (list != NULL)
+    {
+      next = list->next;
+      pool_free (update_cost_record_pool, list);
+      list = next;
+    }
+}
+
+/* Free memory allocated for all update cost records.  */
+static void
+finish_update_cost_records (void)
+{
+  free_alloc_pool (update_cost_record_pool);
+}
+
 /* Array whose element value is TRUE if the corresponding hard
    register was already allocated for an allocno.  */
 static bool allocated_hardreg_p[FIRST_PSEUDO_REGISTER];
@@ -1129,6 +1192,11 @@ struct update_cost_queue_elem
      connecting this allocno to the one being allocated.  */
   int divisor;
 
+  /* Allocno from which we are changing costs of connected allocnos.
+     It is used not to go back in graph of allocnos connected by
+     copies.  */
+  ira_allocno_t from;
+
   /* The next allocno in the queue, or null if this is the last element.  */
   ira_allocno_t next;
 };
@@ -1145,11 +1213,11 @@ static struct update_cost_queue_elem *up
    Elements are indexed by ALLOCNO_NUM.  */
 static struct update_cost_queue_elem *update_cost_queue_elems;
 
-/* The current value of update_copy_cost call count.  */
+/* The current value of update_costs_from_copies call count.  */
 static int update_cost_check;
 
 /* Allocate and initialize data necessary for function
-   update_copy_costs.  */
+   update_costs_from_copies.  */
 static void
 initiate_cost_update (void)
 {
@@ -1160,13 +1228,15 @@ initiate_cost_update (void)
     = (struct update_cost_queue_elem *) ira_allocate (size);
   memset (update_cost_queue_elems, 0, size);
   update_cost_check = 0;
+  init_update_cost_records ();
 }
 
-/* Deallocate data used by function update_copy_costs.  */
+/* Deallocate data used by function update_costs_from_copies.  */
 static void
 finish_cost_update (void)
 {
   ira_free (update_cost_queue_elems);
+  finish_update_cost_records ();
 }
 
 /* When we traverse allocnos to update hard register costs, the cost
@@ -1182,10 +1252,10 @@ start_update_cost (void)
   update_cost_queue = NULL;
 }
 
-/* Add (ALLOCNO, DIVISOR) to the end of update_cost_queue, unless
+/* Add (ALLOCNO, FROM, DIVISOR) to the end of update_cost_queue, unless
    ALLOCNO is already in the queue, or has NO_REGS class.  */
 static inline void
-queue_update_cost (ira_allocno_t allocno, int divisor)
+queue_update_cost (ira_allocno_t allocno, ira_allocno_t from, int divisor)
 {
   struct update_cost_queue_elem *elem;
 
@@ -1194,6 +1264,7 @@ queue_update_cost (ira_allocno_t allocno
       && ALLOCNO_CLASS (allocno) != NO_REGS)
     {
       elem->check = update_cost_check;
+      elem->from = from;
       elem->divisor = divisor;
       elem->next = NULL;
       if (update_cost_queue == NULL)
@@ -1204,11 +1275,11 @@ queue_update_cost (ira_allocno_t allocno
     }
 }
 
-/* Try to remove the first element from update_cost_queue.  Return false
-   if the queue was empty, otherwise make (*ALLOCNO, *DIVISOR) describe
-   the removed element.  */
+/* Try to remove the first element from update_cost_queue.  Return
+   false if the queue was empty, otherwise make (*ALLOCNO, *FROM,
+   *DIVISOR) describe the removed element.  */
 static inline bool
-get_next_update_cost (ira_allocno_t *allocno, int *divisor)
+get_next_update_cost (ira_allocno_t *allocno, ira_allocno_t *from, int *divisor)
 {
   struct update_cost_queue_elem *elem;
 
@@ -1217,34 +1288,50 @@ get_next_update_cost (ira_allocno_t *all
 
   *allocno = update_cost_queue;
   elem = &update_cost_queue_elems[ALLOCNO_NUM (*allocno)];
+  *from = elem->from;
   *divisor = elem->divisor;
   update_cost_queue = elem->next;
   return true;
 }
 
-/* Update the cost of allocnos to increase chances to remove some
-   copies as the result of subsequent assignment.  */
+/* Increase costs of HARD_REGNO by UPDATE_COST for ALLOCNO.  Return
+   true if we really modified the cost.  */
+static bool
+update_allocno_cost (ira_allocno_t allocno, int hard_regno, int update_cost)
+{
+  int i;
+  enum reg_class aclass = ALLOCNO_CLASS (allocno);
+
+  i = ira_class_hard_reg_index[aclass][hard_regno];
+  if (i < 0)
+    return false;
+  ira_allocate_and_set_or_copy_costs
+    (&ALLOCNO_UPDATED_HARD_REG_COSTS (allocno), aclass,
+     ALLOCNO_UPDATED_CLASS_COST (allocno),
+     ALLOCNO_HARD_REG_COSTS (allocno));
+  ira_allocate_and_set_or_copy_costs
+    (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (allocno),
+     aclass, 0, ALLOCNO_CONFLICT_HARD_REG_COSTS (allocno));
+  ALLOCNO_UPDATED_HARD_REG_COSTS (allocno)[i] += update_cost;
+  ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (allocno)[i] += update_cost;
+  return true;
+}
+
+/* Update (decrease if DECR_P) HARD_REGNO cost of allocnos connected
+   by copies to ALLOCNO to increase chances to remove some copies as
+   the result of subsequent assignment.  Record cost updates if
+   RECORD_P is true.  */
 static void
-update_copy_costs (ira_allocno_t allocno, bool decr_p)
+update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
+			   int divisor, bool decr_p, bool record_p)
 {
-  int i, cost, update_cost, hard_regno, divisor;
+  int cost, update_cost;
   enum machine_mode mode;
   enum reg_class rclass, aclass;
-  ira_allocno_t another_allocno;
+  ira_allocno_t another_allocno, from = NULL;
   ira_copy_t cp, next_cp;
 
-  hard_regno = ALLOCNO_HARD_REGNO (allocno);
-  ira_assert (hard_regno >= 0);
-
-  aclass = ALLOCNO_CLASS (allocno);
-  if (aclass == NO_REGS)
-    return;
-  i = ira_class_hard_reg_index[aclass][hard_regno];
-  ira_assert (i >= 0);
   rclass = REGNO_REG_CLASS (hard_regno);
-
-  start_update_cost ();
-  divisor = 1;
   do
     {
       mode = ALLOCNO_MODE (allocno);
@@ -1264,6 +1351,9 @@ update_copy_costs (ira_allocno_t allocno
 	  else
 	    gcc_unreachable ();
 
+	  if (another_allocno == from)
+	    continue;
+
 	  aclass = ALLOCNO_CLASS (another_allocno);
 	  if (! TEST_HARD_REG_BIT (reg_class_contents[aclass],
 				   hard_regno)
@@ -1280,24 +1370,67 @@ update_copy_costs (ira_allocno_t allocno
 	  if (update_cost == 0)
 	    continue;
 
-	  ira_allocate_and_set_or_copy_costs
-	    (&ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno), aclass,
-	     ALLOCNO_UPDATED_CLASS_COST (another_allocno),
-	     ALLOCNO_HARD_REG_COSTS (another_allocno));
-	  ira_allocate_and_set_or_copy_costs
-	    (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno),
-	     aclass, 0, ALLOCNO_CONFLICT_HARD_REG_COSTS (another_allocno));
-	  i = ira_class_hard_reg_index[aclass][hard_regno];
-	  if (i < 0)
+	  if (! update_allocno_cost (another_allocno, hard_regno, update_cost))
 	    continue;
-	  ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno)[i] += update_cost;
-	  ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno)[i]
-	    += update_cost;
-
-	  queue_update_cost (another_allocno, divisor * COST_HOP_DIVISOR);
+	  queue_update_cost (another_allocno, allocno, divisor * COST_HOP_DIVISOR);
+	  if (record_p && ALLOCNO_COLOR_DATA (another_allocno) != NULL)
+	    ALLOCNO_COLOR_DATA (another_allocno)->update_cost_records
+	      = get_update_cost_record (hard_regno, divisor,
+					ALLOCNO_COLOR_DATA (another_allocno)
+					->update_cost_records);
 	}
     }
-  while (get_next_update_cost (&allocno, &divisor));
+  while (get_next_update_cost (&allocno, &from, &divisor));
+}
+
+/* Decrease preferred ALLOCNO hard register costs and costs of
+   allocnos connected to ALLOCNO through copy.  */
+static void
+update_costs_from_prefs (ira_allocno_t allocno)
+{
+  ira_pref_t pref;
+
+  start_update_cost ();
+  for (pref = ALLOCNO_PREFS (allocno); pref != NULL; pref = pref->next_pref)
+    update_costs_from_allocno (allocno, pref->hard_regno,
+			       COST_HOP_DIVISOR, true, true);
+}
+
+/* Update (decrease if DECR_P) the cost of allocnos connected to
+   ALLOCNO through copies to increase chances to remove some copies as
+   the result of subsequent assignment.  ALLOCNO was just assigned to
+   a hard register.  */
+static void
+update_costs_from_copies (ira_allocno_t allocno, bool decr_p)
+{
+  int hard_regno;
+
+  hard_regno = ALLOCNO_HARD_REGNO (allocno);
+  ira_assert (hard_regno >= 0 && ALLOCNO_CLASS (allocno) != NO_REGS);
+  start_update_cost ();
+  update_costs_from_allocno (allocno, hard_regno, 1, decr_p, true);
+}
+
+/* Restore costs of allocnos connected to ALLOCNO by copies to what
+   they were before the costs were updated from ALLOCNO.  This is a
+   wise thing to do: if ALLOCNO did not get the expected hard reg,
+   the smaller cost of that hard reg for allocnos connected to it by
+   copies becomes actually misleading.  Free all update cost records
+   for ALLOCNO as we don't need them anymore.  */
+static void
+restore_costs_from_copies (ira_allocno_t allocno)
+{
+  struct update_cost_record *records, *curr;
+
+  if (ALLOCNO_COLOR_DATA (allocno) == NULL)
+    return;
+  records = ALLOCNO_COLOR_DATA (allocno)->update_cost_records;
+  start_update_cost ();
+  for (curr = records; curr != NULL; curr = curr->next)
+    update_costs_from_allocno (allocno, curr->hard_regno,
+			       curr->divisor, true, false);
+  free_update_cost_record_list (records);
+  ALLOCNO_COLOR_DATA (allocno)->update_cost_records = NULL;
 }
 
 /* This function updates COSTS (decrease if DECR_P) for hard_registers
@@ -1313,10 +1446,10 @@ update_conflict_hard_regno_costs (int *c
   int *conflict_costs;
   bool cont_p;
   enum reg_class another_aclass;
-  ira_allocno_t allocno, another_allocno;
+  ira_allocno_t allocno, another_allocno, from;
   ira_copy_t cp, next_cp;
 
-  while (get_next_update_cost (&allocno, &divisor))
+  while (get_next_update_cost (&allocno, &from, &divisor))
     for (cp = ALLOCNO_COPIES (allocno); cp != NULL; cp = next_cp)
       {
 	if (cp->first == allocno)
@@ -1331,6 +1464,10 @@ update_conflict_hard_regno_costs (int *c
 	  }
 	else
 	  gcc_unreachable ();
+
+	if (another_allocno == from)
+	  continue;
+
  	another_aclass = ALLOCNO_CLASS (another_allocno);
  	if (! ira_reg_classes_intersect_p[aclass][another_aclass]
 	    || ALLOCNO_ASSIGNED_P (another_allocno)
@@ -1374,7 +1511,7 @@ update_conflict_hard_regno_costs (int *c
 			   * COST_HOP_DIVISOR
 			   * COST_HOP_DIVISOR
 			   * COST_HOP_DIVISOR))
-	  queue_update_cost (another_allocno, divisor * COST_HOP_DIVISOR);
+	  queue_update_cost (another_allocno, allocno, divisor * COST_HOP_DIVISOR);
       }
 }
 
@@ -1640,7 +1777,8 @@ assign_hard_reg (ira_allocno_t a, bool r
 		      continue;
 		    full_costs[j] -= conflict_costs[k];
 		  }
-	      queue_update_cost (conflict_a, COST_HOP_DIVISOR);
+	      queue_update_cost (conflict_a, NULL, COST_HOP_DIVISOR);
+
 	    }
 	}
     }
@@ -1654,7 +1792,7 @@ assign_hard_reg (ira_allocno_t a, bool r
   if (! retry_p)
     {
       start_update_cost ();
-      queue_update_cost (a, COST_HOP_DIVISOR);
+      queue_update_cost (a, NULL, COST_HOP_DIVISOR);
       update_conflict_hard_regno_costs (full_costs, aclass, false);
     }
   min_cost = min_full_cost = INT_MAX;
@@ -1711,10 +1849,11 @@ assign_hard_reg (ira_allocno_t a, bool r
       for (i = hard_regno_nregs[best_hard_regno][mode] - 1; i >= 0; i--)
 	allocated_hardreg_p[best_hard_regno + i] = true;
     }
+  restore_costs_from_copies (a);
   ALLOCNO_HARD_REGNO (a) = best_hard_regno;
   ALLOCNO_ASSIGNED_P (a) = true;
   if (best_hard_regno >= 0)
-    update_copy_costs (a, true);
+    update_costs_from_copies (a, true);
   ira_assert (ALLOCNO_CLASS (a) == aclass);
   /* We don't need updated costs anymore: */
   ira_free_allocno_updated_costs (a);
@@ -2164,7 +2303,9 @@ pop_allocnos_from_stack (void)
       else if (ALLOCNO_ASSIGNED_P (allocno))
 	{
 	  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-	    fprintf (ira_dump_file, "spill\n");
+	    fprintf (ira_dump_file, "spill%s\n",
+		     ALLOCNO_COLOR_DATA (allocno)->may_be_spilled_p
+		     ? "" : "!");
 	}
       ALLOCNO_COLOR_DATA (allocno)->in_graph_p = true;
     }
@@ -2546,6 +2687,32 @@ color_allocnos (void)
   ira_allocno_t a;
 
   setup_profitable_hard_regs ();
+  EXECUTE_IF_SET_IN_BITMAP (coloring_allocno_bitmap, 0, i, bi)
+    {
+      int l, nr;
+      HARD_REG_SET conflict_hard_regs;
+      allocno_color_data_t data;
+      ira_pref_t pref, next_pref;
+
+      a = ira_allocnos[i];
+      nr = ALLOCNO_NUM_OBJECTS (a);
+      CLEAR_HARD_REG_SET (conflict_hard_regs);
+      for (l = 0; l < nr; l++)
+	{
+	  ira_object_t obj = ALLOCNO_OBJECT (a, l);
+	  IOR_HARD_REG_SET (conflict_hard_regs,
+			    OBJECT_CONFLICT_HARD_REGS (obj));
+	}
+      data = ALLOCNO_COLOR_DATA (a);
+      for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = next_pref)
+	{
+	  next_pref = pref->next_pref;
+	  if (! ira_hard_reg_in_set_p (pref->hard_regno,
+				       ALLOCNO_MODE (a),
+				       data->profitable_hard_regs))
+	    ira_remove_pref (pref);
+	}
+    }
   if (flag_ira_algorithm == IRA_ALGORITHM_PRIORITY)
     {
       n = 0;
@@ -2605,7 +2772,10 @@ color_allocnos (void)
 	{
 	  a = ira_allocnos[i];
 	  if (ALLOCNO_CLASS (a) != NO_REGS && ! empty_profitable_hard_regs (a))
-	    ALLOCNO_COLOR_DATA (a)->in_graph_p = true;
+	    {
+	      ALLOCNO_COLOR_DATA (a)->in_graph_p = true;
+	      update_costs_from_prefs (a);
+	    }
 	  else
 	    {
 	      ALLOCNO_HARD_REGNO (a) = -1;
@@ -2772,7 +2942,7 @@ color_pass (ira_loop_tree_node_t loop_tr
 	    ALLOCNO_HARD_REGNO (subloop_allocno) = hard_regno;
 	    ALLOCNO_ASSIGNED_P (subloop_allocno) = true;
 	    if (hard_regno >= 0)
-	      update_copy_costs (subloop_allocno, true);
+	      update_costs_from_copies (subloop_allocno, true);
 	    /* We don't need updated costs anymore: */
 	    ira_free_allocno_updated_costs (subloop_allocno);
 	  }
@@ -2816,7 +2986,7 @@ color_pass (ira_loop_tree_node_t loop_tr
 		  ALLOCNO_HARD_REGNO (subloop_allocno) = hard_regno;
 		  ALLOCNO_ASSIGNED_P (subloop_allocno) = true;
 		  if (hard_regno >= 0)
-		    update_copy_costs (subloop_allocno, true);
+		    update_costs_from_copies (subloop_allocno, true);
 		  /* We don't need updated costs anymore: */
 		  ira_free_allocno_updated_costs (subloop_allocno);
 		}
@@ -2832,7 +3002,7 @@ color_pass (ira_loop_tree_node_t loop_tr
 		  ALLOCNO_HARD_REGNO (subloop_allocno) = hard_regno;
 		  ALLOCNO_ASSIGNED_P (subloop_allocno) = true;
 		  if (hard_regno >= 0)
-		    update_copy_costs (subloop_allocno, true);
+		    update_costs_from_copies (subloop_allocno, true);
 		  /* We don't need updated costs anymore: */
 		  ira_free_allocno_updated_costs (subloop_allocno);
 		}
@@ -3813,7 +3983,7 @@ ira_mark_allocation_change (int regno)
 	       ? ALLOCNO_CLASS_COST (a)
 	       : ALLOCNO_HARD_REG_COSTS (a)
 	         [ira_class_hard_reg_index[aclass][old_hard_regno]]);
-      update_copy_costs (a, false);
+      update_costs_from_copies (a, false);
     }
   ira_overall_cost -= cost;
   ALLOCNO_HARD_REGNO (a) = hard_regno;
@@ -3828,7 +3998,7 @@ ira_mark_allocation_change (int regno)
 	       ? ALLOCNO_CLASS_COST (a)
 	       : ALLOCNO_HARD_REG_COSTS (a)
 	         [ira_class_hard_reg_index[aclass][hard_regno]]);
-      update_copy_costs (a, true);
+      update_costs_from_copies (a, true);
     }
   else
     /* Reload changed class of the allocno.  */
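
The ira-color.c hunks above thread a FROM allocno through the update-cost queue so that a cost change does not flow straight back over the copy it arrived on. A minimal standalone sketch of that idea follows. It is not the IRA code: the adjacency matrix, the names propagate, cost and HOP_DIVISOR, and the divisor value 4 are all assumptions made for illustration.

```c
#include <assert.h>

#define N_NODES 3
#define HOP_DIVISOR 4	/* Assumed value, standing in for COST_HOP_DIVISOR.  */

/* One queue entry: the node to expand, the node the update came
   from, and the divisor that applies at this distance.  */
struct queue_elem { int node, from, divisor; };

/* Hypothetical copy graph A - B - C stored as copy frequencies.  */
static const int copy_freq[N_NODES][N_NODES] = {
  {0, 1, 0},
  {1, 0, 1},
  {0, 1, 0},
};

/* Accumulated cost change for each node.  */
static int cost[N_NODES];

/* Decrease the cost of nodes connected to START by copies.  The
   effect shrinks by HOP_DIVISOR on every hop, and an update is never
   sent back to the node it came from.  */
static void
propagate (int start, int base_cost)
{
  struct queue_elem queue[N_NODES];
  int queued[N_NODES] = {0};
  int head = 0, tail = 0;

  queue[tail++] = (struct queue_elem) {start, -1, 1};
  queued[start] = 1;
  while (head < tail)
    {
      struct queue_elem e = queue[head++];

      for (int other = 0; other < N_NODES; other++)
	{
	  int update;

	  /* Skip unconnected nodes and the back edge.  */
	  if (copy_freq[e.node][other] == 0 || other == e.from)
	    continue;
	  update = copy_freq[e.node][other] * base_cost / e.divisor;
	  if (update == 0)
	    continue;
	  cost[other] -= update;
	  /* Each node is queued at most once per propagation.  */
	  if (! queued[other])
	    {
	      assert (tail < N_NODES);
	      queued[other] = 1;
	      queue[tail++] = (struct queue_elem) {other, e.node,
						   e.divisor * HOP_DIVISOR};
	    }
	}
    }
}
```

Calling propagate (0, 16) on this chain lowers the cost of B by 16 and of C by 4 and leaves A untouched. Without the check against e.from, B would also push a quarter of the update back onto A.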
Index: ira-conflicts.c
===================================================================
--- ira-conflicts.c	(revision 204148)
+++ ira-conflicts.c	(working copy)
@@ -208,149 +208,6 @@ allocnos_conflict_for_copy_p (ira_allocn
   return OBJECTS_CONFLICT_P (obj1, obj2);
 }
 
-/* Return TRUE if the operand constraint STR is commutative.  */
-static bool
-commutative_constraint_p (const char *str)
-{
-  int curr_alt, c;
-  bool ignore_p;
-
-  for (ignore_p = false, curr_alt = 0;;)
-    {
-      c = *str;
-      if (c == '\0')
-	break;
-      str += CONSTRAINT_LEN (c, str);
-      if (c == '#' || !recog_data.alternative_enabled_p[curr_alt])
-	ignore_p = true;
-      else if (c == ',')
-	{
-	  curr_alt++;
-	  ignore_p = false;
-	}
-      else if (! ignore_p)
-	{
-	  /* Usually `%' is the first constraint character but the
-	     documentation does not require this.  */
-	  if (c == '%')
-	    return true;
-	}
-    }
-  return false;
-}
-
-/* Return the number of the operand which should be the same in any
-   case as operand with number OP_NUM (or negative value if there is
-   no such operand).  If USE_COMMUT_OP_P is TRUE, the function makes
-   temporarily commutative operand exchange before this.  The function
-   takes only really possible alternatives into consideration.  */
-static int
-get_dup_num (int op_num, bool use_commut_op_p)
-{
-  int curr_alt, c, original, dup;
-  bool ignore_p, commut_op_used_p;
-  const char *str;
-  rtx op;
-
-  if (op_num < 0 || recog_data.n_alternatives == 0)
-    return -1;
-  op = recog_data.operand[op_num];
-  commut_op_used_p = true;
-  if (use_commut_op_p)
-    {
-      if (commutative_constraint_p (recog_data.constraints[op_num]))
-	op_num++;
-      else if (op_num > 0 && commutative_constraint_p (recog_data.constraints
-						       [op_num - 1]))
-	op_num--;
-      else
-	commut_op_used_p = false;
-    }
-  str = recog_data.constraints[op_num];
-  for (ignore_p = false, original = -1, curr_alt = 0;;)
-    {
-      c = *str;
-      if (c == '\0')
-	break;
-      if (c == '#' || !recog_data.alternative_enabled_p[curr_alt])
-	ignore_p = true;
-      else if (c == ',')
-	{
-	  curr_alt++;
-	  ignore_p = false;
-	}
-      else if (! ignore_p)
-	switch (c)
-	  {
-	  case 'X':
-	    return -1;
-
-	  case 'm':
-	  case 'o':
-	    /* Accept a register which might be placed in memory.  */
-	    return -1;
-	    break;
-
-	  case 'V':
-	  case '<':
-	  case '>':
-	    break;
-
-	  case 'p':
-	    if (address_operand (op, VOIDmode))
-	      return -1;
-	    break;
-
-	  case 'g':
-	    return -1;
-
-	  case 'r':
-	  case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
-	  case 'h': case 'j': case 'k': case 'l':
-	  case 'q': case 't': case 'u':
-	  case 'v': case 'w': case 'x': case 'y': case 'z':
-	  case 'A': case 'B': case 'C': case 'D':
-	  case 'Q': case 'R': case 'S': case 'T': case 'U':
-	  case 'W': case 'Y': case 'Z':
-	    {
-	      enum reg_class cl;
-
-	      cl = (c == 'r'
-		    ? GENERAL_REGS : REG_CLASS_FROM_CONSTRAINT (c, str));
-	      if (cl != NO_REGS)
-		return -1;
-#ifdef EXTRA_CONSTRAINT_STR
-	      else if (EXTRA_CONSTRAINT_STR (op, c, str))
-		return -1;
-#endif
-	      break;
-	    }
-
-	  case '0': case '1': case '2': case '3': case '4':
-	  case '5': case '6': case '7': case '8': case '9':
-	    if (original != -1 && original != c)
-	      return -1;
-	    original = c;
-	    break;
-	  }
-      str += CONSTRAINT_LEN (c, str);
-    }
-  if (original == -1)
-    return -1;
-  dup = original - '0';
-  if (use_commut_op_p)
-    {
-      if (commutative_constraint_p (recog_data.constraints[dup]))
-	dup++;
-      else if (dup > 0
-	       && commutative_constraint_p (recog_data.constraints[dup -1]))
-	dup--;
-      else if (! commut_op_used_p)
-	return -1;
-    }
-  return dup;
-}
-
 /* Check that X is REG or SUBREG of REG.  */
 #define REG_SUBREG_P(x)							\
    (REG_P (x) || (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x))))
@@ -461,6 +318,7 @@ process_regs_for_copy (rtx reg1, rtx reg
       ALLOCNO_CONFLICT_HARD_REG_COSTS (a)[index] -= cost;
       if (ALLOCNO_HARD_REG_COSTS (a)[index] < ALLOCNO_CLASS_COST (a))
 	ALLOCNO_CLASS_COST (a) = ALLOCNO_HARD_REG_COSTS (a)[index];
+      ira_add_allocno_pref (a, allocno_preferenced_hard_regno, freq);
       a = ira_parent_or_cap_allocno (a);
     }
   while (a != NULL);
@@ -498,9 +356,9 @@ static void
 add_insn_allocno_copies (rtx insn)
 {
   rtx set, operand, dup;
-  const char *str;
-  bool commut_p, bound_p[MAX_RECOG_OPERANDS];
-  int i, j, n, freq;
+  bool bound_p[MAX_RECOG_OPERANDS];
+  int i, n, freq;
+  HARD_REG_SET alts;
 
   freq = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn));
   if (freq == 0)
@@ -513,7 +371,7 @@ add_insn_allocno_copies (rtx insn)
 			? SET_SRC (set)
 			: SUBREG_REG (SET_SRC (set))) != NULL_RTX)
     {
-      process_regs_for_copy (SET_DEST (set), SET_SRC (set),
+      process_regs_for_copy (SET_SRC (set), SET_DEST (set),
 			     false, insn, freq);
       return;
     }
@@ -521,7 +379,7 @@ add_insn_allocno_copies (rtx insn)
      there are no dead registers, there will be no such copies.  */
   if (! find_reg_note (insn, REG_DEAD, NULL_RTX))
     return;
-  extract_insn (insn);
+  ira_setup_alts (insn, alts);
   for (i = 0; i < recog_data.n_operands; i++)
     bound_p[i] = false;
   for (i = 0; i < recog_data.n_operands; i++)
@@ -529,21 +387,18 @@ add_insn_allocno_copies (rtx insn)
       operand = recog_data.operand[i];
       if (! REG_SUBREG_P (operand))
 	continue;
-      str = recog_data.constraints[i];
-      while (*str == ' ' || *str == '\t')
-	str++;
-      for (j = 0, commut_p = false; j < 2; j++, commut_p = true)
-	if ((n = get_dup_num (i, commut_p)) >= 0)
-	  {
-	    bound_p[n] = true;
-	    dup = recog_data.operand[n];
-	    if (REG_SUBREG_P (dup)
-		&& find_reg_note (insn, REG_DEAD,
-				  REG_P (operand)
-				  ? operand
-				  : SUBREG_REG (operand)) != NULL_RTX)
-	      process_regs_for_copy (operand, dup, true, NULL_RTX, freq);
-	  }
+      if ((n = ira_get_dup_out_num (i, alts)) >= 0)
+	{
+	  bound_p[n] = true;
+	  dup = recog_data.operand[n];
+	  if (REG_SUBREG_P (dup)
+	      && find_reg_note (insn, REG_DEAD,
+				REG_P (operand)
+				? operand
+				: SUBREG_REG (operand)) != NULL_RTX)
+	    process_regs_for_copy (operand, dup, true, NULL_RTX,
+				   freq);
+	}
     }
   for (i = 0; i < recog_data.n_operands; i++)
     {
Index: ira-costs.c
===================================================================
--- ira-costs.c	(revision 204148)
+++ ira-costs.c	(working copy)
@@ -405,7 +405,6 @@ record_reg_classes (int n_alts, int n_op
 {
   int alt;
   int i, j, k;
-  rtx set;
   int insn_allows_mem[MAX_RECOG_OPERANDS];
 
   for (i = 0; i < n_ops; i++)
@@ -914,60 +913,6 @@ record_reg_classes (int n_alts, int n_op
 	  ALLOCNO_BAD_SPILL_P (a) = true;
       }
 
-  /* If this insn is a single set copying operand 1 to operand 0 and
-     one operand is an allocno with the other a hard reg or an allocno
-     that prefers a hard register that is in its own register class
-     then we may want to adjust the cost of that register class to -1.
-
-     Avoid the adjustment if the source does not die to avoid
-     stressing of register allocator by preferrencing two colliding
-     registers into single class.
-
-     Also avoid the adjustment if a copy between hard registers of the
-     class is expensive (ten times the cost of a default copy is
-     considered arbitrarily expensive).  This avoids losing when the
-     preferred class is very expensive as the source of a copy
-     instruction.  */
-  if ((set = single_set (insn)) != 0
-      && ops[0] == SET_DEST (set) && ops[1] == SET_SRC (set)
-      && REG_P (ops[0]) && REG_P (ops[1])
-      && find_regno_note (insn, REG_DEAD, REGNO (ops[1])))
-    for (i = 0; i <= 1; i++)
-      if (REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER
-	  && REGNO (ops[!i]) < FIRST_PSEUDO_REGISTER)
-	{
-	  unsigned int regno = REGNO (ops[i]);
-	  unsigned int other_regno = REGNO (ops[!i]);
-	  enum machine_mode mode = GET_MODE (ops[!i]);
-	  cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
-	  enum reg_class *cost_classes = cost_classes_ptr->classes;
-	  reg_class_t rclass;
-	  int nr;
-
-	  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
-	    {
-	      rclass = cost_classes[k];
-	      if (TEST_HARD_REG_BIT (reg_class_contents[rclass], other_regno)
-		  && (reg_class_size[(int) rclass]
-		      == ira_reg_class_max_nregs [(int) rclass][(int) mode]))
-		{
-		  if (reg_class_size[rclass] == 1)
-		    op_costs[i]->cost[k] = -frequency;
-		  else
-		    {
-		      for (nr = 0;
-			   nr < hard_regno_nregs[other_regno][mode];
-			   nr++)
-			if (! TEST_HARD_REG_BIT (reg_class_contents[rclass],
-						 other_regno + nr))
-			  break;
-		      
-		      if (nr == hard_regno_nregs[other_regno][mode])
-			op_costs[i]->cost[k] = -frequency;
-		    }
-		}
-	    }
-	}
 }
 
 
@@ -1204,6 +1149,8 @@ record_operand_costs (rtx insn, enum reg
 {
   const char *constraints[MAX_RECOG_OPERANDS];
   enum machine_mode modes[MAX_RECOG_OPERANDS];
+  rtx ops[MAX_RECOG_OPERANDS];
+  rtx set;
   int i;
 
   for (i = 0; i < recog_data.n_operands; i++)
@@ -1221,6 +1168,7 @@ record_operand_costs (rtx insn, enum reg
     {
       memcpy (op_costs[i], init_cost, struct_costs_size);
 
+      ops[i] = recog_data.operand[i];
       if (GET_CODE (recog_data.operand[i]) == SUBREG)
 	recog_data.operand[i] = SUBREG_REG (recog_data.operand[i]);
 
@@ -1260,6 +1208,77 @@ record_operand_costs (rtx insn, enum reg
   record_reg_classes (recog_data.n_alternatives, recog_data.n_operands,
 		      recog_data.operand, modes,
 		      constraints, insn, pref);
+
+  /* If this insn is a single set copying operand 1 to operand 0 and
+     one operand is an allocno with the other a hard reg or an allocno
+     that prefers a hard register that is in its own register class
+     then we may want to adjust the cost of that register class to -1.
+
+     Avoid the adjustment if the source does not die to avoid
+     stressing of register allocator by preferencing two colliding
+     registers into single class.
+
+     Also avoid the adjustment if a copy between hard registers of the
+     class is expensive (ten times the cost of a default copy is
+     considered arbitrarily expensive).  This avoids losing when the
+     preferred class is very expensive as the source of a copy
+     instruction.  */
+  if ((set = single_set (insn)) != NULL_RTX
+      && ops[0] == SET_DEST (set) && ops[1] == SET_SRC (set))
+    {
+      int regno, other_regno;
+      rtx dest = SET_DEST (set);
+      rtx src = SET_SRC (set);
+
+      dest = SET_DEST (set);
+      src = SET_SRC (set);
+      if (GET_CODE (dest) == SUBREG
+	  && (GET_MODE_SIZE (GET_MODE (dest))
+	      == GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest)))))
+	dest = SUBREG_REG (dest);
+      if (GET_CODE (src) == SUBREG
+	  && (GET_MODE_SIZE (GET_MODE (src))
+	      == GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))))
+	src = SUBREG_REG (src);
+      if (REG_P (src) && REG_P (dest)
+	  && find_regno_note (insn, REG_DEAD, REGNO (src))
+	  && (((regno = REGNO (src)) >= FIRST_PSEUDO_REGISTER
+	       && (other_regno = REGNO (dest)) < FIRST_PSEUDO_REGISTER)
+	      || ((regno = REGNO (dest)) >= FIRST_PSEUDO_REGISTER
+		  && (other_regno = REGNO (src)) < FIRST_PSEUDO_REGISTER)))
+	{
+	  enum machine_mode mode = GET_MODE (src);
+	  cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
+	  enum reg_class *cost_classes = cost_classes_ptr->classes;
+	  reg_class_t rclass;
+	  int k, nr;
+
+	  i = regno == (int) REGNO (src) ? 1 : 0;
+	  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+	    {
+	      rclass = cost_classes[k];
+	      if (TEST_HARD_REG_BIT (reg_class_contents[rclass], other_regno)
+		  && (reg_class_size[(int) rclass]
+		      == ira_reg_class_max_nregs [(int) rclass][(int) mode]))
+		{
+		  if (reg_class_size[rclass] == 1)
+		    op_costs[i]->cost[k] = -frequency;
+		  else
+		    {
+		      for (nr = 0;
+			   nr < hard_regno_nregs[other_regno][mode];
+			   nr++)
+			if (! TEST_HARD_REG_BIT (reg_class_contents[rclass],
+						 other_regno + nr))
+			  break;
+
+		      if (nr == hard_regno_nregs[other_regno][mode])
+			op_costs[i]->cost[k] = -frequency;
+		    }
+		}
+	    }
+	}
+    }
 }
 
 
@@ -1741,14 +1760,15 @@ find_costs_and_classes (FILE *dump_file)
 	       a != NULL;
 	       a = ALLOCNO_NEXT_REGNO_ALLOCNO (a))
 	    {
-	      a_num = ALLOCNO_NUM (a);
-	      if (regno_aclass[i] == NO_REGS)
+	      enum reg_class aclass = regno_aclass[i];
+	      int a_num = ALLOCNO_NUM (a);
+	      int *total_a_costs = COSTS (total_allocno_costs, a_num)->cost;
+	      int *a_costs = COSTS (costs, a_num)->cost;
+
+	      if (aclass == NO_REGS)
 		best = NO_REGS;
 	      else
 		{
-		  int *total_a_costs = COSTS (total_allocno_costs, a_num)->cost;
-		  int *a_costs = COSTS (costs, a_num)->cost;
-		  
 		  /* Finding best class which is subset of the common
 		     class.  */
 		  best_cost = (1 << (HOST_BITS_PER_INT - 2)) - 1;
@@ -1757,7 +1777,7 @@ find_costs_and_classes (FILE *dump_file)
 		  for (k = 0; k < cost_classes_ptr->num; k++)
 		    {
 		      rclass = cost_classes[k];
-		      if (! ira_class_subset_p[rclass][regno_aclass[i]])
+		      if (! ira_class_subset_p[rclass][aclass])
 			continue;
 		      /* Ignore classes that are too small or invalid
 			 for this operand.  */
@@ -1792,9 +1812,25 @@ find_costs_and_classes (FILE *dump_file)
 			     ALLOCNO_LOOP_TREE_NODE (a)->loop_num);
 		  fprintf (dump_file, ") best %s, allocno %s\n",
 			   reg_class_names[best],
-			   reg_class_names[regno_aclass[i]]);
+			   reg_class_names[aclass]);
 		}
 	      pref[a_num] = best;
+	      if (pass == flag_expensive_optimizations && best != aclass
+		  && ira_class_hard_regs_num[best] > 0
+		  && (ira_reg_class_max_nregs[best][ALLOCNO_MODE (a)]
+		      >= ira_class_hard_regs_num[best]))
+		{
+		  int ind = cost_classes_ptr->index[aclass];
+
+		  ira_assert (ind >= 0);
+		  ira_add_allocno_pref (a, ira_class_hard_regs[best][0],
+					(a_costs[ind] - ALLOCNO_CLASS_COST (a))
+					/ (ira_register_move_cost
+					   [ALLOCNO_MODE (a)][best][aclass]));
+		  for (k = 0; k < cost_classes_ptr->num; k++)
+		    if (ira_class_subset_p[cost_classes[k]][best])
+		      a_costs[k] = a_costs[ind];
+		}
 	    }
 	}
       
@@ -1820,11 +1856,11 @@ find_costs_and_classes (FILE *dump_file)
 static void
 process_bb_node_for_hard_reg_moves (ira_loop_tree_node_t loop_tree_node)
 {
-  int i, freq, cost, src_regno, dst_regno, hard_regno;
+  int i, freq, src_regno, dst_regno, hard_regno, a_regno;
   bool to_p;
-  ira_allocno_t a;
-  enum reg_class rclass, hard_reg_class;
-  enum machine_mode mode;
+  ira_allocno_t a, curr_a;
+  ira_loop_tree_node_t curr_loop_tree_node;
+  enum reg_class rclass;
   basic_block bb;
   rtx insn, set, src, dst;
 
@@ -1851,15 +1887,15 @@ process_bb_node_for_hard_reg_moves (ira_
 	  && src_regno < FIRST_PSEUDO_REGISTER)
 	{
 	  hard_regno = src_regno;
-	  to_p = true;
 	  a = ira_curr_regno_allocno_map[dst_regno];
+	  to_p = true;
 	}
       else if (src_regno >= FIRST_PSEUDO_REGISTER
 	       && dst_regno < FIRST_PSEUDO_REGISTER)
 	{
 	  hard_regno = dst_regno;
-	  to_p = false;
 	  a = ira_curr_regno_allocno_map[src_regno];
+	  to_p = false;
 	}
       else
 	continue;
@@ -1869,20 +1905,31 @@ process_bb_node_for_hard_reg_moves (ira_
       i = ira_class_hard_reg_index[rclass][hard_regno];
       if (i < 0)
 	continue;
-      mode = ALLOCNO_MODE (a);
-      hard_reg_class = REGNO_REG_CLASS (hard_regno);
-      ira_init_register_move_cost_if_necessary (mode);
-      cost
-	= (to_p ? ira_register_move_cost[mode][hard_reg_class][rclass]
-	   : ira_register_move_cost[mode][rclass][hard_reg_class]) * freq;
-      ira_allocate_and_set_costs (&ALLOCNO_HARD_REG_COSTS (a), rclass,
-				  ALLOCNO_CLASS_COST (a));
-      ira_allocate_and_set_costs (&ALLOCNO_CONFLICT_HARD_REG_COSTS (a),
-				  rclass, 0);
-      ALLOCNO_HARD_REG_COSTS (a)[i] -= cost;
-      ALLOCNO_CONFLICT_HARD_REG_COSTS (a)[i] -= cost;
-      ALLOCNO_CLASS_COST (a) = MIN (ALLOCNO_CLASS_COST (a),
-				    ALLOCNO_HARD_REG_COSTS (a)[i]);
+      a_regno = ALLOCNO_REGNO (a);
+      for (curr_loop_tree_node = ALLOCNO_LOOP_TREE_NODE (a);
+	   curr_loop_tree_node != NULL;
+	   curr_loop_tree_node = curr_loop_tree_node->parent)
+	if ((curr_a = curr_loop_tree_node->regno_allocno_map[a_regno]) != NULL)
+	  ira_add_allocno_pref (curr_a, hard_regno, freq);
+      {
+	int cost;
+	enum reg_class hard_reg_class;
+	enum machine_mode mode;
+
+	mode = ALLOCNO_MODE (a);
+	hard_reg_class = REGNO_REG_CLASS (hard_regno);
+	ira_init_register_move_cost_if_necessary (mode);
+	cost = (to_p ? ira_register_move_cost[mode][hard_reg_class][rclass]
+		: ira_register_move_cost[mode][rclass][hard_reg_class]) * freq;
+	ira_allocate_and_set_costs (&ALLOCNO_HARD_REG_COSTS (a), rclass,
+				    ALLOCNO_CLASS_COST (a));
+	ira_allocate_and_set_costs (&ALLOCNO_CONFLICT_HARD_REG_COSTS (a),
+				    rclass, 0);
+	ALLOCNO_HARD_REG_COSTS (a)[i] -= cost;
+	ALLOCNO_CONFLICT_HARD_REG_COSTS (a)[i] -= cost;
+	ALLOCNO_CLASS_COST (a) = MIN (ALLOCNO_CLASS_COST (a),
+				      ALLOCNO_HARD_REG_COSTS (a)[i]);
+      }
     }
 }
 
Index: ira-int.h
===================================================================
--- ira-int.h	(revision 204148)
+++ ira-int.h	(working copy)
@@ -57,6 +57,7 @@ extern FILE *ira_dump_file;
    allocnos.  */
 typedef struct live_range *live_range_t;
 typedef struct ira_allocno *ira_allocno_t;
+typedef struct ira_allocno_pref *ira_pref_t;
 typedef struct ira_allocno_copy *ira_copy_t;
 typedef struct ira_object *ira_object_t;
 
@@ -346,6 +347,8 @@ struct ira_allocno
      register class living at the point than number of hard-registers
      of the class available for the allocation.  */
   int excess_pressure_points_num;
+  /* Allocno hard reg preferences.  */
+  ira_pref_t allocno_prefs;
   /* Copies to other non-conflicting allocnos.  The copies can
      represent move insn or potential move insn usually because of two
      operand insn constraints.  */
@@ -426,6 +429,7 @@ struct ira_allocno
 #define ALLOCNO_BAD_SPILL_P(A) ((A)->bad_spill_p)
 #define ALLOCNO_ASSIGNED_P(A) ((A)->assigned_p)
 #define ALLOCNO_MODE(A) ((A)->mode)
+#define ALLOCNO_PREFS(A) ((A)->allocno_prefs)
 #define ALLOCNO_COPIES(A) ((A)->allocno_copies)
 #define ALLOCNO_HARD_REG_COSTS(A) ((A)->hard_reg_costs)
 #define ALLOCNO_UPDATED_HARD_REG_COSTS(A) ((A)->updated_hard_reg_costs)
@@ -516,6 +520,33 @@ extern ira_object_t *ira_object_id_map;
 /* The size of the previous array.  */
 extern int ira_objects_num;
 
+/* The following structure represents a hard register preference of
+   an allocno.  The preference represents move insns or potential
+   move insns usually because of two operand insn constraints.  One
+   move operand is a hard register.  */
+struct ira_allocno_pref
+{
+  /* The unique order number of the preference node starting with 0.  */
+  int num;
+  /* Preferred hard register.  */
+  int hard_regno;
+  /* Accumulated execution frequency of insns from which the
+     preference was created.  */
+  int freq;
+  /* Given allocno.  */
+  ira_allocno_t allocno;
+  /* All preferences with the same allocno are linked by the following
+     member.  */
+  ira_pref_t next_pref;
+};
+
+/* Array of references to all allocno preferences.  The order number
+   of the preference corresponds to the index in the array.  */
+extern ira_pref_t *ira_prefs;
+
+/* Size of the previous array.  */
+extern int ira_prefs_num;
+
 /* The following structure represents a copy of two allocnos.  The
    copies represent move insns or potential move insns usually because
    of two operand insn constraints.  To remove register shuffle, we
@@ -925,6 +956,8 @@ extern void ira_print_disposition (FILE
 extern void ira_debug_disposition (void);
 extern void ira_debug_allocno_classes (void);
 extern void ira_init_register_move_cost (enum machine_mode);
+extern void ira_setup_alts (rtx insn, HARD_REG_SET &alts);
+extern int ira_get_dup_out_num (int op_num, HARD_REG_SET &alts);
 
 /* ira-build.c */
 
@@ -932,6 +965,10 @@ extern void ira_init_register_move_cost
 extern ira_loop_tree_node_t ira_curr_loop_tree_node;
 extern ira_allocno_t *ira_curr_regno_allocno_map;
 
+extern void ira_debug_pref (ira_pref_t);
+extern void ira_debug_prefs (void);
+extern void ira_debug_allocno_prefs (ira_allocno_t);
+
 extern void ira_debug_copy (ira_copy_t);
 extern void debug (ira_allocno_copy &ref);
 extern void debug (ira_allocno_copy *ptr);
@@ -963,10 +1000,12 @@ extern bool ira_live_ranges_intersect_p
 extern void ira_finish_live_range (live_range_t);
 extern void ira_finish_live_range_list (live_range_t);
 extern void ira_free_allocno_updated_costs (ira_allocno_t);
+extern ira_pref_t ira_create_pref (ira_allocno_t, int, int);
+extern void ira_add_allocno_pref (ira_allocno_t, int, int);
+extern void ira_remove_pref (ira_pref_t);
+extern void ira_remove_allocno_prefs (ira_allocno_t);
 extern ira_copy_t ira_create_copy (ira_allocno_t, ira_allocno_t,
 				   int, bool, rtx, ira_loop_tree_node_t);
-extern void ira_add_allocno_copy_to_list (ira_copy_t);
-extern void ira_swap_allocno_copy_ends_if_necessary (ira_copy_t);
 extern ira_copy_t ira_add_allocno_copy (ira_allocno_t, ira_allocno_t, int,
 					bool, rtx, ira_loop_tree_node_t);
 
@@ -1151,6 +1190,44 @@ ira_allocno_object_iter_cond (ira_allocn
        ira_allocno_object_iter_cond (&(ITER), (A), &(O));)
 
 
+/* The iterator for prefs.  */
+typedef struct {
+  /* The number of the current element in IRA_PREFS.  */
+  int n;
+} ira_pref_iterator;
+
+/* Initialize the iterator I.  */
+static inline void
+ira_pref_iter_init (ira_pref_iterator *i)
+{
+  i->n = 0;
+}
+
+/* Return TRUE if we have more prefs to visit, in which case *PREF is
+   set to the pref to be visited.  Otherwise, return FALSE.  */
+static inline bool
+ira_pref_iter_cond (ira_pref_iterator *i, ira_pref_t *pref)
+{
+  int n;
+
+  for (n = i->n; n < ira_prefs_num; n++)
+    if (ira_prefs[n] != NULL)
+      {
+	*pref = ira_prefs[n];
+	i->n = n + 1;
+	return true;
+      }
+  return false;
+}
+
+/* Loop over all prefs.  In each iteration, P is set to the next
+   pref.  ITER is an instance of ira_pref_iterator used to iterate
+   the prefs.  */
+#define FOR_EACH_PREF(P, ITER)				\
+  for (ira_pref_iter_init (&(ITER));			\
+       ira_pref_iter_cond (&(ITER), &(P));)
+
+
 /* The iterator for copies.  */
 typedef struct {
   /* The number of the current element in IRA_COPIES.  */
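
The pref iterator added to ira-int.h above walks a table whose slots can be NULL and visits only the live entries. The sketch below mirrors that pattern in standalone C, assuming that removed prefs leave NULL holes in the table. The names pref_t, prefs, FOR_EACH_PREF_SKETCH and sum_hard_regnos are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for struct ira_allocno_pref.  */
struct pref { int num; int hard_regno; };
typedef struct pref *pref_t;

/* Iterator state: the index of the next slot to examine.  */
typedef struct { int n; } pref_iterator;

/* Table of prefs and its size; a NULL slot is a hole.  */
static pref_t *prefs;
static int prefs_num;

static inline void
pref_iter_init (pref_iterator *i)
{
  i->n = 0;
}

/* Return true and set *P if a live pref remains, skipping holes.  */
static inline bool
pref_iter_cond (pref_iterator *i, pref_t *p)
{
  for (int n = i->n; n < prefs_num; n++)
    if (prefs[n] != NULL)
      {
	*p = prefs[n];
	i->n = n + 1;
	return true;
      }
  return false;
}

#define FOR_EACH_PREF_SKETCH(P, ITER)		\
  for (pref_iter_init (&(ITER));		\
       pref_iter_cond (&(ITER), &(P));)

/* Example client: sum the hard register numbers of all live prefs
   in TABLE, which has LEN slots.  */
static int
sum_hard_regnos (pref_t *table, int len)
{
  pref_iterator it;
  pref_t p;
  int sum = 0;

  prefs = table;
  prefs_num = len;
  FOR_EACH_PREF_SKETCH (p, it)
    sum += p->hard_regno;
  return sum;
}
```

The for loop has an empty increment clause because the condition function advances the iterator itself, so a hole in the table never reaches the loop body.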
Index: ira.c
===================================================================
--- ira.c	(revision 204148)
+++ ira.c	(working copy)
@@ -1761,6 +1761,527 @@ setup_prohibited_mode_move_regs (void)
 
 
 
+/* Return TRUE if the operand constraint STR is commutative.  */
+static bool
+commutative_constraint_p (const char *str)
+{
+  int curr_alt, c;
+  bool ignore_p;
+
+  for (ignore_p = false, curr_alt = 0;;)
+    {
+      c = *str;
+      if (c == '\0')
+	break;
+      str += CONSTRAINT_LEN (c, str);
+      if (c == '#' || !recog_data.alternative_enabled_p[curr_alt])
+	ignore_p = true;
+      else if (c == ',')
+	{
+	  curr_alt++;
+	  ignore_p = false;
+	}
+      else if (! ignore_p)
+	{
+	  /* Usually `%' is the first constraint character but the
+	     documentation does not require this.  */
+	  if (c == '%')
+	    return true;
+	}
+    }
+  return false;
+}
+
+/* Set up possible alternatives in ALTS for INSN.  */
+void
+ira_setup_alts (rtx insn, HARD_REG_SET &alts)
+{
+  /* Map (nop, nalt) -> start of constraints for the given operand
+     and alternative.  */
+  static vec<const char *> insn_constraints;
+  int nop, nalt;
+  bool curr_swapped;
+  const char *p;
+  rtx op;
+  int commutative = -1;
+
+  extract_insn (insn);
+  CLEAR_HARD_REG_SET (alts);
+  insn_constraints.release ();
+  insn_constraints.safe_grow_cleared (recog_data.n_operands
+				      * recog_data.n_alternatives + 1);
+  /* Check that the hard reg set is enough for holding all
+     alternatives.  It is hard to imagine a situation in which the
+     assertion fails.  */
+  ira_assert (recog_data.n_alternatives
+	      <= (int) MAX (sizeof (HARD_REG_ELT_TYPE) * CHAR_BIT,
+			    FIRST_PSEUDO_REGISTER));
+  for (curr_swapped = false;; curr_swapped = true)
+    {
+      /* Calculate some data common for all alternatives to speed up the
+	 function.  */
+      for (nop = 0; nop < recog_data.n_operands; nop++)
+	{
+	  for (nalt = 0, p = recog_data.constraints[nop];
+	       nalt < recog_data.n_alternatives;
+	       nalt++)
+	    {
+	      insn_constraints[nop * recog_data.n_alternatives + nalt] = p;
+	      while (*p && *p != ',')
+		p++;
+	      if (*p)
+		p++;
+	    }
+	}
+      for (nalt = 0; nalt < recog_data.n_alternatives; nalt++)
+	{
+	  if (! recog_data.alternative_enabled_p[nalt] || TEST_HARD_REG_BIT (alts, nalt))
+	    continue;
+
+	  for (nop = 0; nop < recog_data.n_operands; nop++)
+	    {
+	      int c, len;
+
+	      op = recog_data.operand[nop];
+	      p = insn_constraints[nop * recog_data.n_alternatives + nalt];
+	      if (*p == 0 || *p == ',')
+		continue;
+	      
+	      do
+		switch (c = *p, len = CONSTRAINT_LEN (c, p), c)
+		  {
+		  case '#':
+		  case ',':
+		    c = '\0';
+		  case '\0':
+		    len = 0;
+		    break;
+		  
+		  case '?':  case '!': case '*':  case '=':  case '+':
+		    break;
+		    
+		  case '%':
+		    /* We only support one commutative marker, the
+		       first one, so record only the first operand
+		       that carries it.  */
+		    if (commutative < 0)
+		      commutative = nop;
+		    break;
+
+		  case '&':
+		    break;
+		    
+		  case '0':  case '1':  case '2':  case '3':  case '4':
+		  case '5':  case '6':  case '7':  case '8':  case '9':
+		    goto op_success;
+		    break;
+		    
+		  case 'p':
+		  case 'g':
+		  case 'X':
+		  case TARGET_MEM_CONSTRAINT:
+		    goto op_success;
+		    break;
+		    
+		  case '<':
+		    if (MEM_P (op)
+			&& (GET_CODE (XEXP (op, 0)) == PRE_DEC
+			    || GET_CODE (XEXP (op, 0)) == POST_DEC))
+		      goto op_success;
+		    break;
+		    
+		  case '>':
+		    if (MEM_P (op)
+			&& (GET_CODE (XEXP (op, 0)) == PRE_INC
+			    || GET_CODE (XEXP (op, 0)) == POST_INC))
+		      goto op_success;
+		    break;
+		    
+		  case 'E':
+		  case 'F':
+		    if (CONST_DOUBLE_AS_FLOAT_P (op)
+			|| (GET_CODE (op) == CONST_VECTOR
+			    && GET_MODE_CLASS (GET_MODE (op)) == MODE_VECTOR_FLOAT))
+		      goto op_success;
+		    break;
+		    
+		  case 'G':
+		  case 'H':
+		    if (CONST_DOUBLE_AS_FLOAT_P (op)
+			&& CONST_DOUBLE_OK_FOR_CONSTRAINT_P (op, c, p))
+		      goto op_success;
+		    break;
+		    
+		  case 's':
+		    if (CONST_SCALAR_INT_P (op))
+		      break;
+		  case 'i':
+		    if (CONSTANT_P (op))
+		      goto op_success;
+		    break;
+		    
+		  case 'n':
+		    if (CONST_SCALAR_INT_P (op))
+		      goto op_success;
+		    break;
+		    
+		  case 'I':
+		  case 'J':
+		  case 'K':
+		  case 'L':
+		  case 'M':
+		  case 'N':
+		  case 'O':
+		  case 'P':
+		    if (CONST_INT_P (op)
+			&& CONST_OK_FOR_CONSTRAINT_P (INTVAL (op), c, p))
+		      goto op_success;
+		    break;
+		    
+		  case 'V':
+		    if (MEM_P (op) && ! offsettable_memref_p (op))
+		      goto op_success;
+		    break;
+		    
+		  case 'o':
+		    goto op_success;
+		    break;
+		    
+		  default:
+		    {
+		      enum reg_class cl;
+		      
+		      cl = (c == 'r' ? GENERAL_REGS : REG_CLASS_FROM_CONSTRAINT (c, p));
+		      if (cl != NO_REGS)
+			goto op_success;
+#ifdef EXTRA_CONSTRAINT_STR
+		      else if (EXTRA_CONSTRAINT_STR (op, c, p))
+			goto op_success;
+		      else if (EXTRA_MEMORY_CONSTRAINT (c, p))
+			goto op_success;
+		      else if (EXTRA_ADDRESS_CONSTRAINT (c, p))
+			goto op_success;
+#endif
+		      break;
+		    }
+		  }
+	      while (p += len, c);
+	      break;
+	    op_success:
+	      ;
+	    }
+	  if (nop >= recog_data.n_operands)
+	    SET_HARD_REG_BIT (alts, nalt);
+	}
+      if (commutative < 0)
+	break;
+      if (curr_swapped)
+	break;
+      op = recog_data.operand[commutative];
+      recog_data.operand[commutative] = recog_data.operand[commutative + 1];
+      recog_data.operand[commutative + 1] = op;
+
+    }
+}
+
+/* Return the number of the output non-early-clobber operand which
+   should be the same in every alternative as the operand with number
+   OP_NUM (or a negative value if there is no such operand).  Only the
+   alternatives set in ALTS are taken into consideration.  */
+int
+ira_get_dup_out_num (int op_num, HARD_REG_SET &alts)
+{
+  int curr_alt, c, original, dup;
+  bool ignore_p, use_commut_op_p;
+  const char *str;
+#ifdef EXTRA_CONSTRAINT_STR
+  rtx op;
+#endif
+
+  if (op_num < 0 || recog_data.n_alternatives == 0)
+    return -1;
+  use_commut_op_p = false;
+  str = recog_data.constraints[op_num];
+  for (;;)
+    {
+#ifdef EXTRA_CONSTRAINT_STR
+      op = recog_data.operand[op_num];
+#endif
+      
+      for (ignore_p = false, original = -1, curr_alt = 0;;)
+	{
+	  c = *str;
+	  if (c == '\0')
+	    break;
+	  if (c == ',')
+	    {
+	      curr_alt++;
+	      ignore_p = false;
+	    }
+	  else if (c == '#' || !TEST_HARD_REG_BIT (alts, curr_alt))
+	    ignore_p = true;
+	  else if (! ignore_p)
+	    switch (c)
+	      {
+		/* We should find duplications only for input operands.  */
+	      case '=':
+	      case '+':
+		goto fail;
+	      case 'X':
+	      case 'p':
+	      case 'g':
+		goto fail;
+	      case 'r':
+	      case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
+	      case 'h': case 'j': case 'k': case 'l':
+	      case 'q': case 't': case 'u':
+	      case 'v': case 'w': case 'x': case 'y': case 'z':
+	      case 'A': case 'B': case 'C': case 'D':
+	      case 'Q': case 'R': case 'S': case 'T': case 'U':
+	      case 'W': case 'Y': case 'Z':
+		{
+		  enum reg_class cl;
+		  
+		  cl = (c == 'r'
+			? GENERAL_REGS : REG_CLASS_FROM_CONSTRAINT (c, str));
+		  if (cl != NO_REGS)
+		    {
+		      if (! targetm.class_likely_spilled_p (cl))
+			goto fail;
+		    }
+#ifdef EXTRA_CONSTRAINT_STR
+		  else if (EXTRA_CONSTRAINT_STR (op, c, str))
+		    goto fail;
+#endif
+		  break;
+		}
+		
+	      case '0': case '1': case '2': case '3': case '4':
+	      case '5': case '6': case '7': case '8': case '9':
+		if (original != -1 && original != c)
+		  goto fail;
+		original = c;
+		break;
+	      }
+	  str += CONSTRAINT_LEN (c, str);
+	}
+      if (original == -1)
+	goto fail;
+      dup = -1;
+      for (ignore_p = false, str = recog_data.constraints[original - '0'];
+	   *str != 0;
+	   str++)
+	if (ignore_p)
+	  {
+	    if (*str == ',')
+	      ignore_p = false;
+	  }
+	else if (*str == '#')
+	  ignore_p = true;
+	else if (! ignore_p)
+	  {
+	    if (*str == '=')
+	      dup = original - '0';
+	    /* It is better to ignore an alternative with early clobber.  */
+	    else if (*str == '&')
+	      goto fail;
+	  }
+      if (dup >= 0)
+	return dup;
+    fail:
+      if (use_commut_op_p)
+	break;
+      use_commut_op_p = true;
+      if (commutative_constraint_p (recog_data.constraints[op_num]))
+	str = recog_data.constraints[op_num + 1];
+      else if (op_num > 0 && commutative_constraint_p (recog_data.constraints
+						       [op_num - 1]))
+	str = recog_data.constraints[op_num - 1];
+      else
+	break;
+    }
+  return -1;
+}
+
+
+
+/* Search forward to see if the source register of a copy insn dies
+   before either it or the destination register is modified, but don't
+   scan past the end of the basic block.  If so, we can replace the
+   source with the destination and let the source die in the copy
+   insn.
+
+   This will reduce the number of registers live in that range and may
+   enable coalescing of the destination and the source, thus often
+   saving one register in addition to a register-register copy.  */
+
+static void
+decrease_live_ranges_number (void)
+{
+  basic_block bb;
+  rtx insn, set, src, dest, dest_death, p, q, note;
+  int sregno, dregno;
+
+  if (! flag_expensive_optimizations)
+    return;
+
+  if (ira_dump_file)
+    fprintf (ira_dump_file, "Starting decreasing number of live ranges...\n");
+
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+      {
+	set = single_set (insn);
+	if (! set)
+	  continue;
+	src = SET_SRC (set);
+	dest = SET_DEST (set);
+	if (! REG_P (src) || ! REG_P (dest)
+	    || find_reg_note (insn, REG_DEAD, src))
+	  continue;
+	sregno = REGNO (src);
+	dregno = REGNO (dest);
+	
+	/* We don't want to mess with hard regs if register classes
+	   are small.  */
+	if (sregno == dregno
+	    || (targetm.small_register_classes_for_mode_p (GET_MODE (src))
+		&& (sregno < FIRST_PSEUDO_REGISTER
+		    || dregno < FIRST_PSEUDO_REGISTER))
+	    /* We don't see all updates to SP if they are in an
+	       auto-inc memory reference, so we must disallow this
+	       optimization on them.  */
+	    || sregno == STACK_POINTER_REGNUM
+	    || dregno == STACK_POINTER_REGNUM)
+	  continue;
+	
+	dest_death = NULL_RTX;
+
+	for (p = NEXT_INSN (insn); p; p = NEXT_INSN (p))
+	  {
+	    if (! INSN_P (p))
+	      continue;
+	    if (BLOCK_FOR_INSN (p) != bb)
+	      break;
+	    
+	    if (reg_set_p (src, p) || reg_set_p (dest, p)
+		/* If SRC is an asm-declared register, it must not be
+		   replaced in any asm.  Unfortunately, the REG_EXPR
+		   tree for the asm variable may be absent in the SRC
+		   rtx, so we can't check the actual register
+		   declaration easily (the asm operand will have it,
+		   though).  To avoid complicating the test for a rare
+		   case, we just don't perform register replacement
+		   for a hard reg mentioned in an asm.  */
+		|| (sregno < FIRST_PSEUDO_REGISTER
+		    && asm_noperands (PATTERN (p)) >= 0
+		    && reg_overlap_mentioned_p (src, PATTERN (p)))
+		/* Don't change hard registers used by a call.  */
+		|| (CALL_P (p) && sregno < FIRST_PSEUDO_REGISTER
+		    && find_reg_fusage (p, USE, src))
+		/* Don't change a USE of a register.  */
+		|| (GET_CODE (PATTERN (p)) == USE
+		    && reg_overlap_mentioned_p (src, XEXP (PATTERN (p), 0))))
+	      break;
+	    
+	    /* See if all of SRC dies in P.  This test is slightly
+	       more conservative than it needs to be.  */
+	    if ((note = find_regno_note (p, REG_DEAD, sregno))
+		&& GET_MODE (XEXP (note, 0)) == GET_MODE (src))
+	      {
+		int failed = 0;
+		
+		/* We can do the optimization.  Scan forward from INSN
+		   again, replacing regs as we go.  Set FAILED if a
+		   replacement can't be done.  In that case, we can't
+		   move the death note for SRC.  This should be
+		   rare.  */
+		
+		/* Set to stop at next insn.  */
+		for (q = next_real_insn (insn);
+		     q != next_real_insn (p);
+		     q = next_real_insn (q))
+		  {
+		    if (reg_overlap_mentioned_p (src, PATTERN (q)))
+		      {
+			/* If SRC is a hard register, we might miss
+			   some overlapping registers with
+			   validate_replace_rtx, so we would have to
+			   undo it.  We can't if DEST is present in
+			   the insn, so fail in that combination of
+			   cases.  */
+			if (sregno < FIRST_PSEUDO_REGISTER
+			    && reg_mentioned_p (dest, PATTERN (q)))
+			  failed = 1;
+			
+			/* Attempt to replace all uses.  */
+			else if (!validate_replace_rtx (src, dest, q))
+			  failed = 1;
+			
+			/* If this succeeded, but some part of the
+			   register is still present, undo the
+			   replacement.  */
+			else if (sregno < FIRST_PSEUDO_REGISTER
+				 && reg_overlap_mentioned_p (src, PATTERN (q)))
+			  {
+			    validate_replace_rtx (dest, src, q);
+			    failed = 1;
+			  }
+		      }
+		    
+		    /* If DEST dies here, remove the death note and
+		       save it for later.  Make sure ALL of DEST dies
+		       here; again, this is overly conservative.  */
+		    if (! dest_death
+			&& (dest_death = find_regno_note (q, REG_DEAD, dregno)))
+		      {
+			if (GET_MODE (XEXP (dest_death, 0)) == GET_MODE (dest))
+			  remove_note (q, dest_death);
+			else
+			  {
+			    failed = 1;
+			    dest_death = 0;
+			  }
+		      }
+		  }
+		
+		if (! failed)
+		  {
+		    /* Move death note of SRC from P to INSN.  */
+		    remove_note (p, note);
+		    XEXP (note, 1) = REG_NOTES (insn);
+		    REG_NOTES (insn) = note;
+		  }
+		
+		/* DEST is also dead if INSN has a REG_UNUSED note for
+		   DEST.  */
+		if (! dest_death
+		    && (dest_death
+			= find_regno_note (insn, REG_UNUSED, dregno)))
+		  {
+		    PUT_REG_NOTE_KIND (dest_death, REG_DEAD);
+		    remove_note (insn, dest_death);
+		  }
+		
+		/* Put death note of DEST on P if we saw it die.  */
+		if (dest_death)
+		  {
+		    XEXP (dest_death, 1) = REG_NOTES (p);
+		    REG_NOTES (p) = dest_death;
+		  }
+		break;
+	      }
+	    
+	    /* If SRC is a hard register which is set or killed in
+	       some other way, we can't do this optimization.  */
+	    else if (sregno < FIRST_PSEUDO_REGISTER && dead_or_set_p (p, src))
+	      break;
+	  }
+      }
+}
+
+
+
 /* Return nonzero if REGNO is a particularly bad choice for reloading X.  */
 static bool
 ira_bad_reload_regno_1 (int regno, rtx x)
@@ -4466,7 +4987,7 @@ ira (FILE *f)
     }
 
   setup_prohibited_mode_move_regs ();
-
+  decrease_live_ranges_number ();
   df_note_add_problem ();
 
   /* DF_LIVE can't be used in the register allocator, too many other
@@ -4482,6 +5003,7 @@ ira (FILE *f)
   df->changeable_flags |= DF_VERIFY_SCHEDULED;
 #endif
   df_analyze ();
+
   df_clear_flags (DF_NO_INSN_RESCAN);
   regstat_init_n_sets_and_refs ();
   regstat_compute_ri ();
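The constraint scan in commutative_constraint_p can be tried outside GCC with a simplified model. The sketch below assumes every constraint is one character long and every alternative is enabled, so CONSTRAINT_LEN and recog_data.alternative_enabled_p are not modelled; toy_commutative_constraint_p is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the scan: return true if STR contains a `%'
   that is not hidden behind a `#' in the same alternative.
   Assumptions: one-character constraints, all alternatives enabled.  */
static bool
toy_commutative_constraint_p (const char *str)
{
  bool ignore_p = false;
  int c;

  assert (str != NULL);
  for (; (c = *str) != '\0'; str++)
    {
      if (c == ',')
        ignore_p = false;       /* A new alternative starts.  */
      else if (c == '#')
        ignore_p = true;        /* Ignore the rest of this alternative.  */
      else if (! ignore_p && c == '%')
        return true;
    }
  return false;
}
```

The `%' marker is accepted at any position, matching the remark in the patch that the documentation does not require it to be the first constraint character.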
Index: opts.c
===================================================================
--- opts.c	(revision 204148)
+++ opts.c	(working copy)
@@ -473,7 +473,6 @@ static const struct default_options defa
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fschedule_insns2, NULL, 1 },
 #endif
-    { OPT_LEVELS_2_PLUS, OPT_fregmove, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fstrict_aliasing, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fstrict_overflow, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_freorder_blocks, NULL, 1 },
Index: passes.def
===================================================================
--- passes.def	(revision 204148)
+++ passes.def	(working copy)
@@ -350,7 +350,6 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_combine);
       NEXT_PASS (pass_if_after_combine);
       NEXT_PASS (pass_partition_blocks);
-      NEXT_PASS (pass_regmove);
       NEXT_PASS (pass_outof_cfg_layout_mode);
       NEXT_PASS (pass_split_all_insns);
       NEXT_PASS (pass_lower_subreg2);
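With pass_regmove gone from the pass list, the one transformation kept from it lives in decrease_live_ranges_number. A toy model of that transformation on a made-up three-address form is sketched below. Everything in it (struct toy_insn, NO_REG, toy_shorten_live_range) is hypothetical, and it assumes at most one death note per insn, pseudo registers only, and a single basic block; it does not model hard registers, asm operands, calls, or REG_UNUSED notes.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_REG (-1)

/* Hypothetical insn: DEST = op (SRC[0], SRC[1]).  DIES is the register
   carrying a death note on this insn, or NO_REG.  COPY_P marks a plain
   register-register copy DEST = SRC[0].  */
struct toy_insn
{
  int dest;
  int src[2];
  int dies;
  bool copy_p;
};

/* Try the transformation for the copy at index I of the block
   INSNS[0..N-1].  Return true if the block was changed.  */
static bool
toy_shorten_live_range (struct toy_insn *insns, int n, int i)
{
  int src, dest, p, q, k;
  bool dest_died = false;

  assert (i >= 0 && i < n);
  /* Only copies whose source does not already die in the copy.  */
  if (! insns[i].copy_p || insns[i].dies == insns[i].src[0])
    return false;
  src = insns[i].src[0];
  dest = insns[i].dest;
  if (src == dest)
    return false;

  for (p = i + 1; p < n; p++)
    {
      /* Give up if SRC or DEST is modified before SRC dies.  */
      if (insns[p].dest == src || insns[p].dest == dest)
        return false;
      if (insns[p].dies != src)
        continue;

      /* SRC dies in P: rewrite the uses of SRC in (I, P] to DEST and
         pick up a death note of DEST seen on the way.  */
      for (q = i + 1; q <= p; q++)
        {
          for (k = 0; k < 2; k++)
            if (insns[q].src[k] == src)
              insns[q].src[k] = dest;
          if (insns[q].dies == dest)
            {
              insns[q].dies = NO_REG;
              dest_died = true;
            }
        }
      /* SRC now dies in the copy; DEST dies in P if it died anywhere
         in the rewritten range.  */
      insns[i].dies = src;
      insns[p].dies = dest_died ? dest : NO_REG;
      return true;
    }
  return false;
}
```

After a successful call the source register is live only up to the copy, so one live range disappears from the scanned region, which is the register pressure effect described at the top of this mail.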
Index: regmove.c
===================================================================
--- regmove.c	(revision 204148)
+++ regmove.c	(working copy)
@@ -1,1401 +0,0 @@
-/* Move registers around to reduce number of move instructions needed.
-   Copyright (C) 1987-2013 Free Software Foundation, Inc.
-
-This file is part of GCC.
-
-GCC is free software; you can redistribute it and/or modify it under
-the terms of the GNU General Public License as published by the Free
-Software Foundation; either version 3, or (at your option) any later
-version.
-
-GCC is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY; without even the implied warranty of MERCHANTABILITY or
-FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
-for more details.
-
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3.  If not see
-<http://www.gnu.org/licenses/>.  */
-
-
-/* This module makes some simple RTL code transformations which
-   improve the subsequent register allocation.  */
-
-#include "config.h"
-#include "system.h"
-#include "coretypes.h"
-#include "tm.h"
-#include "rtl.h"
-#include "tm_p.h"
-#include "insn-config.h"
-#include "recog.h"
-#include "target.h"
-#include "regs.h"
-#include "hard-reg-set.h"
-#include "flags.h"
-#include "function.h"
-#include "expr.h"
-#include "basic-block.h"
-#include "except.h"
-#include "diagnostic-core.h"
-#include "reload.h"
-#include "tree-pass.h"
-#include "df.h"
-#include "ira.h"
-
-static int optimize_reg_copy_1 (rtx, rtx, rtx);
-static void optimize_reg_copy_2 (rtx, rtx, rtx);
-static void optimize_reg_copy_3 (rtx, rtx, rtx);
-static void copy_src_to_dest (rtx, rtx, rtx);
-
-enum match_use
-{
-  READ,
-  WRITE,
-  READWRITE
-};
-
-struct match {
-  int with[MAX_RECOG_OPERANDS];
-  enum match_use use[MAX_RECOG_OPERANDS];
-  int commutative[MAX_RECOG_OPERANDS];
-  int early_clobber[MAX_RECOG_OPERANDS];
-};
-
-static int find_matches (rtx, struct match *);
-static int fixup_match_2 (rtx, rtx, rtx, rtx);
-
-/* Return nonzero if registers with CLASS1 and CLASS2 can be merged without
-   causing too much register allocation problems.  */
-static int
-regclass_compatible_p (reg_class_t class0, reg_class_t class1)
-{
-  return (class0 == class1
-	  || (reg_class_subset_p (class0, class1)
-	      && ! targetm.class_likely_spilled_p (class0))
-	  || (reg_class_subset_p (class1, class0)
-	      && ! targetm.class_likely_spilled_p (class1)));
-}
-
-
-#ifdef AUTO_INC_DEC
-
-/* Find the place in the rtx X where REG is used as a memory address.
-   Return the MEM rtx that so uses it.
-   If PLUSCONST is nonzero, search instead for a memory address equivalent to
-   (plus REG (const_int PLUSCONST)).
-
-   If such an address does not appear, return 0.
-   If REG appears more than once, or is used other than in such an address,
-   return (rtx) 1.  */
-
-static rtx
-find_use_as_address (rtx x, rtx reg, HOST_WIDE_INT plusconst)
-{
-  enum rtx_code code = GET_CODE (x);
-  const char * const fmt = GET_RTX_FORMAT (code);
-  int i;
-  rtx value = 0;
-  rtx tem;
-
-  if (code == MEM && XEXP (x, 0) == reg && plusconst == 0)
-    return x;
-
-  if (code == MEM && GET_CODE (XEXP (x, 0)) == PLUS
-      && XEXP (XEXP (x, 0), 0) == reg
-      && CONST_INT_P (XEXP (XEXP (x, 0), 1))
-      && INTVAL (XEXP (XEXP (x, 0), 1)) == plusconst)
-    return x;
-
-  if (code == SIGN_EXTRACT || code == ZERO_EXTRACT)
-    {
-      /* If REG occurs inside a MEM used in a bit-field reference,
-	 that is unacceptable.  */
-      if (find_use_as_address (XEXP (x, 0), reg, 0) != 0)
-	return (rtx) (size_t) 1;
-    }
-
-  if (x == reg)
-    return (rtx) (size_t) 1;
-
-  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
-    {
-      if (fmt[i] == 'e')
-	{
-	  tem = find_use_as_address (XEXP (x, i), reg, plusconst);
-	  if (value == 0)
-	    value = tem;
-	  else if (tem != 0)
-	    return (rtx) (size_t) 1;
-	}
-      else if (fmt[i] == 'E')
-	{
-	  int j;
-	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
-	    {
-	      tem = find_use_as_address (XVECEXP (x, i, j), reg, plusconst);
-	      if (value == 0)
-		value = tem;
-	      else if (tem != 0)
-		return (rtx) (size_t) 1;
-	    }
-	}
-    }
-
-  return value;
-}
-
-
-/* INC_INSN is an instruction that adds INCREMENT to REG.
-   Try to fold INC_INSN as a post/pre in/decrement into INSN.
-   Iff INC_INSN_SET is nonzero, inc_insn has a destination different from src.
-   Return nonzero for success.  */
-static int
-try_auto_increment (rtx insn, rtx inc_insn, rtx inc_insn_set, rtx reg,
-		    HOST_WIDE_INT increment, int pre)
-{
-  enum rtx_code inc_code;
-
-  rtx pset = single_set (insn);
-  if (pset)
-    {
-      /* Can't use the size of SET_SRC, we might have something like
-	 (sign_extend:SI (mem:QI ...  */
-      rtx use = find_use_as_address (pset, reg, 0);
-      if (use != 0 && use != (rtx) (size_t) 1)
-	{
-	  int size = GET_MODE_SIZE (GET_MODE (use));
-	  if (0
-	      || (HAVE_POST_INCREMENT
-		  && pre == 0 && (inc_code = POST_INC, increment == size))
-	      || (HAVE_PRE_INCREMENT
-		  && pre == 1 && (inc_code = PRE_INC, increment == size))
-	      || (HAVE_POST_DECREMENT
-		  && pre == 0 && (inc_code = POST_DEC, increment == -size))
-	      || (HAVE_PRE_DECREMENT
-		  && pre == 1 && (inc_code = PRE_DEC, increment == -size))
-	  )
-	    {
-	      if (inc_insn_set)
-		validate_change
-		  (inc_insn,
-		   &SET_SRC (inc_insn_set),
-		   XEXP (SET_SRC (inc_insn_set), 0), 1);
-	      validate_change (insn, &XEXP (use, 0),
-			       gen_rtx_fmt_e (inc_code,
-					      GET_MODE (XEXP (use, 0)), reg),
-			       1);
-	      if (apply_change_group ())
-		{
-		  /* If there is a REG_DEAD note on this insn, we must
-		     change this not to REG_UNUSED meaning that the register
-		     is set, but the value is dead.  Failure to do so will
-		     result in sched1 dying -- when it recomputes lifetime
-		     information, the number of REG_DEAD notes will have
-		     changed.  */
-		  rtx note = find_reg_note (insn, REG_DEAD, reg);
-		  if (note)
-		    PUT_REG_NOTE_KIND (note, REG_UNUSED);
-
-		  add_reg_note (insn, REG_INC, reg);
-
-		  if (! inc_insn_set)
-		    delete_insn (inc_insn);
-		  return 1;
-		}
-	    }
-	}
-    }
-  return 0;
-}
-#endif
-
-
-static int *regno_src_regno;
-
-/* INSN is a copy from SRC to DEST, both registers, and SRC does not die
-   in INSN.
-
-   Search forward to see if SRC dies before either it or DEST is modified,
-   but don't scan past the end of a basic block.  If so, we can replace SRC
-   with DEST and let SRC die in INSN.
-
-   This will reduce the number of registers live in that range and may enable
-   DEST to be tied to SRC, thus often saving one register in addition to a
-   register-register copy.  */
-
-static int
-optimize_reg_copy_1 (rtx insn, rtx dest, rtx src)
-{
-  rtx p, q;
-  rtx note;
-  rtx dest_death = 0;
-  int sregno = REGNO (src);
-  int dregno = REGNO (dest);
-  basic_block bb = BLOCK_FOR_INSN (insn);
-
-  /* We don't want to mess with hard regs if register classes are small.  */
-  if (sregno == dregno
-      || (targetm.small_register_classes_for_mode_p (GET_MODE (src))
-	  && (sregno < FIRST_PSEUDO_REGISTER
-	      || dregno < FIRST_PSEUDO_REGISTER))
-      /* We don't see all updates to SP if they are in an auto-inc memory
-	 reference, so we must disallow this optimization on them.  */
-      || sregno == STACK_POINTER_REGNUM || dregno == STACK_POINTER_REGNUM)
-    return 0;
-
-  for (p = NEXT_INSN (insn); p; p = NEXT_INSN (p))
-    {
-      if (! INSN_P (p))
-	continue;
-      if (BLOCK_FOR_INSN (p) != bb)
-	break;
-
-      if (reg_set_p (src, p) || reg_set_p (dest, p)
-	  /* If SRC is an asm-declared register, it must not be replaced
-	     in any asm.  Unfortunately, the REG_EXPR tree for the asm
-	     variable may be absent in the SRC rtx, so we can't check the
-	     actual register declaration easily (the asm operand will have
-	     it, though).  To avoid complicating the test for a rare case,
-	     we just don't perform register replacement for a hard reg
-	     mentioned in an asm.  */
-	  || (sregno < FIRST_PSEUDO_REGISTER
-	      && asm_noperands (PATTERN (p)) >= 0
-	      && reg_overlap_mentioned_p (src, PATTERN (p)))
-	  /* Don't change hard registers used by a call.  */
-	  || (CALL_P (p) && sregno < FIRST_PSEUDO_REGISTER
-	      && find_reg_fusage (p, USE, src))
-	  /* Don't change a USE of a register.  */
-	  || (GET_CODE (PATTERN (p)) == USE
-	      && reg_overlap_mentioned_p (src, XEXP (PATTERN (p), 0))))
-	break;
-
-      /* See if all of SRC dies in P.  This test is slightly more
-	 conservative than it needs to be.  */
-      if ((note = find_regno_note (p, REG_DEAD, sregno)) != 0
-	  && GET_MODE (XEXP (note, 0)) == GET_MODE (src))
-	{
-	  int failed = 0;
-	  int d_length = 0;
-	  int s_length = 0;
-	  int d_n_calls = 0;
-	  int s_n_calls = 0;
-	  int s_freq_calls = 0;
-	  int d_freq_calls = 0;
-
-	  /* We can do the optimization.  Scan forward from INSN again,
-	     replacing regs as we go.  Set FAILED if a replacement can't
-	     be done.  In that case, we can't move the death note for SRC.
-	     This should be rare.  */
-
-	  /* Set to stop at next insn.  */
-	  for (q = next_real_insn (insn);
-	       q != next_real_insn (p);
-	       q = next_real_insn (q))
-	    {
-	      if (reg_overlap_mentioned_p (src, PATTERN (q)))
-		{
-		  /* If SRC is a hard register, we might miss some
-		     overlapping registers with validate_replace_rtx,
-		     so we would have to undo it.  We can't if DEST is
-		     present in the insn, so fail in that combination
-		     of cases.  */
-		  if (sregno < FIRST_PSEUDO_REGISTER
-		      && reg_mentioned_p (dest, PATTERN (q)))
-		    failed = 1;
-
-		  /* Attempt to replace all uses.  */
-		  else if (!validate_replace_rtx (src, dest, q))
-		    failed = 1;
-
-		  /* If this succeeded, but some part of the register
-		     is still present, undo the replacement.  */
-		  else if (sregno < FIRST_PSEUDO_REGISTER
-			   && reg_overlap_mentioned_p (src, PATTERN (q)))
-		    {
-		      validate_replace_rtx (dest, src, q);
-		      failed = 1;
-		    }
-		}
-
-	      /* For SREGNO, count the total number of insns scanned.
-		 For DREGNO, count the total number of insns scanned after
-		 passing the death note for DREGNO.  */
-	      if (!DEBUG_INSN_P (p))
-		{
-		  s_length++;
-		  if (dest_death)
-		    d_length++;
-		}
-
-	      /* If the insn in which SRC dies is a CALL_INSN, don't count it
-		 as a call that has been crossed.  Otherwise, count it.  */
-	      if (q != p && CALL_P (q))
-		{
-		  /* Similarly, total calls for SREGNO, total calls beyond
-		     the death note for DREGNO.  */
-		  s_n_calls++;
-		  s_freq_calls += REG_FREQ_FROM_BB  (BLOCK_FOR_INSN (q));
-		  if (dest_death)
-		    {
-		      d_n_calls++;
-		      d_freq_calls += REG_FREQ_FROM_BB  (BLOCK_FOR_INSN (q));
-		    }
-		}
-
-	      /* If DEST dies here, remove the death note and save it for
-		 later.  Make sure ALL of DEST dies here; again, this is
-		 overly conservative.  */
-	      if (dest_death == 0
-		  && (dest_death = find_regno_note (q, REG_DEAD, dregno)) != 0)
-		{
-		  if (GET_MODE (XEXP (dest_death, 0)) != GET_MODE (dest))
-		    failed = 1, dest_death = 0;
-		  else
-		    remove_note (q, dest_death);
-		}
-	    }
-
-	  if (! failed)
-	    {
-	      /* These counters need to be updated if and only if we are
-		 going to move the REG_DEAD note.  */
-	      if (sregno >= FIRST_PSEUDO_REGISTER)
-		{
-		  if (REG_LIVE_LENGTH (sregno) >= 0)
-		    {
-		      REG_LIVE_LENGTH (sregno) -= s_length;
-		      /* REG_LIVE_LENGTH is only an approximation after
-			 combine if sched is not run, so make sure that we
-			 still have a reasonable value.  */
-		      if (REG_LIVE_LENGTH (sregno) < 2)
-			REG_LIVE_LENGTH (sregno) = 2;
-		    }
-
-		  REG_N_CALLS_CROSSED (sregno) -= s_n_calls;
-		  REG_FREQ_CALLS_CROSSED (sregno) -= s_freq_calls;
-		}
-
-	      /* Move death note of SRC from P to INSN.  */
-	      remove_note (p, note);
-	      XEXP (note, 1) = REG_NOTES (insn);
-	      REG_NOTES (insn) = note;
-	    }
-
-	  /* DEST is also dead if INSN has a REG_UNUSED note for DEST.  */
-	  if (! dest_death
-	      && (dest_death = find_regno_note (insn, REG_UNUSED, dregno)))
-	    {
-	      PUT_REG_NOTE_KIND (dest_death, REG_DEAD);
-	      remove_note (insn, dest_death);
-	    }
-
-	  /* Put death note of DEST on P if we saw it die.  */
-	  if (dest_death)
-	    {
-	      XEXP (dest_death, 1) = REG_NOTES (p);
-	      REG_NOTES (p) = dest_death;
-
-	      if (dregno >= FIRST_PSEUDO_REGISTER)
-		{
-		  /* If and only if we are moving the death note for DREGNO,
-		     then we need to update its counters.  */
-		  if (REG_LIVE_LENGTH (dregno) >= 0)
-		    REG_LIVE_LENGTH (dregno) += d_length;
-		  REG_N_CALLS_CROSSED (dregno) += d_n_calls;
-		  REG_FREQ_CALLS_CROSSED (dregno) += d_freq_calls;
-		}
-	    }
-
-	  return ! failed;
-	}
-
-      /* If SRC is a hard register which is set or killed in some other
-	 way, we can't do this optimization.  */
-      else if (sregno < FIRST_PSEUDO_REGISTER
-	       && dead_or_set_p (p, src))
-	break;
-    }
-  return 0;
-}
-
-/* INSN is a copy of SRC to DEST, in which SRC dies.  See if we now have
-   a sequence of insns that modify DEST followed by an insn that sets
-   SRC to DEST in which DEST dies, with no prior modification of DEST.
-   (There is no need to check if the insns in between actually modify
-   DEST.  We should not have cases where DEST is not modified, but
-   the optimization is safe if no such modification is detected.)
-   In that case, we can replace all uses of DEST, starting with INSN and
-   ending with the set of SRC to DEST, with SRC.  We do not do this
-   optimization if a CALL_INSN is crossed unless SRC already crosses a
-   call or if DEST dies before the copy back to SRC.
-
-   It is assumed that DEST and SRC are pseudos; it is too complicated to do
-   this for hard registers since the substitutions we may make might fail.  */
-
-static void
-optimize_reg_copy_2 (rtx insn, rtx dest, rtx src)
-{
-  rtx p, q;
-  rtx set;
-  int sregno = REGNO (src);
-  int dregno = REGNO (dest);
-  basic_block bb = BLOCK_FOR_INSN (insn);
-
-  for (p = NEXT_INSN (insn); p; p = NEXT_INSN (p))
-    {
-      if (! INSN_P (p))
-	continue;
-      if (BLOCK_FOR_INSN (p) != bb)
-	break;
-
-      set = single_set (p);
-      if (set && SET_SRC (set) == dest && SET_DEST (set) == src
-	  && find_reg_note (p, REG_DEAD, dest))
-	{
-	  /* We can do the optimization.  Scan forward from INSN again,
-	     replacing regs as we go.  */
-
-	  /* Set to stop at next insn.  */
-	  for (q = insn; q != NEXT_INSN (p); q = NEXT_INSN (q))
-	    if (INSN_P (q))
-	      {
-		if (reg_mentioned_p (dest, PATTERN (q)))
-		  {
-		    rtx note;
-
-		    PATTERN (q) = replace_rtx (PATTERN (q), dest, src);
-		    note = FIND_REG_INC_NOTE (q, dest);
-		    if (note)
-		      {
-			remove_note (q, note);
-			add_reg_note (q, REG_INC, src);
-		      }
-		    df_insn_rescan (q);
-		  }
-
-		if (CALL_P (q))
-		  {
-		    int freq = REG_FREQ_FROM_BB  (BLOCK_FOR_INSN (q));
-		    REG_N_CALLS_CROSSED (dregno)--;
-		    REG_N_CALLS_CROSSED (sregno)++;
-		    REG_FREQ_CALLS_CROSSED (dregno) -= freq;
-		    REG_FREQ_CALLS_CROSSED (sregno) += freq;
-		  }
-	      }
-
-	  remove_note (p, find_reg_note (p, REG_DEAD, dest));
-	  REG_N_DEATHS (dregno)--;
-	  remove_note (insn, find_reg_note (insn, REG_DEAD, src));
-	  REG_N_DEATHS (sregno)--;
-	  return;
-	}
-
-      if (reg_set_p (src, p)
-	  || find_reg_note (p, REG_DEAD, dest)
-	  || (CALL_P (p) && REG_N_CALLS_CROSSED (sregno) == 0))
-	break;
-    }
-}
-
-/* INSN is a ZERO_EXTEND or SIGN_EXTEND of SRC to DEST.
-   Look if SRC dies there, and if it is only set once, by loading
-   it from memory.  If so, try to incorporate the zero/sign extension
-   into the memory read, change SRC to the mode of DEST, and alter
-   the remaining accesses to use the appropriate SUBREG.  This allows
-   SRC and DEST to be tied later.  */
-static void
-optimize_reg_copy_3 (rtx insn, rtx dest, rtx src)
-{
-  rtx src_reg = XEXP (src, 0);
-  int src_no = REGNO (src_reg);
-  int dst_no = REGNO (dest);
-  rtx p, set, set_insn;
-  enum machine_mode old_mode;
-  basic_block bb = BLOCK_FOR_INSN (insn);
-
-  if (src_no < FIRST_PSEUDO_REGISTER
-      || dst_no < FIRST_PSEUDO_REGISTER
-      || ! find_reg_note (insn, REG_DEAD, src_reg)
-      || REG_N_DEATHS (src_no) != 1
-      || REG_N_SETS (src_no) != 1)
-    return;
-
-  for (p = PREV_INSN (insn); p && ! reg_set_p (src_reg, p); p = PREV_INSN (p))
-    if (INSN_P (p) && BLOCK_FOR_INSN (p) != bb)
-      break;
-
-  if (! p || BLOCK_FOR_INSN (p) != bb)
-    return;
-
-  if (! (set = single_set (p))
-      || !MEM_P (SET_SRC (set))
-      /* If there's a REG_EQUIV note, this must be an insn that loads an
-	 argument.  Prefer keeping the note over doing this optimization.  */
-      || find_reg_note (p, REG_EQUIV, NULL_RTX)
-      || SET_DEST (set) != src_reg)
-    return;
-
-  /* Be conservative: although this optimization is also valid for
-     volatile memory references, that could cause trouble in later passes.  */
-  if (MEM_VOLATILE_P (SET_SRC (set)))
-    return;
-
-  /* Do not use a SUBREG to truncate from one mode to another if truncation
-     is not a nop.  */
-  if (GET_MODE_BITSIZE (GET_MODE (src_reg)) <= GET_MODE_BITSIZE (GET_MODE (src))
-      && !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (src), GET_MODE (src_reg)))
-    return;
-
-  set_insn = p;
-  old_mode = GET_MODE (src_reg);
-  PUT_MODE (src_reg, GET_MODE (src));
-  XEXP (src, 0) = SET_SRC (set);
-
-  /* Include this change in the group so that it's easily undone if
-     one of the changes in the group is invalid.  */
-  validate_change (p, &SET_SRC (set), src, 1);
-
-  /* Now walk forward making additional replacements.  We want to be able
-     to undo all the changes if a later substitution fails.  */
-  while (p = NEXT_INSN (p), p != insn)
-    {
-      if (! INSN_P (p))
-	continue;
-
-      /* Make a tentative change.  */
-      validate_replace_rtx_group (src_reg,
-				  gen_lowpart_SUBREG (old_mode, src_reg),
-				  p);
-    }
-
-  validate_replace_rtx_group (src, src_reg, insn);
-
-  /* Now see if all the changes are valid.  */
-  if (! apply_change_group ())
-    {
-      /* One or more changes were no good.  Back out everything.  */
-      PUT_MODE (src_reg, old_mode);
-      XEXP (src, 0) = src_reg;
-    }
-  else
-    {
-      rtx note = find_reg_note (set_insn, REG_EQUAL, NULL_RTX);
-      if (note)
-	{
-	  if (rtx_equal_p (XEXP (note, 0), XEXP (src, 0)))
-	    {
-	      XEXP (note, 0)
-		= gen_rtx_fmt_e (GET_CODE (src), GET_MODE (src),
-				 XEXP (note, 0));
-	      df_notes_rescan (set_insn);
-	    }
-	  else
-	    remove_note (set_insn, note);
-	}
-    }
-}
-
-
-/* If we were not able to update the users of src to use dest directly, try
-   instead moving the value to dest directly before the operation.  */
-
-static void
-copy_src_to_dest (rtx insn, rtx src, rtx dest)
-{
-  rtx seq;
-  rtx link;
-  rtx next;
-  rtx set;
-  rtx move_insn;
-  rtx *p_insn_notes;
-  rtx *p_move_notes;
-  int src_regno;
-  int dest_regno;
-
-  /* A REG_LIVE_LENGTH of -1 indicates the register must not go into
-     a hard register, e.g. because it crosses as setjmp.  See the
-     comment in regstat.c:regstat_bb_compute_ri.  Don't try to apply
-     any transformations to such regs.  */
-
-  if (REG_P (src)
-      && REG_LIVE_LENGTH (REGNO (src)) > 0
-      && REG_P (dest)
-      && REG_LIVE_LENGTH (REGNO (dest)) > 0
-      && (set = single_set (insn)) != NULL_RTX
-      && !reg_mentioned_p (dest, SET_SRC (set))
-      && GET_MODE (src) == GET_MODE (dest))
-    {
-      int old_num_regs = reg_rtx_no;
-
-      /* Generate the src->dest move.  */
-      start_sequence ();
-      emit_move_insn (dest, src);
-      seq = get_insns ();
-      end_sequence ();
-      /* If this sequence uses new registers, we may not use it.  */
-      if (old_num_regs != reg_rtx_no
-	  || ! validate_replace_rtx (src, dest, insn))
-	{
-	  /* We have to restore reg_rtx_no to its old value, lest
-	     recompute_reg_usage will try to compute the usage of the
-	     new regs, yet reg_n_info is not valid for them.  */
-	  reg_rtx_no = old_num_regs;
-	  return;
-	}
-      emit_insn_before (seq, insn);
-      move_insn = PREV_INSN (insn);
-      p_move_notes = &REG_NOTES (move_insn);
-      p_insn_notes = &REG_NOTES (insn);
-
-      /* Move any notes mentioning src to the move instruction.  */
-      for (link = REG_NOTES (insn); link != NULL_RTX; link = next)
-	{
-	  next = XEXP (link, 1);
-	  if (GET_CODE (link) == EXPR_LIST && XEXP (link, 0) == src)
-	    {
-	      *p_move_notes = link;
-	      p_move_notes = &XEXP (link, 1);
-	    }
-	  else
-	    {
-	      *p_insn_notes = link;
-	      p_insn_notes = &XEXP (link, 1);
-	    }
-	}
-
-      *p_move_notes = NULL_RTX;
-      *p_insn_notes = NULL_RTX;
-
-      /* Update the various register tables.  */
-      dest_regno = REGNO (dest);
-      INC_REG_N_SETS (dest_regno, 1);
-      REG_LIVE_LENGTH (dest_regno)++;
-      src_regno = REGNO (src);
-      if (! find_reg_note (move_insn, REG_DEAD, src))
-	REG_LIVE_LENGTH (src_regno)++;
-    }
-}
-
-/* reg_set_in_bb[REGNO] points to basic block iff the register is set
-   only once in the given block and has REG_EQUAL note.  */
-
-static basic_block *reg_set_in_bb;
-
-/* Size of reg_set_in_bb array.  */
-static unsigned int max_reg_computed;
-
-
-/* Return whether REG is set in only one location, and is set to a
-   constant, but is set in a different basic block from INSN (an
-   instructions which uses REG).  In this case REG is equivalent to a
-   constant, and we don't want to break that equivalence, because that
-   may increase register pressure and make reload harder.  If REG is
-   set in the same basic block as INSN, we don't worry about it,
-   because we'll probably need a register anyhow (??? but what if REG
-   is used in a different basic block as well as this one?).  */
-
-static bool
-reg_is_remote_constant_p (rtx reg, rtx insn)
-{
-  basic_block bb;
-  rtx p;
-  int max;
-
-  if (!reg_set_in_bb)
-    {
-      max_reg_computed = max = max_reg_num ();
-      reg_set_in_bb = XCNEWVEC (basic_block, max);
-
-      FOR_EACH_BB (bb)
-	FOR_BB_INSNS (bb, p)
-	  {
-	    rtx s;
-
-	    if (!INSN_P (p))
-	      continue;
-	    s = single_set (p);
-	    /* This is the instruction which sets REG.  If there is a
-	       REG_EQUAL note, then REG is equivalent to a constant.  */
-	    if (s != 0
-	        && REG_P (SET_DEST (s))
-	        && REG_N_SETS (REGNO (SET_DEST (s))) == 1
-	        && find_reg_note (p, REG_EQUAL, NULL_RTX))
-	      reg_set_in_bb[REGNO (SET_DEST (s))] = bb;
-	  }
-    }
-
-  gcc_assert (REGNO (reg) < max_reg_computed);
-  if (reg_set_in_bb[REGNO (reg)] == NULL)
-    return false;
-  return (reg_set_in_bb[REGNO (reg)] != BLOCK_FOR_INSN (insn));
-}
-
-/* INSN is adding a CONST_INT to a REG.  We search backwards looking for
-   another add immediate instruction with the same source and dest registers,
-   and if we find one, we change INSN to an increment, and return 1.  If
-   no changes are made, we return 0.
-
-   This changes
-     (set (reg100) (plus reg1 offset1))
-     ...
-     (set (reg100) (plus reg1 offset2))
-   to
-     (set (reg100) (plus reg1 offset1))
-     ...
-     (set (reg100) (plus reg100 offset2-offset1))  */
-
-/* ??? What does this comment mean?  */
-/* cse disrupts preincrement / postdecrement sequences when it finds a
-   hard register as ultimate source, like the frame pointer.  */
-
-static int
-fixup_match_2 (rtx insn, rtx dst, rtx src, rtx offset)
-{
-  rtx p, dst_death = 0;
-  int length, num_calls = 0, freq_calls = 0;
-  basic_block bb = BLOCK_FOR_INSN (insn);
-
-  /* If SRC dies in INSN, we'd have to move the death note.  This is
-     considered to be very unlikely, so we just skip the optimization
-     in this case.  */
-  if (find_regno_note (insn, REG_DEAD, REGNO (src)))
-    return 0;
-
-  /* Scan backward to find the first instruction that sets DST.  */
-
-  for (length = 0, p = PREV_INSN (insn); p; p = PREV_INSN (p))
-    {
-      rtx pset;
-
-      if (! INSN_P (p))
-	continue;
-      if (BLOCK_FOR_INSN (p) != bb)
-	break;
-
-      if (find_regno_note (p, REG_DEAD, REGNO (dst)))
-	dst_death = p;
-      if (! dst_death && !DEBUG_INSN_P (p))
-	length++;
-
-      pset = single_set (p);
-      if (pset && SET_DEST (pset) == dst
-	  && GET_CODE (SET_SRC (pset)) == PLUS
-	  && XEXP (SET_SRC (pset), 0) == src
-	  && CONST_INT_P (XEXP (SET_SRC (pset), 1)))
-	{
-	  HOST_WIDE_INT newconst
-	    = INTVAL (offset) - INTVAL (XEXP (SET_SRC (pset), 1));
-	  rtx add = gen_add3_insn (dst, dst,
-				   gen_int_mode (newconst, GET_MODE (dst)));
-
-	  if (add && validate_change (insn, &PATTERN (insn), add, 0))
-	    {
-	      /* Remove the death note for DST from DST_DEATH.  */
-	      if (dst_death)
-		{
-		  remove_death (REGNO (dst), dst_death);
-		  REG_LIVE_LENGTH (REGNO (dst)) += length;
-		  REG_N_CALLS_CROSSED (REGNO (dst)) += num_calls;
-		  REG_FREQ_CALLS_CROSSED (REGNO (dst)) += freq_calls;
-		}
-
-	      if (dump_file)
-		fprintf (dump_file,
-			 "Fixed operand of insn %d.\n",
-			  INSN_UID (insn));
-
-#ifdef AUTO_INC_DEC
-	      for (p = PREV_INSN (insn); p; p = PREV_INSN (p))
-		{
-		  if (! INSN_P (p))
-		    continue;
-		  if (BLOCK_FOR_INSN (p) != bb)
-		    break;
-		  if (reg_overlap_mentioned_p (dst, PATTERN (p)))
-		    {
-		      if (try_auto_increment (p, insn, 0, dst, newconst, 0))
-			return 1;
-		      break;
-		    }
-		}
-	      for (p = NEXT_INSN (insn); p; p = NEXT_INSN (p))
-		{
-		  if (! INSN_P (p))
-		    continue;
-		  if (BLOCK_FOR_INSN (p) != bb)
-		    break;
-		  if (reg_overlap_mentioned_p (dst, PATTERN (p)))
-		    {
-		      try_auto_increment (p, insn, 0, dst, newconst, 1);
-		      break;
-		    }
-		}
-#endif
-	      return 1;
-	    }
-	}
-
-      if (reg_set_p (dst, PATTERN (p)))
-	break;
-
-      /* If we have passed a call instruction, and the
-         pseudo-reg SRC is not already live across a call,
-         then don't perform the optimization.  */
-      /* reg_set_p is overly conservative for CALL_INSNS, thinks that all
-	 hard regs are clobbered.  Thus, we only use it for src for
-	 non-call insns.  */
-      if (CALL_P (p))
-	{
-	  if (! dst_death)
-	    {
-	      num_calls++;
-	      freq_calls += REG_FREQ_FROM_BB  (BLOCK_FOR_INSN (p));
-	    }
-
-	  if (REG_N_CALLS_CROSSED (REGNO (src)) == 0)
-	    break;
-
-	  if ((HARD_REGISTER_P (dst) && call_used_regs [REGNO (dst)])
-	      || find_reg_fusage (p, CLOBBER, dst))
-	    break;
-	}
-      else if (reg_set_p (src, PATTERN (p)))
-	break;
-    }
-
-  return 0;
-}
-
-/* A forward pass.  Replace output operands with input operands.  */
-
-static void
-regmove_forward_pass (void)
-{
-  basic_block bb;
-  rtx insn;
-
-  if (! flag_expensive_optimizations)
-    return;
-
-  if (dump_file)
-    fprintf (dump_file, "Starting forward pass...\n");
-
-  FOR_EACH_BB (bb)
-    {
-      FOR_BB_INSNS (bb, insn)
-	{
-	  rtx set = single_set (insn);
-	  if (! set)
-	    continue;
-
-	  if ((GET_CODE (SET_SRC (set)) == SIGN_EXTEND
-	       || GET_CODE (SET_SRC (set)) == ZERO_EXTEND)
-	      && REG_P (XEXP (SET_SRC (set), 0))
-	      && REG_P (SET_DEST (set)))
-	    optimize_reg_copy_3 (insn, SET_DEST (set), SET_SRC (set));
-
-	  if (REG_P (SET_SRC (set))
-	      && REG_P (SET_DEST (set)))
-	    {
-	      /* If this is a register-register copy where SRC is not dead,
-		 see if we can optimize it.  If this optimization succeeds,
-		 it will become a copy where SRC is dead.  */
-	      if ((find_reg_note (insn, REG_DEAD, SET_SRC (set))
-		   || optimize_reg_copy_1 (insn, SET_DEST (set), SET_SRC (set)))
-		  && REGNO (SET_DEST (set)) >= FIRST_PSEUDO_REGISTER)
-		{
-		  /* Similarly for a pseudo-pseudo copy when SRC is dead.  */
-		  if (REGNO (SET_SRC (set)) >= FIRST_PSEUDO_REGISTER)
-		    optimize_reg_copy_2 (insn, SET_DEST (set), SET_SRC (set));
-		  if (regno_src_regno[REGNO (SET_DEST (set))] < 0
-		      && SET_SRC (set) != SET_DEST (set))
-		    {
-		      int srcregno = REGNO (SET_SRC (set));
-		      if (regno_src_regno[srcregno] >= 0)
-			srcregno = regno_src_regno[srcregno];
-		      regno_src_regno[REGNO (SET_DEST (set))] = srcregno;
-		    }
-		}
-	    }
-	}
-    }
-}
-
-/* A backward pass.  Replace input operands with output operands.  */
-
-static void
-regmove_backward_pass (void)
-{
-  basic_block bb;
-  rtx insn, prev;
-
-  if (dump_file)
-    fprintf (dump_file, "Starting backward pass...\n");
-
-  FOR_EACH_BB_REVERSE (bb)
-    {
-      /* ??? Use the safe iterator because fixup_match_2 can remove
-	     insns via try_auto_increment.  */
-      FOR_BB_INSNS_REVERSE_SAFE (bb, insn, prev)
-	{
-	  struct match match;
-	  rtx copy_src, copy_dst;
-	  int op_no, match_no;
-	  int success = 0;
-
-	  if (! INSN_P (insn))
-	    continue;
-
-	  if (! find_matches (insn, &match))
-	    continue;
-
-	  /* Now scan through the operands looking for a destination operand
-	     which is supposed to match a source operand.
-	     Then scan backward for an instruction which sets the source
-	     operand.  If safe, then replace the source operand with the
-	     dest operand in both instructions.  */
-
-	  copy_src = NULL_RTX;
-	  copy_dst = NULL_RTX;
-	  for (op_no = 0; op_no < recog_data.n_operands; op_no++)
-	    {
-	      rtx set, p, src, dst;
-	      rtx src_note, dst_note;
-	      int num_calls = 0, freq_calls = 0;
-	      enum reg_class src_class, dst_class;
-	      int length;
-
-	      match_no = match.with[op_no];
-
-	      /* Nothing to do if the two operands aren't supposed to match.  */
-	      if (match_no < 0)
-		continue;
-
-	      dst = recog_data.operand[match_no];
-	      src = recog_data.operand[op_no];
-
-	      if (!REG_P (src))
-		continue;
-
-	      if (!REG_P (dst)
-		  || REGNO (dst) < FIRST_PSEUDO_REGISTER
-		  || REG_LIVE_LENGTH (REGNO (dst)) < 0
-		  || GET_MODE (src) != GET_MODE (dst))
-		continue;
-
-	      /* If the operands already match, then there is nothing to do.  */
-	      if (operands_match_p (src, dst))
-		continue;
-
-	      if (match.commutative[op_no] >= 0)
-		{
-		  rtx comm = recog_data.operand[match.commutative[op_no]];
-		  if (operands_match_p (comm, dst))
-		    continue;
-		}
-
-	      set = single_set (insn);
-	      if (! set)
-		continue;
-
-	      /* Note that single_set ignores parts of a parallel set for
-		 which one of the destinations is REG_UNUSED.  We can't
-		 handle that here, since we can wind up rewriting things
-		 such that a single register is set twice within a single
-		 parallel.  */
-	      if (reg_set_p (src, insn))
-		continue;
-
-	      /* match_no/dst must be a write-only operand, and
-		 operand_operand/src must be a read-only operand.  */
-	      if (match.use[op_no] != READ
-		  || match.use[match_no] != WRITE)
-		continue;
-
-	      if (match.early_clobber[match_no]
-		  && count_occurrences (PATTERN (insn), src, 0) > 1)
-		continue;
-
-	      /* Make sure match_no is the destination.  */
-	      if (recog_data.operand[match_no] != SET_DEST (set))
-		continue;
-
-	      if (REGNO (src) < FIRST_PSEUDO_REGISTER)
-		{
-		  if (GET_CODE (SET_SRC (set)) == PLUS
-		      && CONST_INT_P (XEXP (SET_SRC (set), 1))
-		      && XEXP (SET_SRC (set), 0) == src
-		      && fixup_match_2 (insn, dst, src,
-					XEXP (SET_SRC (set), 1)))
-		    break;
-		  continue;
-		}
-	      src_class = reg_preferred_class (REGNO (src));
-	      dst_class = reg_preferred_class (REGNO (dst));
-
-	      if (! (src_note = find_reg_note (insn, REG_DEAD, src)))
-		{
-		  /* We used to force the copy here like in other cases, but
-		     it produces worse code, as it eliminates no copy
-		     instructions and the copy emitted will be produced by
-		     reload anyway.  On patterns with multiple alternatives,
-		     there may be better solution available.
-
-		     In particular this change produced slower code for numeric
-		     i387 programs.  */
-
-		  continue;
-		}
-
-	      if (! regclass_compatible_p (src_class, dst_class))
-		{
-		  if (!copy_src)
-		    {
-		      copy_src = src;
-		      copy_dst = dst;
-		    }
-		  continue;
-		}
-
-	      /* Can not modify an earlier insn to set dst if this insn
-		 uses an old value in the source.  */
-	      if (reg_overlap_mentioned_p (dst, SET_SRC (set)))
-		{
-		  if (!copy_src)
-		    {
-		      copy_src = src;
-		      copy_dst = dst;
-		    }
-		  continue;
-		}
-
-	      /* If src is set once in a different basic block,
-		 and is set equal to a constant, then do not use
-		 it for this optimization, as this would make it
-		 no longer equivalent to a constant.  */
-
-	      if (reg_is_remote_constant_p (src, insn))
-		{
-		  if (!copy_src)
-		    {
-		      copy_src = src;
-		      copy_dst = dst;
-		    }
-		  continue;
-		}
-
-
-	      if (dump_file)
-		fprintf (dump_file,
-			 "Could fix operand %d of insn %d matching operand %d.\n",
-			 op_no, INSN_UID (insn), match_no);
-
-	      /* Scan backward to find the first instruction that uses
-		 the input operand.  If the operand is set here, then
-		 replace it in both instructions with match_no.  */
-
-	      for (length = 0, p = PREV_INSN (insn); p; p = PREV_INSN (p))
-		{
-		  rtx pset;
-
-		  if (! INSN_P (p))
-		    continue;
-		  if (BLOCK_FOR_INSN (p) != bb)
-		    break;
-
-		  if (!DEBUG_INSN_P (p))
-		    length++;
-
-		  /* ??? See if all of SRC is set in P.  This test is much
-		     more conservative than it needs to be.  */
-		  pset = single_set (p);
-		  if (pset && SET_DEST (pset) == src)
-		    {
-		      /* We use validate_replace_rtx, in case there
-			 are multiple identical source operands.  All
-			 of them have to be changed at the same time:
-			 when validate_replace_rtx() calls
-			 apply_change_group().  */
-		      validate_change (p, &SET_DEST (pset), dst, 1);
-		      if (validate_replace_rtx (src, dst, insn))
-			success = 1;
-		      break;
-		    }
-
-		  /* We can't make this change if DST is mentioned at
-		     all in P, since we are going to change its value.
-		     We can't make this change if SRC is read or
-		     partially written in P, since we are going to
-		     eliminate SRC.  However, if it's a debug insn, we
-		     can't refrain from making the change, for this
-		     would cause codegen differences, so instead we
-		     invalidate debug expressions that reference DST,
-		     and adjust references to SRC in them so that they
-		     become references to DST.  */
-		  if (reg_mentioned_p (dst, PATTERN (p)))
-		    {
-		      if (DEBUG_INSN_P (p))
-			validate_change (p, &INSN_VAR_LOCATION_LOC (p),
-					 gen_rtx_UNKNOWN_VAR_LOC (), 1);
-		      else
-			break;
-		    }
-		  if (reg_overlap_mentioned_p (src, PATTERN (p)))
-		    {
-		      if (DEBUG_INSN_P (p))
-			validate_replace_rtx_group (src, dst, p);
-		      else
-			break;
-		    }
-
-		  /* If we have passed a call instruction, and the
-		     pseudo-reg DST is not already live across a call,
-		     then don't perform the optimization.  */
-		  if (CALL_P (p))
-		    {
-		      num_calls++;
-		      freq_calls += REG_FREQ_FROM_BB  (BLOCK_FOR_INSN (p));
-
-		      if (REG_N_CALLS_CROSSED (REGNO (dst)) == 0)
-			break;
-		    }
-		}
-
-	      if (success)
-		{
-		  int dstno, srcno;
-
-		  /* Remove the death note for SRC from INSN.  */
-		  remove_note (insn, src_note);
-		  /* Move the death note for SRC to P if it is used
-		     there.  */
-		  if (reg_overlap_mentioned_p (src, PATTERN (p)))
-		    {
-		      XEXP (src_note, 1) = REG_NOTES (p);
-		      REG_NOTES (p) = src_note;
-		    }
-		  /* If there is a REG_DEAD note for DST on P, then remove
-		     it, because DST is now set there.  */
-		  if ((dst_note = find_reg_note (p, REG_DEAD, dst)))
-		    remove_note (p, dst_note);
-
-		  dstno = REGNO (dst);
-		  srcno = REGNO (src);
-
-		  INC_REG_N_SETS (dstno, 1);
-		  INC_REG_N_SETS (srcno, -1);
-
-		  REG_N_CALLS_CROSSED (dstno) += num_calls;
-		  REG_N_CALLS_CROSSED (srcno) -= num_calls;
-		  REG_FREQ_CALLS_CROSSED (dstno) += freq_calls;
-		  REG_FREQ_CALLS_CROSSED (srcno) -= freq_calls;
-
-		  REG_LIVE_LENGTH (dstno) += length;
-		  if (REG_LIVE_LENGTH (srcno) >= 0)
-		    {
-		      REG_LIVE_LENGTH (srcno) -= length;
-		      /* REG_LIVE_LENGTH is only an approximation after
-			 combine if sched is not run, so make sure that we
-			 still have a reasonable value.  */
-		      if (REG_LIVE_LENGTH (srcno) < 2)
-			REG_LIVE_LENGTH (srcno) = 2;
-		    }
-
-		  if (dump_file)
-		    fprintf (dump_file,
-			     "Fixed operand %d of insn %d matching operand %d.\n",
-			     op_no, INSN_UID (insn), match_no);
-
-		  break;
-		}
-	      else if (num_changes_pending () > 0)
-		cancel_changes (0);
-	    }
-
-	  /* If we weren't able to replace any of the alternatives, try an
-	     alternative approach of copying the source to the destination.  */
-	  if (!success && copy_src != NULL_RTX)
-	    copy_src_to_dest (insn, copy_src, copy_dst);
-	}
-    }
-}
-
-/* Main entry for the register move optimization.  */
-
-static unsigned int
-regmove_optimize (void)
-{
-  int i;
-  int nregs = max_reg_num ();
-
-  df_note_add_problem ();
-  df_analyze ();
-
-  regstat_init_n_sets_and_refs ();
-  regstat_compute_ri ();
-
-  if (flag_ira_loop_pressure)
-    ira_set_pseudo_classes (true, dump_file);
-
-  regno_src_regno = XNEWVEC (int, nregs);
-  for (i = nregs; --i >= 0; )
-    regno_src_regno[i] = -1;
-
-  /* A forward pass.  Replace output operands with input operands.  */
-  regmove_forward_pass ();
-
-  /* A backward pass.  Replace input operands with output operands.  */
-  regmove_backward_pass ();
-
-  /* Clean up.  */
-  free (regno_src_regno);
-  if (reg_set_in_bb)
-    {
-      free (reg_set_in_bb);
-      reg_set_in_bb = NULL;
-    }
-  regstat_free_n_sets_and_refs ();
-  regstat_free_ri ();
-  if (flag_ira_loop_pressure)
-    free_reg_info ();
-  return 0;
-}
-
-/* Returns nonzero if INSN's pattern has matching constraints for any operand.
-   Returns 0 if INSN can't be recognized, or if the alternative can't be
-   determined.
-
-   Initialize the info in MATCHP based on the constraints.  */
-
-static int
-find_matches (rtx insn, struct match *matchp)
-{
-  int likely_spilled[MAX_RECOG_OPERANDS];
-  int op_no;
-  int any_matches = 0;
-
-  extract_insn (insn);
-  if (! constrain_operands (0))
-    return 0;
-
-  /* Must initialize this before main loop, because the code for
-     the commutative case may set matches for operands other than
-     the current one.  */
-  for (op_no = recog_data.n_operands; --op_no >= 0; )
-    matchp->with[op_no] = matchp->commutative[op_no] = -1;
-
-  for (op_no = 0; op_no < recog_data.n_operands; op_no++)
-    {
-      const char *p;
-      char c;
-      int i = 0;
-
-      p = recog_data.constraints[op_no];
-
-      likely_spilled[op_no] = 0;
-      matchp->use[op_no] = READ;
-      matchp->early_clobber[op_no] = 0;
-      if (*p == '=')
-	matchp->use[op_no] = WRITE;
-      else if (*p == '+')
-	matchp->use[op_no] = READWRITE;
-
-      for (;*p && i < which_alternative; p++)
-	if (*p == ',')
-	  i++;
-
-      while ((c = *p) != '\0' && c != ',')
-	{
-	  switch (c)
-	    {
-	    case '=':
-	      break;
-	    case '+':
-	      break;
-	    case '&':
-	      matchp->early_clobber[op_no] = 1;
-	      break;
-	    case '%':
-	      matchp->commutative[op_no] = op_no + 1;
-	      matchp->commutative[op_no + 1] = op_no;
-	      break;
-
-	    case '0': case '1': case '2': case '3': case '4':
-	    case '5': case '6': case '7': case '8': case '9':
-	      {
-		char *end;
-		unsigned long match_ul = strtoul (p, &end, 10);
-		int match = match_ul;
-
-		p = end;
-
-		if (match < op_no && likely_spilled[match])
-		  continue;
-		matchp->with[op_no] = match;
-		any_matches = 1;
-		if (matchp->commutative[op_no] >= 0)
-		  matchp->with[matchp->commutative[op_no]] = match;
-	      }
-	    continue;
-
-	  case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'h':
-	  case 'j': case 'k': case 'l': case 'p': case 'q': case 't': case 'u':
-	  case 'v': case 'w': case 'x': case 'y': case 'z': case 'A': case 'B':
-	  case 'C': case 'D': case 'W': case 'Y': case 'Z':
-	    if (targetm.class_likely_spilled_p (REG_CLASS_FROM_CONSTRAINT ((unsigned char) c, p)))
-	      likely_spilled[op_no] = 1;
-	    break;
-	  }
-	  p += CONSTRAINT_LEN (c, p);
-	}
-    }
-  return any_matches;
-}
-
-
-
-static bool
-gate_handle_regmove (void)
-{
-  return (optimize > 0 && flag_regmove);
-}
-
-
-namespace {
-
-const pass_data pass_data_regmove =
-{
-  RTL_PASS, /* type */
-  "regmove", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
-  true, /* has_gate */
-  true, /* has_execute */
-  TV_REGMOVE, /* tv_id */
-  0, /* properties_required */
-  0, /* properties_provided */
-  0, /* properties_destroyed */
-  0, /* todo_flags_start */
-  ( TODO_df_finish | TODO_verify_rtl_sharing ), /* todo_flags_finish */
-};
-
-class pass_regmove : public rtl_opt_pass
-{
-public:
-  pass_regmove (gcc::context *ctxt)
-    : rtl_opt_pass (pass_data_regmove, ctxt)
-  {}
-
-  /* opt_pass methods: */
-  bool gate () { return gate_handle_regmove (); }
-  unsigned int execute () { return regmove_optimize (); }
-
-}; // class pass_regmove
-
-} // anon namespace
-
-rtl_opt_pass *
-make_pass_regmove (gcc::context *ctxt)
-{
-  return new pass_regmove (ctxt);
-}
Index: testsuite/gcc.target/i386/fma_double_3.c
===================================================================
--- testsuite/gcc.target/i386/fma_double_3.c	(revision 204148)
+++ testsuite/gcc.target/i386/fma_double_3.c	(working copy)
@@ -8,11 +8,7 @@
 
 #include "fma_3.h"
 
-/* { dg-final { scan-assembler-times "vfmadd132sd" 4  } } */
-/* { dg-final { scan-assembler-times "vfmadd231sd" 4  } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 4  } } */
-/* { dg-final { scan-assembler-times "vfmsub231sd" 4  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 4  } } */
-/* { dg-final { scan-assembler-times "vfnmadd231sd" 4  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 4  } } */
-/* { dg-final { scan-assembler-times "vfnmsub231sd" 4  } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 8  } } */
+/* { dg-final { scan-assembler-times "vfmsub\[132\]+sd" 8  } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[132\]+sd" 8  } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[132\]+sd" 8  } } */
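
A note on the relaxed patterns in the hunk above: inside a Tcl double-quoted string, "vfmadd\[123\]+sd" reaches the matcher as the regular expression vfmadd[123]+sd, so it accepts the 132, 213 and 231 operand-order forms alike, and the expected count becomes the sum of the per-form counts (4 + 4 = 8 here). Presumably this is needed because the choice of form now depends on how the register allocator ties the operands. The sketch below is only an illustration in Python, not part of the patch; the sample assembler text and the helper name count_matches are invented for the example.

```python
import re

# Tcl turns "vfmadd\[123\]+sd" into the regular expression vfmadd[123]+sd.
RELAXED = re.compile(r"vfmadd[123]+sd")
# One of the old, form-specific patterns, for comparison.
STRICT = re.compile(r"vfmadd132sd")

# Hypothetical assembler output; the operands are invented for illustration.
SAMPLE_ASM = """\
vfmadd132sd %xmm1, %xmm2, %xmm0
vfmadd213sd %xmm1, %xmm2, %xmm0
vfmadd231sd %xmm1, %xmm2, %xmm0
vfnmadd132sd %xmm1, %xmm2, %xmm0
"""


def count_matches(pattern, text):
    """Count non-overlapping matches, roughly what scan-assembler-times checks."""
    return len(pattern.findall(text))


relaxed_count = count_matches(RELAXED, SAMPLE_ASM)
strict_count = count_matches(STRICT, SAMPLE_ASM)
print(relaxed_count, strict_count)
```

The relaxed pattern counts every vfmadd form but not vfnmadd, since "vfnmadd" does not contain the substring "vfmadd"; that is why the negated variants keep their own directives.
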
Index: testsuite/gcc.target/i386/fma_double_5.c
===================================================================
--- testsuite/gcc.target/i386/fma_double_5.c	(revision 204148)
+++ testsuite/gcc.target/i386/fma_double_5.c	(working copy)
@@ -8,7 +8,7 @@
 
 #include "fma_5.h"
 
-/* { dg-final { scan-assembler-times "vfmadd132sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 8  } } */
+/* { dg-final { scan-assembler-times "vfmadd\[132\]+sd" 8  } } */
+/* { dg-final { scan-assembler-times "vfmsub\[132\]+sd" 8  } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[132\]+sd" 8  } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[132\]+sd" 8  } } */
Index: testsuite/gcc.target/i386/fma_float_3.c
===================================================================
--- testsuite/gcc.target/i386/fma_float_3.c	(revision 204148)
+++ testsuite/gcc.target/i386/fma_float_3.c	(working copy)
@@ -8,11 +8,7 @@
 
 #include "fma_3.h"
 
-/* { dg-final { scan-assembler-times "vfmadd132ss" 4  } } */
-/* { dg-final { scan-assembler-times "vfmadd231ss" 4  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 4  } } */
-/* { dg-final { scan-assembler-times "vfmsub231ss" 4  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 4  } } */
-/* { dg-final { scan-assembler-times "vfnmadd231ss" 4  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 4  } } */
-/* { dg-final { scan-assembler-times "vfnmsub231ss" 4  } } */
+/* { dg-final { scan-assembler-times "vfmadd\[132\]+ss" 8  } } */
+/* { dg-final { scan-assembler-times "vfmsub\[132\]+ss" 8  } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[132\]+ss" 8  } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[132\]+ss" 8  } } */
Index: testsuite/gcc.target/i386/fma_float_5.c
===================================================================
--- testsuite/gcc.target/i386/fma_float_5.c	(revision 204148)
+++ testsuite/gcc.target/i386/fma_float_5.c	(working copy)
@@ -8,7 +8,7 @@
 
 #include "fma_5.h"
 
-/* { dg-final { scan-assembler-times "vfmadd132ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 8  } } */
+/* { dg-final { scan-assembler-times "vfmadd\[132\]+ss" 8  } } */
+/* { dg-final { scan-assembler-times "vfmsub\[132\]+ss" 8  } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[132\]+ss" 8  } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[132\]+ss" 8  } } */
Index: testsuite/gcc.target/i386/l_fma_double_1.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_1.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_double_1.c	(working copy)
@@ -17,11 +17,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfnmadd231pd" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub231pd" 4  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 28  } } */
-/* { dg-final { scan-assembler-times "vfmadd213sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfmsub213sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfnmadd213sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfnmsub213sd" 28 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56  } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
Index: testsuite/gcc.target/i386/l_fma_double_2.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_2.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_double_2.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 56  } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56  } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
Index: testsuite/gcc.target/i386/l_fma_double_3.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_3.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_double_3.c	(working copy)
@@ -17,11 +17,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfnmadd231pd" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub231pd" 4  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfmadd213sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfmsub213sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfnmadd213sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 28 } } */
-/* { dg-final { scan-assembler-times "vfnmsub213sd" 28 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
Index: testsuite/gcc.target/i386/l_fma_double_4.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_4.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_double_4.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56 } } */
Index: testsuite/gcc.target/i386/l_fma_double_5.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_5.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_double_5.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 56  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 56  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 56  } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56  } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56  } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56  } } */
Index: testsuite/gcc.target/i386/l_fma_double_6.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_6.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_double_6.c	(working copy)
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((al
 /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 56 } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 56  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 56  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 56  } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 56 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 56  } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 56  } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 56  } } */
Index: testsuite/gcc.target/i386/l_fma_float_1.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_1.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_float_1.c	(working copy)
@@ -16,11 +16,7 @@
 /* { dg-final { scan-assembler-times "vfnmadd231ps" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub231ps" 4  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 60 } } */
-/* { dg-final { scan-assembler-times "vfmadd213ss" 60 } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 60 } } */
-/* { dg-final { scan-assembler-times "vfmsub213ss" 60 } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 60 } } */
-/* { dg-final { scan-assembler-times "vfnmadd213ss" 60 } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 60 } } */
-/* { dg-final { scan-assembler-times "vfnmsub213ss" 60 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120 } } */
Index: testsuite/gcc.target/i386/l_fma_float_2.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_2.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_float_2.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 120  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 120  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 120  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120  } } */
Index: testsuite/gcc.target/i386/l_fma_float_3.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_3.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_float_3.c	(working copy)
@@ -16,11 +16,7 @@
 /* { dg-final { scan-assembler-times "vfnmadd231ps" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub231ps" 4  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 60  } } */
-/* { dg-final { scan-assembler-times "vfmadd213ss" 60  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 60  } } */
-/* { dg-final { scan-assembler-times "vfmsub213ss" 60  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 60  } } */
-/* { dg-final { scan-assembler-times "vfnmadd213ss" 60  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 60  } } */
-/* { dg-final { scan-assembler-times "vfnmsub213ss" 60  } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120  } } */
Index: testsuite/gcc.target/i386/l_fma_float_4.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_4.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_float_4.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 120  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 120  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 120  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120  } } */
Index: testsuite/gcc.target/i386/l_fma_float_5.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_5.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_float_5.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 120  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 120  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 120  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120  } } */
Index: testsuite/gcc.target/i386/l_fma_float_6.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_6.c	(revision 204148)
+++ testsuite/gcc.target/i386/l_fma_float_6.c	(working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 120 } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 120  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 120  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 120 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 120  } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 120  } } */
Index: timevar.def
===================================================================
--- timevar.def	(revision 204148)
+++ timevar.def	(working copy)
@@ -221,7 +221,6 @@ DEFTIMEVAR (TV_CSE2                  , "
 DEFTIMEVAR (TV_BRANCH_PROB           , "branch prediction")
 DEFTIMEVAR (TV_COMBINE               , "combiner")
 DEFTIMEVAR (TV_IFCVT		     , "if-conversion")
-DEFTIMEVAR (TV_REGMOVE               , "regmove")
 DEFTIMEVAR (TV_MODE_SWITCH           , "mode switching")
 DEFTIMEVAR (TV_SMS		     , "sms modulo scheduling")
 DEFTIMEVAR (TV_SCHED                 , "scheduling")
Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 204148)
+++ tree-pass.h	(working copy)
@@ -524,7 +524,6 @@ extern rtl_opt_pass *make_pass_if_after_
 extern rtl_opt_pass *make_pass_ree (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_partition_blocks (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_match_asm_constraints (gcc::context *ctxt);
-extern rtl_opt_pass *make_pass_regmove (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_split_all_insns (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_fast_rtl_byte_dce (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_lower_subreg2 (gcc::context *ctxt);
