This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[sel-sched] tuning of speculation, alignment and cleanups


Hello.  This patch contains various tweaks to the selective scheduler and related code, aimed at improving performance.

Moved pass_compute_alignments after pass_machine_reorg to fix problems with loop_optimizer_init moving labels with alignment out of loop headers into preheaders.

Added a new ia64 flag, -mstop-bit-before-check, to force stop bits before speculation checks.

Renamed the weakness_cutoff field of the spec_info structure to data_weakness_cutoff, and added control_weakness_cutoff.
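
As a rough illustration (not part of the patch; the values 255 for MAX_DEP_WEAK and 10000 for REG_BR_PROB_BASE are assumptions about the usual GCC definitions), the two cutoffs are the same percentage parameter scaled onto two different bases:

```c
#include <assert.h>

/* Assumed values of the GCC constants, for illustration only.  */
#define MAX_DEP_WEAK 255
#define REG_BR_PROB_BASE 10000

/* Scale the sched-spec-prob-cutoff percentage onto the dependence
   weakness scale used for data speculation.  */
static int
data_cutoff (int prob_cutoff_percent)
{
  return (prob_cutoff_percent * MAX_DEP_WEAK) / 100;
}

/* Scale the same percentage onto the branch probability scale used
   for control speculation (expr usefulness).  */
static int
control_cutoff (int prob_cutoff_percent)
{
  return (prob_cutoff_percent * REG_BR_PROB_BASE) / 100;
}
```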

Two new attributes of exprs: usefulness and orig_sched_cycle.
The first is the probability that the result of this expr will actually be used if it is scheduled now.  The second is set only for already scheduled insns and records the cycle on which the insn was scheduled.
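
A hypothetical stand-alone sketch of how such a usefulness value could be split across successors by edge probability (the real code lives in av_set_split_usefulness; the function and names below are illustrative, not the patch's):

```c
#define REG_BR_PROB_BASE 10000

/* Illustrative version of the usefulness split: scale USEFULNESS by
   the probability PROB of the successor edge through which the expr
   was merged up, normalized by ALL_PROB, the sum of the probabilities
   of all successors (which can legitimately differ from
   REG_BR_PROB_BASE in weird CFGs).  */
static int
split_usefulness (int usefulness, int prob, int all_prob)
{
  if (all_prob == 0)
    return usefulness;
  return (usefulness * prob) / all_prob;
}
```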


Tuning of fence merging: where possible, some properties of the old fence are saved for reuse in the new fence.

Function is_ineligible_successor_p now handles the case of trying to schedule insns that were scheduled on one fence on another fence when pipelining.  It now checks whether the sched_times of insns on the path switch from zero to non-zero; if they do, the path must contain a loop back edge to be eligible.
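
A hypothetical model of the sched_times check described above, reduced to a plain array walk (the names here are illustrative, not the patch's):

```c
#include <stdbool.h>
#include <assert.h>

/* Illustrative model of the new check: walk the sched_times values of
   the insns along a path and report whether they switch from zero to
   non-zero.  Such a switch means the path re-enters already scheduled
   code, so when pipelining the path must contain a loop back edge to
   be an eligible successor.  */
static bool
sched_times_switch_p (const int *sched_times, int len)
{
  int i;

  for (i = 1; i < len; i++)
    if (sched_times[i - 1] == 0 && sched_times[i] > 0)
      return true;
  return false;
}
```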

Do not call loop_optimizer_init when it would not be useful, i.e. for single basic block regions.  Do not add empty preheader blocks to any region (nor create new regions from them); just remove them.

Prevented scheduling of renaming or speculation at the end of a loop when the result would not be ready at the original insn at the beginning of the loop on the next iteration, which would always cause a stall.
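
A hypothetical condensation of that stall test (mirroring check_stalling_p from the patch below, but with all expr accessors replaced by plain parameters):

```c
#include <stdbool.h>
#include <assert.h>

/* Illustrative condensation of the stall test: an already pipelined
   insn (ORIG_SCHED_CYCLE != 0) that was renamed or speculated
   (!IS_ORIG_REG_P || SPECULATED_P) will stall the next iteration when
   its original scheduling cycle is within the insn's latency of the
   loop start, i.e. the result cannot be ready in time.  */
static bool
would_stall_p (int orig_sched_cycle, bool is_orig_reg_p,
               bool speculated_p, int latency)
{
  return orig_sched_cycle != 0
         && (!is_orig_reg_p || speculated_p)
         && orig_sched_cycle <= latency;
}
```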

Added assertions of correctness and existence of valid sets at the beginnings of basic blocks in move_op.  Removed unnecessary jumps and empty blocks.  Empty basic blocks are no longer created on edges incoming to the region; everything is handled in generate_bookkeeping_insn.

Bootstrapped and tested on ia64. Committed to sel-sched branch.

-- Dmitry
2007-12-26  Dmitry Zhurikhin  <zhur@ispras.ru>

	* haifa-sched.c (sched_init): Add initialization of new spec_info 
        field: control_weakness_cutoff.
	(try_ready): Use data_weakness_cutoff instead of weakness_cutoff.
	* sel-sched.c (find_best_reg_for_rhs): New parameter, a pointer
	to bool, indicating whether the best reg for the rhs is an original reg.
	Update all callers.
	(moveup_set_rhs): Add computation of usefulness of rhses in av sets.
	(compute_av_set): Same.
	(end_of_loop_p): New.
	(can_overcome_dep_p): Use data_weakness_cutoff instead of weakness_cutoff.
	(process_spec_exprs): Same. 
	(check_stalling_p): New.
	(fill_vec_av_set): At the end of a loop, remove from the av set insns
	that would always cause a stall at the beginning of the next iteration
	if pipelined.
	(generate_bookkeeping_insn): Change return type from void to the
	basic_block where bookkeeping has been generated.
	Also correct some av sets.
	(fill_insns): When creating new floating bb header move data sets from
	the new basic block created above back to the old one.
	(move_op): Always assert that all bb headers that move_op passes have
	valid data sets.  Also correct data sets for the bookkeeping
	block and remove unneeded jumps created by bookkeeping.
	(split_edges_incoming_to_rgn): Remove.
	(sel_region_init): Correctly initialize MARK_LOOP_FOR_PIPELINING.
	Also do not split edges coming into the region, because
	generate_bookkeeping_insn can correctly handle them.
	(sel_sched_region_1): Add cleanup of preheader block of pipelined loop
        if it is empty after pipelining.
	(sel_global_finish): Rearrange initialization routines so that loop
	preheaders are not created when we cannot pipeline.
	* sel-sched-ir.c (preheader_removed): New variable.
	(merge_fences): New.
	(new_fences_add): Use merge_fences. 
	(init_expr): Add handling of new expr fields: usefulness and
	orig_sched_cycle.
	(copy_expr): Same.
	(merge_expr_data): Same. 
	(merge_expr): Same.
	(init_global_and_expr_for_insn): Same.
	(init_insn): Same.
	(av_set_split_usefulness): New.
	(free_lv_sets): Assert that empty basic blocks have empty lv_sets.
	(free_av_set): Make non-static.
	(exchange_av_sets): Same. 
	(copy_data_sets): New.
	(cfg_succs_2): New.
	(cfg_succs): Add computation of probabilities of succs.
	(overall_prob_of_succs): New.
	(sel_num_cfg_preds_gt_1): Remove wrong assert. 
	(jump_destination): New.
	(jump_to_back_edge_p): New.
	(path_contains_back_edge_p): New.
	(path_contains_switch_of_sched_times_p): New.
	(is_ineligible_successor): Consider insns scheduled on other fences as
        ineligible successors even if pipelining. 
	(sel_remove_empty_bb): Assert removing only empty bbs; after removing
	an empty block, remove the unused jump in prev_bb if it exists.
	(considered_for_pipelining_p): Initialize pipelining of outer loops
	only when pipelining the current region.
	(sel_find_rgns): Move pipeline_outer_loops_init here from 
         sel_global_init.
	(sel_is_loop_preheader): Return false if preheader was removed.
	(jump_leads_only_to_bb_p): New.
	(sel_remove_loop_preheader): Do not create empty regions from an empty
	preheader block; just remove it.
	* sel-sched-ir.h (struct _expr): New fields usefulness and
	orig_sched_cycle.  Also add access macros for them.
	* sel-sched-dump.c (dump_expr_1): Add dump of EXPR_USEFULNESS.
	* passes.c (init_optimization_passes): Move pass_compute_alignments
	below pass_machine_reorg.
	* config/ia64/ia64.opt (mstop-bit-before-check): New flag.
	* config/ia64/ia64.c (group_barrier_needed): Use new flag.
Index: gcc/haifa-sched.c
===================================================================
--- gcc/haifa-sched.c	(revision 131180)
+++ gcc/haifa-sched.c	(working copy)
@@ -2810,8 +2810,13 @@ sched_init (void)
       targetm.sched.set_sched_flags (spec_info);
 
       if (spec_info->mask != 0)
-	spec_info->weakness_cutoff =
-	  (PARAM_VALUE (PARAM_SCHED_SPEC_PROB_CUTOFF) * MAX_DEP_WEAK) / 100;
+        {
+          spec_info->data_weakness_cutoff =
+            (PARAM_VALUE (PARAM_SCHED_SPEC_PROB_CUTOFF) * MAX_DEP_WEAK) / 100;
+          spec_info->control_weakness_cutoff =
+            (PARAM_VALUE (PARAM_SCHED_SPEC_PROB_CUTOFF)
+             * REG_BR_PROB_BASE) / 100;
+        }
       else
 	/* So we won't read anything accidentally.  */
 	spec_info = NULL;
@@ -3132,7 +3137,7 @@ try_ready (rtx next)
 		*ts = ds_merge (*ts, ds);
 	    }
 
-	  if (dep_weak (*ts) < spec_info->weakness_cutoff)
+	  if (dep_weak (*ts) < spec_info->data_weakness_cutoff)
 	    /* Too few points.  */
 	    *ts = (*ts & ~SPECULATIVE) | HARD_DEP;
 	}
Index: gcc/sel-sched.c
===================================================================
--- gcc/sel-sched.c	(revision 131181)
+++ gcc/sel-sched.c	(working copy)
@@ -51,6 +51,7 @@
 #include "vec.h"
 #include "langhooks.h"
 #include "rtlhooks-def.h"
+#include "output.h"
 
 #ifdef INSN_SCHEDULING
 #include "sel-sched-ir.h"
@@ -211,7 +212,7 @@ static bool rtx_search (rtx, rtx);
 static int sel_rank_for_schedule (const void *, const void *);
 static bool equal_after_moveup_path_p (rhs_t, ilist_t, rhs_t);
 static regset compute_live (insn_t);
-static void generate_bookkeeping_insn (rhs_t, insn_t, edge, edge);
+static basic_block generate_bookkeeping_insn (rhs_t, insn_t, edge, edge);
 static bool find_used_regs (insn_t, av_set_t, regset, HARD_REG_SET *, 
                             def_list_t *);
 static bool move_op (insn_t, av_set_t, ilist_t, edge, edge, expr_t);
@@ -291,7 +292,6 @@ extract_new_fences_from (fence_t fence, 
                        || (!flag_sel_sched_reset_tc_on_join
                            && in_fallthru_bb_p (insn, succ));
 
-
 	      print ("%d[%d] (state %s); ", INSN_UID (succ),
 		     BLOCK_NUM (succ), b ? "continue" : "reset");
 
@@ -1241,7 +1241,7 @@ choose_best_pseudo_reg (regset used_regs
      - RHS_SCHEDULE_AS_RHS is false but the insn sets/clobbers one of
        the registers that are used on the moving path.  */
 static bool
-find_best_reg_for_rhs (rhs_t rhs, blist_t bnds)
+find_best_reg_for_rhs (rhs_t rhs, blist_t bnds, bool *is_orig_reg_p)
 {
   av_set_iterator i2;
   rhs_t rhs_orig;
@@ -1252,7 +1252,8 @@ find_best_reg_for_rhs (rhs_t rhs, blist_
   def_list_t original_insns = NULL;
   int res = 0;
   bool reg_ok = true;
-  bool is_orig_reg_p = false;
+
+  *is_orig_reg_p = false;
 
   /* Don't bother to do anything if this insn doesn't set any registers.  */
   if (bitmap_empty_p (VINSN_REG_SETS (EXPR_VINSN (rhs))))
@@ -1330,15 +1331,15 @@ find_best_reg_for_rhs (rhs_t rhs, blist_
                  restrictions and live range intersection.  */
               IOR_HARD_REG_SET (hard_regs_used, unavailable_hard_regs);
               best_reg = choose_best_reg (hard_regs_used, original_insns,
-					  &is_orig_reg_p);
+					  is_orig_reg_p);
             }
           else
             best_reg = choose_best_pseudo_reg (used_regs, 
                                                unavailable_hard_regs, 
                                                original_insns,
-					       &is_orig_reg_p);
+					       is_orig_reg_p);
 
-	  if (!is_orig_reg_p && sel_vinsn_cost (EXPR_VINSN (rhs)) < 2)
+	  if (!*is_orig_reg_p && sel_vinsn_cost (EXPR_VINSN (rhs)) < 2)
 	    best_reg = NULL_RTX;
 
 	  if (best_reg != NULL_RTX)
@@ -1439,7 +1440,7 @@ can_overcome_dep_p (ds_t ds)
       return false;
   }
 
-  if (ds_weak (ds) < spec_info->weakness_cutoff)
+  if (ds_weak (ds) < spec_info->data_weakness_cutoff)
     return false;
 
   return true;
@@ -1979,7 +1980,10 @@ moveup_set_rhs (av_set_t *avp, insn_t in
 	       remove it.  */
 	    if (rhs2 != NULL)
 	      {
+                EXPR_USEFULNESS (rhs2) = 0;
 		merge_expr (rhs2, rhs);
+                /* Fix usefulness, as it should now be REG_BR_PROB_BASE.  */
+                EXPR_USEFULNESS (rhs2) = REG_BR_PROB_BASE;
 
 		av_set_iter_remove (&i);
 		print (" and removed.");
@@ -2110,7 +2114,8 @@ compute_av_set (insn_t insn, ilist_t p, 
   av_set_t rhs_in_all_succ_branches;
   int succs_n, real_succs_n;
   insn_t *succs;
-  int succ;
+  int *probs;
+  int succ, all_prob;
 
   line_start ();
   print ("compute_av_set");
@@ -2159,7 +2164,11 @@ compute_av_set (insn_t insn, ilist_t p, 
       return NULL;
     }
 
-  cfg_succs (insn, &succs, &succs_n);
+  cfg_succs (insn, &succs, &probs, &succs_n);
+
+  /* Sometimes there are weird cases when the sum of the probabilities of
+     outgoing edges is greater than REG_BR_PROB_BASE.  */
+  all_prob = overall_prob_of_succs (insn);
 
   /* If there are successors that lead out of the region, then all rhses from
      the below av_sets should be speculative.  */
@@ -2170,6 +2179,12 @@ compute_av_set (insn_t insn, ilist_t p, 
   print ("successors (%d): ", INSN_UID (insn));
   dump_insn_array (succs, succs_n);
   line_finish ();
+  if (succs_n != real_succs_n)
+    {
+      line_start ();
+      print ("real successors num: %d", real_succs_n);
+      line_finish ();
+    }
 
   /* Add insn to to the tail of current path.  */
   ilist_add (&p, insn);
@@ -2189,6 +2204,7 @@ compute_av_set (insn_t insn, ilist_t p, 
 
       /* We will edit SUCC_SET and RHS_SPEC field of its elements.  */
       succ_set = compute_av_set (succs[succ], p, ws + 1, true);
+      av_set_split_usefulness (&succ_set, probs[succ], all_prob);
 
       if (real_succs_n > 1)
 	{
@@ -2237,6 +2253,7 @@ compute_av_set (insn_t insn, ilist_t p, 
     }
   
   free (succs);
+  free (probs);
   ilist_remove (&p);
 
   line_start ();
@@ -2910,6 +2927,66 @@ sel_rank_for_schedule (const void *x, co
   return INSN_UID (tmp_insn) - INSN_UID (tmp2_insn);
 }
 
+/* Check if av-set AV contains an RHS corresponding to a jump that ends
+   the loop.  */
+static bool
+end_of_loop_p (av_set_t av)
+{
+  /*basic_block loop_begin;*/
+  rhs_t rhs;
+  av_set_iterator si;
+
+  /*loop_begin = sel_is_loop_preheader_p (EBB_FIRST_BB (0)) ? EBB_FIRST_BB (1)
+                                                          : EBB_FIRST_BB (0);
+  gcc_assert (sel_is_loop_preheader_p (EBB_FIRST_BB (0)));*/
+  FOR_EACH_RHS_1 (rhs, si, &av)
+    {
+      insn_t insn = EXPR_INSN_RTX (rhs);
+
+      /* Jumps must return to the first basic block of the region.  */
+      if (JUMP_P (insn)
+          && JUMP_LABEL (insn)
+          && BLOCK_FOR_INSN (JUMP_LABEL (insn)) == EBB_FIRST_BB (1))
+        return true;
+    }
+
+  return false;
+}
+
+/* While pipelining and scheduling end of loop, checks if instruction RHS
+   will be stalled at the beginning of loop if scheduled now.
+   Example:
+
+   ORIGINAL_LOOP:
+         load f8=[r1];
+         ...
+         <scheduling fence here>
+         if (cc0) jump ORIGINAL_LOOP
+         ...
+
+   MODIFIED_LOOP:
+         check.spec f8;
+         ...
+         load.spec f8=[r1]
+         if (cc0) jump MODIFIED_LOOP
+         ...
+
+  Imagine that when scheduling the original loop we can pipeline the load
+  from its beginning.  But if we do so, the load's result will not be ready
+  when the check operation executes, and execution will stall.  This can happen with
+  speculation instructions (which leave checks in the original place of moved
+  instruction) or renamed instructions (which leave renaming instruction in the
+  original place of moved instruction).  */
+static bool
+check_stalling_p (rhs_t rhs, bool is_orig_reg_p)
+{
+  if (EXPR_ORIG_SCHED_CYCLE (rhs) != 0
+      && (!is_orig_reg_p || EXPR_SPEC_DONE_DS (rhs) != 0)
+      && EXPR_ORIG_SCHED_CYCLE (rhs) <= insn_cost (EXPR_INSN_RTX (rhs)))
+    return true;
+  return false;
+}
+
 /* Filter out expressions that are pipelined too much.  */
 static void
 process_pipelined_exprs (av_set_t *av_ptr)
@@ -2949,7 +3026,8 @@ process_spec_exprs (av_set_t *av_ptr)
 
       /* The probability of a success is too low - don't speculate.  */
       if ((ds & SPECULATIVE)
-          && ds_weak (ds) < spec_info->weakness_cutoff)
+          && (ds_weak (ds) < spec_info->data_weakness_cutoff
+              || EXPR_USEFULNESS (rhs) < spec_info->control_weakness_cutoff))
         {
           av_set_iter_remove (&si);
           continue;
@@ -2996,6 +3074,7 @@ process_use_exprs (av_set_t *av_ptr, bli
   av_set_iterator si;
   bool uses_present_p = false;
   bool try_uses_p = true;
+  bool is_orig_reg = false;
 
   FOR_EACH_RHS_1 (rhs, si, av_ptr)
     {
@@ -3006,7 +3085,7 @@ process_use_exprs (av_set_t *av_ptr, bli
              do so because it will do good only.  */
           if (EXPR_SCHED_TIMES (rhs) <= 0)
             {
-              if (find_best_reg_for_rhs (rhs, bnds))
+              if (find_best_reg_for_rhs (rhs, bnds, &is_orig_reg))
                 return rhs;
 
               av_set_iter_remove (&si);
@@ -3038,7 +3117,7 @@ process_use_exprs (av_set_t *av_ptr, bli
             {
               gcc_assert (INSN_CODE (EXPR_INSN_RTX (rhs)) < 0);
 
-              if (find_best_reg_for_rhs (rhs, bnds))
+              if (find_best_reg_for_rhs (rhs, bnds, &is_orig_reg))
                 return rhs;
 
               av_set_iter_remove (&si);
@@ -3058,6 +3137,7 @@ fill_vec_av_set (av_set_t av, blist_t bn
   rhs_t rhs;
   int sched_next_worked = 0, stalled, n;
   deps_t dc = BND_DC (BLIST_BND (bnds));
+  bool av_contain_end_of_loop_p;
 
   /* Bail out early when the ready list contained only USEs/CLOBBERs that are
      already scheduled.  */
@@ -3068,6 +3148,9 @@ fill_vec_av_set (av_set_t av, blist_t bn
   if (VEC_length (rhs_t, vec_av_set) > 0)
     VEC_block_remove (rhs_t, vec_av_set, 0, VEC_length (rhs_t, vec_av_set));
 
+  /* We are interested in knowing if it is a loop end if pipelining is on.  */
+  av_contain_end_of_loop_p = pipelining_p && end_of_loop_p (av);
+
   /* Turn the set into a vector for sorting.  */
   gcc_assert (VEC_empty (rhs_t, vec_av_set));
   FOR_EACH_RHS (rhs, si, av)
@@ -3077,6 +3160,7 @@ fill_vec_av_set (av_set_t av, blist_t bn
   for (n = 0, stalled = 0; VEC_iterate (rhs_t, vec_av_set, n, rhs); )
     {
       insn_t insn = EXPR_INSN_RTX (rhs);
+      bool is_orig_reg_p = true;
 
       /* Don't allow any insns other than from SCHED_GROUP if we have one.  */
       if (FENCE_SCHED_NEXT (fence) && insn != FENCE_SCHED_NEXT (fence))
@@ -3094,13 +3178,21 @@ fill_vec_av_set (av_set_t av, blist_t bn
 
       /* Check all liveness requirements and try renaming.  
          FIXME: try to minimize calls to this.  */
-      if (! find_best_reg_for_rhs (rhs, bnds))
+      if (! find_best_reg_for_rhs (rhs, bnds, &is_orig_reg_p))
         {
           VEC_unordered_remove (rhs_t, vec_av_set, n);
           print ("- no reg; ");
           continue;
         }
 
+      if (av_contain_end_of_loop_p
+          && check_stalling_p (rhs, is_orig_reg_p))
+        {
+          VEC_unordered_remove (rhs_t, vec_av_set, n);
+          print ("- will cause stall; ");
+          continue;
+        }
+
       /* Don't allow any insns whose data is not yet ready.  */
       if (! tick_check_p (rhs, dc, fence))
 	{
@@ -3213,7 +3305,7 @@ fill_ready_list (av_set_t *av_ptr, blist
       ready.n_ready = 0;
       return NULL;
     }
-  
+
   /* Build the final ready list.  */
   convert_vec_av_set_to_ready ();
 
@@ -3595,7 +3687,7 @@ gen_insn_from_expr_after (expr_t expr, i
    the upper bb, redirecting all other paths to the lower bb and returns the
    newly created bb, which is the lower bb. 
    All scheduler data is initialized for the newly created insn.  */
-static void
+static basic_block
 generate_bookkeeping_insn (rhs_t c_rhs, insn_t join_point, edge e1, edge e2)
 {
   basic_block src, bb = e2->dest;
@@ -3606,6 +3698,8 @@ generate_bookkeeping_insn (rhs_t c_rhs, 
   basic_block empty_bb = e1->dest;
   int new_seqno = INSN_SEQNO (join_point);
   basic_block other_block = NULL;
+  bool need_to_exchange_data_sets = false;
+  insn_t new_insn;
 
   /* Find a basic block that can hold bookkeeping.  If it can be found, do not
      create new basic block, but insert bookkeeping there.  */
@@ -3662,7 +3756,9 @@ generate_bookkeeping_insn (rhs_t c_rhs, 
 
   /* Explore, if we can insert bookkeeping into OTHER_BLOCK in case edge
      OTHER_BLOCK -> BB is fallthrough, meaning there is no jump there.  */
-  if (EDGE_COUNT (bb->preds) == 2 && other_block)
+  if (EDGE_COUNT (bb->preds) == 2
+      && other_block
+      && in_current_region_p (other_block))
     {
       /* SRC is the block, in which we possibly can insert bookkeeping insn
          without creating new basic block.  It is the other (than E2->SRC)
@@ -3671,7 +3767,6 @@ generate_bookkeeping_insn (rhs_t c_rhs, 
 
       /* Instruction, after which we would try to insert bookkeeping insn.  */
       src_end = BB_END (src);
-      gcc_assert (in_current_region_p (src));
 
       if (INSN_P (src_end))
 	{
@@ -3704,7 +3799,9 @@ generate_bookkeeping_insn (rhs_t c_rhs, 
 
       /* Explore, if we can insert bookkeeping into OTHER_BLOCK in case edge
          OTHER_BLOCK -> BB is not fallthrough, meaning there is jump there.  */
-      if (other_block && EDGE_COUNT (other_block->succs) == 1
+      if (other_block
+          && in_current_region_p (other_block)
+          && EDGE_COUNT (other_block->succs) == 1
           && (e1->flags & EDGE_FALLTHRU))
         {
           insn_t src_begin;
@@ -3717,10 +3814,7 @@ generate_bookkeeping_insn (rhs_t c_rhs, 
               INSN_SCHED_TIMES (src_end) > 0
               /* This is a floating bb header.  */
               || (src_end == src_begin
-                  && EDGE_COUNT (other_block->preds) == 1
-                  && INSN_P (BB_END (EDGE_I (other_block->preds, 0)->src))
-                  && INSN_SCHED_TIMES (BB_END (EDGE_I (other_block->preds, 
-                                                       0)->src)) > 0))
+                  && IN_CURRENT_FENCE_P (src_end)))
             new_bb = NULL;
           else
             {
@@ -3779,7 +3873,9 @@ generate_bookkeeping_insn (rhs_t c_rhs, 
     rtx new_insn_rtx;
     vinsn_t new_vinsn;
     expr_def _new_expr, *new_expr = &_new_expr;
-    insn_t new_insn;
+
+    need_to_exchange_data_sets
+      = sel_bb_empty_p (BLOCK_FOR_INSN (place_to_insert));
 
     new_insn_rtx = create_copy_of_insn_rtx (EXPR_INSN_RTX (c_rhs));
     new_vinsn = create_vinsn_from_insn_rtx (new_insn_rtx);
@@ -3787,7 +3883,35 @@ generate_bookkeeping_insn (rhs_t c_rhs, 
     change_vinsn_in_expr (new_expr, new_vinsn);
 
     new_insn = gen_insn_from_expr_after (new_expr, new_seqno, place_to_insert);
-    INSN_SCHED_TIMES (new_insn) = 0;
+
+    /* When inserting bookkeeping insn in new block, av sets should be
+       following: old basic block (that now holds bookkeeping) data sets are
+       the same as was before generation of bookkeeping, and new basic block
+       (that now hold all other insns of old basic block) data sets are
+       invalid.  So exchange data sets for these basic blocks as sel_split_block
+       mistakenly exchanges them in this case.  Cannot do it earlier because
+       when single instruction is added to new basic block it should hold NULL
+       lv_set.  */
+    if (need_to_exchange_data_sets)
+      exchange_data_sets (BLOCK_FOR_INSN (new_insn),
+                          BLOCK_FOR_INSN (join_point));
+
+    /* Not obvious.  Set sched times of bookkeeping to sched times of join
+       point if join point is not header of loop while pipelining (in this
+       case set it to zero).  This is done to correctly handle inserting of
+       bookkeeping in already scheduled code: when bookkeeping is inserted in
+       code not yet scheduled (including preheader when pipelining) it will
+       receive zero sched times (as join point is not scheduled)
+       and when bookkeeping is inserted in scheduled code there will not be a
+       gap of sched times in scheduled code, so is_ineligible_successor_p of
+       path going through bookkeeping will not say that join point is
+       ineligible.  */
+    INSN_SCHED_TIMES (new_insn) =
+      (pipelining_p
+       && ((flag_sel_sched_pipelining_outer_loops
+           && join_point == NEXT_INSN (bb_note (EBB_FIRST_BB (1))))
+	   || (join_point == NEXT_INSN (bb_note (EBB_FIRST_BB (0))))))
+        ? 0 : INSN_SCHED_TIMES (join_point);
 
     clear_expr (new_expr);
 
@@ -3800,6 +3924,7 @@ generate_bookkeeping_insn (rhs_t c_rhs, 
   }
 
   stat_bookkeeping_copies++;
+  return BLOCK_FOR_INSN (new_insn);
 }
 
 /* Remove from AV_PTR all insns that may need bookkeeping when scheduling 
@@ -4227,6 +4352,7 @@ fill_insns (fence_t fence, int seqno, il
 
 		  /* Split block to generate a new floating bb header.  */
 		  bb = sched_split_block (bb, place_to_insert);
+                  copy_data_sets (bb, prev_bb);
 		}
 	      else
 		{
@@ -4299,7 +4425,7 @@ fill_insns (fence_t fence, int seqno, il
 	    clear_expr (c_rhs);
 
 	    ++INSN_SCHED_TIMES (insn);
-
+            EXPR_ORIG_SCHED_CYCLE (INSN_EXPR (insn)) = fence->cycle;
 	    if (INSN_NOP_P (place_to_insert))
 	      /* Return the nop generated for preserving of data sets back
 		 into pool.  */
@@ -4392,6 +4518,7 @@ fill_insns (fence_t fence, int seqno, il
 	  /* Check that the recent movement didn't destroyed loop
 	     structure.  */
 	  gcc_assert (!flag_sel_sched_pipelining_outer_loops
+                      || !pipelining_p
 		      || current_loop_nest == NULL
 		      || loop_latch_edge (current_loop_nest));
         }
@@ -4415,7 +4542,7 @@ fill_insns (fence_t fence, int seqno, il
 	 as this will bring two boundaries and, hence, necessity to handle
 	 information for two or more blocks concurrently.  */
       if (insn == BB_END (BLOCK_FOR_INSN (insn)))
-	  break;
+        break;
 
       /* !!! This is a possible perfomance regression as we schedule one
 	 instruction at a time because of floating bb headers.  We need to
@@ -4501,6 +4628,7 @@ move_op (insn_t insn, av_set_t orig_ops,
   bool c_rhs_inited_p;
   rtx dest;
   bool generated_nop_p = false;
+  basic_block book_block = NULL;
   
   line_start ();
   print ("move_op(");
@@ -4521,6 +4649,9 @@ move_op (insn_t insn, av_set_t orig_ops,
 
   orig_ops = av_set_copy (orig_ops);
 
+  if (sel_bb_head_p (insn))
+    gcc_assert (AV_SET_VALID_P (insn));
+
   /* If we've found valid av set, then filter the orig_ops set.  */
   if (AV_SET_VALID_P (insn))
     {
@@ -4669,6 +4800,7 @@ move_op (insn_t insn, av_set_t orig_ops,
 	     operation is not only excessive, but it may not be supported 
 	     on certain platforms, e.g. "mov si, si" is invalid on i386.  */
 	  sel_remove_insn (insn);
+
 	  insn = x;
 	}
       }
@@ -4754,8 +4886,11 @@ move_op (insn_t insn, av_set_t orig_ops,
                      SCHED_TIMES to the maximum instead of minimum in the 
                      below function.  */
                   int old_times = EXPR_SCHED_TIMES (c_rhs);
+                  int use = MAX (EXPR_USEFULNESS (c_rhs), EXPR_USEFULNESS (x));
 
+                  gcc_assert (EXPR_USEFULNESS (c_rhs) == EXPR_USEFULNESS (x));
                   merge_expr_data (c_rhs, x);
+                  EXPR_USEFULNESS (c_rhs) = use;
                   if (EXPR_SCHED_TIMES (x) == 0)
                     EXPR_SCHED_TIMES (c_rhs) = old_times;
                 }
@@ -4780,7 +4915,7 @@ move_op (insn_t insn, av_set_t orig_ops,
   if (e1 && sel_num_cfg_preds_gt_1 (insn))
     {
       /* INSN is a joint point, insert bookkeeping code here.  */
-      generate_bookkeeping_insn (c_rhs, insn, e1, e2);
+      book_block = generate_bookkeeping_insn (c_rhs, insn, e1, e2);
       gcc_assert (sel_bb_head_p (insn));
     }
 
@@ -4789,11 +4924,98 @@ move_op (insn_t insn, av_set_t orig_ops,
   else
     gcc_assert (AV_LEVEL (insn) == INSN_WS_LEVEL (insn));
 
+  /* If bookkeeping code was inserted, we need to update the av sets of the
+     basic block that received bookkeeping.  This should have minor impact on
+     performance, as a valid av set is found in the next basic block.
+     In fact, after generation of the bookkeeping insn, the bookkeeping block
+     does not contain a valid av set.  This happens because we do not follow
+     Moon's algorithm in every detail, for efficiency reasons.  The one
+     point in this implementation that affects the av sets of the bookkeeping
+     block is that simple move instructions (i.e. "r1 := r2") are not
+     considered rhs-able instructions.  Consider the example:
+
+     bookkeeping block           scheduling fence
+                   \              /
+                    \    join    /
+                      ----------
+                      |        |
+                      ----------
+                    /            \
+                   /              \
+              r1 := r2          r1 := r3
+
+     Imagine, we try to schedule insn "r1 := r3" on the current 
+     scheduling fence.  Also, note that av set of bookkeeping block
+     contain both insns "r1 := r2" and "r1 := r3".  When the insn has
+     been scheduled, CFG is as following:
+
+            r1 := r3               r1 := r3
+     bookkeeping block           scheduling fence
+                   \              /
+                    \    join    /
+                      ----------
+                      |        |
+                      ----------
+                    /            \
+                   /              \
+              r1 := r2
+
+     Here, insn "r1 := r3" was scheduled at the current scheduling point
+     and bookkeeping code was generated at the bookeeping block.  This
+     way insn "r1 := r2" is no longer available as a whole instruction
+     (but only as rhs) ahead of insn "r1 := r3" in bookkeeping block.
+     This situation is handled by calling update_data_sets.  */
+
+  if (book_block)
+    update_data_sets (sel_bb_head (book_block));
+
   /* If INSN was previously marked for deletion, it's time to do it.  */
   if (generated_nop_p)
     {
+      basic_block xbb = BLOCK_FOR_INSN (insn);
+
       gcc_assert (INSN_NOP_P (insn));
 
+      /* Check if there is an unnecessary jump left after the insn.  */
+      if (jump_leads_only_to_bb_p (BB_END (xbb), xbb->next_bb))
+        {
+          sel_remove_insn (BB_END (xbb));
+          tidy_fallthru_edge (EDGE_SUCC (xbb, 0));
+        }
+
+      /* Check if there is an unnecessary jump in previous basic block leading
+         to next basic block left after removing INSN from stream.  
+         If it is so, remove that jump and redirect edge to current basic block
+         (where there was INSN before deletion).  This way when NOP will be 
+         deleted several instructions later with its basic block we will not 
+         get a jump to next instruction, which can be harmful.  */
+      if (/* INSN (nop) is the only insn in its bb.  */
+          NEXT_INSN (bb_note (xbb)) == BB_END (xbb)
+          /* Flow goes fallthru from current block to the next.  */
+          && EDGE_COUNT (xbb->succs) == 1
+          && (EDGE_SUCC (xbb, 0)->flags & EDGE_FALLTHRU)
+          /* And unconditional jump in previous basic block leads to
+             next basic block of XBB and this jump can be safely removed.  */
+          && in_current_region_p (xbb->prev_bb)
+          && jump_leads_only_to_bb_p (BB_END (xbb->prev_bb), xbb->next_bb)
+          /* Also this jump is not at the scheduling boundary.  */
+          && !IN_CURRENT_FENCE_P (BB_END (xbb->prev_bb)))
+        {
+          /* Clear data structures of jump - jump itself will be removed
+             by sel_redirect_edge_and_branch.  */
+          clear_expr (INSN_EXPR (BB_END (xbb->prev_bb)));
+          sel_redirect_edge_and_branch (EDGE_SUCC (xbb->prev_bb, 0), xbb);
+          gcc_assert (EDGE_SUCC (xbb->prev_bb, 0)->flags & EDGE_FALLTHRU);
+          /* It can turn out that after removing unused jump, basic block
+             that contained that jump, becomes empty too.  In such case
+             remove it too.  */
+          if (sel_bb_empty_p (xbb->prev_bb))
+            {
+              free_data_sets (xbb->prev_bb);
+              sel_remove_empty_bb (xbb->prev_bb, false, true);
+            }
+        }
+
       return_nop_to_pool (insn);
     }
 
@@ -4893,61 +5115,6 @@ add_region_head (void)
     }
 }
 
-/* Split all edges incoming to current region, but not those that 
-   come to loop header, and not those that come to preheader.  */
-static void
-split_edges_incoming_to_rgn (void)
-{
-  int i;
-  int cur_rgn = CONTAINING_RGN (BB_TO_BLOCK (0));
-  edge e;
-  VEC(edge, heap) *edges_to_split = NULL;
-
-  if (!current_loop_nest)
-    return;
-
-  for (i = 0; i < RGN_NR_BLOCKS (cur_rgn); i++)
-    {
-      edge_iterator ei;
-      basic_block bb;
-      bool has_preds_in_rgn;
-
-      bb = BASIC_BLOCK (BB_TO_BLOCK (i));
-
-      /* Skip header, preheaders, and single pred blocks.  */
-      if (bb == current_loop_nest->header)
-        continue;
-      if ((unsigned) bb->loop_depth < loop_depth (current_loop_nest))
-        continue;
-      if (EDGE_COUNT (bb->preds) < 2)
-        continue;
-
-      /* Skip also blocks that don't have preds in the region.  */
-      has_preds_in_rgn = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-        if (in_current_region_p (e->src))
-          {
-            has_preds_in_rgn = true;
-            break;
-          }
-      if (!has_preds_in_rgn)
-        continue;
-
-      /* Record all edges we need to split.  */
-      FOR_EACH_EDGE (e, ei, bb->preds)
-        if (!in_current_region_p (e->src))
-          VEC_safe_push (edge, heap, edges_to_split, e);
-    }
-  
-  for (i = 0; VEC_iterate (edge, edges_to_split, i, e); i++)
-    /* Some of edges could be already redirected by previous splits.
-       So check this again before calling sel_split_edge.  */
-    if (!in_current_region_p (e->src))
-      sel_split_edge (e);
-
-  VEC_free (edge, heap, edges_to_split);
-}
-
 /* Init scheduling data for RGN.  Returns true when this region should not 
    be scheduled.  */
 static bool
@@ -4970,15 +5137,21 @@ sel_region_init (int rgn)
   if (flag_sel_sched_pipelining_outer_loops)
     {
       current_loop_nest = get_loop_nest_for_rgn (rgn);
-  
+
       if (current_loop_nest 
 	  && LOOP_PREHEADER_BLOCKS (current_loop_nest))
         {
           sel_add_loop_preheader ();
-          
+
+          /* Check that we're starting with valid information.  */
           gcc_assert (loop_latch_edge (current_loop_nest));
         }
+
+      if (current_loop_nest)
+        {
+          gcc_assert (LOOP_MARKED_FOR_PIPELINING_P (current_loop_nest));
+          MARK_LOOP_FOR_PIPELINING (current_loop_nest);
+        }
     }
 
   bbs = VEC_alloc (basic_block, heap, current_nr_blocks);
@@ -5051,12 +5224,7 @@ sel_region_init (int rgn)
 	 already created with loop optimizer, so if current region
 	 has a corresponding loop nest, we should pipeline it.  */
       if (flag_sel_sched_pipelining_outer_loops)
-	{
-	  pipelining_p = (current_loop_nest != NULL);
-
-	  if (pipelining_p)
-	    split_edges_incoming_to_rgn ();
-	}
+	pipelining_p = (current_loop_nest != NULL);
       else
 	pipelining_p = add_region_head ();
     }
@@ -5640,19 +5808,67 @@ sel_sched_region_1 (void)
                     }
                 }
             }
-	}
+        }
       else
         {
-          basic_block loop_entry;
+          basic_block loop_entry, loop_preheader = EBB_FIRST_BB (0);
 
           /* Schedule region pre-header first, if not pipelining 
              outer loops.  */
           bb = EBB_FIRST_BB (0);
           head = sel_bb_head (bb);
+          loop_entry = EBB_FIRST_BB (1);
           
-          if (sel_is_loop_preheader_p (bb))          
-            /* Don't leave old flags on insns in bb.  */
-            clear_outdated_rtx_info (bb);
+          /* Don't leave old flags on insns in loop preheader.  */
+          if (sel_is_loop_preheader_p (loop_preheader))          
+            {
+              basic_block prev_bb = loop_preheader->prev_bb;
+
+              /* If...  */
+              if (/* Preheader is empty;  */
+                  sel_bb_empty_p (loop_preheader)
+                  /* Block before preheader is in current region and
+                     contains only unconditional jump to header.  */
+                  && in_current_region_p (prev_bb)
+                  && NEXT_INSN (bb_note (prev_bb)) == BB_END (prev_bb)
+                  && jump_leads_only_to_bb_p (BB_END (prev_bb), 
+                                              loop_preheader->next_bb))
+                {
+                  /* Then remove the empty preheader and the unnecessary jump 
+                     from the block before the preheader (usually the latch).  */
+
+                  if (current_loop_nest->latch == prev_bb)
+                    current_loop_nest->latch = NULL;
+
+                  /* Remove the latch's jump.  */
+                  clear_expr (INSN_EXPR (BB_END (prev_bb)));
+                  sel_redirect_edge_and_branch (EDGE_SUCC (prev_bb, 0),
+                                                loop_preheader);
+
+                  /* Correct the header if it was wrongly moved into the preheader.  */
+                  if (current_loop_nest->header == loop_preheader)
+                    current_loop_nest->header = loop_preheader->next_bb;
+
+                  gcc_assert (EDGE_SUCC (prev_bb, 0)->flags & EDGE_FALLTHRU);
+
+                  /* Empty basic blocks should not have av and lv sets.  */
+                  free_data_sets (prev_bb);
+
+                  gcc_assert (BB_LV_SET (loop_preheader) == NULL
+                              && BB_AV_SET (loop_preheader) == NULL);
+                  gcc_assert (sel_bb_empty_p (loop_preheader)
+                              && sel_bb_empty_p (prev_bb));
+
+                  sel_remove_empty_bb (prev_bb, false, true);
+                  sel_remove_empty_bb (loop_preheader, false, true);
+                  preheader_removed = true;
+                  loop_preheader = NULL;
+                }
+
+              /* If the preheader was not deleted.  */
+              if (loop_preheader)
+                clear_outdated_rtx_info (loop_preheader);
+            }
           else if (head != NULL_RTX)
             {
               gcc_assert (INSN_SCHED_CYCLE (head) == 0);
@@ -5673,10 +5889,6 @@ sel_sched_region_1 (void)
             }
 
           /* Reschedule pipelined code without pipelining.  */
-          loop_entry = EBB_FIRST_BB (1);
-	  /* Please note that loop_header (not preheader) might not be in
-	     the current region.  Hence it is possible for loop_entry to have
-	     arbitrary number of predecessors.  */
 
           for (i = BLOCK_TO_BB (loop_entry->index); i < current_nr_blocks; i++)
             {
@@ -5696,16 +5908,16 @@ sel_sched_region_1 (void)
                    insn = NEXT_INSN (insn))
                 {
                   gcc_assert (INSN_P (insn));
-		  INSN_AFTER_STALL_P (insn) = 0;
+                  INSN_AFTER_STALL_P (insn) = 0;
                   INSN_SCHED_CYCLE (insn) = 0;
 
-		  /* ??? Should we reset those counters which reside in
-		     INSN_EXPR field (e.g. SPEC and SCHED_TIMES)?  */
-		  /* For now we do need to zero SCHED_TIMES because we don't
-		     want to skip dependencies from any instruction.  This
-		     will be a subject to consider when we implement better
-		     dependency tracking.  */
-		  INSN_SCHED_TIMES (insn) = 0;
+                  /* ??? Should we reset those counters which reside in
+                     INSN_EXPR field (e.g. SPEC and SCHED_TIMES)?  */
+                  /* For now we do need to zero SCHED_TIMES because we don't
+                     want to skip dependencies from any instruction.  This
+                     will be a subject to consider when we implement better
+                     dependency tracking.  */
+                  INSN_SCHED_TIMES (insn) = 0;
                 }
             }
 
@@ -5717,7 +5929,10 @@ sel_sched_region_1 (void)
 
           gcc_assert (fences == NULL);
 
-          init_fences (BB_END (EBB_FIRST_BB (0)));
+          if (loop_preheader)
+            init_fences (BB_END (loop_preheader));
+          else
+            init_fences (bb_note (loop_entry));
 
           sel_sched_region_2 (data);
         }
@@ -5733,6 +5948,8 @@ sel_sched_region (int rgn)
   if (sel_region_init (rgn))
     return;
 
+  gcc_assert (preheader_removed == false);
+
   sel_dump_cfg ("after-region-init");
 
   print ("sel_sched_region: start");
@@ -5764,6 +5981,7 @@ sel_sched_region (int rgn)
   }
 
   sel_region_finish ();
+  preheader_removed = false;
   
   sel_dump_cfg_1 ("after-region-finish",
 		  SEL_DUMP_CFG_CURRENT_REGION | SEL_DUMP_CFG_LV_SET
@@ -5787,14 +6005,14 @@ sel_global_init (void)
 
   init_sched_pools ();
 
-  if (flag_sel_sched_pipelining_outer_loops)
-    pipeline_outer_loops_init ();
-
   setup_sched_dump_to_stderr ();
 
   /* Setup the infos for sched_init.  */
   sel_setup_sched_infos ();
 
+  sched_rgn_init (flag_sel_sched_single_block_regions != 0,
+                  flag_sel_sched_ebb_regions != 0);
+
   sched_init ();
 
   /* Init lv_sets.  */
@@ -5810,9 +6028,6 @@ sel_global_init (void)
     VEC_free (basic_block, heap, bbs);
   }
 
-  sched_rgn_init (flag_sel_sched_single_block_regions != 0,
-                  flag_sel_sched_ebb_regions != 0);
-
   sched_extend_target ();
   sched_deps_init (true);
 
@@ -5845,7 +6060,7 @@ sel_global_finish (void)
 
   free_sel_dump_data ();
 
-  if (flag_sel_sched_pipelining_outer_loops)
+  if (flag_sel_sched_pipelining_outer_loops && current_loops)
     pipeline_outer_loops_finish ();
 
   free_sched_pools ();
Index: gcc/sel-sched-ir.c
===================================================================
--- gcc/sel-sched-ir.c	(revision 131180)
+++ gcc/sel-sched-ir.c	(working copy)
@@ -119,6 +119,10 @@ rtx nop_pattern = NULL_RTX;
 /* A special instruction that resides in EXIT_BLOCK.
    EXIT_INSN is successor of the insns that lead to EXIT_BLOCK.  */
 rtx exit_insn = NULL_RTX;
+
+/* TRUE if, while scheduling the current region (which is a loop), its 
+   preheader was removed.  */
+bool preheader_removed = false;
 
 
 /* Forward static declarations.  */
@@ -130,7 +134,6 @@ static void deps_init_id (idata_t, insn_
 static void cfg_preds (basic_block, insn_t **, int *);
 
 static void sel_add_or_remove_bb (basic_block, int);
-static void free_data_sets (basic_block);
 static void move_bb_info (basic_block, basic_block);
 static void remove_empty_bb (basic_block, bool);
 
@@ -595,6 +598,151 @@ init_fences (insn_t old_fence)
     }
 }
 
+/* Merges two fences (filling fields of fence F with resulting values) by the
+   following rules: 1) state, target context and last scheduled insn are
+   propagated from the fallthrough edge if it is available; 
+   2) deps context and cycle are propagated from the more probable edge;
+   3) all other fields are set to the corresponding constant values.  */
+static void
+merge_fences (fence_t f, insn_t insn,
+	      state_t state, deps_t dc, void *tc, rtx last_scheduled_insn, 
+	      rtx sched_next, int cycle, bool after_stall_p)
+{
+  insn_t last_scheduled_insn_old = FENCE_LAST_SCHEDULED_INSN (f);
+
+  gcc_assert (sel_bb_head_p (FENCE_INSN (f))
+              && !sched_next && !FENCE_SCHED_NEXT (f));
+
+  /* Check if we can decide from which path the fences came.  
+     If we can't (or don't want to), reset everything.  */
+  if (last_scheduled_insn == NULL
+      || last_scheduled_insn_old == NULL
+      /* This is a case when INSN is reachable on several paths from 
+         one insn (this can happen when pipelining of outer loops is on and 
+         there are two edges: one going around the inner loop and the other 
+         going right through it; in such a case just reset everything).  */
+      || last_scheduled_insn == last_scheduled_insn_old)
+    {
+      state_reset (FENCE_STATE (f));
+      state_free (state);
+  
+      reset_deps_context (FENCE_DC (f));
+      delete_deps_context (dc);
+  
+      reset_target_context (FENCE_TC (f), true);
+      delete_target_context (tc);
+
+      if (cycle > FENCE_CYCLE (f))
+        FENCE_CYCLE (f) = cycle;
+
+      FENCE_LAST_SCHEDULED_INSN (f) = NULL;
+    }
+  else
+    {
+      edge edge_old = NULL, edge_new = NULL;
+      edge candidate;
+      succ_iterator si;
+      insn_t succ;
+    
+      /* Find fallthrough edge.  */
+      gcc_assert (BLOCK_FOR_INSN (insn)->prev_bb);
+      candidate = find_fallthru_edge (BLOCK_FOR_INSN (insn)->prev_bb);
+
+      if (!candidate
+          || (candidate->src != BLOCK_FOR_INSN (last_scheduled_insn)
+              && candidate->src != BLOCK_FOR_INSN (last_scheduled_insn_old)))
+        {
+          /* No fallthrough edge leading to basic block of INSN.  */
+          state_reset (FENCE_STATE (f));
+          state_free (state);
+  
+          reset_target_context (FENCE_TC (f), true);
+          delete_target_context (tc);
+  
+          FENCE_LAST_SCHEDULED_INSN (f) = NULL;
+        }
+      else
+        if (candidate->src == BLOCK_FOR_INSN (last_scheduled_insn))
+          {
+            /* It would be weird if the same insn were the successor of 
+               several fallthrough edges.  */
+            gcc_assert (BLOCK_FOR_INSN (insn)->prev_bb
+                        != BLOCK_FOR_INSN (last_scheduled_insn_old));
+
+            state_free (FENCE_STATE (f));
+            FENCE_STATE (f) = state;
+
+            delete_target_context (FENCE_TC (f));
+            FENCE_TC (f) = tc;
+
+            FENCE_LAST_SCHEDULED_INSN (f) = last_scheduled_insn;
+          }
+        else
+          {
+            /* Leave STATE, TC and LAST_SCHEDULED_INSN fields untouched.  */
+            state_free (state);
+
+            delete_target_context (tc);
+
+            gcc_assert (BLOCK_FOR_INSN (insn)->prev_bb
+                        != BLOCK_FOR_INSN (last_scheduled_insn));
+          }
+
+        /* Find edge of first predecessor (last_scheduled_insn_old->insn).  */
+        FOR_EACH_SUCC_1 (succ, si, last_scheduled_insn_old,
+                         SUCCS_NORMAL | SUCCS_SKIP_TO_LOOP_EXITS)
+          {
+            if (succ == insn)
+              {
+                /* No same successor allowed from several edges.  */
+                gcc_assert (!edge_old);
+                edge_old = si.e1;
+              }
+          }
+        /* Find edge of second predecessor (last_scheduled_insn->insn).  */
+        FOR_EACH_SUCC_1 (succ, si, last_scheduled_insn,
+                         SUCCS_NORMAL | SUCCS_SKIP_TO_LOOP_EXITS)
+          {
+            if (succ == insn)
+              {
+                /* No same successor allowed from several edges.  */
+                gcc_assert (!edge_new);
+                edge_new = si.e1;
+              }
+          }
+
+        /* Check if we can choose the most probable predecessor.  */
+        if (edge_old == NULL || edge_new == NULL)
+          {
+            reset_deps_context (FENCE_DC (f));
+            delete_deps_context (dc);
+  
+            FENCE_CYCLE (f) = MAX (FENCE_CYCLE (f), cycle);
+          }
+        else
+          if (edge_new->probability > edge_old->probability)
+            {
+              delete_deps_context (FENCE_DC (f));
+              FENCE_DC (f) = dc;
+
+              FENCE_CYCLE (f) = cycle;
+            }
+          else
+            {
+              /* Leave DC and CYCLE untouched.  */
+              delete_deps_context (dc);
+            }
+    }
+
+  /* Fill remaining invariant fields.  */
+  if (after_stall_p)
+    FENCE_AFTER_STALL_P (f) = 1;
+
+  FENCE_ISSUED_INSNS (f) = 0;
+  FENCE_STARTS_CYCLE_P (f) = 1;
+  FENCE_SCHED_NEXT (f) = NULL;
+}
+
 /* Add a new fence to NEW_FENCES list, initializing it from all 
    other parameters.  */
 void
@@ -615,31 +763,10 @@ new_fences_add (flist_tail_t new_fences,
 	= &FLIST_NEXT (*FLIST_TAIL_TAILP (new_fences));
     }
   else
-    /* Here we should somehow choose between two DFA states.
-       Plain reset for now.  */
     {
-      gcc_assert (sel_bb_head_p (FENCE_INSN (f))
-		  && !sched_next && !FENCE_SCHED_NEXT (f));
-
-      state_reset (FENCE_STATE (f));
-      state_free (state);
-
-      reset_deps_context (FENCE_DC (f));
-      delete_deps_context (dc);
-
-      reset_target_context (FENCE_TC (f), true);
-      delete_target_context (tc);
-
-      if (cycle > FENCE_CYCLE (f))
-        FENCE_CYCLE (f) = cycle;
-
-      if (after_stall_p)
-        FENCE_AFTER_STALL_P (f) = 1;
+      merge_fences (f, insn, state, dc, tc, last_scheduled_insn, sched_next, 
+                    cycle, after_stall_p);
 
-      FENCE_ISSUED_INSNS (f) = 0;
-      FENCE_STARTS_CYCLE_P (f) = 1;
-      FENCE_LAST_SCHEDULED_INSN (f) = NULL;
-      FENCE_SCHED_NEXT (f) = NULL;
     }
 }
 
@@ -1532,17 +1659,20 @@ vinsns_correlate_as_rhses_p (vinsn_t x, 
 
 /* Initialize RHS.  */
 static void
-init_expr (expr_t expr, vinsn_t vi, int spec, int priority, int sched_times,
-	   int orig_bb_index, ds_t spec_done_ds, ds_t spec_to_check_ds,
+init_expr (expr_t expr, vinsn_t vi, int spec, int use, int priority,
+	   int sched_times, int orig_bb_index, ds_t spec_done_ds,
+	   ds_t spec_to_check_ds, int orig_sched_cycle,
 	   VEC(unsigned, heap) *changed_on, bool was_substituted)
 {
   vinsn_attach (vi);
 
   EXPR_VINSN (expr) = vi;
   EXPR_SPEC (expr) = spec;
+  EXPR_USEFULNESS (expr) = use;
   EXPR_PRIORITY (expr) = priority;
   EXPR_SCHED_TIMES (expr) = sched_times;
   EXPR_ORIG_BB_INDEX (expr) = orig_bb_index;
+  EXPR_ORIG_SCHED_CYCLE (expr) = orig_sched_cycle;
   EXPR_SPEC_DONE_DS (expr) = spec_done_ds;
   EXPR_SPEC_TO_CHECK_DS (expr) = spec_to_check_ds;
 
@@ -1561,10 +1691,11 @@ copy_expr (expr_t to, expr_t from)
   VEC(unsigned, heap) *temp;
 
   temp = VEC_copy (unsigned, heap, EXPR_CHANGED_ON_INSNS (from));
-  init_expr (to, EXPR_VINSN (from), EXPR_SPEC (from), EXPR_PRIORITY (from),
-	     EXPR_SCHED_TIMES (from), EXPR_ORIG_BB_INDEX (from),
-	     EXPR_SPEC_DONE_DS (from), EXPR_SPEC_TO_CHECK_DS (from), temp,
-	     EXPR_WAS_SUBSTITUTED (from));
+  init_expr (to, EXPR_VINSN (from), EXPR_SPEC (from), EXPR_USEFULNESS (from),
+	     EXPR_PRIORITY (from), EXPR_SCHED_TIMES (from),
+	     EXPR_ORIG_BB_INDEX (from), EXPR_SPEC_DONE_DS (from),
+	     EXPR_SPEC_TO_CHECK_DS (from), EXPR_ORIG_SCHED_CYCLE (from),
+	     temp, EXPR_WAS_SUBSTITUTED (from));
 }
 
 /* Same, but the final expr will not ever be in av sets, so don't copy 
@@ -1572,9 +1703,9 @@ copy_expr (expr_t to, expr_t from)
 void
 copy_expr_onside (expr_t to, expr_t from)
 {
-  init_expr (to, EXPR_VINSN (from), EXPR_SPEC (from), EXPR_PRIORITY (from),
-	     EXPR_SCHED_TIMES (from), 0,
-	     EXPR_SPEC_DONE_DS (from), EXPR_SPEC_TO_CHECK_DS (from), NULL,
+  init_expr (to, EXPR_VINSN (from), EXPR_SPEC (from), EXPR_USEFULNESS (from),
+	     EXPR_PRIORITY (from), EXPR_SCHED_TIMES (from), 0,
+	     EXPR_SPEC_DONE_DS (from), EXPR_SPEC_TO_CHECK_DS (from), 0, NULL,
 	     EXPR_WAS_SUBSTITUTED (from));
 }
 
@@ -1590,6 +1721,8 @@ merge_expr_data (expr_t to, expr_t from)
   if (RHS_SPEC (to) > RHS_SPEC (from))
     RHS_SPEC (to) = RHS_SPEC (from);
 
+  EXPR_USEFULNESS (to) += EXPR_USEFULNESS (from);
+
   if (RHS_PRIORITY (to) < RHS_PRIORITY (from))
     RHS_PRIORITY (to) = RHS_PRIORITY (from);
 
@@ -1599,6 +1732,9 @@ merge_expr_data (expr_t to, expr_t from)
   if (EXPR_ORIG_BB_INDEX (to) != EXPR_ORIG_BB_INDEX (from))
     EXPR_ORIG_BB_INDEX (to) = 0;
 
+  EXPR_ORIG_SCHED_CYCLE (to) = MIN (EXPR_ORIG_SCHED_CYCLE (to), 
+                                    EXPR_ORIG_SCHED_CYCLE (from));
+
   EXPR_SPEC_DONE_DS (to) = ds_max_merge (EXPR_SPEC_DONE_DS (to),
 					 EXPR_SPEC_DONE_DS (from));
 
@@ -1630,6 +1766,7 @@ merge_expr (expr_t to, expr_t from)
     change_vinsn_in_expr (to, EXPR_VINSN (from));
 
   merge_expr_data (to, from);
+  gcc_assert (EXPR_USEFULNESS (to) <= REG_BR_PROB_BASE);
 }
 
 /* Clear the information of this RHS.  */
@@ -1797,6 +1934,18 @@ av_set_substract_cond_branches (av_set_t
       av_set_iter_remove (&i);
 }
 
+/* Multiplies the usefulness attribute of each member of av-set *AVP by 
+   the value PROB / ALL_PROB.  */
+void
+av_set_split_usefulness (av_set_t *avp, int prob, int all_prob)
+{
+  av_set_iterator i;
+  expr_t expr;
+
+  FOR_EACH_RHS_1 (expr, i, avp)
+    EXPR_USEFULNESS (expr) = (EXPR_USEFULNESS (expr) * prob) / all_prob;
+}
+
 /* Leave in AVP only those expressions, which are present in AV,
    and return it.  */
 void
@@ -2157,8 +2306,8 @@ init_global_and_expr_for_insn (insn_t in
 
     /* Initialize INSN's expr.  */
     init_expr (INSN_EXPR (insn), vinsn_create (insn, force_unique_p), 0,
-	       INSN_PRIORITY (insn), 0, BLOCK_NUM (insn), spec_done_ds, 0,
-	       NULL, false);
+	       REG_BR_PROB_BASE, INSN_PRIORITY (insn), 0, BLOCK_NUM (insn),
+	       spec_done_ds, 0, 0, NULL, false);
   }
 
   init_first_time_insn_data (insn);
@@ -3097,6 +3246,7 @@ init_insn (insn_t insn)
 
   INSN_SEQNO (insn) = ssid->seqno;
   EXPR_ORIG_BB_INDEX (expr) = BLOCK_NUM (insn);
+  EXPR_ORIG_SCHED_CYCLE (expr) = 0;
 
   if (insn_init_create_new_vinsn_p)
     change_vinsn_in_expr (expr, vinsn_create (insn, init_insn_force_unique_p));
@@ -3110,8 +3260,8 @@ init_insn (insn_t insn)
 static void
 init_simplejump (insn_t insn)
 {
-  init_expr (INSN_EXPR (insn), vinsn_create (insn, false), 0, 0, 0, 
-             0, 0, 0, NULL, false);
+  init_expr (INSN_EXPR (insn), vinsn_create (insn, false), 0,
+	     REG_BR_PROB_BASE, 0, 0, 0, 0, 0, 0, NULL, false);
 
   INSN_SEQNO (insn) = get_seqno_of_a_pred (insn);
 
@@ -3277,6 +3427,8 @@ free_lv_sets (void)
   FOR_EACH_BB (bb)
     if (!sel_bb_empty_p (bb))
       free_lv_set (bb);
+    else
+      gcc_assert (BB_LV_SET (bb) == NULL);
 }
 
 /* Initialize an invalid LV_SET for BB.
@@ -3319,7 +3471,7 @@ free_av_set (basic_block bb)
 }
 
 /* Free data sets of BB.  */
-static void
+void
 free_data_sets (basic_block bb)
 {
   free_lv_set (bb);
@@ -3366,13 +3518,34 @@ exchange_av_sets (basic_block to, basic_
 }
 
 /* Exchange data sets of TO and FROM.  */
-static void
+void
 exchange_data_sets (basic_block to, basic_block from)
 {
   exchange_lv_sets (to, from);
   exchange_av_sets (to, from);
 }
 
+/* Copy data sets of FROM to TO.  */
+void
+copy_data_sets (basic_block to, basic_block from)
+{
+  gcc_assert (!BB_LV_SET_VALID_P (to) && !BB_AV_SET_VALID_P (to));
+  gcc_assert (BB_AV_SET (to) == NULL);
+
+  BB_AV_LEVEL (to) = BB_AV_LEVEL (from);
+  BB_LV_SET_VALID_P (to) = BB_LV_SET_VALID_P (from);
+
+  if (BB_AV_SET_VALID_P (from))
+    {
+      BB_AV_SET (to) = av_set_copy (BB_AV_SET (from));
+    }
+  if (BB_LV_SET_VALID_P (from))
+    {
+      gcc_assert (BB_LV_SET (to) != NULL);
+      COPY_REG_SET (BB_LV_SET (to), BB_LV_SET (from));
+    }
+}
+
 av_set_t
 get_av_set (insn_t insn)
 {
@@ -3610,13 +3783,37 @@ cfg_succs_1 (insn_t insn, int flags, ins
     (*succsp)[--n] = succ;
 }
 
+/* Same as above, but also fill the PROBS vector with the probabilities of
+   the corresponding successor edges of INSN.  */
+void
+cfg_succs_2 (insn_t insn, int flags, insn_t **succsp, int **probs, int *np)
+{
+  int n;
+  succ_iterator si;
+  insn_t succ;
+
+  n = *np = cfg_succs_n (insn, flags);
+
+  *succsp = xmalloc (n * sizeof (**succsp));
+  *probs = xmalloc (n * sizeof (**probs));
+
+  FOR_EACH_SUCC_1 (succ, si, insn, flags)
+    {
+      (*succsp)[--n] = succ;
+      (*probs)[n] = si.bb_end ? si.e1->probability 
+                                /* FIXME: Improve calculation when skipping 
+                                          inner loop to exits.  */
+                              : REG_BR_PROB_BASE;
+    }
+}
+
 /* Find all successors of INSN and record them in SUCCSP and their number 
    in NP.  Empty blocks are skipped, and only normal (forward in-region) 
    edges are processed.  */
 void
-cfg_succs (insn_t insn, insn_t **succsp, int *np)
+cfg_succs (insn_t insn, insn_t **succsp, int **probs, int *np)
 {
-  cfg_succs_1 (insn, SUCCS_NORMAL, succsp, np);
+  cfg_succs_2 (insn, SUCCS_NORMAL, succsp, probs, np);
 }
 
 /* Return the only successor of INSN, honoring FLAGS.  */
@@ -3643,6 +3840,34 @@ cfg_succ (insn_t insn)
   return cfg_succ_1 (insn, SUCCS_NORMAL);
 }
 
+/* Returns the sum of all probabilities of the successors of INSN (even
+   ineligible ones).  FIXME: Correct the calculation when skipping inner
+   loops while pipelining outer loops.  */
+int
+overall_prob_of_succs (insn_t insn)
+{
+  insn_t succ;
+  succ_iterator si;
+  int prob = 0;
+  bool b = false;
+  
+  FOR_EACH_SUCC_1 (succ, si, insn, SUCCS_ALL)
+    {
+      /* If INSN is not at the end of a basic block, it must have 
+         exactly one successor.  */
+      gcc_assert (!b);
+      if (!si.bb_end)
+        {
+          prob = REG_BR_PROB_BASE;
+          b = true;
+        }
+      else
+        prob += si.e1->probability;
+    }
+
+  return prob;
+}
+
 /* Return the predecessors of BB in PREDS and their number in N. 
    Empty blocks are skipped.  SIZE is used to allocate PREDS.  */
 static void
@@ -3704,22 +3929,7 @@ sel_num_cfg_preds_gt_1 (insn_t insn)
   while (1)
     {
       if (EDGE_COUNT (bb->preds) > 1)
-	{
-	  if (ENABLE_SEL_CHECKING)
-	    {
-	      edge e;
-	      edge_iterator ei;
-
-	      FOR_EACH_EDGE (e, ei, bb->preds)
-		{
-		  basic_block pred = e->src;
-
-		  gcc_assert (in_current_region_p (pred));
-		}
-	    }
-
-	  return true;
-	}
+	return true;
 
       gcc_assert (EDGE_PRED (bb, 0)->dest == bb);
       bb = EDGE_PRED (bb, 0)->src;
@@ -3731,6 +3941,68 @@ sel_num_cfg_preds_gt_1 (insn_t insn)
   return false;
 }
 
+/* Returns the basic block to which jump instruction JUMP transfers control
+   (NULL when the jump can have several destinations).  */
+static inline basic_block
+jump_destination (insn_t jump)
+{
+  basic_block bb = BLOCK_FOR_INSN (jump);
+  gcc_assert (JUMP_P (jump) && BB_END (BLOCK_FOR_INSN (jump)) == jump);
+  if (EDGE_COUNT (bb->succs) > 2)
+    return NULL;
+  if (EDGE_COUNT (bb->succs) == 1)
+    return EDGE_SUCC (bb, 0)->dest;
+  if (EDGE_SUCC (bb, 0)->flags & EDGE_FALLTHRU)
+    return EDGE_SUCC (bb, 1)->dest;
+  return EDGE_SUCC (bb, 0)->dest;
+}
+
+/* Checks if instruction X is a jump to the header of the loop.  */
+static inline bool
+jump_to_back_edge_p (insn_t x)
+{
+  if (JUMP_P (x)
+      && jump_destination (x)
+      && in_current_region_p (jump_destination (x))
+      && (BLOCK_TO_BB (BLOCK_FOR_INSN (x)->index) 
+          >= BLOCK_TO_BB (jump_destination (x)->index)))
+    return true;
+  return false;
+}
+
+/* Checks if path P contains jump to the loop header.  */
+static inline bool
+path_contains_back_edge_p (ilist_t p)
+{
+  for (; p != NULL; p = ILIST_NEXT (p))
+    {
+      if (jump_to_back_edge_p (ILIST_INSN (p)))
+        return true;
+    }
+  return false;
+}
+
+/* Checks if the path (cons INSN P) contains two consecutive instructions, the
+   first of which has a sched_times of zero and the second a non-zero one.  */
+static inline bool
+path_contains_switch_of_sched_times_p (insn_t insn, ilist_t p)
+{
+  if (!p)
+    return false;
+
+  if (INSN_SCHED_TIMES (insn) > 0
+      && INSN_SCHED_TIMES (ILIST_INSN (p)) == 0)
+    return true;
+
+  for (; ILIST_NEXT (p) != NULL; p = ILIST_NEXT (p))
+    {
+      if (INSN_SCHED_TIMES (ILIST_INSN (p)) > 0
+          && INSN_SCHED_TIMES (ILIST_INSN (ILIST_NEXT (p))) == 0)
+        return true;
+    }
+  return false;
+}
+
 /* Returns true if INSN is not a downward continuation of the given path P in 
    the current stage.  */
 bool
@@ -3767,7 +4039,8 @@ is_ineligible_successor (insn_t insn, il
       /* An insn from another fence could also be 
 	 scheduled earlier even if this insn is not in 
 	 a fence list right now.  Check INSN_SCHED_CYCLE instead.  */
-      || (!pipelining_p && INSN_SCHED_TIMES (insn) > 0))
+      || ((!pipelining_p || !path_contains_back_edge_p (p))
+          && path_contains_switch_of_sched_times_p (insn, p)))
     return true;
   else
     return false;
@@ -4218,6 +4491,8 @@ sel_remove_empty_bb (basic_block empty_b
 {
   basic_block merge_bb;
 
+  gcc_assert (sel_bb_empty_p (empty_bb));
+
   if (merge_up_p)
     {
       merge_bb = empty_bb->prev_bb;
@@ -4227,8 +4502,22 @@ sel_remove_empty_bb (basic_block empty_b
     }
   else
     {
+      edge e;
+      edge_iterator ei;
+
       merge_bb = bb_next_bb (empty_bb);
 
+      /* Redirect incoming edges (except the fallthrough one) of EMPTY_BB 
+         to its successor block.  */
+      for (ei = ei_start (empty_bb->preds);
+           (e = ei_safe_edge (ei)); )
+        {
+          if (! (e->flags & EDGE_FALLTHRU))
+            sel_redirect_edge_and_branch (e, merge_bb);
+          else
+            ei_next (&ei);
+        }
+
       gcc_assert (EDGE_COUNT (empty_bb->succs) == 1
 		  && EDGE_SUCC (empty_bb, 0)->dest == merge_bb);
     }
@@ -4532,7 +4821,9 @@ sel_redirect_edge_and_branch_force (edge
 
   /* This function could not be used to spoil the loop structure by now,
      thus we don't care to update anything.  But check it to be sure.  */
-  if (flag_sel_sched_pipelining_outer_loops && current_loop_nest)
+  if (flag_sel_sched_pipelining_outer_loops 
+      && current_loop_nest 
+      && pipelining_p)
     gcc_assert (loop_latch_edge (current_loop_nest));
 
   /* Now the CFG has been updated, and we can init data for the newly 
@@ -4549,6 +4840,7 @@ sel_redirect_edge_and_branch (edge e, ba
   bool latch_edge_p;
 
   latch_edge_p = (flag_sel_sched_pipelining_outer_loops 
+                  && pipelining_p
                   && current_loop_nest
                   && e == loop_latch_edge (current_loop_nest));
 
@@ -5023,7 +5315,7 @@ considered_for_pipelining_p (struct loop
      latch.  We can't use header here, because this header could be 
      just removed preheader and it will give us the wrong region number.
      Latch can't be used because it could be in the inner loop too.  */
-  if (LOOP_MARKED_FOR_PIPELINING_P (loop))
+  if (LOOP_MARKED_FOR_PIPELINING_P (loop) && pipelining_p)
     {
       int rgn = CONTAINING_RGN (loop->latch->index);
 
@@ -5141,6 +5433,9 @@ sel_find_rgns (void)
 {
   struct loop *loop;
 
+  pipeline_outer_loops_init ();
+  extend_regions ();
+
   if (current_loops)
     /* Start traversing from the root node.  */
     for (loop = VEC_index (loop_p, current_loops->larray, 0)->inner; 
@@ -5181,7 +5476,6 @@ sel_add_loop_preheader (void)
     }
   
   VEC_free (basic_block, heap, preheader_blocks);
-  MARK_LOOP_FOR_PIPELINING (current_loop_nest);
 }
 
 /* While pipelining outer loops, returns TRUE if BB is a loop preheader.  
@@ -5195,6 +5489,9 @@ sel_is_loop_preheader_p (basic_block bb)
     {
       struct loop *outer;
 
+      if (preheader_removed)
+        return false;
+
       /* Preheader is the first block in the region.  */
       if (BLOCK_TO_BB (bb->index) == 0)
         return true;
@@ -5218,6 +5515,29 @@ sel_is_loop_preheader_p (basic_block bb)
   return false;
 }
 
+/* Checks whether JUMP leads to basic block DEST_BB and no other blocks.  */
+bool
+jump_leads_only_to_bb_p (insn_t jump, basic_block dest_bb)
+{
+  basic_block jump_bb = BLOCK_FOR_INSN (jump);
+
+  /* False when JUMP is not a jump at all, has side effects, or can lead 
+     to several basic blocks.  */
+  if (!onlyjump_p (jump)
+      || !any_uncondjump_p (jump))
+    return false;
+
+  /* False when there are several outgoing edges, an abnormal edge, or 
+     the destination of the jump is not DEST_BB.  */
+  if (EDGE_COUNT (jump_bb->succs) != 1
+      || EDGE_SUCC (jump_bb, 0)->flags & EDGE_ABNORMAL
+      || EDGE_SUCC (jump_bb, 0)->dest != dest_bb)
+    return false;
+
+  /* Otherwise, JUMP leads only to DEST_BB.  */
+  return true;
+}
+
 /* Removes the loop preheader from the current region and saves it in
    PREHEADER_BLOCKS of the father loop, so they will be added later to 
    region that represents an outer loop.  
@@ -5228,6 +5548,7 @@ sel_remove_loop_preheader (void)
   int i, old_len;
   int cur_rgn = CONTAINING_RGN (BB_TO_BLOCK (0));
   basic_block bb;
+  bool all_empty_p = true;
   VEC(basic_block, heap) *preheader_blocks 
     = LOOP_PREHEADER_BLOCKS (loop_outer (current_loop_nest));
 
@@ -5242,7 +5563,11 @@ sel_remove_loop_preheader (void)
       /* If the basic block belongs to region, but doesn't belong to 
 	 corresponding loop, then it should be a preheader.  */
       if (sel_is_loop_preheader_p (bb))
-        VEC_safe_push (basic_block, heap, preheader_blocks, bb);
+        {
+          VEC_safe_push (basic_block, heap, preheader_blocks, bb);
+          if (BB_END (bb) != bb_note (bb))
+            all_empty_p = false;
+        }
     }
   
   /* Remove these blocks only after iterating over the whole region.  */
@@ -5256,8 +5581,47 @@ sel_remove_loop_preheader (void)
     }
 
   if (!considered_for_pipelining_p (loop_outer (current_loop_nest)))
-    /* Immediately create new region from preheader.  */
-    make_region_from_loop_preheader (&preheader_blocks);
+    {
+      if (!all_empty_p)
+        /* Immediately create new region from preheader.  */
+        make_region_from_loop_preheader (&preheader_blocks);
+      else
+        {
+          /* If all preheader blocks are empty, don't create a new empty 
+             region.  Instead, remove them completely.  */
+          for (i = 0; VEC_iterate (basic_block, preheader_blocks, i, bb); i++)
+            {
+              edge e;
+              edge_iterator ei;
+              basic_block prev_bb = bb->prev_bb, next_bb = bb->next_bb;
+
+              /* Redirect all incoming edges to next basic block.  */
+              for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
+                {
+                  if (! (e->flags & EDGE_FALLTHRU))
+                    redirect_edge_and_branch (e, bb->next_bb);
+                  else
+                    redirect_edge_succ (e, bb->next_bb);
+                }
+              gcc_assert (BB_NOTE_LIST (bb) == NULL);
+              delete_basic_block (bb);
+
+              /* Check if, after deleting the preheader, there is an 
+                 unconditional jump in PREV_BB that leads to the next basic 
+                 block NEXT_BB.  If so, delete this jump and clear the data 
+                 sets of its basic block if it becomes empty.  */
+              if (next_bb->prev_bb == prev_bb
+                  && prev_bb != ENTRY_BLOCK_PTR
+                  && jump_leads_only_to_bb_p (BB_END (prev_bb), next_bb))
+                {
+                  redirect_edge_and_branch (EDGE_SUCC (prev_bb, 0), next_bb);
+                  if (BB_END (prev_bb) == bb_note (prev_bb))
+                    free_data_sets (prev_bb);
+                }
+            }
+        }
+      VEC_free (basic_block, heap, preheader_blocks);
+    }
   else
     /* Store preheader within the father's loop structure.  */
     SET_LOOP_PREHEADER_BLOCKS (loop_outer (current_loop_nest),
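The hunk above removes empty preheader blocks by redirecting every incoming edge to the block's fallthrough successor and then deleting the block. The rewiring can be sketched with a toy single-successor CFG; this is a minimal standalone model, not GCC's edge API (`remove_empty_block` and the array representation are illustrative only):

```c
#include <assert.h>

/* Toy CFG: block i's single successor is succ[i], -1 means exit.
   Removing an empty block VICTIM mirrors the patch's loop over
   bb->preds: every edge that entered VICTIM is redirected to
   VICTIM's own successor, then VICTIM is detached.  */
static int
remove_empty_block (int *succ, int n, int victim)
{
  int next = succ[victim];
  int i;

  for (i = 0; i < n; i++)
    if (i != victim && succ[i] == victim)
      succ[i] = next;           /* redirect incoming edge */
  succ[victim] = -1;            /* the block is now unreachable */
  return next;
}
```

In the real patch the redirect distinguishes fallthrough edges (`redirect_edge_succ`) from branch edges (`redirect_edge_and_branch`); the toy model collapses both into one assignment.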
Index: gcc/sel-sched-ir.h
===================================================================
--- gcc/sel-sched-ir.h	(revision 131180)
+++ gcc/sel-sched-ir.h	(working copy)
@@ -91,6 +91,10 @@ struct _expr
      control on scheduling.  */
   int spec;
 
+  /* Also a degree of speculativeness.  Shows the probability that the
+     result of this instruction will actually be used if it is moved to
+     the current point.  */
+  int usefulness;
+
   /* A priority of this expression.  */
   int priority;
 
@@ -109,6 +113,10 @@ struct _expr
      (used only during move_op ()).  */
   ds_t spec_to_check_ds;
 
+  /* Cycle on which the original insn was scheduled.  Zero when it has
+     not yet been scheduled or when it has more than one originator.  */
+  int orig_sched_cycle;
+
   /* A vector of insn's hashes on which this expr was changed when 
      moving up.  We can't use bitmap here, because the recorded insn
      could be scheduled, and its bookkeeping copies should be checked 
@@ -135,9 +143,11 @@ typedef expr_t rhs_t;
 #define EXPR_SEPARABLE_P(EXPR) (VINSN_SEPARABLE_P (EXPR_VINSN (EXPR)))
 
 #define EXPR_SPEC(EXPR) ((EXPR)->spec)
+#define EXPR_USEFULNESS(EXPR) ((EXPR)->usefulness)
 #define EXPR_PRIORITY(EXPR) ((EXPR)->priority)
 #define EXPR_SCHED_TIMES(EXPR) ((EXPR)->sched_times)
 #define EXPR_ORIG_BB_INDEX(EXPR) ((EXPR)->orig_bb_index)
+#define EXPR_ORIG_SCHED_CYCLE(EXPR) ((EXPR)->orig_sched_cycle)
 #define EXPR_SPEC_DONE_DS(EXPR) ((EXPR)->spec_done_ds)
 #define EXPR_SPEC_TO_CHECK_DS(EXPR) ((EXPR)->spec_to_check_ds)
 #define EXPR_CHANGED_ON_INSNS(EXPR) ((EXPR)->changed_on_insns)
@@ -707,7 +717,9 @@ extern rtx exit_insn;
 #define LOOP_PREHEADER_BLOCKS(LOOP) ((size_t)((LOOP)->aux) == 1 \
                                      ? NULL \
                                      : ((VEC(basic_block, heap) *) (LOOP)->aux))
-#define SET_LOOP_PREHEADER_BLOCKS(LOOP,BLOCKS) ((LOOP)->aux = BLOCKS)
+#define SET_LOOP_PREHEADER_BLOCKS(LOOP,BLOCKS) ((LOOP)->aux = \
+                                                  BLOCKS != NULL \
+                                                    ? BLOCKS : (LOOP)->aux)
 
 /* When false, only notes may be added.  */
 extern bool can_add_real_insns_p;
@@ -840,6 +852,7 @@ extern bool enable_moveup_set_path_p;
 extern bool enable_schedule_as_rhs_p;
 extern bool pipelining_p;
 extern bool bookkeeping_p;
+extern bool preheader_removed;
 
 /* Functions that are used in sel-sched.c.  */
 
@@ -912,6 +925,7 @@ extern void av_set_clear (av_set_t *);
 extern void av_set_leave_one (av_set_t *);
 extern rhs_t av_set_element (av_set_t, int);
 extern void av_set_substract_cond_branches (av_set_t *);
+extern void av_set_split_usefulness (av_set_t *, int, int);
 extern void av_set_intersect (av_set_t *, av_set_t);
 
 extern void sel_save_haifa_priorities (void);
@@ -941,6 +955,8 @@ extern void sel_remove_insn (insn_t);
 extern int vinsn_dfa_cost (vinsn_t, fence_t);
 extern bool bb_header_p (insn_t);
 extern void sel_init_invalid_data_sets (insn_t);
+extern bool insn_at_boundary_p (insn_t);
+extern bool jump_leads_only_to_bb_p (insn_t, basic_block);
 
 /* Basic block and CFG functions.  */
 
@@ -959,9 +975,11 @@ extern void sel_finish_bbs (void);
 extern int cfg_succs_n (insn_t, int);
 extern bool sel_insn_has_single_succ_p (insn_t, int);
 extern void cfg_succs_1 (insn_t, int, insn_t **, int *);
-extern void cfg_succs (insn_t, insn_t **, int *);
+extern void cfg_succs_2 (insn_t, int, insn_t **, int **, int *);
+extern void cfg_succs (insn_t, insn_t **, int **, int *);
 extern insn_t cfg_succ_1 (insn_t, int);
 extern insn_t cfg_succ (insn_t);
+extern int overall_prob_of_succs (insn_t);
 extern bool sel_num_cfg_preds_gt_1 (insn_t);
 
 extern bool is_ineligible_successor (insn_t, ilist_t);
@@ -989,6 +1007,9 @@ extern void make_region_from_loop_prehea
 extern void sel_add_loop_preheader (void);
 extern bool sel_is_loop_preheader_p (basic_block);
 extern void clear_outdated_rtx_info (basic_block);
+extern void free_data_sets (basic_block);
+extern void exchange_data_sets (basic_block, basic_block);
+extern void copy_data_sets (basic_block, basic_block);
 
 extern void sel_register_cfg_hooks (void);
 extern void sel_unregister_cfg_hooks (void);
@@ -1148,10 +1169,6 @@ get_all_loop_exits (basic_block bb)
 		continue;
 	      }
 	  }
-	else
-	  {
-	    gcc_assert (!inner_loop_header_p (e->dest));
-	  }
     }
 
   return exits;
Index: gcc/sel-sched-dump.c
===================================================================
--- gcc/sel-sched-dump.c	(revision 131180)
+++ gcc/sel-sched-dump.c	(working copy)
@@ -358,6 +358,14 @@ dump_expr_1 (expr_t expr, int flags)
 	print ("spec:%d;", spec);
     }
 
+  if (flags & DUMP_EXPR_USEFULNESS)
+    {
+      int use = EXPR_USEFULNESS (expr);
+
+      if (use != REG_BR_PROB_BASE)
+        print ("use:%d;", use);
+    }
+
   if (flags & DUMP_EXPR_PRIORITY)
     print ("prio:%d;", EXPR_PRIORITY (expr));
 
Index: gcc/sel-sched-dump.h
===================================================================
--- gcc/sel-sched-dump.h	(revision 131180)
+++ gcc/sel-sched-dump.h	(working copy)
@@ -102,10 +102,11 @@ enum _dump_expr
     DUMP_EXPR_SCHED_TIMES = 16,
     DUMP_EXPR_SPEC_DONE_DS = 32,
     DUMP_EXPR_ORIG_BB = 64,
+    DUMP_EXPR_USEFULNESS = 128,
 
     DUMP_EXPR_ALL = (DUMP_EXPR_VINSN | DUMP_EXPR_SPEC | DUMP_EXPR_PRIORITY
 		     | DUMP_EXPR_SCHED_TIMES | DUMP_EXPR_SPEC_DONE_DS
-		     | DUMP_EXPR_ORIG_BB)
+		     | DUMP_EXPR_ORIG_BB | DUMP_EXPR_USEFULNESS)
   };
 
 extern void dump_expr_1 (expr_t, int);
Index: gcc/sched-int.h
===================================================================
--- gcc/sched-int.h	(revision 131180)
+++ gcc/sched-int.h	(working copy)
@@ -604,7 +604,11 @@ struct spec_info_def
 
   /* Minimal cumulative weakness of speculative instruction's
      dependencies, so that insn will be scheduled.  */
-  dw_t weakness_cutoff;
+  dw_t data_weakness_cutoff;
+
+  /* Minimal usefulness of a speculative instruction for it to be
+     considered for scheduling.  */
+  int control_weakness_cutoff;
 
   /* Flags from the enum SPEC_SCHED_FLAGS.  */
   int flags;
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 131180)
+++ gcc/passes.c	(working copy)
@@ -774,11 +774,11 @@ init_optimization_passes (void)
 	      NEXT_PASS (pass_split_before_regstack);
 	      NEXT_PASS (pass_stack_regs_run);
 	    }
-	  NEXT_PASS (pass_compute_alignments);
 	  NEXT_PASS (pass_duplicate_computed_gotos);
 	  NEXT_PASS (pass_variable_tracking);
 	  NEXT_PASS (pass_free_cfg);
 	  NEXT_PASS (pass_machine_reorg);
+	  NEXT_PASS (pass_compute_alignments);
 	  NEXT_PASS (pass_cleanup_barriers);
 	  NEXT_PASS (pass_delay_slots);
 	  NEXT_PASS (pass_df_finish);
Index: gcc/config/ia64/ia64.opt
===================================================================
--- gcc/config/ia64/ia64.opt	(revision 131181)
+++ gcc/config/ia64/ia64.opt	(working copy)
@@ -180,4 +180,8 @@ msel-sched-dont-check-control-spec
 Target Report Var(mflag_sel_sched_dont_check_control_spec) Init(0)
 Don't generate checks for control speculation in selective scheduling
 
+mstop-bit-before-check
+Target Report Var(mflag_stop_bit_before_check) Init(0)
+Force a barrier before a speculative check
+
 ; This comment is to ensure we retain the blank line above.
Index: gcc/config/ia64/ia64.c
===================================================================
--- gcc/config/ia64/ia64.c	(revision 131181)
+++ gcc/config/ia64/ia64.c	(working copy)
@@ -6048,11 +6048,14 @@ group_barrier_needed (rtx insn)
       if (! need_barrier)
 	need_barrier = rws_access_regno (REG_VOLATILE, flags, 0);
 
-      /* Force a barrier before a speculative check.  This is used to allow 
-         more instructions to move through the check and to minimize 
-         delaying of other instructions in case this checks stalls.  */
-      if (ia64_spec_check_p (insn))
-	need_barrier = 1;
+      if (mflag_stop_bit_before_check)
+        {
+          /* Force a barrier before a speculative check.  This is used to allow
+             more instructions to move through the check and to minimize
+             delaying of other instructions in case this check stalls.  */
+          if (ia64_spec_check_p (insn))
+            need_barrier = 1;
+        }
 
       break;
 

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]