This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive


Hi Jeff,

On 27/07/15 17:40, Kyrill Tkachov wrote:
On 27/07/15 17:09, Jeff Law wrote:
On 07/27/2015 04:17 AM, Kyrill Tkachov wrote:
I experimented with resource.c and the roadblock I hit is that it
seems to have an assumption that it operates on hard regs (in fact
the struct it uses to describe the resources has a HARD_REG_SET for
the regs) and so it triggers various HARD_REGISTER_P asserts when I
try to use the functions there. if-conversion runs before register
allocation, so we're dealing with pseudos here.
Sigh.  resource.c probably isn't going to be useful then.

My other attempt was to go over BB_A and mark the set registers in a
   bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of
each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A
we return false. This seemed to do the job and testing worked out ok.
That would require one walk over BB_A, one walk over BB_B but I don't
know how expensive FOR_EACH_SUBRTX walks are...

Would that be an acceptable solution?
I think the latter is reasonable.  Ultimately we have to do a full look
at those rtxs, so it's unavoidable to some extent.

The only other possibility would be to use the DF framework.  I'm not
sure if it's even initialized for the ifcvt code.  If it is, then you
might be able to avoid some of the walking of the insns and instead walk
the DF structures.
I think it is initialized (I look at df_get_live_out earlier on
in the call chain). I suppose what we want is for the live in regs for BB_B
to not include any of the set regs in BB_A?



<snip>
It fails when the last insn is not recognised, because
noce_try_cmove_arith can modify the last insn, but I have not seen
it cause any trouble. If it fails then back in noce_try_cmove_arith
we goto end_seq_and_fail which ends the sequence and throws it away
(and cancels if-conversion down that path), so it should be safe.
OK, I was working for the assumption that memoization ought not
fail, but it seems that was a bad assumption on my part.    So
given noce_try_cmove_arith can change the last insn and make it
no-recognizable this code seems reasoanble.

So I think the only outstanding issues are:

1. Investigate moving rather than re-emitting insns.
I'll look into that, but what is the machinery by which one moves
insns?
I don't think we have any good generic machinery for this.  I think
every pass that needs this capability unlinks the insn from the chain
and patches it back in at the new location.
That's the SET_PREV_INSN, SET_NEXT_INSN functions, right?

The current way the top-level noce_process_if_block is structured
it expects the various ifcvt functions (like noce_try_cmove_arith)
to generate a sequence, then it takes it, unshares it and removes
the empty basic blocks.

If we're to instead move insns around we'd need to further modify
   noce_process_if_block to handle differently
   this one case where we move insns instead of re-emitting them.
I think this would make that function more convoluted than it needs to be.
With the current approach we always call unshare_all_rtl_in_chain on the
emitted sequence which should take care of any RTL sharing issues and in
practice I don't expect to have more than 3-4 insns in these sequences since
they will be guarded by the branch cost.

So I would rather argue for re-emitting insns in this case to keep consistent
with the dozen or so similar functions in ifcvt.c that already work that way.

Here's a respin.
I've reworked bbs_ok_for_cmove_arith to go over BB_A once and record
the set registers then go over BB_B once and look inside the SET_SRC
of each insn for those registers. How does this look? Would you like
me to investigate the data-flow infrastructure approach?

Also, in bb_valid_for_noce_process_p I iterate through the sub-rtxes
looking for a MEM with the FOR_EACH_SUBRTX machinery.

As I said above, I think moving the insns rather than re-emitting them
would make the function more convoluted than I think it needs to be.

Bootstrapped and tested on arm, aarch64, x86_64.



2015-07-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
    then_cost, else_cost fields.  Change branch_cost field to unsigned int.
    (end_ifcvt_sequence): Call set_used_flags on each insn in the
    sequence.
    (noce_simple_bbs): New function.
    (noce_try_move): Bail if basic blocks are not simple.
    (noce_try_store_flag): Likewise.
    (noce_try_store_flag_constants): Likewise.
    (noce_try_addcc): Likewise.
    (noce_try_store_flag_mask): Likewise.
    (noce_try_cmove): Likewise.
    (noce_try_minmax): Likewise.
    (noce_try_abs): Likewise.
    (noce_try_sign_mask): Likewise.
    (noce_try_bitop): Likewise.
    (bbs_ok_for_cmove_arith): New function.
    (noce_emit_all_but_last): Likewise.
    (noce_emit_insn): Likewise.
    (noce_emit_bb): Likewise.
    (noce_try_cmove_arith): Handle non-simple basic blocks.
    (insn_valid_noce_process_p): New function.
    (contains_mem_rtx_p): Likewise.
    (bb_valid_for_noce_process_p): Likewise.
    (noce_process_if_block): Allow non-simple basic blocks
    where appropriate.

2015-07-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    * gcc.dg/ifcvt-1.c: New test.
    * gcc.dg/ifcvt-2.c: Likewise.
    * gcc.dg/ifcvt-3.c: Likewise.
Thanks,
Kyrill

jeff


commit a02417f015b70a0e47170ac7c41a5569392085a5
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Jul 8 15:45:04 2015 +0100

    [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 2d97cc5..dae9b41 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -53,6 +53,7 @@
 #include "tree-pass.h"
 #include "dbgcnt.h"
 #include "shrink-wrap.h"
+#include "rtl-iter.h"
 #include "ifcvt.h"
 
 #ifndef HAVE_incscc
@@ -816,8 +817,17 @@ struct noce_if_info
      form as well.  */
   bool then_else_reversed;
 
+  /* True if the contents of then_bb and else_bb are a
+     simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* The total rtx cost of the instructions in then_bb and else_bb.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
+
   /* Estimated cost of the particular branch instruction.  */
-  int branch_cost;
+  unsigned int branch_cost;
 };
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -1037,6 +1047,10 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   set_used_flags (if_info->cond);
   set_used_flags (if_info->a);
   set_used_flags (if_info->b);
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
   unshare_all_rtl_in_chain (seq);
   end_sequence ();
 
@@ -1054,6 +1068,21 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   return seq;
 }
 
+/* Return true iff the then and else basic block (if it exists)
+   consist of a single simple set instruction.  */
+
+static bool
+noce_simple_bbs (struct noce_if_info *if_info)
+{
+  if (!if_info->then_simple)
+    return false;
+
+  if (if_info->else_bb)
+    return if_info->else_simple;
+
+  return true;
+}
+
 /* Convert "if (a != b) x = a; else x = b" into "x = a" and
    "if (a == b) x = a; else x = b" into "x = b".  */
 
@@ -1068,6 +1097,9 @@ noce_try_move (struct noce_if_info *if_info)
   if (code != NE && code != EQ)
     return FALSE;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* This optimization isn't valid if either A or B could be a NaN
      or a signed zero.  */
   if (HONOR_NANS (if_info->x)
@@ -1116,6 +1148,9 @@ noce_try_store_flag (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->b)
       && INTVAL (if_info->b) == STORE_FLAG_VALUE
       && if_info->a == const0_rtx)
@@ -1164,6 +1199,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   int normalize, can_reverse;
   machine_mode mode;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->a)
       && CONST_INT_P (if_info->b))
     {
@@ -1292,6 +1330,9 @@ noce_try_addcc (struct noce_if_info *if_info)
   rtx_insn *seq;
   int subtract, normalize;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (GET_CODE (if_info->a) == PLUS
       && rtx_equal_p (XEXP (if_info->a, 0), if_info->b)
       && (reversed_comparison_code (if_info->cond, if_info->jump)
@@ -1383,6 +1424,9 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
   rtx_insn *seq;
   int reversep;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   reversep = 0;
   if ((if_info->branch_cost >= 2
        || STORE_FLAG_VALUE == -1)
@@ -1551,6 +1595,9 @@ noce_try_cmove (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if ((CONSTANT_P (if_info->a) || register_operand (if_info->a, VOIDmode))
       && (CONSTANT_P (if_info->b) || register_operand (if_info->b, VOIDmode)))
     {
@@ -1585,6 +1632,116 @@ noce_try_cmove (struct noce_if_info *if_info)
   return FALSE;
 }
 
+/* Return true iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */
+
+static bool
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+{
+  rtx_insn *a_insn;
+  bitmap bba_sets = BITMAP_ALLOC (&reg_obstack);
+
+  FOR_BB_INSNS (bb_a, a_insn)
+    {
+      if (!active_insn_p (a_insn))
+	continue;
+
+      rtx sset_a = single_set (a_insn);
+
+      if (!sset_a)
+	{
+	  BITMAP_FREE (bba_sets);
+	  return false;
+	}
+
+      rtx dest_reg = SET_DEST (sset_a);
+      bitmap_set_bit (bba_sets, REGNO (dest_reg));
+    }
+
+  rtx_insn *b_insn;
+  subrtx_iterator::array_type array;
+
+  FOR_BB_INSNS (bb_b, b_insn)
+    {
+      if (!active_insn_p (b_insn))
+	continue;
+
+      rtx sset_b = single_set (b_insn);
+
+      if (!sset_b)
+	{
+	  BITMAP_FREE (bba_sets);
+	  return false;
+	}
+
+      FOR_EACH_SUBRTX (iter, array, SET_SRC (sset_b), NONCONST)
+	{
+	  const_rtx x = *iter;
+	  if (REG_P (x) && bitmap_bit_p (bba_sets, REGNO (x)))
+	    {
+	      BITMAP_FREE (bba_sets);
+	      return true;
+	    }
+	}
+
+    }
+
+  BITMAP_FREE (bba_sets);
+  return true;
+}
+
+/* Emit copies of all the active instructions in BB except the last.
+   This is a helper for noce_try_cmove_arith.  */
+
+static void
+noce_emit_all_but_last (basic_block bb)
+{
+  rtx_insn *last = last_active_insn (bb, FALSE);
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (insn != last && active_insn_p (insn))
+	{
+	  rtx_insn *to_emit = as_a <rtx_insn *> (copy_rtx (insn));
+
+	  emit_insn (PATTERN (to_emit));
+	}
+    }
+}
+
+/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and return
+   the resulting insn or NULL if it's not a valid insn.  */
+
+static rtx_insn *
+noce_emit_insn (rtx to_emit)
+{
+  gcc_assert (to_emit);
+  rtx_insn *insn = emit_insn (to_emit);
+
+  if (recog_memoized (insn) < 0)
+    return NULL;
+
+  return insn;
+}
+
+/* Helper for noce_try_cmove_arith.  Emit a copy of the insns up to
+   and including the penultimate one in BB if it is not simple
+   (as indicated by SIMPLE).  Then emit LAST_INSN as the last
+   insn in the block.  The reason for that is that LAST_INSN may
+   have been modified by the preparation in noce_try_cmove_arith.  */
+
+static bool
+noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
+{
+  if (bb && !simple)
+    noce_emit_all_but_last (bb);
+
+  if (last_insn && !noce_emit_insn (last_insn))
+    return false;
+
+  return true;
+}
+
 /* Try more complex cases involving conditional_move.  */
 
 static int
@@ -1595,9 +1752,12 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   rtx x = if_info->x;
   rtx orig_a, orig_b;
   rtx_insn *insn_a, *insn_b;
+  bool a_simple = if_info->then_simple;
+  bool b_simple = if_info->else_simple;
+  basic_block then_bb = if_info->then_bb;
+  basic_block else_bb = if_info->else_bb;
   rtx target;
   int is_mem = 0;
-  int insn_cost;
   enum rtx_code code;
   rtx_insn *ifcvt_seq;
 
@@ -1636,27 +1796,22 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   insn_a = if_info->insn_a;
   insn_b = if_info->insn_b;
 
-  /* Total insn_rtx_cost should be smaller than branch cost.  Exit
-     if insn_rtx_cost can't be estimated.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
   if (insn_a)
-    {
-      insn_cost
-	= insn_rtx_cost (PATTERN (insn_a),
-      			 optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn_a)));
-      if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
-	return FALSE;
-    }
+    then_cost = if_info->then_cost;
   else
-    insn_cost = 0;
+    then_cost = 0;
 
   if (insn_b)
-    {
-      insn_cost
-	+= insn_rtx_cost (PATTERN (insn_b),
-      			  optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn_b)));
-      if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
-        return FALSE;
-    }
+    else_cost = if_info->else_cost;
+  else
+    else_cost = 0;
+
+  /* We're going to execute one of the basic blocks anyway, so
+     bail out if the most expensive of the two blocks is unacceptable.  */
+  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
+    return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
   if (reversed_comparison_code (if_info->cond, if_info->jump) != UNKNOWN)
@@ -1672,26 +1827,36 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  code = reversed_comparison_code (if_info->cond, if_info->jump);
 	  std::swap (a, b);
 	  std::swap (insn_a, insn_b);
+	  std::swap (a_simple, b_simple);
+	  std::swap (then_bb, else_bb);
 	}
     }
 
+  if (!a_simple && then_bb && !b_simple && else_bb
+      && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
+	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
+    return FALSE;
+
   start_sequence ();
 
   orig_a = a;
   orig_b = b;
 
+  rtx emit_a = NULL_RTX;
+  rtx emit_b = NULL_RTX;
+
   /* If either operand is complex, load it into a register first.
      The best way to do this is to copy the original insn.  In this
      way we preserve any clobbers etc that the insn may have had.
      This is of course not possible in the IS_MEM case.  */
+
   if (! general_operand (a, GET_MODE (a)))
     {
-      rtx_insn *insn;
 
       if (is_mem)
 	{
 	  rtx reg = gen_reg_rtx (GET_MODE (a));
-	  insn = emit_insn (gen_rtx_SET (reg, a));
+	  emit_a = gen_rtx_SET (reg, a);
 	}
       else if (! insn_a)
 	goto end_seq_and_fail;
@@ -1701,49 +1866,58 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  rtx_insn *copy_of_a = as_a <rtx_insn *> (copy_rtx (insn_a));
 	  rtx set = single_set (copy_of_a);
 	  SET_DEST (set) = a;
-	  insn = emit_insn (PATTERN (copy_of_a));
+
+	  emit_a = PATTERN (copy_of_a);
 	}
-      if (recog_memoized (insn) < 0)
-	goto end_seq_and_fail;
     }
+
   if (! general_operand (b, GET_MODE (b)))
     {
-      rtx pat;
-      rtx_insn *last;
-      rtx_insn *new_insn;
-
       if (is_mem)
 	{
           rtx reg = gen_reg_rtx (GET_MODE (b));
-	  pat = gen_rtx_SET (reg, b);
+	  emit_b = gen_rtx_SET (reg, b);
 	}
       else if (! insn_b)
 	goto end_seq_and_fail;
       else
 	{
           b = gen_reg_rtx (GET_MODE (b));
-	  rtx_insn *copy_of_insn_b = as_a <rtx_insn *> (copy_rtx (insn_b));
-	  rtx set = single_set (copy_of_insn_b);
+	  rtx_insn *copy_of_b = as_a <rtx_insn *> (copy_rtx (insn_b));
+	  rtx set = single_set (copy_of_b);
+
 	  SET_DEST (set) = b;
-	  pat = PATTERN (copy_of_insn_b);
+	  emit_b = PATTERN (copy_of_b);
 	}
+    }
 
-      /* If insn to set up A clobbers any registers B depends on, try to
-	 swap insn that sets up A with the one that sets up B.  If even
-	 that doesn't help, punt.  */
-      last = get_last_insn ();
-      if (last && modified_in_p (orig_b, last))
-	{
-	  new_insn = emit_insn_before (pat, get_insns ());
-	  if (modified_in_p (orig_a, new_insn))
-	    goto end_seq_and_fail;
-	}
-      else
-	new_insn = emit_insn (pat);
+    /* If insn to set up A clobbers any registers B depends on, try to
+       swap insn that sets up A with the one that sets up B.  If even
+       that doesn't help, punt.  */
 
-      if (recog_memoized (new_insn) < 0)
-	goto end_seq_and_fail;
-    }
+    if (emit_a && modified_in_p (orig_b, emit_a))
+      {
+	if (modified_in_p (orig_a, emit_b))
+	  goto end_seq_and_fail;
+
+	if (else_bb && !b_simple)
+	  {
+	    if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	      goto end_seq_and_fail;
+	  }
+
+	if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	  goto end_seq_and_fail;
+      }
+    else
+      {
+	if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	  goto end_seq_and_fail;
+
+	if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	  goto end_seq_and_fail;
+
+      }
 
   target = noce_emit_cmove (if_info, x, code, XEXP (if_info->cond, 0),
 			    XEXP (if_info->cond, 1), a, b);
@@ -1947,6 +2121,9 @@ noce_try_minmax (struct noce_if_info *if_info)
   enum rtx_code code, op;
   int unsignedp;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* ??? Reject modes with NaNs or signed zeros since we don't know how
      they will be resolved with an SMIN/SMAX.  It wouldn't be too hard
      to get the target to tell us...  */
@@ -2043,6 +2220,9 @@ noce_try_abs (struct noce_if_info *if_info)
   int negate;
   bool one_cmpl = false;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* Reject modes with signed zeros.  */
   if (HONOR_SIGNED_ZEROS (if_info->x))
     return FALSE;
@@ -2191,6 +2371,9 @@ noce_try_sign_mask (struct noce_if_info *if_info)
   enum rtx_code code;
   bool t_unconditional;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   cond = if_info->cond;
   code = GET_CODE (cond);
   m = XEXP (cond, 0);
@@ -2274,6 +2457,9 @@ noce_try_bitop (struct noce_if_info *if_info)
   cond = if_info->cond;
   code = GET_CODE (cond);
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* Check for no else condition.  */
   if (! rtx_equal_p (x, if_info->b))
     return FALSE;
@@ -2524,6 +2710,137 @@ noce_can_store_speculate_p (basic_block top_bb, const_rtx mem)
   return false;
 }
 
+/* Helper for bb_valid_for_noce_process_p.  Validate that
+   the rtx insn INSN is a single set that does not set
+   the conditional register CC and is in general valid for
+   if-conversion.  */
+
+static bool
+insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
+{
+  if (!insn
+      || !NONJUMP_INSN_P (insn)
+      || (cc && set_of (cc, insn)))
+      return false;
+
+  rtx sset = single_set (insn);
+
+  /* Currently support only simple single sets in test_bb.  */
+  if (!sset
+      || !noce_operand_ok (SET_DEST (sset))
+      || !noce_operand_ok (SET_SRC (sset)))
+    return false;
+
+  return true;
+}
+
+/* Return true if X contains a MEM subrtx.  */
+
+static bool
+contains_mem_rtx_p (rtx x)
+{
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, x, ALL)
+    if (MEM_P (*iter))
+      return true;
+
+  return false;
+}
+
+/* Return true iff basic block TEST_BB is valid for noce if-conversion.
+   The condition used in this if-conversion is in COND.
+   In practice, check that TEST_BB ends with a single set
+   x := a and all previous computations
+   in TEST_BB don't produce any values that are live after TEST_BB.
+   In other words, all the insns in TEST_BB are there only
+   to compute a value for x.  Put the rtx cost of the insns
+   in TEST_BB into COST.  Record whether TEST_BB is a single simple
+   set instruction in SIMPLE_P.  */
+
+static bool
+bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
+			      unsigned int *cost, bool *simple_p)
+{
+  if (!test_bb)
+    return false;
+
+  rtx_insn *last_insn = last_active_insn (test_bb, FALSE);
+  rtx last_set = NULL_RTX;
+
+  rtx cc = cc_in_cond (cond);
+
+  if (!insn_valid_noce_process_p (last_insn, cc))
+    return false;
+  last_set = single_set (last_insn);
+
+  rtx x = SET_DEST (last_set);
+  rtx_insn *first_insn = first_active_insn (test_bb);
+  rtx first_set = single_set (first_insn);
+
+  if (!first_set)
+    return false;
+
+  /* We have a single simple set, that's okay.  */
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+
+  if (first_insn == last_insn)
+    {
+      *simple_p = noce_operand_ok (SET_DEST (first_set));
+      *cost = insn_rtx_cost (first_set, speed_p);
+      return *simple_p;
+    }
+
+  rtx_insn *prev_last_insn = PREV_INSN (last_insn);
+  gcc_assert (prev_last_insn);
+
+  /* For now, disallow setting x multiple times in test_bb.  */
+  if (REG_P (x) && reg_set_between_p (x, first_insn, prev_last_insn))
+    return false;
+
+  bitmap test_bb_temps = BITMAP_ALLOC (&reg_obstack);
+
+  /* The regs that are live out of test_bb.  */
+  bitmap test_bb_live_out = df_get_live_out (test_bb);
+
+  int potential_cost = insn_rtx_cost (last_set, speed_p);
+  rtx_insn *insn;
+  FOR_BB_INSNS (test_bb, insn)
+    {
+      if (insn != last_insn)
+	{
+	  if (!active_insn_p (insn))
+	    continue;
+
+	  if (!insn_valid_noce_process_p (insn, cc))
+	    goto free_bitmap_and_fail;
+
+	  rtx sset = single_set (insn);
+	  gcc_assert (sset);
+
+	  if (contains_mem_rtx_p (SET_SRC (sset))
+	      || contains_mem_rtx_p (SET_DEST (sset)))
+	    goto free_bitmap_and_fail;
+
+	  potential_cost += insn_rtx_cost (sset, speed_p);
+	  bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset)));
+	}
+    }
+
+  /* If any of the intermediate results in test_bb are live after test_bb
+     then fail.  */
+  if (bitmap_intersect_p (test_bb_live_out, test_bb_temps))
+    goto free_bitmap_and_fail;
+
+  BITMAP_FREE (test_bb_temps);
+  *cost = potential_cost;
+  *simple_p = false;
+  return true;
+
+ free_bitmap_and_fail:
+  BITMAP_FREE (test_bb_temps);
+  return false;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
    it without using conditional execution.  Return TRUE if we were successful
    at converting the block.  */
@@ -2540,7 +2857,6 @@ noce_process_if_block (struct noce_if_info *if_info)
   rtx_insn *insn_a, *insn_b;
   rtx set_a, set_b;
   rtx orig_x, x, a, b;
-  rtx cc;
 
   /* We're looking for patterns of the form
 
@@ -2549,15 +2865,23 @@ noce_process_if_block (struct noce_if_info *if_info)
      (3) if (...) x = a;   // as if with an initial x = x.
 
      The later patterns require jumps to be more expensive.
-
+     For the if (...) x = a; else x = b; case we allow multiple insns
+     inside the then and else blocks as long as their only effect is
+     to calculate a value for x.
      ??? For future expansion, look for multiple X in such patterns.  */
 
-  /* Look for one of the potential sets.  */
-  insn_a = first_active_insn (then_bb);
-  if (! insn_a
-      || insn_a != last_active_insn (then_bb, FALSE)
-      || (set_a = single_set (insn_a)) == NULL_RTX)
-    return FALSE;
+  if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
+				    &if_info->then_simple))
+    return false;
+
+  if (else_bb
+      && ! bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost,
+				      &if_info->else_simple))
+    return false;
+
+  insn_a = last_active_insn (then_bb, FALSE);
+  set_a = single_set (insn_a);
+  gcc_assert (set_a);
 
   x = SET_DEST (set_a);
   a = SET_SRC (set_a);
@@ -2572,11 +2896,11 @@ noce_process_if_block (struct noce_if_info *if_info)
   set_b = NULL_RTX;
   if (else_bb)
     {
-      insn_b = first_active_insn (else_bb);
-      if (! insn_b
-	  || insn_b != last_active_insn (else_bb, FALSE)
-	  || (set_b = single_set (insn_b)) == NULL_RTX
-	  || ! rtx_interchangeable_p (x, SET_DEST (set_b)))
+      insn_b = last_active_insn (else_bb, FALSE);
+      set_b = single_set (insn_b);
+      gcc_assert (set_b);
+
+      if (!rtx_interchangeable_p (x, SET_DEST (set_b)))
 	return FALSE;
     }
   else
@@ -2652,20 +2976,14 @@ noce_process_if_block (struct noce_if_info *if_info)
   if_info->a = a;
   if_info->b = b;
 
-  /* Skip it if the instruction to be moved might clobber CC.  */
-  cc = cc_in_cond (cond);
-  if (cc
-      && (set_of (cc, insn_a)
-	  || (insn_b && set_of (cc, insn_b))))
-    return FALSE;
-
   /* Try optimizations in some approximation of a useful order.  */
   /* ??? Should first look to see if X is live incoming at all.  If it
      isn't, we don't need anything but an unconditional set.  */
 
   /* Look and see if A and B are really the same.  Avoid creating silly
      cmove constructs that no one will fix up later.  */
-  if (rtx_interchangeable_p (a, b))
+  if (noce_simple_bbs (if_info)
+      && rtx_interchangeable_p (a, b))
     {
       /* If we have an INSN_B, we don't have to create any new rtl.  Just
 	 move the instruction that we already have.  If we don't have an
diff --git a/gcc/testsuite/gcc.dg/ifcvt-1.c b/gcc/testsuite/gcc.dg/ifcvt-1.c
new file mode 100644
index 0000000..92bc17a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+int
+foo (int x)
+{
+  return x > 100 ? x - 2 : x - 1;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
diff --git a/gcc/testsuite/gcc.dg/ifcvt-2.c b/gcc/testsuite/gcc.dg/ifcvt-2.c
new file mode 100644
index 0000000..e0e1728
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+
+typedef unsigned char uint8_t;
+typedef unsigned int uint16_t;
+
+uint8_t
+_xtime (const uint8_t byte, const uint16_t generator)
+{
+  if (byte & 0x80)
+    return byte ^ generator;
+  else
+    return byte << 1;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
diff --git a/gcc/testsuite/gcc.dg/ifcvt-3.c b/gcc/testsuite/gcc.dg/ifcvt-3.c
new file mode 100644
index 0000000..2e104a4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* }  } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+typedef long long s64;
+
+int
+foo (s64 a, s64 b, s64 c)
+{
+ s64 d = a - b;
+
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+/* This test can be reduced to just return a + c;  */
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
+/* { dg-final { scan-assembler-not "sub\.*\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\.*" { target { aarch64*-*-* } } } } */

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]