This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom


Hi all,

I'm looking into improving usage of aarch64 conditional compare instructions and PR 67628
is one area of improvement I identified. The problem there is different tree-level behaviour
for the expressions:
(a > b && b <= c) && c > d
vs.
a > b && (b <= c && c > d)

The second variant generates worse code for aarch64 (and x86_64) because tree ifcombine doesn't
do as good a job at merging basic blocks.

I've tracked this down to the function ifcombine_andif in tree-ssa-ifcombine.c.
With this patch I get the good codegen from that PR, both forms are combined into a single basic block
and the conditional compare expansion code in ccmp.c does a good job at generating the ccmp instructions.
I also see about 2% more ccmp instructions being generated for SPEC2006 with the code generally looking
better as well (if I remove this whole single-conditional restriction completely I get 19% more conditional
compares in SPEC2006, but that's something I want to investigate separately).

The idea is that for the two expressions given above the control flow and contents of the basic blocks
is the same, but when ifcombine_ifandif is called the inner_cond_bb and outer_cond_bb arguments are
reversed i.e. for the first expression the inner_cond_bb contains a single comparison c > d
and outer_cond_bb contains multiple comparisons a > b && b <= c.
In the second case the outer cond contains the single comparison a > b and the inner block this
time contains the multiple conditions a > b && b <= c, thus failing the
"Only do this optimization if the inner bb contains only the conditional" check.
The patch makes this condition more symmetrical by rejecting this optimisation only if
neither the inner nor the outer condition blocks are simple.

Unfortunately, I see a testsuite regression with this patch:
FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"

The reduced part of that test is:
void
test1 (int x, unsigned u)
{
  if ((1U << x) != 64
      || (2 << x) != u
      || (x << x) != 384
      || (3 << x) == 9
      || (x << 14) != 98304U
      || (1 << x) == 14
      || (3 << 2) != 12)
    __builtin_abort ();
}

The patched ifcombine pass works more or less as expected and produces fewer basic blocks.
Before this patch a relevant part of the ifcombine dump for test1 is:
;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
  if (x_1(D) != 6)
    goto <bb 6>;
  else
    goto <bb 3>;

;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
  _2 = 2 << x_1(D);
  _3 = (unsigned intD.10) _2;
  if (_3 != u_4(D))
    goto <bb 6>;
  else
    goto <bb 4>;


After this patch it is:
;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
  _2 = 2 << x_1(D);
  _3 = (unsigned intD.10) _2;
  _9 = _3 != u_4(D);
  _10 = x_1(D) != 6;
  _11 = _9 | _10;
  if (_11 != 0)
    goto <bb 5>;
  else
    goto <bb 3>;

The second form ends up generating worse codegen however, and the badness starts with the dom1 pass.
In the unpatched case it manages to deduce that x must be 6 by the time it reaches basic block 3 and
uses that information to eliminate the shift in "_2 = 2 << x_1(D)" from basic block 3
In the patched case it is unable to make that call, I think because the x != 6 condition is IORed
with another test.

I'm not familiar with the internals of the dom pass, so I'm not sure where to go looking for a fix for this.
Is the ifcombine change a step in the right direction? If so, what would need to be done to fix the issue with
the dom pass?

I suppose what we want is to not combine basic blocks if the sequence and conditions of the basic blocks are
such that dom can potentially exploit them, but how do we express that?

Thanks,
Kyrill

P.S. I can provide more complete tree dumps for the examples if needed.

2015-09-22  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    PR tree-optimization/67628
    * tree-ssa-ifcombine.c (ifcombine_ifandif): Allow optimization
    when either the inner or outer block contain a single conditional.
diff --git a/gcc/tree-ssa-ifcombine.c b/gcc/tree-ssa-ifcombine.c
index 9f04174..bfe17ba 100644
--- a/gcc/tree-ssa-ifcombine.c
+++ b/gcc/tree-ssa-ifcombine.c
@@ -511,8 +511,12 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
 	  gimple_stmt_iterator gsi;
 	  if (!LOGICAL_OP_NON_SHORT_CIRCUIT)
 	    return false;
-	  /* Only do this optimization if the inner bb contains only the conditional. */
-	  if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (inner_cond_bb)))
+	  /* Only do this optimization if inner or outer bb contains
+	     only the conditional. */
+	  if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (
+				     inner_cond_bb))
+	      && !gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (
+					 outer_cond_bb)))
 	    return false;
 	  t1 = fold_build2_loc (gimple_location (inner_cond),
 				inner_cond_code,

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]