Bug 25243 - [4.1/4.2 Regression] Jump threading opportunity missed in tree-ssa but caught in jump1
Summary: [4.1/4.2 Regression] Jump threading opportunity missed in tree-ssa but caught...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: unknown
: P3 normal
Target Milestone: 4.2.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, TREE
Depends on: 21829
Blocks:
  Show dependency treegraph
 
Reported: 2005-12-03 14:21 UTC by Steven Bosscher
Modified: 2006-02-11 00:48 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.0.0 4.0.2
Known to fail: 4.1.0 4.2.0
Last reconfirmed: 2005-12-03 15:46:50


Attachments
follow SSA_NAME_VALUE deep (568 bytes, patch)
2005-12-05 18:05 UTC, Steven Bosscher
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Steven Bosscher 2005-12-03 14:21:11 UTC
static const int e[4] = { 16, 16, 20, 20 };

extern void foo (unsigned long *);

void
baz (unsigned long *r)
{
  unsigned long i;

  for (i = 0; i < 4; i++)
    if (e[i] == 16)
      break;

  if (i == 4)
    {
      foo (r);
    }
}

We have the following in the .vars tree dump:

;; Function baz (baz)

baz (r)
{
  long unsigned int i;

<bb 0>:
  i = 0;

<L0>:;
  if (MEM[symbol: e, index: (int *) i, step: 4B] == 16) goto <L5>; else goto <L1>;

<L1>:;
  i = i + 1;
  if (i != 4) goto <L0>; else goto <L3>;

<L3>:;
  if (i == 4) goto <L4>; else goto <L5>;

<L4>:;
  foo (r);

<L5>:;
  return;

}

The first jump pass on RTL immediately optimizes away the redundant jump.

This kind of jump threading opportunity occurs very often.  About half of the jump threadings on RTL in jump1 that I have looked at so far (several dozen) are of this form.
I'm seeing this for AMD64 and i686, but I guess it happens everywhere.
Comment 1 Steven Bosscher 2005-12-03 14:22:46 UTC
I should have said, this is at "-O1 -fthread-jumps".  I guess VRP catches this at -O2 and better.
Comment 2 Steven Bosscher 2005-12-03 14:37:00 UTC
Actually VRP doesn't catch it.

Do:
-    if (e[i] == 16)
+    if (e[i] == 16)
so that store-CCP doesn't load e[0] anymore to find that it is 16.  With that, the .vrp dump at -O2 looks like this:

baz (r)
{
  long unsigned int i;
  int D.1638;
  long unsigned int i.0;

<bb 0>:
  goto <bb 3> (<L2>);

<L0>:;
  i_4 = i_1;
  D.1638_6 = e[i_1];
  if (D.1638_6 == 17) goto <L3>; else goto <L1>;

<L1>:;
  i_7 = i_1 + 1;

  # i_1 = PHI <0(0), i_7(2)>;
<L2>:;
  if (i_1 <= 3) goto <L0>; else goto <L3>;

<L3>:;
  if (i_1 == 4) goto <L4>; else goto <L5>;

<L4>:;
  foo (r_3);

<L5>:;
  return;

}

The tree loop optimizers turn "(i_1 <= 3)" into "(i_1 != 4)", but there are no passes after VRP that use the ASSERT_EXPRs to propagate "i_1 == 4" down along the false-edge going to the block starting with label L3.

So even at -O2 we don't catch this jump threading opportunity at the tree level.
Comment 3 Steven Bosscher 2005-12-03 15:46:50 UTC
With a minor hack, we optimize the test case in dom3:

Index: tree-ssa-dom.c
===================================================================
--- tree-ssa-dom.c      (revision 107822)
+++ tree-ssa-dom.c      (working copy)
@@ -2776,7 +2776,9 @@ cprop_operand (tree stmt, use_operand_p
   /* If the operand has a known constant value or it is known to be a
      copy of some other variable, use the value or copy stored in
      CONST_AND_COPIES.  */
-  val = SSA_NAME_VALUE (op);
+  val = op;
+  while (TREE_CODE (val) == SSA_NAME && SSA_NAME_VALUE (val))
+    val = SSA_NAME_VALUE (val);
   if (val && val != op && TREE_CODE (val) != VALUE_HANDLE)
     {
       tree op_type, val_type;


Jeff, thoughts?

Comment 4 Andrew Pinski 2005-12-03 17:01:12 UTC
This looks very much related to PR 21829.

Hmm, in 4.0.0 we got Before DOM (likewise for 4.1.0 and above):
<L1>:;
  i_2 = i_10 + 1;
  if (i_2 != 4) goto <L0>; else goto <L3>;

  # i_7 = PHI <i_2(2)>;
<L3>:;
  if (i_7 == 4) goto <L4>; else goto <L5>;

but after:
<L1>:;
  i_2 = i_10 + 1;
  if (i_2 != 4) goto <L0>; else goto <L3>;

Invalid sum of incoming frequencies 2375, should be 0
  # i_7 = PHI <i_2(2)>;
<L3>:;
  foo (r_3);

Which means that GCC got it right in 4.0.0
Comment 5 Steven Bosscher 2005-12-03 17:46:28 UTC
Actually, it's more related to Bug 21488.  What happens is that we record a value for the left hand side of a single-argument PHI node (i.e. for "rhs=PHI(lhs)" we record an equivalence rhs==lhs), but the left hand side also already has a value (in this case, lhs==4).

Later on we don't copy propagate lhs into the uses of rhs because lhs is defined in a loop and we don't copy propagate out of loops, so we never see that rhs==4 too.

My hack in comment #3 propagates the value "4" by following SSA_NAME_VALUE links as deeply as possible.  What we should probably do instead is look through SSA_NAME_VALUE chains in record_equivalences_from_phis.
Comment 6 Jeffrey A. Law 2005-12-03 18:27:54 UTC
Subject: Re:  [4.1/4.2 Regression] Jump
	threading opportunity missed in tree-ssa but caught in jump1

On Sat, 2005-12-03 at 17:46 +0000, steven at gcc dot gnu dot org wrote:
> 
> ------- Comment #5 from steven at gcc dot gnu dot org  2005-12-03 17:46 -------
> Actually, it's more related to Bug 21488.  What happens is that we record a
> value for the left hand side of a single-argument PHI node (i.e. for
> "rhs=PHI(lhs)" we record an equivalence rhs==lhs), but the left hand side also
> already has a value (in this case, lhs==4).
> 
> Later on we don't copy propagate lhs into the uses of rhs because lhs is
> defined in a loop and we don't copy propagate out of loops, so we never see
> that rhs==4 too.
> 
> My hack in comment #3 propagates the value "4" by following SSA_NAME_VALUE
> links as deeply as possible.  What we should probably do instead is look
> through SSA_NAME_VALUE chains in record_equivalences_from_phis.
The reason we don't generally walk through those chains is it is
possible to get loops in the value chain.  I've always considered
this a semi-bug in DOM.

Jeff








Comment 7 Steven Bosscher 2005-12-05 18:05:00 UTC
Created attachment 10410 [details]
follow SSA_NAME_VALUE deep

Hmmwell, the attached patch does bootstrap on i686,ia64, and x86-64, and it passes regression testing on those targets too.  Jeff, do you have an example of when this could go into an infinite loop?
Comment 8 Jeffrey A. Law 2005-12-05 18:18:27 UTC
Subject: Re:  [4.1/4.2 Regression] Jump
	threading opportunity missed in tree-ssa but caught in jump1

On Mon, 2005-12-05 at 18:05 +0000, steven at gcc dot gnu dot org wrote:
> 
> ------- Comment #7 from steven at gcc dot gnu dot org  2005-12-05 18:05 -------
> Created an attachment (id=10410)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10410&action=view)
> follow SSA_NAME_VALUE deep
> 
> Hmmwell, the attached patch does bootstrap on i686,ia64, and x86-64, and it
> passes regression testing on those targets too.  Jeff, do you have an example
> of when this could go into an infinite loop?
Try putting it in lookup_avail_expr.  ie, instead of following one chain
back, make it a loop.

I'd also really rather look at moving away from the equivalency chains,
which would also address this problem.  What's preventing that is things
actually get slower when we fully propagate constants/copies as they are
discovered.

jeff

Comment 9 Steven Bosscher 2006-02-11 00:48:24 UTC
Fixed with the 2nd VRP pass Jeff added recently:

2006-02-07  Jeff Law  <law at redhat dot com>

        * tree-vrp.c (find_conditional_asserts): Update comments.
        (simplify_stmt_for_jump_threading): New.
        (etc.)