13876 – loop not fully optimized

Bug 13876 - loop not fully optimized

Summary: loop not fully optimized

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	tree-ssa

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	spec
	Show dependency tree / graph

Reported:	2004-01-27 06:06 UTC by Andrew Pinski
Modified:	2014-12-01 05:20 UTC (History)
CC List:	4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:	2.97, 3.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.1, 3.1.1, 3.1.2, 3.2, 3.2.1, 3.2.2, 3.2.3, 3.3, 3.3.1, 3.3.2, 3.3.3, 3.4.0, 4.0.0
Last reconfirmed:	2006-02-11 01:03:32

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andrew Pinski 2004-01-27 06:06:02 UTC

Basically these two functions should produce the same code, the check for NotFound is not needed 
at all.
void g(); void h(); void o();
void t(int l, int *y)
{
int i;
int NotFound;
o();
for(i = 0, NotFound = 1;NotFound && i<l;i++)
{  if (y[i])  { NotFound = 0;  g();  } }
h();
}


void t1(int l, int *y)
{
int i;
int NotFound;
o();
for(i = 0, NotFound = 1;NotFound && i<l;i++)
{  if (y[i])  { NotFound = 0;  g();  break; } }
h();
}

Comment 1 Andrew Pinski 2004-01-27 06:06:46 UTC

This should be able to be done on the tree-ssa.

Comment 2 Andrew Pinski 2004-01-28 22:03:26 UTC

This is a problem with jump threading as if you change the fcuntion t to be:
void t(int l, int *y)
{
  int i;
  int NotFound;
  o();
  for(i = 0, NotFound = 1;i<l;i++)
  {
    if (y[i])  
    {
      NotFound = 0;
      g();
    }
    if (!NotFound)
      break;
  }
  h(NotFound);
}

It works correctly but changing it to:
void t(int l, int *y)
{
  int i;
  int NotFound;
  o();
  for(i = 0, NotFound = 1;i<l;i++)
  {
    if (!NotFound)
      break;
    if (y[i])  
    {
      NotFound = 0;
      g();
    }
  }
  h(NotFound);
}
It does not.

Comment 3 Jeffrey A. Law 2004-01-29 22:14:45 UTC

Subject: Re:  loop not fully optimized 

In message <20040128220328.24048.qmail@sources.redhat.com>, "pinskia at gcc dot
 gnu dot org" writes:
 >
 >------- Additional Comments From pinskia at gcc dot gnu dot org  2004-01-28 2
 >2:03 -------
 >This is a problem with jump threading as if you change the fcuntion t to be:
 >void t(int l, int *y)
 >{
 >  int i;
 >  int NotFound;
 >  o();
 >  for(i = 0, NotFound = 1;i<l;i++)
 >  {
 >    if (y[i])  
 >    {
 >      NotFound = 0;
 >      g();
 >    }
 >    if (!NotFound)
 >      break;
 >  }
 >  h(NotFound);
 >}
 >
 >It works correctly but changing it to:
 >void t(int l, int *y)
 >{
 >  int i;
 >  int NotFound;
 >  o();
 >  for(i = 0, NotFound = 1;i<l;i++)
 >  {
 >    if (!NotFound)
 >      break;
 >    if (y[i])  
 >    {
 >      NotFound = 0;
 >      g();
 >    }
 >  }
 >  h(NotFound);
 >}
 >It does not.
Right.  This is actually something I had already started looking at and
addressing it is going to be a reasonable amount of work.

To correctly handle this we need to record temporary equivalences for
every PHI we pass through during the dominator walk so that a temporary
equivalence created by a PHI in an earlier block can be used to thread
through a later block.  Right now we only record temporary equivalences
for the PHI in the block we want to thread through.

If we look at the dump if your code we have:

  # BLOCK 0
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  #   MT.4_17 = VDEF <MT.4_16>;
  o ();
  goto <bb 5> (<L4>);
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 1
  # PRED: 5 [95.0%]  (true,exec)
<L0>:;
  if (NotFound_3 == 0) goto <L5>; else goto <L1>;
  # SUCC: 2 [95.0%]  (false,exec) 6 [5.0%]  (true,exec)

  # BLOCK 2
  # PRED: 1 [95.0%]  (false,exec)
<L1>:;
  i.0_7 = (unsigned int)i_1;
  T.1_8 = i.0_7 * 4;
  T.2_10 = T.1_8 + y_9;
  #   VUSE <MT.4_15>;
  T.3_11 = *T.2_10;
  if (T.3_11 != 0) goto <L2>; else goto <L3>;
  # SUCC: 4 [70.0%]  (false,exec) 3 [30.0%]  (true,exec)

  # BLOCK 3
  # PRED: 2 [30.0%]  (true,exec)
<L2>:;
  #   MT.4_19 = VDEF <MT.4_15>;
  g ();
  # SUCC: 4 [100.0%]  (fallthru,exec)

  # BLOCK 4
  # PRED: 2 [70.0%]  (false,exec) 3 [100.0%]  (fallthru,exec)
  # MT.4_14 = PHI <MT.4_15(2), MT.4_19(3)>;
  # NotFound_2 = PHI <NotFound_3(2), 0(3)>;
<L3>:;
  i_12 = i_1 + 1;
  # SUCC: 5 [100.0%]  (fallthru,dfs_back,exec)

  # BLOCK 5
  # PRED: 4 [100.0%]  (fallthru,dfs_back,exec) 0 [100.0%]  (fallthru,exec)
  # MT.4_15 = PHI <MT.4_17(0), MT.4_14(4)>;
  # NotFound_3 = PHI <1(0), NotFound_2(4)>;
  # i_1 = PHI <0(0), i_12(4)>;
<L4>:;
  if (i_1 < l_6) goto <L0>; else goto <L5>;
  # SUCC: 6 [5.0%]  (false,exec) 1 [95.0%]  (true,exec)

  # BLOCK 6
  # PRED: 5 [5.0%]  (false,exec) 1 [5.0%]  (true,exec)
<L5>:;
  #   MT.4_18 = VDEF <MT.4_15>;
  h (NotFound_3);
  return;
  # SUCC: EXIT [100.0%]

}

We need to record a temporary equivalence NotFound_3 = 0 when we start
processing block #5 so that we can use that equivalence to eliminate
the test in block #1.

To do this effectively we really need a temporary equivalence table which
is maintained through the dominator walk for the sole use by the jump
threader -- these equivalences can't be used for const/copy propagation
for example since they are specific to a particular path through the
dominator tree.

Note this will not fix the first testcase you mention in the PR -- something
more complex is necessary to analyze the behavior of NotFound.  See "Beyond
Induction Variables" from Gerlek, Stolz & Wolfe -- IIRC their analysis would
be able to determine the behavior of NotFound in the loop and optimize
the code appropriately.  I think everyone would agree that having the kind of
analysis mentioned in that paper needs to be on the "plan".

jeff

Comment 4 Andrew Pinski 2005-05-07 18:52:59 UTC

The orginal testcase in comment #0 is fixed now but not the one in comment #2.

Comment 5 Steven Bosscher 2006-02-27 13:48:54 UTC

Could it be that this is now fixed?

Comment 6 Andrew Pinski 2006-02-27 14:14:11 UTC

(In reply to comment #5)
> Could it be that this is now fixed?

Nope, the second testcase in comment #2 is still very obvious missing an optimization with respect with jump threading:
        cmpw cr7,r31,r30
        li r3,0
        cmpwi cr6,r3,0
        bne+ cr7,L6
L6:
        beq- cr6,L4

That is only time I have seen an missed optimization that is so obvious.

Comment 7 Jeffrey A. Law 2006-03-21 15:08:36 UTC

This is a loop optimization issue, not a jump threading bug.  To really
optimize this loop well we will likely need the kinds of analysis found
in "Beyond Induction Variables", the classic paper describing loop flip-flops
and other interesting loop variables that are not simple induction variables.

Anyway, I'm removing this from the set of bugs related to jump threading.

Jeff

Comment 8 Richard Biener 2007-04-22 12:49:03 UTC

Loop header copying allows dom to duplicate the latch and thus we can now
eliminate the NotFound check:

;; Function t (t)

t (l, y)
{
  int i;

<bb 2>:
  o ();
  if (l > 0) goto <L15>; else goto <L4>;

<L15>:;
  i = 0;

Invalid sum of incoming frequencies 7098, should be 9100
<L0>:;
  if (MEM[base: y, index: (unsigned int) i, step: 4] != 0) goto <L1>; else goto <L2>;

<L1>:;
  g ();
  goto <bb 7> (<L4>);

<L2>:;
  i = i + 1;
  if (l > i) goto <L0>; else goto <L4>;

Invalid sum of incoming frequencies 2902, should be 900
<L4>:;
  h () [tail call];
  return;

}


So, fixed.  Since 4.1 actually.

Comment 9 Andrew Pinski 2007-04-22 18:39:19 UTC

> So, fixed.  Since 4.1 actually.

No, the example in comment #2 is not fixed.

Comment 10 Andrew Pinski 2014-12-01 05:20:28 UTC

These all are optimized now so closing as fixed.