Bug 18046 - Missed jump threading optimization
Summary: Missed jump threading optimization
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: unknown
: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, missed-optimization, TREE
Depends on:
Blocks: 19794
  Show dependency treegraph
 
Reported: 2004-10-18 12:16 UTC by Steven Bosscher
Modified: 2012-11-08 22:29 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-11-08 10:29:57


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Steven Bosscher 2004-10-18 12:16:05 UTC
Filed under unknown since there is no version tag for the
tree-cleanup-branch.
The problem probably also exists on mainline, but I have
not tried it there.

Consider this code,

-----------------------------------------------------
extern void foo (void);
extern int i;
void
bar (void)
{
  switch (i)
    {
    case 0:
      foo ();
      break;
    case 1:
      break;
    }
                                                                               
                
  switch (i)
    {
    case 0:
      foo ();
      break;
    case 1:
      break;
    }
}
-----------------------------------------------------

This gives the following .dse2 dump:

;; Function bar (bar)
                                                                               
                
bar ()
{
  int i.0;
                                                                               
                
<bb 0>:
  #   VUSE <i_2>;
  i.0_3 = i;
  switch (i.0_3)
    {
      case 0: goto <L0>;
      default : goto <L1>;
    }
                                                                               
                
<L0>:;
  #   i_6 = V_MAY_DEF <i_2>;
  foo ();
                                                                               
                
  # i_1 = PHI <i_2(0), i_6(1)>;
<L1>:;
  #   VUSE <i_1>;
  i.0_4 = i;
  switch (i.0_4)
    {
      case 0: goto <L4>;
      default : goto <L5>;
    }
                                                                               
                
<L4>:;
  #   i_5 = V_MAY_DEF <i_1>;
  foo ();
                                                                               
                
<L5>:;
  return;
                                                                               
                
}

If the first switch goes through the default case, the
switch condition "i" is not call clobbered by the call
to "foo ()" so in the second switch "i" is unchanged and
we can thread the jump.

We don't catch this jump threading opportunity on trees,
but we *do* catch it on RTL.  This is *the* major source
of missed jump threads on the tree-cleanup-branch (we
catch rougly a thousand of these on the cc1-i file set).

If we teach the tree optimizers to catch this case, we
can almost certainly kill the RTL thread_jump junk on the
tree-cleanup-branch.
Comment 1 Andrew Pinski 2004-10-18 12:52:45 UTC
Confirmed. To summarize what the code should look like:
extern void foo (void);
extern int i;
void
bar (void)
{
  switch (i)
    {
    case 0:
      foo ();
      break;
    default:
       goto other_block;
    }
                                                                               
                
  switch (i)
    {
    case 0:
      foo ();
      break;
    default:
    other_block:
      break;
    }
}
Comment 2 Steven Bosscher 2004-10-18 13:22:49 UTC
Might I propose we don't deal with this as an enhancement
request but as a "normal" bug?  Killing the jump threader
in cfgcleanup.c would be a mighty feat, it's one of the
slowest parts of the cfgcleanup on RTL (and IIRC it's one
of the quadratic bottleneckt).

Otherwise, perhaps I should (have) add(ed) the compile
time hog keyword...
Comment 3 Steven Bosscher 2004-10-18 17:30:08 UTC
Diego told me to bug Law.  Obedient as I always am, I hereby 
do so :-) 
 
Jeff, this is a missed jump threading opportunity, the default 
case can be threaded here.  Any ideas how to fix this? 
Comment 4 Jeffrey A. Law 2004-10-18 18:31:13 UTC
Subject: Re:  Missed jump threading
	optimization

On Mon, 2004-10-18 at 11:30, steven at gcc dot gnu dot org wrote:
> ------- Additional Comments From steven at gcc dot gnu dot org  2004-10-18 17:30 -------
> Diego told me to bug Law.  Obedient as I always am, I hereby 
> do so :-) 
>  
> Jeff, this is a missed jump threading opportunity, the default 
> case can be threaded here.  Any ideas how to fix this? 
I don't see a good way to fix this.  There's lots of interconnected
issues that would need to be addressed.

Clearly we need better range information so that we can determine
that i != 0 on the default case leaving the first switch statement
(or we would need to avoid collapsing empty cases to the default
label until expansion).

Second, we need to rewrite the jump threading selection code; that's
on the queue.  Specifically we need to drop the requirement that
the statements at the start of the intermediate block are nops.
The SSA update code is already prepared to handle this change, so
it really ought to be isolated in the jump threading selection
code.


Given those two improvements we'd have a chance of threading the
default case out of the first switch statement.

jeff



Comment 5 Steven Bosscher 2004-10-18 22:50:41 UTC
Subject: Re:  Missed jump threading optimization

Hmm, threading the default case sounds interesting, but the real
reason why the RTL threader catches this and the tree threader does
not is because on RTL the test case basically looks like this:

extern void foo (void);
extern int i;
void
bar (void)
{
  if (i == 0)
    foo ();

  if (i == 0)
    foo ();
}

Hey, I can thread that!  :-)

So perhaps we should consider lowering SWITCH_EXPRs with only two
targets to COND_EXPRs after all...?  That would be quite easy to
do.


Comment 6 Jeffrey A. Law 2004-10-19 20:16:49 UTC
Subject: Re:  Missed jump threading
	optimization

On Mon, 2004-10-18 at 16:50, stevenb at suse dot de wrote:
> ------- Additional Comments From stevenb at suse dot de  2004-10-18 22:50 -------
> Subject: Re:  Missed jump threading optimization
> 
> Hmm, threading the default case sounds interesting, but the real
> reason why the RTL threader catches this and the tree threader does
> not is because on RTL the test case basically looks like this:
> 
> extern void foo (void);
> extern int i;
> void
> bar (void)
> {
>   if (i == 0)
>     foo ();
> 
>   if (i == 0)
>     foo ();
> }
> 
> Hey, I can thread that!  :-)
> 
> So perhaps we should consider lowering SWITCH_EXPRs with only two
> targets to COND_EXPRs after all...?  That would be quite easy to
> do.
Jan and maybe others have talked about lowering SWITCH_EXPRs
earlier.  I don't recall if it ever got implemented.

jeff

Comment 7 Steven Bosscher 2005-04-23 16:57:44 UTC
I'm going to implement lowering of some SWITCH_EXPRs at the tree level.  At  
least the ones that we do not produce a decision tree for now in stmt.c... 
Comment 8 Steven Bosscher 2005-10-07 21:21:03 UTC
I don't have time to work on these (new job), so unassigning.
Comment 9 Jeffrey A. Law 2006-03-21 15:57:46 UTC
We've got zero chance of threading the jump in this case until the 
partially redundant load from "i" is removed.

Daniel -- there's a pretty obvious redundant load from the global
variable "i" in this testcase.  I haven't investigated why PRE
is missing this obvious redundancy.

Jeff
Comment 10 Andrew Pinski 2006-03-21 16:05:21 UTC
(In reply to comment #9)
> Daniel -- there's a pretty obvious redundant load from the global
> variable "i" in this testcase.  I haven't investigated why PRE
> is missing this obvious redundancy.

Because tree level load PRE does not handle global variables, there is another bug about this, PR 23455.
Comment 11 Daniel Berlin 2006-03-21 16:57:31 UTC
Subject: Re:  Missed jump threading
	optimization

On Tue, 2006-03-21 at 15:57 +0000, law at redhat dot com wrote:
> 
> ------- Comment #9 from law at redhat dot com  2006-03-21 15:57 -------
> We've got zero chance of threading the jump in this case until the 
> partially redundant load from "i" is removed.
> 
> Daniel -- there's a pretty obvious redundant load from the global
> variable "i" in this testcase.  I haven't investigated why PRE
> is missing this obvious redundancy.

It doesn't deal with loads from global variables because we need to
place a value number on each "instance" that occurs in the program, but
can't easily because they are all shared.

I will get to it eventually.


> 
> Jeff
> 
> 

Comment 12 Steven Bosscher 2008-09-21 13:58:48 UTC
tree PRE now *does* handle the partially redundant global variable load. This is the .final_cleanup dump:


;; Function bar (bar)

bar ()
{
  int prephitmp.13;

<bb 2>:
  prephitmp.13 = i;
  switch (prephitmp.13) <default: <L1>, case 0: <L0>>

<L0>:
  foo ();
  prephitmp.13 = i;

<L1>:
  switch (prephitmp.13) <default: <L5>, case 0: <L4>>

<L4>:
  foo (); [tail call]

<L5>:
  return;

}


But we still miss the jump threading opportunity.
Comment 13 Jeffrey A. Law 2008-09-23 21:55:33 UTC
Subject: Re:  Missed jump threading optimization

steven at gcc dot gnu dot org wrote:
> ------- Comment #12 from steven at gcc dot gnu dot org  2008-09-21 13:58 -------
> tree PRE now *does* handle the partially redundant global variable load. This
> is the .final_cleanup dump:
>
>
> ;; Function bar (bar)
>
> bar ()
> {
>   int prephitmp.13;
>
> <bb 2>:
>   prephitmp.13 = i;
>   switch (prephitmp.13) <default: <L1>, case 0: <L0>>
>
> <L0>:
>   foo ();
>   prephitmp.13 = i;
>
> <L1>:
>   switch (prephitmp.13) <default: <L5>, case 0: <L4>>
>
> <L4>:
>   foo (); [tail call]
>
> <L5>:
>   return;
>
> }
>
>
> But we still miss the jump threading opportunity.
>   
Thanks for the update.  One blocking issue out of the way....

Things have changed a lot since that original bug report.  I believe the 
best solution for this particular case is to lower the switch statements 
early enough to expose the conditionals to DOM & VRP.

The 2nd best approach would be to extend VRP to create a range for the 
default case of a SWITCH and extend the jump threading code in 
tree-vrp.c to handle switch statements (they're currently ignored).  The 
biggest difficulty here would be to avoid dropping to varying too 
quickly.  I think you'd want to sort the cases, then build up a range 
containing all the cases.  If you get a gap in the range, you drop to 
varying.  If after extracting all the ranges you haven't dropped to 
varying, then you invert the range and create the appropriate ASSERT_EXPRs.

I'm not currently working on either solution.

jeff


Comment 14 Steven Bosscher 2010-02-04 22:52:40 UTC
Still not fixed. Still the major source of RTL jump threads.
Comment 15 Steven Bosscher 2010-07-13 10:29:57 UTC
Still not fixed with r162134:

;; Function bar (bar)

bar ()
{
  int prephitmp.4;

<bb 2>:
  prephitmp.4_1 = i;
  switch (prephitmp.4_1) <default: <L2>, case 0: <L0>>

<L0>:
  foo ();
  prephitmp.4_7 = i;

  # prephitmp.4_8 = PHI <prephitmp.4_1(2), prephitmp.4_7(3)>
<L2>:
  switch (prephitmp.4_8) <default: <L6>, case 0: <L4>>

<L4>:
  foo (); [tail call]

<L6>:
  return;

}
Comment 16 Andrew Pinski 2011-07-20 00:33:25 UTC
If VRP inserts into the default case the assert that i.1 cannot be what is other cases are, then I think jump threading could handle this.
Comment 17 Steven Bosscher 2012-11-08 22:29:45 UTC
Will be addressed for GCC 4.9 by moving switch lowering to GIMPLE.