Bug 21883 - [4.1/4.2 Regression] jump threading causing excessive code duplication
Summary: [4.1/4.2 Regression] jump threading causing excessive code duplication
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.1.0
: P2 major
Target Milestone: 4.2.0
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: compile-time-hog, missed-optimization
Depends on:
Blocks: 16996 jumpthreading 29285
  Show dependency treegraph
 
Reported: 2005-06-02 18:55 UTC by Jakub Jelinek
Modified: 2006-09-29 16:02 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2006-02-03 01:09:28


Attachments
PATCH (1.85 KB, text/plain)
2005-11-04 20:10 UTC, Jeffrey A. Law
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2005-06-02 18:55:12 UTC
extern int foo1 (void), foo2 (int);

int bar (int x)
{
  int varvar1, varvar2;
  int z;
  if (x > 913)
    varvar1 = foo1 ();
  else
    varvar2 = foo2 (6);
#define A foo1 (); foo1 (); foo1 (); foo1 (); foo1 ();
#define B A A A A A A A A A A
#define C B B B B B B B B B B
  C C C C C C C C C C C
  if (x > 913)
    z = varvar1 + foo1 ();
  else
    z = varvar2 + foo2 (6);
  return z;
}

Current HEAD will duplicate all the 2500 foo1 () calls to save a if (x > 913)
comparison.  With -Os almost no duplication should be allowed (only if the
duplicated instructions are expected to be shorter than the compare and branch),
and even with -O2 there should be some sensible limit.
Comment 1 Andrew Pinski 2005-06-02 19:04:47 UTC
Confirmed, a regression from 4.0.0.
Comment 2 Steven Bosscher 2005-09-01 11:22:59 UTC
This also appears to be a source of compile time problems. 
Comment 3 Steven Bosscher 2005-09-01 11:37:24 UTC
I'll have a look at this. 
Comment 4 Steven Bosscher 2005-10-07 21:32:01 UTC
The patch has almost no effect except for -Os.  For SPEC binaries the 
effect of the patch is not exactly shocking on AMD64 at least: No effect
at all on compile time, no effect on performance, and almost no effect
on code size either:

param = value of --param max-jump-thread-duplication-insns=...
INT = total size of SPECint binaries at -O2 with this param
FP = total size of SPECfp binaries
value   INT     FP
1000    4353510 3349018
500     4353510 3349018
200     4353510 3349018
100     4352614 3348570
75      4352198 3347322
50      4351302 3346874
40      4351046 3346746
30      4350374 3346394
20      4350310 3346170
10      4350150 3341434
5       4346342 3340826
0       4345126 3341034

I don't have time to update the patch for recent changes and do any
further testing (e.g. CSiBE) to get this accepted, so unassigning.
Comment 5 Mark Mitchell 2005-10-31 03:43:19 UTC
Downgraded to P2.  Important, but not a showstopper.  We really should have some kind of throttle, even if it's a bit simplistic.
Comment 6 Jeffrey A. Law 2005-11-01 01:56:22 UTC
Subject: Re:  [4.1 Regression] jump threading
	causing excessive code duplication

On Mon, 2005-10-31 at 03:43 +0000, mmitchel at gcc dot gnu dot org
wrote:
> 
> ------- Comment #5 from mmitchel at gcc dot gnu dot org  2005-10-31 03:43 -------
> Downgraded to P2.  Important, but not a showstopper.  We really should have
> some kind of throttle, even if it's a bit simplistic.
We could easily put in a trivial throttle.    If there's more than N
statements + phis, then the block is considered not threadable.  Choose
N and it'll take about 5 minutes of work (and 3 hours to test :-)

I'll throw out 50 as a very very very conservative number.  If we're OK
with that number, then let's do it.

We might be better around 10, but 50 ought to catch the pathological
cases without impacting much of anything.

Jeff


Comment 7 Jeffrey A. Law 2005-11-03 01:35:44 UTC
Subject: Re:  [4.1 Regression] jump threading
	causing excessive code duplication

On Mon, 2005-10-31 at 18:56 -0700, Jeffrey A Law wrote:
> On Mon, 2005-10-31 at 03:43 +0000, mmitchel at gcc dot gnu dot org
> wrote:
> > 
> > ------- Comment #5 from mmitchel at gcc dot gnu dot org  2005-10-31 03:43 -------
> > Downgraded to P2.  Important, but not a showstopper.  We really should have
> > some kind of throttle, even if it's a bit simplistic.
> We could easily put in a trivial throttle.    If there's more than N
> statements + phis, then the block is considered not threadable.  Choose
> N and it'll take about 5 minutes of work (and 3 hours to test :-)
> 
> I'll throw out 50 as a very very very conservative number.  If we're OK
> with that number, then let's do it.
> 
> We might be better around 10, but 50 ought to catch the pathological
> cases without impacting much of anything.

FWIW with a nice blob of .i files I measured how many real phis + real
statements appeared in blocks we are able to thread through.

Ultimately we threaded 28911 incoming edges for my little test bucket.
Results:

Phis+stmts   Threaded Edges  Cumulative       Percentage
    1		5955		5955            20.6%
    2		15706		21661           74.9%
    3		3787		25448           88.0%
    4		1396		26844           92.9%
    5		795		27639           95.6%
    6		331		27970           96.7%
    7		236		28206           97.6%
    8		132		28338           98.0%
    9		114		28452           98.4%
    10		60		28512           98.6%
    11		68		28580           98.9%
    12		17		28597           98.9%
    13		91		28688           99.2%
    14		11		28699           99.3%
    15		23		28722           99.3%
    16		17		28739           99.4%
    17		16		28755           99.5%
    18		44		28799           99.6%
    19		9		28808           99.6%
    20		2		28810           99.7%
    21		2		28812           99.7%
    22		93		28905           99.98%
    27		2		28907           99.99%
    61		4		28911           100%


Note that at least some of the PHIs are either dead or will
disappear due to copy propagation& coalescing.  Some of the
statements are also going to be dead, in fact, these numbers
include the control statement at the end of the block which
we already know is going to disappear :-)  So IMHO these are
conservative numbers.

[ Even more so since only one copy is made if multiple
  incoming edges thread to the same outgoing edge.  ]


We can see that the vast majority of blocks of interest
have a small number of statements (again remembering that
one or more will ultimately disappear as they are dead).
In fact, if we were to put the limit at 15 phis+statements,
then we would still pick up over 99% of the jump threading
opportunities for this little test bucket.

Unless there is some objection, I'm planning to put in a 
limit of 15 phis+statements in the target block for jump
threading.  That should be enough for us to defer this PR
to 4.2..

jeff


Comment 8 Jeffrey A. Law 2005-11-04 20:10:10 UTC
Subject: Re:  [4.1 Regression] jump threading
	causing excessive code duplication

On Mon, 2005-10-31 at 18:56 -0700, Jeffrey A Law wrote:
> On Mon, 2005-10-31 at 03:43 +0000, mmitchel at gcc dot gnu dot org
> wrote:
> > 
> > ------- Comment #5 from mmitchel at gcc dot gnu dot org  2005-10-31 03:43 -------
> > Downgraded to P2.  Important, but not a showstopper.  We really should have
> > some kind of throttle, even if it's a bit simplistic.
> We could easily put in a trivial throttle.    If there's more than N
> statements + phis, then the block is considered not threadable.  Choose
> N and it'll take about 5 minutes of work (and 3 hours to test :-)
> 
> I'll throw out 50 as a very very very conservative number.  If we're OK
> with that number, then let's do it.
> 
> We might be better around 10, but 50 ought to catch the pathological
> cases without impacting much of anything.
Here's what I actually checked in.  This should be enough to allow
us to delay final resolution of this PR till 4.2 (probably using
Steven's prototype, which I like better).

Bootstrapped and regression tested on i686-pc-linux-gnu.


Comment 9 Jeffrey A. Law 2005-11-04 20:10:11 UTC
Created attachment 10148 [details]
PATCH
Comment 10 Jeffrey A. Law 2005-11-04 20:11:48 UTC
Band-aid applied for 4.1; Steven's prototype patch may be a better solution as it only simulates those statements which affect the conditional and doesn't count those statements (they're likely going to disappear in addition to the conditional)

Evaluation of Steven's patch should occur after 4.1 is branched.
Comment 11 Andrew Pinski 2006-05-21 20:13:25 UTC
(In reply to comment #10)
> Evaluation of Steven's patch should occur after 4.1 is branched.

Any progress on this, it has been over 3 months since that was written.
Comment 12 Andrew Pinski 2006-07-05 09:04:36 UTC
As far as I can tell this has been fixed now so closing as such.  Yes I did test to make sure the function calls are no longer being duplicated.