This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: compiling very large functions.


> On 11/4/06, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> >Richard Guenther wrote:
> >> On 11/4/06, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> >>> Richard Guenther wrote:
> >>> > On 11/4/06, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> >>> >> I think that it is time that we in the GCC community took some
> >>> time to
> >>> >> address the problem of compiling very large functions in a somewhat
> >>> >> systematic manner.
> >>> >>
> >>> >> GCC has two competing interests here:  it needs to be able to provide
> >>> >> state of the art optimization for modest sized functions and it
> >>> needs to
> >>> >> be able to properly process very large machine generated functions
> >>> using
> >>> >> reasonable resources.
> >>> >>
> >>> >> I believe that the default behavior for the compiler should be that
> >>> >> certain non-essential passes be skipped if a very large function is
> >>> >> encountered.
> >>> >>
> >>> >> There are two problems here:
> >>> >>
> >>> >> 1) defining the set of optimizations that need to be skipped.
> >>> >> 2) defining the set of functions that trigger the special processing.
> >>> >>
> >>> >>
> >>> >> For (1) I would propose that three measures be made of each function.
> >>> >> These measures should be made before inlining occurs. The three
> >>> measures
> >>> >> are the number of variables, the number of statements, and the
> >>> number of
> >>> >> basic blocks.
> >>> >
> >>> > Why before inlining?  These three numbers can change quite
> >>> significantly
> >>> > as a function passes through the pass pipeline.  So we should try
> >>> to keep
> >>> > them up-to-date to have an accurate measurement.
> >>> >
> >>> I am flexible here.  We may want inlining to be able to update the
> >>> numbers.  However, I think that we should drive the inlining aggression
> >>> based on these numbers.
> >>
> >> Well, for example jump threading and tail duplication can cause these
> >> numbers to change significantly.  Also CFG instrumentation and PRE
> >> can increase the BB count.  So we need to deal with cases where an
> >> optimization produces an overly large number of basic blocks or
> >> instructions (like by throttling those passes properly).
> >>
> >I lean toward leaving the numbers static even if they do increase as
> >time goes by.  Otherwise you get two effects: the first optimizations
> >get to be run more, and you get weird non-linear step functions where
> >small changes in some upstream function affect the downstream ones.
> 
> Ok, I guess we can easily flag each function as having
> - many BBs
> - big BBs
> - complex CFG (many edges)
> and set these flags at CFG construction time during the lowering phase
> (which is after the early inlining pass I believe).
> 
> The number of basic blocks is kept up-to-date during optimization, the
> other numbers would need to be re-generated if we want to keep them
> up-to-date.
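(As a concrete illustration, the proposal above amounts to little more
than a few bits kept in struct function and filled in once when the CFG
is built.  The field names and thresholds below are made up for the
example, not actual GCC code:)

  /* Hypothetical per-function size flags, computed once at CFG
     construction time and then left alone.  */
  struct function_size_flags
  {
    unsigned int many_bbs : 1;     /* lots of basic blocks */
    unsigned int big_bbs : 1;      /* some very long basic blocks */
    unsigned int complex_cfg : 1;  /* many edges relative to blocks */
  };

  #define MANY_BBS_THRESHOLD     5000
  #define BIG_BB_STMT_THRESHOLD  1000
  #define COMPLEX_CFG_EDGE_RATIO 4

  static void
  compute_function_size_flags (struct function_size_flags *flags,
                               int n_blocks, int n_edges, int max_bb_stmts)
  {
    flags->many_bbs = n_blocks > MANY_BBS_THRESHOLD;
    flags->big_bbs = max_bb_stmts > BIG_BB_STMT_THRESHOLD;
    flags->complex_cfg = n_edges > COMPLEX_CFG_EDGE_RATIO * n_blocks;
  }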

We definitely need some heuristics to throttle down expensive
optimizations for very huge functions.  I am not sure I like the static
flag rather than a per-case approach.  The idea of big versus small is
very different for individual optimizations and the properties change
significantly (i.e. expansion of min/max functions will turn a big BB into
a big CFG), so it will be a lot of fun to set up proper thresholds here.
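(To make the per-case alternative concrete: each expensive pass would
compare the function against its own notion of "too big" instead of
honouring one global flag.  The names and numbers below are purely
illustrative, not existing GCC interfaces:)

  /* Illustrative per-pass gating: a huge BB, a huge CFG and a complex
     CFG hurt different passes differently, so each pass gets its own
     limits.  Not actual GCC code.  */
  struct pass_size_limits
  {
    int max_basic_blocks;   /* 0 means "no limit" */
    int max_stmts;          /* 0 means "no limit" */
  };

  static int
  pass_too_expensive_p (const struct pass_size_limits *limits,
                        int n_blocks, int n_stmts)
  {
    return (limits->max_basic_blocks && n_blocks > limits->max_basic_blocks)
           || (limits->max_stmts && n_stmts > limits->max_stmts);
  }

  /* E.g. a PRE-like pass might tolerate many statements but not a huge
     CFG, while a pass that duplicates code might care more about the
     statement count.  */
  static const struct pass_size_limits pre_limits         = { 20000, 0 };
  static const struct pass_size_limits duplication_limits = { 0, 100000 };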

I also believe that we need to take care not to have O(n^k) behavior
where k is significantly higher than necessary; most of our current
scalability issues fall into this category, and we have large complexity
just out of a little laziness.  We are implementing many of the fancier
algorithms in order to be scalable, so we should try not to undermine
that with silly mistakes.

To take the specific example (the one that triggered this discussion) of
the extreme testcase hitting the new fwprop pass, I think it is quite
different.

Analyzing the testcase, there were a number of problems that were trivial
to solve and are fixed in mainline now.  The remaining problems are
usually easy too - quadratic removal from singly linked lists in
out-of-SSA and the scheduler.  I think those ought to be cured, since
they don't hurt only in such monstrosities.
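(For the record, the quadratic pattern is simply deleting elements from
a singly linked list by re-walking it from the head for every deletion;
once the caller knows the predecessor - or the list is doubly linked -
each removal is O(1).  A generic sketch, not the actual out-of-SSA or
scheduler code:)

  struct node { struct node *next; int data; };

  /* Deleting K nodes this way re-walks the list each time, so the
     total cost is O(K * N) - quadratic when most nodes go away.  */
  static void
  remove_slow (struct node **head, struct node *victim)
  {
    struct node **slot = head;
    while (*slot != victim)      /* linear search for the predecessor */
      slot = &(*slot)->next;
    *slot = victim->next;
  }

  /* If the predecessor is already known (or the list is doubly
     linked), each removal is constant time and the total stays
     linear.  */
  static void
  remove_fast (struct node *prev, struct node *victim)
  {
    prev->next = victim->next;
  }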

On the other hand, GVN-PRE and df.c do have algorithmic problems that we
might or might not want to solve.  I care less whether we add code to
disable the optimization or come up with a better algorithm, and I am
very happy that Daniel is considering the second alternative for PRE.

The df.c issue can probably also be solved by either using the FUD graph
built by our SSA algorithm over RTL, or by turning the FUD graph into
DU/UD chains, avoiding the need for the bitmaps.  This might be an
interesting task if such big functions turn out to be a common bottleneck.
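(Schematically, DU/UD chains just attach to each definition the list of
its uses and to each use its reaching definition - which in FUD/SSA form
is unique, since phi nodes factor the merges - so the per-insn bitmaps
are not needed for the queries at all.  A sketch of the shape of the
data structure, not df.c's actual types:)

  struct use_ref;

  struct def_ref
  {
    void *insn;                /* the defining instruction */
    struct use_ref *uses;      /* DU chain: uses reached by this def */
  };

  struct use_ref
  {
    void *insn;                /* the using instruction */
    struct def_ref *def;       /* UD chain: the reaching definition */
    struct use_ref *next_use;  /* next use on the same DU chain */
  };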
> 
> But with just using three (or even only one?) flag, we can easily fit this
> information in struct function.
> 
> I also like the idea of a "hot" function flag to be able to dynamically
> switch between optimize_size and !optimize_size in the tree optimizers.
> Profile based inlining already tries to follow that route.

See cfun->function_frequency here.  I am not quite sure it is going
to help us with compilation time issues at this moment, since it is set
to a meaningful value only with profiling, and with profiling the user is
probably willing to spend some more time on compilation.  With LTO we
ought to be able to compute the frequencies ourselves, but in
source-unit-at-a-time mode we don't have enough context to guess.  It
would be possible/nice to add a function attribute for that.

In general we should make most of the optimize_size/!optimize_size
decisions at BB granularity using the maybe_hot_bb_p predicate.  There is
a patch in the queue to teach instruction expansion about that.
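(A sketch of what BB granularity means in practice: wherever a
size/speed trade-off is made, consult the block instead of the global
flag.  maybe_hot_bb_p is the existing predicate in predict.c; the other
declarations below only stand in for GCC's internal types:)

  #include <stdbool.h>

  typedef struct basic_block_def *basic_block;
  extern int optimize_size;                  /* existing global flag */
  extern bool maybe_hot_bb_p (basic_block);  /* existing predicate */

  /* Hypothetical per-BB decision: optimize a block for size when the
     whole unit is compiled for size or when the block is unlikely to
     be executed often.  */
  static bool
  bb_optimize_for_size_p (basic_block bb)
  {
    return optimize_size || !maybe_hot_bb_p (bb);
  }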

Honza
> 
> Richard.

