This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Add capability to run several iterations of early optimizations


On Fri, 28 Oct 2011, Richard Guenther wrote:

I discussed the idea of iterating early optimizations briefly with Honza.
I was trying to step back a bit and look at what we try to do right now,
which is, optimize functions in topological order (thus, try to make sure
all callees are already early optimized when optimizing callers).  That
of course is difficult when there are cycles in the cgraph (for which we
basically start at a random place), especially when there would be
may-edges (which we do not have) for indirect/virtual calls, as they
basically make the whole cgraph cyclic.  So my idea was to make
cycle processing more explicit in early optimizations, and, whenever
we discover a new direct cgraph edge, make sure we optimize the
callee, and whenever we optimize a callee, queue all its callers for
re-optimization.  You of course have to limit the number of times you
want to process a function, otherwise for a cycle, you'd optimize
indefinitely.  We already do the inlining itself repeatedly (via
--param early-inliner-max-iterations), though that only iterates
the inlining itself, allowing for "deep" inlining and some cases
of inlining of indirect calls if the inliner substituted a function
pointer parameter into a call of the inlined function.  Moving that
iteration over to iterating over the optimizations could make sense.
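The scheme sketched above is essentially a capped worklist fixed-point over the call graph. Here is a minimal illustrative sketch (all names are hypothetical; this is not GCC code or its API): optimize in topological order, re-queue callers whenever a callee changes, and cap the per-function iteration count so cgraph cycles still terminate.

```python
from collections import deque

# Illustrative cap, in the spirit of --param early-inliner-max-iterations.
MAX_ITERATIONS = 3

def early_optimize_all(functions, callers_of, optimize_one):
    """functions: iterable in (approximate) topological order, callees first.
    callers_of: maps a function to the functions that call it.
    optimize_one: runs early opts on one function; returns True if it changed
    anything (e.g. a new direct call edge was discovered)."""
    runs = {f: 0 for f in functions}
    worklist = deque(functions)
    while worklist:
        fn = worklist.popleft()
        if runs[fn] >= MAX_ITERATIONS:
            continue  # cap reached: stop chasing a fixed point in this cycle
        runs[fn] += 1
        if optimize_one(fn):
            # fn changed, so its callers may now see new opportunities:
            # queue each of them for re-optimization.
            for caller in callers_of.get(fn, ()):
                if caller not in worklist:
                    worklist.append(caller)
    return runs
```

On an acyclic graph this degenerates to one pass in topological order; on a two-function cycle where every visit reports a change, the cap is what guarantees termination.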

Inverting the patch's current approach makes sense to me, and may be preferable from a usability perspective. That is, since each iteration increases compile time, the iteration count indirectly specifies how long the user is willing to give the compiler to get the best code. That being said, I'd be curious what the "maximum" number of iterations would be on something like Firefox/WebKit when compiled with LTO. On the proprietary codebase we tested on, there were still things being discovered at 120+ iterations (~100KLOC, compiled with mega-compilation, aka the poor man's LTO).


This could actually speed things up quite a bit if it means that we would only re-run the early passes on the functions whose call chain had some optimization applied. That is, the parts of the code being re-analyzed/optimized would narrow with each iteration, potentially reducing the overhead of the multiple passes in the first place.


Thus, I'd really like to at least make iterating depend on some
profitability analysis, even if it is only based on cgraph analysis
such as 'we discovered a new direct edge'.

That makes sense to me, but then again I'm not implementing it ;) FWIW, I implemented a similar rule in a static analysis product I wrote a few years ago -- if no new information was discovered on a given bottom-up pass, move on to the next analysis.
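That stopping rule can be stated very compactly. A hypothetical sketch (names invented for illustration, not taken from GCC or the product mentioned): repeat whole-pipeline sweeps only while the previous sweep discovered new facts, such as new direct call edges, with a cap as a safety net.

```python
def iterate_until_stable(state, sweep, max_sweeps=10):
    """sweep(state) runs one bottom-up pass and returns the number of new
    facts (e.g. new direct edges) it discovered.  Stop as soon as a sweep
    finds nothing new (a fixed point), or when the cap is hit."""
    for i in range(1, max_sweeps + 1):
        if sweep(state) == 0:
            return i  # fixed point: the last sweep discovered nothing
    return max_sweeps  # gave up without converging
```

The cap plays the same role as --param early-inliner-max-iterations: it bounds compile time when the analysis never reaches a fixed point.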


Thanks again for taking the time to work through this!


-- tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt

