[PATCH] Add capability to run several iterations of early optimizations

Richard Guenther richard.guenther@gmail.com
Tue Nov 8 11:18:00 GMT 2011


On Tue, Nov 8, 2011 at 7:34 AM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> On 2/11/2011, at 10:18 AM, Richard Guenther wrote:
>>
>>>>>  Thus, I don't think we want to
>>>>> merge this in its current form or in this stage1.
>>>>
>>>> What is the benefit of pushing this to a later release?  If anything,
>>>> merging the support for iterative optimizations now will allow us to
>>>> consider adding the wonderful smartness to it later.  In the meantime,
>>>> substituting that smartness with a knob is still a great alternative.
>>
>> The benefit?  The benefit is to not have another magic knob in there
>> that doesn't make too much sense and papers over real conceptual/algorithmic
>> issues.  Brute-force iterating is a hack, not a solution. (sorry)
>>
>>> I agree (of course). Having the knob will be very useful for testing and
>>> determining the acceptance criteria for the later "smartness". While
>>> terminating early would be a nice optimization, the feature is still
>>> intrinsically useful and deployable without it. In addition, when using LTO
>>> on nearly all the projects/modules I tested on, 3+ passes were always
>>> productive.
>>
>> If that is true then I'd really like to see testcases.  Because I am sure you
>> are just papering over (maybe even easy-to-fix) issues by the brute-force
>> iterating approach.  We also do not have a switch to run every pass twice
>> in succession, just because that would be as stupid as this.
>>
>>> To be fair, when not using LTO, beyond 2-3 passes did not often
>>> produce improvements unless individual compilation units were enormous.
>>>
>>> There was also the question of if some of the improvements seen with
>>> multiple passes were indicative of deficiencies in early inlining, CFG, SRA,
>>> etc. If the knob is available, I'm happy to continue testing on the same
>>> projects I've filed recent LTO/graphite bugs against (glib, zlib, openssl,
>>> scummvm, binutils, etc) and write a report on what I observe as "suspicious"
>>> improvements that perhaps should be caught/made in a single pass.
>>>
>>> It's worth noting again that while this is a useful feature in and of itself
>>> (especially when combined with LTO), it's *extremely* useful when coupled
>>> with the de-virtualization improvements submitted in other threads. The
>>> examples submitted for inclusion in the test suite aren't academic -- they
>>> are reductions of real-world performance issues from a mature (and shipping)
>>> C++-based networking product. Any C++ codebase that employs physical
>>> separation in their designs via Factory patterns, Interface Segregation,
>>> and/or Dependency Inversion will likely see improvements. To me, these
>>> enhancements combine to form one of the biggest leaps I've seen in C++ code
>>> optimization -- code that can be clean, OO, *and* fast.
>>
>> But iterating the whole early optimization pipeline is not a sensible approach
>> of attacking these.
>
> Richard,
>
> You are making valid points.  Ideally, we would have a compiler with an optimal pass pipeline that would not generate any better or worse code with second and subsequent iterations.  I disagree, however, that the patch discussed in this thread is a hack.  It is a feature.  It allows developers and power users to experiment with the pass pipeline and see what comes out of it.
>
> The benchmarks for the 2nd version of the patch have finally finished [*].  They show that the additional iterations do make a difference of 2% for a number of benchmarks.  There are also trends like "additional iterations improve -m64 and hurt -m32" -- 254.gap, 256.bzip2, 300.twolf, 171.swim, 173.applu -- and "additional iterations improve -O2 and hurt -O3" -- 179.art, 400.perlbench.  There are also enough benchmarks that show straight improvement or straight regression.  How are we going to investigate and fix these missed optimization opportunities and regressions without the feature that we are discussing here?
>
> We already provide certain features to let users tinker with the optimization pipeline:
> 1. -O0/-O1/-O2/-O3 options for entry-level users,
> 2. -f<pass>/-fno-<pass> for advanced users,
> 3. --param <obscure>=MAGIC for developers and compiler-savvy users.
>
> Iterative optimizations should be no different.  "Yes", without this feature the world would be a simpler place for us to live in, as we would not have to know and care about regressions and missed opportunities in GCC's optimizations.  But, really, that would've been a boring world :-).
>
> I feel that dropping this feature will cause the problems to stay unnoticed and never get fixed.

How?  You are not even _analyzing_ why you are getting benefits, or
filing bugs.  Did you experiment with reducing the set of passes you
iterate?  That is, which passes really do help when iterating?  I've
asked you that before - and showed proper analysis that could be
done to detect the indirect inlining benefit that you might see (surely
not on SPEC 2000, though) - and said that yes, it _would_ make sense
to do something about this specific case, and yes, iterating for a selected
set of functions is a valid solution for the indirect inlining case.
It is never a valid solution, IMHO, when no inlining happened between
the iterations.

You are basically saying: well, let's iterate the optimizers a few times
and hope they will catch something.  Sure they will.  Show the important
pieces and analyze why they need the iteration.

You should probably realize that with each of these "power-user" features
the GCC testing matrix becomes larger.  If I were selling commercial support
for GCC, I'd make all but -O* unsupported.

Thanks,
Richard.
