This is the mail archive of the mailing list for the GCC project.
-O2 versus -O1 (Was: Re: GCSE store motion)
- From: Jan Hubicka <jh at suse dot cz>
- To: Robert Dewar <dewar at gnat dot com>
- Cc: dberlin at dberlin dot org, mark at codesourcery dot com, roger at eyesopen dot com, aj at suse dot de, davem at redhat dot com, gcc at gcc dot gnu dot org, rth at redhat dot com
- Date: Thu, 16 May 2002 16:07:07 +0200
- Subject: -O2 versus -O1 (Was: Re: GCSE store motion)
- References: <20020516114838.949B6F28C9@nile.gnat.com>
> > That means we shouldn't be spending much time trying to do software
> > loop pipelining when compiling GCC, so the optimization shouldn't
> > make compiling the compiler significantly slower.
> I don't see how you conclude this. You have to do the analysis on every
> loop. There will definitely be loops in GCC where the optimization is
> possible, there will be loops where it is not. I would expect the
> compiler to spend quite a bit of time trying to improve code for
> loops in GCC. What I am saying is that I doubt that the overall
> effect will be that beneficial for GCC.
I don't think the rule should be taken literally for each optimization.
Software pipelining, profile feedback, loop unrolling, function inlining,
prefetch code generation and scheduling on i386 are all optimizations that
would lose in such a test, yet they are still worthwhile to have: for numeric
code, for instance, they are a must.
I think we have -O1 for the "I want sane code but don't have time to wait"
case and -O2 for the "I can wait to gain an extra few %" case.
On the other hand, what I think is worthwhile is to reconsider which
optimizations should be enabled at -O1. Currently we enable:
flag_defer_pop = 1;
flag_thread_jumps = 1;
flag_delayed_branch = 1;
flag_omit_frame_pointer = 1;
flag_guess_branch_prob = 1;
flag_cprop_registers = 1;
flag_loop_optimize = 1;
flag_crossjumping = 1;
flag_if_conversion = 1;
flag_if_conversion2 = 1;
I believe crossjumping, jump threading and perhaps the second if-conversion
pass (flag_if_conversion2) are examples of optimizations that are expensive
yet bring relatively little benefit.
Do you think it makes sense to run some tests and consider disabling them?
Would "bootstrap -O1" be considered a valuable rule of thumb?
Conversely, at -O2 we do some things that are not that expensive and could
move into the -O1 category. My guesses would be:
flag_optimize_sibling_calls = 1;
flag_rename_registers = 1;
flag_caller_saves = 1;
flag_force_mem = 1;
flag_regmove = 1;
flag_strict_aliasing = 1;
flag_reorder_blocks = 1;
flag_reorder_functions = 1;
What do you think? If we reach some kind of agreement, I can run a series of
tests for these optimizations...
Another thing I believe could be worthwhile is a switch that enables the
aggressive bits, like loop unrolling or prefetching, which people can use for
benchmarks or very CPU-bound code. A common problem with GCC reviews appears
to be that they use suboptimal switches, and I guess that is partly our
mistake: it is very difficult to set them up.