This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



-O2 versus -O1 (Was: Re: GCSE store motion)


> > That means we shouldn't be spending much time trying to do software
> > loop pipelining when compiling GCC, so the optimization shouldn't
> > make compiling the compiler significantly slower.
> 
> I don't see how you conclude this. You have to do the analysis on every
> loop. There will definitely be loops in GCC where the optimization is
> possible, there will be loops where it is not. I would expect the
> compiler to spend quite a bit of time trying to improve code for
> loops in GCC. What I am saying is that I doubt that the overall
> effect will be that beneficial for GCC.

I don't think the rule should be taken literally for each optimization.
Software pipelining, profile feedback, loop unrolling, function inlining,
prefetch code generation and scheduling on i386 are all optimizations that
would lose in such a test, yet they are still worth having; for numeric
code, for instance, they are a must.

I think we have -O1 for those who say "I want sane code but don't have time
to wait" and -O2 for those who say "I can wait to save an extra few %".

On the other hand, what I think is worthwhile is to reconsider which optimizations
should be enabled at -O1. Currently we do:

      flag_defer_pop = 1;
      flag_thread_jumps = 1;
#ifdef DELAY_SLOTS
      flag_delayed_branch = 1;
#endif
#ifdef CAN_DEBUG_WITHOUT_FP
      flag_omit_frame_pointer = 1;
#endif
      flag_guess_branch_prob = 1;
      flag_cprop_registers = 1;
      flag_loop_optimize = 1;
      flag_crossjumping = 1;
      flag_if_conversion = 1;
      flag_if_conversion2 = 1;
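
For readers who have not looked at where these assignments live: they sit in a block guarded by the optimization level. The sketch below is a simplified, self-contained model of that structure, not the actual GCC source; the function name set_optimization_flags is hypothetical, the target-specific flags (flag_delayed_branch, flag_omit_frame_pointer) are left out, and only three of the -O2 flags discussed further down are shown.

```c
/* Simplified model of level-based flag setup.  set_optimization_flags is a
   hypothetical name, not the real GCC entry point.  Globals start at 0. */

/* -O1 flags from the list above (target-specific ones omitted). */
int flag_defer_pop, flag_thread_jumps, flag_guess_branch_prob;
int flag_cprop_registers, flag_loop_optimize, flag_crossjumping;
int flag_if_conversion, flag_if_conversion2;

/* A few of the flags that are currently -O2 only. */
int flag_regmove, flag_strict_aliasing, flag_reorder_blocks;

void set_optimization_flags (int optimize)
{
  if (optimize >= 1)
    {
      flag_defer_pop = 1;
      flag_thread_jumps = 1;
      flag_guess_branch_prob = 1;
      flag_cprop_registers = 1;
      flag_loop_optimize = 1;
      flag_crossjumping = 1;
      flag_if_conversion = 1;
      flag_if_conversion2 = 1;
    }
  if (optimize >= 2)
    {
      /* Passes here cost compile time only at -O2 and above. */
      flag_regmove = 1;
      flag_strict_aliasing = 1;
      flag_reorder_blocks = 1;
    }
}
```

In this model, moving a pass between levels, as proposed below, amounts to moving its assignment from one block to the other.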

I believe crossjumping, jump threading and perhaps the second if-conversion
pass are examples of optimizations that are expensive but bring little benefit.
Do you think it makes sense to run some tests and consider disabling them?
Would the "bootstrap -O1" test be considered a valuable rule of thumb?
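
One way to read the "bootstrap -O1" rule: a pass earns its place at -O1 if a compiler built with it compiles its own sources no slower than one built without it. The sketch below encodes that under a deliberately crude assumption: the pass adds a fraction `overhead` to compile time and makes the resulting compiler a fraction `speedup` faster, so self-compile time scales by (1 + overhead)(1 - speedup). The function names and the model are illustrative only, not measurements or GCC code.

```c
/* Crude model of the "bootstrap -O1" rule of thumb (an assumption, not GCC code).
   overhead: extra compile time the pass costs, as a fraction (0.05 = 5%).
   speedup:  how much faster the compiled compiler runs, as a fraction. */
double bootstrap_time_factor (double overhead, double speedup)
{
  return (1.0 + overhead) * (1.0 - speedup);
}

/* Returns 1 if enabling the pass leaves the self-compile no slower, else 0. */
int pass_pays_for_itself (double overhead, double speedup)
{
  return bootstrap_time_factor (overhead, speedup) <= 1.0;
}
```

For example, a pass costing 5% compile time that speeds the compiler up by 3% gives a factor of about 1.0185 and fails, while 2% cost for a 4% speedup gives about 0.979 and passes. As noted above, numeric-code optimizations such as software pipelining would fail this test without being useless, which is why it only makes sense as a criterion for -O1.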

Conversely, at -O2 we do some bits that are not that expensive
and might move into the -O1 category.  My guesses are:

      flag_optimize_sibling_calls = 1;
      flag_rename_registers = 1;
      flag_caller_saves = 1;
      flag_force_mem = 1;
      flag_regmove = 1;
      flag_strict_aliasing = 1;
      flag_reorder_blocks = 1;
      flag_reorder_functions = 1;

What do you think?  If we reach some kind of agreement, I can run a series
of tests for these optimizations...
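
Such a series could be driven by something as simple as the sketch below, which builds one command line per candidate: -O1 plus a single -O2 flag. It assumes the usual command-line spellings of the flag_* variables listed above (some, such as -fforce-mem, no longer exist in current GCC); the helper names and the source file name passed in are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Command-line spellings of the -O2 flags listed above. */
static const char *const candidates[] = {
  "-foptimize-sibling-calls", "-frename-registers", "-fcaller-saves",
  "-fforce-mem", "-fregmove", "-fstrict-aliasing",
  "-freorder-blocks", "-freorder-functions"
};
#define N_CANDIDATES (sizeof candidates / sizeof candidates[0])

/* Returns 1 if OPT is one of the candidate flags, else 0. */
int is_candidate (const char *opt)
{
  size_t i;
  for (i = 0; i < N_CANDIDATES; i++)
    if (strcmp (candidates[i], opt) == 0)
      return 1;
  return 0;
}

/* Writes "gcc -O1 <opt> -c <src>" into BUF and returns snprintf's result
   (negative on error, >= size if the output was truncated). */
int build_test_command (char *buf, size_t size, const char *opt,
                        const char *src)
{
  return snprintf (buf, size, "gcc -O1 %s -c %s", opt, src);
}

/* Prints one test command per candidate flag for SRC. */
void print_test_series (const char *src)
{
  char buf[256];
  size_t i;
  for (i = 0; i < N_CANDIDATES; i++)
    {
      int r = build_test_command (buf, sizeof buf, candidates[i], src);
      if (r >= 0 && r < (int) sizeof buf)
        puts (buf);
    }
}
```

Timing each command against plain -O1 isolates the compile-time cost of one flag at a time; the runtime side would need the same treatment on the generated code.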

Another thing I believe could be worthwhile is a switch that enables
the aggressive bits, like loop unrolling or prefetching, that people can use
for benchmarks or very CPU-bound code.  A common problem with GCC reviews
appears to be that they use suboptimal switches, and I guess that is partly
our fault: it is very difficult to set them up.
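
A minimal sketch of how a driver might implement such a switch, assuming it is a pure alias that expands into existing options before normal parsing: the name -faggressive is hypothetical (GCC has no such option), while -funroll-loops and -fprefetch-loop-arrays are real options. Which options the switch should actually imply is exactly what would need discussion.

```c
#include <string.h>

/* HYPOTHETICAL umbrella switch; GCC has no option by this name. */
#define AGGRESSIVE_SWITCH "-faggressive"

/* Real GCC options the umbrella switch stands for in this sketch. */
static const char *const aggressive_expansion[] = {
  "-O2", "-funroll-loops", "-fprefetch-loop-arrays"
};

/* Copies argv-style ARGS into OUT (capacity MAX), replacing each
   AGGRESSIVE_SWITCH with its expansion.  Returns the number of entries
   written, or -1 if OUT is too small. */
int expand_aggressive (const char *const *args, int nargs,
                       const char **out, int max)
{
  int i, j, n = 0;
  int nexp = (int) (sizeof aggressive_expansion
                    / sizeof aggressive_expansion[0]);
  for (i = 0; i < nargs; i++)
    {
      if (strcmp (args[i], AGGRESSIVE_SWITCH) == 0)
        {
          if (n + nexp > max)
            return -1;
          for (j = 0; j < nexp; j++)
            out[n++] = aggressive_expansion[j];
        }
      else
        {
          if (n + 1 > max)
            return -1;
          out[n++] = args[i];
        }
    }
  return n;
}
```

Keeping the switch a plain alias means users still see, and can override, the individual options it turns on.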

Honza

