This is the mail archive of the mailing list for the GCC project.

Re: String packing

> On the Pentium 4 I often get the fastest code with '-Os'; I have some examples
> where '-O3' generates code with 40% of the performance of the '-Os' code.

The Pentium 4 will be something of a challenge to tune for. It is extremely
sensitive to code size (measured not in bytes, but in the number of
micro-instructions generated, and as far as I know these numbers are not
documented), and yet it requires many optimizations that cause extreme code
size growth. For example, it is worthwhile to replace a shift by up to 4 (or 8?)
additions. For multiplies, the break-even point is roughly 30 additions, or
30/4 shifts.
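To make the trade-off concrete, here is a small C sketch of the kind of rewrite being discussed: a left shift by a small constant expressed as repeated self-additions (x + x doubles x), and a multiply by a constant decomposed into shifts and additions. It is only an illustration of the equivalence; it is not GCC's code, the helper names are hypothetical, the constant 10 is an arbitrary example, and whether the addition form is actually faster depends on the target microarchitecture and must be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left by a small constant k with a single shift instruction. */
uint32_t shl_shift(uint32_t x, unsigned k) { return x << k; }

/* The same result via k self-additions: each x += x doubles x, i.e. shifts
   left by one. Unsigned arithmetic wraps modulo 2^32, so this matches
   x << k for every k < 32. */
uint32_t shl_add(uint32_t x, unsigned k) {
    for (unsigned i = 0; i < k; i++)
        x += x;
    return x;
}

/* Multiply by the (arbitrary example) constant 10 using only shifts and
   additions: 10*x = 8*x + 2*x. */
uint32_t mul10_shift_add(uint32_t x) {
    uint32_t x2 = x + x;    /* 2x, as an addition instead of a shift */
    uint32_t x8 = x2 << 2;  /* 8x */
    return x8 + x2;         /* 8x + 2x = 10x (mod 2^32) */
}

/* Hypothetical self-check: both forms must agree for small shift counts. */
void check_equivalence(uint32_t x) {
    for (unsigned k = 0; k <= 8; k++)
        assert(shl_shift(x, k) == shl_add(x, k));
    assert(mul10_shift_add(x) == x * 10u);
}
```

A compiler doing this transformation would emit the addition sequence directly rather than a loop; the loop here just keeps the equivalence easy to check for any small k.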

To tune for such a chip, it is critical to identify the hot spots of functions
and drive the decisions using them. Partly this can be done using my framework
for profile-based optimizations, but it would still be nice to have
scheduler-like code identifying the hot paths through the basic blocks and
doing the transformations just there.

I've implemented the -march=pentium4 switch, but it is not really tuned yet;
I just needed to avoid the biggest code generation mistakes GCC makes in order
to do some benchmarking. I hope this will change in the future.

Small testcases where -Os wins are still very interesting, since -Os should
be a win only for moderately sized or large programs.

Note that most of the benefit may come from the fact that -Os disables
instruction alignment. Try -O2 with -falign-functions=1, -falign-loops=1 and
-falign-jumps=1; maybe that will solve the problem you are seeing.

