Quantitative analysis of -Os vs -O3

Allan Sandfeld Jensen linux@carewolf.com
Sat Aug 26 10:59:00 GMT 2017


On Saturday, 26 August 2017 10:56:16 CEST Markus Trippelsdorf wrote:
> On 2017.08.26 at 01:39 -0700, Andrew Pinski wrote:
> > First let me put into some perspective on -Os usage and some history:
> > 1) -Os is not useful for non-embedded users
> > 2) the embedded folks really need the smallest code possible and
> > usually will be willing to afford the performance hit
> > 3) -Os was a mistake for Apple to use in the first place; they used it
> > and then GCC got better for PowerPC to use the string instructions
> > which is why -Oz was added :)
> > 4) -Os is used heavily by the arm/thumb2 folks in bare metal applications.
> > 
> > Comparing -O3 to -Os is not totally fair on x86 due to the many
> > different instructions and encodings.
> > Compare it on ARM/Thumb2 or MIPS/MIPS16 (or micromips) where size is a
> > big issue.
> > I soon have a need to keep overall (bare-metal) application size down
> > to just 256k.
> > Micro-controllers are places where -Os matters the most.
> > 
> > This comment does not help my application usage.  It rather hurts it
> > and goes against what -Os is really about.  It is not about reducing
> > icache pressure but overall application code size.  I really need the
> > code to fit into a specific size.
> 
> For many applications using -flto does reduce code size more than just
> going from -O2 to -Os.

I added the option to optimize with -Os in Qt, and it gives an average 15% 
reduction in binary size, sometimes as high as 25%. Using LTO gives almost the 
same reduction (slightly less), but the two options combine well, and using 
both can reduce binary size by 20 to 40%. And that is on a shared library, not 
even a statically linked binary.

The only real downside is that some of the libraries, especially QtGui, would 
benefit from auto-vectorization, so it would be nice if there were an -O3s 
level that vectorized the most obviously vectorizable functions; a few 
hundred bytes for an additional version here and there would be worthwhile. 
Fortunately it doesn't do too much damage, as we have manually vectorized 
routines in order to get good performance on MSVC as well; if we relied more 
on auto-vectorization it would be worse.
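
As a rough illustration of the per-function idea (not something Qt does, and 
not a substitute for a real -O3s level): with GCC, a file built with -Os can 
opt a single hot loop into -O3 through the optimize function attribute. The 
function name blend_add_u8 and the HOT_O3 macro below are made up for this 
sketch. The GCC manual describes this attribute as intended for debugging 
rather than production use, and whether the loop actually gets vectorized 
depends on the target and compiler version, so treat this as an experiment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: request -O3 for one function in a file that is
 * otherwise compiled with -Os. Only GCC honors this attribute, so it
 * expands to nothing on other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_O3 __attribute__((optimize("O3")))
#else
#define HOT_O3
#endif

/* Saturating per-byte add: dst[i] = min(dst[i] + src[i], 255).
 * A simple loop like this is a typical auto-vectorization candidate. */
HOT_O3
void blend_add_u8(uint8_t *restrict dst, const uint8_t *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Comparing the object size and generated code with and without the attribute 
(for example with size and objdump -d) shows what the extra version costs.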

Allan


