This is the mail archive of the gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Quantitative analysis of -Os vs -O3
- From: Markus Trippelsdorf <markus at trippelsdorf dot de>
- To: Allan Sandfeld Jensen <linux at carewolf dot com>
- Cc: gcc at gcc dot gnu dot org, Andrew Pinski <pinskia at gmail dot com>, Michael Clark <michaeljclark at mac dot com>, egall at gwmail dot gwu dot edu
- Date: Sat, 26 Aug 2017 12:59:06 +0200
- Subject: Re: Quantitative analysis of -Os vs -O3
- Authentication-results: sourceware.org; auth=none
- References: <A38B64A9-A409-4245-9F9F-083E096A14A0@mac.com> <CA+=Sn1ndSytA3CYecimoB7h_3Az-AaQNWa5PSEsMeqiMVPBfUQ@mail.gmail.com> <20170826085616.GA31687@x4> <3154967.4SU9o1nrP3@twilight>
On 2017.08.26 at 12:40 +0200, Allan Sandfeld Jensen wrote:
> On Samstag, 26. August 2017 10:56:16 CEST Markus Trippelsdorf wrote:
> > On 2017.08.26 at 01:39 -0700, Andrew Pinski wrote:
> > > First let me put -Os usage into some perspective, with some history:
> > > 1) -Os is not useful for non-embedded users
> > > 2) the embedded folks really need the smallest code possible and
> > > usually will be willing to afford the performance hit
> > > 3) -Os was a mistake for Apple to use in the first place; they used it,
> > > and then GCC got better at using the string instructions on PowerPC,
> > > which is why -Oz was added :)
> > > 4) -Os is used heavily by the arm/thumb2 folks in bare metal applications.
> > >
> > > Comparing -O3 to -Os is not totally fair on x86 due to the many
> > > different instructions and encodings.
> > > Compare it on ARM/Thumb2 or MIPS/MIPS16 (or micromips) where size is a
> > > big issue.
> > > I will soon need to keep overall (bare-metal) application size down
> > > to just 256k.
> > > Micro-controllers are places where -Os matters the most.
> > >
> > > This comment does not help my application usage. It rather hurts it
> > > and goes against what -Os is really about. It is not about reducing
> > > icache pressure but overall application code size. I really need the
> > > code to fit into a specific size.
> > For many applications using -flto does reduce code size more than just
> > going from -O2 to -Os.
> I added the option to optimize with -Os in Qt, and it gives an average 15%
> reduction in binary size, sometimes as high as 25%. Using LTO gives almost the
> same (slightly less), but the two options combine perfectly, and using both can
> reduce binary size by 20 to 40%. And that is on a shared library, not even a
> statically linked binary.
> The only real minus is that some of the libraries, especially QtGui, would
> benefit from auto-vectorization, so it would be nice if there existed an -O3s
> version which vectorized the most obvious vectorizable functions; a few
> hundred bytes for an additional version here and there would do good.
> Fortunately it doesn't do too much damage, as we have manually vectorized
> routines to get good performance on MSVC as well; if we relied more on
> auto-vectorization it would be worse.
In that case using profile-guided optimization will help. It will
optimize cold functions with -Os and hot functions with -O3 (when using,
e.g., "-flto -O3 -fprofile-use"). Of course you will have to compile
twice and also collect training data from your library in between.
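
The "compile twice, train in between" workflow can be sketched as a small
build driver. This is only an illustration under stated assumptions: the
source file, output name, and training command are hypothetical placeholders,
and pairing -fprofile-generate with the quoted "-flto -O3 -fprofile-use" is
the usual GCC convention rather than something spelled out in the message.

```python
import shlex
import subprocess

# Flags quoted in the message above; everything else here is a placeholder.
BASE_FLAGS = ["-O3", "-flto"]


def pgo_steps(sources, output, training_cmd, cc="gcc"):
    """Return the three command lines of a two-pass PGO build, in order."""
    # Pass 1: instrumented build that records execution counts at run time.
    instrumented = [cc, *BASE_FLAGS, "-fprofile-generate", "-o", output, *sources]
    # Pass 2: rebuild that reads the recorded profile data.
    optimized = [cc, *BASE_FLAGS, "-fprofile-use", "-o", output, *sources]
    # The training run sits between the two compiles.
    return [instrumented, list(training_cmd), optimized]


def run_pgo_build(steps):
    """Run each step in order, stopping at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical benchmark program used as the training workload.
    steps = pgo_steps(["bench.c"], "bench", ["./bench"])
    for cmd in steps:
        print(shlex.join(cmd))
```

As written, the script only prints the three commands; calling
run_pgo_build(steps) would execute them. For a shared library, the training
command would have to be a program that exercises the library in a
representative way, since the hot/cold split is only as good as the profile.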