This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Proposal for a 'gcc -O4': interprocedural optimization
- From: Chris Lattner <sabre at nondot dot org>
- To: Dave Hudson <dave at cyclicode dot net>
- Cc: <gcc at gcc dot gnu dot org>
- Date: Sat, 24 Aug 2002 15:07:36 -0500 (CDT)
- Subject: Re: Proposal for a 'gcc -O4': interprocedural optimization
On Sat, 24 Aug 2002, Dave Hudson wrote:
[ Note, I'm CC'ing the list, because the discussion may be interesting for others as well. ]
> One group of users whom I'm sure would love to see interprocedural
> optimization are those of embedded processors. The AVR, 68HC11 and
> IP2022 ports all run on platforms where just expanding code memory
> isn't an option - similar things happen with some SoCs too.
Sure, there are a lot of interprocedural optimizations focused solely on
removing dead code, interprocedural GCSE, etc... which are powerful ways
to shrink the size of the final program. Even inlining functions only
called from one place (but across translation unit boundaries) can be a
simple way to reduce the size of programs.
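As a minimal sketch of that last point (the function names here are invented for illustration, not taken from any real codebase):

```c
#include <assert.h>

/* Imagine helper() is defined in util.c and compute() in main.c.
   Compiling each file separately, the compiler must emit helper()
   as a full out-of-line function.  With the whole program visible,
   the optimizer can see helper() has exactly one caller, inline it
   there, and delete the now-dead body entirely. */
int helper(int x)      /* sole definition, normally in util.c */
{
    return x * x + 1;
}

int compute(int x)     /* the only call site in the entire program */
{
    return helper(x) + 2;
}

/* After cross-unit inlining plus dead-function elimination this is
   effectively:  int compute(int x) { return x * x + 3; }  */
```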
> My current estimate is that even trivial interprocedural optimizations
> could be worth well over 1% to us on code size and around 2% on speed.
> In addition, being able to better optimize the use of GPRs and
> consequently avoiding using stack slots where not necessary could save
> about 1% of our total data usage (and around 4% of stack space usage).
Certainly that. With a reasonable alias analysis infrastructure you can
expect a lot more loads to be folded and stores to be eliminated: that
alone can save a lot more than your 1%. The nice thing about having the
whole program available for analysis is that you can write more powerful
analyses, and reuse the existing transformations without change (assuming
well-defined interfaces exist, of course). There are also a variety
of techniques that can give you more than 2% speed; for example,
redundant load/store elimination can relieve a lot of memory traffic.
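A small sketch of why alias analysis is the enabler here (the function is made up for illustration):

```c
#include <assert.h>

/* Without alias information, the second read of *a below must be a
   real load: the store through b might have changed it.  If whole-
   program alias analysis proves a and b never point to the same
   object at any call site, the compiler can reuse the first load
   and drop the second one. */
int sum_twice(int *a, int *b)
{
    int s = *a;     /* first load of *a */
    *b = s + 1;     /* store through b */
    s += *a;        /* redundant load -- iff a and b do not alias */
    return s;
}

/* With non-aliasing proven, this folds to:
   int s = *a; *b = s + 1; return s + s;  */
```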
> Much more significant though is that by being able to do things like
> constant propagation through our library code I'd anticipate that in
> some applications we could see huge improvements in code size and data
> usage though - our software has some interesting similarities to much
> C++ code even though it's written in C and consequently has quite a lot
> of places where things could be eliminated in the majority of applications.
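The kind of elimination described in the quote above might look like this (the library routine and its mode flag are invented for illustration):

```c
#include <assert.h>
#include <string.h>

/* A generic "library" copy routine with a mode flag.  If the vast
   majority of applications only ever pass mode == 0, interprocedural
   constant propagation can specialize the body for that value and
   delete the unused branches -- shrinking both code and any data
   those branches alone referenced. */
int copy_buf(char *dst, const char *src, int n, int mode)
{
    if (mode == 0) {                 /* the common case */
        memcpy(dst, src, (size_t)n);
        return n;
    }
    /* rarely-used variants: dead code once mode is known to be 0 */
    for (int i = 0; i < n; i++)
        dst[i] = (mode == 1) ? src[n - 1 - i] : 0;
    return n;
}
```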
I've found that one of the interesting (optional) transformations that can
be run at link time is an "internalizing" pass. Basically if your
application doesn't have shared libraries calling back into it (which I
expect is rare in embedded apps :), you can mark all of the functions in
the program (except main) as "static". This allows the optimizer a lot
more freedom to, for example, simplify the prototypes for functions, hoist
conditions out of functions into their caller (for example, because the
predicate is a function of a constant argument, etc), and other
such transformations.
Obviously, this is not appropriate for all applications, but gives a
flavor of nice things that can be done with bigger scope for analysis and
transformation.
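A tiny sketch of what internalizing buys (the names here are made up; this is not GCC's actual pass, just the idea):

```c
#include <assert.h>

/* Once the linker-level optimizer knows log_value() can only be
   called from inside this program, it may mark it static.  It then
   sees that every call in the whole program passes verbose == 0,
   so the condition on a now-constant argument can be folded away
   and the prototype simplified to drop the dead parameter. */
static int log_value(int v, int verbose)  /* was extern before internalizing */
{
    if (verbose)            /* predicate of a (now-constant) argument */
        return v * 10;      /* stand-in for the expensive path */
    return v;
}

int run(int v)
{
    return log_value(v, 0); /* the only kind of call anywhere:
                               the whole thing folds to 'return v;' */
}
```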