This is the mail archive of the mailing list for the GCC project.


Re: Proposal for a 'gcc -O4': interprocedural optimization

Hi Chris,

Chris Lattner wrote:
> On Sat, 24 Aug 2002, Dave Hudson wrote:

>> My current estimate is that even trivial interprocedural optimizations
>> could be worth well over 1% to us on code size and around 2% on speed.
>> In addition, being able to better optimize the use of GPRs and
>> consequently avoiding using stack slots where not necessary could save
>> about 1% of our total data usage (and around 4% of stack space usage).
>
> Certainly.  With a reasonable alias analysis infrastructure you can
> expect a lot more loads to be folded and stores to be eliminated: that
> alone can save a lot more than your 1%.  The nice thing about having the
> whole program available for analysis is that you can write more powerful
> analyses and reuse the existing transformations without change (assuming
> well-defined interfaces exist, of course).  There are also a variety of
> techniques that can give you more than 2% speed; for example, redundant
> load/store elimination can relieve a lot of memory traffic.
For most of the targets I'm interested in right now, memory traffic isn't a problem - everything is on-chip and accessing memory costs exactly the same as accessing registers (when every instruction costs two bytes of code space and almost all non-branching instructions cost exactly one clock cycle, determining costs is pretty easy :-)). With the ip2k backend I go to a reasonable amount of trouble to optimize away as many register moves as I can, leaving memory references (usually to function arguments passed on the stack) in their place instead. Of course, I have to ensure that this happens late, so as not to lose the benefit of some useful flow-analysis-based optimizations.

What I'd not thought of here, though, is that in situations where arguments are simply propagated through to another function, there could be enormous opportunities for localized improvements. Much of the code I'm interested in already uses explicit inlining within modules, but getting much of the same advantage across modules would be a big win at times.

> I've found that one of the interesting (optional) transformations that can
> be run at link time is an "internalizing" pass.  Basically, if your
> application doesn't have shared libraries calling back into it (which I
> expect is rare in embedded apps :), you can mark all of the functions in
> the program (except main) as "static".  This gives the optimizer a lot
> more freedom to, for example, simplify the prototypes of functions, hoist
> conditions out of functions into their callers (for example, because the
> predicate is a function of a constant argument), and other transformations.
You'd be amazed how much of our code uses library code that calls back into the application - almost every event that doesn't happen synchronously triggers callbacks. With that said, however, I can suddenly see a whole range of potential improvements here (at the moment such improvements generally require more experienced developers to refactor significant chunks of code within modules, and of course none of this works across modules). One interesting problem is how a developer would go about debugging under such circumstances. Ideally some means would be needed to allow code to be compiled and marked as unavailable for significant transformations - when code space is constrained, building and testing with -O0 just isn't an option :-(

> Obviously, this is not appropriate for all applications, but it gives a
> flavor of the nice things that can be done with a bigger scope for
> analysis and transformation.
This is why I think this sort of optimization has great potential in many of these sorts of embedded apps.

