This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Proposal for a 'gcc -O4': interprocedural optimization
- From: Chris Lattner <sabre at nondot dot org>
- To: Dave Hudson <dave at cyclicode dot net>
- Cc: <gcc at gcc dot gnu dot org>
- Date: Sat, 24 Aug 2002 16:15:08 -0500 (CDT)
- Subject: Re: Proposal for a 'gcc -O4': interprocedural optimization
On Sat, 24 Aug 2002, Dave Hudson wrote:
> > Certainly that. With a reasonable alias analysis infrastructure you can
> > expect a lot more loads to be folded and stores to be eliminated: that
> > alone can save a lot more than your 1%. The nice thing about having the
> For pretty much most of the targets I'm interested in right now memory
> traffic isn't a problem - everything's on-chip and accessing memory
> costs exactly the same as accessing registers (when every instruction
> costs two bytes of code space and pretty much all non-branching
> instructions cost exactly one clock cycle then determining costs is
> pretty easy :-)). With the ip2k backend I go to a reasonable amount of
Ok, that makes sense. In this case you would still benefit a lot from
elimination of loads and stores. Not only do the actual loads and stores
consume instruction space, but folding two loads together has a nice
cascading effect on other optimizations that can be performed (mostly
scalar simplifications arising from better value-numbering information).
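A minimal sketch of the cascading effect described above, with hypothetical function names: once alias analysis proves the two pointers cannot overlap, the second load can be folded into the first, and value numbering then simplifies the arithmetic. On a target where loads cost the same as register accesses, the win is the saved code space and the simplified expression, not the memory latency.

```c
/* Hypothetical example: alias analysis must prove p and q do not
 * alias (e.g. via interprocedural information) for this to be legal. */
int sum_twice(const int *p, int *q)
{
    int a = *p;   /* first load of *p                         */
    *q = 5;       /* cannot clobber *p if p and q don't alias */
    int b = *p;   /* redundant: same value number as a        */
    return a + b; /* value numbering then rewrites to 2 * a   */
}

/* The form the optimizer can reach after folding the second load: */
int sum_twice_opt(const int *p, int *q)
{
    int a = *p;
    *q = 5;
    return 2 * a;
}
```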
> What I'd not thought of here though is that in situations where
> arguments are simply propagated through to another function then that
> could give enormous opportunities for localized improvements. Much of
> the code I'm interested in already uses explicit inlining within modules
> but getting much of the same advantage across modules would be a big win
> at times.
Sure, you could take advantage of these situations as well. Although GCC
could do some reasonable interprocedural optimization within translation
units, I don't think it currently does (it's mostly function-at-a-time).
I am probably wrong though, because development has certainly been picking
up, especially on the branches. :)
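To make the argument-propagation case concrete, here is a small sketch (names are illustrative, not from the original mail) of the kind of interprocedural constant propagation that becomes possible once all call sites of a function are visible, whether within one translation unit or, under the link-time proposal, across modules:

```c
/* Once the compiler can see every call site of helper(), it knows
 * n is always 4 and can specialize the body, or fold it entirely
 * into the caller. 'static' makes the call sites enumerable. */
static int helper(int n)
{
    return n * n + 1;   /* with n known to be 4, folds to 17 */
}

int entry(void)
{
    return helper(4);   /* the only call site */
}
```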
> > I've found that one of the interesting (optional) transformations that can
> > be run at link time is an "internalizing" pass. Basically if your
> > application doesn't have shared libraries calling back into it (which I
> > expect is rare in embedded apps :), you can mark all of the functions in
> > the program (except main) as "static". This allows the optimizer a lot
> You'd be amazed how much of our code uses library code that calls back
> into places - almost every event that doesn't happen synchronously
> triggers callbacks. With that said, however, I can suddenly see a whole
> range of potential improvements here (at the moment such improvements
Sure, that's absolutely no problem. Library code is really easy to handle:
just compile the library code and interprocedurally optimize it with the
rest of the application. Shared objects are the problem, because
currently there is no good way to specify which externally visible
functions may be called by dynamically loaded code.
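A rough sketch of what the internalizing pass buys (function names are hypothetical): once a function is known to be unreachable from outside the program, marking it `static` lets the optimizer delete it outright or specialize it for its single caller, exactly as if the developer had written the whole program in one file.

```c
/* After internalization, f_unused has no possible external callers,
 * so the optimizer is free to remove it; f_used can be specialized
 * or inlined into its single call site. */
static int f_unused(int x) { return x - 1; } /* dead once internal */
static int f_used(int x)   { return x + 1; }

int program_entry(void)    /* stands in for main() */
{
    return f_used(41);
}
```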
> generally require more experienced developers to refactor significant
> chunks of code within modules and of course none of this works across
> modules). One interesting problem however is how a developer would go
> about debugging under such circumstances. Ideally some means would be
> needed to allow some code to be compiled and marked as being unavailable
> for significant transformations - when code space is constrained then
> building and testing things with -O0 just isn't an option :-(
That too is no problem at all. In my initial proposal I explicitly
allowed the linker to combine some native .o files and some "high-level"
.o files together. Of course the linker can do better optimization if
more code is in the high level format, but this is an easy way to control
debuggability. Code you want to debug can just be compiled as normal.
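For readers following along today: the mixed link described above is essentially what GCC's later `-flto` support provides. A sketch of such a build (file names are hypothetical), where two objects carry the high-level representation and one stays a plain native object for debugging:

```shell
# Two "high-level" objects carrying compiler IR for link-time
# optimization, one ordinary native object kept debuggable:
gcc -c -O2 -flto core.c      # high-level .o (LTO bytecode)
gcc -c -O2 -flto util.c      # high-level .o (LTO bytecode)
gcc -c -O0 -g    debugme.c   # plain native .o, easy to debug

# The link step optimizes across core.o and util.o and leaves
# debugme.o untouched:
gcc -O2 -flto core.o util.o debugme.o -o app
```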
> > Obviously, this is not appropriate for all applications, but gives a
> > flavor of nice things that can be done with bigger scope for analysis and
> > transformation...
> This is why I think this sort of optimization has great potential in
> many of these sorts of embedded apps.
Not _just_ embedded apps! :)