This is the mail archive of the
mailing list for the GCC project.
Re: Proposal for a 'gcc -O4': interprocedural optimization
- From: Dave Hudson <dave at cyclicode dot net>
- To: Chris Lattner <sabre at nondot dot org>
- Cc: gcc at gcc dot gnu dot org
- Date: Sat, 24 Aug 2002 22:47:44 +0100
- Subject: Re: Proposal for a 'gcc -O4': interprocedural optimization
- References: <Pine.LNX.firstname.lastname@example.org>
Chris Lattner wrote:
Hmm - interesting. Part of what I've set out to understand with my
recent code is exactly how much we can gain from such situations.
"Register" moves are so absurdly expensive that every time I can
eliminate one it makes me very happy (e.g. for the IP2022 each 16-bit
reg-to-reg, reg-to-mem or mem-to-mem copy costs 4 opcodes). Just
recently I've had some pretty surprising success with some constant
propagation code I wrote, so allowing this to span multiple functions
could be *very* useful.
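
To make the idea of constant propagation spanning functions concrete, here is a minimal sketch in C. The function names and the shift value are hypothetical and are not taken from the IP2022 port or from any code discussed in this thread. It only assumes that every caller passes the same constant, which is the situation an interprocedural pass could exploit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scales a 16-bit sample by a run-time shift count. */
static uint16_t scale_sample(uint16_t sample, unsigned shift)
{
    assert(shift < 16);                 /* precondition for this sketch */
    return (uint16_t)(sample << shift);
}

/* Both call sites pass shift == 2.  A pass that sees all callers could
 * propagate that constant into scale_sample, so the argument no longer
 * has to be set up or moved between registers at each call. */
uint16_t read_channel_a(uint16_t raw) { return scale_sample(raw, 2); }
uint16_t read_channel_b(uint16_t raw) { return scale_sample((uint16_t)(raw + 1u), 2); }

/* Hand-written equivalent of what the specialised code would compute. */
uint16_t read_channel_a_specialised(uint16_t raw)
{
    return (uint16_t)(raw << 2);
}
```

Within a single translation unit a compiler may already do this by inlining. The interesting case is when the helper and its callers live in different files.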
On Sat, 24 Aug 2002, Dave Hudson wrote:
Ok, that makes sense. In this case you would still benefit a lot from
elimination of loads and stores. Not only do the actual loads and stores
consume instruction space, but folding two loads together has a nice
cascading effect on other optimizations that can be performed (mostly
scalar simplifications arising from better value numbering information).
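
The cascading effect of folding two loads can be sketched as follows. The struct and function names are hypothetical. The second function is a hand-written picture of what an optimizer might produce, not the output of any particular compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet header used only for this illustration. */
struct packet { uint16_t len; uint16_t flags; };

/* As written: p->len is loaded twice with no store in between. */
uint16_t header_words_naive(const struct packet *p)
{
    assert(p != NULL);
    uint16_t a = (uint16_t)(p->len + 4u);   /* load #1 */
    uint16_t b = (uint16_t)(p->len + 4u);   /* load #2 of the same location */
    return (uint16_t)(a + b);
}

/* After the two loads are folded into one, value numbering can see that
 * a and b are the same value, so the second add disappears as well. */
uint16_t header_words_folded(const struct packet *p)
{
    assert(p != NULL);
    uint16_t len = p->len;                  /* single load */
    uint16_t a = (uint16_t)(len + 4u);
    return (uint16_t)(a + a);
}
```

The point is that removing the second load is not the whole win: once both expressions are known to read the same value, the arithmetic that depended on them simplifies too.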
Hmm - some of our apps have started to use DLLs because we've been tight
on space, but if the interprocedural wins were sufficiently large then
this would certainly make a very strong case to eliminate their use, at
least in places. I guess, though, that even in these cases, because the
DLLs are being used more as a paging mechanism than anything else, their
use could be worked out statically anyway.
You'd be amazed how much of our code uses library code that calls back
into places - almost every event that doesn't happen synchronously
triggers callbacks. With that said, however, I can suddenly see a whole
range of potential improvements here (at the moment such improvements
Sure, that's absolutely no problem. Library code is really easy to handle:
just compile the library code and interprocedurally optimize it with the
rest of the application. Shared objects are the problem, because
currently there is no good way to specify which externally visible
functions may be called by dynamically loaded code.
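
A minimal sketch of the callback case, assuming the library and the application are compiled and optimized together. All names here (lib_register, lib_dispatch, on_event, app_run) are hypothetical and stand in for whatever the real library interface looks like.

```c
#include <assert.h>
#include <stddef.h>

/* ---- "library" side (hypothetical API) ---- */
typedef void (*event_cb)(int event, void *ctx);

static event_cb registered_cb;
static void *registered_ctx;

void lib_register(event_cb cb, void *ctx)
{
    registered_cb = cb;
    registered_ctx = ctx;
}

void lib_dispatch(int event)
{
    if (registered_cb != NULL)
        registered_cb(event, registered_ctx);   /* indirect call back into the app */
}

/* ---- application side ---- */
static void on_event(int event, void *ctx)
{
    assert(ctx != NULL);
    *(int *)ctx += event;
}

int app_run(void)
{
    int total = 0;
    lib_register(on_event, &total);
    lib_dispatch(3);
    lib_dispatch(4);
    lib_register(NULL, NULL);   /* do not leave a pointer to a dead local */
    return total;
}
```

If the optimizer can see that on_event is the only function ever passed to lib_register, it could in principle turn the indirect call in lib_dispatch into a direct call and then inline it. If lib_register were exported from a shared object, dynamically loaded code could register something else, so that assumption would no longer be safe.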
True, but I think embedded apps are very good examples of where the wins
are huge - typically code and data spaces are limited and also product
volumes are sufficiently large and costs sufficiently tight that moving
to the next sized processor up just isn't an option.
This is why I think this sort of optimization has great potential in
many of these sorts of embedded apps.
Not _just_ embedded apps! :)