This is the mail archive of the gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Proposal for a 'gcc -O4': interprocedural optimization
- From: Chris Lattner <sabre at nondot dot org>
- To: Dave Hudson <dave at cyclicode dot net>
- Cc: <gcc at gcc dot gnu dot org>
- Date: Sat, 24 Aug 2002 17:05:53 -0500 (CDT)
- Subject: Re: Proposal for a 'gcc -O4': interprocedural optimization
On Sat, 24 Aug 2002, Dave Hudson wrote:
> > Ok, that makes sense. In this case you would still benefit a lot from
> > elimination of loads and stores. Not only do the actual loads and stores
> > consume instruction space, but folding two loads together has a nice
> > cascading effect on other optimizations that can be performed (mostly
> > scalar simplifications arising from better value numbering information).
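As a rough illustration of the cascading effect described in the quoted paragraph above, here is a minimal C sketch. The function names are hypothetical, and the "no alias" fact is assumed to come from some whole-program analysis; it is not taken from the thread. Once the second load is known to yield the same value as the first, the subtraction folds to zero and the loads themselves become dead.

```c
/* Compiled in isolation: the store through q might modify *p,
   so the compiler has to load *p a second time. */
int delta_general(const int *p, int *q)
{
    int first = *p;
    *q = 0;             /* may alias *p as far as the compiler knows */
    int second = *p;    /* reload required */
    return first - second;
}

/* Hand-written picture of what an optimizer could produce IF it could
   prove that p and q never alias (an assumption for this sketch):
   the second load folds into the first, first - first simplifies to 0,
   and then neither load is needed at all. */
int delta_no_alias(const int *p, int *q)
{
    (void)p;            /* both loads are now dead */
    *q = 0;
    return 0;
}
```

The two versions agree only when the pointers are distinct. Calling delta_general with the same address for both arguments gives a different answer, which is why the compiler cannot make this simplification without the extra information.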
> Hmm - interesting. Part of what I've set out to understand with my
> recent code is exactly how much we can gain from such situations.
> "Register" moves are so absurdly expensive that every time I can
> eliminate one it makes me very happy (e.g. for the IP2022 each 16-bit
> reg-to-reg, reg-to-mem or mem-to-mem copy costs 4 opcodes). Just
> recently I've had some pretty surprising success with some constant
> propagation code I wrote and so allowing this to span multiple functions
> could be *very* useful.
Wow. It certainly sounds like an interesting architecture! :) I haven't
actually played with interprocedural constant propagation myself, but I
expect that the opportunities are fairly limited without using function
cloning: basically you would only use it when the parameter to a function
is _always_ the _same_ constant. That said, it sounds like you could get
some impressive savings just from some simple interprocedural register
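
The point about constant propagation without cloning can be sketched in C as follows. This is only an illustration: the function names and the constant 4 are hypothetical, and the "after" version is hand-written to show the idea, not the output of any particular compiler.

```c
#include <stdint.h>

/* Compiled on its own, `scale` is unknown, so a general multiply
   has to be emitted. */
static uint16_t scale_sample(uint16_t sample, uint16_t scale)
{
    return (uint16_t)(sample * scale);
}

/* Every call site in the program passes the same constant. */
uint16_t read_left(uint16_t raw)  { return scale_sample(raw, 4); }
uint16_t read_right(uint16_t raw) { return scale_sample(raw, 4); }

/* With whole-program knowledge that `scale` is always 4, the parameter
   can be dropped and the multiply reduced to a shift. If a single
   caller passed a different value, this would no longer be valid
   without cloning the function for each constant. */
static uint16_t scale_sample_const4(uint16_t sample)
{
    return (uint16_t)(sample << 2);
}
```

On a target where each 16-bit move is expensive, not having to pass the constant at each call site is a saving in itself.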
> Hmm - some of our apps have started to use dlls because we've been tight
> on space, but if the interprocedural wins were sufficiently large then
> this would certainly make a very strong case to eliminate their use, at
> least in places. I guess though that even in these cases because the
> dlls are being used more as a paging mechanism than anything else then
> their use could be worked out statically anyway.
Sure, you could do that as well. The only reason dynamic loading is
problematic is because it allows users of the program to load code that
was not available at static compilation time. If you _do_ know all of the
code in use, you can obviously lift this restriction.
> >>This is why I think this sort of optimization has great potential in
> >>many of these sorts of embedded apps.
> > Not _just_ embedded apps! :)
> True, but I think embedded apps are very good examples of where the wins
> are huge - typically code and data spaces are limited and also product
> volumes are sufficiently large and costs sufficiently tight that moving
> to the next sized processor up just isn't an option.
That's actually a good point that I hadn't considered. With desktops and
scientific applications, it's nice for things to be a few percent faster,
but not critical. With embedded apps, if you bump over the line of what
your architecture can support, you end up having to move to a different
processor/architecture/system, which could add to the final cost of the