This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Proposal for a 'gcc -O4': interprocedural optimization
- From: Dave Hudson <dave at cyclicode dot net>
- To: Chris Lattner <sabre at nondot dot org>
- Cc: gcc at gcc dot gnu dot org
- Date: Sat, 24 Aug 2002 23:45:17 +0100
- Subject: Re: Proposal for a 'gcc -O4': interprocedural optimization
- References: <Pine.LNX.firstname.lastname@example.org>
Chris Lattner wrote:
> On Sat, 24 Aug 2002, Dave Hudson wrote:
>> [...]
>
> Wow. It certainly sounds like an interesting architecture! :)

It's definitely interesting - actually, if you can keep things to 8 bits
so that the accumulator register can be used, then things work
surprisingly well. With this one, though, we hide the accumulator from
the register allocator completely and only make it visible late in the
machine-dependent reorg. Fortunately we have the best push and pop
opcodes I've found, and these can work wonders to keep code sizes down.
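To show what I mean about keeping things to 8 bits, here's a minimal
sketch in plain C (sum8 and sum16 are made-up names, not code from our
SDK):

#include <stdint.h>

/* Although C promotes the operands of '+' to int, the sum here is
   truncated straight back to uint8_t, so a compiler for an 8-bit
   accumulator machine can legally run the whole loop as 8-bit
   operations in the accumulator. */
uint8_t sum8(const uint8_t *buf, uint8_t len)
{
    uint8_t i, sum = 0;

    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* The same loop written with int really does need 16-bit arithmetic,
   which on an 8-bit machine means multi-instruction sequences for
   every add and compare. */
int sum16(const uint8_t *buf, int len)
{
    int i, sum = 0;

    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}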
> ... I haven't actually played with interprocedural constant
> propagation myself, but I expect that the opportunities are fairly
> limited without using function cloning: basically you would only use
> it when the parameter to a function is _always_ the _same_ constant.
> That said, it sounds like you could get some impressive savings just
> from some simple interprocedural register allocation.

Well, most of the library code that we ship with our SDK is written to
be completely general-purpose, because we only want one function to
handle each type of operation. In practice, however, with networking
code (which is what most of our stuff is) most applications tend to use
functions in a stylized way appropriate to the problem at hand. As an
example, a lot of our code takes a pointer to a datalink layer because
we have some apps that run with 8 or even 9 such link layers; in the
majority of cases, though, the code only ever uses one, and consequently
if this could be analyzed correctly then we could effectively eliminate
every single use of such pointers as parameters. There are plenty of
similar situations elsewhere.
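To make the datalink case concrete, here's a sketch (dlink, net_send,
eth_send and app_transmit are invented for illustration, not our real
SDK API):

/* A deliberately general-purpose routine: one copy, parameterized by
   whichever datalink layer the caller wants to use. */
struct dlink {
    int mtu;
    int (*send)(const void *pkt, int len);
};

int net_send(struct dlink *dl, const void *pkt, int len)
{
    if (len > dl->mtu)
        return -1;
    return dl->send(pkt, len);        /* indirect call */
}

/* A single-link app only ever passes one object... */
extern int eth_send(const void *pkt, int len);
static struct dlink eth0 = { 1500, eth_send };

int app_transmit(const void *pkt, int len)
{
    return net_send(&eth0, pkt, len); /* dl == &eth0 at every site */
}

/* ...so an interprocedural pass that proved that could clone and
   specialize net_send, dropping the pointer parameter, folding the
   mtu test to a constant, and turning the indirect call direct: */
int net_send_eth0(const void *pkt, int len)
{
    if (len > 1500)
        return -1;
    return eth_send(pkt, len);
}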
One of the reasons I suspect more code is not written in a completely
general way is that, if tools can't run the sort of analysis we're
considering here, the cost in code size and speed is usually pretty
terrible.
>> True, but I think embedded apps are very good examples of where the
>> wins are huge - typically code and data spaces are limited, and also
>> product volumes are sufficiently large and costs sufficiently tight
>> that moving to the next sized processor up just isn't an option.
>
> That's actually a good point that I hadn't considered. With desktops
> and scientific applications, it's nice for things to be a few percent
> faster, but not critical. With embedded apps, if you bump over the
> line of what your architecture can support, you end up having to move
> to a different processor/architecture/system, which could add to the
> final cost of the product.

Right - more importantly, if some new feature is required then this can
mean very costly re-engineering. Another issue is that many smaller
embedded systems companies try to use the same processor family for
almost everything they do - generating better code gives them far more
options to avoid needing to move to multiple architectures.
Of course speed is always useful too :-)