This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Faster compilation speed
- From: Geoff Keating <geoffk at geoffk dot org>
- To: Matt Austern <austern at apple dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: 12 Aug 2002 16:00:08 -0700
- Subject: Re: Faster compilation speed
- References: <FB532843-AE35-11D6-9F25-00039390D9E0@apple.com>
Matt Austern <austern@apple.com> writes:
> On Monday, August 12, 2002, at 12:43 PM, David S. Miller wrote:
>
> > From: Matt Austern <austern@apple.com>
> > Date: Mon, 12 Aug 2002 12:47:30 -0700
> >
> > And yes, we're aware that many gains are possible only
> > if we rewrite the parser or redesign the tree structure. The
> > only reason we haven't started on rewriting the parser is
> > that someone else is already doing it.
> >
> > So work on an attempt at RTL refcounting, the patch below is a place
> > to start.
>
> Thanks for the pointer, that's a useful starting point.
>
> But, at the risk of sounding like a broken record... Do
> we have benchmarks showing that RTL gc is one of
> the major causes of slow compile speed?
We happen to know that GC as a whole is 10-13% of total compile time,
even at -O0, and my expectation is that the RTL part of that is
perhaps two-thirds, say 7%. So the benefit you can get is 7% less any
overhead in tracking the reference counts and freeing
briefly-allocated RTL.
My suggestion is to try shrinking RTL in other ways. For instance,
once RTL is generated it should all match an insn or a splitter. If
we could store RTL as the insn number (or a splitter number) plus the
operands, rather than the expanded form we have now, that should be
much easier to traverse. For those operations that look at the form
of RTL, code could be generated to perform that operation knowing what
insns exist; for instance, on x86 the form of the 'add' instruction is:
(insn 15 13 17 (parallel[
(set (reg:SI 61)
(plus:SI (reg/v:SI 59)
(reg/v:SI 60)))
(clobber (reg:CC 17 flags))
] ) -1 (nil)
(nil))
we could represent this as
(packed_insn 15 13 17 207 {*addsi_1} [(reg:SI 61) (reg:SI 59) (reg:SI 60)])
which would save us, by my count, 50% of the RTL objects for this
case. I'd expect that would then speed GC (on this object) by 50%,
speed up allocation by 50%, and hopefully would also speed up code
that uses these objects because (a) they'd better fit in cache and (b)
there would be fewer pointers to chase.
To perform operations that are now done directly on the RTL, there'd be
a switch statement, for instance:
int reg_mentioned_p (reg, in)
{
...
case PACKED_INSN:
switch (PACKED_INSN_NUMBER (in)) {
...
case 207: /* *addsi_1 */
if (REGNO (reg) == 17) // deal with the clobbered register
return 1;
// deal with the operands
break;
}
...
}
Even combine can be handled this way, by pregenerating rules based on
the insn numbers being combined. Relatively few insns can actually be
combined, so it shouldn't require a huge amount of generated code.
On RISCy chips, you could take even further advantage of the fact that
often an operand is guaranteed to be a register, or a constant
integer or whatever, and so eliminate some tests.
I'm not sure how much work this is to implement. I suspect what you'd
end up doing is performing a trade-off between generating too many
routines and having to rewrite large chunks of old code to use the
routines that already exist but that they don't use.
Now, if only I could think of something that would work like this on
trees...
--
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>