This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC] Kill gen_sequence
- From: law at redhat dot com
- To: "David S. Miller" <davem at redhat dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Fri, 07 Jun 2002 13:10:33 -0600
- Subject: Re: [RFC] Kill gen_sequence
- Reply-to: law at redhat dot com
In message <20020606.202600.40740199.davem@redhat.com>, "David S. Miller" writes:
> From: law@redhat.com
> Date: Thu, 06 Jun 2002 14:01:21 -0600
>
> The net result is about a .4% improvement in compile time. I realize the
> point behind this patch wasn't to improve compile-times, but it's nice to
> see it happen as a side effect.
>
> Thanks for checking this out Jeff.
No problem. You interested in some other tidbits? Consider the following
stats gathered by gprof on the PA when building those 162 testfiles with -O2
optimization:
We do just shy of 40 million calls to gen_rtx_REG. 99.8% of those proceed to
call gen_raw_REG (and they account for 98+% of the calls to gen_raw_REG).
gen_raw_REG unconditionally calls gen_rtx_fmt_i0 and accounts for 99.9% of the
calls into gen_rtx_fmt_i0.
gen_rtx_fmt_i0 unconditionally calls ggc_alloc. The calls into ggc_alloc via
this chain account for more than 40% of all the calls to ggc_alloc. [ggc_alloc
is the second most called function in the profiling data. ]
Now for the kicker. If we go back to gen_rtx_REG we find that 87% of the calls
to gen_rtx_REG come from propagate_one_insn.
That's bloody interesting. We've got one call chain that appears to account
for nearly 40% of all the calls to ggc_alloc!
With a relatively simple change to flow.c, we go from 39.9 million calls to
gen_rtx_REG to around 8 million, and from 89 million calls to ggc_alloc
to 57 million.
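For anyone who wants to check the arithmetic, here is a rough sketch that recomputes the shares from the figures quoted above. The function and macro names are made up for illustration. It assumes the 99.8% figure applies evenly to the calls coming from propagate_one_insn, which the profile data quoted here does not actually confirm.

```c
#include <stdio.h>

/* Call counts (in millions) and fractions as quoted in the message. */
#define GEN_RTX_REG_CALLS   39.9   /* calls to gen_rtx_REG before the change */
#define GGC_ALLOC_CALLS     89.0   /* calls to ggc_alloc before the change */
#define FRAC_TO_RAW_REG     0.998  /* gen_rtx_REG calls that reach gen_raw_REG */
#define FRAC_FROM_PROPAGATE 0.87   /* gen_rtx_REG calls from propagate_one_insn */
#define GEN_RTX_REG_AFTER   8.0    /* approximate, after the flow.c change */
#define GGC_ALLOC_AFTER     57.0   /* after the flow.c change */

/* Share of all ggc_alloc calls reached via
   gen_rtx_REG -> gen_raw_REG -> gen_rtx_fmt_i0 -> ggc_alloc. */
double chain_share(void)
{
  return GEN_RTX_REG_CALLS * FRAC_TO_RAW_REG / GGC_ALLOC_CALLS;
}

/* Share of all ggc_alloc calls that start in propagate_one_insn.
   Assumption: the 99.8% fraction holds for this caller too. */
double propagate_share(void)
{
  return FRAC_FROM_PROPAGATE * chain_share();
}

/* Calls removed by the change, in millions. */
double gen_rtx_reg_saved(void) { return GEN_RTX_REG_CALLS - GEN_RTX_REG_AFTER; }
double ggc_alloc_saved(void)   { return GGC_ALLOC_CALLS - GGC_ALLOC_AFTER; }

/* Print the derived figures next to each other. */
void print_call_chain_report(void)
{
  printf("chain share of ggc_alloc:     %.1f%%\n", 100.0 * chain_share());
  printf("propagate_one_insn share:     %.1f%%\n", 100.0 * propagate_share());
  printf("gen_rtx_REG calls saved (M):  %.1f\n", gen_rtx_reg_saved());
  printf("ggc_alloc calls saved (M):    %.1f\n", ggc_alloc_saved());
}
```

The two "saved" figures come out nearly equal, which is what you'd expect if essentially every avoided gen_rtx_REG call also avoids one ggc_alloc call.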
The only trick is I have no idea what the structure sharing assumptions are
for hard registers. Clearly we're allowed to have multiple rtx objects that
refer to the same hard register -- but is it a requirement that each rtx
for a hard register be distinct? If we can share hard register rtxs of
the same mode, then we can eliminate a ton of calls to gen_rtx_REG and
ggc_alloc.
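To make the sharing idea concrete, here is a toy sketch of what caching hard register objects by (mode, register number) could look like. Everything in it (toy_reg, toy_mode, TOY_FIRST_PSEUDO, toy_gen_reg) is a hypothetical stand-in rather than GCC's real rtx machinery, and malloc stands in for ggc_alloc. It only shows the mechanics; whether sharing is actually safe is exactly the open question above, since any code that modifies a shared object in place would affect every user of it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for machine modes and the hard/pseudo boundary. */
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI, TOY_NUM_MODES };
#define TOY_FIRST_PSEUDO 64

/* Toy register object: just a mode and a register number. */
struct toy_reg {
  enum toy_mode mode;
  unsigned regno;
};

/* Counts allocations so the effect of the cache is visible. */
unsigned long toy_alloc_count;

/* Always allocates a fresh object, like the uncached path. */
static struct toy_reg *toy_gen_raw_reg(enum toy_mode mode, unsigned regno)
{
  struct toy_reg *r = malloc(sizeof *r);
  if (r == NULL)
    abort();
  r->mode = mode;
  r->regno = regno;
  toy_alloc_count++;
  return r;
}

/* One slot per (mode, hard register); filled lazily on first request. */
static struct toy_reg *toy_hard_reg_cache[TOY_NUM_MODES][TOY_FIRST_PSEUDO];

/* Returns a shared object for hard registers and a fresh one for pseudos. */
struct toy_reg *toy_gen_reg(enum toy_mode mode, unsigned regno)
{
  assert((unsigned) mode < TOY_NUM_MODES);

  if (regno >= TOY_FIRST_PSEUDO)
    return toy_gen_raw_reg(mode, regno);

  if (toy_hard_reg_cache[mode][regno] == NULL)
    toy_hard_reg_cache[mode][regno] = toy_gen_raw_reg(mode, regno);
  return toy_hard_reg_cache[mode][regno];
}
```

With this shape, repeated requests for the same hard register in the same mode allocate once, while pseudos still get a distinct object per call.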
jeff