This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: RFA: Reduce iterations of elimination bookkeeping
- From: Jeff Law <law at redhat dot com>
- To: Vladimir Makarov <vmakarov at redhat dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 03 Apr 2009 17:22:01 -0600
- Subject: Re: RFA: Reduce iterations of elimination bookkeeping
- References: <49D6394D.3020509@redhat.com> <49D675DA.6050708@redhat.com>
Vladimir Makarov wrote:
Jeff Law wrote:
A few weeks ago I instrumented the top section of the main reload
loop. Specifically, I was looking to see how often we have to do
elimination bookkeeping.
For an x86 bootstrap, we call reload 101k times and we perform
elimination bookkeeping 151k times. Obviously, we're iterating less
than half the times we call reload -- that was somewhat of a surprise.
However, there's still 50k iterations of elimination bookkeeping that
are worth looking at.
We will iterate on elimination bookkeeping any time the size of the
frame changes. That happens if we spill a pseudo to memory, allocate
a caller-save slot, align the stack, or spill a memory address. We
also iterate for a variety of other reasons, such as unexpected
changes in elimination offsets, a previously eliminable register
becoming non-eliminable, or any spill code generation.
About 13k of the iterations occur because we allocated a stack slot
for caller-save registers. Yup, that's right, 13k just because we
allocated a slot for a caller-save. These iterations can be
trivially avoided by allocating caller-save slots before we do the
elimination bookkeeping.
About 26k iterations occur because of requested stack alignments.
Yes, we allocate a slot to ensure stack alignment *after* elimination
bookkeeping. Interestingly enough, we used to align prior to
elimination bookkeeping, but the code was moved in response to
pr29248 and pr28966. Fortunately, all that was really necessary to
fix those PRs was to avoid aligning the stack if none had yet been
allocated -- changing the sequencing of elimination bookkeeping &
stack alignment was not necessary and obviously was making more work
for reload than was necessary.
With those two fixes to sequencing, we can eliminate 38k (of the 50k)
iterations of elimination bookkeeping. The remaining iterations are
almost exclusively due to spilling.
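
As a rough illustration of why the ordering matters, here is a toy
model in C. It is not reload's code: the struct, the function names,
the 8-byte slot size and the 16-byte alignment are all made up for the
example. It only shows that growing the frame after a bookkeeping pass
forces another pass, while growing it beforehand does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names or sizes come from GCC. */
struct toy_frame {
    int size;              /* bytes allocated in the frame so far */
    bool need_caller_save; /* function needs a caller-save slot */
    bool have_caller_save; /* slot has already been allocated */
    int align;             /* required frame alignment in bytes */
};

/* Allocate the caller-save slot once; return true if the frame grew. */
static bool toy_alloc_caller_save(struct toy_frame *f)
{
    if (!f->need_caller_save || f->have_caller_save)
        return false;
    f->size += 8;          /* hypothetical slot size */
    f->have_caller_save = true;
    return true;
}

/* Pad the frame to its alignment, but only if something has been
   allocated already; return true if the frame grew. */
static bool toy_align(struct toy_frame *f)
{
    assert(f->align > 0);
    if (f->size == 0 || f->size % f->align == 0)
        return false;
    f->size += f->align - f->size % f->align;
    return true;
}

/* Count bookkeeping passes.  With allocate_first, the frame is grown
   before the pass; otherwise it is grown afterwards, and any growth
   invalidates the pass and forces another one. */
static int toy_passes(struct toy_frame f, bool allocate_first)
{
    int passes = 0;
    for (;;) {
        bool changed = false;
        if (allocate_first) {
            toy_alloc_caller_save(&f);
            toy_align(&f);
        }
        passes++;          /* stands in for elimination bookkeeping */
        if (!allocate_first) {
            changed |= toy_alloc_caller_save(&f);
            changed |= toy_align(&f);
        }
        if (!changed)
            return passes;
    }
}
```

In this model, a frame that starts at 4 bytes and needs a caller-save
slot takes two passes with the old ordering and one with the new
ordering. An empty frame that needs nothing takes one pass either way.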
From a code generation standpoint, these changes permute where
objects land in the frame, so it's possible we can get minor code
generation differences on targets with restricted displacements in
reg+d addressing modes. We can also get some changes in cache
behaviour. I would expect both effects to be neutral overall.
From a compile-time standpoint, we're clearly doing less work, so I'd
expect some minor (possibly unmeasurable) compile-time improvements.
Jeff, sorry. I too thought that the patch would make the compiler
faster (and it seemed quite obvious to me). But valgrind --tool=lackey
actually shows 0.3% more executed insns for -O2 combine.i on x86 after
applying the patch. I think the problem may be (at least partially)
bigger generated code, e.g. for combine.i:
  83082       0     792   83874   147a2 c0.o
  83098       0     792   83890   147b2 c1.o  <- after the patch
I also see that the compiler allocates more stack space for many
functions after applying the patch, even in cases where all stack
displacements are the same. I have no idea why that is.
I've found the source of extra stack allocations and it's trivial to
fix. Basically we're allocating multiple slots for alignment purposes.
This was something I had looked at, but convinced myself it wasn't an
issue because the old code didn't worry about multiple alignment slots.
Sure enough, it's trivial to show the old compiler allocating multiple
alignment slots. So that's clearly something that'll be fixed :-)
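
To make the effect concrete, here is a toy C sketch of one way that
padding more than once could inflate a frame. This is purely an
illustration under made-up assumptions: the function names and slot
sizes are hypothetical, and it does not describe the actual mechanism
in reload or the fix.

```c
#include <assert.h>

/* Toy model only: these helpers are hypothetical, not GCC functions. */

/* Round size up to a multiple of align (align must be positive). */
static int toy_round_up(int size, int align)
{
    assert(align > 0);
    return (size + align - 1) / align * align;
}

/* Pad after every allocation: each pad behaves like a separate
   alignment slot. */
static int toy_frame_pad_each(const int *slots, int n, int align)
{
    int size = 0;
    for (int i = 0; i < n; i++)
        size = toy_round_up(size + slots[i], align);
    return size;
}

/* Pad once, after all allocations have been made. */
static int toy_frame_pad_once(const int *slots, int n, int align)
{
    int size = 0;
    for (int i = 0; i < n; i++)
        size += slots[i];
    return toy_round_up(size, align);
}
```

With slots of 4, 4 and 8 bytes and 16-byte alignment, padding after
each allocation gives a 48-byte frame in this model, while padding once
gives 16 bytes.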
I've got valgrind tests running.
jeff