This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: RFA: Reduce iterations of elimination bookkeeping
- From: Vladimir Makarov <vmakarov at redhat dot com>
- To: Jeff Law <law at redhat dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 03 Apr 2009 16:47:22 -0400
- Subject: Re: RFA: Reduce iterations of elimination bookkeeping
- References: <49D6394D.3020509@redhat.com>
Jeff Law wrote:
A few weeks ago I instrumented the top section of the main reload
loop. Specifically, I was looking to see how often we have to do
elimination bookkeeping.
For an x86 bootstrap, we call reload 101k times and we perform
elimination bookkeeping 151k times. In other words, the number of extra
iterations is less than half the number of calls to reload; that was
somewhat of a surprise. However, there are still 50k extra iterations
of elimination bookkeeping that are worth looking at.
We will iterate on elimination bookkeeping any time the size of the
frame changes, for example when we spill a pseudo to memory, allocate a
caller-save slot, align the stack, or spill a memory address. We also
iterate for a variety of other reasons, such as unexpected changes in
elimination offsets, a previously eliminable register becoming no longer
eliminable, or any generation of spill code.
About 13k of the iterations occur because we allocated a stack slot
for caller-save registers. Yup, that's right, 13k just because we
allocated a slot for a caller-save. These iterations can be trivially
avoided by allocating caller-save slots before we do the elimination
bookkeeping.
About 26k iterations occur because of requested stack alignments.
Yes, we allocate a slot to ensure stack alignment *after* elimination
bookkeeping. Interestingly enough, we used to align prior to
elimination bookkeeping, but the code was moved in response to pr29248
and pr28966. Fortunately, all that was really needed to fix those PRs
was to avoid aligning the stack if no stack space had yet been
allocated; changing the order of elimination bookkeeping and stack
alignment was not necessary, and it was making more work for reload
than needed.
With those two fixes to sequencing, we can eliminate 38k (of the 50k)
iterations of elimination bookkeeping. The remaining iterations are
almost exclusively due to spilling.
From a code generation standpoint, these changes permute where objects
land in the frame, so it's possible we can get minor code generation
differences on targets with restricted displacements in reg+d
addressing modes. We can also get some changes in cache behaviour. I
would expect both effects to be neutral overall.
From a compile-time standpoint, we're clearly doing less work, so I'd
expect some minor (possibly unmeasurable) compile-time improvements.
Jeff, sorry. I also thought the patch would make the compiler faster (it
seemed obvious to me). But valgrind --tool=lackey actually shows 0.3%
more executed insns for -O2 combine.i on x86 after applying the patch. I
think the problem may be, at least partially, larger generated code,
e.g. for combine.i:
   text    data     bss     dec     hex filename
  83082       0     792   83874   147a2 c0.o
  83098       0     792   83890   147b2 c1.o   <- after the patch
I also see that the compiler allocates more stack space for many
functions after applying the patch, even in cases where all stack
displacements are the same. I have no idea why that is.
Bootstrapped and regression tested on i686-pc-linux-gnu. I also
verified pr29248 and pr28966 continue to generate the desired code.