This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: RFA: Reduce iterations of elimination bookkeeping
- From: Jeff Law <law at redhat dot com>
- To: Vladimir Makarov <vmakarov at redhat dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 07 Apr 2009 12:57:09 -0600
- Subject: Re: RFA: Reduce iterations of elimination bookkeeping
- References: <49D6394D.3020509@redhat.com> <49D675DA.6050708@redhat.com>
Vladimir Makarov wrote:
> Jeff, sorry. I also thought the patch would make the compiler faster
> (it seemed quite obvious to me). But valgrind --tool=lackey actually
> shows 0.3% more executed insns for -O2 combine.i on x86 after
> applying the patch. I think the problem may be (at least partially)
> bigger generated code, e.g. for combine.i:
>
>    text    data     bss     dec     hex filename
>   83082       0     792   83874   147a2 c0.o
>   83098       0     792   83890   147b2 c1.o   <- after the patch
>
> I also see that the compiler allocates more stack space for many
> functions after applying the patch, even in cases where all stack
> displacements are the same. I have no idea why that is.
I'm withdrawing the patch. I've probably already spent more time
dorking around with it than I should.
I considered just moving the caller-save setup, but that is actually a
net loss. The best theory I've got is that setup_save_areas is
significantly more costly than elimination bookkeeping, costly enough
that the extra calls we're making to setup_save_areas are offsetting
the savings we're getting from fewer iterations of elimination
bookkeeping.
Basically we had something like this:

  start:
    elimination bookkeeping
    if (something changed)
      goto start;
    caller-save setup
    if (something changed)
      goto start;
    ...

Note this is (effectively) a two-loop nest with a common header. After
my patch it looks like this:

  start:
    caller-save setup
    elimination bookkeeping
    if (something changed)
      goto start;

Note how we've effectively pulled the caller-save setup into the inner
loop. Ugh. We could make caller-save setup faster, but as I mentioned,
I think I've already spent more time on this than I should.
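
To make the call-count argument concrete, here is a toy model of the two loop shapes. It is only a sketch, not GCC code: the function names, the "change budgets", and the relative costs (1 for elimination bookkeeping, 5 for caller-save setup) are all made up for illustration, and it assumes the single "something changed" test in the patched loop covers both steps.

```python
# Toy model of the two loop structures. Every name and number here is
# hypothetical; nothing below is taken from GCC's reload code.

def run_original(elim_changes, save_changes):
    """Old shape: elimination bookkeeping iterates by itself until it
    settles, and only then is caller-save setup run."""
    elim_calls = save_calls = 0
    while True:
        elim_calls += 1
        if elim_changes > 0:      # elimination changed something
            elim_changes -= 1
            continue              # goto start
        save_calls += 1
        if save_changes > 0:      # caller-save setup changed something
            save_changes -= 1
            continue              # goto start
        return elim_calls, save_calls


def run_patched(elim_changes, save_changes):
    """New shape: caller-save setup runs on every trip through the loop."""
    elim_calls = save_calls = 0
    while True:
        changed = False
        save_calls += 1
        if save_changes > 0:
            save_changes -= 1
            changed = True
        elim_calls += 1
        if elim_changes > 0:
            elim_changes -= 1
            changed = True
        if not changed:
            return elim_calls, save_calls


ELIM_COST, SAVE_COST = 1, 5       # assumed relative costs, not measured


def total_cost(calls):
    elim_calls, save_calls = calls
    return elim_calls * ELIM_COST + save_calls * SAVE_COST


print(run_original(3, 1), total_cost(run_original(3, 1)))  # (5, 2) 15
print(run_patched(3, 1), total_cost(run_patched(3, 1)))    # (4, 4) 24
```

In this model the patched loop does save one pass of elimination bookkeeping (4 instead of 5), but it doubles the calls to the expensive step (4 instead of 2), so the total goes up whenever caller-save setup is much costlier than a bookkeeping pass.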
With regard to the stack alignment bits: we're actually dependent on the
multiple alignments right now, so my desire to allocate the alignment
slot once clearly won't fly. Too bad, that was a definite savings in
space and time.
For future reference, the multiple stack alignments occur when we spill
a pseudo to memory after aligning the stack. The spill allocates a new
stack slot and forces the big loop to iterate. If on that next
iteration the stack isn't properly aligned, then we align it again (and
again and again if we continue to have to spill pseudos to memory on
subsequent iterations).
Ideally, we'd align the stack once, after everything has been reloaded
and not iterate. The current structure of this code doesn't allow for
that possibility.
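
A small simulation of the effect described above, again only as a hedged sketch: the frame sizes, spill sizes and 16-byte alignment are invented numbers, and the loop is a drastic simplification of the real one. It contrasts re-aligning on every iteration with the ideal (which the current code structure does not allow) of aligning once after all spills are known.

```python
# Toy model of repeated stack alignment. All sizes are hypothetical and
# the loop is a simplification, not the actual reload loop.

def align_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def frame_with_realignment(initial, spill_sizes, alignment):
    """Align at the top of every iteration; each spill then allocates a
    new slot and forces one more iteration."""
    size = initial
    alignments = 0
    pending = list(spill_sizes)
    while True:
        if size % alignment != 0:
            size = align_up(size, alignment)
            alignments += 1
        if not pending:
            return size, alignments
        size += pending.pop(0)    # spill a pseudo to a new stack slot


def frame_aligned_once(initial, spill_sizes, alignment):
    """Ideal case: allocate every slot first, then align a single time."""
    return align_up(initial + sum(spill_sizes), alignment)


print(frame_with_realignment(20, [4, 4, 4], 16))  # (80, 4)
print(frame_aligned_once(20, [4, 4, 4], 16))      # 32
```

With these made-up numbers, three 4-byte spills each knock the frame out of alignment, so it is padded four times and ends up at 80 bytes, whereas aligning once at the end would need only 32.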
Jeff