This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Ping: patch to remove the old RA
- From: Vladimir Makarov <vmakarov at redhat dot com>
- To: Jeff Law <law at redhat dot com>
- Cc: Kenneth Zadeck <zadeck at naturalbridge dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>, Jakub Jelinek <jakub at redhat dot com>
- Date: Tue, 03 Feb 2009 16:57:59 -0500
- Subject: Re: Ping: patch to remove the old RA
- References: <4980D6C6.1040504@redhat.com> <4980E1DD.4070005@naturalbridge.com> <498112BC.50802@redhat.com> <4988752E.8000900@redhat.com>
Jeff Law wrote:
Vladimir Makarov wrote:
IRA communicates better with reload. Reload can assign a hard
register to a pseudo spilled by reload or by *IRA*. IRA can advise
reload which pseudo is better to spill, can tell it to share stack
slots among pseudos, and can advise some other small optimizations.
But even this code in reload is not big (about 50 lines).
It would be nice to remove reload. I think it is possible but it is
a bigger project than IRA.
I've been pondering what a world without reload, or at least one with
a drastically different reload would look like. And to be honest I'm
not getting too far.
One approach that has been tried with some initial success was to
have, effectively, a pre-reload pass which IIRC ran before register
allocation. However, it was also my understanding that this pass
basically just moved when big hunks of reload
ran rather than actually eliminating the need for the spaghetti code
known as reload. I'm certainly not opposed to a pre-reload pass, but
I'd rather it be new, clean, code.
By removing reload I mean a cleaner spiller that is part of RA, i.e.
tighter integration of the spiller and RA. A classical RA spills a pseudo
(e.g. if insn constraints are not satisfied or the address displacement
is out of range), reloads it at each of its references, and runs RA
again on the modified representation. This can slow down RA because 2-3
iterations are usually needed (although the 2nd and subsequent iterations
have simpler conflict graphs). I don't know how much it can give (I think
it can give some improvement), but even if it gives nothing, cleaner code
is still important. The iterative approach is not necessary; we could use
the approach patented by Andrew MacLeod when he was at IBM. It would still
provide better spiller and RA integration than the current IRA+reload
state.
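To illustrate the iterative scheme described above, here is a toy sketch in Python. It is not GCC code: the names `conflicts`, `color`, and `allocate`, the interval model of live ranges, and the "shortest range first" coloring order are all assumptions made for the example. A pseudo that cannot be colored is spilled and replaced by tiny reload pseudos, one per reference, and allocation is rerun on the modified representation.

```python
def conflicts(ranges):
    """Build a conflict graph: two pseudos conflict if their live ranges overlap."""
    graph = {p: set() for p in ranges}
    names = list(ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = ranges[a], ranges[b]
            if s1 <= e2 and s2 <= e1:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def color(ranges, num_hard_regs):
    """Greedy coloring, shortest live range first (hypothetical priority)."""
    graph = conflicts(ranges)
    assignment, spilled = {}, []
    order = sorted(ranges, key=lambda p: (ranges[p][1] - ranges[p][0], ranges[p][0]))
    for p in order:
        used = {assignment[q] for q in graph[p] if q in assignment}
        free = [r for r in range(num_hard_regs) if r not in used]
        if free:
            assignment[p] = free[0]
        else:
            spilled.append(p)
    return assignment, spilled


def allocate(ranges, refs, num_hard_regs):
    """Iterate: color, spill what failed, rebuild the representation, repeat."""
    ranges, refs = dict(ranges), dict(refs)
    reload_temps = set()
    iterations = 0
    while True:
        iterations += 1
        assignment, spilled = color(ranges, num_hard_regs)
        if not spilled:
            return assignment, iterations
        for p in spilled:
            if p in reload_temps:
                raise RuntimeError("too few hard registers for this toy model")
            del ranges[p]
            # One short-lived reload pseudo per reference of the spilled pseudo.
            for point in refs.pop(p):
                tmp = f"{p}.reload@{point}"
                ranges[tmp] = (point, point)
                refs[tmp] = [point]
                reload_temps.add(tmp)


# Three overlapping pseudos, two hard registers.
ranges = {"a": (0, 10), "b": (1, 5), "c": (2, 8)}
refs = {"a": [0, 10], "b": [1, 5], "c": [2, 8]}
assignment, iterations = allocate(ranges, refs, num_hard_regs=2)
print(iterations, assignment)
```

In this example the long-lived pseudo `a` is spilled on the first pass, and the second pass succeeds on a smaller conflict graph, which matches the remark that later iterations are simpler.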
Actually, I implemented something analogous in the YARA project (it
worked without reload), but only for x86/x86_64. I did not touch the
address displacement problem and a lot of others. YARA ignored a lot of
machine-dependent macros used by reload. So I got the idea that it would
take a lot of time to implement this approach for all targets.
Therefore I wrote that removing reload is a bigger project than IRA.
There was talk of doing instruction selection prior to allocation at
the summit a few years back. I never knew what happened to that
idea. It's never been absolutely clear to me how this would work
given our register class based world, but if someone could walk me
through how it was supposed to work/help it would be greatly appreciated.
I have some ideas which I'd like to try after I am done with IRA and
live range shrinkage in insn scheduling before RA. I'd like to try full
code selection before RA. It means we would know before RA exactly which
constraints will be used for each operand (actually Andrew MacLeod
expressed this idea in his RABLE proposal). I don't know how productive
it could be, because I see that in some situations code selection at a
late stage (in reload) can be a win (like choosing between lea and add on
x86), but it could make the proposed spiller and RA more accurate. I think
it is worth trying regardless of the result. I'd like to do it in the
combiner, or more accurately in its replacement (based on modern minimal
cost pattern covering, which means machine description changes showing the
cost for each possible constraint combination of an insn, but that is
another story).
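As a rough illustration of why the x86 lea/add choice benefits from knowing the register assignment, here is a small Python sketch. The function `select_add` is hypothetical and deliberately ignores condition flags, operand sizes, and special registers; it only shows that the two-operand `add` fits when the destination coincides with one source, while `lea` can express a three-operand addition.

```python
def select_add(dst, src1, src2):
    """Pick an x86 insn (AT&T syntax) for dst = src1 + src2 on hard registers.

    Toy model only: flags, operand sizes, and register restrictions are ignored.
    """
    if dst == src1:
        return f"addl %{src2}, %{dst}"   # two-operand form: dst += src2
    if dst == src2:
        return f"addl %{src1}, %{dst}"   # addition commutes: dst += src1
    return f"leal (%{src1},%{src2}), %{dst}"  # three-operand form via lea


print(select_add("eax", "eax", "ebx"))
print(select_add("eax", "ebx", "ecx"))
```

Before RA the operands are still pseudos, so whether `dst` and `src1` end up in the same hard register is not yet known. That is the sense in which a late choice can win.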
There was another initiative which attacked the horrid reload
inheritance code and replaced it with a basic dependency graph.
Ideally I'd like a reload pass which didn't need any of the reload
inheritance stuff. Reload inheritance and the other optimizers are
just masking the problem that reload generates horrid code once we
start spilling. If we're going to have reload inheritance, I'd
certainly want a real dependency graph rather than the "time" stuff we
do now.
I absolutely agree that reload inheritance complicates reload too
much. It should be, and probably could be, done even globally as an
independent optimization.
Some have stated that they want to see reload go away. While I'd like
that too, I'll state again that I'd be happy with a vastly simpler
reload, preferably one that doesn't reload too often for common cases
and as a result we don't have to work so damn hard to optimize the
code it creates.
I suspect the first thing we need to do is get a reasonable idea of
what triggers reloads these days:
I have some guesses after working a lot on IRA over the last few years.
Are we typically dealing with constraint mismatches within register
classes?
It is not a rare event for some architectures with irregular register
files, like x86.
Is it due to operands living in memory and needing to be moved into
registers?
That is the most frequent case especially for x86/x86_64.
Is it secondary reloads because of out of range addresses?
It is a rare event for most architectures. But there are a few nasty
architectures (such as sh or mcore) with tiny displacements where it is
a very important problem.
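A minimal sketch of how an out-of-range displacement leads to an extra reload, written in Python. The function `legitimize_address`, the string form of the emitted insn, and the `max_disp` limit are assumptions for illustration, not values or code from any real target description.

```python
def legitimize_address(base, disp, max_disp):
    """Return (extra_insns, (base_reg, displacement)) for a base+disp address.

    If the displacement fits the target's field, the address is used as is.
    Otherwise the full address is computed into a temporary, and that
    temporary needs a hard register of its own: this is the reload.
    """
    if 0 <= disp <= max_disp:
        return [], (base, disp)
    tmp = "tmp"  # hypothetical reload register
    return [f"{tmp} = {base} + {disp}"], (tmp, 0)


print(legitimize_address("r4", 8, max_disp=60))
print(legitimize_address("r4", 200, max_disp=60))
```

The smaller `max_disp` is, the more often the second branch is taken, and each time it costs an extra insn and an extra register at that point.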
Secondary memory?
I don't think it is a frequent event.