This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Ping: patch to remove the old RA


Jeff Law wrote:
Vladimir Makarov wrote:

IRA has better communication with reload. Reload can assign a hard register to a pseudo spilled by reload or by *IRA*. IRA can advise reload which pseudo is better to spill, can tell it to share stack slots between pseudos, and can advise it on some other small optimizations. But even this code in reload is not so big (about 50 lines).


It would be nice to remove reload. I think it is possible but it is a bigger project than IRA.

I've been pondering what a world without reload, or at least one with a drastically different reload would look like. And to be honest I'm not getting too far.

One approach that has been tried with some initial success was to have, effectively, a pre-reload pass which IIRC ran before register allocation. However, it was also my understanding that this pass basically just moved when big hunks of reload ran rather than actually eliminating the need for the spaghetti code known as reload. I'm certainly not opposed to a pre-reload pass, but I'd rather it be new, clean, code.


By removing reload I mean a cleaner spiller which is part of RA, that is, tighter integration of the spiller and RA. A classical RA spills a pseudo (e.g. if insn constraints are not satisfied or the address displacement is out of range), reloads it for each of its references, and does RA again on the modified representation. This can slow down RA because 2-3 iterations are usually needed (although the 2nd and subsequent iterations have simpler conflict graphs). I don't know how much it can give (I think it can give some improvement), but even if it gives nothing, cleaner code is still important. The iterative approach is not necessary; we could use the approach patented by Andrew MacLeod when he was at IBM. It still provides better spiller and RA integration than the current IRA+reload state.
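The classical spill-and-retry loop described above can be sketched on a toy model. This is only an illustration under strong assumptions, not GCC's reload or IRA code: insns are straight-line `(dst, srcs)` tuples, every pseudo is defined before it is used, coloring is a simple greedy pass, and all names (`allocate`, `spill`, `reloadN`, and so on) are hypothetical.

```python
from itertools import count

def live_after(insns):
    """Backward liveness over straight-line code: the set live after each insn."""
    live, result = set(), []
    for dst, srcs in reversed(insns):
        result.append(set(live))
        live.discard(dst)
        live.update(srcs)
    return result[::-1]

def conflict_graph(insns):
    """A defined pseudo conflicts with everything live after its definition."""
    graph = {}
    for (dst, srcs), live in zip(insns, live_after(insns)):
        for name in srcs:
            graph.setdefault(name, set())
        if dst is not None:          # dst of None: the insn only reads (a store)
            graph.setdefault(dst, set())
            for other in live - {dst}:
                graph[dst].add(other)
                graph.setdefault(other, set()).add(dst)
    return graph

def color(graph, k):
    """Greedy coloring with k hard registers; returns (assignment, failures)."""
    assignment, failed = {}, []
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {assignment[n] for n in graph[node] if n in assignment}
        free = [r for r in range(k) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            failed.append(node)
    return assignment, failed

def spill(insns, victim, fresh):
    """Give every reference of `victim` its own short-lived reload temporary."""
    out = []
    for dst, srcs in insns:
        if victim in srcs:
            tmp = f"reload{next(fresh)}"
            out.append((tmp, ()))        # load from the stack slot
            srcs = tuple(tmp if s == victim else s for s in srcs)
        if dst == victim:
            tmp = f"reload{next(fresh)}"
            out.append((tmp, srcs))      # compute into the temporary
            out.append((None, (tmp,)))   # store to the stack slot
        else:
            out.append((dst, srcs))
    return out

def allocate(insns, k):
    """Color, spill one pseudo on failure, rebuild the graph, and try again."""
    fresh = count()
    for passes in count(1):
        graph = conflict_graph(insns)
        assignment, failed = color(graph, k)
        if not failed:
            return insns, assignment, passes
        spillable = [n for n in graph if not n.startswith("reload")]
        if not spillable:
            raise RuntimeError("too few hard registers even after spilling")
        # Toy heuristic: spill the pseudo with the most conflicts.
        victim = max(sorted(spillable), key=lambda n: len(graph[n]))
        insns = spill(insns, victim, fresh)

if __name__ == "__main__":
    program = [("a", ()), ("b", ()), ("c", ()), ("d", ("a", "b")),
               ("e", ("c", "d")), ("f", ("a", "e")), (None, ("f", "b"))]
    _, regs, passes = allocate(program, 3)
    print(passes, regs)
```

Each pass rebuilds the conflict graph from scratch, which is where the cost of iterating comes from. The reload temporaries live only between a load and its use (or a definition and its store), so the graphs in later passes have fewer conflicts than the first one.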

Actually, I implemented something analogous in the YARA project (it worked without reload) but only for x86/x86_64. I did not touch the address displacement problem and a lot of others. YARA ignored a lot of machine-dependent macros used by reload. So I got the idea that it would take a lot of time to implement this approach for all targets. That is why I wrote that removing reload is a bigger project than IRA.

There was talk of doing instruction selection prior to allocation at the summit a few years back. I never knew what happened to that idea. It's never been absolutely clear to me how this would work given in our register class based world, but if someone could walk me through how it was supposed to work/help it would be greatly appreciated.

I have some ideas which I'd like to try after I am done with IRA and live range shrinkage in insn scheduling before RA. I'd like to try full code selection before RA. It means we would know before RA exactly which constraints will be used for each operand (actually Andrew MacLeod expressed this idea in his RABLE proposal). I don't know how productive it could be, because I see that in some situations code selection at a late stage (in reload) can be a win (like choosing between x86 lea and add), but it could make the proposed spiller and RA more accurate. I think it is worth trying regardless of the result. I'd like to do it in the combiner, or more accurately in its replacement (based on modern minimal cost pattern covering, which means machine description changes showing the cost for each possible constraint combination of an insn, but that is another story).

There was another initiative which attacked the horrid reload inheritance code and replaced it with a basic dependency graph. Ideally I'd like a reload pass which didn't need any of the reload inheritance stuff. Reload inheritance and the other optimizers are just masking the problem that reload generates horrid code once we start spilling. If we're going to have reload inheritance, I'd certainly want a real dependency graph rather than the "time" stuff we do now.

I absolutely agree that reload inheritance complicates reload too much. It should be and probably could be done even globally as an independent optimization.

Some have stated that they want to see reload go away. While I'd like that too, I'll state again that I'd be happy with a vastly simpler reload, preferably one that doesn't reload too often for common cases and as a result we don't have to work so damn hard to optimize the code it creates.


I suspect the first thing we need to do is get a reasonable idea of what triggers reloads these days:

I have some guesses after working a lot on IRA over the last few years.

Are we typically dealing with constraint mismatches within register classes?

It is not a rare event for some architectures with irregular register files like x86.

Is it due to operands living in memory and needing to be moved into registers?

That is the most frequent case, especially for x86/x86_64.

Is it secondary reloads because of out of range addresses?

It is a rare event for most architectures. But there are a few nasty architectures (such as sh or mcore) with tiny displacements where it is a very important problem.

Secondary memory?


I don't think it is a frequent event.

