This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Register Allocation issues with microblaze-elf

On 02/14/2013 02:41 PM, Michael Veksler wrote:


It sounds that your model requires many 0,1 variables. If it is so, then maybe SAT
is a better choice than ILP.

Yes, it is binary ILP.

I don't see any progress, ILP solver may be ten times faster but they still have exponential complexity. I don't know any industrial compiler which uses exact solver for RA (RA in GCC is even more complicated as it requires code selection too).
But the solver need not be exact. ILP uses branch and bound, which can be
interrupted after some time giving the best solution so far. The best solution,
so far, could be good enough for RA. I have seen huge SAT instances solved in
a fraction of a second. These problems have a very clear structure that
the solver takes advantage of, but why the RA problem should be different?
As I wrote, I used this approach too. By the way, I used GLPK as it is pretty good to affect its the solving process (when to stop, what to consider first, in what order to consider sub-problems and so on). Although it might be not the fastest open source MILP solver.

So why ILP/SAT/CSP aren't used in RA? I don't think they will work much slower than
what RA does today ( they will be somewhat slower but not much). I do believe that
there is a good chance to get much better results from RA with these technologies.

You know, some LLVM guys had the same thoughts and as remember they had PBQP (I tried this too) based RA (they had too many different RAs) and now they have finished with just a good heursitics RA which works best.
I don't how PBQP works with RA, but if it uses branch & bound over integer variables
then it may have performance issues there.
As I remember correctly a person who invented this, try to promote it in LLVM.
I'd like to invest some time into a feasibility check, if someone is willing to work
with me on modeling the RA problem (or at least several examples, such as the above).

Good luck with that. I am serious.
I need some help to get a basic model, even a hand-written model for a given
input to RA just as example (before diving deeper into it). I'd be happy if you
or somebody else can guide me through. At least, where should I look in the
source of GCC to figure some of these things out, or where can I find some
documentation/papers/presentations on RA and its input data-structure
(and output requirements).

There are tons of articles about optimal RA (less for optimal insn scheduling).

You could start with Goodwin/Wilken or Appel/George ILP approach. Add choosing insn alternatives (0/1 var for each alternative of each insn and constraints for using only alternative for each insn). That is specific for GCC as GCC partially does code selection in RA. That will define possible sets of registers or memory for pseudos which are operands of the insn. You can add cost for each alternative (as RTL insn alternatives can represent different insns with different execution time) to cost function. If you use Appel/George approach, add constraints and cost function part for coalescing and rematerialization. If you want to go further you could add scheduling problem too because RA and insn scheduling have conflicting goals.

That will be probably most general base problem. Now you can simplify this model in different ways. But don't simplify it to much as ,for example, just solving register allocation (assignment) with ILP has no sense as graph coloring heuristics based on Kempe's criterion of trivial colorability solves this problem optimally probably in 99% cases.

Please don't forget that even optimal solution can work worse then heuristics one as ins execution frequency evaluation is approximate. Some heuristics work very well with uncertain or approximate execution frequencies. Profile based compilation in some cases is not possible or very inconvenient. I saw other cases when optimal solution even with profile based execution frequencies generates worse cade as the model probably does not take generated memory foot-prints into account.

The following articles (presentations) could help you to understand GCC RA:

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]