This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Register Allocation issues with microblaze-elf


On 02/14/2013 06:31 PM, Vladimir Makarov wrote:
On 02/14/2013 03:46 AM, Michael Veksler wrote:
[snip]
I am reading this thread and getting more and more puzzled. The RA stuff is very complicated,
having many constraints and many dependencies with other passes. Taking this into
account, it seems that no heuristic algorithm can even get close to an optimal register
allocation. A heuristic algorithm can't take all effects and side-effects into account
simultaneously.


Considering all that, why GCC does not use generic optimization algorithms for RA?
A generic optimization can take all issues into account, simultaneously. I am talking
about ILP/MIP (Integer Linear Programming/Mixed IP), SAT and CSP [1] (Constraint
Satisfaction Problem). There has been a lot of progress in these areas, and the
solvers are much faster (orders of magnitude) than they were 10 years ago.


Actually, I tried ILP solver first time in 2003. I am returning to this about every 3 years and every time I fail to produce something useful (at least an acceptable in some cases solution which makes compiler only ten times slower). I've tried a lot of models. From more sophisticated ones to simpler ones, pure solvers or some hybrids of heuristics and ILP solutions. All solutions were too slow or generated not better code.
In many cases ILP is faster than CP (Constraint Programming), but there are quite a few cases when CP and SAT are much faster. Speed issues with modeling RA as ILP do not rule out neither SAT nor CP.

Last time, I had hope to use massive parallelism of GPUs for this. After investing of much my time to learning simplex algorithm theory (a base for ILP solvers), I found that simple tableu-based algorithm is parallelized very well but for RA tasks even using 1000 GPU elements is still behind revised simplex algorithm which is not parallelized. GPUs are not good for very sparse matrix problems which is the case of RA problems.
More and more it sounds like a CP or SAT.

Still some important RA problems (like rematerialization) is very hard to describe by ILP.
It sounds that your model requires many 0,1 variables. If it is so, then maybe SAT
is a better choice than ILP.

I don't see any progress, ILP solver may be ten times faster but they still have exponential complexity. I don't know any industrial compiler which uses exact solver for RA (RA in GCC is even more complicated as it requires code selection too).
But the solver need not be exact. ILP uses branch and bound, which can be
interrupted after some time giving the best solution so far. The best solution,
so far, could be good enough for RA. I have seen huge SAT instances solved in
a fraction of a second. These problems have a very clear structure that
the solver takes advantage of, but why the RA problem should be different?
So why ILP/SAT/CSP aren't used in RA? I don't think they will work much slower than
what RA does today ( they will be somewhat slower but not much). I do believe that
there is a good chance to get much better results from RA with these technologies.


You know, some LLVM guys had the same thoughts and as remember they had PBQP (I tried this too) based RA (they had too many different RAs) and now they have finished with just a good heursitics RA which works best.
I don't how PBQP works with RA, but if it uses branch & bound over integer variables
then it may have performance issues there.
I'd like to invest some time into a feasibility check, if someone is willing to work
with me on modeling the RA problem (or at least several examples, such as the above).


Good luck with that. I am serious.
I need some help to get a basic model, even a hand-written model for a given
input to RA just as example (before diving deeper into it). I'd be happy if you
or somebody else can guide me through. At least, where should I look in the
source of GCC to figure some of these things out, or where can I find some
documentation/papers/presentations on RA and its input data-structure
(and output requirements).


Michael


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]