Register Allocation issues with microblaze-elf

Thu Feb 14 19:41:00 GMT 2013

On 02/14/2013 06:31 PM, Vladimir Makarov wrote:
> On 02/14/2013 03:46 AM, Michael Veksler wrote:
[snip]
>> I am reading this thread and getting more and more puzzled. The RA 
>> stuff is very complicated,
>> having many constraints and many dependencies with other passes. 
>> Taking this into
>> account, it seems that no heuristic algorithm can even get close to 
>> an optimal register
>> allocation. A heuristic algorithm can't take all effects and 
>> side-effects into account
>> simultaneously.
>>
>> Considering all that, why GCC does not use generic optimization 
>> algorithms for RA?
>> A generic optimization can take all issues into account, 
>> simultaneously. I am talking
>> about ILP/MIP (Integer Linear Programming/Mixed IP), SAT and CSP [1] 
>> (Constraint
>> Satisfaction Problem). There has been a lot of progress in these 
>> areas, and the
>> solvers are much faster (orders of magnitude) than they were 10 years 
>> ago.
>>
> Actually, I tried ILP solver first time in 2003.  I am returning to 
> this about every 3 years and every time I fail to produce something 
> useful (at least an acceptable in some cases solution which makes 
> compiler only ten times slower).  I've tried a lot of models.  From 
> more sophisticated ones to simpler ones, pure solvers or some hybrids 
> of heuristics and ILP solutions.  All solutions were too slow or 
> generated not better code.
In many cases ILP is faster than CP (Constraint Programming), but there 
are quite a few cases when CP and SAT are much faster. Speed issues with 
modeling RA as ILP do not rule out neither SAT nor CP.
>
> Last time, I had hope to use massive parallelism of GPUs for this. 
> After investing of much my time to learning simplex algorithm theory 
> (a base for ILP solvers), I found that simple tableu-based algorithm 
> is parallelized very well but for RA tasks even using 1000 GPU 
> elements is still behind revised simplex algorithm which is not 
> parallelized.  GPUs are not good for very sparse matrix problems which 
> is the case of RA problems.
More and more it sounds like a CP or SAT.
>
> Still some important RA problems (like rematerialization) is very hard 
> to describe by ILP.
It sounds that your model requires many 0,1 variables. If it is so, then 
maybe SAT
is a better choice than ILP.
>
> I don't see any progress, ILP solver may be ten times faster but they 
> still have exponential complexity.  I don't know any industrial 
> compiler which uses exact solver for RA (RA in GCC is even more 
> complicated as it requires code selection too).
But the solver need not be exact. ILP uses branch and bound, which can be
interrupted after some time giving the best solution so far. The best 
solution,
so far, could be good enough for RA. I have seen huge SAT instances 
solved in
a fraction of a second. These problems have a very clear structure that
the solver takes advantage of,  but why the RA problem should be different?
>> So why ILP/SAT/CSP aren't used in RA? I don't think they will work 
>> much slower than
>> what RA does today ( they will be somewhat slower but not much). I do 
>> believe that
>> there is a good chance to get much better results from RA with these 
>> technologies.
>>
> You know, some LLVM guys had the same thoughts and as remember they 
> had PBQP (I tried this too) based RA (they had too many different RAs) 
> and now they have finished with just a good heursitics RA which works 
> best.
I don't how PBQP works with RA, but if it uses branch & bound over 
integer variables
then it may have performance issues there.
>> I'd like to invest some time into a feasibility check, if someone is 
>> willing to work
>> with me on modeling the RA problem (or at least several examples, 
>> such as the above).
>>
> Good luck with that.  I am serious.
I need some help to get a basic model, even a hand-written model for a given
input to RA just as example (before diving deeper into it). I'd be happy 
if you
or somebody else can guide me through. At least, where should I look in the
source of GCC to figure some of these things out, or where can I find some
documentation/papers/presentations on RA and its input data-structure
(and output requirements).

Michael