This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Live Range Splitting in Integrated Register Allocator


On 05/15/2014 03:28 AM, Ajit Kumar Agarwal wrote:
>
> On 2014-05-14, 1:33 PM, Ajit Kumar Agarwal wrote:
>
>> Hello All:
>>
>> I am planning to implement the Live range splitting based on the following cases in the Integrated Register Allocator.
>>
>> For a given Live range that spans from  from outer region to inner region of the loop. Such Live ranges which are LiveIn at the entry of the header of the Loop and Live Out at the exit of the loop but there are no references inside the  Loop. Such Live ranges lead to unoptimal spill and fetch inside the Loop conflicting with the shorter live ranges that spans inside the Loop.
>>
>> Lets say such Live range as L1. L1 can be splitted at the Loop Boundary splitting the Live range by making a store at the header of the Loop and the Load at the exit of the Loop. This makes the Live range less conflicting with the Live ranges that are local to the Loop regions reducing the spill and Fetch inside the Loops.
>>
>>  From the code and documentation of Integrated Register Allocator following is the understanding.
>>
>> As Live range L1 is live in the outer region but as there are no reference inside the Loop region. Since the allocno for L1 for a given variable v is assigned two allocno v1 and v2 . V1 being assigned allocno for the outer region and v2 as allocno for the inner Loop region. This allows to accumulate the information from the inner loop region to outer region.
>>
>> Will the current Integrated Register Allocator will consider the Live range L1 as Live inside the Loop and outer region? If Yes then there will be conflicting with the Live ranges that are local to the Loop region leading to spill and fetch inside the Loop.  If the v1 and v2 allocno are created v1 for the outer region and v2 for the inner region then there will v2 will be conflicting the local live ranges inside the Loop region and v1 will be conflicting with the Live ranges of the outer regions.  This is how its been considered as Live range splitting at the Loop Boundary for the Live range that spans inside the Loop but not not being referenced?
>>
>> If Such cases are not being considered in the Integrated Register Allocator, then it will be useful to implement such cases in IRA which will be benefitted the microblaze target.
>>
>> Please let me know what do you think.
>>
>>> Allocno v2 corresponding to live range inside the loop has a very small cost for spilling therefore it will be spilled if we still need registers to pseudos local to the loop.  If allocno v1 >>corresponding live ranges outside the loop *and* inside the loop gets a hard register, we will have live range splitting as you propose.  So I do not see a necessity for the optimization >>you propose.
> If the allocno v1 does n't get the hard register and become the candidate of spill which is Live that spans the Loop but not reference inside the Loop, it will lead to spills inside the loop which become bottleneck as performance is concerned. Does this case also handled in the IRA? With my proposal the load and store will be moved outside the loop instead of having it inside the Loop for such case.
>
If v1 does not get a hard register (and v2 most probably too), than
there will be no any additional insns because v1 and v2 most probably
gets the same memory.  If v1 gets a hard register and v2 does not, then
we have store on the loop enter and loads on the loop exits (the
store/loads are not inside the loop). 
>>> Moreover my experience shows that making a lot of explicit transformations (e.g. proposed splitting) even if we have transformations to undo them (e.g. coalescing) results in worse >>code. 
>>> The explicit transformations should be as minimal as possible during RA in a good register allocator.  So I guess the optimization you propose will actually results in a worse code.  >>Although I might be wrong because it is always hard to predict the result of heuristic optimizations.
> There is a paper on Improved Passive Splitting which does take care of Splitting at Loop boundaries for the Live range that spans the Loop but not referenced.
>
> Improved Passive Splitting (2005) by Keith D. Cooper , Jason Eckhardt. This paper uses the heuristics and claim to reduce the spill and fetch for the case I propose. Also claims to generate the better code.
>
I read this article long ago (actually I worked with Jason for a few
years until he left us to Rice University post-graduated school). 
Usually heuristic optimizations (including RA) are very sensitive to
compiler environment and overall design.  So the effect of algorithm
could be quite different in different compiler environment.  Another
analogous approach is Peter Begner's one.  The both is based on
classical Chaitin-Briggs RA good for regular file architectures.  IRA is
different.  It is a regional RA (analogous to Callahan-Koblentz).  IRA
was also designed to deal with irregular register files and as any
regional RA deals with splitting on region borders by their original
design.

You can read an article
http://vmakarov.fedorapeople.org/vmakarov-submission-cgo2008.pdf which
describes IRA design.  Although this article is old and does not contain
info how IRA handles with irregular files, it is still a good reading to
understand IRA.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]