This is the mail archive of the
mailing list for the GCC project.
RE: Live Range Splitting in Integrated Register Allocator
- From: Ajit Kumar Agarwal <ajit dot kumar dot agarwal at xilinx dot com>
- To: Vladimir Makarov <vmakarov at redhat dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Cc: Michael Eager <eager at eagercon dot com>, Vinod Kathail <vinodk at xilinx dot com>, Vidhumouli Hunsigida <vidhum at xilinx dot com>, Nagaraju Mekala <nmekala at xilinx dot com>
- Date: Thu, 15 May 2014 07:28:08 +0000
- Subject: RE: Live Range Splitting in Integrated Register Allocator
- Authentication-results: sourceware.org; auth=none
- Authentication-results: spf=pass (sender IP is 126.96.36.199) smtp dot mailfrom=ajit dot kumar dot agarwal at xilinx dot com;
- References: <0c7a1ace-d040-4c6f-a66a-298b44ea89aa at BN1BFFO11FD057 dot protection dot gbl> <53743A50 dot 1060207 at redhat dot com>
On 2014-05-14, 1:33 PM, Ajit Kumar Agarwal wrote:
> Hello All:
> I am planning to implement the Live range splitting based on the following cases in the Integrated Register Allocator.
> For a given Live range that spans from from outer region to inner region of the loop. Such Live ranges which are LiveIn at the entry of the header of the Loop and Live Out at the exit of the loop but there are no references inside the Loop. Such Live ranges lead to unoptimal spill and fetch inside the Loop conflicting with the shorter live ranges that spans inside the Loop.
> Lets say such Live range as L1. L1 can be splitted at the Loop Boundary splitting the Live range by making a store at the header of the Loop and the Load at the exit of the Loop. This makes the Live range less conflicting with the Live ranges that are local to the Loop regions reducing the spill and Fetch inside the Loops.
> From the code and documentation of Integrated Register Allocator following is the understanding.
> As Live range L1 is live in the outer region but as there are no reference inside the Loop region. Since the allocno for L1 for a given variable v is assigned two allocno v1 and v2 . V1 being assigned allocno for the outer region and v2 as allocno for the inner Loop region. This allows to accumulate the information from the inner loop region to outer region.
> Will the current Integrated Register Allocator will consider the Live range L1 as Live inside the Loop and outer region? If Yes then there will be conflicting with the Live ranges that are local to the Loop region leading to spill and fetch inside the Loop. If the v1 and v2 allocno are created v1 for the outer region and v2 for the inner region then there will v2 will be conflicting the local live ranges inside the Loop region and v1 will be conflicting with the Live ranges of the outer regions. This is how its been considered as Live range splitting at the Loop Boundary for the Live range that spans inside the Loop but not not being referenced?
> If Such cases are not being considered in the Integrated Register Allocator, then it will be useful to implement such cases in IRA which will be benefitted the microblaze target.
> Please let me know what do you think.
>>Allocno v2 corresponding to live range inside the loop has a very small cost for spilling therefore it will be spilled if we still need registers to pseudos local to the loop. If allocno v1 >>corresponding live ranges outside the loop *and* inside the loop gets a hard register, we will have live range splitting as you propose. So I do not see a necessity for the optimization >>you propose.
If the allocno v1 does n't get the hard register and become the candidate of spill which is Live that spans the Loop but not reference inside the Loop, it will lead to spills inside the loop which become bottleneck as performance is concerned. Does this case also handled in the IRA? With my proposal the load and store will be moved outside the loop instead of having it inside the Loop for such case.
>>Moreover my experience shows that making a lot of explicit transformations (e.g. proposed splitting) even if we have transformations to undo them (e.g. coalescing) results in worse >>code.
>>The explicit transformations should be as minimal as possible during RA in a good register allocator. So I guess the optimization you propose will actually results in a worse code. >>Although I might be wrong because it is always hard to predict the result of heuristic optimizations.
There is a paper on Improved Passive Splitting which does take care of Splitting at Loop boundaries for the Live range that spans the Loop but not referenced.
Improved Passive Splitting (2005) by Keith D. Cooper , Jason Eckhardt. This paper uses the heuristics and claim to reduce the spill and fetch for the case I propose. Also claims to generate the better code.
>>What is really missed in RA, it is a good splitting on BB boundaries. I have plans to try this as a part of more common pass to decrease register pressure on which I'll start work soon.
>>In any way thanks for the proposal. You are free try it to confirm or reject my prediction. Unfortunately, that is the only way to be sure about the result.
Thanks & Regards