This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Reducing Register Pressure based on Instruction Scheduling and Register Allocator!!

Hello All:

I was looking further the aspect of reducing register pressure based on Register Allocation and Instruction Scheduling and the
Following observation being made on reducing register pressure based on the existing papers on reducing register pressure 
Based on scheduling approach.

Does the following aspect of reducing register pressure is already been implemented in GCC?

Minimum register usage through better Instruction scheduling

Instruction scheduling play an important role increase or decrease of register pressure. If the instruction scheduler is done before 
register allocation the register pressure will increase whereas if the register allocation is done before instruction scheduling the false
antidependence will be created. For the out of order superscalar process the scheduling will be done in such a way that register
pressure will be decreased. To achieve ILP the register pressure increases but in superscalar processor the register pressure is
important. To achieve the decrease of register pressure the instruction scheduling for ILP should also consider the register pressure.

Govindarajan proposed a scheme where the list scheduling is modified to have one of the constraint's as available register
along with latency to perform the instruction scheduling. From the data dependency graph of the given basic block the
chains are created based on depth first search to form different chains. And one register is allocated for each chain in order
to have a baton process where the def is used in next instruction and the next instruction def  will be used in next to next 
instruction. A given register is assigned to def and use and the same register is used for next def and use, which makes it
to have one register for each chain, The main challenges is to assign the same register for the different chains if the chains
don't overlap.

The list scheduling is modified to have a parameter as list of available colors and release is the release of available colors.
>From the ready queue the nodes in the DAG is removed the live ranges that terminate at that node is released and as
added as available colors. This ensures the register pressure will not increase by the schedule which was derived from 
Graph coloring register allocator.

Register pressure sensitivity based Instruction scheduling

The superscalar OOO processor does the dynamic instruction scheduling based on the out of order on instruction window 
and register renaming to achieve ILP. The main challenges is to achieve a ILP schedule and from the ILP schedule, it generates 
the instruction sequence with sensitivity to registers is called linearization. The ILP schedule groups are formed. The groups are,

1. If the instruction selected is s all the instruction that are scheduled before s are formed with one group.

2. All the instruction that are scheduled after s are formed with another group.

The linearization algorithm uses can test and must can test. The can test if the selected instruction is s , then all the instruction 
that are formed with Group1 and are not scheduled but in the ready list are generated with W-1. If the selected instruction is 
's' is generated at index i  ,all the instruction are scheduled before s are generated with i+W-1 where W is the size of register
 window. The must can test if the selected instruction is generated at index i, then all the instruction that are scheduled 
after s are generated with index i+W-1. If the instruction scheduled after s are not in window the ILP won't be achieved.

The linearization algorithm checks for can test if it satisfies then the instruction is generated. If not satisfied it will be 
checked with must can test and if it satisfies it will be generated.

The register pressure should not be increased during linearization as there is a chance of register pressure getting increased 
during scheduling. To achieve this, the priority is assigned for each instruction that are in the ready list. If the instruction 
selected all the use of the variables in the instruction are scheduled the priority is assigned as 2. if an instruction is dependent
 on i and i is dependent on another instruction the Live range is too long and the priority is less and assigned to be 1. In the
 ready list it selects the instruction with high priority which ensures the register pressure doesn't increases.

There are approaches where the register pressure is high in the basic block the instruction scheduler will be phased after
 Register allocation, otherwise the instruction scheduler is done before the register allocator. This is how in gcc there is
 an instruction scheduler before register allocator and after the register allocator.

Integrated Code Scheduling and Register Allocation with minimum register pressure

The integrated code scheduling where there is a prepass and post pass instruction scheduler. The prepass scheduler
increase the overuse of the registers and thus increase the register pressure. Whereas the post pass scheduling is
done after the register allocation and the overuse of register is less and thus perform schedule with reduced spill
and fetch. The prepass instruction scheduler achieves more parallelism and has high degree of parallelism and very
much useful when the register pressure is less and if the register pressure is high the post pass scheduling is often 
used. HSU, proposed the integrated scheduling where the code scheduling with ILP is CSP and CSR is the scheduling 
with minimum registers usage.

>From the leader set, the nodes in the DAG which doesn't have predecessors and the ready queue are the queues with
nodes whose predecessors has been scheduled. From the ready set or the leader is selected and the AVAILREG is greater
than threshold then a node from the ready set is selected and the scheduling done is CSP. When the registers become 
high with the live registers are less than threshold then the CSR Scheduling is done. The nodes from the ready set either 
generates a live register or frees the registers. In the CSR approach the nodes from the ready which frees the atmost registers
are selected. If there are more than one nodes which frees the registers, the nodes with highest cumulative cost is selected.

Global liveness analysis is performed for Liveness of the variables whereas the Reference counting is used to determine
when the variables are dead and can be freed. For some architectures the scheduling with greater pipeline depth and
higher degree of parallelism is more important with more interlock dependency than the spill and fetch. In that
cases if the register pressure is too high and the dependence node with high interlock dependencies, then one of the 
register is spilled and the thread continues with the CSP which schedules the independent nodes between the nodes 
with high interlock dependencies.

Please let me know what do you think.

Thanks & Regards

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]