This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[ira] More comments and a change in cfgloop reg pressure estimation for IRA
- From: Vladimir Makarov <vmakarov at redhat dot com>
- To: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 07 Apr 2008 20:39:09 -0400
- Subject: [ira] More comments and a change in cfgloop reg pressure estimation for IRA
This patch adds IRA overview and changes estimation of register
pressure cost for IRA in cfgloopanal.c.
2008-04-07 Vladimir Makarov <vmakarov@redhat.com>
* cfgloopanal.c (estimate_reg_pressure_cost): Decrease register
pressure cost for IRA regional allocation.
* ira.c: Add IRA overview.
Index: cfgloopanal.c
===================================================================
--- cfgloopanal.c (revision 133840)
+++ cfgloopanal.c (working copy)
@@ -373,25 +373,33 @@ init_set_costs (void)
unsigned
estimate_reg_pressure_cost (unsigned n_new, unsigned n_old)
{
+ unsigned cost;
unsigned regs_needed = n_new + n_old;
- if (flag_ira && (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
- || flag_ira_algorithm == IRA_ALGORITHM_MIXED)
- && number_of_loops () <= (unsigned) IRA_MAX_LOOPS_NUM)
- /* Let IRA itself to deal with high register pressure. */
- return 0;
-
/* If we have enough registers, we should use them and not restrict
the transformations unnecessarily. */
if (regs_needed + target_res_regs <= target_avail_regs)
return 0;
- /* If we are close to running out of registers, try to preserve them. */
if (regs_needed <= target_avail_regs)
- return target_reg_cost * n_new;
-
- /* If we run out of registers, it is very expensive to add another one. */
- return target_spill_cost * n_new;
+ /* If we are close to running out of registers, try to preserve
+ them. */
+ cost = target_reg_cost * n_new;
+ else
+ /* If we run out of registers, it is very expensive to add another
+ one. */
+ cost = target_spill_cost * n_new;
+
+ if (flag_ira && (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
+ || flag_ira_algorithm == IRA_ALGORITHM_MIXED)
+ && number_of_loops () <= (unsigned) IRA_MAX_LOOPS_NUM)
+ /* IRA regional allocation deals with high register pressure
+ better. So decrease the cost (to do more accurate the cost
+ calculation for IRA, we need to know how many registers lives
+ through the loop transparently). */
+ cost /= 2;
+
+ return cost;
}
/* Sets EDGE_LOOP_EXIT flag for all loop exits. */
Index: ira.c
===================================================================
--- ira.c (revision 133968)
+++ ira.c (working copy)
@@ -19,38 +19,259 @@ You should have received a copy of the G
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */
-/* The integrated register allocator (IRA) is called integrated
- because register coalescing, register live range splitting, and
- choosing a better hard register are done on-the-fly during
- coloring. Register coalescing and choosing a cheaper hard register
- is done by hard register preferencing during hard register
- assigning. The live range splitting is a byproduct of the regional
- register allocation.
-
- The regional allocation is top-down process. The first we do
- allocation for all function then we improve it for loops then their
- subloops and so on. To reduce register shuffling, the same
- mechanism of hard register preferencing is used. This approach
- works as good as Callahan-Koblentz algorithm but it is simpler. We
- use Chaitin-Briggs coloring for each region (loop or all function).
- If pseudo-registers got different location on loop borders we
- rename them inside the loop and generate pseudo-register move
- insns. Some optimizations (like removing redundant stores, moving
- register shuffling to less frequent points, and code duplication
- reducing) to minimize effect of register shuffling are done.
-
- If we don't improve register allocation for loops (or there are no
- loops at all) we get classic Chaitin-Briggs coloring (only instead
- of separate pass of coalescing, we use hard register preferencing).
- In any case, before the reload work, we have one region IRA
- internal representation.
-
- IRA also has better integration with the reload pass than the old
- register allocator. Pseudo-registers spilled by IRA or the reload
- have still a chance to get hard-registers when the reload evicts
- some pseudo-registers from hard-registers. IRA helps to choose
- better pseudos for spilling based on their live ranges and to
- coalesce stack slots allocated for the spilled pseudo-registers.
+/* The integrated register allocator (IRA) is a
+ regional register allocator performing graph coloring on a top-down
+ traversal of nested regions. Graph coloring in a region is based
+ on Chaitin-Briggs algorithm. It is called integrated because
+ register coalescing, register live range splitting, and choosing a
+ better hard register are done on-the-fly during coloring. Register
+ coalescing and choosing a cheaper hard register is done by hard
+ register preferencing during hard register assigning. The live
+ range splitting is a byproduct of the regional register allocation.
+
+ Major IRA notions are:
+
+ o *Region* is a part of CFG where graph coloring based on
+ Chaitin-Briggs algorithm is done. IRA can work on any set of
+ nested CFG regions forming a tree. Currently the regions are
+ the entire function for the root region and natural loops for
+ the other regions. Therefore data structure represnting a
+ region is called loop_tree_node.
+
+ o *Cover class* is a register class belonging to a set of
+ non-intersecting register classes containing all of the
+ hard-registers available for register allocation. The set of
+ all cover classes for a target is defined in the corresponding
+ machine-description file according some criteria. Such notion
+ is needed because Chaitin-Briggs algorithm works on
+ non-intersected register classes.
+
+ o *Allocno* represents the live range of a pseudo-register in a
+ region. Besides the obvious attributes like the corresponding
+ pseudo-register number, cover class, conflicting allocnos and
+ conflicting hard-registers, there are a few allocno attributes
+ which are important for understanding the allocation algorithm:
+
+ - *Live ranges*. This is a list of ranges of *program points*
+ where the allocno lives. Program points represent places
+ where a pseudo can be born or become dead (there are
+ approximately two times more program points than the insns)
+ and they are represented by integers starting with 0. The
+ live ranges are used to find conflicts between allocnos of
+ different cover classes. They also play very important role
+ for transformation of IRA internal representation for several
+ regions into one region representation. The later is used
+ during the reload pass work because each allocno represents
+ all corresponding pseudo-register.
+
+ - *Hard-register costs*. This is a vector of size equal to the
+ number of available hard-registers of the allocno's cover
+ class. The cost of a callee-clobbered hard-register for an
+ allocno is increased by the cost of save/restore code around
+ the calls through the given allocno's life. If the allocno
+ is a move instruction operand and another operand is a
+ hard-register of the allocno's cover class, the cost of the
+ hard-register is decreased by the move cost.
+
+ When an allocno is assigned, the hard-register with minimal
+ full cost is used. Initially, a hard-register's full cost is
+ the corresponding value from the hard-register's cost vector.
+ If the allocno is connected by a *copy* (see below) to
+ another allocno which has just received a hard-register, the
+ cost of the hard-register is decreased. Before choosing a
+ hard-register for an allocno, the allocno's current costs of
+ the hard-registers are modified by the conflict hard-register
+ costs of all of the conflicting allocnos which are not
+ assigned yet.
+
+ - *Conflict hard-register costs*. This is a vector of the same
+ size as the hard-register costs vector. To permit an
+ unassigned allocno to get a better hard-register, IRA uses
+ this vector to calculate the final full cost of the
+ available hard-registers. Conflict hard-register costs of an
+ unassigned allocno are also changed with a change of the
+ hard-register cost of the allocno when a copy involving the
+ allocno is processed as described above. This is done to
+ show other unassigned allocnos that a given allocno prefers
+ some hard-registers in order to remove the move instruction
+ corresponding to the copy.
+
+ o *Cap*. If a pseudo-register does not live in a region but
+ lives in a nested region, IRA creates a special allocno called
+ a cap in the outer region. A region cap is also created for a
+ subregion cap.
+
+ o *Copy*. Allocnos can be connected by copies. Copies are used
+ to modify hard-register costs for allocnos during coloring.
+ Such modifications reflects a preference to use the same
+ hard-register for the allocnos connected by copies. Usually
+ copies are created for move insns (in this case it results in
+ register coalescing). But IRA also creates copies for operands
+ of an insn which should be assigned to the same hard-register
+ due to constraints in the machine description (it usually
+ results in removing a move generated in the reload to satfisfy
+ the constraints) and copies referring to the allocno which is
+ the output operand of an instruction and the allocno which is
+ an input operand dying in the instruction (creation of such
+ copies results in less register shuffling). IRA *does not*
+ create copies between the same register allocnos from different
+ regions because we use another technique for propagating
+ hard-register preference on the borders of regions.
+
+ Allocnos (including caps) for the upper region in the region tree
+ *accumulate* information important for coloring from allocnos with
+ the same pseudo-register from nested regions. This includes
+ hard-register and memory costs, conflicts with hard-registers,
+ allocno conflicts, allocno copies and more. *Thus, attributes for
+ allocnos in a region have the same values as if the region had no
+ subregions*. It means that attributes for allocnos in the
+ outermost region corresponding to the function have the same values
+ as though the allocation used only one region which is the entire
+ function. It also means that we can look at IRA work as if the
+ first IRA did allocation for all function then it improved the
+ allocation for loops then their subloops and so on.
+
+ IRA major passes are:
+
+ o Building IRA internal representation which consists of the
+ following subpasses:
+
+ * First, IRA builds regions and creates allocnos (file
+ ira-build.c) and initializes most of their attributes.
+
+ * Then IRA finds a cover class for each allocno and calculates
+ its initial cost of memory and each hard-register of its
+ cover class (file ira-cost.c).
+
+ * IRA creates all caps (file ira-build.c).
+
+ * IRA creates live ranges of each allocno, calulates register
+ pressure for each cover class in each region, sets up
+ conflict hard registers for each allocno and call insns the
+ allocno lives through (file ira-lives.c).
+
+ * Having live-ranges of allocnos and their cover classes, IRA
+ creates conflicting allocnos of the same cover class for each
+ allocno. Conflicting allocnos are stored as a bit vector or
+ array of pointers to the conflicting allocnos whatever is more
+ profitable (file ira-conflicts.c). At this point IRA creates
+ allocno copies and after that accumulates allocno info
+ (conflicts, copies, call insns lived through) in upper level
+ allocnos from lower levels ones.
+
+ * Having all conflicts and other info for regular allocnos, IRA
+ propagates this info to caps (file ira-build.c) and modifies
+ costs of callee-clobbered hard-registers (file ira-costs.c).
+
+ o Coloring. Now IRA has all neccessary info to start graph coloring
+ process. It is done in each region on top-down traverse of the
+ region tree (file ira-color.c). There are following subpasses:
+
+ * Optional agressive coaleascing of allocnos in the region.
+
+ * Putting allocnos onto the coloring stack. IRA uses Briggs
+ optimistic coloring which is a major improvement over
+ Chaitin's coloring. Therefore IRa does not spill allocnos at
+ this point. There is some freedom in order of putting
+ allocnos on the stack which can affect the final result of
+ the allocation. IRA uses some heuristics to improve the order.
+
+ * Popping the allocnos from the stack and assigning them hard
+ registers. If IRA can not assign a hard register to an
+ allocno and allocno is coalesced, IRA undoes coalescing and
+ put the uncoalesced allocnos onto the satck in hope that some
+ such allocnos will get a hard register separetly. If IRA
+ fails to assign hard register or memory is more profitable
+ for it, IRA spills the allocno. IRA assigns allocno the
+ hard-register with minimal full allocation cost which
+ reflects the cost of usage of the hard-register for the
+ allocno and cost of usage of the hard-register for allocnos
+ conflicting with given allocno.
+
+ * After allono assigning in the region, IRA modifies hard
+ register and memory costs for the corresponding allocnos in
+ the subregions to reflect the cost of possible loads, stores,
+ or moves on the border of the region and its subregions.
+ When default regional allocation algorithm is used
+ (-fira-algorithm=mixed), IRA just propagates the assignment
+ for allocnos if the register pressure in the region for the
+ corresponding cover class is less than number of available
+ hard registers for given cover class.
+
+ o Spill/restore code moving. When IRA performs allocation
+ traversing regions in top-down order, it does not know what
+ happens below in the region tree. Therefore, sometimes IRA
+ misses opportunities to perform a better allocation. A simple
+ ooptimization tries to improve allocation in a region having
+ subregions and containing in another region. If the
+ corresponding allocnos in the subregion are spilled, it spills
+ the region allocno if it is profitable. The optimization
+ impelements a simple iterative algorithm performing profitable
+ transformations while they are still possible. It is fast in
+ practice, so there is no real need for a better time complexity
+ algorithm.
+
+ o Code change. After coloring, two allocnos representing the same
+ pseudo-register outside and inside a region respectively may be
+ assigned to different locations (hard-registers or memory). In
+ this case IRA creates and uses a new pseudo-register inside the
+ region and add code to move allocno values on the region's
+ borders. This is done during top-down traversal of the regions
+ (file ira-emit.c). In some complicated cases IRA can create a
+ new allocno to move allocno values (e.g. when a swap of values
+ stored in two hard-registers is needed). At this stage, the
+ new allocno marked as spilled. IRA still creates the
+ pseudo-register and the moves on the region borders even when
+ both allocnos were assigned to the same hard-register. If the
+ reload pass spills a pseudo-register for some reason, the
+ effect will be smaller because another allocno will still be in
+ the hard-register. In most cases, this is better then spilling
+ both allocnos. If the reload does not change the allocation
+ for the two pseudo-registers, the trivial move will be removed
+ by post-reload optimizations. IRA does not generate moves for
+ allocnos assigned to the same hard register when the default
+ regional allocation algorithm is used and the register pressure
+ in the region for the corresponding allocno cover class is less
+ than number of available hard registers for given cover class.
+ IRA also does some optimizations to remove redundant stores and
+ to reduce code duplication on the region borders.
+
+ o Flattening internal representation. After changing code, IRA
+ transforms its internal representation for several regions into
+ one region representation (file ira-build.c). This process is
+ called IR flattening. Such process is more complicated than IR
+ rebuilding would be, but is much faster.
+
+ o After IR flattening, IRA tries to assign hard registers to all
+ spilled allocnos. It is impelemented by a simple and fast
+ priority coloring algorithm (see function
+ reassign_conflict_allocnos::ira-color.c). Here new allocnos
+ created during the code change pass can be assigned to hard
+ registers.
+
+ o At the end IRA calls the reload pass. The reload pass
+ communicates with IRA through several functions in file
+ ira-color.c to improve its decisions in
+
+ * sharing stack slots for the spilled pseudos based on IRA info
+ about pseudo-register conflicts.
+
+ * reassigning hard-registers to all spilled pseudos at the end
+ of each reload iteration.
+
+ * choosing a better hard-register to spill based on IRA info
+ about pseudo-register live ranges and the register pressure
+ in places where the pseudo-register lives.
+
+ IRA uses a lot of data representing the target processors. These
+ data are initilized in file ira.c.
+
+ If function has no loops (or the loops are ignored when
+ -fira-algorithm=CB is used), we have classic Chaitin-Briggs
+ coloring (only instead of separate pass of coalescing, we use hard
+ register preferencing). In such case, IRA works much faster
+ because many things are not made (like IR flattening, the
+ spill/restore optimization, and the code change).
Literature is worth to read for better understanding the code:
@@ -69,6 +290,9 @@ along with GCC; see the file COPYING3.
o Vladimir Makarov. The Integrated Register Allocator for GCC.
+ o Vladimir Makarov. The top-down register allocator for irregular
+ register file architectures.
+
*/