This is the mail archive of the gcc-patches@gcc.gnu.org mailing list
for the GCC project.
[ira] patch solving IRA slowness in -O0 mode
- From: Vladimir Makarov <vmakarov at redhat dot com>
- To: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 02 May 2008 16:31:42 -0400
- Subject: [ira] patch solving IRA slowness in -O0 mode
Hi, this patch solves the IRA slowness at -O0 that was reported a week
ago: a 40-60% slowdown relative to the old register allocator on
all_cp2k_gfortran.f90.

Although a previous patch compressing the conflict bit table reduced
the difference to 13%, that was still unacceptable. This patch brings
IRA to the same speed as the old RA: compiling all_cp2k_gfortran.f90
with IRA takes 97.93 sec user and 121.07 sec wall time, versus 98.96
and 121.90 with the old RA at -O0 on my machine (3GHz Core 2).
To achieve this speed, IRA uses an allocation algorithm that does not
build the conflict graph. It uses only the potential-conflict info
which was introduced for compressing the conflict bit tables in
optimization mode. The generated code size at -O0 is approximately the
same (the difference is about 0.1% in #insns), which tells me that the
algorithm IRA uses for -O0 is of about the same quality as the old
RA's local allocator.
The patch was bootstrapped (with -O0 -fira -g) successfully on x86_64
and torture-tested on x86_64 in 64- and 32-bit modes.
2008-05-02 Vladimir Makarov <vmakarov@redhat.com>
* ira-conflicts.c (print_conflicts): Exclude non-allocatable hard
regs and hard regs from different cover classes.
(ira_build_conflicts): Build conflicts only for optimization mode.
* cfgloopanal.c (estimate_reg_pressure_cost): Decrease cost only
for optimization mode.
* caller-save.c (setup_save_areas, calculate_local_save_info,
save_call_clobbered_regs): Do save/restore placement optimization
and sharing stack slots only for optimization mode.
* ira-int.h (set_allocno_cover_class, ira_fast_allocation): New
prototypes.
* ira-color.c (assign_hard_reg): Clear conflicting_regs.
(allocno_assign_compare_func, ira_fast_allocation): New
functions.
* global.c (build_insn_chain): Add spilled pseudos only in
optimization mode.
* ira-emit.c (modify_move_list): Use set_allocno_cover_class.
* alias.c (nonoverlapping_memrefs_p): Check addresses only in
optimization mode.
* ira-build.c (create_allocno): Initialize
ALLOCNO_CONFLICT_HARD_REGS and ALLOCNO_TOTAL_CONFLICT_HARD_REGS by
non-allocatable hard reg set.
(set_allocno_cover_class): New function.
(create_cap_allocno): Use set_allocno_cover_class.
(ira_build): Call create_loop_tree_node_caps,
propagate_info_to_loop_tree_node_caps, and
tune_allocno_costs_and_cover_classes only in optimization mode.
* ira.c (setup_reg_renumber): Add optimize to ira_assert.
(ira): Call find_reg_equiv_invariant_const, ira_color,
sort_insn_chain, and fix_reg_equiv_init only in optimization mode.
Call ira_fast_allocation for -O0. Use right argument for reload.
* ira-costs.c (find_allocno_class_costs): Use important_classes
only for optimization mode.
(setup_allocno_cover_class_and_costs): Use
set_allocno_cover_class. Initialize cost vectors only in
optimization mode. Call process_bb_node_for_hard_reg_moves only
in optimization mode.
* reload1.c (compute_use_by_pseudos): Add optimize to ira_assert.
(reload): Sort pseudo-registers only in optimization mode.  Restore
original order for insn chain only in optimization mode.
(calculate_needs_all_insns): Call mark_memory_move_deletion only
in optimization mode.
(count_pseudo, count_spilled_pseudo): Check spilled pseudos only
in optimization mode.
(alter_reg): Share stack slots only in optimization mode.
(finish_spills): Check spilled pseudos only in optimization mode.
(emit_input_reload_insns, delete_output_reload): Call
mark_allocation_change only in optimization mode.
Index: ira-conflicts.c
===================================================================
--- ira-conflicts.c (revision 134857)
+++ ira-conflicts.c (working copy)
@@ -864,6 +864,7 @@ print_conflicts (FILE *file, int reg_p)
{
allocno_t a;
allocno_iterator ai;
+ HARD_REG_SET conflicting_hard_regs;
FOR_EACH_ALLOCNO (a, ai)
{
@@ -899,10 +900,20 @@ print_conflicts (FILE *file, int reg_p)
ALLOCNO_LOOP_TREE_NODE (conflict_a)->loop->num);
}
}
+ COPY_HARD_REG_SET (conflicting_hard_regs,
+ ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a));
+ AND_COMPL_HARD_REG_SET (conflicting_hard_regs, no_alloc_regs);
+ AND_HARD_REG_SET (conflicting_hard_regs,
+ reg_class_contents[ALLOCNO_COVER_CLASS (a)]);
print_hard_reg_set (file, "\n;; total conflict hard regs:",
- ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a));
+ conflicting_hard_regs);
+ COPY_HARD_REG_SET (conflicting_hard_regs,
+ ALLOCNO_CONFLICT_HARD_REGS (a));
+ AND_COMPL_HARD_REG_SET (conflicting_hard_regs, no_alloc_regs);
+ AND_HARD_REG_SET (conflicting_hard_regs,
+ reg_class_contents[ALLOCNO_COVER_CLASS (a)]);
print_hard_reg_set (file, ";; conflict hard regs:",
- ALLOCNO_CONFLICT_HARD_REGS (a));
+ conflicting_hard_regs);
}
fprintf (file, "\n");
}
@@ -925,14 +936,17 @@ ira_build_conflicts (void)
allocno_t a;
allocno_iterator ai;
- build_conflict_bit_table ();
- traverse_loop_tree (FALSE, ira_loop_tree_root, NULL, add_copies);
- if (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
- || flag_ira_algorithm == IRA_ALGORITHM_MIXED)
- propagate_info ();
- /* We need finished conflict table for the subsequent call. */
- remove_conflict_allocno_copies ();
- build_allocno_conflicts ();
+ if (optimize)
+ {
+ build_conflict_bit_table ();
+ traverse_loop_tree (FALSE, ira_loop_tree_root, NULL, add_copies);
+ if (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
+ || flag_ira_algorithm == IRA_ALGORITHM_MIXED)
+ propagate_info ();
+ /* We need finished conflict table for the subsequent call. */
+ remove_conflict_allocno_copies ();
+ build_allocno_conflicts ();
+ }
FOR_EACH_ALLOCNO (a, ai)
{
if (ALLOCNO_CALLS_CROSSED_NUM (a) == 0)
@@ -954,8 +968,11 @@ ira_build_conflicts (void)
no_caller_save_reg_set);
}
}
- traverse_loop_tree (FALSE, ira_loop_tree_root, NULL,
- propagate_modified_regnos);
- if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
- print_conflicts (ira_dump_file, FALSE);
+ if (optimize)
+ {
+ traverse_loop_tree (FALSE, ira_loop_tree_root, NULL,
+ propagate_modified_regnos);
+ if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
+ print_conflicts (ira_dump_file, FALSE);
+ }
}
Index: cfgloopanal.c
===================================================================
--- cfgloopanal.c (revision 134857)
+++ cfgloopanal.c (working copy)
@@ -390,8 +390,8 @@ estimate_reg_pressure_cost (unsigned n_n
one. */
cost = target_spill_cost * n_new;
- if (flag_ira && (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
- || flag_ira_algorithm == IRA_ALGORITHM_MIXED)
+ if (optimize && flag_ira && (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
+ || flag_ira_algorithm == IRA_ALGORITHM_MIXED)
&& number_of_loops () <= (unsigned) IRA_MAX_LOOPS_NUM)
/* IRA regional allocation deals with high register pressure
better. So decrease the cost (to do more accurate the cost
Index: caller-save.c
===================================================================
--- caller-save.c (revision 134857)
+++ caller-save.c (working copy)
@@ -457,7 +457,7 @@ setup_save_areas (void)
unsigned int regno = reg_renumber[i];
unsigned int endregno
= end_hard_regno (GET_MODE (regno_reg_rtx[i]), regno);
- if (flag_ira && flag_ira_ipra)
+ if (flag_ira && optimize && flag_ira_ipra)
{
HARD_REG_SET clobbered_regs;
@@ -472,7 +472,7 @@ setup_save_areas (void)
SET_HARD_REG_BIT (hard_regs_used, r);
}
- if (flag_ira && flag_ira_share_save_slots)
+ if (flag_ira && optimize && flag_ira_share_save_slots)
{
rtx insn, slot;
struct insn_chain *chain, *next;
@@ -857,7 +857,7 @@ calculate_local_save_info (void)
/* Remember live_throughout can contain spilled
registers when IRA is used. */
- if (flag_ira && r < 0)
+ if (flag_ira && optimize && r < 0)
continue;
gcc_assert (r >= 0);
nregs = hard_regno_nregs[r][PSEUDO_REGNO_MODE (regno)];
@@ -1203,7 +1203,7 @@ save_call_clobbered_regs (void)
struct insn_chain *chain, *next;
enum machine_mode save_mode[FIRST_PSEUDO_REGISTER];
- if (flag_ira && flag_ira_move_spills)
+ if (flag_ira && optimize && flag_ira_move_spills)
{
/* Do global analysis for better placement of spill code. */
alloc_aux_for_blocks (sizeof (struct bb_info));
@@ -1248,7 +1248,7 @@ save_call_clobbered_regs (void)
regno += insert_restore (chain, 1, regno, MOVE_MAX_WORDS,
save_mode);
- if (flag_ira && flag_ira_move_spills)
+ if (flag_ira && optimize && flag_ira_move_spills)
{
gcc_assert (before == regno);
save_mode[before] = VOIDmode;
@@ -1291,7 +1291,7 @@ save_call_clobbered_regs (void)
/* Remember live_throughout can contain spilled
registers when IRA is used. */
- if (flag_ira && r < 0)
+ if (flag_ira && optimize && r < 0)
continue;
gcc_assert (r >= 0);
nregs = hard_regno_nregs[r][PSEUDO_REGNO_MODE (regno)];
@@ -1343,7 +1343,7 @@ save_call_clobbered_regs (void)
remain saved. If the last insn in the block is a JUMP_INSN, put
the restore before the insn, otherwise, put it after the insn. */
- if (flag_ira && flag_ira_move_spills)
+ if (flag_ira && optimize && flag_ira_move_spills)
set_hard_reg_saved (BB_INFO_BY_INDEX (chain->block)->save_here,
BB_INFO_BY_INDEX (chain->block)->save_out_mode,
save_mode);
@@ -1356,21 +1356,22 @@ save_call_clobbered_regs (void)
regno += insert_restore (chain, JUMP_P (insn),
regno, MOVE_MAX_WORDS, save_mode);
- if (flag_ira && flag_ira_move_spills)
+ if (flag_ira && optimize && flag_ira_move_spills)
{
gcc_assert (before == regno);
save_mode[before] = VOIDmode;
}
}
- if (flag_ira && flag_ira_move_spills && next_bb_info != NULL)
+ if (flag_ira && optimize
+ && flag_ira_move_spills && next_bb_info != NULL)
set_hard_reg_saved (next_bb_info->save_in,
next_bb_info->save_in_mode, save_mode);
}
}
- if (flag_ira && flag_ira_move_spills)
+ if (flag_ira && optimize && flag_ira_move_spills)
free_aux_for_blocks ();
}
Index: ira-int.h
===================================================================
--- ira-int.h (revision 134857)
+++ ira-int.h (working copy)
@@ -301,8 +301,9 @@ struct allocno
allocnos. */
int conflict_allocnos_num;
/* Initial and accumulated hard registers conflicting with this
- allocno and as a consequences can not be assigned to the
- allocno. */
+ allocno and as a consequences can not be assigned to the allocno.
+ All non-allocatable hard regs and hard regs of cover classes
+ different from given allocno one are included in the sets. */
HARD_REG_SET conflict_hard_regs, total_conflict_hard_regs;
/* Accumulated frequency of calls which given allocno
intersects. */
@@ -835,6 +836,7 @@ extern void traverse_loop_tree (int, loo
void (*) (loop_tree_node_t),
void (*) (loop_tree_node_t));
extern allocno_t create_allocno (int, int, loop_tree_node_t);
+extern void set_allocno_cover_class (allocno_t, enum reg_class);
extern int conflict_vector_profitable_p (allocno_t, int);
extern void allocate_allocno_conflict_vec (allocno_t, int);
extern void allocate_allocno_conflicts (allocno_t, int);
@@ -887,6 +889,7 @@ extern void reassign_conflict_allocnos (
extern void initiate_ira_assign (void);
extern void finish_ira_assign (void);
extern void ira_color (void);
+extern void ira_fast_allocation (void);
/* ira-emit.c */
extern void ira_emit (int);
Index: ira-color.c
===================================================================
--- ira-color.c (revision 134857)
+++ ira-color.c (working copy)
@@ -95,6 +95,8 @@ static int pseudo_reg_compare (const voi
static int calculate_spill_cost (int *, rtx, rtx, rtx,
int *, int *, int *, int*);
+static int allocno_assign_compare_func (const void *, const void *);
+
/* Bitmap of allocnos which should be colored. */
static bitmap coloring_allocno_bitmap;
@@ -300,8 +302,7 @@ assign_hard_reg (allocno_t allocno, int
cover_class = ALLOCNO_COVER_CLASS (allocno);
class_size = class_hard_regs_num[cover_class];
mode = ALLOCNO_MODE (allocno);
- COPY_HARD_REG_SET (conflicting_regs, no_alloc_regs);
- IOR_COMPL_HARD_REG_SET (conflicting_regs, reg_class_contents[cover_class]);
+ CLEAR_HARD_REG_SET (conflicting_regs);
best_hard_regno = -1;
memset (full_costs, 0, sizeof (int) * class_size);
mem_cost = 0;
@@ -2738,3 +2739,105 @@ ira_color (void)
VEC_free (allocno_t, heap, allocno_stack_vec);
move_spill_restore ();
}
+
+
+
+/* This page contains a simple register allocator without usage of
+ allocno conflicts. This is used for fast allocation for -O0. */
+
+/* The function is used to sort allocnos according to their priority
+ for assigning. */
+static int
+allocno_assign_compare_func (const void *v1p, const void *v2p)
+{
+ allocno_t p1 = *(const allocno_t *) v1p, p2 = *(const allocno_t *) v2p;
+ int c1, c2, l1, l2, s1, s2, pri1, pri2;
+
+ c1 = ALLOCNO_MEMORY_COST (p1) - ALLOCNO_COVER_CLASS_COST (p1);
+ c2 = ALLOCNO_MEMORY_COST (p2) - ALLOCNO_COVER_CLASS_COST (p2);
+ l1 = (ALLOCNO_MAX (p1) <= ALLOCNO_MIN (p1)
+ ? 1 : ALLOCNO_MAX (p1) - ALLOCNO_MIN (p1));
+ l2 = (ALLOCNO_MAX (p2) <= ALLOCNO_MIN (p2)
+ ? 1 : ALLOCNO_MAX (p2) - ALLOCNO_MIN (p2));
+ s1 = reg_class_nregs [ALLOCNO_COVER_CLASS (p1)][ALLOCNO_MODE (p1)];
+ s2 = reg_class_nregs [ALLOCNO_COVER_CLASS (p2)][ALLOCNO_MODE (p2)];
+
+ /* Note that the quotient will never be bigger than the value of
+ floor_log2 times the maximum number of times a register can occur
+ in one insn (surely less than 100) weighted by the frequency
+ (maximally REG_FREQ_MAX). Multiplying this by 10000/REG_FREQ_MAX
+ can't overflow. */
+ pri1 = (((double) (floor_log2 (ALLOCNO_NREFS (p1)) * c1) / l1)
+ * (10000 / REG_FREQ_MAX) * s1);
+ pri2 = (((double) (floor_log2 (ALLOCNO_NREFS (p2)) * c2) / l2)
+ * (10000 / REG_FREQ_MAX) * s2);
+ if (pri2 - pri1)
+ return pri2 - pri1;
+ /* If regs are equally good, sort by allocno numbers, so that the
+ results of qsort leave nothing to chance. */
+ return ALLOCNO_NUM (p1) - ALLOCNO_NUM (p2);
+}
+
+/* The function does register allocation not using allocno conflicts.
+ It uses only potential conflicts (see comments for attributes
+ ALLOCNO_MIN and ALLOCNO_MAX). The algorithm is close to Chow's
+ priority coloring. */
+void
+ira_fast_allocation (void)
+{
+ int i, j, k, id, class_size, no_stack_reg_p, hard_regno;
+ enum reg_class cover_class;
+ enum machine_mode mode;
+ allocno_t a;
+ HARD_REG_SET *conflict_hard_regs;
+
+ /* Make map: allocno conflict id -> conflict hard regs for better
+ cache locality. */
+ conflict_hard_regs = ira_allocate (sizeof (HARD_REG_SET) * allocnos_num);
+ for (i = 0; i < allocnos_num; i++)
+ {
+ a = conflict_id_allocno_map [i];
+ COPY_HARD_REG_SET (conflict_hard_regs[i],
+ ALLOCNO_CONFLICT_HARD_REGS (a));
+ }
+ sorted_allocnos = ira_allocate (sizeof (allocno_t) * allocnos_num);
+ memcpy (sorted_allocnos, allocnos, sizeof (allocno_t) * allocnos_num);
+ qsort (sorted_allocnos, allocnos_num, sizeof (allocno_t),
+ allocno_assign_compare_func);
+ for (i = 0; i < allocnos_num; i++)
+ {
+ a = sorted_allocnos[i];
+ id = ALLOCNO_CONFLICT_ID (a);
+ cover_class = ALLOCNO_COVER_CLASS (a);
+ ALLOCNO_ASSIGNED_P (a) = TRUE;
+ ALLOCNO_HARD_REGNO (a) = -1;
+ if (hard_reg_set_subset_p (reg_class_contents[cover_class],
+ conflict_hard_regs[id]))
+ continue;
+ mode = ALLOCNO_MODE (a);
+ no_stack_reg_p = ALLOCNO_NO_STACK_REG_P (a);
+ class_size = class_hard_regs_num[cover_class];
+ for (j = 0; j < class_size; j++)
+ {
+ hard_regno = class_hard_regs[cover_class][j];
+#ifdef STACK_REGS
+ if (no_stack_reg_p && FIRST_STACK_REG <= hard_regno
+ && hard_regno <= LAST_STACK_REG)
+ continue;
+#endif
+ if (!hard_reg_not_in_set_p (hard_regno, mode, conflict_hard_regs[id])
+ || (TEST_HARD_REG_BIT
+ (prohibited_class_mode_regs[cover_class][mode], hard_regno)))
+ continue;
+ ALLOCNO_HARD_REGNO (a) = hard_regno;
+ for (k = ALLOCNO_MIN (a); k <= ALLOCNO_MAX (a); k++)
+ IOR_HARD_REG_SET (conflict_hard_regs[k],
+ reg_mode_hard_regset[hard_regno][mode]);
+ break;
+ }
+ }
+ ira_free (sorted_allocnos);
+ ira_free (conflict_hard_regs);
+ if (internal_flag_ira_verbose > 1 && ira_dump_file != NULL)
+ print_disposition (ira_dump_file);
+}
Index: global.c
===================================================================
--- global.c (revision 134857)
+++ global.c (working copy)
@@ -1455,7 +1455,7 @@ build_insn_chain (void)
/* Consider spilled pseudos too for IRA because they still
have a chance to get hard-registers in the reload when
IRA is used. */
- if (reg_renumber[i] >= 0 || flag_ira)
+ if (reg_renumber[i] >= 0 || (flag_ira && optimize))
bitmap_set_bit (live_relevant_regs, i);
}
@@ -1496,12 +1496,14 @@ build_insn_chain (void)
because they still have a chance to get
hard-registers in the reload when IRA is
used. */
- else if (reg_renumber[regno] >= 0 || flag_ira)
+ else if (reg_renumber[regno] >= 0
+ || (flag_ira && optimize))
bitmap_set_bit (&c->dead_or_set, regno);
}
if ((regno < FIRST_PSEUDO_REGISTER
- || reg_renumber[regno] >= 0 || flag_ira)
+ || reg_renumber[regno] >= 0
+ || (flag_ira && optimize))
&& (!DF_REF_FLAGS_IS_SET (def, DF_REF_CONDITIONAL)))
{
rtx reg = DF_REF_REG (def);
@@ -1601,7 +1603,8 @@ build_insn_chain (void)
because they still have a chance to get
hard-registers in the reload when IRA is
used. */
- else if (reg_renumber[regno] >= 0 || flag_ira)
+ else if (reg_renumber[regno] >= 0
+ || (flag_ira && optimize))
bitmap_set_bit (&c->dead_or_set, regno);
}
@@ -1610,7 +1613,8 @@ build_insn_chain (void)
because they still have a chance to get
hard-registers in the reload when IRA is
used. */
- || reg_renumber[regno] >= 0 || flag_ira)
+ || reg_renumber[regno] >= 0
+ || (flag_ira && optimize))
{
if (GET_CODE (reg) == SUBREG
&& !DF_REF_FLAGS_IS_SET (use,
Index: ira-emit.c
===================================================================
--- ira-emit.c (revision 134857)
+++ ira-emit.c (working copy)
@@ -661,8 +661,8 @@ modify_move_list (move_t list)
= create_allocno (ALLOCNO_REGNO (set_move->to), FALSE,
ALLOCNO_LOOP_TREE_NODE (set_move->to));
ALLOCNO_MODE (new_allocno) = ALLOCNO_MODE (set_move->to);
- ALLOCNO_COVER_CLASS (new_allocno)
- = ALLOCNO_COVER_CLASS (set_move->to);
+ set_allocno_cover_class (new_allocno,
+ ALLOCNO_COVER_CLASS (set_move->to));
ALLOCNO_ASSIGNED_P (new_allocno) = TRUE;
ALLOCNO_HARD_REGNO (new_allocno) = -1;
ALLOCNO_REG (new_allocno)
Index: alias.c
===================================================================
--- alias.c (revision 134857)
+++ alias.c (working copy)
@@ -2013,7 +2013,7 @@ nonoverlapping_memrefs_p (const_rtx x, c
rtx moffsetx, moffsety;
HOST_WIDE_INT offsetx = 0, offsety = 0, sizex, sizey, tem;
- if (flag_ira && reload_completed)
+ if (flag_ira && optimize && reload_completed)
{
/* We need this code for IRA because of stack slot sharing. RTL
in decl can be different than RTL used in insns. It is a
Index: ira-build.c
===================================================================
--- ira-build.c (revision 134857)
+++ ira-build.c (working copy)
@@ -542,8 +542,8 @@ create_allocno (int regno, int cap_p, lo
ALLOCNO_NUM (a) = allocnos_num;
ALLOCNO_CONFLICT_ALLOCNO_ARRAY (a) = NULL;
ALLOCNO_CONFLICT_ALLOCNOS_NUM (a) = 0;
- CLEAR_HARD_REG_SET (ALLOCNO_CONFLICT_HARD_REGS (a));
- CLEAR_HARD_REG_SET (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a));
+ COPY_HARD_REG_SET (ALLOCNO_CONFLICT_HARD_REGS (a), no_alloc_regs);
+ COPY_HARD_REG_SET (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a), no_alloc_regs);
ALLOCNO_NREFS (a) = 0;
ALLOCNO_FREQ (a) = 1;
ALLOCNO_HARD_REGNO (a) = -1;
@@ -592,6 +592,17 @@ create_allocno (int regno, int cap_p, lo
return a;
}
+/* Set up cover class for A and update its conflict hard registers. */
+void
+set_allocno_cover_class (allocno_t a, enum reg_class cover_class)
+{
+ ALLOCNO_COVER_CLASS (a) = cover_class;
+ IOR_COMPL_HARD_REG_SET (ALLOCNO_CONFLICT_HARD_REGS (a),
+ reg_class_contents[cover_class]);
+ IOR_COMPL_HARD_REG_SET (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a),
+ reg_class_contents[cover_class]);
+}
+
/* The function returns TRUE if conflict vector with NUM elements is
more profitable than conflict bit vector for A. */
int
@@ -918,7 +929,7 @@ create_cap_allocno (allocno_t a)
father = ALLOCNO_LOOP_TREE_NODE (a)->father;
cap = create_allocno (ALLOCNO_REGNO (a), TRUE, father);
ALLOCNO_MODE (cap) = ALLOCNO_MODE (a);
- ALLOCNO_COVER_CLASS (cap) = ALLOCNO_COVER_CLASS (a);
+ set_allocno_cover_class (cap, ALLOCNO_COVER_CLASS (a));
ALLOCNO_AVAILABLE_REGS_NUM (cap) = ALLOCNO_AVAILABLE_REGS_NUM (a);
ALLOCNO_CAP_MEMBER (cap) = a;
bitmap_set_bit (father->mentioned_allocnos, ALLOCNO_NUM (cap));
@@ -2330,8 +2341,9 @@ ira_flattening (int max_regno_before_emi
}
}
/* Change allocnos regno, conflicting allocnos, and range allocnos. */
- temp_change_bit_vec = ira_allocate (((allocnos_num + INT_BITS - 1) / INT_BITS)
- * sizeof (INT_TYPE));
+ temp_change_bit_vec
+ = ira_allocate (((allocnos_num + INT_BITS - 1) / INT_BITS)
+ * sizeof (INT_TYPE));
FOR_EACH_ALLOCNO (a, ai)
{
if (a != regno_top_level_allocno_map[REGNO (ALLOCNO_REG (a))]
@@ -2459,8 +2471,8 @@ ira_build (int loops_p)
form_loop_tree ();
create_allocnos ();
ira_costs ();
- if (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
- || flag_ira_algorithm == IRA_ALGORITHM_MIXED)
+ if (optimize && (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
+ || flag_ira_algorithm == IRA_ALGORITHM_MIXED))
{
local_allocnos_bitmap = ira_allocate_bitmap ();
traverse_loop_tree (FALSE, ira_loop_tree_root, NULL,
@@ -2493,18 +2505,21 @@ ira_build (int loops_p)
" allocnos=%d, copies=%d, conflicts=%d, ranges=%d\n",
allocnos_num, copies_num, n, nr);
}
- if (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
- || flag_ira_algorithm == IRA_ALGORITHM_MIXED)
- traverse_loop_tree (FALSE, ira_loop_tree_root, NULL,
- propagate_info_to_loop_tree_node_caps);
- tune_allocno_costs_and_cover_classes ();
- if (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
- || flag_ira_algorithm == IRA_ALGORITHM_MIXED)
- {
- for (i = 0; VEC_iterate (loop_p, ira_loops.larray, i, loop); i++)
- if (ira_loop_nodes[i].regno_allocno_map != NULL
- && ira_loop_tree_root != &ira_loop_nodes[i])
- return TRUE;
+ if (optimize)
+ {
+ if (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
+ || flag_ira_algorithm == IRA_ALGORITHM_MIXED)
+ traverse_loop_tree (FALSE, ira_loop_tree_root, NULL,
+ propagate_info_to_loop_tree_node_caps);
+ tune_allocno_costs_and_cover_classes ();
+ if (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
+ || flag_ira_algorithm == IRA_ALGORITHM_MIXED)
+ {
+ for (i = 0; VEC_iterate (loop_p, ira_loops.larray, i, loop); i++)
+ if (ira_loop_nodes[i].regno_allocno_map != NULL
+ && ira_loop_tree_root != &ira_loop_nodes[i])
+ return TRUE;
+ }
}
return FALSE;
}
Index: ira.c
===================================================================
--- ira.c (revision 134857)
+++ ira.c (working copy)
@@ -1425,7 +1425,7 @@ setup_reg_renumber (void)
&& ! hard_reg_not_in_set_p (hard_regno, ALLOCNO_MODE (a),
call_used_reg_set))
{
- ira_assert (flag_caller_saves || regno >= reg_equiv_len
+ ira_assert (!optimize || flag_caller_saves || regno >= reg_equiv_len
|| reg_equiv_const[regno]
|| reg_equiv_invariant_p[regno]);
caller_save_needed = 1;
@@ -1559,7 +1559,6 @@ fix_reg_equiv_init (void)
int i, new_regno;
rtx x, prev, next, insn, set;
-
if (reg_equiv_init_size < max_regno)
{
reg_equiv_init = ggc_realloc (reg_equiv_init, max_regno * sizeof (rtx));
@@ -1800,90 +1799,101 @@ ira (FILE *f)
rebuild_p = update_equiv_regs ();
regstat_free_n_sets_and_refs ();
regstat_free_ri ();
-
+
#ifndef IRA_NO_OBSTACK
gcc_obstack_init (&ira_obstack);
#endif
bitmap_obstack_initialize (&ira_bitmap_obstack);
-
- max_regno = max_reg_num ();
- reg_equiv_len = max_regno;
- reg_equiv_invariant_p = ira_allocate (max_regno * sizeof (int));
- memset (reg_equiv_invariant_p, 0, max_regno * sizeof (int));
- reg_equiv_const = ira_allocate (max_regno * sizeof (rtx));
- memset (reg_equiv_const, 0, max_regno * sizeof (rtx));
- find_reg_equiv_invariant_const ();
- if (rebuild_p)
- {
- timevar_push (TV_JUMP);
- rebuild_jump_labels (get_insns ());
- purge_all_dead_edges ();
- timevar_pop (TV_JUMP);
+ if (optimize)
+ {
+ max_regno = max_reg_num ();
+ reg_equiv_len = max_regno;
+ reg_equiv_invariant_p = ira_allocate (max_regno * sizeof (int));
+ memset (reg_equiv_invariant_p, 0, max_regno * sizeof (int));
+ reg_equiv_const = ira_allocate (max_regno * sizeof (rtx));
+ memset (reg_equiv_const, 0, max_regno * sizeof (rtx));
+ find_reg_equiv_invariant_const ();
+ if (rebuild_p)
+ {
+ timevar_push (TV_JUMP);
+ rebuild_jump_labels (get_insns ());
+ purge_all_dead_edges ();
+ timevar_pop (TV_JUMP);
+ }
}
+
max_regno_before_ira = allocated_reg_info_size = max_reg_num ();
allocate_reg_info ();
setup_eliminable_regset ();
-
+
overall_cost = reg_cost = mem_cost = 0;
load_cost = store_cost = shuffle_cost = 0;
move_loops_num = additional_jumps_num = 0;
-
+
ira_assert (current_loops == NULL);
flow_loops_find (&ira_loops);
current_loops = &ira_loops;
saved_flag_ira_algorithm = flag_ira_algorithm;
- if (number_of_loops () > (unsigned) IRA_MAX_LOOPS_NUM)
+ if (optimize && number_of_loops () > (unsigned) IRA_MAX_LOOPS_NUM)
flag_ira_algorithm = IRA_ALGORITHM_CB;
-
+
if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL)
fprintf (ira_dump_file, "Building IRA IR\n");
- loops_p = ira_build (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
- || flag_ira_algorithm == IRA_ALGORITHM_MIXED);
- ira_color ();
-
+ loops_p = ira_build (optimize
+ && (flag_ira_algorithm == IRA_ALGORITHM_REGIONAL
+ || flag_ira_algorithm == IRA_ALGORITHM_MIXED));
+ if (optimize)
+ ira_color ();
+ else
+ ira_fast_allocation ();
+
max_point_before_emit = max_point;
-
+
ira_emit (loops_p);
-
- max_regno = max_reg_num ();
- if (! loops_p)
- initiate_ira_assign ();
- else
+ if (optimize)
{
- expand_reg_info (allocated_reg_info_size);
- allocated_reg_info_size = max_regno;
-
- if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL)
- fprintf (ira_dump_file, "Flattening IR\n");
- ira_flattening (max_regno_before_ira, max_point_before_emit);
- /* New insns were generated: add notes and recalculate live
- info. */
- df_analyze ();
-
- {
- basic_block bb;
-
- FOR_ALL_BB (bb)
- bb->loop_father = NULL;
- current_loops = NULL;
- }
-
- setup_allocno_assignment_flags ();
- initiate_ira_assign ();
- reassign_conflict_allocnos (max_regno);
+ max_regno = max_reg_num ();
+
+ if (! loops_p)
+ initiate_ira_assign ();
+ else
+ {
+ expand_reg_info (allocated_reg_info_size);
+ allocated_reg_info_size = max_regno;
+
+ if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL)
+ fprintf (ira_dump_file, "Flattening IR\n");
+ ira_flattening (max_regno_before_ira, max_point_before_emit);
+ /* New insns were generated: add notes and recalculate live
+ info. */
+ df_analyze ();
+
+ {
+ basic_block bb;
+
+ FOR_ALL_BB (bb)
+ bb->loop_father = NULL;
+ current_loops = NULL;
+ }
+
+ setup_allocno_assignment_flags ();
+ initiate_ira_assign ();
+ reassign_conflict_allocnos (max_regno);
+ }
}
setup_reg_renumber ();
-
+
calculate_allocation_cost ();
-
+
#ifdef ENABLE_IRA_CHECKING
- check_allocation ();
+ if (optimize)
+ check_allocation ();
#endif
-
+
setup_preferred_alternate_classes ();
-
+
delete_trivially_dead_insns (get_insns (), max_reg_num ());
max_regno = max_reg_num ();
@@ -1897,41 +1907,48 @@ ira (FILE *f)
memset (VEC_address (rtx, reg_equiv_memory_loc_vec), 0,
sizeof (rtx) * max_regno);
reg_equiv_memory_loc = VEC_address (rtx, reg_equiv_memory_loc_vec);
-
+
regstat_init_n_sets_and_refs ();
regstat_compute_ri ();
allocate_initial_values (reg_equiv_memory_loc);
-
- fix_reg_equiv_init ();
+ overall_cost_before = overall_cost;
+ if (optimize)
+ {
+ fix_reg_equiv_init ();
+
#ifdef ENABLE_IRA_CHECKING
- print_redundant_copies ();
+ print_redundant_copies ();
#endif
-
- overall_cost_before = overall_cost;
-
- spilled_reg_stack_slots_num = 0;
- spilled_reg_stack_slots
- = ira_allocate (max_regno * sizeof (struct spilled_reg_stack_slot));
- memset (spilled_reg_stack_slots, 0,
- max_regno * sizeof (struct spilled_reg_stack_slot));
-
+
+ spilled_reg_stack_slots_num = 0;
+ spilled_reg_stack_slots
+ = ira_allocate (max_regno * sizeof (struct spilled_reg_stack_slot));
+ memset (spilled_reg_stack_slots, 0,
+ max_regno * sizeof (struct spilled_reg_stack_slot));
+ }
+
df_set_flags (DF_NO_INSN_RESCAN);
build_insn_chain ();
- sort_insn_chain (TRUE);
- reload_completed = ! reload (get_insns (), 1);
- ira_free (spilled_reg_stack_slots);
+ if (optimize)
+ sort_insn_chain (TRUE);
- finish_ira_assign ();
+ reload_completed = ! reload (get_insns (), optimize > 0);
+ if (optimize)
+ {
+ ira_free (spilled_reg_stack_slots);
+
+ finish_ira_assign ();
+
+ }
if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL
&& overall_cost_before != overall_cost)
fprintf (ira_dump_file, "+++Overall after reload %d\n", overall_cost);
-
ira_destroy ();
-
+
flow_loops_free (&ira_loops);
free_dominance_info (CDI_DOMINATORS);
FOR_ALL_BB (bb)
@@ -1940,19 +1957,22 @@ ira (FILE *f)
flag_ira_algorithm = saved_flag_ira_algorithm;
- cleanup_cfg (CLEANUP_EXPENSIVE);
-
regstat_free_ri ();
regstat_free_n_sets_and_refs ();
-
- ira_free (reg_equiv_invariant_p);
- ira_free (reg_equiv_const);
+
+ if (optimize)
+ {
+ cleanup_cfg (CLEANUP_EXPENSIVE);
+
+ ira_free (reg_equiv_invariant_p);
+ ira_free (reg_equiv_const);
+ }
bitmap_obstack_release (&ira_bitmap_obstack);
#ifndef IRA_NO_OBSTACK
obstack_free (&ira_obstack, NULL);
#endif
-
+
/* The code after the reload has changed so much that at this point
we might as well just rescan everything. Not that
df_rescan_all_insns is not going to help here because it does not
Index: ira-costs.c
===================================================================
--- ira-costs.c (revision 134857)
+++ ira-costs.c (working copy)
@@ -79,7 +79,8 @@ static struct costs *this_op_costs[MAX_R
static struct costs *total_costs;
/* Classes used for cost calculation. They may be different on
- different iterations of the cost calculations. */
+ different iterations of the cost calculations or in different
+ optimization modes. */
static enum reg_class *cost_classes;
/* The size of the previous array. */
@@ -1124,17 +1125,34 @@ find_allocno_class_costs (void)
if (internal_flag_ira_verbose > 0 && ira_dump_file)
fprintf (ira_dump_file, "\nPass %i for finding allocno costs\n\n",
pass);
- /* We could use only cover classes on the 1st iteration.
- Unfortunately it does not work well for some targets where
- some subclass of cover class is costly and wrong cover class
- is chosen on the first iteration and it can not be fixed on
- the 2nd iteration. */
- for (cost_classes_num = 0;
- cost_classes_num < important_classes_num;
- cost_classes_num++)
+ if (optimize)
{
- cost_classes[cost_classes_num] = important_classes[cost_classes_num];
- cost_class_nums[cost_classes[cost_classes_num]] = cost_classes_num;
+ /* We could use only cover classes on the 1st iteration.
+ Unfortunately it does not work well for some targets where
+ some subclass of cover class is costly and wrong cover class
+ is chosen on the first iteration and it can not be fixed on
+ the 2nd iteration. */
+ for (cost_classes_num = 0;
+ cost_classes_num < important_classes_num;
+ cost_classes_num++)
+ {
+ cost_classes[cost_classes_num]
+ = important_classes[cost_classes_num];
+ cost_class_nums[cost_classes[cost_classes_num]]
+ = cost_classes_num;
+ }
+ }
+ else
+ {
+ for (cost_classes_num = 0;
+ cost_classes_num < reg_class_cover_size;
+ cost_classes_num++)
+ {
+ cost_classes[cost_classes_num]
+ = reg_class_cover[cost_classes_num];
+ cost_class_nums[cost_classes[cost_classes_num]]
+ = cost_classes_num;
+ }
}
struct_costs_size
= sizeof (struct costs) + sizeof (int) * (cost_classes_num - 1);
@@ -1425,14 +1443,14 @@ setup_allocno_cover_class_and_costs (voi
ira_assert (allocno_pref[i] == NO_REGS || cover_class != NO_REGS);
ALLOCNO_MEMORY_COST (a) = ALLOCNO_UPDATED_MEMORY_COST (a)
= COSTS_OF_ALLOCNO (total_costs, i)->mem_cost;
- ALLOCNO_COVER_CLASS (a) = cover_class;
+ set_allocno_cover_class (a, cover_class);
if (cover_class == NO_REGS)
continue;
ALLOCNO_AVAILABLE_REGS_NUM (a) = available_class_regs[cover_class];
ALLOCNO_COVER_CLASS_COST (a)
= (COSTS_OF_ALLOCNO (total_costs, i)
->cost[cost_class_nums[allocno_pref[i]]]);
- if (ALLOCNO_COVER_CLASS (a) != allocno_pref[i])
+ if (optimize && ALLOCNO_COVER_CLASS (a) != allocno_pref[i])
{
n = class_hard_regs_num[cover_class];
ALLOCNO_HARD_REG_COSTS (a)
@@ -1446,8 +1464,9 @@ setup_allocno_cover_class_and_costs (voi
}
}
}
- traverse_loop_tree (FALSE, ira_loop_tree_root,
- process_bb_node_for_hard_reg_moves, NULL);
+ if (optimize)
+ traverse_loop_tree (FALSE, ira_loop_tree_root,
+ process_bb_node_for_hard_reg_moves, NULL);
}
Index: reload1.c
===================================================================
--- reload1.c (revision 134857)
+++ reload1.c (working copy)
@@ -553,7 +553,7 @@ compute_use_by_pseudos (HARD_REG_SET *to
which might still contain registers that have not
actually been allocated since they have an
equivalence. */
- gcc_assert (flag_ira || reload_completed);
+ gcc_assert ((flag_ira && optimize) || reload_completed);
}
else
add_to_hard_reg_set (to, PSEUDO_REGNO_MODE (regno), r);
@@ -897,7 +897,7 @@ reload (rtx first, int global)
for (n = 0, i = LAST_VIRTUAL_REGISTER + 1; i < max_regno; i++)
temp_pseudo_reg_arr[n++] = i;
- if (flag_ira)
+ if (flag_ira && optimize)
/* Ask IRA to order pseudo-registers for better stack slot
sharing. */
sort_regnos_for_alter_reg (temp_pseudo_reg_arr, n, reg_max_ref_width);
@@ -1051,7 +1051,7 @@ reload (rtx first, int global)
calculate_needs_all_insns (global);
- if (! flag_ira)
+ if (! flag_ira || ! optimize)
/* Don't do it for IRA. We need this info because we don't
change live_throughout and dead_or_set for chains when IRA
is used. */
@@ -1114,7 +1114,7 @@ reload (rtx first, int global)
obstack_free (&reload_obstack, reload_firstobj);
}
- if (flag_ira)
+ if (flag_ira && optimize)
/* Restore the original insn chain order for correct reload
work. */
sort_insn_chain (FALSE);
@@ -1624,7 +1624,7 @@ calculate_needs_all_insns (int global)
reg_equiv_memory_loc
[REGNO (SET_DEST (set))]))))
{
- if (flag_ira)
+ if (flag_ira && optimize)
/* Inform IRA about the insn deletion. */
mark_memory_move_deletion (REGNO (SET_DEST (set)),
REGNO (SET_SRC (set)));
@@ -1733,7 +1733,7 @@ count_pseudo (int reg)
|| REGNO_REG_SET_P (&spilled_pseudos, reg)
/* Ignore spilled pseudo-registers which can be here only if IRA
is used. */
- || (flag_ira && r < 0))
+ || (flag_ira && optimize && r < 0))
return;
SET_REGNO_REG_SET (&pseudos_counted, reg);
@@ -1814,7 +1814,7 @@ count_spilled_pseudo (int spilled, int s
/* Ignore spilled pseudo-registers which can be here only if IRA is
used. */
- if ((flag_ira && r < 0)
+ if ((flag_ira && optimize && r < 0)
|| REGNO_REG_SET_P (&spilled_pseudos, reg)
|| spilled + spilled_nregs <= r || r + nregs <= spilled)
return;
@@ -1882,7 +1882,7 @@ find_reg (struct insn_chain *chain, int
if (! ok)
continue;
- if (flag_ira)
+ if (flag_ira && optimize)
{
/* Ask IRA to find a better pseudo-register for
spilling. */
@@ -2165,10 +2165,10 @@ alter_reg (int i, int from_reg, bool don
int adjust = 0;
bool shared_p = false;
- if (flag_ira)
+ if (flag_ira && optimize)
/* Mark the spill for IRA. */
SET_REGNO_REG_SET (&spilled_pseudos, i);
- x = (dont_share_p || ! flag_ira
+ x = (dont_share_p || ! flag_ira || ! optimize
? NULL_RTX : reuse_stack_slot (i, inherent_size, total_size));
if (x)
shared_p = true;
@@ -2180,7 +2180,7 @@ alter_reg (int i, int from_reg, bool don
enough inherent space and enough total space.
Otherwise, we allocate a new slot, making sure that it has no less
inherent space, and no less total space, then the previous slot. */
- else if (from_reg == -1 || (! dont_share_p && flag_ira))
+ else if (from_reg == -1 || (! dont_share_p && flag_ira && optimize))
{
alias_set_type alias_set = new_alias_set ();
@@ -2199,7 +2199,7 @@ alter_reg (int i, int from_reg, bool don
set_mem_alias_set (x, alias_set);
dse_record_singleton_alias_set (alias_set, mode);
- if (! dont_share_p && flag_ira)
+ if (! dont_share_p && flag_ira && optimize)
/* Inform IRA about allocation a new stack slot. */
mark_new_stack_slot (x, i, total_size);
}
@@ -3944,7 +3944,7 @@ finish_spills (int global)
spill_reg_order[i] = -1;
EXECUTE_IF_SET_IN_REG_SET (&spilled_pseudos, FIRST_PSEUDO_REGISTER, i, rsi)
- if (! flag_ira || reg_renumber[i] >= 0)
+ if (! flag_ira || ! optimize || reg_renumber[i] >= 0)
{
/* Record the current hard register the pseudo is allocated to
in pseudo_previous_regs so we avoid reallocating it to the
@@ -3954,7 +3954,7 @@ finish_spills (int global)
SET_HARD_REG_BIT (pseudo_previous_regs[i], reg_renumber[i]);
/* Mark it as no longer having a hard register home. */
reg_renumber[i] = -1;
- if (flag_ira)
+ if (flag_ira && optimize)
/* Inform IRA about the change. */
mark_allocation_change (i);
/* We will need to scan everything again. */
@@ -3984,7 +3984,7 @@ finish_spills (int global)
}
}
- if (! flag_ira)
+ if (! flag_ira || ! optimize)
{
/* Retry allocating the spilled pseudos. For each reg,
merge the various reg sets that indicate which hard regs
@@ -4035,7 +4035,7 @@ finish_spills (int global)
HARD_REG_SET used_by_pseudos;
HARD_REG_SET used_by_pseudos2;
- if (! flag_ira)
+ if (! flag_ira || ! optimize)
{
/* Don't do it for IRA because IRA and the reload still can
assign hard registers to the spilled pseudos on next
@@ -5131,6 +5131,7 @@ reloads_unique_chain_p (int r1, int r2)
return true;
}
+
/* The recursive function change all occurrences of WHAT in *WHERE
onto REPL. */
static void
@@ -7008,7 +7009,7 @@ emit_input_reload_insns (struct insn_cha
&& REG_N_SETS (REGNO (old)) == 1)
{
reg_renumber[REGNO (old)] = REGNO (reloadreg);
- if (flag_ira)
+ if (flag_ira && optimize)
/* Inform IRA about the change. */
mark_allocation_change (REGNO (old));
alter_reg (REGNO (old), -1, false);
@@ -8547,7 +8548,7 @@ delete_output_reload (rtx insn, int j, i
/* For the debugging info, say the pseudo lives in this reload reg. */
reg_renumber[REGNO (reg)] = REGNO (new_reload_reg);
- if (flag_ira)
+ if (flag_ira && optimize)
/* Inform IRA about the change. */
mark_allocation_change (REGNO (reg));
alter_reg (REGNO (reg), -1, false);