Re: RFC: LRA for x86/x86-64 [7/9] -- continuation


On 10/04/2012 11:50 AM, Richard Sandiford wrote:
Hi Vlad,

This message is for lra-assigns.c.  Sorry for the piecemeal reviews,
never sure when I'll get time...

+/* This file contains a pass mostly assigning hard registers to reload
+   pseudos.  There is no any RTL code transformation on this pass.
Maybe:

/* This file's main objective is to assign hard registers to reload pseudos.
    It also tries to allocate hard registers to other pseudos, but at a lower
    priority than the reload pseudos.  The pass does not transform the RTL.

if that's accurate.
Yes. That is better. I used your comment.
+   Reload pseudos get what they need (usually) hard registers in
+   anyway possibly by spilling non-reload pseudos and by assignment
+   reload pseudos with smallest number of available hard registers
+   first.
+
+   If reload pseudos can get hard registers only through spilling
+   other pseudos, we choose what pseudos to spill taking into account
+   how given reload pseudo benefits and also how other reload pseudos
+   not assigned yet benefit too (see function spill_for).
Maybe:

    We must allocate a hard register to every reload pseudo.  We try to
    increase the chances of finding a viable allocation by assigning the
    pseudos in order of fewest available hard registers first.  If we
    still fail to find a hard register, we spill other (non-reload)
    pseudos in order to make room.

    assign_hard_regno_for allocates registers without spilling.
    spill_for does the same with spilling.  Both functions use
    a cost model to determine the most profitable choice of
    hard and spill registers.
Ok. I just changed two sentences a bit:

find_hard_regno_for finds hard registers for allocation without spilling. spill_for does the same with spilling.

+   Non-reload pseudos can get hard registers too if it is possible and
+   improves the code.  It might be possible because of spilling
+   non-reload pseudos on given pass.
Maybe:

    Once we have finished allocating reload pseudos, we also try to
    assign registers to other (non-reload) pseudos.  This is useful
    if hard registers were freed up by the spilling just described.

Fixed.
+   We try to assign hard registers processing pseudos by threads.  The
+   thread contains reload and inheritance pseudos connected by copies
+   (move insns).  It improves the chance to get the same hard register
+   to pseudos in the thread and, as the result, to remove some move
+   insns.
Maybe:

    We try to assign hard registers by collecting pseudos into threads.
    These threads contain reload and inheritance pseudos that are connected
    by copies (move insns).  Doing this improves the chances of pseudos
    in the thread getting the same hard register and, as a result,
    of allowing some move insns to be deleted.
Fixed.
+   When we assign hard register to a pseudo, we decrease the cost of
+   the hard registers for corresponding pseudos connected by copies.
Maybe:

    When we assign a hard register to a pseudo, we decrease the cost of
    using the same hard register for pseudos that are connected by copies.
Fixed.
+   If two hard registers are equally good for assigning the pseudo
+   with hard register cost point of view, we prefer a hard register in
+   smaller register bank.  By default, there is only one register
+   bank.  A target can define register banks by hook
+   register_bank. For example, x86-64 has a few register banks: hard
+   regs with and without REX prefixes are in different banks.  It
+   permits to generate smaller code as insns without REX prefix are
+   shorter.
Maybe:

    If two hard registers have the same frequency-derived cost,
    we prefer hard registers in lower register banks.  The mapping
    of registers to banks is controlled by the register_bank target hook.
    For example, x86-64 has a few register banks: hard registers with and
    without REX prefixes are in different banks.  This permits us
    to generate smaller code as insns without REX prefixes are shorter.

although this might change if the name of the hook changes.

With the recent change in the hook name, I modified it to:


   If two hard registers have the same frequency-derived cost, we
   prefer hard registers with bigger priorities.  The mapping of
   registers to priorities is controlled by the register_priority
   target hook.  For example, x86-64 has a few register priorities:
   hard registers with and without REX prefixes have different
   priorities.  This permits us to generate smaller code as insns
   without REX prefixes are shorter.
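
For illustration only, an x86-64 register_priority hook might look roughly like the sketch below; the macro names (FIRST_REX_INT_REG, LAST_REX_INT_REG, FIRST_REX_SSE_REG, LAST_REX_SSE_REG, AX_REG) come from i386.h, but the particular priority values here are only an assumption:

   static int
   ix86_register_priority (int hard_regno)
   {
     /* REX-prefixed integer and SSE registers need longer encodings,
        so give them a lower priority than the legacy registers.  */
     if (FIRST_REX_INT_REG <= hard_regno && hard_regno <= LAST_REX_INT_REG)
       return 2;
     if (FIRST_REX_SSE_REG <= hard_regno && hard_regno <= LAST_REX_SSE_REG)
       return 2;
     /* Using %rax often yields shorter encodings; prefer it slightly.  */
     if (hard_regno == AX_REG)
       return 4;
     return 3;
   }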

+/* Info about pseudo used during the assignment pass.  Thread is a set
+   of connected reload and inheritance pseudos with the same set of
+   available hard reg set.  Thread is a pseudo itself for other
+   cases.  */
+struct regno_assign_info
Maybe:

/* Information about the thread to which a pseudo belongs.  Threads are
    a set of connected reload and inheritance pseudos with the same set of
    available hard registers.  Lone registers belong to their own threads.  */
Fixed.
Although the condition seems to be:
+	&& (ira_class_hard_regs_num[regno_allocno_class_array[regno1]]
+	    == ira_class_hard_regs_num[regno_allocno_class_array[regno2]]))
i.e. the same _number_ of available hard regs, but not necessarily the
same set.
It should be the same in most cases. This condition is just a faster approximation of checking for the same set of available hard registers.
"thread" might be more mnemonic than "regno_assign" in this file,
but that's bikeshed stuff.

+  for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+    {
+      regno_assign_info[i].first = i;
+      regno_assign_info[i].next = -1;
+      regno_assign_info[i].freq = lra_reg_info[i].freq;
+    }
Minor speedup, but it's probably worth caching max_reg_num () rather than
calling it in each loop iteration.  Several other loops with the same thing.
That is not critical code, and LTO could solve the problem. But since we don't usually use LTO when building GCC, I rewrote it.
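
For reference, the hoisted form is simply (the local variable name is mine):

  int max_regno = max_reg_num ();

  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
    {
      regno_assign_info[i].first = i;
      regno_assign_info[i].next = -1;
      regno_assign_info[i].freq = lra_reg_info[i].freq;
    }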
+/* Process a pseudo copy with execution frequency COPY_FREQ connecting
+   REGNO1 and REGNO2 to form threads.  */
+static void
+process_copy_to_form_thread (int regno1, int regno2, int copy_freq)
+{
+  int last, regno1_first, regno2_first;
+
+  lra_assert (regno1 >= lra_constraint_new_regno_start
+	      && regno2 >= lra_constraint_new_regno_start);
+  regno1_first = regno_assign_info[regno1].first;
+  regno2_first = regno_assign_info[regno2].first;
+  if (regno1_first != regno2_first)
+    {
+      for (last = regno2_first;
+	   regno_assign_info[last].next >= 0;
+	   last = regno_assign_info[last].next)
+	regno_assign_info[last].first = regno1_first;
+      regno_assign_info[last].next = regno_assign_info[regno1_first].next;
+      regno_assign_info[regno1_first].first = regno2_first;
+      regno_assign_info[regno1_first].freq
+	+= regno_assign_info[regno2_first].freq;
Couple of things I don't understand here:

- Why don't we set regno_assign_info[last].first (for final "last")
   to regno1_first?  I.e. the loop stops while "last" is still valid,
   but only assigns to that element's "next" field, leaving "first"
   as before.

- I might be wrong, but should:

regno_assign_info[regno1_first].first = regno2_first;

be:

regno_assign_info[regno1_first].next = regno2_first;

so that the list becomes:

regno1_first regno2_first ... last ...

The current version seems to create a cycle:

    regno_assign_info[regno1_first].first == regno2_first
    regno_assign_info[regno2_first].first == regno1_first
It is a typo. Fixed. Thanks. There is no danger of looping, as we never traverse this list through the first field, but it did result in an assignment order different from the one I assumed.
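
For the record, after the fix the splice presumably reads along these lines, with regno2's thread chained in right after regno1_first and every element of it, including the final one, pointing at the new head:

      for (last = regno2_first;
           regno_assign_info[last].next >= 0;
           last = regno_assign_info[last].next)
        regno_assign_info[last].first = regno1_first;
      /* The loop exits before updating the final element's head.  */
      regno_assign_info[last].first = regno1_first;
      /* Splice regno2's chain in just after the head of regno1's.  */
      regno_assign_info[last].next = regno_assign_info[regno1_first].next;
      regno_assign_info[regno1_first].next = regno2_first;
      regno_assign_info[regno1_first].freq
        += regno_assign_info[regno2_first].freq;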
+/* Update LIVE_HARD_REG_PSEUDOS and LIVE_PSEUDOS_REG_RENUMBER by
+   pseudo REGNO assignment or by the pseudo spilling if FREE_P.	 */
Maybe:

/* Update the LIVE_HARD_REG_PSEUDOS and LIVE_PSEUDOS_REG_RENUMBER
    entries for pseudo REGNO.  Assume that the register has been
    spilled if FREE_P, otherwise assume that it has been assigned
    reg_renumber[REGNO] (if >= 0).  */
Fixed. I also added a comment about the recently added insert_in_live_range_start_chain call.
+/* Find and return best (or TRY_ONLY_HARD_REGNO) free hard register
+   for pseudo REGNO.  In the failure case, return a negative number.
+   Return through *COST the cost of usage of the hard register for the
+   pseudo.  Best free hard register has smallest cost of usage for
+   REGNO or smallest register bank if the cost is the same.  */
Maybe:

/* Try to find a free hard register for pseudo REGNO.  Return the
    hard register on success and set *COST to the cost of using
    that register.  (If several registers have equal cost, the one with
    the lowest register bank wins.)  Return -1 on failure.

    If TRY_ONLY_HARD_REGNO >= 0, consider only that hard register,
    otherwise consider all hard registers in REGNO's class.  */

Fixed, with "bank" changed to "priority".
+      if (hard_regno_costs_check[hard_regno] != curr_hard_regno_costs_check)
+	hard_regno_costs[hard_regno] = 0;
+      hard_regno_costs_check[hard_regno] = curr_hard_regno_costs_check;
+      hard_regno_costs[hard_regno]
+	-= lra_reg_info[regno].preferred_hard_regno_profit1;
This pattern occurs several times.  I think it'd be clearer to have
an inline helper function (adjust_hard_regno_cost, or whatever).
Done.
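
For concreteness, such a helper could look like the sketch below, based on the pattern quoted above (the call site would then pass the negated profit as INCR):

  /* Add INCR to the cost of HARD_REGNO, lazily resetting the entry if it
     has not yet been touched in the current round of cost accumulation.  */
  static inline void
  adjust_hard_regno_cost (int hard_regno, int incr)
  {
    if (hard_regno_costs_check[hard_regno] != curr_hard_regno_costs_check)
      hard_regno_costs[hard_regno] = 0;
    hard_regno_costs_check[hard_regno] = curr_hard_regno_costs_check;
    hard_regno_costs[hard_regno] += incr;
  }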
+  /* That is important for allocation of multi-word pseudos.  */
+  IOR_COMPL_HARD_REG_SET (conflict_set, reg_class_contents[rclass]);
Maybe:

   /* Make sure that all registers in a multi-word pseudo belong to the
      required class.  */
Fixed.
+	  /* We can not use prohibited_class_mode_regs because it is
+	     defined not for all classes.  */
s/defined not/not defined/
Fixed.
+	  && ! TEST_HARD_REG_BIT (impossible_start_hard_regs, hard_regno)
+	  && (nregs_diff == 0
+#ifdef WORDS_BIG_ENDIAN
+	      || (hard_regno - nregs_diff >= 0
+		  && TEST_HARD_REG_BIT (reg_class_contents[rclass],
+					hard_regno - nregs_diff))
+#else
+	      || TEST_HARD_REG_BIT (reg_class_contents[rclass],
+				    hard_regno + nregs_diff)
+#endif
+	      ))
impossible_start_hard_regs is set up as:

+	conflict_hr = live_pseudos_reg_renumber[conflict_regno];
+	nregs = (hard_regno_nregs[conflict_hr]
+		 [lra_reg_info[conflict_regno].biggest_mode]);
+	/* Remember about multi-register pseudos.  For example, 2 hard
+	   register pseudos can start on the same hard register but can
+	   not start on HR and HR+1/HR-1.  */
+	for (hr = conflict_hr + 1;
+	     hr < FIRST_PSEUDO_REGISTER && hr < conflict_hr + nregs;
+	     hr++)
+	  SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
+	for (hr = conflict_hr - 1;
+	     hr >= 0 && hr + hard_regno_nregs[hr][biggest_mode] > conflict_hr;
+	     hr--)
+	  SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
which I don't think copes with big-endian cases like:

   other hard reg in widest mode:    ........XXXX...
   impossible_start_regs:            .....XXX.XXX...
   this hard reg in pseudo's mode:   ............XX.
   this hard reg in widest mode:     ..........XXXX.

which AIUI is an invalid choice.

There are other corner cases too.  If the other hard reg is narrower than
its widest mode, and that widest mode is wider than the current regno's
widest mode, then on big-endian targets we could have:

   other hard reg in its own mode:   ........XX....
   other hard reg in widest mode:    ......XXXX.....
   impossible_start_regs:            .......X.XXX... (*)
   this hard reg in pseudo's mode:   .....XX........
   this hard reg in widest mode:     .....XX........

(*) note no big-endian adjustment for the other hard reg's widest mode here.

Maybe it would be easier to track impossible end regs for
big-endian targets?
I'll look at this tomorrow.
+/* Update HARD_REGNO preference for pseudos connected (directly or
+   indirectly) to a pseudo with REGNO.	Use divisor DIV to the
+   corresponding copy frequency for the hard regno cost preference
+   calculation.	 The more indirectly a pseudo connected, the less the
+   cost preference.  It is achieved by increasing the divisor for each
+   next recursive level move.  */
"cost preference" seems a bit contradictory. Maybe:

/* Update the preference for using HARD_REGNO for pseudos that are
    connected directly or indirectly with REGNO.  Apply divisor DIV
    to any preference adjustments.

    The more indirectly a pseudo is connected, the smaller its effect
    should be.  We therefore increase DIV on each "hop".  */

Fixed. By the way, it is your invention from IRA.
+static void
+update_hard_regno_preference (int regno, int hard_regno, int div)
+{
+  int another_regno, cost;
+  lra_copy_t cp, next_cp;
+
+  /* Search depth 5 seems to be enough.	 */
+  if (div > (1 << 5))
+    return;
+  for (cp = lra_reg_info[regno].copies; cp != NULL; cp = next_cp)
+    {
+      if (cp->regno1 == regno)
+	{
+	  next_cp = cp->regno1_next;
+	  another_regno = cp->regno2;
+	}
+      else if (cp->regno2 == regno)
+	{
+	  next_cp = cp->regno2_next;
+	  another_regno = cp->regno1;
+	}
+      else
+	gcc_unreachable ();
+      if (reg_renumber[another_regno] < 0
+	  && (update_hard_regno_preference_check[another_regno]
+	      != curr_update_hard_regno_preference_check))
+	{
+	  update_hard_regno_preference_check[another_regno]
+	    = curr_update_hard_regno_preference_check;
+	  cost = cp->freq < div ? 1 : cp->freq / div;
+	  lra_setup_reload_pseudo_preferenced_hard_reg
+	    (another_regno, hard_regno, cost);
+	  update_hard_regno_preference (another_regno, hard_regno, div * 2);
+	}
+    }
+}
Using a depth-first search for this seems a bit dangerous, because we
could end up processing a connected pseudo via a very indirect path
first, even though it is more tightly connected via a more direct path.
(Could be a well-known problem, sorry.)
Actually I did that on purpose. The situation is a bit different from IRA: the vast majority of copies form a straight line (hence the term "threads"). For the same reason I also use only two hard register preferences (from two copies with hard registers). I was looking for a balance between code simplicity and speed on one side and more sophisticated but slower heuristics on the other.
+/* Update REG_RENUMBER and other pseudo preferences by assignment of
+   HARD_REGNO to pseudo REGNO and print about it if PRINT_P.  */
+void
+lra_setup_reg_renumber (int regno, int hard_regno, bool print_p)
+{
+  int i, hr;
+
+  if ((hr = hard_regno) < 0)
+    hr = reg_renumber[regno];
+  reg_renumber[regno] = hard_regno;
+  lra_assert (hr >= 0);
+  for (i = 0; i < hard_regno_nregs[hr][PSEUDO_REGNO_MODE (regno)]; i++)
+    if (hard_regno < 0)
+      lra_hard_reg_usage[hr + i] -= lra_reg_info[regno].freq;
+    else
+      lra_hard_reg_usage[hr + i] += lra_reg_info[regno].freq;
Is it possible for this function to reallocate a register,
i.e. for reg_regnumber to be >= 0 both before and after the call?
If so, I think we'd need two loops.  If not, an assert would be good.
No. I added an assert.
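
i.e. presumably something like this at the top of the function:

    /* A pseudo must not be moved from one hard register straight to
       another here; that would need two usage-update loops.  */
    lra_assert (hard_regno < 0 || reg_renumber[regno] < 0);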
+      mode = PSEUDO_REGNO_MODE (spill_regno);
+      if (lra_hard_reg_set_intersection_p
+	  (live_pseudos_reg_renumber[spill_regno],
+	   mode, reg_class_contents[rclass]))
+	{
+	  hard_regno = live_pseudos_reg_renumber[spill_regno];
Very minor, sorry, but I think this would be more readable with the
hard_regno assignment before the condition and hard_regno used in it.
Fixed.
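
So the quoted snippet becomes roughly:

      mode = PSEUDO_REGNO_MODE (spill_regno);
      hard_regno = live_pseudos_reg_renumber[spill_regno];
      if (lra_hard_reg_set_intersection_p (hard_regno, mode,
                                           reg_class_contents[rclass]))
        {
          /* ... body unchanged, now using hard_regno directly ...  */
        }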
+/* Spill some pseudos for a reload pseudo REGNO and return hard
+   register which should be used for pseudo after spilling.  The
+   function adds spilled pseudos to SPILLED_PSEUDO_BITMAP.  When we
+   choose hard register (and pseudos occupying the hard registers and
+   to be spilled), we take into account not only how REGNO will
+   benefit from the spills but also how other reload pseudos not
+   assigned to hard registers yet benefit from the spills too.	*/
"...not yet assigned to hard registers benefit..."

Fixed.
+ curr_pseudo_check++; /* Invalidate try_hard_reg_pseudos elements. */
Comment on its own line.
Fixed.
+  bitmap_clear (&ignore_pseudos_bitmap);
+  bitmap_clear (&best_spill_pseudos_bitmap);
+  EXECUTE_IF_SET_IN_BITMAP (&lra_reg_info[regno].insn_bitmap, 0, uid, bi)
+    {
+      struct lra_insn_reg *ir;
+
+      for (ir = lra_get_insn_regs (uid); ir != NULL; ir = ir->next)
+	if (ir->regno >= FIRST_PSEUDO_REGISTER)
+	  bitmap_set_bit (&ignore_pseudos_bitmap, ir->regno);
+    }
The name "ignore_pseudos_bitmap" doesn't seem to describe how the set is
actually used.  We still allow the pseudos to be spilled, but the number
of such spills is the first-order cost.  Maybe "insn_conflict_pseudos"
or something like that?
Ok. Fixed.
+      /* Spill pseudos.	 */
+      CLEAR_HARD_REG_SET (spilled_hard_regs);
+      EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
+	if ((int) spill_regno >= lra_constraint_new_regno_start
+	    /* ??? */
+	    && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
+	    && ! bitmap_bit_p (&lra_split_pseudos, spill_regno)
+	    && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))
+	  goto fail;
Leftover ??? (or lacks enough info if it's supposed to be kept)
It is a leftover. As I remember, it was a note to myself to check this code when I worked on inheritance and splitting.

I removed it.
+	      EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[r->start],
+					0, k, bi2)
+		sparseset_set_bit (live_range_hard_reg_pseudos, k);
live_range_hard_reg_pseudos and &live_hard_reg_pseudos[r->start]
seem like similar quantities.  Was there a reason for using
sparsesets for one and bitmaps for the other?
+	      for (p = r->start + 1; p <= r->finish; p++)
+		{
+		  lra_live_range_t r2;
+		
+		  for (r2 = lra_start_point_ranges[p];
+		       r2 != NULL;
+		       r2 = r2->start_next)
+		    if (r2->regno >= lra_constraint_new_regno_start)
+		      sparseset_set_bit (live_range_reload_pseudos, r2->regno);
+		}
This is probably just showing my ignorance, but -- taking the above two
quotes together -- why do we calculate these two live sets in different ways?
Also, does live_range_reload_pseudos really just contain "reload" pseudos,
or inheritance pseudos as well?
Thanks for finding this. live_range_hard_reg_pseudos is not used in this function.
As I remember, I used the same code as in find_hard_regno_for, and live_(hard_)reg_pseudos contained all pseudos, including spilled ones. But that was too expensive (especially when register pressure was high), so I started using less accurate but faster heuristics.


So I am removing

+	      EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[r->start],
+					0, k, bi2)
+		sparseset_set_bit (live_range_hard_reg_pseudos, k);

I changed

p = r->start + 1 to p = r->start


None of these code variants results in wrong code generation; they only affect the quality of spilling (which in any case is no worse than reload's).


live_range_reload_pseudos contains inheritance pseudos too (inheritance pseudos are also short-live-range pseudos). I renamed it.

+      /* We are trying to spill a reload pseudo.  That is wrong we
+	 should assign all reload pseudos, otherwise we cannot reuse
+	 the selected alternatives.  */
+      hard_regno = find_hard_regno_for (regno, &cost, -1);
+      if (hard_regno >= 0)
+	{
Don't really understand this comment, sorry.
I removed the comment. It comes from an older version of the code that tried to guarantee an assignment for the reload pseudo.
Also, why are we passing -1 to find_hard_regno_for, rather than hard_regno?
The loop body up till this point has been specifically freeing up registers
to make hard_regno allocatable.  I realise that, by spilling everything
that overlaps this range, we might have freed up other registers too,
and so made others besides hard_regno allocatable.  But wouldn't we want
to test those other hard registers in "their" iteration of the loop
instead of this one?  The spills in those iterations ought to be more
directed (i.e. there should be less incidental spilling).

As things stand, doing an rclass_size * rclass_size scan seems
unnecessarily expensive, although probably off the radar.
We cannot just pass hard_regno for a multi-word pseudo when, for example, hard_regno-1 is already free.
You are right about the possibility of speeding up the code, although in the profiles I looked at (including the last huge tests) spill_for and the find_hard_regno_for calls made from it take little time. That is probably because spilling is rarely needed: freeing one long-live-range pseudo lets many short-live-range pseudos (reload and inheritance ones) find a hard register without spilling.
Also, the rclass_size * rclass_size loop itself is not expensive; preparing the data for the loop is what is expensive.


I believe there is potential to speed up spill_for if we inline find_hard_regno_for and remove the duplicated code, but that would significantly complicate an already complicated function.
+	  assign_temporarily (regno, hard_regno);
+	  n = 0;
+	  EXECUTE_IF_SET_IN_SPARSESET (live_range_reload_pseudos, reload_regno)
+	    if (live_pseudos_reg_renumber[reload_regno] < 0
+		&& (hard_reg_set_intersect_p
+		    (reg_class_contents
+		     [regno_allocno_class_array[reload_regno]],
+		     spilled_hard_regs)))
+	      sorted_reload_pseudos[n++] = reload_regno;
+	  qsort (sorted_reload_pseudos, n, sizeof (int),
+		 reload_pseudo_compare_func);
+	  for (j = 0; j < n; j++)
+	    {
+	      reload_regno = sorted_reload_pseudos[j];
+	      if (live_pseudos_reg_renumber[reload_regno] < 0
Just trying to make sure I understand, but: isn't the final condition in
this quote redundant?  I thought that was a requirement for the register
being in sorted_reload_pseudos to begin with.
Yes, it is redundant. It is a leftover from some experiments with the code. I converted it to an assert.
+		  && (reload_hard_regno
+		      = find_hard_regno_for (reload_regno,
+					     &reload_cost, -1)) >= 0
+		  && (lra_hard_reg_set_intersection_p
+		      (reload_hard_regno, PSEUDO_REGNO_MODE (reload_regno),
+		       spilled_hard_regs)))
+		{
+		  if (lra_dump_file != NULL)
+		    fprintf (lra_dump_file, " assign %d(cost=%d)",
+			     reload_regno, reload_cost);
+		  assign_temporarily (reload_regno, reload_hard_regno);
+		  cost += reload_cost;
It looks like registers that can be reallocated make hard_regno more
expensive (specifically by reload_cost), but registers that can't be
reallocated contribute no cost.  Is that right?  Seemed a little odd,
so maybe worth a comment.
The reload cost is a negative cost whose base is the pseudo frequency.
I added the comment. The better the hard register is, the more negative the cost becomes.
Also, AIUI find_hard_regno_for is trying to allocate the register for
reload_regno on the basis that reload_regno has the same conflicts as
the current regno, and so it's only an approximation.  Is that right?
Might be worth a comment if so (not least to explain why we don't commit
to this allocation if we end up choosing hard_regno).
Sorry, I did not understand what you are asking. We try to find the best pseudos to spill, i.e. the ones that help us assign as many reload pseudos as possible (on a cost basis). Pseudos for which find_hard_regno_for finds non-spilled hard registers are ignored, as they can be assigned without spilling.
+	  if (best_insn_pseudos_num > insn_pseudos_num
+	      || (best_insn_pseudos_num == insn_pseudos_num
+		  && best_cost > cost))
Should we check the register bank and levelling here too,
for consistency?
As I remember, I made a quick check of this on a few tests but did not find any difference in the generated code. I think that is because the probability of equal costs is smaller here than in the assignment code, as we usually spill pseudos with different costs.
I think we should try it on bigger tests and add such code if it is worthwhile, or else write a comment about it. I'll put it on my todo list and work on it later on the branch.
+      /* Restore the live hard reg pseudo info for spilled pseudos.  */
+      EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
+	update_lives (spill_regno, false);
I couldn't tell why this was outside the "hard_regno >= 0" condition.
Do we really change these registers even if find_hard_regno_for fails?
First we spill some pseudos and then call find_hard_regno_for, so we should restore the state from before the spilling regardless of whether find_hard_regno_for succeeds.
+  /* Spill: */
+  EXECUTE_IF_SET_IN_BITMAP (&best_spill_pseudos_bitmap, 0, spill_regno, bi)
Very minor, but I think it'd be worth asserting that best_hard_regno >= 0
before this loop.
Unfortunately, in very rare cases, best_hard_regno can be < 0. That is why we have two iterations for the assignment of reload pseudos (see the comment about the 2nd iteration of reload pseudo assignments).

I've added a comment to the function saying that it can return a negative value.
+/* Constraint transformation can use equivalences and they can
+   contains pseudos assigned to hard registers.	 Such equivalence
+   usage might create new conflicts of pseudos with hard registers
+   (like ones used for parameter passing or call clobbered ones) or
+   other pseudos assigned to the same hard registers.  Another very
+   rare risky transformation is restoring whole multi-register pseudo
+   when only one subreg lives and unused hard register is used already
+   for something else.
In a way, I found this comment almost too detailed. :-) Maybe:

/* The constraints pass is allowed to create equivalences between
    pseudos that make the current allocation "incorrect" (in the sense
    that pseudos are assigned to hard registers from their own conflict sets).
    The global variable lra_risky_transformations_p says whether this might
    have happened.

if that's accurate.  The detail about when this occurs probably
belongs above lra_risky_transformations_p, although it's mostly
there already.  (Haven't got to the ira-conflicts.c stuff yet,
so no comments about that here.)
Ok. Fixed. It looks better to me. But what matters more is that it looks better to you, since you have a fresh view of the code.
+   Process pseudos assigned to hard registers (most frequently used
+   first), spill if a conflict is found, and mark the spilled pseudos
+   in SPILLED_PSEUDO_BITMAP.  Set up LIVE_HARD_REG_PSEUDOS from
+   pseudos, assigned to hard registers.	 */
Why do we spill the most frequently used registers first?  Probably worth
a comment.

It should be the less frequently used pseudos first, as an intuitive heuristic. Although it hardly seems to matter: even on all_cp2k_fortran.f90 (a 500K-line test) I did not find that the order affects the generated code. I fixed the code and the comment.
+  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+    if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
+      {
+	if (lra_risky_transformations_p)
+	  sorted_pseudos[n++] = i;
+	else
+	  update_lives (i, false);
+      }
+  if (! lra_risky_transformations_p)
+    return;
Seems like this would be more logically split into two (the
lra_risky_transformations_p case and the !lra_risky_transformations_p case).
Fixed.
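
A minimal sketch of the split, assuming max_regno caches max_reg_num ():

  if (! lra_risky_transformations_p)
    {
      /* Nothing risky happened: just refresh the live info.  */
      for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
        if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
          update_lives (i, false);
      return;
    }
  /* Otherwise collect the assigned pseudos so they can be checked
     for conflicts and possibly spilled.  */
  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
    if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
      sorted_pseudos[n++] = i;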
+	    /* If it is multi-register pseudos they should start on
+	       the same hard register.	*/
+	    || hard_regno != reg_renumber[conflict_regno])
This seems different from the find_hard_regno_for case, which took
biggest_mode into account.
I think we should use common code here, so I'll fix it together with the big-endian problem.
+	  /* Don't change reload pseudo allocation.  It might have
+	     this allocation for a purpose (e.g. bound to another
+	     pseudo) and changing it can result in LRA cycling.	 */
+	  if (another_regno < lra_constraint_new_regno_start
+	      && (another_hard_regno = reg_renumber[another_regno]) >= 0
+	      && another_hard_regno != hard_regno)
Seems like this excludes split pseudos as well as reload pseudos,
or are they never included in these copies?  Might be worth mentioning
them either way.
I fixed it.
The only general comment I have so far is that it's sometimes
difficult to follow which types of pseudos are being included
or excluded by a comparison with lra_constraint_new_regno_start.
Sometimes the comments talk about "reload pseudos", but other
similar checks imply that the registers could be inheritance
pseudos or split pseudos as well.  Some thin inline wrappers
might help here.
Inheritance, split, and reload pseudos created since the last constraint pass have regno >= lra_constraint_new_regno_start.
Inheritance and split pseudos created on any pass are in the corresponding bitmaps.
Inheritance and split pseudos created since the last constraint pass also have restore_regno >= 0 until the inheritance or split transformations are undone.


I am putting a comment about this at the top of the file.
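
For what it's worth, the thin wrappers suggested here might look like the sketch below; the predicate names are my own, not from the patch:

  /* Pseudo created since the last constraint pass (reload, inheritance
     or split).  */
  static inline bool
  new_pseudo_p (int regno)
  {
    return regno >= lra_constraint_new_regno_start;
  }

  /* Inheritance pseudo, whichever pass created it.  */
  static inline bool
  inheritance_pseudo_p (int regno)
  {
    return bitmap_bit_p (&lra_inheritance_pseudos, regno);
  }

  /* Split pseudo, whichever pass created it.  */
  static inline bool
  split_pseudo_p (int regno)
  {
    return bitmap_bit_p (&lra_split_pseudos, regno);
  }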
+	      /* Remember that reload pseudos can be spilled on the
+		 1st pass.  */
+	      bitmap_clear_bit (&all_spilled_pseudos, regno);
+	      assign_hard_regno (hard_regno, regno);
Maybe:

   /* This register might have been spilled by the previous pass.
      Indicate that it is no longer spilled.  */
Fixed.
+		/* We can use inheritance pseudos in original insns
+		   (not reload ones).  */
+		if (regno < lra_constraint_new_regno_start
+		    || bitmap_bit_p (&lra_inheritance_pseudos, regno)
+		    || reg_renumber[regno] < 0)
+		  continue;
+		sorted_pseudos[nfails++] = regno;
+		if (lra_dump_file != NULL)
+		  fprintf (lra_dump_file,
+			   "	  Spill reload r%d(hr=%d, freq=%d)\n",
+			   regno, reg_renumber[regno],
+			   lra_reg_info[regno].freq);
Same comment about types of pseudo as above.  (I.e. the code checks for
inheritance pseudos, but not split pseudos.)
I modified the comment to

/* A reload pseudo did not get a hard register on the
   first iteration because of the conflict with
   other reload pseudos in the same insn.  So we
   consider only reload pseudos assigned to hard
   registers.  We shall exclude inheritance pseudos as
   they can occur in original insns (not reload ones).
   We can omit the check for split pseudos because
   they occur only in move insns containing non-reload
   pseudos. */

I hope it explains the code.
+  bitmap_initialize (&do_not_assign_nonreload_pseudos, &reg_obstack);
+  EXECUTE_IF_SET_IN_BITMAP (&lra_inheritance_pseudos, 0, u, bi)
+    if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
+	&& reg_renumber[u] < 0 && bitmap_bit_p (&lra_inheritance_pseudos, u))
+      bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
+  EXECUTE_IF_SET_IN_BITMAP (&lra_split_pseudos, 0, u, bi)
+    if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
+	&& reg_renumber[u] >= 0 && bitmap_bit_p (&lra_split_pseudos, u))
+      bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
The bitmap_bit_p tests look redundant. Also, the following code is:

Fixed.
+  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+    if (((i < lra_constraint_new_regno_start
+	  && ! bitmap_bit_p (&do_not_assign_nonreload_pseudos, i))
+	 || (bitmap_bit_p (&lra_inheritance_pseudos, i)
+	     && lra_reg_info[i].restore_regno >= 0)
+	 || (bitmap_bit_p (&lra_split_pseudos, i)
+	     && lra_reg_info[i].restore_regno >= 0)
+	 || bitmap_bit_p (&lra_optional_reload_pseudos, i))
+	&& reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
+	&& regno_allocno_class_array[i] != NO_REGS)
+      sorted_pseudos[n++] = i;
+  bitmap_clear (&do_not_assign_nonreload_pseudos);
where we test very similar things inline, and then clear
do_not_assign_nonreload_pseudos.  Do we need d_n_a_n_p at all?
No, the code is right. We still need do_not_assign_nonreload_pseudos because we cannot easily work out from which pseudo a given pseudo was inherited or split. That is different from the loop below, which checks the new inheritance and split pseudos themselves (as opposed to their origins, which are what is marked in do_not_assign_nonreload_pseudos).
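
For completeness, dropping the redundant tests from the two loops that populate do_not_assign_nonreload_pseudos would give something like:

  EXECUTE_IF_SET_IN_BITMAP (&lra_inheritance_pseudos, 0, u, bi)
    if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
        && reg_renumber[u] < 0)
      bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
  EXECUTE_IF_SET_IN_BITMAP (&lra_split_pseudos, 0, u, bi)
    if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
        && reg_renumber[u] >= 0)
      bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);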
+  if (n != 0 && lra_dump_file != NULL)
+    fprintf (lra_dump_file, "  Reassing non-reload pseudos\n");
"Reassigning"

Fixed. :)

Richard, thank you for the review. This is the most useful review I have ever had. There is a lot of code that can be simplified. I ran numerous experiments with this code (and maybe with the code in lra-constraints.c) during my work on LRA, so it may still contain some artifacts from those experiments.

