This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: RFC: LRA for x86/x86-64 [7/9] -- continuation

From: Vladimir Makarov <vmakarov at redhat dot com>
To: GCC Patches <gcc-patches at gcc dot gnu dot org>, rdsandiford at googlemail dot com
Date: Sun, 14 Oct 2012 13:36:40 -0400
Subject: Re: RFC: LRA for x86/x86-64 [7/9] -- continuation
References: <5064DA43.90803@redhat.com> <87vceie42n.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com>

On 12-10-10 11:31 AM, Richard Sandiford wrote:

Hi Vlad,

Here's a review of the first part of ira-constraints.c.  It's a big file,
and this is a long reply, so I'll send further comments another day in
another message.
Vladimir Makarov <vmakarov@redhat.com> writes:

+/* This file contains code for 3 passes: constraint pass,
+   inheritance/split pass, and pass for undoing failed inheritance and
+   split.
+
+   The major goal of constraint pass is to transform RTL to satisfy
+   insn and address constraints by:
+     o choosing insn alternatives;
+     o generating *reload insns* (or reloads in brief) and *reload
+       pseudos* which will got necessary hard registers later;

s/got/get/

Fixed.

+     o substituting pseudo equivalences (if it is done once, is done
+       everywhere) and removes insns initializing used equivalent
+       substitution.

Suggest:

      o substituting pseudos with equivalent values and removing the
        instructions that initialized those pseudos.

Fixed.

+   To speed the pass up we process only necessary insns (first time
+   all insns) and reuse of already chosen alternatives in some
+   cases.

Suggest:

    On the first iteration of the pass we process every instruction and
    choose an alternative for each one.  On subsequent iterations we try
    to avoid reprocessing instructions if we can be sure that the old
    choice is still valid.

Fixed.

+   The inheritance/spilt pass is to transform code to achieve
+   ineheritance and live range splitting.  It is done on backward
+   traverse of EBBs.

Typo: inheritance. "backward traversal".

Fixed.

+   The inheritance optimization goal is to reuse values in hard
+   registers. There is analogous optimization in old reload pass.  The
+   inheritance is achieved by following transformation:
+
+       reload_p1 <- p	     reload_p1 <- p
+       ...		     new_p <- reload_p1
+       ...		=>   ...
+       reload_p2 <- p	     reload_p2 <- new_p
+
+   where p is spilled and not changed between the insns.  Reload_p1 is
+   also called *original pseudo* and new_p is called *inheritance
+   pseudo*.
+
+   The subsequent assignment pass will try to assign the same (or
+   another if it is not possible) hard register to new_p as to
+   reload_p1 or reload_p2.
+
+   If it fails to assign a hard register, the opposite transformation
+   will restore the original code on (the pass called undoing
+   inheritance) because with spilled new_p the code would be much
+   worse. [...]

Maybe:

    If the assignment pass fails to assign a hard register to new_p,
    this file will undo the inheritance and restore the original code.
    This is because implementing the above sequence with a spilled
    new_p would make the code much worse.

Fixed.

+   Splitting (transformation) is also done in EBB scope on the same
+   pass as the inheritance:
+
+       r <- ... or ... <- r		 r <- ... or ... <- r
+       ...				 s <- r (new insn -- save)
+       ...			  =>
+       ...				 r <- s (new insn -- restore)
+       ... <- r				 ... <- r
+
+    The *split pseudo* s is assigned to the hard register of the
+    original pseudo or hard register r.
+
+    Splitting is done:
+      o In EBBs with high register pressure for global pseudos (living
+	in at least 2 BBs) and assigned to hard registers when there
+	are more one reloads needing the hard registers;
+      o for pseudos needing save/restore code around calls.
+
+    If the split pseudo still has the same hard register as the
+    original pseudo after the subsequent assignment pass, the opposite
+    transformation is done on the same pass for undoing inheritance.  */

AIUI spill_for can spill split pseudos.  I think the comment should say
what happens then.  If I understand the code correctly, we keep the
split if "r" is a hard register or was assigned a hard register.
We undo it if "r" was not assigned a hard register.  Is that right?

Yes. To be more correctly, r and s are assigned the same hard reg before the assignment pass. So if r is *spilled* (by assignment pass) or r and s have the same hard reg after the assignment pass, we undo the transformation.

I added a comment.

+/* Array whose element is (MEM:MODE BASE_REG) corresponding to the
+   mode (index) and where BASE_REG is a base hard register for given
+   memory mode.	 */
+static rtx indirect_mem[MAX_MACHINE_MODE];

Maybe:

/* Index M is an rtx of the form (mem:M BASE_REG), where BASE_REG
    is a sample hard register that is a valid address for mode M.
    The memory refers to the generic address space.  */

Fixed.

+/* Return class of hard regno of REGNO or if it is was not assigned to
+   a hard register, return its allocno class but only for reload
+   pseudos created on the current constraint pass.  Otherwise, return
+   NO_REGS.  */
+static enum reg_class
+get_reg_class (int regno)

Maybe:

/* If REGNO is a hard register or has been allocated a hard register,
    return the class of that register.  If REGNO is a pseudo created
    by the current constraints pass, assume that it will be allocated
    a hard register and return the class that that register will have.
    (This assumption is optimistic when REGNO is an inheritance or
    split pseudo.)  Return NO_REGS otherwise.  */

I don't like

assume that it will be allocated
   a hard register and return the class that that register will have

For example, I could treat this as assigning ax to pseudo having class general_regs and returning class a_reg instead of general_regs.

So I modified your comment a bit.

if that's accurate.  I dropped the term "reload pseudo" because of
the general comment in my earlier reply about the use of "reload pseudo"
when the code seems to include inheritance and split pseudos too.

There is no inheritance and splitting yet. It is done after the constraint pass. So at this stage >= new_regno_start means reload pseudo.

+/* Return true if REGNO in REG_MODE satisfies reg class constraint CL.
+   For new reload pseudos we should make more accurate class
+   *NEW_CLASS (we set up it if it is not NULL) to satisfy the
+   constraints.  Otherwise, set up NEW_CLASS to NO_REGS.  */
+static bool
+in_class_p (int regno, enum machine_mode reg_mode,
+	    enum reg_class cl, enum reg_class *new_class)

Same comment here, since it uses get_reg_class.  I.e. for registers >=
new_regno_start, we're really testing whether the first allocatable
register in REGNO's allocno class satisfies CL.

See my comment above.

Also, the only caller that doesn't directly pass REGNO and REG_MODE
from an rtx is process_addr_reg.  in_class_p uses:

+  if (new_class != NULL)
+    *new_class = NO_REGS;
+  if (regno < FIRST_PSEUDO_REGISTER)
+    return TEST_HARD_REG_BIT (reg_class_contents[cl], regno);
+  rclass = get_reg_class (regno);

whereas process_addr_reg uses:

+  final_regno = regno = REGNO (reg);
+  if (regno < FIRST_PSEUDO_REGISTER)
+    {
+      rtx final_reg = reg;
+      rtx *final_loc = &final_reg;
+
+      lra_eliminate_reg_if_possible (final_loc);
+      final_regno = REGNO (*final_loc);
+    }

I.e. process_addr_reg applies eliminations before testing whereas
in_class_p doesn't.  I couldn't really tell why the two were different.
Since the idea is that we use elimination source registers to represent
their targets, shouldn't in_class_p eliminate too?

As I remember that was a fix for a bug for some target which needed an eliminated reg for legitimate address recognition. I did not see such problem for the constraints.

With that difference removed, in_class_p could take the rtx instead
of a (REGNO, MODE) pair.  It could then pass that rtx directly to
lra_eliminate_reg_if_possible.  I think this would lead to a cleaner
interface and make things more regular.

Thanks. I fixed it.

Then the comment for in_class_p could be:

/* Return true if X satisfies (or will satisfy) reg class constraint CL.
    If X is a pseudo created by this constraints pass, assume that it will
    be allocated a hard register from its allocno class, but allow that
    class to be narrowed to CL if it is currently a superset of CL.

    If NEW_CLASS is nonnull, set *NEW_CLASS to the new allocno class
    of REGNO (X), or NO_REGS if no change in its class was needed.  */

Fixed. I just added 'X is a reload pseudo...'

That's a change in the meaning of NEW_CLASS, but seems easier for
callers to handle.  I think all it requires is changing:

+      common_class = ira_reg_class_subset[rclass][cl];
+      if (new_class != NULL)
+	*new_class = common_class;

to:

       common_class = ira_reg_class_subset[rclass][cl];
       if (new_class != NULL && rclass != common_class)
	*new_class = common_class;

This change results in infinite LRA looping on a first libgcc file compilation. Unfortunately I have no time to investigate it. I'd like to say that most code of in this code is very sensitive to changes. I see it a lot. You change something looking obvious and a target is broken. I am going to investigate it when I have more time.

+  if (regno < new_regno_start
+      /* Do not make more accurate class from reloads generated.  They
+	 are mostly moves with a lot of constraints.  Making more
+	 accurate class may results in very narrow class and
+	 impossibility of find registers for several reloads of one
+	 insn.	*/

Maybe:

       /* Do not allow the constraints for reload instructions to
	 influence the classes of new pseudos.  These reloads are
	 typically moves that have many alternatives, and restricting
	 reload pseudos for one alternative may lead to situations
	 where other reload pseudos are no longer allocatable.  */

Fixed.

+      || INSN_UID (curr_insn) >= new_insn_uid_start)
+    return ((regno >= new_regno_start && rclass == ALL_REGS)
+	    || (rclass != NO_REGS && ira_class_subset_p[rclass][cl]
+		&& ! hard_reg_set_subset_p (reg_class_contents[cl],
+					    lra_no_alloc_regs)));

Why the ALL_REGS special case? I think it deserves a comment.

I added a comment.

+/* Return the defined and profitable equiv substitution of reg X, return
+   X otherwise.	 */

Maybe:

/* If we have decided to substitute X with another value, return that value,
    otherwise return X.  */

Fixed.

+/* Change class of pseudo REGNO to NEW_CLASS.  Print info about it
+   using TITLE.	 Output a new line if NL_P.  */
+static void
+change_class (int regno, enum reg_class new_class,
+	      const char *title, bool nl_p)
+{
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file, "%s to class %s for r%d",
+	     title, reg_class_names[new_class], regno);
+  setup_reg_classes (regno, new_class, NO_REGS, new_class);
+  if (lra_dump_file != NULL && nl_p)
+    fprintf (lra_dump_file, "\n");
+}

I think either this or setup_reg_classes should have an assert
that REGNO is >= FIRST_PSEUDO_REGISTER.   This matters more now
because a lot of LRA deals with hard and pseudo registers
side-by-side.

Ok. I added an assert.

+/* Create a new pseudo using MODE, RCLASS, ORIGINAL, TITLE or reuse
+   already created input reload pseudo (only if TYPE is not OP_OUT).
+   The result pseudo is returned through RESULT_REG.  Return TRUE if
+   we created a new pseudo, FALSE if we reused the already created
+   input reload pseudo.	 */

Maybe:

/* Store in *RESULT_REG a register for reloading ORIGINAL, which has
    mode MODE.  TYPE specifies the direction of the reload -- either OP_IN
    or OP_OUT -- and RCLASS specifies the class of hard register required.

It can be OP_INOUT too.

    Try to reuse existing input reloads where possible.  Return true if
    *RESULT_REG is a new register, false if it is an existing one.
    Use TITLE to describe new registers for debug purposes.  */

although I admit that's a bit convoluted...

I combined two comments in a better one.

+  for (i = 0; i < curr_insn_input_reloads_num; i++)
+    if (rtx_equal_p (curr_insn_input_reloads[i].input, original))
+      break;
+  if (i >= curr_insn_input_reloads_num
+      || ! in_class_p (REGNO (curr_insn_input_reloads[i].reg),
+		       GET_MODE (curr_insn_input_reloads[i].reg),
+		       rclass, &new_class))
+    {
+      res_p = true;
+      *result_reg = lra_create_new_reg (mode, original, rclass, title);
+    }
+  else
+    {
+      lra_assert (! side_effects_p (original));
+      res_p = false;
+      *result_reg = curr_insn_input_reloads[i].reg;
+      regno = REGNO (*result_reg);
+      if (lra_dump_file != NULL)
+	 {
+	   fprintf (lra_dump_file, "	 Reuse r%d for reload ", regno);
+	   print_value_slim (lra_dump_file, original, 1);
+	 }
+      if (rclass != new_class)
+	 change_class (regno, new_class, ", change", false);
+      if (lra_dump_file != NULL)
+	 fprintf (lra_dump_file, "\n");
+    }
+  lra_assert (curr_insn_input_reloads_num < LRA_MAX_INSN_RELOADS);
+  curr_insn_input_reloads[curr_insn_input_reloads_num].input = original;
+  curr_insn_input_reloads[curr_insn_input_reloads_num++].reg = *result_reg;
+  return res_p;

It probably doesn't matter in practice, but I think this would
be better as:

   for (i = 0; i < curr_insn_input_reloads_num; i++)
     if (rtx_equal_p (curr_insn_input_reloads[i].input, original)
         && in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class))
       {
         ...reuse case..
         return false;
       }
   ...new case...
   return true;

which also copes with the unlikely case that the same input is used
three times, and that the third use requires the same class as the
second.

Ok. I fixed it.

+/* The page contains code to extract memory address parts.  */
+
+/* Info about base and index regs of an address.  In some rare cases,
+   base/index register can be actually memory.	In this case we will
+   reload it.  */
+struct address
+{
+  rtx *base_reg_loc;  /* NULL if there is no a base register.  */
+  rtx *base_reg_loc2; /* Second location of {post/pre}_modify, NULL
+			 otherwise.  */
+  rtx *index_reg_loc; /* NULL if there is no an index register.	 */
+  rtx *index_loc; /* location of index reg * scale or index_reg_loc
+		      otherwise.  */
+  rtx *disp_loc; /* NULL if there is no a displacement.	 */
+  /* Defined if base_reg_loc is not NULL.  */
+  enum rtx_code base_outer_code, index_code;
+  /* True if the base register is modified in the address, for
+     example, in PRE_INC.  */
+  bool base_modify_p;
+};

Comments should be consistently above the fields rather than to the right.

Fixed.

+/* Process address part in space AS (or all address if TOP_P) with
+   location *LOC to extract address characteristics.
+
+   If CONTEXT_P is false, we are looking at the base part of an
+   address, otherwise we are looking at the index part.
+
+   MODE is the mode of the memory reference; OUTER_CODE and INDEX_CODE
+   give the context that the rtx appears in; MODIFY_P if *LOC is
+   modified.  */
+static void
+extract_loc_address_regs (bool top_p, enum machine_mode mode, addr_space_t as,
+			  rtx *loc, bool context_p, enum rtx_code outer_code,
+			  enum rtx_code index_code,
+			  bool modify_p, struct address *ad)
+{
+  rtx x = *loc;
+  enum rtx_code code = GET_CODE (x);
+  bool base_ok_p;
+
+  switch (code)
+    {
+    case CONST_INT:
+    case CONST:
+    case SYMBOL_REF:
+    case LABEL_REF:
+      if (! context_p)
+	ad->disp_loc = loc;

This looks a bit odd.  I assume it's trying to avoid treating MULT
scale factors as displacements, but I thought whether something was
a displacement or not depended on whether it is involved (possibly
indirectly) in a sum with the base.  Seems like it'd be better
to check for that directly.

+	/* If this machine only allows one register per address, it
+	   must be in the first operand.  */
+	if (MAX_REGS_PER_ADDRESS == 1 || code == LO_SUM)
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, false, code,
+				      code1, modify_p, ad);
+	    ad->disp_loc = arg1_loc;
+	  }
+	/* If index and base registers are the same on this machine,
+	   just record registers in any non-constant operands.	We
+	   assume here, as well as in the tests below, that all
+	   addresses are in canonical form.  */
+	else if (INDEX_REG_CLASS
+		 == base_reg_class (VOIDmode, as, PLUS, SCRATCH)
+		 && code0 != PLUS && code0 != MULT)
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
+				      code1, modify_p, ad);
+	    if (! CONSTANT_P (arg1))
+	      extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
+					code0, modify_p, ad);
+	    else
+	      ad->disp_loc = arg1_loc;
+	  }
+
+	/* If the second operand is a constant integer, it doesn't
+	   change what class the first operand must be.	 */
+	else if (code1 == CONST_INT || code1 == CONST_DOUBLE)
+	  {
+	    ad->disp_loc = arg1_loc;
+	    extract_loc_address_regs (false, mode, as, arg0_loc, context_p,
+				      PLUS, code1, modify_p, ad);
+	  }
+	/* If the second operand is a symbolic constant, the first
+	   operand must be an index register but only if this part is
+	   all the address.  */
+	else if (code1 == SYMBOL_REF || code1 == CONST || code1 == LABEL_REF)
+	  {
+	    ad->disp_loc = arg1_loc;
+	    extract_loc_address_regs (false, mode, as, arg0_loc,
+				      top_p ? true : context_p, PLUS, code1,
+				      modify_p, ad);
+	  }

What's the reason for the distinction between the last two, which AIUI
doesn't exist in reload?  I'm not sure the:

top_p ? true : context_p

condition is safe: some targets use aligning addresses like
(and X (const_int -ALIGN)), but that shouldn't really affect whether
a register in X is treated as a base or an index.

The code works for PPC which uses aligning addresses.

+	/* If both operands are registers but one is already a hard
+	   register of index or reg-base class, give the other the
+	   class that the hard register is not.	 */
+	else if (code0 == REG && code1 == REG
+		 && REGNO (arg0) < FIRST_PSEUDO_REGISTER
+		 && ((base_ok_p
+		      = ok_for_base_p_nonstrict (arg0, mode, as, PLUS, REG))
+		     || ok_for_index_p_nonstrict (arg0)))
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, ! base_ok_p,
+				      PLUS, REG, modify_p, ad);
+	    extract_loc_address_regs (false, mode, as, arg1_loc, base_ok_p,
+				      PLUS, REG, modify_p, ad);
+	  }
+	else if (code0 == REG && code1 == REG
+		 && REGNO (arg1) < FIRST_PSEUDO_REGISTER
+		 && ((base_ok_p
+		      = ok_for_base_p_nonstrict (arg1, mode, as, PLUS, REG))
+		     || ok_for_index_p_nonstrict (arg1)))
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, base_ok_p,
+				      PLUS, REG, modify_p, ad);
+	    extract_loc_address_regs (false, mode, as, arg1_loc, ! base_ok_p,
+				      PLUS, REG, modify_p, ad);
+	  }
+	/* If one operand is known to be a pointer, it must be the
+	   base with the other operand the index.  Likewise if the
+	   other operand is a MULT.  */
+	else if ((code0 == REG && REG_POINTER (arg0)) || code1 == MULT)
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
+				      code1, modify_p, ad);
+	    if (code1 == MULT)
+	      ad->index_loc = arg1_loc;
+	    extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
+				      code0, modify_p, ad);
+	  }
+	else if ((code1 == REG && REG_POINTER (arg1)) || code0 == MULT)
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, true, PLUS,
+				      code1, modify_p, ad);
+	    if (code0 == MULT)
+	      ad->index_loc = arg0_loc;
+	    extract_loc_address_regs (false, mode, as, arg1_loc, false, PLUS,
+				      code0, modify_p, ad);
+	  }

Some targets care about the choice between index and base for
correctness reasons (PA IIRC) or for performance (some ppc targets IIRC),
so I'm not sure whether it's safe to give REG_POINTER such a low priority.

This code works for PPC and PARISC.

+    default:
+      {
+	const char *fmt = GET_RTX_FORMAT (code);
+	int i;
+
+	if (GET_RTX_LENGTH (code) != 1
+	    || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
+	  {
+	    for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+	      if (fmt[i] == 'e')
+		extract_loc_address_regs (false, mode, as, &XEXP (x, i),
+					  context_p, code, SCRATCH,
+					  modify_p, ad);
+	    break;
+	  }
+	/* fall through for case UNARY_OP (UNSPEC ...)	*/
+      }
+
+    case UNSPEC:
+      if (ad->disp_loc == NULL)
+	ad->disp_loc = loc;
+      else if (ad->base_reg_loc == NULL)
+	{
+	  ad->base_reg_loc = loc;
+	  ad->base_outer_code = outer_code;
+	  ad->index_code = index_code;
+	  ad->base_modify_p = modify_p;
+	}
+      else
+	{
+	  lra_assert (ad->index_reg_loc == NULL);
+	  ad->index_reg_loc = loc;
+	}
+      break;
+
+    }

Which targets use a bare UNSPEC as a displacement?  I thought a
displacement had to be a link-time constant, in which case it should
satisfy CONSTANT_P.  For UNSPECs, that means wrapping it in a CONST.

I saw it somewhere. I guess IA64.

I'm just a bit worried that the UNSPEC handling is sensitive to the
order that subrtxes are processed (unlike PLUS, which goes to some
trouble to work out what's what).  It could be especially confusing
because the default case processes operands in reverse order while
PLUS processes them in forward order.

Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough?
Probably deserves a comment.

I don't remember. To figure out, I should switch it off and try all targets supported by LRA.

AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just
recording where reloads of a particular class need to go (obviously
in the case of disp_loc, which isn't reloaded at all).  The feidls
have semantic value too.  I.e. we use them to work out the value
of at least part of the address.

In that case it seems dangerous to look through general rtxes
in the way that the default case above does.  Maybe just making
sure that DISP_LOC is involved in a sum with the base would be
enough, but another idea was:

----------------------------------------------------------------
I know of three ways of "mutating" (for want of a better word)
an address:

   1. (and X (const_int X)), to align
   2. a subreg
   3. a unary operator (such as truncation or extension)

So maybe we could:

   a. remove outer mutations (using a helper function)
   b. handle LO_SUM, PRE_*, POST_*: as now
   c. otherwise treat the address of the sum of one, two or three pieces.
      c1. Peel mutations of all pieces.
      c2. Classify the pieces into base, index and displacement.
          This would be similar to the jousting code above, but hopefully
          easier because all three rtxes are to hand.  E.g. we could
          do the base vs. index thing in a similar way to
          commutative_operand_precedence.
      c3. Record which pieces were mutated (e.g. using something like the
          index_loc vs. index_reg_loc distinction in the current code)

That should be general enough for current targets, but if it isn't,
we could generalise it further when we know what generalisation is needed.

That's still going to be a fair amount of code, but hopefully not more,
and we might have more confidence at each stage what each value is.
And it avoids the risk of treating "mutated" addresses as "unmutated" ones.
----------------------------------------------------------------

Just an idea though.  Probably not for 4.8, although I might try it
if I find time.

I am not sure that you listed all the cases. It would be great if you listed all the cases. In this case we could make this function more clear. I tried to do this first but permanently found new cases. After that I gave up and tried to use more general implementation.

This function was rewritten and modified many times. I am afraid to do this again when clock is ticking.

It would be great if you re-implement the function according to your ideas and we could try it on 8 targets to which LRA was already ported. An LRA sub-branch would a perfect place to do it

It would be nice to sort out the disp_loc thing for 4.8 though.

+/* Extract address characteristics in address with location *LOC in
+   space AS.  Return them in AD.  Parameter OUTER_CODE for MEM should
+   be MEM.  Parameter OUTER_CODE for 'p' constraint should be ADDRESS
+   and MEM_MODE should be VOIDmode.  */

Maybe:

/* Describe address *LOC in AD. There are two cases:

    - *LOC is the address in a (mem ...).  In this case OUTER_CODE is MEM
      and AS is the mem's address space.

    - *LOC is matched to an address constraint such as 'p'.  In this case
      OUTER_CODE is ADDRESS and AS is ADDR_SPACE_GENERIC.  */

Fixed.

+/* Return start register offset of hard register REGNO in MODE.	 */
+int
+lra_constraint_offset (int regno, enum machine_mode mode)
+{
+  lra_assert (regno < FIRST_PSEUDO_REGISTER);
+  /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+     multiple hard register group of scalar integer registers, so that
+     for example (reg:DI 0) and (reg:SI 1) will be considered the same
+     register.	*/
+  if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (mode) > UNITS_PER_WORD
+      && SCALAR_INT_MODE_P (mode))
+    return hard_regno_nregs[regno][mode] - 1;
+  return 0;
+}

Maybe the head comment could be:

/* Return the offset from REGNO of the least significant register
    in (reg:MODE REGNO).

    This function is used to tell whether two registers satisfy
    a matching constraint.  (reg:MODE1 REGNO1) matches (reg:MODE2 REGNO2) if:

          REGNO1 + lra_constraint_offset (REGNO1, MODE1)
       == REGNO2 + lra_constraint_offset (REGNO2, MODE2)  */

(and remove the inner comment).

Fixed.

+/* Like rtx_equal_p except that it allows a REG and a SUBREG to match
+   if they are the same hard reg, and has special hacks for
+   auto-increment and auto-decrement.  This is specifically intended for
+   process_alt_operands to use in determining whether two operands
+   match.  X is the operand whose number is the lower of the two.
+
+   It is supposed that X is the output operand and Y is the input
+   operand.  */
+static bool
+operands_match_p (rtx x, rtx y, int y_hard_regno)

Need to say what Y_HARD_REGNO is.

Fixed.

+  switch (code)
+    {
+    case CONST_INT:
+    case CONST_DOUBLE:
+    case CONST_FIXED:
After a recent change this should be:

CASE_CONST_UNIQUE:

Ok. It looks I need another round of merging. Oh, well.

+	      val = operands_match_p (XVECEXP (x, i, j), XVECEXP (y, i, j),
+				      y_hard_regno);
+	      if (val == 0)
+		return false;

Why do we pass the old y_hard_regno even though Y has changed?
Some of the earlier code assumes that GET_MODE (y) is the mode
of y_hard_regno.

It does not matter. As we processed reg and subreg case above and y_hard_regno can be non-negative only for this cases. We can not process it again. But I changed it to -1 for clearness.

+/* Reload pseudos created for matched input and output reloads whose
+   mode are different.	Such pseudos has a modified rules for finding
+   their living ranges, e.g. assigning to subreg of such pseudo means
+   changing all pseudo value.  */
+bitmap_head lra_bound_pseudos;

Maybe:

/* Reload pseudos created for matched input and output reloads whose
    modes are different.  Such pseudos have different live ranges from
    other pseudos; e.g. any assignment to a subreg of these pseudos
    changes the whole pseudo's value.  */

Fixed.

Although that said, couldn't emit_move_insn_1 (called by gen_move_insn)
split a multiword pseudo move into two word moves?  Using the traditional
clobber technique sounds better than having special liveness rules.

It is not only about multi-words pseudos. It is about representation of this situation by constructions semantically incorrect in order parts of compiler. Reload has no such problem as it does not use RTL. So I don't think it splits as I use emit_move_insn and that calls emit_move_insn_1 too. I really needed a special liveness treatment (although I don't remember details) and therefore I added it. I had no detail design for LRA. The code was modified by numerous test failures on different targets. There is a lot of code analogous to reload one and probably its necessity should be rigorously questioned. I thought about and modified part of this code but unfortunately not all.

Also bound pseudos are rare. Their bitmap is very small and testing (2 lines of code in overall) them in ira-lives.c is fast.

+/* True if C is a non-empty register class that has too few registers
+   to be safely used as a reload target class.	*/
+#define SMALL_REGISTER_CLASS_P(C)					\
+  (reg_class_size [(C)] == 1						\
+   || (reg_class_size [(C)] >= 1 && targetm.class_likely_spilled_p (C)))

Feels like ira_class_hard_regs_num might be better, but since the
current definition is traditional, that shouldn't be a merge requirement.

Ok.

+/* Return mode of WHAT inside of WHERE whose mode of the context is
+   OUTER_MODE.	If WHERE does not contain WHAT, return VOIDmode.  */
+static enum machine_mode
+find_mode (rtx *where, enum machine_mode outer_mode, rtx *what)
+{
+  int i, j;
+  enum machine_mode mode;
+  rtx x;
+  const char *fmt;
+  enum rtx_code code;
+
+  if (where == what)
+    return outer_mode;
+  if (*where == NULL_RTX)
+    return VOIDmode;
+  x = *where;
+  code = GET_CODE (x);
+  outer_mode = GET_MODE (x);
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	{
+	  if ((mode = find_mode (&XEXP (x, i), outer_mode, what)) != VOIDmode)
+	    return mode;
+	}
+      else if (fmt[i] == 'E')
+	{
+	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+	  if ((mode = find_mode (&XVECEXP (x, i, j), outer_mode, what))
+	      != VOIDmode)
+	    return mode;
+	}
+    }
+  return VOIDmode;
+}
+
+/* Return mode for operand NOP of the current insn.  */
+static inline enum machine_mode
+get_op_mode (int nop)
+{
+  rtx *loc;
+  enum machine_mode mode;
+  bool md_first_p = asm_noperands (PATTERN (curr_insn)) < 0;
+
+  /* Take mode from the machine description first.  */
+  if (md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
+    return mode;
+  loc = curr_id->operand_loc[nop];
+  /* Take mode from the operand second.	 */
+  mode = GET_MODE (*loc);
+  if (mode != VOIDmode)
+    return mode;
+  if (! md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
+    return mode;
+  /* Here is a very rare case.	Take mode from the context.  */
+  return find_mode (&PATTERN (curr_insn), VOIDmode, loc);
+}

This looks a lot more complicated than the reload version.  Why is
it needed?  In reload the conditions for address operands were:

	  /* Address operands are reloaded in their existing mode,
	     no matter what is specified in the machine description.  */
	  operand_mode[i] = GET_MODE (recog_data.operand[i]);

	  /* If the address is a single CONST_INT pick address mode
	     instead otherwise we will later not know in which mode
	     the reload should be performed.  */
	  if (operand_mode[i] == VOIDmode)
	    operand_mode[i] = Pmode;

which for LRA might look like:

	  /* The mode specified in the .md file for address operands
	     is the mode of the addressed value, not the address itself.
	     We therefore need to get the mode from the operand rtx.
	     If the operand has no mode, assume it was Pmode.  */

For other operands, recog_data.operand_mode ought to be correct.

find_mode assumes that the mode of an operand is the same as the mode of
the outer rtx, which isn't true when the outer rtx is a subreg, mem,
or one of several unary operators.

This is one that I think would be best decided for 4.8.

I found the reason for this code:

http://old.nabble.com/-lra--patch-to-fix-SPEC2000-sixtrack-compiler-crash-p32189310.html

But I tried it again and can not repeat the problem. So I reverted the patch. If on stage3 we find a problem, we try to find a solution.

+/* If REG is a reload pseudo, try to make its class satisfying CL.  */
+static void
+narrow_reload_pseudo_class (rtx reg, enum reg_class cl)
+{
+  int regno;
+  enum reg_class rclass;
+
+  /* Do not make more accurate class from reloads generated.  They are
+     mostly moves with a lot of constraints.  Making more accurate
+     class may results in very narrow class and impossibility of find
+     registers for several reloads of one insn.	 */
+  if (INSN_UID (curr_insn) >= new_insn_uid_start)
+    return;
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+  if (! REG_P (reg) || (regno = REGNO (reg)) < new_regno_start)
+    return;
+  rclass = get_reg_class (regno);
+  rclass = ira_reg_class_subset[rclass][cl];
+  if (rclass == NO_REGS)
+    return;
+  change_class (regno, rclass, "      Change", true);
+}

There seems to be an overlap in functionality with in_class_p here.
Maybe:

{
   enum reg_class rclass;

   if (in_class_p (reg, cl, &rclass) && rclass != NO_REGS)
     change_class (REGNO (reg), rclass, "      Change", true);
}

(assuming the change in in_class_p interface suggested above).
This avoids duplicating subtleties like the handling of reloads.

Fixed.

+      /* We create pseudo for out rtx because we always should keep
+	 registers with the same original regno have synchronized
+	 value (it is not true for out register but it will be
+	 corrected by the next insn).

I don't understand this comment, sorry.

Pseudos have values -- see comments for lra_reg_info. Different pseudos with the same value do not conflict even if they live in the same place. When we create a pseudo we assign value of original pseudo (if any) from which we created the new pseudo. If we create the pseudo from the input pseudo, the new pseudo will no conflict with the input pseudo which is wrong when the input pseudo lives after the insn and as the new pseudo value is changed by the insn output. Therefore we create the new pseudo from the output.

I hope it is more understandable. I changed the comment.

+	 Do not reuse register because of the following situation: a <-
+	 a op b, and b should be the same as a.	 */

This part is very convincing though :-)  Maybe:
	 We cannot reuse the current output register because we might
	 have a situation like "a <- a op b", where the constraints force
	 the second input operand ("b") to match the output operand ("a").
	 "b" must then be copied into a new register so that it doesn't
	 clobber the current value of "a".  */

Ok. Fixed.

We should probably keep the other reason too, of course.

+      /* Don't generate inheritance for the new register because we
+	 can not use the same hard register for the corresponding
+	 inheritance pseudo for input reload.  */
+      bitmap_set_bit (&lra_matched_pseudos, REGNO (new_in_reg));

Suggest dropping this comment, since we don't do any inheritance here.
The comment above lra_matched_pseudos already says the same thing.

Fixed.

+  /* In and out operand can be got from transformations before
+     processing constraints.  So the pseudos might have inaccurate
+     class and we should make their classes more accurate.  */
+  narrow_reload_pseudo_class (in_rtx, goal_class);
+  narrow_reload_pseudo_class (out_rtx, goal_class);

I don't understand this, sorry.  Does "transformations" mean inheritance
and reload splitting?  So the registers we're changing here are inheritance
and split pseudos rather than reload pseudos created for this instruction?
If so, it sounds on face value like it conflicts with the comment quoted
above about not allowing reload instructions to the narrow the class
of pseudos.  Might be worth saying why that's OK here but not there.

Again, inheritance and splitting is done after the constraint pass.

The transformations here are mostly reloading of subregs which is done before reloads for given insn. On this transformation we create new pseudos for which we don't know reg class yet. In case we don't know pseudo reg class yet, we assign ALL_REGS to the pseudo.

Also, I'm not sure I understand why it helps.  Is it just trying
to encourage the pseudos to form a chain in lra-assigns.c?

E.g. MIPS16 has several instructions that require matched MIPS16 registers.
However, moves between MIPS16 registers and general registers are as cheap
as moves between two MIPS16 registers, so narrowing the reloaded values
from GENERAL_REGS to M16_REGS (if that ever happens) wouldn't necessarily
be a good thing.

Not saying this is wrong, just that it might need more commentary
to justify it.

+  for (i = 0; (in = ins[i]) >= 0; i++)
+    *curr_id->operand_loc[in] = new_in_reg;

The code assumes that all input operands have the same mode.
Probably worth asserting that here (or maybe further up; I don't mind),
just to make the assumption explicit.

I added an assert.

+/* Return final hard regno (plus offset) which will be after
+   elimination.	 We do this for matching constraints because the final
+   hard regno could have a different class.  */
+static int
+get_final_hard_regno (int hard_regno, int offset)
+{
+  if (hard_regno < 0)
+    return hard_regno;
+  hard_regno += offset;
+  return lra_get_elimation_hard_regno (hard_regno);

Why apply the offset before rather than after elimination?
AIUI, AVR's eliminable registers span more than one hard register,
and the elimination is based off the first.

I fixed it.

Also, all uses but one of lra_get_hard_regno_and_offset follow
the pattern:

       lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
       /* The real hard regno of the operand after the allocation.  */
       x_hard_regno = get_final_hard_regno (x_hard_regno, offset);

so couldn't lra_get_hard_regno_and_offset just return the final
hard register, including elimination?  Then it could apply the
elimination on the original rtx.

FWIW, the exception I mentioned was operands_match_p:

       lra_get_hard_regno_and_offset (x, &i, &offset);
       if (i < 0)
	goto slow;
       i += offset;

but I'm not sure why this is the only caller that would want
to ignore elimination.

???

+/* Return register class of OP.	 That is a class of the hard register
+   itself (if OP is a hard register), or class of assigned hard
+   register to the pseudo (if OP is pseudo), or allocno class of
+   unassigned pseudo (if OP is reload pseudo).	Return NO_REGS
+   otherwise.  */
+static enum reg_class
+get_op_class (rtx op)
+{
+  int regno, hard_regno, offset;
+
+  if (! REG_P (op))
+    return NO_REGS;
+  lra_get_hard_regno_and_offset (op, &hard_regno, &offset);
+  if (hard_regno >= 0)
+    {
+      hard_regno = get_final_hard_regno (hard_regno, offset);
+      return REGNO_REG_CLASS (hard_regno);
+    }
+  /* Reload pseudo will get a hard register in any case.  */
+  if ((regno = REGNO (op)) >= new_regno_start)
+    return lra_get_allocno_class (regno);
+  return NO_REGS;
+}

This looks like it ought to be the same as:

return REG_P (x) ? get_reg_class (REGNO (x)) : NO_REGS;

If not, I think there should be a comment explaining the difference.
If so, the comment might be:

/* If OP is a register, return the class of the register as per
    get_reg_class, otherwise return NO_REGS.  */

The difference is in elimination. But if i add elimination to get_reg_class, it will be the same. I think it is right to do. It permits to remove elimination code in process_addr_reg too. So I modified the code. It looks more brief and logical.

+/* Return generated insn mem_pseudo:=val if TO_P or val:=mem_pseudo
+   otherwise.  If modes of MEM_PSEUDO and VAL are different, use
+   SUBREG for VAL to make them equal.  Assign CODE to the insn if it
+   is not recognized.
+
+   We can not use emit_move_insn in some cases because of bad used
+   practice in some machine descriptions.  For example, power can use
+   only base+index addressing for altivec move insns and it is checked
+   by insn predicates.	On the other hand, the same move insn
+   constraints permit to use offsetable memory for moving vector mode
+   values from/to general registers to/from memory.  emit_move_insn
+   will transform offsetable address to one with base+index addressing
+   which is rejected by the constraint.	 So sometimes we need to
+   generate move insn without modifications and assign the code
+   explicitly because the generated move can be unrecognizable because
+   of the predicates.  */

Ick :-)  Can't we just say that fixing this is part of the process
of porting a target to LRA?  It'd be nice not to carry hacks like
this around in shiny new code.

It would be great but I don't expect that target maintainers will be so cooperative. So my goal was to write LRA requiring minimum changes in target code or no changes at all. Even with keeping this goal, I am a bit pessimistic about how much time will be needed to remove reload. With such requirements not to use hacks, it would take forever.

The biggest number of tricks I saw was PPC. I spent a lot of time porting LRA to it. And there are a lot of code in rs6000.c for LRA.

As it stands:

+static rtx
+emit_spill_move (bool to_p, rtx mem_pseudo, rtx val, int code)
+{
+  rtx insn, after;
+
+  start_sequence ();
+  if (GET_MODE (mem_pseudo) != GET_MODE (val))
+    val = gen_rtx_SUBREG (GET_MODE (mem_pseudo),
+			  GET_CODE (val) == SUBREG ? SUBREG_REG (val) : val,
+			  0);
+  if (to_p)
+    insn = gen_move_insn (mem_pseudo, val);
+  else
+    insn = gen_move_insn (val, mem_pseudo);
+  if (recog_memoized (insn) < 0)
+    INSN_CODE (insn) = code;
+  emit_insn (insn);
+  after = get_insns ();
+  end_sequence ();
+  return after;
+}

this recog_memoized code effectively assumes that INSN is just one
instruction, whereas emit_move_insn_1 or the backend move expanders
could split moves into several instructions.

Since the code-forcing stuff is for rs6000, I think we could drop it
from 4.8 whatever happens.

The sequence stuff above looks redundant; we should just return
INSN directly.

Ok. I fixed. Although it makes my life harder as some targets will be broken on the branch after all these changes.

+  /* Quick check on the right move insn which does not need
+     reloads.  */
+  if ((dclass = get_op_class (dest)) != NO_REGS
+      && (sclass = get_op_class (src)) != NO_REGS
+      && targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2)
+    return true;

Suggest:

   /* The backend guarantees that register moves of cost 2 never need
      reloads.  */

Fixed.

+  if (GET_CODE (dest) == SUBREG)
+    dreg = SUBREG_REG (dest);
+  if (GET_CODE (src) == SUBREG)
+    sreg = SUBREG_REG (src);
+  if (! REG_P (dreg) || ! REG_P (sreg))
+    return false;
+  sclass = dclass = NO_REGS;
+  dr = get_equiv_substitution (dreg);
+  if (dr != dreg)
+    dreg = copy_rtx (dr);

I think this copy is too early, because there are quite a few
conditions under which we never emit anything with DREG in it.

Ok, fixed.

+  if (REG_P (dreg))
+    dclass = get_reg_class (REGNO (dreg));
+  if (dclass == ALL_REGS)
+    /* We don't know what class we will use -- let it be figured out
+       by curr_insn_transform function.	 Remember some targets does not
+       work with such classes through their implementation of
+       machine-dependent hooks like secondary_memory_needed.  */
+    return false;

Don't really understand this comment, sorry.

Again ALL_REGS is used for new pseudos created by transformation like reload of SUBREG_REG. We don't know its class yet. We should figure out the class from processing the insn constraints not in this fast path function. Even if ALL_REGS were a right class for the pseudo, secondary_... hooks usually are not define for ALL_REGS.

I fixed the comment.

+  sreg_mode = GET_MODE (sreg);
+  sr = get_equiv_substitution (sreg);
+  if (sr != sreg)
+    sreg = copy_rtx (sr);

This copy also seems too early.

Fixed.

+  sri.prev_sri = NULL;
+  sri.icode = CODE_FOR_nothing;
+  sri.extra_cost = 0;
+  secondary_class = NO_REGS;
+  /* Set up hard register for a reload pseudo for hook
+     secondary_reload because some targets just ignore unassigned
+     pseudos in the hook.  */
+  if (dclass != NO_REGS
+      && REG_P (dreg) && (dregno = REGNO (dreg)) >= new_regno_start
+      && lra_get_regno_hard_regno (dregno) < 0)
+    reg_renumber[dregno] = ira_class_hard_regs[dclass][0];
+  else
+    dregno = -1;
+  if (sclass != NO_REGS
+      && REG_P (sreg) && (sregno = REGNO (sreg)) >= new_regno_start
+      && lra_get_regno_hard_regno (sregno) < 0)
+    reg_renumber[sregno] = ira_class_hard_regs[sclass][0];
+  else
+    sregno = -1;

I think this would be correct without the:

&& REG_P (dreg) && (dregno = REGNO (dreg)) >= new_regno_start

condition (and similarly for the src case).  IMO it would be clearer too:
the decision about when to return a register class for unallocated pseudos
is then localised to get_reg_class rather than copied both here and there.

Right. fixed.

+  if (sclass != NO_REGS)
+    secondary_class
+      = (enum reg_class) targetm.secondary_reload (false, dest,
+						   (reg_class_t) sclass,
+						   GET_MODE (src), &sri);
+  if (sclass == NO_REGS
+      || ((secondary_class != NO_REGS || sri.icode != CODE_FOR_nothing)
+	  && dclass != NO_REGS))
+    secondary_class
+      = (enum reg_class) targetm.secondary_reload (true, sreg,
+						   (reg_class_t) dclass,
+						   sreg_mode, &sri);

Hmm, so for register<-register moves, if the target says that the output
reload needs a secondary reload, we try again with an input reload and
hope for a different answer?

If the target is giving different answers in that case, I think that's
a bug in the target, and we should assert instead.  The problem is that
if we allow the answers to be different, and both answers involve
secondary reloads, we have no way of knowing whether the second answer
is easier to implement or "more correct" than the first.  An assert
avoids that, and puts the onus on the target to sort itself out.

Again, as long as x86 is free of this bug for 4.8, I don't the merge
needs to cater for broken targets.

I added an assert.

+ *change_p = true;
I think this is the point at which substituted values should be copied.

Fixed.

+  new_reg = NULL_RTX;
+  if (secondary_class != NO_REGS)
+    new_reg = lra_create_new_reg_with_unique_value (sreg_mode, NULL_RTX,
+						    secondary_class,
+						    "secondary");
+  start_sequence ();
+  if (sri.icode == CODE_FOR_nothing)
+    lra_emit_move (new_reg, sreg);
+  else
+    {
+      enum reg_class scratch_class;
+
+      scratch_class = (reg_class_from_constraints
+		       (insn_data[sri.icode].operand[2].constraint));
+      scratch_reg = (lra_create_new_reg_with_unique_value
+		     (insn_data[sri.icode].operand[2].mode, NULL_RTX,
+		      scratch_class, "scratch"));
+      emit_insn (GEN_FCN (sri.icode) (new_reg != NULL_RTX ? new_reg : dest,
+				      sreg, scratch_reg));
+    }
+  before = get_insns ();
+  end_sequence ();
+  lra_process_new_insns (curr_insn, before, NULL_RTX, "Inserting the move");

AIUI, the constraints pass will look at these instructions and generate
what are now known as tertiary reloads where needed (by calling this
function again).  Is that right?  Very nice if so: that's far more
natural than the current reload handling.

Yes.

+/* The chosen reg classes which should be used for the corresponding
+   operands.  */
+static enum reg_class goal_alt[MAX_RECOG_OPERANDS];
+/* True if the operand should be the same as another operand and the
+   another operand does not need a reload.  */

s/and the another/and that other/

Fixed.

+/* Make reloads for addr register in LOC which should be of class CL,
+   add reloads to list BEFORE.	If AFTER is not null emit insns to set
+   the register up after the insn (it is case of inc/dec, modify).  */

Maybe:

/* Arrange for address element *LOC to be a register of class CL.
    Add any input reloads to list BEFORE.  AFTER is nonnull if *LOC is an
    automodified value; handle that case by adding the required output
    reloads to list AFTER.  Return true if the RTL was changed.  */

Fixed.

+static bool
+process_addr_reg (rtx *loc, rtx *before, rtx *after, enum reg_class cl)
+{
+  int regno, final_regno;
+  enum reg_class rclass, new_class;
+  rtx reg = *loc;
+  rtx new_reg;
+  enum machine_mode mode;
+  bool change_p = false;
+
+  mode = GET_MODE (reg);
+  if (! REG_P (reg))
+    {
+      /* Always reload memory in an address even if the target
+	 supports such addresses.  */
+      new_reg
+	= lra_create_new_reg_with_unique_value (mode, reg, cl, "address");
+      push_to_sequence (*before);
+      lra_emit_move (new_reg, reg);
+      *before = get_insns ();
+      end_sequence ();
+      *loc = new_reg;
+      if (after != NULL)
+	{
+	  start_sequence ();
+	  lra_emit_move (reg, new_reg);
+	  emit_insn (*after);
+	  *after = get_insns ();
+	  end_sequence ();
+	}
+      return true;

Why does this need to be a special case, rather than reusing the
code later in the function?  Specifically:

I simplified the function factoring common code.

+    }
+  lra_assert (REG_P (reg));
+  final_regno = regno = REGNO (reg);
+  if (regno < FIRST_PSEUDO_REGISTER)
+    {
+      rtx final_reg = reg;
+      rtx *final_loc = &final_reg;
+
+      lra_eliminate_reg_if_possible (final_loc);
+      final_regno = REGNO (*final_loc);
+    }
+  /* Use class of hard register after elimination because some targets
+     do not recognize virtual hard registers as valid address
+     registers.	 */
+  rclass = get_reg_class (final_regno);
+  if ((*loc = get_equiv_substitution (reg)) != reg)
+    {
+      if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file,
+		   "Changing pseudo %d in address of insn %u on equiv ",
+		   REGNO (reg), INSN_UID (curr_insn));
+	  print_value_slim (lra_dump_file, *loc, 1);
+	  fprintf (lra_dump_file, "\n");
+	}
+      *loc = copy_rtx (*loc);
+      change_p = true;
+    }
+  if (*loc != reg || ! in_class_p (final_regno, GET_MODE (reg), cl, &new_class))
+    {
+      reg = *loc;
+      if (get_reload_reg (OP_IN, mode, reg, cl, "address", &new_reg))
+	{
+	  push_to_sequence (*before);
+	  lra_emit_move (new_reg, reg);
+	  *before = get_insns ();
+	  end_sequence ();
+	}
+      *loc = new_reg;
+      if (after != NULL)
+	{
+	  start_sequence ();
+	  lra_emit_move (reg, new_reg);
+	  emit_insn (*after);
+	  *after = get_insns ();
+	  end_sequence ();
+	}
+      change_p = true;
+    }
+  else if (new_class != NO_REGS && rclass != new_class)
+    change_class (regno, new_class, "	   Change", true);
+  return change_p;
+}

E.g.:

   if ((*loc = get_equiv_substitution (reg)) != reg)
     ...as above...
   if (*loc != reg || !in_class_p (reg, cl, &new_class))
     ...as above...
   else if (new_class != NO_REGS && rclass != new_class)
     change_class (regno, new_class, "	   Change", true);
   return change_p;

(assuming change to in_class_p suggested earlier) seems like it
covers the same cases.

Also, should OP_IN be OP_INOUT for after != NULL, so that we don't try
to reuse existing reload pseudos?  That would mean changing get_reload_reg
(both commentary and code) to handle OP_INOUT like OP_OUT.

Fixed.

Or maybe just pass OP_OUT instead of OP_INOUT, if that's more consistent.
I don't mind which.

+  /* Force reload if this is a constant or PLUS or if there may be a
+     problem accessing OPERAND in the outer mode.  */

Suggest:

/* Force a reload of the SUBREG_REG if this ...

Fixed.

+      /* Constant mode ???? */
+      enum op_type type = curr_static_id->operand[nop].type;

Not sure what the comment means, but REG is still the original SUBREG_REG,
so there shouldn't be any risk of a VOIDmode constant.  (subreg (const_int))
is invalid rtl.

I removed the comment. It was probably a question for myself.

+/* Return TRUE if *LOC refers for a hard register from SET.  */
+static bool
+uses_hard_regs_p (rtx *loc, HARD_REG_SET set)
+{

Nothing seems to care about the address, so we would pass the rtx
rather than a pointer to it.

Fixed.

+  int i, j, x_hard_regno, offset;
+  enum machine_mode mode;
+  rtx x;
+  const char *fmt;
+  enum rtx_code code;
+
+  if (*loc == NULL_RTX)
+    return false;
+  x = *loc;
+  code = GET_CODE (x);
+  mode = GET_MODE (x);
+  if (code == SUBREG)
+    {
+      loc = &SUBREG_REG (x);
+      x = SUBREG_REG (x);
+      code = GET_CODE (x);
+      if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
+	mode = GET_MODE (x);
+    }
+
+  if (REG_P (x))
+    {
+      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
+      /* The real hard regno of the operand after the allocation.  */
+      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
+      return (x_hard_regno >= 0
+	      && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));

With the subreg mode handling above, this looks little-endian specific.
The MEM case:

+  if (MEM_P (x))
+    {
+      struct address ad;
+      enum machine_mode mode = GET_MODE (x);
+      rtx *addr_loc = &XEXP (x, 0);
+
+      extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
+      if (ad.base_reg_loc != NULL)
+	{
+	  if (uses_hard_regs_p (ad.base_reg_loc, set))
+	    return true;
+	}
+      if (ad.index_reg_loc != NULL)
+	{
+	  if (uses_hard_regs_p (ad.index_reg_loc, set))
+	    return true;
+	}
+    }

is independent of the subreg handling, so perhaps the paradoxical subreg
case should be handled separately, using simplify_subreg_regno.

+/* Major function to choose the current insn alternative and what
+   operands should be reloaded and how.	 If ONLY_ALTERNATIVE is not
+   negative we should consider only this alternative.  Return false if
+   we can not choose the alternative or find how to reload the
+   operands.  */
+static bool
+process_alt_operands (int only_alternative)
+{
+  bool ok_p = false;
+  int nop, small_class_operands_num, overall, nalt, offset;
+  int n_alternatives = curr_static_id->n_alternatives;
+  int n_operands = curr_static_id->n_operands;
+  /* LOSERS counts those that don't fit this alternative and would
+     require loading.  */
+  int losers;

s/those/the operands/

Fixed.

+  /* Calculate some data common for all alternatives to speed up the
+     function.	*/
+  for (nop = 0; nop < n_operands; nop++)
+    {
+      op = no_subreg_operand[nop] = *curr_id->operand_loc[nop];
+      lra_get_hard_regno_and_offset (op, &hard_regno[nop], &offset);
+      /* The real hard regno of the operand after the allocation.  */
+      hard_regno[nop] = get_final_hard_regno (hard_regno[nop], offset);
+
+      operand_reg[nop] = op;
+      biggest_mode[nop] = GET_MODE (operand_reg[nop]);
+      if (GET_CODE (operand_reg[nop]) == SUBREG)
+	{
+	  operand_reg[nop] = SUBREG_REG (operand_reg[nop]);
+	  if (GET_MODE_SIZE (biggest_mode[nop])
+	      < GET_MODE_SIZE (GET_MODE (operand_reg[nop])))
+	    biggest_mode[nop] = GET_MODE (operand_reg[nop]);
+	}
+      if (REG_P (operand_reg[nop]))
+	no_subreg_operand[nop] = operand_reg[nop];
+      else
+	operand_reg[nop] = NULL_RTX;

This looks odd: no_subreg_operand ends up being a subreg if the
SUBREG_REG wasn't a REG.  Some more commentary might help.

Probably I should have a better name no_subreg_reg_operand. I also added comments for its definition and definition of operand_reg.

+  /* The constraints are made of several alternatives.	Each operand's
+     constraint looks like foo,bar,... with commas separating the
+     alternatives.  The first alternatives for all operands go
+     together, the second alternatives go together, etc.
+
+     First loop over alternatives.  */
+  for (nalt = 0; nalt < n_alternatives; nalt++)
+    {
+      /* Loop over operands for one constraint alternative.  */
+      if (
+#ifdef HAVE_ATTR_enabled
+	  (curr_id->alternative_enabled_p != NULL
+	   && ! curr_id->alternative_enabled_p[nalt])
+	  ||
+#endif
+	  (only_alternative >= 0 && nalt != only_alternative))
+	continue;

Probably more natural if split into two "if (...) continue;"s. E.g.:

Fixed.

#ifdef HAVE_ATTR_enabled
       if (curr_id->alternative_enabled_p != NULL
	  && !curr_id->alternative_enabled_p[nalt])
	continue;
#endif
       if (only_alternative >= 0 && nalt != only_alternative))
	continue;

+      for (nop = 0; nop < n_operands; nop++)
+	{
+	  const char *p;
+	  char *end;
+	  int len, c, m, i, opalt_num, this_alternative_matches;
+	  bool win, did_match, offmemok, early_clobber_p;
+	  /* false => this operand can be reloaded somehow for this
+	     alternative.  */
+	  bool badop;
+	  /* false => this operand can be reloaded if the alternative
+	     allows regs.  */
+	  bool winreg;
+	  /* False if a constant forced into memory would be OK for
+	     this operand.  */
+	  bool constmemok;
+	  enum reg_class this_alternative, this_costly_alternative;
+	  HARD_REG_SET this_alternative_set, this_costly_alternative_set;
+	  bool this_alternative_match_win, this_alternative_win;
+	  bool this_alternative_offmemok;
+	  int invalidate_m;
+	  enum machine_mode mode;
+
+	  opalt_num = nalt * n_operands + nop;
+	  if (curr_static_id->operand_alternative[opalt_num].anything_ok)
+	    {
+	      /* Fast track for no constraints at all.	*/
+	      curr_alt[nop] = NO_REGS;
+	      CLEAR_HARD_REG_SET (curr_alt_set[nop]);
+	      curr_alt_win[nop] = true;
+	      curr_alt_match_win[nop] = false;
+	      curr_alt_offmemok[nop] = false;
+	      curr_alt_matches[nop] = -1;
+	      continue;
+	    }

Given that this code is pretty complex, it might be clearer to remove
the intermediate "this_*" variables and assign directly to curr_alt_*.  I.e.:

I'd rather not to do this. Using array element instead of simple variable occurrences in about 50 places. I don't think it makes code cleaner.

	  curr_alt[nop] = NO_REGS;
	  CLEAR_HARD_REG_SET (curr_alt_set[nop]);
	  curr_alt_win[nop] = false;
	  curr_alt_match_win[nop] = false;
	  curr_alt_offmemok[nop] = false;
	  curr_alt_matches[nop] = -1;

	  opalt_num = nalt * n_operands + nop;
	  if (curr_static_id->operand_alternative[opalt_num].anything_ok)
	    {
	      /* Fast track for no constraints at all.	*/
	      curr_alt_win[nop] = true;
	      continue;
	    }

Obviously keep this nice comment:

+	  /* We update set of possible hard regs besides its class
+	     because reg class might be inaccurate.  For example,
+	     union of LO_REGS (l), HI_REGS(h), and STACK_REG(k) in ARM
+	     is translated in HI_REGS because classes are merged by
+	     pairs and there is no accurate intermediate class.	 */

somewhere though, either here or above the declaration of curr_alt_set.

+		    /* We are supposed to match a previous operand.
+		       If we do, we win if that one did.  If we do
+		       not, count both of the operands as losers.
+		       (This is too conservative, since most of the
+		       time only a single reload insn will be needed
+		       to make the two operands win.  As a result,
+		       this alternative may be rejected when it is
+		       actually desirable.)  */
+		    /* If it conflicts with others.  */

Last line looks incomplete/misplaced.

I have no idea where/what it should be. I removed it.

+		    match_p = false;
+		    if (operands_match_p (*curr_id->operand_loc[nop],
+					  *curr_id->operand_loc[m], m_hregno))
+		      {
+			int i;
+			
+			for (i = 0; i < early_clobbered_regs_num; i++)
+			  if (early_clobbered_nops[i] == m)
+			    break;
+			/* We should reject matching of an early
+			   clobber operand if the matching operand is
+			   not dying in the insn.  */
+			if (i >= early_clobbered_regs_num

Why not simply use operands m's early_clobber field?

Ok. Fixed.

+			    || operand_reg[nop] == NULL_RTX
+			    || (find_regno_note (curr_insn, REG_DEAD,
+						 REGNO (operand_reg[nop]))
+				!= NULL_RTX))
+			  match_p = true;

...although I don't really understand this condition.  If the two
operands are the same value X, then X must die here whatever the
notes say.  So I assume this is coping with a case where the operands
are different but still match.  If so, could you give an example?

I remember I saw such insn but I don't remember details.

Matched earlyclobbers explicitly guarantee that the earlyclobber doesn't
apply to the matched input operand; the earlyclobber only applies to
other input operands.  So I'd have expected it was those operands
that might need reloading rather than this one.

E.g. if X occurs three times, twice in a matched earlyclobber pair
and once as an independent operand, it's the latter operand that would
need reloading.

Yes, I know.

+			/* Operands don't match.  */
+			/* Retroactively mark the operand we had to
+			   match as a loser, if it wasn't already and
+			   it wasn't matched to a register constraint
+			   (e.g it might be matched by memory).	 */
+			if (curr_alt_win[m]
+			    && (operand_reg[m] == NULL_RTX
+				|| hard_regno[m] < 0))
+			  {
+			    losers++;
+			    if (curr_alt[m] != NO_REGS)
+			      reload_nregs
+				+= (ira_reg_class_max_nregs[curr_alt[m]]
+				    [GET_MODE (*curr_id->operand_loc[m])]);
+			}
+			invalidate_m = m;
+			if (curr_alt[m] == NO_REGS)
+			  continue;

I found this a bit confusing.  If the operands don't match and operand m
allows no registers, don't we have to reject this constraint outright?
E.g. something like:

Yes, this function as in reload can be investigated for long time. I cleared it a bit. May be it is right time to clear it up more although time is a scarce resource. I tried your variant. I did not find serious problems on x86 which should be concerned of this code. So I am using it.

			/* Operands don't match.  Both operands must
			   allow a reload register, otherwise we cannot
			   make them match.  */
			if (curr_alt[m] == NO_REGS)
			  break;
			/* Retroactively mark the operand we had to
			   match as a loser, if it wasn't already and
			   it wasn't matched to a register constraint
			   (e.g it might be matched by memory).	 */
			if (curr_alt_win[m]
			    && (operand_reg[m] == NULL_RTX
				|| hard_regno[m] < 0))
			  {
			    losers++;
			    reload_nregs
			      += (ira_reg_class_max_nregs[curr_alt[m]]
				  [GET_MODE (*curr_id->operand_loc[m])]);
			  }

+		    /* This can be fixed with reloads if the operand
+		       we are supposed to match can be fixed with
+		       reloads.	 */
+		    badop = false;
+		    this_alternative = curr_alt[m];
+		    COPY_HARD_REG_SET (this_alternative_set, curr_alt_set[m]);
+		
+		    /* If we have to reload this operand and some
+		       previous operand also had to match the same
+		       thing as this operand, we don't know how to do
+		       that.  So reject this alternative.  */
+		    if (! did_match)
+		      for (i = 0; i < nop; i++)
+			if (curr_alt_matches[i] == this_alternative_matches)
+			  badop = true;

OK, so this is another case of cruft from reload that I'd like to remove,
but do you know of any reason why this shouldn't be:

		    /* If we have to reload this operand and some previous
		       operand also had to match the same thing as this
		       operand, we don't know how to do that.  */
		    if (!match_p || !curr_alt_win[m])
		      {
			for (i = 0; i < nop; i++)
			  if (curr_alt_matches[i] == m)
			    break;
			if (i < nop)
			  break;
		      }
		    else
--->		      did_match = true;

		    /* This can be fixed with reloads if the operand
		       we are supposed to match can be fixed with
		       reloads.	 */
--->		    this_alternative_matches = m;
--->		    invalidate_m = m;
		    badop = false;
		    this_alternative = curr_alt[m];
		    COPY_HARD_REG_SET (this_alternative_set, curr_alt_set[m]);

(although a helper function might be better than the awkward breaking)?
Note that the ---> lines have moved from further up.

This is the only time in the switch statement where one constraint
in a constraint string uses "badop = true" to reject the operand.
I.e. for "<something else>0" we should normally not reject the
alternative based solely on the "0", since the "<something else>"
might have been satisfied instead.  And we should only record matching
information if we've decided the match can be implemented by reloads
(the last block).

Yes, that is bad. Especially it is not documented but people use this.

I tried it and I did not a problem with this. So I use your variant (with one modification setting invalidate_m only for !did_match).

+			/* We prefer no matching alternatives because
+			   it gives more freedom in RA.	 */
+			if (operand_reg[nop] == NULL_RTX
+			    || (find_regno_note (curr_insn, REG_DEAD,
+						 REGNO (operand_reg[nop]))
+				 == NULL_RTX))
+			  reject += 2;

Looks like a new reject rule.  I agree it makes conceptual sense though,
so I'm all for it.

+		      || (REG_P (op)
+			  && REGNO (op) >= FIRST_PSEUDO_REGISTER
+			  && in_mem_p (REGNO (op))))

This pattern occurs several times.  I think a helper function like
spilled_reg_p (op) would help.  See 'g' below.

Ok. Fixed.

+		case 's':
+		  if (CONST_INT_P (op)
+		      || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))

...

+		case 'n':
+		  if (CONST_INT_P (op)
+		      || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))

After recent changes these should be CONST_SCALAR_INT_P (op)

OK.

+		case 'g':
+		  if (/* A PLUS is never a valid operand, but LRA can
+			 make it from a register when eliminating
+			 registers.  */
+		      GET_CODE (op) != PLUS
+		      && (! CONSTANT_P (op) || ! flag_pic
+			  || LEGITIMATE_PIC_OPERAND_P (op))
+		      && (! REG_P (op)
+			  || (REGNO (op) >= FIRST_PSEUDO_REGISTER
+			      && in_mem_p (REGNO (op)))))

Rather than the special case for PLUS, I think this would be better as:

	  if (MEM_P (op)
	      || spilled_reg_p (op)
	      || general_constant_p (op))
	    win = true;

where general_constant_p abstracts away:

	  (CONSTANT_P (op)
	   && (! flag_pic || LEGITIMATE_PIC_OPERAND_P (op)))

general_constant_p probably ought to go in a common file because several
places need this condition (including other parts of this switch statement).

OK. Fixed.

+#ifdef EXTRA_CONSTRAINT_STR
+		      if (EXTRA_MEMORY_CONSTRAINT (c, p))
+			{
+			  if (EXTRA_CONSTRAINT_STR (op, c, p))
+			    win = true;
+			  /* For regno_equiv_mem_loc we have to
+			     check.  */
+			  else if (REG_P (op)
+				   && REGNO (op) >= FIRST_PSEUDO_REGISTER
+				   && in_mem_p (REGNO (op)))

Looks like an old comment from an earlier iteration.  There doesn't
seem to be a function called regno_equiv_mem_loc in the current patch.
But...

Yes, it is from early version. I removed it.

+			    {
+			      /* We could transform spilled memory
+				 finally to indirect memory.  */
+			      if (EXTRA_CONSTRAINT_STR
+				  (get_indirect_mem (mode), c, p))
+				win = true;
+			    }

...is this check really needed?  It's a documented requirement that memory
constraints accept plain base registers.  Also, the following code is:

I removed it. At least it works for x86/x86-64.

+			  /* If we didn't already win, we can reload
+			     constants via force_const_mem, and other
+			     MEMs by reloading the address like for
+			     'o'.  */
+			  if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
+			    badop = false;

It seems a bit inconsistent to treat a spilled pseudo whose address
might well need reloading as a win, while not treating existing MEMs
whose addresses need reloading as a win.

Well, probability of reloading address of spilled pseudo is very small on most targets but reloading for MEM in this case is real. So I see it logical.

+		      if (EXTRA_CONSTRAINT_STR (op, c, p))
+			win = true;
+		      else if (REG_P (op)
+			       && REGNO (op) >= FIRST_PSEUDO_REGISTER
+			       && in_mem_p (REGNO (op)))
+			{
+			  /* We could transform spilled memory finally
+			     to indirect memory.  */
+			  if (EXTRA_CONSTRAINT_STR (get_indirect_mem (mode),
+						    c, p))
+			    win = true;
+			}

I don't understand why there's two copies of this.  I think we have
to trust the target's classification of constraints, so if the target
says that something isn't a memory constraint, we shouldn't check the
(mem (base)) case.

I removed it too.

+	      if (c != ' ' && c != '\t')
+		costly_p = c == '*';

I think there needs to be a comment somewhere saying how we handle this.
Being costly seems to contribute one reject point (i.e. a sixth of a '?')
compared to the normal case, which is very different from the current
reload behaviour.  We should probably update the "*" documentation
in md.texi too.

Yes. It is different. New heuristics result in a better code generation.

Some targets use "*" constraints to make sure that floating-point
registers don't get used as spill space in purely integer code
(so that task switches don't pay the FPU save/restore penalty).
Would that still "work" with this definition?  (FWIW, I think
using "*" is a bad way to achieve this feature, just asking.)

There are two places for processing '*'. One is in ira-cost.c for choose classes. Therefore I believe it will work. At least I did not find any problems on 8 targets. I changed md.texi too.

+		  /* We simulate the behaviour of old reload here.
+		     Although scratches need hard registers and it
+		     might result in spilling other pseudos, no reload
+		     insns are generated for the scratches.  So it
+		     might cost something but probably less than old
+		     reload pass believes.  */
+		  if (lra_former_scratch_p (REGNO (operand_reg[nop])))
+		    reject += LOSER_COST_FACTOR;

Yeah, this caused me no end of trouble when tweaking the MIPS
multiply-accumulate patterns.  However, unlike the other bits of
cruft I've been complaining about, this is one where I can't think
of any alternative that makes more inherent sense (to me).  So I agree
that leaving it as-is is the best approach for now.

+	      /* If the operand is dying, has a matching constraint,
+		 and satisfies constraints of the matched operand
+		 which failed to satisfy the own constraints, we do
+		 not need to generate a reload insn for this
+		 operand.  */
+	      if (this_alternative_matches < 0
+		  || curr_alt_win[this_alternative_matches]
+		  || ! REG_P (op)
+		  || find_regno_note (curr_insn, REG_DEAD,
+				      REGNO (op)) == NULL_RTX
+		  || ((hard_regno[nop] < 0
+		       || ! in_hard_reg_set_p (this_alternative_set,
+					       mode, hard_regno[nop]))
+		      && (hard_regno[nop] >= 0
+			  || ! in_class_p (REGNO (op), GET_MODE (op),
+					   this_alternative, NULL))))
+		losers++;

I think this might be clearer as:

Fixed.

	      if (!(this_alternative_matches >= 0
		    && !curr_alt_win[this_alternative_matches]
		    && REG_P (op)
		    && find_regno_note (curr_insn, REG_DEAD, REGNO (op))
		    && (hard_regno[nop] >= 0
			? in_hard_reg_set_p (this_alternative_set,
					     mode, hard_regno[nop])
			: in_class_p (op, this_alternative, NULL))))
		losers++;

+	      if (operand_reg[nop] != NULL_RTX)
+		{
+		  int last_reload = (lra_reg_info[ORIGINAL_REGNO
+						  (operand_reg[nop])]
+				     .last_reload);
+
+		  if (last_reload > bb_reload_num)
+		    reload_sum += last_reload;
+		  else
+		    reload_sum += bb_reload_num;

The comment for reload_sum says:

+/* Overall number reflecting distances of previous reloading the same
+   value.  It is used to improve inheritance chances.  */
+static int best_reload_sum;

That is a wrong comment. It should be

/* Overall number reflecting distances of previous reloading the same
   value.  The distances are counted from the current BB start.  It is
   used to improve inheritance chances.  */

I fixed it. I am also decreasing the number by bb_reload_num every time when I increase reload_sum.

which made me think of distance from the current instruction.  I see
it's actually something else, effectively a sum of instruction numbers.

I assumed the idea was to prefer registers that were reloaded more
recently (closer the current instruction).  In that case I thought that,
for a distance-based best_reload_sum, smaller would be better,
while for an instruction-number-based best_reload_sum, larger would
be better.  It looks like we use instruction-number based best_reload_sums
but prefer smaller sums:

+			      && (reload_nregs < best_reload_nregs
+				  || (reload_nregs == best_reload_nregs
+				      && best_reload_sum < reload_sum))))))))

Is that intentional?

Now it has sense the bigger number, the closer the last reloading to the current insn.

Also, is this value meaningful for output reloads, which aren't really
going to be able to inherit a value as such?  We seem to apply the cost
regardless of whether it's an input or an output, so probably deserves
a comment.

Same for matched input operands, which as you say elsewhere aren't
inherited.

Right. It could improve the heuristic more. I added the code.

+	      if (badop
+		  /* Alternative loses if it has no regs for a reg
+		     operand.  */
+		  || (REG_P (op) && no_regs_p
+		      && this_alternative_matches < 0))
+		goto fail;

More reload cruft, but I don't understand why we have both this and, later:

+	      if (this_alternative_matches < 0
+		  && no_regs_p && ! this_alternative_offmemok && ! constmemok)
+		goto fail;

We also had the earlier:

+	  /* If this operand could be handled with a reg, and some reg
+	     is allowed, then this operand can be handled.  */
+	  if (winreg && this_alternative != NO_REGS)
+	    badop = false;

which I think belongs in the same else statement.  At least after the
matching changes suggested above, I think all three can be replaced by:

	  /* If this operand accepts a register, and if the register class
	     has at least one allocatable register, then this operand
	     can be reloaded.  */
	  if (winreg && !no_regs_p)
	    badop = false;

	  if (badop)
	    goto fail;

which IMO belongs after the "no_regs_p" assignment.  badop should never
be false if we have no way of reloading the value.

This change is non-trivial without knowing semantics of all these variables. It looks ok to me.

So I changed the code.

+	      if (! no_regs_p)
+		reload_nregs
+		  += ira_reg_class_max_nregs[this_alternative][mode];

I wasn't sure why we counted this even in the "const_to_mem && constmmeok"
and "MEM_P (op) && offmemok" cases from:
	      /* We prefer to reload pseudos over reloading other
		 things, since such reloads may be able to be
		 eliminated later.  So bump REJECT in other cases.
		 Don't do this in the case where we are forcing a
		 constant into memory and it will then win since we
		 don't want to have a different alternative match
		 then.	*/
	      if (! (REG_P (op)
		     && REGNO (op) >= FIRST_PSEUDO_REGISTER)
		  && ! (const_to_mem && constmemok)
		  /* We can reload the address instead of memory (so
		     do not punish it).	 It is preferable to do to
		     avoid cycling in some cases.  */
		  && ! (MEM_P (op) && offmemok))
		reject += 2;

I think constmemok is obvious. It is not a reload, it just putting constant in the constant pool. We should not punish it as no additional insns are generated.

There is a comment for offmemok case. I think it describes it. Apparently it was a fix for LRA cycling. I don't remember details. To restore them, I need to remove the code and to try it on many targets. I guess, it would take 3-4 days. But I removed this as it does not affect x86/x86-64.

+	  if (early_clobber_p)
+	    reject++;
+	  /* ??? Should we update the cost because early clobber
+	     register reloads or it is a rare thing to be worth to do
+	     it.  */
+	  overall = losers * LOSER_COST_FACTOR + reject;

Could you expand on the comment a bit?

Yes, I did it.

+	  if ((best_losers == 0 || losers != 0) && best_overall < overall)
+	    goto fail;
+
+	  curr_alt[nop] = this_alternative;
+	  COPY_HARD_REG_SET (curr_alt_set[nop], this_alternative_set);
+	  curr_alt_win[nop] = this_alternative_win;
+	  curr_alt_match_win[nop] = this_alternative_match_win;
+	  curr_alt_offmemok[nop] = this_alternative_offmemok;
+	  curr_alt_matches[nop] = this_alternative_matches;
+
+	  if (invalidate_m >= 0 && ! this_alternative_win)
+	    curr_alt_win[invalidate_m] = false;

BTW, after the matching changes above, I don't think we need both
"invalidate_m" and "this_alternative_matches".

Yes, if we use did_match, we could remove invalidate_m. I removed it.

+	  for (j = hard_regno_nregs[clobbered_hard_regno][biggest_mode[i]] - 1;
+	       j >= 0;
+	       j--)
+	    SET_HARD_REG_BIT (temp_set, clobbered_hard_regno + j);

add_to_hard_reg_set.

Fixed.

+	    else if (curr_alt_matches[j] == i && curr_alt_match_win[j])
+	      {
+		/* This is a trick.  Such operands don't conflict and
+		   don't need a reload.	 But it is hard to transfer
+		   this information to the assignment pass which
+		   spills one operand without this info.  We avoid the
+		   conflict by forcing to use the same pseudo for the
+		   operands hoping that the pseudo gets the same hard
+		   regno as the operands and the reloads are gone.  */
+		if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])

...

+		/* See the comment for the previous case.  */
+		if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])

What are these last two if statements for?  I wasn't sure how two operands
could have the same address.

Not saying they're wrong, but I think a comment would be good.

I think it is a leftover from older state of code. If lra-assign.c does something wrong in this situation, I think, next round lra-constraint.c will fix it. So I am removing the code.

+      small_class_operands_num = 0;
+      for (nop = 0; nop < n_operands; nop++)
+	/* If this alternative can be made to work by reloading, and
+	   it needs less reloading than the others checked so far,
+	   record it as the chosen goal for reloading.	*/
+	small_class_operands_num
+	  += SMALL_REGISTER_CLASS_P (curr_alt[nop]) ? 1 : 0;

Misplaced comment; I think it belongs after this line.

Yes. Fixed. That was a rigorous review. I took 2 full days and numerous bootstraps and testing to check your proposals and answer you questions. But I am glad. I believe the code became more clear.

Follow-Ups:
- Re: RFC: LRA for x86/x86-64 [7/9] -- continuation
  - From: Richard Sandiford

References:
- Re: RFC: LRA for x86/x86-64 [7/9] -- continuation
  - From: Richard Sandiford

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]