
Re: RFC: LRA for x86/x86-64 [7/9] -- continuation


On 12-10-15 8:06 AM, Richard Sandiford wrote:
Vladimir Makarov <vmakarov@redhat.com> writes:
if that's accurate.  I dropped the term "reload pseudo" because of
the general comment in my earlier reply about the use of "reload pseudo"
when the code seems to include inheritance and split pseudos too.
There is no inheritance or splitting yet; they are done after the
constraint pass.
So at this stage >= new_regno_start means reload pseudo.
Ah, OK.

That's a change in the meaning of NEW_CLASS, but seems easier for
callers to handle.  I think all it requires is changing:

+      common_class = ira_reg_class_subset[rclass][cl];
+      if (new_class != NULL)
+	*new_class = common_class;
to:

        common_class = ira_reg_class_subset[rclass][cl];
        if (new_class != NULL && rclass != common_class)
          *new_class = common_class;
This change results in infinite LRA looping on the first libgcc file
compilation.  Unfortunately I have no time to investigate it.
I'd like to say that most of this code is very sensitive to changes.
I see it a lot: you change something that looks obvious and a target
breaks.
I am going to investigate it when I have more time.
Thanks.

+    default:
+      {
+	const char *fmt = GET_RTX_FORMAT (code);
+	int i;
+
+	if (GET_RTX_LENGTH (code) != 1
+	    || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
+	  {
+	    for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+	      if (fmt[i] == 'e')
+		extract_loc_address_regs (false, mode, as, &XEXP (x, i),
+					  context_p, code, SCRATCH,
+					  modify_p, ad);
+	    break;
+	  }
+	/* fall through for case UNARY_OP (UNSPEC ...)	*/
+      }
+
+    case UNSPEC:
+      if (ad->disp_loc == NULL)
+	ad->disp_loc = loc;
+      else if (ad->base_reg_loc == NULL)
+	{
+	  ad->base_reg_loc = loc;
+	  ad->base_outer_code = outer_code;
+	  ad->index_code = index_code;
+	  ad->base_modify_p = modify_p;
+	}
+      else
+	{
+	  lra_assert (ad->index_reg_loc == NULL);
+	  ad->index_reg_loc = loc;
+	}
+      break;
+
+    }
Which targets use a bare UNSPEC as a displacement?  I thought a
displacement had to be a link-time constant, in which case it should
satisfy CONSTANT_P.  For UNSPECs, that means wrapping it in a CONST.
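For example (illustrative only; the symbol and unspec name are made
up), such a displacement would normally be written as:

    (const (unspec [(symbol_ref "foo")] UNSPEC_GOTOFF))

which satisfies CONSTANT_P, rather than as a top-level (unspec ...),
which does not.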
I saw it somewhere. I guess IA64.
I'm just a bit worried that the UNSPEC handling is sensitive to the
order that subrtxes are processed (unlike PLUS, which goes to some
trouble to work out what's what).  It could be especially confusing
because the default case processes operands in reverse order while
PLUS processes them in forward order.

Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough?
Probably deserves a comment.
I don't remember.  To figure it out, I would have to switch it off and
try all targets supported by LRA.
AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just
recording where reloads of a particular class need to go (obviously
in the case of disp_loc, which isn't reloaded at all).  The fields
have semantic value too.  I.e. we use them to work out the value
of at least part of the address.

In that case it seems dangerous to look through general rtxes
in the way that the default case above does.  Maybe just making
sure that DISP_LOC is involved in a sum with the base would be
enough, but another idea was:

----------------------------------------------------------------
I know of three ways of "mutating" (for want of a better word)
an address:

    1. (and X (const_int X)), to align
    2. a subreg
    3. a unary operator (such as truncation or extension)

So maybe we could:

    a. remove outer mutations (using a helper function)
    b. handle LO_SUM, PRE_*, POST_*: as now
    c. otherwise treat the address as the sum of one, two or three pieces.
       c1. Peel mutations of all pieces.
       c2. Classify the pieces into base, index and displacement.
           This would be similar to the jousting code above, but hopefully
           easier because all three rtxes are to hand.  E.g. we could
           do the base vs. index thing in a similar way to
           commutative_operand_precedence.
       c3. Record which pieces were mutated (e.g. using something like the
           index_loc vs. index_reg_loc distinction in the current code)

That should be general enough for current targets, but if it isn't,
we could generalise it further when we know what generalisation is needed.

That's still going to be a fair amount of code, but hopefully not more,
and we might have more confidence at each stage what each value is.
And it avoids the risk of treating "mutated" addresses as "unmutated" ones.
----------------------------------------------------------------
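A rough sketch of steps (a) and (c1), assuming a helper along these
lines (untested, and the function name is made up; only the rtl
accessors are existing GCC API):

    /* Peel off the address "mutations" listed above -- alignment
       ANDs, subregs and unary operators -- returning the location
       of the innermost rtx.  */
    static rtx *
    strip_address_mutations (rtx *loc)
    {
      for (;;)
        {
          enum rtx_code code = GET_CODE (*loc);
          if (GET_RTX_CLASS (code) == RTX_UNARY)
            /* 3. A unary operator such as truncation or extension.  */
            loc = &XEXP (*loc, 0);
          else if (code == SUBREG)
            /* 2. A subreg.  */
            loc = &SUBREG_REG (*loc);
          else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
            /* 1. (and X (const_int ...)), used to align.  */
            loc = &XEXP (*loc, 0);
          else
            return loc;
        }
    }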

Just an idea though.  Probably not for 4.8, although I might try it
if I find time.
I am not sure that you have listed all the cases.  It would be great if
you could; then we could make this function clearer.  I tried to do this
at first but kept finding new cases.  After that I gave up and went for
a more general implementation.

This function was rewritten and modified many times.  I am afraid to do
this again while the clock is ticking.

It would be great if you re-implemented the function according to your
ideas; we could then try it on the 8 targets to which LRA has already
been ported.  An LRA sub-branch would be a perfect place to do it.
Fair enough. I still stand by:

It would be nice to sort out the disp_loc thing for 4.8 though.
though.  My point is that base_plus_disp_to_reg assumes that
*base_loc and *disp_loc are involved in a sum.  It adds them together,
replaces the base_loc with the new pseudo, and removes the disp_loc.
But extract_address_regs seems to be deliberately written in a way that
doesn't require base_loc and disp_loc to be involved in a sum, and uses
a more indirect approach to working out disp_loc instead.  It just feels
like it leaves open the potential for a silent wrong-code bug.
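(For illustration: base_plus_disp_to_reg essentially rewrites an address

    (plus (reg base) (const_int disp))

as

    (set (reg new) (plus (reg base) (const_int disp)))
    ... (reg new) ...

which is only safe if *base_loc and *disp_loc really are the two
operands of one PLUS.)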
I added an explicit check for the base + disp case.  I also added a lot of asserts checking that we don't assign twice.
+/* Reload pseudos created for matched input and output reloads whose
+   mode are different.	Such pseudos has a modified rules for finding
+   their living ranges, e.g. assigning to subreg of such pseudo means
+   changing all pseudo value.  */
+bitmap_head lra_bound_pseudos;
Maybe:

/* Reload pseudos created for matched input and output reloads whose
     modes are different.  Such pseudos have different live ranges from
     other pseudos; e.g. any assignment to a subreg of these pseudos
     changes the whole pseudo's value.  */
Fixed.
Although that said, couldn't emit_move_insn_1 (called by gen_move_insn)
split a multiword pseudo move into two word moves?  Using the traditional
clobber technique sounds better than having special liveness rules.

It is not only about multi-word pseudos.  It is about representing this
situation by constructions that are semantically incorrect in other
parts of the compiler.  Reload has no such problem as it does not use
RTL.  So I don't think it splits, as I use emit_move_insn and that
calls emit_move_insn_1 too.
But my point is that emit_move_insn_1 _does_ split moves that have no
.md pattern of their own.  E.g. some targets do not define double-word
move patterns because such moves are always equivalent to two individual
word moves.  And if emit_move_insn_1 splits:

(set (reg:DI X) (reg:DI Y))

into:

    (set (subreg:SI (reg:DI X) 0) (subreg:SI (reg:DI Y) 0))
    (set (subreg:SI (reg:DI X) 4) (subreg:SI (reg:DI Y) 4))

then it would be wrong to say that the subreg in the second instruction
is a complete definition of X.
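(For reference, the traditional clobber technique mentioned above makes
the full definition explicit, so no special liveness rule is needed;
illustrative RTL:

    (clobber (reg:DI X))
    (set (subreg:SI (reg:DI X) 0) (subreg:SI (reg:DI Y) 0))
    (set (subreg:SI (reg:DI X) 4) (subreg:SI (reg:DI Y) 4))

The clobber tells the liveness code that X has no useful value before
the two word stores.)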
I really needed a special liveness treatment (although I don't remember
the details) and therefore I added it.  I had no detailed design for
LRA.  The code was shaped by numerous test failures on different
targets.  There is a lot of code analogous to reload's, and its
necessity should probably be rigorously questioned.  I thought about
this and modified part of the code, but unfortunately not all of it.

Also, bound pseudos are rare.  Their bitmap is very small and testing
them (2 lines of code in all) in ira-lives.c is fast.
FWIW, it wasn't really speed as much as correctness I was worried about.
In a way, rarity makes having special rules seem even more dangerous.

Right.  I don't think it is more dangerous than clobbers, but I like what you proposed.  After a few tries, I managed it with clobbers.  So now there are no bound pseudos (or special reload pseudos, as they are called now).  The code is clearer and there is no additional notion in LRA.  Thank you.
+      /* We create pseudo for out rtx because we always should keep
+	 registers with the same original regno have synchronized
+	 value (it is not true for out register but it will be
+	 corrected by the next insn).
I don't understand this comment, sorry.

Pseudos have values -- see the comments for lra_reg_info.  Different
pseudos with the same value do not conflict even if they live in the
same place.  When we create a pseudo, we assign it the value of the
original pseudo (if any) from which it was created.  If we created the
new pseudo from the input pseudo, the new pseudo would not conflict
with the input pseudo, which is wrong when the input pseudo lives past
the insn, because the new pseudo's value is changed by the insn's
output.  Therefore we create the new pseudo from the output.
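(For example -- the insn and regnos are entirely made up -- for a
matched in/out pair such as

    (set (reg:DI 100) (zero_extend:DI (reg:SI 101)))

a new reload pseudo 200 replacing both operands must take its value
from output 100: had it taken input 101's value, 200 and 101 would be
treated as non-conflicting, which is wrong when 101 lives past the
insn, since the insn changes 200's value.)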

I hope it is more understandable. I changed the comment.
Yeah, I think that makes it a lot clearer, thanks.

+  /* In and out operand can be got from transformations before
+     processing constraints.  So the pseudos might have inaccurate
+     class and we should make their classes more accurate.  */
+  narrow_reload_pseudo_class (in_rtx, goal_class);
+  narrow_reload_pseudo_class (out_rtx, goal_class);
I don't understand this, sorry.  Does "transformations" mean inheritance
and reload splitting?  So the registers we're changing here are inheritance
and split pseudos rather than reload pseudos created for this instruction?
If so, it sounds on face value like it conflicts with the comment quoted
above about not allowing reload instructions to the narrow the class
of pseudos.  Might be worth saying why that's OK here but not there.
Again, inheritance and splitting are done after the constraint pass.

The transformations here are mostly the reloading of subregs, which is
done before the reloads for the given insn.  These transformations
create new pseudos whose reg class we do not know yet; in that case we
assign ALL_REGS to the pseudo.
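(Illustrative only: reloading (subreg:SI (reg:DI 100) 0) creates

    (set (reg:SI 200) (subreg:SI (reg:DI 100) 0))

where the fresh pseudo 200 starts out with class ALL_REGS until the
constraint pass narrows it.)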
OK, thanks.

Also, all uses but one of lra_get_hard_regno_and_offset follow
the pattern:

        lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
        /* The real hard regno of the operand after the allocation.  */
        x_hard_regno = get_final_hard_regno (x_hard_regno, offset);

so couldn't lra_get_hard_regno_and_offset just return the final
hard register, including elimination?  Then it could apply the
elimination on the original rtx.

FWIW, the exception I mentioned was operands_match_p:

        lra_get_hard_regno_and_offset (x, &i, &offset);
        if (i < 0)
	goto slow;
        i += offset;

but I'm not sure why this is the only caller that would want
to ignore elimination.
???
Sorry, I composed this email over 2 days and that was a placeholder for a question I still had to answer or work on. :)  It looks like I did not answer the question.
Not sure what you meant here :-)  Was that a placeholder,
or something else?  What I was getting at was that it would
be nice to replace all occurrences of:

       lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
       /* The real hard regno of the operand after the allocation.  */
       x_hard_regno = get_final_hard_regno (x_hard_regno, offset);

with something like:

x_hard_regno = lra_get_hard_regno (x);

and this operands_match_p code seemed to be the only place that didn't
apply get_final_hard_regno to the result of lra_get_hard_regno_and_offset.
I wasn't really sure why operands_match_p was different.
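Something like the following, sketched only to show the shape of the
suggestion (lra_get_hard_regno is an assumed name, not the actual
patch):

    /* Return the final hard regno of X after allocation and
       elimination, or a negative number if X got no hard register.  */
    static int
    lra_get_hard_regno (rtx x)
    {
      int hard_regno, offset;

      lra_get_hard_regno_and_offset (x, &hard_regno, &offset);
      /* The real hard regno of the operand after the allocation.  */
      return get_final_hard_regno (hard_regno, offset);
    }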
Fixed.
+  int i, j, x_hard_regno, offset;
+  enum machine_mode mode;
+  rtx x;
+  const char *fmt;
+  enum rtx_code code;
+
+  if (*loc == NULL_RTX)
+    return false;
+  x = *loc;
+  code = GET_CODE (x);
+  mode = GET_MODE (x);
+  if (code == SUBREG)
+    {
+      loc = &SUBREG_REG (x);
+      x = SUBREG_REG (x);
+      code = GET_CODE (x);
+      if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
+	mode = GET_MODE (x);
+    }
+
+  if (REG_P (x))
+    {
+      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
+      /* The real hard regno of the operand after the allocation.  */
+      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
+      return (x_hard_regno >= 0
+	      && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));
With the subreg mode handling above, this looks little-endian specific.
The MEM case:
+  if (MEM_P (x))
+    {
+      struct address ad;
+      enum machine_mode mode = GET_MODE (x);
+      rtx *addr_loc = &XEXP (x, 0);
+
+      extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
+      if (ad.base_reg_loc != NULL)
+	{
+	  if (uses_hard_regs_p (ad.base_reg_loc, set))
+	    return true;
+	}
+      if (ad.index_reg_loc != NULL)
+	{
+	  if (uses_hard_regs_p (ad.index_reg_loc, set))
+	    return true;
+	}
+    }
is independent of the subreg handling, so perhaps the paradoxical subreg
case should be handled separately, using simplify_subreg_regno.
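A sketch of that, assuming X is the paradoxical subreg and
REG_HARD_REGNO is the final hard regno of the inner reg (illustrative
only, not the actual patch):

      hard_regno = simplify_subreg_regno (reg_hard_regno,
                                          GET_MODE (SUBREG_REG (x)),
                                          SUBREG_BYTE (x),
                                          GET_MODE (x));
      return (hard_regno >= 0
              && lra_hard_reg_set_intersection_p (hard_regno,
                                                  GET_MODE (x), set));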
Not sure: did you have any thoughts on this?

No, sorry.  I'll think about it in the morning.  A bit tired today.

+		    match_p = false;
+		    if (operands_match_p (*curr_id->operand_loc[nop],
+					  *curr_id->operand_loc[m], m_hregno))
+		      {
+			int i;
+			
+			for (i = 0; i < early_clobbered_regs_num; i++)
+			  if (early_clobbered_nops[i] == m)
+			    break;
+			/* We should reject matching of an early
+			   clobber operand if the matching operand is
+			   not dying in the insn.  */
+			if (i >= early_clobbered_regs_num
Why not simply use operand m's early_clobber field?
Ok. Fixed.
+			    || operand_reg[nop] == NULL_RTX
+			    || (find_regno_note (curr_insn, REG_DEAD,
+						 REGNO (operand_reg[nop]))
+				!= NULL_RTX))
+			  match_p = true;
...although I don't really understand this condition.  If the two
operands are the same value X, then X must die here whatever the
notes say.  So I assume this is coping with a case where the operands
are different but still match.  If so, could you give an example?
I remember seeing such an insn, but I don't remember the details.
Matched earlyclobbers explicitly guarantee that the earlyclobber doesn't
apply to the matched input operand; the earlyclobber only applies to
other input operands.  So I'd have expected it was those operands
that might need reloading rather than this one.

E.g. if X occurs three times, twice in a matched earlyclobber pair
and once as an independent operand, it's the latter operand that would
need reloading.
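(Illustrative: in

    (set (reg X) (plus (reg X) (reg X)))

the output and the first input can form the matched earlyclobber pair;
it is the remaining input use of X that would need reloading.)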
Yes, I know.
But in that case I don't understand the condition. If we have:

(set (reg X) (... (reg X) ...))

(which is the kind of thing operands_match_p is testing for)
then there is no requirement for a REG_DEAD note for X.
But it's still OK for two Xs to form an earlyclobber pair.

I think I added this for some purpose.  It might be IRA wrongly assigning the same hard register in a situation like:

(set (reg X1) (... (reg X2) ...))  (REG_UNUSED:X1)

where X2 lives on below.

I think it is better not to change this code.  I promise I'll remove it on the LRA branch to find out why I added it.

+			  /* If we didn't already win, we can reload
+			     constants via force_const_mem, and other
+			     MEMs by reloading the address like for
+			     'o'.  */
+			  if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
+			    badop = false;
It seems a bit inconsistent to treat a spilled pseudo whose address
might well need reloading as a win, while not treating existing MEMs
whose addresses need reloading as a win.
Well, the probability of reloading the address of a spilled pseudo is
very small on most targets, but reloading the address of a MEM in this
case is real.  So it seems logical to me.
OK, that's probably true. :-)

+	      if (! no_regs_p)
+		reload_nregs
+		  += ira_reg_class_max_nregs[this_alternative][mode];
I wasn't sure why we counted this even in the "const_to_mem && constmemok"
and "MEM_P (op) && offmemok" cases from:
	      /* We prefer to reload pseudos over reloading other
		 things, since such reloads may be able to be
		 eliminated later.  So bump REJECT in other cases.
		 Don't do this in the case where we are forcing a
		 constant into memory and it will then win since we
		 don't want to have a different alternative match
		 then.	*/
	      if (! (REG_P (op)
		     && REGNO (op) >= FIRST_PSEUDO_REGISTER)
		  && ! (const_to_mem && constmemok)
		  /* We can reload the address instead of memory (so
		     do not punish it).	 It is preferable to do to
		     avoid cycling in some cases.  */
		  && ! (MEM_P (op) && offmemok))
		reject += 2;
I think constmemok is obvious.  It is not a reload; it just puts the
constant in the constant pool.  We should not punish it, as no
additional insns are generated.

There is a comment for the offmemok case; I think it describes it.
Apparently it was a fix for LRA cycling.  I don't remember the details.
To restore them, I would need to remove the code and try it on many
targets; I guess it would take 3-4 days.  But I removed this as it does
not affect x86/x86-64.
Sorry, my comment wasn't as clear as it should have been. I think:

	      /* We prefer to reload pseudos over reloading other
		 things, since such reloads may be able to be
		 eliminated later.  So bump REJECT in other cases.
		 Don't do this in the case where we are forcing a
		 constant into memory and it will then win since we
		 don't want to have a different alternative match
		 then.	*/
	      if (! (REG_P (op)
		     && REGNO (op) >= FIRST_PSEUDO_REGISTER)
		  && ! (const_to_mem && constmemok)
		  /* We can reload the address instead of memory (so
		     do not punish it).	 It is preferable to do to
		     avoid cycling in some cases.  */
		  && ! (MEM_P (op) && offmemok))
		reject += 2;

is fine.  The point I was trying to make is that this code (correctly IMO)
ignores register reload costs if the operand won't be reloaded into
a register.  The operand won't be reloaded into a register if:

	(const_to_mem && constmemok)
      || (MEM_P (op) && offmemok)

However:

	      if (! no_regs_p)
		reload_nregs
		  += ira_reg_class_max_nregs[this_alternative][mode];

seems to count the register reload cost regardless of whether
the operand would be reloaded into a register.  It felt like
it ought to be something like:

	      if (!((const_to_mem && constmemok)
		    || (MEM_P (op) && offmemok)))
		{
		  /* We prefer to reload pseudos over reloading other
		     things, since such reloads may be able to be
		     eliminated later.  So bump REJECT in other cases.  */
		  if (! (REG_P (op)
			 && REGNO (op) >= FIRST_PSEUDO_REGISTER))
		    reject += 2;

		  if (! no_regs_p)
		    reload_nregs
		      += ira_reg_class_max_nregs[this_alternative][mode];
		}


It makes sense.  I fixed it using your variant.

