This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFC: LRA for x86/x86-64 [7/9] -- continuation


Vladimir Makarov <vmakarov@redhat.com> writes:
>>> +/* Info about pseudo used during the assignment pass.  Thread is a set
>>> +   of connected reload and inheritance pseudos with the same set of
>>> +   available hard reg set.  Thread is a pseudo itself for other
>>> +   cases.  */
>>> +struct regno_assign_info
>> Maybe:
>>
>> /* Information about the thread to which a pseudo belongs.  Threads are
>>     a set of connected reload and inheritance pseudos with the same set of
>>     available hard registers.  Lone registers belong to their own threads.  */
> Fixed.
>> Although the condition seems to be:
>>> +	&& (ira_class_hard_regs_num[regno_allocno_class_array[regno1]]
>>> +	    == ira_class_hard_regs_num[regno_allocno_class_array[regno2]]))
>> i.e. the same _number_ of available hard regs, but not necessarily the
>> same set.
> It should be the same in most cases.  This condition is just faster 
> approximation of the same available hard reg set.

The distinction does seem important though.   It's possible that
a target has two distinct register files of the same allocatable size.
Would something like:

      (ira_class_subset_p[class1][class2]
       && ira_class_subset_p[class2][class1])

work instead?

>> /* Update the preference for using HARD_REGNO for pseudos that are
>>     connected directly or indirectly with REGNO.  Apply divisor DIV
>>     to any preference adjustments.
>>
>>     The more indirectly a pseudo is connected, the smaller its effect
>>     should be.  We therefore increase DIV on each "hop".  */
>>
> Fixed.  By the way, it is your invention from IRA.

Heh, I'd forgotten all about that.

>>> +      /* We are trying to spill a reload pseudo.  That is wrong we
>>> +	 should assign all reload pseudos, otherwise we cannot reuse
>>> +	 the selected alternatives.  */
>>> +      hard_regno = find_hard_regno_for (regno, &cost, -1);
>>> +      if (hard_regno >= 0)
>>> +	{
>> Don't really understand this comment, sorry.
> It removed the comment.  It is from an old solution code trying to 
> guarantee assignment to the reload pseudo.
>> Also, why are we passing -1 to find_hard_regno_for, rather than hard_regno?
>> The loop body up till this point has been specifically freeing up registers
>> to make hard_regno allocatable.  I realise that, by spilling everything
>> that overlaps this range, we might have freed up other registers too,
>> and so made others besides hard_regno allocatable.  But wouldn't we want
>> to test those other hard registers in "their" iteration of the loop
>> instead of this one?  The spills in those iterations ought to be more
>> directed (i.e. there should be less incidental spilling).
>>
>> As things stand, doing an rclass_size * rclass_size scan seems
>> unnecessarily expensive, although probably off the radar.
> We cannot just pass hard_regno for multi-word pseudo when hard_regno-1 
> is already free.

But this call is in a loop that iterates over all registers in the class:

  for (i = 0; i < rclass_size; i++)
    {
      hard_regno = ira_class_hard_regs[rclass][i];

and we reach the find_hard_regno_for call unless there is some
conflicting register that we cannot spill.  So if "hard_regno - 1"
belongs to the allocation class and is a viable choice, "its" iteration
of the loop would spill specifically for "hard_regno - 1" and get the
most accurate cost for that register.  I couldn't see why any other
iteration of the loop would want to consider "hard_regno - 1".

> You are right about possibility to speed up the code, although on 
> profiles I looked (including the last huge tests) spill_for and 
> find_hard_regno_for called from takes few time. That is probably because 
> you don't need spill frequently.  Freeing one long live range pseudo 
> permits to find hard regno without spilling for many short live pseudos 
> (reload and inheritance ones).
> Also loop rclass_size * rclass_size is not expensive, the preparation of 
> data for the loop is expensive.

OK, in that case maybe the efficiency concern wasn't justified.
FWIW, I still think passing hard_regno would be clearer though,
in terms of meeting expectations.  It just seems odd to spill for
one specific register and then test all of them.  Especially when the
spilling we actually do after choosing register X is based on "X's"
iteration of this loop.

(I realise I could well be missing the point here though, sorry.)

>>> +		  && (reload_hard_regno
>>> +		      = find_hard_regno_for (reload_regno,
>>> +					     &reload_cost, -1)) >= 0
>>> +		  && (lra_hard_reg_set_intersection_p
>>> +		      (reload_hard_regno, PSEUDO_REGNO_MODE (reload_regno),
>>> +		       spilled_hard_regs)))
>>> +		{
>>> +		  if (lra_dump_file != NULL)
>>> +		    fprintf (lra_dump_file, " assign %d(cost=%d)",
>>> +			     reload_regno, reload_cost);
>>> +		  assign_temporarily (reload_regno, reload_hard_regno);
>>> +		  cost += reload_cost;
>> It looks like registers that can be reallocated make hard_regno more
>> expensive (specifically by reload_cost), but registers that can't be
>> reallocated contribute no cost.  Is that right?  Seemed a little odd,
>> so maybe worth a comment.
> Reload cost is a negative cost.  It is negative base is the pseudo 
> frequency.

Ah!  Missed that, sorry.

> I added the comment.  The better hard register is, the more negative 
> cost is.

Thanks.

>> Also, AIUI find_hard_regno_for is trying to allocate the register for
>> reload_regno on the basis that reload_regno has the same conflicts as
>> the current regno, and so it's only an approximation.  Is that right?
>> Might be worth a comment if so (not least to explain why we don't commit
>> to this allocation if we end up choosing hard_regno).
> Sorry, I did not understand what you are asking.  We try to find best 
> pseudos to spill which helps to assign more reload pseudos (on cost 
> base) as possible.  Pseudos for which find_hard_regno finds not spilled 
> hard regs are ignored as they can be assigned without spilling.

Yeah, sorry, ignore this.  I think I'd taken a break here and then
forgotten that find_hard_regno_for works out the conflicts set for
the register itself.  I think I'd assumed when writing it that
find_hard_regno_for used some state that had been precalculated
before the main "for (i = 0; i < rclass_size; i++)" loop.

>>> +      /* Restore the live hard reg pseudo info for spilled pseudos.  */
>>> +      EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
>>> +	update_lives (spill_regno, false);
>> I couldn't tell why this was outside the "hard_regno >= 0" condition.
>> Do we really change these registers even if find_hard_regno_for fails?
> The first we spill some pseudo and then call find_hard_regno_for.  So we 
> should restore the state before spilling independently on success of 
> find_hard_regno_for.

Yeah, ignore this too :-)

>>> +  /* Spill: */
>>> +  EXECUTE_IF_SET_IN_BITMAP (&best_spill_pseudos_bitmap, 0, spill_regno, bi)
>> Very minor, but I think it'd be worth asserting that best_hard_regno >= 0
>> before this loop.
> Unfortunately, in very rare cases, best_hard_regno can be < 0.  That is 
> why we have two iteration for assignment of reload pseudos (see comment 
> for 2nd iter for reload pseudo assignments.
>
> I've added a comment for the function that it can return negative value.

OK, thanks.

>> The only general comment I have so far is that it's sometimes
>> difficult to follow which types of pseudos are being included
>> or excluded by a comparison with lra_constraint_new_regno_start.
>> Sometimes the comments talk about "reload pseudos", but other
>> similar checks imply that the registers could be inheritance
>> pseudos or split pseudos as well.  Some thin inline wrappers
>> might help here.
> Inheritance, split, reload pseudos created since last constraint pass >= 
> lra_constraint_new_regno_start.
> Inheritance and split pseudos created on any pass are in the 
> corresponding bitmaps.
> Inheritance and split pseudos since the last constraint pass has also 
> restore_regno >= 0 until split or inheritance transformations are done.

OK.  What prompted this was that some comments refer specifically to
"reload pseudo" whereas the accompanying code simply checks against
lra_constraint_new_regno_start.  It then wasn't obvious whether the code
really did just include "reload pseudos" in what I thought was the
strict sense -- e.g. because no other type of LRA-created pseudo could
occur in that context, so there was no point checking anything else --
or whether the code was actually handling inheritance and split pseudos too.

Maybe it would help to have a term to refer all four of:

  - reload pseudos
  - optional reload pseudos
  - inheritance pseudos
  - split pseudos

although I won't suggest one because I'm useless at naming things.

>>> +		/* We can use inheritance pseudos in original insns
>>> +		   (not reload ones).  */
>>> +		if (regno < lra_constraint_new_regno_start
>>> +		    || bitmap_bit_p (&lra_inheritance_pseudos, regno)
>>> +		    || reg_renumber[regno] < 0)
>>> +		  continue;
>>> +		sorted_pseudos[nfails++] = regno;
>>> +		if (lra_dump_file != NULL)
>>> +		  fprintf (lra_dump_file,
>>> +			   "	  Spill reload r%d(hr=%d, freq=%d)\n",
>>> +			   regno, reg_renumber[regno],
>>> +			   lra_reg_info[regno].freq);
>> Same comment about types of pseudo as above.  (I.e. the code checks for
>> inheritance pseudos, but not split pseudos.)
> I modified the comment to
>
> /* A reload pseudo did not get a hard register on the
>     first iteration because of the conflict with
>     another reload pseudos in the same insn.  So we
>     consider only reload pseudos assigned to hard
>     registers.  We shall exclude inheritance pseudos as
>     they can occur in original insns (not reload ones).
>     We can omit the check for split pseudos because
>     they occur only in move insns containing non-reload
>     pseudos. */
>
> I hope it explains the code.

Yes, thanks.

>>> +  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
>>> +    if (((i < lra_constraint_new_regno_start
>>> +	  && ! bitmap_bit_p (&do_not_assign_nonreload_pseudos, i))
>>> +	 || (bitmap_bit_p (&lra_inheritance_pseudos, i)
>>> +	     && lra_reg_info[i].restore_regno >= 0)
>>> +	 || (bitmap_bit_p (&lra_split_pseudos, i)
>>> +	     && lra_reg_info[i].restore_regno >= 0)
>>> +	 || bitmap_bit_p (&lra_optional_reload_pseudos, i))
>>> +	&& reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
>>> +	&& regno_allocno_class_array[i] != NO_REGS)
>>> +      sorted_pseudos[n++] = i;
>>> +  bitmap_clear (&do_not_assign_nonreload_pseudos);
>> where we test very similar things inline, and then clear
>> do_not_assign_nonreload_pseudos.  Do we need d_n_a_n_p at all?
> No the code is right.  We still need d_n_a_n_p as we can not easily 
> calculate from what pseudo the given pseudo was inherited or split.  It 
> is different from the loop where new inheritance and split pseudos are 
> checked (it is different from their origins marked in d_n_a_n_p.

Yeah, sorry about that.  I misread the indices.

Richard


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]