This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: MIPS: 2'nd pass of ira, causes weird register allocation for 2-op mult


Klaus Pedersen <projectu@gmail.com> writes:
> The summery goes something like this:
>
> It is possible for the second pass of ira to get confused and decide that
> NO_REGS or a hard float register are better choices for the result of the
> 2 operand mult. First pass already optimally allocated in GR_AND_MD1_REGS.

Yeah.  I'm afraid this is something I've been sitting on for a while now.
I think the problem boils down to this:

> Pass 0 for finding pseudo/allocno costs
>
>     a2 (r209,l0) best GR_REGS, allocno GR_REGS
>     a4 (r208,l0) best GR_REGS, allocno GR_REGS
>     a5 (r207,l0) best GR_REGS, allocno GR_REGS
>     a1 (r200,l0) best GR_REGS, allocno GR_REGS
>     a3 (r199,l0) best GR_AND_MD1_REGS, allocno GR_AND_MD1_REGS
>     a0 (r194,l0) best GR_REGS, allocno GR_REGS
>
> ...
>   a3(r199,l0) costs: M16_REGS:6000,6000 T_REG:6000,6000
> M16_T_REGS:6000,6000 PIC_FN_ADDR_REG:6000,6000 V1_REG:6000,6000
> LEA_REGS:6000,6000 GR_REGS:6000,6000 FP_REGS:14000,14000
> MD1_REG:6000,6000 MD_REGS:2000000,2000000 ACC_REGS:2000000,2000000
> GR_AND_MD0_REGS:2000000,2000000 GR_AND_MD1_REGS:18000,18000
> GR_AND_MD_REGS:2000000,2000000 GR_AND_ACC_REGS:2000000,2000000
> ALL_REGS:2000000,2000000 MEM:14000,14000

Here the cost of using GR_REGS is 6000 and the cost of using MD1_REG
is also 6000.  But the cost of using _either_ GR_REGS _or_ MD1_REG
is 18000!  That doesn't make conceptual sense.

This doesn't matter during the first pass because we collect the
union of the cheapest classes:

	      if (i_costs[k] < best_cost)
		{
		  best_cost = i_costs[k];
		  best = (enum reg_class) rclass;
		}
	      else if (i_costs[k] == best_cost)
		best = ira_reg_class_subunion[best][rclass];

I.e. if the cost of using X is C and the cost of using Y is C, we assume
that the cost of using either X or Y is also C.  We don't care in the
"else" case about the cost of ira_reg_class_subunion[best][rclass],
because if that cost is something other than C, it's wrong.
So despite what the dump says, the first pass is effectively
(and correctly) giving GR_AND_MD1_REGS a cost of 6000.

But the second pass goes out of its way to avoid calculating the costs
of these constituent classes, and just calculates the cost of
GR_AND_MD1_REGS itself:

      /* We exclude classes from consideration which are subsets of
	 ACLASS only if ACLASS is a pressure class or subset of a
	 pressure class.  It means by the definition of pressure classes
	 that cost of moving between susbets of ACLASS is cheaper than
	 load or store.  */
      for (i = 0; i < ira_pressure_classes_num; i++)
	{
	  cl = ira_pressure_classes[i];
	  if (cl == aclass)
	    break;
	  COPY_HARD_REG_SET (temp2, reg_class_contents[cl]);
	  AND_COMPL_HARD_REG_SET (temp2, ira_no_alloc_regs);
	  if (hard_reg_set_subset_p (temp, temp2))
	    break;
	}
      exclude_p = i < ira_pressure_classes_num;
      classes.num = 0;
      for (i = 0; i < ira_important_classes_num; i++)
	{
	  cl = ira_important_classes[i];
	  if (exclude_p)
	    {
	      /* Exclude no-pressure classes which are subsets of
		 ACLASS.  */
	      COPY_HARD_REG_SET (temp2, reg_class_contents[cl]);
	      AND_COMPL_HARD_REG_SET (temp2, ira_no_alloc_regs);
	      if (! ira_reg_pressure_class_p[cl]
		  && hard_reg_set_subset_p (temp2, temp) && cl != aclass)
		continue;
	    }
	  classes.classes[classes.num++] = cl;
	}

So we get:

>   a3(r199,l0) costs: FP_REGS:14000,14000 MD_REGS:2000000,2000000
> ACC_REGS:2000000,2000000 GR_AND_MD0_REGS:2000000,2000000
> GR_AND_MD1_REGS:18000,18000 GR_AND_MD_REGS:2000000,2000000
> GR_AND_ACC_REGS:2000000,2000000 ALL_REGS:2000000,2000000
> MEM:14000,14000

I.e. the same incorrect value for GR_AND_MD1_REGS, but without the
individual GR_REGS and MD1_REG costs that would give the correct value.

Part of the problem is that may_move_{in,out}_cost is too conservative.
It is either the full cost of a move or 0.  E.g. take:

    MMOC == may_move_out_cost[mode][X][Y]

which is used where an output constraint specifies class X and the
allocation class is Y.  MMOC is 0 if Y is a subset of X, because no move
is ever needed in that case.  But in other cases MMOC is the worst-case
cost of moving from X to _anything_ in Y.  So if Y is X \union X', say,
then MMOC will include the cost of an X->X move, even though a move is
only needed for registers in the X' part of Y.

We could do better by making:

    may_move_out_cost[mode][X][Y] = move_cost[mode][X][superdiff[Y][X]]

where superdiff is a new array specifying the smallest class that contains
Y - X.  But even that's too pessimistic.  When we see an X constraint,
we'll count the cost of:

    move_cost[mode][X][X']

And when we see an X' constraint, we'll count the cost of:

    move_cost[mode][X'][X]

And we'll sum these costs together, even though they represent
mutually-exclusive scenarios.  In cases like yours (which are common
on MIPS) we'll end up with a GR_AND_MD1_REGS cost of 12000 instead
of 18000, but the value really ought to be 6000.

I think the only practical way of calculating accurate costs for things
like GR_AND_MD_REGS really is to count the cost of the constituent classes
and then take their MAX.

Vlad, what do you think?  Is the above exclude_p code "just" a compile-time
speed-up?  Or is it a conceptual part of the algorithm?  More generally,
what kind of situations does the second pass help with?  I.e. how does
it improve on the first pass?

Richard


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]