This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch RFC] SH: PR target/22553


Kaz Kojima wrote:

I've tried to see what is going on according to your explanation for
this issue.
In that PR, the basic blocks before try_optimize_cfg look like the left
side of the following figure and try_optimize_cfg combines two blocks
into one block

	+-----------+
	| ...       |
	| call insn |
	+-----------+
	   | ...  \
    fall  | ...   \ abnormal
	   | ...    \
	   | ...     \
	   V
	+-----------+
    b:	| val. copy |
	+-----------+			+-----------+
	   |		   combine b&c	| val. copy |
	   | jump		->	| ...       |
	   V				|	    |
    c:	+-----------+			+-----------+
	| ...	    |
	+-----------+

Then the first scheduling pass produces a sequence which will fail in
reload because the register pressure is too high in the combined BB,
right?


It's not the register pressure in general, but the register pressure on r0, at the point
where the register calue copy is, in particular.
We used to have the convenstion that the return value copy must stay an immediate
successor of the call_insn till reload. This is how we can keep recognizing the return
value copy as such, and avoid moving it or doing invalid combinations. This becomes
harder to maintain when the two insn are not in the same basic block anymore. We must
get back to the point where every optimizer with a need to know can recognize a return
value copy and leave it alone.


As an experiment, I've ruled out the above situation with

edge e = NULL;

	if (single_pred_p (b)
	    && (single_pred_edge (b)->flags & EDGE_FALLTHRU))
	  {
	    edge_iterator ei;
	    basic_block d = single_pred_edge (b)->src;

	    FOR_EACH_EDGE (e, ei, d->succs)
	      if (e->flags & EDGE_COMPLEX)
		break;
	  }

if (e)

and made the above combining not to happen in try_optimize_cfg. It
gets rid of the ICE, though I'm not sure that it's the Right Thing
even for SH-4.


Combining of blocks as such is actually harmless. What causes trouble is moving the
call insn or the return value copy from their places at the edge of their blocks,
or inserting any insns or entire blocks in between.


I think that the fundamental bug is the way that exception handling places basic block
boundaries after call_insns, without changing all the code that relies on basic blocks
only being ended by jumps, labels, or the end of a function.
You could add some dummy insn (could be even nominally a pure CALL_INSN) at the
start of the basic block that does the return value copy to stop the register allocation problems
there, but that would still not stop optimizers like lcm placing new basic blocks after the
CALL_INSN. For the purposes of most optimizers, the fall-through block after the CALL_INSN
must still be considered to be the same basic block as the one that contains the CALL_INSN.
I wonder if it really makes sense getiing almost every optimization pass to pretend these basic
block boundaries are not there. A better solution might be to have some separate way to represent
the abnormal edge coming out of the CALL_INSN.
A particularily tricky question is how we should handle the destination block of the abnormal edge
for register allocation. THe abnormal edge actually comes out from the middle of the call_insn:
after all in input operands needed to make the call are evaluated, but before the result is assigned.
But we can't pretend the call_insn consists actually of two insns and that we have an ordinary
basic block boundary in between: we can't insert a new basic block there, nor can we insert
reload insns (e.g. RELOAD_FOR_OUTADDR) inside the call_insn.
I suppose we'll have to strictly distinguish CALL_INSNs that set a hard register - these should be
subject to protection of their output value - an those that set a MEM, a pseudo, or nothing - these
should be OK to handle with the current scheme (except for performance problems due to misguided
heuristics). If the CALL_INSN only sets a hard register, no reloads are required for the output part,
and for the input part we are restricted to input reloads only - as we are already for JUMP_INSNs.
So with these caveats, we can treat these CALL_INSNs as split inside by a basic block boundary,
but only during reload/global_alloc.


N.B. it might look tempting to 'solve' the problem by changing the CALL_INSN to set a pseudo,
but that would work only for the simples of CALL_INSNs. The 'z' constrait can only be used for
return registers that consist only for R0 itself. A DImode / TImode / DFmode value needs
more than one hard register. If we introduced register classes for every kind of return value
register set, REGNO_REG_CLASS for a lot of registers would change, and register allocation
would get worse. Look at global.c:find_reg to see why this would be a bad idea. If we wanted to
go that way, we'd have to have a new target macro/hook to tell global/regrename what register classes
to use in order to check if two registers are supposed to be considered equivalent. And we'll need a new
MODE parameter for REG_CLASS_FROM_CONSTRAINT to avoid an explosion of the number of call
patterns.




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]