[patch RFC] SH: PR target/22553

Joern RENNECKE joern.rennecke@st.com
Fri Oct 14 12:40:00 GMT 2005


Kaz Kojima wrote:

>I've tried to see what is going on according to your explanation for
>this issue.
>In that PR, the basic blocks before try_optimize_cfg look like the left
>side of the following figure and try_optimize_cfg combines two blocks
>into one block
>
>	+-----------+
>	| ...       |
>	| call insn |
>	+-----------+
>	   | ...  \
>     fall  | ...   \ abnormal
>	   | ...    \
>	   | ...     \
>	   V
>	+-----------+
>     b:	| val. copy |
>	+-----------+			+-----------+
>	   |		   combine b&c	| val. copy |
>	   | jump		->	| ...       |
>	   V				|	    |
>     c:	+-----------+			+-----------+
>	| ...	    |
>	+-----------+
>
>Then the first scheduling pass produces a sequence which will fail in
>reload because the register pressure is too high in the combined BB,
>right?
>  
>
It's not the register pressure in general, but the register pressure on 
r0, at the point
where the register calue copy is, in particular.
We used to have the convenstion that the return value copy must stay an 
immediate
successor of the call_insn till reload.  This is how we can keep 
recognizing the return
value copy as such, and avoid moving it or doing invalid combinations.  
This becomes
harder to maintain when the two insn are not in the same basic block 
anymore.  We must
get back to the point where every optimizer with a need to know can 
recognize a return
value copy and leave it alone.

>As an experiment, I've ruled out the above situation with
>
>	edge e = NULL;
>
>	if (single_pred_p (b)
>	    && (single_pred_edge (b)->flags & EDGE_FALLTHRU))
>	  {
>	    edge_iterator ei;
>	    basic_block d = single_pred_edge (b)->src;
>
>	    FOR_EACH_EDGE (e, ei, d->succs)
>	      if (e->flags & EDGE_COMPLEX)
>		break;
>	  }
>
>	if (e)
>
>and made the above combining not to happen in try_optimize_cfg.  It
>gets rid of the ICE, though I'm not sure that it's the Right Thing
>even for SH-4.
>  
>
Combining of blocks as such is actually harmless.  What causes trouble 
is moving the
call insn or the return value copy from their places at the edge of 
their blocks,
or inserting any insns or entire blocks in between.

I think that the fundamental bug is the way that exception handling 
places basic block
boundaries after call_insns, without changing all the code that relies 
on basic blocks
only being ended by jumps, labels, or the end of a function.
You could add some dummy insn (could be even nominally a pure CALL_INSN) 
at the
start of the basic block that does the return value copy to stop the 
register allocation problems
there, but that would still not stop optimizers like lcm placing new 
basic blocks after the
CALL_INSN.  For the purposes of most optimizers, the fall-through block 
after the CALL_INSN
must still be considered to be the same basic block as the one that 
contains the CALL_INSN.
I wonder if it really makes sense getiing almost every optimization pass 
to pretend these basic
block boundaries are not there.  A better solution might be to have some 
separate way to represent
the abnormal edge coming out of the CALL_INSN.
A particularily tricky question is how we should handle the destination 
block of the abnormal edge
for register allocation.  THe abnormal edge actually comes out from the 
middle of the call_insn:
after all in input operands needed to make the call are evaluated, but 
before the result is assigned.
But we can't pretend the call_insn consists actually of two insns and 
that we have an ordinary
basic block boundary in between: we can't insert a new basic block 
there, nor can we insert
reload insns (e.g. RELOAD_FOR_OUTADDR) inside the call_insn.
I suppose we'll have to strictly distinguish CALL_INSNs that set a hard 
register - these should be
subject to protection of their output value - an those that set a MEM, a 
pseudo, or nothing - these
should be OK to handle with the current scheme (except for performance 
problems due to misguided
heuristics).  If the CALL_INSN only sets a hard register, no reloads are 
required for the output part,
and for the input part we are restricted to input reloads only - as we 
are already for JUMP_INSNs.
So with these caveats, we can treat these CALL_INSNs as split inside by 
a basic block boundary,
but only during reload/global_alloc.

N.B. it might look tempting to 'solve' the problem by changing the 
CALL_INSN to set a pseudo,
but that would work only for the simples of CALL_INSNs.  The 'z' 
constrait can only be used for
return registers that consist only for R0 itself.  A DImode / TImode / 
DFmode value needs
more than one hard register.  If we introduced register classes for 
every kind of return value
register set,  REGNO_REG_CLASS for a lot of registers would change, and 
register allocation
would get worse.  Look at global.c:find_reg to see why this would be a 
bad idea.  If we wanted to
go that way, we'd have to have a new target macro/hook to tell 
global/regrename what register classes
to use in order to check if two registers are supposed to be considered 
equivalent.  And we'll need a new
MODE parameter for REG_CLASS_FROM_CONSTRAINT to avoid an explosion of 
the number of call
patterns.
 



More information about the Gcc-patches mailing list