[patch RFC] SH: PR target/22553
Joern RENNECKE
joern.rennecke@st.com
Fri Oct 14 12:40:00 GMT 2005
Kaz Kojima wrote:
>I've tried to see what is going on according to your explanation for
>this issue.
>In that PR, the basic blocks before try_optimize_cfg look like the left
>side of the following figure and try_optimize_cfg combines two blocks
>into one block
>
> +-----------+
> | ... |
> | call insn |
> +-----------+
> | ... \
> fall | ... \ abnormal
> | ... \
> | ... \
> V
> +-----------+
> b: | val. copy |
> +-----------+ +-----------+
> | combine b&c | val. copy |
> | jump -> | ... |
> V | |
> c: +-----------+ +-----------+
> | ... |
> +-----------+
>
>Then the first scheduling pass produces a sequence which will fail in
>reload because the register pressure is too high in the combined BB,
>right?
>
>
It's not the register pressure in general, but the register pressure on
r0, at the point
where the register calue copy is, in particular.
We used to have the convenstion that the return value copy must stay an
immediate
successor of the call_insn till reload. This is how we can keep
recognizing the return
value copy as such, and avoid moving it or doing invalid combinations.
This becomes
harder to maintain when the two insn are not in the same basic block
anymore. We must
get back to the point where every optimizer with a need to know can
recognize a return
value copy and leave it alone.
>As an experiment, I've ruled out the above situation with
>
> edge e = NULL;
>
> if (single_pred_p (b)
> && (single_pred_edge (b)->flags & EDGE_FALLTHRU))
> {
> edge_iterator ei;
> basic_block d = single_pred_edge (b)->src;
>
> FOR_EACH_EDGE (e, ei, d->succs)
> if (e->flags & EDGE_COMPLEX)
> break;
> }
>
> if (e)
>
>and made the above combining not to happen in try_optimize_cfg. It
>gets rid of the ICE, though I'm not sure that it's the Right Thing
>even for SH-4.
>
>
Combining of blocks as such is actually harmless. What causes trouble
is moving the
call insn or the return value copy from their places at the edge of
their blocks,
or inserting any insns or entire blocks in between.
I think that the fundamental bug is the way that exception handling
places basic block
boundaries after call_insns, without changing all the code that relies
on basic blocks
only being ended by jumps, labels, or the end of a function.
You could add some dummy insn (could be even nominally a pure CALL_INSN)
at the
start of the basic block that does the return value copy to stop the
register allocation problems
there, but that would still not stop optimizers like lcm placing new
basic blocks after the
CALL_INSN. For the purposes of most optimizers, the fall-through block
after the CALL_INSN
must still be considered to be the same basic block as the one that
contains the CALL_INSN.
I wonder if it really makes sense getiing almost every optimization pass
to pretend these basic
block boundaries are not there. A better solution might be to have some
separate way to represent
the abnormal edge coming out of the CALL_INSN.
A particularily tricky question is how we should handle the destination
block of the abnormal edge
for register allocation. THe abnormal edge actually comes out from the
middle of the call_insn:
after all in input operands needed to make the call are evaluated, but
before the result is assigned.
But we can't pretend the call_insn consists actually of two insns and
that we have an ordinary
basic block boundary in between: we can't insert a new basic block
there, nor can we insert
reload insns (e.g. RELOAD_FOR_OUTADDR) inside the call_insn.
I suppose we'll have to strictly distinguish CALL_INSNs that set a hard
register - these should be
subject to protection of their output value - an those that set a MEM, a
pseudo, or nothing - these
should be OK to handle with the current scheme (except for performance
problems due to misguided
heuristics). If the CALL_INSN only sets a hard register, no reloads are
required for the output part,
and for the input part we are restricted to input reloads only - as we
are already for JUMP_INSNs.
So with these caveats, we can treat these CALL_INSNs as split inside by
a basic block boundary,
but only during reload/global_alloc.
N.B. it might look tempting to 'solve' the problem by changing the
CALL_INSN to set a pseudo,
but that would work only for the simples of CALL_INSNs. The 'z'
constrait can only be used for
return registers that consist only for R0 itself. A DImode / TImode /
DFmode value needs
more than one hard register. If we introduced register classes for
every kind of return value
register set, REGNO_REG_CLASS for a lot of registers would change, and
register allocation
would get worse. Look at global.c:find_reg to see why this would be a
bad idea. If we wanted to
go that way, we'd have to have a new target macro/hook to tell
global/regrename what register classes
to use in order to check if two registers are supposed to be considered
equivalent. And we'll need a new
MODE parameter for REG_CLASS_FROM_CONSTRAINT to avoid an explosion of
the number of call
patterns.
More information about the Gcc-patches
mailing list