RFA: reload infrastructure to fix PR target/21623

Thu Sep 29 17:42:00 GMT 2005

Joern RENNECKE <joern.rennecke@st.com> writes:

> Ian Lance Taylor wrote:
> 
> >  SECONDARY_{,INPUT_,OUTPUT_}RELOAD_CLASS and reload_{in,out} are
> >already incredibly confusing.  I don't think that adding another
> >confusing feature on top of them is the right way to go.
> >
> This actually means the reload patterns more intuitive, and simpler to
> code, too.

Agreed.  Still, it amounts to an extra layer on top.

> >With our current infrastructure, I can't think of any benefit to the
> >reload_in and reload_out patterns at all.  They've been around for a
> >long time, at least since gcc 1.23, and maybe they once made sense.
> >But now they are always implemented using define_expand.  The
> >constraints are mainly used in a strange combination with the
> >SECONDARY_RELOAD_CLASS macros to determine how many and which types of
> >scratch registers are required.
> >
> >If we're going to tackle this at all, I think we should simplify it.
> >For example, suppose we add a new target hook
> >
> >int TARGET_SECONDARY_RELOAD (enum reg_class, enum machine_mode,
> >                             rtx x, bool in, enum reg_class *classes)
> >
> >Like today, if IN is true we are copying X to a register in CLASS with
> >MODE, and otherwise we are copying a register in CLASS with MODE to X.
> >The function returns the number of scratch registers needed, and
> >stores the required classes in *CLASSES.  We use this information in
> >push_secondary_reload.  We can say that the maximum is two scratch
> >registers for now, and we can increase that without changing the
> >interface if some target ever needs more (which I hope is unlikely).
> >
> This won't work as proposed.  To match the current functionality, we
> also need
> information on the modes, and which registers are temporaries that are
> actually
> holding the reloaded valuei, i.e. suitable for reload inheritance.
> Ideally, we'd give every register a value where possible, i.e. the
> hook could fill in
> an rtx, and also provide information that amounts to a dependency graph.

OK, fair enough.  I personally think the reload inheritance stuff
should be removed, and handled in a separate CSE pass.  But you're
right that with the current scheme we need the modes, and we need to
describe which register hold the intermediate value.  Would it really
help to fill in an rtx?  I'm not sure, because I don't see what reload
would usefully do with that.

I think reload in general ought to have a dependency graph.  It would
be useful in, e.g., merge_assigned_reloads and no doubt other places.
It's not clear to me that we should introduce a dependency graph here
in advance of one used generally in reload.  In particular, what would
reload do with it today?

> Another thing to note is that the lack of more than one scratch register has
> forced some port to write reload patterns that use scratch registers in wide
> modes in order to  get mutliple hard registers.  This is not only backwards
> from a design point of view, it is also undesirable because group needs are
> harder to satisfy, we generally only have modes available to get a number
> of registers that is a power of two, and only up to so small limit, and we
> can't have more than one class of scratch register.
> Thus, any radical interface change should support more than one scratch
> register in addition to one or better multiple temporary registers.

OK, makes sense.

> We should try to avoid merging everything into one big blob, as
> SECONDARY_RELOAD_CLASS does.  I think this can be accomplished
> assigning the actual subtarget-specific target hook in OVERRIDE_OPTIONS,
> and inside the hook using a switch statement using the reload_mode
> (as a substitute for reload_in_optab / reload_out_optab).

I actually don't like changing the target hook in OVERRIDE_OPTIONS,
because I would prefer that targetm be a const variable.  But I know
that targets do this today for various reasons, e.g., ARM for
rtx_costs.  But that is a minor detail--for example, it's easy enough
to have the single function call through a function pointer.

> >Then we add a corresponding target hook to emit the instructions:
> >
> >void TARGET_EMIT_SECONDARY_RELOAD (rtx to, rtx from, rtx *scratch)
> >
> >This is copying a value from FROM to TO.  The hook will only be called
> >if TARGET_SECONDARY_RELOAD returned non-zero.  SCRATCH will be an
> >array of the registers allocated by the reload pass.  They will have
> >the classes requested by TARGET_SECONDARY_RELOAD.  This hook must
> >ensure that any insns that it emits do not themselves require
> >reloading.
> >
> That means that the port writer sudenly has to write special code to make
> straightforward copies from/to temporary registers.
> If we go with dependency and value information as suggested above, we
> can avoid this by noting when a reload depends on a single other reload -
> or the original value - and the value rtxes are identical.  Then we can
> emit a simple move insn.

Or, even simpler, have default_emit_secondary_reload insist on having
a single scratch register, and emit instructions to copy from FROM to
the scratch register, and then from the scratch register to TO.

> >To me this seems a lot simpler than trying to remember what the
> >register constraints mean in the reload_{in,out} patterns.  And it
> >seems to give us the flexibility that you want.
> >
> What problem are you exactly talking about here?

I'm talking about the general problem: the way that
SECONDARY_RELOAD_CLASS and reload_{in,out} work together to describe
the intermediate and scratch registers needed.  In particular the
magic way in which you use a constraint which doesn't match the
original reload register in order to get a scratch register.  There is
no reason to make people go through that dance.

And my feeling about your patch is that it adds another layer on top
of the already complex situation.  The problem is that we are trying
to pick a register by writing specific constraints in specific
patterns, rather than just saying what we mean in a simple way.  I
don't think that adding more named patterns is the solution.

> My patch, on the other hand, is well localized, does not change the actual
> workings of reload, and is designed to only affect ports that choose to use
> the new interface capabilities.  I think that makes it safe enough for
> stage3.

Your patch is extending the secondary reload interface in a way that I
don't think we want to go.

But of course you should feel free to seek a second opinion.

> > Is there any temporary patch which
> >can fix the problem at hand for the 4.1 release, perhaps at the cost
> >of generating worse code?
> >
> The problem at hand is that it is impossible to describe the reloads needed
> to reload a constant into fpul and T into a floating point register.
> Describing one reload correctly has the side effect of  forcing an incorrect
> description for the other reload.
> 
> I could introduce and reload_insi predicate which accepts T in addition to
> immediate_operand, and use that for reload_insi.  However, this forces
> to use r0 as the scratch for the T -> rn -> fpul ->frm reload, thus causing
> possible ICEs when r0 is in use as the return value regiser.
> Worse, it will channel every secondary reload of T for any other register
> to go through r0 and fpul, which not only generates worse code, it is likely
> to generate unrecognizable instructions.  E.g. you can't move directly from
> fpul to macl.  Thus, the problem is not solvable with the current
> one-reloadinsi-alternative-fits-all approach.
>  We could have a target macro
> FROB_SECONDARY_RELOAD(x, reload_mode, reload_class, class, icode) ,
> which, if defined, is called just after the predicate check in
> push_secondary_reload,
> and inspects and/or modifies these variables at will.
> The very uncleanness of this interface should be an incentive to
> replace it with
> something solid in phase 1.

I would be more or less OK with something like that for 4.1, in the
hopes of replacing it when we are back in stage 1.

Ian