RFA: reload infrastructure to fix PR target/21623

Thu Sep 29 14:58:00 GMT 2005

Ian Lance Taylor wrote:

>  
>
>SECONDARY_{,INPUT_,OUTPUT_}RELOAD_CLASS and reload_{in,out} are
>already incredibly confusing.  I don't think that adding another
>confusing feature on top of them is the right way to go.
>  
>
This actually means the reload patterns more intuitive, and simpler to
code, too.

>With our current infrastructure, I can't think of any benefit to the
>reload_in and reload_out patterns at all.  They've been around for a
>long time, at least since gcc 1.23, and maybe they once made sense.
>But now they are always implemented using define_expand.  The
>constraints are mainly used in a strange combination with the
>SECONDARY_RELOAD_CLASS macros to determine how many and which types of
>scratch registers are required.
>
>If we're going to tackle this at all, I think we should simplify it.
>For example, suppose we add a new target hook
>
>int TARGET_SECONDARY_RELOAD (enum reg_class, enum machine_mode,
>                             rtx x, bool in, enum reg_class *classes)
>
>Like today, if IN is true we are copying X to a register in CLASS with
>MODE, and otherwise we are copying a register in CLASS with MODE to X.
>The function returns the number of scratch registers needed, and
>stores the required classes in *CLASSES.  We use this information in
>push_secondary_reload.  We can say that the maximum is two scratch
>registers for now, and we can increase that without changing the
>interface if some target ever needs more (which I hope is unlikely).
>  
>
This won't work as proposed.  To match the current functionality, we 
also need
information on the modes, and which registers are temporaries that are 
actually
holding the reloaded valuei, i.e. suitable for reload inheritance.
Ideally, we'd give every register a value where possible, i.e. the hook 
could fill in
an rtx, and also provide information that amounts to a dependency graph.

Another thing to note is that the lack of more than one scratch register has
forced some port to write reload patterns that use scratch registers in wide
modes in order to  get mutliple hard registers.  This is not only backwards
from a design point of view, it is also undesirable because group needs are
harder to satisfy, we generally only have modes available to get a number
of registers that is a power of two, and only up to so small limit, and we
can't have more than one class of scratch register.
Thus, any radical interface change should support more than one scratch
register in addition to one or better multiple temporary registers.

We should try to avoid merging everything into one big blob, as
SECONDARY_RELOAD_CLASS does.  I think this can be accomplished
assigning the actual subtarget-specific target hook in OVERRIDE_OPTIONS,
and inside the hook using a switch statement using the reload_mode
(as a substitute for reload_in_optab / reload_out_optab).

>Then we add a corresponding target hook to emit the instructions:
>
>void TARGET_EMIT_SECONDARY_RELOAD (rtx to, rtx from, rtx *scratch)
>
>This is copying a value from FROM to TO.  The hook will only be called
>if TARGET_SECONDARY_RELOAD returned non-zero.  SCRATCH will be an
>array of the registers allocated by the reload pass.  They will have
>the classes requested by TARGET_SECONDARY_RELOAD.  This hook must
>ensure that any insns that it emits do not themselves require
>reloading.
>  
>
That means that the port writer sudenly has to write special code to make
straightforward copies from/to temporary registers.
If we go with dependency and value information as suggested above, we
can avoid this by noting when a reload depends on a single other reload -
or the original value - and the value rtxes are identical.  Then we can
emit a simple move insn.

>To me this seems a lot simpler than trying to remember what the
>register constraints mean in the reload_{in,out} patterns.  And it
>seems to give us the flexibility that you want.
>  
>
What problem are you exactly talking about here?
As a port maintainer, you need to be able to remember (or take the time to
look up every time) what the constraint letters/strings mean.  They are used
throughout the md file for insn patterns, so we can't do away with them
without completely redesigning the interface.
The problem that is specific to reload patterns is that the constraints
don't actually work when you want to tighten up the conditions when they
are used.  This is one of the problems my patch solves (except I don't 
evaluate
the constraint that goes with the predicate that is evaluated; that 
would be a
possible improvement, since it can avoid the necessity to write a specific
predicate when a constraint is available to get the desired effect.  
However,
this is more on the nice-to-have level, since writing another predicate 
is cheap,
while the current interface inflexibility leads to crooked and broken 
ports).

>
>This is probably not stage 3 material, but I don't think your patch is
>really stage 3 material either. 
>
Your proposal is definitely not stage 3 material, since you are completely
redesigning the interface.  And to make it work properly, you'll also 
have to
change some things how reloads are processed, which puts it into the 
high risk,
stage1 - only category.
My patch, on the other hand, is well localized, does not change the actual
workings of reload, and is designed to only affect ports that choose to use
the new interface capabilities.  I think that makes it safe enough for 
stage3.
The patch as posted triggers a spurious uninitialized warning on 
t_pat_classes
during bootstrap, but that could be worked around by adding an unnecessary
initialization.  (A lot of such initalizations have been added to gcc 
for this reason.
I see that the GNU coding standard has been changed to allow this practice.)

> Is there any temporary patch which
>can fix the problem at hand for the 4.1 release, perhaps at the cost
>of generating worse code?
>  
>
The problem at hand is that it is impossible to describe the reloads needed
to reload a constant into fpul and T into a floating point register.
Describing one reload correctly has the side effect of  forcing an incorrect
description for the other reload.

I could introduce and reload_insi predicate which accepts T in addition to
immediate_operand, and use that for reload_insi.  However, this forces
to use r0 as the scratch for the T -> rn -> fpul ->frm reload, thus causing
possible ICEs when r0 is in use as the return value regiser.
Worse, it will channel every secondary reload of T for any other register
to go through r0 and fpul, which not only generates worse code, it is likely
to generate unrecognizable instructions.  E.g. you can't move directly from
fpul to macl.  Thus, the problem is not solvable with the current
one-reloadinsi-alternative-fits-all approach.

 We could have a target macro
FROB_SECONDARY_RELOAD(x, reload_mode, reload_class, class, icode) ,
which, if defined, is called just after the predicate check in 
push_secondary_reload,
and inspects and/or modifies these variables at will.
The very uncleanness of this interface should be an incentive to replace 
it with
something solid in phase 1.