hard register reload patch
Jeffrey A Law
Tue Nov 30 19:03:00 GMT 1999
In message < Pine.LNX.firstname.lastname@example.org >you writ
> I suggest we keep the rules simple and easy to understand: the only insns
> which reference hard registers may be moves which copy them into/out of
> pseudos. Hard registers may not live across any other insns.
> To me, this appears to be what was originally intended, but it's not how
> parts of the compiler behave these days. [I have an unreviewed combiner
> patch (~ 10 months old) which tries to replace the (convoluted) logic
> dealing with hard regs and SMALL_REGISTER_CLASSES with code that just
> enforces these
I remember that patch. And I still think that particular issue is better
solved by improving the LOG_LINKS that we generate. Ultimately what we need to
do is prevent pseudos which must be assigned to the same hard register from
having overlapping lifetimes.
We can do that either via a series of hacks based on SMALL_REGISTER_CLASSES,
which is rather lame. First, SMALL_REGISTER_CLASSES as a whole is a hack.
Second, there are targets which tend to have plenty of hard registers is
most classes, but may have one or two classes with a limited number of
registers. Third, it just doesn't seem right.
Or we can attack the problem using dependency analysis, which is a hell of
a lot cleaner IMHO and works regardless of the setting of S_R_C and has
the potential to provide us the benefits of still being able to optimize
hard registers that appear in the combiner, but not muck up things in such
a way as to have multiple pseudos which have to be allocated to the same
hard register with overlapping lifetimes.
By accounting for pseudos which must be allocated to particular hard registers
during life analysis, we'll avoid creating LOG_LINKS which encourage combine
to make transformations which lead to problems.
> The SH port explicitly sets this up in some patterns. (Some of the
> following is partly guesswork, since I haven't yet seen the testcase that
> led to this patch). Take a pattern like this:
> (define_insn "fix_truncdfsi2_i"
> [(set (reg:SI 22)
> (fix:SI (match_operand:DF 0 "arith_reg_operand" "f")))
> (use (match_operand:PSI 1 "fpscr_operand" "c"))]
> "ftrc %0,fpul"
> [(set_attr "type" "dfp_conv")])
> If I recall Joerns explanation correctly, the problem is that the input
> operand may need reg 22 (fpul) for a secondary reload. IMO, the pattern
> should be changed to use a match_operand with a single register class for
> the output operand. That should enable reload to work correctly; we'd get
> two reloads for reg 22 which don't conflict.
> Joern believes that this approach would cause unacceptable loss of code
> quality, which is possible, but I haven't seen any evidence yet to support
I strongly agree with Bernd here.
This is exactly the kind of thing we should generally be avoiding in machine
descriptions because it makes correctness of the machine description highly
dependent on the internals of reload and other passes. That kind of dependency
is what leads to long term maintainability problems.
> There was a discussion about this a month or two ago (before someone tries
> to find it in the mail archives: that was on the cygnus local gcc list).
> I sent a patch against sh.md plus some analysis in which I tried to show
> that code quality is the same on average. (Note that my patch wasn't
> tested for correctness; as I said I'm lacking a testcase. It may have
> been completely bogus).
Having done similar analysis on ports with very small register sets (mn102
and mn103) I generally found it to be an overall wash from a code quality
standpoint. In some cases it was better. In other cases it was worse. In
the end they tended to cancel each other out and I choose the most maintainable
version (no explicit hard registers).
The net result is we have two machine descriptions that tend to work and
work reliably regardless of what changes inside reload -- even though they
can be more limited than in the sh in terms of register allocation choices
for common operations.
> > I'm not going to accept or reject the patch right now -- I want to get a
> > better understanding of why we need to solve this problem in reload.
> > Thus, my comments below apply if and only if we decide that solving this
> > problem in reload is the right thing to do.
> I believe that, regardless of what we decide to do about the problems
> described above, there are at least some parts of the patch which are going
> in the right direction and make sense.
Then let's extract those one by one and deal with them. I'm all for extracting
pieces that are useful now and continuing the debate about rest.
> It's a start to fix one of the remaining problems that make reload the mess
> that it is today: reload registers are allocated both in reload1.c
> (choose_reload_regs, find_reload_regs, and all the rest) and in reload.c
> (all the places that set reload_reg_rtx). The latter know about the fact
> that certain hard regs live only through parts of the insn and make
> decisions based on that knowledge, while the former ignore this issue.
> It would be nice to have only one place in reload that allocates reload
> registers. Joerns patch is a step in that direction (combine_reloads goes
How related is this to your reload cleanup patch. It sounds like the same
thing, and I've already agreed that we're going in the right direction.
Continuing down this road seems like a reasonable thing to do.
> I'd like to see all the RELOAD_FOR_xxx codes go away. I've got half-done
> patches to scan the rtl for replacements and thus build a definition/use
> chain for reloads which should enable us to schedule them precisely with
> much simpler code than reload_reg_free_xxx/reloads_conflict/etc. But I'll
> leave that discussion for another occasion 8)
Ohhh. I can't wait. I think that really hits on one of the fundamental
problems with reload -- we're trying to encode way too much information
in lifetimes and such. def-use/use-def chains would be a huge conceptual
leap in the right direction.
Ironic that it's the same basic solution as what I recommend for fixing the
S_R_C vs combine problem, or for solving the "sharing of stack slots vs
aliasing problem", and probably many others. I like def-use/use-def chains
(or their moral equivalents like ssa :-)
So, the question is, given the reload patches that have already been submitted
how do we prioritize them and how do we break up the immediately useful ideas
from Joern's patch and get them into the queue?
I'm tempted to start with your cleanup patch first simply because it does
make parts of reload less fragile and easier to understand. It also seems
like is fairly close to being ready to integrate. Probably the next one would
be Joern's code to improve live_before/live_after stuff. Then carve out the
next useful concept.
More information about the Gcc-patches