This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [rfc] multi-word subreg lowering pass
- From: Björn Haase <bjoern dot m dot haase at web dot de>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: gcc-patches at gcc dot gnu dot org, Richard Henderson <rth at redhat dot com>
- Date: Sun, 28 May 2006 10:30:08 +0200
- Subject: Re: [rfc] multi-word subreg lowering pass
- References: <Pine.LNX.4.44.0605280835280.26534-100000@www.eyesopen.com>
Hi Roger,
> Thoughts?
> I just mention this as an alternative (but possibly
> complementary) approach to explicit SUBREG lowering. Although poor
> x86 DImode arithmetic is listed in bugzilla as a regression from 2.95,
> the complexity of a solution means I've been working on other regressions
> more suitable for 4.1 and 4.2. It's also unclear whether there are
> debugging and other issues that need to be addressed first, or whether
> the improvements I've seen on microbenchmarks can be reproduced in
> 186.crafty and similar "long long" heavy applications.
I did not understand all of the details of your description, but I agree with
you that the key point is not which exact method is used for lowering the
composite-mode expression. That is, I think it does not matter whether one
generates subregs in a first step and replaces them afterwards, or whether
subregs never exist in the RTL at all and only the smashed individual word
operations appear.
Making the default expanders smarter than they are right now would certainly
be a good idea and would make it easier to target 8/16-bit machines, even
though in the case of AVR it would be better to use specially tailored
expanders in order to facilitate optimizations.
> What we gain
> in register allocation, scheduling, combine and CSE/GCSE, we also lose
> for higher level transformations where RTL loop optimizers can no longer
> figure out the number of loop iterations when using a long long index,
> for example.
That sounds like you came to similar conclusions concerning the timing of the
lowering process. Setting aside the way the lowering is done: do I understand
your mail correctly that it is probably better to delay the smashing into
individual word operations until after the initial RTL optimizers have had a
chance to see the undamaged original value?
One could possibly try to use the same "first split and then lower" method
that I plan to use for avr. In the case of the x86 (and when, e.g., using
RTH's approach) this would mean using splitting patterns that transform a
monolithic DImode operation into two subsequent SImode operations that work
on the two different subregs. Provided that no references to the monolithic
value remain after all of the operations have been split, the lowering could
generate individual operations on ordinary SImode registers, and the DImode
object would have vanished completely.
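
As a rough illustration of what such a split computes (a sketch in plain C,
not RTL and not an actual machine description): a 64-bit add is written
below as two 32-bit operations on separate halves, with the carry passed
from the low word to the high word. The type word_pair and all function
names are hypothetical and chosen only for this example; uint32_t merely
stands in for an SImode register and uint64_t for the DImode value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a DImode value after lowering:
   two independent word-sized (SImode-like) objects. */
typedef struct {
    uint32_t lo;  /* low word  */
    uint32_t hi;  /* high word */
} word_pair;

/* Take the two word-sized halves of a 64-bit value. */
static word_pair split64(uint64_t v)
{
    word_pair p;
    p.lo = (uint32_t)v;
    p.hi = (uint32_t)(v >> 32);
    return p;
}

/* Reassemble the 64-bit value; used here only to check the result. */
static uint64_t join64(word_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* The 64-bit add expressed as two 32-bit operations. Only the carry
   links the two halves; no 64-bit object is referenced any more. */
static word_pair add_lowered(word_pair a, word_pair b)
{
    word_pair r;
    r.lo = a.lo + b.lo;                /* low-word add, wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);    /* 1 if the low-word add wrapped   */
    r.hi = a.hi + b.hi + carry;        /* high-word add plus carry        */
    return r;
}

/* Self-check: the lowered form agrees with the monolithic 64-bit add. */
static void check_lowered_add(uint64_t a, uint64_t b)
{
    assert(join64(add_lowered(split64(a), split64(b))) == a + b);
}
```

Calling check_lowered_add(0xFFFFFFFF, 1) exercises the carry out of the low
word. Once every operation has this form, the two halves are ordinary
word-sized values that can be allocated and scheduled independently.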
It would not matter at all which way one uses to replace references to the
DImode value. The important things would be (IMO) that 1.) after the
lowering step only word mode objects remain and 2.) that the removal of
references to the composite object takes place before register allocation
and scheduling start.
Bjoern.