This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [rfc] multi-word subreg lowering pass
- From: Björn Haase <bjoern dot m dot haase at web dot de>
- To: Richard Henderson <rth at redhat dot com>
- Cc: denisc at overta dot ru, gcc-patches at gcc dot gnu dot org
- Date: Sun, 28 May 2006 05:51:42 +0200
- Subject: Re: [rfc] multi-word subreg lowering pass
- References: <200505062218.15370.bjoern.m.haase@web.de> <20050507033326.GC23300@redhat.com> <200505072051.14372.bjoern.m.haase@web.de>
Hello Richard,
Much time has passed, but your subreg lowering work has not been forgotten: I
still consider it extremely valuable for avr, and I'd like to come back to
your patch from last year.
This week I successfully revived your code against the current head while
experimenting with cc0->CCmode conversion for the AVR target.
My objective is to rewrite large portions of the back-end so that the
complexity that is now handled in the text output pass is expressed directly
in the RTL, including condition code handling and subreg lowering.
To reach at least the performance of the present back-end, however, I expect
that some support from the mid-end will be needed:
Your subreg lowering pass would be the key component of the desired new
infrastructure. In contrast to the present version, however, I found that it
would be important to do the subreg lowering fairly late, i.e. what I need is
a subreg lowering pass after combine! (I'll shortly explain why.) This causes
problems with the present implementation, since subreg lowering of course
inserts new pseudos, and this seems to be forbidden after life analysis.
Why lowering after combine?
One difficulty that I have identified is that when doing the subreg lowering
directly after expand, one misses optimization opportunities that are
extremely important for the typical bit-banging applications on small
microcontrollers. To illustrate the issue:
The AVR architecture supports, e.g., a "test_single_bit_and_branch" pattern
that by itself can be good for a code size decrease of up to 3-5% in my own
real-world applications. After expand we currently have RTL like
(set (reg:HI 42) (and:HI (reg:HI 43) (const_int 32)))
(set (cc0) (compare:HI (reg:HI 42) (const_int 0)))
(set (pc) (.....))
The present back-end has patterns that let combine transform such a
sequence into the "test_single_bit_and_branch" form. This is possible because
only three data-flow-related insns are involved.
If the above sequence is smashed into QImode pieces right after expand, my
impression is that the RTL gets so complex that combine will never be able to
identify the optimization opportunity.
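
To make the contrast concrete, here is a minimal C sketch of the two shapes.
It is only an illustration: the function names are hypothetical, and the
second function merely imitates by hand what a lowering to QImode pieces
would roughly look like; it is not output of the actual pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Word-level form: one 16-bit AND followed by a compare against zero,
   mirroring the short and/compare/branch sequence seen after expand. */
bool bit_set_word(uint16_t value, unsigned bit)
{
    assert(bit < 16);
    return (value & (uint16_t)(1u << bit)) != 0;
}

/* Hand-written imitation of a byte-lowered form (an assumption, not the
   real pass): the AND is done on each 8-bit half and the comparison has
   to look at both halves, so more operations take part in the data flow. */
bool bit_set_lowered(uint16_t value, unsigned bit)
{
    assert(bit < 16);
    uint16_t mask = (uint16_t)(1u << bit);
    uint8_t lo = (uint8_t)(value & 0xff);
    uint8_t hi = (uint8_t)(value >> 8);
    uint8_t and_lo = lo & (uint8_t)(mask & 0xff);
    uint8_t and_hi = hi & (uint8_t)(mask >> 8);
    return (and_lo | and_hi) != 0;
}
```

Both functions compute the same result, but in the second one the single-bit
test is spread over several byte operations, which is the kind of sequence a
pattern matcher would find much harder to recognize.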
Similar situations will show up when one plans to do condition-code re-use.
(Possibly one could use ugly tricks with tons of libcall and reg-equal notes.
However, IIUC the general idea has been to avoid such backyard magic in the
future by expressing all of the information in the RTL?)
The good news that I found when analyzing 4.2.x is that the first
splitting pass has moved and is now located before register allocation.
Also, the expander does not seem to have any problems if more than two
patterns are generated. This observation led me to the following ...
... plan to re-write the avr back-end (including cc0->CCmode conversion):
My present plan for the modified back-end is to
1.) Expand RTL similar to the existing one (except for the double sets and
clobbers required when using CCmode). Most importantly, one would have RTL
that still refers to SImode and HImode expressions.
2.) Use the CSE pass in order to find places where the condition code could be
re-used.
3.) Use combine for identifying possible uses of the "bit-banging" patterns
when operating on HI and SI expressions. My present impression is that there
is a good chance of finding the optimization opportunities despite the new
clobbers and double sets.
4.) Use splitting patterns after combine for smashing most of the HI and SI
mode calculations into patterns that truly correspond to avr machine
instructions. The RTL will be completely changed after this step!
(Possible exceptions to lowering to QImode are volatile memory references and
possible uses of adiw, sbiw and movw, which should be kept as HImode
operations.) Double sets would also be replaced here by single sets with
clobbers if the second result does not happen to be used.
5.) Use the subreg lowering pass in order to give the register allocator and
the optimizers more freedom.
6.) Re-run the most important RTL optimizations in order to find new
optimization opportunities in the now smashed RTL. I think one additional
CSE pass would probably already do. Possibly other passes could be helpful
as well. These passes would only be used for back-ends that use subreg
lowering.
7.) Expand special move patterns that do not clobber the condition code in
case reload needs to insert memory moves.
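
As a rough illustration of what steps 4 and 5 aim at, the following C sketch
models a 16-bit addition split into two 8-bit operations linked by a carry.
The type and function names are hypothetical, and the carry variable only
stands in for the condition code result that the low-byte operation would
set in the RTL; this is a sketch under those assumptions, not a description
of the real patterns.

```c
#include <stdint.h>

/* A 16-bit value held as two independent 8-bit words, the way a lowered
   HImode pseudo would end up as two QImode pseudos. */
struct word_pair {
    uint8_t lo;
    uint8_t hi;
};

struct word_pair split16(uint16_t v)
{
    struct word_pair p;
    p.lo = (uint8_t)(v & 0xff);
    p.hi = (uint8_t)(v >> 8);
    return p;
}

uint16_t join16(struct word_pair p)
{
    return (uint16_t)(((unsigned)p.hi << 8) | p.lo);
}

/* Lowered addition: the low-byte add yields a result and a carry (the
   "double set"), and the high-byte add consumes that carry. */
struct word_pair add16_lowered(struct word_pair a, struct word_pair b)
{
    unsigned lo_sum = (unsigned)a.lo + b.lo;
    unsigned carry = lo_sum >> 8;   /* stand-in for the carry flag */
    struct word_pair r;
    r.lo = (uint8_t)lo_sum;
    r.hi = (uint8_t)(a.hi + b.hi + carry);
    return r;
}
```

Once the halves are separate values like this, each one can be allocated and
optimized on its own, which is the extra freedom step 5 is meant to provide.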
I am confident that I have learned enough about gcc by now to be able to
realize all of the above steps myself except for step 5 (ok, possibly I might
also need a bit of help depending on what reload might spit out when it sees
the smashed RTL :-) ).
The key difficulty with 5.) that I am facing right now is that your original
patch works fine as long as I place it shortly after expand. The present
implementation, however, seems to have problems when generating new pseudos
after life analysis.
Summing up, everything above culminates in the following questions:
1.) Is there hope of getting a subreg-lowering pass integrated after combine
in 4.3.x (together with a couple of optional RTL passes between
subreg-lowering and reg allocation)?
2.) What would I need to do if I wanted to get a pass working that inserts
new pseudos after combine?
Your help would be appreciated.
Bjoern
P.S.:
In order to motivate the more skilled gcc folks to consider the above issue:
Maybe what comes out could also prove helpful for future improvements of
64-bit arithmetic on 32-bit hosts?