This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [rfc] multi-word subreg lowering pass


Hello Richard,

Much time has passed, but your subreg lowering work has not been forgotten: I
still consider it extremely valuable for avr, and I'd like to come back to
your patch dating from last year.

I successfully revived your code this week against current head while
experimenting with the cc0->CCmode conversion for the AVR target.

My objective is to rewrite large portions of the back-end so that the
complexity that is now handled in the text output pass is expressed directly
in the RTL, including condition code handling and subreg lowering.

To reach at least the performance of the present back-end, however, I expect
that some support from the mid-end will be needed:

Your subreg lowering pass would be the key component of the desired new
infrastructure. In contrast to the present version, which runs early, I found
that it would be important to do the subreg lowering fairly late. That is,
what I would need is the subreg lowering pass after combine! (I'll explain
why shortly.) This causes problems with the present implementation, since
subreg lowering necessarily inserts new pseudos, and this seems to be
forbidden after life analysis.


Why lowering after combine?

One difficulty that I have identified is that when the subreg lowering is done
directly after expand, one misses optimization opportunities that are
extremely important for the typical bit-banging applications on small
microcontrollers. To illustrate the issue:

The AVR architecture supports, e.g., a "test_single_bit_and_branch" pattern
that by itself can be good for a 3-5% code size decrease in my own
real-world applications. After expand we currently have RTL like

(set (reg:HI 42) (and:HI (reg:HI 43) (const_int 32)))
(set (cc0) (compare:HI (reg:HI 42) (const_int 0)))
(set (pc) (.....))

The present back-end has patterns that allow combine to transform such a
sequence into the "test_single_bit_and_branch" form. This is possible because
only 3 data-flow-related insns are involved.
When the above sequence is smashed into QImode pieces right after expand, my
impression is that the RTL gets so complex that we will never be able to
identify the optimization opportunity.
Similar situations will show up when one plans to do condition-code re-use.
(Possibly one could use ugly tricks with tons of libcall and reg-equal notes.
However, IIUC the general idea has been to avoid such backyard magic in the
future by trying to express all of the information in the RTL?)
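
For a concrete picture: the AND/compare/branch sequence above typically comes
from source like the first function below. The second function is a
hand-written sketch of what the same test looks like once it is broken into
QImode-sized pieces. Both function names are made up for illustration and are
not part of the avr back-end, and the byte split only approximates what a
lowering pass would do at the RTL level.

```c
#include <stdint.h>

/* Hypothetical example: test bit 5 of a 16-bit (HImode-sized) value.
   This is the shape combine can match as a single bit-test-and-branch. */
int test_bit5_hi(uint16_t x)
{
    if (x & 32)
        return 1;
    return 0;
}

/* The same test split by hand into two 8-bit (QImode-sized) halves,
   as an approximation of the RTL after early lowering. */
int test_bit5_qi(uint16_t x)
{
    uint8_t lo = (uint8_t)(x & 0xff);   /* low byte of the operand  */
    uint8_t hi = (uint8_t)(x >> 8);     /* high byte of the operand */
    uint8_t r_lo = lo & 32;             /* low byte of constant 32 is 0x20 */
    uint8_t r_hi = hi & 0;              /* high byte of constant 32 is 0   */

    /* The compare against zero now has to look at both halves. */
    return (r_lo | r_hi) != 0;
}
```

In the split form the single-bit test is still there, but it is spread over
more operations, which is roughly why it becomes harder to recognize.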


The good news that I found when analyzing 4.2.x is that the first splitting
pass has moved and is now located before register allocation takes place.
Also, the expander does not seem to have any problems if more than two
patterns are generated. This observation led me to the following ...

... plan to re-write the avr back-end (including cc0->CCmode conversion):

My present plan for the modified back-end is to
1.) Expand RTL similar to the existing one (except for the double sets and
clobbers required when using CCmode). Most importantly, one would have RTL
that still refers to SImode and HImode expressions.
2.) Use the CSE pass in order to find places where the condition code could be 
re-used.
3.) Use combine for identifying possible uses of the "bit-banging" patterns
when operating on HI and SI expressions. My present impression is that there
is a good chance of finding the optimization opportunities despite the new
clobbers and double sets.
4.) Use splitting patterns after combine for smashing most of the HI and SI
mode calculations into patterns that truly correspond to avr machine
instructions. The RTL will be completely different after this step!
(Possible exceptions to lowering to QImode are volatile memory references and
possible uses of adiw, sbiw and movw, which should be kept as HImode
operations.) Double sets would also be replaced here by single sets with
clobbers if the second result happens not to be used.
5.) Use the subreg lowering pass in order to give the register allocator and 
the optimizers more freedom.
6.) Re-run the most important RTL optimizations in order to find new 
optimization opportunities in the now smashed RTL. I think that probably one 
additional CSE pass already would do. Possibly other passes could be helpful 
as well. These passes would only be used for back-ends that use subreg 
lowering.
7.) Expand special move patterns that do not clobber the condition code in
case reload needs to insert memory moves.


I am confident that I have learned enough about gcc by now to realize all of
the above steps myself except for step 5 (ok, possibly I might also need a
bit of help depending on what reload spits out when it sees the smashed
RTL :-) ).

The key difficulty with 5.) that I am facing right now is that your original
patch works fine only as long as I place it shortly after expand. The present
implementation, however, seems to have problems when generating new pseudos
after life analysis.

Summing up, everything above boils down to the following questions:

1.) Is there hope of getting a subreg-lowering pass integrated after combine
in 4.3.x (together with a couple of optional RTL passes between
subreg-lowering and register allocation)?
2.) What would I need to do if I wanted to get a pass working that inserts
new pseudos after combine?

Your help would be appreciated.

Bjoern

P.S.: 
In order to motivate the more skilled gcc folks to consider the above issue:
maybe what comes out of this could also prove helpful for future improvements
of 64-bit arithmetic on 32-bit hosts?
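
The 64-bit case is analogous to the HImode-on-QImode situation on avr. As a
rough sketch only (the type and function names below are hypothetical and not
taken from gcc), a 64-bit addition on a 32-bit host decomposes into two
word-sized additions linked by a carry:

```c
#include <stdint.h>

/* Hypothetical representation of a 64-bit value as two 32-bit words. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Add two 64-bit values word by word, propagating the carry by hand. */
u64_pair add64_split(u64_pair a, u64_pair b)
{
    u64_pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;          /* low word, wraps modulo 2^32 */
    carry = (r.lo < a.lo);       /* wrap-around means a carry out */
    r.hi = a.hi + b.hi + carry;  /* high word plus carry in */
    return r;
}
```

Once the operation is expressed on individual words, each half can be
allocated and optimized separately, which is the freedom a lowering pass is
meant to provide.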

