[rfc] subreg lowering pass / Overcoming double-set difficulties for CC re-use

Sun Jun 18 20:25:00 GMT 2006

Steven Bosscher wrote on Montag, 12. Juni 2006 10:09 :
> On 6/12/06, Björn Haase <bjoern.m.haase@web.de> wrote:
> > Maybe concerning the pass sequence, there is no real "one size fits it
> > all". With my present working version of the AVR port, I am having so
> > much different RTL after splitting so that I am convinced that re-running
> > at least CSE would be a big advantage. This will be quite different for
> > the targets originally supporting 32 bit operations. Maybe one solution
> > could be to use target defines in order to selectively enable passes that
> > could be expected to be justified only for a subset of the targets.?
>
> That would be one option, but not one many people would like.
>
> It would be much better to just make CSE faster and less heavy (I'm
> the lone Don Quixote who is still trying to make this happen).  There
> are at least four unrelated quadratic bottle-necks in cse.c and
> probably dozens silly "optimizations" that take up a lot of analysis
> time without resulting in significantly better code.
>
> Until then, you could add e.g. a post-combine machine specific pass
> for AVR that runs CSE as often as you want ;-)
>
> Gr.
> Steven
Maybe indeed CSE is not the right tool for implementing part of the 
optimizations I was looking for. Basically, I was seeking for a method for 
implementing a replacement for the NOTICE_UPDATE_CC mechanism. E.g. it would 
be nice, if gcc could identify a situation where within the sequence

(set (reg:XX loop_counter) (minus:XX  (reg:XX loop_counter) (const_int -1)))
(set (reg:CC) (compare:CC (reg:XX loop_counter) (const_int 0)))
BNE to the start of the loop block.

the explicit compare instruction could be avoided if the preceeding subXX 
operation already sets the required condition codes.

The difficulty that I am seeing right now is, that we now need to expand RTL 
that uses (clobber (reg:CC)) instead of expanding RTL that is describing the 
full condition code complexity with double-set parallels. Otherwise the RTL 
optimizers would not work. For this reason CSE and friends can never identify 
the optimization opportunity.
IIUC, also combine is not able to help since there is no data-flow link 
between the arithmetic instruction that leaves a useful condition code behind 
and the branch instruction.? (Please correct me if I'm wrong.)

The possible re-uses of unexpectedly useful second results of double-sets insn 
is, IMO, most likely only for condition codes (with the possible exception of 
div/mod patterns).

Probably one would localize most of the optimization opportunities by looking 
at what happens during the last few instructions that are located close to 
the end of basic blocks that finish with a conditional branch.

The presently existing tiny CSE pass looks at possible re-uses of the CC reg 
*after* the branch is taken or not taken. The pass I am thinking about would 
be looking in front of the branch insn. It would be identifying the last CC 
setter that is located in front of the branch (most probably a compare insn) 
and would be looking for the next insn in the same basic block that also sets 
or clobbers the CC reg.

We would then be having a set of three insn:

1) last insn setting or clobbering CC (usually arithmetic insn)
2) insn setting FLAGS:CC_XXX (compare)
3) insn using FLAGS:CC_XXX (conditional branch)

Generally insn 1) would be something like

(parallel [
  (set (reg:SI a) (minus:SI (reg:SI b) (const_int -1)))
  (clobber (reg:CC FLAGS)) ])

I think that one could grab most optimization opportunities if in case that 
insn 1) is a single-set with clobber one tries to replace isnn 1) by 
double-set insn of the type

(parallel [
  (set (reg:SI a) (minus:SI (reg:SI b) (const_int -1)))
  (set (reg:ALL_POSSIBLE_CCMODES_COMPATIBLE_WITH_CC_XXX FLAGS)
         (compare:ALL_POSSIBLE_CCMODES_COMPTIBLE_WITH_CC_XXX
              (object stored to destination of single set) 
              (const_int 0))) ])

or in the case of original subXX-patterns also with

(parallel [
  (set (reg:SI a) (minus:SI (B) (C)))
  (set (reg:ALL_POSSIBLE_CCMODES_COMPATIBLE_WITH_CC_XXX FLAGS)
         (compare:ALL_POSSIBLE_CCMODES_COMPTIBLE_WITH_CC_XXX
              (B) (C) ])

since subXX instructions are likely to compute the full condition code.

One would, thus, be trying to match a hand full of possibly existing 
double-sets patterns against the machine description. I.e. double-sets that 
are very similar to the original set-with-clobber. If the machine description 
has such double-set insn that are very similar to the 
arithmetic_with_clobber-insn of the expanded RTL, one would be looking at the 
generated condition codes and check if they are similar to the condition code 
calculated by insn 2) of the original sequence.

Maybe this would be useful for quite a number of targets. IIUC, there are some 
targets that are realizing above optimizations after reload with many many 
peephole2 patterns.

IMO, this would be a very light-weight pass and it should scale linearly with 
the code size.

Again comments would be welcome :-).

Bjoern.