[rfc] subreg lowering pass / Overcoming double-set difficulties for CC re-use

Mon Jun 19 18:33:00 GMT 2006

Björn Haase <bjoern.m.haase@web.de> writes:

> Maybe indeed CSE is not the right tool for implementing part of the 
> optimizations I was looking for. Basically, I was seeking for a method for 
> implementing a replacement for the NOTICE_UPDATE_CC mechanism. E.g. it would 
> be nice, if gcc could identify a situation where within the sequence
> 
> (set (reg:XX loop_counter) (minus:XX  (reg:XX loop_counter) (const_int -1)))
> (set (reg:CC) (compare:CC (reg:XX loop_counter) (const_int 0)))
> BNE to the start of the loop block.
> 
> the explicit compare instruction could be avoided if the preceeding subXX 
> operation already sets the required condition codes.
> 
> The difficulty that I am seeing right now is, that we now need to expand RTL 
> that uses (clobber (reg:CC)) instead of expanding RTL that is describing the 
> full condition code complexity with double-set parallels. Otherwise the RTL 
> optimizers would not work. For this reason CSE and friends can never identify 
> the optimization opportunity.
> IIUC, also combine is not able to help since there is no data-flow link 
> between the arithmetic instruction that leaves a useful condition code behind 
> and the branch instruction.? (Please correct me if I'm wrong.)
> 
> The possible re-uses of unexpectedly useful second results of double-sets insn 
> is, IMO, most likely only for condition codes (with the possible exception of 
> div/mod patterns).
> 
> Probably one would localize most of the optimization opportunities by looking 
> at what happens during the last few instructions that are located close to 
> the end of basic blocks that finish with a conditional branch.
> 
> The presently existing tiny CSE pass looks at possible re-uses of the CC reg 
> *after* the branch is taken or not taken. The pass I am thinking about would 
> be looking in front of the branch insn. It would be identifying the last CC 
> setter that is located in front of the branch (most probably a compare insn) 
> and would be looking for the next insn in the same basic block that also sets 
> or clobbers the CC reg.
> 
> We would then be having a set of three insn:
> 
> 1) last insn setting or clobbering CC (usually arithmetic insn)
> 2) insn setting FLAGS:CC_XXX (compare)
> 3) insn using FLAGS:CC_XXX (conditional branch)
> 
> Generally insn 1) would be something like
> 
> (parallel [
>   (set (reg:SI a) (minus:SI (reg:SI b) (const_int -1)))
>   (clobber (reg:CC FLAGS)) ])
> 
> I think that one could grab most optimization opportunities if in case that 
> insn 1) is a single-set with clobber one tries to replace isnn 1) by 
> double-set insn of the type
> 
> (parallel [
>   (set (reg:SI a) (minus:SI (reg:SI b) (const_int -1)))
>   (set (reg:ALL_POSSIBLE_CCMODES_COMPATIBLE_WITH_CC_XXX FLAGS)
>          (compare:ALL_POSSIBLE_CCMODES_COMPTIBLE_WITH_CC_XXX
>               (object stored to destination of single set) 
>               (const_int 0))) ])
> 
> or in the case of original subXX-patterns also with
> 
> (parallel [
>   (set (reg:SI a) (minus:SI (B) (C)))
>   (set (reg:ALL_POSSIBLE_CCMODES_COMPATIBLE_WITH_CC_XXX FLAGS)
>          (compare:ALL_POSSIBLE_CCMODES_COMPTIBLE_WITH_CC_XXX
>               (B) (C) ])
> 
> since subXX instructions are likely to compute the full condition code.
> 
> One would, thus, be trying to match a hand full of possibly existing 
> double-sets patterns against the machine description. I.e. double-sets that 
> are very similar to the original set-with-clobber. If the machine description 
> has such double-set insn that are very similar to the 
> arithmetic_with_clobber-insn of the expanded RTL, one would be looking at the 
> generated condition codes and check if they are similar to the condition code 
> calculated by insn 2) of the original sequence.
> 
> Maybe this would be useful for quite a number of targets. IIUC, there are some 
> targets that are realizing above optimizations after reload with many many 
> peephole2 patterns.
> 
> IMO, this would be a very light-weight pass and it should scale linearly with 
> the code size.
> 
> Again comments would be welcome :-).

This sounds like an interesting idea.  Rather than make the pass
clever, I think it would probably be worth implementing a version of
NOTICE_UPDATE_CC as a target hook: given an instruction which clobbers
the CC, return what the CC is really set to.  Then if it has a real
value, generate the double-SET instruction.

Obviously the MD file would have to be modified to add the double-SET
instructions.  Maybe most of the CC0 machine descriptions already have
those operations--I haven't checked.

When were you thinking of running this pass?  My initial rough guess
would be that you would get the best results after combine but before
the scheduler.

Ian