This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [rfc] subreg lowering pass / Overcoming double-set difficulties for CC re-use


Hi,

I have spend some time on studying the subreg lowering issue and cc0->CCmode 
conversion for the AVR target. I'd like to summarize my conclusions 
concerning subreg lowering and CCmode conversion in this mail.

Test results that draw me to the conclusion that lowering directly after 
expand is not a good idea. In the second part, I'd like to suggest a method 
for overcoming part of the double-set problems that seem to generate 
difficulties with re-using condition code information.

A) Why I think subreg lowering is best delayed until after combine:

1.)
First thing is: Subreg lowering and cc0->CCmode conversion are kind of linked 
since in order to properly describe lowered "add/add with carry" sequences, 
it is necessary to properly describe the side effects on the carry flags in 
the RTL. The result is, that when doing the lowering directly after expand, 
we would have lots of double-set parallels in the RTL, e.g. for addsi or 
subsi. My experiments have shown, that the presence of such double-sets 
prevents almost any optimization. I first did not understand why, e.g. i386 
expands RTL with lots of clobbers instead of truely describing the CC side 
effects. Now I think, I know why: E.g. in a sequence like

(set (reg:HI 42) (const_int 12))
(parallel[
   (set (reg:HI 43) (plus:HI (reg:HI 43) (reg:HI 42)))
   (clobber (reg:CC 46))])

CSE will be able to merge the const int as second parameter into the addhi 
instruction to yield

(parallel[
   (set (reg:HI 43) (plus:HI (reg:HI 43) (const_int 12)))
   (clobber (reg:CC 46))])

(given that the predicates allow for immediates and registers) while it will 
not be able to do that for the corresponding double-set

(set (reg:HI 42) (const_int 12))
(parallel[
   (set (reg:HI 43) (plus:HI (reg:HI 43) (reg:HI 42)))
   (set (reg:CC_NCZ 46) (compare:CC_NCZ (plus:HI (reg:HI 43) (reg:HI 42))
                                                          (const_int 0)))])

sequence. So my conclusion is: As long as the RTL passes are not able to do 
much useful with double-sets, we need to avoid them in the early RTL passes. 
This implies the conclusion that any lowering that generates double-sets at 
expand prevents optimizations.

2.)
Since, e.g. for AVR, even Pmode objects need to be lowered to two QImode 
expressions, lowering directly after expand would prevent all optimizations 
of the addressing modes. Since, IIUC, choosing the best addressing modes for 
pointers is largely done by combine, we need to do the lowering after 
combine.


This draws me to the conclusion that it is best to expand RTL that 1) refers 
to the unsplitted original modes and 2) uses clobbers everywhere instead of a 
clean description of the side effects on the CC registers.

Splitters could then be used after combine to smash all the references to the 
full-mode entities into the smallest mode that could properly be handled by 
the machine instructions.
In order to give the register allocator more freedom to choose the right 
register and more specifically to remove the requirement for consecutive 
registers, one would then use RTH's method for smashing the subregs and 
replacing them by isolated word-size entities. Only for the parameters and 
return values of function calls and possibly for inline-asm statements the 
references to the entire original modes would remain.


B) Possible method for the double-set issue

I have spend quite some time on writing explicit splitting patterns that do 
such a lowering. One (IMO) interresting result is, that one could make use of 
the existing splitting algorithms in order to implement kind of a 
RTL-transformation pass. It is, e.g. readily possible to implement 
"splitting" patterns that look for patterns like

(parallel[
   (set (reg:HI 43) (plus:HI (reg:HI 43) (const_int 12)))
   (clobber (reg:CC 46))])

and "split" them into the corresponding pattern with the full information on 
the condition code

(set (reg:HI 42) (const_int 12))
(parallel[
   (set (reg:HI 43) (plus:HI (reg:HI 43) (reg:HI 42)))
   (set (reg:CC_NCZ 46) (compare:CC_NCZ (plus:HI (reg:HI 43) (reg:HI 42))
                                                          (const_int 0)))])

. This transformation is possible back-and forth. While converting clobbers 
into true double-set instructions is always possible, one only would need to 
check for reg_unused_after (cc_register_regno) before converting a double-set 
into a clobber instruction, thus discarding the knowledge on the CC side 
effect. Which direction to use for the transformation could be chosen by 
appropriate conditions in the splitting patterns.

I think that by using a pass sequence like

split_pass_that_converts_clobbers_to_proper_double_sets
CSE_pass
split_pass_that_converts_proper_double_sets_back_to_clobbers

one could pragmatically get the best of the two worlds. I.e. one would 1.) 
expand RTL with clobbers, 2.) perform optimizations on RTL with clobbers, 3.) 
switch on the true information on the codition code by splitting patterns for 
a short time period, 4.) search and eliminate common sub-expressions 
concerning the condition codes within the RTL that has the full complexity of 
condition codes exposed, 5.) simplify the optimizer's life by again replacing 
the double-sets by appropriate clobbers everywhere where the CC result of the 
double-set instruction is not used.

Putting all together, one would be replacing the present pass sequence

  ...
  NEXT_PASS (pass_combine);
  NEXT_PASS (pass_if_after_combine);
  NEXT_PASS (pass_partition_blocks);
  NEXT_PASS (pass_regmove);
  NEXT_PASS (pass_split_all_insns);
  ...

by 

  ...
  NEXT_PASS (pass_combine);
  NEXT_PASS (pass_if_after_combine);
  NEXT_PASS (pass_partition_blocks);
  NEXT_PASS (pass_regmove);
  NEXT_PASS (pass_use_split_for_exposing_CC_complexity);
  NEXT_PASS (pass_cse3);
  NEXT_PASS (pass_use_split_for_again_hiding_CC_complexity_and_possibly 
smashing_subregs);
  if (target_uses_subreg_lowering)
    {
      NEXT_PASS (pass_subreg_lowering_according_to_rth);
      NEXT_PASS (pass_cse4);
    }
   ...

The sequence

  NEXT_PASS (pass_use_split_for_exposing_CC_complexity);
  NEXT_PASS (pass_cse3);
  NEXT_PASS (pass_use_split_for_again_hiding_CC_complexity_and_possibly 

could then be considered to be a single "try to re-use pre-computed condition 
codes" pass.

I agree that possibly the machine description would lack beauty due to the 
presence of a lot of "splitting" patterns for the purpose of target-specific 
RTL manipulations. And I don't know how much time the RTL processing by the 
splitters may actually require. The advantage (IMO), would however be that 
concerning the overcoming of part of the double-set problems one would not 
need to implement any new concept. All the necessary passes are already 
there.

Bjoern.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]