This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [rfc] subreg lowering pass / Overcoming double-set difficulties for CC re-use
- From: BjÃrn Haase <bjoern dot m dot haase at web dot de>
- To: denisc at overta dot ru
- Cc: Roger Sayle <roger at eyesopen dot com>, gcc-patches at gcc dot gnu dot org, Richard Henderson <rth at redhat dot com>
- Date: Sun, 11 Jun 2006 13:01:53 +0200
- Subject: Re: [rfc] subreg lowering pass / Overcoming double-set difficulties for CC re-use
- References: <Pine.LNX.4.44.0605280835280.26534-100000@www.eyesopen.com> <Pine.LNX.4.58.0606021807170.4311@wotan.suse.de> <m3d5dq27ae.fsf@gossamer.airs.com>
Hi,
I have spend some time on studying the subreg lowering issue and cc0->CCmode
conversion for the AVR target. I'd like to summarize my conclusions
concerning subreg lowering and CCmode conversion in this mail.
Test results that draw me to the conclusion that lowering directly after
expand is not a good idea. In the second part, I'd like to suggest a method
for overcoming part of the double-set problems that seem to generate
difficulties with re-using condition code information.
A) Why I think subreg lowering is best delayed until after combine:
1.)
First thing is: Subreg lowering and cc0->CCmode conversion are kind of linked
since in order to properly describe lowered "add/add with carry" sequences,
it is necessary to properly describe the side effects on the carry flags in
the RTL. The result is, that when doing the lowering directly after expand,
we would have lots of double-set parallels in the RTL, e.g. for addsi or
subsi. My experiments have shown, that the presence of such double-sets
prevents almost any optimization. I first did not understand why, e.g. i386
expands RTL with lots of clobbers instead of truely describing the CC side
effects. Now I think, I know why: E.g. in a sequence like
(set (reg:HI 42) (const_int 12))
(parallel[
(set (reg:HI 43) (plus:HI (reg:HI 43) (reg:HI 42)))
(clobber (reg:CC 46))])
CSE will be able to merge the const int as second parameter into the addhi
instruction to yield
(parallel[
(set (reg:HI 43) (plus:HI (reg:HI 43) (const_int 12)))
(clobber (reg:CC 46))])
(given that the predicates allow for immediates and registers) while it will
not be able to do that for the corresponding double-set
(set (reg:HI 42) (const_int 12))
(parallel[
(set (reg:HI 43) (plus:HI (reg:HI 43) (reg:HI 42)))
(set (reg:CC_NCZ 46) (compare:CC_NCZ (plus:HI (reg:HI 43) (reg:HI 42))
(const_int 0)))])
sequence. So my conclusion is: As long as the RTL passes are not able to do
much useful with double-sets, we need to avoid them in the early RTL passes.
This implies the conclusion that any lowering that generates double-sets at
expand prevents optimizations.
2.)
Since, e.g. for AVR, even Pmode objects need to be lowered to two QImode
expressions, lowering directly after expand would prevent all optimizations
of the addressing modes. Since, IIUC, choosing the best addressing modes for
pointers is largely done by combine, we need to do the lowering after
combine.
This draws me to the conclusion that it is best to expand RTL that 1) refers
to the unsplitted original modes and 2) uses clobbers everywhere instead of a
clean description of the side effects on the CC registers.
Splitters could then be used after combine to smash all the references to the
full-mode entities into the smallest mode that could properly be handled by
the machine instructions.
In order to give the register allocator more freedom to choose the right
register and more specifically to remove the requirement for consecutive
registers, one would then use RTH's method for smashing the subregs and
replacing them by isolated word-size entities. Only for the parameters and
return values of function calls and possibly for inline-asm statements the
references to the entire original modes would remain.
B) Possible method for the double-set issue
I have spend quite some time on writing explicit splitting patterns that do
such a lowering. One (IMO) interresting result is, that one could make use of
the existing splitting algorithms in order to implement kind of a
RTL-transformation pass. It is, e.g. readily possible to implement
"splitting" patterns that look for patterns like
(parallel[
(set (reg:HI 43) (plus:HI (reg:HI 43) (const_int 12)))
(clobber (reg:CC 46))])
and "split" them into the corresponding pattern with the full information on
the condition code
(set (reg:HI 42) (const_int 12))
(parallel[
(set (reg:HI 43) (plus:HI (reg:HI 43) (reg:HI 42)))
(set (reg:CC_NCZ 46) (compare:CC_NCZ (plus:HI (reg:HI 43) (reg:HI 42))
(const_int 0)))])
. This transformation is possible back-and forth. While converting clobbers
into true double-set instructions is always possible, one only would need to
check for reg_unused_after (cc_register_regno) before converting a double-set
into a clobber instruction, thus discarding the knowledge on the CC side
effect. Which direction to use for the transformation could be chosen by
appropriate conditions in the splitting patterns.
I think that by using a pass sequence like
split_pass_that_converts_clobbers_to_proper_double_sets
CSE_pass
split_pass_that_converts_proper_double_sets_back_to_clobbers
one could pragmatically get the best of the two worlds. I.e. one would 1.)
expand RTL with clobbers, 2.) perform optimizations on RTL with clobbers, 3.)
switch on the true information on the codition code by splitting patterns for
a short time period, 4.) search and eliminate common sub-expressions
concerning the condition codes within the RTL that has the full complexity of
condition codes exposed, 5.) simplify the optimizer's life by again replacing
the double-sets by appropriate clobbers everywhere where the CC result of the
double-set instruction is not used.
Putting all together, one would be replacing the present pass sequence
...
NEXT_PASS (pass_combine);
NEXT_PASS (pass_if_after_combine);
NEXT_PASS (pass_partition_blocks);
NEXT_PASS (pass_regmove);
NEXT_PASS (pass_split_all_insns);
...
by
...
NEXT_PASS (pass_combine);
NEXT_PASS (pass_if_after_combine);
NEXT_PASS (pass_partition_blocks);
NEXT_PASS (pass_regmove);
NEXT_PASS (pass_use_split_for_exposing_CC_complexity);
NEXT_PASS (pass_cse3);
NEXT_PASS (pass_use_split_for_again_hiding_CC_complexity_and_possibly
smashing_subregs);
if (target_uses_subreg_lowering)
{
NEXT_PASS (pass_subreg_lowering_according_to_rth);
NEXT_PASS (pass_cse4);
}
...
The sequence
NEXT_PASS (pass_use_split_for_exposing_CC_complexity);
NEXT_PASS (pass_cse3);
NEXT_PASS (pass_use_split_for_again_hiding_CC_complexity_and_possibly
could then be considered to be a single "try to re-use pre-computed condition
codes" pass.
I agree that possibly the machine description would lack beauty due to the
presence of a lot of "splitting" patterns for the purpose of target-specific
RTL manipulations. And I don't know how much time the RTL processing by the
splitters may actually require. The advantage (IMO), would however be that
concerning the overcoming of part of the double-set problems one would not
need to implement any new concept. All the necessary passes are already
there.
Bjoern.