This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [rfc] multi-word subreg lowering pass
- From: BjÃrn Haase <bjoern dot m dot haase at web dot de>
- To: Richard Henderson <rth at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Wed, 18 May 2005 20:33:25 +0200
- Subject: Re: [rfc] multi-word subreg lowering pass
- References: <200505062218.15370.bjoern.m.haase@web.de> <20050507033326.GC23300@redhat.com>
Hello Richard,
I have had a close look at your patch during the last week. My overall opinion
expressed in two words is: *Extremely* helpful!
I come to the conclusion, however, that your patch could expose it's whole
power only when being supplemented with a small portion of additional
supporter infrastructure.
1.) Origin of the (sometimes tremendous) improvements, I have seen:
The main benefit that I have seen when comparing your method with splitters
after reload is, that the additional freedom the register allocator now has
could avoid a lot of register->register copying. E.g. with your patch it is
for the first time that operations like
uint8_t a;
uint16_t b,c;
c = a | (b<<8);
result in absolutely *no* code. For the avr target, such expressions used to
be extremely inefficient.
Also, generally all of the mixed-mode expressions benefit largely. E.g.
expressions like (add:DI (zero_extend:DI (reg:SI ) (reg:DI) ).
> I'd also planned to restructure this so that it uses recog_data to
> replace values in real operands, rather than replacing subregs
> wherever they may be found.
This could be helpful. But already at it's present stage, I am seeing a lot of
improvement.
2.) Observed difficulties:
> I hadn't posted this yet because I've yet to show real measurable
> improvements in long long arithmetic on x86. I suspect that I
> won't be able to show this until we have a new register allocator.
I think one origin of the present absence of sizeable improvements for the x86
might be related to CCmode re-use. When using splitters after reload, there
is no difficulty to communicate to gcc that the CC contains useful data. This
way gcc is able to optimize away quite a number of comparison instructions.
When expanding all of the patterns at RTL generation, however, I presently do
not see how to do that without new infrastructure. Main difficulty is, IMO,
that in order to express a comparison pattern like compare:DI (reg:DI)
(reg:DI) one needs to refer to all of the four involved SImode subregs. When
thinking about minus operations, however, the individual instructions of the
"lowered" expanded patterns refer either to the two most significant or the
least significant subregs. To teach the mid end that at the end of a sequence
like
sub rev1,reg3,reg5
sub_carry reg2,reg4,reg6
one ends up with exactly the same condition code as for
cmp reg3,reg5
cmp_carry reg4,reg6
would, IMO, be very difficult. I think that re-using condition codes is the
main present weakness.
3.) How, IMO, CC re-use might be implemented without much work
IIUC, the CC re-use issue would be resolved when finding a way to tell the CSE
and GCSE passes that after an expanded insn sequence the CC register holds a
value corresponding to the result of a standardized comparison instruction.
If G/CSE could identify that the result of a comparison instruction has been
previously calculated, it could optimize away the re-calculation of the
common expression.
This could, IMO, be realized by inserting "announcement" instructions. I.e.
instructions that are generated at expand and that are carrying a special
flag that causes them to be removed after the initial G/CSE passes. I.e. one
would require a tiny additional pass to be run after G/CSE that replaces
these instructions by note instructions before register allocation. In order
to describe more clearly, what I am thinking about: Looking at the example
long long x;
long long y;
long long d;
d = (x - y); // operation 1
if (x < y) // operation 2
{
I'd suggest to expand this into a sequence like (pseudocode)
; RTX for operation 1 starts here
(set reg:SI 100) (subreg:SI (reg:DI x) 0)
(set reg:SI 101) (subreg:SI (reg:DI x) 4)
(set reg:SI 102) (subreg:SI (reg:DI y) 0)
(set reg:SI 103) (subreg:SI (reg:DI y) 4)
(parallel (
(set reg:SI 104) (minus:SI (reg:SI 100) (reg:SI 102)))
(set CC:CC_carry) ( ... rtx for carry bit ...))
(parallel (
(set reg:SI 105) ( minus:SI (CC:SI) (minus:SI (reg:SI 100) (reg:SI 102))))
(set CC:CC_comp) (unspec [(reg:SI 100) (reg:SI 102) (set CC:CC_carry)] cp)))
; "announce" insn that is removed without replacement after GCSE
(parallel (
(set CC:CC_comparison) (compare:DI (reg:DI x) (reg:DI y))
(use CC)
(note ("to be removed after gcse") ))
(set (subreg:SI (reg:DI x) 0) (reg:SI 104))
(set (subreg:SI (reg:DI x) 4) (reg:SI 105))
; RTX for operation 2 starts here
(set reg:SI 106) (subreg:SI (reg:DI x) 0)
(set reg:SI 107) (subreg:SI (reg:DI x) 4)
(set reg:SI 108) (subreg:SI (reg:DI y) 0)
(set reg:SI 109) (subreg:SI (reg:DI y) 4)
(set CC:CC_comp) (compare:CC_comparison (reg:SI 106) (reg:SI 108))
(set CC:CC_comp) (unspec [ (reg:SI 106) (reg:SI 108) (CC:CC_comp)] cpc)
; Identical "announce" insn that is removed without replacement after GCSE
(parallel (
(set CC:CC_comp) (compare:DI (reg:DI x) (reg:DI y))
(use CC)
(note ("to be removed after gcse") ))
I have seen that this kind of approach indeed can work, by studying the
example of a (admittedly sub-optimal) divmodsi4 pattern for the AVR target.
(There I admit are other ways for resolving the missed optimization issue).
The optimizers indeed are able to identify by using "announcement"
instructions that common expressions do not need to be re-calculated. Also
the restrictions for "double-set" instruction patterns would not apply.
For implementing such a new class of announcement instructions, one only would
need to implement a pass that replaces them by simple note insns after GCSE.
Yours,
BjÃrn
P.S.:
I think that "announcement instructions" could be useful as well in other
circumstances. E.g.
- for communicating that a built-in library function for divmodsi4 had
calculated as an unexpected by-product the absolute values of the input
operands that remain in some registers.
- for communicating that when returning from some built-in function some
register has a known constant value, e.g. 0.0, -1.0.