This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH]: Add CSE pass directly after expand / Add register equal notes for divmod4 for AVR
- From: Björn Haase <Bjoern dot M dot Haase at web dot de>
- To: Steven Bosscher <stevenb at suse dot de>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 17 Jun 2005 01:06:12 +0200
- Subject: Re: [PATCH]: Add CSE pass directly after expand / Add register equal notes for divmod4 for AVR
Steven Bosscher <stevenb@suse.de> wrote on 16.06.05 12:06:10:
>
> On Thursday 16 June 2005 11:45, Björn Haase wrote:
> > The point why I have posted the patch never the less is: I don't have time
> > to waste and I refuse to be working dozens of hours on an approach
> > requiring an early CSE pass in case that there never is any hope to get the
> > approach integrated into gcc because some folks favorite target does not
> > benefit much from it.
>
> Maybe those targets benefit as well, but that isn't clear from your
> post. If you are right about those notes and all that, then maybe
> such an early CSE pass is necessary. But maybe the existing CSE1
> pass should be moved, instead of adding a new pass.
I can survey only part of the complexity. For the issues I am presently working on,
the additional CSE passes at the RTL level *are* useful, because the structure at the
RTL level differs significantly from the structure at the tree level.
The most important cause of substantial differences between the tree and RTL
representations, IMO, is subreg lowering. One difficulty with subreg lowering shows up
when expanding a MINUS:DI into two parallels:
(parallel [
  (set (subreg:SI (reg:DI resultreg) 0)
       (minus:SI (subreg:SI (reg:DI inputreg1) 0)
                 (subreg:SI (reg:DI inputreg2) 0)))
  (set (CCREG:CC_borrow)
       (generate_borrow:CC_carry (subreg:SI (reg:DI inputreg1) 0)
                                 (subreg:SI (reg:DI inputreg2) 0)))])
(parallel [
  (set (subreg:SI (reg:DI resultreg) 4)
       (minus:SI (minus:SI (subreg:SI (reg:DI inputreg1) 4)
                           (subreg:SI (reg:DI inputreg2) 4))
                 (CCREG:CC_carry)))
  (set (CCREG:CC) ( ### ))])
There are two double-set instructions, each setting both a numerical result and the
condition code. The ### marks where the problem resides: in order to express correctly
what kind of condition code is generated, one needs to refer to the entire DImode register.
Replacing ### by (compare:CC (reg:DI outputreg) (const_0)) would be incorrect
(IIUC), since it would set the condition code according to the state the output
register had prior to the second parallel instruction. I.e. it would claim to
set the condition code according to a state where the lower half of the output
register is already written and the upper half is still undefined.
When replacing ### by (compare:CC (reg:DI inputreg1) (reg:DI inputreg2)), the result
would be correct. However, GCC would be forced to keep an unchanged copy of both entire
input registers (upper and lower parts) alive until the end of the second parallel.
This would prevent the input and output operands from overlapping. In other words, GCC
would then think that it needs both the original low and high parts of inputreg1 and
inputreg2 in order to implement the condition-code set of the second parallel instruction.
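The word-by-word split discussed above can be modelled in plain C. This is only an
illustrative sketch, not GCC code: the type di_pair and the function sub_di are
hypothetical names, and the low half is assumed to sit at byte offset 0 as on a
little-endian target.

```c
#include <stdint.h>

/* Hypothetical model of a DImode value held as two SImode halves,
   standing in for (subreg:SI (reg:DI x) 0) and (subreg:SI (reg:DI x) 4). */
typedef struct {
    uint32_t lo;  /* byte offset 0 (little-endian assumption) */
    uint32_t hi;  /* byte offset 4 */
} di_pair;

/* Compute a - b one word at a time.  *borrow_out models the condition
   code left behind by the second step. */
di_pair sub_di(di_pair a, di_pair b, unsigned *borrow_out)
{
    di_pair r;
    unsigned borrow;

    /* First step: low-word subtraction, generating a borrow. */
    r.lo = a.lo - b.lo;
    borrow = a.lo < b.lo;

    /* Second step: high-word subtraction, consuming the borrow. */
    r.hi = a.hi - b.hi - borrow;

    /* The final borrow needs only the high words and the incoming borrow,
       yet it equals the unsigned 64-bit comparison a < b. */
    *borrow_out = (a.hi < b.hi) || (a.hi == b.hi && borrow);
    return r;
}
```

The last assignment shows the tension: the hardware derives the final condition from the
upper halves plus the incoming borrow, while the value it represents is a property of the
whole double-word operands.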
It is easy to guarantee that after expand, resultreg, inputreg1 and inputreg2 are
independent, distinct pseudos, so there is no overlap problem initially.
The problems start later on, namely when mapping RTL to actual instructions.
The solution I am presently thinking about is to do what the expanders in optabs do: add
two "copy to itself with added register note" instructions.
I.e. 1.) replace ### by (some-unspec-operation-on-the-two-upper-inputreg-parts) and 2.) add
(set (reg:DI outputreg) (reg:DI outputreg) )__with_reg_equal_note__\
"i am equal to (minus:DI (reg:DI inputreg1) (reg:DI inputreg2) )".
(set (reg CCREG) (reg CCREG) )__with_reg_equal_note__\
"i am equal to (compare:CC_compare (reg:DI inputreg1) (reg:DI inputreg2) )".
(set (reg CCREG) (reg CCREG) )__with_reg_equal_note__\
"i am equal to (compare:CC_compare (reg:DI outputreg) (const_0) )".
at the end of the two parallels above. The additional instructions serve no purpose
except to tell CSE: "Hi CSE, in case you have to re-calculate the same thing
elsewhere: as it happens, there is an unexpected additional result available in register
CCREG. Maybe it's useful."
CSE knows how to work with this kind of information. I.e. it could remove a later compare:DI
instruction that recalculates the same condition code.
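At the source level, the kind of code that could profit looks roughly like the sketch
below. The function name saturating_sub is hypothetical and chosen only for illustration;
whether a given GCC version actually reuses the condition code here depends on the target
and on the passes discussed in this thread.

```c
#include <stdint.h>

/* Hypothetical example: the subtraction already produces the borrow
   that the following comparison asks for. */
uint64_t saturating_sub(uint64_t a, uint64_t b)
{
    uint64_t d = a - b;  /* double-word subtract; final borrow set iff a < b */

    if (a < b)           /* same operands: a separate double-word compare
                            would recompute what the borrow already says */
        return 0;
    return d;
}
```

If the condition code left by the subtraction is known to equal the comparison of a and b,
the separate double-word compare becomes redundant.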
The problem right now is twofold:
1.) The register_equiv notes are removed from the RTL prior to the
first CSE pass (at least that's what I have seen so far: all of them seem to be removed
immediately after expand and before the first jump optimizations).
2.) GCC is so smart that it recognizes that all of the "set-to-myself" instructions carrying
register_equiv notes are in fact useless. For this reason it removes them before any CSE
pass has a chance to see them (at least that's what I have seen in the cases I have
investigated so far). The first jump optimization pass already seems to remove them.
The solution I am suggesting is:
Let's run CSE once *immediately* after expand, and only then let the smart GCC passes
remove the "copy-to-itself-with-register-note" marker instructions.
> Or maybe another CSE pass is not the answer. Your first point ("the
> AVR port no longer generates two calls to the divmod**4 library
> function .. if both results (DIV and MOD) are used.") makes me wonder
> if we can not do that differently somehow, e.g. by further lowering
> at the tree level.
The difficulty with the divmod4 patterns for AVR is that the expanded RTL is too
complex for GCC. Without the reg_equiv notes, GCC is not able to find out
that it is in fact calculating the same result twice.
The background is that the divmod4 library call does not operate on the original pseudos
but on hard regs instead. GCC does not recognize that the hard regs actually
contain a copy of the original pseudos.
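A minimal source-level sketch of the situation, for illustration only: the names
divmod_result and div_and_mod are hypothetical, and this is not the test case promised
later in this mail.

```c
#include <stdint.h>

/* Hypothetical result type holding both values. */
typedef struct {
    int32_t quot;
    int32_t rem;
} divmod_result;

/* Both '/' and '%' are applied to the same operands.  On a target without
   a hardware divider, each operator may expand to a call to a library
   routine that computes quotient and remainder together, so two calls
   would do the same work twice. */
divmod_result div_and_mod(int32_t n, int32_t d)
{
    divmod_result r;
    r.quot = n / d;
    r.rem  = n % d;
    return r;
}
```

Ideally only one library call would remain, with the quotient and the remainder both taken
from its result.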
> Your patch would increase the need for CSE and libcall notes, and it
> is pretty generally agreed that those are warts that alternative
> solutions ought to be found for.
I understand that, and that's one of the reasons why I considered it important to discuss this
issue. ;-)
> > My present impression is that it is impossible to get any change into the
> > gcc mid-end tree.
>
> Bull.
I am exaggerating of course.
> > I would *not* have suggested any change in the mid-end if I had seen any
> > other possibility for addressing the issue I have described in my post
> > simply by modifications on the back-ends!!!
>
> But you haven't given a full description of the issue either (i.e. a
> test case with the desired and actual output). Maybe there are other
> ways. Or maybe not.
I'm not in front of my own machine now, but I could easily provide a test case
for the divmod4 issue. IMO, this case has some similarity with the problems one
will face in the case of CC re-use for subreg-lowered expressions. For this (second)
part I presently have only a crude hack, so it's not easy to provide a test case
right now.
> > As explained this could IMO also help to improve DImode
> > operatations on x86.
>
> And TImode on amd64, too.
And what I am most concerned about: SI mode and HI mode on AVR.
> > So, if it helps you, consider my patch to be in fact a
> > "RFC: Is there any hope to get an early CSE pass running before a
> > remove_unneccessary_notes() eliminates all of the reg_equal notes for speed
> > reasons?"
>
> I still don't understand why remove_unnecessary_notes throws away those
> notes. I have looked at CSE a lot and I've always seen it use REG_EQUAL
> notes that were there on insns. In fact at one point we showed that the
> only real benefit from fold_rtx was that it uses REG_EQUAL notes. If we
> remove all notes, as you say, then why do we add them in the first place?
In fact, I did not look into all of the details. I assume that not *all* of the
notes are removed. What I *can* say, however, is: all of the useful notes I have
seen so far prior to remove_unnecessary_notes have vanished afterwards :-).
Gr,
Björn