This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH]: Add CSE pass directly after expand / Add register equal notes for divmod4 for AVR




Steven Bosscher <stevenb@suse.de> wrote on 16.06.05 12:06:10:
> 
> On Thursday 16 June 2005 11:45, Björn Haase wrote:
> > The point why I have posted the patch never the less is: I don't have time
> > to waste and I refuse to be working dozens of hours on an approach
> > requiring an early CSE pass in case that there never is any hope to get the
> > approach integrated into gcc because some folks favorite target does not
> > benefit much from it.
> 
> Maybe those targets benefit as well, but that isn't clear from your
> post.  If you are right about those notes and all that, then maybe
> such an early CSE pass is necessary.  But maybe the existing CSE1
> pass should be moved, instead of adding a new pass.

I can survey only part of the complexity. For the things I am presently trying to 
implement, the additional CSE passes on the RTL level *are* useful, because the 
structure on the RTL level differs significantly from the structure on the 
tree level.

The most important issue causing substantial differences between the tree and
RTL representations, IMO, is subreg lowering. One difficulty with subreg 
lowering can be seen by looking at what happens when a MINUS:DI is expanded 
into two parallels:

(parallel [ 
  (set (subreg:SI (reg:DI resultreg) 0) (minus:SI (subreg:SI (reg:DI inputreg1) 0) (subreg:SI (reg:DI inputreg2) 0 )))
  (set (CCREG:CC_borrow) (generate_borrow:CC_carry (subreg:SI (reg:DI inputreg1) 0) (subreg:SI (reg:DI inputreg2) 0)))])
(parallel [ 
  (set (subreg:SI (reg:DI resultreg) 4) (minus:SI (minus:SI (subreg:SI (reg:DI inputreg1) 4) (subreg:SI (reg:DI inputreg2) 4)) (CCREG:CC_carry)))
  (set (CCREG:CC) ( ### ))])

That is, one would have two double-set instructions, each one setting both 
a numerical result and the condition code. The three # mark where the 
problem resides: in order to express correctly what kind of condition code is 
generated, one needs to refer to the entire DImode register. 
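
To make the dependence on the whole DImode value concrete, here is a minimal
C sketch that models the two parallels as two 32-bit steps. It is only an
illustration under stated assumptions: the names si_step, sub_lo, sub_hi and
sub_di are hypothetical and not part of GCC, and the borrow flag stands in
for CCREG.

```c
#include <stdint.h>

/* One SImode step: the 32-bit difference plus the borrow flag it produces.
   The borrow flag plays the role of CCREG in the RTL above. */
struct si_step {
    uint32_t diff;
    unsigned borrow;   /* 1 if the subtraction wrapped below zero */
};

/* Models the first parallel: low-word subtract that generates the borrow. */
static struct si_step sub_lo(uint32_t a, uint32_t b)
{
    struct si_step r = { a - b, a < b };
    return r;
}

/* Models the second parallel: high-word subtract that consumes the borrow. */
static struct si_step sub_hi(uint32_t a, uint32_t b, unsigned borrow_in)
{
    struct si_step r;
    r.diff = a - b - borrow_in;
    /* Compare in 64 bits so that b + borrow_in cannot wrap around. */
    r.borrow = (uint64_t)a < (uint64_t)b + borrow_in;
    return r;
}

/* Hypothetical helper: a DImode minus built from the two SImode steps.
   *final_borrow receives the flag left behind by the second step. */
static uint64_t sub_di(uint64_t x, uint64_t y, unsigned *final_borrow)
{
    struct si_step lo = sub_lo((uint32_t)x, (uint32_t)y);
    struct si_step hi = sub_hi((uint32_t)(x >> 32), (uint32_t)(y >> 32),
                               lo.borrow);
    *final_borrow = hi.borrow;
    return ((uint64_t)hi.diff << 32) | lo.diff;
}
```

The flag left by sub_hi equals the unsigned comparison x < y of the full
64-bit inputs, although sub_hi itself only sees the two upper halves and the
incoming borrow. That is the information the ### has to describe.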

Replacing ### by (compare:CC (reg:DI outputreg) (const_0) ) would be incorrect 
(IIUC) since it would set the condition code according to the state the output 
register had prior to the second parallel instruction. I.e. it would claim to 
set the condition code according to a state where the lower half of the output 
reg is already written and where the upper half is undefined.

Replacing ### by (compare:CC (reg:DI inputreg1) (reg:DI inputreg2) ) would give a correct 
result. However, GCC would then be forced to keep an unchanged copy of the entire 
inputregs (both upper and lower parts) in registers until the end of the second parallel. 
This would prevent the input and output operands from overlapping. In other words: GCC would think 
that it needs both the original low and high parts of inputreg1 and inputreg2 in order to 
implement the condition-code set of the second parallel instruction.

It is easy to guarantee that after expand, resultreg, inputreg1 and inputreg2 are independent 
and distinct pseudos, so there is no overlap problem initially. 
The problems start later, namely when mapping RTL to actual instructions.

The solution I am presently thinking about is to do what the expanders in optabs do: add 
"copy to itself with added register note" instructions. 
I.e. 1.) replace ### by (some-unspec-operation-on-the-two-upper-inputreg-parts) and 2.) add

(set (reg:DI outputreg) (reg:DI outputreg) )__with_reg_equal_note__\
"i am equal to (minus:DI (reg:DI inputreg1) (reg:DI inputreg2) )".
(set (reg CCREG) (reg CCREG) )__with_reg_equal_note__\
"i am equal to (compare:CC_compare (reg:DI inputreg1) (reg:DI inputreg2) )".
(set (reg CCREG) (reg CCREG) )__with_reg_equal_note__\
"i am equal to (compare:CC_compare (reg:DI outputreg) (const_0) )".

at the end of the two parallels above. The additional instructions serve no purpose
except telling CSE: "Hi CSE, in case you have to re-calculate the same thing
elsewhere: as it happens, there's an additional result already available in register 
CCREG. Maybe it's useful.".

CSE knows how to work with this kind of information. I.e. it could remove a later compare:DI
instruction that recalculates the same condition code.
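
As a rough picture of what "working with such information" means, here is a
toy model in C. It is emphatically not GCC's CSE implementation: the table,
the string representation of expressions and the names record_equal and
lookup_available are assumptions made up for this sketch.

```c
#include <stddef.h>
#include <string.h>

#define MAX_EQUIV 16

/* One recorded note: "reg currently holds the value of expr". */
struct equiv {
    const char *reg;
    const char *expr;
};

static struct equiv table[MAX_EQUIV];
static size_t n_equiv;

/* Record an "is equal to" note. Returns 1 on success, 0 if the table is full. */
static int record_equal(const char *reg, const char *expr)
{
    if (n_equiv == MAX_EQUIV)
        return 0;
    table[n_equiv].reg = reg;
    table[n_equiv].expr = expr;
    n_equiv++;
    return 1;
}

/* Return a register already known to hold expr, or NULL if there is none.
   A later instruction computing expr could then reuse that register. */
static const char *lookup_available(const char *expr)
{
    for (size_t i = 0; i < n_equiv; i++)
        if (strcmp(table[i].expr, expr) == 0)
            return table[i].reg;
    return NULL;
}
```

With a note recorded for CCREG, a later request for the same compare
expression finds CCREG instead of requiring a new compare. A real pass would
also have to drop entries whenever one of the registers involved is
overwritten; the toy model leaves that out.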

The problem right now is twofold: 
1.) The register_equiv notes are removed from the RTL prior to the 
first CSE pass (at least that's what I have seen so far: all of them seem to be removed 
immediately after expand and before the first jump optimizations).
2.) GCC is so smart that it recognizes that all of the set-to-itself instructions carrying the 
register_equiv notes are in fact useless. For this reason it removes them before any CSE pass 
has a chance to see them (at least that's what I have seen in the cases I have investigated 
so far). Already the first jump optimization pass seems to remove them.


The solution I am suggesting is: 

Let's run CSE once *immediately* after expand, and only then let the smart gcc passes 
remove the "copy-to-itself-with-register-note" marker instructions.

> Or maybe another CSE pass is not the answer.  Your first point ("the
> AVR port no longer  generates two calls to the divmod**4 library
> function .. if  both results (DIV and MOD) are used.") makes me wonder
> if we can not do that differently somehow, e.g. by further lowering
> at the tree level.
The difficulty with the divmod4 patterns for AVR is that the expanded RTL is too
complex for gcc. Without the added reg_equiv notes, GCC is not able to find out 
that it is in fact calculating the same result twice. 
The background is that the divmod4 library call does not operate on the original pseudos 
but on hard regs instead. GCC does not recognize that the hard regs actually 
contain a copy of the original pseudos.
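
For illustration, source of roughly the following shape is the kind of code
where both results are used. This is a hypothetical sketch, not the actual
test case from the patch; the names qr and div_and_mod are made up here.

```c
/* Hypothetical example: quotient and remainder of the same operands. */
struct qr {
    int quot;
    int rem;
};

struct qr div_and_mod(int a, int b)
{
    struct qr r;
    r.quot = a / b;   /* first use: the DIV result */
    r.rem  = a % b;   /* second use: the MOD result of the same operands */
    return r;
}
```

Compiling something like this for AVR with -O2 -S and counting the calls to
the divmod library routine in the assembly output would show whether one
call or two are generated.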

> Your patch would increase the need for CSE and libcall notes, and it
> is pretty generally agreed that those are warts that alternative
> solutions ought to be found for.
I understand that and that's one of the reasons why I considered it to be important to discuss this
issue. ;-)


> > My present impression is that it is impossible to get any change into the
> > gcc mid-end tree.
> 
> Bull.
I am exaggerating of course.

 
> > I would *not* have suggested any change in the mid-end if I had seen any
> > other possibility for addressing the issue I have described in my post
> > simply by modifications on the back-ends!!!
> 
> But you haven't given a full description of the issue either (i.e. a
> test case with the desired and actual output).  Maybe there are other
> ways.  Or maybe not.

I'm not in front of my own machine now, but I could easily provide a test case
for the divmod4 issue. IMO, this case has some similarity with the problems that one 
will face in the case of CC re-use for subreg-lowered expressions. For this (second) 
part I presently have only a crude hack, so it's not easy to provide a test case
right now.

> > As explained this could IMO also help to improve DImode
> > operatations on x86. 
> 
> And TImode on amd64, too.
And what I am most concerned about: SI mode and HI mode on AVR.

> > So, if it helps you, consider my patch to be in fact a
> > "RFC: Is there any hope to get an early CSE pass running before a
> > remove_unneccessary_notes() eliminates all of the reg_equal notes for speed
> > reasons?"
> 
> I still don't understand why remove_unnecessary_notes throws away those
> notes.  I have looked at CSE a lot and I've always seen it use REG_EQUAL
> notes that were there on insns.  In fact at one point we showed that the
> only real benefit from fold_rtx was that it uses REG_EQUAL notes.  If we
> remove all notes, as you say, then why do we add them in the first place?

I did not, in fact, look into all of the details. I assume that not *all* of the
notes are removed. What I *can* say, however, is: all of the useful notes I saw 
prior to remove_unnecessary_notes had vanished afterwards :-).

Regards,

Björn


