This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Register Rematerialization In New Register Allocator
- From: Denis Chertykov <denisc at overta dot ru>
- To: "Mukta Punjani, Noida" <muktap at noida dot hcltech dot com>
- Cc: gcc at gcc dot gnu dot org, Michael Matz <matz at suse dot de>
- Date: 30 Jun 2003 20:38:27 +0400
- Subject: Re: Register Rematerialization In New Register Allocator
- References: <E04CF3F88ACBD5119EFE00508BBB21210A3BF6FF@exch-01.noida.hcltech.com>
"Mukta Punjani, Noida" <muktap@noida.hcltech.com> writes:
> Hi,
>
> I would like to discuss the register remat(rematerialization)
> in new register allocator branch.
>
> During register allocation, when a value cannot be kept in a register and
> needs to be spilled, the register allocator should recognize when it is
> cheaper to recompute the value (i.e. to remat it), rather than to store and
> reload it.
> An example of this can be
> o Calculate Frame Pointer(FP) + offset ->store to say r150
> o Spill/restore r150 to/from stack even though it would be cheaper to
> clobber r150 and regenerate it from scratch.
>
> Remat can score over spilling (in terms of aggregate cost of remat/spill
> instructions generated) in a number of cases such as frame pointer/static
> data calculation related spills and also other trivial calculations from
> ever live values in a function.
>
> The new register allocator seem to remat only constant loads. I plan to
> enhance the remat infrastructure as follows -
>
> o Collect remat information during the data flow pass itself (to propagate
> live range information) - (in df.c)
> o The remat v/s spill cost criteria can't be directly applied in the current
> framework (as no hard stack slots are assigned during spilling, so can't
> determine the spill cost). A workaround for remat/spill criteria might be to
> find sure shot cases of better remat/spill and take spill decision
> accordingly.
>
> Example of such a case (sure remat case) could be spilling of a def before
> its first use. Here, spill can be avoided by moving the def being spilled
> closer to its use. Other such cases can be reasoned out and treated
> likewise.
>
> Thoughts and comments awaited.
Few months ago I have a long discussion with Michael Matz about
rematerialization and related problems.
Discussion (mail by mail, <Gazovik it's me>):
From: Gazovik <gazovik@overta.ru>
Subject: Example of delayed remat
To: matz@suse.de
CC: denisc@overta.ru
Date: Wed, 22 Jan 2003 17:08:04 +0300
Reply-To: Gazovik <gazovik@overta.ru>
Organization: Gazovik
Hi Michael !
This is an example of delayed rematerialization.
I forgot to include diff against ra.c and ra.h to my
delayed rematerialization patch which is in gcc-patches.
Files description:
SparseCompRow.c - part of scimark2;
delayed-remat-patch - corrected patch for delayed remat;
SparseCompRow.c.23.lreg-delayed - compiled by gcc with applied
delayed-remat-patch;
SparseCompRow.s-delayed - likewise;
simple-remat-patch - patch with rematerialization of
arguments passed in stack;
SparseCompRow.c.23.lreg - compiled by gcc with simple-remat-patch;
SparseCompRow.s - likewise.
Interested things happened in second pass of allocation.
The "delayed remat" win in loop with maximal depth (.L15).
IMHO: It's a cost related thing.
Denis.
PS: Please answer to my home mail <denisc@overta.ru>
[2. application/octet-stream; example.tar.gz]...
From: Michael Matz <matz@suse.de>
Subject: Re: Example of delayed remat
To: <denisc@overta.ru>
Date: Wed, 22 Jan 2003 17:03:24 +0100 (CET)
Hi Denis,
On Wed, 22 Jan 2003, Gazovik wrote:
> This is an example of delayed rematerialization.
Thank you.
> SparseCompRow.c - part of scimark2;
> delayed-remat-patch - corrected patch for delayed remat;
> SparseCompRow.c.23.lreg-delayed - compiled by gcc with applied
> delayed-remat-patch;
> SparseCompRow.s-delayed - likewise;
>
> simple-remat-patch - patch with rematerialization of
> arguments passed in stack;
> SparseCompRow.c.23.lreg - compiled by gcc with simple-remat-patch;
> SparseCompRow.s - likewise.
>
> Interested things happened in second pass of allocation.
> The "delayed remat" win in loop with maximal depth (.L15).
Yes, but I'm not convinced that this is really the result of delayed
remat. The code already starts with selecting a different alternative for
one of the div instructions. Looking at the dumps I can't decide if
this is pure luck (in that a completely different order of
coloring the webs was chosen, and that subsequently lead to better code in
the inner loop), or if it's systematic. Have you also tested with _just_
delayed remat switched off and on (because your patch also included other
changes)?
I would like to understand _why_ delayed remat is better here. After all
you emit more copy insns during spilling, and by that complicate the graph
(can also be seen, because we now need one more round of coloring). With
sooner remat we leave fewer nodes to color later. Hmm, that brings me to
an idea: would it be too much trouble to introduce a flag ala
'flag_ra_delay_remat" which we can switch on and off, maybe based on
optimization level?
Having said all that, I guess, delayed remat really isn't that wrong, so
committing it is probably OK. What still makes me more nervous than
delayed remat is the sooner assignments of stack slots to spill pseudos,
which got no real color.
Ciao,
Michael.
From: Denis Chertykov <denisc@overta.ru>
Subject: Re: Example of delayed remat
To: Michael Matz <matz@suse.de>
Cc: <denisc@overta.ru>
Date: 23 Jan 2003 08:31:24 +0300
Michael Matz <matz@suse.de> writes:
> Hi Denis,
>
> On Wed, 22 Jan 2003, Gazovik wrote:
>
> > This is an example of delayed rematerialization.
>
> Thank you.
>
> > SparseCompRow.c - part of scimark2;
> > delayed-remat-patch - corrected patch for delayed remat;
> > SparseCompRow.c.23.lreg-delayed - compiled by gcc with applied
> > delayed-remat-patch;
> > SparseCompRow.s-delayed - likewise;
> >
> > simple-remat-patch - patch with rematerialization of
> > arguments passed in stack;
> > SparseCompRow.c.23.lreg - compiled by gcc with simple-remat-patch;
> > SparseCompRow.s - likewise.
> >
> > Interested things happened in second pass of allocation.
> > The "delayed remat" win in loop with maximal depth (.L15).
>
> Yes, but I'm not convinced that this is really the result of delayed
> remat. The code already starts with selecting a different alternative for
> one of the div instructions. Looking at the dumps I can't decide if
> this is pure luck
IMHO it's a pure luck.
I will try to explain why delayed remat better than simple in next
mail. (Which will sended from "Gazovik").
> (in that a completely different order of coloring the webs was
> chosen, and that subsequently lead to better code in the inner
> loop), or if it's systematic. Have you also tested with _just_
> delayed remat switched off and on (because your patch also included
> other changes)?
No, but pretty all examples win with delayed remat.
>
>
> I would like to understand _why_ delayed remat is better here.
> After all you emit more copy insns during spilling, and by that
> complicate the graph (can also be seen, because we now need one more
> round of coloring). With sooner remat we leave fewer nodes to color
> later. Hmm, that brings me to an idea: would it be too much trouble
> to introduce a flag ala 'flag_ra_delay_remat" which we can switch on
> and off, maybe based on optimization level?
I don't think so because pretty all examples with delayed remat wins.
>
> Having said all that, I guess, delayed remat really isn't that
> wrong, so committing it is probably OK. What still makes me more
> nervous than delayed remat is the sooner assignments of stack slots
> to spill pseudos, which got no real color.
Just wait a few hours for my next mail.
Denis.
From: Gazovik <gazovik@overta.ru>
Subject: Example of delayed remat
To: matz@suse.de
CC: denisc@overta.ru
Date: Thu, 23 Jan 2003 16:32:35 +0300
Reply-To: Gazovik <gazovik@overta.ru>
Organization: Gazovik
Hi Michael !
I will try to rearrange my thoughts about delayed remat.
Abstract:
1. Originally new allocator has only IR spilling;
2. After that you have added web splitting based on IR spilling. I wrote about
generation spill pseudos instead of stack slots. I call it IR splitting.
So, now we play with allocator with IR splitting + IR spilling.
OK ?
The allocation process (simplified):
1. Try to colorize webs;
2. Choose webs for spilling; (Webs selection based on coloring algorithm
not on web cost, but on web degree.
It's symplify not select_spill.)
3. Choose webs with lowest cost for spilling; (It's select_spill)
4. Recolor spills;
5. Actual spilling or stack slots allocation.
Actual spilling is an IR splitting not a spilling.
Stack slots allocation is an IR spilling on top of IR splitting.
Stack slots allocation is assign_stack_slots ().
It's my view on colorize - spilling phases.
So, why the delayed remat is a win in current allocator implementation ?
My answer: Because some (or many) spill pseudos generated in first or
second pass of allocation will be colored and some (or many) splitted webs
will be colored too. They will be colored by real colors not by
an_unusable_color.
If you use simple remat then you exclude webs with possible remat from the
colorization process, but with delayed remat such webs may got colors.
You can look to the SparseCompRow.s for example.
Why assign stack slots in any pass of allocator (not at end of allocation) ?
Example only.
insn1 def web1
...
...
...
insn2 use web1
...
...
...
insn3 use web1
After IR splitting. After generation of spill reg. After actual_spill ().
insn1 def web10
insn10' def of spill web1' = use web10
...
...
...
insn12' def web11 = use web1'
insn2 use web11
...
...
...
insn13' def web12 = use web1'
insn3 use web12
In this stage we have *increased register pressure*. I don't incorporate spill
web1' to insn1,insn2,insn3 to avoid over-constraining of web1'.
IMHO: Delayed remat related.
Very important thing that cost of web1' lower then cost of web1
(before IR splitting). The spill web1' usually will have a lower degree
and web1' can be coalesced with web10 and/or web11 and/or web12 and can
be successfully colored or not coalesced and colored.
If web1' isn't colored (colored by an_unusable_color) then web10, web11, web12
still require color and *register pressure still increased*.
After assign stack slots and coalesce spill slots:
# if possible then web10 and web1' will be processed by coalesce_spill_slot
insn1 def (stack-slot-web1')
DELETED insn10' def of spill web1' = use web10
...
...
...
DELETED insn12' def web11 = use web1'
#if possible
insn2 use (stack-slot-web1')
...
...
...
DELETED insn13' def web12 = use web1'
#if possible
insn3 use (stack-slot-web1')
Register pressure reduced as possible.
Denis.
PS: Too long mail with not too clean things and thoughts :-(
From: Michael Matz <matz@suse.de>
Subject: Re: Example of delayed remat
To: <denisc@overta.ru>
Date: Thu, 23 Jan 2003 17:28:42 +0100 (CET)
Hi Denis,
On Thu, 23 Jan 2003, Gazovik wrote:
> I will try to rearrange my thoughts about delayed remat.
>
> Abstract:
> 1. Originally new allocator has only IR spilling;
> 2. After that you have added web splitting based on IR spilling. I
> wrote about
> generation spill pseudos instead of stack slots. I call it IR splitting.
To make sure I've understood you correctly: that we use spill pseudos (or
stack pseudos as I've called them in the comments), _instead_ of directly
using stack slots, is named IR splitting by you, yes?
> The allocation process (simplified):
> 1. Try to colorize webs;
Here is one (important) part missing. Before building the normal SELECT
stack (your number 2. ) first all stack pseudos are removed from the
graph.
> 2. Choose webs for spilling; (Webs selection based on coloring algorithm
> not on web cost, but on web degree.
> It's symplify not select_spill.)
Do you mean here the building of the SELECT stack (i.e. the order in which
the webs later are tried to be colored)?
> 3. Choose webs with lowest cost for spilling; (It's select_spill)
> 4. Recolor spills;
> 5. Actual spilling or stack slots allocation.
> Actual spilling is an IR splitting not a spilling.
> Stack slots allocation is an IR spilling on top of IR splitting.
> Stack slots allocation is assign_stack_slots ().
>
> It's my view on colorize - spilling phases.
> So, why the delayed remat is a win in current allocator implementation ?
Ok, I think I now agree that delayed remat might be a win in some
situations. At least it should do not too much harm.
That leaves just the issue with using stack slots instead of stack pseudos
not just at the very end.
> Why assign stack slots in any pass of allocator (not at end of
> allocation) ?
Ahh, yes. The interesting question ;-)
> After IR splitting. After generation of spill reg. After actual_spill ().
>
> insn1 def web10
> insn10' def of spill web1' = use web10
> ...
> insn12' def web11 = use web1'
> insn2 use web11
> ...
> insn13' def web12 = use web1'
> insn3 use web12
Yes. Ok, web1' is the former web1, but it's now a stack pseudo! I.e.
conceptually it represents a stack slot. web10, web11 and web12 are
normal webs (Ok, they are spill-temps, but otherwise normal short webs).
> In this stage we have *increased register pressure*.
Yes and no. The first thing, which is done for the next coloring phase
is, that _all_ stack pseudo webs are removed from the conflict graph and
put on the SELECT stack (i.e. later they will be colored last). In that
way register pressure is lower then in the former pass (because the
long-live web1' is now not in the graph anymore).
This means, that after removing those stack pseudos, the graph is
_exactly_ the same, as if we had used stack slots directly for web1'. The
reason why I originally introduced stack pseudos was, that I wanted to
track liveness for stack-slots too, but it seemed to me too much work to
implement now also such tracking for (MEM) rtx'es, although I already have
all the machinery for (REG) rtx'es. But otherwise those stack pseudos are
totally equivalent to stack slots, just that their final address is not
yet known, and that they have a funny rtx code for memory positions (i.e.
they have REG ;-) ). What we get for free by doing it that way is, that
we can possibly (if the situation permit) also color a stack slot again
(which is equivalent to make a liveness analysis pass just for stackslots,
and if we have free registers put those stack slots back into registers).
But otherwise while the coloring is in process the graph is totally the
same as if we had used stack slots from the beginning.
> IMHO: Delayed remat related.
> Very important thing that cost of web1' lower then cost of web1
> (before IR splitting). The spill web1' usually will have a lower degree
> and web1' can be coalesced with web10 and/or web11 and/or web12 and can
> be successfully colored or not coalesced and colored.
>
> If web1' isn't colored (colored by an_unusable_color) then web10,
> web11, web12 still require color and *register pressure still
> increased*.
No. If web1' isn't colored, this means, that now we are sure, that in
fact web1' _does_ represent a stack slot, and not a register. I.e. we
know then, that we could have used a stack slot instead of a stack pseudo
and get the same coloring of the graph. But this can change in the next
round. Btw. you'll notice that if stack pseudos get no color, that is no
reason to repeat the coloring process, or even to emit spill code. They
are finally equivalent to a stack-based (MEM).
That web10,web11 and web12 still need a color is not different if we had
used directly stack slots. The code above would have looked like so:
insn1 def web10
insn10' [stack+8] = use web10
...
insn12' def web11 = use [stack+8]
insn2 use web11
...
insn13' def web12 = use [stack+8]
insn3 use web12
> After assign stack slots and coalesce spill slots:
>
> # if possible then web10 and web1' will be processed by coalesce_spill_slot
> insn1 def (stack-slot-web1')
> DELETED insn10' def of spill web1' = use web10
> ...
> ...
> ...
> DELETED insn12' def web11 = use web1'
> #if possible
> insn2 use (stack-slot-web1')
> ...
> ...
> ...
> DELETED insn13' def web12 = use web1'
> #if possible
> insn3 use (stack-slot-web1')
>
> Register pressure reduced as possible.
I see, what you want to do. You want to eliminate those small webs just
used for loading values from spilled webs for one instruction (web10-12).
And you want to do that by directly placing the reference for the stack
slot into the insn using (or defining) it. This is a sensible thing to
do, and partially we were doing that already. Conceptually you want to
convert (this is just example):
I1 web10 <= [stack+8]
I2 webX <= webX + web10
into
I2' webX <= webX + [stack+8]
if possible (i.e. if the constraints of I2 allow a mem reference). Is that
correct so far? And that was the reason you began creating real stack
slots sooner, yes? But there is no reason to really use stack slots
already. Let me explain, how I've done this:
I never create stack slots, but instead stack pseudos, so initially I
would end up with this code (sweb is my stack-pseudo web):
I1 web10 <= sweb
I2 webX <= webX + web10
and I also want to transform it into:
I2' webX <= webX + sweb
The question is, when is this possible. Well, simple: in validate_change
(and it's subfunctions), if you encounter a stack pseudo you have to think
of it, as _if_ it were a stack slot. There's no need to actually create
that stack slot, one just has to differentiate between normal pseudo regs
and stack pseudos. This worked out pretty well for my versions of the
branch, and I believe in this respect it's equivalent of what you are
doing. But it has one big advantage: that stack slots are only assigned
at the last step, means, we can track the liveness of those "stack-slots"
better (because they are still just stack-pseudos), and we can later merge
stack slots very easily.
Hmm, I'm not too sure, if I have made myself very clear. Please ask if
something wasn't understandable. And please correct me if I've understood
the purpose of your sooner stackslot assignments incorrectly. Just to
collect it in one sentence: _I_ think, that you do that only to be able to
delete the small webs. If there are also other reasons please tell me,
because I think my solution above is more elegant.
Ciao,
Michael.
From: Denis Chertykov <denisc@overta.ru>
Subject: Re: Example of delayed remat
To: Michael Matz <matz@suse.de>
Cc: <denisc@overta.ru>
Date: 23 Jan 2003 21:36:00 +0300
Michael Matz <matz@suse.de> writes:
> Hi Denis,
>
> On Thu, 23 Jan 2003, Gazovik wrote:
>
> > I will try to rearrange my thoughts about delayed remat.
> >
> > Abstract:
> > 1. Originally new allocator has only IR spilling;
> > 2. After that you have added web splitting based on IR spilling. I
> > wrote about
> > generation spill pseudos instead of stack slots. I call it IR splitting.
>
> To make sure I've understood you correctly: that we use spill pseudos (or
> stack pseudos as I've called them in the comments), _instead_ of directly
> using stack slots, is named IR splitting by you, yes?
Yes.
>
> > The allocation process (simplified):
> > 1. Try to colorize webs;
>
> Here is one (important) part missing. Before building the normal SELECT
> stack (your number 2. ) first all stack pseudos are removed from the
> graph.
IMHO: Pretty all stack pseudos will be coalesced with splitted parts of
original webs and pretty all stack pseudos will be in SELECT stack.
>
> > 2. Choose webs for spilling; (Webs selection based on coloring algorithm
> > not on web cost, but on web degree.
> > It's symplify not select_spill.)
>
> Do you mean here the building of the SELECT stack (i.e. the order in which
> the webs later are tried to be colored)?
Yes.
>
> > 3. Choose webs with lowest cost for spilling; (It's select_spill)
> > 4. Recolor spills;
> > 5. Actual spilling or stack slots allocation.
> > Actual spilling is an IR splitting not a spilling.
> > Stack slots allocation is an IR spilling on top of IR splitting.
> > Stack slots allocation is assign_stack_slots ().
> >
> > It's my view on colorize - spilling phases.
> > So, why the delayed remat is a win in current allocator implementation ?
>
> Ok, I think I now agree that delayed remat might be a win in some
> situations. At least it should do not too much harm.
OK.
> > Why assign stack slots in any pass of allocator (not at end of
> > allocation) ?
>
> Ahh, yes. The interesting question ;-)
>
> > After IR splitting. After generation of spill reg. After actual_spill ().
> >
> > insn1 def web10
> > insn10' def of spill web1' = use web10
> > ...
> > insn12' def web11 = use web1'
> > insn2 use web11
> > ...
> > insn13' def web12 = use web1'
> > insn3 use web12
>
> Yes. Ok, web1' is the former web1, but it's now a stack pseudo! I.e.
> conceptually it represents a stack slot.
I disagree with you.
IMHO: web1' is a special pseudo with weak coloring possibility.
It's not principal, but I imagine things so.
> web10, web11 and web12 are normal webs (Ok, they are spill-temps,
> but otherwise normal short webs).
>
> > In this stage we have *increased register pressure*.
>
> Yes and no. The first thing, which is done for the next coloring phase
> is, that _all_ stack pseudo webs are removed from the conflict graph and
> put on the SELECT stack (i.e. later they will be colored last). In that
> way register pressure is lower then in the former pass (because the
> long-live web1' is now not in the graph anymore).
Exclude stack pseudo webs which coalesced with normal webs.
Is this right ?
> This means, that after removing those stack pseudos, the graph is
> _exactly_ the same, as if we had used stack slots directly for web1'.
My previous question actual here.
> The reason why I originally introduced stack pseudos was, that I
> wanted to track liveness for stack-slots too, but it seemed to me
> too much work to implement now also such tracking for (MEM) rtx'es,
> although I already have all the machinery for (REG) rtx'es. But
> otherwise those stack pseudos are totally equivalent to stack slots,
> just that their final address is not yet known, and that they have a
> funny rtx code for memory positions (i.e. they have REG ;-) ).
We have a different points of view.
IMHO: stack pseudo webs is a special webs with weak coloring possibility.
> What we get for free by doing it that way is, that we can possibly
> (if the situation permit) also color a stack slot again (which is
> equivalent to make a liveness analysis pass just for stackslots, and
> if we have free registers put those stack slots back into
> registers).
>
> But otherwise while the coloring is in process the graph is totally the
> same as if we had used stack slots from the beginning.
Again IMHO: Exclude stack pseudo webs which coalesced with normal webs.
> > IMHO: Delayed remat related.
> > Very important thing that cost of web1' lower then cost of web1
> > (before IR splitting). The spill web1' usually will have a lower degree
> > and web1' can be coalesced with web10 and/or web11 and/or web12 and can
> > be successfully colored or not coalesced and colored.
> >
> > If web1' isn't colored (colored by an_unusable_color) then web10,
> > web11, web12 still require color and *register pressure still
> > increased*.
I was wrong here.
Register pressure has a small reduction.
>
> No. If web1' isn't colored, this means, that now we are sure, that in
> fact web1' _does_ represent a stack slot, and not a register.
Yes. Yes. It was my fault. I'm already know about this.
Register pressure has a small reduction here.
> I.e. we know then, that we could have used a stack slot instead of a
> stack pseudo and get the same coloring of the graph. But this can
> change in the next round. Btw. you'll notice that if stack pseudos
> get no color, that is no reason to repeat the coloring process, or
> even to emit spill code. They are finally equivalent to a
> stack-based (MEM).
>
> That web10,web11 and web12 still need a color is not different if we had
> used directly stack slots. The code above would have looked like so:
>
> insn1 def web10
> insn10' [stack+8] = use web10
> ...
> insn12' def web11 = use [stack+8]
> insn2 use web11
> ...
> insn13' def web12 = use [stack+8]
> insn3 use web12
>
Yes. Yes. You right.
> > After assign stack slots and coalesce spill slots:
> >
> > # if possible then web10 and web1' will be processed by coalesce_spill_slot
> > insn1 def (stack-slot-web1')
> > DELETED insn10' def of spill web1' = use web10
> > ...
> > ...
> > ...
> > DELETED insn12' def web11 = use web1'
> > #if possible
> > insn2 use (stack-slot-web1')
> > ...
> > ...
> > ...
> > DELETED insn13' def web12 = use web1'
> > #if possible
> > insn3 use (stack-slot-web1')
> >
> > Register pressure reduced as possible.
>
> I see, what you want to do. You want to eliminate those small webs just
> used for loading values from spilled webs for one instruction (web10-12).
> And you want to do that by directly placing the reference for the stack
> slot into the insn using (or defining) it. This is a sensible thing to
> do, and partially we were doing that already. Conceptually you want to
> convert (this is just example):
>
> I1 web10 <= [stack+8]
> I2 webX <= webX + web10
>
> into
>
> I2' webX <= webX + [stack+8]
>
> if possible (i.e. if the constraints of I2 allow a mem reference). Is that
> correct so far? And that was the reason you began creating real stack
> slots sooner, yes?
Generally no. I have already answered this question.
IMHO: I must know size of stack slots for register elimination.
(elimination of argument-pointer or frame-pointer)
> But there is no reason to really use stack slots already.
It's speedup compiler, but this is not important here.
> Let me explain, how I've done this:
>
> I never create stack slots, but instead stack pseudos, so initially I
> would end up with this code (sweb is my stack-pseudo web):
>
> I1 web10 <= sweb
> I2 webX <= webX + web10
>
> and I also want to transform it into:
>
> I2' webX <= webX + sweb
>
> The question is, when is this possible. Well, simple: in validate_change
> (and it's subfunctions), if you encounter a stack pseudo you have to think
> of it, as _if_ it were a stack slot. There's no need to actually create
> that stack slot, one just has to differentiate between normal pseudo regs
> and stack pseudos. This worked out pretty well for my versions of the
> branch, and I believe in this respect it's equivalent of what you are
> doing. But it has one big advantage: that stack slots are only assigned
> at the last step, means, we can track the liveness of those "stack-slots"
> better (because they are still just stack-pseudos), and we can later merge
> stack slots very easily.
Merging is not a problem. I can mark all basic blocks which used web1'
before transform it to stack and use this info later for merging stack
slots. (I wrote about using one stack slot for two or more
non-interfered webs which must be stack pseudos)
> Please ask if something wasn't understandable. And please correct
> me if I've understood the purpose of your sooner stackslot
> assignments incorrectly. Just to collect it in one sentence: _I_
> think, that you do that only to be able to delete the small webs.
Not only.
1. I want to know size of stack slots for register elimination.
(It is a separate question with many problems)
2. Your usage of validate_change over-constrain a spill slot web.
3. Many targets don't support infinite displacement in address, so
you will have a wrong insn after substitution spill-slot-web to
stack-spill-slot. Such insns will be reloaded. Some ports will
require remaining register for reloading and reload will kill all
your amazing allocation.
Reload is a code quality killer. Not reload. Reload has his own
spilling. This spilling is a killer.
> If there are also other reasons please tell me, because I think my
> solution above is more elegant.
In ideal case your solution more elegant. (for ix86)
Just a question for better understanding.
With which ports you are familiar ?
Denis.
From: Michael Matz <matz@suse.de>
Subject: Re: Example of delayed remat
To: Denis Chertykov <denisc@overta.ru>
Date: Fri, 24 Jan 2003 14:38:50 +0100 (CET)
Hi,
On 23 Jan 2003, Denis Chertykov wrote:
> > Here is one (important) part missing. Before building the normal SELECT
> > stack (your number 2. ) first all stack pseudos are removed from the
> > graph.
>
> IMHO: Pretty all stack pseudos will be coalesced with splitted parts of
> original webs and pretty all stack pseudos will be in SELECT stack.
Ooops. I see, that in the version in CVS the check for that is commented
out. In all my version I have this in ra-build.c:copy_insn_p():
if ((s_regno < FIRST_PSEUDO_REGISTER
&& d_regno < FIRST_PSEUDO_REGISTER)
|| SPILL_SLOT_P (s_regno)
|| SPILL_SLOT_P (d_regno))
return 0;
I.e. coalescing of stack-pseudos with _anything_ is prevented. I've
done this because generally the code with coalescing stack-pseudo was a
little bit worse, than when not coelescing them (some of them will anyway
be coalesced later by spill coalescing). That is the reason that for me
stack-pseudos and stack-slots are equivalent. I probably never committed
that version above, because I hoped to somewhen make that coalescing work
better.
> > > insn1 def web10
> > > insn10' def of spill web1' = use web10
> > > ...
> > > insn12' def web11 = use web1'
> > > insn2 use web11
> > > ...
> > > insn13' def web12 = use web1'
> > > insn3 use web12
> >
> > Yes. Ok, web1' is the former web1, but it's now a stack pseudo! I.e.
> > conceptually it represents a stack slot.
>
> I disagree with you.
> IMHO: web1' is a special pseudo with weak coloring possibility.
> It's not principal, but I imagine things so.
Hmm, but I think this is exactly the source of the problem. You see, we
spilled some web there, because reg pressure was too high in some way. We
don't gain much, if we just create another web, there. Reg pressure would
be the same. That new web must therefore be somehow "special" and handled
differently. In a traditional allocator we would have used a stack slot,
and ergo the natural way of dealing with that web is, to handle it like a
funny stack slot. Look at how stack-pseudo webs are handled in the
allocator:
1 they are removed from the graph very soon --> they don't contribute to
register pressure
2 (because of 1) they don't really conflict with normal webs during
coloring them
3 they aren't spilled, instead if they fail to get a color, they are
either ignored, or directly replaced by a stack slot (in the last pass)
You are right in some way, that stack webs are like webs with a weaker
coloring possibility. But they also differ in other ways, and because of
that are much more like colorizable stack slots.
> > put on the SELECT stack (i.e. later they will be colored last). In that
> > way register pressure is lower then in the former pass (because the
> > long-live web1' is now not in the graph anymore).
>
> Exclude stack pseudo webs which coalesced with normal webs.
> Is this right ?
Right. But as I've written above in my version I never did coalesce them
with anything.
> > The reason why I originally introduced stack pseudos was, that I
> > wanted to track liveness for stack-slots too, but it seemed to me
> > too much work to implement now also such tracking for (MEM) rtx'es,
> > although I already have all the machinery for (REG) rtx'es. But
> > otherwise those stack pseudos are totally equivalent to stack slots,
> > just that their final address is not yet known, and that they have a
> > funny rtx code for memory positions (i.e. they have REG ;-) ).
>
> We have a different points of view.
Obviously ;-) I just can say, what I had in mind when implementing stack
pseudos and the design decisions.
> IMHO: stack pseudo webs is a special webs with weak coloring possibility.
One more thing (similar to one of the above paragraphs): that we have
spilled something indicates high register pressure. That means, that
there is not _that_ much to gain by cleverness in trying to retain some
webs in register which already were selected for spilling.
> > if possible (i.e. if the constraints of I2 allow a mem reference). Is that
> > correct so far? And that was the reason you began creating real stack
> > slots sooner, yes?
>
> Generally no. I have already answered this question.
> IMHO: I must know size of stack slots for register elimination.
> (elimination of argument-pointer or frame-pointer)
Ohh, OK. Then let me rephrase it: The exact reasons you introduced
sooner stack slot assignments were:
1) to know stack size for register elimination
2) to know displacements in insns which use stack slots
3) to be able to get rid of the small webs
Are there more reasons to use stack slots so soon? All of these reasons
are indeed valid to use stack slots instead stack pseudos. But see below.
> Merging [stack-slots] is not a problem. I can mark all basic blocks
> which used web1' before transform it to stack and use this info later
> for merging stack slots. (I wrote about using one stack slot for two
> or more non-interfered webs which must be stack pseudos)
Yes, I've also done such a pass, and it's really a small function, because
we can use the normal coloring process restricted on stack pseudos.
> > assignments incorrectly. Just to collect it in one sentence: _I_
> > think, that you do that only to be able to delete the small webs.
>
> Not only.
> 1. I want to know size of stack slots for register elimination.
> (It is a separate question with many problems)
Yes, this I understand now.
> 2. Your usage of validate_change over-constrain a spill slot web.
In which way? Note that over-constraining a stack pseudo web is actually
no problem at all. If will be a stack slot if we can't color it. If we
weren't such clever to have stack pseudos at all, it would have been a
stack slot from the begin.
> 3. Many targets don't support infinite displacement in address, so
> you will have a wrong insn after substitution spill-slot-web to
> stack-spill-slot
Yes, that also is a problem with stack pseudos (on some architectures).
> Reload is a code quality killer. Not reload. Reload has his own
> spilling. This spilling is a killer.
Yes, we all know this ;-)
> > If there are also other reasons please tell me, because I think my
> > solution above is more elegant.
>
> In ideal case your solution more elegant. (for ix86)
And for many other architectures too. I agree that for also many
architectures it's not an ideal solution.
> Just a question for better understanding.
> With which ports you are familiar ?
x86, x86-64. I've worked also on alpha and ppc(64), but I wouldn't call
that familiar. But I know, that there are many ports which have certain
constraints on immediate operands, addressing modes and so on.
Ok, I now see, that using stack slots early has advantages on _some_
architectures. In fact I knew this from the beginning, I never deleted
the code, which instead of allocating a new pseudo reg for a spilled web,
allocated a stack slot directly, it was just deactivated (see
allocate_spill_web() before your changes). But on other architectures
it has more advantages to never use stack slots until the very end
(notable x86 and x86-64) (because they don't have hard constraints on
displacements and so on, and register elimination can also be done
afterwards).
Therefore I propose to make possible both things: your way of directly
allocating / using stack slots for "uncolored" stack pseudos (which is an
optimization of my old ever-use-stack-slots method), and my way of never
using stack-slots until the end. Would that work for you?
Ciao,
Michael.
From: Denis Chertykov <denisc@overta.ru>
Subject: Re: Example of delayed remat
To: Michael Matz <matz@suse.de>
Cc: Denis Chertykov <denisc@overta.ru>
Date: 24 Jan 2003 21:21:50 +0300
Michael Matz <matz@suse.de> writes:
> Hi,
>
> On 23 Jan 2003, Denis Chertykov wrote:
>
> > > Here is one (important) part missing. Before building the normal SELECT
> > > stack (your number 2. ) first all stack pseudos are removed from the
> > > graph.
> >
> > IMHO: Pretty all stack pseudos will be coalesced with splitted parts of
> > original webs and pretty all stack pseudos will be in SELECT stack.
>
> Ooops. I see, that in the version in CVS the check for that is commented
> out. In all my version I have this in ra-build.c:copy_insn_p():
> if ((s_regno < FIRST_PSEUDO_REGISTER
> && d_regno < FIRST_PSEUDO_REGISTER)
> || SPILL_SLOT_P (s_regno)
> || SPILL_SLOT_P (d_regno))
> return 0;
>
> I.e. coalescing of stack-pseudos with _anything_ is prevented. I've
> done this because generally the code with coalescing stack-pseudo was a
> little bit worse, than when not coelescing them (some of them will anyway
> be coalesced later by spill coalescing). That is the reason that for me
> stack-pseudos and stack-slots are equivalent. I probably never committed
> that version above, because I hoped to somewhen make that coalescing work
> better.
>
>
> > > > insn1 def web10
> > > > insn10' def of spill web1' = use web10
> > > > ...
> > > > insn12' def web11 = use web1'
> > > > insn2 use web11
> > > > ...
> > > > insn13' def web12 = use web1'
> > > > insn3 use web12
> > >
> > > Yes. Ok, web1' is the former web1, but it's now a stack pseudo! I.e.
> > > conceptually it represents a stack slot.
> >
> > I disagree with you.
> > IMHO: web1' is a special pseudo with weak coloring possibility.
> > It's not principal, but I imagine things so.
>
> Hmm, but I think this is exactly the source of the problem. You see, we
> spilled some web there, because reg pressure was too high in some way. We
> don't gain much, if we just create another web, there. Reg pressure would
> be the same. That new web must therefore be somehow "special" and handled
> differently. In a traditional allocator we would have used a stack slot,
> and ergo the natural way of dealing with that web is, to handle it like a
> funny stack slot. Look at how stack-pseudo webs are handled in the
> allocator:
> 1 they are removed from the graph very soon --> they don't contribute to
> register pressure
> 2 (because of 1) they don't really conflict with normal webs during
> coloring them
> 3 they aren't spilled, instead if they fail to get a color, they are
> either ignored, or directly replaced by a stack slot (in the last pass)
>
> You are right in some way, that stack webs are like webs with a weaker
> coloring possibility. But they also differ in other ways, and because of
> that are much more like colorizable stack slots.
>
> > > put on the SELECT stack (i.e. later they will be colored last). In that
> > > way register pressure is lower then in the former pass (because the
> > > long-live web1' is now not in the graph anymore).
> >
> > Exclude stack pseudo webs which coalesced with normal webs.
> > Is this right ?
>
> Right. But as I've written above in my version I never did coalesce them
> with anything.
>
> > > The reason why I originally introduced stack pseudos was, that I
> > > wanted to track liveness for stack-slots too, but it seemed to me
> > > too much work to implement now also such tracking for (MEM) rtx'es,
> > > although I already have all the machinery for (REG) rtx'es. But
> > > otherwise those stack pseudos are totally equivalent to stack slots,
> > > just that their final address is not yet known, and that they have a
> > > funny rtx code for memory positions (i.e. they have REG ;-) ).
> >
> > We have a different points of view.
>
> Obviously ;-) I just can say, what I had in mind when implementing stack
> pseudos and the design decisions.
>
> > IMHO: stack pseudo webs is a special webs with weak coloring possibility.
>
> One more thing (similar to one of the above paragraphs): that we have
> spilled something indicates high register pressure. That means, that
> there is not _that_ much to gain by cleverness in trying to retain some
> webs in register which already were selected for spilling.
>
> > > if possible (i.e. if the constraints of I2 allow a mem reference). Is that
> > > correct so far? And that was the reason you began creating real stack
> > > slots sooner, yes?
> >
> > Generally no. I have already answered this question.
> > IMHO: I must know size of stack slots for register elimination.
> > (elimination of argument-pointer or frame-pointer)
>
> Ohh, OK. Then let me rephrase it: The exact reasons you introduced
> sooner stack slot assignments were:
> 1) to know stack size for register elimination
> 2) to know displacements in insns which use stack slots
> 3) to be able to get rid of the small webs
> Are there more reasons to use stack slots so soon? All of these reasons
> are indeed valid to use stack slots instead stack pseudos. But see below.
>
> > Merging [stack-slots] is not a problem. I can mark all basic blocks
> > which used web1' before transform it to stack and use this info later
> > for merging stack slots. (I wrote about using one stack slot for two
> > or more non-interfered webs which must be stack pseudos)
>
> Yes, I've also done such a pass, and it's really a small function, because
> we can use the normal coloring process restricted on stack pseudos.
>
> > > assignments incorrectly. Just to collect it in one sentence: _I_
> > > think, that you do that only to be able to delete the small webs.
> >
> > Not only.
> > 1. I want to know size of stack slots for register elimination.
> > (It is a separate question with many problems)
>
> Yes, this I understand now.
>
> > 2. Your usage of validate_change over-constrain a spill slot web.
>
> In which way? Note that over-constraining a stack pseudo web is actually
> no problem at all. If will be a stack slot if we can't color it. If we
> weren't such clever to have stack pseudos at all, it would have been a
> stack slot from the begin.
>
> > 3. Many targets don't support infinite displacement in address, so
> > you will have a wrong insn after substitution spill-slot-web to
> > stack-spill-slot
>
> Yes, that also is a problem with stack pseudos (on some architectures).
>
> > Reload is a code quality killer. Not reload. Reload has his own
> > spilling. This spilling is a killer.
>
> Yes, we all know this ;-)
>
> > > If there are also other reasons please tell me, because I think my
> > > solution above is more elegant.
> >
> > In ideal case your solution more elegant. (for ix86)
>
> And for many other architectures too. I agree that for also many
> architectures it's not an ideal solution.
>
> > Just a question for better understanding.
> > With which ports you are familiar ?
>
> x86, x86-64. I've worked also on alpha and ppc(64), but I wouldn't call
> that familiar. But I know, that there are many ports which have certain
> constraints on immediate operands, addressing modes and so on.
>
> Ok, I now see, that using stack slots early has advantages on _some_
> architectures. In fact I knew this from the beginning, I never deleted
> the code, which instead of allocating a new pseudo reg for a spilled web,
> allocated a stack slot directly, it was just deactivated (see
> allocate_spill_web() before your changes). But on other architectures
> it has more advantages to never use stack slots until the very end
> (notable x86 and x86-64) (because they don't have hard constraints on
> displacements and so on, and register elimination can also be done
> afterwards).
>
> Therefore I propose to make possible both things: your way of directly
> allocating / using stack slots for "uncolored" stack pseudos (which is an
> optimization of my old ever-use-stack-slots method), and my way of never
> using stack-slots until the end. Would that work for you?
I can't answer right now. (I havn't a time)
I will be on hunt twomorrow. :-[]
Please wait to sunday. :(
Denis.
From: Michael Matz <matz@suse.de>
Subject: Re: Example of delayed remat
To: Denis Chertykov <denisc@overta.ru>
Date: Fri, 24 Jan 2003 19:46:53 +0100 (CET)
Hi,
On 24 Jan 2003, Denis Chertykov wrote:
> I can't answer right now. (I havn't a time)
> I will be on hunt twomorrow. :-[]
> Please wait to sunday. :(
Well, given how my reaction time sometimes is, I can't possibly be angry
about that. ;-)
Ciao,
Michael.
P.S: What things do you hunt?
From: Denis Chertykov <denisc@overta.ru>
Subject: Re: Example of delayed remat
To: Michael Matz <matz@suse.de>
Cc: Denis Chertykov <denisc@overta.ru>
Date: 26 Jan 2003 15:50:58 +0300
Michael Matz <matz@suse.de> writes:
> Hi,
>
> On 24 Jan 2003, Denis Chertykov wrote:
>
> > I can't answer right now. (I havn't a time)
> > I will be on hunt twomorrow. :-[]
> > Please wait to sunday. :(
>
> Well, given how my reaction time sometimes is, I can't possibly be angry
> about that. ;-)
>
>
> Ciao,
> Michael.
> P.S: What things do you hunt?
Winter: Hare, deer/roe.
Spring: Goose.
Summer: None.
Autumn: Duck.
Are you interested ?
Denis.
From: Denis Chertykov <denisc@overta.ru>
Subject: Re: Example of delayed remat
To: Michael Matz <matz@suse.de>
Cc: Denis Chertykov <denisc@overta.ru>
Date: 26 Jan 2003 15:50:46 +0300
Michael Matz <matz@suse.de> writes:
> Hi,
>
> On 23 Jan 2003, Denis Chertykov wrote:
>
> > > Here is one (important) part missing. Before building the normal SELECT
> > > stack (your number 2. ) first all stack pseudos are removed from the
> > > graph.
> >
> > IMHO: Pretty all stack pseudos will be coalesced with splitted parts of
> > original webs and pretty all stack pseudos will be in SELECT stack.
>
> Ooops. I see, that in the version in CVS the check for that is commented
> out. In all my version I have this in ra-build.c:copy_insn_p():
> if ((s_regno < FIRST_PSEUDO_REGISTER
> && d_regno < FIRST_PSEUDO_REGISTER)
> || SPILL_SLOT_P (s_regno)
> || SPILL_SLOT_P (d_regno))
> return 0;
>
> I.e. coalescing of stack-pseudos with _anything_ is prevented. I've
> done this because generally the code with coalescing stack-pseudo was a
> little bit worse, than when not coelescing them (some of them will anyway
> be coalesced later by spill coalescing). That is the reason that for me
> stack-pseudos and stack-slots are equivalent. I probably never committed
> that version above, because I hoped to somewhen make that coalescing work
> better.
>
>
> > > > insn1 def web10
> > > > insn10' def of spill web1' = use web10
> > > > ...
> > > > insn12' def web11 = use web1'
> > > > insn2 use web11
> > > > ...
> > > > insn13' def web12 = use web1'
> > > > insn3 use web12
> > >
> > > Yes. Ok, web1' is the former web1, but it's now a stack pseudo! I.e.
> > > conceptually it represents a stack slot.
> >
> > I disagree with you.
> > IMHO: web1' is a special pseudo with weak coloring possibility.
> > It's not principal, but I imagine things so.
>
> Hmm, but I think this is exactly the source of the problem. You see, we
> spilled some web there, because reg pressure was too high in some way. We
> don't gain much, if we just create another web, there. Reg pressure would
> be the same. That new web must therefore be somehow "special" and handled
> differently. In a traditional allocator we would have used a stack slot,
> and ergo the natural way of dealing with that web is, to handle it like a
> funny stack slot. Look at how stack-pseudo webs are handled in the
> allocator:
> 1 they are removed from the graph very soon --> they don't contribute to
> register pressure
> 2 (because of 1) they don't really conflict with normal webs during
> coloring them
> 3 they aren't spilled, instead if they fail to get a color, they are
> either ignored, or directly replaced by a stack slot (in the last pass)
>
> You are right in some way, that stack webs are like webs with a weaker
> coloring possibility. But they also differ in other ways, and because of
> that are much more like colorizable stack slots.
Only one thing disturb me.
IE:
i1 use web1 (GENERAL_REGS)
...
i2 use web1 (CREG)
...
i3 use web1 (GENERAL_REGS)
We can't colorize web1 only because web1 require too constrained
class.
Spill web1:
i11 def web10 use web1'
i1 use web10 (GENERAL_REGS)
...
i12 def web11 use web1'
i2 use web11 (CREG)
...
i13 def web12 use web1'
i3 use web12 (GENERAL_REGS)
I don't want to think that web1' already in stack.
I want to think that web1' will be colorized as any other web.
Even more. I want to think that web1' even isn't weak colorable.
> > > put on the SELECT stack (i.e. later they will be colored last). In that
> > > way register pressure is lower then in the former pass (because the
> > > long-live web1' is now not in the graph anymore).
> >
> > Exclude stack pseudo webs which coalesced with normal webs.
> > Is this right ?
>
> Right. But as I've written above in my version I never did coalesce them
> with anything.
>
> > > The reason why I originally introduced stack pseudos was, that I
> > > wanted to track liveness for stack-slots too, but it seemed to me
> > > too much work to implement now also such tracking for (MEM) rtx'es,
> > > although I already have all the machinery for (REG) rtx'es. But
> > > otherwise those stack pseudos are totally equivalent to stack slots,
> > > just that their final address is not yet known, and that they have a
> > > funny rtx code for memory positions (i.e. they have REG ;-) ).
> >
> > We have a different points of view.
>
> Obviously ;-) I just can say, what I had in mind when implementing stack
> pseudos and the design decisions.
>
> > IMHO: stack pseudo webs is a special webs with weak coloring possibility.
>
> One more thing (similar to one of the above paragraphs): that we have
> spilled something indicates high register pressure. That means, that
> there is not _that_ much to gain by cleverness in trying to retain some
> webs in register which already were selected for spilling.
>
> > > if possible (i.e. if the constraints of I2 allow a mem reference). Is that
> > > correct so far? And that was the reason you began creating real stack
> > > slots sooner, yes?
> >
> > Generally no. I have already answered this question.
> > IMHO: I must know size of stack slots for register elimination.
> > (elimination of argument-pointer or frame-pointer)
>
> Ohh, OK. Then let me rephrase it: The exact reasons you introduced
> sooner stack slot assignments were:
> 1) to know stack size for register elimination
> 2) to know displacements in insns which use stack slots
> 3) to be able to get rid of the small webs
> Are there more reasons to use stack slots so soon? All of these reasons
> are indeed valid to use stack slots instead stack pseudos.
4) to treat stack pseudos as a simple webs with possibility to be
converted to stack slots.
> But see below.
>
> > Merging [stack-slots] is not a problem. I can mark all basic blocks
> > which used web1' before transform it to stack and use this info later
> > for merging stack slots. (I wrote about using one stack slot for two
> > or more non-interfered webs which must be stack pseudos)
>
> Yes, I've also done such a pass, and it's really a small function, because
> we can use the normal coloring process restricted on stack pseudos.
Yes.
> > > assignments incorrectly. Just to collect it in one sentence: _I_
> > > think, that you do that only to be able to delete the small webs.
> >
> > Not only.
> > 1. I want to know size of stack slots for register elimination.
> > (It is a separate question with many problems)
>
> Yes, this I understand now.
>
> > 2. Your usage of validate_change over-constrain a spill slot web.
>
> In which way?
i1 use web1 (GENERAL_REGS)
...
i2 use web1 (CREG)
after spill.
i11 def web10 use web1'
i1 use web10 (GENERAL_REGS)
...
# I prefer this: i12 def web11 use web1'
# and this: i2 use web11 (CREG)
# You prefer the following insn:
i2 use web1' (CREG)
In your case the possibility of web'1 colorization will be
dramatically low.
Is this right ? (May be I miss some your think.)
> Note that over-constraining a stack pseudo web is actually no
> problem at all. If will be a stack slot if we can't color it.
Significantly better to colorize stack pseudo than to think that it's
already a semi 'stack slot'
[...]
> Therefore I propose to make possible both things: your way of directly
> allocating / using stack slots for "uncolored" stack pseudos (which is an
> optimization of my old ever-use-stack-slots method), and my way of never
> using stack-slots until the end. Would that work for you?
Yes.
But, please compare a generated code quality with my patch and
without it.
Dealayed remat again:
Delayed remat and my view to stack pseudo are the the same things.
I can try to rearrange my thought about spill pseudos.
I just want to divide spill vars to two layers:
1) spill pseudos for coloring; (I have used here the spill-slot-web)
2) spill pseudos for stack. (I have used here the real stack slots)
The spill vars moved form 1) to 2) if and only if allocator color it
to an_unusable_color.
Denis.
PS: How about delayed remat patch ?
PPS: I have founded a bug in ra-rewrite.c:web_class_spill_ref, but this
fix can't help to compile jcf-dump.i. Try it.
I'm continue to debug jcf-dump.i.
diff -c3p /root/d/cvs/new-regalloc-branch/gcc/ra-rewrite.c.\~1.1.2.7.\~ /root/d/cvs/new-regalloc-branch/gcc/ra-rewrite.c
*** /root/d/cvs/new-regalloc-branch/gcc/ra-rewrite.c.~1.1.2.7.~ Wed Jan 8 02:21:41 2003
--- /root/d/cvs/new-regalloc-branch/gcc/ra-rewrite.c Sun Jan 26 12:21:36 2003
*************** web_class_spill_ref (web, ref)
*** 2792,2799 ****
int i, j;
rtx source, target;
struct ref **refs;
rtx reg = gen_reg_rtx (PSEUDO_REGNO_MODE (web->regno));
- rtx def_rtx = NULL;
basic_block bb = BLOCK_FOR_INSN (insn);
for (i = 0, refs = web->uses, num_refs = web->num_uses;
--- 2792,2800 ----
int i, j;
rtx source, target;
struct ref **refs;
+ rtx def_dst;
+ rtx def_src = NULL;
rtx reg = gen_reg_rtx (PSEUDO_REGNO_MODE (web->regno));
basic_block bb = BLOCK_FOR_INSN (insn);
for (i = 0, refs = web->uses, num_refs = web->num_uses;
*************** web_class_spill_ref (web, ref)
*** 2814,2822 ****
ra_validate_change (insn, DF_REF_LOC (refs[j]), source, 1);
if (i == 1) /* This is a def. */
{
! if (def_rtx)
abort ();
! def_rtx = source;
}
}
if (!ra_apply_change_group ())
--- 2815,2824 ----
ra_validate_change (insn, DF_REF_LOC (refs[j]), source, 1);
if (i == 1) /* This is a def. */
{
! if (def_src)
abort ();
! def_src = source;
! def_dst = DF_REF_REG (refs[j]);
}
}
if (!ra_apply_change_group ())
*************** web_class_spill_ref (web, ref)
*** 2826,2832 ****
bitmap_set_bit (ra_modified_insns, INSN_UID (insn));
start_sequence ();
! ra_emit_move_insn (reg, DF_REF_REG (ref));
insns = get_insns ();
end_sequence ();
if (insns)
--- 2828,2834 ----
bitmap_set_bit (ra_modified_insns, INSN_UID (insn));
start_sequence ();
! ra_emit_move_insn (reg, web->orig_x);
insns = get_insns ();
end_sequence ();
if (insns)
*************** web_class_spill_ref (web, ref)
*** 2845,2854 ****
}
}
! if (def_rtx)
{
start_sequence ();
! ra_emit_move_insn (DF_REF_REG (ref), copy_rtx (def_rtx));
insns = get_insns ();
end_sequence ();
if (insns)
--- 2847,2856 ----
}
}
! if (def_src)
{
start_sequence ();
! ra_emit_move_insn (def_dst, copy_rtx (def_src));
insns = get_insns ();
end_sequence ();
if (insns)
From: Michael Matz <matz@suse.de>
Subject: Re: Example of delayed remat
To: Denis Chertykov <denisc@overta.ru>
Date: Wed, 29 Jan 2003 20:30:48 +0100 (CET)
Hi,
better late than never:
On 26 Jan 2003, Denis Chertykov wrote:
> > You are right in some way, that stack webs are like webs with a weaker
> > coloring possibility. But they also differ in other ways, and because of
> > that are much more like colorizable stack slots.
>
> Only one thing disturb me.
> IE:
>
> i1 use web1 (GENERAL_REGS)
> ...
> i2 use web1 (CREG)
> ...
> i3 use web1 (GENERAL_REGS)
>
> We can't colorize web1 only because web1 require too constrained
> class.
That's correct.
> Spill web1:
>
> i11 def web10 use web1'
> i1 use web10 (GENERAL_REGS)
> ...
> i12 def web11 use web1'
> i2 use web11 (CREG)
> ...
> i13 def web12 use web1'
> i3 use web12 (GENERAL_REGS)
>
> I don't want to think that web1' already in stack.
> I want to think that web1' will be colorized as any other web.
> Even more. I want to think that web1' even isn't weak colorable.
I agree with you. We have to look at the reason for spilling in some way,
i.e. if the reason was, that it's too constrained, or that simply the
register pressure was too high. For the second case I _do_ want to see
the stack web as funny stack slot.
For the first case though, I agree with you, that in some way not web1
should be thought of as spilled, but instead changed in some way to make
it colorable more easy. You already have written code to "spill" webs, if
their class is too constrained (I think you only spill it, if the classes
don't intersect at all, but it could be extended, if the resulting class
is too small, for some definition of too small). In a way I don't see
this as spilling web1, and I would like we would produce this code:
i1 use web1 (GENERAL_REGS)
...
i12 def web11 use web1
i2 use web11 (CREG)
...
i3 use web1 (GENERAL_REGS)
i.e. only insert small webs at the constraining insns, instead of spilling
it everywhere. This then would lead to web1 _not_ regarded as spilled,
and being a stack slot, but instead of having one use of it moved to a
place, where it isn't that much constrained anymore.
> > Ohh, OK. Then let me rephrase it: The exact reasons you introduced
> > sooner stack slot assignments were:
> > 1) to know stack size for register elimination
> > 2) to know displacements in insns which use stack slots
> > 3) to be able to get rid of the small webs
> > Are there more reasons to use stack slots so soon? All of these reasons
> > are indeed valid to use stack slots instead stack pseudos.
>
> 4) to treat stack pseudos as a simple webs with possibility to be
> converted to stack slots.
How is that a reason to make them stack slots sooner? Either you have
stack pseudos or stack slots. 4) would be a reason to _not_ make stack
slots too soon, because in this case we have more stack pseudos, which we
can handle like simple webs, with possibility to be converted to stack
slots.
> > > > assignments incorrectly. Just to collect it in one sentence: _I_
> > > > think, that you do that only to be able to delete the small webs.
> > >
> > > Not only.
> > > 1. I want to know size of stack slots for register elimination.
> > > (It is a separate question with many problems)
> >
> > Yes, this I understand now.
> >
> > > 2. Your usage of validate_change over-constrain a spill slot web.
> >
> > In which way?
>
> i1 use web1 (GENERAL_REGS)
> ...
> i2 use web1 (CREG)
>
> after spill.
>
> i11 def web10 use web1'
> i1 use web10 (GENERAL_REGS)
> ...
> # I prefer this: i12 def web11 use web1'
> # and this: i2 use web11 (CREG)
> # You prefer the following insn:
> i2 use web1' (CREG)
>
> In your case the possibility of web'1 colorization will be
> dramatically low.
> Is this right ? (May be I miss some your think.)
This is indeed correct. But it's also true, that the stack pseudo is not
more constrained than the spilled web, for which it was created. One
could think about the possibility to only in-place change the use to the
stack pseudo, if that reference is not constraining too much.
Alternatively my version of insn i2 above could be converted to your form
(using the intermediate web11) at the begin of the next coloring pass
using the machinery which you already partly wrote (for "pre-spilling" too
constrained webs).
> > Note that over-constraining a stack pseudo web is actually no
> > problem at all. If will be a stack slot if we can't color it.
>
> Significantly better to colorize stack pseudo than to think that it's
> already a semi 'stack slot'
I think it's not that easy. If the spilling was because of too
constrained-ness then you are right obviously. But if the spilling was
because of too high reg pressure, it's anyway unlikely to be possible to
color the stack pseudo, but it _could_ happen. It's just that I think one
should not put too much intelligence to color such stack pseudos (I mean
the ones resulting from too high pressure), because it's unlikely to help
anyway. It's different with stack pseudos resulting from other types of
spilling, but I also think those other webs should not be stack pseudos
from the beginning. I.e. too constrained webs should be handled
differently (before actual coloring) from webs which are spilled for the
normal reason (too high pressure). Hmm, that's not yet thought through
from me, it's more like developing an idea while writing this mail ;-)
> > allocating / using stack slots for "uncolored" stack pseudos (which is an
> > optimization of my old ever-use-stack-slots method), and my way of never
> > using stack-slots until the end. Would that work for you?
>
> Yes.
>
> But, please compare a generated code quality with my patch and
> without it.
Which patch do you mean now? The delayed remat patch? I already aggreed
that this is probably a good idea. I was only talking about your former
(already committed) patch introducing stack slots earlier than before.
> I just want to divide spill vars to two layers:
> 1) spill pseudos for coloring; (I have used here the spill-slot-web)
> 2) spill pseudos for stack. (I have used here the real stack slots)
Ohh, I believe that's a similar idea to mine above, right? To divide
those spill places according to the reason why they were created. This
division can also be done without really creating (MEM) expression for the
stack slots. A simple flag in the web would be enough. (The reason why
you need sooner stack slots, like knowing the displacements are of course
still valid, so this is likely only helping on machines with fairly weak
constraints regarding addressing modes).
> The spill vars moved form 1) to 2) if and only if allocator color it
> to an_unusable_color.
>
> Denis.
> PS: How about delayed remat patch ?
Please check in that one.
Had you have a chance to look into the jcf-dump.i failure?
Ciao,
Michael.
From: Denis Chertykov <denisc@overta.ru>
Subject: Re: Example of delayed remat
To: Michael Matz <matz@suse.de>
Cc: Denis Chertykov <denisc@overta.ru>
Date: 01 Feb 2003 17:11:41 +0300
Michael Matz <matz@suse.de> writes:
[...]
>
> Had you have a chance to look into the jcf-dump.i failure?
This fix will be a part of 'dalayed remat' patch which I want to
commit today after testing and bootstrapping.
The fix:
Index: ra-build.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/ra-build.c,v
retrieving revision 1.1.2.7
diff -c -3 -p -r1.1.2.7 ra-build.c
*** ra-build.c 8 Jan 2003 00:51:53 -0000 1.1.2.7
--- ra-build.c 1 Feb 2003 14:06:21 -0000
*************** web_class (web)
*** 3694,3699 ****
--- 3694,3703 ----
{
blocks = 0;
c = class;
+ /* FIXME denisc@overta.ru
+ I have disabled the `web_class_spill_ref'. */
+ class = ALL_REGS;
+ break;
}
if (!blocks)
{
Index: ra-rewrite.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/ra-rewrite.c,v
retrieving revision 1.1.2.7
diff -c -3 -p -r1.1.2.7 ra-rewrite.c
*** ra-rewrite.c 8 Jan 2003 00:51:53 -0000 1.1.2.7
--- ra-rewrite.c 1 Feb 2003 14:07:13 -0000
*************** web_class_spill_ref (web, ref)
*** 2792,2799 ****
int i, j;
rtx source, target;
struct ref **refs;
rtx reg = gen_reg_rtx (PSEUDO_REGNO_MODE (web->regno));
- rtx def_rtx = NULL;
basic_block bb = BLOCK_FOR_INSN (insn);
for (i = 0, refs = web->uses, num_refs = web->num_uses;
--- 2792,2800 ----
int i, j;
rtx source, target;
struct ref **refs;
+ rtx def_dst;
+ rtx def_src = NULL;
rtx reg = gen_reg_rtx (PSEUDO_REGNO_MODE (web->regno));
basic_block bb = BLOCK_FOR_INSN (insn);
for (i = 0, refs = web->uses, num_refs = web->num_uses;
*************** web_class_spill_ref (web, ref)
*** 2814,2822 ****
ra_validate_change (insn, DF_REF_LOC (refs[j]), source, 1);
if (i == 1) /* This is a def. */
{
! if (def_rtx)
abort ();
! def_rtx = source;
}
}
if (!ra_apply_change_group ())
--- 2815,2824 ----
ra_validate_change (insn, DF_REF_LOC (refs[j]), source, 1);
if (i == 1) /* This is a def. */
{
! if (def_src)
abort ();
! def_src = source;
! def_dst = DF_REF_REG (refs[j]);
}
}
if (!ra_apply_change_group ())
*************** web_class_spill_ref (web, ref)
*** 2826,2832 ****
bitmap_set_bit (ra_modified_insns, INSN_UID (insn));
start_sequence ();
! ra_emit_move_insn (reg, DF_REF_REG (ref));
insns = get_insns ();
end_sequence ();
if (insns)
--- 2828,2834 ----
bitmap_set_bit (ra_modified_insns, INSN_UID (insn));
start_sequence ();
! ra_emit_move_insn (reg, web->orig_x);
insns = get_insns ();
end_sequence ();
if (insns)
*************** web_class_spill_ref (web, ref)
*** 2845,2854 ****
}
}
! if (def_rtx)
{
start_sequence ();
! ra_emit_move_insn (DF_REF_REG (ref), copy_rtx (def_rtx));
insns = get_insns ();
end_sequence ();
if (insns)
--- 2847,2856 ----
}
}
! if (def_src)
{
start_sequence ();
! ra_emit_move_insn (def_dst, copy_rtx (def_src));
insns = get_insns ();
end_sequence ();
if (insns)
From: Denis Chertykov <denisc@overta.ru>
Subject: Re: Example of delayed remat
To: Michael Matz <matz@suse.de>
Cc: Denis Chertykov <denisc@overta.ru>
Date: 02 Feb 2003 11:54:24 +0300
Michael Matz <matz@suse.de> writes:
> Hi,
>
>
> better late than never:
>
> On 26 Jan 2003, Denis Chertykov wrote:
>
> > > You are right in some way, that stack webs are like webs with a weaker
> > > coloring possibility. But they also differ in other ways, and because of
> > > that are much more like colorizable stack slots.
> >
> > Only one thing disturb me.
> > IE:
> >
> > i1 use web1 (GENERAL_REGS)
> > ...
> > i2 use web1 (CREG)
> > ...
> > i3 use web1 (GENERAL_REGS)
> >
> > We can't colorize web1 only because web1 require too constrained
> > class.
>
> That's correct.
>
> > Spill web1:
> >
> > i11 def web10 use web1'
> > i1 use web10 (GENERAL_REGS)
> > ...
> > i12 def web11 use web1'
> > i2 use web11 (CREG)
> > ...
> > i13 def web12 use web1'
> > i3 use web12 (GENERAL_REGS)
> >
> > I don't want to think that web1' already in stack.
> > I want to think that web1' will be colorized as any other web.
> > Even more. I want to think that web1' even isn't weak colorable.
>
> I agree with you. We have to look at the reason for spilling in some way,
> i.e. if the reason was, that it's too constrained, or that simply the
> register pressure was too high. For the second case I _do_ want to see
> the stack web as funny stack slot.
>
> For the first case though, I agree with you, that in some way not web1
> should be thought of as spilled, but instead changed in some way to make
> it colorable more easy. You already have written code to "spill" webs, if
> their class is too constrained (I think you only spill it, if the classes
> don't intersect at all, but it could be extended, if the resulting class
> is too small, for some definition of too small). In a way I don't see
> this as spilling web1, and I would like we would produce this code:
>
> i1 use web1 (GENERAL_REGS)
> ...
> i12 def web11 use web1
> i2 use web11 (CREG)
> ...
> i3 use web1 (GENERAL_REGS)
>
> i.e. only insert small webs at the constraining insns, instead of spilling
> it everywhere. This then would lead to web1 _not_ regarded as spilled,
> and being a stack slot, but instead of having one use of it moved to a
> place, where it isn't that much constrained anymore.
>
> > > Ohh, OK. Then let me rephrase it: The exact reasons you introduced
> > > sooner stack slot assignments were:
> > > 1) to know stack size for register elimination
> > > 2) to know displacements in insns which use stack slots
> > > 3) to be able to get rid of the small webs
> > > Are there more reasons to use stack slots so soon? All of these reasons
> > > are indeed valid to use stack slots instead stack pseudos.
> >
> > 4) to treat stack pseudos as a simple webs with possibility to be
> > converted to stack slots.
>
> How is that a reason to make them stack slots sooner? Either you have
> stack pseudos or stack slots. 4) would be a reason to _not_ make stack
> slots too soon, because in this case we have more stack pseudos, which we
> can handle like simple webs, with possibility to be converted to stack
> slots.
>
> > > > > assignments incorrectly. Just to collect it in one sentence: _I_
> > > > > think, that you do that only to be able to delete the small webs.
> > > >
> > > > Not only.
> > > > 1. I want to know size of stack slots for register elimination.
> > > > (It is a separate question with many problems)
> > >
> > > Yes, this I understand now.
> > >
> > > > 2. Your usage of validate_change over-constrain a spill slot web.
> > >
> > > In which way?
> >
> > i1 use web1 (GENERAL_REGS)
> > ...
> > i2 use web1 (CREG)
> >
> > after spill.
> >
> > i11 def web10 use web1'
> > i1 use web10 (GENERAL_REGS)
> > ...
> > # I prefer this: i12 def web11 use web1'
> > # and this: i2 use web11 (CREG)
> > # You prefer the following insn:
> > i2 use web1' (CREG)
> >
> > In your case the possibility of web'1 colorization will be
> > dramatically low.
> > Is this right ? (May be I miss some your think.)
>
> This is indeed correct. But it's also true, that the stack pseudo is not
> more constrained than the spilled web, for which it was created. One
> could think about the possibility to only in-place change the use to the
> stack pseudo, if that reference is not constraining too much.
> Alternatively my version of insn i2 above could be converted to your form
> (using the intermediate web11) at the begin of the next coloring pass
> using the machinery which you already partly wrote (for "pre-spilling" too
> constrained webs).
>
> > > Note that over-constraining a stack pseudo web is actually no
> > > problem at all. If will be a stack slot if we can't color it.
> >
> > Significantly better to colorize stack pseudo than to think that it's
> > already a semi 'stack slot'
>
> I think it's not that easy. If the spilling was because of too
> constrained-ness then you are right obviously. But if the spilling was
> because of too high reg pressure, it's anyway unlikely to be possible to
> color the stack pseudo, but it _could_ happen. It's just that I think one
> should not put too much intelligence to color such stack pseudos (I mean
> the ones resulting from too high pressure), because it's unlikely to help
> anyway. It's different with stack pseudos resulting from other types of
> spilling, but I also think those other webs should not be stack pseudos
> from the beginning. I.e. too constrained webs should be handled
> differently (before actual coloring) from webs which are spilled for the
> normal reason (too high pressure). Hmm, that's not yet thought through
> from me, it's more like developing an idea while writing this mail ;-)
Some weeks ago I was agree with you :) Just look to garbage in
'allocate_spill_web', especially to commented call to GO_IF_HARD_REG_EQUAL
It was my experiments with different spill types.
The main idea was: if we try to spill any web which require
GENERAL_REGS then it is a 'too high pressure' spill else it is a
constrained reg-class spill.
I got bad results with this idea and commented it (I don't delete it
specially for you).
Your previous thought:
> We have to look at the reason for spilling in some way,
> i.e. if the reason was, that it's too constrained, or that simply the
> register pressure was too high.
We can got 'too high pressure' in constrained class.
i.e. (For your favorite ix86 ;)
We can got 'too high pressure' in INDEX_REGS. In this case spilling to
GENERAL_REGS can be more effective than spilling to memory, but
pre-spilling (i.e spilling by web_class) impossible because INDEX_REGS
almost equal to GENERAL_REGS.
I call this situation "'too high pressure' in INDEX_REGS" because
INDEX_REGS has a many registers. It's not a CREG or AD_REGS - which I
call 'constrained'.
I'm agree with you, we must have more smart pre-spilling, but IR
spilling MUST work without it. Pre-spilling only an optimization.
(It's like reload. The reload pass can work almost without any other
pass and can generate ugly code. ;)
> > PS: How about delayed remat patch ?
>
> Please check in that one.
Done.
Denis.
PS: Are you compared code with my 'sooner stack slot assignments' and
before it ?
From: Michael Matz <matz@suse.de>
Subject: Re: Example of delayed remat
To: Denis Chertykov <denisc@overta.ru>
Date: Mon, 3 Feb 2003 20:50:04 +0100 (CET)
Hi Denis,
On 2 Feb 2003, Denis Chertykov wrote:
> Some weeks ago I was agree with you :) Just look to garbage in
> 'allocate_spill_web', especially to commented call to GO_IF_HARD_REG_EQUAL
;)
> It was my experiments with different spill types.
> The main idea was: if we try to spill any web which require
> GENERAL_REGS then it is a 'too high pressure' spill else it is a
> constrained reg-class spill.
> I got bad results with this idea and commented it (I don't delete it
> specially for you).
Yes, we might want to experiment with better heuristics. When I thought
about that I often thought, that something like this would work: If a
certain reference takes away twice as many registers, as there were if we
would not include that reference, than it should be handled specially.
With some cut-off numbers, so we don't spill already, if we go from
ALL_REGS to GENERAL_REGS. Additionally something like if the final
registers are only two registers more than the web needs (and before that
reference it were more registers), then this is also a constraining ref,
so should be spilled away somehow. Of course that all are heuristics, but
the whole allocator already is just a big heuristic ;-)
> Your previous thought:
> > We have to look at the reason for spilling in some way,
> > i.e. if the reason was, that it's too constrained, or that simply the
> > register pressure was too high.
>
> We can got 'too high pressure' in constrained class.
> i.e. (For your favorite ix86 ;)
> We can got 'too high pressure' in INDEX_REGS. In this case spilling to
> GENERAL_REGS can be more effective than spilling to memory,
Note really. On x86 GENERAL_REGS is equal to INDEX_REGS plus %esp (stack
pointer) (and some other fixed regs). That means, that if we have to
spill in INDEX_REGS, we also will need to spill in GENERAL_REGS. In this
sense GENERAL_REGS has no more free regs than INDEX_REGS. But I see your
general problem: If you have a wide regset, and a slightly wider one, this
should not be spilled directly to memory, but first to the wider pseudo.
In difference to constraining refs, where the difference between current
regset, and the one after adding the ref is too high.
Hmm. I have an idea, we can introduce an array which says for each
regclass, how a web of this class should be spilled. Either it will give
a wider regclass, or NO_REGS, in which case we spill to memory. This can
be initialized from the different rtl-cost functions (if class-mem moves
are cheaper than class-class moves for instance). So that we somewhere
then would use code like:
if (preferred_spill_class[web->current_class] == NO_REGS)
spill_to_memory ()
else
spill_to_pseudo (preferred_spill_class[web->current_class]);
I still want the ability, that the spill_to_memory() only introduces a
pseudo (i.e. what was formerly a real stack-pseudo, i.e. nearly equivalent
to a stack slot), for machines where this is worthwhile (i.e. not having
strict constraints on offsets and so on). And spill_to_pseudo () would
also create a new pseudo, but one which isn't like a stack slot, but
simply a new web, which is less constrained than the original one.
> I'm agree with you, we must have more smart pre-spilling, but IR
> spilling MUST work without it. Pre-spilling only an optimization.
Well, it will work without pre-spilling, but probably also emit
non-optimal code.
> (It's like reload. The reload pass can work almost without any other
> pass and can generate ugly code. ;)
;-)
> Done.
Thanks. You forgot to also update the ChangeLog.RA file (I can see the
log only in cvs log ...).
> PS: Are you compared code with my 'sooner stack slot assignments' and
> before it ?
Not yet. If was still waiting for the bootstrap fix. Currently we have
many things to do for the compiler of our next distribution release ;(
But I'm slowly getting my stuff merged into the branch (it already is able
to compile SPEC2000 again, but bootstrap is failing (it's a failure in the
merge, not in the branch)).
Ciao,
Michael.
From: Denis Chertykov <denisc@overta.ru>
Subject: Re: Example of delayed remat
To: Michael Matz <matz@suse.de>
Cc: Denis Chertykov <denisc@overta.ru>
Date: 04 Feb 2003 00:43:04 +0300
Michael Matz <matz@suse.de> writes:
[...]
>
> Thanks. You forgot to also update the ChangeLog.RA file (I can see the
> log only in cvs log ...).
Oops !
Fixed.
> > PS: Are you compared code with my 'sooner stack slot assignments' and
> > before it ?
>
> Not yet.
Again about 'sooner stack slot assignments':
Michael ! Are you sure that first three (3) conditions (`if'
expressions) in `coalesce_spill_slot' equal to your conditions in
emit_loads and insert_stores ?
I have asked because my experiments shows that coalesce_spill_slot
coalesce *ALL* small webs with spilled web (supports full UNDO of moves
inserted by IR spilling), but your previous method wasn't have so good
results. (Your have used web->last_use_insn).
(I may be wrong because I have a lot of experiments and I can forgot
which result was related to which experiment.)
I have asked the question because I don't sure that this things are
equal. I mean coalesce_spill_slot and your validate_change in
emit_loads and insert_stores.
May be better to clarify this ?
I warring about this because I think that we have a some misunderstandings.
> If was still waiting for the bootstrap fix. Currently we have many
Who is `we' ?
Are you plan to merge new allocator to 3.4 mainstream ?
> things to do for the compiler of our next distribution release ;(
> But I'm slowly getting my stuff merged into the branch (it already
> is able to compile SPEC2000 again, but bootstrap is failing (it's a
> failure in the merge, not in the branch)).
Can I help ?
Denis.
PS: I have started to work on regclass.c which I want to rewrite to work with
webs. Yes, it will be another piece of garbage in pre-reload.c (Don't cry).
I want to try the following scheme:
1) build webs;
2) fill web->class*, and additional field web->pref_class** by method
similar to regclass.c. It's a lot of approximations;
3) pre-reload all insns for satisfying constraints in all insns and
classes which was recorded in web->class;
4) Clean and true colorization.
* web->class a widest possible class;
** web->pref_class a class with lowest cost.
From: Michael Matz <matz@suse.de>
Subject: Re: Example of delayed remat
To: Denis Chertykov <denisc@overta.ru>
Date: Tue, 4 Feb 2003 01:39:43 +0100 (CET)
Hi,
On 4 Feb 2003, Denis Chertykov wrote:
> Again about 'sooner stack slot assignments':
>
> Michael ! Are you sure that first three (3) conditions (`if'
> expressions) in `coalesce_spill_slot' equal to your conditions in
> emit_loads and insert_stores ?
I'm confused. coalesce_spill_slot() is your function, so you must know
;-)
> I have asked because my experiments shows that coalesce_spill_slot
> coalesce *ALL* small webs with spilled web (supports full UNDO of moves
> inserted by IR spilling),
just a sidenote: I got sometimes worse code, when I tried to coalesce
spill pseudos and normal webs (and sometimes better), therefore I
deactiveated such coalescing later alltogether in my version. (My
reasoning was again similar to "ohh, we anyway needed to spill here, so
there is not much use to be clever, and try to color/coalesce stuff, as
ww will be unable to color it anyway, and would break the coalesce
anyway". As we now know that reasoning doesn't give the full picture).
> but your previous method wasn't have so good
> results. (Your have used web->last_use_insn).
The web->last_use_insn is necessary for the following:
suppose you have:
i1 use W1
...
i2 use W1
and W1 is spilled, and that nothing dies between i2 and i1. I used
validate_change to change the references of W1 in-place to the stack
pseudos (say W1'). The problem is, in which insn should this be done? I
have to remember them. And to not waste too much memory, I only
remembered the last instruction, and only changed in-place, if that was
the _only_ one (per spilling, i.e. the web itself can of course have had
more using insns). That's why in the above situation would be changed
into:
load W1 <== W1'
i1 use W1
...
i2 use W2
Your current method of coalesce_spill_slot might be better, that's
possible, especially if you can handle the above situation better (i.e. if
you can change both uses in-place).
> I have asked the question because I don't sure that this things are
> equal. I mean coalesce_spill_slot and your validate_change in
> emit_loads and insert_stores.
>
> May be better to clarify this ?
That would be good. One big difference of course is, that
coalesce_spill_slot() is only called with (MEM) expressions, whereas my
methods used web->stack_slot, which usually was a pseudo. But these are
more or less equivalent.
> I warring about this because I think that we have a some misunderstandings.
Hmm, where? In the above mentioned conditions? Could you give more
details, in which conditions might be worrysome?
> > If was still waiting for the bootstrap fix. Currently we have many
>
> Who is `we' ?
SuSE ;)
> Are you plan to merge new allocator to 3.4 mainstream ?
Probably. But not the next two/three weeks or so. The branch needs to
become stable again. Currently it still doesn't bootstrap java IIRC, even
with you latest changes. It gets further than before, but compiling
libjava has a problem:
libjava/java/lang/Runtime.java: In method `java.lang.Runtime.exit(int)':
libjava/java/lang/Runtime.java:280: internal compiler error:
in fixup_abnormal_edges, at reload1.c:9487
And I want to at least integrate the splitting pass.
> > things to do for the compiler of our next distribution release ;(
> > But I'm slowly getting my stuff merged into the branch (it already
> > is able to compile SPEC2000 again, but bootstrap is failing (it's a
> > failure in the merge, not in the branch)).
>
> Can I help ?
Not with the merge directly I think. But if you want, I can give you the
current diff, if you want to look at some things. Currently it seems I
can't write to my ftp directory. Shall I send it to you per mail (52KB)?
I'll anyway only do it tomorrow.
> PS: I have started to work on regclass.c which I want to rewrite to work with
> webs.
Ugh. regclass.c is so slow. I had hoped we can avoid it to a large
parts. And at least for x86, x86-64 and probably other regular
architectures (i.e. not too many funny insn constraints), it already seems
to work quite well without it.
> Yes, it will be another piece of garbage in pre-reload.c (Don't cry).
;-) Somewhen, somewhen ... we will clean this up ;)
> I want to try the following scheme:
> 1) build webs;
> 2) fill web->class*, and additional field web->pref_class** by method
> similar to regclass.c. It's a lot of approximations;
> 3) pre-reload all insns for satisfying constraints in all insns and
> classes which was recorded in web->class;
I see.
> 4) Clean and true colorization.
That would be cool, yes ;)
Ciao,
Michael.
From: Denis Chertykov <denisc@overta.ru>
Subject: Re: Example of delayed remat
To: Michael Matz <matz@suse.de>
Cc: Denis Chertykov <denisc@overta.ru>
Date: 04 Feb 2003 22:28:19 +0300
Michael Matz <matz@suse.de> writes:
[...]
>
> Hmm, where? In the above mentioned conditions? Could you give more
> details, in which conditions might be worrysome?
Probably nowhere. I'm agree to leave this theme.
>
> > > If was still waiting for the bootstrap fix. Currently we have many
> >
> > Who is `we' ?
>
> SuSE ;)
>
> > Are you plan to merge new allocator to 3.4 mainstream ?
>
> Probably. But not the next two/three weeks or so. The branch needs to
> become stable again. Currently it still doesn't bootstrap java IIRC, even
> with you latest changes. It gets further than before, but compiling
> libjava has a problem:
> libjava/java/lang/Runtime.java: In method `java.lang.Runtime.exit(int)':
> libjava/java/lang/Runtime.java:280: internal compiler error:
> in fixup_abnormal_edges, at reload1.c:9487
Yes, I got this error today. I will try to fix it.
>
> And I want to at least integrate the splitting pass.
>
> > > things to do for the compiler of our next distribution release ;(
> > > But I'm slowly getting my stuff merged into the branch (it already
> > > is able to compile SPEC2000 again, but bootstrap is failing (it's a
> > > failure in the merge, not in the branch)).
> >
> > Can I help ?
>
> Not with the merge directly I think. But if you want, I can give you the
> current diff, if you want to look at some things.
Yes. I want to look at diff.
> Currently it seems I can't write to my ftp directory. Shall I send
> it to you per mail (52KB)?
Yes.
> > PS: I have started to work on regclass.c which I want to rewrite to work with
> > webs.
>
> Ugh. regclass.c is so slow. I had hoped we can avoid it to a large
> parts. And at least for x86, x86-64 and probably other regular
> architectures (i.e. not too many funny insn constraints), it already seems
> to work quite well without it.
How to avoid it ? (use pre-reload as used today)
> > Yes, it will be another piece of garbage in pre-reload.c (Don't
> > cry).
>
> ;-) Somewhen, somewhen ... we will clean this up ;)
I leave pre-reload because I'm not sure that it will a part of
allocator. May be allocator will successfully work only with 2) from
my following scheme.
IMHO: pre-reload a very experimental part.
IMHO: it's a successful experiment, but not ideal and not complete.
Are you interested in reloads count after allocation ?
(I mean reloads with type != optional)
>
> > I want to try the following scheme:
> > 1) build webs;
> > 2) fill web->class*, and additional field web->pref_class** by method
> > similar to regclass.c. It's a lot of approximations;
> > 3) pre-reload all insns for satisfying constraints in all insns and
> > classes which was recorded in web->class;
>
> I see.
>
> > 4) Clean and true colorization.
>
> That would be cool, yes ;)
Don't hope it was a joke ;-)
Denis.
From: Michael Matz <matz@suse.de>
Subject: Re: Example of delayed remat
To: Denis Chertykov <denisc@overta.ru>
Date: Tue, 11 Feb 2003 17:28:35 +0100 (CET)
Hi Denis,
As usual, I was very busy to get our version of gcc 3.3 to work correctly
for our distribution, so the delay. That's why I'm also not further with
merging.
On 4 Feb 2003, Denis Chertykov wrote:
> > with you latest changes. It gets further than before, but compiling
> > libjava has a problem:
> > libjava/java/lang/Runtime.java: In method `java.lang.Runtime.exit(int)':
> > libjava/java/lang/Runtime.java:280: internal compiler error:
> > in fixup_abnormal_edges, at reload1.c:9487
>
> Yes, I got this error today. I will try to fix it.
Thank's. (Just one side-note, the IBM and Apple people seem to want to
use the reg-alloc branch too, but want to start from a bootstrapping
version first).
> > Currently it seems I can't write to my ftp directory. Shall I send
> > it to you per mail (52KB)?
>
> Yes.
Attached. It's now nearly two weeks old ;( But at least it should give
you an idea, what I've done to different parts of gcc. When applied to
the branch it doesn't work at all, though, so be careful ;-)
> > > PS: I have started to work on regclass.c which I want to rewrite to work with
> > > webs.
> >
> > Ugh. regclass.c is so slow. I had hoped we can avoid it to a large
> > parts. And at least for x86, x86-64 and probably other regular
> > architectures (i.e. not too many funny insn constraints), it already seems
> > to work quite well without it.
>
> How to avoid it ? (use pre-reload as used today)
Yes this I meant. With a nice pre-reload regclass isn't needed anymore
(well, what is done by regclass, would be done by pre-reload, and partly
is already).
> > ;-) Somewhen, somewhen ... we will clean this up ;)
>
> I leave pre-reload because I'm not sure that it will a part of
> allocator. May be allocator will successfully work only with 2) from
> my following scheme.
>
> IMHO: pre-reload a very experimental part.
> IMHO: it's a successful experiment, but not ideal and not complete.
I agree definitely, that it is successfull. The quality of the code
improves quite much with it (not surprisingly).
> Are you interested in reloads count after allocation ?
> (I mean reloads with type != optional)
That's also an interesting measure, yes. If you look at my patch, there
is a new dump pass in toplev.c (print_costs_bb) , which tries to generally
count the cost of all load and store insns which point into the frame.
There we could also print the number of reloads, we just need to remember
them from reload().
Ciao,
Michael.
[2. application/x-gunzip; to-denis.diff.gz]...