This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [RFA] The Integrated Register Allocator

From: Kenneth Zadeck <zadeck at naturalbridge dot com>
To: Vladimir Makarov <vmakarov at redhat dot com>
Cc: Kenneth dot Zadeck at NaturalBridge dot com, gcc-patches at gcc dot gnu dot org
Date: Tue, 01 Apr 2008 21:23:40 -0400
Subject: Re: [RFA] The Integrated Register Allocator
References: <87y77x9vd1.fsf@moria.site> <47F2D25D.9050300@redhat.com>

Vladimir Makarov wrote:

Kenneth Zadeck wrote:
I have made a cursory look over the code, and there are some things
that I like about the new allocator and some things that cause me
great concern.
The thing that I like about it is that some of the parts of it are true steps forward. The thing that bothers me is that there is not just some backward steps but there appear to be a lot of sideways steps. When we committed the actual dataflow branch, the patch was about twice as large as this one, but to a very large extent, it was a finished project. Certainly there were loose ends and fixes on minor ports, but the dataflow branch essentially replaced the existing dataflow, and there was nothing (or at least very little) left of the old framework.
IMHO that was a mistake because now we can not see what the dataflow
framework giving in terms of performance, compilation speed and so on.
Although I understand that was very hard to keep the old code with the
new one.

I disagree completely. The purpose of the dataflow branch was most correctness and sanity. We essentially had the case where every pass was doing something different and there were only informal and generally violated agreements as to what it meant for correctness. It was virtually impossible to make progress on improving the backend with such a mismash of crap.

My perspective on gcc is perhaps different from others in the community. My first concern is correctness and my second concern is maintainability. The rest of my concerns always follow these two. I looked as at the dataflow branch as a way to address a deadly cycle in gcc development at the rtl level, the analysis was neither very good nor was it correct, but it did not matter because the transformations were not powerful enough to take advantage of good information.

The only way to break this cycle was to get the analysis correct and good and with time, people would have the courage to start doing aggressive transformations. The dataflow provided a set of interfaces and invariants that the entire back end now must respect but at the same time time rely on.

Here we have a patch that inserts a new allocator, but the old one has
been left "largely" in place.   While the dataflow branch did not work
on ever supported target when it was committed, we had at least tested
it ourselves or had it tested on almost every platform (except the
vax, knuth's machine and the mc6810).   Furthermore there was a
commitment to the steering committee and the community that we would
either fix ourselves or at least help with the problems that arose on
all of the active platforms and by the time that stage 2 closed, we
had pretty much accomplished this.


Ken, As i wrote my wish is to remove the old RA.  But we should be
realists.  It will not happen soon.

The differences between dataflow framework and IRA is that one
infrastructure (and by itself should not improve code performance) and
another is an optimization (most machine dependent one).  It is easy
to switch IRA on for other targets by adding a macro in
machine-dependent file.  But to make it right, a lot of benchmarking
and tuning should be done because it is an optimization which should
work at least not worse than the old version.

You did dataflow framework benchmarking for a few targets.  That is
practically the same targets for which IRA is currently implemented.
You did not do benchmarking (just testing) for all GCC targets which
is probably ok because it is a framework.

Before the code went in, we knew where we stood on almost every target that gcc supported. We were working with either the maintainers or at least people of interest on almost every platform. We were in good shape on the targets that you were on, but we understood pretty well how close we were (or in the case of the pa risc, how far we were) from closure. Our only surprise was the alpha. That gave us the confidence to commit without a backup plan.

My point is that it is good to have the old RA for sometime at least
for comparison purposes until IRA matures.  But as I said, imho the
goal should be removing the old register allocator.

How fast it will happen depends on other people (mainly other port
maintainers).  Again I am ready to help anyone.

I am especially concerned that this patch is setting up for something
that is going to make the continued maintenance of reload, which is at
the core of the rtl level of the compiler, much more difficult because
there are now two completely separate code paths thru it that need to
be kept working.  I understand that gcc has passes that are optional
for individual targets and that this is what is necessary, but none of
them are as intertwined in the core rtl machinery as this patch.  If
the plan is that by the end of this release that we are going to be
there, then this particular objection is moot.


The changes to the reload is very small (less than hundred lines).  It
is still the same reload.  The additional code is just impelementation
of better communication between the reload and IRA.

I understand that people do not like the reload and want to remove (or
rewrite) it and probably it should be done sometime.  But what is your
proposal here.  Spend a few years more to remove the reload when there
something which aready impoves the RA.

I did not expect that you would have replaced reload, though i would have been the first to stand up an applaud had you done so. What i am worried about is that reload is a strange beast that is prone to hard to diagnose errors. During the transition period, which i gather may take several years, we now have two register allocators that are generating problems for reload to solve.

Again, in my book it comes down to correctness and maintainability.

First of all, I already tried this and spent about year working on
this (please see yara-branch).  I know how hard it is.  There were
also other unsuccesfull attempts of other people too.  Second, I am
not sure that the final result will be better with the performance
point of view.

I don't think such approach "all or nothing" (new RA without the reload
or nothing) will work.

On the other hand, if the plan is to have two register allocators for
the foreseeable future, this patch raises several troubling issues in
addition to reload: for instance, the new allocator seems to use sets
of registers rather than register classes.  I think that this is an
excellent idea.  However, if we are going to live with both allocators
for a long time, then we should make global use the same data
structure.   This will allow the continued evolution or removal of
passes that feed into the ra stack.


I am disagree.  I don't think we should spend our time on the old
register allocator if the final goal is to remove it.  I worked on the
improvement of the old register allocator too.  The result of my work
was realizing necessity of a new register allocator if we want to
achieve code performance improvements.  Therefore I started two
projects YARA (a new register allocator without the reload) and more
realistic IRA (a new register allocator with the reload).

My next concern is that the new allocator seems to be a large
regression with respect to building conflict graphs.  Peter Bergner
and my self have spent a lot of time replacing the conflict builder in
global with a modern algorithm that is both efficient and very
precise, and yet this patch essentially regresses back to the
technology used in global before we started.  In general the problem
is that this work seems to have started from where we were three years
ago and there has been minimal work to keep up with the improvements
that have surrounded the register allocator.  While I am certainly for
an allocator that improves the quality of code that is generated,
I wonder how much better it could have been had it integrated the
other improvements rather than simply replaced them.


You are welcome to improve it in IRA if it gives some performance
improvement (in speed, space, or code quality).

As I remember correctly Peter made conflict matrix more compact and
reported no visible improvements of the compilation speed.  IRA is
different from the old register allocator.  It is a regional register
allocator.  There are much more short lived allocnos in IRA.  It means
that the conflict matrix is bigger and more sparse.  Therefore IRA
uses bit vector or array of conflicts (sometime with duplication)
whatever is more profitable for an allocno.  This implementation is
hidden in abstraction.  IRA code just saying iterate on all conflicts
of allocnos or add/check a conflict and so on.  Also I keep conflicts
such way only for allocno pairs of the same register class.  It is
still necessary to have all conflicts for coloring stack slots.  I use
even more compact information for this (allocnos live ranges of
program points where the allocnos live).  Live ranges is also
necessary for fast transformation IRA regional internal representation
(after creation of new pseudo-registers and emitting pseudo-register
shuffles on region borders) into flat form used during the reload.  So
I am affraid that Peter's implementation is not more superior.  If it
was not true, I'd really use Peter's work.

Another improvement was conflict graph building on backward traverse
of the BBs.  It is not so important and principal and can be easy
changed in IRA because it is a very small part of IRA.  I did not do
this because we still have register notes.  You can not remove it
because it is needed in the reload.  If it is removed, as I wrote it
is easily to change in IRA.

One more change was taking subregs into account in building conflict
graph.  But I did not see a report (may be I missed) what kind code
performance improvement it gives.  I don't think it is important after
we have subreg lowering pass which breaks subregs of muti-word pseudos
in different pseudos.  If it is really important and you can show this
we could talk more about implementation of it in IRA.

And the last change which I remember (may be i missed something) was using live information with partial availability (LIVE instead of LR in dataflow terms). Actually it was my idea when I worked on the old register allocator improvement long ago. I should acknowledge that was my mistake, the improvement was a very small and only in one case for SPEC95. It created more problems than it is worth.

GCC is a moving target, and certainly I learned on the dataflow branch that if you do not continually track the changes that are happening in the rest of the compiler and the community, you run into these kinds of issues. I think that there needs to be an explicit plan as to both how and how quickly we are going to seamlessly integrate this technology. The patch as it stands as something to just be bolted onto the side of gcc is not really acceptable.


I realized that you have a good point here.  I think it is worth to
make IRA a default RA for the targets it is currently implemented for.
In this case it will not be on the side.  How fast removing the old RA
will be dependent on other people work.  I am ready to help in this.
I think it is possible to remove the old IRA in the next release.

I would personally love to see a series of patches that do things like
change the entire compiler from register classes to register sets.  I
think that some of the changes that vlad makes are long overdue.  Such
of series of patches would allow the community to move forward with
other changes in the compiler, but to have to live with will become a
pastiche of technologies I believe is going to cause more problems
than it is going to solve.


Sorry, I took this path once 4 years ago when I tried to improve the
old register allocator.  I don't want to make the mistake again.

Some things like better communication with the reload as it done in
IRA is very hard to implement in the old register allocator.  It is
better throw it away and write a new RA.  That is what I've done.

Sorry it is hard to me to understand.  In one place you are concerned
that the reload stays (and removing it is more revolution step) and
now you are proposing to improve the old register allocator and take
more evolution steps.

I am talking in both directions because i do not see a plan. It is certainly not clear to me how long we have to live with both allocators and so i am in reality trying to pin you down to get a plan on when we only have one path to maintain.

I personally dislike the Chow based approach, but if the improvements
stand over what we currently have, then I am willing to let it go.
My primary concern is that the integration is so superficial that it
will be years before we can move forward after this.


Chow's algorithm is simple and good but only as a first step of RA
implementation.  It is worse than Chaitin-Briggs especially than a
regional one.  Even Fred Chow does not use it any more.  If you look
at Pathscale compiler sources where Chow is an architect and also
major implementer of RA for x86/x86_64, he does not use his own
algorithm.


And, Ken, thanks for the email helping to express my position more
clear.

Vlad

Follow-Ups:
- Re: [RFA] The Integrated Register Allocator
  - From: Mark Mitchell
- Re: [RFA] The Integrated Register Allocator
  - From: Eric Botcazou
- Re: [RFA] The Integrated Register Allocator
  - From: Vladimir Makarov

References:
- Re: [RFA] The Integrated Register Allocator
  - From: Kenneth Zadeck
- Re: [RFA] The Integrated Register Allocator
  - From: Vladimir Makarov

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]