


Re: Some thoughts and questions about the data flow infrastructure


Vladimir N. Makarov wrote:
>> Vlad,
>> I think that different people can have different perspectives. 
>> You have been working on improving the register allocation for several
>> years, but very little has come of it because the reload
>> infrastructure does not lend itself to being integrated with modern
>> register allocators.  You have spent several years of work without
>> touching the underlying problem that reload is generally going to
>> defeat almost any effort to get good benefits out of a new register
>> allocator.  I do not want to denigrate your work in any way, but at
>> the end of the day, any new register allocator will be compromised by
>> the existing reload implementation.
>>
>
> Ken, to be exact, I've been working for a bit more than 2 years on
> the register allocator itself.  Perhaps you don't know that I
> attacked exactly the underlying problem - reload.  If you look at
> the YARA branch, it is a register allocator without reload.  And I
> am really worried that there has been so little outcome.  But
> getting rid of reload is such a complex problem that I cannot work
> on it for a few more years.  I need some results too.
>
> Using the experience I've got from the YARA branch, I've created
> another register allocator (the IRA branch) to make it ready for
> gcc-4.4.  IRA still uses reload.  But maybe I have higher standards.
> I don't want to include the code for the sake of inclusion.  The
> code generated by IRA is no worse than that of the current register
> allocator.  The code size is smaller for most platforms.  For some
> platforms I get better generated code (up to 4% on SPECINT2000 in
> 32-bit mode: to be exact, 1930 vs 1850 for Core2 according to this
> weekend's benchmarking).
>
> Actually, I could have made IRA ready for gcc-4.3.  It works for
> x86, x86_64, itanium, ppc, sparc, s390, and arm.  It is optional, so
> other platforms can use the current register allocator.  But I don't
> want to rush.
>
> But still you are right that reload is compromising the generated
> code.  If you are interested in my opinion, the df infrastructure is
> a tiny part of the RA implementation problem (and, as I understand
> it, of the insn scheduler and code selection too).  Actually, IRA
> uses the DF infrastructure, but it could easily be switched to the
> old life analysis.
>
>> I am interested in bringing the rest of the back end into the modern
>> world.  While some of the passes can and should be moved into the ssa
>> middle end of the compiler, there are several optimizations that can
>> only be done after the details of the target have been fully exposed.
>>
>
> Bringing the rest of the back end into the modern world is too
> challenging a task.  If you really want it, imho you should attack
> RTL and the machine descriptions.  But this task is an order of
> magnitude more difficult than introducing tree-SSA.
>
It is a hard project and you are right that replacing rtl would be
better.  However, I do not know how to do that, either from a
logistical point of view or from the point of view of having a better
replacement that covers all of the platforms as well.

However, there are a lot of sins in the back end, and a large number
of them are either being directly addressed by this replacement or
are now accessible.  The addition of df will allow others to
introduce better technology.

>> My experience with trying to do this was that the number one problem
>> was that the existing dataflow is in many cases wrong or too
>> conservative, and that it was not flexible enough to accommodate
>> most modern optimization techniques.  So rather than hack around the
>> problem, I decided to attack the bad infrastructure problem first
>> and open the way for myself and the others who work on the back end
>> to benefit from that infrastructure to get the rest of the passes
>> into shape.
>>
>
> I am not against a new, more accurate DF infrastructure.  I am
> against a slower infrastructure.
>
>> There are certainly performance issues here.  There are limits on
>> how much I, and the others who have worked on this, have been able
>> to change before we do our merge.  So far, only those passes that
>> were directly hacked into flow, such as dce and auto-inc-dec
>> detection, have been rewritten from the ground up to fully utilize
>> the new framework.  However, it had gotten to the point where the
>> two frameworks really should not coexist.  Both implementations
>> expect to work in an environment where the information is
>> maintained from pass to pass, and doing that with two systems was
>> not workable.  So the plan accepted by the steering committee
>> accommodates the wholesale replacement of the dataflow analysis,
>> but even after the merge there will still be many passes that will
>> be changed.
>
> Does that mean that the compiler will be even slower?
>
>> I would have liked
>> to have the df information more tightly integrated into the rtl
>> rather than having it on the side.  It is cumbersome to keep this
>> information up to date.  However, the number of places in the
>> backends that depend on the existing rtl data structures and APIs
>> makes such a replacement very difficult.  I do believe that by the
>> time we merge the branch, we will be down to a 5% compile time
>> regression.  While I would like this number to be 0% or negative, I
>> personally believe that having precise and correct information is
>> worth it and that over time we will be able to remove that 5%
>> penalty.  As far as the other regressions, these will be dealt with
>> very soon.
>
Algorithmically, the core of the dataflow branch - the scanning, the
setting up of the problems, and the solution finder - is pretty good.
People with the skill set and the interest are still going to be able
to shave a little time here and there, but that is not likely to be
where the big performance gains are going to come from.
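
For anyone who has not read the branch: the solution finder is
essentially the classic iterative worklist algorithm over per-block
bit sets.  Here is a self-contained toy version (reaching definitions
on a four-block CFG; the types and names are invented for
illustration, this is not the actual df.c code):

/* Toy iterative dataflow solver in the style of df.c's solution
   finder: reaching definitions over a four-block CFG.  All types
   and names are invented for illustration.  */
#include <stdio.h>

#define NBLOCKS 4

typedef unsigned char bitset;   /* toy bit set: up to 8 defs */

static bitset gen[NBLOCKS], kill[NBLOCKS];
static bitset in[NBLOCKS], out[NBLOCKS];

/* Toy CFG, edge lists terminated by -1: B0->B1, B1->B2 and B3,
   B2->B1 (a loop), B3 is the exit.  */
static const int succ[NBLOCKS][3] = {
  {1, -1, -1}, {2, 3, -1}, {1, -1, -1}, {-1, -1, -1},
};
static const int pred[NBLOCKS][3] = {
  {-1, -1, -1}, {0, 2, -1}, {1, -1, -1}, {1, -1, -1},
};

int
main (void)
{
  int worklist[64], head = 0, tail = 0;   /* a real solver uses a
                                             proper queue/bitmap */
  char on_list[NBLOCKS];
  bitset new_out;
  int b, i;

  /* The problem instance: B0 generates def 0; B2 kills it and
     generates def 1.  */
  gen[0] = 0x01;
  kill[2] = 0x01;
  gen[2] = 0x02;

  /* Seed the worklist with every block.  */
  for (b = 0; b < NBLOCKS; b++)
    {
      worklist[tail++] = b;
      on_list[b] = 1;
    }

  /* Iterate to a fixed point.  */
  while (head != tail)
    {
      b = worklist[head++];
      on_list[b] = 0;

      /* Confluence: IN[b] is the union of OUT over predecessors.  */
      in[b] = 0;
      for (i = 0; pred[b][i] >= 0; i++)
        in[b] |= out[pred[b][i]];

      /* Transfer: OUT[b] = GEN[b] | (IN[b] & ~KILL[b]).  */
      new_out = gen[b] | (bitset) (in[b] & ~kill[b]);
      if (new_out != out[b])
        {
          out[b] = new_out;
          /* OUT changed, so successors must be revisited.  */
          for (i = 0; succ[b][i] >= 0; i++)
            if (!on_list[succ[b][i]])
              {
                worklist[tail++] = succ[b][i];
                on_list[succ[b][i]] = 1;
              }
        }
    }

  for (b = 0; b < NBLOCKS; b++)
    printf ("B%d: in=%02x out=%02x\n", b, in[b], out[b]);
  return 0;
}

The solver itself is small and well understood; as the next paragraph
says, the cost is mostly in keeping its inputs current.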

The likely source of many of the performance issues is that we still
have to maintain some of the older, outdated data structures.
Regs_ever_live is the poster child of this.  In theory regs_ever_live
is easy: it is just the set of hard registers that are used.  In
practice it is a disaster to keep track of, because it was only
updated occasionally and its values are "randomly" changed by the
backends in totally undocumented ways.  Maintaining regs_ever_live
requires a lot of special mechanism that slows down the incremental
scanning.  To the extent that someone just wants to know whether some
register is used, df keeps accessible counters for each reg that give
the exact number of uses and defs.  However, because of the manual
fiddling with regs_ever_live, it is impossible to just replace one
structure with the other.  It is quite likely that 1% of the slowdown
can be charged to having to maintain this duplicate structure.
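
To make the contrast concrete, here is a toy model of the two
representations (all names invented, not GCC's actual code).  The
sticky flag cannot be cleared when a reference dies, because it alone
cannot tell whether other references remain; the counters can:

/* Toy contrast between a regs_ever_live-style boolean array and
   df-style exact def/use counters.  Names are invented; this is
   not GCC's actual implementation.  */
#include <stdio.h>

#define NUM_HARD_REGS 16

/* Old style: a sticky flag per hard register.  Once set, the only
   safe way to clear it is a full rescan, because nothing records
   how many references put it there.  */
static char ever_live[NUM_HARD_REGS];

/* New style: exact counts, maintained incrementally by the scanner
   as references are created and destroyed.  */
static int def_count[NUM_HARD_REGS];
static int use_count[NUM_HARD_REGS];

static void
ref_added (int regno, int is_def)
{
  (is_def ? def_count : use_count)[regno]++;
  ever_live[regno] = 1;   /* the flag is easy to set... */
}

static void
ref_deleted (int regno, int is_def)
{
  (is_def ? def_count : use_count)[regno]--;
  /* ...but cannot be cleared here: another reference to regno may
     still exist somewhere, and the flag alone cannot tell us.  The
     counters can: the register is unused exactly when both are 0.  */
}

int
main (void)
{
  ref_added (3, 1);     /* a def of hard reg 3 appears */
  ref_added (3, 0);     /* and a use */
  ref_deleted (3, 0);   /* an optimization deletes the use */
  ref_deleted (3, 1);   /* and then the def */

  printf ("ever_live[3] = %d (stale)\n", ever_live[3]);
  printf ("defs=%d uses=%d -> truly unused\n",
          def_count[3], use_count[3]);
  return 0;
}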

These older data structures will, with time, be replaced with direct
access to df's "information on the side", but for now it is just too
much work to try to fix them all as part of the merge.
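
For anyone wondering what "information on the side" looks like, here
is a hypothetical sketch (not df.h's real layout): the def/use records
live in tables indexed by insn uid, beside the rtl rather than woven
into it, and passes consult the tables instead of reparsing patterns:

/* Hypothetical sketch of "information on the side".  This is not
   df.h's real layout.  */
#include <stddef.h>

struct ref_sketch
{
  int regno;       /* register referenced */
  int insn_uid;    /* insn containing the reference */
};

struct insn_info_sketch
{
  struct ref_sketch **defs;   /* defs occurring in this insn */
  struct ref_sketch **uses;   /* uses occurring in this insn */
  int n_defs, n_uses;
};

/* The "side" part: nothing in the insn stream points here.  Passes
   index this table by uid when they need the information, and the
   scanner rebuilds an entry whenever its insn changes.  */
static struct insn_info_sketch *insn_table;

/* A pass asking "does insn UID define REGNO?" walks the side table
   instead of reparsing the insn's rtl pattern.  */
static int
insn_defines_reg (int uid, int regno)
{
  int i;
  for (i = 0; i < insn_table[uid].n_defs; i++)
    if (insn_table[uid].defs[i]->regno == regno)
      return 1;
  return 0;
}

int
main (void)
{
  /* One insn (uid 0) that defines hard reg 5 and uses nothing.  */
  static struct ref_sketch ref = { 5, 0 };
  static struct ref_sketch *defs[1] = { &ref };
  static struct insn_info_sketch info = { defs, NULL, 1, 0 };

  insn_table = &info;
  return insn_defines_reg (0, 5) ? 0 : 1;
}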

> Great!
>

