This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

New brach 'yara-branch' is created


 I've created a branch for my allocator project which is called Yet
Another Register Allocator (or YARA - yet another recursive acronim).

 I am think I reached the point when my work on a public branch can
be made.  I am focused only on x86 right now.  So YARA will work only
for x86 and probably for x86_64 for some tests (at least for SPEC).
There is no way that it will work for PPC, Itanium or other targets
for now.

 I wanted to report the current state of YARA on coming gcc summit.
Unfortunately my proposal was rejected by the review committee.
Therefore I try to describe YARA briefly here to prevent questions for
me (sorry I can not spend much time on answering the questions).

1. Changes in the infrastructure in comparison with the report on
  previous gcc summit.

 o Copy activeness has been removed because it complicates the
   implementation a lot.

 o More coarse grain allocnos which is now close to store ranges (see
   for example, Kolte's article load/store range analysis for global
   register allocation).

o New notion CAN (set of connected allocnos) is added.

2. Currently YARA has the following passes (in the same order as they work).

 o build IR (allocnos, copies, allocno-allocno conflicts,
             allocno-copy conflicts).
 o GVN (global value numbering) and removing conflicts for allocnos
   with the same value.
 o Building CANs and CAN conflicts
 o optimistic and extended CAN coalescing (optional)
 o Calculating register costs and classes for CAN (only classes from
   class cover set are used.  Cover set is a set of intersected
   classes covering all register classes.  The set is calculated such
   way that moving between the register classes included in a cover
   set class is cheaper than register-memory moves).
 o Briggs CAN coloring (optimistic and biased coloring) with
   optional live range splitting during coloring (it is analogous to
   splitting in Chow's article about priority based coloring or
   Bergner's article).
 o Optional Chow's priority based coloring is implemented too but
   it is practically debugged because the code is a bit worse than
   one generated by Briggs coloring.
 o Local allocation done on allocno level: satisfying insn constraint,
   register elimination, allocno spilling.  This pass does what reload
   does in the current gcc.
 o Coalescing on allocno level.
 o Synchronization optimization (analogous to one described in ...):
   redundant move removing and changing ld/st by register moves.
 o RTL rewriting

 YARA still uses old allocator passes (regclass and post-reload) for
some reasons but I hope finally it will not needed.

3. x86 YARA on SPEC

SPEC was tested on 3.2Ghz Pentium4 on gcc mainline as of Dec. 2005
with option -O2 -mtune=pentium4 (for base) and -O2 -mtune=pentium4
-fyara (for peak).

The results are the following: no improvement on SPECINT2000, 4-5%
improvement for SPECFP2000, code size is practically the same,
compiler 8% slower on SPECINT (in other words YARA is 2 times slower
than the current gcc allocator, SPECFP has some anomaly tests where
compiler 40% slower).  You can find more info about SPEC at the email
end.

I have 3% improvement of SPECFP for x86_64 (on Intel Nocona, sorry I
have no free AMD machine and I think the results for AMD will be
worse) but I had no time to work on its improvement.

If you are interesting where YARA spend more time:

It is local allocator (about 40%) and building IR (15%).  So they might
be rewritten (again).

YARA is a long project so please don't expect that it will be in main
line soon.  Moreover even if YARA is successful don't expect reload
will be gone after that (all target ports are bound to reload very
tight that is what can I say after long work on the allocator).

What I am going to do in short perspective is
 o work on code quality of some SPECINT tests (e.g. reload is doing
   better job for crafty with many multi-registers than YARA)
 o removing regressions for x86_64
 o make it work for PPC and Itanium
 o tuning for x86_64 (now YARA is very bad when a lot of hard registers
   used in RTL that what differs x86_64 code from x86 one) and than
   other platforms
 o work on its speeding up

In long perspective I am going to add more passes:
 o rematerialization (at least for const and bp+const that is what
   reload does now).
 o register pressure relief (splitting during coloring does not see
   CFG well).
 o Uncoalescing during splitting (for better splitting decision
   during coloring).
 o Chameleon intervals: instead of

   M:=R0                   Swap R0, R1
   R0:=R1     generate     Use R0
   use R0                  Swap R0, R1
   R0:=M

 o solving debug information problem (too many location where the
   variable can be placed)


Estimated Estimated Base Base Base Peak Peak Peak Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio ------------ -------- -------- -------- -------- -------- -------- 164.gzip 1400 149 942* 1400 144 971* 175.vpr 1400 236 594* 1400 236 592* 176.gcc 1100 96.4 1141* 1100 96.7 1138* 181.mcf 1800 262 688* 1800 261 689* 186.crafty 1000 100 998* 1000 107 932* 197.parser 1800 232 776* 1800 228 791* 252.eon 1300 172 755* 1300 173 750* 253.perlbmk 1800 153 1174* 1800 135 1338* 254.gap 1100 103 1065* 1100 106 1038* 255.vortex 1900 175 1087* 1900 175 1085* 256.bzip2 1500 196 764* 1500 199 756* 300.twolf 3000 376 798* 3000 390 770* Est. SPECint_base2000 879 Est. SPECint2000 881


Estimated Estimated Base Base Base Peak Peak Peak Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio ------------ -------- -------- -------- -------- -------- -------- 168.wupwise 1600 174 919* 1600 156 1027* 171.swim 3100 471 659* 3100 458 678* 172.mgrid 1800 273 660* 1800 283 637* 173.applu 2100 235 894* 2100 227 925* 177.mesa 1400 181 775* 1400 185 757* 178.galgel 2900 550 527* 2900 548 529* 179.art 2600 689 377* 2600 695 374* 183.equake 1300 108 1204* 1300 107 1216* 187.facerec 1900 370 514* 1900 254 748* 188.ammp 2200 433 508* 2200 426 516* 189.lucas 2000 247 808* 2000 216 924* 191.fma3d 2100 331 635* 2100 310 677* 200.sixtrack 1100 197 557* 1100 208 529* 301.apsi 2600 399 652* 2600 398 654* Est. SPECfp_base2000 664 Est. SPECfp2000 696


Code size (text segment)


----------------CINT2000-----------------
1.047%          33613          33965 164.gzip
-0.785%         130485         129461 175.vpr
-1.212%        1248740        1233604 176.gcc
2.131%           9760           9968 181.mcf
3.360%         196677         203285 186.crafty
-0.129%          86944          86832 197.parser
3.250%         419854         433499 252.eon
-0.353%         466461         464813 253.perlbmk
-0.155%         424017         423361 254.gap
0.267%         540252         541692 255.vortex
-0.149%          28857          28814 256.bzip2
-2.669%         187603         182595 300.twolf
Average = 0.306833%
----------------CFP2000-----------------
-2.888%          25483          24747 168.wupwise
0.106%           7512           7520 171.swim
-1.768%          12671          12447 172.mgrid
3.579%          46718          48390 173.applu
0.456%         452877         454941 177.mesa
2.806%         236603         243243 178.galgel
0.936%          13246          13370 179.art
0.464%          17246          17326 183.equake
-3.257%          60917          58933 187.facerec
2.970%         106134         109286 188.ammp
-11.536%          45909          40613 189.lucas
3.402%         954621         987093 191.fma3d
-2.581%         831071         809623 200.sixtrack
-0.088%         109703         109607 301.apsi
Average = -0.435182%


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]