This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
RFA: merging dfa-branch into the main trunk
- From: Vladimir Makarov <vmakarov at redhat dot com>
- To: gcc at gcc dot gnu dot org
- Date: Tue, 26 Feb 2002 18:23:08 -0500
- Subject: RFA: merging dfa-branch into the main trunk
Hello, I'd like to get approval for merging the dfa-branch into
the main trunk. Currently the branch contains DFA descriptions for
ultrasparc and sh4.
David Edelsohn <dje@watson.ibm.com> wrote:
> Does the new scheduler produce any significant performance
> improvement in the code GCC generates commensurate with its compile-time
> cost?
>
> Both my experience with the new DFA scheduler targeted at PowerPC
> delivered with GNUPro and the GCC for IA-64 Summit minutes report that the
> scheduler compile time was no faster than the Haifa Scheduler and the
> scheduler did not generate faster code. The software pipelining did
> not seem to be effective either, mainly because of dependency information
> infrastructure still lacking in GCC.
>
> In theory, the DFA scheduler and software pipelining should be
> much better. Until the new work demonstrates an improvement and is shown
> to be robust, I think it should be on a branch, as GCC's development
> policy specifies.
>
> What evidence shows that the DFA scheduler and software pipeliner
> have evolved beyond a work in progress?
David Miller wrote the pipeline description for the ultrasparc processor.
Dan Nicolaesku ran SPEC95 tests comparing gcc with the traditional and
the DFA-based scheduler:
http://gcc.gnu.org/ml/gcc/2001-11/msg00736.html
The DFA-based scheduler generated better code on every SPEC95 test.
One SPECfp95 test was sped up by as much as 11%.
Another important point is that sparc.c shrank by several hundred
lines, most of which were code tuning the scheduler to ultrasparc
for better insn scheduling.
Naveen Sharma wrote an sh4 DFA pipeline description and reported a
12-13% improvement on the SLALOM benchmark:
http://gcc.gnu.org/ml/gcc-patches/2001-12/msg02157.html
Mike Meissner reported a 5-15 percent speedup from switching to the DFA
scheduler for a particular MIPS target on EEBC:
http://gcc.gnu.org/ml/gcc/2001-09/msg00061.html
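As for ppc, I have tried several times to improve the description for
ppc750/ppc7400 but got no visible improvement on real tests such as
gcc itself, although I did get some improvement on a few small tests,
e.g. hanoi (about 3%). Of the two runs below, the first is built with
-mdfa (the DFA-based scheduler) and the second without it: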
/home/vmakarov/build/gcc-dfa-branch1/toymac/bin/gcc -O2 -mcpu=7400 /home/vmakarov/aburto/hanoi/hanoi.c -DUNIX -o f1 -lm -g -mdfa;./f1
/home/vmakarov/aburto/hanoi/hanoi.c: In function `main':
/home/vmakarov/aburto/hanoi/hanoi.c:64: warning: return type of `main' is not `int'
Towers of Hanoi Puzzle Test Program (27 Oct 94)
Disks  Moves      Time(sec)  Moves/25usec
16     65535      0.00000    inf
17     131071     0.02000    163.8388
18     262143     0.02000    327.6788
19     524287     0.04000    327.6794
20     1048575    0.08000    327.6797
21     2097151    0.16000    327.6798
22     4194303    0.31000    338.2502
23     8388607    0.62000    338.2503
24     16777215   1.24000    338.2503
25     33554431   2.47000    339.6197
26     67108863   4.94000    339.6198
27     134217727  9.88000    339.6198
28     268435455  19.77000   339.4480
29     536870911  39.64000   338.5916
Average Moves Per 25 usec = 337.7033
/home/vmakarov/build/gcc-dfa-branch1/toymac/bin/gcc -O2 -mcpu=7400 /home/vmakarov/aburto/hanoi/hanoi.c -DUNIX -o f1 -lm -g;./f1
/home/vmakarov/aburto/hanoi/hanoi.c: In function `main':
/home/vmakarov/aburto/hanoi/hanoi.c:64: warning: return type of `main' is not `int'
Towers of Hanoi Puzzle Test Program (27 Oct 94)
Disks  Moves      Time(sec)  Moves/25usec
16     65535      0.00000    inf
17     131071     0.02000    163.8388
18     262143     0.02000    327.6788
19     524287     0.04000    327.6794
20     1048575    0.08000    327.6797
21     2097151    0.16000    327.6798
22     4194303    0.32000    327.6799
23     8388607    0.64000    327.6800
24     16777215   1.27000    330.2601
25     33554431   2.55000    328.9650
26     67108863   5.10000    328.9650
27     134217727  10.19000   329.2878
28     268435455  20.37000   329.4495
29     536870911  40.74000   329.4495
Average Moves Per 25 usec = 328.8241
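The size of the hanoi gain can be checked directly from the two runs. A minimal C sketch of the arithmetic follows; the function name percent_gain is hypothetical and is not part of the hanoi program. Moves/25usec is higher-is-better, so the -mdfa average goes in the `after` slot; Time(sec) is lower-is-better, so for times the roles are swapped.

```c
#include <assert.h>

/* Hypothetical helper: percentage by which `after` exceeds `before`
   for a higher-is-better metric such as Moves/25usec.  For a
   lower-is-better metric such as Time(sec), pass the faster time as
   `before` and the slower time as `after`. */
static double percent_gain(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}
```

With the averages above, percent_gain(328.8241, 337.7033) is about 2.7, and with the 29-disk times, percent_gain(39.64, 40.74) is about 2.8; both agree with the "about 3%" figure.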
The old description model cannot describe ppc750/ppc7400 reservations
of the 2-1-1 kind (where the numbers are the cycles an insn spends in
each floating-point pipeline stage) or the serialization of cr* insns.
That is probably why the hanoi test improved. Overall, though, as I
said, there is no visible advantage to using the DFA-based scheduler
for the ppc processors. The only explanation I can offer is that the
ppc processors execute instructions out of order.
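To make the 2-1-1 notation concrete, here is a minimal C sketch of a reservation-table (scoreboard) check. It assumes a hypothetical three-stage FP pipeline whose units are named FP_S1, FP_S2 and FP_S3: an insn holds stage 1 for two cycles, then stage 2 and stage 3 for one cycle each. This is neither GCC's machine description syntax nor code generated from it; it only shows why such an insn blocks another FP insn in the next cycle but not two cycles later.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical FP pipeline units, one bit each. */
enum { FP_S1 = 1u << 0, FP_S2 = 1u << 1, FP_S3 = 1u << 2 };

#define HORIZON 16  /* future cycles tracked by this toy scoreboard */

/* "2-1-1" reservation: FP_S1 for two cycles, then FP_S2, then FP_S3. */
static const unsigned fp_211[] = { FP_S1, FP_S1, FP_S2, FP_S3 };
#define FP_211_LEN (sizeof fp_211 / sizeof fp_211[0])

/* busy[c] holds the units already reserved in cycle c. */
typedef struct { unsigned busy[HORIZON]; } scoreboard;

static void sb_init(scoreboard *sb) { memset(sb, 0, sizeof *sb); }

/* True if a reservation pattern can start at cycle t with no unit conflict. */
static bool sb_can_issue(const scoreboard *sb, int t,
                         const unsigned *pat, size_t len)
{
    assert(t >= 0 && (size_t)t + len <= HORIZON);
    for (size_t k = 0; k < len; k++)
        if (sb->busy[t + k] & pat[k])
            return false;
    return true;
}

/* Record the pattern's units as busy starting at cycle t. */
static void sb_issue(scoreboard *sb, int t, const unsigned *pat, size_t len)
{
    assert(sb_can_issue(sb, t, pat, len));
    for (size_t k = 0; k < len; k++)
        sb->busy[t + k] |= pat[k];
}

/* Earliest cycle >= t at which the pattern fits, or -1 if none does. */
static int sb_earliest(const scoreboard *sb, int t,
                       const unsigned *pat, size_t len)
{
    assert(t >= 0);
    for (; (size_t)t + len <= HORIZON; t++)
        if (sb_can_issue(sb, t, pat, len))
            return t;
    return -1;
}
```

After one FP insn is issued at cycle 0, sb_can_issue rejects a second one at cycle 1 (FP_S1 is still held) and sb_earliest finds cycle 2.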
Using the DFA-based scheduler definitely pays off for classical
(in-order, non-speculative) RISC processors. The more irregular a
processor's pipelines are (e.g. sh4), the bigger the improvement.
Actually, I did not expect such a big improvement for ultrasparc and
sh4. As I wrote before, I originally positioned the DFA-based
pipeline hazard recognizer mainly as infrastructure: a simpler
interface, a more readable description, a faster pipeline hazard
recognizer and, finally, a basis for better future insn schedulers,
which can easily try more insn schedules in the same amount of time
and choose the best one.
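The speed argument can be illustrated with a toy automaton: if every reachable pipeline state is enumerated once, ahead of time, each hazard query during scheduling becomes a single table lookup, which is what makes trying many candidate schedules cheap. The sketch below assumes two hypothetical insn classes (a 2-1-1 FP insn and a one-cycle ALU insn) and encodes a state as a bitmask of units busy over the next four cycles. It is not the dfa-branch's automaton generator, its state encoding, or its interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical units; a state has one bit per unit per future cycle. */
enum { U_FP_S1, U_FP_S2, U_FP_S3, U_ALU, NUNITS };
#define WINDOW 4
#define BIT(cycle, unit) ((uint16_t)1u << ((cycle) * NUNITS + (unit)))

enum { C_FP, C_ALU, NCLASSES };  /* hypothetical insn classes */
#define ADVANCE NCLASSES         /* pseudo-class: move to the next cycle */

/* Reservation of each class relative to its issue cycle. */
static const uint16_t pattern[NCLASSES] = {
    [C_FP]  = BIT(0, U_FP_S1) | BIT(1, U_FP_S1) | BIT(2, U_FP_S2) | BIT(3, U_FP_S3),
    [C_ALU] = BIT(0, U_ALU),
};

#define MAX_STATES 4096
static uint16_t state_code[MAX_STATES];
static int trans[MAX_STATES][NCLASSES + 1];  /* -1 means structural hazard */
static int nstates;
static int index_of[1 << (WINDOW * NUNITS)];

/* Return the index of a state, creating it on first sight. */
static int intern(uint16_t code)
{
    if (index_of[code] < 0) {
        assert(nstates < MAX_STATES);
        state_code[nstates] = code;
        index_of[code] = nstates++;
    }
    return index_of[code];
}

/* Breadth-first construction of every state reachable from the idle one. */
static void build_automaton(void)
{
    for (int i = 0; i < (1 << (WINDOW * NUNITS)); i++)
        index_of[i] = -1;
    nstates = 0;
    intern(0);                             /* state 0: pipeline idle */
    for (int s = 0; s < nstates; s++) {    /* nstates grows as states are found */
        uint16_t code = state_code[s];
        for (int c = 0; c < NCLASSES; c++)
            trans[s][c] = (code & pattern[c])
                              ? -1
                              : intern((uint16_t)(code | pattern[c]));
        /* Advancing one cycle drops cycle 0 and shifts the rest down. */
        trans[s][ADVANCE] = intern((uint16_t)(code >> NUNITS));
    }
}

/* Hazard query while scheduling: one lookup, -1 if the insn must wait. */
static int dfa_transition(int state, int insn_class)
{
    assert(state >= 0 && state < nstates);
    assert(insn_class >= 0 && insn_class <= ADVANCE);
    return trans[state][insn_class];
}
```

The design trades memory for time: the table is built once per target, and the scheduler never re-scans reservation patterns, so evaluating several alternative insn orders costs only a few lookups each.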
I think the current DFA-based scheduler has proved itself useful
and robust and deserves to be moved into the main trunk.
After the dfa-branch code is moved into the main trunk, I could commit
RCSP to the branch and continue improving it there.
I would like to thank Richard Henderson, Mike Meissner,
David Miller, Naveen Sharma, Bernd Schmidt, Jan Hubicka, Dan
Nicolaesku and all the others (sorry if I missed you) for their
help and helpful comments.
Vladimir Makarov