This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/29874] New: gcc-4.1.1 generates consistently worse performing SSE code than gcc-3.4.6
- From: "sergstesh at yahoo dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 17 Nov 2006 01:24:28 -0000
- Subject: [Bug rtl-optimization/29874] New: gcc-4.1.1 generates consistently worse performing SSE code than gcc-3.4.6
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
Hello,
this is, in a sense, a continuation of the performance discussion in bug 29818
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29818).
Here I present performance numbers obtained with widely available GPL'ed
code: fftw-3.1.2.
I did the following:
1) built gcc-3.4.6;
2) ran this command line 10 times:
/usr/bin/time /maxtor5/sergei/AppsFromScratchWD/build/fftw-3.1.2/tests/bench --speed if524288 -v4 -oexhaustive
('fftw-3.1.2/tests/bench' comes with fftw-3.1.2);
3) built gcc-4.1.1;
4) repeated step 2).
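The repeated-run procedure above can be scripted. The following is only a minimal sketch, not what was actually used for this report: the names `parse_result` and `run_many` are hypothetical, and it assumes that 'bench' prints its "Problem: ..." summary line on standard output in the format shown in the results below.

```python
import re
import subprocess

# Matches the summary line printed by fftw's tests/bench at -v4, for example:
#   Problem: if524288, setup: 30.90 s, time: 88.12 ms, ``mflops'': 565.2
RESULT_RE = re.compile(
    r"Problem:\s*(?P<problem>[^,\s]+),"
    r"\s*setup:\s*(?P<setup>[\d.]+)\s*s,"
    r"\s*time:\s*(?P<time>[\d.]+)\s*ms,"
    r"\s*``mflops'':\s*(?P<mflops>[\d.]+)"
)


def parse_result(text):
    """Return the fields of the first summary line in text, or None."""
    m = RESULT_RE.search(text)
    if m is None:
        return None
    return {
        "problem": m.group("problem"),
        "setup_s": float(m.group("setup")),
        "time_ms": float(m.group("time")),
        "mflops": float(m.group("mflops")),
    }


def run_many(cmd, runs=10):
    """Run cmd (a list of arguments) `runs` times and collect parsed results.

    Assumption: the benchmark writes its summary line to stdout.
    """
    results = []
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
        parsed = parse_result(proc.stdout)
        if parsed is not None:
            results.append(parsed)
    return results
```

With the bench binary from step 2), a call could look like `run_many(["/maxtor5/sergei/AppsFromScratchWD/build/fftw-3.1.2/tests/bench", "--speed", "if524288", "-v4", "-oexhaustive"])`. The sketch leaves out /usr/bin/time, so it collects only the numbers that bench itself reports.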
Here are the results.
gcc-3.4.6:
Problem: if524288, setup: 30.90 s, time: 88.12 ms, ``mflops'': 565.2
31.26user 0.21system 0:31.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5107minor)pagefaults 0swaps
Problem: if524288, setup: 30.90 s, time: 88.33 ms, ``mflops'': 563.86
31.32user 0.21system 0:31.75elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5136minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.51 ms, ``mflops'': 562.76
31.20user 0.24system 0:31.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5134minor)pagefaults 0swaps
Problem: if524288, setup: 30.93 s, time: 88.49 ms, ``mflops'': 562.86
31.41user 0.20system 0:31.84elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5130minor)pagefaults 0swaps
Problem: if524288, setup: 30.90 s, time: 88.55 ms, ``mflops'': 562.45
31.35user 0.22system 0:31.82elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5133minor)pagefaults 0swaps
Problem: if524288, setup: 31.25 s, time: 90.50 ms, ``mflops'': 550.37
82.48user 0.46system 1:23.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13044minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.11 ms, ``mflops'': 565.29
31.24user 0.21system 0:31.70elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5130minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.29 ms, ``mflops'': 564.15
31.25user 0.24system 0:31.75elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5134minor)pagefaults 0swaps
Problem: if524288, setup: 30.85 s, time: 87.81 ms, ``mflops'': 567.2
31.26user 0.21system 0:31.70elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5130minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.71 ms, ``mflops'': 561.45
87.62user 0.44system 1:28.72elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13294minor)pagefaults 0swaps
gcc-4.1.1:
Problem: if524288, setup: 32.13 s, time: 91.64 ms, ``mflops'': 543.53
32.51user 0.23system 0:33.01elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5114minor)pagefaults 0swaps
Problem: if524288, setup: 32.11 s, time: 92.67 ms, ``mflops'': 537.45
84.25user 0.45system 1:25.31elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13295minor)pagefaults 0swaps
Problem: if524288, setup: 32.16 s, time: 92.33 ms, ``mflops'': 539.44
84.84user 0.46system 1:25.94elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13301minor)pagefaults 0swaps
Problem: if524288, setup: 32.18 s, time: 92.54 ms, ``mflops'': 538.22
85.41user 0.49system 1:27.18elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13299minor)pagefaults 0swaps
Problem: if524288, setup: 32.19 s, time: 91.40 ms, ``mflops'': 544.91
32.54user 0.22system 0:33.03elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5139minor)pagefaults 0swaps
Problem: if524288, setup: 32.17 s, time: 92.60 ms, ``mflops'': 537.9
91.29user 0.45system 1:32.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13301minor)pagefaults 0swaps
Problem: if524288, setup: 32.20 s, time: 91.83 ms, ``mflops'': 542.37
32.60user 0.24system 0:33.08elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5140minor)pagefaults 0swaps
Problem: if524288, setup: 32.15 s, time: 91.82 ms, ``mflops'': 542.42
32.60user 0.22system 0:33.04elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5138minor)pagefaults 0swaps
Problem: if524288, setup: 32.16 s, time: 91.37 ms, ``mflops'': 545.12
32.54user 0.23system 0:32.99elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5140minor)pagefaults 0swaps
Problem: if524288, setup: 32.11 s, time: 91.24 ms, ``mflops'': 545.89
32.48user 0.21system 0:32.92elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5141minor)pagefaults 0swaps
IMO the difference in favor of gcc-3.4.6 is visible to the naked eye (see, for
example, ``mflops''; larger numbers are better).
For instance, compare the worst numbers:
gcc-3.4.6 : 550.37
gcc-4.1.1 : 537.45
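The comparison can be extended beyond the worst case. The sketch below is illustrative only: the ``mflops'' values are copied from the twenty runs listed above, while the `summarize` helper and the choice of statistics are assumptions added for this example.

```python
from statistics import mean, median

# ``mflops'' values copied from the ten runs listed above for each compiler.
MFLOPS = {
    "gcc-3.4.6": [565.2, 563.86, 562.76, 562.86, 562.45,
                  550.37, 565.29, 564.15, 567.2, 561.45],
    "gcc-4.1.1": [543.53, 537.45, 539.44, 538.22, 544.91,
                  537.9, 542.37, 542.42, 545.12, 545.89],
}


def summarize(values):
    """Return worst, best, mean and median of a list of mflops values."""
    return {
        "worst": min(values),   # larger mflops is better, so min is worst
        "best": max(values),
        "mean": mean(values),
        "median": median(values),
    }


if __name__ == "__main__":
    for compiler, values in MFLOPS.items():
        s = summarize(values)
        print(f"{compiler}: worst={s['worst']:.2f} best={s['best']:.2f} "
              f"mean={s['mean']:.2f} median={s['median']:.2f}")
```

On these numbers the best gcc-4.1.1 run (545.89) is still below the worst gcc-3.4.6 run (550.37), and the means are about 562.56 versus 541.73 mflops, a gap of roughly 3.7%.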
I think it's worth porting the gcc-3.4.6 x86 optimization engine to the
gcc-4.* series.
--
Summary: gcc-4.1.1 generates consistently worse performing SSE
code than gcc-3.4.6
Product: gcc
Version: 4.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: sergstesh at yahoo dot com
GCC build triplet: Linux comp.home.net 2.6.12-27mdk-i686-up-4GB #1 Tue Sep 26 12:41
GCC host triplet: Linux comp.home.net 2.6.12-27mdk-i686-up-4GB #1 Tue Sep 26 12:41
GCC target triplet: Linux comp.home.net 2.6.12-27mdk-i686-up-4GB #1 Tue Sep 26 12:41
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29874