This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/29874] New: gcc-4.1.1 generates consistently worse performing SSE code than gcc-3.4.6


Hello,

This is, in a sense, a continuation of the performance discussion at

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29818

Here I present performance numbers obtained with widely available GPL'ed
code: fftw-3.1.2.

I did the following:

1) built gcc-3.4.6;
2) ran this command line 10 times:

/usr/bin/time /maxtor5/sergei/AppsFromScratchWD/build/fftw-3.1.2/tests/bench
--speed if524288 -v4 -oexhaustive

('fftw-3.1.2/tests/bench' ships with fftw-3.1.2);

3) built gcc-4.1.1;
4) repeated step 2).
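
The repeated runs in steps 2) and 4) could be scripted along the following
lines. This is only a sketch, not what was actually used for the numbers
below: the helper name run_repeated is made up for illustration, and it
assumes that capturing stdout and stderr together is acceptable
(/usr/bin/time reports on stderr).

```python
import subprocess


def run_repeated(cmd, runs=10):
    """Run `cmd` (a list of arguments) `runs` times.

    Returns a list with the combined stdout+stderr text of each run.
    Raises subprocess.CalledProcessError if any run exits non-zero.
    """
    outputs = []
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # /usr/bin/time writes to stderr
            text=True,
            check=True,
        )
        outputs.append(result.stdout)
    return outputs


# Hypothetical usage; adjust the path to wherever fftw-3.1.2 was built:
#   logs = run_repeated(
#       ["/usr/bin/time", "path/to/fftw-3.1.2/tests/bench",
#        "--speed", "if524288", "-v4", "-oexhaustive"],
#       runs=10,
#   )
#   print("".join(logs))
```

Running the same script once per compiler build keeps the two sets of
measurements directly comparable.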


Here are the results.

gcc-3.4.6:


Problem: if524288, setup: 30.90 s, time: 88.12 ms, ``mflops'': 565.2
31.26user 0.21system 0:31.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5107minor)pagefaults 0swaps
Problem: if524288, setup: 30.90 s, time: 88.33 ms, ``mflops'': 563.86
31.32user 0.21system 0:31.75elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5136minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.51 ms, ``mflops'': 562.76
31.20user 0.24system 0:31.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5134minor)pagefaults 0swaps
Problem: if524288, setup: 30.93 s, time: 88.49 ms, ``mflops'': 562.86
31.41user 0.20system 0:31.84elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5130minor)pagefaults 0swaps
Problem: if524288, setup: 30.90 s, time: 88.55 ms, ``mflops'': 562.45
31.35user 0.22system 0:31.82elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5133minor)pagefaults 0swaps
Problem: if524288, setup: 31.25 s, time: 90.50 ms, ``mflops'': 550.37
82.48user 0.46system 1:23.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13044minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.11 ms, ``mflops'': 565.29
31.24user 0.21system 0:31.70elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5130minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.29 ms, ``mflops'': 564.15
31.25user 0.24system 0:31.75elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5134minor)pagefaults 0swaps
Problem: if524288, setup: 30.85 s, time: 87.81 ms, ``mflops'': 567.2
31.26user 0.21system 0:31.70elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5130minor)pagefaults 0swaps
Problem: if524288, setup: 30.89 s, time: 88.71 ms, ``mflops'': 561.45
87.62user 0.44system 1:28.72elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13294minor)pagefaults 0swaps

gcc-4.1.1:


Problem: if524288, setup: 32.13 s, time: 91.64 ms, ``mflops'': 543.53
32.51user 0.23system 0:33.01elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5114minor)pagefaults 0swaps
Problem: if524288, setup: 32.11 s, time: 92.67 ms, ``mflops'': 537.45
84.25user 0.45system 1:25.31elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13295minor)pagefaults 0swaps
Problem: if524288, setup: 32.16 s, time: 92.33 ms, ``mflops'': 539.44
84.84user 0.46system 1:25.94elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13301minor)pagefaults 0swaps
Problem: if524288, setup: 32.18 s, time: 92.54 ms, ``mflops'': 538.22
85.41user 0.49system 1:27.18elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13299minor)pagefaults 0swaps
Problem: if524288, setup: 32.19 s, time: 91.40 ms, ``mflops'': 544.91
32.54user 0.22system 0:33.03elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5139minor)pagefaults 0swaps
Problem: if524288, setup: 32.17 s, time: 92.60 ms, ``mflops'': 537.9
91.29user 0.45system 1:32.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13301minor)pagefaults 0swaps
Problem: if524288, setup: 32.20 s, time: 91.83 ms, ``mflops'': 542.37
32.60user 0.24system 0:33.08elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5140minor)pagefaults 0swaps
Problem: if524288, setup: 32.15 s, time: 91.82 ms, ``mflops'': 542.42
32.60user 0.22system 0:33.04elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5138minor)pagefaults 0swaps
Problem: if524288, setup: 32.16 s, time: 91.37 ms, ``mflops'': 545.12
32.54user 0.23system 0:32.99elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5140minor)pagefaults 0swaps
Problem: if524288, setup: 32.11 s, time: 91.24 ms, ``mflops'': 545.89
32.48user 0.21system 0:32.92elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5141minor)pagefaults 0swaps

IMO the difference in favor of gcc-3.4.6 is visible to the naked eye (see, for
example, the ``mflops'' figures; larger numbers are better).

For example, compare the worst numbers:

gcc-3.4.6 : 550.37
gcc-4.1.1 : 537.45
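
For a comparison that goes beyond the single worst run, the ``mflops''
values can be pulled out of the bench output and summarized. The following
is a minimal sketch, assuming the log format shown above; the function
names (parse_mflops, summarize, relative_gap) are made up for illustration
and are not part of fftw or gcc.

```python
import re
import statistics

# Matches the ``mflops'' field of a "Problem: ..." line from tests/bench.
MFLOPS_RE = re.compile(r"``mflops'':\s*([0-9]+(?:\.[0-9]+)?)")


def parse_mflops(log_text):
    """Return every mflops value found in the log text, in order."""
    return [float(m.group(1)) for m in MFLOPS_RE.finditer(log_text)]


def summarize(values):
    """Basic summary statistics for a non-empty list of mflops values."""
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "max": max(values),
    }


def relative_gap(baseline, other):
    """Fraction by which `other` falls short of `baseline`."""
    return (baseline - other) / baseline


# Three lines copied from the gcc-3.4.6 log above, as sample input.
SAMPLE = """\
Problem: if524288, setup: 30.90 s, time: 88.12 ms, ``mflops'': 565.2
Problem: if524288, setup: 30.90 s, time: 88.33 ms, ``mflops'': 563.86
Problem: if524288, setup: 30.89 s, time: 88.51 ms, ``mflops'': 562.76
"""

print(summarize(parse_mflops(SAMPLE)))
# Gap between the two worst runs quoted above (550.37 vs 537.45).
print(f"{relative_gap(550.37, 537.45):.2%}")
```

Feeding the full log of each compiler through parse_mflops and comparing
the medians is less sensitive to a single noisy run than comparing the
worst numbers alone.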

I think it's worth porting the gcc-3.4.6 x86 optimization engine to the
gcc-4.* series.


-- 
           Summary: gcc-4.1.1 generates consistently worse performing SSE
                    code than gcc-3.4.6
           Product: gcc
           Version: 4.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sergstesh at yahoo dot com
 GCC build triplet: Linux comp.home.net 2.6.12-27mdk-i686-up-4GB #1 Tue Sep
                    26 12:41
  GCC host triplet: Linux comp.home.net 2.6.12-27mdk-i686-up-4GB #1 Tue Sep
                    26 12:41
GCC target triplet: Linux comp.home.net 2.6.12-27mdk-i686-up-4GB #1 Tue Sep
                    26 12:41


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29874

