This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


3.0 vs 3.0.1 on oopack's Max


Hi all,

Once again I'm writing to the list to report a recent performance
regression :( on a simple benchmark.

On my system (PII-400, 256 MB, glibc 2.2.4, binutils 2.11.2), 3.0.1
produces much slower code than 3.0 for the Max test of the oopack suite.
This test compares the following two styles (the first "C-style", the
second "OOP-style"):

void MaxBenchmark::c_style() const  // Compute max of vector (C-style)
{
    double max = U[0];
    for( int k=1; k<M; k++ )   // Loop over vector elements
        if( U[k] > max )
            max=U[k];
    MaxResult = max;
}

inline int Greater( double i, double j )
{
    return i>j;
}

void MaxBenchmark::oop_style() const   // Compute max of vector (OOP-style)
{
    double max = U[0];
    for( int k=1; k<M; k++ )   // Loop over vector elements
        if( Greater( U[k], max ) )
            max=U[k];
    MaxResult = max;
}
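
For anyone who wants to look at the two loops outside the full suite,
here is a minimal standalone sketch. It is not taken from
oopack_v1p8.C: the free functions max_c_style and max_oop_style and the
use of std::vector are assumptions made for illustration, and only
Greater mirrors the helper quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same comparison helper as in the post: returns int, not bool.
inline int Greater(double i, double j)
{
    return i > j;
}

// Hypothetical standalone version of the "C-style" loop:
// the comparison is written directly in the if condition.
double max_c_style(const std::vector<double>& u)
{
    assert(!u.empty());
    double max = u[0];
    for (std::size_t k = 1; k < u.size(); k++)
        if (u[k] > max)
            max = u[k];
    return max;
}

// Hypothetical standalone version of the "OOP-style" loop:
// identical except that the comparison goes through Greater().
double max_oop_style(const std::vector<double>& u)
{
    assert(!u.empty());
    double max = u[0];
    for (std::size_t k = 1; k < u.size(); k++)
        if (Greater(u[k], max))
            max = u[k];
    return max;
}
```

Both functions should return the same value for any non-empty input;
the only point of interest is the code the compiler generates for the
loop body once Greater is inlined.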

Compiling oopack_v1p8.C with both 3.0 and 3.0.1 on my system with

    g++ -O2 -finline-limit=600 oopack_v1p8.C

gives the following run times:

3.0.1
-----
                         Seconds       Mflops
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----
Max            500000    7.1  19.6   70.6  25.4    2.8


3.0
---
                         Seconds       Mflops
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----
Max            500000    7.1   9.7   70.5  51.7    1.4
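
As a rough check of how the columns relate, the sketch below recomputes
Mflops and Ratio from the Seconds columns. It assumes a vector length of
M = 1000 (suggested by the comparison against 0x3e7 in the 3.1 listings
further down) and one floating-point operation per element per
iteration; both are assumptions that happen to match the tables, and the
function names are made up for illustration.

```cpp
#include <cassert>

// Mflops under the assumption of one floating-point comparison
// per vector element per iteration.
double mflops(double iterations, double vector_len, double seconds)
{
    assert(seconds > 0.0);
    return iterations * vector_len / seconds / 1e6;
}

// Ratio column: OOP-style run time divided by C-style run time.
double ratio(double c_seconds, double oop_seconds)
{
    assert(c_seconds > 0.0);
    return oop_seconds / c_seconds;
}
```

With the 3.0.1 row, ratio(7.1, 19.6) is about 2.76, which rounds to the
2.8 shown, and mflops(500000, 1000, 7.1) is about 70.4, close to the
reported 70.6 given that the printed seconds are rounded.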


It turns out that both compilers generate the same core loop over k for
the "C-style" case:

 80488d0: dd 02                 fldl   (%edx)
 80488d2: dd e1                 fucom  %st(1)
 80488d4: df e0                 fnstsw %ax
 80488d6: 9e                    sahf
 80488d7: 76 04                 jbe    80488dd <_ZNK12MaxBenchmark7c_styleEv+0x2d>
 80488d9: dd d9                 fstp   %st(1)
 80488db: eb 02                 jmp    80488df <_ZNK12MaxBenchmark7c_styleEv+0x2f>
 80488dd: dd d8                 fstp   %st(0)
 80488df: 83 c2 08              add    $0x8,%edx
 80488e2: 49                    dec    %ecx
 80488e3: 79 eb                 jns    80488d0 <_ZNK12MaxBenchmark7c_styleEv+0x20>


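For the "OOP-style" case, on the other hand, the two compilers differ:
3.0.1 follows "seta %al" with "and $0x1,%eax", whereas 3.0 clears %edx
with xor before "seta %dl" and then uses "test %edx,%edx":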

3.0.1
-----

 8048910: dd 01                 fldl   (%ecx)
 8048912: dd e1                 fucom  %st(1)
 8048914: df e0                 fnstsw %ax
 8048916: 9e                    sahf
 8048917: 0f 97 c0              seta   %al
 804891a: 83 e0 01              and    $0x1,%eax
 804891d: 74 04                 je     8048923 <_ZNK12MaxBenchmark9oop_styleEv+0x33>
 804891f: dd d9                 fstp   %st(1)
 8048921: eb 02                 jmp    8048925 <_ZNK12MaxBenchmark9oop_styleEv+0x35>
 8048923: dd d8                 fstp   %st(0)
 8048925: 83 c1 08              add    $0x8,%ecx
 8048928: 4a                    dec    %edx
 8048929: 79 e5                 jns    8048910 <_ZNK12MaxBenchmark9oop_styleEv+0x20>

3.0
---

 8048910: dd 03                 fldl   (%ebx)
 8048912: 31 d2                 xor    %edx,%edx
 8048914: dd e1                 fucom  %st(1)
 8048916: df e0                 fnstsw %ax
 8048918: 9e                    sahf
 8048919: 0f 97 c2              seta   %dl
 804891c: 85 d2                 test   %edx,%edx
 804891e: 74 04                 je     8048924 <_ZNK12MaxBenchmark9oop_styleEv+0x34>
 8048920: dd d9                 fstp   %st(1)
 8048922: eb 02                 jmp    8048926 <_ZNK12MaxBenchmark9oop_styleEv+0x36>
 8048924: dd d8                 fstp   %st(0)
 8048926: 83 c3 08              add    $0x8,%ebx
 8048929: 49                    dec    %ecx
 804892a: 79 e4                 jns    8048910 <_ZNK12MaxBenchmark9oop_styleEv+0x20>


By the way, recent 3.1 snapshots show the same performance regression
relative to snapshots from a few weeks ago:

3.1 20010902 (experimental)
---------------------------

                         Seconds       Mflops
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----
Max            500000    7.1  19.7   70.1  25.4    2.8


"C-style"
---------

 80488e0: dd 04 d5 60 b2 04 08  fldl   0x804b260(,%edx,8)
 80488e7: dd e1                 fucom  %st(1)
 80488e9: df e0                 fnstsw %ax
 80488eb: 9e                    sahf
 80488ec: 76 22                 jbe    8048910 <_ZNK12MaxBenchmark7c_styleEv+0x40>
 80488ee: dd d9                 fstp   %st(1)
 80488f0: 42                    inc    %edx
 80488f1: 81 fa e7 03 00 00     cmp    $0x3e7,%edx
 80488f7: 7e e7                 jle    80488e0 <_ZNK12MaxBenchmark7c_styleEv+0x10>


"OOP-style"
-----------

 8048930: dd 04 d5 60 b2 04 08  fldl   0x804b260(,%edx,8)
 8048937: dd e1                 fucom  %st(1)
 8048939: df e0                 fnstsw %ax
 804893b: 9e                    sahf
 804893c: 0f 97 c0              seta   %al
 804893f: 83 e0 01              and    $0x1,%eax
 8048942: 74 1c                 je     8048960 <_ZNK12MaxBenchmark9oop_styleEv+0x40>
 8048944: dd d9                 fstp   %st(1)
 8048946: 42                    inc    %edx
 8048947: 81 fa e7 03 00 00     cmp    $0x3e7,%edx
 804894d: 7e e1                 jle    8048930 <_ZNK12MaxBenchmark9oop_styleEv+0x10>



I hope that one of the gcc developers (perhaps Jan Hubicka?) can look
into this disappointing behavior!

Regards,
Paolo Carlini.

