This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
3.0 vs 3.0.1 on oopack's Max
- To: gcc at gcc dot gnu dot org
- Subject: 3.0 vs 3.0.1 on oopack's Max
- From: Paolo Carlini <pcarlini at unitus dot it>
- Date: Fri, 07 Sep 2001 15:41:49 +0200
- CC: jh at suse dot cz, jbuck at synopsys dot com
- Organization: Universita' della Tuscia
- Reply-To: pcarlini at unitus dot it
Hi all,
once more, I'm writing to the list to describe a recent performance
regression :( on a simple benchmark.
On my system (PII-400, 256 MB, glibc 2.2.4, binutils 2.11.2) 3.0.1 produces
much slower code than 3.0 for the Max test of the oopack suite. This test
compares the following two styles (the first "C-style", the second
"OOP-style"):
void MaxBenchmark::c_style() const      // Compute max of vector (C-style)
{
    double max = U[0];
    for( int k=1; k<M; k++ )            // Loop over vector elements
        if( U[k] > max )
            max=U[k];
    MaxResult = max;
}

inline int Greater( double i, double j )
{
    return i>j;
}

void MaxBenchmark::oop_style() const    // Compute max of vector (OOP-style)
{
    double max = U[0];
    for( int k=1; k<M; k++ )            // Loop over vector elements
        if( Greater( U[k], max ) )
            max=U[k];
    MaxResult = max;
}
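For anyone who wants to try the comparison without the full oopack_v1p8.C, here is a minimal stand-alone sketch of the two loops with a simple timer. It is not the oopack harness: the free functions stand in for the MaxBenchmark members, M = 1000 is an assumption inferred from the `cmp $0x3e7` loop bound in the 3.1 listing further down, and fill_input, time_seconds and report are hypothetical helpers with made-up input data.

```cpp
#include <chrono>
#include <cstdio>

// Assumption: M = 1000, inferred from the loop bound 0x3e7 (999) in the 3.1 listing.
const int M = 1000;
double U[M];          // input vector
double MaxResult;     // global sink for the result, as in the original test

inline int Greater(double i, double j) { return i > j; }

// Hypothetical helper: fill U with arbitrary values (a permutation of 0..999).
void fill_input() {
    for (int k = 0; k < M; k++)
        U[k] = (k * 37) % M;
}

void c_style() {                       // max of vector, plain comparison
    double max = U[0];
    for (int k = 1; k < M; k++)
        if (U[k] > max)
            max = U[k];
    MaxResult = max;
}

void oop_style() {                     // max of vector, comparison via Greater()
    double max = U[0];
    for (int k = 1; k < M; k++)
        if (Greater(U[k], max))
            max = U[k];
    MaxResult = max;
}

// Hypothetical helper: run f `iterations` times and return elapsed seconds.
double time_seconds(void (*f)(), long iterations) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; i++)
        f();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Hypothetical helper: print both timings and their ratio (OOP / C).
void report(long iterations) {
    fill_input();
    double c = time_seconds(c_style, iterations);
    double oop = time_seconds(oop_style, iterations);
    std::printf("Max %ld  C %.2f s  OOP %.2f s  ratio %.2f\n",
                iterations, c, oop, oop / c);
}
```

Calling report(500000) from main() would mirror the iteration count used in the tables. Note that a current optimizer may simplify or hoist the repeated calls, so timings from this sketch are only indicative.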
Now, if I compile oopack_v1p8.C with 3.0 and with 3.0.1 on my system
with
g++ -O2 -finline-limit=600 oopack_v1p8.C
this is what I get as run times:
3.0.1
-----
                         Seconds      Mflops
Test       Iterations      C   OOP      C   OOP  Ratio
----       ----------  -----------  -----------  -----
Max            500000    7.1  19.6   70.6  25.4    2.8
3.0
---
                         Seconds      Mflops
Test       Iterations      C   OOP      C   OOP  Ratio
----       ----------  -----------  -----------  -----
Max            500000    7.1   9.7   70.5  51.7    1.4
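As a reading aid: the Ratio column appears to be the OOP time divided by the C time, rounded to one decimal. That is an inference from the numbers reported here, not something checked against the oopack source. A small sketch, with a hypothetical helper name, that reproduces the two ratios from the Seconds columns:

```cpp
#include <cmath>

// Hypothetical helper: OOP seconds over C seconds, rounded to one decimal.
// Assumption: this is how the Ratio column is derived; it matches the
// figures quoted in this message.
double ratio(double c_seconds, double oop_seconds) {
    return std::round(oop_seconds / c_seconds * 10.0) / 10.0;
}
```

With the timings above, ratio(7.1, 19.6) gives 2.8 for 3.0.1 and ratio(7.1, 9.7) gives 1.4 for 3.0, so the OOP-style loop went from about 1.4 times to about 2.8 times the cost of the C-style loop.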
It turns out that both compilers generate the same code for the core loop
over k in the "C-style" case:
80488d0:  dd 02      fldl   (%edx)
80488d2:  dd e1      fucom  %st(1)
80488d4:  df e0      fnstsw %ax
80488d6:  9e         sahf
80488d7:  76 04      jbe    80488dd <_ZNK12MaxBenchmark7c_styleEv+0x2d>
80488d9:  dd d9      fstp   %st(1)
80488db:  eb 02      jmp    80488df <_ZNK12MaxBenchmark7c_styleEv+0x2f>
80488dd:  dd d8      fstp   %st(0)
80488df:  83 c2 08   add    $0x8,%edx
80488e2:  49         dec    %ecx
80488e3:  79 eb      jns    80488d0 <_ZNK12MaxBenchmark7c_styleEv+0x20>
On the other hand, for the "OOP-style" case:
3.0.1
-----
8048910:  dd 01      fldl   (%ecx)
8048912:  dd e1      fucom  %st(1)
8048914:  df e0      fnstsw %ax
8048916:  9e         sahf
8048917:  0f 97 c0   seta   %al
804891a:  83 e0 01   and    $0x1,%eax
804891d:  74 04      je     8048923 <_ZNK12MaxBenchmark9oop_styleEv+0x33>
804891f:  dd d9      fstp   %st(1)
8048921:  eb 02      jmp    8048925 <_ZNK12MaxBenchmark9oop_styleEv+0x35>
8048923:  dd d8      fstp   %st(0)
8048925:  83 c1 08   add    $0x8,%ecx
8048928:  4a         dec    %edx
8048929:  79 e5      jns    8048910 <_ZNK12MaxBenchmark9oop_styleEv+0x20>
3.0
---
8048910:  dd 03      fldl   (%ebx)
8048912:  31 d2      xor    %edx,%edx
8048914:  dd e1      fucom  %st(1)
8048916:  df e0      fnstsw %ax
8048918:  9e         sahf
8048919:  0f 97 c2   seta   %dl
804891c:  85 d2      test   %edx,%edx
804891e:  74 04      je     8048924 <_ZNK12MaxBenchmark9oop_styleEv+0x34>
8048920:  dd d9      fstp   %st(1)
8048922:  eb 02      jmp    8048926 <_ZNK12MaxBenchmark9oop_styleEv+0x36>
8048924:  dd d8      fstp   %st(0)
8048926:  83 c3 08   add    $0x8,%ebx
8048929:  49         dec    %ecx
804892a:  79 e4      jns    8048910 <_ZNK12MaxBenchmark9oop_styleEv+0x20>
By the way, recent 3.1 snapshots show the same performance regression
relative to snapshots from a few weeks ago:
3.1 20010902 (experimental)
---------------------------
                         Seconds      Mflops
Test       Iterations      C   OOP      C   OOP  Ratio
----       ----------  -----------  -----------  -----
Max            500000    7.1  19.7   70.1  25.4    2.8
"C-style"
---------
80488e0:  dd 04 d5 60 b2 04 08   fldl   0x804b260(,%edx,8)
80488e7:  dd e1                  fucom  %st(1)
80488e9:  df e0                  fnstsw %ax
80488eb:  9e                     sahf
80488ec:  76 22                  jbe    8048910 <_ZNK12MaxBenchmark7c_styleEv+0x40>
80488ee:  dd d9                  fstp   %st(1)
80488f0:  42                     inc    %edx
80488f1:  81 fa e7 03 00 00      cmp    $0x3e7,%edx
80488f7:  7e e7                  jle    80488e0 <_ZNK12MaxBenchmark7c_styleEv+0x10>
"OOP-style"
-----------
8048930:  dd 04 d5 60 b2 04 08   fldl   0x804b260(,%edx,8)
8048937:  dd e1                  fucom  %st(1)
8048939:  df e0                  fnstsw %ax
804893b:  9e                     sahf
804893c:  0f 97 c0               seta   %al
804893f:  83 e0 01               and    $0x1,%eax
8048942:  74 1c                  je     8048960 <_ZNK12MaxBenchmark9oop_styleEv+0x40>
8048944:  dd d9                  fstp   %st(1)
8048946:  42                     inc    %edx
8048947:  81 fa e7 03 00 00      cmp    $0x3e7,%edx
804894d:  7e e1                  jle    8048930 <_ZNK12MaxBenchmark9oop_styleEv+0x10>
I hope that some of the gcc developers (perhaps Jan Hubicka?) may take
care of this disappointing behavior!
Regards,
Paolo Carlini.