This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: 3.0 vs 3.0.1 on oopack's Max
- To: <pcarlini at unitus dot it>, <gcc at gcc dot gnu dot org>
- Subject: Re: 3.0 vs 3.0.1 on oopack's Max
- From: "Tim Prince" <tprince at computer dot org>
- Date: Fri, 7 Sep 2001 10:14:46 -0700
- Cc: <jh at suse dot cz>, <jbuck at synopsys dot com>
- References: <3B98CE9D.AE04A5B7@unitus.it>
I don't know why you would not use at least -march=pentiumpro for
this compilation. Loops like this used to need -ffast-math as well
for good performance, but I have noticed that it has become less
necessary. I find that gcc has been out-performing several
well-known commercial compilers on similar operations, although my
source is not identical to yours.
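
For anyone who wants to try the flags without the whole suite, here is
a minimal sketch of the two loops from the quoted message, cut down to
free functions. It is not taken from oopack_v1p8.C: the file name
max_test.cpp, the function names max_c_style, max_oop_style,
fill_vector and check_styles_agree, and the fill pattern are all
made up for illustration. M = 1000 is inferred from the
"cmp $0x3e7,%edx" bound in the quoted 3.1 listing.

```cpp
// Hypothetical reduction of the oopack "Max" test (not the original source).
//
// Builds to compare on an ia32 target such as the PII in this thread:
//   g++ -O2 -finline-limit=600 max_test.cpp
//   g++ -O2 -march=pentiumpro -finline-limit=600 max_test.cpp
//   g++ -O2 -march=pentiumpro -ffast-math -finline-limit=600 max_test.cpp
#include <cassert>

const int M = 1000;   // assumed vector length (0x3e7 + 1)
static double U[M];   // vector scanned by both loops

// Comparison helper, as in the quoted message.
inline int Greater(double i, double j)
{
    return i > j;
}

// "C-style": the comparison is written directly in the loop.
double max_c_style()
{
    double max = U[0];
    for (int k = 1; k < M; k++)
        if (U[k] > max)
            max = U[k];
    return max;
}

// "OOP-style": the same comparison goes through the inline helper.
double max_oop_style()
{
    double max = U[0];
    for (int k = 1; k < M; k++)
        if (Greater(U[k], max))
            max = U[k];
    return max;
}

// Illustrative fill: 37 and 1000 are coprime, so this stores each of
// the values 0..999 exactly once, in scrambled order.
void fill_vector()
{
    for (int k = 0; k < M; k++)
        U[k] = (k * 37) % M;
}

// Sanity check: both styles must find the same maximum.
void check_styles_agree()
{
    fill_vector();
    assert(max_c_style() == max_oop_style());
}
```

Wrapping the two functions in a timing loop and running objdump -d on
each build should show whether the seta/and/je sequence is still
emitted for the helper version.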
----- Original Message -----
From: "Paolo Carlini" <pcarlini@unitus.it>
To: <gcc@gcc.gnu.org>
Cc: <jh@suse.cz>; <jbuck@synopsys.com>
Sent: Friday, September 07, 2001 6:41 AM
Subject: 3.0 vs 3.0.1 on oopack's Max
> Hi all,
>
> once more, I'm writing to the list to describe a recent performance
> regression :( on a simple benchmark.
>
> On my system (PII-400, 256 M, glibc2.2.4, binutils2.11.2) 3.0.1 produces
> code much slower than 3.0 for the Max test of the oopack suite. In this
> test the following two styles (the first "C-style", the second
> "OOP-style") are compared:
>
> void MaxBenchmark::c_style() const // Compute max of vector (C-style)
> {
>     double max = U[0];
>     for( int k=1; k<M; k++ ) // Loop over vector elements
>         if( U[k] > max )
>             max=U[k];
>     MaxResult = max;
> }
>
> inline int Greater( double i, double j )
> {
>     return i>j;
> }
>
> void MaxBenchmark::oop_style() const // Compute max of vector (OOP-style)
> {
>     double max = U[0];
>     for( int k=1; k<M; k++ ) // Loop over vector elements
>         if( Greater( U[k], max ) )
>             max=U[k];
>     MaxResult = max;
> }
>
> Now, if I compile oopack_v1p8.C with 3.0 and with 3.0.1 on my system
> with
>
> g++ -O2 -finline-limit=600 oopack_v1p8.C
>
> this is what I get as run times:
>
> 3.0.1
> -----
>                     Seconds      Mflops
> Test  Iterations      C   OOP      C   OOP  Ratio
> ----  ----------  -----------  -----------  -----
> Max       500000    7.1  19.6   70.6  25.4    2.8
>
>
> 3.0
> ---
>                     Seconds      Mflops
> Test  Iterations      C   OOP      C   OOP  Ratio
> ----  ----------  -----------  -----------  -----
> Max       500000    7.1   9.7   70.5  51.7    1.4
>
>
> It turns out that the core loop over k is compiled in the same way for
> the "C-style" case by both the compilers:
>
> 80488d0:  dd 02     fldl   (%edx)
> 80488d2:  dd e1     fucom  %st(1)
> 80488d4:  df e0     fnstsw %ax
> 80488d6:  9e        sahf
> 80488d7:  76 04     jbe    80488dd <_ZNK12MaxBenchmark7c_styleEv+0x2d>
> 80488d9:  dd d9     fstp   %st(1)
> 80488db:  eb 02     jmp    80488df <_ZNK12MaxBenchmark7c_styleEv+0x2f>
> 80488dd:  dd d8     fstp   %st(0)
> 80488df:  83 c2 08  add    $0x8,%edx
> 80488e2:  49        dec    %ecx
> 80488e3:  79 eb     jns    80488d0 <_ZNK12MaxBenchmark7c_styleEv+0x20>
>
>
> On the other hand, for the "OOP-style" case:
>
> 3.0.1
> -----
>
> 8048910:  dd 01     fldl   (%ecx)
> 8048912:  dd e1     fucom  %st(1)
> 8048914:  df e0     fnstsw %ax
> 8048916:  9e        sahf
> 8048917:  0f 97 c0  seta   %al
> 804891a:  83 e0 01  and    $0x1,%eax
> 804891d:  74 04     je     8048923 <_ZNK12MaxBenchmark9oop_styleEv+0x33>
> 804891f:  dd d9     fstp   %st(1)
> 8048921:  eb 02     jmp    8048925 <_ZNK12MaxBenchmark9oop_styleEv+0x35>
> 8048923:  dd d8     fstp   %st(0)
> 8048925:  83 c1 08  add    $0x8,%ecx
> 8048928:  4a        dec    %edx
> 8048929:  79 e5     jns    8048910 <_ZNK12MaxBenchmark9oop_styleEv+0x20>
>
> 3.0
> ---
>
> 8048910:  dd 03     fldl   (%ebx)
> 8048912:  31 d2     xor    %edx,%edx
> 8048914:  dd e1     fucom  %st(1)
> 8048916:  df e0     fnstsw %ax
> 8048918:  9e        sahf
> 8048919:  0f 97 c2  seta   %dl
> 804891c:  85 d2     test   %edx,%edx
> 804891e:  74 04     je     8048924 <_ZNK12MaxBenchmark9oop_styleEv+0x34>
> 8048920:  dd d9     fstp   %st(1)
> 8048922:  eb 02     jmp    8048926 <_ZNK12MaxBenchmark9oop_styleEv+0x36>
> 8048924:  dd d8     fstp   %st(0)
> 8048926:  83 c3 08  add    $0x8,%ebx
> 8048929:  49        dec    %ecx
> 804892a:  79 e4     jns    8048910 <_ZNK12MaxBenchmark9oop_styleEv+0x20>
>
>
> By the way, the same performance regression with respect to some weeks
> ago happens for recent 3.1 snapshots:
>
> 3.1 20010902 (experimental)
> ---------------------------
>
>                     Seconds      Mflops
> Test  Iterations      C   OOP      C   OOP  Ratio
> ----  ----------  -----------  -----------  -----
> Max       500000    7.1  19.7   70.1  25.4    2.8
>
>
> "C-style"
> ---------
>
> 80488e0:  dd 04 d5 60 b2 04 08  fldl   0x804b260(,%edx,8)
> 80488e7:  dd e1                 fucom  %st(1)
> 80488e9:  df e0                 fnstsw %ax
> 80488eb:  9e                    sahf
> 80488ec:  76 22                 jbe    8048910 <_ZNK12MaxBenchmark7c_styleEv+0x40>
> 80488ee:  dd d9                 fstp   %st(1)
> 80488f0:  42                    inc    %edx
> 80488f1:  81 fa e7 03 00 00     cmp    $0x3e7,%edx
> 80488f7:  7e e7                 jle    80488e0 <_ZNK12MaxBenchmark7c_styleEv+0x10>
>
>
> "OOP-style"
> -----------
>
> 8048930:  dd 04 d5 60 b2 04 08  fldl   0x804b260(,%edx,8)
> 8048937:  dd e1                 fucom  %st(1)
> 8048939:  df e0                 fnstsw %ax
> 804893b:  9e                    sahf
> 804893c:  0f 97 c0              seta   %al
> 804893f:  83 e0 01              and    $0x1,%eax
> 8048942:  74 1c                 je     8048960 <_ZNK12MaxBenchmark9oop_styleEv+0x40>
> 8048944:  dd d9                 fstp   %st(1)
> 8048946:  42                    inc    %edx
> 8048947:  81 fa e7 03 00 00     cmp    $0x3e7,%edx
> 804894d:  7e e1                 jle    8048930 <_ZNK12MaxBenchmark9oop_styleEv+0x10>
>
>
>
> I hope that some of the gcc developers (perhaps Jan Hubicka?) may take
> care of this disappointing behavior!
>
> Regards,
> Paolo Carlini.
>