This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: 3.0 vs 3.0.1 on oopack's Max


I don't know why you would not use at least -march=pentiumpro for
this compilation.  Good performance used to require -ffast-math as
well, but I have noticed that this has become less necessary.  I find
that gcc has been out-performing several well-known commercial
compilers on similar operations, although my source is not identical
to yours.
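
For anyone who wants to try different flags on the kernel in isolation,
here is a minimal standalone sketch of the two loops quoted below.  It
is not the oopack source: the free functions max_c_style and
max_oop_style and the timing helper time_kernel are hypothetical names
introduced only for illustration, and the vector is passed in rather
than being a class member.

```cpp
// Standalone sketch of the two Max kernels; not the oopack source itself.
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Same comparison helper as in the quoted benchmark.
inline int Greater(double i, double j)
{
    return i > j;
}

// "C-style": the comparison is written directly in the if condition.
double max_c_style(const std::vector<double>& u)
{
    assert(!u.empty());
    double max = u[0];
    for (std::size_t k = 1; k < u.size(); k++)
        if (u[k] > max)
            max = u[k];
    return max;
}

// "OOP-style": same loop, but the comparison goes through Greater().
double max_oop_style(const std::vector<double>& u)
{
    assert(!u.empty());
    double max = u[0];
    for (std::size_t k = 1; k < u.size(); k++)
        if (Greater(u[k], max))
            max = u[k];
    return max;
}

// Hypothetical timing helper: CPU seconds spent in `iterations` calls of f.
double time_kernel(double (*f)(const std::vector<double>&),
                   const std::vector<double>& u, long iterations)
{
    volatile double sink = 0.0;  // volatile store keeps each call's result live
    std::clock_t start = std::clock();
    for (long i = 0; i < iterations; i++)
        sink = f(u);
    return double(std::clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_kernel(max_c_style, u, n) and time_kernel(max_oop_style, u, n)
on the same vector, once built with the quoted -O2 -finline-limit=600
and once with -march=pentiumpro added, should show whether the ratio
moves.  The numbers will of course depend on the compiler and machine.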
----- Original Message -----
From: "Paolo Carlini" <pcarlini@unitus.it>
To: <gcc@gcc.gnu.org>
Cc: <jh@suse.cz>; <jbuck@synopsys.com>
Sent: Friday, September 07, 2001 6:41 AM
Subject: 3.0 vs 3.0.1 on oopack's Max


> Hi all,
>
> once more, I'm writing to the list to describe a recent performance
> regression :( on a simple benchmark.
>
> On my system (PII-400, 256 M, glibc2.2.4, binutils2.11.2) 3.0.1 produces
> code much slower than 3.0 for the Max test of the oopack suite. In this
> test the following two styles (the first "C-style", the second
> "OOP-style") are compared:
>
> void MaxBenchmark::c_style() const  // Compute max of vector (C-style)
> {
>     double max = U[0];
>     for( int k=1; k<M; k++ )   // Loop over vector elements
>         if( U[k] > max )
>             max=U[k];
>     MaxResult = max;
> }
>
> inline int Greater( double i, double j )
> {
>     return i>j;
> }
>
> void MaxBenchmark::oop_style() const   // Compute max of vector (OOP-style)
> {
>     double max = U[0];
>     for( int k=1; k<M; k++ )   // Loop over vector elements
>         if( Greater( U[k], max ) )
>             max=U[k];
>     MaxResult = max;
> }
>
> Now, if I compile oopack_v1p8.C with 3.0 and with 3.0.1 on my system
> with
>
>     g++ -O2 -finline-limit=600 oopack_v1p8.C
>
> this is what I get as run times:
>
> 3.0.1
> -----
>                          Seconds       Mflops
> Test       Iterations     C    OOP     C    OOP  Ratio
> ----       ----------  -----------  -----------  -----
> Max            500000    7.1  19.6   70.6  25.4    2.8
>
>
> 3.0
> ---
>                          Seconds       Mflops
> Test       Iterations     C    OOP     C    OOP  Ratio
> ----       ----------  -----------  -----------  -----
> Max            500000    7.1   9.7   70.5  51.7    1.4
>
>
> It turns out that the core loop over k is compiled in the same way for
> the "C-style" case by both the compilers:
>
>  80488d0: dd 02                 fldl   (%edx)
>  80488d2: dd e1                 fucom  %st(1)
>  80488d4: df e0                 fnstsw %ax
>  80488d6: 9e                    sahf
>  80488d7: 76 04                 jbe    80488dd <_ZNK12MaxBenchmark7c_styleEv+0x2d>
>  80488d9: dd d9                 fstp   %st(1)
>  80488db: eb 02                 jmp    80488df <_ZNK12MaxBenchmark7c_styleEv+0x2f>
>  80488dd: dd d8                 fstp   %st(0)
>  80488df: 83 c2 08              add    $0x8,%edx
>  80488e2: 49                    dec    %ecx
>  80488e3: 79 eb                 jns    80488d0 <_ZNK12MaxBenchmark7c_styleEv+0x20>
>
>
> On the other hand, for the "OOP-style" case:
>
> 3.0.1
> -----
>
>  8048910: dd 01                 fldl   (%ecx)
>  8048912: dd e1                 fucom  %st(1)
>  8048914: df e0                 fnstsw %ax
>  8048916: 9e                    sahf
>  8048917: 0f 97 c0              seta   %al
>  804891a: 83 e0 01              and    $0x1,%eax
>  804891d: 74 04                 je     8048923 <_ZNK12MaxBenchmark9oop_styleEv+0x33>
>  804891f: dd d9                 fstp   %st(1)
>  8048921: eb 02                 jmp    8048925 <_ZNK12MaxBenchmark9oop_styleEv+0x35>
>  8048923: dd d8                 fstp   %st(0)
>  8048925: 83 c1 08              add    $0x8,%ecx
>  8048928: 4a                    dec    %edx
>  8048929: 79 e5                 jns    8048910 <_ZNK12MaxBenchmark9oop_styleEv+0x20>
>
> 3.0
> ---
>
>  8048910: dd 03                 fldl   (%ebx)
>  8048912: 31 d2                 xor    %edx,%edx
>  8048914: dd e1                 fucom  %st(1)
>  8048916: df e0                 fnstsw %ax
>  8048918: 9e                    sahf
>  8048919: 0f 97 c2              seta   %dl
>  804891c: 85 d2                 test   %edx,%edx
>  804891e: 74 04                 je     8048924 <_ZNK12MaxBenchmark9oop_styleEv+0x34>
>  8048920: dd d9                 fstp   %st(1)
>  8048922: eb 02                 jmp    8048926 <_ZNK12MaxBenchmark9oop_styleEv+0x36>
>  8048924: dd d8                 fstp   %st(0)
>  8048926: 83 c3 08              add    $0x8,%ebx
>  8048929: 49                    dec    %ecx
>  804892a: 79 e4                 jns    8048910 <_ZNK12MaxBenchmark9oop_styleEv+0x20>
>
>
> By the way, the same performance regression with respect to some weeks
> ago happens for recent 3.1 snapshots:
>
> 3.1 20010902 (experimental)
> ---------------------------
>
>                          Seconds       Mflops
> Test       Iterations     C    OOP     C    OOP  Ratio
> ----       ----------  -----------  -----------  -----
> Max            500000    7.1  19.7   70.1  25.4    2.8
>
>
> "C-style"
> ---------
>
>  80488e0: dd 04 d5 60 b2 04 08  fldl   0x804b260(,%edx,8)
>  80488e7: dd e1                 fucom  %st(1)
>  80488e9: df e0                 fnstsw %ax
>  80488eb: 9e                    sahf
>  80488ec: 76 22                 jbe    8048910 <_ZNK12MaxBenchmark7c_styleEv+0x40>
>  80488ee: dd d9                 fstp   %st(1)
>  80488f0: 42                    inc    %edx
>  80488f1: 81 fa e7 03 00 00     cmp    $0x3e7,%edx
>  80488f7: 7e e7                 jle    80488e0 <_ZNK12MaxBenchmark7c_styleEv+0x10>
>
>
> "OOP-style"
> -----------
>
>  8048930: dd 04 d5 60 b2 04 08  fldl   0x804b260(,%edx,8)
>  8048937: dd e1                 fucom  %st(1)
>  8048939: df e0                 fnstsw %ax
>  804893b: 9e                    sahf
>  804893c: 0f 97 c0              seta   %al
>  804893f: 83 e0 01              and    $0x1,%eax
>  8048942: 74 1c                 je     8048960 <_ZNK12MaxBenchmark9oop_styleEv+0x40>
>  8048944: dd d9                 fstp   %st(1)
>  8048946: 42                    inc    %edx
>  8048947: 81 fa e7 03 00 00     cmp    $0x3e7,%edx
>  804894d: 7e e1                 jle    8048930 <_ZNK12MaxBenchmark9oop_styleEv+0x10>
>
>
>
> I hope that some of the gcc developers (perhaps Jan Hubicka?) may take
> care of this disappointing behavior!
>
> Regards,
> Paolo Carlini.
>

