Bug 13712 - Executable runs 25% slower than when compiled with INTEL compiler
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 3.3.2
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2004-01-16 21:11 UTC by Bill Crocker
Modified: 2006-02-27 14:22 UTC

See Also:
Host:
Target: i686-pc-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-04-24 14:18:13


Attachments
Compressed TAR file. (24.26 KB, application/octet-stream)
2004-01-29 13:27 UTC, Bill Crocker

Description Bill Crocker 2004-01-16 21:11:31 UTC
Keep up the good work BUT
When I compile my (very large number crunching) program
with the INTEL C++ compiler (icc,v7.0) it runs 20 to 30% faster.
On INTEL I use a single compilation pass with no "special" options, just -O3.
With GCC(3.3.2) I tried all sorts of options, but could not get any closer.
I would really like to continue using GCC,
but 25% is hard to pass up.
I used no special options when GCC was built and installed.

Bill
Comment 1 Andrew Pinski 2004-01-16 22:54:51 UTC
We need a testcase.
Comment 2 Bill Crocker 2004-01-22 19:42:12 UTC
I can probably put together a test case but I don't think you
are going to like it. (I tried to make a small test case, but it
did not demonstrate the problem.)

My app is 750K LOC.
gprof shows that a single 2K line function accounts
for most (if not all) of the 25% difference in execution speed.
This function is primarily floating point operations
with a little control logic.

I can try to carve this function out
and see if it demonstrates the problem.

Would that be satisfactory ?

Bill

>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |WAITING
>            Keywords|                            |pessimizes-code
Comment 3 Andrew Pinski 2004-01-22 19:45:41 UTC
No, we do not care how bad the code looks; we can always reduce it.
Comment 4 Wolfgang Bangerth 2004-01-22 19:48:02 UTC
That would be a good first step. The testcase doesn't have to be particularly
small as a first attempt, but it should be self-contained. The testcase also
doesn't have to do anything particularly useful.
  
W. 
Comment 5 Bill Crocker 2004-01-29 13:10:50 UTC
I sent in my test case, but the bugzilla data base implies that
you are still "waiting".

Did you get my test case or should I send it again ?

Do you have any email filters which would have rejected
my email ?

Bill
  
Comment 6 falk.hueffner 2004-01-29 13:16:33 UTC
"william dot crocker at analog dot com" <gcc-bugzilla@gcc.gnu.org> writes:

> I sent in my test case, but the bugzilla data base implies that
> you are still "waiting".
> 
> Did you get my test case or should I send it again ?

I don't see any attachments. Could you try again using the "Create a
New Attachment" link at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13712
?

Comment 7 Bill Crocker 2004-01-29 13:27:32 UTC
Created attachment 5608
Compressed TAR file.

This example shows a case where code compiled by the
INTEL compiler (icc 7.1) runs 35% faster than when compiled
with gcc 3.3.2.

I'm sorry the example is so large, but attempts to
produce a smaller example failed. The example has also
been obfuscated for proprietary reasons.

I used -O3 with gcc because that produced the fastest executable.

-------------------------------------------------------------------

>make

###### USING GCC ######

`/home/whc/lintel/gcc/usr_local/bin/g++ -v`
Reading specs from /home/whc/lintel/gcc/usr_local/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/specs
Configured with: ./configure --prefix=/home/whc/lintel/gcc/usr_local --exec-prefix=/home/whc/lintel/gcc/usr_local
Thread model: posix
gcc version 3.3.2
/home/whc/lintel/gcc/usr_local/bin/g++ -O3 -c -Wno-deprecated y.cc
ls -ls y.o
  46 -rw-r--r--    1 whc      cad	  46272 Jan 26 12:48 y.o
/home/whc/lintel/gcc/usr_local/bin/g++ -O3 -o y y.o
/bin/csh -c "(setenv LD_LIBRARY_PATH /home/whc/lintel/gcc/usr_local/lib; time y)"
X422 = 469.196
39.490u 0.000s 0:39.57 99.7%	0+0k 0+0io 236pf+0w

###### USING ICC ######

/dcad/apps/intel/compiler70/ia32/bin/icc -V -c -O y.cc
Intel(R) C++ Compiler for 32-bit applications, Version 7.1   Build 20030307Z
Copyright (C) 1985-2003 Intel Corporation.  All rights reserved.

Edison Design Group C/C++ Front End, version 3.0 (Mar  8 2003 18:39:53)
Copyright 1988-2002 Edison Design Group, Inc.

ls -ls y.o
 144 -rw-r--r--    1 whc      cad	 135824 Jan 26 12:49 y.o
/dcad/apps/intel/compiler70/ia32/bin/icc -o y y.o
/bin/csh -c "(setenv LD_LIBRARY_PATH /dcad/apps/intel/compiler70/ia32/lib; time y)"
X422 = 469.196
25.650u 0.000s 0:25.65 100.0%	0+0k 0+0io 173pf+0w
Comment 8 Bill Crocker 2004-01-29 13:29:13 UTC
I just attached a compressed (.Z) tar file.

Bill

Comment 9 Falk Hueffner 2004-01-29 16:01:31 UTC
Thanks for the test case. I see it's using FP math heavily. Could you also
try the options -ffast-math and -mfpmath=sse? AFAIK the Intel compiler does
the equivalent of these switches by default so they are needed for a fair
comparison.
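
To illustrate why -ffast-math changes what the optimizer may do: floating-point addition is not associative, so without that switch GCC has to keep the evaluation order written in the source. The following is only a minimal sketch; the function names are made up for illustration and are not taken from the attached test case.

```cpp
// Adds three doubles left to right: (a + b) + c.
double sum_left(double a, double b, double c) {
    volatile double t = a + b;  // volatile forces the intermediate to be rounded and stored as a double
    return t + c;
}

// Adds the same three doubles right to left: a + (b + c).
double sum_right(double a, double b, double c) {
    volatile double t = b + c;  // same trick, so neither variant can be folded or reordered
    return a + t;
}
```

Calling both with `1e16, -1e16, 1.0` gives different answers, because `-1e16 + 1.0` is not representable as a double and rounds back to `-1e16`. With -ffast-math the compiler is allowed to treat the two orders as interchangeable, which opens up more scheduling freedom at the cost of such differences.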
Comment 10 Bill Crocker 2004-01-29 16:27:49 UTC
> 
> ------- Additional Comments From falk at debian dot org  2004-01-29 16:01 -------
> Thanks for the test case. I see it's using FP math heavily. Could you also
> try the options -ffast-math and -mfpmath=sse? AFAIK the Intel compiler does
> the equivalent of these switches by default so they are needed for a fair
> comparison.
>

Your email suggests that I should put something (?) after the sse, but the
online GCC documentation does not show this.

I get this when compiling.

>/home/whc/lintel/gcc/usr_local/bin/g++ -O3 -c -Wno-deprecated -ffast-math -mfpmath=sse y.cc
cc1plus: warning: SSE instruction set disabled, using 387 arithmetics

I also tried -msse and -msse2, but got no better than 30 seconds.

ANYWAY it runs and I get the following, which now shows INTEL with only
a 22% advantage versus the previous 35%.

Bill

###### USING GCC ######

`/home/whc/lintel/gcc/usr_local/bin/g++ -v`
Reading specs from /home/whc/lintel/gcc/usr_local/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/specs
Configured with: ./configure --prefix=/home/whc/lintel/gcc/usr_local --exec-prefix=/home/whc/lintel/gcc/usr_local
Thread model: posix
gcc version 3.3.2
/home/whc/lintel/gcc/usr_local/bin/g++ -O3 -c -Wno-deprecated -ffast-math -mfpmath=sse y.cc
cc1plus: warning: SSE instruction set disabled, using 387 arithmetics
ls -ls y.o
  46 -rw-r--r--    1 whc      cad         46676 Jan 29 11:05 y.o
/home/whc/lintel/gcc/usr_local/bin/g++ -O3 -o y y.o
/bin/csh -c "(setenv LD_LIBRARY_PATH /home/whc/lintel/gcc/usr_local/lib; time y)"
X422 = 469.196
30.669u 0.007s 0:31.04 98.7%	0+0k 0+0io 235pf+0w

###### USING ICC ######

/dcad/apps/intel/compiler70/ia32/bin/icc -V -c -O y.cc
Intel(R) C++ Compiler for 32-bit applications, Version 7.1   Build 20030307Z
Copyright (C) 1985-2003 Intel Corporation.  All rights reserved.

Edison Design Group C/C++ Front End, version 3.0 (Mar  8 2003 18:39:53)
Copyright 1988-2002 Edison Design Group, Inc.

ls -ls y.o
 144 -rw-r--r--    1 whc      cad        135824 Jan 29 11:06 y.o
/dcad/apps/intel/compiler70/ia32/bin/icc -o y y.o
/bin/csh -c "(setenv LD_LIBRARY_PATH /dcad/apps/intel/compiler70/ia32/lib; time y)"
X422 = 469.196
23.871u 0.011s 0:24.13 98.9%	0+0k 0+0io 178pf+0w

>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>   GCC build triplet|???                         |
>    GCC host triplet|DELL, Pentium4, Linux RedHat|
>                    |7.3                         |
>  GCC target triplet|???                         |i686-pc-linux-gnu
Comment 11 Falk Hueffner 2004-01-29 16:48:21 UTC
I've quickly hacked the test case to compile with a C compiler, to compare
gcc against the DEC compiler. I replaced exp(x) with x*x, since exp() is a
libcall and not under the control of the compiler. Then I get on an 800MHz
ev68:

gcc 3.4: 46.18s (-O3 -ffast-math -mcpu=ev67)
DEC C:   30.02s (-O6 -fast)

So there's also something to gain on other platforms...
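
The substitution described above (swapping a library call for plain arithmetic so that only compiler-generated code is measured) might be set up along these lines. This is a sketch under assumptions: `run_kernel` is a hypothetical stand-in for the hot loop, not the attached test case, and the timing uses std::chrono rather than the shell's `time`.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the hot loop: accumulates f(x) over slowly varying x.
template <typename F>
double run_kernel(F f, std::size_t iterations) {
    double acc = 0.0;
    double x = 0.5;
    for (std::size_t i = 0; i < iterations; ++i) {
        acc += f(x);
        x += 1e-9;  // vary the input so the call cannot be hoisted out of the loop
    }
    return acc;
}

// Runs the kernel once, stores its result, and returns the elapsed seconds.
template <typename F>
double time_kernel(F f, std::size_t iterations, double* result) {
    auto start = std::chrono::steady_clock::now();
    *result = run_kernel(f, iterations);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// The two variants being compared: the library call and its arithmetic substitute.
inline double with_libcall(double x) { return std::exp(x); }
inline double without_libcall(double x) { return x * x; }
```

Timing `time_kernel(with_libcall, n, &r)` against `time_kernel(without_libcall, n, &r)` for the same `n` gives a rough idea of how much of the run time belongs to the math library rather than to the compiler's own code.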
Comment 12 Bill Crocker 2004-02-03 21:34:23 UTC
GCC Team:

> 
> ANYWAY it runs and I get the following, which now shows INTEL with only
> a 22% advantage versus the previous 35%.
> 

I assume I will not see any changes to GCC to improve this
situation in the immediate future.

I guess I need to change my code to simulate the optimizations
performed by the INTEL compiler and NOT performed by GCC.

Do you have any suggestions (or references to suggestions)
as to what these changes might be, so that I can make my program
run faster now?

Bill

Comment 13 Uroš Bizjak 2004-02-16 14:55:06 UTC
I have done some experiments with CVS gcc-3.5 [3.5.0 20040216 (experimental)]
and gcc-3.2 [3.2 20020903 (Red Hat Linux 8.0 3.2-7)] on P4 3.2GHz, using various
options.

gcc-3.2: g++ -O3
X422 = 469.196

real    0m35.219s
user    0m35.143s
sys     0m0.061s

gcc-3.2: g++ -march=i686 -O3 -ffast-math
X422 = 469.196

real    0m29.439s
user    0m29.410s
sys     0m0.021s

gcc-3.5: g++ -march=i686 -O3 -ffast-math
X422 = 469.196

real    0m26.380s
user    0m26.287s
sys     0m0.004s

gcc-3.5: g++ -march=i686 -msse2 -mfpmath=sse -O3 -ffast-math
X422 = 469.196

real    0m26.591s
user    0m26.359s
sys     0m0.059s

gcc-3.5: g++ -march=i686 -O3 -ffast-math,
with all functions in source changed to __builtin_<function>:
HUGE_VAL => __builtin_huge_val()
sqrt() => __builtin_sqrt()
log() => __builtin_log()
exp() => __builtin_exp()
X422 = 469.196

real    0m23.145s
user    0m23.115s
sys     0m0.018s
Comment 14 Steven Bosscher 2005-01-23 18:39:48 UTC
Where do we stand with this one today?
 
Comment 15 Uroš Bizjak 2005-01-24 11:16:04 UTC
(In reply to comment #14)
> Where do we stand with this one today?

gcc version 4.0.0 20050124 (experimental)

g++ -O3 -ffast-math y.cc
real    0m27.102s
user    0m26.980s
sys     0m0.016s

g++ -O3 -ffast-math -D__NO_MATH_INLINES y.cc
real    0m23.484s
user    0m23.307s
sys     0m0.076s

g++ -O3 -march=pentium4 -ffast-math -D__NO_MATH_INLINES y.cc
real    0m23.101s
user    0m23.014s
sys     0m0.078s

g++ -O3 -march=pentium4 -mfpmath=sse -ffast-math -D__NO_MATH_INLINES y.cc
real    0m31.650s
user    0m31.605s
sys     0m0.025s

g++ -O3 -march=pentium4 -mfpmath=sse -ffast-math y.cc
real    0m29.068s
user    0m28.863s
sys     0m0.023s

g++ -O3 -march=pentium4 -mfpmath=sse y.cc
real    0m35.343s
user    0m34.848s
sys     0m0.047s

g++ -O3 -march=pentium4 -mfpmath=sse -ffast-math -mno-80387 -D__NO_MATH_INLINES y.cc
*** FAILED: X422 = nan ***
real    2m56.700s
user    2m55.615s
sys     0m0.145s

g++ -O3 -march=pentium4 -mfpmath=sse -mno-80387 -D__NO_MATH_INLINES y.cc
*** TIMEOUT AFTER 3min ***

-mfpmath=sse runs a bit slow.
-mno-80387 IMHO generates wrong code.
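
The -D__NO_MATH_INLINES switch used in the runs above can also be written into the source. A minimal sketch, assuming a glibc whose math header still provides inline definitions (other versions may ignore the macro entirely); `sample_expression` is a hypothetical stand-in for the math-heavy code in the test case.

```cpp
// Must come before the first include of <cmath> or <math.h> to have any effect.
// Equivalent in intent to passing -D__NO_MATH_INLINES on the command line.
#ifndef __NO_MATH_INLINES
#define __NO_MATH_INLINES 1
#endif
#include <cmath>

// Hypothetical stand-in for the test case's floating-point work.
double sample_expression(double x) {
    return std::exp(x) + std::log(x) + std::sqrt(x);
}
```

The computed values are the same either way; the macro is only meant to stop the header from supplying its own inline versions, leaving the expansion of these calls to the compiler.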
Comment 16 Steven Bosscher 2005-05-07 15:27:18 UTC
Looks like -D__NO_MATH_INLINES makes the code GCC produces much faster...
Comment 17 Steven Bosscher 2006-02-27 14:22:16 UTC
For -D__NO_MATH_INLINES we're probably not going to make any progress as long as Uli is the glibc maintainer.

Other than that, this appears to be fixed.  Note that ICC has -ffast-math and SSE as the defaults, whereas GCC opts for safe math and code that works on any ix86 CPU, not just the ones with SSE.  So if there is still a significant difference, it is as much philosophical as it is in code generation.

Given the right set of options, GCC can compete with ICC on my Pentium4 box, and on Uros' box.  So there doesn't seem to be a good reason to keep this report open.
Comment 18 Steven Bosscher 2006-02-27 14:22:35 UTC
.