Keep up the good work, BUT when I compile my (very large, number-crunching) program with the INTEL C++ compiler (icc, v7.0), it runs 20 to 30% faster. On INTEL I use a single compilation pass with no "special" options, just -O3. With GCC (3.3.2) I tried all sorts of options, but could not get any closer. I would really like to continue using GCC, but 25% is hard to pass up. I used no special options when GCC was built and installed.

Bill
We need a testcase?
Subject: Re: Executable runs 25% slower than when compiled with INTEL compiler

> ------- Additional Comments From pinskia at gcc dot gnu dot org 2004-01-16 22:54 -------
> We need a testcase?

I can probably put together a test case, but I don't think you are going to like it. (I tried to make a small test case, but it did not demonstrate the problem.)

My app is 750K LOC. gprof shows that a single 2K-line function accounts for most (if not all) of the 25% difference in execution speed. This function is primarily floating-point operations with a little control logic.

I can try to carve this function out and see if it demonstrates the problem. Would that be satisfactory?

Bill

>     What |Removed     |Added
> ----------------------------------------------
>   Status |UNCONFIRMED |WAITING
> Keywords |            |pessimizes-code
No, we do not care how bad the code looks; we can always reduce it.
That would be a good first step. The testcase doesn't have to be particularly small as a first attempt, but it should be self-contained. The testcase also doesn't have to do something particularly useful.

W.
Subject: Re: Executable runs 25% slower than when compiled with INTEL compiler

> ------- Additional Comments From bangerth at dealii dot org 2004-01-22 19:48 -------
> That would be a good first step. The testcase doesn't have to be particularly
> small as a first attempt, but it should be selfcontained. The testcase also
> doesn't have to do something particularly useful.

I sent in my test case, but the bugzilla database implies that you are still "waiting".

Did you get my test case, or should I send it again? Do you have any email filters which would have rejected my email?

Bill
Subject: Re: Executable runs 25% slower than when compiled with INTEL compiler

"william dot crocker at analog dot com" <gcc-bugzilla@gcc.gnu.org> writes:

> I sent in my test case, but the bugzilla data base implies that
> you are still "waiting".
>
> Did you get my test case or should I send it again ?

I don't see any attachments. Could you try again using the "Create a New Attachment" link at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13712 ?
Created attachment 5608
Compressed TAR file.

This example shows a case where code compiled by the INTEL compiler (icc 7.1) runs 35% faster than when compiled with gcc 3.3.2.

I'm sorry the example is so large, but attempts to produce a smaller example failed. The example has also been obfuscated for proprietary reasons.

I used -O3 with gcc because that produced the fastest executable.

-------------------------------------------------------------------
>make

###### USING GCC ######
`/home/whc/lintel/gcc/usr_local/bin/g++ -v`
Reading specs from /home/whc/lintel/gcc/usr_local/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/specs
Configured with: ./configure --prefix=/home/whc/lintel/gcc/usr_local --exec-prefix=/home/whc/lintel/gcc/usr_local
Thread model: posix
gcc version 3.3.2
/home/whc/lintel/gcc/usr_local/bin/g++ -O3 -c -Wno-deprecated y.cc
ls -ls y.o
  46 -rw-r--r--    1 whc      cad         46272 Jan 26 12:48 y.o
/home/whc/lintel/gcc/usr_local/bin/g++ -O3 -o y y.o
/bin/csh -c "(setenv LD_LIBRARY_PATH /home/whc/lintel/gcc/usr_local/lib; time y)"
X422 = 469.196
39.490u 0.000s 0:39.57 99.7% 0+0k 0+0io 236pf+0w

###### USING ICC ######
/dcad/apps/intel/compiler70/ia32/bin/icc -V -c -O y.cc
Intel(R) C++ Compiler for 32-bit applications, Version 7.1 Build 20030307Z
Copyright (C) 1985-2003 Intel Corporation. All rights reserved.
Edison Design Group C/C++ Front End, version 3.0 (Mar 8 2003 18:39:53)
Copyright 1988-2002 Edison Design Group, Inc.
ls -ls y.o
 144 -rw-r--r--    1 whc      cad        135824 Jan 26 12:49 y.o
/dcad/apps/intel/compiler70/ia32/bin/icc -o y y.o
/bin/csh -c "(setenv LD_LIBRARY_PATH /dcad/apps/intel/compiler70/ia32/lib; time y)"
X422 = 469.196
25.650u 0.000s 0:25.65 100.0% 0+0k 0+0io 173pf+0w
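For readers checking the arithmetic, here is a small sketch of how the "35% faster" figure can be derived from the user times in the log above (39.490 s for g++, 25.650 s for icc). The function names are made up for illustration, and reading "35%" as "35% less run time" is an assumption about what the reporter meant.

```cpp
#include <cassert>

// Share of the slower build's run time that the faster build saves, in percent.
// Illustrative helper, not part of the attached test case.
double percent_time_saved(double slow_seconds, double fast_seconds) {
    assert(slow_seconds > 0.0 && fast_seconds > 0.0);
    return 100.0 * (slow_seconds - fast_seconds) / slow_seconds;
}

// The same gap measured against the faster build: how much longer the
// slower build runs, in percent.
double percent_longer(double slow_seconds, double fast_seconds) {
    assert(slow_seconds > 0.0 && fast_seconds > 0.0);
    return 100.0 * (slow_seconds - fast_seconds) / fast_seconds;
}
```

With the times above, percent_time_saved(39.490, 25.650) comes to roughly 35, while percent_longer(39.490, 25.650) comes to roughly 54. Both describe the same measurement, so it helps to say which baseline a quoted percentage uses.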
Subject: Re: Executable runs 25% slower than when compiled with INTEL compiler

> ------- Additional Comments From falk dot hueffner at student dot uni-tuebingen dot de 2004-01-29 13:16 -------
> I don't see any attachments. Could you try again using the "Create a
> New Attachment" link at
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13712
> ?

I just attached a compressed (.Z) tar file.

Bill
Thanks for the test case. I see it's using FP math heavily. Could you also try the options -ffast-math and -mfpmath=sse? AFAIK the Intel compiler does the equivalent of these switches by default so they are needed for a fair comparison.
Subject: Re: Executable runs 25% slower than when compiled with INTEL compiler

> ------- Additional Comments From falk at debian dot org 2004-01-29 16:01 -------
> Thanks for the test case. I see it's using FP math heavily. Could you also
> try the options -ffast-math and -mfpmath=sse? AFAIK the Intel compiler does
> the equivalent of these switches by default so they are needed for a fair
> comparison.

Your email suggests that I should put something (?) after the sse, but the online GCC doc does not show this. I get this when compiling:

>/home/whc/lintel/gcc/usr_local/bin/g++ -O3 -c -Wno-deprecated -ffast-math -mfpmath=sse y.cc
cc1plus: warning: SSE instruction set disabled, using 387 arithmetics

I also tried -msse and -msse2, but got no better than 30 seconds.

ANYWAY, it runs and I get the following, which now shows INTEL with only a 22% advantage versus the previous 37%.

Bill

###### USING GCC ######
`/home/whc/lintel/gcc/usr_local/bin/g++ -v`
Reading specs from /home/whc/lintel/gcc/usr_local/lib/gcc-lib/i686-pc-linux-gnu/3.3.2/specs
Configured with: ./configure --prefix=/home/whc/lintel/gcc/usr_local --exec-prefix=/home/whc/lintel/gcc/usr_local
Thread model: posix
gcc version 3.3.2
/home/whc/lintel/gcc/usr_local/bin/g++ -O3 -c -Wno-deprecated -ffast-math -mfpmath=sse y.cc
cc1plus: warning: SSE instruction set disabled, using 387 arithmetics
ls -ls y.o
  46 -rw-r--r--    1 whc      cad         46676 Jan 29 11:05 y.o
/home/whc/lintel/gcc/usr_local/bin/g++ -O3 -o y y.o
/bin/csh -c "(setenv LD_LIBRARY_PATH /home/whc/lintel/gcc/usr_local/lib; time y)"
X422 = 469.196
30.669u 0.007s 0:31.04 98.7% 0+0k 0+0io 235pf+0w

###### USING ICC ######
/dcad/apps/intel/compiler70/ia32/bin/icc -V -c -O y.cc
Intel(R) C++ Compiler for 32-bit applications, Version 7.1 Build 20030307Z
Copyright (C) 1985-2003 Intel Corporation. All rights reserved.
Edison Design Group C/C++ Front End, version 3.0 (Mar 8 2003 18:39:53)
Copyright 1988-2002 Edison Design Group, Inc.
ls -ls y.o
 144 -rw-r--r--    1 whc      cad        135824 Jan 29 11:06 y.o
/dcad/apps/intel/compiler70/ia32/bin/icc -o y y.o
/bin/csh -c "(setenv LD_LIBRARY_PATH /dcad/apps/intel/compiler70/ia32/lib; time y)"
X422 = 469.196
23.871u 0.011s 0:24.13 98.9% 0+0k 0+0io 178pf+0w

>               What |Removed                          |Added
> ------------------------------------------------------------------------
>  GCC build triplet |???                              |
>   GCC host triplet |DELL, Pentium4, Linux RedHat 7.3 |
> GCC target triplet |???                              |i686-pc-linux-gnu
I've quickly hacked the test case to compile with a C compiler, to compare gcc against the DEC compiler. I replaced exp(x) with x*x, since exp() is a libcall and not under the control of the compiler. Then I get on an 800MHz ev68:

gcc 3.4: 46.18s (-O3 -ffast-math -mcpu=ev67)
DEC C:   30.02s (-O6 -fast)

So there's also something to gain on other platforms...
Subject: Re: Executable runs 25% slower than when compiled with INTEL compiler

GCC Team:

> ANYWAY it runs and I get the following which now shows INTEL with only
> a 22% advantage versus the previous 37%.

I assume I will not see any changes to GCC to improve this situation in the immediate future. I guess I need to change my code to simulate the optimizations performed by the INTEL compiler and NOT performed by GCC.

Do you have any suggestions (or references to suggestions) as to what these changes might be, so that I can make my program run faster now?

Bill
I have done some experiments with CVS gcc-3.5 [3.5.0 20040216 (experimental)] and gcc-3.2 [3.2 20020903 (Red Hat Linux 8.0 3.2-7)] on P4 3.2GHz, using various options.

gcc-3.2: g++ -O3
X422 = 469.196
real 0m35.219s
user 0m35.143s
sys 0m0.061s

gcc-3.2: g++ -march=i686 -O3 -ffast-math
X422 = 469.196
real 0m29.439s
user 0m29.410s
sys 0m0.021s

gcc-3.5: g++ -march=i686 -O3 -ffast-math
X422 = 469.196
real 0m26.380s
user 0m26.287s
sys 0m0.004s

gcc-3.5: g++ -march=i686 -msse2 -mfpmath=sse -O3 -ffast-math
X422 = 469.196
real 0m26.591s
user 0m26.359s
sys 0m0.059s

gcc-3.5: g++ -march=i686 -O3 -ffast-math, with all functions in source changed to __builtin_<function>:
HUGE_VAL => __builtin_huge_val()
sqrt() => __builtin_sqrt()
log() => __builtin_log()
exp() => __builtin_exp()
X422 = 469.196
real 0m23.145s
user 0m23.115s
sys 0m0.018s
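As a rough illustration of the source change in that last experiment, the sketch below spells the math calls with GCC's __builtin_ names. The function bodies are invented for this example and are not taken from the attached test case. The __builtin_ spellings are GCC extensions (Clang accepts them too), so this is not portable C++.

```cpp
#include <cassert>

// Hypothetical stand-in for a floating-point kernel. The __builtin_ names
// refer to the compiler's own versions of sqrt/log/exp and bypass any macro
// or inline definition that <math.h> might supply. The compiler may still
// emit an ordinary library call where it has no inline expansion.
double kernel(double x) {
    assert(x > 0.0);  // log() needs a positive argument
    return __builtin_sqrt(x) + __builtin_log(x) + __builtin_exp(-x);
}

// Counterpart of the HUGE_VAL => __builtin_huge_val() substitution.
double huge_value() {
    return __builtin_huge_val();
}
```

Whether this helps on a given system probably depends on what the C library headers do with the plain names, which is what the later -D__NO_MATH_INLINES measurements in this report look at.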
Where are we standing with this one today?
(In reply to comment #14)
> Where are we standing with this one today?

gcc version 4.0.0 20050124 (experimental)

g++ -O3 -ffast-math y.cc
real 0m27.102s
user 0m26.980s
sys 0m0.016s

g++ -O3 -ffast-math -D__NO_MATH_INLINES y.cc
real 0m23.484s
user 0m23.307s
sys 0m0.076s

g++ -O3 -march=pentium4 -ffast-math -D__NO_MATH_INLINES y.cc
real 0m23.101s
user 0m23.014s
sys 0m0.078s

g++ -O3 -march=pentium4 -mfpmath=sse -ffast-math -D__NO_MATH_INLINES y.cc
real 0m31.650s
user 0m31.605s
sys 0m0.025s

g++ -O3 -march=pentium4 -mfpmath=sse -ffast-math y.cc
real 0m29.068s
user 0m28.863s
sys 0m0.023s

g++ -O3 -march=pentium4 -mfpmath=sse y.cc
real 0m35.343s
user 0m34.848s
sys 0m0.047s

g++ -O3 -march=pentium4 -mfpmath=sse -ffast-math -mno-80387 -D__NO_MATH_INLINES y.cc
*** FAILED: X422 = nan ***
real 2m56.700s
user 2m55.615s
sys 0m0.145s

g++ -O3 -march=pentium4 -mfpmath=sse -mno-80387 -D__NO_MATH_INLINES y.cc
*** TIMEOUT AFTER 3min ***

-mfpmath=sse runs a bit slow. -mno-80387 IMHO generates wrong code.
Looks like -D__NO_MATH_INLINES makes gcc-produced code much better...
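For anyone who wants to try this without touching the build flags, a minimal sketch follows. It assumes a glibc whose x86 headers provide inline math functions guarded by __NO_MATH_INLINES, as the glibc versions of that era did. On other C libraries the macro is most likely ignored. The decay() function is invented for this example.

```cpp
// Same effect as -D__NO_MATH_INLINES on the command line, provided the
// define comes before the first include of <cmath> or <math.h>.
#define __NO_MATH_INLINES
#include <cassert>
#include <cmath>

// Hypothetical kernel: with the macro defined, exp() and sqrt() are left to
// the compiler's builtins or libm instead of the header's inline versions.
double decay(double x) {
    assert(x >= 0.0);  // sqrt() needs a non-negative argument
    return std::exp(-x) * std::sqrt(x);
}
```

The numerical result should be the same with or without the macro. Any difference would show up only in the generated code and its run time.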
For -D__NO_MATH_INLINES we're probably not going to make any progress as long as Uli is the glibc maintainer. Other than that, this appears to be fixed.

Note that ICC has -ffast-math and SSE as the defaults, whereas GCC chooses safe math and code that works on any ix86 CPU, not just the ones with SSE. So if there is still a significant difference, it is as much philosophical as it is in code generation.

Given the right set of options, GCC can compete with ICC on my Pentium4 box, and on Uros' box. So there doesn't seem to be a good reason to keep this report open.
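To make the "safe math" point concrete: one thing -ffast-math permits is regrouping floating-point additions, and that can change results. The sketch below is a generic illustration with made-up function names, not code from the test case. It assumes the file is compiled without -ffast-math, so that the grouping written in the source is the grouping evaluated.

```cpp
#include <cassert>

// The same three addends, grouped two different ways.
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

// True when the two groupings disagree for the given inputs.
bool grouping_matters(double a, double b, double c) {
    return sum_left(a, b, c) != sum_right(a, b, c);
}

// With strict IEEE evaluation, (1e20 + -1e20) + 1.0 is 1.0, but
// 1e20 + (-1e20 + 1.0) is 0.0, because the 1.0 is lost when added to -1e20.
// If this file is built with -ffast-math, the check is no longer guaranteed.
void self_check() {
    assert(grouping_matters(1e20, -1e20, 1.0));
}
```

A compiler that defaults to strict semantics has to keep the source grouping, while one that defaults to fast-math behaviour can pick whichever order is cheaper. That is the trade-off behind the two compilers' different defaults.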