[Bug c++/21628] New: GCC much slower than ICL. Lack of inlining?
laurent at ient dot rwth-aachen dot de
gcc-bugzilla@gcc.gnu.org
Tue May 17 15:46:00 GMT 2005
I first posted this problem to gcc-help@gcc.gnu.org and was advised to report it
here.
I have a program with many, many inline template functions.
For execution speed it is essential that every (or almost every) function
marked inline is actually inlined by the compiler.
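The kind of code described here can be sketched as follows. This is a minimal, hypothetical example: the classes Dense and Strided and the function dot() are illustrative names, not taken from the library. Each element access goes through one or two tiny member functions. When they are all inlined, dot() reduces to a simple loop. When the compiler declines, every a[i] and b[i] becomes a separate function call, which is the kind of overhead the report attributes to missing inlining.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense vector. Member functions defined inside the class body
// are implicitly inline, so each accessor is only an inlining *candidate*.
template <class T>
class Dense {
public:
    explicit Dense(std::size_t n, T v = T()) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    const T& operator[](std::size_t i) const { return data_[i]; }
    T& operator[](std::size_t i) { return data_[i]; }
private:
    std::vector<T> data_;
};

// Hypothetical strided view: adds one more layer of forwarding calls.
template <class T>
class Strided {
public:
    Strided(const Dense<T>& v, std::size_t stride) : v_(v), stride_(stride) {
        assert(stride > 0);
    }
    // Number of elements at indices 0, stride, 2*stride, ... below v.size().
    std::size_t size() const { return (v_.size() + stride_ - 1) / stride_; }
    const T& operator[](std::size_t i) const { return v_[i * stride_]; }
private:
    const Dense<T>& v_;
    std::size_t stride_;
};

// Generic dot product: only as fast as a hand-written loop if size() and
// operator[] of both argument types are inlined.
template <class A, class B>
inline double dot(const A& a, const B& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}
```

For example, with x = {1, 1, 3, 1, 1, 1}, Strided<float>(x, 2) views {1, 3, 1}, and its dot product with {2, 2, 2} is 10.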
I have already compiled the program with the Intel C++ Compiler (ICL) under Visual C++,
and it works correctly and fast. I verified that the functions are actually inlined in that build.
But with GCC 3.4.x (Linux and Cygwin), the same program is much slower (5-20 times)
than the ICL build. GCC's '-Winline' option shows me that
many functions are not inlined as they should be.
The compiler treats the 'inline' keyword as a hint and does not always follow it.
I have tried various GCC options, but nothing has been satisfactory so far:
-finline-limit=100000000 --param large-function-growth=1000000 --param max-inline-insns-single=1000000 ...
I am convinced that the poor performance is due to the lack of inlining, because
ICL also produces slow code when the functions are not marked
'inline'. With GCC's '-Winline' option, I see every function that is not
inlined.
Also, the SSE version of the following test program should be much faster than
the version without SIMD, but it requires much more inlining. ICL manages it; GCC does not manage it at all.
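To illustrate why a SIMD build depends even more on inlining, here is a minimal sketch of a 4-float wrapper over the SSE intrinsics in <xmmintrin.h>. It assumes an x86 or x86-64 target, and the names F4 and saxpy4 are hypothetical, not from the library. Every operator wraps a single SSE instruction, so each call the compiler fails to inline replaces one instruction with a function call; that is one plausible way a SIMD build can end up slower than the scalar one.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics; x86/x86-64 only

// Hypothetical wrapper around one 128-bit register holding 4 floats.
struct F4 {
    __m128 v;
    explicit F4(__m128 x) : v(x) {}
    static F4 load(const float* p) { return F4(_mm_loadu_ps(p)); }  // unaligned load
    void store(float* p) const { _mm_storeu_ps(p, v); }             // unaligned store
};

// Each operator is a single SSE instruction if (and only if) it is inlined.
inline F4 operator+(F4 a, F4 b) { return F4(_mm_add_ps(a.v, b.v)); }
inline F4 operator*(F4 a, F4 b) { return F4(_mm_mul_ps(a.v, b.v)); }

// y[i] = a * x[i] + y[i], four floats per iteration; n must be a multiple of 4.
inline void saxpy4(float a, const float* x, float* y, std::size_t n) {
    assert(n % 4 == 0);
    const F4 va(_mm_set1_ps(a));  // broadcast a into all four lanes
    for (std::size_t i = 0; i < n; i += 4) {
        const F4 r = va * F4::load(x + i) + F4::load(y + i);
        r.store(y + i);
    }
}
```

With x = {1, ..., 8}, y = {0, 0, 0, 0, 1, 1, 1, 1} and a = 2, saxpy4 leaves y = {2, 4, 6, 8, 11, 13, 15, 17}.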
Do you know a way to force GCC to obey the inline keyword, or to raise
the compiler's internal inlining limits? Or do you have an alternative?
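One commonly used answer to the first question is a per-function override rather than global limits. The sketch below assumes a hypothetical macro name, FORCE_INLINE: on GCC (and compilers that define __GNUC__) it expands to the always_inline function attribute, on MSVC and ICL for Windows to __forceinline, and elsewhere it falls back to the plain inline hint. The butterfly and wrap_index helpers are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: request inlining regardless of size heuristics.
#if defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#else
#  define FORCE_INLINE inline
#endif

// Radix-2 butterfly: (a, b) -> (a + b, a - b), for any type with + and -.
template <class T>
FORCE_INLINE void butterfly(T& a, T& b)
{
    const T t = b;
    b = a - t;
    a = a + t;
}

// k modulo n for a power-of-two n, e.g. to wrap a twiddle-factor index.
FORCE_INLINE std::size_t wrap_index(std::size_t k, std::size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);  // n must be a power of two
    return k & (n - 1);
}
```

Marking only the small, hot helpers this way leaves the compiler's heuristics in charge of everything else. Recent GCC versions report a failure to inline an always_inline function as an error instead of silently emitting a call; how GCC 3.4.x behaves in every case is not verified by this sketch.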
It is not possible to provide a small test program. If you want to test it
yourself, I suggest downloading my library from this address and compiling the
following test. (The library itself does not need to be compiled; it is STL-like.)
http://www.ient.rwth-aachen.de/team/laurent/genial/genial.html
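#define FFT_LEVEL 32
#include "signal/fft.h"

int main()
{
    // 32-point complex input X and output Y of the same size, initialized to 0
    DenseVector<complex<float> >::self X(32,0);
    DenseVector<complex<float> >::self Y(X.size(),0);
    double t0=get_time();
    // run the transform one million times and print the elapsed time
    for (int i=0; i<1000000; ++i)
        fft(X,Y);
    cout << get_time()-t0 << endl;
}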
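For readers without the library, a self-contained stand-in for the fft(X, Y) call might look like the following. It is a hypothetical textbook iterative radix-2 FFT over std::vector<std::complex<float> >, not the library's implementation; fft_radix2 and time_fft are illustrative names, and its timings are not comparable to the figures in this report.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <ctime>
#include <vector>

typedef std::complex<float> cf;

// Hypothetical stand-in for fft(X, Y): iterative radix-2 decimation-in-time
// FFT. The size must be a power of two; 'in' and 'out' must be distinct.
void fft_radix2(const std::vector<cf>& in, std::vector<cf>& out)
{
    const std::size_t n = in.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    out.resize(n);

    // Copy the input into bit-reversed order.
    std::size_t bits = 0;
    while ((std::size_t(1) << bits) < n) ++bits;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t r = 0;
        for (std::size_t b = 0; b < bits; ++b)
            if (i & (std::size_t(1) << b)) r |= std::size_t(1) << (bits - 1 - b);
        out[r] = in[i];
    }

    // Butterfly passes of length 2, 4, ..., n; twiddle w = exp(-2*pi*I*k/len).
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = -2.0 * pi / double(len);
        for (std::size_t start = 0; start < n; start += len) {
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cf w(float(std::cos(ang * double(k))),
                           float(std::sin(ang * double(k))));
                const cf u = out[start + k];
                const cf t = w * out[start + k + len / 2];
                out[start + k] = u + t;
                out[start + k + len / 2] = u - t;
            }
        }
    }
}

// Mirrors the timing loop: 'reps' transforms of a zero-filled n-point input.
// Returns elapsed processor time in seconds as measured by std::clock().
double time_fft(std::size_t n, int reps)
{
    std::vector<cf> x(n), y;
    const std::clock_t t0 = std::clock();
    for (int i = 0; i < reps; ++i) fft_radix2(x, y);
    return double(std::clock() - t0) / CLOCKS_PER_SEC;
}
```

For example, time_fft(32, 1000000) repeats the measurement loop with this stand-in. Being a plain textbook algorithm, it says nothing about how the library's own FFT templates are inlined.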
The execution times of the test program on a 3.2 GHz Pentium 4:
With ICL on Windows:
- No SIMD: 0.368 s
- SSE: 0.126 s
- SSE3: 0.112 s
With GCC 3.4 on Cygwin/Linux (-O3 -msse3 -UWIN32 -ftemplate-depth-36 -lstlport):
- No SIMD: 0.969 s
- SSE: 2.069 s
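Dividing the reported times gives the relative slowdowns for this particular test directly; the ratios below only restate the measurements:

```latex
\begin{aligned}
\text{GCC vs. ICL, no SIMD:}\quad & \frac{0.969\ \mathrm{s}}{0.368\ \mathrm{s}} \approx 2.6 \\
\text{GCC vs. ICL, SSE:}\quad & \frac{2.069\ \mathrm{s}}{0.126\ \mathrm{s}} \approx 16.4 \\
\text{ICL, no SIMD vs. SSE:}\quad & \frac{0.368\ \mathrm{s}}{0.126\ \mathrm{s}} \approx 2.9 \quad \text{(SSE faster)} \\
\text{GCC, SSE vs. no SIMD:}\quad & \frac{2.069\ \mathrm{s}}{0.969\ \mathrm{s}} \approx 2.1 \quad \text{(SSE slower)}
\end{aligned}
```

Per transform (10^6 calls), that is about 0.126 µs with ICL and SSE versus about 2.07 µs with GCC and SSE.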
For more information, contact me by email (see my home page).
Thanks
--
Summary: GCC much slower than ICL. Lack of inlining?
Product: gcc
Version: 3.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: laurent at ient dot rwth-aachen dot de
CC: gcc-bugs at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21628