[Bug c++/21628] New: GCC much slower than ICL. Lack of inlining?

laurent at ient dot rwth-aachen dot de gcc-bugzilla@gcc.gnu.org
Tue May 17 15:46:00 GMT 2005


I first posted this problem to gcc-help@gcc.gnu.org and was advised to report 
it here.

I have a program with a great many inline template functions.
It is essential for execution speed that every (or almost every) function 
marked as inline is actually inlined by the compiler.
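
To illustrate the kind of code in question, here is a minimal sketch. It is not 
taken from the library; the names mul, add, Dot and dot are hypothetical, and 
the use of compile-time recursion is only an assumption about how such code is 
typically written. Each level calls small inline helpers, so the whole chain 
has to be inlined for the abstraction to cost nothing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers, not part of the library discussed here.
template <class T>
inline T mul(const T& a, const T& b) { return a * b; }

template <class T>
inline T add(const T& a, const T& b) { return a + b; }

// Compile-time recursion: Dot<N> calls Dot<N-1>. If any level is not
// inlined, a real function call remains where a single multiply-add
// was intended.
template <std::size_t N>
struct Dot {
  template <class T>
  static inline T eval(const T* x, const T* y) {
    return add(mul(x[N - 1], y[N - 1]), Dot<N - 1>::eval(x, y));
  }
};

// Recursion end: an empty sum is the default-constructed value.
template <>
struct Dot<0> {
  template <class T>
  static inline T eval(const T*, const T*) { return T(); }
};

// Entry point: dot product of two arrays of N elements.
template <std::size_t N, class T>
inline T dot(const T* x, const T* y) {
  assert(x != 0 && y != 0);
  return Dot<N>::eval(x, y);
}
```

For two int arrays {1,2,3} and {4,5,6}, dot<3>(a, b) evaluates to 32. With 
full inlining this should reduce to three multiplications and a few additions; 
without it, every level becomes a call.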

I have already compiled the program with the Intel Compiler (ICL) under Visual 
C++, and it works fine and fast. I verified that the functions really are 
inlined in that build.

But with GCC 3.4.x (Linux and Cygwin) the same program is much slower (5-20 times)
than the version compiled with ICL. GCC's '-Winline' option shows me that 
many functions are not inlined as they should be.

The compiler treats the 'inline' keyword as a hint, but does not follow it. 
I tried setting various GCC options, but nothing has been satisfactory so far: 
-finline-limit 100000000 --param large-function-growth=1000000 
--param max-inline-insns-single=1000000 ...

I am convinced that the poor performance is due to the lack of inlining, because 
I also get slow execution with ICL when the functions are not marked 
as 'inline'. With GCC's '-Winline' option, I can see every function that is 
not inlined.

Also, the SSE mode of the following test program should be much faster than 
the non-SIMD mode, but it requires much more inlining. ICL manages it; GCC 
does not manage it at all.

Do you know a way to force GCC to obey the inline keyword, or to raise 
the limits that the compiler has internally? Or do you have an alternative?
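
As a sketch of what such forcing could look like: GCC documents an 
always_inline function attribute, and Microsoft-compatible compilers accept 
__forceinline. Whether this helps with the library in question is untested 
here. The macro name FORCE_INLINE and the functions madd and horner3 are 
hypothetical and chosen only for illustration.

```cpp
// FORCE_INLINE is a hypothetical macro name, not something the library defines.
#if defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#else
#  define FORCE_INLINE inline  // fall back to the plain hint
#endif

// Small helper that should never survive as a real call: a*b + c.
template <class T>
FORCE_INLINE T madd(const T& a, const T& b, const T& c) {
  return a * b + c;
}

// Evaluates c0 + c1*x + c2*x*x with two nested madd calls (Horner's scheme).
template <class T>
FORCE_INLINE T horner3(const T& x, const T& c0, const T& c1, const T& c2) {
  return madd(madd(c2, x, c1), x, c0);
}
```

For example, horner3(2, 1, 2, 3) evaluates to 17. With the attribute, GCC 
either inlines the call or reports an error when it cannot, instead of 
silently ignoring the request.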


It is not possible to give a small test program. If you want to test this 
yourself, I suggest you download my library from the address below and compile 
the following test. (There is no need to compile the library; it is STL-like.) 
http://www.ient.rwth-aachen.de/team/laurent/genial/genial.html

#define FFT_LEVEL 32
#include "signal/fft.h"
int main()
{
  // two 32-element complex vectors, filled with 0
  DenseVector<complex<float> >::self X(32,0);
  DenseVector<complex<float> >::self Y(X.size(),0);
  double t0=get_time();
  // time one million calls to fft
  for (int i=0; i<1000000; ++i)
    fft(X,Y);
  cout << get_time()-t0 << endl;
}

Execution time on a Pentium 4, 3.2 GHz:
With ICL on Windows:
- No SIMD: 0.368s
- SSE: 0.126s
- SSE3: 0.112s
With GCC 3.4 on Cygwin/Linux (-O3 -msse3 -UWIN32 -ftemplate-depth-36 -lstlport):
- No SIMD: 0.969s
- SSE: 2.069s
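
To make the gap explicit, the slowdown factor is simply the GCC time divided 
by the ICL time for the same mode. A minimal sketch follows; the function name 
slowdown is illustrative only.

```cpp
// Slowdown factor: how many times longer the GCC build runs than the
// ICL build for the same mode. The name is illustrative only.
inline double slowdown(double gcc_seconds, double icl_seconds) {
  return gcc_seconds / icl_seconds;
}
```

With the figures above, slowdown(0.969, 0.368) is roughly 2.6 for the non-SIMD 
mode and slowdown(2.069, 0.126) is roughly 16.4 for the SSE mode.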

For more information, contact me by email (see home page).

Thanks

-- 
           Summary: GCC much slower than ICL. Lack of inlining?
           Product: gcc
           Version: 3.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: laurent at ient dot rwth-aachen dot de
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21628


