This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



GCC much slower than ICL. How to force inline?


Hello

I have a program with a great many inline template functions.
For execution speed it is essential that every (or almost every)
function marked inline is actually inlined by the compiler.

I already compiled the program with the Intel Compiler (ICL) under Visual C++,
and it runs correctly and fast. I verified that the functions are really inlined.

But with GCC 3.4 (Linux and Cygwin) the same program is about 5 times slower
than the version compiled with ICL.
The '-Winline' option of GCC shows me that many functions are not inlined
as they should be.

The compiler treats the 'inline' keyword as a hint, but does not follow
it.
I tried setting various GCC options, but nothing has been satisfactory so far:
-finline-limit 100000000
--param large-function-growth=1000000
--param max-inline-insns-single=1000000
...

I am convinced that the poor performance is due to the lack of inlining,
because I also get slow execution with ICL when the functions are not
marked 'inline'.
With GCC's '-Winline' option, I can see every function that is not inlined.

Also, the SSE version should be much faster than the version without SIMD,
but it requires much more inlining.
ICL manages it; GCC does not at all.

Do you know a way to force GCC to obey the inline keyword, or to increase
the limits that the compiler has internally?
Or do you have an alternative?
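
To illustrate the kind of forcing meant here: GCC documents an
'always_inline' function attribute, and ICL / Visual C++ accept the
'__forceinline' keyword. The following is only a minimal sketch under
assumptions: the FORCE_INLINE macro and the functions mul_add and horner3
are hypothetical names invented for this example and are not part of the
library. Whether this would close the gap for this program is untested here.

```cpp
// Hypothetical portability macro (the name FORCE_INLINE is an assumption).
#if defined(_MSC_VER)
   // Visual C++, and ICL running under Visual C++
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
   // GCC: request inlining regardless of the inliner's size limits
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
   // Unknown compiler: fall back to the plain hint
#  define FORCE_INLINE inline
#endif

// Small template helper, standing in for the many tiny library functions.
template <class T>
FORCE_INLINE T mul_add(T a, T b, T c)
{
    return a * b + c;
}

// Nested use: evaluates c0 + c1*x + c2*x*x through two forced-inline calls.
template <class T>
FORCE_INLINE T horner3(T x, T c0, T c1, T c2)
{
    return mul_add(mul_add(c2, x, c1), x, c0);
}
```

Compiling a file that calls horner3 with '-O3 -Winline' should then show
whether GCC still declines to inline either function.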


It is not possible to give a small test program. If you want to test it
yourself, I suggest you download my library from the address below and compile
the following test. (There is no need to compile the library; it is STL-like.)
http://www.ient.rwth-aachen.de/team/laurent/genial/genial.html

#define FFT_LEVEL 32
#include "signal/fft.h"
int main()
{
  DenseVector<complex<float> >::self X(32,0);
  DenseVector<complex<float> >::self Y(X.size(),0);
  double t0=get_time();
  for (int i=0; i<1000000; ++i)
    fft(X,Y);
  cout << get_time()-t0 << endl;
}
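
The test above takes get_time() from the library. For anyone who wants to
reproduce the measurement loop without it, here is a minimal sketch of an
equivalent timer, assuming a C++11 compiler with std::chrono. The names
get_time_sketch and time_loop are hypothetical and do not come from the
library.

```cpp
#include <chrono>

// Hypothetical stand-in for get_time(): seconds from a monotonic clock.
inline double get_time_sketch()
{
    typedef std::chrono::steady_clock clock_type;
    return std::chrono::duration<double>(
               clock_type::now().time_since_epoch()).count();
}

// Calls work() `iterations` times and returns the elapsed time in seconds.
template <class F>
double time_loop(F work, int iterations)
{
    const double t0 = get_time_sketch();
    for (int i = 0; i < iterations; ++i)
        work();
    return get_time_sketch() - t0;
}
```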

Execution times on a Pentium 4, 3.2 GHz:
With ICL on Windows:
- No SIMD: 0.368s
- SSE: 0.126s
- SSE3: 0.112s
With GCC on Cygwin (-O3 -msse3 -UWIN32 -ftemplate-depth-36 -lstlport):
- No SIMD: 0.969s
- SSE: 2.069s

Thanks

Patrick
-------------------------------------------------

Patrick LAURENT
IENT - Institute for Communications Engineering
RWTH - Aachen University of Technology
Melatener Strasse 23, 52074 Aachen (Germany)
Tel: +49 241/80-27679   Fax: +49 241/80-22196
E-Mail: Laurent@ient.rwth-aachen.de
-------------------------------------------------




