Compiler: gcc 3.3.3 System: Redhat, SuSE 8.1, 8.2, 9.0, Pentium3, 1.2GHz, 2GB Ram compiler options: default options, was compiled with "configure, make, make install" command line: g++ -O2 file.cpp compiler output: no warning, no error output of "g++ -v": Reading specs from /usr/local/gcc-3.3.3/lib/gcc-lib/i686-pc-linux-gnu/3.3.3/specs Configured with: /soft/gcc/gcc-3.3.3/gcc-3.3.3/configure --prefix=/usr/local/gcc-3.3.3 --enable-shared --enable-languages=c,c++,f77 Thread model: posix gcc version 3.3.3 The performance of a program seems to depend on the way how vectors are allocated: Here are 4 small example codes: (a) slow[35sec]: using C++ "new/delete" (b) fast[22sec]: using C-like malloc (c) slow[35sec]: using C-like malloc via function call (d) fast[22sec]: using C-like malloc via INLINE fuction call g++ -O2 [a,b,c,d].cpp timex a.out i don't understand, why the performance depends on the way how i allocate pointers. Please help. thanks Sixten ----- a.cpp --- #include <stdio.h> int main () { long it; int i, n = 1000; double *x = new double [n]; double *y = new double [n]; double a = 0.2, b = -a; for (i=0; i < n; ++i) x[i] = y[i] = i; for (it=0; it < 5000000L; ++it) { for (i=0; i < n; ++i) { y[i] += a * x[i]; y[i] += b * x[i]; } } printf ("%g\n", y[0]); delete [] y; delete [] x; } ---- b.cpp --- int main () { long it; int i, n = 1000; double *x = (double *)malloc ((size_t)(n * sizeof(double))); double *y = (double *)malloc ((size_t)(n * sizeof(double))); double a = 0.2, b = -a; for (i=0; i < n; ++i) x[i] = y[i] = i; for (it=0; it < 5000000L; ++it) { for (i=0; i < n; ++i) { y[i] += a * x[i]; y[i] += b * x[i]; } } printf ("%g\n", y[0]); free (y); free (x); } --- c.cpp --- double *myMalloc (int nElem) { return (double *)(malloc ((size_t)(nElem * sizeof(double)))); } int main () { long it; int i, n = 1000; double *x = myMalloc(n); double *y = myMalloc (n); double a = 0.2, b = -a; for (i=0; i < n; ++i) x[i] = y[i] = i; for (it=0; it < 5000000L; ++it) { for (i=0; i < n; ++i) { y[i] += a * x[i]; y[i] += b * x[i]; } } printf ("%g\n", y[0]); free (y); free (x); } --- d.cpp --- inline double *myMalloc (int nElem) { return (double *)(malloc ((size_t)(nElem * sizeof(double)))); } int main () { long it; int i, n = 1000; double *x = myMalloc(n); double *y = myMalloc (n); double a = 0.2, b = -a; for (i=0; i < n; ++i) x[i] = y[i] = i; for (it=0; it < 5000000L; ++it) { for (i=0; i < n; ++i) { y[i] += a * x[i]; y[i] += b * x[i]; } } printf ("%g\n", y[0]); free (y); free (x); }
I cannot reproduce it on my machine with the mainline. tin:~/src/gnu/gcctest>g++ -O2 pr14809.a.cc tin:~/src/gnu/gcctest>time ./a.out 0 21.460u 0.020s 0:42.13 50.9% 0+0k 0+0io 180pf+0w tin:~/src/gnu/gcctest>g++ -O2 pr14809.b.cc time ./a.out 0 21.750u 0.110s 0:33.76 64.7% 0+0k 0+0io 177pf+0w tin:~/src/gnu/gcctest>g++ -O2 pr14809.c.cc tin:~/src/gnu/gcctest>!tim time ./a.out 0 20.970u 0.750s 0:28.55 76.0% 0+0k 0+0io 177pf+0w
It probably depends on alignment of the pointers relative to cache lines. You should look at alignment.
Subject: Re: performance depends on allocation methods hi, > ------- Additional Comments From falk at debian dot org 2004-04-01 13:27 > ------- > It probably depends on alignment of the pointers relative to cache lines. > You > should look at alignment. thank you very much for your response. i tried all compiler flags related somehow to alignment. vectors allocated with "new" are still significantly slower than those allocated with "malloc". btw. i tried the test programs on some more linux machines. the effect seems to be larger on SMP machines. but i don't know what this means... any ideas??? :) sixten
Compiler flags don't have any effect on this. I was suggesting to examine alignment and try to adjust it manually. Also check coalignment with respect to cache line index, maybe you have a one-way associative cache and get collisions. In any case, it seems unlikely that gcc can do much about this.
I can't reproduce this either, but it took me a while until I understood Falk's comment. The point is: you call malloc/new only once, so the run time of these functions must necessarily be negligible and can't account for the 13 second slowdown you observe. So it must be the output of these functions (for example the alignment of the data region returned), which however is not under gcc control. Since none of us can reproduce this, I guess we should close this PR. For the record, here are the results for running the four testcases: g/x> /home/bangerth/bin/gcc-3.4-pre/bin/c++ -O2 a.cc -o a g/x> /home/bangerth/bin/gcc-3.4-pre/bin/c++ -O2 b.cc -o b g/x> /home/bangerth/bin/gcc-3.4-pre/bin/c++ -O2 c.cc -o c g/x> /home/bangerth/bin/gcc-3.4-pre/bin/c++ -O2 d.cc -o d g/x> time ./a ; time ./b ; time ./c ; time ./d 0 real 0m19.101s user 0m18.792s sys 0m0.016s 0 real 0m19.340s user 0m19.092s sys 0m0.019s 0 real 0m19.135s user 0m18.869s sys 0m0.014s 0 [1]+ Done emacs -fn 9x15 b.cc real 0m20.012s user 0m19.669s sys 0m0.164s W.
REOPEN the bug 14809: i can verify the problem partly. performance with global static arrays of double's is up to 10% higher as with pointers to dynamic (new or malloc) arrays of same size and type!! PSA .. pointer to static arrays time(average over 5runs): (big matrix multiplications, g++) static=21.8sec; PSA=23.4sec ;dyn.new=23.4sec; dyn.malloc=23.0sec (last week i had up to 3 sec diff.!?) result: pointers slow down my prog!? used memory: ca. 05 megabyte computer: kernel 2.4.23-1.SMP i686 Dual Processor Pentium III, 1.2 GHz, 512K L2-Cache, 2530 bogomips Red Hat Linux(2years old, but some updates (gcc,kernel,libs) GCC: -3.3.3 --enable-shared --enable-languages=c,c++,f77 Thread model: posix, gcc version 3.3.3
That's an unrelated thing. Sometimes you can get away with a smaller number of dereferencing operations if memory is allocated statically. W.