14809 – performance depends on allocation methods

Bug 14809 - performance depends on allocation methods

Summary: performance depends on allocation methods

Status:	RESOLVED INVALID

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	c++ (show other bugs)
Version:	3.3.3

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2004-04-01 12:06 UTC by Sixten Boeck
Modified:	2005-07-23 22:49 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Sixten Boeck 2004-04-01 12:06:08 UTC

Compiler: gcc 3.3.3 
System: Redhat,  SuSE 8.1, 8.2, 9.0, Pentium3, 1.2GHz, 2GB Ram 
compiler options: default options, was compiled with "configure, make, make install" 
command line: g++ -O2 file.cpp 
compiler output: no warning, no error 
output of "g++ -v": 
Reading specs from /usr/local/gcc-3.3.3/lib/gcc-lib/i686-pc-linux-gnu/3.3.3/specs 
Configured with: /soft/gcc/gcc-3.3.3/gcc-3.3.3/configure --prefix=/usr/local/gcc-3.3.3 
--enable-shared --enable-languages=c,c++,f77 
Thread model: posix 
gcc version 3.3.3 
 
 
The performance of a program seems to depend on the way how vectors are allocated: 
Here are 4 small example codes: 
(a) slow[35sec]: using C++ "new/delete" 
(b) fast[22sec]: using C-like malloc 
(c) slow[35sec]: using C-like malloc via function call 
(d) fast[22sec]: using C-like malloc via INLINE fuction call 
 
 
g++ -O2 [a,b,c,d].cpp 
timex a.out 
 
i don't understand, why the performance depends on the way how i allocate pointers. 
Please help. 
 
thanks  
 
 
Sixten 
 
 
 
----- a.cpp --- 
#include <stdio.h> 
int main () 
{ 
   long it; 
   int i, n = 1000; 
   double *x = new double [n]; 
   double *y = new double [n]; 
   double a = 0.2, b = -a; 
   for (i=0; i < n; ++i)    x[i] = y[i] = i; 
 
   for (it=0; it < 5000000L; ++it) {  
      for (i=0; i < n; ++i)  { 
         y[i] += a * x[i]; 
         y[i] += b * x[i]; 
      } 
   } 
 
   printf ("%g\n", y[0]); 
 
   delete [] y; 
   delete [] x; 
} 
 
 ---- b.cpp --- 
int main () 
{ 
   long it; 
   int i, n = 1000; 
 
   double *x = (double *)malloc ((size_t)(n * sizeof(double))); 
   double *y = (double *)malloc ((size_t)(n * sizeof(double))); 
   double a = 0.2, b = -a; 
 
   for (i=0; i < n; ++i)    x[i] = y[i] = i; 
 
   for (it=0; it < 5000000L; ++it) {  
      for (i=0; i < n; ++i)  { 
         y[i] += a * x[i]; 
         y[i] += b * x[i]; 
      } 
   } 
 
   printf ("%g\n", y[0]); 
   free (y); 
   free (x); 
} 
 
--- c.cpp --- 
double *myMalloc (int nElem) 
{ 
   return (double *)(malloc ((size_t)(nElem * sizeof(double)))); 
} 
 
int main () 
{ 
   long it; 
   int i, n = 1000; 
 
   double *x = myMalloc(n); 
   double *y = myMalloc (n); 
   double a = 0.2, b = -a; 
   for (i=0; i < n; ++i)   x[i] = y[i] = i; 
 
   for (it=0; it < 5000000L; ++it) {  
      for (i=0; i < n; ++i)  { 
         y[i] += a * x[i]; 
         y[i] += b * x[i]; 
      } 
   } 
 
   printf ("%g\n", y[0]); 
 
   free (y); 
   free (x); 
} 
 
 
--- d.cpp --- 
inline double *myMalloc (int nElem) 
{ 
   return (double *)(malloc ((size_t)(nElem * sizeof(double)))); 
} 
 
int main () 
{ 
   long it; 
   int i, n = 1000; 
 
   double *x = myMalloc(n); 
   double *y = myMalloc (n); 
   double a = 0.2, b = -a; 
   for (i=0; i < n; ++i)   x[i] = y[i] = i; 
 
   for (it=0; it < 5000000L; ++it) {  
      for (i=0; i < n; ++i)  { 
         y[i] += a * x[i]; 
         y[i] += b * x[i]; 
      } 
   } 
 
   printf ("%g\n", y[0]); 
 
   free (y); 
   free (x); 
}

Comment 1 Andrew Pinski 2004-04-01 13:25:09 UTC

I cannot reproduce it on my machine with the mainline.

tin:~/src/gnu/gcctest>g++ -O2 pr14809.a.cc
tin:~/src/gnu/gcctest>time ./a.out
0
21.460u 0.020s 0:42.13 50.9%    0+0k 0+0io 180pf+0w
tin:~/src/gnu/gcctest>g++ -O2 pr14809.b.cc
time ./a.out
0
21.750u 0.110s 0:33.76 64.7%    0+0k 0+0io 177pf+0w
tin:~/src/gnu/gcctest>g++ -O2 pr14809.c.cc
tin:~/src/gnu/gcctest>!tim
time ./a.out
0
20.970u 0.750s 0:28.55 76.0%    0+0k 0+0io 177pf+0w

Comment 2 Falk Hueffner 2004-04-01 13:27:24 UTC

It probably depends on alignment of the pointers relative to cache lines. You
should look at alignment.

Comment 3 boeck@sfhingx.de 2004-04-01 15:13:23 UTC

Subject: Re:  performance depends on allocation methods

hi, 
 
> ------- Additional Comments From falk at debian dot org  2004-04-01 13:27 
> ------- 
> It probably depends on alignment of the pointers relative to cache lines. 
> You 
> should look at alignment. 
 
thank you very much for your response. i tried all compiler flags related 
somehow to alignment. vectors allocated with "new" are still significantly 
slower than those allocated with "malloc". 
 
btw. i tried the test programs on some more linux machines. the effect seems 
to be larger on SMP machines. but i don't know what this means... 
 
 
any ideas??? 
 
 
:) 
sixten

Comment 4 Falk Hueffner 2004-04-01 15:34:51 UTC

Compiler flags don't have any effect on this. I was suggesting to
examine alignment and try to adjust it manually. Also check
coalignment with respect to cache line index, maybe you have a one-way
associative cache and get collisions.

In any case, it seems unlikely that gcc can do much about this.

Comment 5 Wolfgang Bangerth 2004-04-01 15:59:52 UTC

I can't reproduce this either, but it took me a while until I understood 
Falk's comment. The point is: you call malloc/new only once, so the run 
time of these functions must necessarily be negligible and can't account 
for the 13 second slowdown you observe. So it must be the output of these 
functions (for example the alignment of the data region returned), which 
however is not under gcc control. 
 
Since none of us can reproduce this, I guess we should close this PR. 
 
For the record, here are the results for running the four testcases: 
g/x> /home/bangerth/bin/gcc-3.4-pre/bin/c++ -O2 a.cc -o a 
g/x> /home/bangerth/bin/gcc-3.4-pre/bin/c++ -O2 b.cc -o b 
g/x> /home/bangerth/bin/gcc-3.4-pre/bin/c++ -O2 c.cc -o c 
g/x> /home/bangerth/bin/gcc-3.4-pre/bin/c++ -O2 d.cc -o d 
g/x> time ./a ; time ./b ; time ./c ; time ./d 
0 
 
real    0m19.101s 
user    0m18.792s 
sys     0m0.016s 
0 
 
real    0m19.340s 
user    0m19.092s 
sys     0m0.019s 
0 
 
real    0m19.135s 
user    0m18.869s 
sys     0m0.014s 
0 
[1]+  Done                    emacs -fn 9x15 b.cc 
 
real    0m20.012s 
user    0m19.669s 
sys     0m0.164s 
 
W.

Comment 6 Alexander Kleinsorge 2004-05-03 17:26:19 UTC

REOPEN the bug 14809: i can verify the problem partly. 
 
performance with global static arrays of double's is up to 10% higher as with   
pointers to dynamic (new or malloc) arrays of same size and type!!  
PSA .. pointer to static arrays   
time(average over 5runs):  (big matrix multiplications, g++)   
   static=21.8sec; PSA=23.4sec ;dyn.new=23.4sec; dyn.malloc=23.0sec  
   (last week i had up to 3 sec diff.!?) 
result: pointers slow down my prog!?  
used memory: ca. 05 megabyte  
   
computer: kernel 2.4.23-1.SMP i686   
Dual Processor Pentium III, 1.2 GHz, 512K L2-Cache, 2530 bogomips   
Red Hat Linux(2years old, but some updates (gcc,kernel,libs)    
   
GCC: -3.3.3 --enable-shared --enable-languages=c,c++,f77   
Thread model: posix,   gcc version 3.3.3

Comment 7 Wolfgang Bangerth 2004-05-03 18:23:19 UTC

That's an unrelated thing. Sometimes you can get away with a smaller number 
of dereferencing operations if memory is allocated statically. 
W.