This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Strange Performance Hit on 2D-Loop


Just for the record: I finally found the issue here. It's a problem
of both alignment and cache thrashing. When using aligned memory
(e.g. via posix_memalign()) and a suitable offset within that
memory, the effect goes away. So it's a processor effect, not a
compiler issue. :-)

Best
-Andreas


On 16:19 Thu 09 Jul, Andreas Schäfer wrote:
> Hey guys,
> 
> I noticed a strange performance hit in one of our stencil codes,
> causing it to run twice as long. 
> 
> To nail down the error, I reduced our code to the two attached demo
> programs. Basically they take two matrices and average each matrix
> element with its four direct neighbors. Depending on how these
> matrices are allocated, the performance hit occurs -- or does not.
> 
> Here is the diff of the two files:
> @@ -17,8 +17,7 @@
> 
>  void test(double (*grid)[GRID_WIDTH])
>  {
> -    double (*gridOld)[GRID_WIDTH] =
> -        malloc(GRID_WIDTH * GRID_HEIGHT * sizeof(double));
> +    double (*gridOld)[GRID_WIDTH] = gridOldArray;
>      double (*gridNew)[GRID_WIDTH] = gridNewArray;
>      printAddress(&gridNew[0][0]);
>      printAddress(&gridOld[0][0]);
> 
> where gridOldArray is a statically allocated array. Depending on the
> machine's processor, the performance hit varies from negligible to
> dramatic:
> 
> 
> Processor          GCC Version Time(slow) Time(fast) Performance Hit
> ------------------ ----------- ---------- ---------- ---------------
> Core 2 Quad Q9550  4.3.3       12.19s      5.11s     138%
> Athlon 64 X2 3800+ 4.3.3        7.34s      6.61s      11%
> Opteron 2378       4.3.2        6.13s      5.60s       9%
> Opteron 2352       4.3.3        8.16s      7.96s       2%
> Xeon 3.00GHz       4.3.3       18.98s     14.67s      29%
> 
> Apparently Intel systems are more susceptible to this effect. 
> 
> Can anyone reproduce these results?
> And could anyone explain why this happens?
> 
> Thanks in advance
> -Andreas
> 
> 
> -- 
> ============================================
> Andreas Schäfer
> Cluster and Metacomputing Working Group
> Friedrich-Schiller-Universität Jena, Germany
> 0049/3641-9-46376
> PGP/GPG key via keyserver
> I'm a bright... http://www.the-brights.net
> ============================================
> 
> (\___/)
> (+'.'+)
> (")_(")
> This is Bunny. Copy and paste Bunny into your 
> signature to help him gain world domination!

> #define GRID_WIDTH  1024
> #define GRID_HEIGHT 1024
> #define MAX_STEPS 1024
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> 
> double grid[GRID_HEIGHT][GRID_WIDTH];
> double gridNewArray[GRID_HEIGHT][GRID_WIDTH];
> double gridOldArray[GRID_HEIGHT][GRID_WIDTH];
> 
> void printAddress(void *p)
> {
>     printf("address %p\n", p);
> }
> 
> void test(double (*grid)[GRID_WIDTH])
> {
>     double (*gridOld)[GRID_WIDTH] = gridOldArray;
>     double (*gridNew)[GRID_WIDTH] = gridNewArray;
>     printAddress(&gridNew[0][0]);
>     printAddress(&gridOld[0][0]);
> 
>     // copy initial state
>     for (int y = 0; y < GRID_HEIGHT; ++y) {
>         memcpy(&gridOld[y][0], &grid[y][0], GRID_WIDTH * sizeof(double));
>         memset(&gridNew[y][0], 0, GRID_WIDTH * sizeof(double));
>     }
> 
>     // update matrices
>     for (int step = 0; step < MAX_STEPS; ++step) {
>         for (int y = 1; y < GRID_HEIGHT-1; ++y) 
>             for (int x = 1; x < GRID_WIDTH-1; ++x)
>                 gridNew[y][x] = 
>                     (gridOld[y-1][x  ] + 
>                      gridOld[y  ][x-1] + 
>                      gridOld[y  ][x  ] + 
>                      gridOld[y  ][x+1] + 
>                      gridOld[y+1][x  ]) * 0.2;
>         double (*tmp)[GRID_WIDTH] = gridOld;
>         gridOld = gridNew;
>         gridNew = tmp;
>     }
> 
>     // copy result back
>     for (int y = 0; y < GRID_HEIGHT; ++y)
>         memcpy(&grid[y][0], &gridOld[y][0], GRID_WIDTH * sizeof(double));
> }
> 
> void setupGrid()
> {
>     for (int y = 0; y < GRID_HEIGHT; ++y)
>         for (int x = 0; x < GRID_WIDTH; ++x)
>             grid[y][x] = 0;
> 
>     for (int y = 10; y < 20; ++y)
>         for (int x = 10; x < 20; ++x)
>             grid[y][x] = 1;
> }
> 
> int main(int argc, char** argv)
> {
>     setupGrid();
>     test(grid);
>     printf("res: %f\n", grid[10][10]); // prevent dead code elimination
>     return 0;
> }






-- 
==========================================================
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net
==========================================================

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your 
signature to help him gain world domination!
