This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: gomp slowness

From: skaller <skaller at users dot sourceforge dot net>
To: Tomash Brechko <tomash dot brechko at gmail dot com>
Cc: gcc at gcc dot gnu dot org
Date: Sun, 21 Oct 2007 10:37:00 +1000
Subject: Re: gomp slowness
References: <1192640402.10798.11.camel@rosella.wigram> <4716430A.5010204@swansea.ac.uk> <1192668349.25512.2.camel@rosella.wigram> <ff6m44$9g2$1@ger.gmane.org> <1192682864.25512.49.camel@rosella.wigram> <20071020183237.GA27627@moonlight.home>

On Sat, 2007-10-20 at 22:32 +0400, Tomash Brechko wrote:
> I'm not sure what OpenMP spec says about default data scope (too lazy
> to read through), but it seems that examples from
> http://kallipolis.com/openmp/2.html assume default(private), while GCC
> GOMP defaults to shared.  In your case,
> 
>   #pragma omp parallel for shared(A, row, col)
>     for (i = k+1; i<SIZE; i++) {
>       for (j = k+1; j<SIZE; j++) {
>           A[i][j] = A[i][j] - row[i] * col[j];
>       }
>     }
> 
> '#pragma omp for' makes 'i' private implicitly (it couldn't be
> otherwise), but 'j' is still shared.  

Good job!! 

Dang, I'm so used to C++ and other languages where the control
variable is local to the loop. Haha .. but not in my own language, Felix.



> I just tried your original case,
> not only is it slow, but it also produces different results with and
> without OpenMP (just try to print any elem of 'A').  Adding
> 'private(j)' (or defining 'j' inside the outer loop) will fix the
> case.
> 
> It would be nice if someone would post the measurement for the fixed
> case, my machine has only HT, and I experience slowdown for this
> example (but still it runs much faster than before the fix).

Now I get:

  #threads   Real    User    Sys
      1      1.052   1.043   0.009
      2      0.866   1.582   0.026

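This is a much better result: real time drops from 1.052 to 0.866,
about 18% less elapsed time (a speedup of roughly 1.2x), although
total user time rises by about 50%. I only have a dual core at the
moment (without HT); it would be nice to see the result for a quad!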

BTW: I also tried this variation in C++:

  #pragma omp parallel for shared(A, row, col)
    for (i = k+1; i<SIZE; i++) {
      for (int j = k+1; j<SIZE; j++) {   // <-- 'j' declared here, so each thread gets its own
          A[i][j] = A[i][j] - row[i] * col[j];
      }
    }

which gives the same timings as the C version with 'private(j)'.
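
For reference, here is a minimal sketch of the C fix with 'private(j)', plus a check that it matches a serial run. SIZE = 64, the fill values, and the helper names (update_serial, update_parallel, count_mismatches) are assumptions made up for illustration; none of them come from the original test program. The fill values are small integers so every update is exact and the two results can be compared with '!='.

```c
#include <assert.h>

#define SIZE 64  /* assumed value; the thread does not show the real one */

/* Serial reference: the loop nest from the thread without any pragma. */
static void update_serial(double A[SIZE][SIZE], double row[SIZE],
                          double col[SIZE], int k)
{
    int i, j;
    assert(k >= 0 && k < SIZE);
    for (i = k + 1; i < SIZE; i++) {
        for (j = k + 1; j < SIZE; j++) {
            A[i][j] = A[i][j] - row[i] * col[j];
        }
    }
}

/* Same loop nest, with 'j' listed as private so each thread has its own copy.
   'i' is made private implicitly because it is the parallel loop variable. */
static void update_parallel(double A[SIZE][SIZE], double row[SIZE],
                            double col[SIZE], int k)
{
    int i, j;
    assert(k >= 0 && k < SIZE);
    #pragma omp parallel for shared(A, row, col) private(j)
    for (i = k + 1; i < SIZE; i++) {
        for (j = k + 1; j < SIZE; j++) {
            A[i][j] = A[i][j] - row[i] * col[j];
        }
    }
}

/* Hypothetical check: run both versions on identical input for every k
   and count the elements where the results differ. */
int count_mismatches(void)
{
    static double S[SIZE][SIZE], P[SIZE][SIZE], row[SIZE], col[SIZE];
    int i, j, k, bad = 0;

    for (i = 0; i < SIZE; i++) {
        row[i] = i % 5;               /* made-up integer-valued data */
        col[i] = i % 3;
        for (j = 0; j < SIZE; j++) {
            S[i][j] = P[i][j] = (i + j) % 7;
        }
    }
    for (k = 0; k < SIZE - 1; k++) {
        update_serial(S, row, col, k);
        update_parallel(P, row, col, k);
    }
    for (i = 0; i < SIZE; i++) {
        for (j = 0; j < SIZE; j++) {
            if (S[i][j] != P[i][j]) {
                bad++;
            }
        }
    }
    return bad;
}
```

Built with 'gcc -fopenmp', count_mismatches() should return 0. Without '-fopenmp' the pragma is ignored and both versions simply run serially. Dropping 'private(j)' from the pragma brings back the race Tomash described, so the count may then be nonzero and can vary from run to run.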


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
