This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: gomp slowness
On Sat, 2007-10-20 at 22:32 +0400, Tomash Brechko wrote:
> I'm not sure what OpenMP spec says about default data scope (too lazy
> to read through),
> but it seems that examples from
> http://kallipolis.com/openmp/2.html assume default(private), while GCC
> GOMP defaults to shared. In your case,
>
> #pragma omp parallel for shared(A, row, col)
> for (i = k+1; i<SIZE; i++) {
>     for (j = k+1; j<SIZE; j++) {
>         A[i][j] = A[i][j] - row[i] * col[j];
>     }
> }
>
> '#pragma omp for' makes 'i' private implicitly (it couldn't be
> otherwise), but 'j' is still shared.
Good job!!
Dang, I'm so used to C++ and other languages where the loop control
variable is local to the loop. Haha .. but not in my own language, Felix.
> I just tried your original case:
> not only is it slow, it also produces different results with and
> without OpenMP (just try to print any elem of 'A'). Adding
> 'private(j)' (or defining 'j' inside the outer loop) will fix the
> case.
>
> It would be nice if someone would post the measurement for the fixed
> case; my machine has only HT, and I experience a slowdown for this
> example (but it still runs much faster than before the fix).
Now I get:

#threads   Real    User    Sys
1          1.052   1.043   0.009
2          0.866   1.582   0.026
This is a much better result: about a 21% speedup (roughly 18% less real time).
I only have a dual core at the moment (without HT); it would be nice
to see the result for a quad!
BTW: I also tried this variation in C++:
#pragma omp parallel for shared(A, row, col)
for (i = k+1; i<SIZE; i++) {
    for (int j = k+1; j<SIZE; j++) {   ///<-----------------
        A[i][j] = A[i][j] - row[i] * col[j];
    }
}
which works with the same timings as the C with 'private(j)'.
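
For anyone who wants to check the fix rather than take the timings on
trust, here is a minimal sketch of the C version with 'private(j)',
compared element by element against a plain serial loop. The value of
SIZE, the fill pattern in init(), and the helper names (init,
update_serial, update_parallel, count_mismatches) are assumptions made
for illustration; none of them come from the original test program.

```c
#define SIZE 256  /* assumed size; the original program's SIZE is not given */

static double A[SIZE][SIZE], B[SIZE][SIZE], row[SIZE], col[SIZE];

/* Fill both matrices and both vectors with the same deterministic data
   (hypothetical values, chosen only so the result is reproducible). */
void init(void)
{
    for (int i = 0; i < SIZE; i++) {
        row[i] = 1.0 + i % 7;
        col[i] = 1.0 + i % 5;
        for (int j = 0; j < SIZE; j++)
            A[i][j] = B[i][j] = (double)(i + j);
    }
}

/* Serial reference: update the trailing submatrix of M. */
void update_serial(double M[SIZE][SIZE], int k)
{
    int i, j;
    for (i = k + 1; i < SIZE; i++)
        for (j = k + 1; j < SIZE; j++)
            M[i][j] = M[i][j] - row[i] * col[j];
}

/* Parallel version: 'i' becomes private implicitly as the loop variable
   of the work-shared loop; 'j' is declared outside the loop, so it must
   be listed in private() or the threads race on it. */
void update_parallel(double M[SIZE][SIZE], int k)
{
    int i, j;
#pragma omp parallel for shared(M, row, col) private(j)
    for (i = k + 1; i < SIZE; i++)
        for (j = k + 1; j < SIZE; j++)
            M[i][j] = M[i][j] - row[i] * col[j];
}

/* Run both versions on identical data and count differing elements. */
int count_mismatches(int k)
{
    int bad = 0;
    init();
    update_serial(A, k);
    update_parallel(B, k);
    for (int i = 0; i < SIZE; i++)
        for (int j = 0; j < SIZE; j++)
            if (A[i][j] != B[i][j])
                bad++;
    return bad;
}
```

Call count_mismatches(0) from a main() and build with 'gcc -fopenmp';
with 'private(j)' in place it should report no mismatches. Without
-fopenmp the pragma is ignored and both versions simply run serially.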
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net