Hi, I noticed that profiling data are wrong if OpenMP is used via -fopenmp. I have only two #pragma omp parallel statements in a class, and the call statistics for this class's constructor are completely wrong (gprof tells me this constructor is called far too often). I found bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29935 related to OpenMP and profiling, but it seems to be a different issue.

I tried to reproduce it with a minimal example but could not get results that were as badly wrong. Nevertheless, here is a simple example which produces different profiling data depending on the -fopenmp option:

#include <vector>

class VectorStuff {
public:
  VectorStuff() { }
  void AddVectors(int n, const double *x1, const double *x2, double *y);
};

void VectorStuff::AddVectors(int n, const double *x1, const double *x2, double *y)
{
  #pragma omp parallel for
  for (int i = 0; i < n; ++i)
    y[i] = x1[i] + x2[i];
}

int main()
{
  VectorStuff vec;
  const int n = 1000;
  std::vector<double> x1(n, -2), x2(n, 2);
  double *y = new double[n];
  vec.AddVectors(n, &x1[0], &x2[0], y);
  delete[] y;
  return 0;
}

Comparing the output of

g++-4.2svn -pg -g -fopenmp main.cpp; ./a.out; gprof ./a.out > ./a.out.gprof1
g++-4.2svn -pg -g main.cpp; ./a.out; gprof ./a.out > ./a.out.gprof2

results in

--- a.out.gprof1        2007-07-17 10:47:04.000000000 +0200
+++ a.out.gprof2        2007-07-17 10:47:13.000000000 +0200
@@ -37,7 +37,6 @@
   0.00      0.00     0.00        2     0.00     0.00  void std::_Destroy<double*, double>(double*, double*, std::allocator<double>)
   0.00      0.00     0.00        1     0.00     0.00  VectorStuff::AddVectors(int, double const*, double const*, double*)
   0.00      0.00     0.00        1     0.00     0.00  VectorStuff::VectorStuff()
-  0.00      0.00     0.00        1     0.00     0.00  main
 
  %         the percentage of the total running time of the
 time      program used by this function.

I agree that this is not very critical for this example, but for other programs profiling is just useless.

$ c++-4.2svn --version
c++-4.2svn (GCC) 4.2.1 20070713 (prerelease)
This is similar to the (maybe misplaced) comment in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31862. The problem, as far as I understand it, is that any kind of profiling (gprof, profile-arcs, probably mudflap, ...) relies on global variables that would need to be made thread-local. I do not know how difficult that would be, however.
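Just to illustrate the kind of problem: this is not the actual gprof/mcount machinery, only a made-up global counter showing the underlying race.

#include <cstdio>

// A plain global counter, analogous to the per-function/per-arc counters a
// profiling runtime keeps. Nothing here is gprof itself; the name is made up.
static long call_count = 0;

int main() {
  #pragma omp parallel for
  for (int i = 0; i < 1000000; ++i)
    ++call_count;   // unsynchronized read-modify-write from several threads

  // Built with g++ -fopenmp this typically prints a value smaller than
  // 1000000, because concurrent increments get lost; without -fopenmp it is
  // always 1000000. The real profiling counters can be corrupted in a similar
  // way, although the corruption may also show up as too-large counts, as in
  // the original report.
  std::printf("call_count = %ld\n", call_count);
  return 0;
}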
And to reply to myself: it needs either to use thread-local storage to hold the counters and then to add some code to merge the values of the per-thread counters at the end of each thread (which might not be easy?), or to use atomic operations (whose existence depends on the architecture, but I hope that all multi-core processors have such instructions).
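A rough sketch of what the first option amounts to, expressed with OpenMP pragmas for illustration only (the real counters live in the profiling runtime, not in user code, and all names here are made up):

#include <cstdio>

static long total_count = 0;        // the value a profiler would eventually report

static long per_thread_count = 0;   // one private copy per thread
#pragma omp threadprivate(per_thread_count)

int main() {
  #pragma omp parallel
  {
    #pragma omp for
    for (int i = 0; i < 1000000; ++i)
      ++per_thread_count;           // each thread updates only its own copy: no race

    // merge the per-thread values into the shared counter,
    // once per thread at the end of the parallel work
    #pragma omp critical
    total_count += per_thread_count;
  }
  std::printf("total_count = %ld\n", total_count);   // 1000000 with or without -fopenmp
  return 0;
}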
(In reply to comment #2)
> And to reply to myself: it needs either to use thread-local storage to hold
> the counters and then to add some code to merge the values of the per-thread
> counters at the end of each thread (which might not be easy?), or to use
> atomic operations (whose existence depends on the architecture, but I hope
> that all multi-core processors have such instructions).

(Don't forget about SMP machines; I have an SGI Octane (2 x MIPS R12000 CPUs).)

An Open MPI related discussion about atomic operations took place over the last few days, because architecture-specific assembler code failed yet again on some exotic platforms. See e.g. http://lists.debian.org/debian-mips/2007/07/msg00036.html
gprof profiling is all done via a call to mcount, and mcount is controlled by libc (in the GNU/Linux case, glibc). So I doubt this is a GCC bug.
Subject: Re: Profiling not possible with -fopenmp

On 17 Jul 2007 10:24:12 -0000, jensseidel at users dot sf dot net
<gcc-bugzilla@gcc.gnu.org> wrote:
> An Open MPI related discussion about atomic operations took place over the
> last few days, because architecture-specific assembler code failed yet again
> on some exotic platforms.

And that is the reason why GCC added atomic builtins when OpenMP support came in. There are manuals for a reason :).

-- Pinski
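For completeness, a minimal sketch of what using one of those builtins looks like (illustrative only; the counter name is made up, and whether __sync_fetch_and_add expands to an instruction or to a library call depends on the target architecture):

#include <cstdio>

static long call_count = 0;   // made-up shared counter

int main() {
  #pragma omp parallel for
  for (int i = 0; i < 1000000; ++i)
    __sync_fetch_and_add(&call_count, 1L);   // atomic increment via GCC builtin

  std::printf("call_count = %ld\n", call_count);   // 1000000 even with -fopenmp
  return 0;
}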
(In reply to comment #4)
> gprof profiling is all done via a call to mcount, and mcount is controlled
> by libc (in the GNU/Linux case, glibc). So I doubt this is a GCC bug.

OK, for the record: I use openSUSE 10.2 with glibc 2.5. According to http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html#SEC1, mcount should occur in the gprof output, but I haven't seen it yet (gprof 2.17.50.0.5).

(In reply to comment #5)
> And that is the reason why GCC added atomic builtins when OpenMP support
> came in. There are manuals for a reason :).

I don't understand this, but it may be off topic. (Should I inform the Open MPI people about the atomic assembler locking code in GCC so that they can reuse it?)
gprof and mcount are under the control of the glibc project. There is an old email about this on the glibc side at https://sourceware.org/legacy-ml/libc-hacker/1999-01/msg00105.html .