Hi,

I have tried various OpenMP examples from the web and got them all to work fine. When I tried to parallelize some software that is part of a larger project, I ran into a problem: OpenMP reports only one processor available, so I get only 1 thread. More details follow.

I am running on a dual quad-core PC. I use RHEL 5.2 and have tried both the gcc 4.3.2 compiler (which I compiled myself) and Red Hat's "stock" 4.1.2 compiler, which has had OpenMP backported to it. Both exhibit the same problem.

In some simple example code (e.g., dotproduct_reduction.cpp, which I found on the web), I can put in the following print:

#ifdef _OPENMP
    cout << " Number of processors available:" << omp_get_num_procs()
         << " MAX Number of threads " << omp_get_max_threads() << endl;
    cout.flush();
#endif

and get something like:

 Number of processors available:8 MAX Number of threads 8

The example runs 8 threads and appears to run on multiple processors.

In my large project on the same computer, I put in:

#ifdef _OPENMP
    cout << " Number of processors available:" << omp_get_num_procs()
         << " MAX number of OpenMP threads " << omp_get_max_threads() << endl;
#endif

and get:

 Number of processors available:1 MAX number of OpenMP threads 1

And so it doesn't parallelize. I am stumped! What could be causing OpenMP to be confused about the number of processors?

I thought it might be something in the compile, so I re-compiled absolutely everything with "-fopenmp" turned on, and it made no difference. I tried setting the environment variable OMP_NUM_THREADS to 2 and got 2 threads, but only one processor was running the job.
Compile samples:

cd /home/hopper/UMBRA_4/umbra/SNL/rwm/buildRWM/utility && /usr/bin/g++-4.3.2 -DBOOST_ALL_NO_LIB -Dutility_EXPORTS -O3 -DNDEBUG -fopenmp -fPIC -I/usr/include/boost-1_35 -I/home/hopper/UMBRA_4/umbra/SNL/rwm/TNT -I/home/hopper/UMBRA_4 -Wall -fPIC -o CMakeFiles/utility.dir/matrix_math.o -c /home/hopper/UMBRA_4/umbra/SNL/rwm/utility/matrix_math.cpp

cd /home/hopper/UMBRA_4/umbra/SNL/rwm/buildRWM/micp && /usr/bin/g++-4.3.2 -DBOOST_ALL_NO_LIB -Dmicp_EXPORTS -O3 -DNDEBUG -fopenmp -fPIC -I/usr/include/boost-1_35 -I/home/hopper/UMBRA_4/umbra/SNL/rwm/utility -I/home/hopper/UMBRA_4/umbra/SNL/rwm/CDF -I/home/hopper/UMBRA_4 -I/home/hopper/UMBRA_4/include/cstk2 -Wall -fPIC -o CMakeFiles/micp.dir/icp.o -c /home/hopper/UMBRA_4/umbra/SNL/rwm/micp/icp.cpp

Link sample (everything is linked as a shared library):

Linking CXX shared library ../libutility.so
cd /home/hopper/UMBRA_4/umbra/SNL/rwm/buildRWM/utility && /usr/bin/cmake -E cmake_link_script CMakeFiles/utility.dir/link.txt --verbose=1
/usr/bin/g++-4.3.2 -fPIC -O3 -DNDEBUG -fopenmp -Wl,--no-undefined -shared -Wl,-soname,libutility.so -o ../libutility.so CMakeFiles/utility.dir/CharObj.o CMakeFiles/utility.dir/Classify.o CMakeFiles/utility.dir/crc16_func.o CMakeFiles/utility.dir/crc32_func.o CMakeFiles/utility.dir/dsvdfit.o CMakeFiles/utility.dir/geometry.o CMakeFiles/utility.dir/math_funcs.o CMakeFiles/utility.dir/math_predicate.o CMakeFiles/utility.dir/matrix_math.o CMakeFiles/utility.dir/PCI.o CMakeFiles/utility.dir/rwm_status.o CMakeFiles/utility.dir/statistics.o CMakeFiles/utility.dir/string_funcs.o CMakeFiles/utility.dir/timing.o CMakeFiles/utility.dir/utility.o CMakeFiles/utility.dir/ValIndex.o /usr/local/lib/libumb.so -Wl,-rpath,/usr/local/lib

BTW, the .tgz file containing the source code (excluding *.o and *.so) for this project has a size of 315788806, so it's huge!
"I tried setting the environment variable OMP_NUM_THREADS to 2 and got 2 threads, but only one processor was running the job." this suggests your operating system is limiting your job to one CPU.
Subject: Re: OpenMP thinks that I have 1 processor on an 8 processor pc

rguenth at gcc dot gnu dot org wrote:
> ------- Comment #1 from rguenth at gcc dot gnu dot org 2008-09-19 14:43 -------
> rguenth at gcc dot gnu dot org changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>          Status|UNCONFIRMED                 |RESOLVED
>      Resolution|                            |INVALID
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37586

Hi,

Thanks for the response!

If I run the simple sample code, then I get 8 processors and 8 threads, while my umbra test case always replies that only 1 processor is available. The umbra test case that I am loading is small with regard to memory used and runs well (if slowly... 30 seconds or so), so it seems odd that the OS thinks that only 1 of 8 processors is available when the system monitor shows that the remaining 6 or 7 are not being used (< 1%).

So there is something strange going on... Either the query to the operating system is providing an invalid answer, or the compile is somehow wrong, or something!??? What do you think is the root cause of the problem? What does the code making the query do? Where is it?

I am quite willing and (probably) able to work with someone to run tests on my computers to try to understand what the root problem is. There are a number of people here at the Intelligent Systems and Robotics Center interested in parallelizing umbra ( http://www.sandia.gov/isrc/UMBRA.html ) and I have NO idea why my simple test inside umbra has FAILED.

Thanks,
Ralph Peters
Principal Member of the Technical Staff
Intelligent Systems and Robotics
Sandia National Laboratories
Albuquerque, NM USA
As already said in the openmp.org forum, omp_get_num_procs () will only return a smaller number than the number of system CPUs online if the GOMP_CPU_AFFINITY env var is used, or if the calling process and/or thread has its CPU affinity limited to a subset of CPUs. You can just step through the omp_get_num_procs () / get_num_procs () routine and/or look at strace to see which is the case.

omp_get_max_threads () in 4.3 and earlier incorrectly adjusts for dyn_var etc.; it works the way a parallel region determines the number of threads when num_threads isn't specified. Only on the GCC trunk (4.4 and later) does it return the nthreads_var ICV.
Subject: Re: OpenMP thinks that I have 1 processor on an 8 processor pc

jakub at gcc dot gnu dot org wrote:
> ------- Comment #3 from jakub at gcc dot gnu dot org 2008-09-19 21:45 -------
> jakub at gcc dot gnu dot org changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>          Status|RESOLVED                    |UNCONFIRMED
>        Keywords|                            |openmp
>      Resolution|INVALID                     |

Hi,

Do you mean that there might be a problem in gcc 4.3 that explains what I see, and that it may be fixed in gcc 4.4?

It appears to me that GOMP_CPU_AFFINITY allows you to force threads onto particular processors -- is this correct? So are you suggesting that I could use GOMP_CPU_AFFINITY to sidestep this problem? If so, I will try it Monday.

Thanks! Have a nice weekend!
Ralph
No, I didn't mean to ask you to try to work around it; I asked you to investigate why omp_get_num_procs returns 1 instead of 8. The gcc 4.3 vs. 4.4 difference only affects omp_get_max_threads, not omp_get_num_procs, so if even omp_get_num_procs reports 1, the options are:

1) the process has its affinity set to just one CPU
2) GOMP_CPU_AFFINITY has been used
3) you have omp_get_num_procs defined elsewhere in your program; verify that the libgomp function is really used.

An strace -f -e sched_getaffinity dump of the program could reveal 1); 2) can be determined by looking at the scripts you use to start the program (or by adding a getenv call in the program to print it for you); 3) can be seen in the debugger and/or by using the LD_DEBUG=all env var.
Subject: Re: OpenMP thinks that I have 1 processor on an 8 processor pc

Hi Jakub,

I finally have a bit of time to respond to your last email! I may be new enough to this to be missing something important. Why don't we start at the beginning?

I compiled everything with the -fopenmp flag on the compiler (4.3.2 is the version for g++ and gcc).

*****************************************
OpenMP code exists in only one place in my software.

[hopper@babs rwm]$ find . -name "*.*pp" -exec grep -n -H omp.h {} \;
./umbraModCDF/umbModCDFDataSet.cpp:237:#include <omp.h>
[hopper@babs rwm]$ find .
-name "*.*pp" -exec grep -n -H omp_get_num_procs {} \;
./umbraModCDF/umbModCDFDataSet.cpp:246: cerr << " OPENMP is " << _OPENMP << " Number of processors available:" << omp_get_num_procs() << " MAX number of OpenMP threads " << omp_get_max_threads() << endl;
[hopper@babs rwm]$

All of the OpenMP code that I have is in the following function:

**********************************
#include <omp.h>

void test_openmp()
{
    cerr << "\nEnter test_openmp() " << endl;
#ifdef _OPENMP
    // omp_set_dynamic(true);
    // omp_set_num_threads(8);
    cerr << " OPENMP is " << _OPENMP
         << " Number of processors available:" << omp_get_num_procs()
         << " MAX number of OpenMP threads " << omp_get_max_threads() << endl;
    char *GCU = getenv("GOMP_CPU_AFFINITY");
    if(GCU != NULL)
        cerr << " GOMP_CPU_AFFINITY is " << GCU << "!!!!\n"<< endl;
    else
        cerr << " GOMP_CPU_AFFINITY is unknown" << endl;
#endif
    cerr << "\nExit test_openmp()\n" << endl;
    cerr.flush();
}
***********

Output snippet is:

Enter test_openmp()
 OPENMP is 200505 Number of processors available:1 MAX number of OpenMP threads 1
 GOMP_CPU_AFFINITY is unknown
Exit test_openmp()
************

I ran strace and I got some results. In the output I see:

sched_getaffinity(3502, 128, { ff, 0, 0, 0 }) = 32
....
sched_getaffinity(3502, 128, { 1, 0, 0, 0 }) = 32
...
sched_getaffinity(3502, 128, { 1, 0, 0, 0 }) = 32

I will let you interpret the output, which is reproduced in full below.

[hopper@babs rwm]$ strace -f -e sched_getaffinity uview ./viewer_scripts/viewer_CDF.tcl
Process 3503 attached (waiting for parent)
Process 3503 resumed (parent 3502 ready)
Process 3503 detached
--- SIGCHLD (Child exited) @ 0 (0) ---
provided packge "ucl" version 4.7.1
STATUS> Umbra installed at UmbraPath -> /home/hopper/UMBRA_4
STATUS> Use UmbraPath variable to refer to this location
STATUS> uview is using gui from developerViewer
STATUS> loading umbraConfig.tcl from uclLoad.tcl
STATUS> searching for umbra config files in path /home/hopper/UMBRA_4/umbra/SNL/rwm .
/home/hopper/UMBRA_4
library r {umb {4.7.1 debug {Sep 29 2008} 13:08:47}} {usg {4.7.1 debug {Sep 29 2008} 13:08:56}} {ucl {4.7.1 debug {Sep 29 2008} 13:08:55}}
STATUS> loading ustk library from uclLoad.tcl
*-----------------------------------------------------------------------------*
* The C-Space Toolkit (C) software contained in this program is the           *
* property of Sandia Corporation.                                             *
* Copyright 1995-2003 (C) Sandia Corporation. All rights reserved.            *
*                                                                             *
* The C-Space Toolkit (``CSTk'') was developed at Sandia National             *
* Laboratories, which is operated by the Sandia Corporation under contract    *
* for the United States Department of Energy. The CSTk is is protected by     *
* copyright under the laws of the United States. CSTk software is not to      *
* be used, disclosed, or duplicated without explicit written authorization    *
* from Sandia Corporation.                                                    *
*-----------------------------------------------------------------------------*
STATUS> loading general.tcl from umbra/core/ucl directory
STATUS> loading iTclUtilities.tcl from umbra/core/ucl directory
STATUS> loading monitor.tcl from umbra/core/ucl directory
STATUS> loading XmlSaxParserCore.tcl from umbra/core/ucl directory
STATUS> loading XmlUtils.tcl from umbra/core/ucl directory
STATUS> loading umbTime.tcl from from umbra/core/ucl directory
STATUS> Time Modules simClock & wallClock built
STATUS> loading tkconUmb.tcl from uclLoad.tcl
loading tkconUmb.tcl ...
modified version of tkcon.tcl for umbra
STATUS> loading umbraConsole.tcl from uclLoad.tcl
STATUS> loading mkConsoleUmb.tcl from uclLoad.tcl
STATUS> Making the umbra console and a usg::Scene scene
STATUS> All tcl output will now go to umbra console
STATUS> Only cout/cerr from C++ will go to this window
--------------------------------------------------------------------------------
NOTE: usg::Scene using usg default data file path list
path = .:/usr/local/share/OpenSceneGraph/data:/usr/local/share/OpenSceneGraph/data/Env:/usr/local/share/OpenSceneGraph/data/fonts:/usr/local/share/OpenSceneGraph/data/Images:/usr/share/OpenSceneGraph/data:/usr/share/OpenSceneGraph/data/Env:/usr/share/OpenSceneGraph/data/fonts:/usr/share/OpenSceneGraph/data/Images
camera number: 0
sched_getaffinity(3502, 128, { ff, 0, 0, 0 }) = 32
INFO> navigation mode set to umbra
Warning: font file "fonts/arial.ttf" not found.
Setting savConfigCB to '::guiWrapper saveConfig'
sched_getaffinity(3502, 128, { 1, 0, 0, 0 }) = 32
sLoadFile filename:/home/vision_data/geometry_objects/test.cdf fileType: lineRowCnt=-1 lineColCnt=-1
std::string UmbModCDFDataSet::LoadFile(const s....
Enter test_openmp()
sched_getaffinity(3502, 128, { 1, 0, 0, 0 }) = 32
 OPENMP is 200505 Number of processors available:1 MAX number of OpenMP threads 1
 GOMP_CPU_AFFINITY is unknown
Exit test_openmp()
After test_openmp()
Processing CDF File: /home/vision_data/geometry_objects/test.cdf
M(9): Start of CDF Version 1.0 stream
M(1292): End of CDF Version 1.0 stream
Entering UmbModCutCDFDataSet::_CutData(...)
cdf_cut Acummulator replacing data. New CDF contains 469 points.
No texturing as there are no images
Plotting data for 469 points and 466 polygons.
entering CDFDataSet::plotData
Shape:0 / 13 NumberPoints =8 NumberPolygns=12 NumberLineSegments=0
Shape:1 / 13 NumberPoints =8 NumberPolygns=12 NumberLineSegments=0
Shape:2 / 13 NumberPoints =8 NumberPolygns=12 NumberLineSegments=0
Shape:3 / 13 NumberPoints =62 NumberPolygns=120 NumberLineSegments=0
Shape:4 / 13 NumberPoints =62 NumberPolygns=120 NumberLineSegments=0
Shape:5 / 13 NumberPoints =14 NumberPolygns=24 NumberLineSegments=0
Shape:6 / 13 NumberPoints =14 NumberPolygns=24 NumberLineSegments=0
Shape:7 / 13 NumberPoints =12 NumberPolygns=20 NumberLineSegments=0
Shape:8 / 13 NumberPoints =62 NumberPolygns=120 NumberLineSegments=0
Shape:9 / 13 NumberPoints =4 NumberPolygns=2 NumberLineSegments=0
Shape:10 / 13 NumberPoints =112 NumberPolygns=0 NumberLineSegments=0
Shape:11 / 13 NumberPoints =60 NumberPolygns=0 NumberLineSegments=0
Shape:12 / 13 NumberPoints =36 NumberPolygns=0 NumberLineSegments=0
Shape:13 / 13 NumberPoints =7 NumberPolygns=0 NumberLineSegments=6
exiting CDFDataSet::plotData
DEBUG: Trying to exit.
DEBUG: main loop all done... bye.
End-of-run CumTimer timing results
Number of entries: 2 Total run time: 2.62966 secs
Id:cdf_cutUmbModCutCDFDataSet::_CutData(): TotalTime:6.91414e-06 Frac:2.62929e-06 Count:1
Id:cdf_rot_tr_sclRotateTranslateScaleData() TotalTime:6.50883e-05 Frac:2.47516e-05 Count:1
BasicDrawParmsGuarantor::cleanup()
cstk::ilWin::finalCleanup() Cleaning up known windows.
DistEngine::DeleteDfltInst() called.
[hopper@babs rwm]$
***************************************

I hope this helps to find the problem! Please email me with further questions as they arise,
Ralph
sched_getaffinity(3502, 128, { ff, 0, 0, 0 }) = 32
....
sched_getaffinity(3502, 128, { 1, 0, 0, 0 }) = 32
...
sched_getaffinity(3502, 128, { 1, 0, 0, 0 }) = 32

This shows that the process was originally allowed to run on any of the 8 CPUs (mask 0xff), but after a while it got confined to only the first CPU. The question is what did that. You can look for the sched_setaffinity syscall in the strace dump, then either from the surrounding syscalls or under a debugger find out where the sched_setaffinity or pthread_setaffinity_np functions were called from, or whether the program invokes the syscall directly, without calling a libc/libpthread function.
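For readers unfamiliar with the mask notation: following the interpretation above, each number inside the braces is one word of the CPU bit mask printed in hex, and every set bit is a CPU the process may run on. A small sketch of that arithmetic (cpus_in_mask is a hypothetical helper written for this illustration, not a libc or libgomp function):

```cpp
#include <cstddef>
#include <vector>

// Count the CPUs permitted by an affinity mask given as the list of
// hex words shown in the strace output, e.g. { ff, 0, 0, 0 }.
int cpus_in_mask(const std::vector<unsigned long>& words) {
    int count = 0;
    for (std::size_t i = 0; i < words.size(); ++i) {
        // Each iteration of the inner loop clears the lowest set bit.
        for (unsigned long w = words[i]; w != 0; w &= w - 1)
            ++count;
    }
    return count;
}
```

With the values from the trace, cpus_in_mask({0xff, 0, 0, 0}) gives 8 and cpus_in_mask({0x1, 0, 0, 0}) gives 1, which matches the 8 and 1 that omp_get_num_procs reported in the two programs.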
Subject: Re: OpenMP thinks that I have 1 processor on an 8 processor pc

Hi Jakub,

I did as you asked and get:

sched_setaffinity(24094, 128, { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }) = 0

It's buried in other "startup" code:

...................
NOTE: usg::Scene using usg default data file path list
path = .:/usr/local/share/OpenSceneGraph/data:/usr/local/share/OpenSceneGraph/data/Env:/usr/local/share/OpenSceneGraph/data/fonts:/usr/local/share/OpenSceneGraph/data/Images:/usr/share/OpenSceneGraph/data:/usr/share/OpenSceneGraph/data/Env:/usr/share/OpenSceneGraph/data/fonts:/usr/share/OpenSceneGraph/data/Images
camera number: 0
sched_setaffinity(24094, 128, { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }) = 0
INFO> navigation mode set to umbra
Warning: font file "fonts/arial.ttf" not found.
Setting savConfigCB to '::guiWrapper saveConfig'
sLoadFile filename:/home/vision_data/geometry_objects/test.cdf fileType: lineRowCnt=-1 lineColCnt=-1
std::string UmbModCDFDataSet::LoadFile(const s......
...................................

I suppose that this means that something in Tcl, OpenSceneGraph, umbra, or my own rwm is setting the software that follows to run on only one processor.

I ran strace -f to get a full "dump" and found that the call occurs between 2 "open(library)" calls. I backtracked to OpenSceneGraph and found one possibility:

[hopper@babs src]$ pwd
/usr/local/src/OpenSceneGraph-2.4.0/src
[hopper@babs src]$ find . -exec grep -n -H sched_setaffinity {} \;
./OpenThreads/pthreads/CMakeLists.txt:73: sched_setaffinity( 0, sizeof(cpumask), &cpumask );
./OpenThreads/pthreads/CMakeLists.txt:83: sched_setaffinity( 0, &cpumask );
./OpenThreads/pthreads/PThread.c++:135: sched_setaffinity( 0, sizeof(cpumask), &cpumask );
./OpenThreads/pthreads/PThread.c++:137: sched_setaffinity( 0, &cpumask );
./OpenThreads/pthreads/PThread.c++:549: sched_setaffinity( 0, sizeof(cpumask), &cpumask );
./OpenThreads/pthreads/PThread.c++:551: sched_setaffinity( 0, &cpumask );
./OpenThreads/pthreads/PThread.c++:984: sched_setaffinity( 0, sizeof(cpumask), &cpumask );
./OpenThreads/pthreads/PThread.c++:986: sched_setaffinity( 0, &cpumask );
./OpenThreads/pthreads/GNUmakefile:43:ifeq ($(COMPILE_USING_TWO_PARAM_sched_setaffinity),yes)
./OpenThreads/pthreads/GNUmakefile:44:DEF += -DCOMPILE_USING_TWO_PARAM_sched_setaffinity
[hopper@babs src]$

Does this look like a possible source of my problem? Can I make my own call to sched_setaffinity? There is, of course, always the possibility of causing problems.

Ralph
Yes, that certainly is the source of your problems. You could call sched_setaffinity or pthread_setaffinity_np in your program before entering an OpenMP region, though you should also find out why that library confines the thread to just one CPU.

Anyway, closing this, as it clearly is not a GCC/libgomp bug.
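A minimal sketch of that suggestion, assuming Linux with glibc. The helper name allow_all_cpus is made up for illustration, and whether a given libgomp version picks up the widened mask may depend on when it queries the affinity, so treat this as something to try rather than a guaranteed fix:

```cpp
// Linux/glibc-only sketch; allow_all_cpus is a hypothetical helper.
#include <sched.h>    // sched_setaffinity, sched_getaffinity, CPU_* macros
#include <unistd.h>   // sysconf

// Ask the kernel to let the calling thread run on every configured CPU.
// Returns the number of CPUs in the resulting mask, or -1 if the mask
// cannot be read back.
int allow_all_cpus() {
    long n = sysconf(_SC_NPROCESSORS_CONF);
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (long i = 0; i < n && i < CPU_SETSIZE; ++i)
        CPU_SET(i, &mask);

    // pid 0 means the calling thread. A failure here is not fatal for
    // this sketch: the mask read back below shows what is in effect.
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        // fall through and report the unchanged mask
    }

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}
```

It would be called from the thread that is about to enter the parallel region, after the library that narrowed the mask has finished its startup. The kernel may still restrict the result (for example through cpusets), which is why the function reports the mask actually in effect instead of assuming success.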
Subject: Re: OpenMP thinks that I have 1 processor on an 8 processor pc

jakub at gcc dot gnu dot org wrote:
> ------- Comment #9 from jakub at gcc dot gnu dot org 2008-10-13 17:52 -------
> jakub at gcc dot gnu dot org changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>          Status|WAITING                     |RESOLVED
>      Resolution|                            |FIXED

THANKS for your help!
Ralph