thread_results is allocated using scalar new, yet deleted with array delete[] Version of libs (ubuntu package): libstdc++6-4.4-dev through libstdc++6-4.6-dev Version of compiler: g++ (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 Also with: g++ (Debian 4.5.2-8) 4.5.2 Snippets: /usr/include/c++/4.4/parallel/par_loop.h:87 # pragma omp single { num_threads = omp_get_num_threads(); thread_results = static_cast<Result*>( ::operator new(num_threads * sizeof(Result))); constructed = new bool[num_threads]; } but line 127: delete[] thread_results; This is unbalanced, and leads to the expected valgrind errors. Attached is a snippet (originally from http://stackoverflow.com/questions/5769908/unexpected-segfault-with-gnu-parallelaccumulate) that will segfault. I witnessed the segfault went away when fixing the `delete[]` into `delete`. However, I think it should be evaluated whether the optimized (uninitializing) allocation of thread_results is safe to do (valgrind keeps reporting uninitialized memory references even if the attached example doesn't segfault anymore). ---- I'll attach the sources in 10 minutes due to time restriction and browser flakiness
Created attachment 24089 [details] minimal example Compiled with g++ -g -O0 -fopenmp minimal.cpp -o minimal -save-temps
This has been fixed a lot of time ago: 4_5-branch, 4_6-branch and mainline are all fine.
Never mind, now I see.
*** Bug 48751 has been marked as a duplicate of this bug. ***
Let's add Johannes in CC. Indeed, something is fishy here and yes I agree the correct fix doesn't seem just using ::operator delete(__thread_results);
That _is quick response. Thank you. I'll keep an eye on this one
I had a quick look to this code and something really bad is going on, like, eg, memory deallocated before calling the destructor of the object constructed in it via placement new. And much more. The patchlet below, which I'd ask the submitters to test, gives me something much more sane, wrt Valgrind too, but Jahannes should really have a look asap. (Note the patch is versus mainline first, as usual, thus doesn't apply cleanly as-is to 4.4 for various trivial reasons, like uglified names) ///////////////// Index: par_loop.h =================================================================== --- par_loop.h (revision 172920) +++ par_loop.h (working copy) @@ -91,8 +91,7 @@ _ThreadIndex __iam = omp_get_thread_num(); // Neutral element. - _Result* __reduct = static_cast<_Result*> - (::operator new(sizeof(_Result))); + _Result* __reduct; _DifferenceType __start = equally_split_point(__length, __num_threads, __iam), @@ -100,7 +99,7 @@ if (__start < __stop) { - new(__reduct) _Result(__f(__o, __begin + __start)); + __reduct = new _Result(__f(__o, __begin + __start)); ++__start; __constructed[__iam] = true; } @@ -110,18 +109,26 @@ for (; __start < __stop; ++__start) *__reduct = __r(*__reduct, __f(__o, __begin + __start)); - __thread_results[__iam] = *__reduct; + if (__constructed[__iam]) + { + ::new(&__thread_results[__iam]) _Result(*__reduct); + delete __reduct; + } } //parallel for (_ThreadIndex __i = 0; __i < __num_threads; ++__i) if (__constructed[__i]) - __output = __r(__output, __thread_results[__i]); + { + __output = __r(__output, __thread_results[__i]); + (&__thread_results[__i])->~_Result(); + } // Points to last element processed (needed as return value for // some algorithms like transform). __f._M_finish_iterator = __begin + __length; - delete[] __thread_results; + ::operator delete(__thread_results); + delete[] __constructed; return __o;
Johannes, can you please also audit the other uses of operator new, placement new, etc, and double check that: 1- We are not assigning to an object before constructing. 2- We destruct each object before deleting its memory. Thanks!
Seth, Tom, if you get a chance to test the changes I propose in Comment 7 (suitably, trivially tweaked for 4.4.x), please let me know as soon as possible. If Johannes cannot provide feedback on this issue over the next few days I mean to go ahead with my patch, likely for 4.6.x too, and work on a other similar tweaks in the same spirit elsewhere in the parallel code (luckily we have only an handful of operator new / operator delete pairs needing attention).
Hi Paulo, Yes, this certainly fixes my segfault. Valgrind complains a little - although I am not sure if this is important (I am very new to the helgrind tool). In case it is useful, I attach the output I get, So the line numbers are meaningful, I include my modified par_loop.h and my test file. Here is my g++ -v Using built-in specs. Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.4.4-14ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4 --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5) running on Ubuntu 10-10 Many thanks for your assistance with this. My gratitude to the FSF grows daily. Tom On 26 April 2011 16:20, paolo.carlini at oracle dot com < gcc-bugzilla@gcc.gnu.org> wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48750 > > --- Comment #9 from Paolo Carlini <paolo.carlini at oracle dot com> > 2011-04-26 15:19:12 UTC --- > Seth, Tom, if you get a chance to test the changes I propose in Comment 7 > (suitably, trivially tweaked for 4.4.x), please let me know as soon as > possible. > > If Johannes cannot provide feedback on this issue over the next few days I > mean > to go ahead with my patch, likely for 4.6.x too, and work on a other > similar > tweaks in the same spirit elsewhere in the parallel code (luckily we have > only > an handful of operator new / operator delete pairs needing attention). > > -- > Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are on the CC list for the bug. >
(In reply to comment #10) > Yes, this certainly fixes my segfault. For me, it only fixes the segfault, also avoids all valgrind and glibc errors messages at runtime and drastically improves the valgring report in terms of memory leaks, to 0 definitely and 0 indirectly lost (try not ding 'delete __reduct', as was effectively happening before the patch, for example, to see what I mean). The report is still not perfect, indeed, there may be other leaks somewhere else, I'm going to audit other bits of the parallel infrastructure as I said. This is the valgrind report I get here on an i7-980X I have at hand: ==7206== LEAK SUMMARY: ==7206== definitely lost: 0 bytes in 0 blocks ==7206== indirectly lost: 0 bytes in 0 blocks ==7206== possibly lost: 3,344 bytes in 11 blocks ==7206== still reachable: 2,984 bytes in 3 blocks ==7206== suppressed: 0 bytes in 0 blocks > Valgrind complains a little - although I am not sure if this is important (I > am very new to the helgrind tool). In case it is useful, I attach the > output I get, Sure it would be useful. Note, in general, a few memory issues have been fixed already in 4.5.x and 4.6.x, thus at some point, if you are serious about parallel mode, updating GCC will be certainly recommended.
Forgot: my numbers, etc, are all about the exaxt testcase in PR48751.
Created attachment 24108 [details] Draft patch This is my work ìn progress patch. I'm for example seeing very good to perfect improvements in terms of valgrind reports for sort / stable_sort / random_shuffle too, which, before the patch, when the ValueType has a non trivial destructor leak a lot. I'm going to the run the patch through the testsuite in parallel-mode and get it closer to the final shape.
I just built a debian SID chroot and applied the diff to it's libstdc++6 (4.6.0-2), and tested your patch with the minimal.cpp from this bug report patch -p1 < ~sehe/attachment.cgi?id=24108 patching file parallel/multiway_merge.h patching file parallel/losertree.h patching file parallel/par_loop.h Hunk #1 succeeded at 91 with fuzz 1. patching file parallel/quicksort.h patching file parallel/random_shuffle.h patching file parallel/multiway_mergesort.h patching file parallel/partial_sum.h g++-4.6 -g -O0 -fopenmp minimal.cpp -o minimal -save-temps valgrind --leak-check=full ./minimal ==5960== LEAK SUMMARY: ==5960== definitely lost: 0 bytes in 0 blocks ==5960== indirectly lost: 0 bytes in 0 blocks ==5960== possibly lost: 456 bytes in 3 blocks ==5960== still reachable: 1,748 bytes in 3 blocks ==5960== suppressed: 0 bytes in 0 blocks The allocation possibly lost are attributed to GOMP_parallel_start from numeric:99). Helgrind gives quite the array of possible data races. I don't know whether I need to worry about them (as I don't know what kind of thread primitives GOMP uses internally and if helgrind supports them). If you want I can attach precompiled sources + val/helgrind outputs
Probably redundant info, but the compiler used in my prior post is sehe@sid:~$ g++-4.6 -v Using built-in specs. COLLECT_GCC=g++-4.6 COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.6.1/lto-wrapper Target: i486-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.0-2' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld --enable-objc-gc --enable-targets=all --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu Thread model: posix gcc version 4.6.1 20110329 (prerelease) (Debian 4.6.0-2)
I have reviewed the patch, it looks mostly fine. Thanks! Only line 606 of losertree.h might not yet be fully consistent with line 597. I'm not sure all loser tree members are correctly constructed in the first place. We might have to add for (unsigned int __i = 0; __i < _M_k; ++__i) { ::new(&(_M_losers[__i]._M_key)) _Tp(__sentinel); _M_losers[__i]._M_source = -1; } I will test that... We have used these ::operator new/placement new construction to avoid calling the default constructor, which might not be available for the ValueType. Interference with the actual parallelism should be minimal.
Thanks Johannes. Indeed, I was still a little nervous about those bits of losertree.h. If you can, please make sure that (even if it costs a bit of performance): 1- After memory allocation, the first time each objects is constructed, via placement new, not assigned. 2- All the following times are assigned, not constructed. 3- All the object are destructed, once, by calling explicitly the destructor, in the destructor of _LoserTree* class, before deallocating the memory via operator delete. Careful not trying to destruct objects never constructed in the first place. In general, I recommend preparing a tiny snippet code exercising the parallel code at issue and running it under valgrind (doing that yesterday for the elements of a vector<string>, for sort, stable_sort, etc, turned out to be rather "interesting" ;) Please, keep me up to date, we want to resolve this for 4.6.1, I think.
Author: paolo Date: Tue May 3 14:20:45 2011 New Revision: 173309 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173309 Log: 2011-05-03 Paolo Carlini <paolo.carlini@oracle.com> PR libstdc++/48750 * include/parallel/multiway_merge.h: Run _ValueType destructors. * include/parallel/multiway_mergesort.h: Likewise. * include/parallel/quicksort.h: Likewise. * include/parallel/random_shuffle.h: Likewise. * include/parallel/partial_sum.h: Likewise. * include/parallel/losertree.h: Run destructors; minor tweaks. * include/parallel/par_loop.h: Run destructors, fix memory allocations and deallocations. * testsuite/26_numerics/accumulate/48750.cc: New. * testsuite/ext/profile/mutex_extensions_neg.cc: Do not run in parallel-mode to avoid spurious multiple errors. Added: trunk/libstdc++-v3/testsuite/26_numerics/accumulate/48750.cc Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/parallel/losertree.h trunk/libstdc++-v3/include/parallel/multiway_merge.h trunk/libstdc++-v3/include/parallel/multiway_mergesort.h trunk/libstdc++-v3/include/parallel/par_loop.h trunk/libstdc++-v3/include/parallel/partial_sum.h trunk/libstdc++-v3/include/parallel/quicksort.h trunk/libstdc++-v3/include/parallel/random_shuffle.h trunk/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc
Cheers!
Author: paolo Date: Tue May 3 17:54:35 2011 New Revision: 173335 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173335 Log: 2011-05-03 Paolo Carlini <paolo.carlini@oracle.com> PR libstdc++/48750 * include/parallel/multiway_merge.h: Run _ValueType destructors. * include/parallel/multiway_mergesort.h: Likewise. * include/parallel/quicksort.h: Likewise. * include/parallel/random_shuffle.h: Likewise. * include/parallel/partial_sum.h: Likewise. * include/parallel/losertree.h: Run destructors; minor tweaks. * include/parallel/par_loop.h: Run destructors, fix memory allocations and deallocations. * testsuite/26_numerics/accumulate/48750.cc: New. * testsuite/ext/profile/mutex_extensions_neg.cc: Do not run in parallel-mode to avoid spurious multiple errors. Added: branches/gcc-4_6-branch/libstdc++-v3/testsuite/26_numerics/accumulate/48750.cc Modified: branches/gcc-4_6-branch/libstdc++-v3/ChangeLog branches/gcc-4_6-branch/libstdc++-v3/include/parallel/losertree.h branches/gcc-4_6-branch/libstdc++-v3/include/parallel/multiway_merge.h branches/gcc-4_6-branch/libstdc++-v3/include/parallel/multiway_mergesort.h branches/gcc-4_6-branch/libstdc++-v3/include/parallel/par_loop.h branches/gcc-4_6-branch/libstdc++-v3/include/parallel/partial_sum.h branches/gcc-4_6-branch/libstdc++-v3/include/parallel/quicksort.h branches/gcc-4_6-branch/libstdc++-v3/include/parallel/random_shuffle.h branches/gcc-4_6-branch/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc
Fixed for 4.6.1.