User account creation filtered due to spam.
The following (reduced) code works if using only one core but segfault randomly when using more core. The more core your have, the more chance you have to get a segfault. Sometime it also fail with a double free error. I compiled it using g++-6 -std=c++11 -fcilkplus -g -Wall -lcilkrts red2.cpp -o red2 #include <vector> #include <cilk/cilk.h> void walk(std::vector<int> v, unsigned size) { if (v.size() < size) for (int i=0; i<8; i++) { std::vector<int> vnew(v); vnew.push_back(i); cilk_spawn walk(vnew, size); } } int main(int argc, char **argv) { std::vector<int> v{}; walk(v, 5); } FWIW: $ g++-6 -v Using built-in specs. COLLECT_GCC=g++-6 COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/6/lto-wrapper Target: x86_64-suse-linux Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada,go --enable-offload-targets=hsa --enable-checking=release --with-gxx-include-dir=/usr/include/c++/6 --enable-ssp --disable-libssp --disable-libvtv --disable-libcc1 --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --with-default-libstdcxx-abi=gcc4-compatible --enable-version-specific-runtime-libs --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-6 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux Thread model: posix gcc version 6.3.1 20170202 [gcc-6-branch revision 245119] (SUSE Linux)
It's clear to me that Cilk+ doesn't handle finalization (ctor of vnew) properly. Maybe cilk_spawn is documented (esp. if done recursively) as not blocking for all children to return. Thus insert a blocking primitive?
Thanks for your quick answer ! I'm not sure to understand what you mean by "Thus insert a blocking primitive?" Do you suggest to add a cilk_sync somewhere ?
Don't know Cilk+ but yes, try for (int i=0; i<8; i++) { std::vector<int> vnew(v); vnew.push_back(i); cilk_spawn walk(vnew, size); cilk_sync; } with appropriate syntax. We expand the spawn like try { <<cleanup_point <<< Unknown tree: expr_stmt std::vector<int>::push_back (&vnew, (const value_type &) &i) >>>>>; <<cleanup_point <<< Unknown tree: expr_stmt _Cilk_spawn walk (&TARGET_EXPR <D.15394, <<< Unknown tree: aggr_init_expr 5 __comp_ctor D.15394 (struct vector *) <<< Unknown tree: void_cst >>> (const struct vector &) &vnew >>>>, size) >>>>>; } finally { std::vector<int>::~vector (&vnew); } which means the spawned copies use a shared vnew and a single destructor is run "afterwards". If there's no synchronization then all threads will call the destructor I guess. Not sure if cilk_sync will properly "merge" execution back to one thread or if there's another primitive that does that. Cilk+ and C++ is going to be "interesting" with these kind of issues.
It might be that GCC is in error and that the ctor/dtor need to be arranged to run per thread. Who knows... You might want to check a competing Cilk+ implementation for this (I think the Intel compiler has Cilk+ support).
(In reply to Richard Biener from comment #3) > Don't know Cilk+ but yes, try > > for (int i=0; i<8; i++) { > std::vector<int> vnew(v); > vnew.push_back(i); > cilk_spawn walk(vnew, size); > cilk_sync; > } > > with appropriate syntax. Ok ! Putting the cilk_sync here works but completely destroy the purpose of using Cilk (no parallelism). I had the feeling that passing vnew as a value to the function makes a copy so there is no possibility for the caller to destroy vnew before it gets copied by the function call. However using manual allocation seems to fix the problem #include <vector> #include <cilk/cilk.h> void walk(std::vector<int> *v, unsigned size) { if (v->size() < size) for (int i=0; i<8; i++) { auto *vnew = new std::vector<int>(*v); vnew->push_back(i); cilk_spawn walk(vnew, size); } cilk_sync; delete v; } int main(int argc, char **argv) { auto *v = new std::vector<int>{}; walk(v, 5); }
Also note it is likely Cilk+ will be deprecated in gcc-7
(In reply to Jeffrey A. Law from comment #6) > Also note it is likely Cilk+ will be deprecated in gcc-7 That's a huge pity from my point of view. For recursive exploration like the code I put here, I don't know any equivalent in both easyness of writing the code and performance of the result. However, I can't probably imagine the difficulty of maintaining such a code.
It's unfortunate and directly related to the maintenance effort involved. The deprecation plan for Cilk+ would have code in gcc-7, but the code would be removed prior to the gcc-8 release. Unless someone wanted to step up and maintain the code -- so far efforts to find new maintainers for Cilk+ haven't produced any results.
*** Bug 80045 has been marked as a duplicate of this bug. ***
(In reply to Richard Biener from comment #3) > which means the spawned copies use a shared vnew and a single destructor > is run "afterwards". If there's no synchronization then all threads will > call the destructor I guess. Not sure if cilk_sync will properly > "merge" execution back to one thread or if there's another primitive that > does that. I don't think so. I peeked into the assembly code generated. A temp copy of vnew is generated for "cilk_spawn walk(...)" just like there is a simple walk(...) call. However the temp is soon destroyed after cilk_spawn returned. And at the point the child is still running and reading that temp copy. It's easy to find out using -fsanitize=thread. > Cilk+ and C++ is going to be "interesting" with these kind of issues. It's clear in cilkplus spec <https://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm> section 3.6 bullet [4]: > Any unnamed temporary variables created prior to the spawn point are > not destroyed until after the spawn point (i.e., their destructors are > invoked in the child). But GCC is invoking destructors in the *parent*. That caused this issue. In fact Florent's workaround in comment 5 is exactly doing what the spec said. We may let GCC generate code like this. But then the temp would be in the heap instead of stack. To keep the temp living and the parent running, if we want to use stack, we have to alloc this temp on the stack of the *child*. Currently I have no idea how to do this.
It is also interesting to note that GCC 5.x is not impacted. Even if the given example code will not compile on GCC 5 (ICE in lower_stmt, at gimple-low.c:397 for the line "cilk_spawn walk(vnew, size);"), we discovered the bug in this project: https://github.com/hivert/NumericMonoid/tree/master/src/Cilk%2B%2B (source) https://github.com/hivert/NumericMonoid/issues/2 (results, unfortunately in French) Which does compile under GCC 5. The quick tests done on my machine show that something between GCC 5.4.0 and GCC 6.3.1 introduced the destructor bug.
(In reply to Edgar Fournival from comment #11) > It is also interesting to note that GCC 5.x is not impacted. > > Even if the given example code will not compile on GCC 5 (ICE in lower_stmt, > at gimple-low.c:397 for the line "cilk_spawn walk(vnew, size);"), we > discovered the bug in this project: > https://github.com/hivert/NumericMonoid/tree/master/src/Cilk%2B%2B (source) > https://github.com/hivert/NumericMonoid/issues/2 (results, unfortunately in > French) > Which does compile under GCC 5. Florent please add [Regression 6/7] in the title. > The quick tests done on my machine show that something between GCC 5.4.0 and > GCC 6.3.1 introduced the destructor bug. My first guess is the fix to the ICE introduced this bug. Testing...
I've not yet been able to find a small example that shows the regression due to GCC5 ICE. In particular, I'm not sure the problem mentioned by Fournival is not due to a bug in my code (See my question of http://stackoverflow.com/questions/42816920/data-race-in-cilk-home-made-reducer)
(In reply to Florent Hivert from comment #13) > I've not yet been able to find a small example that shows the regression due > to GCC5 ICE. In particular, I'm not sure the problem mentioned by Fournival > is not due to a bug in my code (See my question of > http://stackoverflow.com/questions/42816920/data-race-in-cilk-home-made- > reducer) Although I don't understand all the code in the Github repo, but GCC 6 indeed moved the temporary construction (initialization) and destruction code to the parent, and violated cilkplus spec. So it's very sure this is a regression of GCC.
(In reply to Xi Ruoyao from comment #14) > Although I don't understand all the code in the Github repo, but GCC 6 indeed > moved the temporary construction (initialization) and destruction code to the > parent, and violated cilkplus spec. So it's very sure this is a regression > of GCC. Sorry for this, I should have been more specific. The code we have that uses Cilk is located in treewalk.cpp:49: https://github.com/hivert/NumericMonoid/blob/master/src/Cilk%2B%2B/treewalk.cpp#L49 We fixed it with this: https://github.com/hivert/NumericMonoid/pull/3/commits/bf8898dce0276b8568e867b6ff255a1c9a0fa434 Unfortunately, this is the only thing we have that demonstrates the bug *and* compiles under GCC 5. A shorter and simpler example would be really helpful but I have no clue how to workaround the ICE. Xi Ruoyao, do you know which commit would be involved? If not, what are the typical process used to find out the modification which introduced the bug? Bisection?
On 2017-03-16 10:16 +0000, contact@edgar-fournival.fr wrote: > Bisection? I am bisecting now.
A simple test case for bisecting: extern "C" { int x(); int y(int); int z(); } int z() { int ret = _Cilk_spawn y(x()); return ret; } Compile with '-fcilkplus -S'. According to cilkplus spec, x() should be evaluated in the child (for the asm produced by GCC, in _cilk_spn_0). If you find "call x" in the asm of "z" instead of "_cilk_spn_0", that version of GCC is buggy.
> Compile with '-fcilkplus -S'. According to cilkplus spec, x() should be > evaluated in the child (for the asm produced by GCC, in _cilk_spn_0). > If you find "call x" in the asm of "z" instead of "_cilk_spn_0", that version > of GCC is buggy. Note that some version of GCC put "call _cilk_enter_frame_1" in "z", before calling "x". They are good.
Thanks for your explanations. I will investigate on my side and check the generated assembly of our project. Please keep us posted with the results of your bisection.
On 2017-03-16 12:31 +0000, contact@edgar-fournival.fr wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80038 > > --- Comment #19 from Edgar Fournival <contact@edgar-fournival.fr> --- > Thanks for your explanations. I will investigate on my side and check the > generated assembly of our project. Please keep us posted with the results of > your bisection. Seems my explanation is still buggy. We should say: There must be no frame change (cilk_..._enter_frame) between "call x" and "call y". It's more reliable to compile and run Florent's original testcase.
Start from r227423. Fix spawned function with lambda function Make sure that the spawned function's arguments will not be pushed into lambda function. gcc/c-family/ 2015-09-02 Balaji V. Iyer <balaji.v.iyer@intel.com> PR middle-end/60586 * c-common.h (cilk_gimplify_call_params_in_spawned_fn): New prototype. * c-gimplify.c (c_gimplify_expr): Added a call to the function cilk_gimplify_call_params_in_spawned_fn. * cilk.c (cilk_gimplify_call_params_in_spawned_fn): New function. (gimplify_cilk_spawn): Removed EXPR_STMT and CLEANUP_POINT_EXPR unwrapping. gcc/cp/ 2015-09-02 Balaji V. Iyer <balaji.v.iyer@intel.com> PR middle-end/60586 * cp-gimplify.c (cilk_cp_gimplify_call_params_in_spawned_fn): New function. (cp_gimplify_expr): Added a call to the function cilk_cp_gimplify_call_params_in_spawned_fn. gcc/testsuite/ 2015-09-02 Balaji V. Iyer <balaji.v.iyer@intel.com> PR middle-end/60586 * c-c++-common/cilk-plus/CK/pr60586.c: New file. * g++.dg/cilk-plus/CK/pr60586.cc: Likewise. PR60586 makes GCC ICE on Florent's testcase.
PR 60586 is a false PR to an expected behaviour... Unfortunately nobody noticed that and someone "fixed" it. But without r227423 GCC just ICE on the testcase.
Sorry. I was too tired so I misunderstood r227423. Now I am not sure whether it introduced the bug. I'll test it with NumericMonoid code.
The cilkplus spec described an "unsymmetrical" construct/destruct of temp variables. It seems difficult to implement in GCC. I'm trying to find a solution.
I confirmed that r227423 broke NumericMonoid.
Reverting r227423 makes both Florent's and NumericMonoid code work. But it reintroduced pr60586. Trying to solve it.
Created attachment 41027 [details] patch proposal Now I've prepared a patch. It reverted r227423 (but preserved its testcase) and used a more conforming way to resolve pr60586. It seems this patch fixed Florent's testcase, and NumericMonoid code as well. I'll run a complete dejaGNU regression test. If it's OK I'll submit the patch.
Created attachment 41028 [details] testcase
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01563.html Patch proposal
Given that Cilk is likely to be deprecated, there's no way this should be a P2 issue.
Author: law Date: Mon May 1 22:26:02 2017 New Revision: 247446 URL: https://gcc.gnu.org/viewcvs?rev=247446&root=gcc&view=rev Log: 2017-05-01 Xi Ruoyao <ryxi@stu.xidian.edu.cn> PR c++/80038 * cilk_common.c (expand_builtin_cilk_detach): Move pedigree operations here. * gimplify.c (gimplify_cilk_detach): New function. (gimplify_call_expr, gimplify_modify_expr): Call it as needed. * tree-core.h: Document EXPR_CILK_SPAWN. * tree.h (EXPR_CILK_SPAWN): Define. PR c++/80038 * c-common.h (cilk_gimplify_call_params_in_spawned_fn): Remove prototype. (cilk_install_body_pedigree_operations): Likewise. * cilk.c (cilk_set_spawn_marker): Mark functions that should be detatched. (cilk_gimplify_call_params_in_spawned_fn): Remove. (cilk_install_body_pedigree_operations): Likewise. (gimplify_cilk_spawn): Add EXPR_STMT and CLEANUP_POINT_EXPR unwrapping. PR c++/80038 * c-gimplify.c (c_gimplify_expr): Remove calls to cilk_gimplifY_call_params_in_spawned_fn. PR c++/80038 * cp-cilkplus.c (cilk_install_body_with_frame_cleanup): Don't add pedigree operation and detach call here. * cp-gimplify.c (cp_gimplify_expr): Remove the calls to cilk_cp_gimplify_call_params_in_spawned_fn. (cilk_cp_gimplify_call_params_in_spawned_fn): Remove function. * semantics.c (simplify_aggr_init_expr): Copy EXPR_CILK_SPAWN. PR c++/80038 * lto-lang.c (lto_init): Set in_lto_p earlier. PR c++/80038 * g++.dg/cilk-plus/CK/pr80038.cc: New test. Added: trunk/gcc/testsuite/g++.dg/cilk-plus/CK/pr80038.cc Modified: trunk/gcc/ChangeLog trunk/gcc/c-family/ChangeLog trunk/gcc/c-family/c-common.h trunk/gcc/c-family/c-gimplify.c trunk/gcc/c-family/cilk.c trunk/gcc/c/ChangeLog trunk/gcc/c/c-typeck.c trunk/gcc/cilk-common.c trunk/gcc/cp/ChangeLog trunk/gcc/cp/cp-cilkplus.c trunk/gcc/cp/cp-gimplify.c trunk/gcc/cp/semantics.c trunk/gcc/gimplify.c trunk/gcc/lto/ChangeLog trunk/gcc/lto/lto-lang.c trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-core.h trunk/gcc/tree.h
GCC 6.4 is being released, adjusting target milestone.