User account creation filtered due to spam.

Bug 80038 - [6/7 Regression] Random segfault using local vectors in Cilk function
Summary: [6/7 Regression] Random segfault using local vectors in Cilk function
Status: WAITING
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 6.3.1
: P4 normal
Target Milestone: 6.5
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
: 80045 (view as bug list)
Depends on:
Blocks: 69582
  Show dependency treegraph
 
Reported: 2017-03-14 09:36 UTC by Florent Hivert
Modified: 2017-07-04 08:47 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-03-14 00:00:00


Attachments
patch proposal (4.23 KB, patch)
2017-03-23 05:08 UTC, Xi Ruoyao
Details | Diff
testcase (360 bytes, text/x-csrc)
2017-03-23 05:10 UTC, Xi Ruoyao
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Florent Hivert 2017-03-14 09:36:04 UTC
The following (reduced) code works if using only one core but segfault randomly when using more core. The more core your have, the more chance you have to get a segfault. Sometime it also fail with a double free error.

I compiled it using 

g++-6 -std=c++11 -fcilkplus -g -Wall -lcilkrts red2.cpp   -o red2

#include <vector>
#include <cilk/cilk.h>

void walk(std::vector<int> v, unsigned size) {
  if (v.size() < size)
    for (int i=0; i<8; i++) {
      std::vector<int> vnew(v);
      vnew.push_back(i);
      cilk_spawn walk(vnew, size);
    }
}

int main(int argc, char **argv) {
  std::vector<int> v{};
  walk(v, 5);
}

FWIW:

$ g++-6 -v
Using built-in specs.
COLLECT_GCC=g++-6
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/6/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada,go --enable-offload-targets=hsa --enable-checking=release --with-gxx-include-dir=/usr/include/c++/6 --enable-ssp --disable-libssp --disable-libvtv --disable-libcc1 --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --with-default-libstdcxx-abi=gcc4-compatible --enable-version-specific-runtime-libs --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-6 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
gcc version 6.3.1 20170202 [gcc-6-branch revision 245119] (SUSE Linux)
Comment 1 Richard Biener 2017-03-14 09:40:52 UTC
It's clear to me that Cilk+ doesn't handle finalization (ctor of vnew) properly.
Maybe cilk_spawn is documented (esp. if done recursively) as not blocking for
all children to return.  Thus insert a blocking primitive?
Comment 2 Florent Hivert 2017-03-14 09:51:30 UTC
Thanks for your quick answer ! I'm not sure to understand what you mean by "Thus insert a blocking primitive?" Do you suggest to add a cilk_sync somewhere ?
Comment 3 Richard Biener 2017-03-14 10:07:43 UTC
Don't know Cilk+ but yes, try

   for (int i=0; i<8; i++) {
      std::vector<int> vnew(v);
      vnew.push_back(i);
      cilk_spawn walk(vnew, size);
      cilk_sync;
    }

with appropriate syntax.

We expand the spawn like

                      try
                        {
                          <<cleanup_point <<< Unknown tree: expr_stmt
  std::vector<int>::push_back (&vnew, (const value_type &) &i) >>>>>;
                          <<cleanup_point <<< Unknown tree: expr_stmt
  _Cilk_spawn walk (&TARGET_EXPR <D.15394, <<< Unknown tree: aggr_init_expr
  5
  __comp_ctor
  D.15394
  (struct vector *) <<< Unknown tree: void_cst >>>
  (const struct vector &) &vnew >>>>, size) >>>>>;
                        }
                      finally
                        {
                          std::vector<int>::~vector (&vnew);
                        }

which means the spawned copies use a shared vnew and a single destructor
is run "afterwards".  If there's no synchronization then all threads will
call the destructor I guess.  Not sure if cilk_sync will properly
"merge" execution back to one thread or if there's another primitive that
does that.

Cilk+ and C++ is going to be "interesting" with these kind of issues.
Comment 4 Richard Biener 2017-03-14 10:11:36 UTC
It might be that GCC is in error and that the ctor/dtor need to be arranged to run per thread.  Who knows...

You might want to check a competing Cilk+ implementation for this (I think the Intel compiler has Cilk+ support).
Comment 5 Florent Hivert 2017-03-14 11:08:59 UTC
(In reply to Richard Biener from comment #3)
> Don't know Cilk+ but yes, try
> 
>    for (int i=0; i<8; i++) {
>       std::vector<int> vnew(v);
>       vnew.push_back(i);
>       cilk_spawn walk(vnew, size);
>       cilk_sync;
>     }
> 
> with appropriate syntax.

Ok ! Putting the cilk_sync here works but completely destroy the purpose of using Cilk (no parallelism). I had the feeling that passing vnew as a value to the function makes a copy so there is no possibility for the caller to destroy vnew before it gets copied by the function call.  

However using manual allocation seems to fix the problem

#include <vector>
#include <cilk/cilk.h>

void walk(std::vector<int> *v, unsigned size) {
  if (v->size() < size)
    for (int i=0; i<8; i++) {
      auto *vnew = new std::vector<int>(*v);
      vnew->push_back(i);
      cilk_spawn walk(vnew, size);
    }
  cilk_sync;
  delete v;
}

int main(int argc, char **argv) {
  auto *v = new std::vector<int>{};
  walk(v, 5);
}
Comment 6 Jeffrey A. Law 2017-03-14 15:45:03 UTC
Also note it is likely Cilk+ will be deprecated in gcc-7
Comment 7 Florent Hivert 2017-03-14 16:38:31 UTC
(In reply to Jeffrey A. Law from comment #6)
> Also note it is likely Cilk+ will be deprecated in gcc-7

That's a huge pity from my point of view. For recursive exploration like the code I put here, I don't know any equivalent in both easyness of writing the code and performance of the result. However, I can't probably imagine the difficulty of maintaining such a code.
Comment 8 Jeffrey A. Law 2017-03-14 17:23:40 UTC
It's unfortunate and directly  related to the maintenance effort involved.

The deprecation plan for Cilk+ would have code in gcc-7, but the code would be removed prior to the gcc-8 release.  Unless someone wanted to step up and maintain the code -- so far efforts to find new maintainers for Cilk+ haven't produced any results.
Comment 9 Xi Ruoyao 2017-03-15 10:56:53 UTC
*** Bug 80045 has been marked as a duplicate of this bug. ***
Comment 10 Xi Ruoyao 2017-03-15 11:20:54 UTC
(In reply to Richard Biener from comment #3)
> which means the spawned copies use a shared vnew and a single destructor
> is run "afterwards".  If there's no synchronization then all threads will
> call the destructor I guess.  Not sure if cilk_sync will properly
> "merge" execution back to one thread or if there's another primitive that
> does that.

I don't think so.  I peeked into the assembly code generated.  A temp
copy of vnew is generated for "cilk_spawn walk(...)" just like there is
a simple walk(...) call.  However the temp is soon destroyed after
cilk_spawn returned.  And at the point the child is still running and
reading that temp copy.  It's easy to find out using -fsanitize=thread.

> Cilk+ and C++ is going to be "interesting" with these kind of issues.

It's clear in cilkplus spec <https://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm> section 3.6 bullet [4]:

> Any unnamed temporary variables created prior to the spawn point are
> not destroyed until after the spawn point (i.e., their destructors are
> invoked in the child).

But GCC is invoking destructors in the *parent*.  That caused this issue.
In fact Florent's workaround in comment 5 is exactly doing what the spec
said.

We may let GCC generate code like this.  But then the temp would be in the
heap instead of stack.  To keep the temp living and the parent running, if
we want to use stack, we have to alloc this temp on the stack of the *child*.
Currently I have no idea how to do this.
Comment 11 Edgar Fournival 2017-03-16 08:32:40 UTC
It is also interesting to note that GCC 5.x is not impacted.

Even if the given example code will not compile on GCC 5 (ICE in lower_stmt, at gimple-low.c:397 for the line "cilk_spawn walk(vnew, size);"), we discovered the bug in this project:
https://github.com/hivert/NumericMonoid/tree/master/src/Cilk%2B%2B (source)
https://github.com/hivert/NumericMonoid/issues/2 (results, unfortunately in French)
Which does compile under GCC 5.

The quick tests done on my machine show that something between GCC 5.4.0 and GCC 6.3.1 introduced the destructor bug.
Comment 12 Xi Ruoyao 2017-03-16 08:47:08 UTC
(In reply to Edgar Fournival from comment #11)
> It is also interesting to note that GCC 5.x is not impacted.
> 
> Even if the given example code will not compile on GCC 5 (ICE in lower_stmt,
> at gimple-low.c:397 for the line "cilk_spawn walk(vnew, size);"), we
> discovered the bug in this project:
> https://github.com/hivert/NumericMonoid/tree/master/src/Cilk%2B%2B (source)
> https://github.com/hivert/NumericMonoid/issues/2 (results, unfortunately in
> French)
> Which does compile under GCC 5.

Florent please add [Regression 6/7] in the title.

> The quick tests done on my machine show that something between GCC 5.4.0 and
> GCC 6.3.1 introduced the destructor bug.

My first guess is the fix to the ICE introduced this bug.  Testing...
Comment 13 Florent Hivert 2017-03-16 09:35:03 UTC
I've not yet been able to find a small example that shows the regression due to GCC5 ICE. In particular, I'm not sure the problem mentioned by Fournival is not due to a bug in my code (See my question of http://stackoverflow.com/questions/42816920/data-race-in-cilk-home-made-reducer)
Comment 14 Xi Ruoyao 2017-03-16 09:59:30 UTC
(In reply to Florent Hivert from comment #13)
> I've not yet been able to find a small example that shows the regression due
> to GCC5 ICE. In particular, I'm not sure the problem mentioned by Fournival
> is not due to a bug in my code (See my question of
> http://stackoverflow.com/questions/42816920/data-race-in-cilk-home-made-
> reducer)

Although I don't understand all the code in the Github repo, but GCC 6 indeed
moved the temporary construction (initialization) and destruction code to the
parent, and violated cilkplus spec.  So it's very sure this is a regression
of GCC.
Comment 15 Edgar Fournival 2017-03-16 10:16:36 UTC
(In reply to Xi Ruoyao from comment #14)
> Although I don't understand all the code in the Github repo, but GCC 6 indeed
> moved the temporary construction (initialization) and destruction code to the
> parent, and violated cilkplus spec.  So it's very sure this is a regression
> of GCC.

Sorry for this, I should have been more specific. The code we have that uses Cilk is located in treewalk.cpp:49:
https://github.com/hivert/NumericMonoid/blob/master/src/Cilk%2B%2B/treewalk.cpp#L49
We fixed it with this:
https://github.com/hivert/NumericMonoid/pull/3/commits/bf8898dce0276b8568e867b6ff255a1c9a0fa434

Unfortunately, this is the only thing we have that demonstrates the bug *and* compiles under GCC 5. A shorter and simpler example would be really helpful but I have no clue how to workaround the ICE.

Xi Ruoyao, do you know which commit would be involved? If not, what are the typical process used to find out the modification which introduced the bug? Bisection?
Comment 16 Xi Ruoyao 2017-03-16 10:42:31 UTC
On 2017-03-16 10:16 +0000, contact@edgar-fournival.fr wrote:

> Bisection?
I am bisecting now.
Comment 17 Xi Ruoyao 2017-03-16 10:50:38 UTC
A simple test case for bisecting:

extern "C" {

int x();
int y(int);
int z();

}

int z()
{
  int ret = _Cilk_spawn y(x());
  return ret;
}

Compile with '-fcilkplus -S'. According to cilkplus spec, x() should be
evaluated in the child (for the asm produced by GCC, in _cilk_spn_0).
If you find "call x" in the asm of "z" instead of "_cilk_spn_0", that version
of GCC is buggy.
Comment 18 Xi Ruoyao 2017-03-16 12:03:44 UTC
> Compile with '-fcilkplus -S'. According to cilkplus spec, x() should be
> evaluated in the child (for the asm produced by GCC, in _cilk_spn_0).
> If you find "call x" in the asm of "z" instead of "_cilk_spn_0", that version
> of GCC is buggy.

Note that some version of GCC put "call _cilk_enter_frame_1" in "z", before
calling "x".  They are good.
Comment 19 Edgar Fournival 2017-03-16 12:31:29 UTC
Thanks for your explanations. I will investigate on my side and check the generated assembly of our project. Please keep us posted with the results of your bisection.
Comment 20 Xi Ruoyao 2017-03-16 12:56:06 UTC
On 2017-03-16 12:31 +0000, contact@edgar-fournival.fr wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80038
> 
> --- Comment #19 from Edgar Fournival <contact@edgar-fournival.fr> ---
> Thanks for your explanations. I will investigate on my side and check the
> generated assembly of our project. Please keep us posted with the results of
> your bisection.

Seems my explanation is still buggy.  We should say:

  There must be no frame change (cilk_..._enter_frame) between "call x"
  and "call y".

It's more reliable to compile and run Florent's original testcase.
Comment 21 Xi Ruoyao 2017-03-17 08:48:23 UTC
Start from r227423.

Fix spawned function with lambda function

Make sure that the spawned function's arguments will not be pushed
into lambda function.

gcc/c-family/

2015-09-02  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	PR middle-end/60586
	* c-common.h (cilk_gimplify_call_params_in_spawned_fn): New
	prototype.
	* c-gimplify.c (c_gimplify_expr): Added a call to the function
	cilk_gimplify_call_params_in_spawned_fn.
	* cilk.c (cilk_gimplify_call_params_in_spawned_fn): New function.
	(gimplify_cilk_spawn): Removed EXPR_STMT and CLEANUP_POINT_EXPR
	unwrapping.

gcc/cp/

2015-09-02  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	PR middle-end/60586
	* cp-gimplify.c (cilk_cp_gimplify_call_params_in_spawned_fn): New
	function.
	(cp_gimplify_expr): Added a call to the function
	cilk_cp_gimplify_call_params_in_spawned_fn.

gcc/testsuite/

2015-09-02  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	PR middle-end/60586
	* c-c++-common/cilk-plus/CK/pr60586.c: New file.
	* g++.dg/cilk-plus/CK/pr60586.cc: Likewise.

PR60586 makes GCC ICE on Florent's testcase.
Comment 22 Xi Ruoyao 2017-03-17 09:27:52 UTC
PR 60586 is a false PR to an expected behaviour...  Unfortunately nobody
noticed that and someone "fixed" it.

But without r227423 GCC just ICE on the testcase.
Comment 23 Xi Ruoyao 2017-03-17 09:33:51 UTC
Sorry.  I was too tired so I misunderstood r227423.

Now I am not sure whether it introduced the bug.  I'll test it with
NumericMonoid code.
Comment 24 Xi Ruoyao 2017-03-17 10:18:37 UTC
The cilkplus spec described an "unsymmetrical" construct/destruct of temp
variables.  It seems difficult to implement in GCC.  I'm trying to find a solution.
Comment 25 Xi Ruoyao 2017-03-17 10:27:21 UTC
I confirmed that r227423 broke NumericMonoid.
Comment 26 Xi Ruoyao 2017-03-17 22:11:40 UTC
Reverting r227423 makes both Florent's and NumericMonoid code work.  But it
reintroduced pr60586.  Trying to solve it.
Comment 27 Xi Ruoyao 2017-03-23 05:08:53 UTC
Created attachment 41027 [details]
patch proposal

Now I've prepared a patch.  It reverted r227423 (but preserved its testcase)
and used a more conforming way to resolve pr60586.

It seems this patch fixed Florent's testcase, and NumericMonoid code as well.

I'll run a complete dejaGNU regression test.  If it's OK I'll submit the patch.
Comment 28 Xi Ruoyao 2017-03-23 05:10:49 UTC
Created attachment 41028 [details]
testcase
Comment 29 Xi Ruoyao 2017-04-04 11:35:34 UTC
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01563.html

Patch proposal
Comment 30 Jeffrey A. Law 2017-04-12 21:17:56 UTC
Given that Cilk is likely to be deprecated, there's no way this should be a P2 issue.
Comment 31 Jeffrey A. Law 2017-05-01 22:26:34 UTC
Author: law
Date: Mon May  1 22:26:02 2017
New Revision: 247446

URL: https://gcc.gnu.org/viewcvs?rev=247446&root=gcc&view=rev
Log:
2017-05-01  Xi Ruoyao  <ryxi@stu.xidian.edu.cn>

        PR c++/80038
	* cilk_common.c (expand_builtin_cilk_detach): Move pedigree
	operations here.
	* gimplify.c (gimplify_cilk_detach): New function.
	(gimplify_call_expr, gimplify_modify_expr): Call it as needed.
	* tree-core.h: Document EXPR_CILK_SPAWN.
	* tree.h (EXPR_CILK_SPAWN): Define.

        PR c++/80038
	* c-common.h (cilk_gimplify_call_params_in_spawned_fn): Remove
	prototype.
	(cilk_install_body_pedigree_operations): Likewise.
	* cilk.c (cilk_set_spawn_marker): Mark functions that should be
	detatched.
	(cilk_gimplify_call_params_in_spawned_fn): Remove.
	(cilk_install_body_pedigree_operations): Likewise.
	(gimplify_cilk_spawn): Add EXPR_STMT and CLEANUP_POINT_EXPR
	unwrapping.

        PR c++/80038
	* c-gimplify.c (c_gimplify_expr): Remove calls to
	cilk_gimplifY_call_params_in_spawned_fn.

        PR c++/80038
	* cp-cilkplus.c (cilk_install_body_with_frame_cleanup): Don't
	add pedigree operation and detach call here.
	* cp-gimplify.c (cp_gimplify_expr): Remove the calls to
	cilk_cp_gimplify_call_params_in_spawned_fn.
	(cilk_cp_gimplify_call_params_in_spawned_fn): Remove function.
	* semantics.c (simplify_aggr_init_expr): Copy EXPR_CILK_SPAWN.

        PR c++/80038
	* lto-lang.c (lto_init): Set in_lto_p earlier.

	PR c++/80038
	* g++.dg/cilk-plus/CK/pr80038.cc: New test.

Added:
    trunk/gcc/testsuite/g++.dg/cilk-plus/CK/pr80038.cc
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/c-family/ChangeLog
    trunk/gcc/c-family/c-common.h
    trunk/gcc/c-family/c-gimplify.c
    trunk/gcc/c-family/cilk.c
    trunk/gcc/c/ChangeLog
    trunk/gcc/c/c-typeck.c
    trunk/gcc/cilk-common.c
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/cp-cilkplus.c
    trunk/gcc/cp/cp-gimplify.c
    trunk/gcc/cp/semantics.c
    trunk/gcc/gimplify.c
    trunk/gcc/lto/ChangeLog
    trunk/gcc/lto/lto-lang.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-core.h
    trunk/gcc/tree.h
Comment 32 Richard Biener 2017-07-04 08:47:55 UTC
GCC 6.4 is being released, adjusting target milestone.