Bug 78387 - OpenMP segfault/stack size exceeded writing to internal file
Summary: OpenMP segfault/stack size exceeded writing to internal file
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 7.0
: P3 enhancement
Target Milestone: ---
Assignee: Jerry DeLisle
URL:
Keywords:
: 82555 (view as bug list)
Depends on:
Blocks: 82555
  Show dependency treegraph
 
Reported: 2016-11-16 20:26 UTC by Andrew Benson
Modified: 2022-01-12 01:06 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-11-17 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Benson 2016-11-16 20:26:02 UTC
The following code when run using OpenMP crashes with a "stack size exceeded" error (and, sometimes, a segfault) under gfortran 7.0.0 (r242500). This happens most times I run the code, but not every time. When compiled with gfortran 6.0.0 it runs successfully every time.

program p
  double precision :: b
  b=0.1d0
  !$omp parallel
  call s(b)
  !$omp end parallel
contains
  subroutine s(a)
    implicit none
    double precision, intent(in) :: a
    character(len=10) :: f,c
    write (f,'(a5,i1,a1)') "(e10.",2,")"
    write (c,f) a
    write (0,*) trim(c)
    return
  end subroutine s
end program p

$ gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/home/abenson/Galacticus/Tools/libexec/gcc/x86_64-pc-linux-gnu/7.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --prefix=/home/abenson/Galacticus/Tools --enable-languages=c,c++,fortran --disable-multilib
Thread model: posix
gcc version 7.0.0 20161116 (experimental) (GCC) 

$ gfortran t.F90 -g -fopenmp
[abenson@mies v0.9.4]$ a.out 
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
At line 12 of file t.F90
Internal Error: stash_internal_unit(): Stack Size Exceeded

Error termination. Backtrace:
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
#0  0x400a7b in s
        at /home/abenson/Galacticus/v0.9.4/t.F90:12
#1  0x400c53 in MAIN__._omp_fn.0
        at /home/abenson/Galacticus/v0.9.4/t.F90:5
#2  0x7f786cbc955d in gomp_thread_start
        at ../../../gcc-trunk/libgomp/team.c:119
#3  0x7f786bdbdc39 in start_thread
        at /data001/abenson/Galacticus/Tools/glibc-2.12.1/nptl/pthread_create.c:301
#4  0x7f786c4c361c in ???
        at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
#5  0xffffffffffffffff in ???

The frequency with which the code crashes increases as the number of OpenMP threads is increased (running with 2 threads it almost always runs successfully, with 4 threads it crashes on 1 run in 10-20, with 16 threads it crashes almost every time).

The error is sometimes reported at line 12, sometimes at line 13 - but always the same "stack size exceeded" error.
Comment 1 Thomas Koenig 2016-11-16 20:42:04 UTC
Hi Andrew,

I/O operations are not thread safe.  If you enclose them
in !$omp critical, you will get the expected result:

ig25@linux-fd1f:/tmp> cat omp.f90

program p
  double precision :: b
  b=0.1d0
  !$omp parallel
  call s(b)
  !$omp end parallel
contains
  subroutine s(a)
    implicit none
    double precision, intent(in) :: a
    character(len=10) :: f,c
    !$omp critical
    write (f,'(a5,i1,a1)') "(e10.",2,")"
    write (c,f) a
    write (0,*) trim(c)
    !$omp end critical
    return
  end subroutine s
end program p
ig25@linux-fd1f:/tmp> gfortran -fopenmp omp.f90
ig25@linux-fd1f:/tmp> ./a.out
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
   0.10E+00
Comment 2 Andrew Benson 2016-11-16 20:44:50 UTC
OK - thanks. I hadn't realized that the internal I/O operations weren't threadsafe - I guess I've just been fortunate to avoid this with previous versions of gfortran. I'll update my code to use critical sections as necessary.

Thanks for the fast response!
Comment 3 Jakub Jelinek 2016-11-16 20:47:28 UTC
I thought they are supposed to be, at least that is what we had unit_lock, u->lock etc. for.  So has something in libgfortran changed so that it doesn't properly lock the units anymore?
Comment 4 Jerry DeLisle 2016-11-17 00:03:28 UTC
(In reply to Jakub Jelinek from comment #3)
> I thought they are supposed to be, at least that is what we had unit_lock,
> u->lock etc. for.  So has something in libgfortran changed so that it
> doesn't properly lock the units anymore?

Our intent was to maintain the correct locking. Recent changes for the DTIO patch may have broken something. We did touch a little there and thought we had it correct.
Comment 5 Andrew Benson 2016-11-17 01:21:01 UTC
I couldn't find anything in the OpenMP specifications which addresses this issue - so presumably it's undefined behavior as far as OpenMP is concerned. But it seems that internal file writes were intended to be thread safe in gfortran - am I correct in understanding that? From what I can tell Intel fortran also makes internal file writes thread safe, for what it's worth.
Comment 6 Thomas Koenig 2016-11-17 07:13:56 UTC
OK, in that case, maybe I was hasty :-)

The original test case has an external write, but the error does
not depend on this.

So, confirming.
Comment 7 Jim Wilson 2017-06-28 18:54:44 UTC
The testcase can trigger the bug fixed in
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81195
if the number of threads is high enough.  It also triggers an unrelated problem which gives the stack size exceeded error.

Looking at this issue, I see that in libgfortran/io/unit.c, the newunit_alloc function starts with
      newunits = xcalloc (16, 1);
      newunit_size = 16;
and then if we need more of them, it does
  newunit_size *= 2;
  newunits = xrealloc (newunits, newunit_size);

However, for stash_internal_unit, we have as global definitions
  #define NEWUNIT_STACK_SIZE 16
  static gfc_saved_unit newunit_stack[NEWUNIT_STACK_SIZE];
and inside the function we have
  if (newunit_tos >= NEWUNIT_STACK_SIZE)

When the testcase fails, newunit_size is 32.  So we have allocated more units than we can stash.

It looks like the fix is to change stash_internal_unit to dynamically resize newunit_stack, same as the newunit_alloc function already does for newunits.  This appears related to bug 48587, which added the code to increase the size of newunits array.
Comment 8 Jerry DeLisle 2017-06-30 22:37:56 UTC
(In reply to Jim Wilson from comment #7)
--- snip ---
> 
> However, for stash_internal_unit, we have as global definitions
>   #define NEWUNIT_STACK_SIZE 16
>   static gfc_saved_unit newunit_stack[NEWUNIT_STACK_SIZE];
> and inside the function we have
>   if (newunit_tos >= NEWUNIT_STACK_SIZE)

That size is fairly arbitrary. Short term we can bump this limit up to some higher value. Longer term, I wonder if this needs to be more dynamic.  I will get on it.
Comment 9 Dominique d'Humieres 2017-06-30 23:19:54 UTC
Related to pr81195 and pr81241? Works on darwin with 8 threads.
Comment 10 Jim Wilson 2017-07-01 04:58:52 UTC
newunit_stack needs to be allocated dynamically, just like the newunits array.

I modified libgfortran to compute the highest value of newunit_tos, and print it when the program exits.  I set NEWUNIT_STACK_SIZE to 128 so I could go up to 128 threads.

weathertop:2188$ OMP_NUM_THREADS=8 ./a.out 2> /dev/null
newunit_tos_max = 8
weathertop:2188$ OMP_NUM_THREADS=16 ./a.out 2> /dev/null
newunit_tos_max = 16
weathertop:2189$ OMP_NUM_THREADS=32 ./a.out 2> /dev/null
newunit_tos_max = 32
weathertop:2190$ OMP_NUM_THREADS=64 ./a.out 2> /dev/null
newunit_tos_max = 64
weathertop:2191$ OMP_NUM_THREADS=128 ./a.out 2> /dev/null
newunit_tos_max = 128
Segmentation fault (core dumped)
weathertop:2192$

So the size of newunit stack depends on the number of threads, and can be any number.  If you don't want to put a hard limit of the number of threads that gfortran can support, then you need to dynamically allocate it.

There is some variation from one run to the next, so using 16 threads doesn't guarantee failure, and using 8 threads doesn't guarantee success.

weathertop:2192$ OMP_NUM_THREADS=8 ./a.out 2> /dev/null
newunit_tos_max = 8
weathertop:2193$ OMP_NUM_THREADS=8 ./a.out 2> /dev/null
newunit_tos_max = 8
weathertop:2193$ OMP_NUM_THREADS=8 ./a.out 2> /dev/null
newunit_tos_max = 8
weathertop:2193$ OMP_NUM_THREADS=8 ./a.out 2> /dev/null
newunit_tos_max = 8
weathertop:2193$ OMP_NUM_THREADS=8 ./a.out 2> /dev/null
newunit_tos_max = 3
weathertop:2193$ OMP_NUM_THREADS=8 ./a.out 2> /dev/null
newunit_tos_max = 8
weathertop:2193$ OMP_NUM_THREADS=8 ./a.out 2> /dev/null
newunit_tos_max = 5
weathertop:2193$ OMP_NUM_THREADS=8 ./a.out 2> /dev/null
newunit_tos_max = 8
weathertop:2193$ OMP_NUM_THREADS=8 ./a.out 2> /dev/null
newunit_tos_max = 8
weathertop:2193$
Comment 11 Jerry DeLisle 2017-08-19 21:05:59 UTC
After thinking about this, I think I have a better idea which gets rid of the stash completely and will avoid allocating a unit structure for internal units. To accomplish this, I intend to pass the string, string length and string array descriptor directly to the dtio child procedures vis hidden arguments. Just want to capture the idea here for others heads up.
Comment 12 Jerry DeLisle 2017-08-20 18:55:18 UTC
Patch submitted here:

https://gcc.gnu.org/ml/fortran/2017-08/msg00045.html
Comment 13 Jerry DeLisle 2017-08-28 03:43:19 UTC
Author: jvdelisle
Date: Mon Aug 28 03:42:47 2017
New Revision: 251374

URL: https://gcc.gnu.org/viewcvs?rev=251374&root=gcc&view=rev
Log:
2017-08-27  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

	PR libgfortran/78387
	* io/list_read.c (nml_read_obj): Remove use of stash.
	* io/transfer.c (st_read_done, st_write_done): Likewise.
	* io/unit.c (stash_internal_unit): Delete function.
	(get_unit): Remove use of stash.
	(init_units): Likewise.
	(close_units): Likewise.
	* io/write.c (nml_write_obj): Likewise:

Modified:
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/io/list_read.c
    trunk/libgfortran/io/transfer.c
    trunk/libgfortran/io/unit.c
    trunk/libgfortran/io/write.c
Comment 14 Aldy Hernandez 2017-09-13 17:32:24 UTC
Author: aldyh
Date: Wed Sep 13 17:31:53 2017
New Revision: 252585

URL: https://gcc.gnu.org/viewcvs?rev=252585&root=gcc&view=rev
Log:
2017-08-27  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

	PR libgfortran/78387
	* io/list_read.c (nml_read_obj): Remove use of stash.
	* io/transfer.c (st_read_done, st_write_done): Likewise.
	* io/unit.c (stash_internal_unit): Delete function.
	(get_unit): Remove use of stash.
	(init_units): Likewise.
	(close_units): Likewise.
	* io/write.c (nml_write_obj): Likewise:

Modified:
    branches/range-gen2/libgfortran/ChangeLog
    branches/range-gen2/libgfortran/io/list_read.c
    branches/range-gen2/libgfortran/io/transfer.c
    branches/range-gen2/libgfortran/io/unit.c
    branches/range-gen2/libgfortran/io/write.c
Comment 15 Jerry DeLisle 2017-09-20 01:33:32 UTC
Author: jvdelisle
Date: Wed Sep 20 01:32:59 2017
New Revision: 252992

URL: https://gcc.gnu.org/viewcvs?rev=252992&root=gcc&view=rev
Log:
2017-09-19  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

	Backport from trunk
	PR libgfortran/78387
	* io/list_read.c (nml_read_obj): Remove use of stash.
	* io/transfer.c (st_read_done, st_write_done): Likewise.
	* io/unit.c (stash_internal_unit): Delete function.
	(get_unit): Remove use of stash.
	(init_units): Likewise.
	(close_units): Likewise.
	* io/write.c (nml_write_obj): Likewise:

Modified:
    branches/gcc-7-branch/libgfortran/ChangeLog
    branches/gcc-7-branch/libgfortran/io/list_read.c
    branches/gcc-7-branch/libgfortran/io/transfer.c
    branches/gcc-7-branch/libgfortran/io/unit.c
    branches/gcc-7-branch/libgfortran/io/write.c
Comment 16 Jerry DeLisle 2017-09-20 01:37:48 UTC
Fixed on trunk and 7. Closing
Comment 17 kugan 2017-10-15 10:16:56 UTC
*** Bug 82555 has been marked as a duplicate of this bug. ***