|Summary:||SPEC CPU2017 621.wrf_s failure with 40+ openmp threads|
|Product:||gcc||Reporter:||Jim Wilson <wilson>|
|Component:||libfortran||Assignee:||Not yet assigned to anyone <unassigned>|
|Severity:||normal||CC:||jvdelisle2, pthaugen, tkoenig, townsend|
|Build:||Known to work:|
|Known to fail:||Last reconfirmed:||2017-06-24 00:00:00|
|Bug Depends on:|
|Attachments:||patch to make wrf_s work|
Description Jim Wilson 2017-06-24 03:01:45 UTC
Comment 1 Jim Wilson 2017-06-24 03:07:55 UTC
Created attachment 41623 [details] patch to make wrf_s work
Comment 2 Thomas Koenig 2017-06-24 08:27:02 UTC
I was sort of waiting for the new SPEC suite to expose bugs :-) The patch looks obvious enough, even in the absence of a test case. Could you run a regression test and then submit the patch to both the gcc-patches and fortran mailing lists?
Comment 3 Dominique d'Humieres 2017-06-24 08:48:05 UTC
Could not the + __gthread_mutex_lock (&unit_lock); before the IFs be moved inside the IF blocks, making the second + __gthread_mutex_unlock (&unit_lock); not necessary?
Comment 4 Jim Wilson 2017-06-24 13:54:19 UTC
Suppose we move the locking inside the if statement. Suppose newunit_tos is 1. Two threads hit the statement if (newunit_tos) at the same time, and both enter the if block. We then hit the lock. The first thread gets the lock, grabs newunit_stack, sets newunit_tos to 0, and then unlocks. The second thread waits for the lock, grabs newunit_stack which is an invalid struct of all zeros, sets newunit_tos to -1, and then unlocks. The next thread then gets newunit_stack[-1] which is an out-of-bounds array access. There might be other ways to write this, but I'm convinced that newunit_tos does have to be checked after we grab the lock.
Comment 5 Jerry DeLisle 2017-06-24 17:58:14 UTC
(In reply to Thomas Koenig from comment #2) > I was sort of waiting for the new SPEC suite to expose bugs :-) > > The patch looks obvious enough, even in the absence > of a test case. > > Could you run a regression test and then submit the patch > to both the gcc-patches and fortran mailing lists? I concur. Thanks Jim, for the report too.
Comment 6 Jim Wilson 2017-06-24 20:16:46 UTC
I should point out that there are currently four places that use newunit_stack/newunit_tos without locking. Two of these places required locking code to get wrf working. The other two are init_units and close_units. These are called from a constructor and a destructor. Presumably these can only be called once, and hence don't require locking code. Though curiously, close_units has some locking code anyways. I'm not sure why. I saw no failures here, and hence didn't bother to add any locking.
Comment 7 Jim Wilson 2017-06-25 01:24:00 UTC
I have reproduced the same problem from the gcc-7 branch.
Comment 8 Thomas Koenig 2017-06-25 11:02:38 UTC
(In reply to Jim Wilson from comment #7) > I have reproduced the same problem from the gcc-7 branch. I fully expect this bug to be in all active branches of gfortran. We should commit a fix to all of them.
Comment 9 Jim Wilson 2017-06-26 21:41:19 UTC
Author: wilson Date: Mon Jun 26 21:40:47 2017 New Revision: 249667 URL: https://gcc.gnu.org/viewcvs?rev=249667&root=gcc&view=rev Log: Fix for SPEC CPU2017 621.wrf_s failure, add missing locking code. libgfortran/ PR libfortran/81195 * io/unit.c (get_unit): Call __gthread_mutex_lock before newunit_stack and newunit_tos references. Call __gthread_mutex_unlock afterward. Modified: trunk/libgfortran/ChangeLog trunk/libgfortran/io/unit.c
Comment 10 Jim Wilson 2017-06-26 21:45:22 UTC
Author: wilson Date: Mon Jun 26 21:44:50 2017 New Revision: 249668 URL: https://gcc.gnu.org/viewcvs?rev=249668&root=gcc&view=rev Log: Fix for SPEC CPu2017 621.wrf_s failure, add missing locking code. libgfortran/ Backport from trunk PR libfortran/81195 * io/unit.c (get_unit): Call __gthread_mutex_lock before newunit_stack and newunit_tos references. Call __gthread_mutex_unlock afterward. Modified: branches/gcc-7-branch/libgfortran/ChangeLog branches/gcc-7-branch/libgfortran/io/unit.c
Comment 11 Jim Wilson 2017-06-26 21:46:35 UTC
Fixed on trunk and gcc 7 branch.
Comment 12 Dominique d'Humieres 2017-06-29 15:09:53 UTC
*** Bug 81241 has been marked as a duplicate of this bug. ***