Summary: | SPEC CPU2017 621.wrf_s failure with 40+ openmp threads | ||
---|---|---|---|
Product: | gcc | Reporter: | Jim Wilson <wilson> |
Component: | libfortran | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | jvdelisle2, pthaugen, tkoenig, townsend |
Priority: | P3 | Keywords: | openmp, wrong-code |
Version: | 7.1.1 | ||
Target Milestone: | --- | ||
See Also: | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103985 | ||
Host: | Target: | ||
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2017-06-24 00:00:00 | |
Bug Depends on: | |||
Bug Blocks: | 82555 | ||
Attachments: | patch to make wrf_s work |
Description
Jim Wilson
2017-06-24 03:01:45 UTC
Created attachment 41623 [details]
patch to make wrf_s work
I was sort of waiting for the new SPEC suite to expose bugs :-) The patch looks obvious enough, even in the absence of a test case. Could you run a regression test and then submit the patch to both the gcc-patches and fortran mailing lists? Could not the + __gthread_mutex_lock (&unit_lock); before the IFs be moved inside the IF blocks, making the second + __gthread_mutex_unlock (&unit_lock); not necessary? Suppose we move the locking inside the if statement. Suppose newunit_tos is 1. Two threads hit the statement if (newunit_tos) at the same time, and both enter the if block. We then hit the lock. The first thread gets the lock, grabs newunit_stack[1], sets newunit_tos to 0, and then unlocks. The second thread waits for the lock, grabs newunit_stack[0] which is an invalid struct of all zeros, sets newunit_tos to -1, and then unlocks. The next thread then gets newunit_stack[-1] which is an out-of-bounds array access. There might be other ways to write this, but I'm convinced that newunit_tos does have to be checked after we grab the lock. (In reply to Thomas Koenig from comment #2) > I was sort of waiting for the new SPEC suite to expose bugs :-) > > The patch looks obvious enough, even in the absence > of a test case. > > Could you run a regression test and then submit the patch > to both the gcc-patches and fortran mailing lists? I concur. Thanks Jim, for the report too. I should point out that there are currently four places that use newunit_stack/newunit_tos without locking. Two of these places required locking code to get wrf working. The other two are init_units and close_units. These are called from a constructor and a destructor. Presumably these can only be called once, and hence don't require locking code. Though curiously, close_units has some locking code anyways. I'm not sure why. I saw no failures here, and hence didn't bother to add any locking. I have reproduced the same problem from the gcc-7 branch. (In reply to Jim Wilson from comment #7) > I have reproduced the same problem from the gcc-7 branch. I fully expect this bug to be in all active branches of gfortran. We should commit a fix to all of them. Author: wilson Date: Mon Jun 26 21:40:47 2017 New Revision: 249667 URL: https://gcc.gnu.org/viewcvs?rev=249667&root=gcc&view=rev Log: Fix for SPEC CPU2017 621.wrf_s failure, add missing locking code. libgfortran/ PR libfortran/81195 * io/unit.c (get_unit): Call __gthread_mutex_lock before newunit_stack and newunit_tos references. Call __gthread_mutex_unlock afterward. Modified: trunk/libgfortran/ChangeLog trunk/libgfortran/io/unit.c Author: wilson Date: Mon Jun 26 21:44:50 2017 New Revision: 249668 URL: https://gcc.gnu.org/viewcvs?rev=249668&root=gcc&view=rev Log: Fix for SPEC CPu2017 621.wrf_s failure, add missing locking code. libgfortran/ Backport from trunk PR libfortran/81195 * io/unit.c (get_unit): Call __gthread_mutex_lock before newunit_stack and newunit_tos references. Call __gthread_mutex_unlock afterward. Modified: branches/gcc-7-branch/libgfortran/ChangeLog branches/gcc-7-branch/libgfortran/io/unit.c Fixed on trunk and gcc 7 branch. *** Bug 81241 has been marked as a duplicate of this bug. *** |