The Itanium code for GOMP_critical_start starts:

0x2000000000334900 <GOMP_critical_start>:     [MMI] alloc r16=ar.pfs,1,1,0
0x2000000000334901 <GOMP_critical_start+1>:         addl r32=840,r1
0x2000000000334902 <GOMP_critical_start+2>:         nop.i 0x0
0x2000000000334910 <GOMP_critical_start+16>:  [MMI] mf;;
0x2000000000334911 <GOMP_critical_start+17>:        mov.m ar.ccv=0
0x2000000000334912 <GOMP_critical_start+18>:        mov r14=1;;
0x2000000000334920 <GOMP_critical_start+32>:  [MMI] nop.m 0x0
0x2000000000334921 <GOMP_critical_start+33>:        cmpxchg4.rel r14=[r32],r14,ar.ccv
0x2000000000334922 <GOMP_critical_start+34>:        nop.i 0x0;;
0x2000000000334930 <GOMP_critical_start+48>:  [MIB] nop.m 0x0
0x2000000000334931 <GOMP_critical_start+49>:        cmp.eq p6,p7=0,r14
0x2000000000334932 <GOMP_critical_start+50>:  (p06) br.ret.dptk.many b0;;

Note the mf followed by a cmpxchg4.rel. I don't believe this enforces sufficient memory ordering constraints. A subsequent store from inside the critical section may become visible to other threads before the cmpxchg4.rel, which is only intended to prevent reordering in the OTHER direction. Thus a store inside the critical section can become visible before the lock is really acquired, which is, at least theoretically, very bad. I do not know whether current hardware may actually execute these out of order. I observed this while trying to understand the GNU OpenMP support. I also don't know whether this problem is limited to Itanium. I expect it doesn't exist on X86. It may exist on other weakly-ordered architectures. I believe that this is due to incorrect code generated for the __sync_bool_compare_and_swap in gomp_mutex_lock().
Please check trunk.
I think the code is wrong. Looking at ToT, config/ia64/sync.md will generate the code shown where the memory fence is in front of the cmpxchg4.rel instruction. cmpxchg4.rel has release semantics, which are defined in the Itanium manual as: "Release instructions guarantee that all previous orderable instructions are made visible prior to being made visible themselves." This would imply to me that the memory fence before the cmpxchg4.rel is not needed, but we do need one after it in order to prevent things being moved up ahead of the cmpxchg4.rel. Alternatively, we could change the code to use cmpxchg4.acq instead of cmpxchg4.rel and keep the memory fence where it is. cmpxchg4.acq has acquire semantics instead of release semantics.
__sync_bool_compare_and_swap is documented to be a full barrier, so I think the bug must be on the ia64/sync.md side.
Yes, I think this is clearly a bug in the IA64 definition of sync_compare_and_swap. I think the fix is swapping the two instructions generated by the IA64 sync_compare_and_swap pattern (cmpxchg4.rel followed by a memory fence instead of a memory fence followed by cmpxchg4.rel).
Subject: Bug 42869
Author: sje
Date: Fri Mar 12 18:19:14 2010
New Revision: 157410

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157410
Log:
2010-03-12  Steve Ellcey  <sje@cup.hp.com>

	PR target/42869
	* config/ia64/sync.md (sync_compare_and_swap): Move memory fence.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/ia64/sync.md
Fixed by moving the mf after the cmpxchg4.rel. The cmpxchg4.rel will keep memory operations from moving down and the mf will keep them from moving up. Changing cmpxchg4.rel to cmpxchg4.acq would have achieved the same result.
Subject: Bug 42869
Author: sje
Date: Wed Jul 21 22:37:53 2010
New Revision: 162387

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162387
Log:
2010-07-21  Steve Ellcey  <sje@cup.hp.com>

	PR target/42869
	* config/ia64/sync.md (sync_compare_and_swap): Move memory fence.

Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/config/ia64/sync.md