This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug libgomp/45025] New: Memory ordering issues with libgomp critical sections and __sync


This is really a continuation of bug 42869, but that title mentions Itanium,
and this is not Itanium-specific.

Gomp_mutex_lock and as a result OpenMP critical sections and locks do not
implement sane memory ordering semantics.  This is largely due to __sync
implementation problems.  I looked at Itanium and PowerPC, but other weakly
ordered architectures are probably similar.

I was trying to understand the memory ordering semantics of OpenMP critical
sections with gcc, since the 3.0 OpenMP spec requires full fence semantics,
which seems to be overkill.  I concluded that the current state of affairs in
4.4.4 doesn't make much sense, independent of the spec interpretation. 
Specific issues:

1. The Itanium version of gomp_mutex_lock still has the fence on the wrong side
of the cmpxchg4.rel.  This is a __sync_bool_compare_and_swap problem, and was
fixed in the trunk (bug 42869, thanks Steve!) but still appears to be wrong in
the 4.4 series.  I think this needs a backport to 4.4, since we may effectively
be generating critical sections that don't guarantee mutual exclusion.  (I
don't know whether current hardware provides stronger properties that mitigate
this, but ...)

2. The PowerPC & Itanium version of gomp_mutex_lock explicitly calls
__sync_bool_compare_and_swap.  Nonetheless the PowerPC code ends up generating
a code sequence along the lines of

lwsync; lwarx; stwcx.; bne; isync

As far as I can determine, this does not order a preceding store with respect
to a later load.  (It uses lwsync, not sync.) It's unclear whether this is a
problem with respect to the OpenMP spec.  (Technically it is for 3.0, but 3.1
was reworded, on my suggestion, to allow this, since it's a useless
restriction.)  But I think it clearly fails to satisfy the
__sync_bool_compare_and_swap spec, which claims this should be a "full
barrier".

More generally, gomp_mutex_lock() appears inconsistent about whether it
satisfies the OpenMP 3.0 and earlier rules that it include a flush or full
barrier.  Modulo the earlier bug, the Itanium versions do.  The PowerPC version
does for unlock, but not lock.  (And given that it doesn't, I'm not sure
there's a point behind the lwsync in the lock sequence either.)  Since this
choice has a significant performance impact on both architectures, this seems
to be a way to get the worst of all possible worlds; we get both weak
guarantees and, in many cases, bad performance.

I would recommend following the OpenMP3.1 draft spec, and insisting on only
acquire/release semantics for gomp_mutex_(un)lock.


-- 
           Summary: Memory ordering issues with libgomp critical sections
                    and  __sync
           Product: gcc
           Version: 4.4.4
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: Hans dot Boehm at hp dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45025


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]