This is the mail archive of the
mailing list for the GCC project.
Re: Optimization of conditional access to globals: thread-unsafe?
- From: "Bart Van Assche" <bart dot vanassche at gmail dot com>
- To: "Tomash Brechko" <tomash dot brechko at gmail dot com>
- Cc: gcc at gcc dot gnu dot org, "Andrew Pinski" <pinskia at gmail dot com>
- Date: Fri, 26 Oct 2007 15:20:58 +0200
- Subject: Re: Optimization of conditional access to globals: thread-unsafe?
- References: <firstname.lastname@example.org>
On 10/21/07, Tomash Brechko <email@example.com> wrote:
> I have a question regarding the thread-safeness of a particular GCC
> optimization. I'm sorry if this was already discussed on the list, if
> so please provide me with the reference to the previous discussion.
> Consider this piece of code:
> extern int v;
> void f(int set_v)
> {
>     if (set_v)
>         v = 1;
> }
> If f() is called concurrently from several threads, then call to f(1)
> should be protected by the mutex. But do we have to acquire the mutex
> for f(0) calls? I'd say no, why, there's no access to global v in
> that case. But GCC 3.3.4--4.3.0 on i686 with -O1 generates the
> following code:
> pushl %ebp
> movl %esp, %ebp
> cmpl $0, 8(%ebp)
> movl $1, %eax
> cmove v, %eax ; load (maybe)
> movl %eax, v ; store (always)
> popl %ebp
> Note the last unconditional store to v. Now, if some thread would
> modify v between our load and store (acquiring the mutex first), then
> we will overwrite the new value with the old one (and would do that in
> a thread-unsafe manner, not acquiring the mutex).
> So, do the calls to f(0) require the mutex, or is it a GCC bug?
> So, could someone explain to me why this GCC optimization is valid, and,
> if so, where lies the boundary below which I may safely assume GCC
> won't try to store to objects that aren't stored to explicitly during
> particular execution path? Or maybe the named bug report is valid
> after all?
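To make the quoted assembly easier to follow, here is a rough C
rendering of it, as a sketch only. The name f_transformed is
hypothetical (GCC emits no such function), and v is defined rather than
declared extern so that the fragment links on its own. In a
single-threaded program both functions leave v in the same state; the
difference is that f_transformed() writes to v on every call.

```c
/* The global from the quoted example (defined here so the sketch links). */
int v;

/* The function as written in the source: v is stored to only
 * when set_v is non-zero. */
void f(int set_v)
{
    if (set_v)
        v = 1;
}

/* Hypothetical C equivalent of the quoted -O1 output: the value to
 * store is selected first, then v is written unconditionally. */
void f_transformed(int set_v)
{
    int tmp = 1;      /* movl  $1, %eax                 */
    if (set_v == 0)
        tmp = v;      /* cmove v, %eax   : load (maybe)  */
    v = tmp;          /* movl  %eax, v   : store (always) */
}
```

Calling f_transformed(0) while another thread updates v under a mutex
is the situation described in the quoted mail: the old value loaded
into tmp can be written back over the new one.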
I'm not an expert in the C89/C99 standards, but I have written a Ph.D.
thesis on the subject of memory models. What I learned while writing
that thesis is the following:
- If you want to know which optimizations are valid and which ones are
not, you have to look at the semantics defined in the language standard.
- Every language standard document defines what the result is of
executing a sequential program. The definition of the behavior of a
multithreaded program written in a certain programming language is
called the memory model of that programming language.
- The memory model of C and C++ is still under discussion as has
already been pointed out on this mailing list.
- Although the memory model for C and C++ is still under discussion,
there is a definition for the behavior of multithreaded C and C++
programs. The following is required by the ANSI/ISO C89 standard (from
paragraph 5.1.2.3, Program Execution):
    Accessing a volatile object, modifying an object, modifying a file,
    or calling a function that does any of those operations are all side
    effects, which are changes in the state of the execution environment.
    Evaluation of an expression may produce side effects. At certain
    specified points in the execution sequence called sequence points,
    all side effects of previous evaluations shall be complete and no
    side effects of subsequent evaluations shall have taken place. (A
    summary of the sequence points is given in annex C.)
Annex C explains that, among other things, the call to a function
(after argument evaluation) is a sequence point.
See also http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf
- The above paragraph does not impose any limitation for the compiler
with regard to optimizations on non-volatile variables. Or: the
generated code shown in your mail is allowed by the above paragraph.
- The above paragraph also has the following implications for volatile
variables:
* There exists a total order for all accesses to all volatile variables.
* It is the responsibility of the compiler to ensure cache coherency
for volatile variables. If memory barrier instructions are needed to
ensure cache coherency on the architecture for which the compiler is
generating code, then it is the responsibility of the compiler to
generate these instructions for volatile variables. This fact is often
overlooked.
* The compiler must generate code such that exactly one store
statement is executed for each assignment to a volatile variable.
Prefetching volatile variables is allowed as long as it does not
violate paragraph 5.1.2.3 from the language definition.
* As is well known, the compiler may reorder function calls and assignments
to non-volatile variables if the compiler can prove that the called
function won't modify that variable. This becomes problematic if the
variable is modified by more than one thread and the called function
is a synchronization function, e.g. pthread_mutex_lock(). This kind of
reordering is highly undesirable. This is why any variable that is
shared between threads has to be declared volatile, even when using
explicit locking calls.
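As an illustration of the last point, a minimal sketch of the pattern I
am recommending, assuming POSIX threads. The names shared_counter,
counter_lock, increment_many and run_two_threads are made up for this
example: the shared variable is declared volatile and every access to
it is also bracketed by explicit locking calls.

```c
#include <pthread.h>
#include <stddef.h>

/* Shared between threads: declared volatile AND protected by a mutex. */
static volatile int shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: increments the shared counter *arg times, taking the
 * mutex around every access. */
static void *increment_many(void *arg)
{
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two threads that each perform n increments and returns the
 * final counter value, or -1 if a thread could not be created. */
int run_two_threads(int n)
{
    pthread_t a, b;

    shared_counter = 0;
    if (pthread_create(&a, NULL, increment_many, &n) != 0)
        return -1;
    if (pthread_create(&b, NULL, increment_many, &n) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

With GCC this would typically be built with the -pthread option.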
I hope the above brings more clarity to this discussion.
Bart Van Assche.