[tsan] Instrument atomics

Xinliang David Li davidxl@google.com
Fri Nov 23 17:07:00 GMT 2012


On Fri, Nov 23, 2012 at 8:39 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
>> > This patch attempts to instrument __atomic_* and __sync_* builtins.
>> > Unfortunately for none of the builtins there is 1:1 mapping to the tsan
>> > replacements, tsan uses weirdo memory model encoding (instead of values
>> > from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
>> > one memory model at the same time), so even for the easy cases GCC
>> > has to transform them.
>>
>>
>> gcc must be using old version of the library. I've switched to ABI
>> constants some time ago.
>
> Ah, it was just me looking at llvm compiler-rt tsan checkout from a few days
> ago.  Guess I'll need to update the patch.  So, it now expects 0 for relaxed
> up to 5 for sequentially consistent?
>
>> > More importantly, there is no way to pass through
>> > __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
>> > up and doesn't replace anything if e.g.
>> > __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
>> > is seen, perhaps one could ignore the hle bits which are just an
>> > optimization, but there is no way to find out in the generic code
>> > whether the extra bits are just an optimization or change the behavior
>> > of the builtin.
>> >
>>
>> Do you mean hardware lock elission? oh, boy
>
> Yes.
>
>> It's not just "an atomic". It should be legal to downgrade them to plain
>> atomic ops (however, I am not sure what memory order they must have... is
>> it possible to have HLE_ACQUIRE before seq_cst atomic rmw?). And I think
>> that's what we need to do.
>
> Perhaps if there wasn't the compatibility hack or what is that 100500
> comparison, or if it could be tweaked, we could pass through the HLE bits
> too and let the library decide what to do with it.
>
> If HLE bits are set, the low order bits (model & 65535) contains the
> normal memory model, i.e. 0 (relaxed) through 5 (seq_cst), and either 65536
> (hle acquire) or 131072 (hle release) is ored with that.
>
>> Well, it's a dirty implementation that relies on x86 memory model (and no
>> compiler optimizations, well, apparently there are data races :)).
>> I think eventually I will just replace them with mutexes.
>
> I don't see why mutexes would be better than just plain __atomic_* builtins.
> With mutexes there wouldn't be any atomicity...
>
>> I've just committed a patch to llvm with failure_memory_order (r168518).
>
> Ok, I'll adjust the patch to pass both memory models through then.
>
>> Yeah, I think it's better to transform them to a more standard ones (llvm
>> also has own weird atomic ops and own memory orders).
>
> Ok, no change on the GCC side then needed for that beyond what I posted.
>
>> > Oh, and there are no 16-byte atomics provided by libtsan, the library
>> > would need to be built with -mcx16 on x86_64 for that and have the
>> > additional entry points for unsigned __int128.  The patch doesn't touch
>> > the 16-byte atomics, leaves them as is.
>>
>> I think that's fine for now.
>
> Perhaps.  Note that such atomics are used also e.g. for #pragma omp atomic
> on long double etc.
>
>> That's what llvm does as well. But it inserts a fast path before
>> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
>> same?
>> tsan intercepts __cxa_guard functions.
>
> Yes, except it isn't __atomic_load_*, but plain memory read.
>   _3 = MEM[(char *)&_ZGVZ3foovE1a];
>   if (_3 == 0)
>     goto <bb 3>;
>   else
>     goto <bb 8>;
>
>   <bb 8>:
>   fast path, whatever;
>
>   <bb 3>:
>   _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
>   ...
>
> So, right now tsan would just instrument it as __tsan_read1 from
> &_ZGVZ3foovE1a rather than any atomic load.
>
>> Well, yes, the compiler module must pass to the runtime all memory
>> accesses, whatever form they have in compiler internal representation.
>> Yes, I think I need to provide range memory access functions in runtime. I
>> already have this issue for Go language, there are a lot of range accesses
>> due to arrays and slices.
>> I will add them next week.
>
> Ok.  A slight problem then is that where the tsan pass sits right now, there
> is no easy way to find out if the builtin call will be expanded inline or
> not, so (similar for asan), if we instrument them in the pass, it might be
> instrumented twice at runtime if the builtin is expanded as a library call
> (once the added instrumentation for the builtin, once in the intercepted
> library call).  That isn't wrong, just might need slightly more resources
> than if we ensured we only instrument the builtin if it isn't expanded
> inline.
>

Should inlining of those functions be disabled as if -fno-builtins is specified?

David



>         Jakub



More information about the Gcc-patches mailing list