This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [tsan] Instrument atomics


On Fri, Nov 23, 2012 at 8:39 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
>> > This patch attempts to instrument __atomic_* and __sync_* builtins.
>> > Unfortunately there is no 1:1 mapping to the tsan replacements for
>> > any of the builtins; tsan uses a weird memory model encoding (instead
>> > of values from 0 to 5, apparently 1 << 0 to 1 << 5, as if one could
>> > have more than one memory model at the same time), so even for the
>> > easy cases GCC has to transform them.
>>
>>
>> gcc must be using an old version of the library. I switched to ABI
>> constants some time ago.
>
> Ah, it was just me looking at an llvm compiler-rt tsan checkout from a
> few days ago.  Guess I'll need to update the patch.  So, it now expects
> 0 for relaxed up to 5 for sequentially consistent?

Yes.
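
In other words, the constants now line up one to one with GCC's
__ATOMIC_* values, so they can be passed through unchanged.  A sketch of
the mapping (enumerator names as in compiler-rt's tsan_interface_atomic.h;
treat the exact spelling as an assumption):

  typedef enum {
    __tsan_memory_order_relaxed = 0,  /* __ATOMIC_RELAXED */
    __tsan_memory_order_consume = 1,  /* __ATOMIC_CONSUME */
    __tsan_memory_order_acquire = 2,  /* __ATOMIC_ACQUIRE */
    __tsan_memory_order_release = 3,  /* __ATOMIC_RELEASE */
    __tsan_memory_order_acq_rel = 4,  /* __ATOMIC_ACQ_REL */
    __tsan_memory_order_seq_cst = 5   /* __ATOMIC_SEQ_CST */
  } __tsan_memory_order;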


>> > More importantly, there is no way to pass through
>> > __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just
>> > gives up and doesn't replace anything if e.g.
>> > __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
>> > is seen.  Perhaps one could ignore the HLE bits, which are just an
>> > optimization, but there is no way to find out in the generic code
>> > whether the extra bits are just an optimization or change the behavior
>> > of the builtin.
>> >
>>
>> Do you mean hardware lock elision? oh, boy
>
> Yes.
>
>> It's not just "an atomic"; it should be legal to downgrade them to plain
>> atomic ops (however, I am not sure what memory order they must have... is
>> it possible to have HLE_ACQUIRE before a seq_cst atomic rmw?). And I think
>> that's what we need to do.
>
> Perhaps if there wasn't the compatibility hack, or whatever that 100500
> comparison is, or if it could be tweaked, we could pass the HLE bits
> through too and let the library decide what to do with them.

The 100500 check was part of a tricky multi-step migration to the new
scheme, because we can't deploy the compiler and the runtime atomically.
I think we just need to drop the HLE bits. I don't know what the runtime
could possibly do with them. And I don't want one more tricky multi-step
migration.



>
> If HLE bits are set, the low order bits (model & 65535) contain the
> normal memory model, i.e. 0 (relaxed) through 5 (seq_cst), and either
> 65536 (hle acquire) or 131072 (hle release) is ORed with that.
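
Given that encoding, dropping the hint bits on the GCC side is a
one-liner; a minimal sketch:

  /* Keep only the base memory model (0 = relaxed .. 5 = seq_cst);
     65536 == __ATOMIC_HLE_ACQUIRE and 131072 == __ATOMIC_HLE_RELEASE
     are optimization hints and are dropped.  */
  static int
  strip_hle_bits (int model)
  {
    return model & 65535;
  }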
>
>> Well, it's a dirty implementation that relies on the x86 memory model
>> (and on no compiler optimizations; well, apparently there are data
>> races :)).  I think eventually I will just replace them with mutexes.
>
> I don't see why mutexes would be better than just plain __atomic_* builtins.
> With mutexes there wouldn't be any atomicity...

For a race detector, any atomic operation is a heavy operation, which is
not atomic anyway.
Currently I do:

  update vector clock
  do the atomic operation
  update vector clock

where 'update vector clock' is:

  lock container mutex
  find atomic descriptor
  lock atomic descriptor
  unlock container mutex
  update clock (O(N))
  unlock atomic descriptor

It's much wiser to combine the two vector clock updates and do the
atomic operation itself under the atomic descriptor mutex.
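
A sketch of the combined scheme (toy types and names for illustration,
not the actual runtime code):

  #include <algorithm>
  #include <array>
  #include <atomic>
  #include <cstddef>
  #include <mutex>

  const std::size_t kMaxThreads = 64;
  typedef std::array<unsigned long, kMaxThreads> VectorClock;

  struct AtomicDesc {
    std::mutex mtx;       // per-atomic descriptor mutex
    VectorClock clock{};  // clock last released to this atomic
  };

  unsigned long fetch_add_tracked(AtomicDesc &d,
                                  std::atomic<unsigned long> &a,
                                  VectorClock &thr_clock) {
    std::lock_guard<std::mutex> lock(d.mtx);
    // release half: merge the thread's clock into the atomic's, O(N)
    for (std::size_t i = 0; i < kMaxThreads; ++i)
      d.clock[i] = std::max(d.clock[i], thr_clock[i]);
    // the atomic operation itself, done under the descriptor mutex
    unsigned long old = a.fetch_add(1, std::memory_order_seq_cst);
    // acquire half: merge the atomic's clock back into the thread's
    for (std::size_t i = 0; i < kMaxThreads; ++i)
      thr_clock[i] = std::max(thr_clock[i], d.clock[i]);
    return old;
  }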



>> I've just committed a patch to llvm with failure_memory_order (r168518).
>
> Ok, I'll adjust the patch to pass both memory models through then.
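
For reference, the entry points with a separate failure order look
roughly like this (signature modelled on compiler-rt's
tsan_interface_atomic.h; treat the details as an assumption):

  typedef int __tsan_atomic32;
  extern "C" __tsan_atomic32
  __tsan_atomic32_compare_exchange_val(volatile __tsan_atomic32 *a,
                                       __tsan_atomic32 cmp,
                                       __tsan_atomic32 xchg,
                                       int mo, int fmo);

  // e.g. __atomic_compare_exchange_n(p, &e, d, 0, __ATOMIC_SEQ_CST,
  // __ATOMIC_ACQUIRE) would pass mo = 5 and fmo = 2.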
>
>> Yeah, I think it's better to transform them into more standard ones
>> (llvm also has its own weird atomic ops and its own memory orders).
>
> Ok, then no change is needed on the GCC side for that beyond what I posted.
>
>> > Oh, and there are no 16-byte atomics provided by libtsan; the library
>> > would need to be built with -mcx16 on x86_64 for that and have
>> > additional entry points for unsigned __int128.  The patch doesn't
>> > touch the 16-byte atomics and leaves them as is.
>>
>> I think that's fine for now.
>
> Perhaps.  Note that such atomics are also used e.g. for #pragma omp
> atomic on long double etc.
>
>> That's what llvm does as well. But it inserts a fast path before
>> __cxa_guard_acquire -- an acquire-load of the lock word. Doesn't gcc
>> do the same?  tsan intercepts the __cxa_guard functions.
>
> Yes, except it isn't __atomic_load_*, but a plain memory read:
>   _3 = MEM[(char *)&_ZGVZ3foovE1a];
>   if (_3 == 0)
>     goto <bb 3>;
>   else
>     goto <bb 8>;
>
>   <bb 8>:
>   fast path, whatever;
>
>   <bb 3>:
>   _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
>   ...
>
> So, right now tsan would just instrument it as __tsan_read1 from
> &_ZGVZ3foovE1a rather than any atomic load.


Looks like a bug. That needs to be a load-acquire with proper compiler
and hardware memory ordering.
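
Conceptually the instrumented fast path should look like this (sketched
in C++; GCC would do this at the GIMPLE level, and the 64-bit guard
layout follows the Itanium C++ ABI):

  extern "C" int __cxa_guard_acquire(unsigned long long *);
  extern "C" void __cxa_guard_release(unsigned long long *);

  void init_once(unsigned long long *guard, void (*init)()) {
    // fast path: acquire-load of the first guard byte, so the
    // initializing thread's stores are visible before the object is used
    if (__atomic_load_n((char *)guard, __ATOMIC_ACQUIRE) == 0) {
      if (__cxa_guard_acquire(guard)) {  // slow path, may run initializer
        init();
        __cxa_guard_release(guard);
      }
    }
  }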




>> Well, yes, the compiler module must pass all memory accesses to the
>> runtime, whatever form they have in the compiler's internal
>> representation.
>> Yes, I think I need to provide range memory access functions in the
>> runtime.  I already have this issue for the Go language; there are a
>> lot of range accesses due to arrays and slices.
>> I will add them next week.
>
> Ok.  A slight problem then is that where the tsan pass sits right now,
> there is no easy way to find out if the builtin call will be expanded
> inline or not, so (similarly for asan), if we instrument builtins in the
> pass, a builtin expanded as a library call might be instrumented twice
> at runtime (once by the added instrumentation for the builtin, once in
> the intercepted library call).  That isn't wrong, it just might need
> slightly more resources than if we ensured we only instrument the
> builtin when it isn't expanded inline.
>
>         Jakub
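
For the record, a sketch of the range entry points discussed above
(names and signatures assumed, modelled on the existing per-size
__tsan_read*/__tsan_write* entry points):

  extern "C" void __tsan_read_range(void *addr, unsigned long size);
  extern "C" void __tsan_write_range(void *addr, unsigned long size);

  // e.g. a builtin memcpy(dst, src, n) could be instrumented as:
  //   __tsan_write_range(dst, n);
  //   __tsan_read_range(src, n);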

