This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [tsan] Instrument atomics


On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
> > This patch attempts to instrument __atomic_* and __sync_* builtins.
> > Unfortunately for none of the builtins there is 1:1 mapping to the tsan
> > replacements, tsan uses weirdo memory model encoding (instead of values
> > from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
> > one memory model at the same time), so even for the easy cases GCC
> > has to transform them.
> 
> 
> gcc must be using old version of the library. I've switched to ABI
> constants some time ago.

Ah, it was just me looking at llvm compiler-rt tsan checkout from a few days
ago.  Guess I'll need to update the patch.  So, it now expects 0 for relaxed
up to 5 for sequentially consistent?

> > More importantly, there is no way to pass through
> > __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
> > up and doesn't replace anything if e.g.
> > __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
> > is seen, perhaps one could ignore the hle bits which are just an
> > optimization, but there is no way to find out in the generic code
> > whether the extra bits are just an optimization or change the behavior
> > of the builtin.
> >
> 
> Do you mean hardware lock elission? oh, boy

Yes.

> It's not just "an atomic". It should be legal to downgrade them to plain
> atomic ops (however, I am not sure what memory order they must have... is
> it possible to have HLE_ACQUIRE before seq_cst atomic rmw?). And I think
> that's what we need to do.

Perhaps if there wasn't the compatibility hack or what is that 100500
comparison, or if it could be tweaked, we could pass through the HLE bits
too and let the library decide what to do with it.

If HLE bits are set, the low order bits (model & 65535) contains the
normal memory model, i.e. 0 (relaxed) through 5 (seq_cst), and either 65536
(hle acquire) or 131072 (hle release) is ored with that.

> Well, it's a dirty implementation that relies on x86 memory model (and no
> compiler optimizations, well, apparently there are data races :)).
> I think eventually I will just replace them with mutexes.

I don't see why mutexes would be better than just plain __atomic_* builtins.
With mutexes there wouldn't be any atomicity...

> I've just committed a patch to llvm with failure_memory_order (r168518).

Ok, I'll adjust the patch to pass both memory models through then.

> Yeah, I think it's better to transform them to a more standard ones (llvm
> also has own weird atomic ops and own memory orders).

Ok, no change on the GCC side then needed for that beyond what I posted.

> > Oh, and there are no 16-byte atomics provided by libtsan, the library
> > would need to be built with -mcx16 on x86_64 for that and have the
> > additional entry points for unsigned __int128.  The patch doesn't touch
> > the 16-byte atomics, leaves them as is.
> 
> I think that's fine for now.

Perhaps.  Note that such atomics are used also e.g. for #pragma omp atomic
on long double etc.

> That's what llvm does as well. But it inserts a fast path before
> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
> same?
> tsan intercepts __cxa_guard functions.

Yes, except it isn't __atomic_load_*, but plain memory read.
  _3 = MEM[(char *)&_ZGVZ3foovE1a];
  if (_3 == 0)
    goto <bb 3>;
  else
    goto <bb 8>;

  <bb 8>:
  fast path, whatever;

  <bb 3>:
  _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
  ...

So, right now tsan would just instrument it as __tsan_read1 from
&_ZGVZ3foovE1a rather than any atomic load.

> Well, yes, the compiler module must pass to the runtime all memory
> accesses, whatever form they have in compiler internal representation.
> Yes, I think I need to provide range memory access functions in runtime. I
> already have this issue for Go language, there are a lot of range accesses
> due to arrays and slices.
> I will add them next week.

Ok.  A slight problem then is that where the tsan pass sits right now, there
is no easy way to find out if the builtin call will be expanded inline or
not, so (similar for asan), if we instrument them in the pass, it might be
instrumented twice at runtime if the builtin is expanded as a library call
(once the added instrumentation for the builtin, once in the intercepted
library call).  That isn't wrong, just might need slightly more resources
than if we ensured we only instrument the builtin if it isn't expanded
inline.

	Jakub


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]