This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C

From: Mike Stump <mikestump at comcast dot net>
To: Bernd Edlinger <bernd dot edlinger at hotmail dot de>
Cc: Jakub Jelinek <jakub at redhat dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
Date: Sun, 4 Jan 2015 14:18:59 -0800
Subject: Re: [PATCH] Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C
Authentication-results: sourceware.org; auth=none
References: <DUB118-W421A4A9D8D09CBAF5103DAE45A0 at phx dot gbl>,<3C12133C-DABF-40FA-94F7-9DB785F6E914 at comcast dot net>,<DUB118-W292D09896C48BED7B29B14E45A0 at phx dot gbl>,<DUB118-W8F5E63DE36A8DB2DA1DF6E45B0 at phx dot gbl>,<623A5348-6FC9-4F7B-A9BC-B2B098AF7D37 at comcast dot net>,<20150104191658 dot GK1667 at tucnak dot redhat dot com> <DUB118-W450ACE7162969EFB27F817E45B0 at phx dot gbl>

On Jan 4, 2015, at 11:44 AM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> On Sun, 4 Jan 2015 20:16:58, Jakub Jelinek wrote:
>> 
>> On Sun, Jan 04, 2015 at 11:07:31AM -0800, Mike Stump wrote:
>>> On Jan 4, 2015, at 9:00 AM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>>>> It is due to a race condition in tsan itself, it cannot decide which access was the
>>>> previous one and which was the second one, but our tsan tests are not meant as
>>>> a functional test of the tsan runtime library, they are only meant to test the GCC
>>>> instrumentation.
>>> 
>>> Well, at least one test case that _is_ a functional test of the tsan runtime library isn’t a bad idea.
> 
> There are of course numerous functional tests, but they are all in LLVM's tree 
> at /compiler-rt/trunk/test/tsan
> 
>> 
>> The GCC instrumentation can be tested just by scanning the *.optimized dumps
>> or assembly. And perhaps the runtime test just be hidden by some special
>> environment variable that the user acks he doesn't main spurious FAILs.
>> 
> 
> I never had suspected the sleep calls to play such an important role, If I did, I would
> have smuggled one in before checking in….

We prefer a nice designed where each element can stand on its own and is defensible if reviewed.  :-)

> That's surely due to that there was no comment anywhere mentioning that race
> condition.

A comment or a bug report someplace that describe the race would be nice.

> I wonder if there is a test for that in the LLVM tree?  Probably they
> wouldn't consider that a BUG.

I would not assume that.  If given the choice between 5% slower and 100% reliable, and 5% faster and 90% reliable, it may be reasonable to pick the 100% reliable option. Also, in the fullness of time, maybe the choice between the two should be exposed to the user.  With 1 user, no option is fine, with 10,000 users, maybe 100 of them would like the other choice.

> I've looked deep in the implementation and I
> saw, for every 8-byte word, there are 4 shadow words, each containing previous
> accesses with call stack, and the __tsan_write8 functions just dont lock
> a mutex because of performance reasons, if the application does not have
> a race conditions tsan does not have a race either, if the application has a
> race condition, there is a certain chance that both threads call __tsan_write8,
> both look at the same shadow word, see nothing, and both write their
> callstack in the same shadow cell.

So, this sounds like programming 101 type stuff.  I’d like to think that once could atomic increment it from 0 and notice if they were first, and if not, know reliably they were not first (and could fix the value by subtracting the value back out).  I don’t think this would hurt performance much, if any.  I’m assuming they only read towards the end, and at the end, they could ensure that all the code finished running (that the atomic subtract above would have finished).

> But tsan still gets at least 90% chance to spot that.  As a matter of fact, the tsan runtime is just _incredibly_ fast,
> and catches errors, with a very high reliability.  But it is racy by design.

You say by design.  I’m skeptical of that.  Can you quote the email or the code were they said I wanted to design a system that wasn’t 100% reliable, because the best they could do was 2% slower, and they didn’t want it to be 2% slower?  I tend to think someone didn’t figure out they could atomically, locklessly do the update, or they didn’t understand the performance hit on the lockless code was small, or worse, they had no clue it was racy.  I’d be curious to know.  Maybe, there are other complications that prevent it from being non-racy.

Using sleep goes back decades, and I think that style should have mostly died around 1987 or so.  It’s 2015, and I’d rather consider it an obsolete style, exclusive of hard real-time.  I bring it up, as I’ve been burned by sleep and people that think sleep is a nice solution.

Follow-Ups:
- RE: [PATCH] Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C
  - From: Bernd Edlinger

References:
- [PATCH] Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C
  - From: Bernd Edlinger
- Re: [PATCH] Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C
  - From: Mike Stump
- RE: [PATCH] Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C
  - From: Bernd Edlinger
- RE: [PATCH] Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C
  - From: Bernd Edlinger
- Re: [PATCH] Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C
  - From: Mike Stump
- Re: [PATCH] Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C
  - From: Jakub Jelinek
- RE: [PATCH] Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C
  - From: Bernd Edlinger

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]