This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Conditional count update for fast coverage test in multi-threaded programs
- From: Rong Xu <xur at google dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Jan Hubicka <hubicka at ucw dot cz>
- Date: Mon, 25 Nov 2013 11:19:30 -0800
- Subject: Re: [PATCH] Conditional count update for fast coverage test in multi-threaded programs
- References: <CAF1bQ=Sz3B0o-sU7v4OjsJCscDaEoPoKrDEqG2cSyYpc0oXaKw at mail dot gmail dot com> <CAFiYyc2Z2AzDX89d4gKMV3NMNatDoUDcX+OXFSwWZ49LU8nrNQ at mail dot gmail dot com> <CAF1bQ=Q=4mKkxWcCmNGLMr=NXpTRMcfWNXWQ_DY-=19c4JndCA at mail dot gmail dot com> <CAFiYyc0QpM0XCPdVGAJONtYyraE4DLjmFPwx4n5etnSu5BQT7A at mail dot gmail dot com>
On Mon, Nov 25, 2013 at 2:11 AM, Richard Biener wrote:
> On Fri, Nov 22, 2013 at 10:49 PM, Rong Xu <firstname.lastname@example.org> wrote:
>> On Fri, Nov 22, 2013 at 4:03 AM, Richard Biener
>> <email@example.com> wrote:
>>> On Fri, Nov 22, 2013 at 4:51 AM, Rong Xu <firstname.lastname@example.org> wrote:
>>>> This patch injects a condition into the instrumented code for edge
>>>> counter updates. A counter is no longer updated once it reaches
>>>> the value 1.
>>>> The feature is controlled by a new parameter, --param=coverage-exec_once.
>>>> It is disabled by default; set it to 1 to enable it.
>>>> This extra check usually slows the program down. For SPEC 2006
>>>> benchmarks (all single-threaded programs), we usually see around a
>>>> 20%-35% slowdown in the -O2 coverage build. This feature, however, is
>>>> expected to improve the coverage run speed for multi-threaded programs,
>>>> because there is virtually no data race or false sharing in updating
>>>> counters. The improvement can be significant for highly threaded
>>>> programs -- we are seeing a 7x speedup in coverage test runs for some
>>>> non-trivial google
>>>> Tested with bootstrap.
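
As a rough illustration of the idea described in the patch summary above, the sketch below contrasts a plain counter increment with an update that stops writing once the counter is 1. It is not the code GCC emits; the type and function names (edge_counter_t, edge_count_increment, edge_count_exec_once) are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Hypothetical stand-in for one edge counter; the name is illustrative. */
typedef int64_t edge_counter_t;

/* Ordinary coverage instrumentation: every execution of the edge
   writes to the counter. */
static inline void edge_count_increment(edge_counter_t *counter)
{
    *counter += 1;
}

/* Conditional update: the counter is written only while it is still 0.
   Once it holds 1, later executions only read it, so the cache line
   is no longer modified. */
static inline void edge_count_exec_once(edge_counter_t *counter)
{
    if (*counter == 0)
        *counter = 1;
}
```

The extra load and branch are the likely source of the single-threaded slowdown mentioned above, while the absence of repeated writes is what is expected to help multi-threaded runs.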
>>> Err - why not simply emit
>>> counter = 1
>>> for the counter update itself with that --param (I don't like a --param
>>> for this either).
>>> I assume that CPUs can avoid data-races and false sharing for
>>> non-changing accesses?
>> I'm not aware of any CPU having this feature. I think a write to the
>> shared cache line invalidates all the shared copies. I cannot find
>> any reference on checking the value of the write. Do you have a
>> pointer to the feature?
> I don't have any pointer - but I remember seeing this in the context
> of atomics, so it may apply only when using an xchg
> or cmpxchg instruction. That would make it non-portable to
> some extent (if you don't want to use atomic builtins here).
cmpxchg should work here -- it's a conditional write, so the data race
and false sharing can be avoided.
I'm comparing the performance of an explicit branch vs. cmpxchg and will
>> I just tested this implementation vs. simply setting the counter to 1,
>> using google search as the benchmark.
>> This one is 4.5x faster. The test was done on Intel Westmere systems.
>> I can change the parameter to an option.
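
For readers following the thread, the two alternatives to the explicit branch that are discussed here, an unconditional store of 1 and a compare-and-swap, could look roughly like the sketch below. It uses GCC's __atomic_compare_exchange_n builtin; the type and function names are hypothetical, and nothing here is taken from the patch itself. Whether a failed compare-and-swap really avoids taking the cache line exclusively depends on the hardware and is not settled in this thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for one edge counter; the name is illustrative. */
typedef int64_t edge_counter_t;

/* "counter = 1": an unconditional store on every execution of the edge. */
static inline void edge_count_store_one(edge_counter_t *counter)
{
    *counter = 1;
}

/* Compare-and-swap: store 1 only if the counter still holds 0.
   Uses the GCC atomic builtin with relaxed ordering; on x86 this is
   typically compiled to a lock cmpxchg instruction. */
static inline void edge_count_cas(edge_counter_t *counter)
{
    edge_counter_t expected = 0;
    __atomic_compare_exchange_n(counter, &expected, 1,
                                0 /* strong variant */,
                                __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
```

Both functions leave a zero counter at 1; they differ only in whether a store is issued when the counter is already nonzero, which is the point being benchmarked above.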