This is the mail archive of the mailing list for the GCC project.


Re: [PATCH] Conditional count update for fast coverage test in multi-threaded programs

On Mon, Nov 25, 2013 at 2:11 AM, Richard Biener wrote:
> On Fri, Nov 22, 2013 at 10:49 PM, Rong Xu wrote:
>> On Fri, Nov 22, 2013 at 4:03 AM, Richard Biener wrote:
>>> On Fri, Nov 22, 2013 at 4:51 AM, Rong Xu wrote:
>>>> Hi,
>>>> This patch injects a condition into the instrumented code for edge
>>>> counter update. The counter value will not be updated after reaching
>>>> value 1.
>>>> The feature is controlled by a new parameter, --param=coverage-exec_once.
>>>> It is disabled by default; set it to 1 to enable it.
>>>> This extra check usually slows the program down. For SPEC 2006
>>>> benchmarks (all single-threaded programs), we usually see around a
>>>> 20%-35% slowdown in an -O2 coverage build. This feature, however, is
>>>> expected to improve the coverage run speed for multi-threaded programs,
>>>> because there is virtually no data race or false sharing in updating
>>>> counters. The improvement can be significant for highly threaded
>>>> programs -- we are seeing a 7x speedup in coverage test runs for some
>>>> non-trivial Google applications.
>>>> Tested with bootstrap.
>>> Err - why not simply emit
>>>   counter = 1
>>> for the counter update itself with that --param (I don't like a --param
>>> for this either).
>>> I assume that CPUs can avoid data-races and false sharing for
>>> non-changing accesses?
>> I'm not aware of any CPU having this feature. I think a write to a
>> shared cache line invalidates all the other shared copies. I cannot
>> find any reference to the hardware checking the value being written.
>> Do you have any pointer to the feature?
> I don't have any pointer - but I remember seeing this in the context
> of atomics, so it may apply only when using an xchg or cmpxchg
> instruction.  Which would make it non-portable to
> some extent (if you don't want to use atomic builtins here).

cmpxchg should work here -- it's a conditional write, so the data race
and false sharing can be avoided.
I'm comparing the performance of an explicit branch vs. cmpxchg and will
report back.
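
For reference, the update strategies discussed in this thread can be sketched
roughly as below. This is only an illustration under assumptions, not the code
the patch emits: the type and function names (counter_t, update_increment,
update_store_one, update_branch, update_cmpxchg, check_saturation) are made up
for this sketch, and the cmpxchg variant is written with GCC's
__atomic_compare_exchange_n builtin. The sketch shows only the semantics; it
says nothing about whether a failed compare-exchange avoids taking the cache
line exclusively on a given CPU, which is what the measurement is meant to
find out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter type for this sketch. */
typedef int64_t counter_t;

/* Ordinary edge-counter update: every execution writes the cache line. */
static inline void update_increment(counter_t *c) { *c += 1; }

/* "counter = 1": no read, but still an unconditional store each time. */
static inline void update_store_one(counter_t *c) { *c = 1; }

/* Explicit branch: after the first hit the counter is only read.
   The plain load/store here is not synchronized between threads. */
static inline void update_branch(counter_t *c)
{
    if (*c == 0)
        *c = 1;
}

/* Compare-exchange: stores 1 only if the counter is still 0. */
static inline void update_cmpxchg(counter_t *c)
{
    counter_t expected = 0;
    __atomic_compare_exchange_n(c, &expected, 1, 0 /* strong */,
                                __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}

/* Single-threaded sanity check: the three "exec once" variants stay at 1,
   while the ordinary increment keeps counting. */
void check_saturation(void)
{
    counter_t a = 0, b = 0, c = 0, d = 0;
    for (int i = 0; i < 3; i++) {
        update_increment(&a);
        update_store_one(&b);
        update_branch(&c);
        update_cmpxchg(&d);
    }
    assert(a == 3);
    assert(b == 1 && c == 1 && d == 1);
}
```

All three "exec once" variants give the same coverage answer (executed or
not); they differ only in how often they write to the counter's cache line.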


> Richard.
>> I just tested this implementation vs. simply setting the counter to 1,
>> using Google search as the benchmark.
>> The conditional update is 4.5x faster. The test was done on Intel
>> Westmere systems.
>> I can change the parameter to an option.
>> -Rong
>>> Richard.
>>>> Thanks,
>>>> -Rong
