This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Conditional count update for fast coverage test in multi-threaded programs


On Fri, Nov 22, 2013 at 10:49 PM, Rong Xu <xur@google.com> wrote:
> On Fri, Nov 22, 2013 at 4:03 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Nov 22, 2013 at 4:51 AM, Rong Xu <xur@google.com> wrote:
>>> Hi,
>>>
>>> This patch injects a condition into the instrumented code for edge
>>> counter update. The counter value will not be updated after reaching
>>> value 1.
>>>
>>> The feature is under a new parameter --param=coverage-exec_once.
>>> It is disabled by default; set it to 1 to enable it.
>>>
>>> This extra check usually slows the program down. For SPEC 2006
>>> benchmarks (all single-threaded programs), we usually see around a
>>> 20%-35% slowdown in an -O2 coverage build. This feature, however, is
>>> expected to improve the coverage run speed for multi-threaded programs,
>>> because there is virtually no data race or false sharing in updating
>>> counters. The improvement can be significant for highly threaded
>>> programs -- we are seeing a 7x speedup in coverage test runs for some
>>> non-trivial Google applications.
>>>
>>> Tested with bootstrap.
>>
>> Err - why not simply emit
>>
>>   counter = 1
>>
>> for the counter update itself with that --param (I don't like a --param
>> for this either).
>>
>> I assume that CPUs can avoid data-races and false sharing for
>> non-changing accesses?
>>
>
> I'm not aware of any CPU having this feature. I think a write to a
> shared cache line invalidates all the shared copies. I cannot find
> any reference to the hardware checking the value being written. Do you
> have a pointer to the feature?

I don't have a pointer, but I remember seeing this in the context
of atomics, so it may apply only when using an xchg or cmpxchg
instruction.  That would make it non-portable to some extent (if you
don't want to use atomic builtins here).

Richard.
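
For illustration only, here is a minimal sketch of what an atomic
compare-and-exchange form of the "set once" update could look like in
portable C11. The function name edge_count_exec_once_atomic and the
int64_t counter type are hypothetical and not taken from the patch; the
sketch makes no claim about how any particular CPU handles the cache
line when the exchange fails.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: move the counter from 0 to 1 exactly once.
   If the counter is already non-zero, the compare fails and the
   stored value is left unchanged.  */
static inline void
edge_count_exec_once_atomic (_Atomic int64_t *counter)
{
  int64_t expected = 0;

  /* Relaxed ordering is assumed to be enough here, since the counter
     is not used to synchronize any other data.  */
  atomic_compare_exchange_strong_explicit (counter, &expected, 1,
                                           memory_order_relaxed,
                                           memory_order_relaxed);
}
```

On x86 a compiler would typically lower this to a lock cmpxchg; other
targets use their own primitives, which is the portability concern
mentioned above.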

> I just tested this implementation vs. simply setting the counter to 1,
> using Google search as the benchmark.
> This one is 4.5x faster. The test was done on Intel Westmere systems.
>
> I can change the parameter to an option.
>
> -Rong
>
>> Richard.
>>
>>> Thanks,
>>>
>>> -Rong
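
The three counter-update strategies compared in this thread can be
sketched in plain C as follows. This is a hand-written illustration of
the idea, not the code GCC emits; the function names and the int64_t
counter type are assumptions made for the example.

```c
#include <stdint.h>

/* Ordinary coverage instrumentation: every execution of the edge
   performs a read-modify-write on the counter.  */
static inline void
edge_count_increment (int64_t *counter)
{
  *counter += 1;
}

/* The "counter = 1" suggestion: an unconditional store on every
   execution, even when the value does not change.  */
static inline void
edge_count_store_one (int64_t *counter)
{
  *counter = 1;
}

/* The conditional update described in the patch: after the first
   execution the counter is only read, never written again.  */
static inline void
edge_count_exec_once (int64_t *counter)
{
  if (*counter == 0)
    *counter = 1;
}
```

Only the last variant stops writing to the counter once it is set,
which is the property the patch relies on to avoid contention between
threads; the cost is the extra load and branch on every execution.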

