This is the mail archive of the
mailing list for the GCC project.
Re: volatile access optimization (C++ / x86_64)
- From: Matt Godbolt <matt at godbolt dot org>
- To: Andrew Haley <aph at redhat dot com>
- Cc: GCC Development <gcc at gcc dot gnu dot org>
- Date: Sat, 27 Dec 2014 12:49:48 -0600
- Subject: Re: volatile access optimization (C++ / x86_64)
- Authentication-results: sourceware.org; auth=none
- References: <CAFWXXN3quEdSnaoWuPcQn2k-F99Yaw+6=NqgFgcu9ABpv5ZD3Q at mail dot gmail dot com> <549DE09B dot 8060502 at redhat dot com> <CAFWXXN0V9yvNTpcz54DCK237KPURQs1XkaHcQZK5Eoj_VCj0OA at mail dot gmail dot com> <549DED1B dot 3070006 at redhat dot com> <CAFWXXN13eKJVP0z6V6DEVwoxZtqo4Z0nnQA2YFd9+4UWJxrnVg at mail dot gmail dot com> <549EF314 dot 5060305 at redhat dot com> <CAFWXXN03T7dT8FGNgnaZf9Kq+hZu4e6pZqEWRb3BQsSjrzRCLw at mail dot gmail dot com> <549EFD1E dot 5090000 at redhat dot com>
On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley <firstname.lastname@example.org> wrote:
> Is it faster? Have you measured it? Is it so much faster that it's critical for your
Well, I couldn't really leave this be: I did a little bit of
benchmarking using my company's proprietary benchmarking library,
which I'll try and get open sourced. It follows Intel's
recommendations for using RDTSCP/CPUID etc, and I've also spent some
time looking at Agner Fog's techniques. I believe it to be pretty
accurate, to within a clock cycle or two.
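For readers without access to that library, here is a minimal sketch of the general CPUID/RDTSC ... RDTSCP/CPUID bracketing pattern. It is not the proprietary library mentioned above. The names tick_begin, tick_end, min_ticks and net_ticks are hypothetical, the GCC/Clang headers <x86intrin.h> and <cpuid.h> are assumed, and on non-x86 targets it falls back to std::chrono::steady_clock (so the units are then clock ticks, not CPU cycles).

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      // __cpuid macro (GCC/Clang)
#include <x86intrin.h>  // __rdtsc, __rdtscp
#define SKETCH_HAVE_TSC 1
#else
#define SKETCH_HAVE_TSC 0
#endif

// Timestamp taken just before the measured region.
inline std::uint64_t tick_begin() {
#if SKETCH_HAVE_TSC
    unsigned a, b, c, d;
    __cpuid(0, a, b, c, d);  // serialize so earlier work cannot leak into the region
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Timestamp taken just after the measured region.
inline std::uint64_t tick_end() {
#if SKETCH_HAVE_TSC
    unsigned aux, a, b, c, d;
    const std::uint64_t t = __rdtscp(&aux);  // waits for preceding instructions
    __cpuid(0, a, b, c, d);                  // stop later code from moving up
    return t;
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Smallest elapsed tick count seen over `trials` runs of `body`.
template <typename F>
std::uint64_t min_ticks(F&& body, int trials = 1000) {
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < trials; ++i) {
        const std::uint64_t t0 = tick_begin();
        body();
        const std::uint64_t t1 = tick_end();
        best = std::min(best, t1 - t0);
    }
    return best;
}

// As min_ticks, minus the cost of timing an empty region (clamped at zero).
template <typename F>
std::uint64_t net_ticks(F&& body, int trials = 1000) {
    const std::uint64_t overhead = min_ticks([] {}, trials);
    const std::uint64_t gross = min_ticks(body, trials);
    return gross > overhead ? gross - overhead : 0;
}
```

Something like net_ticks([&] { counter = counter + 1; }) would then give a rough per-operation figure. Taking the minimum over many trials is one common way to filter out interrupts and cache misses; other harnesses may prefer a median.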
On my laptop (Core i5 M520) the volatile and non-volatile increments
are so fast as to be within the noise - 1-2 clock cycles. So that
certainly lends support to your theory, Andrew, that it's probably not
worth the effort (other than offending my aesthetic sensibilities!).
Obviously this doesn't really take into account the extra i-cache
As a comparison, the "lock xaddl" versions come out at 18 cycles.
Obviously this is also pretty much "free" by any reasonable metric,
but it's hard to measure the impact of the bus lock on other
processors' memory accesses in a highly multi-threaded environment.
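To make the three cases concrete, here is a rough sketch of the kinds of increment being compared. The struct and function names are hypothetical, and it is an assumption on my part that the std::atomic version is what produces "lock xaddl": GCC on x86-64 typically emits it for fetch_add when the returned value is used, but check the generated assembly for your own compiler and flags.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical holder for the three counters under comparison.
struct Counters {
    std::uint32_t plain = 0;             // ordinary memory
    volatile std::uint32_t vol = 0;      // every access must really happen
    std::atomic<std::uint32_t> atom{0};  // atomic read-modify-write
};

// Non-volatile increment: the compiler is free to fold or combine these.
inline void inc_plain(Counters& c) { c.plain = c.plain + 1; }

// Volatile increment: one volatile read followed by one volatile write.
// Written out longhand because ++ on a volatile is deprecated in C++20.
inline void inc_volatile(Counters& c) { c.vol = c.vol + 1; }

// Atomic increment: returns the value held before the add.
inline std::uint32_t inc_atomic(Counters& c) {
    return c.atom.fetch_add(1, std::memory_order_seq_cst);
}
```

Note that only the atomic version is safe when several threads update the same counter; the volatile one constrains the compiler but gives no atomicity.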
For completeness I also tried it on a few other machines:
X5670     : 0-2 for normal, 28 clocks for lock xadd
E5-2667 v2: as above, 27 clocks for lock xadd
E5-2667 v3: as above, 15 clocks for lock xadd
On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley <email@example.com> wrote:
> Well, in this case you now know: it's a bug! But one that it's
> fairly hard to care deeply about, although it might get fixed now.
Understood completely! Thanks again,