User account creation filtered due to spam.

Bug 24704 - __gnu_cxx::__exchange_and_add is called even for single threaded applications
Summary: __gnu_cxx::__exchange_and_add is called even for single threaded applications
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.0.2
: P3 enhancement
Target Milestone: 4.2.0
Assignee: Paolo Carlini
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-11-07 11:37 UTC by bert hubert
Modified: 2006-05-24 16:41 UTC (History)
3 users (show)

See Also:
Host: i486-linux-gnu
Target: i486-linux-gnu
Build: i486-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2006-02-07 02:09:11


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description bert hubert 2005-11-07 11:37:33 UTC
__gnu_cxx::__exchange_and_add shows up in profiles of the PowerDNS recursor, which is single threaded. I'm somewhat amazed this is an out-of-line function (there are 40 cycles to be gained by inlining it, and this function gets called a lot), but I'm worried more about the 1000 cycle overhead even in a single-threaded program!

I think it would be good to teach std::string to use a non-atomic exchange_and_add  (or even just 'add') for non-REENTRANT applications, plus it would be good to be able to inline the function, but I understand this is of far smaller importance.

The code exhibiting prominent profiled use of __gnu_cxx::__exchange_and_add can be found via the repository described on http://wiki.powerdn.somc.

Thanks!
Comment 1 bert hubert 2005-11-07 12:39:46 UTC
that would be http://wiki.powerdns.com - not sure how I mistyped that.

Also, the application in question is rather string-happy, which is probably why the exchange_and_add thing shows up so much.
Comment 2 Nathanael C. Nerode 2005-11-07 15:15:56 UTC
True, enhancement request.
Comment 3 bert hubert 2005-11-07 17:00:41 UTC
I've now verified that even when the entire compiler is compiled with --enable-threads=single, the resulting libstdc++ does the full 'lock; xadd' thing in __exchange_and_add, which I consider a bug (and more importantly, so does pinskia).

Comment 4 Andrew Pinski 2005-11-07 17:05:48 UTC
Linking to the orginal thread:
http://gcc.gnu.org/ml/libstdc++/2004-02/msg00372.html
Comment 5 Andrew Pinski 2005-11-07 17:06:11 UTC
This is a regression too.
Comment 6 Paolo Carlini 2005-11-07 17:07:55 UTC
(In reply to comment #3)
> I've now verified that even when the entire compiler is compiled with
> --enable-threads=single, the resulting libstdc++ does the full 'lock; xadd'
> thing in __exchange_and_add,

This is well known, if you only cared to ask (sorry ;) If you look at the
sources it's clear that nothing about threads changes the choice of the
atomicity.h. I agree however, that it's a very good idea to have that 
configuration option influencing also the atomicity things.
Comment 7 Andrew Pinski 2005-11-07 17:32:30 UTC
Actually the real issue is not even here and there is no performance problems with the out of line/non threaded calls but the real issue is that we are just calling them too much.
Comment 8 Paolo Carlini 2005-11-07 17:36:37 UTC
(In reply to comment #7)
> Actually the real issue is not even here and there is no performance problems
> with the out of line/non threaded calls but the real issue is that we are just
> calling them too much.

Andrew, in some sense you are right, given the current reference-counted nature
of std::string. However, let's not close this PR: submitter has a point that
the user should be enabled to disable the use of atomics when configuring for
single thread. I leave to you the exact categorization... Thanks.

Comment 9 Andrew Pinski 2005-11-07 17:38:06 UTC
(In reply to comment #8)
> Andrew, in some sense you are right, given the current reference-counted nature
> of std::string. However, let's not close this PR: submitter has a point that
> the user should be enabled to disable the use of atomics when configuring for
> single thread. I leave to you the exact categorization... Thanks.
I should note I closed this based on a conversation with the submitter.
I also filed PR 24715 for the real issue.
Comment 10 bert hubert 2005-11-08 10:15:49 UTC
An easy solution might be to check for __gthread_active_p() in bits/basic_string before calling __exchange_and_add, and to do this locally and non-atomically otherwise.
Comment 11 Paolo Carlini 2005-11-08 10:53:17 UTC
(In reply to comment #10)
> An easy solution might be to check for __gthread_active_p() in
> bits/basic_string before calling __exchange_and_add, and to do this locally and
> non-atomically otherwise.

Yes, this solution can work. Actually, I'm hoping for something a bit cleaner.
Remember that we want a single clean solution for *all* the places where the
atomics are used and most are inside the built *.so.
Comment 12 Paolo Carlini 2005-11-08 11:30:45 UTC
In my opinion, the way to go is consistently using the macro __GTHREADS which
is undefined when --enable-threads=single is passed. This is consistent with
the approach already used in mt_allocator, for instance. And indeed, in mt_alloc
nothing is thread-safe in that case (but much simpler and faster!).
Comment 13 bert.hubert@netherlabs.nl 2005-11-08 11:43:36 UTC
Subject: Re:  __gnu_cxx::__exchange_and_add is called even for single threaded applications

On Tue, Nov 08, 2005 at 11:30:46AM -0000, pcarlini at suse dot de wrote:
> ------- Comment #12 from pcarlini at suse dot de  2005-11-08 11:30 -------
> In my opinion, the way to go is consistently using the macro __GTHREADS which
> is undefined when --enable-threads=single is passed. This is consistent with
> the approach already used in mt_allocator, for instance. And indeed, in
> mt_alloc
> nothing is thread-safe in that case (but much simpler and faster!).
> -- 

 __GTHREADS is fixed for libstdc++ depending on enable-threads=single|posix,
--whereas __gthread_active_p() depends on if the binary was compiled with
--pthread.

Indeed, other parts of libstdc++ already rely on __gthread_active_p(). 

Using __GTHREADS would get us halfway, but nobody in his right mind
(certainly no distributor) is going to compile with --enable-threads=single
on a platform that does posix - you get a crippled compiler. I'd hate to
have to keep two compilers around.

I've verified that adding a check on __gthread_active_p() to
bits/basic_string.h does the right thing. It only does so after a complete
recompile of libstdc++ though, the code is indeed instantiated earlier (as
you mentioned already).

The sticking point is people who don't do pthreads but DO need atomic
operations because they use some other threading system. There might be a
need for a mechanism to make __gthread_active_p() return true in that case.

Ideas?

Comment 14 Paolo Carlini 2005-11-08 11:50:23 UTC
(In reply to comment #13)

> Indeed, other parts of libstdc++ already rely on __gthread_active_p(). 

Sure, see mt_alloc: there is an external __GTHREADS and an internal
__gthread_active_p().

> Using __GTHREADS would get us halfway, but nobody in his right mind
> (certainly no distributor) is going to compile with --enable-threads=single
> on a platform that does posix - you get a crippled compiler. I'd hate to
> have to keep two compilers around.

This is meaningless. The configure option is present and has to be used
consistently in the whole library. Again, see mt_alloc.

> I've verified that adding a check on __gthread_active_p() to
> bits/basic_string.h does the right thing. It only does so after a complete
> recompile of libstdc++ though, the code is indeed instantiated earlier (as
> you mentioned already).

Of course does the right thing, there is no doubt about this. But, again,
it's inconsistent and ugly adding conditionals all around only in basic_string
and not touching all the other places. We simply want to do the same thing
that mt_alloc is already doing, centralized in one place.
Comment 15 bert.hubert@netherlabs.nl 2005-11-08 11:58:45 UTC
Subject: Re:  __gnu_cxx::__exchange_and_add is called even for single threaded applications

On Tue, Nov 08, 2005 at 11:50:24AM -0000, pcarlini at suse dot de wrote:
> Of course does the right thing, there is no doubt about this. But, again,
> it's inconsistent and ugly adding conditionals all around only in basic_string
> and not touching all the other places. We simply want to do the same thing
> that mt_alloc is already doing, centralized in one place.

How about we make (or I contribute) a set of 'atomic-if-needed'
'exchange-and-add' and 'add' inlineable functions?

These could just replace __gnu_cxx::__exchange_and_add in all places and
automatically do the right thing: lock when needed and just add otherwise.

The added bonus is that for unthreaded applications, still the majority,
everything is inlined anyhow, which even works on i386.

My fear is that otherwise the unthreaded case is never ever going to see
real use as the amount of people that will recompile their compiler or
libstdc++ is going to be rather minimal.

Thanks.

Comment 16 Paolo Carlini 2005-11-08 12:06:01 UTC
(In reply to comment #15)

> How about we make (or I contribute) a set of 'atomic-if-needed'
> 'exchange-and-add' and 'add' inlineable functions?
> 
> These could just replace __gnu_cxx::__exchange_and_add in all places and
> automatically do the right thing: lock when needed and just add otherwise.
> 
> The added bonus is that for unthreaded applications, still the majority,
> everything is inlined anyhow, which even works on i386.

Indeed, that's the idea. But:
1- Do you have a copyright assignment on file?
2- Are you aware of the various binary compatibility quirks? That means that
*nothing* has to to change (besides performance) wrt to exported symbols and
interoperability between binaries, if the user doesn't pass any --thread-model.
It's not trivial, but we can work out something for 4.2.

> My fear is that otherwise the unthreaded case is never ever going to see
> real use as the amount of people that will recompile their compiler or
> libstdc++ is going to be rather minimal.

To repeat: this is needed anyway, because we want to use consistently the
--thread-model configure option. Nothing will go in not checking also and
exploting the macro. Then comes everything else... And, sorry, *you* pointed
out this inconsistency in the first place ;)
Comment 17 bert.hubert@netherlabs.nl 2005-11-08 14:26:04 UTC
Subject: Re:  __gnu_cxx::__exchange_and_add is called even for single threaded applications

On Tue, Nov 08, 2005 at 12:06:02PM -0000, pcarlini at suse dot de wrote:
> To repeat: this is needed anyway, because we want to use consistently the
> --thread-model configure option. Nothing will go in not checking also and
> exploting the macro. Then comes everything else... And, sorry, *you* pointed
> out this inconsistency in the first place ;)

True - but would  #if __GTHREADS if(__gthread_active_p()) be ok? So check
both?

I've discussed a bit with Momchil Velikov and we feel it might be good if
there were a generic way for an application or library to signal that
multiple threads might be active. Momchil fakes this by defining the pthread
symbol that __gthread_active_p() looks for, but that is not generic enough.

This would enable libstdc++ to turn on its locking or atomic operations, but
remain fast and unlocked for truly single threaded programs.

Thanks.

Comment 18 Paolo Carlini 2005-11-08 14:34:58 UTC
(In reply to comment #17)
> > To repeat: this is needed anyway, because we want to use consistently the
> > --thread-model configure option. Nothing will go in not checking also and
> > exploting the macro. Then comes everything else... And, sorry, *you* pointed
> > out this inconsistency in the first place ;)
> 
> True - but would  #if __GTHREADS if(__gthread_active_p()) be ok? So check
> both?

Sure, this is the general idea. I'm a little bit concerned that for something
apparently so elemental as an addiction (atomic, yes...) we are going to add
a conditional, but probably, given the many cycles of atomics, it's ok. Any
chance you can benchmark a bit that? I gather you already played with adding
those checks, can you measure the overhead of the conditional alone compared
to not doing nothing (i.e., non-atomic inline addition/subtraction).

> I've discussed a bit with Momchil Velikov and we feel it might be good if
> there were a generic way for an application or library to signal that
> multiple threads might be active. Momchil fakes this by defining the pthread
> symbol that __gthread_active_p() looks for, but that is not generic enough.

Frankly, this is another issue. What we are going to do for *this* PR is
working along the lines of mt_allocator and the rest of the library, which
are largely relying on __gthread_active_p() in its present form and __GTHREAD,
nothing else. If you feel something can be improved about __gthread_active_p()
please submit a detailed, separate PR. Probably, it's not even a libstdc++
PR...
Comment 19 Paolo Carlini 2005-11-08 14:54:26 UTC
(In reply to comment #18)
> Sure, this is the general idea. I'm a little bit concerned that for something
> apparently so elemental as an addiction (atomic, yes...) we are going to add
> a conditional, but probably, given the many cycles of atomics, it's ok. Any
> chance you can benchmark a bit that? I gather you already played with adding
> those checks, can you measure the overhead of the conditional alone compared
> to not doing nothing (i.e., non-atomic inline addition/subtraction).

Actually, this kind of benchmarking is not necessary, because currently we
are always using the normal atomics. We have only to check that the additional
__gthread_active_p() conditional is not measurable wrt the atomics itself.
Comment 20 Andrew Pinski 2006-05-21 20:14:19 UTC
Any news on this one?
Comment 21 Paolo Carlini 2006-05-21 20:28:44 UTC
Yes, the audit trail doesn't say that in private mail with Loren we agreed that for now we only want to minimally take into account --enable-threads, that is exploit the __GTHREADS macro, consistently with other parts of the library (like the allocators). I'm going to implement that.
Comment 22 bert.hubert@netherlabs.nl 2006-05-22 05:20:06 UTC
Subject: Re:  __gnu_cxx::__exchange_and_add is called even for single threaded applications

On Sun, May 21, 2006 at 08:28:45PM -0000, pcarlini at suse dot de wrote:
> ------- Comment #21 from pcarlini at suse dot de  2006-05-21 20:28 -------
> Yes, the audit trail doesn't say that in private mail with Loren we agreed that
> for now we only want to minimally take into account --enable-threads, that is
> exploit the __GTHREADS macro, consistently with other parts of the library
> (like the allocators). I'm going to implement that.

This would mean a non-enable-threads compiled libstdc++ would skip the calls
to __exchange_and_add?

And to clarify, there would be no other way to skip these calls except
making a separate non-threaded libstdc++?

It would work for me, but I'd like to know.

Thanks.

Comment 23 Paolo Carlini 2006-05-22 08:54:38 UTC
(In reply to comment #22)

> This would mean a non-enable-threads compiled libstdc++ would skip the calls
> to __exchange_and_add?

Of course the idea would not be that of "skipping", instead providing a version of __exchange_and_add not using slow atomics and inlined.

> And to clarify, there would be no other way to skip these calls except
> making a separate non-threaded libstdc++?

Yes. At least for this PR. If you can provide compelling experimental evidence that normal users can pay the overhead of checking each __gthread_active_p() each time, then we could reconsider that, in a *separate* PR. Also, since we are moving anyway to not-reference-counted strings (already provided in ext/), there is not much point for more invasive and complex changes.
Comment 24 Paolo Carlini 2006-05-22 14:44:05 UTC
Humm, maybe I (we ;) was wrong: I'm doing some benchmarks and it looks like the cost of checking __gthread_active_p() compared to that of an atomic operation is very small... More later.
Comment 25 paolo@gcc.gnu.org 2006-05-24 16:37:53 UTC
Subject: Bug 24704

Author: paolo
Date: Wed May 24 16:37:42 2006
New Revision: 114044

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=114044
Log:
2006-05-24  Paolo Carlini  <pcarlini@suse.de>

	PR libstdc++/24704
	* include/bits/atomicity.h (__exchange_and_add_single,
	__atomic_add_single): New, single thread versions of the atomic
	functions.
	(__exchange_and_add_dispatch, __atomic_add_dispatch): New,
	depending on __GTHREADS and __gthread_active_p() dispatch either
	to the above or to the existing atomic functions.
	* include/ext/pool_allocator.h: Update callers.
	* include/ext/rc_string_base.h: Likewise.
	* include/bits/locale_classes.h: Likewise.
	* include/bits/basic_string.h: Likewise.
	* include/bits/ios_base.h: Likewise.
	* include/tr1/boost_shared_ptr.h: Likewise.
	* src/ios.cc: Likewise.
	* src/locale.cc: Likewise.
	* src/ios_init.cc: Likewise.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/atomicity.h
    trunk/libstdc++-v3/include/bits/basic_string.h
    trunk/libstdc++-v3/include/bits/ios_base.h
    trunk/libstdc++-v3/include/bits/locale_classes.h
    trunk/libstdc++-v3/include/ext/pool_allocator.h
    trunk/libstdc++-v3/include/ext/rc_string_base.h
    trunk/libstdc++-v3/include/tr1/boost_shared_ptr.h
    trunk/libstdc++-v3/src/ios.cc
    trunk/libstdc++-v3/src/ios_init.cc
    trunk/libstdc++-v3/src/locale.cc

Comment 26 Paolo Carlini 2006-05-24 16:41:56 UTC
Fixed.