This is the mail archive of the mailing list for the libstdc++ project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

libstdc++-v3 PATCH: STL configuration change

Here is the patch to change the configuration of the STL
implementation used within libstdc++-v3 in order to regain the
performance profile of the STL implementation used within libstdc++-v2
as was shipped with gcc 2.95.X.  Unfortunately, a non-trivial amount
of commentary is required for this one-line configuration change.

Within libstdc++-v3/docs/html/17_intro/howto.html under the
Thread-safety section and recent posts from Benjamin, one finds that
__USE_MALLOC was defined after due consideration.  Here was one
representative discussion thread that was referenced (it seemed to be
a bug report about v2 not v3 but Benjamin explained why __USE_MALLOC
was defined for libstdc++-v3):

(Aside, I note that most links to libstdc++ list traffic referred to
 in the howto.html are currently dead.)

Also see these references regarding threading and memory allocation
issues in the version of STL used in libstdc++-v3:

(FYI, I have already committed the obvious patch to
 docs/html/23_containers/howto.html and docs/html/17_intro/howto.html
 to fix and/or add these links.)

I think I know why mysterious bug reports come in about threading with
the STL as shipped with libstdc++-v2 (both to report crashes and
random memory leaks).  My theory is that it has never been configured
properly so that it works with all ports, all the time, even when they
were explicitly built with --enable-threads.  For example, if I take
this very simple example (call it t.C):

#include <list>

and run:

g++ -V2.95.X -E <published threading option for the port> t.C | grep _Lock

on various platforms I have at my disposal (all known to have been
configured with --enable-threads), what do I find?

alpha-dec-osf4.0f/2.95.1: g++ -E -threads  t.C|grep ' _Lock('
            _Lock() {  ; }

[It is possible to get the thread-safe version of the allocator, on
 most/all ports with this undocumented trick:]

alpha-dec-osf4.0f/2.95.1: g++ -E -threads -D_PTHREADS  t.C|grep ' _Lock('
   [...]_Lock() { if (threads) pthread_mutex_lock(&_S_node_allocator_lock) ; }

sparc-sun-solaris2.7/2.95.3: g++ -E -pthreads t.C|grep ' _Lock('
   [...]_Lock() { if (threads) pthread_mutex_lock(&_S_node_allocator_lock) ; }

i386-redhat-linux/egcs-2.91.66: /usr/bin/g++ -pthread -E t.C|grep ' _Lock('
[No _Lock found since this version of STL is different than others but
 it looks OK at first glance (look for lock instead of _Lock).]

i386-redhat-linux/2.96: /usr/bin/g++ -E -pthread t.C|grep ' _Lock('
   [...]_Lock() { if (threads) pthread_mutex_lock(&_S_node_allocator_lock) ; }

i386-unknown-freebsd4.2/2.95.2: /usr/bin/g++ -E -pthread t.C|grep ' _Lock('
            _Lock() {  ; }

i386-unknown-freebsd4.2/2.95.2: /usr/bin/g++ -E -pthread -D_PTHREADS t.C|grep ' _Lock('
   [...]_Lock() { if (threads) pthread_mutex_lock(&_S_node_allocator_lock) ; }

(A pending libstdc++-v3 configuration patch, which is still being
 fine-tuned, will address this aspect of rampant misconfiguration and
 misnomers on how to get g++ to force STL threading support in the
 default container memory allocator.)

Since no test case was ever produced from aforementioned report or any
other that I could find, I have generated a trivial one that should
leak, if the high-speed allocator code path has a leak.  FYI, by
inspection of the code at interest, unless corruption of the data
structure occurs only in the threaded case (i.e. the mutex locking
isn't done right), we see that the allocator should leak whether it is
configured for threading or not.

#include <list>
#include <map>
#include <utility>

int main ()
  std::list<int> l;
  std::map<int, int> m;

  for (int k = 0; k < 4; k++)
      for (int i = 0; i < 10000000; i++)
        l.push_back (int());
      std::map<int, double> *m2 = new std::map<int, double>;
      for (int i = 0; i < 10000000; i++)
        l.pop_front ();
      for (int i = 0; i < 1000000; i++)
	m.insert (std::make_pair (i, i*i));
      for (int i = 0; i < 1000000; i++)
        m.erase (i);
      for (int i = 0; i < 1000000; i++)
	m2->insert (std::make_pair<int, double> (i, i*i));
      delete m2;
  return 0;

In my environment, I see no memory leak with this STL configuration
patch.  A memory leak in this context refers to memory demands growing
each time the k loop is rerun.  Due to the caching of memory by the
STL allocator, by design, it is not fair to call growth in cycle k=0
of the k loop a memory leak.  We need to make sure that any reports
from users have this same understanding.  Results:

Compiler (options) - memory usage observed - running time (u+s in all cases)
2.95.2 (-O3) - end of k=0 cycle 219M, never grew after - 62.2 seconds
2.95.2 (-O3 -pthread -D_PTHREADS [1]) - ditto - 185.8 seconds
2.95.2 (-O3 -D__USE_MALLOC) - grew to 156M, end of k=0 cycle ~0M - 230.3 seconds
[-D__USE_MALLOC dominates -D_PTHREADS, thus mixed results are not shown.]

mainline (-O3) - end of k=0 cycle 219M, never grew after - 57.2 seconds
mainline (-O3 -pthread -D_PTHREADS [1]) - ditto - 173.7
mainline (-O3 -D__USE_MALLOC) - grew to 156M, end of k=0 cycle ~0M - 227.2 secs
[-D__USE_MALLOC dominates -D_PTHREADS, thus mixed results are not shown.]

[1] Note: Once a patch to arrange for the default STL allocators to
    use the gthr.h abstraction layer is in place, no port should ever
    need this explicit mention of _PTHREADS to get a thread-safe
    allocation.  This note exists to ensure that bad information does
    not propagate to the user population after that work is completed!

Examples with more locality of reference may show even better run-time
performance improvements with this patch.  I have real C++ application
code which makes heavy use of STL that sees a 3-4x improvement with
this configuration patch.

Next, a heavily modified and extended version of the code contained
here: was
studied.  FYI, I do not know how to write a test that is absolutely
guaranteed to fail if the mutex locking code in the STL allocator is
missing and/or hosed.  This is the best effort I have been able to
produce but it can't even detect when _M_acquire_lock()'s
implementation is empty on my platform (FYI, they added an extra level
of indirection in the implementation of _Lock since the version of STL
we used in libstdc++-v2).  I think we would have to insert calls to
sched_yield() inside the implementation of the default allocator to
force it to blow up in most environments.

// This multi-threading C++/STL/POSIX code adheres to rules outlined here:
// It is believed to exercise the allocation code in a manner that
// should reveal memory leaks (and, under rare cases, race conditions,
// if the STL threading support is fubar'd).
// In addition to memory leak detection, which requires some human
// observation, this test also looks for memory corruption of the data
// passed between threads using an STL container.
// To manually inspect code generation, use:
// /usr/local/beta-gcc/bin/g++ -E STL-pthread-example.C|grep _Lock
// /usr/local/beta-gcc/bin/g++ -E STL-pthread-example.C|grep _M_acquire_lock

#include <list>
#include <pthread.h>

using namespace std;

const int thread_cycles = 100;
const int thread_pairs = 10;
const unsigned max_size = 100;
const int iters = 10000;

class task_queue
  task_queue ()
    pthread_mutex_init (&fooLock, NULL);
    pthread_cond_init (&fooCond, NULL);
  ~task_queue ()
    pthread_mutex_destroy (&fooLock);
    pthread_cond_destroy (&fooCond);
  list<int> foo;
  pthread_mutex_t fooLock;
  // This code uses a special case that allows us to use just one
  // condition variable - in general, don't use this idiom unless you
  // know what you are doing. ;-)
  pthread_cond_t fooCond;

produce (void* t)
  task_queue& tq = *(static_cast<task_queue*> (t));
  int num = 0;
  while (num < iters)
      pthread_mutex_lock (&tq.fooLock);
      while ( () >= max_size)
	pthread_cond_wait (&tq.fooCond, &tq.fooLock); (num++);
      pthread_cond_signal (&tq.fooCond);
      pthread_mutex_unlock (&tq.fooLock);
  return 0;

consume (void* t)
  task_queue& tq = *(static_cast<task_queue*> (t));
  int num = 0;
  while (num < iters)
      pthread_mutex_lock (&tq.fooLock);
      while ( () == 0)
	pthread_cond_wait (&tq.fooCond, &tq.fooLock);
      if ( () != num++)
	abort (); ();
      pthread_cond_signal (&tq.fooCond);
      pthread_mutex_unlock (&tq.fooLock);
  return 0;

main (int argc, char** argv)
  pthread_t prod[thread_pairs];
  pthread_t cons[thread_pairs];

  task_queue* tq[thread_pairs];

  for (int j = 0; j < thread_cycles; j++)
      for (int i = 0; i < thread_pairs; i++)
	  tq[i] = new task_queue;
	  pthread_create (&prod[i], NULL, produce, static_cast<void*> (tq[i]));
	  pthread_create (&cons[i], NULL, consume, static_cast<void*> (tq[i]));

      for (int i = 0; i < thread_pairs; i++)
	  pthread_join (prod[i], NULL);
	  pthread_join (cons[i], NULL);
#if defined(__FreeBSD__)
	  // These lines are not required by POSIX since a successful
	  // join is suppose to detach as well...
	  pthread_detach (prod[i]);
	  pthread_detach (cons[i]);
	  // ...but they are according to the FreeBSD 4.X code base
	  // or else you get a memory leak.
	  delete tq[i];

  return 0;

Results (no memory leaks or crashes were observed in any case although
they perhaps should have been seen in the last case):

2.92.2 (-O3 -pthread -D_PTHREADS): 80.9 u+s seconds
mainline (-O3 -pthread -D_PTHREADS): 75.4 u+s seconds
mainline (-O3 -pthread -D_PTHREADS -D__USE_MALLOC): 94.5 u+s seconds
mainline (-O3 -pthread): 49.9 u+s seconds (BUT this case was "unsafe at
any speed" since no mutex was guarding the allocator's shared memory pool.)

Aside, when users provide -D_PTHREADS or -D__USE_MALLOC on the command
line (or indirectly via a LIB_SPEC setting which maps in one of those
defines based on -pthread or somesuch), the ABI of STL is changed!  We
need to be very careful when steering users to use those internal STL
macros, if we ever want to have a stable ABI.

A semi-detailed analysis of a class of major run-time performance
regressions from gcc 2.95.X involving STL test cases was posted to
libstdc++-v3 (

2001-05-30  Loren J. Rittle  <>

	* include/bits/c++config (__USE_MALLOC): Do not define it.
	Document why not and explain how a library user can get
	non-default behavior without defining it.

	* docs/html/23_containers/howto.html (Containers and
	multithreading): Explain the current configuration of the STL.
	Provide pointer to comment in c++config.  Explain current
	situation of ABI modification in light of defining
	implementation-space macros that have traditionally been
	touted as the correct way to do things.

Index: ./include/bits/c++config
RCS file: /cvs/gcc/egcs/libstdc++-v3/include/bits/c++config,v
retrieving revision 1.24
diff -r1.24 c++config
< // This is the "underlying allocator" for STL.  The alternatives are
< // homegrown schemes involving a kind of mutex and free list; see stl_alloc.h.
< #define __USE_MALLOC
> // Default to the typically higher-speed libstdc++-v2 configuration.
> // To debug STL code or to gain better performance in some threading
> // cases on some platforms, uncomment this line to globally change
> // behavior of your application code (this will require, at the very
> // least, recompilation of your entire application and, perhaps, the
> // entire libstdc++ library):
> //
> // #define __USE_MALLOC
> // However, once you define __USE_MALLOC, only the malloc allocator is
> // visible to application code (i.e. the typically higher-speed
> // allocator is not even available in this configuration).  Note that
> // it is possible to force the malloc-based allocator on a
> // per-case-basis for some application code even when the above macro
> // symbol is not defined.  The author of this comment believes that is
> // a better way to tune an application for high-speed using this
> // implementation of the STL.  Here is one possible example displaying
> // the forcing of the malloc-based allocator over the typically
> // higher-speed default allocator:
> //
> // std::list <void*, std::malloc_alloc>
Index: docs/html/23_containers/howto.html
RCS file: /cvs/gcc/egcs/libstdc++-v3/docs/html/23_containers/howto.html,v
retrieving revision 1.3
diff -r1.3 howto.html
>       The STL implementation is currently configured to use the
>       high-speed caching memory allocator.  If you absolutely think
>       you must change this on a global basis for your platform to
>       support multi-threading, then please consult all commentary in
>       include/bits/c++config.  Be fully aware that you change the ABI
>       of libstdc++-v3 when you provide -D__USE_MALLOC on the command
>       line or make a change to that configuration file. [Placeholder
>       in case other patches don't make it before the 3.0 release: That
>       memory allocator can appear buggy in multithreaded C++ programs
>       (and has been reported to leak memory), if STL is misconfigured
>       for your platform.  You may need to provide -D_PTHREADS on the
>       command line in this case to ensure the memory allocator for
>       containers is really protected by a mutex.  Also, be aware that
>       you just changed the ABI of libstdc++-v3 when you did that.]

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]