This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Re: Regression test for thread safety?

From: Loren James Rittle <rittle at latour dot rsch dot comm dot mot dot com>
To: dank at kegel dot com
Cc: libstdc++ at gcc dot gnu dot org
Date: Tue, 16 Jul 2002 03:14:03 -0500 (CDT)
Subject: Re: Regression test for thread safety?
References: <200207160326.g6G3QXsZ068291@latour.rsch.comm.mot.com> <3D33B83B.6B5F26FD@kegel.com>
Reply-to: rittle at labs dot mot dot com

In article <3D33B83B.6B5F26FD@kegel.com>, <dank@kegel.com> writes:

>> How did you configure GCC?  We need the exact command line you used.

> [...] --enable-threads=posix [...]

>> We also need to know the thread model as reported by ``gcc -v''.

[...]
> Thread model: posix
> gcc version 3.0.2

OK on both counts.  Now that we have established that you have a real
problem (be it in the library code or in your application code), I'm very
interested in understanding the failure.  In my view, much of the
incremental thread-support tweaking in the libstdc++ shipped with
GCC 2.95.X -> 3.0.X -> 3.1.X was done to stop users from shooting
themselves in the foot over our library design issues.

> The bug we're seeing could be the app's fault, too.

Just in time for the GCC 3.0.2 release, we improved the documentation of
the exact idioms that application code must follow in order to meet library
requirements (see http://gcc.gnu.org/onlinedocs/libstdc++/faq/index.html#5_6).

Can you confirm that your application code matches those expectations?
Based upon information you have posted, this is a possible issue IMHO.
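
To make the question concrete, here is a minimal sketch of the kind of
idiom I have in mind, assuming the relevant requirement is that the
application itself must serialize all access to a container object shared
between threads.  The names task_queue_sketch, push_task, task_count,
producer and run_producers are hypothetical and exist only for this
illustration; they are not taken from the FAQ or from any testsuite file.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <pthread.h>

// Hypothetical shared queue: one container plus the mutex that guards it.
struct task_queue_sketch {
    std::list<int> tasks;
    pthread_mutex_t lock;
    task_queue_sketch() { pthread_mutex_init(&lock, 0); }
    ~task_queue_sketch() { pthread_mutex_destroy(&lock); }
};

// Every write to the shared list happens with the mutex held.
void push_task(task_queue_sketch& q, int value) {
    pthread_mutex_lock(&q.lock);
    q.tasks.push_back(value);
    pthread_mutex_unlock(&q.lock);
}

// Reads of the shared list take the same mutex.
std::size_t task_count(task_queue_sketch& q) {
    pthread_mutex_lock(&q.lock);
    std::size_t n = q.tasks.size();
    pthread_mutex_unlock(&q.lock);
    return n;
}

// Thread entry point: pushes 1000 values onto the shared queue.
extern "C" void* producer(void* arg) {
    task_queue_sketch* q = static_cast<task_queue_sketch*>(arg);
    for (int i = 0; i < 1000; ++i) push_task(*q, i);
    return 0;
}

// Runs nthreads producers against one shared queue; returns the final size.
std::size_t run_producers(int nthreads) {
    assert(nthreads > 0 && nthreads <= 16);
    task_queue_sketch q;
    pthread_t tid[16];
    for (int i = 0; i < nthreads; ++i) {
        int rc = pthread_create(&tid[i], 0, &producer, &q);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; ++i) pthread_join(tid[i], 0);
    return task_count(q);
}
```

With the locks in place, run_producers(4) should always report 4000
elements.  Code that touches q.tasks from several threads without
holding q.lock is the sort of mismatch an audit would look for.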

Can you confirm that a non-generic implementation of atomicity.h is
installed for the target platform (in the build tree, look at
<target-triple>/libstdc++-v3/include/<target-triple>/bits/atomicity.h)?
Based upon information you have posted, this is unlikely to be the
root issue IMHO.
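
For context on why that header matters: the library relies on an atomic
add primitive for things such as reference counts.  The sketch below only
illustrates what such a primitive has to guarantee.  It uses std::atomic
from modern C++ as a stand-in and is not the code found in any
atomicity.h; exchange_and_add_sketch, add_job, add_worker and
count_with_threads are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>

// Stand-in for an atomic exchange-and-add: adds val to mem and returns
// the value mem held before the addition, as one indivisible step.
inline int exchange_and_add_sketch(std::atomic<int>& mem, int val) {
    return mem.fetch_add(val);
}

// Hypothetical work description handed to each thread.
struct add_job {
    std::atomic<int>* counter;
    int iterations;
};

// Thread entry point: bumps the shared counter once per iteration.
extern "C" void* add_worker(void* arg) {
    add_job* job = static_cast<add_job*>(arg);
    for (int i = 0; i < job->iterations; ++i)
        exchange_and_add_sketch(*job->counter, 1);
    return 0;
}

// Runs nthreads workers against one counter and returns its final value.
int count_with_threads(int nthreads, int iterations) {
    assert(nthreads > 0 && nthreads <= 16);
    std::atomic<int> counter(0);
    add_job job = { &counter, iterations };
    pthread_t tid[16];
    for (int i = 0; i < nthreads; ++i) {
        int rc = pthread_create(&tid[i], 0, &add_worker, &job);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; ++i) pthread_join(tid[i], 0);
    return counter.load();
}
```

If the add is truly atomic, no increment is ever lost, so the final count
equals nthreads * iterations.  A plain read-modify-write in its place
could silently drop updates under contention.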

For GCC 3.1.1, Phil removed additional macro-enabled paths in the
library and, more importantly, in its exposed headers.  These paths
created subtle failures, i.e. ones not detectable at link-time.  (This
cleanup started with the GCC 3.0 release.)  However, I am afraid that
some such paths will continue to exist.

> BTW, defining [_]_USE_MALLOC seems to make the problem go away, at least
> in initial tests.

> Anything obvious jump out at you? 

When you define __USE_MALLOC under GCC 3.0.X, your application maps
the internal memory needs of STL containers to malloc() instead of to a
high-speed pool allocator, which itself uses the mutex abstraction.
That is, defining __USE_MALLOC would tend to mask any threading-related
bug in our STL implementation (possible) or in the configuration for your
port (more likely IMHO, since you are running rarer targets/cross-targets).
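
To show why a pool allocator needs a mutex at all while a plain malloc()
mapping does not need one from the library, here is a toy free-list pool.
It is a deliberately simplified sketch under my own assumptions, not the
libstdc++ allocator; toy_pool and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <pthread.h>

// Toy fixed-size node pool (hypothetical, not libstdc++ code).
// Freed nodes go onto a shared free list instead of back to malloc(),
// so the list head is shared state that must be guarded by a mutex.
class toy_pool {
    struct node { node* next; };

    std::size_t node_size_;
    node* free_list_;
    std::size_t mallocs_;
    pthread_mutex_t lock_;

public:
    explicit toy_pool(std::size_t node_size)
        : node_size_(node_size < sizeof(node) ? sizeof(node) : node_size),
          free_list_(0), mallocs_(0) {
        pthread_mutex_init(&lock_, 0);
    }

    ~toy_pool() {
        // Return whatever is still cached on the free list to malloc().
        while (free_list_) {
            node* n = free_list_;
            free_list_ = n->next;
            std::free(n);
        }
        pthread_mutex_destroy(&lock_);
    }

    // Reuses a cached node if one exists, otherwise falls back to malloc().
    // May return a null pointer if malloc() fails.
    void* allocate() {
        pthread_mutex_lock(&lock_);
        node* n = free_list_;
        if (n) free_list_ = n->next;
        else ++mallocs_;
        pthread_mutex_unlock(&lock_);
        return n ? static_cast<void*>(n) : std::malloc(node_size_);
    }

    // Pushes the node onto the free list; it is NOT handed back to free().
    void deallocate(void* p) {
        if (!p) return;
        pthread_mutex_lock(&lock_);
        node* n = static_cast<node*>(p);
        n->next = free_list_;
        free_list_ = n;
        pthread_mutex_unlock(&lock_);
    }

    // Number of times the pool had to fall back to malloc().
    std::size_t malloc_calls() const { return mallocs_; }
};
```

If the mutex calls in a pool like this were compiled out or were no-ops
on some port, two threads could corrupt the free list.  Routing every
request straight to malloc() bypasses the pool and would hide that kind
of bug.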

Under the assumption that there is something wrong with this particular
configuration, you could try the following steps:

(1) Using the version of GCC you have built (I assume that the library
    itself was built without defining __USE_MALLOC and that you have
    not manually touched c++config, etc), run:

    $ g++ -g -pthread pthread1.cc

(2) Using any gdb that works with this GCC output executable, run:

    $ gdb -nw a.out

(3) Within gdb:

    (gdb) break pthread_mutex_lock
    (gdb) run

    Issue ``bt'' then ``continue'' repeatedly until you see a backtrace
    similar to this one (the exact names of library routines might have
    changed a bit; the key is to see a hit for line pthread1.cc:118):

#0  0x281448e1 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
#1  0x280ac0c6 in std::__default_alloc_template<true, 0>::allocate(unsigned) (
    __n=672177968)
    at /usr/users/rittle/tmp/gcc-build-latour-3.1-0516/i386-unknown-freebsd4.6/libstdc++-v3/include/i386-unknown-freebsd4.6/bits/gthr-default.h:484
#2  0x080494c1 in std::__simple_alloc<std::_List_node<int>, std::__default_alloc_template<true, 0> >::allocate(unsigned) (__n=1)
    at /usr/local/include/g++-v3/bits/stl_alloc.h:224
#3  0x08049456 in std::_List_alloc_base<int, std::allocator<int>, true>::_M_get_node() (this=0x8087030) at /usr/local/include/g++-v3/bits/stl_list.h:238
#4  0x080494fc in _List_base (this=0x8087030, __a=@0xbfbfe7f0)
    at /usr/local/include/g++-v3/bits/stl_list.h:262
#5  0x08049480 in list (this=0x8087030, __a=@0xbfbfe7f0)
    at /usr/local/include/g++-v3/bits/stl_list.h:361
#6  0x0804935d in task_queue (this=0x8087030) at pthread1.cc:49
#7  0x08048d4d in main (argc=1, argv=0xbfbfe934) at pthread1.cc:118
#8  0x08048abf in _start (arguments=0xbfbfeabc "/tmp/a.out")
    at /usr/users/rittle/outside-cvs-src/freebsd-src/lib/csu/i386-elf/crt1.c:96

If you never see that backtrace, then something is completely forked
in your POSIX thread environment.  It would be *very* helpful to the
GCC project if you could track down why that is so in your
environment.

If you do see that backtrace, then adapt these steps to examine your
own built application.

BTW, if any application or library code was compiled with __USE_MALLOC
defined, then all of it must be, or you might corrupt the malloc() heap.
Since defining __USE_MALLOC versus not defining it created no ABI or
naming change, it was impossible to detect this mismatch at application
link-time.  Some malloc() implementations will catch the corruption with
a controlled abort() or similar; some will not.

OTOH, defining __USE_MALLOC also changes the speed profile of some
library functions, and thus the timing.  All sorts of things could then
occur in a different order or interleaving within a threaded program...

> I've been going on the assumption that my next logical upgrade
> from gcc3.0.2 is gcc3.0.4, but haven't really checked whether gcc3.1 
> might be better (e.g. is it 'approved' yet for compiling
> the Linux kernel for ppc405, ppc750, and sh4?).

I can't answer your exact question but...

After ruling out an application code mismatch with library requirements
(yes, this would require an audit of all your application code) and
completing the above gdb exercise, testing later releases of GCC would
be in order.

It would be *very* helpful to the GCC project if you could report
whether GCC 3.1 fixes your problem.  If it fails there, then also
testing and reporting against GCC mainline would be the biggest help.

Regards,
Loren
