Bug 52839 - [4.7/4.8 Regression] double free or corruption running tr1/.../default_weaktoshared.exe
Summary: [4.7/4.8 Regression] double free or corruption running tr1/.../default_weakto...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: 4.7.1
Assignee: Jonathan Wakely
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-03 02:01 UTC by Alan Modra
Modified: 2012-11-02 02:41 UTC (History)
2 users (show)

See Also:
Host:
Target: powerpc-linux
Build:
Known to work:
Known to fail: 4.7.0
Last reconfirmed: 2012-04-03 00:00:00


Attachments
config patch (423 bytes, patch)
2012-04-05 04:00 UTC, Alan Modra
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alan Modra 2012-04-03 02:01:07 UTC
Occasionally seen with r185830 when running libstdc++ testsuite on power7.  Quickly reproducible using a few copies of
while LD_LIBRARY_PATH=necessary_paths ./default_weaktoshared.exe; do true done &

*** glibc detected *** ./default_weaktoshared.exe: double free or corruption (fasttop): 0x100222e8 ***
======= Backtrace: =========
/lib/power7/libc.so.6(+0x833b8)[0xfc333b8]
/lib/power7/libc.so.6(cfree+0x94)[0xfc38b24]
/home/amodra/build/gcc-virgin/powerpc-linux/./libstdc++-v3/src/.libs/libstdc++.so.6(_ZdlPv+0x2c)[0xff12b7c]
./default_weaktoshared.exe[0x10000eb8]
/lib/power7/libpthread.so.0(+0x749c)[0xfd6749c]
/lib/power7/libc.so.6(clone+0x84)[0xfca2160]
======= Memory map: ========
00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
0fbb0000-0fd40000 r-xp 00000000 fd:00 20234319                           /lib/power7/libc-2.11.1.so
0fd40000-0fd50000 rw-p 00180000 fd:00 20234319                           /lib/power7/libc-2.11.1.so
0fd60000-0fd80000 r-xp 00000000 fd:00 20234324                           /lib/power7/libpthread-2.11.1.so
0fd80000-0fd90000 rw-p 00010000 fd:00 20234324                           /lib/power7/libpthread-2.11.1.so
0fda0000-0fdc0000 r-xp 00000000 fd:00 24224298                           /home/amodra/build/gcc-virgin/gcc/libgcc_s.so.1
0fdc0000-0fdd0000 rw-p 00010000 fd:00 24224298                           /home/amodra/build/gcc-virgin/gcc/libgcc_s.so.1
0fde0000-0fe90000 r-xp 00000000 fd:00 20234322                           /lib/power7/libm-2.11.1.so
0fe90000-0fea0000 r--p 000a0000 fd:00 20234322                           /lib/power7/libm-2.11.1.so
0fea0000-0feb0000 rw-p 000b0000 fd:00 20234322                           /lib/power7/libm-2.11.1.so
0fec0000-0ffd0000 r-xp 00000000 fd:00 34160649                           /home/amodra/build/gcc-virgin/powerpc-linux/libstdc++-v3/src/.libs/libstdc++.so.6.0.17
0ffd0000-0ffe0000 r--p 00100000 fd:00 34160649                           /home/amodra/build/gcc-virgin/powerpc-linux/libstdc++-v3/src/.libs/libstdc++.so.6.0.17
0ffe0000-0fff0000 rw-p 00110000 fd:00 34160649                           /home/amodra/build/gcc-virgin/powerpc-linux/libstdc++-v3/src/.libs/libstdc++.so.6.0.17
10000000-10010000 r-xp 00000000 fd:00 14912951                           /home/amodra/build/gcc-virgin/powerpc-linux/libstdc++-v3/testsuite/default_weaktoshared.exe
10010000-10020000 rw-p 00000000 fd:00 14912951                           /home/amodra/build/gcc-virgin/powerpc-linux/libstdc++-v3/testsuite/default_weaktoshared.exe
10020000-10050000 rwxp 00000000 00:00 0                                  [heap]
f22e0000-f22f0000 ---p 00000000 00:00 0 
f22f0000-f2b40000 rw-p 00000000 00:00 0 
f2b40000-f2b50000 ---p 00000000 00:00 0 
f2b50000-f33a0000 rw-p 00000000 00:00 0 
f33a0000-f33b0000 ---p 00000000 00:00 0 
f33b0000-f3c00000 rw-p 00000000 00:00 0 
f3c00000-f3c30000 rw-p 00000000 00:00 0 
f3c30000-f3d00000 ---p 00000000 00:00 0 
f3d50000-f3d60000 ---p 00000000 00:00 0 
f3d60000-f45b0000 rw-p 00000000 00:00 0 
f45b0000-f45c0000 ---p 00000000 00:00 0 
f45c0000-f4e10000 rw-p 00000000 00:00 0 
f4e10000-f4e20000 ---p 00000000 00:00 0 
f4e20000-f5670000 rw-p 00000000 00:00 0 
f5670000-f5680000 ---p 00000000 00:00 0 
f5680000-f5ed0000 rw-p 00000000 00:00 0 
f5ed0000-f5ee0000 ---p 00000000 00:00 0 
f5ee0000-f6730000 rw-p 00000000 00:00 0 
f6730000-f6740000 ---p 00000000 00:00 0 
f6740000-f6f90000 rw-p 00000000 00:00 0 
f6f90000-f6fa0000 ---p 00000000 00:00 0 
f6fa0000-f77f0000 rw-p 00000000 00:00 0 
f77f0000-f7820000 r-xp 00000000 fd:00 7757916                            /lib/ld-2.11.1.so
f7820000-f7830000 rw-p 00020000 fd:00 7757916                            /lib/ld-2.11.1.so
ffd10000-ffe60000 rw-p 00000000 00:00 0                                  [stack]
/bin/bash: line 2:  4173 Aborted                 (core dumped) LD_LIBRARY_PATH=:/home/amodra/build/gcc-virgin/gcc:/home/amodra/build/gcc-virgin/powerpc-linux/./libstdc++-v3/../libgomp/.libs:/home/amodra/build/gcc-virgin/powerpc-linux/./libstdc++-v3/src/.libs:/home/amodra/build/gcc-virgin/gcc/64:/home/amodra/build/gcc-virgin/gcc/nof::/home/amodra/build/gcc-virgin/gcc:/home/amodra/build/gcc-virgin/powerpc-linux/./libstdc++-v3/../libgomp/.libs:/home/amodra/build/gcc-virgin/powerpc-linux/./libstdc++-v3/src/.libs:/home/amodra/build/gcc-virgin/gcc/64:/home/amodra/build/gcc-virgin/gcc/nof:/home/amodra/build/gcc-virgin/powerpc-linux/libstdc++-v3/src/.libs:/home/amodra/build/gcc-virgin/powerpc-linux/libmudflap/.libs:/home/amodra/build/gcc-virgin/powerpc-linux/libssp/.libs:/home/amodra/build/gcc-virgin/powerpc-linux/libgomp/.libs:/home/amodra/build/gcc-virgin/powerpc-linux/libitm/.libs:/home/amodra/build/gcc-virgin/./gcc:/home/amodra/build/gcc-virgin/./prev-gcc:/usr/lib64/mpi/gcc/openmpi/lib64 ./default_weaktoshared.exe

(gdb) bt
#0  0x0fbe839c in raise () from /lib/power7/libc.so.6
#1  0x0fbea034 in abort () from /lib/power7/libc.so.6
#2  0x0fc2b674 in __libc_message () from /lib/power7/libc.so.6
#3  0x0fc333b8 in malloc_printerr () from /lib/power7/libc.so.6
#4  0x0fc38b24 in free () from /lib/power7/libc.so.6
#5  0x0ff12b7c in operator delete (ptr=<optimized out>) at /home/amodra/src/gcc-virgin/libstdc++-v3/libsupc++/del_op.cc:48
#6  0x10000eb8 in _M_release (this=0x100222f8) at /home/amodra/build/gcc-virgin/powerpc-linux/libstdc++-v3/include/tr1/shared_ptr.h:148
#7  ~__shared_count (this=<synthetic pointer>, __in_chrg=<optimized out>) at /home/amodra/build/gcc-virgin/powerpc-linux/libstdc++-v3/include/tr1/shared_ptr.h:348
#8  ~__shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at /home/amodra/build/gcc-virgin/powerpc-linux/libstdc++-v3/include/tr1/shared_ptr.h:548
#9  ~shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at /home/amodra/build/gcc-virgin/powerpc-linux/libstdc++-v3/include/tr1/shared_ptr.h:992
#10 thread_hammer (opaque_weak=0x1002dae4) at /home/amodra/src/gcc-virgin/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/thread/default_weaktoshared.cc:132
#11 0x0fd6749c in start_thread () from /lib/power7/libpthread.so.0
#12 0x0fca2160 in clone () from /lib/power7/libc.so.6
Comment 1 Jonathan Wakely 2012-04-03 08:30:34 UTC
looking into it...
Comment 2 Jonathan Wakely 2012-04-03 09:19:01 UTC
I'll have to wait until I've built trunk on power to debug it, but this should only be possible if two threads can both decrement an atomic counter to zero, which might imply a race in the new __atomic ops for power.
Comment 3 Jonathan Wakely 2012-04-03 21:33:19 UTC
I haven't managed to reproduce a double-free on gcc110 in the compile farm, but did get this:


#0  0x00000080ef73a5fc in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00000080ef73c2fc in __GI_abort () at abort.c:91
#2  0x00000fff954ce724 in uw_init_context_1 (context=0xfff94cbdee0, outer_cfa=0xfff94cbe590, outer_ra=0xfff955c4240)
    at /home/jwakely/src/gcc/libgcc/unwind-dw2.c:1502
#3  0x00000fff954cee80 in _Unwind_RaiseException (exc=0x1002b397a80) at /home/jwakely/src/gcc/libgcc/unwind.inc:88
#4  0x00000fff955c4240 in __cxxabiv1::__cxa_throw (obj=0x1002b397aa0, tinfo=0x10013e00, dest=
    @0x10014260: 0x100029f0 <std::tr1::bad_weak_ptr::~bad_weak_ptr()>) at /home/jwakely/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:78
#5  0x0000000010002c18 in std::tr1::__throw_bad_weak_ptr ()
    at /home/jwakely/gcc/4.x/lib/gcc/powerpc64-unknown-linux-gnu/4.8.0/../../../../include/c++/4.8.0/tr1/shared_ptr.h:76
#6  0x000000001000127c in _M_add_ref_lock (this=0x1002b383ec0)
    at /home/jwakely/gcc/4.x/lib/gcc/powerpc64-unknown-linux-gnu/4.8.0/../../../../include/c++/4.8.0/tr1/shared_ptr.h:244
#7  __shared_count (__r=..., this=<synthetic pointer>)
    at /home/jwakely/gcc/4.x/lib/gcc/powerpc64-unknown-linux-gnu/4.8.0/../../../../include/c++/4.8.0/tr1/shared_ptr.h:494
#8  __shared_ptr<A> (__r=..., this=<synthetic pointer>)
    at /home/jwakely/gcc/4.x/lib/gcc/powerpc64-unknown-linux-gnu/4.8.0/../../../../include/c++/4.8.0/tr1/shared_ptr.h:586
#9  shared_ptr<A> (__r=std::tr1::weak_ptr (expired, weak 10) 0x1002b383ea0, this=<synthetic pointer>)
    at /home/jwakely/gcc/4.x/lib/gcc/powerpc64-unknown-linux-gnu/4.8.0/../../../../include/c++/4.8.0/tr1/shared_ptr.h:1015
#10 thread_hammer (opaque_weak=0x1002b39b5c8) at default_weaktoshared.cc:132
#11 0x00000080ef8ec9f0 in start_thread (arg=0xfff94cbf1e0) at pthread_create.c:311
#12 0x00000080ef7f0774 in .__clone () from /lib64/libc-2.14.90.so
Comment 4 Jonathan Wakely 2012-04-04 00:19:29 UTC
I can reliably reproduce that uncaught exception, which should be caught by the handler on line 134, but not a double-free
Comment 5 Alan Modra 2012-04-04 01:11:47 UTC
Interesting.  gcc revision?  I've held back on svn update after hearing that richi had broken the tree for powerpc.  The uncaught exception might be an eh_frame problem like that in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52828, which has a patch.
Comment 6 Jonathan Wakely 2012-04-04 08:31:02 UTC
Revision: 186100
Comment 7 Alan Modra 2012-04-04 09:57:51 UTC
I also see the same 64-bit failure on r186130.  A lot harder to reproduce than the 32-bit one I originally reported (which is still there on r186130). Likely not a problem with .eh_frame info since the fail only happens rarely.
Comment 8 Jonathan Wakely 2012-04-04 10:39:07 UTC
Doh, I completely failed to notice yours is powerpc not powerpc64 so I wasn't testing 32-bit, sorry.  I'll re-check when I get home this evening.
Comment 9 Alan Modra 2012-04-04 11:12:56 UTC
Heh.  We're even.  I didn't notice yours was a 64-bit failure until you told me your gcc revision number.
Comment 10 Alan Modra 2012-04-04 14:20:57 UTC
I caught the 64-bit failure in the act.  It's dying on the gcc_assert in unwind-dw2.c:_Unwind_SetSpColumn, with the value read from dwarf_reg_size_table[1] being zero.  The implication here is that __gthread_once, ie. pthread_once, does not provide proper locking for the power7 weakly ordered memory model.  It's too late here tonight to go delving into libpthread to prove my case, but it certainly seems like the 64-bit failure is not a gcc problem.
Comment 11 Jonathan Wakely 2012-04-04 20:22:57 UTC
I can reproduce this with -m32 on gcc110

If I compile with -D_GLIBCXX_ATOMIC_BUILTINS then I no longer get the double-free, after running in a loop for several minutes - can you confirm that?

That implies the bug is in the non-atomic builtin fallback code in config/cpu/generic/atomicity_mutex/atomicity.h which defines __exchange_and_add like so:

  _Atomic_word
  __attribute__ ((__unused__))
  __exchange_and_add(volatile _Atomic_word* __mem, int __val) throw ()
  {
    __gnu_cxx::__scoped_lock sentry(get_atomic_mutex());
    _Atomic_word __result;
    __result = *__mem;
    *__mem += __val;
    return __result;
  }

This basically does a pthread_mutex_lock on a function-local static pthread_mutex_t.  Could this also be a problem in the underlying pthread lib?

I was surprised that _GLIBCXX_ATOMIC_BUILTINS is not automatically defined, but indeed 32/libstdc++-v3/config.log shows:

configure:15247: checking for atomic builtins for long long
configure:15276: /home/jwakely/build/./gcc/xgcc -shared-libgcc -B/home/jwakely/build/./gcc -nostdinc++ -L/home/jwakely/build/powerpc64-unknown-linux-gnu/32/libstdc++-v3/src -L/home/jwakely/build/powerpc64-unknown-linux-gnu/32/libstdc++-v3/src/.libs -B/home/jwakely/gcc/4.x/powerpc64-unknown-linux-gnu/bin/ -B/home/jwakely/gcc/4.x/powerpc64-unknown-linux-gnu/lib/ -isystem /home/jwakely/gcc/4.x/powerpc64-unknown-linux-gnu/include -isystem /home/jwakely/gcc/4.x/powerpc64-unknown-linux-gnu/sys-include  -m32 -fPIC -mstrict-align -o conftest -g -O2 -D_GNU_SOURCE -fno-exceptions   conftest.cpp  >&5
/tmp/cck5YUAJ.o: In function `main':
/home/jwakely/build/powerpc64-unknown-linux-gnu/32/libstdc++-v3/conftest.cpp:30: undefined reference to `__atomic_fetch_add_8'
/home/jwakely/build/powerpc64-unknown-linux-gnu/32/libstdc++-v3/conftest.cpp:32: undefined reference to `__atomic_compare_exchange_8'

It seems unfortunate that we can't use __exchange_and_add on a 4-byte int just because the platform doesn't provide atomic ops for 64-bit types.
Comment 12 Alan Modra 2012-04-04 22:27:43 UTC
glibc/ntpl/pthread_once.c:

static int once_lock = LLL_LOCK_INITIALIZER;

int
__pthread_once (once_control, init_routine)
     pthread_once_t *once_control;
     void (*init_routine) (void);
{
  /* XXX Depending on whether the LOCK_IN_ONCE_T is defined use a
     global lock variable or one which is part of the pthread_once_t
     object.  */
  if (*once_control == PTHREAD_ONCE_INIT)
    {
      lll_lock (once_lock, LLL_PRIVATE);

      /* XXX This implementation is not complete.  It doesn't take
	 cancelation and fork into account.  */
      if (*once_control == PTHREAD_ONCE_INIT)
	{
	  init_routine ();

	  *once_control = !PTHREAD_ONCE_INIT;
	}

      lll_unlock (once_lock, LLL_PRIVATE);
    }

  return 0;
}
strong_alias (__pthread_once, pthread_once)

This is quite broken for weakly ordered memory model.  Notice how a thread that sees *once_control != PTHREAD_ONCE_INIT has no barriers at all.  That means the read of *once_control has no ordering with respect to what happened on the thread that ran init_routine.
Comment 13 Alan Modra 2012-04-04 23:02:34 UTC
Huh, so glibc has a powerpc specific pthread_once, and that one has a different bug.  Lack of lwsync before atomic_increment (once_control);
Comment 14 Alan Modra 2012-04-05 04:00:07 UTC
Created attachment 27094 [details]
config patch

Yes, the 32-bit failure seems to be gone if we use the gcc builtin atomics.  Multiple copies of the testcase have been running now for an hour and half, and no testsuite regressions with the attached patch.
Comment 15 Alan Modra 2012-04-05 08:06:30 UTC
Many hours later one of my 32-bit tests failed, but I'm relieved to say it was only the pthread_once bug.

#0  0x0fbd839c in raise () from /lib/power7/libc.so.6
#1  0x0fbda034 in abort () from /lib/power7/libc.so.6
#2  0x0fd9f480 in uw_init_context_1 (context=context@entry=0xf2dce8f8, outer_cfa=outer_cfa@entry=0xf2dcecf0, outer_ra=0xff20f98) at /home/amodra/src/gcc-current/libgcc/unwind-dw2.c:1502
#3  0x0fd9fb7c in _Unwind_RaiseException (exc=0x1002c360) at /home/amodra/src/gcc-current/libgcc/unwind.inc:88
#4  0x0ff20f98 in __cxxabiv1::__cxa_throw (obj=0x1002c380, tinfo=<optimized out>, dest=<optimized out>) at /home/amodra/src/gcc-current/libstdc++-v3/libsupc++/eh_throw.cc:78
#5  0x10000d24 in std::tr1::__throw_bad_weak_ptr () at /home/amodra/build/gcc-current/powerpc-linux/libstdc++-v3/include/tr1/shared_ptr.h:76
#6  0x10000e9c in _M_add_ref_lock (this=0x10021f60) at /home/amodra/build/gcc-current/powerpc-linux/libstdc++-v3/include/tr1/shared_ptr.h:244
#7  __shared_count (__r=..., this=<synthetic pointer>) at /home/amodra/build/gcc-current/powerpc-linux/libstdc++-v3/include/tr1/shared_ptr.h:494
#8  __shared_ptr<A> (__r=..., this=<synthetic pointer>) at /home/amodra/build/gcc-current/powerpc-linux/libstdc++-v3/include/tr1/shared_ptr.h:586
#9  shared_ptr<A> (__r=..., this=<synthetic pointer>) at /home/amodra/build/gcc-current/powerpc-linux/libstdc++-v3/include/tr1/shared_ptr.h:1015
#10 thread_hammer (opaque_weak=0x1002db44) at /home/amodra/src/gcc-current/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/thread/default_weaktoshared.cc:132
#11 0x0fd5749c in start_thread () from /lib/power7/libpthread.so.0
#12 0x0fc92160 in clone () from /lib/power7/libc.so.6
Comment 16 Jonathan Wakely 2012-04-05 08:09:53 UTC
I think we want a macro saying atomics are available for 'int' (which libstdc++ needs for its own uses) and a separate macro saying atomics are available for everything (which is needed for full C++11 <atomic> support). I've raised it on the libstdc++ list.
Comment 17 Alan Modra 2012-04-05 12:05:01 UTC
I spent quite a bit of time today looking at libpthread and can't spot a problem in pthread_mutex_lock and pthread_mutex_unlock.

I wonder if the problem is that libstdc++ is using both atomics and pthread_mutex protected manipulation of the same variable?  That of course would fail.  Looking at libstdc++ with objdump, I do see quite a few occurrences of lwarx even though _GLIBCXX_ATOMIC_BUILTINS is undefined.  These are the functions:

atomic_flag_test_and_set_explicit
_ZL33__gxx_dependent_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception
_ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
_ZNSt15__exception_ptr13exception_ptr10_M_releaseEv
_ZSt17rethrow_exceptionNSt15__exception_ptr13exception_ptrE
_ZL22free_any_cxa_exceptionP17_Unwind_Exception
_ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception
__cxa_guard_acquire
__cxa_guard_abort
__cxa_guard_release
Comment 18 Jonathan Wakely 2012-04-05 13:02:53 UTC
(In reply to comment #17)
> I wonder if the problem is that libstdc++ is using both atomics and
> pthread_mutex protected manipulation of the same variable?  That of course
> would fail.

I'm pretty sure that can't happen. We always access those variables consistently through a single API, which would always use a mutex or never use a mutex.

>  Looking at libstdc++ with objdump, I do see quite a few
> occurrences of lwarx even though _GLIBCXX_ATOMIC_BUILTINS is undefined.  These
> are the functions:
> 
> atomic_flag_test_and_set_explicit
> _ZL33__gxx_dependent_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception
> _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
> _ZNSt15__exception_ptr13exception_ptr10_M_releaseEv
> _ZSt17rethrow_exceptionNSt15__exception_ptr13exception_ptrE
> _ZL22free_any_cxa_exceptionP17_Unwind_Exception
> _ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception
> __cxa_guard_acquire
> __cxa_guard_abort
> __cxa_guard_release

That looks correct, none of those depends on _GLIBCXX_ATOMIC_BUILTINS

Could the problem be the initialization of the function-local static, rather than in locking it later. That initialization should be thread-safe, that's what __cxa_guard_acquire and __cxa_guard_release are used for: http://sourcery.mentor.com/public/cxx-abi/abi.html#once-ctor
Comment 19 Alan Modra 2012-04-10 15:13:24 UTC
I think I was on the right track when I questioned whether the problem might be mixing atomics and mutex protected ops, but was looking in the wrong place.  I should have looked at default_weaktoshared.o rather than libstdc++.so.

objdump -drS default_weaktoshared.o shows
[snip]
void* thread_hammer(void* opaque_weak)
{
   0:   94 21 ff d0     stwu    r1,-48(r1)
  static inline _Atomic_word
  __attribute__ ((__unused__))
  __exchange_and_add_dispatch(_Atomic_word* __mem, int __val)
  {
#ifdef __GTHREADS
    if (__gthread_active_p())
   4:   3d 20 00 00     lis     r9,0
                        6: R_PPC_ADDR16_HA      pthread_cancel
   8:   7d 80 00 26     mfcr    r12
   c:   7c 08 02 a6     mflr    r0
  10:   39 29 00 00     addi    r9,r9,0
                        12: R_PPC_ADDR16_LO     pthread_cancel
  14:   2e 09 00 00     cmpwi   cr4,r9,0
  18:   93 c1 00 28     stw     r30,40(r1)
      typedef typename __traits_type::pointer           pointer;

      _GLIBCXX_CONSTEXPR __normal_iterator() : _M_current(_Iterator()) { }

      explicit
      __normal_iterator(const _Iterator& __i) : _M_current(__i) { }
  1c:   3f c0 00 01     lis     r30,1
  20:   90 01 00 34     stw     r0,52(r1)
  24:   93 81 00 20     stw     r28,32(r1)
  28:   93 a1 00 24     stw     r29,36(r1)
  2c:   93 e1 00 2c     stw     r31,44(r1)
  30:   91 81 00 1c     stw     r12,28(r1)
  34:   7c 7c 1b 78     mr      r28,r3
  38:   63 de 86 a0     ori     r30,r30,34464
  3c:   83 a3 00 00     lwz     r29,0(r3)
  40:   48 00 00 18     b       58 <_Z13thread_hammerPv+0x58>
  44:   60 00 00 00     nop
  48:   60 00 00 00     nop
  4c:   60 00 00 00     nop
    0x9908b0dful, 11, 7,
    0x9d2c5680ul, 15,
    0xefc60000ul, 18> rng;
  wp_vector_t::iterator cur_weak = weak_pool.begin();

  for (unsigned int i = 0; i < HAMMER_REPEAT; ++i)
  50:   37 de ff ff     addic.  r30,r30,-1
  54:   41 82 00 ac     beq-    100 <_Z13thread_hammerPv+0x100>
  // now that __weak_count is defined we can define this constructor:
  template<_Lock_policy _Lp>
    inline
    __shared_count<_Lp>::
    __shared_count(const __weak_count<_Lp>& __r)
    : _M_pi(__r._M_pi)
  58:   83 fd 00 04     lwz     r31,4(r29)
    {
      if (_M_pi != 0)
  5c:   2f 9f 00 00     cmpwi   cr7,r31,0
  60:   41 9e 00 cc     beq-    cr7,12c <_Z13thread_hammerPv+0x12c>
    inline void
    _Sp_counted_base<_S_atomic>::
    _M_add_ref_lock()
    {
      // Perform lock-free add-if-not-zero operation.
      _Atomic_word __count = _M_use_count;
  64:   81 3f 00 04     lwz     r9,4(r31)
  68:   91 21 00 08     stw     r9,8(r1)
      do
        {
          if (__count == 0)
  6c:   2f 89 00 00     cmpwi   cr7,r9,0
  70:   41 9e 00 d8     beq-    cr7,148 <_Z13thread_hammerPv+0x148>
          // Replace the current counter value with the old value + 1, as
          // long as it's not changed meanwhile.
        }
      while (!__atomic_compare_exchange_n(&_M_use_count, &__count, __count + 1,
                                          true, __ATOMIC_ACQ_REL,
                                          __ATOMIC_RELAXED));
  74:   81 01 00 08     lwz     r8,8(r1)
  78:   39 29 00 01     addi    r9,r9,1
  7c:   38 7f 00 04     addi    r3,r31,4
  80:   7c 20 04 ac     lwsync
  84:   7d 40 18 28     lwarx   r10,0,r3
  88:   7c 0a 40 00     cmpw    r10,r8
  8c:   40 82 00 0c     bne-    98 <_Z13thread_hammerPv+0x98>
  90:   7d 20 19 2d     stwcx.  r9,0,r3
  94:   4c 00 01 2c     isync
  98:   91 41 00 08     stw     r10,8(r1)
    _Sp_counted_base<_S_atomic>::
    _M_add_ref_lock()
    {
      // Perform lock-free add-if-not-zero operation.
      _Atomic_word __count = _M_use_count;
      do
  9c:   40 82 00 a4     bne-    140 <_Z13thread_hammerPv+0x140>
  a0:   41 92 00 ac     beq-    cr4,14c <_Z13thread_hammerPv+0x14c>
      return __exchange_and_add(__mem, __val);
  a4:   38 80 ff ff     li      r4,-1
  a8:   48 00 00 01     bl      a8 <_Z13thread_hammerPv+0xa8>
                        a8: R_PPC_REL24 _ZN9__gnu_cxx18__exchange_and_addEPVii
      void
      _M_release() // nothrow
      {
        // Be race-detector-friendly.  For more info see bits/c++config.
        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
        if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
  ac:   2f 83 00 01     cmpwi   cr7,r3,1
  b0:   40 9e ff a0     bne+    cr7,50 <_Z13thread_hammerPv+0x50>
          {
            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
            _M_dispose();
  b4:   81 3f 00 00     lwz     r9,0(r31)
  b8:   7f e3 fb 78     mr      r3,r31
  bc:   81 29 00 08     lwz     r9,8(r9)
  c0:   7d 29 03 a6     mtctr   r9
  c4:   4e 80 04 21     bctrl
                _GLIBCXX_WRITE_MEM_BARRIER;

That certainly looks like _M_use_count is fiddled with both by atomics and __exchange_and_add with pthread mutex.
Comment 20 Jonathan Wakely 2012-04-10 15:36:40 UTC
You're quite right, my apologies for telling you that wouldn't happen.

In bits/shared_ptr_base.h we have:

  template<> 
    inline void
    _Sp_counted_base<_S_atomic>::
    _M_add_ref_lock()
    {
      // Perform lock-free add-if-not-zero operation.
      _Atomic_word __count = _M_use_count;
      do
	{
	  if (__count == 0)
	    __throw_bad_weak_ptr();
	  // Replace the current counter value with the old value + 1, as
	  // long as it's not changed meanwhile. 
	}
      while (!__atomic_compare_exchange_n(&_M_use_count, &__count, __count + 1,
					  true, __ATOMIC_ACQ_REL, 
					  __ATOMIC_RELAXED));
    }


This will always be atomic if __default_lock_policy is _S_atomic

The problem is in ext/concurrence.h:

  // Compile time constant that indicates prefered locking policy in
  // the current configuration.
  static const _Lock_policy __default_lock_policy = 
#ifdef __GTHREADS
#if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2) \
     && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4))
  _S_atomic;
#else
  _S_mutex;
#endif
#else
  _S_single;
#endif

This is inconsistent with the definition of _GLIBCXX_ATOMIC_BUILTINS and means that on power shared_ptr thinks it can use atomics, but the "dispatch" functions in ext/atomicity.h don't use atomics.


__default_lock_policy should not say to use atomics if the dispatch functions don't use atomics.  

A quick fix would be to only set __default_lock_policy=_S_atomic if _GLIBCXX_ATOMIC_BUILTINS is also defined
Comment 21 Jonathan Wakely 2012-04-10 15:58:35 UTC
In brief:

in ext/concurrence.h we set __default_lock_policy to _S_atomic if we have 2-byte and 4-byte CAS, which is true for powerpc.

but in ext/atomicity.h we fallback to a mutex unless we have 2-byte, 4-byte *and* 8-byte CAS, and there is no 8-byte CAS on powerpc.


The fix isn't so brief:

Ideally the ext/atomicity.h code should not require 8-byte CAS.  If we had a _GLIBCXX_ATOMIC_BUILTINS_WORD that indicated atomics work for 'int' then we could use that in both ext/atomicity.h and ext/concurrence.h and then they'd be consistent, ensuring we don't mix atomics with mutexes ... *but* then powerpc would stop using a mutex to implement refcounts in libstdc++ and because those functions are inlined that would prevent mixing code built with old and new libstdc++ which is not acceptable.

So I think the right fix is to set __default_lock_policy = _S_mutex when we don't have 8-byte CAS, i.e. make it also depend on _GLIBCXX_ATOMIC_BUILTINS.

We could potentially add a --enable-libstdcxx-atomic-word switch that would change the ABI so powerpc could use atomic builtins for refcounts, then make that the default when we change the libstdc++ SONAME.
Comment 22 Alan Modra 2012-04-10 23:53:16 UTC
Regarding the ABI change, would I be correct to say that the ABI was broken for powerpc by
2012-02-10  Benjamin Kosnik  <bkoz@redhat.com>
            Jonathan Wakely  <jwakely.gcc@gmail.com>
since that particular patch introduced the dependency on having long long atomics?  If I understand correctly, before that change code would use sync builtins on powerpc throughout.  So code compiled before that date won't behave correctly using a current libstdc++.so, which now uses mutexes.

I'm not trying to apportion blame, just pointing out that making libstdc++ use atomics on powerpc *fixes* an ABI breakage for the great majority of cases.
Comment 23 Jonathan Wakely 2012-04-11 00:30:46 UTC
I hadn't got as far as tracking down when it changed - if you're right that would be quite nice and we'll be able to fix it properly.
Comment 24 Jonathan Wakely 2012-04-11 00:58:17 UTC
Benjamin, it seems that the patch for PR 51798 caused this regression and ABI-incompatible change on 32-bit power, so that the functions in ext/atomicity.h now use a mutex where they previously used atomic builtins.  What's the reason for requiring 8-byte compare-and-swap for _GLIBCXX_ATOMIC_BUILTINS to be defined?
Comment 25 torvald 2012-04-12 08:24:16 UTC
If we make an ABI switch at some point, should we move over to relying just on atomics and the libatomic fallbacks (assuming/hoping libatomic exists by then)?

Also, refcounting in basic_string has bugs too.  Do you see other areas of libstdc++ that could use a review of the synchronization bits?
Comment 26 Jonathan Wakely 2012-04-12 08:30:17 UTC
(In reply to comment #25)
> If we make an ABI switch at some point, should we move over to relying just on
> atomics and the libatomic fallbacks (assuming/hoping libatomic exists by then)?

Yes, unfortunately neither an ABI switch nor libatomic is not going to happen in the short term, the priority right now is to fix this regression in 4.7.0

I hope to get time in the next few days to partially revert the acinclude.m4 changes that cause _GLIBCXX_ATOMIC_BUILTINS to rely on 8-byte CAS, but I'd like to hear from Benjamin first.

> Also, refcounting in basic_string has bugs too.  Do you see other areas of
> libstdc++ that could use a review of the synchronization bits?

std::locale also uses atomics for refcounting.
Comment 27 Alan Modra 2012-04-12 09:18:38 UTC
I took a look at libstdc++.so use of exchange_and_add on powerpc 32-bit.  There seems to be a lot of places where it appears, even before Benjamin's change, ie. r184109.

Uses of exchange_and_add common to both r184109 and r186130, the latter patched with http://gcc.gnu.org/bugzilla/attachment.cgi?id=27094 to get _GLIBCXX_ATOMIC_BUILTINS enabled on powerpc.

_ZNKSt15basic_stringbufIcSt11char_traitsIcESaIcEE3strEv
_ZNKSt15basic_stringbufIwSt11char_traitsIwESaIwEE3strEv
_ZNKSt18basic_stringstreamIcSt11char_traitsIcESaIcEE3strEv
_ZNKSt18basic_stringstreamIwSt11char_traitsIwESaIwEE3strEv
_ZNKSt19basic_istringstreamIcSt11char_traitsIcESaIcEE3strEv
_ZNKSt19basic_istringstreamIwSt11char_traitsIwESaIwEE3strEv
_ZNKSt19basic_ostringstreamIcSt11char_traitsIcESaIcEE3strEv
_ZNKSt19basic_ostringstreamIwSt11char_traitsIwESaIwEE3strEv
_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_
_ZNSt15basic_stringbufIcSt11char_traitsIcESaIcEE8overflowEi
_ZNSt15basic_stringbufIcSt11char_traitsIcESaIcEED0Ev
_ZNSt15basic_stringbufIcSt11char_traitsIcESaIcEED1Ev
_ZNSt15basic_stringbufIwSt11char_traitsIwESaIwEE8overflowEj
_ZNSt15basic_stringbufIwSt11char_traitsIwESaIwEED0Ev
_ZNSt15basic_stringbufIwSt11char_traitsIwESaIwEED1Ev
_ZNSt18basic_stringstreamIcSt11char_traitsIcESaIcEED0Ev
_ZNSt18basic_stringstreamIcSt11char_traitsIcESaIcEED1Ev
_ZNSt18basic_stringstreamIcSt11char_traitsIcESaIcEED2Ev
_ZNSt18basic_stringstreamIwSt11char_traitsIwESaIwEED0Ev
_ZNSt18basic_stringstreamIwSt11char_traitsIwESaIwEED1Ev
_ZNSt18basic_stringstreamIwSt11char_traitsIwESaIwEED2Ev
_ZNSt19basic_istringstreamIcSt11char_traitsIcESaIcEED0Ev
_ZNSt19basic_istringstreamIcSt11char_traitsIcESaIcEED1Ev
_ZNSt19basic_istringstreamIcSt11char_traitsIcESaIcEED2Ev
_ZNSt19basic_istringstreamIwSt11char_traitsIwESaIwEED0Ev
_ZNSt19basic_istringstreamIwSt11char_traitsIwESaIwEED1Ev
_ZNSt19basic_istringstreamIwSt11char_traitsIwESaIwEED2Ev
_ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEED0Ev
_ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEED1Ev
_ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEED2Ev
_ZNSt19basic_ostringstreamIwSt11char_traitsIwESaIwEED0Ev
_ZNSt19basic_ostringstreamIwSt11char_traitsIwESaIwEED1Ev
_ZNSt19basic_ostringstreamIwSt11char_traitsIwESaIwEED2Ev
_ZStlsIdcSt11char_traitsIcEERSt13basic_ostreamIT0_T1_ES6_RKSt7complexIT_E
_ZStlsIdwSt11char_traitsIwEERSt13basic_ostreamIT0_T1_ES6_RKSt7complexIT_E
_ZStlsIfcSt11char_traitsIcEERSt13basic_ostreamIT0_T1_ES6_RKSt7complexIT_E
_ZStlsIfwSt11char_traitsIwEERSt13basic_ostreamIT0_T1_ES6_RKSt7complexIT_E
_ZStlsIgcSt11char_traitsIcEERSt13basic_ostreamIT0_T1_ES6_RKSt7complexIT_E
_ZStlsIgwSt11char_traitsIwEERSt13basic_ostreamIT0_T1_ES6_RKSt7complexIT_E

Use of exchange_and_add only in r184109
r186130 with patch appears to neither use exchange_and_add or atomics in these functions.  (I say appears because I only looked for direct calls.)
_ZNKSt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEE16_M_extract_floatES3_S3_RSt8ios_baseRSt12_Ios_IostateRSs
_ZNKSt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEE6do_getES3_S3_RSt8ios_baseRSt12_Ios_IostateRd
_ZNKSt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEE6do_getES3_S3_RSt8ios_baseRSt12_Ios_IostateRe
_ZNKSt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEE6do_getES3_S3_RSt8ios_baseRSt12_Ios_IostateRf
_ZNKSt7num_getIwSt19istreambuf_iteratorIwSt11char_traitsIwEEE16_M_extract_floatES3_S3_RSt8ios_baseRSt12_Ios_IostateRSs
_ZNKSt7num_getIwSt19istreambuf_iteratorIwSt11char_traitsIwEEE6do_getES3_S3_RSt8ios_baseRSt12_Ios_IostateRd
_ZNKSt7num_getIwSt19istreambuf_iteratorIwSt11char_traitsIwEEE6do_getES3_S3_RSt8ios_baseRSt12_Ios_IostateRe
_ZNKSt7num_getIwSt19istreambuf_iteratorIwSt11char_traitsIwEEE6do_getES3_S3_RSt8ios_baseRSt12_Ios_IostateRf
_ZNKSt9money_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEE6do_getES3_S3_bRSt8ios_baseRSt12_Ios_IostateRSs
_ZNKSt9money_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEE6do_getES3_S3_bRSt8ios_baseRSt12_Ios_IostateRe
_ZNKSt9money_getIwSt19istreambuf_iteratorIwSt11char_traitsIwEEE6do_getES3_S3_bRSt8ios_baseRSt12_Ios_IostateRSbIwS2_SaIwEE
_ZNKSt9money_getIwSt19istreambuf_iteratorIwSt11char_traitsIwEEE6do_getES3_S3_bRSt8ios_baseRSt12_Ios_IostateRe
_ZNKSt9money_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE6do_putES3_bRSt8ios_basece
_ZNKSt9money_putIwSt19ostreambuf_iteratorIwSt11char_traitsIwEEE6do_putES3_bRSt8ios_basewe
_ZNSt16__numpunct_cacheIcE8_M_cacheERKSt6locale
_ZNSt16__numpunct_cacheIwE8_M_cacheERKSt6locale
_ZNSt18__moneypunct_cacheIwLb0EE8_M_cacheERKSt6locale
_ZNSt18__moneypunct_cacheIwLb1EE8_M_cacheERKSt6locale
Comment 28 Jonathan Wakely 2012-04-12 10:06:38 UTC
all those uses in streams and facets are (I believe) from std::locale
Comment 29 Jonathan Wakely 2012-04-12 10:09:10 UTC
pinging Benjamin (at an address that doesn't exclude bugzilla CC mails) ;)
Comment 30 Jonathan Wakely 2012-04-13 20:51:57 UTC
Alan, please send the attached patch to the libstdc++ and gcc-patches lists and
check it in to trunk (with the regenerated 'configure') with my approval.  It
should go on the 4.7 branch too, but maybe wait a day or two for any problems
to show up on trunk.
Comment 31 Alan Modra 2012-04-14 13:24:48 UTC
Author: amodra
Date: Sat Apr 14 13:24:43 2012
New Revision: 186453

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186453
Log:
	PR libstdc++/52839
	* acinclude.m4 (_GLIBCXX_ATOMIC_BUILTINS): Do not depend on
	glibcxx_cv_atomic_long_long.
	* configure: Regenerate.


Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/acinclude.m4
    trunk/libstdc++-v3/configure
Comment 32 Alan Modra 2012-04-21 13:27:51 UTC
Author: amodra
Date: Sat Apr 21 13:27:44 2012
New Revision: 186650

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186650
Log:
	PR libstdc++/52839
	* acinclude.m4 (_GLIBCXX_ATOMIC_BUILTINS): Do not depend on
	glibcxx_cv_atomic_long_long.
	* configure: Regenerate.


Modified:
    branches/gcc-4_7-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_7-branch/libstdc++-v3/acinclude.m4
Comment 33 Alan Modra 2012-04-21 13:28:59 UTC
Author: amodra
Date: Sat Apr 21 13:28:53 2012
New Revision: 186651

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186651
Log:
	PR libstdc++/52839
missed from last delta

Modified:
    branches/gcc-4_7-branch/libstdc++-v3/configure
Comment 34 Alan Modra 2012-04-21 13:30:38 UTC
fixed
Comment 35 Andrew Pinski 2012-11-02 01:35:35 UTC
I get the same "randomish" failure on MIPS64 (the Octeon 1 with 16 cores) with GCC 4.7.  The patch listed below will not help me at all as the code is already using the __atomic_* functions.  Was there another change to the libstdc++ code which fixes similar issue too?
Comment 36 Alan Modra 2012-11-02 02:13:20 UTC
The change I mention in #c22
 http://gcc.gnu.org/viewcvs?view=revision&revision=184110
tests for atomic ops on all of bool, short, int and long long, where the previous test was for *either* atomic bool or atomic short.  My fix for powerpc removed the long long test.  Does mips lack atomic on bool or short?
Comment 37 Andrew Pinski 2012-11-02 02:16:48 UTC
(In reply to comment #36)
> The change I mention in #c22
>  http://gcc.gnu.org/viewcvs?view=revision&revision=184110
> tests for atomic ops on all of bool, short, int and long long, where the
> previous test was for *either* atomic bool or atomic short.  My fix for powerpc
> removed the long long test.  Does mips lack atomic on bool or short?

No it does not lack atomic for either book or short.  I am getting exactly the same failure as mentioned in #c3 and not the double free though.
Comment 38 Alan Modra 2012-11-02 02:39:29 UTC
Ah, the #c3 fail on powerpc was due to a powerpc glibc pthread_once bug.  And comment #36 should have read:
..previous test was for *either* atomic bool or atomic int.
Comment 39 Andrew Pinski 2012-11-02 02:41:37 UTC
(In reply to comment #38)
> Ah, the #c3 fail on powerpc was due to a powerpc glibc pthread_once bug.  And
> comment #36 should have read:
> ..previous test was for *either* atomic bool or atomic int.

Looks like the MIPS glibc pthread_once has the same issue too.  I should have read the comments fully to see that.