Bug 52604 - mt allocator crashes on multi-threaded
Summary: mt allocator crashes on multi-threaded
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 68200 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-03-16 15:23 UTC by Laurent Aflonsi
Modified: 2015-11-04 09:18 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-04-12 00:00:00


Attachments
Testcase to reproduce (431 bytes, application/octet-stream)
2012-03-16 15:23 UTC, Laurent Aflonsi
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Laurent Aflonsi 2012-03-16 15:23:52 UTC
Created attachment 26904 [details]
Testcase to reproduce

There's a problem with mt_allocator when using multi-threads.
see example attached (g++ -lpthread).

At exit time, 
- the __freelist destructor is called.
  which does a __gthread_key_delete(_M_key).
- and then afterwards, the std::list<> destructor is called.
  This finally have for effect to use that thread key with gthread_setspecific.

Valgrind reports :
=================================
==5212== Invalid read of size 8
==5212==    at 0x4CA009D: __gnu_cxx::__pool<true>::_M_get_thread_id() (in ../gcc-4.6.3/lib64/libstdc++.so.6.0.16)
==5212==    by 0x4CA0173: __gnu_cxx::__pool<true>::_M_reclaim_block(char*, unsigned long) (in ../gcc-4.6.3/lib64/libstdc++.so.6.0.16)
==5212==    by 0x401989: __gnu_cxx::__mt_alloc<std::_List_node<std::string>, __gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true> >::deallocate(std::_List_node<std::string>*, unsigned long) (in /tmp/a.out)
==5212==    by 0x401847: std::_List_base<std::string, __gnu_cxx::__mt_alloc<std::string, __gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true> > >::_M_put_node(std::_List_node<std::string>*) (in /tmp/a.out)
==5212==    by 0x40168C: std::_List_base<std::string, __gnu_cxx::__mt_alloc<std::string, __gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true> > >::_M_clear() (in /tmp/a.out)
==5212==    by 0x40151A: std::_List_base<std::string, __gnu_cxx::__mt_alloc<std::string, __gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true> > >::~_List_base() (in /tmp/a.out)
==5212==    by 0x401BB1: std::list<std::string, __gnu_cxx::__mt_alloc<std::string, __gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true> > >::~list() (in /tmp/a.out)
==5212==    by 0x3814E30C94: exit (in /lib64/tls/libc-2.3.4.so)
==5212==    by 0x3814E1C411: (below main) (in /lib64/tls/libc-2.3.4.so)
==5212==  Address 0x5b5ce88 is 24 bytes inside a block of size 65,536 free'd
==5212==    at 0x4A077EC: operator delete(void*) (vg_replace_malloc.c:457)
==5212==    by 0x3814E30C94: exit (in /lib64/tls/libc-2.3.4.so)
==5212==    by 0x3814E1C411: (below main) (in /lib64/tls/libc-2.3.4.so)
=================================

I ve seen a quite close bug report (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22309), that is already fixed in 4.0.0.
This apparently leaded to re-write a part of mt_allocate (Jakub Jelinek). But this was not sufficient for the current bug report i m reporting.

Thanks in advance.
Laurent Alfonsi
Comment 1 Laurent Aflonsi 2012-03-16 15:25:23 UTC
I would propose to fix as follows :
- Set the _M_Key=NULL when calling the freelist desctuctor.
- Testing the NULL Key before using it with setspecific.
see patch below :

===================================================================
--- libstdc++-v3/src/mt_allocator.cc    (revision 2486)
+++ libstdc++-v3/src/mt_allocator.cc    (working copy)
@@ -47,6 +47,7 @@
       if (_M_thread_freelist_array)
        {
          __gthread_key_delete(_M_key);
+         _M_key = NULL;
          ::operator delete(static_cast<void*>(_M_thread_freelist_array));
        }
     }
@@ -639,7 +640,8 @@
                }
            }
 
-           __gthread_setspecific(freelist._M_key, (void*)_M_id);
+           if (freelist._M_key)
+               __gthread_setspecific(freelist._M_key, (void*)_M_id);
          }
        return _M_id >= _M_options._M_max_threads ? 0 : _M_id;
       }
===================================================================

Let me know what you think

Thanks very much
Laurent Alfonsi
Comment 2 Paolo Carlini 2012-03-19 12:26:56 UTC
To be clear: are you seeing the valgrind errors going away with the patchlet applied? First blush, I don't.
Comment 3 Laurent Aflonsi 2012-03-20 12:31:31 UTC
Well, in fact I am facing a runtime crash on another target (SH4). The crash is fixed by the patch proposed previously.

On the other hand, I ve tried to reproduce on x86, to easily report on gcc bugs. and I ve discovered this valgrind error.

I first thought both issues were the same, but they are not. And you are right, my patch doesn't fix the valgrind error. Sorry for confusion.

The _M_thread_freelist might also be set to NULL at some point. (may be also in ~__freelist()). Setting _M_thread_freelist to NULL at ~__freelist() is solves the valgrind error, but I m not sure this is the right place to do it.

Thanks
Comment 4 Paolo Carlini 2012-03-20 12:48:20 UTC
In principle I have no problem with such zeroings, make sense, but it's been a while since the last time I looked into this code and I fear races. In any case, please try to fully run the testsuite (check-performance too, which exercised mt_allocator quite a bit) on the affected targets.

Jakub, any comments on this?
Comment 5 Jakub Jelinek 2012-03-20 14:03:41 UTC
It surprises me.  The destructors of libstdc++, on which the binary depends on, should be run after the destructors of the binary, not before it.
Comment 6 Laurent Aflonsi 2012-03-23 11:06:57 UTC
Sorry for the late answer (sickness). 

Well, In fact there are 2 things in the atexit list :
1) The destruction of the "list_type" global variable. Destructor is :
  "std::list<std::string, __gnu_cxx::__mt_alloc<std::string, __gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true> > >::~list()"
  => This one is registered in atexit list by the __static_initialization_and_destruction_0() because it is a global variable.

2) The destruction of the mt_allocator itself
  "(anonymous namespace)::__freelist::~__freelist()"
  => This one is registered in atexit list during the main() execution. when first using the mt_allocator (first called get_freelist())


The above 2 destructors are registered in atexit list in this order (1 then 2).
Then, they are invoked in the reverse order (2 then 1). The freelist is first destroyed (where i proposed to add zeroing), followed by the destruction of the list_type that invokes _M_get_thread_id() :

__gnu_cxx::__pool<true>::_M_get_thread_id ()
__gnu_cxx::__pool<true>::_M_reclaim_block ()
__gnu_cxx::__mt_alloc<std::_List_node<std::string>, __gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true> >::deallocate(std::_List_node<std::string>*, unsigned long) ()
std::_List_base<std::string, __gnu_cxx::__mt_alloc<std::string, __gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true> > >::_M_put_node(std::_List_node<std::string>*) ()
std::_List_base<std::string, __gnu_cxx::__mt_alloc<std::string, __gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true> > >::_M_clear() ()
std::_List_base<std::string, __gnu_cxx::__mt_alloc<std::string, __gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true> > >::~_List_base() ()
std::list<std::string, __gnu_cxx::__mt_alloc<std::string, __gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true> > >::~list() ()
exit ()

Jakub, do you think this order of destruction is not justified ? Do you see another solution than the one i proposed ?

Thanks very much.
Laurent
Comment 7 Laurent Aflonsi 2012-04-10 15:28:07 UTC
Ping ? Here is the patch proposed.

============================
--- libstdc++-v3/src/mt_allocator.cc	2011/08/04 07:56:49	2064
+++ libstdc++-v3/src/mt_allocator.cc	2012/04/03 08:45:48	2551
@@ -47,7 +47,9 @@
       if (_M_thread_freelist_array)
 	{
 	  __gthread_key_delete(_M_key);
+	  _M_key = NULL;
 	  ::operator delete(static_cast<void*>(_M_thread_freelist_array));
+	  _M_thread_freelist = NULL;
 	}
     }
   };
@@ -639,7 +641,10 @@
 		}
 	    }
 
-	    __gthread_setspecific(freelist._M_key, (void*)_M_id);
+	    if (freelist._M_key)
+	      { 
+	        __gthread_setspecific(freelist._M_key, (void*)_M_id);
+	      }
 	  }
 	return _M_id >= _M_options._M_max_threads ? 0 : _M_id;
       }
=====================================
Comment 8 Paolo Carlini 2012-04-11 01:08:12 UTC
Note, stylistically, no curly brackets in the if body, and also no NULL, just 0 in C++98. I also note that you are not patching mainline, all the fixes go to mainline first. Finally, more substantively, how did you test the patch? Running make check-performance exercises mt_allocator quite a bit, for example. Are all the leaks / crashes gone with the patch?
Comment 9 Laurent Aflonsi 2012-04-12 15:05:25 UTC
Paolo,

I ve discovered that the encoding of M_key is encoded differently for each thread-implementation. On pthread implementation M_key is an integer, whereas on others it is pointers. So, in pthread 0 is a legal thread key: So zeroing M_key and testing if(M_key) is simply wrong.

The M_thread_freelist zeroing seems legal and is enough to avoid the above valgrind error. I am now testing with this fix only:
============================
--- libstdc++-v3/src/mt_allocator.cc   
+++ libstdc++-v3/src/mt_allocator.cc   
@@ -47,7 +47,8 @@
       if (_M_thread_freelist_array)
     {
       __gthread_key_delete(_M_key);
       ::operator delete(static_cast<void*>(_M_thread_freelist_array));
+      _M_thread_freelist = NULL;
     }
     }
   };
============================

It seems to be on each thread-implementation to handle deleted keys and it is well handled on pthread.

Laurent.
Comment 10 Paolo Carlini 2012-04-12 15:36:52 UTC
Ok, thanks. Please let us know how it goes.
Comment 11 Paolo Carlini 2012-04-12 16:29:32 UTC
In mainline, for x86_64-linux, the below patchlet indeed avoids the valgrind errors and passes make check, make check-performance.

Index: src/c++98/mt_allocator.cc
===================================================================
--- src/c++98/mt_allocator.cc	(revision 186374)
+++ src/c++98/mt_allocator.cc	(working copy)
@@ -48,6 +48,7 @@
 	{
 	  __gthread_key_delete(_M_key);
 	  ::operator delete(static_cast<void*>(_M_thread_freelist_array));
+	  _M_thread_freelist = 0;
 	}
     }
   };
Comment 12 Laurent Aflonsi 2012-04-12 21:55:42 UTC
Thanks very much Paolo. It works fine on SH4 also.
Is that enough to push the patch ?
Laurent
Comment 13 Paolo Carlini 2012-04-12 23:02:02 UTC
I suppose so, yes. We could test the fix in mainline for a while and then maybe also consider having it for 4.7.1 or 4.7.2. Care to write down a short ChangeLog entry for it?
Comment 14 Laurent Aflonsi 2012-04-13 09:46:24 UTC
Thanks very much Paolo.
Here it is. I'll commit the patch in the mainline if no objection.
Laurent

2012-04-13  Laurent Alfonsi  <laurent.alfonsi@st.com>

	PR libstdc++/52604
	* src/mt_allocator.cc: (~__freelist): Reset pointer.


Index: src/c++98/mt_allocator.cc
===================================================================
--- src/c++98/mt_allocator.cc    (revision 186374)
+++ src/c++98/mt_allocator.cc    (working copy)
@@ -48,6 +48,7 @@
     {
       __gthread_key_delete(_M_key);
       ::operator delete(static_cast<void*>(_M_thread_freelist_array));
+      _M_thread_freelist = 0;
     }
     }
   };
Comment 15 Jonathan Wakely 2012-04-13 09:48:39 UTC
(In reply to comment #14)
> 
>     PR libstdc++/52604
>     * src/mt_allocator.cc: (~__freelist): Reset pointer.
           ^
           c++98/
Comment 16 Paolo Carlini 2012-04-13 10:29:02 UTC
Yes, please go ahead, mainline only for now (PR remains open) with Jon's fix to the ChangeLog and also, between parentheses (__freelist::~__freelist).
Comment 17 Paolo Carlini 2012-04-13 10:31:01 UTC
Ah, another minor nit, remember to add 2012 to the list of Copyright Years (mind to keep the comment within 80 columns)
Comment 18 Laurent Aflonsi 2012-04-13 12:00:21 UTC
Author: 	chrbr
Date: 	Fri Apr 13 11:44:13 2012
New Revision: 186414
Log Message: 	

PR:52604: (~__freelist): Reset pointer

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/src/c++98/mt_allocator.cc
Comment 19 Laurent Aflonsi 2012-04-13 12:00:44 UTC
Author: 	chrbr
Date: 	Fri Apr 13 11:58:15 2012
New Revision: 186415
Log Message: 	

fix last entry

Modified:
    trunk/libstdc++-v3/ChangeLog
Comment 20 Paolo Carlini 2012-04-13 14:45:29 UTC
Remember to always send the patches you commit to gcc-patches@gcc.gnu.org (and libstdc++@gcc.gnu.org in CC), even if already approved on the fly in audit trail (which should not happen very frequently)

PS: are you aware that your name family name in the email address and as email sender do not match? ;)
Comment 21 chrbr 2012-04-17 13:16:06 UTC
(In reply to comment #20)
> Remember to always send the patches you commit to gcc-patches@gcc.gnu.org (and
> libstdc++@gcc.gnu.org in CC), even if already approved on the fly in audit
> trail (which should not happen very frequently)
> 
> PS: are you aware that your name family name in the email address and as email
> sender do not match? ;)

Probably my fault, we are working together and Laurent asked me to push for him, as approved. We'll start the paperwork so he can get the Write After Approval. Sorry for the confusion.
Comment 22 Richard Biener 2015-11-04 09:18:09 UTC
*** Bug 68200 has been marked as a duplicate of this bug. ***
Comment 23 Richard Biener 2015-11-04 09:18:46 UTC
According to the dup the fix is bogus (the added stmt is optimized away).