[Bug sanitizer/90589] In Fedora 30 ps hangs using address sanitizer

mathieu.desnoyers at efficios dot com gcc-bugzilla@gcc.gnu.org
Thu Aug 12 16:17:06 GMT 2021


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90589

Mathieu Desnoyers <mathieu.desnoyers at efficios dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mathieu.desnoyers@efficios.
                   |                            |com

--- Comment #13 from Mathieu Desnoyers <mathieu.desnoyers at efficios dot com> ---
I reproduced this hang with pgrep on my Ubuntu 18.04.1 LTS machine.

LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/{8,9,10,11}/libasan.so pgrep something
[ hang ]

but

LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/7/libasan.so pgrep something
[ does not hang ]

So I focused my investigation on gcc 8's libasan, which is the first version to
introduce this issue.

The version of glibc on this x86-64 machine is 2.27-3ubuntu1.4.

I added breakpoints on all rwlock read lock, write lock, and unlock operations
(__GI___pthread_rwlock_rdlock, __GI___pthread_rwlock_wrlock,
__GI___pthread_rwlock_unlock). Looking at what happens to _nl_state_lock,
which protects glibc's i18n handling data structures, is quite enlightening.
This lock is _not_ a recursive lock. AFAIU, what happens here is:

1. intl/bindtextdom.c:set_binding_values() takes the _nl_state_lock write lock.
2. It calls malloc, but this symbol is overridden by libasan.
3. asan's malloc handler ReplaceSystemMalloc looks up "__libc_malloc_dispatch"
and, if that fails, looks up "__libc_malloc_default_dispatch".
4. The dlsym lookup failure calls __dlerror, which performs an i18n lookup, thus
taking the _nl_state_lock read lock. This should not happen, as it is not a
nestable lock.
5. The _nl_state_lock is unlocked once after the i18n lookup is done.
6. The _nl_state_lock is unlocked again in set_binding_values(), which corrupts
the lock state because it is not a nestable lock.
7. The next unlucky caller trying to take the lock hangs forever on a futex.

Based on an IRC discussion with Carlos O'Donell, it appears to be a defect in
glibc that a malloc interposer fails during i18n translation. There was a
discussion back in 2014
(https://sourceware.org/legacy-ml/libc-alpha/2014-12/msg00954.html) regarding
reentrancy of dlopen and other libdl interfaces.

