Bug 97461 - [11 Regression] allocate_gcov_kvp() deadlocks in firefox LTO+PGO build (overridden malloc() recursion)
Summary: [11 Regression] allocate_gcov_kvp() deadlocks in firefox LTO+PGO build (overr...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: gcov-profile (show other bugs)
Version: 11.0
: P1 normal
Target Milestone: 11.0
Assignee: Martin Liška
URL: https://gcc.gnu.org/pipermail/gcc-pat...
Keywords: patch
Depends on:
Blocks:
 
Reported: 2020-10-16 15:03 UTC by Sergei Trofimovich
Modified: 2021-03-06 08:19 UTC (History)
8 users (show)

See Also:
Host:
Target:
Build:
Known to work: 10.2.0
Known to fail: 11.0
Last reconfirmed: 2020-10-19 00:00:00


Attachments
a.c (1.57 KB, text/x-csrc)
2020-10-16 15:05 UTC, Sergei Trofimovich
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sergei Trofimovich 2020-10-16 15:03:23 UTC
Single-file example is extracted from firefox-81 build hangup (LTO+PGO flavour).

Here is the single-file reproducer that converts hangup to a crash:

// gcc-11.0.0 a.c -o a -fprofile-generate -ggdb3 && ./a

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int malloc_depth = 0;

static char memory[128* 1024];
static size_t memory_p = 0;

void f1(void) {}
void f2(void) {}

typedef void (*fun_t)(void);
static const fun_t funs[2] = { f1, f2, };

static void * malloc_impl(size_t size) {
    void * r = &memory[memory_p];
    memory_p += size;

    // force TOPN profile
    funs[size % 2]();
    return r;
}

// Override default malloc, check it it get s called recursively
void * malloc(size_t size) {
    // Must not be called recursively. Malloc implementation does not support it.
    if (malloc_depth != 0) __builtin_trap();

    ++malloc_depth;
      void * r = malloc_impl(size);
    --malloc_depth;
    return r;
}

// Called from gcov
void *calloc(size_t nmemb, size_t size) {
    // Must not be called recursively.  Malloc implementation does not support it.
    if (malloc_depth != 0) __builtin_trap();

    ++malloc_depth;
      void * r = malloc_impl(size * nmemb);
      memset(r, 0, size * nmemb);
    --malloc_depth;
    return r;
}

void free(void *ptr){}

int main() {
    void * p = malloc(8);
    return p != 0;
}

How to crash:

$ gcc-11.0.0 a.c -o a -ggdb3 && ./a
$ gcc-11.0.0 a.c -o a -fprofile-generate -ggdb3 && ./a
Illegal instruction (core dumped)

Here we have a malloc recursion of
    malloc()->malloc_internals()->gcov->calloc()->malloc_internals().

malloc() is re-entered twice:

Program received signal SIGILL, Illegal instruction.
0x00005555555565e7 in calloc (nmemb=1, size=24) at a.c:103
103         if (malloc_depth != 0) __builtin_trap();
(gdb) bt
#0  0x00005555555565e7 in calloc (nmemb=1, size=24) at a.c:103
#1  0x0000555555556cf3 in allocate_gcov_kvp () at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov.h:441
#2  gcov_topn_add_value (count=1, increment_total=1, use_atomic=0, value=721827547, counters=0x55555557b660 <__gcov4.malloc_impl>) at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov.h:489
#3  __gcov_topn_values_profiler_body (use_atomic=0, value=721827547, counters=0x55555557b660 <__gcov4.malloc_impl>) at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-profiler.c:103
#4  __gcov_indirect_call_profiler_body (use_atomic=0, cur_func=<optimized out>, value=721827547) at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-profiler.c:163
#5  __gcov_indirect_call_profiler_v4 (value=721827547, cur_func=<optimized out>) at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-profiler.c:172
#6  0x000055555555631e in f1 () at a.c:74
#7  0x0000555555556482 in malloc_impl (size=8) at a.c:85
#8  0x0000555555556537 in malloc (size=8) at a.c:95
#9  0x0000555555556760 in main () at a.c:115
Comment 1 Sergei Trofimovich 2020-10-16 15:05:16 UTC
Created attachment 49388 [details]
a.c
Comment 2 Sergei Trofimovich 2020-10-16 15:09:03 UTC
Original firefox lockup is slightly more complicated: malloc() call happens in a constructor of external library (at _gpg_err_init()).

(gdb) bt
#0  __lll_lock_wait (futex=0x5591defd9720 <gInitLock>, private=0) at lowlevellock.c:52
#1  0x00007f6e72d23305 in __GI___pthread_mutex_lock (mutex=0x5591defd9720 <gInitLock>) at ../nptl/pthread_mutex_lock.c:135
#2  0x00005591deeb60a1 in malloc_init_hard() ()
#3  0x00005591deebf75e in calloc ()
#4  0x00005591defa70df in allocate_gcov_kvp () at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov.h:441
#5  gcov_topn_add_value (count=1, use_atomic=1, increment_total=1, value=4096, counters=0x5591df011520 <__gcov3._ZL16malloc_init_hardv>)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov.h:489
#6  __gcov_topn_values_profiler_body (use_atomic=1, value=4096, counters=0x5591df011520 <__gcov3._ZL16malloc_init_hardv>)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-profiler.c:103
#7  __gcov_topn_values_profiler_atomic (counters=0x5591df011520 <__gcov3._ZL16malloc_init_hardv>, value=4096)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-profiler.c:128
#8  0x00005591deeb6121 in malloc_init_hard() ()
#9  0x00005591deec052e in malloc ()
#10 0x00007f6e5c5ecddf in set_binding_values (domainname=0x7f6e58451245 "libgpg-error", dirnamep=0x7fffd54f39c8, codesetp=0x0) at bindtextdom.c:202
#11 0x00007f6e5c5ed071 in set_binding_values (codesetp=0x0, dirnamep=0x7fffd54f39c8, domainname=<optimized out>) at bindtextdom.c:82
#12 __bindtextdomain (domainname=<optimized out>, dirname=<optimized out>) at bindtextdom.c:320
#13 0x00007f6e5843cbc7 in _gpg_err_init () at /usr/lib64/libgpg-error.so.0
#14 0x00007f6e72d94cfe in call_init (l=<optimized out>, argc=argc@entry=3, argv=argv@entry=0x7fffd54f3a68, env=env@entry=0x7fffd54f3a88) at dl-init.c:74
#15 0x00007f6e72d94de0 in call_init (env=0x7fffd54f3a88, argv=0x7fffd54f3a68, argc=3, l=<optimized out>) at dl-init.c:37
#16 _dl_init (main_map=0x7f6e72db11a0, argc=3, argv=0x7fffd54f3a68, env=0x7fffd54f3a88) at dl-init.c:121
#17 0x00007f6e72d8608a in _dl_start_user () at /lib64/ld-linux-x86-64.so.2
#18 0x0000000000000003 in  ()
#19 0x00007fffd54f4a83 in  ()
#20 0x00007fffd54f4a9c in  ()
#21 0x00007fffd54f4b15 in  ()
#22 0x0000000000000000 in  ()
Comment 3 Sergei Trofimovich 2020-10-16 15:16:38 UTC
Probably started from https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=871e5ada6d53d5eb ("Make TOPN counter dynamically allocated.") when dynamic memory allocation call was added to gcov_topn_add_value().
Comment 4 Martin Liška 2020-10-19 07:12:55 UTC
Mine, thanks for the report.
Comment 5 Richard Biener 2020-10-19 07:15:27 UTC
Hmm, is the TOPN allocation strathegy configurable?  I wonder whether we have
to resort to an alternate allocation scheme (mmap/sbrk), avoiding libc?  At least
I don't see a good way to force the gcov allocation to call the libc malloc
rather than a user replacement that is being instrumented.  Of course the
instrumentation code could do sth like

  if (is_allocated == 0)
    {
      is_allocated = in_progress;
      ... = malloc ();
      is_allocated = 1;
    }
  else if (is_allocted == in_progress)
    {
      topn_mem = &transitional_garbage_space;
    }

but of course that's quite some overhead for a small benefit.  Maybe it
could be hidden in gcov_malloc.
Comment 6 Martin Liška 2020-10-19 13:42:01 UTC
(In reply to Richard Biener from comment #5)
> Hmm, is the TOPN allocation strathegy configurable?  I wonder whether we have
> to resort to an alternate allocation scheme (mmap/sbrk), avoiding libc?

No. The only thing we support is a recursive malloc as seen in:
./gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c

It was added in g:bc2b1a232b1825b421a1aaa21a0865b2d1e4e08c as we use a statically allocated buffer when we recursively entry allocate_gcov_kvp.

However this is different as we can't call malloc/calloc from the function as we're in code that initializes a memory allocator.

We can mitigate the issue with a pair of new functions __gcov_supress_malloc and __gcov_alloc_malloc that will be called by a custom memory allocator.

What do you think about it?

>  At
> least
> I don't see a good way to force the gcov allocation to call the libc malloc
> rather than a user replacement that is being instrumented.  Of course the
> instrumentation code could do sth like
> 
>   if (is_allocated == 0)
>     {
>       is_allocated = in_progress;
>       ... = malloc ();
>       is_allocated = 1;
>     }
>   else if (is_allocted == in_progress)
>     {
>       topn_mem = &transitional_garbage_space;
>     }
> 
> but of course that's quite some overhead for a small benefit.  Maybe it
> could be hidden in gcov_malloc.
Comment 7 Jan Hubicka 2020-10-19 13:46:25 UTC
> No. The only thing we support is a recursive malloc as seen in:
> ./gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c
> 
> It was added in g:bc2b1a232b1825b421a1aaa21a0865b2d1e4e08c as we use a
> statically allocated buffer when we recursively entry allocate_gcov_kvp.
> 
> However this is different as we can't call malloc/calloc from the function as
> we're in code that initializes a memory allocator.
> 
> We can mitigate the issue with a pair of new functions __gcov_supress_malloc
> and __gcov_alloc_malloc that will be called by a custom memory allocator.
> 
> What do you think about it?

How this works with the llvm implementation (that is very similar here,
right?)

Honza
Comment 8 Martin Liška 2020-10-19 14:15:16 UTC
(In reply to Jan Hubicka from comment #7)
> > No. The only thing we support is a recursive malloc as seen in:
> > ./gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c
> > 
> > It was added in g:bc2b1a232b1825b421a1aaa21a0865b2d1e4e08c as we use a
> > statically allocated buffer when we recursively entry allocate_gcov_kvp.
> > 
> > However this is different as we can't call malloc/calloc from the function as
> > we're in code that initializes a memory allocator.
> > 
> > We can mitigate the issue with a pair of new functions __gcov_supress_malloc
> > and __gcov_alloc_malloc that will be called by a custom memory allocator.
> > 
> > What do you think about it?
> 
> How this works with the llvm implementation (that is very similar here,
> right?)
> 
> Honza

They have the very same problem when I disable a statically pre-allocated buffers with -mllvm -vp-static-alloc=0:

Program received signal SIGILL, Illegal instruction.
0x00000000004014e6 in calloc (nmemb=1, size=8) at pr97461.c:103
103	    if (malloc_depth != 0) __builtin_trap();
(gdb) bt
#0  0x00000000004014e6 in calloc (nmemb=1, size=8) at pr97461.c:103
#1  0x0000000000401ae1 in allocateValueProfileCounters (Data=0x40a2c8) at /home/marxin/Programming/llvm-project/compiler-rt/lib/profile/InstrProfilingValue.c:101
#2  0x0000000000401c45 in instrumentTargetValueImpl (CountValue=1, CounterIndex=0, Data=0x40a2c8, TargetValue=4199264) at /home/marxin/Programming/llvm-project/compiler-rt/lib/profile/InstrProfilingValue.c:146
#3  __llvm_profile_instrument_target (TargetValue=4199264, Data=0x40a2c8, CounterIndex=0) at /home/marxin/Programming/llvm-project/compiler-rt/lib/profile/InstrProfilingValue.c:232
#4  0x000000000040148f in malloc_impl (size=56) at pr97461.c:85
#5  0x00000000004013fe in malloc (size=56) at pr97461.c:95
#6  0x00007ffff7e048a3 in __add_to_environ (name=0x406138 "__LLVM_PROFILE_RT_INIT_ONCE", value=<optimized out>, combined=<optimized out>, replace=<optimized out>) at setenv.c:215
#7  0x0000000000402ce4 in truncateCurrentFile ()
#8  0x00000000004039bc in parseAndSetFilename ()
#9  0x0000000000404134 in __llvm_profile_initialize ()
#10 0x0000000000405e95 in __libc_csu_init (argc=argc@entry=1, argv=argv@entry=0x7fffffffdfa8, envp=0x7fffffffdfb8) at elf-init.c:89
#11 0x00007ffff7decd9a in __libc_start_main (main=0x401580 <main>, argc=1, argv=0x7fffffffdfa8, init=0x405e50 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdf98) at ../csu/libc-start.c:270
#12 0x00000000004012aa in _start () at ../sysdeps/x86_64/start.S:120
Comment 9 Jan Hubicka 2020-10-19 14:52:00 UTC
> 
> They have the very same problem when I disable a statically pre-allocated
> buffers with -mllvm -vp-static-alloc=0:
> 
> Program received signal SIGILL, Illegal instruction.
> 0x00000000004014e6 in calloc (nmemb=1, size=8) at pr97461.c:103
> 103         if (malloc_depth != 0) __builtin_trap();
> (gdb) bt
> #0  0x00000000004014e6 in calloc (nmemb=1, size=8) at pr97461.c:103
> #1  0x0000000000401ae1 in allocateValueProfileCounters (Data=0x40a2c8) at
> /home/marxin/Programming/llvm-project/compiler-rt/lib/profile/InstrProfilingValue.c:101
> #2  0x0000000000401c45 in instrumentTargetValueImpl (CountValue=1,
> CounterIndex=0, Data=0x40a2c8, TargetValue=4199264) at
> /home/marxin/Programming/llvm-project/compiler-rt/lib/profile/InstrProfilingValue.c:146
> #3  __llvm_profile_instrument_target (TargetValue=4199264, Data=0x40a2c8,
> CounterIndex=0) at
> /home/marxin/Programming/llvm-project/compiler-rt/lib/profile/InstrProfilingValue.c:232
> #4  0x000000000040148f in malloc_impl (size=56) at pr97461.c:85
> #5  0x00000000004013fe in malloc (size=56) at pr97461.c:95
> #6  0x00007ffff7e048a3 in __add_to_environ (name=0x406138
> "__LLVM_PROFILE_RT_INIT_ONCE", value=<optimized out>, combined=<optimized out>,
> replace=<optimized out>) at setenv.c:215
> #7  0x0000000000402ce4 in truncateCurrentFile ()
> #8  0x00000000004039bc in parseAndSetFilename ()
> #9  0x0000000000404134 in __llvm_profile_initialize ()
> #10 0x0000000000405e95 in __libc_csu_init (argc=argc@entry=1,
> argv=argv@entry=0x7fffffffdfa8, envp=0x7fffffffdfb8) at elf-init.c:89
> #11 0x00007ffff7decd9a in __libc_start_main (main=0x401580 <main>, argc=1,
> argv=0x7fffffffdfa8, init=0x405e50 <__libc_csu_init>, fini=<optimized out>,
> rtld_fini=<optimized out>, stack_end=0x7fffffffdf98) at ../csu/libc-start.c:270
> #12 0x00000000004012aa in _start () at ../sysdeps/x86_64/start.S:120

Hmm, it seems to me that having some entries prealocated by default
would be way to avoid this problem in majority cases w/o having to
modify the upstream packages. 

Honza
Comment 10 Martin Liška 2020-10-19 14:58:56 UTC
> Hmm, it seems to me that having some entries prealocated by default
> would be way to avoid this problem in majority cases w/o having to
> modify the upstream packages.

I tend to the same. I'm going to prepare a patch for it.
Comment 11 GCC Commits 2020-10-27 10:50:27 UTC
The master branch has been updated by Martin Liska <marxin@gcc.gnu.org>:

https://gcc.gnu.org/g:14e19b82c1e67ead60c3095ac23347317298904b

commit r11-4423-g14e19b82c1e67ead60c3095ac23347317298904b
Author: Martin Liska <mliska@suse.cz>
Date:   Mon Oct 19 17:40:00 2020 +0200

    gcov-profile: use static pool for TOPN first
    
    gcc/ChangeLog:
    
            PR gcov-profile/97461
            * gcov-io.h (GCOV_PREALLOCATED_KVP): Pre-allocate 64
            static counters.
    
    libgcc/ChangeLog:
    
            PR gcov-profile/97461
            * libgcov.h (gcov_counter_add): Use first static counters
            as it should help to have malloc wrappers set up.
    
    gcc/testsuite/ChangeLog:
    
            PR gcov-profile/97461
            * gcc.dg/tree-prof/pr97461.c: New test.
Comment 12 Martin Liška 2020-10-27 10:50:56 UTC
Fixed now.
Comment 13 Sergei Trofimovich 2020-10-27 22:18:31 UTC
Tried firefox-82 with LTO+PGO today on gcc-11-4428-g4a369d199bf. It gets a lot more forward, but still gets stuck. This time on free()->allocate_gcov_kvp ()->alloc() deadloc.

The backtrace:

(gdb) bt
#0  __lll_lock_wait (futex=0x7ffff7800018, private=0) at lowlevellock.c:52
#1  0x00007ffff7f69305 in __GI___pthread_mutex_lock (mutex=0x7ffff7800018) at ../nptl/pthread_mutex_lock.c:135
#2  0x00005555555b2151 in arena_t::MallocSmall(unsigned long, bool) ()
#3  0x00005555555b307c in arena_t::Malloc(unsigned long, bool) ()
#4  0x00005555555b41f2 in calloc ()
#5  0x00005555556a49ab in allocate_gcov_kvp () at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov.h:431
#6  gcov_topn_add_value (count=1, use_atomic=1, increment_total=1, value=496, counters=0x5555557158c0 <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov.h:477
#7  __gcov_topn_values_profiler_body (use_atomic=1, value=496, counters=0x5555557158c0 <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-profiler.c:103
#8  __gcov_topn_values_profiler_atomic (counters=0x5555557158c0 <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>, value=496)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-profiler.c:128
#9  0x00005555555b1065 in arena_t::DallocSmall(arena_chunk_t*, void*, arena_chunk_map_t*) ()
#10 0x00005555555b166c in free ()
#11 0x00007fffe1b0ef1a in CollectProcessInfo(ProcessInfo&) () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#12 0x00007fffea0afa72 in PreRecordMetaInformation() () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#13 0x00007fffea0b28c8 in profiler_shutdown(IsFastShutdown) () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#14 0x00007fffea7c6aa0 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) ()
    at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#15 0x00007fffea7c7543 in XRE_main(int, char**, mozilla::BootstrapConfig const&) () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#16 0x00007fffea7cfbd0 in mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) ()
    at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#17 0x000055555559a370 in do_main(int, char**, char**) ()
#18 0x00005555555980a1 in main ()

Worth filing a new bug, or this would be good enough?
Comment 14 Martin Liška 2020-10-29 10:19:59 UTC
(In reply to Sergei Trofimovich from comment #13)
> Tried firefox-82 with LTO+PGO today on gcc-11-4428-g4a369d199bf. It gets a
> lot more forward, but still gets stuck. This time on
> free()->allocate_gcov_kvp ()->alloc() deadloc.

Hmm, that's quite unpleasant. Anyway, thanks for testing that.

> 
> The backtrace:
> 
> (gdb) bt
> #0  __lll_lock_wait (futex=0x7ffff7800018, private=0) at lowlevellock.c:52
> #1  0x00007ffff7f69305 in __GI___pthread_mutex_lock (mutex=0x7ffff7800018)
> at ../nptl/pthread_mutex_lock.c:135
> #2  0x00005555555b2151 in arena_t::MallocSmall(unsigned long, bool) ()
> #3  0x00005555555b307c in arena_t::Malloc(unsigned long, bool) ()
> #4  0x00005555555b41f2 in calloc ()
> #5  0x00005555556a49ab in allocate_gcov_kvp () at
> /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/
> libgcov.h:431
> #6  gcov_topn_add_value (count=1, use_atomic=1, increment_total=1,
> value=496, counters=0x5555557158c0
> <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>)
>     at
> /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/
> libgcov.h:477
> #7  __gcov_topn_values_profiler_body (use_atomic=1, value=496,
> counters=0x5555557158c0
> <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>)
>     at
> /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/
> libgcov-profiler.c:103
> #8  __gcov_topn_values_profiler_atomic (counters=0x5555557158c0
> <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>,
> value=496)
>     at
> /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/
> libgcov-profiler.c:128
> #9  0x00005555555b1065 in arena_t::DallocSmall(arena_chunk_t*, void*,
> arena_chunk_map_t*) ()
> #10 0x00005555555b166c in free ()
> #11 0x00007fffe1b0ef1a in CollectProcessInfo(ProcessInfo&) () at
> /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/
> instrumented/dist/firefox/libxul.so
> #12 0x00007fffea0afa72 in PreRecordMetaInformation() () at
> /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/
> instrumented/dist/firefox/libxul.so
> #13 0x00007fffea0b28c8 in profiler_shutdown(IsFastShutdown) () at
> /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/
> instrumented/dist/firefox/libxul.so
> #14 0x00007fffea7c6aa0 in XREMain::XRE_main(int, char**,
> mozilla::BootstrapConfig const&) ()
>     at
> /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/
> instrumented/dist/firefox/libxul.so
> #15 0x00007fffea7c7543 in XRE_main(int, char**, mozilla::BootstrapConfig
> const&) () at
> /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/
> instrumented/dist/firefox/libxul.so
> #16 0x00007fffea7cfbd0 in mozilla::BootstrapImpl::XRE_main(int, char**,
> mozilla::BootstrapConfig const&) ()
>     at
> /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/
> instrumented/dist/firefox/libxul.so
> #17 0x000055555559a370 in do_main(int, char**, char**) ()
> #18 0x00005555555980a1 in main ()

Can you please run in gdb how many times is allocate_gcov_kvp called before
we reach the deadlock?

> 
> Worth filing a new bug, or this would be good enough?

We can stay with this PR as it contains useful discussion.

Anyway, one another fix can be usage of the suggested __gcov_supress_malloc/__gcov_allow_malloc
that will be used in an alternative memory allocator..
Comment 15 Sergei Trofimovich 2020-10-29 14:53:24 UTC
allocate_gcov_kvp() gets called 89 times. Tried as:

$ gdb --quiet --args dist/firefox/fire
fox 'data:text/html,<script>Quitter.quit()</script>' -profile ../../firefox-82.0/temp/tmpzhv0a26v
(gdb) break allocate_gcov_kvp
Breakpoint 1 at 0x1507b0: allocate_gcov_kvp. (4 locations)
(gdb) ignore 1 999999
Will ignore next 999999 crossings of breakpoint 1.
(gdb) continue
The program is not being run.
(gdb) run
^C
Thread 1 "firefox" received signal SIGINT, Interrupt.
__lll_lock_wait (futex=0x7ffff7800018, private=0) at lowlevellock.c:52
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   <MULTIPLE>
        breakpoint already hit 89 times
        ignore next 999910 hits
Comment 16 Martin Liška 2020-10-29 15:00:25 UTC
(In reply to Sergei Trofimovich from comment #15)
> allocate_gcov_kvp() gets called 89 times. Tried as:

All right, it's quite close to what we have.
So can you please test the following patch:

diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 4e95c7c82ee..a899a02b765 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -293,7 +293,7 @@ GCOV_COUNTERS
 #define GCOV_TOPN_MAXIMUM_TRACKED_VALUES 32
 
 /* Number of pre-allocated gcov_kvp structures.  */
-#define GCOV_PREALLOCATED_KVP 64
+#define GCOV_PREALLOCATED_KVP 128
 
 /* Convert a counter index to a tag.  */
 #define GCOV_TAG_FOR_COUNTER(COUNT)				\
Comment 17 Martin Liška 2020-10-29 18:54:46 UTC
Or I may have a smarter trick: let's do dry run of malloc/free functions early in __gcov_init. Can you please test the patch as well?

commit d80cecdb87d159f0b3a4072b6e1ef80515ad2afb
Author: Martin Liska <mliska@suse.cz>
Date:   Thu Oct 29 19:52:37 2020 +0100

    PGO: use dry run of malloc and free functions.

diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index e53e4dc392a..dd481b48e22 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -608,12 +608,25 @@ __gcov_exit (void)
   gcov_error_exit ();
 }
 
+/* Test memory that forces malloc and free functions to be called early
+   so that allocate_gcov_kvp can use calloc.  Indirect call counters
+   taken from a pre-allocated array are used before that.  */
+char *__gcov_test_memory;
+
 /* Add a new object file onto the bb chain.  Invoked automatically
   when running an object file's global ctors.  */
 
 void
 __gcov_init (struct gcov_info *info)
 {
+  static int malloc_utilized = 0;
+  if (!malloc_utilized)
+    {
+      __gcov_test_memory  = malloc (16);
+      free (__gcov_test_memory);
+      malloc_utilized = 1;
+    }
+
   if (!info->version || !info->n_functions)
     return;
   if (gcov_version (info, info->version, 0))
Comment 18 Jakub Jelinek 2020-10-29 19:11:19 UTC
Formatting - one space after = rather than two.
Why do you need the __gcov_test_memory variable at all?
If you want to avoid optimizing away the malloc/free pair, just make it volatile
- char *volatile p = malloc (16); free (p); or use a memory barrier in there.
Can the function be called from multiple threads concurrently?
I guess it isn't a big deal if malloc/free pair is called in each of them once, but perhaps it should use __atomic_load_n and __atomic_store_n for the malloc_initialized variable?
Comment 19 Sergei Trofimovich 2020-10-30 08:45:38 UTC
(In reply to Martin Liška from comment #16)
> (In reply to Sergei Trofimovich from comment #15)
> > allocate_gcov_kvp() gets called 89 times. Tried as:
> 
> All right, it's quite close to what we have.
> So can you please test the following patch:
> 
> diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
> index 4e95c7c82ee..a899a02b765 100644
> --- a/gcc/gcov-io.h
> +++ b/gcc/gcov-io.h
> @@ -293,7 +293,7 @@ GCOV_COUNTERS
>  #define GCOV_TOPN_MAXIMUM_TRACKED_VALUES 32
>  
>  /* Number of pre-allocated gcov_kvp structures.  */
> -#define GCOV_PREALLOCATED_KVP 64
> +#define GCOV_PREALLOCATED_KVP 128
>  
>  /* Convert a counter index to a tag.  */
>  #define GCOV_TAG_FOR_COUNTER(COUNT)				\

Still hangs somewhere in the middle of firefox inners:

(gdb) bt
#0  __lll_lock_wait (futex=0x7f03a8900018, private=0) at lowlevellock.c:52
#1  0x00007f03a9097305 in __GI___pthread_mutex_lock (mutex=0x7f03a8900018) at ../nptl/pthread_mutex_lock.c:135
#2  0x0000556eee199141 in arena_t::MallocSmall(unsigned long, bool) ()
#3  0x0000556eee19a06c in arena_t::Malloc(unsigned long, bool) ()
#4  0x0000556eee19b1e2 in calloc ()
#5  0x0000556eee28ba2b in allocate_gcov_kvp () at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov.h:431
#6  gcov_topn_add_value (count=1, use_atomic=1, increment_total=1, value=480, counters=0x556eee2fc8c0 <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov.h:477
#7  __gcov_topn_values_profiler_body (use_atomic=1, value=480, counters=0x556eee2fc8c0 <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-profiler.c:103
#8  __gcov_topn_values_profiler_atomic (counters=0x556eee2fc8c0 <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>, value=480)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-profiler.c:128
#9  0x0000556eee198055 in arena_t::DallocSmall(arena_chunk_t*, void*, arena_chunk_map_t*) ()
#10 0x0000556eee19865c in free ()
#11 0x00007f03a8ca5bff in _IO_deallocate_file (fp=0x7f0372220b60) at libioP.h:863
#12 _IO_new_fclose (fp=0x7f0372220b60) at iofclose.c:74
#13 0x00007f039dbd5b62 in __gcov_close () at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/../gcc/gcov-io.c:212
#14 0x00007f039dbd62ed in dump_one_gcov (run_max=139, run_counted=0, gf=<synthetic pointer>, gi_ptr=0x7f03a3c432e0)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-driver.c:516
#15 gcov_do_dump (list=<optimized out>, run_counted=0) at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-driver.c:555
#16 0x00007f039dbd7152 in __gcov_dump_one (root=0x7f03a616b700 <__gcov_root>) at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-driver.c:578
#17 __gcov_dump_one (root=0x7f03a616b700 <__gcov_root>) at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-driver.c:573
#18 0x00007f039dbd7318 in __gcov_dump_int () at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-interface.c:160
#19 __gcov_dump () at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-interface.c:170
#20 0x00007f039dbd5497 in __gcov_execve
    (path=0x7f036db530b0 "/home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/firefox", argv=0x7f0372632a00, envp=0x7f036cc8e000)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-interface.c:343
#21 0x00007f039388b0fb in base::LaunchApp(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, base::LaunchOptions const&, int*) ()
    at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#22 0x00007f0393929bac in mozilla::ipc::PosixProcessLauncher::DoLaunch() () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#23 0x00007f039392a35a in mozilla::ipc::BaseProcessLauncher::PerformAsyncLaunch() ()
    at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#24 0x00007f0393920296 in mozilla::detail::ProxyRunnable<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, false>, RefPtr<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, false> > (mozilla::ipc::BaseProcessLauncher::*)(), mozilla::ipc::BaseProcessLauncher>::Run() ()
    at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#25 0x00007f0392db38f0 in mozilla::TaskQueue::Runner::Run() () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#26 0x00007f0392d978f5 in nsThread::ProcessNextEvent(bool, bool*) () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#27 0x00007f0392d77130 in NS_ProcessNextEvent(nsIThread*, bool) () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#28 0x00007f03939554cd in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ()
    at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#29 0x00007f0393891ecd in MessageLoop::Run() () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#30 0x00007f0392dbc45a in nsThread::ThreadFunc(void*) () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#31 0x00007f03a8adf880 in _pt_root () at /usr/lib64/libnspr4.so
#32 0x00007f03a9094e6e in start_thread (arg=0x7f0372755640) at pthread_create.c:463
#33 0x00007f03a8d2ed2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Comment 20 Sergei Trofimovich 2020-10-30 23:37:34 UTC
(In reply to Martin Liška from comment #17)
> Or I may have a smarter trick: let's do dry run of malloc/free functions
> early in __gcov_init. Can you please test the patch as well?
> 
> commit d80cecdb87d159f0b3a4072b6e1ef80515ad2afb
> Author: Martin Liska <mliska@suse.cz>
> Date:   Thu Oct 29 19:52:37 2020 +0100
> 
>     PGO: use dry run of malloc and free functions.
> 
> diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
> index e53e4dc392a..dd481b48e22 100644
> --- a/libgcc/libgcov-driver.c
> +++ b/libgcc/libgcov-driver.c
> @@ -608,12 +608,25 @@ __gcov_exit (void)
>    gcov_error_exit ();
>  }
>  
> +/* Test memory that forces malloc and free functions to be called early
> +   so that allocate_gcov_kvp can use calloc.  Indirect call counters
> +   taken from a pre-allocated array are used before that.  */
> +char *__gcov_test_memory;
> +
>  /* Add a new object file onto the bb chain.  Invoked automatically
>    when running an object file's global ctors.  */
>  
>  void
>  __gcov_init (struct gcov_info *info)
>  {
> +  static int malloc_utilized = 0;
> +  if (!malloc_utilized)
> +    {
> +      __gcov_test_memory  = malloc (16);
> +      free (__gcov_test_memory);
> +      malloc_utilized = 1;
> +    }
> +
>    if (!info->version || !info->n_functions)
>      return;
>    if (gcov_version (info, info->version, 0))

Did not seem to help (I only applied the patch, did not bump GCOV_PREALLOCATED_KVP 64 to 128). The backtrace of new deadlock:

(gdb) bt
#0  __lll_lock_wait (futex=0x7eff76800018, private=0) at lowlevellock.c:52
#1  0x00007eff76f5f305 in __GI___pthread_mutex_lock (mutex=0x7eff76800018) at ../nptl/pthread_mutex_lock.c:135
#2  0x0000563beb0fd171 in arena_t::MallocSmall(unsigned long, bool) ()
#3  0x0000563beb0fe09c in arena_t::Malloc(unsigned long, bool) ()
#4  0x0000563beb0ff212 in calloc ()
#5  0x0000563beb1efa5b in allocate_gcov_kvp () at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov.h:431
#6  gcov_topn_add_value (count=1, use_atomic=1, increment_total=1, value=496, counters=0x563beb2608c0 <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov.h:477
#7  __gcov_topn_values_profiler_body (use_atomic=1, value=496, counters=0x563beb2608c0 <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-profiler.c:103
#8  __gcov_topn_values_profiler_atomic (counters=0x563beb2608c0 <__gcov3._ZL20arena_run_reg_dallocP11arena_run_tP11arena_bin_tPvm>, value=496)
    at /var/tmp/portage/sys-devel/gcc-11.0.0_pre9999/work/gcc-11.0.0_pre9999/libgcc/libgcov-profiler.c:128
#9  0x0000563beb0fc085 in arena_t::DallocSmall(arena_chunk_t*, void*, arena_chunk_map_t*) ()
#10 0x0000563beb0fc68c in free ()
#11 0x00007eff59e73047 in driConcatConfigs () at /usr/lib64/dri/swrast_dri.so
#12 0x00007eff599f6714 in dri_init_screen_helper () at /usr/lib64/dri/swrast_dri.so
#13 0x00007eff599efe63 in drisw_init_screen () at /usr/lib64/dri/swrast_dri.so
#14 0x00007eff59e73c15 in driCreateNewScreen2 () at /usr/lib64/dri/swrast_dri.so
#15 0x00007eff5a536c44 in driswCreateScreen () at /usr/lib64/libGLX_mesa.so.0
#16 0x00007eff5a53ce10 in __glXInitialize () at /usr/lib64/libGLX_mesa.so.0
#17 0x00007eff5a539371 in glXQueryServerString () at /usr/lib64/libGLX_mesa.so.0
#18 0x00007eff74f82b79 in epoxy_glx_version () at /usr/lib64/libepoxy.so.0
#19 0x00007eff75373645 in gdk_x11_screen_init_gl () at /usr/lib64/libgdk-3.so.0
#20 0x00007eff753739ea in _gdk_x11_screen_update_visuals_for_gl () at /usr/lib64/libgdk-3.so.0
#21 0x00007eff7537c9d8 in _gdk_x11_screen_init_visuals () at /usr/lib64/libgdk-3.so.0
#22 0x00007eff753799ab in _gdk_x11_screen_new () at /usr/lib64/libgdk-3.so.0
#23 0x00007eff75369450 in _gdk_x11_display_open () at /usr/lib64/libgdk-3.so.0
#24 0x00007eff7533b8c8 in gdk_display_manager_open_display () at /usr/lib64/libgdk-3.so.0
#25 0x00007eff697ad710 in XREMain::XRE_mainStartup(bool*) () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#26 0x00007eff697b2bd5 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) ()
    at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#27 0x00007eff697b3bf3 in XRE_main(int, char**, mozilla::BootstrapConfig const&) () at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#28 0x00007eff697bc280 in mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) ()
    at /home/slyfox/tmp/portage/www-client/firefox-82.0/work/firefox_build/instrumented/dist/firefox/libxul.so
#29 0x0000563beb0e53a0 in do_main(int, char**, char**) ()
#30 0x0000563beb0e30d1 in main ()

I suspect that this custom malloc hits TOPN (and resizes dynamic array) frequently.
Comment 21 GCC Commits 2020-11-06 13:47:54 UTC
The master branch has been updated by Martin Liska <marxin@gcc.gnu.org>:

https://gcc.gnu.org/g:15bcd01a94c23f312ac03b5faea29f18ef8d2a07

commit r11-4785-g15bcd01a94c23f312ac03b5faea29f18ef8d2a07
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Fri Nov 6 14:45:06 2020 +0100

    testsuite: fix malloc alignment in test
    
    gcc/testsuite/ChangeLog:
    
            PR gcov-profile/97461
            * gcc.dg/tree-prof/pr97461.c: Return aligned memory.
Comment 22 Martin Liška 2020-11-06 13:49:00 UTC
(In reply to CVS Commits from comment #21)
> The master branch has been updated by Martin Liska <marxin@gcc.gnu.org>:
> 
> https://gcc.gnu.org/g:15bcd01a94c23f312ac03b5faea29f18ef8d2a07
> 
> commit r11-4785-g15bcd01a94c23f312ac03b5faea29f18ef8d2a07
> Author: Kewen Lin <linkw@linux.ibm.com>
> Date:   Fri Nov 6 14:45:06 2020 +0100
> 
>     testsuite: fix malloc alignment in test
>     
>     gcc/testsuite/ChangeLog:
>     
>             PR gcov-profile/97461
>             * gcc.dg/tree-prof/pr97461.c: Return aligned memory.

Bad revision, it belongs to PR97594.
Comment 23 Martin Liška 2020-11-09 09:57:01 UTC
Thank you for the testing. Neither of the suggested changes help us.
The core issue is that gcov does not know if it's called from a free/malloc function.

That said, I see currently 2 possible fixes:

1. help from a malloc library that uses __gcov_supress_malloc/__gcov_allow_malloc:

commit ee382a226ba0e76e2242dcbb4768033a4809a890
Author: Martin Liska <mliska@suse.cz>
Date:   Fri Nov 6 16:59:43 2020 +0100

    WIP patch.

diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index d6075d32bd4..b235231f9d6 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -907,7 +907,7 @@ LIBGCOV_PROFILER = _gcov_interval_profiler				\
 LIBGCOV_INTERFACE = _gcov_dump _gcov_fork				\
 	_gcov_execl _gcov_execlp					\
 	_gcov_execle _gcov_execv _gcov_execvp _gcov_execve _gcov_reset  \
-	_gcov_lock_unlock
+	_gcov_lock_unlock _gcov_supress_malloc _gcov_allow_malloc
 LIBGCOV_DRIVER = _gcov
 
 libgcov-merge-objects = $(patsubst %,%$(objext),$(LIBGCOV_MERGE))
diff --git a/libgcc/gcov.h b/libgcc/gcov.h
index 0e3eed31032..444f996692e 100644
--- a/libgcc/gcov.h
+++ b/libgcc/gcov.h
@@ -33,4 +33,12 @@ extern void __gcov_reset (void);
 
 extern void __gcov_dump (void);
 
+/* Prevent from calling malloc.  */
+
+extern void __gcov_supress_malloc (void);
+
+/* Allow calling malloc.  */
+
+extern void __gcov_allow_malloc (void);
+
 #endif /* GCC_GCOV_H */
diff --git a/libgcc/libgcov-interface.c b/libgcc/libgcov-interface.c
index 3a8a5bf44b8..bcfc38b85c3 100644
--- a/libgcc/libgcov-interface.c
+++ b/libgcc/libgcov-interface.c
@@ -36,6 +36,14 @@ void __gcov_reset (void) {}
 void __gcov_dump (void) {}
 #endif
 
+#ifdef L_gcov_supress_malloc
+void __gcov_supress_malloc (void) {}
+#endif
+
+#ifdef L_gcov_allow_malloc
+void __gcov_allow_malloc (void) {}
+#endif
+
 #else
 
 extern __gthread_mutex_t __gcov_mx ATTRIBUTE_HIDDEN;
@@ -174,6 +182,20 @@ __gcov_dump (void)
 
 #endif /* L_gcov_dump */
 
+#ifdef L_gcov_supress_malloc
+void __gcov_supress_malloc (void)
+{
+  __gcov_block_malloc = 1;
+}
+#endif /* L_gcov_supress_malloc */
+
+#ifdef L_gcov_allow_malloc
+void __gcov_allow_malloc (void)
+{
+  __gcov_block_malloc = 0;
+}
+#endif /* L_gcov_block_malloc */
+
 #ifdef L_gcov_fork
 /* A wrapper for the fork function.  We reset counters in the child
    so that they are not counted twice.  */
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 45ab93c9776..b83b230cd9c 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -141,6 +141,12 @@ __thread
 #endif
 struct indirect_call_tuple __gcov_indirect_call;
 
+// TODO: move to a proper place
+#if defined(HAVE_CC_TLS) && !defined (USE_EMUTLS)
+__thread
+#endif
+int __gcov_block_malloc;
+
 /* By default, the C++ compiler will use function addresses in the
    vtable entries.  Setting TARGET_VTABLE_USES_DESCRIPTORS to nonzero
    tells the compiler to use function descriptors instead.  The value
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index e70cf63b414..2eaed9548ea 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -247,6 +247,12 @@ struct indirect_call_tuple
   /* Pointer to counters.  */
   gcov_type *counters;
 };
+
+extern
+#if defined(HAVE_CC_TLS) && !defined (USE_EMUTLS)
+__thread
+#endif
+int __gcov_block_malloc;
   
 /* Exactly one of these will be active in the process.  */
 extern struct gcov_master __gcov_master;
@@ -428,7 +434,12 @@ allocate_gcov_kvp (void)
 #endif
 
   if (new_node == NULL)
-    new_node = (struct gcov_kvp *)xcalloc (1, sizeof (struct gcov_kvp));
+    {
+#if !defined(IN_GCOV_TOOL)
+      if (!__gcov_block_malloc)
+	new_node = (struct gcov_kvp *)xcalloc (1, sizeof (struct gcov_kvp));
+#endif
+    }
 
   return new_node;
 }

2) make a dynamic allocation for counters for TOPN at the very beginning in __gcov_init.

@Honza: What do you think about it?
Comment 24 Martin Liška 2020-12-04 07:01:20 UTC
@Honza: PING
Comment 25 Richard Biener 2021-01-21 09:26:40 UTC
How about allocating memory by other means (brk/mmap) instead?  That is, run
your own simple allocator?  I suppose gcov never frees things (until the end)
so it could be quite simple one?  And fall back to static allocated counters
for targets that we do not handle?  Is there a command-line option to
choose "static" counters?

More possibilities involve libdl and dlsym to iterate to 'malloc' from a
different shared object than the one profiled.  Or on glibc platforms use
__malloc instead of malloc (via alias).

That said, using plain 'malloc' was a bad idea.
Comment 26 Martin Liška 2021-01-26 11:51:43 UTC
I sent a patch that uses the mmap approach.
Comment 27 GCC Commits 2021-03-03 13:22:08 UTC
The master branch has been updated by Martin Liska <marxin@gcc.gnu.org>:

https://gcc.gnu.org/g:00d79dc4be0b86ec564cfa2b32c47de6c07449e6

commit r11-7479-g00d79dc4be0b86ec564cfa2b32c47de6c07449e6
Author: Martin Liska <mliska@suse.cz>
Date:   Wed Jan 13 11:17:03 2021 +0100

    gcov: use mmap pools for KVP.
    
    gcc/ChangeLog:
    
            PR gcov-profile/97461
            * gcov-io.h (GCOV_PREALLOCATED_KVP): Remove.
    
    libgcc/ChangeLog:
    
            PR gcov-profile/97461
            * config.in: Regenerate.
            * configure: Likewise.
            * configure.ac: Check sys/mman.h header file
            * libgcov-driver.c (struct gcov_kvp): Remove static
            pre-allocated pool and use a dynamic one.
            * libgcov.h (MMAP_CHUNK_SIZE): New.
            (gcov_counter_add): Use mmap to allocate pool for struct
            gcov_kvp.
Comment 28 Martin Liška 2021-03-03 13:27:18 UTC
Should be fixed now.
Comment 29 Jakub Jelinek 2021-03-03 13:31:49 UTC
CCing some folks familiar with Windows, while Martin has committed a mmap based solution, I think it will not work on Windows, but CreateFileMapping/MapViewOfFile/UnmapViewOfFile/CloseHandle could be a Windows replacement for that:
https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565575.html
Can anyone of you please have a look into that?
I don't have access to Windows (and my experience with it is 25+ years old anyway), and Martin doesn't have either.
Comment 30 Martin Liška 2021-03-03 14:49:25 UTC
> CCing some folks familiar with Windows, while Martin has committed a mmap
> based solution, I think it will not work on Windows, but

Note the libgcov still falls back to malloc. So if there's not a custom allocator (like tcmalloc) that does not allow recursively, all should be fine.
Comment 31 LIU Hao 2021-03-03 15:14:42 UTC
(In reply to Jakub Jelinek from comment #29)
> CCing some folks familiar with Windows, while Martin has committed a mmap
> based solution, I think it will not work on Windows, but
> CreateFileMapping/MapViewOfFile/UnmapViewOfFile/CloseHandle could be a
> Windows replacement for that:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565575.html
> Can anyone of you please have a look into that?
> I don't have access to Windows (and my experience with it is 25+ years old
> anyway), and Martin doesn't have either.

Windows has native heap management functions [1] so there is no need to play with file mappings like `mmap()`.

`malloc(size)` becomes `HeapAlloc(GetProcessHeap(), 0, size)`, and
`calloc(nmemb, size)` becomes `HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, nmemb * size)` except for the overflow check, and
`realloc(ptr, size)` becomes `ptr ? HeapReAlloc(GetProcessHeap(), 0, ptr, size) : HeapAlloc(GetProcessHeap(), 0, size)`, and
`free(ptr)` becomes `ptr && HeapFree(GetProcessHeap(), ptr)`.


[1] https://docs.microsoft.com/en-us/windows/win32/api/heapapi/nf-heapapi-heapalloc
Comment 32 Sergei Trofimovich 2021-03-05 18:11:37 UTC
(In reply to Martin Liška from comment #28)
> Should be fixed now.

Tested today's gcc against firefox-86 in lto+pgo mode. Built fine. Thank you!
Comment 33 Martin Liška 2021-03-06 08:19:12 UTC
> Tested today's gcc against firefox-86 in lto+pgo mode. Built fine. Thank you!

I thank you. You tested various attempts I proposed :)