[Bug gcov-profile/96913] gcc-11: __gcov_merge_topn hangs

slyfox at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Sun Sep 6 10:23:16 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96913

--- Comment #4 from Sergei Trofimovich <slyfox at gcc dot gnu.org> ---
(In reply to Sergei Trofimovich from comment #3)
> Specifically I think this is already a wrong format on disk:
> 
> > _json.gcda:    01a70000:   0:COUNTERS topn 0 counts
> > _json.gcda:    01a90000:  48:COUNTERS indirect_call 24 counts
> > _json.gcda:                   0: 1 1 140325305737168 1 1 140325305737200 0 0
> > _json.gcda:                   8: 0 0 0 0 0 0 0 0
> > _json.gcda:                  16: 0 0 0 0 0 0 0 0
> > ...
> 
> Assuming indirect_call is in a 'hist' value format it should  be in form of:
> 
>   [total_executions, N, value1, counter1, ..., valueN, counterN]
> 
> Main problem: we have more than one entry here (which might be ok):
> - record1 (ok):  total_executions=1 N=1 value1=140325305737168 counter1=1
> - record2 (bad): total_executions=1 N=140325305737200 counter=0 ...
> 
> This is where we trip over enormous N.
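
A minimal sketch in C of how a reader of that layout trips over the second
record, assuming the 'hist' form quoted above really is what the reader
expects. 'count_hist_records' is a hypothetical helper written for
illustration; it is not libgcov code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libgcov.  Walks a buffer laid out as
   repeated [total_executions, N, value1, counter1, ..., valueN, counterN]
   records.  Returns the number of well-formed records, or -1 as soon as
   a record claims more (value, counter) pairs than the buffer can hold.  */
static int
count_hist_records (const int64_t *buf, size_t len)
{
  size_t pos = 0;
  int records = 0;

  while (pos + 2 <= len)
    {
      int64_t n = buf[pos + 1];
      size_t remaining = len - pos - 2;

      /* Each pair needs two slots, so N cannot exceed remaining / 2.  */
      if (n < 0 || (uint64_t) n > remaining / 2)
        return -1;

      pos += 2 + 2 * (size_t) n;
      records++;
    }
  return records;
}
```

Fed the 24 counters from the dump above, the first record parses cleanly and
the second is rejected because its N field holds 140325305737200. A reader
without such a bounds check would try to walk that many pairs.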

Found one of the causes of the profiling data corruption. The bug happens
earlier, during the initial serialization of the indirect call counters. The
problematic code is:

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/libgcov-driver.c;h=58914268d4ece0b3a3a7dcb9cb21c4fa197fd770;hb=HEAD#l427

"""
 417       ci_ptr = gfi_ptr->ctrs;
 418       for (t_ix = 0; t_ix < GCOV_COUNTERS; t_ix++)
 419         {
 420           gcov_position_t n_counts;
 421 
 422           if (!gi_ptr->merge[t_ix])
 423             continue;
 424 
 425           n_counts = ci_ptr->num;
 426 
 427           if (gi_ptr->merge[t_ix] == __gcov_merge_topn)
 428             write_top_counters (ci_ptr, t_ix, n_counts);
 429           else
 430             {
 431               /* Do not stream when all counters are zero.  */
 432               int all_zeros = 1;
 433               for (unsigned i = 0; i < n_counts; i++)
 434                 if (ci_ptr->values[i] != 0)
 435                   {
 436                     all_zeros = 0;
 437                     break;
 438                   }
"""

The problematic line here is 'if (gi_ptr->merge[t_ix] == __gcov_merge_topn)'.

In the case of tauthon, '__gcov_merge_topn' is defined in two places:
1. in the 'tauthon' binary itself
2. in any shared library loaded by tauthon. It looks like
'libtauthon2.8.so.1.0' is the first one.

'__gcov_merge_topn' is defined as a hidden symbol and gets resolved to a local
symbol in each of them:

$ x86_64-pc-linux-gnu-nm tauthon | fgrep gcov_merge_top
000000000040387f t __gcov_merge_topn
$ x86_64-pc-linux-gnu-nm libtauthon2.8.so.1.0 | fgrep gcov_merge_top
000000000029a202 t __gcov_merge_topn
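
To illustrate why two local copies matter for the '==' check, here is a
minimal sketch in C. The names 'merge_topn_in_exe', 'merge_topn_in_lib' and
'uses_topn_path' are hypothetical stand-ins, and the whole thing lives in one
translation unit rather than in two real modules:

```c
/* Same shape as a counter merge callback; the signature is an
   assumption made for this sketch only.  */
typedef void (*merge_fn) (long *counters, unsigned n);

/* Stand-ins for two modules' private copies of one hidden function.
   Distinct functions have distinct addresses.  */
static void
merge_topn_in_exe (long *counters, unsigned n)
{
  (void) counters;
  (void) n;
}

static void
merge_topn_in_lib (long *counters, unsigned n)
{
  (void) counters;
  (void) n;
}

/* Mirrors the shape of the check on line 427: the counter kind is
   identified purely by comparing function addresses.  */
static int
uses_topn_path (merge_fn recorded, merge_fn local_copy)
{
  return recorded == local_copy;
}
```

Here 'uses_topn_path (merge_topn_in_exe, merge_topn_in_lib)' is false even
though both functions do the same thing. If the pointer stored in
'gi_ptr->merge' comes from a different module than the code doing the
comparison, the counters would be treated as non-topn in the same way.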

I don't know yet where 'gi_ptr->merge' gets filled in to leak the executable's
symbol into the binary.

