Bug 67992 - GCOV takes an absurdly long time to process a file
Summary: GCOV takes an absurdly long time to process a file
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: gcov-profile (show other bugs)
Version: unknown
: P3 normal
Target Milestone: ---
Assignee: Martin Liška
URL:
Keywords:
: 71046 (view as bug list)
Depends on:
Blocks:
 
Reported: 2015-10-16 18:26 UTC by Joshua Cranmer
Modified: 2016-08-05 13:55 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-05-20 00:00:00


Attachments
Simple file that exhibits the absurd slowdown. (1.03 KB, text/x-csrc)
2015-10-16 18:26 UTC, Joshua Cranmer
Details
Implementation of Hawick's algorithm in C++ (5.06 KB, text/plain)
2015-10-16 18:44 UTC, Joshua Cranmer
Details
reproducer based off linux kernel fs/xfs/libxfs/xfs_sb.c (486 bytes, text/plain)
2015-12-11 11:38 UTC, Jan Stancek
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Joshua Cranmer 2015-10-16 18:26:15 UTC
Created attachment 36529 [details]
Simple file that exhibits the absurd slowdown.

The loop processing code in gcov takes an absurdly long time to compile relatively small files. This comes most obviously into play if you have macro-heavy unrolled code (<https://dxr.mozilla.org/comm-central/source/mozilla/media/libjpeg/jchuff.c#562> is the case in mind here--it's so slow my scripts delete the file rather than wait to process that file).

How slow is it? With the attached file:
jcranmer@huitzilopochtli /tmp/gcov-bug $ gcc --coverage min.c
jcranmer@huitzilopochtli /tmp/gcov-bug $ ./a.out 
jcranmer@huitzilopochtli /tmp/gcov-bug $ time gcov -a -b -p min.c.gcda
File 'min.c'
Lines executed:25.00% of 8
Branches executed:0.00% of 168
Taken at least once:0.00% of 168
No calls
Creating 'min.c.gcov'


real    11m20.474s
user    11m21.476s
sys     0m0.000s

Some of the literature I've seen claims that the algorithm in use's runtime isn't O(N^3) (as gcov's comment claims) but actually O(k^N).

By comparison, using Hawick's algorithm (see <http://complexity.massey.ac.nz/cstn/013/cstn-013.pdf>), it took just:
jcranmer@huitzilopochtli /tmp/gcov-bug $ time ~/source/mozilla-tools/newcov/newcov min.c.gcda 
real    0m0.113s
user    0m0.112s
sys     0m0.000s
Comment 1 Joshua Cranmer 2015-10-16 18:44:45 UTC
Created attachment 36531 [details]
Implementation of Hawick's algorithm in C++

This is test code I wrote to figure out why I couldn't reproduce the output of gcov correctly (which eventually led me to discovering bug 67937). It's an ersatz variant of gcov (whose code is not included), except the latter half of processing was replaced with my own code. So there's a mixture of use of both gcov's arc_t, block_t, etc. structures with my own C++ classes Arc/Block/etc. I also ripped out the code that supported options I didn't need to use (I effectively only do gcov -a -b -p).

The main code in question is getCycleCounts (on line 494) and the cycle detection code that comprises the prior 100 lines of code. The has_negate/rerunning findCycles trick is needed because of bug 67937. This roughly replaces the main Tiernan's algorithm loop of accumulate_line_counts (about line 2200 of gcov.c) (the rest of the function is more fully captured by LineInfo::computeCounts/CoverageMap::computeLineCounts).

I'd do a patch myself, but, honestly, the C code of gcov is painful for me to follow, and I've never set myself up to do gcc development.
Comment 2 Richard Biener 2015-10-19 09:58:21 UTC
I think there is a duplicate about gcov slowness for some sort of CFGs.
Comment 3 Jan Stancek 2015-12-11 11:38:02 UTC
Created attachment 36992 [details]
reproducer based off linux kernel fs/xfs/libxfs/xfs_sb.c

I came across this problem while running lcov on gcov enabled kernel. gcov is really struggling with xfs_sb.c - it takes up to _36 hours_ to complete on Intel Xeon CPUs.

I narrowed it down to following condition:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_sb.c?h=v4.4-rc4#n229

And based on that I made the attached reproducer. To reproduce run:

gcc -O2 -fprofile-arcs -ftest-coverage test1.c
./a.out
gcov -b -c -d -a test1.gcda
Comment 4 Jakub Jelinek 2016-05-10 12:24:36 UTC
*** Bug 71046 has been marked as a duplicate of this bug. ***
Comment 5 Jakub Jelinek 2016-05-10 12:25:10 UTC
Shorter testcase:
struct A
{
  char a1, a2, a3, a4, a5, a6, a7, a8, a9;
  char b1, b2, b3, b4, b5, b6, b7, b8, b9;
  char c1, c2, c3, c4, c5;
};

struct A s;

int
main ()
{
  if (__builtin_expect (s.a1 < 0 || s.a1 > 200 || s.a2 < 0 || s.a2 > 200
			|| s.a3 < 0 || s.a3 > 200 || s.a4 < 0 || s.a4 > 200
			|| s.a5 < 0 || s.a5 > 200 || s.a6 < 0 || s.a6 > 200
			|| s.a7 < 0 || s.a7 > 200 || s.a8 < 0 || s.a8 > 200
			|| s.a9 < 0 || s.a9 > 200 || s.b1 < 0 || s.b1 > 200
			|| s.b2 < 0 || s.b2 > 200 || s.b3 < 0 || s.b3 > 200
			|| s.b4 < 0 || s.b4 > 200 || s.b5 < 0 || s.b5 > 200
			|| s.b6 < 0 || s.b6 > 200 || s.b7 < 0 || s.b7 > 200
			|| s.b8 < 0 || s.b8 > 200 || s.b9 < 0 || s.b9 > 200
			|| s.c1 < 0 || s.c1 > 200 || s.c2 < 0 || s.c2 > 200
			|| s.c3 < 0 || s.c3 > 200 || s.c4 < 0 || s.c4 > 200
			|| s.c5 < 0 || s.c5 > 200, 0))
    __builtin_printf ("hello\n");
  return 0;
}
Comment 6 Jan Hubicka 2016-05-20 14:45:54 UTC
I will take a look.  The algorithm choosing minimal spanning tree can be adjusted to make the solving algorithm converge faster. I will need to re-think the details. It is well over a decade.
Comment 7 Jan Hubicka 2016-05-20 14:48:38 UTC
Joshua: gcov is seriously old tool. There is quite some need to handle profile in more generic matter (for autoFDO/LTO and other cases). So if you have more flexible implementation, it would make sense to replace it ;))
Comment 8 Martin Liška 2016-08-05 13:55:51 UTC
Fixed on trunk in r239169.