Created attachment 36529 [details] Simple file that exhibits the absurd slowdown. The loop processing code in gcov takes an absurdly long time to compile relatively small files. This comes most obviously into play if you have macro-heavy unrolled code (<https://dxr.mozilla.org/comm-central/source/mozilla/media/libjpeg/jchuff.c#562> is the case in mind here--it's so slow my scripts delete the file rather than wait to process that file). How slow is it? With the attached file: jcranmer@huitzilopochtli /tmp/gcov-bug $ gcc --coverage min.c jcranmer@huitzilopochtli /tmp/gcov-bug $ ./a.out jcranmer@huitzilopochtli /tmp/gcov-bug $ time gcov -a -b -p min.c.gcda File 'min.c' Lines executed:25.00% of 8 Branches executed:0.00% of 168 Taken at least once:0.00% of 168 No calls Creating 'min.c.gcov' real 11m20.474s user 11m21.476s sys 0m0.000s Some of the literature I've seen claims that the algorithm in use's runtime isn't O(N^3) (as gcov's comment claims) but actually O(k^N). By comparison, using Hawick's algorithm (see <http://complexity.massey.ac.nz/cstn/013/cstn-013.pdf>), it took just: jcranmer@huitzilopochtli /tmp/gcov-bug $ time ~/source/mozilla-tools/newcov/newcov min.c.gcda real 0m0.113s user 0m0.112s sys 0m0.000s
Created attachment 36531 [details] Implementation of Hawick's algorithm in C++ This is test code I wrote to figure out why I couldn't reproduce the output of gcov correctly (which eventually led me to discovering bug 67937). It's an ersatz variant of gcov (whose code is not included), except the latter half of processing was replaced with my own code. So there's a mixture of use of both gcov's arc_t, block_t, etc. structures with my own C++ classes Arc/Block/etc. I also ripped out the code that supported options I didn't need to use (I effectively only do gcov -a -b -p). The main code in question is getCycleCounts (on line 494) and the cycle detection code that comprises the prior 100 lines of code. The has_negate/rerunning findCycles trick is needed because of bug 67937. This roughly replaces the main Tiernan's algorithm loop of accumulate_line_counts (about line 2200 of gcov.c) (the rest of the function is more fully captured by LineInfo::computeCounts/CoverageMap::computeLineCounts). I'd do a patch myself, but, honestly, the C code of gcov is painful for me to follow, and I've never set myself up to do gcc development.
I think there is a duplicate about gcov slowness for some sort of CFGs.
Created attachment 36992 [details] reproducer based off linux kernel fs/xfs/libxfs/xfs_sb.c I came across this problem while running lcov on gcov enabled kernel. gcov is really struggling with xfs_sb.c - it takes up to _36 hours_ to complete on Intel Xeon CPUs. I narrowed it down to following condition: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_sb.c?h=v4.4-rc4#n229 And based on that I made the attached reproducer. To reproduce run: gcc -O2 -fprofile-arcs -ftest-coverage test1.c ./a.out gcov -b -c -d -a test1.gcda
*** Bug 71046 has been marked as a duplicate of this bug. ***
Shorter testcase: struct A { char a1, a2, a3, a4, a5, a6, a7, a8, a9; char b1, b2, b3, b4, b5, b6, b7, b8, b9; char c1, c2, c3, c4, c5; }; struct A s; int main () { if (__builtin_expect (s.a1 < 0 || s.a1 > 200 || s.a2 < 0 || s.a2 > 200 || s.a3 < 0 || s.a3 > 200 || s.a4 < 0 || s.a4 > 200 || s.a5 < 0 || s.a5 > 200 || s.a6 < 0 || s.a6 > 200 || s.a7 < 0 || s.a7 > 200 || s.a8 < 0 || s.a8 > 200 || s.a9 < 0 || s.a9 > 200 || s.b1 < 0 || s.b1 > 200 || s.b2 < 0 || s.b2 > 200 || s.b3 < 0 || s.b3 > 200 || s.b4 < 0 || s.b4 > 200 || s.b5 < 0 || s.b5 > 200 || s.b6 < 0 || s.b6 > 200 || s.b7 < 0 || s.b7 > 200 || s.b8 < 0 || s.b8 > 200 || s.b9 < 0 || s.b9 > 200 || s.c1 < 0 || s.c1 > 200 || s.c2 < 0 || s.c2 > 200 || s.c3 < 0 || s.c3 > 200 || s.c4 < 0 || s.c4 > 200 || s.c5 < 0 || s.c5 > 200, 0)) __builtin_printf ("hello\n"); return 0; }
I will take a look. The algorithm choosing minimal spanning tree can be adjusted to make the solving algorithm converge faster. I will need to re-think the details. It is well over a decade.
Joshua: gcov is seriously old tool. There is quite some need to handle profile in more generic matter (for autoFDO/LTO and other cases). So if you have more flexible implementation, it would make sense to replace it ;))
Fixed on trunk in r239169.