Bug 47618 - Collecting multiple profiles and using all for PGO
Summary: Collecting multiple profiles and using all for PGO
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: gcov-profile
Version: 4.6.0
Importance: P3 enhancement
Target Milestone: 9.0
Assignee: Martin Liška
Reported: 2011-02-06 00:28 UTC by Roland Schulz
Modified: 2022-01-06 18:56 UTC

Last reconfirmed: 2012-07-21 00:00:00


Attachments
Patch for adding merge-gcda (8.00 KB, text/plain)
2012-07-24 22:12 UTC, Andrew Pinski
Patch candidate (1.46 KB, patch)
2017-06-06 19:05 UTC, Martin Liška

Description Roland Schulz 2011-02-06 00:28:24 UTC
Currently only the file from one profiling run can be used for PGO. Especially for MPI programs it would be nice if several folders containing profiling files could be merged or several directories could be used together for -fprofile-use.

For saving the profiling files it would be great if the folder name could contain an environment variable or could be set by an environment variable.

Thus I suggest that one could either say:
-fprofile-dir /some/path/%q{SOME_ENV}  #same syntax as valgrind
or
export GCC_PROFILE_DIR=/some/path/$SOME_ENV

This would be very useful because MPI implementations provide the MPI rank as an environment variable. With this suggestion, one could store the profile of each MPI rank in a different folder.
Comment 1 Steven Bosscher 2012-07-21 23:57:18 UTC
A tool to merge multiple gcda files shouldn't be very difficult to write. I don't think this should be done by the compiler itself; that would greatly complicate things. But a separate tool, gcov-merge say, would work, and this isn't a big job to create using libgcov (and gcov-dump as an example). You'd also be able to merge profile information from different directories.

Would something like the above work for you?
Comment 2 Andrew Pinski 2012-07-22 00:42:28 UTC
We have one internally at Cavium which is designed to run afterwards and merge a few gcda files.  It is designed for how we run multi-core programs and write a gcda file for each run.

And there is one here:
http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00423.html
Comment 3 xunxun 2012-07-22 06:56:11 UTC
(In reply to comment #1)
> A tool to merge multiple gcda files shouldn't be very difficult to write. I
> don't think this should be done by the compiler itself, that would greatly
> complicate things. But a separate tool, gcov-merge say, would work, and this
> isn't a big job to create using libgcov (and gcov-dump as an example). You'd
> also be able to merge profile information from different directories.
> 
> Would something like the above work for you?

But the VC and Intel compilers can automatically merge all PGO information.

Will GCC get similar behavior?
Comment 4 Andrew Pinski 2012-07-22 07:06:47 UTC
-fprofile-dir= is already implemented.
Comment 5 Steven Bosscher 2012-07-22 10:23:37 UTC
(In reply to comment #3)
> (In reply to comment #1)
> > A tool to merge multiple gcda files shouldn't be very difficult to write. I
> > don't think this should be done by the compiler itself, that would greatly
> > complicate things. But a separate tool, gcov-merge say, would work, and this
> > isn't a big job to create using libgcov (and gcov-dump as an example). You'd
> > also be able to merge profile information from different directories.
> > 
> > Would something like the above work for you?
> 
> But for VC and Intel Compiler
> 
> they can auto merge all PGO information.
> 
> Will we make gcc to have the similar behavior?

xunxun,

GCC does merge profile information from different runs into one gcda file. It works differently from ICC in that ICC produces one .dyn file per test run and uses profmerge to merge multiple .dyn files into a summary file. GCC does this merging from multiple runs automatically.

What GCC does not do is merge multiple gcda files (which would be the equivalent of merging multiple pgopti.dpi files with ICC).

The issue in this problem report is that with MPI there will be multiple images of the same program running simultaneously. The different images can't share the same set of gcda files (you'd have races), so each image generates its own set of gcda files. For that, a new merge tool is necessary.

Ideally, this tool would also run transparently. One way to do this could be to take multiple arguments for -fprofile-dir and merge profile info from each directory.
Comment 6 Steven Bosscher 2012-07-22 10:46:30 UTC
(In reply to comment #2)
> We have one internally at Cavium which is designed to run afterwards and merge
> a few gcda files.  It is designed for how we run multi-core programs and write a
> gcda file for each run.

And now, of course, you're going to contribute that? ;-)


> And there is one here:
> http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00423.html

This merges results for files without their own gcno file but mentioned more than once in gcda files for multiple source files (e.g. for inline functions in headers). You can't merge multiple gcda files for one source file, but the patch does provide the infrastructure to support this.
Comment 7 Andrew Pinski 2012-07-24 22:12:44 UTC
Created attachment 27869 [details]
Patch for adding merge-gcda

Here is the patch which adds merge-gcda.  I didn't add any testcases, as it is currently designed only for how Cavium's Simple-exec works, in that each core writes out its own gcda file.
Comment 8 Andrew Pinski 2012-07-24 23:20:14 UTC
(In reply to comment #7)
> Created attachment 27869 [details]
> Patch for adding merge-gcda

I am changing the copyright over to the FSF based on the fact Cavium (Networks) has a blanket copyright assignment in place.  I just forgot to do it in the patch itself.
Comment 9 Roland Schulz 2012-07-24 23:52:41 UTC
I think a tool to merge would be a good partial solution.

As far as I can see, what would still be missing for user-friendly usage is a mechanism to guarantee that all pre-merged files are saved with different names, so that different processes don't overwrite each other's output files. In the case of MPI, one would want to have the MPI rank as part of the output folder to guarantee unique file names. Thus my suggestion to support -fprofile-dir /some/path/%q{SOME_ENV}, where SOME_ENV would be the environment variable containing the MPI rank. Without being able to make the output path depend on an environment variable, one would be required to write some wrapper scripts, and that might not even be possible in all cases.
Comment 10 Andrew Pinski 2012-07-25 00:05:40 UTC
> so that different processes don't overwrite each other's output files. 

They don't overwrite each other, rather they are merged together at write out time.
Comment 11 Roland Schulz 2012-07-25 00:50:30 UTC
Steven wrote that they are not merged but that race conditions occur. That is also what I observed. To clarify: Message Passing Interface (MPI) is a parallelization method which executes the same binary multiple times in parallel (with support for messages for communication). Merging the output into one file at runtime would require file locking (often over network file systems) and would not scale, because MPI applications are often run with more than 10,000 (or even more than 1M) parallel processes simultaneously.
Comment 12 Steven Bosscher 2012-07-25 08:24:49 UTC
(In reply to comment #9)
> I think a tool to merge would be a good partial solution.

We will go with the tool solution. I'll take care of the tool before GCC 4.8, if that's OK with apinski.

I think we shouldn't have a new tool, though. I'd prefer to teach the gcov program to do it instead. What would you prefer?


> As far as I can see what would still be missing for user-friendly usage, is a
> mechanism to guarantee that all pre-merged files are saved with different
> names, so that different processes don't overwrite each other's output files.

Deeply buried in the GCC manuals is this section:
http://gcc.gnu.org/onlinedocs/gcc-4.7.1/gcc/Cross_002dprofiling.html

With the right combination of GCOV_PREFIX_STRIP and GCOV_PREFIX, it should be possible to send the gcda files to unique directories per MPI rank. But I think that a more practical solution is necessary. (I also don't know how these environment variables interact with -fprofile-dir. I doubt anyone looked into this before now...)

I like the %q (and %p) variables from Valgrind, and I don't think it's very hard to add support for them in libgcov.
(http://valgrind.org/docs/manual/manual-core.html)
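
As a rough sketch of the GCOV_PREFIX route mentioned above, a wrapper could derive one directory per MPI rank before starting the instrumented binary. The helper name `rank_gcov_prefix` and the `/tmp/profiles` path are made up for illustration, and the rank variables OMPI_COMM_WORLD_RANK (Open MPI) and PMI_RANK (MPICH-style launchers) are assumptions about the MPI runtime in use:

```shell
#!/bin/sh
# Hypothetical helper: print a per-rank profile directory under base dir $1.
# Assumption: the MPI launcher exports the rank as OMPI_COMM_WORLD_RANK
# (Open MPI) or PMI_RANK (MPICH-style); rank 0 is used if neither is set.
rank_gcov_prefix() {
    printf '%s/rank-%s\n' "$1" "${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}"
}

# Intended use inside a wrapper script started by the MPI launcher
# (shown as a comment, not executed here):
#   GCOV_PREFIX=$(rank_gcov_prefix /tmp/profiles) exec ./a.out
```

With only GCOV_PREFIX set, the gcda files should end up under that prefix followed by the absolute path recorded in the object file; GCOV_PREFIX_STRIP=N would additionally drop the first N directories of that path.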
Comment 13 Martin Liška 2017-04-13 14:59:17 UTC
The issue is quite old; however, it's probably still valid. Implementing something similar to what Valgrind does with '%p' and '%q{VAR}' is an elegant solution. I can work on that for GCC 8 if there's interest.
Comment 14 Martin Liška 2017-06-06 19:05:52 UTC
Created attachment 41481 [details]
Patch candidate

I'm attaching a patch that supports the following expansions in the value of the -fprofile-dir option (or in the arguments of -fprofile-generate and -fprofile-use):

%w - expands during compile time to working directory; it's handy when one wants to preserve tree hierarchy of gcda files corresponding to another build directory
%p - expands during run-time to PID
%q{ENV} - expands to value of environmental variable 'ENV' during run-time

Having that, I guess we can eventually drop GCOV_PREFIX_STRIP and GCOV_PREFIX as one can use -fprofile-dir="%q{PREFIX}" and then e.g. set PREFIX="../my/folder/".

Feel free to comment on the patch.
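
To illustrate the two run-time expansions listed above, here is a minimal shell sketch of how a path containing %p and %q{ENV} could be rewritten. It is not the libgcov code from the patch; the function name `expand_profile_path` is hypothetical, and expanding an unset variable to an empty string is an assumption. The compile-time %w is deliberately left untouched.

```shell
#!/bin/sh
# Sketch only: expand %p and %q{VAR} in a path, roughly as described above.
# Assumption: an unset (or invalid) variable name expands to an empty string.
expand_profile_path() {
    rest=$1
    out=
    while [ -n "$rest" ]; do
        case $rest in
            %p*)
                out=$out$$                 # %p -> PID of this shell
                rest=${rest#??}            # drop '%p'
                ;;
            %q\{*\}*)
                rest=${rest#???}           # drop '%q{'
                name=${rest%%\}*}          # text up to the first '}'
                rest=${rest#*\}}           # drop the name and the '}'
                case $name in
                    ''|[0-9]*|*[!A-Za-z0-9_]*) ;;   # not a valid name: nothing
                    *) eval "out=\$out\${$name-}" ;; # append the variable's value
                esac
                ;;
            *)
                tail=${rest#?}
                out=$out${rest%"$tail"}    # copy one literal character
                rest=$tail
                ;;
        esac
    done
    printf '%s\n' "$out"
}
```

For example, with SOME_ENV=3 exported, `expand_profile_path '/some/path/%q{SOME_ENV}'` prints `/some/path/3`. On a real command line the option value would need single quotes so that the shell leaves the braces alone.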
Comment 15 Martin Liška 2017-06-07 09:26:42 UTC
Adding Andrew: may I ask for your opinion about the suggested patch/approach?
Comment 16 Martin Liška 2017-11-09 09:19:58 UTC
I don't see any feedback, so I'm leaving the PR then ...
Comment 17 Petr Špaček 2017-12-23 22:26:08 UTC
I found this bug while searching for a way to solve exactly this problem, so for the record: It sounds like a very good and useful addition. Thank you!
Comment 18 Martin Liška 2017-12-27 10:16:11 UTC
(In reply to Petr Špaček from comment #17)
> I found this bug while searching for a way to solve exactly this problem, so
> for the record: It sounds like a very good and useful addition. Thank you!

Good to hear. Unfortunately, the patch will only be able to land in GCC 9.x.
Is that acceptable for you?
Comment 19 Petr Špaček 2018-01-02 15:39:46 UTC
Sure, I would be happy with any version, thank you!

For people who want to generate code coverage reports for parallel executions, beware of https://github.com/linux-test-project/lcov/issues/37.
Comment 20 Martin Liška 2018-01-03 09:17:15 UTC
(In reply to Petr Špaček from comment #19)
> Sure, I would be happy with any version, thank you!
> 
> For people who want to generate code coverage reports for parallel
> executions, beware of https://github.com/linux-test-project/lcov/issues/37.

Good. I will do it in the stage1 timeframe of GCC 9.
Comment 21 Martin Liška 2018-06-05 12:10:54 UTC
Author: marxin
Date: Tue Jun  5 12:10:22 2018
New Revision: 261199

URL: https://gcc.gnu.org/viewcvs?rev=261199&root=gcc&view=rev
Log:
Support variables in expansion of -fprofile-generate option (PR gcov-profile/47618).

2018-06-05  Martin Liska  <mliska@suse.cz>

	PR gcov-profile/47618
	* doc/invoke.texi: Document how -fprofile-dir format
        is extended.
2018-06-05  Martin Liska  <mliska@suse.cz>

	PR gcov-profile/47618
	* libgcov-driver-system.c (replace_filename_variables): New
        function.
	(gcov_exit_open_gcda_file): Use it.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/invoke.texi
    trunk/libgcc/ChangeLog
    trunk/libgcc/libgcov-driver-system.c
Comment 22 Martin Liška 2018-06-05 12:13:11 UTC
Implemented.
Comment 23 qinzhao 2019-04-24 22:10:15 UTC
(In reply to Andrew Pinski from comment #7)
> Created attachment 27869 [details]
> Patch for adding merge-gcda
> 
> here is the patch which adds merge-gcda .  I don't add any testcases as it
> is currently designed only for how Cavium's Simple-exec works in that each
> core writes out its own gcda file.

I recently found this bug due to a similar problem. It looks like there are two parts to the work for this problem:

1. A new GCC feature to guarantee that all pre-merged files are saved with different names for different instances of the same process.
2. A merge tool to merge all the gcda files afterwards.

From my understanding, the patch for part 1 has been committed into GCC 9.
How about the patch for part 2? Has it been committed?
Comment 24 Martin Liška 2019-04-25 08:34:37 UTC
(In reply to qinzhao from comment #23)
> (In reply to Andrew Pinski from comment #7)
> > Created attachment 27869 [details]
> > Patch for adding merge-gcda
> > 
> > here is the patch which adds merge-gcda .  I don't add any testcases as it
> > is currently designed only for how Cavium's Simple-exec works in that each
> > core writes out its own gcda file.
> 
> I recently found this bug due to a similar problem. looks like that there
> are two parts of work for this problem:
> 
> 1. GCC's new feature to guarantee that all pre-merged files are saved with
> different names for different instances of the same process. 
> 2. a merge tool to merge all the gcda files afterwards. 
> 
> from my understanding, the patch for the above 1 has been committed into
> GCC9.

Yes.

> How about the patch for the above 2? has it been committed?

It has been there for a while, please take a look at:

$ gcov-tool merge --help
merge: unrecognized option '--help'
Merge subcomand usage:  merge [options] <dir1> <dir2>         Merge coverage file contents
    -o, --output <dir>                  Output directory
    -v, --verbose                       Verbose mode
    -w, --weight <w1,w2>                Set weights (float point values)
Comment 25 qinzhao 2019-04-30 21:18:34 UTC
(In reply to Martin Liška from comment #24)
> 
> > How about the patch for the above 2? has it been committed?
> 
> It has been there for a while, please take a look at:
> 
> $ gcov-tool merge --help
> merge: unrecognized option '--help'
> Merge subcomand usage:  merge [options] <dir1> <dir2>         Merge coverage
> file contents
>     -o, --output <dir>                  Output directory
>     -v, --verbose                       Verbose mode
>     -w, --weight <w1,w2>                Set weights (float point values)

Two more questions about this merge tool:
1. It can only merge two directories at a time. So, for multiple directories, say "n" of them, do we have to invoke gcov-tool merge n-1 times in order to merge all of them?
2. The Intel compiler's (icc) profmerge is able to merge all the .dyn files under one directory; does GCC have such functionality currently?
Comment 26 Martin Liška 2019-05-02 08:58:37 UTC
(In reply to qinzhao from comment #25)
> (In reply to Martin Liška from comment #24)
> > 
> > > How about the patch for the above 2? has it been committed?
> > 
> > It has been there for a while, please take a look at:
> > 
> > $ gcov-tool merge --help
> > merge: unrecognized option '--help'
> > Merge subcomand usage:  merge [options] <dir1> <dir2>         Merge coverage
> > file contents
> >     -o, --output <dir>                  Output directory
> >     -v, --verbose                       Verbose mode
> >     -w, --weight <w1,w2>                Set weights (float point values)
> 
> two more questions on this merge tool:
> 1. it can only merge two directories at one time. So, for multiple
> directories, for example "n", we have to invoke gcov-tool merge n-1 times in
> order to merge all of them?

Yep. I guess one can write a simple bash script that does that.

> 2. Intel compiler (icc)'s profmerge is able to merge all the .dyn files
> under one directory, does gcc have such functionality currently?

We have folder-based merging: we search for all .gcda files in the given folders and merge them into a destination folder.
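
A simple script along those lines might look like the sketch below: it folds n directories into one by calling `gcov-tool merge` pairwise, n-1 times. The function name `merge_many`, the `.stepN` naming of intermediate directories, and the GCOV_TOOL override (handy for a dry run) are all made up for illustration; only the `merge <dir1> <dir2> -o <dir>` form comes from the usage text quoted earlier.

```shell
#!/bin/sh
# Sketch: merge several profile directories by repeated pairwise merging.
# Usage: merge_many OUTDIR DIR1 DIR2 [DIR3 ...]
# GCOV_TOOL is a hypothetical override hook (GCOV_TOOL=echo gives a dry run).
merge_many() {
    if [ "$#" -lt 3 ]; then
        echo "merge_many: need an output dir and at least two input dirs" >&2
        return 1
    fi
    out=$1
    acc=$2                            # running result, starts as the first input
    shift 2
    step=0
    for dir in "$@"; do
        step=$((step + 1))
        if [ "$step" -eq "$#" ]; then
            next=$out                 # last merge writes the final result
        else
            next=$out.step$step       # intermediate result, kept for inspection
        fi
        "${GCOV_TOOL:-gcov-tool}" merge "$acc" "$dir" -o "$next" || return 1
        acc=$next
    done
}
```

Running `GCOV_TOOL=echo merge_many out a b c` only prints the two merge invocations that would be made, which is a cheap way to check the order before touching real profile data. The intermediate directories are left in place and would need to be removed by hand.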
Comment 27 Qing Zhao 2019-05-02 14:52:17 UTC
> --- Comment #26 from Martin Liška <marxin at gcc dot gnu.org> ---
> 
>> 2. Intel compiler (icc)'s profmerge is able to merge all the .dyn files
>> under one directory, does gcc have such functionality currently?
> 
> We have folder-base merging where we search for all .gcda files and we merge
> them to a destination folder.

Could you please point me to the command that does this? Thanks.
Comment 28 Martin Liška 2019-05-02 15:16:47 UTC
(In reply to Martin Liška from comment #26)
> (In reply to qinzhao from comment #25)
> > (In reply to Martin Liška from comment #24)
> > > 
> > > > How about the patch for the above 2? has it been committed?
> > > 
> > > It has been there for a while, please take a look at:
> > > 
> > > $ gcov-tool merge --help
> > > merge: unrecognized option '--help'
> > > Merge subcomand usage:  merge [options] <dir1> <dir2>         Merge coverage
> > > file contents
> > >     -o, --output <dir>                  Output directory
> > >     -v, --verbose                       Verbose mode
> > >     -w, --weight <w1,w2>                Set weights (float point values)
> > 
> > two more questions on this merge tool:
> > 1. it can only merge two directories at one time. So, for multiple
> > directories, for example "n", we have to invoke gcov-tool merge n-1 times in
> > order to merge all of them?
> 
> Yep. I guess one can write a simple bash script that does that.
> 
> > 2. Intel compiler (icc)'s profmerge is able to merge all the .dyn files
> > under one directory, does gcc have such functionality currently?
> 
> We have folder-base merging where we search for all .gcda files and we merge
> them to a destination folder.

$ echo "int main() {return 0;}" > main.c && gcc --coverage main.c && ./a.out
$ mkdir a && mkdir b && cp main.gcda a && cp main.gcda b
$ gcov-tool merge a b -o a+b -v
reading file: ./main.gcda
tag one function id=108032747
reading file: ./main.gcda
tag one function id=108032747

$ ls a+b
main.gcda

$ gcov-dump a+b/main.gcda 
a+b/main.gcda:data:magic `gcda':version `A83*'
a+b/main.gcda:stamp 2031787297
a+b/main.gcda:  a3000000:  22:PROGRAM_SUMMARY checksum=0x33c369a8
a+b/main.gcda:                counts=1, runs=1, sum_all=2, run_max=2, sum_max=2
a+b/main.gcda:                counter histogram:
a+b/main.gcda:                 2: num counts=1, min counter=2, cum_counter=2
a+b/main.gcda:  01000000:   3:FUNCTION ident=108032747, lineno_checksum=0x3b5ee2be, cfg_checksum=0xdb5de9e8
a+b/main.gcda:    01a10000:   2:COUNTERS arcs 1 counts
Comment 29 Yury Gribov 2022-01-06 18:56:43 UTC
> > 1. it can only merge two directories at one time. So, for multiple
> > directories, for example "n", we have to invoke gcov-tool merge n-1 times in
> > order to merge all of them?
> 
> Yep. I guess one can write a simple bash script that does that.

I've added one at https://github.com/yugr/maintainer-scripts/blob/master/gcov-tool-many