Bug 92711 - GCC 10 libxul.so -fprofile-generate binary is 360MB while clang needs only 163MB.
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 10.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: mozillametabug
Reported: 2019-11-28 16:03 UTC by Jan Hubicka
Modified: 2020-01-30 10:59 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-01-30 00:00:00


Description Jan Hubicka 2019-11-28 16:03:31 UTC
It seems that profiling became more expensive in GCC 10 compared to clang or previous GCC releases.
The clang binary is here: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/H_iSouCVTha9mEw9y5XO5Q/runs/0/artifacts/public/build/target.tar.bz2
A more or less comparable GCC build is here:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/NOUqVShcSMaJn5j3g5nEYg/runs/0/artifacts/public/build/target.tar.bz2
Profile streaming also seems to be slower in the GCC build. This matters because Firefox forks multiple times on startup and again when creating a new tab, and each fork triggers profile data stream-out.
Comment 1 Jan Hubicka 2019-11-28 17:21:12 UTC
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ObkoHsHHSriQdU0Twc12Wg/runs/0/artifacts/public/build/target.tar.bz2
This is a GCC 9 build: 310MB, so still a lot bigger than clang, but better than GCC 10.
Comment 2 Jan Hubicka 2019-11-28 17:35:36 UTC
Actually, what I thought was a GCC 9 build is in fact a GCC 10 build. It seems that today's profile fixes made the binary noticeably smaller, which is promising, but it is still very large.
Comment 3 Jan Hubicka 2019-11-28 18:39:28 UTC
A proper GCC 9 -fprofile-generate build is 296MB: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/aMGsffWPQ1qzjgj4LIqcwQ/runs/0/artifacts/public/build/target.tar.bz2
So that is about a 5% regression compared to GCC 9.
Comment 4 Richard Biener 2019-11-28 19:47:26 UTC
Less early inlining causes more instrumentation?  You'd see the same for tramp3d I guess.
Comment 5 Martin Liška 2019-11-29 13:42:20 UTC
One particular change that has happened in the GCC 10 devel cycle is that we started using TOP N counters for indirect calls and value profiling. Right now, we track 4 key:value pairs for each counter plus one counter for total number of executions.
Comment 6 Jan Hubicka 2019-11-29 13:49:28 UTC
With GCC 9-like inliner parameters I get a 308MB binary, so it is still
somewhat bigger.

Honza
Comment 7 Martin Liška 2019-11-29 13:57:44 UTC
(In reply to Jan Hubicka from comment #6)
> With GCC 9-like inliner parameters I get a 308MB binary, so it is still
> somewhat bigger.
> 
> Honza

I would try setting:
#define GCOV_TOPN_VALUES 1
Then you should see how much difference the TOP N counters make.
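
To get a rough feel for the per-site cost being discussed, the sketch below counts the storage one TOP N site needs when it holds one total-executions counter plus a key:value pair for each tracked value, as described in comment 5. It is only an illustration, not code from GCC: the helper names topn_slots and topn_bytes are hypothetical, and the 64-bit counter width is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: each counter slot is a 64-bit integer. */
typedef int64_t counter_t;

/* Hypothetical helper: slots used by one TOP N site, i.e. one
   total-executions counter plus a (key, value) pair per tracked value. */
static size_t topn_slots(size_t tracked_values)
{
    return 1 + 2 * tracked_values;
}

/* Hypothetical helper: bytes of counter storage for one TOP N site. */
static size_t topn_bytes(size_t tracked_values)
{
    return topn_slots(tracked_values) * sizeof(counter_t);
}
```

Under these assumptions, topn_bytes(4) gives 72 bytes per site and topn_bytes(1) gives 24 bytes, so the experiment with GCOV_TOPN_VALUES set to 1 would shrink the counter storage of each such site to a third. How much of the overall binary size that accounts for depends on how many sites are instrumented, which this sketch does not estimate.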