It seems that profiling became more expensive in GCC 10 compared to clang or previous GCC releases. The clang binary is here: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/H_iSouCVTha9mEw9y5XO5Q/runs/0/artifacts/public/build/target.tar.bz2 and a more or less comparable GCC build is here: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/NOUqVShcSMaJn5j3g5nEYg/runs/0/artifacts/public/build/target.tar.bz2 Profile streaming also seems to be slower in the GCC build, which is important since Firefox forks multiple times on startup and again when creating a new tab, and each fork triggers profile data stream-out.
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ObkoHsHHSriQdU0Twc12Wg/runs/0/artifacts/public/build/target.tar.bz2 This is the GCC 9 build. At 310MB it is still a lot bigger than the clang one, but better than GCC 10.
Actually, what I thought was the GCC 9 build is in fact a GCC 10 build. It seems that today's profile fixes made the binary noticeably smaller, which is promising, but it is still very large.
A proper GCC 9 -fprofile-generate build is 296MB: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/aMGsffWPQ1qzjgj4LIqcwQ/runs/0/artifacts/public/build/target.tar.bz2 So the 310MB GCC 10 build is about a 5% regression compared to GCC 9.
Does less early inlining cause more instrumentation? You'd see the same for tramp3d, I guess.
One particular change in the GCC 10 development cycle is that we started using TOP N counters for indirect call and value profiling. Right now we track 4 key:value pairs for each counter, plus one counter for the total number of executions.
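To make the size impact of that layout concrete, here is a minimal sketch of a TOP N style counter as described above. It is not the libgcov implementation: the names `TOPN_VALUES`, `struct topn_counter`, `topn_record` and `topn_counters_per_site` are hypothetical, and the "drop the value when all slots are full" policy is an assumption chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant mirroring the "4 key:value pairs" described above. */
#define TOPN_VALUES 4

typedef int64_t counter_t;

/* One profiled site: total executions plus TOPN_VALUES (value, count) pairs. */
struct topn_counter {
    counter_t total;
    counter_t value[TOPN_VALUES];
    counter_t count[TOPN_VALUES];
};

/* Record one observed value (for example an indirect call target).
   Assumed policy: bump a matching slot, else claim an empty slot,
   else drop the value. The real runtime may behave differently. */
static void topn_record(struct topn_counter *c, counter_t v)
{
    assert(c != NULL);
    c->total++;
    for (int i = 0; i < TOPN_VALUES; i++) {
        if (c->count[i] > 0 && c->value[i] == v) {
            c->count[i]++;
            return;
        }
    }
    for (int i = 0; i < TOPN_VALUES; i++) {
        if (c->count[i] == 0) {
            c->value[i] = v;
            c->count[i] = 1;
            return;
        }
    }
}

/* Counters needed per site: one total plus a (value, count) pair per slot. */
static int topn_counters_per_site(int n)
{
    return 1 + 2 * n;
}
```

Under these assumptions a site with 4 tracked pairs needs 9 counters, while a single tracked pair needs 3, which is the kind of per-site growth that could show up in both binary size and stream-out time.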
With GCC 9-like inliner parameters I get a 308MB binary, so it is still somewhat bigger. Honza
(In reply to Jan Hubicka from comment #6)
> With GCC9 like inliner parameters I get 308MB binary, so it is still
> somewhat bigger.
>
> Honza

I would try setting:
#define GCOV_TOPN_VALUES 1
Then you should see the difference the TOP N counters make.