This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Interesting observations wrt FDO and tramp3d-v4
- From: Richard Guenther <rguenther at suse dot de>
- To: gcc at gcc dot gnu dot org
- Date: Thu, 8 Dec 2005 13:11:53 +0100 (CET)
- Subject: Interesting observations wrt FDO and tramp3d-v4
I have added FDO runs to the daily tramp3d tester and am observing
"interesting" things there. First of all, compile time with
-fprofile-generate (w/o leafify) skyrocketed from ~120s to 440s.
For reference, here are the hot spots in -ftime-report:
 life analysis         :  24.66 ( 6%) usr   0.00 ( 0%) sys  24.52 ( 5%) wall   16086 kB ( 0%) ggc
 integration           :  13.67 ( 3%) usr   0.05 ( 0%) sys  13.67 ( 3%) wall  806431 kB (23%) ggc
 tree PTA              :  10.17 ( 2%) usr   0.10 ( 1%) sys  10.24 ( 2%) wall   20425 kB ( 1%) ggc
 tree SSA incremental  :  19.58 ( 5%) usr   0.21 ( 2%) sys  20.28 ( 5%) wall   27383 kB ( 1%) ggc
 tree operand scan     :  11.87 ( 3%) usr   4.51 (35%) sys  16.62 ( 4%) wall   94887 kB ( 3%) ggc
 dominator optimization:  16.60 ( 4%) usr   0.06 ( 0%) sys  16.24 ( 4%) wall  210301 kB ( 6%) ggc
 expand                :  23.51 ( 5%) usr   0.10 ( 1%) sys  23.15 ( 5%) wall  310872 kB ( 9%) ggc
 CSE                   :  52.40 (12%) usr   0.05 ( 0%) sys  52.44 (12%) wall   24796 kB ( 1%) ggc
 loop analysis         :  20.06 ( 5%) usr   0.12 ( 1%) sys  20.23 ( 5%) wall   26703 kB ( 1%) ggc
 CSE 2                 :  25.68 ( 6%) usr   0.01 ( 0%) sys  25.88 ( 6%) wall    1360 kB ( 0%) ggc
 global alloc          :  14.93 ( 3%) usr   0.08 ( 1%) sys  14.86 ( 3%) wall   65979 kB ( 2%) ggc
 reload CSE regs       :  16.20 ( 4%) usr   0.04 ( 0%) sys  16.56 ( 4%) wall   49571 kB ( 1%) ggc
 rename registers      :  10.76 ( 2%) usr   0.03 ( 0%) sys  10.67 ( 2%) wall    6109 kB ( 0%) ggc
 TOTAL                 : 434.71            12.95           448.78            3461889 kB
Look at those CSE numbers! (This is all with release checking only.)
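To put the CSE figures in proportion, here is a small sketch that recomputes shares from the -ftime-report numbers quoted above. The values are copied from the report; the variable and function names are only illustrative, and the ~120s baseline is the approximate figure mentioned earlier, so the slowdown is a rough estimate.

```python
# usr seconds copied from the -ftime-report output quoted above
usr_seconds = {
    "life analysis": 24.66,
    "expand": 23.51,
    "CSE": 52.40,
    "loop analysis": 20.06,
    "CSE 2": 25.68,
}
total_usr = 434.71  # TOTAL usr seconds from the same report


def share(seconds, total=total_usr):
    """Return the percentage of total usr time spent in a pass."""
    return 100.0 * seconds / total


# Both CSE passes together
cse_combined = usr_seconds["CSE"] + usr_seconds["CSE 2"]
print(f"CSE + CSE 2: {cse_combined:.2f}s = {share(cse_combined):.1f}% of usr time")

# Approximate compile-time slowdown with -fprofile-generate
# (~120s without instrumentation vs. 440s with it, per the mail)
baseline_s = 120.0
instrumented_s = 440.0
slowdown = instrumented_s / baseline_s
print(f"-fprofile-generate compile-time slowdown: about {slowdown:.1f}x")
```

With these inputs the two CSE passes together account for roughly 18% of the usr time, and the instrumented compile is roughly 3.7 times slower than the baseline.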
Second, the runtime of the profile-generating binary rose by a factor
of 50 (this is just an -O2 compile, basically).
Now, the interesting thing is that with -fprofile-use, compile time
roughly halved, from 120s to 62s. Nice. And the performance is exactly
the same as that of a non-FDO (non-leafify) binary, which suggests that
we can improve the inlining heuristics wrt compile time without
regressing runtime performance.
The profile-generating numbers suggest that we're either doing something
stupid, or that we want some heuristics applied so that we instrument
only the interesting edges rather than every edge.
Richard.