This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Interesting observations wrt FDO and tramp3d-v4


I have added FDO runs to the daily tramp3d tester and am observing
"interesting" things there.  First of all, compile time with
-fprofile-generate (w/o leafify) skyrocketed from ~120s to 440s.
For reference, here are the hot spots in -ftime-report:

 life analysis         :  24.66 ( 6%) usr   0.00 ( 0%) sys  24.52 ( 5%) wall   16086 kB ( 0%) ggc
 integration           :  13.67 ( 3%) usr   0.05 ( 0%) sys  13.67 ( 3%) wall  806431 kB (23%) ggc
 tree PTA              :  10.17 ( 2%) usr   0.10 ( 1%) sys  10.24 ( 2%) wall   20425 kB ( 1%) ggc
 tree SSA incremental  :  19.58 ( 5%) usr   0.21 ( 2%) sys  20.28 ( 5%) wall   27383 kB ( 1%) ggc
 tree operand scan     :  11.87 ( 3%) usr   4.51 (35%) sys  16.62 ( 4%) wall   94887 kB ( 3%) ggc
 dominator optimization:  16.60 ( 4%) usr   0.06 ( 0%) sys  16.24 ( 4%) wall  210301 kB ( 6%) ggc
 expand                :  23.51 ( 5%) usr   0.10 ( 1%) sys  23.15 ( 5%) wall  310872 kB ( 9%) ggc
 CSE                   :  52.40 (12%) usr   0.05 ( 0%) sys  52.44 (12%) wall   24796 kB ( 1%) ggc
 loop analysis         :  20.06 ( 5%) usr   0.12 ( 1%) sys  20.23 ( 5%) wall   26703 kB ( 1%) ggc
 CSE 2                 :  25.68 ( 6%) usr   0.01 ( 0%) sys  25.88 ( 6%) wall    1360 kB ( 0%) ggc
 global alloc          :  14.93 ( 3%) usr   0.08 ( 1%) sys  14.86 ( 3%) wall   65979 kB ( 2%) ggc
 reload CSE regs       :  16.20 ( 4%) usr   0.04 ( 0%) sys  16.56 ( 4%) wall   49571 kB ( 1%) ggc
 rename registers      :  10.76 ( 2%) usr   0.03 ( 0%) sys  10.67 ( 2%) wall    6109 kB ( 0%) ggc
 TOTAL                 : 434.71            12.95           448.78            3461889 kB

Look at those CSE numbers!  CSE and CSE 2 together take about 78s usr,
roughly 18% of the total.  (This is all with release checking only.)
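
To make the CSE share concrete, here is a minimal sketch that pulls the
usr times out of a few of the -ftime-report rows above and adds up the
two CSE passes.  The row regex, the helper name parse_usr and the
abbreviated REPORT string are assumptions made for illustration; only
the numbers come from the report.

```python
import re

# A few rows copied from the -ftime-report output above (abbreviated).
REPORT = """\
 life analysis         :  24.66 ( 6%) usr   0.00 ( 0%) sys  24.52 ( 5%) wall   16086 kB ( 0%) ggc
 expand                :  23.51 ( 5%) usr   0.10 ( 1%) sys  23.15 ( 5%) wall  310872 kB ( 9%) ggc
 CSE                   :  52.40 (12%) usr   0.05 ( 0%) sys  52.44 (12%) wall   24796 kB ( 1%) ggc
 CSE 2                 :  25.68 ( 6%) usr   0.01 ( 0%) sys  25.88 ( 6%) wall    1360 kB ( 0%) ggc
 TOTAL                 : 434.71            12.95           448.78            3461889 kB
"""

# Assumed row shape: "<pass name> : <seconds> (<pct>%) usr ...".
# The TOTAL row has no "usr" tag, so it is deliberately not matched.
ROW = re.compile(r"^\s*(?P<name>.+?)\s*:\s*(?P<usr>[\d.]+)\s*\(\s*\d+%\)\s*usr")


def parse_usr(report):
    """Return {pass name: usr seconds} for every per-pass row."""
    times = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            times[match.group("name")] = float(match.group("usr"))
    return times


TOTAL_USR = 434.71  # usr column of the TOTAL row in the report

usr = parse_usr(REPORT)
cse_usr = usr["CSE"] + usr["CSE 2"]
cse_share = cse_usr / TOTAL_USR

print(f"CSE + CSE 2: {cse_usr:.2f}s usr, {cse_share:.1%} of total")
```

Sorting the parsed dictionary by value gives the same ranking for any
other report, which is handy when comparing daily tester runs.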

Second, the runtime of the profile-generating binary rose by a factor
of 50 (this is just an -O2 compile, basically).

Now, the interesting thing is that with -fprofile-use, compile time
halved from 120s to 62s.  Nice.  And the performance is exactly
the same as that of a non-FDO (non-leafify) binary, which suggests
that we can improve the inlining heuristics wrt compile time without
regressing runtime performance.

The profile-generating numbers suggest that either we're doing
something stupid, or we want some heuristics applied so that we
instrument only the interesting edges rather than every edge.
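
As an illustration of what "not every edge" can mean, here is a sketch
of the classic spanning-tree technique: put counters only on the CFG
edges outside a spanning tree and recover the remaining counts from
flow conservation (what enters a block must leave it).  This is not a
statement about what the instrumentation in this GCC revision does; the
toy CFG, the made-up execution path and all names below are
hypothetical.

```python
from collections import Counter

# Hypothetical CFG: a diamond (A -> B or C -> D) inside a loop (D -> A).
NODES = ["entry", "A", "B", "C", "D", "exit"]
EDGES = [("entry", "A"), ("A", "B"), ("A", "C"), ("B", "D"),
         ("C", "D"), ("D", "A"), ("D", "exit"),
         ("exit", "entry")]  # virtual edge, taken once per invocation


def spanning_tree(nodes, edges):
    """Pick an undirected spanning tree with union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = set()
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components: keep it in the tree
            parent[ru] = rv
            tree.add((u, v))
    return tree


def recover(nodes, edges, measured):
    """Fill in unmeasured edge counts using inflow == outflow per node."""
    counts = dict(measured)
    progress = True
    while len(counts) < len(edges) and progress:
        progress = False
        for n in nodes:
            incident = [e for e in edges if n in e]
            unknown = [e for e in incident if e not in counts]
            if len(unknown) != 1:
                continue
            e = unknown[0]
            inflow = sum(counts[x] for x in incident if x != e and x[1] == n)
            outflow = sum(counts[x] for x in incident if x != e and x[0] == n)
            # Incoming unknown edge balances the outflow, and vice versa.
            counts[e] = outflow - inflow if e[1] == n else inflow - outflow
            progress = True
    return counts


# One made-up execution: three loop iterations, then exit.
PATH = ["entry", "A", "B", "D", "A", "C", "D", "A", "B", "D", "exit"]
TRUE = Counter(zip(PATH, PATH[1:]))
TRUE[("exit", "entry")] = 1  # the function was invoked once

TREE = spanning_tree(NODES, EDGES)
INSTRUMENTED = [e for e in EDGES if e not in TREE]
MEASURED = {e: TRUE[e] for e in INSTRUMENTED}
RECOVERED = recover(NODES, EDGES, MEASURED)

print(len(INSTRUMENTED), "counters for", len(EDGES), "edges")
```

A real implementation would also choose the tree so that the edges
expected to be hot stay uninstrumented; the sketch simply takes the
first tree that union-find produces.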

Richard.

