Compilation of large random forest models with g++

NightStrike nightstrike@gmail.com
Fri Jun 28 16:21:00 GMT 2019


On Thu, Jun 27, 2019 at 5:33 PM Mikail Yayla
<mikail.yayla@tu-dortmund.de> wrote:
>
> Thanks for your reply. I faced the same issues with C. How much RAM do you use on the machine you are compiling the forests on?

I didn't have any current data, so I ran a new compilation on new
hardware with a newer compiler (GCC 8.2.0, built from source).  The
result was better than my previous runs, at about 35 minutes of compile
time.  The following is for a 1.3 GB source file with 15.5 million
lines; the first part is GCC's -ftime-report output and the second part
is time's output.  Going by the maximum resident set size that time
reports (14667556 kB), it looks like it needed about 14 GiB of memory:

$ /usr/bin/time --verbose gcc a.c -c -o delete.o -ftime-report -O3 -march=native

Time variable                                   usr           sys          wall               GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)    1508 kB (  0%)
 phase parsing                      :  94.09 (  5%)  91.01 ( 70%) 185.02 (  9%) 2490103 kB (  7%)
 phase opt and generate             :1825.26 ( 95%)  37.61 ( 29%)1862.19 ( 91%)35542183 kB ( 93%)
 phase last asm                     :   3.22 (  0%)   0.13 (  0%)   3.35 (  0%)   98002 kB (  0%)
 garbage collection                 :  22.95 (  1%)   1.44 (  1%)  24.36 (  1%)       0 kB (  0%)
 dump files                         :   0.04 (  0%)   0.08 (  0%)   0.10 (  0%)       0 kB (  0%)
 callgraph construction             :   5.85 (  0%)   0.26 (  0%)   6.15 (  0%)  786668 kB (  2%)
 callgraph optimization             :   1.70 (  0%)   0.01 (  0%)   1.89 (  0%)      12 kB (  0%)
 ipa function summary               :   3.96 (  0%)   0.02 (  0%)   3.85 (  0%)    1007 kB (  0%)
 ipa devirtualization               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 ipa cp                             :   1.20 (  0%)   0.00 (  0%)   1.26 (  0%)     205 kB (  0%)
 ipa inlining heuristics            :   0.52 (  0%)   0.00 (  0%)   0.54 (  0%)       0 kB (  0%)
 ipa pure const                     :   3.58 (  0%)   0.07 (  0%)   3.54 (  0%)    1148 kB (  0%)
 ipa icf                            :   2.83 (  0%)   0.12 (  0%)   2.95 (  0%)       0 kB (  0%)
 ipa SRA                            :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 cfg construction                   :   3.28 (  0%)   0.08 (  0%)   3.52 (  0%)  401229 kB (  1%)
 cfg cleanup                        :  60.42 (  3%)   0.07 (  0%)  60.63 (  3%)  733716 kB (  2%)
 trivially dead code                :   8.86 (  0%)   0.00 (  0%)   8.84 (  0%)       0 kB (  0%)
 df scan insns                      :   6.43 (  0%)   0.00 (  0%)   6.32 (  0%)      23 kB (  0%)
 df multiple defs                   :   5.68 (  0%)   0.00 (  0%)   5.45 (  0%)       0 kB (  0%)
 df reaching defs                   :  15.96 (  1%)   0.00 (  0%)  15.92 (  1%)       0 kB (  0%)
 df live regs                       :  57.93 (  3%)   0.00 (  0%)  57.09 (  3%)       0 kB (  0%)
 df live&initialized regs           :  33.13 (  2%)   0.01 (  0%)  34.50 (  2%)       0 kB (  0%)
 df must-initialized regs           :  19.95 (  1%)   0.00 (  0%)  20.01 (  1%)       0 kB (  0%)
 df use-def / def-use chains        :   5.77 (  0%)   0.00 (  0%)   6.02 (  0%)       0 kB (  0%)
 df reg dead/unused notes           :  25.88 (  1%)   0.04 (  0%)  25.82 (  1%)  559983 kB (  1%)
 register information               :   3.42 (  0%)   0.00 (  0%)   3.22 (  0%)       0 kB (  0%)
 alias analysis                     :  20.58 (  1%)   0.00 (  0%)  21.35 (  1%) 1280256 kB (  3%)
 alias stmt walking                 :   9.64 (  1%)   5.32 (  4%)  15.48 (  1%)      62 kB (  0%)
 register scan                      :   3.05 (  0%)   0.00 (  0%)   2.97 (  0%)       0 kB (  0%)
 rebuild jump labels                :   7.25 (  0%)   0.41 (  0%)   7.95 (  0%)       0 kB (  0%)
 preprocessing                      :  33.00 (  2%)  22.51 ( 17%)  54.95 (  3%)  204148 kB (  1%)
 lexical analysis                   :  25.95 (  1%)  45.06 ( 35%)  70.75 (  3%)       0 kB (  0%)
 parser (global)                    :   0.02 (  0%)   0.01 (  0%)   0.00 (  0%)    1400 kB (  0%)
 parser function body               :  33.31 (  2%)  23.43 ( 18%)  57.50 (  3%) 2284437 kB (  6%)
 inline parameters                  :   8.78 (  0%)   0.09 (  0%)   8.95 (  0%)    1294 kB (  0%)
 integration                        :   0.04 (  0%)   0.01 (  0%)   0.05 (  0%)    6643 kB (  0%)
 tree gimplify                      :  19.63 (  1%)   1.35 (  1%)  20.98 (  1%) 4904633 kB ( 13%)
 tree eh                            :   2.27 (  0%)   0.00 (  0%)   2.28 (  0%)     172 kB (  0%)
 tree CFG construction              :   6.70 (  0%)   0.54 (  0%)   7.37 (  0%) 2432905 kB (  6%)
 tree CFG cleanup                   :  89.98 (  5%)   0.22 (  0%)  88.62 (  4%) 1126699 kB (  3%)
 tree tail merge                    : 106.47 (  6%)   0.00 (  0%) 106.69 (  5%)   63460 kB (  0%)
 tree VRP                           :  17.73 (  1%)   0.12 (  0%)  18.32 (  1%)   35989 kB (  0%)
 tree Early VRP                     :   3.33 (  0%)   0.00 (  0%)   3.14 (  0%)    4303 kB (  0%)
 tree copy propagation              :   3.92 (  0%)   0.00 (  0%)   4.09 (  0%)    2084 kB (  0%)
 tree PTA                           :   9.19 (  0%)   0.01 (  0%)   9.44 (  0%)     152 kB (  0%)
 tree PHI insertion                 :   0.84 (  0%)   0.03 (  0%)   1.04 (  0%)  256837 kB (  1%)
 tree SSA rewrite                   :   4.34 (  0%)   0.27 (  0%)   4.62 (  0%)  868231 kB (  2%)
 tree SSA other                     :   4.46 (  0%)   4.60 (  4%)   9.18 (  0%)      70 kB (  0%)
 tree SSA incremental               :  12.42 (  1%)   0.12 (  0%)  12.49 (  1%)  525620 kB (  1%)
 tree operand scan                  :  38.38 (  2%)   9.45 (  7%)  46.93 (  2%)  837640 kB (  2%)
 dominator optimization             : 264.70 ( 14%)   0.46 (  0%) 264.95 ( 13%) 2762703 kB (  7%)
 backwards jump threading           :   0.80 (  0%)   0.01 (  0%)   0.79 (  0%)     176 kB (  0%)
 tree SRA                           :   5.77 (  0%)   1.68 (  1%)   7.32 (  0%)   64297 kB (  0%)
 isolate eroneous paths             :   0.74 (  0%)   0.00 (  0%)   0.79 (  0%)       0 kB (  0%)
 tree CCP                           :  13.01 (  1%)   0.00 (  0%)  13.49 (  1%)     222 kB (  0%)
 tree PHI const/copy prop           :   0.32 (  0%)   0.00 (  0%)   0.41 (  0%)      28 kB (  0%)
 tree split crit edges              :   1.07 (  0%)   0.03 (  0%)   0.99 (  0%)  292721 kB (  1%)
 tree reassociation                 :   2.55 (  0%)   0.00 (  0%)   2.25 (  0%)       0 kB (  0%)
 tree PRE                           :   9.07 (  0%)   0.24 (  0%)   9.46 (  0%)  704185 kB (  2%)
 tree FRE                           :  50.06 (  3%)   4.95 (  4%)  54.72 (  3%) 1356907 kB (  4%)
 tree code sinking                  :   3.03 (  0%)   0.09 (  0%)   3.43 (  0%)  703792 kB (  2%)
 tree linearize phis                :   4.67 (  0%)   0.63 (  0%)   5.42 (  0%)  351072 kB (  1%)
 tree backward propagate            :   0.42 (  0%)   0.00 (  0%)   0.51 (  0%)       0 kB (  0%)
 tree forward propagate             :  12.18 (  1%)   0.89 (  1%)  12.89 (  1%)  635818 kB (  2%)
 tree phiprop                       :   0.21 (  0%)   0.01 (  0%)   0.26 (  0%)       0 kB (  0%)
 tree conservative DCE              :  12.30 (  1%)   0.60 (  0%)  13.47 (  1%)  256000 kB (  1%)
 tree aggressive DCE                :   7.00 (  0%)   0.42 (  0%)   6.93 (  0%)    1098 kB (  0%)
 tree buildin call DCE              :   0.19 (  0%)   0.00 (  0%)   0.16 (  0%)       0 kB (  0%)
 tree DSE                           :  65.64 (  3%)   0.00 (  0%)  65.67 (  3%)       0 kB (  0%)
 PHI merge                          :   0.23 (  0%)   0.00 (  0%)   0.28 (  0%)      19 kB (  0%)
 tree loop invariant motion         :   1.87 (  0%)   0.00 (  0%)   1.79 (  0%)       0 kB (  0%)
 scev constant prop                 :   0.10 (  0%)   0.00 (  0%)   0.03 (  0%)       7 kB (  0%)
 complete unrolling                 :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     577 kB (  0%)
 tree slp vectorization             :   1.87 (  0%)   0.00 (  0%)   1.76 (  0%)   81741 kB (  0%)
 tree loop distribution             :   0.21 (  0%)   0.00 (  0%)   0.28 (  0%)       0 kB (  0%)
 tree iv optimization               :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)       0 kB (  0%)
 tree copy headers                  :   0.02 (  0%)   0.01 (  0%)   0.03 (  0%)     645 kB (  0%)
 tree SSA uncprop                   :   0.64 (  0%)   0.00 (  0%)   0.81 (  0%)       0 kB (  0%)
 tree switch conversion             :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)       0 kB (  0%)
 tree switch lowering               :   0.09 (  0%)   0.00 (  0%)   0.06 (  0%)       0 kB (  0%)
 gimple CSE sin/cos                 :   0.06 (  0%)   0.00 (  0%)   0.04 (  0%)       0 kB (  0%)
 gimple widening/fma detection      :   0.65 (  0%)   0.00 (  0%)   0.65 (  0%)       0 kB (  0%)
 tree strlen optimization           :   0.51 (  0%)   0.00 (  0%)   0.46 (  0%)       0 kB (  0%)
 dominance frontiers                :   4.17 (  0%)   0.00 (  0%)   4.85 (  0%)       0 kB (  0%)
 dominance computation              :  48.80 (  3%)   0.38 (  0%)  48.47 (  2%)       0 kB (  0%)
 control dependences                :   2.27 (  0%)   0.00 (  0%)   2.21 (  0%)       0 kB (  0%)
 out of ssa                         :   3.56 (  0%)   0.00 (  0%)   3.61 (  0%)      89 kB (  0%)
 expand vars                        :   9.21 (  0%)   0.02 (  0%)   9.47 (  0%)  183298 kB (  0%)
 expand                             :  24.82 (  1%)   0.43 (  0%)  25.20 (  1%) 3841647 kB ( 10%)
 post expand cleanups               :  12.22 (  1%)   0.68 (  1%)  12.33 (  1%) 1882283 kB (  5%)
 lower subreg                       :   0.04 (  0%)   0.00 (  0%)   0.09 (  0%)       0 kB (  0%)
 forward prop                       :  13.95 (  1%)   0.04 (  0%)  14.04 (  1%)  302340 kB (  1%)
 CSE                                :  52.01 (  3%)   0.01 (  0%)  52.40 (  3%)    5982 kB (  0%)
 dead code elimination              :  10.14 (  1%)   0.00 (  0%)  10.15 (  0%)       0 kB (  0%)
 dead store elim1                   :   9.66 (  1%)   0.02 (  0%)   9.36 (  0%)  148482 kB (  0%)
 dead store elim2                   :   9.44 (  0%)   0.01 (  0%)   9.58 (  0%)  137841 kB (  0%)
 loop analysis                      :   0.10 (  0%)   0.00 (  0%)   0.08 (  0%)       0 kB (  0%)
 loop init                          :  17.64 (  1%)   0.02 (  0%)  18.14 (  1%)    3420 kB (  0%)
 loop fini                          :   0.39 (  0%)   0.00 (  0%)   0.35 (  0%)       0 kB (  0%)
 CPROP                              :  61.25 (  3%)   0.16 (  0%)  61.61 (  3%) 1606298 kB (  4%)
 PRE                                :  43.75 (  2%)   0.00 (  0%)  43.98 (  2%)       0 kB (  0%)
 CSE 2                              :  42.86 (  2%)   0.00 (  0%)  42.99 (  2%)    9492 kB (  0%)
 branch prediction                  :   5.74 (  0%)   0.00 (  0%)   5.73 (  0%)     665 kB (  0%)
 combiner                           :  40.86 (  2%)   0.18 (  0%)  41.03 (  2%) 1335301 kB (  4%)
 if-conversion                      :  12.47 (  1%)   0.01 (  0%)  12.08 (  1%)  130958 kB (  0%)
 integrated RA                      :  56.03 (  3%)   0.03 (  0%)  55.98 (  3%) 1817584 kB (  5%)
 LRA non-specific                   :  27.22 (  1%)   0.02 (  0%)  27.14 (  1%)      27 kB (  0%)
 LRA virtuals elimination           :   1.97 (  0%)   0.00 (  0%)   2.12 (  0%)     187 kB (  0%)
 LRA create live ranges             :   7.22 (  0%)   0.00 (  0%)   7.04 (  0%)       0 kB (  0%)
 LRA hard reg assignment            :   1.88 (  0%)   0.00 (  0%)   2.03 (  0%)       0 kB (  0%)
 reload                             :   0.23 (  0%)   0.00 (  0%)   0.18 (  0%)       0 kB (  0%)
 reload CSE regs                    :  27.27 (  1%)   0.11 (  0%)  27.23 (  1%)  458199 kB (  1%)
 load CSE after reload              :   4.93 (  0%)   0.00 (  0%)   4.50 (  0%)       0 kB (  0%)
 ree                                :   6.81 (  0%)   0.00 (  0%)   6.84 (  0%)   56956 kB (  0%)
 thread pro- & epilogue             :   0.93 (  0%)   0.00 (  0%)   0.79 (  0%)     442 kB (  0%)
 if-conversion 2                    :   1.58 (  0%)   0.00 (  0%)   1.65 (  0%)      40 kB (  0%)
 split paths                        :   0.27 (  0%)   0.00 (  0%)   0.27 (  0%)       0 kB (  0%)
 combine stack adjustments          :   1.12 (  0%)   0.00 (  0%)   1.14 (  0%)       0 kB (  0%)
 peephole 2                         :   7.39 (  0%)   0.11 (  0%)   7.56 (  0%)  450261 kB (  1%)
 hard reg cprop                     :   8.19 (  0%)   0.00 (  0%)   8.12 (  0%)   11400 kB (  0%)
 scheduling 2                       :  66.85 (  3%)   0.02 (  0%)  67.00 (  3%)  154655 kB (  0%)
 machine dep reorg                  :   8.48 (  0%)   0.00 (  0%)   8.45 (  0%)     484 kB (  0%)
 reorder blocks                     :   6.98 (  0%)   0.00 (  0%)   6.19 (  0%)  103733 kB (  0%)
 shorten branches                   :   7.98 (  0%)   0.00 (  0%)   8.01 (  0%)       0 kB (  0%)
 final                              :  14.32 (  1%)   0.43 (  0%)  14.88 (  1%)  522494 kB (  1%)
 tree if-combine                    :   0.34 (  0%)   0.00 (  0%)   0.32 (  0%)       0 kB (  0%)
 straight-line strength reduction   :   0.65 (  0%)   0.00 (  0%)   0.66 (  0%)      70 kB (  0%)
 store merging                      :   0.14 (  0%)   0.00 (  0%)   0.07 (  0%)      72 kB (  0%)
 initialize rtl                     :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      12 kB (  0%)
 address lowering                   :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 rest of compilation                :  20.06 (  1%)   0.14 (  0%)  19.97 (  1%)  334344 kB (  1%)
 remove unused locals               :   3.01 (  0%)   0.01 (  0%)   2.53 (  0%)       0 kB (  0%)
 address taken                      :   1.20 (  0%)   0.00 (  0%)   1.26 (  0%)       0 kB (  0%)
 repair loop structures             :   0.18 (  0%)   0.02 (  0%)   0.15 (  0%)      27 kB (  0%)
 TOTAL                              :1922.57        129.36       2051.17       38131807 kB

Command being timed: "gcc a.c -c -o delete.o -ftime-report -O3 -march=native"
User time (seconds): 1993.44
System time (seconds): 131.59
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 35:26.44
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 14667556
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 4316307
Voluntary context switches: 11113
Involuntary context switches: 215465
Swaps: 0
File system inputs: 65536
File system outputs: 1238168
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
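
For anyone who wants the headline numbers from time's output above in
more familiar units, here is a minimal Python sketch.  It only uses
four values copied from that output.  The helper names
(parse_time_verbose, elapsed_seconds) are made up for illustration,
and the conversion assumes that the "kbytes" GNU time prints are units
of 1024 bytes.

```python
# Four lines copied from the /usr/bin/time --verbose output above.
TIME_OUTPUT = """\
User time (seconds): 1993.44
System time (seconds): 131.59
Elapsed (wall clock) time (h:mm:ss or m:ss): 35:26.44
Maximum resident set size (kbytes): 14667556
"""


def parse_time_verbose(text):
    """Return the 'Key: value' pairs of time --verbose output as a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the last ': ' so that keys which contain colons
        # themselves ("... (h:mm:ss or m:ss)") stay in one piece.
        key, sep, value = line.strip().rpartition(": ")
        if sep:
            fields[key] = value
    return fields


def elapsed_seconds(stamp):
    """Convert 'h:mm:ss' or 'm:ss.ff' into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


fields = parse_time_verbose(TIME_OUTPUT)

# Assumption: one "kbyte" here is 1024 bytes, so kbytes / 1024**2 is GiB.
peak_gib = int(fields["Maximum resident set size (kbytes)"]) / 1024**2

wall = elapsed_seconds(fields["Elapsed (wall clock) time (h:mm:ss or m:ss)"])
cpu = float(fields["User time (seconds)"]) + float(fields["System time (seconds)"])

print(f"peak memory: {peak_gib:.2f} GiB")
print(f"wall time:   {wall / 60:.1f} min")
print(f"CPU share:   {cpu / wall:.1%}")
```

The CPU share comes out just under 100%, which fits the "99%" that
time reports and a compile that runs on a single core.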


