This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: why 6Gb RAM not enough to compile a 14Mb source [MELT]?
- From: "Richard Guenther" <richard dot guenther at gmail dot com>
- To: "Basile STARYNKEVITCH" <basile at starynkevitch dot net>
- Cc: gcc at gnu dot org
- Date: Wed, 4 Jun 2008 10:12:28 +0200
- Subject: Re: why 6Gb RAM not enough to compile a 14Mb source [MELT]?
- References: <484636A8.9040402@starynkevitch.net>
On Wed, Jun 4, 2008 at 8:31 AM, Basile STARYNKEVITCH
<basile@starynkevitch.net> wrote:
> Hello All,
>
> my MELT branch http://gcc.gnu.org/wiki/MiddleEndLispTranslator has a big
> source file in it, warm-basilys-0.c. It is "self" generated, about 14Mbytes &
> almost 280KLOC (in rev136334). It ends with a big initialization routine of
> 100KLOC which mostly fills a 5000 member structure (each member being itself
> a small structure) and calls a few routines. This initialization routine has
> a simple control structure (no deeply nested blocks or loops).
>
> gcc (either gcc-4.1, 4.2 or 4.3 from Debian, or the bootstrapped trunk
> rev136331) can compile this file without any optimisation, i.e. with -O0 -g3,
> in about 16 seconds and less than 1Gb RAM.
>
> But on my 6 Gbytes machine (Core2, 2400MHz, Debian/Sid/AMD64) the cc1
> process with -O2 (either 4.2, 4.3 or the trunk) eats nearly 10Gb of virtual
> memory and thrashes (using 4.8Gb of RAM, 1% cpu time, waiting for the swap
> IO). The same happens with -O1. -Os is a bit better.
>
> The time to run the
> ./built-melt-cc-script warm-basilys-0.c warm-basilys-0.so
> which compiles warm-basilys-0.c with -O2 -fPIC is
>
> (you can set the MELT_EXTRACFLAGS environment variable to pass
> real 84m23.594s
> user 6m23.496s
> sys 1m5.032s
>
> I am attaching the -ftime-report output for information. One of the most
> demanding passes is tree operand scan.
>
> I find this report misleading on the memory consumption total (1591718kB =
> 1.6Gb). The top command gives that cc1 needs nearly 10Gb of process space,
> and uses nearly 5G (and thrashes).
>
> I won't be annoyed for long by this, since I'll soon split the
> warm-basilys.bysl file (and hence the generated files) in several distinct
> files. Until then, -O0 is enough for me.
>
> Are there any specific flags to pass to gcc to lower the RAM consumption
> (even at the expense of generated code quality)?
>
> Are there any pragma-s to disable (or lower) optimisation of a single
> routine?
>
> My intuition (and experience) is that gcc -O2 (or even -O1) time and space
> consumption is nearly quadratic in the size of the longest routine.
>
> Thanks for reading.
If it does structure initialization you can try:
  --param max-fields-for-field-sensitive=0 --param max-aliased-vops=0
Otherwise, can you file a bug report and attach the testcase there?
(Bonus points if you have one that doesn't max out at 10GB but
maybe 2GB ;))
Thanks,
Richard.
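One way to get the smaller testcase asked for above is to generate it. The following is a minimal Python sketch, assuming the expensive part really is one long, flat routine that fills a large structure of small structures, as the original post describes. The names make_testcase, struct cell, struct big, data and init_big are hypothetical and are not taken from warm-basilys-0.c.

```python
def make_testcase(n_members: int) -> str:
    """Return C source: an n_members-field struct plus one flat init routine."""
    lines = ["struct cell { void *ptr; long len; };", "", "struct big {"]
    # One small sub-structure per member, mirroring the "5000 member structure".
    lines += [f"  struct cell m{i};" for i in range(n_members)]
    lines += ["};", "", "struct big data;", "", "void init_big(void)", "{"]
    for i in range(n_members):
        # Straight-line stores only: no loops, no nested blocks.
        nxt = (i + 1) % n_members
        lines.append(f"  data.m{i}.ptr = &data.m{nxt};")
        lines.append(f"  data.m{i}.len = {i};")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Writing the result for a few member counts (say 500, 1000 and 2000) to .c files and compiling each with -O2, with and without the two --param options, should show how compile time and memory grow with the size of the routine. Whether those options are accepted depends on the GCC version in use.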
>
> --
> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
> email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
> *** opinions {are only mines, sont seulement les miennes} ***
>
>
> Execution times (seconds)
> garbage collection : 7.16 ( 2%) usr 0.45 ( 1%) sys 47.16 ( 1%) wall
> 0 kB ( 0%) ggc
> callgraph construction: 16.83 ( 4%) usr 0.10 ( 0%) sys 16.87 ( 0%) wall
> 41478 kB ( 3%) ggc
> callgraph optimization: 9.82 ( 3%) usr 0.11 ( 0%) sys 9.95 ( 0%) wall
> 9184 kB ( 1%) ggc
> ipa reference : 0.25 ( 0%) usr 0.02 ( 0%) sys 0.26 ( 0%) wall
> 52 kB ( 0%) ggc
> ipa pure const : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
> 0 kB ( 0%) ggc
> cfg cleanup : 2.76 ( 1%) usr 0.03 ( 0%) sys 2.91 ( 0%) wall
> 5120 kB ( 0%) ggc
> CFG verifier : 11.22 ( 3%) usr 0.69 ( 1%) sys 177.08 ( 3%) wall
> 0 kB ( 0%) ggc
> trivially dead code : 0.75 ( 0%) usr 0.00 ( 0%) sys 0.80 ( 0%) wall
> 0 kB ( 0%) ggc
> df reaching defs : 3.01 ( 1%) usr 0.49 ( 1%) sys 34.85 ( 1%) wall
> 0 kB ( 0%) ggc
> df live regs : 3.46 ( 1%) usr 0.06 ( 0%) sys 3.57 ( 0%) wall
> 0 kB ( 0%) ggc
> df live&initialized regs: 2.12 ( 1%) usr 0.00 ( 0%) sys 2.16 ( 0%) wall
> 0 kB ( 0%) ggc
> df use-def / def-use chains: 1.61 ( 0%) usr 0.02 ( 0%) sys 1.75 ( 0%) wall
> 0 kB ( 0%) ggc
> df reg dead/unused notes: 1.07 ( 0%) usr 0.04 ( 0%) sys 1.10 ( 0%) wall
> 15075 kB ( 1%) ggc
> register information : 0.51 ( 0%) usr 0.01 ( 0%) sys 0.45 ( 0%) wall
> 0 kB ( 0%) ggc
> alias analysis : 1.05 ( 0%) usr 0.01 ( 0%) sys 0.91 ( 0%) wall
> 19781 kB ( 1%) ggc
> register scan : 0.25 ( 0%) usr 0.01 ( 0%) sys 0.23 ( 0%) wall
> 163 kB ( 0%) ggc
> rebuild jump labels : 0.53 ( 0%) usr 0.00 ( 0%) sys 0.53 ( 0%) wall
> 0 kB ( 0%) ggc
> preprocessing : 1.24 ( 0%) usr 0.56 ( 1%) sys 1.93 ( 0%) wall
> 46597 kB ( 3%) ggc
> lexical analysis : 0.30 ( 0%) usr 0.81 ( 1%) sys 1.29 ( 0%) wall
> 0 kB ( 0%) ggc
> parser : 1.70 ( 0%) usr 0.49 ( 1%) sys 2.24 ( 0%) wall
> 123365 kB ( 8%) ggc
> inline heuristics : 0.63 ( 0%) usr 0.01 ( 0%) sys 0.62 ( 0%) wall
> 5491 kB ( 0%) ggc
> integration : 2.11 ( 1%) usr 0.22 ( 0%) sys 2.25 ( 0%) wall
> 168932 kB (11%) ggc
> tree gimplify : 1.86 ( 0%) usr 0.05 ( 0%) sys 1.78 ( 0%) wall
> 109046 kB ( 7%) ggc
> tree eh : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
> 0 kB ( 0%) ggc
> tree CFG construction : 0.22 ( 0%) usr 0.01 ( 0%) sys 0.23 ( 0%) wall
> 69444 kB ( 4%) ggc
> tree CFG cleanup : 3.42 ( 1%) usr 0.03 ( 0%) sys 4.15 ( 0%) wall
> 7307 kB ( 0%) ggc
> tree VRP : 3.69 ( 1%) usr 0.24 ( 0%) sys 11.89 ( 0%) wall
> 115325 kB ( 7%) ggc
> tree copy propagation : 1.80 ( 0%) usr 0.05 ( 0%) sys 3.50 ( 0%) wall
> 3511 kB ( 0%) ggc
> tree find ref. vars : 0.12 ( 0%) usr 0.01 ( 0%) sys 0.12 ( 0%) wall
> 9570 kB ( 1%) ggc
> tree PTA : 2.59 ( 1%) usr 0.61 ( 1%) sys 57.50 ( 1%) wall
> 17158 kB ( 1%) ggc
> tree alias analysis : 1.13 ( 0%) usr 0.33 ( 1%) sys 26.66 ( 1%) wall
> 2461 kB ( 0%) ggc
> tree call clobbering : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall
> 10 kB ( 0%) ggc
> tree flow sensitive alias: 0.46 ( 0%) usr 0.00 ( 0%) sys 0.53 ( 0%) wall
> 10992 kB ( 1%) ggc
> tree flow insensitive alias: 8.41 ( 2%) usr 0.06 ( 0%) sys 8.96 ( 0%) wall
> 0 kB ( 0%) ggc
> tree memory partitioning: 0.38 ( 0%) usr 0.01 ( 0%) sys 0.41 ( 0%) wall
> 111 kB ( 0%) ggc
> tree PHI insertion : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
> 119 kB ( 0%) ggc
> tree SSA rewrite : 1.44 ( 0%) usr 0.03 ( 0%) sys 1.46 ( 0%) wall
> 44376 kB ( 3%) ggc
> tree SSA other : 0.09 ( 0%) usr 0.09 ( 0%) sys 0.27 ( 0%) wall
> 0 kB ( 0%) ggc
> tree SSA incremental : 2.11 ( 1%) usr 0.14 ( 0%) sys 4.59 ( 0%) wall
> 4795 kB ( 0%) ggc
> tree operand scan : 80.93 (21%) usr 0.92 ( 1%) sys 82.92 ( 2%) wall
> 71551 kB ( 4%) ggc
> dominator optimization: 3.97 ( 1%) usr 0.06 ( 0%) sys 3.92 ( 0%) wall
> 84156 kB ( 5%) ggc
> tree SRA : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
> 0 kB ( 0%) ggc
> tree STORE-CCP : 0.47 ( 0%) usr 0.05 ( 0%) sys 0.69 ( 0%) wall
> 992 kB ( 0%) ggc
> tree CCP : 0.93 ( 0%) usr 0.00 ( 0%) sys 0.94 ( 0%) wall
> 1205 kB ( 0%) ggc
> tree PHI const/copy prop: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
> 77 kB ( 0%) ggc
> tree split crit edges : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
> 21401 kB ( 1%) ggc
> tree reassociation : 0.43 ( 0%) usr 0.01 ( 0%) sys 0.45 ( 0%) wall
> 236 kB ( 0%) ggc
> tree PRE : 13.92 ( 4%) usr 52.21 (81%) sys 4339.32 (86%) wall
> 109776 kB ( 7%) ggc
> tree FRE : 4.18 ( 1%) usr 2.51 ( 4%) sys 6.69 ( 0%) wall
> 61570 kB ( 4%) ggc
> tree code sinking : 0.53 ( 0%) usr 0.03 ( 0%) sys 1.54 ( 0%) wall
> 1578 kB ( 0%) ggc
> tree linearize phis : 0.16 ( 0%) usr 0.01 ( 0%) sys 0.14 ( 0%) wall
> 0 kB ( 0%) ggc
> tree forward propagate: 0.36 ( 0%) usr 0.03 ( 0%) sys 0.35 ( 0%) wall
> 2466 kB ( 0%) ggc
> tree phiprop : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
> 0 kB ( 0%) ggc
> tree conservative DCE : 0.93 ( 0%) usr 0.01 ( 0%) sys 0.91 ( 0%) wall
> 20 kB ( 0%) ggc
> tree aggressive DCE : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
> 0 kB ( 0%) ggc
> tree DSE : 0.35 ( 0%) usr 0.01 ( 0%) sys 0.33 ( 0%) wall
> 562 kB ( 0%) ggc
> PHI merge : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
> 0 kB ( 0%) ggc
> loop invariant motion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
> 6 kB ( 0%) ggc
> complete unrolling : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
> 316 kB ( 0%) ggc
> tree iv optimization : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
> 7 kB ( 0%) ggc
> tree loop init : 0.29 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall
> 281 kB ( 0%) ggc
> tree loop fini : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
> 0 kB ( 0%) ggc
> tree copy headers : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
> 524 kB ( 0%) ggc
> tree SSA uncprop : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
> 0 kB ( 0%) ggc
> tree SSA to normal : 52.85 (14%) usr 0.27 ( 0%) sys 53.12 ( 1%) wall
> 25180 kB ( 2%) ggc
> tree rename SSA copies: 0.22 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall
> 0 kB ( 0%) ggc
> tree SSA verifier : 21.08 ( 6%) usr 0.19 ( 0%) sys 21.67 ( 0%) wall
> 4603 kB ( 0%) ggc
> tree STMT verifier : 47.77 (12%) usr 1.47 ( 2%) sys 49.16 ( 1%) wall
> 0 kB ( 0%) ggc
> callgraph verifier : 0.86 ( 0%) usr 0.00 ( 0%) sys 0.93 ( 0%) wall
> 2891 kB ( 0%) ggc
> dominance frontiers : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
> 0 kB ( 0%) ggc
> dominance computation : 3.59 ( 1%) usr 0.04 ( 0%) sys 3.55 ( 0%) wall
> 0 kB ( 0%) ggc
> control dependences : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
> 0 kB ( 0%) ggc
> expand : 11.91 ( 3%) usr 0.31 ( 0%) sys 21.34 ( 0%) wall
> 172552 kB (11%) ggc
> lower subreg : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
> 0 kB ( 0%) ggc
> jump : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
> 0 kB ( 0%) ggc
> forward prop : 0.71 ( 0%) usr 0.01 ( 0%) sys 0.87 ( 0%) wall
> 18126 kB ( 1%) ggc
> CSE : 4.33 ( 1%) usr 0.03 ( 0%) sys 4.51 ( 0%) wall
> 7344 kB ( 0%) ggc
> dead code elimination : 0.63 ( 0%) usr 0.00 ( 0%) sys 0.58 ( 0%) wall
> 0 kB ( 0%) ggc
> dead store elim1 : 1.24 ( 0%) usr 0.00 ( 0%) sys 1.27 ( 0%) wall
> 14629 kB ( 1%) ggc
> dead store elim2 : 0.65 ( 0%) usr 0.01 ( 0%) sys 0.65 ( 0%) wall
> 11488 kB ( 1%) ggc
> loop analysis : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall
> 278 kB ( 0%) ggc
> global CSE : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
> 0 kB ( 0%) ggc
> CPROP 1 : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
> 4114 kB ( 0%) ggc
> PRE : 0.34 ( 0%) usr 0.00 ( 0%) sys 0.46 ( 0%) wall
> 3000 kB ( 0%) ggc
> CPROP 2 : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
> 3110 kB ( 0%) ggc
> bypass jumps : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall
> 2539 kB ( 0%) ggc
> CSE 2 : 4.29 ( 1%) usr 0.02 ( 0%) sys 4.21 ( 0%) wall
> 5306 kB ( 0%) ggc
> branch prediction : 0.66 ( 0%) usr 0.01 ( 0%) sys 0.67 ( 0%) wall
> 3048 kB ( 0%) ggc
> combiner : 1.60 ( 0%) usr 0.01 ( 0%) sys 1.72 ( 0%) wall
> 22097 kB ( 1%) ggc
> if-conversion : 0.70 ( 0%) usr 0.01 ( 0%) sys 0.78 ( 0%) wall
> 456 kB ( 0%) ggc
> regmove : 0.91 ( 0%) usr 0.01 ( 0%) sys 0.87 ( 0%) wall
> 118 kB ( 0%) ggc
> local alloc : 4.45 ( 1%) usr 0.01 ( 0%) sys 4.49 ( 0%) wall
> 11555 kB ( 1%) ggc
> global alloc : 9.35 ( 2%) usr 0.03 ( 0%) sys 9.42 ( 0%) wall
> 37993 kB ( 2%) ggc
> reload CSE regs : 1.83 ( 0%) usr 0.02 ( 0%) sys 1.90 ( 0%) wall
> 30852 kB ( 2%) ggc
> thread pro- & epilogue: 0.24 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
> 1494 kB ( 0%) ggc
> if-conversion 2 : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
> 143 kB ( 0%) ggc
> peephole 2 : 0.27 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall
> 2505 kB ( 0%) ggc
> rename registers : 0.93 ( 0%) usr 0.00 ( 0%) sys 0.94 ( 0%) wall
> 93 kB ( 0%) ggc
> scheduling 2 : 2.72 ( 1%) usr 0.01 ( 0%) sys 2.75 ( 0%) wall
> 1617 kB ( 0%) ggc
> machine dep reorg : 0.34 ( 0%) usr 0.00 ( 0%) sys 0.40 ( 0%) wall
> 385 kB ( 0%) ggc
> reorder blocks : 0.72 ( 0%) usr 0.00 ( 0%) sys 0.66 ( 0%) wall
> 6485 kB ( 0%) ggc
> final : 1.07 ( 0%) usr 0.02 ( 0%) sys 1.16 ( 0%) wall
> 8151 kB ( 1%) ggc
> symout : 0.03 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall
> 2181 kB ( 0%) ggc
> tree if-combine : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
> 0 kB ( 0%) ggc
> TOTAL : 382.44 64.16 5061.26
> 1591718 kB
>
>
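The totals in the report above make the swapping visible with a little arithmetic. The short Python sketch below uses only numbers copied from the TOTAL and tree PRE lines; the variable names are illustrative. It suggests the CPU was busy for roughly 9% of the wall time, and that tree PRE alone accounts for roughly 86% of the wall time while using only 13.92 s of user CPU. The ggc figure appears to count only memory allocated through GCC's garbage collector, which would explain why it is far below what top reports for the whole process.

```python
# Values copied from the -ftime-report output quoted above (seconds, kB).
usr_s, sys_s, wall_s = 382.44, 64.16, 5061.26   # TOTAL line
pre_wall_s = 4339.32                            # tree PRE, wall column
ggc_kb = 1591718                                # TOTAL ggc column

# Fraction of elapsed time in which cc1 was actually executing.
cpu_share = (usr_s + sys_s) / wall_s
# Fraction of elapsed time attributed to the tree PRE pass.
pre_wall_share = pre_wall_s / wall_s

print(f"CPU busy share of wall time: {cpu_share:.1%}")
print(f"tree PRE share of wall time: {pre_wall_share:.1%}")
print(f"GGC total: {ggc_kb / 1e6:.2f} GB (decimal units)")
```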