I don't know how much memory gcc is "supposed" to use but this seems disproportionate. Seen on r140777 on Ubuntu Hardy.

Obviously the testcase itself is meaningless, but it is supposed to be free of undefined behavior. Before CPP it was about 37 kB.

regehr@john-home:~/volatile/tmp43$ current-gcc -Os -g foo.c
small2.c: In function ‘func_41’:
small2.c:309: warning: large integer implicitly truncated to unsigned type

cc1: out of memory allocating 268435456 bytes after a total of 29876224 bytes
regehr@john-home:~/volatile/tmp43$ current-gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../configure --program-prefix=current- --enable-languages=c,c++ --prefix=/home/regehr
Thread model: posix
gcc version 4.4.0 20080930 (experimental) (GCC)
Created attachment 16448 [details] failure-inducing input
The inliner is going crazy somehow.
We seem to accumulate an awfully large stack of BLOCKs during inlining ... are BLOCKs shared? Maybe we can avoid copying equivalent ones. This testcase is bad, as in the end we will optimize it all away to a constant (possibly - it didn't finish for me either ...).
      /* When we are not doing full debug info, we however can keep around
         only the used variables for cfgexpand's memory packing saving quite
         a lot of memory.  */
      else if (debug_info_level == DINFO_LEVEL_NORMAL
               || debug_info_level == DINFO_LEVEL_VERBOSE
               /* Removing declarations before inlining is going to affect
                  DECL_UID that in turn is going to affect hashtables and
                  code generation.  */
               || !cfun->after_inlining)
        unused = false;

the last check makes us blow up for -g0, otherwise we blow up because we retain dead variables for debugging purposes.
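To make the two failure modes concrete, the quoted condition can be modelled as a standalone predicate. This is only a sketch: the enum is a simplified stand-in for GCC's debug_info_level values, and keeps_dead_variable is a hypothetical helper name that does not exist in GCC. It covers just the quoted branch, not the code around it.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for GCC's debug_info_level values (illustrative).  */
enum debug_level { LEVEL_NONE, LEVEL_TERSE, LEVEL_NORMAL, LEVEL_VERBOSE };

/* Hypothetical helper: returns true when the quoted branch would set
   `unused = false', i.e. when a dead variable is retained in its BLOCK.  */
static bool
keeps_dead_variable (enum debug_level level, bool after_inlining)
{
  assert (level <= LEVEL_VERBOSE);

  return level == LEVEL_NORMAL
         || level == LEVEL_VERBOSE
         /* Before inlining, declarations are kept at every debug level.  */
         || !after_inlining;
}
```

Under this model, keeps_dead_variable (LEVEL_NONE, false) is true, which is the -g0 case above: before inlining nothing is dropped regardless of the debug level. With -g the first two tests keep dead variables even after inlining.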
We optimize func_1 to

func_1:
        movw    g_290(%rip), %ax
        movl    $0, g_3(%rip)
        cwtl
        ret

inlining all its callees recursively ...
The difference is mainly that HEAD inlines all func_* calls, while 4.3 keeps many of them around (I see 30 call func_* insns in 4.3). The 4.4 .text is about half the size of the 4.3 .text (i.e. the inliner did a much better job), but the .debug_info emitted by 4.4 is 13MB, compared to 12KB in the code generated by 4.3. You need 7GB of RAM to compile this successfully. In theory, by using a more compact representation of BLOCK nodes we could store them in ~ 13MB (as that's how it is possible to encode it in DWARF3).
The current implementation of the inliner is completely unaware of the debug info size implications. One alternative is to limit the growth in the number of BLOCKs in a function body the same way we limit stack usage, which would add just a little extra bookkeeping, but I would like to be sure that there is no better alternative first. The current BLOCK removal code is quite conservative, based on my observations of what the RTL code did originally; perhaps we can be more strict?
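For illustration only, the "limit BLOCK growth" alternative could amount to a check along these lines. This is a rough sketch, not GCC code: block_growth_allowed, its parameters, and the way the limit is derived are all assumptions, loosely patterned on the idea of a percentage growth cap applied to a baseline size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: may a callee with CALLEE_BLOCKS scope blocks be
   inlined into a function that started with INITIAL_BLOCKS and currently
   holds CURRENT_BLOCKS?  LARGE_FN_BLOCKS and MAX_GROWTH_PERCENT are
   made-up tuning knobs, not existing GCC parameters.  */
static bool
block_growth_allowed (long initial_blocks, long current_blocks,
                      long callee_blocks, long large_fn_blocks,
                      int max_growth_percent)
{
  assert (initial_blocks >= 0 && current_blocks >= 0 && callee_blocks >= 0);
  assert (large_fn_blocks >= 0 && max_growth_percent >= 0);

  /* Small functions may grow freely up to the "large function" size.  */
  long base = initial_blocks > large_fn_blocks
              ? initial_blocks : large_fn_blocks;

  /* Beyond that, permit only a bounded percentage of growth.  */
  long limit = base + base * max_growth_percent / 100;

  return current_blocks + callee_blocks <= limit;
}
```

The only bookkeeping this needs is a per-function block count recorded before inlining and updated after each inline. The cost is that it can reject inlining that would otherwise pay off.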
changing summary, inlining _is_ the right thing to do.
sizeof (tree_block) is 52 bytes on 32-bit hosts. Of these, 8 are unused (ann and type), 8 are frequently unused (block_fragment stuff -- always write-only at debug level 0). Moving fragments into an annotation and reusing type for something would already save 20% of the memory. I think we need a better representation for blocks after parsing. Maybe gimplification unit-at-a-time will make this easier, I don't know.
Subject: Re: [4.4 Regression] inlining causes explosion in debug info

> sizeof (tree_block) is 52 bytes on 32-bit hosts. Of these, 8 are unused (ann
> and type), 8 are frequently unused (block_fragment stuff -- always write-only
> at debug level 0). Moving fragments into an annotation and reusing type for
> something would already save 20% of the memory.

I was looking into the bug last month but then got distracted by other more urgent things. I guess it is time to get back ;)

Even if our block representation is not the most efficient around, I think the main problem is that we keep too many blocks around that never make it into debug info or never serve a useful purpose. The testcase compiled at -g3 needs about 100MB of DWARF section, and before inlining we unfortunately need to keep debug info at -g3 verbosity.

I plan to look into which blocks get ignored by dwarf2out, but it would also be great to figure out whether we really need them in debug info in the first place. (For example, for every inlined function we create a container block and then a block containing the arguments. The current tree-ssa-live code to prune blocks preserves both.)

Honza
Created attachment 17340 [details] Dump of block structure
Created attachment 17341 [details] Little dumping facility
There are obviously giant trees of blocks, coming from the early inliner, that have all variables unused and no statements in them. I am getting convinced we can safely prune those even at -g3: the user cannot set a breakpoint inside such a block in any way and cannot ask for a value.... So I guess that at -g we could eliminate inner blocks with no used variables and no statements, keeping only blocks with statements, together with their unused vars, so we still get proper "optimized out" debugger info. I will prepare a patch for this.

Honza
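A minimal sketch of the pruning rule described above, on a simplified scope tree. Everything here is an assumption for illustration: struct scope_block and prune_scope are hypothetical and far simpler than GCC's BLOCK nodes and the real code in tree-ssa-live.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a lexical scope BLOCK (not GCC's tree_block).  */
struct scope_block
{
  int n_stmts;                    /* statements attributed to this scope */
  int n_used_vars;                /* variables that are still referenced */
  int n_unused_vars;              /* declared but dead variables */
  struct scope_block *subblocks;  /* first nested scope */
  struct scope_block *chain;      /* next sibling scope */
};

/* Prune the scopes nested in B, adding the number of unlinked blocks to
   *REMOVED.  Return true if B itself must be kept: it has statements,
   a used variable, or a nested scope that survived pruning.  */
static bool
prune_scope (struct scope_block *b, int *removed)
{
  assert (b != NULL && removed != NULL);

  struct scope_block **link = &b->subblocks;
  while (*link)
    {
      if (prune_scope (*link, removed))
        link = &(*link)->chain;     /* keep it, move to the next sibling */
      else
        {
          *link = (*link)->chain;   /* unlink the dead scope */
          ++*removed;
        }
    }

  /* Unused variables alone do not keep a scope alive.  A scope that has
     statements keeps them, so they can be reported as "optimized out".  */
  return b->n_stmts > 0 || b->n_used_vars > 0 || b->subblocks != NULL;
}
```

Children are pruned before the parent is examined, so a chain of empty nested scopes collapses bottom-up in a single pass. The function's outermost scope would be passed as B and kept regardless of the return value.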
Subject: Bug 37709

Author: hubicka
Date: Mon Feb 23 13:10:53 2009
New Revision: 144381

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144381
Log:
	PR tree-optimization/37709
	* tree.c (block_ultimate_origin): Move here from dwarf2out.
	* tree.h (block_ultimate_origin): Declare.
	* dwarf2out.c (block_ultimate_origin): Move to tree.c
	* tree-ssa-live.c (remove_unused_scope_block_p): Eliminate blocks
	containig no instructions nor live variables nor nested blocks.
	(dump_scope_block): New function.
	(remove_unused_locals): Enable removal of dead blocks by default;
	enable dumping at TDF_DETAILS.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/dbxout.c
    trunk/gcc/dwarf2out.c
    trunk/gcc/tree-ssa-live.c
    trunk/gcc/tree.c
    trunk/gcc/tree.h
Fixed.