37709 – [4.4 Regression] inlining causes explosion in debug info

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	4.4.0

Importance:	P2 normal
Target Milestone:	4.4.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	compile-time-hog, memory-hog

Depends on:
Blocks:

Reported:	2008-10-01 23:41 UTC by John Regehr
Modified:	2009-02-23 13:12 UTC
CC List:	4 users

See Also:
Host:	i686-pc-linux-gnu
Target:	i686-pc-linux-gnu
Build:	i686-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:	2008-12-29 18:57:38

Attachments
failure-inducing input (13.28 KB, text/x-csrc) 2008-10-01 23:42 UTC, John Regehr
Dump of block structure (1.83 KB, text/plain) 2009-02-22 14:22 UTC, Jan Hubicka
Little dumping facility (1.46 KB, patch) 2009-02-22 14:23 UTC, Jan Hubicka

Description John Regehr 2008-10-01 23:41:29 UTC

I don't know how much memory gcc is "supposed" to use but this seems disproportionate. 

Seen on r140777 on Ubuntu Hardy.

Obviously the testcase itself is meaningless, but it is supposed to be free of undefined behavior.  Before CPP it was about 37 kB.

regehr@john-home:~/volatile/tmp43$ current-gcc -Os -g foo.c
small2.c: In function ‘func_41’:
small2.c:309: warning: large integer implicitly truncated to unsigned type

cc1: out of memory allocating 268435456 bytes after a total of 29876224 bytes
regehr@john-home:~/volatile/tmp43$ current-gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../configure --program-prefix=current- --enable-languages=c,c++ --prefix=/home/regehr
Thread model: posix
gcc version 4.4.0 20080930 (experimental) (GCC)

Comment 1 John Regehr 2008-10-01 23:42:10 UTC

Created attachment 16448
failure-inducing input

Comment 2 Andrew Pinski 2008-10-01 23:48:08 UTC

The inliner is going crazy somehow.

Comment 3 Richard Biener 2008-10-02 11:06:58 UTC

We seem to accumulate an awfully huge stack of BLOCKs during inlining ... are
BLOCKs shared?  Maybe we can avoid copying equivalent ones.

This testcase is bad, as in the end we will optimize it all away to a constant
(possibly; it didn't finish for me either ...).

Comment 4 Richard Biener 2008-10-02 12:56:21 UTC

      /* When we are not doing full debug info, we however can keep around
         only the used variables for cfgexpand's memory packing saving quite
         a lot of memory.  */
      else if (debug_info_level == DINFO_LEVEL_NORMAL
               || debug_info_level == DINFO_LEVEL_VERBOSE
               /* Removing declarations before inlining is going to affect
                  DECL_UID that in turn is going to affect hashtables and
                  code generation.  */
               || !cfun->after_inlining)
        unused = false;

The last check makes us blow up at -g0; otherwise we blow up because we
retain dead variables for debugging purposes.

Comment 5 Richard Biener 2008-10-02 13:02:04 UTC

We optimize func_1 to

func_1:
        movw    g_290(%rip), %ax
        movl    $0, g_3(%rip)
        cwtl
        ret

inlining all its callees recursively ...

Comment 6 Jakub Jelinek 2008-10-02 14:26:31 UTC

The difference is mainly that HEAD inlines all func_* calls, while 4.3 keeps
many of them around (I see 30 call func_* insns in 4.3).  The 4.4 .text is about half the size of the 4.3 .text (i.e. the inliner did a much better job), but the emitted
.debug_info is 13MB in 4.4, compared to 12KB in the code generated by 4.3.
You need 7GB of RAM to compile this successfully.

In theory, by using a more compact representation of BLOCK nodes we could store
them in ~ 13MB (as that's how it is possible to encode it in DWARF3).

Comment 7 Jan Hubicka 2008-11-11 20:09:54 UTC

The current implementation of the inliner is completely unaware of the debug info size implications. One alternative is to limit the number of BLOCKs in function body growth, the same way we limit stack usage; that would add just a little extra bookkeeping, but I would like to be sure that there is no better alternative first.  The current BLOCK removal code is quite conservative, based on my observations of what the RTL code did originally; perhaps we can be more strict?

Comment 8 Paolo Bonzini 2009-02-03 16:28:09 UTC

Changing summary; inlining _is_ the right thing to do.

Comment 9 Paolo Bonzini 2009-02-04 07:04:05 UTC

sizeof (tree_block) is 52 bytes on 32-bit hosts.  Of these, 8 are unused (ann and type), 8 are frequently unused (block_fragment stuff -- always write-only at debug level 0).  Moving fragments into an annotation and reusing type for something would already save 20% of the memory.

I think we need a better representation for blocks after parsing.  Maybe gimplification unit-at-a-time will make this easier, I don't know.

Comment 10 Jan Hubicka 2009-02-12 10:28:52 UTC

Subject: Re:  [4.4 Regression] inlining causes explosion in debug info

> sizeof (tree_block) is 52 bytes on 32-bit hosts.  Of these, 8 are unused (ann
> and type), 8 are frequently unused (block_fragment stuff -- always write-only
> at debug level 0).  Moving fragments into an annotation and reusing type for
> something would already save 20% of the memory.

I was looking into the bug last month but then got distracted by other
more urgent things.  I guess it is time to get back ;)

Even if our block representation is not the most efficient around, I
think the main problem is that we keep too many blocks around that never
make it into debug info or never serve any useful purpose.  The testcase
compiled at -g3 needs about 100MB of DWARF section, and before inlining
we unfortunately need to keep debug info at -g3 verbosity.  I plan to
look into which blocks get ignored by dwarf2out, but it would also be
great to figure out if we really need them in debug info in the first place.
(For example, for every inlined function we create a container block and
then a block containing the arguments.  The current tree-ssa-live code to
prune out blocks preserves both.)

Honza
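
To make the growth described above concrete, here is a rough sketch of how the block count could scale under full inlining. It is a toy model, not a measurement of the attached testcase: it assumes a chain of functions `depth` levels deep, where each function calls the next one `fanout` times, and it charges two blocks per inlined call (the container block and the argument block mentioned in this comment). The names `inlined_blocks`, `depth`, and `fanout` are hypothetical and do not come from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (an assumption, not GCC code): number of scope blocks
   created when a call chain of the given depth is fully inlined.
   Each inlined call contributes 2 blocks (container + arguments)
   plus all the blocks of the body that was inlined into it.  */
uint64_t inlined_blocks(unsigned depth, unsigned fanout)
{
    assert(fanout >= 1);

    uint64_t total = 0;   /* blocks in a leaf function's body */
    for (unsigned level = 0; level < depth; level++)
        total = (uint64_t)fanout * (2 + total);
    return total;
}
```

With `fanout` equal to 1 the count grows linearly (2 blocks per level), but with `fanout` equal to 2 it roughly doubles per level, which is the kind of growth that would make pruning useless blocks more effective than shrinking each individual block.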

Comment 11 Jan Hubicka 2009-02-22 14:22:59 UTC

Created attachment 17340
Dump of block structure

Comment 12 Jan Hubicka 2009-02-22 14:23:37 UTC

Created attachment 17341
Little dumping facility

Comment 13 Jan Hubicka 2009-02-22 14:46:04 UTC

There are obviously giant trees of blocks coming from the early inliner that have all variables unused and no statements in them.  I am getting convinced we can safely prune those even at -g3: the user cannot set a breakpoint in such a block in any way and can't ask for the values....

So I guess at -g we could eliminate inner blocks with no used variables and no statements, keeping around only blocks with statements, including their unused vars, so we can get proper "optimized out" debugger info.

I will prepare a patch for this.

Honza
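
The pruning rule proposed in this comment can be sketched on a toy scope tree. This is only an illustration under stated assumptions: `struct scope`, `prune_scope`, and `count_scopes` are hypothetical names, and the structure is a simplified stand-in, not GCC's `tree_block` or the actual `remove_unused_scope_block_p` code. A scope is dropped when, after its children have been pruned, it has no statements, no live variables, and no surviving nested scopes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a lexical scope block; NOT GCC's tree_block.  */
struct scope {
    int n_stmts;                 /* statements attributed to this scope */
    int n_live_vars;             /* variables still used after optimization */
    struct scope *first_child;   /* nested scopes */
    struct scope *next_sibling;  /* next scope at the same nesting level */
};

/* Prune dead scopes below S bottom-up.  Returns true when S itself is
   dead, i.e. the caller may unlink it.  */
bool prune_scope(struct scope *s)
{
    assert(s != NULL);

    struct scope **link = &s->first_child;
    while (*link != NULL) {
        if (prune_scope(*link))
            *link = (*link)->next_sibling;   /* unlink the dead child */
        else
            link = &(*link)->next_sibling;   /* keep it, move on */
    }
    return s->n_stmts == 0 && s->n_live_vars == 0 && s->first_child == NULL;
}

/* Count S and every scope still reachable below it.  */
int count_scopes(const struct scope *s)
{
    int n = 1;
    for (const struct scope *c = s->first_child; c != NULL; c = c->next_sibling)
        n += count_scopes(c);
    return n;
}
```

Note that a scope with statements is kept together with its unused variables in this sketch, which matches the stated goal of still being able to report them as "optimized out" in the debugger.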

Comment 14 Jan Hubicka 2009-02-23 13:11:19 UTC

Subject: Bug 37709

Author: hubicka
Date: Mon Feb 23 13:10:53 2009
New Revision: 144381

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144381
Log:
	PR tree-optimization/37709
	* tree.c (block_ultimate_origin): Move here from dwarf2out.
	* tree.h (block_ultimate_origin): Declare.
	* dwarf2out.c (block_ultimate_origin): Move to tree.c.
	* tree-ssa-live.c (remove_unused_scope_block_p):
	Eliminate blocks containing no instructions nor live variables nor
	nested blocks.
	(dump_scope_block): New function.
	(remove_unused_locals): Enable removal of dead blocks by default;
	enable dumping at TDF_DETAILS.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/dbxout.c
    trunk/gcc/dwarf2out.c
    trunk/gcc/tree-ssa-live.c
    trunk/gcc/tree.c
    trunk/gcc/tree.h

Comment 15 Jan Hubicka 2009-02-23 13:12:00 UTC

Fixed.