Bug 17798 - [3.4/4.0 Regression] high cpp memory usage with undefined symbols
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: preprocessor
Version: 3.4.3
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: memory-hog
Depends on:
Blocks:
 
Reported: 2004-10-02 22:23 UTC by Richard Henderson
Modified: 2005-02-09 08:13 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-01-27 01:00:42


Description Richard Henderson 2004-10-02 22:23:57 UTC
The following generator program

#include <stdio.h>
#include <stdlib.h>

int main()
{
  int i;
  setvbuf (stdout, malloc(32768), _IOFBF, 32768);

  for (i = 0; i < 10000000; ++i)
    printf ("#ifdef M%d\nchar c%d[] = M%d;\n#endif\n", i, i, i);

  return 0;
}

builds a 483MB file which contains nothing but #ifdefs against undefined
symbols.  The preprocessed file should contain nothing but cpp line notes.
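
As a sanity check on the 483MB figure, the size of the generated file can be computed without writing it to disk. The sketch below is not part of the original report; block_len and generated_size are names made up for illustration. Each block consists of 30 fixed bytes plus three copies of the decimal index.

```c
#include <stdio.h>

/* Bytes the generator emits for index i.  snprintf with a NULL
   buffer and size 0 only measures the formatted length (C99).  */
static long block_len (int i)
{
  return snprintf (NULL, 0, "#ifdef M%d\nchar c%d[] = M%d;\n#endif\n",
                   i, i, i);
}

/* Total size in bytes of the generated file for indices 0 .. n-1.  */
static long long generated_size (int n)
{
  long long total = 0;
  int i;

  for (i = 0; i < n; ++i)
    total += block_len (i);
  return total;
}
```

Calling generated_size (10000000) gives 506,666,670 bytes, roughly 483 MiB, which agrees with the file size quoted in the description.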

With gcc 3.2, cpp0 has peak memory usage of 930MB.
With gcc 3.4 and 4.0, cc1 has peak memory usage of 2010MB.

While this does strike me as a bit silly, apparently there are users trying
this sort of thing.

  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=68634
Comment 1 Andrew Pinski 2004-10-02 22:38:05 UTC
I think this is GC related.
Comment 2 Stuart MacDonald 2004-10-07 00:33:44 UTC
I was the original reporter. I had generated a file of this format in a
programmatic search to determine all the gcc predefined macros. I was looping
through all the possible macro names. Problem was my search kept failing due to
a crashing compiler.

Since it looked like a memory leak to me, I reported it.

Comment 3 Andrew Pinski 2004-10-07 00:39:28 UTC
A better way is to do "~/local/bin/gcc -x c /dev/null -dD -o - -E" and then look for "#define".
Comment 4 Mark Mitchell 2004-11-01 00:45:50 UTC
Postponed until GCC 3.4.4.
Comment 5 Andrew Pinski 2004-12-14 05:28:08 UTC
Confirmed.
Comment 6 Andrew Pinski 2004-12-14 05:53:11 UTC
The first thing is that read_file_guts mallocs the whole file, which seems wrong.  That accounts for 500M.
The next problem is that we keep every identifier we parsed even though we don't need it.
3014 calls for 12,273,008 bytes: thread_a000a1ec |0x0 | _dyld_start | _start | main | toplev_main | 
do_compile | compile_file | c_common_parse_file | c_parse_file | yyparse | yylex | _yylex | c_lex | 
c_lex_with_flags | cpp_get_token | _cpp_lex_token | _cpp_handle_directive | do_ifdef | lex_macro_node 
| _cpp_lex_token | _cpp_lex_direct | lex_identifier | ht_lookup_with_hash | _obstack_newchunk | 
xmalloc | malloc | malloc_zone_malloc 

And this is where the problem comes from.
No, there is no leak: we keep a reference to all of these identifiers, but it seems like we should not.
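
To illustrate why this is not a leak and yet still grows without bound, here is a minimal sketch of an interning identifier table. It is a simplified model, not cpplib's code: struct ident, intern, and ifdef_undefined are hypothetical names, and the hash function and bucket count are arbitrary choices. The only point taken from the trace above is that #ifdef has to look the name up, and the lookup allocates an entry for a name it has not seen before.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical interned identifier; NOT cpplib's hash node.  */
struct ident {
  struct ident *next;   /* next entry in the same bucket */
  size_t len;           /* length of the spelling */
  char name[];          /* NUL-terminated spelling */
};

#define NBUCKETS 4096

struct ident_table {
  struct ident *buckets[NBUCKETS];
  size_t count;         /* distinct identifiers seen so far */
  size_t bytes;         /* bytes allocated for entries */
};

/* Arbitrary string hash, chosen only for the illustration.  */
static unsigned long hash_bytes (const char *s, size_t len)
{
  unsigned long h = 5381;
  size_t i;

  for (i = 0; i < len; ++i)
    h = h * 33 + (unsigned char) s[i];
  return h;
}

/* Return the unique entry for this spelling, creating it on first
   sight.  Entries are never freed, so the table always holds a
   reference to every name it was asked about.  NULL on OOM.  */
static struct ident *intern (struct ident_table *t, const char *s, size_t len)
{
  struct ident **slot = &t->buckets[hash_bytes (s, len) % NBUCKETS];
  struct ident *p;

  for (p = *slot; p != NULL; p = p->next)
    if (p->len == len && memcmp (p->name, s, len) == 0)
      return p;

  p = malloc (sizeof *p + len + 1);
  if (p == NULL)
    return NULL;
  memcpy (p->name, s, len);
  p->name[len] = '\0';
  p->len = len;
  p->next = *slot;
  *slot = p;
  t->count++;
  t->bytes += sizeof *p + len + 1;
  return p;
}

/* Model "#ifdef NAME" for a name that is never defined: the answer
   is always 0, but the lookup still leaves an entry behind.  */
static int ifdef_undefined (struct ident_table *t, const char *name)
{
  intern (t, name, strlen (name));
  return 0;
}
```

In this model, feeding in M0 through M9999999 leaves ten million live entries even though every test is false. Every entry is still reachable from the table, which is why a leak checker would not flag it.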
Comment 7 Neil Booth 2004-12-14 13:57:07 UTC
Subject: Re:  [3.4/4.0 Regression] high cpp memory usage with undefined symbols

pinskia at gcc dot gnu dot org wrote:-

> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-14 05:53 -------
> The first thing is that read_file_guts mallocs the whole file which seems wrong.  That accounts for 
> 500M.
> The next problem is that keep every identifier we parsed even though we don't need it.
> 3014 calls for 12,273,008 bytes: thread_a000a1ec |0x0 | _dyld_start | _start | main | toplev_main | 
> do_compile | compile_file | c_common_parse_file | c_parse_file | yyparse | yylex | _yylex | c_lex | 
> c_lex_with_flags | cpp_get_token | _cpp_lex_token | _cpp_handle_directive | do_ifdef | lex_macro_node 
> | _cpp_lex_token | _cpp_lex_direct | lex_identifier | ht_lookup_with_hash | _obstack_newchunk | 
> xmalloc | malloc | malloc_zone_malloc 
> 
> And this is where the problem comes from.
> No there is no leak we keep a reference to all of thes identifiers but this seems like we should not.

Not doing either of these involves a major rework of cpplib FWIW.

I happen to think it would be beneficial, but I also think that the
whole approach CPP takes needs rethinking.

Neil.
Comment 8 Mark Mitchell 2005-02-09 08:13:57 UTC
This is nowhere near release-critical; it's an intentional extreme corner case.

As for the facts noted in the audit trail (i.e., that we lex the whole file up
front, and that we keep all identifiers around the entire time), those are very
sound strategies for most programs.
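
For reference, the "whole file up front" strategy can be sketched as below. This is an illustration under stated assumptions, not read_file_guts itself: slurp is a hypothetical helper, and the starting capacity and doubling policy are arbitrary. It shows why the buffer alone costs about as much memory as the input file, which is harmless for ordinary sources and expensive for a 483MB one.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything from FP into one malloc'd
   buffer, doubling the capacity whenever it fills up.  Stores the
   length in *LEN_OUT.  Returns NULL on allocation or read error.  */
static char *slurp (FILE *fp, size_t *len_out)
{
  size_t cap = 4096, len = 0, n;
  char *buf = malloc (cap);

  if (buf == NULL)
    return NULL;

  while ((n = fread (buf + len, 1, cap - len, fp)) > 0)
    {
      len += n;
      if (len == cap)
        {
          /* Buffer is full: grow it before the next read.  */
          char *bigger = realloc (buf, cap * 2);
          if (bigger == NULL)
            {
              free (buf);
              return NULL;
            }
          buf = bigger;
          cap *= 2;
        }
    }

  if (ferror (fp))
    {
      free (buf);
      return NULL;
    }

  *len_out = len;
  return buf;
}
```

Having the whole file in one contiguous buffer lets a lexer point into it instead of copying tokens, which is one reason the approach suits typical inputs.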

I've removed the target milestone, and closed as WONTFIX.  If someone chooses to
reopen this, please do not reset the target milestone.