This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: WPA stream_out form & memory consumption


On Mon, Apr 7, 2014 at 1:25 PM, Martin LiÅka <mliska@suse.cz> wrote:
> On 04/03/2014 10:40 PM, Jan Hubicka wrote:
>>>>
>>>> Firefox:
>>>> cgraph.c:869 (cgraph_create_edge_1)                       0: 0.0%
>>>> 0: 0.0%  130358176: 6.9%          0: 0.0%    1253444
>>>> cgraph.c:510 (cgraph_allocate_node)                       0: 0.0%
>>>> 0: 0.0%  182236800: 9.7%          0: 0.0%     555600
>>>> toplev.c:960 (realloc_for_line_map)                       0: 0.0%
>>>> 89503888: 5.5%  268468240:14.3%        160: 0.0%         13
>>>> tree-streamer-in.c:621 (streamer_alloc_tree)       93089976:77.5%
>>>> 972848816:59.6%  639230248:33.9%   21332480:32.3%   13496198
>>>> Total                                             120076578
>>>> 1632997043       1883064062         65981723         24732501
>>>> source location                                     Garbage
>>>> Freed             Leak         Overhead            Times
>>
>> Hi,
>> this is the linemap compression patch.  For me it reduces
>> realloc_for_line_map to about 60MB, 4%
>> toplev.c:960 (realloc_for_line_map)                       0: 0.0%
>> 22395000: 1.5%   67141656: 4.1%        144: 0.0%         12
>> vec.h:626 ((null))                                        0: 0.0%
>> 134568360: 9.3%   75497528: 4.6%    1569368: 2.4%    1009306
>> vec.h:614 ((null))                                 10659408: 8.3%
>> 61265504: 4.2%   78711104: 4.8%     793720: 1.2%     632110
>> vec.h:698 ((null))                                   908768: 0.7%
>> 99564524: 6.9%   82255800: 5.0%    4201148: 6.5%     874628
>> vec.h:666 ((null))                                    12840: 0.0%
>> 73534476: 5.1%   92633604: 5.7%    2929176: 4.5%     776347
>> lto/lto.c:245 (lto_read_in_decl_state)                    0: 0.0%
>> 43115656: 3.0%   94137976: 5.8%   25396856:39.2%    2698570
>> cgraph.c:869 (cgraph_create_edge_1)                       0: 0.0%
>> 0: 0.0%  124069712: 7.6%          0: 0.0%    1192978
>> cgraph.c:510 (cgraph_allocate_node)                       0: 0.0%
>> 0: 0.0%  189855256:11.6%          0: 0.0%     578827
>> tree-streamer-in.c:621 (streamer_alloc_tree)       97891888:76.1%
>> 892961808:61.6%  613594816:37.6%   22268544:34.3%   12574164
>> Total                                             128555402
>> 1448954963       1633186049         64847582         24190936
>> source location                                     Garbage
>> Freed             Leak         Overhead            Times
>>
>> So I get about 1GB of peak GGC memory and about 2.6GB for WPA in TOP.
>> (still on
>> the old tree) You may try to experiment with LOCATION_CACHE_SIZE. It sould
>> be
>> big enough so the locations get shared across different input files.
>
>
> Hi!
>
> Suggested patch looks fine, according to following data:
> https://drive.google.com/file/d/0B0pisUJ80pO1X05SMXdXS2ZScEE/edit?usp=sharing,
> it saves about ~1GB during non-parallel WPA phase. Average memory
> consumption during parallel WPA phase reduces by 1-2GB. It would be good to
> apply the patch.

AFAIK we settled on a simpler one dropping columns at stream-out time
that also helped.

As for the correct way to do the optimization we agreed(?) that streaming
the locations elsewhere and using references to them is more appropriate.
At stream-in (or before stream-out) we can then read the location pairs
and sort them before assigning linemap entries.

Thus, similar to the on-the-side string stream use a location stream
to stream locs.  At stream-in, input it as a whole and assign linemap
numbers in optimal order.

Column dropping hack:

Index: gcc/lto-streamer-out.c
===================================================================
--- gcc/lto-streamer-out.c      (revision 209181)
+++ gcc/lto-streamer-out.c      (working copy)
@@ -176,6 +176,7 @@ lto_output_location (struct output_block
     return;

   xloc = expand_location (loc);
+  xloc.column = 0;

   bp_pack_value (bp, ob->current_file != xloc.file, 1);
   bp_pack_value (bp, ob->current_line != xloc.line, 1);

maybe you can check the effect of that (we'd then set
flag_diagnostics_show_caret to 0 for lto1 invocations and thus
middle-end
diagnostics).

Richard.

> Martin
>
>
>>
>> Honza
>>
>> Index: lto-streamer-in.c
>> ===================================================================
>> --- lto-streamer-in.c   (revision 209047)
>> +++ lto-streamer-in.c   (working copy)
>> @@ -145,21 +145,49 @@ canon_file_name (const char *string)
>>   }
>>     +/* location_cache is used at LTO read in to avoid too many duplicates
>> in
>> +   the linemap tables.  */
>> +
>> +#define LOCATION_CACHE_SIZE 524287
>> +struct location_cache_entry
>> +{
>> +  const char *file;
>> +  int line;
>> +  int col;
>> +  location_t location;
>> +};
>> +static struct location_cache_entry *location_cache;
>> +
>> +/* Return hash of FILE/LINE/COL.  */
>> +
>> +int
>> +location_cache_hash (const char *file, int line, int col)
>> +{
>> +  return iterative_hash_hashval_t ((size_t)file,
>> +                                   iterative_hash_hashval_t (line, col))
>> % LOCATION_CACHE_SIZE;
>> +}
>> +
>> +
>>   /* Read a location bitpack from input block IB.  */
>>     location_t
>>   lto_input_location (struct bitpack_d *bp, struct data_in *data_in)
>>   {
>> -  static const char *current_file;
>> -  static int current_line;
>> +  static const char *current_file, *last_file;
>> +  static int current_line, last_line;
>>     static int current_col;
>>     bool file_change, line_change, column_change;
>>     unsigned len;
>> -  bool prev_file = current_file != NULL;
>> +  bool prev_file = last_file != NULL;
>> +  int hash;
>> +  const char *cfile;
>>       if (bp_unpack_value (bp, 1))
>>       return UNKNOWN_LOCATION;
>>   +  if (!location_cache)
>> +    location_cache = XCNEWVEC (struct location_cache_entry,
>> LOCATION_CACHE_SIZE);
>> +
>>     file_change = bp_unpack_value (bp, 1);
>>     line_change = bp_unpack_value (bp, 1);
>>     column_change = bp_unpack_value (bp, 1);
>> @@ -175,18 +203,32 @@ lto_input_location (struct bitpack_d *bp
>>       if (column_change)
>>       current_col = bp_unpack_var_len_unsigned (bp);
>> +  cfile = current_file;
>> +  hash = location_cache_hash (cfile, current_line, current_col);
>>   -  if (file_change)
>> +  if (location_cache[hash].file == cfile
>> +      && location_cache[hash].line == current_line
>> +      && location_cache[hash].col == current_col + 1)
>> +    return location_cache[hash].location;
>> +  location_cache[hash].file = cfile;
>> +  location_cache[hash].line = current_line;
>> +  location_cache[hash].col = current_col + 1;
>> +
>> +  if (current_file != last_file)
>>       {
>>         if (prev_file)
>>         linemap_add (line_table, LC_LEAVE, false, NULL, 0);
>>           linemap_add (line_table, LC_ENTER, false, current_file,
>> current_line);
>>       }
>> -  else if (line_change)
>> +  else if (current_line != last_line)
>>       linemap_line_start (line_table, current_line, current_col);
>>   -  return linemap_position_for_column (line_table, current_col);
>> +  location_cache[hash].location
>> +    = linemap_position_for_column (line_table, current_col);
>> +  last_file = current_file;
>> +  last_line = current_line;
>> +  return location_cache[hash].location;
>>   }
>>     @@ -981,6 +1023,27 @@ input_function (tree fn_decl, struct dat
>>         }
>>         bsi = gsi_start_bb (bb);
>>         while (!gsi_end_p (bsi))
>> +       {
>> +         gimple stmt = gsi_stmt (bsi);
>> +         /* If we're recompiling LTO objects with debug stmts but
>> +            we're not supposed to have debug stmts, remove them now.
>> +            We can't remove them earlier because this would cause uid
>> +            mismatches in fixups, but we can do it at this point, as
>> +            long as debug stmts don't require fixups.  */
>> +         if (!MAY_HAVE_DEBUG_STMTS && is_gimple_debug (stmt))
>> +           {
>> +             gimple_stmt_iterator gsi = bsi;
>> +             gsi_next (&bsi);
>> +             gsi_remove (&gsi, true);
>> +           }
>> +         else
>> +           {
>> +             gsi_next (&bsi);
>> +             stmts[gimple_uid (stmt)] = stmt;
>> +           }
>> +       }
>> +      bsi = gsi_start_bb (bb);
>> +      while (!gsi_end_p (bsi))
>>         {
>>           gimple stmt = gsi_stmt (bsi);
>>           /* If we're recompiling LTO objects with debug stmts but
>
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]