This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Possible LRA issue?


Hi,

   I have a large codebase where, at some point, there's a structure
that takes an unsigned integer template argument and uses it as the size
of an array, something like

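#include <array>
#include <cstddef>
#include <list>

template <class T, std::size_t S>
struct Struct
{
    typedef std::array<T, S> Chunk;    // fixed-size block of S elements
    typedef std::list<Chunk> Content;  // list of such blocks

    Content c;
};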

Changing the value of S significantly alters the compile time and
memory that the compiler uses. We use some large numbers there.
At some point, the compiler runs out of memory (xmalloc fails). I
wondered why, so I did some analysis by debugging GCC 4.8.2 (4.8.3
behaves the same) and ran the following experiment with all
optimizations turned off (-fno-* and -O0):
  I generated a report of xmalloc usage for two programs: one with
S=10u and another with S=11u, just to see the effect of a difference of 1.
The report was generated as follows: I set a breakpoint at xmalloc that
appended a bt to a file. Then I found the common stack traces and counted
how many times xmalloc was called in each version of the
program (S=10u and S=11u as mentioned above).
The differences were:

a) Stack trace:
      xmalloc | pool_alloc | create_live_range | mark_pseudo_live |
mark_regno_live | process_bb_lives | lra_create_live_ranges | lra |
do_reload | rest_of_handle_reload | execute_one_pass |
execute_pass_list | execute_pass_list | expand_function |
output_in_order | compile | finalize_compilation_unit |
cp_write_global_declarations | compile_file | do_compile | toplev_main
| __libc_start_main | _start |

     S=10u: 15 times
     S=11u: 16 times


b) Stack trace:
      xmalloc | lra_set_insn_recog_data | lra_get_insn_recog_data |
lra_update_insn_regno_info | lra_update_insn_regno_info |
lra_push_insn_1 | lra_push_insn | push_insns | lra_process_new_insns |
curr_insn_transform | lra_constraints | lra | do_reload |
rest_of_handle_reload | execute_one_pass | execute_pass_list |
execute_pass_list | expand_function | output_in_order | compile |
finalize_compilation_unit | cp_write_global_declarations |
compile_file | do_compile | toplev_main | __libc_start_main | _start |

     S=10u: 186 times
     S=11u: 192 times

c) Stack trace:
     xmalloc | df_install_refs | df_refs_add_to_chains |
df_insn_rescan | emit_insn_after_1 | emit_pattern_after_noloc |
emit_pattern_after_setloc | emit_insn_after_setloc | try_split |
split_insn | split_all_insns | rest_of_handle_split_after_reload |
execute_one_pass | execute_pass_list | execute_pass_list |
execute_pass_list | expand_function | output_in_order | compile |
finalize_compilation_unit | cp_write_global_declarations |
compile_file | do_compile | toplev_main | __libc_start_main | _start |

     S=10u: 617 times
     S=11u: 619 times

d) Stack trace:
     xmalloc | df_install_refs | df_refs_add_to_chains |
df_bb_refs_record | df_scan_blocks | rest_of_handle_df_initialize |
execute_one_pass | execute_pass_list | execute_pass_list |
expand_function | output_in_order | compile |
finalize_compilation_unit | cp_write_global_declarations |
compile_file | do_compile | toplev_main | __libc_start_main | _start |

    S=10u: 13223 times
    S=11u: 13227 times

e) Stack trace:
     xmalloc | __GI__obstack_newchunk | bitmap_element_allocate |
bitmap_set_bit | update_lives | assign_hard_regno | assign_by_spills |
lra_assign | lra | do_reload | rest_of_handle_reload |
execute_one_pass | execute_pass_list | execute_pass_list |
expand_function | output_in_order | compile |
finalize_compilation_unit | cp_write_global_declarations |
compile_file | do_compile | toplev_main | __libc_start_main | _start |

    S=10u: 0 times (never!)
    S=11u: 1 time

Unfortunately, I can't disclose the source code, and I don't have the time to
isolate a piece of code that reproduces the issue.
Some comments about the code: I don't do any template metaprogramming
that depends on S, but I do iterate over the Content with range-based for loops.
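
To make the shape of the code concrete, here is a minimal sketch of the kind of range-based for usage meant above. It is not the real code and is not known to reproduce the memory growth; the helper sum_all is hypothetical and only shows how S enters through the array size while the loops themselves do not depend on it.

```cpp
#include <array>
#include <cstddef>
#include <list>

template <class T, std::size_t S>
struct Struct
{
    typedef std::array<T, S> Chunk;
    typedef std::list<Chunk> Content;

    Content c;
};

// Hypothetical helper: walks every chunk and every element in it
// using range-based for loops, and returns the sum of all elements.
template <class T, std::size_t S>
T sum_all(const Struct<T, S>& s)
{
    T total = T();
    for (const auto& chunk : s.c)        // iterate over the list of chunks
        for (const auto& value : chunk)  // iterate over the S elements
            total += value;
    return total;
}
```

Instantiating this with two different values of S (for example Struct<int, 10u> and Struct<int, 11u>) would give a pair of translation units comparable to the ones measured above.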

I can extend the analysis to S=12 and compare with the previous values.
I thought about fixing this myself, but I lack the time and the background on
these optimizations. Any hints?
I'm open to doing more experiments if anybody asks, or to posting -fdumps.

I suspect that this issue could be worked around by playing with
ggc-min-heapsize and similar parameters, but I'd like to know why just
changing the size of an array has such a consequence.

Thanks!

    Daniel.

-- 

Daniel F. Gutson
Chief Engineering Officer, SPD


San Lorenzo 47, 3rd Floor, Office 5

Córdoba, Argentina


Phone: +54 351 4217888 / +54 351 4218211

Skype: dgutson

