Bug 64928 - [11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.9.2
Importance: P2 normal
Target Milestone: 12.0
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog, memory-hog
: 66209
Depends on: 99512 66209
Blocks:
Reported: 2015-02-03 21:09 UTC by lucier
Modified: 2024-07-19 06:29 UTC
CC List: 5 users

See Also:
Host:
Target:
Build:
Known to work: 12.1.0, 13.1.0, 14.0, 4.4.7
Known to fail: 11.4.0, 11.5.0, 5.0
Last reconfirmed: 2017-08-21 00:00:00


Attachments
Input file for bug (84.96 KB, application/gzip)
2015-02-03 21:11 UTC, lucier
_io.i.gz: larger test file (378.00 KB, application/gzip)
2015-02-06 05:07 UTC, lucier
Patch to limit coalescing amount (1.90 KB, patch)
2015-03-06 12:47 UTC, Richard Biener
do not compute live/conflict for abnormal coalesces (1.24 KB, patch)
2015-03-06 12:52 UTC, Richard Biener
Parametrized input files for test coverage testing. (105.63 KB, application/x-compressed-tar)
2021-03-10 02:13 UTC, lucier
Smaller parameterized test file (17.76 KB, application/gzip)
2021-03-10 14:16 UTC, lucier
SVG of the CFG at LIM (57.23 KB, text/plain)
2021-03-10 15:06 UTC, Richard Biener

Description lucier 2015-02-03 21:09:09 UTC
With this compiler:

firefly:~/Downloads/gambit/lib> /pkgs/gcc-4.9.2/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-4.9.2/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-4.9.2/configure --prefix=/pkgs/gcc-4.9.2
Thread model: posix
gcc version 4.9.2 (GCC) 


With this command:

/pkgs/gcc-4.9.2/bin/gcc -Q -save-temps -Wno-unused -Wno-write-strings -O1 -fno-math-errno -fschedule-insns2 -fno-strict-aliasing -fno-trapping-math -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp  -fprofile-arcs -ftest-coverage  -I"../include" -c -o "_system.o" -I. -DHAVE_CONFIG_H -D___GAMBCDIR="\"/usr/local/Gambit-C\"" -D___SYS_TYPE_CPU="\"x86_64\"" -D___SYS_TYPE_VENDOR="\"unknown\"" -D___SYS_TYPE_OS="\"linux-gnu\"" -D___CONFIGURE_COMMAND="\"./configure 'CC=/pkgs/gcc-4.9.2/bin/gcc -Q -save-temps' '--enable-track-scheme' '--enable-coverage'"\" -D___OBJ_EXTENSION="\".o\"" -D___EXE_EXTENSION="\"\"" -D___BAT_EXTENSION="\"\"" -D___PRIMAL _system.c -D___LIBRARY

I get the output:

Execution times (seconds)
 phase setup             :   0.12 (100%) usr   0.00 ( 0%) sys   0.13 (100%) wall   35712 kB (100%) ggc
 TOTAL                 :   0.12             0.00             0.13              35728 kB
 btowc wctob mbrlen __signbitf __signbit __signbitl ___H__20___system ___H__23__23_type ___H__23__23_type_2d_cast ___H__23__23_subtype ___H__23__23_subtype_2d_set_21_ ___H__23__23_fixnum_3f_ ___H__23__23_subtyped_3f_ ___H__23__23_subtyped_2d_mutable_3f_ ___H__23__23_subtyped_2e_vector_3f_ ___H__23__23_subtyped_2e_symbol_3f_ ___H__23__23_subtyped_2e_flonum_3f_ ___H__23__23_subtyped_2e_bignum_3f_ ___H__23__23_special_3f_ ___H__23__23_ratnum_3f_ ___H__23__23_cpxnum_3f_ ___H__23__23_structure_3f_ ___H__23__23_values_3f_ ___H__23__23_meroon_3f_ ___H__23__23_jazz_3f_ ___H__23__23_frame_3f_ ___H__23__23_continuation_3f_ ___H__23__23_promise_3f_ ___H__23__23_return_3f_ ___H__23__23_foreign_3f_ ___H__23__23_flonum_3f_ ___H__23__23_bignum_3f_ ___H__23__23_unbound_3f_ ___H__23__23_quasi_2d_append ___H__23__23_quasi_2d_list ___H__23__23_quasi_2d_cons ___H__23__23_quasi_2d_list_2d__3e_vector ___H__23__23_quasi_2d_vector ___H__23__23_case_2d_memv ___H__23__23_eqv_3f_ ___H_eqv_3f_ ___H__23__23_eq_3f_ ___H_eq_3f_ ___H__23__23_bvector_2d_equal_3f_ ___H__23__23_equal_3f_ ___H_equal_3f_ ___H__23__23_symbol_2d_hash ___H_symbol_2d_hash ___H__23__23_keyword_2d_hash ___H_keyword_2d_hash ___H__23__23_eq_3f__2d_hash ___H_eq_3f__2d_hash ___H__23__23_eqv_3f__2d_hash ___H_eqv_3f__2d_hash ___H__23__23_equal_3f__2d_hash ___H_equal_3f__2d_hash ___H__23__23_string_3d__3f__2d_hash ___H_string_3d__3f__2d_hash ___H__23__23_string_2d_ci_3d__3f__2d_hash ___H_string_2d_ci_3d__3f__2d_hash ___H__23__23_generic_2d_hash ___H__23__23_fail_2d_check_2d_invalid_2d_hash_2d_number_2d_exception ___H_invalid_2d_hash_2d_number_2d_exception_3f_ ___H_invalid_2d_hash_2d_number_2d_exception_2d_procedure ___H_invalid_2d_hash_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_invalid_2d_hash_2d_number_2d_exception ___H__23__23_fail_2d_check_2d_unbound_2d_table_2d_key_2d_exception ___H_unbound_2d_table_2d_key_2d_exception_3f_ ___H_unbound_2d_table_2d_key_2d_exception_2d_procedure 
___H_unbound_2d_table_2d_key_2d_exception_2d_arguments ___H__23__23_raise_2d_unbound_2d_table_2d_key_2d_exception ___H__23__23_gc_2d_hash_2d_table_3f_ ___H__23__23_gc_2d_hash_2d_table_2d_ref ___H__23__23_gc_2d_hash_2d_table_2d_set_21_ ___H__23__23_gc_2d_hash_2d_table_2d_rehash_21_ ___H__23__23_smallest_2d_prime_2d_no_2d_less_2d_than ___H__23__23_gc_2d_hash_2d_table_2d_resize_21_ ___H__23__23_gc_2d_hash_2d_table_2d_allocate ___H__23__23_gc_2d_hash_2d_table_2d_for_2d_each ___H__23__23_gc_2d_hash_2d_table_2d_search ___H__23__23_gc_2d_hash_2d_table_2d_foldl ___H__23__23_mem_2d_allocated_3f_ ___H__23__23_fail_2d_check_2d_table ___H_table_3f_ ___H__23__23_make_2d_table ___H_make_2d_table ___H__23__23_table_2d_get_2d_eq_2d_gcht ___H__23__23_table_2d_get_2d_gcht_2d_not_2d_mem_2d_alloc ___H__23__23_table_2d_get_2d_gcht ___H__23__23_table_2d_length ___H_table_2d_length ___H__23__23_table_2d_access ___H__23__23_table_2d_ref ___H_table_2d_ref ___H__23__23_table_2d_resize_21_ ___H__23__23_table_2d_set_21_ ___H_table_2d_set_21_ ___H__23__23_table_2d_search ___H_table_2d_search ___H__23__23_table_2d_for_2d_each ___H_table_2d_for_2d_each ___H__23__23_table_2d_foldl ___H__23__23_table_2d__3e_list ___H_table_2d__3e_list ___H__23__23_list_2d__3e_table ___H_list_2d__3e_table ___H__23__23_table_2d_copy ___H_table_2d_copy ___H__23__23_table_2d_merge_21_ ___H_table_2d_merge_21_ ___H__23__23_table_2d_merge ___H_table_2d_merge ___H__23__23_table_2d_equal_3f_ ___H__23__23_table_2d_equal_3f__2d_hash ___H__23__23_fail_2d_check_2d_unbound_2d_serial_2d_number_2d_exception ___H_unbound_2d_serial_2d_number_2d_exception_3f_ ___H_unbound_2d_serial_2d_number_2d_exception_2d_procedure ___H_unbound_2d_serial_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_unbound_2d_serial_2d_number_2d_exception ___H__23__23_object_2d__3e_serial_2d_number ___H_object_2d__3e_serial_2d_number ___H__23__23_serial_2d_number_2d__3e_object ___H_serial_2d_number_2d__3e_object ___H__23__23_object_2d__3e_u8vector 
___H_object_2d__3e_u8vector ___H__23__23_u8vector_2d__3e_object ___H_u8vector_2d__3e_object ___setup_mod ___init_mod ____20___system
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> <visibility> <early_local_cleanups> <*free_inline_summary> <profile> <whole-program> <profile_estimate> <inline> <pure-const> <static-var>Assembling functions:
 ___setup_mod ___init_mod ___H_u8vector_2d__3e_object ___H__23__23_u8vector_2d__3e_object ___H_object_2d__3e_u8vector ___H__23__23_object_2d__3e_u8vector {GC 298137k -> 101678k} ___H_serial_2d_number_2d__3e_object ___H__23__23_serial_2d_number_2d__3e_object ___H_object_2d__3e_serial_2d_number ___H__23__23_object_2d__3e_serial_2d_number ___H__23__23_raise_2d_unbound_2d_serial_2d_number_2d_exception ___H_unbound_2d_serial_2d_number_2d_exception_2d_arguments ___H_unbound_2d_serial_2d_number_2d_exception_2d_procedure ___H_unbound_2d_serial_2d_number_2d_exception_3f_ ___H__23__23_fail_2d_check_2d_unbound_2d_serial_2d_number_2d_exception ___H__23__23_table_2d_equal_3f__2d_hash ___H__23__23_table_2d_equal_3f_ ___H_table_2d_merge ___H__23__23_table_2d_merge ___H_table_2d_merge_21_ ___H__23__23_table_2d_merge_21_ ___H_table_2d_copy ___H__23__23_table_2d_copy ___H_list_2d__3e_table ___H__23__23_list_2d__3e_table ___H_table_2d__3e_list ___H__23__23_table_2d__3e_list ___H__23__23_table_2d_foldl ___H_table_2d_for_2d_each ___H__23__23_table_2d_for_2d_each ___H_table_2d_search ___H__23__23_table_2d_search ___H_table_2d_set_21_ ___H__23__23_table_2d_resize_21_ ___H_table_2d_ref ___H__23__23_table_2d_access ___H_table_2d_length ___H__23__23_table_2d_length ___H__23__23_table_2d_get_2d_gcht ___H__23__23_table_2d_get_2d_gcht_2d_not_2d_mem_2d_alloc ___H__23__23_table_2d_get_2d_eq_2d_gcht ___H_make_2d_table ___H_table_3f_ ___H__23__23_fail_2d_check_2d_table ___H__23__23_mem_2d_allocated_3f_ ___H__23__23_gc_2d_hash_2d_table_2d_foldl ___H__23__23_gc_2d_hash_2d_table_2d_search ___H__23__23_gc_2d_hash_2d_table_2d_for_2d_each ___H__23__23_gc_2d_hash_2d_table_2d_allocate ___H__23__23_gc_2d_hash_2d_table_2d_resize_21_ ___H__23__23_smallest_2d_prime_2d_no_2d_less_2d_than ___H__23__23_gc_2d_hash_2d_table_3f_ ___H__23__23_raise_2d_unbound_2d_table_2d_key_2d_exception ___H_unbound_2d_table_2d_key_2d_exception_2d_arguments ___H_unbound_2d_table_2d_key_2d_exception_2d_procedure 
___H_unbound_2d_table_2d_key_2d_exception_3f_ ___H__23__23_fail_2d_check_2d_unbound_2d_table_2d_key_2d_exception ___H__23__23_raise_2d_invalid_2d_hash_2d_number_2d_exception ___H_invalid_2d_hash_2d_number_2d_exception_2d_arguments ___H_invalid_2d_hash_2d_number_2d_exception_2d_procedure ___H_invalid_2d_hash_2d_number_2d_exception_3f_ ___H__23__23_fail_2d_check_2d_invalid_2d_hash_2d_number_2d_exception ___H__23__23_generic_2d_hash ___H_string_2d_ci_3d__3f__2d_hash ___H_string_3d__3f__2d_hash ___H__23__23_string_3d__3f__2d_hash ___H_equal_3f__2d_hash ___H__23__23_equal_3f__2d_hash ___H_eqv_3f__2d_hash ___H__23__23_eqv_3f__2d_hash ___H_eq_3f__2d_hash ___H__23__23_eq_3f__2d_hash ___H_keyword_2d_hash ___H__23__23_keyword_2d_hash ___H_symbol_2d_hash ___H__23__23_symbol_2d_hash ___H_equal_3f_ ___H__23__23_equal_3f_ ___H__23__23_bvector_2d_equal_3f_ ___H_eq_3f_ ___H__23__23_eq_3f_ ___H_eqv_3f_ ___H__23__23_eqv_3f_ ___H__23__23_case_2d_memv ___H__23__23_quasi_2d_vector ___H__23__23_quasi_2d_list_2d__3e_vector ___H__23__23_quasi_2d_cons ___H__23__23_quasi_2d_list ___H__23__23_quasi_2d_append ___H__23__23_unbound_3f_ ___H__23__23_bignum_3f_ ___H__23__23_flonum_3f_ ___H__23__23_foreign_3f_ ___H__23__23_return_3f_ ___H__23__23_promise_3f_ ___H__23__23_continuation_3f_ ___H__23__23_frame_3f_ ___H__23__23_jazz_3f_ ___H__23__23_meroon_3f_ ___H__23__23_values_3f_ ___H__23__23_structure_3f_ ___H__23__23_cpxnum_3f_ ___H__23__23_ratnum_3f_ ___H__23__23_special_3f_ ___H__23__23_subtyped_2e_bignum_3f_ ___H__23__23_subtyped_2e_flonum_3f_ ___H__23__23_subtyped_2e_symbol_3f_ ___H__23__23_subtyped_2e_vector_3f_ ___H__23__23_subtyped_2d_mutable_3f_ ___H__23__23_subtyped_3f_ ___H__23__23_fixnum_3f_ ___H__23__23_subtype_2d_set_21_ ___H__23__23_subtype ___H__23__23_type_2d_cast ___H__23__23_type ___H__20___system ___H__23__23_gc_2d_hash_2d_table_2d_set_21_ ___H__23__23_table_2d_set_21_ ___H__23__23_gc_2d_hash_2d_table_2d_rehash_21_ ___H__23__23_table_2d_ref 
___H__23__23_gc_2d_hash_2d_table_2d_ref ___H__23__23_make_2d_table ___H__23__23_string_2d_ci_3d__3f__2d_hash ____20___system _GLOBAL__sub_I_65535_0__system.c
Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1134 kB ( 0%) ggc
 phase parsing           :   0.11 ( 0%) usr   0.12 (14%) sys   0.23 ( 1%) wall    7383 kB ( 1%) ggc
 phase opt and generate  :  35.79 (100%) usr   0.73 (86%) sys  36.55 (99%) wall  513422 kB (98%) ggc
 garbage collection      :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 dump files              :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    2337 kB ( 0%) ggc
 callgraph optimization  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     399 kB ( 0%) ggc
 ipa dead code removal   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ipa inlining heuristics :   0.01 ( 0%) usr   0.01 ( 1%) sys   0.01 ( 0%) wall    1132 kB ( 0%) ggc
 ipa profile             :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall    2688 kB ( 1%) ggc
 ipa pure const          :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 cfg construction        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     416 kB ( 0%) ggc
 cfg cleanup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      14 kB ( 0%) ggc
 trivially dead code     :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns           :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall      13 kB ( 0%) ggc
 df multiple defs        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 df live regs            :   0.29 ( 1%) usr   0.00 ( 0%) sys   0.26 ( 1%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.24 ( 1%) usr   0.01 ( 1%) sys   0.24 ( 1%) wall   12426 kB ( 2%) ggc
 register information    :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis          :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall   23934 kB ( 5%) ggc
 alias stmt walking      :   0.33 ( 1%) usr   0.01 ( 1%) sys   0.28 ( 1%) wall     609 kB ( 0%) ggc
 register scan           :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     104 kB ( 0%) ggc
 rebuild jump labels     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing           :   0.03 ( 0%) usr   0.03 ( 4%) sys   0.06 ( 0%) wall    1743 kB ( 0%) ggc
 lexical analysis        :   0.03 ( 0%) usr   0.03 ( 4%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 parser (global)         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    1477 kB ( 0%) ggc
 parser function body    :   0.04 ( 0%) usr   0.06 ( 7%) sys   0.10 ( 0%) wall    3815 kB ( 1%) ggc
 inline parameters       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      89 kB ( 0%) ggc
 tree gimplify           :   0.03 ( 0%) usr   0.01 ( 1%) sys   0.02 ( 0%) wall    5057 kB ( 1%) ggc
 tree CFG construction   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1743 kB ( 0%) ggc
 tree CFG cleanup        :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 1%) wall     300 kB ( 0%) ggc
 tree copy propagation   :   0.28 ( 1%) usr   0.00 ( 0%) sys   0.31 ( 1%) wall    3211 kB ( 1%) ggc
 tree PTA                :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     217 kB ( 0%) ggc
 tree PHI insertion      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    2191 kB ( 0%) ggc
 tree SSA rewrite        :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall   17777 kB ( 3%) ggc
 tree SSA other          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      18 kB ( 0%) ggc
 tree SSA incremental    :   0.23 ( 1%) usr   0.01 ( 1%) sys   0.26 ( 1%) wall   27481 kB ( 5%) ggc
 tree operand scan       :   0.02 ( 0%) usr   0.02 ( 2%) sys   0.05 ( 0%) wall   15630 kB ( 3%) ggc
 dominator optimization  :   0.22 ( 1%) usr   0.01 ( 1%) sys   0.22 ( 1%) wall   27417 kB ( 5%) ggc
 tree CCP                :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall     491 kB ( 0%) ggc
 tree PHI const/copy prop:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     127 kB ( 0%) ggc
 tree split crit edges   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     743 kB ( 0%) ggc
 tree reassociation      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       7 kB ( 0%) ggc
 tree FRE                :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall    2875 kB ( 1%) ggc
 tree code sinking       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     336 kB ( 0%) ggc
 tree conservative DCE   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall      99 kB ( 0%) ggc
 tree aggressive DCE     :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall      20 kB ( 0%) ggc
 tree DSE                :   2.80 ( 8%) usr   0.00 ( 0%) sys   2.80 ( 8%) wall       0 kB ( 0%) ggc
 tree loop invariant motion:   0.16 ( 0%) usr   0.03 ( 4%) sys   0.19 ( 1%) wall   64219 kB (12%) ggc
 scev constant prop      :   0.29 ( 1%) usr   0.00 ( 0%) sys   0.27 ( 1%) wall   12074 kB ( 2%) ggc
 complete unrolling      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      16 kB ( 0%) ggc
 tree iv optimization    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     932 kB ( 0%) ggc
 tree SSA uncprop        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa              :   5.90 (16%) usr   0.50 (59%) sys   6.41 (17%) wall      26 kB ( 0%) ggc
 expand vars             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     866 kB ( 0%) ggc
 expand                  :   0.39 ( 1%) usr   0.02 ( 2%) sys   0.40 ( 1%) wall   87038 kB (17%) ggc
 post expand cleanups    :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     322 kB ( 0%) ggc
 forward prop            :   0.33 ( 1%) usr   0.00 ( 0%) sys   0.33 ( 1%) wall   14733 kB ( 3%) ggc
 CSE                     :   7.53 (21%) usr   0.01 ( 1%) sys   7.53 (20%) wall   30934 kB ( 6%) ggc
 dead code elimination   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1        :   0.37 ( 1%) usr   0.01 ( 1%) sys   0.36 ( 1%) wall    7276 kB ( 1%) ggc
 dead store elim2        :   1.73 ( 5%) usr   0.00 ( 0%) sys   1.71 ( 5%) wall   18715 kB ( 4%) ggc
 loop init               :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     713 kB ( 0%) ggc
 loop invariant motion   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      27 kB ( 0%) ggc
 branch prediction       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     277 kB ( 0%) ggc
 combiner                :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall     913 kB ( 0%) ggc
 integrated RA           :   0.87 ( 2%) usr   0.00 ( 0%) sys   0.99 ( 3%) wall   48097 kB ( 9%) ggc
 LRA non-specific        :   1.61 ( 4%) usr   0.01 ( 1%) sys   1.63 ( 4%) wall   37254 kB ( 7%) ggc
 LRA virtuals elimination:   0.13 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall   15481 kB ( 3%) ggc
 LRA reload inheritance  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      11 kB ( 0%) ggc
 LRA create live ranges  :   0.32 ( 1%) usr   0.01 ( 1%) sys   0.29 ( 1%) wall    7642 kB ( 1%) ggc
 LRA hard reg assignment :   0.69 ( 2%) usr   0.01 ( 1%) sys   0.73 ( 2%) wall       0 kB ( 0%) ggc
 reload CSE regs         :   5.74 (16%) usr   0.00 ( 0%) sys   5.73 (16%) wall   12325 kB ( 2%) ggc
 thread pro- & epilogue  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     465 kB ( 0%) ggc
 combine stack adjustments:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 hard reg cprop          :   0.33 ( 1%) usr   0.00 ( 0%) sys   0.34 ( 1%) wall       4 kB ( 0%) ggc
 scheduling 2            :   1.79 ( 5%) usr   0.01 ( 1%) sys   1.77 ( 5%) wall     299 kB ( 0%) ggc
 machine dep reorg       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 shorten branches        :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall       0 kB ( 0%) ggc
 final                   :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.34 ( 1%) wall    1508 kB ( 0%) ggc
 variable output         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     146 kB ( 0%) ggc
 straight-line strength reduction:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall      16 kB ( 0%) ggc
 rest of compilation     :   0.35 ( 1%) usr   0.02 ( 2%) sys   0.37 ( 1%) wall     991 kB ( 0%) ggc
 remove unused locals    :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 address taken           :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  35.90             0.85            36.79             521957 kB

The "phase opt and generate" phase accounts for most of the CPU time and most of the RAM. With somewhat larger files, RAM usage exceeds 80 GB.

Including _system.i with this report.
Comment 1 lucier 2015-02-03 21:11:41 UTC
Created attachment 34660 [details]
Input file for bug
Comment 2 Andrew Pinski 2015-02-03 21:32:57 UTC
Note that "phase opt and generate" is a top-level timing category; the individual passes below are counted within it.
The passes that take most of the time are:
 tree DSE                :   2.80 ( 8%) usr   0.00 ( 0%) sys   2.80 ( 8%) wall       0 kB ( 0%) ggc
 out of ssa              :   5.90 (16%) usr   0.50 (59%) sys   6.41 (17%) wall      26 kB ( 0%) ggc
 CSE                     :   7.53 (21%) usr   0.01 ( 1%) sys   7.53 (20%) wall   30934 kB ( 6%) ggc
 reload CSE regs         :   5.74 (16%) usr   0.00 ( 0%) sys   5.73 (16%) wall   12325 kB ( 2%) ggc
 scheduling 2            :   1.79 ( 5%) usr   0.01 ( 1%) sys   1.77 ( 5%) wall     299 kB ( 0%) ggc
Comment 3 Andrew Pinski 2015-02-03 21:35:11 UTC
I think this is just an issue with computed goto (indirect gotos).
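For reference, "computed goto" means GNU C's labels-as-values extension: `&&label` yields the address of a label and `goto *ptr` jumps to it. The sketch below is a hypothetical, much smaller dispatch loop written only for illustration; it is not code from _system.c, and the opcode names and `run()` are made up. Because each `goto *` may reach any label whose address is taken, GCC models these jumps with abnormal control-flow edges, which can make the CFG of a large function built this way very dense.

```c
/* Hypothetical illustration of the labels-as-values ("computed goto")
   pattern. Requires GCC or Clang (GNU C extension). Not taken from
   the attached test files. */
#include <stddef.h>

/* Opcodes for a tiny, made-up accumulator machine. */
enum op { OP_INC, OP_DBL, OP_HALT };

/* Runs prog starting from acc = 0 and returns the final accumulator.
   Every "goto *dispatch[...]" is an indirect jump that may land on any
   of the address-taken labels below. */
long run(const enum op *prog)
{
    /* Static table of label addresses, indexed by opcode. */
    static void *const dispatch[] = { &&do_inc, &&do_dbl, &&do_halt };
    long acc = 0;
    size_t pc = 0;

    goto *dispatch[prog[pc]];

do_inc:
    acc += 1;
    goto *dispatch[prog[++pc]];
do_dbl:
    acc *= 2;
    goto *dispatch[prog[++pc]];
do_halt:
    return acc;
}
```

For example, running the program `{ OP_INC, OP_INC, OP_DBL, OP_INC, OP_HALT }` computes ((0 + 1 + 1) * 2) + 1 = 5.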
Comment 4 lucier 2015-02-03 21:49:20 UTC
On 02/03/2015 04:32 PM, pinskia at gcc dot gnu.org wrote:
> --- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> Note phase opt and generate is a toplevel time area.
> The passes which take most of the time are:

I'm also concerned about excessive memory usage; the passes that allocate the most memory (> 20 MB) are:

  alias analysis          :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall   23934 kB ( 5%) ggc
  tree SSA incremental    :   0.23 ( 1%) usr   0.01 ( 1%) sys   0.26 ( 1%) wall   27481 kB ( 5%) ggc
  dominator optimization  :   0.22 ( 1%) usr   0.01 ( 1%) sys   0.22 ( 1%) wall   27417 kB ( 5%) ggc
  tree loop invariant motion:   0.16 ( 0%) usr   0.03 ( 4%) sys   0.19 ( 1%) wall   64219 kB (12%) ggc
  expand                  :   0.39 ( 1%) usr   0.02 ( 2%) sys   0.40 ( 1%) wall   87038 kB (17%) ggc
  CSE                     :   7.53 (21%) usr   0.01 ( 1%) sys   7.53 (20%) wall   30934 kB ( 6%) ggc
  integrated RA           :   0.87 ( 2%) usr   0.00 ( 0%) sys   0.99 ( 3%) wall   48097 kB ( 9%) ggc
  LRA non-specific        :   1.61 ( 4%) usr   0.01 ( 1%) sys   1.63 ( 4%) wall   37254 kB ( 7%) ggc

This also affects the 4.8 branch and the mainline.
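Comments 2 and 4 pick rows out of the -ftime-report table by hand: the slowest passes by user time, and the passes allocating more than 20 MB of GC memory. A minimal sketch of doing that mechanically follows, assuming the row layout printed by GCC 4.9/5.0 in this report (later GCC releases may format the table differently, so the `sscanf` pattern would need adjusting). `parse_time_report_line`, `is_hot_pass` and `struct pass_time` are hypothetical helpers, not part of GCC.

```c
/* Hypothetical helpers for filtering GCC 4.9/5.0 -ftime-report rows. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* One parsed row of the timing table. */
struct pass_time {
    char name[64];
    double usr, sys, wall;  /* seconds */
    long ggc_kb;            /* GC-allocated memory, in kB */
};

/* Parses a row such as
   " CSE   :   7.53 (21%) usr   0.01 ( 1%) sys   7.53 (20%) wall   30934 kB ( 6%) ggc".
   Returns 1 on success, 0 if the line lacks that shape (e.g. the TOTAL row). */
int parse_time_report_line(const char *line, struct pass_time *out)
{
    const char *colon = strrchr(line, ':');
    if (colon == NULL)
        return 0;

    /* Trim whitespace around the pass name. */
    const char *start = line;
    while (start < colon && isspace((unsigned char)*start))
        start++;
    const char *end = colon;
    while (end > start && isspace((unsigned char)end[-1]))
        end--;
    size_t len = (size_t)(end - start);
    if (len == 0 || len >= sizeof out->name)
        return 0;
    memcpy(out->name, start, len);
    out->name[len] = '\0';

    /* Percentages are matched and discarded with %*d. */
    int n = sscanf(colon + 1,
                   "%lf (%*d%%) usr %lf (%*d%%) sys %lf (%*d%%) wall %ld kB",
                   &out->usr, &out->sys, &out->wall, &out->ggc_kb);
    return n == 4;
}

/* A pass is "hot" if it uses at least min_usr seconds of user time
   or allocates more than min_kb kB of GC memory. */
int is_hot_pass(const struct pass_time *p, double min_usr, long min_kb)
{
    return p->usr >= min_usr || p->ggc_kb > min_kb;
}
```

With `min_kb` set to 20000, the memory half of the filter selects the same eight rows quoted in Comment 4 from the full report in the description; the user-time threshold is left to the caller.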
Comment 5 lucier 2015-02-06 05:07:11 UTC
Created attachment 34681 [details]
_io.i.gz: larger test file

With this compiler:

firefly:~/Downloads/gambit/lib> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-devel/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release
Thread model: posix
gcc version 5.0.0 20150206 (experimental) [trunk revision 220467] (GCC) 

and the input file _io.c, I find

/pkgs/gcc-mainline/bin/gcc -Q -save-temps -Wno-unused -Wno-write-strings -O1 -fno-math-errno -fschedule-insns2 -fno-strict-aliasing -fno-trapping-math -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp  -fprofile-arcs -ftest-coverage  -I"../include" -c -o "_io.o" -I. -DHAVE_CONFIG_H -D___GAMBCDIR="\"/usr/local/Gambit-C\"" -D___SYS_TYPE_CPU="\"x86_64\"" -D___SYS_TYPE_VENDOR="\"unknown\"" -D___SYS_TYPE_OS="\"linux-gnu\"" -D___CONFIGURE_COMMAND="\"./configure 'CC=/pkgs/gcc-mainline/bin/gcc -Q -save-temps' '--enable-coverage' '--enable-track-scheme'"\" -D___OBJ_EXTENSION="\".o\"" -D___EXE_EXTENSION="\"\"" -D___BAT_EXTENSION="\"\"" -D___PRIMAL _io.c -D___LIBRARY

Execution times (seconds)
 phase setup             :   0.78 (100%) usr   0.04 (100%) sys   0.83 (100%) wall  156905 kB (100%) ggc
 TOTAL                 :   0.78             0.04             0.83             156922 kB
 btowc wctob mbrlen __signbitf __signbit __signbitl ___H__20___io ___H__23__23_fail_2d_check_2d_datum_2d_parsing_2d_exception ___H_datum_2d_parsing_2d_exception_3f_ ___H_datum_2d_parsing_2d_exception_2d_kind ___H_datum_2d_parsing_2d_exception_2d_readenv ___H_datum_2d_parsing_2d_exception_2d_parameters ___H__23__23_raise_2d_datum_2d_parsing_2d_exception ___H__23__23_fail_2d_check_2d_unterminated_2d_process_2d_exception ___H_unterminated_2d_process_2d_exception_3f_ ___H_unterminated_2d_process_2d_exception_2d_procedure ___H_unterminated_2d_process_2d_exception_2d_arguments ___H__23__23_raise_2d_unterminated_2d_process_2d_exception ___H__23__23_fail_2d_check_2d_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception ___H_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception_3f_ ___H_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception_2d_procedure ___H_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception_2d_arguments ___H__23__23_raise_2d_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception ___H__23__23_fail_2d_check_2d_no_2d_such_2d_file_2d_or_2d_directory_2d_exception ___H_no_2d_such_2d_file_2d_or_2d_directory_2d_exception_3f_ ___H_no_2d_such_2d_file_2d_or_2d_directory_2d_exception_2d_procedure ___H_no_2d_such_2d_file_2d_or_2d_directory_2d_exception_2d_arguments ___H__23__23_raise_2d_no_2d_such_2d_file_2d_or_2d_directory_2d_exception ___H__23__23_raise_2d_os_2d_io_2d_exception ___H__23__23_raise_2d_io_2d_exception ___H__23__23_fail_2d_check_2d_settings ___H__23__23_fail_2d_check_2d_exact_2d_integer_2d_or_2d_string_2d_or_2d_settings ___H__23__23_fail_2d_check_2d_string_2d_or_2d_ip_2d_address ___H__23__23_make_2d_writeenv ___H__23__23_make_2d_readenv ___H__23__23_readenv_2d_current_2d_filepos ___H__23__23_readenv_2d_relative_2d_filepos ___H__23__23_make_2d_psettings ___H__23__23_parse_2d_psettings_21_ ___H__23__23_psettings_2d__3e_roptions ___H__23__23_psettings_2d__3e_woptions ___H__23__23_psettings_2d__3e_input_2d_readtable 
___H__23__23_psettings_2d__3e_output_2d_readtable ___H__23__23_psettings_2d_options_2d__3e_options ___H__23__23_psettings_2d__3e_device_2d_flags ___H__23__23_psettings_2d__3e_permissions ___H__23__23_psettings_2d__3e_output_2d_width ___H__23__23_port_3f_ ___H_port_3f_ ___H__23__23_input_2d_port_3f_ ___H_input_2d_port_3f_ ___H__23__23_output_2d_port_3f_ ___H_output_2d_port_3f_ ___H__23__23_fail_2d_check_2d_port ___H__23__23_fail_2d_check_2d_input_2d_port ___H__23__23_fail_2d_check_2d_output_2d_port ___H__23__23_fail_2d_check_2d_character_2d_input_2d_port ___H__23__23_fail_2d_check_2d_character_2d_output_2d_port ___H__23__23_fail_2d_check_2d_byte_2d_port ___H__23__23_fail_2d_check_2d_byte_2d_input_2d_port ___H__23__23_fail_2d_check_2d_byte_2d_output_2d_port ___H__23__23_fail_2d_check_2d_device_2d_input_2d_port ___H__23__23_fail_2d_check_2d_device_2d_output_2d_port ___H__23__23_make_2d_io_2d_condvar ___H__23__23_io_2d_condvar_3f_ ___H__23__23_io_2d_condvar_2d_for_2d_writing_3f_ ___H__23__23_io_2d_condvar_2d_port ___H__23__23_io_2d_condvar_2d_port_2d_set_21_ ___H__23__23_make_2d_dummy_2d_port ___H_open_2d_dummy ___H__23__23_make_2d_device_2d_port ___H__23__23_make_2d_rdevice_2d_condvar ___H__23__23_make_2d_wdevice_2d_condvar ___H__23__23_make_2d_device_2d_port_2d_from_2d_single_2d_device ___H__23__23_close_2d_device ___H__23__23_input_2d_port_2d_byte_2d_position ___H_input_2d_port_2d_byte_2d_position ___H__23__23_output_2d_port_2d_byte_2d_position ___H_output_2d_port_2d_byte_2d_position ___H__23__23_device_2d_port_2d_wait_2d_for_2d_input_21_ ___H__23__23_device_2d_port_2d_wait_2d_for_2d_output_21_ ___H__23__23_char_2d_rbuf_2d_fill ___H__23__23_byte_2d_rbuf_2d_fill ___H__23__23_char_2d_wbuf_2d_drain_2d_no_2d_reset ___H__23__23_char_2d_wbuf_2d_drain ___H__23__23_byte_2d_wbuf_2d_drain_2d_no_2d_reset ___H__23__23_byte_2d_wbuf_2d_drain ___H__23__23_vect_2d_port_2d_options ___H__23__23_fail_2d_check_2d_vector_2d_input_2d_port 
___H__23__23_fail_2d_check_2d_vector_2d_output_2d_port ___H__23__23_fail_2d_check_2d_vector_2d_or_2d_settings ___H__23__23_subvector_2d__3e_fifo ___H__23__23_fifo_2d__3e_vector ___H__23__23_open_2d_vector_2d_generic ___H__23__23_open_2d_vector ___H_open_2d_vector ___H__23__23_make_2d_vector_2d_pipe_2d_port ___H__23__23_open_2d_vector_2d_pipe_2d_generic ___H__23__23_open_2d_vector_2d_pipe ___H_open_2d_vector_2d_pipe ___H__23__23_open_2d_input_2d_vector ___H_open_2d_input_2d_vector ___H__23__23_open_2d_output_2d_vector ___H_open_2d_output_2d_vector ___H__23__23_get_2d_output_2d_vector ___H_get_2d_output_2d_vector ___H_call_2d_with_2d_input_2d_vector ___H_call_2d_with_2d_output_2d_vector ___H_with_2d_input_2d_from_2d_vector ___H_with_2d_output_2d_to_2d_vector ___H__23__23_make_2d_vector_2d_port ___H__23__23_fail_2d_check_2d_string_2d_input_2d_port ___H__23__23_fail_2d_check_2d_string_2d_output_2d_port ___H__23__23_fail_2d_check_2d_string_2d_or_2d_settings ___H__23__23_substring_2d__3e_fifo ___H__23__23_fifo_2d__3e_string ___H__23__23_open_2d_string_2d_generic ___H__23__23_open_2d_string ___H_open_2d_string ___H__23__23_make_2d_string_2d_pipe_2d_port ___H__23__23_open_2d_string_2d_pipe_2d_generic ___H__23__23_open_2d_string_2d_pipe ___H_open_2d_string_2d_pipe ___H__23__23_open_2d_input_2d_string ___H_open_2d_input_2d_string ___H__23__23_open_2d_output_2d_string ___H_open_2d_output_2d_string ___H__23__23_get_2d_output_2d_string ___H_get_2d_output_2d_string ___H_call_2d_with_2d_input_2d_string ___H_call_2d_with_2d_output_2d_string ___H_with_2d_input_2d_from_2d_string ___H_with_2d_output_2d_to_2d_string ___H__23__23_make_2d_string_2d_port ___H__23__23_fail_2d_check_2d_u8vector_2d_input_2d_port ___H__23__23_fail_2d_check_2d_u8vector_2d_output_2d_port ___H__23__23_fail_2d_check_2d_u8vector_2d_or_2d_settings ___H__23__23_subu8vector_2d__3e_fifo ___H__23__23_fifo_2d__3e_u8vector ___H__23__23_open_2d_u8vector_2d_generic ___H__23__23_open_2d_u8vector ___H_open_2d_u8vector 
___H__23__23_make_2d_u8vector_2d_pipe_2d_port ___H__23__23_open_2d_u8vector_2d_pipe_2d_generic ___H__23__23_open_2d_u8vector_2d_pipe ___H_open_2d_u8vector_2d_pipe ___H__23__23_open_2d_input_2d_u8vector ___H_open_2d_input_2d_u8vector ___H__23__23_open_2d_output_2d_u8vector ___H_open_2d_output_2d_u8vector ___H__23__23_get_2d_output_2d_u8vector ___H_get_2d_output_2d_u8vector ___H_call_2d_with_2d_input_2d_u8vector ___H_call_2d_with_2d_output_2d_u8vector ___H_with_2d_input_2d_from_2d_u8vector ___H_with_2d_output_2d_to_2d_u8vector ___H__23__23_make_2d_u8vector_2d_port ___H__23__23_port_2d_of_2d_kind_3f_ ___H__23__23_port_2d_kind ___H__23__23_port_2d_device ___H__23__23_port_2d_name ___H__23__23_read ___H_read ___H__23__23_write_2d_generic_2d_to_2d_character_2d_port ___H__23__23_write ___H_write ___H__23__23_display ___H_display ___H__23__23_pretty_2d_print ___H_pretty_2d_print ___H__23__23_print ___H_print ___H_println ___H__23__23_newline ___H_newline ___H__23__23_flush_2d_input_2d_buffering ___H__23__23_force_2d_output ___H_force_2d_output ___H__23__23_close_2d_input_2d_port ___H_close_2d_input_2d_port ___H__23__23_close_2d_output_2d_port ___H_close_2d_output_2d_port ___H__23__23_close_2d_port ___H_close_2d_port ___H_input_2d_port_2d_readtable ___H_input_2d_port_2d_readtable_2d_set_21_ ___H_output_2d_port_2d_readtable ___H_output_2d_port_2d_readtable_2d_set_21_ ___H__23__23_input_2d_port_2d_timeout_2d_set_21_ ___H_input_2d_port_2d_timeout_2d_set_21_ ___H__23__23_output_2d_port_2d_timeout_2d_set_21_ ___H_output_2d_port_2d_timeout_2d_set_21_ ___H__23__23_port_2d_io_2d_exception_2d_handler_2d_set_21_ ___H_port_2d_io_2d_exception_2d_handler_2d_set_21_ ___H__23__23_input_2d_port_2d_char_2d_position ___H_input_2d_port_2d_char_2d_position ___H__23__23_output_2d_port_2d_char_2d_position ___H_output_2d_port_2d_char_2d_position ___H__23__23_input_2d_port_2d_line_2d_set_21_ ___H__23__23_input_2d_port_2d_line ___H_input_2d_port_2d_line 
___H__23__23_input_2d_port_2d_column_2d_set_21_ ___H__23__23_input_2d_port_2d_column ___H_input_2d_port_2d_column ___H__23__23_output_2d_port_2d_line_2d_set_21_ ___H__23__23_output_2d_port_2d_line ___H_output_2d_port_2d_line ___H__23__23_output_2d_port_2d_column_2d_set_21_ ___H__23__23_output_2d_port_2d_column ___H_output_2d_port_2d_column ___H__23__23_output_2d_port_2d_width ___H_output_2d_port_2d_width ___H__23__23_object_2d__3e_truncated_2d_string ___H__23__23_object_2d__3e_string ___H_object_2d__3e_string ___H__23__23_string_2d__3e_limited_2d_string ___H__23__23_force_2d_limited_2d_string_21_ ___H__23__23_input_2d_port_2d_characters_2d_buffered ___H_input_2d_port_2d_characters_2d_buffered ___H__23__23_char_2d_ready_3f_ ___H_char_2d_ready_3f_ ___H__23__23_peek_2d_char ___H_peek_2d_char ___H__23__23_read_2d_char ___H_read_2d_char ___H__23__23_read_2d_substring ___H_read_2d_substring ___H__23__23_read_2d_line ___H_read_2d_line ___H__23__23_read_2d_all ___H_read_2d_all ___H__23__23_read_2d_all_2d_as_2d_a_2d_begin_2d_expr_2d_from_2d_path ___H__23__23_read_2d_all_2d_as_2d_a_2d_begin_2d_expr_2d_from_2d_psettings ___H__23__23_read_2d_all_2d_as_2d_a_2d_begin_2d_expr_2d_from_2d_port ___H__23__23_write_2d_char ___H_write_2d_char ___H__23__23_write_2d_substring ___H_write_2d_substring ___H__23__23_write_2d_string ___H__23__23_input_2d_port_2d_bytes_2d_buffered ___H_input_2d_port_2d_bytes_2d_buffered ___H__23__23_read_2d_u8 ___H_read_2d_u8 ___H__23__23_read_2d_subu8vector ___H_read_2d_subu8vector ___H__23__23_write_2d_u8 ___H_write_2d_u8 ___H__23__23_write_2d_subu8vector ___H_write_2d_subu8vector ___H__23__23_options_2d_set_21_ ___H__23__23_port_2d_settings_2d_set_21_ ___H_port_2d_settings_2d_set_21_ ___H__23__23_fail_2d_check_2d_tty_2d_port ___H__23__23_tty_3f_ ___H_tty_3f_ ___H__23__23_tty_2d_type_2d_set_21_ ___H_tty_2d_type_2d_set_21_ ___H__23__23_tty_2d_text_2d_attributes_2d_set_21_ ___H_tty_2d_text_2d_attributes_2d_set_21_ ___H__23__23_tty_2d_history 
___H_tty_2d_history ___H__23__23_tty_2d_history_2d_set_21_ ___H_tty_2d_history_2d_set_21_ ___H__23__23_tty_2d_history_2d_max_2d_length_2d_set_21_ ___H_tty_2d_history_2d_max_2d_length_2d_set_21_ ___H__23__23_tty_2d_paren_2d_balance_2d_duration_2d_set_21_ ___H_tty_2d_paren_2d_balance_2d_duration_2d_set_21_ ___H__23__23_tty_2d_mode_2d_set_21_ ___H_tty_2d_mode_2d_set_21_ ___H__23__23_fail_2d_check_2d_process_2d_port ___H__23__23_make_2d_process_2d_psettings ___H__23__23_open_2d_process_2d_generic ___H__23__23_open_2d_process ___H_open_2d_process ___H__23__23_open_2d_input_2d_process ___H_open_2d_input_2d_process ___H__23__23_open_2d_output_2d_process ___H_open_2d_output_2d_process ___H_call_2d_with_2d_input_2d_process ___H_call_2d_with_2d_output_2d_process ___H_with_2d_input_2d_from_2d_process ___H_with_2d_output_2d_to_2d_process ___H__23__23_process_2d_pid ___H_process_2d_pid ___H__23__23_process_2d_status ___H_process_2d_status ___H__23__23_fail_2d_check_2d_host_2d_info ___H_host_2d_info_3f_ ___H_host_2d_info_2d_name ___H_host_2d_info_2d_aliases ___H_host_2d_info_2d_addresses ___H__23__23_host_2d_info ___H_host_2d_info ___H__23__23_host_2d_name ___H_host_2d_name ___H__23__23_string_2d_or_2d_ip_2d_address_3f_ ___H__23__23_ip_2d_address_3f_ ___H__23__23_fail_2d_check_2d_service_2d_info ___H_service_2d_info_3f_ ___H_service_2d_info_2d_name ___H_service_2d_info_2d_aliases ___H_service_2d_info_2d_port_2d_number ___H_service_2d_info_2d_protocol ___H__23__23_service_2d_info ___H_service_2d_info ___H__23__23_fail_2d_check_2d_protocol_2d_info ___H_protocol_2d_info_3f_ ___H_protocol_2d_info_2d_name ___H_protocol_2d_info_2d_aliases ___H_protocol_2d_info_2d_number ___H__23__23_protocol_2d_info ___H_protocol_2d_info ___H__23__23_fail_2d_check_2d_network_2d_info ___H_network_2d_info_3f_ ___H_network_2d_info_2d_name ___H_network_2d_info_2d_aliases ___H_network_2d_info_2d_number ___H__23__23_network_2d_info ___H_network_2d_info ___H__23__23_fail_2d_check_2d_tcp_2d_client_2d_port 
___H__23__23_make_2d_tcp_2d_psettings ___H__23__23_make_2d_tcp_2d_client_2d_port ___H__23__23_open_2d_tcp_2d_client ___H_open_2d_tcp_2d_client ___H__23__23_fail_2d_check_2d_socket_2d_info ___H_socket_2d_info_3f_ ___H_socket_2d_info_2d_family ___H_socket_2d_info_2d_port_2d_number ___H_socket_2d_info_2d_address ___H__23__23_socket_2d_info_2d_setup_21_ ___H__23__23_tcp_2d_client_2d_socket_2d_info ___H__23__23_tcp_2d_client_2d_self_2d_socket_2d_info ___H_tcp_2d_client_2d_self_2d_socket_2d_info ___H__23__23_tcp_2d_client_2d_peer_2d_socket_2d_info ___H_tcp_2d_client_2d_peer_2d_socket_2d_info ___H__23__23_fail_2d_check_2d_address_2d_info ___H_address_2d_info_3f_ ___H_address_2d_info_2d_family ___H_address_2d_info_2d_socket_2d_type ___H_address_2d_info_2d_protocol ___H_address_2d_info_2d_socket_2d_info ___H__23__23_net_2d_family_2d_encode ___H__23__23_net_2d_family_2d_decode ___H__23__23_net_2d_socket_2d_type_2d_encode ___H__23__23_net_2d_socket_2d_type_2d_decode ___H__23__23_net_2d_protocol_2d_encode ___H__23__23_net_2d_protocol_2d_decode ___H__23__23_address_2d_info_2d_setup_21_ ___H__23__23_address_2d_infos ___H_address_2d_infos ___H__23__23_fail_2d_check_2d_tcp_2d_server_2d_port ___H__23__23_make_2d_tcp_2d_server_2d_port ___H__23__23_process_2d_tcp_2d_server_2d_psettings ___H__23__23_open_2d_tcp_2d_server_2d_aux ___H__23__23_open_2d_tcp_2d_server ___H_open_2d_tcp_2d_server ___H__23__23_tcp_2d_server_2d_socket_2d_info ___H_tcp_2d_server_2d_socket_2d_info ___H__23__23_string_2d__3e_address_2d_and_2d_port_2d_number ___H__23__23_fail_2d_check_2d_directory_2d_port ___H__23__23_make_2d_directory_2d_psettings ___H__23__23_make_2d_directory_2d_port ___H__23__23_open_2d_directory ___H_open_2d_directory ___H__23__23_fail_2d_check_2d_event_2d_queue_2d_port ___H__23__23_make_2d_event_2d_queue_2d_port ___H__23__23_open_2d_event_2d_queue ___H_open_2d_event_2d_queue ___H__23__23_make_2d_path_2d_psettings ___H__23__23_make_2d_input_2d_path_2d_psettings 
___H__23__23_open_2d_file_2d_generic ___H__23__23_open_2d_file_2d_generic_2d_from_2d_psettings ___H__23__23_path_2d_reference ___H__23__23_open_2d_file ___H_open_2d_file ___H__23__23_open_2d_input_2d_file ___H_open_2d_input_2d_file ___H__23__23_open_2d_output_2d_file ___H_open_2d_output_2d_file ___H_call_2d_with_2d_input_2d_file ___H_call_2d_with_2d_output_2d_file ___H_with_2d_input_2d_from_2d_file ___H_with_2d_output_2d_to_2d_file ___H_with_2d_input_2d_from_2d_port ___H_with_2d_output_2d_to_2d_port ___H__23__23_open_2d_predefined ___H_console_2d_port ___H__23__23_open_2d_all_2d_predefined ___H__23__23_force_2d_output_2d_on_2d_predefined ___H__23__23_make_2d_filepos ___H__23__23_filepos_2d_line ___H__23__23_filepos_2d_col ___H__23__23_fail_2d_check_2d_readtable ___H__23__23_readtable_3f_ ___H_readtable_3f_ ___H__23__23_readtable_2d_copy_2d_shallow ___H__23__23_readtable_2d_copy ___H_readtable_2d_case_2d_conversion_3f_ ___H_readtable_2d_case_2d_conversion_3f__2d_set ___H_readtable_2d_keywords_2d_allowed_3f_ ___H_readtable_2d_keywords_2d_allowed_3f__2d_set ___H_readtable_2d_sharing_2d_allowed_3f_ ___H_readtable_2d_sharing_2d_allowed_3f__2d_set ___H_readtable_2d_eval_2d_allowed_3f_ ___H_readtable_2d_eval_2d_allowed_3f__2d_set ___H_readtable_2d_write_2d_extended_2d_read_2d_macros_3f_ ___H_readtable_2d_write_2d_extended_2d_read_2d_macros_3f__2d_set ___H_readtable_2d_write_2d_cdr_2d_read_2d_macros_3f_ ___H_readtable_2d_write_2d_cdr_2d_read_2d_macros_3f__2d_set ___H_readtable_2d_max_2d_write_2d_level ___H_readtable_2d_max_2d_write_2d_level_2d_set ___H_readtable_2d_max_2d_write_2d_length ___H_readtable_2d_max_2d_write_2d_length_2d_set ___H_readtable_2d_max_2d_unescaped_2d_char ___H_readtable_2d_max_2d_unescaped_2d_char_2d_set ___H_readtable_2d_comment_2d_handler ___H_readtable_2d_comment_2d_handler_2d_set ___H_readtable_2d_start_2d_syntax ___H_readtable_2d_start_2d_syntax_2d_set ___H__23__23_extract_2d_language_2d_and_2d_tail 
___H__23__23_readtable_2d_setup_2d_for_2d_language_21_ ___H__23__23_readtable_2d_setup_2d_for_2d_standard_2d_level_21_ ___H__23__23_make_2d_readtable_2d_parameter ___H__23__23_start_2d_main ___H__23__23_make_2d_marktable ___H__23__23_marktable_2d_mark_21_ ___H__23__23_marktable_2d_lookup_21_ ___H__23__23_marktable_2d_save ___H__23__23_marktable_2d_restore_21_ ___H__23__23_might_2d_write_2d_differently_3f_ ___H__23__23_default_2d_wr ___H__23__23_wr_2d_str ___H__23__23_wr_2d_substr ___H__23__23_wr_2d_ch ___H__23__23_wr_2d_filler ___H__23__23_wr_2d_spaces ___H__23__23_wr_2d_indent ___H__23__23_shifted_2d_column ___H__23__23_wr_2d_sn ___H__23__23_wr_2d_no_2d_display ___H__23__23_wr_2d_mark ___H__23__23_wr_2d_stamp ___H__23__23_wr_2d_symbol ___H__23__23_escape_2d_symbol_3f_ ___H__23__23_escape_2d_symkey_3f_ ___H__23__23_wr_2d_keyword ___H__23__23_escape_2d_keyword_3f_ ___H__23__23_wr_2d_pair ___H__23__23_print_2d_marker ___H__23__23_wr_2d_one_2d_line_2d_pretty_2d_print ___H__23__23_wr_2d_fits_2d_on_2d_line ___H__23__23_wr_2d_complex ___H__23__23_wr_2d_char ___H__23__23_wr_2d_hex ___H__23__23_wr_2d_oct ___H__23__23_wr_2d_string ___H__23__23_wr_2d_escaped_2d_string ___H__23__23_reader_2d__3e_open_2d_close ___H__23__23_head_2d__3e_open_2d_close ___H__23__23_wr_2d_vector ___H__23__23_wr_2d_vector_2d_aux1 ___H__23__23_wr_2d_vector_2d_aux2 ___H__23__23_wr_2d_vector_2d_aux3 ___H__23__23_wr_2d_foreign ___H__23__23_explode_2d_object ___H__23__23_implode_2d_object ___H__23__23_explode_2d_structure ___H__23__23_implode_2d_structure ___H__23__23_implode_2d_frame ___H__23__23_implode_2d_continuation ___H__23__23_explode_2d_procedure ___H__23__23_explode_2d_closure ___H__23__23_explode_2d_subprocedure ___H__23__23_implode_2d_procedure ___H__23__23_implode_2d_procedure_2d_or_2d_return ___H__23__23_explode_2d_return ___H__23__23_implode_2d_return ___H__23__23_wr_2d_opaque ___H__23__23_wr_2d_serialize ___H__23__23_wr_2d_s8vector ___H__23__23_wr_2d_u8vector ___H__23__23_wr_2d_s16vector 
___H__23__23_wr_2d_u16vector ___H__23__23_wr_2d_s32vector ___H__23__23_wr_2d_u32vector ___H__23__23_wr_2d_s64vector ___H__23__23_wr_2d_u64vector ___H__23__23_wr_2d_f32vector ___H__23__23_wr_2d_f64vector ___H__23__23_wr_2d_structure ___H__23__23_wr_2d_gc_2d_hash_2d_table ___H__23__23_explode_2d_gc_2d_hash_2d_table ___H__23__23_implode_2d_gc_2d_hash_2d_table ___H__23__23_wr_2d_meroon ___H__23__23_wr_2d_jazz ___H__23__23_wr_2d_frame ___H__23__23_wr_2d_continuation ___H__23__23_wr_2d_promise ___H__23__23_explode_2d_promise ___H__23__23_implode_2d_promise ___H__23__23_wr_2d_will ___H__23__23_wr_2d_procedure ___H__23__23_wr_2d_return ___H__23__23_wr_2d_box ___H__23__23_wr_2d_other ___H__23__23_eof_2d_object_3f_ ___H_eof_2d_object_3f_ ___H_transcript_2d_on ___H_transcript_2d_off ___H__23__23_make_2d_chartable ___H__23__23_chartable_2d_copy ___H__23__23_chartable_2d_ref ___H__23__23_chartable_2d_set_21_ ___H__23__23_readtable_2d_char_2d_delimiter_3f_ ___H__23__23_readtable_2d_char_2d_delimiter_3f__2d_set_21_ ___H__23__23_readtable_2d_char_2d_handler ___H__23__23_readtable_2d_char_2d_handler_2d_set_21_ ___H__23__23_readtable_2d_char_2d_sharp_2d_handler ___H__23__23_readtable_2d_char_2d_sharp_2d_handler_2d_set_21_ ___H__23__23_readtable_2d_char_2d_class_2d_set_21_ ___H__23__23_readtable_2d_convert_2d_case ___H__23__23_readtable_2d_string_2d_convert_2d_case_21_ ___H__23__23_readtable_2d_parse_2d_keyword ___H__23__23_read_2d_datum_2d_or_2d_eof ___H__23__23_read_2d_datum_2d_or_2d_label ___H__23__23_read_2d_datum_2d_or_2d_label_2d_or_2d_none ___H__23__23_read_2d_datum_2d_or_2d_label_2d_or_2d_none_2d_or_2d_dot ___H__23__23_script_2d_marker ___H__23__23_none_2d_marker ___H__23__23_dot_2d_marker ___H__23__23_label_2d_marker_3f_ ___H__23__23_label_2d_marker_2d_enter_21_ ___H__23__23_label_2d_marker_2d_reference ___H__23__23_label_2d_marker_2d_fixup_2d_handler_2d_add_21_ ___H__23__23_label_2d_marker_2d_define ___H__23__23_label_2d_marker_2d_fixup_21_ 
___H__23__23_read_2d_check_2d_labels_21_ ___H__23__23_build_2d_list ___H__23__23_read_2d_next_2d_char_2d_expecting ___H__23__23_build_2d_vector ___H__23__23_build_2d_delimited_2d_string ___H__23__23_build_2d_delimited_2d_number_2f_keyword_2f_symbol ___H__23__23_string_2d__3e_number_2f_keyword_2f_symbol ___H__23__23_char_2d_octal_3f_ ___H__23__23_char_2d_hexadecimal_3f_ ___H__23__23_build_2d_escaped_2d_string_2d_up_2d_to ___H__23__23_build_2d_decimal_2d_integer ___H__23__23_build_2d_read_2d_macro ___H__23__23_skip_2d_extended_2d_comment ___H__23__23_skip_2d_single_2d_line_2d_comment ___H__23__23_skip_2d_comment_2d_done ___H__23__23_read_2d_sharp ___H__23__23_read_2d_sharp_2d_aux ___H__23__23_read_2d_sharp_2d_vector ___H__23__23_read_2d_sharp_2d_char ___H__23__23_read_2d_sharp_2d_comment ___H__23__23_read_2d_sharp_2d_bang ___H__23__23_read_2d_sharp_2d_keyword_2f_symbol ___H__23__23_read_2d_sharp_2d_colon ___H__23__23_read_2d_sharp_2d_semicolon ___H__23__23_read_2d_sharp_2d_quotation ___H__23__23_read_2d_sharp_2d_ampersand ___H__23__23_read_2d_sharp_2d_dot ___H__23__23_read_2d_sharp_2d_less ___H__23__23_read_2d_sharp_2d_digit ___H__23__23_wrap ___H__23__23_wrap_2d_op ___H__23__23_wrap_2d_op0 ___H__23__23_wrap_2d_op1 ___H__23__23_wrap_2d_op1_2a_ ___H__23__23_wrap_2d_op2 ___H__23__23_wrap_2d_op3 ___H__23__23_wrap_2d_op4 ___H__23__23_read_2d_sharp_2d_other ___H__23__23_read_2d_whitespace ___H__23__23_read_2d_single_2d_line_2d_comment ___H__23__23_read_2d_escaped_2d_string ___H__23__23_read_2d_quotation ___H__23__23_closing_2d_parenthesis_2d_for ___H__23__23_read_2d_vector_2d_or_2d_list ___H__23__23_read_2d_list ___H__23__23_read_2d_vector ___H__23__23_read_2d_other ___H__23__23_read_2d_none ___H__23__23_read_2d_illegal ___H__23__23_read_2d_dot ___H__23__23_read_2d_number_2f_keyword_2f_symbol ___H__23__23_read_2d_assoc_2d_string_3d__3f_ ___H__23__23_read_2d_string_3d__3f_ ___H__23__23_read_2d_six ___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof ___H__23__23_six_2d_type_3f_ 
___H__23__23_make_2d_standard_2d_readtable ___setup_mod ___init_mod ____20___io
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> <visibility> <build_ssa_passes> <chkp_passes> <opt_local_passes> <free-inline-summary> <profile> <whole-program> <profile_estimate> <inline> <pure-const> <static-var> <single-use> <comdats>Assembling functions:
 ___setup_mod ___init_mod ___H__23__23_make_2d_standard_2d_readtable ___H__23__23_six_2d_type_3f_ ___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof {GC 1963188k -> 1911014k}^C
makefile:150: recipe for target '_io.o' failed
make: *** [_io.o] Interrupt


When I killed it, top was reporting:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
 8760 lucier    20   0 37.918g 0.029t    584 D   4.7 95.6  34:11.14 cc1       

(I don't remember seeing resident memory measured in terabytes before ;-)

I'm having similar problems with the 4.8 branch.  

I'm including _io.i.gz.
Comment 6 lucier 2015-02-06 05:08:26 UTC
The problem does not appear with this compiler:

maclaurin-271% gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) 

so it appears to be a regression.

Brad
Comment 7 Richard Biener 2015-02-09 14:31:29 UTC
Given the description, I suppose that non-profiling/coverage mode is fine.
Comment 8 Richard Biener 2015-02-09 15:07:38 UTC
Ok, so the memory is used by out-of-SSA it seems

#5  0x0000000000c9eebc in coalesce_ssa_name ()
    at /space/rguenther/src/svn/gcc-4_9-branch/gcc/tree-ssa-coalesce.c:1330
1330      graph = build_ssa_conflict_graph (liveinfo);
(gdb) p *cl->list.htab
$10 = {entries = 0x2b19b30, size = 524287, n_elements = 77146, n_deleted = 0, 
  searches = 122189, collisions = 6508, size_prime_index = 16}

where we malloc(!) 77146 entries of size 12.

But of course the really bad part is the conflict graph, whose 76063 bitmaps
eat up around 1GB of memory for the first testcase (function ___H__23__23_u8vector_2d__3e_object).

That's likely caused by the change to more aggressively coalesce anonymous
SSA names.
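For a rough sense of scale, the sketch below computes the raw bit storage of a fully dense n-by-n conflict matrix (one row of n bits per partition, each conflict recorded in both rows). This is an illustrative model only, not GCC's data structure: the real conflict graph keeps one sparse bitmap per partition, which approaches this size only when conflicts are dense and adds per-element overhead on top. Treating the 76063 bitmaps as one per partition is an assumption of the sketch.

```c
#include <stdint.h>
#include <stdio.h>

/* Bytes needed for a dense n-by-n conflict bit matrix: one row of n bits
   per partition, each row rounded up to whole bytes.  Illustrative model
   only; GCC's sparse bitmaps add per-element overhead on top of this.  */
uint64_t dense_conflict_bytes(uint64_t n)
{
  uint64_t row_bytes = (n + 7) / 8;
  return n * row_bytes;
}

/* Print the estimate in MiB for N partitions.  */
void report(uint64_t n)
{
  uint64_t bytes = dense_conflict_bytes(n);
  printf("%llu partitions -> %.1f MiB\n",
         (unsigned long long) n, bytes / (1024.0 * 1024.0));
}
```

For n = 76063 this gives 723,207,004 bytes, roughly 690 MiB of raw bits before any bitmap bookkeeping, the same order of magnitude as the ~1GB observed.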
Comment 9 Richard Biener 2015-02-09 15:29:14 UTC
It seems that loop invariant motion is responsible for most of the abnormals,
thus -fno-tree-loop-im restores performance.

The loop LIM detects is of this style:

  <bb 6>: (header)
  # ___fp_3(ab) = PHI <___fp_41(4), ___fp_5(21)>
  # ___r1_7(ab) = PHI <___r1_42(4), ___r1_9(21)>
  # ___r2_11(ab) = PHI <___r2_43(4), ___r3_17(21)>
  # ___r3_19(ab) = PHI <___r3_44(4), ___r3_23(21)>
  # ___r4_25 = PHI <___r4_45(4), ___r4_26(21)>
  # gotovar.17_29 = PHI <_51(4), _69(21)>
  goto gotovar.17_29;

...

  <bb 21>: (latch)
  _67 = ___pc_1 + 15;
  _68 = (void * *) _67;
  _69 = *_68;
  PROF_edge_counter_142 = __gcov0.___H_object_2d__3e_u8vector[14];
  PROF_edge_counter_143 = PROF_edge_counter_142 + 1;
  __gcov0.___H_object_2d__3e_u8vector[14] = PROF_edge_counter_143;
  goto <bb 6>;

Not sure whether we should artificially limit such loops.  LIM doesn't account
for the (compile-time) cost of needing very many PHIs when rewriting
the store-motion vars into SSA form (but it could in theory estimate that cost
by taking into account the CFG structure of the "loop").
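One way such an estimate could look, as a hypothetical sketch rather than anything LIM actually implements: when a store-motion candidate (presumably the `__gcov0...` counters here) is rewritten into an SSA register, a PHI may be needed at every block where control flow joins. Counting join blocks in the loop and multiplying by the number of candidates gives a crude upper bound that can be compared against a budget. The `loop_summary` type, both functions and the `max_phis` budget are all invented for illustration; none of them is a GCC API or `--param`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a loop: for each basic block in the
   loop we only record how many incoming edges it has.  */
struct loop_summary
{
  size_t n_blocks;
  const unsigned *n_preds;   /* n_preds[i] = predecessor count of block i */
};

/* Crude upper bound on the PHIs that rewriting N_SM_VARS store-motion
   candidates into SSA could create: at most one PHI per candidate at
   every join block (a block with two or more predecessors).  */
size_t estimate_sm_phis(const struct loop_summary *loop, size_t n_sm_vars)
{
  size_t joins = 0;
  for (size_t i = 0; i < loop->n_blocks; i++)
    if (loop->n_preds[i] >= 2)
      joins++;
  return joins * n_sm_vars;
}

/* Decide whether store motion looks affordable under a PHI budget.
   MAX_PHIS is an illustrative knob, not an existing GCC --param.  */
bool sm_affordable(const struct loop_summary *loop, size_t n_sm_vars,
                   size_t max_phis)
{
  return estimate_sm_phis(loop, n_sm_vars) <= max_phis;
}
```

In a loop whose header dispatches through a computed goto, as above, many blocks are join points, so the bound grows with both the CFG density and the number of counters being moved.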

Let's see if we can first generate a smaller testcase to illustrate the
issue.

Mine for now.
Comment 10 Jeffrey A. Law 2015-02-16 19:57:51 UTC
Might want to look at 65076 as well where phase opt and generate is taking 89% of the compile time.  Might be a better testcase to work with.
Comment 11 Richard Biener 2015-03-05 17:22:05 UTC
Ok, so it's already calculate_live_ranges that takes a lot of memory.  I have a small patch to improve that somewhat.

But what we really need is to get the "must coalesce" stuff "coalesced" with
respect to both the live and the conflict computation.  That is, map must-coalesce
SSA vars to the same partition.  That loses the SSA corruption testing, so
it might be much more controversial (silent wrong-code instead of an ICE).
Unfortunately in the testcase there are only 2750 must-coalesces but
109493 partitions participating in the coalescing (so at least 50000 want
coalesces).
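The "map must-coalesce SSA vars to the same partition" idea can be sketched with a plain union-find over SSA version numbers: merge every must-coalesce pair first, then compute liveness and conflicts once per merged partition. This is an illustration, not GCC's partition_map API; the function names are hypothetical. As noted above, pre-merging also means a conflict between two must-coalesce names (a sign of SSA corruption) would no longer be detected.

```c
#include <stddef.h>

/* Find the representative of X, halving the path as we go.  */
unsigned uf_find(unsigned *parent, unsigned x)
{
  while (parent[x] != x)
    {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
  return x;
}

/* Merge the sets containing A and B; the smaller index becomes the root
   so the result is deterministic.  */
void uf_union(unsigned *parent, unsigned a, unsigned b)
{
  unsigned ra = uf_find(parent, a), rb = uf_find(parent, b);
  if (ra < rb)
    parent[rb] = ra;
  else if (rb < ra)
    parent[ra] = rb;
}

/* Apply N_PAIRS must-coalesce pairs to N SSA names and return how many
   distinct partitions remain for the live/conflict computation.  */
size_t premerge_must_coalesce(unsigned *parent, size_t n,
                              const unsigned (*pairs)[2], size_t n_pairs)
{
  size_t remaining = 0;
  for (size_t i = 0; i < n; i++)
    parent[i] = (unsigned) i;
  for (size_t i = 0; i < n_pairs; i++)
    uf_union(parent, pairs[i][0], pairs[i][1]);
  for (size_t i = 0; i < n; i++)
    if (uf_find(parent, (unsigned) i) == i)
      remaining++;
  return remaining;
}
```

Each pair can remove at most one partition, which is why 2750 must-coalesces barely dent 109493 participating partitions.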

The good news is of course that we can simply choose _not_ to coalesce that
many variables, but only, say, the important ones.
Comment 12 Steven Bosscher 2015-03-05 23:07:38 UTC
(In reply to Richard Biener from comment #9)
> It seems that loop invariant motion is responsible for most of the abnormals,
> thus -fno-tree-loop-im restores performance.
> 
> The loop LIM detects is of style
> 
>   <bb 6>: (header)
>   # ___fp_3(ab) = PHI <___fp_41(4), ___fp_5(21)>
>   # ___r1_7(ab) = PHI <___r1_42(4), ___r1_9(21)>
>   # ___r2_11(ab) = PHI <___r2_43(4), ___r3_17(21)>
>   # ___r3_19(ab) = PHI <___r3_44(4), ___r3_23(21)>
>   # ___r4_25 = PHI <___r4_45(4), ___r4_26(21)>
>   # gotovar.17_29 = PHI <_51(4), _69(21)>
>   goto gotovar.17_29;

Perhaps disable LIM (and maybe PRE) if the CFG has a large edge/bb ratio (i.e.
dense CFG)? There's probably no benefit in such cases anyway.
Comment 13 Jeffrey A. Law 2015-03-06 00:45:04 UTC
I think we've done similar things for Brad's large testcases in the past.  You want to look at both the edge/bb density and the overall size, i.e., a high density doesn't really hurt if the total CFG is small.

See "is_too_expensive" in gcse.c for the current heuristics to avoid trying global opts on these kinds of testcases.
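A gate combining both criteria (density and overall size) might look like the sketch below. The constants are placeholders chosen for illustration and are not claimed to be the values gcse.c uses; consult is_too_expensive there for the real check, which also considers dataflow memory.

```c
#include <stdbool.h>

/* Placeholder thresholds for this sketch only.  */
#define SKETCH_EDGE_SLACK 20000L   /* lets small but dense CFGs through */
#define SKETCH_EDGES_PER_BB 4L     /* density allowed for large CFGs */

/* Return true if a global optimization should be skipped because the
   CFG is both large and dense.  The constant slack means a small
   function with a few big switches is not punished, while a large
   function must stay below a fixed edges-per-block ratio.  */
bool cfg_too_expensive(long n_blocks, long n_edges)
{
  return n_edges > SKETCH_EDGE_SLACK + n_blocks * SKETCH_EDGES_PER_BB;
}
```

Using an additive slack instead of a hard cap on the block count gives the "graceful degradation" behavior: density only starts to matter once the CFG is big enough to hurt.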
Comment 14 Richard Biener 2015-03-06 10:52:59 UTC
Note that if we fix out-of-SSA coalescing (patch in testing) then RTL CSE
explodes via DF.
Comment 15 Richard Biener 2015-03-06 12:35:00 UTC
Author: rguenth
Date: Fri Mar  6 12:34:28 2015
New Revision: 221237

URL: https://gcc.gnu.org/viewcvs?rev=221237&root=gcc&view=rev
Log:
2015-03-06  Richard Biener  <rguenther@suse.de>

	PR middle-end/64928
	* tree-ssa-live.h (struct tree_live_info_d): Add livein_obstack
	and liveout_obstack members.
	(calculate_live_on_exit): Remove.
	(calculate_live_ranges): Change declaration.
	* tree-ssa-live.c (liveness_bitmap_obstack): Remove global var.
	(new_tree_live_info): Adjust.
	(calculate_live_ranges): Delete livein when not wanted.
	(calculate_live_ranges): Do not initialize liveness_bitmap_obstack.
	Deal with partly deleted live info.
	(loe_visit_block): Remove temporary bitmap by using
	bitmap_ior_and_compl_into.
	(live_worklist): Adjust accordingly.
	(calculate_live_on_exit): Make static.
	* tree-ssa-coalesce.c (coalesce_ssa_name): Tell calculate_live_ranges
	we do not need livein.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-coalesce.c
    trunk/gcc/tree-ssa-live.c
    trunk/gcc/tree-ssa-live.h
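The loe_visit_block change relies on a fused bitmap operation, A |= B & ~C, that reports whether A changed. In the live-on-entry worklist this is presumably of the form livein(pred) |= livein(bb) & ~defs(pred), with the change flag deciding whether pred must be revisited. The dense word-array version below only illustrates why no temporary bitmap is needed; GCC's bitmap_ior_and_compl_into operates on its own sparse bitmap type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A |= B & ~C over N_WORDS 64-bit words, computed in one pass without a
   temporary for B & ~C.  Returns true if any bit of A changed, which is
   what a worklist needs to decide whether to revisit a block.  */
bool ior_and_compl_into(uint64_t *a, const uint64_t *b, const uint64_t *c,
                        size_t n_words)
{
  bool changed = false;
  for (size_t i = 0; i < n_words; i++)
    {
      uint64_t old = a[i];
      a[i] |= b[i] & ~c[i];
      changed |= (a[i] != old);
    }
  return changed;
}
```

Fusing the operations saves both the temporary's memory and a second pass over the bits, which matters when the live sets are as large as in this testcase.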
Comment 16 Richard Biener 2015-03-06 12:47:28 UTC
Created attachment 34974
Patch to limit coalescing amount

The committed patch improves peak memory usage from 7.6GB to 5.8GB for the small testcase.

The attached patch reduces memory usage from SSA coalescing further (to ~300MB)
by simply doing less coalescing.  Unfortunately the generated RTL puts a bigger load on CSE/DF and thus we need 7.6GB again (possibly one can find an optimal
--param max-out-of-ssa-coalesce-names, but that's probably highly testcase-specific).

In theory you can iterate on coalescing piecewise as well, but the overhead
for doing this might be too big (basically up to computing live/conflict
for each coalesce pair separately, taking into account previous coalesces).
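"Doing less coalescing" can be pictured as ranking candidate copies by how valuable they are to remove and keeping only the top ones. In the sketch below the struct, comparator and cap are hypothetical; the attached patch's param limits names rather than pairs, so capping pairs is a simplification, and its exact semantics are not reproduced here.

```c
#include <stddef.h>
#include <stdlib.h>

/* A candidate copy to eliminate by coalescing SSA names X and Y; a
   higher COST means the copy is more valuable to remove (for example,
   because it sits in a frequently executed block).  */
struct coalesce_pair
{
  unsigned x, y;
  int cost;
};

/* qsort comparator: highest cost first.  */
static int by_cost_desc(const void *pa, const void *pb)
{
  const struct coalesce_pair *a = pa, *b = pb;
  return (a->cost < b->cost) - (a->cost > b->cost);
}

/* Sort the candidates and return how many of them to keep: at most
   MAX_PAIRS, the most valuable first.  MAX_PAIRS is an illustrative
   stand-in for a knob like max-out-of-ssa-coalesce-names.  */
size_t limit_coalesce_list(struct coalesce_pair *pairs, size_t n,
                           size_t max_pairs)
{
  qsort(pairs, n, sizeof *pairs, by_cost_desc);
  return n < max_pairs ? n : max_pairs;
}
```

The trade-off is the one described above: every dropped pair leaves a copy in the RTL, shifting the memory cost from out-of-SSA to CSE/DF.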
Comment 17 Richard Biener 2015-03-06 12:52:54 UTC
Created attachment 34975 [details]
do not compute live/conflict for abnormal coalesces

This is the other idea of simply not computing live/conflict for abnormal coalesces we know to always succeed.  This shrinks the following live/conflict
problem for the regular coalesces by unifying some partitions.

Doesn't help this particular testcase much.
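A back-of-the-envelope reason why, under stated assumptions: take the counts from comment 11 (n = 109493 participating partitions, k = 2750 must-coalesces, each able to remove at most one partition, so m >= n - k partitions remain), and assume the live/conflict cost scales roughly with the square of the partition count. Then the remaining fraction of the work is at least

```latex
\[
\frac{m^2}{n^2} \;\ge\; \frac{(n-k)^2}{n^2}
  = \left(\frac{109493-2750}{109493}\right)^2
  = \left(\frac{106743}{109493}\right)^2
  \approx 0.950
\]
```

so pre-unifying the abnormal coalesces saves at most about 5% here, while it can matter much more when abnormal names make up a larger share of the partitions.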
Comment 18 Richard Biener 2015-03-06 13:01:09 UTC
(In reply to Richard Biener from comment #17)
> Created attachment 34975
> do not compute live/conflict for abnormal coalesces
> 
> This is the other idea of simply not computing live/conflict for abnormal
> coalesces we know to always succeed.  This shrinks the following
> live/conflict
> problem for the regular coalesces by unifying some partitions.
> 
> Doesn't help this particular testcase much.

But it fixes PR63155 ...
Comment 19 Nick Wellnhofer 2015-05-20 14:49:17 UTC
*** Bug 66209 has been marked as a duplicate of this bug. ***
Comment 20 Richard Biener 2015-06-23 08:15:43 UTC
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
Comment 21 Jakub Jelinek 2015-06-26 19:52:58 UTC
GCC 4.9.3 has been released.
Comment 22 Richard Biener 2016-08-03 11:44:52 UTC
GCC 4.9 branch is being closed
Comment 23 lucier 2017-08-19 00:48:33 UTC
I tried the mainline compiler with the smaller input file on a similar machine to the one in the original report.

I don't know whether I've configured the compiler incorrectly or something, but the problem seems worse now than when first reported.

This is the compiler:

heine:~/programs/gcc> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-pc-linux-gnu/8.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../../gcc-mainline/configure --prefix=/pkgs/gcc-mainline --enable-checking=release --enable-languages=c --disable-multilib --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 8.0.0 20170818 (experimental) [trunk revision 251188] (GCC) 

and this is the result: 

/pkgs/gcc-mainline/bin/gcc -Q -save-temps -Wno-unused -Wno-write-strings -O1 -fno-math-errno -fschedule-insns2 -fno-strict-aliasing -fno-trapping-math -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fprofile-arcs -ftest-coverage -I"../include" -c -o "_system.o" -I. -DHAVE_CONFIG_H  -D___PRIMAL _system.c -D___LIBRARY
Execution times (seconds)
 phase setup             :   0.05 (100%) usr   0.00 ( 0%) sys   0.05 (83%) wall    1425 kB (99%) ggc
 TOTAL                 :   0.05             0.00             0.06               1434 kB
 btowc wctob mbrlen __signbitf __signbit __signbitl ___H__20___system ___H__23__23_type ___H__23__23_type_2d_cast ___H__23__23_subtype ___H__23__23_subtype_2d_set_21_ ___H__23__23_fixnum_3f_ ___H__23__23_subtyped_3f_ ___H__23__23_subtyped_2d_mutable_3f_ ___H__23__23_subtyped_2e_vector_3f_ ___H__23__23_subtyped_2e_symbol_3f_ ___H__23__23_subtyped_2e_flonum_3f_ ___H__23__23_subtyped_2e_bignum_3f_ ___H__23__23_special_3f_ ___H__23__23_ratnum_3f_ ___H__23__23_cpxnum_3f_ ___H__23__23_structure_3f_ ___H__23__23_values_3f_ ___H__23__23_meroon_3f_ ___H__23__23_jazz_3f_ ___H__23__23_frame_3f_ ___H__23__23_continuation_3f_ ___H__23__23_promise_3f_ ___H__23__23_return_3f_ ___H__23__23_foreign_3f_ ___H__23__23_flonum_3f_ ___H__23__23_bignum_3f_ ___H__23__23_unbound_3f_ ___H__23__23_quasi_2d_append ___H__23__23_quasi_2d_list ___H__23__23_quasi_2d_cons ___H__23__23_quasi_2d_list_2d__3e_vector ___H__23__23_quasi_2d_vector ___H__23__23_case_2d_memv ___H__23__23_eqv_3f_ ___H_eqv_3f_ ___H__23__23_eq_3f_ ___H_eq_3f_ ___H__23__23_bvector_2d_equal_3f_ ___H__23__23_equal_3f_ ___H_equal_3f_ ___H__23__23_symbol_2d_hash ___H_symbol_2d_hash ___H__23__23_keyword_2d_hash ___H_keyword_2d_hash ___H__23__23_eq_3f__2d_hash ___H_eq_3f__2d_hash ___H__23__23_eqv_3f__2d_hash ___H_eqv_3f__2d_hash ___H__23__23_equal_3f__2d_hash ___H_equal_3f__2d_hash ___H__23__23_string_3d__3f__2d_hash ___H_string_3d__3f__2d_hash ___H__23__23_string_2d_ci_3d__3f__2d_hash ___H_string_2d_ci_3d__3f__2d_hash ___H__23__23_generic_2d_hash ___H__23__23_fail_2d_check_2d_invalid_2d_hash_2d_number_2d_exception ___H_invalid_2d_hash_2d_number_2d_exception_3f_ ___H_invalid_2d_hash_2d_number_2d_exception_2d_procedure ___H_invalid_2d_hash_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_invalid_2d_hash_2d_number_2d_exception ___H__23__23_fail_2d_check_2d_unbound_2d_table_2d_key_2d_exception ___H_unbound_2d_table_2d_key_2d_exception_3f_ ___H_unbound_2d_table_2d_key_2d_exception_2d_procedure 
___H_unbound_2d_table_2d_key_2d_exception_2d_arguments ___H__23__23_raise_2d_unbound_2d_table_2d_key_2d_exception ___H__23__23_gc_2d_hash_2d_table_3f_ ___H__23__23_gc_2d_hash_2d_table_2d_ref ___H__23__23_gc_2d_hash_2d_table_2d_set_21_ ___H__23__23_gc_2d_hash_2d_table_2d_rehash_21_ ___H__23__23_smallest_2d_prime_2d_no_2d_less_2d_than ___H__23__23_gc_2d_hash_2d_table_2d_resize_21_ ___H__23__23_gc_2d_hash_2d_table_2d_allocate ___H__23__23_gc_2d_hash_2d_table_2d_for_2d_each ___H__23__23_gc_2d_hash_2d_table_2d_search ___H__23__23_gc_2d_hash_2d_table_2d_foldl ___H__23__23_mem_2d_allocated_3f_ ___H__23__23_fail_2d_check_2d_table ___H_table_3f_ ___H__23__23_make_2d_table ___H_make_2d_table ___H__23__23_table_2d_get_2d_eq_2d_gcht ___H__23__23_table_2d_get_2d_gcht_2d_not_2d_mem_2d_alloc ___H__23__23_table_2d_get_2d_gcht ___H__23__23_table_2d_length ___H_table_2d_length ___H__23__23_table_2d_access ___H__23__23_table_2d_ref ___H_table_2d_ref ___H__23__23_table_2d_resize_21_ ___H__23__23_table_2d_set_21_ ___H_table_2d_set_21_ ___H__23__23_table_2d_search ___H_table_2d_search ___H__23__23_table_2d_for_2d_each ___H_table_2d_for_2d_each ___H__23__23_table_2d_foldl ___H__23__23_table_2d__3e_list ___H_table_2d__3e_list ___H__23__23_list_2d__3e_table ___H_list_2d__3e_table ___H__23__23_table_2d_copy ___H_table_2d_copy ___H__23__23_table_2d_merge_21_ ___H_table_2d_merge_21_ ___H__23__23_table_2d_merge ___H_table_2d_merge ___H__23__23_table_2d_equal_3f_ ___H__23__23_table_2d_equal_3f__2d_hash ___H__23__23_fail_2d_check_2d_unbound_2d_serial_2d_number_2d_exception ___H_unbound_2d_serial_2d_number_2d_exception_3f_ ___H_unbound_2d_serial_2d_number_2d_exception_2d_procedure ___H_unbound_2d_serial_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_unbound_2d_serial_2d_number_2d_exception ___H__23__23_object_2d__3e_serial_2d_number ___H_object_2d__3e_serial_2d_number ___H__23__23_serial_2d_number_2d__3e_object ___H_serial_2d_number_2d__3e_object ___H__23__23_object_2d__3e_u8vector 
___H_object_2d__3e_u8vector ___H__23__23_u8vector_2d__3e_object ___H_u8vector_2d__3e_object ___setup_mod ___init_mod ____20___system
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> <visibility> <build_ssa_passes> <opt_local_passes> <targetclone> <profile> <free-fnsummary> <whole-program> <profile_estimate> <fnsummary> <inline> <pure-const> <static-var> <single-use> <comdats>Assembling functions:
 <materialize-all-clones> <simdclone> ___H__20___system ___H__23__23_type ___H__23__23_type_2d_cast ___H__23__23_subtype ___H__23__23_subtype_2d_set_21_ ___H__23__23_fixnum_3f_ ___H__23__23_subtyped_3f_ ___H__23__23_subtyped_2d_mutable_3f_ ___H__23__23_subtyped_2e_vector_3f_ ___H__23__23_subtyped_2e_symbol_3f_ ___H__23__23_subtyped_2e_flonum_3f_ ___H__23__23_subtyped_2e_bignum_3f_ ___H__23__23_special_3f_ ___H__23__23_ratnum_3f_ ___H__23__23_cpxnum_3f_ ___H__23__23_structure_3f_ ___H__23__23_values_3f_ ___H__23__23_meroon_3f_ ___H__23__23_jazz_3f_ ___H__23__23_frame_3f_ ___H__23__23_continuation_3f_ ___H__23__23_promise_3f_ ___H__23__23_return_3f_ ___H__23__23_foreign_3f_ ___H__23__23_flonum_3f_ ___H__23__23_bignum_3f_ ___H__23__23_unbound_3f_ ___H__23__23_quasi_2d_append ___H__23__23_quasi_2d_list ___H__23__23_quasi_2d_cons ___H__23__23_quasi_2d_list_2d__3e_vector ___H__23__23_quasi_2d_vector ___H__23__23_case_2d_memv ___H__23__23_eqv_3f_ ___H_eqv_3f_ ___H__23__23_eq_3f_ ___H_eq_3f_ ___H__23__23_bvector_2d_equal_3f_ ___H__23__23_equal_3f_ ___H_equal_3f_ ___H__23__23_symbol_2d_hash ___H_symbol_2d_hash ___H__23__23_keyword_2d_hash ___H_keyword_2d_hash ___H__23__23_eq_3f__2d_hash ___H_eq_3f__2d_hash ___H__23__23_eqv_3f__2d_hash ___H_eqv_3f__2d_hash ___H__23__23_equal_3f__2d_hash ___H_equal_3f__2d_hash ___H__23__23_string_3d__3f__2d_hash ___H_string_3d__3f__2d_hash ___H_string_2d_ci_3d__3f__2d_hash ___H__23__23_generic_2d_hash ___H__23__23_fail_2d_check_2d_invalid_2d_hash_2d_number_2d_exception ___H_invalid_2d_hash_2d_number_2d_exception_3f_ ___H_invalid_2d_hash_2d_number_2d_exception_2d_procedure ___H_invalid_2d_hash_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_invalid_2d_hash_2d_number_2d_exception ___H__23__23_fail_2d_check_2d_unbound_2d_table_2d_key_2d_exception ___H_unbound_2d_table_2d_key_2d_exception_3f_ ___H_unbound_2d_table_2d_key_2d_exception_2d_procedure ___H_unbound_2d_table_2d_key_2d_exception_2d_arguments 
___H__23__23_raise_2d_unbound_2d_table_2d_key_2d_exception ___H__23__23_gc_2d_hash_2d_table_3f_ ___H__23__23_smallest_2d_prime_2d_no_2d_less_2d_than ___H__23__23_gc_2d_hash_2d_table_2d_resize_21_ ___H__23__23_gc_2d_hash_2d_table_2d_allocate ___H__23__23_gc_2d_hash_2d_table_2d_for_2d_each ___H__23__23_gc_2d_hash_2d_table_2d_search ___H__23__23_gc_2d_hash_2d_table_2d_foldl ___H__23__23_mem_2d_allocated_3f_ ___H__23__23_fail_2d_check_2d_table ___H_table_3f_ ___H_make_2d_table ___H__23__23_table_2d_get_2d_eq_2d_gcht ___H__23__23_table_2d_get_2d_gcht_2d_not_2d_mem_2d_alloc ___H__23__23_table_2d_get_2d_gcht ___H__23__23_table_2d_length ___H_table_2d_length ___H__23__23_table_2d_access ___H_table_2d_ref ___H__23__23_table_2d_resize_21_ ___H_table_2d_set_21_ ___H__23__23_table_2d_search ___H_table_2d_search ___H__23__23_table_2d_for_2d_each ___H_table_2d_for_2d_each ___H__23__23_table_2d_foldl ___H__23__23_table_2d__3e_list ___H_table_2d__3e_list ___H__23__23_list_2d__3e_table ___H_list_2d__3e_table ___H__23__23_table_2d_copy ___H_table_2d_copy ___H__23__23_table_2d_merge_21_ ___H_table_2d_merge_21_ ___H__23__23_table_2d_merge ___H_table_2d_merge ___H__23__23_table_2d_equal_3f_ ___H__23__23_table_2d_equal_3f__2d_hash ___H__23__23_fail_2d_check_2d_unbound_2d_serial_2d_number_2d_exception ___H_unbound_2d_serial_2d_number_2d_exception_3f_ ___H_unbound_2d_serial_2d_number_2d_exception_2d_procedure ___H_unbound_2d_serial_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_unbound_2d_serial_2d_number_2d_exception ___H__23__23_object_2d__3e_serial_2d_number ___H_object_2d__3e_serial_2d_number ___H__23__23_serial_2d_number_2d__3e_object ___H_serial_2d_number_2d__3e_object ___H__23__23_object_2d__3e_u8vector {GC 267350k -> 214835k} {GC 430685k -> 259602k} ___H_object_2d__3e_u8vector ___H__23__23_u8vector_2d__3e_object {GC 582086k -> 310231k} ___H_u8vector_2d__3e_object ___setup_mod ___init_mod ___H__23__23_gc_2d_hash_2d_table_2d_set_21_ ___H__23__23_table_2d_set_21_ 
___H__23__23_gc_2d_hash_2d_table_2d_rehash_21_ ___H__23__23_table_2d_ref ___H__23__23_gc_2d_hash_2d_table_2d_ref ___H__23__23_make_2d_table ___H__23__23_string_2d_ci_3d__3f__2d_hash ____20___system _GLOBAL__sub_I_00100_0__system.c _GLOBAL__sub_D_00100_1__system.c
Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1180 kB ( 0%) ggc
 phase parsing           :   0.30 ( 0%) usr   0.24 (10%) sys   0.53 ( 0%) wall   11106 kB ( 1%) ggc
 phase opt and generate  : 231.20 (100%) usr   2.26 (90%) sys 233.89 (100%) wall 1264764 kB (99%) ggc
 garbage collection      :   1.47 ( 1%) usr   0.01 ( 0%) sys   1.48 ( 1%) wall       0 kB ( 0%) ggc
 dump files              :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    5513 kB ( 0%) ggc
 ipa function summary    :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1333 kB ( 0%) ggc
 ipa dead code removal   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    2764 kB ( 0%) ggc
 ipa pure const          :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 cfg construction        :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall     463 kB ( 0%) ggc
 cfg cleanup             :   7.07 ( 3%) usr   0.00 ( 0%) sys   6.98 ( 3%) wall      19 kB ( 0%) ggc
 trivially dead code     :   0.42 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns           :   0.65 ( 0%) usr   0.00 ( 0%) sys   0.68 ( 0%) wall       5 kB ( 0%) ggc
 df multiple defs        :   3.41 ( 1%) usr   0.02 ( 1%) sys   3.41 ( 1%) wall       0 kB ( 0%) ggc
 df reaching defs        :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 df live regs            :  10.87 ( 5%) usr   0.01 ( 0%) sys  10.84 ( 5%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   5.22 ( 2%) usr   0.00 ( 0%) sys   5.22 ( 2%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   3.39 ( 1%) usr   0.01 ( 0%) sys   3.41 ( 1%) wall   23596 kB ( 2%) ggc
 register information    :   0.66 ( 0%) usr   0.00 ( 0%) sys   0.64 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis          :   1.44 ( 1%) usr   0.00 ( 0%) sys   1.42 ( 1%) wall   50694 kB ( 4%) ggc
 alias stmt walking      :  25.60 (11%) usr   0.36 (14%) sys  25.17 (11%) wall    1121 kB ( 0%) ggc
 register scan           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall      41 kB ( 0%) ggc
 rebuild jump labels     :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing           :   0.07 ( 0%) usr   0.06 ( 2%) sys   0.16 ( 0%) wall    1080 kB ( 0%) ggc
 lexical analysis        :   0.10 ( 0%) usr   0.08 ( 3%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 parser (global)         :   0.04 ( 0%) usr   0.03 ( 1%) sys   0.07 ( 0%) wall    1542 kB ( 0%) ggc
 parser struct body      :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall     324 kB ( 0%) ggc
 parser function body    :   0.09 ( 0%) usr   0.06 ( 2%) sys   0.20 ( 0%) wall    8135 kB ( 1%) ggc
 inline parameters       :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1071 kB ( 0%) ggc
 tree gimplify           :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall    5494 kB ( 0%) ggc
 tree CFG construction   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1895 kB ( 0%) ggc
 tree CFG cleanup        :   3.07 ( 1%) usr   0.00 ( 0%) sys   3.14 ( 1%) wall      78 kB ( 0%) ggc
 tree copy propagation   :   0.92 ( 0%) usr   0.00 ( 0%) sys   0.92 ( 0%) wall     194 kB ( 0%) ggc
 tree PTA                :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall     208 kB ( 0%) ggc
 tree PHI insertion      :   0.01 ( 0%) usr   0.01 ( 0%) sys   0.02 ( 0%) wall    2265 kB ( 0%) ggc
 tree SSA rewrite        :   1.30 ( 1%) usr   0.01 ( 0%) sys   1.34 ( 1%) wall   17229 kB ( 1%) ggc
 tree SSA other          :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.02 ( 0%) wall      17 kB ( 0%) ggc
 tree SSA incremental    :   2.92 ( 1%) usr   0.04 ( 2%) sys   2.96 ( 1%) wall  108528 kB ( 8%) ggc
 tree operand scan       :   0.16 ( 0%) usr   0.03 ( 1%) sys   0.10 ( 0%) wall   21599 kB ( 2%) ggc
 dominator optimization  :   3.81 ( 2%) usr   0.01 ( 0%) sys   4.65 ( 2%) wall   27533 kB ( 2%) ggc
 tree SRA                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CCP                :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     177 kB ( 0%) ggc
 tree PHI const/copy prop:   0.18 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall    5439 kB ( 0%) ggc
 tree split crit edges   :   1.38 ( 1%) usr   0.00 ( 0%) sys   1.36 ( 1%) wall   77179 kB ( 6%) ggc
 tree reassociation      :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall       8 kB ( 0%) ggc
 tree FRE                :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall    1310 kB ( 0%) ggc
 tree code sinking       :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
 tree linearize phis     :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall     131 kB ( 0%) ggc
 tree backward propagate :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate  :   2.56 ( 1%) usr   0.00 ( 0%) sys   2.64 ( 1%) wall     288 kB ( 0%) ggc
 tree conservative DCE   :   0.80 ( 0%) usr   0.02 ( 1%) sys   0.76 ( 0%) wall      84 kB ( 0%) ggc
 tree aggressive DCE     :   0.60 ( 0%) usr   0.02 ( 1%) sys   0.71 ( 0%) wall    2225 kB ( 0%) ggc
 tree DSE                :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall       8 kB ( 0%) ggc
 tree loop invariant motion:  40.96 (18%) usr   0.27 (11%) sys  41.41 (18%) wall  209802 kB (16%) ggc
 tree canonical iv       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      17 kB ( 0%) ggc
 scev constant prop      :   1.40 ( 1%) usr   0.01 ( 0%) sys   1.42 ( 1%) wall   19981 kB ( 2%) ggc
 tree iv optimization    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     296 kB ( 0%) ggc
 tree SSA uncprop        :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers     :   0.55 ( 0%) usr   0.01 ( 0%) sys   0.54 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation   :   5.36 ( 2%) usr   0.01 ( 0%) sys   5.27 ( 2%) wall       0 kB ( 0%) ggc
 out of ssa              :  26.58 (11%) usr   0.96 (38%) sys  27.56 (12%) wall    4461 kB ( 0%) ggc
 expand vars             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     999 kB ( 0%) ggc
 expand                  :   4.32 ( 2%) usr   0.12 ( 5%) sys   4.47 ( 2%) wall  184816 kB (14%) ggc
 post expand cleanups    :   0.76 ( 0%) usr   0.00 ( 0%) sys   0.77 ( 0%) wall     337 kB ( 0%) ggc
 forward prop            :   2.92 ( 1%) usr   0.01 ( 0%) sys   3.00 ( 1%) wall   14617 kB ( 1%) ggc
 CSE                     :   1.98 ( 1%) usr   0.03 ( 1%) sys   2.06 ( 1%) wall   16860 kB ( 1%) ggc
 dead code elimination   :   0.86 ( 0%) usr   0.00 ( 0%) sys   0.84 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1        :   2.43 ( 1%) usr   0.00 ( 0%) sys   2.43 ( 1%) wall   11087 kB ( 1%) ggc
 dead store elim2        :   3.04 ( 1%) usr   0.00 ( 0%) sys   3.03 ( 1%) wall   35846 kB ( 3%) ggc
 loop analysis           :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 loop init               :   2.44 ( 1%) usr   0.00 ( 0%) sys   2.52 ( 1%) wall    1031 kB ( 0%) ggc
 loop invariant motion   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall     224 kB ( 0%) ggc
 loop fini               :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 branch prediction       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     268 kB ( 0%) ggc
 combiner                :   1.49 ( 1%) usr   0.01 ( 0%) sys   1.47 ( 1%) wall    4746 kB ( 0%) ggc
 if-conversion           :   2.70 ( 1%) usr   0.00 ( 0%) sys   2.73 ( 1%) wall   46824 kB ( 4%) ggc
 integrated RA           :   9.59 ( 4%) usr   0.03 ( 1%) sys   9.69 ( 4%) wall  164161 kB (13%) ggc
 LRA non-specific        :  11.22 ( 5%) usr   0.05 ( 2%) sys  11.20 ( 5%) wall   52521 kB ( 4%) ggc
 LRA virtuals elimination:   1.67 ( 1%) usr   0.05 ( 2%) sys   1.71 ( 1%) wall   30963 kB ( 2%) ggc
 LRA reload inheritance  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall      10 kB ( 0%) ggc
 LRA create live ranges  :  14.05 ( 6%) usr   0.00 ( 0%) sys  14.07 ( 6%) wall    4517 kB ( 0%) ggc
 LRA hard reg assignment :   0.87 ( 0%) usr   0.00 ( 0%) sys   0.91 ( 0%) wall       0 kB ( 0%) ggc
 LRA coalesce pseudo regs:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 reload                  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 reload CSE regs         :   1.79 ( 1%) usr   0.01 ( 0%) sys   1.87 ( 1%) wall   27472 kB ( 2%) ggc
 thread pro- & epilogue  :   0.67 ( 0%) usr   0.00 ( 0%) sys   0.67 ( 0%) wall     521 kB ( 0%) ggc
 if-conversion 2         :   0.42 ( 0%) usr   0.00 ( 0%) sys   0.42 ( 0%) wall       0 kB ( 0%) ggc
 combine stack adjustments:   0.22 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall       0 kB ( 0%) ggc
 hard reg cprop          :   0.48 ( 0%) usr   0.04 ( 2%) sys   0.55 ( 0%) wall       3 kB ( 0%) ggc
 scheduling 2            :   4.38 ( 2%) usr   0.03 ( 1%) sys   4.43 ( 2%) wall    4136 kB ( 0%) ggc
 machine dep reorg       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 reorder blocks          :   1.56 ( 1%) usr   0.00 ( 0%) sys   1.57 ( 1%) wall    8368 kB ( 1%) ggc
 shorten branches        :   0.49 ( 0%) usr   0.00 ( 0%) sys   0.49 ( 0%) wall       0 kB ( 0%) ggc
 final                   :   1.40 ( 1%) usr   0.03 ( 1%) sys   1.45 ( 1%) wall   60062 kB ( 5%) ggc
 variable output         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     142 kB ( 0%) ggc
 straight-line strength reduction:   0.33 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall      30 kB ( 0%) ggc
 initialize rtl          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      12 kB ( 0%) ggc
 rest of compilation     :   2.60 ( 1%) usr   0.02 ( 1%) sys   2.60 ( 1%) wall     621 kB ( 0%) ggc
 remove unused locals    :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall       0 kB ( 0%) ggc
 repair loop structures  :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 231.50             2.50           234.43            1277059 kB
Comment 24 Richard Biener 2017-08-21 14:19:51 UTC
I think we made LIM more powerful and/or changed coalescing. We didn't really address LIM's lack of a cost model or the quadratic memory requirement of out-of-SSA coalescing (but I see RTL passes bumping up to 5GB of memory use as well, REE for example, as also noted elsewhere).
Comment 25 Jakub Jelinek 2017-10-10 13:25:57 UTC
GCC 5 branch is being closed
Comment 26 Jakub Jelinek 2018-10-26 10:09:17 UTC
GCC 6 branch is being closed
Comment 27 Richard Biener 2018-11-02 09:40:51 UTC
Btw, on trunk for the small testcase the main peak memory user is

Bitmaps                                                 Leak            Peak            Times  N searches Search iter      Type
--------------------------------------------------------------------------------------------------------------------------------------------
...
tree-ssa-live.c:931 (new_tree_live_info)         4089900520: 42.6% 4089900600 102257849: 11.3%       35539       42909      heap
tree-ssa-live.c:932 (new_tree_live_info)         4099840160: 42.7% 4099840200 103153730: 11.4%      326917       98706      heap
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                            9592285400                  906070505

that's livein/liveout.  SSA conflicts are probably similar but harder
to decipher from the stats:

tree-ssa-coalesce.c:586 (ssa_conflicts_add_one)       43056:  0.0%    198672    398160:  0.0%       19205       39415      heap

next top is

df-problems.c:4400 (df_md_alloc)                  218129480:  2.3% 218146320   5654706:  0.6%       71264      127594      heap
df-problems.c:4401 (df_md_alloc)                  218142960:  2.3% 218159920   5640467:  0.6%       71675      127395      heap
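The live-in/live-out figures above can be put in perspective with a deliberately simplified model. Assuming one dense bit vector per basic block for each of the two sets, with one bit per out-of-SSA partition, storage is proportional to blocks × partitions, so doubling the code (which roughly doubles both) quadruples it. GCC's bitmaps are sparse, so real numbers differ; `dense_live_bytes` is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dense model of live-in/live-out storage: two sets per basic
   block, each a bit vector with one bit per partition (rounded up to bytes).
   This is NOT how GCC stores them; it only shows the blocks x partitions
   scaling.  */
static uint64_t
dense_live_bytes (uint64_t n_blocks, uint64_t n_partitions)
{
  uint64_t bytes_per_set = (n_partitions + 7) / 8;
  assert (n_blocks > 0);
  return 2 * n_blocks * bytes_per_set;
}
```

For example, 1000 blocks with 1000 partitions give 250000 bytes, while 2000 of each give 1000000 bytes: four times as much for twice the code.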
Comment 28 Richard Biener 2019-11-14 07:50:14 UTC
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
Comment 29 Jakub Jelinek 2020-03-04 09:32:16 UTC
GCC 8.4.0 has been released, adjusting target milestone.
Comment 30 lucier 2020-09-29 00:14:26 UTC
I'm coming back to this project.

I naively thought, "Well, I don't need arc profiling, I'll just set -ftest-coverage without -fprofile-arcs", but it appears that I can't do that: the gcda files are generated by -fprofile-arcs.

It seems to me that test coverage could be implemented simply by instrumenting each basic block in an algorithm that's linear in the number of basic blocks.  Is it possible to do this?
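For concreteness, the per-block scheme proposed here amounts to something like the hand-instrumented function below. The `bb_hits` array, the block names and `sign_of` are hypothetical, and this is not what `-fprofile-arcs` actually emits. It only illustrates one counter per basic block, where a block counts as covered once its counter is non-zero.

```c
#include <assert.h>

/* Hypothetical manual instrumentation: one counter per basic block.  */
enum { BB_ENTRY, BB_NEG, BB_NONNEG, BB_EXIT, BB_COUNT };
static unsigned long bb_hits[BB_COUNT];

static int
sign_of (int x)
{
  int r;
  bb_hits[BB_ENTRY]++;          /* entry block */
  if (x < 0)
    {
      bb_hits[BB_NEG]++;        /* "then" block */
      r = -1;
    }
  else
    {
      bb_hits[BB_NONNEG]++;     /* "else" block */
      r = x > 0;
    }
  bb_hits[BB_EXIT]++;           /* join/exit block */
  return r;
}
```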

Brad
Comment 31 Richard Biener 2020-09-29 07:09:11 UTC
(In reply to lucier from comment #30)
> I'm coming back to this project.
> 
> I naively thought "Well, I don't need arc profiling, I'll just set
> -ftest-coverage without -fprofile-arcs" but it appears that I can't do that,
> the gcda files are generated by -fprofile-arcs.
> 
> It seems to me that test coverage could be implemented simply by
> instrumenting each basic block in an algorithm that's linear in the number
> of basic blocks.  Is it possible to do this?
> 
> Brad

I don't think the instrumentation itself is the problem - it's already
doing better than one counter per block.  It's simply that the large
source runs into multiple non-linearities in core pieces of the compiler
that cannot be turned off ...
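The claim that the instrumentation already needs fewer than one counter per block matches the classic spanning-tree technique for edge profiling. With a fake exit-to-entry edge added, counts on edges of a spanning tree of the CFG can be recovered from the remaining edges by flow conservation, so only edges off the tree need run-time counters. The sketch below counts those edges with union-find. The `cfg_edge` type and `count_edge_counters` are hypothetical names; this is an illustration of the idea, not GCC's profile instrumentation code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CFG edge; blocks are numbered 0 .. n_blocks-1.  */
struct cfg_edge { int src, dst; };

/* Union-find root lookup with path halving.  */
static int
find_root (int *parent, int x)
{
  while (parent[x] != x)
    {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
  return x;
}

/* Greedily builds an (undirected) spanning tree in edge order.  Edges that
   would close a cycle get needs_counter[i] = 1; tree edges get 0 because their
   counts follow from flow conservation.  Returns the number of counters, or
   -1 on allocation failure.  List fake edges first so they land on the tree. */
static int
count_edge_counters (int n_blocks, const struct cfg_edge *edges,
                     int n_edges, unsigned char *needs_counter)
{
  int counters = 0;
  int *parent = malloc ((size_t) n_blocks * sizeof *parent);
  if (parent == NULL)
    return -1;
  for (int b = 0; b < n_blocks; b++)
    parent[b] = b;
  for (int i = 0; i < n_edges; i++)
    {
      int a = find_root (parent, edges[i].src);
      int b = find_root (parent, edges[i].dst);
      if (a != b)
        {
          parent[a] = b;          /* tree edge: derivable */
          needs_counter[i] = 0;
        }
      else
        {
          needs_counter[i] = 1;   /* off-tree edge: instrument */
          counters++;
        }
    }
  free (parent);
  return counters;
}
```

For an if/else diamond (entry 0, condition 1, arms 2 and 3, join 4, exit 5) with the fake edge 5->0 listed first, only 2 of the 7 edges need counters, compared with 6 counters for one per block.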
Comment 32 lucier 2020-09-29 12:17:59 UTC
I don't know precisely what you're saying, but it compiles fine without the instrumentation.
Comment 33 rguenther@suse.de 2020-09-29 13:06:17 UTC
On Tue, 29 Sep 2020, lucier at math dot purdue.edu wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
> 
> --- Comment #32 from lucier at math dot purdue.edu ---
> I don't know precisely what you're saying, but it compiles fine without the
> instrumentation.

Yes, the instrumentation does complicate the IL, but the instrumentation itself
should already be better than linear in the number of blocks.
Comment 34 lucier 2021-03-10 02:10:43 UTC
I decided to approach this a bit more methodically by generating a series of synthetic programs, each twice as long as the previous one, and measuring the compilation time of each. I'll attach the associated .i files here.

Each .i file was generated from a Scheme file with 2^k copies (k = 1, ..., 5) of a simple recursive definition of the Fibonacci function, suitably renamed. So these are not large files by my standards.
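The Scheme sources and the Gambit-generated C are not reproduced here. As a rough plain-C analogue of the same doubling methodology, a generator like the one below writes 2^k renamed copies of a naive recursive Fibonacci function, so each step doubles the code size. `emit_fib_copies` and the emitted code are hypothetical, and plain C like this will not necessarily produce the same CFG shape as Gambit's output.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical generator: writes 2^k copies of a recursive fib, renamed
   fib_0 .. fib_{2^k - 1}, to OUT.  Returns the number of functions written. */
static int
emit_fib_copies (FILE *out, int k)
{
  int n;
  assert (k >= 0 && k < 20);
  n = 1 << k;
  for (int i = 0; i < n; i++)
    fprintf (out,
             "long fib_%d (long x)\n"
             "{\n"
             "  return x < 2 ? x : fib_%d (x - 1) + fib_%d (x - 2);\n"
             "}\n\n",
             i, i, i);
  return n;
}
```

Calling it with k = 1, ..., 5 on separate files would give a doubling series comparable in spirit to fib-1.c through fib-5.c.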

The short summary is that CPU time seems to grow quadratically with the length of the code. The required memory grows very quickly, too: I killed the compilation with k=5 (so 32 copies of the Fibonacci function) because it filled 32GB of RAM and 32GB of swap.
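To put numbers on the growth, one can take the usr times for "phase opt and generate" reported below for fib-1, fib-2 and fib-3 (0.67 s, 2.78 s and 13.80 s) and compute the ratio between successive runs. Since each step doubles the input, a ratio near 2 suggests linear growth and a ratio near 4 quadratic growth. `doubling_ratio` is a hypothetical helper for this arithmetic.

```c
#include <assert.h>

/* Ratio of timings for an input and an input twice its size.
   About 2 suggests linear growth, about 4 quadratic, about 8 cubic.  */
static double
doubling_ratio (double t_small, double t_doubled)
{
  assert (t_small > 0.0);
  return t_doubled / t_small;
}
```

With the timings above the ratios come out at about 4.15 (fib-1 to fib-2) and 4.96 (fib-2 to fib-3), consistent with growth that is at least quadratic.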

Perhaps these parameterized input files will be of help.

Brad

I downloaded the git sources for gcc:

heine:~/programs/gcc/gcc-mainline> git log
commit 7eef9a66018e23677058fec421229e3fa435a1a3 (HEAD -> master, origin/master, origin/HEAD)
Author: Joel Brobecker <brobecker@adacore.com>
Date:   Mon Mar 8 23:59:37 2021 -0300

I configured and built gcc with

heine:~/programs/gcc/gcc-mainline> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-pc-linux-gnu/11.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../../gcc-mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.1 20210309 (experimental) (GCC) 

The programs are named fib-1.c to fib-5.c; fib-k.c contains 2^k copies of fibonacci.

/pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY  -O1     -Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer -fPIC -fno-common -mpc64   -rdynamic -shared  -D___SINGLE_HOST -D___DYNAMIC -I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-1.o1' -Q -fprofile-arcs -ftest-coverage -save-temps   'fib-1.c' 

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.02 (100%)   0.00 (  0%)   0.03 (100%)  5039k (100%)
 TOTAL                              :   0.02          0.00          0.03         5049k
 btowc wctob mbrlen ___H_fib_2d_1 ___setup_mod ___init_mod ___LNK_fib_2d_1_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> {heap 1240k} <visibility> {heap 1240k} <build_ssa_passes> {heap 1240k} <opt_local_passes> {heap 1240k} <remove_symbols> {heap 2468k} <targetclone> {heap 2468k} <profile> {heap 2468k} <free-fnsummary> {heap 2468k}Streaming LTO
 <whole-program> {heap 2468k} <profile_estimate> {heap 2468k} <fnsummary> {heap 2468k} <inline> {heap 2468k} <pure-const> {heap 2468k} <modref> {heap 2468k} <free-fnsummary> {heap 2468k} <static-var> {heap 2468k} <single-use> {heap 2468k} <comdats> {heap 2468k}Assembling functions:
 <simdclone> {heap 2468k} ___setup_mod ___init_mod ___H_fib_2d_1 ___LNK_fib_2d_1_2e_o1 _sub_I_00100_0 _sub_D_00100_1
Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  1%)  1519k (  6%)
 phase parsing                      :   0.06 (  8%)   0.01 ( 20%)   0.08 ( 10%)  2072k (  8%)
 phase opt and generate             :   0.67 ( 92%)   0.04 ( 80%)   0.70 ( 89%)    22M ( 86%)
 dump files                         :   0.01 (  1%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 callgraph functions expansion      :   0.66 ( 90%)   0.03 ( 60%)   0.69 ( 87%)    21M ( 82%)
 callgraph ipa passes               :   0.01 (  1%)   0.00 (  0%)   0.01 (  1%)   570k (  2%)
 cfg cleanup                        :   0.00 (  0%)   0.00 (  0%)   0.04 (  5%)    64  (  0%)
 trivially dead code                :   0.00 (  0%)   0.01 ( 20%)   0.00 (  0%)     0  (  0%)
 df live regs                       :   0.01 (  1%)   0.00 (  0%)   0.02 (  3%)     0  (  0%)
 df live&initialized regs           :   0.02 (  3%)   0.00 (  0%)   0.02 (  3%)     0  (  0%)
 df reg dead/unused notes           :   0.02 (  3%)   0.00 (  0%)   0.01 (  1%)   305k (  1%)
 alias analysis                     :   0.01 (  1%)   0.00 (  0%)   0.01 (  1%)  1482k (  6%)
 alias stmt walking                 :   0.02 (  3%)   0.01 ( 20%)   0.02 (  3%)  7280  (  0%)
 rebuild jump labels                :   0.01 (  1%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 preprocessing                      :   0.02 (  3%)   0.00 (  0%)   0.01 (  1%)   240k (  1%)
 lexical analysis                   :   0.02 (  3%)   0.01 ( 20%)   0.00 (  0%)     0  (  0%)
 parser (global)                    :   0.01 (  1%)   0.00 (  0%)   0.04 (  5%)  1239k (  5%)
 parser struct body                 :   0.01 (  1%)   0.00 (  0%)   0.01 (  1%)   359k (  1%)
 parser function body               :   0.00 (  0%)   0.00 (  0%)   0.02 (  3%)   201k (  1%)
 tree gimplify                      :   0.00 (  0%)   0.01 ( 20%)   0.00 (  0%)   297k (  1%)
 tree copy propagation              :   0.01 (  1%)   0.00 (  0%)   0.01 (  1%)    13k (  0%)
 tree SSA rewrite                   :   0.00 (  0%)   0.00 (  0%)   0.01 (  1%)   356k (  1%)
 tree SSA incremental               :   0.01 (  1%)   0.00 (  0%)   0.00 (  0%)  2918k ( 11%)
 tree operand scan                  :   0.01 (  1%)   0.00 (  0%)   0.00 (  0%)   314k (  1%)
 dominator optimization             :   0.03 (  4%)   0.01 ( 20%)   0.04 (  5%)   531k (  2%)
 tree FRE                           :   0.00 (  0%)   0.00 (  0%)   0.01 (  1%)    36k (  0%)
 tree forward propagate             :   0.02 (  3%)   0.00 (  0%)   0.00 (  0%)    34k (  0%)
 tree conservative DCE              :   0.00 (  0%)   0.00 (  0%)   0.01 (  1%)  6224  (  0%)
 tree DSE                           :   0.03 (  4%)   0.00 (  0%)   0.04 (  5%)     0  (  0%)
 tree loop invariant motion         :   0.01 (  1%)   0.00 (  0%)   0.03 (  4%)  2496k (  9%)
 tree strlen optimization           :   0.01 (  1%)   0.00 (  0%)   0.01 (  1%)    83k (  0%)
 dominance computation              :   0.02 (  3%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 out of ssa                         :   0.03 (  4%)   0.00 (  0%)   0.02 (  3%)    64k (  0%)
 expand                             :   0.00 (  0%)   0.00 (  0%)   0.01 (  1%)  2473k (  9%)
 forward prop                       :   0.02 (  3%)   0.00 (  0%)   0.02 (  3%)    81k (  0%)
 CSE                                :   0.01 (  1%)   0.00 (  0%)   0.00 (  0%)   211k (  1%)
 dead store elim2                   :   0.01 (  1%)   0.00 (  0%)   0.02 (  3%)   701k (  3%)
 loop init                          :   0.01 (  1%)   0.00 (  0%)   0.00 (  0%)    29k (  0%)
 loop fini                          :   0.00 (  0%)   0.00 (  0%)   0.01 (  1%)   116k (  0%)
 combiner                           :   0.01 (  1%)   0.00 (  0%)   0.01 (  1%)   108k (  0%)
 if-conversion                      :   0.02 (  3%)   0.00 (  0%)   0.00 (  0%)   666k (  3%)
 integrated RA                      :   0.06 (  8%)   0.00 (  0%)   0.05 (  6%)  3986k ( 15%)
 LRA non-specific                   :   0.05 (  7%)   0.00 (  0%)   0.06 (  8%)  1324k (  5%)
 LRA reload inheritance             :   0.01 (  1%)   0.00 (  0%)   0.01 (  1%)   224  (  0%)
 LRA create live ranges             :   0.09 ( 12%)   0.00 (  0%)   0.08 ( 10%)   241k (  1%)
 LRA hard reg assignment            :   0.02 (  3%)   0.00 (  0%)   0.02 (  3%)     0  (  0%)
 reload CSE regs                    :   0.02 (  3%)   0.00 (  0%)   0.02 (  3%)   368k (  1%)
 thread pro- & epilogue             :   0.01 (  1%)   0.00 (  0%)   0.00 (  0%)    10k (  0%)
 hard reg cprop                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  1%)   288  (  0%)
 scheduling 2                       :   0.04 (  5%)   0.00 (  0%)   0.04 (  5%)   149k (  1%)
 shorten branches                   :   0.00 (  0%)   0.00 (  0%)   0.01 (  1%)     0  (  0%)
 final                              :   0.01 (  1%)   0.00 (  0%)   0.00 (  0%)   816k (  3%)
 initialize rtl                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  1%)    12k (  0%)
 rest of compilation                :   0.00 (  0%)   0.00 (  0%)   0.02 (  3%)    66k (  0%)
 TOTAL                              :   0.73          0.05          0.79           25M


/pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY  -O1     -Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer -fPIC -fno-common -mpc64   -rdynamic -shared  -D___SINGLE_HOST -D___DYNAMIC -I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-2.o1' -Q -fprofile-arcs -ftest-coverage -save-temps   'fib-2.c' 

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.01 (100%)   0.02 (100%)   0.04 (100%)  7596k (100%)
 TOTAL                              :   0.01          0.02          0.04         7606k
 btowc wctob mbrlen ___H_fib_2d_2 ___setup_mod ___init_mod ___LNK_fib_2d_2_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> {heap 1432k} <visibility> {heap 1432k} <build_ssa_passes> {heap 1432k} <opt_local_passes> {heap 1432k} <remove_symbols> {heap 3104k} <targetclone> {heap 3104k} <profile> {heap 3104k} <free-fnsummary> {heap 3104k}Streaming LTO
 <whole-program> {heap 3104k} <profile_estimate> {heap 3104k} <fnsummary> {heap 3104k} <inline> {heap 3104k} <pure-const> {heap 3104k} <modref> {heap 3104k} <free-fnsummary> {heap 3104k} <static-var> {heap 3104k} <single-use> {heap 3104k} <comdats> {heap 3104k}Assembling functions:
 <simdclone> {heap 3104k} ___setup_mod ___init_mod ___H_fib_2d_2 ___LNK_fib_2d_2_2e_o1 _sub_I_00100_0 _sub_D_00100_1
Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  1519k (  2%)
 phase parsing                      :   0.04 (  1%)   0.05 ( 36%)   0.10 (  3%)  2500k (  4%)
 phase opt and generate             :   2.78 ( 99%)   0.09 ( 64%)   2.88 ( 97%)    62M ( 94%)
 callgraph construction             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    26k (  0%)
 callgraph functions expansion      :   2.75 ( 98%)   0.09 ( 64%)   2.85 ( 96%)    61M ( 92%)
 callgraph ipa passes               :   0.02 (  1%)   0.00 (  0%)   0.02 (  1%)   939k (  1%)
 ipa pure const                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 cfg cleanup                        :   0.04 (  1%)   0.00 (  0%)   0.04 (  1%)    64  (  0%)
 trivially dead code                :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 df scan insns                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   288  (  0%)
 df reaching defs                   :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 df live regs                       :   0.07 (  2%)   0.00 (  0%)   0.10 (  3%)     0  (  0%)
 df live&initialized regs           :   0.08 (  3%)   0.00 (  0%)   0.07 (  2%)     0  (  0%)
 df reg dead/unused notes           :   0.05 (  2%)   0.01 (  7%)   0.06 (  2%)   935k (  1%)
 register information               :   0.04 (  1%)   0.00 (  0%)   0.03 (  1%)     0  (  0%)
 alias analysis                     :   0.02 (  1%)   0.00 (  0%)   0.00 (  0%)  2960k (  4%)
 alias stmt walking                 :   0.13 (  5%)   0.02 ( 14%)   0.10 (  3%)  7472  (  0%)
 rebuild jump labels                :   0.01 (  0%)   0.00 (  0%)   0.03 (  1%)     0  (  0%)
 preprocessing                      :   0.00 (  0%)   0.03 ( 21%)   0.03 (  1%)   250k (  0%)
 lexical analysis                   :   0.02 (  1%)   0.02 ( 14%)   0.06 (  2%)     0  (  0%)
 parser (global)                    :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  1252k (  2%)
 parser struct body                 :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   359k (  1%)
 parser function body               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   608k (  1%)
 inline parameters                  :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    39k (  0%)
 tree gimplify                      :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   505k (  1%)
 tree CFG cleanup                   :   0.02 (  1%)   0.01 (  7%)   0.02 (  1%)   320k (  0%)
 tree copy propagation              :   0.04 (  1%)   0.00 (  0%)   0.05 (  2%)    24k (  0%)
 tree PTA                           :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    13k (  0%)
 tree SSA rewrite                   :   0.02 (  1%)   0.00 (  0%)   0.02 (  1%)   605k (  1%)
 tree SSA incremental               :   0.05 (  2%)   0.00 (  0%)   0.06 (  2%)  9895k ( 14%)
 tree operand scan                  :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)   882k (  1%)
 dominator optimization             :   0.13 (  5%)   0.00 (  0%)   0.16 (  5%)  1261k (  2%)
 tree split crit edges              :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)  1410k (  2%)
 tree reassociation                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    48  (  0%)
 tree code sinking                  :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  1680k (  2%)
 tree forward propagate             :   0.01 (  0%)   0.00 (  0%)   0.02 (  1%)    63k (  0%)
 tree conservative DCE              :   0.02 (  1%)   0.00 (  0%)   0.02 (  1%)  8288  (  0%)
 tree aggressive DCE                :   0.03 (  1%)   0.00 (  0%)   0.02 (  1%)    40  (  0%)
 tree DSE                           :   0.11 (  4%)   0.00 (  0%)   0.12 (  4%)     0  (  0%)
 tree loop invariant motion         :   0.09 (  3%)   0.01 (  7%)   0.09 (  3%)  7961k ( 12%)
 tree iv optimization               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    22k (  0%)
 tree SSA uncprop                   :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 tree strlen optimization           :   0.02 (  1%)   0.00 (  0%)   0.02 (  1%)   149k (  0%)
 tree modref                        :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)  2800  (  0%)
 dominance computation              :   0.02 (  1%)   0.00 (  0%)   0.05 (  2%)     0  (  0%)
 out of ssa                         :   0.11 (  4%)   0.01 (  7%)   0.13 (  4%)   752  (  0%)
 expand                             :   0.03 (  1%)   0.00 (  0%)   0.02 (  1%)  7567k ( 11%)
 post expand cleanups               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    49k (  0%)
 varconst                           :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  1024  (  0%)
 forward prop                       :   0.09 (  3%)   0.00 (  0%)   0.09 (  3%)   255k (  0%)
 CSE                                :   0.02 (  1%)   0.00 (  0%)   0.02 (  1%)   659k (  1%)
 dead code elimination              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 dead store elim1                   :   0.02 (  1%)   0.00 (  0%)   0.03 (  1%)   467k (  1%)
 dead store elim2                   :   0.04 (  1%)   0.00 (  0%)   0.03 (  1%)  2157k (  3%)
 loop init                          :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    36k (  0%)
 loop fini                          :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)   352k (  1%)
 combiner                           :   0.02 (  1%)   0.00 (  0%)   0.02 (  1%)   260k (  0%)
 if-conversion                      :   0.03 (  1%)   0.00 (  0%)   0.04 (  1%)  2511k (  4%)
 integrated RA                      :   0.21 (  7%)   0.01 (  7%)   0.22 (  7%)  9272k ( 14%)
 LRA non-specific                   :   0.18 (  6%)   0.01 (  7%)   0.16 (  5%)  4240k (  6%)
 LRA virtuals elimination           :   0.03 (  1%)   0.00 (  0%)   0.02 (  1%)  1264k (  2%)
 LRA reload inheritance             :   0.04 (  1%)   0.00 (  0%)   0.04 (  1%)     0  (  0%)
 LRA create live ranges             :   0.41 ( 15%)   0.00 (  0%)   0.44 ( 15%)   757k (  1%)
 LRA hard reg assignment            :   0.08 (  3%)   0.01 (  7%)   0.09 (  3%)     0  (  0%)
 reload CSE regs                    :   0.05 (  2%)   0.00 (  0%)   0.05 (  2%)  1113k (  2%)
 thread pro- & epilogue             :   0.02 (  1%)   0.00 (  0%)   0.02 (  1%)    10k (  0%)
 if-conversion 2                    :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 combine stack adjustments          :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 hard reg cprop                     :   0.02 (  1%)   0.00 (  0%)   0.02 (  1%)   432  (  0%)
 scheduling 2                       :   0.11 (  4%)   0.00 (  0%)   0.12 (  4%)   457k (  1%)
 reorder blocks                     :   0.02 (  1%)   0.00 (  0%)   0.01 (  0%)   370k (  1%)
 shorten branches                   :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 final                              :   0.03 (  1%)   0.00 (  0%)   0.03 (  1%)  2482k (  4%)
 straight-line strength reduction   :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)  4440  (  0%)
 rest of compilation                :   0.08 (  3%)   0.00 (  0%)   0.03 (  1%)   179k (  0%)
 remove unused locals               :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 repair loop structures             :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 TOTAL                              :   2.82          0.14          2.98           66M

/pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY  -O1     -Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer -fPIC -fno-common -mpc64   -rdynamic -shared  -D___SINGLE_HOST -D___DYNAMIC -I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-3.o1' -Q -fprofile-arcs -ftest-coverage -save-temps   'fib-3.c' 

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.04 (100%)   0.00 (  0%)   0.04 (100%)  8613k (100%)
 TOTAL                              :   0.04          0.00          0.04         8624k
 btowc wctob mbrlen ___H_fib_2d_3 ___setup_mod ___init_mod ___LNK_fib_2d_3_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> {heap 1436k} <visibility> {heap 1436k} <build_ssa_passes> {heap 1436k} <opt_local_passes> {heap 1436k} <remove_symbols> {heap 3060k} <targetclone> {heap 3060k} <profile> {heap 3060k} <free-fnsummary> {heap 3060k}Streaming LTO
 <whole-program> {heap 3060k} <profile_estimate> {heap 3060k} <fnsummary> {heap 3060k} <inline> {heap 3060k} <pure-const> {heap 3060k} <modref> {heap 3060k} <free-fnsummary> {heap 3060k} <static-var> {heap 3060k} <single-use> {heap 3060k} <comdats> {heap 3060k}Assembling functions:
 <simdclone> {heap 3060k} ___setup_mod ___init_mod ___H_fib_2d_3 ___LNK_fib_2d_3_2e_o1 _sub_I_00100_0 _sub_D_00100_1
Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  1519k (  1%)
 phase parsing                      :   0.09 (  1%)   0.05 ( 11%)   0.14 (  1%)  2845k (  1%)
 phase opt and generate             :  13.80 ( 99%)   0.42 ( 89%)  14.22 ( 99%)   220M ( 98%)
 callgraph functions expansion      :  13.76 ( 99%)   0.42 ( 89%)  14.17 ( 99%)   216M ( 97%)
 callgraph ipa passes               :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)  1687k (  1%)
 ipa function summary               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   176k (  0%)
 ipa profile                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   300k (  0%)
 ipa pure const                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 cfg construction                   :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)    82k (  0%)
 cfg cleanup                        :   0.20 (  1%)   0.01 (  2%)   0.19 (  1%)    64  (  0%)
 trivially dead code                :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 df scan insns                      :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)   288  (  0%)
 df reaching defs                   :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 df live regs                       :   0.37 (  3%)   0.00 (  0%)   0.40 (  3%)     0  (  0%)
 df live&initialized regs           :   0.37 (  3%)   0.01 (  2%)   0.38 (  3%)     0  (  0%)
 df reg dead/unused notes           :   0.17 (  1%)   0.01 (  2%)   0.18 (  1%)  3229k (  1%)
 register information               :   0.15 (  1%)   0.00 (  0%)   0.17 (  1%)     0  (  0%)
 alias analysis                     :   0.07 (  1%)   0.00 (  0%)   0.05 (  0%)    11M (  5%)
 alias stmt walking                 :   1.02 (  7%)   0.02 (  4%)   0.93 (  6%)  7856  (  0%)
 rebuild jump labels                :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 preprocessing                      :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)   268k (  0%)
 lexical analysis                   :   0.04 (  0%)   0.02 (  4%)   0.03 (  0%)     0  (  0%)
 parser (global)                    :   0.00 (  0%)   0.01 (  2%)   0.03 (  0%)  1275k (  1%)
 parser struct body                 :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   359k (  0%)
 parser function body               :   0.01 (  0%)   0.02 (  4%)   0.04 (  0%)   911k (  0%)
 tree gimplify                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   937k (  0%)
 tree CFG cleanup                   :   0.11 (  1%)   0.00 (  0%)   0.14 (  1%)  1373k (  1%)
 tree copy propagation              :   0.17 (  1%)   0.00 (  0%)   0.17 (  1%)    48k (  0%)
 tree PTA                           :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    23k (  0%)
 tree SSA rewrite                   :   0.13 (  1%)   0.00 (  0%)   0.13 (  1%)  1877k (  1%)
 tree SSA other                     :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   952  (  0%)
 tree SSA incremental               :   0.24 (  2%)   0.01 (  2%)   0.24 (  2%)    34M ( 15%)
 tree operand scan                  :   0.01 (  0%)   0.02 (  4%)   0.03 (  0%)  2882k (  1%)
 dominator optimization             :   0.43 (  3%)   0.01 (  2%)   0.58 (  4%)  4002k (  2%)
 tree CCP                           :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    47k (  0%)
 tree split crit edges              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  5019k (  2%)
 tree reassociation                 :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    48  (  0%)
 tree FRE                           :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)   110k (  0%)
 tree code sinking                  :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)  6070k (  3%)
 tree linearize phis                :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  6432  (  0%)
 tree forward propagate             :   0.20 (  1%)   0.02 (  4%)   0.21 (  1%)   119k (  0%)
 tree conservative DCE              :   0.06 (  0%)   0.00 (  0%)   0.05 (  0%)    16k (  0%)
 tree aggressive DCE                :   0.08 (  1%)   0.00 (  0%)   0.07 (  0%)    40  (  0%)
 tree DSE                           :   0.47 (  3%)   0.00 (  0%)   0.47 (  3%)     0  (  0%)
 tree loop invariant motion         :   0.61 (  4%)   0.04 (  9%)   0.65 (  5%)    27M ( 12%)
 complete unrolling                 :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   544  (  0%)
 tree iv optimization               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    47k (  0%)
 tree SSA uncprop                   :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 tree strlen optimization           :   0.09 (  1%)   0.00 (  0%)   0.10 (  1%)   281k (  0%)
 tree modref                        :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)  2800  (  0%)
 dominance computation              :   0.16 (  1%)   0.00 (  0%)   0.14 (  1%)     0  (  0%)
 out of ssa                         :   0.72 (  5%)   0.12 ( 26%)   0.85 (  6%)   512k (  0%)
 expand                             :   0.10 (  1%)   0.02 (  4%)   0.11 (  1%)    25M ( 11%)
 post expand cleanups               :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)    89k (  0%)
 forward prop                       :   0.35 (  3%)   0.01 (  2%)   0.35 (  2%)   888k (  0%)
 CSE                                :   0.10 (  1%)   0.00 (  0%)   0.11 (  1%)  2302k (  1%)
 dead code elimination              :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 dead store elim1                   :   0.08 (  1%)   0.00 (  0%)   0.09 (  1%)  1532k (  1%)
 dead store elim2                   :   0.13 (  1%)   0.00 (  0%)   0.14 (  1%)  7464k (  3%)
 loop init                          :   0.08 (  1%)   0.00 (  0%)   0.11 (  1%)    50k (  0%)
 loop invariant motion              :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)    58k (  0%)
 loop fini                          :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)   928k (  0%)
 combiner                           :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)   736k (  0%)
 if-conversion                      :   0.10 (  1%)   0.00 (  0%)   0.09 (  1%)  9292k (  4%)
 integrated RA                      :   1.16 (  8%)   0.01 (  2%)   1.15 (  8%)    37M ( 17%)
 LRA non-specific                   :   0.93 (  7%)   0.01 (  2%)   0.95 (  7%)    10M (  5%)
 LRA virtuals elimination           :   0.06 (  0%)   0.00 (  0%)   0.07 (  0%)  4366k (  2%)
 LRA reload inheritance             :   0.23 (  2%)   0.00 (  0%)   0.23 (  2%)     0  (  0%)
 LRA create live ranges             :   2.41 ( 17%)   0.00 (  0%)   2.41 ( 17%)  2648k (  1%)
 LRA hard reg assignment            :   0.78 (  6%)   0.02 (  4%)   0.78 (  5%)     0  (  0%)
 reload                             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   144  (  0%)
 reload CSE regs                    :   0.16 (  1%)   0.01 (  2%)   0.16 (  1%)  3807k (  2%)
 thread pro- & epilogue             :   0.06 (  0%)   0.00 (  0%)   0.05 (  0%)    10k (  0%)
 if-conversion 2                    :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 combine stack adjustments          :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 hard reg cprop                     :   0.07 (  1%)   0.02 (  4%)   0.08 (  1%)   720  (  0%)
 scheduling 2                       :   0.36 (  3%)   0.01 (  2%)   0.35 (  2%)  1590k (  1%)
 machine dep reorg                  :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 reorder blocks                     :   0.06 (  0%)   0.00 (  0%)   0.05 (  0%)  1180k (  1%)
 shorten branches                   :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 final                              :   0.07 (  1%)   0.01 (  2%)   0.08 (  1%)  8569k (  4%)
 straight-line strength reduction   :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)  8232  (  0%)
 rest of compilation                :   0.13 (  1%)   0.03 (  6%)   0.18 (  1%)   342k (  0%)
 remove unused locals               :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 address taken                      :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 TOTAL                              :  13.89          0.47         14.36          224M

/pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY  -O1     -Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer -fPIC -fno-common -mpc64   -rdynamic -shared  -D___SINGLE_HOST -D___DYNAMIC -I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-4.o1' -Q -fprofile-arcs -ftest-coverage -save-temps   'fib-4.c' 

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.05 (100%)   0.00 (  0%)   0.06 (100%)    10M (100%)
 TOTAL                              :   0.05          0.00          0.06           10M
 btowc wctob mbrlen ___H_fib_2d_4 ___setup_mod ___init_mod ___LNK_fib_2d_4_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> {heap 1652k} <visibility> {heap 1652k} <build_ssa_passes> {heap 1652k} <opt_local_passes> {heap 1652k} <remove_symbols> {heap 4168k} <targetclone> {heap 4168k} <profile> {heap 4168k} <free-fnsummary> {heap 4168k}Streaming LTO
 <whole-program> {heap 4168k} <profile_estimate> {heap 4168k} <fnsummary> {heap 4168k} <inline> {heap 4168k} <pure-const> {heap 4168k} <modref> {heap 4168k} <free-fnsummary> {heap 4168k} <static-var> {heap 4168k} <single-use> {heap 4168k} <comdats> {heap 4168k}Assembling functions:
 <simdclone> {heap 4168k} ___setup_mod ___init_mod ___H_fib_2d_4 {GC madv_dontneed 556k} {GC 264M -> 260M} {GC madv_dontneed 116k} {GC 526M -> 302M} ___LNK_fib_2d_4_2e_o1 _sub_I_00100_0 _sub_D_00100_1
Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  1519k (  0%)
 phase parsing                      :   0.16 (  0%)   0.08 (  3%)   0.23 (  0%)  4049k (  1%)
 phase lang. deferred               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    96  (  0%)
 phase opt and generate             :  55.79 (100%)   2.22 ( 97%)  58.03 (100%)   712M ( 99%)
 garbage collection                 :   0.38 (  1%)   0.00 (  0%)   0.38 (  1%)     0  (  0%)
 dump files                         :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 callgraph construction             :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  1108k (  0%)
 callgraph optimization             :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    19k (  0%)
 callgraph functions expansion      :  55.71 (100%)   2.21 ( 96%)  57.94 ( 99%)   706M ( 98%)
 callgraph ipa passes               :   0.07 (  0%)   0.01 (  0%)   0.09 (  0%)  3221k (  0%)
 ipa function summary               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   335k (  0%)
 ipa inlining heuristics            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    16  (  0%)
 ipa profile                        :   0.00 (  0%)   0.01 (  0%)   0.01 (  0%)   605k (  0%)
 ipa pure const                     :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 cfg construction                   :   0.06 (  0%)   0.00 (  0%)   0.05 (  0%)   159k (  0%)
 cfg cleanup                        :   0.68 (  1%)   0.02 (  1%)   0.69 (  1%)    48  (  0%)
 trivially dead code                :   0.11 (  0%)   0.00 (  0%)   0.11 (  0%)     0  (  0%)
 df scan insns                      :   0.09 (  0%)   0.01 (  0%)   0.11 (  0%)   288  (  0%)
 df live regs                       :   1.30 (  2%)   0.04 (  2%)   1.36 (  2%)     0  (  0%)
 df live&initialized regs           :   1.52 (  3%)   0.03 (  1%)   1.56 (  3%)     0  (  0%)
 df reg dead/unused notes           :   0.52 (  1%)   0.01 (  0%)   0.54 (  1%)    11M (  2%)
 register information               :   0.34 (  1%)   0.00 (  0%)   0.34 (  1%)     0  (  0%)
 alias analysis                     :   0.20 (  0%)   0.00 (  0%)   0.20 (  0%)    26M (  4%)
 alias stmt walking                 :   7.31 ( 13%)   0.11 (  5%)   7.32 ( 13%)  8624  (  0%)
 register scan                      :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)  9008  (  0%)
 rebuild jump labels                :   0.07 (  0%)   0.00 (  0%)   0.05 (  0%)     0  (  0%)
 preprocessing                      :   0.02 (  0%)   0.02 (  1%)   0.07 (  0%)   306k (  0%)
 lexical analysis                   :   0.06 (  0%)   0.03 (  1%)   0.10 (  0%)     0  (  0%)
 parser (global)                    :   0.03 (  0%)   0.02 (  1%)   0.02 (  0%)  1323k (  0%)
 parser function body               :   0.05 (  0%)   0.01 (  0%)   0.05 (  0%)  2029k (  0%)
 inline parameters                  :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   131k (  0%)
 tree gimplify                      :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  1802k (  0%)
 tree CFG construction              :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   578k (  0%)
 tree CFG cleanup                   :   0.41 (  1%)   0.00 (  0%)   0.42 (  1%)  5686k (  1%)
 tree copy propagation              :   0.68 (  1%)   0.00 (  0%)   0.67 (  1%)    96k (  0%)
 tree PTA                           :   0.01 (  0%)   0.01 (  0%)   0.02 (  0%)    43k (  0%)
 tree PHI insertion                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   866k (  0%)
 tree SSA rewrite                   :   0.57 (  1%)   0.00 (  0%)   0.57 (  1%)    10M (  1%)
 tree SSA incremental               :   1.15 (  2%)   0.05 (  2%)   1.20 (  2%)   118M ( 16%)
 tree operand scan                  :   0.10 (  0%)   0.06 (  3%)   0.25 (  0%)    10M (  1%)
 dominator optimization             :   3.64 (  7%)   0.04 (  2%)   3.82 (  7%)    13M (  2%)
 tree CCP                           :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    94k (  0%)
 tree split crit edges              :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)    18M (  3%)
 tree reassociation                 :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)    48  (  0%)
 tree FRE                           :   0.01 (  0%)   0.00 (  0%)   0.03 (  0%)   208k (  0%)
 tree code sinking                  :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)    18M (  3%)
 tree linearize phis                :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)  6432  (  0%)
 tree backward propagate            :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 tree forward propagate             :   1.65 (  3%)   0.01 (  0%)   1.66 (  3%)   232k (  0%)
 tree conservative DCE              :   0.29 (  1%)   0.00 (  0%)   0.29 (  0%)    31k (  0%)
 tree aggressive DCE                :   0.30 (  1%)   0.00 (  0%)   0.24 (  0%)    40  (  0%)
 tree DSE                           :   1.88 (  3%)   0.00 (  0%)   1.89 (  3%)     0  (  0%)
 tree loop invariant motion         :   5.00 (  9%)   0.15 (  7%)   5.10 (  9%)   103M ( 14%)
 tree iv optimization               :   0.01 (  0%)   0.01 (  0%)   0.02 (  0%)    95k (  0%)
 tree SSA uncprop                   :   0.13 (  0%)   0.00 (  0%)   0.15 (  0%)     0  (  0%)
 tree strlen optimization           :   0.62 (  1%)   0.00 (  0%)   0.62 (  1%)   547k (  0%)
 tree modref                        :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)  2800  (  0%)
 dominance frontiers                :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 dominance computation              :   0.58 (  1%)   0.02 (  1%)   0.59 (  1%)     0  (  0%)
 out of ssa                         :   5.62 ( 10%)   1.11 ( 48%)   6.73 ( 12%)  2049k (  0%)
 expand vars                        :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   407k (  0%)
 expand                             :   0.39 (  1%)   0.01 (  0%)   0.42 (  1%)    92M ( 13%)
 post expand cleanups               :   0.12 (  0%)   0.00 (  0%)   0.13 (  0%)   169k (  0%)
 lower subreg                       :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 forward prop                       :   1.25 (  2%)   0.05 (  2%)   1.29 (  2%)  3301k (  0%)
 CSE                                :   0.28 (  1%)   0.00 (  0%)   0.27 (  0%)  8571k (  1%)
 dead code elimination              :   0.08 (  0%)   0.00 (  0%)   0.08 (  0%)     0  (  0%)
 dead store elim1                   :   0.32 (  1%)   0.00 (  0%)   0.32 (  1%)  5493k (  1%)
 dead store elim2                   :   0.41 (  1%)   0.00 (  0%)   0.43 (  1%)    23M (  3%)
 loop analysis                      :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 loop init                          :   0.20 (  0%)   0.00 (  0%)   0.21 (  0%)    62k (  0%)
 loop fini                          :   0.07 (  0%)   0.02 (  1%)   0.10 (  0%)  3776k (  1%)
 combiner                           :   0.22 (  0%)   0.00 (  0%)   0.22 (  0%)  2378k (  0%)
 if-conversion                      :   0.38 (  1%)   0.01 (  0%)   0.37 (  1%)    36M (  5%)
 integrated RA                      :   5.43 ( 10%)   0.02 (  1%)   5.44 (  9%)    96M ( 13%)
 LRA non-specific                   :   3.61 (  6%)   0.01 (  0%)   3.64 (  6%)    21M (  3%)
 LRA virtuals elimination           :   0.18 (  0%)   0.01 (  0%)   0.16 (  0%)    15M (  2%)
 LRA create live ranges             :   3.08 (  6%)   0.01 (  0%)   3.09 (  5%)  2027k (  0%)
 LRA hard reg assignment            :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)     0  (  0%)
 reload                             :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)   144  (  0%)
 reload CSE regs                    :   0.51 (  1%)   0.00 (  0%)   0.51 (  1%)    13M (  2%)
 thread pro- & epilogue             :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)  9680  (  0%)
 if-conversion 2                    :   0.05 (  0%)   0.00 (  0%)   0.02 (  0%)    24  (  0%)
 combine stack adjustments          :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 hard reg cprop                     :   0.21 (  0%)   0.10 (  4%)   0.31 (  1%)  3288  (  0%)
 scheduling 2                       :   1.36 (  2%)   0.04 (  2%)   1.38 (  2%)  5904k (  1%)
 machine dep reorg                  :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 reorder blocks                     :   0.19 (  0%)   0.00 (  0%)   0.23 (  0%)  4176k (  1%)
 shorten branches                   :   0.14 (  0%)   0.00 (  0%)   0.14 (  0%)     0  (  0%)
 final                              :   0.27 (  0%)   0.01 (  0%)   0.29 (  0%)    31M (  4%)
 straight-line strength reduction   :   0.10 (  0%)   0.00 (  0%)   0.10 (  0%)    33k (  0%)
 rest of compilation                :   0.93 (  2%)   0.24 ( 10%)   1.15 (  2%)  1158k (  0%)
 remove unused locals               :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)     0  (  0%)
 address taken                      :   0.09 (  0%)   0.00 (  0%)   0.09 (  0%)     0  (  0%)
 repair loop structures             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 TOTAL                              :  55.95          2.30         58.28          718M


heine:~/programs/gambit/gambit-profiled> /pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY  -O1     -Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer -fPIC -fno-common -mpc64   -rdynamic -shared  -D___SINGLE_HOST -D___DYNAMIC -I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-5.o1' -Q -fprofile-arcs -ftest-coverage -save-temps   'fib-5.c' 

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.08 (100%)   0.02 (100%)   0.13 ( 93%)    22M (100%)
 TOTAL                              :   0.08          0.02          0.14           22M
 btowc wctob mbrlen ___H_fib_2d_5 ___setup_mod ___init_mod ___LNK_fib_2d_5_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> {heap 2884k} <visibility> {heap 2884k} <build_ssa_passes> {heap 2884k} <opt_local_passes> {heap 3032k} <remove_symbols> {heap 7436k} <targetclone> {heap 7436k} <profile> {heap 7436k} <free-fnsummary> {heap 7436k}Streaming LTO
 <whole-program> {heap 7436k} <profile_estimate> {heap 7436k} <fnsummary> {heap 7436k} <inline> {heap 7436k} <pure-const> {heap 7436k} <modref> {heap 7436k} <free-fnsummary> {heap 7436k} <static-var> {heap 7436k} <single-use> {heap 7436k} <comdats> {heap 7436k}Assembling functions:
 <simdclone> {heap 7436k} ___setup_mod ___init_mod ___H_fib_2d_5
gcc: fatal error: Killed signal terminated program cc1
compilation terminated.
Comment 35 lucier 2021-03-10 02:13:13 UTC
Created attachment 50345
Parametrized input files for test coverage testing.

These are the .i files that go with my previous comment.
Comment 36 Richard Biener 2021-03-10 09:47:18 UTC
So the issue is still the same. One thing I noticed is that store motion also
adds a flag for each counter update to avoid introducing store data races.
-fallow-store-data-races mitigates that part and speeds up compilation quite a bit. If threads are involved you'd want -fprofile-update=atomic instead,
which causes store motion to give up, and the compile time is then great overall.
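
A minimal C sketch of the flag-based store motion described above, assuming a single counter that is updated conditionally inside a loop. The names `counter`, `original`, `store_motion_with_flag` and `store_motion_unconditional` are hypothetical, and this is an illustration of the idea, not GCC's actual transformed output. In this scheme each moved counter needs its own flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one gcov arc counter. */
static long counter;

/* Source form: the counter is written only on iterations where cond[i] != 0. */
static void original(const int *cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      counter++;
}

/* Store motion that must not introduce a store data race: the running value
   is kept in a local, and a flag records whether the loop would have stored,
   so memory is written after the loop only in that case. */
static void store_motion_with_flag(const int *cond, int n)
{
  long tmp = counter;
  bool stored = false;
  for (int i = 0; i < n; i++)
    if (cond[i]) {
      tmp++;
      stored = true;
    }
  if (stored)
    counter = tmp;
}

/* When introducing store data races is allowed (what -fallow-store-data-races
   permits), the flag can be dropped and the value stored back unconditionally. */
static void store_motion_unconditional(const int *cond, int n)
{
  long tmp = counter;
  for (int i = 0; i < n; i++)
    if (cond[i])
      tmp++;
  counter = tmp;
}
```

Run single-threaded, all three functions leave `counter` with the same value; they differ only in whether memory is written on a path where the original loop performed no store at all.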

The original trigger of the regression is likely the marking of the profile
counters as not aliased. We might want to introduce another flag to
indicate that store data races are not a concern for the particular decl
(maybe even with a user-visible attribute for this).

Otherwise re-confirmed (I stripped options down to -O -fPIC -fprofile-arcs -ftest-coverage):

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-2.o1-fib-2.i
1.84user 0.05system 0:01.90elapsed 99%CPU (0avgtext+0avgdata 160764maxresident)k
0inputs+0outputs (0major+58129minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-3.o1-fib-3.i 
10.15user 0.17system 0:10.32elapsed 99%CPU (0avgtext+0avgdata 726688maxresident)k
0inputs+0outputs (0major+265008minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-4.o1-fib-4.i 
43.60user 1.06system 0:44.68elapsed 99%CPU (0avgtext+0avgdata 6107260maxresident)k
0inputs+0outputs (0major+1765217minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i 
gcc: fatal error: Killed signal terminated program cc1
compilation terminated.
Command exited with non-zero status 1
143.09user 3.93system 2:28.29elapsed 99%CPU (0avgtext+0avgdata 24636148maxresident)k
37504inputs+0outputs (31major+6133278minor)pagefaults 0swaps

On the last one, which runs out of memory, adding -fallow-store-data-races gives

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fallow-store-data-races
123.06user 0.45system 2:03.59elapsed 99%CPU (0avgtext+0avgdata 1777700maxresident)k
57304inputs+0outputs (68major+535127minor)pagefaults 0swaps

and -fprofile-update=atomic

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fprofile-update=atomic 
0.61user 0.02system 0:00.63elapsed 100%CPU (0avgtext+0avgdata 73236maxresident)k
72inputs+0outputs (0major+18284minor)pagefaults 0swaps

and -fno-tree-loop-im

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fno-tree-loop-im      
1.06user 0.01system 0:01.07elapsed 99%CPU (0avgtext+0avgdata 90672maxresident)k
0inputs+0outputs (0major+24331minor)pagefaults 0swaps

I still wonder if you can produce an even smaller testcase where visualizing
the CFG is possible.  Unfortunately the source is mechanically generated
and following it is hard.  For example, a testcase that retains the basic
structure but ends up with just a few computed gotos (2, or at least fewer than 10)?
Comment 37 lucier 2021-03-10 14:16:24 UTC
Created attachment 50352
Smaller parameterized test file

This file is generated from a single copy of the Fibonacci function, and is simplified a bit otherwise.  I believe it has two computed gotos.
Comment 38 Richard Biener 2021-03-10 15:06:07 UTC
Created attachment 50354
SVG of the CFG at LIM

This is an SVG of the CFG as created by dot at the point of the first LIM pass.

The CFG isn't too special and I guess a switch instead of the computed goto
would present us with the same issues.

I suppose putting a hard limit on the number of stores to move and then
ordering candidates based on their importance (execution frequency) is the
way to go.
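
A minimal C sketch of the selection step proposed above (cap the number of stores moved, hottest candidates first), assuming each candidate already carries an estimated execution frequency. `struct sm_candidate`, `select_store_candidates` and the limit value are hypothetical names for illustration, not GCC internals.

```c
#include <stdlib.h>

/* Hypothetical record for one store-motion candidate (one memory reference). */
struct sm_candidate {
  int id;
  long exec_freq; /* estimated execution frequency of the store */
};

/* Order by descending frequency; ties broken by id for a deterministic result. */
static int by_freq_desc(const void *a, const void *b)
{
  const struct sm_candidate *x = a, *y = b;
  if (x->exec_freq != y->exec_freq)
    return x->exec_freq < y->exec_freq ? 1 : -1;
  return x->id - y->id;
}

/* Sort candidates hottest-first and return how many of them to move:
   at most `limit`, regardless of how many the loop contains. */
static size_t select_store_candidates(struct sm_candidate *c, size_t n,
                                      size_t limit)
{
  qsort(c, n, sizeof *c, by_freq_desc);
  return n < limit ? n : limit;
}
```

In this sketch, candidates past the returned count are simply left unmoved, so the work done by store motion stays bounded no matter how many counters a loop updates.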
Comment 39 Jakub Jelinek 2021-05-14 09:47:28 UTC
GCC 8 branch is being closed.
Comment 40 Richard Biener 2021-06-01 08:06:41 UTC
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
Comment 41 Richard Biener 2022-05-27 09:35:29 UTC
GCC 9 branch is being closed
Comment 42 Jakub Jelinek 2022-06-28 10:31:18 UTC
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Comment 43 Richard Biener 2023-07-07 10:30:36 UTC
GCC 10 branch is being closed.
Comment 44 Richard Biener 2023-09-28 07:06:08 UTC
I tried the first input file with GCC 13.2 and on a Ryzen 9 7900X get a memory
usage of 105MB and 1.1s compile-time.  The larger testcase needs 360MB peak
and 6.3s to compile.  Both with mostly flat -ftime-report profile.

Upping to -O2 shows the same memory peak but 13.1s for the larger testcase.  We
then see

 PRE                                :   2.09 ( 16%)   0.01 (  1%)   2.15 ( 15%)   288k (  0%)

as the biggest thing sticking out (similar for the small testcase).

I think we've come a long way here.  GCC 12.3 behaves the same.  With GCC 11.4
I stopped the larger testcase at -O2 after 3 minutes; the small testcase at -O1
takes 44s and 5GB of memory.

Fixed for GCC 12+.  I'm not going to try to identify what to backport (I usually backported compile-time/memory-usage improvements when reasonable, so
I suspect this was a bigger change).
Comment 45 lucier 2023-10-02 00:26:37 UTC
I confirm that I no longer have this problem with

> gcc-12 -v
Using built-in specs.
COLLECT_GCC=gcc-12
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 12.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-ALHxjy/gcc-12-12.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-ALHxjy/gcc-12-12.3.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04) 

A different example procedure still took > 45 minutes and > 3.5 GB to compile with -ftest-coverage -fprofile-arcs (it had finished when I came back from lunch), but it was quite large (even by my standards!).

If this is a "won't fix" for earlier versions of gcc, then I'm OK with closing this PR.
Comment 46 Richard Biener 2023-10-04 06:39:38 UTC
It'll get closed when we close the GCC 11 branch.  There's still the opportunity for somebody to bisect what fixed it in GCC 12, in case it was something trivial.
Comment 47 Richard Biener 2024-07-19 06:29:03 UTC
Fixed in GCC 12.