With this compiler:

firefly:~/Downloads/gambit/lib> /pkgs/gcc-4.9.2/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-4.9.2/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-4.9.2/configure --prefix=/pkgs/gcc-4.9.2
Thread model: posix
gcc version 4.9.2 (GCC)

With this command:

/pkgs/gcc-4.9.2/bin/gcc -Q -save-temps -Wno-unused -Wno-write-strings -O1 -fno-math-errno -fschedule-insns2 -fno-strict-aliasing -fno-trapping-math -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fprofile-arcs -ftest-coverage -I"../include" -c -o "_system.o" -I. -DHAVE_CONFIG_H -D___GAMBCDIR="\"/usr/local/Gambit-C\"" -D___SYS_TYPE_CPU="\"x86_64\"" -D___SYS_TYPE_VENDOR="\"unknown\"" -D___SYS_TYPE_OS="\"linux-gnu\"" -D___CONFIGURE_COMMAND="\"./configure 'CC=/pkgs/gcc-4.9.2/bin/gcc -Q -save-temps' '--enable-track-scheme' '--enable-coverage'"\" -D___OBJ_EXTENSION="\".o\"" -D___EXE_EXTENSION="\"\"" -D___BAT_EXTENSION="\"\"" -D___PRIMAL _system.c -D___LIBRARY

I get the output:

Execution times (seconds)
phase setup : 0.12 (100%) usr 0.00 ( 0%) sys 0.13 (100%) wall 35712 kB (100%) ggc
TOTAL : 0.12 0.00 0.13 35728 kB

btowc wctob mbrlen __signbitf __signbit __signbitl ___H__20___system ___H__23__23_type ___H__23__23_type_2d_cast ___H__23__23_subtype ___H__23__23_subtype_2d_set_21_ ___H__23__23_fixnum_3f_ ___H__23__23_subtyped_3f_ ___H__23__23_subtyped_2d_mutable_3f_ ___H__23__23_subtyped_2e_vector_3f_ ___H__23__23_subtyped_2e_symbol_3f_ ___H__23__23_subtyped_2e_flonum_3f_ ___H__23__23_subtyped_2e_bignum_3f_ ___H__23__23_special_3f_ ___H__23__23_ratnum_3f_ ___H__23__23_cpxnum_3f_ ___H__23__23_structure_3f_ ___H__23__23_values_3f_ ___H__23__23_meroon_3f_ ___H__23__23_jazz_3f_ ___H__23__23_frame_3f_ ___H__23__23_continuation_3f_ ___H__23__23_promise_3f_ ___H__23__23_return_3f_ ___H__23__23_foreign_3f_ ___H__23__23_flonum_3f_ ___H__23__23_bignum_3f_ ___H__23__23_unbound_3f_
___H__23__23_quasi_2d_append ___H__23__23_quasi_2d_list ___H__23__23_quasi_2d_cons ___H__23__23_quasi_2d_list_2d__3e_vector ___H__23__23_quasi_2d_vector ___H__23__23_case_2d_memv ___H__23__23_eqv_3f_ ___H_eqv_3f_ ___H__23__23_eq_3f_ ___H_eq_3f_ ___H__23__23_bvector_2d_equal_3f_ ___H__23__23_equal_3f_ ___H_equal_3f_ ___H__23__23_symbol_2d_hash ___H_symbol_2d_hash ___H__23__23_keyword_2d_hash ___H_keyword_2d_hash ___H__23__23_eq_3f__2d_hash ___H_eq_3f__2d_hash ___H__23__23_eqv_3f__2d_hash ___H_eqv_3f__2d_hash ___H__23__23_equal_3f__2d_hash ___H_equal_3f__2d_hash ___H__23__23_string_3d__3f__2d_hash ___H_string_3d__3f__2d_hash ___H__23__23_string_2d_ci_3d__3f__2d_hash ___H_string_2d_ci_3d__3f__2d_hash ___H__23__23_generic_2d_hash ___H__23__23_fail_2d_check_2d_invalid_2d_hash_2d_number_2d_exception ___H_invalid_2d_hash_2d_number_2d_exception_3f_ ___H_invalid_2d_hash_2d_number_2d_exception_2d_procedure ___H_invalid_2d_hash_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_invalid_2d_hash_2d_number_2d_exception ___H__23__23_fail_2d_check_2d_unbound_2d_table_2d_key_2d_exception ___H_unbound_2d_table_2d_key_2d_exception_3f_ ___H_unbound_2d_table_2d_key_2d_exception_2d_procedure ___H_unbound_2d_table_2d_key_2d_exception_2d_arguments ___H__23__23_raise_2d_unbound_2d_table_2d_key_2d_exception ___H__23__23_gc_2d_hash_2d_table_3f_ ___H__23__23_gc_2d_hash_2d_table_2d_ref ___H__23__23_gc_2d_hash_2d_table_2d_set_21_ ___H__23__23_gc_2d_hash_2d_table_2d_rehash_21_ ___H__23__23_smallest_2d_prime_2d_no_2d_less_2d_than ___H__23__23_gc_2d_hash_2d_table_2d_resize_21_ ___H__23__23_gc_2d_hash_2d_table_2d_allocate ___H__23__23_gc_2d_hash_2d_table_2d_for_2d_each ___H__23__23_gc_2d_hash_2d_table_2d_search ___H__23__23_gc_2d_hash_2d_table_2d_foldl ___H__23__23_mem_2d_allocated_3f_ ___H__23__23_fail_2d_check_2d_table ___H_table_3f_ ___H__23__23_make_2d_table ___H_make_2d_table ___H__23__23_table_2d_get_2d_eq_2d_gcht ___H__23__23_table_2d_get_2d_gcht_2d_not_2d_mem_2d_alloc 
___H__23__23_table_2d_get_2d_gcht ___H__23__23_table_2d_length ___H_table_2d_length ___H__23__23_table_2d_access ___H__23__23_table_2d_ref ___H_table_2d_ref ___H__23__23_table_2d_resize_21_ ___H__23__23_table_2d_set_21_ ___H_table_2d_set_21_ ___H__23__23_table_2d_search ___H_table_2d_search ___H__23__23_table_2d_for_2d_each ___H_table_2d_for_2d_each ___H__23__23_table_2d_foldl ___H__23__23_table_2d__3e_list ___H_table_2d__3e_list ___H__23__23_list_2d__3e_table ___H_list_2d__3e_table ___H__23__23_table_2d_copy ___H_table_2d_copy ___H__23__23_table_2d_merge_21_ ___H_table_2d_merge_21_ ___H__23__23_table_2d_merge ___H_table_2d_merge ___H__23__23_table_2d_equal_3f_ ___H__23__23_table_2d_equal_3f__2d_hash ___H__23__23_fail_2d_check_2d_unbound_2d_serial_2d_number_2d_exception ___H_unbound_2d_serial_2d_number_2d_exception_3f_ ___H_unbound_2d_serial_2d_number_2d_exception_2d_procedure ___H_unbound_2d_serial_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_unbound_2d_serial_2d_number_2d_exception ___H__23__23_object_2d__3e_serial_2d_number ___H_object_2d__3e_serial_2d_number ___H__23__23_serial_2d_number_2d__3e_object ___H_serial_2d_number_2d__3e_object ___H__23__23_object_2d__3e_u8vector ___H_object_2d__3e_u8vector ___H__23__23_u8vector_2d__3e_object ___H_u8vector_2d__3e_object ___setup_mod ___init_mod ____20___system Analyzing compilation unit Performing interprocedural optimizations <*free_lang_data> <visibility> <early_local_cleanups> <*free_inline_summary> <profile> <whole-program> <profile_estimate> <inline> <pure-const> <static-var>Assembling functions: ___setup_mod ___init_mod ___H_u8vector_2d__3e_object ___H__23__23_u8vector_2d__3e_object ___H_object_2d__3e_u8vector ___H__23__23_object_2d__3e_u8vector {GC 298137k -> 101678k} ___H_serial_2d_number_2d__3e_object ___H__23__23_serial_2d_number_2d__3e_object ___H_object_2d__3e_serial_2d_number ___H__23__23_object_2d__3e_serial_2d_number ___H__23__23_raise_2d_unbound_2d_serial_2d_number_2d_exception 
___H_unbound_2d_serial_2d_number_2d_exception_2d_arguments ___H_unbound_2d_serial_2d_number_2d_exception_2d_procedure ___H_unbound_2d_serial_2d_number_2d_exception_3f_ ___H__23__23_fail_2d_check_2d_unbound_2d_serial_2d_number_2d_exception ___H__23__23_table_2d_equal_3f__2d_hash ___H__23__23_table_2d_equal_3f_ ___H_table_2d_merge ___H__23__23_table_2d_merge ___H_table_2d_merge_21_ ___H__23__23_table_2d_merge_21_ ___H_table_2d_copy ___H__23__23_table_2d_copy ___H_list_2d__3e_table ___H__23__23_list_2d__3e_table ___H_table_2d__3e_list ___H__23__23_table_2d__3e_list ___H__23__23_table_2d_foldl ___H_table_2d_for_2d_each ___H__23__23_table_2d_for_2d_each ___H_table_2d_search ___H__23__23_table_2d_search ___H_table_2d_set_21_ ___H__23__23_table_2d_resize_21_ ___H_table_2d_ref ___H__23__23_table_2d_access ___H_table_2d_length ___H__23__23_table_2d_length ___H__23__23_table_2d_get_2d_gcht ___H__23__23_table_2d_get_2d_gcht_2d_not_2d_mem_2d_alloc ___H__23__23_table_2d_get_2d_eq_2d_gcht ___H_make_2d_table ___H_table_3f_ ___H__23__23_fail_2d_check_2d_table ___H__23__23_mem_2d_allocated_3f_ ___H__23__23_gc_2d_hash_2d_table_2d_foldl ___H__23__23_gc_2d_hash_2d_table_2d_search ___H__23__23_gc_2d_hash_2d_table_2d_for_2d_each ___H__23__23_gc_2d_hash_2d_table_2d_allocate ___H__23__23_gc_2d_hash_2d_table_2d_resize_21_ ___H__23__23_smallest_2d_prime_2d_no_2d_less_2d_than ___H__23__23_gc_2d_hash_2d_table_3f_ ___H__23__23_raise_2d_unbound_2d_table_2d_key_2d_exception ___H_unbound_2d_table_2d_key_2d_exception_2d_arguments ___H_unbound_2d_table_2d_key_2d_exception_2d_procedure ___H_unbound_2d_table_2d_key_2d_exception_3f_ ___H__23__23_fail_2d_check_2d_unbound_2d_table_2d_key_2d_exception ___H__23__23_raise_2d_invalid_2d_hash_2d_number_2d_exception ___H_invalid_2d_hash_2d_number_2d_exception_2d_arguments ___H_invalid_2d_hash_2d_number_2d_exception_2d_procedure ___H_invalid_2d_hash_2d_number_2d_exception_3f_ ___H__23__23_fail_2d_check_2d_invalid_2d_hash_2d_number_2d_exception 
___H__23__23_generic_2d_hash ___H_string_2d_ci_3d__3f__2d_hash ___H_string_3d__3f__2d_hash ___H__23__23_string_3d__3f__2d_hash ___H_equal_3f__2d_hash ___H__23__23_equal_3f__2d_hash ___H_eqv_3f__2d_hash ___H__23__23_eqv_3f__2d_hash ___H_eq_3f__2d_hash ___H__23__23_eq_3f__2d_hash ___H_keyword_2d_hash ___H__23__23_keyword_2d_hash ___H_symbol_2d_hash ___H__23__23_symbol_2d_hash ___H_equal_3f_ ___H__23__23_equal_3f_ ___H__23__23_bvector_2d_equal_3f_ ___H_eq_3f_ ___H__23__23_eq_3f_ ___H_eqv_3f_ ___H__23__23_eqv_3f_ ___H__23__23_case_2d_memv ___H__23__23_quasi_2d_vector ___H__23__23_quasi_2d_list_2d__3e_vector ___H__23__23_quasi_2d_cons ___H__23__23_quasi_2d_list ___H__23__23_quasi_2d_append ___H__23__23_unbound_3f_ ___H__23__23_bignum_3f_ ___H__23__23_flonum_3f_ ___H__23__23_foreign_3f_ ___H__23__23_return_3f_ ___H__23__23_promise_3f_ ___H__23__23_continuation_3f_ ___H__23__23_frame_3f_ ___H__23__23_jazz_3f_ ___H__23__23_meroon_3f_ ___H__23__23_values_3f_ ___H__23__23_structure_3f_ ___H__23__23_cpxnum_3f_ ___H__23__23_ratnum_3f_ ___H__23__23_special_3f_ ___H__23__23_subtyped_2e_bignum_3f_ ___H__23__23_subtyped_2e_flonum_3f_ ___H__23__23_subtyped_2e_symbol_3f_ ___H__23__23_subtyped_2e_vector_3f_ ___H__23__23_subtyped_2d_mutable_3f_ ___H__23__23_subtyped_3f_ ___H__23__23_fixnum_3f_ ___H__23__23_subtype_2d_set_21_ ___H__23__23_subtype ___H__23__23_type_2d_cast ___H__23__23_type ___H__20___system ___H__23__23_gc_2d_hash_2d_table_2d_set_21_ ___H__23__23_table_2d_set_21_ ___H__23__23_gc_2d_hash_2d_table_2d_rehash_21_ ___H__23__23_table_2d_ref ___H__23__23_gc_2d_hash_2d_table_2d_ref ___H__23__23_make_2d_table ___H__23__23_string_2d_ci_3d__3f__2d_hash ____20___system _GLOBAL__sub_I_65535_0__system.c

Execution times (seconds)
phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1134 kB ( 0%) ggc
phase parsing : 0.11 ( 0%) usr 0.12 (14%) sys 0.23 ( 1%) wall 7383 kB ( 1%) ggc
phase opt and generate : 35.79 (100%) usr 0.73 (86%) sys 36.55 (99%) wall 513422 kB (98%) ggc
garbage collection : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
dump files : 0.00 ( 0%) usr 0.01 ( 1%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
callgraph construction : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 2337 kB ( 0%) ggc
callgraph optimization : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 399 kB ( 0%) ggc
ipa dead code removal : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
ipa inlining heuristics : 0.01 ( 0%) usr 0.01 ( 1%) sys 0.01 ( 0%) wall 1132 kB ( 0%) ggc
ipa profile : 0.00 ( 0%) usr 0.01 ( 1%) sys 0.00 ( 0%) wall 2688 kB ( 1%) ggc
ipa pure const : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
cfg construction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 416 kB ( 0%) ggc
cfg cleanup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 14 kB ( 0%) ggc
trivially dead code : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 0 kB ( 0%) ggc
df scan insns : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 13 kB ( 0%) ggc
df multiple defs : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
df reaching defs : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
df live regs : 0.29 ( 1%) usr 0.00 ( 0%) sys 0.26 ( 1%) wall 0 kB ( 0%) ggc
df live&initialized regs: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
df reg dead/unused notes: 0.24 ( 1%) usr 0.01 ( 1%) sys 0.24 ( 1%) wall 12426 kB ( 2%) ggc
register information : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
alias analysis : 0.21 ( 1%) usr 0.00 ( 0%) sys 0.19 ( 1%) wall 23934 kB ( 5%) ggc
alias stmt walking : 0.33 ( 1%) usr 0.01 ( 1%) sys 0.28 ( 1%) wall 609 kB ( 0%) ggc
register scan : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 104 kB ( 0%) ggc
rebuild jump labels : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
preprocessing : 0.03 ( 0%) usr 0.03 ( 4%) sys 0.06 ( 0%) wall 1743 kB ( 0%) ggc
lexical analysis : 0.03 ( 0%) usr 0.03 ( 4%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
parser (global) : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 1477 kB ( 0%) ggc
parser function body : 0.04 ( 0%) usr 0.06 ( 7%) sys 0.10 ( 0%) wall 3815 kB ( 1%) ggc
inline parameters : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 89 kB ( 0%) ggc
tree gimplify : 0.03 ( 0%) usr 0.01 ( 1%) sys 0.02 ( 0%) wall 5057 kB ( 1%) ggc
tree CFG construction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1743 kB ( 0%) ggc
tree CFG cleanup : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 1%) wall 300 kB ( 0%) ggc
tree copy propagation : 0.28 ( 1%) usr 0.00 ( 0%) sys 0.31 ( 1%) wall 3211 kB ( 1%) ggc
tree PTA : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 217 kB ( 0%) ggc
tree PHI insertion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 2191 kB ( 0%) ggc
tree SSA rewrite : 0.19 ( 1%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall 17777 kB ( 3%) ggc
tree SSA other : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 18 kB ( 0%) ggc
tree SSA incremental : 0.23 ( 1%) usr 0.01 ( 1%) sys 0.26 ( 1%) wall 27481 kB ( 5%) ggc
tree operand scan : 0.02 ( 0%) usr 0.02 ( 2%) sys 0.05 ( 0%) wall 15630 kB ( 3%) ggc
dominator optimization : 0.22 ( 1%) usr 0.01 ( 1%) sys 0.22 ( 1%) wall 27417 kB ( 5%) ggc
tree CCP : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 491 kB ( 0%) ggc
tree PHI const/copy prop: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 127 kB ( 0%) ggc
tree split crit edges : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 743 kB ( 0%) ggc
tree reassociation : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 7 kB ( 0%) ggc
tree FRE : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 2875 kB ( 1%) ggc
tree code sinking : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree forward propagate : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 336 kB ( 0%) ggc
tree conservative DCE : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 99 kB ( 0%) ggc
tree aggressive DCE : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 20 kB ( 0%) ggc
tree DSE : 2.80 ( 8%) usr 0.00 ( 0%) sys 2.80 ( 8%) wall 0 kB ( 0%) ggc
tree loop invariant motion: 0.16 ( 0%) usr 0.03 ( 4%) sys 0.19 ( 1%) wall 64219 kB (12%) ggc
scev constant prop : 0.29 ( 1%) usr 0.00 ( 0%) sys 0.27 ( 1%) wall 12074 kB ( 2%) ggc
complete unrolling : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 16 kB ( 0%) ggc
tree iv optimization : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 932 kB ( 0%) ggc
tree SSA uncprop : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
tree rename SSA copies : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
dominance frontiers : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
dominance computation : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 0 kB ( 0%) ggc
out of ssa : 5.90 (16%) usr 0.50 (59%) sys 6.41 (17%) wall 26 kB ( 0%) ggc
expand vars : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 866 kB ( 0%) ggc
expand : 0.39 ( 1%) usr 0.02 ( 2%) sys 0.40 ( 1%) wall 87038 kB (17%) ggc
post expand cleanups : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 322 kB ( 0%) ggc
forward prop : 0.33 ( 1%) usr 0.00 ( 0%) sys 0.33 ( 1%) wall 14733 kB ( 3%) ggc
CSE : 7.53 (21%) usr 0.01 ( 1%) sys 7.53 (20%) wall 30934 kB ( 6%) ggc
dead code elimination : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
dead store elim1 : 0.37 ( 1%) usr 0.01 ( 1%) sys 0.36 ( 1%) wall 7276 kB ( 1%) ggc
dead store elim2 : 1.73 ( 5%) usr 0.00 ( 0%) sys 1.71 ( 5%) wall 18715 kB ( 4%) ggc
loop init : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 713 kB ( 0%) ggc
loop invariant motion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 27 kB ( 0%) ggc
branch prediction : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 277 kB ( 0%) ggc
combiner : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 913 kB ( 0%) ggc
integrated RA : 0.87 ( 2%) usr 0.00 ( 0%) sys 0.99 ( 3%) wall 48097 kB ( 9%) ggc
LRA non-specific : 1.61 ( 4%) usr 0.01 ( 1%) sys 1.63 ( 4%) wall 37254 kB ( 7%) ggc
LRA virtuals elimination: 0.13 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 15481 kB ( 3%) ggc
LRA reload inheritance : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 11 kB ( 0%) ggc
LRA create live ranges : 0.32 ( 1%) usr 0.01 ( 1%) sys 0.29 ( 1%) wall 7642 kB ( 1%) ggc
LRA hard reg assignment : 0.69 ( 2%) usr 0.01 ( 1%) sys 0.73 ( 2%) wall 0 kB ( 0%) ggc
reload CSE regs : 5.74 (16%) usr 0.00 ( 0%) sys 5.73 (16%) wall 12325 kB ( 2%) ggc
thread pro- & epilogue : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 465 kB ( 0%) ggc
combine stack adjustments: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
hard reg cprop : 0.33 ( 1%) usr 0.00 ( 0%) sys 0.34 ( 1%) wall 4 kB ( 0%) ggc
scheduling 2 : 1.79 ( 5%) usr 0.01 ( 1%) sys 1.77 ( 5%) wall 299 kB ( 0%) ggc
machine dep reorg : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
shorten branches : 0.21 ( 1%) usr 0.00 ( 0%) sys 0.20 ( 1%) wall 0 kB ( 0%) ggc
final : 0.36 ( 1%) usr 0.00 ( 0%) sys 0.34 ( 1%) wall 1508 kB ( 0%) ggc
variable output : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 146 kB ( 0%) ggc
straight-line strength reduction: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 16 kB ( 0%) ggc
rest of compilation : 0.35 ( 1%) usr 0.02 ( 2%) sys 0.37 ( 1%) wall 991 kB ( 0%) ggc
remove unused locals : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
address taken : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
unaccounted todo : 0.19 ( 1%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall 0 kB ( 0%) ggc
TOTAL : 35.90 0.85 36.79 521957 kB

The "phase opt and generate" part uses most of the CPU time and most of the RAM. With somewhat larger files, RAM usage goes up > 80GB.

Including _system.i with this report.
Created attachment 34660: Input file for bug
Note phase opt and generate is a toplevel time area.
The passes which take most of the time are:

tree DSE : 2.80 ( 8%) usr 0.00 ( 0%) sys 2.80 ( 8%) wall 0 kB ( 0%) ggc
out of ssa : 5.90 (16%) usr 0.50 (59%) sys 6.41 (17%) wall 26 kB ( 0%) ggc
CSE : 7.53 (21%) usr 0.01 ( 1%) sys 7.53 (20%) wall 30934 kB ( 6%) ggc
reload CSE regs : 5.74 (16%) usr 0.00 ( 0%) sys 5.73 (16%) wall 12325 kB ( 2%) ggc
scheduling 2 : 1.79 ( 5%) usr 0.01 ( 1%) sys 1.77 ( 5%) wall 299 kB ( 0%) ggc
I think this is just an issue with computed goto (indirect gotos).
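For readers unfamiliar with the construct, a computed goto uses the GNU C "labels as values" extension (`&&label` to take a label's address, `goto *ptr` to jump to it). The sketch below is a hypothetical toy interpreter written only for illustration; it is not taken from the attached Gambit output, and the opcode names and the `run` function are made up. It shows the shape of code that may be involved here: every handler ends in its own indirect jump, and each indirect jump can in principle reach every label whose address is taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine (illustration only). */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Runs the opcodes in `code` and returns the accumulator.
   Relies on GNU C labels-as-values, supported by GCC and Clang.
   Opcodes are assumed to be valid (0..3); no range check is done. */
static long run(const unsigned char *code, size_t n)
{
    /* Table of label addresses, indexed by opcode. */
    static void *const dispatch[] = {
        &&do_inc, &&do_dec, &&do_double, &&do_halt
    };
    long acc = 0;
    size_t pc = 0;

    /* The program must end with OP_HALT so that pc stays in bounds. */
    assert(n > 0 && code[n - 1] == OP_HALT);

    /* Each use of NEXT() is a separate indirect jump. */
#define NEXT() goto *dispatch[code[pc++]]
    NEXT();

do_inc:
    acc += 1;
    NEXT();
do_dec:
    acc -= 1;
    NEXT();
do_double:
    acc *= 2;
    NEXT();
do_halt:
#undef NEXT
    return acc;
}
```

Calling `run` on the program `{ OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT }` returns 3. This toy has four indirect jumps and four address-taken labels, so up to 16 possible control-flow edges; in a generated file with thousands of labels the number of such edges can grow with the product of the two counts, which is one plausible reason why passes that walk the control-flow graph become expensive.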
On 02/03/2015 04:32 PM, pinskia at gcc dot gnu.org wrote:
>
> --- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> Note phase opt and generate is a toplevel time area.
> The passes which take most of the time are:

I'm also concerned about excessive memory usage; the largest passes (> 20 MB) are

alias analysis : 0.21 ( 1%) usr 0.00 ( 0%) sys 0.19 ( 1%) wall 23934 kB ( 5%) ggc
tree SSA incremental : 0.23 ( 1%) usr 0.01 ( 1%) sys 0.26 ( 1%) wall 27481 kB ( 5%) ggc
dominator optimization : 0.22 ( 1%) usr 0.01 ( 1%) sys 0.22 ( 1%) wall 27417 kB ( 5%) ggc
tree loop invariant motion: 0.16 ( 0%) usr 0.03 ( 4%) sys 0.19 ( 1%) wall 64219 kB (12%) ggc
expand : 0.39 ( 1%) usr 0.02 ( 2%) sys 0.40 ( 1%) wall 87038 kB (17%) ggc
CSE : 7.53 (21%) usr 0.01 ( 1%) sys 7.53 (20%) wall 30934 kB ( 6%) ggc
integrated RA : 0.87 ( 2%) usr 0.00 ( 0%) sys 0.99 ( 3%) wall 48097 kB ( 9%) ggc
LRA non-specific : 1.61 ( 4%) usr 0.01 ( 1%) sys 1.63 ( 4%) wall 37254 kB ( 7%) ggc

This also affects the 4.8 branch and the mainline.
Created attachment 34681: _io.i.gz: larger test file

With this compiler:

firefly:~/Downloads/gambit/lib> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-devel/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release
Thread model: posix
gcc version 5.0.0 20150206 (experimental) [trunk revision 220467] (GCC)

and the input file _io.c, I find

/pkgs/gcc-mainline/bin/gcc -Q -save-temps -Wno-unused -Wno-write-strings -O1 -fno-math-errno -fschedule-insns2 -fno-strict-aliasing -fno-trapping-math -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fprofile-arcs -ftest-coverage -I"../include" -c -o "_io.o" -I. -DHAVE_CONFIG_H -D___GAMBCDIR="\"/usr/local/Gambit-C\"" -D___SYS_TYPE_CPU="\"x86_64\"" -D___SYS_TYPE_VENDOR="\"unknown\"" -D___SYS_TYPE_OS="\"linux-gnu\"" -D___CONFIGURE_COMMAND="\"./configure 'CC=/pkgs/gcc-mainline/bin/gcc -Q -save-temps' '--enable-coverage' '--enable-track-scheme'"\" -D___OBJ_EXTENSION="\".o\"" -D___EXE_EXTENSION="\"\"" -D___BAT_EXTENSION="\"\"" -D___PRIMAL _io.c -D___LIBRARY

Execution times (seconds)
phase setup : 0.78 (100%) usr 0.04 (100%) sys 0.83 (100%) wall 156905 kB (100%) ggc
TOTAL : 0.78 0.04 0.83 156922 kB

btowc wctob mbrlen __signbitf __signbit __signbitl ___H__20___io ___H__23__23_fail_2d_check_2d_datum_2d_parsing_2d_exception ___H_datum_2d_parsing_2d_exception_3f_ ___H_datum_2d_parsing_2d_exception_2d_kind ___H_datum_2d_parsing_2d_exception_2d_readenv ___H_datum_2d_parsing_2d_exception_2d_parameters ___H__23__23_raise_2d_datum_2d_parsing_2d_exception ___H__23__23_fail_2d_check_2d_unterminated_2d_process_2d_exception ___H_unterminated_2d_process_2d_exception_3f_ ___H_unterminated_2d_process_2d_exception_2d_procedure ___H_unterminated_2d_process_2d_exception_2d_arguments
___H__23__23_raise_2d_unterminated_2d_process_2d_exception ___H__23__23_fail_2d_check_2d_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception ___H_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception_3f_ ___H_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception_2d_procedure ___H_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception_2d_arguments ___H__23__23_raise_2d_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception ___H__23__23_fail_2d_check_2d_no_2d_such_2d_file_2d_or_2d_directory_2d_exception ___H_no_2d_such_2d_file_2d_or_2d_directory_2d_exception_3f_ ___H_no_2d_such_2d_file_2d_or_2d_directory_2d_exception_2d_procedure ___H_no_2d_such_2d_file_2d_or_2d_directory_2d_exception_2d_arguments ___H__23__23_raise_2d_no_2d_such_2d_file_2d_or_2d_directory_2d_exception ___H__23__23_raise_2d_os_2d_io_2d_exception ___H__23__23_raise_2d_io_2d_exception ___H__23__23_fail_2d_check_2d_settings ___H__23__23_fail_2d_check_2d_exact_2d_integer_2d_or_2d_string_2d_or_2d_settings ___H__23__23_fail_2d_check_2d_string_2d_or_2d_ip_2d_address ___H__23__23_make_2d_writeenv ___H__23__23_make_2d_readenv ___H__23__23_readenv_2d_current_2d_filepos ___H__23__23_readenv_2d_relative_2d_filepos ___H__23__23_make_2d_psettings ___H__23__23_parse_2d_psettings_21_ ___H__23__23_psettings_2d__3e_roptions ___H__23__23_psettings_2d__3e_woptions ___H__23__23_psettings_2d__3e_input_2d_readtable ___H__23__23_psettings_2d__3e_output_2d_readtable ___H__23__23_psettings_2d_options_2d__3e_options ___H__23__23_psettings_2d__3e_device_2d_flags ___H__23__23_psettings_2d__3e_permissions ___H__23__23_psettings_2d__3e_output_2d_width ___H__23__23_port_3f_ ___H_port_3f_ ___H__23__23_input_2d_port_3f_ ___H_input_2d_port_3f_ ___H__23__23_output_2d_port_3f_ ___H_output_2d_port_3f_ ___H__23__23_fail_2d_check_2d_port ___H__23__23_fail_2d_check_2d_input_2d_port ___H__23__23_fail_2d_check_2d_output_2d_port ___H__23__23_fail_2d_check_2d_character_2d_input_2d_port 
___H__23__23_fail_2d_check_2d_character_2d_output_2d_port ___H__23__23_fail_2d_check_2d_byte_2d_port ___H__23__23_fail_2d_check_2d_byte_2d_input_2d_port ___H__23__23_fail_2d_check_2d_byte_2d_output_2d_port ___H__23__23_fail_2d_check_2d_device_2d_input_2d_port ___H__23__23_fail_2d_check_2d_device_2d_output_2d_port ___H__23__23_make_2d_io_2d_condvar ___H__23__23_io_2d_condvar_3f_ ___H__23__23_io_2d_condvar_2d_for_2d_writing_3f_ ___H__23__23_io_2d_condvar_2d_port ___H__23__23_io_2d_condvar_2d_port_2d_set_21_ ___H__23__23_make_2d_dummy_2d_port ___H_open_2d_dummy ___H__23__23_make_2d_device_2d_port ___H__23__23_make_2d_rdevice_2d_condvar ___H__23__23_make_2d_wdevice_2d_condvar ___H__23__23_make_2d_device_2d_port_2d_from_2d_single_2d_device ___H__23__23_close_2d_device ___H__23__23_input_2d_port_2d_byte_2d_position ___H_input_2d_port_2d_byte_2d_position ___H__23__23_output_2d_port_2d_byte_2d_position ___H_output_2d_port_2d_byte_2d_position ___H__23__23_device_2d_port_2d_wait_2d_for_2d_input_21_ ___H__23__23_device_2d_port_2d_wait_2d_for_2d_output_21_ ___H__23__23_char_2d_rbuf_2d_fill ___H__23__23_byte_2d_rbuf_2d_fill ___H__23__23_char_2d_wbuf_2d_drain_2d_no_2d_reset ___H__23__23_char_2d_wbuf_2d_drain ___H__23__23_byte_2d_wbuf_2d_drain_2d_no_2d_reset ___H__23__23_byte_2d_wbuf_2d_drain ___H__23__23_vect_2d_port_2d_options ___H__23__23_fail_2d_check_2d_vector_2d_input_2d_port ___H__23__23_fail_2d_check_2d_vector_2d_output_2d_port ___H__23__23_fail_2d_check_2d_vector_2d_or_2d_settings ___H__23__23_subvector_2d__3e_fifo ___H__23__23_fifo_2d__3e_vector ___H__23__23_open_2d_vector_2d_generic ___H__23__23_open_2d_vector ___H_open_2d_vector ___H__23__23_make_2d_vector_2d_pipe_2d_port ___H__23__23_open_2d_vector_2d_pipe_2d_generic ___H__23__23_open_2d_vector_2d_pipe ___H_open_2d_vector_2d_pipe ___H__23__23_open_2d_input_2d_vector ___H_open_2d_input_2d_vector ___H__23__23_open_2d_output_2d_vector ___H_open_2d_output_2d_vector ___H__23__23_get_2d_output_2d_vector 
___H_get_2d_output_2d_vector ___H_call_2d_with_2d_input_2d_vector ___H_call_2d_with_2d_output_2d_vector ___H_with_2d_input_2d_from_2d_vector ___H_with_2d_output_2d_to_2d_vector ___H__23__23_make_2d_vector_2d_port ___H__23__23_fail_2d_check_2d_string_2d_input_2d_port ___H__23__23_fail_2d_check_2d_string_2d_output_2d_port ___H__23__23_fail_2d_check_2d_string_2d_or_2d_settings ___H__23__23_substring_2d__3e_fifo ___H__23__23_fifo_2d__3e_string ___H__23__23_open_2d_string_2d_generic ___H__23__23_open_2d_string ___H_open_2d_string ___H__23__23_make_2d_string_2d_pipe_2d_port ___H__23__23_open_2d_string_2d_pipe_2d_generic ___H__23__23_open_2d_string_2d_pipe ___H_open_2d_string_2d_pipe ___H__23__23_open_2d_input_2d_string ___H_open_2d_input_2d_string ___H__23__23_open_2d_output_2d_string ___H_open_2d_output_2d_string ___H__23__23_get_2d_output_2d_string ___H_get_2d_output_2d_string ___H_call_2d_with_2d_input_2d_string ___H_call_2d_with_2d_output_2d_string ___H_with_2d_input_2d_from_2d_string ___H_with_2d_output_2d_to_2d_string ___H__23__23_make_2d_string_2d_port ___H__23__23_fail_2d_check_2d_u8vector_2d_input_2d_port ___H__23__23_fail_2d_check_2d_u8vector_2d_output_2d_port ___H__23__23_fail_2d_check_2d_u8vector_2d_or_2d_settings ___H__23__23_subu8vector_2d__3e_fifo ___H__23__23_fifo_2d__3e_u8vector ___H__23__23_open_2d_u8vector_2d_generic ___H__23__23_open_2d_u8vector ___H_open_2d_u8vector ___H__23__23_make_2d_u8vector_2d_pipe_2d_port ___H__23__23_open_2d_u8vector_2d_pipe_2d_generic ___H__23__23_open_2d_u8vector_2d_pipe ___H_open_2d_u8vector_2d_pipe ___H__23__23_open_2d_input_2d_u8vector ___H_open_2d_input_2d_u8vector ___H__23__23_open_2d_output_2d_u8vector ___H_open_2d_output_2d_u8vector ___H__23__23_get_2d_output_2d_u8vector ___H_get_2d_output_2d_u8vector ___H_call_2d_with_2d_input_2d_u8vector ___H_call_2d_with_2d_output_2d_u8vector ___H_with_2d_input_2d_from_2d_u8vector ___H_with_2d_output_2d_to_2d_u8vector ___H__23__23_make_2d_u8vector_2d_port 
___H__23__23_port_2d_of_2d_kind_3f_ ___H__23__23_port_2d_kind ___H__23__23_port_2d_device ___H__23__23_port_2d_name ___H__23__23_read ___H_read ___H__23__23_write_2d_generic_2d_to_2d_character_2d_port ___H__23__23_write ___H_write ___H__23__23_display ___H_display ___H__23__23_pretty_2d_print ___H_pretty_2d_print ___H__23__23_print ___H_print ___H_println ___H__23__23_newline ___H_newline ___H__23__23_flush_2d_input_2d_buffering ___H__23__23_force_2d_output ___H_force_2d_output ___H__23__23_close_2d_input_2d_port ___H_close_2d_input_2d_port ___H__23__23_close_2d_output_2d_port ___H_close_2d_output_2d_port ___H__23__23_close_2d_port ___H_close_2d_port ___H_input_2d_port_2d_readtable ___H_input_2d_port_2d_readtable_2d_set_21_ ___H_output_2d_port_2d_readtable ___H_output_2d_port_2d_readtable_2d_set_21_ ___H__23__23_input_2d_port_2d_timeout_2d_set_21_ ___H_input_2d_port_2d_timeout_2d_set_21_ ___H__23__23_output_2d_port_2d_timeout_2d_set_21_ ___H_output_2d_port_2d_timeout_2d_set_21_ ___H__23__23_port_2d_io_2d_exception_2d_handler_2d_set_21_ ___H_port_2d_io_2d_exception_2d_handler_2d_set_21_ ___H__23__23_input_2d_port_2d_char_2d_position ___H_input_2d_port_2d_char_2d_position ___H__23__23_output_2d_port_2d_char_2d_position ___H_output_2d_port_2d_char_2d_position ___H__23__23_input_2d_port_2d_line_2d_set_21_ ___H__23__23_input_2d_port_2d_line ___H_input_2d_port_2d_line ___H__23__23_input_2d_port_2d_column_2d_set_21_ ___H__23__23_input_2d_port_2d_column ___H_input_2d_port_2d_column ___H__23__23_output_2d_port_2d_line_2d_set_21_ ___H__23__23_output_2d_port_2d_line ___H_output_2d_port_2d_line ___H__23__23_output_2d_port_2d_column_2d_set_21_ ___H__23__23_output_2d_port_2d_column ___H_output_2d_port_2d_column ___H__23__23_output_2d_port_2d_width ___H_output_2d_port_2d_width ___H__23__23_object_2d__3e_truncated_2d_string ___H__23__23_object_2d__3e_string ___H_object_2d__3e_string ___H__23__23_string_2d__3e_limited_2d_string ___H__23__23_force_2d_limited_2d_string_21_ 
___H__23__23_input_2d_port_2d_characters_2d_buffered ___H_input_2d_port_2d_characters_2d_buffered ___H__23__23_char_2d_ready_3f_ ___H_char_2d_ready_3f_ ___H__23__23_peek_2d_char ___H_peek_2d_char ___H__23__23_read_2d_char ___H_read_2d_char ___H__23__23_read_2d_substring ___H_read_2d_substring ___H__23__23_read_2d_line ___H_read_2d_line ___H__23__23_read_2d_all ___H_read_2d_all ___H__23__23_read_2d_all_2d_as_2d_a_2d_begin_2d_expr_2d_from_2d_path ___H__23__23_read_2d_all_2d_as_2d_a_2d_begin_2d_expr_2d_from_2d_psettings ___H__23__23_read_2d_all_2d_as_2d_a_2d_begin_2d_expr_2d_from_2d_port ___H__23__23_write_2d_char ___H_write_2d_char ___H__23__23_write_2d_substring ___H_write_2d_substring ___H__23__23_write_2d_string ___H__23__23_input_2d_port_2d_bytes_2d_buffered ___H_input_2d_port_2d_bytes_2d_buffered ___H__23__23_read_2d_u8 ___H_read_2d_u8 ___H__23__23_read_2d_subu8vector ___H_read_2d_subu8vector ___H__23__23_write_2d_u8 ___H_write_2d_u8 ___H__23__23_write_2d_subu8vector ___H_write_2d_subu8vector ___H__23__23_options_2d_set_21_ ___H__23__23_port_2d_settings_2d_set_21_ ___H_port_2d_settings_2d_set_21_ ___H__23__23_fail_2d_check_2d_tty_2d_port ___H__23__23_tty_3f_ ___H_tty_3f_ ___H__23__23_tty_2d_type_2d_set_21_ ___H_tty_2d_type_2d_set_21_ ___H__23__23_tty_2d_text_2d_attributes_2d_set_21_ ___H_tty_2d_text_2d_attributes_2d_set_21_ ___H__23__23_tty_2d_history ___H_tty_2d_history ___H__23__23_tty_2d_history_2d_set_21_ ___H_tty_2d_history_2d_set_21_ ___H__23__23_tty_2d_history_2d_max_2d_length_2d_set_21_ ___H_tty_2d_history_2d_max_2d_length_2d_set_21_ ___H__23__23_tty_2d_paren_2d_balance_2d_duration_2d_set_21_ ___H_tty_2d_paren_2d_balance_2d_duration_2d_set_21_ ___H__23__23_tty_2d_mode_2d_set_21_ ___H_tty_2d_mode_2d_set_21_ ___H__23__23_fail_2d_check_2d_process_2d_port ___H__23__23_make_2d_process_2d_psettings ___H__23__23_open_2d_process_2d_generic ___H__23__23_open_2d_process ___H_open_2d_process ___H__23__23_open_2d_input_2d_process ___H_open_2d_input_2d_process 
___H__23__23_open_2d_output_2d_process ___H_open_2d_output_2d_process ___H_call_2d_with_2d_input_2d_process ___H_call_2d_with_2d_output_2d_process ___H_with_2d_input_2d_from_2d_process ___H_with_2d_output_2d_to_2d_process ___H__23__23_process_2d_pid ___H_process_2d_pid ___H__23__23_process_2d_status ___H_process_2d_status ___H__23__23_fail_2d_check_2d_host_2d_info ___H_host_2d_info_3f_ ___H_host_2d_info_2d_name ___H_host_2d_info_2d_aliases ___H_host_2d_info_2d_addresses ___H__23__23_host_2d_info ___H_host_2d_info ___H__23__23_host_2d_name ___H_host_2d_name ___H__23__23_string_2d_or_2d_ip_2d_address_3f_ ___H__23__23_ip_2d_address_3f_ ___H__23__23_fail_2d_check_2d_service_2d_info ___H_service_2d_info_3f_ ___H_service_2d_info_2d_name ___H_service_2d_info_2d_aliases ___H_service_2d_info_2d_port_2d_number ___H_service_2d_info_2d_protocol ___H__23__23_service_2d_info ___H_service_2d_info ___H__23__23_fail_2d_check_2d_protocol_2d_info ___H_protocol_2d_info_3f_ ___H_protocol_2d_info_2d_name ___H_protocol_2d_info_2d_aliases ___H_protocol_2d_info_2d_number ___H__23__23_protocol_2d_info ___H_protocol_2d_info ___H__23__23_fail_2d_check_2d_network_2d_info ___H_network_2d_info_3f_ ___H_network_2d_info_2d_name ___H_network_2d_info_2d_aliases ___H_network_2d_info_2d_number ___H__23__23_network_2d_info ___H_network_2d_info ___H__23__23_fail_2d_check_2d_tcp_2d_client_2d_port ___H__23__23_make_2d_tcp_2d_psettings ___H__23__23_make_2d_tcp_2d_client_2d_port ___H__23__23_open_2d_tcp_2d_client ___H_open_2d_tcp_2d_client ___H__23__23_fail_2d_check_2d_socket_2d_info ___H_socket_2d_info_3f_ ___H_socket_2d_info_2d_family ___H_socket_2d_info_2d_port_2d_number ___H_socket_2d_info_2d_address ___H__23__23_socket_2d_info_2d_setup_21_ ___H__23__23_tcp_2d_client_2d_socket_2d_info ___H__23__23_tcp_2d_client_2d_self_2d_socket_2d_info ___H_tcp_2d_client_2d_self_2d_socket_2d_info ___H__23__23_tcp_2d_client_2d_peer_2d_socket_2d_info ___H_tcp_2d_client_2d_peer_2d_socket_2d_info 
___H__23__23_fail_2d_check_2d_address_2d_info ___H_address_2d_info_3f_ ___H_address_2d_info_2d_family ___H_address_2d_info_2d_socket_2d_type ___H_address_2d_info_2d_protocol ___H_address_2d_info_2d_socket_2d_info ___H__23__23_net_2d_family_2d_encode ___H__23__23_net_2d_family_2d_decode ___H__23__23_net_2d_socket_2d_type_2d_encode ___H__23__23_net_2d_socket_2d_type_2d_decode ___H__23__23_net_2d_protocol_2d_encode ___H__23__23_net_2d_protocol_2d_decode ___H__23__23_address_2d_info_2d_setup_21_ ___H__23__23_address_2d_infos ___H_address_2d_infos ___H__23__23_fail_2d_check_2d_tcp_2d_server_2d_port ___H__23__23_make_2d_tcp_2d_server_2d_port ___H__23__23_process_2d_tcp_2d_server_2d_psettings ___H__23__23_open_2d_tcp_2d_server_2d_aux ___H__23__23_open_2d_tcp_2d_server ___H_open_2d_tcp_2d_server ___H__23__23_tcp_2d_server_2d_socket_2d_info ___H_tcp_2d_server_2d_socket_2d_info ___H__23__23_string_2d__3e_address_2d_and_2d_port_2d_number ___H__23__23_fail_2d_check_2d_directory_2d_port ___H__23__23_make_2d_directory_2d_psettings ___H__23__23_make_2d_directory_2d_port ___H__23__23_open_2d_directory ___H_open_2d_directory ___H__23__23_fail_2d_check_2d_event_2d_queue_2d_port ___H__23__23_make_2d_event_2d_queue_2d_port ___H__23__23_open_2d_event_2d_queue ___H_open_2d_event_2d_queue ___H__23__23_make_2d_path_2d_psettings ___H__23__23_make_2d_input_2d_path_2d_psettings ___H__23__23_open_2d_file_2d_generic ___H__23__23_open_2d_file_2d_generic_2d_from_2d_psettings ___H__23__23_path_2d_reference ___H__23__23_open_2d_file ___H_open_2d_file ___H__23__23_open_2d_input_2d_file ___H_open_2d_input_2d_file ___H__23__23_open_2d_output_2d_file ___H_open_2d_output_2d_file ___H_call_2d_with_2d_input_2d_file ___H_call_2d_with_2d_output_2d_file ___H_with_2d_input_2d_from_2d_file ___H_with_2d_output_2d_to_2d_file ___H_with_2d_input_2d_from_2d_port ___H_with_2d_output_2d_to_2d_port ___H__23__23_open_2d_predefined ___H_console_2d_port ___H__23__23_open_2d_all_2d_predefined 
___H__23__23_force_2d_output_2d_on_2d_predefined ___H__23__23_make_2d_filepos ___H__23__23_filepos_2d_line ___H__23__23_filepos_2d_col ___H__23__23_fail_2d_check_2d_readtable ___H__23__23_readtable_3f_ ___H_readtable_3f_ ___H__23__23_readtable_2d_copy_2d_shallow ___H__23__23_readtable_2d_copy ___H_readtable_2d_case_2d_conversion_3f_ ___H_readtable_2d_case_2d_conversion_3f__2d_set ___H_readtable_2d_keywords_2d_allowed_3f_ ___H_readtable_2d_keywords_2d_allowed_3f__2d_set ___H_readtable_2d_sharing_2d_allowed_3f_ ___H_readtable_2d_sharing_2d_allowed_3f__2d_set ___H_readtable_2d_eval_2d_allowed_3f_ ___H_readtable_2d_eval_2d_allowed_3f__2d_set ___H_readtable_2d_write_2d_extended_2d_read_2d_macros_3f_ ___H_readtable_2d_write_2d_extended_2d_read_2d_macros_3f__2d_set ___H_readtable_2d_write_2d_cdr_2d_read_2d_macros_3f_ ___H_readtable_2d_write_2d_cdr_2d_read_2d_macros_3f__2d_set ___H_readtable_2d_max_2d_write_2d_level ___H_readtable_2d_max_2d_write_2d_level_2d_set ___H_readtable_2d_max_2d_write_2d_length ___H_readtable_2d_max_2d_write_2d_length_2d_set ___H_readtable_2d_max_2d_unescaped_2d_char ___H_readtable_2d_max_2d_unescaped_2d_char_2d_set ___H_readtable_2d_comment_2d_handler ___H_readtable_2d_comment_2d_handler_2d_set ___H_readtable_2d_start_2d_syntax ___H_readtable_2d_start_2d_syntax_2d_set ___H__23__23_extract_2d_language_2d_and_2d_tail ___H__23__23_readtable_2d_setup_2d_for_2d_language_21_ ___H__23__23_readtable_2d_setup_2d_for_2d_standard_2d_level_21_ ___H__23__23_make_2d_readtable_2d_parameter ___H__23__23_start_2d_main ___H__23__23_make_2d_marktable ___H__23__23_marktable_2d_mark_21_ ___H__23__23_marktable_2d_lookup_21_ ___H__23__23_marktable_2d_save ___H__23__23_marktable_2d_restore_21_ ___H__23__23_might_2d_write_2d_differently_3f_ ___H__23__23_default_2d_wr ___H__23__23_wr_2d_str ___H__23__23_wr_2d_substr ___H__23__23_wr_2d_ch ___H__23__23_wr_2d_filler ___H__23__23_wr_2d_spaces ___H__23__23_wr_2d_indent ___H__23__23_shifted_2d_column ___H__23__23_wr_2d_sn 
___H__23__23_wr_2d_no_2d_display ___H__23__23_wr_2d_mark ___H__23__23_wr_2d_stamp ___H__23__23_wr_2d_symbol ___H__23__23_escape_2d_symbol_3f_ ___H__23__23_escape_2d_symkey_3f_ ___H__23__23_wr_2d_keyword ___H__23__23_escape_2d_keyword_3f_ ___H__23__23_wr_2d_pair ___H__23__23_print_2d_marker ___H__23__23_wr_2d_one_2d_line_2d_pretty_2d_print ___H__23__23_wr_2d_fits_2d_on_2d_line ___H__23__23_wr_2d_complex ___H__23__23_wr_2d_char ___H__23__23_wr_2d_hex ___H__23__23_wr_2d_oct ___H__23__23_wr_2d_string ___H__23__23_wr_2d_escaped_2d_string ___H__23__23_reader_2d__3e_open_2d_close ___H__23__23_head_2d__3e_open_2d_close ___H__23__23_wr_2d_vector ___H__23__23_wr_2d_vector_2d_aux1 ___H__23__23_wr_2d_vector_2d_aux2 ___H__23__23_wr_2d_vector_2d_aux3 ___H__23__23_wr_2d_foreign ___H__23__23_explode_2d_object ___H__23__23_implode_2d_object ___H__23__23_explode_2d_structure ___H__23__23_implode_2d_structure ___H__23__23_implode_2d_frame ___H__23__23_implode_2d_continuation ___H__23__23_explode_2d_procedure ___H__23__23_explode_2d_closure ___H__23__23_explode_2d_subprocedure ___H__23__23_implode_2d_procedure ___H__23__23_implode_2d_procedure_2d_or_2d_return ___H__23__23_explode_2d_return ___H__23__23_implode_2d_return ___H__23__23_wr_2d_opaque ___H__23__23_wr_2d_serialize ___H__23__23_wr_2d_s8vector ___H__23__23_wr_2d_u8vector ___H__23__23_wr_2d_s16vector ___H__23__23_wr_2d_u16vector ___H__23__23_wr_2d_s32vector ___H__23__23_wr_2d_u32vector ___H__23__23_wr_2d_s64vector ___H__23__23_wr_2d_u64vector ___H__23__23_wr_2d_f32vector ___H__23__23_wr_2d_f64vector ___H__23__23_wr_2d_structure ___H__23__23_wr_2d_gc_2d_hash_2d_table ___H__23__23_explode_2d_gc_2d_hash_2d_table ___H__23__23_implode_2d_gc_2d_hash_2d_table ___H__23__23_wr_2d_meroon ___H__23__23_wr_2d_jazz ___H__23__23_wr_2d_frame ___H__23__23_wr_2d_continuation ___H__23__23_wr_2d_promise ___H__23__23_explode_2d_promise ___H__23__23_implode_2d_promise ___H__23__23_wr_2d_will ___H__23__23_wr_2d_procedure ___H__23__23_wr_2d_return 
___H__23__23_wr_2d_box ___H__23__23_wr_2d_other ___H__23__23_eof_2d_object_3f_ ___H_eof_2d_object_3f_ ___H_transcript_2d_on ___H_transcript_2d_off ___H__23__23_make_2d_chartable ___H__23__23_chartable_2d_copy ___H__23__23_chartable_2d_ref ___H__23__23_chartable_2d_set_21_ ___H__23__23_readtable_2d_char_2d_delimiter_3f_ ___H__23__23_readtable_2d_char_2d_delimiter_3f__2d_set_21_ ___H__23__23_readtable_2d_char_2d_handler ___H__23__23_readtable_2d_char_2d_handler_2d_set_21_ ___H__23__23_readtable_2d_char_2d_sharp_2d_handler ___H__23__23_readtable_2d_char_2d_sharp_2d_handler_2d_set_21_ ___H__23__23_readtable_2d_char_2d_class_2d_set_21_ ___H__23__23_readtable_2d_convert_2d_case ___H__23__23_readtable_2d_string_2d_convert_2d_case_21_ ___H__23__23_readtable_2d_parse_2d_keyword ___H__23__23_read_2d_datum_2d_or_2d_eof ___H__23__23_read_2d_datum_2d_or_2d_label ___H__23__23_read_2d_datum_2d_or_2d_label_2d_or_2d_none ___H__23__23_read_2d_datum_2d_or_2d_label_2d_or_2d_none_2d_or_2d_dot ___H__23__23_script_2d_marker ___H__23__23_none_2d_marker ___H__23__23_dot_2d_marker ___H__23__23_label_2d_marker_3f_ ___H__23__23_label_2d_marker_2d_enter_21_ ___H__23__23_label_2d_marker_2d_reference ___H__23__23_label_2d_marker_2d_fixup_2d_handler_2d_add_21_ ___H__23__23_label_2d_marker_2d_define ___H__23__23_label_2d_marker_2d_fixup_21_ ___H__23__23_read_2d_check_2d_labels_21_ ___H__23__23_build_2d_list ___H__23__23_read_2d_next_2d_char_2d_expecting ___H__23__23_build_2d_vector ___H__23__23_build_2d_delimited_2d_string ___H__23__23_build_2d_delimited_2d_number_2f_keyword_2f_symbol ___H__23__23_string_2d__3e_number_2f_keyword_2f_symbol ___H__23__23_char_2d_octal_3f_ ___H__23__23_char_2d_hexadecimal_3f_ ___H__23__23_build_2d_escaped_2d_string_2d_up_2d_to ___H__23__23_build_2d_decimal_2d_integer ___H__23__23_build_2d_read_2d_macro ___H__23__23_skip_2d_extended_2d_comment ___H__23__23_skip_2d_single_2d_line_2d_comment ___H__23__23_skip_2d_comment_2d_done ___H__23__23_read_2d_sharp 
___H__23__23_read_2d_sharp_2d_aux ___H__23__23_read_2d_sharp_2d_vector ___H__23__23_read_2d_sharp_2d_char ___H__23__23_read_2d_sharp_2d_comment ___H__23__23_read_2d_sharp_2d_bang ___H__23__23_read_2d_sharp_2d_keyword_2f_symbol ___H__23__23_read_2d_sharp_2d_colon ___H__23__23_read_2d_sharp_2d_semicolon ___H__23__23_read_2d_sharp_2d_quotation ___H__23__23_read_2d_sharp_2d_ampersand ___H__23__23_read_2d_sharp_2d_dot ___H__23__23_read_2d_sharp_2d_less ___H__23__23_read_2d_sharp_2d_digit ___H__23__23_wrap ___H__23__23_wrap_2d_op ___H__23__23_wrap_2d_op0 ___H__23__23_wrap_2d_op1 ___H__23__23_wrap_2d_op1_2a_ ___H__23__23_wrap_2d_op2 ___H__23__23_wrap_2d_op3 ___H__23__23_wrap_2d_op4 ___H__23__23_read_2d_sharp_2d_other ___H__23__23_read_2d_whitespace ___H__23__23_read_2d_single_2d_line_2d_comment ___H__23__23_read_2d_escaped_2d_string ___H__23__23_read_2d_quotation ___H__23__23_closing_2d_parenthesis_2d_for ___H__23__23_read_2d_vector_2d_or_2d_list ___H__23__23_read_2d_list ___H__23__23_read_2d_vector ___H__23__23_read_2d_other ___H__23__23_read_2d_none ___H__23__23_read_2d_illegal ___H__23__23_read_2d_dot ___H__23__23_read_2d_number_2f_keyword_2f_symbol ___H__23__23_read_2d_assoc_2d_string_3d__3f_ ___H__23__23_read_2d_string_3d__3f_ ___H__23__23_read_2d_six ___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof ___H__23__23_six_2d_type_3f_ ___H__23__23_make_2d_standard_2d_readtable ___setup_mod ___init_mod ____20___io Analyzing compilation unit Performing interprocedural optimizations <*free_lang_data> <visibility> <build_ssa_passes> <chkp_passes> <opt_local_passes> <free-inline-summary> <profile> <whole-program> <profile_estimate> <inline> <pure-const> <static-var> <single-use> <comdats>Assembling functions: ___setup_mod ___init_mod ___H__23__23_make_2d_standard_2d_readtable ___H__23__23_six_2d_type_3f_ ___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof {GC 1963188k -> 1911014k}^Cmakefile:150: recipe for target '_io.o' failed make: *** [_io.o] Interrupt When I killed it, top was 
reporting:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8760 lucier    20   0 37.918g 0.029t    584 D   4.7 95.6  34:11.14 cc1

(I don't remember seeing resident memory measured in terabytes before ;-)

I'm having similar problems with the 4.8 branch.

I'm including _io.i.gz
The problem does not appear with this compiler:

maclaurin-271% gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)

so it appears to be a regression.

Brad
Judging from the description, I suppose that non-profiling/coverage mode is fine.
OK, so the memory seems to be used by out-of-SSA:

#5  0x0000000000c9eebc in coalesce_ssa_name ()
    at /space/rguenther/src/svn/gcc-4_9-branch/gcc/tree-ssa-coalesce.c:1330
1330      graph = build_ssa_conflict_graph (liveinfo);
(gdb) p *cl->list.htab
$10 = {entries = 0x2b19b30, size = 524287, n_elements = 77146, n_deleted = 0,
  searches = 122189, collisions = 6508, size_prime_index = 16}

where we malloc(!) 77146 entries of size 12. But the real problem is the conflict graph, with 76063 bitmaps eating up around 1GB of memory for the first testcase (and function ___H__23__23_u8vector_2d__3e_object).

That's likely caused by the change to more aggressively coalesce anonymous SSA names.
It seems that loop invariant motion is responsible for most of the abnormals, thus -fno-tree-loop-im restores performance.

The loop that LIM detects has this shape:

  <bb 6>: (header)
  # ___fp_3(ab) = PHI <___fp_41(4), ___fp_5(21)>
  # ___r1_7(ab) = PHI <___r1_42(4), ___r1_9(21)>
  # ___r2_11(ab) = PHI <___r2_43(4), ___r3_17(21)>
  # ___r3_19(ab) = PHI <___r3_44(4), ___r3_23(21)>
  # ___r4_25 = PHI <___r4_45(4), ___r4_26(21)>
  # gotovar.17_29 = PHI <_51(4), _69(21)>
  goto gotovar.17_29;

  ...

  <bb 21>: (latch)
  _67 = ___pc_1 + 15;
  _68 = (void * *) _67;
  _69 = *_68;
  PROF_edge_counter_142 = __gcov0.___H_object_2d__3e_u8vector[14];
  PROF_edge_counter_143 = PROF_edge_counter_142 + 1;
  __gcov0.___H_object_2d__3e_u8vector[14] = PROF_edge_counter_143;
  goto <bb 6>;

I'm not sure if we should artificially limit such loops. LIM doesn't account for the (compile-time) cost of needing very many PHIs when rewriting the store-motion vars into SSA form (but it could in theory estimate it by taking into account the CFG structure of the "loop").

Let's see if we can first generate a smaller testcase to illustrate the issue.

Mine for now.
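For readers unfamiliar with where a header block ending in `goto gotovar` comes from: source code that dispatches through computed gotos tends to produce it, because every handler jumps back through one shared indirect branch. The following is only a minimal sketch of that source pattern, using the GNU C labels-as-values extension. The function name `run_dispatch` and the three opcodes are hypothetical and are not taken from the Gambit-generated testcase.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a dispatch loop driven by computed gotos
   (GNU C "labels as values" extension; needs GCC or Clang).
   Opcodes: 0 = halt, 1 = increment, 2 = double.
   Returns the accumulator, or -1 on a malformed program. */
long run_dispatch(const unsigned char *code, size_t len)
{
    static void *const handlers[] = { &&op_halt, &&op_inc, &&op_dbl };
    long acc = 0;
    size_t pc = 0;

    assert(code != NULL);

/* Every handler ends in the same indirect jump; the compiler sees each
   label as reachable from each of these jumps, which makes the CFG dense. */
#define NEXT()                                        \
    do {                                              \
        if (pc >= len || code[pc] > 2) return -1;     \
        goto *handlers[code[pc++]];                   \
    } while (0)

    NEXT();

op_inc:
    acc += 1;
    NEXT();
op_dbl:
    acc *= 2;
    NEXT();
op_halt:
#undef NEXT
    return acc;
}
```

With a program of {1, 1, 2, 0} the accumulator goes 1, 2, 4 and the function returns 4. With hundreds of handlers and profiling counters on every edge, as in the report, the number of edges into the dispatch block grows accordingly.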
You might want to look at 65076 as well, where "phase opt and generate" is taking 89% of the compile time. It might be a better testcase to work with.
OK, so it's already calculate_live_ranges that takes much of the memory. I have a small patch to improve that somewhat. But what we really need is to get the "must coalesce" stuff "coalesced" with respect to both live and conflict computation, that is, to map must-coalesce SSA vars to the same partition. That loses the SSA corruption testing, so it might be much more controversial (silent wrong-code instead of an ICE).

Unfortunately, in the testcase there are only 2750 must-coalesces but 109493 partitions participating in the coalescing (so at least 50000 want coalesces). The good news is of course that we can simply choose to _not_ coalesce that many variables, but only, say, the important ones.
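The idea of mapping must-coalesce names to the same partition before computing live and conflict information can be pictured with a small union-find structure. This is a sketch under stated assumptions: the names `partitions_init`, `partition_find`, `must_coalesce`, `partition_count` and the bound `MAX_NAMES` are all hypothetical, and this is not GCC's partition code.

```c
#include <assert.h>

#define MAX_NAMES 16   /* illustrative bound, not a GCC limit */

/* parent[i] == i means name i is the representative of its partition. */
static int parent[MAX_NAMES];

/* Start with every SSA name in its own partition. */
void partitions_init(void)
{
    for (int i = 0; i < MAX_NAMES; i++)
        parent[i] = i;
}

/* Return the representative of the partition containing `name`. */
int partition_find(int name)
{
    assert(name >= 0 && name < MAX_NAMES);
    while (parent[name] != name) {
        parent[name] = parent[parent[name]];  /* path halving */
        name = parent[name];
    }
    return name;
}

/* Record that names a and b must share a partition.
   Returns the representative of the merged partition. */
int must_coalesce(int a, int b)
{
    int ra = partition_find(a);
    int rb = partition_find(b);
    if (ra != rb)
        parent[rb] = ra;
    return ra;
}

/* Number of distinct partitions, i.e. the problem size that a later
   live/conflict computation would have to handle. */
int partition_count(void)
{
    int n = 0;
    for (int i = 0; i < MAX_NAMES; i++)
        if (parent[i] == i)
            n++;
    return n;
}
```

Merging the must-coalesce names up front shrinks the partition count before any bitmaps are allocated, which is the effect described above. With only 2750 must-coalesces out of 109493 partitions, the reduction for this testcase would be small.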
(In reply to Richard Biener from comment #9)
> It seems that loop invariant motion is responsible for most of the abnormals,
> thus -fno-tree-loop-im restores performance.
>
> The loop LIM detects is of style
>
> <bb 6>: (header)
> # ___fp_3(ab) = PHI <___fp_41(4), ___fp_5(21)>
> # ___r1_7(ab) = PHI <___r1_42(4), ___r1_9(21)>
> # ___r2_11(ab) = PHI <___r2_43(4), ___r3_17(21)>
> # ___r3_19(ab) = PHI <___r3_44(4), ___r3_23(21)>
> # ___r4_25 = PHI <___r4_45(4), ___r4_26(21)>
> # gotovar.17_29 = PHI <_51(4), _69(21)>
> goto gotovar.17_29;

Perhaps disable LIM (and maybe PRE) if the CFG has a large edge/bb ratio (i.e. dense CFG)? There's probably no benefit in such cases anyway.
I think we've done similar things for Brad's large testcases in the past. You want to look at both the edge/bb density and the overall size, i.e., a high density doesn't really hurt if the total CFG is small. See "is_too_expensive" in gcse.c for the current heuristics to avoid trying global opts on these kinds of testcases.
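A check that looks at both density and overall size could take roughly the following form. This is only a sketch: the function name `cfg_too_expensive` and both thresholds are assumptions chosen for illustration, and the real heuristics are whatever is_too_expensive in gcse.c implements.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative thresholds only; they are assumptions, not GCC's values. */
#define MIN_EDGES_TO_CARE   20000
#define MAX_EDGES_PER_BLOCK 4

/* Return true when a CFG is both large and dense enough that an
   expensive global optimization is probably not worth attempting.
   The constant term lets small CFGs through however dense they are. */
bool cfg_too_expensive(long n_blocks, long n_edges)
{
    assert(n_blocks >= 0 && n_edges >= 0);
    return n_edges > MIN_EDGES_TO_CARE + n_blocks * MAX_EDGES_PER_BLOCK;
}
```

Under these made-up thresholds, a CFG with 100 blocks and 5000 edges is very dense but still passes, while one with 10000 blocks and 100000 edges is rejected.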
Note that if we fix out-of-SSA coalescing (patch in testing) then RTL CSE explodes via DF.
Author: rguenth
Date: Fri Mar 6 12:34:28 2015
New Revision: 221237

URL: https://gcc.gnu.org/viewcvs?rev=221237&root=gcc&view=rev
Log:
2015-03-06  Richard Biener  <rguenther@suse.de>

    PR middle-end/64928
    * tree-ssa-live.h (struct tree_live_info_d): Add livein_obstack
    and liveout_obstack members.
    (calculate_live_on_exit): Remove.
    (calculate_live_ranges): Change declaration.
    * tree-ssa-live.c (liveness_bitmap_obstack): Remove global var.
    (new_tree_live_info): Adjust.
    (calculate_live_ranges): Delete livein when not wanted.
    (calculate_live_ranges): Do not initialize liveness_bitmap_obstack.
    Deal with partly deleted live info.
    (loe_visit_block): Remove temporary bitmap by using
    bitmap_ior_and_compl_into.
    (live_worklist): Adjust accordingly.
    (calculate_live_on_exit): Make static.
    * tree-ssa-coalesce.c (coalesce_ssa_name): Tell calculate_live_ranges
    we do not need livein.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-coalesce.c
    trunk/gcc/tree-ssa-live.c
    trunk/gcc/tree-ssa-live.h
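The loe_visit_block entry in the log refers to replacing a temporary bitmap with a single fused operation. As far as I understand GCC's bitmap.h, bitmap_ior_and_compl_into (A, B, C) computes A |= B & ~C and reports whether A changed. The sketch below shows that operation on plain word arrays; the function name `words_ior_and_compl_into` is hypothetical, and GCC's own bitmaps are sparse linked structures rather than flat arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* dst |= src & ~mask, word by word, with no temporary array.
   Returns true if any bit of dst changed, which is what a dataflow
   worklist needs to know in order to decide whether to revisit a block. */
bool words_ior_and_compl_into(uint64_t *dst, const uint64_t *src,
                              const uint64_t *mask, size_t n_words)
{
    bool changed = false;

    assert(dst != NULL && src != NULL && mask != NULL);
    for (size_t i = 0; i < n_words; i++) {
        uint64_t merged = dst[i] | (src[i] & ~mask[i]);
        if (merged != dst[i]) {
            dst[i] = merged;
            changed = true;
        }
    }
    return changed;
}
```

The two-step alternative (compute src & ~mask into a scratch bitmap, then OR it into dst) allocates and frees one bitmap per visited edge, which adds up on a CFG with this many edges.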
Created attachment 34974
Patch to limit coalescing amount

The committed patch improves peak memory usage from 7.6GB to 5.8GB for the small testcase.

The attached patch reduces memory usage from SSA coalescing further (to ~300MB) by simply doing less coalescing. Unfortunately the generated RTL puts a bigger load on CSE/DF and thus we need 7.6GB again (eventually one can find an optimal --param max-out-of-ssa-coalesce-names, but that's probably highly testcase specific).

In theory you can iterate on coalescing piecewise as well, but the overhead for doing this might be too big (basically up to computing live/conflict for each coalesce pair separately, taking into account previous coalesces).
Created attachment 34975
do not compute live/conflict for abnormal coalesces

This is the other idea of simply not computing live/conflict for abnormal coalesces we know to always succeed. This shrinks the following live/conflict problem for the regular coalesces by unifying some partitions.

Doesn't help this particular testcase much.
(In reply to Richard Biener from comment #17)
> Created attachment 34975 [details]
> do not compute live/conflict for abnormal coalesces
>
> This is the other idea of simply not computing live/conflict for abnormal
> coalesces we know to always succeed. This shrinks the following
> live/conflict
> problem for the regular coalesces by unifying some partitions.
>
> Doesn't help this particular testcase much.

But it fixes PR63155 ...
*** Bug 66209 has been marked as a duplicate of this bug. ***
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
GCC 4.9.3 has been released.
GCC 4.9 branch is being closed
I tried the mainline compiler with the smaller input file on a similar machine to the one in the original report. I don't know whether I've configured the compiler incorrectly or something, but the problem seems worse now than when first reported.

This is the compiler:

heine:~/programs/gcc> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-pc-linux-gnu/8.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../../gcc-mainline/configure --prefix=/pkgs/gcc-mainline --enable-checking=release --enable-languages=c --disable-multilib --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 8.0.0 20170818 (experimental) [trunk revision 251188] (GCC)

and this is the result:

/pkgs/gcc-mainline/bin/gcc -Q -save-temps -Wno-unused -Wno-write-strings -O1 -fno-math-errno -fschedule-insns2 -fno-strict-aliasing -fno-trapping-math -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fprofile-arcs -ftest-coverage -I"../include" -c -o "_system.o" -I.
-DHAVE_CONFIG_H -D___PRIMAL _system.c -D___LIBRARY Execution times (seconds) phase setup : 0.05 (100%) usr 0.00 ( 0%) sys 0.05 (83%) wall 1425 kB (99%) ggc TOTAL : 0.05 0.00 0.06 1434 kB btowc wctob mbrlen __signbitf __signbit __signbitl ___H__20___system ___H__23__23_type ___H__23__23_type_2d_cast ___H__23__23_subtype ___H__23__23_subtype_2d_set_21_ ___H__23__23_fixnum_3f_ ___H__23__23_subtyped_3f_ ___H__23__23_subtyped_2d_mutable_3f_ ___H__23__23_subtyped_2e_vector_3f_ ___H__23__23_subtyped_2e_symbol_3f_ ___H__23__23_subtyped_2e_flonum_3f_ ___H__23__23_subtyped_2e_bignum_3f_ ___H__23__23_special_3f_ ___H__23__23_ratnum_3f_ ___H__23__23_cpxnum_3f_ ___H__23__23_structure_3f_ ___H__23__23_values_3f_ ___H__23__23_meroon_3f_ ___H__23__23_jazz_3f_ ___H__23__23_frame_3f_ ___H__23__23_continuation_3f_ ___H__23__23_promise_3f_ ___H__23__23_return_3f_ ___H__23__23_foreign_3f_ ___H__23__23_flonum_3f_ ___H__23__23_bignum_3f_ ___H__23__23_unbound_3f_ ___H__23__23_quasi_2d_append ___H__23__23_quasi_2d_list ___H__23__23_quasi_2d_cons ___H__23__23_quasi_2d_list_2d__3e_vector ___H__23__23_quasi_2d_vector ___H__23__23_case_2d_memv ___H__23__23_eqv_3f_ ___H_eqv_3f_ ___H__23__23_eq_3f_ ___H_eq_3f_ ___H__23__23_bvector_2d_equal_3f_ ___H__23__23_equal_3f_ ___H_equal_3f_ ___H__23__23_symbol_2d_hash ___H_symbol_2d_hash ___H__23__23_keyword_2d_hash ___H_keyword_2d_hash ___H__23__23_eq_3f__2d_hash ___H_eq_3f__2d_hash ___H__23__23_eqv_3f__2d_hash ___H_eqv_3f__2d_hash ___H__23__23_equal_3f__2d_hash ___H_equal_3f__2d_hash ___H__23__23_string_3d__3f__2d_hash ___H_string_3d__3f__2d_hash ___H__23__23_string_2d_ci_3d__3f__2d_hash ___H_string_2d_ci_3d__3f__2d_hash ___H__23__23_generic_2d_hash ___H__23__23_fail_2d_check_2d_invalid_2d_hash_2d_number_2d_exception ___H_invalid_2d_hash_2d_number_2d_exception_3f_ ___H_invalid_2d_hash_2d_number_2d_exception_2d_procedure ___H_invalid_2d_hash_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_invalid_2d_hash_2d_number_2d_exception 
___H__23__23_fail_2d_check_2d_unbound_2d_table_2d_key_2d_exception ___H_unbound_2d_table_2d_key_2d_exception_3f_ ___H_unbound_2d_table_2d_key_2d_exception_2d_procedure ___H_unbound_2d_table_2d_key_2d_exception_2d_arguments ___H__23__23_raise_2d_unbound_2d_table_2d_key_2d_exception ___H__23__23_gc_2d_hash_2d_table_3f_ ___H__23__23_gc_2d_hash_2d_table_2d_ref ___H__23__23_gc_2d_hash_2d_table_2d_set_21_ ___H__23__23_gc_2d_hash_2d_table_2d_rehash_21_ ___H__23__23_smallest_2d_prime_2d_no_2d_less_2d_than ___H__23__23_gc_2d_hash_2d_table_2d_resize_21_ ___H__23__23_gc_2d_hash_2d_table_2d_allocate ___H__23__23_gc_2d_hash_2d_table_2d_for_2d_each ___H__23__23_gc_2d_hash_2d_table_2d_search ___H__23__23_gc_2d_hash_2d_table_2d_foldl ___H__23__23_mem_2d_allocated_3f_ ___H__23__23_fail_2d_check_2d_table ___H_table_3f_ ___H__23__23_make_2d_table ___H_make_2d_table ___H__23__23_table_2d_get_2d_eq_2d_gcht ___H__23__23_table_2d_get_2d_gcht_2d_not_2d_mem_2d_alloc ___H__23__23_table_2d_get_2d_gcht ___H__23__23_table_2d_length ___H_table_2d_length ___H__23__23_table_2d_access ___H__23__23_table_2d_ref ___H_table_2d_ref ___H__23__23_table_2d_resize_21_ ___H__23__23_table_2d_set_21_ ___H_table_2d_set_21_ ___H__23__23_table_2d_search ___H_table_2d_search ___H__23__23_table_2d_for_2d_each ___H_table_2d_for_2d_each ___H__23__23_table_2d_foldl ___H__23__23_table_2d__3e_list ___H_table_2d__3e_list ___H__23__23_list_2d__3e_table ___H_list_2d__3e_table ___H__23__23_table_2d_copy ___H_table_2d_copy ___H__23__23_table_2d_merge_21_ ___H_table_2d_merge_21_ ___H__23__23_table_2d_merge ___H_table_2d_merge ___H__23__23_table_2d_equal_3f_ ___H__23__23_table_2d_equal_3f__2d_hash ___H__23__23_fail_2d_check_2d_unbound_2d_serial_2d_number_2d_exception ___H_unbound_2d_serial_2d_number_2d_exception_3f_ ___H_unbound_2d_serial_2d_number_2d_exception_2d_procedure ___H_unbound_2d_serial_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_unbound_2d_serial_2d_number_2d_exception 
___H__23__23_object_2d__3e_serial_2d_number ___H_object_2d__3e_serial_2d_number ___H__23__23_serial_2d_number_2d__3e_object ___H_serial_2d_number_2d__3e_object ___H__23__23_object_2d__3e_u8vector ___H_object_2d__3e_u8vector ___H__23__23_u8vector_2d__3e_object ___H_u8vector_2d__3e_object ___setup_mod ___init_mod ____20___system Analyzing compilation unit Performing interprocedural optimizations <*free_lang_data> <visibility> <build_ssa_passes> <opt_local_passes> <targetclone> <profile> <free-fnsummary> <whole-program> <profile_estimate> <fnsummary> <inline> <pure-const> <static-var> <single-use> <comdats>Assembling functions: <materialize-all-clones> <simdclone> ___H__20___system ___H__23__23_type ___H__23__23_type_2d_cast ___H__23__23_subtype ___H__23__23_subtype_2d_set_21_ ___H__23__23_fixnum_3f_ ___H__23__23_subtyped_3f_ ___H__23__23_subtyped_2d_mutable_3f_ ___H__23__23_subtyped_2e_vector_3f_ ___H__23__23_subtyped_2e_symbol_3f_ ___H__23__23_subtyped_2e_flonum_3f_ ___H__23__23_subtyped_2e_bignum_3f_ ___H__23__23_special_3f_ ___H__23__23_ratnum_3f_ ___H__23__23_cpxnum_3f_ ___H__23__23_structure_3f_ ___H__23__23_values_3f_ ___H__23__23_meroon_3f_ ___H__23__23_jazz_3f_ ___H__23__23_frame_3f_ ___H__23__23_continuation_3f_ ___H__23__23_promise_3f_ ___H__23__23_return_3f_ ___H__23__23_foreign_3f_ ___H__23__23_flonum_3f_ ___H__23__23_bignum_3f_ ___H__23__23_unbound_3f_ ___H__23__23_quasi_2d_append ___H__23__23_quasi_2d_list ___H__23__23_quasi_2d_cons ___H__23__23_quasi_2d_list_2d__3e_vector ___H__23__23_quasi_2d_vector ___H__23__23_case_2d_memv ___H__23__23_eqv_3f_ ___H_eqv_3f_ ___H__23__23_eq_3f_ ___H_eq_3f_ ___H__23__23_bvector_2d_equal_3f_ ___H__23__23_equal_3f_ ___H_equal_3f_ ___H__23__23_symbol_2d_hash ___H_symbol_2d_hash ___H__23__23_keyword_2d_hash ___H_keyword_2d_hash ___H__23__23_eq_3f__2d_hash ___H_eq_3f__2d_hash ___H__23__23_eqv_3f__2d_hash ___H_eqv_3f__2d_hash ___H__23__23_equal_3f__2d_hash ___H_equal_3f__2d_hash ___H__23__23_string_3d__3f__2d_hash 
___H_string_3d__3f__2d_hash ___H_string_2d_ci_3d__3f__2d_hash ___H__23__23_generic_2d_hash ___H__23__23_fail_2d_check_2d_invalid_2d_hash_2d_number_2d_exception ___H_invalid_2d_hash_2d_number_2d_exception_3f_ ___H_invalid_2d_hash_2d_number_2d_exception_2d_procedure ___H_invalid_2d_hash_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_invalid_2d_hash_2d_number_2d_exception ___H__23__23_fail_2d_check_2d_unbound_2d_table_2d_key_2d_exception ___H_unbound_2d_table_2d_key_2d_exception_3f_ ___H_unbound_2d_table_2d_key_2d_exception_2d_procedure ___H_unbound_2d_table_2d_key_2d_exception_2d_arguments ___H__23__23_raise_2d_unbound_2d_table_2d_key_2d_exception ___H__23__23_gc_2d_hash_2d_table_3f_ ___H__23__23_smallest_2d_prime_2d_no_2d_less_2d_than ___H__23__23_gc_2d_hash_2d_table_2d_resize_21_ ___H__23__23_gc_2d_hash_2d_table_2d_allocate ___H__23__23_gc_2d_hash_2d_table_2d_for_2d_each ___H__23__23_gc_2d_hash_2d_table_2d_search ___H__23__23_gc_2d_hash_2d_table_2d_foldl ___H__23__23_mem_2d_allocated_3f_ ___H__23__23_fail_2d_check_2d_table ___H_table_3f_ ___H_make_2d_table ___H__23__23_table_2d_get_2d_eq_2d_gcht ___H__23__23_table_2d_get_2d_gcht_2d_not_2d_mem_2d_alloc ___H__23__23_table_2d_get_2d_gcht ___H__23__23_table_2d_length ___H_table_2d_length ___H__23__23_table_2d_access ___H_table_2d_ref ___H__23__23_table_2d_resize_21_ ___H_table_2d_set_21_ ___H__23__23_table_2d_search ___H_table_2d_search ___H__23__23_table_2d_for_2d_each ___H_table_2d_for_2d_each ___H__23__23_table_2d_foldl ___H__23__23_table_2d__3e_list ___H_table_2d__3e_list ___H__23__23_list_2d__3e_table ___H_list_2d__3e_table ___H__23__23_table_2d_copy ___H_table_2d_copy ___H__23__23_table_2d_merge_21_ ___H_table_2d_merge_21_ ___H__23__23_table_2d_merge ___H_table_2d_merge ___H__23__23_table_2d_equal_3f_ ___H__23__23_table_2d_equal_3f__2d_hash ___H__23__23_fail_2d_check_2d_unbound_2d_serial_2d_number_2d_exception ___H_unbound_2d_serial_2d_number_2d_exception_3f_ 
___H_unbound_2d_serial_2d_number_2d_exception_2d_procedure ___H_unbound_2d_serial_2d_number_2d_exception_2d_arguments ___H__23__23_raise_2d_unbound_2d_serial_2d_number_2d_exception ___H__23__23_object_2d__3e_serial_2d_number ___H_object_2d__3e_serial_2d_number ___H__23__23_serial_2d_number_2d__3e_object ___H_serial_2d_number_2d__3e_object ___H__23__23_object_2d__3e_u8vector {GC 267350k -> 214835k} {GC 430685k -> 259602k} ___H_object_2d__3e_u8vector ___H__23__23_u8vector_2d__3e_object {GC 582086k -> 310231k} ___H_u8vector_2d__3e_object ___setup_mod ___init_mod ___H__23__23_gc_2d_hash_2d_table_2d_set_21_ ___H__23__23_table_2d_set_21_ ___H__23__23_gc_2d_hash_2d_table_2d_rehash_21_ ___H__23__23_table_2d_ref ___H__23__23_gc_2d_hash_2d_table_2d_ref ___H__23__23_make_2d_table ___H__23__23_string_2d_ci_3d__3f__2d_hash ____20___system _GLOBAL__sub_I_00100_0__system.c _GLOBAL__sub_D_00100_1__system.c

Execution times (seconds)
 phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1180 kB ( 0%) ggc
 phase parsing : 0.30 ( 0%) usr 0.24 (10%) sys 0.53 ( 0%) wall 11106 kB ( 1%) ggc
 phase opt and generate : 231.20 (100%) usr 2.26 (90%) sys 233.89 (100%) wall 1264764 kB (99%) ggc
 garbage collection : 1.47 ( 1%) usr 0.01 ( 0%) sys 1.48 ( 1%) wall 0 kB ( 0%) ggc
 dump files : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
 callgraph construction : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 5513 kB ( 0%) ggc
 ipa function summary : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 1333 kB ( 0%) ggc
 ipa dead code removal : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
 ipa profile : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 2764 kB ( 0%) ggc
 ipa pure const : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
 cfg construction : 0.41 ( 0%) usr 0.00 ( 0%) sys 0.39 ( 0%) wall 463 kB ( 0%) ggc
 cfg cleanup : 7.07 ( 3%) usr 0.00 ( 0%) sys 6.98 ( 3%) wall 19 kB ( 0%) ggc
 trivially dead code : 0.42 ( 0%) usr 0.00 ( 0%) sys 0.40 ( 0%) wall 0 kB ( 0%) ggc
 df scan insns : 0.65 ( 0%) usr 0.00 ( 0%) sys 0.68 ( 0%) wall 5 kB ( 0%) ggc
 df multiple defs : 3.41 ( 1%) usr 0.02 ( 1%) sys 3.41 ( 1%) wall 0 kB ( 0%) ggc
 df reaching defs : 0.02 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
 df live regs : 10.87 ( 5%) usr 0.01 ( 0%) sys 10.84 ( 5%) wall 0 kB ( 0%) ggc
 df live&initialized regs: 5.22 ( 2%) usr 0.00 ( 0%) sys 5.22 ( 2%) wall 0 kB ( 0%) ggc
 df use-def / def-use chains: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
 df reg dead/unused notes: 3.39 ( 1%) usr 0.01 ( 0%) sys 3.41 ( 1%) wall 23596 kB ( 2%) ggc
 register information : 0.66 ( 0%) usr 0.00 ( 0%) sys 0.64 ( 0%) wall 0 kB ( 0%) ggc
 alias analysis : 1.44 ( 1%) usr 0.00 ( 0%) sys 1.42 ( 1%) wall 50694 kB ( 4%) ggc
 alias stmt walking : 25.60 (11%) usr 0.36 (14%) sys 25.17 (11%) wall 1121 kB ( 0%) ggc
 register scan : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 41 kB ( 0%) ggc
 rebuild jump labels : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 0 kB ( 0%) ggc
 preprocessing : 0.07 ( 0%) usr 0.06 ( 2%) sys 0.16 ( 0%) wall 1080 kB ( 0%) ggc
 lexical analysis : 0.10 ( 0%) usr 0.08 ( 3%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
 parser (global) : 0.04 ( 0%) usr 0.03 ( 1%) sys 0.07 ( 0%) wall 1542 kB ( 0%) ggc
 parser struct body : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall 324 kB ( 0%) ggc
 parser function body : 0.09 ( 0%) usr 0.06 ( 2%) sys 0.20 ( 0%) wall 8135 kB ( 1%) ggc
 inline parameters : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 1071 kB ( 0%) ggc
 tree gimplify : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 5494 kB ( 0%) ggc
 tree CFG construction : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 1895 kB ( 0%) ggc
 tree CFG cleanup : 3.07 ( 1%) usr 0.00 ( 0%) sys 3.14 ( 1%) wall 78 kB ( 0%) ggc
 tree copy propagation : 0.92 ( 0%) usr 0.00 ( 0%) sys 0.92 ( 0%) wall 194 kB ( 0%) ggc
 tree PTA : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall 208 kB ( 0%) ggc
 tree PHI insertion : 0.01 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall 2265 kB ( 0%) ggc
 tree SSA rewrite : 1.30 ( 1%) usr 0.01 ( 0%) sys 1.34 ( 1%) wall 17229 kB ( 1%) ggc
 tree SSA other : 0.02 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall 17 kB ( 0%) ggc
 tree SSA incremental : 2.92 ( 1%) usr 0.04 ( 2%) sys 2.96 ( 1%) wall 108528 kB ( 8%) ggc
 tree operand scan : 0.16 ( 0%) usr 0.03 ( 1%) sys 0.10 ( 0%) wall 21599 kB ( 2%) ggc
 dominator optimization : 3.81 ( 2%) usr 0.01 ( 0%) sys 4.65 ( 2%) wall 27533 kB ( 2%) ggc
 tree SRA : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 tree CCP : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 177 kB ( 0%) ggc
 tree PHI const/copy prop: 0.18 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall 5439 kB ( 0%) ggc
 tree split crit edges : 1.38 ( 1%) usr 0.00 ( 0%) sys 1.36 ( 1%) wall 77179 kB ( 6%) ggc
 tree reassociation : 0.27 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 8 kB ( 0%) ggc
 tree FRE : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 1310 kB ( 0%) ggc
 tree code sinking : 0.32 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall 0 kB ( 0%) ggc
 tree linearize phis : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall 131 kB ( 0%) ggc
 tree backward propagate : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
 tree forward propagate : 2.56 ( 1%) usr 0.00 ( 0%) sys 2.64 ( 1%) wall 288 kB ( 0%) ggc
 tree conservative DCE : 0.80 ( 0%) usr 0.02 ( 1%) sys 0.76 ( 0%) wall 84 kB ( 0%) ggc
 tree aggressive DCE : 0.60 ( 0%) usr 0.02 ( 1%) sys 0.71 ( 0%) wall 2225 kB ( 0%) ggc
 tree DSE : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 8 kB ( 0%) ggc
 tree loop invariant motion: 40.96 (18%) usr 0.27 (11%) sys 41.41 (18%) wall 209802 kB (16%) ggc
 tree canonical iv : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 17 kB ( 0%) ggc
 scev constant prop : 1.40 ( 1%) usr 0.01 ( 0%) sys 1.42 ( 1%) wall 19981 kB ( 2%) ggc
 tree iv optimization : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 296 kB ( 0%) ggc
 tree SSA uncprop : 0.46 ( 0%) usr 0.00 ( 0%) sys 0.45 ( 0%) wall 0 kB ( 0%) ggc
 dominance frontiers : 0.55 ( 0%) usr 0.01 ( 0%) sys 0.54 ( 0%) wall 0 kB ( 0%) ggc
 dominance computation : 5.36 ( 2%) usr 0.01 ( 0%) sys 5.27 ( 2%) wall 0 kB ( 0%) ggc
 out of ssa : 26.58 (11%) usr 0.96 (38%) sys 27.56 (12%) wall 4461 kB ( 0%) ggc
 expand vars : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 999 kB ( 0%) ggc
 expand : 4.32 ( 2%) usr 0.12 ( 5%) sys 4.47 ( 2%) wall 184816 kB (14%) ggc
 post expand cleanups : 0.76 ( 0%) usr 0.00 ( 0%) sys 0.77 ( 0%) wall 337 kB ( 0%) ggc
 forward prop : 2.92 ( 1%) usr 0.01 ( 0%) sys 3.00 ( 1%) wall 14617 kB ( 1%) ggc
 CSE : 1.98 ( 1%) usr 0.03 ( 1%) sys 2.06 ( 1%) wall 16860 kB ( 1%) ggc
 dead code elimination : 0.86 ( 0%) usr 0.00 ( 0%) sys 0.84 ( 0%) wall 0 kB ( 0%) ggc
 dead store elim1 : 2.43 ( 1%) usr 0.00 ( 0%) sys 2.43 ( 1%) wall 11087 kB ( 1%) ggc
 dead store elim2 : 3.04 ( 1%) usr 0.00 ( 0%) sys 3.03 ( 1%) wall 35846 kB ( 3%) ggc
 loop analysis : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
 loop init : 2.44 ( 1%) usr 0.00 ( 0%) sys 2.52 ( 1%) wall 1031 kB ( 0%) ggc
 loop invariant motion : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 224 kB ( 0%) ggc
 loop fini : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 0 kB ( 0%) ggc
 branch prediction : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 268 kB ( 0%) ggc
 combiner : 1.49 ( 1%) usr 0.01 ( 0%) sys 1.47 ( 1%) wall 4746 kB ( 0%) ggc
 if-conversion : 2.70 ( 1%) usr 0.00 ( 0%) sys 2.73 ( 1%) wall 46824 kB ( 4%) ggc
 integrated RA : 9.59 ( 4%) usr 0.03 ( 1%) sys 9.69 ( 4%) wall 164161 kB (13%) ggc
 LRA non-specific : 11.22 ( 5%) usr 0.05 ( 2%) sys 11.20 ( 5%) wall 52521 kB ( 4%) ggc
 LRA virtuals elimination: 1.67 ( 1%) usr 0.05 ( 2%) sys 1.71 ( 1%) wall 30963 kB ( 2%) ggc
 LRA reload inheritance : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 10 kB ( 0%) ggc
 LRA create live ranges : 14.05 ( 6%) usr 0.00 ( 0%) sys 14.07 ( 6%) wall 4517 kB ( 0%) ggc
 LRA hard reg assignment : 0.87 ( 0%) usr 0.00 ( 0%) sys 0.91 ( 0%) wall 0 kB ( 0%) ggc
 LRA coalesce pseudo regs: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
 reload : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
 reload CSE regs : 1.79 ( 1%) usr 0.01 ( 0%) sys 1.87 ( 1%) wall 27472 kB ( 2%) ggc
 thread pro- & epilogue : 0.67 ( 0%) usr 0.00 ( 0%) sys 0.67 ( 0%) wall 521 kB ( 0%) ggc
 if-conversion 2 : 0.42 ( 0%) usr 0.00 ( 0%) sys 0.42 ( 0%) wall 0 kB ( 0%) ggc
 combine stack adjustments: 0.22 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall 0 kB ( 0%) ggc
 hard reg cprop : 0.48 ( 0%) usr 0.04 ( 2%) sys 0.55 ( 0%) wall 3 kB ( 0%) ggc
 scheduling 2 : 4.38 ( 2%) usr 0.03 ( 1%) sys 4.43 ( 2%) wall 4136 kB ( 0%) ggc
 machine dep reorg : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
 reorder blocks : 1.56 ( 1%) usr 0.00 ( 0%) sys 1.57 ( 1%) wall 8368 kB ( 1%) ggc
 shorten branches : 0.49 ( 0%) usr 0.00 ( 0%) sys 0.49 ( 0%) wall 0 kB ( 0%) ggc
 final : 1.40 ( 1%) usr 0.03 ( 1%) sys 1.45 ( 1%) wall 60062 kB ( 5%) ggc
 variable output : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 142 kB ( 0%) ggc
 straight-line strength reduction: 0.33 ( 0%) usr 0.00 ( 0%) sys 0.32 ( 0%) wall 30 kB ( 0%) ggc
 initialize rtl : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 12 kB ( 0%) ggc
 rest of compilation : 2.60 ( 1%) usr 0.02 ( 1%) sys 2.60 ( 1%) wall 621 kB ( 0%) ggc
 remove unused locals : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall 0 kB ( 0%) ggc
 repair loop structures : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
 TOTAL : 231.50 2.50 234.43 1277059 kB
I think we made LIM more powerful and/or changed coalescing. We didn't really address the issue of LIM's lack of a cost model, or of out-of-SSA coalescing being quadratic in its size requirement (but I see RTL passes bumping up to 5GB of memory use as well, REE for example, as also noted elsewhere).
GCC 5 branch is being closed
GCC 6 branch is being closed
Btw, on trunk for the small testcase the main peak memory user is

Bitmaps                                     Leak                Peak        Times               N searches  Search iter  Type
-----------------------------------------------------------------------------------------------------------------------------
...
tree-ssa-live.c:931 (new_tree_live_info)    4089900520: 42.6%   4089900600  102257849: 11.3%    35539       42909        heap
tree-ssa-live.c:932 (new_tree_live_info)    4099840160: 42.7%   4099840200  103153730: 11.4%    326917      98706        heap
-----------------------------------------------------------------------------------------------------------------------------
Total  9592285400  906070505

that's livein/liveout. SSA conflicts are probably similar but harder to decipher from the stats:

tree-ssa-coalesce.c:586 (ssa_conflicts_add_one)  43056: 0.0%  198672  398160: 0.0%  19205  39415  heap

next top is

df-problems.c:4400 (df_md_alloc)  218129480: 2.3%  218146320  5654706: 0.6%  71264  127594  heap
df-problems.c:4401 (df_md_alloc)  218142960: 2.3%  218159920  5640467: 0.6%  71675  127395  heap
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
GCC 8.4.0 has been released, adjusting target milestone.
I'm coming back to this project.

I naively thought "Well, I don't need arc profiling, I'll just set -ftest-coverage without -fprofile-arcs", but it appears that I can't do that: the gcda files are generated by -fprofile-arcs.

It seems to me that test coverage could be implemented simply by instrumenting each basic block, in an algorithm that's linear in the number of basic blocks. Is it possible to do this?

Brad
(In reply to lucier from comment #30)
> I'm coming back to this project.
>
> I naively thought "Well, I don't need arc profiling, I'll just set
> -ftest-coverage without -fprofile-arcs" but it appears that I can't do that,
> the gcda files are generated by -fprofile-arcs.
>
> It seems to me that test coverage could be implemented simply by
> instrumenting each basic block in an algorithm that's linear in the number
> of basic blocks. Is it possible to do this?
>
> Brad

I don't think the instrumentation itself is the problem - it's already doing better than one counter per block. It's simply that the large source runs into multiple non-linearities in core pieces of the compiler that cannot be turned off ...
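The "better than one counter per block" remark can be illustrated with a small sketch. The GCC manual describes -fprofile-arcs as building a spanning tree of each function's flow graph and instrumenting only the arcs that are not on that tree. The code below is a simplified model of that idea, not GCC's implementation: the diamond-shaped CFG, the block names, the virtual exit-to-entry arc and the function name `arcs_needing_counters` are all assumptions made for illustration.

```python
def arcs_needing_counters(blocks, arcs):
    """Return the arcs that are left off a spanning tree of the flow graph.

    Only these arcs need a counter in this model; the execution counts of
    the tree arcs can be recovered afterwards from flow conservation.
    Uses union-find, so the work is near-linear in blocks plus arcs.
    """
    parent = {b: b for b in blocks}

    def find(b):
        # Find the component representative, halving the path as we go.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    instrumented = []
    for src, dst in arcs:
        root_src, root_dst = find(src), find(dst)
        if root_src == root_dst:
            # The arc closes a cycle, so it is not a tree arc.
            instrumented.append((src, dst))
        else:
            # The arc joins two components, so it becomes a tree arc.
            parent[root_src] = root_dst
    return instrumented


# Hypothetical CFG for "if (c) B else C; D".
blocks = ["entry", "A", "B", "C", "D", "exit"]
arcs = [
    ("entry", "A"), ("A", "B"), ("A", "C"),
    ("B", "D"), ("C", "D"), ("D", "exit"),
    ("exit", "entry"),  # virtual arc (assumption) that closes the graph
]

counters = arcs_needing_counters(blocks, arcs)
print(len(counters), "arc counters for", len(blocks), "blocks")
```

In this toy graph two arcs need counters, against four real blocks (six including entry and exit), which is consistent with the point that the instrumentation itself is cheap and the cost lies in later passes.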
I don't know precisely what you're saying, but it compiles fine without the instrumentation.
On Tue, 29 Sep 2020, lucier at math dot purdue.edu wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
>
> --- Comment #32 from lucier at math dot purdue.edu ---
> I don't know precisely what you're saying, but it compiles fine without the
> instrumentation.

Yes - the instrumentation does complicate the IL but the instrumentation should be already better than linear in the blocks.
I decided to approach this a bit more methodically by generating a series of synthetic programs, each twice as long as the previous, and measuring the compilation time. I'll attach the associated .i files here.

Each .i file was generated from a Scheme file with 2^k copies, k=1,...,5, of a simple recursive definition of the Fibonacci function, suitably renamed. So these are not large files by my standards.

The short summary is that CPU time seems to grow quadratically with the length of the code. The required memory grows very quickly, too; I killed the compilation with k=5 (so 32 copies of the Fibonacci function) because the computation filled 32GB of RAM and 32GB of swap.

Perhaps these parameterized input files might be of help.

Brad

I downloaded the git sources for gcc:

heine:~/programs/gcc/gcc-mainline> git log
commit 7eef9a66018e23677058fec421229e3fa435a1a3 (HEAD -> master, origin/master, origin/HEAD)
Author: Joel Brobecker <brobecker@adacore.com>
Date: Mon Mar 8 23:59:37 2021 -0300

I configured and built gcc with

heine:~/programs/gcc/gcc-mainline> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-pc-linux-gnu/11.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../../gcc-mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.1 20210309 (experimental) (GCC)

The program names are fib-1.c to fib-5.c; fib-k.c contains 2^k copies of fibonacci.
/pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY -O1 -Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer -fPIC -fno-common -mpc64 -rdynamic -shared -D___SINGLE_HOST -D___DYNAMIC -I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-1.o1' -Q -fprofile-arcs -ftest-coverage -save-temps 'fib-1.c' Time variable usr sys wall GGC phase setup : 0.02 (100%) 0.00 ( 0%) 0.03 (100%) 5039k (100%) TOTAL : 0.02 0.00 0.03 5049k btowc wctob mbrlen ___H_fib_2d_1 ___setup_mod ___init_mod ___LNK_fib_2d_1_2e_o1 Analyzing compilation unit Performing interprocedural optimizations <*free_lang_data> {heap 1240k} <visibility> {heap 1240k} <build_ssa_passes> {heap 1240k} <opt_local_passes> {heap 1240k} <remove_symbols> {heap 2468k} <targetclone> {heap 2468k} <profile> {heap 2468k} <free-fnsummary> {heap 2468k}Streaming LTO <whole-program> {heap 2468k} <profile_estimate> {heap 2468k} <fnsummary> {heap 2468k} <inline> {heap 2468k} <pure-const> {heap 2468k} <modref> {heap 2468k} <free-fnsummary> {heap 2468k} <static-var> {heap 2468k} <single-use> {heap 2468k} <comdats> {heap 2468k}Assembling functions: <simdclone> {heap 2468k} ___setup_mod ___init_mod ___H_fib_2d_1 ___LNK_fib_2d_1_2e_o1 _sub_I_00100_0 _sub_D_00100_1 Time variable usr sys wall GGC phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 1%) 1519k ( 6%) phase parsing : 0.06 ( 8%) 0.01 ( 20%) 0.08 ( 10%) 2072k ( 8%) phase opt and generate : 0.67 ( 92%) 0.04 ( 80%) 0.70 ( 89%) 22M ( 86%) dump files : 0.01 ( 1%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%) callgraph functions expansion : 0.66 ( 90%) 0.03 ( 60%) 0.69 ( 87%) 21M ( 82%) callgraph ipa passes : 0.01 ( 1%) 0.00 ( 0%) 0.01 ( 1%) 570k ( 2%) cfg cleanup : 0.00 ( 0%) 0.00 ( 0%) 0.04 ( 5%) 64 ( 0%) trivially dead code : 0.00 ( 0%) 0.01 ( 20%) 0.00 ( 0%) 0 ( 0%) df live regs : 0.01 ( 1%) 0.00 ( 0%) 0.02 ( 3%) 0 ( 0%) df live&initialized regs : 0.02 ( 3%) 0.00 ( 0%) 
0.02 ( 3%) 0 ( 0%) df reg dead/unused notes : 0.02 ( 3%) 0.00 ( 0%) 0.01 ( 1%) 305k ( 1%) alias analysis : 0.01 ( 1%) 0.00 ( 0%) 0.01 ( 1%) 1482k ( 6%) alias stmt walking : 0.02 ( 3%) 0.01 ( 20%) 0.02 ( 3%) 7280 ( 0%) rebuild jump labels : 0.01 ( 1%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%) preprocessing : 0.02 ( 3%) 0.00 ( 0%) 0.01 ( 1%) 240k ( 1%) lexical analysis : 0.02 ( 3%) 0.01 ( 20%) 0.00 ( 0%) 0 ( 0%) parser (global) : 0.01 ( 1%) 0.00 ( 0%) 0.04 ( 5%) 1239k ( 5%) parser struct body : 0.01 ( 1%) 0.00 ( 0%) 0.01 ( 1%) 359k ( 1%) parser function body : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 3%) 201k ( 1%) tree gimplify : 0.00 ( 0%) 0.01 ( 20%) 0.00 ( 0%) 297k ( 1%) tree copy propagation : 0.01 ( 1%) 0.00 ( 0%) 0.01 ( 1%) 13k ( 0%) tree SSA rewrite : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 1%) 356k ( 1%) tree SSA incremental : 0.01 ( 1%) 0.00 ( 0%) 0.00 ( 0%) 2918k ( 11%) tree operand scan : 0.01 ( 1%) 0.00 ( 0%) 0.00 ( 0%) 314k ( 1%) dominator optimization : 0.03 ( 4%) 0.01 ( 20%) 0.04 ( 5%) 531k ( 2%) tree FRE : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 1%) 36k ( 0%) tree forward propagate : 0.02 ( 3%) 0.00 ( 0%) 0.00 ( 0%) 34k ( 0%) tree conservative DCE : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 1%) 6224 ( 0%) tree DSE : 0.03 ( 4%) 0.00 ( 0%) 0.04 ( 5%) 0 ( 0%) tree loop invariant motion : 0.01 ( 1%) 0.00 ( 0%) 0.03 ( 4%) 2496k ( 9%) tree strlen optimization : 0.01 ( 1%) 0.00 ( 0%) 0.01 ( 1%) 83k ( 0%) dominance computation : 0.02 ( 3%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%) out of ssa : 0.03 ( 4%) 0.00 ( 0%) 0.02 ( 3%) 64k ( 0%) expand : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 1%) 2473k ( 9%) forward prop : 0.02 ( 3%) 0.00 ( 0%) 0.02 ( 3%) 81k ( 0%) CSE : 0.01 ( 1%) 0.00 ( 0%) 0.00 ( 0%) 211k ( 1%) dead store elim2 : 0.01 ( 1%) 0.00 ( 0%) 0.02 ( 3%) 701k ( 3%) loop init : 0.01 ( 1%) 0.00 ( 0%) 0.00 ( 0%) 29k ( 0%) loop fini : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 1%) 116k ( 0%) combiner : 0.01 ( 1%) 0.00 ( 0%) 0.01 ( 1%) 108k ( 0%) if-conversion : 0.02 ( 3%) 0.00 ( 0%) 0.00 ( 0%) 666k ( 3%) integrated RA : 0.06 ( 8%) 0.00 ( 0%) 0.05 ( 6%) 
3986k ( 15%) LRA non-specific : 0.05 ( 7%) 0.00 ( 0%) 0.06 ( 8%) 1324k ( 5%) LRA reload inheritance : 0.01 ( 1%) 0.00 ( 0%) 0.01 ( 1%) 224 ( 0%) LRA create live ranges : 0.09 ( 12%) 0.00 ( 0%) 0.08 ( 10%) 241k ( 1%) LRA hard reg assignment : 0.02 ( 3%) 0.00 ( 0%) 0.02 ( 3%) 0 ( 0%) reload CSE regs : 0.02 ( 3%) 0.00 ( 0%) 0.02 ( 3%) 368k ( 1%) thread pro- & epilogue : 0.01 ( 1%) 0.00 ( 0%) 0.00 ( 0%) 10k ( 0%) hard reg cprop : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 1%) 288 ( 0%) scheduling 2 : 0.04 ( 5%) 0.00 ( 0%) 0.04 ( 5%) 149k ( 1%) shorten branches : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 1%) 0 ( 0%) final : 0.01 ( 1%) 0.00 ( 0%) 0.00 ( 0%) 816k ( 3%) initialize rtl : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 1%) 12k ( 0%) rest of compilation : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 3%) 66k ( 0%) TOTAL : 0.73 0.05 0.79 25M /pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY -O1 -Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer -fPIC -fno-common -mpc64 -rdynamic -shared -D___SINGLE_HOST -D___DYNAMIC -I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-2.o1' -Q -fprofile-arcs -ftest-coverage -save-temps 'fib-2.c' Time variable usr sys wall GGC phase setup : 0.01 (100%) 0.02 (100%) 0.04 (100%) 7596k (100%) TOTAL : 0.01 0.02 0.04 7606k btowc wctob mbrlen ___H_fib_2d_2 ___setup_mod ___init_mod ___LNK_fib_2d_2_2e_o1 Analyzing compilation unit Performing interprocedural optimizations <*free_lang_data> {heap 1432k} <visibility> {heap 1432k} <build_ssa_passes> {heap 1432k} <opt_local_passes> {heap 1432k} <remove_symbols> {heap 3104k} <targetclone> {heap 3104k} <profile> {heap 3104k} <free-fnsummary> {heap 3104k}Streaming LTO <whole-program> {heap 3104k} <profile_estimate> {heap 3104k} <fnsummary> {heap 3104k} <inline> {heap 3104k} <pure-const> {heap 3104k} <modref> {heap 3104k} <free-fnsummary> {heap 3104k} <static-var> {heap 3104k} <single-use> {heap 3104k} <comdats> 
{heap 3104k}Assembling functions: <simdclone> {heap 3104k} ___setup_mod ___init_mod ___H_fib_2d_2 ___LNK_fib_2d_2_2e_o1 _sub_I_00100_0 _sub_D_00100_1 Time variable usr sys wall GGC phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1519k ( 2%) phase parsing : 0.04 ( 1%) 0.05 ( 36%) 0.10 ( 3%) 2500k ( 4%) phase opt and generate : 2.78 ( 99%) 0.09 ( 64%) 2.88 ( 97%) 62M ( 94%) callgraph construction : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 26k ( 0%) callgraph functions expansion : 2.75 ( 98%) 0.09 ( 64%) 2.85 ( 96%) 61M ( 92%) callgraph ipa passes : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%) 939k ( 1%) ipa pure const : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) cfg cleanup : 0.04 ( 1%) 0.00 ( 0%) 0.04 ( 1%) 64 ( 0%) trivially dead code : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) df scan insns : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 288 ( 0%) df reaching defs : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) df live regs : 0.07 ( 2%) 0.00 ( 0%) 0.10 ( 3%) 0 ( 0%) df live&initialized regs : 0.08 ( 3%) 0.00 ( 0%) 0.07 ( 2%) 0 ( 0%) df reg dead/unused notes : 0.05 ( 2%) 0.01 ( 7%) 0.06 ( 2%) 935k ( 1%) register information : 0.04 ( 1%) 0.00 ( 0%) 0.03 ( 1%) 0 ( 0%) alias analysis : 0.02 ( 1%) 0.00 ( 0%) 0.00 ( 0%) 2960k ( 4%) alias stmt walking : 0.13 ( 5%) 0.02 ( 14%) 0.10 ( 3%) 7472 ( 0%) rebuild jump labels : 0.01 ( 0%) 0.00 ( 0%) 0.03 ( 1%) 0 ( 0%) preprocessing : 0.00 ( 0%) 0.03 ( 21%) 0.03 ( 1%) 250k ( 0%) lexical analysis : 0.02 ( 1%) 0.02 ( 14%) 0.06 ( 2%) 0 ( 0%) parser (global) : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1252k ( 2%) parser struct body : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 359k ( 1%) parser function body : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 608k ( 1%) inline parameters : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 39k ( 0%) tree gimplify : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 505k ( 1%) tree CFG cleanup : 0.02 ( 1%) 0.01 ( 7%) 0.02 ( 1%) 320k ( 0%) tree copy propagation : 0.04 ( 1%) 0.00 ( 0%) 0.05 ( 2%) 24k ( 0%) tree PTA : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 13k ( 0%) tree SSA rewrite : 0.02 ( 1%) 0.00 ( 0%) 
0.02 ( 1%) 605k ( 1%) tree SSA incremental : 0.05 ( 2%) 0.00 ( 0%) 0.06 ( 2%) 9895k ( 14%) tree operand scan : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 882k ( 1%) dominator optimization : 0.13 ( 5%) 0.00 ( 0%) 0.16 ( 5%) 1261k ( 2%) tree split crit edges : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1410k ( 2%) tree reassociation : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 48 ( 0%) tree code sinking : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1680k ( 2%) tree forward propagate : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 1%) 63k ( 0%) tree conservative DCE : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%) 8288 ( 0%) tree aggressive DCE : 0.03 ( 1%) 0.00 ( 0%) 0.02 ( 1%) 40 ( 0%) tree DSE : 0.11 ( 4%) 0.00 ( 0%) 0.12 ( 4%) 0 ( 0%) tree loop invariant motion : 0.09 ( 3%) 0.01 ( 7%) 0.09 ( 3%) 7961k ( 12%) tree iv optimization : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 22k ( 0%) tree SSA uncprop : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) tree strlen optimization : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%) 149k ( 0%) tree modref : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 2800 ( 0%) dominance computation : 0.02 ( 1%) 0.00 ( 0%) 0.05 ( 2%) 0 ( 0%) out of ssa : 0.11 ( 4%) 0.01 ( 7%) 0.13 ( 4%) 752 ( 0%) expand : 0.03 ( 1%) 0.00 ( 0%) 0.02 ( 1%) 7567k ( 11%) post expand cleanups : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 49k ( 0%) varconst : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1024 ( 0%) forward prop : 0.09 ( 3%) 0.00 ( 0%) 0.09 ( 3%) 255k ( 0%) CSE : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%) 659k ( 1%) dead code elimination : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) dead store elim1 : 0.02 ( 1%) 0.00 ( 0%) 0.03 ( 1%) 467k ( 1%) dead store elim2 : 0.04 ( 1%) 0.00 ( 0%) 0.03 ( 1%) 2157k ( 3%) loop init : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 36k ( 0%) loop fini : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 352k ( 1%) combiner : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%) 260k ( 0%) if-conversion : 0.03 ( 1%) 0.00 ( 0%) 0.04 ( 1%) 2511k ( 4%) integrated RA : 0.21 ( 7%) 0.01 ( 7%) 0.22 ( 7%) 9272k ( 14%) LRA non-specific : 0.18 ( 6%) 0.01 ( 7%) 0.16 ( 5%) 4240k ( 6%) LRA virtuals elimination : 0.03 ( 1%) 0.00 ( 
0%) 0.02 ( 1%) 1264k ( 2%) LRA reload inheritance : 0.04 ( 1%) 0.00 ( 0%) 0.04 ( 1%) 0 ( 0%) LRA create live ranges : 0.41 ( 15%) 0.00 ( 0%) 0.44 ( 15%) 757k ( 1%) LRA hard reg assignment : 0.08 ( 3%) 0.01 ( 7%) 0.09 ( 3%) 0 ( 0%) reload CSE regs : 0.05 ( 2%) 0.00 ( 0%) 0.05 ( 2%) 1113k ( 2%) thread pro- & epilogue : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%) 10k ( 0%) if-conversion 2 : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) combine stack adjustments : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) hard reg cprop : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%) 432 ( 0%) scheduling 2 : 0.11 ( 4%) 0.00 ( 0%) 0.12 ( 4%) 457k ( 1%) reorder blocks : 0.02 ( 1%) 0.00 ( 0%) 0.01 ( 0%) 370k ( 1%) shorten branches : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) final : 0.03 ( 1%) 0.00 ( 0%) 0.03 ( 1%) 2482k ( 4%) straight-line strength reduction : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 4440 ( 0%) rest of compilation : 0.08 ( 3%) 0.00 ( 0%) 0.03 ( 1%) 179k ( 0%) remove unused locals : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) repair loop structures : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%) TOTAL : 2.82 0.14 2.98 66M /pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY -O1 -Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer -fPIC -fno-common -mpc64 -rdynamic -shared -D___SINGLE_HOST -D___DYNAMIC -I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-3.o1' -Q -fprofile-arcs -ftest-coverage -save-temps 'fib-3.c' Time variable usr sys wall GGC phase setup : 0.04 (100%) 0.00 ( 0%) 0.04 (100%) 8613k (100%) TOTAL : 0.04 0.00 0.04 8624k btowc wctob mbrlen ___H_fib_2d_3 ___setup_mod ___init_mod ___LNK_fib_2d_3_2e_o1 Analyzing compilation unit Performing interprocedural optimizations <*free_lang_data> {heap 1436k} <visibility> {heap 1436k} <build_ssa_passes> {heap 1436k} <opt_local_passes> {heap 1436k} <remove_symbols> {heap 3060k} <targetclone> {heap 3060k} <profile> {heap 3060k} 
<free-fnsummary> {heap 3060k}Streaming LTO <whole-program> {heap 3060k} <profile_estimate> {heap 3060k} <fnsummary> {heap 3060k} <inline> {heap 3060k} <pure-const> {heap 3060k} <modref> {heap 3060k} <free-fnsummary> {heap 3060k} <static-var> {heap 3060k} <single-use> {heap 3060k} <comdats> {heap 3060k}Assembling functions: <simdclone> {heap 3060k} ___setup_mod ___init_mod ___H_fib_2d_3 ___LNK_fib_2d_3_2e_o1 _sub_I_00100_0 _sub_D_00100_1 Time variable usr sys wall GGC phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1519k ( 1%) phase parsing : 0.09 ( 1%) 0.05 ( 11%) 0.14 ( 1%) 2845k ( 1%) phase opt and generate : 13.80 ( 99%) 0.42 ( 89%) 14.22 ( 99%) 220M ( 98%) callgraph functions expansion : 13.76 ( 99%) 0.42 ( 89%) 14.17 ( 99%) 216M ( 97%) callgraph ipa passes : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 1687k ( 1%) ipa function summary : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 176k ( 0%) ipa profile : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 300k ( 0%) ipa pure const : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) cfg construction : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 82k ( 0%) cfg cleanup : 0.20 ( 1%) 0.01 ( 2%) 0.19 ( 1%) 64 ( 0%) trivially dead code : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 ( 0%) df scan insns : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 288 ( 0%) df reaching defs : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%) df live regs : 0.37 ( 3%) 0.00 ( 0%) 0.40 ( 3%) 0 ( 0%) df live&initialized regs : 0.37 ( 3%) 0.01 ( 2%) 0.38 ( 3%) 0 ( 0%) df reg dead/unused notes : 0.17 ( 1%) 0.01 ( 2%) 0.18 ( 1%) 3229k ( 1%) register information : 0.15 ( 1%) 0.00 ( 0%) 0.17 ( 1%) 0 ( 0%) alias analysis : 0.07 ( 1%) 0.00 ( 0%) 0.05 ( 0%) 11M ( 5%) alias stmt walking : 1.02 ( 7%) 0.02 ( 4%) 0.93 ( 6%) 7856 ( 0%) rebuild jump labels : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%) preprocessing : 0.03 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 268k ( 0%) lexical analysis : 0.04 ( 0%) 0.02 ( 4%) 0.03 ( 0%) 0 ( 0%) parser (global) : 0.00 ( 0%) 0.01 ( 2%) 0.03 ( 0%) 1275k ( 1%) parser struct body : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 359k ( 0%) 
parser function body : 0.01 ( 0%) 0.02 ( 4%) 0.04 ( 0%) 911k ( 0%) tree gimplify : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 937k ( 0%) tree CFG cleanup : 0.11 ( 1%) 0.00 ( 0%) 0.14 ( 1%) 1373k ( 1%) tree copy propagation : 0.17 ( 1%) 0.00 ( 0%) 0.17 ( 1%) 48k ( 0%) tree PTA : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 23k ( 0%) tree SSA rewrite : 0.13 ( 1%) 0.00 ( 0%) 0.13 ( 1%) 1877k ( 1%) tree SSA other : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 952 ( 0%) tree SSA incremental : 0.24 ( 2%) 0.01 ( 2%) 0.24 ( 2%) 34M ( 15%) tree operand scan : 0.01 ( 0%) 0.02 ( 4%) 0.03 ( 0%) 2882k ( 1%) dominator optimization : 0.43 ( 3%) 0.01 ( 2%) 0.58 ( 4%) 4002k ( 2%) tree CCP : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 47k ( 0%) tree split crit edges : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 5019k ( 2%) tree reassociation : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 48 ( 0%) tree FRE : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 110k ( 0%) tree code sinking : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 6070k ( 3%) tree linearize phis : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 6432 ( 0%) tree forward propagate : 0.20 ( 1%) 0.02 ( 4%) 0.21 ( 1%) 119k ( 0%) tree conservative DCE : 0.06 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 16k ( 0%) tree aggressive DCE : 0.08 ( 1%) 0.00 ( 0%) 0.07 ( 0%) 40 ( 0%) tree DSE : 0.47 ( 3%) 0.00 ( 0%) 0.47 ( 3%) 0 ( 0%) tree loop invariant motion : 0.61 ( 4%) 0.04 ( 9%) 0.65 ( 5%) 27M ( 12%) complete unrolling : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 544 ( 0%) tree iv optimization : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 47k ( 0%) tree SSA uncprop : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 ( 0%) tree strlen optimization : 0.09 ( 1%) 0.00 ( 0%) 0.10 ( 1%) 281k ( 0%) tree modref : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 2800 ( 0%) dominance computation : 0.16 ( 1%) 0.00 ( 0%) 0.14 ( 1%) 0 ( 0%) out of ssa : 0.72 ( 5%) 0.12 ( 26%) 0.85 ( 6%) 512k ( 0%) expand : 0.10 ( 1%) 0.02 ( 4%) 0.11 ( 1%) 25M ( 11%) post expand cleanups : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 89k ( 0%) forward prop : 0.35 ( 3%) 0.01 ( 2%) 0.35 ( 2%) 888k ( 0%) CSE : 0.10 ( 1%) 0.00 ( 0%) 0.11 ( 1%) 2302k 
( 1%) dead code elimination : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 ( 0%) dead store elim1 : 0.08 ( 1%) 0.00 ( 0%) 0.09 ( 1%) 1532k ( 1%) dead store elim2 : 0.13 ( 1%) 0.00 ( 0%) 0.14 ( 1%) 7464k ( 3%) loop init : 0.08 ( 1%) 0.00 ( 0%) 0.11 ( 1%) 50k ( 0%) loop invariant motion : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 58k ( 0%) loop fini : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 928k ( 0%) combiner : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 736k ( 0%) if-conversion : 0.10 ( 1%) 0.00 ( 0%) 0.09 ( 1%) 9292k ( 4%) integrated RA : 1.16 ( 8%) 0.01 ( 2%) 1.15 ( 8%) 37M ( 17%) LRA non-specific : 0.93 ( 7%) 0.01 ( 2%) 0.95 ( 7%) 10M ( 5%) LRA virtuals elimination : 0.06 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 4366k ( 2%) LRA reload inheritance : 0.23 ( 2%) 0.00 ( 0%) 0.23 ( 2%) 0 ( 0%) LRA create live ranges : 2.41 ( 17%) 0.00 ( 0%) 2.41 ( 17%) 2648k ( 1%) LRA hard reg assignment : 0.78 ( 6%) 0.02 ( 4%) 0.78 ( 5%) 0 ( 0%) reload : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 144 ( 0%) reload CSE regs : 0.16 ( 1%) 0.01 ( 2%) 0.16 ( 1%) 3807k ( 2%) thread pro- & epilogue : 0.06 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 10k ( 0%) if-conversion 2 : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) combine stack adjustments : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%) hard reg cprop : 0.07 ( 1%) 0.02 ( 4%) 0.08 ( 1%) 720 ( 0%) scheduling 2 : 0.36 ( 3%) 0.01 ( 2%) 0.35 ( 2%) 1590k ( 1%) machine dep reorg : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%) reorder blocks : 0.06 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 1180k ( 1%) shorten branches : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 ( 0%) final : 0.07 ( 1%) 0.01 ( 2%) 0.08 ( 1%) 8569k ( 4%) straight-line strength reduction : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 8232 ( 0%) rest of compilation : 0.13 ( 1%) 0.03 ( 6%) 0.18 ( 1%) 342k ( 0%) remove unused locals : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%) address taken : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) TOTAL : 13.89 0.47 14.36 224M /pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY -O1 -Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv 
-fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer -fPIC -fno-common -mpc64 -rdynamic -shared -D___SINGLE_HOST -D___DYNAMIC -I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-4.o1' -Q -fprofile-arcs -ftest-coverage -save-temps 'fib-4.c' Time variable usr sys wall GGC phase setup : 0.05 (100%) 0.00 ( 0%) 0.06 (100%) 10M (100%) TOTAL : 0.05 0.00 0.06 10M btowc wctob mbrlen ___H_fib_2d_4 ___setup_mod ___init_mod ___LNK_fib_2d_4_2e_o1 Analyzing compilation unit Performing interprocedural optimizations <*free_lang_data> {heap 1652k} <visibility> {heap 1652k} <build_ssa_passes> {heap 1652k} <opt_local_passes> {heap 1652k} <remove_symbols> {heap 4168k} <targetclone> {heap 4168k} <profile> {heap 4168k} <free-fnsummary> {heap 4168k}Streaming LTO <whole-program> {heap 4168k} <profile_estimate> {heap 4168k} <fnsummary> {heap 4168k} <inline> {heap 4168k} <pure-const> {heap 4168k} <modref> {heap 4168k} <free-fnsummary> {heap 4168k} <static-var> {heap 4168k} <single-use> {heap 4168k} <comdats> {heap 4168k}Assembling functions: <simdclone> {heap 4168k} ___setup_mod ___init_mod ___H_fib_2d_4 {GC madv_dontneed 556k} {GC 264M -> 260M} {GC madv_dontneed 116k} {GC 526M -> 302M} ___LNK_fib_2d_4_2e_o1 _sub_I_00100_0 _sub_D_00100_1 Time variable usr sys wall GGC phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1519k ( 0%) phase parsing : 0.16 ( 0%) 0.08 ( 3%) 0.23 ( 0%) 4049k ( 1%) phase lang. 
deferred : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 96 ( 0%) phase opt and generate : 55.79 (100%) 2.22 ( 97%) 58.03 (100%) 712M ( 99%) garbage collection : 0.38 ( 1%) 0.00 ( 0%) 0.38 ( 1%) 0 ( 0%) dump files : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) callgraph construction : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1108k ( 0%) callgraph optimization : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 19k ( 0%) callgraph functions expansion : 55.71 (100%) 2.21 ( 96%) 57.94 ( 99%) 706M ( 98%) callgraph ipa passes : 0.07 ( 0%) 0.01 ( 0%) 0.09 ( 0%) 3221k ( 0%) ipa function summary : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 335k ( 0%) ipa inlining heuristics : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 16 ( 0%) ipa profile : 0.00 ( 0%) 0.01 ( 0%) 0.01 ( 0%) 605k ( 0%) ipa pure const : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%) cfg construction : 0.06 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 159k ( 0%) cfg cleanup : 0.68 ( 1%) 0.02 ( 1%) 0.69 ( 1%) 48 ( 0%) trivially dead code : 0.11 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 0 ( 0%) df scan insns : 0.09 ( 0%) 0.01 ( 0%) 0.11 ( 0%) 288 ( 0%) df live regs : 1.30 ( 2%) 0.04 ( 2%) 1.36 ( 2%) 0 ( 0%) df live&initialized regs : 1.52 ( 3%) 0.03 ( 1%) 1.56 ( 3%) 0 ( 0%) df reg dead/unused notes : 0.52 ( 1%) 0.01 ( 0%) 0.54 ( 1%) 11M ( 2%) register information : 0.34 ( 1%) 0.00 ( 0%) 0.34 ( 1%) 0 ( 0%) alias analysis : 0.20 ( 0%) 0.00 ( 0%) 0.20 ( 0%) 26M ( 4%) alias stmt walking : 7.31 ( 13%) 0.11 ( 5%) 7.32 ( 13%) 8624 ( 0%) register scan : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 9008 ( 0%) rebuild jump labels : 0.07 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 ( 0%) preprocessing : 0.02 ( 0%) 0.02 ( 1%) 0.07 ( 0%) 306k ( 0%) lexical analysis : 0.06 ( 0%) 0.03 ( 1%) 0.10 ( 0%) 0 ( 0%) parser (global) : 0.03 ( 0%) 0.02 ( 1%) 0.02 ( 0%) 1323k ( 0%) parser function body : 0.05 ( 0%) 0.01 ( 0%) 0.05 ( 0%) 2029k ( 0%) inline parameters : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 131k ( 0%) tree gimplify : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1802k ( 0%) tree CFG construction : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 578k ( 0%) tree CFG cleanup : 0.41 
( 1%) 0.00 ( 0%) 0.42 ( 1%) 5686k ( 1%)
tree copy propagation : 0.68 ( 1%) 0.00 ( 0%) 0.67 ( 1%) 96k ( 0%)
tree PTA : 0.01 ( 0%) 0.01 ( 0%) 0.02 ( 0%) 43k ( 0%)
tree PHI insertion : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 866k ( 0%)
tree SSA rewrite : 0.57 ( 1%) 0.00 ( 0%) 0.57 ( 1%) 10M ( 1%)
tree SSA incremental : 1.15 ( 2%) 0.05 ( 2%) 1.20 ( 2%) 118M ( 16%)
tree operand scan : 0.10 ( 0%) 0.06 ( 3%) 0.25 ( 0%) 10M ( 1%)
dominator optimization : 3.64 ( 7%) 0.04 ( 2%) 3.82 ( 7%) 13M ( 2%)
tree CCP : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 94k ( 0%)
tree split crit edges : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 18M ( 3%)
tree reassociation : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 48 ( 0%)
tree FRE : 0.01 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 208k ( 0%)
tree code sinking : 0.07 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 18M ( 3%)
tree linearize phis : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 6432 ( 0%)
tree backward propagate : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
tree forward propagate : 1.65 ( 3%) 0.01 ( 0%) 1.66 ( 3%) 232k ( 0%)
tree conservative DCE : 0.29 ( 1%) 0.00 ( 0%) 0.29 ( 0%) 31k ( 0%)
tree aggressive DCE : 0.30 ( 1%) 0.00 ( 0%) 0.24 ( 0%) 40 ( 0%)
tree DSE : 1.88 ( 3%) 0.00 ( 0%) 1.89 ( 3%) 0 ( 0%)
tree loop invariant motion : 5.00 ( 9%) 0.15 ( 7%) 5.10 ( 9%) 103M ( 14%)
tree iv optimization : 0.01 ( 0%) 0.01 ( 0%) 0.02 ( 0%) 95k ( 0%)
tree SSA uncprop : 0.13 ( 0%) 0.00 ( 0%) 0.15 ( 0%) 0 ( 0%)
tree strlen optimization : 0.62 ( 1%) 0.00 ( 0%) 0.62 ( 1%) 547k ( 0%)
tree modref : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 2800 ( 0%)
dominance frontiers : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 ( 0%)
dominance computation : 0.58 ( 1%) 0.02 ( 1%) 0.59 ( 1%) 0 ( 0%)
out of ssa : 5.62 ( 10%) 1.11 ( 48%) 6.73 ( 12%) 2049k ( 0%)
expand vars : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 407k ( 0%)
expand : 0.39 ( 1%) 0.01 ( 0%) 0.42 ( 1%) 92M ( 13%)
post expand cleanups : 0.12 ( 0%) 0.00 ( 0%) 0.13 ( 0%) 169k ( 0%)
lower subreg : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
forward prop : 1.25 ( 2%) 0.05 ( 2%) 1.29 ( 2%) 3301k ( 0%)
CSE : 0.28 ( 1%) 0.00 ( 0%) 0.27 ( 0%) 8571k ( 1%)
dead code elimination : 0.08 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 0 ( 0%)
dead store elim1 : 0.32 ( 1%) 0.00 ( 0%) 0.32 ( 1%) 5493k ( 1%)
dead store elim2 : 0.41 ( 1%) 0.00 ( 0%) 0.43 ( 1%) 23M ( 3%)
loop analysis : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%)
loop init : 0.20 ( 0%) 0.00 ( 0%) 0.21 ( 0%) 62k ( 0%)
loop fini : 0.07 ( 0%) 0.02 ( 1%) 0.10 ( 0%) 3776k ( 1%)
combiner : 0.22 ( 0%) 0.00 ( 0%) 0.22 ( 0%) 2378k ( 0%)
if-conversion : 0.38 ( 1%) 0.01 ( 0%) 0.37 ( 1%) 36M ( 5%)
integrated RA : 5.43 ( 10%) 0.02 ( 1%) 5.44 ( 9%) 96M ( 13%)
LRA non-specific : 3.61 ( 6%) 0.01 ( 0%) 3.64 ( 6%) 21M ( 3%)
LRA virtuals elimination : 0.18 ( 0%) 0.01 ( 0%) 0.16 ( 0%) 15M ( 2%)
LRA create live ranges : 3.08 ( 6%) 0.01 ( 0%) 3.09 ( 5%) 2027k ( 0%)
LRA hard reg assignment : 0.07 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 0 ( 0%)
reload : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 144 ( 0%)
reload CSE regs : 0.51 ( 1%) 0.00 ( 0%) 0.51 ( 1%) 13M ( 2%)
thread pro- & epilogue : 0.10 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 9680 ( 0%)
if-conversion 2 : 0.05 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 24 ( 0%)
combine stack adjustments : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 ( 0%)
hard reg cprop : 0.21 ( 0%) 0.10 ( 4%) 0.31 ( 1%) 3288 ( 0%)
scheduling 2 : 1.36 ( 2%) 0.04 ( 2%) 1.38 ( 2%) 5904k ( 1%)
machine dep reorg : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%)
reorder blocks : 0.19 ( 0%) 0.00 ( 0%) 0.23 ( 0%) 4176k ( 1%)
shorten branches : 0.14 ( 0%) 0.00 ( 0%) 0.14 ( 0%) 0 ( 0%)
final : 0.27 ( 0%) 0.01 ( 0%) 0.29 ( 0%) 31M ( 4%)
straight-line strength reduction : 0.10 ( 0%) 0.00 ( 0%) 0.10 ( 0%) 33k ( 0%)
rest of compilation : 0.93 ( 2%) 0.24 ( 10%) 1.15 ( 2%) 1158k ( 0%)
remove unused locals : 0.07 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 0 ( 0%)
address taken : 0.09 ( 0%) 0.00 ( 0%) 0.09 ( 0%) 0 ( 0%)
repair loop structures : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
TOTAL : 55.95 2.30 58.28 718M

heine:~/programs/gambit/gambit-profiled> /pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY -O1 -Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer -fPIC -fno-common -mpc64 -rdynamic -shared -D___SINGLE_HOST -D___DYNAMIC -I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-5.o1' -Q -fprofile-arcs -ftest-coverage -save-temps 'fib-5.c'

Time variable usr sys wall GGC
phase setup : 0.08 (100%) 0.02 (100%) 0.13 ( 93%) 22M (100%)
TOTAL : 0.08 0.02 0.14 22M
btowc wctob mbrlen ___H_fib_2d_5 ___setup_mod ___init_mod ___LNK_fib_2d_5_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
<*free_lang_data> {heap 2884k} <visibility> {heap 2884k} <build_ssa_passes> {heap 2884k} <opt_local_passes> {heap 3032k} <remove_symbols> {heap 7436k} <targetclone> {heap 7436k} <profile> {heap 7436k} <free-fnsummary> {heap 7436k}
Streaming LTO
<whole-program> {heap 7436k} <profile_estimate> {heap 7436k} <fnsummary> {heap 7436k} <inline> {heap 7436k} <pure-const> {heap 7436k} <modref> {heap 7436k} <free-fnsummary> {heap 7436k} <static-var> {heap 7436k} <single-use> {heap 7436k} <comdats> {heap 7436k}
Assembling functions:
<simdclone> {heap 7436k} ___setup_mod ___init_mod ___H_fib_2d_5
gcc: fatal error: Killed signal terminated program cc1
compilation terminated.
Created attachment 50345 [details] Parametrized input files for test coverage testing. These are the .i files that go with my previous comment.
So the issue is still the same - one thing I noticed is that store-motion also adds a flag for each counter update to avoid introducing store-data-races. -fallow-store-data-races mitigates that part and speeds up the compilation quite a bit. In case there are threads involved you'd want -fprofile-update=atomic, which then causes store-motion to give up, and the compile-time is great overall.

The original trigger of the regression is likely the marking of the profile counters as not aliased - we might want to introduce another flag to tell that store-data-races for the particular decl are not a consideration (maybe even have some user-visible attribute for this).

Otherwise re-confirmed (I stripped options down to -O -fPIC -fprofile-arcs -ftest-coverage):

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-2.o1-fib-2.i
1.84user 0.05system 0:01.90elapsed 99%CPU (0avgtext+0avgdata 160764maxresident)k
0inputs+0outputs (0major+58129minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-3.o1-fib-3.i
10.15user 0.17system 0:10.32elapsed 99%CPU (0avgtext+0avgdata 726688maxresident)k
0inputs+0outputs (0major+265008minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-4.o1-fib-4.i
43.60user 1.06system 0:44.68elapsed 99%CPU (0avgtext+0avgdata 6107260maxresident)k
0inputs+0outputs (0major+1765217minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i
gcc: fatal error: Killed signal terminated program cc1
compilation terminated.
Command exited with non-zero status 1
143.09user 3.93system 2:28.29elapsed 99%CPU (0avgtext+0avgdata 24636148maxresident)k
37504inputs+0outputs (31major+6133278minor)pagefaults 0swaps

On the last one, which runs OOM, adding -fallow-store-data-races does

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fallow-store-data-races
123.06user 0.45system 2:03.59elapsed 99%CPU (0avgtext+0avgdata 1777700maxresident)k
57304inputs+0outputs (68major+535127minor)pagefaults 0swaps

and -fprofile-update=atomic

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fprofile-update=atomic
0.61user 0.02system 0:00.63elapsed 100%CPU (0avgtext+0avgdata 73236maxresident)k
72inputs+0outputs (0major+18284minor)pagefaults 0swaps

and -fno-tree-loop-im

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fno-tree-loop-im
1.06user 0.01system 0:01.07elapsed 99%CPU (0avgtext+0avgdata 90672maxresident)k
0inputs+0outputs (0major+24331minor)pagefaults 0swaps

I still wonder if you can produce an even smaller testcase where visualizing the CFG is possible. Unfortunately the source is mechanically generated and following it is hard. Maybe a testcase that retains the basic structure but ends up with just a few (2, less than 10) computed gotos?
Created attachment 50352 [details] Smaller parameterized test file This file is generated from a single copy of the fibonacci function, and is simplified a bit otherwise. I believe it has two computed gotos.
Created attachment 50354 [details] SVG of the CFG at LIM This is an SVG of the CFG as created by dot at the point of the first LIM pass. The CFG isn't too special and I guess a switch instead of the computed goto would present us with the same issues. I suppose putting a hard limit on the number of stores to move and then ordering candidates based on their importance (execution frequency) is the way to go.
GCC 8 branch is being closed.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
GCC 9 branch is being closed.
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.
I tried the first input file with GCC 13.2 and on a Ryzen 9 7900X get a memory usage of 105MB and 1.1s compile-time. The larger testcase needs 360MB peak and 6.3s to compile. Both with mostly flat -ftime-report profile.

Upping to -O2 shows same memory peak but 13.1s for the larger testcase. We then see

PRE : 2.09 ( 16%) 0.01 ( 1%) 2.15 ( 15%) 288k ( 0%)

as the biggest thing sticking out (similar for the small testcase).

I think we've come a long way here. GCC 12.3 behaves the same. For GCC 11.4 the larger testcase at -O2 I stopped after 3 minutes, the small testcase at -O1 takes 44s and 5GB memory.

Fixed for GCC 12+, I'm not going to look at identifying what to backport (I usually backported compile-time/memory-usage improvements when reasonable, so I suspect this was a bigger change).
I confirm that I no longer have this problem with

> gcc-12 -v
Using built-in specs.
COLLECT_GCC=gcc-12
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 12.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-ALHxjy/gcc-12-12.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-ALHxjy/gcc-12-12.3.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)

A different example procedure still took > 45 minutes and > 3.5 GB to compile with -ftest-coverage -fprofile-arcs (it had finished when I came back from lunch) but it was quite large (even by my standards!).

If this is a "won't fix" for earlier versions of gcc, then I'm OK with closing this PR.
It'll get closed when we close the GCC 11 branch, there's still the opportunity for somebody to bisect what fixed it in GCC 12 in case it was something trivial.
Fixed in GCC 12.