Recording call-graphs with perf
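To get useful call graphs, build the compiler with -fno-omit-frame-pointer. perf record -g unwinds the stack by walking frame pointers by default, so code built without them yields truncated or bogus call chains.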
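To see why the frame pointer matters: with -fno-omit-frame-pointer on x86-64, every function keeps its caller's saved rbp at [rbp] and its return address at [rbp + 8]. The saved frame pointers therefore form a linked list that an unwinder can follow from the sampled instruction back to main. The sketch below simulates that walk over a made-up stack image. The Memory type and walk_frame_pointers are hypothetical names, and the real unwinder works on live registers and user memory rather than a std::map.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the sampled thread's stack: address -> 64-bit word.
using Memory = std::map<std::uint64_t, std::uint64_t>;

// Follow the frame-pointer chain as laid out on x86-64 with
// -fno-omit-frame-pointer: [rbp] holds the caller's saved rbp and
// [rbp + 8] holds the return address into the caller.
// Returns instruction addresses, leaf (the sampled ip) first.
std::vector<std::uint64_t> walk_frame_pointers(const Memory& stack, std::uint64_t ip,
                                               std::uint64_t rbp, std::size_t max_depth) {
    std::vector<std::uint64_t> chain{ip};
    while (rbp != 0 && chain.size() < max_depth) {
        auto saved_rbp = stack.find(rbp);
        auto return_addr = stack.find(rbp + 8);
        if (saved_rbp == stack.end() || return_addr == stack.end())
            break;  // frame not readable: the chain ends here
        chain.push_back(return_addr->second);
        // The stack grows downwards, so a caller's frame must sit at a higher
        // address; anything else means rbp did not hold a frame pointer.
        if (saved_rbp->second <= rbp)
            break;
        rbp = saved_rbp->second;
    }
    return chain;
}
```

If some function on the stack was built without frame pointers, rbp holds unrelated data in that frame and the walk either stops early or follows garbage, which is exactly what spoils the call graphs otherwise.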
Enable call-graph recording with -g, e.g.:
perf record -g g++ -w -Ofast tramp3d-v4.cpp
To view the result as a caller-based call graph (print type graph, threshold 0.5%, order caller), run:
perf report -g "graph,0.5,caller"
    Samples: 121K of event 'cycles', Event count (approx.): 97086533769
      Children      Self  Command  Shared Object     Symbol
    +   89.66%     0.00%  cc1plus  cc1plus           [.] toplev::main
    +   89.66%     0.00%  cc1plus  cc1plus           [.] compile_file
    +   89.66%     0.00%  cc1plus  cc1plus           [.] main
    +   89.65%     0.00%  cc1plus  libc-2.22.90.so   [.] __libc_start_main
    +   89.64%     0.00%  cc1plus  [unknown]         [.] 0x4c5441554100647c
    +   74.91%     0.00%  cc1plus  cc1plus           [.] symbol_table::finalize_compilation_unit
    +   71.82%     0.00%  cc1plus  cc1plus           [.] symbol_table::compile
    +   70.09%     0.09%  cc1plus  cc1plus           [.] execute_one_pass
    +   68.68%     0.01%  cc1plus  cc1plus           [.] execute_pass_list
    +   68.67%     0.02%  cc1plus  cc1plus           [.] execute_pass_list_1
    +   56.39%     0.00%  cc1plus  cc1plus           [.] cgraph_node::expand
    +   14.22%     0.00%  cc1plus  cc1plus           [.] execute_ipa_pass_list
    +   12.58%     0.01%  cc1plus  cc1plus           [.] do_per_function_toporder
    +    7.89%     0.03%  cc1plus  cc1plus           [.] instantiate_decl
    +    7.88%     0.02%  cc1plus  cc1plus           [.] c_parse_final_cleanups
    +    7.81%     0.02%  cc1plus  cc1plus           [.] instantiate_pending_templates
    +    7.19%     0.10%  cc1plus  cc1plus           [.] dom_walker::walk
    +    6.87%     0.00%  cc1plus  cc1plus           [.] c_common_parse_file
    +    6.86%     0.01%  cc1plus  cc1plus           [.] c_parse_file
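The two percentage columns are easy to misread. Self is the share of samples in which the symbol was the function actually executing (the leaf of the call chain), while Children also counts samples taken anywhere below it in the call chain. That is why toplev::main shows 89.66% Children but 0.00% Self. A minimal sketch of that bookkeeping, assuming each sample is a hypothetical leaf-first list of function names and that all samples carry equal weight (perf itself weights samples by the event period); CallChain, Overhead and compute_overhead are illustrative names:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// One sample = one recorded call chain, leaf (the executing function) first.
// Hypothetical representation; perf's own data structures differ.
using CallChain = std::vector<std::string>;

struct Overhead {
    double children = 0.0;  // % of samples with the symbol anywhere on the chain
    double self = 0.0;      // % of samples with the symbol as the leaf
};

// Hypothetical helper: every sample is weighted equally here.
std::map<std::string, Overhead> compute_overhead(const std::vector<CallChain>& samples) {
    std::map<std::string, Overhead> result;
    if (samples.empty()) return result;
    const double unit = 100.0 / static_cast<double>(samples.size());
    for (const CallChain& chain : samples) {
        if (chain.empty()) continue;
        result[chain.front()].self += unit;  // only the leaf earns Self
        // Deduplicate first so a recursive symbol is counted once per sample
        // and its Children value can never exceed 100%.
        const std::set<std::string> on_chain(chain.begin(), chain.end());
        for (const std::string& fn : on_chain) result[fn].children += unit;
    }
    return result;
}
```

Sorting the resulting entries by children in descending order gives the same kind of ranking as the report above.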
You can navigate the list with the cursor keys. The + signs on the far left mark call-graph nodes, which you can expand (and collapse) by pressing Enter. Expanding toplev::main, for example, gives:
    Samples: 121K of event 'cycles', Event count (approx.): 97086533769
      Children      Self  Command  Shared Object     Symbol
    -   89.66%     0.00%  cc1plus  cc1plus           [.] toplev::main
       - toplev::main
          - 89.65% compile_file
             - 74.91% symbol_table::finalize_compilation_unit
                - 71.82% symbol_table::compile
                   - 56.39% cgraph_node::expand
                      - 54.99% execute_pass_list
                         - execute_pass_list_1
                            + 52.90% execute_pass_list_1
                            + 2.07% execute_one_pass
                              0.00% (anonymous namespace)::pass_expand::execute
                              0.00% (anonymous namespace)::pass_lower_eh_dispatch::gate
                              0.00% (anonymous namespace)::pass_lower_vector::gate
                              0.00% (anonymous namespace)::pass_rest_of_compilation::gate
                              0.00% (anonymous namespace)::pass_tsan_O0::gate
                              0.00% (anonymous namespace)::pass_vtable_verify::gate
                              0.00% execute_todo
                              0.00% ggc_collect
                              0.00% (anonymous namespace)::pass_lower_resx::execute
                              0.00% (anonymous namespace)::pass_lower_vaarg::gate
                              0.00% opt_pass::gate
                      + 1.36% execute_all_ipa_transforms
                      + 0.02% init_function_start
                      + 0.01% cgraph_node::assemble_thunks_and_aliases
                        0.00% invoke_set_current_function_hook
                   + 14.22% execute_ipa_pass_list
                   + 1.12% execute_ipa_summary_passes
                   + 0.05% symbol_table::materialize_all_clones
                   + 0.03% symbol_table::remove_unreachable_nodes
                   + 0.01% symbol_table::output_variables
                     0.01% ipa_reverse_postorder
                     0.00% output_in_order
                     0.00% announce_function
                     0.00% gimple_set_body
                     0.00% cgraph_node::release_body
                     0.00% execute_pass_list
                     0.00% type_in_anonymous_namespace_p
                + 2.46% analyze_functions
                + 0.62% handle_alias_pairs
                  0.00% gimple_has_body_p
                  0.00% decl_function_context
             + 7.86% c_parse_final_cleanups
             + 6.87% c_common_parse_file
               0.00% decl_needed_p
               0.00% emit_tinfo_decl
          + 0.01% cxx_init
          + 0.00% init_ttree
          + 0.00% gcc::context::context
          + 0.00% init_emit_regs
            0.00% init_reg_sets_1
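With caller ordering, the expanded entry is a tree that starts at the outermost caller and descends towards the functions that were executing, each line showing that path's share of all samples (89.65% for compile_file, 74.91% for symbol_table::finalize_compilation_unit, and so on). A rough sketch of how such a tree can be built: take hypothetical leaf-first call chains and insert each one in reverse order into a prefix tree. CallNode, add_caller_first and print_tree are illustrative names, not perf internals, and every sample is assumed to have equal weight.

```cpp
#include <algorithm>
#include <cstddef>
#include <iomanip>
#include <map>
#include <memory>
#include <ostream>
#include <string>
#include <utility>
#include <vector>

using CallChain = std::vector<std::string>;  // leaf (executing function) first

struct CallNode {
    std::size_t samples = 0;                                   // samples passing through here
    std::map<std::string, std::unique_ptr<CallNode>> callees;  // keyed by function name
};

// Insert one sample caller-first: walk from the outermost frame (e.g. toplev::main)
// down to the leaf, creating tree nodes on the way.
void add_caller_first(CallNode& root, const CallChain& chain) {
    ++root.samples;
    CallNode* node = &root;
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
        std::unique_ptr<CallNode>& child = node->callees[*it];
        if (!child) child = std::make_unique<CallNode>();
        ++child->samples;
        node = child.get();
    }
}

// Print each path's share of *all* samples, busiest callee first,
// indenting three spaces per level.
void print_tree(std::ostream& out, const CallNode& node, std::size_t total, int depth = 0) {
    std::vector<std::pair<const std::string*, const CallNode*>> kids;
    for (const auto& [name, child] : node.callees) kids.emplace_back(&name, child.get());
    std::stable_sort(kids.begin(), kids.end(), [](const auto& a, const auto& b) {
        return a.second->samples > b.second->samples;
    });
    for (const auto& [name, child] : kids) {
        out << std::string(static_cast<std::size_t>(depth) * 3, ' ') << std::fixed
            << std::setprecision(2)
            << 100.0 * static_cast<double>(child->samples) / static_cast<double>(total)
            << "% " << *name << '\n';
        print_tree(out, *child, total, depth + 1);
    }
}
```

After adding all samples, print_tree(std::cout, root, root.samples) prints an indented tree with the same shape as the expanded toplev::main entry.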
Pressing the right cursor key twice shows the annotated disassembly of the selected symbol, e.g.:
    ...
               │        static inline void
               │        gsi_prev (gimple_stmt_iterator *i)
               │        {
               │          gimple *prev = i->ptr->prev;
               │ a58:┌─→mov    0x20(%rbx),%rbx
               │     │    if (prev->next)
          5.26 │     │  cmpq   $0x0,0x18(%rbx)
               │     │↓ je     af0
               │     │    i->ptr = prev;
               │ a67:│  mov    %rbx,-0x50(%rbp)
               │ a6b:│  test   %rbx,%rbx
               │     │↓ je     af8
               │     │    {
               │     │      stmt = gsi_stmt (gsi);
               │     │      if (gimple_code (stmt) != GIMPLE_CALL)
               │     │  cmpb   $0x8,(%rbx)
          5.26 │     │↑ jne    a58
               │     │        continue;
               │     │      if (!gimple_call_internal_p (stmt)
               │     │          || gimple_call_internal_fn (stmt) != IFN_ANNOTATE)
               │     │  testb  $0x40,0x2(%rbx)
         10.53 │     └──je     a58
               │        cmpl   $0x8,0x60(%rbx)
               │      ↑ jne    a58
    ...
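The numbers in the left-hand column are the share of samples attributed to each instruction of the symbol, so in this loop the cmpq, jne and je lines are where the cycles samples landed. A rough sketch of that attribution, assuming a hypothetical list of sampled instruction addresses, a known symbol address range, and plain hit counting relative to the symbol's own samples (perf may weight samples by the event period instead). Keep in mind that with skid the sampled address is often slightly past the instruction that was actually responsible.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helper: percentage of a symbol's samples that fall on each
// instruction, keyed by offset from the symbol start. Every sample counts once.
std::map<std::uint64_t, double> per_instruction_percent(const std::vector<std::uint64_t>& sample_ips,
                                                        std::uint64_t sym_start, std::uint64_t sym_end) {
    std::map<std::uint64_t, std::size_t> hits;
    std::size_t total = 0;
    for (std::uint64_t ip : sample_ips) {
        if (ip < sym_start || ip >= sym_end) continue;  // sample belongs to another symbol
        ++hits[ip - sym_start];
        ++total;
    }
    std::map<std::uint64_t, double> percent;
    for (const auto& [offset, count] : hits)
        percent[offset] = 100.0 * static_cast<double>(count) / static_cast<double>(total);
    return percent;
}
```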
See man perf-report for further options.
Kudos to: CppCon 2015: Chandler Carruth "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"