This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug other/60828] New: Compile time speedups when using tcmalloc
- From: "trippels at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 11 Apr 2014 20:15:20 +0000
- Subject: [Bug other/60828] New: Compile time speedups when using tcmalloc
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60828
Bug ID: 60828
Summary: Compile time speedups when using tcmalloc
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: trippels at gcc dot gnu.org
There are noticeable compile-time speedups when gcc is linked with
tcmalloc. The effect shows up mostly for C++ programs; plain C projects
show little difference.
Here are the compile times for Firefox on my 4-core machine:
Firefox -O3:
glibc malloc:
2806.82s user 126.92s system 349% cpu 13:58.37 total 0% speedup
tcmalloc:
2707.31s user 129.93s system 358% cpu 13:10.61 total 5.7% speedup
jemalloc:
2708.30s user 175.53s system 354% cpu 13:34.29 total 2.9% speedup
Firefox -flto=4 -O3:
glibc malloc:
3241.66s user 155.71s system 316% cpu 17:54.13 total 0% speedup
tcmalloc:
3140.43s user 164.22s system 323% cpu 17:01.13 total 4.9% speedup
jemalloc:
3155.74s user 226.63s system 320% cpu 17:35.51 total 1.7% speedup
A simpler example is tramp3d-v4:
glibc malloc:
% time g++ -w -O3 -march=native tramp3d-v4.cpp
22.30s user 0.34s system 97% cpu 23.301 total
tcmalloc:
~ % time g++ -w -O3 -march=native tramp3d-v4.cpp
21.36s user 0.30s system 99% cpu 21.659 total (~7% speedup)
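The speedup percentages quoted above appear to be the relative reduction in wall-clock ("total") time against the glibc malloc baseline. A minimal sketch of that arithmetic, assuming this interpretation; the helper names wall_seconds and speedup_percent are hypothetical and only for illustration:

```python
def wall_seconds(total: str) -> float:
    """Convert a `time` total such as '13:58.37' or '23.301' to seconds."""
    seconds = 0.0
    # Each ':'-separated field is worth 60x the one to its right.
    for part in total.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def speedup_percent(baseline: str, candidate: str) -> float:
    """Relative reduction in wall-clock time versus the baseline, in percent."""
    base = wall_seconds(baseline)
    return (base - wall_seconds(candidate)) / base * 100


# Firefox -O3, glibc malloc vs. tcmalloc (totals copied from the report)
print(round(speedup_percent("13:58.37", "13:10.61"), 1))  # 5.7
# tramp3d-v4, glibc malloc vs. tcmalloc
print(round(speedup_percent("23.301", "21.659"), 1))      # 7.0
```

Computed on user CPU time instead of the wall-clock total, the Firefox -O3 tcmalloc gain would be closer to 3.5%, so it matters which column is compared.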
tcmalloc's built-in heap profiler shows the following (number of
allocated megabytes, including space that has since been deallocated):
markus@x4 ~ % pprof --alloc_space --text \
    /usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.0/cc1 /tmp/mybin.hprof_4474.0010.heap
Using local file /usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.0/cc1.
Using local file /tmp/mybin.hprof_4474.0010.heap.
Total: 34.3 MB
7.7 22.6% 22.6% 7.8 22.6% c_common_nodes_and_builtins [clone .cold.171]
5.7 16.7% 39.3% 5.7 16.7% tree_ssa_lim
4.3 12.5% 51.8% 10.8 31.5% cpp_classify_number
3.8 11.1% 62.9% 5.2 15.1% do_endif [clone .lto_priv.2364]
2.6 7.5% 70.4% 2.6 7.5% _cpp_pop_context
2.6 7.5% 77.8% 2.6 7.5% cgraph_add_node_removal_hook
2.2 6.5% 84.3% 2.2 6.5% __gmp_default_allocate
1.7 5.1% 89.4% 1.7 5.1% rtx_moveable_p [clone .isra.7] [clone .lto_priv.5842]
1.5 4.2% 93.6% 1.7 5.1% add_exit_phis [clone .lto_priv.5880]
0.7 2.1% 95.7% 0.7 2.1% ix86_target_macros_internal [clone .lto_priv.7319]
0.3 0.9% 96.6% 0.3 0.9% init_alias_vars [clone .lto_priv.9038]
0.3 0.8% 97.4% 0.3 0.8% gimple_fold_builtin
...
And total objects (including deallocated):
Total: 619253 objects
290259 46.9% 46.9% 290259 46.9% __gmp_default_allocate
89866 14.5% 61.4% 89866 14.5% rtx_moveable_p [clone .isra.7] [clone .lto_priv.5842]
74190 12.0% 73.4% 107769 17.4% cpp_classify_number
66198 10.7% 84.1% 66243 10.7% do_endif [clone .lto_priv.2364]
44778 7.2% 91.3% 44778 7.2% _cpp_pop_context
20931 3.4% 94.7% 20939 3.4% simplify_plus_minus [clone .lto_priv.5851]
8642 1.4% 96.1% 11749 1.9% expand_asm_operands [clone .lto_priv.6838]
5665 0.9% 97.0% 5801 0.9% c_common_nodes_and_builtins [clone .cold.171]
4659 0.8% 97.7% 4659 0.8% merge_classes [clone .part.41] [clone .lto_priv.3432]
3659 0.6% 98.3% 3773 0.6% init_alias_vars [clone .lto_priv.9038]
2541 0.4% 98.7% 2541 0.4% tree_ssa_lim
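To compare the two listings per function (for example, bytes per allocation), the pprof text rows can be parsed mechanically. The sketch below assumes the usual pprof --text column order (flat, flat%, cumulative sum%, cumulative, cumulative%, symbol) and that each symbol sits on a single line; parse_row is a hypothetical helper, not part of pprof:

```python
import re

# One pprof --text row: flat, flat%, sum%, cum, cum%, symbol name.
ROW = re.compile(
    r"^\s*(?P<flat>[\d.]+)\s+(?P<flat_pct>[\d.]+)%\s+(?P<sum_pct>[\d.]+)%\s+"
    r"(?P<cum>[\d.]+)\s+(?P<cum_pct>[\d.]+)%\s+(?P<name>.+)$"
)


def parse_row(line):
    """Return (symbol, flat, cumulative), or None for non-data lines."""
    match = ROW.match(line)
    if match is None:
        return None  # e.g. "Total: 34.3 MB" or "Using local file ..."
    # Drop compiler-generated suffixes such as "[clone .lto_priv.5842]".
    name = match.group("name").split(" [clone")[0].strip()
    return name, float(match.group("flat")), float(match.group("cum"))


print(parse_row("290259 46.9% 46.9% 290259 46.9% __gmp_default_allocate"))
print(parse_row("Total: 619253 objects"))
```

With both tables parsed, dividing a function's megabytes by its object count gives a rough average allocation size: __gmp_default_allocate accounts for 46.9% of the objects but only 6.5% of the space, which points to a large number of very small allocations.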