Debian's g++ 4.9.2 and gcc 4.8.4 are extremely slow at compiling some templates. They take more than an hour to compile code that Clang 3.4 compiles in 39 seconds.

How to reproduce:

$ git clone https://github.com/facebook/fatal.git \
  && cd fatal && git checkout -b dev origin/dev \
  && clang++-3.4 --version && time clang++-3.4 -Wall -std=c++11 -I . \
     fatal/type/benchmark/prefix_tree_benchmark.cpp \
  && g++-4.8 --version && time g++-4.8 -Wall -std=c++11 -I . \
     fatal/type/benchmark/prefix_tree_benchmark.cpp \
  && g++-4.9 --version && time g++-4.9 -Wall -std=c++11 -I . \
     fatal/type/benchmark/prefix_tree_benchmark.cpp

Output:

Cloning into 'fatal'...
remote: Counting objects: 1124, done.
remote: Compressing objects: 100% (226/226), done.
remote: Total 1124 (delta 119), reused 0 (delta 0), pack-reused 884
Receiving objects: 100% (1124/1124), 803.31 KiB | 1.20 MiB/s, done.
Resolving deltas: 100% (727/727), done.
Checking connectivity... done.
Branch dev set up to track remote branch dev from origin.
Switched to a new branch 'dev'

Debian clang version 3.4.2-13 (tags/RELEASE_34/dot2-final) (based on LLVM 3.4.2)
Target: x86_64-pc-linux-gnu
Thread model: posix

real    0m39.205s
user    0m37.416s
sys     0m1.432s

g++-4.8 (Debian 4.8.4-1) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

real    64m37.227s
user    61m42.556s
sys     0m10.604s

g++-4.9 (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

real    65m33.790s
user    63m52.544s
sys     0m7.664s

This is not that useful. Can you provide the preprocessed source for the file which is taking a long time?
Can you please attach the preprocessed source files (generated with -E) needed to reproduce the problem, and also compile with -ftime-report and post the output?
Created attachment 34965 [details]
preprocessed files from gcc 4.9

generated with:
$ time g++-4.9 -Wall -std=c++11 -E -I . fatal/type/benchmark/prefix_tree_benchmark.cpp

real    0m0.468s
user    0m0.452s
sys     0m0.016s

Created attachment 34966 [details]
preprocessed file from gcc 4.8

generated with:
$ time g++-4.8 -Wall -std=c++11 -E -I . fatal/type/benchmark/prefix_tree_benchmark.cpp

real    0m0.450s
user    0m0.424s
sys     0m0.020s

Created attachment 34967 [details]
preprocessed file from gcc 4.8

generated with:
$ time g++-4.8 -Wall -std=c++11 -E -I . fatal/type/benchmark/prefix_tree_benchmark.cpp

real    0m0.450s
user    0m0.424s
sys     0m0.020s

Created attachment 34968 [details]
preprocessed file from gcc 4.9

generated with:
$ time g++-4.9 -Wall -std=c++11 -E -I . fatal/type/benchmark/prefix_tree_benchmark.cpp

real    0m0.468s
user    0m0.452s
sys     0m0.016s

Created attachment 34969 [details]
FWIW, preprocessed file from clang 3.4

generated with:
$ time clang++-3.4 -Wall -std=c++11 -E -I . fatal/type/benchmark/prefix_tree_benchmark.cpp > clang-3.4-preprocessed.cpp

real    0m0.224s
user    0m0.188s
sys     0m0.028s
It is also interesting that gcc-5 rejects the testcase (4.8, 4.9 and clang accept it):

In file included from ./fatal/type/pair.h:14:0,
                 from ./fatal/type/list.h:13,
                 from ./fatal/type/map.h:13,
                 from ./fatal/type/prefix_tree.h:13,
                 from fatal/type/benchmark/prefix_tree_benchmark.cpp:10:
./fatal/type/transform.h:1811:48: error: expected template-name before ‘<’ token
   using apply = typename with<>::template apply<T>;
                                                ^
./fatal/type/transform.h:1811:48: error: expected identifier before ‘<’ token
In file included from ./fatal/type/reflect_template.h:14:0,
                 from ./fatal/type/reflection.h:14,
                 from ./fatal/type/prefix_tree.h:14,
                 from fatal/type/benchmark/prefix_tree_benchmark.cpp:10:
./fatal/type/sequence.h: In static member function ‘static constexpr const type* fatal::constant_sequence<T, Values>::data()’:
./fatal/type/sequence.h:150:48: error: incomplete type ‘fatal::constant_sequence<T, Values>::array<> {aka fatal::constant_array<T, Values ...>}’ used in nested name specifier
   static constexpr type const *data() { return array<>::data(); }
                                                ^
./fatal/type/sequence.h: In static member function ‘static constexpr const type* fatal::constant_sequence<T, Values>::z_data()’:
./fatal/type/sequence.h:175:50: error: incomplete type ‘fatal::constant_sequence<T, Values>::z_array<> {aka fatal::constant_array<T, Values ..., static_cast<T>(0)>}’ used in nested name specifier
   static constexpr type const *z_data() { return z_array<>::data(); }
                                                  ^

Despite these errors, it still keeps running afterwards.

perf shows (I only ran it for ~2 minutes):

gcc-4.8
21.48%  cc1plus  cc1plus  [.] comp_template_args_with_info
16.94%  cc1plus  cc1plus  [.] structural_comptypes
 8.23%  cc1plus  cc1plus  [.] htab_find_slot_with_hash
 7.02%  cc1plus  cc1plus  [.] cp_tree_equal
 6.91%  cc1plus  cc1plus  [.] typename_compare
 6.72%  cc1plus  cc1plus  [.] eq_specializations
 5.39%  cc1plus  cc1plus  [.] cp_type_quals

gcc-4.9
22.90%  cc1plus  cc1plus  [.] structural_comptypes
21.53%  cc1plus  cc1plus  [.] eq_specializations
 8.20%  cc1plus  cc1plus  [.] make_typename_type
 6.27%  cc1plus  cc1plus  [.] template_args_equal
 5.88%  cc1plus  cc1plus  [.] comp_template_args_with_info
 5.82%  cc1plus  cc1plus  [.] cp_tree_equal
 5.53%  cc1plus  cc1plus  [.] typename_compar

gcc-5
26.96%  cc1plus  cc1plus  [.] make_typename_type
20.37%  cc1plus  cc1plus  [.] template_args_equal
13.03%  cc1plus  cc1plus  [.] structural_comptypes
 5.17%  cc1plus  cc1plus  [.] cp_tree_equal
 4.43%  cc1plus  cc1plus  [.] tsubst_aggr_type
 4.15%  cc1plus  cc1plus  [.] comptypes
It would be interesting to rerun this to see whether it has improved.
The master branch has been updated by Patrick Palka <ppalka@gcc.gnu.org>:

https://gcc.gnu.org/g:343d83c7a89d0c7a78139e685395228115a28f6e

commit r13-1047-g343d83c7a89d0c7a78139e685395228115a28f6e
Author: Patrick Palka <ppalka@redhat.com>
Date:   Fri Jun 10 16:10:02 2022 -0400

    c++: improve TYPENAME_TYPE hashing [PR65328]

    For the testcase in this PR, compilation takes very long ultimately due
    to our poor hashing of TYPENAME_TYPE causing a huge number of collisions
    in the spec_hasher and typename_hasher tables.

    In spec_hasher, we don't hash the components of TYPENAME_TYPE, which
    means most TYPENAME_TYPE arguments end up contributing the same hash.
    This is the safe thing to do uniformly since structural_comptypes may
    try resolving a TYPENAME_TYPE via the current instantiation.  But this
    behavior of structural_comptypes is suppressed from spec_hasher::equal
    via the comparing_specializations flag, which means spec_hasher::hash
    can assume it's disabled too.  To that end, this patch makes
    spec_hasher::hash set the flag, and teaches iterative_hash_template_arg
    to hash the relevant components of TYPENAME_TYPE when the flag is set.

    And in typename_hasher, the hash function considers TYPE_IDENTIFIER
    instead of the more informative TYPENAME_TYPE_FULLNAME, which this
    patch fixes accordingly.

    After this patch, compile time for the testcase in the PR falls to
    around 30 seconds on my machine (down from dozens of minutes).

            PR c++/65328

    gcc/cp/ChangeLog:

            * decl.cc (typename_hasher::hash): Add extra overloads.
            Use iterative_hash_object instead of htab_hash_pointer.
            Hash TYPENAME_TYPE_FULLNAME instead of TYPE_IDENTIFIER.
            (build_typename_type): Use typename_hasher::hash.
            * pt.cc (spec_hasher::hash): Add two-parameter overload.
            Set comparing_specializations around the call to
            hash_tmpl_and_args.
            (iterative_hash_template_arg) <case TYPENAME_TYPE>:
            When comparing_specializations, hash the TYPE_CONTEXT
            and TYPENAME_TYPE_FULLNAME.
            (tsubst_function_decl): Use spec_hasher::hash instead of
            hash_tmpl_and_args.
            (tsubst_template_decl): Likewise.
            (tsubst_decl): Likewise.
Great job!
Should be fixed for GCC 13