Bug 65328 - GCC perf issue when compiling templates - 120x slower than Clang
Summary: GCC perf issue when compiling templates - 120x slower than Clang
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: 4.9.2
Importance: P3 normal
Target Milestone: 13.0
Assignee: Patrick Palka
URL:
Keywords: compile-time-hog
Depends on:
Blocks:
 
Reported: 2015-03-05 18:42 UTC by juchem
Modified: 2022-06-28 16:24 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-03-05 00:00:00


Attachments
preprocessed files from gcc 4.9 (209.16 KB, text/plain)
2015-03-05 18:58 UTC, juchem
preprocessed file from gcc 4.8 (187.81 KB, text/plain)
2015-03-05 18:59 UTC, juchem
preprocessed file from gcc 4.8 (187.81 KB, application/octet-stream)
2015-03-05 19:00 UTC, juchem
preprocessed file from gcc 4.9 (209.16 KB, application/octet-stream)
2015-03-05 19:00 UTC, juchem
FWIW, preprocessed file from clang 3.4 (189.68 KB, application/octet-stream)
2015-03-05 19:05 UTC, juchem

Description juchem 2015-03-05 18:42:37 UTC
Debian's g++ 4.9.2 and gcc 4.8.4 are extremely slow at compiling some templates.

They take more than an hour to compile the same code that Clang 3.4 compiles in 39 seconds.

How to reproduce:

  $ git clone https://github.com/facebook/fatal.git \
    && cd fatal && git checkout -b dev origin/dev \
    && clang++-3.4 --version && time clang++-3.4 -Wall -std=c++11 -I . \
      fatal/type/benchmark/prefix_tree_benchmark.cpp \
    && g++-4.8 --version && time g++-4.8 -Wall -std=c++11 -I . \
      fatal/type/benchmark/prefix_tree_benchmark.cpp \
    && g++-4.9 --version && time g++-4.9 -Wall -std=c++11 -I . \
      fatal/type/benchmark/prefix_tree_benchmark.cpp

Output:

Cloning into 'fatal'...
remote: Counting objects: 1124, done.
remote: Compressing objects: 100% (226/226), done.
remote: Total 1124 (delta 119), reused 0 (delta 0), pack-reused 884
Receiving objects: 100% (1124/1124), 803.31 KiB | 1.20 MiB/s, done.
Resolving deltas: 100% (727/727), done.
Checking connectivity... done.
Branch dev set up to track remote branch dev from origin.
Switched to a new branch 'dev'

Debian clang version 3.4.2-13 (tags/RELEASE_34/dot2-final) (based on LLVM 3.4.2)
Target: x86_64-pc-linux-gnu
Thread model: posix

real    0m39.205s
user    0m37.416s
sys     0m1.432s

g++-4.8 (Debian 4.8.4-1) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


real    64m37.227s
user    61m42.556s
sys     0m10.604s

g++-4.9 (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


real    65m33.790s
user    63m52.544s
sys     0m7.664s
Comment 1 Andrew Pinski 2015-03-05 18:47:06 UTC
This is not that useful.  Can you provide the preprocessed source for the file which is taking a long time?
Comment 2 Jan Hubicka 2015-03-05 18:49:43 UTC
Can you please attach the preprocessed source files (generated with -E) to
reproduce the problem and also compile with -ftime-report and post the output?
Comment 3 juchem 2015-03-05 18:58:28 UTC
Created attachment 34965
preprocessed files from gcc 4.9

generated with:

$ time g++-4.9 -Wall -std=c++11 -E -I . fatal/type/benchmark/prefix_tree_benchmark.cpp

real    0m0.468s
user    0m0.452s
sys     0m0.016s
Comment 4 juchem 2015-03-05 18:59:13 UTC
Created attachment 34966
preprocessed file from gcc 4.8

generated with:

$ time g++-4.8 -Wall -std=c++11 -E -I . fatal/type/benchmark/prefix_tree_benchmark.cpp

real    0m0.450s
user    0m0.424s
sys     0m0.020s
Comment 5 juchem 2015-03-05 19:00:19 UTC
Created attachment 34967
preprocessed file from gcc 4.8

generated with:

$ time g++-4.8 -Wall -std=c++11 -E -I . fatal/type/benchmark/prefix_tree_benchmark.cpp

real    0m0.450s
user    0m0.424s
sys     0m0.020s
Comment 6 juchem 2015-03-05 19:00:58 UTC
Created attachment 34968
preprocessed file from gcc 4.9

generated with:

$ time g++-4.9 -Wall -std=c++11 -E -I . fatal/type/benchmark/prefix_tree_benchmark.cpp

real    0m0.468s
user    0m0.452s
sys     0m0.016s
Comment 7 juchem 2015-03-05 19:05:14 UTC
Created attachment 34969
FWIW, preprocessed file from clang 3.4

generated with:

$ time clang++-3.4 -Wall -std=c++11 -E -I . fatal/type/benchmark/prefix_tree_benchmark.cpp > clang-3.4-preprocessed.cpp

real    0m0.224s
user    0m0.188s
sys     0m0.028s
Comment 8 Markus Trippelsdorf 2015-03-05 19:38:40 UTC
It is also interesting that gcc-5 rejects the testcase (4.8, 4.9 and clang accept it):

In file included from ./fatal/type/pair.h:14:0,
                 from ./fatal/type/list.h:13,
                 from ./fatal/type/map.h:13,
                 from ./fatal/type/prefix_tree.h:13,
                 from fatal/type/benchmark/prefix_tree_benchmark.cpp:10:
./fatal/type/transform.h:1811:48: error: expected template-name before ‘<’ token
   using apply = typename with<>::template apply<T>;
                                                ^
./fatal/type/transform.h:1811:48: error: expected identifier before ‘<’ token
In file included from ./fatal/type/reflect_template.h:14:0,
                 from ./fatal/type/reflection.h:14,
                 from ./fatal/type/prefix_tree.h:14,
                 from fatal/type/benchmark/prefix_tree_benchmark.cpp:10:
./fatal/type/sequence.h: In static member function ‘static constexpr const type* fatal::constant_sequence<T, Values>::data()’:
./fatal/type/sequence.h:150:48: error: incomplete type ‘fatal::constant_sequence<T, Values>::array<> {aka fatal::constant_array<T, Values ...>}’ used in nested name specifier
   static constexpr type const *data() { return array<>::data(); }
                                                ^
./fatal/type/sequence.h: In static member function ‘static constexpr const type* fatal::constant_sequence<T, Values>::z_data()’:
./fatal/type/sequence.h:175:50: error: incomplete type ‘fatal::constant_sequence<T, Values>::z_array<> {aka fatal::constant_array<T, Values ..., static_cast<T>(0)>}’ used in nested name specifier
   static constexpr type const *z_data() { return z_array<>::data(); }
                                                  ^
while it still keeps running afterwards.

perf shows (I only ran it for ~2 minutes):

gcc-4.8
  21.48%  cc1plus  cc1plus            [.] comp_template_args_with_info
  16.94%  cc1plus  cc1plus            [.] structural_comptypes
   8.23%  cc1plus  cc1plus            [.] htab_find_slot_with_hash
   7.02%  cc1plus  cc1plus            [.] cp_tree_equal
   6.91%  cc1plus  cc1plus            [.] typename_compare
   6.72%  cc1plus  cc1plus            [.] eq_specializations
   5.39%  cc1plus  cc1plus            [.] cp_type_quals

gcc-4.9
  22.90%  cc1plus  cc1plus            [.] structural_comptypes
  21.53%  cc1plus  cc1plus            [.] eq_specializations
   8.20%  cc1plus  cc1plus            [.] make_typename_type
   6.27%  cc1plus  cc1plus            [.] template_args_equal
   5.88%  cc1plus  cc1plus            [.] comp_template_args_with_info
   5.82%  cc1plus  cc1plus            [.] cp_tree_equal
   5.53%  cc1plus  cc1plus            [.] typename_compare

gcc-5
  26.96%  cc1plus  cc1plus              [.] make_typename_type
  20.37%  cc1plus  cc1plus              [.] template_args_equal
  13.03%  cc1plus  cc1plus              [.] structural_comptypes
   5.17%  cc1plus  cc1plus              [.] cp_tree_equal
   4.43%  cc1plus  cc1plus              [.] tsubst_aggr_type
   4.15%  cc1plus  cc1plus              [.] comptypes
Comment 9 Andrew Pinski 2016-08-28 21:45:46 UTC
It would be interesting to rerun this to see whether it has improved.
Comment 10 GCC Commits 2022-06-10 20:10:50 UTC
The master branch has been updated by Patrick Palka <ppalka@gcc.gnu.org>:

https://gcc.gnu.org/g:343d83c7a89d0c7a78139e685395228115a28f6e

commit r13-1047-g343d83c7a89d0c7a78139e685395228115a28f6e
Author: Patrick Palka <ppalka@redhat.com>
Date:   Fri Jun 10 16:10:02 2022 -0400

    c++: improve TYPENAME_TYPE hashing [PR65328]
    
    For the testcase in this PR, compilation takes very long ultimately due
    to our poor hashing of TYPENAME_TYPE causing a huge number of collisions
    in the spec_hasher and typename_hasher tables.
    
    In spec_hasher, we don't hash the components of TYPENAME_TYPE, which
    means most TYPENAME_TYPE arguments end up contributing the same hash.
    This is the safe thing to do uniformly since structural_comptypes may
    try resolving a TYPENAME_TYPE via the current instantiation.  But this
    behavior of structural_comptypes is suppressed from spec_hasher::equal
    via the comparing_specializations flag, which means spec_hasher::hash
    can assume it's disabled too.  To that end, this patch makes
    spec_hasher::hash set the flag, and teaches iterative_hash_template_arg
    to hash the relevant components of TYPENAME_TYPE when the flag is set.
    
    And in typename_hasher, the hash function considers TYPE_IDENTIFIER
    instead of the more informative TYPENAME_TYPE_FULLNAME, which this patch
    fixes accordingly.
    
    After this patch, compile time for the testcase in the PR falls to
    around 30 seconds on my machine (down from dozens of minutes).
    
            PR c++/65328
    
    gcc/cp/ChangeLog:
    
            * decl.cc (typename_hasher::hash): Add extra overloads.
            Use iterative_hash_object instead of htab_hash_pointer.
            Hash TYPENAME_TYPE_FULLNAME instead of TYPE_IDENTIFIER.
            (build_typename_type): Use typename_hasher::hash.
            * pt.cc (spec_hasher::hash): Add two-parameter overload.
            Set comparing_specializations around the call to
            hash_tmpl_and_args.
            (iterative_hash_template_arg) <case TYPENAME_TYPE>:
            When comparing_specializations, hash the TYPE_CONTEXT
            and TYPENAME_TYPE_FULLNAME.
            (tsubst_function_decl): Use spec_hasher::hash instead of
            hash_tmpl_and_args.
            (tsubst_template_decl): Likewise.
            (tsubst_decl): Likewise.
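
The collision problem described in the commit message above can be illustrated with a toy model. The sketch below is not GCC code: TypenameKey, WeakHash, ComponentHash and count_comparisons are hypothetical names made up for illustration, and std::unordered_set stands in for GCC's internal hash tables. WeakHash mimics a hasher that ignores the components of the key, so every key lands in the same bucket. ComponentHash mixes in both fields, loosely analogous to hashing TYPE_CONTEXT and TYPENAME_TYPE_FULLNAME.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Toy stand-in for a TYPENAME_TYPE such as `typename Ctx::name`.
// Hypothetical model only; GCC represents these as tree nodes.
struct TypenameKey {
    std::string context;   // plays the role of TYPE_CONTEXT
    std::string fullname;  // plays the role of TYPENAME_TYPE_FULLNAME
};

// Counts how often the table falls back to a full equality check.
static std::size_t g_comparisons = 0;

struct KeyEqual {
    bool operator()(const TypenameKey& a, const TypenameKey& b) const {
        ++g_comparisons;
        return a.context == b.context && a.fullname == b.fullname;
    }
};

// Ignores the key's components: every key gets the same hash value.
struct WeakHash {
    std::size_t operator()(const TypenameKey&) const { return 0; }
};

// Mixes both components into the hash value.
struct ComponentHash {
    std::size_t operator()(const TypenameKey& k) const {
        std::size_t h = std::hash<std::string>{}(k.context);
        h ^= std::hash<std::string>{}(k.fullname) + 0x9e3779b9u + (h << 6) + (h >> 2);
        return h;
    }
};

// Inserts n distinct keys and returns the number of equality checks performed.
template <class Hash>
std::size_t count_comparisons(int n) {
    assert(n >= 0);
    g_comparisons = 0;
    std::unordered_set<TypenameKey, Hash, KeyEqual> table;
    for (int i = 0; i < n; ++i)
        table.insert(TypenameKey{"Ctx" + std::to_string(i), "type"});
    return g_comparisons;
}
```

Comparing count_comparisons<WeakHash>(n) with count_comparisons<ComponentHash>(n) for growing n should show the first growing roughly quadratically, because each insertion has to be compared against every key already in the single shared bucket, while the second stays small. This is the general mechanism only; the actual patch also has to keep the hash consistent with how structural_comptypes compares types, which this toy does not model.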
Comment 11 juchem 2022-06-11 01:23:46 UTC
great job
Comment 12 Patrick Palka 2022-06-28 16:24:49 UTC
Should be fixed for GCC 13