Created attachment 32686 [details] g++ -v output for 4.8.2

Hello.

After a test upgrade from 4.8.2 to 4.9.0 I noticed that compilation takes significantly more time than before. The g++ -v output for both compilers is in the attachments.

I compiled the same file from an internal project with the following switches. This is a debug build, but the problem is the same with release flags:

-std=gnu++11 -fvisibility=hidden -Wall -Wextra -Winit-self -Winvalid-pch -Wfatal-errors -Woverloaded-virtual -fvisibility-inlines-hidden -O0 -g -pipe -fsanitize=address -ftime-report

The full -ftime-report output for both compilers is in the attachments. From what I see, these steps take significantly more time than before:
- phase lang. deferred : 1.22 usr for 4.8.2, 4.08 usr for 4.9.0;
- template instantiation : 1.44 usr for 4.8.2, 5.59 usr for 4.9.0.

Some other numbers are higher too, but these two have the largest increase.

I'm not sure how to create a reduced test case from this project, but I will try something on Monday.
Created attachment 32687 [details] g++ -v output for 4.9.0
Created attachment 32688 [details] ftime-report for G++ 4.8.2
Created attachment 32689 [details] ftime-report for G++ 4.9.0
After a series of tests I'm pretty sure that this is not a problem in g++, but in libstdc++. Let me explain why I think so.

First, I was not able to reduce the real-life case to a reasonably small subset that at least does not depend on external libs. So I decided to test explicit template instantiation of std::map for multiple combinations of integer types as key and value. The file that I used is attached as test.cpp. The overall pattern was the same.

After that I realized that the results depend on the actual libstdc++ includes, so I was comparing apples with oranges. So I used these commands to get a preprocessed dump:
- g++-4.8 -std=gnu++11 -O0 -E -P std_map.cpp > preprocessed_48.cpp
- g++-4.9 -std=gnu++11 -O0 -E -P std_map.cpp > preprocessed_49.cpp

Then I compiled both files with g++-4.8 first and g++-4.9 second to compare results. It turned out that the timings are nearly the same for both compilers and depend on which preprocessed file is compiled. The ftime-reports are in the attachments.

I understand that it is not a bug per se, but the compilation time for my project, which uses STL containers a lot, jumped from the 15 minute mark to the 40 minute mark, and it would be nice to reduce it back somehow.
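For readers without access to the attachment, the kind of test file described above can be sketched as follows. This is not the attached test.cpp: the particular key/value combinations are assumptions chosen for illustration. Each line is an explicit instantiation definition, which makes the compiler instantiate every member of that std::map specialization whether or not it is used.

```cpp
#include <map>

// Hypothetical stand-in for the attached test file. Each explicit
// instantiation definition forces all members of the named std::map
// specialization to be instantiated in this translation unit.
template class std::map<int, int>;
template class std::map<int, long>;
template class std::map<long, int>;
template class std::map<unsigned, long long>;
```

Preprocessing such a file once with each compiler (as in the commands above) and then compiling both dumps with the same compiler separates the cost of the headers from the cost of the front end.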
Created attachment 32696 [details] Test file used
Created attachment 32697 [details] ftime-report, preprocessed by 4.8.2, compiled by 4.8.2
Created attachment 32698 [details] ftime-report, preprocessed by 4.8.2, compiled by 4.9.0
Created attachment 32699 [details] ftime-report, preprocessed by 4.9.0, compiled by 4.8.2
Created attachment 32700 [details] ftime-report, preprocessed by 4.9.0, compiled by 4.9.0
Most of the libstdc++ changes are for C++11 allocator support, which is required for conformance. Some changes I made in <bits/alloc_traits.h> might have negatively affected compile time though.
Hi, I want to chime in on this: is there anything new regarding the issue? The current state marks it as UNCONFIRMED, however there is a definite performance loss even in the current 4.9.1 and 4.9.2 gcc releases. My employer switched to gcc 4.9.1 recently and the compile time of our codebase (mostly C++, heavy users of stl / boost) has almost doubled. We're up to 60min for a clean compile, and that is with ccache enabled. Plain, no-ccache compiles are even worse. Compiling with -ftime-report shows that g++ is spending significant amounts of time in the 'parsing' stage (mostly between 1.8 and 2.8 sec per file). Currently we're using 4.9.1 on CentOS 5 (custom build) but I also confirmed the same increase in compile time on a Gentoo and a Debian system.
As Jonathan wrote, the standard library must conform to the C++ standard. You simply cannot expect fast compile times out of the box when you make heavy use of stl/boost. (Although careful analysis might bring it down considerably. (e.g., bundling most template instantiations into a single compilation unit)) For the attached testcase clang's libc++ is even 30% slower than gcc-4.9 libstdc++.
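One way to read the suggestion about bundling instantiations into a single compilation unit is the C++11 extern template mechanism. The sketch below is only an illustration under assumptions: Widget and the file names mentioned in the comments are hypothetical, and both declarations are shown in one block only so that the example is self-contained.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical project type that many translation units store in a vector.
struct Widget {
    int id;
};

// Would live in a shared header (say, widget.h): an explicit instantiation
// declaration. It asks each including translation unit not to implicitly
// instantiate the members of this specialization.
extern template class std::vector<Widget>;

// Would live in exactly one source file (say, widget_instances.cpp): the
// explicit instantiation definition that actually emits the members.
template class std::vector<Widget>;
```

How much this saves depends on the library: inline members may still be instantiated in every translation unit, so the gain has to be measured (for example with -ftime-report) rather than assumed.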
I spent several hours trying to find the cause of the slowdown, without success. Currently I am focused on fixing regressions and conformance errors. I will come back to this when I can.
(In reply to Markus Trippelsdorf from comment #12) > As Jonathan wrote, the standard library must conform to the C++ standard. > > You simply cannot expect fast compile times out of the box when you make > heavy use of stl/boost. > (Although careful analysis might bring it down considerably. (e.g., bundling > most template instantiations into a single compilation unit)) > > For the attached testcase clang's libc++ is even 30% slower than gcc-4.9 > libstdc++. I agree on your point here, however shouldn't an unchanged codebase (and we're not using C++11 features or -std=c++11 yet) at least keep the same performance? It's understandable that standards compliance can counter performance but I honestly wouldn't expect the compiler performance of older code / code not using the new features to drop that drastically..
For the program in comment 5 most of the difference in compile-time is caused by the allocator-aware container requirements (additional constructors which need to be instantiated for every explicit instantiation in the program and more complicated definitions for copy/move/swap). These changes are required for C++11 conformance, and I don't see any obvious way to implement them without affecting compile times. There is also some increase in compile-time caused by extra 'noexcept' specifications added to lots of functions, especially iterator member functions and operators. Not all of those are required by the standard, but they are important nonetheless.
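To make the "additional constructors" concrete, here is a small sketch that only checks for the presence of some of the C++11 allocator-extended constructors of std::map, plus one noexcept specification that the standard does require. The variable names are made up for this example; it does not measure compile time.

```cpp
#include <map>
#include <type_traits>
#include <utility>

using Map   = std::map<int, int>;
using Alloc = Map::allocator_type;

// C++11 allocator-aware container requirements: construction from an
// allocator alone, and allocator-extended copy and move construction.
constexpr bool has_alloc_ctor =
    std::is_constructible<Map, const Alloc&>::value;
constexpr bool has_alloc_copy_ctor =
    std::is_constructible<Map, const Map&, const Alloc&>::value;
constexpr bool has_alloc_move_ctor =
    std::is_constructible<Map, Map&&, const Alloc&>::value;

// begin() is declared noexcept by the C++11 standard; the extra noexcept
// on iterator operators mentioned above goes beyond what is required.
constexpr bool begin_is_noexcept = noexcept(std::declval<Map&>().begin());
```

Every explicit instantiation of a container has to instantiate all of these members, which is consistent with the extra time seen for the test file.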
Thank you Jonathan, that explains some things. I get it there is also no easy way to disable certain standards features at compile time?
(In reply to Rene Koecher from comment #14) > I agree on your point here, however shouldn't an unchanged codebase (and > we're not using C++11 features or -std=c++11 yet) at least keep the same > performance? No, because the standard library headers you include are not unchanged if you upgrade the compiler. > It's understandable that standards compliance can counter performance but I > honestly wouldn't expect the compiler performance of older code / code not > using the new features to drop that drastically.. Please provide a testcase demonstrating the problem with pre-C++11 code.
(In reply to Jonathan Wakely from comment #15) > For the program in comment 5 most of the difference in compile-time is > caused by the allocator-aware container requirements (additional > constructors which need to be instantiated for every explicit instantiation > in the program and more complicated definitions for copy/move/swap). These > changes are required for C++11 conformance, and I don't see any obvious way > to implement them without affecting compile times. Just to confirm, the regression started with r204848 (~25% compile time slowdown with -O0 on sandybridge).
At Facebook we experienced a similar regression: compilation times more than doubled for several large C++ files. We found that the regression was mostly caused by r207240, specifically by the changes in bits/alloc_traits.h. Just reverting that file brought build times back almost to previous levels.

Looking at GCC profiles, a disproportionate amount of time is spent in structural_comptypes and template_args_equal (which did not happen before the change). The revision only changes the way some traits are selected through SFINAE, specifically the pattern:

template <typename T> typename enable_if<..., R>::type f();

became

template <typename T, typename = _Require<...>> R f();

and _Require is just a wrapper around enable_if.

I don't know why this change has such a large impact on compilation times; it deserves some investigation. Other parts of the standard library might be affected by this.

The regression might already have been solved in r225244, which uses yet another SFINAE pattern without extra template arguments, which I believe are the cause of the regression. However, I haven't tested it yet.
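The two spellings being compared can be written out as a compilable sketch. The names half_a, half_b and require_t are invented for this example (require_t stands in for libstdc++'s _Require), and the constraint std::is_integral is an arbitrary choice. The example shows only that both forms accept and reject the same arguments; it makes no claim about their relative compile-time cost.

```cpp
#include <type_traits>

// Stand-in for libstdc++'s _Require: a thin wrapper around enable_if.
template <bool Cond>
using require_t = typename std::enable_if<Cond>::type;

// Pattern A: the constraint is part of the return type.
template <typename T>
constexpr typename std::enable_if<std::is_integral<T>::value, T>::type
half_a(T v) {
    return v / 2;
}

// Pattern B: the constraint is a defaulted, unnamed extra template
// parameter, so the function template has one more template argument.
template <typename T, typename = require_t<std::is_integral<T>::value>>
constexpr T half_b(T v) {
    return v / 2;
}
```

Calling either function with a double fails to compile because substitution removes the only candidate.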
(In reply to Giuseppe Ottaviano from comment #19) > At Facebook we experienced a similar regression, compilation times more than > doubled for several large C++ files. > We found that the regression was mostly caused by r207240, specifically to > the changes in bits/alloc_traits.h. Just reverting that file brought back > build times almost to previous levels. > > By looking at GCC profiles, a disproportionate amount of time is spent in > structural_comptypes and template_args_equal (which doesn't happen before > the change). The revision only changes the way some traits are selected > through SFINAE, specifically the pattern: > > template <typename T> enable_if<..., R>::type f(); > > became > > template <typename T, typename = _Require<...>> R f(); > > and _Require is just a wrapper around enable_if. Jason, this is an interesting observation about where time is spent in the FE for this common SFINAE technique. I use it for constructors, where there is no return value on which to put the enable_if, and where adding an extra constructor parameter with a default argument would change the signature (or be impossible, due to the constructor being a variadic template). Any chance this is low-hanging fruit and could be avoided fairly easily, or should I stop using this technique in bits of the library that are compiled as often as allocator_traits? > I don't know why this change has such a large impact on compilation times, > it would deserve some investigation. Other parts of the standard library > might be affected by this. Very probably, I have used that pattern widely. > The regression might have been already solved in r225244, which uses yet > another SFINAE pattern without extra template arguments, which I believe are > the cause of the regression. However I haven't tested it yet. That would be nice to know, because I now use that kind of void_t-style constraint in a few places, and plan to use it more widely. 
My measurements do show that using void_t-style constraints result in small but measurable reductions in compile time and memory use.
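The constructor case mentioned above can be illustrated with a small sketch. Box is a hypothetical class, not library code. A constructor has no return type to carry an enable_if, and a variadic constructor cannot take a trailing defaulted function parameter, so the constraint goes into a defaulted template parameter after the pack.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical wrapper with a constrained variadic forwarding constructor.
template <typename T>
class Box {
public:
    // Viable only when T can be constructed from Args...; otherwise the
    // template is removed from overload resolution (SFINAE), so it does
    // not hijack copy construction of Box itself.
    template <typename... Args,
              typename = typename std::enable_if<
                  std::is_constructible<T, Args...>::value>::type>
    explicit Box(Args&&... args) : value_(std::forward<Args>(args)...) {}

    const T& get() const { return value_; }

private:
    T value_;
};
```

For example, Box<std::pair<int, int>> can be built from two ints, while Box<int> cannot be built from a const char*.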
(In reply to Jonathan Wakely from comment #20) > (In reply to Giuseppe Ottaviano from comment #19) > > The regression might have been already solved in r225244, which uses yet > > another SFINAE pattern without extra template arguments, which I believe are > > the cause of the regression. However I haven't tested it yet. > > That would be nice to know, because I now use that kind of void_t-style > constraint in a few places, and plan to use it more widely. My measurements > do show that using void_t-style constraints result in small but measurable > reductions in compile time and memory use. Oh, I looked at the wrong bit of r225244, it's using SFINAE in a trailing-return-type that matters here, not the __detected_or_t_ changes.
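SFINAE in a trailing return type, the form referred to here, can be sketched as below. The function and type names are invented for the example, and the int/long tag argument is just one common way to rank the two overloads; this is not a copy of the code in r225244.

```cpp
#include <cstddef>

// Two hypothetical allocator-like types: one provides max_size(), one does not.
struct BoundedAlloc {
    constexpr std::size_t max_size() const { return 42; }
};
struct MinimalAlloc {};

// Preferred overload: viable only if a.max_size() is a valid expression.
// The constraint lives in the trailing return type, so no extra template
// parameter is needed.
template <typename A>
constexpr auto max_size_impl(const A& a, int) -> decltype(a.max_size()) {
    return a.max_size();
}

// Fallback: always viable, but ranked lower because 0 needs an
// int -> long conversion to match the second parameter.
template <typename A>
constexpr std::size_t max_size_impl(const A&, long) {
    return static_cast<std::size_t>(-1);
}

template <typename A>
constexpr std::size_t max_size_of(const A& a) {
    return max_size_impl(a, 0);
}
```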
>> The regression might have been already solved in r225244, which uses
>> yet another SFINAE pattern without extra template arguments, which I
>> believe are the cause of the regression. However I haven't tested it
>> yet.
> That would be nice to know, because I now use that kind of
> void_t-style constraint in a few places, and plan to use it more
> widely. My measurements do show that using void_t-style constraints
> result in small but measurable reductions in compile time and memory
> use.
> Oh, I looked at the wrong bit of r225244, it's using SFINAE in a
> trailing-return-type that matters here, not the __detected_or_t_
> changes.

Yes, I was referring to the trailing return type. Unfortunately it's not trivial to test it with our code because alloc_traits.h is no longer a drop-in replacement. Maybe the test code included in this bug is enough? Is r225244 already included in a GCC release?
(In reply to Giuseppe Ottaviano from comment #22) > Yes I referred to the trailing return type. Unfortunately it's not trivial > to test it with our code because alloc_traits.h is not anymore a drop-in > replacement. Maybe the test code included in this bug is enough? Is r225244 > already included in a GCC release? No, only on trunk. It depends on the additions in r225242, so to use the new alloc_traits.h you would only need the new code in https://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/include/std/type_traits?r1=225242&r2=225241&pathrev=225242 (which could be added to the top of alloc_traits.h just in order to test, if that's easier).
> No, only on trunk. It depends on the additions in r225242, so to use the new
> alloc_traits.h you would only need the new code in
> https://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/include/std/type_traits?r1=225242&r2=225241&pathrev=225242
> (which could be added to the top of alloc_traits.h just in order to test, if
> that's easier).

Does __detected_or_t depend on some new language features? I copied all the required metafunctions (including __void_t and __ptr_rebind), and I get this error:

.../bits/alloc_traits.h: In substitution of 'template<class _Tp> using __v_pointer = typename _Tp::void_pointer [with _Tp = std::allocator<std::thread::_Impl<std::_Bind_simple<std::function<void()>()> > >]':
.../bits/alloc_traits.h:57:29: required from 'struct std::__detector<void*, void, std::__allocator_traits_base::__v_pointer, std::allocator<std::thread::_Impl<std::_Bind_simple<std::function<void()>()> > > >'
[...]
.../bits/alloc_traits.h:103:53: error: no type named 'void_pointer' in 'class std::allocator<std::thread::_Impl<std::_Bind_simple<std::function<void()>()> > >'

Looks like it is ignoring the __detector negative case.
There was a G++ bug (now fixed) that made void_t not work, try this alternative version: template< class... > struct __voider { using type = void; }; template< class... _T0toN > using __void_t = typename __voider<_T0toN...>::type;
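The workaround above, together with the kind of detection it is meant to support, can be tried in isolation with a sketch like the following. The names voider, void_t_compat and void_pointer_or are made up here (the library's own names are reserved identifiers), and the detector is a simplified illustration, not the __detector from type_traits.

```cpp
#include <type_traits>

// Same idea as the alternative __void_t above: route the alias through a
// class template so that every argument takes part in substitution.
template <typename...>
struct voider {
    using type = void;
};
template <typename... Ts>
using void_t_compat = typename voider<Ts...>::type;

// Primary template: the "negative case", used when A::void_pointer does
// not exist.
template <typename Default, typename A, typename = void>
struct void_pointer_or {
    using type = Default;
};

// Partial specialization: chosen only when A::void_pointer is a valid type.
template <typename Default, typename A>
struct void_pointer_or<Default, A, void_t_compat<typename A::void_pointer>> {
    using type = typename A::void_pointer;
};

// Hypothetical allocator-like types for the check.
struct FancyVoidPtr {};
struct FancyAlloc { using void_pointer = FancyVoidPtr; };
struct PlainAlloc {};
```

With a working void_t, PlainAlloc falls back to the default instead of producing a hard "no type named 'void_pointer'" error like the one quoted earlier.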
(In reply to Jonathan Wakely from comment #25)
> There was a G++ bug (now fixed) that made void_t not work, try this
> alternative version:
>
> template< class... > struct __voider { using type = void; };
> template< class... _T0toN > using __void_t = typename
> __voider<_T0toN...>::type;

Thanks, I finally got it to work! I also had to hack back __alloctr_rebind because hashtable.h depended on it.

I can confirm that, on my test files, the trailing return approach is not slower than enable_if on return. Since std::allocator is pretty much the only used allocator, I also tried to add a partial specialization allocator_traits<allocator<T>> so that no SFINAE has to be performed, and there is a non-negligible speedup:

- enable_if on return type: 100% (baseline)
- _Require/enable_if on extra template argument: 225%
- detected_or_t/trailing return type: 100%
- partial specialization for allocator<>: 89%

This is all with GCC 4.9.2 with no optimization flags. GCC 5.2 gives similar times.

Do you use partial specializations as performance optimizations (thus equivalent to the general case) in libstdc++?
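The partial-specialization shortcut can be sketched on a toy traits class. Everything below is hypothetical (SimpleAlloc, FancyAlloc, alloc_traits_sketch and the fast_path flag are invented for illustration); it shows only the shape of the idea: the general template works out its members through SFINAE, while the specialization for the common allocator spells them out directly.

```cpp
#include <type_traits>

// Stand-in for the common case (std::allocator): no nested 'pointer' type.
template <typename T>
struct SimpleAlloc {
    using value_type = T;
};

// Stand-in for a custom allocator with a fancy pointer type.
struct FancyPtr {};
struct FancyAlloc {
    using value_type = int;
    using pointer = FancyPtr;
};

template <typename...>
struct voider {
    using type = void;
};

// General case: detect A::pointer via SFINAE, default to value_type*.
template <typename A, typename = void>
struct pointer_of {
    using type = typename A::value_type*;
};
template <typename A>
struct pointer_of<A, typename voider<typename A::pointer>::type> {
    using type = typename A::pointer;
};

// Generic traits: every member goes through the detection machinery.
template <typename A>
struct alloc_traits_sketch {
    using pointer = typename pointer_of<A>::type;
    static constexpr bool fast_path = false;
};

// Partial specialization for the common allocator: the answers are known
// up front, so no SFINAE is needed to produce them.
template <typename T>
struct alloc_traits_sketch<SimpleAlloc<T>> {
    using pointer = T*;
    static constexpr bool fast_path = true;
};
```

The specialization must stay equivalent to what the general case would compute, which is the condition raised in the question above.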
(In reply to Giuseppe Ottaviano from comment #26) > Do you use partial specializations as performance optimizations (thus > equivalent to the general case) in libstdc++? No, but doing so for std::allocator_traits<std::allocator<T>> might make sense.
(In reply to Giuseppe Ottaviano from comment #26) Giuseppe, is there an easy way you could provide me with your changes to alloc_traits.h? I'd really like to give it a shot against our codebase and see if there's any speedup. I tried to follow along by taking the alloc_traits.h from 225242 and adding the suggested diff to compensate for the missing bits in type_traits. While this results in a working drop-in replacement, I cannot detect any noticeable change in compile time.
(In reply to Rene Koecher from comment #28) > (In reply to Giuseppe Ottaviano from comment #26) > > Giuseppe, is there an easy way you could provide me with your changes to > alloc_traits.h? > > I'd really like to give it a shot against our codebase and see if there's > any speedup. I tried to follow along by taking the alloc_traits.h from > 225242 adding the suggested diff to compensate for the missing bits in > type_traits. > > While this results in a working drop-in replacement I can not detect any > noticable changes in compile time. Interesting, in a previous comment you say that you see a large jump in "parsing" time, while in my case (and also in the attached reports) the largest increases are in template instantiation. Maybe you have a different problem? Anyway, of course I can share my changes (please ignore the weird formatting): https://gist.github.com/ot/b094d58bf049cee3db99 If replacing alloc_traits.h does not work, try also allocator.h, it includes the partial specialization that shortcuts SFINAE.
(In reply to Giuseppe Ottaviano from comment #19) > At Facebook we experienced a similar regression, compilation times more than > doubled for several large C++ files. > We found that the regression was mostly caused by r207240, specifically to > the changes in bits/alloc_traits.h. Just reverting that file brought back > build times almost to previous levels. Do you have a testcase showing this slowdown? I tried a partial specialization and it doesn't make much difference to the test file in comment 5, confirming that the Facebook issue is different from the original bug report, which is caused by the constructor overloads that are required by the standard.
Author: redi Date: Mon Jan 11 16:47:58 2016 New Revision: 232232 URL: https://gcc.gnu.org/viewcvs?rev=232232&root=gcc&view=rev Log: allocator_traits<allocator<T>> partial specialization PR libstdc++/60976 * include/bits/alloc_traits.h (allocator_traits<allocator<_Tp>>): Define partial specialization. * testsuite/20_util/shared_ptr/cons/58659.cc: Add construct and destroy members to std::allocator explicit specialization. Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/bits/alloc_traits.h trunk/libstdc++-v3/testsuite/20_util/shared_ptr/cons/58659.cc
I'm confirming this, as we definitely have slowed down (although I see less than a 2x difference for the attached testcase), but that doesn't mean we can fix it. I've committed a change on trunk that adds a partial specialization of std::allocator_traits<std::allocator<T>>, which helps a bit.
Author: redi Date: Thu Feb 11 13:30:27 2016 New Revision: 233343 URL: https://gcc.gnu.org/viewcvs?rev=233343&root=gcc&view=rev Log: allocator_traits<allocator<T>> partial specialization PR libstdc++/60976 * include/bits/alloc_traits.h (allocator_traits<allocator<_Tp>>): Define partial specialization. * testsuite/20_util/shared_ptr/cons/58659.cc: Add construct and destroy members to std::allocator explicit specialization. Modified: branches/gcc-5-branch/libstdc++-v3/ChangeLog branches/gcc-5-branch/libstdc++-v3/include/bits/alloc_traits.h branches/gcc-5-branch/libstdc++-v3/testsuite/20_util/shared_ptr/cons/58659.cc
Author: redi Date: Fri Mar 18 15:58:03 2016 New Revision: 234338 URL: https://gcc.gnu.org/viewcvs?rev=234338&root=gcc&view=rev Log: allocator_traits<allocator<T>> partial specialization PR libstdc++/60976 * include/bits/alloc_traits.h (allocator_traits<allocator<_Tp>>): Define partial specialization. * testsuite/20_util/shared_ptr/cons/58659.cc: Add construct and destroy members to std::allocator explicit specialization. Modified: branches/gcc-4_9-branch/libstdc++-v3/ChangeLog branches/gcc-4_9-branch/libstdc++-v3/include/bits/alloc_traits.h branches/gcc-4_9-branch/libstdc++-v3/testsuite/20_util/shared_ptr/cons/58659.cc
I've backported the std::allocator_traits<std::allocator<T>> partial specialization to the gcc-4.9 and gcc-5 branches now. Please let me know if this makes any difference for your use cases (and provide testcases to reproduce problems if not).
(In reply to Jonathan Wakely from comment #35)
> I've backported the std::allocator_traits<std::allocator<T>> partial
> specialization to the gcc-4.9 and gcc-5 branches now. Please let me know if
> this makes any difference for your use cases (and provide testcases to
> reproduce problems if not).

I reported our relative build times with my version of the partial specialization a few comments above: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60976#c26 .

Your partial specialization patch causes an ICE in our version of GCC 4.9:

<include_dir>/bits/alloc_traits.h:464:56: internal compiler error: in retrieve_specialization, at cp/pt.c:1057
0x5615cf retrieve_specialization <src_dir>/gcc/cp/pt.c:1054
0x57809e tsubst_decl <src_dir>/gcc/cp/pt.c:11079
0x56a9b4 tsubst(tree_node*, tree_node*, int, tree_node*) <src_dir>/gcc/cp/pt.c:11559
0x570db4 instantiate_template_1 <src_dir>/gcc/cp/pt.c:15586
0x570db4 instantiate_template(tree_node*, tree_node*, int) <src_dir>/gcc/cp/pt.c:15636
0x56a918 instantiate_alias_template <src_dir>/gcc/cp/pt.c:15666
0x56a918 tsubst(tree_node*, tree_node*, int, tree_node*) <src_dir>/gcc/cp/pt.c:11586
0x56fef0 lookup_template_class_1 <src_dir>/gcc/cp/pt.c:7675
0x56fef0 lookup_template_class(tree_node*, tree_node*, tree_node*, tree_node*, int, int) <src_dir>/gcc/cp/pt.c:7901
0x541e43 make_typename_type(tree_node*, tree_node*, tag_types, int) <src_dir>/gcc/cp/decl.c:3459
0x56b49a tsubst(tree_node*, tree_node*, int, tree_node*) <src_dir>/gcc/cp/pt.c:12189
0x577ec1 tsubst_decl <src_dir>/gcc/cp/pt.c:11110
0x56a9b4 tsubst(tree_node*, tree_node*, int, tree_node*) <src_dir>/gcc/cp/pt.c:11559
0x566fd6 tsubst_expr <src_dir>/gcc/cp/pt.c:13509
0x566dbc tsubst_expr <src_dir>/gcc/cp/pt.c:13452
0x56705f tsubst_expr <src_dir>/gcc/cp/pt.c:13657
0x566dbc tsubst_expr <src_dir>/gcc/cp/pt.c:13452
0x56705f tsubst_expr <src_dir>/gcc/cp/pt.c:13657
0x57a7a4 instantiate_decl(tree_node*, int, bool) <src_dir>/gcc/cp/pt.c:19980
0x57ee0b instantiate_pending_templates(int) <src_dir>/gcc/cp/pt.c:20096
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

Changing the definitions of rebind_alloc and rebind_traits to the following:

template<typename _Tp1> using rebind_alloc = typename allocator_type::template rebind<_Tp1>::other;
template<typename _Tp1> using rebind_traits = allocator_traits<rebind_alloc<_Tp1>>;

fixes the ICE and matches the build time of my patch.
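For context, rebind_alloc and rebind_traits are the member alias templates a container uses to obtain an allocator for its internal node type. The sketch below uses them only through the public std::allocator_traits interface; Node and ListTypes are hypothetical names, and the example says nothing about the ICE itself.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical internal node type of a list-like container.
template <typename T>
struct Node {
    T value;
    Node* next;
};

// A container is handed an allocator for T but needs one for Node<T>;
// rebind_alloc and rebind_traits perform that conversion.
template <typename T, typename Alloc = std::allocator<T>>
struct ListTypes {
    using traits      = std::allocator_traits<Alloc>;
    using node_alloc  = typename traits::template rebind_alloc<Node<T>>;
    using node_traits = typename traits::template rebind_traits<Node<T>>;
};
```

With std::allocator<int> as the starting point, node_alloc is std::allocator<Node<int>> and node_traits is the matching allocator_traits specialization.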
(In reply to Giuseppe Ottaviano from comment #36) > I reported our relative build times with my version of the partial > specialization a few comments above: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60976#c26 . Yep, thanks. I'm hoping Viacheslav or Rene can try it too. > Your partial specialization patch causes an ICE in our version of GCC 4.9: I think that ICE is fixed by r234337 on the 4.9 branch, which I committed right before the allocator_traits patch. Good to know that what I committed is no slower than your locally patched version, thanks for checking.
Can the bug be marked as resolved?