We're currently looking at nodejs taking a while longer (~8 minutes) to build with GCC than Clang. I'm going to report some other testcases later which have some different properties (and are far more pronounced) and will give more background there. This one feels kind of standalone and different to the others not yet reported.

On trunk with --enable-checking=release:
```
$ time /tmp/bisect-gcc-pfx/bin/g++ -c v8_base_without_compiler.runtime-temporal.ii -O0 -march=znver1 -std=gnu++20 -ftime-report
In file included from ../../deps/v8/src/objects/contexts.h:9,
                 from ../../deps/v8/src/execution/thread-local-top.h:13,
                 from ../../deps/v8/src/execution/isolate-data.h:12,
                 from ../../deps/v8/src/execution/isolate.h:29,
                 from ../../deps/v8/src/execution/isolate-inl.h:8,
                 from ../../deps/v8/src/runtime/runtime-temporal.cc:5:
../../deps/v8/src/objects/fixed-array.h:615:41: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:615:41: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:615:212: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:615:212: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:672:32: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:672:32: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:672:194: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:672:194: note: remove the '< >'

Time variable                                 usr           sys          wall           GGC
 phase setup                       :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   2171k (  0%)
 phase parsing                     :   5.11 ( 89%)   1.14 ( 93%)   6.28 ( 90%)    625M ( 89%)
 phase lang. deferred              :   0.53 (  9%)   0.08 (  7%)   0.61 (  9%)     66M (  9%)
 phase opt and generate            :   0.09 (  2%)   0.00 (  0%)   0.10 (  1%)   9987k (  1%)
 |name lookup                      :   1.04 ( 18%)   0.19 ( 16%)   1.25 ( 18%)     24M (  3%)
 |overload resolution              :   1.32 ( 23%)   0.22 ( 18%)   1.37 ( 20%)    101M ( 14%)
 garbage collection                :   0.38 (  7%)   0.00 (  0%)   0.38 (  5%)       0 (  0%)
 callgraph construction            :   0.08 (  1%)   0.00 (  0%)   0.07 (  1%)   7630k (  1%)
 callgraph ipa passes              :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    647k (  0%)
 trivially dead code               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 (  0%)
 preprocessing                     :   0.19 (  3%)   0.18 ( 15%)   0.34 (  5%)   5702k (  1%)
 parser (global)                   :   0.74 ( 13%)   0.28 ( 23%)   1.01 ( 14%)    128M ( 18%)
 parser struct body                :   0.73 ( 13%)   0.14 ( 11%)   0.99 ( 14%)     98M ( 14%)
 parser enumerator list            :   0.02 (  0%)   0.00 (  0%)   0.04 (  1%)   2465k (  0%)
 parser function body              :   0.05 (  1%)   0.02 (  2%)   0.09 (  1%)   5367k (  1%)
 parser inl. func. body            :   0.75 ( 13%)   0.12 ( 10%)   0.96 ( 14%)     68M ( 10%)
 parser inl. meth. body            :   0.31 (  5%)   0.08 (  7%)   0.41 (  6%)     40M (  6%)
 template instantiation            :   1.99 ( 35%)   0.39 ( 32%)   2.19 ( 31%)    340M ( 48%)
 constant expression evaluation    :   0.48 (  8%)   0.01 (  1%)   0.46 (  7%)   2297k (  0%)
 tree operand scan                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    147k (  0%)
 varconst                          :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)     58k (  0%)
 integrated RA                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    866k (  0%)
 final                             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     60k (  0%)
 TOTAL                             :   5.73          1.22          7.00           703M

real    0m7.076s
user    0m5.741s
sys     0m1.298s

$ time /tmp/bisect-gcc-pfx/bin/g++ -c v8_base_without_compiler.runtime-temporal.ii -O2 -march=znver1 -std=gnu++20 -ftime-report
In file included from ../../deps/v8/src/objects/contexts.h:9,
                 from ../../deps/v8/src/execution/thread-local-top.h:13,
                 from ../../deps/v8/src/execution/isolate-data.h:12,
                 from ../../deps/v8/src/execution/isolate.h:29,
                 from ../../deps/v8/src/execution/isolate-inl.h:8,
                 from ../../deps/v8/src/runtime/runtime-temporal.cc:5:
../../deps/v8/src/objects/fixed-array.h:615:41: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:615:41: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:615:212: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:615:212: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:672:32: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:672:32: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:672:194: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:672:194: note: remove the '< >'

Time variable                                 usr           sys          wall           GGC
 phase setup                       :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)   2171k (  0%)
 phase parsing                     :   5.12 ( 89%)   1.09 ( 94%)   6.24 ( 90%)    628M ( 89%)
 phase lang. deferred              :   0.53 (  9%)   0.07 (  6%)   0.60 (  9%)     65M (  9%)
 phase opt and generate            :   0.10 (  2%)   0.00 (  0%)   0.10 (  1%)   8840k (  1%)
 |name lookup                      :   1.06 ( 18%)   0.14 ( 12%)   1.19 ( 17%)     24M (  3%)
 |overload resolution              :   1.47 ( 26%)   0.14 ( 12%)   1.50 ( 22%)    101M ( 14%)
 garbage collection                :   0.38 (  7%)   0.01 (  1%)   0.38 (  5%)       0 (  0%)
 dump files                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       0 (  0%)
 callgraph construction            :   0.06 (  1%)   0.00 (  0%)   0.06 (  1%)   7472k (  1%)
 callgraph functions expansion     :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    144k (  0%)
 callgraph ipa passes              :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)    907k (  0%)
 ipa inheritance graph             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     16k (  0%)
 preprocessing                     :   0.17 (  3%)   0.24 ( 21%)   0.37 (  5%)   5702k (  1%)
 parser (global)                   :   0.82 ( 14%)   0.21 ( 18%)   1.06 ( 15%)    128M ( 18%)
 parser struct body                :   0.71 ( 12%)   0.12 ( 10%)   0.89 ( 13%)     98M ( 14%)
 parser enumerator list            :   0.00 (  0%)   0.00 (  0%)   0.04 (  1%)   2468k (  0%)
 parser function body              :   0.06 (  1%)   0.03 (  3%)   0.07 (  1%)   5367k (  1%)
 parser inl. func. body            :   0.73 ( 13%)   0.10 (  9%)   0.82 ( 12%)     68M ( 10%)
 parser inl. meth. body            :   0.37 (  6%)   0.08 (  7%)   0.46 (  7%)     40M (  6%)
 template instantiation            :   1.91 ( 33%)   0.36 ( 31%)   2.26 ( 33%)    340M ( 48%)
 constant expression evaluation    :   0.48 (  8%)   0.01 (  1%)   0.47 (  7%)   4293k (  1%)
 dominator optimization            :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     12k (  0%)
 backwards jump threading          :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    1936 (  0%)
 dominance computation             :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)       0 (  0%)
 varconst                          :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     58k (  0%)
 branch prediction                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    6704 (  0%)
 symout                            :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       0 (  0%)
 TOTAL                             :   5.75          1.16          6.94           704M

real    0m7.027s
user    0m5.755s
sys     0m1.241s
```

All the time is in the FE and -O0 vs -O2 doesn't make any difference.
Created attachment 58866 [details] v8_base_without_compiler.runtime-temporal.ii.xz
push_to_top_level is about 5% and seems to do a lot of list walking of different scopes. Maybe a better data structure like a vector for the scopes would help.

On my skylake it appears to be primarily Frontend Bound due to large code, so you might get a slight improvement by using a profile-feedback-built host compiler that does hot/cold code splitting.

3+% is GC, so you could get some boost by increasing the GC limits to GC less often. Try playing with --param ggc-min-expand and --param ggc-min-heapsize.

0.94% of the cycles are iterative_hash, so you might get another slight improvement from https://github.com/andikleen/gcc/commits/rapidhash-1 which switches the hash function to something more modern (still looking for supporting data that it actually helps).

But none of this will drastically cut the time; the profile is fairly flat.

# Overhead  Command  Source Shared Object  Source Symbol
# ........  .......  ....................  ......................................................
#
     5.11%  cc1plus  cc1plus               [.] push_to_top_level()
     2.71%  cc1plus  cc1plus               [.] gt_ggc_mx_lang_tree_node(void*)
     1.00%  cc1plus  cc1plus               [.] ggc_set_mark(void const*)
     0.94%  cc1plus  cc1plus               [.] iterative_hash
     0.73%  cc1plus  cc1plus               [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]
     0.72%  cc1plus  cc1plus               [.] iterative_hash_template_arg(tree_node*, unsigned int)
     0.67%  cc1plus  cc1plus               [.] ggc_internal_alloc(unsigned long, void (*)(void*), unsigned long, unsigned long)
     0.64%  cc1plus  cc1plus               [.] gt_ggc_mx_lang_tree_node(void*)
     0.54%  cc1plus  cc1plus               [.] ggc_set_mark(void const*)
     0.54%  cc1plus  cc1plus               [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]
     0.51%  cc1plus  cc1plus               [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]
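
For the GC suggestion, something like the following could be used to sweep the two parameters. This is only a rough sketch: the values in the loops are arbitrary starting points rather than recommendations, and the `GXX`, `SRC`, `RUN` variables and the `gc_cmd` helper are made up for illustration. ggc-min-expand is a percentage and ggc-min-heapsize is in kilobytes. By default the script only prints the command lines; set RUN=1 to actually time them.

```shell
#!/bin/bash
# Sketch: sweep GC tuning parameters on the attached testcase.
# GXX, SRC, RUN and gc_cmd are illustrative names, not part of GCC.
GXX=${GXX:-/tmp/bisect-gcc-pfx/bin/g++}
SRC=${SRC:-v8_base_without_compiler.runtime-temporal.ii}

# Build the command line for one setting.
#   $1 = ggc-min-expand (percent), $2 = ggc-min-heapsize (kB)
gc_cmd() {
    echo "$GXX -c $SRC -O0 -march=znver1 -std=gnu++20 -o /dev/null" \
         "--param ggc-min-expand=$1 --param ggc-min-heapsize=$2"
}

# Example values only; adjust to the memory available on the host.
for expand in 30 100 200; do
    for heap in 131072 1048576; do
        cmd=$(gc_cmd "$expand" "$heap")
        echo "== $cmd"
        if [ "${RUN:-0}" = 1 ]; then
            # Unquoted on purpose so the string splits into arguments.
            time $cmd
        fi
    done
done
```

Comparing the "garbage collection" row of -ftime-report across settings (add -ftime-report to the command) should show whether collecting less often buys anything on this input.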