Bug 116285 - Compilation of nodejs/v8's v8_base_without_compiler.runtime-temporal.cc is slow
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: 15.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog
Depends on:
Blocks:
 
Reported: 2024-08-08 08:50 UTC by Sam James
Modified: 2024-08-13 18:11 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
v8_base_without_compiler.runtime-temporal.ii.xz (892.92 KB, application/x-xz)
2024-08-08 08:51 UTC, Sam James

Description Sam James 2024-08-08 08:50:26 UTC
We're currently looking into nodejs taking considerably longer (~8 minutes) to build with GCC than with Clang. I'll report some other testcases later, which have different properties (and are far more pronounced), and will give more background there.

This one feels fairly standalone and different from the others, which are not yet reported.

On trunk with --enable-checking=release:
```
$ time /tmp/bisect-gcc-pfx/bin/g++ -c v8_base_without_compiler.runtime-temporal.ii -O0 -march=znver1 -std=gnu++20 -ftime-report
In file included from ../../deps/v8/src/objects/contexts.h:9,
                 from ../../deps/v8/src/execution/thread-local-top.h:13,
                 from ../../deps/v8/src/execution/isolate-data.h:12,
                 from ../../deps/v8/src/execution/isolate.h:29,
                 from ../../deps/v8/src/execution/isolate-inl.h:8,
                 from ../../deps/v8/src/runtime/runtime-temporal.cc:5:
../../deps/v8/src/objects/fixed-array.h:615:41: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:615:41: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:615:212: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:615:212: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:672:32: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:672:32: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:672:194: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:672:194: note: remove the '< >'

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  2171k (  0%)
 phase parsing                      :   5.11 ( 89%)   1.14 ( 93%)   6.28 ( 90%)   625M ( 89%)
 phase lang. deferred               :   0.53 (  9%)   0.08 (  7%)   0.61 (  9%)    66M (  9%)
 phase opt and generate             :   0.09 (  2%)   0.00 (  0%)   0.10 (  1%)  9987k (  1%)
 |name lookup                       :   1.04 ( 18%)   0.19 ( 16%)   1.25 ( 18%)    24M (  3%)
 |overload resolution               :   1.32 ( 23%)   0.22 ( 18%)   1.37 ( 20%)   101M ( 14%)
 garbage collection                 :   0.38 (  7%)   0.00 (  0%)   0.38 (  5%)     0  (  0%)
 callgraph construction             :   0.08 (  1%)   0.00 (  0%)   0.07 (  1%)  7630k (  1%)
 callgraph ipa passes               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   647k (  0%)
 trivially dead code                :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 preprocessing                      :   0.19 (  3%)   0.18 ( 15%)   0.34 (  5%)  5702k (  1%)
 parser (global)                    :   0.74 ( 13%)   0.28 ( 23%)   1.01 ( 14%)   128M ( 18%)
 parser struct body                 :   0.73 ( 13%)   0.14 ( 11%)   0.99 ( 14%)    98M ( 14%)
 parser enumerator list             :   0.02 (  0%)   0.00 (  0%)   0.04 (  1%)  2465k (  0%)
 parser function body               :   0.05 (  1%)   0.02 (  2%)   0.09 (  1%)  5367k (  1%)
 parser inl. func. body             :   0.75 ( 13%)   0.12 ( 10%)   0.96 ( 14%)    68M ( 10%)
 parser inl. meth. body             :   0.31 (  5%)   0.08 (  7%)   0.41 (  6%)    40M (  6%)
 template instantiation             :   1.99 ( 35%)   0.39 ( 32%)   2.19 ( 31%)   340M ( 48%)
 constant expression evaluation     :   0.48 (  8%)   0.01 (  1%)   0.46 (  7%)  2297k (  0%)
 tree operand scan                  :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   147k (  0%)
 varconst                           :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)    58k (  0%)
 integrated RA                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   866k (  0%)
 final                              :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    60k (  0%)
 TOTAL                              :   5.73          1.22          7.00          703M

real    0m7.076s
user    0m5.741s
sys     0m1.298s

$ time /tmp/bisect-gcc-pfx/bin/g++ -c v8_base_without_compiler.runtime-temporal.ii -O2 -march=znver1 -std=gnu++20 -ftime-report
In file included from ../../deps/v8/src/objects/contexts.h:9,
                 from ../../deps/v8/src/execution/thread-local-top.h:13,
                 from ../../deps/v8/src/execution/isolate-data.h:12,
                 from ../../deps/v8/src/execution/isolate.h:29,
                 from ../../deps/v8/src/execution/isolate-inl.h:8,
                 from ../../deps/v8/src/runtime/runtime-temporal.cc:5:
../../deps/v8/src/objects/fixed-array.h:615:41: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:615:41: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:615:212: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:615:212: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:672:32: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:672:32: note: remove the '< >'
../../deps/v8/src/objects/fixed-array.h:672:194: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
../../deps/v8/src/objects/fixed-array.h:672:194: note: remove the '< >'

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  2171k (  0%)
 phase parsing                      :   5.12 ( 89%)   1.09 ( 94%)   6.24 ( 90%)   628M ( 89%)
 phase lang. deferred               :   0.53 (  9%)   0.07 (  6%)   0.60 (  9%)    65M (  9%)
 phase opt and generate             :   0.10 (  2%)   0.00 (  0%)   0.10 (  1%)  8840k (  1%)
 |name lookup                       :   1.06 ( 18%)   0.14 ( 12%)   1.19 ( 17%)    24M (  3%)
 |overload resolution               :   1.47 ( 26%)   0.14 ( 12%)   1.50 ( 22%)   101M ( 14%)
 garbage collection                 :   0.38 (  7%)   0.01 (  1%)   0.38 (  5%)     0  (  0%)
 dump files                         :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 callgraph construction             :   0.06 (  1%)   0.00 (  0%)   0.06 (  1%)  7472k (  1%)
 callgraph functions expansion      :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)   144k (  0%)
 callgraph ipa passes               :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)   907k (  0%)
 ipa inheritance graph              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    16k (  0%)
 preprocessing                      :   0.17 (  3%)   0.24 ( 21%)   0.37 (  5%)  5702k (  1%)
 parser (global)                    :   0.82 ( 14%)   0.21 ( 18%)   1.06 ( 15%)   128M ( 18%)
 parser struct body                 :   0.71 ( 12%)   0.12 ( 10%)   0.89 ( 13%)    98M ( 14%)
 parser enumerator list             :   0.00 (  0%)   0.00 (  0%)   0.04 (  1%)  2468k (  0%)
 parser function body               :   0.06 (  1%)   0.03 (  3%)   0.07 (  1%)  5367k (  1%)
 parser inl. func. body             :   0.73 ( 13%)   0.10 (  9%)   0.82 ( 12%)    68M ( 10%)
 parser inl. meth. body             :   0.37 (  6%)   0.08 (  7%)   0.46 (  7%)    40M (  6%)
 template instantiation             :   1.91 ( 33%)   0.36 ( 31%)   2.26 ( 33%)   340M ( 48%)
 constant expression evaluation     :   0.48 (  8%)   0.01 (  1%)   0.47 (  7%)  4293k (  1%)
 dominator optimization             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    12k (  0%)
 backwards jump threading           :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)  1936  (  0%)
 dominance computation              :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 varconst                           :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    58k (  0%)
 branch prediction                  :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  6704  (  0%)
 symout                             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 TOTAL                              :   5.75          1.16          6.94          704M

real    0m7.027s
user    0m5.755s
sys     0m1.241s
```

Nearly all of the time is in the FE (phase parsing alone is about 90% of wall time), and -O0 vs -O2 makes no difference (7.00s vs 6.94s wall in total).
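To make the front-end share concrete, here is a minimal Python sketch that parses the `phase` rows of the two `-ftime-report` outputs above and computes the fraction of wall time spent before optimization. The helper names (`wall_times`, `front_end_share`) are hypothetical, and counting `phase parsing` plus `phase lang. deferred` as front-end time is an assumption of this sketch, not something the report itself states.

```python
import re

# One -ftime-report row: "name : usr (pct%) sys (pct%) wall (pct%) GGC (pct%)".
# Only the name and the wall-clock seconds (third number) are captured.
ROW = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:"
    r"\s*\d+\.\d+\s*\(\s*\d+%\)"             # usr
    r"\s*\d+\.\d+\s*\(\s*\d+%\)"             # sys
    r"\s*(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)"   # wall
)

def wall_times(report):
    """Map each timer name to its wall-clock seconds."""
    times = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            times[m.group("name")] = float(m.group("wall"))
    return times

def front_end_share(times):
    """Fraction of the summed phase wall time spent in the front end.

    Assumption: 'phase parsing' and 'phase lang. deferred' are front-end work.
    """
    total = sum(v for k, v in times.items() if k.startswith("phase "))
    front = times["phase parsing"] + times["phase lang. deferred"]
    return front / total

# Phase rows copied from the -O0 and -O2 reports in this bug.
O0 = """
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  2171k (  0%)
 phase parsing                      :   5.11 ( 89%)   1.14 ( 93%)   6.28 ( 90%)   625M ( 89%)
 phase lang. deferred               :   0.53 (  9%)   0.08 (  7%)   0.61 (  9%)    66M (  9%)
 phase opt and generate             :   0.09 (  2%)   0.00 (  0%)   0.10 (  1%)  9987k (  1%)
"""
O2 = """
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  2171k (  0%)
 phase parsing                      :   5.12 ( 89%)   1.09 ( 94%)   6.24 ( 90%)   628M ( 89%)
 phase lang. deferred               :   0.53 (  9%)   0.07 (  6%)   0.60 (  9%)    65M (  9%)
 phase opt and generate             :   0.10 (  2%)   0.00 (  0%)   0.10 (  1%)  8840k (  1%)
"""

if __name__ == "__main__":
    for label, report in (("-O0", O0), ("-O2", O2)):
        print(label, f"{front_end_share(wall_times(report)):.1%}")
```

Under that assumption both runs come out at roughly 98% front-end wall time, and `phase opt and generate` is 0.10s in each.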
Comment 1 Sam James 2024-08-08 08:51:14 UTC
Created attachment 58866
v8_base_without_compiler.runtime-temporal.ii.xz
Comment 2 Andi Kleen 2024-08-13 18:11:21 UTC
push_to_top_level accounts for about 5% of the cycles and seems to do a lot of list walking over different scopes. A better data structure for the scopes, such as a vector, might help.

On my Skylake the run appears to be primarily Frontend Bound (in the CPU top-down sense) due to the large code size, so you might get a slight improvement from a host compiler built with profile feedback, which does hot/cold code splitting.

Over 3% is GC, so you could get some boost by raising the GC limits so that it collects less often. Try playing with --param ggc-min-expand and --param ggc-min-heapsize.
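As a starting point for that experiment, the following Python sketch builds a handful of command lines that add the two GC parameters to the `-O0` invocation from the description. The specific values in `SWEEP` are illustrative guesses, not measured recommendations; `ggc-min-expand` is a percentage and `ggc-min-heapsize` is given in kilobytes. The plain `g++` in `BASE` and the helper names are assumptions of this sketch; substitute the compiler under test.

```python
import shlex

# Preprocessed source attached to this bug (attachment 58866, after unxz).
SOURCE = "v8_base_without_compiler.runtime-temporal.ii"

# Same flags as the -O0 run in the description; "g++" is a placeholder path.
BASE = ["g++", "-c", SOURCE, "-O0", "-march=znver1", "-std=gnu++20", "-ftime-report"]

def with_gc_params(min_expand_pct, min_heapsize_kb):
    """Return the base command plus the two GC tuning parameters."""
    return BASE + [
        "--param", f"ggc-min-expand={min_expand_pct}",
        "--param", f"ggc-min-heapsize={min_heapsize_kb}",
    ]

# Hypothetical (percent, kilobytes) settings to try; not tuned values.
SWEEP = [(100, 131072), (200, 1048576), (400, 4194304)]

def sweep_commands():
    """Shell-quoted command lines, one per setting in SWEEP."""
    return [shlex.join(with_gc_params(e, h)) for e, h in SWEEP]

if __name__ == "__main__":
    for cmd in sweep_commands():
        print("time", cmd)
```

Comparing the `garbage collection` row and the `TOTAL` wall time of each resulting `-ftime-report` against the baseline would show whether collecting less often helps here.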

0.94% of the cycles are iterative_hash, so you might get another slight improvement from https://github.com/andikleen/gcc/commits/rapidhash-1, which switches the hash function to something more modern (I'm still looking for supporting data that it actually helps).

But none of this will drastically cut the time, the profile is fairly flat.

# Overhead  Command  Source Shared Object  Source Symbol
# ........  .......  ....................  .............
#
     5.11%  cc1plus  cc1plus               [.] push_to_top_level()
     2.71%  cc1plus  cc1plus               [.] gt_ggc_mx_lang_tree_node(void*)
     1.00%  cc1plus  cc1plus               [.] ggc_set_mark(void const*)
     0.94%  cc1plus  cc1plus               [.] iterative_hash
     0.73%  cc1plus  cc1plus               [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]
     0.72%  cc1plus  cc1plus               [.] iterative_hash_template_arg(tree_node*, unsigned int)
     0.67%  cc1plus  cc1plus               [.] ggc_internal_alloc(unsigned long, void (*)(void*), unsigned long, unsigned long)
     0.64%  cc1plus  cc1plus               [.] gt_ggc_mx_lang_tree_node(void*)
     0.54%  cc1plus  cc1plus               [.] ggc_set_mark(void const*)
     0.54%  cc1plus  cc1plus               [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]
     0.51%  cc1plus  cc1plus               [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]
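Several symbols appear more than once in the perf listing above. The Python sketch below sums the overhead per symbol so the GC-related and lookup-related rows can be read as totals. The row text is copied from the listing with its padding trimmed; the helper name `overhead_by_symbol` and the choice to key on the function name before its argument list are assumptions of this sketch.

```python
import re
from collections import defaultdict

# Rows from the perf listing in this comment (column padding trimmed).
PERF_ROWS = """
5.11%  cc1plus  cc1plus  [.] push_to_top_level()
2.71%  cc1plus  cc1plus  [.] gt_ggc_mx_lang_tree_node(void*)
1.00%  cc1plus  cc1plus  [.] ggc_set_mark(void const*)
0.94%  cc1plus  cc1plus  [.] iterative_hash
0.73%  cc1plus  cc1plus  [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]
0.72%  cc1plus  cc1plus  [.] iterative_hash_template_arg(tree_node*, unsigned int)
0.67%  cc1plus  cc1plus  [.] ggc_internal_alloc(unsigned long, void (*)(void*), unsigned long, unsigned long)
0.64%  cc1plus  cc1plus  [.] gt_ggc_mx_lang_tree_node(void*)
0.54%  cc1plus  cc1plus  [.] ggc_set_mark(void const*)
0.54%  cc1plus  cc1plus  [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]
0.51%  cc1plus  cc1plus  [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]
"""

# Overhead percent, command, shared object, "[.]", then the symbol name
# up to (but not including) its argument list.
ROW = re.compile(r"^\s*(\d+\.\d+)%\s+\S+\s+\S+\s+\[\.\]\s+(\S[^(\n]*)")

def overhead_by_symbol(text):
    """Sum the overhead percentages of rows that share a function name."""
    totals = defaultdict(float)
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            totals[m.group(2).strip()] += float(m.group(1))
    return {name: round(pct, 2) for name, pct in totals.items()}

if __name__ == "__main__":
    totals = overhead_by_symbol(PERF_ROWS)
    for name, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{pct:5.2f}%  {name}")
```

Summed this way, gt_ggc_mx_lang_tree_node totals 3.35%, ggc_set_mark 1.54%, and fields_linear_search 1.78% of the listed samples, while push_to_top_level remains the largest single entry at 5.11%.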