Bug 111619 - 'make profiledbootstrap' takes 10+ minutes on insn-recog.cc (x86_64-linux)
Summary: 'make profiledbootstrap' takes 10+ minutes on insn-recog.cc (x86_64-linux)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 14.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog
Depends on:
Blocks: 84402
Reported: 2023-09-27 22:02 UTC by Sergei Trofimovich
Modified: 2024-12-18 10:57 UTC
CC List: 6 users

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Sergei Trofimovich 2023-09-27 22:02:53 UTC
The reproducer on gcc from r14-4300-g1fab05a885a308:

$ ~/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++
$ make profiledbootstrap

insn-recog.o takes ~13 min to build on an `AMD Ryzen 9 5950X` CPU:

$ time /tmp/gb/./prev-gcc/cc1plus -quiet -nostdinc++ -I /tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I /tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I /home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -I . -I . -I /home/slyfox/dev/git/gcc/gcc -I /home/slyfox/dev/git/gcc/gcc/. -I /home/slyfox/dev/git/gcc/gcc/../include -I /home/slyfox/dev/git/gcc/gcc/../libcpp/include -I /home/slyfox/dev/git/gcc/gcc/../libcody -I /home/slyfox/dev/git/gcc/gcc/../libdecnumber -I /home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I ../libdecnumber -I /home/slyfox/dev/git/gcc/gcc/../libbacktrace -iprefix /tmp/gb/prev-gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/ -isystem /tmp/gb/./prev-gcc/include -isystem /tmp/gb/./prev-gcc/include-fixed -MMD insn-recog.d -MF ./.deps/insn-recog.TPo -MP -MT insn-recog.o -D_GNU_SOURCE -D IN_GCC -D HAVE_CONFIG_H insn-recog.cc -quiet -dumpbase insn-recog.cc -dumpbase-ext .cc -mtune=generic -march=x86-64 -g -gtoggle -O2 -Wextra -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wsuggest-attribute=format -Wconditionally-supported -Woverloaded-virtual=2 -Wpedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-checking -fprofile-generate -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -fno-common -fno-PIE -o /run/user/1000/ccQK54tL.s

real    13m39,864s
user    13m38,263s
sys     0m0,823s

`insn-recog.cc` is 8.3MB.

$ ./prev-gcc/xgcc -Bprev-gcc -v
Reading specs from prev-gcc/specs
COLLECT_GCC=./prev-gcc/xgcc
COLLECT_LTO_WRAPPER=prev-gcc/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/slyfox/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230926 (experimental) (GCC)
Comment 1 Sergei Trofimovich 2023-09-27 22:17:47 UTC
-ftime-report breakdown:

time /tmp/gb/./prev-gcc/cc1plus -quiet -nostdinc++ -I /tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I /tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I /home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -I . -I . -I /home/slyfox/dev/git/gcc/gcc -I /home/slyfox/dev/git/gcc/gcc/. -I /home/slyfox/dev/git/gcc/gcc/../include -I /home/slyfox/dev/git/gcc/gcc/../libcpp/include -I /home/slyfox/dev/git/gcc/gcc/../libcody -I /home/slyfox/dev/git/gcc/gcc/../libdecnumber -I /home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I ../libdecnumber -I /home/slyfox/dev/git/gcc/gcc/../libbacktrace -iprefix /tmp/gb/prev-gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/ -isystem /tmp/gb/./prev-gcc/include -isystem /tmp/gb/./prev-gcc/include-fixed -MMD insn-recog.d -MF ./.deps/insn-recog.TPo -MP -MT insn-recog.o -D_GNU_SOURCE -D IN_GCC -D HAVE_CONFIG_H insn-recog.cc -quiet -dumpbase insn-recog.cc -dumpbase-ext .cc -mtune=generic -march=x86-64 -g -gtoggle -O2 -Wextra -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wsuggest-attribute=format -Wconditionally-supported -Woverloaded-virtual=2 -Wpedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-checking -fprofile-generate -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -fno-common -fno-PIE -o /run/user/1000/ccQK54tL.s -ftime-report

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  1892k (  0%)
 phase parsing                      :  22.49 (  3%)   1.58 ( 35%)  24.09 (  3%)   903M ( 22%)
 phase lang. deferred               :   0.06 (  0%)   0.01 (  0%)   0.07 (  0%)  2268k (  0%)
 phase opt and generate             : 791.23 ( 97%)   2.90 ( 65%) 794.84 ( 97%)  3111M ( 77%)
 |name lookup                       :   1.20 (  0%)   0.09 (  2%)   1.23 (  0%)  3296k (  0%)
 |overload resolution               :   3.40 (  0%)   0.18 (  4%)   3.69 (  0%)   107M (  3%)
 garbage collection                 :   5.82 (  1%)   0.08 (  2%)   5.86 (  1%)     0  (  0%)
 dump files                         :   0.24 (  0%)   0.00 (  0%)   0.15 (  0%)     0  (  0%)
 callgraph construction             :   4.41 (  1%)   0.14 (  3%)   4.74 (  1%)   329M (  8%)
 callgraph optimization             :   1.01 (  0%)   0.03 (  1%)   1.02 (  0%)  2938k (  0%)
 callgraph functions expansion      : 734.71 ( 90%)   2.08 ( 46%) 737.44 ( 90%)  2238M ( 56%)
 callgraph ipa passes               :  50.35 (  6%)   0.71 ( 16%)  51.10 (  6%)   437M ( 11%)
 ipa function summary               :   1.89 (  0%)   0.00 (  0%)   1.90 (  0%)  5969k (  0%)
 ipa dead code removal              :   0.22 (  0%)   0.00 (  0%)   0.22 (  0%)     0  (  0%)
 ipa devirtualization               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 ipa cp                             :   0.55 (  0%)   0.00 (  0%)   0.56 (  0%)  3831k (  0%)
 ipa inlining heuristics            :   0.57 (  0%)   0.03 (  1%)   0.46 (  0%)    20M (  1%)
 ipa comdats                        :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 ipa reference                      :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 ipa profile                        :   5.98 (  1%)   0.07 (  2%)   6.11 (  1%)   108M (  3%)
 ipa pure const                     :   0.57 (  0%)   0.01 (  0%)   0.55 (  0%)  1080  (  0%)
 ipa icf                            :   1.37 (  0%)   0.00 (  0%)   1.37 (  0%)    45k (  0%)
 ipa SRA                            :   4.22 (  1%)   0.01 (  0%)   4.27 (  1%)  6213k (  0%)
 ipa free lang data                 :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 ipa free inline summary            :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 ipa modref                         :   1.33 (  0%)   0.00 (  0%)   1.33 (  0%)  1893k (  0%)
 cfg construction                   :   0.19 (  0%)   0.00 (  0%)   0.13 (  0%)    12M (  0%)
 cfg cleanup                        :   3.35 (  0%)   0.00 (  0%)   3.71 (  0%)  9974k (  0%)
 trivially dead code                :   0.90 (  0%)   0.01 (  0%)   0.77 (  0%)     0  (  0%)
 df scan insns                      :   1.45 (  0%)   0.00 (  0%)   1.39 (  0%)    95k (  0%)
 df reaching defs                   :   1.79 (  0%)   0.00 (  0%)   1.83 (  0%)     0  (  0%)
 df live regs                       :   6.03 (  1%)   0.01 (  0%)   5.78 (  1%)     0  (  0%)
 df live&initialized regs           :   2.55 (  0%)   0.00 (  0%)   2.49 (  0%)     0  (  0%)
 df must-initialized regs           :   0.19 (  0%)   0.00 (  0%)   0.20 (  0%)     0  (  0%)
 df use-def / def-use chains        :   1.13 (  0%)   0.00 (  0%)   1.05 (  0%)     0  (  0%)
 df reg dead/unused notes           :   2.89 (  0%)   0.01 (  0%)   2.79 (  0%)    34M (  1%)
 register information               :   0.45 (  0%)   0.00 (  0%)   0.48 (  0%)     0  (  0%)
 alias analysis                     :   3.00 (  0%)   0.00 (  0%)   2.98 (  0%)   199M (  5%)
 alias stmt walking                 :  28.80 (  4%)   0.42 (  9%)  29.61 (  4%)    74k (  0%)
 register scan                      :   0.50 (  0%)   0.00 (  0%)   0.40 (  0%)  3338k (  0%)
 rebuild jump labels                :   0.46 (  0%)   0.00 (  0%)   0.42 (  0%)  1632  (  0%)
 preprocessing                      :   1.37 (  0%)   0.55 ( 12%)   1.78 (  0%)   129M (  3%)
 parser (global)                    :   1.11 (  0%)   0.28 (  6%)   1.46 (  0%)   166M (  4%)
 parser struct body                 :   0.19 (  0%)   0.01 (  0%)   0.20 (  0%)  5944k (  0%)
 parser enumerator list             :   0.07 (  0%)   0.01 (  0%)   0.08 (  0%)  4065k (  0%)
 parser function body               :  17.07 (  2%)   0.61 ( 14%)  17.82 (  2%)   558M ( 14%)
 parser inl. func. body             :   0.34 (  0%)   0.02 (  0%)   0.31 (  0%) 10198k (  0%)
 parser inl. meth. body             :   0.09 (  0%)   0.01 (  0%)   0.09 (  0%)  3790k (  0%)
 template instantiation             :   0.64 (  0%)   0.07 (  2%)   0.75 (  0%)    15M (  0%)
 constant expression evaluation     :   0.77 (  0%)   0.03 (  1%)   0.77 (  0%)    11M (  0%)
 early inlining heuristics          :   0.13 (  0%)   0.00 (  0%)   0.12 (  0%)  5073k (  0%)
 inline parameters                  :   2.43 (  0%)   0.00 (  0%)   2.25 (  0%)    13M (  0%)
 integration                        :   0.79 (  0%)   0.03 (  1%)   0.71 (  0%)    72M (  2%)
 tree gimplify                      :   3.46 (  0%)   0.04 (  1%)   3.51 (  0%)   209M (  5%)
 tree eh                            :   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)   182k (  0%)
 tree CFG construction              :   0.99 (  0%)   0.03 (  1%)   1.06 (  0%)    83M (  2%)
 tree CFG cleanup                   :   8.09 (  1%)   0.09 (  2%)   8.39 (  1%)  4098k (  0%)
 tree tail merge                    :   0.92 (  0%)   0.00 (  0%)   0.98 (  0%)  2247k (  0%)
 tree VRP                           :  19.01 (  2%)   0.05 (  1%)  18.92 (  2%)    11M (  0%)
 tree Early VRP                     :   5.70 (  1%)   0.02 (  0%)   5.76 (  1%)  3714k (  0%)
 tree copy propagation              :   1.64 (  0%)   0.02 (  0%)   1.44 (  0%)    13k (  0%)
 tree PTA                           :   9.13 (  1%)   0.02 (  0%)   8.79 (  1%)    15M (  0%)
 tree SSA other                     :   0.00 (  0%)   0.01 (  0%)   0.01 (  0%)    35k (  0%)
 tree SSA rewrite                   :   0.71 (  0%)   0.09 (  2%)   0.64 (  0%)    42M (  1%)
 tree SSA incremental               :   2.74 (  0%)   0.02 (  0%)   2.62 (  0%)    50M (  1%)
 tree operand scan                  :   1.65 (  0%)   0.09 (  2%)   1.39 (  0%)    99M (  2%)
 dominator optimization             :  28.19 (  3%)   0.23 (  5%)  28.75 (  4%)    75M (  2%)
 backwards jump threading           :   3.83 (  0%)   0.05 (  1%)   3.95 (  0%)    18M (  0%)
 tree SRA                           :   0.09 (  0%)   0.01 (  0%)   0.14 (  0%)  1656k (  0%)
 isolate eroneous paths             :   0.19 (  0%)   0.00 (  0%)   0.23 (  0%)     0  (  0%)
 tree CCP                           :  13.48 (  2%)   0.04 (  1%)  13.51 (  2%)    43M (  1%)
 tree split crit edges              :   0.12 (  0%)   0.00 (  0%)   0.10 (  0%)    12M (  0%)
 tree reassociation                 :   0.50 (  0%)   0.00 (  0%)   0.52 (  0%)   164k (  0%)
 tree PRE                           :  11.72 (  1%)   0.12 (  3%)  12.08 (  1%)    75M (  2%)
 tree FRE                           :  18.87 (  2%)   0.21 (  5%)  18.32 (  2%)    68M (  2%)
 tree code sinking                  :   1.08 (  0%)   0.00 (  0%)   1.09 (  0%)    17M (  0%)
 tree linearize phis                :   0.40 (  0%)   0.00 (  0%)   0.56 (  0%)  2328k (  0%)
 tree backward propagate            :   0.13 (  0%)   0.01 (  0%)   0.10 (  0%)     0  (  0%)
 tree forward propagate             :   7.79 (  1%)   0.17 (  4%)   7.91 (  1%)  9665k (  0%)
 tree phiprop                       :   0.06 (  0%)   0.00 (  0%)   0.08 (  0%)    12k (  0%)
 tree conservative DCE              :   1.66 (  0%)   0.02 (  0%)   1.74 (  0%)   301k (  0%)
 tree aggressive DCE                :   0.83 (  0%)   0.04 (  1%)   1.04 (  0%) 10184k (  0%)
 tree buildin call DCE              :   0.11 (  0%)   0.00 (  0%)   0.06 (  0%)     0  (  0%)
 tree DSE                           :   5.12 (  1%)   0.00 (  0%)   4.97 (  1%)  1229k (  0%)
 PHI merge                          :   0.18 (  0%)   0.00 (  0%)   0.21 (  0%)    10M (  0%)
 tree slp vectorization             :   6.18 (  1%)   0.01 (  0%)   6.27 (  1%)   101M (  3%)
 tree SSA uncprop                   :   0.30 (  0%)   0.00 (  0%)   0.33 (  0%)     0  (  0%)
 tree NRV optimization              :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)    47k (  0%)
 tree switch conversion             :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)    40k (  0%)
 tree switch lowering               :   0.10 (  0%)   0.00 (  0%)   0.17 (  0%)  7090k (  0%)
 gimple CSE sin/cos                 :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 gimple expand pow/cabs             :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 gimple widening/fma detection      :   0.10 (  0%)   0.00 (  0%)   0.10 (  0%)     0  (  0%)
 tree strlen optimization           :   0.29 (  0%)   0.00 (  0%)   0.35 (  0%)  2144k (  0%)
 tree modref                        :   2.05 (  0%)   0.00 (  0%)   1.84 (  0%)  2098k (  0%)
 dominance frontiers                :   0.21 (  0%)   0.00 (  0%)   0.17 (  0%)     0  (  0%)
 dominance computation              :   3.26 (  0%)   0.04 (  1%)   3.83 (  0%)     0  (  0%)
 control dependences                :   0.11 (  0%)   0.00 (  0%)   0.07 (  0%)     0  (  0%)
 out of ssa                         :   1.35 (  0%)   0.02 (  0%)   1.24 (  0%)  1511k (  0%)
 expand vars                        :   0.39 (  0%)   0.00 (  0%)   0.49 (  0%)    38M (  1%)
 expand                             :   8.79 (  1%)   0.09 (  2%)   8.70 (  1%)   438M ( 11%)
 post expand cleanups               :   0.42 (  0%)   0.00 (  0%)   0.37 (  0%)  6047k (  0%)
 varconst                           :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    22k (  0%)
 lower subreg                       :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 forward prop                       :   7.96 (  1%)   0.01 (  0%)   8.12 (  1%)    13M (  0%)
 CSE                                :   9.63 (  1%)   0.01 (  0%)   9.91 (  1%)    52M (  1%)
 dead code elimination              :   0.70 (  0%)   0.00 (  0%)   0.50 (  0%)     0  (  0%)
 dead store elim1                   :   2.00 (  0%)   0.00 (  0%)   1.99 (  0%)    28M (  1%)
 dead store elim2                   :   1.75 (  0%)   0.00 (  0%)   1.87 (  0%)    22M (  1%)
 loop analysis                      :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 loop init                          :   1.90 (  0%)   0.02 (  0%)   2.57 (  0%)    10M (  0%)
 loop fini                          :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)     0  (  0%)
 CPROP                              :   3.54 (  0%)   0.02 (  0%)   3.36 (  0%)    38M (  1%)
 PRE                                : 444.33 ( 55%)   0.19 (  4%) 444.86 ( 54%)   216k (  0%)
 CSE 2                              :   6.32 (  1%)   0.00 (  0%)   6.31 (  1%)  6346k (  0%)
 branch prediction                  :   0.71 (  0%)   0.00 (  0%)   0.79 (  0%)   264k (  0%)
 combiner                           :  10.33 (  1%)   0.00 (  0%)  10.45 (  1%)    90M (  2%)
 if-conversion                      :   0.40 (  0%)   0.00 (  0%)   0.38 (  0%)  2073k (  0%)
 integrated RA                      :  10.95 (  1%)   0.03 (  1%)  10.87 (  1%)   398M ( 10%)
 LRA non-specific                   :   2.38 (  0%)   0.03 (  1%)   2.35 (  0%)  2272k (  0%)
 LRA virtuals elimination           :   0.30 (  0%)   0.00 (  0%)   0.29 (  0%)    68k (  0%)
 LRA reload inheritance             :   0.28 (  0%)   0.00 (  0%)   0.42 (  0%)    58k (  0%)
 LRA create live ranges             :   0.65 (  0%)   0.00 (  0%)   0.76 (  0%)    40k (  0%)
 LRA hard reg assignment            :   0.11 (  0%)   0.00 (  0%)   0.18 (  0%)     0  (  0%)
 LRA rematerialization              :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 reload                             :   0.06 (  0%)   0.00 (  0%)   0.05 (  0%)   155k (  0%)
 reload CSE regs                    :   4.29 (  1%)   0.03 (  1%)   4.59 (  1%)    42M (  1%)
 ree                                :   0.18 (  0%)   0.00 (  0%)   0.26 (  0%)    38k (  0%)
 thread pro- & epilogue             :   1.05 (  0%)   0.00 (  0%)   1.03 (  0%)  4436k (  0%)
 if-conversion 2                    :   0.21 (  0%)   0.00 (  0%)   0.31 (  0%)   443k (  0%)
 combine stack adjustments          :   0.22 (  0%)   0.00 (  0%)   0.19 (  0%)     0  (  0%)
 peephole 2                         :   0.55 (  0%)   0.00 (  0%)   0.60 (  0%)  2144k (  0%)
 hard reg cprop                     :   0.74 (  0%)   0.00 (  0%)   0.66 (  0%)  5376  (  0%)
 scheduling 2                       :   8.68 (  1%)   0.01 (  0%)   8.90 (  1%)    10M (  0%)
 machine dep reorg                  :   0.75 (  0%)   0.01 (  0%)   0.68 (  0%)     0  (  0%)
 reorder blocks                     :   1.06 (  0%)   0.01 (  0%)   1.05 (  0%)    26M (  1%)
 shorten branches                   :   0.94 (  0%)   0.01 (  0%)   0.96 (  0%)     0  (  0%)
 reg stack                          :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 final                              :   1.48 (  0%)   0.03 (  1%)   1.34 (  0%)    47M (  1%)
 variable output                    :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)   609k (  0%)
 symout                             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 tree if-combine                    :   0.07 (  0%)   0.00 (  0%)   0.06 (  0%)     0  (  0%)
 if to switch conversion            :   0.17 (  0%)   0.00 (  0%)   0.18 (  0%)  5480  (  0%)
 uninit var analysis                :   0.37 (  0%)   0.00 (  0%)   0.57 (  0%)     0  (  0%)
 straight-line strength reduction   :   1.44 (  0%)   0.00 (  0%)   1.19 (  0%)  1285k (  0%)
 store merging                      :   0.24 (  0%)   0.00 (  0%)   0.20 (  0%)  1171k (  0%)
 initialize rtl                     :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    12k (  0%)
 address lowering                   :   0.04 (  0%)   0.00 (  0%)   0.01 (  0%)    30k (  0%)
 access analysis                    :   1.20 (  0%)   0.03 (  1%)   1.11 (  0%)    64k (  0%)
 early local passes                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 rest of compilation                :   3.87 (  0%)   0.03 (  1%)   4.34 (  1%)    11M (  0%)
 remove unused locals               :   1.01 (  0%)   0.02 (  0%)   0.91 (  0%)     0  (  0%)
 address taken                      :   1.36 (  0%)   0.01 (  0%)   1.31 (  0%)     0  (  0%)
 rebuild frequencies                :   0.36 (  0%)   0.01 (  0%)   0.41 (  0%)  7536  (  0%)
 repair loop structures             :   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 TOTAL                              : 813.78          4.49        819.01         4018M
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

real    13m39,057s
user    13m33,790s
sys     0m4,533s
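A report like this can be triaged mechanically. The C++ sketch below is hypothetical (parse_time_report, top_timers and percent_of_total are not part of GCC), and it assumes only the row layout visible in this output: a timer name, a colon, then the user time in seconds. On this report it ranks "callgraph functions expansion" first (an aggregate that contains the per-function passes) and RTL PRE second, at 444.33 / 813.78 ≈ 54.6% of total user time, which the report rounds to 55%.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One row of a -ftime-report table: timer name and user ("usr") seconds.
struct TimeVar {
    std::string name;
    double usr_seconds;
};

// Parse rows of the form "<name> : <usr> ( NN%) <sys> ...". Lines without a
// colon or without a number after it (e.g. the column header) are skipped.
// A leading '|' (used for nested timers such as "|name lookup") is dropped.
std::vector<TimeVar> parse_time_report(const std::string& report) {
    std::vector<TimeVar> rows;
    std::istringstream lines(report);
    std::string line;
    while (std::getline(lines, line)) {
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::string name = line.substr(0, colon);
        const auto first = name.find_first_not_of(" |");
        const auto last = name.find_last_not_of(' ');
        if (first == std::string::npos) continue;
        name = name.substr(first, last - first + 1);
        std::istringstream rest(line.substr(colon + 1));
        double usr = 0.0;
        if (rest >> usr) rows.push_back({name, usr});
    }
    return rows;
}

// Up to `n` timers sorted by user time, ignoring "phase ..." rows and TOTAL.
// Other aggregate rows (e.g. "callgraph functions expansion") are kept as-is.
std::vector<TimeVar> top_timers(const std::vector<TimeVar>& rows, std::size_t n) {
    std::vector<TimeVar> picked;
    for (const auto& r : rows)
        if (r.name != "TOTAL" && r.name.rfind("phase ", 0) != 0) picked.push_back(r);
    std::sort(picked.begin(), picked.end(), [](const TimeVar& a, const TimeVar& b) {
        return a.usr_seconds > b.usr_seconds;
    });
    if (picked.size() > n) picked.resize(n);
    return picked;
}

// Percentage of the TOTAL user time spent in `name`; -1 if either row is missing.
double percent_of_total(const std::vector<TimeVar>& rows, const std::string& name) {
    double total = -1.0, value = -1.0;
    for (const auto& r : rows) {
        if (r.name == "TOTAL") total = r.usr_seconds;
        if (r.name == name) value = r.usr_seconds;
    }
    if (total <= 0.0 || value < 0.0) return -1.0;
    return 100.0 * value / total;
}
```

Only user time is ranked here; the sys, wall and GGC columns could be parsed the same way if memory rather than CPU time were the concern.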
Comment 2 Andrew Pinski 2023-09-27 22:26:42 UTC
Can you also try with --enable-checking=release to double check that it is not the extra compile time checks which is causing issues ...
Comment 3 Andrew Pinski 2023-09-27 22:27:26 UTC
Note prev-gcc/cc1plus is compiled at -O0 also which definitely makes things worse here.
Comment 4 Sergei Trofimovich 2023-09-28 05:18:28 UTC
(In reply to Andrew Pinski from comment #2)
> Can you also try with --enable-checking=release to double check that it is
> not the extra compile time checks which is causing issues ...

Added --enable-checking=release:

$ /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -v
Reading specs from /tmp/gb/./prev-gcc/specs
COLLECT_GCC=/tmp/gb/./prev-gcc/xg++
COLLECT_LTO_WRAPPER=/tmp/gb/./prev-gcc/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/slyfox/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++ --enable-checking=release
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230926 (experimental) (GCC)

Result did not change much:

$ time /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fno-checking -gtoggle -fprofile-generate -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -DHAVE_CONFIG_H -fno-PIE -I. -I. -I/home/slyfox/dev/git/gcc/gcc -I/home/slyfox/dev/git/gcc/gcc/. -I/home/slyfox/dev/git/gcc/gcc/../include -I/home/slyfox/dev/git/gcc/gcc/../libcpp/include -I/home/slyfox/dev/git/gcc/gcc/../libcody -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libbacktrace -o insn-recog.o -MT insn-recog.o -MMD -MP -MF ./.deps/insn-recog.TPo insn-recog.cc

real    12m18,994s
user    12m17,085s
sys     0m1,001s
Comment 5 Sergei Trofimovich 2023-09-28 05:35:25 UTC
(In reply to Andrew Pinski from comment #3)
> Note prev-gcc/cc1plus is compiled at -O0 also which definitely makes things
> worse here.

Also tried with: '--enable-checking=release -O2 -g' as:

$ ~/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++ --enable-checking=release 'CC=gcc -g -O2' 'CXX=g++ -g -O2'

$ /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -v
Reading specs from /tmp/gb/./prev-gcc/specs
COLLECT_GCC=/tmp/gb/./prev-gcc/xg++
COLLECT_LTO_WRAPPER=/tmp/gb/./prev-gcc/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/slyfox/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++ --enable-checking=release CC='gcc -g -O2' CXX='g++ -g -O2'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230926 (experimental) (GCC)

Result is a lot better: 1m55s:

$ time /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fno-checking -gtoggle -fprofile-generate -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -DHAVE_CONFIG_H -fno-PIE -I. -I. -I/home/slyfox/dev/git/gcc/gcc -I/home/slyfox/dev/git/gcc/gcc/. -I/home/slyfox/dev/git/gcc/gcc/../include -I/home/slyfox/dev/git/gcc/gcc/../libcpp/include -I/home/slyfox/dev/git/gcc/gcc/../libcody -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libbacktrace -o insn-recog.o -MT insn-recog.o -MMD -MP -MF ./.deps/insn-recog.TPo insn-recog.cc

real    1m55,334s
user    1m54,146s
sys     0m0,993s
Comment 6 Sergei Trofimovich 2023-09-28 05:46:19 UTC
And here, for completeness, is default checking with CC='gcc -g -O2' CXX='g++ -g -O2':

$ ~/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++ 'CC=gcc -g -O2' 'CXX=g++ -g -O2'

$ /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -v
Reading specs from /tmp/gb/./prev-gcc/specs
COLLECT_GCC=/tmp/gb/./prev-gcc/xg++
COLLECT_LTO_WRAPPER=/tmp/gb/./prev-gcc/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/slyfox/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++ CC='gcc -g -O2' CXX='g++ -g -O2'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230926 (experimental) (GCC)

Result is 1m57s:

$ time /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fno-checking -gtoggle -fprofile-generate -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -fno-PIE -I. -I. -I/home/slyfox/dev/git/gcc/gcc -I/home/slyfox/dev/git/gcc/gcc/. -I/home/slyfox/dev/git/gcc/gcc/../include -I/home/slyfox/dev/git/gcc/gcc/../libcpp/include -I/home/slyfox/dev/git/gcc/gcc/../libcody -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libbacktrace -o insn-recog.o -MT insn-recog.o -MMD -MP -MF ./.deps/insn-recog.TPo insn-recog.cc

real    1m57,549s
user    1m56,617s
sys     0m0,780s
Comment 7 Andrew Pinski 2023-09-28 05:49:06 UTC
I am not sure there is much to be done here, since the issue is that profiledbootstrap uses -O0 for stage1 to make sure we don't run into bugs in the host compiler (though we still run into issues there).
Comment 8 Sergei Trofimovich 2023-09-28 05:50:36 UTC
Looks like it's mainly -O0.

Why not use at least -O1 for bootstrap? Perhaps -O0 was a safe default to work around host compiler bugs in the C days.

But nowadays gcc uses -std=c++11 with quite a lot of abstraction that -O0 does not remove. Maybe an -O1 default that can be disabled (or even -O2) would be better?
Comment 9 Sam James 2023-09-28 06:08:15 UTC
See also the discussion in https://inbox.sourceware.org/gcc-patches/41109217-1bf5-b112-e783-8040196fd410@suse.cz/.
Comment 10 Richard Biener 2023-09-28 06:48:02 UTC
 PRE                                : 444.33 ( 55%)   0.19 (  4%) 444.86 ( 54%)   216k (  0%)

There's a few other bugs about RTL PRE being slow (it's usually the dataflow
parts implemented with big sbitmaps and expression hashes).  I've tried a
few times to get my hands into that but it's somewhat difficult.

For profile-generate we have a lot more memory ops which is what likely
kills us here (lots of bitmap iteration with disabled inlining/optimization).

Is this really a regression in GCC 14?

Note we are already using -fno-checking for building stage2.
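For context on the sbitmap-heavy dataflow mentioned above: RTL PRE's lazy code motion (LCM) solves several bit-vector dataflow problems (availability and anticipatability among them) with one bit per candidate expression. A minimal sketch of just one of them, available expressions solved by round-robin iteration, shows why the cost scales with blocks × predecessors × words-per-bitset × sweeps. This is not GCC's lcm.cc: BitVec, Block, AvailResult and solve_available are hypothetical names, and std::vector<std::uint64_t> merely stands in for sbitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for an sbitmap: one bit per candidate expression, 64 per word.
using BitVec = std::vector<std::uint64_t>;

struct Block {
    std::vector<int> preds;  // indices of predecessor blocks
    BitVec gen;              // expressions computed here and still valid at the end
    BitVec kill;             // expressions whose operands this block modifies
};

struct AvailResult {
    std::vector<BitVec> in, out;
    std::size_t sweeps = 0;    // passes over all blocks, including the final no-change pass
    std::size_t word_ops = 0;  // 64-bit word operations performed
};

// Round-robin solver for forward "available expressions":
//   AVIN[b]  = AND of AVOUT[p] over predecessors p (empty set at the entry)
//   AVOUT[b] = GEN[b] | (AVIN[b] & ~KILL[b])
AvailResult solve_available(const std::vector<Block>& cfg, std::size_t n_exprs, int entry) {
    const std::size_t words = (n_exprs + 63) / 64;
    BitVec all(words, ~std::uint64_t{0});
    if (n_exprs % 64 != 0)  // clear padding bits in the last word
        all.back() = (std::uint64_t{1} << (n_exprs % 64)) - 1;

    AvailResult r;
    r.in.assign(cfg.size(), all);  // optimistic start: everything available
    r.out.assign(cfg.size(), all);

    bool changed = true;
    while (changed) {
        changed = false;
        ++r.sweeps;
        for (std::size_t b = 0; b < cfg.size(); ++b) {
            BitVec in(words, 0);
            if (static_cast<int>(b) != entry) {
                in = all;
                for (int p : cfg[b].preds)
                    for (std::size_t w = 0; w < words; ++w, ++r.word_ops)
                        in[w] &= r.out[p][w];
            }
            BitVec out(words);
            for (std::size_t w = 0; w < words; ++w, ++r.word_ops)
                out[w] = cfg[b].gen[w] | (in[w] & ~cfg[b].kill[w]);
            if (out != r.out[b]) changed = true;
            r.in[b] = std::move(in);
            r.out[b] = std::move(out);
        }
    }
    return r;
}
```

Every sweep touches each predecessor edge once per bitmap word, so when both the block count and the expression count are large (as they plausibly are for big insn-recog.cc functions with the extra memory operations from -fprofile-generate), the inner word loops dominate, and a stage1 compiler built at -O0 runs them without inlining or other optimization.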
Comment 11 Sergei Trofimovich 2023-09-28 07:39:38 UTC
Tried the releases/gcc-12 branch and it is even worse: about 21 minutes. Removing `Regression` from the subject.

$ time /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fno-checking -gtoggle -fprofile-generate -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -DHAVE_CONFIG_H -I. -I. -I/home/slyfox/dev/git/gcc/gcc -I/home/slyfox/dev/git/gcc/gcc/. -I/home/slyfox/dev/git/gcc/gcc/../include -I/home/slyfox/dev/git/gcc/gcc/../libcpp/include -I/home/slyfox/dev/git/gcc/gcc/../libcody -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libbacktrace -o insn-recog.o -MT insn-recog.o -MMD -MP -MF ./.deps/insn-recog.TPo insn-recog.cc

real    21m24,065s
user    21m21,966s
sys     0m1,135s
Comment 12 Richard Biener 2023-09-28 08:30:49 UTC
Yeah, ISTR I fixed the dataflow processing order in PRE's LCM (r14-131-ga322f37a57bc16), maybe that helped.
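Why processing order matters for iterative dataflow: with round-robin sweeps, a forward problem converges fastest when blocks are visited in reverse postorder, because each block then sees its predecessors' updated values within the same sweep (backward problems such as anticipatability are usually iterated in postorder instead). The sketch below assumes a simple forward union problem on a straight-line chain of blocks; postorder and sweeps_to_converge are hypothetical helpers and do not reproduce the r14-131-ga322f37a57bc16 change itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // successor lists, one per block

// Depth-first postorder of the blocks reachable from `entry`.
std::vector<int> postorder(const Graph& succs, int entry) {
    std::vector<int> order;
    std::vector<bool> seen(succs.size(), false);
    std::function<void(int)> dfs = [&](int b) {
        seen[b] = true;
        for (int s : succs[b])
            if (!seen[s]) dfs(s);
        order.push_back(b);  // emitted after all successors are finished
    };
    dfs(entry);
    return order;
}

// Solve the forward "may" problem OUT[b] = GEN[b] | OR(OUT[p] for p in preds(b))
// by round-robin sweeps in the given visiting order. Returns the number of
// sweeps, counting the final sweep that observes no change.
std::size_t sweeps_to_converge(const Graph& succs, const std::vector<std::uint64_t>& gen,
                               const std::vector<int>& order) {
    Graph preds(succs.size());
    for (std::size_t b = 0; b < succs.size(); ++b)
        for (int s : succs[b]) preds[s].push_back(static_cast<int>(b));

    std::vector<std::uint64_t> out(succs.size(), 0);
    std::size_t sweeps = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        ++sweeps;
        for (int b : order) {
            std::uint64_t in = 0;
            for (int p : preds[b]) in |= out[p];
            const std::uint64_t next = gen[b] | in;
            if (next != out[b]) {
                out[b] = next;
                changed = true;
            }
        }
    }
    return sweeps;
}
```

On an 8-block chain where block b generates bit b, reverse postorder needs 2 sweeps (one to compute, one to confirm), while plain postorder needs 9, because each sweep moves information only one block further along the chain.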
Comment 13 Richard Biener 2024-08-19 12:01:38 UTC
Note I also said in this commit:

"The LCM iteration has very many other issues ..."

but I don't exactly remember what I stumbled upon.  Re-profiling (with a release-checking, properly optimized compiler!) will point to bitmap stuff, but the actual iteration performing those operations is the interesting bit to look at.
Comment 14 Robin Dapp 2024-12-18 07:34:37 UTC
Is this still an issue with the insn-recog split?
Comment 15 Sergei Trofimovich 2024-12-18 10:57:11 UTC
It's a lot better now: the worst files, like insn-recog-3.cc, take about 2-3 minutes. Not ideal, but at least CPU load is mostly even across all the cores when gcc is built (no more 10-minute-long tails on single files).

Let's declare it FIXED.