The reproducer on gcc from r14-4300-g1fab05a885a308:

$ ~/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++
$ make profiledbootstrap

insn-recog.o takes ~13 min to build on `AMD Ryzen 9 5950X` CPU:

$ time /tmp/gb/./prev-gcc/cc1plus -quiet -nostdinc++ -I /tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I /tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I /home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -I . -I . -I /home/slyfox/dev/git/gcc/gcc -I /home/slyfox/dev/git/gcc/gcc/. -I /home/slyfox/dev/git/gcc/gcc/../include -I /home/slyfox/dev/git/gcc/gcc/../libcpp/include -I /home/slyfox/dev/git/gcc/gcc/../libcody -I /home/slyfox/dev/git/gcc/gcc/../libdecnumber -I /home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I ../libdecnumber -I /home/slyfox/dev/git/gcc/gcc/../libbacktrace -iprefix /tmp/gb/prev-gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/ -isystem /tmp/gb/./prev-gcc/include -isystem /tmp/gb/./prev-gcc/include-fixed -MMD insn-recog.d -MF ./.deps/insn-recog.TPo -MP -MT insn-recog.o -D_GNU_SOURCE -D IN_GCC -D HAVE_CONFIG_H insn-recog.cc -quiet -dumpbase insn-recog.cc -dumpbase-ext .cc -mtune=generic -march=x86-64 -g -gtoggle -O2 -Wextra -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wsuggest-attribute=format -Wconditionally-supported -Woverloaded-virtual=2 -Wpedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-checking -fprofile-generate -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -fno-common -fno-PIE -o /run/user/1000/ccQK54tL.s

real 13m39,864s
user 13m38,263s
sys 0m0,823s

`insn-recog.cc` is 8.3MB.

$ ./prev-gcc/xgcc -Bprev-gcc -v
Reading specs from prev-gcc/specs
COLLECT_GCC=./prev-gcc/xgcc
COLLECT_LTO_WRAPPER=prev-gcc/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/slyfox/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230926 (experimental) (GCC)
-ftime-report breakdown:

time /tmp/gb/./prev-gcc/cc1plus -quiet -nostdinc++ -I /tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I /tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I /home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -I . -I . -I /home/slyfox/dev/git/gcc/gcc -I /home/slyfox/dev/git/gcc/gcc/. -I /home/slyfox/dev/git/gcc/gcc/../include -I /home/slyfox/dev/git/gcc/gcc/../libcpp/include -I /home/slyfox/dev/git/gcc/gcc/../libcody -I /home/slyfox/dev/git/gcc/gcc/../libdecnumber -I /home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I ../libdecnumber -I /home/slyfox/dev/git/gcc/gcc/../libbacktrace -iprefix /tmp/gb/prev-gcc/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/ -isystem /tmp/gb/./prev-gcc/include -isystem /tmp/gb/./prev-gcc/include-fixed -MMD insn-recog.d -MF ./.deps/insn-recog.TPo -MP -MT insn-recog.o -D_GNU_SOURCE -D IN_GCC -D HAVE_CONFIG_H insn-recog.cc -quiet -dumpbase insn-recog.cc -dumpbase-ext .cc -mtune=generic -march=x86-64 -g -gtoggle -O2 -Wextra -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wsuggest-attribute=format -Wconditionally-supported -Woverloaded-virtual=2 -Wpedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-checking -fprofile-generate -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -fno-common -fno-PIE -o /run/user/1000/ccQK54tL.s -ftime-report

Time variable usr sys wall GGC
 phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1892k ( 0%)
 phase parsing : 22.49 ( 3%) 1.58 ( 35%) 24.09 ( 3%) 903M ( 22%)
 phase lang. deferred : 0.06 ( 0%) 0.01 ( 0%) 0.07 ( 0%) 2268k ( 0%)
 phase opt and generate : 791.23 ( 97%) 2.90 ( 65%) 794.84 ( 97%) 3111M ( 77%)
 |name lookup : 1.20 ( 0%) 0.09 ( 2%) 1.23 ( 0%) 3296k ( 0%)
 |overload resolution : 3.40 ( 0%) 0.18 ( 4%) 3.69 ( 0%) 107M ( 3%)
 garbage collection : 5.82 ( 1%) 0.08 ( 2%) 5.86 ( 1%) 0 ( 0%)
 dump files : 0.24 ( 0%) 0.00 ( 0%) 0.15 ( 0%) 0 ( 0%)
 callgraph construction : 4.41 ( 1%) 0.14 ( 3%) 4.74 ( 1%) 329M ( 8%)
 callgraph optimization : 1.01 ( 0%) 0.03 ( 1%) 1.02 ( 0%) 2938k ( 0%)
 callgraph functions expansion : 734.71 ( 90%) 2.08 ( 46%) 737.44 ( 90%) 2238M ( 56%)
 callgraph ipa passes : 50.35 ( 6%) 0.71 ( 16%) 51.10 ( 6%) 437M ( 11%)
 ipa function summary : 1.89 ( 0%) 0.00 ( 0%) 1.90 ( 0%) 5969k ( 0%)
 ipa dead code removal : 0.22 ( 0%) 0.00 ( 0%) 0.22 ( 0%) 0 ( 0%)
 ipa devirtualization : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%)
 ipa cp : 0.55 ( 0%) 0.00 ( 0%) 0.56 ( 0%) 3831k ( 0%)
 ipa inlining heuristics : 0.57 ( 0%) 0.03 ( 1%) 0.46 ( 0%) 20M ( 1%)
 ipa comdats : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
 ipa reference : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 ( 0%)
 ipa profile : 5.98 ( 1%) 0.07 ( 2%) 6.11 ( 1%) 108M ( 3%)
 ipa pure const : 0.57 ( 0%) 0.01 ( 0%) 0.55 ( 0%) 1080 ( 0%)
 ipa icf : 1.37 ( 0%) 0.00 ( 0%) 1.37 ( 0%) 45k ( 0%)
 ipa SRA : 4.22 ( 1%) 0.01 ( 0%) 4.27 ( 1%) 6213k ( 0%)
 ipa free lang data : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%)
 ipa free inline summary : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
 ipa modref : 1.33 ( 0%) 0.00 ( 0%) 1.33 ( 0%) 1893k ( 0%)
 cfg construction : 0.19 ( 0%) 0.00 ( 0%) 0.13 ( 0%) 12M ( 0%)
 cfg cleanup : 3.35 ( 0%) 0.00 ( 0%) 3.71 ( 0%) 9974k ( 0%)
 trivially dead code : 0.90 ( 0%) 0.01 ( 0%) 0.77 ( 0%) 0 ( 0%)
 df scan insns : 1.45 ( 0%) 0.00 ( 0%) 1.39 ( 0%) 95k ( 0%)
 df reaching defs : 1.79 ( 0%) 0.00 ( 0%) 1.83 ( 0%) 0 ( 0%)
 df live regs : 6.03 ( 1%) 0.01 ( 0%) 5.78 ( 1%) 0 ( 0%)
 df live&initialized regs : 2.55 ( 0%) 0.00 ( 0%) 2.49 ( 0%) 0 ( 0%)
 df must-initialized regs : 0.19 ( 0%) 0.00 ( 0%) 0.20 ( 0%) 0 ( 0%)
 df use-def / def-use chains : 1.13 ( 0%) 0.00 ( 0%) 1.05 ( 0%) 0 ( 0%)
 df reg dead/unused notes : 2.89 ( 0%) 0.01 ( 0%) 2.79 ( 0%) 34M ( 1%)
 register information : 0.45 ( 0%) 0.00 ( 0%) 0.48 ( 0%) 0 ( 0%)
 alias analysis : 3.00 ( 0%) 0.00 ( 0%) 2.98 ( 0%) 199M ( 5%)
 alias stmt walking : 28.80 ( 4%) 0.42 ( 9%) 29.61 ( 4%) 74k ( 0%)
 register scan : 0.50 ( 0%) 0.00 ( 0%) 0.40 ( 0%) 3338k ( 0%)
 rebuild jump labels : 0.46 ( 0%) 0.00 ( 0%) 0.42 ( 0%) 1632 ( 0%)
 preprocessing : 1.37 ( 0%) 0.55 ( 12%) 1.78 ( 0%) 129M ( 3%)
 parser (global) : 1.11 ( 0%) 0.28 ( 6%) 1.46 ( 0%) 166M ( 4%)
 parser struct body : 0.19 ( 0%) 0.01 ( 0%) 0.20 ( 0%) 5944k ( 0%)
 parser enumerator list : 0.07 ( 0%) 0.01 ( 0%) 0.08 ( 0%) 4065k ( 0%)
 parser function body : 17.07 ( 2%) 0.61 ( 14%) 17.82 ( 2%) 558M ( 14%)
 parser inl. func. body : 0.34 ( 0%) 0.02 ( 0%) 0.31 ( 0%) 10198k ( 0%)
 parser inl. meth. body : 0.09 ( 0%) 0.01 ( 0%) 0.09 ( 0%) 3790k ( 0%)
 template instantiation : 0.64 ( 0%) 0.07 ( 2%) 0.75 ( 0%) 15M ( 0%)
 constant expression evaluation : 0.77 ( 0%) 0.03 ( 1%) 0.77 ( 0%) 11M ( 0%)
 early inlining heuristics : 0.13 ( 0%) 0.00 ( 0%) 0.12 ( 0%) 5073k ( 0%)
 inline parameters : 2.43 ( 0%) 0.00 ( 0%) 2.25 ( 0%) 13M ( 0%)
 integration : 0.79 ( 0%) 0.03 ( 1%) 0.71 ( 0%) 72M ( 2%)
 tree gimplify : 3.46 ( 0%) 0.04 ( 1%) 3.51 ( 0%) 209M ( 5%)
 tree eh : 0.02 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 182k ( 0%)
 tree CFG construction : 0.99 ( 0%) 0.03 ( 1%) 1.06 ( 0%) 83M ( 2%)
 tree CFG cleanup : 8.09 ( 1%) 0.09 ( 2%) 8.39 ( 1%) 4098k ( 0%)
 tree tail merge : 0.92 ( 0%) 0.00 ( 0%) 0.98 ( 0%) 2247k ( 0%)
 tree VRP : 19.01 ( 2%) 0.05 ( 1%) 18.92 ( 2%) 11M ( 0%)
 tree Early VRP : 5.70 ( 1%) 0.02 ( 0%) 5.76 ( 1%) 3714k ( 0%)
 tree copy propagation : 1.64 ( 0%) 0.02 ( 0%) 1.44 ( 0%) 13k ( 0%)
 tree PTA : 9.13 ( 1%) 0.02 ( 0%) 8.79 ( 1%) 15M ( 0%)
 tree SSA other : 0.00 ( 0%) 0.01 ( 0%) 0.01 ( 0%) 35k ( 0%)
 tree SSA rewrite : 0.71 ( 0%) 0.09 ( 2%) 0.64 ( 0%) 42M ( 1%)
 tree SSA incremental : 2.74 ( 0%) 0.02 ( 0%) 2.62 ( 0%) 50M ( 1%)
 tree operand scan : 1.65 ( 0%) 0.09 ( 2%) 1.39 ( 0%) 99M ( 2%)
 dominator optimization : 28.19 ( 3%) 0.23 ( 5%) 28.75 ( 4%) 75M ( 2%)
 backwards jump threading : 3.83 ( 0%) 0.05 ( 1%) 3.95 ( 0%) 18M ( 0%)
 tree SRA : 0.09 ( 0%) 0.01 ( 0%) 0.14 ( 0%) 1656k ( 0%)
 isolate eroneous paths : 0.19 ( 0%) 0.00 ( 0%) 0.23 ( 0%) 0 ( 0%)
 tree CCP : 13.48 ( 2%) 0.04 ( 1%) 13.51 ( 2%) 43M ( 1%)
 tree split crit edges : 0.12 ( 0%) 0.00 ( 0%) 0.10 ( 0%) 12M ( 0%)
 tree reassociation : 0.50 ( 0%) 0.00 ( 0%) 0.52 ( 0%) 164k ( 0%)
 tree PRE : 11.72 ( 1%) 0.12 ( 3%) 12.08 ( 1%) 75M ( 2%)
 tree FRE : 18.87 ( 2%) 0.21 ( 5%) 18.32 ( 2%) 68M ( 2%)
 tree code sinking : 1.08 ( 0%) 0.00 ( 0%) 1.09 ( 0%) 17M ( 0%)
 tree linearize phis : 0.40 ( 0%) 0.00 ( 0%) 0.56 ( 0%) 2328k ( 0%)
 tree backward propagate : 0.13 ( 0%) 0.01 ( 0%) 0.10 ( 0%) 0 ( 0%)
 tree forward propagate : 7.79 ( 1%) 0.17 ( 4%) 7.91 ( 1%) 9665k ( 0%)
 tree phiprop : 0.06 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 12k ( 0%)
 tree conservative DCE : 1.66 ( 0%) 0.02 ( 0%) 1.74 ( 0%) 301k ( 0%)
 tree aggressive DCE : 0.83 ( 0%) 0.04 ( 1%) 1.04 ( 0%) 10184k ( 0%)
 tree buildin call DCE : 0.11 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 0 ( 0%)
 tree DSE : 5.12 ( 1%) 0.00 ( 0%) 4.97 ( 1%) 1229k ( 0%)
 PHI merge : 0.18 ( 0%) 0.00 ( 0%) 0.21 ( 0%) 10M ( 0%)
 tree slp vectorization : 6.18 ( 1%) 0.01 ( 0%) 6.27 ( 1%) 101M ( 3%)
 tree SSA uncprop : 0.30 ( 0%) 0.00 ( 0%) 0.33 ( 0%) 0 ( 0%)
 tree NRV optimization : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 47k ( 0%)
 tree switch conversion : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 40k ( 0%)
 tree switch lowering : 0.10 ( 0%) 0.00 ( 0%) 0.17 ( 0%) 7090k ( 0%)
 gimple CSE sin/cos : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%)
 gimple expand pow/cabs : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%)
 gimple widening/fma detection : 0.10 ( 0%) 0.00 ( 0%) 0.10 ( 0%) 0 ( 0%)
 tree strlen optimization : 0.29 ( 0%) 0.00 ( 0%) 0.35 ( 0%) 2144k ( 0%)
 tree modref : 2.05 ( 0%) 0.00 ( 0%) 1.84 ( 0%) 2098k ( 0%)
 dominance frontiers : 0.21 ( 0%) 0.00 ( 0%) 0.17 ( 0%) 0 ( 0%)
 dominance computation : 3.26 ( 0%) 0.04 ( 1%) 3.83 ( 0%) 0 ( 0%)
 control dependences : 0.11 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 0 ( 0%)
 out of ssa : 1.35 ( 0%) 0.02 ( 0%) 1.24 ( 0%) 1511k ( 0%)
 expand vars : 0.39 ( 0%) 0.00 ( 0%) 0.49 ( 0%) 38M ( 1%)
 expand : 8.79 ( 1%) 0.09 ( 2%) 8.70 ( 1%) 438M ( 11%)
 post expand cleanups : 0.42 ( 0%) 0.00 ( 0%) 0.37 ( 0%) 6047k ( 0%)
 varconst : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 22k ( 0%)
 lower subreg : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
 forward prop : 7.96 ( 1%) 0.01 ( 0%) 8.12 ( 1%) 13M ( 0%)
 CSE : 9.63 ( 1%) 0.01 ( 0%) 9.91 ( 1%) 52M ( 1%)
 dead code elimination : 0.70 ( 0%) 0.00 ( 0%) 0.50 ( 0%) 0 ( 0%)
 dead store elim1 : 2.00 ( 0%) 0.00 ( 0%) 1.99 ( 0%) 28M ( 1%)
 dead store elim2 : 1.75 ( 0%) 0.00 ( 0%) 1.87 ( 0%) 22M ( 1%)
 loop analysis : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%)
 loop init : 1.90 ( 0%) 0.02 ( 0%) 2.57 ( 0%) 10M ( 0%)
 loop fini : 0.07 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 0 ( 0%)
 CPROP : 3.54 ( 0%) 0.02 ( 0%) 3.36 ( 0%) 38M ( 1%)
 PRE : 444.33 ( 55%) 0.19 ( 4%) 444.86 ( 54%) 216k ( 0%)
 CSE 2 : 6.32 ( 1%) 0.00 ( 0%) 6.31 ( 1%) 6346k ( 0%)
 branch prediction : 0.71 ( 0%) 0.00 ( 0%) 0.79 ( 0%) 264k ( 0%)
 combiner : 10.33 ( 1%) 0.00 ( 0%) 10.45 ( 1%) 90M ( 2%)
 if-conversion : 0.40 ( 0%) 0.00 ( 0%) 0.38 ( 0%) 2073k ( 0%)
 integrated RA : 10.95 ( 1%) 0.03 ( 1%) 10.87 ( 1%) 398M ( 10%)
 LRA non-specific : 2.38 ( 0%) 0.03 ( 1%) 2.35 ( 0%) 2272k ( 0%)
 LRA virtuals elimination : 0.30 ( 0%) 0.00 ( 0%) 0.29 ( 0%) 68k ( 0%)
 LRA reload inheritance : 0.28 ( 0%) 0.00 ( 0%) 0.42 ( 0%) 58k ( 0%)
 LRA create live ranges : 0.65 ( 0%) 0.00 ( 0%) 0.76 ( 0%) 40k ( 0%)
 LRA hard reg assignment : 0.11 ( 0%) 0.00 ( 0%) 0.18 ( 0%) 0 ( 0%)
 LRA rematerialization : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
 reload : 0.06 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 155k ( 0%)
 reload CSE regs : 4.29 ( 1%) 0.03 ( 1%) 4.59 ( 1%) 42M ( 1%)
 ree : 0.18 ( 0%) 0.00 ( 0%) 0.26 ( 0%) 38k ( 0%)
 thread pro- & epilogue : 1.05 ( 0%) 0.00 ( 0%) 1.03 ( 0%) 4436k ( 0%)
 if-conversion 2 : 0.21 ( 0%) 0.00 ( 0%) 0.31 ( 0%) 443k ( 0%)
 combine stack adjustments : 0.22 ( 0%) 0.00 ( 0%) 0.19 ( 0%) 0 ( 0%)
 peephole 2 : 0.55 ( 0%) 0.00 ( 0%) 0.60 ( 0%) 2144k ( 0%)
 hard reg cprop : 0.74 ( 0%) 0.00 ( 0%) 0.66 ( 0%) 5376 ( 0%)
 scheduling 2 : 8.68 ( 1%) 0.01 ( 0%) 8.90 ( 1%) 10M ( 0%)
 machine dep reorg : 0.75 ( 0%) 0.01 ( 0%) 0.68 ( 0%) 0 ( 0%)
 reorder blocks : 1.06 ( 0%) 0.01 ( 0%) 1.05 ( 0%) 26M ( 1%)
 shorten branches : 0.94 ( 0%) 0.01 ( 0%) 0.96 ( 0%) 0 ( 0%)
 reg stack : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
 final : 1.48 ( 0%) 0.03 ( 1%) 1.34 ( 0%) 47M ( 1%)
 variable output : 0.07 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 609k ( 0%)
 symout : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
 tree if-combine : 0.07 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 0 ( 0%)
 if to switch conversion : 0.17 ( 0%) 0.00 ( 0%) 0.18 ( 0%) 5480 ( 0%)
 uninit var analysis : 0.37 ( 0%) 0.00 ( 0%) 0.57 ( 0%) 0 ( 0%)
 straight-line strength reduction : 1.44 ( 0%) 0.00 ( 0%) 1.19 ( 0%) 1285k ( 0%)
 store merging : 0.24 ( 0%) 0.00 ( 0%) 0.20 ( 0%) 1171k ( 0%)
 initialize rtl : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 12k ( 0%)
 address lowering : 0.04 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 30k ( 0%)
 access analysis : 1.20 ( 0%) 0.03 ( 1%) 1.11 ( 0%) 64k ( 0%)
 early local passes : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
 rest of compilation : 3.87 ( 0%) 0.03 ( 1%) 4.34 ( 1%) 11M ( 0%)
 remove unused locals : 1.01 ( 0%) 0.02 ( 0%) 0.91 ( 0%) 0 ( 0%)
 address taken : 1.36 ( 0%) 0.01 ( 0%) 1.31 ( 0%) 0 ( 0%)
 rebuild frequencies : 0.36 ( 0%) 0.01 ( 0%) 0.41 ( 0%) 7536 ( 0%)
 repair loop structures : 0.03 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%)
 TOTAL : 813.78 4.49 819.01 4018M
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

real 13m39,057s
user 13m33,790s
sys 0m4,533s
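A report this long is easier to read once it is sorted. Below is a minimal sketch, not part of GCC, that pulls the wall-clock column out of -ftime-report text and lists the most expensive timers. The function name `top_passes` and the regular expression are illustrative assumptions about the row layout shown above. Note that rows can overlap (the percentages in the report add up to more than 100%), so aggregate timers such as "callgraph functions expansion" will rank above the passes they contain.

```python
import re

# One -ftime-report row: "name : usr ( p%) sys ( p%) wall ( p%) ggc ( p%)".
# The TOTAL row has no percentages and is intentionally not matched.
ROW = re.compile(
    r"^\s*\|?(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)"
)


def top_passes(report, n=5):
    """Return the n (name, wall_seconds) pairs with the largest wall time.

    "phase ..." rows are skipped because they summarise whole phases.
    """
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m and not m.group("name").startswith("phase"):
            rows.append((m.group("name").strip(), float(m.group("wall"))))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


# A few rows copied from the report above.
sample = """
 phase opt and generate : 791.23 ( 97%) 2.90 ( 65%) 794.84 ( 97%) 3111M ( 77%)
 alias stmt walking : 28.80 ( 4%) 0.42 ( 9%) 29.61 ( 4%) 74k ( 0%)
 dominator optimization : 28.19 ( 3%) 0.23 ( 5%) 28.75 ( 4%) 75M ( 2%)
 PRE : 444.33 ( 55%) 0.19 ( 4%) 444.86 ( 54%) 216k ( 0%)
 TOTAL : 813.78 4.49 819.01 4018M
"""

for name, wall in top_passes(sample, n=3):
    print(f"{wall:8.2f}s  {name}")
```

On the excerpt, PRE comes first by a wide margin, which matches the 54% wall-time share reported for it.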
Can you also try with --enable-checking=release to double check that it is not the extra compile time checks which is causing issues ...
Note prev-gcc/cc1plus is compiled at -O0 also which definitely makes things worse here.
(In reply to Andrew Pinski from comment #2)
> Can you also try with --enable-checking=release to double check that it is
> not the extra compile time checks which is causing issues ...

Added --enable-checking=release:

$ /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -v
Reading specs from /tmp/gb/./prev-gcc/specs
COLLECT_GCC=/tmp/gb/./prev-gcc/xg++
COLLECT_LTO_WRAPPER=/tmp/gb/./prev-gcc/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/slyfox/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++ --enable-checking=release
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230926 (experimental) (GCC)

Result did not change much:

$ time /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fno-checking -gtoggle -fprofile-generate -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -DHAVE_CONFIG_H -fno-PIE -I. -I. -I/home/slyfox/dev/git/gcc/gcc -I/home/slyfox/dev/git/gcc/gcc/. -I/home/slyfox/dev/git/gcc/gcc/../include -I/home/slyfox/dev/git/gcc/gcc/../libcpp/include -I/home/slyfox/dev/git/gcc/gcc/../libcody -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libbacktrace -o insn-recog.o -MT insn-recog.o -MMD -MP -MF ./.deps/insn-recog.TPo insn-recog.cc

real 12m18,994s
user 12m17,085s
sys 0m1,001s
(In reply to Andrew Pinski from comment #3)
> Note prev-gcc/cc1plus is compiled at -O0 also which definitely makes things
> worse here.

Also tried with: '--enable-checking=release -O2 -g' as:

$ ~/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++ --enable-checking=release 'CC=gcc -g -O2' 'CXX=g++ -g -O2'

$ /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -v
Reading specs from /tmp/gb/./prev-gcc/specs
COLLECT_GCC=/tmp/gb/./prev-gcc/xg++
COLLECT_LTO_WRAPPER=/tmp/gb/./prev-gcc/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/slyfox/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++ --enable-checking=release CC='gcc -g -O2' CXX='g++ -g -O2'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230926 (experimental) (GCC)

Result is a lot better: 1m55s:

$ time /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fno-checking -gtoggle -fprofile-generate -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -DHAVE_CONFIG_H -fno-PIE -I. -I. -I/home/slyfox/dev/git/gcc/gcc -I/home/slyfox/dev/git/gcc/gcc/. -I/home/slyfox/dev/git/gcc/gcc/../include -I/home/slyfox/dev/git/gcc/gcc/../libcpp/include -I/home/slyfox/dev/git/gcc/gcc/../libcody -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libbacktrace -o insn-recog.o -MT insn-recog.o -MMD -MP -MF ./.deps/insn-recog.TPo insn-recog.cc

real 1m55,334s
user 1m54,146s
sys 0m0,993s
And here, for completeness, is default checking with CC='gcc -g -O2' CXX='g++ -g -O2':

$ ~/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++ 'CC=gcc -g -O2' 'CXX=g++ -g -O2'

$ /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -v
Reading specs from /tmp/gb/./prev-gcc/specs
COLLECT_GCC=/tmp/gb/./prev-gcc/xg++
COLLECT_LTO_WRAPPER=/tmp/gb/./prev-gcc/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/slyfox/dev/git/gcc/configure --disable-multilib --enable-languages=c,c++ CC='gcc -g -O2' CXX='g++ -g -O2'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230926 (experimental) (GCC)

Result is 1m57s:

$ time /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fno-checking -gtoggle -fprofile-generate -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -fno-PIE -I. -I. -I/home/slyfox/dev/git/gcc/gcc -I/home/slyfox/dev/git/gcc/gcc/. -I/home/slyfox/dev/git/gcc/gcc/../include -I/home/slyfox/dev/git/gcc/gcc/../libcpp/include -I/home/slyfox/dev/git/gcc/gcc/../libcody -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libbacktrace -o insn-recog.o -MT insn-recog.o -MMD -MP -MF ./.deps/insn-recog.TPo insn-recog.cc

real 1m57,549s
user 1m56,617s
sys 0m0,780s
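To put the four timings side by side, here is a small sketch that converts the `real` lines quoted in this thread to seconds and prints each run's speedup over the first one. The helper name `real_seconds` and the run labels are illustrative; the labels only restate the configure options used for each run. The parser accepts the decimal comma that appears in these logs.

```python
import re


def real_seconds(time_output):
    """Return the 'real' wall time in seconds from shell `time` output.

    Accepts a decimal comma (as in the logs in this thread) or a decimal point.
    """
    m = re.search(r"real\s+(\d+)m(\d+)[.,](\d+)s", time_output)
    if m is None:
        raise ValueError("no 'real' line found")
    minutes, seconds, frac = m.groups()
    return int(minutes) * 60 + float(f"{seconds}.{frac}")


# 'real' lines copied from the comments above, keyed by configure options.
runs = {
    "default options": "real 13m39,864s",
    "--enable-checking=release": "real 12m18,994s",
    "--enable-checking=release, CC/CXX with -g -O2": "real 1m55,334s",
    "default checking, CC/CXX with -g -O2": "real 1m57,549s",
}

baseline = real_seconds(runs["default options"])
for label, line in runs.items():
    secs = real_seconds(line)
    print(f"{secs:7.1f}s  {baseline / secs:4.1f}x  {label}")
```

Release checking alone gives roughly a 1.1x improvement, while building the stage1 compiler with -O2 gives roughly 7x, with or without release checking.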
I am not sure there is much to be done here, really, since the issue is that profiledbootstrap uses -O0 for stage1 to make sure we don't run into bugs in the host compiler (though we still run into issues there).
Looks like it's mainly -O0. Why not try to use at least -O1 for bootstrap? Perhaps it was a safe default to work around host compiler bugs in the C days. But nowadays gcc uses -std=c++11 with quite a few abstractions that -O0 does not remove. Maybe -O1 (or even -O2) that can be disabled would be a better default?
See also the discussion in https://inbox.sourceware.org/gcc-patches/41109217-1bf5-b112-e783-8040196fd410@suse.cz/.
PRE : 444.33 ( 55%) 0.19 ( 4%) 444.86 ( 54%) 216k ( 0%)

There are a few other bugs about RTL PRE being slow (it's usually the dataflow parts implemented with big sbitmaps and expression hashes). I've tried a few times to get my hands into that but it's somewhat difficult. For profile-generate we have a lot more memory ops, which is what likely kills us here (lots of bitmap iteration with disabled inlining/optimization).

Is this really a regression in GCC 14? Note we are already using -fno-checking for building stage2.
Tried releases/gcc-12 branch and it's twice as bad: 20 minutes. Removing `Regression` from the subject.

$ time /tmp/gb/./prev-gcc/xg++ -B/tmp/gb/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/slyfox/dev/git/gcc/libstdc++-v3/libsupc++ -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/tmp/gb/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fno-checking -gtoggle -fprofile-generate -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -DHAVE_CONFIG_H -I. -I. -I/home/slyfox/dev/git/gcc/gcc -I/home/slyfox/dev/git/gcc/gcc/. -I/home/slyfox/dev/git/gcc/gcc/../include -I/home/slyfox/dev/git/gcc/gcc/../libcpp/include -I/home/slyfox/dev/git/gcc/gcc/../libcody -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/slyfox/dev/git/gcc/gcc/../libbacktrace -o insn-recog.o -MT insn-recog.o -MMD -MP -MF ./.deps/insn-recog.TPo insn-recog.cc

real 21m24,065s
user 21m21,966s
sys 0m1,135s
Yeah, ISTR I fixed the dataflow processing order in PRE's LCM (r14-131-ga322f37a57bc16), maybe that helped.
Note I also said in this commit: "The LCM iteration has very many other issues ..." but I don't exactly remember what I stumbled upon. Re-profiling (a release checking, properly optimized compiler!) will point to bitmap stuff but the actual iteration performing those is the interesting bit to look at.
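To illustrate why the processing order of a bit-vector dataflow iteration matters, here is a textbook-style sketch. It is not GCC's LCM code and makes no claim about what r14-131 changed in detail: it solves a generic forward "available expressions" problem, with Python integers standing in for sbitmaps, and counts how many sweeps over the blocks are needed for two different visiting orders. The function name `solve_available` and the toy straight-line CFG are assumptions made for the example.

```python
def solve_available(preds, gen, kill, order, universe):
    """Round-robin solver for a forward 'available expressions' problem.

    Each set is a Python int used as a bit vector.  Blocks are visited in
    `order` until a full sweep changes nothing.
    Returns (avout, number_of_sweeps).
    """
    n = len(preds)
    avout = [universe] * n  # optimistic initial guess
    sweeps = 0
    changed = True
    while changed:
        changed = False
        sweeps += 1
        for b in order:
            # AVIN is the intersection over predecessors; empty at the entry.
            avin = universe if preds[b] else 0
            for p in preds[b]:
                avin &= avout[p]
            new_out = gen[b] | (avin & ~kill[b])
            if new_out != avout[b]:
                avout[b] = new_out
                changed = True
    return avout, sweeps


# Toy CFG: a straight chain 0 -> 1 -> ... -> 5 that generates nothing.
n = 6
preds = [[]] + [[i - 1] for i in range(1, n)]
gen = [0] * n
kill = [0] * n
universe = 0b1111

forward = list(range(n))   # follows the edges (reverse postorder here)
backward = forward[::-1]   # against the edges

print(solve_available(preds, gen, kill, forward, universe)[1])   # sweeps
print(solve_available(preds, gen, kill, backward, universe)[1])  # sweeps
```

Both orders reach the same solution, but visiting blocks along the edges converges in 2 sweeps on this chain, while visiting them against the edges needs one sweep per block plus a final confirming sweep (7 here). On a function with many blocks and wide bit vectors, that difference multiplies the amount of bitmap work.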
Is this still an issue with the insn-recog split?
It's a lot better now: worst files take about 2-3 minutes, like insn-recog-3.cc. Not ideal, but at least CPU load is mostly even across all the cores when gcc is built (no more 10-min long tails on single files). Let's declare it FIXED.