While looking for files which take a long time to compile, I remembered that I had a fold-const.i for a cross compiler to powerpc64-linux-gnu which took about a minute to build (this was back in June). So I decided to check whether compile time had increased or decreased since then. It looks like compile time has increased, although memory usage appears to have decreased.

[zhivago2:~/src/testspeed] pinskia% time ~/fsf-clean-nocheck/bin/gcc -O3 fold-const.i -ftime-report -S
Execution times (seconds)
garbage collection : 2.38 ( 2%) usr 0.06 ( 0%) sys 2.51 ( 2%) wall
callgraph construction: 0.27 ( 0%) usr 0.03 ( 0%) sys 0.31 ( 0%) wall
cfg construction : 0.46 ( 0%) usr 0.17 ( 1%) sys 0.65 ( 1%) wall
cfg cleanup : 0.80 ( 1%) usr 0.03 ( 0%) sys 0.83 ( 1%) wall
trivially dead code : 2.16 ( 2%) usr 0.06 ( 0%) sys 2.39 ( 2%) wall
life analysis : 2.23 ( 2%) usr 0.36 ( 2%) sys 2.74 ( 2%) wall
life info update : 2.76 ( 3%) usr 0.01 ( 0%) sys 2.89 ( 3%) wall
alias analysis : 2.17 ( 2%) usr 0.07 ( 0%) sys 2.23 ( 2%) wall
register scan : 1.38 ( 1%) usr 0.02 ( 0%) sys 1.44 ( 1%) wall
rebuild jump labels : 0.55 ( 1%) usr 0.00 ( 0%) sys 0.59 ( 1%) wall
preprocessing : 0.64 ( 1%) usr 1.68 (11%) sys 2.44 ( 2%) wall
lexical analysis : 1.18 ( 1%) usr 3.80 (25%) sys 5.23 ( 5%) wall
parser : 1.26 ( 1%) usr 1.94 (13%) sys 3.18 ( 3%) wall
expand : 1.13 ( 1%) usr 0.03 ( 0%) sys 1.20 ( 1%) wall
varconst : 0.00 ( 0%) usr 0.03 ( 0%) sys 0.03 ( 0%) wall
integration : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall
jump : 1.89 ( 2%) usr 0.20 ( 1%) sys 2.05 ( 2%) wall
CSE : 4.46 ( 5%) usr 0.08 ( 1%) sys 4.55 ( 4%) wall
global CSE : 18.95 (20%) usr 1.96 (13%) sys 21.54 (19%) wall
loop analysis : 0.66 ( 1%) usr 0.20 ( 1%) sys 0.90 ( 1%) wall
bypass jumps : 0.99 ( 1%) usr 0.26 ( 2%) sys 1.24 ( 1%) wall
web : 1.59 ( 2%) usr 0.07 ( 0%) sys 1.65 ( 1%) wall
CSE 2 : 2.12 ( 2%) usr 0.03 ( 0%) sys 2.25 ( 2%) wall
branch prediction : 0.58 ( 1%) usr 0.03 ( 0%) sys 0.61 ( 1%) wall
flow analysis : 0.08 ( 0%) usr 0.01 ( 0%) sys 0.10 ( 0%) wall
combiner : 5.61 ( 6%) usr 0.04 ( 0%) sys 5.74 ( 5%) wall
if-conversion : 0.20 ( 0%) usr 0.02 ( 0%) sys 0.23 ( 0%) wall
regmove : 0.38 ( 0%) usr 0.00 ( 0%) sys 0.38 ( 0%) wall
scheduling : 9.00 ( 9%) usr 3.16 (21%) sys 12.43 (11%) wall
local alloc : 21.82 (23%) usr 0.17 ( 1%) sys 22.41 (20%) wall
global alloc : 2.74 ( 3%) usr 0.31 ( 2%) sys 3.09 ( 3%) wall
reload CSE regs : 1.75 ( 2%) usr 0.02 ( 0%) sys 1.82 ( 2%) wall
flow 2 : 0.20 ( 0%) usr 0.03 ( 0%) sys 0.20 ( 0%) wall
if-conversion 2 : 0.12 ( 0%) usr 0.02 ( 0%) sys 0.13 ( 0%) wall
rename registers : 0.95 ( 1%) usr 0.07 ( 0%) sys 1.11 ( 1%) wall
scheduling 2 : 1.14 ( 1%) usr 0.07 ( 0%) sys 1.26 ( 1%) wall
reorder blocks : 0.14 ( 0%) usr 0.01 ( 0%) sys 0.13 ( 0%) wall
shorten branches : 0.19 ( 0%) usr 0.02 ( 0%) sys 0.19 ( 0%) wall
final : 0.55 ( 1%) usr 0.02 ( 0%) sys 0.61 ( 1%) wall
symout : 0.00 ( 0%) usr 0.06 ( 0%) sys 0.06 ( 0%) wall
rest of compilation : 0.89 ( 1%) usr 0.10 ( 1%) sys 1.03 ( 1%) wall
TOTAL : 96.60 15.30 114.68
96.610u 15.360s 1:55.56 96.8% 0+0k 1+6io 0pf+0w

[zhivago2:~/src/testspeed] pinskia% time ~/gcc-3.3/bin/gcc -O3 fold-const.i -ftime-report -S
Execution times (seconds)
garbage collection : 2.67 ( 4%) usr 0.07 ( 0%) sys 2.75 ( 3%) wall
cfg construction : 0.36 ( 1%) usr 0.09 ( 1%) sys 0.50 ( 1%) wall
cfg cleanup : 0.51 ( 1%) usr 0.02 ( 0%) sys 0.50 ( 1%) wall
trivially dead code : 2.18 ( 3%) usr 0.02 ( 0%) sys 2.38 ( 3%) wall
life analysis : 2.65 ( 4%) usr 0.61 ( 4%) sys 3.00 ( 3%) wall
life info update : 2.13 ( 3%) usr 0.53 ( 4%) sys 3.12 ( 4%) wall
preprocessing : 0.74 ( 1%) usr 1.87 (12%) sys 2.38 ( 3%) wall
lexical analysis : 1.31 ( 2%) usr 3.45 (23%) sys 5.88 ( 7%) wall
parser : 1.18 ( 2%) usr 2.00 (13%) sys 2.38 ( 3%) wall
expand : 0.80 ( 1%) usr 0.12 ( 1%) sys 1.12 ( 1%) wall
varconst : 0.01 ( 0%) usr 0.03 ( 0%) sys 0.12 ( 0%) wall
integration : 0.13 ( 0%) usr 0.02 ( 0%) sys 0.12 ( 0%) wall
jump : 1.96 ( 3%) usr 0.18 ( 1%) sys 2.50 ( 3%) wall
CSE : 5.08 ( 7%) usr 0.77 ( 5%) sys 5.50 ( 6%) wall
global CSE : 7.69 (11%) usr 1.35 ( 9%) sys 9.00 (10%) wall
loop analysis : 0.86 ( 1%) usr 0.23 ( 2%) sys 1.25 ( 1%) wall
CSE 2 : 2.35 ( 3%) usr 0.39 ( 3%) sys 2.12 ( 2%) wall
branch prediction : 0.86 ( 1%) usr 0.01 ( 0%) sys 1.12 ( 1%) wall
flow analysis : 0.23 ( 0%) usr 0.01 ( 0%) sys 0.12 ( 0%) wall
combiner : 5.21 ( 7%) usr 0.16 ( 1%) sys 6.50 ( 7%) wall
if-conversion : 0.20 ( 0%) usr 0.02 ( 0%) sys 0.62 ( 1%) wall
regmove : 0.40 ( 1%) usr 0.01 ( 0%) sys 0.38 ( 0%) wall
scheduling : 6.04 ( 9%) usr 2.25 (15%) sys 8.75 (10%) wall
local alloc : 15.43 (22%) usr 0.14 ( 1%) sys 15.75 (18%) wall
global alloc : 2.45 ( 3%) usr 0.34 ( 2%) sys 2.38 ( 3%) wall
reload CSE regs : 2.60 ( 4%) usr 0.03 ( 0%) sys 2.62 ( 3%) wall
flow 2 : 0.31 ( 0%) usr 0.03 ( 0%) sys 0.25 ( 0%) wall
if-conversion 2 : 0.06 ( 0%) usr 0.03 ( 0%) sys 0.00 ( 0%) wall
rename registers : 1.14 ( 2%) usr 0.04 ( 0%) sys 1.25 ( 1%) wall
scheduling 2 : 0.92 ( 1%) usr 0.02 ( 0%) sys 0.88 ( 1%) wall
reorder blocks : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
shorten branches : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.38 ( 0%) wall
final : 0.38 ( 1%) usr 0.08 ( 1%) sys 0.50 ( 1%) wall
symout : 0.02 ( 0%) usr 0.02 ( 0%) sys 0.00 ( 0%) wall
rest of compilation : 0.95 ( 1%) usr 0.05 ( 0%) sys 0.88 ( 1%) wall
TOTAL : 70.14 15.03 87.12
70.160u 15.080s 1:27.50 97.4% 0+0k 0+7io 0pf+0w

I will attach fold-const.i.
Created attachment 5655 [details] fold-const.i from June
GCSE is slower because of a change I made so that Darwin also uses a 64-bit HOST_WIDE_INT, which was needed for the -mpowerpc64 changes. In particular, lshrdi3 shows up high in the profiles, so expanding it inline would help.
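To illustrate why lshrdi3 can become hot: when the compiler's host words are 32 bits wide, a 64-bit logical right shift by a variable amount may be emitted as a call to a libgcc helper rather than as a single instruction. The sketch below shows roughly the work such a helper has to do using only 32-bit halves. The function name `lshr64_via_halves` is made up for this example, and this is not the libgcc source.

```c
#include <stdint.h>

/* Hypothetical illustration (not libgcc code): a 64-bit logical right
   shift expressed with 32-bit halves, as a 32-bit host has to do it.  */
uint64_t lshr64_via_halves(uint64_t value, unsigned count)
{
    /* Split the operand into its high and low 32-bit words.  */
    uint32_t hi = (uint32_t)(value >> 32);
    uint32_t lo = (uint32_t)(value & 0xffffffffu);
    uint32_t out_hi, out_lo;

    count &= 63;                      /* keep the shift count in range */
    if (count == 0) {
        out_hi = hi;
        out_lo = lo;
    } else if (count < 32) {
        /* Bits shifted out of the high word move into the low word.  */
        out_lo = (lo >> count) | (hi << (32 - count));
        out_hi = hi >> count;
    } else {
        /* The low word is shifted out entirely.  */
        out_lo = hi >> (count - 32);
        out_hi = 0;
    }
    return ((uint64_t)out_hi << 32) | out_lo;
}
```

Every bitmap operation that shifts a 64-bit element by a variable amount pays for this branchy sequence plus the call overhead, which is consistent with the helper showing up in the profile.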
strchr also shows up high in the profiles, because it is no longer inlined as it was before and because the overhead of calling a library function is high on Darwin, where the call has to go through a stub. strchr on Darwin is also not fully optimized, so inlining strchr will help here too.
I know that most of this is caused by my patch. I do not know how to fix it except for the things listed in the above comments. Maybe I will inline lshrdi3 first.
Here are the numbers after the tree-ssa was merged into the mainline (it is faster but still slower than 3.3.3):

[zhivago:~/src/testspeed] pinskia% time ~/fsf-clean-nocheck/bin/gcc -O3 fold-const.i -ftime-report -S
Execution times (seconds)
garbage collection : 3.74 ( 4%) usr 0.01 ( 0%) sys 3.91 ( 3%) wall
callgraph construction: 0.45 ( 1%) usr 0.02 ( 0%) sys 0.48 ( 0%) wall
cfg construction : 0.23 ( 0%) usr 0.06 ( 1%) sys 0.28 ( 0%) wall
cfg cleanup : 0.72 ( 1%) usr 0.01 ( 0%) sys 0.77 ( 1%) wall
trivially dead code : 1.47 ( 2%) usr 0.00 ( 0%) sys 1.52 ( 1%) wall
life analysis : 1.70 ( 2%) usr 0.36 ( 3%) sys 2.23 ( 2%) wall
life info update : 2.04 ( 2%) usr 0.02 ( 0%) sys 2.11 ( 2%) wall
alias analysis : 2.07 ( 2%) usr 0.04 ( 0%) sys 2.01 ( 2%) wall
register scan : 0.95 ( 1%) usr 0.01 ( 0%) sys 1.02 ( 1%) wall
rebuild jump labels : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall
preprocessing : 0.62 ( 1%) usr 1.17 (10%) sys 1.74 ( 1%) wall
lexical analysis : 1.16 ( 1%) usr 1.99 (18%) sys 3.27 ( 3%) wall
parser : 1.31 ( 2%) usr 1.12 (10%) sys 2.70 ( 2%) wall
integration : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall
tree gimplify : 0.85 ( 1%) usr 0.07 ( 1%) sys 0.96 ( 1%) wall
tree eh : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
tree CFG construction : 0.09 ( 0%) usr 0.03 ( 0%) sys 0.15 ( 0%) wall
tree CFG cleanup : 0.52 ( 1%) usr 0.01 ( 0%) sys 0.51 ( 0%) wall
tree PTA : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
tree alias analysis : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
tree PHI insertion : 0.18 ( 0%) usr 0.04 ( 0%) sys 0.26 ( 0%) wall
tree SSA rewrite : 0.40 ( 0%) usr 0.01 ( 0%) sys 0.45 ( 0%) wall
tree SSA other : 1.11 ( 1%) usr 0.40 ( 4%) sys 1.75 ( 2%) wall
tree operand scan : 0.69 ( 1%) usr 0.58 ( 5%) sys 1.06 ( 1%) wall
dominator optimization: 4.23 ( 5%) usr 0.26 ( 2%) sys 4.68 ( 4%) wall
tree CCP : 0.08 ( 0%) usr 0.01 ( 0%) sys 0.15 ( 0%) wall
tree split crit edges : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
tree PRE : 2.06 ( 2%) usr 0.04 ( 0%) sys 2.12 ( 2%) wall
tree forward propagate: 0.08 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
tree conservative DCE : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall
tree aggressive DCE : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
tree DSE : 0.15 ( 0%) usr 0.03 ( 0%) sys 0.17 ( 0%) wall
tree copy headers : 0.27 ( 0%) usr 0.02 ( 0%) sys 0.23 ( 0%) wall
tree SSA to normal : 0.26 ( 0%) usr 0.03 ( 0%) sys 0.35 ( 0%) wall
tree rename SSA copies: 0.06 ( 0%) usr 0.02 ( 0%) sys 0.05 ( 0%) wall
dominance frontiers : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
control dependences : 0.01 ( 0%) usr 0.03 ( 0%) sys 0.06 ( 0%) wall
expand : 3.31 ( 4%) usr 0.04 ( 0%) sys 3.64 ( 3%) wall
varconst : 0.03 ( 0%) usr 0.01 ( 0%) sys 0.01 ( 0%) wall
jump : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
CSE : 3.51 ( 4%) usr 0.00 ( 0%) sys 3.67 ( 3%) wall
global CSE : 13.25 (15%) usr 1.63 (15%) sys 30.48 (26%) wall
loop analysis : 0.48 ( 1%) usr 0.05 ( 0%) sys 0.49 ( 0%) wall
bypass jumps : 0.69 ( 1%) usr 0.18 ( 2%) sys 0.91 ( 1%) wall
CSE 2 : 1.74 ( 2%) usr 0.01 ( 0%) sys 1.81 ( 2%) wall
branch prediction : 0.61 ( 1%) usr 0.09 ( 1%) sys 0.79 ( 1%) wall
flow analysis : 0.09 ( 0%) usr 0.01 ( 0%) sys 0.13 ( 0%) wall
combiner : 4.80 ( 6%) usr 0.11 ( 1%) sys 5.04 ( 4%) wall
if-conversion : 0.19 ( 0%) usr 0.04 ( 0%) sys 0.24 ( 0%) wall
regmove : 0.34 ( 0%) usr 0.00 ( 0%) sys 0.39 ( 0%) wall
scheduling : 6.59 ( 8%) usr 1.79 (16%) sys 8.95 ( 8%) wall
local alloc : 14.06 (16%) usr 0.10 ( 1%) sys 14.50 (12%) wall
global alloc : 2.11 ( 2%) usr 0.25 ( 2%) sys 2.60 ( 2%) wall
reload CSE regs : 1.50 ( 2%) usr 0.01 ( 0%) sys 1.57 ( 1%) wall
flow 2 : 0.13 ( 0%) usr 0.05 ( 0%) sys 0.23 ( 0%) wall
if-conversion 2 : 0.07 ( 0%) usr 0.06 ( 1%) sys 0.13 ( 0%) wall
peephole 2 : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
rename registers : 0.31 ( 0%) usr 0.06 ( 1%) sys 0.36 ( 0%) wall
scheduling 2 : 0.91 ( 1%) usr 0.03 ( 0%) sys 1.05 ( 1%) wall
reorder blocks : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall
shorten branches : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall
final : 0.50 ( 1%) usr 0.03 ( 0%) sys 0.57 ( 0%) wall
symout : 0.02 ( 0%) usr 0.02 ( 0%) sys 0.05 ( 0%) wall
rest of compilation : 1.09 ( 1%) usr 0.14 ( 1%) sys 1.06 ( 1%) wall
TOTAL : 85.55 11.17 116.57
85.560u 11.240s 1:57.87 82.1% 0+0k 2+7io 0pf+0w
Retargeting to 3.4.1, being a regression on that release branch.
I will note that the current SSAPRE only changes one function and only once but takes about 1 second of the total time. Maybe GVN-PRE will help.
OK, I have a patch. Most of the inefficient code comes from sbitmap's use of HOST_WIDE_INT without checking whether that is the most efficient type for the host.
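As a rough picture of why the element type matters, here is a minimal bitmap sketch in the style of a dataflow inner loop. The names `elt_t`, `ELT_BITS`, `bitset_set`, `bitset_test` and `bitset_union` are placeholders for this example and are not GCC's sbitmap API. The assumption is that `unsigned long` matches the host's natural register width.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption for illustration: unsigned long is the widest type the
   host handles in one register.  This is a stand-in, not GCC's
   SBITMAP_ELT_TYPE.  */
typedef unsigned long elt_t;
#define ELT_BITS (sizeof(elt_t) * CHAR_BIT)

/* Set one bit in the bitmap.  */
void bitset_set(elt_t *words, size_t bit)
{
    words[bit / ELT_BITS] |= (elt_t)1 << (bit % ELT_BITS);
}

/* Return nonzero if the bit is set.  */
int bitset_test(const elt_t *words, size_t bit)
{
    return (words[bit / ELT_BITS] >> (bit % ELT_BITS)) & 1;
}

/* dst |= src over n_words elements; returns nonzero if dst changed.
   Loops of this shape dominate iterative dataflow solvers.  */
int bitset_union(elt_t *dst, const elt_t *src, size_t n_words)
{
    int changed = 0;
    for (size_t i = 0; i < n_words; i++) {
        elt_t merged = dst[i] | src[i];
        if (merged != dst[i]) {
            dst[i] = merged;
            changed = 1;
        }
    }
    return changed;
}
```

If `elt_t` were a 64-bit type on a host with 32-bit registers, each OR, compare and shift above would expand into several instructions or a helper call, so halving the element count would not pay for itself.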
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00398.html>.
It's too bad this patch has never been reviewed; it looks like it would help. Postponed until GCC 3.4.2.
Any progress on this one?
Newest patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02339.html>. I also found the other place where we should not be using HOST_WIDE_INT.
Postponed until GCC 3.4.3.
Subject: Bug 13987

CVSROOT: /cvs/gcc
Module name: gcc
Changes by: pinskia@gcc.gnu.org 2004-08-31 00:29:04

Modified files:
gcc : ChangeLog config.host config.in configure configure.ac hard-reg-set.h hwint.h sbitmap.h

Log message:
2004-08-30 Andrew Pinski <apinski@apple.com>

PR rtl-opt/13987
* config.host (use_long_long_for_widest_fast_int): New, default is off.
(ia64-*-hpux*): Enable use_long_long_for_widest_fast_int.
* configure.ac: If use_long_long_for_widest_fast_int, then define USE_LONG_LONG_FOR_WIDEST_FAST_INT.
* configure: Regenerate.
* config.in: Regenerate.
* hwint.h (HOST_WIDEST_FAST_INT, HOST_BITS_PER_WIDEST_FAST_INT): New: widest integer type supported efficiently in hardware for the host.
* sbitmap.h (SBITMAP_ELT_BITS): Define based on HOST_BITS_PER_WIDEST_FAST_INT.
(SBITMAP_ELT_TYPE): Define based on HOST_WIDEST_FAST_INT.
* hard-reg-set.h (HARD_REG_ELT_TYPE): Define based on HOST_WIDEST_FAST_INT instead of HOST_WIDE_INT.
(HARD_REG_SET_LONGS): Likewise.
(UHOST_BITS_PER_WIDE_INT): Likewise.
Change the checks for the fast cases to be based on HOST_BITS_PER_WIDEST_FAST_INT instead of HOST_BITS_PER_WIDE_INT.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.5163&r2=2.5164
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config.host.diff?cvsroot=gcc&r1=2.10&r2=2.11
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config.in.diff?cvsroot=gcc&r1=1.194&r2=1.195
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/configure.diff?cvsroot=gcc&r1=1.848&r2=1.849
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/configure.ac.diff?cvsroot=gcc&r1=2.59&r2=2.60
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/hard-reg-set.h.diff?cvsroot=gcc&r1=1.20&r2=1.21
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/hwint.h.diff?cvsroot=gcc&r1=1.17&r2=1.18
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/sbitmap.h.diff?cvsroot=gcc&r1=1.21&r2=1.22
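For readers who have not looked at the diff, the idea in the log message can be sketched as follows. This is only an illustration of the selection scheme, not the committed text of hwint.h: the names `widest_fast_uint`, `WIDEST_FAST_BITS` and `elements_for_bits` are invented here, and only `USE_LONG_LONG_FOR_WIDEST_FAST_INT` comes from the ChangeLog above.

```c
#include <limits.h>

/* Sketch only.  USE_LONG_LONG_FOR_WIDEST_FAST_INT is the macro named
   in the ChangeLog; per the log it is defined by configure for hosts
   such as ia64-*-hpux* and left undefined by default.  */
#ifdef USE_LONG_LONG_FOR_WIDEST_FAST_INT
typedef unsigned long long widest_fast_uint;   /* hypothetical name */
#else
typedef unsigned long widest_fast_uint;        /* hypothetical name */
#endif

/* Width in bits of the chosen element type (hypothetical name).  */
#define WIDEST_FAST_BITS ((unsigned)(sizeof(widest_fast_uint) * CHAR_BIT))

/* Number of elements needed to hold n_bits bits, rounding up.  */
unsigned elements_for_bits(unsigned n_bits)
{
    return (n_bits + WIDEST_FAST_BITS - 1) / WIDEST_FAST_BITS;
}
```

The point of the separate type is that bit sets can stay on a register-sized element even when HOST_WIDE_INT has to be wider for other reasons.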
Fixed on the mainline, will be backporting when the 3.4 branch opens.
Subject: Bug 13987

CVSROOT: /cvs/gcc
Module name: gcc
Branch: apple-ppc-branch
Changes by: geoffk@gcc.gnu.org 2004-09-07 21:44:33

Modified files:
gcc : ChangeLog config.host config.in configure configure.ac hard-reg-set.h hwint.h sbitmap.h

Log message:
2004-08-30 Andrew Pinski <apinski@apple.com>

PR rtl-opt/13987
* config.host (use_long_long_for_widest_fast_int): New, default is off.
(ia64-*-hpux*): Enable use_long_long_for_widest_fast_int.
* configure.ac: If use_long_long_for_widest_fast_int, then define USE_LONG_LONG_FOR_WIDEST_FAST_INT.
* configure: Regenerate.
* config.in: Regenerate.
* hwint.h (HOST_WIDEST_FAST_INT, HOST_BITS_PER_WIDEST_FAST_INT): New: widest integer type supported efficiently in hardware for the host.
* sbitmap.h (SBITMAP_ELT_BITS): Define based on HOST_BITS_PER_WIDEST_FAST_INT.
(SBITMAP_ELT_TYPE): Define based on HOST_WIDEST_FAST_INT.
* hard-reg-set.h (HARD_REG_ELT_TYPE): Define based on HOST_WIDEST_FAST_INT instead of HOST_WIDE_INT.
(HARD_REG_SET_LONGS): Likewise.
(UHOST_BITS_PER_WIDE_INT): Likewise.
Change the checks for the fast cases to be based on HOST_BITS_PER_WIDEST_FAST_INT instead of HOST_BITS_PER_WIDE_INT.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=apple-ppc-branch&r1=1.14646.2.151.2.11&r2=1.14646.2.151.2.12
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config.host.diff?cvsroot=gcc&only_with_tag=apple-ppc-branch&r1=2.4.2.3.6.2&r2=2.4.2.3.6.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config.in.diff?cvsroot=gcc&only_with_tag=apple-ppc-branch&r1=1.141.2.18.2.7&r2=1.141.2.18.2.8
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/configure.diff?cvsroot=gcc&only_with_tag=apple-ppc-branch&r1=1.619.2.44.2.5&r2=1.619.2.44.2.6
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/configure.ac.diff?cvsroot=gcc&only_with_tag=apple-ppc-branch&r1=2.10.2.6.2.6&r2=2.10.2.6.2.7
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/hard-reg-set.h.diff?cvsroot=gcc&only_with_tag=apple-ppc-branch&r1=1.14.8.6&r2=1.14.8.6.6.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/hwint.h.diff?cvsroot=gcc&only_with_tag=apple-ppc-branch&r1=1.11.2.5&r2=1.11.2.5.8.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/sbitmap.h.diff?cvsroot=gcc&only_with_tag=apple-ppc-branch&r1=1.17.2.4&r2=1.17.2.4.8.1
The patch was only for 4.0.0; I still have to backport it to 3.4.x.
Postponed to GCC 3.4.4.
I am no longer working on this.
Fixed in 4.0 and up. Won't fix for 3.4.5.