This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
- To: amylaar@cygnus.co.uk (Joern Rennecke)
- Subject: Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
- From: Brad Lucier <lucier@math.purdue.edu>
- Date: Fri, 6 Aug 1999 20:06:35 -0500 (EST)
- Cc: lucier@math.purdue.edu (Brad Lucier), gcc@gcc.gnu.org, gcc-bugs@gcc.gnu.org, staff@math.purdue.edu, hosking@cs.purdue.edu, wilker@math.purdue.edu, bernds@cygnus.com
> That's the price we pay for doing register spilling on a per-instruction
> basis, and calling compute_use_by_pseudos twice for every instruction
> and reload pass.
>
> I think we could fix this by using pseudo register birth / death lists
> instead of complete register sets.
>
I made a mistake, sorry. I didn't pass -O1 -fPIC to cc1 when I
did the timings, so reload is not the problem. Mea culpa.
Here are the timings for the various stages of egcs-1.1.2 and gcc-2.95
with -O1 -fPIC.
popov-2% /usr/lib/gcc-lib/alpha-redhat-linux/egcs-2.91.66/cc1 -fPIC -O1 g0-1.i
__copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 10.843360
time in integration: 0.007808
time in jump: 7.701616
time in cse: 8.022720
time in loop: 0.046848
time in flow: 2.352160
time in combine: 7.864608
time in local-alloc: 2.923120
time in global-alloc: 4.692608
time in shorten-branch: 0.361120
time in final: 2.023248
popov-4% /export/u10/gcc-2.95/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.95/cc1 -fPIC -O1 g0-1.i
__copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 10.939984
time in integration: 0.000976
time in jump: 8.168144
time in cse: 5.181584
time in loop: 0.039040
time in flow: 2.695712
time in combine: 8.443376
time in local-alloc: 3.038288
time in global-alloc: 119.387248
time in flow2: 2.311168
time in shorten-branch: 0.383568
time in final: 2.120848
You can see there is a big difference in global-alloc: 119.39 seconds
in gcc-2.95 versus 4.69 seconds in egcs-1.1.2, roughly 25 times slower.
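To compare every pass rather than reading the two dumps side by side, the timing lines can be parsed and divided mechanically. The following is a minimal Python sketch, not something used for the original measurements; the helper names parse_timings and compare are made up for illustration, and the timing text is copied from the two cc1 runs above.

```python
import re

# Timing lines copied from the egcs-1.1.2 (egcs-2.91.66) cc1 run above.
EGCS_112 = """\
time in parse: 10.843360
time in integration: 0.007808
time in jump: 7.701616
time in cse: 8.022720
time in loop: 0.046848
time in flow: 2.352160
time in combine: 7.864608
time in local-alloc: 2.923120
time in global-alloc: 4.692608
time in shorten-branch: 0.361120
time in final: 2.023248
"""

# Timing lines copied from the gcc-2.95 cc1 run above.
GCC_295 = """\
time in parse: 10.939984
time in integration: 0.000976
time in jump: 8.168144
time in cse: 5.181584
time in loop: 0.039040
time in flow: 2.695712
time in combine: 8.443376
time in local-alloc: 3.038288
time in global-alloc: 119.387248
time in flow2: 2.311168
time in shorten-branch: 0.383568
time in final: 2.120848
"""


def parse_timings(text):
    """Return {pass_name: seconds} from 'time in <pass>: <seconds>' lines."""
    timings = {}
    for line in text.splitlines():
        m = re.match(r"\s*time in ([\w-]+):\s*([0-9.]+)", line)
        if m:
            timings[m.group(1)] = float(m.group(2))
    return timings


def compare(old, new):
    """Return (pass, old_seconds, new_seconds, ratio) for each pass in new."""
    rows = []
    for name, new_secs in new.items():
        old_secs = old.get(name)  # None if the pass is absent in the old run
        ratio = new_secs / old_secs if old_secs else None
        rows.append((name, old_secs, new_secs, ratio))
    return rows


old = parse_timings(EGCS_112)
new = parse_timings(GCC_295)
for name, old_secs, new_secs, ratio in compare(old, new):
    old_text = f"{old_secs:10.3f}" if old_secs is not None else "       n/a"
    ratio_text = f"{ratio:7.2f}x" if ratio is not None else "     n/a"
    print(f"{name:15s} {old_text} {new_secs:10.3f} {ratio_text}")
```

The flow2 pass appears only in the gcc-2.95 output, so the sketch reports no ratio for it instead of dividing by a missing value.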
Here is the beginning of the flat composite profile:

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative      self                self     total
 time    seconds   seconds      calls  ms/call   ms/call  name
36.61      57.68     57.68         16  3604.80   3605.63  prune_preferences
12.47      77.32     19.64  391960848     0.00      0.00  bitmap_bit_p
 7.29      88.80     11.48     202789     0.06      0.06  record_one_conflict
 7.22     100.18     11.38         24   474.20   1233.79  build_insn_chain
 1.56     102.64      2.46          8   306.88  19628.50  yyparse
 1.35     104.77      2.13   27806760     0.00      0.00  count_pseudo
 1.32     106.85      2.08      15315     0.14      0.44  order_regs_for_reload
 1.05     108.50      1.65       1436     1.15      1.15  find_reg
 0.83     109.82      1.31    2455661     0.00      0.00  yylex
The biggest time sink seems to be the quadratic algorithm in prune_preferences
in global.c, which accounts for 36.61% of the profiled time (57.68 seconds
over only 16 calls).
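The per-call cost is what separates prune_preferences from the other entries near the top of the profile. In a gprof flat profile the self ms/call column is the self seconds divided by the number of calls, so it can be recomputed from the other two columns. The following is a small Python sketch of that arithmetic; the function name self_ms_per_call is made up for illustration, and because it starts from the rounded figures printed in the profile, its results can differ slightly from gprof's own column (for example 3605.00 against the reported 3604.80).

```python
# (name, self_seconds, calls) copied from the top rows of the flat profile.
PROFILE = [
    ("prune_preferences", 57.68, 16),
    ("bitmap_bit_p", 19.64, 391960848),
    ("record_one_conflict", 11.48, 202789),
    ("build_insn_chain", 11.38, 24),
]


def self_ms_per_call(self_seconds, calls):
    """Average self time per call, in milliseconds."""
    return 1000.0 * self_seconds / calls


for name, secs, calls in PROFILE:
    per_call = self_ms_per_call(secs, calls)
    print(f"{name:20s} {calls:>11d} calls {per_call:12.5f} ms/call")
```

Read this way, bitmap_bit_p is expensive only because it is called hundreds of millions of times, while prune_preferences spends several seconds inside each of its 16 calls.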
Again, the complete profile summary is at:
http://www.math.purdue.edu/~lucier/gmon.summary.gz
Brad Lucier lucier@math.purdue.edu