This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
How much resources for -Ok, k=1,2,...?
- To: gcc at gcc dot gnu dot org
- Subject: How much resources for -Ok, k=1,2,...?
- From: Brad Lucier <lucier at math dot purdue dot edu>
- Date: Mon, 15 Nov 1999 13:59:49 -0500 (EST)
- Cc: lucier at math dot purdue dot edu, feeley at iro dot umontreal dot ca, hosking at cs dot purdue dot edu
I just compiled, with today's mainline, a rather large routine written
with computed gotos to implement a threaded state machine. The nonzero
times for the various passes are given below:
popov-2% /export/u10/egcs-test/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -O1 -fPIC -mcpu=ev6 -fno-math-errno _meroon.i
___H__20___meroon {GC 120520k -> 26015k in 0.321} {GC 33991k -> 30609k in 0.374} {GC 41402k -> 29605k in 0.364} {GC 41131k -> 31174k in 0.389} ___init_proc {GC 56940k -> 2267k in 0.023} ____20___meroon
time in parse: 10.756496 (3%)
time in jump: 10.346576 (3%)
time in cse: 3.383792 (1%)
time in loop: 0.032208 (0%)
time in flow: 135.258960 (37%)
time in combine: 3.508720 (1%)
time in local-alloc: 1.840736 (1%)
time in global-alloc: 7.540576 (2%)
time in flow2: 165.345136 (45%)
time in shorten-branch: 0.269376 (0%)
time in final: 19.630288 (5%)
time in varconst: 0.002928 (0%)
time in gc: 1.471808 (0%)
The characteristics of this routine are fairly extreme:
7383 labels
5516 labels whose addresses are taken (i.e., possible targets of computed gotos)
3916 computed gotos
9076 regular gotos
So, under the circumstances, gcc's performance is not so bad.
On the other hand, gcc requires > 1.6 Gbytes of memory on my alpha to
compile this code, which seems extreme. Here are the top functions from
the profiled version:
45.53 96.63 96.63 282359749 0.00 0.00 bitmap_operation
16.11 130.82 34.19 6 5698.08 21920.68 calculate_global_regs_live
4.93 141.29 10.47 6 1744.63 1744.63 mark_critical_edges
4.33 150.48 9.19 6 1531.74 1531.85 delete_unreachable_blocks
2.98 156.80 6.32 43222966 0.00 0.00 make_label_edge
2.53 162.17 5.37 43240831 0.00 0.00 make_edge
2.20 166.85 4.68 3 1558.59 1558.59 commit_edge_insertions
1.04 169.06 2.21 23102 0.10 0.10 chainon
0.93 171.03 1.97 43273780 0.00 0.00 xcalloc
0.88 172.90 1.87 9380 0.20 0.20 delete_from_jump_chain
0.84 174.68 1.78 116140 0.02 0.02 record_one_conflict
0.72 176.21 1.53 1 1532.23 211935.64 yyparse
(Generally, I believe that anything that takes longer than yyparse
is interesting.) What is most interesting to me is this part of flow.c:
-----------------------------------------------
34.19 97.34 6/6 life_analysis_1 [7]
[8] 62.0 34.19 97.34 6 calculate_global_regs_live [8]
96.16 0.00 280984010/282359749 bitmap_operation [9]
0.21 0.92 33203/65344 propagate_block [31]
0.02 0.00 67175/327322 bitmap_copy [281]
0.01 0.00 203557/1264373 bitmap_clear [263]
0.00 0.01 33203/33203 bitmap_equal_p [496]
0.00 0.00 32153/468915 bitmap_initialize [456]
0.00 0.00 6/247092 xmalloc [419]
0.00 0.00 6/32146 sbitmap_zero [989]
-----------------------------------------------
I believe that this is where most of the memory is being allocated.
OK, so why am I posting this? I'm getting worried about Jeff's continued
statements that the best thing is to move the optimization algorithms to
work on edges rather than basic blocks, registers, ... This may be true,
but in this routine, for example, you have
5516*3916+9076=21,609,732 edges
and if you start allocating a struct for each edge for some algorithm
then you're going to be screwed. We need only 80 bytes/edge to get
up to 1.6 Gbytes of memory needed, which is not so hard to do when a
pointer is 8 bytes.
So I would like to make a suggestion:
FOR -O1, STICK WITH OPTIMIZATIONS THAT CAN BE COMPUTED USING AT MOST
ONE BIT PER EDGE, OR PER BASIC BLOCK--REGISTER PAIR.
(I don't mean to imply I'm yelling, I just want this statement to stand
out among all this verbiage.)
This seems to be a fair dividing line between time- and (especially)
memory-intensive optimizations, which are appropriate for -Ok for k>1,
and less time- and memory-intensive optimizations, which are appropriate
for -O1.
Brad Lucier