This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Out-of-sight compile times for calculate_loop_depth

To: lucier at math dot purdue dot edu, m dot hayes at elec dot canterbury dot ac dot nz
Subject: Re: Out-of-sight compile times for calculate_loop_depth
From: Brad Lucier <lucier at math dot purdue dot edu>
Date: Wed, 9 Feb 2000 12:05:09 -0500 (EST)
Cc: law at cygnus dot com, gcc at gcc dot gnu dot org

> From mph@elec.canterbury.ac.nz  Wed Feb  9 05:36:34 2000
> Could you please try the attached patch on your program.  This should
> improve the loop tree construction time.  Currently it has quadratic
> behaviour in the worst case.  This patch should nail this problem in
> the interim, although I've got an inkling for an even better scheme.

It made some difference; after your patch, I got:

/export/u10/egcs-profile/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -mcpu=ev6 -fno-math-errno -mieee -fPIC -O1 _meroon.i
 ___H__20___meroon {GC 167029k -> 34257k in 0.803} {GC 58812k -> 37713k in 0.927} {GC 51743k -> 39567k in 0.990} ___init_proc {GC 72784k -> 3025k in 0.064} ____20___meroon
time in parse: 22.902816 (3%)
time in jump: 81.758544 (9%)
time in cse: 9.965936 (1%)
time in loop: 0.040992 (0%)
time in flow: 692.931696 (80%)   <= down from 738 seconds
time in combine: 13.778192 (2%)
time in local-alloc: 4.948320 (1%)
time in global-alloc: 15.723360 (2%)
time in flow2: 7.622560 (1%)
time in shorten-branch: 0.572912 (0%)
time in final: 3.108560 (0%)
time in varconst: 0.006832 (0%)
time in gc: 2.784528 (0%)

and

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 53.32    136.89   136.89    59615     2.30     2.30  sbitmap_intersection_of_preds
  6.80    154.34    17.45     6596     2.65     2.65  flow_loop_exits_find
  6.04    169.84    15.50     6596     2.35     2.35  flow_loop_pre_header_find
  2.23    175.56     5.72        8   714.60  4413.22  jump_optimize_1
  2.18    181.15     5.59 10438001     0.00     0.00  rtx_renumbered_equal_p
  2.03    186.36     5.21  9035782     0.00     0.00  find_cross_jump
  1.90    191.24     4.88    25373     0.19     0.19  delete_from_jump_chain
  1.53    195.16     3.92 12537008     0.00     0.00  next_active_insn
  1.27    198.43     3.27    70351     0.05     0.05  make_edge
  0.84    200.59     2.17   147364     0.01     0.01  record_one_conflict
  0.75    202.53     1.93        1  1931.64 256480.41  yyparse

with some details:

-----------------------------------------------
                0.00  170.87       3/3           rest_of_compilation [5]
[6]     66.6    0.00  170.87       3         calculate_loop_depth [6]
                0.06  170.82       3/3           flow_loops_find [7]
                0.00    0.00       3/3           flow_loops_free [1222]
-----------------------------------------------
                0.06  170.82       3/3           calculate_loop_depth [6]
[7]     66.6    0.06  170.82       3         flow_loops_find [7]
                0.06  137.59       3/3           compute_flow_dominators [8]
               17.45    0.00    6596/6596        flow_loop_exits_find [12]
               15.50    0.00    6596/6596        flow_loop_pre_header_find [13]
                0.11    0.00       3/6           sbitmap_vector_alloc [197]
                0.04    0.00    6596/6596        flow_loop_nodes_find [377]
                0.00    0.03       3/3           flow_loops_tree_build [428]
                0.01    0.00       1/1           flow_depth_first_order_compute [584]
                0.01    0.00    6598/6608        sbitmap_alloc [622]
                0.00    0.00       2/26511       sbitmap_zero [611]
                0.00    0.00       1/202709      xmalloc [422]
                0.00    0.00       1/112725      xcalloc [545]
                0.00    0.00       3/3           flow_loops_level_compute [1223]
-----------------------------------------------
                0.06  137.59       3/3           flow_loops_find [7]
[8]     53.6    0.06  137.59       3         compute_flow_dominators [8]
              136.89    0.01   59615/59615       sbitmap_intersection_of_preds [9]
                0.58    0.00   59618/59618       sbitmap_a_and_b [115]
                0.11    0.00       3/6           sbitmap_vector_alloc [197]
                0.00    0.01       3/3           sbitmap_vector_zero [661]
                0.00    0.00       3/3           sbitmap_vector_ones [801]
                0.00    0.00       3/26511       sbitmap_zero [611]
                0.00    0.00       3/202709      xmalloc [422]
-----------------------------------------------
              136.89    0.01   59615/59615       compute_flow_dominators [8]
[9]     53.3  136.89    0.01   59615         sbitmap_intersection_of_preds [9]
                0.01    0.00   59615/59615       sbitmap_copy [614]
-----------------------------------------------

So you did kill the flow_loops_tree_build/sbitmap_a_subset_b_p times,
which is good.  For some reason, sbitmap_intersection_of_preds now seems
to take a lot more time: 136.89 seconds of self time, about 53% of the
profile.
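
For anyone reading the call graph without the source at hand: the names
in the profile suggest that compute_flow_dominators repeatedly intersects
the dominator bitmaps of each block's predecessors.  A minimal sketch of
that textbook iterative scheme follows.  It is not GCC's code; toy_cfg,
MAX_BLOCKS, toy_compute_dominators and toy_diamond_dominates are all
hypothetical names made up for illustration, and the fixed-size arrays
are an assumption to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 256                    /* illustrative limit, not from GCC */
#define MAX_PREDS  4                      /* illustrative limit, not from GCC */
#define WORDS ((MAX_BLOCKS + 63) / 64)    /* 64-bit words per bitmap */

/* Hypothetical toy CFG: block 0 is the entry; each block lists its preds. */
typedef struct {
    int n_blocks;
    int n_preds[MAX_BLOCKS];
    int preds[MAX_BLOCKS][MAX_PREDS];
} toy_cfg;

static void bits_set(uint64_t *s, int i)
{
    s[i / 64] |= (uint64_t)1 << (i % 64);
}

static int bits_test(const uint64_t *s, int i)
{
    return (int)((s[i / 64] >> (i % 64)) & 1);
}

/* dom[b] ends up holding the set of blocks that dominate b.
   Returns the number of passes made over the blocks. */
int toy_compute_dominators(const toy_cfg *g, uint64_t dom[][WORDS])
{
    int passes = 0, changed = 1;

    assert(g->n_blocks > 0 && g->n_blocks <= MAX_BLOCKS);

    /* The entry is dominated only by itself; every other block starts
       out "dominated by everything" and is narrowed down from there. */
    for (int b = 0; b < g->n_blocks; b++)
        memset(dom[b], b == 0 ? 0x00 : 0xff, sizeof dom[b]);
    bits_set(dom[0], 0);

    while (changed) {
        changed = 0;
        passes++;
        for (int b = 1; b < g->n_blocks; b++) {
            uint64_t tmp[WORDS];

            if (g->n_preds[b] == 0)
                continue;                 /* unreachable in this sketch */

            /* Intersection of the predecessors' sets: one word-wise AND
               over the whole bitmap for every predecessor edge. */
            memcpy(tmp, dom[g->preds[b][0]], sizeof tmp);
            for (int p = 1; p < g->n_preds[b]; p++)
                for (int w = 0; w < WORDS; w++)
                    tmp[w] &= dom[g->preds[b][p]][w];

            bits_set(tmp, b);             /* a block dominates itself */

            if (memcmp(tmp, dom[b], sizeof tmp) != 0) {
                memcpy(dom[b], tmp, sizeof tmp);
                changed = 1;
            }
        }
    }
    return passes;
}

/* Usage example on a diamond: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
   Returns 1 if block a dominates block b, else 0. */
int toy_diamond_dominates(int a, int b)
{
    static uint64_t dom[MAX_BLOCKS][WORDS];
    toy_cfg g;

    memset(&g, 0, sizeof g);
    g.n_blocks = 4;
    g.n_preds[1] = 1; g.preds[1][0] = 0;
    g.n_preds[2] = 1; g.preds[2][0] = 0;
    g.n_preds[3] = 2; g.preds[3][0] = 1; g.preds[3][1] = 2;

    toy_compute_dominators(&g, dom);
    return bits_test(dom[b], a);
}
```

In this sketch every visit to a block costs one pass over a bitmap whose
length grows with the number of blocks, for each predecessor edge, and
the outer loop repeats until nothing changes.  That is at least
consistent with one routine dominating the profile on a very large
function, though it does not by itself explain the per-call cost seen
above.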

Brad

Follow-Ups:
- Re: Out-of-sight compile times for calculate_loop_depth
  - From: Michael Meissner
