This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
interesting gc result wrt cse
- To: gcc-patches at gcc dot gnu dot org
- Subject: interesting gc result wrt cse
- From: Richard Henderson <rth at cygnus dot com>
- Date: Thu, 9 Sep 1999 20:41:57 -0700
Curious about a comment Joern had made, I experimented with
a test case near and dear to the hearts of Cygnoids -- the
Plumhall test conform/exprtest/gt.c.
For those of you not familiar with it, at -O3 this test becomes one
huge `main', cse goes nuts allocating test rtl, and the core
size balloons such that many machines swap themselves to death.
For reference, on my desktop Alpha w/ 128M ram, native, the
peak memory usage is 157M, and time reports
284.03user 5.63system 5:34.31elapsed 86%CPU
(18068major+48954minor)pagefaults 13354swaps
If I just turn on GC and let things go (admittedly with an
unoptimized cc1, but it's unlikely to make that much difference),
things are, surprisingly, a lot worse: 215M peak,
514.64user 19.91system 13:33.49elapsed 65%CPU
(111439major+120819minor)pagefaults 60719swaps
time in cse: 148.796080 (28%)
time in cse2: 125.600464 (24%)
time in sched: 37.226592 (7%)
time in global-alloc: 113.023728 (21%)
time in gc: 49.659856 (9%)
However! Once we get to the point of this exercise, namely, to
do incremental garbage collection around cse_basic_block, the
story is quite different: 66M peak (21M - 27M during cse!),
320.32user 1.45system 5:30.76elapsed 97%CPU
(551major+17303minor)pagefaults 10swaps
time in cse: 116.740336 (36%)
time in cse2: 86.986976 (27%)
time in sched: 23.564544 (7%)
time in global-alloc: 51.428368 (16%)
time in sched2: 11.382112 (4%)
time in gc: 16.649584 (5%)
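Compared with the original non-GC run, the elapsed time is only
marginally reduced (5:34.31 to 5:30.76; user time actually rises from
284s to 320s), but the resident set size is probably a fifth of what
it was before.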
I applied the attached patch. Note that I found I needed to reduce
the frequency with which we did garbage collection (raising the
threshold from 64KB to 4MB) in order to make any real headway.
Before, a collection might reclaim 800 rtx in .227 sec; after, a
collection might reclaim 140000 rtx in .388 sec.
I've not quite convinced myself that I'm not forgetting something wrt
placement of the ggc_push_context; I'll be running tests overnight to
make sure. But this is encouraging.
r~
* cse.c (cse_main): GC around cse_basic_block.
* ggc-simple.c (ggc_pop_context): Fold outstanding byte count
back into outer context.
(ggc_collect): Increase collect threshold to 4MB.
Index: cse.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/cse.c,v
retrieving revision 1.89
diff -c -p -d -r1.89 cse.c
*** cse.c 1999/09/07 05:47:41 1.89
--- cse.c 1999/09/10 03:10:41
*************** Boston, MA 02111-1307, USA. */
*** 36,41 ****
--- 36,42 ----
#include "toplev.h"
#include "output.h"
#include "splay-tree.h"
+ #include "ggc.h"
/* The basic idea of common subexpression elimination is to go
through the code, keeping a record of expressions that would
*************** cse_main (f, nregs, after_loop, file)
*** 8730,8735 ****
--- 8731,8739 ----
|| global_regs[i])
SET_HARD_REG_BIT (regs_invalidated_by_call, i);
+ if (ggc_p)
+ ggc_push_context ();
+
/* Loop over basic blocks.
Compute the maximum number of qty's needed for each basic block
(which is 2 for each SET). */
*************** cse_main (f, nregs, after_loop, file)
*** 8786,8795 ****
--- 8790,8805 ----
cse_jumps_altered |= old_cse_jumps_altered;
}
+ if (ggc_p)
+ ggc_collect ();
+
#ifdef USE_C_ALLOCA
alloca (0);
#endif
}
+
+ if (ggc_p)
+ ggc_pop_context ();
/* Tell refers_to_mem_p that qty_const info is not available. */
qty_const = 0;
Index: ggc-simple.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/ggc-simple.c,v
retrieving revision 1.13
diff -c -p -d -r1.13 ggc-simple.c
*** ggc-simple.c 1999/09/09 21:41:37 1.13
--- ggc-simple.c 1999/09/10 03:10:41
*************** ggc_pop_context PROTO ((void))
*** 220,225 ****
--- 220,227 ----
gs->next->strings = gs->strings;
}
+ gs->next->bytes_alloced_since_gc += gs->bytes_alloced_since_gc;
+
ggc_chain = gs->next;
free (gs);
}
*************** ggc_collect ()
*** 728,734 ****
#ifndef ENABLE_CHECKING
/* See if it's even worth our while. */
! if (ggc_chain->bytes_alloced_since_gc < 64*1024)
return;
#endif
--- 730,736 ----
#ifndef ENABLE_CHECKING
/* See if it's even worth our while. */
! if (ggc_chain->bytes_alloced_since_gc < 4*1024*1024)
return;
#endif