= General Information = * Submitted and accepted application: [[GCTuningApplication]] * Student: LaurynasBiveinis * Mentor: [[http://www.dberlin.org/|Daniel Berlin]] = Status Reports = 1. [[GCPerformanceTests]] 1. [[BoehmGCForGCC]] = Please make corrections if you see mistakes here! = = How to Add New Garbage Collector to GCC = * Add new case to the {{{gcc/configure.ac}}}. For example, {{{ Index: gcc/configure.ac =================================================================== --- gcc/configure.ac (revision 113493) +++ gcc/configure.ac (working copy) @@ -3213,9 +3213,12 @@ # Find out what GC implementation we want, or may, use. AC_ARG_WITH(gc, -[ --with-gc={page,zone} choose the garbage collection mechanism to use +[ --with-gc={boehm,page,zone} choose the garbage collection mechanism to use with the compiler], [case "$withval" in + boehm) + GGC=ggc-$withval + ;; page) GGC=ggc-$withval ;; }}} * Create a new file in {{{gcc}}} subdirectory named after the configure option, in this case {{{ggc-boehm.c}}}. Implement at least following functions (currently the parameter names might be not enough descriptive): 1. {{{void * ggc_alloc_stat (size_t size MEM_STAT_DECL)}}} - Allocate a chunk of memory of SIZE bytes. Its contents are undefined. 1. {{{void ggc_collect (void)}}} - Invoke the collector. Garbage collection occurs only when this function is called, not during allocations. 1. {{{void ggc_free (void * block)}}} 1. {{{size_t ggc_get_size (const void * block)}}} - Return the number of bytes allocated at the indicated address. 1. {{{int ggc_marked_p (const void * d)}}} 1. {{{char * ggc_pch_alloc_object (struct ggc_pch_data * d, void * p, size_t s, bool b, enum gt_types_enum t)}}} 1. {{{void ggc_pch_count_object (struct ggc_pch_data * d, void * p, size_t s, bool b, enum gt_types_enum t)}}} 1. {{{void ggc_pch_finish (struct ggc_pch_data * d, FILE * f)}}} 1. {{{void ggc_pch_read (FILE * f, void * p)}}} 1. {{{void ggc_pch_this_base (struct ggc_pch_data * d, void * p)}}} 1. {{{void ggc_pch_prepare_write (struct ggc_pch_data * d, FILE * f)}}} 1. {{{size_t ggc_pch_total_size (struct ggc_pch_data * d)}}} 1. {{{void ggc_pch_write_object (struct ggc_pch_data * d, FILE * f, void * p1, void * p2, size_t s, bool b)}}} 1. {{{void ggc_print_statistics (void)}}} 1. {{{int ggc_set_mark (const void * block)}}} 1. {{{void init_ggc (void)}}} - initialize the garbage collector. 1. {{{struct ggc_pch_data * init_ggc_pch (void)}}} * Add rules for building the new file to {{{gcc/Makefile.in}}}. = When to collect? = {{{ void init_ggc_heuristics (void) { #if !defined ENABLE_GC_CHECKING && !defined ENABLE_GC_ALWAYS_COLLECT set_param_value ("ggc-min-expand", ggc_min_expand_heuristic()); set_param_value ("ggc-min-heapsize", ggc_min_heapsize_heuristic()); #endif } }}} So any time both "gc" and "gcac" checking are turned off we alter the heuristic so that GCC probes the system's RAM and uses more memory before collecting. When either of those checking styles are turned on (as is the case on mainline where "gc" is on) then we default to smaller values that simulate a machine with less RAM (32MB IIRC) and collect more often. In practice this has uncovered collection bugs more reliably. Developers often have boxes with lots of RAM and their patches would get tested in situations where no collection was done. Then when the patch was installed on mainline someone would see a problem on a small RAM machine. With everyone pretending they have small RAM boxes, developers more often catch GC errors before installing anything. Anyway, the lower parameters are set with checking on to: GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 You can see this by compiling an actual file with -v. By that I mean you have to invoke cc1, not just run "gcc -v" without an input file. You can add these --params flags to a --disable-checking bootstrap to restore collection behavior to that found in a checking compiler. With checking turned off then the parameters depend on the machine's configuration. For example on my box with --disable-checking I see: GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 * -Kaveh R. Ghazi = GTY Documentation Notes = Following {{{if_marked}}} arguments are actually used in GCC sources: 1. {{{ggc_marked_p()}}} - checks if given object is already marked. 1. {{{tree_map_marked_p()}}}, 1. {{{tree_int_map_marked_p()}}}, 1. {{{type_hash_marked_p()}}} - these all contain optimizations for special cases. = PCH Notes = All you should need to do to support PCH is implement the {{{ggc_pch_*}}} routines, specified and declared in {{{ggc.h}}}, for your garbage collector. Only {{{ggc_pch_write_object}}} actually requires you to do something, everything else is to help your implementation know what's going on. For performance, it is necessary that you don't write too many objects in the PCH file (and so freeing and re-using them is a bad idea). You need to be able to mark the pointers placed in objects read from a PCH as they may refer to newly-allocated memory. * - Geoffrey Keating = Random GGC (short for GCC GC) Notes = * Previous experiments with copying collector: * http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01266.html * The existing collectors (and pseudocollectors) are: {{{ggc-none}}}, used by GCC build-time generators, provides no GC at all; {{{ggc-page}}} is the default bag-of-pages collector; and {{{ggc-zone}}}, that is a bag-of-pages collector with allocations segregated into separate zones * If we do a copying collector, then it's impossible to have a conservative collector. Everything is either a pointer or not with 100% confidence, as we must adjust pointer values when we move objects around. This might be possible to do with current PCH annotation for data structures. * Boehm's GC, although has generational features, is not a copying collector. * Preference of copying vs. non-copying collector is unclear without performance measurements. * To debug prematurely collected GCC objects use http://www.hpl.hp.com/personal/Hans_Boehm/gc/debugging.html * The registered roots from non-managed heap memory are marked by {{{ggc_mark_roots()}}} in {{{ggc-common.c}}}. Roots are registered by {{{gengtype}}} during build. A header file for each frontend is generated containing the master array of roots, e.g. {{{gtype-c.h}}}. Roots are grouped by per-file basis, e.g. the generated {{{gt-varasm.h}}} root information from {{{varasm.c}}}. Every variable with a {{{GTY}}} marker is considered to be a GC root. * There are several additional GC root lists: 1. {{{gt_ggc_rtab}}} - list of ordinary roots 1. {{{gt_ggc_deletable_rtab}}} - list of pointers that can be cleared on every collection. TODO: how this is different from {{{gt_ggc_cache_rtab }}}? 1. {{{gt_ggc_cache_rtab}}} - list of pointers in cache hash tables. These should be treated as weak pointers for marking purposes. 1. {{{gt_pch_cache_rtab}}} - TODO: PCH, not interesting for now? 1. {{{gt_pch_scalar_rtab}}} - TODO PCH, not interesting for now list of roots pointing to scalar values? * At least one exception to the scheme above is the identifier table. Marking of the trees that are associated with identifiers is started by hand, not by GTY stuff, see {{{gcc_mark_stringpool}}} in {{{stringpool.c}}}. * Collection is possible only on {{{gc_collect()}}} calls and not possible on allocation. Some parts of the compiler allocate objects with GC that do not path from root to them and expect them to stay alive until the next {{{gc_collect()}}} call.