General Information
Submitted and accepted application: GCTuningApplication
Student: LaurynasBiveinis
Mentor: Daniel Berlin
Status Reports
Please make corrections if you see mistakes here!
How to Add New Garbage Collector to GCC
Add new case to the gcc/configure.ac. For example,
Index: gcc/configure.ac =================================================================== --- gcc/configure.ac (revision 113493) +++ gcc/configure.ac (working copy) @@ -3213,9 +3213,12 @@ # Find out what GC implementation we want, or may, use. AC_ARG_WITH(gc, -[ --with-gc={page,zone} choose the garbage collection mechanism to use +[ --with-gc={boehm,page,zone} choose the garbage collection mechanism to use with the compiler], [case "$withval" in + boehm) + GGC=ggc-$withval + ;; page) GGC=ggc-$withval ;;
Create a new file in gcc subdirectory named after the configure option, in this case ggc-boehm.c. Implement at least following functions (currently the parameter names might be not enough descriptive):
void * ggc_alloc_stat (size_t size MEM_STAT_DECL) - Allocate a chunk of memory of SIZE bytes. Its contents are undefined.
void ggc_collect (void) - Invoke the collector. Garbage collection occurs only when this function is called, not during allocations.
void ggc_free (void * block)
size_t ggc_get_size (const void * block) - Return the number of bytes allocated at the indicated address.
int ggc_marked_p (const void * d)
char * ggc_pch_alloc_object (struct ggc_pch_data * d, void * p, size_t s, bool b, enum gt_types_enum t)
void ggc_pch_count_object (struct ggc_pch_data * d, void * p, size_t s, bool b, enum gt_types_enum t)
void ggc_pch_finish (struct ggc_pch_data * d, FILE * f)
void ggc_pch_read (FILE * f, void * p)
void ggc_pch_this_base (struct ggc_pch_data * d, void * p)
void ggc_pch_prepare_write (struct ggc_pch_data * d, FILE * f)
size_t ggc_pch_total_size (struct ggc_pch_data * d)
void ggc_pch_write_object (struct ggc_pch_data * d, FILE * f, void * p1, void * p2, size_t s, bool b)
void ggc_print_statistics (void)
int ggc_set_mark (const void * block)
void init_ggc (void) - initialize the garbage collector.
struct ggc_pch_data * init_ggc_pch (void)
Add rules for building the new file to gcc/Makefile.in.
When to collect?
void init_ggc_heuristics (void) { #if !defined ENABLE_GC_CHECKING && !defined ENABLE_GC_ALWAYS_COLLECT set_param_value ("ggc-min-expand", ggc_min_expand_heuristic()); set_param_value ("ggc-min-heapsize", ggc_min_heapsize_heuristic()); #endif }
So any time both "gc" and "gcac" checking are turned off we alter the heuristic so that GCC probes the system's RAM and uses more memory before collecting. When either of those checking styles are turned on (as is the case on mainline where "gc" is on) then we default to smaller values that simulate a machine with less RAM (32MB IIRC) and collect more often.
In practice this has uncovered collection bugs more reliably. Developers often have boxes with lots of RAM and their patches would get tested in situations where no collection was done. Then when the patch was installed on mainline someone would see a problem on a small RAM machine. With everyone pretending they have small RAM boxes, developers more often catch GC errors before installing anything.
Anyway, the lower parameters are set with checking on to: GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
You can see this by compiling an actual file with -v. By that I mean you have to invoke cc1, not just run "gcc -v" without an input file. You can add these --params flags to a --disable-checking bootstrap to restore collection behavior to that found in a checking compiler.
With checking turned off then the parameters depend on the machine's configuration. For example on my box with --disable-checking I see:
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
- -Kaveh R. Ghazi
GTY Documentation Notes
Following if_marked arguments are actually used in GCC sources:
ggc_marked_p() - checks if given object is already marked.
tree_map_marked_p(),
tree_int_map_marked_p(),
type_hash_marked_p() - these all contain optimizations for special cases.
PCH Notes
All you should need to do to support PCH is implement the ggc_pch_* routines, specified and declared in ggc.h, for your garbage collector. Only ggc_pch_write_object actually requires you to do something, everything else is to help your implementation know what's going on. For performance, it is necessary that you don't write too many objects in the PCH file (and so freeing and re-using them is a bad idea). You need to be able to mark the pointers placed in objects read from a PCH as they may refer to newly-allocated memory.
- - Geoffrey Keating
Random GGC (short for GCC GC) Notes
- Previous experiments with copying collector:
The existing collectors (and pseudocollectors) are: ggc-none, used by GCC build-time generators, provides no GC at all; ggc-page is the default bag-of-pages collector; and ggc-zone, that is a bag-of-pages collector with allocations segregated into separate zones
- If we do a copying collector, then it's impossible to have a conservative collector. Everything is either a pointer or not with 100% confidence, as we must adjust pointer values when we move objects around. This might be possible to do with current PCH annotation for data structures.
- Boehm's GC, although has generational features, is not a copying collector.
- Preference of copying vs. non-copying collector is unclear without performance measurements.
To debug prematurely collected GCC objects use http://www.hpl.hp.com/personal/Hans_Boehm/gc/debugging.html
The registered roots from non-managed heap memory are marked by ggc_mark_roots() in ggc-common.c. Roots are registered by gengtype during build. A header file for each frontend is generated containing the master array of roots, e.g. gtype-c.h. Roots are grouped by per-file basis, e.g. the generated gt-varasm.h root information from varasm.c. Every variable with a GTY marker is considered to be a GC root.
- There are several additional GC root lists:
gt_ggc_rtab - list of ordinary roots
gt_ggc_deletable_rtab - list of pointers that can be cleared on every collection. TODO: how this is different from gt_ggc_cache_rtab ?
gt_ggc_cache_rtab - list of pointers in cache hash tables. These should be treated as weak pointers for marking purposes.
gt_pch_cache_rtab - TODO: PCH, not interesting for now?
gt_pch_scalar_rtab - TODO PCH, not interesting for now list of roots pointing to scalar values?
At least one exception to the scheme above is the identifier table. Marking of the trees that are associated with identifiers is started by hand, not by GTY stuff, see gcc_mark_stringpool in stringpool.c.
Collection is possible only on gc_collect() calls and not possible on allocation. Some parts of the compiler allocate objects with GC that do not path from root to them and expect them to stay alive until the next gc_collect() call.