This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: GCC 3.3, GCC 3.4
- From: Tim Josling <tej at melbpc dot org dot au>
- Cc: gcc at gcc dot gnu dot org
- Date: Thu, 06 Feb 2003 21:38:42 +1100
- Subject: Re: GCC 3.3, GCC 3.4
- Organization: Melbourne PC User Group
- References: <EF0167C6-3881-11D7-AB1A-00039390D9E0@apple.com>
> > Matt Austern at Apple did similar experiment by using ggc placeholder.
> > (same interface as ggc, no collection, allocate memory using mmap in
> > chunks and
> > satisfy memory allocation requests from these chunks etc..).
> >
> > He got around ~12% speedup for sample c++ source.
> >
> > -Devang
> > (also from Geoff Keating in another email)
> > To be precise, Matt found that the speedup of using the ggc
> > placeholder was about equivalent to the speedup of using ggc-page but
> > disabling garbage collection. This contradicts the results that Tim
> > Josling got above.
> >
> > --
> > - Geoffrey Keating
> > Matt,
> >
> > Any chance I could get a copy of your c++ source and/or your minimal
> > non-gc
> > implementation so I can see what is under the discrepancy?
>
> My placeholder "collector" is attached...
> > Also did you keep your actual numbers, and what was your machine
> > configuration.
>
> I tested running cc1plus only, on a preprocessed source file. I don't
> think I can include that source file, I'm afraid, because it's
> proprietary
> Apple code. I was running the tests on a PPC running OS X
> 10.2.2.
>
> Numbers:
> TOT 2.6s real, 2.3s user, 0.3s sys
> TOT with placeholder gc 2.3s real, 2.0s user, 0.3s sys
>
> The numbers changed a bit depending on how large the chunks were
> that I allocated with ggc-placeholder.c. I can probably find some more
> detailed numbers if I try.
>
> And I have to say, these results surprised me. I expected to see bigger
> gains, because of improved locality.
>
> --Matt
Matt,
Thanks.
Tests
-----
I reran the tests with four scenarios:
1. Standard ggc-page.c
2. Standard ggc-page.c, but ggc_collect just returns immediately.
3. Matt's placeholder routine (mmap/alignment)
4. My placeholder routine (xmalloc/no alignment)
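
Scenario 3 is described earlier in the thread as allocating memory with mmap in chunks and satisfying allocation requests from those chunks. Matt's attachment is not reproduced in this message, so the following is only a minimal sketch of that general idea, not his code. All names, the 1 MB chunk size, and the 16-byte alignment are assumptions made for illustration.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* older BSD-style spelling */
#endif

#define SKETCH_CHUNK_SIZE ((size_t) 1 << 20)  /* hypothetical: 1 MB */
#define SKETCH_ALIGN      ((size_t) 16)       /* hypothetical: power of two */

/* Next free byte in the current chunk and the bytes left in it. */
static char *sketch_cur = NULL;
static size_t sketch_left = 0;

/* Round N up to a multiple of SKETCH_ALIGN.  */
static size_t
sketch_round_up (size_t n)
{
  return (n + SKETCH_ALIGN - 1) & ~(SKETCH_ALIGN - 1);
}

/* Map a fresh anonymous chunk of at least MIN bytes.  The tail of the
   previous chunk is abandoned; nothing is ever unmapped.  */
static void
sketch_get_chunk (size_t min)
{
  size_t len = min > SKETCH_CHUNK_SIZE ? min : SKETCH_CHUNK_SIZE;
  void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    abort ();
  sketch_cur = p;
  sketch_left = len;
}

/* Bump-allocate SIZE bytes, aligned to SKETCH_ALIGN.  mmap returns
   page-aligned memory and every request is rounded up, so each result
   keeps that alignment.  */
void *
sketch_alloc (size_t size)
{
  size_t actual = sketch_round_up (size ? size : 1);
  void *result;

  if (actual > sketch_left)
    sketch_get_chunk (actual);
  result = sketch_cur;
  sketch_cur += actual;
  sketch_left -= actual;
  return result;
}
```

The only structural differences from scenario 4 are the source of the chunks (mmap rather than xmalloc) and the rounding that keeps every object aligned.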
Numbers
=======
User CPU time in seconds
------------------------
gcc bootstrap on a Compaq Armada E500: Pentium III, 1 GHz, 256 MB RAM.
TL = treelang, CB = COBOL

Configured languages           c c++ TL CB    c TL CB   Extra when c++
                                                        configured
1. ggc-page.c                      2815.28    1998.96           816.32
2. ggc-page.c (no collect)         2642.98    1894.14           748.84
3. placeholder (matt)              2527.46    1841.54           685.92
4. placeholder (tim)               2527.41    1835.33           692.08

Cost of ggc-alloc %                   4.57       2.86             9.17
Cost of collects %                    6.12       5.24             8.27
Cost of GC (alloc+collect) %         11.39       8.55            19.01
(this table best viewed in a fixed width font)
The elapsed times were in line with the CPU times and the system CPU time did
not vary much.
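
The message does not state how the three "Cost of" rows were derived. The formulas below are an inference: they reproduce the published percentages to two decimal places from rows 1 to 3, but the choice of denominators (placeholder time for the allocation and total costs, ggc-page time for the collection cost) is an assumption read back from the numbers. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* User CPU seconds for one column of the table.  */
struct gc_times
{
  double ggc_page;     /* 1. standard ggc-page.c */
  double no_collect;   /* 2. ggc-page.c with ggc_collect a no-op */
  double placeholder;  /* 3. placeholder (matt) */
};

/* Allocation overhead: scenario 2 minus scenario 3, as a percentage
   of scenario 3.  */
double
alloc_cost_pct (const struct gc_times *t)
{
  return 100.0 * (t->no_collect - t->placeholder) / t->placeholder;
}

/* Collection overhead: scenario 1 minus scenario 2, as a percentage
   of scenario 1.  */
double
collect_cost_pct (const struct gc_times *t)
{
  return 100.0 * (t->ggc_page - t->no_collect) / t->ggc_page;
}

/* Total GC overhead: scenario 1 minus scenario 3, as a percentage of
   scenario 3.  */
double
total_gc_cost_pct (const struct gc_times *t)
{
  return 100.0 * (t->ggc_page - t->placeholder) / t->placeholder;
}

/* Print the three derived rows for one column.  */
void
print_costs (const char *label, const struct gc_times *t)
{
  printf ("%s: alloc %.2f%%, collect %.2f%%, total %.2f%%\n", label,
          alloc_cost_pct (t), collect_cost_pct (t), total_gc_cost_pct (t));
}
```

For the all-languages column (2815.28, 2642.98, 2527.46) these evaluate to about 4.57, 6.12 and 11.39. Because the first two rows use different denominators, they do not add up exactly to the third.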
Placeholder routine
===================
I have attached my version of the placeholder routine (a new version; last
year's one is offsite). It runs at about the same speed as Matt's routine,
within 0.3%. I just copied it over ggc-page.c rather than change the makefiles
(having first saved a copy of ggc-page.c).
There does not seem to be any difference between the placeholder routines that
would explain the discrepancy.
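
For reference, the attached routine stores the requested size in a size_t header immediately before each object, and ggc_get_size reads it back from there. A minimal stand-alone sketch of that header layout follows. It assumes one plain malloc per object instead of the chunked xmalloc used in the attachment, and the names demo_alloc and demo_get_size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Allocate SIZE bytes preceded by a size_t header that records SIZE.
   The returned pointer addresses the user area after the header.  */
void *
demo_alloc (size_t size)
{
  char *block = malloc (sizeof (size_t) + size);

  if (block == NULL)
    abort ();
  /* memcpy avoids any assumption about how BLOCK is aligned.  */
  memcpy (block, &size, sizeof size);
  return block + sizeof (size_t);
}

/* Recover the size recorded by demo_alloc for the object P.  The
   header must be read back as a size_t, the type it was written as.  */
size_t
demo_get_size (const void *p)
{
  size_t size;

  memcpy (&size, (const char *) p - sizeof (size_t), sizeof size);
  return size;
}
```

The per-object cost of this scheme is one size_t of memory, which is the same bookkeeping overhead the attached placeholder pays.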
Analysis
========
The results for the gcc build were different from the results I have from last
year. Last year, running ggc_collect sped the build up. Now it slows the build
down further, by around 5%.
> > Geoff Keating...
> > To be precise, Matt found that the speedup of using the ggc
> > placeholder was about equivalent to the speedup of using ggc-page but
> > disabling garbage collection. This contradicts the results that Tim
> > Josling got above.
> Matt Austern...
> Yes. That's why I haven't been pushing to have a mode in which
> gc is entirely removed: my measurements suggest that the most
> important way in which gc introduces overhead is the most obvious
> one: the collection phase itself takes time.
Geoff is saying, and you agree, that the CPU time for the placeholder ~= the
CPU time for ggc_page_no_collect, i.e. the main gain is from not collecting.
However, according to my numbers for the gcc bootstrap, the placeholder is
significantly faster than ggc_page_no_collect (about 4.6% less user CPU time
with all languages configured).
I am trying to get some numbers for some C++ programs.
It would be useful to know how much CPU time Matt's C++ program takes with
ggc-page-no-collect, i.e. no collection, but with things laid out for ggc-page.
Above we only have the numbers for standard GC and for the placeholder.
Tim Josling
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "tree.h"
#include "rtl.h"
#include "tm_p.h"
#include "toplev.h"
#include "varray.h"
#include "flags.h"
#include "ggc.h"
#include "timevar.h"
#include "params.h"
#ifdef ENABLE_VALGRIND_CHECKING
#include <valgrind.h>
#else
/* Avoid #ifdef:s when we can help it. */
#define VALGRIND_DISCARD(x)
#endif
#include <assert.h>
#include <sys/mman.h>
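#define K 1024
#define M (K * K)
/* Memory comes from xmalloc in chunks of at least this size and is
   never freed. */
static const size_t chunksize = (1 * M); /* 1MB */
/* Next free byte in the current chunk and the number of bytes left. */
static char* chunk_begin = 0;
static size_t chunk_remaining = 0;
static void get_chunk(size_t min);
/* Start a new chunk of at least MIN bytes. Whatever is left of the
   previous chunk is abandoned. */
static void get_chunk(size_t min)
{
  size_t actual;
  actual = (chunksize > min)? chunksize : min;
  chunk_begin = xmalloc (actual);
  chunk_remaining = actual;
}
/* Alloc SIZE bytes of GC'able memory. A size_t header holding SIZE is
   stored just before the returned pointer for ggc_get_size. No padding
   is added, so alignment of the result is not guaranteed. */
void *
ggc_alloc (size_t size)
{
  char* user_area;
  size_t* size_area;
  size_t actual;
  actual = size + sizeof (size_t);
  if (actual > chunk_remaining)
    get_chunk(actual);
  user_area = chunk_begin + sizeof (size_t);
  size_area = (size_t*) (chunk_begin);
  *size_area = size;
  chunk_begin += actual;
  chunk_remaining -= actual;
  return user_area;
}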
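/* Not supported by the placeholder; aborts if called. */
int
ggc_set_mark (const void *p ATTRIBUTE_UNUSED)
{
  abort ();
  return 0;
}
/* Return 1 if P has been marked, zero otherwise. Nothing is ever
   collected here, so every object is reported as marked. */
int
ggc_marked_p (const void *p ATTRIBUTE_UNUSED)
{
  return 1;
}
/* Return the size of the gc-able object P, read from the size_t header
   that ggc_alloc stored just before it. */
size_t
ggc_get_size (const void *p)
{
  return *((const size_t *) ((const char *) p - sizeof (size_t)));
}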
/* A placeholder collection routine */
void
ggc_collect ()
{
}
/* Called once to initialize the garbage collector. */
void
init_ggc ()
{
}
/* Start a new GGC context. Memory allocated in previous contexts
will not be collected while the new context is active. */
void
ggc_push_context ()
{
}
/* Finish a GC context. Any uncollected memory in the new context
will be merged with the old context. */
void
ggc_pop_context ()
{
}
/* Report on GC memory usage. */
void
ggc_print_statistics ()
{
fprintf(stderr, "Statistics? What statistics?\n");
}
struct ggc_pch_data *
init_ggc_pch ()
{
sorry ("Generating PCH files is not supported when using ggc-placeholder.c");
return NULL;
}
void
ggc_pch_count_object (d, x, size)
struct ggc_pch_data *d ATTRIBUTE_UNUSED;
void *x ATTRIBUTE_UNUSED;
size_t size ATTRIBUTE_UNUSED;
{
}
size_t
ggc_pch_total_size (d)
struct ggc_pch_data *d ATTRIBUTE_UNUSED;
{
return 0;
}
void
ggc_pch_this_base (d, base)
struct ggc_pch_data *d ATTRIBUTE_UNUSED;
void *base ATTRIBUTE_UNUSED;
{
}
char *
ggc_pch_alloc_object (d, x, size)
struct ggc_pch_data *d ATTRIBUTE_UNUSED;
void *x ATTRIBUTE_UNUSED;
size_t size ATTRIBUTE_UNUSED;
{
return NULL;
}
void
ggc_pch_prepare_write (d, f)
struct ggc_pch_data * d ATTRIBUTE_UNUSED;
FILE * f ATTRIBUTE_UNUSED;
{
}
void
ggc_pch_write_object (d, f, x, newx, size)
struct ggc_pch_data * d ATTRIBUTE_UNUSED;
FILE *f ATTRIBUTE_UNUSED;
void *x ATTRIBUTE_UNUSED;
void *newx ATTRIBUTE_UNUSED;
size_t size ATTRIBUTE_UNUSED;
{
}
void
ggc_pch_finish (d, f)
struct ggc_pch_data * d ATTRIBUTE_UNUSED;
FILE *f ATTRIBUTE_UNUSED;
{
}
void
ggc_pch_read (f, addr)
FILE *f ATTRIBUTE_UNUSED;
void *addr ATTRIBUTE_UNUSED;
{
/* This should be impossible, since we won't generate any valid PCH
files for this configuration. */
abort ();
}
/* Dummy stuff for pfe */
void pfe_freeze_thaw_ggc(void* p) ;
void check_struct_page_entry(int n) ;
void check_struct_page_group(int n) ;
void check_struct_page_table_chain(int n) ;
void check_struct_globals(int n) ;
void pfe_freeze_thaw_ggc(void* p ATTRIBUTE_UNUSED) { }
void check_struct_page_entry(int n ATTRIBUTE_UNUSED) { }
void check_struct_page_group(int n ATTRIBUTE_UNUSED) { }
void check_struct_page_table_chain(int n ATTRIBUTE_UNUSED) { }
void check_struct_globals(int n ATTRIBUTE_UNUSED) { }