This is the mail archive of the
java@gcc.gnu.org
mailing list for the Java project.
RE: I'm new and wanting to help!
- From: "Boehm, Hans" <hans dot boehm at hp dot com>
- To: "Joel Dice" <dicej at mailsnare dot net>, "Tom Tromey" <tromey at redhat dot com>
- Cc: <java at gcc dot gnu dot org>
- Date: Thu, 3 Aug 2006 12:54:21 -0700
- Subject: RE: I'm new and wanting to help!
> -----Original Message-----
> From: Joel Dice
> Well, it's nice to know it's possible, anyway. :)
>
> > I'm not a GC expert, so I don't know how well this would really
> > compare to what we've got now. And, maybe a mostly-copying
> GC would
> > serve just as well.
>
> I'm no expert either, but my understanding is that you pick
> copying GC when you want to maximize cache locality and
> minimize fragmentation. In order to get an idea of how such
> a strategy compares with conservative GC, it may be worth
> benchmarking something like PLT Scheme, which supports
> either, I believe.
The tradeoffs are complicated.
Clearly pure two space copying collection wins on worst-case
fragmentation. On the other hand, it typically loses in overall space
consumption, at least if nearly all objects are small, just because it
does need space for a second copy of the heap. If you throw large
objects into the mix, the current collector occasionally has more
serious fragmentation issues, even in practice. But those can be
avoided with memory remapping games. And copying collectors typically
don't really copy those large objects either, so they may have similar
issues, depending on the implementation.
There are some reasonable comparisons in the 2004 SIGMETRICS paper by
Blackburn, Cheng, and McKinley. They did get significantly better
locality out of any collector, such as a copying one, that supports
contiguous allocation.
But they measured a slightly different mark-sweep collector, and I still
do not have a great understanding for the origin of the locality
improvements. Based on what I've seen, a mark-sweep collector like the
one we're currently using, or the one they measured, usually ends up
allocating contiguous objects for a specific object size. Hence
consecutively allocated objects may not to reside in the same cache
line. But if you look at 1000 consecutively allocated objects, they
should be spread out over the same number of cache lines in either case.
It's quite possible that second order effects, e.g. those cases in which
consecutive objects are not available, or interaction with hardware
prefetchers, actually end up dominating.
(Their measurements, like all such measurements, have a couple of
shortcomings. The collectors they used at the time had tracing rates
much slower than that of the gcj one, which probably exaggerated the
overall impact of GC time and possibly the benefit of generational GC.
I understand this has since been fixed in the Jikes collectors. Their
allocation rate however was much better than ours, for reasons that I
think we understand. (The differences in both cases were well over a
factor of 2, as I recall.) Secondly, I believe their Mark-Sweep
collector does not avoid touching pages of pointerfree objects, whereas
gcj does so, and tries hard to moderately hard to make things like
strings pointerfree. This probably makes semispace copying look better
in their results than it should be, since semi-space copying does have
to touch all those objects, in two different locations.)
The object rearrangement you get out of simple Cheney-style
breadth-first copying generally doesn't buy you a lot in terms of
locality, I think. They used depth-first scanning, which did seem to
help a bit. Cleverer schemes based in observing reference behavior
probably can do substantially better.
>
> > Joel> Also, how might java.util.IdentityHashMap or the
> > Joel> hash-synchronization option be implemented when using
> a copying
> > Joel> GC, given that object addresses may change at any time?
> >
> > I'm not sure how other VMs do this. I know earlier ones kept the
> > "system" hash code in the object header, but I'm not sure
> whether this
> > is current practice.
>
> I've just assumed that other VMs don't use copying
> collectors. One option would be to have support for
> IdentityHashMaps built in to the GC, so that each instance is
> rehashed after GC.
>
I think most other VMs use a copied young generation. They have to
store hashcodes separately in some cases, and generally have more of an
object header than gcj does. Treatment of the old generation varies.
I think the places we could win for object allocation are:
1) Support a generational collector with a copied young generation. I
think this wins on most, though not all, benchmarks. This is hard,
since you need to support copying and hence exact pointer
identification. Note that most VMS do this by adding safepoints to the
code so that not all temporaries have to be described everywhere. The
safepoints typically add some overhead to nonallocating code, but that's
often small.
2) Alternatives to the above that are probably not as good, but limit
the work mostly to the collector. A mostly-copied young generation
might be nearly as good. There are known issues with the current
non-mving generational collector, which could be fixed to make it more
viable.
3) Escape analysis in the compiler to reduce the allocation volume by
using stack allocation instead in some places. This might get us a
significant fraction of the benefit of a generational collector by
eliminating t some of the volume of very short-lived objects.
4) Fix the allocation path. That probably requires gc7 integration.
Gc7.0 has better support for in-line allocation, which might need a bit
more work to inline allocation inside the libgcj allocation routines. I
think that with compiler help, we could do better still. (Last time I
looked, the allocation path was fairly significant in profiles; the
problem wasn't just the marker. But I haven't looked recently.)
Hans