This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Re: Avoid function call with JVMPI enabled


On Aug 24, 2004, at 8:42 PM, Boehm, Hans wrote:

> Presumably you can still move the size computation out of
> loops if you pass it as a parameter?  Similarly for the finalizer
> lookup?

That is possible, but there are trade-offs - we'd have to be sure that it really is a win. Possible disadvantages include:


- Increased code size due to the extra code needed to prepare the arguments.

- Increased dependence between compiled code and the layout of vtables/class objects in the runtime. This would work against us if we want to make changes in the future.
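
To make the trade-off concrete, here is a minimal sketch of the two calling conventions. Everything in it is hypothetical: `ClassInfo`, `alloc_object`, `alloc_object_sized` and the counters are stand-ins invented for illustration, not libgcj's real class layout or allocation entry points. The hoisted variant reads the descriptor once per loop, but every call site has to prepare the extra arguments and know where the size and finalizer information live.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical class descriptor; the real libgcj class object looks different.
struct ClassInfo {
    std::size_t instance_size;
    bool has_finalizer;
};

// Counters used only to make the difference observable in this sketch.
int descriptor_reads = 0;
int finalizers_registered = 0;

std::size_t read_size(const ClassInfo* k) {
    ++descriptor_reads;
    return k->instance_size;
}

bool read_finalizer(const ClassInfo* k) { return k->has_finalizer; }

// Shared back end: zeroed memory plus a stand-in for finalizer registration.
void* alloc_raw(std::size_t size, bool has_finalizer) {
    assert(size > 0);
    void* p = std::calloc(1, size);
    if (p && has_finalizer) ++finalizers_registered;
    return p;
}

// Variant A: the callee looks everything up from the class on every call.
void* alloc_object(const ClassInfo* k) {
    return alloc_raw(read_size(k), read_finalizer(k));
}

// Variant B: the caller passes precomputed values as parameters.
void* alloc_object_sized(std::size_t size, bool has_finalizer) {
    return alloc_raw(size, has_finalizer);
}

// With variant B the lookups can be hoisted out of the loop.
void alloc_many_hoisted(const ClassInfo* k, void** out, int n) {
    const std::size_t size = read_size(k);
    const bool fin = read_finalizer(k);
    for (int i = 0; i < n; ++i) out[i] = alloc_object_sized(size, fin);
}
```

Whether variant B is a net win depends on how the extra argument setup at each call site compares with the lookups it saves, which is the measurement that would need doing.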

> I'm worried about all of this because my impression is that
> the allocation path is one of the major places we currently
> lose substantial amounts of performance relative to standard JVMs.
> (I know we are substantially slower here; I only have anecdotal
> evidence that it's important, but I think it is.)

Yeah, it's certainly important, and we should optimize wherever possible. We can certainly make the allocation routines better, but I suspect a good chunk of the allocation overhead comes from things beyond just JvAllocObject, e.g. JITs doing better at inlining constructor calls. (gcj calling the Object constructor is an obvious missed optimization here, since it means we don't fully inline _any_ constructors.)


> Is there a way to make the dynamic test for a nontrivial finalizer
> cheaper?
> ...
> I haven't followed the discussion of the BC-ABI enough.  Is there a
> way to get a dynamically set flag into the vtable?  Or can
> allocation become a method in the vtable?

Yes. With the BC-ABI, the vtables are constructed at runtime, so we can insert extra flags at runtime without breaking existing binaries. We could, say, put a field containing the length value as well as bit-flags like has_finalizer.
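
As a rough illustration of the idea, the sketch below builds a vtable header at runtime and lets the allocator test a single bit instead of doing a finalizer lookup. The names `VTable`, `ClassDesc`, `build_vtable`, `allocate` and the flag layout are all assumptions made up for this example; they do not describe the actual BC-ABI vtable format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical flag bits stored in the runtime-built vtable.
enum : std::uint32_t { HAS_FINALIZER = 1u << 0 };

// Hypothetical vtable header; method slots would follow in a real one.
struct VTable {
    std::uint32_t flags;
    std::uint32_t instance_size;
};

// Hypothetical class description available when the class is linked.
struct ClassDesc {
    std::uint32_t instance_size;
    void (*finalizer)(void*);  // null if the class has no nontrivial finalizer
};

// Because the vtable is built at runtime, the extra fields can be filled in
// here without compiled code depending on their position.
VTable build_vtable(const ClassDesc& c) {
    VTable vt;
    vt.flags = c.finalizer ? HAS_FINALIZER : 0u;
    vt.instance_size = c.instance_size;
    return vt;
}

struct Object {
    const VTable* vt;
};

// Stand-in for registering an object with the collector's finalizer machinery.
int finalizable_allocs = 0;

// The allocator needs only the vtable: one load for the size, one bit test.
Object* allocate(const VTable* vt) {
    assert(vt->instance_size >= sizeof(Object));
    Object* o = static_cast<Object*>(std::calloc(1, vt->instance_size));
    if (!o) return nullptr;
    o->vt = vt;
    if (vt->flags & HAS_FINALIZER) ++finalizable_allocs;
    return o;
}
```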


> There may be a clean way to do this, which leaves the one allocation
> procedure outside the GC. The GC always had an inline-able fast
> allocation routine. The problem was that this made the client code
> dependent on the GC version and gc_priv.h, since the inlined code knew something
> about GC_arrays offsets. But I would really like to make
> THREAD_LOCAL_ALLOC the default (and perhaps only real option)
> in the next major GC version. In that case,
> the inlined code only needs to know about thread-local allocation
> buffers whose layout we could probably freeze, and which are much
> more self-contained. I'll think about it.

I don't see any major problems with inlining allocator code into libgcj: we link in the GC anyway, so there is little chance of version skew. It would mean that we need a runtime check for the debug case, though. I think we can enable GC_USE_COMPILER_TLS for many (most?) Linux targets now, so that should simplify any inlined allocator code a bit.
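
For illustration only, an inlined fast path along these lines might look roughly like the sketch below. It is a generic bump-pointer allocator over a thread-local buffer, not the collector's actual THREAD_LOCAL_ALLOC code: `AllocBuffer`, `tlab`, `debug_alloc`, `alloc_fast` and `alloc_slow` are hypothetical names, and the C++ `thread_local` keyword stands in for whatever compiler-supported TLS would provide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical thread-local allocation buffer: a simple bump-pointer region.
// The layout is an assumption for this sketch, not the collector's own.
struct AllocBuffer {
    char* cur;  // next free byte
    char* end;  // one past the last usable byte
};

thread_local AllocBuffer tlab = {nullptr, nullptr};

// Runtime switch standing in for "the debug case": when set, every
// allocation is routed through the out-of-line path.
bool debug_alloc = false;

int slow_path_calls = 0;

// Out-of-line path; a real runtime would call into the collector here.
void* alloc_slow(std::size_t size) {
    ++slow_path_calls;
    return std::calloc(1, size);
}

// Inlined fast path: one flag test, one bounds check, one pointer bump.
// Assumes the buffer is suitably aligned and was zeroed when handed out.
inline void* alloc_fast(std::size_t size) {
    assert(size % 8 == 0);
    if (!debug_alloc &&
        static_cast<std::size_t>(tlab.end - tlab.cur) >= size) {
        void* p = tlab.cur;
        tlab.cur += size;
        return p;
    }
    return alloc_slow(size);
}
```

The inlined code touches only `tlab` and the debug flag, which is the kind of small, self-contained surface that could plausibly be frozen across GC versions.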


Regards

Bryce

