need to focus on java performance?

Mon May 22 15:41:00 GMT 2006

Andrew Haley wrote:
> David Daney writes:
>  > Tom Tromey wrote:
>  > > We do pay for our approach in 2 ways that I'm aware of.  One is that
>  > > we compile everything PIC.  The other is that BC itself comes with a
>  > > pretty big price -- 15% if I remember correctly.  That is a lot of
>  > > overhead to have to overcome.
>  > >   
>  > 
>  > I threw this idea out before, but since we are on the subject I will 
>  > offer it up again (perhaps slightly revised):
>  > 
>  > The overhead (once a class is linked) in BC comes from all the 
>  > indirection used by the current BC runtime linker.  I think it is 
>  > possible to get rid of the indirection at the expense of a more complex 
>  > (and less portable) linker.
>  > 
>  > <hand_waving>
>  > 
>  > My idea is to compile everything to use as little indirection as 
>  > possible (similar to the current C++ ABI) except that all call targets 
>  > initially point to linker stubs.  Similarly all data accesses initially 
>  > point to a region of memory that will trap when accessed.
> 
> Traps are very, very expensive in Linux.  That makes this idea pretty much a
> non-starter.
> 

In x86 code, a call instruction has the same size as a mov to eax (both
are 5 bytes in 32-bit code).  So instead of trapping, initially encode all
static accesses as a call to a stub.  When patching the code, rewrite it
to a load (mov).  This avoids all trapping.  Non-starter eliminated.
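To make the size argument concrete, here is a minimal sketch in C, assuming
32-bit x86 encodings: "call rel32" (E8 + 4 bytes) and "mov eax, [moffs32]"
(A1 + 4 bytes) are both 5 bytes, so one can overwrite the other in place
without moving neighbouring instructions.  (In 64-bit mode the usual load
encodings are longer.)  It only edits a byte buffer, it executes nothing,
and the names emit_static_access_as_call and patch_call_to_load are
hypothetical.  It also ignores the locking and shared-page problems raised
elsewhere in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only, assuming 32-bit x86 (i386) encodings:
 *   call rel32          E8 + 4-byte displacement  (5 bytes)
 *   mov eax, [moffs32]  A1 + 4-byte address       (5 bytes)
 * Function names and addresses are hypothetical. */
enum { OP_CALL_REL32 = 0xE8, OP_MOV_EAX_MOFFS32 = 0xA1, INSN_LEN = 5 };

/* Store a 32-bit value little-endian, as x86 expects. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Emit "call stub" at code[off].  code_base is the address the buffer
 * would be loaded at; rel32 is measured from the end of the call. */
void emit_static_access_as_call(uint8_t *code, uint32_t code_base,
                                size_t off, uint32_t stub_addr)
{
    uint32_t next_insn = code_base + (uint32_t)off + INSN_LEN;
    code[off] = OP_CALL_REL32;
    put_le32(&code[off + 1], stub_addr - next_insn);
}

/* What the stub would do once the class is linked and initialized:
 * overwrite the 5-byte call with a 5-byte load of the static field.
 * No locking or cross-thread safety is attempted here. */
void patch_call_to_load(uint8_t *code, size_t off, uint32_t field_addr)
{
    code[off] = OP_MOV_EAX_MOFFS32;
    put_le32(&code[off + 1], field_addr);
}
```

Because both encodings are exactly INSN_LEN bytes, the patch touches only
the five bytes of the original call and nothing around it.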

>  > Whenever one of the stubs is traversed, or there is a trap in the 
>  > special memory region, the runtime linker takes over and patches the call 
>  > site (perhaps after doing some linking and class initialization) so that 
>  > the next time the call/memory access goes directly to the properly 
>  > initialized class.
> 
> All the text in a file is shared by all the processes that use it.
> You can't patch the call site without generating a copy of the page
> you're patching.  We could do it by making the libraries non-shared,
> but this would have other bad consequences.  I doubt whether this is
> worthwhile in general: it might lead to better microbenchmark
> performance, but worse performance under real world loads.
> 
>  > Since it would be impossible to access an uninitialized class, all
>  > of those calls to Jv_initClass (or whatever it is called) could be
>  > eliminated.
> 
> That would be nice.  It's fairly easy to patch call sites (that's what
> ld.so does) to remove all the Jv_initClass calls, but hard to do it
> portably because of problems with locking.
> 
>  > * All this runtime patching of the code would make it impossible to
>  > share code pages across different executables executing at the same
>  > time.  But JIT systems have this problem anyhow, so it would not be
>  > worse than that with respect to sharing code pages.
> 
> Right, but this is one of the big disadvantages that JITs have.
> Shared text=good, indirection=bad.  It's a tradeoff.
> 

The patching of the call sites could be deferred so that only hot 
portions of code are patched.  For seldom-traversed paths, the stub 
would do the 'right thing'.  That would make most of the code shareable, 
but allow you to optimize hot sections.
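A rough sketch of the deferred-patching policy in C, assuming a simple
per-call-site hit counter decides what counts as "hot" (the threshold of 3,
and every name below, are hypothetical).  A function pointer stands in for
the patched call instruction, so this models the policy rather than real
code patching.

```c
#include <stddef.h>

/* Sketch only: every name here is hypothetical.  A "call site" is modelled
 * as a small record; in the real proposal the patch would rewrite the call
 * instruction itself rather than fill in a pointer. */
typedef int (*method_fn)(int);

struct call_site {
    method_fn direct;  /* NULL while the site still goes through the stub */
    unsigned  hits;    /* times the stub has been traversed from this site */
};

enum { HOT_THRESHOLD = 3 };  /* hypothetical: traversals before patching */

static int class_initialized = 0;

/* Stands in for linking and running the class initializer. */
static void ensure_class_initialized(void)
{
    if (!class_initialized)
        class_initialized = 1;
}

/* The method the call site ultimately wants to reach. */
int target_method(int x) { return 2 * x; }

/* Slow path: always do the "right thing", but patch only hot sites,
 * so cold call sites are never written to. */
static int linker_stub(struct call_site *site, int x)
{
    ensure_class_initialized();
    if (++site->hits >= HOT_THRESHOLD)
        site->direct = target_method;
    return target_method(x);
}

/* Models executing the call instruction at one site. */
int invoke(struct call_site *site, int x)
{
    return site->direct ? site->direct(x) : linker_stub(site, x);
}
```

Only sites that cross the threshold are ever written, so in the real scheme
only the pages holding hot call sites would need private copies; the rest
of the text could stay shared.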

> It's pretty clear that if we're prepared to do non-portable things we
> can make gcj much faster.   
> 
> Andrew.