This is the mail archive of the java-discuss@sources.redhat.com mailing list for the Java project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Initial profiling results (was: Re: -O2 loop problems)


Tom Tromey wrote:
> If you've got any profiling info I'd love to see it.  I don't really
> have a feeling for what bottlenecks might exist for real applications.

Data is now trickling in.  I had hoped to run some fairly comprehensive programs
by this week, but I didn't plan on spending the holidays fixing my alpha-linux
build... *sigh*

(Anybody else seeing weird memory corruption from iconv?  It could be a platform
bug, or a 64-bit thing.)

My first experiment, a simple interpreter test, exercises various parts of the
runtime a bit, particularly collections and object allocation.  Here is what I
have so far for libgcj:

  Name                                     Count   Pct
  ----                                    ------  ----
  _Jv_FindArrayClass__FPQ34java4lang5Cla    5620   2.5
  _Jv_AllocObject                           5487   2.4
  _Jv_CheckArrayStore                       5062   2.2
  table_search__FPA1_Cwiw                   4513   2.0
  _Jv_InitClass                             3661   1.6
  _Jv_FindClassInCache__FP13_Jv_Utf8Cons    2847   1.2
  _Jv_AllocObj__FiPQ34java4lang5Class       2809   1.2 
  get__Q34java4util9HashtablePQ34java4la    2774   1.2
  _Jv_MonitorEnter                          2765   1.2
  _Jv_equalUtf8Consts__FP13_Jv_Utf8Const    2690   1.2
  hashUtf8String__FPci                      2567   1.1

2nd column is a count of samples, 3rd is the percentage of total samples for the
entire process.  Samples below 1% are omitted.

I wouldn't have guessed that _Jv_FindArrayClass is hit so hard.  It also
surprises me that so many cycles are spent in _Jv_AllocObject, which doesn't do
much besides call the GC allocator.

I have similar counts for the GC:

  GC_malloc                                12232   5.3
  GC_mark_from_mark_stack                   6759   3.0
  GC_malloc_atomic                          3559   1.6
  GC_reclaim_clear                          2425   1.1

and libc:

  memset_loop                              11041   4.8
  memcpy                                    5963   2.6
  __divqu                                   5217   2.3
  __pthread_lock                            3488   1.5
  __pthread_getspecific                     2523   1.1
  __pthread_unlock                          2512   1.1

(__divqu is an Alpha thing... no integer division in h/w.)  Overall, everything
seems well-balanced, no single function consumes more than about 5% overall
runtime.

In theory I can measure relative times spent in
application/libgcj/libgcjgc/libc/kernel mode, once I have shared libs working
again. *double sigh*

I tried some event counters to compare gcj to the Compaq JDK:

  VM                     Wall time   Instructions       Cycles    CPI
  ------------------     ---------   ------------   ----------  -----
  gcj, -O2 -fno-defer-pop   30.7 s       5.9x10^9    16.3x10^9   2.76
  JDK 1.2.2, symcjit       101.8 s      16.1x10^9    55.7x10^9   3.45

Not a fair comparison, really.  The JDK is fast on single method execution, but
far inferior at real programs, because in spite of the JIT it is based on Sun's
original "Classic" VM without the runtime speedups.

The CPI is interesting, since it gives a relative measure of pipeline
performance. This machine is capable of issuing two integer insns per cycle, in
theory.  2.76 is not a particularly good result (lower is better) though it
fares far better than the JDK.  Reasons for pipeline stalls are cache misses,
branch mispredictions and poorly scheduled code (read-after-write conflicts,
etc.).  It should improve once we can do adequate method inlining.  

I'd love to know how to get at similar hardware event counters on x86.  I'm not
familiar with any free tools for that arch.  Since I can't do reasonable-cost
multithreaded profiling in software, I'm sticking with Iprobe on Alpha.

(Sorry about the long post.  This is perhaps far more detail than you expected,
and I've just hit the tip of the iceberg.  The real fun is figuring out how to
interpret the data.  I'll keep you posted on any unusual results.)

--
Jeff Sturm
jeff.sturm@commerceone.com

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]