This is the mail archive of the
java@gcc.gnu.org
mailing list for the Java project.
Re: Speed Impact experiment on GCJ
On Wed, 15 Feb 2006 12:29:24 +0100, Thomas Hallgren wrote:
> I think that's incorrect. The 1.5 JVM and its rt.jar are backward
> compatible with 1.4. And you're likely to see some performance gain if you
> upgrade to the latest 1.5.0_06 version. I think even a highly optimized
> gcj compilation will have a hard time catching up. It would be interesting
> to see a comparison.
Even more interesting would be a comparison with a recent Mustang aka 1.6
build, because that will make the 1.5 test look slow. :-)
The one area where gcj used to have a very clear advantage was
interface-method invocation: before 1.5 it could be up to 10 (!) times
slower than a direct method invocation in the Sun JVM. This was reduced
to a factor of ~1.8 in 1.5 and completely optimized away in Mustang.
Mustang also sports a new register allocator that makes a tremendous
difference for loops and therefore String/array/collection operations.
The other big change - escape analysis - is very recent and supposed to
help the GC, but I didn't benchmark that; all I know is that some large
maven builds are up to twice as fast with 1.6 as with 1.5. It is really
fast.
Here are the results of a simple benchmark that I used to track the method
invocation evolution over the years. "Direct" is a simple base class
method invocation, "Derived" tests the same method for a derived class and
"Interface" uses the derived class but refers to the method via an
interface. The method simply does several hundred million multiplications
and stores the value in a static field in order to avoid being optimized
away by clever compilers and to cause cache flushes/memory writes. There
are no allocations or GCs going on. Yes, I fully understand that this kind
of benchmarking is hokey and does not mean much for real-world
applications, but please read on anyway. ;)
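A minimal sketch of a benchmark along these lines could look like the following. It is not the original source: the class and method names, the arithmetic inside work(), the call count, and the assumption that the loop makes one call per multiplication are all illustrative guesses. Times are printed in milliseconds, which is also an assumption about the figures below.

```java
// Sketch only: names, arithmetic and call count are assumptions,
// not the original benchmark source.
class InvokeBench {
    // Static sink: writing the result here keeps the compiler from
    // discarding the multiplication and forces a memory write per call.
    static int sink = 1;

    interface Worker {
        void work(int i);
    }

    static class Base {
        public void work(int i) {
            sink = sink * 31 + i;
        }
    }

    // Inherits work() from Base; the inherited method also satisfies Worker.
    static class Derived extends Base implements Worker {
    }

    // "Direct": call through a reference of the base class type.
    static long timeDirect(Base target, int calls) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < calls; i++) {
            target.work(i);
        }
        return System.currentTimeMillis() - start;
    }

    // "Derived": same method, called through the derived class type.
    static long timeDerived(Derived target, int calls) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < calls; i++) {
            target.work(i);
        }
        return System.currentTimeMillis() - start;
    }

    // "Interface": same object, but the call goes through the interface type.
    static long timeInterface(Worker target, int calls) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < calls; i++) {
            target.work(i);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        int calls = 300000000; // assumed; the post only says "several hundred million"
        for (int run = 0; run < 5; run++) {
            System.out.println("Direct : " + timeDirect(new Base(), calls));
        }
        for (int run = 0; run < 5; run++) {
            System.out.println("Derived : " + timeDerived(new Derived(), calls));
        }
        for (int run = 0; run < 5; run++) {
            System.out.println("Interface : " + timeInterface(new Derived(), calls));
        }
    }
}
```

Each variant is repeated five times so that the effect of JIT warm-up on the first runs shows up in the output. The sketch avoids generics and newer library calls so that older compilers have a chance of building it.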
1.4.2_09:
Direct : 2813
Direct : 1734
Direct : 1735
Direct : 1765
Direct : 1735
Derived : 2843
Derived : 1750
Derived : 1735
Derived : 1734
Derived : 1750
Interface : 15297
Interface : 14500
Interface : 14656
Interface : 14469
Interface : 14516
Check out the ridiculous invokeinterface times!
1.5.0_05:
Direct : 1969
Direct : 1750
Direct : 1750
Direct : 1734
Direct : 1781
Derived : 1844
Derived : 1750
Derived : 1781
Derived : 1766
Derived : 1750
Interface : 3343
Interface : 2735
Interface : 2765
Interface : 2750
Interface : 2735
Now they are much better: only almost twice as slow.
1.6-b71:
Direct : 1750
Direct : 1750
Direct : 1375
Direct : 1375
Direct : 1391
Derived : 1750
Derived : 1750
Derived : 1359
Derived : 1359
Derived : 1375
Interface : 2610
Interface : 2594
Interface : 1375
Interface : 1359
Interface : 1391
...and in Mustang they are as fast as everything else once HotSpot has
warmed up. Also consistently visible is how HotSpot kicks in after the
first or second run and inlines the method.
GCJ 3.4.5 on Windows is the latest (and only) release that I have; it might
be good to test this with 4.x. If anybody can run the test, please let me know.
gcj 3.4.5 no optimization:
Direct : 5672
Direct : 5703
Direct : 5719
Direct : 5672
Direct : 5672
Derived : 5734
Derived : 5734
Derived : 5735
Derived : 5703
Derived : 5734
Interface : 10219
Interface : 10219
Interface : 10312
Interface : 10282
Interface : 10250
gcj 3.4.5 -O:
Direct : 4016
Direct : 4016
Direct : 4109
Direct : 3969
Direct : 4015
Derived : 3938
Derived : 3937
Derived : 3953
Derived : 3969
Derived : 3937
Interface : 7860
Interface : 7890
Interface : 7875
Interface : 7875
Interface : 7844
gcj 3.4.5 -O2:
Direct : 3859
Direct : 3891
Direct : 3890
Direct : 3875
Direct : 3875
Derived : 3890
Derived : 3891
Derived : 3938
Derived : 3937
Derived : 3906
Interface : 7391
Interface : 7453
Interface : 7438
Interface : 7390
Interface : 7453
gcj 3.4.5 -O3:
Same as -O2, though some runs were consistently a bit slower!
gcj 3.4.5 -O2 -funroll-loops -finline-functions:
Direct : 3547
Direct : 3532
Direct : 3515
Direct : 3516
Direct : 3515
Derived : 3516
Derived : 3500
Derived : 3547
Derived : 3500
Derived : 3516
Interface : 7078
Interface : 7109
Interface : 7094
Interface : 7078
Interface : 7089
What is interesting here is that GCC really does seem to shave off small
increments with each optimization step. The absolute time is quite a
bit behind the modern JDKs, though.
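To put rough numbers on these gaps, the listed times can be reduced to a median per variant and an Interface/Direct ratio. The following is a small sketch of that arithmetic; the class and method names are made up for illustration, and the median is just one reasonable choice for damping the slow warm-up runs.

```java
import java.util.Arrays;

// Sketch only: helper names are illustrative, data is copied from the lists above.
class BenchRatios {
    // Median of a small sample; less sensitive to a slow first run than the mean.
    static double median(long[] samples) {
        long[] sorted = (long[]) samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid];
        }
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // How many times slower the interface call is than the direct call.
    static double ratio(long[] interfaceTimes, long[] directTimes) {
        return median(interfaceTimes) / median(directTimes);
    }

    public static void main(String[] args) {
        long[] gcjO2Direct = {3859, 3891, 3890, 3875, 3875};
        long[] gcjO2Interface = {7391, 7453, 7438, 7390, 7453};
        long[] mustangDirect = {1750, 1750, 1375, 1375, 1391};
        long[] mustangInterface = {2610, 2594, 1375, 1359, 1391};

        System.out.println("gcj 3.4.5 -O2 Interface/Direct: "
                + ratio(gcjO2Interface, gcjO2Direct));
        System.out.println("1.6-b71 Interface/Direct: "
                + ratio(mustangInterface, mustangDirect));
        System.out.println("gcj -O2 Direct / 1.6-b71 Direct: "
                + median(gcjO2Direct) / median(mustangDirect));
    }
}
```

With the figures above, the medians are 3875 and 7438 for gcj -O2 (a ratio of roughly 1.92) and 1391 for both Direct and Interface on 1.6-b71 (a ratio of 1.0).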
> Also note that the Sun JVM is not the fastest one around. Both IBM and BEA
> (JRockit) are known to be faster.
Yes - depending on OS, application, the person doing the tuning and the
company that does the benchmark. I would replace "are known to be faster"
with "have been observed to be faster in a particular setting".
Holger