I have a very compact code example (10 lines) for which running java interpreter on a class file takes 4.5s and executing optimized binary created from the class file with gcj takes 5.5s. Something is funny here. Can anyone explain what's happening? % uname -a Linux localdomain 2.6.12-1.1372_FC3 #1 Fri Jul 15 00:59:10 EDT 2005 i686 i686 i386 GNU/Linux >>>>>>>>>>> The java code (X.java): import java.lang.*; import java.util.*; import java.io.*; public class X { public static void main(String[] args) { Random R = new Random(); for (int i=0; i < 10000000; i++) R.nextDouble(); } } >>>>>>>>>>The Makefile all: @javac X.java @echo "*** Interpreted:" @time java -classpath . X @gcj -O3 --main=X -o x X.java @echo "*** Compiled:" @time ./x >>>>>>>>>>Here is the output: *** Interpreted: 4.30user 0.02system 0:04.51elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1major+2734minor)pagefaults 0swaps *** Compiled: 5.55user 0.01system 0:05.58elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+2069minor)pagefaults 0swaps
Actually for me Sun's java and GCJ take about the same at the default settings (well for GCJ compiled at -O3). Now if I change Sun's java to use the server tuned JIT, Sun's java is 2 seconds while GCJ is still 4.5 or so. timecopper:~>time ./a.out 4999358.6586720785 4.900u 0.070s 0:06.12 81.2% 0+0k 0+0io 148pf+0w copper:~>javac X.java java Xcopper:~>java X 5001132.020866861 copper:~>time java X 5000974.98115474 4.791u 0.037s 0:04.88 98.7% 0+0k 0+0io 0pf+0w copper:~>time java -server X 5000248.791328681 3.290u 0.061s 0:03.91 85.6% 0+0k 0+0io 54pf+0w This with keeping and printing out the return value for nextDouble. This might be startup/shut down time taking into account.
I see this too. Compiling with -fno-bounds-check helps, but not enough. One possibility is simply that our implementation of nextDouble is inefficient. I doubt this function was coded for maximum performance.
Subject: Re: Sun's JIT faster than gcc for Random.nextDouble It looks like the problem is that we don't remove the synchronization for nextDouble() even though the test case is single-threaded. qprof: /tmp/x: 299 samples, 299 counts X::main(JArray<java::lang::String*>*):X.java:8 5 ( 2%) libc.so.6(memchr) 1 ( 0%) libgcj.so.6 2 ( 1%) libgcj.so.6(_Jv_MonitorEnter) 110 ( 37%) libgcj.so.6(_Jv_MonitorExit) 108 ( 36%) libgcj.so.6(_ZN4java4util6Random4nextEi) 27 ( 9%) libgcj.so.6(_ZN4java4util6Random10nextDoubleEv) 46 ( 15%)
Yes, I think that most invocations of next should be inlined, and wrapped in a single synchronized block. Apart from that, I am pretty sure that here seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1); return (int) (seed >>> (48 - bits)); it makes no difference if you do the AND or not. So nextDouble could be implemented as: long first; long second; synchronized (this) { seed = (seed * 0x5DEECE66DL + 0xBL); first = (seed & 0x0000FFFFFFC00000L) << 5; seed = (seed * 0x5DEECE66DL + 0xBL); second = (seed >> 21) & 0x7FFFFFF; } return (first | second) / (double) (1L << 53); Similarly, for nextFloat float f; int bits; synchronized (this) { seed = (seed * 0x5DEECE66DL + 0xBL); bits = (int) (seed >> 24); } return bits / 16777216.0f; nextInt int bits; synchronized (this) { seed = (seed * 0x5DEECE66DL + 0xBL); bits = (int) (seed >> 16); } nextLong long first, second; synchronized (this) { seed = (seed * 0x5DEECE66DL + 0xBL); first = (seed << 16) & 0xFFFFFFFF00000000L; seed = (seed * 0x5DEECE66DL + 0xBL); second = (seed >> 16) & 0xFFFFFFFFL; } return first | second; nextInt (n) int bits; synchronized (seed) { seed = (seed * 0x5DEECE66DL + 0xBL); bits = (int) (seed >> 17); } if ((n & -n) == n) // i.e., n is a power of 2 return (int)((n * (long) bits) >> 31); int bits, val; val = bits % n; if (bits - val + n - 1 < 0) synchronized (seed) { do { seed = (seed * 0x5DEECE66DL + 0xBL); bits = (int) (seed >> 17); } while (bits - val + n - 1 < 0); } return val; nextBoolean boolean bit; synchronized (this) { seed = (seed * 0x5DEECE66DL + 0xBL); bit = (seed & 0x800000000000) != 0; } return bit; And I left out nextBytes, I know. Also these are untested, which is why I'm not preparing a full patch. Paolo
> It looks like the problem is that we don't remove the synchronization > for nextDouble() even though the test case is single-threaded. If we can remove even only half of the synchronization overhead, by synchronizing just once per nextDouble() call, it's a win.
Closing as won't fix as the Java front-end has been removed from the trunk.