This is the mail archive of the java-patches@gcc.gnu.org mailing list for the Java project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

patch Java string interning problems


This seems to boostrap and pass the tests, but there is a Kawa regression
which may or may not be related.  I'd like to understand/fix that before
I check this in; once I do, it should go in both the branch and the trunk.

The first patch is the critical one - _Jv_NewStringUtf8Const was entering
a String into strhash (the intern table) without register a finalizer.
Oops.

The second patch fixes a performance bug - the chaining was using the
same low-order bits of the hash code as the primary index.  This means
that all strings that mapped into the same initial slot in strhash
would step through the same sequence of alternatives slots, which leads
to the bunching we wanted to avoid.  Another ooops.  Using the high-order
bits seems to make sense.

Finally, it seemed better to rehash when 2/3 full rather than waiting
to rehash until we are 3/4 full.  This hashing scheme works best when
the table is somewhat sparse, and we can afford to spend a little more
space, given that we don't have the overhead of chaining with links.

If someone feels inspired to do some measurements to see how these fixes
perform (i.e. how much probing is done), that would be really cool.

2001-04-08  Per Bothner  <per@bothner.com>

	* java/lang/natString.cc (_Jv_NewStringUtf8Const):  Register finalizer.

	* java/lang/natString.cc (_Jv_StringFindSlot, rehash):  Use high-order
	bits of hash to calculate step for chaining.

	* java/lang/natString.cc (intern, _Jv_NewStringUtf8Const):  Rehash
	when 2/3 full, rather than 3/4 full.

Index: natString.cc
===================================================================
RCS file: /cvs/gcc/gcc/libjava/java/lang/natString.cc,v
retrieving revision 1.16.4.2
diff -u -p -r1.16.4.2 natString.cc
--- natString.cc	2001/04/01 21:49:48	1.16.4.2
+++ natString.cc	2001/04/08 18:10:16
@@ -62,7 +62,7 @@ _Jv_StringFindSlot (jchar* data, jint le
 
   int index = start_index;
   /* step must be non-zero, and relatively prime with strhash_size. */
-  int step = 8 * hash + 7;
+  jint step = (hash ^ (hash >> 16)) | 1;
   for (;;)
     {
       jstring* ptr = &strhash[index];
@@ -145,7 +145,7 @@ java::lang::String::rehash()
 	  jstring val = (jstring) UNMASK_PTR (*ptr);
 	  jint hash = val->hashCode();
 	  jint index = hash & (nsize - 1);
-	  jint step = 8 * hash + 7;
+	  jint step = (hash ^ (hash >> 16)) | 1;
 	  for (;;)
 	    {
 	      if (next[index] == NULL)
@@ -166,7 +166,7 @@ jstring
 java::lang::String::intern()
 {
   JvSynchronize sync (&StringClass);
-  if (4 * strhash_count >= 3 * strhash_size)
+  if (3 * strhash_count >= 2 * strhash_size)
     rehash();
   jstring* ptr = _Jv_StringGetSlot(this);
   if (*ptr != NULL && *ptr != DELETED_STRING)
@@ -270,7 +270,7 @@ _Jv_NewStringUtf8Const (Utf8Const* str)
   chrs -= length;
 
   JvSynchronize sync (&StringClass);
-  if (4 * strhash_count >= 3 * strhash_size)
+  if (3 * strhash_count >= 2 * strhash_size)
     java::lang::String::rehash();
   int hash = str->hash;
   jstring* ptr = _Jv_StringFindSlot (chrs, length, hash);
@@ -285,6 +285,8 @@ _Jv_NewStringUtf8Const (Utf8Const* str)
     }
   *ptr = jstr;
   SET_STRING_IS_INTERNED(jstr);
+  // When string is GC'd, clear the slot in the hash table.
+  _Jv_RegisterFinalizer ((void *) str, unintern);
   return jstr;
 }
 

-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/~per/


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]