This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: GNU ObjC runtime class table rewritten



Hi Ovidiu,

thanks for your comments :-)

After reading your comments, I think I forgot to talk about order of
magnitudes.  The table is sized so that linked lists are nearly always
composed of a single string - or at most two or three.

    Ovidiu> 1. In class_table_insert(), when you insert the node in
    Ovidiu> the hash table that already has a linked list for the a
    Ovidiu> given index, you can insert it at the beginning of the
    Ovidiu> list. Thus you have constant time for insertion. 

Good point.

I wouldn't make the change though, simply because - because of the
orders of magnitude involved - it wouldn't make any real difference.

The hash table is so big that long linked lists should be very rare.
As an example, a normal gnustep gui application has something like 300
classes, and the hash table has 1024 slots.  Actually, taking into
account all ObjC libraries on my system, I hardly have 1000 classes.
I think applications with 3000 classes are probably possible in some
extreme cases - but even in that case, linked lists would be composed
by 3 nodes on average, so that inserting a class at the end or at the
beginning does not really make any difference.

    Ovidiu> However
    Ovidiu> this will not help much during searches, so probably a
    Ovidiu> better idea is to insert the node in the linked list so
    Ovidiu> that it is sorted based on the length of the class
    Ovidiu> name. You can than modify the class_table_get_safe()
    Ovidiu> function to take into account this modification.
    Ovidiu> Maybe a smarter arrangement of the linked list could also
    Ovidiu> be done. Moving the node corresponding to a class to the
    Ovidiu> beginning of the linked list, based on the access
    Ovidiu> frequency may help in cases where the program has a large
    Ovidiu> number of classes. However the bookkeeping may be too
    Ovidiu> costly to be done on each class lookup.  Probably a better
    Ovidiu> approach is to modify the size of the hash table
    Ovidiu> dynamically, as the number of classes in the program
    Ovidiu> increases.

Yes - there are quite some standard enhancements which could be
attempted.  I did one of them - using the string length as a secondary
hash.  There are two reasons I wouldn't add any other though:

1. If you don't have a huge number of classes - say that you have 600
(which are already a lot - the biggest application I could build on my
system) - then those enhancements have no effect at all - or if they
have an effect, it is to slow down things: the current lookup computes
the hash, jumps to the correct linked list, which contains a single
string - the right one, and compares it.  You can't go faster than
that (for example, after adding the length checks, that slowed down (a
few percent) the code in normal conditions <as you make the lookup
path longer by comparing also lenghts>.  I added it thinking it would
be useful for very huge apps, as in the following case).  Even if you
have, say, 2000 classes - even in that case, on average you have 2
strings in the linked list.  Those two strings are very likely to have
different lenghts.  Thus the current code just computes the hash,
jumps to the linked list, checks the length of the first string, if
it's correct it compares this string with the string to lookup (and
it's nearly certainly a match), otherwise it compares the next string.
It's not easy to make it faster than that.  Intelligent enhancements
start to make sense only when you have longer linked lists.  And in
that case, I would simply suggest to increase (double) the hash table
size, and recompile the library. <how did I benchmark/study the case
of table full?  By reducing the hash table size, so that I could
over-fill the table with my 600 classes.>

2. Keeping the whole stuff very simple (hash table of fixed size,
linked lists in which all operations are done at the end) means it is
very easy to understand what is happening when the stuff is running
multithreading, and one thread is writing, and multiple other threads
are reading (without locks) at the same time of the writing.  If you
start moving things around, it's more difficult to be sure that even
without locks, the thing doesn't crash in multithreading.  I prefer it
to be 10% slower, but absolutely patent and obvious that it really
works without locks and doesn't crash even under thread stress.  Who
knows who will have to maintain this code in the future, let's make
things easy for them, simple stuff.  I'm quite happy you both seem to
have read and understood the code easily.


    Ovidiu> 2. The class_table_replace() does an exhaustive search on
    Ovidiu> the hash table. I think you could instead use the hash
    Ovidiu> code for the old_class_pointer to directly identify the
    Ovidiu> linked list that contains the class. This would speed up
    Ovidiu> things considerably.

Your suggestion looks like a very good idea.  I did it that way
because I simply mimicked the way the old code was doing the
replacement - since I wasn't really interested in working on poseAs:,
and didn't want to break it.  Also, I did some tests to check it still
works, but nothing exceptionally accurate.

But once the new code is in place, this change you propose to
class_table_replace looks like a good improvement to make.


    Ovidiu> Just to satisfy my curiosity, did you do any benchmarks
    Ovidiu> using class_table_print_histogram to see how good that
    Ovidiu> hash function performs in a normal OpenStep environment? 
    Ovidiu> I'm curious to see how similar prefixes for the class
    Ovidiu> names affects the hash function.

Yes, I did a lot of benchmarking using simple application linked
against the gnustep libraries, and evaluated different hash functions.

The hash function must be very fast to compute (for example, the one
used elsewhere in libobjc was too slow), and still it must be good.
Quite some fast hashes I tried were bad.  Typically, as you guessed,
they wouldn't work well with strings with the same prefix (NSString,
NSArray); or with strings differing only for a letter (GSString,
NSString); or with strings where one extends the other one (NSText,
NSTextView); or even some would be too much correlated to the string's
length (ie, strings with the same hash tended to have the same
length).  Since I wanted to use the length as a secondary hash, the
first hash must not be correlated to the length.  The hash submitted
was the best one I could find for my tests: it's quite quick, but
still works well.

Thanks for the good comments !


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]