Hash table and binary search trees

It is now over 60 days since the last post. This thread is closed.
Refresh page
Posted by 
Nick Cash
USA (626 posts) bio

Date 
Wed 22 Dec 2004 02:06 AM (UTC) Amended on Wed 22 Dec 2004 02:10 AM (UTC) by Nick Cash

Message 
I was just wandering around, and I remembered these two things. I've read a little about them, but not much. Could anyone give me a brief synopsis (or point me to a good site) that show's what exactly these things are, their bennefits, and a basic/good implementation?
edit:
I know SMAUG uses a hash table, but I never really looked up how it really worked. Thats really about the extent of my experience with a hash table. As for a binary tree, I've never done anything with them.
Thanks :) 
~Nick Cash
http://www.nickcash.com  top 

Posted by 
Nick Gammon
Australia (21,651 posts) bio
Forum Administrator 
Date 
Reply #1 on Wed 22 Dec 2004 02:20 AM (UTC) 
Message 
Hash tables are a quick way of looking up random data by making a "hash" of the key and then indexing the key into a directlookup table.
For example, if you hashed your keys (eg. names) in such a way that the hash was in the range 0 to 255 then you could then index straight into a 256item table to find the resulting name, without having to scan the entire list.
Clearly a potential problem is collisions, where multiple names hash to the same thing. Typically a hash implementation reverts to a linear scan to handle this case.
Hashes can be very fast, however you usually have to choose between a smallish table (and thus have collisions) or a large table, which reduces collisions but then is likely to waste space by having empty entries.
Binary trees on the other hand sort keys into pairs, eg.
A M
AK MP
AB KL MN PR
Each pair indexes into a smaller chunk of the data. Transversing such a table is, I think, proportional to the log of the number of entries (each level is another power of 2).
For example, in this case to find N you need to go down 3 levels. (log (8) / log (2))
For each additional level you double the number of entries stored, for only one more level traversal. eg. 16 items would take 4 levels, 32 items would be 5 levels and so on.
Thus such a tree can be very quick to find something amongst a large number of entries. A sideeffect is that by traversing the bottom level things are in sequence, which isn't true for hash tables, which are essentially random.
You can find implementations of trees in the STL (sets, multisets, maps, multimaps). I think some STL implementations also have hash sets and hash maps as an extension. 
 Nick Gammon
www.gammon.com.au, www.mushclient.com  top 

Posted by 
Nick Cash
USA (626 posts) bio

Date 
Reply #2 on Wed 22 Dec 2004 02:59 AM (UTC) Amended on Wed 22 Dec 2004 03:19 AM (UTC) by Nick Cash

Message 
Hmm, very interesting.
/* for commands */
for ( x = 0; x < 126; x++ )
{
for ( command = command_hash[x]; command; command = command>next )
{
/* rest of code omitted */
So this is going through the hash table. The command hash table has 127 spaces in it. There are then command's within that space, which are all linked to eachother in their own linked list. Correct?
 Edit 
for ( hash = 0; hash < MAX_KEY_HASH; hash++ )
for ( room = room_index_hash[hash] ; room ; room = room>next)
/* rest of code omitted */
Looking at this, I go and look up MAX_KEY_HASH in mud.h and it is defined as 2048. So, the room hashing has a hash table with 2049 spaces?
/* for socials */
for ( x = 0; x < 27; x++ )
{
for ( social = social_index[x]; social; social = social>next )
{
/* rest of code omitted */
What is the point of having a hash table thats only 28 spaces in it? Does it really save that much time over a normal "master" linkedlist? 
~Nick Cash
http://www.nickcash.com  top 

Posted by 
Flannel
USA (1,230 posts) bio

Date 
Reply #3 on Wed 22 Dec 2004 03:53 AM (UTC) Amended on Wed 22 Dec 2004 03:55 AM (UTC) by Flannel

Message 
Another way to think of it is, if we have 10 items (the numbers 1 through 10), and we want to store them. We can either store just 10 items in sequence. Or we can do a simple hash (lets say mod 3) to put them in 'bags'.
So we'll have three bags: 0, 1, 2, we place each thing in a bag based on the function n % 3.
So our bags would look like this:
Bag 0: 3,6,9
Bag 1: 1,4,7,10
Bag 2: 2,5,8
That way when we want to find 7, we can put that through our function (7 % 3) and know that it is in bag 1.
Obviously this hash has collisions, which arent nessisarily a bad thing. So once we knew bag 1, we'd (usually) search linearly. You can see how that saved us time (more bags you have, if bags > items the more time you save). Obviously if we had only a couple of items, this probably wouldnt be worth your while.
And you can also use other hash functions, hash functions usually try and normalize the frequency of each outcome, so that one bag doesnt have 10 items, while another has only three. You can google for some nice hash functions for different circumstances. 
~Flannel
Messiah of Rose
Eternity's Trials.
Clones are people two.  top 

Posted by 
Nick Gammon
Australia (21,651 posts) bio
Forum Administrator 
Date 
Reply #4 on Wed 22 Dec 2004 04:27 AM (UTC) Amended on Wed 22 Dec 2004 04:29 AM (UTC) by Nick Gammon

Message 
I have 384 socials, so assuming they hash with an equal distribution, that would be 13 per slot. That is only 13 in the linked list (which you would find on average halfway through), so it is about 6 iterations, compared to finding something halfway through the 384 socials, which is 192 iterations.
So yes, it would be shorter.
One of the problems with hashes is that the hash itself takes time to calculate. I see from the code that the socials are hashed on the first letter of the social. This is hardly a hash, as there would be more starting with A than Z, however it still cuts down the linear search, for a tiny amount of extra computation, plus a small list of 27 items. They use 1 to 26 for the normal letters, plus 0 for the nonalphabetic ones. 
 Nick Gammon
www.gammon.com.au, www.mushclient.com  top 

Posted by 
David Haley
USA (3,881 posts) bio

Date 
Reply #5 on Wed 22 Dec 2004 06:32 AM (UTC) 
Message 
Quote: Looking at this, I go and look up MAX_KEY_HASH in mud.h and it is defined as 2048. So, the room hashing has a hash table with 2049 spaces? A small note here  if max_hash is defined as 2048, then there are 2048 entries, not 2049. This is because despite starting from 0, we go up until 2048 exclusive.
for ( i = 0; i < x; i++ )
You can see that if x=1, then we'll only run the loop once.
Hash tables have O(1) performance ideally, but in actuality have O(N/B) where B is the number of buckets  but this still assumes perfect distribution. In the real world, you still have a linear scan, but as Nick pointed out it's a much smaller scan.
A common trick with hash tables is to have a function that yields results of a much greater range than your number of buckets. Then, you modulo by the number of buckets as Flannel was describing. If you realize that you're filling up your buckets, you increase the number of buckets and then rehash everything. This way, you've reduced the number of items per buckets. This method helps reduce the linear scan time, in addition to not allocating more buckets than you need.
BSTs indeed have O(log n) lookup behavior. Note that a binary search tree can be implemented as an array, which has much less space overhead. The problem is that it's much harder to grow.
I believe that SMAUG implements its skill table as a BST but my memory may be hazy on that. 
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.thehaleys.org  top 

The dates and times for posts above are shown in Universal Coordinated Time (UTC).
To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.
9,920 views.
It is now over 60 days since the last post. This thread is closed.
Refresh page
top
Quick links:
MUSHclient.
MUSHclient help.
Forum shortcuts.
Posting templates.
Lua modules.
Lua documentation.
Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.