Gammon Forum : Programming : General : Hash table and binary search trees

Entire forum ➜ Programming ➜ General ➜ Hash table and binary search trees

Hash table and binary search trees

It is now over 60 days since the last post. This thread is closed. Refresh page

Posted by

Nick Cash USA (626 posts) Bio

Date

Wed 22 Dec 2004 02:06 AM (UTC)

Amended on Wed 22 Dec 2004 02:10 AM (UTC) by Nick Cash

Message

I was just wandering around, and I remembered these two things. I've read a little about them, but not much. Could anyone give me a brief synopsis (or point me to a good site) that show's what exactly these things are, their bennefits, and a basic/good implementation?

edit:
I know SMAUG uses a hash table, but I never really looked up how it really worked. Thats really about the extent of my experience with a hash table. As for a binary tree, I've never done anything with them.

Thanks :)

~Nick Cash
http://www.nick-cash.com

Top

Posted by Nick Gammon Australia (23,165 posts) Bio Forum Administrator

Date Reply #1 on Wed 22 Dec 2004 02:20 AM (UTC)

Message

Hash tables are a quick way of looking up random data by making a "hash" of the key and then indexing the key into a direct-lookup table.

For example, if you hashed your keys (eg. names) in such a way that the hash was in the range 0 to 255 then you could then index straight into a 256-item table to find the resulting name, without having to scan the entire list.

Clearly a potential problem is collisions, where multiple names hash to the same thing. Typically a hash implementation reverts to a linear scan to handle this case.

Hashes can be very fast, however you usually have to choose between a smallish table (and thus have collisions) or a large table, which reduces collisions but then is likely to waste space by having empty entries.

Binary trees on the other hand sort keys into pairs, eg.


    A          M
  A---K      M---P
A--B K--L  M--N P--R

Each pair indexes into a smaller chunk of the data. Transversing such a table is, I think, proportional to the log of the number of entries (each level is another power of 2).

For example, in this case to find N you need to go down 3 levels. (log (8) / log (2))

For each additional level you double the number of entries stored, for only one more level traversal. eg. 16 items would take 4 levels, 32 items would be 5 levels and so on.

Thus such a tree can be very quick to find something amongst a large number of entries. A side-effect is that by traversing the bottom level things are in sequence, which isn't true for hash tables, which are essentially random.

You can find implementations of trees in the STL (sets, multisets, maps, multimaps). I think some STL implementations also have hash sets and hash maps as an extension.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Top

Posted by Nick Cash USA (626 posts) Bio

Date

Reply #2 on Wed 22 Dec 2004 02:59 AM (UTC)

Amended on Wed 22 Dec 2004 03:19 AM (UTC) by Nick Cash

Message

Hmm, very interesting.


/* for commands */
    for ( x = 0; x < 126; x++ )
    {
	for ( command = command_hash[x]; command; command = command->next )
	{
           /* rest of code omitted */

So this is going through the hash table. The command hash table has 127 spaces in it. There are then command's within that space, which are all linked to eachother in their own linked list. Correct?

-- Edit --


   for ( hash = 0; hash < MAX_KEY_HASH; hash++ )
    for ( room = room_index_hash[hash] ; room ; room = room->next)
     /* rest of code omitted */

Looking at this, I go and look up MAX_KEY_HASH in mud.h and it is defined as 2048. So, the room hashing has a hash table with 2049 spaces?


/* for socials */
    for ( x = 0; x < 27; x++ )
    {
	for ( social = social_index[x]; social; social = social->next )
	{
          /* rest of code omitted */

What is the point of having a hash table thats only 28 spaces in it? Does it really save that much time over a normal "master" linked-list?

~Nick Cash
http://www.nick-cash.com

Top

Posted by

Flannel USA (1,230 posts) Bio

Date

Reply #3 on Wed 22 Dec 2004 03:53 AM (UTC)

Amended on Wed 22 Dec 2004 03:55 AM (UTC) by Flannel

Message

Another way to think of it is, if we have 10 items (the numbers 1 through 10), and we want to store them. We can either store just 10 items in sequence. Or we can do a simple hash (lets say mod 3) to put them in 'bags'.
So we'll have three bags: 0, 1, 2, we place each thing in a bag based on the function n % 3.

So our bags would look like this:

Bag 0: 3,6,9
Bag 1: 1,4,7,10
Bag 2: 2,5,8

That way when we want to find 7, we can put that through our function (7 % 3) and know that it is in bag 1.

Obviously this hash has collisions, which arent nessisarily a bad thing. So once we knew bag 1, we'd (usually) search linearly. You can see how that saved us time (more bags you have, if bags > items the more time you save). Obviously if we had only a couple of items, this probably wouldnt be worth your while.
And you can also use other hash functions, hash functions usually try and normalize the frequency of each outcome, so that one bag doesnt have 10 items, while another has only three. You can google for some nice hash functions for different circumstances.

~Flannel

Messiah of Rose
Eternity's Trials.

Clones are people two.

Top

Posted by

Nick Gammon Australia (23,165 posts) Bio Forum Administrator

Date

Reply #4 on Wed 22 Dec 2004 04:27 AM (UTC)

Amended on Wed 22 Dec 2004 04:29 AM (UTC) by Nick Gammon

Message

I have 384 socials, so assuming they hash with an equal distribution, that would be 13 per slot. That is only 13 in the linked list (which you would find on average half-way through), so it is about 6 iterations, compared to finding something half-way through the 384 socials, which is 192 iterations.

So yes, it would be shorter.

One of the problems with hashes is that the hash itself takes time to calculate. I see from the code that the socials are hashed on the first letter of the social. This is hardly a hash, as there would be more starting with A than Z, however it still cuts down the linear search, for a tiny amount of extra computation, plus a small list of 27 items. They use 1 to 26 for the normal letters, plus 0 for the non-alphabetic ones.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Top

Posted by

David Haley USA (3,881 posts) Bio

Date

Reply #5 on Wed 22 Dec 2004 06:32 AM (UTC)

Message

Quote:
Looking at this, I go and look up MAX_KEY_HASH in mud.h and it is defined as 2048. So, the room hashing has a hash table with 2049 spaces?

A small note here - if max_hash is defined as 2048, then there are 2048 entries, not 2049. This is because despite starting from 0, we go up until 2048 exclusive.

for ( i = 0; i < x; i++ )

You can see that if x=1, then we'll only run the loop once.

Hash tables have O(1) performance ideally, but in actuality have O(N/B) where B is the number of buckets - but this still assumes perfect distribution. In the real world, you still have a linear scan, but as Nick pointed out it's a much smaller scan.

A common trick with hash tables is to have a function that yields results of a much greater range than your number of buckets. Then, you modulo by the number of buckets as Flannel was describing. If you realize that you're filling up your buckets, you increase the number of buckets and then rehash everything. This way, you've reduced the number of items per buckets. This method helps reduce the linear scan time, in addition to not allocating more buckets than you need.

BSTs indeed have O(log n) lookup behavior. Note that a binary search tree can be implemented as an array, which has much less space overhead. The problem is that it's much harder to grow.

I believe that SMAUG implements its skill table as a BST but my memory may be hazy on that.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Top

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.

25,121 views.

It is now over 60 days since the last post. This thread is closed. Refresh page

Go to topic: Search the forum

top