Gammon Forum : Programming : General : Balanced Binary Tree's

Entire forum ➜ Programming ➜ General ➜ Balanced Binary Tree's

Balanced Binary Tree's

It is now over 60 days since the last post. This thread is closed. Refresh page

Posted by

Nick Cash USA (626 posts) Bio

Date

Thu 16 Feb 2006 04:07 AM (UTC)

Amended on Thu 04 May 2006 09:49 PM (UTC) by Nick Cash

Message

Recently in my CS II class we have been playing with tree's a great deal, and I figured I post a small example of using a balanced binary tree. All it does is read in words, throw them into the tree (after converting to lower case, smashing puncuation, and skipping length 3 or less words) and, if the already exist, update the count.

It uses a rather old (but still quite small and brilliant) balancing algorithm . The name currently escapes me, as do the papers about it. I will find them if anyone really wishes to know.

Some stats spewed by the program tested on a high end server:
---
Nodes w/ 0 bal: 136003
Nodes w/ 1 bal: 32058
Nodes w/-1 bal: 31939
Nodes w/ other: 0
Overal balance: 120

Number of words: 7201810
Number of nodes: 200000

Time: 31 seconds
---

That is, 7.2 million words were added to the tree, resulting in 200,000 unique nodes (which I capped it at). All of this took 31 seconds.

If you want to test it, take one of your instant messenger conversations, save it as a text file, and route it through the program. It comes with a print_tree() function to print the tree and the counts of words.

Code (exported to html):
http://ew.xidus.net/download/cs2/1/files/main_mod_cpp.html

~Nick Cash
http://www.nick-cash.com

Top

Posted by

David Haley USA (3,881 posts) Bio

Date

Reply #1 on Thu 16 Feb 2006 04:25 AM (UTC)

Message

Cool stuff. I am slightly curious what algorithm it is. Doesn't look like red-black to me, but then again, it's hard to tell exactly without looking at it in great detail.

Another interesting statistic is to compare lookup times for balanced trees vs. unbalanced trees. That's when you really see why it's advantageous to use balanced trees.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Top

Posted by

Enderandrew USA (37 posts) Bio

Date

Reply #2 on Thu 16 Feb 2006 08:27 AM (UTC)

Message

Color me stupid as I am just learning how to code in C.

How might we use this in a mud codebase?

Will it reduce memory useage or CPU load on parsing data files?

"Nihilism makes me smile."

Top

Posted by

David Haley USA (3,881 posts) Bio

Date

Reply #3 on Thu 16 Feb 2006 11:59 AM (UTC)

Message

Binary trees can be used to implement maps, where, at every node, instead of having a 'count' like Whiteknight has, you have a pointer to whatever you want to store.

Whiteknight, for instance, basically wrote a string-to-integer map.

So, a binary tree could be used as an effective data structure for, e.g., every time you need to store a very large quantity of string-to-xyz mappings. SMAUG does this a lot: skill tables, command tables, ...

What's nifty about what Whiteknight wrote is that the binary tree is balanced. An unbalanced binary tree is at the worse case basically a linked list, which means you lose all the good properties of a balanced tree. A balanced tree, on the other hand, guarantees at all times that the left-right distribution of every node is optimal, i.e. for every node, the quantity of nodes to the left is equal to the quantity of nodes on the right. (Except of course when you don't have the choice, e.g. a tree with 4 elements.)

You don't do this in C a whole lot, but in C++, every time you create a map<foo, bar>, chances are you are using an implementation that uses binary trees.

More technically speaking a binary tree gives you O(log n) lookup times, as opposed to O(n) for an array. It is very hard to beat a binary tree in general.

One way is to use a hash table, which is as good as its hash function. The 'ideal hash table' is O(1) (i.e. constant-time lookup), but these do not occur very often in practice. Practically speaking, hash tables are, assuming you have a very good hash function, O(n/m) where m is the number of buckets in your hash table.

There are other tricks you can use to implement mappings where the key is something sequential (numbers, strings : these are both sequences of digits, letters). One of my all-time favorites is the "trie". This stores a tree-like structure of the keys, where every node is an element of the sequence. So to look up "foobar", you'd follow the f node of the root node, the o node of the f node, the o node of the o node, the b of the o, the a of the b and the r of the a.
What's so nifty about 'tries' is that your lookup time is O(p) where p is the length of your key. Tries are not at all affected (time complexity) by the number of elements. This is an incredibly good feature.

Anyhow, enough rambling...

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Top

Posted by

Enderandrew USA (37 posts) Bio

Date

Reply #4 on Thu 16 Feb 2006 01:04 PM (UTC)

Message

Could this basically be used as a replacement function for something like stralloc then?

"Nihilism makes me smile."

Top

Posted by

David Haley USA (3,881 posts) Bio

Date

Reply #5 on Thu 16 Feb 2006 01:25 PM (UTC)

Message

I'm not sure exactly what you mean by that. Stralloc allocates memory for a string and happens to share memory, by returning a pointer to an existing block of memory if that string is already allocated.

You could use a btree to implement a shared string library, yes. You would search the btree for the string to allocate, and if it is in the btree, you return it and update the count; if it's not in the btree, you just add it with a count of 1.

In fact, this is, basically speaking, how shared string libraries work. You map strings to reference counts, and increment/decrement on allocation/deallocation. My shared string library, linked to from these forums, does exactly that except I use a hash table instead of a btree.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Top

Posted by

Nick Cash USA (626 posts) Bio

Date

Reply #6 on Tue 21 Feb 2006 05:07 PM (UTC)

Message

It is an algorithm developed by Adel'son-Vel'skii and Landis, and is otherwise known as the AVL balancing algorithm.

~Nick Cash
http://www.nick-cash.com

Top

Posted by

Nick Cash USA (626 posts) Bio

Date

Reply #7 on Thu 20 Apr 2006 05:20 PM (UTC)

Message

Very late update on this. You can find a PDF describing the AVL algorithm at:

http://math-cs.cns.uni.edu/~okane/061/BalancedTree.pdf

~Nick Cash
http://www.nick-cash.com

Top

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.

31,868 views.

It is now over 60 days since the last post. This thread is closed. Refresh page

Go to topic: Search the forum

top