Gammon Forum


Self-managed hashed strings

It is now over 60 days since the last post. This thread is closed.

Posted by David Haley   USA  (3,881 posts)
Date Sun 13 Feb 2005 02:39 AM (UTC)
Message
I've seen a lot of trouble caused by hash strings in SMAUG. The main issue is that there is no type-safety between hashed strings and non-hashed strings, so if you mismatch create/dispose/stralloc/strfree, consequences can be disastrous.

A while ago, I wrote some C++ classes to fix this issue. Basically, it's just a small library to implement managed hashed strings. You assign a value to the string, and it automatically takes care of entering it into the hash table, or only incrementing reference count if it's already present.

While it's not quite presentable for public use at the moment, it would be very easy to make it so. Is anybody interested in this? It's in C++ so it wouldn't be useful to most SMAUG coders unless they feel like moving to C++ - a good thing to do even if you don't use this code - but I know there are at least a few people who do program in C++. Let me know if you're interested and I'll put in a nice little package. :)

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Greven   Canada  (835 posts)
Date Reply #1 on Sun 13 Feb 2005 02:52 AM (UTC)
Message
I'm fairly certain that I have a clean use of STRALLOC/str_dup; however, I would love to have a copy to install. I EVENTUALLY plan to release my code, and if new coders are using it, it would be a great thing to have. It would even give me some safety when I'm not paying attention; God knows that's a regular occurrence :)

Nobody ever expects the spanish inquisition!

darkwarriors.net:4848
http://darkwarriors.net

Posted by Zeno   USA  (2,871 posts)
Date Reply #2 on Sun 13 Feb 2005 03:49 AM (UTC)
Message
I'd love to use it, I plan on converting to C++ before we go beta. As you can tell, I've had some problems. ;)

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by David Haley   USA  (3,881 posts)
Date Reply #3 on Sun 13 Feb 2005 07:28 AM (UTC)
Message
Alrighty. :) I'll package it up and post it in a day or two.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Nick Cash   USA  (626 posts)
Date Reply #4 on Sun 13 Feb 2005 07:51 AM (UTC)
Message
Just to add to these posts, I'd like to see it as well. While I don't plan to move my MUD over to C++, I do other things in C++. This could definitely come in handy :)

~Nick Cash
http://www.nick-cash.com

Posted by Samson   USA  (683 posts)
Date Reply #5 on Sun 13 Feb 2005 04:29 PM (UTC)
Message
I too would be interested in seeing this as I am also in the process of converting to C++ and would love nothing more than to say goodbye to STRALLOC/str_dup :)

Posted by David Haley   USA  (3,881 posts)
Date Reply #6 on Mon 14 Feb 2005 07:01 AM (UTC)
Message
OK, I'm almost done reworking all of this. I wrote the code about two years ago and I've learned an awful lot since. I've reorganized a lot of this code, namely to separate the conceptual notion of a shared string from a hash-table implementation - the hash table is now just a subclass of the shared string manager. This way, you can use whatever implementation of the string manager you want, if you don't like the hash table for some reason. In principle, you could even use a linked-list implementation, but that'd be kind of silly...

I've also used a neat little trick with templates, so that you can create shared strings that use different managers but without having to set a string's manager - it sets itself automatically based on its type.

Basically, you do something like this:
StringManager * gSharedStrManager;

typedef SharedString<&gSharedStrManager> shared_str;

int main()
{
    gSharedStrManager = new HashTable;

    shared_str s = "hello";
    shared_str s2 = "there";
    shared_str s3 = "hello";

    // at this point, there are only two entries in the string manager

    return 0;
}


Of course, if you ever allocate a shared_str without having created its manager, you'll be up a creek without a paddle. :-)
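For readers without the package, the template trick can be approximated in a self-contained way. This is only a minimal sketch under stated assumptions, not the library's code: the `acquire`, `release`, `entries` and `sameEntry` names are hypothetical, and `std::unordered_map` (C++11) stands in for the real hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Abstract manager, so the storage strategy can be swapped out.
// The method names are assumptions made for this sketch.
class StringManager {
public:
    virtual ~StringManager() {}
    virtual const std::string* acquire(const std::string& text) = 0;
    virtual void release(const std::string* stored) = 0;
    virtual std::size_t entries() const = 0;
};

// One possible manager: a hash table mapping each string to its reference count.
class HashTable : public StringManager {
public:
    const std::string* acquire(const std::string& text) override {
        auto it = counts_.find(text);
        if (it == counts_.end())
            it = counts_.emplace(text, 0).first;   // first user: new entry
        ++it->second;                              // otherwise just count it
        return &it->first;                         // keys stay put on rehash
    }
    void release(const std::string* stored) override {
        auto it = counts_.find(*stored);
        if (it != counts_.end() && --it->second == 0)
            counts_.erase(it);                     // last user: drop the entry
    }
    std::size_t entries() const override { return counts_.size(); }
private:
    std::unordered_map<std::string, std::size_t> counts_;
};

// The template argument is the address of a global manager pointer, so every
// string of this type finds its manager without storing it per instance.
template <StringManager** Manager>
class SharedString {
public:
    SharedString(const char* text) : str_(manager().acquire(text)) {}
    SharedString(const SharedString& other) : str_(manager().acquire(*other.str_)) {}
    SharedString& operator=(const SharedString& other) {
        const std::string* fresh = manager().acquire(*other.str_);
        manager().release(str_);
        str_ = fresh;
        return *this;
    }
    ~SharedString() { manager().release(str_); }

    const std::string& str() const { return *str_; }
    bool sameEntry(const SharedString& other) const { return str_ == other.str_; }

private:
    static StringManager& manager() {
        assert(*Manager != nullptr);   // the manager must exist before any string
        return **Manager;
    }
    const std::string* str_;
};

StringManager* gSharedStrManager = nullptr;
typedef SharedString<&gSharedStrManager> shared_str;
```

With `gSharedStrManager = new HashTable;` in place, creating "hello", "there" and "hello" again leaves two entries in the table, and the two "hello" strings point at the same stored copy. The assertion in `manager()` turns the missing-manager case into an immediate failure in debug builds.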

I plan on having a preview (read: undoc'ed) version in a day or two and a documented version a day or so after that.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by David Haley   USA  (3,881 posts)
Date Reply #7 on Sat 19 Feb 2005 01:05 AM (UTC)
Message
I haven't forgotten about this - I've just been busy with midterms + deadline at work. I'll have it up shortly...

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Zeno   USA  (2,871 posts)
Date Reply #8 on Sat 19 Feb 2005 01:21 AM (UTC)
Message
No need to rush, I don't need it anytime soon. ;)

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by David Haley   USA  (3,881 posts)
Date Reply #9 on Wed 23 Feb 2005 06:55 PM (UTC)
Message
Here is that preview version I was talking about. I only have Visual Studio build files at the moment but it should be pretty easy to stick it into a Unix project.

http://david.the-haleys.org/tmp/shared-str-v0_9.zip

I will be uploading a more complete version with Unix makefiles, documentation etc. shortly. It'll also include a more complete testing package. In the meantime, comments, criticism or suggestions would be most appreciated. :)

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by David Haley   USA  (3,881 posts)
Date Reply #10 on Sun 27 Feb 2005 11:17 PM (UTC)
Message
I am changing the license to a slightly modified BSD license. I will update the license in the 1.0 release, which will be when I finish the documentation.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Raz   (32 posts)
Date Reply #11 on Thu 03 Mar 2005 12:51 AM (UTC)
Message
Umm...

From sharedstr_manager.h


size_t refCount_; //!< How many times this string is shared
//Among other similar instances


This will break under a conforming compiler. size_t is in the namespace std under C++. You are not allowed to use it without the appropriate scoping.


virtual void dumpTable(std::ostringstream & os) const = 0; // for debugging: dump whole table to os.


Why? Why not just simply use a std::ostream? It still allows a std::ostringstream to be passed to it.
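The point about the stream type can be illustrated with a small sketch. The free function `dumpTable` and the helper `dumpToString` here are hypothetical stand-ins, not the library's member function; the only thing assumed about the real code is the parameter type under discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the manager's debugging dump. Taking the base
// class std::ostream lets the caller choose where the output goes.
void dumpTable(std::ostream& os, const std::vector<std::string>& entries) {
    for (std::size_t i = 0; i < entries.size(); ++i)
        os << i << ": " << entries[i] << '\n';
}

// std::ostringstream derives from std::ostream, so it can still be passed in.
std::string dumpToString(const std::vector<std::string>& entries) {
    std::ostringstream os;
    dumpTable(os, entries);
    assert(os.good());   // writing to a string stream is not expected to fail
    return os.str();
}
```

The same `dumpTable` also accepts `std::cout` or a `std::ofstream`, which a parameter of type `std::ostringstream &` would reject.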

From sharedstr_hastable.h:


protected:
//...
inline size_t hash(const std::string & str) const;


Which will break if a subclass ever tries to use this function. Do not declare a function inline unless you intend to have the code readily available for all files.

And then I turn to your class in shardstr.hpp.

Essentially you have provided a very strange interface for the programmer. You force the programmer to retain the manager variables that he or she creates. Personally, I do not find this very appealing. A better solution would be to have a more class-based solution where each class has an internal static storage so your shared strings can instantiate many instances of the class but all the instances would still reference the same shared strings. This is very similar to the STL allocator design.
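One possible reading of this suggestion, sketched under assumptions: the class name `StaticSharedString`, the tag types, and the member names are all hypothetical, and `std::unordered_map` (C++11) is used for the storage. Each instantiation of the template owns one static table, so no manager variable is kept by the programmer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Each distinct Tag gets its own static table of reference-counted strings.
template <typename Tag>
class StaticSharedString {
public:
    StaticSharedString(const char* text) : str_(acquire(text)) {}
    StaticSharedString(const StaticSharedString& other) : str_(acquire(*other.str_)) {}
    StaticSharedString& operator=(const StaticSharedString& other) {
        const std::string* fresh = acquire(*other.str_);
        release(str_);
        str_ = fresh;
        return *this;
    }
    ~StaticSharedString() { release(str_); }

    const std::string& str() const { return *str_; }
    bool sameEntry(const StaticSharedString& other) const { return str_ == other.str_; }

    // Statistics are reachable through the type itself.
    static std::size_t entries() { return table().size(); }

private:
    typedef std::unordered_map<std::string, std::size_t> Table;

    // Created on first use, which sidesteps static initialization order issues.
    static Table& table() {
        static Table instance;
        return instance;
    }
    static const std::string* acquire(const std::string& text) {
        auto it = table().find(text);
        if (it == table().end())
            it = table().emplace(text, 0).first;
        ++it->second;
        return &it->first;
    }
    static void release(const std::string* stored) {
        auto it = table().find(*stored);
        assert(it != table().end());
        if (--it->second == 0)
            table().erase(it);
    }

    const std::string* str_;
};

// Hypothetical tags: two independent pools of shared strings.
struct NameTag {};
struct DescriptionTag {};
typedef StaticSharedString<NameTag> name_str;
typedef StaticSharedString<DescriptionTag> desc_str;
```

Two `name_str` objects built from "hello" share one entry, while `desc_str` keeps a separate table. The trade-off against a pointer-to-manager design is that the storage strategy is fixed at compile time per type.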

-Raz
C++ Wiki: http://danday.homelinux.org/dan/cppwiki/index.php

Posted by David Haley   USA  (3,881 posts)
Date Reply #12 on Thu 03 Mar 2005 04:49 AM (UTC)
Message
Thank you for your comments, Raz.

Quote:
This will break under a conforming compiler. size_t is in the namespace std under C++. You are not allowed to use it without the appropriate scoping.
size_t is not in the std namespace under c++! It is defined in the global namespace in std.io which is included via including the string header file.
Quote:
Why? Why not just simply use a std::ostream?
Because I wasn't thinking when I wrote that. :-)
Quote:
Which will break if a subclass ever tries to use this function. Do not declare a function inline unless you intend to have the code readily available for all files.
Umm... what?

For starters, there is no reason for a subclass to reimplement the hash function.

Secondly, what do you mean, it will break it?
Quote:
Essentially you have provided a very strange interface for the programmer. You force the programmer to retain the manager variables that he or she creates. Personally, I do not find this very appealing.
What if you want to keep track of the manager publicly to access its statistics? The whole point of this template argument was precisely to keep track of the manager variables - which, incidentally, you only have to store once in a typedef.
Quote:
A better solution would be to have a more class-based solution where each class has an internal static storage so your shared strings can instantiate many instances of the class but all the instances would still reference the same shared strings. This is very similar to the STL allocator design.
What if you wanted to have different kinds of shared strings (using e.g. different hash functions), depending on the specific kind of strings you are sharing?

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Raz   (32 posts)
Date Reply #13 on Fri 04 Mar 2005 12:30 AM (UTC)
Message
Quote:
size_t is not in the std namespace under c++!


Wrong. size_t is included within the std namespace.

Quote:
It is defined in the global namespace in std.io which is included via including the string header file.


Well, no. It is not defined in cstdio. It is defined in other files, such as cstddef or cstring. Those files may be included by cstdio, but it is not portable.

Anyhow, I'm wrong for other reasons. It seems that C++ keeps the types borrowed from C available in the global namespace as well as in the std namespace, which is something I didn't realize.
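A small check of the two spellings, as a sketch: `<cstddef>` guarantees `std::size_t`, `<stddef.h>` guarantees the unqualified `size_t`, and each header is allowed (but not required) to provide the other name too. Including both keeps the example portable. The `Entry` struct and `totalOf` are hypothetical, and `static_assert` needs C++11.

```cpp
#include <cassert>
#include <cstddef>      // guarantees std::size_t
#include <stddef.h>     // guarantees ::size_t
#include <type_traits>

// Both names refer to one and the same type.
static_assert(std::is_same<std::size_t, ::size_t>::value,
              "std::size_t and ::size_t are the same type");

// Hypothetical struct mixing both spellings.
struct Entry {
    size_t refCount_;       // unqualified, as in the quoted header snippet
    std::size_t length_;    // qualified
};

inline std::size_t totalOf(const Entry& e) {
    return e.refCount_ + e.length_;
}
```

Code that relies on the unqualified name while including only `<string>` or `<cstdio>` works on common implementations, but that is a property of those headers rather than a guarantee.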

Quote:
For starters, there is no reason for a subclass to reimplement the hash function.


I never made such a claim.

Quote:
Secondly, what do you mean, it will break it?


Since you declared the hash function inline without providing its definition in a header file (or other suitable file), it would be impossible for potential subclasses to use the hash function. However, this is only a problem if you intended that class to be subclassed (which I thought was the case).
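A sketch of the layout that avoids the problem, with both "files" shown in one listing. The class here is a hypothetical stand-in rather than the real header, `DerivedTable` and `bucketFor` are invented for illustration, and the hash body is an arbitrary placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ---- contents of a hypothetical header ----
class HashTable {
public:
    virtual ~HashTable() {}
protected:
    inline std::size_t hash(const std::string& str) const;
};

// The inline definition lives in the header too, so every file that includes
// it, including a file that defines a subclass, can see the function body.
inline std::size_t HashTable::hash(const std::string& str) const {
    std::size_t h = 5381;   // placeholder hash, not the library's
    for (std::string::size_type i = 0; i < str.size(); ++i)
        h = h * 33 + static_cast<unsigned char>(str[i]);
    return h;
}

// ---- contents of a separate file that defines a subclass ----
class DerivedTable : public HashTable {
public:
    std::size_t bucketFor(const std::string& str, std::size_t buckets) const {
        assert(buckets > 0);
        return hash(str) % buckets;   // links because the body is visible here
    }
};
```

If the definition sat only in the .cpp file of the base class, a subclass in another translation unit that calls `hash` would typically fail at link time; dropping `inline` from the declaration is the other way out.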

Quote:
What if you want to keep track of the manager publicly to access its statistics? The whole point of this template argument was precisely to keep track of the manager variables - which, incidentally, you only have to store once in a typedef.


Your design isn't the only solution to that. You could easily do that with my allocator-like design. The internal static class could keep statistics which the allocator can access when the programmer needs them.

Quote:
What if you wanted to have different kinds of shared strings (using e.g. different hash functions), depending on the specific kind of strings you are sharing?


That's the beauty of subclassing: you're not limited to the number of children you make. Each hash function could easily have its own subclass. You could even make the design so generic that minimal typing would be necessary for each subclass.

-Raz
C++ Wiki: http://danday.homelinux.org/dan/cppwiki/index.php

Posted by David Haley   USA  (3,881 posts)
Date Reply #14 on Fri 04 Mar 2005 01:33 AM (UTC)
Message
Quote:
Wrong. size_t is included within the std namespace.
You were saying that it's in the std namespace and that a 'standards-compliant' compiler would fail. Well, it won't; it's in the global namespace.
Quote:
It is not defined in cstdio
It is for the VS header files. For the g++ header files, it is defined via cstddef which includes stddef.h.
Quote:
The internal static class could keep statistics which the allocator can access when the programmer needs them.
It seems that you suggest that instead of dragging around a manager, you drag around an allocator. I'm not sure what the gain is, since you felt it 'clumsy' to drag around a manager - which incidentally I disagree with.
Quote:
That's the beauty of subclassing: you're not limited to the number of children you make. Each hash function could easily have its own subclass. You could even make the design so generic that minimal typing would be necessary for each subclass.
You're just shifting the problem. Instead of storing the hash function etc. in a manager, you're making the programmer subclass the shared string class and then you use an allocator to deal with it instead. Personally, I would find it a bother to have to subclass off of the shared string all the time, which is one reason why I didn't do it that way. IMHO it is much cleaner to have a single, generic shared string type without need for subclasses where you can plug in a single, generic type of manager, and to use different hash functions all you need to do is call the set-hash-function method on the manager.
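The "plug in a hash function" idea might look roughly like this. It is a sketch only: `HashManager`, `setHashFunction`, `bucketFor` and `lengthHash` are hypothetical names, and `std::function` (C++11) is an assumption about how the function would be stored.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// A single, generic manager whose hash function can be replaced at run time.
class HashManager {
public:
    typedef std::function<std::size_t(const std::string&)> HashFn;

    explicit HashManager(std::size_t buckets)
        : buckets_(buckets), hash_(std::hash<std::string>()) {
        assert(buckets_ > 0);
    }

    // Hypothetical name for the "set-hash-function method". A real manager
    // would need to rehash, or refuse, if strings were already stored.
    void setHashFunction(HashFn fn) { hash_ = fn; }

    std::size_t bucketFor(const std::string& text) const {
        return hash_(text) % buckets_;
    }

private:
    std::size_t buckets_;
    HashFn hash_;
};

// Deliberately simple alternative hash: the length of the string.
inline std::size_t lengthHash(const std::string& text) {
    return text.size();
}
```

Calling `setHashFunction(lengthHash)` on a manager with 8 buckets sends "hello" to bucket 5, with no new string or manager subclass involved.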

In any case it seems that this is a matter of personal preference. I'd be curious to hear arguments in the absolute about one being 'better' than the other. I don't think you're silly for wanting to do it that way, I just don't like it. I take it that you don't like my approach either, so I'd like to hear if you feel it's a matter of preference or if you have some kind of argument that one is simply better than the other in the absolute.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.
