Gammon Forum

Descriptor 0

It is now over 60 days since the last post. This thread is closed.


Posted by Boborak   USA  (228 posts)
Date Mon 13 Jan 2003 02:43 AM (UTC)
Message
Anyone have any idea what would cause a descriptor to be set to 0? I have a problem that I can't seem to track down or reproduce. Occasionally a player will log on and their descriptor will be 0. After a short while they get kicked off, and when they reconnect it's 0 again. This usually continues until either A) I hard reboot the mud (a copyover crashes it) or B) the player does something that writes to their descriptor at approximately the same time they get kicked off, and I get a bad descriptor bug and a crash. I *believe* this situation only occurs after a copyover, and I've set up a debug message in init_descriptor that at least verifies this happens right as the descriptor is allocated. Maybe if someone were to describe to me how exactly the mud obtains a descriptor number it would help in my debugging. Thanks in advance.

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #1 on Mon 13 Jan 2003 03:17 AM (UTC)
Message
Depends exactly on what you mean by descriptor. There is a DESCRIPTOR_DATA structure for each connected player, and inside that structure is a field called (confusingly) "descriptor".

They are set up in comm.c, around line 906, in the code below. I suggest that the DESCRIPTOR_DATA structure is created with a malloc (indirectly), whereas d->descriptor is the socket (file handle) for the connected player.

I don't see how the socket could be zero unless the code set it to zero, or he was never connected. Zero is reserved as the file descriptor for stdin, as in:


unistd.h:#define         STDIN_FILENO   0       /* standard input file descriptor */


As for the DESCRIPTOR_DATA structure, that could be zero (null) if the malloc failed, but I don't see that happening in mid-stream. Most probably you have a bug that is overwriting memory.

... code from comm.c follows ...


    CREATE( dnew, DESCRIPTOR_DATA, 1 );
    dnew->next          = NULL;
    dnew->descriptor    = desc;
    dnew->connected     = CON_GET_NAME;
    dnew->outsize       = 2000;
    dnew->idle          = 0;
    dnew->lines         = 0;
    dnew->scrlen        = 24;
    dnew->port          = ntohs( sock.sin_port );
    dnew->user          = STRALLOC("(unknown)");
    dnew->newstate      = 0;
    dnew->prevcolor     = 0x07;
 


- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Boborak   USA  (228 posts)
Date Reply #2 on Mon 13 Jan 2003 04:00 AM (UTC)
Message
I fear I have a memory leak somewhere. The descriptor I'm talking about is dnew->descriptor.

It's actually right at the end of that code block that I have my if(dnew->descriptor == 0) bug("PANIC!");

Any ideas on how to track this down?

It's important to note that the 'victim' of this bug can usually do a few commands before they get kicked or the mud crashes.

Here's a brief chunk of my logs right before a crash:
(the bug is my if 0 check)

Sun Jan 12 19:55:38 2003 :: Jander has quit.
Write_to_descriptor: compressed: Bad file descriptor
Sun Jan 12 19:55:38 2003 :: Closing link to Seto.
Sun Jan 12 19:55:45 2003 :: [*****] BUG: BAD DESCRIPTOR! PREPARE FOR THE WORST. HARD REBOOT NOW!
Sun Jan 12 19:55:45 2003 :: Sock.sinaddr: 63.175.0.129, port 1025.
Sun Jan 12 19:55:50 2003 :: zmud client detected for descriptor: 0
Sun Jan 12 19:55:50 2003 :: MCCP (Compression) support detected for descriptor: 0.
Sun Jan 12 19:55:50 2003 :: MSP (Sound) support detected for descriptor: 0.
Sun Jan 12 19:55:53 2003 :: RasiTalon has quit.
accept_new: select: poll: Bad file descriptor

//this was the last line in my logs

Posted by Boborak   USA  (228 posts)
Date Reply #3 on Mon 13 Jan 2003 04:07 AM (UTC)
Message
I'm beginning to notice a trend in which the problem shows up if someone joins right as someone else is quitting. Perhaps there's a problem with the closing of my sockets on exiting players?

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #4 on Mon 13 Jan 2003 09:31 PM (UTC)
Message
Is this modified SMAUG code? It is very hard to track down memory corruption unless you have some idea of what changes were made. What you could do is check all descriptors in the main loop, at least to try to narrow down what happened just before the corruption.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Boborak   USA  (228 posts)
Date Reply #5 on Tue 14 Jan 2003 12:41 AM (UTC)
Message
Yes. It's highly modified. In the process of adding MCCP, MSP, terminal detection, etc., comm.c alone has become quite modified. I've determined it's copyover related somehow or other. It wouldn't be the first mem leak I had to hunt down.

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #6 on Tue 14 Jan 2003 09:24 PM (UTC)
Message
OK then in the copyover code, maybe for each connected person you could check the descriptors, eg.

for ( d = first_descriptor; d; d = d->next )
    if ( d->descriptor == 0 )
    {
        // do something here
    }

Put that into a function, and then sprinkle the function call all through the code, perhaps including some "call number", so you know which one failed. eg.

check_descriptors (1);

// do some stuff

check_descriptors (2);

etc.

The check itself would be pretty fast so you could afford a few of them until you find the problem.
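
The suggestion above could be fleshed out roughly as follows. This is only a sketch: the struct is a cut-down stand-in for SMAUG's real DESCRIPTOR_DATA (which has many more fields), and logging goes to stderr where the mud itself would more likely call its own bug() function. Returning a count is an addition for illustration, not something from the post.

```c
#include <stdio.h>

/* Stand-in for SMAUG's DESCRIPTOR_DATA (assumption: only the two
   fields needed by the check are reproduced here). */
typedef struct descriptor_data DESCRIPTOR_DATA;
struct descriptor_data
{
    DESCRIPTOR_DATA *next;
    int              descriptor;   /* socket (file handle) of the player */
};

/* Head of the connection list, named as in the post. */
DESCRIPTOR_DATA *first_descriptor = NULL;

/* Walk every connection and report any whose socket is 0.
   call_number identifies which call site spotted the problem.
   Returns how many bad descriptors were found. */
int check_descriptors( int call_number )
{
    DESCRIPTOR_DATA *d;
    int bad = 0;

    for ( d = first_descriptor; d; d = d->next )
        if ( d->descriptor == 0 )
        {
            /* Inside the mud, bug() would be the natural replacement. */
            fprintf( stderr, "check_descriptors (%d): descriptor is 0\n",
                     call_number );
            ++bad;
        }

    return bad;
}
```

The first call number that reports a problem, compared with the last one that did not, brackets the code that damaged the descriptor.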

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Boborak   USA  (228 posts)
Date Reply #7 on Tue 14 Jan 2003 10:07 PM (UTC)
Message
Excellent idea. Thanks. I'll let you know how it turns out.

Posted by Boborak   USA  (228 posts)
Date Reply #8 on Thu 16 Jan 2003 07:22 PM (UTC)
Message
Hmmm. Still not fixed, but I can reproduce it exactly. After a copyover, players can connect without a hitch, UNTIL one of the players that was on during the copyover quits. Then the very next connection attempt will result in a descriptor of 0. Any players that join after THAT player will be fine, however. I have traced the first player that quits after a copyover through the 'close_socket' function and see nothing abnormal. I'm at a loss as to what else to do really; I simply don't understand how a socket is allocated.

What does accept() do? This is obviously where a new descriptor is allocated, but when I step through the function in gdb, I can't seem to figure out what's causing it to set desc to 0.

if ( ( desc = accept( new_desc, (struct sockaddr *) &sock, &size) ) < 0 )
{
    perror( "New_descriptor: accept");
    set_alarm( 0 );
    return;
}

anywho.. back into the fray I go..

Posted by Boborak   USA  (228 posts)
Date Reply #9 on Thu 16 Jan 2003 10:05 PM (UTC)
Message
Yet more interesting stuff to share..

using this:

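#include <sys/stat.h>   /* fstat(), struct stat */

/* Command: list every file descriptor the mud process has open. */
void do_fdcheck( CHAR_DATA *ch, char *argument )
{
    struct stat fs;
    int i, j = 0;

    send_to_char( "FD's in use:\n\r", ch );
    for ( i = 0; i < 256; ++i )
        if ( !fstat( i, &fs ) )     /* fstat succeeds only on an open descriptor */
        {
            ch_printf( ch, "%03d ", i );
            if ( !( ++j % 15 ) )    /* 15 numbers per output line */
                send_to_char( "\n\r", ch );
        }
    if ( j % 15 )                   /* terminate a partial last line */
        send_to_char( "\n\r", ch );
    ch_printf( ch, "%d descriptors in use.\n\r", j );
    return;
}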

..I created a command to display file descriptors that were in use.

After logging 2 chars on, I had FD's 0-29 in use. After running a copyover I had 0-48 in use (problem #1). I logged another player on and I got an additional descriptor as expected, and logging that player off I only lost one descriptor as expected. But! As soon as I logged off a player that was on during the copyover I lost TWO, the descriptor of the char and 0 (problem #2). So, that explains why accept() is grabbing descriptor 0 on the next log on. So...
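
Why a freed descriptor 0 gets handed straight back can be shown in isolation. POSIX systems give out the lowest-numbered descriptor that is not currently open, so once something closes 0, the next call that creates a descriptor receives 0. The sketch below uses open() on /dev/null so that it needs no network; the function name is made up for the illustration, and accept() is assumed to follow the same lowest-free rule, as it does on Linux.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: free slot 0, then ask the kernel for a new
   descriptor and return whatever number it hands back. */
int reopen_after_closing_stdin( void )
{
    close( STDIN_FILENO );                  /* the stray close: slot 0 is now free */
    return open( "/dev/null", O_RDONLY );   /* lowest free slot is reused */
}
```

Called in a process that has 0, 1 and 2 open as usual, this should return 0, which matches the observation above that the next log on picks up descriptor 0.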

Posted by Boborak   USA  (228 posts)
Date Reply #10 on Thu 16 Jan 2003 11:41 PM (UTC)
Message
Well. I got the descriptor leaks under control, and as sad as it may sound that didn't fix my descriptor 0 bug any. Still trying to figure out why descriptor 0 is dropping like it is. Any help would be appreciated.

Posted by Orange   United Kingdom  (25 posts)
Date Reply #11 on Thu 16 Jan 2003 11:50 PM (UTC)
Message
You could use gdb or whatever debugger you use and set a breakpoint on the 'close' function, then do the thing that triggers the behaviour, and see where and why it gets called...

Posted by Boborak   USA  (228 posts)
Date Reply #12 on Thu 16 Jan 2003 11:57 PM (UTC)
Message
Heh. Way ahead of you on that one. I live in GDB it seems, and surprisingly enough, even with my skills in gdb it's not helping to track this down. Everything short of comparing each and every variable between good descriptor and bad (which I'm resorting to now) I've pretty much done. I'll be sure to post the outcome, so as others may not suffer as I have ;-)

Posted by Boborak   USA  (228 posts)
Date Reply #13 on Fri 17 Jan 2003 01:37 AM (UTC)
Message
Fixed it ;-) I knew from the beginning it was something *I* did to break it. But it was such a weird problem I didn't know specifically what to look for. To make a long story short, I had added an authentication scheme to stock smaug that I hadn't added to the copyover code, which caused this (in close_socket):

if ( dclose->auth_fd != -1 )
    close( dclose->auth_fd );

..to close descriptor 0 on the first close_socket call.

On the bright side, I plugged all the descriptor leaks I could find, going from 29 descriptors after a fresh reboot to just 8 (most of which are system reserved anyway).
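
The post does not show the copyover code, so the following is only a guess at the shape of the fix. It assumes descriptors are allocated zero-filled (stock SMAUG's CREATE macro is believed to use calloc), which would leave auth_fd at 0 for any descriptor rebuilt after a copyover unless it is set explicitly. Zero is not -1, so the check in close_socket then closes descriptor 0. The struct and both function names are hypothetical stand-ins; only auth_fd and the -1 test come from the post.

```c
#include <stdlib.h>
#include <unistd.h>

/* Cut-down stand-in for DESCRIPTOR_DATA (assumption). */
typedef struct descriptor_data
{
    int descriptor;   /* player socket */
    int auth_fd;      /* auth socket; -1 means "none open" */
} DESCRIPTOR_DATA;

/* Hypothetical helper for the copyover recovery path: rebuild a
   descriptor for a socket inherited across the exec. */
DESCRIPTOR_DATA *new_recovered_descriptor( int desc )
{
    DESCRIPTOR_DATA *d = calloc( 1, sizeof( *d ) );   /* zero-filled: auth_fd == 0 */

    if ( !d )
        return NULL;
    d->descriptor = desc;
    d->auth_fd    = -1;   /* the step missing from the copyover code */
    return d;
}

/* Mirrors the check in close_socket, and resets the field afterwards
   so the same descriptor number is never closed twice. */
void close_auth_fd( DESCRIPTOR_DATA *d )
{
    if ( d->auth_fd != -1 )
    {
        close( d->auth_fd );
        d->auth_fd = -1;
    }
}
```

For anyone hitting the same symptom on another codebase, the thing to look for is any descriptor field that the normal connection path initialises but the copyover recovery path does not.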

Posted by Jason   (109 posts)
Date Reply #14 on Fri 25 Apr 2014 08:46 PM (UTC)
Message
Ok so what is the fix for this bug? I have swfote which came with copyover already installed on it. I changed codebases mainly cause I chose to start over on my mud now that I have a bit more experience, but I never had this issue on SWR1.0 when I installed copyover myself. Now that I have this same issue... Can anyone tell me how to fix it?

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.
