game_loop(), select(), and dual core CPUs

It is now over 60 days since the last post. This thread is closed.

Posted by Samson   USA  (683 posts)
Date Sun 11 Feb 2007 06:16 AM (UTC)
Message
I'm not sure exactly how to figure this out, who to ask, or where to begin looking for this kind of problem. But here goes.

I recently upgraded both of my servers to dual core AMD64 CPUs. Obviously I am thrilled with the new speed and such. But it seems to have had an interesting side effect I wasn't counting on.

In the game_loop handlers that deal with idling descriptors, my code is just like any other Merc derivative. It adds one to the idle counter with each pulse as it runs the game loop. Immortals are all set to idle off the mud after 2 hours. Lately though, this has been drastically cut to somewhere in the neighborhood of 15 minutes.

The deeper implications of this should be clear. This would suggest that all of the handlers that game_loop stalls time for are firing at a greatly increased rate. Zeno was able to verify this easily enough on his game, which is also affected by this (same server, etc.). It appears as though everything has been accelerated to about 8x normal speed.

Is this something that is known and can be corrected? Or have I stumbled into one of those uncharted areas us bleeding edgers often end up in?

Posted by Nick Gammon   Australia  (22,982 posts)   Forum Administrator
Date Reply #1 on Sun 11 Feb 2007 06:41 AM (UTC)
Message
When you say "pulse", what do you mean exactly? You would expect that with faster CPUs things will go faster, so that can't be the exact problem.

With select() you get to specify the time to wait before it proceeds (without any IO necessarily happening). What have you set that to?

Quote:

Immortals are all set to idle off the mud after 2 hours.


Can't you do a simple time-of-day check?

- Nick Gammon

www.gammon.com.au, www.mushclient.com
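
A time-of-day check along the lines suggested above could look something like the following. This is only a sketch: `struct conn`, `conn_touch`, `conn_idle_expired` and `IDLE_LIMIT_SECONDS` are made-up names for illustration and are not part of SMAUG. The idea is to store the wall-clock time of the last input and compare it with the current time, so the timeout no longer depends on how often the game loop runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-connection record; not the real SMAUG descriptor. */
struct conn
{
   time_t last_input;   /* wall-clock time of the last input received */
};

#define IDLE_LIMIT_SECONDS ( 2 * 60 * 60 )   /* 2 hours, as in the first post */

/* Record activity: call whenever the connection sends input. */
void conn_touch( struct conn *c, time_t now )
{
   assert( c != NULL );
   c->last_input = now;
}

/* True once more than IDLE_LIMIT_SECONDS of real time have passed. */
bool conn_idle_expired( const struct conn *c, time_t now )
{
   assert( c != NULL );
   return difftime( now, c->last_input ) > IDLE_LIMIT_SECONDS;
}
```

In a game loop this would be called with `time( NULL )` (or the loop's existing `current_time`) in place of incrementing an idle counter once per pulse. It only addresses the idle timeout, not other pulse-driven timers.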

Posted by Samson   USA  (683 posts)
Date Reply #2 on Sun 11 Feb 2007 06:48 AM (UTC)
Message
      /*
       * Synchronize to a clock.
       * Sleep( last_time + 1/PULSE_PER_SECOND - now ).
       * Careful here of signed versus unsigned arithmetic.
       */
      {
         struct timeval now_time;
         long secDelta;
         long usecDelta;

         gettimeofday( &now_time, NULL );
         usecDelta = ( ( int )last_time.tv_usec ) - ( ( int )now_time.tv_usec ) + 1000000 / PULSE_PER_SECOND;
         secDelta = ( ( int )last_time.tv_sec ) - ( ( int )now_time.tv_sec );
         while( usecDelta < 0 )
         {
            usecDelta += 1000000;
            secDelta -= 1;
         }

         while( usecDelta >= 1000000 )
         {
            usecDelta -= 1000000;
            secDelta += 1;
         }

         if( secDelta > 0 || ( secDelta == 0 && usecDelta > 0 ) )
         {
            struct timeval stall_time;

            stall_time.tv_usec = usecDelta;
            stall_time.tv_sec = secDelta;
#ifdef WIN32
            Sleep( ( stall_time.tv_sec * 1000L ) + ( stall_time.tv_usec / 1000L ) );
#else
            if( select( 0, NULL, NULL, NULL, &stall_time ) < 0 && errno != EINTR )
            {
               perror( "game_loop: select: stall" );
               exit( 1 );
            }
#endif
         }
      }

      gettimeofday( &last_time, NULL );
      current_time = ( time_t ) last_time.tv_sec;


Perhaps the above will help clear up what I was getting at. This clock sync stuff happens in game_loop() right at the end of the while( !mud_down ) loop.

By "pulse" I mean it in this usage. The sync is supposed to stall execution for a certain length of time, which in Smaug is 1/4 second by default. It's the heart of the entire update system. And it seems to have found some way to speed itself up majorly, though I couldn't tell you why. It's especially disturbing since I wasn't the only one on the server to have become affected by it.
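
For readers who want to poke at the arithmetic in isolation, the delta calculation from the snippet above can be pulled out into a small function. This is a sketch only: `pulse_remaining` is a hypothetical name, and the function simply mirrors the "last_time + 1/PULSE_PER_SECOND - now" logic so that it can be fed known values.

```c
#include <assert.h>
#include <sys/time.h>

/*
 * Time left until the next pulse is due, given when the last pulse ended
 * and the current time. Returns a zeroed timeval if the pulse is overdue.
 */
struct timeval pulse_remaining( struct timeval last, struct timeval now, int pulses_per_second )
{
   struct timeval left = { 0, 0 };
   long usec, sec;

   assert( pulses_per_second > 0 );

   /* ( last + 1/pulses_per_second ) - now, split into seconds and microseconds. */
   usec = (long) last.tv_usec - (long) now.tv_usec + 1000000L / pulses_per_second;
   sec = (long) last.tv_sec - (long) now.tv_sec;

   /* Normalise so that 0 <= usec < 1000000. */
   while( usec < 0 )
   {
      usec += 1000000L;
      sec -= 1;
   }
   while( usec >= 1000000L )
   {
      usec -= 1000000L;
      sec += 1;
   }

   /* Only a positive remainder means there is still time to stall. */
   if( sec > 0 || ( sec == 0 && usec > 0 ) )
   {
      left.tv_sec = sec;
      left.tv_usec = usec;
   }
   return left;
}
```

With 4 pulses per second, a last pulse at 100.900000 s and a current time of 101.000000 s leave 150000 microseconds to stall. If the arithmetic gives sensible values like this, the speed-up has to come from the stall itself returning early, or from something else in the loop.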

Posted by Meerclar   USA  (733 posts)
Date Reply #3 on Sun 11 Feb 2007 07:10 AM (UTC)

Amended on Sun 11 Feb 2007 07:24 AM (UTC) by Meerclar

Message
Part of the problem, and a fairly significant part I suspect, would be that the entire mu* time system is written for the 32 bit architecture. As with any other software written for a specific architecture, putting it on another architecture may have some... unforeseen results.

The long and short of the answer is you've found yourself deep in bleeding edge land and will probably have to do a 64 bit patch (and possibly a dual core patch to boot) for the time system to fix the problem.


Just for reference on comparative data processing per processor:

32 bit - 2^32 bits per second
64 bit - 2^64 bits per second
dual core 64 bit - 2(2^64) per second

It actually gets very very silly if you start dealing with multiple dual core processors. Just an upgrade from 32 to 64 improves data throughput by a silly amount; adding a 2nd core just compounds the issue. My personal suggestion would be not to run mu*s on a 64 bit system but, failing that, go with either a win64 ifdef in the time system or tweak the win32 to account for the massive speed boost from the 64bit dual core (multiply delay by 16 and adjust from there).

Meerclar - Lord of Cats
Coder, Builder, and Tormenter of Mortals
Stormbringer: Rebirth
storm-bringer.org:4500
www.storm-bringer.org

Posted by Samson   USA  (683 posts)
Date Reply #4 on Sun 11 Feb 2007 07:39 AM (UTC)
Message
I'm not trying to run this in Windows. This is on Fedora Core 6 with a dual core CPU.

I did some further testing, did a 'make clean' and sat idle again. This time instead of being crazy fast at 8x normal speed, it cut me off after 1 hour, which is half of the expected time frame.

So you're probably on to something here. The dual core is probably causing it to literally process twice the information in the same clock cycle. I suppose that throws things off, but hey. I can't just "not run in 64 bit" at this point. Which means a patch it is. But that's the problem. I haven't got any idea where to start on something like this.

Posted by Meerclar   USA  (733 posts)
Date Reply #5 on Sun 11 Feb 2007 08:01 AM (UTC)

Amended on Sun 11 Feb 2007 08:17 AM (UTC) by Meerclar

Message
Whether you're running in windows or not, the computations per cycle are what I gave earlier. The suggestion to increase all your delays by 16x and adjust from there stands for all operating systems as well. The only reason I suggested the win64 ifdef was so it's something you could include in the standard distro instead of having to support 2 (or perhaps 3) distros. Not sure if there's a simple way to do the time definitions for 64 bit and maintain a single distro, but that's the very challenge you face now. Actually, in looking back over that piece of code, if by some miracle pulse_per_second is a multiple of 4 you could have an easy out - quarter the pulses and see how it works compared to a 32bit system.


Oh yeah, Nick, a time of day check would be the simple solution if the only effect the upgrade had was kicking imms early. Unfortunately I suspect this hits everything that has a timer of any kind - spell durations, wait states, idle timers, etc. - as they're all controlled by that function Samson pasted for review. It's just not a pretty situation he finds himself in today.

Meerclar - Lord of Cats
Coder, Builder, and Tormenter of Mortals
Stormbringer: Rebirth
storm-bringer.org:4500
www.storm-bringer.org

Posted by Samson   USA  (683 posts)
Date Reply #6 on Sun 11 Feb 2007 08:21 AM (UTC)
Message
Well if the solution is as simple as you suggest, I may well be in luck.

My "pulse per second" value is currently 4, which is the Smaug default. Long ago, I added functionality to cset to allow people to fiddle with these values, either to increase or decrease them as they saw fit. Might have been a huge stroke of luck combined with foresight :)

I don't have time tonight anymore to run a test, but I think if I understand you right I can change "pulse per second" to 2 and cut the rate in half? If so, from the look of things now, that should return everything to "normal" speed.

Might have to hook up my linux drive again to see if my single core AMD64 box I use in Windows is affected by this, or if this is in fact an issue with having the second core.

Posted by David Haley   USA  (3,881 posts)
Date Reply #7 on Sun 11 Feb 2007 08:39 AM (UTC)
Message
Woah, woah. Changing from 32 to 64 bit does not double your processor speed. So although the processor might be moving more bits around per cycle, you are not doing more cycles per second, therefore not more computations per second.

Just think about it. Think of all the programs that run on both 32 and 64 bit processors, especially games that have cycles and the like. A game designed before 64 bit processors were mainstream won't run twice as fast just because you stick it on a 64 bit machine.

64 bit machines have a larger address space. The number of bits doesn't affect the number of cycles per second.

I will look into this as well, because this is a very worrying symptom, but I would be really, really surprised if this had to do with a direct problem with 64 bits, as opposed to something that worked by accident on 32 bits but was not fully proper to begin with.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Samson   USA  (683 posts)
Date Reply #8 on Sun 11 Feb 2007 08:51 AM (UTC)
Message
120m * 60s = 7200 sec.
28800 pulses / 4 = 7200 sec.

In Smaug, 4 is PULSE_PER_SECOND by default.
So codewise, it should be working fine.

The idle counter for SmaugFUSS:

         d->idle++;  /* make it so a descriptor can idle out */
         if( FD_ISSET( d->descriptor, &exc_set ) )
         {
            FD_CLR( d->descriptor, &in_set );
            FD_CLR( d->descriptor, &out_set );
            if( d->character && ( d->connected == CON_PLAYING || d->connected == CON_EDITING ) )
               save_char_obj( d->character );
            d->outtop = 0;
            close_socket( d, TRUE );
            continue;
         }
         else if( ( !d->character && d->idle > 360 )  /* 90 secs at 4 pulses/sec */
                  || ( d->connected != CON_PLAYING && d->idle > 1200 )  /* 5 mins */
                  || d->idle > 28800 ) /* 2 hrs  */
         {
            write_to_descriptor( d, "Idle timeout... disconnecting.\r\n", 0 );
            d->outtop = 0;
            close_socket( d, TRUE );
            continue;
         }


The idle counter for AFKMud 2.0:

      ++d->idle;  /* make it so a descriptor can idle out */
      if( FD_ISSET( d->descriptor, &exc_set ) )
      {
         FD_CLR( d->descriptor, &in_set );
         FD_CLR( d->descriptor, &out_set );
         if( d->character && d->connected >= CON_PLAYING )
            d->character->save(  );
         d->outtop = 0;
         close_socket( d, true );
         continue;
      }
      else if( ( !d->character && d->idle > 360 )  /* 90 secs at 4 pulses/sec */
               || ( d->connected != CON_PLAYING && d->idle > 2400 )  /* 10 mins */
               || ( ( d->idle > 14400 ) && ( d->character->level < LEVEL_IMMORTAL ) )  /* 1hr */
               || ( ( d->idle > 32000 ) && ( d->character->level >= LEVEL_IMMORTAL ) ) )
      // imms idle off after 32000 to prevent rollover crashes 
      {
         d->write( "Idle timeout... disconnecting.\r\n", 0 );
         update_connhistory( d, CONNTYPE_IDLE );
         d->outtop = 0;
         close_socket( d, true );
         continue;
      }


32000 pulses / 4 = 8000 sec, or approximately 133 minutes.

For my test I changed the disconnect to a message display saying when it reached 32000 pulses. Whether it should be or not, I consistently got the 32000th pulse at just over *ONE* hour instead of two. I tested it for AFKMud 2.0 obviously, but there's no difference between that and stock SmaugFUSS as far as the timing loops.

I don't know if it's because of 64 bit, or because of my second CPU core. I'll need to confirm this with a single core CPU running the same OS. Also not sure if it's relevant but I'm running the x86_64 version of Fedora Core 6.
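
The pulse arithmetic in this post can be written down as a pair of helpers, which makes it easy to compare the expected timeout with what was actually observed. This is an illustrative sketch: `pulses_to_seconds` and `pulse_speedup` are hypothetical names, not functions from SmaugFUSS or AFKMud.

```c
#include <assert.h>

/* Seconds that a counter of `pulses` should represent at the given rate. */
long pulses_to_seconds( long pulses, int pulses_per_second )
{
   assert( pulses_per_second > 0 );
   return pulses / pulses_per_second;
}

/*
 * How many times faster than intended the loop is running, given the
 * real time it took for the counter to reach `pulses`.
 */
double pulse_speedup( long pulses, int pulses_per_second, double observed_seconds )
{
   assert( observed_seconds > 0.0 );
   return (double) pulses_to_seconds( pulses, pulses_per_second ) / observed_seconds;
}
```

Plugging in the figures from the thread: 28800 pulses at 4 per second is 7200 seconds, so a 2 hour timeout firing after about 15 minutes (900 seconds) corresponds to a factor of 8, while 32000 pulses (8000 seconds) arriving at roughly the one hour mark corresponds to a factor of a little over 2.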

Posted by David Haley   USA  (3,881 posts)
Date Reply #9 on Sun 11 Feb 2007 09:11 AM (UTC)
Message
I really am not sure what to say. Dual-cores or 64 bit processors should not double the execution speed of a single-threaded application, or cause time to go by twice as quickly. That just doesn't make sense... this is odd, and something else is going on here, not that 64 bits or dual-cores are doubling time's passage.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by David Haley   USA  (3,881 posts)
Date Reply #10 on Sun 11 Feb 2007 09:21 AM (UTC)
Message
I just edited my copy of SMAUGfuss to print out the current time at the end of every sleep, like so:

      gettimeofday( &last_time, NULL );
      current_time = ( time_t ) last_time.tv_sec;
      printf( "end of tick. current time: %lu\n", ( unsigned long ) current_time );  /* cast: time_t is not guaranteed to match %lu */


As expected, I get 4 ticks per second. I am running on a dual-core AMD64 processor, with Ubuntu.

I really, really, really don't think that this has something to do with the code not being "64-bit compatible" or "dual-core compatible". If anything, there might be an object file or a library that needs to be recompiled somewhere, so that everything is on par in 64-bit world. (Problems might arise if parts of the library are 32, and others 64.)

I would pursue that route aggressively, making sure that all linked object files were compiled on the same system, using the same compiler, before anything else.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
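
Printing whole seconds shows the tick rate only coarsely. A finer-grained check is to time the stall itself. The sketch below assumes a POSIX system; `measure_stall_usec` is a hypothetical helper that performs the same `select( 0, NULL, NULL, NULL, ... )` call as the game loop and reports how long it really took.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Stall with select() for the requested number of microseconds and return
 * how many microseconds actually elapsed according to gettimeofday(),
 * or -1 if select() failed with an error other than EINTR.
 */
long measure_stall_usec( long requested_usec )
{
   struct timeval before, after, stall;

   stall.tv_sec = requested_usec / 1000000L;
   stall.tv_usec = requested_usec % 1000000L;

   gettimeofday( &before, NULL );
   if( select( 0, NULL, NULL, NULL, &stall ) < 0 && errno != EINTR )
      return -1;
   gettimeofday( &after, NULL );

   return ( after.tv_sec - before.tv_sec ) * 1000000L + ( after.tv_usec - before.tv_usec );
}
```

Calling `measure_stall_usec( 250000 )` from inside the running game process and logging the result would show whether the stall lasts the full quarter second. A value well below the request would suggest the call is returning early, for example because a signal interrupted it.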

Posted by Samson   USA  (683 posts)
Date Reply #11 on Sun 11 Feb 2007 04:39 PM (UTC)

Amended on Sun 11 Feb 2007 04:41 PM (UTC) by Samson

Message
So exactly how would I go about verifying all of that? When I did my system upgrades, I installed FC6 from scratch with no OS on the system. I only copied back important stuff like apache configs, user data, and the /home directory. All of the system libraries should be what got put there by the installer.

I've done "make clean" more often than I care to know and it hasn't helped. I'm also not dreaming this, Zeno said he had the same thing happen to him. Both of us use Smaug based code with vastly different modifications compared to each other.

Someone on TMC suggested this might be to blame, but was speculating:
         usecDelta = ((int) last_time.tv_usec) - ((int) now_time.tv_usec) + 1000000 / sysdata->pulsepersec;
         secDelta  = ((int) last_time.tv_sec ) - ((int) now_time.tv_sec );


So I'm testing to see what happens with the (int) cast removed. I'm not even sure why it's there, it comes from stock Smaug. Stock Rom has the same thing, but Merc 2.2 does not.

Oh, and one other question. Is your copy of Ubuntu 32bit or are you using the 64bit install?

Posted by Zeno   USA  (2,871 posts)
Date Reply #12 on Sun 11 Feb 2007 05:54 PM (UTC)
Message
Hm, could we get a 3rd party here to confirm the pulse speed increase? Sign on to my MUD or Samson's and see if you notice it as well. I watched the MUD game time to see the pulse increase, but it's also pretty obvious from the speed of combat.

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by Samson   USA  (683 posts)
Date Reply #13 on Sun 11 Feb 2007 06:20 PM (UTC)
Message
Removing the (int) cast also had no effect. So that wasn't the problem either. I'm still getting 2 hour idle messages popping at the 1 hour marker.

Posted by Nick Gammon   Australia  (22,982 posts)   Forum Administrator
Date Reply #14 on Sun 11 Feb 2007 06:49 PM (UTC)
Message
I am pretty certain what your problem is here. :)

It is nothing to do with 64-bit numbers, dual-CPUs or anything like that.

Your main problem is you have two "select" calls in your main loop.

The first one is used to check for incoming IO on the sockets:


if( select( maxdesc + 1, &in_set, &out_set, &exc_set, &null_time ) < 0 )


The second is attempting to introduce some sort of delay into processing (your "game pulse"):


#ifdef WIN32
            Sleep( ( stall_time.tv_sec * 1000L ) + ( stall_time.tv_usec / 1000L ) );
#else
            if( select( 0, NULL, NULL, NULL, &stall_time ) < 0 && errno != EINTR )


Note, and this is important, that when I did the Windows version I found that the select here did nothing, and thus I did a Sleep instead, otherwise it just went crazy using CPU very quickly.

Check the documentation for select:


timeout is an upper bound on the amount of time elapsed before select returns.


Notice the words "upper bound". It does not guarantee that the time will actually elapse.

I suggest rewriting with only a single select in the game loop (my preferred option), or make the "time delay" select use a different method, like the Sleep in Windows.
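
One possible "different method" for the delay on POSIX systems is nanosleep(), which reports the unslept time when a signal cuts the sleep short. The sketch below is an assumption about how that could be used, not code from SMAUG; `stall_full` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Sleep for the full requested interval. If a signal interrupts the
 * sleep, nanosleep() stores the unslept time in `remaining`, and the
 * loop goes back to sleep for that long. Returns 0 on success, or -1
 * if nanosleep() fails for a reason other than EINTR.
 */
int stall_full( long seconds, long microseconds )
{
   struct timespec request, remaining;

   assert( seconds >= 0 && microseconds >= 0 && microseconds < 1000000L );

   request.tv_sec = seconds;
   request.tv_nsec = microseconds * 1000L;

   while( nanosleep( &request, &remaining ) < 0 )
   {
      if( errno != EINTR )
         return -1;
      request = remaining;   /* interrupted: sleep for whatever is left */
   }
   return 0;
}
```

In the stall code this would replace the non-Windows branch, as in `stall_full( stall_time.tv_sec, stall_time.tv_usec )`. The difference from the existing select() call is that an EINTR no longer ends the stall early.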

Another approach would be to take the "real" select (the one that waits on sockets), introduce a small fixed delay (eg. 1/10 second), and then check (using the system clock) whether your pulse time is up.

Or, perhaps, make the real select wait 1/4 second, which is your game pulse time anyway, then your only problem is that the select might return sooner, because of TCP IO. Thus you still need a check, to not do a game pulse, if 1/4 second has not elapsed since last time through the loop.
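
The single-select approach described above might be sketched as follows. The names `wait_one_pass`, `usec_between` and `PULSE_USEC` are hypothetical, and the function is a simplified illustration rather than a drop-in replacement for game_loop(): it waits on the sockets for at most the time left in the current pulse, then asks the system clock whether a full pulse has really gone by.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

#define PULSE_USEC 250000L   /* 1/4 second, the Smaug default pulse */

/* Microseconds elapsed from `from` to `to`. */
static long usec_between( const struct timeval *from, const struct timeval *to )
{
   return ( to->tv_sec - from->tv_sec ) * 1000000L + ( to->tv_usec - from->tv_usec );
}

/*
 * One pass of the loop. Returns 1 if a full pulse has elapsed (and resets
 * *last_pulse), 0 if select() came back early, -1 on a select() error.
 */
int wait_one_pass( int maxdesc, fd_set *in_set, fd_set *out_set, fd_set *exc_set,
                   struct timeval *last_pulse )
{
   struct timeval now, timeout;
   long left;

   gettimeofday( &now, NULL );
   left = PULSE_USEC - usec_between( last_pulse, &now );
   if( left < 0 )
      left = 0;

   timeout.tv_sec = left / 1000000L;
   timeout.tv_usec = left % 1000000L;

   if( select( maxdesc + 1, in_set, out_set, exc_set, &timeout ) < 0 )
   {
      if( errno != EINTR )
         return -1;
      /* Interrupted: the sets are unspecified, so report nothing ready. */
      if( in_set )
         FD_ZERO( in_set );
      if( out_set )
         FD_ZERO( out_set );
      if( exc_set )
         FD_ZERO( exc_set );
   }

   /* The timeout is only an upper bound, so check the clock. */
   gettimeofday( &now, NULL );
   if( usec_between( last_pulse, &now ) < PULSE_USEC )
      return 0;   /* woke early: service I/O, but no game pulse yet */

   *last_pulse = now;
   return 1;
}
```

The caller would service any ready sockets after every pass and run the pulse-driven updates only when the function returns 1. Early returns caused by TCP traffic or signals then cost an extra trip around the loop instead of an extra game pulse.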

- Nick Gammon

www.gammon.com.au, www.mushclient.com

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).