Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
(Message started by: Fritzlein on Feb 1st, 2010, 12:02pm)

Title: Server Overload
Post by Fritzlein on Feb 1st, 2010, 12:02pm
The server overload that just happened during the World Championship preliminaries (Round 4, The_Jeh vs. Nevermind) is a sobering warning sign.  There were only four bots running at the same time.  Yes, there were over twenty people logged on to the server, and fifteen or so in chat, but those applications are not supposed to be CPU-intensive.  Was it perhaps the TeamSpeak server that tipped over arimaa.com?

I hope the TeamSpeak server is in fact to blame, so that in a normal situation arimaa.com can run more than four bots.  Unfortunately, even if that was the case, other bot issues in the recent past suggest that even when there is no live commentary, the server runs into issues when it runs around six bots.

My hypothesis is that the server has four CPUs, so it shows strain (and/or crashes) when four bots are thinking at the same time, because that is when the server must share CPU with a bot.  As long as only three bots are thinking the server can dole out one CPU per bot and still have a CPU left over to take care of everything else, but not so with four bots.

Note that it isn't the number of bots playing games that matters, because most (all?) bots don't ponder while waiting for their opponent to move.  Thus it would be possible to have five bots playing at the same time, and by good fortune never have more than three of them thinking at the same time, but when the opponent moves happen to line up just wrong so that four bots need to think at the same time, watch out.

If my hypothesis is correct, it is the timed bots that are the problem (Blitz, Fast, and CC) more than the P1 and P2 bots, because the timed bots are thinking for half of the time whereas the P1 and P2 bots move quickly and spend the rest of their time idle.  We might be able to have twenty copies of ArimaaScoreP1 playing games without ever having four of them thinking at the same time.
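
A rough way to put numbers on this hypothesis is sketched below. It assumes each bot thinks independently of the others and is thinking a fixed fraction of the time: 0.5 for timed bots (from the post above), and a purely illustrative 0.02 for P1 bots, which the post does not state. The function name is made up for this example.

```python
from math import comb

def prob_at_least(n_bots, p_thinking, k):
    """Chance that at least k of n_bots are thinking at a given instant,
    assuming each bot thinks independently with probability p_thinking."""
    return sum(
        comb(n_bots, i) * p_thinking**i * (1 - p_thinking) ** (n_bots - i)
        for i in range(k, n_bots + 1)
    )

# Five timed bots, each thinking half of the time (figure from the post).
five_timed = prob_at_least(5, 0.5, 4)

# Twenty P1 bots; the 0.02 thinking fraction is an assumed value.
twenty_p1 = prob_at_least(20, 0.02, 4)

print(f"5 timed bots, 4+ thinking at once: {five_timed:.4f}")
print(f"20 P1 bots,   4+ thinking at once: {twenty_p1:.6f}")
```

Under these assumptions five timed bots have four or more thinking at once a noticeable share of the time, while twenty fast-moving bots almost never do. Real bots are not independent (their thinking is tied to opponents' moves), so this is only a ballpark.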

Those of you who see where I am heading may object to limiting and/or disabling the timed bots, because it would stink not to have the best opponents available.  It is true that only being able to play P1 and P2 bots would stink, but is that worse than having timeouts like Nevermind had, and worse than the illegitimate wins that peppered the Computer Championship qualifying phase?  Games that are terminated prematurely by server overload are worse than games not played at all.

If we don't outright disable the timed bots, then we at least need a better system to prevent the server from getting overloaded.  I know there is a check for load average when starting a bot, but that is proving to be inadequate, because the server doesn't disallow starting new bots until after the server is overloaded.  Even when we are in the danger zone, it is likely that not all playing bots are thinking simultaneously, so the server doesn't appear overloaded until disaster strikes.

Of course, HvH games take up minimal CPU, so those would not have to be limited for the foreseeable future, and even if we should be so lucky as to have so many HvH games that it causes a problem (say forty?) then I would rather reserve CPU for ten HvH games than for one HvB game.  Also, when developers put up their bots for public play (Thanks jdb, tize, aaaa, Janzert!) it uses hardly any of the arimaa.com CPU.

I know that nobody likes limits of any kind, so the only solution that everybody can agree to is to let everyone consume until something breaks.  Let's pretend that there is no problem until things get really bad.  Sort of like U.S. government spending which will stay out of control until the dollar devalues.  So maybe I shouldn't expect that the awkward crash of The_Jeh and Nevermind's game will serve as a wake-up call to anyone.  Maybe things will have to get a lot worse before we can have a serious discussion about how to ration server resources.  If so, it would be par for the course.

If, on the other hand, folks want to propose potential ways to deal with the problem before it gets so bad it drives away players, I'd love to hear what ideas are out there.

Suggestion 1 (no programming, no money):
Disable the timed bots.

Suggestion 2 (programming, but no money):
Assign each bot a load factor based on what percentage of time it typically spends thinking.  All timed bots would have a load factor of 0.5, whereas P1 and P2 bots would have lower factors.  Keep a running total of the load factors of the bots that are playing.  Go ahead and allow users to create as many instances of bots as they would like, but when a user tries to start a game, don't let the game start if it would raise the load factor to 1.75 or higher.  Instead put games in a queue of "games to be started when some other bot finishes".  Then later try to start those games in the order that they were requested.

Suggestion 3 (money but no programming):
Rent two servers year-round instead of one.  Then we don't have to worry about overload until about twelve simultaneous bots run.  Omar, are you game to double your Arimaa expenses?  Would anyone else like to volunteer?
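
For anyone who wants to see Suggestion 2 in concrete form, here is a minimal sketch. The 0.5 factor for timed bots and the 1.75 limit come from the suggestion; the 0.1 factor for P1/P2 bots, the class name, and the method names are assumptions made up for this example, not anything the gameroom actually uses.

```python
from collections import deque

TIMED_FACTOR = 0.5  # from Suggestion 2: timed bots think about half the time
P_BOT_FACTOR = 0.1  # assumed value; the suggestion only says "lower"
LOAD_LIMIT = 1.75   # from Suggestion 2

class GameGate:
    """Starts games while the summed load factor stays below the limit,
    and queues the rest in the order they were requested."""

    def __init__(self, limit=LOAD_LIMIT):
        self.limit = limit
        self.running = {}       # game id -> load factor
        self.waiting = deque()  # (game id, load factor), oldest first

    def current_load(self):
        return sum(self.running.values())

    def request(self, game_id, factor):
        """Start the game if it fits and nothing is queued ahead of it.
        Returns True if the game started, False if it was queued."""
        if not self.waiting and self.current_load() + factor < self.limit:
            self.running[game_id] = factor
            return True
        self.waiting.append((game_id, factor))
        return False

    def finish(self, game_id):
        """Release a finished game's load, then start queued games from the
        front of the queue for as long as the next one fits.
        Returns the ids of the games that were started."""
        self.running.pop(game_id, None)
        started = []
        while self.waiting and self.current_load() + self.waiting[0][1] < self.limit:
            queued_id, factor = self.waiting.popleft()
            self.running[queued_id] = factor
            started.append(queued_id)
        return started
```

This version is strictly first-come-first-served: a light P1 game waits behind a queued timed game even if the P1 game alone would fit. Letting light games jump the queue would use the server better but could starve the timed bots, so which policy is right is a judgment call.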

Just some ideas to get the discussion rolling.  :)

Title: Re: Server Overload
Post by Arimabuff on Feb 1st, 2010, 12:29pm
One server overload does not a problem make.

Fritz, you've been arguing for these kinds of drastic measures for as long as I can remember. So far the site is doing well on a regular day, and that's enough not to drive away players. I don't know of any site (and I know many) that doesn't crash occasionally, and the people who log in to these sites know that as well. They won't be deterred by one isolated incident unless they were not much interested to begin with.

Title: Re: Server Overload
Post by Janzert on Feb 1st, 2010, 12:48pm
At the WC timeout yesterday, there were one P1, one P2, one Fast, and one Blitz bot running on the server. The WC game timed out, as did the Blitz and Fast bots. 16 minutes prior to that, one other Fast bot timed out.

My experience running a teamspeak version 2 server some years ago was that it didn't use much in the way of CPU but could be fairly bandwidth heavy. I don't recall how memory heavy it was but would think it was relatively light. I'm not sure how much this has changed with version 3.

One other thing to check might be the size of hash tables the bots are using. For the tournament I know I set opfor to use a large table (800MB?), but there probably isn't much gained after 100-200MB. If the bots are using more memory for hash tables than is available, I would expect that to cause severe swapping and performance issues for the whole server.
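
The check itself is simple arithmetic, roughly as sketched below. Only the 800MB figure for opfor (itself uncertain) and the 100-200MB range come from this post; the other bot names, the server RAM, and the amount reserved for everything else are placeholder assumptions, since none of those numbers are given in this thread.

```python
def hash_memory_report(hash_sizes_mb, ram_mb, reserved_mb):
    """Compare the bots' combined hash table sizes against the RAM that is
    left after reserving some for the OS, web server, and gameroom."""
    total = sum(hash_sizes_mb.values())
    budget = ram_mb - reserved_mb
    return {"total_mb": total, "budget_mb": budget, "fits": total <= budget}

# opfor's 800 MB is the (uncertain) figure from the post; the rest is assumed.
tables = {"opfor": 800, "hypothetical_bot_a": 200, "hypothetical_bot_b": 200}
report = hash_memory_report(tables, ram_mb=1024, reserved_mb=256)
print(report)
```

If "fits" comes out false for the bots that were actually running, swapping becomes a plausible explanation for the timeouts.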

Janzert

Title: Re: Server Overload
Post by Fritzlein on Feb 1st, 2010, 3:44pm

on 02/01/10 at 12:29:28, Arimabuff wrote:
One server overload does not a problem make.

True, but there has been more than one timeout due to server overload so far, and there will be more in the future.  It isn't a level that bothers you yet, I understand, but isn't there some level that you would consider a problem?  And once that level has been reached, wouldn't you recommend doing something about it?

Title: Re: Server Overload
Post by Fritzlein on Feb 1st, 2010, 4:01pm

on 02/01/10 at 12:48:56, Janzert wrote:
My experience running a teamspeak version 2 server some years ago was that it didn't use much in the way of CPU but could be fairly bandwidth heavy.

Ah, it didn't occur to me that it could have been a bandwidth problem rather than a CPU problem.  Before the game timeouts I was trying to load ArimaaWiki pages, and they were loading very slowly or not at all, which I assumed was a CPU problem, but could well have been a bandwidth problem.


Quote:
One other thing to check might be the size of hash tables the bots are using.

Good idea.  If it is a memory issue, then probably some bots cause more of a problem than others, and we ought to be able to isolate the worst offenders.


Quote:
If the bots are using more memory for hash tables than is available, I would expect that to cause severe swapping and performance issues for the whole server.

Omar mentioned that he had very slow disk writes when GnoBot2009Blitz was running in the qualifying.  I guess that could be indicative of the memory swapping problem.  Thanks for offering these other debugging suggestions in case it is not the CPU that is the bottleneck.  We wouldn't want to introduce the wrong limits on the bots because of mis-diagnosing the problem.

Title: Re: Server Overload
Post by Eltripas on Feb 1st, 2010, 5:08pm
I would like suggestion 2 to be used during the WC period. Also, if the problem is TeamSpeak related, we should move back to Skype, which as far as I understand doesn't use any resources from the arimaa.com server.


on 02/01/10 at 12:29:28, Arimabuff wrote:
One server overload does not a problem make.


That's something Yoda would say, hehe.


Title: Re: Server Overload
Post by Janzert on Feb 1st, 2010, 5:57pm

on 02/01/10 at 17:08:38, Eltripas wrote:
I would like suggestion 2 to be used during the WC period. Also, if the problem is TeamSpeak related, we should move back to Skype, which as far as I understand doesn't use any resources from the arimaa.com server.


Actually, after checking DNS, I don't think the TeamSpeak server is running off the main arimaa.com server either, so it shouldn't be causing any load problem.

Janzert

Title: Re: Server Overload
Post by Manuel on Feb 1st, 2010, 11:54pm
I would be much in favor of suggestion 2, at first perhaps only during WC games, so that at least for these games time-outs are prevented.
However, it must first be checked what the problem is: CPU, memory, or bandwidth. Can't you simply see this in the log files? If not, perhaps Omar should start keeping log files for CPU load, memory load, and networking, so that one can easily check what causes the problems.
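
Such logging could look roughly like the sketch below. It assumes the server runs Linux (not stated anywhere in this thread), since it reads /proc/meminfo and /proc/net/dev; the function names and the log line layout are made up for this example.

```python
import os
import time

def parse_meminfo(text):
    """Return {field name: value in kB} from the contents of /proc/meminfo."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info

def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev."""
    totals = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) >= 9:
            # field 0 is received bytes, field 8 is transmitted bytes
            totals[name.strip()] = (int(fields[0]), int(fields[8]))
    return totals

def sample_line():
    """Build one log line: time, 1-minute load average, free memory and
    swap, and cumulative network byte counters summed over interfaces."""
    load1, _, _ = os.getloadavg()
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    with open("/proc/net/dev") as f:
        net = parse_net_dev(f.read())
    rx = sum(r for r, _ in net.values())
    tx = sum(t for _, t in net.values())
    return "%s load1=%.2f memfree_kb=%d swapfree_kb=%d rx_bytes=%d tx_bytes=%d" % (
        time.strftime("%Y-%m-%d %H:%M:%S"), load1,
        mem.get("MemFree", 0), mem.get("SwapFree", 0), rx, tx)
```

Appending the result of sample_line() to a file once a minute would be enough to see, after the next timeout, whether load, free memory, swap, or traffic was the one that spiked. The network counters are cumulative, so bandwidth is the difference between consecutive lines.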

@Arimabuff: I remember you using strong language about freezeups, so one would think you would be in favor of solving these problems?

Title: Re: Server Overload
Post by Arimabuff on Feb 2nd, 2010, 12:50am

on 02/01/10 at 17:08:38, Eltripas wrote:
...That's something Yoda would say, hehe.

Be that as it may, it's an adaptation of an old English proverb: "One swallow does not a summer make."

Title: Re: Server Overload
Post by Arimabuff on Feb 2nd, 2010, 12:59am

on 02/01/10 at 23:54:38, Manuel wrote:
@Arimabuff: I remember you using strong language about freezeups, so one would think you would be in favor of solving these problems?

They were mostly due to my mathusalemic computer; the troubles went partially away after I did some cleaning and rearranging of the files. It’s almost tolerable now.  ;D

Title: Re: Server Overload
Post by omar on Feb 3rd, 2010, 12:27pm
Actually the current server only has two processors. It was the same one used as gold in the 2008 WCC. It is the server load that is causing problems.

I have an easy way to turn off bots during WC games. I'll continue to use that for now.

After the events this year I'll be transferring the site over to the server used for this year's WCC. It will be about 3 times as fast as the current system.

Title: Re: Server Overload
Post by Fritzlein on Feb 3rd, 2010, 1:45pm

on 02/03/10 at 12:27:12, omar wrote:
After the events this year I'll be transferring the site over to the server used for this year's WCC. It will be about 3 times as fast as the current system.

That's very good news.  Then we shouldn't have to worry about the load for another couple of years.

Title: Re: Server Overload
Post by Hippo on Feb 7th, 2010, 2:44pm
I know it's Omar's choice and investment, but wouldn't a configuration where the bots are not running on the game server be preferable?

That is, having the bots' server on different hardware than the game server?

The game protocol would hardly be a problem even with a lot of bots.

Title: Re: Server Overload
Post by Janzert on Feb 7th, 2010, 8:48pm
It's certainly preferable; the problem is the cost. If someone were to donate the server, I'm sure Omar wouldn't have a problem moving the bots to their own server.

Janzert

Title: Re: Server Overload
Post by omar on Feb 10th, 2010, 8:23pm
Yes, that would be the best way to do it. With no bots running I think the new server would be able to easily handle 300 users in the gameroom at the same time.

Probably later this year I will let others invest in Arimaa and at that time I'll get another server for the bots.

Title: Re: Server Overload
Post by Fritzlein on Feb 11th, 2010, 10:53am

on 02/10/10 at 20:23:40, omar wrote:
Probably later this year I will let others invest in Arimaa and at that time I'll get another server for the bots.

You can't throw out a teaser like that and not expect questions.  Specifically, what's the revenue model for the Arimaa company people would be investing in?  More specifically, if investor money is used to buy a separate server to host bots for people to play, will those bots generate revenue because they are pay-for-play?

Title: Re: Server Overload
Post by omar on Feb 17th, 2010, 8:27am
As tempting as it is for me to say more about this, I have to be careful that any communication I do relating to the business side is done through formal channels, due to potential liabilities. When the business plan is ready I'll announce it, and accredited investors will be able to request a copy. I know many people may not be accredited investors but may still want to invest in Arimaa, so I am looking at options on how to include them as well. This is about all I can say right now.

Title: Re: Server Overload
Post by Fritzlein on Feb 17th, 2010, 8:55am
Thanks for saying that much.  Perhaps 2010 will have as many firsts for Arimaa as 2009 did!

Title: Re: Server Overload
Post by Eltripas on Feb 28th, 2010, 4:35pm

on 02/03/10 at 12:27:12, omar wrote:
After the events this year I'll be transferring the site over to the server used for this year's WCC. It will be about 3 times as fast as the current system.


Does this mean that the fast and blitz bots on the ladder will be better?

Title: Re: Server Overload
Post by Fritzlein on Feb 28th, 2010, 5:54pm

on 02/28/10 at 16:35:31, Eltripas wrote:
Does this mean that the fast and blitz bots on the ladder will be better?

Yep, they'll get better.  The amount of improvement, however, is debatable.


