Topic: Server Overload
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Server Overload
« on: Feb 1st, 2010, 12:02pm » |
|
The server overload that just happened during the World Championship preliminaries (Round 4, The_Jeh vs. Nevermind) is a sobering warning sign. There were only four bots running at the same time. Yes, there were over twenty people logged on to the server, and fifteen or so in chat, but those applications are not supposed to be CPU-intensive. Was it perhaps the TeamSpeak server that tipped over arimaa.com? I hope the TeamSpeak server is in fact to blame, so that in a normal situation arimaa.com can run more than four bots. Unfortunately, even if that was the case, other bot issues in the recent past suggest that even when there is no live commentary, the server runs into issues when it runs around six bots.
My hypothesis is that the server has four CPUs, so it shows strain (and/or crashes) when four bots are thinking at the same time, because that is when the server must share CPU with a bot. As long as only three bots are thinking the server can dole out one CPU per bot and still have a CPU left over to take care of everything else, but not so with four bots.
Note that it isn't the number of bots playing games that matters, because most (all?) bots don't ponder while waiting for their opponent to move. Thus it would be possible to have five bots playing at the same time, and by good fortune never have more than three of them thinking at the same time, but when the opponent moves happen to line up just wrong so that four bots need to think at the same time, watch out.
If my hypothesis is correct, it is the timed bots that are the problem (Blitz, Fast, and CC) more than the P1 and P2 bots, because the timed bots are thinking for half of the time whereas the P1 and P2 bots move quickly and spend the rest of their time idle. We might be able to have twenty copies of ArimaaScoreP1 playing games without ever having four of them thinking at the same time.
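The "four bots thinking at once" argument can be put into rough numbers. The sketch below is only an illustration under assumptions that are not in the post: each bot thinks independently of the others, a timed bot thinks a fraction p = 0.5 of the time (the post's "half of the time"), and a P1 bot thinks a hypothetical 5% of the time. The function name `prob_at_least` is made up for this example.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n bots are thinking at the same instant,
    assuming each bot thinks independently a fraction p of the time."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Timed bots: p = 0.5 is taken from the post ("thinking for half of the time").
for n in (4, 5, 6):
    print(n, "timed bots, P(>= 4 thinking):", round(prob_at_least(4, n, 0.5), 3))

# P1 bots: p = 0.05 is a hypothetical figure chosen only for illustration.
print("20 P1 bots, P(>= 4 thinking):", round(prob_at_least(4, 20, 0.05), 3))
```

Under these assumptions the threshold k is just a parameter, so the same calculation can be rerun for a different number of CPUs.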
Those of you who see where I am heading may object to limiting and/or disabling the timed bots, because it would stink not to have the best opponents available. It is true that only being able to play P1 and P2 bots would stink, but is that worse than having timeouts like Nevermind had, and worse than the illegitimate wins that peppered the Computer Championship qualifying phase? Games that are terminated prematurely by server overload are worse than games not played at all.
If we don't outright disable the timed bots, then we at least need a better system to prevent the server from getting overloaded. I know there is a check for load average when starting a bot, but that is proving to be inadequate, because the server doesn't disallow starting new bots until after the server is overloaded. Even when we are in the danger zone, it is likely that not all playing bots are thinking simultaneously, so the server doesn't appear overloaded until disaster strikes.
Of course, HvH games take up minimal CPU, so those would not have to be limited for the foreseeable future, and even if we should be so lucky as to have so many HvH games that it causes a problem (say forty?) then I would rather reserve CPU for ten HvH games than for one HvB game. Also, when developers put up their bots for public play (Thanks jdb, tize, aaaa, Janzert!) it uses hardly any of the arimaa.com CPU.
I know that nobody likes limits of any kind, so the only solution that everybody can agree to is to let everyone consume until something breaks. Let's pretend that there is no problem until things get really bad. Sort of like U.S. government spending, which will stay out of control until the dollar devalues. So maybe I shouldn't expect that the awkward crash of The_Jeh and Nevermind's game will serve as a wake-up call to anyone. Maybe things will have to get a lot worse before we can have a serious discussion about how to ration server resources. If so, it would be par for the course.
If, on the other hand, folks want to propose potential ways to deal with the problem before it gets so bad it drives away players, I'd love to hear what ideas are out there.
Suggestion 1 (no programming, no money): Disable the timed bots.
Suggestion 2 (programming, but no money): Assign each bot a load factor based on what percentage of time it typically spends thinking. All timed bots would have a load factor of 0.5, whereas P1 and P2 bots would have lower factors. Keep a running total of the load factors of the bots that are playing. Go ahead and allow users to create as many instances of bots as they would like, but when a user tries to start a game, don't let the game start if it would raise the load factor to 1.75 or higher. Instead put games in a queue of "games to be started when some other bot finishes". Then later try to start those games in the order that they were requested.
Suggestion 3 (money but no programming): Rent two servers year-round instead of one. Then we don't have to worry about overload until about twelve simultaneous bots run. Omar, are you game to double your Arimaa expenses? Would anyone else like to volunteer?
Just some ideas to get the discussion rolling.
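Suggestion 2 could be sketched roughly as follows. This is a minimal illustration, not how the arimaa.com gameroom works: the class name `BotScheduler`, its methods, and the game ids are all hypothetical. Only the 0.5 load factor for timed bots, the 1.75 limit, and the first-come-first-served queue come from the post.

```python
from collections import deque

LOAD_LIMIT = 1.75  # from the post: refuse a start that would reach 1.75 or higher

class BotScheduler:
    """Hypothetical admission control for bot games based on load factors."""

    def __init__(self, limit=LOAD_LIMIT):
        self.limit = limit
        self.running = {}       # game id -> load factor of the bot in that game
        self.waiting = deque()  # (game id, load factor), in request order

    def total_load(self):
        return sum(self.running.values())

    def request_start(self, game_id, load_factor):
        """Start the game if it fits under the limit; otherwise queue it.
        Returns True if the game started immediately."""
        # Requests queue behind earlier waiters so that request order is kept.
        if not self.waiting and self.total_load() + load_factor < self.limit:
            self.running[game_id] = load_factor
            return True
        self.waiting.append((game_id, load_factor))
        return False

    def finish(self, game_id):
        """Record that a game ended, then start queued games in request order.
        Returns the ids of the games that were started."""
        self.running.pop(game_id, None)
        started = []
        while self.waiting and self.total_load() + self.waiting[0][1] < self.limit:
            queued_id, load_factor = self.waiting.popleft()
            self.running[queued_id] = load_factor
            started.append(queued_id)
        return started

# Usage: four timed bots at 0.5 each; the fourth has to wait for a slot.
sched = BotScheduler()
for game in ("g1", "g2", "g3", "g4"):
    print(game, "started:", sched.request_start(game, 0.5))
print("started after g1 finished:", sched.finish("g1"))
```

Keeping strict request order means a light P1 game can end up waiting behind a heavier timed game at the head of the queue. That is a design choice in this sketch; letting small games jump the queue would use the server more fully at the cost of fairness.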
|
|
|
|
|
Arimabuff
Forum Guru
Arimaa player #2764
Gender:
Posts: 589
|
|
Re: Server Overload
« Reply #1 on: Feb 1st, 2010, 12:29pm » |
|
One server overload does not a problem make. Fritz, you've been arguing for that kind of drastic measure for as long as I can remember. So far the site is doing well on a regular day, and that's enough not to drive away players. I don't know of any site (and I know many) that doesn't crash occasionally, and the people who log in to these sites know that as well. They won't be deterred because of one isolated incident unless they were not much interested to begin with.
|
|
|
|
|
Janzert
Forum Guru
Arimaa player #247
Gender:
Posts: 1016
|
|
Re: Server Overload
« Reply #2 on: Feb 1st, 2010, 12:48pm » |
|
At the WC timeout yesterday, there were one P1, one P2, one Fast, and one Blitz bot running on the server. The WC game timed out, as did the Blitz and Fast bots. Sixteen minutes prior to that, one other Fast bot timed out.
My experience running a TeamSpeak version 2 server some years ago was that it didn't use much in the way of CPU but could be fairly bandwidth heavy. I don't recall how memory heavy it was but would think it was relatively light. I'm not sure how much this has changed with version 3.
One other thing to check might be the size of hash tables the bots are using. For the tournament I know I set opfor to use a large table (800MB?), but there probably isn't much gained after 100-200MB. If the bots are using more memory for hash tables than is available, I would expect that to cause severe swapping and performance issues for the whole server.
Janzert
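The hash table concern comes down to simple arithmetic: add up the table sizes of the bots that may run together and compare the sum with the RAM left for them. The sketch below is only an illustration. The 800 MB and 200 MB figures echo the post, but the other bot names, the 2048 MB of server RAM, and the 512 MB reserved for everything else are hypothetical, since the thread does not state them.

```python
def hash_memory_report(table_mb, ram_mb, reserved_mb=0):
    """Compare the combined hash table size with the RAM left over for bots."""
    total = sum(table_mb.values())
    available = ram_mb - reserved_mb
    return {"total_mb": total, "available_mb": available, "fits": total <= available}

def cap_tables(table_mb, cap_mb):
    """Return a copy of the sizes with each table limited to cap_mb."""
    return {bot: min(size, cap_mb) for bot, size in table_mb.items()}

# Hypothetical line-up: four bots, each configured like the 800 MB opfor setting.
tables = {"opfor": 800, "bot_b": 800, "bot_c": 800, "bot_d": 800}

# Hypothetical server: 2048 MB of RAM with 512 MB kept for the OS and web server.
print(hash_memory_report(tables, ram_mb=2048, reserved_mb=512))
print(hash_memory_report(cap_tables(tables, 200), ram_mb=2048, reserved_mb=512))
```

If the uncapped total exceeds what is available, the swapping described above would be the expected symptom, and a cap near the 100-200 MB range mentioned in the post is the obvious first thing to try.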
|
« Last Edit: Feb 1st, 2010, 12:50pm by Janzert » |
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: Server Overload
« Reply #3 on: Feb 1st, 2010, 3:44pm » |
|
on Feb 1st, 2010, 12:29pm, Arimabuff wrote: One server overload does not a problem make.
True, but there has been more than one timeout due to server overload so far, and there will be more in the future. It isn't a level that bothers you yet, I understand, but isn't there some level that you would consider a problem? And once that level has been reached, wouldn't you recommend doing something about it?
|
« Last Edit: Feb 1st, 2010, 4:02pm by Fritzlein » |
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: Server Overload
« Reply #4 on: Feb 1st, 2010, 4:01pm » |
|
on Feb 1st, 2010, 12:48pm, Janzert wrote: My experience running a TeamSpeak version 2 server some years ago was that it didn't use much in the way of CPU but could be fairly bandwidth heavy.
Ah, it didn't occur to me that it could have been a bandwidth problem rather than a CPU problem. Before the game timeouts I was trying to load ArimaaWiki pages, and they were loading very slowly or not at all, which I assumed was a CPU problem, but could well have been a bandwidth problem.
Quote: One other thing to check might be the size of hash tables the bots are using.
Good idea. If it is a memory issue, then probably some bots cause more of a problem than others, and we ought to be able to isolate the worst offenders.
Quote: If the bots are using more memory for hash tables than is available, I would expect that to cause severe swapping and performance issues for the whole server.
Omar mentioned that he had very slow disk writes when GnoBot2009Blitz was running in the qualifying. I guess that could be indicative of the memory swapping problem. Thanks for offering these other debugging suggestions in case it is not the CPU that is the bottleneck. We wouldn't want to introduce the wrong limits on the bots because of misdiagnosing the problem.
|
|
|
|
|
Eltripas
Forum Guru
Meh-he-kah-naw
Gender:
Posts: 225
|
|
Re: Server Overload
« Reply #5 on: Feb 1st, 2010, 5:08pm » |
|
I like suggestion 2 for use during the WC period. Also, if the problem is TeamSpeak-related, we should move back to Skype, which as far as I understand doesn't use any resources from the arimaa.com server.
on Feb 1st, 2010, 12:29pm, Arimabuff wrote: One server overload does not a problem make.
That's something Yoda would say, hehe.
|
|
|
|
|
Janzert
Forum Guru
Arimaa player #247
Gender:
Posts: 1016
|
|
Re: Server Overload
« Reply #6 on: Feb 1st, 2010, 5:57pm » |
|
on Feb 1st, 2010, 5:08pm, Eltripas wrote: I like suggestion 2 for use during the WC period. Also, if the problem is TeamSpeak-related, we should move back to Skype, which as far as I understand doesn't use any resources from the arimaa.com server.
Actually, after checking DNS, I don't think the TeamSpeak server is running off the main arimaa.com server, so it shouldn't be causing any load problem either.
Janzert
|
|
|
|
|
Manuel
Forum Guru
Arimaa player #4020
Gender:
Posts: 58
|
|
Re: Server Overload
« Reply #7 on: Feb 1st, 2010, 11:54pm » |
|
I would be much in favor of suggestion 2, at first perhaps only during WC games, so that at least for these games timeouts are prevented. However, we must first check what the problem is: CPU, memory, or bandwidth. Can't you simply see this in the logfiles? If not, perhaps Omar should start keeping logfiles for CPU load, memory load, and networking, so that one can easily check what causes the problems.
@Arimabuff: I remember you using strong language about freezeups, so one would think you would be in favor of solving these problems?
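The kind of logfile suggested here could be produced by a small script run on the server. The sketch below assumes a Linux host, because it reads `/proc/meminfo` and `/proc/net/dev`; nothing in the thread says what the server runs or what logging it already has. All function names and the log path in the usage note are hypothetical.

```python
import os
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info

def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: (rx_bytes, tx_bytes)}."""
    totals = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) >= 9:
            totals[name.strip()] = (int(fields[0]), int(fields[8]))
    return totals

def sample():
    """Return one log line with load average, free memory, swap, and traffic."""
    load1, load5, load15 = os.getloadavg()
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    with open("/proc/net/dev") as f:
        net = parse_net_dev(f.read())
    rx = sum(r for r, _ in net.values())
    tx = sum(t for _, t in net.values())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return (f"{stamp} load1={load1:.2f} load5={load5:.2f} "
            f"memfree_kb={mem.get('MemFree')} swapfree_kb={mem.get('SwapFree')} "
            f"rx_bytes={rx} tx_bytes={tx}")

def log_samples(path, count, interval_s=60):
    """Append `count` samples to `path`, waiting interval_s seconds between them."""
    with open(path, "a") as out:
        for i in range(count):
            out.write(sample() + "\n")
            out.flush()
            if i + 1 < count:
                time.sleep(interval_s)
```

For example, `log_samples("/tmp/server_load.log", count=60, interval_s=60)` would record an hour of samples. Comparing the lines written around a timeout should show whether load, free memory and swap, or network traffic moved first.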
|
|
|
|
|
Arimabuff
Forum Guru
Arimaa player #2764
Gender:
Posts: 589
|
|
Re: Server Overload
« Reply #8 on: Feb 2nd, 2010, 12:50am » |
|
on Feb 1st, 2010, 5:08pm, Eltripas wrote: ...That's something Yoda would say, hehe.
Be that as it may, it's an adaptation of an old English proverb: "One swallow does not a summer make."
|
|
|
|
|
Arimabuff
Forum Guru
Arimaa player #2764
Gender:
Posts: 589
|
|
Re: Server Overload
« Reply #9 on: Feb 2nd, 2010, 12:59am » |
|
on Feb 1st, 2010, 11:54pm, Manuel wrote: @Arimabuff: I remember you using strong language about freezeups, so one would think you would be in favor of solving these problems?
They were mostly due to my mathusalemic computer; the troubles went partially away after I did some cleaning and rearranging of the files. It’s almost tolerable now.
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: Server Overload
« Reply #10 on: Feb 3rd, 2010, 12:27pm » |
|
Actually, the current server only has two processors. It was the same one used as gold in the 2008 WCC. It is the server load that is causing problems. I have an easy way to turn off bots during WC games. I'll continue to use that for now. After the events this year I'll be transferring the site over to the server used for this year's WCC. It will be about 3 times as fast as the current system.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: Server Overload
« Reply #11 on: Feb 3rd, 2010, 1:45pm » |
|
on Feb 3rd, 2010, 12:27pm, omar wrote: After the events this year I'll be transferring the site over to the server used for this year's WCC. It will be about 3 times as fast as the current system.
That's very good news. Then we shouldn't have to worry about the load for another couple of years.
|
|
|
|
|
Hippo
Forum Guru
Arimaa player #4450
Gender:
Posts: 883
|
|
Re: Server Overload
« Reply #12 on: Feb 7th, 2010, 2:44pm » |
|
I know it's Omar's choice and investment, but wouldn't a configuration where the bots are not running on the game server be preferable, with the bot server on different hardware than the game server? The game protocol would hardly be a problem even with a lot of bots.
|
« Last Edit: Feb 7th, 2010, 2:46pm by Hippo » |
|
|
|
Janzert
Forum Guru
Arimaa player #247
Gender:
Posts: 1016
|
|
Re: Server Overload
« Reply #13 on: Feb 7th, 2010, 8:48pm » |
|
It's certainly preferable; the problem is the cost. If someone were to donate the server, I'm sure Omar wouldn't have a problem moving the bots to their own server.
Janzert
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: Server Overload
« Reply #14 on: Feb 10th, 2010, 8:23pm » |
|
Yes, that would be the best way to do it. With no bots running I think the new server would be able to easily handle 300 users in the gameroom at the same time. Probably later this year I will let others invest in Arimaa and at that time I'll get another server for the bots.
|
|
|
|
|
|