Welcome, Guest. Please Login or Register.
Nov 22nd, 2024, 4:55am

Home Home Help Help Search Search Members Members Login Login Register Register
Arimaa Forum « 2013 WCC R5 sharp vs marwin timeout »


   Arimaa Forum
   Arimaa
   Events
(Moderator: supersamu)
   2013 WCC R5 sharp vs marwin timeout
« Previous topic | Next topic »
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print
   Author  Topic: 2013 WCC R5 sharp vs marwin timeout  (Read 1715 times)
omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
2013 WCC R5 sharp vs marwin timeout
« on: Mar 5th, 2013, 8:30am »
Quote Quote Modify Modify

bot_marwin timed out in this game. From the web server logs it seems like the web server process restarted due to the threading module crashing. See the errors at time: Mar 05 03:16:04
 
http://silver.arimaa.com/logs/httpd/error_log
 
Searching for this error lead me to:
 
https://bugzilla.redhat.com/show_bug.cgi?id=703495
 
Though it was reported as an OS bug, seems it a Python bug and was fixed in Python version 3.3. However, the default version of Python on the server is 2.6.6.
 
From the web server logs it seems like it also happened at Mar 03 08:18:38 and bot_marwin (or any other bots) was not running at that time.
 
So I would categorize this as a server fault and resume the game from the most recent position with the same reserve times. I add 10 seconds to the reserve of the player to move (bot_marwin) to compensate for the delay between setting up the game and starting the bots.
 
IP Logged
omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: 2013 WCC R5 sharp vs marwin timeout
« Reply #1 on: Mar 5th, 2013, 8:50am »
Quote Quote Modify Modify

I tried to resume the game, but sharp didn't join the game and timed out; although marwin did join the game and made a move. So I tried to resume this game and it seems both bots joined and are playing now.
IP Logged
Janzert
Forum Guru
*****



Arimaa player #247

   


Gender: male
Posts: 1016
Re: 2013 WCC R5 sharp vs marwin timeout
« Reply #2 on: Mar 5th, 2013, 11:28am »
Quote Quote Modify Modify

Omar, if you follow the bug report linked above upstream to the wsgi report you'll see that these messages are harmless and generated as wsgi is being shutdown. So they are not the cause of the problem. Apache seems to be shutting down in this case because it received a SIGHUP (finding the cause of that might lead to the real problem).  
 
Certainly something out of the ordinary happened on the server and the restart seems to be completely appropriate. But the wsgi error is also not the cause just another result. I'd be happy to try and dig into it further if you'd like so possibly it can be prevented from happening again, but there doesn't seem to be any more clues in these logs as to the original cause.
 
Janzert
« Last Edit: Mar 5th, 2013, 11:29am by Janzert » IP Logged
omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: 2013 WCC R5 sharp vs marwin timeout
« Reply #3 on: Mar 5th, 2013, 8:19pm »
Quote Quote Modify Modify

That's really strange. I wonder what's sending SIGHUP to the web server. I took a quick look but didn't see what might be causing this. It would be great if you could look at it as well. I'll PM you the login info.
IP Logged
Janzert
Forum Guru
*****



Arimaa player #247

   


Gender: male
Posts: 1016
Re: 2013 WCC R5 sharp vs marwin timeout
« Reply #4 on: Mar 6th, 2013, 6:58am »
Quote Quote Modify Modify

I haven't been on the server to check yet, but as I was going to sleep last night I'm 90% sure I figured it out. My guess right now is that this was caused by log rotation. After this morning's WCC game I'll try to get on and confirm the problem.
 
The easy fix is just don't have games between 3am and 4am server time. The correct fix would be to have the cgi script that starts up the bot daemonize itself, basically detaching from the parent process of the http server. This should take a dozen lines of code or less to accomplish, although I haven't looked at the script yet*.
 
Janzert
 
* If the script does this already then I'm back to stumped as to the cause. But I really think this will the root cause right now.
IP Logged
omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: 2013 WCC R5 sharp vs marwin timeout
« Reply #5 on: Mar 7th, 2013, 12:17am »
Quote Quote Modify Modify

I thought it might be a log rotate also, but since the logs continued I figured that might not be it.
 
Demonizing the bot process sounds like a good idea. I'll look into that.
IP Logged
Janzert
Forum Guru
*****



Arimaa player #247

   


Gender: male
Posts: 1016
Re: 2013 WCC R5 sharp vs marwin timeout
« Reply #6 on: Mar 8th, 2013, 10:06am »
Quote Quote Modify Modify

Only 2 of the ssl logs* were rotated, if any logs are rotated the process has to restart though.
 
Also Omar allowed me to add a bit more logging on the server. Specifically the server state log now includes a 'ps -efjH' call at the end. If you look at the output, for example here at the very end, you can see that while the bot runner scripts are changing the user and parent process the process group and session** are being retained from the web server. So any signal sent to the web server will also be sent to the bots.
 
Janzert
 
* ssl_access and ssl_request
 
** 4th and 5th column of the ps command output
« Last Edit: Mar 8th, 2013, 10:06am by Janzert » IP Logged
rbarreira
Forum Guru
*****



Arimaa player #1621

   


Gender: male
Posts: 605
Re: 2013 WCC R5 sharp vs marwin timeout
« Reply #7 on: Mar 11th, 2013, 10:22am »
Quote Quote Modify Modify

Could this problem affect screening games?
IP Logged
Janzert
Forum Guru
*****



Arimaa player #247

   


Gender: male
Posts: 1016
Re: 2013 WCC R5 sharp vs marwin timeout
« Reply #8 on: Mar 11th, 2013, 10:35am »
Quote Quote Modify Modify

Yes, it would be possible for this to happen again right now.
 
I'm (slowly) working on the modifications needed to prevent it in the future, but nothing has been changed yet.
 
Janzert
IP Logged
Janzert
Forum Guru
*****



Arimaa player #247

   


Gender: male
Posts: 1016
Re: 2013 WCC R5 sharp vs marwin timeout
« Reply #9 on: Mar 13th, 2013, 2:40am »
Quote Quote Modify Modify

I finally got time and all the kinks worked out of my patch to fix this issue and sent it off to Omar. So hopefully in the near future this should no longer be a concern.
 
Janzert
IP Logged
tize
Forum Guru
*****



Arimaa player #3121

   


Gender: male
Posts: 118
Re: 2013 WCC R5 sharp vs marwin timeout
« Reply #10 on: Mar 13th, 2013, 2:14pm »
Quote Quote Modify Modify

Nice done Janzert. And I think the "finally" should be a "already".
IP Logged
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print

« Previous topic | Next topic »

Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB © 2000-2003. All Rights Reserved.