Author |
Topic: 2014 Arimaa Challenge (Read 9041 times) |
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
2014 Arimaa Challenge
« on: Mar 11th, 2014, 11:30am » |
|
After a brief discussion of how much we fear the bots and/or expect to dominate them, the 2014 Arimaa Challenge screening games have begun. The bots won four of the first five, a relatively strong start, especially since ziltoid was noticeably ahead at one point even in the one game it lost, to browni3141. I recently told a casual observer that the strongest bot is about 2200 on the gameroom scale, and they're jointly a bit ahead of that out of the gate. Does anyone care to revise their assessment of the 2014 bot strength based on the early evidence?

As the screening continues, I will update the table below with the results from this year.

Year  Pairs  Decisive  Winner / Score / Perf  Loser / Score / Perf
----  -----  --------  ---------------------  --------------------
2007  12     2         bomb / 2 / 2087        Zombie / 0 / 1876
2008  16     7         bomb / 6 / 1918        sharp / 1 / 1576
2009  23     7         clueless / 5 / 1910    GnoBot / 2 / 1792
2010  25     11        marwin / 6 / 2065      clueless / 5 / 1960
2011  40     11        marwin / 6 / 2110      sharp / 5 / 2109
2012  33     7         briareus / 5 / 2232    marwin / 2 / 2128
2013  25     6         marwin / 4 / 2121      ziltoid / 2 / 2055
2014  33     11        ziltoid / 6 / 2259     sharp / 5 / 2244
|
« Last Edit: Mar 31st, 2014, 8:08pm by Fritzlein » |
|
|
|
browni3141
Forum Guru
Arimaa player #7014
Posts: 385
|
|
Re: 2014 Arimaa Challenge
« Reply #1 on: Mar 11th, 2014, 12:12pm » |
|
It's pretty much what I would expect so far. The fact that ziltoid couldn't convert a strong position against me indicates weakness more than getting to such a position indicates strength (if you look at the game, I made a pretty major blunder at 8s). I guess I would have expected at least one of the other four games to have been a human win, but it is a pretty small sample.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #2 on: Mar 11th, 2014, 3:27pm » |
|
on Mar 11th, 2014, 12:12pm, browni3141 wrote:The fact that ziltoid couldn't convert a strong position against me indicates weakness more than getting to such a position indicates strength (if you look at the game, I made a pretty major blunder at 8s) |
| Well, gee, if the human players never made mistakes, then I would bet against the bots just as heavily as you do. In the history of chess man vs. machine matches there was a glorious tradition of discounting machine victories as meaningless because of how badly their human opponents played. But I guess it isn't limited to man versus machine; there is also the ancient chess quote, "I've hardly ever defeated a healthy opponent". Quote:[...] it is a pretty small sample. |
| Amen to that! Even with each bot playing dozens of games by the end of the screening, the sample remains small, and it is hard to draw conclusions. For example, I highly doubt that ziltoid2013 was weaker than briareus2012, as if rbarreria had introduced bugs in the meantime, but the performance rating in the screening dropped off by 177 points, as you can see in the table in my first post. There's a lot of random variation both on the human side and on the bot side.
|
« Last Edit: Mar 11th, 2014, 3:28pm by Fritzlein » |
|
|
|
browni3141
Forum Guru
Arimaa player #7014
Posts: 385
|
|
Re: 2014 Arimaa Challenge
« Reply #3 on: Mar 11th, 2014, 3:44pm » |
|
on Mar 11th, 2014, 3:27pm, Fritzlein wrote: Well, gee, if the human players never made mistakes, then I would bet against the bots just as heavily as you do. In the history of chess man vs. machine matches there was a glorious tradition of discounting machine victories as meaningless because of how badly their human opponents played. But I guess it isn't limited to man versus machine; there is also the ancient chess quote, "I've hardly ever defeated a healthy opponent". |
| But I made a serious blunder and still won. It doesn't take skill from the bot to recognize that I allowed a double hostage and take it. It does take some skill to convert it into a win (which it lacked). I will admit that if ziltoid had actually won I would probably discount it due to my error. It was a rather large error after all, to go from significantly ahead to significantly behind. 8s was probably a net loss of a dog's worth to a horse's worth of material. I really want the bots to start taking advantage of mistakes we don't even know we're making, but I suppose that's a lot to ask. Quote: Amen to that! Even with each bot playing dozens of games by the end of the screening, the sample remains small, and it is hard to draw conclusions. For example, I highly doubt that ziltoid2013 was weaker than briareus2012, as if rbarreria had introduced bugs in the meantime, but the performance rating in the screening dropped off by 177 points, as you can see in the table in my first post. There's a lot of random variation both on the human side and on the bot side. |
| I have been thinking that the screening is not a very fair way to determine which of the bots becomes the challenger, if the bots are close. I guess that can be argued for any format though. Hopefully the first bot with a legitimate chance at winning the Challenge is clearly ahead of the others.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #4 on: Mar 11th, 2014, 7:51pm » |
|
on Mar 11th, 2014, 3:44pm, browni3141 wrote:It doesn't take skill from the bot to recognize that I allowed a double hostage and take it. It does take some skill to convert it into a win (which it lacked) |
| In other words, what a bot executes well doesn't count as skill, and what a bot does badly proves it has no skill? If you applied your argument in reverse to humans, it would go something like this: "Sure humans make great long-term plans and have good fuzzy evaluation apart from lookahead, but they still completely overlook moves, therefore they stink at Arimaa."

To my mind, the threat that is posed by the bots isn't made less by pointing out that it isn't "skill". I certainly can't play blunder-free myself, so it doesn't comfort me much to say, "Apart from my blunders, I can crush bots." I'm not going to disparage a bot based on what kind of thing it does well. Whatever a bot does well gives it winning chances. Furthermore, if you want to reserve "skill" for what we do better, you need to provide us with another word for being good at finding good moves in the way that bots find good moves.

No matter how either player does what it takes to win, it comes down to wins and losses. So far, at 6-1 over humans, the wins are going relatively badly for humanity. 6-1 against the human opposition would be the expected score of a player with a gameroom rating of 2382.

Small sample, small sample, small sample. Since it ultimately does come down to wins and losses, a small sample isn't likely to change anybody's mind. When the evidence is too small and nobody is going to give ground, the traditional male maneuver is to place a wager.

I will bet you one hundred Arimaa points that the top bot in the screening will have the highest-ever performance rating in a screening, i.e. over 2232. I define performance rating as the rating the bot would have needed in order to have an expected number of wins equal to the actual number of wins it got, using gameroom ratings. I won't even complain that your deflated gameroom rating is holding down the performance ratings of the bots.
If you think this is a bad bet for you, then we have been arguing over nothing, because we actually agree about how tough the bots are to beat.
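The performance-rating definition in the post above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the standard Elo logistic curve with a 400-point scale, which may not match the gameroom's exact rating formula, and the opponent ratings in the example are made up rather than taken from the screening. The function names `expected_score` and `performance_rating` are hypothetical.

```python
from typing import Sequence


def expected_score(rating: float, opponent: float) -> float:
    """Expected score of `rating` against `opponent` on the standard Elo curve.

    Assumption: 400-point logistic scale; the gameroom formula may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_rating(opponents: Sequence[float], wins: float,
                       lo: float = 0.0, hi: float = 4000.0,
                       tol: float = 0.01) -> float:
    """Rating at which the expected number of wins equals the actual wins.

    Expected wins increase with rating, so bisection on [lo, hi] finds the
    rating where the two are equal.
    """
    if not 0 < wins < len(opponents):
        # A perfect or zero score has no finite performance rating.
        raise ValueError("wins must be strictly between 0 and the game count")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        total = sum(expected_score(mid, r) for r in opponents)
        if total < wins:
            lo = mid  # rating too low to explain this many wins
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical opponent ratings, not the actual screening field.
    field = [1850, 1900, 2000, 2050, 2100, 2150, 2200]
    print(round(performance_rating(field, wins=6)))
```

Because the result is undefined for a perfect or winless record, the sketch raises an error in those cases instead of returning an arbitrary bound.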
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #5 on: Mar 11th, 2014, 8:10pm » |
|
on Mar 11th, 2014, 3:44pm, browni3141 wrote:Hopefully the first bot with a legitimate chance at winning the Challenge is clearly ahead of the others. |
| This is the first year since 2009 that the winner of the Computer Championship wasn't down to its last life, so it is already less of a coin flip than we are used to. I would bet another 100 Arimaa points that sharp advances to the Challenge with a point to spare, i.e. you would win if ziltoid advances or if it is tied and sharp advances by virtue of having won the CC. Of course, you could win this bet by throwing your games to sharp, so it's only on offer if you take the first bet too. The point of the bet is that if it is unclear which bot is better, it would be slightly in your favor.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #6 on: Mar 12th, 2014, 12:41am » |
|
OK, browni accepted both bets in chat. As of the first update, he's losing the first and winning the second: ziltoid leads with a performance of 2354 to sharp's 2057.
|
|
|
|
|
mistre
Forum Guru
Posts: 553
|
|
Re: 2014 Arimaa Challenge
« Reply #7 on: Mar 12th, 2014, 10:02am » |
|
Bot_Sharp timing out against SilverMitt obviously hurt its performance rating. Any idea why the timeout occurred?
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #8 on: Mar 12th, 2014, 12:04pm » |
|
Janzert has to sort out sharp's timeout against SilverMitt and also kzb's timeout against sharp, which has disappeared from the standings due to kzb's unrating it. Certainly the game met the criteria for why we allowed unrating of games in the first place, i.e. kzb was way ahead and we trust him if he says he lost connection rather than thinking for too long. But that doesn't mean it shouldn't count in the standings. It's a delicate issue, because counting it adds noise to the standings and not counting it opens the screening to some kinds of abuse.
|
|
|
|
|
browni3141
Forum Guru
Arimaa player #7014
Posts: 385
|
|
Re: 2014 Arimaa Challenge
« Reply #9 on: Mar 12th, 2014, 12:21pm » |
|
Doesn't counting it also add potential for abuse? If we think sharp is the easier bot to beat, then we have a "network problem" or two to make sure sharp is the one that makes it, not ziltoid. It is harder to detect abuse there than someone intentionally throwing games by other means.
|
|
|
|
|
kzb52
Forum Guru
Arimaa player #8454
Posts: 71
|
|
Re: 2014 Arimaa Challenge
« Reply #10 on: Mar 12th, 2014, 1:37pm » |
|
To clarify my situation, my timeout was a problem on my end, and was unfortunate but not unusual in that regard. If I had known the game would disappear from the standings, I would not have unrated it. I will not play any more games until I get some sort of ruling from above. If the game needs to be un-unrated, go for it.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #11 on: Mar 12th, 2014, 2:04pm » |
|
on Mar 12th, 2014, 12:21pm, browni3141 wrote:Doesn't counting it also add potential for abuse? If we think sharp is the easier bot to beat, then we have a "network problem" or two to make sure sharp is the one that makes it, not ziltoid. It is harder to detect abuse there than someone intentionally throwing games by other means. |
| Good point, browni. This is unlike the World Championship tournament, where the players would have a strong incentive to abuse the system to increase their prize payout, although I guess people have been known to lie in order to win with no other motivation than winning. Here abusing the system would just mean favoring one bot over the other, which can be done by other means. That argument has me leaning towards trusting the human players to be honest about whether they got disconnected for a minute, as opposed to sending the move with just a few seconds left. We could resume the game and give extra thinking time to the bot that won on time, similar to how it is done in the World Championship. Given the lack of financial incentive, it might not be abused at all, so allowing resumption would add less noise to the result than letting the result stand. Perhaps the major downside would be a headache for the TD in trying to get games restarted, though. What if someone with a super-awful connection participates in the screening, and gets disconnected eleven times over the course of the four games? Maybe timeouts, even though unfair, should stand out of sympathy for the poor TD. Getting the most accurate result should be balanced against how much work that is, which has me leaning the other way now.
|
« Last Edit: Mar 12th, 2014, 2:06pm by Fritzlein » |
|
|
|
browni3141
Forum Guru
Arimaa player #7014
Posts: 385
|
|
Re: 2014 Arimaa Challenge
« Reply #12 on: Mar 12th, 2014, 2:30pm » |
|
on Mar 11th, 2014, 11:30am, Fritzlein wrote:
Year  Pairs  Decisive  Winner / Score / Perf  Loser / Score / Perf
----  -----  --------  ---------------------  --------------------
2007  12     2         bomb / 2 / 2087        Zombie / 0 / 1876
2008  16     7         bomb / 6 / 1918        sharp / 1 / 1576
2009  23     7         clueless / 5 / 1910    GnoBot / 2 / 1792
2010  25     11        marwin / 6 / 2065      clueless / 5 / 1960
2011  40     11        marwin / 6 / 2110      sharp / 5 / 2109
2012  33     7         briareus / 5 / 2232    marwin / 2 / 2128
2013  25     6         marwin / 4 / 2121      ziltoid / 2 / 2055
2014  2      1         ziltoid / 1 / 2354     sharp / 0 / 2057 |
| Does anybody have any theories on why screening participation has gone down in the last two years?
|
|
|
|
|
Ail
Forum Guru
Rabbits can't push Rabbits!
Posts: 52
|
|
Re: 2014 Arimaa Challenge
« Reply #13 on: Mar 12th, 2014, 4:41pm » |
|
on Mar 12th, 2014, 2:30pm, browni3141 wrote: Does anybody have any theories on why screening participation has gone down in the last two years? |
| Theory #1: People like winning. Winning was easier when the bots were easier to beat. Thus fewer people felt like challenging the bots when they expected to be beaten.
|
|
|
|
|
Janzert
Forum Guru
Arimaa player #247
Posts: 1016
|
|
Re: 2014 Arimaa Challenge
« Reply #14 on: Mar 12th, 2014, 7:09pm » |
|
Regarding game #295864, after examining the bot logs it's apparent that the bot timed out as the result of a network problem between the bot server and arimaa.com server. The relevant section of the log is:
Code:
2014-03-12 05:09:20 ERROR:gameroom:Caught unkown exception #1, restarting.
Traceback (most recent call last):
  File "gameroom.py", line 875, in main
    table.playgame(engine_ctl, bot_greeting, options['onemove'])
  File "gameroom.py", line 448, in playgame
    self.move(response.move)
  File "gameroom.py", line 244, in move
    response = post(self.url, values, "Table.move")
  File "gameroom.py", line 83, in post
    response = urllib2.urlopen(req)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 113] No route to host> |
| By the rules and previous precedent the game should be restarted from the point of the timeout. If SilverMitt is unavailable to resume the game before the end of screening the result should be invalidated and removed from the screening results. Janzert
|
|
|
|
|
|