||||
Title: 2014 Arimaa Challenge Post by Fritzlein on Mar 11th, 2014, 11:30am After a brief discussion (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=events;action=display;num=1392928780) of how much we fear the bots and/or expect to dominate them, the 2014 Arimaa Challenge screening games have begun. The bots won four of the first five, a relatively strong start, especially since ziltoid was noticeably ahead at one point even in the one game it lost, to browni3141. I recently told a casual observer that the strongest bot is about 2200 on the gameroom scale, and they're jointly a bit ahead of that out of the gate. Does anyone care to revise their assessment of the 2014 bot strength based on the early evidence? As the screening continues, I will update the table below with the results from this year.

Year  Pairs  Decisive  Winner / Score / Perf    Loser / Score / Perf
----  -----  --------  ---------------------    --------------------
2007    12       2         bomb / 2 / 2087       Zombie / 0 / 1876
2008    16       7         bomb / 6 / 1918        sharp / 1 / 1576
2009    23       7     clueless / 5 / 1910       GnoBot / 2 / 1792
2010    25      11       marwin / 6 / 2065     clueless / 5 / 1960
2011    40      11       marwin / 6 / 2110        sharp / 5 / 2109
2012    33       7     briareus / 5 / 2232       marwin / 2 / 2128
2013    25       6       marwin / 4 / 2121      ziltoid / 2 / 2055
2014    33      11      ziltoid / 6 / 2259        sharp / 5 / 2244 |
||||
Title: Re: 2014 Arimaa Challenge Post by browni3141 on Mar 11th, 2014, 12:12pm It's pretty much what I would expect so far. The fact that ziltoid couldn't convert a strong position against me indicates weakness more than getting to such a position indicates strength (if you look at the game, I made a pretty major blunder at 8s). I guess I would have expected at least one of the other four games to have been a human win, but it is a pretty small sample. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 11th, 2014, 3:27pm on 03/11/14 at 12:12:00, browni3141 wrote:
Well, gee, if the human players never made mistakes, then I would bet against the bots just as heavily as you do. ;) In the history of chess man vs. machine matches there was a glorious tradition of discounting machine victories as meaningless because of how badly their human opponents played. But I guess it isn't limited to man versus machine; there is also the ancient chess quote, "I've hardly ever defeated a healthy opponent". :) Quote:
Amen to that! Even with each bot playing dozens of games by the end of the screening, the sample remains small, and it is hard to draw conclusions. For example, I highly doubt that ziltoid2013 was weaker than briareus2012, as if rbarreira had introduced bugs in the meantime, but the performance rating in the screening dropped off by 177 points, as you can see in the table in my first post. There's a lot of random variation both on the human side and on the bot side. |
||||
Title: Re: 2014 Arimaa Challenge Post by browni3141 on Mar 11th, 2014, 3:44pm on 03/11/14 at 15:27:44, Fritzlein wrote:
But I made a serious blunder and still won. It doesn't take skill from the bot to recognize that I allowed a double hostage and take it. It does take some skill to convert it into a win (which it lacked). I will admit that if ziltoid had actually won I would probably discount it due to my error. It was a rather large error after all, to go from significantly ahead to significantly behind. 8s was probably a net loss of a dog's worth to a horse's worth of material. I really want the bots to start taking advantage of mistakes we don't even know we're making, but I suppose that's a lot to ask :P Quote:
I have been thinking that the screening is not a very fair way to determine which of the bots becomes the challenger, if the bots are close. I guess that can be argued for any format though. Hopefully the first bot with a legitimate chance at winning the Challenge is clearly ahead of the others. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 11th, 2014, 7:51pm on 03/11/14 at 15:44:32, browni3141 wrote:
In other words, what a bot executes well doesn't count as skill, and what a bot does badly proves it has no skill? If you applied your argument in reverse to humans, it would go something like this: "Sure humans make great long-term plans and have good fuzzy evaluation apart from lookahead, but they still completely overlook moves, therefore they stink at Arimaa." To my mind, the threat that is posed by the bots isn't made less by pointing out that it isn't "skill". I certainly can't play blunder-free myself, so it doesn't comfort me much to say, "Apart from my blunders, I can crush bots." I'm not going to disparage a bot based on what kind of thing it does well. Whatever a bot does well gives it winning chances. Furthermore, if you want to reserve "skill" for what we do better, you need to provide us with another word for being good at finding good moves in the way that bots find good moves. No matter how either player does what it takes to win, it comes down to wins and losses. So far, with the bots at 6-1 over humans, things are going relatively badly for humanity. 6-1 against the human opposition would be the expected score of a player with a gameroom rating of 2382. Small sample, small sample, small sample. Since it ultimately does come down to wins and losses, a small sample isn't likely to change anybody's mind. When the evidence is too small and nobody is going to give ground, the traditional male maneuver is to place a wager. I will bet you one hundred Arimaa points that the top bot in the screening will have the highest-ever performance rating in a screening, i.e. over 2232. I define performance rating as the rating the bot would have needed in order to have an expected number of wins equal to the actual number of wins it got, using gameroom ratings. I won't even complain that your deflated gameroom rating is holding down the performance ratings of the bots. 
If you think this is a bad bet for you, then we have been arguing over nothing, because we actually agree about how tough the bots are to beat. |
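Fritzlein's definition of performance rating above (the rating the bot would have needed for its Elo-expected number of wins to equal its actual number of wins) can be computed by a simple root search. A minimal Python sketch, using hypothetical opponent ratings rather than actual gameroom data; the function names are illustrative, not part of any Arimaa tooling:

```python
def elo_expected(r, opponent):
    """Expected score for a player rated r against one opponent (logistic Elo model)."""
    return 1.0 / (1.0 + 10 ** ((opponent - r) / 400.0))

def performance_rating(opponent_ratings, actual_wins, lo=0.0, hi=4000.0):
    """Bisect for the rating whose total expected wins equal the actual wins."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        expected = sum(elo_expected(mid, o) for o in opponent_ratings)
        if expected < actual_wins:
            lo = mid  # expected score too low: the performance rating is higher
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical example: scoring 3 out of 4 against opponents all rated 2000
# implies a performance about 400*log10(3) = 191 points above them.
print(round(performance_rating([2000, 2000, 2000, 2000], 3)))  # → 2191
```

The same bisection works for any mix of opponent ratings, since the expected score is monotonically increasing in the candidate rating.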
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 11th, 2014, 8:10pm on 03/11/14 at 15:44:32, browni3141 wrote:
This is the first year since 2009 that the winner of the Computer Championship wasn't down to its last life, so it is already less of a coin flip than we are used to. I would bet another 100 Arimaa points that sharp advances to the Challenge with a point to spare, i.e. you would win if ziltoid advances or if it is tied and sharp advances by virtue of having won the CC. Of course, you could win this bet by throwing your games to sharp, so it's only on offer if you take the first bet too. :) The point of the bet is that if it is unclear which bot is better, it would be slightly in your favor. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 12th, 2014, 12:41am OK, browni accepted both bets in chat. As of the first update, he's losing the first and winning the second: ziltoid leads with a performance of 2354 to sharp's 2057. |
||||
Title: Re: 2014 Arimaa Challenge Post by mistre on Mar 12th, 2014, 10:02am Bot_Sharp timing out against SilverMitt obviously hurt its performance rating. Any idea why the time out occurred? |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 12th, 2014, 12:04pm Janzert has to sort out sharp's timeout against SilverMitt and also kzb's timeout against sharp, which has disappeared from the standings due to kzb's unrating it. Certainly the game met the criteria for why we allowed unrating of games in the first place, i.e. kzb was way ahead and we trust him if he says he lost connection rather than thinking for too long. But that doesn't mean it shouldn't count in the standings. It's a delicate issue, because counting it adds noise to the standings and not counting it opens the screening to some kinds of abuse. |
||||
Title: Re: 2014 Arimaa Challenge Post by browni3141 on Mar 12th, 2014, 12:21pm Doesn't counting it also add potential for abuse? If we think sharp is the easier bot to beat, then we have a "network problem" or two to make sure sharp is the one that makes it, not ziltoid. It is harder to detect abuse there than someone intentionally throwing games by other means. |
||||
Title: Re: 2014 Arimaa Challenge Post by kzb52 on Mar 12th, 2014, 1:37pm To clarify my situation, my timeout was a problem on my end, and was unfortunate but not unusual in that regard. If I had known the game would disappear from the standings, I would not have unrated it. I will not play any more games until I get some sort of ruling from above. If the game needs to be un-unrated, go for it. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 12th, 2014, 2:04pm on 03/12/14 at 12:21:10, browni3141 wrote:
Good point, browni. This is unlike the World Championship tournament, where the players would have a strong incentive to abuse the system to increase their prize payout, although I guess people have been known to lie in order to win with no other motivation than winning. Here abusing the system would just mean favoring one bot over the other, which can be done by other means. That argument has me leaning towards trusting the human players to be honest about whether they got disconnected for a minute, as opposed to sending the move with just a few seconds left. We could resume the game and give extra thinking time to the bot that won on time, similar to how it is done in the World Championship. Given the lack of financial incentive, it might not be abused at all, so allowing resumption would add less noise to the result than letting the result stand. Perhaps the major downside would be a headache for the TD in trying to get games restarted, though. What if someone with a super-awful connection participates in the screening, and gets disconnected eleven times over the course of the four games? Maybe timeouts, even though unfair, should stand out of sympathy for the poor TD. Getting the most accurate result should be balanced against how much work that is, which has me leaning the other way now. |
||||
Title: Re: 2014 Arimaa Challenge Post by browni3141 on Mar 12th, 2014, 2:30pm on 03/11/14 at 11:30:13, Fritzlein wrote:
Does anybody have any theories on why screening participation has gone down in the last two years? |
||||
Title: Re: 2014 Arimaa Challenge Post by Ail on Mar 12th, 2014, 4:41pm on 03/12/14 at 14:30:22, browni3141 wrote:
Theory #1: People like winning. Winning was easier when the bots were easier to beat. Thus fewer people felt like challenging the bots when they expected to be beaten. |
||||
Title: Re: 2014 Arimaa Challenge Post by Janzert on Mar 12th, 2014, 7:09pm Regarding game #295864 (http://arimaa.com/arimaa/gameroom/comments.cgi?gid=295864), after examining the bot logs it's apparent that the bot timed out as the result of a network problem between the bot server and arimaa.com server. The relevant section of the log is: Code:
By the rules and previous precedent (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=events;action=display;num=1363101824;start=5#5 ) the game should be restarted from the point of the timeout. If SilverMitt is unavailable to resume the game before the end of screening the result should be invalidated and removed from the screening results. Janzert |
||||
Title: Re: 2014 Arimaa Challenge Post by Janzert on Mar 13th, 2014, 8:14pm Sorry for the delay. I'm going to make the symmetric ruling for human timeouts attributable to connection issues as for bot timeouts. Specifically, the game should be resumed if possible and disregarded if it can't be completed by the time the screening ends. If you need a game resumed, contact either myself or Omar. Janzert |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 15th, 2014, 1:53pm I have updated the results including kzb's resumed win over sharp but excluding SilverMitt's timeout win over sharp. This leaves me even on my bets with browni because the top bot is setting a record, but the top bot isn't sharp. Ziltoid leads sharp by 2-0 in completed pairs, and by 2382-2067 in performance rating. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 19th, 2014, 5:55pm Arimaa_master's win over sharp drops sharp's performance rating to a disappointing 2036, while ziltoid has kept on trucking to a stratospheric performance rating of 2455. The small sample is obviously at work on both sides. There have been no more decisive pairs completed, so ziltoid continues to lead 2-0. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 20th, 2014, 10:42pm Since last update, browni beat ziltoid, but ziltoid beat arimaa_master and sharp beat both aaaa and harvestsnow, so the bots collectively gained a bit of ground. Ziltoid's lead stretches to 3-0 on the completed arimaa_master pair, but its lead in performance rating shrinks to 2403 vs. 2139. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 23rd, 2014, 10:40am on 03/12/14 at 16:41:38, Ail wrote:
Good theory, Ail. It would be nice if we could at least match the 25 completed pairs that we had last year (currently we have 11 with eight days to go), but I'm afraid there will be a lot of "one and done" screening participants. I'll bet people who lose their first screening game are much less likely to play a second than people who win their first. What seems like a fun challenge can quickly turn into a chore without positive feedback. Hat tip to arimaa_master for becoming the first player to complete all four screening games. His final game, a victory over ziltoid, gives sharp its first point of the screening, narrowing ziltoid's lead to 3-1. Ziltoid also leads in performance rating by 2347 to 2172, but there is plenty of time for that to change in the final week of screening! |
||||
Title: Re: 2014 Arimaa Challenge Post by browni3141 on Mar 23rd, 2014, 11:03am Has omar considered a shorter time control, like 1m/move? I think a lot of people either can't, or don't want to set aside such a large block of time. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 23rd, 2014, 3:04pm on 03/23/14 at 11:03:11, browni3141 wrote:
It was discussed in the past, but the argument that the Arimaa Challenge time controls are the ones that should govern the screening is the one that prevailed. One concern is that bots may be better at different speeds, and we want the bot that is best at the Challenge speed. But these days there is starting to be another issue: if you speed up the time control, then even fewer humans will be able to win. Halving the time control probably adds 50 Elo or more to bot strength relative to humans, further demotivating people who get whacked and further shrinking the pool of folks who are likely to provide discrimination by beating one bot and losing to the other. I do see the case for shorter time controls: more games equals more information. In fact, I once proposed that we speed up the time controls temporarily, as long as humans are comfortably ahead, and only slow them down again when we are nearer to defeat. That idea didn't fly because it creates the impression that we are willing to "move the goalposts", i.e. keep changing the rules of the Challenge so that we can be sure to keep winning. For that reason alone, I expect any rule change will be a tough sell to Omar. He would be happiest if we could get away with not making any more changes until the Challenge expires in 2020. In the meantime, I hope we can inspire a few more people to take their best shot at winning a long, slow game. Scoring even one win is an achievement to be proud of. Do it now before our silicon overlords enslave us all! :P |
||||
Title: Re: 2014 Arimaa Challenge Post by rbarreira on Mar 23rd, 2014, 5:46pm I noticed that the precise moment the screening ends is not defined in the rules: http://arimaa.com/arimaa/wc/2014/sch.html http://arimaa.com/arimaa/challenge/2014/ It just says "March 31" without specifying a time or timezone for the games to start/end. It might be worth it to clarify that before it becomes an issue. |
||||
Title: Re: 2014 Arimaa Challenge Post by 99of9 on Mar 23rd, 2014, 7:21pm on 03/23/14 at 15:04:08, Fritzlein wrote:
Me too, for the same reasons. |
||||
Title: Re: 2014 Arimaa Challenge Post by Ail on Mar 24th, 2014, 11:00am on 03/23/14 at 10:40:55, Fritzlein wrote:
I feel pretty well seen through now. I score something like 1:10 against not even the highest level of the 2012 sharp on my phone, with it using about 5 seconds per move while I take as long as I feel like. Thus I got smashed, as expected, and I really don't feel like getting smashed three more times. I think that if I can't even put up a good fight against my phone, it's too unlikely I can put up one against better bots on better hardware. |
||||
Title: Re: 2014 Arimaa Challenge Post by browni3141 on Mar 24th, 2014, 12:13pm on 03/23/14 at 15:04:08, Fritzlein wrote:
Wow, my own estimate was that a single doubling was worth about 150 points of strength relative to a bot getting the same time increase, at least for myself. I agree with all the reasons why we shouldn't change the time control, but at the same time I think increasing participation is extremely important, especially as we are losing participation and accuracy in the screening in consecutive years. How about having a reward for each pair completed? Then the problem is where the reward will come from... Perhaps the reward can just be someone's time. Maybe some strong players can annotate all the games of completed pairs, and getting some free game help will be enough for more players to complete at least one pair. Another suggestion that can be implemented independently of previous suggestions is to allow players to complete more than two pairs. This would be a very minor rule change. I understand that omar wouldn't want one player's performance being weighted too heavily, but I don't see how more games can hurt at this point. A cap of three or four pairs seems reasonable. I'm not sure how many people would want to do more anyway. Two is probably already plenty for most ;) Also, I just remembered that I have a half typed response for this thread... on 03/24/14 at 11:00:07, Ail wrote:
Although games between very close opponents should yield the most information, every pair completed is meaningful, Ail, so it would be really nice if you could play just one more game. If you play a second screening game, then I offer to annotate both of your games for you, and answer any questions you have about either game. If you play another pair after that, I'll do the same for that pair also. |
||||
Title: Re: 2014 Arimaa Challenge Post by browni3141 on Mar 24th, 2014, 12:16pm on 03/23/14 at 17:46:33, rbarreira wrote:
It is on this page: http://arimaa.com/arimaa/challenge/2014/playBestBots.cgi but it probably should be on those pages also. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 25th, 2014, 5:35pm on 03/24/14 at 12:13:49, browni3141 wrote:
Hmmm... With three doublings between CC and blitz, that would be a 450 point difference? I admit that the CC bots are probably a bit overrated because the humans don't use their full time allotment, but the actual rating difference between a blitz and a CC bot of the same vintage on the server seems to be in the 150-200 point range on average, just from eyeballing it. So my 50 points per doubling is probably a lower bound rather than an accurate guess, but not a ridiculously conservative lower bound. |
||||
Title: Re: 2014 Arimaa Challenge Post by browni3141 on Mar 25th, 2014, 8:46pm on 03/25/14 at 17:35:18, Fritzlein wrote:
These win-rates seem reasonable:

Blitz: 50%
Fast: 70%
60s: 85%
CC: 93%

I would be interested in seeing more data, but for something like this, there are tons of variables which could affect the results. |
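Under the standard Elo logistic model, each win-rate p implies a rating gap of 400*log10(p/(1-p)), which gives a quick way to check these guesses against the 50-vs-150 points-per-doubling estimates discussed above. A Python sketch (the win-rates here are browni3141's guesses, not measured data):

```python
import math

def winrate_to_elo_gap(p):
    """Elo rating gap implied by win probability p under the logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

# Each time control is one doubling slower than the previous.
for label, p in [("Blitz", 0.50), ("Fast", 0.70), ("60s", 0.85), ("CC", 0.93)]:
    print(f"{label}: {winrate_to_elo_gap(p):+.0f} Elo vs. the blitz baseline")
# The implied steps are roughly 147, 154, and 148 Elo per doubling,
# consistent with the ~150-points-per-doubling estimate.
```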
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 26th, 2014, 8:50pm Hat tip to Heyckie for becoming the second player to complete all four screening games, and to BlakeD, Braveheart, and BrendanM for individual wins. The bots have slipped a bit to performance ratings of 2286 vs. 2112, and ziltoid's lead has opened back up to 4-1, so it is looking more likely that browni will win both of his bets with me. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 28th, 2014, 12:26am A flurry of screening activity since last update: browni3141 becomes the third player to complete all four games, aurelian beats ziltoid from a 909-point rating disadvantage, each bot wins a completed pair, the still incomplete pairs tilt slightly to sharp's favor, and the combined performance rating dips slightly. Ziltoid now leads 5-2 in points, and 2233-2145 in performance rating. I'm still winning one of my two bets, but only by a single rating point! |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 29th, 2014, 10:42am A continued high level of screening activity down the home stretch. Yay! Aurelian and kzb52 become the fourth and fifth players to complete all four screening games. Ziltoid's lead in decisive pairs has narrowed to 6-4. Furthermore, three incomplete pairs favor sharp (chessandgo, SilverMitt, Hippo) while only one favors ziltoid (aaaa), so the final outcome is still very much in the air. Sharp's comeback is also reflected in a narrower gap in performance rating, now just 2236 vs. 2180 in favor of ziltoid. With each bot having played 28 games, the performance rating no longer moves as much with each new game played, but it is still conceivable that ziltoid could drop below the record level while sharp surpasses it. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 30th, 2014, 10:49am The bots swept the five games since my last update. Congratulations to BlakeD, aaaa, and Hippo for each completing all four screening games. We now have more completed pairs of games than last year: I'm proud of the level of participation from the Arimaa community. And we still have over a day left! I am embarrassed to report that I have been miscounting the performance rating for ziltoid. I don't know when I got off by one, but I double-checked today, and ziltoid is only 26-7, not 27-6 as I had the tally. :-[ That means the "record performance" bet that I thought I had nearly clinched will now likely go against me. Ziltoid's performance rating is now 2219; sharp's is 2193. The sum of the two performance ratings is on track to be a record sum, but that wasn't the bet. :P |
||||
Title: Re: 2014 Arimaa Challenge Post by Janzert on Mar 31st, 2014, 7:32am In game 298157 (http://arimaa.com/arimaa/gameroom/comments.cgi?gid=298157) Lion had a connection related timeout against sharp last night. As with the others it should be resumed and played out, if that can happen before the screening period ends. Janzert |
||||
Title: Re: 2014 Arimaa Challenge Post by Janzert on Mar 31st, 2014, 2:28pm The below is just a "for the record" announcement. :) Lion played another game with sharp before resuming the timed out game. This apparently confused the screening scheduler, and it paired him with ziltoid playing the wrong color. To work around that I manually started a game with the correct color assignment for Lion. Unfortunately Alfons also started a screening game just before that, and the bots for both games ended up on the same server. Both of those games were then stopped on move 5 and unrated. Both players then started again with the bots playing on different servers. Janzert |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Mar 31st, 2014, 8:45pm So, the screening is over and ziltoid wins by a final score of 6-5. It really came down to the wire, with sharp needing only for chessandgo to beat ziltoid for sharp to pull into a tie and advance to the challenge on the tiebreaker of having won the computer championship. Alas for carbon (but good for silicon) ziltoid beat chessandgo to preserve a victory by the slimmest of margins. Ziltoid's final record was 29-7, while sharp's final record was 27-8. The final performance ratings of 2259 for ziltoid and 2244 for sharp are quite intimidating given that both are higher than the previous record in a screening, and given that only four active players have a higher gameroom rating: browni3141, Fritzlein, chessandgo, and Adanac. Browni3141 swept all four of his games, and was the only player to achieve a winning record in the screening, as Adanac didn't participate. Max and supersamu, two of the Challenge defenders, currently have gameroom ratings of 2222 and 2171 respectively. We bounced back from lower participation last year to have 33 completed pairs this year. Since my previous update, Braveheart, SilverMitt, Lion, and RmznA each completed the four-game set. It was also fantastic to see so much discussion of the screening games in the chat room. I am quite surprised that ziltoid beat sharp, given that I personally have much more trouble beating sharp than ziltoid. I guess I extrapolated from a data set that was too small (me) to the general strength of the bot. I lost this $1 bet with browni3141, but I am quite happy to lose, given that I believe it boosts my chances of being able to successfully defend the Challenge this year. On the flip side, I won $1 from browni3141 with room to spare, as even the losing bot attained a higher performance rating than any previous winning bot in a screening. 
I would have been happy to lose this bet as well, since it might have indicated that humanity retains a comfortable lead in the Arimaa Challenge, but alas, it appears that we are losing ground. For those of you who didn't see it in the chat room, I made a third $1 bet that I expect to win but hope to lose, this time with supersamu. He will win if he beats ziltoid in all three Challenge games, whereas I will win if supersamu fails to sweep. Even though screening participation increased from last year, there is a fair bit of variation in the measurement of bot strength. For example, if ziltoid had lost one more game to finish 28-8, it would have dropped 40 points to a performance of 2219. The standard deviation in expected wins for ziltoid across these 36 games was +/- 2.0, so taking two standard deviations means we should think of ziltoid's rating as demonstrated by this screening to be 2259 +/- 160. And speaking of uncertainty, sharp's performance of 15 Elo below ziltoid seems even more insignificant considering that lightvector configured sharp to only use half the CPU of the server, which one expects to reduce performance by about 50 Elo. Next year, next year... |
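The +/- 2.0 standard deviation quoted above follows from treating the screening as a sum of independent Bernoulli games, where the variance of the win total is the sum of p(1-p) over the per-game win probabilities. A Python sketch with hypothetical probabilities chosen only to roughly match a 29-of-36 result (the real values would come from each opponent's gameroom rating):

```python
import math

# Hypothetical per-game win probabilities for a bot over 36 screening games;
# these are illustrative, not reconstructed from the actual 2014 pairings.
probs = [0.9] * 28 + [0.7] * 8

expected_wins = sum(probs)
# Independent Bernoulli games: Var(total wins) = sum of p*(1-p).
variance = sum(p * (1 - p) for p in probs)
sigma = math.sqrt(variance)

print(f"expected wins: {expected_wins:.1f} +/- {sigma:.2f}")
# At roughly 40 rating points per marginal win (as in the 28-8 example above),
# a two-sigma band of about +/-4 wins maps to roughly +/-160 Elo.
```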
||||
Title: Re: 2014 Arimaa Challenge Post by Hippo on Apr 1st, 2014, 4:47am I would recommend the defenders take their time and play without hurry (you can accumulate game time in the opening, as ziltoid is passive, and you should develop pieces before starting the attack). Ziltoid's home play should be OK for Fritzlein, with Fritzlein having the better positional evaluation. I would say Fritzlein's goal-attack style could be a good way to go. Bots are very good at defending goal threats, as cutting at goals is much more efficient than cutting at captures. So eliminating defenders by trapping them is a good way to go. No reason to try to achieve the shortest win. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Apr 7th, 2014, 8:42am on 04/01/14 at 04:47:57, Hippo wrote:
Well, I certainly did that! I hope that in addition to achieving a safe win, and achieving the complete boredom of the spectators, I created a game that many players will look at and think "I could have won if I had played that way." It isn't completely a formula, i.e. there are technical issues in keeping your own pieces and rabbits from being pulled out, but still it seems closer to a formula for winning than anything else proposed. Big thanks to everyone who participated in the screening. If I hadn't had all of ziltoid's screening games to analyze, I would not have expected my opening strategy to be effective. I was somehow under the impression that modern bots didn't allow themselves to be beaten in such a slow fashion, but as I looked at one screening game after another and never saw ziltoid deviate from the lone-elephant opening, it started to look like this weakness was exploitable. Even though nobody beat ziltoid in quite that way, it just goes to show that you don't have to win your screening games for them to give good information; just trying to win is enough. By the time I play my next Challenge game, Max may have already secured the defense by winning his second game, in which case I will look for a different weakness to exploit, one that may win in fewer than eighty moves. Sorry, browni, if I still don't play the objectively best moves, but there may be more aggressive moves that are still safe enough for an inveterate coward to contemplate. :) |
||||
Title: Re: 2014 Arimaa Challenge Post by browni3141 on Apr 7th, 2014, 12:09pm on 04/07/14 at 08:42:36, Fritzlein wrote:
I wonder what level of player could exploit this. Often I think that ziltoid (or another bot) is beginnerified if I play a certain strategy, but I forget about those "technical details." Just because the win comes fairly easily to me most of the time I give ziltoid a camel hostage, for example, doesn't mean that is so for everybody. Since I'm bringing up the camel hostage, it seems ziltoid is harder to beat at certain types of hostages than others. If it is defending the hostage trap with the camel it is somewhat harder to beat (but still weaker than average). If its horses are already activated it is harder to beat, but the problem with ziltoid is that it will willingly bury its own horse to take a camel hostage, and doesn't seek to re-activate it quickly (or at all). It is very inefficient with its camel counterattack, and tends to ignore defense of the hostage trap. I still believe that an opening camel hostage, pushing the horse to b7 or otherwise burying it, is a very good formula (and an easily applicable one) for beating ziltoid, if you understand its weaknesses, and understand the camel hostage basics. Also, don't forget the formula I gave for winning by score. I think that is much easier than the one you pulled off, and it takes a player to the very end of the game. The only challenge there is staying awake! |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Apr 7th, 2014, 8:15pm On further reflection, my victory was not as formulaic as I first portrayed it. Not only was winning the first three rabbits not completely trivial, but also winning the cat after that was improvisation. And even with a huge material lead of CRRR, it wasn't totally obvious how to make progress. The camel hostage I took didn't immediately give me the strongest free piece, because my own camel was trapped on the h-file and the hostage was not in a secure position at first, so things could have gone worse than they did. It required continued small errors from ziltoid to make progress feel inevitable. I will definitely study more of ziltoid's screening games before my next Challenge game to get a better sense of its weaknesses apart from its opening passivity. In particular, I will (as you suggest) study giving a camel hostage, which I might have done already in the first game if ziltoid had pulled my rabbit twice on 26g when I was only two rabbits ahead. I'm not ashamed to bot-bash during the Challenge, but if the next game goes even a little bit differently, it might come in handy (or even be essential) to have more than one bot-bashing tool in my toolbox. |
||||
Title: Re: 2014 Arimaa Challenge Post by mattj256 on Apr 7th, 2014, 10:15pm I lost my screening game, but I can assure you that as long as you turtle up and don't expose any weaknesses, you have ALL DAY to rearrange your pieces however you see fit. I encourage the challengers to play with really absurd opening setups, like EMHH on a1-a2-a3-a4, or putting all eight rabbits on one wing. |
||||
Title: Re: 2014 Arimaa Challenge Post by Fritzlein on Apr 25th, 2014, 6:48pm The Challenge is safely defended again, and by a wide margin of 7-2 for the humans in the actual Challenge games. Each defender won his match. Hooray for humanity! For the purpose of assessing the balance of power between humans and machines, probably the largest clean data set we can get is to lump all the Screening and Challenge games together, including both of the top bots. This includes a wide variety of human opponents at all different skill levels trying a wide variety of different strategies. In these games the computers were running on the full hardware and the humans were taking the games seriously. Crucially, these were games against "fresh" bots for which we hadn't yet worked out winning bot-bashing formulas that conceal the bots' strengths by exaggerating their weaknesses. In this cycle we have 80 games total: ziltoid went 29-7 in the Screening and 2-7 in the Challenge, while sharp went 27-8 in the Screening, for a total record of 58-22 for the bots. Using the gameroom ratings of the human opponents, this gives a total bot performance rating of 2221. For reference, the current list of active human players with higher gameroom ratings than the total bot performance rating is Fritzlein, chessandgo, browni3141, Adanac, and Max. Brendan_M and supersamu are just a couple of rating points behind. There are lots of ways to explain why we know humans have a bigger lead over bots than the raw data says: we can excuse human losses due to the circumstances of individual games; we can point out particular weaknesses of bots; we can bot-bash up to sky-high ratings (and Arimanator's 2715 is an old number that is likely not the current limit). For my money, however, this 80-game data set is more persuasive than a hill of talk, and the results say what they say. 
I'm looking forward more than ever to the 2015 Challenge cycle, and our next opportunity to assess how much the bots have progressed versus how much humanity has progressed. |
||||
Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1! YaBB © 2000-2003. All Rights Reserved. |