Arimaa Forum - 2014 Arimaa Challenge

Topic: 2014 Arimaa Challenge (Read 9305 times)

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2014 Arimaa Challenge
« Reply #30 on: Mar 28th, 2014, 12:26am »

A flurry of screening activity since the last update: browni3141 becomes the third player to complete all four games, aurelian beats ziltoid from a 909-point rating disadvantage, each bot wins a completed pair, the still-incomplete pairs tilt slightly in sharp's favor, and the combined performance rating dips slightly.

Ziltoid now leads 5-2 in points, and 2233-2145 in performance rating. I'm still winning one of my two bets, but only by a single rating point!

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2014 Arimaa Challenge
« Reply #31 on: Mar 29th, 2014, 10:42am »

A continued high level of screening activity down the home stretch. Yay! Aurelian and kzb52 become the fourth and fifth players to complete all four screening games. Ziltoid's lead in decisive pairs has narrowed to 6-4. Furthermore, three incomplete pairs favor sharp (chessandgo, SilverMitt, Hippo) while only one favors ziltoid (aaaa), so the final outcome is still very much in the air.

Sharp's comeback is also reflected in a narrower gap in performance rating, now just 2236 vs. 2180 in favor of ziltoid. With each bot having played 28 games, the performance rating no longer moves as much with each new game played, but it is still conceivable that ziltoid could drop below the record level while sharp surpasses it.

« Last Edit: Mar 29th, 2014, 11:12am by Fritzlein »

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2014 Arimaa Challenge
« Reply #32 on: Mar 30th, 2014, 10:49am »

The bots swept the five games since my last update. Congratulations to BlakeD, aaaa, and Hippo for each completing all four screening games. We now have more completed pairs of games than last year: I'm proud of the level of participation from the Arimaa community. And we still have over a day left!

I am embarrassed to report that I have been miscounting the performance rating for ziltoid. I don't know when I got off by one, but I double-checked today, and ziltoid is only 26-7, not 27-6 as I had in the tally.

That means the "record performance" bet that I thought I had nearly clinched will now likely go against me. Ziltoid's performance rating is now 2219; sharp's is 2193. The sum of the two performance ratings is on track to be a record sum, but that wasn't the bet.

Janzert
Forum Guru

Arimaa player #247

Gender: male

Posts: 1016

Re: 2014 Arimaa Challenge
« Reply #33 on: Mar 31st, 2014, 7:32am »

In game 298157 Lion had a connection-related timeout against sharp last night. As with the others, it should be resumed and played out, if that can happen before the screening period ends.

Janzert

« Last Edit: Mar 31st, 2014, 7:32am by Janzert »

Janzert
Forum Guru

Arimaa player #247

Gender: male

Posts: 1016

Re: 2014 Arimaa Challenge
« Reply #34 on: Mar 31st, 2014, 2:28pm »

The below is just a "for the record" announcement.

Lion played another game with sharp before resuming the timed out game. This apparently confused the screening scheduler and it was pairing him with ziltoid playing the wrong color.

To work around that, I manually started a game with the correct color assignment for Lion. Unfortunately, Alfons also started a screening game just before that, and the bots for both games ended up on the same server. Both of those games were then stopped on move 5 and unrated. Both players then started again with the bots playing on different servers.

Janzert

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2014 Arimaa Challenge
« Reply #35 on: Mar 31st, 2014, 8:45pm »

So, the screening is over and ziltoid wins by a final score of 6-5. It really came down to the wire with sharp needing only for chessandgo to beat ziltoid for sharp to pull into a tie and advance to the challenge on the tiebreaker of having won the computer championship. Alas for carbon (but good for silicon) ziltoid beat chessandgo to preserve a victory by the slimmest of margins. Ziltoid's final record was 29-7, while sharp's final record was 27-8.

The final performance ratings of 2259 for ziltoid and 2244 for sharp are quite intimidating given that both are higher than the previous record in a screening, and given that only four active players have a higher gameroom rating: browni3141, Fritzlein, chessandgo, and Adanac. Browni3141 swept all four of his games, but was the only player to achieve a winning record in the screening, as Adanac didn't participate. Max and supersamu, two of the Challenge defenders, currently have gameroom ratings of 2222 and 2171 respectively.

We bounced back from lower participation last year to have 33 completed pairs this year. Since my previous update, Braveheart, SilverMitt, Lion, and RmznA each completed the four-game set. It was also fantastic to see so much discussion of the screening games in the chat room.

I am quite surprised that ziltoid beat sharp, given that I personally have much more trouble beating sharp than ziltoid. I guess I extrapolated from a data set that was too small (me) to the general strength of the bot. I lost this $1 bet with browni3141, but I am quite happy to lose, given that I believe it boosts my chances of being able to successfully defend the Challenge this year.

On the flip side, I won $1 from browni3141 with double security as even the losing bot attained a higher performance rating than any previous winning bot in a screening. I would have been happy to lose this bet as well, since it might have indicated that humanity retains a comfortable lead in the Arimaa Challenge, but alas, it appears that we are losing ground.

For those of you who didn't see it in the chat room, I made a third $1 bet that I expect to win but hope to lose, this time with supersamu. He will win if he beats ziltoid in all three Challenge games, whereas I will win if supersamu fails to sweep.

Even though screening participation increased from last year, there is a fair bit of variation in the measurement of bot strength. For example, if ziltoid had lost one more game to finish 28-8, it would have dropped 40 points to a performance of 2219. The standard deviation in expected wins for ziltoid across these 36 games was +/- 2.0, so taking two standard deviations means we should think of ziltoid's rating as demonstrated by this screening to be 2259 +/- 160.
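
The arithmetic in the paragraph above can be sketched in Python. This is a minimal sketch and not necessarily the calculation used for the screening: it assumes the standard Elo logistic expected-score formula, takes the performance rating to be the rating at which expected wins equal actual wins, and uses hypothetical function names (`expected_score`, `performance_rating`, `win_std_dev`, `rating_margin`). The per-game opponent ratings are not listed in this thread, so none are filled in here.

```python
import math


def expected_score(bot_rating, opp_rating):
    """Elo expected score for the bot against one opponent (assumed model)."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - bot_rating) / 400.0))


def performance_rating(opp_ratings, wins, lo=0.0, hi=4000.0):
    """Rating at which the expected number of wins equals the actual wins.

    Solved by bisection, since expected wins increase with the bot's rating.
    Only meaningful when 0 < wins < len(opp_ratings); a perfect or winless
    record has no finite solution and just runs into the search bounds.
    """
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, r) for r in opp_ratings) < wins:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def win_std_dev(bot_rating, opp_ratings):
    """Standard deviation of the win total, treating games as independent."""
    probs = [expected_score(bot_rating, r) for r in opp_ratings]
    return math.sqrt(sum(p * (1.0 - p) for p in probs))


def rating_margin(std_wins, points_per_win, n_sd=2):
    """Convert a spread in wins into a spread in rating points."""
    return n_sd * std_wins * points_per_win
```

With the figures quoted in the post (a standard deviation of 2.0 wins and roughly 40 rating points per game), `rating_margin(2.0, 40)` gives 160, which matches the +/- 160 above.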

And speaking of uncertainty, sharp's performance of 15 Elo below ziltoid seems even more insignificant considering that lightvector configured sharp to use only half the CPU of the server, which one expects to reduce performance by about 50 Elo. Next year, next year...

Hippo
Forum Guru

Arimaa player #4450

Gender: male

Posts: 883

Re: 2014 Arimaa Challenge
« Reply #36 on: Apr 1st, 2014, 4:47am »

I would recommend playing at the ideal time for the defenders and playing without hurry (you can accumulate game time in the opening, as ziltoid is passive, and you should develop pieces before starting the attack). Ziltoid's home play should be OK for Fritzlein, with Fritzlein having better positional evaluation.
I would say Fritzlein's goal-attack style could be a good approach to use.
Bots are very good at defending goal threats, as cutting at goals is much more efficient than cutting at captures. So eliminating defenders by trapping them is a good way to go. There is no reason to try to achieve the shortest win.

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2014 Arimaa Challenge
« Reply #37 on: Apr 7th, 2014, 8:42am »

on Apr 1st, 2014, 4:47am, Hippo wrote:

I would recommend playing [...] without hurry [...] as ziltoid is passive [...] No reason to try to achieve the shortest win.

Well, I certainly did that! I hope that in addition to achieving a safe win, and achieving the complete boredom of the spectators, I created a game that many players will look at and think "I could have won if I had played that way." It isn't completely a formula, i.e. there are technical issues to not getting your own pieces and rabbits pulled out, but still it seems closer to a formula for winning than anything else proposed.

Big thanks to everyone who participated in the screening. If I hadn't had all of ziltoid's screening games to analyze, I would not have expected my opening strategy to be effective. I was somehow under the impression that modern bots didn't allow themselves to be beaten in such a slow fashion, but as I looked at one screening game after another, and I never saw ziltoid deviate from the lone-elephant opening, it started to look like this weakness was exploitable. Even though nobody beat ziltoid in quite that way, it just goes to show that you don't have to win your screening games for them to give good information; just trying to win is enough.

By the time I play my next Challenge game, Max may have already secured the defense by winning his second game, in which case I will look for a different weakness to exploit, one that may win in fewer than eighty moves. Sorry, browni, if I still don't play the objectively best moves, but there may be more aggressive moves that are still safe enough for an inveterate coward to contemplate.

browni3141
Forum Guru

Arimaa player #7014

Gender: male

Posts: 385

Re: 2014 Arimaa Challenge
« Reply #38 on: Apr 7th, 2014, 12:09pm »

on Apr 7th, 2014, 8:42am, Fritzlein wrote:

I created a game that many players will look at and think "I could have won if I had played that way." It isn't completely a formula, i.e. there are technical issues to not getting your own pieces and rabbits pulled out, but still it seems closer to a formula for winning than anything else proposed.

I wonder what level of player could exploit this. Often I think that ziltoid (or another bot) is beginnerified if I play a certain strategy, but I forget about those "technical details." Just because the win comes fairly easily to me most of the time I give ziltoid a camel hostage, for example, doesn't mean that is so for everybody. Since I'm bringing up the camel hostage, it seems ziltoid is harder to beat at certain types of hostages than others. If it is defending the hostage trap with the camel, it is somewhat harder to beat (but still weaker than average). If its horses are already activated, it is harder to beat, but the problem with ziltoid is that it will willingly bury its own horse to take a camel hostage, and doesn't seek to re-activate it quickly (or at all). It is very inefficient with its camel counterattack, and tends to ignore defense of the hostage trap. I still believe that an opening camel hostage, pushing the horse to b7 or otherwise burying it, is a very good formula (and an easily applicable one) for beating ziltoid, if you understand its weaknesses and understand the camel hostage basics.
Also, don't forget the formula I gave for winning by score. I think that is much easier than the one you pulled off, and it takes a player to the very end of the game. The only challenge there is staying awake!

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2014 Arimaa Challenge
« Reply #39 on: Apr 7th, 2014, 8:15pm »

On further reflection, my victory was not as formulaic as I first portrayed it. Not only was winning the first three rabbits not completely trivial, but also winning the cat after that was improvisation. And even with a huge material lead of CRRR, it wasn't totally obvious how to make progress. The camel hostage I took didn't immediately give me the strongest free piece, because my own camel was trapped on the h-file and the hostage was not in a secure position at first, so things could have gone worse than they did. It required continued small errors from ziltoid to make progress feel inevitable.

I will definitely study more of ziltoid's screening games before my next Challenge game to get a better sense of its weaknesses apart from its opening passivity. In particular, I will (as you suggest) study giving a camel hostage, which I might have done already in the first game if ziltoid had pulled my rabbit twice on 26g when I was only two rabbits ahead. I'm not ashamed to bot-bash during the Challenge, but if the next game goes even a little bit differently, it might come in handy (or even be essential) to have more than one bot-bashing tool in my toolbox.

mattj256
Forum Guru

Arimaa player #8519

Gender: male

Posts: 138

Re: 2014 Arimaa Challenge
« Reply #40 on: Apr 7th, 2014, 10:15pm »

I lost my screening game, but I can assure you that as long as you turtle up and don't expose any weaknesses, you have ALL DAY to rearrange your pieces however you see fit.

I encourage the challengers to play with really absurd opening setups, like EMHH on a1-a2-a3-a4, or putting all eight rabbits on one wing.

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2014 Arimaa Challenge
« Reply #41 on: Apr 25th, 2014, 6:48pm »

The Challenge is safely defended again, and by a wide margin of 7-2 for the humans in the actual Challenge games. Each defender won his match. Hooray for humanity!

For the purpose of assessing the balance of power between humans and machines, probably the largest clean data set we can get is to lump all the Screening and Challenge games together, including both of the top bots. This includes a wide variety of human opponents at all different skill levels trying a wide variety of different strategies. In these games the computers were running on the full hardware and the humans were taking the games seriously. Crucially, these were games against "fresh" bots for which we hadn't yet worked out winning bot-bashing formulas that conceal the bots' strengths by exaggerating their weaknesses.

In this cycle we have 80 games total: Ziltoid went 29-7 in the Screening and 2-7 in the Challenge, while sharp went 27-8 in the Screening for a total record of 58-22 for the bots. Using the gameroom ratings of the human opponents, this gives a total bot performance rating of 2221. For reference, the current list of active human players with higher gameroom ratings than the total bot performance rating is Fritzlein, chessandgo, browni3141, Adanac, and Max. Brendan_M and supersamu are just a couple of rating points behind.
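
As a quick cross-check of the tally above, the three records can be summed directly. This is only a small sketch; the dictionary layout and variable names are illustrative, and the win-loss figures are the ones quoted in the post.

```python
# (wins, losses) from the bots' point of view, as quoted in the post above
records = {
    ("ziltoid", "Screening"): (29, 7),
    ("ziltoid", "Challenge"): (2, 7),
    ("sharp", "Screening"): (27, 8),
}

bot_wins = sum(wins for wins, _ in records.values())
bot_losses = sum(losses for _, losses in records.values())
total_games = bot_wins + bot_losses
score_fraction = bot_wins / total_games

print(f"{bot_wins}-{bot_losses} over {total_games} games, "
      f"score {score_fraction:.3f}")
```

The totals come out to 58-22 over 80 games, in agreement with the post. Turning that record into the 2221 performance figure would additionally need the gameroom rating of each human opponent, which is not listed in this thread.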

There are lots of ways to explain why we know humans have a bigger lead over bots than the raw data says: we can excuse human losses due to the circumstances of individual games; we can point out particular weaknesses of bots; we can bot-bash up to sky-high ratings (and Arimanator's 2715 is an old number that is likely not the current limit). For my money, however, this 80-game data set is more persuasive than a hill of talk, and the results say what they say.

I'm looking forward more than ever to the 2015 Challenge cycle, and our next opportunity to assess how much the bots have progressed versus how much humanity has progressed.
