Arimaa Forum - 2015 Arimaa Challenge

Topic: 2015 Arimaa Challenge (Read 12728 times)

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2015 Arimaa Challenge
« Reply #30 on: Mar 19th, 2015, 11:06pm »

Humanity won two of the most recent five games, and thus gained ground on silicon. Both of the wins, however, were against Z, while two losses were against sharp, so the gap between the bots has widened. Sharp has won all three decisive pairs of the ten completed pairs, with a performance of 2364. Z trails with a performance of 2062 and meager hopes from leading only two of the six incomplete pairs.

« Last Edit: Mar 19th, 2015, 11:07pm by Fritzlein »

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2015 Arimaa Challenge
« Reply #31 on: Mar 21st, 2015, 4:49pm »

Humanity won only one of the last six, again against Z. The two completed pairs were indecisive, so sharp's lead remains at only 3-0, but sharp leads in seven of the eight incomplete pairs, and 2438 to 2062 in performance rating.

On a personal note, it feels very much like the top bots have drawn even with me, not only because I split my two games with each of them, but also because sharp's Screening performance is currently higher than my game room rating. Of course four games and one player is too little evidence to base anything on. I merely note that in the prolonged man vs. machine struggle, in every domain that silicon conquers, each individual must face his own personal loss, and this feels like mine.

Yes, there is a good chance I will do better next year, but there is also a fair chance that last year was my last year to get a positive score against the bots. We'll see what happens in the rest of the Screening and remaining years of the Challenge.

Belteshazzar
Forum Guru

Arimaa player #5094

Gender: male

Posts: 108

Re: 2015 Arimaa Challenge
« Reply #32 on: Mar 21st, 2015, 10:46pm »

Interesting that no one has yet defeated sharp in a remotely efficient manner. I wonder if the challengers will be able to win in less than 80 turns.

browni3141
Forum Guru

Arimaa player #7014

Gender: male

Posts: 385

Re: 2015 Arimaa Challenge
« Reply #33 on: Mar 22nd, 2015, 1:25am »

on Mar 21st, 2015, 10:46pm, Belteshazzar wrote:

Interesting that no one has yet defeated sharp in a remotely efficient manner. I wonder if the challengers will be able to win in less than 80 turns.

I would be horrified if all my games took more than 80 turns to either win or lose.
I expect to average about 35 (planning on three wins).

Note that only two players have beaten sharp at all, and one win was in a deliberately inefficient manner, and the other completely indifferent to efficiency.

« Last Edit: Mar 22nd, 2015, 1:29am by browni3141 »

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2015 Arimaa Challenge
« Reply #34 on: Mar 25th, 2015, 2:06am »

Another six games are in the books: sharp won two of two and Z won three of four, so both gain in performance rating. Sharp is up to 2472 and Z up to 2082. We're up to fourteen completed pairs, but still only three decisive, and all three in sharp's favor. In ten incomplete pairs, seven favor sharp.

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2015 Arimaa Challenge
« Reply #35 on: Mar 28th, 2015, 12:04am »

Another prediction garnered from the chat room, this time from kzb52:
Quote:

I don't know if anyone's stepped in to be the prophet of doom for the challenge this year. I guess I can fill that role, since I didn't predict anything before now
I say sharp wins 4 games at a minimum, but probably 5
(some plausible results might be 3-0, 1-2, 0-3, or 2-1, 2-1, 0-3, etc. )

PerkofBR
Forum Senior Member

Arimaa player #9787

Gender: male

Posts: 31

Re: 2015 Arimaa Challenge
« Reply #36 on: Mar 29th, 2015, 7:19pm »

I predict all defenders will win their BO3, with sharp getting 2 to 3 wins. ;)

Boo
Forum Guru

Arimaa player #6466

Gender: male

Posts: 118

Re: 2015 Arimaa Challenge
« Reply #37 on: Mar 30th, 2015, 8:16am »

I think humans will barely hold. 2-1, 1-2, 0-3. I hope I am wrong though.

Belteshazzar
Forum Guru

Arimaa player #5094

Gender: male

Posts: 108

Re: 2015 Arimaa Challenge
« Reply #38 on: Mar 30th, 2015, 4:41pm »

I wonder how Fritz would have done in a third game against sharp. When he beat it in his first game, I assumed the challenge was safe.

deep_blue
Forum Guru

Arimaa player #9854

Posts: 212

Re: 2015 Arimaa Challenge
« Reply #39 on: Mar 31st, 2015, 4:14am »

Fritzlein, where are the new performance ratings? ;)

« Last Edit: Mar 31st, 2015, 4:14am by deep_blue »

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2015 Arimaa Challenge
« Reply #40 on: Mar 31st, 2015, 8:37am »

on Mar 31st, 2015, 4:14am, deep_blue wrote:

Fritzlein, where are the new performance ratings? ;)

Sorry, my weekend Ultimate tournament consumed more than my weekend. Still, I'm glad somebody missed my updates.

I guess there is a flurry of activity around the deadline; thirteen games since I last updated. But with less than 24 hours to go, I think I'll just wait for the final entry. In the meantime you can well imagine how stratospheric sharp's performance rating has become. As I write, your win-on-score strategy is already bust; good luck winning over the board!

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2015 Arimaa Challenge
« Reply #41 on: Apr 1st, 2015, 12:00am »

I'm very pleased with the late flurry of Screening games: twenty-one since I last posted six days ago. The final tallies are:
Sharp won 29 and lost 2 for a performance rating of 2557.
Z won 18 and lost 10 for a performance rating of 2123.
There were 27 completed pairs.*
Sharp won all of the 8 decisive pairs.

*(counting Hufflepup's pair and BlakeD's second pair despite the server color glitch; not counting either of DanielM's two games or 722caasi's second game)

« Last Edit: Apr 1st, 2015, 12:01am by Fritzlein »

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2015 Arimaa Challenge
« Reply #42 on: Apr 1st, 2015, 12:28am »

Before moving on to my predictions for the future, let me consider a couple of my past predictions.

I predicted (not very confidently) that sharp would win every decisive pair this year. This came true, but not quite as I had envisioned. Since sharp lost only two games, it only had two chances to lose a pair! Although I did foresee sharp's dominance over Z, I didn't anticipate sharp's dominance over all human participants.

I thought supersamu had less than 1/3 chance of sweeping four screening games, and we made a $1:$2 bet to that effect. Supersamu only played one pair and lost both games, so that dollar is clearly mine. (If it is any consolation, supersamu, my expectations for myself were also higher than my performance.)

I bought insurance from browni3141 against the Challenge being won this year at a rate I estimated at 1:100, although I estimated the true odds of the Challenge being won at nearer to 1:200 and browni3141 put them around 1:1000. This is turning out to be a great purchase by me, both because browni3141 is now only one win (and two losses) away from the World Championship, so his prize equity has risen above $200, and because, with the Screening over, I now put the odds of the Challenge being won closer to 1:20. In my current opinion the insurance is worth over $10, even though I paid just $2 for it.

The winner in that prediction is quasar, and for exactly the reason that he gave in the chat room: If sharp had turned out to be worse than expected, I stood to lose at most $2, but if sharp turned out to be better than expected (as it has), I could gain much more than $2 of equity. If I was overestimating sharp's strength by 100 Elo, it would cost me much less than it would cost browni3141 to underestimate sharp's strength by 100 Elo. (In addition to which, he appears to have underestimated by more than that, but that is getting into my next post.) Uncertainty always skews in the favor of the insurance buyer, and the strength of the improved sharp was uncertain.

« Last Edit: Apr 1st, 2015, 1:58am by Fritzlein »

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2015 Arimaa Challenge
« Reply #43 on: Apr 1st, 2015, 1:44am »

Sharp's performance rating of 2557 is truly intimidating. If we pit it against the game room ratings of the defenders, 2512, 2255, and 2235, for browni3141, chessandgo, and harvestsnow respectively, we can calculate that sharp has a 53% chance of winning the Challenge this year! How, then, am I still giving sharp only a 5% chance or so in my above post?
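The 53% figure can be reproduced with a short sketch. It assumes the standard Elo expected-score formula, independent games, and that the bot must win at least two of three games against each of the three defenders (the "mini-match" format referred to in this thread). The function names are illustrative only.

```python
def expected_score(rating, opponent):
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def match_win_probability(p):
    """Chance of winning at least two of three independent games."""
    return p ** 2 * (3 - 2 * p)


def challenge_win_probability(bot_rating, defender_ratings):
    """Assumption: the bot must win the mini-match against every defender."""
    probability = 1.0
    for defender in defender_ratings:
        probability *= match_win_probability(expected_score(bot_rating, defender))
    return probability


SHARP_PERFORMANCE = 2557
DEFENDERS = {"browni3141": 2512, "chessandgo": 2255, "harvestsnow": 2235}

chance = challenge_win_probability(SHARP_PERFORMANCE, DEFENDERS.values())
print(f"{chance:.0%}")  # about 53%
```

Almost all of the doubt sits in the browni3141 mini-match: under these assumptions sharp is roughly a 60% favorite there and about 94-95% against each of the other two defenders.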

First, I should confess that, as much as I like to calculate sharp's performance rating as the rating that would have predicted a 29-2 showing, this is a biased estimator. One can demonstrate this by simulation: assume sharp's true rating is some fixed value, say 2400, and use that to generate a million screenings against the 31 actual opponents. (Of course, also assume the screeners' ratings are accurate and the Elo formula is true.) Average the calculated performance rating across those million simulations, and the average will be higher than 2400.

This upward bias is not merely because a single perfect screening has a performance of infinity and thus makes the average infinity. One can take infinity out of the picture by adding a draw against a 2400 player to each screening, and even so the average performance rating would be over 2400. This is because the screeners are rather weaker than 2400 on average.

Sorry for the math mumbo-jumbo; the upshot is that my performance rating calculation over-reacts to extreme results. In this respect it is rather like humans.
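A rough sketch of that simulation follows. The screeners' actual ratings are not given in this thread, so the field below is a hypothetical stand-in (31 players spread evenly from 1700 to 2300), and the run count is cut from a million to 20,000 to keep it quick. The phantom draw against a 2400 player is included so that a perfect screening still has a finite performance rating.

```python
import random


def expected_score(rating, opponent):
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def performance_rating(score, opponents, lo=0.0, hi=5000.0):
    """Rating whose total expected score against `opponents` equals `score`."""
    for _ in range(60):  # bisection; expected score rises with rating
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, o) for o in opponents) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


TRUE_RATING = 2400
# Hypothetical screener ratings, NOT the real 2015 field.
SCREENERS = [1700 + 20 * i for i in range(31)]
# Phantom draw against a TRUE_RATING player keeps every performance finite.
FIELD = SCREENERS + [TRUE_RATING]

# With a fixed field the performance rating depends only on the total score,
# so it can be tabulated once for each possible number of wins.
PERFORMANCE_BY_WINS = {w: performance_rating(w + 0.5, FIELD) for w in range(32)}


def average_performance(n_screenings, seed=0):
    """Average performance rating over simulated screenings."""
    rng = random.Random(seed)
    win_probs = [expected_score(TRUE_RATING, o) for o in SCREENERS]
    total = 0.0
    for _ in range(n_screenings):
        wins = sum(rng.random() < p for p in win_probs)
        total += PERFORMANCE_BY_WINS[wins]
    return total / n_screenings


average = average_performance(20_000)
print(round(average))  # comes out above TRUE_RATING for this field
```

The size of the overshoot depends on the field that is assumed: the weaker the screeners relative to the true rating, the larger the upward bias.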

We now have a very high measurement of sharp's ability, and this time the uncertainty in our measurement skews to the downside.

Secondly, chessandgo is clearly underrated on the game room scale, and browni3141 probably is slightly underrated as well.

Thirdly, I think it is fairly likely that sharp can be beaten on score. Deep_blue did not manage it on two tries, but it is not obvious that he chose the optimal method to lull sharp into near-repetition. His chosen moves might have provoked advances from sharp that other moves would not have. Since nobody else tried, our information is limited. (Incidentally, my expectation that a win-on-score formula exists even though we haven't quite discovered it yet reinforces my belief that it would be meaningfully unfair to bots to allow individuals unlimited Screening games in which to work out formulaic wins.)

Given that shuffling pieces in the opening can accumulate reserve, there is significant upside and negligible downside for Challenge Defenders to attempt to win on score, and to revert to normal Arimaa only if it fails. I don't know whether the Defenders will try this from the outset, or only in desperation, or not even if they are desperate, but this consideration can only increase humanity's chance of defending.

Fourth and finally, HvB games tend to exaggerate differences as measured on an HvH scale. For example, suppose Alice beats Bomb 50% of the time and Alice beats Charlie 70% of the time. You might guess that since Bomb and Alice are equally good, Bomb will beat Charlie 70% of the time too, but I would guess 80% or more. Alice is error-prone in a way that Bomb is not, so Bomb will blunder away fewer games. Similarly if Alice beats Daniel 30% of the time, you might guess that Bomb will too, but I would guess 20% or less, since Alice can adapt to what she perceives Daniel's strategy to be in a way that Bomb can't.

If I am about equal to sharp, then I expect browni3141 and chessandgo to beat sharp with greater probability than they beat me. (Also I note with relief that both are seriously training at the moment.) Admittedly, by the same token, I expect harvestsnow to lose to sharp with greater probability than he would lose to me, but we only need one mini-match victory to defend.

I predict sharp will most likely win four Challenge games this year, with a slight upward bias, i.e. if it isn't four then it is more likely five than three. Before the Screening that would have made me a doomsayer, but in the face of sharp's fantastic results, kzb52 and Boo have passed me on pessimism. On the other hand, I'm not the optimist in the room either, with PerkofBR predicting that even harvestsnow will win his mini-match.

Before the Screening, I was going to stick to my prediction from the last few years that bots have a 30% chance of winning the Challenge before it expires in 2020. Given how dramatically sharp has improved this year, and given how lightvector drops hints that there is plenty more for him to optimize before he runs out of ideas, I have to bump my prediction up to 70% that the Challenge is won. Elapsing time is on our side, but the trend in bot strength versus the trend in human strength is against us much more than I anticipated. Hats off to lightvector!

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: 2015 Arimaa Challenge
« Reply #44 on: Apr 1st, 2015, 1:55am »

I should also give the 95% confidence interval on sharp's performance rating: 2312 to 2799. That conveys some idea of how much random variation there is in the screening measurement. This is unrelated to the reasons I gave above to like humanity's chances better than the actual performance rating would suggest.
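The post does not say how that interval was computed. One common way to attach an interval to an Elo performance rating is a normal-approximation (Wald) interval based on the Fisher information of the results. The sketch below illustrates that technique only. It uses a hypothetical screener field (31 players from 1700 to 2300), so it will not reproduce the 2312 to 2799 figures.

```python
import math


def expected_score(rating, opponent):
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def performance_rating(score, opponents, lo=0.0, hi=5000.0):
    """Rating whose total expected score against `opponents` equals `score`."""
    for _ in range(60):  # bisection; expected score rises with rating
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, o) for o in opponents) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def wald_interval(rating, opponents, z=1.96):
    """Normal-approximation interval (assumed method, not the post's)."""
    scale = math.log(10) / 400.0  # converts Elo points to log-odds
    win_probs = [expected_score(rating, o) for o in opponents]
    information = scale ** 2 * sum(p * (1.0 - p) for p in win_probs)
    half_width = z / math.sqrt(information)
    return rating - half_width, rating + half_width


# Hypothetical screener ratings, NOT the real 2015 field.
HYPOTHETICAL_SCREENERS = [1700 + 20 * i for i in range(31)]

performance = performance_rating(29, HYPOTHETICAL_SCREENERS)  # 29 wins, 2 losses
low, high = wald_interval(performance, HYPOTHETICAL_SCREENERS)
print(round(low), round(performance), round(high))
```

With only two losses in 31 games, few of the games carry much information about the rating, which is why an interval of this kind spans several hundred points.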
