Arimaa Forum « 2015 Arimaa Challenge »


(Moderator: supersamu)
   Author  Topic: 2015 Arimaa Challenge  (Read 11942 times)
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2015 Arimaa Challenge
« Reply #30 on: Mar 19th, 2015, 11:06pm »

Humanity won two of the most recent five games, and thus gained ground on silicon.  Both of the wins, however, were against Z, while two losses were against sharp, so the gap between the bots has widened.  Sharp has won all three decisive pairs of the ten completed pairs, with a performance of 2364.  Z trails with a performance of 2062 and has meager hopes, leading in only two of the six incomplete pairs.
« Last Edit: Mar 19th, 2015, 11:07pm by Fritzlein » IP Logged

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2015 Arimaa Challenge
« Reply #31 on: Mar 21st, 2015, 4:49pm »

Humanity won only one of the last six, again against Z.  The two completed pairs were indecisive, so sharp's lead remains at only 3-0, but sharp leads in seven of the eight incomplete pairs, and 2438 to 2062 in performance rating.
 
On a personal note, it feels very much like the top bots have drawn even with me, not only because I split my two games with each of them, but also because sharp's Screening performance is currently higher than my game room rating.  Of course four games and one player is too little evidence to base anything on.  I merely note that in the prolonged man vs. machine struggle, in every domain that silicon conquers, each individual must face his own personal loss, and this feels like mine.
 
Yes, there is a good chance I will do better next year, but there is also a fair chance that last year was my last year to get a positive score against the bots.  We'll see what happens in the rest of the Screening and remaining years of the Challenge.
IP Logged

Belteshazzar
Forum Guru
*****



Arimaa player #5094

   


Gender: male
Posts: 108
Re: 2015 Arimaa Challenge
« Reply #32 on: Mar 21st, 2015, 10:46pm »

Interesting that no one has yet defeated sharp in a remotely efficient manner.  I wonder if the challengers will be able to win in less than 80 turns.
IP Logged
browni3141
Forum Guru
*****



Arimaa player #7014

   


Gender: male
Posts: 384
Re: 2015 Arimaa Challenge
« Reply #33 on: Mar 22nd, 2015, 1:25am »

on Mar 21st, 2015, 10:46pm, Belteshazzar wrote:
Interesting that no one has yet defeated sharp in a remotely efficient manner.  I wonder if the challengers will be able to win in less than 80 turns.

 
I would be horrified if all my games took more than 80 turns to either win or lose.
I expect to average about 35 (planning on three wins).
 
Note that only two players have beaten sharp at all, and one win was in a deliberately inefficient manner, and the other completely indifferent to efficiency.
« Last Edit: Mar 22nd, 2015, 1:29am by browni3141 » IP Logged

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2015 Arimaa Challenge
« Reply #34 on: Mar 25th, 2015, 2:06am »

Another six games are in the books: sharp won two of two and Z won three of four, so both gain in performance rating.  Sharp is up to 2472 and Z up to 2082.  We're up to fourteen completed pairs, but still only three decisive, and all three in sharp's favor.  In ten incomplete pairs, seven favor sharp.
IP Logged

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2015 Arimaa Challenge
« Reply #35 on: Mar 28th, 2015, 12:04am »

Another prediction garnered from the chat room, this time kzb52:
Quote:
I don't know if anyone's stepped in to be the prophet of doom for the challenge this year. I guess I can fill that role, since I didn't predict anything before now
I say sharp wins 4 games at a minimum, but probably 5
(some plausible results might be 3-0, 1-2, 0-3, or 2-1, 2-1, 0-3, etc. )
IP Logged

PerkofBR
Forum Senior Member
****



Arimaa player #9787

   


Gender: male
Posts: 31
Re: 2015 Arimaa Challenge
« Reply #36 on: Mar 29th, 2015, 7:19pm »

I predict all defenders will win their BO3s, with sharp getting 2 to 3 wins.  ;)
IP Logged
Boo
Forum Guru
*****



Arimaa player #6466

   


Gender: male
Posts: 118
Re: 2015 Arimaa Challenge
« Reply #37 on: Mar 30th, 2015, 8:16am »

I think humans will barely hold. 2-1, 1-2, 0-3. I hope I am wrong though.
IP Logged

Belteshazzar
Forum Guru
*****



Arimaa player #5094

   


Gender: male
Posts: 108
Re: 2015 Arimaa Challenge
« Reply #38 on: Mar 30th, 2015, 4:41pm »

I wonder how Fritz would have done in a third game against sharp.  When he beat it in his first game, I assumed the challenge was safe.
IP Logged
deep_blue
Forum Guru
*****



Arimaa player #9854

   


Posts: 212
Re: 2015 Arimaa Challenge
« Reply #39 on: Mar 31st, 2015, 4:14am »

Fritzlein, where are the new performance ratings? ;)
« Last Edit: Mar 31st, 2015, 4:14am by deep_blue » IP Logged
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2015 Arimaa Challenge
« Reply #40 on: Mar 31st, 2015, 8:37am »

on Mar 31st, 2015, 4:14am, deep_blue wrote:
Fritzlein, where are the new performance ratings? ;)

Sorry, my weekend Ultimate tournament consumed more than my weekend.  Still, I'm glad somebody missed my updates. :)  I guess there is a flurry of activity around the deadline; thirteen games since I last updated.  But with less than 24 hours to go, I think I'll just wait for the final entry.  In the meantime you can well imagine how stratospheric sharp's performance rating has become.  As I write, your win-on-score strategy is already bust; good luck winning over the board!
IP Logged

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2015 Arimaa Challenge
« Reply #41 on: Apr 1st, 2015, 12:00am »

I'm very pleased with the late flurry of Screening games: twenty-one since I last posted six days ago.  The final tallies are:
Sharp won 29 and lost 2 for a performance rating of 2557.
Z won 18 and lost 10 for a performance rating of 2123.
There were 27 completed pairs.*
Sharp won all of the 8 decisive pairs.
 
*(counting Hufflepup's pair and BlakeD's second pair despite the server color glitch; not counting either of DanielM's two games or 722caasi's second game)
« Last Edit: Apr 1st, 2015, 12:01am by Fritzlein » IP Logged

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2015 Arimaa Challenge
« Reply #42 on: Apr 1st, 2015, 12:28am »

Before moving on to my predictions for the future, let me consider a couple of my past predictions.
 
I predicted (not very confidently) that sharp would win every decisive pair this year.  This came true, but not quite as I had envisioned.  Since sharp lost only two games, it only had two chances to lose a pair!  Although I did foresee sharp's dominance over Z, I didn't anticipate sharp's dominance over all human participants.
 
I thought supersamu had less than 1/3 chance of sweeping four screening games, and we made a $1:$2 bet to that effect.  Supersamu only played one pair and lost both games, so that dollar is clearly mine.  (If it is any consolation, supersamu, my expectations for myself were also higher than my performance.)
 
I bought insurance from browni3141 against the Challenge being won this year at a rate I estimated at 1:100, although I estimated the true odds of the Challenge being won nearer to 1:200 and browni3141 put it around 1:1000.  This is turning out to be a great purchase by me, both because browni3141 is now only one win (and two losses) away from the World Championship, so his prize equity has risen above $200, and also because, with the Screening over, I now put the odds of the Challenge being won closer to 1:20.  In my current opinion the insurance is worth over $10, even though I paid just $2 for it.
 
The winner in that prediction is quasar, and for exactly the reason that he gave in the chat room: If sharp had turned out to be worse than expected, I stood to lose at most $2, but if sharp turned out to be better than expected (as it has), I could gain much more than $2 of equity.  If I was overestimating sharp's strength by 100 elo, it would cost me much less than it would cost browni3141 to underestimate sharp's strength by 100 elo.  (In addition to which, he appears to have underestimated by more than that, but that is getting into my next post.)  Uncertainty always skews in the favor of the insurance buyer, and the strength of the improved sharp was uncertain.
« Last Edit: Apr 1st, 2015, 1:58am by Fritzlein » IP Logged

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2015 Arimaa Challenge
« Reply #43 on: Apr 1st, 2015, 1:44am »

Sharp's performance rating of 2557 is truly intimidating.  If we pit it against the game room ratings of the defenders, 2512, 2255, and 2235, for browni3141, chessandgo, and harvestsnow respectively, we can calculate that sharp has a 53% chance of winning the Challenge this year!  How, then, am I still giving sharp only a 5% chance or so in my above post?
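 
The 53% figure can be reproduced with a short sketch.  It assumes the standard Elo expected-score formula, independent games, a best-of-three mini-match against each defender, and that the bot must win all three mini-matches.  The function names are made up for illustration.

```python
# Sketch: chance that sharp wins the Challenge, from ratings alone.
# Assumptions: standard Elo formula, independent games, best-of-three
# mini-matches, and the bot must win every mini-match.

def elo_win_prob(rating, opponent):
    """Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def best_of_three(p):
    """Probability of winning at least two of three independent games."""
    return 3 * p**2 - 2 * p**3

def challenge_win_prob(bot, defenders):
    """Probability that the bot wins all of its mini-matches."""
    prob = 1.0
    for defender in defenders:
        prob *= best_of_three(elo_win_prob(bot, defender))
    return prob

SHARP = 2557  # Screening performance rating
DEFENDERS = {"browni3141": 2512, "chessandgo": 2255, "harvestsnow": 2235}

result = challenge_win_prob(SHARP, DEFENDERS.values())
print(f"{result:.0%}")
```

Most of the bot's risk in this model sits in the mini-match against the top-rated defender, since the other two come out well above 90% for the bot.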
 
First, I should confess that, as much as I like to calculate sharp's performance rating as the rating that would have predicted a 29-2 showing, this is a biased estimator.  One can demonstrate this by simulation: assume sharp's true rating is some fixed value, say 2400, and use that to generate a million screenings against the 31 actual opponents.  (Of course, also assume the screeners' ratings are accurate and the elo formula is true.)  Average the calculated performance rating across those million simulations, and the average will be higher than 2400.
 
This upward bias is not merely because a single perfect screening has a performance of infinity and thus makes the average infinity.  One can take infinity out of the picture by adding a draw against a 2400 player to each screening, and even so the average performance rating would be over 2400.  This is because the screeners are rather weaker than 2400 on average.
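 
A rough sketch of that simulation follows.  The actual screeners' ratings are not listed in this thread, so the 31 opponent ratings below are hypothetical placeholders (evenly spaced and all weaker than 2400), and the trial count is cut well below a million to keep it quick.  The performance rating is found by bisection as the rating whose expected score matches the actual score, with the extra draw against a 2400 player included.

```python
import random

def expected_score(rating, opponents):
    """Sum of Elo expected scores of `rating` against each opponent."""
    return sum(1.0 / (1.0 + 10 ** ((r - rating) / 400.0)) for r in opponents)

def performance_rating(score, opponents, lo=0.0, hi=5000.0):
    """Rating whose expected score against `opponents` equals `score`."""
    for _ in range(60):  # bisection; expected_score increases with rating
        mid = (lo + hi) / 2.0
        if expected_score(mid, opponents) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def mean_performance(true_rating, screeners, trials, seed=0):
    """Average performance rating over simulated screenings."""
    rng = random.Random(seed)
    win_probs = [1.0 / (1.0 + 10 ** ((r - true_rating) / 400.0))
                 for r in screeners]
    # One extra draw against a player at the true rating keeps a
    # perfect screening from producing an infinite performance.
    field = list(screeners) + [true_rating]
    cache = {}  # performance depends only on the number of wins
    total = 0.0
    for _ in range(trials):
        wins = sum(rng.random() < p for p in win_probs)
        if wins not in cache:
            cache[wins] = performance_rating(wins + 0.5, field)
        total += cache[wins]
    return total / trials

TRUE_RATING = 2400
SCREENERS = [1600 + 20 * i for i in range(31)]  # hypothetical ratings

average = mean_performance(TRUE_RATING, SCREENERS, trials=20000)
print(round(average))
```

With opponents this far below the true rating, the score-to-rating curve bends upward, so lucky screenings raise the performance more than unlucky ones lower it.  That asymmetry is where the upward bias in the average comes from.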
 
Sorry for the math mumbo-jumbo; the upshot is that my performance rating calculation over-reacts to extreme results.  In this respect it is rather like humans.  :)  We now have a very high measurement of sharp's ability, and this time the uncertainty in our measurement skews to the downside.
 
Secondly, chessandgo is clearly underrated on the game room scale, and browni3141 probably is slightly underrated as well.
 
Thirdly, I think it is fairly likely that sharp can be beaten on score.  Deep_blue did not manage it on two tries, but it is not obvious that he chose the optimal method to lull sharp into near-repetition.  His chosen moves might have provoked advances from sharp that other moves would not have.  Since nobody else tried, our information is limited.  (Incidentally, my expectation that a win-on-score formula exists even though we haven't quite discovered it yet reinforces my belief that it would be meaningfully unfair to bots to allow individuals unlimited Screening games in which to work out formulaic wins.)
 
Given that shuffling pieces in the opening can accumulate reserve, there is significant upside and negligible downside for Challenge Defenders to attempt to win on score, and to revert to normal Arimaa only if it fails.  I don't know whether the Defenders will try this from the outset, or only in desperation, or not even if they are desperate, but this consideration can only increase humanity's chance of defending.
 
Fourth and finally, HvB games tend to exaggerate differences as measured on an HvH scale.  For example, suppose Alice beats Bomb 50% of the time and Alice beats Charlie 70% of the time.  You might guess that since Bomb and Alice are equally good, Bomb will beat Charlie 70% of the time too, but I would guess 80% or more.  Alice is error-prone in a way that Bomb is not, so Bomb will blunder away fewer games.  Similarly if Alice beats Daniel 30% of the time, you might guess that Bomb will too, but I would guess 20% or less, since Alice can adapt to what she perceives Daniel's strategy to be in a way that Bomb can't.
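 
To put the Alice and Bomb example in rating terms, the win percentages can be converted to rating gaps.  This is only a small sketch assuming the standard Elo formula; `elo_gap` is a made-up helper name, and the 80% and 20% figures are the guesses from the paragraph above, not measurements.

```python
import math

def elo_gap(win_prob):
    """Rating difference implied by a win probability under the Elo formula."""
    return 400.0 * math.log10(win_prob / (1.0 - win_prob))

examples = [
    ("Alice vs Charlie", 0.70),
    ("Bomb vs Charlie (guess)", 0.80),
    ("Alice vs Daniel", 0.30),
    ("Bomb vs Daniel (guess)", 0.20),
]
for label, p in examples:
    print(f"{label}: {elo_gap(p):+.0f} elo")
```

On these numbers a gap of roughly 147 points on the human-vs-human scale acts like roughly 241 points when the bot is one of the players, in either direction.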
 
If I am about equal to sharp, then I expect browni3141 and chessandgo to beat sharp with greater probability than they beat me.  (Also I note with relief that both are seriously training at the moment.)  Admittedly, by the same token, I expect harvestsnow to lose to sharp with greater probability than he would lose to me, but we only need one mini-match victory to defend.
 
I predict sharp will most likely win four Challenge games this year, with a slight upward bias, i.e. if it isn't four then it is more likely five than three.  Before the Screening that would have made me a doomsayer, but in the face of sharp's fantastic results, kzb52 and Boo have passed me on pessimism.  On the other hand, I'm not the optimist in the room either, with PerkofBR predicting that even harvestsnow will win his mini-match.
 
Before the Screening, I was going to stick to my prediction from the last few years that bots have a 30% chance of winning the Challenge before it expires in 2020.  Given how dramatically sharp has improved this year, and given how lightvector drops hints that there is plenty more for him to optimize before he runs out of ideas, I have to bump my prediction up to 70% that the Challenge is won.  Elapsing time is on our side, but the trend in bot strength versus the trend in human strength is against us much more than I anticipated.  Hats off to lightvector!
IP Logged

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2015 Arimaa Challenge
« Reply #44 on: Apr 1st, 2015, 1:55am »

I should also give the 95% confidence interval on sharp's performance rating: 2312 to 2799.  That conveys some idea of how much random variation there is in the screening measurement.  This is unrelated to the reasons I gave above to like humanity's chances better than actual performance rating would suggest.
IP Logged
