Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
(Message started by: Fritzlein on May 26th, 2011, 1:02am)

Title: 2012 Arimaa Challenge
Post by Fritzlein on May 26th, 2011, 1:02am
For the record, I am renewing my $1,000 pledge to the Arimaa prize fund for the 2012 Challenge.  Counting the original $10,000 prize announced by Omar, the total prize fund is $11,000 this year.

Title: Re: 2012 Arimaa Challenge
Post by omar on May 26th, 2011, 7:55pm

on 05/26/11 at 01:02:09, Fritzlein wrote:
For the record, I am renewing my $1,000 pledge to the Arimaa prize fund for the 2012 Challenge.  Counting the original $10,000 prize announced by Omar, the total prize fund is $11,000 this year.


Thanks for increasing the challenge prize, Karl.

If anyone else would like to help increase the challenge prize please post here.

Title: Re: 2012 Arimaa Challenge
Post by UruramTururam on Jun 8th, 2011, 2:40am
I drop $150 into the Arimaa Challenge 2012 prize box.

Title: Re: 2012 Arimaa Challenge
Post by Belteshazzar on Jan 23rd, 2012, 6:26pm
Question: Is the Challenge bot going to be selected in the same manner as last year?  Recall that Sharp was by all indications the best even against humans, yet by a fluke Marwin won out.  

Title: Re: 2012 Arimaa Challenge
Post by lightvector on Jan 23rd, 2012, 7:35pm
I don't think Marwin won by a fluke. Marwin is an excellent bot.

I think both were very close in strength, so it doesn't seem surprising to me that they each won over the other (by the barest of margins) last year in different situations, especially since their playing styles were so different.

Title: Re: 2012 Arimaa Challenge
Post by Belteshazzar on Jan 23rd, 2012, 8:03pm
I wasn't insulting Marwin, which could well three-peat this year.  However, in 2011 Sharp did appear significantly stronger in the qualifying round against humans.  See this discussion (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=events;action=display;num=1299781791;start=60).

Title: Re: 2012 Arimaa Challenge
Post by Fritzlein on Jan 23rd, 2012, 11:33pm

on 01/23/12 at 20:03:14, Belteshazzar wrote:
I wasn't insulting Marwin, which could well three-peat this year.  However, in 2011 Sharp did appear significantly stronger in the qualifying round against humans.  See this discussion (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=events;action=display;num=1299781791;start=60).

Due to the one statistics course I survived, I still break out in hives every time someone says "significant" in connection with numbers.  Of course the best-of-five mini-match that sharp won 3-2 over marwin in the Computer Championship was not significant: between evenly matched opponents there is a 25% chance of a 3-0 result, so even a clean sweep can't be considered significant!
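The 25% figure is easy to check with a small sketch. It assumes every game is an independent 50/50 coin flip, which is the simplest reading of "evenly matched" rather than anything measured from real bot games. The helper name `final_score_distribution` is only illustrative.

```python
from math import comb

def final_score_distribution(games_to_win=3, p=0.5):
    """Distribution of final scores in a race to `games_to_win` wins
    (best of five when games_to_win=3). Player A wins each game
    independently with probability p. Keys are (wins_A, wins_B)."""
    dist = {}
    n = games_to_win
    for k in range(n):  # k = games the series winner drops before clinching
        # The clinching win is always the last game, so count the ways to
        # place the winner's other n-1 wins among the first n-1+k games.
        ways = comb(n - 1 + k, k)
        dist[(n, k)] = ways * p ** n * (1 - p) ** k
        dist[(k, n)] = ways * (1 - p) ** n * p ** k
    return dist

dist = final_score_distribution()
sweep = dist[(3, 0)] + dist[(0, 3)]  # a 3-0 result by either side
print(sweep)
```

Under this model a 3-0 sweep happens 25% of the time, and 3-1 and 3-2 finishes each happen 37.5% of the time.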

In the longer round against humans, Sharp got off to a fast start, which had us all worried, but when all the votes were counted, it ended up in a dead heat.  Marwin's victory by one in 40 comparisons was not significant, and wouldn't have been even if the comparisons had been direct rather than mediated by a human opponent.  Indeed, winning by one in a match of any length is insignificant; the longer the match is, the less significant that extra victory is.  In terms of my Elo calculation whereby sharp performed three Elo points higher than marwin, that difference in playing strength would not be significant unless it were calculated across more than 50,000 games!
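The 50,000-game figure can be roughly reproduced. This sketch assumes the usual logistic Elo curve and no draws, and takes "significant" to mean an edge of about two standard errors. The post doesn't state its exact threshold, so `z=2.0` is an assumption.

```python
import math

def expected_score(elo_diff):
    """Expected per-game score of the stronger side (logistic Elo curve)."""
    return 1 / (1 + 10 ** (-elo_diff / 400))

def games_needed(elo_diff, z=2.0):
    """Rough number of games before an edge of `elo_diff` Elo sits about z
    standard errors above an even score. Normal approximation with no draws:
    each game's score has standard deviation ~0.5, so the standard error of
    the average score after n games is ~0.5 / sqrt(n)."""
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * 0.5 / edge) ** 2)

print(round(expected_score(3), 4))  # 0.5043
print(games_needed(3))              # roughly 54,000
```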

We humans are conditioned to see patterns, and we manage to do so with or without justification.  In college I was a volunteer subject in an experiment where I was supposed to guess the next symbol a computer would display on screen.  After ten minutes I couldn't find a pattern, and my guesses had been no better than random, but in the exit interview I expressed confidence that I would be able to find the pattern if they would only give me more time.  Afterwards I learned that the symbols had been completely random, and the only point of the experiment was to test people's confidence when they have been given no reason to be confident.  :-[

Perhaps in subjective terms I and others felt that sharp was better than marwin, but in objective terms "all indications" in the form of game results were resoundingly insignificant.

Title: Re: 2012 Arimaa Challenge
Post by Fritzlein on Jan 23rd, 2012, 11:35pm

on 06/08/11 at 02:40:19, UruramTururam wrote:
I drop $150 into the Arimaa Challenge 2012 prize box.

Awesome, that brings us up to $11,150!

Title: Re: 2012 Arimaa Challenge
Post by omar on Mar 12th, 2012, 12:16am
The screening period for the challenge has started. Please try and play some games against the bots.

I guess Mark (mistre) might be busy and didn't get a chance yet to post who the challenge defenders are. I've sent a message to remind him.

Title: Re: 2012 Arimaa Challenge
Post by UruramTururam on Mar 12th, 2012, 2:44am
Just because I'm silent and not playing, it does not mean that I'm not here. I am here and I remember my pledge.  ;)

Title: Re: 2012 Arimaa Challenge
Post by rbarreira on Mar 12th, 2012, 4:14am
It seems that bot_briareus has the wrong time control set up for the screening games. It is playing at 30s per move against Simnik right now.

I've notified Omar but I'm posting here about it so that people know they should not start screening yet.

Title: Re: 2012 Arimaa Challenge
Post by rbarreira on Mar 12th, 2012, 8:17am
Omar replied saying he has fixed it, so I guess the screening can proceed.

Title: Re: 2012 Arimaa Challenge
Post by mistre on Mar 12th, 2012, 9:41am
The 2012 challenge match defenders will be:
  • Jean Daligault (chessandgo)  
  • Eric Momson (Nombril)  
  • Tarou Auso (hanzack)

The backup will be:
  • Omar Syed (omar)

Good Luck!

Title: Re: 2012 Arimaa Challenge
Post by rbarreira on Mar 12th, 2012, 11:23am
As I said in the chatroom, that's an incredibly strong lineup of challenge defenders. The last time all defenders were in the top 10 was 2006.

Title: Re: 2012 Arimaa Challenge
Post by Eltripas on Mar 12th, 2012, 11:11pm

on 03/12/12 at 09:41:52, mistre wrote:
The 2012 challenge match defenders will be:
  • Jean Daligault (chessandgo)  
  • Eric Momson (Nombril)  
  • Tarou Auso (hanzack)

The backup will be:
  • Omar Syed (omar)

Good Luck!


I think Nombril's name is Eric Momsen, not Momson.

Title: Re: 2012 Arimaa Challenge
Post by thomastanck on Mar 13th, 2012, 1:20am
and hanzack's name isn't Tarou Asou either. I think it'd be better to just have him stay anonymous.

Title: Re: 2012 Arimaa Challenge
Post by Swynndla on Mar 13th, 2012, 5:34am

on 03/12/12 at 11:23:43, rbarreira wrote:
As I said in the chatroom, that's an incredibly strong lineup of challenge defenders. The last time all defenders were in the top 10 was 2006.

Maybe it's a sign that the bots have gotten so strong that there needs to be defenders of this caliber to defend the challenge. ;)

Title: Re: 2012 Arimaa Challenge
Post by Hippo on Mar 13th, 2012, 11:07am

on 03/12/12 at 11:23:43, rbarreira wrote:
As I said in the chatroom, that's an incredibly strong lineup of challenge defenders. The last time all defenders were in the top 10 was 2006.


Yes, I would expect to use the 1st, 2nd and 4th players of the WC as a last resort, in case the challenge seemed almost lost ... we have 2013, 2014, 2015, 2016, 2017, 2018, 2019 and 2020 in front of us.
It would be hard to create stronger teams for these years.

It would be bad for arimaa to lose the challenge this year, but will we be able to follow up under consistent conditions? What if next year the top 2 humans are not available to defend and the bot wins? How would we think about it?

Never mind; the decision was made, and it's entirely in Omar's hands to choose the defenders. ... Again, I hope we have time at least until 2015, even though that contradicts what I wrote last year.

Title: Re: 2012 Arimaa Challenge
Post by Adanac on Mar 13th, 2012, 11:40am

on 03/13/12 at 11:07:24, Hippo wrote:
It would be bad for arimaa to lose the challenge this year, but will we be able to follow up under consistent conditions? What if next year the top 2 humans are not available to defend and the bot wins? How would we think about it?


Another strange situation would be Chessandgo playing the screening games in some future year, winning all 4 games, but then the three human defenders all lose their match in the Arimaa Challenge.  I wonder if in the future the screening games should just be 1 game against each bot so that humans aren't demonstrating the ability to win 4/4 before the Arimaa Challenge even begins.  It seems to defeat the whole purpose of the AC  ;D

A different solution is to change the challenge games to 1 minute/move.  Then even if a human player wins all 4 screening games the Arimaa Challenge still has full legitimacy because the screening games weren't played at the "official" time control.  That would probably encourage more people to play the screening bots, too.

Title: Re: 2012 Arimaa Challenge
Post by Swynndla on Mar 13th, 2012, 3:10pm
I'm predicting the defenders will win all non-handicap games, and it will be such a thrashing that we'll all be wondering just how close the bots really are to taking the challenge.  I'm not saying the defenders should be a weaker lineup at all - I'm just making a prediction.  :)

Title: Re: 2012 Arimaa Challenge
Post by tharkun on Mar 13th, 2012, 3:22pm
I'm not so sure about briareus... But I do not fancy marwin's chances should it win the screening.

Title: Re: 2012 Arimaa Challenge
Post by mistre on Mar 13th, 2012, 3:41pm
Prediction:

Briareus wins screening and then shocks with 2 wins in Challenge.  Of course neither are vs. Chessandgo...

Title: Re: 2012 Arimaa Challenge
Post by aaaa on Mar 13th, 2012, 6:46pm
I have two questions:
  • How far into the human championship were the defenders picked?
  • What reassured Omar that hanzack won't falsely resign his games?

Title: Re: 2012 Arimaa Challenge
Post by Adanac on Mar 13th, 2012, 8:18pm

on 03/13/12 at 18:46:14, aaaa wrote:
I have two questions:
  • How far into the human championship were the defenders picked?
  • What reassured Omar that hanzack won't falsely resign his games?


When Omar asked me in past years, it was usually in early or mid-February.

Notice that Omar selected hanzack in the same year as chessandgo ;).  So even if hanzack loses, it doesn't affect the Arimaa Challenge.  And the results of this year's Challenge will affect his decisions in future years.  Hanzack has been taking all of his World Championship games seriously for the past 2 years, and I'm sure he'll give his best effort in the AC.  I'm expecting him to win his match, with a good chance at 3-0.

Title: Re: 2012 Arimaa Challenge
Post by omar on Mar 13th, 2012, 11:59pm
I usually like to have one very strong player (like chessandgo, Fritzlein or Adanac), and two rising star type players who have not played in the AC before. Since Fritzlein and Adanac have played in recent ACs, I want to let chessandgo have a chance. Nombril and hanzack haven't played before and are definitely rising stars.

I asked hanzack not to resign in his WC games and he has cooperated, so I trust he will cooperate in the AC as well.

Title: Re: 2012 Arimaa Challenge
Post by Hippo on Mar 15th, 2012, 2:23am

on 03/13/12 at 23:59:25, omar wrote:
I usually like to have one very strong player (like chessandgo, Fritzlein or Adanac), and two rising star type players who have not played in the AC before. Since Fritzlein and Adanac have played in recent ACs, I want to let chessandgo have a chance. Nombril and hanzack haven't played before and are definitely rising stars.

I asked hanzack not to resign in his WC games and he has cooperated, so I trust he will cooperate in the AC as well.


OK, this point of view explains it well. And judging by the screening games, I am not afraid of humanity being defeated this year. Max, ocmiente and Harren outplayed them easily ... I myself was much more frustrated by the game length than by the playing strength of the bots.
It is definitely not wise to rely on a fast goal attack or on any race (capture, goal, or even mixed).
This is where the alpha-beta bots are amazing.
But keeping good piece alignment and control of key squares guarantees that captures will come through tactics nonetheless, and finally a win on a depleted board by any means.

Restricting the bot's elephant mobility helps a lot, as does a space advantage whenever there is a safe way to advance.

... of course this is general advice that could be summarised as ... play Fritzlein style.

Title: Re: 2012 Arimaa Challenge
Post by Eltripas on Mar 17th, 2012, 3:43pm

on 03/13/12 at 15:41:10, mistre wrote:
Prediction:

Briareus wins screening and then shocks with 2 wins in Challenge.  Of course neither are vs. Chessandgo...


So you are pretty much predicting that Nombril will lose his mini-match or that Hanzack won't take his games seriously. Hanzack is the best bot basher around and almost as good as C&G, if not as good; if he takes his games seriously, there is no chance he will lose.

Title: Re: 2012 Arimaa Challenge
Post by mistre on Mar 17th, 2012, 9:27pm
Ok, here is my prediction by player:

Chessandgo - 3-0
Hanzack - 2-0
Nombril - 2-1
Omar subbing for Hanzack - 0-1

;)

Seriously, you are probably right: neither Hanzack nor Chessandgo should lose a game, and I give Nombril 50-50 odds of going undefeated.


Title: Re: 2012 Arimaa Challenge
Post by Fritzlein on Mar 18th, 2012, 2:08pm

on 03/13/12 at 01:20:55, thomastanck wrote:
and hanzack's name isn't Tarou Asou either. I think it'd be better to just have him stay anonymous.

QFT.  Omar, if hanzack had given his name as "Omar Syed", would you go around saying "Omar Syed is defending the Challenge."?

Title: Re: 2012 Arimaa Challenge
Post by Fritzlein on Mar 18th, 2012, 2:59pm
I will update bot performance here as I have time.  (Sorry about my complete absence last week.)

Year  Pairs  Decisive  Winner / Score / Perf    Loser / Score / Perf
----  -----  --------  -----------------------  ---------------------
2007     12         2  bomb / 2 / 2087          Zombie / 0 / 1876
2008     16         7  bomb / 6 / 1918          sharp / 1 / 1576
2009     23         7  clueless / 5 / 1910      GnoBot / 2 / 1792
2010     25        11  marwin / 6 / 2065        clueless / 5 / 1960
2011     40        11  marwin / 6 / 2110        sharp / 5 / 2109
2012     33         7  briareus / 5 / 2232      marwin / 2 / 2128


So far briareus is 14-4 for a performance of 2181 and marwin is 9-8 for a performance of 1955.  This just shows how much variability there is in a small number of games; it seems highly unlikely that marwin is weaker than last year.  I hope we get 40 completed pairs again this year!

[EDIT]
Now briareus is 16-6 for a performance of 2196 and marwin is 11-9 for a performance of 1969.  Harren and Adanac completed their respective sweeps, which would have taken briareus down a peg but for my loss. :'(  Briareus has opened a 2.5-game lead over marwin, which looks huge, but recall last year when sharp dashed out to a similar early lead over marwin.  We all thought sharp was invincible, but marwin came back to win the screening.  Will history repeat itself?

[EDIT]
Strangely, the six games since last update all involved briareus.  Briareus went 5-1, improving its record to 21-7, increasing its lead over marwin to 3.5 games, and increasing its performance rating to 2214.

[EDIT]
Marwin has come roaring back since last update, going 6-1 with its only loss to me, while briareus lost the only game it played, to browni3141, rated 1830.  Briareus is now 21-8 while marwin is 17-10.  Briareus' performance rating dipped to 2173 while marwin's shot up to 2086.  In the only statistic that actually matters, marwin now trails by only 1.5 games.  With one week of screening left, it's still anybody's match.

[EDIT]
The bots are 9-0 since last update, bringing briareus to 25-8 with performance rating 2204 and marwin to 22-10 with performance rating 2113.  With just a couple of days left in the screening, Briareus's lead of 1.5 games is going to hold up, especially if nobody can beat either bot!

[EDIT]
The bots finish on a 14-1 run, with briareus going 8-0 to close out with an overall record of 33-8 and a performance of 2232.  Marwin finishes 28-11 with a performance of 2128, just a touch higher than last year.  Unlike the photo-finishes in 2010 and 2011, this year had a clear winner.  The bot that gave humanity the most fits was clearly briareus.  Congratulations, rbarreira!
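For readers wondering what the "performance rating" figures mean: one common definition is the rating at which a player's expected score against the opponents actually faced equals the score achieved. The sketch below finds that rating by bisection. The thread doesn't say exactly how these numbers were computed, so treat this as one plausible definition rather than the method used here. The opponent ratings in the example are hypothetical, not screening data.

```python
def expected_score(rating, opp_rating):
    """Logistic Elo expectation for a single game."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))

def performance_rating(opponent_ratings, wins, lo=0.0, hi=4000.0, tol=1e-6):
    """Rating at which expected wins against these opponents equals `wins`.
    Bisection works because total expected score rises with rating."""
    if not 0 < wins < len(opponent_ratings):
        # All wins or all losses: no finite rating matches the score.
        raise ValueError("performance rating is unbounded for a perfect or zero score")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(expected_score(mid, r) for r in opponent_ratings) < wins:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical example: four opponents rated 2000, of whom the bot beats three.
print(round(performance_rating([2000] * 4, 3)))
```

Scoring 3 out of 4 against equal opponents gives each game an expected score of 0.75, or about 191 Elo above the field. That is also why a perfect score has no finite performance rating under this definition.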

Title: Re: 2012 Arimaa Challenge
Post by hyperpape on Mar 20th, 2012, 11:25am
Btw: if bots have a lower variance in winning (they very consistently beat players below a certain mark, and very consistently lose to players above a different mark) isn't the probability distribution of the performance rating dependent on the strength of opponents in a way that a human's performance rating wouldn't?

I guess the very top bots might not show the same pattern, since they're a lot less one-dimensional than old Bomb et al.

Title: Re: 2012 Arimaa Challenge
Post by Fritzlein on Mar 20th, 2012, 6:00pm

on 03/20/12 at 11:25:19, hyperpape wrote:
Btw: if bots have a lower variance in winning (they very consistently beat players below a certain mark, and very consistently lose to players above a different mark) isn't the probability distribution of the performance rating dependent on the strength of opponents in a way that a human's performance rating wouldn't?

This situation you outline is exactly what I believe.  If players rated below the bot participate in the screening, the bot should be infallible and its performance rating should be biased upward.  I think this happened in the 2007 screening.  On the other hand, if players rated above the bot participate in the screening, the bot should be crushed and its performance rating should be biased downward.  This was more like the 2008 screening.  Bomb didn't change between 2007 and 2008, so something else must explain the 169-point drop in its performance rating between the two years.

The only trouble with our theory is that it doesn't match the facts of the current year.  Postulating a true strength of 2200 for both bots, there are few humans with a WHR above that level.  Chessandgo, hanzack, and Nombril can't participate.  Rabbits and 99of9 haven't started yet.  Boo, Adanac, Hippo, and I have collectively won six and lost five, hardly a dominating score.

Meanwhile, the supposedly over-matched rest of the field has won nine and lost twenty-two, much better than expected from their WHR ratings.  Harren(4-0), Max(3-1), ocmiente(1-0), and aaaa(1-0) are all rated lower than the hypothetical strength of the bots.  Their wins are what is keeping down the performance rating of the two bots.


Quote:
I guess the very top bots might not show the same pattern, since they're a lot less one-dimensional than old Bomb et al.

I don't know how to explain the failure of our theory at present, so I'll just file away the notion that it may well be wrong, and await further observations.  Perhaps it is just random fluctuations due to a small data set and further games will spare me the inconvenience of changing my mind.  :)

(Also it is a great temptation to start gloating too early that my prediction of performance rating looks closer than rbarreira's prediction.  But it would still be quite possible for briareus to top 2300 in performance by the time the screening is over, so I had better keep my mouth shut.  :P)

Title: Re: 2012 Arimaa Challenge
Post by Fritzlein on Mar 28th, 2012, 8:01am
The bots are on a twelve-game winning streak.  With only a few days left in the screening, who will stand up for humanity and put silicon back in its place?  (Or even just give it a try?  ;))

Title: Re: 2012 Arimaa Challenge
Post by tize on Mar 30th, 2012, 2:07pm
The bots are still undefeated this week. Go silicon, go!  :)

Title: Re: 2012 Arimaa Challenge
Post by Fritzlein on Mar 31st, 2012, 10:12pm
Congratulations, Ricardo!  Briareus won convincingly, although I must point out that the performance rating of 2232 falls shy of your "closer to 2300 than 2200" prediction.  (I so seldom predict correctly that I have to make a fuss over it when I do.   ;))  The fact that marwin fell well short of 2200 despite edging out briareus in the Computer Championships makes me wonder whether marwin isn't as tough on humans or whether it was just random variation.

The total of 33 completed pairs falls short of last year's 40, which is surprising given that bot advances have given new spice to the man vs. machine contest.  Ah, well, now we can kick back and wait for the Arimaa Challenge games.

Title: Re: 2012 Arimaa Challenge
Post by tize on Apr 1st, 2012, 1:46am

Quote:
The bots finish on a 14-1 run, with briareus going 8-0 to close out with an overall record of 33-8 and a performance of 2232.

Actually if you just take briareus games then it's even more impressive, it finished the screen with 12 straight victories!

Congratulations, rbarreira!

Title: Re: 2012 Arimaa Challenge
Post by Hippo on Apr 1st, 2012, 2:36am
Oh, sorry, I did not finish the other pair.
The only long enough time window I found was when I had a headache :(. Hmm ... actually I could have played last night ... my bad.
Fortunately, this time the unfinished games do not favour the bot finishing second.

Congrats Ricardo, good job Mattias.

Title: Re: 2012 Arimaa Challenge
Post by omar on Apr 4th, 2012, 2:59pm
Congrats, rbarreira. Interestingly, this is the second time now that the bot that placed second in the computer championship performed better against the humans.

The challenge match games for round one have been scheduled. Please check the gameroom for your local times.

If there is interest in commentating on these games, please post here. It would be interesting to have some bot developers commentate along with some top players.

Title: Re: 2012 Arimaa Challenge
Post by Arimabuff on Apr 4th, 2012, 11:40pm

on 04/04/12 at 14:59:09, omar wrote:
...If there is interest in commentating on these games, please post here. It would be interesting to have some bot developers commentate along with some top players.

I am neither but I'd still be interested in commentating those games.

Title: Re: 2012 Arimaa Challenge
Post by omar on Apr 5th, 2012, 2:25pm

on 04/04/12 at 23:40:49, Arimabuff wrote:
I am neither but I'd still be interested in commentating those games.


Great. Feel free to join in TeamSpeak. It's two minutes per move, so it'll be good to have a few people there.

Title: Re: 2012 Arimaa Challenge
Post by Arimabuff on Apr 12th, 2012, 9:50am

on 04/05/12 at 14:25:40, omar wrote:
Great. Feel free to join in TeamSpeak. It's two minutes per move, so it'll be good to have a few people there.

I am sorry I wasn't there last time. I'll do my best to be present for each game from now on, even though unexpected events sometimes get in the way.

Title: Re: 2012 Arimaa Challenge
Post by Fritzlein on Apr 12th, 2012, 12:39pm
Thanks for this community service, Patrick!

Title: Re: 2012 Arimaa Challenge
Post by Arimabuff on Apr 14th, 2012, 9:54am

on 04/12/12 at 12:39:27, Fritzlein wrote:
Thanks for this community service, Patrick!

That's nice to hear, I mean read. ;)

Title: Re: 2012 Arimaa Challenge
Post by mistre on Apr 16th, 2012, 9:59am

on 03/13/12 at 15:41:10, mistre wrote:
Prediction:

Briareus wins screening and then shocks with 2 wins in Challenge.  Of course neither are vs. Chessandgo...


Turns out I was right... This does not bode well for humans going forward.  We might be in for several years of the computer champion beating 1 or 2 challengers.

Let me see if I understand the challenge correctly. So for a bot to win, it has to beat all 3 challengers?  So if a bot loses 2-1 to one challenger, but then destroys the other two 3-0 for a total record of 7-2, it still loses?  Hardly seems fair...


Title: Re: 2012 Arimaa Challenge
Post by Fritzlein on Apr 16th, 2012, 10:30am

on 04/16/12 at 09:59:38, mistre wrote:
Let me see if I understand the challenge correctly. So for a bot to win, it has to beat all 3 challengers?  So if a bot loses 2-1 to one challenger, but then destroys the other two 3-0 for a total record of 7-2, it still loses?

Correct.  This means a bot that is even in skill with all three defenders has only a 1/8 chance of winning the Challenge in any given year.  In order for a bot to have a 1/2 chance of winning the Challenge in any given year, it needs a 79% chance of winning each mini-match (assuming all three defenders are equal), which translates into a 71% chance of winning each game, which means being 154 Elo stronger than the defenders.
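The chain of numbers above can be reproduced with a short sketch. It assumes independent games, three equally strong defenders, a best-of-three mini-match against each, and the standard logistic Elo curve. The function names are illustrative.

```python
import math

def mini_match_win(p):
    """Chance the bot wins a best-of-three mini-match (at least 2 of 3 games)
    when it wins each game independently with probability p.
    Same as p**3 + 3 * p**2 * (1 - p)."""
    return p * p * (3 - 2 * p)

def challenge_win_all_three(p):
    """Current rule: the bot must win all three mini-matches (equal defenders)."""
    return mini_match_win(p) ** 3

def solve_increasing(f, target, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for f(x) = target, assuming f is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def elo_from_win_prob(p):
    """Elo gap implied by a per-game win probability (logistic curve)."""
    return 400 * math.log10(p / (1 - p))

# An evenly matched bot: 0.5 ** 3.
print(challenge_win_all_three(0.5))  # 0.125

# Per-mini-match probability needed for a 50% chance at the whole Challenge,
# then the per-game probability and Elo gap that produce it.
m = 0.5 ** (1 / 3)
p = solve_increasing(mini_match_win, m)
print(round(m, 3), round(p, 3), round(elo_from_win_prob(p)))  # 0.794 0.708 154
```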

On the other hand, the bots get to try every year.  If we said the bot needed only to win two of the three mini-matches, so that a bot that was even in skill with all three defenders would have a 50% chance of winning the Challenge in any given year, there would be a high chance of a weaker bot winning by a fluke.  A bot rated 100 Elo below the defenders would have a 36% chance of winning each game, a 29.5% chance of winning each mini-match, and an 18.5% chance of winning the Challenge.  If that bot tried for five years, it would have a 64% chance of winning the Challenge some year.  That is to say, a bot weaker than all the defenders would be a favorite to win the Challenge given multiple tries.
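The weaker-bot figures can be checked under the same assumptions: independent games, equal defenders, and a logistic Elo curve. Running the numbers turns up one detail. The 18.5% figure matches the chance of winning exactly two of the three mini-matches (about 18.4% with unrounded inputs, 18.5% with the per-game chance rounded to 36%). Adding the 3-0 sweep of mini-matches gives about 21% per year and about 69% over five years. If that reading is right, it only strengthens the point that a weaker bot would be a favorite given multiple tries. Names below are illustrative.

```python
def win_prob_from_elo(diff):
    """Per-game win probability for an Elo gap `diff` (logistic curve)."""
    return 1 / (1 + 10 ** (-diff / 400))

def at_least_two_of_three(q):
    """At least 2 successes in 3 independent tries, each with probability q."""
    return q * q * (3 - 2 * q)

def exactly_two_of_three(q):
    """Exactly 2 successes in 3 tries (excludes the 3-0 sweep)."""
    return 3 * q * q * (1 - q)

p_game = win_prob_from_elo(-100)        # bot 100 Elo weaker: about 0.360
p_mini = at_least_two_of_three(p_game)  # wins a best-of-three mini-match: about 0.295
p_year = at_least_two_of_three(p_mini)  # wins 2 or 3 of the 3 mini-matches: about 0.210
p_exact = exactly_two_of_three(p_mini)  # wins exactly 2 mini-matches: about 0.184
p_five = 1 - (1 - p_year) ** 5          # wins in at least one of five yearly tries: about 0.69
print(round(p_game, 3), round(p_mini, 3), round(p_year, 3), round(p_exact, 3), round(p_five, 2))
```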

It isn't obvious how to deal with uncertainty.  Yes, it is unfair that a bot on a par with the best humans is an underdog to win, but it would also be unfair if a bot won the Challenge on a fluke and humans had no chance to win it back.  It would be silly if a bot walked away with the prize, and a month later several humans were consistently winning the majority of games against that bot.  When you balance out potential evils, the Challenge structure may not be as unfair as it first appears.

Title: Re: 2012 Arimaa Challenge
Post by mistre on Apr 16th, 2012, 12:22pm
I think a MORE fair way to do it (and would make more sense) would have been for humans to have to win a majority of the matches in a majority of the pairings.  So, 2 of the 3 challengers would have to go at least 2-1 vs the bot.  The 3rd challenger could still go 0-3, so the bot could still finish with a winning record and lose the challenge (but only 5-4 and not 7-2).

If you want to reduce the possibility of a fluke, then make each series best of 5 instead of best of 3.

Another way of looking at this is the following scenario:  If Omar had said that there would be 9 human challengers instead of 3, would it make more sense for at least a majority of the challengers (5 out of 9) to have to win their series, or for only one challenger to have to win the series (1 out of 9)? If the second scenario is the case, the more people you add, the more you tilt the odds in favor of the humans.
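mistre's variants (a majority rule, best-of-five mini-matches, and the nine-challenger thought experiment) can be compared under the same simplifying assumptions used for the calculations earlier in this thread: independent games, equally strong defenders, and a logistic Elo curve. This sketch ignores the possibility, raised above, that bot-vs-human results have lower variance than human-vs-human ones. The names are illustrative.

```python
from math import comb

def win_prob_from_elo(diff):
    """Per-game win probability for an Elo gap `diff` (logistic curve)."""
    return 1 / (1 + 10 ** (-diff / 400))

def at_least(k, n, p):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def mini_match(p_game, best_of):
    """Bot wins a best-of-`best_of` mini-match. Stopping once someone has a
    majority doesn't change who wins, so all games can be treated as played."""
    return at_least(best_of // 2 + 1, best_of, p_game)

def bot_wins_challenge(p_game, defenders=3, best_of=3, bot_needs=None):
    """The bot must win `bot_needs` mini-matches; by default, all of them."""
    if bot_needs is None:
        bot_needs = defenders
    return at_least(bot_needs, defenders, mini_match(p_game, best_of))

even = 0.5
print(bot_wins_challenge(even, defenders=3))               # must beat all 3
print(bot_wins_challenge(even, defenders=3, bot_needs=2))  # majority of 3
print(bot_wins_challenge(even, defenders=9))               # must beat all 9
print(bot_wins_challenge(even, defenders=9, bot_needs=5))  # majority of 9

weak = win_prob_from_elo(-100)  # bot 100 Elo below every defender
print(bot_wins_challenge(weak, bot_needs=2, best_of=3))    # about 0.21
print(bot_wins_challenge(weak, bot_needs=2, best_of=5))    # about 0.157
```

Under these assumptions, an evenly matched bot's chance falls from 1/8 to about 0.002 when it must beat all nine defenders, while a majority rule keeps it at 1/2 however many defenders there are. Best-of-five mini-matches cut the weaker bot's fluke chance under the majority rule from about 21% to about 16%.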

But this is Omar's challenge and it is his rules.  He can make it as unfair as he wants it...   ;D

Title: Re: 2012 Arimaa Challenge
Post by Fritzlein on Apr 16th, 2012, 1:25pm

on 04/16/12 at 12:22:25, mistre wrote:
I think a MORE fair way to do it (and would make more sense) would have been for humans to have to win a majority of the matches in a majority of the pairings.  So, 2 of the 3 challengers would have to go at least 2-1 vs the bot.  The 3rd challenger could still go 0-3, so the bot could still finish with a winning record and lose the challenge (but only 5-4 and not 7-2).

That's the scenario for which I ran the math.


Quote:
If you want to reduce the possibility of a fluke, then make each series best of 5 instead of best of 3.

But still, a bot could lose three years, win one year, then lose the next three years, right?  How is it fair that the bot can win one even Challenge match in seven years, lose six of them, and still be considered better than all humans?


Quote:
But this is Omar's challenge and it is his rules.  He can make it as unfair as he wants it...   ;D

Indeed, our arguing isn't going to change the rules, and it shouldn't.  I was upset when Omar changed the rules in the past, and I will be upset again if he changes the rules in the future.  You lose credibility in a big hurry if you offer a monetary prize and then change what people need to do to win it after they have already put in a bunch of effort.

How we talk about the rules does matter, though, even if the rules are set in stone.  I think it is important to understand that the unfairness (which I freely admit) of having to win all three matches instead of just two is balanced by the unfairness (which you haven't acknowledged) that the bot only has to win one year to win for all time, even if the bot isn't yet good enough to win a majority of games in a majority of years.

I do hope that we are never in a situation where it looks like a bot is able to win a majority of games from all humans, but humanity keeps defending the Challenge on a fluke.  Humans defending with a 2-7 score would be embarrassing, because it would suggest that even the player who won his match 2-1 might not be better than the bot, and might have merely gotten lucky.

Title: Re: 2012 Arimaa Challenge
Post by mistre on Apr 16th, 2012, 3:47pm

on 04/16/12 at 13:25:31, Fritzlein wrote:
That's the scenario for which I ran the math.

I do hope that we are never in a situation where it looks like a bot is able to win a majority of games from all humans, but humanity keeps defending the Challenge on a fluke.  Humans defending with a 2-7 score would be embarrassing, because it would suggest that even the player who won his match 2-1 might not be better than the bot, and might have merely gotten lucky.


From an earlier post - Btw: if bots have a lower variance in winning (they very consistently beat players below a certain mark, and very consistently lose to players above a different mark) isn't the probability distribution of the performance rating dependent on the strength of opponents in a way that a human's performance rating wouldn't?

I know that the comment above was not proven in the screening period, but you stated that you believed the statement and that it might not have proven true due to the small sample size of games.  If the statement above is true, does that make the math you ran for that scenario unreliable?

I can't prove my theory, but I see the scenario of a bot losing the challenge with a 2-7 record as more likely to occur at some point than the challengers losing by a fluke (all finishing 1-2 even though they are all stronger than the bot).  I guess time will tell.

Another point I would like to make - even at the current playing strength of the top bots, there exists the possibility that certain players are just better at botbashing versus playing humans. For example Harren and Max (who went a combined 7-1 in the screening), might have fared better in the challenge than Nombril even though they are ranked lower than Nombril in WHR.  There actually might come a time when the best players for the challenge might not actually be the strongest players vs humans....

Title: Re: 2012 Arimaa Challenge
Post by Fritzlein on Apr 16th, 2012, 9:43pm

on 04/16/12 at 15:47:49, mistre wrote:
From an earlier post - Btw: if bots have a lower variance in winning (they very consistently beat players below a certain mark, and very consistently lose to players above a different mark) isn't the probability distribution of the performance rating dependent on the strength of opponents in a way that a human's performance rating wouldn't?

I know that the comment above was not proven in the screening period, but you stated that you believed the statement and that it might not have proven true due to the small sample size of games.  If the statement above is true, does that make the math you ran for that scenario unreliable?

Yes, if bot vs. human games have a lower variance than human vs. human games, then that throws off my calculation.  That would make any result more indicative of the true man vs. machine relationship.


Quote:
I can't prove my theory, but I see the scenario of a bot losing the challenge with a 2-7 record as more likely to occur at some point than the challengers losing by a fluke (all finishing 1-2 even though they are all stronger than the bot).  I guess time will tell.

Indeed, it looks like the Challenge is going to be close enough down the wire that it could play out in various flukey ways.  It is looking more likely that the exact rules will matter before 2020.


Quote:
Another point I would like to make - even at the current playing strength of the top bots, there exists the possibility that certain players are just better at botbashing versus playing humans. For example Harren and Max (who went a combined 7-1 in the screening), might have fared better in the challenge than Nombril even though they are ranked lower than Nombril in WHR.  There actually might come a time when the best players for the challenge might not actually be the strongest players vs humans....

I agree, although I expect the effect is weaker with a new bot that folks haven't had time to work out a formula for.

Title: Re: 2012 Arimaa Challenge
Post by Arimabuff on Apr 17th, 2012, 6:52am
I think the rules of the challenge should reflect what we are trying to prove with it. For example, if we wanted to make it absolutely certain that a bot dominates humanity, we could require that a bot that wins an equitable challenge against, say, three humans then play an entire year, without changes, against human players on this site. If after that year the bot wins again under the same equitable conditions against three players, there shouldn't be any doubt that the bot is definitely superior to humanity.

Title: Re: 2012 Arimaa Challenge
Post by hyperpape on Apr 17th, 2012, 7:22am
While it might be nice for the Arimaa challenge to perfectly reflect the question of who is stronger, it seems more like a signpost. Any bot that can win the challenge is capable of playing at about the same level as the top humans. A judgment of whether it's just close, equal or above them has to depend on the circumstances.

I think Patrick's idea of seeing how the bot fares after open play is a good one. It just doesn't have to be attached to the prize.

Title: Re: 2012 Arimaa Challenge
Post by hyperpape on Apr 17th, 2012, 7:39am
I've been recording information about the match as it progresses on the wiki, but as far as analysis, I'm limited to reporting what others say. If anyone else wants to add thoughts about the way the games have gone, feel free to do so (or relay them to me if you don't have an account).


