Arimaa Forum « 2010 Challenge Screening »


   Author  Topic: 2010 Challenge Screening  (Read 3581 times)
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2010 Challenge Screening
« Reply #15 on: Mar 31st, 2010, 5:04pm »

Congratulations to marwin on a one point victory in the screening!

Year  Pairs  Decisive  Winner / Score / Perf  Loser / Score / Perf
----  -----  --------  ---------------------  --------------------
2007     12         2  bomb / 2 / 2087        Zombie / 0 / 1876
2008     16         7  bomb / 6 / 1918        sharp / 1 / 1576
2009     23         7  clueless / 5 / 1910    GnoBot / 2 / 1792
2010     25        11  marwin / 6 / 2065      clueless / 5 / 1960

I would have guessed clueless would perform in the 1950-2000 range and marwin in the 2000-2050 range.  Marwin's actual performance rating of 2065 in the screening is scary good.  However, both camelback and robinson had winning positions which they had to abandon due to time constraints.  Removing these two games drops marwin's performance rating to 2029, and scoring them as wins for the humans would take marwin all the way down to 1994.
 
So I don't think marwin is better than expected, just a significant step forward and not a huge one.  If we estimate that Bomb played at 1850 in 2004 and marwin now plays at 2050 in 2010, that's a rate of progress of 33 rating points per year.  We'll have to see whether that long-term rate projects linearly into the future, or continues its more recent spike.

aaaa
Forum Guru
*****



Arimaa player #958

   


Posts: 768
Re: 2010 Challenge Screening
« Reply #16 on: Mar 31st, 2010, 7:57pm »

Congratulations again, tize. I noticed I'm the only one who gave either bot 2 net points, but fortunately, the fact that my games had more total influence than anyone else's didn't make the difference, if only barely. Maybe next time, for fairness' sake, the number of different opponents with unequal results should be considered first, with the current scoring system being the first tiebreaker and, finally, the championship result. Thoughts, anyone?
tize
Forum Guru
*****



Arimaa player #3121

   


Gender: male
Posts: 118
Re: 2010 Challenge Screening
« Reply #17 on: Apr 2nd, 2010, 2:26am »

Thank you guys.
 
I never would have guessed that the screening could be this even, with marwin and clueless taking turns being ahead with just a few days left before the finish.
 
Quote:

If we estimate that Bomb played at 1850 in 2004 and marwin now plays at 2050 in 2010, that's a rate of progress of 33 rating points per year.

If we say that this year's hardware was 8 times faster than 2004's hardware and that a doubling of speed gives 100 points, then the software improvements have a negative progress rate of about 16 rating points per year. Shocked
 
I better stop "improving" marwin...  Undecided
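For anyone who wants to check the arithmetic, here is a minimal Python sketch of the estimate above. The function name software_rate and its parameters are made up for illustration; the inputs (1850 in 2004, 2050 in 2010, 8x faster hardware, 100 points per doubling) are the assumptions stated in the post, not measured values.

```python
import math


def software_rate(total_gain, years, speedup, points_per_doubling):
    """Rating points per year left over for software after subtracting hardware.

    All arguments are assumed estimates, not measurements.
    """
    doublings = math.log2(speedup)                       # 8x faster -> 3 doublings
    hardware_rate = doublings * points_per_doubling / years
    overall_rate = total_gain / years                    # 200 points / 6 years
    return overall_rate - hardware_rate


# Bomb at 1850 in 2004, marwin at 2050 in 2010, hardware 8x faster,
# 100 rating points per doubling of speed.
rate = software_rate(2050 - 1850, 2010 - 2004, 8, 100)
print(round(rate, 1))
```

Under these assumptions the overall rate is about 33 points per year and the hardware share is 50, which leaves roughly -17 for software. The result is sensitive to the per-doubling figure: anything below roughly 67 points per doubling makes the software contribution positive.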
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2010 Challenge Screening
« Reply #18 on: Apr 2nd, 2010, 4:47am »

on Apr 2nd, 2010, 2:26am, tize wrote:
If we say that this year's hardware was 8 times faster than 2004's hardware and that a doubling of speed gives 100 points

The amount of rating improvement from doubling CPU speed is a figure I am very interested in.  It appears to be a bit less than 100 for chess.  I doubt it would be more for Arimaa than for chess; I waver between thinking the benefit of CPU doubling will be less for Arimaa and thinking it will be about the same for both games.
 
As for the software improvement represented by marwin, one could also make a case that Bomb played at 1850 strength in 2008, so marwin playing at 2050 in 2010 represents a rate of progress of 100 rating points per year.  Even assuming 40 points of hardware progress per year (particularly generous since a quad core doesn't search 4x the nodes), that still leaves 60 points per year due to better software.  Smiley

chubb
Forum Newbie
*



Arimaa player #3740

   


Gender: male
Posts: 1
Re: 2010 Challenge Screening
« Reply #19 on: Apr 5th, 2010, 12:56am »

Hi,
 
Could you tell me what screening means exactly? Why let the bots play against each other again after the computer championship? To find their rating?

Apart from that, I am curious about the challenge matches. They should have started yesterday, but I can't find anything about them, and the first match for bot_Marwin is scheduled for Friday. It would be cool if coverage of the matches were easier to find out about in advance and to follow.
 
Thank you,
chubb
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2010 Challenge Screening
« Reply #20 on: Apr 5th, 2010, 5:07am »

on Apr 5th, 2010, 12:56am, chubb wrote:
Could you tell me what screening means exactly? Why let the bots play against each other again after the computer championship? To find their rating?

When the Arimaa Challenge was first established, the computer challenger was simply the winner of the Computer Championship.  Starting in 2007, the rules were changed so that the top two computers from the Computer Championship have a playoff (screening) for the right to be the challenger.  But the bots do not play off against each other; they play against humans.
 
It seems that it would be possible to develop a bot that plays well against other bots and poorly against humans; this is not the kind of bot we want in the Arimaa Challenge.  The Computer Championship only tells us which bot plays well against other bots, whereas the screening shows how well the top two bots play against humans.  So far, the winner of the Computer Championship has also won the screening in every year, but I don't expect the trend to continue indefinitely.  The primary purpose of the screening is to keep developers focused on winning the Arimaa Challenge, instead of just trying to win the Computer Championship and giving up on beating humans.
 
A secondary reason for the screening is to prevent a bot from winning the Arimaa Challenge with glaring weaknesses that humans can exploit, but didn't have time to figure out.  Secrecy is the friend of software that can't learn and adapt.  The screening gives humans time to test out various strategies against the computer challenger and see which ones are most effective.  This makes it far less likely for a computer to win the Arimaa Challenge only to be busted a month later.
 
Quote:
Apart from that, I am curious about the challenge matches. They should have started yesterday, but I can't find anything about them, and the first match for bot_Marwin is scheduled for Friday. It would be cool if coverage of the matches were easier to find out about in advance and to follow.

I'll let Omar field that question.  Apparently one of our three Arimaa Challenge defenders has gone incommunicado.

omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: 2010 Challenge Screening
« Reply #21 on: Apr 5th, 2010, 8:54am »

on Apr 5th, 2010, 12:56am, chubb wrote:
Hi,
 
Could you tell me what screening means exactly? Why let the bots play against each other again after the computer championship? To find their rating?

Apart from that, I am curious about the challenge matches. They should have started yesterday, but I can't find anything about them, and the first match for bot_Marwin is scheduled for Friday. It would be cool if coverage of the matches were easier to find out about in advance and to follow.
 
Thank you,
chubb

 
The first round of games will be played this week. The challenge defenders get to select the time for their games. I have scheduled the games for the first round. In the gameroom look in the 'Scheduled Games' section.
tize
Forum Guru
*****



Arimaa player #3121

   


Gender: male
Posts: 118
Re: 2010 Challenge Screening
« Reply #22 on: Apr 26th, 2010, 1:06pm »

Since we talked about how much of a rating increase a doubling of CPU power would give, I have run a little experiment with marwin.

I've let marwin play itself with different amounts of thinking time to get a rough idea of the rating difference from a CPU doubling when two players have the same strategic strength.
 
And here's what I got:
Matchup      Games  Won  Winning %  Rating per doubling
-----------  -----  ---  ---------  -------------------
1s vs 2s        80   49         61                   79
2s vs 10s       56   44         78                   96
10s vs 20s      72   45         62                   88
15s vs 30s      88   59         67                  123
15s vs 60s      30   24         80                  120

 
Which means that a doubling of cpu power gives you about 100 rating points in the best case. When facing humans or other bots I assume that the rating difference is smaller.
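A rough sketch of how the "rating per doubling" column can be reproduced. It assumes the standard Elo logistic formula, that the "Won" column counts wins for the side with more thinking time, and that every game is a win or a loss. The post does not say how the column was computed, so this is a guess; it lands within a couple of points of the posted numbers. The function names are made up for illustration.

```python
import math


def elo_gap(wins, games):
    """Elo difference implied by a win fraction (standard logistic model).

    Undefined for a 0% or 100% score, where the implied gap is infinite.
    """
    p = wins / games
    return 400 * math.log10(p / (1 - p))


def elo_per_doubling(wins, games, short_time, long_time):
    """Spread the implied Elo gap evenly over the number of time doublings."""
    doublings = math.log2(long_time / short_time)
    return elo_gap(wins, games) / doublings


# (short time, long time, games, wins), copied from the table above.
results = [
    (1, 2, 80, 49),
    (2, 10, 56, 44),
    (10, 20, 72, 45),
    (15, 30, 88, 59),
    (15, 60, 30, 24),
]

for short, long_, games, wins in results:
    per_doubling = elo_per_doubling(wins, games, short, long_)
    print(f"{short}s vs {long_}s: {per_doubling:.1f} points per doubling")
```

Under this model the last row (24 wins out of 30) implies a total gap of about 241 points, so the posted figure of 120 corresponds to two doublings, which is consistent with 15s vs 60s. With only 30 to 88 games per row, each estimate carries a wide margin of error.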
omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: 2010 Challenge Screening
« Reply #23 on: Apr 26th, 2010, 3:36pm »

Thanks for posting this. For the last one did you mean 30s vs 60s?
 
I am surprised it is gaining 100 points per doubling. In chess they say it is about 50 to 70 Elo points per doubling.
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2010 Challenge Screening
« Reply #24 on: Apr 26th, 2010, 8:57pm »

Thanks for running that experiment, tize.  I'm very interested in the results.  Since we were starting to drift a little off topic, I replied in this thread.