Arimaa Forum - 2010 Challenge Screening

Topic: 2010 Challenge Screening (Read 3581 times)

Fritzlein
Forum Guru

Arimaa player #706

Posts: 5928

Re: 2010 Challenge Screening
« Reply #15 on: Mar 31st, 2010, 5:04pm »

Congratulations to marwin on a one point victory in the screening!

Year  Pairs  Decisive  Winner / Score / Perf  Loser / Score / Perf
----  -----  --------  ---------------------  --------------------
2007  12     2         bomb / 2 / 2087        Zombie / 0 / 1876
2008  16     7         bomb / 6 / 1918        sharp / 1 / 1576
2009  23     7         clueless / 5 / 1910    GnoBot / 2 / 1792
2010  25     11        marwin / 6 / 2065      clueless / 5 / 1960

I would have guessed clueless would perform in the 1950-2000 range and marwin in the 2000-2050 range. Marwin's actual performance rating of 2065 in the screening is scary good. However, both camelback and robinson had winning positions which they had to abandon due to time constraints. Removing these two games drops marwin's performance rating to 2029, and scoring them as wins for the humans would take marwin all the way down to 1994.

So I don't think marwin is better than expected, just a significant step forward and not a huge one. If we estimate that Bomb played at 1850 in 2004 and marwin now plays at 2050 in 2010, that's a rate of progress of 33 rating points per year. We'll have to see whether that long-term rate projects linearly into the future, or continues its more recent spike.

aaaa
Forum Guru

Arimaa player #958

Posts: 768

Re: 2010 Challenge Screening
« Reply #16 on: Mar 31st, 2010, 7:57pm »

Congratulations again, tize. I noticed I'm the only one who gave either bot 2 net points, but fortunately the fact that my games had more total influence than anyone else's didn't make the difference, if only barely. Maybe next time, for fairness' sake, the number of different opponents with unequal results should be considered first, with the current scoring system as the first tiebreaker and the championship result as the last. Thoughts, anyone?

tize
Forum Guru

Arimaa player #3121

Gender: male

Posts: 118

Re: 2010 Challenge Screening
« Reply #17 on: Apr 2nd, 2010, 2:26am »

Thank you guys.

I never would have guessed that the screening could be this even, with marwin and clueless taking turns being ahead with just a few days left before the finish.

Quote:

If we estimate that Bomb played at 1850 in 2004 and marwin now plays at 2050 in 2010, that's a rate of progress of 33 rating points per year.

If we say that this year's hardware was 8 times faster than 2004's hardware and that a doubling of speed gives 100 points, then the software improvements have a negative progress rate of about 16 rating points per year.

I better stop "improving" marwin...
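The arithmetic behind that estimate can be sketched as follows. This is only an illustration of the reasoning in the post: the function name and parameters are made up for the example, and the inputs (200 points gained over 6 years, hardware 8 times faster, 100 points per doubling) are the rough guesses quoted above, not measurements.

```python
import math

def software_points_per_year(total_gain, years, speedup, points_per_doubling):
    """Split a yearly rating gain into hardware and software parts.

    Hypothetical helper: assumes rating grows linearly with the number
    of hardware doublings, as supposed in the post.
    """
    doublings = math.log2(speedup)                     # 8x faster -> 3 doublings
    hardware_per_year = doublings * points_per_doubling / years
    total_per_year = total_gain / years
    return total_per_year - hardware_per_year          # what is left for software

# Bomb at 1850 in 2004, marwin at 2050 in 2010, 8x hardware, 100 points per doubling
estimate = software_points_per_year(2050 - 1850, 2010 - 2004, 8, 100)
print(round(estimate, 1))
```

With these inputs the hardware alone would account for 50 points per year against a total of about 33, which is where the negative software figure comes from.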

Fritzlein
Forum Guru

Arimaa player #706

Posts: 5928

Re: 2010 Challenge Screening
« Reply #18 on: Apr 2nd, 2010, 4:47am »

on Apr 2nd, 2010, 2:26am, tize wrote:

If we say that this year's hardware was 8 times faster than 2004's hardware and that a doubling of speed gives 100 points

The amount of rating improvement from doubling CPU speed is a figure I am very interested in. It appears to be a bit less than 100 for chess. I doubt it would be more for Arimaa than for chess; I waver between thinking the benefit of CPU doubling will be less for Arimaa and thinking it will be about the same for both games.

As for the software improvement represented by marwin, one could also make a case that Bomb played at 1850 strength in 2008, so marwin playing at 2050 in 2010 represents a rate of progress of 100 rating points per year. Even assuming 40 points of hardware progress per year (particularly generous since a quad core doesn't search 4x the nodes), that still leaves 60 points per year due to better software.

chubb
Forum Newbie

Arimaa player #3740

Gender: male

Posts: 1

Re: 2010 Challenge Screening
« Reply #19 on: Apr 5th, 2010, 12:56am »

Hi,

Could you tell me what exactly screening means? Why let the bots play against each other again after the Computer Championship? To find their rating?

Apart from that, I am curious about the challenge matches. They should have started yesterday, but I can't find anything about them, and the first match for bot_Marwin is scheduled for Friday. It would be cool if the match schedule were easier to find in advance and the matches easier to follow.

Thank you,
chubb

Fritzlein
Forum Guru

Arimaa player #706

Posts: 5928

Re: 2010 Challenge Screening
« Reply #20 on: Apr 5th, 2010, 5:07am »

on Apr 5th, 2010, 12:56am, chubb wrote:

Could you tell me what exactly screening means? Why let the bots play against each other again after the Computer Championship? To find their rating?

When the Arimaa Challenge was first established, the computer challenger was simply the winner of the Computer Championship. Starting in 2007, the rules were changed so that the top two computers from the Computer Championship have a playoff (screening) for the right to be the challenger. But the bots do not play off against each other; they play against humans.

It seems that it would be possible to develop a bot that plays well against other bots and poorly against humans; this is not the kind of bot we want in the Arimaa Challenge. The Computer Championship only tells us which bot plays well against other bots, whereas the screening shows how well the top two bots play against humans. So far, the winner of the Computer Championship has also won the screening in every year, but I don't expect the trend to continue indefinitely. The primary purpose of the screening is to keep developers focused on winning the Arimaa Challenge, instead of just trying to win the Computer Championship and giving up on beating humans.

A secondary reason for the screening is to prevent a bot from winning the Arimaa Challenge with glaring weaknesses that humans can exploit, but didn't have time to figure out. Secrecy is the friend of software that can't learn and adapt. The screening gives humans time to test out various strategies against the computer challenger and see which ones are most effective. This makes it far less likely for a computer to win the Arimaa Challenge only to be busted a month later.

Quote:

Apart from that, I am curious about the challenge matches. They should have started yesterday, but I can't find anything about them, and the first match for bot_Marwin is scheduled for Friday. It would be cool if the match schedule were easier to find in advance and the matches easier to follow.

I'll let Omar field that question. Apparently one of our three Arimaa Challenge defenders has gone incommunicado.

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: 2010 Challenge Screening
« Reply #21 on: Apr 5th, 2010, 8:54am »

on Apr 5th, 2010, 12:56am, chubb wrote:

The first round of games will be played this week. The challenge defenders get to select the time for their games. I have scheduled the games for the first round. In the gameroom look in the 'Scheduled Games' section.

tize
Forum Guru

Arimaa player #3121

Gender: male

Posts: 118

Re: 2010 Challenge Screening
« Reply #22 on: Apr 26th, 2010, 1:06pm »

Since we talked about how much of a rating increase a doubling of CPU power would give, I ran a little experiment with marwin.

I've let marwin play itself with different thinking times to get a rough idea of the rating difference from a CPU doubling when the two players have the same strategic strength.

And here's what I got:

Match        Games  Won  Winning %  Rating per doubling
1s vs 2s     80     49   61         79
2s vs 10s    56     44   78         96
10s vs 20s   72     45   62         88
15s vs 30s   88     59   67         123
15s vs 60s   30     24   80         120

This means that a doubling of CPU power gives about 100 rating points in the best case. When facing humans or other bots, I assume the rating difference is smaller.
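For anyone who wants to redo the conversion, here is a minimal sketch of one way to turn a win count into "rating per doubling". It assumes the standard Elo logistic formula and assumes that "Won" counts the wins of the side with more thinking time; the post does not say which formula was actually used, and the function names are invented for the example.

```python
import math

def elo_difference(wins, games):
    """Rating gap implied by a score, under the usual Elo logistic model.

    Assumes 0 < wins < games; a 0% or 100% score has no finite gap.
    """
    p = wins / games
    return 400 * math.log10(p / (1 - p))

def rating_per_doubling(wins, games, short_time, long_time):
    """Spread the implied rating gap over the number of time doublings."""
    doublings = math.log2(long_time / short_time)   # e.g. 15s vs 60s -> 2
    return elo_difference(wins, games) / doublings

# (label, wins, games, shorter time in s, longer time in s), from the table above
results = [
    ("1s vs 2s", 49, 80, 1, 2),
    ("2s vs 10s", 44, 56, 2, 10),
    ("10s vs 20s", 45, 72, 10, 20),
    ("15s vs 30s", 59, 88, 15, 30),
    ("15s vs 60s", 24, 30, 15, 60),
]

for label, wins, games, short, long_ in results:
    print(f"{label:<12}{rating_per_doubling(wins, games, short, long_):7.1f}")
```

Under these assumptions the values come out within a point or two of the last column of the table, including the 15s vs 60s row when it is treated as two doublings.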

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: 2010 Challenge Screening
« Reply #23 on: Apr 26th, 2010, 3:36pm »

Thanks for posting this. For the last one did you mean 30s vs 60s?

I am surprised it is gaining 100 points per doubling. In chess they say it is about 50 to 70 Elo points per doubling.

Fritzlein
Forum Guru

Arimaa player #706

Posts: 5928

Re: 2010 Challenge Screening
« Reply #24 on: Apr 26th, 2010, 8:57pm »

Thanks for running that experiment, tize. I'm very interested in the results. Since we were starting to drift a little off topic, I replied in this thread.
