Arimaa Forum « Will the 2010 Computer Championship be open? »


   Author  Topic: Will the 2010 Computer Championship be open?  (Read 14308 times)
Fritzlein
Forum Guru
*****



Arimaa player #706

   

Gender: male
Posts: 5928
Will the 2010 Computer Championship be open?
« on: Aug 10th, 2009, 5:29pm »

In the past we have had sporadic discussion of whether the Computer Championship should be invitation-only for the top bots, or should be open to all interested parties.  The debate never came to a head, though, because the limit of eight imposed by Omar has always been greater than the number of interested participants.  In 2010 this theoretical discussion will probably have practical importance for the first time due to more than eight developers wanting to participate.
 
I am vehemently in favor of the Computer Championship being open, because there is no good way to narrow the field.  The rules last year stated that the eight bots with the highest rating would qualify, but ratings can easily be manipulated, so this is a terrible selection criterion.  For starters, the faster a bot plays, the higher its rating.  I can imagine developers trying to qualify by having their bots play only games at five seconds per move!  But even worse, a developer could pick an opposing bot that their own bot knows how to beat, and play that bot incessantly.  Do we really want a bot to qualify by virtue of having beaten ArimaaScoreP1 five hundred times?
 
If we had a better way of selecting the top eight bots, such as the Open Classic tournament which decides the eight finalists for the human World Championship, then I would have no problems with a limited field for the Computer Championship.  But qualifying on the basis of ratings is so prone to abuse that it must be scrapped.
 
And what is the problem with having more bots, anyway?  With the floating triple elimination format, each additional bot adds only three additional games to the tournament length.  Having ten or twelve bots instead of only eight would still be very doable in the short time frame of the server rentals.  In fact, it would be wonderful to see so many bots signed up.
 
If we are afraid of too many spurious entrants, then we could always raise the entry fee to $30, so that instead of needing one win to break even, a bot would need three wins, i.e. an even 3-3 record.  I believe that a higher entry fee would be less of a burden on serious developers than having to waste time artificially pumping up the bot's rating in order to be sure of being in the top eight.
 
How do other people feel about limiting the field of the Computer Championships to eight bots versus having an open tournament?  If Omar declares that the field must be narrowed somehow, what would be the best way to narrow it?

99of9
Forum Guru
*****




Gnobby's creator (player #314)

  toby_hudson  


Gender: male
Posts: 1413
Re: Will the 2010 Computer Championship be open?
« Reply #1 on: Aug 10th, 2009, 5:48pm »

I'm glad you raised this Fritz, I think it's important to think about.  I'm not sure what the best system is, but here's one thing I already know:
 
The previous year's champion (and perhaps places 2&3) should be given automatic entry.  It would be very disappointing not to see the champion defend its title because of ratings or any other selection reason.
« Last Edit: Aug 10th, 2009, 5:49pm by 99of9 »
Simon
Forum Guru
*****



Arimaa player #1198

   


Gender: male
Posts: 125
Re: Will the 2010 Computer Championship be open?
« Reply #2 on: Aug 10th, 2009, 9:58pm »

I agree that it's preferable not to exclude any bots based on ratings. Having a preliminary set of rounds for qualification would also allow seeding based on something other than highly manipulable gameroom ratings, and for bots it doesn't need to add all that much time to the tournament. If there are only a few more than 8 bots, though, it would probably not be worth the trouble, so just extending the existing tournament format to more bots as you suggest would make sense.
 
Automatic qualification of top previous competitors makes sense if there are exclusions based on ratings, but if there is a pre-qualification tournament I think every bot should compete equally.
jdb
Forum Guru
*****



Arimaa player #214

   


Gender: male
Posts: 682
Re: Will the 2010 Computer Championship be open?
« Reply #3 on: Aug 11th, 2009, 5:06am »

If Omar posts the schedule in November, I guess this is a good time to discuss the tournament.
 
1. Instead of completely open, the entrants should have to demonstrate a winning record against some fixed performance bot. My first thought would be something along the lines of bot_arimaazilla. This would keep the random move level bots out, but not set the bar too high.
 
2. A hybrid tournament format of a round robin followed by a floating elimination, with losses from the round robin carried forward, is worth looking at. It performs much better than straight floating elimination in the simulator, and allows early rounds to be scheduled in advance. With this method, all the bots get to play each other.
Janzert
Forum Guru
*****



Arimaa player #247

   


Gender: male
Posts: 1016
Re: Will the 2010 Computer Championship be open?
« Reply #4 on: Aug 11th, 2009, 9:13am »

I really like Jdb's hybrid format proposal. One thing to keep in mind, though: anything that increases the number of games needed for the whole tournament is going to require more automation, since I believe Omar is already pretty close to the limit of time he can spend running the tournament.
 
For Jdb's proposal specifically, this probably means at least the round-robin portion of the tournament would need to be able to run without any intervention by Omar once it was started. Certainly something that is possible to do, but is not in place currently (in past CCs Omar has had to start each game manually).
 
Janzert
RonWeasley
Forum Guru
*****




Harry's friend (Arimaa player #441)

   


Gender: male
Posts: 882
Re: Will the 2010 Computer Championship be open?
« Reply #5 on: Aug 11th, 2009, 9:44am »

I would like to see specific language in the rules about handling server failures or degradations.  My policy of continuing the game at the point of failure was not unanimously supported and it put quite a burden on Omar.  If the rules are amended to call for a restart, for example, whenever a server failure is detected, the TD would not have to be consulted each time and tournament management would be more tractable.
 
Also think about handling server failures in the qualifying games and the effects of restarts or continuations on players.  A simple restart policy might be the most effective in dealing with scheduling.
Fritzlein
Forum Guru
*****



Arimaa player #706

   

Gender: male
Posts: 5928
Re: Will the 2010 Computer Championship be open?
« Reply #6 on: Aug 11th, 2009, 10:55am »

The tournament format of single round-robin followed by elimination among the top bots, with losses carried forward, is attractive assuming automatic scheduling can be made to work.  Given all the server problems of last year, however, I wouldn't take it as a given that we can just set up the schedule and let the tournament run.  Also there is the difficulty of letting bots run automated updates; between each round some automatic update time would have to be scheduled as well.
 
My biggest concern with the hybrid format, however, is that it inherently requires the field of bots to be limited.  The number of necessary games in a single round robin is N(N-1)/2, which scales quadratically, whereas the number of games in a floating triple elimination is about 3N, which scales linearly.  If I ever get to see a fifteen-bot Computer Championship, it will not be a round robin.  :(
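
To put rough numbers on that scaling, here is a small sketch. It assumes a single round robin in which every pair of bots meets once, and it assumes every game has a winner, so that each game hands out exactly one loss. The function names are made up for illustration.

```python
def round_robin_games(n: int) -> int:
    """Games in a single round robin: every pair of bots meets once."""
    return n * (n - 1) // 2


def triple_elim_game_bounds(n: int) -> tuple[int, int]:
    """(fewest, most) games in a triple elimination with n bots.

    Each game produces exactly one loss. Every bot except the champion
    leaves with 3 losses, and the champion finishes with 0 to 2 losses.
    """
    fewest = 3 * (n - 1)
    return fewest, fewest + 2


for n in (8, 12, 15):
    fewest, most = triple_elim_game_bounds(n)
    print(f"{n} bots: round robin {round_robin_games(n)} games, "
          f"triple elimination {fewest}-{most} games")
```

Under these assumptions a fifteen-bot field needs 105 round-robin games, against at most 44 games of triple elimination.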
 
That brings us back to narrowing the field before the tournament.  If we must cap the field at eight players, I like jdb's idea of using bots rather than ratings as a qualifying standard.  Ideally, however, that qualifying benchmark would select exactly the top eight.  If the criterion is a winning record against Arimaazilla (or any binary indicator), we might accidentally limit the field to six or accidentally expand it to eleven.
 
In order to provide more discrimination we could limit the field to those bots which have passed the most bots in some list, say (ArimaaScoreP1, Gnobot2005P1, Arimaazilla, OpFor2008P2, Clueless2007P1, Bomb2005P2, OpFor2009Blitz, Clueless2009Blitz).  I would include bots that are not fixed-performance in order to eliminate the time handicap at the high end.  To pass a bot would mean beating it twice in a row, playing different colors in the two games.  Ties could be broken by who first passed the tied number of bots.  That should give adequate discrimination between any bots on the bubble of being in the top eight or not, as well as giving a reasonable seeding among the qualifiers.
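
The passing rule can be sketched in a few lines. This is only an illustration under some assumptions that the proposal leaves open: "twice in a row" is read as two consecutive games against the same benchmark bot (games against other bots in between do not break the pair), and the `Game` record with its `when` timestamp is a hypothetical data layout, not anything the gameroom actually provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    opponent: str  # name of the benchmark bot
    color: str     # side the entrant played: "gold" or "silver"
    won: bool      # True if the entrant won
    when: int      # hypothetical timestamp, used only for tie-breaking


def passed_bots(games: list[Game]) -> dict[str, int]:
    """Return {benchmark bot: timestamp at which the entrant passed it}."""
    previous: dict[str, Game] = {}  # last game seen against each opponent
    passed: dict[str, int] = {}
    for game in sorted(games, key=lambda g: g.when):
        prev = previous.get(game.opponent)
        if (game.opponent not in passed and game.won
                and prev is not None and prev.won
                and prev.color != game.color):
            passed[game.opponent] = game.when
        previous[game.opponent] = game
    return passed


def ladder_key(games: list[Game]) -> tuple[int, int]:
    """Sort key: more bots passed ranks first, earlier completion breaks ties."""
    passed = passed_bots(games)
    reached_at = max(passed.values(), default=0)
    return (-len(passed), reached_at)
```

Sorting the entrants by `ladder_key` of their game lists would put the bot that passed the most benchmark bots first, with ties going to whoever reached that count earliest.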
 
If we do throw up this hurdle, which will require developers to have their bots play lots of games to qualify for the Computer Championship, then it seems reasonable to give the top three finishers from the previous year an automatic exemption.  It would be possible for the #3 bot one year to be the #9 bot the next year, but it seems rather unlikely.  I like the courtesy of not forcing bots which have proven themselves to immediately prove themselves again.
 
A field limited to eight contestants by playing a small bot ladder, followed by a tournament in jdb's format, seems like a workable solution.  I confess, however, that my preferred solution is still floating triple elimination with an open field.  A developer who participates in the Computer Championship is doing the Arimaa community a favor by adding to the fold of bots available to play in the future.  This stable of computer opponents is a fabulous asset for arimaa.com.  Having any barriers to entry beyond the technical ones outlined in the rules (compiled to linux, a way to fix performance, etc.) seems unnecessary and counterproductive.

jdb
Forum Guru
*****



Arimaa player #214

   


Gender: male
Posts: 682
Re: Will the 2010 Computer Championship be open?
« Reply #7 on: Aug 11th, 2009, 12:32pm »

Bugs could be ironed out by running a blitz test tournament before the main event. This could be applied to whatever tournament format is used.
 
With the floating elimination, a bot could be 3 and out. It takes at least a hundred hours' work to get a decent bot. If someone has put that much time in, they should at least get their money's worth, so to speak.
 
Fritzlein
Forum Guru
*****



Arimaa player #706

   

Gender: male
Posts: 5928
Re: Will the 2010 Computer Championship be open?
« Reply #8 on: Aug 11th, 2009, 2:33pm »

on Aug 11th, 2009, 9:44am, RonWeasley wrote:
I would like to see specific language in the rules about handling server failures or degradations.

I totally agree.  It is not possible to cover every situation in the rules, so a tournament director will always be necessary, but if we don't make rules about situations that have already happened, we are asking for trouble.
 
The reason to write rules in advance is that it is impossible not to have one's judgement affected by knowing which bot is helped or hurt by a specific situation.  If we don't have a rule crafted in advance, then we will know the tournament standing and the board position when the server issues occurred.  That additional knowledge will make any decision made in the moment seem unfair.  If we weren't blessed with such a great TD as Ron, there would be the potential for a lot of hard feelings.
 
So, what are the specific bugs that cropped up?
 
1) The bot sent a move, but the server didn't get it.
2) The bot was sharing the CPU instead of having the whole box to itself.
3) Bot was misconfigured due to Omar's error.
 
Were there others?  My suggestion would be to unrate the game and resume from the terminal position for (1) and to unrate the game and replay it from the start for (2) and (3).  Unrating the game is important so that GnoBot (and others) can distinguish real tournament results in the game database from disqualified results.
 
Any condition that causes a game to be replayed should cause a game to be terminated and unrated immediately if the problem is discovered mid-game.  The game should not be played out to see what happens, because it shouldn't matter what happens.  In particular, it shouldn't matter if the disadvantaged bot was winning or had already won when the issue was discovered.  It is not fair for a bot to get a replay if it lost but get a win if it won; if there are to be replays at all, they must be automatically triggered by the playing conditions independent of the actual result.
 
On the other hand, I think the game result in all three cases should stand if the error is not detected until after another game has been played that was paired based on the questionable result.  There needs to be a statute of limitations so that we aren't forced to replay every subsequent game in the tournament if we discover something unfair about the very first game.
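
Written out as a decision procedure, the suggested rulings might look like the sketch below. The enum and function names are invented for illustration, and the three fault types are just the cases (1) to (3) listed above. The function deliberately takes no argument for the game result or board position, which is one way to make sure the ruling cannot depend on who was winning.

```python
from enum import Enum


class Fault(Enum):
    MOVE_LOST = 1      # (1) the bot sent a move, but the server didn't get it
    SHARED_CPU = 2     # (2) the bot did not have the whole box to itself
    MISCONFIGURED = 3  # (3) the bot was misconfigured by the organizer


class Action(Enum):
    RESULT_STANDS = "result stands"
    UNRATE_AND_RESUME = "unrate, resume from the terminal position"
    UNRATE_AND_REPLAY = "unrate, replay from the start"


def ruling(fault: Fault, dependent_game_played: bool) -> Action:
    """Pick the action for a detected fault.

    dependent_game_played is True once some later game has been played
    whose pairing was based on the questionable result; after that point
    the result stands (the "statute of limitations").
    """
    if dependent_game_played:
        return Action.RESULT_STANDS
    if fault is Fault.MOVE_LOST:
        return Action.UNRATE_AND_RESUME
    return Action.UNRATE_AND_REPLAY
```

The same ruling applies whether the fault is caught mid-game or after the game ends; in the mid-game case the game would simply be stopped and unrated at once.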
 
I am eager to see these cases spelled out in advance, so that we don't take actions based on fallacious reasoning like, "The outcome would have been the same, so there is no need to replay," or "Bomb would have won the whole tournament except for that first-round server error we caught later."  Every year bots become less deterministic, luck plays an increasing role, and it gets more ridiculous to say we know what would have happened had the conditions been different.  We need to make the rules as clear and as ironclad as possible in advance.  We can't prevent all unfairness due to unforeseen circumstances, but if the rules are agreed upon before we know who will suffer from bad luck, then at least no one can claim the TD's decisions were biased against a particular bot.

Janzert
Forum Guru
*****



Arimaa player #247

   


Gender: male
Posts: 1016
Re: Will the 2010 Computer Championship be open?
« Reply #9 on: Aug 11th, 2009, 3:36pm »

Probably your case 1 can and should be expanded to include all cases of server and network error. Two other specific errors that have occurred with the server/network in the last few years are: the bot did not receive the move until a significant time after the move was played, and the bot was stopped and restarted by the server in the middle of a move. The latter is an obvious occurrence; the former could potentially leave a gray area as to whether an error actually occurred or not.
 
Also when the server was incorrectly restarting bots, this was not discovered until a few rounds into the tournament (after it caused a timeout for one bot). This set a precedent to have previous games stand if the error wasn't discovered until later. It would certainly be good to spell it out in the rules though.
 
Janzert
omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: Will the 2010 Computer Championship be open?
« Reply #10 on: Aug 13th, 2009, 9:24pm »

Though it would be nice to have the tournament be open, it is much more practical to limit the number of entries. Even though triple elimination makes the tournament more practical by having about 3N games, N could always get large enough that it would get too difficult for me to run the tournament. So if the number of entrants in the tournament kept increasing each year I would eventually have to limit the number of entries anyways.
 
Only problem is that limiting the bots requires some way to determine which bots will be in the tournament. Also the tournament requires an initial ranking for the bots. In the past I've used the bots' gameroom ratings for ranking the bots and never had to use the ratings to filter out any bots from participating in the tournament. Using the gameroom ratings is not a good long-term solution, and as Karl mentioned, the ratings could be inflated by carefully selecting your opponents. Running a preliminary tournament (like the Swiss we use in the human championship) would be a better solution, but the burden of running another carefully controlled tournament is not something that I want to take on.
 
The proposed solution of playing against a field of bots and playing sufficient games to provide discrimination between the entrant bots seems like the best option. In a way this is kind of similar to using ratings to rank the bots, but the ratings are based on controlled games where the opponents are assigned rather than self chosen. So we could compute a performance rating for the entrants by assigning fixed ratings to the screening bots and use this performance rating to rank the bots and select the top 8 bots. Maybe we might have some cases where two or more entrants have the same performance rating and we might have to have just these bots play against each other to break the tie or we might want to use some metric based on number of moves in the screening games as a second level tie breaker.
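
One possible way to compute such a performance rating is sketched below. It assumes the usual Elo-style logistic expectation with a 400-point scale, which may or may not match how gameroom ratings are actually calculated, and the ratings in the example are made up. The performance rating is taken to be the rating at which the expected number of wins against the screening bots equals the actual number.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Elo-style expected score of a player against one opponent."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def performance_rating(results: list[tuple[float, int]],
                       low: float = 0.0, high: float = 4000.0) -> float:
    """Rating at which expected wins equal actual wins.

    results holds (screening bot's fixed rating, 1 for a win or 0 for a loss).
    A perfect or winless record has no finite answer, so in those cases the
    search simply converges to the `high` or `low` bound.
    """
    wins = sum(won for _, won in results)
    for _ in range(60):  # bisection: expected wins increase with rating
        mid = (low + high) / 2.0
        expected = sum(expected_score(mid, opp) for opp, _ in results)
        if expected < wins:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Made-up screening field: (fixed rating of screening bot, entrant's result)
screening = [(1400, 1), (1600, 1), (1800, 0), (2000, 0)]
print(round(performance_rating(screening)))
```

Because perfect and winless records are clamped to the search bounds, several entrants could share the same value at either extreme, so a second-level tie-breaker would still be needed there.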
 
To actually implement something like this we will need to specify several things including a time frame during which the screening games must be played; the field of bots to be used in the screening and the number of games to be played against each. During the screening period the entrant bots should not be allowed to play games against the screening bots which don't count towards the screening. The developers will be running their bot on their own hardware/os during the screening. The developers should be allowed to continue modifying the bot during the screening period. Also all bots have to participate in the screening regardless of how they did the previous year. The screening will determine their rank for seeding in the upcoming tournament. Also we have to assume that bots are looking at each other's game histories to determine how they will play against a particular opponent. So it would be unfair if some of the entrants did not have to play in the screening. Also we need to specify what happens when something goes wrong in a screening game, like a bot losing on time.
 
From last year's tournament we've gained more experience in some of the things that can go wrong and so incorporating how such situations should be handled into the rules would be good.
 
Maybe the best way to get started with this is to make a wiki page for the 2010 Computer Championship tournament rules; starting with a copy of last year's tournament rules and modify it based on the discussion here until we finalize it. Would anyone like to volunteer with creating and updating that page?
Fritzlein
Forum Guru
*****



Arimaa player #706

   

Gender: male
Posts: 5928
Re: Will the 2010 Computer Championship be open?
« Reply #11 on: Aug 14th, 2009, 12:06pm »

on Aug 13th, 2009, 9:24pm, omar wrote:
The proposed solution of playing against a field of bots and playing sufficient games to provide discrimination between the entrant bots seems like the best option.

The idea of using a set of bots to filter entrants has grown on me as well.  It adds a barrier to all bots that want to qualify, which is bad, but it is an equal-opportunity barrier, which is at least in the spirit of openness.  
 
Quote:
In a way this is kind of similar to using ratings to rank the bots, but the ratings are based on controlled games where the opponents are assigned rather than self chosen. So we could compute a performance rating for the entrants by assigning fixed ratings to the screening bots and use this performance rating to rank the bots and select the top 8 bots.[...]
 
To actually implement something like this we will need to specify several things including a time frame during which the screening games must be played; the field of bots to be used in the screening and the number of games to be played against each. During the screening period the entrant bots should not be allowed to play games against the screening bots which don't count towards the screening. The developers will be running their bot on their own hardware/os during the screening. The developers should be allowed to continue modifying the bot during the screening period.

Hmmm, your proposed implementation is rather different than mine.  I'm not sure about the relative merits of having a fixed number of games against a fixed set of benchmark bots, versus having the games be unlimited and having the qualifying bots try to collect scalps of the benchmark bots.
 
One disadvantage of the scalp collection might be excessive games.  Perhaps one developer has trouble beating Bomb2005P1, so he sets up his bot to play it incessantly until getting lucky enough to win two in a row.  We would want to reward the best bots, not the most persistent.
 
One disadvantage of the fixed number of games is having only one shot.  If a developer discovers a fixable bug during the series, the loss is already on the books.  This makes it harder to determine which bot is best as of the tournament entry date, as opposed to the best bot at the time it played its qualifying games.
 
Another disadvantage of the fixed number of games is that the benchmark bots become off-limits to developers for testing/tuning their bots.  We would want to make sure there are still plenty of non-benchmark bots for testing against.
 
If we do go with a fixed number of games, then there is no need to calculate performance ratings.  We can simply take the number of wins.  If there were two games each against ten benchmark bots, then the number of wins out of the twenty games is the correct measure, regardless of the ratings of the bots.  The only reason ratings are involved in measuring performance is to compare the relative value of a win against a strong opponent compared to a win against a weak opponent.  For qualifying, we should take strength of schedule out of the equation by defining the opposition.  When strength of schedule is fixed, the number of wins is the definitive measure of performance.
 
Quote:
Also all bots have to participate in the screening regardless of how they did the previous year. The screening will determine their rank for seeding in the upcoming tournament. Also we have to assume that bots are looking at each other's game histories to determine how they will play against a particular opponent. So it would be unfair if some of the entrants did not have to play in the screening.

For seeding alone we wouldn't have to push everyone through qualifying.  The top finishers from last year could be given not only the automatic berths, but also the top seeds.  But it is definitely true that it is not fair to have automatic qualifiers.  Now that you mention it, I agree that the fairness is more important than the courtesy to past winners.
 
Quote:
Maybe the best way to get started with this is to make a wiki page for the 2010 Computer Championship tournament rules; starting with a copy of last year's tournament rules and modify it based on the discussion here until we finalize it. Would anyone like to volunteer with creating and updating that page?

Sure, I'll volunteer to draft new rules.

jdb
Forum Guru
*****



Arimaa player #214

   


Gender: male
Posts: 682
Re: Will the 2010 Computer Championship be open?
« Reply #12 on: Aug 14th, 2009, 12:54pm »

Quote:
One disadvantage of the fixed number of games is having only one shot.  If a developer discovers a fixable bug during the series, the loss is already on the books.  This makes it harder to determine which bot is best as of the tournament entry date, as opposed to the best bot at the time it played its qualifying games.

 
Instead of using a fixed number of games against a set of bots, maybe just count the longest winning streak against each bot. Capped at some suitable number like 10 or so.
 
This allows developers to fix their bugs and still obtain a "score" against each bot.
Fritzlein
Forum Guru
*****



Arimaa player #706

   

Gender: male
Posts: 5928
Re: Will the 2010 Computer Championship be open?
« Reply #13 on: Aug 14th, 2009, 1:05pm »

on Aug 14th, 2009, 12:54pm, jdb wrote:
Instead of using a fixed number of games against a set of bots, maybe just count the longest winning streak against each bot. Capped at some suitable number like 10 or so.
 
This allows developers to fix their bugs and still obtain a "score" against each bot.

I like the longest streak idea!  To make it more explicit, you would add together the winning streak lengths against all benchmark bots for a total qualifying score?
 
I might cap it lower, say at four games.  It would be convenient for a developer with a great bot to be able to "max out" the scale in a reasonable time.  With even a four-game streak and only eight benchmark bots, it would take a minimum of 32 games to run the table.  Also I would specify that the games in the streak must alternate colors.
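
A minimal sketch of that scoring, under a couple of assumptions the thread has not settled: a win with the same color as the previous win does not extend the streak but starts a new streak of length one, and a loss resets the streak to zero. The function names and the game-history layout are invented for illustration.

```python
def longest_alternating_streak(games: list[tuple[str, bool]], cap: int = 4) -> int:
    """Longest run of wins with alternating colors against one benchmark bot.

    games is the chronological list of (color played, won?) against that bot.
    The result is capped so a strong bot can "max out" in a bounded number
    of games.
    """
    best = run = 0
    prev_color = None
    for color, won in games:
        if not won:
            run = 0                      # a loss ends the streak
        elif run > 0 and color != prev_color:
            run += 1                     # win with the other color extends it
        else:
            run = 1                      # first win, or repeated color: restart
        prev_color = color
        best = max(best, run)
    return min(best, cap)


def qualifying_score(history: dict[str, list[tuple[str, bool]]], cap: int = 4) -> int:
    """Sum of capped streak lengths over all benchmark bots."""
    return sum(longest_alternating_streak(g, cap) for g in history.values())
```

With eight benchmark bots and a cap of four, the highest possible `qualifying_score` is 32, which matches the 32-game minimum needed to run the table.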
« Last Edit: Aug 14th, 2009, 1:07pm by Fritzlein »

jdb
Forum Guru
*****



Arimaa player #214

   


Gender: male
Posts: 682
Re: Will the 2010 Computer Championship be open?
« Reply #14 on: Aug 14th, 2009, 4:47pm »

on Aug 14th, 2009, 1:05pm, Fritzlein wrote:

I like the longest streak idea!  To make it more explicit, you would add together the winning streak lengths against all benchmark bots for a total qualifying score?
 
I might cap it lower, say at four games.  It would be convenient for a developer with a great bot to be able to "max out" the scale in a reasonable time.  With even a four-game streak and only eight benchmark bots, it would take a minimum of 32 games to run the table.  Also I would specify that the games in the streak must alternate colors.

 
Your suggestion looks reasonable to me.
 
The actual mechanics would depend on what the benchmark bots are being used for. If it is being used to provide a ranking for the tournament, the top end of the ladder needs to be tough, in order to separate the better bots. I don't know what Elo difference is approximated by an N game winning streak.
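
One rough way to get a feel for that relationship is sketched below. It rests on several assumptions: the standard Elo logistic curve with a 400-point scale, independent games, no color effect, and a single attempt of exactly N games. With unlimited attempts a weaker bot would eventually get the streak too, so this only gives an order of magnitude. The function names are made up.

```python
import math


def win_prob(elo_diff: float) -> float:
    """Elo-model chance of winning one game when rated elo_diff points higher."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def streak_prob(elo_diff: float, n: int) -> float:
    """Chance of winning n independent games in a row."""
    return win_prob(elo_diff) ** n


def elo_diff_for_streak(n: int, target: float = 0.5) -> float:
    """Rating edge at which an n-game streak succeeds with probability target."""
    p = target ** (1.0 / n)              # required per-game win probability
    return 400.0 * math.log10(p / (1.0 - p))


for n in (2, 4, 10):
    print(n, round(elo_diff_for_streak(n)))
```

Under these assumptions, a bot needs an edge of roughly 290 points over a benchmark bot to have an even chance of a four-game streak in a single try.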