Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Events >> Will the 2010 Computer Championship be open?
(Message started by: Fritzlein on Aug 10th, 2009, 5:29pm)

Title: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 10th, 2009, 5:29pm
In the past we have had sporadic discussion of whether the Computer Championship should be invitation-only for the top bots, or should be open to all interested parties.  The debate never came to a head, though, because the limit of eight imposed by Omar has always been greater than the number of interested participants.  In 2010 this theoretical discussion will probably have practical importance for the first time due to more than eight developers wanting to participate.

I am vehemently in favor of the Computer Championship being open, because there is no good way to narrow the field.  The rules last year stated that the eight bots with the highest rating would qualify, but ratings can easily be manipulated, so this is a terrible selection criterion.  For starters, the faster a bot plays, the higher its rating.  I can imagine developers trying to qualify by having their bots play only games at five seconds per move!  But even worse, a developer could pick an opposing bot that their own bot knows how to beat, and play that bot incessantly.  Do we really want a bot to qualify by virtue of having beaten ArimaaScoreP1 five hundred times?

If we had a better way of selecting the top eight bots, such as the Open Classic tournament which decides the eight finalists for the human World Championship, then I would have no problems with a limited field for the Computer Championship.  But qualifying on the basis of ratings is so prone to abuse that it must be scrapped.

And what is the problem with having more bots, anyway?  With the floating triple elimination format, each additional bot adds only three additional games to the tournament length.  Having ten or twelve bots instead of only eight would still be very doable in the short time frame of the server rentals.  In fact, it would be wonderful to see so many bots signed up.

If we are afraid of too many spurious entrants, then we could always raise the entry fee to $30, so that instead of needing one win to break even, a bot would need three wins, i.e. an even 3-3 record.  I believe that a higher entry fee would be less of a burden on serious developers than having to waste time artificially pumping up the bot's rating in order to be sure of being in the top eight.

How do other people feel about limiting the field of the Computer Championships to eight bots versus having an open tournament?  If Omar declares that the field must be narrowed somehow, what would be the best way to narrow it?

Title: Re: Will the 2010 Computer Championship be open?
Post by 99of9 on Aug 10th, 2009, 5:48pm
I'm glad you raised this Fritz, I think it's important to think about.  I'm not sure what the best system is, but here's one thing I already know:

The previous year's champion (and perhaps places 2&3) should be given automatic entry.  It would be very disappointing not to see the champion defend its title because of ratings or any other selection reason.

Title: Re: Will the 2010 Computer Championship be open?
Post by Simon on Aug 10th, 2009, 9:58pm
I agree that it's preferable not to exclude any bots based on ratings. Having a preliminary set of rounds for qualification would also allow seeding based on something other than highly manipulable gameroom ratings, and for bots it doesn't need to add all that much time to the tournament. If there are only slightly more than 8 bots, though, it would probably not be worth the trouble, so just extending the existing tournament format to more bots, as you suggest, would make sense.

Automatic qualification of top previous competitors makes sense if there are exclusions based on ratings, but if there is a pre-qualification tournament I think every bot should compete equally.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Aug 11th, 2009, 5:06am
If Omar posts the schedule in November, I guess this is a good time to discuss the tournament.

1. Instead of completely open, the entrants should have to demonstrate a winning record against some fixed performance bot. My first thought would be something along the lines of bot_arimaazilla. This would keep the random move level bots out, but not set the bar too high.

2. A hybrid tournament format, a round robin followed by a floating elimination with losses from the round robin carried forward, is worth looking at. It performs much better than straight floating elimination in the simulator, and allows early rounds to be scheduled in advance. With this method, all the bots get to play each other.

Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Aug 11th, 2009, 9:13am
I really like Jdb's hybrid format proposal. One thing to keep in mind though, anything that increases the number of games needed for the whole tournament is going to require more automation since I believe Omar is already pretty close to the limit of time he can spend running the tournament.

For Jdb's proposal specifically, this probably means at least the round-robin portion of the tournament would need to be able to run without any intervention by Omar once it was started. That is certainly possible, but it is not in place currently (in past CCs Omar has had to start each game manually).

Janzert

Title: Re: Will the 2010 Computer Championship be open?
Post by RonWeasley on Aug 11th, 2009, 9:44am
I would like to see specific language in the rules about handling server failures or degradations.  My policy of continuing the game at the point of failure was not unanimously supported and it put quite a burden on Omar.  If the rules are amended to call for a restart, for example, whenever a server failure is detected, the TD would not have to be consulted each time and tournament management would be more tractable.

Also think about handling server failures in the qualifying games and the effects of restarts or continuations on players.  A simple restart policy might be the most effective in dealing with scheduling.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 11th, 2009, 10:55am
The tournament format of single round-robin followed by elimination among the top bots, with losses carried forward, is attractive assuming automatic scheduling can be made to work.  Given all the server problems of last year, however, I wouldn't take it as a given that we can just set up the schedule and let the tournament run.  Also there is the difficulty of letting bots run automated updates; between each round some automatic update time would have to be scheduled as well.

My biggest concern with the hybrid format, however, is that it inherently requires the field of bots to be limited.  The number of games in a single round robin is N(N-1)/2, which scales quadratically, whereas the number of games in a floating triple elimination is about 3N, which scales linearly.  If I ever get to see a fifteen-bot Computer Championship, it will not be a round robin.  :(
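A quick sketch of the two growth rates. The round-robin count is the usual N(N-1)/2 pairings. For triple elimination it uses the fact that every game has exactly one loser, so if play continues until one bot is left, the total is 3(N-1) losses for the eliminated bots plus the champion's 0 to 2 losses. The function names are illustrative only.

```python
def round_robin_games(n: int) -> int:
    """Single round robin: each of the n bots meets every other bot once."""
    return n * (n - 1) // 2


def triple_elimination_games(n: int) -> tuple[int, int]:
    """(min, max) games in a triple-elimination event with n bots.

    Assumes play continues until one bot is left. Arimaa games cannot be
    drawn, so every game has exactly one loser; each of the n - 1 eliminated
    bots collects exactly 3 losses and the champion ends with 0, 1 or 2.
    """
    base = 3 * (n - 1)
    return base, base + 2


if __name__ == "__main__":
    for n in (8, 10, 12, 15):
        low, high = triple_elimination_games(n)
        print(f"{n:2d} bots: round robin {round_robin_games(n):3d} games, "
              f"triple elimination {low}-{high} games")
```

For 8 bots this gives 28 round-robin games against 21 to 23 elimination games; for 15 bots it is 105 against 42 to 44.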

That brings us back to narrowing the field before the tournament.  If we must cap the field at eight players, I like jdb's idea of using bots rather than ratings as a qualifying standard.  Ideally, however, that qualifying benchmark would select exactly the top eight.  If the criterion is a winning record against Arimaazilla (or any binary indicator), we might accidentally limit the field to six or accidentally expand it to eleven.

In order to provide more discrimination we could limit the field to those bots which have passed the most bots in some list, say (ArimaaScoreP1, Gnobot2005P1, Arimaazilla, OpFor2008P2, Clueless2007P1, Bomb2005P2, OpFor2009Blitz, Clueless2009Blitz).  I would include bots that are not fixed-performance in order to eliminate the time handicap at the high end.  To pass a bot would mean beating it twice in a row, playing different colors in the two games.  Ties could be broken by who first passed the tied number of bots.  That should give adequate discrimination between any bots on the bubble of being in the top eight or not, as well as giving a reasonable seeding among the qualifiers.
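One way the ladder proposal could be scored, as a sketch. The `Game` record and its fields are hypothetical, and reading "twice in a row" as two consecutive games against the same benchmark bot is an assumption. The sort key puts more bots passed first and breaks ties by the time at which the entrant reached that count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    time: int        # hypothetical: any increasing timestamp
    opponent: str    # benchmark bot name
    color: str       # entrant's color, "gold" or "silver"
    won: bool        # True if the entrant won


def pass_times(games: list[Game]) -> dict[str, int]:
    """Time at which each benchmark bot was first passed.

    A bot counts as passed once the entrant beats it in two consecutive
    games against that bot, playing a different color in each.
    """
    last: dict[str, Game] = {}
    passed: dict[str, int] = {}
    for g in sorted(games, key=lambda x: x.time):
        prev = last.get(g.opponent)
        if (g.opponent not in passed and prev is not None
                and prev.won and g.won and prev.color != g.color):
            passed[g.opponent] = g.time
        last[g.opponent] = g
    return passed


def ranking_key(games: list[Game]) -> tuple[int, float]:
    """Sort key: more bots passed first, then earliest time reaching that count."""
    times = sorted(pass_times(games).values())
    if not times:
        return (0, float("inf"))
    return (-len(times), times[-1])
```

Ranking a field is then `sorted(histories, key=lambda name: ranking_key(histories[name]))` for a dict mapping each entrant's name to its games.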

If we do throw up this hurdle, which will require developers to have their bots play lots of games to qualify for the Computer Championship, then it seems reasonable to give the top three finishers from the previous year an automatic exemption.  It would be possible for the #3 bot one year to be the #9 bot the next year, but it seems rather unlikely.  I like the courtesy of not forcing bots which have proven themselves to immediately prove themselves again.

A field limited to eight contestants by playing a small bot ladder, followed by a tournament in jdb's format, seems like a workable solution.  I confess, however, that my preferred solution is still floating triple elimination with an open field.  A developer who participates in the Computer Championship is doing the Arimaa community a favor by adding to the fold of bots available to play in the future.  This stable of computer opponents is a fabulous asset for arimaa.com.  Having any barriers to entry beyond the technical ones outlined in the rules (compiled for Linux, a way to fix performance, etc.) seems unnecessary and counterproductive.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Aug 11th, 2009, 12:32pm
Bugs could be ironed out by running a blitz test tournament before the main event. This could be applied to whatever tournament format is used.

With the floating elimination, a bot could be 3 and out. It takes at least a hundred hours of work to get a decent bot. If someone has put that much time in, they should at least get their money's worth, so to speak.


Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 11th, 2009, 2:33pm

on 08/11/09 at 09:44:03, RonWeasley wrote:
I would like to see specific language in the rules about handling server failures or degradations.

I totally agree.  It is not possible to cover every situation in the rules, so a tournament director will always be necessary, but if we don't make rules about situations that have already happened, we are asking for trouble.

The reason to write rules in advance is that it is impossible not to have one's judgement affected by knowing which bot is helped or hurt by a specific situation.  If we don't have a rule crafted in advance, then we will know the tournament standing and the board position when the server issues occurred.  That additional knowledge will make any decision made in the moment seem unfair.  If we weren't blessed with such a great TD as Ron, there would be the potential for a lot of hard feelings.

So, what are the specific bugs that cropped up?

1) The bot sent a move, but the server didn't get it.
2) The bot was sharing the CPU instead of having the whole box to itself.
3) Bot was misconfigured due to Omar's error.

Were there others?  My suggestion would be to unrate the game and resume from the position where the failure occurred for (1), and to unrate the game and replay it from the start for (2) and (3).  Unrating the game is important so that GnoBot (and others) can distinguish real tournament results in the game database from disqualified results.

Any condition that causes a game to be replayed should cause a game to be terminated and unrated immediately if the problem is discovered mid-game.  The game should not be played out to see what happens, because it shouldn't matter what happens.  In particular, it shouldn't matter if the disadvantaged bot was winning or had already won when the issue was discovered.  It is not fair for a bot to get a replay if it lost but get a win if it won; if there are to be replays at all, they must be automatically triggered by the playing conditions independent of the actual result.

On the other hand, I think the game result in all three cases should stand if the error is not detected until after another game has been played that was paired based on the questionable result.  There needs to be a statute of limitations so that we aren't forced to replay every subsequent game in the tournament if we discover something unfair about the very first game.
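The proposed rulings are simple enough to write down as a lookup, which is the point of deciding them in advance: the only inputs are the kind of failure and whether a later game has already been paired on this result, never the board position or who was winning. The enum and function names below are hypothetical.

```python
from enum import Enum, auto


class Failure(Enum):
    MOVE_NOT_RECEIVED = auto()   # case 1: bot sent a move, server never got it
    CPU_SHARED = auto()          # case 2: bot did not have the whole box
    MISCONFIGURED = auto()       # case 3: bot set up incorrectly by the operator


class Action(Enum):
    RESULT_STANDS = auto()
    UNRATE_AND_RESUME = auto()   # continue from the position where it failed
    UNRATE_AND_REPLAY = auto()   # start the game over from the beginning


def ruling(failure: Failure, dependent_game_played: bool) -> Action:
    """Proposed handling of a detected failure.

    dependent_game_played is True if a later game was already paired using
    this game's result; under the proposed statute of limitations the result
    then stands. The score or outcome of the game is deliberately not an input.
    """
    if dependent_game_played:
        return Action.RESULT_STANDS
    if failure is Failure.MOVE_NOT_RECEIVED:
        return Action.UNRATE_AND_RESUME
    return Action.UNRATE_AND_REPLAY
```

Folding in a broader class of server and network errors under case 1, as suggested in the next post, would just mean adding members that take the resume branch.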

I am eager to see these cases spelled out in advance, so that we don't take actions based on fallacious reasoning like, "The outcome would have been the same, so there is no need to replay," or "Bomb would have won the whole tournament except for that first-round server error we caught later."  Every year bots become less deterministic, luck plays an increasing role, and it gets more ridiculous to say we know what would have happened had the conditions been different.  We need to make the rules as clear and as ironclad as possible in advance.  We can't prevent all unfairness due to unforeseen circumstances, but if the rules are agreed upon before we know who will suffer from bad luck, then at least no one can claim the TD's decisions were biased against a particular bot.

Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Aug 11th, 2009, 3:36pm
Probably your case 1 can and should be expanded to include all cases of server and network error. Two other specific server/network errors that have occurred in the last few years are: the bot did not receive the move until a significant time after it was played, and the bot was stopped and restarted by the server in the middle of a move. The latter is an obvious occurrence; the former could leave some question as to whether an error actually occurred or not.

Also when the server was incorrectly restarting bots, this was not discovered until a few rounds into the tournament (after it caused a timeout for one bot). This set a precedent to have previous games stand if the error wasn't discovered until later. It would certainly be good to spell it out in the rules though.

Janzert

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Aug 13th, 2009, 9:24pm
Though it would be nice to have the tournament be open, it is much more practical to limit the number of entries. Even though triple elimination makes the tournament more practical by having about 3N games, N could always get large enough that it would get too difficult for me to run the tournament. So if the number of entrants in the tournament kept increasing each year I would eventually have to limit the number of entries anyways.

The only problem is that limiting the bots requires some way to determine which bots will be in the tournament. Also the tournament requires an initial ranking for the bots. In the past I've used the bots' gameroom ratings for ranking the bots and never had to use the ratings to filter out any bots from participating in the tournament. Using the gameroom ratings is not a good long-term solution and, as Karl mentioned, they could be inflated by carefully selecting your opponents. Running a preliminary tournament (like the Swiss we use in the human championship) would be a better solution, but the burden of running another carefully controlled tournament is not something that I want to take on.

The proposed solution of playing against a field of bots and playing sufficient games to provide discrimination between the entrant bots seems like the best option. In a way this is kind of similar to using ratings to rank the bots, but the ratings are based on controlled games where the opponents are assigned rather than self-chosen. So we could compute a performance rating for the entrants by assigning fixed ratings to the screening bots and use this performance rating to rank the bots and select the top 8 bots. We might have some cases where two or more entrants have the same performance rating; then we might have just these bots play against each other to break the tie, or we might use some metric based on the number of moves in the screening games as a second-level tiebreaker.
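For concreteness, here is a sketch of one common definition of performance rating, the maximum-likelihood Elo estimate: the rating R at which the expected score against the screening bots, held at fixed ratings, equals the actual number of wins. This is an assumption about the formula; the gameroom's own tool may compute something different. Bisection works because the expected total score rises monotonically with R.

```python
def expected_score(r: float, opp: float) -> float:
    """Elo logistic expectation for a player rated r against one rated opp."""
    return 1.0 / (1.0 + 10.0 ** ((opp - r) / 400.0))


def performance_rating(opponents: list[float], wins: int,
                       lo: float = 0.0, hi: float = 4000.0) -> float:
    """Rating R with sum(expected_score(R, opp)) == wins, found by bisection.

    opponents are the fixed ratings assigned to the screening bots, one entry
    per game played. A perfect or zero score has no finite solution, so the
    result is then pinned near the hi or lo end of the search range.
    """
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(expected_score(mid, o) for o in opponents) < wins:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For one win and one loss against bots fixed at 1800 and 1500, this gives 1650, the midpoint; that differs from the 1629 the gameroom tool reports later in the thread, so that tool presumably uses another formula. A real screening rule would also need its own convention for perfect and zero scores.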

To actually implement something like this we will need to specify several things including a time frame during which the screening games must be played; the field of bots to be used in the screening and the number of games to be played against each. During the screening period the entrant bots should not be allowed to play games against the screening bots which don't count towards the screening. The developers will be running their bot on their own hardware/os during the screening. The developers should be allowed to continue modifying the bot during the screening period. Also all bots have to participate in the screening regardless of how they did the previous year. The screening will determine their rank for seeding in the upcoming tournament. Also we have to assume that bots are looking at each others game histories to determine how they will play against a particular opponent. So it would be unfair if some of the entrants did not have to play in the screening. Also we need to specify what happens when something goes wrong in a screening game; like a bot losing on time.

From last year's tournament we've gained more experience in some of the things that can go wrong, so incorporating how such situations should be handled into the rules would be good.

Maybe the best way to get started with this is to make a wiki page for the 2010 Computer Championship tournament rules, starting with a copy of last year's tournament rules and modifying it based on the discussion here until we finalize it. Would anyone like to volunteer to create and update that page?

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 14th, 2009, 12:06pm

on 08/13/09 at 21:24:33, omar wrote:
The proposed solution of playing against a field of bots and playing sufficient games to provide discrimination between the entrant bots seems like the best option.

The idea of using a set of bots to filter entrants has grown on me as well.  It adds a barrier to all bots that want to qualify, which is bad, but it is an equal-opportunity barrier, which is at least in the spirit of openness.


Quote:
In a way this is kind of similar to using ratings to rank the bots, but the ratings are based on controlled games where the opponents are assigned rather than self chosen. So we could compute a performance rating for the entrants by assigning fixed ratings to the screening bots and use this performance rating to rank the bots and select the top 8 bots.[...]

To actually implement something like this we will need to specify several things including a time frame during which the screening games must be played; the field of bots to be used in the screening and the number of games to be played against each. During the screening period the entrant bots should not be allowed to play games against the screening bots which don't count towards the screening. The developers will be running their bot on their own hardware/os during the screening. The developers should be allowed to continue modifying the bot during the screening period.

Hmmm, your proposed implementation is rather different than mine.  I'm not sure about the relative merits of having a fixed number of games against a fixed set of benchmark bots, versus having the games be unlimited and having the qualifying bots try to collect scalps of the benchmark bots.

One disadvantage of the scalp collection might be excessive games.  Perhaps one developer has trouble beating Bomb2005P1, so he sets up his bot to play it incessantly until getting lucky enough to win two in a row.  We would want to reward the best bots, not the most persistent.

One disadvantage of the fixed number of games is having only one shot.  If a developer discovers a fixable bug during the series, the loss is already on the books.  This makes it harder to determine which bot is best as of the tournament entry date, as opposed to the best bot at the time it played its qualifying games.

Another disadvantage of the fixed number of games is that the benchmark bots become off-limits to developers for testing/tuning their bots.  We would want to make sure there are still plenty of non-benchmark bots for testing against.

If we do go with a fixed number of games, then there is no need to calculate performance ratings.  We can simply take the number of wins.  If there were two games each against ten benchmark bots, then the number of wins out of the twenty games is the correct measure, regardless of the ratings of the bots.  The only reason ratings are involved in measuring performance is to compare the relative value of a win against a strong opponent compared to a win against a weak opponent.  For qualifying, we should take strength of schedule out of the equation by defining the opposition.  When strength of schedule is fixed, the number of wins is the definitive measure of performance.
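To spell out why the win count is enough, assume the performance rating is defined as the maximum-likelihood Elo rating: the rating R at which the expected score against the n scheduled opponents, rated r_1, ..., r_n, equals the actual number of wins W. Other definitions of performance rating are not covered by this sketch.

```latex
E(R, r) = \frac{1}{1 + 10^{(r - R)/400}}, \qquad
\sum_{i=1}^{n} E(R, r_i) = W
```

The left side depends only on R and the fixed schedule, and it is strictly increasing in R; the right side is just the win count. So for 0 < W < n there is exactly one solution R(W), it increases with W, and which particular games were won never enters. Ranking entrants by R(W) is therefore the same as ranking them by W.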


Quote:
Also all bots have to participate in the screening regardless of how they did the previous year. The screening will determine their rank for seeding in the upcoming tournament. Also we have to assume that bots are looking at each others game histories to determine how they will play against a particular opponent. So it would be unfair if some of the entrants did not have to play in the screening.

For seeding alone we wouldn't have to push everyone through qualifying.  The top finishers from last year could be given not only the automatic berths, but also the top seeds.  But it is definitely true that it is not fair to have automatic qualifiers.  Now that you mention it, I agree that the fairness is more important than the courtesy to past winners.


Quote:
Maybe the best way to get started with this is to make a wiki page for the 2010 Computer Championship tournament rules, starting with a copy of last year's tournament rules and modifying it based on the discussion here until we finalize it. Would anyone like to volunteer to create and update that page?

Sure, I'll volunteer to draft new rules.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Aug 14th, 2009, 12:54pm

Quote:
One disadvantage of the fixed number of games is having only one shot.  If a developer discovers a fixable bug during the series, the loss is already on the books.  This makes it harder to determine which bot is best as of the tournament entry date, as opposed to the best bot at the time it played its qualifying games.


Instead of using a fixed number of games against a set of bots, maybe just count the longest winning streak against each bot. Capped at some suitable number like 10 or so.

This allows developers to fix their bugs and still obtain a "score" against each bot.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 14th, 2009, 1:05pm

on 08/14/09 at 12:54:41, jdb wrote:
Instead of using a fixed number of games against a set of bots, maybe just count the longest winning streak against each bot. Capped at some suitable number like 10 or so.

This allows developers to fix their bugs and still obtain a "score" against each bot.

I like the longest streak idea!  To make it more explicit, you would add together the winning streak lengths against all benchmark bots for a total qualifying score?

I might cap it lower, say at four games.  It would be convenient for a developer with a great bot to be able to "max out" the scale in a reasonable time.  With even a four-game streak and only eight benchmark bots, it would take a minimum of 32 games to run the table.  Also I would specify that the games in the streak must alternate colors.
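A sketch of the sum-of-streaks score as proposed here: the longest run of wins against each benchmark bot, with colors alternating within the run, capped at four, and summed over the benchmark bots. The input format (chronological `(color, won)` pairs per benchmark bot) is an assumption for illustration.

```python
def longest_alternating_streak(games: list[tuple[str, bool]], cap: int = 4) -> int:
    """Longest run of consecutive wins against one benchmark bot in which the
    entrant's color alternates from game to game, capped at `cap`.

    games: chronological (color, won) pairs, color being "gold" or "silver".
    """
    best = run = 0
    prev_color = None
    for color, won in games:
        if not won:
            run = 0                      # a loss ends the run
        elif run > 0 and color == prev_color:
            run = 1                      # repeated color: a new run starts here
        else:
            run += 1
        prev_color = color
        best = max(best, run)
    return min(best, cap)


def qualifying_score(history: dict[str, list[tuple[str, bool]]], cap: int = 4) -> int:
    """Sum of capped alternating streaks over all benchmark bots."""
    return sum(longest_alternating_streak(g, cap) for g in history.values())
```

A win with the same color as the previous win does not wipe the run out entirely; it starts a fresh run of length one, because that game could itself open a new alternating streak.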

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Aug 14th, 2009, 4:47pm

on 08/14/09 at 13:05:12, Fritzlein wrote:
I like the longest streak idea!  To make it more explicit, you would add together the winning streak lengths against all benchmark bots for a total qualifying score?

I might cap it lower, say at four games.  It would be convenient for a developer with a great bot to be able to "max out" the scale in a reasonable time.  With even a four-game streak and only eight benchmark bots, it would take a minimum of 32 games to run the table.  Also I would specify that the games in the streak must alternate colors.


Your suggestion looks reasonable to me.

The actual mechanics would depend on what the benchmark bots are being used for. If they are being used to provide a ranking for the tournament, the top end of the ladder needs to be tough in order to separate the better bots. I don't know what Elo difference is approximated by an N-game winning streak.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 14th, 2009, 6:54pm
If you play enough games, you can get as long a winning streak as you like, regardless of the rating difference.  The table below shows how many games one should expect to play on average to get a certain length of winning streak given a fixed rating difference:

(rows: rating difference, bot minus benchmark; columns: length of winning streak)

  diff       1         2         3          4
  -500    18.8     371.6    6998.0   131461.2
  -400    11.0     132.0    1463.0    16104.0
  -300     6.6      50.5     341.1     2265.6
  -200     4.2      21.5      93.6      393.7
  -100     2.8      10.5      31.9       91.5
     0     2.0       6.0      14.0       30.0
   100     1.6       4.0       7.8       13.8
   200     1.3       3.0       5.3        8.3
   300     1.2       2.6       4.2        6.1
   400     1.1       2.3       3.6        5.1
   500     1.1       2.2       3.4        4.6


Of course this assumes the rating formula is correct, including independence of consecutive trials, which clearly doesn't hold, but it should give a general sense of the futility of weak bots trying to get a four-game winning streak against benchmark bots that are too strong for them.
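The table's entries can be reproduced from the standard formula for the expected number of independent trials, each succeeding with probability p, needed to see k successes in a row: (1 - p^k) / ((1 - p) p^k), with p taken from the Elo logistic curve for the given rating difference. This sketch makes the same independence assumption as the table, and the function names are illustrative.

```python
def win_probability(diff: float) -> float:
    """Elo logistic win probability for a rating advantage of `diff` points."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def expected_games_for_streak(diff: float, streak: int) -> float:
    """Expected games until the first run of `streak` consecutive wins,
    assuming independent games with a fixed win probability p:
    (1 - p**k) / ((1 - p) * p**k)."""
    p = win_probability(diff)
    return (1.0 - p ** streak) / ((1.0 - p) * p ** streak)


if __name__ == "__main__":
    print("diff" + "".join(f"{k:>11d}" for k in range(1, 5)))
    for diff in range(-500, 501, 100):
        row = "".join(f"{expected_games_for_streak(diff, k):11.1f}" for k in range(1, 5))
        print(f"{diff:4d}{row}")
```

At a 400-point deficit, for instance, p = 1/11 and a four-game streak needs 16104 games on average, which is why the bottom of the ladder is effectively a wall.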

Admittedly, a developer who is willing to have his bot play 400 games during qualifying would have an advantage over a developer who was only willing to have his bot play 40.  However, all that extra time would only get a bot one or two notches further down the ladder of benchmark bots.

Is that too much encouragement for playing excessive games?  Would excessive games be worse than the finality of specifying exactly four games against each benchmark bot with those results to determine seeding regardless of later improvement?  Maybe.  I am not sure.

Title: Re: Will the 2010 Computer Championship be open?
Post by ChrisB on Aug 15th, 2009, 12:55am

on 08/14/09 at 18:54:44, Fritzlein wrote:
Is that too much encouragement for playing excessive games?  Would excessive games be worse than the finality of specifying exactly four games against each benchmark bot with those results to determine seeding regardless of later improvement?  Maybe.  I am not sure.


Perhaps we could limit the number of games against each benchmark bot to, say, three times the maximum-counted streak.  That is, if the maximum-counted streak is four, we could have a cap of, say, 12 games against each benchmark bot.  That would give the developer some opportunity to improve the bot, if it initially doesn't do well against a benchmark bot.

Then, if two candidate bots have the same sum-of-the-streaks score, a tiebreaker could be the total games played against all the benchmark bots.
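A sketch of that cap and tiebreak, with the `Entrant` fields and constant names as hypothetical stand-ins: the game cap is three times the maximum counted streak, and entrants are ordered by streak score, then by fewer total benchmark games. The proposal doesn't say what happens if the cap is exceeded, so the sketch only reports violations rather than deciding a penalty.

```python
from dataclasses import dataclass, field

MAX_COUNTED_STREAK = 4                  # longer streaks score nothing extra
GAME_CAP = 3 * MAX_COUNTED_STREAK       # i.e. 12 games per benchmark bot


@dataclass
class Entrant:
    name: str
    streak_score: int                   # sum of capped streaks over benchmark bots
    games_per_bot: dict[str, int] = field(default_factory=dict)


def over_cap(e: Entrant) -> list[str]:
    """Benchmark bots this entrant has played more than GAME_CAP times."""
    return [bot for bot, n in e.games_per_bot.items() if n > GAME_CAP]


def rank(entrants: list[Entrant]) -> list[str]:
    """Higher streak score first; ties go to whoever played fewer benchmark games."""
    ordered = sorted(entrants,
                     key=lambda e: (-e.streak_score, sum(e.games_per_bot.values())))
    return [e.name for e in ordered]
```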

Title: Re: Will the 2010 Computer Championship be open?
Post by tize on Aug 15th, 2009, 5:25am
Why not just count the last 4 games? That seems easier to check. It would still allow bugs to be fixed, and a bot that had a streak of 4 wouldn't keep playing anyway.

Also, there should be a status page where the current scores can easily be seen, so that every bot developer can see their seeding and whether their bot has qualified.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 15th, 2009, 9:04am
That's a good point about the status page, tize.  While we are discussing what qualification format to use, we should keep in mind how difficult it would be for Omar to implement.  The latest four games against the benchmark bots would be quite easy to implement, so I like it.  Unfortunately it loses the requirement of alternating colors.

My suggestion of mandating alternating colors is not so easy to enforce in a query, but something that would be fairly easy to do in a query is taking the last two games as Silver and the last two games as Gold.  But then that would imply getting two winning streaks of two games each, which can be much easier than getting one winning streak of four games, so the ease of implementation undermines the intent this way too.

Omar's notion of taking a fixed number of games is not undermined by enforcing color: taking the first two games as Gold and the first two games as Silver (instead of the first four games) is consistent with the intent of measuring performance.
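The three candidate selection rules are easy to compare side by side. In this sketch a game is a hypothetical `(time, color, won)` tuple for one entrant against one benchmark bot; `n` must be positive in both selectors.

```python
# Hypothetical record for one entrant's game against one benchmark bot:
# (time, entrant's color, entrant won?)
Game = tuple[int, str, bool]


def last_n(games: list[Game], n: int = 4) -> list[Game]:
    """The n most recent games, regardless of color."""
    return sorted(games)[-n:]


def per_color(games: list[Game], n: int = 2, most_recent: bool = True) -> list[Game]:
    """n games of each color: the latest ones, or the earliest ones
    when most_recent is False (fixed-schedule style counting)."""
    picked: list[Game] = []
    for color in ("gold", "silver"):
        same = sorted(g for g in games if g[1] == color)
        picked += same[-n:] if most_recent else same[:n]
    return sorted(picked)


def wins(games: list[Game]) -> int:
    """Number of games the entrant won."""
    return sum(1 for _, _, won in games if won)
```

With the history `[(1, "gold", True), (2, "gold", True), (3, "silver", False), (4, "silver", True), (5, "silver", True)]`, the last four games contain 3 wins, the last two per color contain 4, and the first two per color contain 3: two separate two-game runs satisfy the per-color rule even though the bot never won four in a row.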

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Aug 15th, 2009, 4:29pm
Wow, that's an interesting observation Karl; if the opponents are fixed and the number of times you must play each opponent is fixed then the opponent ratings don't matter, just the number of games won does. I tested it and you're right:
 p2 +1800 -1500 gives 1629
 p2 -1800 +1500 gives 1629

My reason for fixing the number of games was so that an entrant bot would not play many, many games against the screening bot(s) that it knows how to defeat in order to inflate the rating.

What if the requirement was that during the screening period you must play each screening bot at least once with each color and not more than 5 times (numbers could vary) with each color and only the most recent games which have a corresponding opposite color game are counted. This way the entrant bots do have some leeway in picking their opponents to try and maximize their performance rating, but can't go wild trying to inflate ratings from the same defeated screening bots. Note that there is no explicit requirement that any screening bot must be defeated in order to qualify. The entrant bots are just competing with each other to maximize their performance rating. I would further refine this to set the max number of games for lower rated bots to lower values and for higher rated bots to higher values. For example against a 1300 rated screening bot the max games that can be played may be 5 with each color and against a 1800 rated screening bot the max games may be 10 with each color.
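One reading of the proposed counting rule, as a sketch: if an entrant has g Gold and s Silver games against a screening bot, count the min(g, s) most recent games of each color, so every counted game has an opposite-color partner. The per-color limit is passed in, since the exact values are left open (5 against a 1300 bot and 10 against an 1800 bot are only examples). The tuple layout is hypothetical.

```python
def counted_games(games: list[tuple[int, str, bool]],
                  max_per_color: int) -> list[tuple[int, str, bool]]:
    """Games against one screening bot that count toward screening.

    games holds (time, entrant's color, entrant won?) tuples. Raises
    ValueError if either color exceeds max_per_color. Counts the k most
    recent games of each color, where k = min(#gold, #silver).
    """
    by_color = {c: sorted(g for g in games if g[1] == c) for c in ("gold", "silver")}
    for color, same in by_color.items():
        if len(same) > max_per_color:
            raise ValueError(f"{len(same)} {color} games exceeds the limit of {max_per_color}")
    k = min(len(same) for same in by_color.values())
    if k == 0:
        # Minimum of one game per color not yet met; also, [-0:] would
        # return the whole list, so this case needs its own branch.
        return []
    return sorted(by_color["gold"][-k:] + by_color["silver"][-k:])
```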

I can set up a status page to show the current ranking of the bots based on the performance rating.

Thank you Karl for volunteering to start the page for 2010 Computer Championship tournament rules.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Aug 15th, 2009, 8:20pm

on 08/15/09 at 16:29:10, omar wrote:
Wow, that's an interesting observation Karl; if the opponents are fixed and the number of times you must play each opponent is fixed then the opponent ratings don't matter, just the number of games won does. I tested it and you're right:
 p2 +1800 -1500 gives 1629
 p2 -1800 +1500 gives 1629

My reason for fixing the number of games was so that an entrant bot would not play many, many games against the screening bot(s) that it knows how to defeat in order to inflate the rating.

What if the requirement was that during the screening period you must play each screening bot at least once with each color and not more than 5 times (numbers could vary) with each color and only the most recent games which have a corresponding opposite color game are counted. This way the entrant bots do have some leeway in picking their opponents to try and maximize their performance rating, but can't go wild trying to inflate ratings from the same defeated screening bots. Note that there is no explicit requirement that any screening bot must be defeated in order to qualify. The entrant bots are just competing with each other to maximize their performance rating. I would further refine this to set the max number of games for lower rated bots to lower values and for higher rated bots to higher values. For example against a 1300 rated screening bot the max games that can be played may be 5 with each color and against a 1800 rated screening bot the max games may be 10 with each color.

I can set up a status page to show the current ranking of the bots based on the performance rating.

Thank you Karl for volunteering to start the page for 2010 Computer Championship tournament rules.


So if I understand correctly, if the opponents are restricted to just the screening bots, and there is a maximum number of games for each individual screening bot, then the number of total wins is a valid performance rating?

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 16th, 2009, 6:28am

on 08/15/09 at 20:20:51, jdb wrote:
So if I understand correctly, if the opponents are restricted to just the screening bots, and there is a maximum number of games for each individual screening bot, then the number of total wins is a valid performance rating?

Against a fixed schedule, the number of wins determines the performance rating.  The order of wins is irrelevant.  If I beat Bomb2005Blitz and lose to ArimaaScoreP1 it is equivalent to beating ArimaaScoreP1 and losing to Bomb2005Blitz.  You might say the former performance is more erratic and the latter more consistent, but by the performance rating formula they are the same quality.

This is not a mathematical truth so much as a philosophical commitment.  Everyone just seems to agree that wins are fundamental and ratings are derived.  Ratings permit us to compare wins against different opposition (e.g. is 4 of 5 against ArimaaScoreP1 better or worse than 1 of 5 against Bomb2005Blitz?), but when the opposition is constant (e.g. playing five games against each opponent), equal wins must produce equal performance ratings, or else people will say the measurement is broken.  Therefore, starting from that philosophical commitment, a formula for performance rating has been derived that respects the "equal wins means equal performance" mandate.
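To see why equal wins against a fixed schedule give equal performance ratings, here is a small Python sketch. It assumes the standard logistic Elo expectation with a 400-point scale and defines the performance rating as the R whose expected total score equals the actual score; `expected_score` and `performance_rating` are illustrative names, and the arimaa.com formula evidently differs in detail, since it reported 1629 rather than 1650 for Omar's example. Because the individual results enter only through the total number of wins, swapping which opponent was beaten cannot change the answer.

```python
def expected_score(rating, opponent):
    """Logistic Elo expectation (400-point scale) of `rating` scoring against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def performance_rating(results, lo=0.0, hi=4000.0, iters=100):
    """Bisect for the rating R whose expected total score equals the actual score.

    results: list of (opponent_rating, score) pairs, score 1 for a win, 0 for a loss.
    Assumes at least one win and one loss; otherwise no finite R exists.
    """
    wins = sum(score for _, score in results)  # the only place results are used
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        expected = sum(expected_score(mid, opp) for opp, _ in results)
        if expected < wins:
            lo = mid  # R too low: it predicts fewer wins than were scored
        else:
            hi = mid
    return (lo + hi) / 2.0


beat_strong = performance_rating([(1800, 1), (1500, 0)])
beat_weak = performance_rating([(1800, 0), (1500, 1)])
print(round(beat_strong), round(beat_weak))  # 1650 1650
```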

Omar, I strongly recommend that you not calculate performance ratings at all; it introduces an unnecessary confusion and complication.  For starters, calculating performance ratings would require us to fix the ratings of the benchmark bots in order to be fair.  The outcome should not depend on whether a benchmark bot's rating was higher or lower at the particular time a qualifying bot played it.  But we don't know what level to fix their ratings at, and we don't want them to have fixed ratings in general.  Furthermore there is no benefit to dragging ratings into it, because number of wins provides exactly the same ranking of bots that calculating a performance rating would.  And finally, if we allow multiple plays to try to get a better result, then that will make a calculated performance rating not a true measure anyway, i.e. it would be inflated relative to the fixed ratings we chose for the benchmark bots.

My latest (still tentative) thought:
1) Fix eight benchmark bots, spanning the range from the weakest to the strongest Omar has available.
2) Fix a starting date.  Before that date no games count.
3) Fix a minimum number of games per benchmark bot per color, and a maximum number of games per benchmark bot per color.  I suggest two for the minimum and five for the maximum.
4) The point total for each qualifying bot is calculated as follows.  For each benchmark bot, for each color, count the number of wins in the two most recent games against the qualifying bot that are after the starting date and within the maximum of five games played.  Thus if the qualifying bot has played the benchmark bot twenty times with that color since the starting date, count only the fourth and fifth games.  The best possible score is 32 and the worst possible is 0.
5) For a tiebreaker, count the number of qualifying games played, with a lower number being better.  The best possible tiebreaker is 32 and the worst possible is 80.
6) For a second tiebreaker, measure the time of the last qualifying game to be played, where earlier is better.
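The scoring rule in points 2) to 6) can be sketched in Python. This is a minimal sketch under stated assumptions: the `Game` record, `qualifying_score` and `ranking_key` are hypothetical names, not anything on arimaa.com; "the two most recent games" is read as the last two of the first five games after the starting date, so any sixth or later game with that color is ignored; and no minimum is enforced, so a series with fewer than two games simply scores what it has.

```python
from dataclasses import dataclass
from datetime import datetime

MAX_GAMES = 5  # cap per benchmark bot per color (Fritzlein's suggestion)
COUNTED = 2    # only the last two legitimate games in each series score points


@dataclass
class Game:
    benchmark: str       # benchmark bot played, e.g. "Bomb2005Fast"
    color: str           # color of the qualifying bot: "gold" or "silver"
    played_at: datetime
    won: bool            # True if the qualifying bot won


def qualifying_score(games, start):
    """Return (points, games_counted, time_of_last_counted_game) for one qualifier."""
    series = {}
    for g in sorted(games, key=lambda g: g.played_at):
        if g.played_at >= start:  # games before the starting date never count
            series.setdefault((g.benchmark, g.color), []).append(g)
    points, counted, last = 0, 0, None
    for seq in series.values():
        legit = seq[:MAX_GAMES]                          # a 6th, 7th, ... game is ignored
        points += sum(g.won for g in legit[-COUNTED:])   # e.g. the 4th and 5th games
        counted += len(legit)
        if last is None or legit[-1].played_at > last:
            last = legit[-1].played_at
    return points, counted, last


def ranking_key(games, start):
    """Sort key: more points first, then fewer games, then the earlier last game."""
    points, counted, last = qualifying_score(games, start)
    return (-points, counted, last or datetime.max)
```

Ranking a field is then `sorted(entrants, key=lambda name: ranking_key(entrants[name], start))` for a hypothetical `entrants` dict mapping bot names to their game lists. With eight benchmark bots and two colors there are sixteen series, which is where the 0 to 32 point range comes from.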

So, the objective for developers will be to get a two-game winning streak against each benchmark bot with each color.  There is a five-game window in which to achieve this two-game winning streak, so fixable bugs that cause a loss are not fatal.  On the other hand, we have ruled out the qualifying bot playing incessantly until it gets lucky.

There is some small element of gambling.  If your first three games against a particular bot with a particular color are loss-win-loss, should you play two more?  You could improve your score with a win-win, but would lower it with a loss-loss.

Of course, if your two most recent games are loss-win, then playing one more can't hurt your score and might help by producing the desired two-game winning streak.  And if at any point you get two wins in a row, you should stop playing, not only so as not to risk points for no benefit, but also to preserve a lower tiebreak score.
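For a rough feel of how forgiving the five-game window is, the chance of producing a two-game winning streak (and then stopping, as recommended) can be computed directly. This sketch assumes a constant, independent per-game win probability `p` against one benchmark bot with one color, which real games will not exactly satisfy; `streak_probability` is a hypothetical helper name.

```python
def streak_probability(p, max_games=5):
    """Chance of two consecutive wins within max_games games, stopping once it happens.

    p is an assumed constant, independent per-game win probability.
    """
    # Probability mass of sequences with no two-win streak yet,
    # split by whether the most recent game was a win.
    last_loss, last_win = 1.0, 0.0
    for _ in range(max_games):
        last_loss, last_win = (1 - p) * (last_loss + last_win), p * last_loss
    return 1.0 - (last_loss + last_win)


print(streak_probability(0.5))  # 0.59375
```

A coin-flip matchup therefore gets the full two points in 19 of 32 cases, while requiring the streak in the first two games (`max_games=2`) would give only `p * p`.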

Counter-suggestions are welcome.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Aug 16th, 2009, 10:13am

Quote:
My latest (still tentative) thought:
1) Fix eight benchmark bots, spanning the range from the weakest to the strongest Omar has available.
2) Fix a starting date.  Before that date no games count.
3) Fix a minimum number of games per benchmark bot per color, and a maximum number of games per benchmark bot per color.  I suggest two for the minimum and five for the maximum.
4) The point total for each qualifying bot is calculated as follows.  For each benchmark bot, for each color, count the number of wins in the two most recent games against the qualifying bot that are after the starting date and within the maximum of five games played.  Thus if the qualifying bot has played the benchmark bot twenty times with that color since the starting date, count only the fourth and fifth games.  The best possible score is 32 and the worst possible is 0.
5) For a tiebreaker, count the number of qualifying games played, with a lower number being better.  The best possible tiebreaker is 32 and the worst possible is 80.
6) For a second tiebreaker, measure the time of the last qualifying game to be played, where earlier is better.


Looks like a good start.

If the second tie breaker is number of games played, there is little benefit to setting a maximum number of games against a bot. If an entrant plays 200 extra games against a bot, it will show up in the second tie breaker.

Setting a minimum number of games is only useful if an entrant is much weaker than a bot. Assuming there is no restriction on the maximum number of games, an entrant might as well keep trying until they eventually get a win.

If there is a maximum number of games, bot developers will wait to play qualifying games until their bot is as strong as possible. Without the maximum restriction, the only penalty is in the second tie breaker. Developers would be much more likely to try their entries against the reference bots earlier. This would allow the standings page to be more meaningful.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 16th, 2009, 2:39pm
JDB, are you saying that you would like number of games played to be second tie-breaker rather than first?

The thought behind setting a maximum is to not overly reward persistence compared to skill, but maybe it's not a problem to give a large persistence bonus.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Aug 16th, 2009, 3:02pm
No, that was my mistake.

The tiebreakers should be as written in your original post.


Title: Re: Will the 2010 Computer Championship be open?
Post by tize on Aug 17th, 2009, 4:48am
Is there a real benefit to having a minimum number of games?

Should a bot that has played only 25 games be automatically out, even if it won all of them?

Title: Re: Will the 2010 Computer Championship be open?
Post by aaaa on Aug 17th, 2009, 6:34am
A developer could fraudulently make moves on behalf of his bot.

Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Aug 17th, 2009, 6:45am
I've been out of town for a bit and haven't had a chance to do more than skim over the discussion so far. One thing I'd like to raise though is that I would really like to see the previous year's champion given an automatic slot. It could still be ranked by the qualification process, but even if it falls below the lowest slot I'd like to have it still put in as the lowest seeded entry.

Janzert

Title: Re: Will the 2010 Computer Championship be open?
Post by Arimabuff on Aug 17th, 2009, 7:05am

on 08/17/09 at 06:34:40, aaaa wrote:
A developer could fraudulently make moves on behalf of his bot.

Not during the final phase of the championship (which is what really counts). Otherwise, what good would it do to fraudulently qualify a bot that's not good enough? It won't win anyway. Besides, a bot that has a shot at making the final is most likely better than its programmer is.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 18th, 2009, 9:28am

on 08/17/09 at 04:48:38, tize wrote:
Is there a real benefit to having a minimum number of games?

Should a bot that has played only 25 games be automatically out, even if it won all of them?

You are right, having a minimum doesn't make sense.  If a bot can accumulate enough points without playing every opponent, then more power to it.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 18th, 2009, 9:30am

on 08/17/09 at 06:45:35, Janzert wrote:
One thing I'd like to raise though is that I would really like to see the previous year's champion given an automatic slot.

Could you explain more why this is important to you, and what you see as the pros and cons of exempting one (or more) bots from qualifying?

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 18th, 2009, 9:35am

on 08/17/09 at 06:34:40, aaaa wrote:
A developer could fraudulently make moves on behalf of his bot.

This is the flip side of the cheating issue.  Since bots are weak, we don't have to worry much about humans getting bot assistance to cheat in the World Championship, but we do have to worry about bots getting human assistance to cheat in the Computer Championship.

Would you care to propose a solution?

Title: Re: Will the 2010 Computer Championship be open?
Post by aaaa on Aug 18th, 2009, 10:45am
Possibly a weakening of the current tournament after making it open. Floating double elimination until X bots remain (possibly dynamically calculated based on participation), who will then receive an extra life. With seeding becoming more important in this structure, protect last year's top-Y finishers. Y could also be dynamically determined by specifically looking at the record of that year. E.g., those with a plus or non-negative score, for this year making Y=2 and Y=4 respectively.

Title: Re: Will the 2010 Computer Championship be open?
Post by Arimabuff on Aug 18th, 2009, 11:21am

on 08/18/09 at 09:35:08, Fritzlein wrote:
This is the flip side of the cheating issue.  Since bots are weak, we don't have to worry much about humans getting bot assistance to cheat in the World Championship, but we do have to worry about bots getting human assistance to cheat in the Computer Championship.

Would you care to propose a solution?

Aren't bots running on Omar's computer for the important games? That's all that counts in my view. If a weak bot makes it to these games, it'll be massacred by the legitimate ones and the fraud will become self-evident.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 18th, 2009, 2:27pm

on 08/18/09 at 11:21:33, Arimabuff wrote:
Aren't bots running on Omar's computer for the important games?

Yes, the tournament games are run on Omar's computer, but it would be a shame for some bot to not even make it that far after having been squeezed out by a cheater in the qualifying.  Then it would be small consolation to see the cheater eliminated from the real tournament.

Title: Re: Will the 2010 Computer Championship be open?
Post by Arimabuff on Aug 18th, 2009, 4:17pm

on 08/18/09 at 14:27:24, Fritzlein wrote:
Yes, the tournament games are run on Omar's computer, but it would be a shame for some bot to not even make it that far after having been squeezed out by a cheater in the qualifying.  Then it would be small consolation to see the cheater eliminated from the real tournament.

But what would be the incentive of the cheater then? I mean, I could imagine someone cheating for a big reward or even the satisfaction of being lauded as the best, but simply to sneak into a competition where you'll undergo a humiliating defeat seems kinda weird to me. Don't you think? Besides, you don't really believe that a bot with an actual shot at winning the WCC could be nudged out that way, do you?

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Aug 18th, 2009, 5:39pm
I think the discussion about cheating is pretty much moot. If the bot is run on the developer's machine, it's pretty much impossible to prevent.

Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Aug 18th, 2009, 7:50pm

on 08/18/09 at 09:30:56, Fritzlein wrote:
Could you explain more why this is important to you, and what you see as the pros and cons of exempting one (or more) bots from qualifying?


Most of my reasons for wanting the previous champion in the tournament whether or not it qualifies are more emotional than logical. I want to see continuity from the previous tournament, I want to see as a developer how my bot performs against the current champion, I want to see how much further the current field has progressed than the champion. I would have absolutely hated it if Bomb had been champion for so many years and then suddenly got knocked out by not even qualifying for the tournament.

On a slightly more logical basis: the qualifier is pretty much by definition less discriminating of the true abilities of the bots than the tournament will be (otherwise why have the tournament at all?). Winning the previous year's tournament is probably a fairly good discriminator as well, although it is hard to say whether it is better or worse than the qualifier.

Janzert

Title: Re: Will the 2010 Computer Championship be open?
Post by tize on Aug 19th, 2009, 11:36am
At first I thought that last year's champion should be given automatic entry, but should we give that to jdb or to clueless2009cc? I mean, it's clueless2009cc that is the reigning champ, but that's not the bot that jdb will register; he will register an updated version of clueless (most likely stronger) or something new and unseen.

So to have continuity in the tournament from year to year, Omar should register the reigning champ with automatic entry, and every developer can then register whatever bot they like.

Not sure I like this; we might get clueless to be both first and second.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Aug 19th, 2009, 12:36pm

on 08/19/09 at 11:36:30, tize wrote:
At first I thought that last year's champion should be given automatic entry, but should we give that to jdb or to clueless2009cc? I mean, it's clueless2009cc that is the reigning champ, but that's not the bot that jdb will register; he will register an updated version of clueless (most likely stronger) or something new and unseen.

So to have continuity in the tournament from year to year, Omar should register the reigning champ with automatic entry, and every developer can then register whatever bot they like.

Not sure I like this; we might get clueless to be both first and second.


If this were to be done, it would make sense to enter clueless2009cc. However, this bot is available for play all the time, as will be whatever bot wins the 2010 tournament. If someone wanted to compare the two winners, a match could be played at their leisure.

Title: Re: Will the 2010 Computer Championship be open?
Post by Arimabuff on Aug 19th, 2009, 3:39pm
If we want to keep the contest honest, we have to limit to one bot per owner. One easy way to cheat would be to have the exact same bot playing under different aliases, thus artificially increasing the chances of that bot to win the WCC. Another way would be to program one instance to be weak when playing another, thus increasing the points of that other instance.

Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Aug 19th, 2009, 6:23pm
I would want the automatic entry to allow the author to enter whatever updated version he wanted. If he didn't have an updated version to enter then Omar could add the same version from the previous year, as was done with Bomb.

The new and the old version should not both be entered, as that gives the new version an advantage because of the problems with self-play. To wit, when a bot is improved the new version will often play disproportionately better against the old version than the actual improvement would suggest it should. There isn't even a need for the purposeful manipulation that Arimabuff brings up; it tends to just naturally occur anyway.

Janzert

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 19th, 2009, 7:20pm

on 08/18/09 at 16:17:11, Arimabuff wrote:
I mean, I could imagine someone cheating for a big reward or even the satisfaction of being lauded as the best, but simply to sneak into a competition where you'll undergo a humiliating defeat seems kinda weird to me. Don't you think?

OK, you have persuaded me.  Any bot that had to cheat to get in isn't going to win any games once it gets in, so it will actually lose money under the current prize structure.  I don't see how we could prevent cheating in the qualifying, but the incentives aren't there, so it probably won't be an issue.  If we suspect that someone has cheated, we can address it in subsequent years.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 19th, 2009, 7:39pm

on 08/19/09 at 18:23:36, Janzert wrote:
I would want the automatic entry to allow the author to enter whatever updated version he wanted.

I agree with that sentiment.  Exemption from qualifying should be a courtesy extended to a developer, not to a version of a bot.  Also I agree with JDB that if we want to pit different versions of a bot from different years against each other, we can do so outside of the Computer Championship.  If a developer were allowed to enter more than one version, it could squeeze out up-and-coming contenders.  In 2005 the strongest three bots were probably Bomb2005, Bomb2004, and Arimaanator, but it would have been silly to give Fotland all three medals.

So, if we are just trying to be nice to the developer with exemption from qualifying, it seems like a delicate balancing act.  Whatever favor is extended to one developer is in equal proportion a hardship imposed on other developers.

I'm torn because we would definitely not want to deter the winning developer by the hassle of qualifying.  It would also have been silly had Bomb been dethroned by default in 2006 because Fotland was too busy to qualify his winning bot from 2005.  We want as many developers as possible to participate, whether or not they are likely to win, but the participation of the best bots is particularly desirable.  By participating, the developers do arimaa.com a bigger favor than arimaa.com does them by allowing them to participate.

My current thought is this: force all developers to qualify, even the returning champion.  If, at some point, our policy costs us the participation of a strong bot that we severely miss, then we can reconsider extending favors to the elite.


Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Aug 19th, 2009, 7:54pm
One thing that maybe I wasn't completely clear about before: I don't see the automatic qualification as simply a favor to the previous champion. I think it is also a favor to the other developers that qualify as well as all the spectators. The only one I actually see it as a disfavor to is the unfortunate developer that would have gotten the lowest seed but doesn't if the previous champion gets bumped up into that seed.

I also don't see it as terribly likely that the situation actually arises that the last champion doesn't qualify (at least as long as Omar attempts to qualify the previous champion if the developer themselves doesn't). But I would rather have the rule there than to be terribly disappointed if it does occur.

Janzert

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Aug 20th, 2009, 6:19am

on 08/19/09 at 19:54:55, Janzert wrote:
I think it is also a favor to the other developers that qualify as well as all the spectators.

Hmm, I wonder if all the developers that have to jump through hoops while watching the previous champion rest on his laurels would agree that exempting the previous champion is a favor to them too.

In the chess world, FIDE tried holding knock-out tournaments, but wanted to make sure the previous champion would be involved in the climactic final match instead of getting eliminated early, so they seeded the previous champion directly into the finals.  I assure you that the rank and file did not view it as a favor to themselves that the previous champion had an easier path to victory, regardless of whether it preserved the glory of the event.  It's a different situation, true, but the sentiment might be similar for Arimaa.

Title: Re: Will the 2010 Computer Championship be open?
Post by tize on Aug 20th, 2009, 7:49am
Why not just give the developer of the reigning champ a single choice: either he registers the old bot unmodified and gets it automatically qualified, or he registers a new bot and has to qualify it like everybody else?

If the developer doesn't register any bot, Omar can register the champ. (auto qualified)

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Aug 20th, 2009, 7:55am
Regarding the cheating issue. I don't think we should worry about it too much. As Arimabuff mentioned, there isn't much incentive to do it. In fact, the developer would probably just lose the entry fee in the finals and risk being banned from future tournaments. If the hardware used by the developer was much, much faster than the hardware of the tournament server, it could also appear as if the developer was cheating. I may need to publish the hardware that will be used on the tournament server before the qualifier, and there might need to be a clause in the rules that says that during the qualifier the developers should not use a system that is more than twice as fast as the tournament server. If there is suspicion of cheating, I can run the bot from the tournament server against the benchmark bots to see if it performs similarly to how it did during the qualifying phase. The tournament director can look at this performance difference while taking into consideration the hardware used by the developer compared to the tournament server and decide if there has been cheating. There should be a clause in the rules that if the tournament director determines a developer cheated during the qualifying phase then the developer loses the entry fee, is not entitled to any awards or prizes and will be banned from entering a bot in the following year's tournament.

Regarding automatic entry in the final for the previous champion. The motivation for this rule seems to be to guarantee continuity from one year to the next of the champion being in the finals. Continuity is desirable, but it is possible the bot developer just doesn't have the time to improve the bot and does not enter a new bot at all; we've seen this happen already. In such a case I will enter the previous year's champion in the qualifier. The probability of the best bot from one year not making it to the finals the very next year is low enough that I don't think we need a special rule to guarantee it. It could happen in the future, though, when there are, say, 30 really good bots competing for the 8 positions in the finals. There could be big changes in position from one year to the next. But having such a rule opens up the door for the Deep Blue syndrome. That is, the developer of the champion bot can enter a bot into the finals which has not played any public games at all. This bot could be very different from the champion bot. I think this is very unfair to all the other bots in the final, especially if the bot that got automatic entry looks at their public games to decide how to play against them and they have no access to its games. GnoBot already does look at previous games and we have to assume that this may become the norm in the future. So for this reason I would not want to have a special exemption for the developer of the champion bot. Also, as has been mentioned, the previous year's champion bot will be available during development season, so I am sure the other bots will have played against it, especially the bot that will go on to become the new champion.

Karl I didn't quite get this:

Quote:
Thus if the qualifying bot has played the benchmark bot twenty times with that color since the starting date, count only the fourth and fifth games.

Did you mean to say "... has played the benchmark bot five times with that color ..."; because I thought you suggested a maximum of 5 games with each color.

I am afraid that counting only the last two games with each color against the benchmark bots will not provide enough discrimination. It will be a problem if the really good bots all get a perfect score. Then we have to rely on the tie breaker which counts the total number of qualifying games played. This measure is also not forgiving to allow for multiple plays to get a better result. In addition the really good bots might only need to play two games with each color to get the perfect score. This would require going into the second tie breaker which is not based on the merits of the bot. If this happens I don't think we would be very happy with how the qualifying phase went.

My reason for wanting to use a performance rating was to allow discrimination between the bots based only on merit. Rather than assigning fixed ratings to the benchmark bots, we could use a weighted point system for wins and losses. For example, a win against the lowest benchmark bot is worth 1 point, against the second lowest it is worth 2 points, and so on. If there are 8 benchmark bots then a win against the best benchmark bot is worth 8 points. Losses count for equivalent negative points; or, to encourage more playing, the negative points for a loss could be half the positive points for a win. To allow some leeway to improve the bot during the qualifying phase and try for a better score, one loss with each color against each benchmark bot will not be counted. As tize mentioned, there is no need to require a minimum number of games, so there is only a cap on the maximum number of games which can be played against each benchmark bot for each color. This should allow the bots to compete with each other to maximize their score during the qualifying phase and provide sufficient discrimination. In the event of tied scores, a blitz tie-break game can be played between the tied bots. The bot which achieved the score first gets to pick the color in the tie-break game.
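The weighted point scheme can be sketched as follows. Assumptions, all hypothetical: benchmark bots are ranked 1 (weakest) to 8 (strongest); `loss_factor` is 1.0 for "equivalent negative points" or 0.5 for the half-penalty variant; the forgiven loss is simply one loss per benchmark bot per color; and the cap `max_games=5` is borrowed from the five-game figure discussed earlier, since no cap value was given for this scheme.

```python
def weighted_score(results, loss_factor=0.5, max_games=5):
    """Sketch of the weighted win/loss point scheme.

    results maps (benchmark_rank, color) to a chronological list of outcomes
    (True = win), where benchmark_rank runs from 1 (weakest) to 8 (strongest).
    """
    total = 0.0
    for (rank, _color), outcomes in results.items():
        played = outcomes[:max_games]      # games past the cap are ignored
        wins = sum(played)
        losses = len(played) - wins
        unforgiven = max(losses - 1, 0)    # one loss per bot per color is not counted
        total += rank * wins - loss_factor * rank * unforgiven
    return total


print(weighted_score({(8, "gold"): [True, False, False], (1, "silver"): [False]}))  # 4.0
```

Note that the points a bot can expect from a pairing now depend on both the weight and its chances against that opponent, so which benchmark bots it chooses to play, and how often, affects its score.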


Title: Re: Will the 2010 Computer Championship be open?
Post by 99of9 on Aug 27th, 2009, 10:01pm
I'm with Janzert.

However, one thing I'd like to ensure is that any bot allowed in by a special route still plays a minimum number of games to "show" as much of itself as the other bots show.  [EDIT: I now read that Omar also spoke of this.  But in response to Omar, it is still possible to have both a requirement to play qualifiers, and a free pass even if you fail to qualify.]

Omar's argument that the 2009CC is unlikely to miss out in 2010 is not a reason for or against having a rule.  The question is, if or when it DOES miss out, do we want to give it special consideration?  I do.

I'm with ArimaaBuff on the cheating issue.  There's hardly any incentive, and this is the kind of situation where the trustworthiness of the Arimaa community comes in handy.  If we were really to get paranoid on cheating, monitoring players in the WC to ensure they were not running bots would be more important IMO.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Sep 17th, 2009, 3:04pm
I apologize for dropping the ball on getting new rules written up.  I have now at least copied the 2009 rules here (http://arimaa.com/arimaa/mwiki/index.php/2010_World_Computer_Championship_Rules), and have begun editing.

I suggest the following eight benchmark bots:
   * ArimaaScoreP1
   * Loc2007P1
   * Arimaazilla
   * Aamira2006P2
   * Bomb2005Fast
   * OpFor2009Fast
   * Gnobot2009Blitz
   * Clueless2009Blitz

My selection criteria were:
1) We want a range of benchmarks from the worst bots to the best, so that we provide discrimination throughout the spectrum.  It would be silly to do a good job of seeding the top four qualifying bots but a terrible job of deciding which bot should qualify into the top eight compared to the bot that doesn't qualify in ninth place.  Conversely, we don't want all the discrimination at the lower end while all the best qualifying bots tie for perfect score at the top end, because that wouldn't give us a good seeding.
2) In case a developer has an advantage in beating previous versions of his own bot, there should be no more than one benchmark bot from any developer.
3) There should be a variety of time controls.  For the lower four benchmark bots the qualifying bot has a time advantage, but for the upper four benchmark bots, the qualifying bot has no time advantage.

Any further suggestions?  Next year we can adjust the stable of benchmark bots if this collection appears somehow lacking during 2010 qualification.


on 08/20/09 at 07:55:12, omar wrote:
I may need to publish the hardware that will be used on the tournament server before the qualifier

If you are going to have a rule that says the qualifying bots can't run on super-fast hardware, you at least need to specify the hardware limit, if not the exact hardware that will be used for the tournament.


Quote:
Continuity is desirable, but it is possible the bot developer just doesn't have the time to improve the bot and does not enter a new bot at all; we've seen this happen already. In such a case I will enter the previous year's champion in the qualifier.

I like this solution because it guarantees continuity without opening the can of worms of a developer having two bots.  If a developer wants to enter a wholly different bot, perhaps one that is much weaker than his champion of last year, then that new bot will have to prove itself by qualifying.  If the champion developer doesn't have the time or inclination to qualify, then Omar can run his previous bot through the qualifying in order to get a reasonable seeding and ensure that the quality of the tournament is at least not lower than the previous year.

If the situation ever arises that either the old version or the new version of a championship bot fails to qualify in the top eight after being entered, then I say boot it out and give the eighth spot to a more worthy contender.  If so many new and/or improved bots have surpassed the old champion, there is no possible issue with the quality of the tournament being low.


Quote:
Karl I didn't quite get this:
Did you mean to say "... has played the benchmark bot five times with that color ..."; because I thought you suggested a maximum of 5 games with each color.

Yes, I suggested a maximum of five games against each benchmark bot during the qualifying.  Can you set up some hard limit on the benchmark bots so that they don't play extra?  In the absence of an automatically-enforced limit, I thought we should have a rule to cover what should be done when a developer plays a sixth and seventh game.  One could say that the bot in question is automatically disqualified, but that seems a wee bit harsh.  I thought it would be nicer to instead ignore any games beyond the maximum (i.e. any games beyond the fifth).  Thus no matter how many extra games a qualifying bot plays against a benchmark bot with a certain color, we ignore all extras and just look at the last two legitimate games, i.e. the fourth and fifth.  Does this make sense?


Quote:
I am afraid that counting only the last two games with each color against the benchmark bots will not provide enough discrimination. It will be a problem if the really good bots all get a perfect score. Then we have to rely on the tie breaker which counts the total number of qualifying games played. This measure is also unforgiving of multiple attempts to get a better result. In addition, the really good bots might only need to play two games with each color to get the perfect score.

I would be astonished if any bot entered for the 2010 championship could rack up 32 wins against this lineup with no losses.  To have a reasonable shot of running the table, the new bot would need to be at least 250 rating points better than the best bot of last year.  Do you really expect so much improvement?  My expectation is that several bots at the top will be able to max out their score at 32, but also that the number of tries needed to get those wins will provide sufficient discrimination.


Quote:
My reason for wanting to use a performance rating was to allow discrimination between the bots based only on merit. Rather than assigning fixed ratings to the benchmark bots, we could use a weighted point system for wins and losses. For example, a win against the lowest benchmark bot is worth 1 point, against the second-lowest 2 points, and so on. If there are 8 benchmark bots, then a win against the best benchmark bot is worth 8 points. Losses count for equivalent negative points; or, to encourage more playing, the negative points for a loss could be half the positive points for a win.

Introducing an arbitrary number of points for wins and losses against certain bots seems like a complication that not only has no benefit, it actually makes things worse as well as more complicated.  How do you know that your point system is fair?  You could make it more lucrative to play against one bot than against another, in which case careful selection of opponents will be part of the skill in getting a high seed for a qualifying bot.  I strongly recommend not having any scoring system in which self-selection of opponents can provide any advantage.  Selection of opponents is what wrecks the ordinary ratings, and it seems like a great danger to your ad-hoc scoring as well.

Besides, if you think this year's bots are strong enough to win and win against the old bots without losing, how does it help matters to make the scoring system more complicated?  A perfect score is a perfect score in any system.  The only way you can add discrimination is by making more of the games count.  That's fine by me: if you want to make a total of 80 games count in the scoring instead of having 32 games count in the scoring, then let's do that for greater discrimination.  Let's not count the last two games against each benchmark bot with each color; instead let's count all five games.

I repeat, however, that an arbitrary scoring system with self-selection of opponents does not help you in any way.  Instead we should define the opponents exactly (however we decide) and count one point per win among the defined games.


Quote:
In the event of tied scores a blitz tie break game can be played between the tied bots. The bot which achieved the score first gets to pick the color in the tie break game.

A playoff blitz game is a reasonable alternative to break ties instead of using time of completion.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Sep 17th, 2009, 5:07pm

Quote:
Yes, I suggested a maximum of five games against each benchmark bot during the qualifying.  Can you set up some hard limit on the benchmark bots so that they don't play extra?  In the absence of an automatically-enforced limit, I thought we should have a rule to cover what should be done when a developer plays a sixth and seventh game.  One could say that the bot in question is automatically disqualified, but that seems a wee bit harsh.  I thought it would be nicer to instead ignore any games beyond the maximum (i.e. any games beyond the fifth).  Thus no matter how many extra games a qualifying bot plays against a benchmark bot with a certain color, we ignore all extras and just look at the last two legitimate games, i.e. the fourth and fifth.  Does this make sense?


Bots entering the tournament are most likely under development, and changing over time. If there is a limit on the number of games against each bot, the developer will wait until the last minute to play the games. If the rules only count the last n games against each bot, then there is much less incentive to hold off playing games.

The last 4 bots on the list are not fixed performance bots. That's OK, but it is something to be aware of.


Title: Re: Will the 2010 Computer Championship be open?
Post by Arimabuff on Sep 17th, 2009, 9:18pm

on 09/17/09 at 15:04:28, Fritzlein wrote:
...Yes, I suggested a maximum of five games against each benchmark bot during the qualifying...

I think that any rule that limits the number of games played by a bot is a very bad idea. The whole idea behind this site is to promote the game of Arimaa and multiply the possibilities to play it. NOT the other way around. Also, we must encourage the bot developers to… well… develop their bots as much as possible and we will not do that if we limit the number of test games that they are allowed to play.

Title: Re: Will the 2010 Computer Championship be open?
Post by Arimabuff on Sep 17th, 2009, 9:22pm

on 09/17/09 at 17:07:17, jdb wrote:
Bots entering the tournament are most likely under development, and changing over time. If there is a limit on the number of games against each bot, the developer will wait until the last minute to play the games. If the rules only count the last n games against each bot, then there is much less incentive to hold off playing games.

The last 4 bots on the list are not fixed performance bots. That's OK, but it is something to be aware of.

Point well taken, we should encourage the evolving bot vis a vis the fixed one. Also a developer who tests his bot against human players is more likely to do a better job than one who doesn't.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Sep 19th, 2009, 10:37am

on 09/17/09 at 17:07:17, jdb wrote:
The last 4 bots on the list are not fixed performance bots. That's OK, but it is something to be aware of.

Using only fixed-performance bots would be ideal for fairness, but unfortunately fixed-performance bots by their nature give a time handicap.  Therefore the strongest available bots are not fixed-performance.  Since we are worried about discrimination between the best bots, we need the top of the ladder to be as strong as possible.  Therefore the top benchmark bots should be variable-performance.

In the middle of the ladder, using Bomb2005P2 instead of Bomb2005Fast would probably be better in terms of having the rungs of the ladder equally spaced, and would be fairer by virtue of being fixed-performance.  The drawback of this substitution would be that BombP2 can be beaten by imitating past bot-bashes, whereas BombFast is more variable and therefore presumably less prone to duplicating whole games.

How do people feel about whether it should be BombP2 or BombFast among the benchmark bots?


Quote:
Bots entering the tournament are most likely under development, and changing over time. If there is a limit on the number of games against each bot, the developer will wait until the last minute to play the games. If the rules only count the last n games against each bot, then there is much less incentive to hold off playing games.

If developers hold off playing their games until the last minute, it creates the possible problem of the server not having the resources to play all the qualifying games at once.  If we mandate playing eighty games (rather than merely allowing eighty games), the problem of congestion would be exacerbated.

This consideration makes me want to revive my proposed second tiebreak, namely earliest completion of qualifying, to give at least a small incentive not to wait until the last minute.  Also, on further consideration, I don't like having a blitz playoff game as a tiebreak because it would be a logistical hassle to arrange the game in advance of the programs being ported to the server.  I will provisionally change the rule Wiki to use time as the second tiebreaker, but of course I'm not trying to end the discussion thereby.


on 09/17/09 at 21:18:14, Arimabuff wrote:
I think that any rule that limits the number of games played by a bot is a very bad idea. The whole idea behind this site is to promote the game of Arimaa and multiply the possibilities to play it. NOT the other way around. Also, we must encourage the bot developers to… well… develop their bots as much as possible and we will not do that if we limit the number of test games that they are allowed to play.

I agree with your general point about encouraging development and play, but I still think we had better cap the number of qualifying games.  Removing the limit on the number of qualifying games rewards persistence more than further development.  A developer whose bot beats Bomb only 10% of the time with a month to go before the tournament can get a two-game winning streak by playing about a hundred games against Bomb.  This would be an easier and more reliable way to improve the qualifying score than by adding last-minute features.  I don't think we want to encourage developers to set up their bots to play incessantly until getting a win streak of the right length.
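The "about a hundred games" figure can be checked with the standard waiting-time formula for a run of k consecutive successes. This sketch assumes every game is independent with a fixed win probability p, which is an idealization: real bots, especially the variable-performance ones, need not behave this way.

```python
def expected_games_for_streak(p: float, k: int) -> float:
    """Expected number of games until the first run of k consecutive wins.

    Assumes independent games with a constant win probability p (0 < p <= 1).
    Uses the standard result: sum over i = 1..k of (1/p)**i.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return sum((1 / p) ** i for i in range(1, k + 1))

print(expected_games_for_streak(0.1, 2))  # 110.0
print(expected_games_for_streak(0.1, 4))  # 11110.0
```

Under this assumption a bot that beats Bomb 10% of the time needs about 110 games on average for a two-game streak, but on the order of eleven thousand for a four-game streak, so the required streak length strongly affects how far persistence alone can go.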

If we define the stable of benchmark bots now, developers have over three months of unlimited practice to test new features and fix bugs.  Also, even during qualifying, developers can test and practice as much as they want against any bots other than the eight benchmarks.  We are by no means stifling developers from playing, testing, and developing.  We are merely trying to come up with a reasonable way to measure performance.  Allowing unlimited games interferes with the performance measurement.

I think the policy of allowing five qualifying games and counting the last two is a good compromise between accommodating last-minute bugfixes on one hand and keeping the scoring related to actual bot ability on the other hand.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Sep 19th, 2009, 1:00pm

Quote:
For bots which tie on qualifying score, the first tiebreaker is score minus the number of qualifying games played. Thus the best possible tiebreak score is zero, which represents winning all qualifying games and not playing any excess games. The worst possible tiebreak score is -80, which represents playing the maximum number of games and winning none. The second tiebreak is earliest date of completion, i.e. the bot whose last qualifying game was played first.


I hope someone can clarify this for me. If two bots are equal in score, the first tiebreaker can simply be the number of qualifying games played. Or am I missing something?

I like the second tiebreak. No need for a playoff game.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Sep 19th, 2009, 2:29pm

on 09/19/09 at 13:00:43, jdb wrote:
I hope someone can clarify this for me. If two bots are equal in score, the first tiebreaker can simply be the number of qualifying games played. Or am I missing something?

That's what I intended for the first tiebreak.  Did I not present it clearly?

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Sep 23rd, 2009, 8:58am
While chatting with JDB today, it became clear that we must not split winning streaks against benchmark bots into streaks as Gold and streaks as Silver.  The discrimination provided by qualifying is vastly greater if the longest winning streak that counts is four rather than two.  That's just how the math works out.  So we are going to have to require that qualifying bots play the benchmark bots with alternating colors.

I am sad to advocate an alternating color requirement, because I know that will make it impossible for Omar to create a leader board with straight SQL queries.  Instead Omar will have to query for all the games of a qualifying bot versus a benchmark bot in date order and step through those games looking for valid winning streaks.  This is a hassle, I know, but will be extremely valuable in adding discrimination between qualifying bots without adding extra games.

JDB and I also discussed a maximum number of games allowed per benchmark bot.  I wanted a low limit to stop the server from being pounded by games, and to stop a weaker bot from passing a stronger bot by pure persistence.  JDB wanted no limit, to allow for open-ended development.  In the end, I think we were both satisfied with allowing twenty games per benchmark bot.  A developer who wants to play more than the 160-game maximum within the qualifying month is probably not focusing much on developing anyway.  Hopefully this limit is low enough that the server doesn't get swamped.

What do you think?
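One way to explore the streak-length question is a small calculator for the probability that a bot reaches a winning streak of a given length within a given number of games. This is a hypothetical sketch, not the calculation Fritzlein and JDB did; it assumes independent games with a fixed win probability p, and the win rates and game limits in the loop are illustrative only.

```python
def prob_streak(p: float, k: int, n: int) -> float:
    """Probability of at least one run of k consecutive wins within n games.

    Assumes each game is an independent win with probability p, and k >= 1.
    """
    # state[s] = probability that the current run of wins has length s (s < k)
    # and that no k-win run has happened yet.
    state = [0.0] * k
    state[0] = 1.0
    reached = 0.0
    for _ in range(n):
        nxt = [0.0] * k
        for s, prob in enumerate(state):
            nxt[0] += prob * (1 - p)      # a loss resets the run
            if s + 1 == k:
                reached += prob * p       # a win completes the streak
            else:
                nxt[s + 1] += prob * p    # a win extends the run
        state = nxt
    return reached

# Compare a two-game streak within five games against a four-game streak
# within twenty games for a few hypothetical win rates.
for p in (0.5, 0.7, 0.9):
    print(p, round(prob_streak(p, 2, 5), 3), round(prob_streak(p, 4, 20), 3))
```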

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Sep 23rd, 2009, 9:13am

on 09/23/09 at 08:58:24, Fritzlein wrote:
I am sad to advocate an alternating color requirement, because I know that will make it impossible for Omar to create a leader board with straight SQL queries.  Instead Omar will have to query for all the games of a qualifying bot versus a benchmark bot in date order and step through those games looking for valid winning streaks.  This is a hassle, I know, but will be extremely valuable in adding discrimination between qualifying bots without adding extra games.

What do you think?


It might be easier to enforce the alternating colour requirement on the other end. Enforce the alternation of colour before the game is played, instead of checking for it after.
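A minimal sketch of enforcing alternation before the game starts, assuming the server can look up the qualifying bot's previous colors against that benchmark bot. The function names and the "g"/"s" color codes are hypothetical; the thread does not describe the gameroom interface. Because every accepted game already alternates, "opposite of the last color" is the same as "alternate from the first game's color".

```python
from typing import List, Optional

def required_color(history: List[str]) -> Optional[str]:
    """Color the qualifying bot must take in its next game vs one benchmark bot.

    `history` is a hypothetical date-ordered list of the qualifying bot's
    colors ("g" or "s") in previous games against that benchmark bot.
    Returns None when no game has been played yet (free choice).
    """
    if not history:
        return None
    return "s" if history[-1] == "g" else "g"

def may_start_game(history: List[str], requested: str) -> bool:
    """Reject a game request up front if it would break the alternation."""
    needed = required_color(history)
    return needed is None or requested == needed
```

Checking at request time means the leader board never has to deal with out-of-order games at all.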

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Sep 23rd, 2009, 9:59am

on 08/11/09 at 09:44:03, RonWeasley wrote:
I would like to see specific language in the rules about handling server failures or degradations.  My policy of continuing the game at the point of failure was not unanimously supported and it put quite a burden on Omar.  If the rules are amended to call for a restart, for example, whenever a server failure is detected, the TD would not have to be consulted each time and tournament management would be more tractable.

Also think about handling server failures in the qualifying games and the effects of restarts or continuations on players.  A simple restart policy might be the most effective in dealing with scheduling.

I have put tentative language about handling server failures in the proposed rules (http://arimaa.com/arimaa/mwiki/index.php/2010_World_Computer_Championship_Rules).  My proposal for dealing with issues during qualifying is that everything counts no matter what happens.  This may sound arbitrary and harsh, but it would be a nightmare for Omar to have to deal with issues for a whole month like he has to deal with for the two-week tournament.

For in-tournament issues, I think we should agree on a list of issues that cause mandatory halt/restart, in order to take the judgment of the Tournament Director out of the equation.  Should we also list any issues that don't qualify for a halt/restart?

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Oct 15th, 2009, 6:01am
If Fotland doesn't return to active development this year, we may have the issue of whether to enter Bomb in the qualification process or not.  Bomb is not the defending champion any more, so we needn't worry about continuity in that sense.

My feeling is that if there are seven or fewer other entrants, it is fine to add Bomb to fill out the field, but if there are eight or more active developers, then we shouldn't enter Bomb on behalf of an inactive developer.  Even if Bomb would still be stronger than the bot in the eighth spot, I think it is less important to have the strongest field possible, and more important to include a bot with a future in front of it.  Let's not have deadwood taking up a slot.

How do other people feel about this rule clarification?  That is to say, we will automatically enter an unimproved reigning champion, but will not automatically enter an unimproved bot otherwise?

Title: Re: Will the 2010 Computer Championship be open?
Post by arimaa_master on Oct 15th, 2009, 6:42am

on 10/15/09 at 06:01:47, Fritzlein wrote:
If Fotland doesn't return to active development this year, we may have the issue of whether to enter Bomb in the qualification process or not.  Bomb is not the defending champion any more, so we needn't worry about continuity in that sense.

My feeling is that if there are seven or fewer other entrants, it is fine to add Bomb to fill out the field, but if there are eight or more active developers, then we shouldn't enter Bomb on behalf of an inactive developer.  Even if Bomb would still be stronger than the bot in the eighth spot, I think it is less important to have the strongest field possible, and more important to include a bot with a future in front of it.  Let's not have deadwood taking up a slot.

How do other people feel about this rule clarification?  That is to say, we will automatically enter an unimproved reigning champion, but will not automatically enter an unimproved bot otherwise?


I completely support that (secretly hoping that Fotland will come back with an improved Bomb, to smash the opponents again).


Title: Re: Will the 2010 Computer Championship be open?
Post by Adanac on Oct 15th, 2009, 6:47am

on 10/15/09 at 06:01:47, Fritzlein wrote:
If Fotland doesn't return to active development this year, we may have the issue of whether to enter Bomb in the qualification process or not.  Bomb is not the defending champion any more, so we needn't worry about continuity in that sense.

My feeling is that if there are seven or fewer other entrants, it is fine to add Bomb to fill out the field, but if there are eight or more active developers, then we shouldn't enter Bomb on behalf of an inactive developer.  Even if Bomb would still be stronger than the bot in the eighth spot, I think it is less important to have the strongest field possible, and more important to include a bot with a future in front of it.  Let's not have deadwood taking up a slot.

How do other people feel about this rule clarification?  That is to say, we will automatically enter an unimproved reigning champion, but will not automatically enter an unimproved bot otherwise?


I agree with that.  Now that Bomb has been de-throned I'd prefer to see an active developer get the 8th spot.  As an added bonus, that would spare us the possibility of another boringly predictable 9-0 Human vs. Bomb challenge match.

Title: Re: Will the 2010 Computer Championship be open?
Post by RonWeasley on Oct 15th, 2009, 8:40am

on 10/15/09 at 06:01:47, Fritzlein wrote:
If Fotland doesn't return to active development this year, we may have the issue of whether to enter Bomb in the qualification process or not.  Bomb is not the defending champion any more, so we needn't worry about continuity in that sense.

My feeling is that if there are seven or fewer other entrants, it is fine to add Bomb to fill out the field, but if there are eight or more active developers, then we shouldn't enter Bomb on behalf of an inactive developer.  Even if Bomb would still be stronger than the bot in the eighth spot, I think it is less important to have the strongest field possible, and more important to include a bot with a future in front of it.  Let's not have deadwood taking up a slot.

How do other people feel about this rule clarification?  That is to say, we will automatically enter an unimproved reigning champion, but will not automatically enter an unimproved bot otherwise?

I prefer having the top 8 compete even if they include unimproved bots.  I like the 8 competing bots being the best 8, even if #9 is actively being worked on.  The bot left out can still arrange to play against the other bots, just not as part of the championship.  I can see a problem running the qualifier and having to include thousands of abandoned bots, but we're not there yet.

In practice, it's hard to imagine an unimproved bot winning the challenge, so it's not a huge deal for me if we omit them as a reward for active development.  If we go this route, we may have to think about our definition of "active development".  Imagine a developer trying to cheat by changing a comment line and declaring the bot eligible.  Hard to imagine someone doing this, but such surprises are possible.

Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Oct 15th, 2009, 9:23am
Improved or unimproved makes no difference to me. The related distinction I would make is whether the developer submitted it.

I would still like to see the current champion given a slot even if it means Omar submits, qualifies, etc. the previous year's version. For all others I would rather the developer is required to submit it, run it through qualification and set it up on the tournament server. If the developer happens to be using the same version as last year that's fine with me, but at least they still have enough interest to invest the time necessary to get the engine in the tournament.

Janzert

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Oct 15th, 2009, 11:17am

on 10/15/09 at 09:23:18, Janzert wrote:
Improved or unimproved makes no difference to me. The related distinction I would make is whether the developer submitted it.

Yes, I guess that is the distinction I wanted to make too.  If Fotland wants to go to all the hassle of qualifying and setting up Bomb on his own, then it doesn't bother me if Bomb hasn't improved.  I wouldn't want to police it anyway.  The clarification is just that Omar wouldn't be obliged to run the bot through the process unless it was the defending champion.

If Omar sets up a page to automatically track the qualifying standings, I may try running bot_Arimaanator through the paces, just for giggles.  Would it make it into the top eight?  I wouldn't take up a slot in any case, because I don't meet the other requirements of tournament participation.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Oct 17th, 2009, 6:46am
I was reading over the rules regarding technical problems. In case of a technical problem, it might be easier to replay the game, instead of trying to resume it. It is easy to replay a game from the start, but it is tricky to resume a game.

In the event of a perceived technical problem, the game could be replayed automatically. The tournament director could then decide after the fact which game is the official game. This would minimize delays in the tournament schedule.

Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Oct 17th, 2009, 10:46am
I agree, games should just be restarted from the beginning when a problem is detected. Although I think if the game is in progress it should be stopped if there is a reason already listed in the rules as disqualifying the game. If it's possible preferably it could be stopped in a way that would cause the result to be "abandoned", along with unrating it.

Janzert

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Oct 18th, 2009, 7:39am
Yes, restarting would be easier than restoring. But I am willing to go either way.

Title: Re: Will the 2010 Computer Championship be open?
Post by RonWeasley on Oct 19th, 2009, 4:36am

on 10/18/09 at 07:39:16, omar wrote:
Yes, restarting would be easier than restoring. But I am willing to go either way.

While I still have a preference for resuming, especially for bots, the arguments against it involving extra time on the adjourned position are reasonable and complex.  And I recognize that always requiring a restart is fair in the sense that all players are treated equally, even though one might be very unlucky.  Such a rule makes things much easier for tournament administration, so I yield to the more practical solution.

Title: Re: Will the 2010 Computer Championship be open?
Post by aaaa on Oct 24th, 2009, 11:43am
The rules say unrated games count for the purpose of qualifying, but this would allow a developer to take advantage of the fact that Gnobot2009Blitz and Clueless2009Blitz always set up their rabbits in front in those games.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Oct 24th, 2009, 5:12pm
You are right; I guess the games against the benchmark bots can't be unrated during the qualifying phase. I changed the rules page now to read as:

Games against these bots during the qualifying period will be considered official qualifying games and must be rated games. Playing unrated games against these bots during the qualifying period would disqualify the qualifying bot.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Oct 24th, 2009, 5:19pm

on 10/24/09 at 17:12:14, omar wrote:
Playing unrated games against these bots during the qualifying period would disqualify the qualifying bot.

That's a bit harsh for something that could be an accident.  How about -1 point in the qualifying score for each unrated game played against a benchmark bot?  Plus, of course, any unrated games don't count in the win streaks.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Oct 24th, 2009, 5:28pm
Yes, I would hate to see a bot get disqualified due to an error by the developer in setting up the game. We are just trying to discourage intentional unrated games, so deducting a point from the score sounds like a good way to do that. Changed it to:

Games against these bots during the qualifying period will be considered official qualifying games and must be rated games. Playing unrated games against these bots during the qualifying period would deduct one point for each such game from the qualifying bot's score, and such games would not count in the win streak.

Title: Re: Will the 2010 Computer Championship be open?
Post by aaaa on Oct 28th, 2009, 5:04pm
Wouldn't it be better if the first tiebreaker only applied to all the qualifying games up to the last one that contributed to the primary score? Otherwise, there could be an incentive for one to stop early.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Oct 31st, 2009, 3:15pm

on 10/24/09 at 17:28:09, omar wrote:
Yes, I would hate to see a bot get disqualified due to an error by the developer in setting up the game. We are just trying to discourage intentional unrated games, so deducting a point from the score sounds like a good way to do that. Changed it to:

Games against these bots during the qualifying period will be considered official qualifying games and must be rated games. Playing unrated games against these bots during the qualifying period would deduct one point for each such game from the qualifying bot's score, and such games would not count in the win streak.


This still seems a little harsh to me.

What is gained if the bot plays an unrated game against one of the ladder bots? Clearly, the game doesn't count in the score, so what is the benefit of playing the unrated game? I can easily set up bot_clueless_jr and play against them anyway. It's most likely just a developer mistake setting things up. I don't understand the need to penalize for a mistake.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Nov 5th, 2009, 9:38am

on 10/28/09 at 17:04:46, aaaa wrote:
Wouldn't it be better if the first tiebreaker only applied to all the qualifying games up to the last one that contributed to the primary score? Otherwise, there could be an incentive for one to stop early.


Suppose against a particular benchmark bot, two qualifying bots A and B have a record like this:

A: 10111
B: 101110

Assume that A and B have an identical record against all other bots and thus tie on qualifying score. aaaa is saying that, since the first tie break is the score minus the number of qualifying games played, bot B would get a lower first tie break score because it attempted one more game than bot A. This would provide an incentive to not continue playing after reaching a fairly good winning streak. I tend to agree that only the games up to the last game of the longest winning streak should be subtracted from the score in computing the first tie break. So the first tie break could probably be stated as: the score minus the total number of losses prior to the longest winning streak.
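The restated first tie break can be sketched as follows. This is not the qualifying program's code; it assumes results against one benchmark bot are stored as a string of "1" (win) and "0" (loss) like the A/B example, that a streak counts only up to the four-game cap discussed earlier in the thread, that the earliest streak is used when several have the same length, and that the qualifying score is the sum of these capped streaks. Color alternation is ignored here. The function names are hypothetical.

```python
from typing import List, Tuple

def streak_and_prior_losses(results: str, cap: int = 4) -> Tuple[int, int]:
    """Longest winning streak (capped) and the losses before its first occurrence."""
    best = 0          # longest streak found so far, capped at `cap`
    prior_losses = 0  # losses played before that streak began
    losses = 0        # running loss count
    run = 0           # current run of consecutive wins
    for r in results:
        if r == "1":
            run += 1
            # Strictly greater keeps the earliest streak of a given length.
            # No losses occur inside a run, so `losses` is the count before it.
            if min(run, cap) > best:
                best = min(run, cap)
                prior_losses = losses
        else:
            losses += 1
            run = 0
    return best, prior_losses

def first_tiebreak(records: List[str], cap: int = 4) -> int:
    """Score minus the total losses before each benchmark bot's longest streak."""
    pairs = [streak_and_prior_losses(r, cap) for r in records]
    score = sum(streak for streak, _ in pairs)
    losses = sum(lost for _, lost in pairs)
    return score - losses
```

Both A ("10111") and B ("101110") give a streak of 3 with 1 prior loss, so B's extra game no longer lowers its tie break.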

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Nov 5th, 2009, 9:57am

on 10/31/09 at 15:15:58, jdb wrote:
This still seems a little harsh to me.

What is gained if the bot plays an unrated game against one of the ladder bots? Clearly, the game doesn't count in the score, so what is the benefit of playing the unrated game? I can easily set up bot_clueless_jr and play against them anyway. It's most likely just a developer mistake setting things up. I don't understand the need to penalize for a mistake.


I guess the intent was that during the qualifying period practice games played as unrated games should not be allowed against the benchmark bots. If unrated games were allowed it might allow a qualifying bot to play unrated games to adjust to the benchmark bot before playing the rated games which count. But as jdb mentioned this could still be done using a second bot that is not registered. So I am for removing the restriction that unrated games can't be played against the benchmark bot during the qualifying phase unless someone can provide a strong case for keeping it.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Nov 5th, 2009, 1:08pm

on 10/28/09 at 17:04:46, aaaa wrote:
Otherwise, there could be an incentive for one to stop early.

If there is no incentive to stop early, I expect we will see most or all qualifying bots play all twenty games against all eight benchmark bots, except when they achieve a four-game winning streak.  If we are going to use up the server resources by having all the games played anyway, we might as well change the scoring to count all the games, so as to get better discrimination between bots.

The longer I mull over the "winning streak" scoring, the less I like it.  It throws away lots of information in the cause of encouraging last-minute development.  But what our scoring scheme will probably do is encourage developers to play more games instead of developing more, because there is greater payoff in the scoring for persistence than there is for tinkering to gain minor improvements.  Furthermore, with 160 games per developer, I expect contention for who gets to play the benchmark bots, particularly at the last minute.

I think we would get a better seeding by requiring exactly four games against each benchmark bot, i.e. 32 games per developer, and counting them all.  To get around jdb's concern of every developer waiting until the last minute to play the qualifying games, we could give a slight bonus for playing early, say 0.2% per day before the deadline.  I have no qualms having seeding favor the best bot as of January 1 rather than the best bot as of January 31 if that means we will not have to clog up the server with qualifying games.
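The early-play bonus is only described in words, so this sketch assumes one possible reading: the score is multiplied by 0.2% per whole day between completing qualifying and the deadline, with no bonus on or after the deadline. The function name and the example dates are illustrative, not part of any adopted rule.

```python
from datetime import date

EARLY_BONUS_PER_DAY = 0.002  # 0.2% per day, as proposed

def seeded_score(score: float, finished: date, deadline: date) -> float:
    """Qualifying score with a small multiplicative bonus for finishing early.

    Assumption: the bonus counts whole days between `finished` and `deadline`;
    games finished on or after the deadline get no bonus.
    """
    days_early = max((deadline - finished).days, 0)
    return score * (1 + EARLY_BONUS_PER_DAY * days_early)

print(seeded_score(28, date(2010, 1, 1), date(2010, 1, 31)))
```

Under this reading a bot scoring 28 that finishes 30 days early gets about 29.68, edging a bot that scores 29 on the deadline; whether that trade-off is desirable is exactly the policy question under discussion.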

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Nov 10th, 2009, 7:11pm
I updated the rules to incorporate the suggestions given by aaaa and jdb.


Quote:
Furthermore, with 160 games per developer, I expect contention for who gets to play the benchmark bots, particularly at the last minute.

I think the first four bots should be pretty easy for the qualifying bots.


Quote:
To get around jdb's concern of every developer waiting until the last minute to play the qualifying games, we could give a slight bonus for playing early, say 0.2% per day before the deadline.

I think this year we might be too late to make more changes to the rules, so let's hold off until next year on this.

Title: Re: Will the 2010 Computer Championship be open?
Post by aaaa on Nov 11th, 2009, 7:09am
The second tiebreaker also needs rewriting as the way it is phrased would imply that it could only be used to differentiate between bots if either has played all possible qualifying games, including possible excessive ones. In the same spirit behind the change of the first tiebreaker, I would therefore turn "earliest date of completion" into "earliest date of reaching one's final score" or words to that effect.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Nov 12th, 2009, 5:59pm
Thanks. I changed it to read:

The second tie-break is earliest date of completion, i.e. the bot whose last game which counts towards the score was played first.

Title: Re: Will the 2010 Computer Championship be open?
Post by aaaa on Nov 14th, 2009, 7:18am
If you want to leave absolutely no room for interpretation, replace "played" by "finished".

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Nov 24th, 2009, 1:13pm
Changed it; thanks.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Dec 19th, 2009, 8:00pm
Here is the bot qualifying program. Please check and let me know if you see any problems.

For now I've set only four bots as the qualifying bots; quad, clueless, marwin and OpFor.

Also, for testing I've set the time frame for the games to last two years.

A breakdown of all the games that were used in computing the score is listed, along with how the score and tie-breaks were computed.

The requirement for playing with alternating colors is observed as follows. The first game against a benchmark bot sets the color, and subsequent games must be played with alternating colors. A game with the wrong color breaks the streak even if it was won; if it was lost, it adds to the losses if it occurred prior to the longest streak.

http://arimaa.com/arimaa/wcc/2010/qual.htm
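A minimal Python sketch of the alternating-color rule as described in this post, reconstructed from the description rather than taken from the code behind qual.htm. The (color, result) encoding, the name `original_rule`, and keeping the earliest streak when two longest streaks tie are assumptions; the two sample logs are the ones tize quotes further down.

```python
# Hypothetical reconstruction of the first alternating-color rule, for illustration only.
# Each game is a (color, result) pair: color "g" or "s", result "W" or "L".

def original_rule(games):
    """Return (longest_streak, losses_before_it) against one benchmark bot."""
    expected = None          # color the next game must have; fixed by the first game
    best = run = 0
    best_start = run_start = 0
    for i, (color, result) in enumerate(games):
        on_color = expected is None or color == expected
        if on_color:
            # Only a correctly colored game advances the alternation.
            expected = "g" if color == "s" else "s"
        if on_color and result == "W":
            if run == 0:
                run_start = i
            run += 1
            if run > best:   # strict '>' keeps the earliest of equal streaks
                best, best_start = run, run_start
        else:
            run = 0          # a loss, or any wrong-color game, breaks the streak
    # Every loss (either color) before the longest streak counts;
    # with no streak at all, every loss counts.
    end = best_start if best else len(games)
    losses = sum(1 for _, result in games[:end] if result == "L")
    return best, losses


# OpFor vs Loc2007P1 and clueless vs Bomb2005Fast, as logged in this thread.
OPFOR = [("s", "L"), ("s", "L"), ("s", "W")]
CLUELESS = [("s", "L"), ("s", "W"), ("s", "L"), ("s", "W"), ("g", "W"),
            ("g", "W"), ("s", "W"), ("s", "W"), ("g", "W"), ("s", "W"),
            ("s", "L"), ("g", "W"), ("s", "W")]

print(original_rule(OPFOR))     # (0, 2)
print(original_rule(CLUELESS))  # (2, 2)
```

On those two logs the sketch reproduces the page's figures (streak 0 with 2 losses for OpFor, streak 2 with 2 losses for clueless), which is exactly the behavior the following posts question.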


Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Dec 19th, 2009, 8:49pm
A few small comments.

I expected a streak to include the first game of the streak, i.e. one win is a streak of one.

I think losses fits the tense of the page better than loses if it's not a pain to change.

Janzert

Title: Re: Will the 2010 Computer Championship be open?
Post by tize on Dec 20th, 2009, 5:59am

Code:
bot_OpFor
...
   bot_Loc2007P1
       s L 1220896327 82968
       s L 1220930920 82985
       s W 1220934422 82990
       streak is 0, loses is 2, total score is 0, total loses is 2, ...
...
bot_clueless
...
   bot_Bomb2005Fast
       s L 1203799256 71259
       s W 1233711016 96104
       s L 1233749824 96136
       s W 1233775674 96157
       g W 1233784044 96164
       g W 1233787675 96169
       s W 1233795579 96188
       s W 1233955791 96400
       g W 1234270647 96839
       s W 1234899619 97612
       s L 1235056316 97833
       g W 1235063963 97840
       s W 1235068572 97846
       streak is 2, loses is 2, total score is 2, total loses is 2, ...

OpFor should have a streak of one, and clueless should have a streak of 3. Other than that it looks good.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Dec 20th, 2009, 8:14am

on 12/19/09 at 20:49:23, Janzert wrote:
I expected a streak to include the first game of the streak, i.e. one win is a streak of one.

Yes, a single win counts as a streak of one; for example search the page for: 91061.


Quote:
I think losses fits the tense of the page better than loses if it's not a pain to change.

Thanks. Changed it.


Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Dec 20th, 2009, 8:36am

on 12/20/09 at 05:59:36, tize wrote:

Code:
bot_OpFor
...
   bot_Loc2007P1
       s L 1220896327 82968
       s L 1220930920 82985
       s W 1220934422 82990
       streak is 0, loses is 2, total score is 0, total loses is 2, ...
...
bot_clueless
...
   bot_Bomb2005Fast
       s L 1203799256 71259
       s W 1233711016 96104
       s L 1233749824 96136
       s W 1233775674 96157
       g W 1233784044 96164
       g W 1233787675 96169
       s W 1233795579 96188
       s W 1233955791 96400
       g W 1234270647 96839
       s W 1234899619 97612
       s L 1235056316 97833
       g W 1235063963 97840
       s W 1235068572 97846
       streak is 2, loses is 2, total score is 2, total loses is 2, ...

OpFor should have a streak of one, and clueless should have a streak of 3. Other than that it looks good.


For OpFor, game 82990 does not count because of the alternating color requirement. Game 82968 was played as silver, so the next game which can increase the streak must be played as gold.

For clueless, the streak gets broken several times because of the alternating color requirement. Game 71259 sets the initial color to silver, so the next game which can increase the streak must be played as gold. Game 96164 is the next game which increases the streak count, to 1, since it was played as gold. But the very next game, 96169, breaks the streak since it was not played as silver, even though the game was won. Because the streak gets broken several times, the longest streak is two games.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Dec 20th, 2009, 8:46am
Clueless has a winning streak of 5 against bot_arimaascorep1. It should only count as 4 points towards the total score.

Title: Re: Will the 2010 Computer Championship be open?
Post by Sconibulus on Dec 20th, 2009, 9:31am
Omar, as I read it, clueless should still have a streak of 3 due to this segment.

s W 1233955791 96400
g W 1234270647 96839
s W 1234899619 97612

I think the reason that it doesn't is that when a streak is broken due to colour-fault, it doesn't count that game as a start of a new streak. Basically it seems to count that win as a loss, rather than a streak-breaking win.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Dec 20th, 2009, 9:32am
It looks like the streak is not cut off at four.  Clueless now has credit for a 5-game win streak against ArimaaScoreP1.

Why doesn't game 82990 give OpFor a 1-game winning streak against Loc2007P1?  Similarly game 109862 should give marwin a one-game winning streak against OpFor2009Fast.  I think perhaps you are ignoring the next game of the same color, even though a loss as Silver followed by a win as Silver ought to start a streak.

Thanks for getting this up and running. The continually updated standings are going to make for a very engaging race!

[edit]
This post would have said something original, but Tuks challenged me to a game in the middle of writing it, so jdb and Sconibulus beat me to it. :P
[/edit]

Title: Re: Will the 2010 Computer Championship be open?
Post by aaaa on Dec 20th, 2009, 3:17pm
I don't see why games with the wrong color assignment shouldn't simply be ignored, just like unrated ones.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Dec 21st, 2009, 6:48pm

on 12/20/09 at 08:46:19, jdb wrote:
Clueless has a winning streak of 5 against bot_arimaascorep1. It should only count as 4 points towards the total score.


Oh yeah, I forgot about the maximum of 4 points for streaks longer than 4. It is fixed now. Thanks for spotting that, jdb.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Dec 21st, 2009, 7:07pm

on 12/20/09 at 09:31:42, Sconibulus wrote:
Omar, as I read it, clueless should still have a streak of 3 due to this segment.

s W 1233955791 96400
g W 1234270647 96839
s W 1234899619 97612

I think the reason that it doesn't is that when a streak is broken due to colour-fault, it doesn't count that game as a start of a new streak. Basically it seems to count that win as a loss, rather than a streak-breaking win.


I added a star now in front of games that count so we can more easily see which games counted and which didn't due to the alternating color requirement.


Code:
   bot_Bomb2005Fast
     * s L 1203799256 71259
       s W 1233711016 96104
       s L 1233749824 96136
       s W 1233775674 96157
     * g W 1233784044 96164
       g W 1233787675 96169
     * s W 1233795579 96188
       s W 1233955791 96400
     * g W 1234270647 96839
     * s W 1234899619 97612
       s L 1235056316 97833
     * g W 1235063963 97840
     * s W 1235068572 97846


From this you can see that game 96400 didn't count because after 96188 which was played as silver we are expecting the next game to be played as gold, but 96400 was played as silver.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Dec 21st, 2009, 7:20pm

on 12/20/09 at 09:32:18, Fritzlein wrote:
Why doesn't game 82990 give OpFor a 1-game winning streak against Loc2007P1?  Similarly game 109862 should give marwin a one-game winning streak against OpFor2009Fast.  I think perhaps you are ignoring the next game of the same color, even though a loss as Silver followed by a win as Silver ought to start a streak.


But after losing as Silver, isn't the qualifying bot expected to play the next game as Gold? Or is there a rule that breaking a streak also breaks the alternating color requirement, such that the first win of a streak sets the initial color?

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Dec 21st, 2009, 7:28pm

on 12/20/09 at 15:17:34, aaaa wrote:
I don't see why games with the wrong color assignment shouldn't simply be ignored, just like unrated ones.


That is, they don't break a streak, nor do they count as pre-streak losses. This could easily be done, but would it cause any problems or create loopholes? Offhand I don't see any problems, since we are allowing unrated games to be ignored. Does anyone have an objection to ignoring rated games played with the wrong color?

Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Dec 21st, 2009, 7:35pm

on 12/21/09 at 19:20:09, omar wrote:
...the first win of a streak sets the initial color.


This is the way I expected it to work.

aaaa wrote:

Quote:
I don't see why games with the wrong color assignment shouldn't simply be ignored, just like unrated ones.


+1

Janzert

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Dec 21st, 2009, 9:09pm
Janzert, I think you just agreed with two contradictory things.  If games of the wrong color are ignored, then some wins will be ignored, including potentially the first (color-setting) win of a streak.

Ignoring games of the wrong color is simplest, but to me it is unintuitive that a losing streak is not broken by a win of the wrong color and a winning streak is not broken by a loss of the wrong color.  If we decide for the simpler (unintuitive) way, then the explanation in the rules had better be very, very clear.

Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Dec 21st, 2009, 10:45pm
Hmm, I must be missing something. I admit I wasn't paying full attention while the qualifying method was being discussed and developed. After reading the rules a few days ago here in my own words is the mental model I had for how it worked.

With the start of the qualifying period, for each benchmark bot a qualifying bot is in one of two states: in a winning streak, or out of a winning streak. Each bot pair of course starts in the latter state. The state is updated once after each rated game between a pair.

While in the "out of winning streak" state, the bot stays in that state until it wins a game, at which point it enters the "in winning streak" state with a streak length of 1.

While in the "in winning streak" state: if the bot plays a game against the opposing bot with the same color as the previous game, or the bot loses a game, the state is reset to "out of winning streak" and the current streak length becomes the permanent length for this streak; otherwise, if the bot wins a game, the current streak length is incremented.

In agreeing with aaaa I meant to change the "in winning streak" state to "if the bot has a game against the opposing bot with the same color as the previous game the game is ignored, else if the bot loses...". I never expected losses prior to the first win of a streak to count for anything.

Hopefully that makes it a little clearer on what my mental model of the process was/is.

Janzert
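Janzert's model reads naturally as a two-state machine per bot pair. The Python sketch below is an illustrative rendering of it; the function name, the (color, result) encoding, and the `ignore_wrong_color` switch standing in for aaaa's amendment are assumptions, not part of the rules page.

```python
# Sketch of the per-pair state machine described above; names are illustrative.
# Each game is a (color, result) pair: color "g" or "s", result "W" or "L".

def longest_streak(games, ignore_wrong_color=False):
    best = 0
    length = 0          # 0 means "out of winning streak"
    last_color = None   # color of the last game counted in the current streak
    for color, result in games:
        if length == 0:
            # Out of a streak: any win starts one, whatever its color.
            if result == "W":
                length, last_color = 1, color
        elif color == last_color:
            # Same color as the previous counted game.
            if not ignore_wrong_color:
                length = 0      # original model: the streak ends here
            # with ignore_wrong_color=True (aaaa's amendment) the game is skipped
        elif result == "L":
            length = 0          # a correctly colored loss ends the streak
        else:
            length, last_color = length + 1, color
        best = max(best, length)
    return best


OPFOR = [("s", "L"), ("s", "L"), ("s", "W")]
CLUELESS = [("s", "L"), ("s", "W"), ("s", "L"), ("s", "W"), ("g", "W"),
            ("g", "W"), ("s", "W"), ("s", "W"), ("g", "W"), ("s", "W"),
            ("s", "L"), ("g", "W"), ("s", "W")]

print(longest_streak(CLUELESS))                           # 2
print(longest_streak(CLUELESS, ignore_wrong_color=True))  # 7
print(longest_streak(OPFOR))                              # 1
```

On the clueless log the original reading gives 2 and the amended one gives 7 (the same count as jdb's Rule 1 below). On the OpFor log it gives 1, as Fritzlein expected, rather than the page's 0, because a win out of a streak starts a new streak regardless of color.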

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Dec 22nd, 2009, 5:38am
Janzert, what you imagined is close to what I imagined, but apparently far from what omar and aaaa thought.  (And I'm not positive they agreed exactly with each other...)  I thought that if you are in a winning streak a loss stops it (regardless of color) and if you are in a losing streak a win stops it (regardless of color).  I probably said that games with the wrong color don't count, but what I meant was that within a winning streak a win with the wrong color is neutral (neither ending nor extending the streak), while losses with the wrong color always count, either to break a winning streak or to lower your tiebreak score in a losing streak.

All of this is not to argue for doing it my way. If bots faithfully alternate colors, all of our ways of counting are equivalent, so it barely matters what the penalty for violating the alternation is. What does matter is being super clear about whatever we decide on, so as to avoid misunderstandings. We thought we were pretty much in agreement until the implementation exposed sharp differences in our intuitions about how things should work. Two other things to be absolutely clear about: do games with the wrong color count toward the limit of 20 against each bot? And do unrated games count toward the limit? Of course, if we are ignoring unrated games, it is essential that games which begin as rated can't become unrated after a timeout: unrating must apply only to human games.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Dec 22nd, 2009, 7:43am

Code:
...
bot_clueless
...
   bot_Bomb2005Fast
A        s L 1203799256 71259
B        s W 1233711016 96104
C        s L 1233749824 96136
D        s W 1233775674 96157
E        g W 1233784044 96164
F        g W 1233787675 96169
G        s W 1233795579 96188
H        s W 1233955791 96400
I        g W 1234270647 96839
J        s W 1234899619 97612
K        s L 1235056316 97833
L        g W 1235063963 97840
M        s W 1235068572 97846
       streak is 2, loses is 2, total score is 2, total loses is 2, ...


I had no idea how much of a difference interpretation could make on the length of the streak!

For the above example,

Rule 1) The streak can start on any colour, at any time, and non alternating games are ALWAYS ignored.

Best Streak: BEGIJLM
Length: 7

Rule 2) The streak can start on any colour, at any time, and non alternating games ALWAYS break the streak.

Best Streak: HIJ
Length: 3

Rule 3) The streak can start on any colour, at any time, and non alternating LOSSES break the streak, non alternating wins are ignored

Best Streak: DEGIJ
Length: 5

Rule 4) The first game played (ie win or lose) sets the colour, each non alternating game after that is ignored. The streak can start on either colour.

Counting Games: AEGIJLM
Best Streak: EGIJLM
Length: 6


I think rule 4 would work well. It forces the 20 games to be split evenly by colour. There is no penalty for accidentally playing a non-alternating game. It is crystal clear which 20 games count, so there are no loopholes.
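Rule 4 is simple enough to state in a few lines. This Python sketch applies it, together with the 20-game limit and the 4-point cap discussed earlier in the thread, to the labeled log above. The function name and the (color, result) encoding are assumptions, and this is not the qualifying page's actual code.

```python
# Sketch of jdb's Rule 4 with the 20-game limit and 4-point cap; names are illustrative.

def rule4(games, limit=20, cap=4):
    """games: list of (color, result) pairs in the order they were played.
    Returns (indices of counting games, longest winning streak, points)."""
    counted, expected = [], None
    for i, (color, _) in enumerate(games):
        if len(counted) == limit:
            break                      # only the first 20 counting games matter
        if expected is None or color == expected:
            counted.append(i)          # the first game sets the color; then alternate
            expected = "g" if color == "s" else "s"
        # any other game has the wrong color and is ignored entirely
    best = run = 0
    for i in counted:
        run = run + 1 if games[i][1] == "W" else 0
        best = max(best, run)
    return counted, best, min(best, cap)


LABELS = "ABCDEFGHIJKLM"
CLUELESS = [("s", "L"), ("s", "W"), ("s", "L"), ("s", "W"), ("g", "W"),
            ("g", "W"), ("s", "W"), ("s", "W"), ("g", "W"), ("s", "W"),
            ("s", "L"), ("g", "W"), ("s", "W")]

counted, streak, points = rule4(CLUELESS)
print("".join(LABELS[i] for i in counted), streak, points)  # AEGIJLM 6 4
```

The output matches jdb's counting games (AEGIJLM) and best streak (6); the streak then earns the capped 4 points.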

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Dec 22nd, 2009, 3:29pm
I also like rule 4 because it enforces an equal number of games with both colors. If a non-alternating game occurs, it would be due to a mistake in setting up the game by the bot developer, so it makes sense to not have those games penalize the bot in any way. I don't see any way in which the bot developer could intentionally use such games to gain an advantage.

I changed it now to completely ignore games that occurred with a non-alternating color. Such games do not break the streak, increase the pre-streak losses, or count against the 20-game limit.

http://arimaa.com/arimaa/wcc/2010/qual.htm

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 3rd, 2010, 11:20am
I'm glad that the stable of benchmark bots is not so weak that there will be no discrimination between the top qualifying bots: marwin has already lost a game to Bomb2005Fast.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 7th, 2010, 9:01am
Earlier I saw that bot_bomb had registered, but later the name changed to bot_Bomb2005CC.  I take this to mean that Omar is putting bomb through qualifying on Fotland's behalf.

We discussed earlier in this thread whether bomb should be given an automatic entry or not.  Now I see both that "automatic entry" is ambiguous and that Omar never gave his opinion on that point.

In some sense bomb is getting an automatic entry, because Omar is doing work on Fotland's behalf that other developers don't have to do. I don't like the possibility that there is an actively-developed bot getting squeezed out. I would prefer the eighth spot in the tournament to go to Rat or akimot because their respective developers have put in time in the past year, even if bomb is stronger than both Rat and akimot. I truly hope that neither developer was deterred from entering by the thought that the field was too tough. I hope that neither was going to take up the eighth space in bomb's absence.

On the other hand, if there truly were only going to be seven entrants without bomb, then I don't mind the preferential treatment.  It will make the tournament smoother and more interesting to have a full complement of eight bots.  Furthermore, bomb still has to meet the same qualifying requirements as other bots, and therefore should be seeded appropriately.

Title: Re: Will the 2010 Computer Championship be open?
Post by BlackKnight on Jan 8th, 2010, 5:44pm

on 01/07/10 at 09:01:22, Fritzlein wrote:
I would prefer the eighth spot in the tournament to go to Rat or akimot because their respective developers have put in time in the past year, even if bomb is stronger than both Rat and akimot.  I truly hope that neither developer was deterred from entering by the thought that the field was too tough.  I hope that neither was going to take up the eighth space in bomb's absence.

Originally, I thought I could enter Rat again, but I was too busy at the end of last term to implement the changes in Rat that are necessary to actually improve the bot. Furthermore, I have an even busier schedule this term, but I will introduce Arimaa to two of my classes as we go along.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 8th, 2010, 6:10pm

on 01/08/10 at 17:44:17, BlackKnight wrote:
...but I will introduce Arimaa to two of my classes as we go along.

Awesome!

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Jan 9th, 2010, 11:37pm

on 01/07/10 at 09:01:22, Fritzlein wrote:
In some sense bomb is getting an automatic entry, because Omar is doing work on Fotland's behalf that other developers don't have to do. I don't like the possibility that there is an actively-developed bot getting squeezed out. I would prefer the eighth spot in the tournament to go to Rat or akimot because their respective developers have put in time in the past year, even if bomb is stronger than both Rat and akimot. I truly hope that neither developer was deterred from entering by the thought that the field was too tough. I hope that neither was going to take up the eighth space in bomb's absence.


It was only recently that Fotland mentioned to me that he didn't get a chance to work on Bomb. If there had been more than 8 bots, I would not have run the qualifying games for Bomb and it would most likely have been excluded. Initially I was not planning to run the qualifying games for Bomb, and was going to let it be seeded in the last position. But then I thought it's not that much work for me to run those games, and it would not mess up the seeding.


Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Jan 9th, 2010, 11:48pm

on 01/08/10 at 17:44:17, BlackKnight wrote:
I have an even busier schedule this term, but I will introduce Arimaa to two of my classes as we go along.


I've been contemplating sending out a flyer about Arimaa and the Arimaa challenge to all the professors who teach an Intro to AI class. It would be a lot of work though trying to compile such a list.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 10th, 2010, 8:53am

on 01/09/10 at 23:37:27, omar wrote:
It was only recently that Fotland mentioned to me that he didn't get a chance to work on Bomb. If there had been more than 8 bots, I would not have run the qualifying games for Bomb and it would most likely have been excluded. Initially I was not planning to run the qualifying games for Bomb, and was going to let it be seeded in the last position. But then I thought it's not that much work for me to run those games, and it would not mess up the seeding.

Given that there were only seven other participants, it is good to have even an unaltered Bomb taking part. Given that Bomb is taking part, it will be good to have it seeded correctly.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 10th, 2010, 8:58am
Now that marwin has reached the maximum score of 32, I see more clearly that the second tiebreaker of completion time is actually important. Marwin has only two losses, so marwin will have the top seed unless some other bot gets a four-game winning streak against all the benchmark bots with only one loss total. By going first, tize has secured a large advantage, because anyone finishing later needs to have at most half as many losses. Maybe the qualifying procedures this year aren't going to work out so badly after all.
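The seeding order described here amounts to a three-part sort key: higher score first, then fewer pre-streak losses (equivalently, a higher zero-minus-losses tiebreak), then the earlier finishing time. The Python sketch below uses hypothetical bots and timestamps, purely for illustration; the `Standing` type and `seeding` function are not from the qualifying page.

```python
from typing import NamedTuple


class Standing(NamedTuple):
    name: str
    score: int      # sum of capped streak points, at most 4 x 8 = 32
    losses: int     # pre-streak losses; the first tie-break is zero minus this
    finished: int   # Unix time the last game counting toward the score finished


def seeding(standings):
    # Higher score first, then fewer losses, then earlier completion.
    return sorted(standings, key=lambda s: (-s.score, s.losses, s.finished))


# Hypothetical standings, for illustration only.
field = [
    Standing("bot_early", 32, 2, 1263000000),
    Standing("bot_late_same", 32, 2, 1264000000),
    Standing("bot_late_fewer", 32, 1, 1264500000),
    Standing("bot_partial", 16, 0, 1262500000),
]
print([s.name for s in seeding(field)])
# ['bot_late_fewer', 'bot_early', 'bot_late_same', 'bot_partial']
```

A later bot that merely matches the leader's 32 points and 2 losses still sorts behind it; only one with at most 1 loss moves ahead.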

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 13th, 2010, 9:55am
The initial spate of qualifying games has tapered off, so that now I think we are actually behind schedule, i.e. 40% of the month is gone but less than 40% of the qualifying games have been played.  In terms of server load we are even further behind because the fast and blitz bots create more load per game than the P1 and P2 bots.  I hope that the last-minute rush of CC qualifying games doesn't bog down the server.

Title: Re: Will the 2010 Computer Championship be open?
Post by tize on Jan 15th, 2010, 2:23am

Quote:
bot_pragmatictheory
...
   bot_Bomb2005Fast
     * s L 1263437897 131721
     * g L 1263441124 131722
       streak is 0, losses is 2, total score is 16, total losses is 4, finished is 1263007331
   bot_OpFor2009Fast
     * s L 1263515921 131816
     * g L 1263518395 131821
       streak is 0, losses is 2, total score is 16, total losses is 6, finished is 1263007331


Quote:
For bots which tie on qualifying score, the first tie-breaker is zero minus the losses prior to longest winning streak with each bot. Losses after the longest winning streak are ignored.

These four losses shouldn't count in the tie-breaker score as they are all played after the winning streak of 0.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Jan 15th, 2010, 12:19pm
Or it could be considered losses prior to the winning streak of 0.

Unfortunately we didn't discuss this detail prior to the start of the qualifying.

Might as well discuss it now and see if it is critical enough to warrant changing it.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 15th, 2010, 1:29pm

on 10/28/09 at 17:04:46, aaaa wrote:
Wouldn't it be better if the first tiebreaker only applied to all the qualifying games up to the last one that contributed to the primary score? Otherwise, there could be an incentive for one to stop early.



on 01/15/10 at 12:19:44, omar wrote:
Or it could be considered losses prior to the winning streak of 0.

Unfortunately we didn't discuss this detail prior to the start of the qualifying.

The motivation behind the current score/tiebreak method that aaaa proposed was explicitly to make it risk-free to play more games.  In that sense we did discuss it.  The decision was implicitly that your tiebreak never gets worse unless your score has gotten better, because if you can lower your tiebreak without raising your score, you have an incentive to stop playing qualifying games.

I disagreed with the idea of making it risk-free to play more games, but I lost the argument, so it's OK.  Now that we have made it risk-free to keep trying in all other cases, it just seems wrong for the first games against a bot to be risky.  It makes no sense if a record of

WLLLLLLWLLLLLLLLLLLL

against a benchmark gets no tiebreak penalty, while a record of

LLLLLLLL

is minus eight in the tiebreaker.  Maybe in a logical-coding-procedure sense it could be consistent, but in terms of rewarding/punishing developer behavior it would be wildly inconsistent.  Also it would not be true to the literal wording of aaaa's proposal that was supposedly accepted, because it would be counting games in the tiebreak after the last game that contributed to the bot's primary score.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Jan 15th, 2010, 2:18pm

on 01/15/10 at 02:23:39, tize wrote:
These four losses shouldn't count in the tie-breaker score as they are all played after the winning streak of 0.


I agree.

Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Jan 15th, 2010, 9:38pm

on 01/15/10 at 02:23:39, tize wrote:
These four losses shouldn't count in the tie-breaker score as they are all played after the winning streak of 0.


+1

Title: Re: Will the 2010 Computer Championship be open?
Post by 99of9 on Jan 19th, 2010, 6:06pm

on 01/13/10 at 09:55:30, Fritzlein wrote:
The initial spate of qualifying games has tapered off, so that now I think we are actually behind schedule, i.e. 40% of the month is gone but less than 40% of the qualifying games have been played.  In terms of server load we are even further behind because the fast and blitz bots create more load per game than the P1 and P2 bots.  I hope that the last-minute rush of CC qualifying games doesn't bog down the server.

Sorry, I've contributed a big fraction to this by not being organized.  I've finally updated gnobot's opening book, so am ready to start my charge for the year!  (Sorry, no major improvements this year.)

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 19th, 2010, 7:42pm
Hey, I'm glad that you are at least going to try to get GnoBot seeded properly rather than accepting the #8 seed.  It's better to have the last-minute rush of games now instead of... well, instead of at the last minute. :)

I wonder how much the opening book will raise GnoBot's seed. If the book is too effective, it may be an argument for next year's benchmark bots to be blitz bots only, with no P1 or P2 bots, so that opening books will at least have to cope with minor clock fluctuations from the benchmark bots.

Title: Re: Will the 2010 Computer Championship be open?
Post by 99of9 on Jan 20th, 2010, 1:43am

on 01/19/10 at 19:42:28, Fritzlein wrote:
I wonder how much the opening book will raise GnoBot's seed. If the book is too effective, it may be an argument for next year's benchmark bots to be blitz bots only, with no P1 or P2 bots, so that opening books will at least have to cope with minor clock fluctuations from the benchmark bots.

Interestingly both ArimaaScoreP1 and Aamira2006P2 went out of book very early because of some algorithmic randomness.  I checked and it was definitely them that deviated.

In the end these bots don't affect anything much because they are all a bit too weak.  So I'd argue for cutting them for that reason alone.  I'm not sure tailoring the rules or selection bots to suit or not suit participant styles is a good idea.  Are book wins less legitimate?  They force other bots in the tournament to be somewhat adaptive/randomized, which in itself removes another system humans could use to maintain their dominance.

I think the qualifying method will spread our bots well.  Gnobot is likely to clock up quite a lot of losses if it ever gets to 32 streaks.  Marwin's result is amazing.  I think everyone is sitting at 16 in trepidation because it seems so impossible!

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 20th, 2010, 8:25am

on 01/20/10 at 01:43:00, 99of9 wrote:
In the end these bots don't affect anything much because they are all a bit too weak.  So I'd argue for cutting them for that reason alone.

Agreed.  And luckily we will have some new bots to replace them with, e.g. bot_Marwin2010Blitz.


Quote:
I'm not sure tailoring the rules or selection bots to suit or not suit participant styles is a good idea.  Are book wins less legitimate?

Yes, I think book wins are less legitimate for this purpose because they are a less robust indicator of tournament performance.  If someone comes up with a way to ace the qualifying, we want that to also be a way to ace the tournament.


Quote:
They force other bots in the tournament to be somewhat adaptive/randomized, which in itself removes another system humans could use to maintain their dominance.

I think book wins within the computer tournament are entirely legitimate, and I'm very happy for GnoBot's book win over Bomb in 2009 for exactly the reasons you mention.  Because of GnoBot, every developer should implement some kind of defense against losing the same way twice.  But whatever defense is implemented by developers in 2010 can't be backported to old benchmark bots.

Furthermore, if everyone is like JDB in letting parallelism be the main line of defense, but Omar turns off parallel search for game room bots (as I think he must to conserve server resources), then opening books could be extremely effective against benchmark bots while being wholly ineffective in the tournament itself.  That kind of disconnect would be a flaw in the qualifying procedure.


Quote:
I think the qualifying method will spread our bots well.  Gnobot is likely to clock up quite a lot of losses if it ever gets to 32 streaks.  Marwin's result is amazing.  I think everyone is sitting at 16 in trepidation because it seems so impossible!

I'm pleased the system is working so far, but it hasn't really been stressed yet: many bots are sitting at 16, and marwin didn't have many losses. I'm imagining many bots coming up against a benchmark they can't beat consistently, and playing all 20 games to try to stretch a one-game winning streak into a two-game winning streak. And we have only eleven days left for that to happen. Yikes! This could still be a train wreck in the last few days of the month.

Title: Re: Will the 2010 Computer Championship be open?
Post by 99of9 on Jan 20th, 2010, 2:12pm
I'll reply properly later, but just a note that Gnobot2009Blitz timeouts could cause even more issues than with marwin... it is happening against gnobot too.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Jan 20th, 2010, 2:40pm

on 01/15/10 at 02:23:39, tize wrote:
These four losses shouldn't count in the tie-breaker score as they are all played after the winning streak of 0.


OK, I changed it so that losses do not count unless there was a non-zero winning streak.
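A sketch of the first tie-break before and after this change, in Python. It assumes the record already contains only counting games (correct color, within the 20-game limit) and that, when two winning streaks tie for longest, the earliest one is used; the function and parameter names are hypothetical. The two records are Fritzlein's examples from earlier in the thread.

```python
def tiebreak(record, count_losses_without_streak=False):
    """record: 'W'/'L' string of the counting games against one benchmark bot.
    Returns zero minus the losses before the (earliest) longest winning streak."""
    best = run = best_start = run_start = 0
    for pos, result in enumerate(record):
        if result == "W":
            if run == 0:
                run_start = pos
            run += 1
            if run > best:
                best, best_start = run, run_start
        else:
            run = 0
    if best == 0:
        # Before the change every loss counted here; after it, none do.
        return -record.count("L") if count_losses_without_streak else 0
    return -record[:best_start].count("L")


for rec in ("WLLLLLLWLLLLLLLLLLLL", "LLLLLLLL"):
    # old handling, then new handling
    print(rec, tiebreak(rec, True), tiebreak(rec))
```

Under the old handling the all-loss record scores -8 while the mostly-loss record with an early win scores 0; after the change both score 0, which removes the inconsistency Fritzlein pointed out.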

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Jan 20th, 2010, 2:52pm

on 01/20/10 at 14:12:03, 99of9 wrote:
I'll reply properly later, but just a note that Gnobot2009Blitz timeouts could cause even more issues than with marwin... it is happening against gnobot too.


The system load goes up quite a bit when any version of GnoBot2009 runs. I think this may be causing the timeouts. I have each version capped at one instance, but different versions could get started at the same time.

Maybe we should not count games where a benchmark bot loses on time.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 20th, 2010, 3:18pm

on 01/20/10 at 14:52:47, omar wrote:
The system load goes up quite a bit when any version of GnoBot2009 runs. I think this may be causing the timeouts. I have each version capped at one instance, but different versions could get started at the same time.

Maybe we should not count games where a benchmark bot loses on time.

Maybe GnoBot2009 should not be a benchmark bot, and indeed should not be available for play by anyone.  Probably the load is caused by GnoBot2009 starting multiple threads, and it is unacceptable for any single bot instance to use up more than one CPU.  For the future you should ask developers to include a single-threaded mode, and if no single-threaded mode is included, not let the resource-hogging bot run on the main server, period.

In terms of having a fair qualifying procedure, it is a poor stopgap to discard games that end on time.  It is unfair for the server to be under load that might make a bot perform worse in one game than in another.  If GnoBot uses all four CPUs when it is the only bot running, but uses only one, two, or three when other bots are running, then its performance could vary drastically from game to game.

The best way to make qualifying as fair as possible is to prevent the server from ever being under high load.  Admittedly, it does some violence to the rules to drop from eight benchmark bots down to seven, but that seems less unfair than counting some games and not others, and much less unfair than having such a high load on the system that we can't rely on relatively consistent performance from the benchmark bots.

Ron, if you are reading this, what do you think of removing GnoBot2009 from the pool of benchmark bots?

Title: Re: Will the 2010 Computer Championship be open?
Post by 99of9 on Jan 20th, 2010, 3:58pm

on 01/20/10 at 08:25:06, Fritzlein wrote:
Yes, I think book wins are less legitimate for this purpose because they are a less robust indicator of tournament performance.  If someone comes up with a way to ace the qualifying, we want that to also be a way to ace the tournament.

I agree that this is the right comparison to make.  But it works in reverse too.  If a method works in the tournament, it should work to the same extent in qualifying.  Since the only data we have about what works in a tournament is what happened last year, I'd say that the book did "work" to some degree.


on 01/20/10 at 08:25:06, Fritzlein wrote:
I think book wins within the computer tournament are entirely legitimate, and I'm very happy for GnoBot's book win over Bomb in 2009 for exactly the reasons you mention.  Because of GnoBot, every developer should implement some kind of defense against losing the same way twice.  But whatever defense is implemented by developers in 2010 can't be backported to old benchmark bots.

You are right about what should happen, but we know that this will not be the case for all bots (since bomb is re-entered), and don't yet know how well the other bots will respond to the challenge.  Since we can't know until after each CC tourney, I suggest that next year's qualifier bots should roughly reflect this year's entrants.


on 01/20/10 at 08:25:06, Fritzlein wrote:
Furthermore, if everyone is like JDB in letting parallelism be the main line of defense, but Omar turns off parallel search for game room bots (as I think he must to conserve server resources), then opening books could be extremely effective against benchmark bots while being wholly ineffective in the tournament itself.  That kind of disconnect would be a flaw in the qualifying procedure.

Has Omar turned off parallelism?

On the other hand, server load fluctuations are much bigger in the benchmarking than they are in the real tournament (as we've seen with Gnobot2009Blitz... it never timed out in the real tourney, so it can't be blamed entirely).  This is making the book less effective than it would be if this were last year's tournament.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 20th, 2010, 6:58pm

on 01/20/10 at 15:58:29, 99of9 wrote:
Since we can't know until after each CC tourney, I suggest that next year's qualifier bots should roughly reflect this year's entrants.

That sounds eminently reasonable.  As long as the benchmark bots approximate the previous year's tournament, I'm happy.  My biggest concern in that regard is the presence or absence of parallelism.


Quote:
Has Omar turned off parallelism?

I thought Omar had set the 2009 bots to run in single-threaded mode on the server, but then he posted about GnoBot2009 causing high load, which convinces me that he didn't disable parallelism.

My suggestion is that bots should never be run in multi-threaded mode on the main server, because even one such bot will interfere with the performance of all other bots that happen to be running.  And we should be especially sure that there are no multi-threaded bots in the pool of benchmark bots, because the performance of such bots will vary wildly.


Quote:
server load fluctuations are much bigger in the benchmarking than they are in the real tournament

The performance of single-threaded bots will definitely vary more on the main server than it varies on the dedicated tournament server, but as long as each single-threaded bot has a free CPU it can grab (i.e. as long as four or fewer such bots are running simultaneously) their performance should be reasonably stable.  The variation will at least be less than for a multi-threaded bot which always tries to use all the available CPUs.
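A rough model shows why one multi-threaded bot hurts everyone and why its own strength swings so much. It is a simplifying assumption, not a description of the real server: a 4-CPU machine whose scheduler splits CPU time evenly among always-runnable threads, with no thread using more than one CPU. Disk I/O and memory bandwidth, which Omar mentions later, are ignored.

```python
from typing import List


def cpu_shares(thread_counts: List[int], cores: int = 4) -> List[float]:
    """CPUs available to each bot under an idealised fair scheduler.

    Each entry of `thread_counts` is one running bot's thread count.
    Assumes every thread is always runnable and the machine is divided
    evenly among threads; a single thread can use at most one CPU.
    """
    total_threads = sum(thread_counts)
    per_thread = 1.0 if total_threads <= cores else cores / total_threads
    return [n * per_thread for n in thread_counts]
```

Under this model, `cpu_shares([1, 1, 1, 1])` gives each single-threaded bot a full CPU. A 4-thread bot alone gets 4.0 CPUs, but `cpu_shares([4, 1, 1])` cuts it to about 2.67 while each single-threaded bot drops to about 0.67. That is the game-to-game variation and the interference described above.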

Title: Re: Will the 2010 Computer Championship be open?
Post by 99of9 on Jan 20th, 2010, 7:19pm
That puts us in a conundrum.  We want qualifier bots to reflect the previous year's entrants (which are now on the whole parallel).  But we don't want them to ever run in parallel?

I understand where you're coming from, so I'm not sure what the best thing to do is.

I suppose since we're only really aiming to select the top 8 entrants, we don't really need the qualifier bots to be operating at their top standard, just a consistent standard.  So in that case your solution is right.  Even with a 4:1 processor advantage, the 9th place bot is unlikely to get a streak of 4 against the previous year's winner.

Perhaps the number of test games should be cut from 20 to 10 per bot?  That would make it even more challenging to get a string of 4 (and would help server load in the dying days).
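A small dynamic program over the current run length can put rough numbers on both points: how unlikely a weaker bot's streak of 4 is, and how much cutting 20 games to 10 helps. This sketch assumes independent games, each won with the same probability `p` against a given benchmark bot. It ignores opening books, learning, and server load, so it is only a rough guide.

```python
def prob_streak(n_games: int, k: int, p: float) -> float:
    """Probability of at least one run of k consecutive wins in n_games.

    Assumes independent games, each won with the same probability p.
    """
    # state[j] = probability of no k-run so far and a current run of j wins
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0  # probability that a k-run has already occurred
    for _ in range(n_games):
        nxt = [0.0] * k
        for j, pr in enumerate(state):
            if pr == 0.0:
                continue
            nxt[0] += pr * (1 - p)  # a loss resets the run
            if j + 1 == k:
                hit += pr * p  # this win completes the streak
            else:
                nxt[j + 1] += pr * p
        state = nxt
    return hit
```

For instance, `prob_streak(4, 4, 0.5)` is 1/16 and `prob_streak(5, 4, 0.5)` is 3/32. Comparing `prob_streak(20, 4, p)` with `prob_streak(10, 4, p)` for a modest `p` shows how much of a weak bot's chance of a lucky 4-streak comes from simply having 20 attempts.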

Title: Re: Will the 2010 Computer Championship be open?
Post by 99of9 on Jan 20th, 2010, 7:59pm
Oh dear, I just had a timeout against Opfor2009Fast.  The only other bots playing at the same time were Opfor2008Fast and Clueless2007Fast (all 3 are single-core as far as I remember).

[EDIT: And another, in fact the humans playing against bots at the moment are having the same problems.]

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 21st, 2010, 4:49am

on 01/20/10 at 19:59:17, 99of9 wrote:
Oh dear, I just had a timeout against Opfor2009Fast.  The only other bots playing at the same time were Opfor2008Fast and Clueless2007Fast (all 3 are single-core as far as I remember).

[EDIT: And another, in fact the humans playing against bots at the moment are having the same problems.]

Ah, so I guess it isn't just GnoBot2009 causing server issues.  As usual, the problem is less straightforward than my instant analysis.  :P

Title: Re: Will the 2010 Computer Championship be open?
Post by RonWeasley on Jan 21st, 2010, 6:43am

on 01/20/10 at 15:18:17, Fritzlein wrote:
Ron, if you are reading this, what do you think of removing GnoBot2009 from the pool of benchmark bots?

I'll try to follow this.  Right now it seems we don't understand what's happening, so I'm not making any rulings.

Title: Re: Will the 2010 Computer Championship be open?
Post by Janzert on Jan 21st, 2010, 7:27am
Given that the first 3-4 bots are providing essentially no differentiation, I'd rather not reduce the field of competitive opponents even further and would prefer to have gnobot2009blitz stay in at this point.

Janzert

Title: Re: Will the 2010 Computer Championship be open?
Post by aaaa on Jan 21st, 2010, 8:54am
GnoBot2009Fast, for one, is well known to me as a bot with quite the timeout problem.

Title: Re: Will the 2010 Computer Championship be open?
Post by 99of9 on Jan 21st, 2010, 2:02pm

on 01/21/10 at 08:54:43, aaaa wrote:
GnoBot2009Fast, for one, is well known to me as a bot with quite the timeout problem.

It is certainly susceptible if it doesn't have the whole server to itself.  It uses every available resource.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Jan 21st, 2010, 2:12pm

on 01/21/10 at 07:27:41, Janzert wrote:
Given that the first 3-4 bots are providing essentially no differentiation, I'd rather not reduce the field of competitive opponents even further and would prefer to have gnobot2009blitz stay in at this point.

Janzert


Bot 4 provided some differentiation today.  >:(

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 21st, 2010, 2:46pm

on 01/21/10 at 07:27:41, Janzert wrote:
Given that the first 3-4 bots are providing essentially no differentiation, I'd rather not reduce the field of competitive opponents even further and would prefer to have gnobot2009blitz stay in at this point.

That's a good point.  Except for one loss by clueless, everyone swept the first four bots.  So dropping GnoBot2009 would leave the burden on just three bots, namely Bomb2005Fast, OpFor2009Fast, and Clueless2009Blitz.  But those three seem to be doing a good job separating anyone who plays them so far.  Marwin nearly swept all three, pragmatic_theory is having trouble beating any of them, and GnoBot is somewhere in between.

The problem with including GnoBot2009Blitz as a benchmark bot is not just that it is disruptive to the server and anyone else trying to play a bot.  The problem is also that GnoBot2009Blitz may be adding more noise than signal to the qualifying scores due to performing worse (and possibly even timing out) when it isn't the only bot running.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Jan 22nd, 2010, 8:16am
I checked the options for GnoBot 2009 and I didn't see anything for limiting the number of CPUs that it uses. Also it does a lot of disk IO and that also slows down other processes. If the TD approves I will remove bot_GnoBot2009Blitz from the list of benchmark bots.

Title: Re: Will the 2010 Computer Championship be open?
Post by RonWeasley on Jan 25th, 2010, 10:16am
OK.  This seems like the best consensus we're going to achieve.  Go ahead and remove bot_GnoBot2009Blitz from the benchmark list.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Jan 29th, 2010, 5:30pm

on 01/25/10 at 10:16:32, RonWeasley wrote:
OK.  This seems like the best consensus we're going to achieve.  Go ahead and remove bot_GnoBot2009Blitz from the benchmark list.


Would it be possible to do this before the qualifying period ends? It can be figured out manually, so it's not overly critical.

Title: Re: Will the 2010 Computer Championship be open?
Post by 99of9 on Jan 30th, 2010, 2:18am
And omar, if you have any other spare time this weekend, please can you run lots more qualifying games for bomb?  Correct seeding may require up to another 53 games!

Title: Re: Will the 2010 Computer Championship be open?
Post by Tuks on Jan 30th, 2010, 3:15am
And the others need to start playing.  The whole point of the qualifiers is to rank the bots; if you don't play all your games, the rankings are completely inaccurate and end up being pointless or even unfair.


Title: Re: Will the 2010 Computer Championship be open?
Post by doublep on Jan 30th, 2010, 7:16am
Maybe the qualifying period could be extended another week or so?  E.g. OpFor has not played the top bots at all and risks receiving the 8th seed, whereas otherwise it would be at least 4th or 5th.

Title: Re: Will the 2010 Computer Championship be open?
Post by doublep on Jan 30th, 2010, 7:24am
Also, I think that if this qualifying scheme is kept, we should cut down on the weaker opponents.  E.g. of the 4 weaker bots, leave at most two (Arimaazilla and Aami-ra).  Or maybe remove them all; contestants are expected to become stronger every year, so weaker qualifying opponents will make even less of a difference.

And please reduce the maximum number of games per opponent from 20 to something like 10.  Otherwise the current situation, with too many unplayed games that could possibly improve seeding, will certainly repeat.

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 30th, 2010, 7:39am
Agreed.  The weaker bots were too weak, and 20 games per bot is too many.  I supported the "winning streak" idea at first, but I have since totally changed my mind.

Next year I propose that the stable consist of the blitz version of eight different bots (no P1, P2, or Fast), and the qualifying be exactly four games against each, i.e. two as Gold and two as Silver.  Add up the wins, and that's the qualifying score.
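A sketch of how simple the proposed format would be to schedule and score. It assumes exactly what the proposal says: eight blitz benchmark bots, two games as Gold and two as Silver against each, and the qualifying score is the plain win count. The bot names and helper functions are hypothetical placeholders.

```python
from typing import List, Tuple

GAMES_PER_COLOUR = 2  # two games as Gold and two as Silver per benchmark


def schedule(benchmarks: List[str]) -> List[Tuple[str, str]]:
    """Every (benchmark, colour) game an entrant must play, in order."""
    return [(bot, colour)
            for bot in benchmarks
            for colour in ("Gold", "Silver")
            for _ in range(GAMES_PER_COLOUR)]


def qualifying_score(wins: List[bool], benchmarks: List[str]) -> int:
    """Total wins, checking that every scheduled game has a result."""
    expected = len(schedule(benchmarks))
    if len(wins) != expected:
        raise ValueError(f"expected {expected} results, got {len(wins)}")
    return sum(wins)
```

With eight benchmark bots, `schedule` yields 8 x 4 = 32 games. An entrant that wins 20 of them scores 20, with no streaks or tie-breaker losses to track.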

The idea behind having such a huge number of potential games was to permit developers to make last-minute modifications to their bots after losing to a benchmark bot.  Mostly these last-minute modifications have turned out to be fictitious, and to the extent that they have been genuine, it has been a bad thing for the qualifying because we now have one day left and huge numbers of unplayed games.  Either those games will never be played, distorting the standings, or they will all be played at the last minute, potentially overloading the server and disrupting the human World Championship.

Let's get back to something manageable with 8x4=32 games per qualifying bot.  That's pretty close to what developers have decided to play anyway.

Title: Re: Will the 2010 Computer Championship be open?
Post by doublep on Jan 30th, 2010, 8:06am
Well, at least for Badger those last-minute changes were important.  E.g. it now has a fair chance against OpFor2009, whereas it lost 7 games in a row before.  But I agree they don't justify the sheer number of qualifying games.  If the format were different, with fewer games, I could just train and debug against other bots.

Having only blitz is perhaps too severe.  After all, some bots can be natural "slow thinkers", and such a qualifying scheme would unfairly disadvantage them.  I'd propose having 4 to 6 blitz bots and 2 to 4 fast opponents instead.

Title: Re: Will the 2010 Computer Championship be open?
Post by jdb on Jan 30th, 2010, 8:14am
For me, the problem was having to manually start each game. The games against the fast bots take around an hour, so all I can do is get in a few games a day. I have to be at the computer to start the next game. If there were a script to automatically play the games it would make it much easier.
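What jdb asks for amounts to a queue runner that starts each game as soon as the previous one ends. The sketch below is hypothetical and does not use any real Arimaa server interface. `play_game` stands in for whatever would actually launch one game and report whether the entrant won. It is stubbed out in the example so the loop can be exercised offline.

```python
import time
from typing import Callable, Dict, List, Tuple

Pairing = Tuple[str, str]  # (benchmark bot, entrant's colour)


def run_queue(pairings: List[Pairing],
              play_game: Callable[[str, str], bool],
              pause_seconds: float = 0.0) -> Dict[Pairing, List[bool]]:
    """Play every pairing back to back, with no human needed between games.

    `play_game(benchmark, colour)` is a hypothetical hook that blocks until
    the game ends and returns True if the entrant won.
    """
    results: Dict[Pairing, List[bool]] = {}
    for benchmark, colour in pairings:
        won = play_game(benchmark, colour)
        results.setdefault((benchmark, colour), []).append(won)
        if pause_seconds:
            time.sleep(pause_seconds)  # optional gap to keep server load down
    return results
```

With a fake hook such as `lambda bot, colour: colour == "Gold"`, running the queue `[("BenchA", "Gold"), ("BenchA", "Silver"), ("BenchA", "Gold")]` returns `{("BenchA", "Gold"): [True, True], ("BenchA", "Silver"): [False]}`.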

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 30th, 2010, 8:54am

on 01/30/10 at 08:06:43, doublep wrote:
Having only blitz is perhaps too severe.  After all, some bots can be natural "slow thinkers", and such a qualifying scheme would unfairly disadvantage them.  I'd propose having 4 to 6 blitz bots and 2 to 4 fast opponents instead.

As I recall, Gnobot2009 did better against Bomb2005 at blitz than at slower speeds, so it is true that some bots are natural slow/fast thinkers.

However, by proposing only blitz games I am thinking of the time commitment (see jdb's post) and server resources.  Each blitz game uses (on average) half the server CPU of a fast game, because it only lasts half as long.  Therefore the total server CPU consumption is minimized by having all qualifying games played at blitz speed.  I consider this a weightier consideration than accommodating naturally slow-thinking bots.
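The server cost of the two proposals can be compared with quick arithmetic. This sketch borrows two rough figures from the thread: a game against a fast bot takes about an hour (jdb's estimate), and a blitz game takes about half as long. It also assumes 4 games per benchmark bot and that each game keeps one server CPU busy for its whole length. The entrant count in the example is hypothetical.

```python
FAST_GAME_HOURS = 1.0  # jdb's rough figure for a game against a fast bot
BLITZ_GAME_HOURS = FAST_GAME_HOURS / 2  # a blitz game lasts about half as long
GAMES_PER_BENCHMARK = 4


def server_hours(n_blitz_bots: int, n_fast_bots: int, n_entrants: int) -> float:
    """CPU-hours of qualifying games, assuming one busy CPU per game."""
    per_entrant = GAMES_PER_BENCHMARK * (n_blitz_bots * BLITZ_GAME_HOURS
                                         + n_fast_bots * FAST_GAME_HOURS)
    return per_entrant * n_entrants
```

Per entrant, eight blitz bots cost `server_hours(8, 0, 1)` = 16.0 CPU-hours. Six blitz plus two fast bots cost 20.0, and four of each cost 24.0. Across a hypothetical 10 entrants, the all-blitz stable comes to 160.0 CPU-hours against 200.0 for the 6+2 mix.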

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 30th, 2010, 9:08am

on 01/22/10 at 08:16:28, omar wrote:
I checked the options for GnoBot 2009 and I didn't see anything for limiting the number of CPUs that it uses. Also it does a lot of disk IO and that also slows down other processes. If the TD approves I will remove bot_GnoBot2009Blitz from the list of benchmark bots.

Thanks for fixing the standings, Omar.  That makes it much easier to see that marwin has the #1 seed locked up, and that both GnoBot and clueless have maxed out all the bots except Clueless2009Blitz, so that JDB probably doesn't need to play any more games to have the #2 seed unless GnoBot unexpectedly pulls out a four-game winning streak against Clueless2009Blitz, etc.

Title: Re: Will the 2010 Computer Championship be open?
Post by omar on Jan 30th, 2010, 9:08am
I've removed GnoBot2009Blitz from the list of benchmark bots.

Toby, I don't think I'll be able to run too many more games. I'll try to have Bomb play at least 4 games against all the benchmark bots.

Title: Re: Will the 2010 Computer Championship be open?
Post by aaaa on Jan 30th, 2010, 9:13am
I say we just scratch the whole idea of benchmark bots and next time put no limit on the number of bots that can participate in the championship. Any seeding could possibly be determined by a (thus voluntary) tournament involving developer bots, perhaps in a style similar to that of the Continuous Tournament.

In order to make the championship fit in the allotted time frame, automate the whole process of scheduling games (maybe allowing for some pre- and post-game processing). Switching to a round-robin/elimination hybrid tournament format would then also follow naturally.
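Once scheduling is automated, the round-robin stage of such a hybrid is easy to generate. The sketch below uses the standard circle method, named here as an illustrative choice rather than anything the thread settled on. One player stays fixed while the others rotate, so every pair meets exactly once. How the elimination stage would follow is left open, as in the post.

```python
from typing import List, Tuple


def round_robin(players: List[str]) -> List[List[Tuple[str, str]]]:
    """Circle-method round robin: every pair of players meets exactly once.

    With an odd number of players a "BYE" placeholder is added; a player
    paired with it simply sits that round out, so those pairs are dropped.
    """
    field = list(players)
    if len(field) % 2:
        field.append("BYE")
    n = len(field)
    rounds = []
    for _ in range(n - 1):
        pairs = [(field[i], field[n - 1 - i]) for i in range(n // 2)]
        rounds.append([p for p in pairs if "BYE" not in p])
        # keep the first player fixed and rotate everyone else one step
        field = [field[0], field[-1]] + field[1:-1]
    return rounds
```

For five bots this produces 5 rounds of 2 games each, covering all 10 pairings. Since no bot appears twice in a round, each round's games could run in parallel on the tournament server.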

Let's not be held back by any supposed consideration towards spectators. So what, if one won't be able anymore to follow live as large a proportion of games as before, which are already pretty unfriendly towards spectators being 2 minutes per move? Why should that be allowed to stand in the way of any thorough way of determining what is the best bot?

I know this all has been said before, but surely a year should be enough time to work things out, right?

Title: Re: Will the 2010 Computer Championship be open?
Post by Fritzlein on Jan 30th, 2010, 9:40am

on 01/30/10 at 09:13:51, aaaa wrote:
I say we just scratch the whole idea of benchmark bots and next time put no limit on the number of bots that can participate in the championship.

As Omar pointed out earlier, if we use the finite resources more efficiently, the number of participating bots could be greater, but not unlimited.  A qualifying procedure will eventually have to be implemented regardless.

That said, I do like the idea of having more bots in the main tournament, even if the games are automated to run 24/7.   That's not spectator-unfriendly: on the contrary it means that there is always a tournament game to watch.  It would be a spectator's dream!

Title: Re: Will the 2010 Computer Championship be open?
Post by doublep on Jan 30th, 2010, 10:05am
I don't like the idea of a championship without a formalized and strictly timed qualifying phase.  When the rules are certain and the qualification period is defined, I can do anything I want with my bot the rest of the time.  Otherwise it is always "if I experiment now and it turns out badly, will it hurt the seed 5 months later?".

A main phase that is automated and/or has more games is fine.

Title: Re: Will the 2010 Computer Championship be open?
Post by aaaa on Jan 30th, 2010, 11:26am

on 01/30/10 at 09:40:44, Fritzlein wrote:
As Omar pointed out earlier, if we use the finite resources more efficiently, the number of participating bots could be greater, but not unlimited.  A qualifying procedure will eventually have to be implemented regardless.

But if we ever reach that point, wouldn't a qualifying procedure like the current one be too much of a strain on the server anyway? If it ever becomes an issue, the championship format could be made a bit "spongy" by requiring participants to survive with fewer losses, e.g. fewer than 2, before reaching a specific round or until the number of remaining players no longer exceeds a certain number.
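One possible reading of the "spongy" rule, as a hypothetical sketch. Bots are checked after each round. While the field is larger than a chosen size, a second loss eliminates; once the field has shrunk to that size, only the third loss of the usual triple elimination does. The limits of 2 and 3, the `max_field` parameter, and the name `apply_round` are assumptions for illustration. The round-number variant mentioned above would swap the field-size test for a round-number test.

```python
from typing import Dict, List

NORMAL_LOSS_LIMIT = 3  # triple elimination: out on the third loss
EARLY_LOSS_LIMIT = 2   # stricter early limit ("fewer than 2" losses)


def apply_round(alive: List[str], losses: Dict[str, int],
                max_field: int) -> List[str]:
    """Eliminate bots after a round under the hypothetical 'spongy' rule.

    `alive` lists the bots still in the event before this round's results
    were applied; `losses` holds each bot's total losses so far. While
    more than `max_field` bots were alive, a second loss eliminates; once
    the field is at or below `max_field`, only a third loss does.
    """
    limit = EARLY_LOSS_LIMIT if len(alive) > max_field else NORMAL_LOSS_LIMIT
    return [bot for bot in alive if losses[bot] < limit]
```

Because eliminated bots drop out of `alive`, a bot knocked out under the strict early limit is never revived when the limit later relaxes.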



Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB © 2000-2003. All Rights Reserved.