Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
(Message started by: omar on Nov 1st, 2008, 8:13pm)

Title: 2009 Arimaa Events
Post by omar on Nov 1st, 2008, 8:13pm
Bot developers start tweaking your engines; human players start practicing your tactics, because the 2009 Arimaa season is fast approaching.

The registration for the 2009 Arimaa events is now open. Here is the overall schedule:
 http://arimaa.com/arimaa/wc/2009/sch.html

Everything is pretty much the same as last year, except that I had to cut back on how much I can contribute to the prize funds. However, I am looking for companies (or wealthy individuals) who might be interested in sponsoring the events. If you have any contacts who could help me with this, please let me know.

I've contacted Z-man Games and they will be helping to sponsor the World Championship tournament.

Hope we have a great turnout this year.

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Nov 1st, 2008, 9:55pm
It's very cool that Z-Man is putting up prizes this year.  Hopefully Arimaa will be a big commercial success, which will encourage him to pony up in the future as well.  I'm getting psyched about the coming tournament season.

Big points:
* I see that for the preliminaries players will be seeded by game room rating.  It doesn't matter too much, since the finals will be seeded based on the preliminaries, but I wonder if this will set off extreme bot bashing in the pursuit of better seeding.  Are you still open to discussing an alternate method of seeding?
* Last year forfeits in the preliminaries were a major problem.  It is silly for someone who has quit to have their ghost entry handing out wins essentially at random, perhaps to people who would rather have an opponent to play than have a free win anyway.  We discussed the possibility that anyone who forfeits a game is automatically withdrawn from future rounds unless they explicitly ask the tournament coordinator for permission to continue.  How would you feel about such a rule?

Small points:
* On the World Championship list of registered players, we have pairing numbers behind our names that are left over from last year's final.
* The Computer Championship registration page has the list of last year's entrants.

Title: Re: 2009 Arimaa Events
Post by chessandgo on Nov 2nd, 2008, 5:18am

on 11/01/08 at 21:55:00, Fritzlein wrote:
Last year forfeits in the preliminaries were a major problem.  


Indeed, to my eyes this is the main problem. The whole championship lasted very long, and I preferred the previous years' method. If I recall correctly, the decisive argument in favor of the 6-round prelim + finals formula was that everyone got to play 6 games no matter what. If many games are forfeited due to lack of interest from quite a few registered players, then the former setting of a single tournament where everyone had 3 lives would not result in significantly fewer games being played.

I'm fine with the current formula, but I hope there will be a way to avoid having as many forfeited games as last year.

Jean

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Nov 2nd, 2008, 6:26am

on 11/02/08 at 05:18:36, chessandgo wrote:
If many games are forfeited due to lack of interest from quite a few registered players, then the former setting of a single tournament where everyone had 3 lives would not result in significantly fewer games being played.

I'm not sure what causes forfeits, but one virtue of a unified triple-elimination is that everyone who hasn't been eliminated can still be champion.  You don't have people with no chance of winning who are still playing games.

One of the arguments in favor of a six-round preliminary last year was to give people a chance for serious, live, human vs. human games.  Now that we have had 13 rounds of the Continuous Tournament, this argument has become obsolete.  People already have a chance to play serious, live, human vs. human games in the off-season.

The distinction between the World Championship preliminaries and the Continuous Tournament is that in the former people are playing to determine who is best, rather than just playing to play their best.  In the context of determining who is best, elimination makes more sense than Swiss.  Swiss pairing is a participation format, and the World Championship is not necessarily about participation.  I am a huge supporter of having the World Championship be an open tournament, with no restrictions on who can enter, but I am increasingly ambivalent about trying to guarantee that it is fun in an "everybody wins" sense instead of letting it be a bloodthirsty "survival of the fittest" contest.

Since we now have an ongoing participation tournament, the one remaining argument for having a Swiss preliminary to the World Championship is that the floating elimination format we have used in the past gives a huge advantage based on seeding, particularly to the top seed over the second seed.  The bias toward the high seeds in the elimination final is not a problem if the seeding is fair, and the preliminary makes the seeding into the final fair (even if the seeding into the preliminary was wack).  But if the World Championship has no preliminary and consists only of a unified triple-elimination round, then either we need to seed it more fairly than game room ratings would seed it, or we need to change the floating elimination format to give less reward to higher seeds.

Title: Re: 2009 Arimaa Events
Post by 99of9 on Nov 4th, 2008, 8:54pm

Quote:
During the 3 months before the start of the tournament any rated games played by the program should use a time control with a time-per-move between one and three minutes.

I feel that this requirement is too onerous.  Similar conditions are not imposed on humans.

Title: Re: 2009 Arimaa Events
Post by 99of9 on Nov 4th, 2008, 9:24pm

Quote:
Also the programs submitted for the championship tournament will be made available for others to play against in the public Arimaa gameroom after the challenge match is over. Thus the programs and players participating in the following year can be improved against the best programs of the previous year.

Omar, could you please comment on the IP status of bot binaries that are handed over under this rule?  I presume this is something like bot developers giving you a free perpetual nonexclusive license to run it in the gameroom.

Some specific questions:
Will this still apply if one day you start charging for access to the gameroom?  What about other such changes to the nature of the gameroom?

If the stated intent is only to help players and programs for the following year, perhaps a (longish) time limitation would make sense?

Would it be possible for developers to propose conditions of use that you would agree to upon accepting a bot for the tourney?  If you did not agree, the developer would either have to remove the offending condition, or not enter.


Quote:
... programs that are limited after some time, some games or limited in any manner will not be allowed to participate in following years...

I presume that handicapping unrated games is ok according to this rule?  Gnobot already deliberately sets up poorly in unrated games (to prevent a certain form of ratings-exploitation-botbashing).  Am I excluded already?  Do I have to remove this feature?


Title: Re: 2009 Arimaa Events
Post by omar on Nov 5th, 2008, 3:42pm

on 11/01/08 at 21:55:00, Fritzlein wrote:
Big points:
* I see that for the preliminaries players will be seeded by game room rating.  It doesn't matter too much, since the finals will be seeded based on the preliminaries, but I wonder if this will set off extreme bot bashing in the pursuit of better seeding.  Are you still open to discussing an alternate method of seeding?

Since we don't have another rating system set up yet, I just went with the gameroom ratings. The gameroom ratings are easily available to the page that shows the registered players, so the players can see what their seeding would be. If someone wants to propose a different rating system and provide a web service to access it (so I can integrate it with the registered players page), I am open to using it.


Quote:
* Last year forfeits in the preliminaries were a major problem.  It is silly for someone who has quit to have their ghost entry handing out wins essentially at random, perhaps to people who would rather have an opponent to play than have a free win anyway.  We discussed the possibility that anyone who forfeits a game is automatically withdrawn from future rounds unless they explicitly ask the tournament coordinator for permission to continue.  How would you feel about such a rule?

Yes, I forgot about this. I think it makes sense that if someone misses a game they are dropped from the tournament unless the tournament director lets them continue. I'll add that note.


Quote:
Small points:
* On the World Championship list of registered players, we have pairing numbers behind our names that are left over from last year's final.
* The Computer Championship registration page has the list of last year's entrants.

Thanks; fixed it.

Title: Re: 2009 Arimaa Events
Post by omar on Nov 5th, 2008, 4:19pm

on 11/04/08 at 20:54:04, 99of9 wrote:
I feel that this requirement is too onerous.  Similar conditions are not imposed on humans.

This was just so that the seeding into the bot tournament would be a little fairer. Bots that play at a fast speed could get a higher rating, and that would not be fair to a bot running closer to the tournament speed.

Since the tournaments for the humans and bots are different, different conditions could be used for each, as long as the same conditions apply to all the participants in each tournament.

Title: Re: 2009 Arimaa Events
Post by 99of9 on Nov 5th, 2008, 4:39pm

on 11/05/08 at 16:19:52, omar wrote:
This was just so that the seeding into the bot tournament would be a little fairer. Bots that play at a fast speed could get a higher rating, and that would not be fair to a bot running closer to the tournament speed.

Unfortunately with the current ratings system, you have to rely on the developer's goodwill anyway to hope that the seedings are a little fair.  If we wanted to distort the seedings, it would be easy to get our bots to win 100 straight games against the P2 bots just prior to the tourney.

My concern here is that often during development it is useful to play blitz or low ply games to quickly test new improvements.  This rule effectively prohibits this (against the existing diversity of bots) for the next 3 months!  Alternatively we could do everything unrated, but it seems a waste of good information, and some opponents play differently anyway.


Quote:
Since the tournaments for the humans and bots are different, different conditions could be used for each, as long as the same conditions apply to all the participants in each tournament.

Yes and no.  My point is that you have not imposed the same constraints to ensure "fair seedings" in the human tourney.  Humans are also better at some time controls than others.  Some have a very high blitz rating, but a low postal rating.  Should they be allowed to inflate their ratings at non-tourney time controls?

Title: Re: 2009 Arimaa Events
Post by Janzert on Nov 5th, 2008, 5:32pm
Hmm, I had completely forgotten about the rated game time control for bots. I've been planning on testing opfor in the coming months. This is often easiest done at faster time controls. How do I set unrated mode on its account so that it can play against the fast or blitz bots?

Like 99of9 I'm not sure how this helps give a better rating since a developer can easily manipulate the rating if desired by choosing the opponents the bot plays. This is probably even easier for a developer than the traditional human botbashing manipulations since the developer doesn't have to hang around for the games.

Janzert

Title: Re: 2009 Arimaa Events
Post by omar on Nov 5th, 2008, 5:41pm

on 11/05/08 at 16:39:45, 99of9 wrote:
My concern here is that often during development it is useful to play blitz or low ply games to quickly test new improvements.  This rule effectively prohibits this (against the existing diversity of bots) for the next 3 months!  Alternatively we could do everything unrated, but it seems a waste of good information, and some opponents play differently anyway.

I see what you mean now. No problem, I'll remove that. I hope the other developers are OK with this.


Quote:
Yes and no.  My point is that you have not imposed the same constraints to ensure "fair seedings" in the human tourney.  Humans are also better at some time controls than others.  Some have a very high blitz rating, but a low postal rating.  Should they be allowed to inflate their ratings at non-tourney time controls?

I probably should also tell the human players not to do bot bashing to boost their ratings, but as Karl mentioned, the ratings have less of an impact in the human tournament because of the preliminary stage.

Title: Re: 2009 Arimaa Events
Post by 99of9 on Nov 5th, 2008, 5:47pm

on 11/05/08 at 17:41:27, omar wrote:
I see what you mean now. No problem, I'll remove that. I hope the other developers are OK with this.

Many thanks!

Title: Re: 2009 Arimaa Events
Post by omar on Nov 5th, 2008, 7:21pm

on 11/04/08 at 21:24:21, 99of9 wrote:
Omar, could you please comment on the IP status of bot binaries that are handed over under this rule?  I presume this is something like bot developers giving you a free perpetual nonexclusive license to run it in the gameroom.

Some specific questions:
Will this still apply if one day you start charging for access to the gameroom?  What about other such changes to the nature of the gameroom?

A bot developer does not lose any IP rights by submitting the bot for the tournament and challenge match. The bot developer is providing a binary for indefinite use in the Arimaa gameroom. However, I do not have any intent to use the submitted bots in a commercial way. If the nature of this gameroom ever changes to become a commercial one, then I would not run those bots here. I would have to set up a different non-commercial gameroom for the developers and run them there. If I wanted to run the bots in the commercial gameroom, I would purchase them from the developers for that purpose.

Before the first bot tournament had occurred, I purchased Occam from Don Dailey and a version of Bomb from David Fotland for the purpose of having them run in the Arimaa gameroom. Don later decided to release the Occam code publicly. David later decided to sell Bomb commercially. So developers always retain IP rights to their code (unless they sell those rights).


Quote:
If the stated intent is only to help players and programs for the following year, perhaps a (longish) time limitation would make sense?

I would also like to preserve the original bots for historical purposes, so it would be nice if there were no time limit. I should change the wording on that to say "years" instead of "year".


Quote:
Would it be possible for developers to propose conditions of use that you would agree to upon accepting a bot for the tourney?  If you did not agree, the developer would either have to remove the offending condition, or not enter.

I would rather not do this, just to keep things simple.


Quote:
I presume that handicapping unrated games is ok according to this rule?  Gnobot already deliberately sets up poorly in unrated games (to prevent a certain form of ratings-exploitation-botbashing).  Am I excluded already?  Do I have to remove this feature?

As long as the bot runs the same way after the tournament as it did in the tournament, there is no reason for concern. It is OK for bots to play differently in rated and unrated games; they can even play differently against humans than they do against other bots, or even differently against specific opponents.

Title: Re: 2009 Arimaa Events
Post by aaaa on Nov 6th, 2008, 10:53pm
Why not make the computer championship a (possibly double) round-robin tournament? That would eliminate seeding as a factor.

Title: Re: 2009 Arimaa Events
Post by omar on Nov 7th, 2008, 5:39pm
The double round robin format does eliminate the need for seeding, but it requires a lot more games to complete (not that the bots care, but the tournament coordinator does). Also, it doesn't have a climactic finish.
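To put a rough number on "a lot more games": in a double round robin every pair of entrants meets twice, so N entrants play N(N-1) games in total. Here is a minimal Python sketch of that count. The function name is only illustrative.

```python
def double_round_robin_games(n_players: int) -> int:
    """Total games when every pair of entrants meets twice.

    There are n*(n-1)/2 distinct pairs and each pair plays 2 games,
    so the total is n*(n-1).
    """
    return n_players * (n_players - 1)


if __name__ == "__main__":
    for n in (4, 8):
        print(f"{n} players: {double_round_robin_games(n)} games")
```

For eight bots that comes to 56 games, and the total grows quadratically with the size of the field.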

Actually, we have never done any experiments to see how much the seeding matters for floating double/triple elimination tournaments. This would be an interesting project.

Title: Re: 2009 Arimaa Events
Post by aaaa on Nov 7th, 2008, 8:12pm
You could mitigate the problem a bit by having the seeding be determined to a certain extent by the performance of a player in last year's championship.

Title: Re: 2009 Arimaa Events
Post by RonWeasley on Nov 12th, 2008, 7:43am
Omar,

I went to register for the Spectator contest.  The description says the fee is $5 but PayPal asks for $10.  I'll pay either one, but which is correct?  (Or is it $10 for me and $5 for everybody else?)

Title: Re: 2009 Arimaa Events
Post by chessandgo on Nov 12th, 2008, 10:26am
This is tax on Time-Turners probably.

Title: Re: 2009 Arimaa Events
Post by Adanac on Nov 14th, 2008, 8:26am

on 11/12/08 at 07:43:02, RonWeasley wrote:
Omar,

I went to register for the Spectator contest.  The description says the fee is $5 but PayPal asks for $10.  I'll pay either one, but which is correct?  (Or is it $10 for me and $5 for everybody else?)


I noticed the same thing as well, but Omar has now fixed the problem.  The time-turner-tax has been eliminated.

Title: Re: 2009 Arimaa Events
Post by omar on Nov 14th, 2008, 11:41am
I forgot to change the settings on the PayPal button. Thanks for reminding me. It's been fixed now. There go my hopes of winning the spectator contest by being the only contestant :-)

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Nov 21st, 2008, 6:14am
It's official.  There are nine players registered for the World Championship, therefore we can't skip straight to the finals.  There will have to be a preliminary round.  Yay!

Title: Re: 2009 Arimaa Events
Post by aaaa on Nov 25th, 2008, 2:45pm
Can the format of the Computer Championship be changed to floating quadruple elimination? We now have what appears to be four closely matched bots, so I think that in order for the final result to be a sufficiently accurate reflection of strength, it's now more important than ever that the tournament should truly be discerning, especially in light of the excessive influence we have currently attributed to its seeding.

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Nov 26th, 2008, 8:22am
I also like quadruple elimination.  The pairing program can handle it, because the program is limited by participants, not by rounds.  The extra elimination makes the seeding less relevant and the determination of the winner more accurate.  The only drawback is that the tournament lasts longer and is more work for Omar.  There are too many possible pairings to calculate by hand, but I estimate that for eight participants triple elimination has 7-10 rounds and 21-23 games, while quadruple elimination has 9-13 rounds and 28-31 games.
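The game counts can be checked with a simple counting argument, assuming every game produces exactly one loss (no draws). Each of the N-1 eliminated players takes exactly k losses, and the champion takes between 0 and k-1. A k-elimination event therefore lasts between k(N-1) and k(N-1)+k-1 games, however the pairings float. Here is a sketch of that bound. The helper name is hypothetical.

```python
def elimination_game_range(n_players: int, lives: int) -> tuple[int, int]:
    """Bounds on total games in an n-player, `lives`-elimination event.

    Assumes every game yields exactly one loss (no draws) and the event
    ends when one player is left. Each of the n-1 eliminated players
    absorbs exactly `lives` losses; the champion absorbs 0..lives-1.
    """
    base = lives * (n_players - 1)
    return base, base + lives - 1


if __name__ == "__main__":
    for lives, name in ((3, "triple"), (4, "quadruple")):
        lo, hi = elimination_game_range(8, lives)
        print(f"8 players, {name} elimination: {lo}-{hi} games")
```

For eight players this gives 21-23 games for triple and 28-31 games for quadruple elimination, matching the estimate above. The argument says nothing about the number of rounds, which depends on the pairing rules.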

To compress the time, Omar might want to do two rounds per day in the late rounds.  Omar has talked about not compressing the rounds, so that spectators have some time to see the schedule in advance and watch the games live, but I don't think this should be a large factor in the pacing of the tournament.  The slow time control is not good for spectating anyway, plus the different time zones already mean each game is impossible for some spectators to watch, so I don't see a problem with running through the CC games as quickly as is convenient.

Title: Re: 2009 Arimaa Events
Post by omar on Nov 29th, 2008, 12:27pm

on 11/25/08 at 14:45:29, aaaa wrote:
Can the format of the Computer Championship be changed to floating quadruple elimination? We now have what appears to be four closely matched bots, so I think that in order for the final result to be a sufficiently accurate reflection of strength, it's now more important than ever that the tournament should truly be discerning, especially in light of the excessive influence we have currently attributed to its seeding.


You might be surprised to know that ratings can be better at picking the best player than even a double round robin tournament. Just depends on how accurate the ratings are. Also, the gain in the probability of picking the best player is only about 2% between FTE (floating triple elimination) and FQE (floating quadruple elimination).

http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1114794077;start=84#84

It is quite surprising, but there is much more to be gained by improving the initial seeding than by extending the number of rounds. However, these simulations were done for 16 player tournaments; they need to be rerun for 8 player tournaments.
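One way to see why rating accuracy matters so much is to ask how often seeding alone would pick the right player, that is, how often the highest measured rating belongs to the truly best entrant. The Monte Carlo sketch below uses its own assumptions, which are not necessarily those of run3. True ratings are uniform over a fixed range, and measurement error is uniform in [-inaccuracy, +inaccuracy].

```python
import random


def top_seed_is_best_rate(n_players: int, true_range: float,
                          inaccuracy: float, trials: int,
                          rng: random.Random) -> float:
    """Fraction of trials in which the top seed (highest measured rating)
    is also the player with the highest true rating.

    Assumed model (hypothetical, not taken from run3): true ratings are
    uniform on [0, true_range]; measured = true + uniform noise on
    [-inaccuracy, +inaccuracy].
    """
    hits = 0
    for _ in range(trials):
        true = [rng.uniform(0, true_range) for _ in range(n_players)]
        measured = [t + rng.uniform(-inaccuracy, inaccuracy) for t in true]
        best = max(range(n_players), key=true.__getitem__)
        top_seed = max(range(n_players), key=measured.__getitem__)
        hits += best == top_seed
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(2008)
    for err in (50, 100, 200, 400):
        rate = top_seed_is_best_rate(8, 500, err, 10000, rng)
        print(f"inaccuracy {err}: top seed is best in {rate:.1%} of trials")
```

With zero inaccuracy the top seed is always the best player. As the error grows relative to the spread of true ratings, the rate falls toward 1/N, which is no better than a random pick.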

Title: Re: 2009 Arimaa Events
Post by aaaa on Nov 29th, 2008, 2:31pm

on 11/29/08 at 12:27:51, omar wrote:
You might be surprised to know that ratings can be better at picking the best player than even a double round robin tournament.

Oh, but I'm not surprised by this at all, for reasons you'll see below.


Quote:
Just depends on how accurate the ratings are.

And how do we arrive at accurate ratings? By having the contestants play games against each other in the first place! Surely you see that some sort of "no free lunch" theorem is at work here? In that case, wouldn't it be much better for the result to be determined mainly by the games played in the controlled environment of the tournament itself, rather than by a rating system that reflects other games, with all the severe flaws that come with it, not least its extreme vulnerability to self-selection distortions? I just met an Arimaa player who saw maximizing his own rating as a goal in itself and for that reason refused to play me rated.


Quote:
Also, the gain in the probability of picking the best player is only about 2% between FTE and FQE.

http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1114794077;start=84#84

Even accepting that number for sake of argument, that doesn't cover the additional advantage that, even with no change in winner, the inclination towards the "right" result would still be more likely to become manifest in the degree of decisiveness of the final result, i.e. it would likely make us more confident with a "right" result and less with a "wrong" one. The importance of this should not be underestimated.


Quote:
It is quite surprising, but there is much more to be gained by improving the initial seeding than by extending the number of rounds.

I still think that, for the purpose of seeding, trying to fix the ratings rather than outright minimizing their influence would be beating the air. It might even be a dangerous pursuit, as it could lead to a false confidence in a solution that would only (temporarily) hide the weaknesses.


Quote:
However, these simulations were done for 16 player tournaments; they need to be rerun for 8 player tournaments.

More simulations can never hurt, of course, but I, for one, would like to see what happens if there are several closely matched players (as I consider the top-4 bots to be) in them. Chances are, we'll see the percentages fall significantly, even with more reliable ratings.

Title: Re: 2009 Arimaa Events
Post by jdb on Nov 29th, 2008, 4:51pm
Just a quick comment, and then I'll go back to lurking on this thread.

If a bot is under constant development, the bot that got its rating in the game room and the one that enters the tournament, a couple weeks later, could be very different.

Title: Re: 2009 Arimaa Events
Post by omar on Dec 3rd, 2008, 9:02am
I ran some simulations to compare triple elimination with quad elimination when there are 8 players. It seems like the improvement is closer to about 4%.

run3 'formats/floatTripElim' 1000 8 500 50 10000000

means: 1000 trials, 8 players, range of true ratings is 500 elo points, measured rating inaccuracy of 50 elo points, and a draw ratio of 1:10000000.


Code:
run3 'formats/floatTripElim' 1000 8 500 50 10000000
 1   52.4%
average number of rounds = 8.97
average rating from best = 32.4

run3 'formats/floatQuadElim' 1000 8 500 50 10000000
 1   56.3%
average number of rounds = 11.43
average rating from best = 29.5


run3 'formats/floatTripElim' 1000 8 500 100 10000000
 1   51.0%
average number of rounds = 9.04
average rating from best = 34.2

run3 'formats/floatQuadElim' 1000 8 500 100 10000000
 1   54.9%
average number of rounds = 11.41
average rating from best = 28.7


run3 'formats/floatTripElim' 1000 8 500 200 10000000
 1   52.0%
average number of rounds = 9.03
average rating from best = 34.2

run3 'formats/floatQuadElim' 1000 8 500 200 10000000
 1   50.1%
average number of rounds = 11.46
average rating from best = 33.3



run3 'formats/floatTripElim' 1000 8 500 400 10000000
 1   51.3%
average number of rounds = 9.00
average rating from best = 34.8

run3 'formats/floatQuadElim' 1000 8 500 400 10000000
 1   53.2%
average number of rounds = 11.38
average rating from best = 33.0


I didn't see what happens yet if the true rating range is smaller.

The programs I used are available from:

http://arimaa.com/arimaa/tourn/compare/sim.tar

or in ZIP format:

http://arimaa.com/arimaa/tourn/compare/sim.zip
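Side by side, the four pairs of runs show a quad-over-triple gain of about 4 percentage points at inaccuracies of 50 and 100. At 200 and 400 the gain is smaller, and in one case negative. With 1000 trials per run, the binomial standard error of the difference between two rates near 50% is itself about 2 points, so swings of that size may be noise. The sketch below tabulates the posted numbers and computes that standard error, assuming the runs are independent. The numbers are copied from the output above, not re-simulated.

```python
import math

# Probability (%) that the truly best player won, copied from the run3
# output above: 1000 trials, 8 players, 500-point range of true ratings.
RESULTS = {
    # rating inaccuracy: (floatTripElim, floatQuadElim)
    50: (52.4, 56.3),
    100: (51.0, 54.9),
    200: (52.0, 50.1),
    400: (51.3, 53.2),
}
TRIALS = 1000


def gain_points(trip: float, quad: float) -> float:
    """Percentage-point gain of quad over triple elimination."""
    return round(quad - trip, 1)


def diff_std_error_points(trip: float, quad: float, n: int) -> float:
    """Standard error (in points) of the difference between two
    independent binomial proportions, each estimated from n trials."""
    p1, p2 = trip / 100.0, quad / 100.0
    return 100.0 * math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)


if __name__ == "__main__":
    for err, (trip, quad) in RESULTS.items():
        gain = gain_points(trip, quad)
        se = diff_std_error_points(trip, quad, TRIALS)
        print(f"inaccuracy {err:>3}: gain {gain:+.1f} points (s.e. {se:.1f})")
```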


Title: Re: 2009 Arimaa Events
Post by Fritzlein on Dec 3rd, 2008, 9:48am
The 400-point inaccuracy for gameroom ratings might be the most realistic of the trials you ran.  Remember in 2008 Sharp was the lowest seed with a rating of 1500, but then came in second place, ahead of OpFor who had a pre-tournament rating of 1890.  That's a 390-point swing right there.

Also in 2006, the top two seeds by gameroom rating were GnoBot and Loc, while Bomb was the bottom seed, but Bomb took first while GnoBot and Loc took the bottom two spots.

Right now we have a situation where GnoBot and Rat each have a Bomb-beating formula, so either developer could push down Bomb to the bottom seed if they wanted, while giving their own bot the top seed.  In such a situation, it makes much more sense to me to assume wildly inaccurate ratings in the simulations.

As for the range of true strengths of the bots, I would guess that it has been 500 points in the past, but this year there are a lot of bots that are close to each other at the top, so we might be trying to make a finer discrimination than in the past.

For the statistic "average rating from best", do you average in a zero when the best player wins, or is it the average only from the times when the best player doesn't win?  If the former, then the two formats aren't that different in their misses, but if the latter, quadruple elimination is not only more likely to determine the best player, it is also more likely to miss only by a little.

As I think about it, if triple elimination is right over half the time and, when it is wrong, is only wrong by an average of 35 rating points, it's not doing so badly.

Finally, for the Computer Championship, aren't you more concerned about the average number of games than the average number of rounds, since you can only play one game at a time no matter how many games are in the round?  But maybe the increase in games is essentially proportional to the increase in rounds.

Title: Re: 2009 Arimaa Events
Post by aaaa on Dec 3rd, 2008, 10:22am
If we restrict our attention to the top-4 bots (with all due respect to the developers of the others), we see that the current maximum difference in rating is 160. If for each run, the true rating range is set to this number plus the given rating inaccuracy we get the following:


Code:
./run3 'formats/floatTripElim' 1024 4 190 30 1000000000
 1   46.4%
 2   26.4%
 3   17.8%
 4    9.5%
average number of rounds = 6.80
average rating from best = 29.5
./run3 'formats/floatQuadElim' 1024 4 190 30 1000000000
 1   52.5%
 2   26.5%
 3   13.7%
 4    7.3%
average number of rounds = 8.86
average rating from best = 23.7
./run3 'formats/floatTripElim' 1024 4 220 60 1000000000
 1   47.4%
 2   29.8%
 3   15.2%
 4    7.6%
average number of rounds = 6.81
average rating from best = 30.3
./run3 'formats/floatQuadElim' 1024 4 220 60 1000000000
 1   54.2%
 2   25.7%
 3   14.3%
 4    5.9%
average number of rounds = 8.89
average rating from best = 25.9
./run3 'formats/floatTripElim' 1024 4 280 120 1000000000
 1   53.3%
 2   26.9%
 3   13.4%
 4    6.4%
average number of rounds = 6.79
average rating from best = 30.9
./run3 'formats/floatQuadElim' 1024 4 280 120 1000000000
 1   54.3%
 2   27.6%
 3   12.1%
 4    6.0%
average number of rounds = 8.83
average rating from best = 27.8

Title: Re: 2009 Arimaa Events
Post by aaaa on Dec 3rd, 2008, 10:34am

on 12/03/08 at 09:48:10, Fritzlein wrote:
For the statistic "average rating from best", do you average in a zero when the best player wins, or is it the average only from the times when the best player doesn't win?

From a cursory glance at the code, it appears that it's simply the average distance from every true rating to the best one, and that the tournament results don't enter into it.

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Dec 3rd, 2008, 10:38am
Interesting, aaaa.  Your first two simulations show a much larger difference between triple and quadruple elimination than the last one shows.  Is this a fluke or does increasing (spread + uncertainty) actually blur the distinction between the two formats?

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Dec 3rd, 2008, 10:41am

on 12/03/08 at 10:34:22, aaaa wrote:
From a cursory glance at the code it appears that it's simply the average distance from every true rating to the best one and that the results don't come in.

Hmmm, but then why would that be consistently lower for quadruple elimination than for triple?

Title: Re: 2009 Arimaa Events
Post by aaaa on Dec 3rd, 2008, 10:59am

on 12/03/08 at 10:38:57, Fritzlein wrote:
Interesting, aaaa.  Your first two simulations show a much larger difference between triple and quadruple elimination than the last one shows.  Is this a fluke or does increasing (spread + uncertainty) actually blur the distinction between the two formats?

The hypothesis that immediately comes to mind is that with a larger spread of true ratings, the chance increases significantly that there is one player in the tournament who is much stronger than the rest and will often just run away with the title, regardless of what format may happen to be in use.


on 12/03/08 at 10:41:05, Fritzlein wrote:
Hmmm, but then why would that be consistently lower for quadruple elimination than for triple?

Heh heh, I told you it was from a cursory glance. With some more scrutiny, I now think the average difference is instead with respect to the true rating of the eventual winner.

[EDIT]
<Sigh> I should really take the time to figure things out before spouting off. It now appears to be indeed the (obvious) average difference between the winner and the best player and that includes the case where they happen to coincide.
[/EDIT]
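Under that final reading, the statistic is the mean, over all trials, of the gap between the best player's true rating and the winner's true rating. Trials won by the best player contribute zero. Here is a minimal sketch of that definition. The input layout is an assumption, not run3's actual data structure.

```python
from typing import Sequence, Tuple

# Hypothetical layout: (true ratings of all entrants, index of the winner)
Trial = Tuple[Sequence[float], int]


def average_rating_from_best(trials: Sequence[Trial]) -> float:
    """Mean true-rating gap between the best entrant and the winner.

    A trial won by the best entrant adds a gap of zero, so a format can
    lower this figure by picking the best player more often, by missing
    by less when it is wrong, or both.
    """
    gaps = [max(ratings) - ratings[winner] for ratings, winner in trials]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    example = [
        ([1500, 1600, 1700], 2),  # best player won: gap 0
        ([1500, 1600, 1700], 1),  # second-best won: gap 100
    ]
    print(average_rating_from_best(example))
```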

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Dec 3rd, 2008, 4:51pm

on 12/03/08 at 10:59:45, aaaa wrote:
The hypothesis that immediately comes to mind is that with a larger spread of true ratings, the chance increases significantly that there is one player in the tournament who is much stronger than the rest and will often just run away with the title, regardless of what format may happen to be in use.

There are three data points that don't form a line, so I still suspect some statistical fluke, but it is very plausible that the more the true ratings are spread out, the less difference it makes what system we use.  I guess the scenario I am most interested in is where the participants are close to each other in true strength, but the seeding is essentially random.  If each bot can beat each other bot at least one third of the time, that would be a true strength range of 120 points, plus (say) a range of 240 points of inaccuracy added to that to randomize the seeding.  That scenario is probably where the difference between FTE and FQE is large, whereas if the true strength had a range of 400 points and there were no inaccuracy in ratings (hence perfect seeding), both would perform pretty well.
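The 120-point figure follows from the standard logistic Elo curve, assuming that is the model in play. Under it, a rating gap d gives the stronger side an expected score of 1/(1+10^(-d/400)). Setting the weaker side's score to 1/3 gives d = 400*log10(2), about 120 points. Here is a small sketch of the formula and its inverse.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Logistic Elo expected score for the side that is rating_diff
    points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def diff_for_score(p: float) -> float:
    """Inverse of expected_score: the rating gap at which the stronger
    side's expected score is p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    gap = diff_for_score(2 / 3)
    print(f"weaker side scores 1/3 at a gap of {gap:.1f} points")
    print(f"stronger side's score at a 500-point gap: {expected_score(500):.3f}")
```

By the same formula, a 500-point gap gives the stronger side roughly 95%. That fits aaaa's hypothesis that a wide spread lets one entrant run away with the title whatever the format.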

Title: Re: 2009 Arimaa Events
Post by aaaa on Dec 3rd, 2008, 5:33pm

on 12/03/08 at 16:51:37, Fritzlein wrote:
I guess the scenario I am most interested in is where the participants are close to each other in true strength, but the seeding is essentially random.  If each bot can beat every other bot at least one third of the time, that would be a true strength range of 120 points, plus (say) a range of 240 points of inaccuracy added to that to randomize the seeding.

I assume you mean that the rating error should be between -120 and +120, but, just to be on the safe side, I've also added the scenario where it's between -240 and +240:


Code:
./run3 'formats/floatTripElim' 1024 4 120 120 1000000000
 1   37.4%
 2   30.5%
 3   19.3%
 4   12.8%
average number of rounds = 6.84
average rating from best = 24.6
./run3 'formats/floatQuadElim' 1024 4 120 120 1000000000
 1   43.0%
 2   25.1%
 3   19.6%
 4   12.3%
average number of rounds = 8.98
average rating from best = 21.6
./run3 'formats/floatTripElim' 1024 4 120 240 1000000000
 1   35.4%
 2   28.5%
 3   20.6%
 4   15.5%
average number of rounds = 6.78
average rating from best = 27.2
./run3 'formats/floatQuadElim' 1024 4 120 240 1000000000
 1   41.3%
 2   23.8%
 3   20.0%
 4   14.8%
average number of rounds = 8.97
average rating from best = 23.8
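Since flukes keep coming up, a quick sanity check is the binomial standard error of each percentage. This sketch assumes the second run3 argument (1024 here, 200 in smaller runs) is the number of simulated tournaments. That fits the reported percentages being multiples of 1/1024 and 0.5% respectively. It also treats the runs for the two formats as independent samples.

```ruby
# Binomial standard error of an estimated share p from n simulated
# tournaments (n assumed to be the second run3 argument).
def std_error(p, n)
  Math.sqrt(p * (1.0 - p) / n)
end

# Approximate z-score for the gap between two independent shares.
def z_score(p1, p2, n1, n2)
  (p2 - p1) / Math.sqrt(std_error(p1, n1)**2 + std_error(p2, n2)**2)
end

n = 1024
puts format("SE of 37.4%% over %d runs: %.1f points", n, 100 * std_error(0.374, n))
puts format("z for 37.4%% vs 43.0%%: %.1f", z_score(0.374, 0.430, n, n))
```

With n = 1024 this gives a standard error of about 1.5 points and a z of roughly 2.6 for the first-place gap. With n = 200 the standard error grows to roughly 3 points, so differences of a point or two in runs that small are within noise.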

Title: Re: 2009 Arimaa Events
Post by omar on Dec 3rd, 2008, 5:51pm

on 12/03/08 at 10:22:11, aaaa wrote:
If we restrict our attention to the top-4 bots (with all due respect to the developers of the others), we see that the current maximum difference in rating is 160. If for each run, the true rating range is set to this number plus the given rating inaccuracy we get the following:


Thanks for trying this, aaaa. The difference between the two formats continues to grow as the number of players decreases. The third comparison seems to suggest that quad elimination is also very sensitive to the initial seeding, though, as Karl mentioned, this seems to be a bit of a fluke. I would also suggest not changing two variables at the same time, because that makes it difficult to identify the source of any difference.

I think these simulations show that we can make about as much improvement by having better initial seeding as we can by adding a few more rounds.

The only problem I have with adding more rounds is that I try to do one round per day and pre-schedule them, so that people know when to show up if they want to watch the games. Also, I can't schedule the next round until the current one is finished. With 11 to 12 rounds I would be cutting it close: if there was a problem and I had to replay a game, it could keep the tournament from finishing as scheduled. With about 8 to 9 rounds I have some leeway for mistakes. I could extend the time for the computer tournament, but the way I have it set up now, the first two weeks of the month are for the computer tournament and the last two for the challenge match preliminaries. This works out nicely, because I only have to rent an extra dedicated server (silver.arimaa.com) for one month (in addition to renting gold.arimaa.com for 3 months). Adding a few more days and crossing into the next month means paying for another month of server rental that isn't fully used. Ideally these kinds of issues should not come into play, and they definitely won't once the computer championship gets important enough that hundreds of people show up to watch, but until then I also have to juggle the practical issues :-)

Maybe we can get better initial seeding by just using a filter that only looks at bot-bot games.
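One way to try that idea is to keep only games where both sides are bots and recompute plain Elo ratings from them. Everything here is hypothetical: the Game fields, the 1500 starting rating and the K-factor of 16 are assumptions for illustration, not how the arimaa.com rating system actually works.

```ruby
# Hypothetical game record; the real game-room data may look different.
Game = Struct.new(:gold, :silver, :winner, :gold_is_bot, :silver_is_bot)

K = 16.0 # assumed K-factor for this sketch

# Sequential Elo updates over bot-vs-bot games only; games involving a
# human are skipped, so humans never enter the rating table.
def bot_only_ratings(games, initial = 1500.0)
  ratings = Hash.new(initial)
  games.select { |g| g.gold_is_bot && g.silver_is_bot }.each do |g|
    expected_gold = 1.0 / (1.0 + 10.0**((ratings[g.silver] - ratings[g.gold]) / 400.0))
    score_gold = g.winner == g.gold ? 1.0 : 0.0
    delta = K * (score_gold - expected_gold)
    ratings[g.gold] += delta
    ratings[g.silver] -= delta
  end
  ratings
end

games = [
  Game.new("bot_a", "bot_b", "bot_a", true, true),  # counted
  Game.new("human", "bot_b", "human", false, true)  # ignored: not bot vs bot
]
p bot_only_ratings(games)
```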


Title: Re: 2009 Arimaa Events
Post by jdb on Dec 4th, 2008, 8:06am
Another tournament format possibility is floatRRQuadElim. It starts with a round robin tournament, and the losses carry forward into the floating quad elimination phase. Statistics from a small run are presented below.

Modify the formats/floatQuadElim file by adding the following lines and uncommenting them.


Code:
pa, ga, ra, ph, rn = getTournState(ARGV[0]);

#print " # #{pa.length} #{rn} \n"
#
#
#if (rn < (pa.length-1) )
#  print "* Calling round robin\n"
#  print "* Round #{rn+1}\n"
#  result = `formats/roundRobin #{ARGV[0]}`
#  print result
#  exit
#end
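For anyone without the simulator, here is a standalone Ruby sketch of the same idea. It is a hypothetical illustration, not the contents of formats/roundRobin. It uses a circle-method round robin with n - 1 rounds for an even field, matching the `rn < (pa.length-1)` test above. The loss counts would then carry into the quad-elimination phase, where a fourth loss eliminates a player.

```ruby
# Circle method: fix the first player, rotate the rest one step per round.
# Assumes an even field (an odd field would need a bye, omitted here).
def round_robin_rounds(players)
  raise ArgumentError, "even number of players assumed" if players.size.odd?
  ring = players.dup
  Array.new(players.size - 1) do
    pairs = (0...ring.size / 2).map { |i| [ring[i], ring[-1 - i]] }
    ring = [ring[0], ring[-1]] + ring[1...-1]
    pairs
  end
end

# Play every round-robin game with a caller-supplied result and tally
# losses; these counts are what would seed the quad-elimination phase.
def round_robin_losses(players)
  losses = players.map { |pl| [pl, 0] }.to_h
  round_robin_rounds(players).each do |round|
    round.each do |a, b|
      winner = yield(a, b)
      losses[winner == a ? b : a] += 1
    end
  end
  losses
end

# Example: alphabetical order decides every game (purely illustrative).
losses = round_robin_losses(%w[A B C D]) { |a, b| [a, b].min }
still_alive = losses.select { |_, l| l < 4 }.keys # 4 losses = out in quad elim
p losses, still_alive
```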








Code:
jeff@quad:~/arimaa_bots/sim$ run3 floatQuadElim 200 6 120 240 100000
./run3 'formats/floatQuadElim' 200 6 120 240 100000
 1   28.5%
 2   28.5%
 3   17.0%
 4   13.5%
 5    8.5%
 6    4.0%
average number of rounds = 10.27
average rating from best = 27.8
jeff@quad:~/arimaa_bots/sim$ run3 floatTripElim 200 6 120 240 100000
./run3 'formats/floatTripElim' 200 6 120 240 100000
 1   27.5%
 2   27.0%
 3   14.5%
 4   15.0%
 5    9.5%
 6    6.5%
average number of rounds = 8.29
average rating from best = 25.7
jeff@quad:~/arimaa_bots/sim$ run3 floatRRQuadElim 200 6 120 240 100000
./run3 'formats/floatRRQuadElim' 200 6 120 240 100000
 1   29.5%
 2   24.0%
 3   15.5%
 4   13.5%
 5   10.0%
 6    7.5%
average number of rounds = 10.16
average rating from best = 27.7

Title: Re: 2009 Arimaa Events
Post by omar on Dec 5th, 2008, 4:16pm
Jeff, are you sure you posted the right code? floatQuadElim doesn't have such lines in it.

Thanks for trying this out. I like it when more people are able to experiment and contribute their results.

Title: Re: 2009 Arimaa Events
Post by jdb on Dec 5th, 2008, 6:38pm

on 12/05/08 at 16:16:06, omar wrote:
Jeff, are you sure you posted the right code? floatQuadElim doesn't have such lines in it.

Thanks for trying this out. I like it when more people are able to experiment and contribute their results.


Sorry, I should have been clearer. In the following block of code, the first line is the only line that exists in the original floatQuadElim file; it should be line 40. I added all the commented lines to create the floatRRQuadElim format, and they need to be uncommented to run the new format. If you want, I can email the file to you.


Code:
pa, ga, ra, ph, rn = getTournState(ARGV[0]);

#print " # #{pa.length} #{rn} \n"
#
#
#if (rn < (pa.length-1) )
#  print "* Calling round robin\n"
#  print "* Round #{rn+1}\n"
#  result = `formats/roundRobin #{ARGV[0]}`
#  print result
#  exit
#end

Title: Re: 2009 Arimaa Events
Post by omar on Dec 7th, 2008, 11:21pm
Thanks I got it now.

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Dec 12th, 2008, 11:37am
Omar, have you contemplated moving the scheduling window 24 hours later in the week?  If so, do you want to implement the change while the Continuous Tournament is ongoing (to test it) or after the Continuous Tournament is over (to minimize disruption)?

Title: Re: 2009 Arimaa Events
Post by omar on Dec 14th, 2008, 7:41am
I think it might be too disruptive to make that change in the Continuous Tournament while it is in progress. The slots I've selected for, say, Sunday would suddenly be treated as slots selected for Monday, so everyone would need to make sure they update their times or they could get scheduled at unexpected times.

Changing the programs to shift by 24 hours is not too bad. It gets complicated for shifts that are not a multiple of 24.

I'll make this change for the 2009 WC tournament first and then change it for the Continuous Tournament when it is not in progress.
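Here is a small sketch of why whole-day shifts are the easy case. It assumes, hypothetically, that slots are stored as (day, hour) pairs; the actual scheduler's representation isn't shown in this thread.

```ruby
HOURS_PER_WEEK = 7 * 24
DAY_NAMES = %w[Sun Mon Tue Wed Thu Fri Sat]

# Hypothetical slot format: day index (0 = Sunday) and hour of day.
# Shifts wrap around the week; Ruby's % also handles negative shifts.
def shift_slot(day, hour, hours)
  total = (day * 24 + hour + hours) % HOURS_PER_WEEK
  [total / 24, total % 24]
end

def label(day, hour)
  format("%s %02d:00", DAY_NAMES[day], hour)
end

puts label(*shift_slot(0, 10, 24)) # whole-day shift: only the day changes
puts label(*shift_slot(0, 20, 6))  # partial shift: day and hour both change
```

With a 24-hour shift every slot keeps its hour and just moves to the next day. With any other shift, slots near the end of a day spill into the following day, so one person's selections for a given day can end up split across two days.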

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Dec 14th, 2008, 9:13am
OK, thanks.  I think it will result in slightly more games being played at favorable times overall.  Since you won't be changing the time slots in the middle of the Continuous Tournament, that gives an extra reason not to have the CT run all the way to the week before the World Championship.  If people have two weeks instead of one to enter the right times for the Championship, there is less likely to be any confusion on that score.

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Dec 14th, 2008, 3:25pm
Now 12 players are signed up for the World Championship, so we are guaranteed at least four rounds of preliminaries.  Yay!

Title: Re: 2009 Arimaa Events
Post by omar on Dec 14th, 2008, 5:23pm
Karl reminded me in the chat that the pairing program that we use for floating elimination does not assign the colors properly and that we had to fix it manually last year. I would like to get it fixed before the tournaments start this year.

The color assignment rule I want to use is:
Within each pairing, the player who has played gold fewer times so far against this opponent will play gold for that game. If this is a tie, then the player who has played gold in a smaller percentage of their games so far in the tournament will play gold for that game, with ties broken randomly.
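Here is a minimal Ruby sketch of that rule, applied to pairings that have already been made. The history format (a list of [gold, silver] pairs) and the method names are hypothetical. Players with no games yet are treated as having played gold 0% of the time, an assumption the rule itself doesn't settle.

```ruby
# Hypothetical history: one [gold, silver] pair per game already played.
def golds(history, player, opponent = nil)
  history.count { |g, s| g == player && (opponent.nil? || s == opponent) }
end

# Exact fraction of a player's games played as gold (0 if no games yet).
def gold_share(history, player)
  played = history.count { |g, s| g == player || s == player }
  played.zero? ? Rational(0) : Rational(golds(history, player), played)
end

# Returns [gold, silver] for the pairing a vs b.
def assign_colors(history, a, b, rng = Random.new)
  # 1. Fewer golds against this particular opponent gets gold.
  ha, hb = golds(history, a, b), golds(history, b, a)
  return [a, b] if ha < hb
  return [b, a] if hb < ha
  # 2. Otherwise, the smaller share of golds in the whole tournament gets gold.
  sa, sb = gold_share(history, a), gold_share(history, b)
  return [a, b] if sa < sb
  return [b, a] if sb < sa
  # 3. Still tied: pick randomly.
  rng.rand(2).zero? ? [a, b] : [b, a]
end

history = [%w[A B], %w[C A]]
p assign_colors(history, "A", "B")
p assign_colors(history, "A", "D")
```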

I was just looking at the code Paul provided for doing the floating elimination pairings.

http://arimaa.com/arimaa/wc/2009/floatDoubleElim

Offhand, I don't see how to change the code to add these color assignment rules. Does anyone have an idea of how this could be done?

Title: Re: 2009 Arimaa Events
Post by Janzert on Dec 14th, 2008, 6:14pm
From lines 751-755, it appears that currently games with opposite color assignments are given equal weights. But without understanding the full pairing algorithm, I'm not sure if it's safe to just change the weighting function (get_game_penalty, line 1016) to take side into account and rewrite the loop to assign separate values.

I don't currently have a C++ compiler set up, or I'd try it out and see what happens; maybe I can do that soon anyway.

Janzert

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Dec 15th, 2008, 1:15pm

on 12/14/08 at 18:14:55, Janzert wrote:
From lines 751-755, it appears that currently games with opposite color assignments are given equal weights. But without understanding the full pairing algorithm, I'm not sure if it's safe to just change the weighting function (get_game_penalty, line 1016) to take side into account and rewrite the loop to assign separate values.

If the game penalty is used to assign color, then the code is horrendously designed, because considering A vs B as a different pairing than B vs A causes the number of possible pairings to increase exponentially.  In a 16-player tournament, there will be 2^8 times as many possible pairings.  The branch-and-bound method is sometimes fast in spite of having to search an exponential number of possibilities, but it is asking for trouble not to prune the input space as much as possible beforehand.

The pairings should be assigned based on the game_penalty values (irrespective of color), and then, after the optimal pairing is found, the colors within pairings should be assigned as optimally as possible in a separate pass.  Now that I think about it, I don't recall ever seeing the color assignment code, so for all I know there is no such code and color assignment is random!  (Or, more likely, deterministic in a bad way, such as the lower pairing number always playing Gold.)
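The blow-up is easy to quantify. The number of ways to split n players into unordered pairs is the double factorial (n - 1)!!. Letting each pair also choose colors multiplies that by 2^(n/2). A quick Ruby check for the 16-player case:

```ruby
# (n - 1)!! = (n - 1) * (n - 3) * ... * 1: ways to split n players
# (n even) into unordered pairs.
def pairings(n)
  (1..n - 1).step(2).reduce(1, :*)
end

n = 16
colorless = pairings(n)
with_colors = colorless * 2**(n / 2) # each of the n/2 pairs can also be flipped
puts "#{colorless} pairings without colors, #{with_colors} with colors"
```

For 16 players that is 2,027,025 colorless pairings versus 518,918,400 with colors. Assigning colors afterwards is just one independent decision per pair, 8 in this case.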

Title: Re: 2009 Arimaa Events
Post by Janzert on Dec 15th, 2008, 2:02pm
OK, having looked at it some more, I don't think the current code takes color into account at all. Fritzlein is right: it assumes that a pair is the same regardless of which player is player 1 and which is player 2. The current program's "player x vs. player y" output just names the pairing and wasn't meant to imply a color assignment.

Janzert

Title: Re: 2009 Arimaa Events
Post by omar on Dec 15th, 2008, 10:44pm
Thanks guys. That makes sense. No wonder I couldn't find any code for color assignment :-)

So, I'll probably have to take the output from Paul's program and run it through another program to do the color assignment. Thanks.

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Dec 18th, 2008, 5:23pm
Sixteen players are now registered for the World Championship, which guarantees at least five rounds of preliminaries.  Yay!  

Title: Re: 2009 Arimaa Events
Post by aaaa on Jan 2nd, 2009, 6:05am
Is it normal that there are no registrations yet for the Computer Championship with only one month to go before the deadline?

Title: Re: 2009 Arimaa Events
Post by Fritzlein on Jan 2nd, 2009, 6:27pm
The World Championship registration deadline has passed, so I assume the 18 players currently registered are the field for the preliminary, and there will be five preliminary rounds.  When are the ratings fixed for seeding the preliminary rounds?

Title: Re: 2009 Arimaa Events
Post by 99of9 on Jan 2nd, 2009, 10:55pm

on 01/02/09 at 06:05:43, aaaa wrote:
Is it normal that there are no registrations yet for the Computer Championship with only one month to go before the deadline?

Perhaps you should enter bot_quad and get a win by default?  :)

Title: Re: 2009 Arimaa Events
Post by Tuks on Jan 3rd, 2009, 4:47am
Today at 6, exactly 24 hours after registration ends... that's what it says, anyway.


