Author |
Topic: World Championship tournament format (Read 9582 times) |
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: World Championship tournament format
« Reply #75 on: Jun 18th, 2005, 6:58am » |
|
Thanks for sharing this Tarr. It seems like a pretty easy format to implement (for the 16 player case). I'll simulate this format and let you know how it compares with the others.
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: World Championship tournament format
« Reply #76 on: Jun 18th, 2005, 12:41pm » |
|
I decided to change the tournament simulation program so that it generates the true ratings first (within the range given by the 4th argument) and then sets the measured ratings by adding the rating inaccuracies to the true ratings. The new program is called run2.

Initially I didn't think it mattered which was done first, and since we could limit entries in a tournament based on measured ratings and not true ratings, I had decided to generate the measured ratings first and set the true ratings from them. But I'm finding that a consequence of doing it this way is that as the rating inaccuracies increase, the performance of most formats also increases. Even the performance of singleElimRand increases. This is because the range of the true ratings increases with the rating inaccuracies. It makes more sense if the performance of most formats decreases as the rating inaccuracies increase (and the performance of singleElimRand stays constant). Generating the true ratings first and setting the measured ratings from them does this.

I also changed it so that if you pass the string 'show' to run2 in place of the number of trials, it runs one trial and shows what is happening at each round, pausing for an Enter between rounds. I need to rerun the previous simulations with run2.
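To make the ordering concrete, here is a minimal Python sketch of "true ratings first, measured ratings second". It is not the run2 source: the function name, the base rating of 2000, and the use of uniform distributions for both the true ratings and the inaccuracies are all assumptions made for illustration.

```python
import random


def generate_ratings(num_players, true_range, inaccuracy, base=2000.0, seed=None):
    """Return (true, measured) rating lists, drawing the true ratings first.

    Assumptions (not taken from run2): ratings are uniform on
    [base, base + true_range] and the inaccuracy is uniform on
    [-inaccuracy, +inaccuracy].
    """
    rng = random.Random(seed)
    # Step 1: the true ratings depend only on the requested range.
    true = [base + rng.uniform(0, true_range) for _ in range(num_players)]
    # Step 2: measured ratings are the true ratings plus noise.
    measured = [t + rng.uniform(-inaccuracy, inaccuracy) for t in true]
    return true, measured


true_50, measured_50 = generate_ratings(16, 500, 50, seed=1)
true_200, measured_200 = generate_ratings(16, 500, 200, seed=1)
```

With this ordering the spread of the true ratings stays within the requested range no matter how large the inaccuracy is, which is why a format that ignores ratings should no longer appear to improve as the inaccuracy grows.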
|
|
|
|
|
99of9
Forum Guru
Gnobby's creator (player #314)
Gender:
Posts: 1413
|
|
Re: World Championship tournament format
« Reply #77 on: Jun 18th, 2005, 11:30pm » |
|
Oh good - I think this is definitely a better way to do it.
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: World Championship tournament format
« Reply #78 on: Jun 19th, 2005, 5:06pm » |
|
Here are the results of running the simulations using the new run program (run2). Each format was run for 2000 trials and the results averaged. The number of players was fixed at 16, the true rating range was set to 500 points, and the draw ratio was set to 9999. In one set of simulations the rating inaccuracies were set to 50 and in another set they were set to 200.

With inaccuracies set to 50:

run2 'formats/doubleElimFold' 2000 16 500 50 9999
  1 34.6%  2 23.4%  3 16.3%
run2 'formats/roundRobin' 2000 16 500 50 9999
  1 32.2%  2 22.7%  3 16.1%
run2 'formats/roundRobinDouble' 2000 16 500 50 9999
  1 33.4%  2 23.8%  3 16.4%
run2 'formats/roundRobinRated 10' 2000 16 500 50 9999
  1 64.2%  2 23.6%  3 8.9%
run2 'formats/singleElimFold' 2000 16 500 50 9999
  1 32.3%  2 21.7%  3 15.8%
run2 'formats/singleElimOrd' 2000 16 500 50 9999
  1 20.2%  2 14.4%  3 11.8%
run2 'formats/singleElimRand' 2000 16 500 50 9999
  1 25.6%  2 18.4%  3 15.8%
run2 'formats/singleElimSlide' 2000 16 500 50 9999
  1 24.7%  2 19.9%  3 15.6%
run2 'formats/swissKnife' 2000 16 500 50 9999
  1 65.3%  2 23.9%  3 7.7%
run2 'formats/swissOmatic 10' 2000 16 500 50 9999
  1 65.1%  2 23.6%  3 8.0%
run2 'formats/swissSaw 10' 2000 16 500 50 9999
  1 65.0%  2 22.9%  3 8.3%
run2 'formats/upa16' 2000 16 500 50 9999
  1 27.2%  2 22.1%  3 15.2%

With inaccuracies set to 200:

run2 'formats/doubleElimFold' 2000 16 500 200 9999
  1 31.9%  2 22.4%  3 17.3%
run2 'formats/roundRobin' 2000 16 500 200 9999
  1 30.0%  2 23.3%  3 17.4%
run2 'formats/roundRobinDouble' 2000 16 500 200 9999
  1 32.4%  2 22.7%  3 15.6%
run2 'formats/roundRobinRated 40' 2000 16 500 200 9999
  1 40.6%  2 24.8%  3 16.0%
run2 'formats/singleElimFold' 2000 16 500 200 9999
  1 28.6%  2 21.6%  3 16.1%
run2 'formats/singleElimOrd' 2000 16 500 200 9999
  1 23.4%  2 15.9%  3 13.1%
run2 'formats/singleElimRand' 2000 16 500 200 9999
  1 25.1%  2 19.4%  3 14.7%
run2 'formats/singleElimSlide' 2000 16 500 200 9999
  1 25.9%  2 18.9%  3 15.0%
run2 'formats/swissKnife' 2000 16 500 200 9999
  1 35.4%  2 24.1%  3 16.9%
run2 'formats/swissOmatic 40' 2000 16 500 200 9999
  1 42.5%  2 25.4%  3 15.6%
run2 'formats/swissSaw 40' 2000 16 500 200 9999
  1 42.0%  2 25.3%  3 15.6%
run2 'formats/upa16' 2000 16 500 200 9999
  1 26.9%  2 20.8%  3 15.7%
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: World Championship tournament format
« Reply #79 on: Jun 19th, 2005, 5:34pm » |
|
The doubleElimFold format was contributed by Jeff Bacher. The upa16 format is my implementation of the format Tarr mentioned earlier.

The first thing I noticed is that the formats that don't make any use of the ratings (or use them only for tie breaks) have about the same performance when the inaccuracy range is increased from 50 to 200. These formats include: singleElimRand, roundRobin, roundRobinDouble, and upa16.

The other thing I noticed is that formats which make use of ratings usually outperform the formats that don't, even when the rating inaccuracies are higher.

The doubleElimFold seems to perform slightly better than the singleElimFold, by about 2 or 3 percentage points. But it will run for about 11 or 12 weeks compared to about 6 for singleElimFold.
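When reading gaps of 2 or 3 points it helps to know how much noise 2000 trials leave in each percentage. The sketch below uses the usual binomial standard error, sqrt(p(1-p)/n); it assumes the trials are independent and that the two formats were simulated independently of each other, neither of which is confirmed here. The function names are mine.

```python
import math


def standard_error(p, trials):
    """Standard error of a proportion p estimated from independent trials."""
    return math.sqrt(p * (1.0 - p) / trials)


def difference_in_standard_errors(p1, p2, trials):
    """How many combined standard errors separate two estimated proportions."""
    combined = math.sqrt(standard_error(p1, trials) ** 2 +
                         standard_error(p2, trials) ** 2)
    return (p1 - p2) / combined


# First-place figures at inaccuracy 50: doubleElimFold 34.6%, singleElimFold 32.3%.
se = standard_error(0.346, 2000)
gap = difference_in_standard_errors(0.346, 0.323, 2000)
print(f"standard error: {se:.4f}, gap in standard errors: {gap:.2f}")
```

Under those assumptions each percentage carries roughly one point of sampling noise, so a gap of 2 or 3 points between two formats is suggestive rather than conclusive; more trials would be needed to separate formats that are this close.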
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: World Championship tournament format
« Reply #80 on: Jun 19th, 2005, 5:36pm » |
|
If you want to try out any format to see how it works, just run the 'run2' program with the format and 'show' as the number of trials. For example:

run2 formats/singleElimRand show
|
|
|
|
|
Tarr
Forum Newbie
Arimaa player #1239
Gender:
Posts: 5
|
|
Re: World Championship tournament format
« Reply #81 on: Jun 23rd, 2005, 10:37am » |
|
Wow, lots to comment on here. I'll start with a comment on just the simulation results.

First, once again, it's important to keep in mind the total number of games when comparing two formats. I can easily add robustness to a format simply by adding more games strategically. So when comparing the simulation results, keep in mind that the "swissOmatic", "swissSaw" and "round robin" formats take 15-16 rounds of play for a 16 player tournament. It would be pretty shocking if a format like the upa one (7 rounds of play) could produce equally robust results. (It's also worth noting that unless there are special provisions which I am not aware of, both the swiss methods will produce a ton of rematches, which hardly seems optimal.)

Comparing the upa format to double elimination is a bit more fair, as the double elimination format is only 9 rounds. Still, more rounds to work with = more robust results.

The interesting results to me are:

- A comparison of the single elimination methods shows that the "fold" is more robust than the "slide", or than random pairings. This is not surprising at all, but is nice to confirm.

- The UPA format does not do any better than straight single elimination. This shows (to me) that the reshuffling of the lower seeds doesn't really help us much in avoiding later upsets.

This clearly demonstrates that if we want robustness in a format, we need extra games at the _end_ of the format to make sure we sort the top players correctly. An obvious simple approach would be to have the four semifinalists play a double elimination for the top three spots.

I think a good approach would be to first decide how many games you are willing to give each player at maximum, and how many rounds the tournament may last at maximum. Once I know this, I can suggest a format that I think would work well for you.
|
|
|
|
|
Tarr
Forum Newbie
Arimaa player #1239
Gender:
Posts: 5
|
|
Re: World Championship tournament format
« Reply #82 on: Jun 23rd, 2005, 10:42am » |
|
One more quick comment: While I suggested the "16 teams, 1 advances" format, we actually have a whole manual of formats for a variety of permutations. So I can easily draw on that to suggest something for, say, 18 players, three "advance". I say three because I am now aware that you care about not just who finishes first, but also second and third. So the one team advances format is probably not the best, since it is primarily concerned with crowning the champion. But as I said above, there's no point in me suggesting specifics until I have a better sense of the maximum number of games allowed.
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: World Championship tournament format
« Reply #83 on: Jun 26th, 2005, 9:31am » |
|
In the 2004 WC we had 18 players and in the 2005 WC we had 10 players. The number of players can vary quite a bit.

Since none of us are professional Arimaa players and we all have other commitments, we limit the rate of the games to just one game per player per week. We also try to avoid simultaneous games so that everyone has a chance to watch the games of other players.

There is also the constraint that we want the tournament to finish in about two months due to other events coming up. It could be a little longer than 8 rounds, but probably not more than 12.

Although second and third place are recognized in our WC, getting first place correct is, I think, the most important factor for a WC type tournament.

These experiments have convinced me that incorporating a rating system into the tournament significantly improves its performance over a similar version that does not. For example, roundRobinRated is significantly better than roundRobinDouble even though the double version has twice as many rounds and twice as many games.
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: World Championship tournament format
« Reply #84 on: Jun 27th, 2005, 6:13pm » |
|
I ran the simulations on some more formats. Here are the results; I'm also including results from the previous simulations for comparison and organizing them in a table so that it's easier to see:

format                            | inacc=50 | inacc=200 | inacc=400
                                  |          |           |
randomSelection (0)               | 6.3%     | 6.3%      | 6.3%
                                  |          |           |
roundRobin (15)                   | 32.2%    | 30.0%     | 31.7%
roundRobinDouble (30)             | 40.8%    | 39.9%     | 40.8%
roundRobinRated inacc/5 (15)      | 64.2%    | 40.6%     | ? %
roundRobinRatedEqual inacc/5 (15) | 35.3%    | 35.5%     | ? %
roundRobinRatedRank inacc/5 (15)  | 46.0%    | 38.5%     | ? %
                                  |          |           |
singleElimRand (4)                | 25.6%    | 25.1%     | ? %
singleElimOrd (4)                 | 20.2%    | 23.4%     | ? %
singleElimSlide (4)               | 24.7%    | 25.9%     | ? %
singleElimFold (4)                | 26.8%    | 28.6%     | 24.4%
                                  |          |           |
swissKnife (0)                    | 65.3%    | 35.4%     | 6.0%
swissSaw inacc/5 (16)             | 65.0%    | 42.0%     | ? %
swissOmatic inacc/5 (16)          | 65.1%    | 42.5%     | 31.1%
swissOmaticEqual inacc/5 (16)     | 38.8%    | 38.2%     | ? %
swissOmaticRank inacc/5 (16)      | 43.1%    | 40.8%     | ? %
                                  |          |           |
doubleElimFold (11.6)             | 34.6%    | 31.9%     | ? %
upa16 (7)                         | 27.2%    | 26.9%     | ? %
                                  |          |           |
floatDoubleElim (7.7)             | 30.9%    | 31.6%     | 29.4%
floatTripElim (10.3)              | 35.3%    | 34.7%     | 33.5%
floatTripElim2 (11.4)             | 35.7%    | 34.9%     | 34.4%
floatTripElimRand (11.0)          | 35.2%    | 33.9%     | 33.5%
floatQuadElim (13.4)              | 35.1%    | 36.3%     | 35.1%

The simulations were run as follows:

run2 'formats/roundRobin' 2000 16 500 50 9999

Each format was run for 2000 trials, with 16 players, a true rating range of 500, a measured rating inaccuracy of 50, 200, or 400, and a draw ratio of 1:9999.

The number in parentheses after the format name is the average number of rounds the format requires.
|
« Last Edit: Aug 10th, 2005, 12:18pm by omar » |
|
|
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: World Championship tournament format
« Reply #85 on: Jun 27th, 2005, 6:58pm » |
|
The new formats are: roundRobinRatedEqual, roundRobinRatedRank, swissOmaticEqual and swissOmaticRank.

After thinking about the points Karl raised regarding the use of players' initial measured ratings in formats such as swissSaw, swissOmatic and roundRobinRated, I wanted to see what would happen if they did not use the players' initial ratings, but still used a rating system to determine the winner. So roundRobinRatedEqual and swissOmaticEqual are the same as their original corresponding formats except that they set the initial ratings of all the players to the same value. These formats continued to perform better than formats that did not use rating systems. They are not nearly as good as the original formats, but they eliminate the problems that can arise from relying heavily on the players' initial ratings (see Karl's posting of Jun 13th). For example, swissOmaticEqual performed better than roundRobin, roundRobinDouble, singleElimFold, doubleElimFold and upa16. Both swissOmaticEqual and roundRobinRatedEqual are 100% fair formats, in that they do not favor any player over another.

The next thing that I wanted to try was to rank the players based on their initial ratings and use some rank based ratings as the initial ratings for the rating system. The lowest rated player was given a ranked rating of 2000, the next higher rated player was given a ranked rating of 2000+delta, the next higher rated received a ranked rating of 2000+2*delta, and so on. If two players had the same ratings their ranked ratings would also be the same. The value used for delta was the rating inaccuracy divided by 15. I tried various values for delta and found this produced a performance that was about midway between the original format and the Equal variant. These formats are called roundRobinRatedRank and swissOmaticRank.

These formats are a bit unfair in that they do favor higher rated players. However, all players will still have some chance of winning if they have a good performance at the event, and a player will not be able to guarantee themselves a win by getting their rating way above the competition. So they don't suffer as much from this problem as the original formats, but still have a performance that is significantly better than other formats.
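The rank based initial ratings described above can be sketched in a few lines of Python. This is an illustration, not the code used in the simulations. The function name is hypothetical, and how ties affect the following player is an assumption: here tied players share a step and the next distinct rating moves up by exactly one delta.

```python
def ranked_ratings(measured, delta, base=2000.0):
    """Replace measured ratings with evenly spaced rank based ratings.

    The lowest measured rating maps to `base`, the next distinct rating to
    `base + delta`, and so on. Players with equal measured ratings receive
    equal ranked ratings (assumption: ties do not skip a step).
    """
    distinct = sorted(set(measured))
    step_of = {rating: step for step, rating in enumerate(distinct)}
    return [base + step_of[rating] * delta for rating in measured]


# delta = rating inaccuracy / 15, for example 150 / 15 = 10.
example = ranked_ratings([1500, 1700, 1700, 2100], delta=10)
```

Because only the order of the measured ratings survives, pushing a rating far above the field before the event buys no more than a single step of delta over the next player.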
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Gender:
Posts: 682
|
|
Re: World Championship tournament format
« Reply #86 on: Jun 27th, 2005, 7:34pm » |
|
I ran one "show" version of the swissOmaticEqual variant to have a look at how the pairings went. It is an interesting format that shows a lot of promise.

Here is the first part of the ts file for one run:

# Round 1
*%
* Ratings as of round 1
player p1 2000
player p2 2000
player p3 2000
player p4 2000
player p5 2000
player p6 2000
player p7 2000
player p8 2000
player p9 2000
player p10 2000
player p11 2000
player p12 2000
player p13 2000
player p14 2000
player p15 2000
player p16 2000
*
pair p9 p1 winner p9
pair p10 p2 winner p10
pair p3 p11 winner p11
pair p12 p4 winner p12
pair p13 p5 winner p13
pair p14 p6 winner p14 p6
pair p15 p7 winner p15
pair p8 p16 winner p8
* next round 2
# Round 2
pair p1 p16 winner p16
pair p2 p15 winner p15
pair p3 p14 winner p3
pair p4 p13 winner p4
pair p12 p5 winner p5
pair p6 p11 winner p11
pair p7 p10 winner p10
pair p8 p9 winner p8
* next round 3

Two questions:
1) Which two players get dropped?
2) And why?

Answers:
1) p1 & p7 got dropped.
2) p7 and p2 have IDENTICAL records. They both lost to p15 and p10. What are the criteria to decide in this case?

I honestly only ran the simulation once, and this popped up!
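A quick tally of the pairings quoted above shows how many players are tied at the bottom. The sketch assumes each `pair A B winner W` record is one decided game; the regular expression and helper name are mine, not part of the simulator.

```python
import re
from collections import Counter

# Pairing records copied from the post (including the stray trailing "p6").
LOG = """
pair p9 p1 winner p9
pair p10 p2 winner p10
pair p3 p11 winner p11
pair p12 p4 winner p12
pair p13 p5 winner p13
pair p14 p6 winner p14 p6
pair p15 p7 winner p15
pair p8 p16 winner p8
pair p1 p16 winner p16
pair p2 p15 winner p15
pair p3 p14 winner p3
pair p4 p13 winner p4
pair p12 p5 winner p5
pair p6 p11 winner p11
pair p7 p10 winner p10
pair p8 p9 winner p8
"""


def loss_counts(log):
    """Count losses per player from 'pair A B winner W' records."""
    losses = Counter()
    for first, second, winner in re.findall(r"pair (\S+) (\S+) winner (\S+)", log):
        loser = second if winner == first else first
        losses[loser] += 1
    return losses


two_losses = sorted(p for p, n in loss_counts(LOG).items() if n == 2)
print(two_losses)
```

By this count four players (p1, p2, p6 and p7) have lost both games, yet only two are dropped, so some tie-break beyond the win/loss record must be deciding it.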
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: World Championship tournament format
« Reply #87 on: Jun 28th, 2005, 7:33am » |
|
I changed the swissOmaticEqual format to show the list of players with their ratings before the last two are dropped. When players have the same ratings, the order within this range of players is the same as the order in which they were in the last listing of ratings or, if that is not available, the order in which they were first given. It's a bit of bad luck for the last two players on the list, because you can have a situation like this:

last 2 will be dropped
p2 2020
p5 2019
p9 2019
p4 2019
p1 2001
p14 2001
p11 2001
p10 2000
p15 2000
p3 1999
p6 1999
p16 1999
p8 1981
p12 1981
p13 1981
p7 1980

But the players that get dropped have lost two consecutive games.
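The tie rule described here behaves like a stable sort: order by rating, and leave tied players in whatever order they had in the previous listing. A minimal Python sketch follows; the function name and the four player example are hypothetical, and this is not the simulator's own code.

```python
def drop_lowest(previous_order, ratings, count=2):
    """Return (kept, dropped) after sorting by rating, highest first.

    Python's sorted() is stable, so players with equal ratings keep the
    relative order they had in previous_order.
    """
    ordered = sorted(previous_order, key=lambda player: -ratings[player])
    return ordered[:-count], ordered[-count:]


# Hypothetical example: three players are tied on 1981.
previous_order = ["a", "b", "c", "d"]
ratings = {"a": 1981, "b": 2019, "c": 1981, "d": 1981}
kept, dropped = drop_lowest(previous_order, ratings)
```

In the example "a" survives while "c" and "d" are dropped despite equal ratings, purely because "a" was listed earlier. That is the bit of bad luck mentioned above.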
|
|
|
|
|
Tarr
Forum Newbie
Arimaa player #1239
Gender:
Posts: 5
|
|
Re: World Championship tournament format
« Reply #88 on: Jul 5th, 2005, 6:56pm » |
|
on Jun 26th, 2005, 9:31am, omar wrote:
    In the 2004 WC we had 18 players and in the 2005 WC we had 10 players. The number of players can vary quite a bit.

Well in that case, it's useful to have an adaptable format. More on this later.

on Jun 26th, 2005, 9:31am, omar wrote:
    There is also the constraint that we want the tournament to finish in about two months due to other events coming up. It could be a little longer than 8 rounds, but probably not more than 12.

Well, then... doesn't that rule out most of the formats you're running simulations on? The swiss and round robin formats you're looking at all take more than 12 rounds, as long as there are more than 13 players. At the risk of sounding like a broken record - if you're comparing the accuracy of formats with drastically different numbers of games, you're comparing apples and oranges.

on Jun 26th, 2005, 9:31am, omar wrote:
    Although second and third place is recognized in our WC, getting first place correct I think is the most important factor for a WC type tournament.

Understood.

on Jun 26th, 2005, 9:31am, omar wrote:
    [...]experiments have convinced me that incorporating a rating system into the tournament significantly improves its performance over a similar version that does not. For example roundRobinRated is significantly better than roundRobinDouble even though double has twice as many rounds and twice as many games.

Agreed, using rankings amounts to using more information, which is generally a good thing.

Let me take another crack at this, now that I have a little better understanding of the constraints. There will be some obvious similarities to "upa16" but I've modded it up a bit. The following format can work for any number of players 12 or higher, although it works best with 16 or more.

Stage 1: group play (2-5 games)

Players are seeded into four groups based on their rankings. I've listed player rankings up to 24; it should be obvious how to rank players beyond that. If you have fewer players, the groups are smaller.

Group A: 1, 8, 11, 14, 20, 21
Group B: 2, 7, 12, 13, 19, 22
Group C: 3, 6, 9, 16, 18, 23
Group D: 4, 5, 10, 15, 17, 24

After group play, players are ranked within each group. All 4th, 5th, and 6th place players are eliminated.

Stage 2: crossover games (1 game)

The following crossover games are then played. The first two are for seeding purposes, while the others are elimination games:

1) A1 vs. C1
2) B1 vs. D1
3) A2 vs. C3
4) B2 vs. D3
5) C2 vs. A3
6) D2 vs. B3

The players are then ranked as follows:

1) The winner of A1 vs. C1
2) The winner of B1 vs. D1
3) The loser of A1 vs. C1
4) The loser of B1 vs. D1
5) The higher ranked of (B2 vs. D3 winner) and (D2 vs. B3 winner)
6) The higher ranked of (C2 vs. A3 winner) and (A2 vs. C3 winner)
7) The lower ranked of (B2 vs. D3 winner) and (D2 vs. B3 winner)
8) The lower ranked of (C2 vs. A3 winner) and (A2 vs. C3 winner)

Stage 3: modified elimination play (8 games)

These 8 players then play a double elimination bracket. All games are 2-game series, where the higher ranked player gets draw odds. EXCEPTION: the finals is a 4-game series.
So (winner takes the high seeds), the tournament proceeds as follows:

Round 1 (all games are 2-game series):
1v8
4v5
3v6
2v7

Round 2 (all games are 2-game series):
1v4
2v3
5v8 (loser of series is eliminated)
6v7 (loser of series is eliminated)

Round 3:
1v2 (4-game series - winner is 1st, loser is 2nd)
3v5 (2-game series - loser is eliminated)
4v6 (2-game series - loser is eliminated)

Round 4:
3v4 (2-game series - winner is 3rd place)

With 16 players, the 1st, 2nd, 3rd, and 4th place players play 12 games total. The total is more or less depending on the number of initial players, which changes the size of the groups.

I'd be interested to see how that compares to other formats which use a similar number of games.
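For anyone who wants to simulate this, the stage one seeding can be written down directly from the group lists in the post. The Python sketch below only covers 12 to 24 players, since the seeding beyond rank 24 is not spelled out, and the function name is hypothetical.

```python
# Seeding table copied from the group lists in the post (ranks 1 to 24).
SEEDING = {
    "A": [1, 8, 11, 14, 20, 21],
    "B": [2, 7, 12, 13, 19, 22],
    "C": [3, 6, 9, 16, 18, 23],
    "D": [4, 5, 10, 15, 17, 24],
}


def assign_groups(num_players):
    """Return the ranks placed in each group for a field of num_players."""
    if not 12 <= num_players <= 24:
        # Ranks above 24 are not listed in the post, so refuse rather than guess.
        raise ValueError("seeding is only listed for 12 to 24 players")
    return {group: [rank for rank in ranks if rank <= num_players]
            for group, ranks in SEEDING.items()}


groups_16 = assign_groups(16)
groups_18 = assign_groups(18)
```

With 16 players every group has four players, so each player gets three group games. With 18 players groups C and D grow to five players while A and B stay at four, which is where the "2-5 games" range for stage one comes from.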
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Gender:
Posts: 1003
|
|
Re: World Championship tournament format
« Reply #89 on: Jul 5th, 2005, 10:11pm » |
|
on Jul 5th, 2005, 6:56pm, Tarr wrote:
    Well, then... doesn't that rule out most of the formats you're running simulations on? The swiss and round robin formats you're looking at all take more than 12 rounds, as long as there are more than 13 players.

Yes, I ran the simulations on these longer formats just for comparison.

Quote:
    Agreed, using rankings amounts to using more information, which is generally a good thing.

Not rankings, but ratings. Actually roundRobinRatedEqual and swissOmaticEqual are not using any more information; they are just using a different way of keeping score. For example, the winner of a round robin is typically defined to be the player who wins the most games. The method of scoring is basically 1 point for a win, 0 for a loss and 1/2 point for a draw. We could easily change the scoring method to say that if you win against a player you not only get the 1 point but also 1/5 of the points they have accumulated. If we change the method of scoring to something like that, we have not added any external information that we did not already have. Adding a rating system and keeping the initial ratings of all players the same basically amounts to changing the way we keep score. We are still using the same win/loss results to compute the score.

We can also add additional external information by making use of the ratings the players already have. Such formats perform extremely well. roundRobinRated does this. Doing this significantly improves the performance of a tournament in picking the true best player. But it runs the risk of players manipulating their ratings prior to the start of the tournament. roundRobinRatedRank tries to minimize this by resetting the initial ratings so that they are equally spaced based on the rank from the initial ratings.
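Here is a small Python sketch of the alternative scoring rule mentioned above (1 point for a win plus 1/5 of the loser's points). It is only an illustration of the idea that the score is still computed from win/loss results alone. The function name is made up, draws are left out, and I assume the bonus uses the loser's points at the time of the game and that the loser keeps those points.

```python
def score_games(players, games, share=0.2):
    """Score a list of (winner, loser) results played in order.

    A win earns 1 point plus `share` of the points the loser has
    accumulated so far. No external ratings are used.
    """
    scores = {player: 0.0 for player in players}
    for winner, loser in games:
        scores[winner] += 1.0 + share * scores[loser]
    return scores


# Three players each win one game, but the wins are not worth the same.
scores = score_games(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
```

In this example "c" finishes ahead of "a" and "b" because its win came against an opponent who already had a point, even though all three have one win each.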
|
|
|
|
|
|