Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Events >> World Championship tournament format
(Message started by: Fritzlein on Apr 29th, 2005, 12:01pm)

Title: World Championship tournament format
Post by Fritzlein on Apr 29th, 2005, 12:01pm
Yes, I know it will be eight months before the next human World Championship tournament, but it can't hurt to get a discussion started early, and I have the time right now.

I think the format of the 2005 World Championship left something to be desired.  Single-elimination makes the championship too short, and too prone to fluke upsets.  I only had to win four games and then I was champ, which not only short-changes the spectators, but also raises the question of whether I just got lucky over a short run of games.  We see these problems also in the chess world, where the recent FIDE World Championship knockout events haven't picked the best player as accurately as longer qualifying matches used to do.   You can call Khalifman the World Champion on the basis of one big tournament if you like, but he wasn't even in the top ten in the world, much less number one.

On the other hand, elimination matches seem to be more fair than a round robin, and less prone to collusion.  For example, if I see that I am already not going to win a prize, I could intentionally lose my game against someone I liked, while trying even harder against the people I don't like.  A group of strong players working together has a great chance of making sure one of their own group wins overall.  Bobby Fischer complained about the Soviets colluding in round-robin chess tournaments, and he had a point, which is why FIDE changed the format to have eliminations instead.

Now one way to extend the tournament while preserving the elimination format would be to have each round be the best two out of three, and the final round be best three out of five.  That would already be an improvement over simple single elimination.  But there is still a bit of room for improvement because of two minor flaws:  At one game per week a tournament with as few as 9 players would take 14 weeks to complete.  Also the seeding remains too important in that if you get paired with the #1 in the first round, that's probably the only person you get to play ever.

A different idea would be to have a standard double-elimination where one loss sends you to the loser's bracket and you have to win your way back up.  Again, I like this idea better than single elimination, and it mostly fixes the flaw of getting knocked out by two losses to the same person.  If the #1 seed beats you in the first round, at least to get to play someone else the next round where you have a better chance.  Also it is shorter that the mini-match format, lasting only 9 weeks.

On the other hand, the traditional structure of the double-elimination leaves room for improvement because it is so hard on the losers.  In a 16-player double-elimination bracket, a player who wins the first round can win the tournament with 4 straight wins, whereas someone who loses the first round needs 8 straight wins to win the tournament.  That's not fair.  To win the whole tournament, it should take a first round loser exactly one more win than a first round winner, not twice as many.

My attempt at solving all these difficulties is what I call "floating triple elimination".  Maybe it isn't original to me, but I've never seen it laid out elsewhere, and I'm curious what people think about it.  The basic idea is that you get to keep playing until you lose three times, but there is no fixed bracket.  On every round you must play someone you haven't yet played, until there is no alternative but to repeat matchups.  This means that except for the top few players, everyone who gets eliminated will have lost to three different players, rather than to one player multiple times.  Also, since everyone plays every round, everyone needs the same number of total wins to become champion, no matter whether they win or lose in the first round.  Finally, a full sixteen-player bracket would complete in 12 weeks, which is fewer rounds than the mini-match format even though in that format one can be knocked out in two game but in my proposed format everyone gets to play a minimum of three games.

Title: Re: World Championship tournament format
Post by Fritzlein on Apr 29th, 2005, 12:01pm

As usual with my proposals there is an increase in complexity in exchange for everything else being ideal. :-)  One problem is handling byes, because not only might there be an odd number of players to start out, but the way players are eliminated might create an odd number of players in a later round.  Furthermore, when the time comes that it is inevitable to repeat a matchup, there needs to be a strict algorithm for which repeat matchups are assigned in order to keep things fair.  I expect the pairings would be complicated  enough that they would have to be done by machine algorithm rather than by hand.  Still, for a programmer it should be easy enough to be worth doing if the goals of formatting the tournament are worth striving for.  Here is my proposed algorithm:

Rule 1: When an even number of players remain, everyone plays the round.  When an odd number remain, one player gets a bye.  No player may get a second bye unless every remaining player has already recieved one bye.  Similarly no player can get an (n+1)st bye unless all remaining players have n byes.

Rule 2: Subject to Rule 1, the pairings for each round must minimize the number of repeat matchups.  A pairing which has a matchup for the (n+1)st time must be rejected in favor of any in which all matchups are occuring for the nth time or fewer.

These are absolute rules which are enough to ensure that the goals of the tournament are met, but if there are multiple pairings which meet the requirements, one might prefer one pairing to another based on things like:

* Minimizing the number of second-time matchups is desirable among all pairings which don't have any third-time matchups
* Matching players with the same record is desirable, i.e. it's better to have two 2-0 players matched and two 1-1 players matched than to have two matchups of 2-0 vs. 1-1.
* Giving a bye to the highest-rated player eligible for the bye is desirable.
* Perhaps as a tiebreaker between two pairings equal on all these counts, pick the one which minimizes the sum of the squares of the rating differences

Anyway, that's enough to give a flavor of what I'm proposing.  I'm very open to comments, corrections, criticism, etc., and I would especially appreciate a reference if someone has heard of a similar proposal before.

Title: Re: World Championship tournament format
Post by Fritzlein on Apr 30th, 2005, 6:13pm
It's a good point that if we have a long time, there are many good tournament formats.  My "floating triple elimination" is an attempt to keep it fun, minimize fluke champions, and still keep the time frame somewhat compressed.  I'm curious what Omar thinks would be the maximum acceptable length of the World Championship tournament.  I was actually worried that he might think the worst case of 12 rounds for a 16-player tournament would already make the tournament too long.

You are absolutely right that new people can join and rapidly climb the ladder.  For example, at the start of the Postal Championship, Robinson was ranked about #15 among humans, and now  he's #3 and rising, while the tournament is still ongoing!  Also the 2005 World Championship started only a bit more than three months after I played my first ever game of Arimaa on the server.

But that said, I don't think the changing landsacpe is the biggest reason to keep the tournament short.  The main pressure is probably the time commitment required by the participants.  We would like lots of participants, but what if several top players opted out because they couldn't commit to one game per week for 12 consecutive weeks?  (Although it would be shorter than 12 weeks if fewer than 16 players signed up.)

So the time commitment has to be balanced against making the result meaningful.  Single elimination is the shortest possible format, but very prone to upsets.  A series of qualifying pools is not at all prone to upsets and very likely to pick the strongest player as champion, but would be very slow.  I tried to aim for some middle ground with my proposal, but everything depends on what shape Omar would like the tournament to have.  If time is no obstacle, then we should try something slower and simpler.  On the other hand, if 12 weeks is already too long, there's no point even discussing the details of how my idea would work.  :-/

But with that said, I know Omar likes to hear everyone's opinion before he lays down the law. :-)

Title: Re: World Championship tournament format
Post by Robert on May 3rd, 2005, 6:46am
just a bit in the defense of double elimination. i went through it quickly for 16 players and if you win every game it takes 5 wins, and if you lose your first game you have to win the next  7 games to win the match... so you can either win with a record of 5-0 or 7-1.  
with your proposed system, isn't there a chance of the amount of rounds varying quite a lot, and would thus be kind of difficult to judge when the final would be, or what time limit to put on the tournement? and you might get a final where one player has to win three times to win, and another only has to win once?

Title: Re: World Championship tournament format
Post by Fritzlein on May 3rd, 2005, 9:13am
I'm pretty sure that in a sixteen-player double-elimination, if you lose the first round, you need to win the next eight in a row to win the tournament.  Did you remember that the winner of the loser's bracket has to beat the winner of the winner's bracket twice in a row at the end?

As for the number of rounds varying in floating triple elimination, there is a little bit of uncertainty depending on on how the byes work out, but I think only +/- one round.  A greater uncertainty comes from whether or not the eventual winner ever loses.  If one player never loses, I believe that a 16-player triple-elimination will take at most ten rounds (contrasted with eight rounds for traditional double-elimination), but if everyone gets up to their maximum two losses it can take an extra two rounds (i.e. twelve for triple elimination versus nine for the traditional double-elimination.)

You are quite right, with triple-elimination you could get to a final situation where one player player has to win three times to win the match and the other only has to win once.  But that happens all the time in sports, for example the NBA playoffs going on now.  In a best-of-five playoff, if you lose twice you have to win the next three whereas your opponent only has to win once.  And in a double-elimination tournament, the final *always* is a situation where one player has to win twice in a row and the other player only once.  This is a consequence of each player being allotted a fixed number of losses.

In summary I think the variable number of rounds in floating triple-elimination is a minor disadvantage, but I contend it is not as big a flaw as traditional double-elimination making a first-round loser win twice as many more games to win the tournament as a first-round winner, i.e. 1-0 needs four more wins and 0-1 needs eight more wins.  But now that I think about it, this problem also could be remedied by doing *floating* double-elimination, i.e. by having eveyone play every round as I proposed for triple elimination, in place of having more rounds in the loser's bracket than in the winner's bracket.  This would solve the fairness issue and also keep it as short as possible.

Title: Re: World Championship tournament format
Post by 99of9 on May 3rd, 2005, 6:13pm

Quote:
i went through it quickly for 16 players and if you win every game it takes 5 wins, and if you lose your first game you have to win the next  7 games to win the match... so you can either win with a record of 5-0 or 7-1.



on 05/03/05 at 09:13:32, Fritzlein wrote:
I'm pretty sure that in a sixteen-player double-elimination, if you lose the first round, you need to win the next eight in a row to win the tournament.  Did you remember that the winner of the loser's bracket has to beat the winner of the winner's bracket twice in a row at the end?


I just tried it and although I've never been in one of these style tournaments, I agree with Robert.  I personally think that 7-1 is about equivalent to 5-0 so this is a good system.  Fritz can you point out what's wrong with these calculations?

R is the round number
N is the number of players in the normal round
E is the number of players in the elimination half of the round
O is the number of players who are totally out


R    N    E    O
1   16    0    0
2    8    8    0
3    4    8    4
4    2    6    8
5    1    4   11
6    1    2   13
7    1----1   14
8*   1----1   14


A player who won all his/her games would play rounds 1, 2, 3, 4, and 7.  In this case round 8 would not be played.  Their final score would be 5-0.

A player who lost his/her first game and subsequently won all would play all 8 rounds, with one lost giving 7-1.

Not that this makes your triple-floating system bad of course.  In part it all depends how many games you're willing to run to reduce the luck element, and how complicated you're willing to make the algorithm :-).

My main concern for this year is to not repeat the situation where the top ranked player plays all games as silver, and also where the top ranked player has to play a higher ranked player than the second ranked player in every round until they meet (or are knocked out).

Title: Re: World Championship tournament format
Post by Fritzlein on May 3rd, 2005, 7:05pm

on 05/03/05 at 18:13:06, 99of9 wrote:
I just tried it and although I've never been in one of these style tournaments, I agree with Robert.  I personally think that 7-1 is about equivalent to 5-0 so this is a good system.  Fritz can you point out what's wrong with these calculations?


Sure thing.  The way double-elimination traditionally has worked is that the losers' bracket plays more rounds.

For 16 players, after one round there are 8 winners and 8 losers.  Next the losers play a round knocking 4 out and leaving 4, and the winners play a round, leaving 4 undefeated and creating 4 new losers.  Before the winners play again, the 4 old losers play the 4 new losers, knocking out four and leaving only 4 losers.

So, apart from the first round, the losers have played two more rounds, and the winners have played one more round, leaving 4 winners and losers each.  The process then repeats with the losers playing a round, the winners playing a round, and the old losers playing a round against the new losers.   At this point, apart from the first round, the losers have played four more rounds, and the winners have played two more rounds, leaving 2 winners and losers each.

etc. etc.  No matter what the size of the tournament, the losers play twice as many rounds as the winners.  Let me try to do the bracket in ASCII

[pre]
                -----
            ----|   |----
            |   -----   |
        ----|           |----
        |   |   -----   |   |
        |   ----|   |----   |----
        |       -----       |   |
    ----|                ----   |
    |   |       -----           |
    |   |   ----|   |----       |----
    |   |   |   -----   |       |   |
    |   ----|           |----   |   |
    |       |   -----   |   |   |   |
    |       ----|   |----   |----   |----
    |           -----       |       |   |
----|                    ----       |   |
    |           -----               |   |
    |       ----|   |----        ----   |
    |       |   -----   |               |
    |   ----|           |----           |
    |   |   |   -----   |   |           |
    |   |   ----|   |----   |----       |----
    |   |       -----       |   |       |   |
    ----|                ----   |       |   |
        |       -----           |       |   |
        |   ----|   |----       |----   |   |
        |   |   -----   |       |   |   |   |
        ----|           |----   |   |   |   |
            |   -----   |   |   |   |   |   |
            ----|   |----   |----   |----   |----
                -----       |       |       |
                         ----       |       |
                                    |       |
                                 ----       |
                                            |
                                            |
                                            |
                                         ----
[\pre]

LOL.  That totally didn't work, but it is so funny I'm leaving it. :-)  Try this instead: http://www.foosballheaven.com/pdf/tournament16c.pdf

Anyway, it looks like what you were doing, 99of9, is almost having everyone play every round, i.e. almost what I am proposing.

Title: Re: World Championship tournament format
Post by 99of9 on May 3rd, 2005, 11:29pm
OK thanks for the explanation.  That makes sense.

I don't think robert's/my method is quite the same as your floating (double) elimination, but I agree it's close.  In my scheme the E category (those with 1 loss) is strictly separated from the N category (those with 0 losses).  The only time they play against each other is when both pools have found their winner.  (The winner of pool N gets byes in rounds 4 and 5.)

I like your picture... at first glance I thought you'd drawn out an entire hypothetical triple-elimination to show me that it was not so complex :-).

Title: Re: World Championship tournament format
Post by omar on May 5th, 2005, 5:54pm

on 04/30/05 at 18:13:08, Fritzlein wrote:
But with that said, I know Omar likes to hear everyone's opinion before he lays down the law. :-)


And even after that Im still open for suggestions and changes :-)

Im all for a system that requires more than a single elimination to avoid flukes. But I was using this in the past mostly so that it would not require a lot of time commitment from the players. I would hate to see one of the top ranked players not be able to play just because they couldn't make the time commitment. So the total number of games the winner has to play is still a bit of a concern for me.

Also it would be nice if we could fit something into about a two month time frame. Something that fits into a 3 month time frame is still OK with me, but would perfer something shorter if possible.

Im also open to the possibility of using ratings to limit the number of entries to reduce the number of rounds. For example only the top 8 players (based on ratings) are selected for the tournament.

I like the idea proposed by Karl where a player continues playing until losing at least three times. Does anyone want to try running some simulations with it. That is given a set of players and their measured (used for seeding and pairings) and true (used in simulating the outcome of a game) ratings where the measured ratings are within 100 points of the true ratings run the tournament many times to see what the probability is of the true highest rated player winning the tournament. Maybe we can also compare this to other formats like one, tow and four eliminations.

I would love to run some simulations like this, but I just can't right now. However, if no one else does it I know I eventually will.

I also came across this article which is of interest to this topic:

http://www.chesscenter.com/twic/sonas010704.html


Title: Re: World Championship tournament format
Post by Fritzlein on May 6th, 2005, 12:35am
Omar, thanks for the link to Sonas' article.  It is interesting that top chess players don't like round-robins or even Swiss pairing for their championship matches due to the possibility of collusion.  I quite agree, and there is an added point I discovered recently in a casual ping-pong tournament.  The tournament was supposed to be round-robin, but the participants, as soon as they were eliminated from prize contention, quit and went home, forfeiting their remaining games.

The take-home message is that players shouldn't have any scheduled games after they no longer have a chance of winning.  In a serious situation there may be cheating, and in a casual situation there may be apathy.  Knockout-style formats are both more exciting and more fair.

Sonas' double-elimination proposal is interesting.  It is more fair than usual double-elimination, because the matches for the winners are twice as long as for the losers.  To use this for Arimaa, we would have to introduce a tiebreak into two-game matches, but maybe that is simple: the lower number of moves to victory.  This balances out the effort one has to put into playing, i.e. everyone now plays the same number of games, but it still has a huge defect: It doesn't balance out winning chances, because folks in the losers bracket still have to face twice as many chances of being eliminated as folks in the winner's bracket.

With 16 players and Sonas-style double elimination, the whole tournament still takes only nine weeks because the extra games are coming from making the winners bracket play more games instead of resting while the losers bracket plays extra rounds.

Sonas calls a late-round bye "ugly", and I admit that there is a certain inelegance, but I think it is not so very unfair because the number of byes is equalised overall.  In triple-elimination, most players who survive that long will have a turn at getting a bye.  Moreover, the harsh penalty for losing early that exists in traditional double-elimination and persists in Sonas' double-elimination is more unfair (uglier) in my opinion than having players take turns receiving byes.

If we go with triple elimination as I poropse and limit the field to 8 players, by my calculation the tournament will last 8-10 weeks, i.e. about the same length of time as traditional double-elimination with 16 players.  This raises the question of whether it is more desirable to eliminate fluke champions by having triple-elimination and only the top eight interested parties competing, or whether it is better to let sixteen people go at it and have only double-elimination.

My gut feeling is that it is nice to have the championship be as "open" as possible.  If we must have a short tournament, then it's better to accept a slightly higher chance of upsets and give everyone a chance to play who wants to play.

At the moment, unemployed as I am, it actually seems the least of the evils to have an "open" triple-elimination and if it takes three months, so be it.  But I can imagine that if I have a job come November, playing one tournament game every week for 12 straight weeks could become a burden.

Title: Re: World Championship tournament format
Post by 99of9 on May 6th, 2005, 9:08am
Just for the record I'm strongly against limiting the field for the WC at this early stage of arimaa's development.  I can understand this may be necessary when there are hundreds of willing participants, but until then I really think the tournament should be open to all.

As one of those who had an "unexpected" loss in my first game of this years comp, you might expect me to be one of the ones railing for triple elimination to reduce the random/noise element.  But personally I think a double elimination is sufficient warning in arimaa.  If someone is really the best arimaa player, they probably shouldn't be losing twice in the course of 5-8 games.

Title: Re: World Championship tournament format
Post by 99of9 on May 6th, 2005, 10:47am
You are right if all the opponents are the same standard.  But there will be plenty of easy games in that 5-8 games.  

It seems to me that the arimaa field is quite spread out at the moment.

For comparison note that a win rate of 2/3 would have placed over 5th in the postal tourney with 16 players.

My statement is not intended as a hard and fast mathematical theorem which covers all possible outcomes and all possible player skill distributions.  It's based on my opinion of how a tournament between current players would turn out.  In the current conditions, 2 losses in 5-8 games would not cut it.

Title: Re: World Championship tournament format
Post by jdb on May 6th, 2005, 11:39am
In my opinion, one of the things that makes a tournament (in any sport) interesting is that anyone has a chance (however remote) of winning. Some tournaments are set up this way, on purpose, in order to encourage participation. (ie Slowpitch baseball)

One of the positive aspects of the single elimination tournament, is a weaker team still has a chance to advance far into the tournament. (ie NCAA basketball)

A double elimination still retains the feature that a weaker team still has a decent chance (although not as good as a single elimination). It also allows a strong team to get upset once and still be able to win. If a team gets upset twice, then they don't really deserve to win the tournament. Another nice feature of double elimination, is that two teams rarely have to play each other twice. It can happen, but not until very near the end of the tournament.





Title: Re: World Championship tournament format
Post by omar on May 6th, 2005, 6:37pm

on 05/06/05 at 11:39:27, jdb wrote:
In my opinion, one of the things that makes a tournament (in any sport) interesting is that anyone has a chance (however remote) of winning. Some tournaments are set up this way, on purpose, in order to encourage participation. (ie Slowpitch baseball)


I guess it really comes down to a question of what we want the World championship tournament to be.

I've always felt that the winner of a world championship tournament should be the person with the highest true rating (see my previous post for more on true rating).

But after seeing Jeffs posting, I am reconsidering my position. I'll post more after I've thought about it.

Title: Re: World Championship tournament format
Post by omar on May 18th, 2005, 5:17pm
I've been thinking about this for a while and Im still a bit undecided.

On the one hand I feel the winner of the WC should be the player with the highest true rank. This is what I would expect the result of a WC tournment to produce. But it is hard to acheive, and it could take a lot of games if the true rating difference between the players is small.

The other possibility is that the winner should be one of the better players but not necessarily the best. Initially I would not have even considered this, but if we take into consideration more factors such as the time frame of the tournament, greater participation, and spectators, then it is worth considering. Such a tournament is interesting for even lower ranked players, since they have a higher chance of winning; so more players are likely to participate. It is interesting for spectators since the outcome is more unpredictable. And it can be acheived with short tournaments. Also this is what the results of most tournaments will be anyway (unless a lot of effort is put into selecting the best player).

I am so undecided on this that I think it might be better if we just have two seperate tournaments. One which is open to all and has the single-elimination format and another which is limited to just the 4 top rated players and  has a more stringent format such as a double round-robin or Karl's format. The first would be like the "Open Classic" and the second would be the "World Championship". The purpose of the first is to have a  fun tournament where anyone can participate and the winner is not necessarily the best player. The purpose of the second would be to select as best as possible the player with the true highest rank. The tournaments would not overlap so a player could play in both. Both tournaments will have prizes and will require a registration deposit (to ensure commitment) which will be refunded if no games are forfeited.


Title: Re: World Championship tournament format
Post by Fritzlein on May 18th, 2005, 8:48pm
I've thought somewhat about the conflict between "every player should have a chance" and "the truly strongest player should win".

First, I think that for the World Championship to be as legitimate as possible, it should be open to everyone.  Specifically, qualifying for the tournament should not be based on ratings, because of the flaws inherent in the rating system.  At the moment ratings measure mostly how well one does against bots rather than against humans.  One dramatic example of how ratings could exclude a legitimate contender is that Naveed is currently ranked 16th among humans!  If the World Championship were open only to the top 8 (by rating) who signed up, Naveed might not get a seat despite having scored victories against every other top player at one time or another.

On the other hand, although I believe everyone should have a chance to participate, I don't like formats which are conducive to upsets.  Single-elimination is too vulnerable to freak result.  Once everyone has signed up, the tournament should have as great a chance as possible of declaring the best player to be the winner.

Obviously, the longer the tournament is allowed to run, the greater the chance that the best player wins.  However, even if you fix a certain number of rounds and declare that the tournament can't go any longer than that maximum, within that time frame some formats will have a greater chance than others that the best player will win.  My contention is that floating triple-elimination is nearly as short as traditional double-elimination, but more fair and more likely to leave the best player standing.

One final thought about time commitment: If the main problem with a large number of rounds is that some people might not be able to commit to a game per week for X straight weeks, folks might be able instead to commit to one game every two weeks over a total of 2X weeks.  That gives a longer time between tournament begin and end, which is undesirable, but takes a lot of pressure off the players, which might make it worth it.

Title: Re: World Championship tournament format
Post by omar on May 19th, 2005, 7:22pm
I forgot to mention one of the benifits of having seperate tournaments. In addition to using different formats the different tournaments could also use different time controls. We've always had a conflict with time controls as well. Faster time controls allow more people to be able to participate, but for WC quality games we want to use longer time controls. The Open Classic could use a time controls like 1 min per move while the WC could use time controls of say 2 min per move. I think finding a long streatch of free time during a week to fit in a game has been the main problem for most people. So using faster time controls for the OC could help eliminate that.

I tried to search the Internet for a quantatative comparison of different tournament formats. Particularly with respect to a given formats ability to select the player with the true highest rating. But I didn't find anything even close to it. If someone knows of such previous work please let me know.

I did however come across this interesting article:
http://www.johnpratt.com/items/docs/ranker/ranker.html
It suggests that all players in a tournament should be ranked to make it more interesting for everyone. Maybe a format like this could be used for the OC.

I think having a WC tournament that is limited to just a select number of top rated players can be considered the final stage of a much longer tournament that was open to everyone. If it is known well in advance that only the 4 top rated players will continue into the finals of the WC then a player who is really interested to win the WC has basically the whole year to get their ratings up. Of course we could very easily setup a page to compute the ratings based only on H-H games and select the top players from that page to avoid people inflating their ratings against bots. Although there are some flaws with rating systems, the ratings are still a very good predictor on a gross level. It is only when the rating difference between players is small that ratings are not accurate. So I think it is good to make use of ratings as a way of narrowing the field for the final stage of the WC.


Title: Re: World Championship tournament format
Post by jdb on May 20th, 2005, 8:30am
Here is an article about the effectiveness of different types of tournaments. It gave me a headache, but it might be what your looking for.

http://www.chessbase.com/newsdetail.asp?newsid=260

Title: Re: World Championship tournament format
Post by omar on May 21st, 2005, 12:54am
Thanks for that link Jeff; it was very interesting. Looks like Jeff Sonas actually ran thousands of simulations on 13000 tournament formats using simulated mearsured and true ratings for the players. This is exactly the kind of data we need to look at make a more informed decision. I just wish that the full results were published along with a more precise description of how the simulations were run.

Never the less, it is interesting that none of the formats even break the 70% mark for selecting the true strongest player. And these are formats that have long series of about 20 games between the final candidates.

After reading Jeff's article I get the feeling that there probably isn't that much of a difference between the single, double and triple elimination formats with respect to selecting the best player as the winner. But we really should run some simulations and see what the numbers are. Suppose we generate 16 players with true ratings randomly assigned between the range of 1500 to 2000. Then each players measured rating is produced by adding a random number between -50 to 50 to the true rating. We simulate a tournament by using the true ratings to determine the outcome of the games and use the mearsued ratings as needed for pairing. After the tournament is over we check to see if the player with the highest true rating has won the tournament. After running thousands of such simulations we can get a pretty good idea of how good a particular tournament format is at selecting the best player.

It would be great if someone could try this out for the single, double and triple elimination formats. Any volunteers ?

Title: Re: World Championship tournament format
Post by Fritzlein on May 21st, 2005, 1:13am

on 05/21/05 at 00:54:28, omar wrote:
Suppose we generate 16 players with true ratings randomly assigned between the range of 1500 to 2000. Then each players measured rating is produced by adding a random number between -50 to 50 to the true rating.


It might be more realistic to add a random number between -250 and +250.  There are many examples of wildly inaccurate ratings, including Arimanator jumping from just over 1600 to just over 1800 within two days.  Do you suppose his true playing strength changed 200 points overnight?  I don't, and I conclude that at least one of the two ratings is inaccurate, if not both.  Plus or minus fifty doesn't cover it, and it doesn't cover the inaccuracies in the ratings of lots of other players either, myself included.

Title: Re: World Championship tournament format
Post by 99of9 on May 21st, 2005, 2:12am

on 05/21/05 at 00:54:28, omar wrote:
Never the less, it is interesting that none of the formats even break the 70% mark for selecting the true strongest player. And these are formats that have long series of about 20 games between the final candidates.


This depends on how close together the top few players are in real ratings.  If one player was far ahead, nearly all tournament formats would select the correct player.  If all players were about equal, in a 16 player tournament, most tournament formats would only give the correct winner just over 1/16 of the time.

The distribution of top chess players is probably different to the distribution of top arimaa players, since we do not have any professionals.

Guessing a distribution such as that omar suggests (16 randomly spaced between 1500 and 2000) is probably going to introduce significant biases in these calculations.  It would be better to use actual numbers from current ratings or previous tournaments.

But to be honest I think programming in the details of each tournament method sounds like more work than the results will be worth.  I think most of the guestimated comments in this thread about the effects of different tournament methods (and their lengths) are fairly accurate.

My vote is still for the double elimination variant that Robert and I were talking about.  It's not as long as Fritz's triple-elim, and it has significantly higher accuracy than knockout or traditional double-elim.

Regarding having 2 different tournaments, with candidature decided by ratings.  I think that would be open to serious exploitation if the bots were involved in the ratings.  Even if it was cut down to human-only ratings, collusion may become an issue (eg if one group of friends decided to all help one candidate get in).  The other problem with using ratings is that it encourages very high rated players to stop playing often, in case their rating takes a dive.  On Fritz's current rating, he could be in the tournament for the next 20 years without playing a game!

Title: Re: World Championship tournament format
Post by 99of9 on May 21st, 2005, 5:46am
Naveed is quite an unusual case though, because his real on-the-day playing quality varies so much.  If he's been away from the game for quite some time, or is just having a bad day, he often has a *very* bad day - and can overlook threats from even the simplest bots.  When in form though, he is one of the fiercest competitors and may beat any of us.  I think this may be why he had trouble in the postals... some days his moves were very good, others were very bad.  So I don't think even limiting it to H-H will fix Naveed's rating oscillations!

It would be nice to still have the bots involved in the ratings lists though.. just so we know how humanity is doing against cyberdom.  I think humans often exaggerate how far humanity is in front of cyberdom at the moment - and the ratings lists are often a small antidote to our delusions.

One possibility is that any game *against* a human should count.  That way a Human v Bot game, would count for the bot but not for the human.  Unfortunately the bots might get an unfair rating inflation if humans started taking these games less seriously.

I support the idea of having this kind of rating system, but personally I'd say it'd be better to run both at once - if it's not too complicated for Omar.

Title: Re: World Championship tournament format
Post by omar on May 21st, 2005, 10:04am

on 05/21/05 at 05:46:46, 99of9 wrote:
I support the idea of having this kind of rating system, but personally I'd say it'd be better to run both at once - if it's not too complicated for Omar.


Our previous discussions about rating sytems are still pending.

rating inflation/deflation:
http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1065901453;start=0

distortion due to selection of opponents:
http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1096120807;start=

distortion due to time controls:
http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1103741634;start=

I hope to get back to those eventually and hopefully completely revamp the rating system based on what we come up with.

We have made good progress on it so far by identifying the problems and discussed some possible solutions. The three main problems areas are:
* rating inflation/deflation due to a floating scale
* rating inaccuracies due to selection of opponents
* rating inaccuracies due to time controls

We've discssed possible solutions to each, but it's still in progress.

For the puropose of this discussion, lets just assume the problems with the rating system have been fixed.

Title: Re: World Championship tournament format
Post by omar on May 21st, 2005, 10:09am

on 05/21/05 at 01:13:23, Fritzlein wrote:
It might be more realistic to add a random number between -250 and +250.  There are many examples of wildly inaccurate ratings, including Arimanator jumping from just over 1600 to just over 1800 within two days.  Do you suppose his true playing strength changed 200 points overnight?  I don't, and I conclude that at least one of the two ratings is inaccurate, if not both.  Plus or minus fifty doesn't cover it, and it doesn't cover the inaccuracies in the ratings of lots of other players either, myself included.


Actually once we have simulation program setup, it will be very easy to change high level parameters like rating inaccuracies and see what the new results are. So we can definitely try out +-250 and maybe more.

Title: Re: World Championship tournament format
Post by omar on May 21st, 2005, 11:03am

on 05/21/05 at 02:12:22, 99of9 wrote:
Guessing a distribution such as that omar suggests (16 randomly spaced between 1500 and 2000) is probably going to introduce significant biases in these calculations.  It would be better to use actual numbers from current ratings or previous tournaments.


We can vary the distribution range as well and see how that effects the results. But keep in mind that we are not interested in the result of any one format, but rather a comparision between formats. So if format A is better than format B for the range I proposed, it will probably be better on a different range as well.


Quote:
But to be honest I think programming in the details of each tournament method sounds like more work than the results will be worth.  I think most of the guestimated comments in this thread about the effects of different tournament methods (and their lengths) are fairly accurate.


Well the problem with basing the decision on guestimats rather than actual figures is that you never know for sure if your decision is right or not. Suppose a year later someone proposes a new format for the WC; we will be stuck trying to defend what we selected based only on guestimates and have no way to compare it against any new proposals. I think the effort we put in now to quantatively compare some tourn. formats and make an informed decision will be worthwhile in the long run. If a new proposal comes along in the future we can objectively compare it and possibly accept it without being biased towards our current system.


Quote:
My vote is still for the double elimination variant that Robert and I were talking about.  It's not as long as Fritz's triple-elim, and it has significantly higher accuracy than knockout or traditional double-elim.


Hummm, how much is "significantly higher" :-)

Since there doesn't seem to be any volunteers to do this, I'll start on it.


Title: Re: World Championship tournament format
Post by Fritzlein on May 21st, 2005, 11:49pm
One guesstimate that I highly expecte simulation to prove true is that pairings of the form

1 v 16
2 v 15
3 v 14
...
8 v 9

give a greater chance of the truly strongest player winning overall than do pairings of the form

1 v 9
2 v 10
3 v 11
...
8 v 16

as we had in the last World Championship.  But I am guessing that you had a well-considered reason for doing it the latter way, Omar.  Do you mind explaining the rationale for doing the pairings as you did?  Perhaps there are factors we are forgetting in the quest to have a tournament structure that is more likely to have the strongest player win overall.

Title: Re: World Championship tournament format
Post by omar on May 22nd, 2005, 10:25am

on 05/21/05 at 23:49:07, Fritzlein wrote:
But I am guessing that you had a well-considered reason for doing it the latter way, Omar.  Do you mind explaining the rationale for doing the pairings as you did?  Perhaps there are factors we are forgetting in the quest to have a tournament structure that is more likely to have the strongest player win overall.


Don Dailey had suggested sliding the lower half to play the upper half. He thought it would work well because it would eliminate the most number of weak players at each round. He had also mentioned the folding method would work better, but said it relied heavily on the ratings being accurate. So with no actual figures to go by and knowing that our rating system is new and based on a small pool, I decided to go with the sliding method. Once we have a system to compare tournament formats we'll be able to test your hypothises and know for sure if it is better and exactly how much.

Title: Re: World Championship tournament format
Post by omar on May 22nd, 2005, 10:32am

on 05/21/05 at 01:13:23, Fritzlein wrote:
Plus or minus fifty doesn't cover it, and it doesn't cover the inaccuracies in the ratings of lots of other players either, myself included.


Karl, while Im working on the programs for comparing tournament systems, could you please do some kind of analysys with the games archive to see how well the ratings are predicting outcome of games and see if that can be converted to a number we can use for rating inaccuracies in the simulations.

Title: Re: World Championship tournament format
Post by omar on May 22nd, 2005, 10:48am

on 05/21/05 at 02:12:22, 99of9 wrote:
Guessing a distribution such as that omar suggests (16 randomly spaced between 1500 and 2000) is probably going to introduce significant biases in these calculations.  It would be better to use actual numbers from current ratings or previous tournaments.


If we use measured ratings from the previous tournaments, what should we do about the true ratings. Should we just add a random number in the range of the rating inaccuracies to the measured rating and use that. Should the true ratings be changed from one trial to the next, or computed just once and used for all trials?

Title: Re: World Championship tournament format
Post by 99of9 on May 22nd, 2005, 6:46pm

on 05/22/05 at 10:25:36, omar wrote:
Don Dailey had suggested sliding the lower half to play the upper half. He thought it would work well because it would eliminate the most number of weak players at each round. He had also mentioned the folding method would work better, but said it relied heavily on the ratings being accurate. So with no actual figures to go by and knowing that our rating system is new and based on a small pool, I decided to go with the sliding method. Once we have a system to compare tournament formats we'll be able to test your hypothises and know for sure if it is better and exactly how much.


One thing is clear about the sliding system, with no simulations necessary.  In every round the top rated player has to play a harder opponent than the second rated player, until they meet.  This creates a clear bias toward the chances of the 2nd top player winning the tournament.

In fact even Don's reasoning doesn't make sense to me.  Take the round of 16 for example.  In the sliding method, #8 plays #16, but #9 plays #1.  This is a massive reinforcement of small anomalies that may occur in the ratings of #8 and #9.  I agree that it "will knock out more of the lower rated players", but not neccessarily by real rating, just by predicted rating.  It seems to me much fairer for #8 to play *against* #9 in that round to get a rough determination of who in fact is the better player that will get to contest the next round of the tournament.  That is what would happen in the crossover method.

Title: Re: World Championship tournament format
Post by 99of9 on May 22nd, 2005, 6:56pm

on 05/22/05 at 10:48:33, omar wrote:
If we use measured ratings from the previous tournaments, what should we do about the true ratings. Should we just add a random number in the range of the rating inaccuracies to the measured rating and use that. Should the true ratings be changed from one trial to the next, or computed just once and used for all trials?


Here's my personal suggestion:
1) Add a random number selected from the gaussian (normal) distribution, with the standard deviation determined by how inaccurate you think the ratings are.  As you mentioned before, this parameter can be varied and different simulations done.  Actually though, I think that even using a standard deviation of 0 (where  predicted rating == true rating), will produce interesting results of the type you are after.

2) Choose a set of (true,predicted) ratings.  Don't change them until after you have tested all tournament formats with them.  As long as all tournament formats get to see the same sets of ratings the same number of times, it is probably good to recalculate the real ratings in between iterations.  In psuedocode:


get_predicted_ratings_from_old_tournament()
for (1<iterations<1000) {
 get_real_ratings_from_predicted_ratings_and_normal_distribution()
 for (1<tournament_method<Num_Methods) {
   simulate_tournament()
   is_winner_highest_real_rated()
 }
}

Title: Re: World Championship tournament format
Post by omar on May 23rd, 2005, 6:12pm

on 05/22/05 at 18:46:25, 99of9 wrote:
One thing is clear about the sliding system, with no simulations necessary.  In every round the top rated player has to play a harder opponent than the second rated player, until they meet.  This creates a clear bias toward the chances of the 2nd top player winning the tournament.


True, but in a crossover (or folding) system the second rated player has to play a harder opponent in each round than the first rated player. So the top rated player is biased to win in this case. In the sliding system the rating difference between the pairs of players would on average be about the the same, but with the folding system, it would be much higher at the top than at the bottom. So I can see players complaining (especially the lower rated ones) that they had to play a much harder opponent than players who are rated higher than them. In a sliding system this complaint is not justified because the rating difference between the pairs of players would be about the same.

No matter what tournament system we pick there is going to be some players who don't like it. I don't want us to be in a situation where we are defending the system we picked based on vauge arguments. We should be able to say this is what our objects were for the system, we evaluated various systems objectively and picked the one that best met the objectives. Someone else should be able to come to the same conclusion by looking at that. If they differ about the objectives, that's a different story :-)

Anyways Im not against using a folding system. If the simulations show that it's better than we will have a valid justification for using it. If our objective is for the player with the highest true rating to win then it might well be a better system. But how much rating inaccuracy can this system tolerate. I don't know yet, but the simulations can help answer that.


Title: Re: World Championship tournament format
Post by omar on May 23rd, 2005, 6:30pm

on 05/22/05 at 18:56:51, 99of9 wrote:
Don't change them until after you have tested all tournament formats with them.  As long as all tournament formats get to see the same sets of ratings the same number of times, it is probably good to recalculate the real ratings in between iterations.


Thanks for the suggestions.

The way Im doing it now is you have to type:
 run [tournament format program] [other arguments]
and it runs all trials with one tournament format. You have invoke it again to run it on a different tournament format. This makes it easy to add new tourament formats without having to change the 'run' program directly.

But I guess we can seed the random number generator with the same value before running the next tournament format to get what you suggested.


Title: Re: World Championship tournament format
Post by jdb on May 23rd, 2005, 6:41pm
I took a look at the arimaa rating system and did a little analysis.

Rating          Prob. of Winning
Difference    1 Game   3 Game Match

400   0.91  0.98
350   0.88  0.96
300   0.85  0.94
250   0.81  0.90
200   0.76  0.85
150   0.70  0.79
100   0.64  0.70
50     0.57  0.61
0       0.50  0.50
-50    0.43  0.39
-100  0.36  0.30
-150  0.30  0.21
-200  0.24  0.15

If the difference is under (say) 200 points the chance of an "upset" is still fairly common.  The current arimaa rating list has about 10 players in the 1850-2050 range. A short tournament would have a tough time separating these players.

Title: Re: World Championship tournament format
Post by 99of9 on May 23rd, 2005, 6:44pm

on 05/23/05 at 18:12:22, omar wrote:
but in a crossover (or folding) system the second rated player has to play a harder opponent in each round than the first rated player. So the top rated player is biased to win in this case.


That is definitely preferable if our aim is to select the player with the highest real rating.  No matter how large the ratings inaccuracy parameter is, the player with the highest predicted rating is the *most likely* to have the highest real rating (unless there has been ratings manipulation).  Therefore if you're going to give someone this "favouritism" of course it is better to give the #1 the easier run than to give it to #2.  [Although i agree that in principle it would be even better still for them both to play opponents with an approximately equal total rating.]

On a pragmatic basis it is also better to favour the #1 so that the best player is not tempted to manipulate his/her rating to go into the tournament as #2.  This is something that simulations will not tell us!

Title: Re: World Championship tournament format
Post by 99of9 on May 23rd, 2005, 6:50pm

on 05/23/05 at 18:30:54, omar wrote:
But I guess we can seed the random number generator with the same value before running the next tournament format to get what you suggested.


But that will only work if the random number generator is not used during each tournament itself.  Different tournament schemes might call the RNG a different number of times.

Perhaps my warning was too strong anyway though - the disadvantage with the tournaments seeing different ratings lists is only that more samples will be required to obtain correct, low error averages.  Perhaps your simulation will run fast enough to make these averages repeatable and accurate anyway.

Title: Re: World Championship tournament format
Post by 99of9 on May 23rd, 2005, 7:03pm

on 05/23/05 at 18:12:22, omar wrote:
So I can see players complaining (especially the lower rated ones) that they had to play a much harder opponent than players who are rated higher than them. In a sliding system this complaint is not justified because the rating difference between the pairs of players would be about the same.


This complaint *is* justified in your scheme for #9, who as I outlined above has to play #1, when #8 gets to play against #16 !!!  This difference between opponents of adjacent players is bigger than any that will occur in the crossover scheme.  I think you've just put all your "opponent ratings divergence" eggs in one basket (#8 vs #9) rather than spreading them out throughout the field.

Anyway... how likely is it really that #16 is actually the best real-rated arimaa player???  If the aim is to select the best player, then it makes sense to me that the #16 is the most likely to get knocked out in the first round.  In the sliding scheme #9 is just as likely to get knocked out as #16 (and perhaps *more* likely if the #1 happens to be way ahead of the field, rated at 2300 perhaps).

Title: Re: World Championship tournament format
Post by 99of9 on May 23rd, 2005, 7:27pm

on 05/21/05 at 11:03:50, omar wrote:
We can vary the distribution range as well and see how that effects the results. But keep in mind that we are not interested in the result of any one format, but rather a comparision between formats. So if format A is better than format B for the range I proposed, it will probably be better on a different range as well.

This sounds like a guestimate to me :-).  Actually I was more talking about the distribution shape than about the range.  You assume a uniform-random distribution, but to me it seems more likely to have a long spread-out tail toward the higher ratings.


on 05/21/05 at 11:03:50, omar wrote:
Since there doesn't seem to be any volunteers to do this, I'll start on it.

I admire your get-in-there-and-do-it work ethic!  I think the results will certainly be interesting, but the coding of many different tournament schemes too time-consuming for my liking.

Title: Re: World Championship tournament format
Post by omar on May 23rd, 2005, 11:01pm

on 05/23/05 at 18:41:56, jdb wrote:
A short tournament would have a tough time separating these players.


Yep, that what Im finding also as I run the simulations.

Title: Re: World Championship tournament format
Post by omar on May 23rd, 2005, 11:56pm
Well Toby, you will be happy to know that the crossover (or folding) variation of the single elimination tournament works relatively well.

Here's what Im finding:

2000 trials were done for each of the following type of tournaments. Each trial used 16 random players with a measured ratings in the range of 1500 to 2000 and a rating error of +-50.

seR: 28.8%   single elim; random seeding of players
seO: 24.9%  single elim; players ordered by ratings
seS: 28.0%   single elim; slide method of pairing at each round
seF: 35.6%  single elim; fold method of pairing at each round
rrS: 36.7%  round robin single game between each pair
rrD: 37.6%  round robin double; two games between each pair

Very interesting results. seF is almost as good as round robin and seS is no better than random seeding.

Suprisingly round robin, with all that extra effort is not much better than single elimination; and double RR is hardly any better than single.

But maybe there are still bugs in my programs. Right now, I can't seem to find them. But feel free to download the programs and try it out.

http://arimaa.com/arimaa/tourn/compare/sim.tar

or in ZIP format:

http://arimaa.com/arimaa/tourn/compare/sim.zip


Title: Re: World Championship tournament format
Post by 99of9 on May 24th, 2005, 2:42am
Thanks for providing those Omar.

I have no idea about the language the programs are in, so I'll stick to testing them out.

One problem is:
run singleElimFold 1000 2 0 0 99999
 1  100.0%

In other words if the 2 top players in a tournament both have the same rating, the program counts either of them winning as a successful tournament.

Here's another very strange result, but maybe it is legit...
run singleElimSlide 1000 16 500 0 99999
 1   20.0%
 2   20.8%
 3   18.9%
 4   13.2%
(all I've done is remove all possibility of draws and make the predicted ratings perfect)  

Does the sliding mechanism really favour player #2 so much that they are *more* likely than #1 to win??

Omar is there a way to fix the ratings by hand and then simulate many tournaments with that set?  (I've only skimmed readme.2nd - so just tell me if I should look at it harder)

Title: Re: World Championship tournament format
Post by jdb on May 24th, 2005, 8:34am

Quote:
In other words if the 2 top players in a tournament both have the same rating, the program counts either of them winning as a successful tournament.


Sounds ok to me


Quote:
Does the sliding mechanism really favour player #2 so much that they are *more* likely than #1 to win??


It can happen. Consider a 4 person tournament.
Mr. A  rated 2000
Mr. B rated 1950
Mr. C rated 1945
Mr. D rated 1700

The fold seeding method favours Mr A
The slide seeding method favours Mr B

Although, I'm not sure if the slide method actually gives Mr. B a higher chance of winning than Mr. A.

Title: Re: World Championship tournament format
Post by 99of9 on May 24th, 2005, 8:45am

on 05/24/05 at 08:34:19, jdb wrote:
It can happen. Consider a 4 person tournament.
Mr. A  rated 2000
Mr. B rated 1950
Mr. C rated 1945
Mr. D rated 1700

The fold seeding method favours Mr A
The slide seeding method favours Mr B

Although, I'm not sure if the slide method actually gives Mr. B a higher chance of winning than Mr. A.


If Mrs A was instead rated 1951 then you're right, B would definitely have a higher total chance than A.  Good example.

About the 2 equal rated player thing - I'd prefer it to record 50% (as it would if the players were split by 1 rating point).  That way accidents don't skew the overall averages.  But actually the reason I want it is mostly because there was another calculation I wanted to do but couldn't the way it is currently set up.

Title: Re: World Championship tournament format
Post by omar on May 24th, 2005, 1:51pm

on 05/24/05 at 02:42:00, 99of9 wrote:
I have no idea about the language the programs are in, so I'll stick to testing them out.


They are written in Perl. A good language for writting programs quickly if you don't care too much about the program running very fast. A perl program runs about 100 times slower than a C program, but you can probably write it 10 times faster in Perl.


Quote:
One problem is:
run singleElimFold 1000 2 0 0 99999
 1  100.0%

In other words if the 2 top players in a tournament both have the same rating, the program counts either of them winning as a successful tournament.


I wanted to suggest a quick change that you could make to your copy to get that, but I don't see how to do it. Here is what's currently happening:
After a tournament completes we have an array with the names of the players that tied for first place. For each of these players we determine their rank based on their true rating. We maintain another array that is a histogram of what ranks have won the tournament. The result of 'run' is printed from that histogram array.


Quote:
Here's another very strange result, but maybe it is legit...
run singleElimSlide 1000 16 500 0 99999
 1   20.0%
 2   20.8%
 3   18.9%
 4   13.2%


Seems the slide method does give the second player a pretty good chance, but the advantage of 0.8% is just a fluke. When I ran it a couple of times the second rank came out a bit lower then the first.


Quote:
Omar is there a way to fix the ratings by hand and then simulate many tournaments with that set?  (I've only skimmed readme.2nd - so just tell me if I should look at it harder)


Yes, there is now. I ment to add that, but forgot. Download a new copy to get this feature. You can create a file with the player names and measured ratings and pass the name of that file in place of the 'number of players' argument.

Title: Re: World Championship tournament format
Post by omar on May 24th, 2005, 2:03pm
Wow, a major break-through !!!!

I came up with a tournament format that is both short, and very good; it's even better than round robin; it has a percentage of about 64% compared to 37% for round robin for the 16 player case. I 've named this format the swissKnife :-)

Download the new version:
 http://arimaa.com/arimaa/tourn/compare/sim.tar
or
 http://arimaa.com/arimaa/tourn/compare/sim.zip

and run it like this:
 run swissKnife 1000 16 500 50

to see a detailed description of how it works type:
 formats/swissKnife

Absolutely amazing :-)

Title: Re: World Championship tournament format
Post by Fritzlein on May 24th, 2005, 6:09pm
I am curious about the swissknife pairing scheme, but I can't untar files on my current Windows machine.  I'll have to download some utility for that.

I wanted to weigh in further about pairing people based on ratings.  Suppose there are four players in the tournament with accurate ratings of

Player A: 2100
Player B: 2000
Player C: 1900
Player D: 1800

Now if it is know that the tournament pairings will be "sliding", i.e. #1 vs. #3 in one game and #2 vs. #4 in the other, then Player A has a strong incentive to purposely lose games to get his rating lower.  If he can lower his rating by 150 points to 1950, then he will get to play Player D instead of Player C in the first round for a significantly easier matchup.

Or supposing that Player A doesn't realize this opportunity and keeps his high rating.  In that case Player C has an incentive to intentionally lower his rating down to 1750 so the he will get to play Player B instead of Player A.

I think that we must reject in advance any system which would reward players (even if only a few of the players) for purposely throwing their games to get a lower rating before tournament begin.  Yes, there is a bias towards the higher-rated players in "folding" pairing, but that is exactly the way it should be.  A higher rating should always be beneficial or at least neutral, or else we could see some very strange sandbagging going on in advance of the tournament.

Title: Re: World Championship tournament format
Post by Fritzlein on May 24th, 2005, 6:13pm

on 05/22/05 at 10:32:52, omar wrote:
Karl, while Im working on the programs for comparing tournament systems, could you please do some kind of analysys with the games archive to see how well the ratings are predicting outcome of games and see if that can be converted to a number we can use for rating inaccuracies in the simulations.


I'll see whether I can come up with some interesting and possibly useful numbers.

Title: Re: World Championship tournament format
Post by 99of9 on May 24th, 2005, 6:38pm
Haha, very good Omar ;-).  Now I see why Fritz is so curious about this one!

Actually that is exactly what I was thinking of writing yesterday until I realised I couldn't understand the language they were written in!

Now that you've enabled us to specify ratings, I think I will be able to give a second quantitative criteria which we can rate the tournaments on.  (In addition to "highest rated player should win most often".)

Title: Re: World Championship tournament format
Post by omar on May 24th, 2005, 7:42pm

on 05/24/05 at 18:09:05, Fritzlein wrote:
I am curious about the swissknife pairing scheme, but I can't untar files on my current Windows machine.  I'll have to download some utility for that.


Actually I ment for the second link to be to a ZIP file. I've fixed it now.

On a Windows PC you will need to install Perl if you don't already have it. You can get it from:

http://activestate.com/Products/Download/Download.plex?id=ActivePerl

Get the most latest version for Windows with the MSI installer.


Title: Re: World Championship tournament format
Post by omar on May 24th, 2005, 7:46pm

on 05/24/05 at 18:38:35, 99of9 wrote:
Actually that is exactly what I was thinking of writing yesterday until I realised I couldn't understand the language they were written in!


You can write the program to implement a tournament format in any language you want. Just put your program in the 'formats' directory and you can start using it. Just read the 2nd README file to see what your program needs to print out.

Title: Re: World Championship tournament format
Post by 99of9 on May 24th, 2005, 11:15pm

on 05/24/05 at 18:38:35, 99of9 wrote:
Now that you've enabled us to specify ratings, I think I will be able to give a second quantitative criteria which we can rate the tournaments on.  (In addition to "highest rated player should win most often".)



Actually, I'm not sure I can do it unless I can fix the real ratings - is that possible as well as fixing the predicted ratings?  

Here's the idea:

If the two (or three...) top players have nearly equal real rating, but one has a slightly higher predicted rating, then their probability of winning the tournament should be as similar as possible.

In fact this applies to any two players in the tournament (eg players 8 and 9).  Ideally a good tournament scheme should give two players with an equal real rating but different predicted rating the same chance of winning.

I agree with Fritz that if there is a difference, you always want to give the advantage to the higher predicted rating, to prevent rating manipulation.   But it is also preferable to minimize the difference in the first place.

On this condition I predict that:
  • Swissknife will fail miserably,
  • Sliding scale will fail number 8+9
  • Crossover will give a gradual linear bias (the higher the predicted position, the stronger the bias)
  • Round robins will perform perfectly.  
  • How will double / triple elimination do I wonder?

Title: Re: World Championship tournament format
Post by omar on May 25th, 2005, 6:12am

on 05/24/05 at 23:15:03, 99of9 wrote:
If the two (or three...) top players have nearly equal real rating, but one has a slightly higher predicted rating, then their probability of winning the tournament should be as similar as possible.


This gets back to the question of what we want the objective of the tournament to be. Some possibilites are:

1. Give the player with the highest real rating the greatest chance of winning. This is what I thought a WC type tournament tries to acheive.

2. Give all players an equal chance of winning (just have a lottery as Arimaanator suggested :-) ).

3. Give players a chance of winning the tournament proportional to their real rating. Round robin acheives this as Toby mentioned.

4. Give the player who showed the best performance at the event relative to their real rating the highest chance of winning. Maybe something like singleElimOrd acheives this.

5. Something else.

Different tournament might have different objectives. For example a WC type tournament I've always thought should try for objective #1. But an open classic type tournament might want to use objective #4.

So I think we need to decide on what the objective of the WC tournament should be.


Title: Re: World Championship tournament format
Post by 99of9 on May 25th, 2005, 8:33am
But that is exactly my point, SwissKnife proves that if you have your sole aim as #1, then one of the best ways to do it is to not play a tournament at all.

Actually I wasn't suggesting #3 either.  I was suggesting that  #1 is a good aim, but needs to be complimented by success at another aim:

The effect of rating error on a person's chances at winning the tournament should be minimized.  i.e. if a person has a real rating of R, and a predicted rating of R+E, their chances of winning should depend almost entirely on R, and as little as possible on E.  (It's still fine to aim for the person with the highest R to have the highest possible chance of a win.)  (I would certainly suggest that the probability of winning as a function of R should be monotonic and smooth.)

So in the example I gave where 2 people have the same R, but different E, a good tournament would give them both similar chances of winning.

Title: Re: World Championship tournament format
Post by omar on May 26th, 2005, 6:06pm

on 05/25/05 at 08:33:34, 99of9 wrote:
So in the example I gave where 2 people have the same R, but different E, a good tournament would give them both similar chances of winning.


Yes, but only tournaments which do not seed players and do not make use of ratings will be able to give players with the exact same real rating (R) the same chance of winning. Any tournament that makes use of measured ratings (M) will not be able to satisfy this.


Quote:
But that is exactly my point, SwissKnife proves that if you have your sole aim as #1, then one of the best ways to do it is to not play a tournament at all.


Actually it turns out that the performance of swissKnife degrades very fast as the number of players increases or the rating inaccuracy increases. My posting earlier only looked at the case when the number of players was 16 and the rating inaccuracy was 50. The performance of roundRobin also degrades as the number of players increases, but very gradually and it is not effected by rating inaccuracies (since it doesn't make use of ratings). For the 16 player case the crossover point between swissKnife and roundRobin seems to be at a rating inaccuracy of about 90.

For the case of 16 players, rating in the range of 1500 to 2000, with a rating inaccuracy of 200 points we get:
 swissKnife: 33.3%
 roundRobin: 44.9%

So the difference can be quite significant. I wouldn't suggest handing over the torphy to the highest rated player without having a tournament just yet :-)

So the bigger picture now says that if you have very few players and the rating inaccuracies are low, you would be better off just using swissKnife. But if you have a lot of players or if rating inaccuries are high then roundRobin is better. So we might not need an additional criterias for judging tournament formats after all. The number that Karl comes up with for the actual rating inaccuracies will be critical. I expect it to be around a 130 or so. The Arimaa rating system has an intrinsic error of about 30 and the error introduced by other factors such as selection of opponents and game speeds probably adds another 100 points.

Let see what Karl finds.

Title: Re: World Championship tournament format
Post by 99of9 on May 26th, 2005, 7:23pm

on 05/26/05 at 18:06:20, omar wrote:
Yes, but only tournaments which do not seed players and do not make use of ratings will be able to give players with the exact same real rating (R) the same chance of winning. Any tournament that makes use of measured ratings (M) will not be able to satisfy this.


I agree, nothing else will satisfy it perfectly, but some bad tournaments are worse than others.  Swissknife will perform much worse on this test than any knockout for example.

Title: Re: World Championship tournament format
Post by Fritzlein on May 27th, 2005, 12:24am

on 05/26/05 at 18:06:20, omar wrote:
The number that Karl comes up with for the actual rating inaccuracies will be critical. I expect it to be around a 130 or so. The Arimaa rating system has an intrinsic error of about 30 and the error introduced by other factors such as selection of opponents and game speeds probably adds another 100 points.

Let see what Karl finds.


I'm not finding anything statistical for you yet, Omar, but I have an empirical project for you.  Assume for a moment that the ratings model we use is perfectly correct, i.e. assume that bots, selection of opponents, and selection of time controls makes no difference.  Take 16 players with true strengths randomly distributed between 1500 and 2000.  Give each of those players an exactly accurate rating and an RU of 30.  Play 1000 games between those players, randomly choosing pairings, basing winning probability on the true strengths, and adjusting their ratings as if those players were playing on the server.  At the end of 1000 games, record the error in rating for each player.  Repeat 100 times for a new 16 players and new 1000 games.  I expect your 1600 recorded errors will form a bell curve of sorts -- calculate the standard deviation of this bell curve.

The number you get out of this trial will be the absolute rock-bottom error in ratings that you can hope for, because that is the best the system as it exists can possibly perform.  Every other factor (and I haven't even mentioned that players may actually be changing in true strength) can only increase that error.

Title: Re: World Championship tournament format
Post by 99of9 on May 27th, 2005, 1:43am
I'll volunteer to do this sometime this weekend if nobody beats me to it.

Title: Re: World Championship tournament format
Post by omar on May 27th, 2005, 4:49pm

on 05/27/05 at 00:24:33, Fritzlein wrote:
The number you get out of this trial will be the absolute rock-bottom error in ratings that you can hope for, because that is the best the system as it exists can possibly perform.


Yes, this is what I was refering to as intrinsic error in my previous post. It is about 30 points. Back in 2003 when I was evaluating various rating formulas to use for the Arimaa rating system this was one of the criterias that I used in the simulations. Another was the convergence rate; i.e. the average number of games it takes for a players rating error to reach this level. For the Arimaa rating system the convergence rate is about 90 games.


Title: Re: World Championship tournament format
Post by omar on May 27th, 2005, 4:50pm

on 05/27/05 at 01:43:46, 99of9 wrote:
I'll volunteer to do this sometime this weekend if nobody beats me to it.


It will be good to see if you come up with about the same numbers.

Title: Re: World Championship tournament format
Post by 99of9 on May 28th, 2005, 6:19am
Standarddev 51.06979


Here is my program, feel free to look for bugs or use it in any way you see fit.  It's public domain.


/*
Program written by Toby Hudson, 28 May 2005.
The specifications for the program were provided by Karl Juhnke, and are given here:

Assume for a moment that the ratings model we use is perfectly correct, i.e. assume that bots, selection of opponents, and selection of time controls makes no difference.  Take 16 players with true strengths randomly distributed between 1500 and 2000.  Give each of those players an exactly accurate rating and an RU of 30.  Play 1000 games between those players, randomly choosing pairings, basing winning probability on the true strengths, and adjusting their ratings as if those players were playing on the server.  At the end of 1000 games, record the error in rating for each player.  Repeat 100 times for a new 16 players and new 1000 games.  I expect your 1600 recorded errors will form a bell curve of sorts -- calculate the standard deviation of this bell curve.
*/

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define PLAYERS 16
int minrating=1500;
int maxrating=2000;

#define ENSEMBLE 100
int numgames=1000;

int realratings[PLAYERS];
int estimatedratings[PLAYERS];

int errors[ENSEMBLE][PLAYERS];

void ChooseRatings(int ratings[PLAYERS]) {
     int p;
     for (p=0; p<PLAYERS; p++) ratings[p] = minrating + rand()%(maxrating-minrating+1);
}

void CopyRatings(int copy[PLAYERS], int orig[PLAYERS]) {
     int p;
     for (p=0; p<PLAYERS; p++) copy[p] = orig[p];
}

void GetErrors(int diff[PLAYERS], int real[PLAYERS], int est[PLAYERS]) {
     int p;
     for (p=0; p<PLAYERS; p++) diff[p] = est[p] - real[p];
}

double Rand_Double () {
     return ((double)rand()/(double)RAND_MAX);
}

double WinProb (int rA, int rB) {
     return (1.0/(1.0+pow(10.0,(rB-rA)/400.0)));
}

double ChooseWinner (int rA, int rB) {
     double chi;
     
     chi = Rand_Double();
     
     if(chi<WinProb(rA,rB)) return 1.0;
     else return 0.0;
}

void SimulateGames(int games) {
     int n;
     int pA, pB, r_estA, r_estB;
     double ww;
     
     for (n=0; n<games; n++) {
           pA = rand()%PLAYERS;
           pB = pA;
           while (pB==pA) pB = rand()%PLAYERS;
           
           ww = ChooseWinner(realratings[pA], realratings[pB]);
           
           r_estA = estimatedratings[pA];
           r_estB = estimatedratings[pB];
           
           estimatedratings[pA] = (int)(estimatedratings[pA] + 30*(   ww   - WinProb(r_estA, r_estB)) + 0.5);
           estimatedratings[pB] = (int)(estimatedratings[pB] + 30*((1.0-ww)- WinProb(r_estB, r_estA)) + 0.5);
           
           //printf("game %5d, rA %5d rB %5d, r_estA %5d r_estB %5d, wp=%8.5f ww=%5.1f, new_estA %5d new_estB %5d\n", n, realratings[pA], realratings[pB], r_estA, r_estB, WinProb(realratings[pA], realratings[pB]), ww, estimatedratings[pA], estimatedratings[pB]);
     }
}

void ExamineDistribution(int err[ENSEMBLE][PLAYERS]) {
     int e, p;
     int sum=0;
     double sumofsquares=0.0;
     double mean;
     double standarddev;
           
     for (e=0; e<ENSEMBLE; e++) {
           for (p=0; p<PLAYERS; p++) {
                 sum += err[e][p];
                 
                 // printf("%d\n", err[e][p]);
           }
     }
     mean = (double)sum/(double)(ENSEMBLE*PLAYERS);
     
     for (e=0; e<ENSEMBLE; e++) {
           for (p=0; p<PLAYERS; p++) {
                 sumofsquares += pow(err[e][p]-mean,2.0);
           }
     }
     
     standarddev = pow( sumofsquares/(double)(ENSEMBLE*PLAYERS-1) , 0.5);
     
     printf("Mean %e, Standarddev %e\n", mean, standarddev);
     
}

int main () {
     int e;
     
     for (e=0; e<ENSEMBLE; e++) {
           ChooseRatings(realratings);
           CopyRatings(estimatedratings, realratings);
           SimulateGames(numgames);
           GetErrors(errors[e],realratings,estimatedratings);
     }
     
     ExamineDistribution(errors);
}

Title: Re: World Championship tournament format
Post by omar on May 28th, 2005, 9:47am
Changing

for (p=0; p<PLAYERS; p++) diff[p] = est[p] - real[p];

to

for (p=0; p<PLAYERS; p++) diff[p] = abs(est[p] - real[p]);

give a value of about 30.


Title: Re: World Championship tournament format
Post by 99of9 on May 29th, 2005, 12:38am
But I think it's better to think of the error rather than the absolute value of the error.  One big advantage is that the distribution is a bell curve so standard statistical rules-of-thumb can be used.

Title: Re: World Championship tournament format
Post by Fritzlein on May 29th, 2005, 7:36am
If I understand how the code works, I agree with 99of9.  You need to keep the sign on the error to get a correct calculation of standard deviation.  Taking absolute values first will give a smaller number, but that smaller number won't be what statisticians call standard deviation.

If you are simulating a standard deviation of 51 with a uniform distribution, then you need to pick a random number between -102 and + 102.  This should be the minimum amount of error in simulated trials of which tournament format is best.  If we think there is additional error from sources other than random variation of the system functioning as well as possible, then the range of error should increase.  However, as you may see in the other thread, I haven't yet found evidence of other such error, even though I expect it exists.

Title: Re: World Championship tournament format
Post by omar on May 30th, 2005, 6:54pm
Oh OK. I better make a mention of the fact that in my simulation programs the rating inaccuracy parameter is just used to define the range of the unifrom random numbers to add to the true ratings to get the measured ratings. I guess that number should be entered as twice the standard deviation value. So in my results with the rating inaccuracy set to 200 it really means SD of 100.

--- 2005.06.17 ---

I just realized that I said this wrong. The 'rating distribution range' defines the range of measured ratings. The 'rating inaccuracy range' defines the range of the uniform random numbers to add to the measured ratings to set the true ratings. I did it this way because I figured it would not matter which way you do it, but in practice we can limit entries in a tourament based on measured ratings and not true ratings.


Title: Re: World Championship tournament format
Post by omar on Jun 11th, 2005, 10:17am
Has anyone tried out any other tournament formats.

I have not tried out the double or triple eliminations formats proposed by Toby and Karl yet since they are not easy to implement.

However I did try out another seemingly strange tournament format. It works about as good as swissKnife, but it continues to perform well even when the number of players and the rating error range is increased.

The basic idea is to allow all players to play two rounds of games (details of how the pairing is done is described later) and update the players ratings using the Arimaa rating formula, but set the rating uncertainty of all players to the same fixed value (more on this later). After updating the ratings, the two players with the lowest ratings are removed and the tournament continues repeating the same procedure again. When only two players are left the player with the highest rating after the final two rounds is selected as the winner.

In some ways this is similar to swissKnife because it is selecting winner based primarily on ratings. But unlike swissKnife, it actually uses the tournament to refine the original ratings before making the final selection.

I've found that a good value to use for the rating uncertianty of the players is about 1/5 of the 'rating inaccuracy range'. But it is not very sensitive to the value used and even values that are 1/10 to 1/3 of the rating inaccuracy range produce good results.

For the pairing of the players I tried two variations and they both seem to produce almost similar results. The first one which I refer to as the swissSaw pairs #1-#2, #3-#4, ... in the first round and #2-#3, #4-#5, ... in the second round with #1 and lowest rated getting a bye. The second pairing method which I refer to as swissOmatic pairs the lower half against the upper half by sliding in the first round and by folding in the second round.

For comparision I also tried a format that I call 'roundRobinRated'. After running a round robin the ratings are updated and the highest rated player selected as the winner.

Here are the results on some simulations:


2000 trials with each format, 8 players, 500 point rating distribution range, 200 point rating inaccuracy range, 1 in 9999 draw ratio:

46.6%     run swissKnife 2000 8 500 200 9999
45.8%     run singleElimRand 2000 8 500 200 9999
45.5%     run roundRobin 2000 8 500 200 9999
46.1%     run roundRobinDouble 2000 8 500 200 9999
60.7%     run 'roundRobinRated 40' 2000 8 500 200 9999
62.8%     run 'swissOmatic 40' 2000 8 500 200 9999
63.7%     run 'swissSaw 40' 2000 8 500 200 9999

The value of 40 being passed to some of the tournament formats is the rating uncertianty to be used in the rating formula. It is set to 1/5 of the rating inaccuracy range.


same as above, but with 16 players instead of 8:

33.3%     run swissKnife 2000 16 500 200 9999
44.9%     run roundRobin 2000 16 500 200 9999
57.0%     run 'roundRobinRated 40' 2000 16 500 200 9999
59.0%     run 'swissOmatic 40' 2000 16 500 200 9999
61.8%     run 'swissSaw 40' 2000 16 500 200 9999


same as above, but with 32 players:

25.6%     run swissKnife 2000 32 500 200 9999
43.2%     run roundRobin 2000 32 500 200 9999
53.1%     run 'roundRobinRated 40' 2000 32 500 200 9999
52.3%     run 'swissOmatic 40' 2000 32 500 200 9999
56.9%     run 'swissSaw 40' 2000 32 500 200 9999


Feel free to download the simulation package and try out your own experiments.


Title: Re: World Championship tournament format
Post by Fritzlein on Jun 13th, 2005, 1:49am
Omar, you are as creative in thinking up tournament formats as you are in thinking up Arimaa moves.  Using the pre-tournament ratings as a handicapping system is extremely interesting, because it unites some very plausible ideas: (1) We want the player crowned as World Champion to actually be the best player in the world, (2) The ratings are a pretty good indicator of who the best player is, and (3) By forcing certain matchups and by rating those games, we can iron out some rating inaccuracy.

Nevertheless, I am obliged to oppose the SwissSaw tournament format because it could make the World Championship tournament almost pointless.  Supposing the tournament were held right now with the top 16 players (by rating) participating.  There would be four rounds of two games each, but I would get one bye every round due to having the top rating, so I would only play four times.  With an RU of 40, I could lose at most 160 points relative to my opponents, but I would be starting 222 points ahead of 99of9.  He would have to pick up the other 63 points in his games against Robinson, and even winning 3 out of 4 wouldn't be enough for that: he would have to win all 4.  So the conditions to win for the various opponents would be thus:  Fritz is World Champion if he wins any one of four games, 99of9 is World Champion only if he wins all eight of his eight games, and everyone else has no chance, so one wonders why they even signed up to play.  And if everyone under 2000 wisely declined to participate, then only 4 of us would be playing, reducing the tournament to only 2 rounds, and making me automatic winner!

But worse than that, the tournament wouldn't be played with the current ratings, it would be played with the ratings achieved by players knowing that the tournament was upcoming.  If I knew that having a ridiculously high rating going into the tournament could make me the automatic World Champion, I would never play humans at all between now and then.  Instead I would beat the bots over and over again by rote formula until I was 700 points above the highest-rated bot, i.e. until my rating was well over 2500.  If we believe ratings at present are somewhat distorted, you have only to announce that SwissSaw will be used in the next World Championship to drive those distortions to ridiculous extremes.

Indeed, I would like to propose a new axiom for the World Championship Tournament format:  The number of wins required to become champion (and likewise the number of losses permitted before elimination) should be the same for all entrants.  Note that I don't want everyone to have an equal chance of winning, and my axiom doesn't require it.  For example, a single-elimination tournament with 16 players paired by the folding method is most likely to be won by the best player and least likely to be won by the worst player, but everyone is on equal footing in the sense that anyone must win four times to win overall, and that one loss knocks anyone out.

I know, Omar, that you encouraged us to assume that problems in the rating system will have been worked out before the next tournament, but I think that that is a dangerously optimistic assumption.  We should put a new rating system in place and have at least a year to test it and refine it before relying so heavily on ratings in determining the World Championship.  And even with a superb rating system the potential for abuse would remain, for example the #3 player intentionally losing repeatedly to his friend the #2 player to pump that friend's rating way up just before the tournament, thus granting an unfair advantage (headstart) in rating relative to the previously #1 player.

Truly, I don't how one could ever make pre-tournament ratings important in determining the World Champion without inviting ratings abuse.  I would therefore strongly discourage not only SwissSaw, but all similar formats as well.  When one considers the possibility of collusion and other forms of ratings manipulation, such formats are likely to disadvantage everyone who doesn't try to gimmick the system.

Title: Re: World Championship tournament format
Post by omar on Jun 15th, 2005, 10:41am
Im pretty confident that the new rating system will be very immune to distortions. But until we set it up and get comfortable with it, I also would not suggest using it in a WC format that relies on ratings so heavily.

Also I think I had mentioned earlier that only H-H games would be used in computing the rating for the WC qualifer. We could also add other restrictions such as only interactive games with an effective time control of 45 sec  per move or more are counted, only games against players with an RU of less than 60 are counted, etc.

Even still as Karl mentioned collusion could still be a problem; more so in the swissSaw type formats than other formats that use ratings. This is probably the biggest road block for a system that relies so heavily on ratings. But perhaps the rating for the WC qualifer should be based on the highest rating one can establish against one or two very strong bots. For example bot_Bomb2005CC. This would definitely prevent collusion and all players would have an equal chance of trying to master the bot and acheive the highest rating they can before the start of the tournament. So depending on how we define the rating for the WC qualifer we may be able to overcome such obstacles and possibly use a format that relies heavily on ratings.

For this year I am thinking of going with the single elimination format, but using fold for the pairings instead of slide and randomly assigning the colors (singleElimFold). Im still open to using another format if anyone wants to implement it and show that it performs better. The pairing part of the floating triple elimination format is not defined in a constructive way, so Im not sure how to implement it; it may require generating all possible pairings to select the best one. We've seen from the comparison of singleElimFold and singleElimSlide that the method of pairing can make a big difference. So the details of the choices made for pairing in the floating triple elimination format can make a difference in how it performs.

If a format that can fit into about 8 weeks or so and is better than singleElimFold isn't found in about a month or so, I am seriously considering going with singleElimFold for this year.

For next year I will consider a format that has a WC qualifer based on some rating scheme like the one mentioned earlier. I think it will also be good to have a "Open Classic" type tournament next year where the focus of the tournament is not so much on selecting the best player, but rather on having a faster more unpredictable tournament which more people can participate in. Im thinking that perhaps the time per move for such a tournament would be about 45 sec and the format would be single elimination with random pairing. The time per move for the WC next year could then be something higher like 2 min; and the format very focused on selecting the best player.


Title: Re: World Championship tournament format
Post by jdb on Jun 15th, 2005, 11:08am

Quote:
We've seen from the comparison of singleElimFold and singleElimSlide that the method of pairing can make a big difference.


Also, if I understand correctly, the current singleElim formats, repair the field after every round. Sometimes the pairings are finalized at the start, as in tennis.

I will give a go at implementing the double elim format. I'll use Fritzlein's posting of the phoosball brackets as a guide.

I am not familiar with the floating triple elim, so I'll leave that to someone else.


Title: Re: World Championship tournament format
Post by jdb on Jun 15th, 2005, 3:56pm
Omar,

I was working on the tournament simulations and I have some questions.

As a test, I created a very simple tournament. It reads in the player list from the tournament state file. It sorts them, and declares the highest rated player the winner. This all works.   :)

I have a couple questions. ???

1) What rating is the number in the tournament state file? Is it the real or the measured rating(ie taken from rating list)?

2) I expected this tournament format to find the best player 100% of the time. This does not happen! Any idea why?

Thanks

Title: Re: World Championship tournament format
Post by omar on Jun 15th, 2005, 7:39pm
Jeff caught me in the gameroom and chatted with me to clear his doubts about these questions. I'll answer them here also for others who may be interested.

1. The ratings passed to the tournament format program are the measured ratings. The true ratings and the measured ratings are generated by the 'run' program. The measured ratings are passed to the tournament format program in case it wants to use them. The true ratings are passed to the 'simGames' program to generate the outcomes for pairs of players.

2. A format that picks the highest rated player will not be 100% since it only sees the measured ratings which are generated by adding a random number to the true ratings. If the tournament format program were given the true ratings then it would be able to pick the correct player 100% of the times.

Title: Re: World Championship tournament format
Post by jdb on Jun 16th, 2005, 8:01am
The double elim seems to be working. It uses fold repairing for every round.

100 trials on 16person tourny

1 43%
2 33%
3 11%
4 4%
5 2%

However there is a fair bit of variance in the results from run to run.

I'll try and convince WinZip to bundle everything up, to send to Omar.



Title: Re: World Championship tournament format
Post by Tarr on Jun 16th, 2005, 11:44pm
FULL DISCLOSURE:  I do not know how to play Arimaa.  I happened upon this forum just by googling "tournament formats" and perusing the links.  I have since read up on the origin and nature of the game, which is quite interesting.

But what I DO know is tournament formats.  The reason I was googling that was that I just finished the latest edit of the UPA's manual of tournament formats.  UPA stands for Ultimate Player's association - the national (USA) organization for a sport some of you may know as Ultimate Frisbee.  I am in charge of coming up with tournament formats for the national championship series.

Tournaments in Ultimate start at the sectional level and proceed to a national championship, with teams being eliminated at each stage.  The number of teams, and the number of teams eliminated at each stage, varies from event to event.  As such, we need many different formats to handle these variables.  The result is a manual with a ton of different formats.

At any rate, the format you seem to be most interested in is 16 teams, 1 champion ("1 advances"). Here is that format.  The basic idea is to use initial round robins in small groups to sort out the seeding and get a balanced bracket.

Players are divided into the following four groups by initial seeding:

Group A: 1, 8, 12, 13
Group B: 2, 7, 11, 14
Group C: 3, 6, 10, 15
Group D: 4, 5, 9, 16

All players in each group play all the other players in each group.  This comprises the first three rounds of play.

After this, the players are re-seeded into the following bracket:

A1vD4
B2vC3

C2vB3
D1vA4

C1vB4
D2vA3

A2vD3
B1VC4

(NOTE: in the format for Ultimate, the 4th place finishers in each group are eliminated, and the first place finishers in each group get a bye to the quarter finals.  Here I show all 16 teams advancing to the bracket, but it is trivial to trim out the 4th place teams and get the bracket we use in Ultimate.

I would actually suggest dropping the 4th place players and giving byes to the top finishers, as this adds some "teeth" to the initial group results, which could otherwise be seen as harmless preliminaries.)

If the winner of the tournament won their group, they will have gone 6-0 overall (5-0 if you have a bye round for group winners, like I suggest).

One obvious question looking at the group assignments is why they are not the more traditional 1,8,9,16 and so on.  The reason is that this switched seeding set gets the right matchups (1v16, 2v15, ... 8v9) in the round of 16, where it is more important.  The quarterfinal matchups (if seeds hold) are 1v7, 4v6, 3v5, and 2v8.  Not quite perfect, but as close as you can get without a group play rematch.  Semis are the perfect 1v4 and 2v3, and finals are of course 1v2.

Anyway, interested to hear any comments, and/or how this format stacks up in computer simulation against other formats of similar length (7 total rounds, 35 or 39 total games).  Comparing to a format that has more games is of course a bit unfair, since any format can be made more robust by adding more games.  For example, in this case, the final four could be made into double elimination, which would add 3 or 4 more games but add quite a bit of robustness.

Adding games like that is not really an option for Ultimate, since games take a while, and fatigue piles up over a weekend event.  But in Arimaa it might be a possibility.

One issue is what to do in the case of a cyclic tie in the initial groups.  For example, you could have three 2-1 players, and one 0-3 player.  You now need to determine who is the top finisher, and so on.  In Ultimate, point differential is used here.  I don't see an equivalent concept in Arimaa, but you could simply use initial ranking to decide it.  Just a thought.

Title: Re: World Championship tournament format
Post by 99of9 on Jun 17th, 2005, 1:11am
Thanks for stopping by!  I hope you have a game or two of arimaa oneday... it's almost as fun as Ultimate ;-).  [Coincidentally these are my two favourite sports]

It definitely seems an interesting tournament format.  I think one big benefit of including pool groups as you do is that you're more likely to get the lower rankings correct.  Since the arimaa world championship has prizes for 2nd and 3rd places, perhaps we need to think about that! (Since so far we have focussed on only the rating of the winner.)

Title: Re: World Championship tournament format
Post by Fritzlein on Jun 17th, 2005, 12:10pm
Thanks a bunch for your comments, Tarr.  We're open to any kind of good tournament format.

I have been fairly staunchly opposed to using round robins because of possible collusion, and possible indifference on the part of players already out of contention.  However, the fact that it is single-elimination after the opening rounds sharply diminishes the impact of any collusion, and the fact that nobody gets eliminated takes care of the indifference problem.  Even if the bottom team were eliminated, an 0-2 player could have something to play for, because a single win will usually advance, and only possibly be eliminated on tiebreaks.


on 06/16/05 at 23:44:59, Tarr wrote:
One issue is what to do in the case of a cyclic tie in the initial groups.  For example, you could have three 2-1 players, and one 0-3 player.  You now need to determine who is the top finisher, and so on.  In Ultimate, point differential is used.


We had one case where we used a tiebreak rather than a playoff, namely the 2005 computer championship, and it worked pretty well, because we all thought the bot who played better won the title.  You can take the number of moves in each game, where winning quickly is an advantage, winning slowly is a disadvantage, losing slowly is an advantage, and losing quickly is a disadvantage.

It is great that initial rounds are used to determine appropriate seeding, because (as we have discussed in numerous places) the ratings can be wildly inaccurate, and shouldn't be too heavily relied on.

Overall, I think your proposal is great except for one BIG problem: The number of participants in the Arimaa World Championship is not known in advance.  We may have 11 participants or 17.  Whatever format we choose needs to be flexible enough to handle this fairly.  How would suggest we deal with this special need?

Title: Re: World Championship tournament format
Post by jdb on Jun 17th, 2005, 12:41pm
An interesting tournament format. One good point, is every team is guaranteed four games. That way everyone feels they got their money's worth.

One potential problem with round robins is, what to do if a game is defaulted or cannot be played.

One league I am in uses a four team double round robin for the playoffs. The top two teams then play a single game for the championship. This year one of the games was cancelled due to a snow storm. There was no way to reschedule the game.

The league president decided the game was a double forfeit (both teams getting zero points). Some people felt it should have been treated as a tie game (each team getting one point). There was alot of controversy, since, as it turned out, this game decided who finished second and third.

For computer simulations, the probability of the best team winning would be slightly better than a straight single elimination tournament. The round robin phase effectively improves the quality of the seeding going into the elimination phase. If the ratings of the teams are well known in advance then the round robin phase only improves the accuracy a little. However, if the team's ratings are not known in advance, then it should improve over single elimination. (Just my 2 cents)



Title: Re: World Championship tournament format
Post by omar on Jun 18th, 2005, 6:58am
Thanks for sharing this Tarr. It seems like a pretty easy format to implement (for the 16 player case). I'll simulate this format and let you know how it compares with the others.

Title: Re: World Championship tournament format
Post by omar on Jun 18th, 2005, 12:41pm
I decided to change the tournament simulation program so that it generates the true ratings first (to be in the range of the 4th argument) and set the measured ratings by adding the rating inaccuracies to the true ratings. The new program is called run2.

Initially I didn't think it mattered which was done first and since we could limit enteries in a tournament based on measured ratings and not true ratings, I had decided to generate the measured ratings first and set the true ratings from them. But Im finding that a consequence of doing it this way is that as the rating inaccuracies increase the performance of most formats also increases. Even the performance of singleElimRand increases. This is becuase the range of the true ratings increases with rating inaccuracies. It makes more sense if the performance of most formats decrease with increase in rating inaccuries (and the performance of singleElimRand stays constant). Changing it so that the true ratings are generated first and measured ratings set from them does this.

I also changed it so that if you pass the string 'show' to run2 in place of the number of trials, it runs one trial which shows what is happening at each round; pausing for an enter between rounds.

I need to rerun the previous simulations with run2.


Title: Re: World Championship tournament format
Post by 99of9 on Jun 18th, 2005, 11:30pm
Oh good - I think this is definitely a better way to do it.

Title: Re: World Championship tournament format
Post by omar on Jun 19th, 2005, 5:06pm
Here's the results of running the simulations using the new run program (run2).

Each format was run for 2000 trials and the results averaged. The number of players was fixed at 16, the true rating range was set to 500 points the draw ratio was set to 9999. In one set of simulations the rating inaccuracies were set to 50 and in another set they were set to 200.

With inaccuracies set to 50:

run2 'formats/doubleElimFold' 2000 16 500 50 9999
 1   34.6%
 2   23.4%
 3   16.3%

run2 'formats/roundRobin' 2000 16 500 50 9999
 1   32.2%
 2   22.7%
 3   16.1%

run2 'formats/roundRobinDouble' 2000 16 500 50 9999
 1   33.4%
 2   23.8%
 3   16.4%

run2 'formats/roundRobinRated 10' 2000 16 500 50 9999
 1   64.2%
 2   23.6%
 3    8.9%

run2 'formats/singleElimFold' 2000 16 500 50 9999
 1   32.3%
 2   21.7%
 3   15.8%

run2 'formats/singleElimOrd' 2000 16 500 50 9999
 1   20.2%
 2   14.4%
 3   11.8%

run2 'formats/singleElimRand' 2000 16 500 50 9999
 1   25.6%
 2   18.4%
 3   15.8%

run2 'formats/singleElimSlide' 2000 16 500 50 9999
 1   24.7%
 2   19.9%
 3   15.6%

run2 'formats/swissKnife' 2000 16 500 50 9999
 1   65.3%
 2   23.9%
 3    7.7%

run2 'formats/swissOmatic 10' 2000 16 500 50 9999
 1   65.1%
 2   23.6%
 3    8.0%

run2 'formats/swissSaw 10' 2000 16 500 50 9999
 1   65.0%
 2   22.9%
 3    8.3%

run2 'formats/upa16' 2000 16 500 50 9999
 1   27.2%
 2   22.1%
 3   15.2%


With inaccuries set to 200:

run2 'formats/doubleElimFold' 2000 16 500 200 9999
 1   31.9%
 2   22.4%
 3   17.3%

run2 'formats/roundRobin' 2000 16 500 200 9999
 1   30.0%
 2   23.3%
 3   17.4%

run2 'formats/roundRobinDouble' 2000 16 500 200 9999
 1   32.4%
 2   22.7%
 3   15.6%

run2 'formats/roundRobinRated 40' 2000 16 500 200 9999
 1   40.6%
 2   24.8%
 3   16.0%

run2 'formats/singleElimFold' 2000 16 500 200 9999
 1   28.6%
 2   21.6%
 3   16.1%

run2 'formats/singleElimOrd' 2000 16 500 200 9999
 1   23.4%
 2   15.9%
 3   13.1%

run2 'formats/singleElimRand' 2000 16 500 200 9999
 1   25.1%
 2   19.4%
 3   14.7%

run2 'formats/singleElimSlide' 2000 16 500 200 9999
 1   25.9%
 2   18.9%
 3   15.0%

run2 'formats/swissKnife' 2000 16 500 200 9999
 1   35.4%
 2   24.1%
 3   16.9%

run2 'formats/swissOmatic 40' 2000 16 500 200 9999
 1   42.5%
 2   25.4%
 3   15.6%

run2 'formats/swissSaw 40' 2000 16 500 200 9999
 1   42.0%
 2   25.3%
 3   15.6%

run2 'formats/upa16' 2000 16 500 200 9999
 1   26.9%
 2   20.8%
 3   15.7%


Title: Re: World Championship tournament format
Post by omar on Jun 19th, 2005, 5:34pm
The doubleElimFold format was contributed by Jeff Bacher. The upa16 format is my implementation of the format Tarr mentioned earler.

First thing I noticed is that the formats that don't make any use of the ratings (or use them only for tie breaks) have about the same performace when the inaccuracy range is increased from 50 to 200. These formats include:
singleElimRand, roundRobin, roundRobinDouble, and upa16.

The other thing I noticed is that formats which make use of ratings usually out perform the formats that don't even when the rating inaccuracies were higher.

The doubleElimFold seems to perform slightly better than the singleElimFold; about 2 or 3 percent. But it will run for about 11 or 12 weeks compared to about 6 for singleElimFold.




Title: Re: World Championship tournament format
Post by omar on Jun 19th, 2005, 5:36pm
If you want to try out any format to see how it works, just run the 'run2' program with the format and 'show' as the number of trials. For example:

run2 format/singleElimRand show


Title: Re: World Championship tournament format
Post by Tarr on Jun 23rd, 2005, 10:37am
Wow, lots to comment on here.  I'll start with a comment on just the simulation results.

First, once again, it's important to keep in mind the number of total games when comparing two formats.  I can easily add robustness to a format simply by adding more games strtegically.  So when comparing the simulation results, keep that in mind "swissOmatic" and "swissSaw" and "round robin" formats take 15-16 rounds of play for a 16 player tournament.  It would be pretty shocking if a format like the upa one (7 rounds of play) could produce equally robust results.

(It's also worth noting that unless there are special provisions which I am not aware of, both the swiss methods will produce a ton of rematches, which hardly seems optimal).

Comparing the upa format to double elimination is a bit more fair, as the double elimination format is only 9 rounds.  Still, that's more rounds to work with = more robust results.  The interesting results to me are:

- A comparison of the single elimination methods shows that the "fold" is the more robust that the "slide", or than random parings.  This is not surprising at all, but is nice to confirm.

- The UPA format does not do any better than straight single elimination.  This shows (to me) that the reshuffling of the lower seeds doesn't really help us much in avoiding later upsets.  This clearly demonstrates that if we want robustness in a format, we need extra games at the _end_ of the format to make sure we sort the top players correctly.  An obvious simple approach would be to have the four semifinalists play a double elimination for the top three spots.

I think a good approach would be to first decide how many games you are willing to give each player at maximum, and how many rounds the tournament may last at maximum.  Once I know this, I can suggest a format that I think would work well for you.

Title: Re: World Championship tournament format
Post by Tarr on Jun 23rd, 2005, 10:42am
One more quick comment:

While I suggested the "16 teams, 1 advances" format, we actually have a whole manual of formats for a variety of permutations.  So I can easily draw on that to suggest something for, say, 18 players, three "advance".  I say three because I am now aware that you care about not just who finishes first, but also second and third.  So the one team advances format is probably not the best, since it is primarily concerned with crowning the champion.

But as I said above, there's no point in me suggesting specifics until I have a better sense of the maximum number of games allowed.

Title: Re: World Championship tournament format
Post by omar on Jun 26th, 2005, 9:31am
In the 2004 WC we had 18 players and in the 2005 WC we had 10 players. The number of players can vary quite a bit. Since none of us are professional Arimaa players :-) and have other commitments we limit the rate of the games to just one game per player per week. We also try to avoid simultanious games so that everyone has a chance to watch the games of other players. There is also the constraint that we want the tournament to finish in about two months due to other events coming up. It could be a little longer than 8 rounds, but probably not more than 12.

Although second and third place is recognized in our WC, getting first place correct I think is the most important factor for a WC type tournament.

These experiments have convinced me that incorporating a rating system into the tournament significantly improves its performance over a similar version that does not. For example roundRobinRated is significantly better than roundRobinDouble even though double has twice as many rounds and twice as many games.


Title: Re: World Championship tournament format
Post by omar on Jun 27th, 2005, 6:13pm
I ran the simulations on some more formats. Here are the results; Im also including results from the previous simulations for comparison and organizing it in a table so that it's easier to see:
formatinacc=50 inacc=200inacc=400
randomSelection (0)6.3% 6.3%6.3%
roundRobin (15)32.2% 30.0% 31.7%
roundRobinDouble (30)40.8% 39.9%40.8%
roundRobinRated inacc/5 (15)64.2% 40.6% ? %
roundRobinRatedEqual inacc/5 (15)35.3% 35.5% ? %
roundRobinRatedRank inacc/5 (15)46.0% 38.5% ? %
singleElimRand (4)25.6% 25.1% ? %
singleElimOrd (4)20.2% 23.4% ? %
singleElimSlide (4)24.7% 25.9% ? %
singleElimFold (4)26.8% 28.6%24.4%
swissKnife (0)65.3% 35.4%6.0%
swissSaw inacc/5 (16)65.0% 42.0% ? %
swissOmatic inacc/5 (16)65.1% 42.5%31.1%
swissOmaticEqual inacc/5 (16)38.8% 38.2% ? %
swissOmaticRank inacc/5 (16)43.1% 40.8% ? %
doubleElimFold (11.6)34.6% 31.9% ? %
upa16 (7)27.2% 26.9% ? %
floatDoubleElim (7.7)30.9% 31.6%29.4%
floatTripElim (10.3)35.3% 34.7%33.5%
floatTripElim2 (11.4)35.7% 34.9%34.4%
floatTripElimRand (11.0)35.2% 33.9%33.5%
floatQuadElim (13.4)35.1% 36.3%35.1%

The simulations were run as follows:

run2 'formats/roundRobin' 2000 16 500 50 9999

Each format was run for 2000 trials, with 16 players, true rating range of 500, measured rating inaccuaracy of 50 or 200, and a draw ratio of 1:9999.

The number in parenthesis after the format name is the average number of rounds the format requires.

Title: Re: World Championship tournament format
Post by omar on Jun 27th, 2005, 6:58pm
The new formats are:

roundRobinRatedEqual, roundRobinRatedRank, swissOmaticEqual and swissOmaticRank.

After thinking about the points Karl raised regarding the use of players inital measured ratings in formats such as swissSaw, swissOmatic and roundRobinRated, I wanted to see what would happen if they did not use the players inital ratings, but still used a rating system to determine the winner.

So roundRobinRatedEqual and swissOmaticEqual, are the same as their original corresponding formats except that they set the initial ratings of all the players to the same value. These formats continued to preform better than formats that did not use rating systems. Not nearly as good as the original formats, but they eliminated the problems that can arise from relying heavily on the players initial ratings (see Karl's posting of Jun 13th). For example swissOmaticEqual performed better than roundRobin, roundRobinDouble, singleElimFold, doubleElimFold and upa16. Both swissOmaticEqual and roundRobinRatedEqual are 100% fair formats, in that they do not favor any player over another.

The next thing that I wanted to try was to rank the players based on their initial ratings and use some rank based ratings as the initial ratings for the rating system. The lowest rated player was given a ranked rating of 2000, the next higher rated player was given a ranked rating of 2000+delta, the next higher rated received a ranked rating of 2000+2*delta and so on. If two players had the same ratings their ranked ratings would also be the same. The value used for delta was the rating inaccuracy divided by 15. I tried various values for delta and found this produced a performance that was about midway between the original format and the Equal variant.  These formats are called roundRobinRatedRank and swissOmaticRank. These formats are a bit unfair in that they do favor higher rated players. However, all players will still have some chance of winning if they have a good performance at the event and a player will not be able to gaurentee themselves a win by getting their rating way above the competition. So they don't suffer as much as the orignal formats, but still have a performance that is significantly better than other formats.


Title: Re: World Championship tournament format
Post by jdb on Jun 27th, 2005, 7:34pm
I ran one "show" version of the swissOMaticEqual variant to have a look at how the pairings went.

It is an interesting format,  that shows alot of promise.

Here is the first part of the ts file for one run:

# Round 1
*%
* Ratings as of round 1
player p1 2000
player p2 2000
player p3 2000
player p4 2000
player p5 2000
player p6 2000
player p7 2000
player p8 2000
player p9 2000
player p10 2000
player p11 2000
player p12 2000
player p13 2000
player p14 2000
player p15 2000
player p16 2000
*
pair p9 p1 winner p9
pair p10 p2 winner p10
pair p3 p11 winner p11
pair p12 p4 winner p12
pair p13 p5 winner p13
pair p14 p6 winner p14 p6
pair p15 p7 winner p15
pair p8 p16 winner p8
* next round 2

# Round 2
pair p1 p16 winner p16
pair p2 p15 winner p15
pair p3 p14 winner p3
pair p4 p13 winner p4
pair p12 p5 winner p5
pair p6 p11 winner p11
pair p7 p10 winner p10
pair p8 p9 winner p8
* next round 3


Two questions:

1) Which two players get dropped?
2) And why?


Answers)
1) p1 & p7 got dropped
2) p7 and p2 have IDENTICAL records. They both lost to p15 and p10. What is the criteria to decide in this case?

I honestly only ran the simulation once, and this popped up!


Title: Re: World Championship tournament format
Post by omar on Jun 28th, 2005, 7:33am
I changed the swissOmaticEqual format to show the list of players with their ratings before the last two are dropped.

When players have the same ratings the order within this range of players is the same as the order in which they were in the last listing of ratings or if that is not available then the order in which they were first given.

It's a bit of bad luck for the last two players on the list, because you can have a situation like this:

last 2 will be dropped
p2 2020
p5 2019
p9 2019
p4 2019
p1 2001
p14 2001
p11 2001
p10 2000
p15 2000
p3 1999
p6 1999
p16 1999
p8 1981
p12 1981
p13 1981
p7 1980

But the players that get dropped have lost two consecutive games.


Title: Re: World Championship tournament format
Post by Tarr on Jul 5th, 2005, 6:56pm

on 06/26/05 at 09:31:50, omar wrote:
In the 2004 WC we had 18 players and in the 2005 WC we had 10 players. The number of players can vary quite a bit.


Well in that case, it's useful to have an abaptable format.  More on this later.


on 06/26/05 at 09:31:50, omar wrote:
There is also the constraint that we want the tournament to finish in about two months due to other events coming up. It could be a little longer than 8 rounds, but probably not more than 12.


Well, then... doesn't that rule out most of the formats you're running simulations on?  The swiss and round robin formats you're looking at all take more than 12 rounds, as long as there are more than 13 players.

At the risk of sounding like a broken record - if you're comparing the accuracy of formats with drastically different numbers of games, you're comparing apples and oranges.


on 06/26/05 at 09:31:50, omar wrote:
Although second and third place is recognized in our WC, getting first place correct I think is the most important factor for a WC type tournament.


Understood.


on 06/26/05 at 09:31:50, omar wrote:
[...]experiments have convinced me that incorporating a rating system into the tournament significantly improves its performance over a similar version that does not. For example roundRobinRated is significantly better than roundRobinDouble even though double has twice as many rounds and twice as many games.


Agreed, using rankings amounts to using more information, which is generally a good thing.

Let me take another crack at this, now that I have a little better understanding of the constraints.  There will be some obvious similarities to "upa16" but I've modded it up a bit.

The following format can work for any number of players 12 or higher, although it works best with 16 or more.

Stage one- group play (2-5 games).

Players are seeded into four groups based on their rankings.  I've listed player rankings up to 24, it should be obvious how to rank players beyond that.  If you have fewer players, the group is smaller.

Group A: 1, 8, 11, 14, 20, 21
Group B: 2, 7, 12, 13, 19, 22
Group C: 3, 6, 9, 16, 18, 23
Group D: 4, 5, 10, 15, 17, 24

After group play, players are ranked within each group.  All 4th, 5th, and 6th place players are eliminated.

Stage 2: crossover games (1 game)

The following crossover games are then played.  The first two are for seeding purposes, while the others are elimination games:

1) A1 vs. C1
2) B1 vs. D1

3) A2 vs. C3
4) B2 vs. D3
5) C2 vs. A3
6) D2 vs. B3

The players are then ranked as follows:

1)  The winner of A1 vs. C1
2)  The winner of B1 vs. D1
3)  The loser of A1 vs. C1
4)  The loser of C1 vs. D1
5)  The higher ranked of (A2 vs. D3 winner) and (D2 vs. A3 winner)
6)  The higher ranked of (C2 vs. A3 winner) and (A2 vs. C3 winner)
7)  The lower ranked of (A2 vs. D3 winner) and (D2 vs. A3 winner)
8 )  The lower ranked of (C2 vs. A3 winner) and (A2 vs. C3 winner)

Stage 3: modified elimination play (8 games)

These 8 players then play a double elimination bracket.  All games are 2-game series, where the higher ranked player gets draw odds.  EXCEPTION: the finals is a 4-game series.

So, (winner takes the high seeds), the tournament proceeds as such:

round 1 (all games are 2 game series):

1v8
4v5
3v6
2v7

round 2 (all games are 2 game series):

1v4
2v3

5v8 (loser of series is eliminated)
6v7 (loser of series is eliminated)

round 3
1v2 (4 game series - winner is 1st, loser is 2nd)
3v5 (2 game series - loser is eliminated)
4v6 (2 game series - loser is eliminated)

round 4
3v4 (2 game series - winner is 3rd place)

With 16 players, the 1st, 2nd, 3rd, and 4th place players play 12 games total.  It's more or less depending on the number of initial players, which changes the size of the groups.

I'd be interested to see how that compares to other formats which use a similar number of games.

Title: Re: World Championship tournament format
Post by omar on Jul 5th, 2005, 10:11pm

on 07/05/05 at 18:56:34, Tarr wrote:
Well, then... doesn't that rule out most of the formats you're running simulations on?  The swiss and round robin formats you're looking at all take more than 12 rounds, as long as there are more than 13 players.


Yes, I ran the simulations on these longer formats just for comparison.


Quote:
Agreed, using rankings amounts to using more information, which is generally a good thing.


Not rankings, but ratings. Actually roundRobinRatedEqual and swissOmaticEqual are not using any more information, they are just using a different way of keeping a score. For example the winner of a round robin is typically defined to be the player who wins the most games. The method of scoring is basically 1 point for win, 0 for loss and 1/2 point for a draw. We could easily change the scoring method to say that if you win against a player you not only get the 1 point but also 1/5 of the points they have accumulated. If we change the method of scoring to something like that we have not added any external information that we already did not have. Adding a rating system and keeping the inital ratings of all players the same basically amounts to changing the way we keep score. We are still using the same win/loss results to compute the score.

We can also add additional external information by making use of the ratings the players already have. Such formats perform extrealy well. roundRobinRated does this. Doing this significantly improves the performance of a tournament in picking the true best player. But it runs the risk of players manipulating their ratings prior to the start of the tournament. roundRobinRatedRanked tries to minimize this by resetting the initial ratings so that they are equally spaced based on the rank from the inital ratings.


Title: Re: World Championship tournament format
Post by Tarr on Jul 6th, 2005, 1:22pm

on 07/05/05 at 18:56:34, Tarr wrote:
1)  The winner of A1 vs. C1
2)  The winner of B1 vs. D1
3)  The loser of A1 vs. C1
4)  The loser of C1 vs. D1
5)  The higher ranked of (A2 vs. D3 winner) and (D2 vs. A3 winner)
6)  The higher ranked of (C2 vs. A3 winner) and (A2 vs. C3 winner)
7)  The lower ranked of (A2 vs. D3 winner) and (D2 vs. A3 winner)
8)  The lower ranked of (C2 vs. A3 winner) and (A2 vs. C3 winner)


Uh... yeah, that's not quite right.

1)  The winner of A1 vs. C1
2)  The winner of B1 vs. D1
3)  The loser of A1 vs. C1
4)  The loser of B1 vs. D1
5)  The higher ranked of (A2 vs. C3 winner) and (C2 vs. A3 winner)
6)  The higher ranked of (D2 vs. B3 winner) and (B2 vs. D3 winner)
7)  The lower ranked of (A2 vs. C3 winner) and (C2 vs. A3 winner)
8)  The lower ranked of (D2 vs. B3 winner) and (B2 vs. D3 winner)


Quote:
We can also add additional external information by making use of the ratings the players already have. Such formats perform extrealy well. roundRobinRated does this. Doing this significantly improves the performance of a tournament in picking the true best player. But it runs the risk of players manipulating their ratings prior to the start of the tournament. roundRobinRatedRanked tries to minimize this by resetting the initial ratings so that they are equally spaced based on the rank from the inital ratings.


How about a Bayesian approach?  Let everyone come in with a "tournament" ranking equal to their true ranking, but give it only a small weight (maybe two or three games worth).  As they play in the tournament, their rating will change.

Title: Re: World Championship tournament format
Post by omar on Jul 7th, 2005, 10:33pm

on 07/06/05 at 13:22:11, Tarr wrote:
How about a Bayesian approach?  Let everyone come in with a "tournament" ranking equal to their true ranking, but give it only a small weight (maybe two or three games worth).  As they play in the tournament, their rating will change.


Yes, this is exactly what the RoundRobinRank and SwissOmaticRank formats are doing.

Title: Re: World Championship tournament format
Post by omar on Jul 8th, 2005, 2:06pm

on 07/06/05 at 13:22:11, Tarr wrote:
Uh... yeah, that's not quite right.

1)  The winner of A1 vs. C1
2)  The winner of B1 vs. D1
3)  The loser of A1 vs. C1
4)  The loser of B1 vs. D1
5)  The higher ranked of (A2 vs. C3 winner) and (C2 vs. A3 winner)
6)  The higher ranked of (D2 vs. B3 winner) and (B2 vs. D3 winner)
7)  The lower ranked of (A2 vs. C3 winner) and (C2 vs. A3 winner)
8)  The lower ranked of (D2 vs. B3 winner) and (B2 vs. D3 winner)


Tarr I don't think I will be able to try this out soon. Things have gotten a bit busy for me again. If someone else wants to code this and send it to me I'll include it in the results.

Also if anyone is trying out another format, please post the results soon. I've got to start on the pages for the up coming events (including the human WC) soon.


Title: Re: World Championship tournament format
Post by Fritzlein on Jul 26th, 2005, 4:14pm

on 06/11/05 at 10:17:59, omar wrote:
I have not tried out the double or triple eliminations formats proposed by Toby and Karl yet since they are not easy to implement.


I thought about this a bit, and I realized I also can't think of a way to do the pairing for my brand of triple-elimination without looking at all possible pairings and seeing which is least bad.  For a tournament of even moderate size, looking at all possible pairings is prohibitively time-consuming.

Just so that my triple elimination can be compared with the other formats at all, let me present an algorithm for triple-elimination.  It won't meet my ideal criteria, but it can at least run.

1) Everyone plays until they have lost three times.  At the end of each round, any player who has lost for the third time is eliminated.

2) At the beginning of each round, if an odd number of players remain, a bye must be assigned.  The bye goes to the player who has recieved the fewest byes so far, with ties broken in favor fewest losses so far, and further ties broken in favor of highest pre-tournament rating.

3) The players who don't get a bye are paired iteratively as follows: Select the unpaired player with the fewest losses (ties broken by highest pretournament rating), and pair him with the player in the field he has played the least number of times, with ties broken in favor of having fewest losses, and further ties broken in favor of lower pre-tournament rating.

If all favorites win in every round, the pairings in a 16-player tournament will be

1v16, 2v15, 3v14, 4v13, 5v12, 6v11, 7v10, 8v9
1v8, 2v7, 3v6, 4v5, 9v16, 10v15, 11v14, 12v13
1v4, 2v3, 5v11, 6v12, 7v9, 8v10, 13v15, 14v16
1v2, 3v8, 4v7, 5v6, 9v14, 10v13, 11v12
1bye, 2v5, 3v4, 6v10, 7v11, 8v9
1v3, 2v8, 4v6, 5v7
2bye, 1v5, 3v4
3bye, 1v2
1bye, 2v3
1v2

The only avoidable glitch is that 8 and 9 wouldn't have had to play for the second time in the fifth round, but that's the price for algorithmic efficiency.

If all favorites win in every round, the pairings in a 15-player tournament will be

1bye, 2v15, 3v14, 4v13, 5v12, 6v11, 7v10, 8v9
2bye, 1v8, 3v7, 4v6, 5v15, 9v14, 10v13, 11v12
3bye, 1v5, 2v4, 6v10, 7v11, 8v15, 9v13, 12v14
1v3, 2v9, 4v8, 5v7, 6v12, 10v11
1v2, 3v6, 4v5, 7v9, 8v10
1v4, 2v3, 5v8, 6v7
1v6, 2v5, 3v4
1bye, 2v3
1v2
1v2

In this case, no pairing occurred more than once until the very end.  Everyone got to play at least three games, against three different opponents, and many close pairing occurred.

I conclusion, I still love my triple elimination, even when it has to be impaired slightly in the name of efficiency.  If someone would code and test it, I bet it would do as well at picking a winner as any format which makes no use (apart from seeding) of pre-tournament ratings.

Title: Re: World Championship tournament format
Post by omar on Jul 27th, 2005, 7:47am
Thanks for describing the tournament this way Karl, it makes it much more easier to implement.

Im assuming the color assignment would be random.

I probably won't be able to get to this for a few weeks. If someone has a litte time maybe they can take a crack at it. It would definitely be fun to code this up.


Title: Re: World Championship tournament format
Post by Fritzlein on Jul 27th, 2005, 11:45am

on 07/27/05 at 07:47:32, omar wrote:
Thanks for describing the tournament this way Karl, it makes it much more easier to implement.

Im assuming the color assignment would be random.


I'm glad this version of "floating triple elimination" is at least implementable.  It's amusing that, although I thought I kicked off this discussion way earlier than necessary, you still may not be able to change the format in time for the next World Championship.  :-)  At least the ideas will be in play for the following year, though.

As for color assignment, that didn't factor into your testing of other schemes, did it?  Fortunately color assignment doesn't seem to have a large effect on winning percentage, according to the database statistics.  Still, for the sake of fairness, I would suggest that, within each pairing, whoever has played Gold fewer times so far in the tournament should get to play Gold for that game, with ties broken randomly.  That won't entirely even out color assignments, but will be more fair than, say, "higher rated player always gets Silver".  One reason to keep it even is that we aren't even positive which color has the advantage!

Title: Re: World Championship tournament format
Post by 99of9 on Jul 27th, 2005, 4:58pm

on 07/27/05 at 11:45:16, Fritzlein wrote:
One reason to keep it even is that we aren't even positive which color has the advantage!


Another is to keep people like me happy who are convinced that they are worse with a particular colour. :-)

Title: Re: World Championship tournament format
Post by omar on Jul 31st, 2005, 1:07am
I was trying to learn Ruby and I figured this would be a good chance to try implementing Karl's floating triple elimination format.

Here's a sample run to check if it is implemented correctly.

##### Player List #####
player  true  measured
p1 1796 1804
p2 1533 1536
p3 1815 1788
p4 1780 1811
p5 1766 1748
p6 1887 1873
p7 1894 1859
p8 1842 1812
p9 1887 1929
p10 1755 1724
p11 1884 1881
p12 1851 1867
p13 1577 1584
p14 1733 1761
p15 1580 1605
p16 1626 1672
#######################

* p9 1929 0
* p11 1881 0
* p6 1873 0
* p12 1867 0
* p7 1859 0
* p8 1812 0
* p4 1811 0
* p1 1804 0
* p3 1788 0
* p14 1761 0
* p5 1748 0
* p10 1724 0
* p16 1672 0
* p15 1605 0
* p13 1584 0
* p2 1536 0

* Round 1
pick p9 p2 winner p9
pick p11 p13 winner p11
pick p15 p6 winner p6
pick p12 p16 winner p16
pick p10 p7 winner p10
pick p5 p8 winner p8
pick p14 p4 winner p4
pick p3 p1 winner p3
===============================

* p9 1929 0
* p11 1881 0
* p6 1873 0
* p12 1867 1
* p7 1859 1
* p8 1812 0
* p4 1811 0
* p1 1804 1
* p3 1788 0
* p14 1761 1
* p5 1748 1
* p10 1724 0
* p16 1672 0
* p15 1605 1
* p13 1584 1
* p2 1536 1

* Round 2
pick p16 p9 winner p9
pick p10 p11 winner p11
pick p6 p3 winner p3
pick p4 p8 winner p4
pick p2 p12 winner p12
pick p7 p13 winner p7
pick p1 p15 winner p15
pick p5 p14 winner p5
===============================

* p9 1929 0
* p11 1881 0
* p6 1873 1
* p12 1867 1
* p7 1859 1
* p8 1812 1
* p4 1811 0
* p1 1804 2
* p3 1788 0
* p14 1761 2
* p5 1748 1
* p10 1724 1
* p16 1672 1
* p15 1605 1
* p13 1584 2
* p2 1536 2

* Round 3
pick p3 p9 winner p9
pick p4 p11 winner p11
pick p6 p16 winner p16
pick p12 p15 winner p12
pick p7 p5 winner p5
pick p8 p10 winner p8
pick p2 p1 winner p2
pick p13 p14 winner p14
===============================

* p9 1929 0
* p11 1881 0
* p6 1873 2
* p12 1867 1
* p7 1859 2
* p8 1812 1
* p4 1811 1
* p3 1788 1
* p14 1761 2
* p5 1748 1
* p10 1724 2
* p16 1672 1
* p15 1605 2
* p2 1536 2

* Round 4
pick p9 p11 winner p9
pick p5 p12 winner p12
pick p8 p16 winner p8
pick p4 p3 winner p4
pick p2 p6 winner p6
pick p15 p7 winner p7
pick p14 p10 winner p10
===============================

* p9 1929 0
* p11 1881 1
* p6 1873 2
* p12 1867 1
* p7 1859 2
* p8 1812 1
* p4 1811 1
* p3 1788 2
* p5 1748 2
* p10 1724 2
* p16 1672 2

* Round 5
pick p9 BYE winner p9
pick p11 p8 winner p8
pick p12 p4 winner p4
pick p6 p10 winner p6
pick p16 p7 winner p7
pick p3 p5 winner p5
===============================

* p9 1929 0
* p11 1881 2
* p6 1873 2
* p12 1867 2
* p7 1859 2
* p8 1812 1
* p4 1811 1
* p5 1748 2

* Round 6
pick p4 p9 winner p9
pick p8 p7 winner p7
pick p11 p5 winner p11
pick p6 p12 winner p12
===============================

* p9 1929 0
* p11 1881 2
* p12 1867 2
* p7 1859 2
* p8 1812 2
* p4 1811 2

* Round 7
pick p9 p8 winner p8
pick p7 p11 winner p7
pick p12 p4 winner p12
===============================

* p9 1929 1
* p12 1867 2
* p7 1859 2
* p8 1812 2

* Round 8
pick p7 p9 winner p9
pick p8 p12 winner p8
===============================

* p9 1929 1
* p8 1812 2

* Round 9
pick p8 p9 winner p9
===============================

result p9 1
===============================

I'll update the table on page 6 with the results of running 2000 trials.


Title: Re: World Championship tournament format
Post by omar on Jul 31st, 2005, 1:33am
Karl's floatingTripElim format is definitely the best of the elimination formats and even better than a round robin. Way better than the singleElimSlide format we used last year.

The swiss formats I came up with (not to be confused with real swiss formats) are better than the FTE, but they require 16 rounds if there are 16 players. The FTE requires only 10 rounds with 16 players. My swiss formats also depend on knowing the inaccuracy of the rating system.

This looks like a very good format to use for WC type tournaments in the future.


Title: Re: World Championship tournament format
Post by Fritzlein on Jul 31st, 2005, 10:20am

on 07/31/05 at 01:07:04, omar wrote:
Here's a sample run to check if it is implemented correctly.


The sample run of floating triple elimination looks like it was implemented correctly.  Also, for that particular run, it performed beautifully.  Nobody had to play the same opponent twice until the very last round!  That's exactly the way I had envisioned a tournament running, with a bunch of quality games for everyone.

I'm glad it also picks the best player a high percentage of the time, and that you are considering it for the upcoming championship.  Of course, you know how biased I am, since I started this whole thread by advocating this format.  :-)  Thanks for at least taking the suggestion seriously.

Title: Re: World Championship tournament format
Post by omar on Jul 31st, 2005, 2:10pm
I found an error in my roundRobinDouble format. For even number of players it was running just like a single round robin. The one test that I did to check it initially used an odd number of players and I assumed it was working right.

I've run the simulations for this format again and updated the table on page 6.

Also I've added the number of rounds each format takes.

Title: Re: World Championship tournament format
Post by 99of9 on Jul 31st, 2005, 6:31pm
Hi Omar,

Since you've already coded Fritz's Floating-Triple-algorithm, perhaps it would not be hard for you to modify it to try the Floating-Double??  For easiest comparison I'd propose exactly the same rules as Fritz, but change the 3-loss condition (1) into a 2-loss condition.

Obviously it will not perform as accurately, but it will use less games.  As Tarr points out, this is a common trade off, and the best choice depends on what the tournament timetable can handle.  If it's not too hard I'd add a column to your table on page six stating the average number of games used per tournament in each format (for a particular standard number of players).

If you really want you could try a Quadruple-Elim ... etc just to get a feel for how fast it converges on perfect :-).

Title: Re: World Championship tournament format
Post by 99of9 on Jul 31st, 2005, 6:37pm
Oh Sorry, I just read the table again and saw that as always you are one step ahead of me.  You have already done Floating-Quadruple.

I'm still interested in Floating double, because I think it is better than the folding-double Fritz explained earlier.

You've also put in the average number of rounds, which is interesting, but I think average number of games would also be of interest.

Title: Re: World Championship tournament format
Post by omar on Jul 31st, 2005, 9:03pm
Hi Toby, I didn't see your postings till just now. But I guess we are thinking alike since I did already try the Quad and Double formats.

It does seem like there is something "right" about triple elimination. The jump in percentage between double and triple elimination is significant and worth the extra rounds. But when going from triple to quadruple the increase in percentage is not much and not worth adding three more rounds.

I also tried a slight variation which I called floatTripElim2. It is identical to the original except that when selecting the second player in the pairing we chose a player having the most loses instead of least loses. It seems to perform about the same as the original, but adds an additional round.

I also compared selected tournament formats using an inaccuracy of 400. The floatingTripElim holds up very well at these levels. The formats which relied very heavily on accurate ratings which had performed better than FTE at lower inaccuracy levels start to drop rapidly at these levels. Most noticable is swissKnife which basically becomes randomSelection.

I also compared the FTE against a double RR for the 4 player case (as in the computer championships). Here's the results:

Rating inacc  50        200          400
FTE (6.7)     66.0%    62.9%    65.4%
DRR (6.0)    55.2%    56.9%    56.4%

It's almost 10% better with only about one extra round. I think FTE is also a good format for the computer championship. It eliminates having to break ties based on moves per game as we encounted in the last tournament. Also it can scale up nicely by keeping the number of rounds fairly low if we allow for more players.

In evaluating various tournament formats we did not try out real swiss style tournaments.
http://en.wikipedia.org/wiki/Swiss_tournament
Within brackets of players with equal score they use the slide method for pairing the upper and lower half. Also the tournament runs for a fixed number of rounds and no players are eliminated. After all the rounds are over the player who won the most games (actually scored the highest) wins. The number of rounds is long enough so that one player can acheive a higher score than the rest of the field (about 2*log(N)/log(2)). Ties are usually broken by sum of opponents scores. I will venture to guess that the performance of a real swiss style tournament will be about equal to the floatingDoubleElim format. Also a swiss tournament lacks the climax ending of a knockout style format.

So in many ways the FTE format looks like a good option.

Title: Re: World Championship tournament format
Post by Fritzlein on Jul 31st, 2005, 10:48pm
The advantage/disadvantage of traditional Swiss is that everyone plays the same number of rounds.  In high school chess tournaments, it was great to know that, after driving halfway across the state, everyone in the van was going to get to play five rounds no matter what.   Also, you never had to face the same guy twice no matter what.

In floating triple elimination, the advantage is that you can go home after you are eliminated.  Also, as the players drop out, repeat pairings are eventually mandatory, so you will get another crack at someone who beats you if you both keep winning.

But the hugest advantage of elimination formats is that your fate remains in your hands until elimination.  Until you get the third loss, you can always rest secure in the knowledge, "If I win the rest of my games I will still be champion."  Each player retains the power to single-handedly vanquish the field.  In traditional Swiss this is not true: if you are behind you need to win plus you need the leaders to lose.  In some of the rating-based schemes Omar proposed this is not true.  And in round-robin tournaments it is very not true, as many players must slog on long after hope is gone.

I realize this has nothing to do with percentage chance of a tournament being won by the best player, but psychologically it seems very important, and in the best spirit of sports and competition.

Title: Re: World Championship tournament format
Post by 99of9 on Jul 31st, 2005, 10:50pm

on 07/31/05 at 21:03:21, omar wrote:
But when going from triple to quadruple the increase in percentage is not much and not worth adding three more rounds.

There is a statistical anomaly in there (FTE performs *better* than FQE at inacc=50).  I think it might be worth running some of the most important formats with more than 2000 trials.

I'd say 10.3 consecutive rounds is already very long to have people guaruntee they can participate (2.5 months!).  It could be even longer if we have more entries than last year.  I personally would prefer the 7.7 round option.


Quote:
I also tried a slight variation which I called floatTripElim2. It is identical to the original except that when selecting the second player in the pairing we chose a player having the most loses instead of least loses. It seems to perform about the same as the original, but adds an additional round.

Although this is a reasonable alternative, I think it is socially bad.  This method means that the bottom ranked player gets smashed in his first 3 games against the top 3 ranked players (assuming the outcomes are predictable).


Quote:
I also compared the FTE against a double RR for the 4 player case (as in the computer championships). Here's the results:

Rating inacc  50        200          400
FTE (6.7)     66.0%    62.9%    65.4%
DRR (6.0)    55.2%    56.9%    56.4%

It's almost 10% better with only about one extra round. I think FTE is also a good format for the computer championship. It eliminates having to break ties based on moves per game as we encounted in the last tournament. Also it can scale up nicely by keeping the number of rounds fairly low if we allow for more players.

Interesting.  I'd really like to see unlimited entries in the CC (max 1 per human of course).  Then we'll never have to rely on the December ratings, which are certainly open to manipulation since humans are better than bots at the present.

Note that extra games are not really that painful in computer tournaments, because they can be run automatically.  In fact I'd say some extra games would add to the interest of spectators.  

You might want to think about how minor places are awarded.  That would be less straightforward if the tournament was not round robin.


Quote:
we did not try out real swiss style tournaments.
...
I will venture to guess that the performance of a real swiss style tournament will be about equal to the floatingDoubleElim format.

That depends how long the swiss tourney goes for, and exactly how you handle players who are in their own score bracket (especially up the top), and how you handle players who've played each other before.  If you give it the same number of rounds as floating_Triple_Elim, and allow players to play previous opponents, I expect it would be about equal with FTE.  Why do you think it will be worse?

Title: Re: World Championship tournament format
Post by omar on Jul 31st, 2005, 10:51pm
I wanted to see how much of the FTE's performance was due to the careful pairing of the players. So I created a version called floatTripElimRand which was identical to the original floatTripElim except that the second player in the pairing is chosen at random from among the players that the first player has played againt the least number of time (instead of in favor of having fewest losses, and further ties broken in favor of lower pre-tournament rating).

I was suprised to see that the random version performed as well as the original version that carefully selected the opponent. The number of rounds increased slightly. This suggests that the details of the pairings don't matter as much as we think they do. Amazing :-)


Title: Re: World Championship tournament format
Post by Fritzlein on Jul 31st, 2005, 11:17pm
Wow, we all three posted withing 3 minutes of each other.  Some more responses:

I agree with 99of9 that pairing by fewest losses vs. most losses is socially bad.  Let's give even the bottom seed a shot a redemption in game three.   But more than that, it slows down the sorting of players.  Of course we can expect the tournament to take longer, because we delay the point at which the top dogs have to prove anything against each other.  It's no accident that traditional Swiss pairings match people with similar records so as to speed up the sorting.

I agree with Omar (not 99of9) that a traditional Swiss tourney of length equal to FTE would not do quite as well at picking a winner.  This is because traditional Swiss absolutely forbids repeat pairings.  But it provides less information to have the ninth round consist of, say #1 vs #7 and #2 vs #8 (or something like that because of avoiding repeat pairings) than it provides to have #1 play #2 because everyone else has been knocked out.  The early rounds are used to determine who #1 and #2 are, but eventually they have to fight each other to the death rather than trying to rack up points against players who have no more hope of winning.

I agree with 99of9 that it would be cool to let all bots into the championship tourney as long as they meet the public play requirements.  It wouldn't add too many rounds either, I believe.  (Side note: with only four entrants, the first three rounds of FTE are guaranteed to be a round robin.)

And finally, I too am surprised that floatTripElimRand did almost as well as my FTE.  You think of the most interesting things to test, Omar.  But anyway it isn't surprising that randomness takes a round longer to sort things out, and I also think random pairings are socially worse than letting losers play losers and winners play winners.

Title: Re: World Championship tournament format
Post by 99of9 on Jul 31st, 2005, 11:24pm

on 07/31/05 at 22:48:20, Fritzlein wrote:
Also, you never had to face the same guy twice no matter what.

I didn't realise this was a fundamental part of the swiss system.  I think that is a major disadvantage (for the reason you later describe Fritz).  Can't that condition be relaxed?


Quote:
But the hugest advantage of elimination formats is that your fate remains in your hands until elimination.  Until you get the third loss, you can always rest secure in the knowledge, "If I win the rest of my games I will still be champion."  Each player retains the power to single-handedly vanquish the field.  In traditional Swiss this is not true: if you are behind you need to win plus you need the leaders to lose.

You are thinking of "your fate" being simply in terms of winning or not winning the overall tournament.  Those of us who are not at the top ;-) are also interested in contesting lower ladder-places, especially if it means that we get a few games against people of similar standard (even if we've lost the chance at winning overall).

How's this for a format proposal:

Each round:
(1) Players are sorted from first to last in order of
(a) tournament score
tiebroken by
(b) total score of opponents
tiebroken by
(c) original rating (seeding)

(2) If there are an odd number of players, the player at the bottom of the tournament gets a bye which is counted as a win (2 points).

(3) The others pair off from the top of the sorted list [eg 1 plays 2, 3 plays 4 etc].  2 points for a win, 1 for a draw, 0 for a loss.

Iterate this for as many rounds as you want in your tournament.  At a bare minimum you will need a similar number to swiss, but if you test this out I'd prefer you ran it with 5,6,7,8,9,10.... rounds to see how much better it got with increasing rounds (and so it could be directly compared with other formats of the same number of rounds).

[EDIT: Of course if there is a tie in the tournament score at the end, it is tiebroken the same way as the sorting is done - whoever had the hardest tournament wins.]

I've designed this to get as many confrontations between top players as possible (as in swiss), and in particular have deliberately included repeat clashes.  This means that for a fewer number of rounds, a better discernment is done between the top few players.  (it would certainly give the tournament leader a high-difficulty tournament!)

I think by Omar's metric this method will perform nearly the best for a given number of rounds [but I understand that there are other important metrics that are important to how good a competition is].

What do you guys think of it?

Title: Re: World Championship tournament format
Post by 99of9 on Jul 31st, 2005, 11:29pm

on 07/31/05 at 23:17:49, Fritzlein wrote:
I agree with Omar (not 99of9) that a traditional Swiss tourney of length equal to FTE would not do quite as well at picking a winner.  This is because traditional Swiss absolutely forbids repeat pairings.

Agreed - I hadn't realised this was part of the construct (I should have actually followed Omar's wiki link :-)).

Title: Re: World Championship tournament format
Post by 99of9 on Jul 31st, 2005, 11:38pm

on 07/31/05 at 21:03:21, omar wrote:
Also a swiss tournament lacks the climax ending of a knockout style format.

This is an important metric that is not included in Omar's tables.  The format I just proposed also does not necessarily have a climactic final round.  But unless one player is much stronger than all the others, I'd suggest that the final round will often be very important to the overall winner result (more often than swiss), and will certainly be important for determining minor placings.

Title: Re: World Championship tournament format
Post by omar on Jul 31st, 2005, 11:39pm

on 07/31/05 at 22:50:36, 99of9 wrote:
That depends how long the swiss tourney goes for, and exactly how you handle players who are in their own score bracket (especially up the top), and how you handle players who've played each other before.  If you give it the same number of rounds as floating_Triple_Elim, and allow players to play previous opponents, I expect it would be about equal with FTE.  Why do you think it will be worse?


Yes, I agree it depends on the length of the tournament. But usually the number of rounds of a swiss tournament has to be preannounced and is usually just long enough for one player to score one point more than the next highest. About two rounds more than what a single elimination requires is usually enough. That happens to be about how long a floating double elimination is.

Title: Re: World Championship tournament format
Post by Fritzlein on Aug 1st, 2005, 12:54am

on 07/31/05 at 23:24:48, 99of9 wrote:
How's this for a format proposal:

Each round:
(1) Players are sorted from first to last in order of
(a) tournament score
tiebroken by
(b) total score of opponents
tiebroken by
(c) original rating (seeding)

(2) If there are an odd number of players, the player at the bottom of the tournament gets a bye which is counted as a win (2 points).

(3) The others pair off from the top of the sorted list [eg 1 plays 2, 3 plays 4 etc].  2 points for a win, 1 for a draw, 0 for a loss.


Wow, very interesting idea.  I would love to see how it performs.  One proviso is that you would have to have several extra rounds.  If you have just 4 rounds for 16 players, then it's single elimination with horrible seeding.  Even if you have just five rounds, I expect it would get weird.  But after eight or nine rounds, you would have had a heck of an interesting tournament.

Here are my attempts at pairings in the "favorites always win model":

1v2, 3v4, 5v6, 7v8, 9v10, 11v12, 13v14, 15v16
1v3, 5v7, 9v11, 13v15, 2v4, 6v8, 10v12, 14v16
1v5, 9v13, 2v3, 6v7, 10v11, 14v15, 4v8, 12v16
1v9, 5v13, 2v10, 6v14, 3v11, 7v15, 4v12, 8v16
1v5, 2v9, 6v3, 13v7, 10v4, 14v11, 8v15, 12v16
1v2, 5v3, 9v4, 6v7, 10v11, 13v8, 14v12, 15v16

Notes: for the first four rounds, the standings correspond less and less to player strength.  (Not a good thing for a format to achieve when there are 2nd and 3rd place prizes to be given out?)  After 5 rounds, things are sorting back out, but #8 seed was still rooked by the pairings and #9 vaulted into fifth place.  Sadly, after 6 rounds, there is almost no chance I have done the calculation correctly.

Well, I wonder what Omar's tests will reveal.  It is an intriguing idea, but I say that unless these pairings are unequivocaly better at picking a winner than floating elimination is, then the early mayhem caused in the standings is simply not worth it.  For psychological reasons, the #4 seed should not have to get rocked twice at the start and fall behind some schmoe from the bottom of the bracket who got two wins against other low seeds.

Title: Re: World Championship tournament format
Post by 99of9 on Aug 1st, 2005, 2:36am

on 08/01/05 at 00:54:39, Fritzlein wrote:
Sadly, after 6 rounds, there is almost no chance I have done the calculation correctly.


Miracles can happen.  My calculation (some bits by hand, some by spreadsheet) gives exactly the same first 6 rounds as you. I'll post more when I get them (But I see why you gave up!!!  Tourneys using this format would have to be administrated by a computer!).

Title: Re: World Championship tournament format
Post by 99of9 on Aug 1st, 2005, 3:55am
I wrote the code Fritz, here are the pairings and intermediate scorecards if the better seeded player wins every time.  From round 6 onward you are in a situation where #1 is tested against #2 and #3 until somebody cracks.  One downside: in this "perfect seeding" model #2 and #3 only get one game against each other (because one of them is usually busy playing #1).  But I think that for humans, upsets are sufficiently likely that there would be enough variation in the results to prevent endless cycles in the pairings.


Round  1 - Current Standings:
player  1 score:   0 oppscore:    0
player  2 score:   0 oppscore:    0
player  3 score:   0 oppscore:    0
player  4 score:   0 oppscore:    0
player  5 score:   0 oppscore:    0
player  6 score:   0 oppscore:    0
player  7 score:   0 oppscore:    0
player  8 score:   0 oppscore:    0
player  9 score:   0 oppscore:    0
player 10 score:   0 oppscore:    0
player 11 score:   0 oppscore:    0
player 12 score:   0 oppscore:    0
player 13 score:   0 oppscore:    0
player 14 score:   0 oppscore:    0
player 15 score:   0 oppscore:    0
player 16 score:   0 oppscore:    0
1 v  2
3 v  4
5 v  6
7 v  8
9 v 10
11 v 12
13 v 14
15 v 16

Round  2 - Current Standings:
player  1 score:   1 oppscore:    0
player  3 score:   1 oppscore:    0
player  5 score:   1 oppscore:    0
player  7 score:   1 oppscore:    0
player  9 score:   1 oppscore:    0
player 11 score:   1 oppscore:    0
player 13 score:   1 oppscore:    0
player 15 score:   1 oppscore:    0
player  2 score:   0 oppscore:    1
player  4 score:   0 oppscore:    1
player  6 score:   0 oppscore:    1
player  8 score:   0 oppscore:    1
player 10 score:   0 oppscore:    1
player 12 score:   0 oppscore:    1
player 14 score:   0 oppscore:    1
player 16 score:   0 oppscore:    1
1 v  3
5 v  7
9 v 11
13 v 15
2 v  4
6 v  8
10 v 12
14 v 16

Round  3 - Current Standings:
player  1 score:   2 oppscore:    2
player  5 score:   2 oppscore:    2
player  9 score:   2 oppscore:    2
player 13 score:   2 oppscore:    2
player  2 score:   1 oppscore:    2
player  3 score:   1 oppscore:    2
player  6 score:   1 oppscore:    2
player  7 score:   1 oppscore:    2
player 10 score:   1 oppscore:    2
player 11 score:   1 oppscore:    2
player 14 score:   1 oppscore:    2
player 15 score:   1 oppscore:    2
player  4 score:   0 oppscore:    2
player  8 score:   0 oppscore:    2
player 12 score:   0 oppscore:    2
player 16 score:   0 oppscore:    2
1 v  5
9 v 13
2 v  3
6 v  7
10 v 11
14 v 15
4 v  8
12 v 16

Round  4 - Current Standings:
player  1 score:   3 oppscore:    5
player  9 score:   3 oppscore:    5
player  5 score:   2 oppscore:    6
player 13 score:   2 oppscore:    6
player  2 score:   2 oppscore:    5
player 10 score:   2 oppscore:    5
player  6 score:   2 oppscore:    3
player 14 score:   2 oppscore:    3
player  3 score:   1 oppscore:    6
player 11 score:   1 oppscore:    6
player  7 score:   1 oppscore:    4
player 15 score:   1 oppscore:    4
player  4 score:   1 oppscore:    3
player 12 score:   1 oppscore:    3
player  8 score:   0 oppscore:    4
player 16 score:   0 oppscore:    4
1 v  9
5 v 13
2 v 10
6 v 14
3 v 11
7 v 15
4 v 12
8 v 16

Round  5 - Current Standings:
player  1 score:   4 oppscore:   11
player  5 score:   3 oppscore:   11
player  2 score:   3 oppscore:   10
player  9 score:   3 oppscore:    9
player  6 score:   3 oppscore:    8
player  3 score:   2 oppscore:   10
player 13 score:   2 oppscore:    9
player  7 score:   2 oppscore:    8
player 10 score:   2 oppscore:    8
player  4 score:   2 oppscore:    7
player 14 score:   2 oppscore:    6
player 11 score:   1 oppscore:    8
player  8 score:   1 oppscore:    7
player 15 score:   1 oppscore:    6
player 12 score:   1 oppscore:    5
player 16 score:   0 oppscore:    5
1 v  5
2 v  9
6 v  3
13 v  7
10 v  4
14 v 11
8 v 15
12 v 16

Round  6 - Current Standings:
player  1 score:   5 oppscore:   16
player  2 score:   4 oppscore:   16
player  5 score:   3 oppscore:   18
player  3 score:   3 oppscore:   17
player  9 score:   3 oppscore:   15
player  4 score:   3 oppscore:   13
player  6 score:   3 oppscore:   13
player  7 score:   3 oppscore:   11
player 10 score:   2 oppscore:   14
player 11 score:   2 oppscore:   12
player 13 score:   2 oppscore:   12
player  8 score:   2 oppscore:   10
player 14 score:   2 oppscore:    8
player 12 score:   2 oppscore:    7
player 15 score:   1 oppscore:    9
player 16 score:   0 oppscore:    9
1 v  2
5 v  3
9 v  4
6 v  7
10 v 11
13 v  8
14 v 12
15 v 16

Round  7 - Current Standings:
player  1 score:   6 oppscore:   21
player  2 score:   4 oppscore:   26
player  3 score:   4 oppscore:   23
player  4 score:   4 oppscore:   20
player  6 score:   4 oppscore:   18
player  5 score:   3 oppscore:   25
player  9 score:   3 oppscore:   21
player  7 score:   3 oppscore:   18
player 10 score:   3 oppscore:   18
player  8 score:   3 oppscore:   15
player 12 score:   3 oppscore:   11
player 11 score:   2 oppscore:   18
player 13 score:   2 oppscore:   16
player 14 score:   2 oppscore:   13
player 15 score:   2 oppscore:   10
player 16 score:   0 oppscore:   15
1 v  2
3 v  4
6 v  5
9 v  7
10 v  8
12 v 11
13 v 14
15 v 16

Round  8 - Current Standings:
player  1 score:   7 oppscore:   28
player  3 score:   5 oppscore:   30
player  2 score:   4 oppscore:   36
player  5 score:   4 oppscore:   34
player  4 score:   4 oppscore:   27
player  6 score:   4 oppscore:   27
player  7 score:   4 oppscore:   25
player  8 score:   4 oppscore:   21
player  9 score:   3 oppscore:   28
player 10 score:   3 oppscore:   24
player 11 score:   3 oppscore:   22
player 13 score:   3 oppscore:   22
player 12 score:   3 oppscore:   15
player 15 score:   3 oppscore:   13
player 14 score:   2 oppscore:   19
player 16 score:   0 oppscore:   21
1 v  3
2 v  5
4 v  6
7 v  8
9 v 10
11 v 13
12 v 15
14 v 16

Round  9 - Current Standings:
player  1 score:   8 oppscore:   37
player  2 score:   5 oppscore:   45
player  3 score:   5 oppscore:   43
player  4 score:   5 oppscore:   34
player  7 score:   5 oppscore:   30
player  5 score:   4 oppscore:   42
player  9 score:   4 oppscore:   36
player  6 score:   4 oppscore:   35
player 11 score:   4 oppscore:   29
player  8 score:   4 oppscore:   28
player 12 score:   4 oppscore:   22
player 10 score:   3 oppscore:   34
player 13 score:   3 oppscore:   30
player 14 score:   3 oppscore:   21
player 15 score:   3 oppscore:   19
player 16 score:   0 oppscore:   27
1 v  2
3 v  4
7 v  5
9 v  6
11 v  8
12 v 10
13 v 14
15 v 16

Round 10 - Current Standings:
player  1 score:   9 oppscore:   46
player  3 score:   6 oppscore:   52
player  2 score:   5 oppscore:   60
player  5 score:   5 oppscore:   53
player  4 score:   5 oppscore:   45
player  6 score:   5 oppscore:   43
player  7 score:   5 oppscore:   42
player  8 score:   5 oppscore:   36
player  9 score:   4 oppscore:   45
player 10 score:   4 oppscore:   39
player 11 score:   4 oppscore:   38
player 13 score:   4 oppscore:   36
player 12 score:   4 oppscore:   28
player 15 score:   4 oppscore:   21
player 14 score:   3 oppscore:   29
player 16 score:   0 oppscore:   35
1 v  3
2 v  5
4 v  6
7 v  8
9 v 10
11 v 13
12 v 15
14 v 16

Round 11 - Current Standings:
player  1 score:  10 oppscore:   57
player  2 score:   6 oppscore:   71
player  3 score:   6 oppscore:   69
player  4 score:   6 oppscore:   53
player  7 score:   6 oppscore:   48
player  5 score:   5 oppscore:   64
player  6 score:   5 oppscore:   54
player  9 score:   5 oppscore:   54
player  8 score:   5 oppscore:   46
player 11 score:   5 oppscore:   46
player 12 score:   5 oppscore:   36
player 10 score:   4 oppscore:   52
player 13 score:   4 oppscore:   47
player 14 score:   4 oppscore:   31
player 15 score:   4 oppscore:   29
player 16 score:   0 oppscore:   43
1 v  2
3 v  4
7 v  5
6 v  9
8 v 11
12 v 10
13 v 14
15 v 16

Round 12 - Current Standings:
player  1 score:  11 oppscore:   68
player  3 score:   7 oppscore:   80
player  2 score:   6 oppscore:   90
player  5 score:   6 oppscore:   76
player  4 score:   6 oppscore:   67
player  6 score:   6 oppscore:   63
player  7 score:   6 oppscore:   63
player  8 score:   6 oppscore:   55
player  9 score:   5 oppscore:   66
player 10 score:   5 oppscore:   58
player 11 score:   5 oppscore:   58
player 13 score:   5 oppscore:   54
player 12 score:   5 oppscore:   45
player 15 score:   5 oppscore:   31
player 14 score:   4 oppscore:   41
player 16 score:   0 oppscore:   53
1 v  3
2 v  5
4 v  6
7 v  8
9 v 10
11 v 13
12 v 15
14 v 16

Title: Re: World Championship tournament format
Post by 99of9 on Aug 1st, 2005, 3:59am
Omar, here is the code.  If you can turn it into a probabilistic one based on ratings which integrates with your validation system, I'd be very interested, but don't worry about it too much - having seen the early round pairings I'm less gung-ho about it.


#include <stdio.h>
#include <stdlib.h>

#define ROUNDS 12
#define PLAYERS 16

int score[PLAYERS];
int oppscore[PLAYERS];
int seed[PLAYERS];
int opponent[PLAYERS][ROUNDS];
int ranking[PLAYERS];

int Sort_Players() {
 /* I know this is a very inefficient sort method, but I needed to be quick */
 int pos=0;
 int cp;

 while (pos!=PLAYERS-1) {
     if (( score[ranking[pos] <score[ranking[pos+1])||
           ((score[ranking[pos]==score[ranking[pos+1])&&(oppscore[ranking[pos]<oppscore[ranking[pos+1]))||
           ((score[ranking[pos]==score[ranking[pos+1])&&(oppscore[ranking[pos]==oppscore[ranking[pos+1])&&(seed[ranking[pos]<seed[ranking[pos+1]))){
       cp = ranking[pos+1];
       ranking[pos+1] = ranking[pos];
       ranking[pos] = cp;

       pos = 0;
     } else {
       pos++;
     }
 }

}

int Winner (int a, int b) {
 if (a<b) return 1;
 else return 0;
}

int main () {
 int round, rnd;
 int player;
 int game;


 for (player=0; player<PLAYERS; player++) {
     ranking[player]=player;
     score[player] = 0;
     oppscore[player] = 0;
     seed[player] = PLAYERS - player;
 }

 for (round=0; round<ROUNDS; round++) {
      Sort_Players();

     printf("Round %2d - Current Standings:\n", round+1);
     for (player=0; player<PLAYERS; player++) {
       printf("player %2d score: %3d oppscore: %4d\n", ranking[player]+1, score[ranking[player], oppscore[ranking[player]);
     }

   for (game=0; game<PLAYERS/2; game++) {

       printf("%2d v %2d\n", ranking[game*2]+1, ranking[game*2+1]+1);

       opponent[ranking[game*2][round]   = ranking[game*2+1];
       opponent[ranking[game*2+1][round] = ranking[game*2];

       if (Winner(ranking[game*2],ranking[game*2+1])) {
           score[ranking[game*2]++;
       } else {
           score[ranking[game*2+1]++;
       }
     }
     if (PLAYERS%2) {
       score[ranking[PLAYERS-1]++;
       opponent[ranking[PLAYERS-1][round] = -1;
     }
     /* last on board gets a bye and a point,
        and their opponent for that round is set to be themselves*/

     for (player=0; player<PLAYERS; player++) {
       oppscore[player] = 0;
       for (rnd=0; rnd<=round; rnd++) {
           if (!(opponent[player][rnd]==-1))
           oppscore[player] += score[opponent[player][rnd];
       }
     }

     printf("\n");
 }

}


Title: Re: World Championship tournament format
Post by jdb on Aug 1st, 2005, 8:25am

Quote:
But the hugest advantage of elimination formats is that your fate remains in your hands until elimination.  Until you get the third loss, you can always rest secure in the knowledge, "If I win the rest of my games I will still be champion."  Each player retains the power to single-handedly vanquish the field.  In traditional Swiss this is not true: if you are behind you need to win plus you need the leaders to lose.  


I agree with this 100%. The climatic final game is also good public relations.


Quote:
I agree with Omar (not 99of9) that a traditional Swiss tourney of length equal to FTE would not do quite as well at picking a winner.  This is because traditional Swiss absolutely forbids repeat pairings.  


If a Swiss and FTE have the same number of rounds, and the number of participants is sufficiently large, the Swiss will perform better. While each individual game in the Swiss is not as effective at determining the winner, in the latter rounds, the Swiss has more total games. Which one is better for a 16 person tourny, I have no idea.   :)


Title: Re: World Championship tournament format
Post by omar on Aug 4th, 2005, 12:30am
Toby, I'll try out your method once I get some time.

Jeff, Im starting to get the feeling that the total number of games does not matter too much. What I think matters more is the average number of rounds a tournament has. Also another thing that seems to matter is how many rounds are played before making a decision about eliminating players. Im also getting the feeling that the details of how the pairs are chosen do not matter much compared to these other two factors.

Here's a trivial sounding math question for anyone who wants to take a crack at it:

If a player A can defeat player B with a probability p then what is the probability that player A will win if they play a series of N games?

I was wondering about this because it will set an upper bound on how well we can expect a tourament to perform. Knowing the number of players and the range of their true ratings we can come up with an average difference in rating between the best and second best player and from that the probability of the best player winning a game against the second best. Then from the number of rounds the tournament runs for we can determine the probability of the best player winning and get an upper bound on what the best the tournament can do.


Title: Re: World Championship tournament format
Post by jdb on Aug 4th, 2005, 9:26am

Quote:
If a player A can defeat player B with a probability p then what is the probability that player A will win if they play a series of N games?


Here's a little program that should compute this.


public class GameTest {
 public static long factorial(int n) {
   assert( n>=0 && n<=60);
   long result = 1;
   for ( int i=1; i<=n; i++ ) {
     result *= i;
   }
   return result;
 }

 public static long nCr(int n, int r) {
   return factorial(n)/factorial(r)/factorial(n-r);
 }
 
 public static void main( String args[] ) {
   int num_games = 5;
   double game_win_prob = .3;  


   if ( args.length == 2 ) {
     num_games = Integer.parseInt(args[0]);
     game_win_prob = Double.parseDouble(args[1]);
   }
   
   assert( num_games >=1 );
   assert( game_win_prob>=0 && game_win_prob <=1.0 );
   
   double total = 0;
   double match_win_prob = 0;
 
   for ( int i=0; i<=num_games; i++ ) {
     double a_win = Math.pow(game_win_prob,i);
     double b_win = Math.pow(1-game_win_prob,num_games-i);
     double ways = nCr(num_games,i);
     double item_win = a_win * b_win * ways;

     System.out.println(i+" "+item_win);
     total += item_win;

     match_win_prob += (i>num_games/2) ? item_win : 0;
   }
   System.out.println("Total: "+total);
   
   System.out.println("Match Length: "+num_games);
   System.out.println("Individual Win Prob: "+game_win_prob);
   System.out.println("Match Win Prob: "+match_win_prob);
 }
}


Title: Re: World Championship tournament format
Post by omar on Aug 4th, 2005, 5:20pm
Thanks for the program Jeff.

Since Im learning Ruby, I rewrote this in Ruby. Here it is in case anyone is interested:


#!/usr//local/bin/ruby

# N choose M   =   N! / (N-M)!*M!
def NcM(n, m)
 b = [];
 b[0] = 1
 for i in 1 .. n
   b[i] = 1
   j = i-1
   while j>0
     b[j] += b[j - 1]
     j -= 1
   end
 end
 return b[m]
end

# compute probability of winning as:
#  pW = 1 - [ T0 + T1 + T2 ... + Tk ]
#      where Ti = Nci*q^(N-i)*p^i
#             q = 1-p
#             k = int(N/2)
#
def pW(p, n)
 q = 1-p
 k = n/2
 wp = 0.0
 for i in 0 .. k
   j = n-i
   y = NcM(n,i)
   x = NcM(n,i) * q**j * p**i
   wp += NcM(n,i) * q**j * p**i
 end
 wp = 1.0 - wp;
 return wp
end

p = ARGV[0].to_f
n = ARGV[1].to_i
if p < 0  or  p > 1
 print "probability is out of range [0, 1]\n"
 exit
end
if n < 1
 print "number of games must be at least 1\n"
 exit
end

wp = 100*pW(p, n)
print "chance of winning is #{wp}%\n"


To run it just pass the probability of winning for the stronger player and the number of games. For example:

chance 0.6 11

and it will print:

 chance of winning is 75.349813248%


Title: Re: World Championship tournament format
Post by Fritzlein on Aug 4th, 2005, 7:34pm

on 08/04/05 at 00:30:55, omar wrote:
Here's a trivial sounding math question for anyone who wants to take a crack at it:

If a player A can defeat player B with a probability p then what is the probability that player A will win if they play a series of N games?


As far as I know, Jeff's method of adding up factorials is the only way to get an exact answer.  However, as the length of the match increases, the binomial distribution increasingly resembles (and can be approximated by) the normal distribution.  To use Omar's example, say one player wins 60% of the time.  In an 11 game match, he would be expected to win 6.6 games.  For him to lose the match he would need to score 6.6-(11/2) = 1.1 wins or more below expected.  Now one standard deviation is sqrt(Npq) = sqrt(11 * 0.6 * 0.4) = 1.6248, so a match loss is 1.1/1.6248 = 0.677 standard deviations below the mean.  Consulting my stat function on my calculator, that occurs with probability 0.2492, so the favorite will win 75.08% of the time.  Not a bad approximation of 75.35% as calculated by the program.

The approximation is better for larger values of N and values of p closer to 0.5.

Title: Re: World Championship tournament format
Post by omar on Aug 9th, 2005, 11:45am

on 08/01/05 at 08:25:45, jdb wrote:
If a Swiss and FTE have the same number of rounds, and the number of participants is sufficiently large, the Swiss will perform better. While each individual game in the Swiss is not as effective at determining the winner, in the latter rounds, the Swiss has more total games. Which one is better for a 16 person tourny, I have no idea.   :)


I've been thinking about this recently and without having done any simulations with a swiss format I am starting to think that the floatTripElim (FTE) would still perform better than a swiss format even if they both had the same number of rounds; at least for the 16 player case.

First of all a typical swiss tournament has just a few more rounds than what single elimination requires; that is log2(N) where N is the number of players. I have not been able to find any good source on the web providing an exact way to determine the number of rounds for a given number of players. If you happen to come across this please let me know. One page that I found said you need about 1.3*log2(N) rounds:
 http://www.scoutingresources.org.uk/ideabase_swiss.html

Now if we assume that a swiss tournament had exactly log2(N) rounds then it would be exactly the same as singleElimSlide. The maximum number of rounds a swiss tournament could possibly have N-1 since any two players cannot play more than once. With the maximum number of rounds it would be exactly the same as roundRobin. We know from the simulations that roundRobin performs better than singleElimSlide (about 30% compared to 25% for the 16 player case). Since the number of rounds in a typical swiss tournament is much closer to log2(N) than N-1 I would think that it would perform closer to singleElimSlide than to roundRobin.

Since the FTE has performed better than roundRobin with fewer rounds (at least up to the 16 player case) I think it is safe to conclude that FTE will perform better than swiss even if swiss used the same number of rounds as FTE.

Title: Re: World Championship tournament format
Post by Fritzlein on Aug 9th, 2005, 3:48pm
While it is true that more total games means more information in general, it doesn't necessarily mean more information about who is the best.  If I wanted to know who was best in a 16-player tournament, I'd lean towards a FTE over a 10-round Swiss.  The "extra" games of the Swiss tournament are mostly between players who have no claim to the top spot.  A 10-round Swiss might be pretty darn good at ordering all the players from best to worst, but in my guess will be slightly worse at determining who is the best.

Title: Re: World Championship tournament format
Post by omar on Aug 10th, 2005, 10:45am
While trying to search for info on the Swiss tournament format I happened to come across this very interesting article. A must read for this topic:

http://www.oxfordcroquet.com/tech/appleton/index.asp


Title: Re: World Championship tournament format
Post by jdb on Aug 10th, 2005, 12:15pm
A very interesting article.

If I understood correctly, the article's simulation method attempted to account for a player's strength varying from game to game.

The article described a tournament format called "Draw and Process". I had never heard of it before. It seems to have some good qualities.

Title: Re: World Championship tournament format
Post by Fritzlein on Aug 11th, 2005, 8:13pm
That is an excellent article, Omar.  I agree with JDB that the "draw and process" format is the most intriguing.  For 16 players D&P takes 9 rounds unless one player wins both brackets, in which case it takes only 8.  Therefore it probably takes about one more round on average than floating double elimination, which takes 7.7.

I am going to go out on a limb to predict that the D&P, in spite of having one more round than the FDE, will have a couple percentange points less chance of picking the best player as the winner.

Title: Re: World Championship tournament format
Post by omar on Aug 11th, 2005, 8:39pm
I've never heard of the Draw and Process format before either. Doing some searches for it brought me to this page:
http://www.magma.ca/~acna/cobclubch.htm

It seems like running a single elemination tournament two times but with different ordering of players and if the same player wins both time then that player is the winner. If different players win then maybe there are additional games between them to select a final winner.

Title: Re: World Championship tournament format
Post by jdb on Aug 11th, 2005, 11:34pm
Algorithmically, the draw and process can be modelled as a single elim tournament, with fixed pairings. Each person is entered twice, subject to the special seeding requirements. This would also work for three or more way draw and process, by adding byes to fill out the unused brackets. If the same person wins all the brackets, the extra games are meaningless, but don't change any of the results.

Another option would be to try random pairing.

Title: Re: World Championship tournament format
Post by Fritzlein on Aug 12th, 2005, 6:05pm
As I understood it, the desired property of draw and process is that people who meet in the first round of one bracket can't meet until the finals of the other, people who meet in the second round of one bracket can't meet until the semi-finals of the other, etc.  It seems, however, that there is more than one way to meet this requirement.  For example, if the draw is

1-16--9-8----5-12--13-4--------3-14--11-6----7-10--15-2

then the process could be

1-2--3-4----5-6--7-8--------9-10--11-12----13-14--15-16

but it also could be

1-15--3-13----5-11--7-9--------8-10--6-12----4-14--2-16

I wonder whether one could build three brackets such that the D&P condition is satisfied for any pair.  My example doesn't quite do that, because 1 could meet 3 in the second round of both of the latter two brackets.

I'm curious about it mostly as a math question, because I believe that a D&P with two brackets will be out-performed by a floating double elimination, and I beleive a D&P with three brackets will be out-performed by a floating triple elimination.

Title: Re: World Championship tournament format
Post by jdb on Aug 12th, 2005, 6:27pm
I'm working on coding up draw and process now. I'll let some of them run overnight and post the results in the morning.

Title: Re: World Championship tournament format
Post by jdb on Aug 13th, 2005, 8:21am
Here are my results for a four way draw and process.
The brackets were paired, slide,fold,fold,slide.

run2 'formats/dap' 2000 16 500 50 10
 1   39.2%
 2   23.4%
 3   14.7%
 4    9.0%
 5    6.2%
 6    3.3%
 7    2.1%
 8    0.9%
 9    0.4%
10    0.2%
11    0.3%
12    0.1%
13    0.1%
14    0.1%

Three way draw and process
Brackets paired fold,slide and random, with random getting bye into final.

run2 'formats/dap' 2000 16 500 50 10
 1   35.5%
 2   23.9%
 3   17.2%
 4    8.8%
 5    5.8%
 6    3.8%
 7    2.0%
 8    1.3%
 9    0.9%
10    0.6%
11    0.1%
12    0.1%

Title: Re: World Championship tournament format
Post by Fritzlein on Aug 13th, 2005, 6:09pm
Looks like the 3-way D&P did about as well as quadruple floating elimination, which has about the same number of rounds (i.e. between 13 and 14).  The four-way D&P would take up to 18 rounds (a little less on average due to final games begin unnecessary sometimes) so there is nothing to dircetly compare it to.  One could get at least 5 and maybe six eliminations for that number of rounds.

Title: Re: World Championship tournament format
Post by omar on Aug 15th, 2005, 4:04pm

on 08/13/05 at 08:21:23, jdb wrote:
Here are my results for a four way draw and process.
The brackets were paired, slide,fold,fold,slide.


Thank Jeff.

Im just wondering how you handled ties for first place?

Also did you use any of the pre-tournament ratings to order the players?

Title: Re: World Championship tournament format
Post by omar on Aug 15th, 2005, 4:08pm

on 08/13/05 at 18:09:16, Fritzlein wrote:
Looks like the 3-way D&P did about as well as quadruple floating elimination, which has about the same number of rounds (i.e. between 13 and 14).


FTE with 16 players averaged a little over 10 rounds. So I think it was a bit faster.



Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB © 2000-2003. All Rights Reserved.