Welcome, Guest. Please Login or Register.
May 7th, 2024, 9:24pm

Home Home Help Help Search Search Members Members Login Login Register Register
Arimaa Forum « 2007 World Championship Format »


   Arimaa Forum
   Arimaa
   Events
(Moderator: supersamu)
   2007 World Championship Format
« Previous topic | Next topic »
Pages: 1 2 3 4 5 6  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print
   Author  Topic: 2007 World Championship Format  (Read 5294 times)
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2007 World Championship Format
« Reply #45 on: Sep 24th, 2006, 9:01pm »
Quote Quote Modify Modify

Here are the functions which determine the "badness" of any given pairing:
 
(((num_games * num_losses_to_eliminate +  
abs (num_player1_losses - num_player2_losses)) * sqr (all_players.size ())) + ((sqr (all_players.size ()) - 1) - sqr (abs (player1_rating_rank - player2_rating_rank))))
 
and  
 
(((num_byes * (1 + previous_rounds.size ()) + num_losses) * num_losses_to_eliminate + rating_rank)
* ((1 + previous_rounds.size ()) * num_losses_to_eliminate * sqr (all_players.size ()) * all_players.size ()))
 
When you parse out the size of the second function, the problem is immediately apparent.  It is a huge term.  In effect, it says that giving the bye to the right person is more important than anything else.  This includes giving the bye to the highest-rated player, even if it will cause repeat matchups.  If you fed the 2006 Computer Championship to this pairing code, it would demand the third Bomb-Clueless pairing before the first Bomb-Aamira pairing in order to give the bye to the right player.  This is exactly the unfairness that prompted a protest last year.  We definitely want to change the code to prevent another train wreck like that.
 
To this end, let me rip apart the the eval function, and put it back together in the order of my priorities.  (Playing out the tournaments did convince me that the giving a bye to a more deserving player should take priority over swiss pairing winners vs. winners, a minor modification.)  I'll use the notation X for the number of eliminations and N for the number of players.
 
1. Folding pairing, our lowest priority, is dealt with by sqr (all_players.size() - 1) - sqr (abs (player1_rating_rank - player2_rating_rank)).  It was clever of doublep to realize you only need to keep track of the ranks of the players, not their actual ratings.  I'm not sure why he took the absolute value of something before squaring, but I left it just in case.
 
The penalty is (N-1)^2 minus the difference in rank squared.  The larger the difference in rank, the lower the penalty.  The reason it has to be a subtraction is that all penalties must be positive for branch and bound to work, so you can't reward a mistmatch, you just have to penalize it less than an equal pairing.
 
2. The second-lowest priority is pairing winners vs. winners and losers vs. losers.  The term abs (num_player1_losses - num_player2_losses) * sqr (all_players.size ()) has the N^2 term to make it bigger than the previous term.  However, I would modify it so that a paring a no-loss player against a two-loss player is more than twice as bad as pairing a no-loss player against a one-loss player.  For example, if the four remaining players have 0, 1, 1, and 2 losses respectively, this function is indifferent between pairing 0v2, 1v1 (two losses difference) and 0v1, 1v2 (two losses difference).  Therefore I think we should square the difference in the number of losses between the paired players, i.e.
sqr( abs (num_player1_losses - num_player2_losses)) * sqr (all_players.size ())
 
3. Third-lowest priority is giving a bye to a player of higher rank.  To keep it above the previous term, we need to multiply by X^2 and N^2, thus
 
rating_rank * sqr(num_losses_to_eliminate) * sqr (all_players.size ())
 
4. Fourth-lowest priority is to give the bye to the player with fewer losses.  To make the number of losses weigh more than the rank of the players, we have to throw in another factor of N, the number of players, i.e.
 
num_losses * sqr(num_losses_to_eliminate) * sqr (all_players.size ()) * all_players.size ()
 
5. Second-to-top priority is avoiding repeat pairings.  To outdo the previous term, we need X^3 times N^3 weighting the number of repetitions.  But more than that, just as in our second-to bottom priority, two pairings which have each happened once before are not as bad as one pairing with has occurred twice before.  Therefore we square the number of times the pairing has previously happened:
 
sqr(num_games) * sqr(num_losses_to_eliminate) * num_losses_to_eliminate * sqr (all_players.size ()) * all_players.size ()
 
6. Finally, our top priority must be to not let the byes get uneven.  This must not only be bigger than the previous term, but bigger than the previous term summed across all pairings, because it is not just more important than one repeat pairing, it is more important than any number of repeat pairings.  If R is the number of rounds played so far, we need R^2 times X^3 times N^4 to be safe.
 
num_byes * sqr(previous_rounds.size ()) * sqr(num_losses_to_eliminate) * num_losses_to_eliminate * sqr(sqr(all_players.size ()))
 
To plug these back into doublep's code, we sum 3, 4, and 6 to make the function dealing with byes, and sum 1, 2, and 5 to make the function dealing with all other pairings.
 
I hope this makes sense and I haven't made too many errors.
IP Logged

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2007 World Championship Format
« Reply #46 on: Sep 24th, 2006, 9:17pm »
Quote Quote Modify Modify

Playing out those tournaments that Omar posted has given me some further impressions:
 
* The seeding is truly important.  Having a higher rating makes your tournament easier.  Because this is true, participants have a large incentive to inflate their ratings by bot-bashing.  But if the ratings are going to be distorted, that lessens the incentive for seeding.  I am more convinced than ever that, if we use ratings for seeding, it should be exclusively HvH ratings.
 
* I like floating double-elimination, but I like floating triple-elimination better.  The third life allows flukes to work out better regardless of seeding.  C'mon Omar, twelve rounds isn't too many, is it?
 
* The low seeds do get rocked in the pairings, but if you win your games, you can still succeed.  In one of Omar's 17-player triple-elimination tournaments, #4 won it all, followed by #1, #10, and tied for fourth place were #15 and #8.  It's an open tournament in spite of the seeding.
 
http://arimaa.com/arimaa/tourn/compare/fte/paul/tr4
 
* If lots of players enter (say 24) and the pairing code bogs down, the pairing can be done by hand for the first four rounds until eliminations reduce the field.
 
* I can't wait for the fun to begin!
IP Logged

omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: 2007 World Championship Format
« Reply #47 on: Sep 25th, 2006, 6:58am »
Quote Quote Modify Modify

on Sep 24th, 2006, 12:06pm, seanick wrote:
I haven't looked at the program, but for the last line, don't you mean, instead of this:  
return &best_nodes[ best_nodes.size () - 1];  
 
you actually meant this, right?  
return &best_nodes[rand() % ( best_nodes.size () - 1)];
(because leaving out the random factor would mean it was always the last player in the best_nodes arrray that was picked)

 
Actually this array holds tournament pairings. Sometimes some of the entries do not have a vailid pairing. But it seems the last entry always does.
IP Logged
omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: 2007 World Championship Format
« Reply #48 on: Sep 25th, 2006, 7:57am »
Quote Quote Modify Modify

Karl, thanks for the input on adjusting the penalty functions. I have made the changes and the modified program is availible from here:
 
http://arimaa.com/arimaa/tourn/compare/sim/formats/floatXElimRepair/tour nament.cpp
 
I have run the new program on the same trials posted previously. The new results for those trials can be viewed here:
 
 FTE 16 players
http://arimaa.com/arimaa/tourn/compare/fte/paul/ntr1
http://arimaa.com/arimaa/tourn/compare/fte/paul/ntr2
FTE 17 players
http://arimaa.com/arimaa/tourn/compare/fte/paul/ntr3
http://arimaa.com/arimaa/tourn/compare/fte/paul/ntr4
 
FDE 16 players
http://arimaa.com/arimaa/tourn/compare/fde/paul/ntr1
http://arimaa.com/arimaa/tourn/compare/fde/paul/ntr2
FDE 17 players
http://arimaa.com/arimaa/tourn/compare/fde/paul/ntr3
http://arimaa.com/arimaa/tourn/compare/fde/paul/ntr4  
IP Logged
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2007 World Championship Format
« Reply #49 on: Sep 25th, 2006, 12:16pm »
Quote Quote Modify Modify

on Sep 25th, 2006, 7:57am, omar wrote:
I have run the new program on the same trials posted previously. The new results for those trials can be viewed here:

 
Excellent, thank you.  I will try to pair these with a goal of predicting 100% what the computer decided on, just to prove it can be done by hand if necessary.
IP Logged

doublep
Forum Guru
*****



Badger author

   


Gender: male
Posts: 82
Re: 2007 World Championship Format
« Reply #50 on: Sep 25th, 2006, 2:17pm »
Quote Quote Modify Modify

I've fixed a bug (memory corruption) in the program, it shouldn't crash or give weird results anymore (I hope.)  And feel free to modify penalty calculation algorithm, it is broken out to separate functions just for that.
IP Logged
doublep
Forum Guru
*****



Badger author

   


Gender: male
Posts: 82
Re: 2007 World Championship Format
« Reply #51 on: Sep 25th, 2006, 2:18pm »
Quote Quote Modify Modify

Ant yes, forgot to mention that the fixed version is available at the same URL.
IP Logged
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2007 World Championship Format
« Reply #52 on: Sep 25th, 2006, 4:30pm »
Quote Quote Modify Modify

on Sep 25th, 2006, 2:17pm, doublep wrote:
I've fixed a bug (memory corruption) in the program, it shouldn't crash or give weird results anymore (I hope.)  And feel free to modify penalty calculation algorithm, it is broken out to separate functions just for that.

 
Thank you so much for the code, doublep.  I really couldn't think of a better way to do the pairing I wanted, other than trying all the possibilities.  Your program is a great gift to the community.
IP Logged

omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: 2007 World Championship Format
« Reply #53 on: Sep 25th, 2006, 4:37pm »
Quote Quote Modify Modify

Thanks for the quick fix Paul. It correctly handles the two cases I mentioned above. I'll add Karl's evaluation to this version and test it out further.
IP Logged
omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: 2007 World Championship Format
« Reply #54 on: Sep 25th, 2006, 4:59pm »
Quote Quote Modify Modify

on Sep 25th, 2006, 12:16pm, Fritzlein wrote:

 
Excellent, thank you.  I will try to pair these with a goal of predicting 100% what the computer decided on, just to prove it can be done by hand if necessary.

 
Sometimes multiple pairing solutions are found which all have the same "score". Paul's program originally selected one of these solutions at random. I modified it to select the last one in list (as a work around for the problem mentioned above).
 
Even though Paul has fixed this problem now, I am thinking that maybe we should have it continue to select the last one in the list just so that the program will be deterministic and we can have it produce the same output with the same input (in case we ever need to prove how the pairing was generated).
 
IP Logged
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2007 World Championship Format
« Reply #55 on: Sep 25th, 2006, 8:38pm »
Quote Quote Modify Modify

It makes sense to me to have the program be deterministic, and thus verifiable.  From a human perspective, the order in which the code generates the tied pairings is not predictable in advance, and therefore as good as random anyway.
 
In the eight tournaments (about eighty rounds) I just finished double-checking, there was a tie only once, and that was caused in the first round by two players having the same rating.
 
on Sep 25th, 2006, 12:16pm, Fritzlein wrote:
Excellent, thank you.  I will try to pair these with a goal of predicting 100% what the computer decided on, just to prove it can be done by hand if necessary.

Let me restate my ability to do this algorithm by hand:
 
* In the two 16-player floating double-elimination tournaments, I made no mistakes.
* In the two 17-player floating double-elimination tournaments, I made a mistake in only one round
* In the four floating triple-elimination tournaments, I made a mistake in eight rounds and only got it right in about forty rounds.  Pairing the middle rounds (4-7) of FTE should not be not be entrusted to a human, at least not me.
 
We need to have confidence in the code.  Fortunately, I'm gaining that confidence.  The nine times I messed up, I was able to track down my mistake and clearly see why the computer pairing was superior.  My brain is nearly fried from trying to pair all these tournaments by hand, but I think I have energy for one more check, if you don't mind producing it, Omar.  We've tried it with 4N and 4N +1, but not 4N+2 and 4N+3.  If you wouldn't mind producing one floating triple elimination each for 18 and 19 players, I would check those too.  (If it bogs down with 18 and 19, then 14 and 15 would be fine as well.)
IP Logged

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2007 World Championship Format
« Reply #56 on: Sep 25th, 2006, 9:00pm »
Quote Quote Modify Modify

Some observations from these tournaments:
 
* It is quite common for order of elimination to be a poor way of settling second and third place.  About half the time there was a tie for number of wins, even when the order of elimination was clear.  I repeat my suggestion that there be an elimination playoff (a.k.a. consolation game(s)) among players tied for the medal positions by number of wins before elimination.
 
* A better seed definitely helps, but not as much as winning games does.  In the tournament here
 
http://arimaa.com/arimaa/tourn/compare/fte/paul/ntr1
 
Player #9 won the first four games, and was the only undefeated player when it was time to give a bye in round five, so #9 got to kick back on the sidelines and watch the fifth-round games of #2 vs. #4 and #1 vs. #3.  By inaccurate rating #9 was actually seeded tenth, but players seeded higher (1,4,5,6,7,8,10) got eliminated without ever receiving a bye, and finished behind #9 who eventually took third place.
 
The ninth-seeded player (twelfth in actual skill) also took third here:
 
http://arimaa.com/arimaa/tourn/compare/fde/paul/ntr2
 
* Omar, I think a very interesting experiment would be to measure whether seed or skill is more important.  Run a tournament 1000 times where the top two players have their measured ratings the reverse of their real ratings, so #1 in skill is seeded below #2 in skill.  Let the rest of the players have the same random inaccuracies in ratings as they have in your current experiments.  Who will win more often, the better seed or the better player?  One could also repeat the experiment switching #1 with #3 or lower to see how weak a player can be boosted to victory by an inaccurate high seeding.
« Last Edit: Sep 25th, 2006, 9:01pm by Fritzlein » IP Logged

omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: 2007 World Championship Format
« Reply #57 on: Sep 26th, 2006, 8:22am »
Quote Quote Modify Modify

on Sep 25th, 2006, 8:38pm, Fritzlein wrote:

My brain is nearly fried from trying to pair all these tournaments by hand, but I think I have energy for one more check, if you don't mind producing it, Omar.  We've tried it with 4N and 4N +1, but not 4N+2 and 4N+3.  If you wouldn't mind producing one floating triple elimination each for 18 and 19 players, I would check those too.  (If it bogs down with 18 and 19, then 14 and 15 would be fine as well.)

 
Thanks so much for verifying these Karl. I generated two more trials:
  http://arimaa.com/arimaa/tourn/compare/fte/paul/fte14
  http://arimaa.com/arimaa/tourn/compare/fte/paul/fte15
 
IP Logged
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: 2007 World Championship Format
« Reply #58 on: Sep 26th, 2006, 3:03pm »
Quote Quote Modify Modify

on Sep 26th, 2006, 8:22am, omar wrote:
http://arimaa.com/arimaa/tourn/compare/fte/paul/fte14
http://arimaa.com/arimaa/tourn/compare/fte/paul/fte15

Thanks Omar.  The algorithm worked fine in both cases.  In the 15-player tournament note that, although players #4 and #6 were eliminated at the same time in the second-last round, their records were 5-3 and 6-3 respectively, because player #4 got a bye and player #6 never did.  I think this should mean that player #6 unambiguously finishes in third place, despite the simultaneous elimination.
 
I expect the pairing to work more smoothly this year than last year.  I love the way this kind of floating triple-elimination works, but I'm aware that parts of it may seem strange or even wrong to other people.  We'll just have to test it out and see how folks react to the way things play out in practice.  Maybe after this year, people will demand a return to something more familiar like fixed-bracket double-elimination or round-robin.  Or maybe floating elimination will be such a smashing success it will catch on in the wider world...
« Last Edit: Sep 26th, 2006, 3:04pm by Fritzlein » IP Logged

omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: 2007 World Championship Format
« Reply #59 on: Sep 28th, 2006, 8:22am »
Quote Quote Modify Modify

I ran 2000 trials with the new FDE format and also 2000 trials with the new FTE format. Here are the numbers:
 
run3 'formats/floatDoubleElimRepair' 2000 16 500 50 9999
  1   35.0%
  2   23.9%
  3   16.1%
  4    9.8%
  5    6.0%
  6    3.6%
  7    2.5%
  8    1.3%
  9    0.8%
 10    0.5%
 11    0.1%
 13    0.1%
 
run3 'formats/floatTripElimRepair' 2000 16 500 50 9999
  1   36.6%
  2   25.7%
  3   16.2%
  4    8.8%
  5    5.1%
  6    3.5%
  7    1.8%
  8    1.2%
  9    0.6%
 10    0.2%
 11    0.1%
 12    0.1%
 14    0.1%
 
I was suprised to see FDE doing almost as well as FTE. Maybe it was just a fluke. I am running another 2000 trials of FDE now.
IP Logged
Pages: 1 2 3 4 5 6  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print

« Previous topic | Next topic »

Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB © 2000-2003. All Rights Reserved.