Performance based rating calculations

The following equation is solved numerically to determine the rating (RP) of a given player.

  k1*(w1-W(r1-RP)) + k2*(w2-W(r2-RP)) + ... + kN*(wN-W(rN-RP)) = 0

where:
  r1 to rN are the ratings of the opponents
  w1 to wN are the results of the games (0, 0.5, 1 for loss, draw, win)
  k1 to kN are weights assigned to each game
  W() is the winning expectancy formula given below.

                    1
  W(RD) =  -----------------     where RD is the rating difference
            1 + 10^(RD/400)

The p* scripts described later were used to experiment with the performance based rating calculations described above. These scripts read a list of games as input and print the computed rating as output.

The input should be formatted so that the newer games are listed at the top, with one line per game. Each line starts with the result (+ for a win, - for a loss, = for a draw) immediately followed by the opponent's rating, and may continue with two optional fields:

  [username] [when]

For example:

  +1500 abc
  -1750 xyz
  =1610 abc

means the first game was a draw against abc, a 1610 player; the second game was lost to xyz, a 1750 player; and the third game (most recent) was won against abc, now rated 1500.

If the username is not given it defaults to 'unknown'. 'when' is the number of days since the game was played. If not given it defaults to 0.

A script called 'rep' is provided to help create input for the p* scripts. 'rep' reads its command line arguments and prints the given strings a specified number of times. For example:

  rep '+1500 abc' 2 '-2000 xyz' 1

will produce:

  +1500 abc
  +1500 abc
  -2000 xyz

It uses ; to separate games; for example:

  rep '+1500 abc; -1500 xyz' 2

will produce:

  +1500 abc
  -1500 xyz
  +1500 abc
  -1500 xyz

A * in the string is replaced by the repetition number and can be used to create unique player names; for example:

  rep '+1000 a*' 3

will produce:

  +1000 a1
  +1000 a2
  +1000 a3

If a '-' appears as one of the arguments, 'rep' copies the standard input to standard output at that point and then continues processing the rest of the arguments.
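
The expansion rules described for 'rep' can be illustrated with a short Python sketch. This is not the actual 'rep' script; the function name rep_lines is hypothetical, the arguments are assumed to come in (string, count) pairs, and the '-' argument that copies standard input is left out.

```python
def rep_lines(args):
    """Expand alternating (pattern, count) arguments into game lines.

    Hypothetical helper mirroring the documented behaviour of 'rep':
    ';' separates games within a pattern and '*' is replaced by the
    repetition number (starting at 1). The '-' argument is not handled.
    """
    lines = []
    for pattern, count in zip(args[0::2], args[1::2]):
        for n in range(1, int(count) + 1):
            for game in pattern.split(";"):
                lines.append(game.strip().replace("*", str(n)))
    return lines


if __name__ == "__main__":
    # Same arguments as the third 'rep' example above.
    print("\n".join(rep_lines(["+1000 a*", "3"])))
```

Calling rep_lines(["+1500 abc; -1500 xyz", "2"]) returns the same four lines shown in the second 'rep' example.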

A script called 'testit' is provided to run one of the p* scripts and print out its results for various test situations. Example usage:

  testit p3

A script called 'gr' is provided which gets the most current game history for a given player from the Arimaa gameroom and prints it to standard output. The format is suitable for use as input to a p* script. Example usage:

  gr omar | p6

This would compute the rating for player 'omar' using the p6 script.

A script called 'ra' computes the rating accuracy as:

  RA = sqrt(P1) + sqrt(P2) + ... + sqrt(PN)

where Pi is the number of games played against player i. Example usage:

  gr omar | ra

p1

Uses k1 to kN all equal to 1. Has the problem that if a player has not lost a game or has not won any game it gives bad results.

p2

Same as p1, but tries to fix the problem by adding a fictitious draw against a random player (rated 0). The fictitious game is given a weight of 0.1 (as if it was played long ago). The fictitious game could also be a draw against an average rated player (such as rating 1500), but after about 5 real games the rating of the fictitious player does not make much difference.

This still has the problem that if a player constantly wins games against a player with a fixed rating, the winning player's rating increases without bound. Likewise, if the player keeps losing, it decreases without bound.

p3

Same as p2 but uses k(i+1) = 0.98*k(i) with k(1) = 1, where game 1 is the newest. This prevents a player's rating from increasing without bound. If a player constantly wins games against another player with a fixed rating, the winning player's rating will increase to a maximum of 1200 points above the other player. If a player consistently loses to a fixed rated player, the rating will decrease to a maximum of 1200 points below the other player if the rating of the other player is below 117; otherwise it still does not decrease without bound, but it will not be above 0.
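
The p3 weighting can be tried out with a small Python sketch that solves the rating equation by bisection. This is an illustration under stated assumptions, not the p3 script itself: the names expected_score and p3_rating, the tuple input format, the search bracket and the choice of bisection are all assumptions. The fictitious draw keeps its fixed weight of 0.1 against a 0-rated player, as described for p2.

```python
def expected_score(rd):
    """Winning expectancy W(RD), where RD = opponent rating - own rating."""
    return 1.0 / (1.0 + 10.0 ** (rd / 400.0))


def p3_rating(games, decay=0.98, prior_weight=0.1, prior_rating=0.0):
    """Solve sum(k_i * (w_i - W(r_i - RP))) = 0 for RP.

    games: list of (result, opponent_rating), newest first,
           with result 1, 0.5 or 0 for win, draw or loss.
    """
    # k(1) = 1 for the newest game, k(i+1) = decay * k(i).
    terms = [(decay ** i, w, r) for i, (w, r) in enumerate(games)]
    # Fictitious draw with a fixed weight (assumed not to decay).
    terms.append((prior_weight, 0.5, prior_rating))

    def f(rp):
        return sum(k * (w - expected_score(r - rp)) for k, w, r in terms)

    # f decreases as rp grows, so bisect on its sign.
    lo, hi = -10000.0, 10000.0  # assumed search bracket
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(round(p3_rating([(1, 1000)])))        # one win against a 1000 player
    print(round(p3_rating([(1, 1000)] * 500)))  # long winning streak
```

Under these assumptions, a long run of wins against a 1000-rated player should level off close to 2200, which matches the 1200 point ceiling described above.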

p3 also has the nice property that if a player builds up his rating by winning lots of games against weak players, he will lose a lot of points when he loses a game, whereas if he achieved his rating by playing against players close to his rating then he will not lose many points when he loses a game. For example, both

  rep '+1492' 20 | p3

and

  rep '+2400; -2600' 10 | p3

result in the player having a rating of 2500 after 20 games. Now if the first player loses to a 2500 player

  rep '-2500' 1 '+1492' 20 | p3

his rating drops to 2232, but the same loss for the second player

  rep '-2500' 1 '+2400; -2600' 10 | p3

only drops the rating to 2479. A rating system that computes the new rating based only on the current ratings of the two players does not have this property.

This still has the problem that if a player finds another player that has about an equal rating, but whose style of play can be consistently defeated, the player can win lots of games against this player to inflate their rating (up to a maximum of 1200 points above the other player).

p4

Same as p3 but decreases the weight of games if they were played against the same player. A factor of 1/sqrt(N) is used to decrease the weight of the games, where N is the number of games with the same player. This has the property that after about 40 games with the same player the rating does not increase much; it peaks at about 70 games, at about 780 points above the other player. Beyond 70 games the rating actually decreases, but very slowly. Here is an example:

  using: rep '+1000' N | p*

    N    p4    p3
    1  1512  1512
    2  1573  1635
    5  1649  1791
   10  1702  1904
   20  1746  2008
   30  1766  2063
   40  1775  2097
   50  1780  2121
   60  1781  2138
   70  1781  2151
   80  1779  2161
   90  1776  2169
  100  1773  2175
  200  1734  2197
  300  1701  2199
  400  1676  2200
  500  1656  2200

This also has the nice property that if some games are won and some are lost then there is less of an effect.
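
The 1/sqrt(N) weighting described for p4 can be sketched in Python as follows. This is only an illustration, not the p4 script: the names p4_weights and p4_rating, the tuple input format and the bisection bracket are assumptions. It also assumes that N counts every listed game against that opponent and that the fictitious draw (weight 0.1, rated 0) is not rescaled.

```python
import math
from collections import Counter


def expected_score(rd):
    """Winning expectancy W(RD), where RD = opponent rating - own rating."""
    return 1.0 / (1.0 + 10.0 ** (rd / 400.0))


def p4_weights(opponents, decay=0.98):
    """opponents: opponent names, newest game first; one weight per game."""
    counts = Counter(opponents)
    # p3 age weight divided by sqrt(number of games against that opponent).
    return [decay ** i / math.sqrt(counts[name])
            for i, name in enumerate(opponents)]


def p4_rating(games):
    """games: list of (result, opponent_rating, opponent_name), newest first."""
    weights = p4_weights([name for _, _, name in games])
    terms = [(k, w, r) for k, (w, r, _) in zip(weights, games)]
    terms.append((0.1, 0.5, 0.0))  # fictitious draw, assumed unscaled

    def f(rp):
        return sum(k * (w - expected_score(r - rp)) for k, w, r in terms)

    lo, hi = -10000.0, 10000.0  # assumed search bracket
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:  # f decreases as rp grows
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    two_wins = [(1, 1000, "unknown")] * 2
    print(round(p4_rating(two_wins)))
```

With two wins against the same 1000-rated opponent this sketch gives roughly 1573, in line with the p4 column of the table above. When every opponent name is different all counts are 1 and the weights reduce to those of p3.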

For example, with alternating wins and losses against the same player:

  using: rep '+1000; -1000' N | p*

    N    p4    p3
    1   979   986
    2   986   995
    5   992  1000
   10   994  1001
   20   996  1002
   30   996  1003
   40   996  1003
   50   996  1003

Notice that the difference after 20 games (10 wins; 10 losses; alternating) is just 7 points (1001-994). However, after 20 games (all wins) the difference is 262 points (2008-1746). This means that p4 is similar to p3 if some games are won and some are lost against the same player, but differs more if all games are won against the same player. Of course, if every opponent is different, p4 is the same as p3.

This also has the nice property that a rating built by playing the same player is not as stable as a rating that was built by playing against the field. For example:

  rep '+2000; -2000' 50 | p3  = 2003
  rep '+2000; -2000' 50 | p4  = 1995

These ratings are almost the same, but the first was built by playing the field (though p3 is used, it is the same as p4 with all the opponents being different) while the second was built by playing all 100 games against the same player. Now let's see what happens when we lose a game to a different player:

  using: rep '-R playerX' 1 '+2000; -2000' 50 | p*

     R    p4    p3
  3000  1995  2003
  2500  1987  2002
  2000  1929  1995
  1500  1842  1987
  1000  1818  1986
   500  1817  1986
     0  1816  1986

Notice that the rating built by playing the field (p3) drops by a maximum of only 17 points, while the rating that was built by playing only one player drops by as much as 179 points.

If a player has inflated their rating by winning lots of games against the same player, then their rating will be very unstable and p4 will bring it down a lot when they lose a game to a different player. For example:

  rep '+1230' 100 | p4

gives a rating of 2003. Now:

  using: rep '-R playerX' 1 '+1230' 100 | p4

     R    p4
  3000  1990
  2500  1911
  2000  1731
  1500  1541
  1000  1440
   500  1425
     0  1424

To get an idea of how stable the rating is, we can recompute the rating to see how much it would change if the player were to win or lose a game against a player with the same rating.
This is used to give the +/- values along with the rating.

The main problem with this script is that the rating decreases if many games are played with the same player (even if all games were wins). Another major problem is that if a player plays many (like 200) games against the same or a selected few players in a short period of time, the older games are given less weight, even though those older games may have been played not too long ago and could contribute to a more accurate value for the player's rating. So a player can basically flush out the older games by playing lots of new ones.

p5 and p6

Various attempts at trying to fix the problems in p4.

p7

This script first reduces the impact of games played against the same player based on how old the game is in the sequence of games against this player. It then scales all the games played against a particular player based on the number of games played against that player. If more games have been played then the scaling factor is greater, but it can only increase up to a certain fixed value. Then it reduces the impact of games based on how many unique players have been played since that game. Finally it adds a fictitious game which is a draw against a zero rated player.

p8

Same as p7 except that two fictitious draws against a 1500 rated player are used instead of one against a zero rated player.

--------