Performance based rating calculations


The following equation is solved numerically to determine the
rating (RP) of a given player.

k1*(w1-W(r1-RP)) + k2*(w2-W(r2-RP)) + ... + kN*(wN-W(rN-RP)) = 0

where:
  r1 to rN are the ratings of the opponents
  w1 to wN are the results of the games (0, 0.5, 1 for loss, draw, win)
  k1 to kN are weights assigned to each game
  W() is the winning expectancy formula given below.


W(RD) =              1
             -----------------   where RD is the rating difference
              1 + 10^(RD/400)
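
As a rough illustration of how the equation above can be solved
numerically, here is a minimal Python sketch using bisection. The
function names, the search bounds and the choice of bisection are
assumptions made for this example; the p* scripts may well use a
different method.

```python
def expectancy(rd):
    """W(RD) = 1 / (1 + 10^(RD/400)), where RD = opponent rating - RP."""
    return 1.0 / (1.0 + 10.0 ** (rd / 400.0))


def performance_rating(games, lo=-10000.0, hi=10000.0, steps=100):
    """Solve sum(k * (w - W(r - RP))) = 0 for RP by bisection.

    games is a list of (k, w, r) tuples: weight, result (0, 0.5 or 1)
    and opponent rating. The search bounds are arbitrary assumptions.
    """
    def f(rp):
        return sum(k * (w - expectancy(r - rp)) for k, w, r in games)

    # f decreases as rp increases, so the root stays between lo and hi.
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# A win against a 1400 player and a loss to a 1600 player, equal weights.
# By symmetry the solution is 1500.
rating = performance_rating([(1.0, 1.0, 1400), (1.0, 0.0, 1600)])
print(round(rating))
```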


The p* scripts described later were used to experiment with performance
based rating calculations described above.

These scripts read a list of games as input and print the rating
computed as output. The input should be formatted so that:
  The newer games are listed at the top with one line per game 
  using this format:
    <result><rating> [username] [when]
    For example:
      +1500 abc
      -1750 xyz
      =1610 abc
    means the first game was a draw against abc, a 1610 player;
      the second game was lost to xyz, a 1750 player; and the
      third game (most recent) was won against abc, now rated 1500.
    If the username is not given it defaults to 'unknown'.
    The 'when' field, if given, is the number of days since the game
      was played. If not given it defaults to 0.
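
The input format described above could be parsed along these lines.
This is only a sketch: the function names and the dictionary layout
are assumptions for illustration, and the fields are treated as
strictly positional (username second, 'when' third).

```python
def parse_game(line):
    """Parse one '<result><rating> [username] [when]' line."""
    fields = line.split()
    token = fields[0]
    # '+' is a win, '=' a draw, '-' a loss (scored 1, 0.5, 0).
    result = {"+": 1.0, "=": 0.5, "-": 0.0}[token[0]]
    return {
        "result": result,
        "rating": float(token[1:]),
        "username": fields[1] if len(fields) > 1 else "unknown",
        "when": float(fields[2]) if len(fields) > 2 else 0.0,
    }


def parse_games(text):
    """Parse a block of lines, newest game first, skipping blank lines."""
    return [parse_game(line) for line in text.splitlines() if line.strip()]


games = parse_games("+1500 abc\n-1750 xyz\n=1610 abc\n")
for game in games:
    print(game)
```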

A script called 'rep' is provided to help create input for
the p* scripts. 'rep' reads the command line arguments to print
the given strings a specified number of times. For example:
  rep '+1500 abc' 2 '-2000 xyz' 1
will produce:
  +1500 abc
  +1500 abc
  -2000 xyz
It uses ; to separate games; for example:
  rep '+1500 abc; -1500 xyz' 2
will produce:
  +1500 abc
  -1500 xyz
  +1500 abc
  -1500 xyz
A * in the string is replaced by the repetition number and can
be used to create unique player names; for example:
  rep '+1000 a*' 3
will produce:
  +1000 a1
  +1000 a2
  +1000 a3
If a '-' appears as one of the arguments, it will copy
the standard input to standard output before continuing
with the processing of the rest of the arguments. 
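
The behaviour of 'rep' described above can be mimicked with a short
Python function. This is a sketch based only on the description and
examples in this file, not the actual script; the function signature
(a list of arguments plus an optional list of standard input lines)
is an assumption for illustration.

```python
def rep(args, stdin_lines=()):
    """Expand rep-style arguments into a list of output lines."""
    out = []
    i = 0
    while i < len(args):
        if args[i] == "-":
            # A lone '-' copies standard input through at this point.
            out.extend(stdin_lines)
            i += 1
            continue
        text, count = args[i], int(args[i + 1])
        for n in range(1, count + 1):
            # ';' separates games; '*' becomes the repetition number.
            for game in text.split(";"):
                out.append(game.strip().replace("*", str(n)))
        i += 2
    return out


print("\n".join(rep(["+1500 abc; -1500 xyz", "2"])))
```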

A script called 'testit' is provided to run one of the p*
scripts and print out its results for various test
situations. Example usage: testit p3

A script called 'gr' is provided which gets the most current
games history for a given player from the Arimaa gameroom and 
prints it to standard output. The format is suitable to be used 
as input to a p* script. Example usage: gr omar | p6
This would compute the rating for player 'omar' using the p6 script.

A script called 'ra' computes the rating accuracy as:
    RA = sqrt(P1) + sqrt(P2) + .... + sqrt(PN)
        where Pi is the number of games played against player i
    Example usage: gr omar | ra
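
The RA formula can be written out in a few lines of Python. The
function name and the choice to pass a plain list of opponent
usernames are assumptions for this sketch.

```python
from collections import Counter
from math import sqrt


def rating_accuracy(usernames):
    """RA = sum over opponents i of sqrt(Pi), Pi = games against i."""
    counts = Counter(usernames)
    return sum(sqrt(p) for p in counts.values())


# Four games against abc and one against xyz: sqrt(4) + sqrt(1) = 3.
print(rating_accuracy(["abc", "abc", "abc", "abc", "xyz"]))
```

Four games against one opponent contribute 2 to RA, while four games
against four different opponents contribute 4, so playing a variety
of opponents raises RA faster.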


p1
  Uses k1 to kN all equal to 1. Has the problem that if a player has
  not lost a game or has not won any game it gives bad results
  (with only wins or only losses the equation has no finite solution).

p2
  Same as p1, but tries to fix the problem by adding a fictitious
  draw game against a random player (rated 0). The fictitious
  game is given a weight of 0.1 (as if it was played long ago).
  The fictitious game could also be a draw against an average
  rated player (such as rating 1500). But after about 5 real
  games it does not make much difference what the rating is of
  the fictitious player.
  This still has the problem that if a player constantly wins
  games against a player with a fixed rating the winning player's rating
  increases without bound. Likewise if the player loses it
  decreases without bound.
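
  The effect of the fictitious draw can be tried out with the sketch
  below. It is an illustration only: the function names, the bisection
  solver and its search bounds are assumptions, not the p2 script
  itself. With the fictitious game added, an all-wins record gives a
  finite rating, but the rating keeps climbing as more wins are added.

```python
def expectancy(rd):
    """W(RD) = 1 / (1 + 10^(RD/400))."""
    return 1.0 / (1.0 + 10.0 ** (rd / 400.0))


def performance_rating(games, lo=-10000.0, hi=10000.0, steps=100):
    """Bisection on sum(k * (w - W(r - RP))) = 0; games are (k, w, r)."""
    def f(rp):
        return sum(k * (w - expectancy(r - rp)) for k, w, r in games)
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def p2_rating(games):
    """games: list of (result, opponent_rating); every real game has k = 1."""
    weighted = [(1.0, w, r) for w, r in games]
    # Fictitious draw against a 0-rated player with weight 0.1.
    weighted.append((0.1, 0.5, 0.0))
    return performance_rating(weighted)


# All wins against a 1000-rated player: finite, but still growing.
for n in (1, 5, 20, 100):
    print(n, round(p2_rating([(1.0, 1000)] * n)))
```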

p3
  Same as p2 but
  uses k(i+1) = 0.98*k(i), with k(1) = 1, where game 1 is the newest.
  This prevents a player's rating from increasing without bound.
  If a player constantly wins games against another player with a
  fixed rating then the winning player's rating will increase to
  a maximum of 1200 points above the other player. If a player
  consistently loses to a fixed rated player the rating will decrease
  to a maximum of 1200 points below the other player if the
  rating of the other player is below 117; otherwise the rating does
  not decrease without bound, but it will also not end up above 0.
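
  The 1200 point ceiling follows from the weights: they sum to at
  most 1/(1-0.98) = 50, while the fictitious draw pulls back with
  about 0.1*0.5 = 0.05, so 1-W settles near 0.05/50 = 0.001, which
  is a rating difference of about 400*3 = 1200. The sketch below
  checks this numerically. The function names and the bisection
  solver are assumptions for illustration, not the p3 script itself.

```python
def expectancy(rd):
    """W(RD) = 1 / (1 + 10^(RD/400))."""
    return 1.0 / (1.0 + 10.0 ** (rd / 400.0))


def performance_rating(games, lo=-10000.0, hi=10000.0, steps=100):
    """Bisection on sum(k * (w - W(r - RP))) = 0; games are (k, w, r)."""
    def f(rp):
        return sum(k * (w - expectancy(r - rp)) for k, w, r in games)
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def p3_rating(games, decay=0.98):
    """games: list of (result, opponent_rating), newest game first."""
    # k(1) = 1 for the newest game, each older game is 0.98 times the last.
    weighted = [(decay ** i, w, r) for i, (w, r) in enumerate(games)]
    # Fictitious draw against a 0-rated player with weight 0.1, as in p2.
    weighted.append((0.1, 0.5, 0.0))
    return performance_rating(weighted)


# Winning streaks against a 1000-rated player level off near 2200.
for n in (20, 100, 500):
    print(n, round(p3_rating([(1.0, 1000)] * n)))
```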

  This also has the nice property that if a player builds up his
  rating by winning lots of games against weak players, he will
  lose a lot of points when he loses a game. Whereas if he
  achieved his rating by playing against players close to his rating
  then he will not lose many points when he loses a game.
    For example:
      Both
        rep '+1492' 20 | p3
      and
        rep '+2400; -2600' 10 | p3
      result in the player having a rating of 2500 after 20 games.
      Now if the first player loses to a 2500 player
        rep '-2500' 1 '+1492' 20 | p3
      his rating drops to 2232, but the same loss for the second player
        rep '-2500' 1 '+2400; -2600' 10 | p3
      only drops the rating to 2479.
  A rating system that computes the new rating based only on the 
  current rating of the two players does not have this property.

  This still has the problem that if a player finds another player
  that has about an equal rating, but the style of play is such
  that it can be consistently defeated, the player can win lots
  of games against this player to inflate their rating (up to a
  maximum of 1200 points above the other player).

p4
  Same as p3 but
  decreases the weight of games if they were played against the
  same player.  A factor of 1/sqrt(N) is applied to the weight of
  each such game, where N is the number of games with the same player.
  This has the property that after about 40 games with the same
  player the rating does not increase much and peaks at about
  70 games, at about 780 points above the other player. Beyond 70
  games the rating actually decreases, but very slowly. Here is an example:
    using: rep '+1000' N | p*
        N      p4      p3
        1     1512    1512
        2     1573    1635
        5     1649    1791
       10     1702    1904
       20     1746    2008
       30     1766    2063
       40     1775    2097
       50     1780    2121
       60     1781    2138
       70     1781    2151
       80     1779    2161
       90     1776    2169
      100     1773    2175
      200     1734    2197
      300     1701    2199
      400     1676    2200
      500     1656    2200
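
  The p4 column can be approximated with the sketch below. It assumes
  that every game against an opponent is scaled by 1/sqrt(N), with N
  the total number of games against that opponent, and that the
  fictitious draw keeps its weight of 0.1. The function names and the
  bisection solver are also assumptions; the real p4 script may differ
  in detail.

```python
from collections import Counter
from math import sqrt


def expectancy(rd):
    """W(RD) = 1 / (1 + 10^(RD/400))."""
    return 1.0 / (1.0 + 10.0 ** (rd / 400.0))


def performance_rating(games, lo=-10000.0, hi=10000.0, steps=100):
    """Bisection on sum(k * (w - W(r - RP))) = 0; games are (k, w, r)."""
    def f(rp):
        return sum(k * (w - expectancy(r - rp)) for k, w, r in games)
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def p4_rating(games, decay=0.98):
    """games: list of (result, opponent_rating, username), newest first."""
    counts = Counter(name for _, _, name in games)
    # p3 age decay, further divided by sqrt(games against that opponent).
    weighted = [
        (decay ** i / sqrt(counts[name]), w, r)
        for i, (w, r, name) in enumerate(games)
    ]
    # Fictitious draw against a 0-rated player with weight 0.1.
    weighted.append((0.1, 0.5, 0.0))
    return performance_rating(weighted)


# Same input as: rep '+1000' N | p4  (username defaults to 'unknown')
for n in (1, 2, 70, 100, 500):
    print(n, round(p4_rating([(1.0, 1000, "unknown")] * n)))
```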

  This also has the nice property that if some games are won and
  some are lost then there is less of an effect. Here is an example:
    using: rep '+1000; -1000' N | p*
        N      p4      p3
        1      979     986
        2      986     995
        5      992    1000
       10      994    1001
       20      996    1002
       30      996    1003
       40      996    1003
       50      996    1003
  Notice that the difference after 20 games (10 wins; 10 losses; alternating)
  is just 7 points (1001-994). However after 20 games (all wins) the
  difference is 262 points (2008-1746). This means that p4 is similar
  to p3 if some games are won and some are lost against the same
  player, but differs more if all games are won against the same
  player. Of course if every player is different p4 is the same
  as p3.

  This also has the nice property that if you build your rating
  by playing the same player it is not as stable as a rating that
  was built by playing against the field. For example:
    rep '+2000; -2000' 50 | p3    =   2003
    rep '+2000; -2000' 50 | p4    =   1995
  These ratings are almost the same, but the first was built
  by playing the field (though p3 is used it is the same as p4
  with all the opponents being different) while the second was
  built by playing all 100 games against the same player. Now
  let's see what happens when we lose a game to a different player:
    using: rep '-R playerX' 1 '+2000; -2000' 50 | p*
        R      p4      p3
      3000    1995    2003
      2500    1987    2002
      2000    1929    1995
      1500    1842    1987
      1000    1818    1986
       500    1817    1986
         0    1816    1986
  Notice that the rating built by playing the field (p3) drops by
  a maximum of only 17 points, while the rating that was built by
  playing only one player drops by as much as 179 points.

  If a player has inflated their rating by winning lots of games
  against the same player then their rating will be very unstable and 
  p4 will bring it down a lot when they lose a game to a different player.
  For example:
    rep '+1230' 100 | p4  gives a rating of 2003
    now using: rep '-R playerX' 1 '+1230' 100 | p4
        R      p4
      3000    1990
      2500    1911
      2000    1731
      1500    1541
      1000    1440
       500    1425
         0    1424

  To get an idea of how stable the rating is we can recompute the
  rating to see how much it would change if the player was to
  win or lose a game against a player with the same rating. This
  is used to give the +/- values along with the rating.
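
  One way to picture the +/- computation is sketched below. It is an
  assumption-laden illustration: the hypothetical opponent name
  'probe', the function names, the bisection solver and the p4-style
  weighting (1/sqrt(N) per opponent, fictitious draw of weight 0.1)
  are choices made for this example and not taken from the scripts.

```python
from collections import Counter
from math import sqrt


def expectancy(rd):
    """W(RD) = 1 / (1 + 10^(RD/400))."""
    return 1.0 / (1.0 + 10.0 ** (rd / 400.0))


def performance_rating(games, lo=-10000.0, hi=10000.0, steps=100):
    """Bisection on sum(k * (w - W(r - RP))) = 0; games are (k, w, r)."""
    def f(rp):
        return sum(k * (w - expectancy(r - rp)) for k, w, r in games)
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def p4_rating(games, decay=0.98):
    """games: list of (result, opponent_rating, username), newest first."""
    counts = Counter(name for _, _, name in games)
    weighted = [
        (decay ** i / sqrt(counts[name]), w, r)
        for i, (w, r, name) in enumerate(games)
    ]
    weighted.append((0.1, 0.5, 0.0))  # fictitious draw, weight 0.1
    return performance_rating(weighted)


def stability(games):
    """Return (rating, change after a win, change after a loss).

    The extra game is against a hypothetical new opponent 'probe'
    whose rating equals the player's current rating.
    """
    base = p4_rating(games)
    after_win = p4_rating([(1.0, base, "probe")] + games)
    after_loss = p4_rating([(0.0, base, "probe")] + games)
    return base, after_win - base, after_loss - base


# 100 alternating wins and losses at 2000: all different opponents
# versus one and the same opponent.
results = [1.0 if i % 2 == 0 else 0.0 for i in range(100)]
field = [(w, 2000, "p%d" % i) for i, w in enumerate(results)]
single = [(w, 2000, "same") for w in results]
print("field:  %.0f  +%.0f / %.0f" % stability(field))
print("single: %.0f  +%.0f / %.0f" % stability(single))
```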

  The main problem with this script is that the rating decreases
  if many games are played with the same player (even if all games
  were wins).

  Another major problem is that if a player plays many (like 200)
  games against the same or selected few players in a short period
  of time they will cause the older games to be given less weight
  even though those older games may have been played not too long
  ago and could contribute to a more accurate answer for the player's
  rating. So a player can basically flush out the older games by
  playing lots of new ones.

p5 and p6
  Various attempts at trying to fix the problems in p4

p7
  This script first reduces the impact of games played against
  the same player based on how old the game is in the sequence
  of games against this player. It then scales all the games
  played against a particular player based on the number of
  games played against that player. If more games have been
  played then the scaling factor is greater, but it can only
  increase up to a certain fixed value. Then it reduces the
  impact of games based on how many unique players have been
  played since that game. Finally it adds a fictitious game
  which is a draw against a zero rated player.

p8
  Same as p7 except that two fictitious draws against a 1500
  rated player are used instead of one against a zero rated player.


--------