Arimaa Forum « Experimental new rating system »


(Moderator: supersamu)
   Author  Topic: Experimental new rating system  (Read 9190 times)
aaaa
Arimaa player #958
Experimental new rating system
« on: May 26th, 2006, 1:55pm »

Mark Glickman, known for his Glicko rating system, which extended Elo so that it takes time into account, has designed a successor system that also takes into account the volatility of a player's strength (meaning that one can forgo the fudge of imposing a minimum deviation). It is described here.
 
I've adapted Glicko-2 for use in a "real-time" context and took a stab at optimizing the various parameters based on all the rated games in the database. Unfortunately, it was not originally designed to handle the ultra-short "rating periods" of one second that I gave it, which makes it prone to hanging in the iteration phase. Nevertheless, I didn't want people here to miss out on it, so below is a list of the 50 top-rated players according to a particularly customized version of it (apologies for the bad layout; I couldn't get the data working with the forum table markup). I'm particularly interested in getting queries of a statistical nature about the system, as well as hearing exactly what properties are desired of it here.
 
player rating (old style) rating deviation (old style) volatility
Fritzlein 5.6630497 (2483.772535) 0.8068123002 (140.1576578) 0.0006376507628
99of9 4.910623754 (2353.062755) 0.832506 (144.6211108) 0.0006516585594
robinson 4.62855297 (2304.062039) 0.7831770235 (136.0517895) 0.0006228492358
Adanac 4.610491197 (2300.924388) 0.7379585114 (128.1965291) 0.00056411572
PMertens 4.360161883 (2257.43773) 0.832506 (144.6211108) 0.0008407439315
Ryan_Cable 4.326954398 (2251.668999) 0.832506 (144.6211108) 0.0004918684095
Belbo 4.27533983 (2242.702629) 0.7576357981 (131.6148241) 0.0006193388697
mouse 3.962690082 (2188.389803) 0.832506 (144.6211108) 0.0005526332836
Arimanator 3.78416112 (2157.376145) 0.832506 (144.6211108) 0.0008069304669
RonWeasley 3.697059358 (2142.245018) 0.832506 (144.6211108) 0.0005989948555
chessandgo 3.671875516 (2137.870137) 0.5040354009 (87.55992097) 0.0007090357629
omar 3.546825706 (2116.146759) 0.832506 (144.6211108) 0.0006258232938
naveed 3.451697291 (2099.62126) 0.832506 (144.6211108) 0.0007640425682
blue22 3.382977003 (2087.683322) 0.832506 (144.6211108) 0.0005354651612
bot_Bomb2005CC 3.170315207 (2050.740183) 0.7609260811 (132.1864048) 0.0005321464476
bot_Bomb2005Fast 3.064107257 (2032.289972) 0.8262308837 (143.5310114) 0.000611501966
bot_Bomb2005Blitz 3.053605699 (2030.465664) 0.6707376748 (116.5190732) 0.0009073338997
OLTI 3.03204056 (2026.719416) 0.832506 (144.6211108) 0.0005484800951
bot_Bomb2005P2 2.822907443 (1990.389271) 0.4867876375 (84.56367745) 0.0004840308518
thorin 2.767536776 (1980.7704) 0.832506 (144.6211108) 0.0005788629022
omarFast 2.726652212 (1973.668024) 0.832506 (144.6211108) 0.0006681596461
bot_speedy 2.682962807 (1966.078396) 0.832506 (144.6211108) 0.0007288659695
bleitner 2.610592744 (1953.506428) 0.832506 (144.6211108) 0.0005072821952
jdb 2.610499995 (1953.490316) 0.832506 (144.6211108) 0.0006070601943
bot_Clueless2005Fast 2.58310922 (1948.732051) 0.6668649056 (115.8463043) 0.0006654729265
megamau 2.541955565 (1941.582928) 0.832506 (144.6211108) 0.0006978433977
bot_lightning 2.473854617 (1929.752582) 0.832506 (144.6211108) 0.0006763171832
Swynndla 2.422299879 (1920.796606) 0.7948650666 (138.0822107) 0.0006315119355
frostlad 2.419039538 (1920.230227) 0.8071741155 (140.2205116) 0.0006132318648
BlackKnight 2.347565986 (1907.813998) 0.832506 (144.6211108) 0.0006623329595
bot_GnoBot2005Fast 2.303539877 (1900.16588) 0.7849751855 (136.3641623) 0.000675043096
bot_Clueless2005Blitz 2.2345132 (1888.174717) 0.7551311049 (131.1797143) 0.0007258793672
bot_Clueless2005P2 2.212829489 (1884.407871) 0.6895930611 (119.7945895) 0.0006311812553
bot_Clueless2005CC 2.168773741 (1876.754603) 0.8308199736 (144.328218) 0.0006241990032
bot_Arimaanator 2.090386741 (1863.137386) 0.8148788129 (141.5589546) 0.0004081767749
bot_Clueless2006P2 2.075735855 (1860.592266) 0.7943007282 (137.984175) 0.0007195330871
kamikazeking 2.011747429 (1849.476337) 0.7556039343 (131.2618531) 0.0005792767201
ytri 1.972060074 (1842.581938) 0.832506 (144.6211108) 0.0005580832465
filerank 1.970310098 (1842.277936) 0.832506 (144.6211108) 0.000569107061
haizhi 1.894086955 (1829.036619) 0.832506 (144.6211108) 0.0008102594244
Aamir 1.856510045 (1822.508841) 0.832506 (144.6211108) 0.0006060001632
bot_haizhi 1.702288469 (1795.717808) 0.832506 (144.6211108) 0.0006619388723
bot_Bomb2004CC 1.674421387 (1790.8768) 0.832506 (144.6211108) 0.000606956265
clauchau 1.633644449 (1783.79312) 0.832506 (144.6211108) 0.0005567039486
grey_0x2A 1.593474756 (1776.814929) 0.832506 (144.6211108) 0.000601434371
deselby 1.591926598 (1776.545986) 0.832506 (144.6211108) 0.0006369085893
CeeJay 1.578303073 (1774.179338) 0.832506 (144.6211108) 0.0007813484887
bot_Aamira2006Fast 1.574885954 (1773.585723) 0.7628650488 (132.523238) 0.0006310808374
bot_Clueless2006Fast 1.562548955 (1771.442567) 0.832506 (144.6211108) 0.0006612313097
bot_Loc2005Blitz 1.538297761 (1767.229703) 0.7531418141 (130.834139) 0.0006051832325
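
A note on reading the two scales in the table: the parenthesized "old style" figures appear to follow the standard Glicko-2 conversion, in which a rating maps to 1500 + 173.7178 × rating and a deviation maps to 173.7178 × deviation. The constants come from Glickman's published Glicko-2 description, not from this post, so the sketch below is an assumption that happens to reproduce the first row (Fritzlein's 5.6630497 and the common deviation 0.832506). The function names are made up for illustration.

```python
# Constants from Glickman's Glicko-2 description; it is an assumption
# (not stated in the post) that the table was produced with them.
SCALE = 173.7178
BASE = 1500.0


def to_old_style_rating(mu):
    """Map a Glicko-2 scale rating to the familiar Elo-like scale."""
    return BASE + SCALE * mu


def to_old_style_deviation(phi):
    """Map a Glicko-2 scale deviation to the Elo-like scale."""
    return SCALE * phi


if __name__ == "__main__":
    # First row of the table: rating 5.6630497, deviation 0.832506
    print(round(to_old_style_rating(5.6630497), 3))
    print(round(to_old_style_deviation(0.832506), 3))
```

If the assumption holds, every row of the table can be checked the same way.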
Fritzlein
Arimaa player #706
Re: Experimental new rating system
« Reply #1 on: May 26th, 2006, 3:45pm »

Hey, this rocks.  I have the highest respect for Mark Glickman, and it's cool to see what numbers are produced by an implementation of the Glicko system.  (Actually, I confess I liked Glicko but never read up on Glicko-2; an omission I shall shortly remedy.)
 
I'm curious why blue22 is ranked so much lower and Belbo so much higher in your ratings than the official ratings.  Maybe it's because Glicko doesn't like to move the ratings around as much?  Belbo established a very high rating with tons of games, and has since dropped off his peak in the official ratings, but maybe Glicko considered him extremely firmly established and didn't let his rating move down as much.
 
The rating deviation of 84 for Bomb2005P2 seems suspicious to me.  That bot has played 812 games, but only 447 of those were rated.  Why should it have a deviation so much lower than mine, when I've played 773 rated games?
 
Anyway, the issue I am most concerned about is not an issue that Glickman has addressed at all, to the best of my knowledge.  What troubles me is the non-transitivity of the ratings.  You can see the non-transitivity in action all the time in Arimaa.  Sometimes a newcomer will get stuck on BombP1 on the ladder, and lose thirty times in a row, driving their rating down to, say, 1200.  Meanwhile a newcomer who figures out a technique for beating BombP1 might win thirty in a row and pump their rating to 1800.  But the gap between the two humans is not 600 points.  They are each properly rated relative to BombP1, but improperly rated relative to each other.  That is to say, the ratings are not transitive.
 
I would love there to be some mechanism whereby a ton of games against a single opponent would have a reduced impact on one's rating, in order to mitigate the effects of non-transitivity.  In my mind it seems roughly correct to weight games against a single opponent by the square root of the number of games so that, for example, 25 games against one opponent would have the same impact as one game each against five different opponents.
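
The square-root idea can be made concrete with a small sketch. It reads the proposal as "n games against one opponent carry a total weight of sqrt(n)", so each of those games gets weight sqrt(n)/n; that reading, and the helper names, are assumptions for illustration and not part of any existing rating code.

```python
import math
from collections import Counter


def per_game_weights(opponents):
    """Weight of each single game, keyed by opponent.

    n games against the same opponent share a total weight of sqrt(n),
    so each individual game is worth sqrt(n) / n.
    """
    counts = Counter(opponents)
    return {opp: math.sqrt(n) / n for opp, n in counts.items()}


def effective_games(opponents):
    """Total weight of a list of games under the square-root rule."""
    counts = Counter(opponents)
    return sum(math.sqrt(n) for n in counts.values())


if __name__ == "__main__":
    grind = ["bot_BombP1"] * 25          # 25 games against one opponent
    varied = ["a", "b", "c", "d", "e"]   # one game each against five opponents
    print(effective_games(grind), effective_games(varied))
    print(per_game_weights(grind))
```

Both lists come out at an effective total of five games, which matches the 25-versus-5 example above; each of the 25 repeated games counts for one fifth of a normal game.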
 
But I recognize that reducing the weight of certain games is a kludge, and I wish that I could think of a more elegant way to deal with non-transitivity.  I'd love to hear alternative suggestions.  Non-transitivity is such a huge problem, though, that I don't think it can be ignored.

aaaa
Re: Experimental new rating system
« Reply #2 on: May 26th, 2006, 4:50pm »

on May 26th, 2006, 3:45pm, Fritzlein wrote:
Hey, this rocks.  I have the highest respect for Mark Glickman, and it's cool to see what numbers are produced by an implementation of the Glicko system.  (Actually, I confess I liked Glicko but never read up on Glicko-2; an omission I shall shortly remedy.)

Once again I would like to point out that Glicko-2 was not originally intended to be applied on a game-by-game basis. Glicko was modified to do so for the Free Internet Chess Server with Glickman's knowledge and I was curious enough to find out if the same was possible for Glicko-2.
 
on May 26th, 2006, 3:45pm, Fritzlein wrote:

I'm curious why blue22 is ranked so much lower and Belbo so much higher in your ratings than the official ratings.  Maybe it's because Glicko doesn't like to move the ratings around as much?  Belbo established a very high rating with tons of games, and has since dropped off his peak in the official ratings, but maybe Glicko considered him extremely firmly established and didn't let his rating move down as much.

If you look at the fifth column, you can see that Belbo has been given a higher volatility than blue22. For some reason, the system thinks Belbo's performance is less consistent than blue22's (0.0006193388697 vs 0.0005354651612).
 
on May 26th, 2006, 3:45pm, Fritzlein wrote:

The rating deviation of 84 for Bomb2005P2 seems suspicious to me.  That bot has played 812 games, but only 447 of those were rated.  Why should it have a deviation so much lower than mine, when I've played 773 rated games?

This is probably due to the large number of bot-bot matches taken into account. Maximizing the predictive power of the system has resulted in the rating deviation growing very fast if a player doesn't play for a while. Depending on one's volatility, it takes only about 20 days before one's rating deviation reaches the maximum again. I've already been experimenting with excluding bot-bot matches from consideration.
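
To give a feel for the "about 20 days" figure, here is a rough sketch. It assumes the standard Glicko-2 rule for a rating period without games, new deviation = sqrt(deviation² + volatility²), applied once per one-second period, and it assumes that 0.832506 (the deviation most players share in the first table) acts as the cap. Neither assumption is confirmed for the customized version described in this thread.

```python
import math

MAX_PHI = 0.832506        # assumed cap: the most common deviation in the first table
SECONDS_PER_DAY = 86400


def inflate(phi, sigma, idle_seconds):
    """Deviation after idle_seconds one-second periods without a game.

    Repeating phi' = sqrt(phi^2 + sigma^2) n times collapses to
    sqrt(phi^2 + n * sigma^2); the result is clipped at MAX_PHI.
    """
    return min(MAX_PHI, math.sqrt(phi * phi + sigma * sigma * idle_seconds))


def days_until_max(phi, sigma):
    """Idle days needed for the deviation to grow from phi to MAX_PHI."""
    return (MAX_PHI ** 2 - phi ** 2) / sigma ** 2 / SECONDS_PER_DAY


if __name__ == "__main__":
    # bot_Bomb2005P2's deviation and volatility from the first table
    print(days_until_max(0.4867876375, 0.0004840308518))
    # a hypothetical player with zero deviation and a volatility of 0.0006
    print(days_until_max(0.0, 0.0006))
```

Under these assumptions both cases land in the neighbourhood of three weeks, which is in line with the estimate above.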
Ryan_Cable
Arimaa player #951
Re: Experimental new rating system
« Reply #3 on: May 26th, 2006, 10:20pm »

This does support my belief that our ratings are currently too compressed.  Other than that, I'm not clear on what the advantage of this system is over our current system.
 
Our current system is fairly easy to understand, and anyone can calculate the possible rating changes that would result from playing a given opponent.  I would not want to give that up unless there is a substantial improvement in rating accuracy.
aaaa
Re: Experimental new rating system
« Reply #4 on: May 27th, 2006, 9:57am »

Here's the result again after the choice of parameters has been optimized for rated games including at least one human. Tell me if this one is more sane.
 
player rating (old style) rating deviation (old style) volatility
Fritzlein 5.482469551 (2452.402549) 0.8047089398 (139.7922667) 0.000538855377
99of9 4.73692475 (2322.888146) 0.8288227 (143.981256) 0.0005558081298
robinson 4.435887799 (2270.59267) 0.7746629858 (134.5727496) 0.0005238959024
Adanac 4.401616739 (2264.639176) 0.7049974879 (122.4706126) 0.0004748660898
PMertens 4.212564795 (2231.797489) 0.8288227 (143.981256) 0.0006714964567
Ryan_Cable 4.147041625 (2220.414948) 0.8288227 (143.981256) 0.0004148334762
Belbo 4.058155186 (2204.973791) 0.7336598637 (127.4497775) 0.0005050846755
mouse 3.839142456 (2166.927381) 0.8288227 (143.981256) 0.0004747277447
RonWeasley 3.555657933 (2117.681074) 0.8288227 (143.981256) 0.0005129970293
Arimanator 3.55381375 (2117.360706) 0.8288227 (143.981256) 0.0006905204892
omar 3.399174901 (2090.497186) 0.8288227 (143.981256) 0.0005327536732
chessandgo 3.381264438 (2087.385819) 0.4764257499 (82.76363314) 0.0006149328405
naveed 3.282873511 (2070.293564) 0.8288227 (143.981256) 0.0006009688416
blue22 3.212482055 (2058.065315) 0.7934868976 (137.8427982) 0.0004544315336
bot_Bomb2005CC 2.970377488 (2016.007442) 0.6806065302 (118.2334691) 0.000455604352
OLTI 2.94866222 (2012.235114) 0.8288227 (143.981256) 0.0004670586044
bot_Bomb2005Blitz 2.838656284 (1993.125125) 0.6355597776 (110.4080463) 0.0007068470581
bot_Bomb2005Fast 2.82836776 (1991.337825) 0.8188877809 (142.2553838) 0.0005046538155
bot_Bomb2005P2 2.677105717 (1965.060915) 0.4418521777 (76.75758823) 0.0004078946749
omarFast 2.648254447 (1960.048936) 0.8288227 (143.981256) 0.0005759785643
thorin 2.530981686 (1939.67657) 0.8288227 (143.981256) 0.0004996524011
bleitner 2.474353922 (1929.83932) 0.8288227 (143.981256) 0.0004316956614
bot_speedy 2.434224123 (1922.868059) 0.8288227 (143.981256) 0.0005830126789
jdb 2.427799948 (1921.752066) 0.8288227 (143.981256) 0.0005177591978
megamau 2.420580392 (1920.4979) 0.8288227 (143.981256) 0.0006022504608
bot_Clueless2005Fast 2.388905996 (1914.995494) 0.6231364636 (108.2498956) 0.0005694958969
bot_lightning 2.342006325 (1906.848186) 0.8288227 (143.981256) 0.0005799969339
frostlad 2.256872674 (1892.058956) 0.7260031127 (126.1196635) 0.0005255536504
Swynndla 2.210853008 (1884.064521) 0.7130838884 (123.8753643) 0.0005439431955
BlackKnight 2.1852398 (1879.615051) 0.8288227 (143.981256) 0.0005633310056
bot_GnoBot2005Fast 2.095862689 (1864.088655) 0.712771455 (123.8210891) 0.0005831434752
bot_Clueless2005P2 2.072370715 (1860.007681) 0.6243452409 (108.4598817) 0.0005399026625
bot_Clueless2005Blitz 2.06877531 (1859.383096) 0.7454407812 (129.4963325) 0.0006259613599
bot_Clueless2005CC 2.025541811 (1851.872667) 0.7638517253 (132.6946412) 0.0005358564737
bot_Arimaanator 1.960662587 (1840.601991) 0.7346765571 (127.6263952) 0.0003402609408
kamikazeking 1.882753526 (1827.0678) 0.7424657372 (128.9795144) 0.0004977506049
bot_Clueless2006P2 1.863160001 (1823.664056) 0.7153645161 (124.2715499) 0.0006227940527
ytri 1.849213454 (1821.241293) 0.8288227 (143.981256) 0.0004792753914
filerank 1.821360335 (1816.40271) 0.8288227 (143.981256) 0.0004887645442
Aamir 1.796505038 (1812.084903) 0.8288227 (143.981256) 0.0005229366749
haizhi 1.7666057 (1806.890856) 0.8288227 (143.981256) 0.0007029900686
bot_haizhi 1.581152235 (1774.674288) 0.8288227 (143.981256) 0.000576510305
bot_Bomb2004CC 1.521184847 (1764.256885) 0.8288227 (143.981256) 0.0005199753632
grey_0x2A 1.497306546 (1760.108799) 0.8288227 (143.981256) 0.0005169386559
clauchau 1.483899672 (1757.779786) 0.8288227 (143.981256) 0.0004723108198
deselby 1.456556559 (1753.029801) 0.8288227 (143.981256) 0.0005515469025
CeeJay 1.450091942 (1751.906782) 0.8288227 (143.981256) 0.0006684186778
6sense 1.445184387 (1751.054252) 0.8288227 (143.981256) 0.0005290605304
bot_Clueless2006Fast 1.440374412 (1750.218674) 0.7802854833 (135.5494775) 0.0005738607317
Paul 1.431647882 (1748.70272) 0.8288227 (143.981256) 0.0005009657358
Fritzlein
Re: Experimental new rating system
« Reply #5 on: May 29th, 2006, 10:49am »

I decided to produce ratings based purely on rated human vs. human games (1553 of them so far), due to discussion in another thread.  However, when I started to implement an idea we had discussed earlier to deal with non-transitivity, I got distracted by a different, older idea I had of making the ratings retrospective.
 
All rating systems I know of use an "update and forget" method.  After each game (or each rating period) players get new ratings, but how they reached their ratings is thrown away.  They carry only their rating forward, and no history.  Forgetting history might have some disadvantages, in my estimation.
 
Suppose, for example, that chessandgo joins the server and for his first ten human games loses to me ten times.  The rating system gives me hardly any credit for my wins.  If then chessandgo beats a lot of other people to push his rating up, we know after the fact that he was pretty good all along, but that doesn't help me any.  I only get points for beating a sub-1500 player even if he was increasing towards 1900 strength by the end of our first ten games.
 
So I created a new historical system (or one might say retrospective system) to counteract this trend.  It remembers all the old game results, and if someone does better (or worse) in the future, it retrospectively adjusts their ratings up (or down) in the past, as well as retrospectively adjusting the awards and penalties to their opponents.
 
There's actually just one formula in the FRIAR system (Fritz's Retrospectively Iterated Arimaa Ratings):  Your rating as of any game is the average of your rating from the game before and your rating from the game after, plus the award/penalty for the game itself.  The game award/penalty is calculated from the same formula as standard Elo ratings with a k-factor of 15, i.e.
 
15 * (score - 1/(1+10^((Ropp - Rmine)/400)))
 
If it is a player's first game, his "rating from the game before" is 1500.  If it is a player's last game, then he just gets the game award tacked on to the previous game.
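
The award formula is easy to try out in isolation. The sketch below is only a transcription of the expression above into Python; the function names are made up for illustration.

```python
def expected_score(r_mine, r_opp):
    """Standard Elo expectation: 1 / (1 + 10^((Ropp - Rmine) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r_mine) / 400.0))


def game_award(score, r_mine, r_opp, k=15.0):
    """Award (positive) or penalty (negative) for one game.

    score is 1 for a win and 0 for a loss; k is the k-factor.
    """
    return k * (score - expected_score(r_mine, r_opp))


if __name__ == "__main__":
    print(game_award(1.0, 1500, 1500))   # 7.5: beating an equal-rated opponent earns half of k
    print(game_award(0.0, 1500, 1500))   # -7.5: losing to an equal-rated opponent costs half of k
    print(game_award(1.0, 1900, 1500))   # a small award for beating a much weaker opponent
```

The award is capped at k in either direction, so with k = 15 no single game can move a rating by more than 15 points relative to the average of its neighbours.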
 
To calculate the ratings to match this formula, I just iterated a bunch of times.  The first interesting point is that the ratings are much more volatile than standard Elo ratings with a k-factor of 32.
 
The second interesting point is that the ratings converge glacially slowly.  I did 200 iterations overnight, but I suspect that the extreme ratings would push out an additional hundred points if only I could do 2000 iterations.  Unfortunately, my code is dog-slow because all parameters are stored (and looked up) in MS Access tables.  If someone did this properly with a C array and some pointers, it would probably take a second per iteration instead of the minute per iteration it took me.
 
So here are the not-really-converged ratings according to FRIAR, based only on 1553 hvh rated games, and compared to the current server ratings:
 
Name  FRIAR  Server
Fritzlein 2320 2309
Adanac 2245 2177
robinson 2230 2148
99of9 2212 2169
Belbo 2172 2002
PMertens 2115 2086
Ryan_Cable 2085 2130
chessandgo 2052 2015
omar 2050 1947
blue22 1989 2005
Swynndla 1989 1790
RonWeasley 1979 1941
BlackKnight 1918 1833
naveed 1876 1956
jdb 1875 1796
OLTI 1850 1958
Spunk 1750 1472
mouse 1728 2051
KT2006 1715 1657
frostlad 1715 1807
seanick 1702 1537
grey_0x2A 1692 1709
Arimanator 1689 2035
kamikazeking 1668 1751
thorin 1654 1895
megamau 1649 1788
 
Belbo has a significantly higher rating under FRIAR.  This makes complete sense because he had a stellar result in last year's postal tourney, and has hardly played humans since then, except for the four games he has already won in this year's postal.  His reduced server rating is due to losing a few to BombFast while training for the WC, and FRIAR ignores such games.
 
Swynndla also gets a huge boost in FRIAR from beating tons of different human players, even though many were newcomers.  He may therefore be somewhat overrated in FRIAR, but I don't mind seeing that the same strategy that works in Player of the Month also boosts the FRIAR rating.
 
I'm pleased that FRIAR rates jdb and naveed about the same, despite their divergent server ratings.
 
I had never heard of Spunk before, but he had a good record in the very early days of the server against omar, who later turned out to be very good.  Then when the early bots came on-line, Spunk lost all his points to those bots, then left.  The FRIAR rating for Spunk actually nearly matches his server rating from before the time he started to play bots.
 
I'm sure seanick will be happy to note that FRIAR respects his record against human opponents and ignores his string of losses to tough bots.
 
FRIAR gives a huge rating penalty to mouse relative to mouse's server rating.  This reflects the fact that mouse has only played 12 rated games against humans ever.  He has a 6-6 record against fairly tough opposition, but it simply isn't enough games to pull away from 1500 very far.
 
Arimanator, in contrast, has played enough games against humans to establish a rating, but his 22-46 record doesn't put him very high in the FRIAR rankings.  His high server rating is attributable largely to bot-bashing.
 
I was surprised clauchau didn't make the list of top players, but after peaking at 1899, he dropped back to 1626.  That goes to show what happens if you don't keep up with advances in Arimaa theory.
 
Haizhi, filerank, ytri, and some other players with a decent server ranking are invisible to FRIAR because they have played no games or hardly any games against humans.  Thorin will show up in the rankings much more clearly once the current postal tournament is over, I guarantee.
 
On the whole, I don't think FRIAR ratings are any more accurate than the server ratings in terms of predicting future game outcomes.  Nevertheless, I think FRIAR admirably meets the goals of a pure-human rating to go alongside the standard server rating.
« Last Edit: May 29th, 2006, 2:46pm by Fritzlein »

Ryan_Cable
Re: Experimental new rating system
« Reply #6 on: May 29th, 2006, 2:09pm »

I don’t understand how the retrospective iteration works.  Are you assuming that everyone has constant skill over time?  That seems like a particularly bad idea.
 
I am pleasantly surprised to see how high my HvH rating is.  I thought I was significantly more overrated than that.
Fritzlein
Re: Experimental new rating system
« Reply #7 on: May 29th, 2006, 2:37pm »

on May 29th, 2006, 2:09pm, Ryan_Cable wrote:
I don’t understand how the retrospective iteration works.  Are you assuming that everyone has constant skill over time?

No, no, I'm not holding skill constant over time.  Your rating at a given time is influenced most by the games very near it, and less and less by games that are far before it or far after it.  So your rating at the time of your second game is hardly influenced at all by whether your hundredth game was a win or a loss.
 
To each player of each game, I assign a rating that is supposed to represent his skill at the time of that game.  The assumption is that his skill at that time will be approximately the average of his skill the game before and the game after.
 
Take my last three games, for example:
 
32240 Ryan_Cable vs. Fritzlein
32276 Fritzlein vs. chessandgo
32282 Fritzlein vs. Swynndla
 
As part of my iterative pass through the ratings, I want to re-calculate how strong I was when I played game 32276.  I look ahead and see I was rated 2310 in game 32282, and look back and see I was only rated 2302 in game 32240.  My rating should be near the average of 2306.  I beat chessandgo, who was rated 2052, so I recalculate my rating in game 32276 as
 
2306 + 15*(1 - 1/(1+10^((2052-2306)/400))) = 2308.8221
 
When the ratings stabilize after many many iterations, each player's rating in each game will be exactly equal to the average of his ratings before and after, plus the bonus (penalty) for winning (losing) the game in question.  
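
For anyone who wants to experiment, the pass-after-pass procedure can be sketched in a few lines of Python. This is a toy reconstruction from the description in this thread, not the original MS Access code: the data layout, the function names and the choice to compute each award from the neighbour average (as in the worked example above) are assumptions.

```python
from collections import defaultdict

START = 1500.0  # "rating from the game before" for a player's first game


def expected(r_mine, r_opp):
    """Elo expectation of the player rated r_mine against r_opp."""
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r_mine) / 400.0))


def friar(games, k=15.0, passes=200):
    """games: chronological list of (winner, loser) pairs.

    Returns {player: [rating at each of that player's games, in order]}.
    """
    # history[p] holds (game index, opponent, score) in playing order
    history = defaultdict(list)
    for g, (winner, loser) in enumerate(games):
        history[winner].append((g, loser, 1.0))
        history[loser].append((g, winner, 0.0))

    rating = {(g, p): START for p, h in history.items() for g, _, _ in h}
    for _ in range(passes):
        new = {}
        for p, h in history.items():
            for i, (g, opp, score) in enumerate(h):
                before = rating[(h[i - 1][0], p)] if i > 0 else START
                if i + 1 < len(h):
                    # interior game: average of the neighbouring ratings
                    base = (before + rating[(h[i + 1][0], p)]) / 2.0
                else:
                    # last game: award is tacked on to the previous rating
                    base = before
                new[(g, p)] = base + k * (score - expected(base, rating[(g, opp)]))
        rating = new  # every update in a pass uses only the previous pass
    return {p: [rating[(g, p)] for g, _, _ in h] for p, h in history.items()}


if __name__ == "__main__":
    # hypothetical players and results, purely for illustration
    toy = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("ann", "bob")]
    for player, ratings in friar(toy).items():
        print(player, [round(r, 1) for r in ratings])
```

Each pass touches every game once, so with plain in-memory dictionaries a few thousand games should be quick to iterate; whether 200 passes are enough for convergence on the real data is a separate question.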
 
This list I gave was only the ratings of each player at the end of the line; I apparently peaked about 150 points higher than my final rating.  Long winning streaks or losing streaks will cause your rating to whip around even more in the FRIAR system than in the current server system.  
 
There is probably a much cleverer way to reach convergence than by making pass after pass of setting each rating in each game to what it would have been given the other ratings of the previous iteration.  My coding ability was only adequate for a simplistic solution that doesn't run fast enough to converge in a reasonable amount of time. :(  In C on a fast computer, however, the simplistic iteration might be adequate.
« Last Edit: May 29th, 2006, 2:51pm by Fritzlein »

chessandgo
Arimaa player #1889
Re: Experimental new rating system
« Reply #8 on: May 29th, 2006, 5:50pm »

on May 29th, 2006, 10:49am, Fritzlein wrote:

 
Suppose, for example, that chessandgo joins the server and ...

 
I'm fortunate not to have you as a math teacher:
let chessandgo and BlackKnight be real numbers, then chessandgo^2 + BlackKnight = ...
it would be much harder to write down equations :)

seanick
Re: Experimental new rating system
« Reply #9 on: May 31st, 2006, 1:21am »

Yeah, I am all for this new rating system, heh heh...  
 
What about something that kept track of time taken? Would the best players' games take longer per move, relative to the time scale, than those of less highly rated players? Does the line go up or down in terms of % of available time per move when playing someone of equal rating? Are those numbers easily mineable, or are they somewhat obscured within various sources?
 
I am not a Linux user but have begun to study some things analytically with code on win32, so such things would interest me except for the problem of having to use Linux. I wouldn't mind, but ... my employer would have a few reservations about the idea.
Fritzlein
Re: Experimental new rating system
« Reply #10 on: May 31st, 2006, 9:18am »

One problem with the server ratings (which FRIAR doesn't address in the slightest) is that different humans seem to benefit differently from extra thinking time.  Some players, notably Belbo and Omar, are tigers at a slow time control or postally, but tend to fall apart in fast games.  Other players, most notably kamikazeking and PMertens, can play great moves even at blitz speeds, but don't seem to get very much better given more time.  (Actually, PMertens doesn't even use all of his time given more time.)
 
In my opinion it isn't a good idea to say the players who can move faster are the better players.  There are different kinds of skill.  I'd rather say that some players are good at blitz and other players are good at postal games.  In another thread we discussed having ratings reflect time control.
 
http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1103741634;start=0#0
 
Note that back then the fastest time control available was 30 seconds per move, and it was already an issue!
 

aaaa
Re: Experimental new rating system
« Reply #11 on: May 31st, 2006, 12:33pm »

You might be interested in this article, where it is proposed that games at different time controls are to be given different weights.
Fritzlein
Re: Experimental new rating system
« Reply #12 on: Jun 6th, 2006, 12:11am »

I decided that a k-factor of 15 was making the FRIAR ratings way too volatile; I lowered it to 10.  I ran the numbers again, this time letting them converge a bit longer.  Also I added in last week's games, 23 more.  (Sorry, chessandgo, your four big wins from Sunday and Monday aren't there yet; you would surely be over 2100 with them included.)  The FRIAR top 25, with number of games played:
 
rate games username
2417 215 Fritzlein
2236 201 99of9
2228 100 Adanac
2194 265 PMertens
2182 201 robinson
2149 121 Belbo
2086 111 Ryan_Cable
2034 116 omar
2031 126 jdb
2015 67 chessandgo
2013 103 Swynndla
1963 73 blue22
1961 19 RonWeasley
1912 223 naveed
1897 79 OLTI
1894 18 BlackKnight
1765 66 kamikazeking
1742 22 frostlad
1714 13 Spunk
1701 68 Arimanator
1691 12 mouse
1680 23 grey_0x2A
1645 16 KT2006
1644 49 megamau
1639 43 clauchau
« Last Edit: Jun 6th, 2006, 1:35pm by Fritzlein »

Fritzlein
Re: Experimental new rating system
« Reply #13 on: Jun 6th, 2006, 1:14am »

And now the fun part: a graph of the historical FRIAR ratings.  (Only the top 7 by number of hvh games get personalized colors; sorry!)
 

 
Note how volatile the ratings are even with the k-factor reduced to 10.  On the official server ratings I retained the top ranking even when I tied for fourth in the 2006 World Championship, but the FRIAR ratings have me dipping below robinson, Adanac, and PMertens, i.e. all three of the WC medalists.
 
At the same time that FRIAR ratings are volatile, note that people have to play a significant number of games to move far from 1500.  In this sense the volatility of FRIAR is opposite to that of the server.  On the server your rating changes a lot at first, and slowly later.  With FRIAR your rating changes slowly until you have played fifteen games or so, but later on winning streaks (or losing streaks) have a bigger effect than they do on the server.
 
I note that in August 2004, around the time I joined the server, FRIAR considered 99of9 to be the most dominant player of any time period.  My current ratings lead of 180 points looks wimpy compared to the 350-point lead 99of9 had back then.
« Last Edit: Jun 6th, 2006, 1:15am by Fritzlein »

chessandgo
Re: Experimental new rating system
« Reply #14 on: Jun 6th, 2006, 9:12am »

Great !!! I had the feeling that this forum had not been used for ages >:( ... thanks for putting once more some life in it Fritz !
 
I see nothing but a big yellow line in there ;)
