Topic: Experimental new rating system
aaaa
Forum Guru
Arimaa player #958
Posts: 768
|
Experimental new rating system
« on: May 26th, 2006, 1:55pm » |
Mark Glickman, known for his Glicko rating system, which extended Elo to take time into account, has designed a successor system that also accounts for the volatility of a player's strength (meaning that one can dispense with the fudge of imposing a minimum deviation). It is described here. I've adapted Glicko-2 for use in a "real-time" context and took a stab at optimizing the various parameters based on all the rated games in the database. Unfortunately, it was not originally designed to handle the ultra-short "rating periods" of one second I gave it, which makes it prone to hanging in the iteration phase. Nevertheless, I didn't want people here to miss out on it, so below is a list of the 50 top-rated players according to a particularly customized version of it (apologies for the bad layout, as I couldn't get the data working with the forum table markup). I'm particularly interested in getting queries of a statistical nature about the system, as well as hearing exactly what properties are desired of it here.
player  rating (old style)  rating deviation (old style)  volatility
Fritzlein 5.6630497 (2483.772535) 0.8068123002 (140.1576578) 0.0006376507628
99of9 4.910623754 (2353.062755) 0.832506 (144.6211108) 0.0006516585594
robinson 4.62855297 (2304.062039) 0.7831770235 (136.0517895) 0.0006228492358
Adanac 4.610491197 (2300.924388) 0.7379585114 (128.1965291) 0.00056411572
PMertens 4.360161883 (2257.43773) 0.832506 (144.6211108) 0.0008407439315
Ryan_Cable 4.326954398 (2251.668999) 0.832506 (144.6211108) 0.0004918684095
Belbo 4.27533983 (2242.702629) 0.7576357981 (131.6148241) 0.0006193388697
mouse 3.962690082 (2188.389803) 0.832506 (144.6211108) 0.0005526332836
Arimanator 3.78416112 (2157.376145) 0.832506 (144.6211108) 0.0008069304669
RonWeasley 3.697059358 (2142.245018) 0.832506 (144.6211108) 0.0005989948555
chessandgo 3.671875516 (2137.870137) 0.5040354009 (87.55992097) 0.0007090357629
omar 3.546825706 (2116.146759) 0.832506 (144.6211108) 0.0006258232938
naveed 3.451697291 (2099.62126) 0.832506 (144.6211108) 0.0007640425682
blue22 3.382977003 (2087.683322) 0.832506 (144.6211108) 0.0005354651612
bot_Bomb2005CC 3.170315207 (2050.740183) 0.7609260811 (132.1864048) 0.0005321464476
bot_Bomb2005Fast 3.064107257 (2032.289972) 0.8262308837 (143.5310114) 0.000611501966
bot_Bomb2005Blitz 3.053605699 (2030.465664) 0.6707376748 (116.5190732) 0.0009073338997
OLTI 3.03204056 (2026.719416) 0.832506 (144.6211108) 0.0005484800951
bot_Bomb2005P2 2.822907443 (1990.389271) 0.4867876375 (84.56367745) 0.0004840308518
thorin 2.767536776 (1980.7704) 0.832506 (144.6211108) 0.0005788629022
omarFast 2.726652212 (1973.668024) 0.832506 (144.6211108) 0.0006681596461
bot_speedy 2.682962807 (1966.078396) 0.832506 (144.6211108) 0.0007288659695
bleitner 2.610592744 (1953.506428) 0.832506 (144.6211108) 0.0005072821952
jdb 2.610499995 (1953.490316) 0.832506 (144.6211108) 0.0006070601943
bot_Clueless2005Fast 2.58310922 (1948.732051) 0.6668649056 (115.8463043) 0.0006654729265
megamau 2.541955565 (1941.582928) 0.832506 (144.6211108) 0.0006978433977
bot_lightning 2.473854617 (1929.752582) 0.832506 (144.6211108) 0.0006763171832
Swynndla 2.422299879 (1920.796606) 0.7948650666 (138.0822107) 0.0006315119355
frostlad 2.419039538 (1920.230227) 0.8071741155 (140.2205116) 0.0006132318648
BlackKnight 2.347565986 (1907.813998) 0.832506 (144.6211108) 0.0006623329595
bot_GnoBot2005Fast 2.303539877 (1900.16588) 0.7849751855 (136.3641623) 0.000675043096
bot_Clueless2005Blitz 2.2345132 (1888.174717) 0.7551311049 (131.1797143) 0.0007258793672
bot_Clueless2005P2 2.212829489 (1884.407871) 0.6895930611 (119.7945895) 0.0006311812553
bot_Clueless2005CC 2.168773741 (1876.754603) 0.8308199736 (144.328218) 0.0006241990032
bot_Arimaanator 2.090386741 (1863.137386) 0.8148788129 (141.5589546) 0.0004081767749
bot_Clueless2006P2 2.075735855 (1860.592266) 0.7943007282 (137.984175) 0.0007195330871
kamikazeking 2.011747429 (1849.476337) 0.7556039343 (131.2618531) 0.0005792767201
ytri 1.972060074 (1842.581938) 0.832506 (144.6211108) 0.0005580832465
filerank 1.970310098 (1842.277936) 0.832506 (144.6211108) 0.000569107061
haizhi 1.894086955 (1829.036619) 0.832506 (144.6211108) 0.0008102594244
Aamir 1.856510045 (1822.508841) 0.832506 (144.6211108) 0.0006060001632
bot_haizhi 1.702288469 (1795.717808) 0.832506 (144.6211108) 0.0006619388723
bot_Bomb2004CC 1.674421387 (1790.8768) 0.832506 (144.6211108) 0.000606956265
clauchau 1.633644449 (1783.79312) 0.832506 (144.6211108) 0.0005567039486
grey_0x2A 1.593474756 (1776.814929) 0.832506 (144.6211108) 0.000601434371
deselby 1.591926598 (1776.545986) 0.832506 (144.6211108) 0.0006369085893
CeeJay 1.578303073 (1774.179338) 0.832506 (144.6211108) 0.0007813484887
bot_Aamira2006Fast 1.574885954 (1773.585723) 0.7628650488 (132.523238) 0.0006310808374
bot_Clueless2006Fast 1.562548955 (1771.442567) 0.832506 (144.6211108) 0.0006612313097
bot_Loc2005Blitz 1.538297761 (1767.229703) 0.7531418141 (130.834139) 0.0006051832325
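The "old style" figures in parentheses appear to follow the usual Glicko-2 scale conversion (rating = 173.7178 × μ + 1500, deviation = 173.7178 × φ); the rows of the table are consistent with it. A minimal Python sketch of that conversion, where the function and constant names are illustrative and not taken from the post:

```python
# Scale constant and base rating from Glickman's Glicko-2 description.
GLICKO2_SCALE = 173.7178
BASE_RATING = 1500.0


def to_old_style_rating(mu):
    """Convert a Glicko-2 scale rating to the familiar Elo-like scale."""
    return GLICKO2_SCALE * mu + BASE_RATING


def to_old_style_deviation(phi):
    """Convert a Glicko-2 scale rating deviation to the Elo-like scale."""
    return GLICKO2_SCALE * phi


# First row of the table (Fritzlein): 5.6630497 and 0.8068123002.
print(to_old_style_rating(5.6630497))        # close to 2483.77
print(to_old_style_deviation(0.8068123002))  # close to 140.16
```

The same conversion turns the largest deviation in the table, 0.832506, into the 144.62 shown for most players.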
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
Re: Experimental new rating system
« Reply #1 on: May 26th, 2006, 3:45pm » |
Hey, this rocks. I have the highest respect for Mark Glickman, and it's cool to see what numbers are produced by an implementation of the Glicko system. (Actually, I confess I liked Glicko but never read up on Glicko-2; an omission I shall shortly remedy.)

I'm curious why blue22 is ranked so much lower and Belbo so much higher in your ratings than in the official ratings. Maybe it's because Glicko doesn't like to move the ratings around as much? Belbo established a very high rating with tons of games, and has since dropped off his peak in the official ratings, but maybe Glicko considered him extremely firmly established and didn't let his rating move down as much.

The rating deviation of 84 for Bomb2005P2 seems suspicious to me. That bot has played 812 games, but only 447 of those were rated. Why should it have a deviation so much lower than mine, when I've played 773 rated games?

Anyway, the issue I am most concerned about is not one that Glickman has addressed at all, to the best of my knowledge. What troubles me is the non-transitivity of the ratings. You can see the non-transitivity in action all the time in Arimaa. Sometimes a newcomer will get stuck on BombP1 on the ladder, and lose thirty times in a row, driving their rating down to, say, 1200. Meanwhile a newcomer who figures out a technique for beating BombP1 might win thirty in a row and pump their rating to 1800. But the gap between the two humans is not 600 points. They are each properly rated relative to BombP1, but improperly rated relative to each other. That is to say, the ratings are not transitive.

I would love there to be some mechanism whereby a ton of games against a single opponent would have a reduced impact on one's rating, in order to mitigate the effects of non-transitivity. In my mind it seems roughly correct to weight games against a single opponent by the square root of the number of games so that, for example, 25 games against one opponent would have the same impact as one game each against five different opponents. But I recognize that reducing the weight of certain games is a kludge, and I wish that I could think of a more elegant way to deal with non-transitivity. I'd love to hear alternative suggestions. Non-transitivity is such a huge problem, though, that I don't think it can be ignored.
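The square-root weighting idea can be made concrete with a small Python sketch. It only illustrates the arithmetic of the proposal (n games against one opponent count as sqrt(n) games in total, so each of them gets weight 1/sqrt(n)); how such weights would feed into an actual rating update is left open, and the function names are hypothetical.

```python
import math
from collections import Counter


def per_game_weight(games_vs_opponent):
    """Weight of each game when n games against one opponent count as sqrt(n) in total."""
    return 1.0 / math.sqrt(games_vs_opponent)


def total_weight(opponents):
    """Total weighted game count for a list of opponent names, one entry per game."""
    counts = Counter(opponents)
    return sum(math.sqrt(n) for n in counts.values())


# 25 games against a single bot carry the same total weight
# as one game each against five different opponents.
print(total_weight(["BombP1"] * 25))
print(total_weight(["a", "b", "c", "d", "e"]))
```

Under this scheme the thirty-game streaks against BombP1 in the example above would each count for roughly five and a half games.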
|
aaaa
Forum Guru
Arimaa player #958
Posts: 768
|
Re: Experimental new rating system
« Reply #2 on: May 26th, 2006, 4:50pm » |
on May 26th, 2006, 3:45pm, Fritzlein wrote:
Hey, this rocks. I have the highest respect for Mark Glickman, and it's cool to see what numbers are produced by an implementation of the Glicko system. (Actually, I confess I liked Glicko but never read up on Glicko-2; an omission I shall shortly remedy.)

Once again I would like to point out that Glicko-2 was not originally intended to be applied on a game-by-game basis. Glicko was modified to do so for the Free Internet Chess Server with Glickman's knowledge, and I was curious to find out whether the same was possible for Glicko-2.

on May 26th, 2006, 3:45pm, Fritzlein wrote:
I'm curious why blue22 is ranked so much lower and Belbo so much higher in your ratings than the official ratings. Maybe it's because Glicko doesn't like to move the ratings around as much? Belbo established a very high rating with tons of games, and has since dropped off his peak in the official ratings, but maybe Glicko considered him extremely firmly established and didn't let his rating move down as much.

If you look at the volatility column (the last one), you can see that Belbo has been given a higher volatility than blue22. For some reason, the system thinks Belbo's performance is less consistent than blue22's (0.0006193388697 vs 0.0005354651612).

on May 26th, 2006, 3:45pm, Fritzlein wrote:
The rating deviation of 84 for Bomb2005P2 seems suspicious to me. That bot has played 812 games, but only 447 of those were rated. Why should it have a deviation so much lower than mine, when I've played 773 rated games?

Probably because of the large number of bot-bot matches taken into account, maximizing the predictive power of the system has resulted in the rating deviation growing very fast if a player doesn't play for a while. Depending on one's volatility, it will take only about 20 days before one's rating deviation reaches the maximum again. I've already been experimenting with excluding bot-bot matches from consideration.
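The "about 20 days" figure can be checked against the standard Glicko-2 rule for an idle rating period, φ' = sqrt(φ² + σ²), applied once per one-second period. The Python sketch below assumes the volatility σ stays constant while a player is idle (as in standard Glicko-2) and treats 0.832506, the largest deviation in the first table, as the cap; the customized system in this thread may differ, and the function names are illustrative.

```python
import math

SECONDS_PER_DAY = 86400  # one rating period per second, as described in the thread


def inflated_deviation(phi, sigma, idle_periods):
    """Deviation after idle_periods without games, from repeating phi' = sqrt(phi^2 + sigma^2)."""
    return math.sqrt(phi * phi + idle_periods * sigma * sigma)


def idle_days_to_cap(phi, sigma, phi_cap):
    """Days of inactivity until the deviation grows from phi to phi_cap."""
    periods = (phi_cap ** 2 - phi ** 2) / sigma ** 2
    return max(periods, 0.0) / SECONDS_PER_DAY


# Fritzlein's volatility from the first table, starting from zero deviation.
print(idle_days_to_cap(0.0, 0.0006376507628, 0.832506))  # a little under 20 days
```

Starting from a nonzero deviation, or with a higher volatility, the cap is reached sooner, which fits the observation that most players in the table sit exactly at the maximum.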
|
Ryan_Cable
Forum Guru
Arimaa player #951
Posts: 138
|
Re: Experimental new rating system
« Reply #3 on: May 26th, 2006, 10:20pm » |
This does support my belief that our ratings are currently too compressed. Other than that, I'm not clear on what the advantage of this system is over our current system. Our current system is fairly easy to understand, and anyone can calculate the possible rating changes that would result from playing a given opponent. I would not want to give that up unless there is a substantial improvement in rating accuracy.
|
aaaa
Forum Guru
Arimaa player #958
Posts: 768
|
Re: Experimental new rating system
« Reply #4 on: May 27th, 2006, 9:57am » |
Here's the result again after the choice of parameters has been optimized for rated games including at least one human. Tell me if this one is more sane.

player  rating (old style)  rating deviation (old style)  volatility
Fritzlein 5.482469551 (2452.402549) 0.8047089398 (139.7922667) 0.000538855377
99of9 4.73692475 (2322.888146) 0.8288227 (143.981256) 0.0005558081298
robinson 4.435887799 (2270.59267) 0.7746629858 (134.5727496) 0.0005238959024
Adanac 4.401616739 (2264.639176) 0.7049974879 (122.4706126) 0.0004748660898
PMertens 4.212564795 (2231.797489) 0.8288227 (143.981256) 0.0006714964567
Ryan_Cable 4.147041625 (2220.414948) 0.8288227 (143.981256) 0.0004148334762
Belbo 4.058155186 (2204.973791) 0.7336598637 (127.4497775) 0.0005050846755
mouse 3.839142456 (2166.927381) 0.8288227 (143.981256) 0.0004747277447
RonWeasley 3.555657933 (2117.681074) 0.8288227 (143.981256) 0.0005129970293
Arimanator 3.55381375 (2117.360706) 0.8288227 (143.981256) 0.0006905204892
omar 3.399174901 (2090.497186) 0.8288227 (143.981256) 0.0005327536732
chessandgo 3.381264438 (2087.385819) 0.4764257499 (82.76363314) 0.0006149328405
naveed 3.282873511 (2070.293564) 0.8288227 (143.981256) 0.0006009688416
blue22 3.212482055 (2058.065315) 0.7934868976 (137.8427982) 0.0004544315336
bot_Bomb2005CC 2.970377488 (2016.007442) 0.6806065302 (118.2334691) 0.000455604352
OLTI 2.94866222 (2012.235114) 0.8288227 (143.981256) 0.0004670586044
bot_Bomb2005Blitz 2.838656284 (1993.125125) 0.6355597776 (110.4080463) 0.0007068470581
bot_Bomb2005Fast 2.82836776 (1991.337825) 0.8188877809 (142.2553838) 0.0005046538155
bot_Bomb2005P2 2.677105717 (1965.060915) 0.4418521777 (76.75758823) 0.0004078946749
omarFast 2.648254447 (1960.048936) 0.8288227 (143.981256) 0.0005759785643
thorin 2.530981686 (1939.67657) 0.8288227 (143.981256) 0.0004996524011
bleitner 2.474353922 (1929.83932) 0.8288227 (143.981256) 0.0004316956614
bot_speedy 2.434224123 (1922.868059) 0.8288227 (143.981256) 0.0005830126789
jdb 2.427799948 (1921.752066) 0.8288227 (143.981256) 0.0005177591978
megamau 2.420580392 (1920.4979) 0.8288227 (143.981256) 0.0006022504608
bot_Clueless2005Fast 2.388905996 (1914.995494) 0.6231364636 (108.2498956) 0.0005694958969
bot_lightning 2.342006325 (1906.848186) 0.8288227 (143.981256) 0.0005799969339
frostlad 2.256872674 (1892.058956) 0.7260031127 (126.1196635) 0.0005255536504
Swynndla 2.210853008 (1884.064521) 0.7130838884 (123.8753643) 0.0005439431955
BlackKnight 2.1852398 (1879.615051) 0.8288227 (143.981256) 0.0005633310056
bot_GnoBot2005Fast 2.095862689 (1864.088655) 0.712771455 (123.8210891) 0.0005831434752
bot_Clueless2005P2 2.072370715 (1860.007681) 0.6243452409 (108.4598817) 0.0005399026625
bot_Clueless2005Blitz 2.06877531 (1859.383096) 0.7454407812 (129.4963325) 0.0006259613599
bot_Clueless2005CC 2.025541811 (1851.872667) 0.7638517253 (132.6946412) 0.0005358564737
bot_Arimaanator 1.960662587 (1840.601991) 0.7346765571 (127.6263952) 0.0003402609408
kamikazeking 1.882753526 (1827.0678) 0.7424657372 (128.9795144) 0.0004977506049
bot_Clueless2006P2 1.863160001 (1823.664056) 0.7153645161 (124.2715499) 0.0006227940527
ytri 1.849213454 (1821.241293) 0.8288227 (143.981256) 0.0004792753914
filerank 1.821360335 (1816.40271) 0.8288227 (143.981256) 0.0004887645442
Aamir 1.796505038 (1812.084903) 0.8288227 (143.981256) 0.0005229366749
haizhi 1.7666057 (1806.890856) 0.8288227 (143.981256) 0.0007029900686
bot_haizhi 1.581152235 (1774.674288) 0.8288227 (143.981256) 0.000576510305
bot_Bomb2004CC 1.521184847 (1764.256885) 0.8288227 (143.981256) 0.0005199753632
grey_0x2A 1.497306546 (1760.108799) 0.8288227 (143.981256) 0.0005169386559
clauchau 1.483899672 (1757.779786) 0.8288227 (143.981256) 0.0004723108198
deselby 1.456556559 (1753.029801) 0.8288227 (143.981256) 0.0005515469025
CeeJay 1.450091942 (1751.906782) 0.8288227 (143.981256) 0.0006684186778
6sense 1.445184387 (1751.054252) 0.8288227 (143.981256) 0.0005290605304
bot_Clueless2006Fast 1.440374412 (1750.218674) 0.7802854833 (135.5494775) 0.0005738607317
Paul 1.431647882 (1748.70272) 0.8288227 (143.981256) 0.0005009657358
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
Re: Experimental new rating system
« Reply #5 on: May 29th, 2006, 10:49am » |
I decided to produce ratings based purely on rated human vs. human games (1553 of them so far), due to discussion in another thread. However, when I started to implement an idea we had discussed earlier to deal with non-transitivity, I got distracted by a different, older idea I had of making the ratings retrospective.

All rating systems I know of use an "update and forget" method. After each game (or each rating period) players get new ratings, but how they reached their ratings is thrown away. They carry only their rating forward, and no history.

Forgetting history might have some disadvantages, in my estimation. Suppose, for example, that chessandgo joins the server and for his first ten human games loses to me ten times. The rating system gives me hardly any credit for my wins. If chessandgo then beats a lot of other people to push his rating up, we know after the fact that he was pretty good all along, but that doesn't help me any. I only get points for beating a sub-1500 player even if he was increasing towards 1900 strength by the end of our first ten games.

So I created a new historical system (or one might say retrospective system) to counteract this trend. It remembers all the old game results, and if someone does better (or worse) in the future, it retrospectively adjusts their ratings up (or down) in the past, as well as retrospectively adjusting the awards and penalties to their opponents.

There's actually just one formula in the FRIAR system (Fritz's Retrospectively Iterated Arimaa Ratings): Your rating as of any game is the average of your rating from the game before and your rating from the game after, plus the award/penalty for the game itself. The game award/penalty is calculated from the same formula as standard Elo ratings with a k-factor of 15, i.e.

15 * (score - 1/(1+10^((Ropp - Rmine)/400)))

If it is a player's first game, his "rating from the game before" is 1500. If it is a player's last game, then he just gets the game award tacked on to the rating from the previous game.

To calculate the ratings to match this formula, I just iterated a bunch of times. The first interesting point is that the ratings are much more volatile than standard Elo ratings with a k-factor of 32. The second interesting point is that the ratings converge glacially slowly. I did 200 iterations overnight, but I suspect that the extreme ratings would push out an additional hundred points if only I could do 2000 iterations. Unfortunately, my code is dog-slow because all parameters are stored (and looked up) in MS Access tables. If someone did this properly with a C array and some pointers, it would probably take a second per iteration instead of the minute per iteration it took me.

So here are the not-really-converged ratings according to FRIAR, based only on 1553 hvh rated games, and compared to the current server ratings:

Name  FRIAR  Server
Fritzlein 2320 2309
Adanac 2245 2177
robinson 2230 2148
99of9 2212 2169
Belbo 2172 2002
PMertens 2115 2086
Ryan_Cable 2085 2130
chessandgo 2052 2015
omar 2050 1947
blue22 1989 2005
Swynndla 1989 1790
RonWeasley 1979 1941
BlackKnight 1918 1833
naveed 1876 1956
jdb 1875 1796
OLTI 1850 1958
Spunk 1750 1472
mouse 1728 2051
KT2006 1715 1657
frostlad 1715 1807
seanick 1702 1537
grey_0x2A 1692 1709
Arimanator 1689 2035
kamikazeking 1668 1751
thorin 1654 1895
megamau 1649 1788

Belbo has a significantly higher rating under FRIAR. This makes complete sense because he had a stellar result in last year's postal tourney, and has hardly played humans since then, except for the four games he has already won in this year's postal. His reduced server rating is due to losing a few to BombFast while training for the WC, and FRIAR ignores such games.

Swynndla also gets a huge boost in FRIAR from beating tons of different human players, even though many were newcomers. He may therefore be somewhat overrated in FRIAR, but I don't mind seeing that the same strategy that works in Player of the Month also boosts the FRIAR rating.

I'm pleased that FRIAR rates jdb and naveed about the same, despite their divergent server ratings.

I had never heard of Spunk before, but he had a good record in the very early days of the server against omar, who later turned out to be very good. Then when the early bots came on-line, Spunk lost all his points to those bots, then left. The FRIAR rating for Spunk actually nearly matches his server rating from before the time he started to play bots.

I'm sure seanick will be happy to note that FRIAR respects his record against human opponents and ignores his string of losses to tough bots.

FRIAR gives a huge rating penalty to mouse relative to mouse's server rating. This reflects the fact that mouse has only played 12 rated games against humans ever. He has a 6-6 record against fairly tough opposition, but it simply isn't enough games to pull away from 1500 very far. Arimanator, in contrast, has played enough games against humans to establish a rating, but his 22-46 record doesn't put him very high in the FRIAR rankings. His high server rating is attributable largely to bot-bashing.

I was surprised clauchau didn't make the list of top players, but after peaking at 1899, he dropped back to 1626. That goes to show what happens if you don't keep up with advances in Arimaa theory.

Haizhi, filerank, ytri, and some other players with a decent server ranking are invisible to FRIAR because they have played no games or hardly any games against humans. Thorin will show up in the rankings much more clearly once the current postal tournament is over, I guarantee.

On the whole, I don't think FRIAR ratings are any more accurate than the server ratings in terms of predicting future game outcomes. Nevertheless, I think FRIAR admirably meets the goals of a pure-human rating to go alongside the standard server rating.
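For readers who want to experiment, here is a minimal Python sketch of the FRIAR rule as it is described in this thread, not the original MS Access implementation. Several details are assumptions: games are given as chronological `(player_a, player_b, score_a)` tuples, every pass uses only the ratings from the previous pass, and `Rmine` in the award formula is taken to be the before/after average (as in the worked example Fritzlein gives later in the thread). The names `friar`, `expected`, `K` and `START` are illustrative.

```python
from collections import defaultdict

K = 15          # k-factor quoted in the post (later lowered to 10)
START = 1500.0  # "rating from the game before" for a player's first game


def expected(r_mine, r_opp):
    """Standard Elo expected score."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_mine) / 400.0))


def friar(games, iterations=200, k=K):
    """Return each player's rating as of their last game.

    games: chronological list of (player_a, player_b, score_a),
    where score_a is 1.0 if player_a won and 0.0 if player_a lost.
    """
    # For each player, the indices of the games they played, in order.
    history = defaultdict(list)
    for g, (a, b, _) in enumerate(games):
        history[a].append(g)
        history[b].append(g)

    # One rating per (game, player) pair, all starting at 1500.
    rating = {(g, p): START for g, (a, b, _) in enumerate(games) for p in (a, b)}

    for _ in range(iterations):
        new = {}
        for p, idxs in history.items():
            for pos, g in enumerate(idxs):
                a, b, score_a = games[g]
                opp = b if p == a else a
                score = score_a if p == a else 1.0 - score_a
                before = rating[(idxs[pos - 1], p)] if pos > 0 else START
                if pos + 1 < len(idxs):
                    base = (before + rating[(idxs[pos + 1], p)]) / 2.0
                else:
                    base = before  # last game: award is tacked on to the previous rating
                new[(g, p)] = base + k * (score - expected(base, rating[(g, opp)]))
        rating = new  # every pass reads only the previous pass's ratings

    return {p: rating[(idxs[-1], p)] for p, idxs in history.items()}


# Toy data: x beats y twice, then y beats z.
print(friar([("x", "y", 1.0), ("x", "y", 1.0), ("y", "z", 1.0)]))
```

Keeping everything in in-memory dictionaries makes each pass cheap, so the slow convergence mentioned above becomes a question of how many passes are run rather than of per-pass cost.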
|
« Last Edit: May 29th, 2006, 2:46pm by Fritzlein » |
Ryan_Cable
Forum Guru
Arimaa player #951
Posts: 138
|
Re: Experimental new rating system
« Reply #6 on: May 29th, 2006, 2:09pm » |
I don’t understand how the retrospective iteration works. Are you assuming that everyone has constant skill over time? That seems like a particularly bad idea. I am pleasantly surprised to see how high my HvH rating is. I thought I was significantly more overrated than that.
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
Re: Experimental new rating system
« Reply #7 on: May 29th, 2006, 2:37pm » |
on May 29th, 2006, 2:09pm, Ryan_Cable wrote:
I don't understand how the retrospective iteration works. Are you assuming that everyone has constant skill over time?

No, no, I'm not holding skill constant over time. Your rating at a given time is influenced most by the games very near it, and less and less by games that are far before it or far after it. So your rating at the time of your second game is hardly influenced at all by whether your hundredth game was a win or a loss.

To each player of each game, I assign a rating that is supposed to represent his skill at the time of that game. The assumption is that his skill at that time will be approximately the average of his skill the game before and the game after. Take my last three games, for example:

32240 Ryan_Cable vs. Fritzlein
32276 Fritzlein vs. chessandgo
32282 Fritzlein vs. Swynndla

As part of my iterative pass through the ratings, I want to re-calculate how strong I was when I played game 32276. I look ahead and see I was rated 2310 in game 32282, but only rated 2302 in game 32240. My rating should be near the average of 2306. I beat chessandgo, who was rated 2052, so I recalculate my rating in game 32276 as

2306 + 15*(1 - 1/(1+10^((2052-2306)/400))) = 2308.8221

When the ratings stabilize after many, many iterations, each player's rating in each game will be exactly equal to the average of his ratings before and after, plus the bonus (penalty) for winning (losing) the game in question.

The list I gave was only the ratings of each player at the end of the line; I apparently peaked about 150 points higher than my final rating. Long winning streaks or losing streaks will cause your rating to whip around even more in the FRIAR system than in the current server system.

There is probably a much cleverer way to reach convergence than by making pass after pass of setting each rating in each game to what it would have been given the other ratings of the previous iteration. My coding ability was only adequate for a simplistic solution that doesn't run fast enough to converge in a reasonable amount of time. In C on a fast computer, however, the simplistic iteration might be adequate.
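The recalculation of game 32276 can be reproduced with a few lines of Python. This is only a check of the arithmetic in the post; the helper names are illustrative, and `Rmine` is taken to be the before/after average of 2306, as in the worked example.

```python
def elo_expected(r_mine, r_opp):
    """Standard Elo expected score."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_mine) / 400.0))


def friar_update(r_before, r_after, r_opp, score, k=15):
    """One FRIAR recalculation: neighbour average plus the Elo award for this game."""
    base = (r_before + r_after) / 2.0
    return base + k * (score - elo_expected(base, r_opp))


# Rated 2302 in game 32240 and 2310 in game 32282; a win over an opponent rated 2052.
print(friar_update(2302, 2310, 2052, 1.0))  # about 2308.82
```

With a loss instead of a win (`score=0.0`) the same call gives roughly 2293.8, which shows how much a single result against a lower-rated opponent can cost.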
|
« Last Edit: May 29th, 2006, 2:51pm by Fritzlein » |
chessandgo
Forum Guru
Arimaa player #1889
Posts: 1244
|
Re: Experimental new rating system
« Reply #8 on: May 29th, 2006, 5:50pm » |
on May 29th, 2006, 10:49am, Fritzlein wrote:
Suppose, for example, that chessandgo joins the server and ...

I'm fortunate not to have you as a math teacher: let chessandgo and BlackKnight be real numbers, then chessandgo^2 + BlackKnight = ... it would be really hard to write down equations.
|
seanick
Forum Guru
SeaNICK
Posts: 97
|
Re: Experimental new rating system
« Reply #9 on: May 31st, 2006, 1:21am » |
Yeah, I am all for this new rating system, heh heh... What about something that kept track of time taken? Would the best players' games take longer per move, relative to the time scale, than those of less highly rated players? Does the line go up or down in terms of % of available time per move when playing someone of equal rating? Are those numbers easily mineable, or are they somewhat obscured within various sources? I am not a Linux user, but I have begun to study some things analytically with code on win32, so such things would interest me except for the problem of having to use Linux. I wouldn't mind, but ... my employer would have a few reservations about the idea.
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
Re: Experimental new rating system
« Reply #10 on: May 31st, 2006, 9:18am » |
One problem with the server ratings (which FRIAR doesn't address in the slightest) is that different humans seem to benefit differently from extra thinking time. Some players, notably Belbo and Omar, are tigers at a slow time control or postally, but tend to fall apart in fast games. Other players, most notably kamikazeking and PMertens, can play great moves even at blitz speeds, but don't seem to get very much better given more time. (Actually, PMertens doesn't even use all of his time given more time.)

In my opinion it isn't a good idea to say the players who can move faster are the better players. There are different kinds of skill. I'd rather say that some players are good at blitz and other players are good at postal games.

In another thread we discussed having ratings reflect time control.

http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1103741634;start=0#0

Note that back then the fastest time control available was 30 seconds per move, and it was already an issue!
|
aaaa
Forum Guru
Arimaa player #958
Posts: 768
|
Re: Experimental new rating system
« Reply #11 on: May 31st, 2006, 12:33pm » |
You might be interested in this article, where it is proposed that games at different time controls are to be given different weights.
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
Re: Experimental new rating system
« Reply #12 on: Jun 6th, 2006, 12:11am » |
I decided that a k-factor of 15 was making the FRIAR ratings way too volatile; I lowered it to 10. I ran the numbers again, this time letting them converge a bit longer. Also I added in last week's games, 23 more. (Sorry, chessandgo, your four big wins from Sunday and Monday aren't there yet; you would surely be over 2100 with them included.)

The FRIAR top 25, with number of games played:

rate  games  username
2417 215 Fritzlein
2236 201 99of9
2228 100 Adanac
2194 265 PMertens
2182 201 robinson
2149 121 Belbo
2086 111 Ryan_Cable
2034 116 omar
2031 126 jdb
2015 67 chessandgo
2013 103 Swynndla
1963 73 blue22
1961 19 RonWeasley
1912 223 naveed
1897 79 OLTI
1894 18 BlackKnight
1765 66 kamikazeking
1742 22 frostlad
1714 13 Spunk
1701 68 Arimanator
1691 12 mouse
1680 23 grey_0x2A
1645 16 KT2006
1644 49 megamau
1639 43 clauchau
|
« Last Edit: Jun 6th, 2006, 1:35pm by Fritzlein » |
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
Re: Experimental new rating system
« Reply #13 on: Jun 6th, 2006, 1:14am » |
And now the fun part: a graph of the historical FRIAR ratings. (Only the top 7 by number of hvh games get personalized colors; sorry!) Note how volatile the ratings are even with the k-factor reduced to 10. On the official server ratings I retained the top ranking even when I tied for fourth in the 2006 World Championship, but the FRIAR ratings have me dipping below robinson, Adanac, and PMertens, i.e. all three of the WC medalists. At the same time that FRIAR ratings are volatile, note that people have to play a significant number of games to move far from 1500. In this sense the volatility of FRIAR is opposite to that of the server. On the server your rating changes a lot at first, and slowly later. With FRIAR your rating changes slowly until you have played fifteen games or so, but later on winning streaks (or losing streaks) have a bigger effect than they do on the server. I note that in August 2004, around the time I joined the server, FRIAR considered 99of9 to be the most dominant player of any time period. My current ratings lead of 180 points looks wimpy compared to the 350-point lead 99of9 had back then.
|
« Last Edit: Jun 6th, 2006, 1:15am by Fritzlein » |
chessandgo
Forum Guru
Arimaa player #1889
Posts: 1244
|
Re: Experimental new rating system
« Reply #14 on: Jun 6th, 2006, 9:12am » |
Great !!! I had the feeling that this forum had not been used for ages ... thanks for putting once more some life in it Fritz ! I see nothing but a big yellow line in there
|