Author |
Topic: Arimaa rating deflation (Read 30022 times) |
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: Arimaa rating deflation
« Reply #150 on: Aug 3rd, 2009, 7:40pm » |
|
Well, it turns out that our efforts to combat rating inflation have worked very well, probably too well, to the point that we will now overshoot into severe rating deflation unless we take corrective action.

To detect whether rating inflation/deflation is occurring, I calculated the average rating of each bot that Omar runs (no developer bots) for each calendar year, considering a bot only if it played at least thirty rated games for the year. A partial table of the results is below:

Bot \ Year           2003  2004  2005  2006  2007  2008  2009
-------------------  ----  ----  ----  ----  ----  ----  ----
bot_Bomb2005Blitz       .     .  1876  1856  1931  2038  2012
bot_Bomb2005CC          .     .  1774  1858  1916  1903  1876
bot_Bomb2005Fast        .     .  1827  1826  1930  1877  1901
bot_GnoBot2005Blitz     .     .  1652  1747  1841  1857  1728
bot_GnoBot2005Fast      .     .  1541  1724  1734  1734  1664
bot_Arimaazilla      1516  1419  1449  1451  1502  1505  1419
bot_Bomb2005P1          .     .  1488  1632  1715  1649  1517
bot_Bomb2005P2          .     .  1752  1806  1887  1864  1824
bot_GnoBot2005P1        .     .  1382  1262  1392  1311  1244
bot_GnoBot2005P2        .     .  1552  1608  1651  1636  1545

Taking all the bots Omar runs into consideration, not just the above bots, and dividing it between fixed-performance bots and variable performance bots, I get the following average year-over-year rating changes:

fixed-performance
Year     Change
-------  ------
2005-6      + 5
2006-7      +59
2007-8      -27
2008-9      -46

variable-performance
Year     Change
-------  ------
2005-6      +81
2006-7      +50
2007-8      +32
2008-9      -21

Now, it is no problem if variable-performance bots have gained an average of 142 rating points in the past four years. That sounds perfectly consistent with increased strength based purely on better hardware. In fact, an increase of about 36 points per year is consistent with other estimates of the value of faster hardware.

Also, it is no problem that fixed performance bots are now rated nine points lower, on average, than they were in 2005. We want the ratings of fixed-performance bots to remain basically constant.
We inflated throughout 2006, 2007, and into 2008, but that was wiped out by deflation in the latter part of 2008 and the first half of 2009. We are back to normal, in a manner of speaking.

The difficulty is that we are still rapidly deflating. The changes we made (anchoring ArimaaScoreP1's rating to 1000 and dropping newcomers to 1300) have not yet run their course. We are not at equilibrium, and unless we make changes now, I predict we will far overshoot on the deflationary side.

Since ratings are near a historically reasonable level now, I recommend that we immediately increase the ratings of newcomers to 1400. Probably even that will leave us with some deflation, but maybe not, and it seems reasonable to try. We can check in again at the end of the year.

The alternative, I believe, is to wait until we are sure that the system has overly deflated, and then have to take corrective action to pump rating points back into it. That's silly. Rather than having swings up and down, I'd prefer to have some kind of stabilization, so that a 2000 rating in any year means about the same thing as a 2000 rating in any other year.

Just my $0.02
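The cumulative figures quoted in this post (142 points gained, about 36 per year, and nine points lost) can be checked with a few lines of Python. This is only an illustrative sketch: the variable and function names are made up for the example, and the numbers are copied from the two year-over-year tables in this post.

```python
# Year-over-year average rating changes, copied from the tables in this post.
fixed_changes = {"2005-6": 5, "2006-7": 59, "2007-8": -27, "2008-9": -46}
variable_changes = {"2005-6": 81, "2006-7": 50, "2007-8": 32, "2008-9": -21}


def cumulative_drift(changes):
    """Total rating change over all listed year-over-year intervals."""
    return sum(changes.values())


fixed_total = cumulative_drift(fixed_changes)
variable_total = cumulative_drift(variable_changes)
# Average gain per interval for the variable-performance bots.
variable_per_year = variable_total / len(variable_changes)

print(fixed_total, variable_total, variable_per_year)
```

The per-year figure comes out at 35.5, which the post rounds to "about 36".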
|
« Last Edit: Aug 4th, 2009, 6:53am by Fritzlein » |
|
|
|
Arimabuff
Forum Guru
Arimaa player #2764
Posts: 589
|
|
Re: Arimaa rating deflation
« Reply #151 on: Aug 4th, 2009, 4:41am » |
|
on Aug 3rd, 2009, 7:40pm, Fritzlein wrote: Just my $0.02 |
| Not counting inflation.
|
|
|
|
|
mistre
Forum Guru
Posts: 553
|
|
Re: Arimaa rating deflation
« Reply #152 on: Aug 4th, 2009, 9:05am » |
|
on Aug 3rd, 2009, 7:40pm, Fritzlein wrote: Since ratings are near a historically reasonable level now, I recommend that we immediately increase the ratings of newcomers to 1400. Probably even that will leave us with some deflation, but maybe not, and it seems reasonable to try. We can check in again at the end of the year. |
| I agree with Karl. I have noticed the deflation and if allowed to continue it will only increase. Starting newcomers at 1400 seems sensible.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: Arimaa rating deflation
« Reply #153 on: Aug 4th, 2009, 9:39am » |
|
on Aug 4th, 2009, 4:41am, Arimabuff wrote: Not counting inflation. |
| Hehe, since yesterday it has become my $0.01999
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Posts: 1003
|
|
Re: Arimaa rating deflation
« Reply #154 on: Aug 6th, 2009, 1:47pm » |
|
on Aug 3rd, 2009, 7:40pm, Fritzlein wrote: Well, it turns out that our efforts to combat rating inflation have worked very well, probably too well, to the point that we will now overshoot into severe rating deflation unless we take corrective action. |
| No wonder my ratings have been going down. OK, I'll change the initial ratings of new players to 1400. Is there any way to know if that new value will be right, or will we have to change it again?
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: Arimaa rating deflation
« Reply #155 on: Aug 6th, 2009, 4:09pm » |
|
on Aug 6th, 2009, 1:47pm, omar wrote: OK I'll change the initial ratings of new players to 1400. Is there any way to know if that new value will be right or will we have to change it again? |
| I think we might have to change it again. The problem is that when we decided to combat rating inflation, we instituted two deflationary measures at the same time. Lowering newcomer ratings from 1500 to 1300 was deflationary for obvious reasons, but the second change of fixing ArimaaScoreP1's rating to 1000 was also deflationary for less obvious reasons.

Lots of new players lose their first game because they are unclear on the concept. It used to be that whenever a new player came in and lost a game, he gave points to ArimaaScoreP1 that stayed in the system, but now those lost points disappear into thin air because ArimaaScoreP1's rating is fixed. Yes, some people also gain points from thin air by beating ArimaaScoreP1, but since they gain 18 for winning and lose 102 for losing, the net effect is negative.

Another way of looking at it is that we have put ArimaaScoreP1 outside of the system by fixing its rating. People don't actually "enter the system" until after they have beaten ArimaaScoreP1. Because of all the losses to ArimaaScoreP1, the average rating of people entering the system is actually even lower than 1300.

Our two changes at once were obviously an over-correction, but I'm not sure what the ideal middle ground is. Even with the change of increasing the starting rating up to 1400, we will still have inflationary and deflationary forces competing to make a balance. Inflation will still be caused by newcomers losing a few games and leaving with a rating lower than the rating with which they entered the system. Deflation will still be caused by people entering the system and working up the ladder until they have a higher rating than they had when they entered the system.

Which will weigh heavier, lots of small sources of points, or a few large drains of rating points? I don't know, and it depends on user behavior. Even if we modeled all the past data to determine the starting rating which gave a perfect balance, user behavior might change. For example, when the boxed set comes out, we might get a higher ratio of small points-contributors who soon leave, or maybe we will get a higher ratio of dedicated players who hang around and deflate the system. Even if we are perfectly calibrated now, the balance could shift in the future.

I suggest we re-evaluate in another year to see whether the ratings of fixed-performance bots have leveled off, or are still declining, or have bounced back up. If the ratings have basically leveled, then we can stand pat. My hunch, however, is that the ratings will still be declining. If that is true, we may want to pop the newcomer ratings up to 1450, or even all the way to 1500. In other words, it may have been that fixing ArimaaScoreP1's rating to 1000 was all the anti-inflationary medicine we needed, and lowering newcomer ratings in addition was pure overkill.

If you aren't satisfied with approximate stabilization adjustments every year or two, you could take my measure of inflation, make the automatic measurement once a day, and accordingly adjust the ratings of newcomers for the following day. I would recommend against it, though, because the daily measure of inflation could fluctuate wildly, causing our countermeasures to similarly fluctuate wildly and constantly over-correct even if the balance is approximately correct.
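The 18-versus-102 asymmetry can be reproduced with the standard Elo expected-score formula. Treat this as a sketch under assumptions rather than the gameroom's actual code: the K-factor of 120 is inferred only from the fact that 18 + 102 = 120, and the function names are hypothetical.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def points_for_result(rating, opponent_rating, won, k=120):
    """Rating change for one game; k=120 is an assumed K-factor."""
    score = 1.0 if won else 0.0
    return k * (score - expected_score(rating, opponent_rating))


NEWCOMER = 1300  # starting rating before the change to 1400
ANCHOR = 1000    # ArimaaScoreP1's fixed rating

gain = points_for_result(NEWCOMER, ANCHOR, won=True)
loss = points_for_result(NEWCOMER, ANCHOR, won=False)
# Win rate at which gains and losses against the anchor cancel out.
break_even = expected_score(NEWCOMER, ANCHOR)

print(round(gain), round(loss), round(break_even, 2))
```

Under these assumptions a 1300-rated newcomer would have to beat ArimaaScoreP1 about 85% of the time just to break even. Because the anchor's rating never moves, any shortfall below that win rate is a net loss of points from the pool.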
|
« Last Edit: Aug 6th, 2009, 4:12pm by Fritzlein » |
|
|
|
omar
Forum Guru
Arimaa player #2
Posts: 1003
|
|
Re: Arimaa rating deflation
« Reply #156 on: Aug 7th, 2009, 7:27am » |
|
Thanks for that explanation Karl. I changed the new player initial ratings to 1400.
|
|
|
|
|
mistre
Forum Guru
Posts: 553
|
|
Re: Arimaa rating deflation
« Reply #157 on: Aug 8th, 2009, 7:00am » |
|
I don't want to reopen a closed discussion, but how was it decided that ArimaaScoreP1 should be fixed to 1000? Why not 1100? Was there a methodology to this number or was it arbitrarily decided like the initial start ratings of new members?
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: Arimaa rating deflation
« Reply #158 on: Aug 8th, 2009, 8:59am » |
|
The rating of 1000 for ArimaaScoreP1 was somewhat, but not entirely, arbitrary. The logic for choosing the fixed point was that if we are going to try to keep ratings stable, why keep them stable around an arbitrary point? Why not instead keep them stable relative to a point with absolute meaning?

The most natural, meaningful fixed point seemed to be for a random mover to have a rating of zero. It makes sense that an entity with complete knowledge of the rules, but trying neither to win nor to lose, would have a rating that is neither positive nor negative. A negative rating would then correspond to actively trying to lose, and a positive rating would correspond to actively trying to win.

The difficulty in anchoring the rating system on a random mover is that its skill level is so far from our own that we have difficulty estimating how far away it is. The rating model is roughly accurate for closely matched opponents, but the greater the gap in skill, the worse the approximation becomes. Some intuitive guesses were around a 2000 to 3000 point gap between a random mover and ShallowBlue. It turns out, however, that random play is not so horrible. Choosing a move at random is likely to advance a rabbit, which is a useful thing to do.

In order to estimate how bad a random mover is, clauchau created a ladder of bots with well-defined bits of knowledge, e.g. try to capture a piece, try to advance rabbits, etc. Clauchau and 99of9 let this bot ladder play against itself, and the results are on page 6 of this thread. The top of the ladder was the bot clauchau called M+S-S, which earned a rating of 1074 relative to the random mover having a rating of zero, according to 99of9's calculations. 'M' stands for generating all possible moves and selecting the best. '+S' stands for maximizing your own score according to the formerly-official Arimaa score function. '-S' stands for minimizing your opponent's score.

At first I interpreted this to mean that 'M+S-S' was another name for ArimaaScoreP1, and therefore that relative to the fixed point of random mover having a zero rating, ArimaaScoreP1 should have a rating of 1074. It turned out, however, that M+S-S is actually stronger than ArimaaScoreP1, because ArimaaScoreP1 is equally concerned with maximizing its own score and minimizing the opponent's score, whereas M+S-S first maximizes its own score, and only minimizes the opponent's score as an afterthought to break ties between the set of moves which maximize the mover's score. M+S-S beat ArimaaScoreP1 about 65% of the time according to clauchau at the bottom of page 9 of this thread. According to the rating formula, that would mean that ArimaaScoreP1 is 108 points worse than M+S-S, i.e. approximately a rating of 966 relative to a random mover having a rating of zero.

So, according to the best information we have, and insofar as the decision is not completely arbitrary, ArimaaScoreP1 should be fixed at 966 rather than at 1000. However, this round-off error is overwhelmed by so many other considerations that it is totally insignificant.

First (and least important), the playouts that set the scale were random. If we ran the experiment again, we would get a different value for the rating of M+S-S.

Second, and critically important, we could get any answer we wanted by choosing a different ladder of bots between random mover and M+S-S. Arimaa ratings are not transitive. They are only meaningful against the exact pool of players you have competed against. No matter how accurately you measure the relative playing strengths in a given pool of players, say with millions of plays, those relative ratings would change every time a player is added to or subtracted from that pool. I am convinced that if we wanted to skew the results, we could devise a different ladder to prove that M+S-S should have a rating over 2000.

Third, and most important of all, even if we managed to anchor ArimaaScoreP1's rating at the "perfect" distance above the rating of a random mover, that would not ensure that anyone else's rating would drift toward a perfect distance above the random mover. Again, ratings are not transitive, so fixing a bot rating is either inflationary or deflationary according to human behavior. If we all banded together to incessantly defeat ArimaaScoreP1 by rote, we could inflate our own ratings without bound. The reality is actually the opposite; since most people who can beat ArimaaScoreP1 stop playing it, the fixed rating has a deflationary effect. But the mere fact that human behavior determines whether ArimaaScoreP1 pumps points into the system or draws points out proves that it isn't calibrating the rest of the ratings relative to random mover.

My personal opinion is that anchoring the rating system relative to random mover is so futile, it should play no part in our rating system decisions. A vastly more useful rule of thumb would be to try to make it comparable to the chess scale, where an average club player is rated 1500 and an average tournament player is rated 2000. It actually benefits Arimaa to have ratings similar to chess ratings so that the scale is familiar to outsiders. The "anchored at zero" concept is a mathematical invention of ours that doesn't correspond to any outsider intuition.

An alternative rule of thumb would be that whatever system we happen to have chosen, let's keep things approximately constant. It would be annoying to have a discontinuity in the history of ratings at any point, making past ratings not comparable to future ratings.

Luckily for all, it seems that all three objectives are essentially commensurate. The scale we happened to have chosen is roughly comparable to the chess scale, so by keeping things stable as they are, we are also keeping things in line with outsider intuition. In an even greater stroke of luck, this scale happens to be approximately in line with a rating of zero for a random mover. No, the correspondence isn't exact, but our ability to measure is so clouded by non-transitivity that we are within the limits of any meaningful comparison anyway.

So it turns out that, more or less, we live in the best of all possible worlds.
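For anyone who wants to check the 108-point gap and the resulting 966, the arithmetic can be sketched as follows. The sketch assumes the rating formula is the usual Elo logistic curve with a 400-point scale, which does reproduce the figures in this post; the function name is made up for the example.

```python
import math


def rating_gap(win_fraction):
    """Elo rating difference implied by a head-to-head win fraction."""
    return 400.0 * math.log10(win_fraction / (1.0 - win_fraction))


M_S_S_RATING = 1074  # M+S-S relative to a random mover rated zero

# M+S-S beat ArimaaScoreP1 about 65% of the time.
gap = rating_gap(0.65)
arimaa_score_p1 = M_S_S_RATING - gap

print(round(gap), round(arimaa_score_p1))
```

The unrounded gap is about 107.5 points, so the 966 figure is itself only as precise as the "about 65%" win rate it is derived from.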
|
« Last Edit: Aug 8th, 2009, 9:04am by Fritzlein » |
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: Arimaa rating deflation
« Reply #159 on: Feb 9th, 2010, 1:19am » |
|
on Aug 3rd, 2009, 7:40pm, Fritzlein wrote:

fixed-performance
Year     Change
-------  ------
2005-6      + 5
2006-7      +59
2007-8      -27
2008-9      -46

variable-performance
Year     Change
-------  ------
2005-6      +81
2006-7      +50
2007-8      +32
2008-9      -21
|
| on Aug 6th, 2009, 4:09pm, Fritzlein wrote: My hunch, however, is that the ratings will still be declining. If that is true, we may want to pop the newcomer ratings up to 1450, or even all the way to 1500. In other words, it may have been that fixing ArimaaScoreP1's rating to 1000 was all the anti-inflationary medicine we needed, and lowering newcomer ratings in addition was pure overkill. |
| The above-quoted statistics were based on a partial year 2009. When I redo it for all of 2009, I get

fixed-performance
Year     Change
-------  ------
2005-6      + 5
2006-7      +59
2007-8      -27
2008-9      -61

variable-performance
Year     Change
-------  ------
2005-6      +81
2006-7      +50
2007-8      +32
2008-9      -48

In other words, the deflation had not yet fully run its course when we made the mid-2009 correction. Even after we bumped starting players from 1300 up to 1400, there was a bit more deflation working its way through the system.

However, I don't recommend any more corrective action at present. My gut feeling is that the deflation has now fully worked its way into the gameroom ratings and we have more or less stabilized. If we do nothing but take the same measurements at the end of 2010, I predict the averages will have drifted only slightly down. If we make any change, though, it should probably be to the upside, for example by starting new players at 1450.

Totaling the differences from 2005 to 2009 shows that fixed-performance bots have drifted down 24 points total, so perhaps we have over-corrected for several years of steady inflation. However, each year-on-year change had a different set of bots for comparison. Taking only the five fixed-performance bots which have been continuously present with floating ratings, namely GnoBot2005P1, GnoBot2005P2, Bomb2005P1, Bomb2005P2, and Arimaazilla, their average rating was actually 2 points higher in 2009 than in 2005. Therefore, I think we're back to approximately a normal level, and if we stabilize near here life is fine.

We'll see again at the end of 2010 whether my intuitions have worked out.
|
« Last Edit: Feb 9th, 2010, 11:18am by Fritzlein » |
|
|
|
omar
Forum Guru
Arimaa player #2
Posts: 1003
|
|
Re: Arimaa rating deflation
« Reply #160 on: Feb 10th, 2010, 2:39pm » |
|
Thanks for posting this, Karl. It's good to know that the gameroom ratings are not deflating as much now. Although, now that we have WHR ratings to use for seeding tournaments, I am less concerned about the integrity of the gameroom ratings.
|
|
|
|
|
zhanrnl
Forum Full Member
Arimaa player #4971
Posts: 12
|
|
Re: Arimaa rating deflation
« Reply #161 on: Feb 10th, 2010, 8:20pm » |
|
Wow, just read through the entire topic: very interesting! It did strike me as odd that ArimaaScoreP1 was pinned at 1000, but now I see there was a very good reason behind it.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: Arimaa rating deflation
« Reply #162 on: Jun 15th, 2010, 1:50pm » |
|
on Feb 9th, 2010, 1:19am, Fritzlein wrote: However, I don't recommend any more corrective action at present. My gut feeling is that the deflation has now fully worked its way into the gameroom ratings and we have more or less stabilized. |
| Bot ratings through the first five months of 2010 suggest that the system has indeed stabilized, and no further deflation is occurring.

fixed-performance
Year     Change
-------  ------
2005-6      + 5
2006-7      +59
2007-8      -27
2008-9      -61
2009-10     +24

variable-performance
Year     Change
-------  ------
2005-6      +81
2006-7      +50
2007-8      +32
2008-9      -48
2009-10     + 2

One might ask how fixed-performance bots gained 22 rating points on variable-performance bots as a group. One could opine that the server is overloaded, which drags down the performance of variable bots, but my hunch is that the data is merely a statistical fluke. Note that between 2008 and 2009, the fixed-performance bots as a group lost 13 rating points relative to the variable-performance bots, which should not have happened, because the server hardware did not improve between years. Therefore that anomaly is merely reversing at present, a sign of measurement noise.

There are lots of reasons not to trust individual game room ratings, but at least at a macro level there is rough stability. A rating of 1700 now means approximately what a rating of 1700 meant five years ago. On the whole, we have neither inflation nor deflation.

This, in turn, means that the increase in top ratings very probably reflects an underlying reality of improved skill. Anyone who is rated over 2000 today, if they could take a time machine back, would be a contender for the 2005 World Championship. The fact that chessandgo is rated over 2600 today merely shows how far we have advanced, that is to say, how high the skill pyramid has been built up.
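The "22 points gained" and "13 points lost" figures are just the difference between the fixed-performance and variable-performance changes for the same interval. A short sketch, with dictionary names invented for the example and values copied from the tables in this post:

```python
# Average year-over-year changes for the last two intervals in this post.
fixed = {"2008-9": -61, "2009-10": 24}
variable = {"2008-9": -48, "2009-10": 2}

# Positive means fixed-performance bots gained on variable-performance bots.
relative = {year: fixed[year] - variable[year] for year in fixed}
net_two_years = sum(relative.values())

print(relative, net_two_years)
```

Over the two intervals combined, the fixed-performance group ends up 9 points ahead of the variable-performance group, so most of the 2009-10 swing offsets the opposite swing of the year before.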
|
|
|
|
|
omar
Forum Guru
Arimaa player #2
Posts: 1003
|
|
Re: Arimaa rating deflation
« Reply #163 on: Jun 16th, 2010, 9:50am » |
|
Thanks for looking at this Karl. Wow, looks like we've finally stabilized the rating system. It's good to have some fixed performance bots that play at the level of beginner human players.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: Arimaa rating deflation
« Reply #164 on: Feb 3rd, 2011, 11:15pm » |
|
on Feb 9th, 2010, 1:19am, Fritzlein wrote: My gut feeling is that the deflation has now fully worked its way into the gameroom ratings and we have more or less stabilized. If we do nothing but take the same measurements at the end of 2010, I predict the averages will have drifted only slightly down. If we make any change, though, it should probably be to the upside, for example by starting new players at 1450. [...] We'll see again at the end of 2010 whether my intuitions have worked out. |
| Well, now we have data for another year.

Bot \ Year           2003  2004  2005  2006  2007  2008  2009  2010
-------------------  ----  ----  ----  ----  ----  ----  ----  ----
bot_Bomb2005Blitz       .     .  1876  1856  1931  2038  1950  1900
bot_Bomb2005CC          .     .  1774  1858  1916  1903  1876  1886
bot_Bomb2005Fast        .     .  1827  1826  1930  1877  1871  1878
bot_GnoBot2005Blitz     .     .  1652  1747  1841  1857  1734  1732
bot_GnoBot2005Fast      .     .  1541  1724  1734  1734  1676  1695
bot_Arimaazilla      1516  1419  1449  1451  1502  1505  1433  1488
bot_Bomb2005P1          .     .  1488  1632  1715  1649  1542  1558
bot_Bomb2005P2          .     .  1752  1806  1887  1864  1787  1822
bot_GnoBot2005P1        .     .  1382  1262  1392  1311  1274  1316
bot_GnoBot2005P2        .     .  1552  1608  1651  1636  1577  1593

From this small sample, it looks like bot ratings have bounced back a little in 2010 from the lows of 2009. I speculated that we might have to bump up the starting rating from 1400 to 1450 to combat lingering deflation, but I was wrong. Instead there might have been slight re-inflation.

However, although 2010 was a bit above the reference year of 2005, it was still well below the inflationary peak of 2007. One could perhaps argue for lowering the rating of newcomers to 1350, but my new gut feeling is that the ratings are close enough to stable as makes no odds. I recommend we let it ride and measure again at the end of 2011 to see if much of anything has changed.
|
|
|
|
|
|