Arimaa Forum « Arimaa rating deflation »


(Moderator: supersamu)
   Author  Topic: Arimaa rating deflation  (Read 30022 times)
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Arimaa rating deflation
« Reply #150 on: Aug 3rd, 2009, 7:40pm »

Well, it turns out that our efforts to combat rating inflation have worked very well, probably too well, to the point that we will now overshoot into severe rating deflation unless we take corrective action.
 
To detect whether rating inflation/deflation is occurring, I calculated the average rating of each bot that Omar runs (no developer bots) for each calendar year, considering a bot only if it played at least thirty rated games for the year.  A partial table of the results is below:
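A minimal sketch of the measurement described above, assuming each rated game contributes one (bot, year, rating) sample. The tuple format and the function name `yearly_averages` are hypothetical; the post only states that a bot counts for a year if it played at least thirty rated games, not how the average itself was computed.

```python
from collections import defaultdict

MIN_GAMES = 30  # threshold stated in the post


def yearly_averages(samples, min_games=MIN_GAMES):
    """Average rating per (bot, year), skipping bot-years with too few games.

    samples: iterable of (bot, year, rating) tuples, one per rated game.
    This input format is an assumption made for illustration.
    """
    buckets = defaultdict(list)
    for bot, year, rating in samples:
        buckets[(bot, year)].append(rating)
    return {
        key: round(sum(ratings) / len(ratings))
        for key, ratings in buckets.items()
        if len(ratings) >= min_games
    }


# Hypothetical data: 30 games in 2008 qualify, 29 games in 2009 do not.
demo = [("bot_A", 2008, 1500)] * 30 + [("bot_A", 2009, 1400)] * 29
print(yearly_averages(demo))
```

Bot-years below the threshold simply drop out of the result, which matches the blank cells in the table.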
 
Bot \ Year            2003  2004  2005  2006  2007  2008  2009
--------------------  ----  ----  ----  ----  ----  ----  ----
bot_Bomb2005Blitz        -     -  1876  1856  1931  2038  2012
bot_Bomb2005CC           -     -  1774  1858  1916  1903  1876
bot_Bomb2005Fast         -     -  1827  1826  1930  1877  1901
bot_GnoBot2005Blitz      -     -  1652  1747  1841  1857  1728
bot_GnoBot2005Fast       -     -  1541  1724  1734  1734  1664
bot_Arimaazilla       1516  1419  1449  1451  1502  1505  1419
bot_Bomb2005P1           -     -  1488  1632  1715  1649  1517
bot_Bomb2005P2           -     -  1752  1806  1887  1864  1824
bot_GnoBot2005P1         -     -  1382  1262  1392  1311  1244
bot_GnoBot2005P2         -     -  1552  1608  1651  1636  1545

 
Taking all the bots Omar runs into consideration, not just the above bots, and dividing them into fixed-performance bots and variable-performance bots, I get the following average year-over-year rating changes:
 
fixed-performance
Year    Change
----    ------
2005-6  + 5
2006-7  +59
2007-8  -27
2008-9  -46

 
variable-performance
Year    Change
----    ------
2005-6  +81
2006-7  +50
2007-8  +32
2008-9  -21

 
Now, it is no problem if variable-performance bots have gained an average of 142 rating points in the past four years.  That sounds perfectly consistent with increased strength based purely on better hardware.  In fact, an increase of about 36 points per year is consistent with other estimates of the value of faster hardware.
 
Also, it is no problem that fixed-performance bots are now rated nine points lower, on average, than they were in 2005.  We want the ratings of fixed-performance bots to remain basically constant.  We inflated throughout 2006, 2007, and into 2008, but that was wiped out by deflation in the latter part of 2008 and the first half of 2009.  We are back to normal, in a manner of speaking.
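The totals quoted above (142 points gained, about 36 per year, and nine points lost) follow directly from the two year-over-year tables. A quick check, using only the figures posted:

```python
# Year-over-year average rating changes, copied from the tables above.
fixed = {"2005-6": 5, "2006-7": 59, "2007-8": -27, "2008-9": -46}
variable = {"2005-6": 81, "2006-7": 50, "2007-8": 32, "2008-9": -21}

fixed_total = sum(fixed.values())                    # net drift of fixed-performance bots
variable_total = sum(variable.values())              # net gain of variable-performance bots
variable_per_year = variable_total / len(variable)   # average gain per year

print(fixed_total, variable_total, variable_per_year)  # -9 142 35.5
```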
 
The difficulty is that we are still rapidly deflating.  The changes we made (anchoring ArimaaScoreP1's rating to 1000 and dropping newcomers to 1300) have not yet run their course.  We are not at equilibrium, and unless we make changes now, I predict we will far overshoot on the deflationary side.
 
Since ratings are near a historically reasonable level now, I recommend that we immediately increase the ratings of newcomers to 1400.  Probably even that will leave us with some deflation, but maybe not, and it seems reasonable to try.  We can check in again at the end of the year.
 
The alternative, I believe, is to wait until we are sure that the system has overly deflated, and then have to take corrective action to pump rating points back into it.  That's silly.  Rather than having swings up and down, I'd prefer to have some kind of stabilization, so that a 2000 rating in any year means about the same thing as a 2000 rating in any other year.
 
Just my $0.02
« Last Edit: Aug 4th, 2009, 6:53am by Fritzlein »

Arimabuff
Forum Guru
*****



Arimaa player #2764

   


Gender: male
Posts: 589
Re: Arimaa rating deflation
« Reply #151 on: Aug 4th, 2009, 4:41am »

on Aug 3rd, 2009, 7:40pm, Fritzlein wrote:
...Just my $0.02

Not counting inflation.  Grin
mistre
Forum Guru
*****





   


Gender: male
Posts: 553
Re: Arimaa rating deflation
« Reply #152 on: Aug 4th, 2009, 9:05am »

on Aug 3rd, 2009, 7:40pm, Fritzlein wrote:

 
Since ratings are near a historically reasonable level now, I recommend that we immediately increase the ratings of newcomers to 1400.  Probably even that will leave us with some deflation, but maybe not, and it seems reasonable to try.  We can check in again at the end of the year.
 

 
I agree with Karl.  I have noticed the deflation and if allowed to continue it will only increase.  Starting newcomers at 1400 seems sensible.
 

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Arimaa rating deflation
« Reply #153 on: Aug 4th, 2009, 9:39am »

on Aug 4th, 2009, 4:41am, Arimabuff wrote:

Not counting inflation.  Grin

Hehe, since yesterday it has become my $0.01999

omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: Arimaa rating deflation
« Reply #154 on: Aug 6th, 2009, 1:47pm »

on Aug 3rd, 2009, 7:40pm, Fritzlein wrote:
Well, it turns out that our efforts to combat rating inflation have worked very well, probably too well, to the point that we will now overshoot into severe rating deflation unless we take corrective action.

No wonder my ratings have been going down Smiley
 
OK I'll change the initial ratings of new players to 1400. Is there any way to know if that new value will be right or will we have to change it again?
 
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Arimaa rating deflation
« Reply #155 on: Aug 6th, 2009, 4:09pm »

on Aug 6th, 2009, 1:47pm, omar wrote:
OK I'll change the initial ratings of new players to 1400. Is there any way to know if that new value will be right or will we have to change it again?

I think we might have to change it again.  The problem is that when we decided to combat rating inflation, we instituted two deflationary measures at the same time.  Lowering newcomer ratings from 1500 to 1300 was deflationary for obvious reasons, but the second change of fixing ArimaaScoreP1's rating to 1000 was also deflationary for less obvious reasons.
 
Lots of new players lose their first game because they are unclear on the concept.  It used to be that whenever a new player came in and lost a game, he gave points to ArimaaScoreP1 that stayed in the system, but now those lost points disappear into thin air because ArimaaScoreP1's rating is fixed.  Yes, some people also gain points from thin air by beating ArimaaScoreP1, but since they gain 18 for winning and lose 102 for losing, the net effect is negative.
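The 18-versus-102 asymmetry is what the standard Elo update gives for a 1300-rated newcomer against a 1000-rated opponent. The sketch below assumes the usual logistic expected-score formula on a 400-point scale and a K-factor of 120; the K value is not stated in the post and is inferred only from 18 + 102 = 120, so treat it as an assumption.

```python
def expected_score(rating, opponent):
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


K = 120          # assumption: inferred from 18 + 102 = 120, not stated in the post
NEWCOMER = 1300  # newcomer starting rating at the time
ANCHOR = 1000    # fixed rating of ArimaaScoreP1

e = expected_score(NEWCOMER, ANCHOR)
gain_on_win = K * (1.0 - e)
loss_on_loss = K * e
print(round(gain_on_win), round(loss_on_loss))  # 18 102


def net_change_per_game(win_fraction):
    """Average points a newcomer adds to (or removes from) the pool per game
    against the anchored bot, given the newcomer's actual win fraction."""
    return K * (win_fraction - e)


print(round(e, 3))  # break-even win fraction, about 0.849
```

Under these assumptions the anchored bot drains points from the pool whenever newcomers win fewer than roughly 85% of their games against it.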
 
Another way of looking at it is that we have put ArimaaScoreP1 outside of the system by fixing its rating.  People don't actually "enter the system" until after they have beaten ArimaaScoreP1.  Because of all the losses to ArimaaScoreP1, the average rating of people entering the system is actually even lower than 1300.
 
Our two changes at once were obviously an over-correction, but I'm not sure what the ideal middle ground is.  Even with the change of increasing the starting rating up to 1400, we will still have inflationary and deflationary forces competing to make a balance.  Inflation will still be caused by newcomers losing a few games and leaving with a rating lower than the rating with which they entered the system.  Deflation will still be caused by people entering the system and working up the ladder until they have a higher rating than they had when they entered the system.
 
Which will weigh heavier, lots of small sources of points, or a few large drains of rating points?  I don't know, and it depends on user behavior.  Even if we modeled all the past data to determine the starting rating which gave a perfect balance, user behavior might change.  For example, when the boxed set comes out, we might get a higher ratio of small points-contributors who soon leave, or maybe we will get a higher ratio of dedicated players who hang around and deflate the system.  Even if we are perfectly calibrated now, the balance could shift in the future.
 
I suggest we re-evaluate in another year to see whether the ratings of fixed-performance bots have leveled off, or are still declining, or have bounced back up.  If the ratings have basically leveled, then we can stand pat.  My hunch, however, is that the ratings will still be declining.  If that is true, we may want to pop the newcomer ratings up to 1450, or even all the way to 1500.  In other words, it may have been that fixing ArimaaScoreP1's rating to 1000 was all the anti-inflationary medicine we needed, and lowering newcomer ratings in addition was pure overkill.
 
If you aren't satisfied with approximate stabilization adjustments every year or two, you could take my measure of inflation, make the automatic measurement once a day, and accordingly adjust the ratings of newcomers for the following day.  I would recommend against it, though, because the daily measure of inflation could fluctuate wildly, causing our countermeasures to similarly fluctuate wildly and constantly over-correct even if the balance is approximately correct.
« Last Edit: Aug 6th, 2009, 4:12pm by Fritzlein »

omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: Arimaa rating deflation
« Reply #156 on: Aug 7th, 2009, 7:27am »

Thanks for that explanation Karl. I changed the new player initial ratings to 1400.
mistre
Forum Guru
*****





   


Gender: male
Posts: 553
Re: Arimaa rating deflation
« Reply #157 on: Aug 8th, 2009, 7:00am »

I don't want to reopen a closed discussion, but how was it decided that ArimaaScoreP1 should be fixed to 1000?  Why not 1100?  Was there a methodology to this number or was it arbitrarily decided like the initial start ratings of new members?

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Arimaa rating deflation
« Reply #158 on: Aug 8th, 2009, 8:59am »

The rating of 1000 for ArimaaScoreP1 was somewhat, but not entirely, arbitrary.  The logic for choosing the fixed point was that if we are going to try to keep ratings stable, why keep them stable around an arbitrary point?  Why not instead keep them stable relative to a point with absolute meaning?
 
The most natural, meaningful fixed point seemed to be for a random mover to have a rating of zero.  It makes sense that an entity with complete knowledge of the rules, but trying neither to win nor to lose, would have a rating that is neither positive nor negative.  A negative rating would then correspond to actively trying to lose, and a positive rating would correspond to actively trying to win.
 
The difficulty in anchoring the rating system on a random mover is that its skill level is so far from our own that we have difficulty estimating how far away it is.  The rating model is roughly accurate for closely matched opponents, but the greater the gap in skill, the worse the approximation becomes.  Some intuitive guesses were around a 2000 to 3000 point gap between a random mover and ShallowBlue.  It turns out, however, that random play is not so horrible.  Choosing a move at random is likely to advance a rabbit, which is a useful thing to do.
 
In order to estimate how bad a random mover is, clauchau created a ladder of bots with well-defined bits of knowledge, e.g. try to capture a piece, try to advance rabbits, etc.  Clauchau and 99of9 let this bot ladder play against itself, and the results are in page 6 of this thread.  The top of the ladder was the bot clauchau called M+S-S, which earned a rating of 1074 relative to the random mover having a rating of zero, according to 99of9's calculations.
 
'M' stands for generating all possible moves and selecting the best.  '+S' stands for maximizing your own score according to the formerly official Arimaa score function.  '-S' stands for minimizing your opponent's score.  At first I interpreted this to mean that 'M+S-S' was another name for ArimaaScoreP1, and therefore that relative to the fixed point of random mover having a zero rating, ArimaaScoreP1 should have a rating of 1074.
 
It turned out, however, that M+S-S is actually stronger than ArimaaScoreP1, because ArimaaScoreP1 is equally concerned with maximizing its own score and minimizing the opponent's score, whereas M+S-S first maximizes its own score, and only minimizes the opponent's score as an afterthought to break ties among the moves which maximize the mover's score.  M+S-S beat ArimaaScoreP1 about 65% of the time according to clauchau at the bottom of page 9 of this thread.  According to the rating formula, that would mean that ArimaaScoreP1 is 108 points worse than M+S-S, i.e. approximately a rating of 966 relative to a random mover having a rating of zero.
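The 108-point figure can be reproduced by inverting the standard Elo expected-score formula. This is a sketch assuming the usual 400-point logistic scale; the function name `rating_gap` is made up for illustration.

```python
import math


def rating_gap(win_probability):
    """Rating difference implied by a win probability under the standard
    Elo logistic model (400-point scale assumed)."""
    return 400.0 * math.log10(win_probability / (1.0 - win_probability))


M_S_S_RATING = 1074             # M+S-S relative to a random mover at zero, from the thread
gap = round(rating_gap(0.65))   # M+S-S beat ArimaaScoreP1 about 65% of the time
print(gap, M_S_S_RATING - gap)  # 108 966
```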
 
So, according to the best information we have, and insofar as the decision is not completely arbitrary, ArimaaScoreP1 should be fixed at 966 rather than at 1000.  However, this round-off error is overwhelmed by so many other considerations that it is totally insignificant.
 
First (and least important), the playouts that set the scale were random.  If we ran the experiment again, we would get a different value for the rating of M+S-S.
 
Second, and critically important, we could get any answer we wanted by choosing a different ladder of bots between random mover and M+S-S.  Arimaa ratings are not transitive.  They are only meaningful against the exact pool of players you have competed against.  No matter how accurately you measure the relative playing strengths in a given pool of players, say with millions of plays, those relative ratings would change every time a player is added to or subtracted from that pool.  I am convinced that if we wanted to skew the results, we could devise a different ladder to prove that M+S-S should have a rating over 2000.
 
Third, and most important of all, even if we managed to anchor ArimaaScoreP1's rating at the "perfect" distance above the rating of a random mover, that would not ensure that anyone else's rating would drift toward a perfect distance above the random mover.  Again, ratings are not transitive, so fixing a bot rating is either inflationary or deflationary according to human behavior.  If we all banded together to incessantly defeat ArimaaScoreP1 by rote, we could inflate our own ratings without bound.  The reality is actually the opposite; since most people who can beat ArimaaScoreP1 stop playing it, the fixed rating has a deflationary effect.  But the mere fact that human behavior determines whether ArimaaScoreP1 pumps points into the system or draws points out proves that it isn't calibrating the rest of the ratings relative to random mover.
 
My personal opinion is that anchoring the rating system relative to random mover is so futile, it should play no part in our rating system decisions.  A vastly more useful rule of thumb would be to try to make it comparable to the chess scale, where an average club player is rated 1500 and an average tournament player is rated 2000.  It actually benefits Arimaa to have ratings similar to chess ratings so that the scale is familiar to outsiders.  The "anchored at zero" concept is a mathematical invention of ours that doesn't correspond to any outsider intuition.
 
An alternative rule of thumb would be that whatever system we happen to have chosen, let's keep things approximately constant.  It would be annoying to have a discontinuity in the history of ratings at any point, making past ratings not comparable to future ratings.
 
Luckily for all, it seems that all three objectives are essentially commensurate.  The scale we happened to have chosen is roughly comparable to the chess scale, so by keeping things stable as they are, we are also keeping things in line with outsider intuition.  In an even greater stroke of luck, this scale happens to be approximately in line with a rating of zero for a random mover.  No, the correspondence isn't exact, but our ability to measure is so clouded by non-transitivity that we are within the limits of any meaningful comparison anyway.
 
So it turns out that, more or less, we live in the best of all possible worlds.  Smiley
« Last Edit: Aug 8th, 2009, 9:04am by Fritzlein »

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Arimaa rating deflation
« Reply #159 on: Feb 9th, 2010, 1:19am »

on Aug 3rd, 2009, 7:40pm, Fritzlein wrote:
fixed-performance
Year    Change
----    ------
2005-6  + 5
2006-7  +59
2007-8  -27
2008-9  -46

 
variable-performance
Year    Change
----    ------
2005-6  +81
2006-7  +50
2007-8  +32
2008-9  -21

on Aug 6th, 2009, 4:09pm, Fritzlein wrote:
My hunch, however, is that the ratings will still be declining.  If that is true, we may want to pop the newcomer ratings up to 1450, or even all the way to 1500.  In other words, it may have been that fixing ArimaaScoreP1's rating to 1000 was all the anti-inflationary medicine we needed, and lowering newcomer ratings in addition was pure overkill.

The above-quoted statistics were based on a partial year 2009.  When I redo it for all of 2009, I get
 
fixed-performance
Year    Change
----    ------
2005-6  + 5
2006-7  +59
2007-8  -27
2008-9  -61

 
variable-performance
Year    Change
----    ------
2005-6  +81
2006-7  +50
2007-8  +32
2008-9  -48

 
In other words, the deflation had not yet fully run its course when we made the mid-2009 correction.  Even after we bumped starting players from 1300 up to 1400, there was a bit more deflation working its way through the system.
 
However, I don't recommend any more corrective action at present.  My gut feeling is that the deflation has now fully worked its way into the gameroom ratings and we have more or less stabilized.  If we do nothing but take the same measurements at the end of 2010, I predict the averages will have drifted only slightly down.  If we make any change, though, it should probably be to the upside, for example by starting new players at 1450.
 
Totaling the differences from 2005 to 2009 shows that fixed-performance bots have drifted down 24 points total, so perhaps we have over-corrected for several years of steady inflation.  However, each year-on-year change had a different set of bots for comparison.  Taking only the five fixed-performance bots which have been continuously present with floating ratings, namely GnoBot2005P1, GnoBot2005P2, Bomb2005P1, Bomb2005P2, and Arimaazilla, their average rating was actually 2 points higher in 2009 than in 2005.  Therefore, I think we're back to approximately a normal level, and if we stabilize near here life is fine.
 
We'll see again at the end of 2010 whether my intuitions have worked out.  Smiley
« Last Edit: Feb 9th, 2010, 11:18am by Fritzlein »

omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: Arimaa rating deflation
« Reply #160 on: Feb 10th, 2010, 2:39pm »

Thanks for posting this Karl. It's good to know that the gameroom ratings are not deflating as much now. Although now that we have WHR ratings to use for seeding tournaments, I am less concerned about the integrity of the gameroom ratings.
zhanrnl
Forum Full Member
***



Arimaa player #4971

   


Gender: male
Posts: 12
Re: Arimaa rating deflation
« Reply #161 on: Feb 10th, 2010, 8:20pm »

Wow, just read through the entire topic: very interesting! It did strike me as odd that ArimaaScoreP1 was pinned at 1000, but now I see there was a very good reason behind it.
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Arimaa rating deflation
« Reply #162 on: Jun 15th, 2010, 1:50pm »

on Feb 9th, 2010, 1:19am, Fritzlein wrote:
However, I don't recommend any more corrective action at present.  My gut feeling is that the deflation has now fully worked its way into the gameroom ratings and we have more or less stabilized.

Bot ratings through the first five months of 2010 suggest that the system has indeed stabilized, and no further deflation is occurring.
 
fixed-performance
Year    Change
----    ------
2005-6  + 5
2006-7  +59
2007-8  -27
2008-9  -61
2009-10 +24

 
variable-performance
Year    Change
----    ------
2005-6  +81
2006-7  +50
2007-8  +32
2008-9  -48
2009-10  +2

 
One might ask how fixed-performance bots gained 22 rating points on variable-performance bots as a group.  One could opine that the server is overloaded, which drags down the performance of variable bots, but my hunch is that the data is merely a statistical fluke.  Note that between 2008 and 2009, the fixed-performance bots as a group lost 13 rating points relative to the variable-performance bots, which should not have happened, because the server hardware did not improve between years.  Therefore that anomaly is merely reversing at present, a sign of measurement noise.
 
There are lots of reasons not to trust individual game room ratings, but at least at a macro level there is rough stability.  A rating of 1700 now means approximately what a rating of 1700 meant five years ago.  On the whole, we have neither inflation nor deflation.
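Both the relative swings and the macro-level stability can be read off the posted year-over-year tables. A short check using only those figures (the partial-year 2009-10 values included):

```python
# Year-over-year average rating changes, copied from the tables in this post.
fixed = {"2005-6": 5, "2006-7": 59, "2007-8": -27, "2008-9": -61, "2009-10": 24}
variable = {"2005-6": 81, "2006-7": 50, "2007-8": 32, "2008-9": -48, "2009-10": 2}

# Fixed-performance change minus variable-performance change, per period.
relative = {period: fixed[period] - variable[period] for period in fixed}
print(relative["2008-9"], relative["2009-10"])  # -13 22

# Net drift of fixed-performance bots since 2005.
fixed_net = sum(fixed.values())
print(fixed_net)  # 0
```

On these numbers the fixed-performance bots end up with zero net drift from 2005 through early 2010, which is the sense in which a given rating means about what it did five years earlier.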
 
This, in turn, means that the increase in top ratings very probably reflects an underlying reality of improved skill.  Anyone who is rated over 2000 today, if they could take a time machine back, would be a contender for the 2005 World Championship.  The fact that chessandgo is rated over 2600 today merely shows how far we have advanced, that is to say, how high the skill pyramid has been built up.

omar
Forum Guru
*****



Arimaa player #2

   


Gender: male
Posts: 1003
Re: Arimaa rating deflation
« Reply #163 on: Jun 16th, 2010, 9:50am »

Thanks for looking at this Karl. Wow, looks like we've finally stabilized the rating system.
 
It's good to have some fixed performance bots that play at the level of beginner human players.
 
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Arimaa rating deflation
« Reply #164 on: Feb 3rd, 2011, 11:15pm »

on Feb 9th, 2010, 1:19am, Fritzlein wrote:
My gut feeling is that the deflation has now fully worked its way into the gameroom ratings and we have more or less stabilized.  If we do nothing but take the same measurements at the end of 2010, I predict the averages will have drifted only slightly down.  If we make any change, though, it should probably be to the upside, for example by starting new players at 1450.
[...]
We'll see again at the end of 2010 whether my intuitions have worked out.  Smiley

Well, now we have data for another year.
 
Bot \ Year            2003  2004  2005  2006  2007  2008  2009  2010
--------------------  ----  ----  ----  ----  ----  ----  ----  ----
bot_Bomb2005Blitz        -     -  1876  1856  1931  2038  1950  1900
bot_Bomb2005CC           -     -  1774  1858  1916  1903  1876  1886
bot_Bomb2005Fast         -     -  1827  1826  1930  1877  1871  1878
bot_GnoBot2005Blitz      -     -  1652  1747  1841  1857  1734  1732
bot_GnoBot2005Fast       -     -  1541  1724  1734  1734  1676  1695
bot_Arimaazilla       1516  1419  1449  1451  1502  1505  1433  1488
bot_Bomb2005P1           -     -  1488  1632  1715  1649  1542  1558
bot_Bomb2005P2           -     -  1752  1806  1887  1864  1787  1822
bot_GnoBot2005P1         -     -  1382  1262  1392  1311  1274  1316
bot_GnoBot2005P2         -     -  1552  1608  1651  1636  1577  1593

 
From this small sample, it looks like bot ratings have bounced back a little in 2010 from the lows of 2009.  I speculated that we might have to bump up the starting rating from 1400 to 1450 to combat lingering deflation, but I was wrong.  Instead there might have been slight re-inflation.  However, although 2010 was a bit above the reference year of 2005, it was still well below the inflationary peak of 2007.  One could perhaps argue for lowering the rating of newcomers to 1350, but my new gut feeling is that the ratings are close enough to stable as makes no odds.  I recommend we let it ride and measure again at the end of 2011 to see if much of anything has changed.
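As a rough check on the comparison with 2005 and 2007, the sketch below averages the table values for the five bots identified earlier in the thread as fixed-performance with floating ratings. Restricting to those five is a choice made here for illustration; the post itself speaks of the whole sample.

```python
# Ratings for 2005, 2007 and 2010, copied from the table above.
ratings = {
    "bot_Arimaazilla":  {2005: 1449, 2007: 1502, 2010: 1488},
    "bot_Bomb2005P1":   {2005: 1488, 2007: 1715, 2010: 1558},
    "bot_Bomb2005P2":   {2005: 1752, 2007: 1887, 2010: 1822},
    "bot_GnoBot2005P1": {2005: 1382, 2007: 1392, 2010: 1316},
    "bot_GnoBot2005P2": {2005: 1552, 2007: 1651, 2010: 1593},
}


def average_for(year):
    """Mean rating across the five bots for the given year."""
    values = [by_year[year] for by_year in ratings.values()]
    return sum(values) / len(values)


avg = {year: average_for(year) for year in (2005, 2007, 2010)}
print(avg)
print(round(avg[2010] - avg[2005]), round(avg[2010] - avg[2007]))  # 31 -74
```

For this subset, 2010 sits about 31 points above 2005 and about 74 points below the 2007 peak.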
