Arimaa Forum « Ratings distortion due to selection of opponents »


(Moderator: supersamu)
Fritzlein
Re: Ratings distortion due to selection of opponen
« Reply #15 on: Oct 12th, 2004, 9:41pm »

I notice that you didn't try out the worst case, i.e. when two new players with negative rating play each other.  Maybe that's rare enough at present not to be a concern, but if this server ever gets truly active, it could happen.  Even now it seems there are a couple of new players every week, and I expect in the future the rate will pick up.
 
But even if the average case of a new player entering is for Arimaazilla to lose 30 points, that's significant deflation in my book.   You say that there could be bots with fixed ratings to pump points back into the system, but that has two disadvantages.
 
The first problem is that it would be nice to implement some improvements in the ratings soon (maybe after the championships and challenge are over) but if you want inflationary forces to balance deflationary ones, then the improvements have to wait until the pool of fixed-rating bots has come on-line.  Moreover, if you design your system on the assumption that the fixed-rating bots will take care of all problems of inflation/deflation, but then the bots don't work as planned or have other unanticipated problems such that you have to take them offline, you could be stuck with a system that doesn't work in the absence of those bots.
 
The second problem is that relatively inflated and deflated ratings can take a long time to even out unless there is considerable mixing of players.  For example, at present most new players start against Arimaazilla, so almost all of the deflation would occur in Arimaazilla's rating.  The secondary effects would transfer to whoever plays Arimaazilla, in particular newcomers, so after a bit not only would Arimaazilla have a deflated rating, so would newcomers playing Arimaazilla.  Established humans like me, who never play Arimaazilla, wouldn't feel the deflationary effects except at the higher order, and thus would have relatively inflated ratings.
 
In short, you don't want inflation and deflation which balance; what you want is neither.
 
All my blathering is moot if you don't feel strongly about the ballast.  Since you aren't attached to the notion of 0.1 draws against a zero-rated player, maybe you will just take my word that it is much better to have a ballast of 2 draws against a 1500-rated player.  :-)
 
By the way, (although maybe this belongs in the other thread) I like your idea of requiring new players to play a certain number of games against bots whose ratings they don't affect.  It could significantly mitigate the problem of new players with inaccurate ratings disrupting the ratings of established players.  This would be especially helpful if there were a couple of stronger bots in the pool, which had to be played along with the weaker ones.  As long as there are some bots a new player loses to and some bots they can beat, the ratings estimate based on those games should be reasonable.  The real trouble (and the reason for needing a significant ballast) is the extreme results of winning or losing all games, or almost all.  
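
To illustrate why all-win or all-loss records need a ballast, here is a minimal sketch. It assumes the standard Elo logistic expectation and a simple "match expected score to actual score" estimate; the thread does not spell out the formula Omar's scripts use, so `expected_score` and `estimate_rating` are hypothetical names, not the real p7/p8 code.

```python
def expected_score(rating, opponent):
    """Standard Elo logistic expectation (an assumption, not taken from the thread)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def estimate_rating(results, ballast_rating=1500.0, ballast_draws=2):
    """Return the rating whose expected total score matches the actual total.

    results: list of (opponent_rating, score) pairs, score being 1, 0.5 or 0.
    The ballast is added as fictitious drawn games against ballast_rating.
    """
    games = list(results) + [(ballast_rating, 0.5)] * ballast_draws
    actual = sum(score for _, score in games)
    lo, hi = -4000.0, 8000.0  # arbitrary search bounds for the bisection
    for _ in range(100):
        mid = (lo + hi) / 2.0
        predicted = sum(expected_score(mid, opp) for opp, _ in games)
        if predicted < actual:
            lo = mid  # estimate is too low to explain the results
        else:
            hi = mid
    return (lo + hi) / 2.0


five_wins = [(1500.0, 1.0)] * 5
with_ballast = estimate_rating(five_wins)                      # finite estimate
without_ballast = estimate_rating(five_wins, ballast_draws=0)  # runs to the search bound
mixed = estimate_rating([(1300.0, 1.0), (1700.0, 0.0)])        # some wins, some losses
```

Under these assumptions, five straight wins against a 1500-rated bot with no ballast push the estimate all the way to the upper search bound, because no finite rating explains a perfect score. With two fictitious draws against 1500 the same record gives a finite estimate, and a player who beats a weaker bot and loses to a stronger one lands in between them.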
 
« Last Edit: Oct 12th, 2004, 10:08pm by Fritzlein »

maker
Re: Ratings distortion due to selection of opponen
« Reply #16 on: Oct 13th, 2004, 9:30am »

Hey,
 
     I don't remember this being discussed; however, if you were to create a couple of rated bots between ShallowBlue and Arimaazilla, the ratings would almost fix themselves.  I understand that Fritzlein has a lot of guin...um, real concerns about the math involved in the proposed new systems.  However, won't some of these disappear if the newbies have real ratings?  The main problem I see with newbies' ratings being deflated right now is that a person may have an epiphany in just a minute, whereas the ratings take days and many games to fix.  This, I feel, is due only to the fact that Arimaazilla is incredibly much harder to beat than ShallowBlue.
 
Novices who want to improve are naturally drawn to trying to beat the next bot, even when they are not ready to do so, simply because they know that they will eventually be able to do so.  This leads to premature play against a far stronger opponent (who is the weakest online at this point) and a huge drop in ratings.  However, at the so-called point of epiphany, the poor player is still stuck with their sorry rating and must beat Arimaazilla with this bad rating many times in order to increase their rating to what it should be.  And not only this, but the net effect would be a significant drop in Arimaazilla's rating due to the losses to such a lowly rated player.  Just some thoughts.
 
maker
Fritzlein
Re: Ratings distortion due to selection of opponen
« Reply #17 on: Oct 13th, 2004, 4:04pm »

I agree that there is a much bigger gap between ShallowBlue and Arimaazilla than the ratings indicate.  However, given that games against ShallowBlue are currently unrated, that in itself doesn't cause ratings distortion.
 
What is causing ratings distortion (in my opinion) is people playing over and over against the same bot they can hardly ever beat, and people playing over and over against the same bot they can almost always beat.
 
If Player X loses to Arimaazilla 80% of the time, and Player Y beats Arimaazilla 80% of the time, then the ratings predict that Player Y will win 94% of the time against Player X, but in practice Player X gets better results than predicted head-to-head.  The problem is that it doesn't take much increase in skill to transition from losing most of the time to winning most of the time against a bot if you are almost equal to it.  It is possible the bot is still crushing you, but you are only one insight away from crushing it, as you allude to.
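
The 94% figure can be checked with a short calculation. This is a sketch assuming the standard Elo logistic curve on a 400-point scale (the thread does not state the formula); `rating_gap` and `expected_score` are illustrative names.

```python
import math


def expected_score(diff):
    """Expected score for a player rated `diff` points above the opponent (standard Elo curve, assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def rating_gap(win_rate):
    """Rating difference that the same curve implies for a given long-run win rate."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


gap_y = rating_gap(0.8)  # Player Y ends up about 241 points above Arimaazilla
gap_x = rating_gap(0.2)  # Player X ends up about 241 points below Arimaazilla
predicted = expected_score(gap_y - gap_x)  # Y vs X, about 482 points apart
print(round(predicted, 3))  # 0.941
```

An 80% score corresponds to odds of 4 to 1, so the two gaps stack to odds of 16 to 1, i.e. 16/17 or roughly 94%.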
 
It seems to me that this sort of inaccuracy in rating can be overcome by forcing people into playing a variety of opponents.  If we are trying to get decent estimates of the ratings of newcomers, the way to do it is precisely not to have lots of games against Arimaazilla.  In most cases the newcomer will either lose almost all, and get a deflated rating, or win almost all, and get an inflated rating.  To be accurate there should be a wide enough range of bots to ensure that the newcomer will win some and lose some.
 
There is also a problem, as you point out, of a player who has improved, but has a rating that hasn't caught up.  This causes a general deflation of ratings, which was much discussed in another thread.  But that applies more to the system as a whole than it does to individual players.  That is to say, it's a problem if the same strength of player that was rated 1800 a year ago is now only rated 1650, but if everyone's rating reflects this deflation, then it isn't a big deal because the relative ratings are still accurate.  I am generally much more worried about relative ratings being out of whack.  If Omar implements a new system which makes the relative ratings more accurate at the cost of general inflation or deflation to the system as a whole, I will still think it is an improvement.

omar
Re: Ratings distortion due to selection of opponen
« Reply #18 on: Oct 14th, 2004, 5:40pm »

Like I said, I'm not too worried about what rating value we use for the fictitious draws so long as that number does not keep changing. I like the fact that if we use 1500 instead of 0 it represents a better guess at the user's rating and reduces the amount of disruption to the rating system, thereby allowing us to get started and continue using this rating system even when we don't have fixed-rating bots online. So I think this is a plus for using 1500 instead of zero.
 
I created a new system called p8 which is just p7 but with two draws against a 1500 player instead of one draw against a 0 rated player.
 
I compared the rating of a few players using p7 and p8; the difference for established players is only about 1 or 2 rating points. For example:
 
  gr bot_Arimaazilla | p7
 
gives a rating of 1481 and using p8 instead gives 1482. For 99of9 p7 gives 2120 and p8 gives 2118. Interesting that two drawn games against a 1500 player drag the rating down more than one drawn game against a zero-rated player. One drawn game against 1500 would keep it the same at 2120. Anyways the amount is not too significant for established players. It has a much bigger impact when the number of games played is low.
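
One possible reading of the 99of9 numbers, sketched under the assumption that the fictitious draws count as ordinary results on the standard Elo curve (the p7 and p8 scripts may work differently, and `ballast_shortfall` is a hypothetical helper):

```python
def expected_score(rating, opponent):
    """Standard Elo logistic expectation (assumed, not taken from the scripts)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def ballast_shortfall(rating, ballast_rating, draws):
    """Points by which the fictitious draws fall short of what the rating predicts."""
    return draws * (expected_score(rating, ballast_rating) - 0.5)


rating = 2120
one_vs_0 = ballast_shortfall(rating, 0, 1)        # p7-style ballast, just under 0.5
one_vs_1500 = ballast_shortfall(rating, 1500, 1)  # about 0.47
two_vs_1500 = ballast_shortfall(rating, 1500, 2)  # p8-style ballast, about 0.95
```

A 2120 player is expected to score nearly a full point against either a 0-rated or a 1500-rated opponent, so each fictitious draw costs close to half a point either way. One draw against 1500 costs about the same as one draw against 0, while two draws cost nearly twice as much, which would pull the rating down slightly further.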
 
You can download the new set of scripts from:
 
http://arimaa.com/arimaa/rating/testRatings.tgz
 
 