Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
(Message started by: Fritzlein on Sep 25th, 2004, 9:00am)

Title: Ratings distortion due to selection of opponents
Post by Fritzlein on Sep 25th, 2004, 9:00am
I'm starting a new thread in order to emphasize that there is a more serious problem with the current rating system than ratings deflation.  The discussion in the ratings deflation thread has produced some interesting ideas about how to anchor the ratings system with a pool of bots, but there is a more fundamental issue that needs separate treatment.  In that thread, Omar says:


on 09/21/04 at 15:52:42, omar wrote:
* The ELO rating system would work fine as long as the players are not allowed to pick their opponents and the opponents are picked for them (as it happens in tournaments).

* When the players are allowed to pick their own opponents, the rating system can be abused by repeatedly playing the same or small group of opponents.

* When computer opponents are also added to the mix, it makes the problem even worse, because once a player learns how to defeat one they can do it again and again, since the computer opponent will never figure out why it is losing and adapt itself (at least with the current computer opponents :-) ).

* The meaning of a player's rating should be how they have performed against the field. The more different opponents a player has played, the more meaningful and reliable the rating is. If only a few opponents have been played, the rating is not very meaningful.


I think most of us can, from our own experience, feel the truth of what Omar says.  If there is a bot you can consistently beat, you can push your rating up and up by playing it over and over.   The inverse is also true: if there is a bot that beats you consistently, you will push your rating down and down by playing it.  This process distorts the rating of the player who can beat the bot relative to the one who can't.  Their ratings will predict a huge difference in skill, but when the humans play each other, they will be closer than the ratings indicate.

So Omar has made a new system, and, like a good computer scientist, has given you the code so you can test it for yourself and see that it is a better system: http://arimaa.com/arimaa/rating/testRatings.tgz

I, on the other hand, am a mathematician.  I want you to look at the formulas so you can convince yourself theoretically that Omar's new system must be a better system, regardless of how it performs in practice.  :-)  I have trouble prying things out of the Perl, but I may have the gist of it.

The central idea is to not have every game affect your rating equally.  Games you play against a frequent opponent should count for less.  Implementing this idea means that a new formula can't just look at the current rating of each of the two players in a game and adjust based on those, as is now the case.  Instead we have to look back in the game history.

In the proposed system, in order to calculate your new rating after each game, we look back at all the games you have ever played, assign a relative weight to each, and then use the whole body of results to compute a new rating.  Once you digest this basic procedure, the interesting question is how to weight each game.

Clearly historical games should be weighted less than more recent ones.   It seems natural to multiply old games by something like (0.999)^d, where d is the number of days old a game is.

More subtly, it might make sense for games that were earlier in sequence to be weighted less.  For example, suppose you play twenty games all on the same day.  Maybe the last of the twenty games should count for significantly more than the first, even if they were not separated very much in time.  Perhaps a second factor, multiplied on top of the time factor, would be (0.999)^s, where s is the number of games ago in the sequence.

Most importantly, however, N games against a single opponent should be weighted less than a single game against each of N different opponents.  I believe Omar is using the formula that one game against each of N different opponents has a combined weight equal to N^2 games against a single opponent.  So if I play one game each against 99of9, clauchau, and naveed, those games collectively will have a weight equal to nine games against speedy.
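The three-versus-nine arithmetic follows if the combined weight of n games against one opponent is taken to be sqrt(n). A minimal Python sketch under that assumption (the function name `total_weight` is hypothetical and is not taken from Omar's Perl scripts):

```python
import math
from collections import Counter


def total_weight(opponents):
    """Sum, over distinct opponents, of sqrt(games played against them)."""
    counts = Counter(opponents)
    return sum(math.sqrt(n) for n in counts.values())


varied = ["99of9", "clauchau", "naveed"]   # one game against each of three opponents
repeated = ["speedy"] * 9                  # nine games against a single opponent

print(total_weight(varied))    # 3.0
print(total_weight(repeated))  # 3.0
```

Both histories carry the same total weight, which is the sense in which three varied games equal nine repeated ones.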

This last weighting is the crucial feature to differentiate it from the current system.  To play over and over against one opponent has diminishing returns.  Indeed, Omar has actually proposed a hard cap on the amount one opponent can affect one's rating.

My next post (maybe not today) will be about "finding cracks" in the proposed implementation, but I wanted to say more about the basic idea first, so that folks have a chance to think about it.  Supposing that we were going to calculate the rating based on a weighted average of all past games, how would you assign those relative weights?

Title: Re: Ratings distortion due to selection of opponen
Post by clauchau on Sep 25th, 2004, 12:58pm
I had (0.5)^s in mind among games involving the same pair of players. In other words, every time I play a game against Omar, my previous games against him are made worth half what they were worth.

Weights of (0.999)^s wouldn't be reactive and short-term rewarding enough to me. At least for the weights relative to pairs of players.

I think it would solve the problem of over weighting the games involving the same pair of players. My single game against Fritzlein would be worth 1 and my 93 games against bot_Speedy would be worth almost 2 (before we possibly further apply age weights).
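The "worth 1" and "worth almost 2" figures are just a geometric sum. A small sketch, assuming the most recent game against a given opponent has weight 1 and each earlier game is halved again (`pair_weight` is a hypothetical name):

```python
def pair_weight(num_games, decay=0.5):
    """Total weight of all games against one opponent when the game
    s games back (within that pairing) is weighted decay**s."""
    return sum(decay ** s for s in range(num_games))


print(pair_weight(1))    # 1.0, a single game against Fritzlein
print(pair_weight(93))   # very close to 2, the 93 games against bot_Speedy
```

With a decay of 0.5 the total can never exceed 1 / (1 - 0.5) = 2, however many games are played.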

Title: Re: Ratings distortion due to selection of opponen
Post by maker on Sep 25th, 2004, 3:03pm
This is an excellent idea.   :o  I love the thought that multiple games against a single opponent should create a lower rating increase or decrease.  This allows small differences in ratings to actually mean something for different players.  Also, it would stabilize the bots' ratings fairly well.

However, I don't believe that the current ideas about the time ratings are very good.  I believe they're an improvement over the current system, but it shouldn't be so quadratic.  Perhaps we could use a more linearized version of (.999)^quantity.  Maybe this would be better: (.999)^quantity + (.999) * quantity.  This allows for a nice curve, but just not quite so abrupt a one.  It allows more recent games to have a greater impact on the score, but also allows games not-quite-so-recent to count significantly.

Overall, I believe that this is an excellent idea. ;D

maker

Title: Re: Ratings distortion due to selection of opponen
Post by Fritzlein on Sep 25th, 2004, 3:40pm
I'm with you clauchau, although not to such an extreme.  The way you suggest it, if you beat me eighteen times and then lose to me twice, your effective record against me would be 0.5 - 1.5.  It looks like I am dominating you despite winning only 10% of our games.  At most I could see a factor of (0.9)^s, and even that makes the most recent game count for at least a tenth of the total.  It could still be a bit too reactive.
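The effective record quoted here can be checked directly. A short sketch, assuming the most recent game has weight 1 and the game s games back has weight decay^s (`effective_record` is a hypothetical helper):

```python
def effective_record(results, decay):
    """results: 1 for a win, 0 for a loss, oldest game first.
    Returns (weighted wins, weighted losses)."""
    wins = losses = 0.0
    for age, result in enumerate(reversed(results)):
        weight = decay ** age          # age 0 is the most recent game
        if result:
            wins += weight
        else:
            losses += weight
    return wins, losses


history = [1] * 18 + [0] * 2           # eighteen wins, then two losses

print(effective_record(history, 0.5))  # about 0.5 wins to 1.5 losses
print(effective_record(history, 0.9))  # about 6.9 wins to 1.9 losses
```

With a decay of 0.9 the eighteen older wins still outweigh the two recent losses, which is the less reactive behaviour being argued for.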

The part I like, though, is having the sequential decay apply mostly (or entirely) within games against the same opponent.  In fact, now that I think about it, I might like it even as a replacement for the square root idea.  On the other hand, the advantage of the square root idea is that you don't approach the maximum weight so quickly.  If I play nine games with a decay of (0.9)^s, that already makes the total weight of them 6.1 games, whereas with a square root it would only be 3 games.  Hmmm...
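The 6.1-versus-3 comparison is a closed-form geometric sum against a square root. A quick sketch (function name hypothetical):

```python
import math


def decayed_total(num_games, decay):
    """Closed form of sum(decay**s for s in range(num_games))."""
    return (1 - decay ** num_games) / (1 - decay)


print(round(decayed_total(9, 0.9), 1))  # 6.1
print(math.sqrt(9))                     # 3.0
```

The decayed total is capped at 1 / (1 - 0.9) = 10 games, so nine games already use up more than half of the cap, while the square root keeps growing without a ceiling but much more slowly.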

Certainly I prefer either of these to the way Omar capped the influence of any given opponent by a formula that makes the total weight of all your games against a given opponent decrease after a certain number of games.  I think the total weight should taper off, but never decline from playing more.  A win should always be a plus, however slight, but with Omar's latest, a win against a frequent opponent can actually hurt you slightly.  (This is crack #1 in his proposal, IMHO)

I think I might lobby for the application of four weighting factors.  First apply all three multiplicative weights (in any order, since multiplication is commutative):
(0.999)^d where d is the number of days old the game is
(0.998)^g where g is the number of games old the game is in your own history
(0.95)^s where s is the number of games old the game is in your history against that opponent

Then for each opponent sum the (already lowered) weights against that opponent and divide all weights of games against that opponent by the square root of that sum.

Doing it this way ensures that no matter how many games you play against a given opponent, all of those games together account for a weight of at most about 4.39 times the weight of playing one game against a new opponent.  So playing new opponents will always affect your rating more.

At the same time, a single game against a frequent opponent counts for at least 1/20 of your total against that opponent no matter how many times you've played them.  (If you've played them a bunch, its weight will be about 0.2 times the weight of a game against a new opponent.)  Not as volatile as clauchau suggests, but there is a balance between the excitement of fast-moving ratings, and the meaningfulness of more stable ratings.

Incidentally, the stability of a rating would be proportional to the sum of the weights of all the games after these factors have been applied.  So the sum of the weights should intuitively match our idea of accuracy.
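The four factors can be sketched in a few lines of Python. This is only an illustration of the proposal as described in this post, not code from the testRatings scripts; the function name and the `(days_old, opponent)` input format are assumptions made for the example.

```python
import math
from collections import defaultdict


def game_weights(games, day_decay=0.999, seq_decay=0.998, opp_decay=0.95):
    """games: list of (days_old, opponent), most recent game first.
    Returns the final weight of each game, in the same order."""
    raw = []
    seen = defaultdict(int)
    for g, (days_old, opponent) in enumerate(games):
        s = seen[opponent]             # more recent games against this opponent
        seen[opponent] += 1
        raw.append(day_decay ** days_old * seq_decay ** g * opp_decay ** s)

    # Sum the raw weights per opponent, then divide by the square root of that sum.
    totals = defaultdict(float)
    for (_, opponent), w in zip(games, raw):
        totals[opponent] += w
    return [w / math.sqrt(totals[opp]) for (_, opp), w in zip(games, raw)]


# Worst case: a long run of same-day games against one opponent.
weights = game_weights([(0, "bot")] * 2000)
print(round(sum(weights), 2))   # 4.39
print(round(weights[0], 2))     # 0.23
print(game_weights([(0, "newcomer")]))  # [1.0]
```

The cap comes from the combined per-game factor 0.95 * 0.998 = 0.9481: the raw weights sum to at most 1 / (1 - 0.9481), roughly 19.27, and the square root of that is about 4.39.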


Title: Re: Ratings distortion due to selection of opponen
Post by clauchau on Sep 25th, 2004, 5:25pm
Oh yes this is satisfactory.

I still wonder what to do with those weighted results. Does it lead to the equation which Omar's scripts solve?

Title: Re: Ratings distortion due to selection of opponen
Post by 99of9 on Sep 25th, 2004, 6:41pm
Can you prove that with this system a person's rating will NEVER go down due to a win?

What if a while ago you had a very low rating, and won against a high-rated player.  Now you have a rating higher than his, and you win again against him.  The original game, with the high ratings difference suddenly receives less weight.  The new win has hardly any ratings difference, so doesn't contribute much.   Maybe your rating would go down?

I'm not that against this system.  But it is untested and unproven, so I feel there may be loopholes or inconsistencies that make it perform worse than the current system.

By the way, I think your exponent based on the number of days is too high.  After an entire year a game will still be worth more than 1/2.  I'm sure most of us lost to shallowblue less than a year ago!!  [but then again... I guess the games played exponent would dampen that out ... so maybe it's ok]
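For reference, the day-based factor alone works out as follows (a quick check of the arithmetic, ignoring the games-played factor):

```python
import math

DAY_DECAY = 0.999

weight_after_year = DAY_DECAY ** 365
half_life_days = math.log(0.5) / math.log(DAY_DECAY)

print(round(weight_after_year, 3))  # 0.694
print(round(half_life_days))        # 693
```

So a year-old game keeps about 69% of its weight, and it takes nearly two years for the day factor by itself to halve a game's weight.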

As an alternative we could use the current system with a different Rating Uncertainty against each different opponent... gradually it would tail off to 1 so you wouldn't be able to exploit a bot anymore.  Then at least it would still be ELO and comparable with other systems.

Title: Re: Ratings distortion due to selection of opponen
Post by omar on Sep 25th, 2004, 9:58pm
Thanks for starting this discussion Karl. I think it will be very interesting to discuss this issue and try out possible solutions.

I would strongly suggest that everyone taking part in this discussion download this
 http://arimaa.com/arimaa/rating/testRatings.tgz
and try out the scripts in it. The README file explains how to run the scripts. If you know a little perl you can easily try out different ideas.


Quote:
A win should always be a plus, however slight, but with Omar's latest, a win against a frequent opponent can actually hurt you slightly.


I think in my latest system (p7) I made sure that a rating never decreases no matter how many games you play against the same player. It levels off, but never decreases. I think that an earlier system (p4) had that problem.


Title: Re: Ratings distortion due to selection of opponen
Post by omar on Sep 25th, 2004, 10:30pm
I just ran this command to double check:
   testit p7

and here is what it shows.

N       Rating
1       1612
2       1667
5       1728
10      1763
20      1785
50      1796
100     1798
1000    1798

N = Number of consecutive wins against a single 1000 rated player (the player's rating is fixed at 1000 and does not change).

Now if each of those wins is against a different player here is what happens.

N'      Rating
1       1612
2       1730
5       1878
10      1978
20      2060
50      2123
100     2136
200     2137
1000    2137

So against the same player it levels off at 798 points above the opponent's rating. Against different players it levels off at 1137 above the opponent's rating. But it never drops. Another important thing to notice is the effort it requires to get 798 points above an opponent. If it is the same opponent it takes about 50 games to get close to that level, but if it is different opponents it takes only 5 games (and you actually get 878 points above). Cool.


Title: Re: Ratings distortion due to selection of opponen
Post by omar on Sep 25th, 2004, 11:22pm

Quote:
I still wonder what to do with those weighted results. Does it lead to the equation which Omar's scripts solve?


I think the equation you are referring to is:

k1*(w1-W(r1-RP)) + k2*(w2-W(r2-RP)) + ... + kN*(wN-W(rN-RP)) = 0

where:
 r1 to rN are the ratings of the opponents
 w1 to wN are the results of the games (0, 0.5, 1 for lose, draw, win)
 k1 to kN are weights assigned to each game
 W() is the usual Elo winning expectancy formula.

You have to accept this equation as a given. Each term in the equation represents a game. We know the ratings of the opponents and the results of the games, and so we are trying to find what rating (RP) should be assigned to this player so that it best matches the player's performance in these games. If we accept this equation then we get a bunch of weights that need to be assigned to the games. That's what we are discussing now: how should we assign the weights to these games?

BTW, I didn't come up with this formula. It is referred to as a performance rating formula and is commonly used to determine how a player performed in a tournament. In such calculations the weights are all set to 1.

However I have never seen this equation used as the main equation for the rating system with the weights set to different values based on things like how old the game is, how many games were played after this game, how many games were played with this same player, etc.

The main equation of most rating systems is a simple formula that computes the new rating based on the old rating, the opponent's rating and the game result. This is basically how the current Arimaa rating system works also.

Having a simple main equation for the rating system would have made it feasible to compute the ratings in Elo's days when computers were not around. It would have been next to impossible to use the above equation as the main equation of a rating system if computers had not been available (especially when the weights are different). I will venture to guess that Elo would have preferred to use the above equation if computers had been around in his day :-)
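For readers who don't want to dig through the Perl, here is a minimal Python sketch of solving the weighted performance-rating equation by bisection. It is not Omar's code. It assumes the standard Elo expectancy on a 400-point scale and takes the expectancy's argument to be the player's rating minus the opponent's, which is an assumption about the sign convention; the function names and the search bracket are hypothetical.

```python
def expected(diff):
    """Elo winning expectancy for a rating advantage of `diff` points."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def performance_rating(games, lo=-4000.0, hi=6000.0, tol=1e-6):
    """Solve sum(k * (w - expected(rp - r))) = 0 for rp by bisection.

    games: iterable of (opponent_rating, result, weight) tuples,
    with result 1, 0.5 or 0 for a win, draw or loss.
    """
    games = list(games)

    def surplus(rp):
        # Weighted actual score minus weighted expected score at rating rp.
        return sum(k * (w - expected(rp - r)) for r, w, k in games)

    # surplus() decreases as rp grows, so a root needs a sign change.
    if surplus(lo) <= 0 or surplus(hi) >= 0:
        raise ValueError("no finite rating fits (all wins or all losses?)")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if surplus(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical history: a win over 1400, a loss to 1600, a draw with 1500.
history = [(1400, 1, 1.0), (1600, 0, 1.0), (1500, 0.5, 1.0)]
print(round(performance_rating(history)))  # 1500

# Halving the weight of the loss pulls the solution above 1500.
downweighted = [(1400, 1, 1.0), (1600, 0, 0.5), (1500, 0.5, 1.0)]
print(performance_rating(downweighted) > 1500)  # True
```

Because the left-hand side is monotonic in RP, bisection is enough; the whole weighting discussion only changes the k values fed into it.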


Title: Re: Ratings distortion due to selection of opponen
Post by Fritzlein on Sep 27th, 2004, 8:59am
I share 99of9's concern that any untested system may have loopholes and inconsistencies that we don't anticipate.  He asks, for example, whether it is certain that a rating really will never go down after a win.  This is a good question because it would be certain to never go down if the only weighting factors were the exponential decay by days, and by sequence of games.  Since your entire history of games is losing its weight in the same proportion, the relative importance of all past results stays the same.  Then, yes, a win will always boost your rating, however slightly.

When I proposed having games against an individual opponent lose weight faster than other games, I didn't realize that I was destroying this property.  Now it could happen that you have excellent results against a frequent opponent (perfect or near-perfect), but poor results against the rest of the field.  When you play that frequent opponent again, your good results against him are knocked down by a greater factor than your poor results against the rest of the field, which can hurt you slightly.  Since you are entering a new good result against him, that will more than compensate for the loss, unless his rating has suddenly drastically declined from the times you beat him before.  But, yes, in that one case it is just barely theoretically possible for a win to hurt your rating.

In practice this seems extremely unlikely.  One can probably assume that players won't gain or lose 1000 points from where they were before.  It would never be like the silly situation with the world tennis rankings where someone can win the French Open and still drop from first to second in the rankings.  I doubt that a win would cost anyone rating points once in a thousand games.  Still, perhaps the concern that it could happen is enough to make us want to scrap the idea of game weights decaying at different rates.

In any case, I've discovered what I think is a bigger potential loophole, which I will write about in my next posting since I have to run off to class now.

Title: Re: Ratings distortion due to selection of opponen
Post by Fritzlein on Sep 30th, 2004, 11:43am
OK, time for potential problem #2 with Omar's proposal.  After the games have all been relatively weighted, the computation of the rating answers the question, "If my rating had been X for all of those games, what value of X would have predicted that I would win as many games as I actually won?"

For example (unweighted just for ease of computation) suppose I have a win against a 1414 player, a win against a 1636 player, and a loss against a player rated 1775.  My record is 2-1.  If my rating had been 1751, you would have predicted I would win
0.874 against the 1414 player
0.660 against the 1636 player
0.466 against the 1775 player
for a predicted  total of 2 wins.  Thus 1751 is a reasonable guess at my rating.

This is a nifty and conceptually simple calculation (although computationally tricky), but it has one problem.  If a player has won (or lost) all their games, the only rating that would predict a perfect score is infinity.
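The worked example, and the divergence for a perfect score, can be checked in a few lines, assuming the standard Elo expectancy on a 400-point scale:

```python
def expected(diff):
    """Elo winning expectancy for a rating advantage of `diff` points."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


opponents = [1414, 1636, 1775]

predicted = [expected(1751 - r) for r in opponents]
print([round(p, 3) for p in predicted])  # [0.874, 0.66, 0.466]
print(round(sum(predicted), 2))          # 2.0

# With a perfect 3-0 score, the predicted total creeps toward 3 as the
# candidate rating grows but never reaches it, so no finite rating fits.
for rating in (2000, 3000, 4000):
    total = sum(expected(rating - r) for r in opponents)
    print(rating, total < 3)             # True each time
```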

(oops, gotta run, more later on the disadvantages of omar's solution to the problem.)

Title: Re: Ratings distortion due to selection of opponen
Post by omar on Oct 5th, 2004, 6:40am

on 09/30/04 at 11:43:20, Fritzlein wrote:
If a player has won (or lost) all their games, the only rating that would predict a perfect score is infinity.


But there is a fictitious draw game that is always added, which eliminates this problem.

Title: Re: Ratings distortion due to selection of opponen
Post by omar on Oct 5th, 2004, 6:46am
I think problem #1 was supposed to be that a win against a frequent opponent can hurt your rating. But as I noted earlier p7 does not have this problem.

So just to keep the record straight Karl, both the problems you mentioned are not present in p7.

Omar

Title: Re: Ratings distortion due to selection of opponen
Post by Fritzlein on Oct 5th, 2004, 10:42am
Sorry I didn't finish my thought.  Yes, problem #1 was supposed to be that a win can reduce a player's rating.  It's good that that is not present in your most refined approach -- I hadn't checked the equations myself.

Problem #2 is not that infinite ratings are possible in your proposal; the problem is with the fictitious draw against a zero-rated player, which is added to prevent an infinite rating.  If the draw is against a zero-rated player, it will have the tendency to deflate the entire rating system to an average rating of zero.

Example: A new player joins and loses twice to Arimaazilla.  I'm not sure what you intended to go in the record of Arimaazilla for the first game, but for the second game Arimaazilla gets credit for beating a player with a negative rating, i.e. almost no credit at all.

Then suppose the new player wins the next two.  This will result in the new player having a rating almost equal to Arimaazilla, which is fine, but what happens to Arimaazilla?  The bot gets penalized for a loss to a negative-rated player and to a still-rather-low-rated player.  The net is a substantial penalty to Arimaazilla.

The situation becomes even worse if two new players each lose to Arimaazilla and then play each other for a while.  They each get results in their records of losing to negative-rated opponents, solidifying their ratings at that low level.

Furthermore, an unfortunate side effect might be that established players, understanding that new players are generally underrated, avoid playing against new players because they don't want their own ratings to take a hit on average.  If I have a rating of 2000, and I play against someone with a  rating of, say, -200, I need to have true winning odds of about 300000 to 1 to make it a fair proposition in terms of hurting or helping my rating.  If my true winning odds are only 1000 to 1, then I will, on average, lose rating points by playing that opponent.
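The 300000-to-1 figure is the Elo odds implied by a 2200-point gap. A quick check, assuming the standard 400-point scale (`fair_odds` is a hypothetical name):

```python
def expected(diff):
    """Elo winning expectancy for a rating advantage of `diff` points."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def fair_odds(my_rating, opp_rating):
    """Winning odds (to 1) at which the ratings exactly predict the results."""
    return 10.0 ** ((my_rating - opp_rating) / 400.0)


print(round(fair_odds(2000, -200)))  # 316228

required = expected(2000 - (-200))   # score the ratings expect from me
true_score = 1000 / 1001             # my score at true odds of 1000 to 1
print(true_score < required)         # True, so I lose points on average
```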

Because of the tendency of all ratings to gravitate toward the rating of the fictitious draw, I would strongly suggest having that draw be against a 1500-rated player, or a rating that we want to be the average rating.

Omar, I think perhaps you were focusing on how much someone has to work to get a high rating, rather than focusing on the more typical case of fairly weak players entering and trying to work their way up the ladder.  If it is a major concern that it is too easy to get a high rating when the only "ballast" is a fictitious draw against a 1500-rated player, then let's add two or three fictitious draws.
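To see how strongly the anchor of the fictitious draw matters for a newcomer, here is a rough sketch. It is not the p7 script: it uses unit weights for every game, one fictitious draw of weight 1, the standard Elo expectancy, and a hypothetical solver name, so the numbers are illustrative only.

```python
def expected(diff):
    """Elo winning expectancy for a rating advantage of `diff` points."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def solve_rating(games, lo=-4000.0, hi=6000.0):
    """Bisection on sum(k * (w - expected(rp - r))) = 0.
    games: list of (opponent_rating, result, weight)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        surplus = sum(k * (w - expected(mid - r)) for r, w, k in games)
        if surplus > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# A newcomer loses twice to a bot rated 1506.
two_losses = [(1506, 0, 1.0), (1506, 0, 1.0)]

for anchor in (0, 1500):
    ballast = [(anchor, 0.5, 1.0)]     # the fictitious draw
    print(anchor, round(solve_rating(two_losses + ballast)))
```

Under these assumptions the newcomer lands essentially on the anchor when the draw is against a zero-rated player, and a few hundred points below the bot when the draw is against a 1500-rated player.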

Title: Re: Ratings distortion due to selection of opponen
Post by omar on Oct 9th, 2004, 1:40pm

Quote:
Example: A new player joins and loses twice to Arimaazilla.  I'm not sure what you intended to go in the record of Arimaazilla for the first game, but for the second game Arimaazilla gets credit for beating a player with a negative rating, i.e. almost no credit at all.

Then suppose the new player wins the next two.  This will result in the new player having a rating almost equal to Arimaazilla, which is fine, but what happens to Arimaazilla?  The bot gets penalized for a loss to a negative-rated player and to a still-rather-low-rated player.  The net is a substantial penalty to Arimaazilla.



Actually this is not as much of a problem as it may seem. I actually tried it out:

gr bot_Arimaazilla | p7

shows that Arimaazilla's current rating is 1506 using the p7 rating system (at least at the time of this writing). When it wins two games against a 0 rated player its rating does not go up at all. Likewise the 0 rated player's rating also does not go down much.

Now here is where this rating system is so different from the rating system that we are used to. When a player has not played many games, this rating system lets the ratings move very fast. So in the next two games, when the new player wins, his rating goes up very fast but Arimaazilla's rating does not go down as much. Let's see what happens to Arimaazilla's rating after losing the first game.

gr bot_Arimaazilla | rep '-0 newPlayer' 1 - | p7

shows that it would go down to 1472. Now let's see what happens to the new player's rating after he wins the first game.

rep '+1506 bot_Arimaazilla' 1 '-1506 bot_Arimazilla' 2 | p7

The new player's rating pops up to 1453 from -3. After the second game the ratings would be: 1498 for the new player and 1471 for Arimaazilla. So it is not really that bad.

I don't mind using 1500 as the fictitious draw rating rather than using 0. The actual number does not matter too much. I think it is more important to keep that number fixed and not change it.

In your example you are assuming a situation where we don't have a bunch of fixed rated 'dummy bots' for new players to play against. Once we have such bots we can make it so that only after playing about 10 rated games with such bots do the other rated games begin to count. Otherwise the player can still play against whoever they want, but the games won't be rated until the player completes the provisional 10 games. I'm assuming we will have fixed rated dummy bots up to the level of shallowBlue or maybe even something between shallowBlue and Arimaazilla. Thus the new players will quickly bring up their rating based on their performance. Since this rating system moves the ratings so fast in the beginning, it works best if we have the new players play some provisional games before counting their other rated games.

So in this situation we will not have the problem of established players avoiding new players, or of new players establishing low ratings by only playing other new players. And it will not make much difference what we choose the fictitious draw rating to be. Which is good because we don't want that to be a significant factor in the rating system. So we don't have to bias it at all.


Title: Re: Ratings distortion due to selection of opponen
Post by Fritzlein on Oct 12th, 2004, 9:41pm
I notice that you didn't try out the worst case, i.e. when two new players with negative rating play each other.  Maybe that's rare enough at present not to be a concern, but if this server ever gets truly active, it could happen.  Even now it seems there are a couple of new players every week, and I expect in the future the rate will pick up.

But even if the average case of a new player entering is for Arimaazilla to lose 30 points, that's significant deflation in my book.   You say that there could be bots with fixed ratings to pump points back into the system, but that has two disadvantages.

The first problem is that it would be nice to implement some improvements in the ratings soon (maybe after the championships and challenge are over) but if you want inflationary forces to balance deflationary ones, then the improvements have to wait until the pool of fixed-rating bots has come on-line.  Moreover, if you design your system on the assumption that the fixed-rating bots will take care of all problems of inflation/deflation, but then the bots don't work as planned or have other unanticipated problems such that you have to take them offline, you could be stuck with a system that doesn't work in the absence of those bots.

The second problem is that relatively inflated and deflated ratings can take a long time to even out unless there is considerable mixing of players.  For example, at present most new players start against Arimaazilla, so almost all of the deflation would occur in Arimaazilla's rating.  The secondary effects would transfer to whoever plays Arimaazilla, in particular newcomers, so after a bit not only would Arimaazilla have a deflated rating, so would newcomers playing Arimaazilla.  Established humans like me, who never play Arimaazilla, wouldn't feel the deflationary effects except at the higher order, and thus would have relatively inflated ratings.

In short, you don't want inflation and deflation which balance, what you want is neither.

All my blathering is moot if you don't feel strongly about the ballast.  Since you aren't attached to the notion of 0.1 draws against a zero-rated player, maybe you will just take my word that it is much better to have a ballast of 2 draws against a 1500-rated player.  :-)  

By the way, (although maybe this belongs in the other thread) I like your idea of requiring new players to play a certain number of games against bots whose ratings they don't affect.  It could significantly mitigate the problem of new players with inaccurate ratings disrupting the ratings of established players.  This would be especially helpful if there were a couple of stronger bots in the pool, which had to be played along with the weaker ones.  As long as there are some bots a new player loses to and some bots they can beat, the ratings estimate based on those games should be reasonable.  The real trouble (and the reason for needing a significant ballast) is the extreme results of winning or losing all games, or almost all.


Title: Re: Ratings distortion due to selection of opponen
Post by maker on Oct 13th, 2004, 9:30am
Hey,

    I don't remember this being discussed; however, if you were to create a couple of rated bots between ShallowBlue and Arimaazilla, the ratings would almost fix themselves.  I understand that Fritzlein has a lot of guin...um, real concerns about the math involved in the proposed new systems.  However, won't some of these disappear if the newbies have real ratings?  The main problem I see with newbies' ratings being deflated right now is that a person may have an epiphany in just a minute, but the ratings take days and many games to catch up.  This, I feel, is due only to the fact that Arimaazilla is incredibly much harder to beat than ShallowBlue.

Novices who want to improve are naturally drawn to trying to beat the next bot, even when they are not ready to do so, simply because they know that they will eventually be able to do so.  This leads to premature play against a far stronger opponent (who is the weakest online at this point) and a huge drop in ratings.  However, at the so-called point of epiphany, the poor player is still stuck with their sorry rating and must beat Arimaazilla with this bad rating many times in order to increase their rating to what it should be.  And not only this, but the net effect would be a significant drop in Arimaazilla's rating due to the loss to such a lowly rated player.  Just some thoughts.

maker

Title: Re: Ratings distortion due to selection of opponen
Post by Fritzlein on Oct 13th, 2004, 4:04pm
I agree that there is a much bigger gap between ShallowBlue and Arimaazilla than the ratings indicate.  However, given that games against ShallowBlue are currently unrated, that in itself doesn't cause ratings distortion.

What is causing ratings distortion (in my opinion) is people playing over and over against the same bot they can hardly ever beat, and people playing over and over against the same bot they can almost always beat.

If Player X loses to Arimaazilla 80% of the time, and Player Y beats Arimaazilla 80% of the time, then the ratings predict that Player Y will win 94% of the time against Player X, but in practice Player X gets better results than predicted head-to-head.  The problem is that it doesn't take much increase in skill to transition from losing most of the time to winning most of the time against a bot if you are almost equal to it.  It is possible the bot is still crushing you, but you are only one insight away from crushing it, as you allude to.
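The 94% figure follows from converting each 80% score into a rating gap and adding the two gaps. A quick check, assuming the standard Elo expectancy on a 400-point scale (`rating_gap` is a hypothetical name):

```python
import math


def expected(diff):
    """Elo winning expectancy for a rating advantage of `diff` points."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def rating_gap(score):
    """Rating advantage that corresponds to a long-run score (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


gap = rating_gap(0.8)                  # Player Y above the bot, and the bot above Player X
print(round(gap))                      # 241
print(round(expected(2 * gap), 3))     # 0.941
```

Odds of 4 to 1 over the bot and the bot's 4 to 1 over Player X multiply to 16 to 1, i.e. 16/17, which is about 94%.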

It seems to me that this sort of inaccuracy in rating can be overcome by forcing people into playing a variety of opponents.  If we are trying to get decent estimates of the ratings of newcomers, the way to do it is precisely not to have lots of games against Arimaazilla.  In most cases the newcomer will either lose almost all, and get a deflated rating, or win almost all, and get an inflated rating.  To be accurate there should be a wide enough range of bots to ensure that the newcomer will win some and lose some.

There is also a problem, as you point out, of a player who has improved, but has a rating that hasn't caught up.  This causes a general deflation of ratings, which was much discussed in another thread.  But that applies more to the system as a whole than it does to individual players.  That is to say, it's a problem if the same strength of player that was rated 1800 a year ago is now only rated 1650, but if everyone's rating reflects this deflation, then it isn't a big deal because the relative ratings are still accurate.  I am generally much more worried about relative ratings being out of whack.  If Omar implements a new system which makes the relative ratings more accurate at the cost of general inflation or deflation to the system as a whole, I will still think it is an improvement.

Title: Re: Ratings distortion due to selection of opponen
Post by omar on Oct 14th, 2004, 5:40pm
Like I said, I'm not too worried about what rating value we use for the fictitious draws so long as that number does not keep changing. I like the fact that if we use 1500 instead of 0 it represents a better guess at the user's rating and reduces the amount of disruption to the rating system, thereby allowing us to get started and continue using this rating system even when we don't have fixed rated bots online. So I think this is a plus for using 1500 instead of zero.

I created a new system called p8 which is just p7 but with two draws against a 1500 player instead of one draw against a 0 rated player.

I compared the rating of a few players using p7 and p8; the difference for established players is only about 1 or 2 rating points. For example:

 gr bot_Arimaazilla | p7

gives a rating of 1481 and using p8 instead gives 1482. For 99of9 p7 gives 2120 and p8 gives 2118. Interesting that two draw games against a 1500 player drags down more than one game against a zero rated player. One draw game against 1500 would keep it the same at 2120. Anyways the amount is not too significant for established players. It has a much bigger impact when the number of games played is low.

You can download the new set of scripts from:

http://arimaa.com/arimaa/rating/testRatings.tgz




