Arimaa Forum » General Discussion (Moderator: supersamu)
Topic: New rating model with learning  (Read 2410 times)
Ryan_Cable (Arimaa player #951)
New rating model with learning
« on: Feb 20th, 2006, 12:07pm »

Currently, the rating system assumes that every player has some constant true rating R0 and that by playing games players' listed ratings should tend towards their true ratings.  The reality for human players is much different.  Since Arimaa is pretty much exclusively played on the server, people start out very weak and then learn about the game primarily through playing games.
 
I think a better rating model is R0+R1*ln(N+1), where R0 and R1 are constants and N is the number of games played.  Basically, every time a human doubles the number of games he has played, I expect his true rating to increase by a constant amount.  To implement this, every time a human plays a game, in addition to the normal point exchange for winning/losing the game, I would add R1/(N+1) to his rating (where N includes the game in question).  To make this work well, the ratings would have to be stored internally as floating-point numbers, but that really should be done under the current system anyway.
 
Obviously R0 and R1 are going to differ for each player, but we must make some reasonable guess at the average values.  Selecting R0 is ultimately arbitrary, but I think 1000 is a reasonably good estimate of the true rating of a noob playing his first game.  R1 is not arbitrary, it must align with the constant in the ELO formula (400).  Mostly it is a question of how many classes there are between a fresh noob and the True Champion.  My best guess at R1 is 200.  This would produce the following true ratings for the model player:

Games Rating   Next learning bonus
0     1000.0   100.0
1     1100.0    66.66
2     1166.7    50.00
5     1290.0    28.57
10    1404.0    16.67
20    1529.1     9.091
50    1703.8     3.846
100   1839.5     1.961
200   1976.6     0.9901
500   2159.0     0.3984
1000  2297.3     0.1996
2000  2435.7     0.0999
5000  2618.9     0.0399
10000 2757.5     0.0199
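With those constants, the table can be reproduced by accumulating the per-game bonus directly. A minimal Python sketch (the function names are mine; R0 = 1000 and R1 = 200 as proposed above):

```python
R0, R1 = 1000.0, 200.0

def true_rating(n_games):
    # Rating after n_games: game k adds a learning bonus of R1/(k+1),
    # which accumulates to roughly R0 + R1*ln(N+1).
    return R0 + sum(R1 / (k + 1) for k in range(1, n_games + 1))

def next_bonus(n_games):
    # Learning bonus awarded for the next game played.
    return R1 / (n_games + 2)

for n in (0, 1, 2, 5, 10, 100, 1000):
    print(n, round(true_rating(n), 1), round(next_bonus(n), 4))
```

The printed rows agree with the table above up to rounding.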

These numbers might look slightly high for a median player, but I think our ratings are currently too compressed.  I think implementing this new scheme would help spread the ratings out somewhat.  If this R1 did turn out to be poorly chosen, it could be adjusted later without disrupting the rating system significantly.
 
Also, I think this scheme would be very helpful psychologically as almost all new players would see their rating rise initially, rather than suffering a drop and then having to fight back up the scale.  I think this might reduce the number of people who get discouraged and leave.  Also, it should be easier to get people to play games if they know they are going to get a few points just for playing.
 
I don’t think this system will eliminate the inflation/deflation effects of people leaving the pool or of global learning about Arimaa.  However, so long as the logarithmic model of learning is not greatly incorrect and R1 is not very poorly chosen, I don’t think the system will exacerbate these problems.
 
I think the transition from our current system to this new one would necessarily be somewhat messy.  A player who joined at 1500 and played few games before the transition could quickly become very overrated from the learning bonuses.  However, I think the disruption would be short lived since the learning bonuses quickly become smaller than the RU (rating uncertainty).
Fritzlein (Arimaa player #706)
« Reply #1 on: Feb 21st, 2006, 4:34pm »

That's a very interesting idea, Ryan.  I read somewhere that increase in chess rating is dramatically correlated with the log of the number of hours spent studying (not playing) chess.  Of course, since there isn't much Arimaa material out there that one can study, most of the learning for Arimaa probably does come from playing.  (I assume that your bonuses would only apply to humans, and not to bots, although you don't mention this.)
 
Not only is the model somewhat reasonable, I agree about the psychological benefit of a rating that is likely to rise from the very beginning.  Also there is a social benefit in rewarding active play.
 
That said, there is also something very unappealing about the possibility of points being indefinitely injected into a closed subset of the system.  It is bothersome that someone could reach a rating of 2300 just by playing ShallowBlue 1000 times.  (Actually, that only counts bonuses, not the regular points for winning, so in fact folks might get up to 2500 or so with 1000 straight wins over ShallowBlue.)  Or, leaving the bots out of it, suppose two buddies join and play nobody but each other: their ratings would inevitably rise in tandem without them ever needing to encounter anyone else's ideas.
 
If there is going to be a formulaic rating reward for activity, I would much prefer it to be based on the breadth of opposition rather than the number of games played.  Given that the rating system, whatever it is, will influence behavior, we should at least consider promoting the most desirable behavior.
 
Ryan_Cable (Arimaa player #951)
« Reply #2 on: Feb 21st, 2006, 11:02pm »

on Feb 21st, 2006, 4:34pm, Fritzlein wrote:
I assume that your bonuses would only apply to humans, and not to bots, although you don't mention this.

Yes, none of the bots would get the bonuses.  It would probably make the most sense to have them continue to enter at 1500 as well.  I can't think of a good model for a bot under development, but they are so rare that I don't think it is important.
 
on Feb 21st, 2006, 4:34pm, Fritzlein wrote:
That said, there is also something very unappealing about the possibility of points being indefinitely injected into a closed subset of the system.  It is bothersome that someone could reach a rating of 2300 just by playing ShallowBlue 1000 times.  (Actually, that only counts bonuses, not the regular points for winning, so in fact folks might get up to 2500 or so with 1000 straight wins over ShallowBlue.)  Or, leaving the bots out of it, suppose two buddies join and play nobody but each other: their ratings would inevitably rise in tandem without them ever needing to encounter anyone else's ideas.

I thought about this.  However, due to the nature of the logarithmic learning model, the ability to inflate one's rating is essentially one-off and therefore extremely fragile.  If one loses any of those points, it will be very hard to get them back:

Games Cumulative Learning Points
17     499.0
225   1000.0
2758  1500.0
33615 2000.0
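These cumulative figures can be checked with a quick loop. A sketch (with R1 = 200 and the same per-game bonus R1/(k+1) as in the proposal; the function name is mine):

```python
R1 = 200.0

def cumulative_bonus(n_games):
    # Total learning points injected over n_games, at R1/(k+1) for game k.
    # Grows like R1*ln(N+1), so each extra 500 points costs ~12x more games.
    return sum(R1 / (k + 1) for k in range(1, n_games + 1))

for n in (17, 225, 2758, 33615):
    print(n, round(cumulative_bonus(n), 1))
```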

Also, if someone beats an opponent 10^3 times in a row, they should have a 1200+ rating advantage over that player based just on the ELO formula anyway.  This is one reason why I think there should be at least a 1200 point rating spread between ShallowBlue and Bomb, with the top human 600+ above Bomb.  Altogether, I don't think my system would add significantly to the many opportunities for individual rating inflation that already exist.
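The 1200-point figure follows from inverting the standard ELO expectation formula; a quick sketch (the function name is mine):

```python
import math

def elo_gap(expected_score):
    # Rating gap d implied by an expected score p, from p = 1/(1 + 10^(-d/400)).
    return 400 * math.log10(expected_score / (1 - expected_score))

# Winning 1000 games out of 1001 implies a gap of 400*log10(1000) = 1200 points.
print(elo_gap(1000 / 1001))
```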
 
on Feb 21st, 2006, 4:34pm, Fritzlein wrote:
If there is going to be a formulaic rating reward for activity, I would much prefer it to be based on the breadth of opposition rather than the number of games played.  Given that the rating system, whatever it is, will influence behavior, we should at least consider promoting the most desirable behavior.

I think the learning bonuses should be set up to correlate as much as possible with the actual learning that is happening to the players' true ratings.  Given that playing humans is usually more instructive than playing bots, it might make sense to give a larger learning bonus for games against humans.
 
I think the best way to do this would be to have games against humans count as 3 bot games.  For example, if a person had played 20 games against bots and then played a game against a human, he would get R1*(1/22+1/23+1/24) and would be considered to have played 23 effective games.  If he then played his next game against a bot, he would get R1/25.  It does make the system a bit more confusing, though.
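This weighting scheme can be sketched as follows (a hypothetical helper, not part of any actual server code; the 3:1 human-to-bot ratio is Ryan's suggested value):

```python
R1 = 200.0
HUMAN_WEIGHT = 3  # one human game counts as three bot games

def play_game(effective_games, vs_human):
    # Returns (learning bonus for this game, updated effective game count).
    # A human game advances the effective count by 3 and collects all three bonuses.
    weight = HUMAN_WEIGHT if vs_human else 1
    bonus = sum(R1 / (effective_games + 1 + i) for i in range(1, weight + 1))
    return bonus, effective_games + weight

# 20 bot games, then one human game: bonus = R1*(1/22 + 1/23 + 1/24), 23 effective games.
bonus, n = play_game(20, vs_human=True)
# A following bot game then earns R1/25.
bonus2, n2 = play_game(n, vs_human=False)
```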
99of9 (Gnobby's creator, player #314)
« Reply #3 on: Feb 22nd, 2006, 1:09am »

on Feb 20th, 2006, 12:07pm, Ryan_Cable wrote:
Currently, the rating system assumes that every player has some constant true rating R0 and that by playing games players' listed ratings should tend towards their true ratings.

Err, that's not how I understand it.  If that were the assumption, you could come up with a much better system than ELO.  To get my current rating, the current system weights my most recent game against a particular player much more heavily than my first game ever against them.  Therefore it is set up somewhat correctly to deal with the fact that my true arimaa strength may be changing.
 
I think a more accurate expression of your statement is this:
The current system assumes that IF every player has some constant true rating R0 then by playing games players' listed ratings should tend towards their true ratings.
 
So, when true strengths are NOT constant, the system may or may not be able to track that strength as rapidly and stably as we would like.
 
I personally feel it does a fairly good job of both of these.  [here I am neglecting non-transitivity, time control disparity, etc. which are not part of your argument].  For example I feel that the system has adequately reflected clauchau's fall from optimal form.  I also feel that ratings fluctuations are reasonably tame, which renders the ratings worth looking at.
 
Do they give an adequate representation of strength before strength has actually been properly sampled?  No, but I'm fine with that.  And actually, they're not so bad, because of the fancy rating_uncertainty feature.  This ensures that rapid adjustments are made near the start of every player's career, and these adjustments diminish as the player plays more.  (Very similar to what you are proposing.)
 
Quote:
The reality for human players is much different.  Since Arimaa is pretty much exclusively played on the server, people start out very weak and then learn about the game primarily through playing games.

True, but often humans start getting worse after a while too (when they get bored).  
 
I don't have time to reply to everything else now, but in summary:
 
You are trying to achieve three things, at the expense of making the ratings system more complex (by the addition of one extra constant).  These things are:
1) Ratings would be "better".  (as in more predictive of game results?)
 
2) New players encouraged by seeing initial rating go up, rather than down (EVEN IF THEY LOSE!!).
 
3) Ratings "more spread out".
 
Regarding number 1.  Maybe for newbies, but the effect would be insignificant for experienced players.
 
Regarding number 2, I agree you will achieve this.  But I'm not sure it's necessarily that useful.  It seems a bit like the pollyanna marking used to continually inflate school-grades these days.  Surely any player is happy for their rating to go down when they lose and up when they win?  I think discouragement is more likely if, after playing plenty, they find they are unable to master a particular bot which becomes an obstacle.  We can address this in other ways (more bots, more mentoring, more human games).
 
Regarding number 3.  I think intransitivity is the thing that causes the ratings of experienced players to be compressed more than they should be.  Humans are more error prone than the model assumes.  Bots are less variable than the model assumes (but this is changing with more randomized openings).  I am uncertain that your system will help with "compression".  Anyway, I don't consider it to be much of a problem.
 
I, like Fritz, think you will introduce an inflationary side effect.  Not the individual inflation that you talked about, but a global inflation, where the mean rating continually increases.  You are continually adding points to the pool that never get fully removed, because people usually retire well past their rating peak (having given their accumulated learning points to someone else).  That will make comparing our Bobby Fischer to our Garry Kasparov even more difficult than it would be under ELO.
omar (Arimaa player #2)
« Reply #4 on: Feb 22nd, 2006, 1:38am »

Funny you should bring this up, Ryan. I was also thinking it might be good to start new players at a rating of 1000 instead of 1500. Since we introduced the bot ladder it seems like new players are playing more games, but some of the first bots they have to play against are rated around 1000. If the new player wins, they only get a few points for it, but if they lose, their rating takes a big hit. There is definitely a negative psychological effect on new players due to us overrating them. So the easiest thing we can do to make the rating system a little better is to start new players at 1000 points. This was actually proposed by David Fotland a long time ago and we had considered it, but didn't really think it would help much at that time. It is starting to look like a good idea now, especially with many low rated bots available to play.
 
With regard to injecting learning bonus points: I would not consider changes to the current rating formula without first doing some simulations to see if it would actually make it better than the current system. One thing I've learned from experience is not to always trust my mental simulations, so I always prefer to try actual simulations, like we did with the tournament format. We have tons of cheap processing power nowadays, so we should use it as much as possible to help us make such decisions. Actually, I did run a lot of simulations before selecting the current rating system. If you are interested I can send the programs to you to try out. One thing I learned from those simulations was that ad hoc rules which tried to make the rating system better didn't always hold up. So it is critical to test things in a simulation before putting them into production.
 
Ryan, perhaps you may not be aware of the previous discussions in this forum on the topic of improving the rating system. It is still a work in progress. Some of the problems that we hope to fix with the new rating system are:
 
* inflation, deflation and drift of the rating scale due to the rating scale not being anchored; see "Arimaa rating deflation" page 1:
http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1065901453;start=0
 
* inaccurate ratings due to players being able to select their opponents; see "Arimaa rating deflation" page 6 posting of Sep 21, 2004; also see "Ratings distortion due to selection of opponents":
http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1096120807;start=
 
* modeling human players' performance at different game speeds; see "Omar = OmarFast   ,   bot_bomb = bot_spe"

* modeling human players' loss of performance due to inactivity; see "Arimaa rating deflation" page 7

I think what you are suggesting could be referred to as:

* modeling human players' rapid increase in performance while learning the game
 
I want to eventually shift to using a rating system that is anchored by a random bot whose rating is defined to be zero. I also want to eventually shift to using a performance based rating system. A performance based rating system does not keep a running rating that is updated by looking at just the result of the most recent game, but instead computes a new rating by looking at all of the player's games up to the most recent game. This type of rating system allows a new player's rating to change very rapidly when the player is new and has not yet played many games. It can also address the issues of modeling loss of ability due to inactivity, as well as a player playing within a small group of opponents.
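For flavor, one common "performance rating" formula, the classic linear approximation, recomputes a rating from a whole block of games at once. This is only an illustration of the general idea, not necessarily the formula omar has in mind:

```python
def performance_rating(results):
    # results: list of (opponent_rating, score) with score 1.0 = win, 0.5 = draw, 0.0 = loss.
    # Linear approximation: average opponent rating + 400*(wins - losses)/games.
    n = len(results)
    avg_opp = sum(r for r, _ in results) / n
    net_wins = sum(2 * s - 1 for _, s in results)  # +1 per win, -1 per loss
    return avg_opp + 400 * net_wins / n

print(performance_rating([(1500, 1.0), (1600, 0.0), (1550, 1.0)]))
```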
Ryan_Cable (Arimaa player #951)
« Reply #5 on: Feb 22nd, 2006, 8:23am »

on Feb 22nd, 2006, 1:38am, omar wrote:
Funny you should bring this up, Ryan. I was also thinking it might be good to start new players at a rating of 1000 instead of 1500. Since we introduced the bot ladder it seems like new players are playing more games, but some of the first bots they have to play against are rated around 1000. If the new player wins, they only get a few points for it, but if they lose, their rating takes a big hit. There is definitely a negative psychological effect on new players due to us overrating them. So the easiest thing we can do to make the rating system a little better is to start new players at 1000 points.

In the long run, R0 (the rating at which new players enter) is totally arbitrary.  ShallowBlue/GnoBot2005P1/Arimaalon are at ~1200 because R0-300 is roughly their equilibrium (with R1=0, i.e. under the current system).  If you started introducing new players at 1000, the bottom of the bot ladder would drop down to ~700 within a couple of weeks, if not faster.  For instance, in 74 games, I transferred ~400 points from GnoBot2005P1 to Loc2005P1 over 3 days.  Two days later, GnoBot2005P1 is right back slightly above where it started.  In the long run, the entire population would deflate to ~500 points below where it would be without the change.
 
Most noobs, who don't do extensive research before playing their first game, are less likely to lose to the second weakest bot than to the weakest bot. Then they are even less likely to lose to the third weakest bot.  They are gaining experience faster than the ladder is giving them stronger bots.  However, since they start highly overrated, they often have a lower rating when they play the second bot and a still lower rating when they play the third bot.  This is the primary problem I see the learning bonuses solving.  I don't really think this can be fixed by just adjusting the composition of the ladder, because choosing when to move to the next bot is largely voluntary.  The only other solution I can think of is to artificially fix several of the weakest bots at roughly R0+100, R0+200, R0+300, R0+400, but this strikes me as a much less appealing solution.
 
on Feb 22nd, 2006, 1:38am, omar wrote:
I want to eventually shift to using a rating system that is anchored by a random bot whose rating is defined to be zero. I also want to eventually shift to using a performance based rating system. A performance based rating system does not keep a running rating that is updated by looking at just the result of the most recent game, but instead computes a new rating by looking at all of the player's games up to the most recent game. This type of rating system allows a new player's rating to change very rapidly when the player is new and has not yet played many games. It can also address the issues of modeling loss of ability due to inactivity, as well as a player playing within a small group of opponents.

I have read through all of the threads in the forum at one point or another.  I think it is certainly possible to create a better rating system than what we are currently using.  By far the greatest problem with our rating system is the fact that people can take great advantage of intransitivity to inflate their ratings by self-selecting opponents, and my proposed learning bonuses don't address this.
 
However, the whole scheme of anchoring to a random bot struck me as madness.  The amount of spread in ratings would be highly dependent on the composition of the bots used to link between the 0 rated bot and the regular population.  Also, there seemed to be some mistaken impression that if A beat B every time then A belonged 709 points above B.  However, this is an artifact of the (improper) storage of ratings as integers; A actually belongs infinitely many points above B (unless there existed some bot C that occasionally beat A and occasionally lost to B, in which case we come back to the intransitivity problem).
 
I think once the bot ladder equilibrates (which I expect to take ~6 months), it will make a reasonably effective floating scale.  There will probably still be significant dependence on the composition of the ladder, but if the spreading effect becomes too strong, we can always retire some of the older bots.  By the way, do you intend to create a full suite of 2006 bots to add to the ladder?
 
on Feb 22nd, 2006, 1:09am, 99of9 wrote:
Err, that's not how I understand it.  If that were the assumption, you could come up with a much better system than ELO.  To get my current rating, the current system weights my most recent game against a particular player much more heavily than my first game ever against them.  Therefore it is set up somewhat correctly to deal with the fact that my true arimaa strength may be changing.
 
I think a more accurate expression of your statement is this:
The current system assumes that IF every player has some constant true rating R0 then by playing games players' listed ratings should tend towards their true ratings.
 
So, when true strengths are NOT constant, the system may or may not be able to track that strength as rapidly and stably as we would like.

Good point; if the true rating were really assumed to be constant, then RU would be allowed to tend to 0 rather than 30.  Still, I think the current system is doing a poor job of tracking the ratings of noobs.  I think it is rather bad that a noob's rating falls precipitously while his true rating is increasing rapidly.  I think there are at least 4 classes of humans between 1300 and 1650, and determining skill from rating is very difficult in this range.  I think if there is a way to reduce these problems without overly disrupting the rest of the rating system, we should do so.
 
on Feb 22nd, 2006, 1:09am, 99of9 wrote:
I personally feel it does a fairly good job of both of these.  [here I am neglecting non-transitivity, time control disparity, etc. which are not part of your argument].  For example I feel that the system has adequately reflected clauchau's fall from optimal form.  I also feel that ratings fluctuations are reasonably tame, which renders the ratings worth looking at.

clauchau dropped from 2011 to 1709 between games 185 and 347; during this period he would have received 128.3 learning points.  Most of these points would have been lost to the players against whom he lost his real points.  However, since he clearly wasn't learning much, this probably would have contributed marginally to global inflation (though it is possible that there was someone who was learning much faster than the model who would offset this inflation).  Ultimately, I think the fact that learning noobs outnumber declining but moderately active players by ~50 to 1 makes this less important, especially since the learning bonuses are quite small for experienced players.
 
on Feb 22nd, 2006, 1:09am, 99of9 wrote:
1) Ratings would be "better".  (as in more predictive of game results?)
 
2) New players encouraged by seeing initial rating go up, rather than down (EVEN IF THEY LOSE!!).
 
3) Ratings "more spread out".
 
Regarding number 1.  Maybe for newbies, but the effect would be insignificant for experienced players.
 
Regarding number 2, I agree you will achieve this.  But I'm not sure it's necessarily that useful.  It seems a bit like the pollyanna marking used to continually inflate school-grades these days.  Surely any player is happy for their rating to go down when they lose and up when they win?  I think discouragement is more likely if, after playing plenty, they find they are unable to master a particular bot which becomes an obstacle.  We can address this in other ways (more bots, more mentoring, more human games).
 
Regarding number 3.  I think intransitivity is the thing that causes the ratings of experienced players to be compressed more than they should be.  Humans are more error prone than the model assumes.  Bots are less variable than the model assumes (but this is changing with more randomized openings).  I am uncertain that your system will help with "compression".  Anyway, I don't consider it to be much of a problem.

In my mind 1 and 3 are roughly the same; I think true ratings are much more spread out than listed ratings and that spreading listed ratings will make them more predictive.  There will probably be some other accuracy that comes from having a somewhat better ordering of noobs, but my guess is that this will be smaller.
 
As for 2, let me say don't underestimate the effect of nominal values.  You would be amazed how many Americans believe $3 is a record high for a gallon of gasoline.  However, beyond the first 4 games, it will be very rare for someone to net gain points when losing.  Also, I really do think getting the bottom ladder bots more spread out will give people a more accurate view of how much (or in some cases how little) they are progressing.  (After struggling against a bot rated 1500 and finally conquering it, it is not terribly encouraging to move to a bot rated 1550 and start struggling again.)
 
I think this system would significantly reduce compression at the bottom.  I think there is going to be significant spreading at the top due to the random BvB games even without any rating system changes.  I predict at least one of the following two things will happen within a year (probably within 4 months): a bot reaches 2000+, a human reaches 2400+.
 
on Feb 22nd, 2006, 1:09am, 99of9 wrote:
I, like Fritz, think you will introduce an inflationary side effect.  Not the individual inflation that you talked about, but a global inflation, where the mean rating continually increases.  You are continually adding points to the pool that never get fully removed, because people usually retire well past their rating peak (having given their accumulated learning points to someone else).  That will make comparing our Bobby Fischer to our Garry Kasparov even more difficult than it would be under ELO.

I think global inflation is not very important.  If we really are worried about it, we could estimate the rate and just subtract/add offsetting points to everyone’s scores.  I think cross-generational comparisons are ridiculous in a game that is making as much strategic progress as Arimaa.  Even in chess I don't think it is very reasonable.  If Kasparov had been born in 1888, he would have played objectively significantly worse than he actually did.  If Capablanca had been born in 1963, he probably would have played objectively better than he did.  In any case, trying to make these comparisons with just listed ratings is hopelessly naive.
 
Also, since the main current cause of inflation is noobs losing rating points and then leaving, I think it is quite possible that my system could produce a net reduction in global inflation.  This is something for which simulations would be useful.
Fritzlein (Arimaa player #706)
« Reply #6 on: Feb 22nd, 2006, 1:05pm »

The more I think about Ryan's idea, the more I like it.
 
In the old days before Internet game servers, the biggest rating problems arose from the handling of newcomers.  Now that we have self-selection of opponents and non-learning bot players, our biggest problem is rating intransitivity, but the rating of newcomers is still our second-biggest problem.
 
We all know that, on the average, newcomers are drastically overrated at 1500 for their first game.  However, as discussed elsewhere, making the single change of having newcomers enter at a rating of 1000 will have a simple deflationary effect.  Before long the whole scale would be dragged down by 500 points, and then newcomers would be underrated if they entered at 1000.
 
So we accept making a terrible initial estimate of a 1500 rating.  At least ninety percent of the time this estimate is too high, so newcomers donate points to the system.  However, since players typically improve over time, they become underrated and begin to steal points from the system.  Each person who stays contributes more to deflation than each person who leaves contributes to inflation, but there are vastly more early departures than late departures.  At the moment it appears this has a net inflationary effect (i.e. the average player rating rises over time).  Do we mind the inflation?  Maybe not, because we are simultaneously learning to play better.
 
But separate from the question of inflation/deflation is the question of accuracy.  To avoid deflation we give newcomers ratings that we are quite confident are inaccurate.  Why can't we have our cake and eat it too?  If we start newcomers at a 1000 rating, we won't have the inaccuracy, and if we give them 500 bonus points for playing their first couple dozen games, then we're essentially trying to keep the average rating at 1500, much as before.
 
Now, since it is true that players eventually do get worse over time too, from old age if nothing else, I guess I would impose a cutoff after some number of games.  We could set it up to inject 500 bonus points over the course of 50 games, or 1000 bonus points over the course of 500 games, or whatever seems most reasonable before turning off the bonus spigot, but I do think an eventual cutoff is appropriate.
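The cutoff described above could be implemented by truncating the per-game bonus once a lifetime cap is reached. A sketch, with the cap and schedule as illustrative values only (500 points total, at Ryan's R1/(N+2) schedule), not a settled proposal:

```python
R1 = 200.0
BONUS_CAP = 500.0  # total learning points a player may ever receive

def capped_bonus(n_games_so_far, received_so_far):
    # Bonus for the next game, truncated so lifetime bonuses never exceed the cap.
    # With R1 = 200, the cap of 500 is exhausted around game 17 (per the earlier table).
    raw = R1 / (n_games_so_far + 2)
    return min(raw, max(0.0, BONUS_CAP - received_so_far))
```

After the cap binds, the spigot stays off: every later call returns 0.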
 
The point is that we aren't just trying to make the system as a whole inflation-neutral, we're trying to make each individual player inflation-neutral.  If someone joins and plays four games before quitting, we want them to (on average) give other players approximately as many points as they take away.  We want the same to be true of someone who plays 100 games before leaving, or only one.  The problem of local inflation or deflation is, as Ryan has pointed out, actually quite intertwined with the problem of rating accuracy.
 
Ryan's proposal certainly isn't perfect, but I do think it is head and shoulders above the idea of anchoring the system with a ladder of bots, given that the intransitivity of bot ratings is a major contributor to our other, even larger problem.
 
P.S.  If we are going to test this with simulations, then we need to infuse the simulations with guesses as to how quickly people actually do change in true playing strength.  Obviously Ryan's proposal will perform poorly unless we assume that newcomers are in fact rapidly improving on average.  But since that is an assumption that is probably true...
Fritzlein (Arimaa player #706)
« Reply #7 on: Feb 22nd, 2006, 1:12pm »

on Feb 21st, 2006, 4:34pm, Fritzlein wrote:
There is also something very unappealing about the possibility of points being indefinitely injected into a closed subset of the system.  It is bothersome that someone could reach a rating of 2300 just by playing ShallowBlue 1000 times.  (Actually, that only counts bonuses, not the regular points for winning, so in fact folks might get up to 2500 or so with 1000 straight wins over ShallowBlue.)  Or, leaving the bots out of it, suppose two buddies join and play nobody but each other: their ratings would inevitably rise in tandem without them ever needing to encounter anyone else's ideas.

Upon further reflection, this is more of a transitivity issue than an inflation issue.  Even so, I would prefer if there were some way to inject bonus points based on a combination of number of games played and number of different opponents played.
omar
Arimaa player #2
Re: New rating model with learning
« Reply #8 on: Feb 22nd, 2006, 7:07pm »


I would prefer not to work on trying to patch up isolated problems with our current rating system. The current system is not all that bad. Karl at one time was looking into finding the problems with it, and it turned out to be difficult even to identify the problems with the rating system as exhibited by actual data from the games database. See:
http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1117068449;start=
 
I am also not in favor of introducing new rules into a system without thoroughly testing them. I currently would not be able to test out new ideas, but I invite others to test them through simulations and post the results.
Ryan_Cable
Arimaa player #951
Re: New rating model with learning
« Reply #9 on: Feb 23rd, 2006, 4:27am »


on Feb 22nd, 2006, 7:07pm, omar wrote:
I would prefer not to work on trying to patch up isolated problems with our current rating system. The current system is not all that bad. Karl at one time was looking into finding the problems with it, and it turned out to be difficult even to identify the problems with the rating system as exhibited by actual data from the games database.

Well, Fritzlein specifically had to cut out the noobs from the analysis, because their ratings were too inaccurate:
 
on May 25th, 2005, 7:47pm, Fritzlein wrote:
However, just as I was gearing up to estimate in terms of plus-or-minus rating points just how awful the problem of inaccurate ratings appears to be from these statistics, an obvious explanation occurred to me: It is all the fault of newcomers.  Almost everyone who ever entered the system played their first rated game against Arimaazilla or, more recently, Arimaalon.  The ratings inaccurately have these beginning players as favorites.  For those who persist in playing, there is often a reverse trend: after losing enough points to get below the weak bots, the new players are supposedly underdogs, but they quickly learn how to beat the bots, and again the underdogs do better than expected.

My Access kung fu is very weak, but I think the following analysis would be interesting for someone to do.  Take all of the first games played by humans and calculate a performance rating as if they were all played by a single person.  Also, calculate the average listed rating of those humans (should be exactly 1500).  Then take all of the second games played by humans and calculate a performance rating as if they were all played by a single person.  Also calculate the average listed rating of those humans (should be ~1500 given that most people played their first game unrated).  Continue repeating these calculations until the twentieth game or more if practical.  Then make us a table/graph showing these results.
 
I expect that listed ratings will follow a J curve with game number, but that performance rating will be close to monotonically increasing.  There might be some distortion since weak/unsuccessful players are probably more likely to leave, but I think the basic point about learning will come shining through.  Also, it might help us make a better estimate of R1.
 
on Feb 22nd, 2006, 1:05pm, Fritzlein wrote:
Now, since it is true that players eventually do get worse over time too, from old age if nothing else, I guess I would impose a cutoff after some number of games.  We could set it up to inject 500 bonus points over the course of 50 games, or 1000 bonus points over the course of 500 games, or whatever seems most reasonable before turning off the bonus spigot, but I do think an eventual cutoff is appropriate.

Well, I think that even among the top humans significant learning is going on.  Also, I think global learning is likely to dominate age/disinterest-related declines for the foreseeable future.  Fritzlein has played 748 games; in his next game he would receive 0.2667 points.  Over the next 500 games, he would receive 102.2 points.  For the 500 games after that, he would receive 67.3 points.  I think these are perfectly reasonable numbers.  Even if Fritzlein doesn't do the expected learning, I think there will be plenty of others around who are learning who will take those points off his hands.  If, however, the learning bonuses do turn out to be injecting too many points, I would greatly prefer to reduce R1 than to add an arbitrary cap.
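These figures are easy to sanity-check. A quick sketch using R1 = 200 and the R1/(N+1) bonus rule from the opening post:

```python
R1 = 200  # learning constant proposed in the opening post

def bonus(n):
    """Bonus added after playing career game number n."""
    return R1 / (n + 1)

def total_bonus(start, count):
    """Total bonus over career games start+1 .. start+count."""
    return sum(bonus(n) for n in range(start + 1, start + count + 1))

print(round(bonus(749), 4))              # -> 0.2667 (game 749, after 748 played)
print(round(total_bonus(748, 500), 1))   # -> 102.2  (the next 500 games)
print(round(total_bonus(1248, 500), 1))  # -> 67.3   (the 500 games after that)
```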
 
on Feb 22nd, 2006, 1:05pm, Fritzlein wrote:
The point is that we aren't just trying to make the system as a whole inflation-neutral, we're trying to make each individual player inflation-neutral.  If someone joins and plays four games before quitting, we want them to (on average) give other players approximately as many points as they take away.  We want the same to be true of someone who plays 100 games before leaving, or only one.  The problem of local inflation or deflation is, as Ryan has pointed out, actually quite interwined with the problem of rating accuracy.

For each player to be inflation neutral, I think we need listed ratings to be an unbiased estimator of true ratings at every level of experience (people are as likely to be overrated as underrated no matter how many games they have played).
 
on Feb 22nd, 2006, 1:05pm, Fritzlein wrote:
P.S.  If we are going to test this with simulations, then we need to infuse the simulations with guesses as to how quickly people actually do change in true playing strength.  Obviously Ryan's proposal will perform poorly unless we assume that newcomers are in fact rapidly improving on average.  But since that is an assumption that is probably true...

Right, in the simulations, I would want human i to have true rating R0i+R1i*ln(N+1)+Sum[uin,{n,1,N}], where R0i and R1i are constants drawn from some given distributions, and uin is random noise in the learning drawn after each game.  However, having my simulated humans follow the same learning model as my rating system makes the whole thing Daisyworld-style question-begging.  I'm not sure what the solution to this is.
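As a sketch of what one simulated human's true-rating trajectory would look like under that model (the parameter values and the Gaussian noise are illustrative guesses, not anything agreed on):

```python
import math
import random

def true_rating_path(games, r0=1000.0, r1=200.0, noise_sd=10.0, seed=1):
    """True rating after each of `games` games under the model
    R0 + R1*ln(N+1) + Sum[u_n], with u_n i.i.d. mean-0 noise.
    In a full simulation, r0 and r1 would themselves be drawn
    from population distributions for each simulated human."""
    rng = random.Random(seed)
    drift = 0.0
    path = []
    for n in range(1, games + 1):
        drift += rng.gauss(0.0, noise_sd)  # cumulative random walk term
        path.append(r0 + r1 * math.log(n + 1) + drift)
    return path

path = true_rating_path(100)  # one simulated 100-game career
```

With noise_sd set to 0 this reduces to the pure logarithmic learning curve; the noise term is what lets the simulation also model players whose strength wanders or declines.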
 
on Feb 22nd, 2006, 7:07pm, omar wrote:
I am also not in favor of introducing new rules into a system without thoroughly testing them. I currently would not be able to test out new ideas, but I invite others to test them through simulations and post the results.

I posted this topic primarily because I had an interesting idea that I thought would provoke discussion and hopefully revive the general discussion of improvements to the rating system that seemed to have died out.  I think many of the problems with the ratings are slowly being worked out by the random BvB games.  My intuition is that my system would be somewhat better than our current one, with nearly all of the improvements coming from the bottom end of the rating scale.  Still, I fully agree that making changes to the rating system, especially complicating changes, should be done with great caution and evidentiary support.
« Last Edit: Feb 23rd, 2006, 8:07pm by Ryan_Cable »

clauchau
bot Quantum Leapfrog's father
Re: New rating model with learning
« Reply #10 on: Feb 23rd, 2006, 2:41pm »


Like 99of9, I see no such assumption as a limit R0 in the current formula. I don't even see what the new formula is supposed to be.
 
About improving through games played (and watched, Fritzlein is right), I see no easy correlation as far as I am concerned. E.g. I suddenly happen to be able to play at 30 or 45 seconds per move. It came all of a sudden, after a few years. And my level clearly has as much to do with my experience and intelligence (and the lack of them) as with my fear, enthusiasm, serenity, improvising spirit, surroundings, etc., which sometimes fluctuate very slowly over the year.
 
I share Ryan_Cable's feeling that there is something funny with beginners' ratings, but I rather see it as a misrepresentation. How about improving the way lists of rated players are shown and adding a dimension for RU values? The player names and ratings could be shown with different depths, colors, sizes or horizontal shifts according to RU?
 
Color would be the easiest way: the closer RU is to 30, the closer to the foreground color. A high RU would make the player name close to the background color and such players would want to get more visible and play more. Anybody who hasn't played yet is invisible.
« Last Edit: Feb 23rd, 2006, 2:44pm by clauchau »

PMertens
Arimaa player #692
Re: New rating model with learning
« Reply #11 on: Feb 23rd, 2006, 3:12pm »


I suddenly forgot why the RU has to be so high for a noob.
If it started like everybody else's, then their rating would not fluctuate so strangely in the beginning ... which could be closer to reality ...
 
And/Or let them start with a rating = lowest rated bot (instead of a fixed number)
 
but then I am too lazy to write simulations ...
Ryan_Cable
Arimaa player #951
Re: New rating model with learning
« Reply #12 on: Feb 23rd, 2006, 10:23pm »


on Feb 23rd, 2006, 3:12pm, PMertens wrote:
And/Or let them start with a rating = lowest rated bot (instead of a fixed number)

This would result in massive, disruptive global deflation.  There would be huge overspreading, and Arimanator would soon be the top-rated human due to his not playing.  As in real life, I think deflation would be much worse than a similar amount of inflation.
 
I think a similar effect can be had with fewer problems by taking several of the weakest bots and anchoring each of their ratings such that the median noob enters roughly correctly rated relative to these bots.  As I mentioned above, I think this is a rather unappealing hack, but it probably would make the noobs' ratings better behaved.
 
on Feb 23rd, 2006, 3:12pm, PMertens wrote:
I suddenly forgot why the RU has to be so high for a noob.
If it starts like with everybody else then their rating will not fluctuate so strangely in the beginning .... which could be closer to reality ...

Basically, RU is proportional to the noise in the ratings and inversely proportional to the rise time (how quickly a large rating error gets corrected).  Since we expect that rating error will be largest for noobs, we give them a large RU and then reduce it as they gain experience, to cut the noise after their rating has hopefully converged to be much closer to their true rating.  I think this idea is basically sound, but since noobs often have their true rating and their listed rating moving in opposite directions, it might actually be making things worse than just giving everyone RU=30.
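In Elo-style terms, RU plays the role of the gain (the K-factor) in that feedback loop. A generic sketch, not the server's actual formula, with made-up example numbers:

```python
def elo_update(rating, opp_rating, score, ru):
    """One generic Elo-style update with RU as the gain: a larger RU
    corrects big rating errors faster, but also makes the listed
    rating noisier once it has converged."""
    expected = 1 / (1 + 10 ** ((opp_rating - rating) / 400))
    return rating + ru * (score - expected)

# a 1500-rated noob beats a 1600-rated bot: high (noob) RU vs. lower RU
print(round(elo_update(1500, 1600, 1, 60), 1))  # -> 1538.4
print(round(elo_update(1500, 1600, 1, 30), 1))  # -> 1519.2
```

The same single game moves the high-RU player twice as far, which is exactly the noise/rise-time trade-off described above.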
 
http://en.wikipedia.org/wiki/PID_controller
 
I think of the rating system as a sort of feedback system that tries to track true ratings with listed ratings.  However, because true ratings are not directly observable, we have to estimate the error in listed rating through the proxy of game results.  This makes it much more complicated to do the sort of noise/oscillation damping that can be done in a mechanical system; though Fritzlein did suggest that adjusting ratings based on the result of several games would be less volatile:
 
http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1117068449;start=15
 
on Feb 23rd, 2006, 2:41pm, clauchau wrote:
As 99of9, I see no such assumption as a limit R0 in the current formula. I even don't see what the new formula is supposed to be.

I was mistaken to claim that the current system assumes that humans have a constant true rating.  After further consideration, I think the system assumes (or at least is optimized for) players whose true ratings are random-walking (R0i+Sum[uin,{n,1,N}], where uin is a mean-0 random change in the true rating of human i after game n).  However, the vast majority of humans have true ratings that increase as they play games.  My guess is that this increase is roughly logarithmic in the number of games played (R0i+R1i*ln(N+1)+Sum[uin,{n,1,N}]).
 
Consider the case Fritzlein suggested of 2 friends who play 1000 games against each other and no one else.  Under the current system they will always have an average rating of 1500.  Under my system they will start with an average rating of 1000 and end with an average of 2297.3.  We can debate the magnitude and distribution of the learning, but clearly it is a lot bigger than 0.
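The 2297.3 figure can be checked directly, assuming each friend receives the R1/(N+1) bonus for each of his 1000 games while the regular point exchanges between the two cancel out on average:

```python
R0, R1 = 1000, 200  # constants from the opening post

def average_rating_after(games):
    """Average rating of the two friends: it starts at R0, and only
    the learning bonuses move the average, since every regular point
    one friend wins in a game, the other friend loses."""
    return R0 + sum(R1 / (n + 1) for n in range(1, games + 1))

print(round(average_rating_after(1000), 1))  # -> 2297.3
```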