Arimaa Forum « Whole History Ratings »


   Author  Topic: Whole History Ratings  (Read 67406 times)
woh
Forum Guru
*****



Arimaa player #2128

   


Gender: male
Posts: 254
Re: Whole History Ratings
« Reply #30 on: Mar 21st, 2009, 2:04am »

on Mar 12th, 2009, 8:30am, Fritzlein wrote:
Also I don't see many players with thin records floating way above the median.  (Although I want to look at how Rabbit got to #22, since I don't recall hearing of him before.)

 
on Mar 13th, 2009, 9:47am, Tuks wrote:
you might want to revise it though, no matter how much convincing you do, you will not convince me "Rabbit" who happened to win 3 out of 3 human games has any chance against any of the top players in a non-postal match

 
I am not trying to convince anyone here, but purely on Rabbit's HvH results one might arguably think that he deserves to be in 22nd place. One of his wins was against arimaa_master, who is ranked 15th. Their ratings now give Rabbit a 41% chance of a win over arimaa_master, which might be considered not overrated since he has proven he can do it.
 
on Mar 13th, 2009, 11:10am, mistre wrote:
Perhaps there should be a minimum number of games needed to be ranked - which would fix the "Rabbit" situation.

 
There is no need to exclude players. The WHR can address the “Rabbit” issue itself. Changing the number of games won/lost against a fictitious player makes it easier or harder for players to move away from the average rating (the average rating being the rating of the fictitious player). This has the most impact on the ratings of players with few games played. With a prior of 2 games won/lost, Rabbit's rating drops to 1499.7, while the rating of arimaa_master sees only a minor change to 1714.5. With a prior of 3 games won/lost, their ratings become 1423.8 and 1699.1. Because of the bigger impact on players with fewer games, Rabbit's rank drops from 22nd to 35th and then to 42nd, while arimaa_master stays at 15th. The full results are available for 2 wins/losses and 3 wins/losses.
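
To make the mechanism concrete, here is a minimal Python sketch of how virtual games against a fictitious player pull a rating toward the average. It is not the WHR tool itself: it assumes an Elo-style logistic curve on a 400-point scale, holds opponent ratings fixed, ignores rating changes over time, and uses a hypothetical anchor of 1500. The names `expected_score` and `rating_with_prior` are illustrative only.

```python
def expected_score(rating, opponent):
    """Elo-style win probability of `rating` against `opponent` (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def rating_with_prior(games, prior_games=1.0, anchor=1500.0):
    """Rating that best explains `games` plus the virtual prior games.

    games: list of (opponent_rating, score), score 1 for a win and 0 for a loss.
    prior_games: virtual wins AND virtual losses against the anchor; must be > 0
                 and may be fractional.
    Simplification: opponent ratings are treated as known and fixed.
    """
    # Weighted game list: (opponent rating, score, weight).
    weighted = [(opp, score, 1.0) for opp, score in games]
    weighted.append((anchor, 1.0, prior_games))  # virtual wins
    weighted.append((anchor, 0.0, prior_games))  # virtual losses

    def surplus(rating):
        # Actual score minus expected score; zero at the best-fit rating.
        return sum(w * (s - expected_score(rating, opp)) for opp, s, w in weighted)

    # Bisection works because the surplus decreases as the rating increases.
    lo, hi = anchor - 4000.0, anchor + 4000.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if surplus(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With three wins over a 1500-rated opponent, `rating_with_prior([(1500.0, 1)] * 3, prior_games=1)` comes out near 1741, and raising `prior_games` to 2 lowers it to about 1659. Real WHR ratings will differ because opponents' ratings move as well.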
 
I have done some more tests to see how fast the rating of a new player moves up when he wins a number of games against an average player, for different priors. Since it is just a number in the equations, the number of games won/lost doesn’t have to be an integer. First I plotted how far a new player gets against the number of games in the prior, for different numbers of games won.
 
If the number of games in the prior is zero, the rating would become infinite. So with values less than 1 the rating increases quickly. With values greater than 1 a new player moves less far away from the average rating.
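
Under a simplified model, the shape of these curves has a closed form. The sketch below assumes an Elo-style logistic curve on a 400-point scale and an opponent whose rating stays fixed at the average; neither assumption is confirmed for the actual WHR tool. With `wins` real wins and a prior of `prior_games` wins and losses, the best-fit win probability is (wins + prior) / (wins + 2 × prior), which gives the rating gap computed here. The function name `rating_gain` is hypothetical.

```python
import math


def rating_gain(wins, prior_games):
    """Points above the average rating after `wins` straight wins over an average player.

    Assumes the opponent's rating is fixed and the prior is `prior_games`
    virtual wins plus `prior_games` virtual losses against that same player.
    """
    if prior_games <= 0:
        # No virtual losses: an unbeaten record is explained best by unbounded strength.
        return math.inf
    return 400.0 * math.log10((wins + prior_games) / prior_games)
```

For five straight wins this gives about 311 points with a prior of 1 and about 218 points with a prior of 2, and the gain grows without bound as the prior approaches zero.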
 
In a second graph I plotted how far a new player gets against the number of games won, for different numbers of games in the prior.

Here I also added the results for the gameroom ratings. It turns out that the results for the gameroom ratings almost match those of the WHR with a prior of 2 games won/lost. I would suggest using WHR ratings with a prior of 2 games won/lost against a fictitious player. Any thoughts on this?
 

Hannoskaj
Forum Guru
*****



Arimaa player #3794

   


Gender: male
Posts: 75
Re: Whole History Ratings
« Reply #31 on: Mar 21st, 2009, 3:08am »

When reading the long (and interesting) thread on rating inflation/deflation that Fritzlein had pointed me to in another discussion, I was planning to suggest reading Rémi's article, but I see you already have this kind of wholesome reading!
 
About the choice of the prior, two looks like a good idea from the graphs you have posted, woh. I just wonder what the behaviour would be if you plotted 2-prior and GR against number of victories + 1 loss, number of victories + 2 losses, etc. Maybe even show the 2D graph, if you can draw it.
Hannoskaj
Forum Guru
*****



Arimaa player #3794

   


Gender: male
Posts: 75
Re: Whole History Ratings
« Reply #32 on: Mar 21st, 2009, 3:11am »

Oh, by the way, I do not think there's a need for the prior to give an integer number of victories and defeats; but it's true the benefits we could get (slightly better fitting what we deem should be the behaviour) are probably not worth making things strange.
woh
Forum Guru
*****



Arimaa player #2128

   


Gender: male
Posts: 254
Re: Whole History Ratings
« Reply #33 on: Mar 21st, 2009, 4:14am »

on Mar 21st, 2009, 3:11am, Hannoskaj wrote:
Oh, by the way, I do not think there's a need for the prior to give an integer number of victories and defeats; but it's true the benefits we could get (slightly better fitting what we deem should be the behaviour) are probably not worth making things strange.

 
We could use a non-integer number of games like 1.5 if the general consensus is that a new player moves away from the average rating too fast with 1 and too slowly with 2.

In fact the number of games won in the prior doesn't have to be the same as the number of games lost. If they differ, a new player would not start with an initial rating (that is, before he has played a single game) equal to the average rating. This could prove to be an interesting idea. If on average a new player has a 1 in 4 chance of winning a game against an average player, then maybe we should just use a prior of 1 game won and 3 games lost.

I think that the total number of games in the prior dictates how fast a new player moves away from his initial rating, and the distribution of wins and losses determines the initial rating of a new player in relation to the average rating. But I would need to do tests to be sure of that. I wish I could spend all my time on this :)
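
A rough check of this conjecture is possible under a simplified model. The sketch assumes an Elo-style logistic curve on a 400-point scale and that every real game is played against an opponent rated exactly at the anchor; the function names are hypothetical, and the real WHR may behave differently.

```python
import math


def initial_offset(prior_wins, prior_losses):
    """Rating of a player with no games yet, relative to the anchor."""
    return 400.0 * math.log10(prior_wins / prior_losses)


def rating_offset(wins, losses, prior_wins, prior_losses):
    """Rating relative to the anchor after real results against anchor-rated opponents."""
    return 400.0 * math.log10((wins + prior_wins) / (losses + prior_losses))
```

In this model `initial_offset(1, 3)` is about -191 points, which matches a 1 in 4 chance against an average player. A prior of 2 wins and 6 losses starts at the same place, but after two real wins the 1/3 prior has climbed back to the anchor while the 2/6 prior is still about 70 points below it. That is consistent with the split suggested above: the win/loss ratio sets the starting point, and the total sets the speed.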

woh
Forum Guru
*****



Arimaa player #2128

   


Gender: male
Posts: 254
Re: Whole History Ratings
« Reply #34 on: Mar 21st, 2009, 6:47am »

on Mar 18th, 2009, 7:29pm, omar wrote:
So woh how can we get daily updates of our WHR ratings? If you want I can setup to run the calculations on the arimaa.com server once a day.

 
Omar
 
At the moment the source for the WHR rating tool is the Arimaa game archive. This archive is only updated on a weekly basis. To make daily updates available I would need another source. What would you suggest?
 
The tool is a Windows executable. Can you run this on the arimaa.com server?

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Whole History Ratings
« Reply #35 on: Mar 21st, 2009, 8:15am »

Thanks for continuing to work on this, woh.  I am very eager to see WHR ratings that are updated daily and integrated into the server.
 
I tend to lean in favor of weak prior distributions, such as just one win and one loss to an anchor player.  This does make it easier for a new player to shoot up the ranks quickly, but my intuition is that a stronger prior has a different disadvantage.  I believe that if you run WHR with a stronger prior the ratings of ArifSyed and Swynndla will unduly benefit from it.  Why?  Because both of them beat many newcomers in their attempts to win the Player of the Month contest.  A player whose entire game history consists of two losses to Swynndla will be rated as quite weak if the prior is weak, but not nearly so weak if the prior is strong.  Thus if the prior is strong, it will appear that Swynndla beat a host of not-terribly-weak players, when in fact they were all quite weak.  That's just a hunch though; I'd be curious whether the numbers bear me out.
 
I am no longer drawn to the notion of using one win and three losses for the prior, although I once was.  Why should we make it easier for a rating to move up than to move down?  Symmetry makes more sense.  If we believe that the prior is too kind to newcomers, then we can have it be one loss and one win to a lower-rated anchor rather than keeping the anchor rating the same and adding more losses.
 
On the other hand, I do like the idea of extra losses for the purpose of tournament seeding.  If we are alarmed that someone can get Rabbit's high rank (and thus a high seeding into tournaments) on the basis of only a few games, we can do seeding based on the rating each player would have given two additional losses to the anchor.  Note that this is quite different from giving everyone a one-win-three-loss prior.  To find a player's rating for tournament seeding, we give just him an extra two losses and see what his rating would be.  Then we remove those two losses and give them to another player, etc., until we have calculated an individual conservative rating for everyone entering the tournament.
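
The per-player procedure can be sketched as follows. This is only an illustration under loud assumptions: an Elo-style logistic curve on a 400-point scale, a hypothetical anchor of 1500, and opponent ratings held fixed, so each player can be refitted alone. In the actual proposal the whole WHR calculation would be rerun once per entrant with only that entrant carrying the extra losses. All names and game records are made up.

```python
def win_prob(rating, opponent):
    """Elo-style probability that `rating` beats `opponent` (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def fit_rating(games, extra_losses=0.0, prior_games=1.0, anchor=1500.0):
    """Best-fit rating for one player, with opponents' ratings held fixed.

    games: list of (opponent_rating, score), score 1 for a win and 0 for a loss.
    extra_losses: additional virtual losses to the anchor, used for seeding only.
    """
    weighted = [(opp, score, 1.0) for opp, score in games]
    weighted.append((anchor, 1.0, prior_games))                 # virtual wins
    weighted.append((anchor, 0.0, prior_games + extra_losses))  # virtual losses
    lo, hi = anchor - 4000.0, anchor + 4000.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        # Positive surplus: the player scored more than a rating of `mid` predicts.
        surplus = sum(w * (s - win_prob(mid, opp)) for opp, s, w in weighted)
        if surplus > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def seeding_ratings(records, extra_losses=2.0):
    """Conservative rating per player; each one gets the extra losses on their own."""
    return {name: fit_rating(games, extra_losses) for name, games in records.items()}


# Hypothetical records: a newcomer with three wins and a veteran with 100 games,
# all against 1700-rated opponents.
records = {
    "newcomer": [(1700.0, 1)] * 3,
    "veteran": [(1700.0, 1)] * 60 + [(1700.0, 0)] * 40,
}
normal = {name: fit_rating(games) for name, games in records.items()}
seeded = seeding_ratings(records)
```

In this toy example the newcomer's seeding rating falls by a couple of hundred points while the veteran's barely moves, which is the intended effect: thin records are discounted for seeding without changing anyone's displayed rating.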
 
Apart from seeding tournaments, I vote we let Rabbit keep his high rating from a weak prior.  Although it is a tenuous guess, it is a reasonable guess.  Yes, I understand that people might not want Rabbit to be displayed with such a high rating in the list of best players.  One solution to that is to have the Top Rated Players list default to only active players.  Then no one will ever hear of Rabbit unless Rabbit comes back to play more games.  For the curious we could also keep a list of Top Rated Players including inactive players, but seriously deprecate that list (i.e. hide the link).
« Last Edit: Mar 21st, 2009, 8:17am by Fritzlein »

mistre
Forum Guru
*****





   


Gender: male
Posts: 553
Re: Whole History Ratings
« Reply #36 on: Mar 21st, 2009, 6:42pm »

on Mar 21st, 2009, 8:15am, Fritzlein wrote:
One solution to that is to have the Top Rated Players list default to only active players.  Then no one will ever hear of Rabbit unless Rabbit comes back to play more games.  For the curious we could also keep a list of Top Rated Players including inactive players, but seriously deprecate that list (i.e. hide the link).

 
I like this idea.  How should we define an active player?  We could be super lenient (e.g. 1 game in the last year) or super strict (e.g. 1 game within the last month) or somewhere in between.

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Whole History Ratings
« Reply #37 on: Mar 21st, 2009, 7:15pm »

on Mar 21st, 2009, 6:42pm, mistre wrote:
I like this idea.  How should we define an active player?  We could be super lenient (e.g. 1 game in the last year) or super strict (e.g. 1 game within the last month) or somewhere in between.

How about six games in the last year?  Then if you play the Postal Mixer only, or the World Championship plus one practice game only, you are still active.  It's like a one-event-per-year rule.
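
As a minimal sketch, the six-games-in-a-year rule could be checked like this. The function name, the 365-day window, and the idea of passing a list of game dates are assumptions for illustration; nothing here reflects how the server stores games.

```python
from datetime import date, timedelta


def is_active(game_dates, today, min_games=6, window_days=365):
    """True if at least `min_games` games were played in the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    recent = sum(1 for played in game_dates if cutoff < played <= today)
    return recent >= min_games
```

Keeping `min_games` and `window_days` as parameters makes it easy to compare the lenient and strict variants discussed above before settling on one.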

woh
Forum Guru
*****



Arimaa player #2128

   


Gender: male
Posts: 254
Re: Whole History Ratings
« Reply #38 on: Mar 22nd, 2009, 9:04am »

I found a way to get the details of the games not yet included in the game archive. And I have updated the results including all the games till 3:45 PM today (GMT). That is, just before the final game of the WC started.

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Whole History Ratings
« Reply #39 on: Mar 22nd, 2009, 2:22pm »

Marvelous!  Thank you, woh.
 
The relative values of the HvH ratings look very reasonable, but I'm worried that they are so much lower than the game room ratings.  Do I understand correctly that your anchor rating of 1220 was chosen to give ArimaaScoreP1 a rating of 1000 when all games are rated?  That's a nice idea when bots are included, but for the human-only ratings it seems too low.  It looks like we would need to add about 200 points to the human-only anchor to make the scales comparable.
 
To put it another way, if we are rating two different sets of games, I would rather have the outputs be roughly comparable than have the anchors be identical.  Would you be able to take the ratings of the 100 most active HvH players and anchor the HvH ratings so that their average rating is the same as for the all-game ratings anchored at 1220?
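
One way to sketch this calibration, assuming that moving the anchor by some amount moves every rating by the same amount (true for a model that depends only on rating differences, and assumed here to hold for WHR): compute the gap between the two averages over the reference pool and add it to the anchor. The function names and the sample numbers are hypothetical.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def calibrated_anchor(current_anchor, hvh_ratings, target_ratings, pool):
    """Anchor that makes the pool's mean HvH rating equal its mean target rating.

    hvh_ratings / target_ratings: dicts mapping player name to rating.
    pool: names of the reference players used for the comparison.
    """
    gap = mean(target_ratings[p] for p in pool) - mean(hvh_ratings[p] for p in pool)
    return current_anchor + gap
```

For example, if the reference pool averages 1500 on the HvH scale and 1700 on the target scale, an anchor of 1220 would move to 1420, in line with the rough 200-point adjustment mentioned above.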
« Last Edit: Mar 22nd, 2009, 2:26pm by Fritzlein »

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Whole History Ratings
« Reply #40 on: Mar 23rd, 2009, 12:39pm »

on Mar 21st, 2009, 8:15am, Fritzlein wrote:
I believe that if you run WHR with a stronger prior the ratings of ArifSyed and Swynndla will unduly benefit from it.

Oh, I didn't see at first that you had actually posted with different priors.  My suspicions were confirmed.  With a 1-of-2 prior, ArifSyed is ranked 46th.  With a 2-of-4 prior he is ranked 37th.  With a 3-of-6 prior he is ranked 33rd.
 
The point is that when we choose our prior we should not only look at how it affects newcomers, but also how it affects established players who play a lot of newcomers.  Apparently if the prior is stronger, then beating up newcomers is rewarded more.  A weak prior has the advantage of rewarding sandbagging less.
« Last Edit: Mar 23rd, 2009, 12:55pm by Fritzlein »

woh
Forum Guru
*****



Arimaa player #2128

   


Gender: male
Posts: 254
Re: Whole History Ratings
« Reply #41 on: Mar 24th, 2009, 1:06pm »

on Mar 22nd, 2009, 2:22pm, Fritzlein wrote:
Would you be able to take the ratings of the 100 most active HvH players and anchor the HvH ratings so that their average rating is the same as for the all-game ratings anchored at 1220?

 
Fritzlein, is your concern the difference between the WHR all-games ratings and the WHR HvH-games ratings or the difference between the gameroom ratings and the WHR HvH-games ratings? I would expect the latter since the gameroom rating would still be used as the all-games rating.
on Mar 18th, 2009, 7:29pm, omar wrote:
I've started to view our current rating system as an unofficial superficial rating system which simply serves to provide immediate feedback of ratings to new users.

 
Would it not be more logical to anchor the WHR HvH ratings so that the average rating of the 100 most active HvH players is the same as the average of their gameroom ratings? Or was that what you meant all along?
 
 
BTW: the 100th most active HvH player played 12 HvH games.

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Whole History Ratings
« Reply #42 on: Mar 24th, 2009, 2:03pm »

I would like the human-only WH rating of experienced humans who haven't pumped up their ratings with bot bashing to correspond roughly to their current game room ratings.  I don't care about the correspondence in ratings for bot bashers or newcomers.  In fact, I explicitly want the WH ratings to be different from game room ratings for bot bashers and newcomers.
 
Your graphs show that low-rated players have significantly higher game room ratings than they have WH ratings.  My hunch is that most of those discrepancies come from players with very scant game records.  There are lots of accounts where people joined at a 1500 rating, lost one game, and never played again.  Their game room rating will obviously be way above their WH rating.
 
My guess is that over the history of Arimaa, there was first rating deflation, then rating inflation, and most recently another bout of rating deflation.  Since the absolute meaning of game room ratings has probably fluctuated by a hundred points or more over time, I would be fine with any set of ratings that were scaled within about 100 points of the game room ratings.  By superficial examination, your all-games WH ratings (for non-bot-bashers, non-newcomers) fall within the tolerable range, but the human-only WH ratings fall so far below game-room ratings that it would be a major jolt.  
 
I quite like your method of initializing the system with an anchor rating that sets ArimaaScoreP1's rating to 1000.  If we can achieve WHR on nearly the same scale as game room ratings using the notion that ArimaaScoreP1=1000, that's a bonus, because that means we are indirectly calibrating to RandomMover=0.
 
Unfortunately, it is almost a contradiction in terms to calibrate a human-only rating system to the rating of a bot.  That's why I came up with the odd idea of calibrating all-game WHR ratings first to ArimaaScoreP1=1000, and then calibrating the human-only WHR to the all-games WHR.
 
Maybe 12 games is a rather small number to make an HvH rating reliable.  Also, what I really want to achieve is to exclude the influence of bot-bashers.  If we did the calibration on all players with at least 30 rated HvH games, and with between 25% and 75% of their games against bots, how many players would that leave?  I think quality is more important than quantity for aligning the two scales, but if there are too few points of comparison, it would be easier for the alignment to be thrown off by pure randomness.  Do we have 30 players with at least 30 HvH games and a good mix of human and bot opponents?
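
The selection rule can be written down as a small filter. This is a sketch only: the input format (a dict from player name to counts of HvH and bot games) and the function name are assumptions, and the thresholds are the ones proposed above.

```python
def reference_pool(players, min_hvh=30, min_bot_share=0.25, max_bot_share=0.75):
    """Names of players with enough HvH games and a balanced share of bot games.

    players: dict mapping name -> (hvh_games, bot_games).
    """
    pool = []
    for name, (hvh_games, bot_games) in players.items():
        total = hvh_games + bot_games
        if hvh_games < min_hvh or total == 0:
            continue  # too few human games to trust the HvH rating
        bot_share = bot_games / total
        if min_bot_share <= bot_share <= max_bot_share:
            pool.append(name)
    return sorted(pool)
```

Loosening the parameters, for instance to `min_hvh=25` with shares between 0.20 and 0.80, trades some quality for a larger pool.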
« Last Edit: Mar 24th, 2009, 2:37pm by Fritzlein »

woh
Forum Guru
*****



Arimaa player #2128

   


Gender: male
Posts: 254
Re: Whole History Ratings
« Reply #43 on: Mar 25th, 2009, 4:22am »

on Mar 24th, 2009, 2:03pm, Fritzlein wrote:
Do we have 30 players with at least 30 HvH games and a good mix of human and bot opponents?

51 players have played at least 30 HvH games. 23 of them have played at most 75% of their games against bots; none of them played fewer than 25% of their games against bots. There are 30 players who played at least 25 HvH games with between 20% and 80% of their games against bots.
 
on Mar 24th, 2009, 2:03pm, Fritzlein wrote:
If we can achieve WHR on nearly the same scale as game room ratings using the notion that ArimaaScoreP1=1000, that's a bonus, because that means we are indirectly calibrating to RandomMover=0.

This was the case with the data of January but far less with the current data.
Comparing the average rating of the above mentioned pools of 23 and 30 players for January, we get:
Pool    WHR all-games   Gameroom   Difference
R23     1899.89         1943.65    45.76
R30     1860.40         1906.17    43.76

 
As you can see in the history graph for ArimaaScoreP1, his rating fluctuates. Fixing his final rating at 1000 makes the anchor change over the course of time. ArimaaScoreP1 apparently has been doing well lately. Now an anchor of only 1133 is needed to fix his rating at 1000. Using this anchor pulls the whole scale down by about 90 points.
Pool    WHR all-games   Gameroom   Difference
R23     1808.99         1944.00    135.01
R30     1770.27         1906.20    135.93
This is no longer within about 100 points (or maybe just).
 
I then checked what anchor is required to synchronize the HvH WHR with the gameroom ratings.
Data       Pool    Anchor
January    R23     1476.12
January    R30     1476.04
Current    R23     1474.54
Current    R30     1472.67

I no longer like my idea of using an anchor that fixes the rating of a player. Looking at the history of ArimaaScoreP1, its rating fluctuates by about 150 points. This will cause the whole set of ratings to go up and down with it. I am now in favour of a fixed anchor given the fact that about the same anchor is required now as in January to synchronize the HvH WHR with the gameroom. I will try to check how this evolves over a bigger period.
 
Both pools of players give about the same result. So I think both are a good reference.

Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Whole History Ratings
« Reply #44 on: Mar 25th, 2009, 6:29am »

on Mar 25th, 2009, 4:22am, woh wrote:
Both pools of players give about the same result. So I think both are a good reference.

Excellent.
 
Quote:
I no longer like my idea of using an anchor that fixes the rating of a player. Looking at the history of ArimaaScoreP1, its rating fluctuates by about 150 points. This will cause the whole set of ratings to go up and down with it.

That's a very good point I had not considered.  Any single player will have a performance rating that fluctuates over time by chance.  Fixing the rating of any single player will cause the whole system to swing up and down in response to that player's performance.  We might think that the performance rating of a fixed-performance bot would be quite stable, but in the case of ArimaaScoreP1 there will be a huge amount of noise introduced via its opponents.  Since all newcomers play ArimaaScoreP1 first, and newcomers are necessarily the least accurately-rated people in the system, the inaccuracy of their ratings will show up as a ton of noise in the performance rating of ArimaaScoreP1.
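
The effect is easy to see in a toy example. The sketch assumes that pinning one player simply shifts every rating by the same amount, and it uses made-up human ratings; only the 87-point drift is taken from the thread (the gap between the anchors of 1220 and 1133 that woh reported as needed to hold ArimaaScoreP1 at 1000).

```python
def pin_player(raw_ratings, pinned, target=1000.0):
    """Shift every rating by the same amount so that `pinned` lands on `target`."""
    shift = target - raw_ratings[pinned]
    return {name: rating + shift for name, rating in raw_ratings.items()}


# Hypothetical snapshots on a fixed-anchor scale: the humans have not changed,
# but the bot's estimated rating has drifted up by 87 points.
january = {"ArimaaScoreP1": 1000.0, "human_a": 1900.0, "human_b": 1650.0}
march = {"ArimaaScoreP1": 1087.0, "human_a": 1900.0, "human_b": 1650.0}

pinned_january = pin_player(january, "ArimaaScoreP1")
pinned_march = pin_player(march, "ArimaaScoreP1")
```

After pinning, `human_a` goes from 1900 in January to 1813 in March without having played any differently, so all of the noise in the bot's estimate is passed on to every other player.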
 
Therefore I am totally in agreement with your change of position.  We should definitely not anchor the ratings on ArimaaScoreP1.  It would be much better to anchor the ratings on something else such as a fixed prior distribution.
 
Quote:
I am now in favour of a fixed anchor given the fact that about the same anchor is required now as in January to synchronize the HvH WHR with the gameroom.  I will try to check how this evolves over a bigger period.

I am interested in how you will measure the effect of a fixed anchor over a larger period of time.  I did not expect that in order to stabilize the ratings of a good reference group of players, we need an anchor that is approximately 1500, the rating that we formerly gave to all newcomers.  We're a bit under 1500 now, but I blame that on recent deflation, and I expect that a year ago the anchor rating needed to calibrate WHR to game room ratings would have been over 1500 due to the inflation underway then.  If you see this anchor value drifting up and down historically, but in the neighborhood of 1500, and in any case drifting less quickly and violently than the rating of ArimaaScoreP1, it would seem like a strong argument for anchoring WHR with a prior against a 1500-rated player.
 
« Last Edit: Mar 25th, 2009, 7:27am by Fritzlein »
