Arimaa Forum » General Discussion (Moderator: supersamu)
Topic: Arimaa rating deflation (Read 30707 times)
omar (Arimaa player #2)
« Reply #75 on: Sep 16th, 2004, 6:55pm »

Actually, if transitivity is not preserved then no scale makes sense, regardless of whether it is absolute or floating. A scale can only be defined if some kind of transitivity exists.
 
But we do have a rating scale for our floating rating system, and we also know that transitivity is not strictly preserved; so what then do the ratings on our floating scale mean? The only thing those ratings reflect is how we've performed against the players we've played. If someone played all their games against the same person, then their rating (regardless of the type of scale) would be way off from what it should be if they had played "against the field". So even on a floating scale we have the same problem: we can't look at the ratings of any two players and expect them to accurately tell us how those players will perform against each other. We used the probabilities to go forward and compute the ratings, but we can't rely on those ratings to go in the reverse direction and accurately tell us the probabilities.
 
I know you have to agree to all this because everything I've said so far is stuff that I learned from you (Fritzlein) in our email conversations Smiley
 
So what is this notion of an absolute rating scale? Maybe using the word 'absolute' is what is causing the problem. It might be better to call it an 'anchored' rating scale.
 
Now to see what I mean by an 'anchored' rating scale, imagine this experiment. Suppose that we introduce a random bot and several other non-random (but really dumb) bots into our current floating rating system in the Arimaa gameroom. All the bots come in with an initial rating of 1500 and start losing lots of games (even against ShallowBlue). So the ratings of these bots sink pretty low. But they win some games against each other, and some of them win some rated games against ShallowBlue, so eventually their ratings become ordered and stabilize. Now take whatever rating the random bot has, reset that to zero, and shift all the other players' (bots' and humans') ratings by that difference. Now we've got an anchored rating scale with the random bot having a rating of zero.
 
But whenever the random bot's rating drifts from zero we have to readjust everyone's ratings; not good. So to avoid doing this we can just fix the random bot's rating at zero and let the other players' ratings change. But this causes the rating scale to be adjusted much more slowly. To speed things up we can have the random bot and all the other low-rated bots play lots and lots of games against each other and then fix all their ratings so that they do not change. Then there is a much better chance of players with non-fixed ratings playing against the fixed (or anchored) players, and the rating scale gets adjusted faster.
 
In a way we can think of our current rating system as being anchored around an average player having a rating of 1500, because that's the rating new players come in with. If we had chosen to let new players come in with a rating of 10,000, our ratings would now be scattered around 10,000. So we just want a system where the ratings are anchored based on a random bot having a rating of zero. I don't think there is anything wrong or impossible about doing that. But we can't strictly take any player's rating to mean that the difference in rating against the random bot gives an accurate measure of the probability with which that player can beat the random bot. It didn't really mean that in our current scale anchored with the average player at 1500, and it won't really mean that in a scale anchored with the random player at zero.
 
So there are no new problems introduced by anchoring at zero using a random bot instead of at 1500 as the average player. But it does eliminate the problem of drift, because we are anchoring based on a player that will not get any better or worse as it plays and will also never retire from playing.
 
Now the issue about ratings reflecting a player's ability "against the field" and not just against a few selected players is a real problem, but it is independent of the type of rating scale. It needs to be dealt with separately, and trying to fix it will require changing the rating formulas altogether. But I think in our email conversation we may be making some progress on this issue.
 
Omar
99of9 (Gnobby's creator, player #314)
« Reply #76 on: Sep 17th, 2004, 5:53am »

Here's some raw data to throw in the mix.
 
I revved up Clauchau's program, and played out the rest of the duels to get winning percentages for all the pairs.  Then I fiddled with the "ratings" of each of these bots (with random_mover anchored at 0), until I got a good fit of these winning percentages against the predicted winning percentages based on "ratings" (using the formula for the Arimaa rating).
 
Here's how good the fit was.  [Fitting could be done even better with a program, I just did it by hand in a spreadsheet]

 
The X axis is the actual proportion of games won by a bot in a particular duel, the Y axis is the predicted proportion of games won by that bot in that duel if both bots are rated as shown below.
 
Remember these are a group of bots with quite diverse (but very dumb) styles.  They are all somewhat stochastic.
 
The ratings came out as:

   0 M  
 -30 S
 308 S+K-K  
 343 S+I  
 478 S+I-I  
 794 M+I-I  
 890 S+F-F  
 964 S+S-S  
 985 M+F-F  
1044 M+S-S  

 
So I think that, all up, this bodes reasonably well for an anchored rating system.  Clauchau has managed to span 1000 rating points with fairly "predictably" performing bots (i.e. where a single-number rating represents their chances of success fairly well).  There are a couple of gaps people may want to plug, or we can keep extending it with better bots and hopefully get to ShallowBlue one day Smiley.
 
One thing to note is that my fitting was done based on percentage wins, not on relative rating.  This is because I did not want to make too much distinction between a 99.9% and a 99.8% win, even though by the ratings formula these would have quite different rating differences.  Basically my fitting system weights games between bots of similar standard more than games between bots of widely different standard.  I think that's a good thing.
« Last Edit: Sep 17th, 2004, 6:02am by 99of9 »
99of9 (Gnobby's creator, player #314)
« Reply #77 on: Sep 17th, 2004, 10:04am »

on Sep 17th, 2004, 5:53am, 99of9 wrote:

[Fitting could be done even better with a program, I just did it by hand in a spreadsheet]

 
Well now I've written a program to do some simulated annealing to determine the best fit ratings.  Here's the output for that same dataset.

# Rating:     0.0 M
# Rating:   -23.6 S
# Rating:   311.3 S+K-K
# Rating:   362.9 S+I
# Rating:   491.4 S+I-I
# Rating:   806.9 M+I-I
# Rating:   913.0 S+F-F
# Rating:   986.1 S+S-S
# Rating:  1007.1 M+F-F
# Rating:  1073.9 M+S-S

 
So I wasn't that far off by hand...  
99of9 (Gnobby's creator, player #314)
« Reply #78 on: Sep 17th, 2004, 10:04am »

Here's the program if anyone wants to do this for themselves.  All you have to do to include a new bot in the ratings is to add another line to the results crosstable (in the input file results.txt), and set it to run.
 

/*
   *********************************************************************************
   RateArimaaBots.c

   Program to rate bots by the Arimaa rating scheme.
   The first bot in your list will automatically take the rating value of 0.

   ---------------------------------------------------------------------------------
   Input file should be a results crosstable of the format:
   %d\n       <number of bots>
   %s %s %s %s ...\n    <contents of this line do not matter>
   %s\n       <name of bot 0>
   %s %f\n    <name of bot 1 and performance (up to 100) against bot 0>
   %s %f %f\n      <name 2 and performance against bot 0 and bot 1>
   %s %f %f %f\n   <etc>
   ---------------------------------------------------------------------------------
   A filled crosstable is also handled correctly if that input format is easier,
   but there is no error checking to ensure that A vs B adds up to 100.

   Version 0.0
   Toby Hudson (toby<AT>hudsonclan.net)
   This program is available without warranty for anyone to use for any good purpose.
   Please credit the author in any publications or derivative works.
   *********************************************************************************
 */
 
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
 
#define MAXBOTS 20
#define MAXNAMELENGTH 40
#define NGEN 1000001
#define MAXDELTA 1.0
 
double kT;
 
double predict (double ratingB, double ratingA) {
  /* expected score (%) of B against A: 100/(1 + 10^((A-B)/400)) */
  return 100*(1.0/(1.0+pow(10.0,((ratingA-ratingB)/400.0))));
}
 
double rmserror (int n,
                 double rate[MAXBOTS],
                 double res[MAXBOTS][MAXBOTS],
                 double pred[MAXBOTS][MAXBOTS]) {
  double err = 0.0;
  int i, j;
  for (i = 0; i < n; i++) {
    for (j = 0; j < i; j++) {
      pred[i][j] = predict(rate[i], rate[j]);
      err += (res[i][j] - pred[i][j]) * (res[i][j] - pred[i][j]);
    }
  }
  err /= ((double)n * (n - 1) / 2);
  err = sqrt(err);
  return err;
}
 
void ReadIn (int *n,
             double res[MAXBOTS][MAXBOTS],
             char nam[MAXBOTS][MAXNAMELENGTH]) {
  FILE *fp;
  char ch = 'x';
  int i, j;

  fp = fopen("results.txt", "r");
  if (fp == NULL) {
    fprintf(stderr, "cannot open results.txt\n");
    exit(1);
  }

  fscanf(fp, "%d\n", n);   /* %d, not %ld: n points to an int */
  printf("# Rating %d bots\n", *n);

  while (ch != '\n') ch = getc(fp);

  for (i = 0; i < *n; i++) {
    fscanf(fp, "%s", nam[i]);
    for (j = 0; j < i; j++) {
      fscanf(fp, "%lf", &res[i][j]);
    }
    ch = 'x';
    while (ch != '\n') ch = getc(fp);
  }
}
 
double RateBots(int n,
                double rate[MAXBOTS],
                double res[MAXBOTS][MAXBOTS],
                double pred[MAXBOTS][MAXBOTS]) {
  int gen;
  int i;
  double delta;
  double initerr;
  double finerr;
  double minerr;
  double bestrate[MAXBOTS];
  int bot;

  for (i = 0; i < MAXBOTS; i++) {
    rate[i] = 0.0;
    bestrate[i] = 0.0;   /* initialize so bestrate is defined even if no move improves */
  }

  initerr = rmserror(n, rate, res, pred);
  minerr = initerr;

  for (gen = 0; gen < NGEN; gen++) {
    /* cooling schedule; gen+1 avoids dividing by zero on the first step */
    kT = 1000.0 / (double)(gen + 1);

    /* choose a bot (never bot 0, which stays anchored) and how much to perturb by */
    delta = MAXDELTA * (((1.0 * rand()) / RAND_MAX) - 0.5);
    bot = 0;
    while (bot == 0) bot = rand() % n;

    rate[bot] += delta;

    finerr = rmserror(n, rate, res, pred);

    if ((1.0 * rand() / RAND_MAX) < exp(-(finerr - initerr) / kT)) {
      /* accept move */
      initerr = finerr;
      if (initerr < minerr) {
        minerr = initerr;
        for (i = 0; i < n; i++) {
          bestrate[i] = rate[i];
        }
      }
    } else {
      rate[bot] -= delta;
    }

    /* if (gen % 10000 == 0) printf("%10d %10.5f %10.5f\n", gen, initerr, minerr); */
  }

  for (i = 0; i < n; i++) {
    rate[i] = bestrate[i];
  }
  return minerr;
}
 
int main () {
  int NumBots;
  char Names[MAXBOTS][MAXNAMELENGTH];
  double Result[MAXBOTS][MAXBOTS];
  double Predict[MAXBOTS][MAXBOTS];
  double Rating[MAXBOTS];

  int i;

  ReadIn(&NumBots, Result, Names);
  RateBots(NumBots, Rating, Result, Predict);

  for (i = 0; i < NumBots; i++) printf("# Rating: %7.1f %s\n", Rating[i], Names[i]);
  return 0;
}

 
Here's the current input file:

10
Actual  M  S  S+K-K   S+I     S+I-I   M+I-I   S+F-F   S+S-S   M+F-F   M+S-S
M
S  46.00
S+K-K   81.00   82.60
S+I     94.34   93.00   59.50
S+I-I   98.50   97.70   67.10   71.70
M+I-I   99.90   99.90   94.20   97.24   84.00
S+F-F   99.95   99.97   89.02   96.53   91.50   66.00
S+S-S   99.98   99.98   95.07   97.93   95.60   72.00   64.4
M+F-F   100.00  99.99   96.09   99.08   97.00   74.00   59.00   52.00
M+S-S   100.00  100.00  99.03   99.80   98.30   85.20   72.00   66.00   53.50
« Last Edit: Sep 17th, 2004, 10:07am by 99of9 »
Fritzlein (Arimaa player #706)
« Reply #79 on: Sep 17th, 2004, 8:21pm »

This is a fun conversation!  I wish I had gotten involved earlier.
 
Omar, you make a clear and persuasive argument.  I accept the vocabulary change of "anchoring" the system as opposed to having an "absolute" scale for the ratings.  It makes sense to anchor the system rather than letting it drift.  If you let it drift it will probably (under the current system) deflate over time.  Moreover, as long as the system is going to be anchored, fixing the rating of a random mover at zero is at least as sensible as fixing the average rating at 1500, or any other anchoring idea I know of.
 
Still, the ratings scale will depend on the pool of players, and especially on the pool of bots used to anchor it.  A different pool of bots would anchor it in a different way, even if that pool of bots also had the random mover fixed at a zero rating.  The lack of transitivity ensures that all ratings are meaningful only relative to the playing population.
 
The more I think about the lack of transitivity, the more I question the fundamental meaning of ratings.  I am beginning to think that the most basic formula, namely "WP = 1/(1+10^(RD/400))", needs to be deprecated from its current central role.  Follow me through an example, and see if you don't arrive at the same conclusion I do.
 
Suppose that 99of9 can beat Belbo about 75% of the time and speedy about 75% of the time.  Suppose that I make a special study of speedy, and learn to beat it 99% of the time, but this specialty knowledge isn't transitive, so I can only beat Belbo 40% of the time.  (I know both the 99% and the 40% are far too high, but bear with me for the sake of the example.)
 
The question is, based on these percentages, who deserves a higher rating, me or 99of9?  If you start from the formula of "WP = 1/(1+10^(RD/400))", you would say that 99of9 deserves to be 191 points higher than Belbo and 191 points higher than speedy, while I deserve to be 70 points lower than Belbo but 798 points higher than speedy.  To combine these numbers somehow, we could take the average rating of Belbo and speedy, put 99of9 above it by (191+191)/2, and put me above it by (-70+798)/2.  I end up 173 points higher than 99of9, which is absurd.
 
Intuitively, it makes no sense whatsoever that 99of9 would be rated lower than me, given that he wins 1.5 out of every two games against the same opposition I beat 1.39 games out of 2.  This issue really comes to a crisis when there are winning percentages of 99.98, 99.99, and 100.00.  My intuition (apparently in agreement with 99of9) is that the difference between 99.98 and 99.99 should count for exactly as much as the difference between 50.00 and 50.01.  Yet the current rating system disproportionately rewards lopsided results.  The key to getting a high rating at present is to find a bot you can beat, and beat it again and again and again.
 
If we can free ourselves for a minute from the shackles of "WP = 1/(1+10^(RD/400))", let's consider an alternate definition of playing strength.  Let's say that the better player is, by definition, the one who wins more games (on average) against the same opposition.  Everyone can agree that, in a round-robin tournament, the player with the highest total score wins, regardless of who the wins and losses were against.  Adding up total wins in a round robin is universally considered to be a fair method of scoring.
 
In the round robin of bots from clauchau/99of9, the expected winning percentages of each bot are (calculated by hand so please forgive any errors):
 
M       8.93
S       8.10
S+K-K  29.29
S+I    31.62
S+I-I  40.96
M+I-I  64.23
S+F-F  71.95
S+S-S  78.55
M+F-F  80.41
M+S-S  87.09
 
I would claim that the above list of winning percentages in a round robin DEFINES the relative playing strength of each bot in this field.  Anything else we say about ratings should be derivative from this concept.
 
Now, I admit that this leaves open the knotty question of how to estimate ratings when a round robin isn't possible.  I need to chew on that one some more before I offer up any suggestions.  But I have a strong intuition that "total score versus the field", i.e. the round-robin philosophy, is a solid starting point.

clauchau (bot Quantum Leapfrog's father)
« Reply #80 on: Sep 18th, 2004, 7:34am »

Oh yes, I love all that was recently said.
 
It looks like round-robin scores are an additive, logarithmic version of ratings with less focus on the most recent games. They have the advantage of weighting every opponent played equally.
 
In theory I could still invite a hundred similar bots into the gameroom and play them all, amounting to a weight of a hundred against a single bot. So the scale depends on who plays, but that's all right I think, and my hundred bots aren't a real issue.
 
Now recent games should however be weighted more. And maybe that only means we'll get back to the current formula. It's fun too. To me it looks a bit like a currency, or some traded goods, or several currencies with national local biases.
Fritzlein (Arimaa player #706)
« Reply #81 on: Sep 18th, 2004, 4:01pm »

Yes, clauchau, if there happens to be a player that I do well against and others do poorly against, I want as many copies of that player as possible in the field.  This can distort the ratings.  But it isn't likely to be as much of an issue as the current problem, which is that I can find one opponent I do relatively well against and play hundreds of games against that one opponent.
 
If nothing else, we should have a reasonable pool of bots with fixed ratings.  It seems that the current pool needs to be extended further up to overlap with ShallowBlue and human beginners.  Are there natural, easy-to-define bots that play better without looking deeper?  I notice that M+K-K isn't in the pool, where K is just a count of the number of pieces each player controls.  Furthermore, that bot could perhaps be further strengthened by tie-breaking pure materialism with a count of "favorable adjacencies", i.e. among moves which result in the same number of pieces being captured, break ties in order of (# of enemy pieces next to a stronger friendly piece) - (# of friendly pieces next to a stronger enemy piece).  We could call it M+K-K+A-A, where A is for adjacency.  We could also have M+K-K+F-F, where now F stands for the number of frozen pieces on each side.
 
Are there other natural algorithms?
 
How difficult is it to add new bots to the "anchor group"?

99of9 (Gnobby's creator, player #314)
« Reply #82 on: Sep 18th, 2004, 6:10pm »

on Sep 18th, 2004, 4:01pm, Fritzlein wrote:
How difficult is it to add new bots to the "anchor group"?

 
Very easy if they're similarly structured.  If you can define it, then you can probably code it quickly by copying Clauchau's current bots.  Then you just have to run it against all previous ones, and feed the results into my program.
 
Actually there are quite a few bots that Clauchau wrote that haven't been included yet because they haven't been run against each other.  I guess they're intermediate ones though, but I'm slowly working through them.
clauchau (bot Quantum Leapfrog's father)
« Reply #83 on: Sep 19th, 2004, 9:13am »

I remember having tried a mix of Freezer and K, weighting pieces that are frozen by less than 1. The best weight was 0.86 or so, but it didn't make that much stronger a bot.
 
Maybe Freezer, or A as distinct from K, would do some good as you suggest, Fritzlein. In any case I bet we need more to make a really better bot: open access to goal ranks, something about traps and territory and strength density.
 
Here is a thought about how to make it natural: The first bots are about advancing rabbits. That's naturally derived from the winning condition. The only human thought we put into it is that in order to get a rabbit on the 8th rank, we had better have some on the 7th, the 6th, ...
 
We might try to naturally derive a bot from the condition "being one step from winning" = "having a rabbit on the 7th rank and no piece above and (a friend beside or no stronger foe beside) and having a step to play".
« Last Edit: Sep 22nd, 2004, 2:10pm by clauchau »
omar (Arimaa player #2)
« Reply #84 on: Sep 21st, 2004, 3:52pm »

Karl (Fritzlein) and I had some long email discussions about the problems with the Elo rating system. Some of the things that I learned from them were:
 
* The Elo rating system works fine as long as players are not allowed to pick their opponents and the opponents are picked for them (as happens in tournaments).
 
* When players are allowed to pick their own opponents, the rating system can be abused by repeatedly playing the same opponent or a small group of opponents.
 
* When computer opponents are added to the mix, the problem gets even worse, because once a player learns how to defeat one they can do it again and again; the computer opponent will never figure out why it is losing and adapt itself (at least with the current computer opponents Smiley ).
 
* A player's rating is very dependent on the pool of players available to play against. For example, a player with a true rating of 3000 will never show that rating if the ratings of the other players in the pool are around 1000. Even if he consistently defeats all of them, the rating formula will only let him increase to about 2200. So there needs to be a good continuum of players at all levels to support a healthy rating system.
 
* Even though we use winning percentages to compute the ratings we should not expect the ratings difference between any two players to accurately tell us what the winning percentages will be when they play each other. We can go forward, but we can't reliably go back.
 
* The meaning of a player's rating should be how they have performed against the field. The more different opponents a player has played, the more meaningful and reliable the rating is. If only a few opponents have been played, the rating is not very meaningful.
 
Keep in mind that anchoring a rating system does not prevent players from abusing the rating system by playing very limited sets of selected opponents. So this problem is completely independent of that.
 
After having these discussions with Karl I tried out some different formulas for computing the players' ratings that might be more resistant to abuse. The current best formula has these properties:
 
* Winning many games against the same player will not increase your rating as much as winning the same number of games against different players.
 
* A rating obtained by playing the same player will fall more after a loss than the same rating obtained by playing many different opponents.
 
* A rating obtained by playing much weaker players will fall more after a loss than the same rating obtained by playing opponents of similar ratings.
 
* The recent games count more than the older games, but a player cannot wash out their history by playing a lot of games (like 200) with the same player.
 
* The rating uncertainty goes down faster if you play different opponents, and it can go back up if you start playing many games against the same opponent or few opponents. In effect the rating uncertainty can reflect how meaningful your rating is.
 
I passed it on to Karl to look at and am waiting to hear back from him about it. He is pretty good at finding cracks in a system Smiley
 
If anyone else wants to check it out, you can download it from:
  http://arimaa.com/arimaa/rating/testRatings.tgz
 
clauchau (bot Quantum Leapfrog's father)
« Reply #85 on: Sep 22nd, 2004, 2:28pm »

Ah, I see you too, Omar, became a forum God Smiley
 
I guess it's a good idea to have the ratings be the solution of an equation, but I don't understand the equation.
 
And how do we justify, in theory, the addition of a draw against a zero-rated opponent? I can only understand adding a draw against an equally rated opponent.
omar (Arimaa player #2)
« Reply #86 on: Sep 23rd, 2004, 10:22am »

I recently lowered the values for the forum seniority levels; that's why we all gained seniority Smiley
 
The fictitious draw is needed so that the equation does not blow up if a player has not lost or won a single game.
 
The choice of what rating that fictitious draw is against does not matter too much in the long run, but choosing the average opponent rating (a number which is not fixed) can cause a player to gain rating points after losing a game against a highly rated player (which is counter to how ratings should work). So that is why I didn't choose that. I thought it was more important to choose a number that is fixed and does not change. Zero seemed like the obvious number Smiley
 
clauchau (bot Quantum Leapfrog's father)
« Reply #87 on: Sep 24th, 2004, 6:53am »

The simple bots we've got so far at the bottom of the scale all happen to see no more than the horizontal projection of the board. Much less, even, because the ranks of the noble pieces aren't taken into consideration. So, I'm wondering, how about first making the best of that projection?
 
I want to know how high it is best to put every piece relative to the others. For example, it is better to wait for a lower density of foes on the goal rank before advancing your rabbits, except that the more you accompany them with friendly powerful pieces, the higher the foe density you can tolerate.
Fritzlein (Arimaa player #706)
« Reply #88 on: Nov 2nd, 2004, 2:48pm »

In another thread I said:
 
on Nov 2nd, 2004, 12:02pm, Fritzlein wrote:

 
It isn't possible to deduce a priori whether the way players enter and leave the playing pool will have a net inflationary or net deflationary effect.  From my short time of observation, however, I rather expect that we are suffering from mild deflation at present.  And even if we are experiencing a small amount of inflation in the sense that the average rating of active players is going up (which I don't think we are), I suspect that there is significant deflation in the sense that Fotland meant, i.e. that a 1700-rated player today is significantly stronger than a 1700-rated player was a year ago.

 
There is some evidence that there is both inflation in the one sense and deflation in the other.
 
I tested for deflation by looking at Arimaazilla's rating, since as I understand it, that bot plays the same as a year ago.  So I checked my database.  In September and October 2003, Arimaazilla was averaging a rating of 1506.  In September and October (up to the 24th) of 2004, Arimaazilla was averaging a rating of 1429.  That's not conclusive, but it sure is suggestive.
 
On the other hand, in November of 2003, the average rating of 23 active humans was 1558, whereas in November of 2004, the average rating of 19 active humans was 1663.
 
So at a first approximation, it is possible that we have equal and opposite trends.  Maybe the average rating of active players has gone up by 105 points in a year, but the rating of a player who hasn't improved has gone down by 80 points in the same time frame.
 
An alternative conclusion is that humans are just getting better at beating bots.  That is to say, perhaps there is neither inflation nor deflation, but human players are pulling away from the state of the art computer players.  This is intuitively very plausible to me.
« Last Edit: Nov 2nd, 2004, 3:05pm by Fritzlein »

99of9 (Gnobby's creator, player #314)
« Reply #89 on: Nov 2nd, 2004, 4:53pm »

If you include bots in your definition of an "active player" what do you find?
 
This is what I expect to have inflated due to wandering players.  Find all the players that have played at least a few rated games in a month and average their ratings.  If someone plays 10 times the number of games, their rating shouldn't be included in the calculation 10 times.
 
You are no doubt right that active humans have improved relative to the bots, but that is a different issue; that is the deflation that Fotland was identifying when he started this thread.
« Last Edit: Nov 2nd, 2004, 4:54pm by 99of9 »
Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB © 2000-2003. All Rights Reserved.