Arimaa Forum « Arimaa rating deflation »


General Discussion (Moderator: supersamu)
   Author  Topic: Arimaa rating deflation  (Read 30011 times)
fotland
Forum Guru
*****



Arimaa player #211

   


Gender: male
Posts: 216
Re: Arimaa rating deflation
« Reply #45 on: Dec 16th, 2003, 9:35am »

bot_random shouldn't have a fixed rating, or it will distort the rating system.  The problem is that bot_random will lose every game and end up 700 points lower than the lowest bot, but that won't be its true rating.
MrBrain
Forum Guru
*****



Arimaa player #344

   


Gender: male
Posts: 148
Re: Arimaa rating deflation
« Reply #46 on: Dec 16th, 2003, 10:31am »

But that's the whole purpose of bot_random.  To have a fixed reference point for a rating of 0.  Why else would we even be making a random bot?  If you go back to the start of this topic (actually, the first post after your initial post), you'll see that the whole reason we started talking about a random bot was that we wanted some anchor for the rating system.  What better way to do this than random=0?
« Last Edit: Dec 16th, 2003, 10:34am by MrBrain »
clauchau
Forum Guru
*****



bot Quantum Leapfrog's father

   
WWW

Gender: male
Posts: 145
Re: Arimaa rating deflation
« Reply #47 on: Dec 16th, 2003, 1:43pm »

Yep, and that's why we'll also have intermediate bots.  
 
Here are some first results about elementary bots that now acknowledge the goal winning condition.
 
The Stepping Ultimate Lookout ;) makes random steps, except that it steps a rabbit onto the goal whenever a single step achieves that immediately (without first trying to bring rabbits closer), and it never pulls or pushes one of the opponent's rabbits onto the opposing goal.
 
Stepping Ultimate Lookout / Random Stepper
 
SUL won 62.7% and RS won 37.3% of 100,000 games
 
That's not much of an improvement but I was curious about it.
 
The Stepping +Infiltrator -Infiltrator makes random steps among the steps maximizing
 
16*(advancement of the most advanced rabbit) - (advancement of the opponent's most advanced rabbit)
 
where advancement = 8 on the goal.
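
A minimal sketch of that selection rule, for readers who want to experiment. It assumes advancement runs from 1 on a rabbit's home row to 8 on the goal (only the value 8 is given above), and that the candidate steps have already been generated elsewhere; `infiltrator_score` and `best_steps` are hypothetical names, not taken from the actual bot.

```python
def infiltrator_score(own_advancements, opp_advancements):
    """16 * (own most advanced rabbit) - (opponent's most advanced rabbit).

    Each argument is a list of rabbit advancements, assumed to run from
    1 (home row) to 8 (goal). A side with no rabbits counts as 0 here,
    which is an assumption of this sketch.
    """
    own_best = max(own_advancements, default=0)
    opp_best = max(opp_advancements, default=0)
    return 16 * own_best - opp_best


def best_steps(candidates):
    """Return the names of all candidate steps that maximize the score.

    candidates: list of (step_name, own_advancements, opp_advancements),
    describing the rabbits as they would stand after that step.
    """
    scored = [(infiltrator_score(own, opp), name) for name, own, opp in candidates]
    top = max(score for score, _ in scored)
    return [name for score, name in scored if score == top]
```

The bot as described would then pick uniformly at random among the returned steps, for example with `random.choice`.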
 
Stepping +Infiltrator -Infiltrator / Random Stepper
 
S+I-I won 97.7% and RS won 2.3% of 100,000 games
 
Now the Stepping +Flooder -Flooder focusses on getting as many rabbits as possible onto the goal, then onto the row before, etc., then on the first row, then on getting as few of his opponent's rabbits as possible on the opposing goal, then as few on the row before, etc.
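
One way to read that "first this, then that" ordering is as a lexicographic comparison, which Python tuples give for free. This is only a sketch under the same assumption that advancement runs from 1 (home row) to 8 (goal); `flooder_key` is a hypothetical helper, not the bot's actual code.

```python
def flooder_key(own_advancements, opp_advancements, goal=8):
    """Sort key for a position; a larger tuple is better for the Flooder.

    Own rabbits are counted row by row from the goal backwards, then the
    opponent's rabbits are counted the same way with a minus sign, so that
    fewer advanced opposing rabbits compares as better.
    """
    rows = range(goal, 0, -1)  # goal row first, then the row before, etc.
    own_counts = tuple(own_advancements.count(r) for r in rows)
    opp_counts = tuple(-opp_advancements.count(r) for r in rows)
    return own_counts + opp_counts
```

With this key, a single rabbit on the goal outranks any number of rabbits one row short of it, and the opponent's rabbits only matter as a tie-break.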
 
Stepping +Flooder -Flooder / Stepping +Infiltrator -Infiltrator
 
S+F-F won 91.5% and S+I-I won 8.5% of 100,000 games.
 
Wins by Goal reached: 99.75%
Loss by pulling or pushing on the opposing goal: 0 (none)
The loser was unable to move: 0.25%
Loss by 3-times repetition: 2 games
 
shortest game = 6 half moves
mean length = 33.1 half moves (sd = 14.3)
longest game = 188
 
There is more, but those are the most important results. I didn't get any elementary Stepping bots stronger than that Flooder. In particular the official scoring function makes a weaker stepping bot (and moving bot as well).
MrBrain
Forum Guru
*****



Arimaa player #344

   


Gender: male
Posts: 148
Re: Arimaa rating deflation
« Reply #48 on: Dec 16th, 2003, 2:35pm »

Nice results so far!  I'd also be interested in seeing your first non-random bot play against the random mover, since this is the agreed 0-rating floor bot.
MrBrain
Forum Guru
*****



Arimaa player #344

   


Gender: male
Posts: 148
Re: Arimaa rating deflation
« Reply #49 on: Dec 16th, 2003, 2:37pm »

From my knowledge of how chess ratings relate to win probabilities, I'd estimate (very preliminary back-of-the-envelope calculation) that your Stepping +Flooder -Flooder bot would be about 1200 rating points better than the random stepper.
 
Actually, if we can do this kind of analysis before we anchor the rating system at 0 for bot_random, we should be able to estimate a one-time adjustment to all current ratings.  For example, if we find that a person rated around 1500 would instead be 2300 with a random=0 anchor, then we can simply add 800 points to everyone's ratings.  This will prevent a long and inaccurate period where people's ratings drift at different rates depending on how much they play.
 
Actually, if we find that the adjustment would be really great (like more than 2000 points), it may be aesthetically pleasing to both scale AND shift the entire rating system.  For example, instead of having mean ratings be 3800, we could change the scaling factor in the ratings formulas from 800 to 400 so that a difference of 100 points then would be what a difference of 200 points is now.  But again, that's just a preference, not a necessity.  Some sort of shift will probably be necessary though if we don't want a long period of inaccurate ratings.
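
A rough sketch of the scale-and-shift idea, assuming the rating formula has the usual logistic Elo form with a scaling factor in the exponent. The 800 and 400 values are just the example figures from the paragraph above, and `expected_score` and `rescale` are hypothetical names.

```python
def expected_score(rating_diff, scale):
    """Expected score of the higher-rated side under a logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


def rescale(old_rating, old_scale=800.0, new_scale=400.0, shift=0.0):
    """Map a rating from the old scale to the new one, then apply a shift.

    Rating differences shrink by new_scale / old_scale, so win
    probabilities are unchanged as long as the formula uses new_scale.
    """
    return old_rating * (new_scale / old_scale) + shift
```

For instance, two players 200 points apart under a factor of 800 end up 100 points apart under a factor of 400, and `expected_score(200, 800)` equals `expected_score(100, 400)`, so nobody's predicted results change.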
« Last Edit: Dec 16th, 2003, 2:50pm by MrBrain »
99of9
Forum Guru
*****




Gnobby's creator (player #314)

  toby_hudson  


Gender: male
Posts: 1413
Re: Arimaa rating deflation
« Reply #50 on: Dec 16th, 2003, 2:41pm »

Keep it coming!  This is all very interesting.
 
I'd also be interested in how a complete materialist would do in this scheme of things... eg implementing the 99system at each step, without any focus on pushing rabbits forward.  I expect this would lose to flooder, but I'd be interested nonetheless.
 
It'd be worth playing some of those bots you've made against the random_mover, since that's what most people think should be the one fixed to 0.  Then we can start arguing about ratings for the intermediate bots.
 
In fact a full crosstable of percentage wins for all pairs of bots you make is probably the best thing to calculate ratings from.
fotland
Forum Guru
*****



Arimaa player #211

   


Gender: male
Posts: 216
Re: Arimaa rating deflation
« Reply #51 on: Dec 18th, 2003, 12:22am »

on Dec 16th, 2003, 10:31am, MrBrain wrote:
But that's the whole purpose of bot_random.  To have a fixed reference point for a rating of 0.  Why else would we even be making a random bot?  If you go back to the start of this topic (actually, the first post after your initial post), you'll see that the whole reason we started talking about a random bot was that we wanted some anchor for the rating system.  What better way to do this than random=0?

 
I understand the desire to have a fixed reference point, but I think that a random player is 5000 or 10000 points weaker than the strong players.  I don't think we want to radically change the ratings of the current players, and wait for them to restabilize.  My suggestion is that initially the random player should float, to find out what its natural rating is, then make it the anchor at that rating.
 
But you know that I think the whole idea is silly :)  There will be so many stages of intermediate players between the random player and the worst human that the system will never stabilize.
MrBrain
Forum Guru
*****



Arimaa player #344

   


Gender: male
Posts: 148
Re: Arimaa rating deflation
« Reply #52 on: Dec 18th, 2003, 10:30am »

I think you are severely overestimating the number of levels between random and regular players.  As the preliminary analysis has shown, there's about a 1200 point difference between random and a bot that accomplishes a concrete strategic goal.  I would estimate (without the benefit of seeing its play) that this bot is at most about 1200 points weaker than shallow_blue.  Allowing another 600 points for an average player puts us at about 3000.  So at worst, we may need to, as I suggested before, scale the rating system so that a 100 point difference means about what a 200 point difference does now.
MrBrain
Forum Guru
*****



Arimaa player #344

   


Gender: male
Posts: 148
Re: Arimaa rating deflation
« Reply #53 on: Dec 18th, 2003, 10:35am »

on Dec 18th, 2003, 12:22am, fotland wrote:
My suggestion is that initially the random player should float, to find out what its natural rating is, then make it the anchor at that rating.

What's the difference between what you're saying, and figuring out what the natural rating would be through experimentation (what Claude is doing) followed by a one time rating adjustment?  There is none, except with the second approach, you end up with random=0, which makes sense.
 
on Dec 18th, 2003, 12:22am, fotland wrote:

there will be so many stages of intermediate players between the random player and the worst human, that the system will never stabilize.

That's the purpose of the one-time rating adjustment.  We go right to what we think is the best difference and start from there.  There won't be long-term drifting.
« Last Edit: Dec 18th, 2003, 10:38am by MrBrain »
99of9
Forum Guru
*****




Gnobby's creator (player #314)

  toby_hudson  


Gender: male
Posts: 1413
Re: Arimaa rating deflation
« Reply #54 on: Dec 18th, 2003, 10:48am »

on Dec 16th, 2003, 1:43pm, clauchau wrote:

SUL won 62.7% and RS won 37.3% of 100,000 games
 
That's not much of an improvement but I was curious about it.
 
S+I-I won 97.7% and RS won 2.3% of 100,000 games
S+F-F won 91.5% and S+I-I won 8.5% of 100,000 games.

 
I quite like the idea of bots with a fair degree of overlap, where the win ratio is near 70%.  (whether by randomisation or by very small increments in bot algorithm).  Otherwise if the win ratio is up near 100%, it's difficult to be sure of the relative ratings.
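
To put a rough number on that, here is a sketch of the sampling error of a rating difference estimated from a match, assuming the standard logistic Elo curve on a 400-point scale and independent games. The error formula is a first-order (delta method) approximation, and the function names are made up for illustration.

```python
import math


def rating_diff(win_fraction, scale=400.0):
    """Rating difference implied by a win fraction under a logistic Elo curve."""
    return scale * math.log10(win_fraction / (1.0 - win_fraction))


def rating_diff_stderr(win_fraction, n_games, scale=400.0):
    """Approximate standard error of rating_diff from n_games independent games.

    Delta method: d/dp [scale * log10(p / (1 - p))] = scale / (ln(10) * p * (1 - p)),
    multiplied by the binomial standard error of p.
    """
    p = win_fraction
    se_p = math.sqrt(p * (1.0 - p) / n_games)
    return scale / (math.log(10) * p * (1.0 - p)) * se_p
```

For the same number of games, the error at a 97.7% win rate comes out about three times the error at 70%. With 100,000 games per pairing both are small, so the concern matters mostly for shorter matches, or when the win rate gets so close to 100% that only a handful of losses are observed.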
99of9
Forum Guru
*****




Gnobby's creator (player #314)

  toby_hudson  


Gender: male
Posts: 1413
Re: Arimaa rating deflation
« Reply #55 on: Dec 18th, 2003, 10:54am »

on Dec 18th, 2003, 10:30am, MrBrain wrote:
I think you are severely overestimating the number of levels between random and regular players.

 
Actually David's estimate of [(Strong Human - Random)=~5000 (to 10000)], is not that far off my estimate of [Random Rating on Current Scale = -2000], since strong humans can have a rating over +2000.
 
But anyway, a more precise answer will eventually be established by Clauchau's bots.
 
99
99of9
Forum Guru
*****




Gnobby's creator (player #314)

  toby_hudson  


Gender: male
Posts: 1413
Re: Arimaa rating deflation
« Reply #56 on: Dec 18th, 2003, 11:05am »

on Dec 14th, 2003, 11:50am, clauchau wrote:

the mover won 54%, the stepper won 46%

 
If we define Random Mover as our 0, Random Stepper therefore has a rating of approximately -28.
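
For reference, the figure of about 28 is what the usual logistic Elo curve on a 400-point scale gives for a 54% to 46% result. The formula is an assumption here, since it is not stated above, but it reproduces the number:

```latex
\Delta = 400 \,\log_{10}\!\left(\frac{0.54}{0.46}\right) \approx 400 \times 0.0696 \approx 28
```

The Random Stepper is the losing side of that pairing, hence the minus sign.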
« Last Edit: Dec 18th, 2003, 12:42pm by 99of9 »
99of9
Forum Guru
*****




Gnobby's creator (player #314)

  toby_hudson  


Gender: male
Posts: 1413
Re: Arimaa rating deflation
« Reply #57 on: Dec 18th, 2003, 11:17am »

on Dec 16th, 2003, 1:43pm, clauchau wrote:

 
SUL won 62.7% and RS won 37.3% of 100,000 games
S+I-I won 97.7% and RS won 2.3% of 100,000 games
S+F-F won 91.5% and S+I-I won 8.5% of 100,000 games.

 
That gives SUL a rating of about 62 (90 higher than RS).
 
S+I-I is approximately at 623 (651 higher than RS)
 
S+F-F is approximately at 1036 (413 higher than S+I-I).
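
These numbers can be reproduced by chaining the quoted win percentages through the standard logistic Elo conversion on a 400-point scale. That formula is an assumption (it is not spelled out in the thread), but it matches every figure above after rounding. The variable names are only for illustration.

```python
import math


def rating_diff(win_fraction, scale=400.0):
    """Rating difference implied by a win fraction under a logistic Elo curve."""
    return scale * math.log10(win_fraction / (1.0 - win_fraction))


# Anchor: Random Mover = 0, and the mover beat the stepper 54% to 46%.
random_stepper = -rating_diff(0.54)

# Each bot is placed relative to the opponent it was measured against.
sul = random_stepper + rating_diff(0.627)  # SUL vs Random Stepper
sii = random_stepper + rating_diff(0.977)  # S+I-I vs Random Stepper
sff = sii + rating_diff(0.915)             # S+F-F vs S+I-I

for name, rating in [("RS", random_stepper), ("SUL", sul), ("S+I-I", sii), ("S+F-F", sff)]:
    print(f"{name:6s} {rating:7.1f}")
```

Note that S+F-F is only linked to the anchor through S+I-I, so any error in the 97.7% result carries straight through to its rating.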
 
 
To be honest I think we're still quite a way from Shallowblue, because at the moment, in games of Shallowblue vs S+F-F, shallowblue will simply eat every rabbit that S+F-F sends forward.  This flooding mechanism may be good against bots that don't try to trap it, but as soon as you put any trapping plan into action, flooder is dead.
 
By the way:   S+F-F is only different to S+I-I when the lead rabbit cannot make progress.  In that case S+F-F sends another rabbit forward, whereas S+I-I simply makes a random move.  Notice that this small difference in strategies resulted in a few hundred ratings points!!
MrBrain
Forum Guru
*****



Arimaa player #344

   


Gender: male
Posts: 148
Re: Arimaa rating deflation
« Reply #58 on: Dec 18th, 2003, 12:24pm »

Well, perhaps the rating difference is more than I expect (but I am almost positive it is much less than 10000).  But yes, we will definitely find out from the experiments.  I am very excited to see the results!
fotland
Forum Guru
*****



Arimaa player #211

   


Gender: male
Posts: 216
Re: Arimaa rating deflation
« Reply #59 on: Dec 19th, 2003, 1:09am »

Does anyone have an estimate of shallow blue's actual rating, since it's currently frozen?  I'm confident that ariminator will win very close to 100% against it, so perhaps shallow blue's rating is actually about 500 on the current scale.  Maybe Omar could let it float and we could see where it ends up.
 
A bigger ratings issue with using bots is that they don't learn and people do.  People will discover their weaknesses, and exploit the same weakness over and over.  This causes distortion in the relative human ratings.  Of course we already have this problem, but I don't think fixing the bot ratings will help it.
 
Finally, many people are familiar with chess ratings, yahoo ratings, etc.  If we shift the whole rating system up thousands of points and populate the familiar ratings with many bots, it will look a little odd :)
 
Still, I'm very interested in the results of the bot experiments.  I bet I could write 3 bots where bot1 beats bot2 close to 100%, bot2 beats bot3, and bot3 beats bot1.  Would that be enough to demonstrate the futility of using bots to make a more stable rating system? :)
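
A small sketch of why such a cycle is awkward for any one-number rating, assuming the standard logistic Elo conversion on a 400-point scale: the rating gaps implied around a closed loop would have to add up to zero, and three lopsided wins cannot do that. The 95% figure is purely hypothetical, as are the function names.

```python
import math


def rating_diff(win_fraction, scale=400.0):
    """Rating difference implied by a win fraction under a logistic Elo curve."""
    return scale * math.log10(win_fraction / (1.0 - win_fraction))


def cycle_inconsistency(win_fractions):
    """Sum of implied rating gaps around a closed cycle of pairings.

    win_fractions[i] is how often bot i beats the next bot in the cycle.
    If a single rating per bot explained the results, this would be 0.
    """
    return sum(rating_diff(p) for p in win_fractions)


# Hypothetical cycle: bot1 beats bot2, bot2 beats bot3, bot3 beats bot1, 95% each.
print(cycle_inconsistency([0.95, 0.95, 0.95]))
```

A large positive total means no assignment of three ratings can fit all three results at once, whatever anchor is chosen.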
 