Author |
Topic: Arimaa rating deflation (Read 30011 times) |
|
fotland
Forum Guru
Arimaa player #211
Posts: 216
|
|
Re: Arimaa rating deflation
« Reply #45 on: Dec 16th, 2003, 9:35am » |
|
bot_random shouldn't have a fixed rating, or it will distort the rating system. The problem is that bot_random will lose every game and end up 700 points lower than the lowest bot, but that won't be its true rating.
|
|
|
|
|
MrBrain
Forum Guru
Arimaa player #344
Posts: 148
|
|
Re: Arimaa rating deflation
« Reply #46 on: Dec 16th, 2003, 10:31am » |
|
But that's the whole purpose of bot_random. To have a fixed reference point for a rating of 0. Why else would we even be making a random bot? If you go back to the start of this topic (actually, the first post after your initial post), you'll see that the whole reason we started talking about a random bot was that we wanted some anchor for the rating system. What better way to do this than random=0?
|
« Last Edit: Dec 16th, 2003, 10:34am by MrBrain » |
|
|
|
clauchau
Forum Guru
bot Quantum Leapfrog's father
Posts: 145
|
|
Re: Arimaa rating deflation
« Reply #47 on: Dec 16th, 2003, 1:43pm » |
|
Yep, and that's why we'll also have intermediate bots. Here are some first results for elementary bots that now acknowledge the goal winning condition.

The Stepping Ultimate Lookout makes random steps, except that it steps onto the goal with a rabbit whenever some step achieves that immediately (without ever caring first to get rabbits closer), and it never pulls or pushes one of the opponent's rabbits onto the opposing goal.

Stepping Ultimate Lookout / Random Stepper: SUL won 62.7% and RS won 37.3% of 100,000 games.

That's not much of an improvement, but I was curious about it.

The Stepping +Infiltrator -Infiltrator makes random steps among the steps maximizing
16*(advancement of the most advanced rabbit) - (advancement of the opponent's most advanced rabbit)
where advancement = 8 on the goal.

Stepping +Infiltrator -Infiltrator / Random Stepper: S+I-I won 97.7% and RS won 2.3% of 100,000 games.

Now the Stepping +Flooder -Flooder focusses on getting as many rabbits as possible onto the goal, then onto the row before, etc., then on the first row, then on getting as few of his opponent's rabbits as possible on the opposing goal, then as few on the row before, etc.

Stepping +Flooder -Flooder / Stepping +Infiltrator -Infiltrator: S+F-F won 91.5% and S+I-I won 8.5% of 100,000 games.
Wins by goal reached: 99.75%
Loss by pulling or pushing on the opposing goal: 0 (none)
The loser was unable to move: 0.25%
Loss by 3-times repetition: 2 games
Shortest game = 6 half moves
Mean length = 33.1 half moves (sd = 14.3)
Longest game = 188

There is more, but those are the most important results. I didn't get any elementary Stepping bots stronger than that Flooder. In particular, the official scoring function makes a weaker stepping bot (and moving bot as well).
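The +Infiltrator -Infiltrator selection rule described above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the post: each side's rabbits are given as a list of advancement values (1 on the home row up to 8 on the goal row), an empty list counts as advancement 0, and `candidate_steps` is a hypothetical list of `(step, own_advancements, opp_advancements)` tuples describing the position after each legal step.

```python
import random

def infiltrator_score(own_adv, opp_adv):
    """16 * (own most advanced rabbit) - (opponent's most advanced rabbit)."""
    # Assumption: a side with no rabbits left contributes advancement 0.
    return 16 * max(own_adv, default=0) - max(opp_adv, default=0)

def pick_step(candidate_steps, rng=random):
    """Pick uniformly at random among the steps with the highest score.

    candidate_steps is a hypothetical list of
    (step, own_advancements_after, opp_advancements_after) tuples.
    """
    scored = [(infiltrator_score(own, opp), step)
              for step, own, opp in candidate_steps]
    best = max(score for score, _ in scored)
    return rng.choice([step for score, step in scored if score == best])
```

The weight of 16 means any gain by the bot's own lead rabbit outweighs any change in the opponent's lead rabbit, since the opponent term can vary by at most 8.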
|
|
|
|
|
MrBrain
Forum Guru
Arimaa player #344
Posts: 148
|
|
Re: Arimaa rating deflation
« Reply #48 on: Dec 16th, 2003, 2:35pm » |
|
Nice results so far! I'd also be interested in seeing your first non-random bot play against the random mover, since this is the agreed 0-rating floor bot.
|
|
|
|
|
MrBrain
Forum Guru
Arimaa player #344
Posts: 148
|
|
Re: Arimaa rating deflation
« Reply #49 on: Dec 16th, 2003, 2:37pm » |
|
From my knowledge of how chess ratings relate to win probabilities, I'd estimate (very preliminary back-of-the-envelope calculation) that your Stepping +Flooder -Flooder bot would be about 1200 rating points better than the random stepper.

Actually, if we can do this kind of analysis before we anchor the rating system at 0 for bot_random, we should be able to estimate a one-time adjustment to all current ratings. For example, if we find that a person rated around 1500 would instead be 2300 with a random=0 anchor, then we can simply add 800 points to everyone's ratings. This will prevent a long and inaccurate period where people's ratings drift at different rates depending on how much they play.

Actually, if we find that the adjustment would be really great (like more than 2000 points), it may be aesthetically pleasing to both scale AND shift the entire rating system. For example, instead of having mean ratings be 3800, we could change the scaling factor in the ratings formulas from 800 to 400 so that a difference of 100 points then would be what a difference of 200 points is now. But again, that's just a preference, not a necessity. Some sort of shift will probably be necessary though if we don't want a long period of inaccurate ratings.
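The shift-and-scale adjustment described above might look like the sketch below. The function name `reanchor` and its parameters are made up for illustration; the -800 position of the random bot is only the hypothetical figure implied by the 1500 -> 2300 example, and 800 and 400 are the scaling factors mentioned in the post.

```python
def reanchor(rating, random_old, scale_old=800.0, scale_new=800.0):
    """Shift ratings so the random bot sits at 0, then optionally rescale.

    random_old is the random bot's (estimated) rating on the current scale.
    With scale_new == scale_old this is a pure shift.
    """
    return (rating - random_old) * (scale_new / scale_old)

# Pure shift: a 1500 player becomes 2300 if random sits at -800 today.
shifted = reanchor(1500, random_old=-800)

# Shift and scale: halving the scaling factor halves every rating gap.
shifted_and_scaled = reanchor(1500, random_old=-800, scale_old=800, scale_new=400)
```

Because the same affine map is applied to everyone, all win expectancies are preserved as long as the rating formula's scaling factor is changed to match.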
|
« Last Edit: Dec 16th, 2003, 2:50pm by MrBrain » |
|
|
|
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
|
|
Re: Arimaa rating deflation
« Reply #50 on: Dec 16th, 2003, 2:41pm » |
|
Keep it coming! This is all very interesting.

I'd also be interested in how a complete materialist would do in this scheme of things, e.g. implementing the 99system at each step, without any focus on pushing rabbits forward. I expect this would lose to flooder, but I'd be interested nonetheless.

It'd be worth playing some of those bots you've made against the random_mover, since that's what most people think should be the one fixed to 0. Then we can start arguing about ratings for the intermediate bots.

In fact, a full crosstable of percentage wins for all pairs of bots you make is probably the best thing to calculate ratings from.
|
|
|
|
|
fotland
Forum Guru
Arimaa player #211
Posts: 216
|
|
Re: Arimaa rating deflation
« Reply #51 on: Dec 18th, 2003, 12:22am » |
|
on Dec 16th, 2003, 10:31am, MrBrain wrote:
"But that's the whole purpose of bot_random. To have a fixed reference point for a rating of 0. Why else would we even be making a random bot? If you go back to the start of this topic (actually, the first post after your initial post), you'll see that the whole reason we started talking about a random bot was that we wanted some anchor for the rating system. What better way to do this than random=0?"

I understand the desire to have a fixed reference point, but I think that a random player is 5000 or 10000 points weaker than the strong players. I don't think we want to radically change the ratings of the current players, and wait for them to restabilize. My suggestion is that initially the random player should float, to find out what its natural rating is, then make it the anchor at that rating.

But you know that I think the whole idea is silly, because there will be so many stages of intermediate players between the random player and the worst human, that the system will never stabilize.
|
|
|
|
|
MrBrain
Forum Guru
Arimaa player #344
Posts: 148
|
|
Re: Arimaa rating deflation
« Reply #52 on: Dec 18th, 2003, 10:30am » |
|
I think you are severely overestimating the number of levels between random and regular players. As the preliminary analysis has shown, there's about a 1200 point difference between random and a bot that accomplishes a concrete strategic goal. I would estimate (without the benefit of seeing its play) that this bot is at most about 1200 points worse than shallow_blue. Allowing another 600 points for an average player puts us at about 3000. So at worst, we may need to, as I suggested before, scale the rating system so that a 100 point difference means about what a 200 point difference does now.
|
|
|
|
|
MrBrain
Forum Guru
Arimaa player #344
Posts: 148
|
|
Re: Arimaa rating deflation
« Reply #53 on: Dec 18th, 2003, 10:35am » |
|
on Dec 18th, 2003, 12:22am, fotland wrote:
"My suggestion is that initially the random player should float, to find out what its natural rating is, then make it the anchor at that rating."

What's the difference between what you're saying and figuring out what the natural rating would be through experimentation (what Claude is doing), followed by a one-time rating adjustment? There is none, except that with the second approach you end up with random=0, which makes sense.

on Dec 18th, 2003, 12:22am, fotland wrote:
"there will be so many stages of intermediate players between the random player and the worst human, that the system will never stabilize."

That's the purpose of the one-time rating adjustment. We go right to what we think is the best difference and start from there. There won't be long-term drifting.
|
« Last Edit: Dec 18th, 2003, 10:38am by MrBrain » |
|
|
|
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
|
|
Re: Arimaa rating deflation
« Reply #54 on: Dec 18th, 2003, 10:48am » |
|
on Dec 16th, 2003, 1:43pm, clauchau wrote:
"SUL won 62.7% and RS won 37.3% of 100,000 games
That's not much of an improvement but I was curious about it.
S+I-I won 97.7% and RS won 2.3% of 100,000 games
S+F-F won 91.5% and S+I-I won 8.5% of 100,000 games."

I quite like the idea of bots with a fair degree of overlap, where the win ratio is near 70% (whether by randomisation or by very small increments in bot algorithm). Otherwise, if the win ratio is up near 100%, it's difficult to be sure of the relative ratings.
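One way to make the point about lopsided results concrete is to estimate how uncertain the implied rating gap is for a given number of games. The sketch below is an assumption-laden illustration, not anything computed in the thread: it assumes the usual logistic Elo curve with a 400-point scale, independent games, and a first-order (delta-method) approximation; the name `rating_diff_stderr` is made up.

```python
import math

def rating_diff_stderr(p, n_games, scale=400.0):
    """Approximate standard error of scale*log10(p/(1-p)) after n_games.

    Delta method: se(p) = sqrt(p*(1-p)/n) multiplied by the slope
    d/dp [scale*log10(p/(1-p))] = scale / (ln(10) * p * (1-p)).
    """
    return scale / (math.log(10) * math.sqrt(n_games * p * (1.0 - p)))

# For the same number of games, a 97.7% result pins down the rating gap
# far less precisely than a 70% result does.
se_near_70 = rating_diff_stderr(0.70, 100)
se_near_100 = rating_diff_stderr(0.977, 100)
```

Under these assumptions the error grows without bound as the win ratio approaches 100%, which is why chains of closely matched bots give tighter estimates per game played.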
|
|
|
|
|
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
|
|
Re: Arimaa rating deflation
« Reply #55 on: Dec 18th, 2003, 10:54am » |
|
on Dec 18th, 2003, 10:30am, MrBrain wrote:
"I think you are severely overestimating the number of levels between random and regular players."

Actually David's estimate of [(Strong Human - Random)=~5000 (to 10000)] is not that far off my estimate of [Random Rating on Current Scale = -2000], since strong humans can have a rating over +2000. But anyway, a more precise answer will eventually be established by Clauchau's bots.

99
|
|
|
|
|
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
|
|
Re: Arimaa rating deflation
« Reply #56 on: Dec 18th, 2003, 11:05am » |
|
on Dec 14th, 2003, 11:50am, clauchau wrote:
"the mover won 54%, the stepper won 46%"

If we define Random Mover as our 0, Random Stepper therefore has a rating of approximately -28.
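The -28 figure is consistent with the usual logistic Elo relation between win fraction and rating gap, taken with a scale of 400 points. That formula is an assumption here, since the post does not say which one was used, but it reproduces the number:

```latex
\[
D = 400 \log_{10}\frac{p}{1-p},
\qquad
p = 0.54 \;\Rightarrow\;
D = 400 \log_{10}\frac{0.54}{0.46} \approx 27.9
\]
```

With Random Mover fixed at 0, Random Stepper sits about 28 points below it.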
|
« Last Edit: Dec 18th, 2003, 12:42pm by 99of9 » |
|
|
|
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
|
|
Re: Arimaa rating deflation
« Reply #57 on: Dec 18th, 2003, 11:17am » |
|
on Dec 16th, 2003, 1:43pm, clauchau wrote:
"SUL won 62.7% and RS won 37.3% of 100,000 games
S+I-I won 97.7% and RS won 2.3% of 100,000 games
S+F-F won 91.5% and S+I-I won 8.5% of 100,000 games."

That gives SUL a rating of about 62 (90 higher than RS).
S+I-I is approximately at 623 (651 higher than RS).
S+F-F is approximately at 1036 (413 higher than S+I-I).

To be honest I think we're still quite a way from Shallowblue, because at the moment, in games of Shallowblue vs S+F-F, Shallowblue will simply eat every rabbit that S+F-F sends forward. This flooding mechanism may be good against bots that don't try to trap it, but as soon as you put any trapping plan into action, flooder is dead.

By the way: S+F-F is only different to S+I-I when the lead rabbit cannot make progress. In that case S+F-F sends another rabbit forward, whereas S+I-I simply makes a random move. Notice that this small difference in strategies resulted in a few hundred ratings points!!
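The ratings above can be reproduced by chaining pairwise results, as in the sketch below. It assumes the standard logistic Elo curve with a 400-point scale, which is not stated in the post but matches its numbers; the function name `rating_diff` is made up for the illustration, and the win fractions are the ones reported in the thread.

```python
import math

def rating_diff(p, scale=400.0):
    """Rating gap implied by win fraction p under a logistic Elo curve."""
    return scale * math.log10(p / (1.0 - p))

# Random Mover (RM) is the anchor at 0; each bot is placed relative to the
# opponent it was actually tested against.
rs = -rating_diff(0.54)          # RM beat RS 54% to 46%
sul = rs + rating_diff(0.627)    # SUL vs RS
sii = rs + rating_diff(0.977)    # S+I-I vs RS
sff = sii + rating_diff(0.915)   # S+F-F vs S+I-I

ratings = {
    "RS": round(rs),
    "SUL": round(sul),
    "S+I-I": round(sii),
    "S+F-F": round(sff),
}
```

Note that S+F-F's rating rests on two chained estimates (S+I-I vs RS, then S+F-F vs S+I-I), so errors in either link carry through to it.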
|
|
|
|
|
MrBrain
Forum Guru
Arimaa player #344
Posts: 148
|
|
Re: Arimaa rating deflation
« Reply #58 on: Dec 18th, 2003, 12:24pm » |
|
Well, perhaps the rating difference is more than I expected (but I am almost positive it is much less than 10000). But yes, we will definitely find out from the experiments. I am very excited to see the results!
|
|
|
|
|
fotland
Forum Guru
Arimaa player #211
Posts: 216
|
|
Re: Arimaa rating deflation
« Reply #59 on: Dec 19th, 2003, 1:09am » |
|
Does anyone have an estimate of shallow blue's actual rating, since it's currently frozen? I'm confident that ariminator will win very close to 100% against it, so perhaps shallow blue's rating is actually about 500 on the current scale. Maybe Omar could let it float and we could see where it ends up.

A bigger ratings issue with using bots is that they don't learn and people do. People will discover their weaknesses, and exploit the same weakness over and over. This causes distortion in the relative human ratings. Of course we already have this problem, but I don't think fixing the bot ratings will help it.

Finally, many people are familiar with chess ratings, yahoo ratings, etc. If we shift the whole rating system up thousands of points and populate the familiar ratings with many bots, it will look a little odd.

Still, I'm very interested in the results of the bot experiments. I bet I could write 3 bots where bot1 beats bot2 close to 100%, bot2 beats bot3, and bot3 beats bot1. Would that be enough to demonstrate the futility of using bots to make a more stable rating system?
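The three-bot cycle can be checked against a single-number rating model with a small sketch. It assumes the logistic Elo curve with a 400-point scale; the 95% win rates are purely hypothetical, since no figures are given for such bots, and the function names are made up.

```python
import math

def rating_diff(p, scale=400.0):
    """Rating gap implied by win fraction p under a logistic Elo curve."""
    return scale * math.log10(p / (1.0 - p))

def cycle_gap(p12, p23, p31):
    """Sum of implied rating gaps around bot1 -> bot2 -> bot3 -> bot1.

    p12 is bot1's win fraction against bot2, and so on. Any single set of
    ratings forces this sum to be 0, so a large value means no rating
    assignment can explain all three pairings at once.
    """
    return rating_diff(p12) + rating_diff(p23) + rating_diff(p31)

# Hypothetical cyclic bots: each one beats the next 95% of the time.
cyclic = cycle_gap(0.95, 0.95, 0.95)

def expected(r_a, r_b, scale=400.0):
    """Win fraction of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

# Bots that really do have ratings 300, 100 and 0 close the loop exactly.
consistent = cycle_gap(expected(300, 100), expected(100, 0), expected(0, 300))
```

A nonzero cycle sum shows the pairwise results cannot all be matched by one rating per bot; how much that would distort ratings in a larger pool is a separate question the sketch does not address.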
|
|
|
|
|
|