Arimaa Forum - Arimaa rating deflation

Topic: Arimaa rating deflation (Read 31613 times)

fotland
Forum Guru

Arimaa player #211

Gender: male

Posts: 216

Arimaa rating deflation
« on: Oct 11th, 2003, 2:44pm »

It seems like the established players have gotten a lot stronger lately. Certainly the ratings of the bots have been pushed way down, and it's not easy for me to keep speedy improving as fast as the top players.

This has caused the ratings to be much tougher. A 1500 player is much stronger today than 6 months ago.

I'm concerned that since new players come in at 1500, they might get discouraged that they end up with much smaller ratings after a few games.

My suggestion is to either have new players come in with a lower provisional rating, like 1400 or 1300, or add 100 or 200 rating points to everyone.

-David

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: Arimaa rating deflation
« Reply #1 on: Nov 19th, 2003, 12:43am »

The choice of using 1500 for the initial rating was kind of arbitrary. But even if I set the initial ratings to 1300 the beginners will still lose to the easy bots until they get used to basic ideas of the game (and still get discouraged).

I really think that beginners would be able to consistently beat the easy bots if they went through an introduction to the game, learned the basic principles and did some practice puzzles. I'm working on such pages as time permits.

The one thing that bothers me about this whole rating system is that it's not anchored to anything. What does a rating of 1500 mean anyways? Only rating differences have a meaning which can be translated to probability of outcomes. But the absolute rating values don't really have any meaning; so the whole scale could be shifted to whatever we want.
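To make the point about differences concrete, here is a minimal sketch. It assumes the standard Elo logistic curve with a 400-point scale; the thread does not state the exact formula the Arimaa site uses, so `expected_score` and its `scale` parameter are illustrative names only.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo curve (assumed here)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Only the difference matters: shifting both ratings by 200 changes nothing.
low_pair = expected_score(1500, 1300)
high_pair = expected_score(1700, 1500)
print(low_pair, high_pair)
```

Both calls print the same value, which is why the whole scale can be shifted without changing any prediction.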

I've been thinking of anchoring the scale so that a program which plays a completely random game (i.e. generates the list of all possible moves and randomly selects one) is defined as having a rating of 0. This program would be allowed to play rated games, but its rating would never change; only the ratings of its opponents would change. Layers of progressively better (but still quite naive) programs would establish their ratings based on this program and each other. The beginners could then be given ratings comparable to the better of these programs. The intermediate players would establish their ratings based on the beginners and each other, and so on up to the advanced players.

This would provide a floor so that ratings can't get deflated. Also it would keep the ratings scale from drifting over time.
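A rough sketch of what such an anchor could look like in an Elo-style update. The K-factor of 32, the 400-point scale and the name `naive_bot` are assumptions for illustration, not details of the actual Arimaa rating code.

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Standard Elo curve with a 400-point scale (an assumption).
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def update(ratings: dict, anchors: set, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one rated game; players listed in `anchors` never change rating."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    if winner not in anchors:
        ratings[winner] += gain
    if loser not in anchors:
        ratings[loser] -= gain


ratings = {"bot_random": 0.0, "naive_bot": 0.0}
anchors = {"bot_random"}  # the random bot is pinned at 0
for _ in range(100):
    update(ratings, anchors, winner="naive_bot", loser="bot_random")
print(ratings)
```

The anchored bot stays at 0 while its opponent's rating moves, so every other rating ends up expressed relative to the anchor.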

Omar

fotland
Forum Guru

Arimaa player #211

Gender: male

Posts: 216

Re: Arimaa rating deflation
« Reply #2 on: Nov 19th, 2003, 10:17pm »

I think your random bot would be worse than you think.

It would never win a single game, so it would cause the rest of the ratings to inflate forever.

Is it really a problem that the ratings are arbitrary and not anchored?

You have an anchor anyway, since shallow blue has a rating that never changes.

99of9
Forum Guru

Gnobby's creator (player #314)

Gender:

Posts: 1413

Re: Arimaa rating deflation
« Reply #3 on: Nov 20th, 2003, 4:51am »

You're right, random play is seriously bad! But I think that's why we'd need other naive bots. And actually, other ratings wouldn't inflate forever anyway, as long as the system was similar to the current one - even a 100% win rate reaches an equilibrium.

Shallowblue doesn't really have a proper rating, it's only played a few rated games. And I don't think it's really fair to say that something not even participating in the ratings is somehow "anchored".

But we do kind of have an anchor for our whole distribution. Namely, the average rating should always stay at 1500 (as long as newbies always come in with 1500). Now, fair enough, this means a new player has to drop before (s)he can gain, but changing the incoming rating to 1200 wouldn't help, because then, over time, the average rating would become 1200 (after enough new users had been added). Then the same problem could occur with newbies dropping rapidly below 1200 at first. This is always going to be the case in a conservative ratings scheme.
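The "average stays at 1500" argument can be checked with a small simulation. This is only a sketch: it assumes a plain zero-sum Elo update with the same K-factor for both players and no provisional formula, which may not match the real system, and the game results are arbitrary.

```python
import random


def expected_score(r_a: float, r_b: float) -> float:
    # Standard Elo curve with a 400-point scale (an assumption).
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def play(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Zero-sum update: whatever the winner gains, the loser gives up."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


rng = random.Random(0)
ratings = {f"p{i}": 1500.0 for i in range(10)}  # everyone enters at 1500
for _ in range(1000):
    a, b = rng.sample(sorted(ratings), 2)  # arbitrary pairing; a wins
    play(ratings, winner=a, loser=b)

mean = sum(ratings.values()) / len(ratings)
print(round(mean, 6))
```

Individual ratings spread out, but the mean stays at the entry rating, so under these assumptions the entry rating is what pins the average.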

So I think Omar's suggestion has merit. The random bot at 0 would be nice for theoretical prettiness, but it might take a long time for the ratings to equilibrate, so maybe it's just better to set a bot we all know - say arimaazilla - to 1500.

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: Arimaa rating deflation
« Reply #4 on: Nov 28th, 2003, 10:57pm »

Well I've played a random bot against itself and the games usually end in about 30 to 60 moves, which is not that different than normal games.

I have not done any experiments to see how a random bot compares to a 1 ply bot. But even if a 1 ply bot won 100% of the games, we could always make a weaker bot from the 1 ply bot by having it look 1 ply with a fixed probability and select random moves otherwise. By using different values for this probability we can get a range of bots between the random bot and the 1 ply bot. Similarly we could produce a range of bots between a 1 and 2 ply bot. I have not had the time to do any such experiments yet. If anyone is interested in doing them and reporting the results, it would give us a good idea of how many layers of naive bots we might need before we get to bots that play like human beginners. Keep in mind that shallowBlue is a 1 ply bot and Arimaazilla and Arimaanator are 2 ply bots (differing only in the evaluation function). The 2 ply bots definitely pose a good challenge for beginners.
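The "look 1 ply with a fixed probability" idea could be sketched roughly as follows. The function `mixed_choice`, its parameters and the toy evaluation are all hypothetical; a real bot would plug in an Arimaa move generator and a 1 ply search in place of the stand-ins.

```python
import random
from typing import Callable, Sequence, TypeVar

Move = TypeVar("Move")


def mixed_choice(legal_moves: Sequence[Move],
                 one_ply_choice: Callable[[Sequence[Move]], Move],
                 p_search: float,
                 rng: random.Random) -> Move:
    """With probability p_search use the 1 ply pick, otherwise move at random."""
    if rng.random() < p_search:
        return one_ply_choice(legal_moves)
    return rng.choice(legal_moves)


# Toy stand-in: moves are integers and the "1 ply evaluation" prefers the largest.
rng = random.Random(1)
moves = list(range(10))
picks = [mixed_choice(moves, max, 0.5, rng) for _ in range(10_000)]
share_best = picks.count(9) / len(picks)
print(share_best)
```

Setting `p_search` to 0 gives the pure random bot and 1 gives the pure 1 ply bot, so sweeping it between those values yields the intermediate layers described above.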

It could turn out that in an anchored rating system the ratings of average players may be much higher than what they are now. But they would not increase indefinitely.

An anchored rating system which does not drift over time and is independent of the current population of players allows somewhat reasonable comparisons of player ratings from different time periods. In chess, for example, people often speculate about how Fischer would do against Kasparov. These kinds of comparisons could be done if the chess rating system had been anchored.

Also, if other games adopted an anchored rating system, it might allow us to compare the complexity of different games. For example, the ratings of the best Go players may be much higher than the ratings of the best Chess players. Or maybe they would be the same; who knows. But still it would be an interesting comparison.

Omar

fotland
Forum Guru

Arimaa player #211

Gender: male

Posts: 216

Re: Arimaa rating deflation
« Reply #5 on: Nov 29th, 2003, 1:55am »

I don't think it's possible to make a stable rating system unless there is an unchanging player who plays a lot to be the anchor. NNGS uses a group of older players who aren't improving as anchors. But Arimaa doesn't have that yet. I wouldn't recommend anchoring from the bottom, since a random player is so much weaker. It would push up all the ratings, and cause a lot of instability until ratings stabilized at the new levels.

The go rating system is anchored at the top, at 9 dan. This works because it is an old game, and the very strongest players are very close.

99of9
Forum Guru

Gnobby's creator (player #314)

Gender:

Posts: 1413

Re: Arimaa rating deflation
« Reply #6 on: Nov 29th, 2003, 7:16am »

But that Go system cannot really be called anchored, just upper bounded. If a new amazing (computer?) player that never lost came along, he would be 9 dan, and everyone else would deflate.

clauchau
Forum Guru

bot Quantum Leapfrog's father

Gender:

Posts: 145

Re: Arimaa rating deflation
« Reply #7 on: Nov 29th, 2003, 12:53pm »

Quote:

we can get a range of bots between the random bot and the 1 ply bot.

One problem is - we can get an ordered chain of millions of bots between any two fixed bots such that any bot in the chain always loses against the next bot in the chain...

MrBrain
Forum Guru

Arimaa player #344

Gender: male

Posts: 148

Re: Arimaa rating deflation
« Reply #8 on: Nov 29th, 2003, 2:24pm »

To have a "fixed" rating scale, one needs only to have one robot whose style never changes. A completely random bot seems to me to be the best standard that we could implement. The bot should always play rated games, but its rating should be fixed and never change.

But what does "completely random" mean? There are two possibilities:
1. Each possible move order that leads to a legal position (a change in position from the previous move) would be chosen with equal probability.
2. Each possible legal position that could result after any move order would be chosen with equal probability. Then any move order that achieves this position could be chosen.

For the first move of the game (placement of pieces), this distinction is moot, as either #1 or #2 would lead to the same starting positions with the same probability. However, for subsequent moves, this distinction does make a difference. For example, consider a move where four different animals take one step forward, compared with a move where the elephant takes four steps forward. Using #1, the four-animal move would be 24 times as likely (4!) as the four-space elephant move, since there are 24 different move orders that achieve the end position of the four-animal move, but only one move order that achieves the 4-space elephant move end position. With #2, both end positions would be equally likely.
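The factor of 24 can be reproduced with a toy count. This sketch assumes the four steps are independent of each other (no traps, blocking, pushing or pulling), so it models only the counting argument and not real Arimaa move generation; the piece names are arbitrary.

```python
from itertools import permutations


def end_position(steps):
    """Resulting position in the toy model: how many steps each piece has taken."""
    counts = {}
    for piece in steps:
        counts[piece] = counts.get(piece, 0) + 1
    return tuple(sorted(counts.items()))


# Distinct step orders for each kind of move.
four_animals = set(permutations(["cat", "dog", "horse", "camel"]))
elephant_only = set(permutations(["elephant"] * 4))

print(len(four_animals), len({end_position(s) for s in four_animals}))
print(len(elephant_only), len({end_position(s) for s in elephant_only}))
```

The four-animal move has 24 distinct step orders that all reach one position, while the elephant move has a single order. Sampling uniformly over orders (#1) therefore weights the two end positions 24 to 1, and sampling uniformly over positions (#2) weights them equally.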

So I would recommend making a bot "bot_random" that follows strictly #1 or #2, and setting its rating to a low value (1000 seems too high to me, maybe 600 is better). Then, allow it to play rated games, but have its rating never change.

By the way, which of #1 or #2 was implemented before? Was either? (It seems to me that it would be easy to implement something that approximates either #1 or #2, but more difficult to implement #1 or #2 strictly.)

« Last Edit: Nov 29th, 2003, 2:44pm by MrBrain »

fotland
Forum Guru

Arimaa player #211

Gender: male

Posts: 216

Re: Arimaa rating deflation
« Reply #9 on: Nov 29th, 2003, 3:00pm »

I don't think you realize how very weak a random bot would be. Someone did some experiments with random go bots a few years ago, thinking that they would be 30 or 35 kyu, but they are much weaker.

If you set a random bot to 600, you might push up the entire rating system by several thousand points, and it would take forever to stabilize at the new level.

David

MrBrain
Forum Guru

Arimaa player #344

Gender: male

Posts: 148

Re: Arimaa rating deflation
« Reply #10 on: Nov 29th, 2003, 3:04pm »

Well someone recommended 1000, and I thought that seemed too high. That's why I said lower is better. Perhaps a rating of 0 would aesthetically make the most sense, because any bot that does worse than that is playing worse than random, and therefore deserves a negative rating!

99of9
Forum Guru

Gnobby's creator (player #314)

Gender:

Posts: 1413

Re: Arimaa rating deflation
« Reply #11 on: Nov 29th, 2003, 5:59pm »

Random bot would indeed be very bad. I'd estimate, compared to the current ratings system, that it plays at a rating of approx -2000. I tried making a random bot when I first made Gnobby, and it regularly walked pieces directly into traps.

As David points out, this makes any direct comparison with humans unreasonable. Therefore, to make it workable, we'd need intermediate bots. Claude, I'd suggest that all intermediate bots be somewhat stochastic to prevent the problem you suggest.

If the ratings were to be reset, I agree that random should be set at 0. Of Mr Brain's methods, I'd choose #2.

MrBrain
Forum Guru

Arimaa player #344

Gender: male

Posts: 148

Re: Arimaa rating deflation
« Reply #12 on: Nov 29th, 2003, 7:04pm »

I'm not sure you understand how low a rating of -2000 is. Even a random bot wouldn't be that low, as opponents would sometimes resign, lose on time, etc. A rating of -2000 would mean that it would lose virtually every time to a player with rating -1200, and that player loses every time to a player with rating -400, and that player loses every time to... etc. This is not realistic. I think a rating of 0 would be a good place to start. And you wouldn't really have to reset anything. The ratings of existing players would drift to match the fixed system. What might be needed, however, are some bots that are somewhere between random and poor. Right now, shallowblue would beat a random bot just about every time, meaning that its rating would never go down to the level that you'd expect from random=0.

« Last Edit: Nov 29th, 2003, 7:09pm by MrBrain »

MrBrain
Forum Guru

Arimaa player #344

Gender: male

Posts: 148

Re: Arimaa rating deflation
« Reply #13 on: Nov 29th, 2003, 7:08pm »

Actually, if you had shallowblue play rated games both against humans and against the random bot, I believe that you'd get a stable rating for shallowblue. Right now, shallowblue's rating is too high, compared with other players. I think a rating of about 800 to 1000 for shallowblue seems about right. And that seems about the right number of points above random (0) for its strength to me.

MrBrain
Forum Guru

Arimaa player #344

Gender: male

Posts: 148

Re: Arimaa rating deflation
« Reply #14 on: Nov 29th, 2003, 7:29pm »

Thinking about this some more. Yes, I may be underestimating the poorness of random. However, I think I prefer option #2 as well, which to me seems like it would be slightly stronger than #1, since on average it will tend to make moves that achieve more than with method #1. The question is, how many "levels" of bots would be between random and shallowblue? I believe that at worst, there would be a bot that could beat random 99% of the time, while it could beat shallowblue 1% of the time. That would give shallowblue a rating of about 1600. That would then be somewhat of a ratings inflation.
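The 1600 estimate follows from inverting the win-probability curve. The sketch below assumes the standard Elo logistic curve with a 400-point scale, which the thread does not confirm for Arimaa; `rating_gap` is an illustrative name.

```python
import math


def rating_gap(win_probability: float, scale: float = 400.0) -> float:
    """Rating difference implied by a given win probability (standard Elo curve assumed)."""
    return scale * math.log10(win_probability / (1.0 - win_probability))


gap = rating_gap(0.99)   # one 99% step
total = 2 * gap          # random -> intermediate bot -> shallowblue
print(gap, total)
```

Each 99% step is worth a little under 800 points, so two such steps above a random bot fixed at 0 land close to 1600.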

Another question then becomes, what rating do you give to new players? Perhaps the current average of all players would make sense.

Another concern is the provisional ratings formula for new players. If they were to play a bot more than 400 points below their current rating, they would lose points even after a win. (USCF recently changed their formulas to avoid such weirdnesses.) A solution might be to disallow rated games in such situations. Or make such games automatically unrated. Just a thought.
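The effect can be illustrated with a small calculation. The thread does not give the provisional formula, so this sketch assumes the classic performance-rating rule in which each game counts as the opponent's rating plus 400 for a win or minus 400 for a loss; the ratings used are made up.

```python
def provisional_rating(results):
    """Average of (opponent rating +/- 400) over all games; an assumed formula."""
    scores = [opp + (400.0 if won else -400.0) for opp, won in results]
    return sum(scores) / len(scores)


history = [(1500.0, True), (1500.0, True)]  # two wins against 1500-rated players
before = provisional_rating(history)
# A further win, this time against a bot rated 900 points below `before`.
after = provisional_rating(history + [(1000.0, True)])
print(before, after)
```

Under that assumed rule the win counts as only 1400, which is below the current average, so the rating falls even though the game was won.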
