Arimaa Forum - Arimaa rating deflation

Topic: Arimaa rating deflation (Read 30006 times)

99of9
Forum Guru

Gnobby's creator (player #314)

Posts: 1413

Re: Arimaa rating deflation
« Reply #60 on: Dec 19th, 2003, 3:03am »

on Dec 19th, 2003, 1:09am, fotland wrote:

Still, I'm very interested in the results of the bot experiments. I bet I could write 3 bots where bot1 beats bot2 close to 100%, bot2 beats bot3, and bot3 beats bot1. Would that be enough to demonstrate the futility of using bots to make a more stable rating system?

People have alluded to this; it will be interesting to see whether any of Clauchau's bots do it. I don't think a set of bots specifically designed for this is interesting unless they are fairly generic strategies. Nor do I think this breaks the system; we should include these cycles in our analysis. In the end we should probably put all Bot-Bot results in a big matrix and do some diagonalization or something. I agree the rankings for the higher bots will not be watertight, but they will be approximately correct.
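
One way to make the "big matrix" idea concrete is to fit a single rating per bot to all pairwise results at once. The sketch below is only an illustration under stated assumptions: it swaps in a Bradley-Terry maximum-likelihood fit (not a diagonalization), it assumes the usual 400-point logistic rating curve, and the names `fit_ratings` and `wins` are hypothetical. Fed a perfect three-bot cycle like the one fotland describes, it rates all three bots equal, which is one way such cycles could be absorbed into the analysis.

```python
import math

def fit_ratings(wins, anchor=0, iterations=200):
    """Fit Bradley-Terry strengths to a matrix of pairwise results.

    wins[i][j] is the number of games bot i won against bot j.
    Assumes every bot has at least one win and at least one game.
    Returns ratings on a 400-point logistic scale, shifted so that
    the bot at index `anchor` is rated 0.
    """
    n = len(wins)
    strength = [1.0] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n) if j != i
            )
            new.append(total_wins / denom)
        # Normalise so the strengths do not drift in scale.
        mean = sum(new) / n
        strength = [s / mean for s in new]
    ratings = [400 * math.log10(s) for s in strength]
    return [r - ratings[anchor] for r in ratings]

# Hypothetical cycle: each bot wins 90 of 100 games against the next one.
cycle = [[0, 90, 10],
         [10, 0, 90],
         [90, 10, 0]]
print([round(r) for r in fit_ratings(cycle)])   # [0, 0, 0]

# Hypothetical pair: bot 1 wins 75 of 100 games against bot 0.
pair = [[0, 25],
        [75, 0]]
print([round(r) for r in fit_ratings(pair)])    # [0, 191]
```

The fit always returns some rating for every bot, so a cycle does not break it; what it cannot do is predict the lopsided individual matchups inside the cycle from those ratings.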

« Last Edit: Dec 19th, 2003, 3:04am by 99of9 »

clauchau
Forum Guru

bot Quantum Leapfrog's father

Posts: 145

Re: Arimaa rating deflation
« Reply #61 on: Dec 20th, 2003, 5:15am »

K = every friendly piece on the board is worth one point
S = Arimaa score R + P*(C+1)

+X -Y means that X, as viewed by the player to move, is maximized above all. Among steps or moves of equal value, Y as viewed by the opposing player is minimized.

Below are percentages of wins. Every figure is based on 100,000 games when only Stepping strategies are involved. When one or both players use a Moving strategy, only 1,000 games have been sampled.

        S      M      S+K-K  S+I    S+I-I  M+I-I  S+F-F  S+S-S  M+F-F  M+S-S
S
M       54
S+K-K   82.6   81
S+I     93.0   -      59.5
S+I-I   97.7   98.5   67.1   71.7
M+I-I   99.9   99.9   -      -      84
S+F-F   99.97  99.90  -      -      91.5   66
S+S-S   99.98  -      -      -      95.6   72     64.4
M+F-F   -      -      -      -      97.0   74     59     52
M+S-S   -      -      -      -      -      -      72     66     53.5

(- = no figure given)

« Last Edit: Dec 22nd, 2003, 6:43am by clauchau »

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: Arimaa rating deflation
« Reply #62 on: Dec 20th, 2003, 9:40am »

Wow Claude, you've done a really impressive job of collecting some good stats on the random bots. Thanks so much for doing this.

Would it be possible for you to send me a copy of your program so I can also try out some experiments? Actually, I was thinking that we really should keep a repository of the programs we use for the random bots and the other simple bots we use for anchoring the rating system. I can make it available under the download section of the Arimaa site so that others can also look at the code and experiment with it. It would be great if you could contribute this.

Omar

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: Arimaa rating deflation
« Reply #63 on: Dec 20th, 2003, 9:45am »

on Dec 16th, 2003, 8:46am, MrBrain wrote:

That all sounds very good. Any chance you could get the random mover, possibly called "bot_random", on line and start playing against other bots? (I suppose Omar would first have to implement a change to the rating system so that bot_random's rating stays at 0 no matter what, but the opponent's rating moves accordingly.)

Actually I would not need to make any changes. If I just set the RU to zero (meaning there is no uncertainty about the player's rating), then the rating and RU of that player will not change. Pretty nice how the equations just work out that way.

Omar

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: Arimaa rating deflation
« Reply #64 on: Dec 20th, 2003, 10:52am »

I have a simple question. What is the rating of the perfect tic-tac-toe program if the rating of the random program is set to zero and we use the Arimaa rating equations to establish the rating scale:
http://arimaa.com/airmaa/rating/

Would we be able to independently come up with the same value? Of course we would all come up with a different value due to sampling differences, but how different would it be? Would they be, say, within 100 rating points of one another, or would they be way off?

I think this is worth investigating to learn more about anchored rating systems before we use it in more complex games.

Omar

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: Arimaa rating deflation
« Reply #65 on: Dec 20th, 2003, 10:53am »

Had a typo in the link; it should be:
http://arimaa.com/arimaa/rating/

Omar

99of9
Forum Guru

Gnobby's creator (player #314)

Posts: 1413

Re: Arimaa rating deflation
« Reply #66 on: Dec 20th, 2003, 11:51am »

Does the perfect tic-tac-toe program know that its opponent is playing random? If so it might play differently.

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: Arimaa rating deflation
« Reply #67 on: Dec 20th, 2003, 1:23pm »

on Dec 20th, 2003, 11:51am, 99of9 wrote:

Does the perfect tic-tac-toe program know that its opponent is playing random? If so it might play differently.

No, it should not make any assumptions about the opponent other than that the opponent will try to make the best possible move.

Omar

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: Arimaa rating deflation
« Reply #68 on: Dec 20th, 2003, 1:49pm »

Claude was nice enough to contribute his random bot program. I have put it up on the Arimaa site:
http://arimaa.com/arimaa/download/randomBot/claude/r.cpp

I haven't had a chance to try it out yet.

Here are the notes that Claude sent me with the program:

Here is the C++ program I use to get statistics
among bots. Unfortunately I get 3 errors when compiling
with GNU gcc, although it looks fine and compiles
with the Borland C++ compiler. Tell me if you figure out my
mistake. In any case, feel free to include it on your
download page and allow people to use and alter it.

clauchau
Forum Guru

bot Quantum Leapfrog's father

Posts: 145

Re: Arimaa rating deflation
« Reply #69 on: Dec 21st, 2003, 1:24pm »

on Dec 20th, 2003, 10:52am, omar wrote:

Would we be able to independently come up with the same value? Of course we would all come up with a different value due to sampling differences, but how different would it be? Would they be, say, within 100 rating points of one another, or would they be way off?

Calculations involving confidence intervals in sampling statistics show that

if you sample 100 games between bots X and Y and get a certain proportion x% of wins for X, then you are 99.9% sure the real proportion is between (x-17)% and (x+17)%;

if you sample 1,000 games between bots X and Y and get a certain proportion x% of wins for X, then you are 99.9% sure the real proportion is between (x-5.2)% and (x+5.2)%;

if you sample 10,000 games between bots X and Y and get a certain proportion x% of wins for X, then you are 99.9% sure the real proportion is between (x-1.7)% and (x+1.7)%;

if you sample 100,000 games between bots X and Y and get a certain proportion x% of wins for X, then you are 99.9% sure the real proportion is between (x-0.52)% and (x+0.52)%;

if you sample 1,000,000 games between bots X and Y and get a certain proportion x% of wins for X, then you are 99.9% sure the real proportion is between (x-0.17)% and (x+0.17)%.

Being 99.9% sure means the real proportion might lie outside that interval, but only one sample in 1,000 would then have yielded an estimate x% that far from the real proportion. In other words, if you trust the intervals given above, you will match the truth 999 times out of 1,000 and be deceived about once in every 1,000 times.

When the sampled proportion x% is close to 100%, the interval is narrower. For x% = 85% you can replace every 17 above by 12 and every 52 by 37, and it gets much better closer to 100%.

As a result, the difference of ratings between bots X and Y lies -- with a 99.9% level of confidence -- in an interval [A,B] whose length B-A is about

240 points if x%=50% (430 points if x%=85%) when you sample 100 games;

73 points if x%=50% (103 points if x%=85%) when you sample 1,000 games;

23 points if x%=50% (32 points if x%=85%) when you sample 10,000 games;

7.2 points if x%=50% (10.1 points if x%=85%) when you sample 100,000 games;

2.3 points if x%=50% (3.2 points if x%=85%) when you sample 1,000,000 games.

(hmm, I hope this was clear).
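
The figures in this post can be roughly reproduced with a short sketch. It is not Claude's own calculation: it assumes the textbook normal approximation for a sampled proportion and the standard 400-point logistic rating curve, and the function names are hypothetical. The `confidence` parameter can be lowered to see how the intervals shrink at weaker confidence levels.

```python
import math
from statistics import NormalDist

def proportion_interval(p, games, confidence=0.999):
    """Normal-approximation interval for the true win proportion."""
    # Two-sided critical value, e.g. about 3.29 for 99.9% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p * (1 - p) / games)
    return p - half_width, p + half_width

def rating_difference(p):
    """Rating gap implied by win proportion p (400-point logistic scale)."""
    return 400 * math.log10(p / (1 - p))

def rating_interval_length(p, games, confidence=0.999):
    """Length B-A of the rating-difference interval [A, B]."""
    low, high = proportion_interval(p, games, confidence)
    return rating_difference(high) - rating_difference(low)

# One row per sample size: interval length at x%=50% and at x%=85%.
for games in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(games,
          round(rating_interval_length(0.50, games), 1),
          round(rating_interval_length(0.85, games), 1))
```

The sketch only works while both ends of the proportion interval stay strictly between 0 and 1; for very lopsided results on few games the normal approximation breaks down and a different interval would be needed.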

« Last Edit: Dec 21st, 2003, 1:35pm by clauchau »

fotland
Forum Guru

Arimaa player #211

Gender: male

Posts: 216

Re: Arimaa rating deflation
« Reply #70 on: Dec 21st, 2003, 1:35pm »

Thanks for the confidence data, Claude. Doesn't this mean that there is no hope for a truly stable rating system? Even with 100 games, the high confidence interval is still 240 points. It's unlikely that two people will play that many games between them.

From the results of the world championship it seems that the top 4 players are very similar in strength, but from month to month their ratings have a spread of over 100 points. It looks now that this kind of spread is inherent in the sampling process, due to a small number of games played, and that no rating system can do better.

David

clauchau
Forum Guru

bot Quantum Leapfrog's father

Posts: 145

Re: Arimaa rating deflation
« Reply #71 on: Dec 21st, 2003, 2:06pm »

If we lower the confidence level to 90% instead of 99.9% -- so we accept being deceived one time in ten -- then 240 turns into 116, which is still pretty large. But most serious people will play 100 games, some even against fixed bots, and get to know their rating within that range. Not too bad.

For 1,000 games, 73 turns into 36. Looks like the indeterminacy level around Chess master ratings?

MrBrain
Forum Guru

Arimaa player #344

Gender: male

Posts: 148

Re: Arimaa rating deflation
« Reply #72 on: Dec 21st, 2003, 9:08pm »

on Dec 20th, 2003, 10:52am, omar wrote:

Actually, this is not as simple a question as it sounds. The reason is that tic-tac-toe is a theoretical draw, and in many cases there are several drawing moves. For example, on the first move, any square draws. If the first player picks center, the second must pick a corner; if the first player picks corner, the second must pick center. However, if the first player picks a side box, both center and the opposite side box draw.

So the question of whether the perfect player knows it's playing against a random bot is actually relevant. In most cases, there will be a "better" move that increases the chances of winning against a random player. However, if the "perfect player" believes it's playing against a typical human, for example, it may be best to play a particular opening that humans will fall for most frequently.

For example, if playing first, then corner, center, opposite corner is a good opening trick. If the second player then picks one of the two remaining corners, the first player will win. Another example is to pick a side box as first player since this is a somewhat unusual opening.

Anyway, in tic-tac-toe, it may make the most sense to define a "perfect player" as one that maximizes its winning chances against a random opponent, since tic-tac-toe is a simple game. However, it is easy to see that extending this definition to a complex game such as Arimaa would be ridiculous.

We could instead define the perfect tic-tac-toe player as one that chooses randomly from all equally optimal moves. However, this may not give the best chances of winning against either a random opponent or a human opponent. In Arimaa, such a definition would lead to an extremely strong player. But in a game such as Arimaa, you also might have to consider that a move winning the game quicker is "superior" to one that forces a win in a larger number of moves.
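
For tic-tac-toe, the second definition is small enough to evaluate exactly. The sketch below is an illustration only: it assumes the "perfect" player picks uniformly among all minimax-optimal moves, that the random bot picks uniformly among all legal moves, and that X always moves first. All names are hypothetical. It returns exact probabilities of win, draw, and loss for the perfect player; how draws should feed into a rating is left open here.

```python
from fractions import Fraction
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
EMPTY = " " * 9

def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]

def play(board, square, mark):
    return board[:square] + mark + board[square + 1:]

def other(mark):
    return "O" if mark == "X" else "X"

@lru_cache(maxsize=None)
def minimax(board, to_move):
    """Game-theoretic value for X: +1 X wins, 0 draw, -1 O wins."""
    won = winner(board)
    if won:
        return 1 if won == "X" else -1
    moves = legal_moves(board)
    if not moves:
        return 0
    values = [minimax(play(board, m, to_move), other(to_move)) for m in moves]
    return max(values) if to_move == "X" else min(values)

@lru_cache(maxsize=None)
def outcome(board, to_move, perfect):
    """Exact (win, draw, loss) probabilities for the `perfect` side."""
    won = winner(board)
    if won:
        if won == perfect:
            return (Fraction(1), Fraction(0), Fraction(0))
        return (Fraction(0), Fraction(0), Fraction(1))
    moves = legal_moves(board)
    if not moves:
        return (Fraction(0), Fraction(1), Fraction(0))
    if to_move == perfect:
        # Keep only the moves that preserve the minimax value.
        best = minimax(board, to_move)
        moves = [m for m in moves
                 if minimax(play(board, m, to_move), other(to_move)) == best]
    results = [outcome(play(board, m, to_move), other(to_move), perfect)
               for m in moves]
    return tuple(sum(r[k] for r in results) / len(results) for k in range(3))

print("perfect player moves first: ", *outcome(EMPTY, "X", "X"))
print("perfect player moves second:", *outcome(EMPTY, "X", "O"))
```

Replacing the uniform choice among optimal moves with the move that maximizes the win probability would give the first definition instead, and comparing the two results would show how much the choice of definition matters.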

« Last Edit: Dec 21st, 2003, 10:04pm by MrBrain »

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: Arimaa rating deflation
« Reply #73 on: Jan 20th, 2004, 9:03pm »

Things have been busy lately and I have not been able to think about this much. But in the long run I still want to go to an anchored rating system once we get more familiar with them.

Omar

Fritzlein
Forum Guru

Arimaa player #706

Posts: 5928

Re: Arimaa rating deflation
« Reply #74 on: Sep 15th, 2004, 9:37pm »

This thread had many interesting posts, including fascinating statistics produced by clauchau. I would like to add my voice, however, to those who are skeptical of the whole project of providing an "absolute" scale to the ratings.

Assigning a rating of 0 to a random mover (or stepper) makes sense to me. That's absolute, and reasonable. Now suppose I create a second algorithm VeryDumb which beats the random bot 3/4 of the time. By the rating formula, VeryDumb should be rated 191. This also can be considered absolute.

Then suppose I create the bot TotallyNuts, which beats VeryDumb 3/4 of the time and the random bot 4/5 of the time. By the rating formula, TotallyNuts should be rated 241 points above the random bot and 191 points above VeryDumb.

What should the rating of TotallyNuts be, then, 241 or 382? If you say 382, then it isn't absolute. If you say 241, it is still absolute, but what good is it? You can't tell from the ratings of VeryDumb and TotallyNuts how they will do against each other, only how well each does against the random bot. If we anchored ratings that way, they would be essentially meaningless.
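
The arithmetic in this example can be checked with a short sketch. It assumes the standard 400-point logistic rating curve, which reproduces the 191 and 241 figures; the function names are hypothetical. It also shows what the chained rating of 382 would predict: a 90% win rate against the random bot, against the stated 4/5.

```python
import math

def rating_gap(win_fraction):
    """Rating difference implied by a win fraction (400-point logistic)."""
    return 400 * math.log10(win_fraction / (1 - win_fraction))

def expected_win_fraction(gap):
    """Win fraction predicted for the player rated `gap` points higher."""
    return 1 / (1 + 10 ** (-gap / 400))

very_dumb = rating_gap(3 / 4)                  # VeryDumb vs the random bot
nuts_direct = rating_gap(4 / 5)                # TotallyNuts vs the random bot
nuts_chained = very_dumb + rating_gap(3 / 4)   # TotallyNuts via VeryDumb

print(round(very_dumb), round(nuts_direct), round(nuts_chained))  # 191 241 382
print(round(expected_win_fraction(nuts_chained), 3))              # 0.9
```

On this curve, rating gaps add exactly when the win odds multiply (3 to 1 twice gives 9 to 1), so any set of bots whose odds do not multiply this way cannot all be placed consistently on one scale.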

The problem is that ratings aren't truly transitive. You can't infer from A's results against B and from B's results against C, exactly what A's results against C will be. With humans this is a slight problem, but with bots it can become a huge problem. Indeed, at least two people have pointed out that with deterministic bots, the ratings formula doesn't work at all.

I could give many examples of ways in which ratings aren't transitive, but I will spare you unless someone asks. The important point is that if you don't have transitivity, then the notion of putting the ratings on an absolute scale becomes meaningless. The rating of TotallyNuts depends on whether VeryDumb is in the playing pool or not. And if the ratings depend on who is in the playing pool, then they are by definition only on a relative scale.

A far more pressing concern than an absolute scale for ratings is making sure that, while they are necessarily relative to the field, the ratings are accurate against the field as a whole. One shouldn't be able to find a favorable matchup and exploit it. I, for example, have gotten a rating of 1950 without beating a single human opponent. I know how to beat bots, that's all, so I have a ridiculously inflated rating. The meaningfulness of the ratings would be enhanced far more by forcing people to play a variety of opponents than it would by trying to anchor it "absolutely".
