Arimaa Forum - (no) absolute score values for pieces?

Topic: (no) absolute score values for pieces? (Read 44091 times)

rbarreira

Re: (no) absolute score values for pieces?
« Reply #90 on: Jul 28th, 2010, 2:33pm »

It seems logical to me that the side with the advantage always benefits from trading equal material.

Isn't this the whole idea behind FAME?

Fritzlein

« Reply #91 on: Jul 28th, 2010, 3:10pm »

on Jul 28th, 2010, 2:33pm, rbarreira wrote:

It seems logical to me that the side with the advantage always benefits from trading equal material.

Isn't this the whole idea behind FAME?

I wouldn't say it is the whole idea.

A strong part of my motivation to get away from fixed values for the pieces was that a camel goes down in value on an emptier board, while rabbits go up in value. It was clear even then that Bomb vastly overvalued its camel in an endgame, to its own detriment.

A subtle point is that the side with the advantage might have fewer pieces, e.g. M for HR. The side with the camel has an advantage, but equal trades can turn the advantage into a disadvantage in a big hurry.

on Jul 28th, 2010, 2:13pm, jdb wrote:

I have been doing some tests with the various material evaluators using Janzert's roundrobin program. The games are 10 sec per move, so a game takes around 10 minutes. I'll post the results when there are enough games.

Fantastic. I am very curious to see your results. I wonder whether the results will be statistically significant, or the genuine differences in evaluation will be drowned in noise.

Quote:

Assume there is a H for d trade. Who benefits from equal trades? The side with the extra H benefits from equal trades of camels or horses. This gets them closer to having the strongest piece. What about equal trades of dogs, cats or rabbits?

Assume one side has an extra rabbit. Who benefits from equal trades? Trading rabbits eventually leads to 2 rabbits vs 1 rabbit. The extra rabbit becomes a huge advantage. What about trading cats, dogs, horses or camels? Eventually this leads to E vs e with an extra rabbit. This also looks like a big advantage for the extra rabbit.

Yes, every equal trade should benefit a player with an extra rabbit. Use with caution, though, because an extra rabbit is a small advantage on a full board and a still small (albeit greater) advantage when it gets down to E8R vs. e7r. Against a computer opponent I might not want to trade down because I expect the endgame to be its forte despite my material advantage.

Quote:

Assume one side has an extra camel. Who benefits from equal trades? FAME/HarLog puts the initial advantage at 5.64/6.48. With everything but the rabbits traded off, leaving EM8R vs E8R, FAME/HarLog is 6.38/3.80. Finally, for EMR vs er, FAME/HarLog is 8.46/5.31. This looks like an area for improvement. What is correct in this case?

HarLog performs better than FAME on this one, although I don't entirely trust HarLog either because of how it treats rabbits. When I have an extra camel, I feel that any equal trade weakens my position, even if only slightly. My objective when I am up a camel is always to get a better-than-equal trade or (ideally) to get something for nothing. I feel sufficiently strongly about this that I think it would make a good litmus test of the kind you are seeking, i.e. any material evaluator that likes any equal trade (except elephants) when up a camel is just wrong and should be replaced by some modified form of itself.

But the effect is not strong. If I am up a camel on a full board, trading dog for dog only hurts me a little, whereas winning a rabbit outright helps me significantly, so I would be happy to win DR for D.

« Last Edit: Jul 28th, 2010, 3:20pm by Fritzlein »

jdb

« Reply #92 on: Jul 28th, 2010, 4:07pm »

on Jul 28th, 2010, 3:10pm, Fritzlein wrote:

A subtle point is that the side with the advantage might have fewer pieces, e.g. M for HR. The side with the camel has an advantage, but equal trades can turn the advantage into a disadvantage in a big hurry.

I agree the number of pieces is very important, both relative and absolute.

Quote:

Fantastic. I am very curious to see your results. I wonder whether the results will be statistically significant, or the genuine differences in evaluation will be drowned in noise.

Bayeselo is wonderful.

Quote:

I personally am indifferent to trade of weaker pieces. My reasoning is that emptying the board destabilizes the position, which benefits the player who is behind in material. On the other hand, the fewer pieces are on the board, the more likely it is for the mismatch to be relevant. I let the two opposing considerations cancel out in my mind, although I am sure sometimes one is more important than the other. This is an area where I wouldn't feel confident to tell a material evaluator that it was wrong whichever way it was leaning.

Yes, every equal trade should benefit a player with an extra rabbit. Use with caution, though, because an extra rabbit is a small advantage on a full board and a still small (albeit greater) advantage when it gets down to E8R vs. e7r. Against a computer opponent I might not want to trade down because I expect the endgame to be its forte despite my material advantage.

I'll run some test games with E8R vs e7r and see what happens.

Quote:

I'll have to think about this. If everything is traded off it comes down to EMR vs er, which is a big advantage. But as you said, there is a period during the trades where the position can be destabilized. Maybe it is necessary to define when the material evaluator can be applied. That is, it's only valid in a stable position without a lot of threats.

jdb

« Reply #93 on: Jul 29th, 2010, 4:24pm »

Here are the results of some testing between FAME, HarLog, and Constant. Time control was 10 sec per move.

Rank Name                Elo    +    - games score oppo. draws
   1 Clueless_FAME      2221   40   39   121   54%  2189    0%
   2 Clueless_HarLog    2210   40   39   120   52%  2195    0%
   3 Clueless_Constant  2169   39   40   121   45%  2215    0%

Fritzlein

« Reply #94 on: Jul 29th, 2010, 7:07pm »

Nice, thanks for sharing. Is the reported +/- two standard deviations? It appears that FAME and HarLog are statistically significantly better than constant piece values, but statistically indistinguishable from each other.

Did you tell me once that the other positional factors in Clueless are tuned to work with FAME? If so, would that put HarLog at a relative disadvantage?

jdb

« Reply #95 on: Jul 29th, 2010, 7:48pm »

The table was generated by bayeselo. I don't know what the +/- means.

The value of an initial rabbit is normalized for all material evaluators in Clueless's eval. I don't think it would matter to HarLog.

Janzert

« Reply #96 on: Jul 29th, 2010, 8:17pm »

The +/- columns are the confidence interval. I'm fairly sure it is a 95% CI, although I can't find anything explicitly stating it right now. I did find this post by Rémi Coulom on talkchess describing the four methods bayeselo has available for calculating the CI.

Quote:

Bayeselo offers 4 different algorithms for computing confidence intervals. This is the list of options, from the least accurate and fastest to the most accurate and slowest:

* Default: assume opponents' ratings are their true ratings, and a Gaussian distribution
* "exactdist": assume opponents' ratings are their true ratings, but do not assume a Gaussian distribution. This will produce asymmetric intervals, especially for very high or very low winning rates. Cost is linear in the number of players.
* "covariance": assume a Gaussian distribution, but not that the ratings of opponents are true. This may be very costly if you have thousands of players, but it is more accurate than the default. The cost is cubic in the number of players (it is a matrix inversion)
* "jointdist": computes a numerical estimation of the whole distribution. It is the most accurate, but the cost is exponential in the number of players. May work for 3-4 players. You should reduce the resolution of the discretization for more players.
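To get a feel for what a Gaussian interval of this kind looks like, here is a minimal sketch that turns a score fraction into an Elo difference with a rough 95% interval. It uses the plain logistic Elo curve and a normal approximation to the binomial; that is an assumption made for illustration and not bayeselo's actual algorithm, and the function names and the example win count are made up.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1) on the logistic curve."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def rough_interval(wins, games, z=1.96):
    """Normal-approximation interval for the Elo difference to the opposition.

    Treats every game as an independent trial with no draws; z=1.96 is roughly 95%.
    Returns (low, point estimate, high) in Elo.
    """
    p = wins / games
    se = math.sqrt(p * (1.0 - p) / games)
    lo = max(p - z * se, 1e-9)        # clamp so the logarithm stays defined
    hi = min(p + z * se, 1.0 - 1e-9)
    return elo_from_score(lo), elo_from_score(p), elo_from_score(hi)

# Hypothetical input: 65 wins in 121 games (roughly a 54% score).
low, mid, high = rough_interval(65, 121)
print(f"{mid:+.0f} Elo, roughly {low:+.0f} to {high:+.0f}")
```

The numbers will not match the bayeselo table, since bayeselo fits all ratings jointly, but the sketch shows why an interval from about a hundred games stays wide.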

The output from the los (likelihood-of-superiority) command would also be interesting to see.

Janzert

rbarreira

« Reply #97 on: Jul 30th, 2010, 3:17am »

on Jul 29th, 2010, 7:07pm, Fritzlein wrote:

Actually if you take the two extremes, static values may be as high as 2208 while FAME may be as low as 2182. Or am I misunderstanding something?
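The arithmetic behind those two extremes can be checked with a short sketch. The ratings and margins are copied from the bayeselo table in Reply #93; the helper names are made up for illustration. Overlapping intervals are only a crude check and do not by themselves settle whether a difference is significant.

```python
# (Elo, plus margin, minus margin) as reported by bayeselo in Reply #93.
results = {
    "Clueless_FAME": (2221, 40, 39),
    "Clueless_HarLog": (2210, 40, 39),
    "Clueless_Constant": (2169, 39, 40),
}

def interval(name):
    """Lower and upper end of the reported interval for one bot."""
    elo, plus, minus = results[name]
    return elo - minus, elo + plus

def overlaps(a, b):
    """True if the two reported intervals share at least one rating value."""
    lo_a, hi_a = interval(a)
    lo_b, hi_b = interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

for a, b in [("Clueless_FAME", "Clueless_Constant"),
             ("Clueless_HarLog", "Clueless_Constant"),
             ("Clueless_FAME", "Clueless_HarLog")]:
    verdict = "overlap" if overlaps(a, b) else "are separate"
    print(a, interval(a), "and", b, interval(b), verdict)
```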

Unfortunately it is necessary to test a very high number of games to test most changes, at least search-related ones... I have more or less accepted that I won't be able to conclusively test many of the changes I do to my bot.

Not everyone has a big cluster like Dr. Robert Hyatt, and CPU time at Amazon EC2 isn't cheap enough for me.

« Last Edit: Jul 30th, 2010, 3:18am by rbarreira »

jdb

« Reply #98 on: Jul 30th, 2010, 5:52am »

The games are 10 sec per move. All pieces are set up on the first two ranks.

Rank Name             Elo    +    - games score oppo. draws
   1 Clueless_EC6R   2292   34   33   203   65%  2156    0%
   2 Clueless_E8R    2262   27   26   303   65%  2130    0%
   3 Clueless_E7R    2046   28   29   300   25%  2272    0%

Output of LOS:

                Cl  Cl  Cl
Clueless_EC6R       86 100
Clueless_E8R    13     100
Clueless_E7R     0   0

Output of detail:

1 Clueless_EC6R 2292 203.0 (131.0 :  72.0)
    103.0 ( 52.0 :  51.0) Clueless_E8R 2262
    100.0 ( 79.0 :  21.0) Clueless_E7R 2046
2 Clueless_E8R 2262 303.0 (196.0 : 107.0)
    103.0 ( 51.0 :  52.0) Clueless_EC6R 2292
    200.0 (145.0 :  55.0) Clueless_E7R 2046
3 Clueless_E7R 2046 300.0 ( 76.0 : 224.0)
    100.0 ( 21.0 :  79.0) Clueless_EC6R 2292
    200.0 ( 55.0 : 145.0) Clueless_E8R 2262

Now running tournament with:

E8R,EC6R,E7R,EC5R,ECC4R,ECC3R

Code:

Rank Name             Elo    +    - games score oppo. draws
   1 Clueless_ECC4R  2362   52   48   143   71%  2168    0%
   2 Clueless_EC6R   2307   32   31   348   65%  2164    0%
   3 Clueless_E8R    2263   28   27   448   63%  2149    0%
   4 Clueless_EC5R   2114   47   49   143   38%  2217    0%
   5 Clueless_ECC3R  2113   47   48   143   38%  2217    0%
   6 Clueless_E7R    2041   29   30   445   26%  2263    0%

                Cl  Cl  Cl  Cl  Cl  Cl
Clueless_ECC4R      94  99  99  99 100
Clueless_EC6R    5      97  99  99 100
Clueless_E8R     0   2      99  99 100
Clueless_EC5R    0   0   0      51  98
Clueless_ECC3R   0   0   0  48      98
Clueless_E7R     0   0   0   1   1

1 Clueless_ECC4R 2362 143.0 (102.0 :  41.0)
     29.0 ( 15.0 :  14.0) Clueless_EC6R 2307
     29.0 ( 20.0 :   9.0) Clueless_E8R 2263
     28.0 ( 20.0 :   8.0) Clueless_EC5R 2114
     28.0 ( 25.0 :   3.0) Clueless_ECC3R 2113
     29.0 ( 22.0 :   7.0) Clueless_E7R 2041
2 Clueless_EC6R 2307 348.0 (227.0 : 121.0)
     29.0 ( 14.0 :  15.0) Clueless_ECC4R 2362
    132.0 ( 69.0 :  63.0) Clueless_E8R 2263
     29.0 ( 24.0 :   5.0) Clueless_EC5R 2114
     29.0 ( 16.0 :  13.0) Clueless_ECC3R 2113
    129.0 (104.0 :  25.0) Clueless_E7R 2041
3 Clueless_E8R 2263 448.0 (280.0 : 168.0)
     29.0 (  9.0 :  20.0) Clueless_ECC4R 2362
    132.0 ( 63.0 :  69.0) Clueless_EC6R 2307
     29.0 ( 18.0 :  11.0) Clueless_EC5R 2114
     29.0 ( 21.0 :   8.0) Clueless_ECC3R 2113
    229.0 (169.0 :  60.0) Clueless_E7R 2041
4 Clueless_EC5R 2114 143.0 ( 55.0 :  88.0)
     28.0 (  8.0 :  20.0) Clueless_ECC4R 2362
     29.0 (  5.0 :  24.0) Clueless_EC6R 2307
     29.0 ( 11.0 :  18.0) Clueless_E8R 2263
     28.0 ( 16.0 :  12.0) Clueless_ECC3R 2113
     29.0 ( 15.0 :  14.0) Clueless_E7R 2041
5 Clueless_ECC3R 2113 143.0 ( 55.0 :  88.0)
     28.0 (  3.0 :  25.0) Clueless_ECC4R 2362
     29.0 ( 13.0 :  16.0) Clueless_EC6R 2307
     29.0 (  8.0 :  21.0) Clueless_E8R 2263
     28.0 ( 12.0 :  16.0) Clueless_EC5R 2114
     29.0 ( 19.0 :  10.0) Clueless_E7R 2041
6 Clueless_E7R 2041 445.0 (116.0 : 329.0)
     29.0 (  7.0 :  22.0) Clueless_ECC4R 2362
    129.0 ( 25.0 : 104.0) Clueless_EC6R 2307
    229.0 ( 60.0 : 169.0) Clueless_E8R 2263
     29.0 ( 14.0 :  15.0) Clueless_EC5R 2114
     29.0 ( 10.0 :  19.0) Clueless_ECC3R 2113
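For comparison with the LOS matrix above, here is a rough sketch of a likelihood-of-superiority estimate computed from a single head-to-head pairing. It uses the common normal approximation on wins and losses, which is an assumption for illustration and not what bayeselo does: bayeselo derives LOS from the ratings fitted on all games, so its figures differ (it reports 94 for ECC4R over EC6R, where the 15:14 head-to-head alone says very little). The function name is made up.

```python
import math

def los(wins, losses):
    """Approximate probability that the first player is the stronger one.

    Normal approximation based only on decisive games between the two players.
    """
    if wins + losses == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# Head-to-head counts copied from the detail output above.
pairs = {
    ("Clueless_ECC4R", "Clueless_EC6R"): (15, 14),
    ("Clueless_EC6R", "Clueless_E8R"): (69, 63),
    ("Clueless_E8R", "Clueless_E7R"): (169, 60),
    ("Clueless_EC5R", "Clueless_ECC3R"): (16, 12),
}

for (a, b), (w, l) in pairs.items():
    print(f"{a} over {b}: {100 * los(w, l):.0f}%")
```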

« Last Edit: Jul 31st, 2010, 9:47am by jdb »

Hippo

« Reply #99 on: Jul 31st, 2010, 1:49pm »

on Jul 29th, 2010, 4:24pm, jdb wrote:

It would be interesting to test HA(FA)ME as well ...

jdb

« Reply #100 on: Jul 31st, 2010, 5:59pm »

on Feb 17th, 2010, 2:09am, Hippo wrote:

So far I have:
1) the FAME rabbit evaluation
2) let SGN_i,j be the result of comparing the i-th strongest gold piece with the j-th strongest silver piece.
I have a matrix of coefficients C_i,j. The sum of the coordinate-wise products of these two matrices is the second summand.
3) the last summand contains 1 for each stone type present and 10000 for a rabbit being present (added for gold and subtracted for silver).

C is symmetric: first try
(250  27  10   3   1   0   0   0)
( 27  90  19   7   2   0   0   0)
( 10  19  60  13   6   1   0   0)
(  3   7  13  40   9   2   0   0)
(  1   2   6   9  30   3   1   0)
(  0   0   1   2   3  10   3   1)
(  0   0   0   0   1   3  10   3)
(  0   0   0   0   0   1   3   7)

BTW: a diagonal C with diagonal
256 85 57 38 25 17 11 7 gives the original FAME (first 2 summands). Maybe I should have started much nearer to this matrix.
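The second summand in the quoted description can be sketched roughly as follows. The matrix is the "first try" C from the quote. The strength encoding, the choice to ignore rabbits and the padding of missing pieces with 0 are assumptions made for this illustration, so the numbers need not match what Hippo's own program produces; the function names are made up as well.

```python
# First-try coefficient matrix C from the quoted post (symmetric).
C = [
    [250, 27, 10, 3, 1, 0, 0, 0],
    [27, 90, 19, 7, 2, 0, 0, 0],
    [10, 19, 60, 13, 6, 1, 0, 0],
    [3, 7, 13, 40, 9, 2, 0, 0],
    [1, 2, 6, 9, 30, 3, 1, 0],
    [0, 0, 1, 2, 3, 10, 3, 1],
    [0, 0, 0, 0, 1, 3, 10, 3],
    [0, 0, 0, 0, 0, 1, 3, 7],
]

# Assumed encoding: cat=1 ... elephant=5; 0 marks an empty slot. Rabbits are ignored here.
STRENGTH = {"C": 1, "D": 2, "H": 3, "M": 4, "E": 5}

def strongest_eight(army):
    """Strengths of the non-rabbit pieces, strongest first, padded with 0 to length 8."""
    s = sorted((STRENGTH[p] for p in army.upper() if p in STRENGTH), reverse=True)[:8]
    return s + [0] * (8 - len(s))

def sign(x):
    return (x > 0) - (x < 0)

def second_summand(gold, silver, coeff=C):
    """Sum over i, j of coeff[i][j] * SGN(i-th gold piece vs j-th silver piece)."""
    g, s = strongest_eight(gold), strongest_eight(silver)
    return sum(coeff[i][j] * sign(g[i] - s[j]) for i in range(8) for j in range(8))

full = "EMHHDDCC"
print(second_summand(full, full))        # equal armies cancel out
print(second_summand(full, "EHHDDCC"))   # gold is up a camel
```

Because C is symmetric and SGN flips sign when the two sides are swapped, the summand is 0 for equal armies and changes sign when gold and silver trade places.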

I can test this too but I am unsure how to handle part 3.

Hippo

« Reply #101 on: Aug 1st, 2010, 12:27pm »

on Jul 31st, 2010, 5:59pm, jdb wrote:

I can test this too but I am unsure how to handle part 3.

The last summand is not important (the high value of the last rabbit is covered by the elimination rule tests, and the at most 5 points are too small to be notable) ... A player gains 1 point if he has a cat, 1 point if he has a dog, 1 point if he has a horse, 1 point if he has a camel and 1 point if he has an elephant.
In that case EDC8R/em8r is 1 point better for gold than EHH8R/em8r.

I was trying to make a puzzle where the advantage of stone diversity is important, but I don't think it would ever be important in a real Arimaa game, so you can ignore the third summand.

Here is the code of the evaluation I was trying (I have not implemented a bot yet).

Code:

   // Strengths of the 8 strongest pieces, strongest first.
   // pieces[t] is the number of pieces of type t; pieces[0] counts rabbits
   // and higher indices are stronger pieces. A piece of type t gets power
   // t + 1 (so a rabbit has power 1) and unused slots get power 0.
   static int[] f_powers(int[] pieces)
   {
       int[] powers = new int[8];
       int k = 0;
       for (int i = pieces.Length; i > 0; i--)
           for (int j = 0; j < pieces[i - 1] && k < 8; j++)
               powers[k++] = i;
       for (; k < 8; k++) powers[k] = 0;
       return powers;
   }

   // Material evaluation from gold's point of view. f_pieces (not shown in
   // this post) decodes an army into its per-type piece counts.
   static long f_HAME0eval(int g, int s)
   {
       int[] gPieces = f_pieces(g), sPieces = f_pieces(s);
       int[] gPowers = f_powers(gPieces), sPowers = f_powers(sPieces);
       int[,] weights = new int[8, 8]
       {{2130,  210,   40,    0,    0,    0,    0,    0},
        { 210,  540,  130,   20,    0,    0,    0,    0},
        {  40,  130,  400,   70,   10,    0,    0,    0},
        {   0,   20,   70,  270,   40,   10,    0,    0},
        {   0,    0,   10,   40,  180,   30,    0,    0},
        {   0,    0,    0,   10,   30,  120,   20,    0},
        {   0,    0,    0,    0,    0,   20,   70,   10},
        {   0,    0,    0,    0,    0,    0,   10,   60}};
       int gLastNonRabbit = 0, sLastNonRabbit = 0, maxLastNonRabbit = 0;
       long score = 0;
       // Count the slots that hold non-rabbit pieces (power > 1) on each side.
       for (int i = 0; i < 8; i++)
       {
           if (gPowers[i] > 1) maxLastNonRabbit = gLastNonRabbit = i + 1;
           if (sPowers[i] > 1) maxLastNonRabbit = sLastNonRabbit = i + 1;
           if (i == maxLastNonRabbit) break;
       }
       // Second summand: weighted comparison of every gold slot with every
       // silver slot, over the first maxLastNonRabbit slots.
       for (int i = 0; i < maxLastNonRabbit; i++)
           for (int j = 0; j < maxLastNonRabbit; j++)
           {
               if (gPowers[i] > sPowers[j]) score += weights[i, j];
               if (gPowers[i] < sPowers[j]) score -= weights[i, j];
           }
       // Piece-count term, divided by the "resist" of the side with fewer pieces.
       int gNrPieces = gPieces[0] + gLastNonRabbit,
           sNrPieces = sPieces[0] + sLastNonRabbit;
       int gResist = gNrPieces + gPieces[0];
       int sResist = sNrPieces + sPieces[0];
       if ((gResist > 0) && (sResist > 0))
       {
           if (gNrPieces > sNrPieces)
               score += (gNrPieces - sNrPieces) * 1200 / sResist;
           else
               score -= (sNrPieces - gNrPieces) * 1200 / gResist;
       }
       // Third summand: 1 point for each non-rabbit piece type present.
       for (int i = 1; i < 6; i++)
       {
           if (gPieces[i] > 0) score++;
           if (sPieces[i] > 0) score--;
       }
       // Rabbit term: 10000 for having any rabbit, minus 840 / (number of rabbits).
       if (gPieces[0] > 0) score += 10000 - 840 / gPieces[0];
       if (sPieces[0] > 0) score -= 10000 - 840 / sPieces[0];
       // 840 / n for n = 1..8: 840,420,280,210,168,140,120,105
       return score;
   }

But as I read it now, resist should probably be 2*nrpieces-pieces[0]. The code was not optimised for speed. I used it to precompute the evaluation table and then access the table rather than recomputing, so this need not be an issue.

« Last Edit: Aug 1st, 2010, 12:57pm by Hippo »

rbarreira

« Reply #102 on: Aug 1st, 2010, 3:34pm »

jdb, one thing that I have noticed while running tests with roundrobin:

The default time limit for a whole game is 10 minutes. If you see games ending due to reason "s" it's because this time was exceeded.

I changed it to 0 which should be unlimited, since I don't want results to get distorted due to this time limit. Or maybe I should use something bigger, in case there's an infinite loop or something.

No matter what, those results should be eliminated from the pgn if they are happening.

Janzert

« Reply #103 on: Aug 1st, 2010, 8:08pm »

By default if it isn't specified in the timecontrol there shouldn't be any limit on the game length. There may very well be a bug there in the current version though.

Just FYI, generally for testing I've been using a move limit instead of a time limit. It seems a little easier than calculating a reasonable time limit for each time control I test at. To set a move limit just append 't' to the limit. So it would look something like 3s/15s/100/0/125t.
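As a small illustration of the 't' suffix, here is a sketch that reads the game limit out of such a time control string. It assumes the limit is the fifth slash-separated field, as in the example above, and that 0 or a missing field means no limit; the function name and the "10m" sample value are hypothetical and this is not taken from the roundrobin source.

```python
def game_limit(timecontrol):
    """Classify the game-limit field of a time control string.

    Returns ("turns", n) for a move limit such as "125t", ("time", text) for
    anything else that is set, and ("none", None) when no limit is given.
    Assumes the limit is the fifth slash-separated field.
    """
    fields = timecontrol.split("/")
    if len(fields) < 5 or fields[4] in ("", "0"):
        return ("none", None)
    limit = fields[4]
    if limit.endswith("t"):
        return ("turns", int(limit[:-1]))
    return ("time", limit)

print(game_limit("3s/15s/100/0/125t"))
print(game_limit("3s/15s/100/0/0"))
```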

Janzert

jdb

« Reply #104 on: Aug 2nd, 2010, 3:39pm »

Latest bunch of games.

A couple observations.

1) A cat is worth almost exactly 2 rabbits, as long as both sides have 4 or more rabbits.

2) The first 4 rabbits captured are worth about the same. After that their value goes up quickly.

Code:

Rank Name             Elo    +    - games score oppo. draws
   1 Clueless_ECC8R  3309   90   76   230   93%  2538    0%
   2 Clueless_ECC7R  3219   76   69   230   88%  2545    0%
   3 Clueless_ECC6R  3019   63   60   230   77%  2561    0%
   4 Clueless_EC8R   2987   59   56   243   75%  2550    0%
   5 Clueless_ECC5R  2848   58   57   230   64%  2574    0%
   6 Clueless_EC7R   2815   54   54   241   61%  2569    0%
   7 Clueless_ECC4R  2640   40   39   375   56%  2543    0%
   8 Clueless_EC6R   2612   31   31   579   58%  2511    0%
   9 Clueless_E8R    2551   27   27   702   54%  2501    0%
  10 Clueless_ECC3R  2422   40   41   375   35%  2570    0%
  11 Clueless_EC5R   2418   40   41   375   34%  2571    0%
  12 Clueless_E7R    2337   28   29   836   52%  2188    0%
  13 Clueless_EC4R   2267   53   51   368   78%  1756    0%
  14 Clueless_E6R    2148   49   48   381   69%  1786    0%
  15 Clueless_ECC2R  2113   49   48   366   69%  1763    0%
  16 Clueless_E5R    1996   47   48   381   59%  1797    0%
  17 Clueless_EC3R   1991   47   47   368   61%  1776    0%
  18 Clueless_E4R    1730   49   50   381   42%  1817    0%
  19 Clueless_EC2R   1653   51   53   368   39%  1801    0%
  20 Clueless_ECC1R  1553   54   56   366   34%  1804    0%
  21 Clueless_E3R    1494   56   58   381   29%  1834    0%
  22 Clueless_E2R    1110   73   79   381   13%  1862    0%
  23 Clueless_EC1R   1025   77   83   368   11%  1847    0%
  24 Clueless_E1R     542  306 -269   381    1%  1904    0%
