Topic: (no) absolute score values for pieces?
rbarreira
Forum Guru
Arimaa player #1621
Posts: 605
Re: (no) absolute score values for pieces?
« Reply #90 on: Jul 28th, 2010, 2:33pm »
It seems logical to me that the side with the advantage always benefits from trading equal material. Isn't this the whole idea behind FAME?
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
« Reply #91 on: Jul 28th, 2010, 3:10pm »
on Jul 28th, 2010, 2:33pm, rbarreira wrote:
It seems logical to me that the side with the advantage always benefits from trading equal material. Isn't this the whole idea behind FAME?

I wouldn't say it is the whole idea. A strong part of my motivation to get away from fixed values for the pieces was that a camel goes down in value on an emptier board, while rabbits go up in value. It was clear even then that Bomb vastly overvalued its camel in an endgame, to its own detriment. A subtle point is that the side with the advantage might have fewer pieces, e.g. M for HR. The side with the camel has an advantage, but equal trades can turn the advantage into a disadvantage in a big hurry.

on Jul 28th, 2010, 2:13pm, jdb wrote:
I have been doing some tests with the various material evaluators using Janzert's roundrobin program. The games are 10 sec per move, so a game takes around 10 minutes. I'll post the results when there are enough games.

Fantastic. I am very curious to see your results. I wonder whether the results will be statistically significant, or whether the genuine differences in evaluation will be drowned in noise.

Quote:
Assume there is an H for d trade. Who benefits from equal trades? The side with the extra H benefits from equal trades of camels or horses. This gets them closer to having the strongest piece. What about equal trades of dogs, cats or rabbits?

I personally am indifferent to trades of weaker pieces. My reasoning is that emptying the board destabilizes the position, which benefits the player who is behind in material. On the other hand, the fewer pieces are on the board, the more likely it is for the mismatch to be relevant. I let the two opposing considerations cancel out in my mind, although I am sure sometimes one is more important than the other. This is an area where I wouldn't feel confident telling a material evaluator that it was wrong whichever way it was leaning.

Quote:
Assume one side has an extra rabbit. Who benefits from equal trades? Trading rabbits eventually leads to 2 rabbits vs 1 rabbit. The extra rabbit becomes a huge advantage. What about trading cats, dogs, horses or camels? Eventually this leads to E vs e with an extra rabbit. This also looks like a big advantage for the extra rabbit.

Yes, every equal trade should benefit a player with an extra rabbit. Use with caution, though, because an extra rabbit is a small advantage on a full board and a still small (albeit greater) advantage when it gets down to E8R vs. e7r. Against a computer opponent I might not want to trade down, because I expect the endgame to be its forte despite my material advantage.

Quote:
Assume one side has an extra camel. Who benefits from equal trades? FAME/HarLog puts the initial advantage at 5.64/6.48. With everything but the rabbits traded off, leaving EM8R vs E8R, FAME/HarLog is 6.38/3.80. Finally, at EMR vs er, FAME/HarLog is 8.46/5.31. This looks like an area for improvement. What is correct in this case?

HarLog performs better than FAME on this one, although I don't entirely trust HarLog either because of how it treats rabbits. When I have an extra camel, I feel that any equal trade weakens my position, even if only slightly. My objective when I am up a camel is always to get a better-than-equal trade or (ideally) to get something for nothing. I feel sufficiently strongly about this that I think it would make a good litmus test of the kind you are seeking, i.e. any material evaluator that likes any equal trade (except elephants) when up a camel is just wrong and should be replaced by some modified form of itself. But the effect is not strong. If I am up a camel on a full board, trading dog for dog only hurts me a little, whereas winning a rabbit outright helps me significantly, so I would be happy to win DR for D.
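Fritzlein's litmus test can be phrased as a mechanical check. The sketch below is a minimal, hypothetical harness: the function name `liked_equal_trades`, the Counter-of-letters material representation (silver pieces written in upper case too) and the toy fixed-value evaluator are illustrative assumptions, not code from FAME, HarLog or Clueless. It lists the non-elephant one-for-one trades that an evaluator scores as an improvement for the side that is up a camel; by the criterion above, a sound evaluator should return an empty list.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical evaluator interface: gold and silver material as Counters of
# piece letters (E, M, H, D, C, R; silver also uses upper case here), with the
# score returned from gold's point of view.
Evaluator = Callable[[Counter, Counter], float]


def liked_equal_trades(evaluate: Evaluator, gold: Counter, silver: Counter) -> List[str]:
    """Piece types (elephants excluded) whose one-for-one trade the
    evaluator scores as an improvement for gold."""
    before = evaluate(gold, silver)
    liked = []
    for piece in "MHDCR":
        if gold[piece] > 0 and silver[piece] > 0:
            g, s = gold.copy(), silver.copy()
            g[piece] -= 1
            s[piece] -= 1
            if evaluate(g, s) > before:
                liked.append(piece)
    return liked


# Toy fixed piece values, made up for illustration only.
TOY_VALUES = {"E": 20, "M": 12, "H": 7, "D": 4, "C": 3, "R": 1}


def fixed_value_eval(gold: Counter, silver: Counter) -> float:
    def total(side: Counter) -> float:
        return sum(TOY_VALUES[p] * n for p, n in side.items())
    return total(gold) - total(silver)


# Gold is up a camel on an otherwise full board.
gold = Counter("EMHHDDCC" + "R" * 8)
silver = Counter("EHHDDCC" + "R" * 8)
print(liked_equal_trades(fixed_value_eval, gold, silver))
```

The fixed-value evaluator prints an empty list only because it is indifferent to equal trades; the check is meant to catch evaluators that actively favour trading down when up a camel.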
|
« Last Edit: Jul 28th, 2010, 3:20pm by Fritzlein »
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
« Reply #92 on: Jul 28th, 2010, 4:07pm »
on Jul 28th, 2010, 3:10pm, Fritzlein wrote:
A subtle point is that the side with the advantage might have fewer pieces, e.g. M for HR. The side with the camel has an advantage, but equal trades can turn the advantage into a disadvantage in a big hurry.

I agree the number of pieces is very important, both relative and absolute.

Quote:
Fantastic. I am very curious to see your results. I wonder whether the results will be statistically significant, or the genuine differences in evaluation will be drowned in noise.

Bayeselo is wonderful.

Quote:
I personally am indifferent to trade of weaker pieces. My reasoning is that emptying the board destabilizes the position, which benefits the player who is behind in material. On the other hand, the fewer pieces are on the board, the more likely it is for the mismatch to be relevant. I let the two opposing considerations cancel out in my mind, although I am sure sometimes one is more important than the other. This is an area where I wouldn't feel confident to tell a material evaluator that it was wrong whichever way it was leaning. Yes, every equal trade should benefit a player with an extra rabbit. Use with caution, though, because an extra rabbit is a small advantage on a full board and a still small (albeit greater) advantage when it gets down to E8R vs. e7r. Against a computer opponent I might not want to trade down because I expect the endgame to be its forte despite my material advantage.

I'll run some test games with E8R vs e7r and see what happens.

Quote:
HarLog performs better than FAME on this one, although I don't entirely trust HarLog either because of how it treats rabbits. When I have an extra camel, I feel that any equal trade weakens my position, even if only slightly. My objective when I am up a camel is always to get a better-than-equal trade or (ideally) to get something for nothing. I feel sufficiently strongly about this that I think it would make a good litmus test of the kind you are seeking, i.e. any material evaluator that likes any equal trade (except elephants) when up a camel is just wrong and should be replaced by some modified form of itself. But the effect is not strong. If I am up a camel on a full board, trading dog for dog only hurts me a little, whereas winning a rabbit outright helps me significantly, so I would be happy to win DR for D.

I'll have to think about this. If everything is traded off, it comes down to EMR vs er, which is a big advantage. But as you said, there is a period during the trades where the position can be destabilized. Maybe it is necessary to define when the material evaluator can be applied, that is, it's only valid in a stable position without a lot of threats.
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
« Reply #93 on: Jul 29th, 2010, 4:24pm »
Here are the results of some testing between FAME, HarLog, and Constant. Time control was 10 sec per move.

Rank Name                Elo    +    -  games  score  oppo.  draws
   1 Clueless_FAME      2221   40   39    121    54%   2189     0%
   2 Clueless_HarLog    2210   40   39    120    52%   2195     0%
   3 Clueless_Constant  2169   39   40    121    45%   2215     0%
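For reading tables like this one, the usual logistic Elo model links a score fraction p to a rating difference of -400·log10(1/p - 1). The sketch below applies that formula to the score and average-opponent columns; it is only a back-of-the-envelope check (the helper name is illustrative), since bayeselo fits all games jointly and will not give identical numbers.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Rating difference implied by an expected score under the logistic
    Elo model: p = 1 / (1 + 10 ** (-d / 400))  <=>  d = -400 * log10(1/p - 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# (name, score, average opponent rating) taken from the table above.
rows = [("FAME", 0.54, 2189), ("HarLog", 0.52, 2195), ("Constant", 0.45, 2215)]
for name, score, oppo in rows:
    print(name, round(oppo + elo_diff_from_score(score)))
```

For the three rows this gives roughly 2217, 2209 and 2180: close to, but not the same as, bayeselo's ratings.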
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
« Reply #94 on: Jul 29th, 2010, 7:07pm »
Nice, thanks for sharing. Is the reported +/- two standard deviations? It appears that FAME and HarLog are statistically significantly better than constant piece values, but statistically indistinguishable from each other. Did you tell me once that the other positional factors in Clueless are tuned to work with FAME? If so, would that put HarLog at a relative disadvantage?
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
« Reply #95 on: Jul 29th, 2010, 7:48pm »
The table was generated by bayeselo. I don't know what the +/- means. The value of an initial rabbit is normalized for all material evaluators in Clueless' eval, so I don't think it would matter to HarLog.
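jdb doesn't describe how Clueless does this normalization. One plausible scheme, sketched here purely as an assumption, is to rescale each evaluator so that gold losing its first rabbit from the initial position costs the same fixed amount under all of them. The names `normalized`, `FULL_ARMY` and `toy_eval` are hypothetical.

```python
from collections import Counter
from typing import Callable

# Hypothetical evaluator interface: (gold, silver) material as Counters of
# piece letters, score from gold's point of view.
Evaluator = Callable[[Counter, Counter], float]

FULL_ARMY = Counter({"E": 1, "M": 1, "H": 2, "D": 2, "C": 2, "R": 8})


def normalized(evaluate: Evaluator, rabbit_value: float = 1.0) -> Evaluator:
    """Rescale `evaluate` so that gold losing one rabbit from the initial
    position changes the score by exactly `rabbit_value`."""
    one_rabbit_down = FULL_ARMY.copy()
    one_rabbit_down["R"] -= 1
    loss = evaluate(FULL_ARMY, FULL_ARMY) - evaluate(one_rabbit_down, FULL_ARMY)
    if loss <= 0:
        raise ValueError("evaluator does not penalize losing a rabbit")
    scale = rabbit_value / loss
    return lambda gold, silver: scale * evaluate(gold, silver)


def toy_eval(gold: Counter, silver: Counter) -> float:
    """Toy evaluator in arbitrary raw units (a rabbit is worth 3 here)."""
    values = {"E": 60, "M": 36, "H": 21, "D": 12, "C": 9, "R": 3}
    return (sum(values[p] * n for p, n in gold.items())
            - sum(values[p] * n for p, n in silver.items()))


scaled = normalized(toy_eval)
print(scaled(Counter("EM"), Counter("E")))  # a camel, in initial-rabbit units
```

A linear rescale like this leaves each evaluator's preferences between positions unchanged; it only puts them on a common scale relative to the other eval terms.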
|
|
|
|
|
Janzert
Forum Guru
Arimaa player #247
Posts: 1016
« Reply #96 on: Jul 29th, 2010, 8:17pm »
The +/- columns are confidence intervals. I'm pretty sure they are 95% CIs, although I can't find anything explicitly stating it right now. I did find this post by Rémi Coulom on TalkChess describing the four methods bayeselo has available for calculating the CI.

Quote:
Bayeselo offers 4 different algorithms for computing confidence intervals. This is the list of options, from the least accurate and fastest, to the most accurate and slowest:
* Default: assume opponents' ratings are their true ratings, and a Gaussian distribution.
* "exactdist": assume opponents' ratings are their true ratings, but do not assume a Gaussian distribution. This will produce asymmetric intervals, especially for very high or very low winning rates. Cost is linear in the number of players.
* "covariance": assume a Gaussian distribution, but not that the ratings of opponents are true. This may be very costly if you have thousands of players, but it is more accurate than the default. The cost is cubic in the number of players (it is a matrix inversion).
* "jointdist": computes a numerical estimation of the whole distribution. It is the most accurate, but the cost is exponential in the number of players. May work for 3-4 players. You should reduce the resolution of the discretization for more players.

The output from the los (likelihood-of-superiority) command would also be interesting to see.

Janzert
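To get a feel for what a Gaussian interval involves, here is a deliberately crude sketch in the spirit of the default option quoted above: it treats the opponents' ratings as exact, applies a normal approximation to the score fraction, and maps the interval endpoints through the logistic Elo formula. It is not bayeselo's algorithm and will not reproduce its +/- columns; `approx_elo_interval` is a hypothetical helper, and z = 1.96 gives a nominal 95% interval.

```python
import math


def elo_from_score(p: float) -> float:
    """Logistic Elo model: rating offset implied by a score fraction p."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def approx_elo_interval(score: float, games: int, z: float = 1.96):
    """Crude interval for a performance-rating offset: opponents' ratings are
    treated as exact, the score fraction as Gaussian with standard error
    sqrt(p(1-p)/n), and the endpoints are mapped through elo_from_score."""
    se = math.sqrt(score * (1.0 - score) / games)
    low = min(max(score - z * se, 1e-6), 1.0 - 1e-6)
    high = min(max(score + z * se, 1e-6), 1.0 - 1e-6)
    return elo_from_score(low), elo_from_score(high)


# Clueless_FAME from jdb's first table: 121 games, 54% score.
low, high = approx_elo_interval(0.54, 121)
print(round(low), round(elo_from_score(0.54)), round(high))
```

With these inputs the interval comes out wider than the ±40 in jdb's table, which is one more reason not to read this as a reconstruction of bayeselo's numbers.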
|
|
|
|
|
rbarreira
Forum Guru
Arimaa player #1621
Posts: 605
« Reply #97 on: Jul 30th, 2010, 3:17am »
on Jul 29th, 2010, 7:07pm, Fritzlein wrote:
Nice, thanks for sharing. Is the reported +/- two standard deviations? It appears that FAME and HarLog are statistically significantly better than constant piece values, but statistically indistinguishable from each other.

Actually, if you take the two extremes, static values may be as high as 2208 while FAME may be as low as 2182. Or am I misunderstanding something? Unfortunately it is necessary to play a very large number of games to test most changes, at least search-related ones... I have more or less accepted that I won't be able to conclusively test many of the changes I make to my bot. Not everyone has a big cluster like Dr. Robert Hyatt, and CPU time on Amazon EC2 isn't cheap enough for me.
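rbarreira's extremes come from subtracting the "-" margin from one rating and adding the "+" margin to the other. A minimal sketch of that arithmetic, assuming the bayeselo columns read as Elo, upper margin, lower margin (the helper name is illustrative):

```python
def bayeselo_interval(elo: int, plus: int, minus: int) -> tuple:
    """(low, high) from bayeselo's Elo, '+' and '-' columns."""
    return elo - minus, elo + plus


fame = bayeselo_interval(2221, 40, 39)      # Clueless_FAME row
constant = bayeselo_interval(2169, 39, 40)  # Clueless_Constant row
print("FAME could be as low as", fame[0])
print("Constant could be as high as", constant[1])
print("intervals overlap:", constant[1] >= fame[0])
```

Overlap is a conservative check: two 95% intervals can overlap somewhat even when the difference between the two ratings is significant at the 5% level, which is why the los output Janzert mentions is the more direct tool for this question.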
|
« Last Edit: Jul 30th, 2010, 3:18am by rbarreira » |
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
« Reply #98 on: Jul 30th, 2010, 5:52am »
The games are 10 sec per move. All pieces are set up on the first two ranks.

Rank Name             Elo    +    -  games  score  oppo.  draws
   1 Clueless_EC6R   2292   34   33    203    65%   2156     0%
   2 Clueless_E8R    2262   27   26    303    65%   2130     0%
   3 Clueless_E7R    2046   28   29    300    25%   2272     0%

Output of LOS:
                Cl  Cl  Cl
Clueless_EC6R       86 100
Clueless_E8R    13     100
Clueless_E7R     0   0

Output of detail:
1 Clueless_EC6R 2292  203.0 (131.0 :  72.0)
      103.0 ( 52.0 :  51.0) Clueless_E8R  2262
      100.0 ( 79.0 :  21.0) Clueless_E7R  2046
2 Clueless_E8R  2262  303.0 (196.0 : 107.0)
      103.0 ( 51.0 :  52.0) Clueless_EC6R 2292
      200.0 (145.0 :  55.0) Clueless_E7R  2046
3 Clueless_E7R  2046  300.0 ( 76.0 : 224.0)
      100.0 ( 21.0 :  79.0) Clueless_EC6R 2292
      200.0 ( 55.0 : 145.0) Clueless_E8R  2262

Now running tournament with: E8R, EC6R, E7R, EC5R, ECC4R, ECC3R

Code:
Rank Name              Elo    +    -  games  score  oppo.  draws
   1 Clueless_ECC4R   2362   52   48    143    71%   2168     0%
   2 Clueless_EC6R    2307   32   31    348    65%   2164     0%
   3 Clueless_E8R     2263   28   27    448    63%   2149     0%
   4 Clueless_EC5R    2114   47   49    143    38%   2217     0%
   5 Clueless_ECC3R   2113   47   48    143    38%   2217     0%
   6 Clueless_E7R     2041   29   30    445    26%   2263     0%

Output of LOS:
                 Cl  Cl  Cl  Cl  Cl  Cl
Clueless_ECC4R       94  99  99  99 100
Clueless_EC6R     5      97  99  99 100
Clueless_E8R      0   2      99  99 100
Clueless_EC5R     0   0   0      51  98
Clueless_ECC3R    0   0   0  48      98
Clueless_E7R      0   0   0   1   1

Output of detail:
1 Clueless_ECC4R 2362  143.0 (102.0 :  41.0)
       29.0 ( 15.0 :  14.0) Clueless_EC6R  2307
       29.0 ( 20.0 :   9.0) Clueless_E8R   2263
       28.0 ( 20.0 :   8.0) Clueless_EC5R  2114
       28.0 ( 25.0 :   3.0) Clueless_ECC3R 2113
       29.0 ( 22.0 :   7.0) Clueless_E7R   2041
2 Clueless_EC6R  2307  348.0 (227.0 : 121.0)
       29.0 ( 14.0 :  15.0) Clueless_ECC4R 2362
      132.0 ( 69.0 :  63.0) Clueless_E8R   2263
       29.0 ( 24.0 :   5.0) Clueless_EC5R  2114
       29.0 ( 16.0 :  13.0) Clueless_ECC3R 2113
      129.0 (104.0 :  25.0) Clueless_E7R   2041
3 Clueless_E8R   2263  448.0 (280.0 : 168.0)
       29.0 (  9.0 :  20.0) Clueless_ECC4R 2362
      132.0 ( 63.0 :  69.0) Clueless_EC6R  2307
       29.0 ( 18.0 :  11.0) Clueless_EC5R  2114
       29.0 ( 21.0 :   8.0) Clueless_ECC3R 2113
      229.0 (169.0 :  60.0) Clueless_E7R   2041
4 Clueless_EC5R  2114  143.0 ( 55.0 :  88.0)
       28.0 (  8.0 :  20.0) Clueless_ECC4R 2362
       29.0 (  5.0 :  24.0) Clueless_EC6R  2307
       29.0 ( 11.0 :  18.0) Clueless_E8R   2263
       28.0 ( 16.0 :  12.0) Clueless_ECC3R 2113
       29.0 ( 15.0 :  14.0) Clueless_E7R   2041
5 Clueless_ECC3R 2113  143.0 ( 55.0 :  88.0)
       28.0 (  3.0 :  25.0) Clueless_ECC4R 2362
       29.0 ( 13.0 :  16.0) Clueless_EC6R  2307
       29.0 (  8.0 :  21.0) Clueless_E8R   2263
       28.0 ( 12.0 :  16.0) Clueless_EC5R  2114
       29.0 ( 19.0 :  10.0) Clueless_E7R   2041
6 Clueless_E7R   2041  445.0 (116.0 : 329.0)
       29.0 (  7.0 :  22.0) Clueless_ECC4R 2362
      129.0 ( 25.0 : 104.0) Clueless_EC6R  2307
      229.0 ( 60.0 : 169.0) Clueless_E8R   2263
       29.0 ( 14.0 :  15.0) Clueless_EC5R  2114
       29.0 ( 10.0 :  19.0) Clueless_ECC3R 2113
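For a quick sanity check on a single pairing, a common normal approximation to the likelihood of superiority uses only the direct win/loss record: LOS = Φ((W - L) / √(W + L)). The sketch below implements that simpler formula (the helper name is illustrative); it is a swapped-in approximation, not what bayeselo's los command computes.

```python
import math


def head_to_head_los(wins: float, losses: float) -> float:
    """Normal approximation to the likelihood of superiority from a
    draw-free head-to-head record: Phi((W - L) / sqrt(W + L))."""
    n = wins + losses
    if n == 0:
        return 0.5
    z = (wins - losses) / math.sqrt(n)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Direct games only, from the first "detail" output above.
print(round(100 * head_to_head_los(52, 51)))   # EC6R vs E8R
print(round(100 * head_to_head_los(145, 55)))  # E8R vs E7R
```

Because it ignores every game except the direct ones, this gives only about 54 for EC6R over E8R, where bayeselo, working from the whole rating fit, reports 86.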
|
|
« Last Edit: Jul 31st, 2010, 9:47am by jdb » |
|
|
|
Hippo
Forum Guru
Arimaa player #4450
Posts: 883
« Reply #99 on: Jul 31st, 2010, 1:49pm »
on Jul 29th, 2010, 4:24pm, jdb wrote:
Here are the results of some testing between FAME, HarLog, and Constant. Time control was 10 sec per move.

Rank Name                Elo    +    -  games  score  oppo.  draws
   1 Clueless_FAME      2221   40   39    121    54%   2189     0%
   2 Clueless_HarLog    2210   40   39    120    52%   2195     0%
   3 Clueless_Constant  2169   39   40    121    45%   2215     0%

It would be interesting to test HA(FA)ME as well...
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
« Reply #100 on: Jul 31st, 2010, 5:59pm »
on Feb 17th, 2010, 2:09am, Hippo wrote:
So far I have:
1) the FAME rabbit evaluation;
2) let SGN[i,j] be the result of comparing the i-th strongest gold piece with the j-th strongest silver piece, and let C[i,j] be a matrix of coefficients. The sum of the coordinate-wise products of these two matrices is the second summand;
3) the last summand contains 1 for each piece type present and 10000 for rabbits being present (added for gold and subtracted for silver).

C is symmetric; first try:
(250  27  10   3   1   0   0   0)
( 27  90  19   7   2   0   0   0)
( 10  19  60  13   6   1   0   0)
(  3   7  13  40   9   2   0   0)
(  1   2   6   9  30   3   1   0)
(  0   0   1   2   3  10   3   1)
(  0   0   0   0   1   3  10   3)
(  0   0   0   0   0   1   3   7)

BTW: Having C diagonal with diagonal 256 85 57 38 25 17 11 7 gives the original FAME (first 2 summands). Maybe I should have started much nearer to this matrix.

I can test this too, but I am unsure how to handle part 3.
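The second summand described in the quote can be sketched directly. In this minimal sketch the strength scale (elephant 6 down to rabbit 1, empty slot 0), the material-string parser and all helper names are assumptions; it compares all eight slots, counting rabbits as 1 and missing pieces as 0, whereas the code Hippo posts later stops comparing at the last non-rabbit slot, so results can differ in positions where rabbits fall inside the first eight slots.

```python
import re
from typing import List, Sequence

# Assumed strength scale: elephant 6 down to rabbit 1; empty slots are 0.
STRENGTH = {"E": 6, "M": 5, "H": 4, "D": 3, "C": 2, "R": 1}


def expand(material: str) -> str:
    """Expand notation such as 'EM8R' or 'em8r' into 'EMRRRRRRRR'."""
    return "".join(letter.upper() * int(count or 1)
                   for count, letter in re.findall(r"(\d*)([EMHDCRemhdcr])", material))


def ranked(material: str, slots: int = 8) -> List[int]:
    """Strengths of the `slots` strongest pieces, padded with zeros."""
    s = sorted((STRENGTH[p] for p in expand(material)), reverse=True)[:slots]
    return s + [0] * (slots - len(s))


def matrix_summand(gold: str, silver: str, c: Sequence[Sequence[float]]) -> float:
    """Sum over i, j of C[i][j] * SGN[i][j], where SGN[i][j] is +1, 0 or -1
    as gold's i-th strongest piece is stronger than, equal to, or weaker
    than silver's j-th strongest piece."""
    g, s = ranked(gold, len(c)), ranked(silver, len(c))
    return sum(c[i][j] * ((g[i] > s[j]) - (g[i] < s[j]))
               for i in range(len(c)) for j in range(len(c)))


FAME_DIAGONAL = [256, 85, 57, 38, 25, 17, 11, 7]
C_FAME = [[FAME_DIAGONAL[i] if i == j else 0 for j in range(8)] for i in range(8)]
C_FIRST_TRY = [
    [250, 27, 10, 3, 1, 0, 0, 0],
    [27, 90, 19, 7, 2, 0, 0, 0],
    [10, 19, 60, 13, 6, 1, 0, 0],
    [3, 7, 13, 40, 9, 2, 0, 0],
    [1, 2, 6, 9, 30, 3, 1, 0],
    [0, 0, 1, 2, 3, 10, 3, 1],
    [0, 0, 0, 0, 1, 3, 10, 3],
    [0, 0, 0, 0, 0, 1, 3, 7],
]

# Gold up a camel on an otherwise full board.
for name, c in [("diagonal", C_FAME), ("first try", C_FIRST_TRY)]:
    print(name, matrix_summand("EMHHDDCC8R", "ehhddcc8r", c))
```

Because C is symmetric, swapping gold and silver flips the sign of the summand, so equal material always scores zero.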
|
|
|
|
|
Hippo
Forum Guru
Arimaa player #4450
Posts: 883
« Reply #101 on: Aug 1st, 2010, 12:27pm »
on Jul 31st, 2010, 5:59pm, jdb wrote:
I can test this too but I am unsure how to handle part 3.

The last summand is not important (the high value of the last rabbit is covered by the elimination-rule tests, and the at most 5 points are small enough to be notable)... A player gains 1 point if he has a cat, 1 point if he has a dog, 1 point if he has a horse, 1 point if he has a camel and 1 point if he has an elephant. In that case EDC8R/em8r is 1 point better for gold than EHH8R/em8r. I was trying to make a puzzle where the advantage of piece diversity is important, but I don't think it would ever be important in a real Arimaa game, so you can ignore the third summand. Here is the evaluation code I was trying (I have not implemented it in a bot yet).

Code:
// f_pieces (not shown in the post) appears to return piece counts
// indexed 0 = rabbit, 1 = cat, ..., 5 = elephant.

// Expand piece counts into the strengths of the 8 strongest pieces,
// strongest first, padded with 0.
static int[] f_powers(int[] pieces)
{
    int[] powers = new int[8];
    int k = 0;
    for (int i = pieces.Length; i > 0; i--)
        for (int j = 0; j < pieces[i - 1] && k < 8; j++)
            powers[k++] = i;
    for (; k < 8; k++)
        powers[k] = 0;
    return powers;
}

static long f_HAME0eval(int g, int s)
{
    int[] gPieces = f_pieces(g), sPieces = f_pieces(s);
    int[] gPowers = f_powers(gPieces), sPowers = f_powers(sPieces);
    int[,] weights = new int[8, 8] {
        {2130, 210,  40,   0,   0,   0,   0,   0},
        { 210, 540, 130,  20,   0,   0,   0,   0},
        {  40, 130, 400,  70,  10,   0,   0,   0},
        {   0,  20,  70, 270,  40,  10,   0,   0},
        {   0,   0,  10,  40, 180,  30,   0,   0},
        {   0,   0,   0,  10,  30, 120,  20,   0},
        {   0,   0,   0,   0,   0,  20,  70,  10},
        {   0,   0,   0,   0,   0,   0,  10,  60}};
    int gLastNonRabbit = 0, sLastNonRabbit = 0, maxLastNonRabbit = 0;
    long score = 0;
    // Count the leading non-rabbit slots on each side.
    for (int i = 0; i < 8; i++)
    {
        if (gPowers[i] > 1) maxLastNonRabbit = gLastNonRabbit = i + 1;
        if (sPowers[i] > 1) maxLastNonRabbit = sLastNonRabbit = i + 1;
        if (i == maxLastNonRabbit) break;
    }
    // Weighted pairwise comparison of gold vs silver non-rabbit slots.
    for (int i = 0; i < maxLastNonRabbit; i++)
        for (int j = 0; j < maxLastNonRabbit; j++)
        {
            if (gPowers[i] > sPowers[j]) score += weights[i, j];
            if (gPowers[i] < sPowers[j]) score -= weights[i, j];
        }
    // Piece-count difference, divided by the side with fewer pieces' "resist".
    int gNrPieces = gPieces[0] + gLastNonRabbit, sNrPieces = sPieces[0] + sLastNonRabbit;
    int gResist = gNrPieces + gPieces[0];
    int sResist = sNrPieces + sPieces[0];
    if ((gResist > 0) && (sResist > 0))
        if (gNrPieces > sNrPieces)
            score += (gNrPieces - sNrPieces) * 1200 / sResist;
        else
            score -= (sNrPieces - gNrPieces) * 1200 / gResist;
    // Third summand: 1 point per non-rabbit piece type present.
    for (int i = 1; i < 6; i++)
    {
        if (gPieces[i] > 0) score++;
        if (sPieces[i] > 0) score--;
    }
    // Rabbit term: 10000 - 840/n for n rabbits.
    if (gPieces[0] > 0) score += 10000 - 840 / gPieces[0];
    if (sPieces[0] > 0) score -= 10000 - 840 / sPieces[0];
    // 840/n for n = 1..8: 840, 420, 280, 210, 168, 140, 120, 105
    return score;
}

But as I read it now, resist should probably be 2*nrPieces - pieces[0]. The code was not optimised for speed; I used it to precompute an evaluation table and then accessed the table rather than recomputing, so speed need not be an issue.
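The non-matrix terms of the posted code are small enough to port and poke at. This is a partial Python sketch, not Hippo's implementation: the piece-type index order (0 = rabbits, then cat, dog, horse, camel, elephant) is inferred from how the C# uses `f_pieces`, and the helper names are hypothetical. `count_term` lets you compare the posted resist (nr + rabbits) with the suggested 2 * nr - rabbits.

```python
from collections import Counter
from typing import List

# Index order assumed for the posted code's piece counts: 0 = rabbits, then
# cat, dog, horse, camel, elephant.
TYPES = "RCDHME"


def counts(material: str) -> List[int]:
    """Piece counts in TYPES order from a string of piece letters."""
    c = Counter(material.upper())
    return [c[t] for t in TYPES]


def diversity_term(gold: List[int], silver: List[int]) -> int:
    """+1 per non-rabbit type gold has, -1 per non-rabbit type silver has."""
    return sum(n > 0 for n in gold[1:]) - sum(n > 0 for n in silver[1:])


def rabbit_term(rabbits: int) -> int:
    """10000 - 840 / n; 840 is divisible by 1..8, so // matches C#'s integer /."""
    return 10000 - 840 // rabbits if rabbits > 0 else 0


def count_term(gold: List[int], silver: List[int], suggested_resist: bool = False) -> int:
    """Piece-count difference divided by the smaller side's 'resist'.
    Resist is nr + rabbits as posted, or 2 * nr - rabbits as suggested.
    Assumes at most 8 non-rabbits per side, so nr is the total piece count."""
    g_nr, s_nr = sum(gold), sum(silver)
    if suggested_resist:
        g_res, s_res = 2 * g_nr - gold[0], 2 * s_nr - silver[0]
    else:
        g_res, s_res = g_nr + gold[0], s_nr + silver[0]
    if g_res <= 0 or s_res <= 0:
        return 0
    if g_nr > s_nr:
        return (g_nr - s_nr) * 1200 // s_res
    return -((s_nr - g_nr) * 1200 // g_res)


# EDC8R/em8r vs EHH8R/em8r: the diversity terms differ by 1.
print(diversity_term(counts("EDC" + "R" * 8), counts("EM" + "R" * 8)))
print(diversity_term(counts("EHH" + "R" * 8), counts("EM" + "R" * 8)))
# E8R vs e7r under the posted and the suggested resist.
print(count_term(counts("E" + "R" * 8), counts("E" + "R" * 7)),
      count_term(counts("E" + "R" * 8), counts("E" + "R" * 7), suggested_resist=True))
```

For a full army both resist formulas give 24, so they only diverge once material comes off; in E8R vs e7r the posted version gives 80 and the suggested one 133.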
|
« Last Edit: Aug 1st, 2010, 12:57pm by Hippo » |
|
|
|
rbarreira
Forum Guru
Arimaa player #1621
Posts: 605
« Reply #102 on: Aug 1st, 2010, 3:34pm »
jdb, one thing I have noticed while running tests with roundrobin: the default time limit for a whole game is 10 minutes. If you see games ending with reason "s", it's because this time was exceeded. I changed it to 0, which should be unlimited, since I don't want the results to be distorted by this time limit. Or maybe I should use something bigger, in case there's an infinite loop or something. No matter what, those results should be eliminated from the PGN if they are happening.
|
|
|
|
|
Janzert
Forum Guru
Arimaa player #247
Posts: 1016
« Reply #103 on: Aug 1st, 2010, 8:08pm »
By default, if it isn't specified in the timecontrol, there shouldn't be any limit on the game length. There may very well be a bug there in the current version, though. Just FYI, for testing I've generally been using a move limit instead of a time limit. It seems a little easier than calculating a reasonable time limit for each time control I test at. To set a move limit, just append 't' to the limit, so it would look something like 3s/15s/100/0/125t.

Janzert
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
« Reply #104 on: Aug 2nd, 2010, 3:39pm »
Latest bunch of games. A couple of observations:
1) A cat is worth almost exactly 2 rabbits, as long as both sides have 4 or more rabbits.
2) The first 4 rabbits captured are worth about the same. After that their value goes up quickly.

Code:
Rank Name              Elo    +    -  games  score  oppo.  draws
   1 Clueless_ECC8R   3309   90   76    230    93%   2538     0%
   2 Clueless_ECC7R   3219   76   69    230    88%   2545     0%
   3 Clueless_ECC6R   3019   63   60    230    77%   2561     0%
   4 Clueless_EC8R    2987   59   56    243    75%   2550     0%
   5 Clueless_ECC5R   2848   58   57    230    64%   2574     0%
   6 Clueless_EC7R    2815   54   54    241    61%   2569     0%
   7 Clueless_ECC4R   2640   40   39    375    56%   2543     0%
   8 Clueless_EC6R    2612   31   31    579    58%   2511     0%
   9 Clueless_E8R     2551   27   27    702    54%   2501     0%
  10 Clueless_ECC3R   2422   40   41    375    35%   2570     0%
  11 Clueless_EC5R    2418   40   41    375    34%   2571     0%
  12 Clueless_E7R     2337   28   29    836    52%   2188     0%
  13 Clueless_EC4R    2267   53   51    368    78%   1756     0%
  14 Clueless_E6R     2148   49   48    381    69%   1786     0%
  15 Clueless_ECC2R   2113   49   48    366    69%   1763     0%
  16 Clueless_E5R     1996   47   48    381    59%   1797     0%
  17 Clueless_EC3R    1991   47   47    368    61%   1776     0%
  18 Clueless_E4R     1730   49   50    381    42%   1817     0%
  19 Clueless_EC2R    1653   51   53    368    39%   1801     0%
  20 Clueless_ECC1R   1553   54   56    366    34%   1804     0%
  21 Clueless_E3R     1494   56   58    381    29%   1834     0%
  22 Clueless_E2R     1110   73   79    381    13%   1862     0%
  23 Clueless_EC1R    1025   77   83    368    11%   1847     0%
  24 Clueless_E1R      542  306 -269    381     1%   1904     0%
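jdb's two observations can be checked roughly against this table by differencing ratings. The sketch below just copies the E+nR and EC+nR rows into dictionaries (the names are illustrative) and computes the successive differences. Each rating carries the margins shown in the table, so small differences should not be over-read.

```python
# bayeselo ratings copied from the table above, keyed by number of rabbits.
E_PLUS_RABBITS = {8: 2551, 7: 2337, 6: 2148, 5: 1996, 4: 1730, 3: 1494, 2: 1110, 1: 542}
EC_PLUS_RABBITS = {8: 2987, 7: 2815, 6: 2612, 5: 2418, 4: 2267, 3: 1991, 2: 1653, 1: 1025}


def rabbit_drops(ratings: dict) -> dict:
    """Elo lost going from n + 1 rabbits to n rabbits, keyed by n."""
    return {n: ratings[n + 1] - ratings[n] for n in sorted(ratings) if n + 1 in ratings}


def cat_vs_two_rabbits(with_cat: dict, without_cat: dict) -> dict:
    """Rating of E + C + nR minus rating of E + (n + 2)R, keyed by n."""
    return {n: with_cat[n] - without_cat[n + 2]
            for n in sorted(with_cat) if n + 2 in without_cat}


print(rabbit_drops(E_PLUS_RABBITS))
print(cat_vs_two_rabbits(EC_PLUS_RABBITS, E_PLUS_RABBITS))
```

Positive values in the second dictionary mean the cat side rated higher than the side with two extra rabbits; these are comparisons of ratings from one pool, not head-to-head results.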
|
|
|
|
|
|
|