Topic: Attempt for an Arimaa Positional Evaluator (Read 2339 times)
pago
Forum Guru
Arimaa player #5439
Posts: 69
Attempt for an Arimaa Positional Evaluator
« on: Jan 15th, 2011, 4:16am »
Hello, I would like to suggest a new kind of positional evaluator based on board control, although I am not very sure that it is valuable. Files are available at the following links:

Explanations: http://sd-2.archive-host.com/membres/up/208912627824851423/APE.pdf
Calculation sheet (Excel): http://sd-2.archive-host.com/membres/up/208912627824851423/APE.zip

When I test it without tree search it has some successes, such as bot slayer games or 2010 World Championship games. However, it also has some complete failures, such as the 2010 games between Chessandgo and Fritzlein.

This evaluator uses the HERD material evaluator discussed in another thread (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=devTalk;action=display;num=1283354501;start=45). However, the idea could probably be used with other material evaluators.

The principles of this Arimaa Positional Evaluator are also closely related to an old idea of Omar's.

on Apr 27th, 2005, 2:45pm, omar wrote: Don Dailey and I once discussed this while he was working on bot Occam. We came to the conclusion that the value of a piece is highly determined by what other enemy pieces are around it (basically within 4 steps of it). If a dog is in one corner of the board with nothing stronger than another dog in proximity, then it is basically the elephant in that area. But for a program to actually value pieces that way could be disastrous. It might want to exchange its horse, which is close to stronger enemy pieces, for the opponent's dog that is not close to stronger enemy pieces. So at one point we thought of having local and global values for the pieces. The local value would be determined by the proximity of other opponent pieces, and the global value would be fixed based on the overall rank of the piece. But still there were so many quirks that Don never actually tried implementing it.
It is also partially linked to discussions about mobility:

on Nov 19th, 2006, 7:40am, Stanley wrote: I have just read the analysis at http://arimaa.janzert.com/bf_study/ Congratulations, Janzert, your study is of great importance for dimensioning the "Arimaa problem". What seems very interesting to me, and perhaps there are some additional graphs that may be easy to obtain, is the strong correlation between winning chances and mobility (defined as the number of possible moves); see the graph "bysidegoal.png" near the center of the page. Maybe evaluation functions should have that "in mind".
pago
Forum Guru
Arimaa player #5439
Posts: 69
Re: Attempt for an Arimaa Positional Evaluator
« Reply #1 on: Feb 24th, 2011, 2:01pm »
Today I noticed a very strange behavior of the HERD material evaluator: according to the games I used to evaluate the efficiency of my evaluators, it gets significantly better results when silver wins than when gold wins. The phenomenon seems to be amplified with the APE positional evaluator. This is very strange because the evaluators are symmetric for gold and silver.

HERD material evaluator (% of correct prevision):

         Opening   Middle    Endgame   Whole
silver   50.70%    67.48%    88.15%    69.13%
gold     50.50%    60.78%    80.92%    64.17%

APE positional evaluator (% of correct prevision):

         Opening   Middle    Endgame   Whole
silver   59.24%    77.46%    91.79%    76.59%
gold     53.20%    69.06%    77.74%    66.80%

I am currently testing these evaluators on the 2011 WC games, with the same results: APE is efficient when silver wins and does badly when gold wins. I know I haven't tested the evaluators on a huge quantity of games, but I believe (without being sure) that I have used enough games to avoid a statistical bias.

Does the same kind of phenomenon exist with the current high-performing evaluators (FAME, DAPE, etc.)? Does anyone have an explanation for it? Or do you think the set of games I used is not significant and the effect would disappear with a larger set?
The_Jeh
Forum Guru
Arimaa player #634
Posts: 460
Re: Attempt for an Arimaa Positional Evaluator
« Reply #2 on: Feb 26th, 2011, 4:38pm »
Is there a typo somewhere in your evaluator code? What exactly does "% of correct prevision" mean?
« Last Edit: Feb 26th, 2011, 4:38pm by The_Jeh »
pago
Forum Guru
Arimaa player #5439
Posts: 69
Re: Attempt for an Arimaa Positional Evaluator
« Reply #3 on: Feb 27th, 2011, 1:59pm »
on Feb 26th, 2011, 4:38pm, The_Jeh wrote: Is there a typo somewhere in your evaluator code? What exactly does "% of correct prevision" mean?
My notation may be confusing. If gold wins and the evaluator judges that gold has the advantage throughout the whole game (from move 1b to the last move), the % of correct prevision is 100%. If gold wins and the evaluator judges that silver has the advantage throughout, the % of correct prevision is 0%. If the evaluator evaluated positions randomly, the % of correct prevision should be about 50%. So a % of correct prevision higher than 50% may be seen as a (slight) success, while one lower than 50% is a bad result (a random evaluator would do better).

Also, the evaluator expresses its evaluation of a position as a percentage. Maybe I should have chosen a more classical scale (-infinity to +infinity) to avoid the confusion.
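For clarity, here is a minimal sketch of the metric in code (Python; the function name and the evaluations-as-percentages-for-gold layout are just illustrative, not my actual spreadsheet):

```python
# "% of correct prevision": the fraction of positions in a game where the
# evaluator favors the side that eventually won. Evaluations are percentages
# for gold (0-100); an exact 50.0 counts as incorrect here.

def percent_correct_prevision(evals, winner):
    if winner == "gold":
        correct = sum(1 for e in evals if e > 50.0)
    else:
        correct = sum(1 for e in evals if e < 50.0)
    return 100.0 * correct / len(evals)

# Gold wins and the evaluator favored gold in 3 of 4 positions -> 75%.
print(percent_correct_prevision([55.0, 61.2, 48.7, 70.3], "gold"))
```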
The_Jeh
Forum Guru
Arimaa player #634
Posts: 460
Re: Attempt for an Arimaa Positional Evaluator
« Reply #4 on: Feb 27th, 2011, 2:21pm »
All right, but let's say gold was theoretically winning until move 40, makes a serious blunder, and then loses on move 50. Your whole-game percent of correct prevision may only be about 20%, even though the evaluator was doing its job properly the entire time. So your endgame measurements may be the only part worth looking at, though even they may suffer from this effect.

And do you consider the magnitude of the evaluation? For example, if the evaluation is +.01 for gold the whole game and then silver wins, I would say your prevision would be better estimated at 49.9% rather than 0%. The fact that all moves were evaluated in gold's favor means little given that the magnitude of the advantage was so tiny. This could also affect your measurements.

But I don't know why else silver would have a higher prevision, other than that your data set is too small.
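To make the magnitude idea concrete, here is a rough sketch (Python; the names and the percentage-for-gold convention are assumptions for illustration, not pago's actual setup):

```python
# Magnitude-aware prevision: rather than scoring each position as simply
# right or wrong, credit the probability the evaluator assigned to the
# eventual winner. A 50.01% eval for gold scores ~0.5 either way, while a
# 99% eval scores 0.99 or 0.01 depending on who actually won.

def magnitude_aware_prevision(evals, winner):
    probs = [e / 100.0 if winner == "gold" else 1.0 - e / 100.0 for e in evals]
    return 100.0 * sum(probs) / len(probs)

# +.01 for gold the whole game, then silver wins: ~49.99% rather than 0%.
print(magnitude_aware_prevision([50.01] * 50, "silver"))
```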
« Last Edit: Feb 27th, 2011, 2:34pm by The_Jeh »
pago
Forum Guru
Arimaa player #5439
Posts: 69
Re: Attempt for an Arimaa Positional Evaluator
« Reply #5 on: Feb 28th, 2011, 1:57pm »
on Feb 27th, 2011, 2:21pm, The_Jeh wrote: All right, but let's say gold was theoretically winning until move 40, makes a serious blunder, and then loses on move 50. Your whole-game percent of correct prevision may only be about 20%, even though the evaluator was doing its job properly the entire time.
I agree with you. The only proper way to measure the efficiency of an evaluator is to implement it in a bot and look at its results (but I am not able to do that). In the meantime I compare it against games of strong players (ratings higher than 2000), assuming that there will not be too many blunders and that, over a sufficient number of games, the winner was actually winning more than half of the time. It is probably not very satisfying, but I haven't found a better way.

A good solution could be to use a game database assessed by Arimaa experts (i.e. experts would have judged which side has the advantage from move 1b to the last move). But I don't think such a database exists.
Sconibulus
Forum Guru
Arimaa player #4633
Posts: 116
Re: Attempt for an Arimaa Positional Evaluator
« Reply #6 on: Feb 28th, 2011, 7:14pm »
Well, even experts can be wrong; we've seen violent swings due to 'strong' moves without any obvious blunders or 'weak' moves from the other side.
UruramTururam
Forum Guru
Arimaa player #2537
Posts: 319
Re: Attempt for an Arimaa Positional Evaluator
« Reply #7 on: Mar 1st, 2011, 12:29am »
on Feb 28th, 2011, 1:57pm, pago wrote: The only proper way to measure the efficiency of an evaluator is to implement it in a bot and look at its results
No, absolutely not! It's a common misunderstanding... Implementing an evaluator in a bot would make people play specifically against it, which would alter the results and make them pretty useless. The proper way is to make a statistical analysis of every game played after the evaluator was developed and see how often, and how early, it would have been able to determine the winner. This is in fact harder to do than making a bot, and doing it properly would require a good statistician.
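One possible reading of "how early" as a statistic, sketched in Python (the function name and the evaluations-as-percentages-for-gold convention are assumptions, not an agreed standard):

```python
# Earliest stable prediction: the index of the first evaluation from which
# every later evaluation keeps favoring the eventual winner. Returns None
# if even the final evaluation points the wrong way.

def earliest_stable_prediction(evals, winner):
    favors = [(e > 50.0) == (winner == "gold") for e in evals]
    earliest = None
    for i, ok in enumerate(favors):
        if ok and earliest is None:
            earliest = i       # a stable run may start here
        elif not ok:
            earliest = None    # run broken; start over
    return earliest

# Favors gold from index 3 onward without interruption -> 3.
print(earliest_stable_prediction([48.0, 52.0, 49.0, 55.0, 60.0], "gold"))
```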
Caffa et bucella per attactionem corporum venit ad stomachum meum. BGG Arimaa badges - get your own one!
pago
Forum Guru
Arimaa player #5439
Posts: 69
Re: Attempt for an Arimaa Positional Evaluator
« Reply #8 on: Mar 31st, 2011, 10:50am »
I have tried to apply the APE positional evaluator to the 2011 World Championship games: http://sd-2.archive-host.com/membres/up/208912627824851423/2011_WC.zip

I chose to test APE on all the games between players with a rating higher than 2000 (including Hanzack, given his results), assuming that the number of blunders would be low enough not to disturb the results.

The overall result is that APE gets mixed results. Although it is better than random play, it shows neither an obvious improvement in prediction performance nor an obvious decrease compared with the HERD material evaluator. APE does worse than HERD in the opening and the middle game, and better than HERD at the end of the game. Over the whole game the results are about equal. Strangely, APE and HERD still show significantly better performance when silver wins than when gold wins.

Maybe APE shows one interesting advantage: from the last 5 moves before the first trade until the end of the game, it gets a better result than the material evaluator. Another small point in APE's favor is that it seems to choose reasonable moves when I test it on the first moves, even without tree search.

My overall impression is that APE would probably not be very competitive compared with high-performing positional evaluators, but it could play "reasonably" against weak players and could have a different play style than current bots.
JimmSlimm
Forum Guru
Arimaa player #6348
Posts: 86
Re: Attempt for an Arimaa Positional Evaluator
« Reply #9 on: May 12th, 2011, 12:06pm »
pago, since your evaluator gives the advantage as a percentage, maybe you would get better prevision results if you added certainty to the calculation. For example, a 1% advantage maybe shouldn't weigh much in the final prevision result.
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
Re: Attempt for an Arimaa Positional Evaluator
« Reply #10 on: May 12th, 2011, 4:08pm »
on May 12th, 2011, 12:06pm, JimmSlimm wrote: For example, a 1% advantage maybe shouldn't weigh much in the final prevision result
Root mean square error works well in such cases. One might think it is bad to penalize a 51%-49% prediction with an "error" of 0.51^2 or 0.49^2 depending on the final outcome, especially if the position was "truly" a coin flip at the time of the prediction. However, a more ambitious prediction function which evaluated the same position 99%-1% would get hit with an error of 0.99^2 often enough to make it come out worse, despite sometimes scoring only 0.01^2 error. The least average error will accrue to the formula making the truest prediction.
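A quick simulation of the point, in case it helps (illustrative Python; the 51% "true" win probability is an assumption):

```python
import random

# On positions that are "truly" 51%-49%, a calibrated 51% prediction accrues
# less mean squared error than an overconfident 99% prediction.

def mean_squared_error(prediction, outcomes):
    # outcome is 1 for a win, 0 for a loss
    return sum((o - prediction) ** 2 for o in outcomes) / len(outcomes)

random.seed(0)
outcomes = [1 if random.random() < 0.51 else 0 for _ in range(100_000)]

print(mean_squared_error(0.51, outcomes))  # ~0.2499, the minimum achievable
print(mean_squared_error(0.99, outcomes))  # ~0.4803, overconfidence punished
```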
JimmSlimm
Forum Guru
Arimaa player #6348
Posts: 86
Re: Attempt for an Arimaa Positional Evaluator
« Reply #11 on: May 12th, 2011, 4:44pm »
on May 12th, 2011, 4:08pm, Fritzlein wrote: Root mean square error works well in such cases. One might think it is bad to penalize a 51%-49% prediction with an "error" of 0.51^2 or 0.49^2 depending on the final outcome, especially if the position was "truly" a coin flip at the time of the prediction. However, a more ambitious prediction function which evaluated the same position 99%-1% would get hit with an error of 0.99^2 often enough to make it come out worse, despite sometimes scoring only 0.01^2 error. The least average error will accrue to the formula making the truest prediction.
Hmm, OK. Then maybe the absolute prevision value doesn't matter much for a bot: any evaluator that gives a different percentage for each position should work for picking the best available move. As long as the sorted move list is in the correct order it comes out the same; 100% for the best move is still the best, and it doesn't matter whether the second best is 99% or 51%, they are still in the same order.
pago
Forum Guru
Arimaa player #5439
Posts: 69
Re: Attempt for an Arimaa Positional Evaluator
« Reply #12 on: May 13th, 2011, 7:09am »
Following the previous discussions, I am also not sure that my indicator is the most pertinent one. Maybe we could use a mix of different metrics:

- The one I used (% of correct prevision).
- Mean Absolute Error (it would be the average of the APE output when silver wins). An evaluation at 49.99% or 50.01% would have almost the same effect as a random prevision.
- RMSE, as suggested by Fritzlein.
- A weighted error, assuming that a right prevision at move one is much more important than a right prevision at the last move (even I am able to say who is winning at the end of the game!). In a game of 50 (gold + silver) moves, move one would get a weight of 50, move two a weight of 49... and the last move a weight of 1 (see the sketch at the end of this post).

For information, I have found a big problem explaining why APE was not very performant in the opening and the middle game: I used HERD to evaluate the local situation, but HERD's formula includes a bias to take into account the number of remaining rabbits. From a global point of view that makes sense (a player without rabbits cannot win the game...), but from a local point of view it is totally silly: HERD evaluates one rabbit as stronger than an elephant. For the whole board that is true, but locally it is the contrary. The problem was amplified by the fact that the local winner gets everything and the loser nothing (the rabbit locally gets 1 while the elephant gets 0). The result was that when a piece was close to rabbits, the evaluator tended to invert the advantage.

I am rerunning the evaluation on the 2011 WC games with the bias removed from the HERD formula. I have also introduced another change: I now keep the (bias-free) HERD result and take the average performance over the complete board. This avoids introducing bad errors when the local result is close to 50% but on the wrong side (49.9% instead of 50.1%).

With the first 19 games, APE's performance at predicting the result before the first piece was trapped was 48.9% (worse than a random prevision!). With the new formula, without the bias and with the average over the board, the performance is now about 55.9%, which begins to be more interesting...

Maybe a more efficient material evaluator (FAME or another?) would get better results with the same kind of idea: locally apply the evaluator everywhere on the board and take a weighted average of the results.

My new simulations take time because, as usual, I use Excel...
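Here is the weighted-error idea as a small sketch (Python; the function name and the evaluations-as-percentages-for-gold layout are only illustrative):

```python
# Weighted % of correct prevision: in an n-move game, move 1 gets weight n,
# move 2 gets weight n-1, ..., the last move gets weight 1, so an early
# correct prevision counts for much more than a late one.

def weighted_prevision(evals, winner):
    n = len(evals)
    weights = list(range(n, 0, -1))
    correct = [(e > 50.0) == (winner == "gold") for e in evals]
    earned = sum(w for w, ok in zip(weights, correct) if ok)
    return 100.0 * earned / sum(weights)

# Right only for the last 10 of 50 moves scores far below 50%.
print(weighted_prevision([45.0] * 40 + [60.0] * 10, "gold"))  # ~4.3%
```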