Topic: Global Algebraic Material Evaluator (Read 9424 times)

pago
Forum Guru, Arimaa player #5439, Posts: 69

Global Algebraic Material Evaluator
« on: Sep 1st, 2010, 10:21am »
Hello. First, I hope this topic is new; I didn't see a similar idea in the forum, although it obviously has a close relation with the thread "(no) absolute score value for pieces".

I have written a paper about a new material evaluator that I propose: http://sd-2.archive-host.com/membres/up/208912627824851423/Global_Arimaa_Material_Evaluator.pdf

This evaluator has some interesting properties:
- I designed the evaluator with no parameters other than the number of pieces (yes, I could!)
- The evaluator takes into account ALL possible combinations of pieces.
- It is intrinsically holistic: it deals with the material globally, and a piece (or combination of pieces) has no predefined value.
- It addresses most of the issues already discussed in the forum (increasing value of the rabbits, change of piece value according to the material balance, etc.)
- It is consistent with jdb's tests on his Clueless bot (CR & DCR tests).
- It could have some unexpected links with mathematics.

Before concluding that I am totally mad, please try to read the paper. I would be interested in friendly feedback from the community, in particular from bot developers, strong players, people with more mathematical competence than me and… the Arimaa game inventors.

rbarreira
Forum Guru, Arimaa player #1621, Posts: 605

Re: Global Algebraic Material Evaluator
« Reply #1 on: Sep 1st, 2010, 11:49am »
It is very elegant; maybe Janzert will add it to the evaluators page for people to play around with. If I have some time, I may program it into my bot and run some tournaments against my current material evaluation (FAME).

speek
Forum Guru, Arimaa player #5441, Posts: 75

Re: Global Algebraic Material Evaluator
« Reply #2 on: Sep 1st, 2010, 12:04pm »
|
Where is this evaluators page?

Fritzlein
Forum Guru, Arimaa player #706, Posts: 5928

Re: Global Algebraic Material Evaluator
« Reply #4 on: Sep 1st, 2010, 1:56pm »
|
My first impression is that GAME is aimed more at endgame evaluation than opening evaluation. This is the opposite of FAME, which performs best in the opening and worst in the endgame.

As for the first trade, FAME, DAPE, and HarLog all agree that M >> CC, and I think every top player would concur. In the opening, the absence of two cats is merely annoying, whereas the absence of the camel is crippling. The player without the camel has no answer to an elephant-horse attack, which is strong enough to tie down the defensive elephant, leaving the lone camel unopposed on the rest of the board.

On a much smaller point, GAME rates having one strong and one weak piece as equal to having two medium pieces, for example MD vs. HH. My experience is that having the strong and weak piece is better in almost every situation: winning the primary fight is more important than winning the secondary fight.

Thanks again for sharing your formula.

jdb
Forum Guru, Arimaa player #214, Posts: 682

Re: Global Algebraic Material Evaluator
« Reply #5 on: Sep 1st, 2010, 3:40pm »
|
Very nice work! Your idea does an amazingly good job. The rest of the tournament results are posted in the other thread.

Rednaxela
Forum Senior Member, Arimaa player #4674, Posts: 34

Re: Global Algebraic Material Evaluator
« Reply #6 on: Sep 1st, 2010, 6:12pm »
|
Very impressive! I really like how elegant it is!

Hmm... I wonder if there is a 'missing' factor that would fix the "M >> CC" situation Fritzlein mentioned without adding numeric constants. I have a feeling that putting an exponent on the "number of pieces dominated by this piece" factor might work, but that does add a numeric constant to guess or empirically tune.

About "M >> CC", one other thing perhaps worth noting is that the empirically optimized evaluators all show that loss to be significantly smaller than a single-cat-at-opening loss, whereas all the hand-tuned ones show it as a significantly bigger loss. Perhaps this suggests that while "M > CC" is true, its objective difference is smaller than what it 'feels' like to players?

« Last Edit: Sep 1st, 2010, 6:35pm by Rednaxela »

Fritzlein
Forum Guru, Arimaa player #706, Posts: 5928

Re: Global Algebraic Material Evaluator
« Reply #7 on: Sep 1st, 2010, 8:00pm »
|
on Sep 1st, 2010, 6:12pm, Rednaxela wrote: "About "M >> CC", I think one other thing perhaps worth noting, is that the empirically optimized ones all show that loss to be significantly smaller than a single-cat-at-opening loss, whereas all the hand tuned ones show it as a significantly bigger loss than a single-cat-at-opening loss. Perhaps this suggests that while "M > CC" is true, it's objective difference is smaller than what it 'feels' like to players?"

You suggest that perhaps M for CC feels like a larger advantage than it is. This is a tricky issue, because an advantage that allows me to win a rabbit for nothing could be considered an advantage of just one rabbit, or it could be considered an infinite (game-winning) advantage. If I have M for CC, it is reasonable for me to play to never lose any pieces. I should be able to successfully contest all four traps, denying the opponent any space for capture. I will almost automatically be able to set up a control situation where I eventually win something for nothing. It may take a long time, but if the opponent has no way to threaten me, it doesn't matter how long the control takes to pay off, because then I can reset and make it pay off again and again. (OK, maybe it is too strong to insist that I would never have to lose a piece, but at a minimum I could play to never lose a piece without capturing a better piece in compensation, i.e. I would never have to accept an equal trade.)

The "empirically optimized" results should be used with caution. There is apparently a correlation between the ability of the players and the value of the strong pieces. Relatively speaking, between weaker players it is more important to have numerous pieces, whereas among stronger players it is more important to have stronger pieces. I believe this is because stronger players have a better understanding of "control" positions, where the player who has lost control is doomed to eventually lose material even though it isn't superficially obvious why. Unfortunately, if one restricts "empirical optimization" to, say, games in which both players were humans rated over 2000, there aren't enough data points to get a good reading on material values. Therefore data from games between intermediate players must be included as well, potentially distorting the results. An extreme example of a useless empirical result is self-play by a randomly-moving bot: for such a player (i.e. one with no strategy), a rabbit is worth more than an elephant, as proven by experiment! Yet nobody supposes that the material values for ultra-weak players should inform their values for strong players.

On the other hand, it is possible that I am merely stubborn in my adherence to my intuitions in spite of the evidence. Certainly the top players of 2005 over-valued the camel; our intuitions were wrong. Now we value it somewhat less, but perhaps still too much. Perhaps a camel is truly worth only a cat and a dog. Please, do your best to convince chessandgo of this, so that we can arrange to trade his M for my DC in our next World Championship match.

Rednaxela
Forum Senior Member, Arimaa player #4674, Posts: 34

Re: Global Algebraic Material Evaluator
« Reply #8 on: Sep 1st, 2010, 10:59pm »
|
on Sep 1st, 2010, 8:00pm, Fritzlein wrote: "The "empirically optimized" results should be used with caution. There is apparently a correlation between the ability of the players and the value of the strong pieces. [...] Therefore data from games between intermediate players must be included as well, potentially distorting the results."

Indeed, one must be quite careful with "empirical optimization" and the methodology used. One note is that schemes with fewer constants to optimize take MUCH less data to optimize well. GAME extended by the exponent I mentioned in my earlier post would take a relatively tiny number of data points to stabilize.

My feeling is that the ideal material evaluation shouldn't need many parameters to tune. Really, all pieces except the rabbit have the same rules, and are only differentiated by which pieces on the board they dominate or are dominated by. I think GAME is on the right track with the elegant way it approaches this. As far as pure material evaluation goes, there are only two things I can think of that feel like 'major omissions' from GAME:
1) The relative weight of rabbits to other pieces, due to their differing rules
2) Non-linear effects in the number of pieces dominated
It seems to me #1 would take one constant to express, and #2 would take one or two constants. As few as two constants is pretty easy to "empirically optimize" with relatively few data points, allowing one to be pickier about the games used.

Anyway, since I already have code lying around for processing gameroom data, I should probably run some tests, using GAME, the older material evaluators, and perhaps a variation on GAME with some tunable parameters. It's too late tonight to do that, but I'll probably get something together this week to give it a shot.

pago
Forum Guru, Arimaa player #5439, Posts: 69

Re: Global Algebraic Material Evaluator
« Reply #9 on: Sep 2nd, 2010, 5:10am »
|
Thank you for these very friendly responses.

To rbarreira: I would be very interested in seeing the behavior of GAME in a bot, although I already know that it has to be improved (see the answer to Fritzlein). Unfortunately my competence in software development is even worse than my competence at Arimaa (and the latter is worth almost nothing).

To Fritzlein (and others): As usual your comments are interesting and challenging. I agree with your two examples that GAME does not have perfect behaviour yet, although I find it astonishing that it is possible to design a (relatively consistent) evaluator taking into account all combinations of pieces without hand-tuned parameters.

My analysis is that although GAME intrinsically takes into account the interactions between piece combinations, it doesn't take into account the relative position of pieces on the board, and in particular the dangerousness of each piece's environment.

GAME(G;s) = F(G;s) / (F(G;s) + F(s;G))
F(G;s) = sum over i of Fi(G;s) = Opportunities − Risks

As it stands, GAME considers that the relative value of a cat and a camel in the middle of the enemy army equals their relative value when staying quietly at home. That is obviously false. A cat feels that the middle of the enemy army is a very dangerous environment: it has to watch the enemy elephant, camel, horses and dogs (risks), and it has no time to fight enemy cats or to attack rabbits (opportunities). It would like to stay at home until the situation becomes quieter. On the contrary, a camel feels that the middle of the enemy army is not so dangerous: it has only one risk (the elephant) and a lot of opportunities, so it can stay in the middle of the enemy army (as far as possible from the elephant).

To summarize, F(G;s) should be weighted by the dangerousness of each piece's environment, so that the relative value between a cat and a camel increases with dangerousness. The initial setup is a fairly dangerous situation (although less dangerous than the middle of the enemy army), so the relative value between a camel and a cat should be greater there. In the endgame the dangerousness decreases more for a cat than for a camel, so the relative value between a camel and a cat should decrease.

I have some ideas for designing a positional evaluator based on GAME which could fix the identified issues. F(G;s) would be replaced by

F(G;s;setup) = sum over i of Fi(G;s;ri;ci)

where ri is the rank and ci is the column of the square occupied by piece i. This evaluator would have the following additional emerging properties (keeping in mind that I want to avoid hand-tuned parameters):

a) Fi(G;0;ri;ci) increases toward the center of the board (pieces tend to seek the center)
b) Fi(G;s;ri;ci) increases with the proximity of weaker pieces (pieces tend to attack weaker pieces)
c) Fi(G;s;ri;ci) decreases with the proximity of stronger pieces (pieces tend to avoid stronger pieces)
d) For a rabbit, Fi(G;0;ri+1;ci) > Fi(G;0;ri;ci) (when there is no risk, an advanced rabbit is better)
e) Fi(G;0;ri+1;ci) > Fi(G;0;ri;ci+1) and Fi(G;0;ri+1;ci) > Fi(G;0;ri;ci−1) (when there is no risk, a rabbit should head for the 8th rank as directly as possible)
f) The relative weights between a), b), c), d) and e) would not be predefined (similarly, the relative values of pieces are not predefined in GAME: they emerge from the material balance)

The elephant would feel a tension between a) and b) (between centralization and attack). A camel would feel a tension between a), much b) and a little c). A cat would feel a tension between a), a little b) and much c). A rabbit would feel a tension between a little a), much c), a little d) and a little e).

With such an evaluator, cats and camels would feel the influence of the enemy, and so the relative value between them would increase thanks to b) and c). Maybe it is a little too ambitious, but I will try…

speek
Forum Guru, Arimaa player #5441, Posts: 75

Re: Global Algebraic Material Evaluator
« Reply #10 on: Sep 2nd, 2010, 12:16pm »
|
Pago, now you've described some of what I've been thinking about: evaluating pieces based on proximity to those they can dominate (good) and proximity to those that can dominate them (bad). Something to consider, though, is that if A dominates B, the domination is MOST valuable when A's strength is only 1 greater than B's, as opposed to 2 or more greater; i.e., it is better to use the elephant to dominate a camel than a cat. In this way, the inverse of the difference between piece strengths should, I think, feed into your equations.
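The "inverse of the difference" weighting suggested here could be sketched as follows. This is only an illustration of the idea, not anything from GAME or the paper; the 1/gap weight and the function name are my own choices for making the suggestion concrete.

```python
# Illustration of the suggestion above: domination is worth most when
# the strength gap is 1 (elephant vs camel) and least when the gap is
# large (elephant vs cat). The 1/gap weight is just one possible way
# to realize "inverse of the difference"; it is not taken from GAME.

STRENGTH = {'R': 0, 'C': 1, 'D': 2, 'H': 3, 'M': 4, 'E': 5}

def domination_weight(strong, weak):
    gap = STRENGTH[strong] - STRENGTH[weak]
    if gap <= 0:
        return 0.0          # equal or weaker: no domination at all
    return 1.0 / gap

print(domination_weight('E', 'M'))  # gap 1 -> 1.0
print(domination_weight('E', 'C'))  # gap 4 -> 0.25
```

So an elephant shadowing the enemy camel would contribute four times the weight of an elephant chasing a cat, matching the intuition that the big pieces should fight the primary fight.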

« Last Edit: Sep 2nd, 2010, 12:16pm by speek »

jdb
Forum Guru, Arimaa player #214, Posts: 682

Re: Global Algebraic Material Evaluator
« Reply #11 on: Sep 2nd, 2010, 2:52pm »
|
I have a question. In the phase where the points are calculated based on the duels, is this an equivalent formulation: the points each piece gets is 16 minus the number of stronger enemy pieces?

pago
Forum Guru, Arimaa player #5439, Posts: 69

Re: Global Algebraic Material Evaluator
« Reply #12 on: Sep 4th, 2010, 7:27am »
|
on Sep 2nd, 2010, 9:52pm, jdb wrote: "I have a question. In the phase where the points are calculated based on the duels, is this an equivalent formulation? The points each piece gets is 16 - the number of stronger enemy pieces."

Yes, it is equivalent. I wrote §5.1 (matrix calculation) using this equivalence.
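With this equivalence confirmed, the duel phase can be sketched in a few lines. The code below is my reconstruction, not code from the paper: it assumes the whole material score is the ratio F(G;s)/(F(G;s)+F(s;G)) given in pago's earlier post, with each piece scoring 16 minus the number of stronger enemy pieces; the function names are mine, and the full paper may include details this omits.

```python
# Sketch of the GAME material score, assuming jdb's equivalent
# formulation is the entire duel phase: each piece scores 16 minus
# the number of stronger enemy pieces, and the final value is
# F(G;s) / (F(G;s) + F(s;G)). Strength order R < C < D < H < M < E
# is standard Arimaa; everything else is my reconstruction.

STRENGTH = {'R': 0, 'C': 1, 'D': 2, 'H': 3, 'M': 4, 'E': 5}

def side_points(own, enemy):
    """Sum of (16 - number of stronger enemy pieces) over own pieces."""
    total = 0
    for p in own:
        stronger = sum(1 for q in enemy if STRENGTH[q] > STRENGTH[p])
        total += 16 - stronger
    return total

def game_eval(gold, silver):
    """GAME(G;s) = F(G;s) / (F(G;s) + F(s;G)); 0.5 means balanced."""
    fg = side_points(gold, silver)
    fs = side_points(silver, gold)
    return fg / (fg + fs)

full = list('EMHHDDCCRRRRRRRR')            # one full army
no_camel = [p for p in full if p != 'M']   # lost M
no_cats = [p for p in full if p != 'C']    # lost CC

print(game_eval(full, full))               # equal armies -> 0.5
print(game_eval(no_camel, no_cats))        # the M-for-CC trade -> ~0.514
```

Under this reconstruction, equal armies score 0.5, and the M-for-CC trade comes out at roughly 0.514 in favor of the side that kept its two cats, which is consistent with Fritzlein's earlier observation that GAME does not rate M >> CC.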

Rednaxela
Forum Senior Member, Arimaa player #4674, Posts: 34

Re: Global Algebraic Material Evaluator
« Reply #13 on: Sep 5th, 2010, 8:28pm »
|
I just ran some tests, on all gameroom data from 2006 to present (1 GB uncompressed). I split the results by 1) rating threshold, 2) bots included or not, and 3) which "phase" of the game it is, divided into thirds. The percent value is the percent of turns with non-equal material where the eventual winner was correctly guessed by the material evaluator, and the count is the number of such turns in that data sample. The forum software here doesn't seem to handle its table tags well, so plain text will have to do. In all cases the game must be rated and end in rabbit victory.

Both players rated over 1700, bots included:
Game Phase   Count    GAME      FAME      FAMEeo    DAPE      DAPEeo    HarLog
1st third     98369   65.119%   64.754%   65.058%   64.512%   65.100%   64.780%
2nd third    378126   77.409%   77.597%   78.036%   77.332%   78.035%   77.691%
3rd third    505682   86.753%   86.783%   87.149%   86.715%   87.136%   86.822%
Total        982177   80.989%   81.040%   81.429%   80.879%   81.425%   81.099%

Both players rated over 2000, bots included:
Game Phase   Count    GAME      FAME      FAMEeo    DAPE      DAPEeo    HarLog
1st third     11266   69.466%   68.676%   69.270%   68.116%   69.253%   68.454%
2nd third     46259   75.579%   76.147%   76.547%   76.242%   76.530%   76.450%
3rd third     63295   85.929%   86.146%   86.448%   86.027%   86.550%   86.136%
Total        120820   80.431%   80.689%   81.055%   80.611%   81.101%   80.779%

Both players rated over 1700, bots excluded:
Game Phase   Count    GAME      FAME      FAMEeo    DAPE      DAPEeo    HarLog
1st third      5326   66.147%   67.574%   67.405%   67.443%   67.443%   67.462%
2nd third     34116   74.716%   75.070%   76.404%   75.152%   76.404%   75.378%
3rd third     52590   85.959%   85.246%   86.326%   85.362%   86.454%   85.372%
Total         92032   80.645%   80.451%   81.553%   80.540%   81.628%   80.631%

Both players rated over 2000, bots excluded:
Game Phase   Count    GAME      FAME      FAMEeo    DAPE      DAPEeo    HarLog
1st third      1933   67.977%   70.150%   69.633%   70.150%   69.633%   69.840%
2nd third     12116   73.110%   74.563%   75.264%   75.322%   75.165%   74.761%
3rd third     18632   85.793%   84.811%   85.804%   84.650%   86.174%   84.537%
Total         32681   80.037%   80.144%   80.940%   80.334%   81.114%   80.043%

This seems to show a few things, including the 'eo' variants being stronger even when excluding bots and players not rated over 2000. One interesting thing is that GAME seems to have trouble judging early trades in human-only games. Overall, though, GAME performs fairly well.

EDIT: Interestingly, for the first third of the game GAME is always the best predictor when bots are involved, but it starts to suffer both later in the game and when play is human-only. The other interesting thing is that in the early game of human-only games, FAME and DAPE appear to outperform FAMEeo and DAPEeo, but in the late game the 'eo' variants come out on top; in the games with bots, the 'eo' variants always come out on top. Any thoughts, either on GAME or on these results in general?
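For concreteness, the accuracy measure used in these tables (percent of unequal-material turns where the evaluator picks the eventual winner, split into thirds of the game) might be computed along these lines. The record format, function names, and the use of a 0.5 score as the equal-material marker are my assumptions for the sketch, not Rednaxela's actual test code, which would additionally parse real gameroom logs.

```python
# Minimal sketch of the per-phase accuracy measure: over all turns
# with unequal material, the fraction where the evaluator's score
# (above or below 0.5) agrees with the eventual winner, split into
# thirds of the game. Records are hypothetical
# (turn_index, total_turns, score, gold_won) tuples.

def phase_of(turn_index, total_turns):
    """Return 0, 1 or 2: which third of the game a turn falls in."""
    return min(2, 3 * turn_index // max(1, total_turns))

def accuracy_by_phase(records):
    hits = [0, 0, 0]
    counts = [0, 0, 0]
    for turn_index, total_turns, score, gold_won in records:
        if score == 0.5:      # equal material: skipped, per the post
            continue
        ph = phase_of(turn_index, total_turns)
        counts[ph] += 1
        if (score > 0.5) == gold_won:
            hits[ph] += 1
    return [(h / c if c else None) for h, c in zip(hits, counts)]

# Tiny made-up sample: two early turns, one mid, one late
sample = [(0, 30, 0.6, True), (5, 30, 0.4, True),
          (15, 30, 0.55, True), (29, 30, 0.7, True)]
print(accuracy_by_phase(sample))   # -> [0.5, 1.0, 1.0]
```

Splitting by fixed thirds of each game's length, as here, is one of several reasonable choices; binning by absolute turn number instead would weight long games differently.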

« Last Edit: Sep 5th, 2010, 9:37pm by Rednaxela »

Fritzlein
Forum Guru, Arimaa player #706, Posts: 5928

Re: Global Algebraic Material Evaluator
« Reply #14 on: Sep 5th, 2010, 9:22pm »
|
Very interesting analysis; thanks for sharing. It strikes me that the optimized versions of FAME and DAPE do worse at predicting in the opening third of the game. Why wouldn't they do better in every phase? Probably because there are more material imbalances later in the game, so if equal weight is given to every position with a material imbalance, late-game positions will have a greater impact on the optimizations.

One reason to be suspicious of the optimized versions of FAME and DAPE was that they might be over-fitting the data at hand. However, they have done quite well for themselves across the data from the several years since they were calculated. This is true predictive power, and I am impressed by it, particularly in the case of over-2000 HvH matches.

In line with my reasoning about control positions earlier in this thread, it makes sense that stronger pieces would relatively lose their value as the game progresses, because it becomes harder and harder to play a control game. I hand-tuned FAME mostly with early-game positions in mind, so it isn't surprising that it performs increasingly poorly later in the game. It was clever of you to divide the data into thirds as you did, because that provides the greatest insight of all.

Whenever someone suggests that my material intuitions are out of whack, I imagine the imbalance occurring on a relatively quiet, fluid board position. However, later in the game the positions tend to be messier, with more pieces strategically committed, more rabbits advanced, and more traps contested. Perhaps my intuitions are just fine in the positions I usually imagine, but are inaccurate when some factor typical of later games is present. I am not sure precisely which factor this would be, but I will now be more vigilant for late-game positional factors that change the value of a material imbalance. Thanks again for doing the spadework and sharing your results.