Author |
Topic: Global Algebric Material Evaluator (Read 9525 times) |
|
aaaa
Forum Guru
    
 Arimaa player #958
Posts: 768
|
 |
Re: Global Algebric Material Evaluator
« Reply #45 on: Sep 13th, 2010, 1:02pm » |
|
on Sep 13th, 2010, 11:21am, Fritzlein wrote:Of course one could arbitrarily draw a line for switching between evaluators, but it would be much more elegant to have a single formula. |
| It's not just about elegance.
|
|
|
|
|
Rednaxela
Forum Senior Member
   
 Arimaa player #4674
Gender: 
Posts: 34
|
 |
Re: Global Algebric Material Evaluator
« Reply #46 on: Sep 13th, 2010, 8:14pm » |
|
on Sep 13th, 2010, 4:04am, pago wrote: I am not so surprised by this. As I tried to explain in a previous reply, GEM "measures a material balance as if the goal of Arimaa were to take the maximum quantity of adverse piece (or maybe more precisely as if there were no goal in Arimaa game). It is good at the beginning but at the end, it is more important to win the game than to catch the adverse elephant. |
| Ahh, I see. As a quick note, I did some quick trials and found that unweighted "GEM + GAME" gives higher scores overall than either alone, but worse scores than the better of the two in any given segment of the game. I tried some weighting based on number of pieces and got slightly better performance still, but nothing that felt worth the inelegance of such melding to me.
on Sep 13th, 2010, 11:21am, Fritzlein wrote: One thing that occurs to me is that you insisted your games end in goal. Doesn't that slightly bias things in favor of evaluators that like rabbits? In particular, someone who has an army consisting of lots of strong pieces and few rabbits might find it easier to win by immobilization than by goal. I don't see why you shouldn't include wins by immobilization and elimination in your methodology. |
| Well, I didn't think much about elimination, but to me it seemed that immobilization wins are rare and are caused by rather different circumstances and thus would be more of a noise source than anything. Prompted by your question, though, I ran a test including the different game results:

2000+ score, no bots, goal ending only (590 games)
"quiet position" turns only
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 359, 58.496%, 57.382%, 56.546%, 58.217%, 57.939%, 58.774%, 57.939%, 57.939%
Phase1, 1356, 70.870%, 71.313%, 70.428%, 70.723%, 71.460%, 70.944%, 71.386%, 70.870%
Phase2, 2211, 85.346%, 84.080%, 85.889%, 85.391%, 85.798%, 85.075%, 86.251%, 85.301%
Total, 3926, 77.891%, 77.229%, 77.866%, 77.840%, 78.299%, 77.789%, 78.528%, 77.815%
590

2000+ score, no bots, goal AND elimination ending only (595 games)
"quiet position" turns only
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 366, 58.197%, 57.104%, 56.557%, 57.923%, 57.650%, 58.470%, 57.650%, 57.650%
Phase1, 1381, 70.891%, 71.615%, 70.746%, 70.818%, 71.687%, 71.035%, 71.687%, 71.108%
Phase2, 2242, 85.459%, 84.255%, 86.084%, 85.504%, 85.995%, 85.236%, 86.396%, 85.459%
Total, 3989, 77.914%, 77.388%, 78.065%, 77.889%, 78.441%, 77.864%, 78.666%, 77.939%
595

2000+ score, no bots, goal/elimination/immobilization endings (607 games)
"quiet position" turns only
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 377, 58.355%, 57.294%, 56.764%, 58.090%, 57.825%, 58.621%, 57.825%, 57.825%
Phase1, 1407, 71.073%, 71.784%, 70.860%, 71.144%, 71.855%, 71.357%, 71.855%, 71.429%
Phase2, 2309, 85.881%, 84.712%, 86.488%, 85.925%, 86.401%, 85.665%, 86.791%, 85.881%
Total, 4093, 78.256%, 77.742%, 78.378%, 78.280%, 78.769%, 78.256%, 78.989%, 78.329%

The change to all evaluators and game segments seems essentially uniform, so at the very least it doesn't really change the overall picture due to their rarity.
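To make the "essentially uniform" remark concrete, here is a small Python sketch that compares the Total rows of the goal-only run and the goal/elimination/immobilization run. The numbers are copied from the tables above; the function and variable names are only illustrative and are not part of Rednaxela's actual test code.

```python
# Total-row accuracies (percent), copied from the first and third runs above.
EVALUATORS = ["Marwin", "GEM", "GAME", "FAME", "FAMEeo", "DAPE", "DAPEeo", "HarLog"]
GOAL_ONLY = [77.891, 77.229, 77.866, 77.840, 78.299, 77.789, 78.528, 77.815]
ALL_ENDINGS = [78.256, 77.742, 78.378, 78.280, 78.769, 78.256, 78.989, 78.329]


def total_shifts(before, after):
    """Return the per-evaluator change in percentage points."""
    return [round(a - b, 3) for b, a in zip(before, after)]


shifts = total_shifts(GOAL_ONLY, ALL_ENDINGS)
# Spread between the largest and the smallest shift across evaluators.
spread = round(max(shifts) - min(shifts), 3)

for name, shift in zip(EVALUATORS, shifts):
    print(f"{name:7s} {shift:+.3f}")
print("spread:", spread)
```

Every evaluator gains roughly 0.4 to 0.5 percentage points, so the shift is much the same size for all of them, even though a few closely spaced evaluators in the middle of the table swap places.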
|
|
|
|
|
Fritzlein
Forum Guru
    
 Arimaa player #706

Gender: 
Posts: 5928
|
 |
Re: Global Algebric Material Evaluator
« Reply #47 on: Sep 13th, 2010, 8:38pm » |
|
on Sep 13th, 2010, 8:14pm, Rednaxela wrote:Well, I didn't think much about elimination, but to me it seemed that immobilization wins are rare and are caused by rather different circumstances and thus would be more of a noise source than anything. |
| Immobilization is a source of wins, not noise! Or do you tell your opponent, when you lose by immobilization, that he didn't really beat you?
Quote:The change to all evaluators and game segments seems essentially uniform, so at the very least it doesn't really change the overall picture due to their rarity. |
| It makes sense that the impact would be small due to the rarity of non-goal results, but I was curious nonetheless. Thanks for re-running the numbers.
|
|
|
|
|
Rednaxela
Forum Senior Member
   
 Arimaa player #4674
Gender: 
Posts: 34
|
 |
Re: Global Algebric Material Evaluator
« Reply #48 on: Sep 13th, 2010, 8:58pm » |
|
on Sep 13th, 2010, 8:38pm, Fritzlein wrote: Immobilization is a source of wins, not noise! Or do you tell your opponent, when you lose by immobilization, that he didn't really beat you? |
| Hahaha, nah. What I mean by it being "noise" is that I felt that mixing it with goal wins would be too much of an "apples and oranges" comparison. I'm starting to change my mind though.
|
« Last Edit: Sep 13th, 2010, 8:58pm by Rednaxela » |
|
|
|
pago
Forum Guru
    
 Arimaa player #5439

Gender: 
Posts: 69
|
 |
Re: Global Algebric Material Evaluator
« Reply #49 on: Sep 14th, 2010, 2:10pm » |
|
Quote:Ahh, I see. As a quick note, I did some quick trials and found that unweighted "GEM + GAME" gives higher scores overall than either alone, but worse scores than the better of the two in any given segment of the game. I tried some weighting based on number of pieces and got slightly better performance still, but nothing that felt worth the inelegance of such melding to me. |
| Maybe we could try to introduce a "goal balance" into the equation, to take the goal of the game into account and to bias the evaluator a little in favour of rabbits. This goal balance could be something like:
Balance(G;s;goal) = N6/(N6+n6)
So when we introduce it into the GEM equation, it simplifies:
(N6+n6)*Balance(G;s;goal) = N6
GEM = (sum(...)+N6)/(Sum(Ni+ni)+N6+n6)
I have not tried this idea yet and maybe it doesn't work at all.
|
|
|
|
|
pago
Forum Guru
    
 Arimaa player #5439

Gender: 
Posts: 69
|
 |
Re: Global Algebric Material Evaluator
« Reply #50 on: Sep 15th, 2010, 4:20am » |
|
Quote:This goal balance could be something like:
Balance(G;s;goal) = N6/(N6+n6)
So when we introduce it into the GEM equation, it simplifies:
(N6+n6)*Balance(G;s;goal) = N6
GEM = (sum(...)+N6)/(Sum(Ni+ni)+N6+n6)
I have not tried this idea yet and maybe it doesn't work at all. |
| Result: it doesn't work as it is... (rabbits are too favoured).
The idea seems to work with a small change. Instead of taking the number of rabbits as potential goals, I consider that there are only two goals (one for each side). That is even more consistent with the fact that the match ends when one side has reached its goal. The equations would become:
Balance(G;s;goal) = N6/(N6+n6)
HEM = (Sigma(...)+2*Balance(G;s;goal))/(Sigma(Ni+ni)+2)
I am performing my tests with Excel (!). I'll post the results when they are finished.
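As a rough illustration of the difference between the two variants, here is a minimal Python sketch. It is not pago's Excel implementation. The elided terms "sum(...)" and "Sum(Ni+ni)" are passed in as opaque numbers (`sum_term`, `piece_total`) because the thread does not spell them out, and the reading of N6 as one side's rabbit count and n6 as the other side's is an assumption. The values in the example call are invented purely for illustration.

```python
def goal_balance(N6, n6):
    """Balance(G;s;goal) = N6 / (N6 + n6).

    Assumption: N6 and n6 are the rabbit counts of the two sides.
    """
    if N6 + n6 == 0:
        raise ValueError("no rabbits on the board")
    return N6 / (N6 + n6)


def rabbit_weighted(sum_term, piece_total, N6, n6):
    """First attempt: (sum(...) + N6) / (Sum(Ni+ni) + N6 + n6)."""
    return (sum_term + N6) / (piece_total + N6 + n6)


def two_goal(sum_term, piece_total, N6, n6):
    """Second attempt: (Sigma(...) + 2*Balance) / (Sigma(Ni+ni) + 2)."""
    return (sum_term + 2 * goal_balance(N6, n6)) / (piece_total + 2)


# Hypothetical inputs: the non-rabbit part is perfectly balanced (4 / 8 = 0.5),
# and one side has 8 rabbits against 6.
first = rabbit_weighted(4.0, 8.0, 8, 6)
second = two_goal(4.0, 8.0, 8, 6)
print(round(first, 4), round(second, 4))
```

With these made-up inputs, the rabbit-weighted form moves further from 0.5 than the two-goal form does, which matches the observation that the first version favours rabbits too much.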
|
|
|
|
|
aaaa
Forum Guru
    
 Arimaa player #958
Posts: 768
|
 |
Re: Global Algebric Material Evaluator
« Reply #51 on: Sep 16th, 2010, 5:56pm » |
|
I just did some tests myself, and I'm afraid I have to conclude that using game data to evaluate or derive material evaluation functions would (still) be of dubious merit, as it seems to lead to a lopsided preference for quantity of pieces over quality that's well outside mainstream opinion.
|
|
|
|
|
pago
Forum Guru
    
 Arimaa player #5439

Gender: 
Posts: 69
|
 |
Re: Global Algebric Material Evaluator
« Reply #52 on: Sep 17th, 2010, 7:50am » |
|
Quote:I just did some tests myself, and I'm afraid I have to conclude that using game data to evaluate or derive material evaluation functions would (still) be of dubious merit, as it seems to lead to a lopsided preference for quantity of pieces over quality that's well outside mainstream opinion. |
| Quote:Instead of taking the number of rabbits as potential goals I consider that there are only two goals (one for each side). That is even more consistent with the fact that the match ends when one side has reached his goal. The equations would become : Balance(G;s;goal) = N6/(N6+n6) HEM = (Sigma(...)+2*balance(G;s;goal))/(Sigma(Ni+ni)+2) |
| I have rewritten my paper about the evaluator for the third (and probably last) time. This time I have incorporated the idea I posted in a previous reply to take the goal of Arimaa into account.
The pdf file is available at this link: http://sd-2.archive-host.com/membres/up/208912627824851423/HEM.pdf
The Excel calculation file is available at this link: http://sd-2.archive-host.com/membres/up/208912627824851423/HEM.xls
I called this evaluator HEM / Holistic Evaluator of Material (I am better at finding names than at building efficient evaluators!)
The main modifications of the paper are:
- Incorporation of a goal balance
- A paragraph about first trade comparison added
- A paragraph about the Dog + cat complete tournament (72 combinations) added
- A paragraph about intransitivity added
- The paragraph about matrix calculation removed
- Correction of some typos.
I didn’t copy all the tournament results into the appendix. They are available in the Excel file.
Compared to GEM, the improvements are:
- Increase of the rabbit relative value when the number of rabbits is unbalanced (the relative values of the major pieces have not been changed). It should keep the GEM advantage in the first round and the GAME advantages in the following rounds.
- Dog tournament results are more consistent with jdb’s results
- Switches of advantage after trades beginning from EMHHDDCC4R vs emhhddcc8r occur after HDC trades (GEM needed HDCC)
The biggest potential defect that I haven’t fixed is that HEM undervalues M compared with the community consensus. For HEM, DC < M < DD.
HEM still foresees that intransitivity should occur. I am now almost convinced that it is not a defect of HEM but, on the contrary, an improvement compared to other evaluators, although the foreseen cycles are dubious because M is undervalued.
@Rednaxela : I would be very interested to see the behaviour of HEM in your result prediction tests (I would also understand if you have no time to test all my ideas!)
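For readers who want to check an evaluator for the intransitivity mentioned above, here is a minimal Python sketch that lists three-way cycles in any pairwise "is preferred to" relation. The material sets X, Y, Z and the toy relation are placeholders, not real Arimaa armies or HEM output; an actual check would plug in the evaluator's own comparison.

```python
from itertools import permutations


def find_cycles(items, beats):
    """Return triples (a, b, c) where a beats b, b beats c and c beats a."""
    cycles = set()
    for a, b, c in permutations(items, 3):
        if beats(a, b) and beats(b, c) and beats(c, a):
            # The same cycle shows up in three rotations; keep one of them.
            cycles.add(min([(a, b, c), (b, c, a), (c, a, b)]))
    return sorted(cycles)


# Toy relation with one cycle: X > Y, Y > Z, Z > X (hypothetical labels).
toy = {("X", "Y"), ("Y", "Z"), ("Z", "X")}
cyclic = find_cycles(["X", "Y", "Z"], lambda a, b: (a, b) in toy)

# A transitive relation (X > Y > Z) contains no cycle.
linear = {("X", "Y"), ("Y", "Z"), ("X", "Z")}
acyclic = find_cycles(["X", "Y", "Z"], lambda a, b: (a, b) in linear)

print(cyclic, acyclic)
```

Finding a cycle only shows that the evaluator claims one; whether that cycle corresponds to real games is a separate question.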
|
|
|
|
|
Fritzlein
Forum Guru
    
 Arimaa player #706

Gender: 
Posts: 5928
|
 |
Re: Global Algebric Material Evaluator
« Reply #53 on: Sep 17th, 2010, 12:25pm » |
|
on Sep 17th, 2010, 7:50am, pago wrote:HEM still foresees that intransitivity should occur. I am now almost convinced that it is not a defect of HEM but on the contrary an improvement compared to other evaluators although foreseen cycles are dubious because of M undervalue. |
| Improvement? Just because material intransitivities exist in fact, doesn't mean that a system having intransitivities is an improvement over a system that doesn't have them. A system might claim the existence of intransitivities that don't correspond to reality while not detecting ones that do. The relevant question is whether HEM is right or wrong about its evaluations. Again, thanks for sharing your results. Thought experiments like yours keep advancing the state of the art. I wonder whether future Arimaa grandmasters will become convinced, at least partially due to the material formulas under discussion, that our current outlook overvalues material quality and undervalues material quantity in late-game situations.
|
|
|
|
|
pago
Forum Guru
    
 Arimaa player #5439

Gender: 
Posts: 69
|
 |
Re: Global Algebric Material Evaluator
« Reply #54 on: Sep 18th, 2010, 1:36am » |
|
Quote:Improvement? Just because material intransitivities exist in fact, doesn't mean that a system having intransitivities is an improvement over a system that doesn't have them. A system might claim the existence of intransitivities that don't correspond to reality while not detecting ones that do. The relevant question is whether HEM is right or wrong about its evaluations. |
| @Fritzlein : I agree with you. "The relevant question is whether HEM is right or wrong about its evaluations." ... and at this time HEM is not perfect! (For example, its evaluation of the relative value of M is probably wrong.) What I tried to say without subtlety (sorry for my bad English) is that it is not so easy to design a consistent evaluator that foresees intransitivity, and that this is an interesting property of HEM (assuming that intransitivity does exist!). Once again, I am aware that HEM needs to be improved, and I share the common opinion about the undervaluation of M. I hope that I am not boring you with a thread that was about one evaluator at the beginning and that has become a thread about an evaluator design process (once again... an unexpected property).
|
|
|
|
|
pago
Forum Guru
    
 Arimaa player #5439

Gender: 
Posts: 69
|
 |
Re: Global Algebric Material Evaluator
« Reply #55 on: Oct 1st, 2010, 11:03am » |
|
Quote:The relevant question is whether HEM is right or wrong about its evaluations. |
| I believe that I have succeeded in fixing the main weaknesses of the previous evaluators (GEM & HEM). I have called this updated evaluator HERD (Holistic Evaluator of Remaining Duel). I do hope that HERD will be competitive with the current best ones (FAME, DAPE, HarLog etc.) according to Rednaxela's criteria (my own tests give me good reasons for this hope).
Here are the links to the files:
http://sd-2.archive-host.com/membres/up/208912627824851423/HERD.pdf
http://sd-2.archive-host.com/membres/up/208912627824851423/HERD.zip
The main improvements of the HERD evaluator compared with the previous ones are:
1) Evaluation of the relative values of major pieces much closer to the community consensus (for example HD > M > HC)
2) Evaluation of endgames (Cat tournament, Dog tournament and DCR tournament) very close to jdb’s results (according to RMSE, MAE & MAPE error estimations).
3) Estimated relative value of a cat compared with a rabbit consistent with the consensus.
4) Estimated relative value of a dog compared with two rabbits consistent with the consensus.
5) Estimated relative value of a horse and three rabbits close to the consensus.
The main remaining potential defects or differences with the current community consensus are:
a) Evaluation of MD versus HH at first trade. HERD doesn’t comply with the consensus and evaluates that HH > MD at first trade (although it estimates that MD > HH after the trade of one dog). In the same way, it evaluates that DD > HC.
b) Evaluation of DCC versus HH. HERD evaluates that DCC > HH at first trade.
c) Relative value of camels compared with rabbits. HERD evaluates that 4R > M > 3R (I don’t know what the consensus would be).
The HERD calculation is based on the same formula as GEM and HEM, with two modifications:
1) Generalization of the piece hierarchy
2) Introduction of a goal bias different from the HEM goal balance
I have added a few paragraphs to the paper (in particular to compare HERD's behaviour with the current consensus at first trade). As usual, I am very interested in your comments or criticisms.
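The three error measures named in point 2 can be sketched in Python as follows. These are the standard textbook definitions of RMSE, MAE and MAPE; whether the HERD paper uses exactly these forms is an assumption, and the sample numbers are invented for illustration (they are not jdb's tournament results).

```python
import math


def error_metrics(predicted, reference):
    """Return (RMSE, MAE, MAPE in percent) for two equal-length sequences.

    MAPE divides by the reference values, so they must all be non-zero.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / r) for e, r in zip(errors, reference)) / n
    return rmse, mae, mape


# Hypothetical win rates: evaluator predictions versus reference results.
rmse, mae, mape = error_metrics([0.60, 0.45, 0.70], [0.50, 0.50, 0.80])
print(round(rmse, 4), round(mae, 4), round(mape, 2))
```

RMSE penalises large misses more heavily than MAE, while MAPE expresses the error relative to the size of the reference value, so the three figures can rank evaluators differently.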
|
|
|
|
|
pago
Forum Guru
    
 Arimaa player #5439

Gender: 
Posts: 69
|
 |
Re: Global Algebric Material Evaluator
« Reply #56 on: Oct 11th, 2010, 6:52am » |
|
I would have liked to get more reactions about HERD, even ones showing evidence of its bad behaviour (of course I would prefer the contrary!). Shall I conclude that my HERD behaves like gnus and cannot survive in the Arimaa jungle?
|
|
|
|
|
Rednaxela
Forum Senior Member
   
 Arimaa player #4674
Gender: 
Posts: 34
|
 |
Re: Global Algebric Material Evaluator
« Reply #57 on: Oct 11th, 2010, 12:35pm » |
|
Hey, sorry I haven't gotten around to responding myself, I've been a bit busy lately. To me it looks like HERD really is on the right track, at least to being competitive, though I haven't had a chance to do any tests on it or anything.
|
|
|
|
|
pago
Forum Guru
    
 Arimaa player #5439

Gender: 
Posts: 69
|
 |
Re: Global Algebric Material Evaluator
« Reply #58 on: Oct 11th, 2010, 2:12pm » |
|
Quote:Hey, sorry I haven't gotten around to responding myself, I've been a bit busy lately. To me it looks like HERD really is on the right track, at least to being competitive, though I haven't had a chance to do any tests on it or anything. |
| Hello Rednaxela, thank you for your reply. I am feeling less alone. I hope you will be less busy in the next few days, because I find that your test is an interesting measurement of evaluator behavior before implementation in bots. Unfortunately, I don't have the skills to run the database queries myself.
For your information, I intend to propose in the next few weeks a Positional Evaluator based on HERD. I have already tested it on the following matches, and it seems to give quite good winning predictions even when the material is equal (my criteria are the winning prediction in the 1st, 2nd and 3rd parts of the game, the whole game, and the 5 moves before the first exchange):
136191 : 2010 WC R8 / Tuks vs chessandgo
136706 : 2010 WC R9 / 99of9 vs Fritzlein
136807 : 2010 WC R9 / Adanac vs chessandgo
137490 : 2010 WC R10 / Adanac vs 99of9
137854 : 2010 WC R10 / chessandgo vs Fritzlein
138929 : 2010 WC R11 / Fritzlein vs chessandgo
140605 : 2010 AC R1 / Adanac vs bot_marwin
140750 : 2010 AC R1 / Arimabuff vs bot_marwin
141378 : 2010 AC R2 / Tuks vs bot_marwin
The originality of the evaluator is that it is totally blind: it doesn't search for blockades, hostages, trap threats, goal threats etc., but even so it seems to be quite a good predictor of the winning side. Of course, this evaluator would need to be completed by an efficient tree search and by a goal search to have a chance to be competitive in a bot. I intend to perform other tests before trying to share results on the forum.
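The "winning prediction by part of the game" criteria could be measured along the following lines. This is only a sketch under stated assumptions: the per-turn scores, the sign convention (positive means the evaluator favours Gold) and the equal-thirds split are all hypothetical choices, and the "5 moves before the first exchange" criterion is not covered.

```python
def prediction_rates(evals, gold_won):
    """Fraction of turns where the evaluation favours the eventual winner.

    evals: per-turn scores, positive = favours Gold (assumed convention).
    Returns [first third, second third, last third, whole game].
    A score of exactly 0 counts as a miss; an empty slice gives None.
    """
    def rate(chunk):
        if not chunk:
            return None
        hits = sum(1 for e in chunk if e != 0 and (e > 0) == gold_won)
        return hits / len(chunk)

    n = len(evals)
    thirds = [evals[: n // 3], evals[n // 3 : 2 * n // 3], evals[2 * n // 3 :]]
    return [rate(part) for part in thirds] + [rate(evals)]


# Invented scores for a six-turn game that Gold eventually won.
rates = prediction_rates([-0.2, 0.1, 0.3, 0.4, 0.5, 0.6], gold_won=True)
print(rates)
```

Averaging these rates over many games, rather than a handful, would be needed before comparing evaluators on them.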
|
|
|
|
|
Fritzlein
Forum Guru
    
 Arimaa player #706

Gender: 
Posts: 5928
|
 |
Re: Global Algebric Material Evaluator
« Reply #59 on: Oct 17th, 2010, 9:17pm » |
|
on Oct 11th, 2010, 6:52am, pago wrote:Shall I conclude that my HERD behaves like gnus and cannot survive in the Arimaa jungle? |
| In my undergraduate math department there was a Professor Mayer who was particularly good at refuting purported proofs, so that other professors came to him to check their ideas. They taught us various methods of proof, for example proof by induction and proof by contradiction, but the method that sticks out most in my mind was "proof by Mayer".
1. Submit a conjecture to Professor Mayer.
2. He will generate a counter-example showing your conjecture to be false.
3. Modify your conjecture to exclude Professor Mayer's counter-example.
4. Go to step 1.
If on any iteration Professor Mayer fails to produce a counter-example, you may publish your conjecture as having been proven true.
|
|
|
|
|
|