Arimaa Forum « Global Algebric Material Evaluator »


(Moderator: supersamu)
   Author  Topic: Global Algebric Material Evaluator  (Read 9122 times)
aaaa
Forum Guru
*****



Arimaa player #958

   


Posts: 768
Re: Global Algebric Material Evaluator
« Reply #45 on: Sep 13th, 2010, 1:02pm »

on Sep 13th, 2010, 11:21am, Fritzlein wrote:
Of course one could arbitrarily draw a line for switching between evaluators, but it would be much more elegant to have a single formula.

It's not just about elegance.
IP Logged
Rednaxela
Forum Senior Member
****



Arimaa player #4674

   


Gender: male
Posts: 34
Re: Global Algebric Material Evaluator
« Reply #46 on: Sep 13th, 2010, 8:14pm »

on Sep 13th, 2010, 4:04am, pago wrote:

I am not so surprised by this.
 
As I tried to explain in a previous reply, GEM measures the material balance as if the goal of Arimaa were to capture as many opposing pieces as possible (or, more precisely, as if the game had no goal at all).
That works well at the beginning, but at the end it is more important to win the game than to capture the opposing elephant.

Ahh, I see. As a quick note, I ran some quick trials and found that unweighted "GEM + GAME" scores higher than either one overall, but worse than the better of the two in any given segment of the game. I tried some weighting based on the number of pieces and got slightly better performance still, but nothing that felt worth the inelegance of such melding to me.
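
The post does not say how the piece-count weighting was done. As a minimal sketch only, assuming a simple linear weight (the function name `blend_scores`, its parameters, and the linear scheme are all hypothetical, not Rednaxela's actual method), a blend that leans on GEM while the board is full and on GAME as pieces come off could look like this:

```python
def blend_scores(gem_score: float, game_score: float,
                 pieces_on_board: int, max_pieces: int = 32) -> float:
    """Blend two evaluator scores by how many pieces remain.

    Assumption: a linear weight. With all 32 pieces on the board the
    result is pure GEM; with none left it is pure GAME.
    """
    if not 0 <= pieces_on_board <= max_pieces:
        raise ValueError("pieces_on_board must be between 0 and max_pieces")
    gem_weight = pieces_on_board / max_pieces
    return gem_weight * gem_score + (1.0 - gem_weight) * game_score
```

Any monotonic weight would fit the same idea; the linear one is just the simplest to write down.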
 
 
on Sep 13th, 2010, 11:21am, Fritzlein wrote:

One thing that occurs to me is that you insisted your games end in goal.  Doesn't that slightly bias things in favor of evaluators that like rabbits?  In particular, someone who has an army consisting of lots of strong pieces and few rabbits might find it easier to win by immobilization than by goal.  I don't see why you shouldn't include wins by immobilization and elimination in your methodology.

Well, I didn't think much about elimination, but to me it seemed that immobilization wins are rare and are caused by rather different circumstances and thus would be more of a noise source than anything.
 
Prompted by you asking this though I did a test of including the different game results:
 
2000+ score, no bots, goal ending only (590 games)
"quiet position" turns only
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 359, 58.496%, 57.382%, 56.546%, 58.217%, 57.939%, 58.774%, 57.939%, 57.939%
Phase1, 1356, 70.870%, 71.313%, 70.428%, 70.723%, 71.460%, 70.944%, 71.386%, 70.870%
Phase2, 2211, 85.346%, 84.080%, 85.889%, 85.391%, 85.798%, 85.075%, 86.251%, 85.301%
Total, 3926, 77.891%, 77.229%, 77.866%, 77.840%, 78.299%, 77.789%, 78.528%, 77.815%
 
2000+ score, no bots, goal AND elimination ending only (595 games)
"quiet position" turns only
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 366, 58.197%, 57.104%, 56.557%, 57.923%, 57.650%, 58.470%, 57.650%, 57.650%
Phase1, 1381, 70.891%, 71.615%, 70.746%, 70.818%, 71.687%, 71.035%, 71.687%, 71.108%
Phase2, 2242, 85.459%, 84.255%, 86.084%, 85.504%, 85.995%, 85.236%, 86.396%, 85.459%
Total, 3989, 77.914%, 77.388%, 78.065%, 77.889%, 78.441%, 77.864%, 78.666%, 77.939%
 
2000+ score, no bots, goal/elimination/immobilization endings (607 games)
"quiet position" turns only
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 377, 58.355%, 57.294%, 56.764%, 58.090%, 57.825%, 58.621%, 57.825%, 57.825%
Phase1, 1407, 71.073%, 71.784%, 70.860%, 71.144%, 71.855%, 71.357%, 71.855%, 71.429%
Phase2, 2309, 85.881%, 84.712%, 86.488%, 85.925%, 86.401%, 85.665%, 86.791%, 85.881%
Total, 4093, 78.256%, 77.742%, 78.378%, 78.280%, 78.769%, 78.256%, 78.989%, 78.329%
 
 
The change across all evaluators and game segments seems essentially uniform, so at the very least, including these rare results doesn't really change the overall picture.
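
For readers who want to reproduce percentages like those in the tables above, here is a rough sketch of a per-phase prediction score. The record layout, the function name `accuracy_by_phase`, and the treatment of a score of exactly zero as a miss are assumptions for illustration; the thread does not describe the actual queries.

```python
from collections import defaultdict

def accuracy_by_phase(records):
    """Percentage of positions per phase where the evaluator picked the winner.

    records: iterable of (phase, score, gold_won) tuples, where a positive
    score means the evaluator favours Gold (assumed convention).
    A score of exactly 0 is counted as a miss (assumption).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for phase, score, gold_won in records:
        totals[phase] += 1
        predicted_gold = score > 0
        if score != 0 and predicted_gold == gold_won:
            hits[phase] += 1
    return {phase: 100.0 * hits[phase] / totals[phase] for phase in totals}
```

Each table row then corresponds to one phase key, and the "Count" column to the number of records in that phase.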
 
IP Logged
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Global Algebric Material Evaluator
« Reply #47 on: Sep 13th, 2010, 8:38pm »

on Sep 13th, 2010, 8:14pm, Rednaxela wrote:
Well, I didn't think much about elimination, but to me it seemed that immobilization wins are rare and are caused by rather different circumstances and thus would be more of a noise source than anything.

Immobilization is a source of wins, not noise!  Or do you tell your opponent, when you lose by immobilization, that he didn't really beat you?  Wink
 
Quote:
The change across all evaluators and game segments seems essentially uniform, so at the very least, including these rare results doesn't really change the overall picture.

It makes sense that the impact would be small due to the rarity of non-goal results, but I was curious nonetheless.  Thanks for re-running the numbers.  Smiley
IP Logged

Rednaxela
Forum Senior Member
****



Arimaa player #4674

   


Gender: male
Posts: 34
Re: Global Algebric Material Evaluator
« Reply #48 on: Sep 13th, 2010, 8:58pm »

on Sep 13th, 2010, 8:38pm, Fritzlein wrote:

Immobilization is a source of wins, not noise!  Or do you tell your opponent, when you lose by immobilization, that he didn't really beat you?  Wink

Hahaha, nah. What I mean by it being "noise" is that I felt that mixing it with goal wins would be too much of an "apples and oranges" comparison. I'm starting to change my mind though.  Roll Eyes
« Last Edit: Sep 13th, 2010, 8:58pm by Rednaxela » IP Logged
pago
Forum Guru
*****



Arimaa player #5439

   
Email

Gender: male
Posts: 69
Re: Global Algebric Material Evaluator
« Reply #49 on: Sep 14th, 2010, 2:10pm »

Quote:
Ahh, I see. As a quick note, I ran some quick trials and found that unweighted "GEM + GAME" scores higher than either one overall, but worse than the better of the two in any given segment of the game. I tried some weighting based on the number of pieces and got slightly better performance still, but nothing that felt worth the inelegance of such melding to me.

 
Maybe we could try to introduce a "goal balance" into the equation, to take the goal of the game into account and to bias the evaluator a little in favour of rabbits.

This goal balance could be something like:
Balance(G;s;goal) = N6/(N6+n6)

Introduced into the GEM equation, it would simplify as follows:
(N6+n6)*Balance(G;s;goal) = N6
GEM = (Sum(...)+N6)/(Sum(Ni+ni)+N6+n6)
 
I have not tried this idea yet and maybe it doesn't work at all.
IP Logged
pago
Forum Guru
*****



Arimaa player #5439

   
Email

Gender: male
Posts: 69
Re: Global Algebric Material Evaluator
« Reply #50 on: Sep 15th, 2010, 4:20am »

Quote:
This goal balance could be something as :  
Balance(G;s;goal) = N6/(N6+n6)  
 
So when we would introduce it in GEM equation it would be simplified :  
(N6+n6)*Balance(G;s;goal) = N6  
GEM = (sum(...)+N6)/(Sum(Ni+ni)+N6+n6)  
 
I have not tried this idea yet and maybe it doesn't work at all.

 
Result: it doesn't work as it is... (rabbits are favoured too much).

The idea does seem to work with a small change.
Instead of treating each rabbit as a potential goal, I consider that there are only two goals (one for each side). That is even more consistent with the fact that the game ends as soon as one side reaches its goal.

The equations would become:
Balance(G;s;goal) = N6/(N6+n6)

HEM = (Sigma(...)+2*Balance(G;s;goal))/(Sigma(Ni+ni)+2)
 
I am performing my tests with Excel (!). I'll post the results when they are finished
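
The two goal-balance variants in this thread differ only in how much weight the rabbit ratio gets: N6+n6 in the first attempt, a constant 2 in HEM. A small sketch makes that explicit. The pairwise sum written as Sigma(...) is elided in the posts, so it is passed in as a plain number here; the function and parameter names are hypothetical.

```python
def goal_balance(gold_rabbits: int, silver_rabbits: int) -> float:
    """Balance(G;s;goal) = N6/(N6+n6)."""
    total = gold_rabbits + silver_rabbits
    if total == 0:
        raise ValueError("at least one rabbit must remain")
    return gold_rabbits / total

def with_goal_term(pair_sum: float, piece_sum: float,
                   gold_rabbits: int, silver_rabbits: int,
                   goal_weight: float) -> float:
    """(pair_sum + w*Balance) / (piece_sum + w).

    pair_sum stands in for the elided Sigma(...) term and piece_sum for
    Sigma(Ni+ni); both are inputs, not computed here.
    """
    balance = goal_balance(gold_rabbits, silver_rabbits)
    return (pair_sum + goal_weight * balance) / (piece_sum + goal_weight)

def first_attempt(pair_sum, piece_sum, gold_rabbits, silver_rabbits):
    # Weight = number of rabbits on the board (reported as favouring rabbits too much).
    weight = gold_rabbits + silver_rabbits
    return with_goal_term(pair_sum, piece_sum, gold_rabbits, silver_rabbits, weight)

def hem(pair_sum, piece_sum, gold_rabbits, silver_rabbits):
    # Weight = 2, one goal per side.
    return with_goal_term(pair_sum, piece_sum, gold_rabbits, silver_rabbits, 2)
```

With placeholder inputs pair_sum=5.0 and piece_sum=10 and rabbits at 8 versus 4, the first attempt moves the result much further from 0.5 than the HEM variant does, which matches the "rabbits are too favoured" observation.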
IP Logged
aaaa
Forum Guru
*****



Arimaa player #958

   


Posts: 768
Re: Global Algebric Material Evaluator
« Reply #51 on: Sep 16th, 2010, 5:56pm »

I just did some tests myself, and I'm afraid I have to conclude that using game data to evaluate or derive material evaluation functions would (still) be of dubious merit, as it seems to lead to a lopsided preference for quantity of pieces over quality that's well outside mainstream opinion.
IP Logged
pago
Forum Guru
*****



Arimaa player #5439

   
Email

Gender: male
Posts: 69
Re: Global Algebric Material Evaluator
« Reply #52 on: Sep 17th, 2010, 7:50am »

Quote:
I just did some tests myself, and I'm afraid I have to conclude that using game data to evaluate or derive material evaluation functions would (still) be of dubious merit, as it seems to lead to a lopsided preference for quantity of pieces over quality that's well outside mainstream opinion.

 
Quote:
Instead of taking the number of rabbits as potential goals I consider that there are only two goals (one for each side). That is even more consistent with the fact that the match ends when one side has reached his goal.  
 
The equations would become :  
Balance(G;s;goal) = N6/(N6+n6)  
   
HEM = (Sigma(...)+2*balance(G;s;goal))/(Sigma(Ni+ni)+2)

 
I have rewritten my paper about the evaluator for the third (and probably last) time.

This time I have incorporated the idea I posted in a previous reply, to take the goal of Arimaa into account.
 
The pdf file is available under this link :  
http://sd-2.archive-host.com/membres/up/208912627824851423/HEM.pdf
 
The Excel calculation file is available under this link :
http://sd-2.archive-host.com/membres/up/208912627824851423/HEM.xls
 
I called this evaluator HEM (Holistic Evaluator of Material).
(I am better at finding names than at building efficient evaluators!)
 
 
The main modifications of the paper are :
- Incorporation of a goal balance
- A paragraph about first trade comparison added
- A paragraph about Dog + cat complete tournament (72 combinations) added
- A paragraph about intransitivity added
- The paragraph about matrix calculation removed
- Correction of some typos.
 
I didn’t copy all the tournament results in the appendix. They are available in the Excel file.
 
Compared to GEM, the improvements are:
- Increased relative value of rabbits when the rabbit counts are unbalanced (the relative values of the major pieces have not changed). This should keep GEM's advantage in the first round and GAME's advantages in the following rounds.
- Dog tournament results are more consistent with jdb’s results
- Starting from EMHHDDCC4R vs emhhddcc8r, the advantage switches after HDC trades (GEM needed HDCC)
 
The biggest potential defect that I haven’t fixed is that HEM undervalues M compared with the community consensus. For HEM, DC < M < DD.
 
HEM still predicts that intransitivity should occur. I am now almost convinced that this is not a defect of HEM but, on the contrary, an improvement over other evaluators, although the predicted cycles are dubious because M is undervalued.
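
To make "intransitivity" concrete: an evaluator is intransitive if it rates army A ahead of B, B ahead of C, and C ahead of A. The sketch below lists such three-cycles for any pairwise predicate. The name `find_three_cycles` and the predicate `beats` are hypothetical; `beats(x, y)` is assumed to return True when the evaluator rates x ahead in the matchup x versus y, and nothing here is taken from the HEM paper.

```python
def find_three_cycles(items, beats):
    """Return triples (a, b, c) with a beats b, b beats c and c beats a.

    Each cycle is reported once, starting from its earliest item in
    `items`, so rotations of the same cycle are not repeated.
    """
    cycles = []
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(i + 1, n):
                if j == k:
                    continue
                a, b, c = items[i], items[j], items[k]
                if beats(a, b) and beats(b, c) and beats(c, a):
                    cycles.append((a, b, c))
    return cycles
```

A list of cycles found this way only says what the evaluator believes; whether those cycles exist in real play is a separate question that needs game results.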
 
 
@Rednaxela: I would be very interested to see how HEM behaves in your result-prediction tests (I would also understand if you have no time to test all my ideas!)
IP Logged
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Global Algebric Material Evaluator
« Reply #53 on: Sep 17th, 2010, 12:25pm »

on Sep 17th, 2010, 7:50am, pago wrote:
HEM still predicts that intransitivity should occur. I am now almost convinced that this is not a defect of HEM but, on the contrary, an improvement over other evaluators, although the predicted cycles are dubious because M is undervalued.

Improvement?  Just because material intransitivities exist in fact, doesn't mean that a system having intransitivities is an improvement over a system that doesn't have them.  A system might claim the existence of intransitivities that don't correspond to reality while not detecting ones that do.  The relevant question is whether HEM is right or wrong about its evaluations.
 
Again, thanks for sharing your results.  Thought experiments like yours keep advancing the state of the art.  I wonder whether future Arimaa grandmasters will become convinced, at least partially due to the material formulas under discussion, that our current outlook overvalues material quality and undervalues material quantity in late-game situations.
IP Logged

pago
Forum Guru
*****



Arimaa player #5439

   
Email

Gender: male
Posts: 69
Re: Global Algebric Material Evaluator
« Reply #54 on: Sep 18th, 2010, 1:36am »

Quote:
Improvement?  Just because material intransitivities exist in fact, doesn't mean that a system having intransitivities is an improvement over a system that doesn't have them.  A system might claim the existence of intransitivities that don't correspond to reality while not detecting ones that do.  The relevant question is whether HEM is right or wrong about its evaluations.

 
@Fritzlein :
I agree with you.
"The relevant question is whether HEM is right or wrong about its evaluations."
... and at this time HEM is not perfect! (For example, its evaluation of the relative value of M is probably wrong.)

What I tried to say, without subtlety (sorry for my bad English), is that it is not so easy to design a consistent evaluator that predicts intransitivity, and that this is an interesting property of HEM (assuming that intransitivity does exist!).

Once again, I am aware that HEM needs to be improved, and I share the common opinion that it undervalues M.

I hope that I am not boring you with a thread that began as a thread about one evaluator and has become a thread about the process of designing an evaluator (once again... an unexpected property).
IP Logged
pago
Forum Guru
*****



Arimaa player #5439

   
Email

Gender: male
Posts: 69
Re: Global Algebric Material Evaluator
« Reply #55 on: Oct 1st, 2010, 11:03am »

Quote:
The relevant question is whether HEM is right or wrong about its evaluations.

 
I believe that I have succeeded in fixing the main weaknesses of the previous evaluators (GEM & HEM).

I have called this updated evaluator HERD (Holistic Evaluator of Remaining Duel).

I do hope that HERD will be competitive with the current best ones (FAME, DAPE, HarLog, etc.) according to Rednaxela's criteria (my own tests give me good reasons to hope so).
 
Here are the links to the files :
http://sd-2.archive-host.com/membres/up/208912627824851423/HERD.pdf
http://sd-2.archive-host.com/membres/up/208912627824851423/HERD.zip
 
 
The main improvements of the HERD evaluator over the previous ones are:
1) Relative values of the major pieces much closer to the community consensus (for example HD > M > HC)
2) Evaluation of endgames (Cat tournament, Dog tournament and DCR tournament) very close to jdb’s results (according to the RMSE, MAE and MAPE error measures).
3) Estimated relative value of a cat compared with a rabbit consistent with the consensus.
4) Estimated relative value of a dog compared with two rabbits consistent with the consensus.
5) Estimated relative value of a horse compared with three rabbits close to the consensus.
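
For reference, the three error measures named in point 2 have standard definitions, sketched below. How they were applied to the tournament results is described only in the paper, so the input layout here (two equal-length lists of predicted and reference values) is an assumption.

```python
import math

def _check(predicted, actual):
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")

def rmse(predicted, actual):
    """Root mean squared error."""
    _check(predicted, actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mae(predicted, actual):
    """Mean absolute error."""
    _check(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mape(predicted, actual):
    """Mean absolute percentage error; reference values must be non-zero."""
    _check(predicted, actual)
    return 100.0 * sum(abs((a - p) / a) for p, a in zip(predicted, actual)) / len(actual)
```

RMSE penalises a few large misses more heavily than MAE does, while MAPE expresses the error relative to the size of each reference value.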
 
 
The main remaining potential defects, or differences from the current community consensus, are:
a) Evaluation of MD versus HH at first trade. HERD does not comply with the consensus: it evaluates HH > MD at first trade (although it estimates MD > HH after the trade of one dog). In the same way, it evaluates DD > HC.
b) Evaluation of DCC versus HH. HERD evaluates DCC > HH at first trade.
c) Relative value of camels compared with rabbits. HERD evaluates 4R > M > 3R (I don’t know what the consensus would be).
 
 
HERD is calculated with the same formula as GEM and HEM, with two modifications:
1) Generalization of piece hierarchy
2) Introduction of a goal bias different from HEM goal balance
 
 
I have added a few paragraphs to the paper (in particular to compare HERD's behaviour with the current consensus at first trade).

As usual, I am very interested in your comments or criticism.
IP Logged
pago
Forum Guru
*****



Arimaa player #5439

   
Email

Gender: male
Posts: 69
Re: Global Algebric Material Evaluator
« Reply #56 on: Oct 11th, 2010, 6:52am »


I would have liked to get more reactions to HERD, even ones showing evidence of its bad behaviour (of course I would prefer the contrary!)  Lips Sealed

Shall I conclude that my HERD behaves like gnus and cannot survive in the Arimaa jungle?
IP Logged
Rednaxela
Forum Senior Member
****



Arimaa player #4674

   


Gender: male
Posts: 34
Re: Global Algebric Material Evaluator
« Reply #57 on: Oct 11th, 2010, 12:35pm »

Hey, sorry I haven't gotten around to responding myself, I've been a bit busy lately.
 
To me it looks like HERD really is on the right track, at least to being competitive, though I haven't had a chance to do any tests on it or anything.
IP Logged
pago
Forum Guru
*****



Arimaa player #5439

   
Email

Gender: male
Posts: 69
Re: Global Algebric Material Evaluator
« Reply #58 on: Oct 11th, 2010, 2:12pm »

Quote:
Hey, sorry I haven't gotten around to responding myself, I've been a bit busy lately.  
 
To me it looks like HERD really is on the right track, at least to being competitive, though I haven't had a chance to do any tests on it or anything.

 
Hello Rednaxela,
 
Thank you for your reply. I am feeling less alone  Smiley
 
I hope you will be less busy in the coming days, because I find your test an interesting measure of evaluator behaviour before implementation in bots.
Unfortunately I do not have the skills to run the database queries myself.

For your information, I intend to propose a Positional Evaluator based on HERD in the next few weeks.

I have already tested it on the following matches, and it seems to predict the winner quite well even when the material is equal (my criteria are the winning prediction in the 1st, 2nd and 3rd parts of the game, over the whole game, and in the 5 moves before the first exchange):
 
136191 : 2010 WC R8 / Tuks vs chessandgo
136706 : 2010 WC R9 / 99of9 vs Fritzlein
136807 : 2010 WC R9 / Adanac vs chessandgo
137490 : 2010 WC R10 / Adanac vs 99of9
137854 : 2010 WC R10 / chessandgo vs Fritzlein
138929 : 2010 WC R11 / Fritzlein vs chessandgo
140605 : 2010 AC R1 / Adanac vs bot_marwin
140750 : 2010 AC R1 / Arimabuff vs bot_marwin
141378 : 2010 AC R2 / Tuks vs bot_marwin
 
The originality of the evaluator is that it is totally blind: it doesn't search for blockades, hostages, trap threats, goal threats, etc., but even so it seems to be quite a good predictor of the winning side.

Of course, this evaluator would need to be complemented by an efficient tree search and a goal search to have a chance to be competitive in a bot.
 
I intend to perform other tests before trying to share results on the forum.
IP Logged
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Global Algebric Material Evaluator
« Reply #59 on: Oct 17th, 2010, 9:17pm »

on Oct 11th, 2010, 6:52am, pago wrote:
Shall I conclude that my HERD behaves like gnus and cannot survive in the Arimaa jungle?

In my undergraduate math department there was a Professor Mayer who was particularly good at refuting purported proofs, so that other professors came to him for checking their ideas.  They taught us various methods of proof, for example proof by induction and proof by contradiction, but the method that sticks out most in my mind was "proof by Mayer".
 
1. Submit a conjecture to Professor Mayer.
2. He will generate a counter-example showing your conjecture to be false.
3. Modify your conjecture to exclude Professor Mayer's counter-example
4. Go to step 1.
 
If on any iteration Professor Mayer fails to produce a counter-example, you may publish your conjecture as having been proven true.  Wink
IP Logged
