Arimaa Forum - Global Algebric Material Evaluator

Topic: Global Algebric Material Evaluator (Read 9788 times)

aaaa
Forum Guru

Arimaa player #958

Posts: 768

Re: Global Algebric Material Evaluator
« Reply #45 on: Sep 13^th, 2010, 1:02pm »

on Sep 13^th, 2010, 11:21am, Fritzlein wrote:

Of course one could arbitrarily draw a line for switching between evaluators, but it would be much more elegant to have a single formula.

It's not just about elegance.

Rednaxela
Forum Senior Member

Arimaa player #4674

Gender: male

Posts: 34

Re: Global Algebric Material Evaluator
« Reply #46 on: Sep 13^th, 2010, 8:14pm »

on Sep 13^th, 2010, 4:04am, pago wrote:

I am not so surprised by this.

As I tried to explain in a previous reply, GEM measures material balance as if the goal of Arimaa were to capture as many enemy pieces as possible (or, more precisely, as if the game had no goal at all).
That works at the beginning, but at the end it is more important to win the game than to capture the enemy elephant.

Ahh, I see. As a quick note, I ran some quick trials and found that unweighted "GEM + GAME" scores higher overall than either one alone, but worse than the better of the two in any given segment of the game. I tried weighting the two by the number of pieces and got slightly better performance still, but nothing that felt worth the inelegance of such a blend.

on Sep 13^th, 2010, 11:21am, Fritzlein wrote:

One thing that occurs to me is that you insisted your games end in goal. Doesn't that slightly bias things in favor of evaluators that like rabbits? In particular, someone who has an army consisting of lots of strong pieces and few rabbits might find it easier to win by immobilization than by goal. I don't see why you shouldn't include wins by immobilization and elimination in your methodology.

Well, I didn't think much about elimination, but to me it seemed that immobilization wins are rare and are caused by rather different circumstances and thus would be more of a noise source than anything.

Prompted by your question, though, I ran a test that includes the different game results:

2000+ score, no bots, goal ending only (590 games)
"quiet position" turns only
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 359, 58.496%, 57.382%, 56.546%, 58.217%, 57.939%, 58.774%, 57.939%, 57.939%
Phase1, 1356, 70.870%, 71.313%, 70.428%, 70.723%, 71.460%, 70.944%, 71.386%, 70.870%
Phase2, 2211, 85.346%, 84.080%, 85.889%, 85.391%, 85.798%, 85.075%, 86.251%, 85.301%
Total, 3926, 77.891%, 77.229%, 77.866%, 77.840%, 78.299%, 77.789%, 78.528%, 77.815%

2000+ score, no bots, goal AND elimination ending only (595 games)
"quiet position" turns only
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 366, 58.197%, 57.104%, 56.557%, 57.923%, 57.650%, 58.470%, 57.650%, 57.650%
Phase1, 1381, 70.891%, 71.615%, 70.746%, 70.818%, 71.687%, 71.035%, 71.687%, 71.108%
Phase2, 2242, 85.459%, 84.255%, 86.084%, 85.504%, 85.995%, 85.236%, 86.396%, 85.459%
Total, 3989, 77.914%, 77.388%, 78.065%, 77.889%, 78.441%, 77.864%, 78.666%, 77.939%

2000+ score, no bots, goal/elimination/immobilization endings (607 games)
"quiet position" turns only
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 377, 58.355%, 57.294%, 56.764%, 58.090%, 57.825%, 58.621%, 57.825%, 57.825%
Phase1, 1407, 71.073%, 71.784%, 70.860%, 71.144%, 71.855%, 71.357%, 71.855%, 71.429%
Phase2, 2309, 85.881%, 84.712%, 86.488%, 85.925%, 86.401%, 85.665%, 86.791%, 85.881%
Total, 4093, 78.256%, 77.742%, 78.378%, 78.280%, 78.769%, 78.256%, 78.989%, 78.329%

The change seems essentially uniform across all evaluators and game phases, so at the very least these endings, being so rare, don't really change the overall picture.
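
One quick consistency check on these tables: each Total row should be the count-weighted average of the three phase rows. The following is a minimal Python sketch of that check on the first table (goal endings only). The names `PHASES`, `weighted_totals` and `best_per_phase` are illustrative and are not part of any script mentioned in this thread.

```python
# Rows copied from the first table above (goal endings only, 590 games).
EVALUATORS = ["Marwin", "GEM", "GAME", "FAME", "FAMEeo", "DAPE", "DAPEeo", "HarLog"]
PHASES = {
    "Phase0": (359, [58.496, 57.382, 56.546, 58.217, 57.939, 58.774, 57.939, 57.939]),
    "Phase1": (1356, [70.870, 71.313, 70.428, 70.723, 71.460, 70.944, 71.386, 70.870]),
    "Phase2": (2211, [85.346, 84.080, 85.889, 85.391, 85.798, 85.075, 86.251, 85.301]),
}
TOTAL = (3926, [77.891, 77.229, 77.866, 77.840, 78.299, 77.789, 78.528, 77.815])


def weighted_totals(phases):
    """Count-weighted average of the per-phase percentages, one per evaluator."""
    n = sum(count for count, _ in phases.values())
    return [
        sum(count * scores[i] for count, scores in phases.values()) / n
        for i in range(len(EVALUATORS))
    ]


def best_per_phase(phases):
    """Name of the top-scoring evaluator in each phase (first one wins ties)."""
    return {
        phase: EVALUATORS[scores.index(max(scores))]
        for phase, (_, scores) in phases.items()
    }


if __name__ == "__main__":
    for name, total in zip(EVALUATORS, weighted_totals(PHASES)):
        print(f"{name:8s} {total:.3f}%")
    print(best_per_phase(PHASES))
```

With the figures above, the recomputed totals agree with the Total row to within rounding, and the top scorers per phase are DAPE (Phase0), FAMEeo (Phase1) and DAPEeo (Phase2).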

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: Global Algebric Material Evaluator
« Reply #47 on: Sep 13^th, 2010, 8:38pm »

on Sep 13^th, 2010, 8:14pm, Rednaxela wrote:

Immobilization is a source of wins, not noise! Or do you tell your opponent, when you lose by immobilization, that he didn't really beat you? ;)

Quote:

The change seems essentially uniform across all evaluators and game phases, so at the very least these endings, being so rare, don't really change the overall picture.

It makes sense that the impact would be small due to the rarity of non-goal results, but I was curious nonetheless. Thanks for re-running the numbers.

Rednaxela
Forum Senior Member

Arimaa player #4674

Gender: male

Posts: 34

Re: Global Algebric Material Evaluator
« Reply #48 on: Sep 13^th, 2010, 8:58pm »

on Sep 13^th, 2010, 8:38pm, Fritzlein wrote:

Immobilization is a source of wins, not noise! Or do you tell your opponent, when you lose by immobilization, that he didn't really beat you? ;)

Hahaha, nah. What I mean by it being "noise" is that I felt that mixing it with goal wins would be too much of an "apples and oranges" comparison. I'm starting to change my mind though. (rolls eyes)

« Last Edit: Sep 13^th, 2010, 8:58pm by Rednaxela »

pago
Forum Guru

Arimaa player #5439

Gender:

Posts: 69

Re: Global Algebric Material Evaluator
« Reply #49 on: Sep 14^th, 2010, 2:10pm »

Quote:

Maybe we could introduce a "goal balance" into the equation, to take the goal of the game into account and bias the evaluator a little in favour of rabbits.

This goal balance could be something like:
Balance(G;s;goal) = N6/(N6+n6)

Introduced into the GEM equation, it simplifies as follows:
(N6+n6)*Balance(G;s;goal) = N6
GEM = (Sum(...)+N6)/(Sum(Ni+ni)+N6+n6)

I have not tried this idea yet, and maybe it doesn't work at all.
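
As an illustration, the goal balance and the modified GEM ratio could be sketched in Python as below. This assumes N6 and n6 are the rabbit counts of the two sides. `material_num` and `material_den` are hypothetical stand-ins for the two sum terms of the GEM formula, which this post does not spell out, so the caller has to supply them.

```python
def goal_balance(own_rabbits, enemy_rabbits):
    """Balance(G;s;goal) = N6 / (N6 + n6): one side's share of all rabbits on the board."""
    total = own_rabbits + enemy_rabbits
    if total == 0:
        raise ValueError("no rabbits left on either side")
    return own_rabbits / total


def gem_with_goal(material_num, material_den, own_rabbits, enemy_rabbits):
    """GEM with the goal term added to numerator and denominator.

    material_num and material_den are placeholders (assumptions) for the
    elided numerator sum and for Sum(Ni+ni); they are not computed here.
    """
    # (N6 + n6) * Balance reduces to N6, so only the rabbit counts are needed.
    return (material_num + own_rabbits) / (material_den + own_rabbits + enemy_rabbits)
```

Because (N6+n6)*Balance reduces to N6, the goal term carries a weight equal to the total number of rabbits on the board, which is 16 at the start of a game.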

pago
Forum Guru

Arimaa player #5439

Gender:

Posts: 69

Re: Global Algebric Material Evaluator
« Reply #50 on: Sep 15^th, 2010, 4:20am »

Quote:

This goal balance could be something like:
Balance(G;s;goal) = N6/(N6+n6)

Introduced into the GEM equation, it simplifies as follows:
(N6+n6)*Balance(G;s;goal) = N6
GEM = (Sum(...)+N6)/(Sum(Ni+ni)+N6+n6)

I have not tried this idea yet, and maybe it doesn't work at all.

Result: it doesn't work as it is (rabbits are favoured too much).

The idea seems to work with one small change.
Instead of treating each rabbit as a potential goal, I consider that there are only two goals (one for each side). That is even more consistent with the fact that the game ends as soon as one side reaches its goal.

The equations become:
Balance(G;s;goal) = N6/(N6+n6)

HEM = (Sigma(...)+2*Balance(G;s;goal))/(Sigma(Ni+ni)+2)
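
A minimal Python sketch of this two-goal variant, assuming N6 and n6 are the rabbit counts of the two sides. `material_num` and `material_den` are hypothetical placeholders for the Sigma(...) and Sigma(Ni+ni) terms, which are not spelled out here, and `goal_term_share` is an illustrative helper rather than part of the formula.

```python
def goal_balance(own_rabbits, enemy_rabbits):
    """Balance(G;s;goal) = N6 / (N6 + n6)."""
    total = own_rabbits + enemy_rabbits
    if total == 0:
        raise ValueError("no rabbits left on either side")
    return own_rabbits / total


def hem(material_num, material_den, own_rabbits, enemy_rabbits):
    """HEM = (Sigma(...) + 2*Balance) / (Sigma(Ni+ni) + 2).

    material_num and material_den are placeholders (assumptions) for the
    two Sigma terms; they must be supplied by the caller.
    """
    balance = goal_balance(own_rabbits, enemy_rabbits)
    return (material_num + 2 * balance) / (material_den + 2)


def goal_term_share(material_den, goal_weight):
    """Fraction of the denominator contributed by a goal term of the given weight."""
    return goal_weight / (material_den + goal_weight)
```

The helper `goal_term_share` shows why the change matters: when every rabbit counts as a potential goal the term weighs N6+n6 (16 at the start of a game), whereas here it always weighs 2, so rabbits influence the result far less.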

I am running my tests in Excel (!). I'll post the results when they are finished.

aaaa
Forum Guru

Arimaa player #958

Posts: 768

Re: Global Algebric Material Evaluator
« Reply #51 on: Sep 16^th, 2010, 5:56pm »

I just ran some tests myself, and I'm afraid I have to conclude that using game data to evaluate or derive material evaluation functions would (still) be of dubious merit, as it seems to lead to a lopsided preference for quantity of pieces over quality that is well outside mainstream opinion.

pago
Forum Guru

Arimaa player #5439

Gender:

Posts: 69

Re: Global Algebric Material Evaluator
« Reply #52 on: Sep 17^th, 2010, 7:50am »

Quote:

Instead of treating each rabbit as a potential goal, I consider that there are only two goals (one for each side). That is even more consistent with the fact that the game ends as soon as one side reaches its goal.

The equations become:
Balance(G;s;goal) = N6/(N6+n6)

HEM = (Sigma(...)+2*Balance(G;s;goal))/(Sigma(Ni+ni)+2)

I have rewritten my paper about the evaluator for the third (and probably last) time.

This time I have incorporated the idea from my previous reply, which takes the goal of Arimaa into account.

The PDF file is available at this link:
http://sd-2.archive-host.com/membres/up/208912627824851423/HEM.pdf

The Excel calculation file is available at this link:
http://sd-2.archive-host.com/membres/up/208912627824851423/HEM.xls

I called this evaluator HEM (Holistic Evaluator of Material).
(I am better at finding names than at building efficient evaluators!)

The main modifications to the paper are:
- A goal balance has been incorporated
- A paragraph comparing first trades has been added
- A paragraph about the complete dog + cat tournament (72 combinations) has been added
- A paragraph about intransitivity has been added
- The paragraph about matrix calculation has been removed
- Some typos have been corrected

I didn't copy all the tournament results into the appendix. They are available in the Excel file.

Compared to GEM, the improvements are:
- The relative value of rabbits increases when the rabbit counts are unbalanced (the relative values of the major pieces have not changed). This should keep GEM's advantage early in the game and GAME's advantage later on.
- The dog tournament results are more consistent with jdb's results
- Starting from EMHHDDCC4R vs emhhddcc8r, the advantage switches sides after the HDC trades (GEM needed HDCC)
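
Material strings such as EMHHDDCC4R are easy to handle mechanically. The sketch below, in Python, parses that notation into piece counts and applies an even trade to both sides. It only does the bookkeeping and does not implement GEM or HEM; the function names are illustrative.

```python
import re
from collections import Counter


def parse_material(text):
    """Parse a string such as 'EMHHDDCC4R' into per-piece counts.

    A digit run before a letter is a repeat count, so '4R' means four rabbits.
    Case is ignored, so 'emhhddcc8r' is parsed the same way.
    Characters other than E, M, H, D, C, R and digits are skipped.
    """
    counts = Counter()
    for number, piece in re.findall(r"(\d*)([EMHDCR])", text.upper()):
        counts[piece] += int(number) if number else 1
    return counts


def trade(gold, silver, pieces):
    """Return both armies after each side loses one of every piece in `pieces`."""
    lost = Counter(pieces.upper())
    for army in (gold, silver):
        if any(army[p] < n for p, n in lost.items()):
            raise ValueError("army does not have the pieces to trade")
    return gold - lost, silver - lost


if __name__ == "__main__":
    gold = parse_material("EMHHDDCC4R")
    silver = parse_material("emhhddcc8r")
    print(trade(gold, silver, "HDC"))
```

After the HDC trades, gold is left with one each of E, M, H, D, C plus 4 rabbits, and silver with the same major pieces plus 8 rabbits. Those two armies are what an evaluator would then be asked to compare.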

The biggest potential defect that I haven’t fixed is that HEM undervalues M compared with the community consensus. For HEM, DC < M < DD.

HEM still predicts that intransitivity should occur. I am now almost convinced that this is not a defect of HEM but, on the contrary, an improvement over other evaluators, although the predicted cycles are dubious because M is undervalued.

@Rednaxela: I would be very interested to see how HEM behaves in your result prediction tests (I would also understand if you don't have time to test all my ideas!)

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: Global Algebric Material Evaluator
« Reply #53 on: Sep 17^th, 2010, 12:25pm »

on Sep 17^th, 2010, 7:50am, pago wrote:

HEM still predicts that intransitivity should occur. I am now almost convinced that this is not a defect of HEM but, on the contrary, an improvement over other evaluators, although the predicted cycles are dubious because M is undervalued.

Improvement? Just because material intransitivities exist in fact, doesn't mean that a system having intransitivities is an improvement over a system that doesn't have them. A system might claim the existence of intransitivities that don't correspond to reality while not detecting ones that do. The relevant question is whether HEM is right or wrong about its evaluations.
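
One way to make this question testable is to list every cycle an evaluator claims and then check each one against game results. A minimal Python sketch of the listing step follows. `prefers` is a hypothetical callback standing in for whatever evaluator is under test, and the rock-paper-scissors relation in the demo is only a toy, not Arimaa data.

```python
from itertools import permutations


def find_cycles(armies, prefers):
    """Return all triples (a, b, c) with a > b, b > c and c > a under `prefers`.

    `prefers(x, y)` must return True when the evaluator rates x ahead of y.
    Each cycle is reported once, starting from its smallest member.
    """
    cycles = []
    for a, b, c in permutations(armies, 3):
        # Skip rotations of the same cycle by requiring a to be the smallest.
        if a == min(a, b, c) and prefers(a, b) and prefers(b, c) and prefers(c, a):
            cycles.append((a, b, c))
    return cycles


if __name__ == "__main__":
    # Toy intransitive relation used purely to exercise the function.
    beats = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}
    print(find_cycles(["rock", "paper", "scissors"], lambda x, y: (x, y) in beats))
```

A purely transitive evaluator yields an empty list. For one that does report cycles, each triple is a concrete claim that can be compared with observed results.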

Again, thanks for sharing your results. Thought experiments like yours keep advancing the state of the art. I wonder whether future Arimaa grandmasters will become convinced, at least partially due to the material formulas under discussion, that our current outlook overvalues material quality and undervalues material quantity in late-game situations.

pago
Forum Guru

Arimaa player #5439

Gender:

Posts: 69

Re: Global Algebric Material Evaluator
« Reply #54 on: Sep 18^th, 2010, 1:36am »

Quote:

@Fritzlein:
I agree with you.
"The relevant question is whether HEM is right or wrong about its evaluations."
... and at this time HEM is not perfect! (For example, its evaluation of the relative value of M is probably wrong.)

What I tried to say, without much subtlety (sorry for my bad English), is that it is not so easy to design a consistent evaluator that predicts intransitivity, and that this is an interesting property of HEM (assuming that intransitivity does exist!).

Once again, I am aware that HEM needs to be improved, and I share the common opinion that it undervalues M.

I hope I am not boring you with a thread that began as a thread about one evaluator and has become a thread about the process of designing an evaluator (once again, an unexpected property).

pago
Forum Guru

Arimaa player #5439

Gender:

Posts: 69

Re: Global Algebric Material Evaluator
« Reply #55 on: Oct 1^st, 2010, 11:03am »

Quote:

The relevant question is whether HEM is right or wrong about its evaluations.

I believe I have succeeded in fixing the main weaknesses of the previous evaluators (GEM & HEM).

I have called this updated evaluator HERD (Holistic Evaluator of Remaining Duel).

I do hope that HERD will be competitive with the current best ones (FAME, DAPE, HarLog, etc.) according to Rednaxela's criteria (my own tests give me good reason to hope so).

Here are the links to the files :
http://sd-2.archive-host.com/membres/up/208912627824851423/HERD.pdf
http://sd-2.archive-host.com/membres/up/208912627824851423/HERD.zip

The main improvements of the HERD evaluator over the previous ones are:
1) The relative values of the major pieces are much closer to the community consensus (for example HD > M > HC).
2) Its evaluation of endgames (cat tournament, dog tournament and DCR tournament) is very close to jdb's results (according to the RMSE, MAE and MAPE error estimates).
3) The estimated value of a cat relative to a rabbit is consistent with the consensus.
4) The estimated value of a dog relative to two rabbits is consistent with the consensus.
5) The estimated value of a horse relative to three rabbits is close to the consensus.
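
For reference, the three error measures named in point 2 have standard definitions, sketched here in Python. The argument names are illustrative: `predicted` would hold an evaluator's outputs and `reference` the values they are compared against, in the same order. This is not the calculation from the HERD spreadsheet, only the textbook formulas.

```python
import math


def mae(predicted, reference):
    """Mean absolute error."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def rmse(predicted, reference):
    """Root mean squared error; penalises large misses more than MAE does."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))


def mape(predicted, reference):
    """Mean absolute percentage error, in percent; reference values must be non-zero."""
    ratios = sum(abs((p - r) / r) for p, r in zip(predicted, reference))
    return 100 * ratios / len(reference)
```

Reporting all three is a reasonable choice because they disagree in useful ways: RMSE highlights a few large errors, MAE gives the typical miss, and MAPE scales each miss by the size of the reference value.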

The main remaining potential defects, or differences from the current community consensus, are:
a) Evaluation of MD versus HH at the first trade. HERD does not agree with the consensus: it evaluates HH > MD at the first trade (although it evaluates MD > HH after one dog has been traded). In the same way, it evaluates DD > HC.
b) Evaluation of DCC versus HH. HERD evaluates DCC > HH at the first trade.
c) Relative value of a camel compared with rabbits. HERD evaluates 4R > M > 3R (I don't know what the consensus would be).
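
Comparisons such as HD > M > HC or 4R > M > 3R can be tabulated for any evaluator with a small harness. The Python sketch below assumes an evaluator can be wrapped as a function that scores a list of piece letters. `TOY_WEIGHTS` holds arbitrary placeholder numbers for demonstration only and is not taken from HERD or any other published evaluator; HERD evaluates the whole material balance, so a per-piece table like this would not reproduce it.

```python
import re


def expand(army):
    """Expand 'DCC' or '4R' into a list of piece letters, e.g. ['R', 'R', 'R', 'R']."""
    pieces = []
    for number, piece in re.findall(r"(\d*)([EMHDCR])", army.upper()):
        pieces.extend(piece * (int(number) if number else 1))
    return pieces


def compare(lhs, rhs, value):
    """Return '>', '<' or '=' for lhs against rhs under the scoring function `value`."""
    a, b = value(expand(lhs)), value(expand(rhs))
    return ">" if a > b else "<" if a < b else "="


# Arbitrary placeholder weights (an assumption for the demo), NOT from any real evaluator.
TOY_WEIGHTS = {"E": 16, "M": 11, "H": 7, "D": 5, "C": 3, "R": 2}


def toy_value(pieces):
    """Additive score under the placeholder weights."""
    return sum(TOY_WEIGHTS[p] for p in pieces)


if __name__ == "__main__":
    for lhs, rhs in [("HD", "M"), ("M", "HC"), ("MD", "HH"), ("DCC", "HH"), ("4R", "M")]:
        print(lhs, compare(lhs, rhs, toy_value), rhs)
```

Swapping `toy_value` for a wrapper around a real evaluator would give a compact table of where it agrees or disagrees with the consensus orderings listed above.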

The HERD calculation is based on the same formula as GEM and HEM, with two modifications:
1) Generalization of the piece hierarchy
2) Introduction of a goal bias that differs from HEM's goal balance

I have added a few paragraphs to the paper (in particular to compare HERD's behaviour with the current consensus at the first trade).

As usual, I am very interested in your comments and criticism.

pago
Forum Guru

Arimaa player #5439

Gender:

Posts: 69

Re: Global Algebric Material Evaluator
« Reply #56 on: Oct 11^th, 2010, 6:52am »

I would have liked more reactions to HERD, even ones showing evidence of its bad behaviour (of course I would prefer the contrary!) :-X

Shall I conclude that my HERD behaves like gnus and cannot survive in the Arimaa jungle?

Rednaxela
Forum Senior Member

Arimaa player #4674

Gender: male

Posts: 34

Re: Global Algebric Material Evaluator
« Reply #57 on: Oct 11^th, 2010, 12:35pm »

Hey, sorry I haven't gotten around to responding myself, I've been a bit busy lately.

To me it looks like HERD really is on the right track, at least toward being competitive, though I haven't had a chance to run any tests on it yet.

pago
Forum Guru

Arimaa player #5439

Gender:

Posts: 69

Re: Global Algebric Material Evaluator
« Reply #58 on: Oct 11^th, 2010, 2:12pm »

Quote:

Hello Rednaxela,

Thank you for your reply. I am feeling less alone.

I hope you will be less busy in the coming days, because I find your test an interesting measure of evaluator behaviour before implementation in bots.
Unfortunately, I do not have the skills to run the database queries myself.

For your information, I intend to propose, in the next few weeks, a positional evaluator based on HERD.

I have already tested it on the following matches, and it seems to predict the winner quite well even when material is equal (my criteria are the winning prediction in the first, second and third parts of the game, over the whole game, and over the 5 moves before the first exchange):

136191 : 2010 WC R8 / Tuks vs chessandgo
136706 : 2010 WC R9 / 99of9 vs Fritzlein
136807 : 2010 WC R9 / Adanac vs chessandgo
137490 : 2010 WC R10 / Adanac vs 99of9
137854 : 2010 WC R10 / chessandgo vs Fritzlein
138929 : 2010 WC R11 / Fritzlein vs chessandgo
140605 : 2010 AC R1 / Adanac vs bot_marwin
140750 : 2010 AC R1 / Arimabuff vs bot_marwin
141378 : 2010 AC R2 / Tuks vs bot_marwin

The originality of this evaluator is that it is totally blind: it doesn't search for blockades, hostages, trap threats, goal threats, etc., but even so it seems to be quite a good predictor of the winning side.

Of course, this evaluator would need to be complemented by an efficient tree search and a goal search to have a chance of being competitive in a bot.

I intend to perform other tests before trying to share results on the forum.

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: Global Algebric Material Evaluator
« Reply #59 on: Oct 17^th, 2010, 9:17pm »

on Oct 11^th, 2010, 6:52am, pago wrote:

Shall I conclude that my HERD behaves like gnus and cannot survive in the Arimaa jungle?

In my undergraduate math department there was a Professor Mayer who was particularly good at refuting purported proofs, so that other professors came to him for checking their ideas. They taught us various methods of proof, for example proof by induction and proof by contradiction, but the method that sticks out most in my mind was "proof by Mayer".

1. Submit a conjecture to Professor Mayer.
2. He will generate a counter-example showing your conjecture to be false.
3. Modify your conjecture to exclude Professor Mayer's counter-example
4. Go to step 1.

If on any iteration Professor Mayer fails to produce a counter-example, you may publish your conjecture as having been proven true. ;)
