Arimaa Forum » Bot Development » Global Algebric Material Evaluator
(Moderator: supersamu)
Topic: Global Algebric Material Evaluator (Read 9112 times)
pago (Arimaa player #5439)
« Reply #30 on: Sep 9th, 2010, 2:12pm »


Quote:
Here is an earlier thread addressing the possibility of material intransitivity.

 
Interesting thread.
So GEM gives some positive evidence for the possibility of material intransitivity, although I think that the cycles GEM has revealed are not convincing.
 
Quote:
Whether this evaluation function turns out to be better than others or not, I am very impressed with the clarity of your paper, the emergent behavior your evaluation demonstrates, and how the small set of rules succeed in getting a result as accurate as it is.  

 
Thank you for this friendly reply.
 
Now, physicists say that an elegant theory is not a sufficient criterion for recognizing a good theory.
Similarly, GEM has to be tested in different situations. I tried to perform some of these tests and compare them to jdb's results (thank you, jdb, for your nice work, which is a reference for me), but Excel has some limitations!
 
Rednaxela's comparisons between evaluators are a much more valuable test.
 
I would also be highly interested if someone would try to implement it in a bot. I believe that is the best test for an evaluator, and I would be curious to see how GEM would perform.
 
Quote:
The results from the reduced material tournament I played do not agree with this.  
 
Nobody is perfect!
My last (but not credible) hope is that the use of clueless by jdb induces a kind of bias (for example, if the positional factor parameters are not accurate, clueless might use some combinations of pieces in a way that is not efficient).
I am aware that it is a dubious hope.

 
I now have an explanation that seems more credible to me.
 
GEM evaluates a material balance, but jdb tests the actual results of matches between setups.
However, a material evaluator is not a complete result predictor, although there is a strong correlation.
 
In particular, GEM estimates the power balance between pieces without taking into account that the goal of Arimaa is to capture the last rabbit or to reach the last rank with a rabbit.
GEM evaluates material as if the goal were simply to capture the greater number of pieces, of whatever kind.
In general this has little impact, except when only one or two rabbits remain, as in the cycles given in the paper.
 
I expect that other examples of intransitivity could exist with a greater number of rabbits. In that case (if I am right) they would be much more convincing.
Fritzlein (Arimaa player #706)
« Reply #31 on: Sep 10th, 2010, 1:01pm »

on Sep 6th, 2010, 6:00pm, Rednaxela wrote:
As a test I decided to do a couple queries where.... instead of checking whether the evaluators approve of a move by the winning player, checking whether the evaluators approve of the move of a particular player. Just so you know Fritzlein... it turns out you appear to have a slight bias to make moves that FAME and FAMEeo approve of, whereas 99of9 has a slight bias to make moves that DAPE and DAPEeo approve of. Thus... I think you're quite right that this method measures what players like to do.

Ah, nice, that was a very interesting test to run.  I'm surprised (although I shouldn't be) that DAPE predicts what 99of9 actually likes to do better than FAME does.  I guess that when he invented DAPE he was just putting his mouth where his money is. Smiley

tize (Arimaa player #3121)
« Reply #32 on: Sep 12th, 2010, 4:55am »

@Rednaxela: May I ask you to include a static material evaluator in your tests? We all know that the static ones are inferior, but it would be nice to see by how much in the tests you have been running. At least it would be nice for me, as Marwin is using a static one. I can give you Marwin's values if you'd like to run it.
Rednaxela (Arimaa player #4674)
« Reply #33 on: Sep 12th, 2010, 1:27pm »

on Sep 12th, 2010, 4:55am, tize wrote:
@Rednaxela: May I ask you to include a static material evaluator in your tests? We all know that the static ones are inferior, but it would be nice to see by how much in the tests you have been running. At least it would be nice for me, as Marwin is using a static one. I can give you Marwin's values if you'd like to run it.

Sure, I'm curious about this also. I'd be particularly interested in testing with Marwin's values.
 
I'll also try out GEM at the same time.
tize (Arimaa player #3121)
« Reply #34 on: Sep 12th, 2010, 2:32pm »

OK then, here are Marwin's material values:
 
const int MaterialRabbit1  =  1150;
const int MaterialRabbit2  =  1300;
const int MaterialRabbit3  =  1600;
const int MaterialRabbit4  =  2150;
const int MaterialRabbit5  =  2700;
const int MaterialRabbit6  =  3450;
const int MaterialRabbit7  =  4200;
const int MaterialRabbit8  =  7600;
 
const int MaterialCat      =  2000;
const int MaterialDog      =  2500;
const int MaterialHorse    =  3500;
const int MaterialCamel    =  5800;
const int MaterialElephant = 10000;
 
Everything is just summed up, once for gold and once for silver, and then the difference of those values is calculated...
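 
Below is a minimal sketch of how these constants could feed the sum-and-difference step just described. It is not Marwin's actual code: the SideCounts struct and function names are hypothetical, and the rabbit indexing (reading MaterialRabbit8 as the value of a side's last remaining rabbit) is inferred from tize's later remark that the last rabbit's value cancels because both sides always have one.
 
// Sketch only; assumes the constants above plus hypothetical
// per-side piece counts.
struct SideCounts {
    int rabbits, cats, dogs, horses, camels, elephants;
};

const int MaterialRabbit[8] = {1150, 1300, 1600, 2150, 2700,
                               3450, 4200, 7600};

// With n rabbits left, a side scores the last n entries of the table,
// so MaterialRabbit[7] (the 7600 for the last remaining rabbit)
// appears in both sums and cancels in the difference.
int sideMaterial(const SideCounts& s) {
    int v = 0;
    for (int i = 0; i < s.rabbits; ++i)
        v += MaterialRabbit[7 - i];
    v += s.cats * MaterialCat + s.dogs * MaterialDog +
         s.horses * MaterialHorse + s.camels * MaterialCamel +
         s.elephants * MaterialElephant;
    return v;
}

// Positive means gold is ahead, negative means silver.
int staticMaterialEval(const SideCounts& gold, const SideCounts& silver) {
    return sideMaterial(gold) - sideMaterial(silver);
}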
Fritzlein (Arimaa player #706)
« Reply #35 on: Sep 12th, 2010, 3:41pm »

on Sep 12th, 2010, 2:32pm, tize wrote:

const int MaterialRabbit8  =  7600;

Would it have any effect to make the value of the eighth rabbit higher or lower?  I can imagine that it doesn't matter at all, given that loss by elimination is handled separately, but if it does matter in any way, why not make this value much higher?  Is a moderate value a guard against playing under a rule set that allows draws?

Isaac Grosof (Longtime Arimaa Fan)
« Reply #36 on: Sep 12th, 2010, 9:33pm »

on Sep 12th, 2010, 2:32pm, tize wrote:

const int MaterialElephant = 10000;

 
So low? I would set the value of an elephant far higher. On the other hand, I suppose the only time it would matter is in the end game, and then it is a reasonable value. Does it cause any problems?
Rednaxela (Arimaa player #4674)
« Reply #37 on: Sep 12th, 2010, 11:36pm »

Here are the latest results.  
 


 
The "How often does the eventual winner have the advantage according to the evaluator" tests:
(aka "Accuracy of the guess of who wins")
 
1700+ rating, bots excluded (1708 games)
Counting all turns:
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 5957, 68.189%, 67.030%, 66.577%, 67.937%, 67.836%, 67.786%, 67.870%, 67.819%
Phase1, 35239, 75.740%, 74.962%, 75.090%, 75.431%, 76.793%, 75.487%, 76.784%, 75.731%
Phase2, 49145, 85.921%, 84.999%, 86.145%, 85.496%, 86.526%, 85.630%, 86.662%, 85.622%
Total, 90341, 80.781%, 79.899%, 80.543%, 80.412%, 81.497%, 80.497%, 81.570%, 80.590%
 
1700+ rating, bots excluded (1708 games)
Only counting "quiet position" turns where 1) a capture occurs, and 2) the next turn is not a capture:
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 1014, 57.495%, 56.903%, 56.805%, 57.396%, 57.692%, 57.396%, 57.791%, 57.298%
Phase1, 3823, 71.410%, 71.305%, 71.122%, 71.384%, 72.142%, 71.384%, 72.142%, 71.436%
Phase2, 6476, 86.056%, 85.392%, 86.658%, 85.747%, 86.736%, 85.917%, 86.844%, 85.825%
Total, 11313, 78.547%, 78.078%, 78.732%, 78.352%, 79.201%, 78.450%, 79.272%, 78.405%
 
2000+ rating, bots excluded (590 games)
Counting all turns:
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 2158, 70.575%, 68.443%, 67.933%, 70.065%, 69.648%, 70.204%, 69.648%, 69.741%
Phase1, 12526, 75.267%, 74.557%, 73.551%, 74.900%, 75.675%, 75.627%, 75.571%, 75.100%
Phase2, 17416, 85.450%, 84.101%, 86.001%, 85.083%, 86.007%, 84.899%, 86.386%, 84.773%
Total, 32100, 80.477%, 79.324%, 79.928%, 80.100%, 80.875%, 80.293%, 81.040%, 79.988%
 
2000+ rating, bots excluded (590 games)
Only counting "quiet position" turns where 1) a capture occurs, and 2) the next turn is not a capture:
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 359, 58.496%, 57.382%, 56.546%, 58.217%, 57.939%, 58.774%, 57.939%, 57.939%
Phase1, 1356, 70.870%, 71.313%, 70.428%, 70.723%, 71.460%, 70.944%, 71.386%, 70.870%
Phase2, 2211, 85.346%, 84.080%, 85.889%, 85.391%, 85.798%, 85.075%, 86.251%, 85.301%
Total, 3926, 77.891%, 77.229%, 77.866%, 77.840%, 78.299%, 77.789%, 78.528%, 77.815%
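 
For concreteness, here is a minimal sketch of how a tally like the tables above could be produced. The GameRecord fields and the sign convention (positive score meaning the evaluator favors gold) are assumptions; the post reports only the resulting percentages.
 
#include <vector>

struct Position {
    int phase;     // game phase 0, 1, or 2, as in the tables
    double score;  // evaluator output; assumed positive = gold ahead
};

struct GameRecord {
    bool goldWon;
    std::vector<Position> positions;  // the turns being counted
};

// Per-phase percentage of positions where the evaluator's sign agrees
// with the eventual winner ("accuracy of the guess of who wins").
// Ties (score == 0) count against the evaluator here.
void winnerPredictionAccuracy(const std::vector<GameRecord>& games,
                              double pct[3]) {
    long long hit[3] = {0, 0, 0}, total[3] = {0, 0, 0};
    for (const GameRecord& g : games)
        for (const Position& p : g.positions) {
            ++total[p.phase];
            if ((p.score > 0) == g.goldWon) ++hit[p.phase];
        }
    for (int ph = 0; ph < 3; ++ph)
        pct[ph] = total[ph] ? 100.0 * hit[ph] / total[ph] : 0.0;
}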
 


 
The "How often the evaluator approves of the winning player's trades/captures" tests:
(aka "How the winning player likes to play")
 
1700+ rating, bots excluded (1708 games)
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 1014, 57.002%, 58.087%, 57.298%, 57.890%, 57.988%, 57.890%, 57.988%, 57.692%
Phase1, 3823, 63.955%, 68.193%, 66.100%, 68.166%, 68.585%, 68.062%, 67.800%, 67.852%
Phase2, 6476, 73.116%, 75.664%, 73.564%, 76.004%, 76.683%, 76.081%, 75.293%, 75.540%
Total, 11313, 68.576%, 71.564%, 69.584%, 71.732%, 72.271%, 71.740%, 71.210%, 71.343%
 
2000+ rating, bots excluded (590 games)
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 359, 54.039%, 55.153%, 54.318%, 55.989%, 55.989%, 56.546%, 55.710%, 55.153%
Phase1, 1356, 62.611%, 67.109%, 64.381%, 66.888%, 67.330%, 66.962%, 66.593%, 66.740%
Phase2, 2211, 69.878%, 72.999%, 70.782%, 72.999%, 73.813%, 73.089%, 72.546%, 72.365%
Total, 3926, 65.920%, 69.333%, 67.066%, 69.333%, 69.944%, 69.460%, 68.951%, 68.849%
 


I find the following things interesting:
 
1) Marwin's static material evaluation scores well at predicting the eventual winner, yet comes out relatively dissimilar to the trades/captures winning players get into. Curious...
 
2) Compared to GAME, GEM comes out much more similar to how players act, yet doesn't show a comparable improvement in predicting the eventual winner.
 
3) For predicting the eventual winner, GEM improves upon GAME in the early game, but is worse in the late game.
 
For a finer-grained analysis, I'm now thinking about doing best-fit plots of "material evaluator score" versus "win/loss". With those best-fit plots, calculating the y-axis error (NOT least-squares style) should give a good, even-handed way to take the magnitude of the score into account, as opposed to the current boolean "predicted correctly or not".
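 
One way to read that proposal, as a rough sketch: fit a curve from evaluator score to win probability, then report the mean absolute vertical (y-axis) error of the outcomes against the fitted curve. The post does not specify the curve or the fitting recipe, so the logistic shape and the coordinate-descent fit below are assumptions.
 
#include <cmath>
#include <initializer_list>
#include <vector>

struct Sample {
    double score;  // evaluator output (gold's perspective) at a quiet position
    int win;       // 1 if gold eventually won, else 0
};

// Mean absolute y-axis distance between outcomes and the fitted curve
// p(win) = 1 / (1 + exp(-(a*score + b))); deliberately NOT least squares.
double meanAbsError(const std::vector<Sample>& d, double a, double b) {
    double sum = 0.0;
    for (const Sample& s : d)
        sum += std::fabs(s.win - 1.0 / (1.0 + std::exp(-(a * s.score + b))));
    return sum / d.size();
}

// Crude coordinate descent on (a, b), minimizing the absolute error
// directly. Scores in the thousands would want rescaling first.
double fitAndScore(const std::vector<Sample>& d) {
    double a = 0.0, b = 0.0, best = meanAbsError(d, a, b);
    for (double step = 1.0; step > 1e-4; step /= 2) {
        for (bool improved = true; improved; ) {
            improved = false;
            for (double* p : {&a, &b})
                for (double dir : {step, -step}) {
                    *p += dir;
                    double e = meanAbsError(d, a, b);
                    if (e < best) { best = e; improved = true; }
                    else          { *p -= dir; }
                }
        }
    }
    return best;  // lower = score magnitude tracks outcomes better
}
 
A lower error for one evaluator than another would mean its magnitudes, and not just its signs, line up better with the observed results.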
 
I'm also open to other suggestions of how to analyze the results if anyone has any ideas.
« Last Edit: Sep 12th, 2010, 11:50pm by Rednaxela »
tize (Arimaa player #3121)
« Reply #38 on: Sep 13th, 2010, 12:04am »

on Sep 12th, 2010, 3:41pm, Fritzlein wrote:

Would it have any effect to make the value of the eighth rabbit higher or lower?  I can imagine that it doesn't matter at all, given that loss by elimination is handled separately, but if it does matter in any way, why not make this value much higher?  Is a moderate value a guard against playing under a rule set that allows draws?

 
If we just look at this as a material evaluator, then no, the value of the last rabbit has no effect at all, because both sides always have their last rabbit left whenever the evaluation is called for a position. But the values are used for a lot of things in Marwin, e.g. to evaluate capture threats, hostages, and frames. So changing that value will change how he plays when one side has only one rabbit left.
 
I know that I had that value at least a little higher about a year ago. I don't remember exactly why I changed it but I can imagine that it was because he gave away pieces for rabbit threats.
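 
Purely as an illustration (the thread says only that Marwin reuses the material constants for capture threats, hostages, and frames, not how): one common pattern is to credit a fraction of the threatened piece's value, which is enough to show why the last-rabbit constant shapes play once a side is down to one rabbit.
 
// Hypothetical, not Marwin's code. threatenedPieceValue would come
// from the constants above; for a rabbit it depends on how many
// rabbits its side has left, so the 7600 last-rabbit value matters
// near elimination.
int captureThreatBonus(int threatenedPieceValue, bool defendable) {
    // A defendable threat is worth less than a forced capture;
    // the 1/4 and 3/4 weights are made up for illustration.
    return threatenedPieceValue * (defendable ? 1 : 3) / 4;
}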
 
on Sep 12th, 2010, 9:33pm, 722caasi wrote:

 
So low? I would set the value of an elephant far higher. On the other hand, I suppose the only time it would matter is in the end game, and then it is a reasonable value. Does it cause any problems?

 
10000 is enough to make him careful with his own elephant and, if the opportunity arises, take the opponent's elephant.
 
It would be very rare for him to see that he could trap material more valuable than the elephant, but where those trappings were only available if he couldn't save his own elephant. Most likely he could then trap something smaller instead and pass up the complete trade.
tize (Arimaa player #3121)
« Reply #39 on: Sep 13th, 2010, 12:17am »

on Sep 12th, 2010, 11:36pm, Rednaxela wrote:
1) Marwin's static material evaluation scores well in predicting the eventual winner, yet scores as relatively dissimilar to the trades/captures winning players get into. Curious...

 
Predicting the winner is probably less important than telling the bot which trades to go after...
 
But honestly, I didn't think that the static evaluator would score this well in predicting the winner. This tells me that the static evaluation should eventually be replaced, but the reward for doing it isn't very high.
Rednaxela (Arimaa player #4674)
« Reply #40 on: Sep 13th, 2010, 12:35am »

on Sep 13th, 2010, 12:17am, tize wrote:

Predicting the winner is probably less important than telling the bot which trades to go after...

Agreed, but note that my second set of tests *doesn't* necessarily reflect the best trades to go after, because it essentially just mirrors the current practices of human players, rather than directly measuring what works.
 
Currently, I'm hoping the procedure I described before, using the y-axis error of a best-fit plot, will provide something more meaningful than boolean winner prediction, while avoiding the biases of what current human players do. It is still win-centric rather than trade-centric, but:
1) Unlike the boolean win prediction, it will care about the magnitude of the evaluator output, rather than just the sign.
2) I'm starting to think that the trade-centric approach will always inherently follow current habits of players.
99of9 (Gnobby's creator, player #314)
« Reply #41 on: Sep 13th, 2010, 2:01am »

I'm glad to read all the analysis in this thread, thank you.  I'm also glad that DAPE is still competitive.
 
on Sep 10th, 2010, 1:01pm, Fritzlein wrote:

Ah, nice, that was a very interesting test to run.  I'm surprised (although I shouldn't be) that DAPE predicts what 99of9 actually likes to do better than FAME does.  I guess that when he invented DAPE he was just putting his mouth where his money is. Smiley

 
This is very curious.  If only the world championship were decided according to material evaluation formulas!  ... then chessandgo wouldn't even get to participate. Smiley
pago (Arimaa player #5439)
« Reply #42 on: Sep 13th, 2010, 4:04am »


Thank you, Rednaxela, for these interesting results.
 
Quote:
3) For predicting the eventual winner, GEM improves upon GAME in the early game, but is worse in the late game.

 
I am not so surprised by this.
 
As I tried to explain in a previous reply, GEM measures a material balance as if the goal of Arimaa were to capture the maximum quantity of adverse pieces (or maybe, more precisely, as if there were no goal condition in the game of Arimaa).
That is good at the beginning, but at the end it is more important to win the game than to catch the adverse elephant.
Fritzlein (Arimaa player #706)
« Reply #43 on: Sep 13th, 2010, 11:21am »

on Sep 12th, 2010, 11:36pm, Rednaxela wrote:
Here are the latest results.  
[...]Only counting "quiet position" turns where 1) a capture occurs, and 2) the next turn is not a capture:

I like this methodology much better than your previous one, because it counts each material state only once.
Before, if a material state persisted for twenty ply, each evaluator got to predict the winner twenty times, and each was considered right twenty times or wrong twenty times on a single throw of the dice.  Intuitively, that way of doing things adds noise and makes the results less reliable, whereas your new way of taking only quiet positions counts each imbalance once, removing that noise.
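 
As a minimal sketch of that filter (the Turn record and its capture flag are hypothetical stand-ins for whatever the game archive provides), the selection Rednaxela describes is just:
 
#include <vector>

struct Turn {
    bool capture;  // did a capture occur on this turn?
};

// Keep exactly the turns where a capture occurs and the next turn is
// not a capture, so each settled material imbalance is sampled once.
std::vector<size_t> quietPositionIndices(const std::vector<Turn>& turns) {
    std::vector<size_t> out;
    for (size_t i = 0; i + 1 < turns.size(); ++i)
        if (turns[i].capture && !turns[i + 1].capture)
            out.push_back(i);
    return out;
}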
 
 
Quote:
2000+ rating, bots excluded (590 games)
Only counting "quiet position" turns where 1) a capture occurs, and 2) the next turn is not a capture:
Game Phase, Count, Marwin, GEM, GAME, FAME, FAMEeo, DAPE, DAPEeo, HarLog
Phase0, 359, 58.496%, 57.382%, 56.546%, 58.217%, 57.939%, 58.774%, 57.939%, 57.939%
Phase1, 1356, 70.870%, 71.313%, 70.428%, 70.723%, 71.460%, 70.944%, 71.386%, 70.870%
Phase2, 2211, 85.346%, 84.080%, 85.889%, 85.391%, 85.798%, 85.075%, 86.251%, 85.301%
Total, 3926, 77.891%, 77.229%, 77.866%, 77.840%, 78.299%, 77.789%, 78.528%, 77.815%

 
The winners now are:
Phase 0: DAPE, Marwin, FAME
Phase 1: FAMEeo, DAPEeo, GEM
Phase 2: DAPEeo, GAME, FAMEeo
 
In other words, it's a big mess, with no evaluator clearly best according to your metric.  We somehow need a single formula that evaluates trades like strong humans do in the opening, but values smaller pieces and numerical superiority more later in the game.  Of course one could arbitrarily draw a line for switching between evaluators, but it would be much more elegant to have a single formula.
 
One thing that occurs to me is that you insisted your games end in goal.  Doesn't that slightly bias things in favor of evaluators that like rabbits?  In particular, someone who has an army consisting of lots of strong pieces and few rabbits might find it easier to win by immobilization than by goal.  I don't see why you shouldn't include wins by immobilization and elimination in your methodology.

Fritzlein (Arimaa player #706)
« Reply #44 on: Sep 13th, 2010, 11:32am »

on Sep 13th, 2010, 12:04am, tize wrote:
If we just look at this as a material evaluator, then no, the value of the last rabbit has no effect at all, because both sides always have their last rabbit left whenever the evaluation is called for a position. But the values are used for a lot of things in Marwin, e.g. to evaluate capture threats, hostages, and frames. So changing that value will change how he plays when one side has only one rabbit left.
 
I know that I had that value at least a little higher about a year ago. I don't remember exactly why I changed it but I can imagine that it was because he gave away pieces for rabbit threats.

Ah, that makes sense.  Thanks for explaining.
 
on Sep 13th, 2010, 12:17am, tize wrote:
This tells me that the static evaluation should eventually be replaced, but the reward for doing it isn't very high.

This puts you in agreement with David Fotland.  It is telling that both you and he were able to code championship bots without dynamic material evaluation.  Obviously it isn't as big a deal as we think.  Fotland opined that even when a bot was getting the overall material balance wrong, it was still getting most of the trades right, i.e. it would still know that a horse is worth more than a dog.  The static evaluation doesn't fail until it comes to trading one strong piece for two smaller ones, which isn't all that common, and anyway we humans are confused about it.
 
The two of you have jointly convinced me that there are more important things to work on for bots, e.g. strategic understanding, and I expect the same holds true for humans as well.