Author |
Topic: (no) absolute score values for pieces? (Read 41015 times) |
|
Janzert
Forum Guru
Arimaa player #247
Posts: 1016
|
|
Re: (no) absolute score values for pieces?
« Reply #75 on: Jun 8th, 2009, 8:20pm » |
|
Added to my page; I think I've got it correct. aaaa, let me know if you see it producing any errors.
A few samples (all scores normalized to an initial rabbit):
E vs mhd = 0.16
M vs hcr = 0.74
H vs dc = 0.41
D vs rr = even
C vs rr = 0.73
Janzert
P.S. As long as 99 doesn't mind, I really should reintegrate all of 99's work adding the different systems back into my page.
|
« Last Edit: Jun 8th, 2009, 8:21pm by Janzert » |
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #76 on: Jun 9th, 2009, 5:30am » |
|
Thanks, Janzert. One case where the aaaa evaluation impresses me is the camel for horse trade. What is thrilling is not the exact number (1.90) by which the camel is worth more than the horse, but rather that the "even" trade of a pair of horses knocks down the advantage to 0.95. This is strikingly close to my intuition that, after an initial M for H trade, an additional horse swap is worth about a rabbit. Kudos to bot_quad for knowing which side will benefit from this "even" trade, and by how much.
|
|
|
|
|
aaaa
Forum Guru
Arimaa player #958
Posts: 768
|
|
Re: (no) absolute score values for pieces?
« Reply #78 on: Jun 15th, 2009, 10:49am » |
|
If one considers the formula distinguishable enough to merit its own name, I was thinking of calling it "HarLog", a reference to the harmonic and logarithmic components of the system. Of course, that would be a blend rather than an acronym.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #79 on: Jun 15th, 2009, 12:54pm » |
|
A catchy name was all that HarLog was missing to take its place as the premier material evaluation function. Perhaps you could even call it HarmLog for the benefit of the word play (i.e. a journal of the damage suffered by each side). Apparently FAME is now obsolete. Yes, there are a few corner cases where my intuition agrees more with FAME than with HarLog (for example an initial trade of E for DCCRRRRRRR), but the main meat-and-potatoes exchanges that happen all the time seem to be handled a little bit better across the board by HarLog.
|
« Last Edit: Jun 15th, 2009, 12:55pm by Fritzlein » |
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #80 on: Jun 15th, 2009, 1:16pm » |
|
By the way, aaaa, I recall our discussion in which I maintained that two rabbits might always be worth more than a cat, and you proposed EC6R vs E8R as a possible counter-example. I notice now that all four functions on Janzert's page disagree with me and prefer the cat to the two rabbits. I'm afraid that my endgame play is so weak that the disagreement reflects badly on my intuition rather than reflecting badly on the (unanimous) material functions.
|
|
|
|
|
aaaa
Forum Guru
Arimaa player #958
Posts: 768
|
|
Re: (no) absolute score values for pieces?
« Reply #81 on: Jun 16th, 2009, 3:41am » |
|
On the contrary, it was the very fact that you treated the question of which side would be better in this situation as being very much open to discussion that made me settle on the current incarnation of the function with its lack of lopsided evaluation for one side or the other (not much more than the advantage of an initial rabbit).
|
« Last Edit: Jun 16th, 2009, 6:01am by aaaa » |
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #82 on: Jul 3rd, 2009, 7:09am » |
|
on Jul 2nd, 2009, 10:39am, Janzert wrote: I've finally finished integrating 99of9's additional evaluation systems back into the page and also added piece images. In doing so I used a few CSS and JavaScript techniques I've not used before, so there is certainly a possibility of browser compatibility problems. The location was also changed to better reflect that it shows multiple evaluation systems instead of just FAME. The new url is http://arimaa.janzert.com/eval.html (there is a redirect set up for the old url, so old links should continue to work for now). Let me know if you see any errors or have another eval system you'd like to see added. Janzert |
| Thanks, Janzert. You inspired me to try out some more material states to see how the evaluators agree with my intuition. I noticed something interesting about the M for HD trade. I had seen before that HarmLog gives a significant edge to HD over M, whereas FAME and DAPE put it about even. I think my intuition is somewhere in between, so it wasn't much worth commenting on. But what I just now noticed is the effect of further removing CC from each side. My intuition is that even trades are slightly disadvantageous to the M side, because the more the board thins out, the more important sheer numbers are relative to having the strongest pieces. FAME and DAPE both agree with me, slightly preferring HD over M after a trade of CC. HarmLog, in contrast, thinks the trade helps the M side and reduces the advantage of the HD side. Of course, my intuition about this case could well be wrong, as it has been wrong about other cases in the past. If quad enters the 2010 Computer Championship, it will be interesting to see it in action against FAME bots, because there will be some equal trades that both sides will be angling for.
|
|
|
|
|
Hippo
Forum Guru
Arimaa player #4450
Posts: 883
|
|
Re: (no) absolute score values for pieces?
« Reply #83 on: Feb 17th, 2010, 2:09am » |
|
Fritzlein is much further along with his bot_Nimrod than I am with bot_Hippo...
I like both the FAME and HarmLog evaluations. FAME is good in positions where you can line up equally "ordered" pieces next to each other, but as mentioned it suffers in that EHx against mdx and EHx against ccx give equal results. I would probably go for something in between (considering not only equally ordered matches, but giving equally ordered matches more weight).
I am planning to add a "quiet" multiplication factor to the material evaluation so that it is not fixated too much on material in races. How to compute the factor is another question.
Last joke: HHCx against mccx is not as good as HDCx, at least theoretically; the difference is the 3 repetitions rule. It would be difficult to find an example where it changes the game result, so leaving it out of the material evaluation would rarely cause any harm.
Hmm, it would need a lot of work to set the coefficients well. So far I have:
1) the FAME rabbit evaluation
2) let SGNi,j be the result of comparing the i-th strongest gold piece with the j-th strongest silver piece. I have a matrix of coefficients Ci,j. The sum of the coordinate-wise product of these two matrices is the second summand.
3) the last summand contains 1 for each piece type present and 10000 for a rabbit being present (added for gold and subtracted for silver).
C is symmetric; first try:
(250  27  10   3   1   0   0   0)
( 27  90  19   7   2   0   0   0)
( 10  19  60  13   6   1   0   0)
(  3   7  13  40   9   2   0   0)
(  1   2   6   9  30   3   1   0)
(  0   0   1   2   3  10   3   1)
(  0   0   0   0   1   3  10   3)
(  0   0   0   0   0   1   3   7)
BTW: Making C diagonal with diagonal 256 85 57 38 25 17 11 7 gives the original FAME (first 2 summands). Maybe I should have started much nearer to this matrix.
So Hippo Adjusted Fritz Arimaa Material Evaluation could be a good name for it (HAFAME). I suppose the C matrix will be changed, either to improve the evaluation or to speed up the computation.
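The second summand described in this post can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not bot_Hippo's code: the numeric strength encoding `RANK`, the padding of missing pieces with strength 0, and the names `officers` and `matrix_summand` are all hypothetical.

```python
# Assumed strength encoding for non-rabbit pieces (hypothetical; any
# strictly increasing ranking produces the same comparison signs).
RANK = {"E": 5, "M": 4, "H": 3, "D": 2, "C": 1}

# "First try" coefficient matrix C as posted.
C_FIRST_TRY = [
    [250, 27, 10,  3,  1,  0,  0, 0],
    [ 27, 90, 19,  7,  2,  0,  0, 0],
    [ 10, 19, 60, 13,  6,  1,  0, 0],
    [  3,  7, 13, 40,  9,  2,  0, 0],
    [  1,  2,  6,  9, 30,  3,  1, 0],
    [  0,  0,  1,  2,  3, 10,  3, 1],
    [  0,  0,  0,  0,  1,  3, 10, 3],
    [  0,  0,  0,  0,  0,  1,  3, 7],
]

# Diagonal variant that, according to the post, gives the original FAME.
FAME_DIAGONAL = [256, 85, 57, 38, 25, 17, 11, 7]
C_FAME = [[FAME_DIAGONAL[i] if i == j else 0 for j in range(8)]
          for i in range(8)]


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def officers(army):
    """Strengths of the non-rabbit pieces, strongest first, padded to 8.

    Missing pieces are given strength 0 (an assumption of this sketch).
    """
    ranks = sorted((RANK[p] for p in army.upper() if p in RANK), reverse=True)
    return ranks + [0] * (8 - len(ranks))


def matrix_summand(gold, silver, c=C_FIRST_TRY):
    """Sum of C[i][j] * sgn(gold_i - silver_j) over all index pairs."""
    g, s = officers(gold), officers(silver)
    return sum(c[i][j] * sign(g[i] - s[j])
               for i in range(8) for j in range(8))
```

Because C is symmetric and the sign comparison is antisymmetric, swapping the two armies negates the result, and identical armies score zero.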
|
« Last Edit: Feb 17th, 2010, 3:27pm by Hippo » |
|
|
|
Hippo
Forum Guru
Arimaa player #4450
Posts: 883
|
|
Re: (no) absolute score values for pieces?
« Reply #84 on: Feb 22nd, 2010, 12:31pm » |
|
on Mar 18th, 2006, 4:49pm, Janzert wrote: Hmm, I must have something wrong or FAME's a little weirder than I thought. Try EMHHDDCCR vs EM and a variable number of R's. For 8-6 silver r's the score goes up for gold as expected. But then for 5-0 r's the score actually decreases for gold with each additional r captured. FYI, I'm using the modification (clarification?) by Fritzlein to allow negative rabbits left over. Janzert |
| It seems to me that gold's score increases with each captured silver rabbit even in this case. So there is no need for a fixed 30 points for "negative rabbits" (the 33 (floored) FAME points for the initial captured rabbit is the minimal dynamic rabbit value).
on Mar 18th, 2006, 7:45pm, Fritzlein wrote: Here's a genuine weirdness with FAME:
EMHHDDCCR vs. ER = +652
EHHDDCCR vs. ER = +640
It only lowers Gold's evaluation by 12 points to throw away a camel. Although it leaves Silver with only -5 rabbits instead of -6, it simultaneously weakens Gold's defense from 17 to 15, and against negative offense, FAME thinks a weaker defense is better! To stop that silliness, each negative leftover rabbit should have a fixed value of, let's say, 40 points to the other team regardless of the size of the larger army. Not that it matters much, but why not patch holes that are easy to patch? There will still be enough unpatchable holes left. |
| Oh, so this was the reason ... I am thinking of the following "rabbits" scoring: take the difference in the number of pieces (including rabbits); the side with more pieces obtains points equal to, say, the difference * 600 / (opponent defense = 2*nonrabbits + rabbits). It seems to me HarmLog overvalues the rabbits ... the log part is much higher than the other part.
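The piece-count scoring proposed here could look roughly like this in Python. It is a minimal sketch, assuming that "opponent" means the side with fewer pieces and that the bonus is signed from gold's point of view; the function names and the error handling for an empty army are hypothetical additions.

```python
def count_pieces(army):
    """Return (nonrabbits, rabbits) for an army string such as 'EMHHDDCCRRRRRRRR'."""
    rabbits = sum(1 for p in army if p in "Rr")
    nonrabbits = sum(1 for p in army if p in "EMHDCemhdc")
    return nonrabbits, rabbits


def piece_count_bonus(gold, silver, scale=600):
    """Bonus for the side with more pieces, positive when that side is gold.

    bonus = difference * scale / (2 * nonrabbits + rabbits), where the
    denominator is the "defense" of the side with fewer pieces.
    """
    g_non, g_rab = count_pieces(gold)
    s_non, s_rab = count_pieces(silver)
    diff = (g_non + g_rab) - (s_non + s_rab)
    if diff == 0:
        return 0.0
    # The "opponent" of the larger army is the army with fewer pieces.
    opp_non, opp_rab = (s_non, s_rab) if diff > 0 else (g_non, g_rab)
    defense = 2 * opp_non + opp_rab
    if defense == 0:
        # Not covered by the post: the smaller army has no pieces at all.
        raise ValueError("smaller army is empty")
    return diff * scale / defense
```

For example, after a single silver rabbit is captured from the opening position, silver's defense is 2*8 + 7 = 23, so gold's bonus under this reading is 600/23, about 26 points.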
|
« Last Edit: Feb 22nd, 2010, 12:42pm by Hippo » |
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #85 on: Feb 22nd, 2010, 6:20pm » |
|
Ah, a matrix is a nice way to value non-direct comparisons, and it seems quite relevant. I look forward to further iterations of HAFAME once the theory meets reality.
|
|
|
|
|
Hippo
Forum Guru
Arimaa player #4450
Posts: 883
|
|
Re: (no) absolute score values for pieces?
« Reply #86 on: Feb 24th, 2010, 12:55pm » |
|
I have used the matrix just as an easy way to describe the algorithm. Of course it would be hard coded. The important thing is that only the top left corner, bounded by the maximal "nonrabbit pieces size", is used. (Probably a jump table would be used for this matrix cut; the fastest implementation of score += sgn(a-b) * Ci,j would be score' += (b-a)&C'i,j, score' -= (a-b)&C'i,j, where the ' values are shifted 3 bits left ... no conditional jumps.)
With the other rabbit evaluation ... I would probably end up with the acronym HAME. ... I am not sure it's OK, but an initial exchange of 2 horses for 5 rabbits is considered an advantage for the player with more pieces in HAME.
Actually I use a more FAME-like matrix:
(213, 21, 4, 0, ...)
( 21, 54, 13, 2, 0, ...)
( 4, 13, 40, 7, 1, 0, ...)
( 0, 2, 7, 27, 4, 1, 0, ...)
( 0, 0, 1, 4, 18, 3, 0, ...)
( 0, 0, 0, 1, 3, 12, 2, 0)
( 0, 0, 0, 0, 0, 2, 7, 1)
( 0, 0, 0, 0, 0, 0, 1, 6)
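The branch-free update mentioned in this post can be checked with a small Python sketch. It assumes piece strengths are integers in 0..7, so that |a - b| fits in the low 3 bits; the function names are hypothetical. Python integers behave like two's complement numbers of unlimited width under `&`, which mirrors what a fixed-width machine word would do here.

```python
SHIFT = 3  # strengths 0..7 fit in 3 bits, so |a - b| < 8


def add_signed_branchy(score, a, b, c):
    """Reference version: score += sgn(a - b) * c, using comparisons."""
    if a > b:
        return score + c
    if a < b:
        return score - c
    return score


def add_signed_branchless(score_shifted, a, b, c_shifted):
    """Same update without comparisons, on values shifted left by SHIFT.

    c_shifted has its low 3 bits clear. A negative difference has all
    higher bits set, so ANDing it with c_shifted yields c_shifted. A
    non-negative difference is below 8 and only touches the low 3 bits,
    so ANDing it with c_shifted yields 0.
    """
    score_shifted += (b - a) & c_shifted  # adds c_shifted only when a > b
    score_shifted -= (a - b) & c_shifted  # subtracts c_shifted only when a < b
    return score_shifted
```

The final score is recovered by shifting right by `SHIFT` once at the end, so the extra shift costs nothing inside the inner loop.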
|
« Last Edit: Feb 24th, 2010, 1:12pm by Hippo » |
|
|
|
aaaa
Forum Guru
Arimaa player #958
Posts: 768
|
|
Re: (no) absolute score values for pieces?
« Reply #87 on: Feb 24th, 2010, 3:29pm » |
|
I was asked whether there is a theoretical justification for my material evaluation function and I can offer a partial one in the form of the fact that it contains two degrees of freedom; I consider this to be ideal on account of the fact that this number corresponds nicely with the three (main?) dimensions by which an army can be measured: the quantity of the pieces, their quality and their goal potential. I encourage anyone to suggest alternative parameter values for HarLog if the outputted values feel off in any of these respects. Rabbits overvalued, you say? Try lowering 'G' and see what values you get then. I'd personally be wary of adding several parameters on a whim though, because that runs the risk of overfitting.
|
|
|
|
|
Hippo
Forum Guru
Arimaa player #4450
Posts: 883
|
|
Re: (no) absolute score values for pieces?
« Reply #88 on: Feb 25th, 2010, 7:38am » |
|
Sorry aaaa, I have had problems with the log(0) term, but it was a misinterpretation on my part ... I multiplied by friendly nonrabbits instead of all friendly pieces. Actually, where there is no log(0) problem, I value rabbits more ... (C6R/d3r), (C6R/dcr), (6R/c2r) and (2C7R/hdr) are advantages for gold in HAME and for silver in HAR(M)LOG (if I don't have a bug there).
It's good that there are several evaluation functions. I am not sure about current "co"processor speed: do the frequent divisions and the logarithm cause speed problems?
Stronger advantages for gold in HAME that are stronger advantages for silver in HAR(M)LOG follow: (2HD2C8R/m2hr) and (2H2D2C7R/m2hr).
A similar comparison for HAMExFAME (but with roughly 7 times smaller advantages): (MD2C8R/eh2dcr) (M2DC8R/e2hdcr) (M2DC8R/e2h2dr) (MH2C8R/em2dcr) (MH2D2C6R/em2hdr)
HAR(M)LOGxFAME: (M2HR/2hd2c8r) (M2HR/2h2d2c8r)
Both (2DC8R/em2hr) and (2H2D2CR/emhr) are considered a slight advantage for gold in HAME, but I am really not sure about that. In the former case ... does silver have enough pieces to take control of both home traps and prevent goaling? In the latter case ... it seems to me silver can go for elimination, and the many weak nonrabbit pieces do not help gold.
|
« Last Edit: Apr 27th, 2013, 7:22am by Hippo » |
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
|
|
Re: (no) absolute score values for pieces?
« Reply #89 on: Jul 28th, 2010, 2:13pm » |
|
I have been doing some tests with the various material evaluators using Janzert's roundrobin program. The games are 10 sec per move, so a game takes around 10 minutes. I'll post the results when there are enough games.
Assume there is an H for d trade. Who benefits from equal trades? The side with the extra H benefits from equal trades of camels or horses. This gets them closer to having the strongest piece. What about equal trades of dogs, cats or rabbits?
Assume one side has an extra rabbit. Who benefits from equal trades? Trading rabbits eventually leads to 2 rabbits vs 1 rabbit. The extra rabbit becomes a huge advantage. What about trading cats, dogs, horses or camels? Eventually this leads to E vs e with an extra rabbit. This also looks like a big advantage for the extra rabbit.
Assume one side has an extra camel. Who benefits from equal trades? FAME/HarLog puts the initial advantage at 5.64/6.48. With everything but the rabbits traded off, leaving EM8R vs E8R, FAME/HarLog is 6.38/3.80. Finally, for EMR vs er, FAME/HarLog is 8.46/5.31. This looks like an area for improvement. What is correct in this case?
The reason I am asking about this is that I was hoping to come up with a set of basic tests a material evaluator needs to pass. Things like a camel is worth more than a horse, etc. It takes a lot of time to get enough test games and I wanted a way to filter the different evaluators.
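A filter of the kind described in this post might be sketched in Python as below. Everything here is an assumption for illustration: the `evaluate(gold, silver)` interface, the particular checks chosen, and the two toy evaluators with made-up piece values. None of it comes from the roundrobin program or from any of the evaluators discussed in this thread.

```python
FULL_ARMY = "EMHHDDCCRRRRRRRR"


def without(army, lost):
    """Return army with each piece in `lost` removed once."""
    pieces = list(army)
    for p in lost:
        pieces.remove(p)
    return "".join(pieces)


def failed_checks(evaluate):
    """Run basic sanity checks and return the names of those that fail.

    `evaluate(gold, silver)` is assumed to return a score from gold's
    point of view, with both armies given as uppercase piece strings.
    """
    full = FULL_ARMY
    no_m, no_h = without(full, "M"), without(full, "H")
    no_d, no_r = without(full, "D"), without(full, "R")
    checks = {
        "equal armies score zero": evaluate(full, full) == 0,
        "swapping sides negates the score":
            evaluate(no_m, full) == -evaluate(full, no_m),
        "a camel is worth more than a horse":
            evaluate(no_m, full) < evaluate(no_h, full),
        "a horse is worth more than a dog":
            evaluate(no_h, full) < evaluate(no_d, full),
        "losing a rabbit hurts": evaluate(no_r, full) < 0,
    }
    return [name for name, ok in checks.items() if not ok]


# Toy evaluators for demonstration only; the piece values are placeholders.
TOY_VALUES = {"E": 16, "M": 11, "H": 7, "D": 4, "C": 3, "R": 2}


def toy_evaluate(gold, silver):
    """Static piece values, summed and differenced."""
    return sum(TOY_VALUES[p] for p in gold) - sum(TOY_VALUES[p] for p in silver)


def headcount_evaluate(gold, silver):
    """Deliberately weak evaluator that only counts pieces."""
    return len(gold) - len(silver)
```

An evaluator that returns a non-empty list could be dropped before any games are played. Open questions from the post, such as who benefits from equal trades, are left out on purpose because the expected answer is not settled; for floating-point evaluators the exact-equality checks would need a tolerance.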
|
|
|
|
|
|