Author |
Topic: (no) absolute score values for pieces? (Read 39563 times) |
|
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
|
|
Re: (no) absolute score values for pieces?
« Reply #30 on: Oct 9th, 2005, 11:50pm » |
|
on Oct 9th, 2005, 10:56pm, Fritzlein wrote:Nevertheless, I'll hold off naming it until I think it will work. |
| Your patience shows great wisdom.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #31 on: Oct 13th, 2005, 9:06am » |
|
I looked back over the numbers my system produced, and I have to say I like them reasonably well, except for when one side is missing an elephant. To fix that, I'll just arbitrarily give the top matchup 128 extra points, so the matchups are worth 256, 85, 57, 38, 25, 17, 11, and 7. I still have some quibbles with my evaluations, for example a camel versus five rabbits, but I'm not too worried because (A) such lopsided comparisons rarely arise in practice, and (B) I'm not terribly confident in my intuition of which is better. Anyway, I think I can be wrong in some peripheral ways and still be better than the other systems out there.

One critical feature I have that nobody else does is a bias for or against "equal" trades (like a horse for a horse) when there is a material imbalance. This type of decision is important because it comes up all the time: should I capture a piece and allow a capture in return, or should I give up my attack in order to defend? In fact, the broader issue of judging tradeoffs between attack and defense is one of the most important strategic considerations in Arimaa. In my system a player who is behind in material will always be penalized for trading, i.e. whoever is behind will be evaluated as further behind after an "equal" trade.

Furthermore, when there is an imbalance where it isn't clear who is ahead or behind (e.g. M for HD) my system does roughly the right thing, rewarding quantity over quality as the board empties out. I'm proud that a player who has HD for M in my system will be nearly as eager to trade a pair of horses as to win a rabbit outright. On the other hand, my system also rewards promotion of pieces as higher-ranking pieces disappear, so that a player with H for DR will be eager to trade off camels or even a pair of horses, while averse to trading off cats and rabbits. My system still has some issues with endgames, but at least avoids the blatant over-valuing of camels and horses to which Bomb is prone.

In my opinion Bomb over-values camels and horses slightly in the opening and heavily in the endgame, while undervaluing cats at all times. When all else is equal, my system prefers a cat to a rabbit at any phase of the game. That reminds me to say that one would have to independently heavily penalize the loss of the last rabbit, perhaps making it worth an additional -1000, or minus infinity in games where draws are not allowed. This seems like a bit of a hack compared to other systems, but I think it is worth being a bit ungraceful to avoid the overvaluation of rabbits relative to pieces present in other systems. For example the 99of9 system has ERRR way ahead of ECCR, and I think Bomb does too, but in my opinion ERRR is probably losing! If there is no immediate goal for ERRR, odds are that ECCR will start winning rabbits. (Is this controversial? Maybe I'm wrong about this evaluation...)

Well, to summarize, I'm absolutely positive my proposal can be improved upon, but also somewhat optimistic that it is in itself an improvement on previous systems. It would be interesting to compare the same Arimaa playing engine against itself with the same positional values, but with two different material evaluations.
|
|
|
|
|
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
|
|
Re: (no) absolute score values for pieces?
« Reply #32 on: Oct 13th, 2005, 9:24am » |
|
Are you ready to name it then? I agree it's looking good. ECCR vs ERRR ... I'm not sure actually, but I haven't played enough games with one rabbit left to have any experience about this.
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
|
|
Re: (no) absolute score values for pieces?
« Reply #33 on: Oct 13th, 2005, 11:18am » |
|
Quote:It would be interesting to compare the same Arimaa playing engine against itself with the same positional values, but with two different material evaluations. |
| Fritzlein, if you want to try out some tests, I can provide you with code to do that. Quote: I think having one rabbit left is sort of a special case. The endgame ER vs e is drawn (assuming we ignore threefold repetition) if the e can pin the R on the edge of the board. So it might be worth considering holding on to at least two rabbits, so the defender has to worry about two things at once. This material imbalance seems interesting, so I'll run some tests and post the results later.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #34 on: Oct 13th, 2005, 12:43pm » |
|
on Oct 13th, 2005, 9:24am, 99of9 wrote:Are you ready to name it then? |
| Let's call it the FAME system, for Fritz's Arimaa Material Evaluator. But what will I call it when I tweak the constants again? Maybe FAME can refer to my latest tweaks, and I'll only upgrade the name if I make major changes.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #35 on: Oct 13th, 2005, 12:50pm » |
|
on Oct 13th, 2005, 11:18am, jdb wrote: Fritzlein, if you want to try out some tests, I can provide you with code to do that. |
| You mean using Omar's offline match script? I'm all over that. I'll send you a separate e-mail to get the ball rolling. It seems there is a serious issue in integrating material evaluation with positional factors, but maybe you were thinking of stripping down evaluation to only material? Quote: I think having one rabbit left is sort of a special case. |
| Sigh. Probably you are right, and I need a fudge factor both for zero rabbits and only one rabbit left. You engineers don't mind having a few arbitrary constants, but it bugs the heck out of us mathematicians.
|
« Last Edit: Oct 13th, 2005, 1:01pm by Fritzlein » |
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #36 on: Oct 13th, 2005, 12:58pm » |
|
on Oct 13th, 2005, 9:24am, 99of9 wrote: I'm pretty sure that ECCR vs E is won. In fact, if I recall correctly, Adanac found the win over the board in his game of record-setting length. ECCR vs. ERRR is therefore probably very unstable. Barring a quick goal, the weaker side will lose its rabbits and then the game. On the other hand I seem to recall that ECR vs E is drawn, so the weak side might try to grab a cat while letting go of its own rabbits, as long as the enemy rabbit doesn't goal in the meantime.
|
|
|
|
|
nbarriga
Forum Guru
Almost retired Bot Developer
Posts: 119
|
|
Re: (no) absolute score values for pieces?
« Reply #37 on: Oct 13th, 2005, 3:26pm » |
|
on Oct 13th, 2005, 9:06am, Fritzlein wrote:It would be interesting to compare the same Arimaa playing engine against itself with the same positional values, but with two different material evaluations. |
| I just programmed your proposed eval function, but I encountered some problems. My evaluation function is composed of a material and a positional section. I changed the material section, and I'm positive that it is better than the old one, but it will be hard to re-balance the scores between the material and positional sections. I'm running some games now at Blitz and Fast speeds, and I will post the results as soon as I have them.
|
|
|
|
|
nbarriga
Forum Guru
Almost retired Bot Developer
Posts: 119
|
|
Re: (no) absolute score values for pieces?
« Reply #38 on: Oct 13th, 2005, 9:03pm » |
|
The balancing between positional and material is more difficult than I thought, so I will not be able to publish results yet. The current results I have are very bad for the new proposed eval function. By the way, my current eval function is R=100 C=200 D=300 H=500 M=800 E=2000. If the opponent has lost a complete category, the next category of my pieces is worth the average between the category and the one lost. If I'm not making myself clear, it is because I'm not a native English speaker. An example: if the enemy lost both his dogs, the values for my pieces are: R=150 C=250 D=300 H=500 M=800 E=2000
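
The rule in this post can be read in more than one way, so the following is only a rough sketch of one reading that reproduces the dog example. It assumes that every category weaker than the lost one is revalued as the average of its own base value and the base value of the next stronger category. The names `BASE`, `ORDER` and `adjusted_values` are made up for illustration, and only the dog case is confirmed by the post.

```python
# Base values as given in the post.
BASE = {"R": 100, "C": 200, "D": 300, "H": 500, "M": 800, "E": 2000}
ORDER = ["R", "C", "D", "H", "M", "E"]  # weakest to strongest


def adjusted_values(lost_category=None):
    """Return my piece values after the opponent lost a whole category.

    Assumption (not stated in the post): each category weaker than the
    lost one moves up to the average of itself and the next stronger one.
    """
    values = dict(BASE)
    if lost_category is None:
        return values
    lost_index = ORDER.index(lost_category)
    for i in range(lost_index):
        weaker, stronger = ORDER[i], ORDER[i + 1]
        values[weaker] = (BASE[weaker] + BASE[stronger]) // 2
    return values


# The example from the post: the enemy has lost both dogs.
print(adjusted_values("D"))
```

Under this reading the call with `"D"` gives R=150 and C=250 with everything else unchanged, which matches the example; what should happen when a different category or several categories are lost is a guess.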
|
|
|
|
|
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
|
|
Re: (no) absolute score values for pieces?
« Reply #39 on: Oct 13th, 2005, 9:55pm » |
|
on Oct 13th, 2005, 9:03pm, nbarriga wrote:By the way, my current eval function is R=100 C=200 D=300 H=500 M=800 E=2000 |
| It's interesting how similar your creation is to the one I suggested a few years ago (at the start of this thread): Quote: Elephant 13 Camel 8 Horse 5 Dog 3 Cat 2 1st Rabbit 1 |
| I think you value the elephant better. But you might like to look back and see what David and I wrote about rabbits - I still think that's quite important.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #40 on: Oct 14th, 2005, 5:59am » |
|
on Oct 13th, 2005, 9:03pm, nbarriga wrote:The balancing between positional and material is more difficult than I thought, so I will not be able to publish results yet. The current results I have are very bad for the new proposed eval function. |
| Thanks for testing it out. I wonder if the FAME system is inaccurate, or if the problem is something else, like balancing it with positional factors. I guess it wouldn't be too surprising to see a drop in performance if suddenly all material was undervalued (or overvalued) relative to positional factors. And I can imagine it is even more complex than that. As pieces are traded, the relative value of the camel goes down in FAME, so does that mean the value of a camel hostage should go down? Or if the value of a dog goes up due to trades, should the value of a dog hostage go up too? I am flattered that you considered FAME worth trying out, and it's too bad if it is of no benefit. I do expect that positional factors are far more important than material evaluation, so I'm not too surprised that FAME doesn't help, but I would be disappointed if it couldn't be made to work at least as well as the fixed constants you are using. Ah, well, so is life.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #41 on: Dec 12th, 2005, 4:40pm » |
|
OK, I've done further tweaking on my material evaluation function. I realized that I was fairly valuing strong pieces versus weak pieces, but was overvaluing pieces (non-rabbits) relative to rabbits. Here's my improved system:

(1) Line up the pieces from strongest to weakest on both sides. If one side has fewer pieces, they must contribute rabbits until all the opposing pieces are matched. The number of matchups will thus be the number of pieces in the army with more pieces. (There will be eight matchups at first, and fewer as pieces are traded.) Any rabbits not involved in the matchups are left over.
(2) The values of winning the matchups are, from top to bottom, 256, 85, 57, 38, 25, 17, 11, and 7.
(3) The leftover rabbits on each side each score 600/(R+2P) where R and P are respectively the number of rabbits and pieces the opponent has left. (This formula is the bit that changed in order to value rabbits more relative to pieces.)

Here are some initial trade values:

R free = +34
C free = +50
D free = +67
H free = +105
M free = +190
E free = +446

(This might suggest static piece values of R=1, C=1.5, D=2, H=3.1, M=5.6, E=13.2, but those static values from the opening would somewhat overvalue the big pieces in the mid-game and hugely overvalue the big pieces in the endgame.)

C for R = +15
C for RR = -20
D for RR = -3
D for CR = -21
MD for MCR = -17
MHD for MHCR = -11
MHHD for MHHCR = 0
H for D = +38
H for DR = 0
H for DC = -22
M for HD = 0
MH for HHD = -27
M for HRR = 9
M for HH = -57
H for RRR = -2
M for RRRRR = +8
MHDC for HDCRRRRR = -6
E for MH = 114
E for MHH = -80

and some endgames:

ER vs. CCR = -29
ERR vs. CCR = +141
ERR vs. CCRR = -29
ERRR vs. ECR = +35
EDR vs. ECCR = -92
EDRR vs. ECCR = 8
EDRRR vs ECCRR = -1

These endgame numbers are much less dodgy than the previous version. Two rabbits are now correctly valued at more than a cat at all times. (Nevertheless FAME values a cat higher than a single rabbit almost all the time, which is a position I maintain in defiance of popular opinion). The new endgame valuations may not be perfect, but now they are at least in the ballpark.

Meanwhile the good features from before have been retained, including:

*If there has been a trade of M for HD, the side with the camel will be averse to trading horses, while the side with HD will be eager to trade horses. The value of the superficially equal horse trade is actually near the value of losing (winning) a rabbit outright.
*In general the side with more numerous pieces would like to trade while the side with stronger pieces would like to avoid trades. However, the D for CR trade, which is initially poor, gets progressively better if M, H, and H are traded, which promotes the dog more than the cat.
*As the board empties out, the relative value of rabbits goes up.
*The value of a weak piece rises with every stronger opposing piece that disappears, so that a cat in the endgame may be worth what a horse was in the opening. Of course, this is offset by rabbits also becoming much more valuable, so the primary effect is that any remaining strong pieces go down in relative value.

JDB, the new trade values don't differ much from the old ones in the opening (only in the endgame), so if you drop the new constants into Clueless, you shouldn't have to retune all the positional factors to match.
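
For anyone who wants to reproduce the numbers in this post, steps (1) to (3) can be sketched in Python roughly as follows. This is not the author's code. It assumes that armies are written as strings such as "EMHHDDCCRRRRRRRR", that R and P in step (3) count all of the opponent's rabbits and non-rabbit pieces (which is the reading that matches the trade values listed), and that a matchup slot the weaker army cannot fill even with rabbits counts as a loss for that army. The names `fame`, `split_army`, `line_up` and `rabbit_value` are made up for illustration.

```python
MATCHUP_VALUES = [256, 85, 57, 38, 25, 17, 11, 7]
RANK = {"R": 0, "C": 1, "D": 2, "H": 3, "M": 4, "E": 5}
EMPTY = -1  # assumption: an unfilled matchup slot loses to anything


def split_army(army):
    """Return (non-rabbit ranks sorted strongest first, rabbit count)."""
    pieces = sorted((RANK[p] for p in army if p != "R"), reverse=True)
    return pieces, army.count("R")


def line_up(pieces, rabbits, slots):
    """Step (1): fill the matchup slots with pieces, then with rabbits."""
    borrowed = min(slots - len(pieces), rabbits)
    line = pieces + [RANK["R"]] * borrowed
    line += [EMPTY] * (slots - len(line))
    return line, rabbits - borrowed  # (matchup line, leftover rabbits)


def rabbit_value(opp_rabbits, opp_pieces):
    """Step (3): value of one leftover rabbit, 600/(R+2P)."""
    denom = opp_rabbits + 2 * opp_pieces
    return 600 / denom if denom else 0.0  # assumption: no bonus vs. an empty army


def fame(gold, silver):
    """Material score from gold's point of view (positive favours gold)."""
    g_pieces, g_rabbits = split_army(gold)
    s_pieces, s_rabbits = split_army(silver)
    slots = max(len(g_pieces), len(s_pieces))
    g_line, g_left = line_up(g_pieces, g_rabbits, slots)
    s_line, s_left = line_up(s_pieces, s_rabbits, slots)

    score = 0.0
    # Step (2): each matchup is worth its value to whoever has the stronger piece.
    for value, g, s in zip(MATCHUP_VALUES, g_line, s_line):
        if g != s:
            score += value if g > s else -value
    score += g_left * rabbit_value(s_rabbits, len(s_pieces))
    score -= s_left * rabbit_value(g_rabbits, len(g_pieces))
    return score


START = "EMHHDDCC" + "R" * 8
print(round(fame(START, "EMHHDDCC" + "R" * 7)))  # R free
print(round(fame(START, "EMHHDDC" + "R" * 8)))   # C free
print(round(fame("ER", "CCR")))                  # ER vs. CCR
```

With these assumptions the three calls come out at 34, 50 and -29, in line with the "R free", "C free" and "ER vs. CCR" entries above. The last-rabbit penalty discussed earlier in the thread is not part of this sketch.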
|
« Last Edit: Dec 12th, 2005, 4:51pm by Fritzlein » |
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #42 on: Dec 12th, 2005, 5:20pm » |
|
Incidentally, after 26 moves in my WC game against Robinson, I had taken HDDC and he had taken MHD. This latest version of FAME values that at +58 for Robinson, and I agree. Demote his horse to a dog, however, and we are dead even, i.e. taking HHDC would have balanced MHD.
|
|
|
|
|
Adanac
Forum Guru
Arimaa player #892
Posts: 635
|
|
Re: (no) absolute score values for pieces?
« Reply #43 on: Dec 13th, 2005, 7:20pm » |
|
I was wondering whether any bots use rabbit composition percentages for the endgame? That's the intuitive system that I use but I may be the only player that does. For example:

1. EMHHDCR (14% rabbits)
2. EHDRRRR (57% rabbits)
3. ERRRRRR (86% rabbits)

Each army has 7 pieces but they range from Muscular -> Balanced -> Lots of Goal Threats

I happen to believe that army #2 is better than either #1 or #3 because it has the best Rabbit/Non-Rabbit ratio while possessing a bit of strength with the horse and dog (I wouldn't like it at all with ECCRRRR, though).

I adhere to this system more passionately in the endgame, but I also use it in the opening, to some degree. For example, at the beginning of the game, if each side traps one rabbit and then, for the second exchange, the gold cat and a silver rabbit are trapped, I believe that gold has the much better army. For starters it's more balanced (50% rabbits versus 43%) and secondly I'm a big, big fan of advanced rabbits and it doesn't require many piece exchanges before I value rabbits more highly than dogs, never mind cats.

However, I find that bots (and humans) have much different opinions of relative piece value than I do, so it wouldn't surprise me if no one else uses or agrees with this philosophy! I once suggested a similar idea to Arimanator and he thought I was nuts (though I did suggest that rabbits were more valuable than cats on the FIRST trade, not the second as in the above example).
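
The percentages quoted in this post are simply the fraction of each army that consists of rabbits. A minimal sketch of that calculation, assuming armies are written as strings of piece letters (the helper name `rabbit_share` is hypothetical):

```python
def rabbit_share(army):
    """Fraction of the army that consists of rabbits (0.0 to 1.0)."""
    return army.count("R") / len(army)


# The three seven-piece endgame armies from the post.
for army in ("EMHHDCR", "EHDRRRR", "ERRRRRR"):
    print(army, f"{rabbit_share(army):.0%}")

# The opening example: gold has lost a rabbit and a cat, silver two rabbits.
gold = "EMHHDDC" + "R" * 7
silver = "EMHHDDCC" + "R" * 6
print(f"gold {rabbit_share(gold):.0%}, silver {rabbit_share(silver):.0%}")
```

This only reproduces the percentages (14%, 57% and 86%, then 50% versus 43%). How a bot would turn such a ratio into an evaluation term is left open in the post.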
|
|
|
|
|
Ryan_Cable
Forum Guru
Arimaa player #951
Posts: 138
|
|
Re: (no) absolute score values for pieces?
« Reply #44 on: Dec 14th, 2005, 3:01am » |
|
FAME is by far the best material evaluator I have seen. It is the first algorithm that comes anywhere close to being as good as HOTFLAME (Human On The FLy Arimaa Material Evaluation). Thus, I will point out all of the bugs I see in hopes you can make it even better.

FAME ignores the non-matchup interactions between pieces:
EHCR vs. EMDR = ECCR vs. EMDR = -142
But the former is clearly better than the latter.

FAME has problems when one side has no Rs:
E vs. CR = E vs. RR = -44
But E vs. CR is usually an infinite move draw (E freezes R, then C must dance around to prevent immobilization), while E vs. RR is usually lost (E freezes R, then R goals).
ER vs. EC = -85
EHHDDCCR vs. EMHHDDCC = ERRRRRRR vs. EMHHDDCC = -223
But all situations of this type are >=0.

Strictly speaking, you have not defined the score for situations where one side has more pieces than the other has pieces plus Rs. The obvious solution is to specify that piece vs. NULL counts as a winning matchup. This would be fine when both sides have Rs, but it would give
EDR vs. EHC = ER vs. EHC = -142
Which is basically a combination of the first two problems.

Adanac, I agree with Arimanator, you are nuts! If you really are passionate that 2 is better than 1, send me a postal invite. I will even give you the first move after we finish making the necessary sacrifices. FAME gives:
EMHHDCR vs. EHDRRRR = EMHHDCR vs. ERRRRRR = 235.8
EHDRRRR vs. ERRRRRR = 202
I think this is probably too high for the 1 vs. 2 case and probably too low for the 1 vs. 3 case. But I think EMHHDCR vs. EHDRRRR is enough advantage for me to be able to beat you even if you are the true World Champion. However, I would much rather have (in descending order of preference):
EMHHDRR vs. EHDRRRR = EMHHCRR vs. EHDRRRR = EMHDCRR vs. EHDRRRR = 225
And I would prefer
EMHHRRR vs. EHDRRRR = 190.9
to at least some of those. I think FAME undervalues Rs vs. pieces, when there are many pieces and few Rs.
on Dec 13th, 2005, 7:20pm, Adanac wrote:For example, at the beginning of the game, if each side traps one rabbit and then, for the second exchange, the gold cat and a silver rabbit are trapped, I believe that gold has the much better army. For starters it's more balanced (50% rabbits versus 43%) and secondly I'm a big, big fan of advanced rabbits and it doesn't require many piece exchanges before I value rabbits more highly than dogs, never mind cats. |
| There are three places to attempt a goal threat: left flank, right flank, and center. Goal threats in the center are usually weak, and it is rare for one player to have more than 2 goal threats at a time. In goal defense, a C is usually worth >=2R. Thus, I would always be materially happy to trade a R for a piece, when I have >=3R. However, Rs are more affected by positional factors than any other piece. A R that is presenting a latent goal threat can be worth >=C, and a R that is actually threatening goal is often worth >=D.
|
|
|
|
|
|