Topic: More Material Analysis
IdahoEv
Forum Guru
Arimaa player #1753
Gender:
Posts: 405
|
|
More Material Analysis
« on: Apr 27th, 2006, 11:49pm » |
I'm trying to develop a more universal material analysis algorithm by empiric study of the games database. Here's what I've got so far.

The material quantification I am using attempts to classify pieces in such a way that no information about their abilities is lost. Trivial points-based analyzers will count cats and dogs differently even when all the opponent's cats and dogs are dead. FAME misses non-matchup interactions between pieces, as when it assigns the same score to EHCemd and ECCemd.

So, I've come up with this quantification to solve both problems at once:

Define a material state as up to six piece "levels" per player: five for officers, one for rabbits. Starting at the most powerful pieces, fill levels downward. Whenever you find a piece that can be pushed by any piece in the current level, drop to the next level and begin filling there. Note that since you can't push your own pieces, they remain in the same level if your opponent has no pieces that can push them. Rabbits always go in their own level (level 6 or R), so if other levels collapse, levels 5, 4, etc. may be left empty.

Thus EHHRRemddccrr looks like this:

L1: E vs e
L2: - vs m
L3: HH vs -
L4: - vs ddcc
L5: (empty)
LR: RR vs rr

I write material classifications in this notation: six digits for white (representing the number of pieces in levels 1-5 and R), a hyphen, six digits for black.

So, the starting position is: 112228-112228
A single capture of a black cat: 112228-112218
And EHHRRemddccrr: 102002-110402

I don't yet have a function assigned to these material evaluations, but I have done some analysis on the game database; I'll post it in the next message.
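The level-filling rule described in this post can be sketched in Python. This is only one reading of the rule, not IdahoEv's actual code: the function name `classify` and the input format (uppercase letters for gold/white, lowercase for silver/black) are conventions chosen for the sketch, and it assumes that pieces of equal strength on both sides always share a level. It reproduces the three codes given in this post.

```python
def classify(material: str) -> str:
    """Return the 'gggggg-ssssss' level code for a material string.

    Uppercase letters are gold pieces, lowercase are silver,
    e.g. "EHHRRemddccrr" (hypothetical input convention).
    """
    gold = [0] * 6      # counts for levels 1-5, then rabbits
    silver = [0] * 6
    level = 0
    for kind in "EMHDC":                      # strongest officer first
        g = material.count(kind)              # gold pieces of this kind
        s = material.count(kind.lower())      # silver pieces of this kind
        if g == 0 and s == 0:
            continue
        # Everything already in the current level is strictly stronger
        # than this kind, so an opposing piece there could push it.
        # Own pieces never push, so they do not force a drop.
        if (g and silver[level]) or (s and gold[level]):
            level += 1
        gold[level] += g
        silver[level] += s
    gold[5] = material.count("R")             # rabbits always use level R
    silver[5] = material.count("r")
    return "".join(map(str, gold)) + "-" + "".join(map(str, silver))


print(classify("EHHRRemddccrr"))  # 102002-110402
```

Because officers are visited strongest first, the drop test only has to ask whether the opponent already has a piece in the current level.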
|
« Last Edit: Apr 28th, 2006, 12:23am by IdahoEv » |
IdahoEv
Forum Guru
Arimaa player #1753
Gender:
Posts: 405
|
|
Re: More Material Analysis
« Reply #1 on: Apr 28th, 2006, 12:10am » |
Hmmm... database analysis and statistics coming in a bit. I think I just demonstrated a bug in my code. The initial capture of a black rabbit (112228-112227) is the most common position in my results, but the converse (112227-112228) does not appear at all. That can't possibly be right. Other mirror combinations, like cat captures (112218-112228 and 112228-112218), do appear; in fact those are the 2nd and 3rd most frequent positions recorded.
|
« Last Edit: Apr 28th, 2006, 12:12am by IdahoEv » |
99of9
Forum Guru
Gnobby's creator (player #314)
Gender:
Posts: 1413
|
|
Re: More Material Analysis
« Reply #2 on: Apr 28th, 2006, 1:40am » |
Very clever. I like how you think!
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: More Material Analysis
« Reply #3 on: Apr 28th, 2006, 8:22am » |
Sounds a bit like what Don Dailey was doing with Occam:

on Sep 7th, 2003, 10:31am, gern wrote:
In Bot Occam, the values of the pieces vary depending what is on the board. When a piece type goes away for instance, I revalue every piece on the board as if that piece never was a part of the game. For example, if the dogs go away, there should not be a huge gap between the value of the cats and the horses. It's as if the horses now become DOGS. In arimaa the values of the piece are relative to each other, not like in chess where the value is based more on the power of their moves. Don

I'm very interested to see the numbers you generate, particularly for unbalanced trades. As you say, FAME has holes, so there is a lot of room for improvement.
|
« Last Edit: Apr 28th, 2006, 8:23am by Fritzlein » |
jdb
Forum Guru
Arimaa player #214
Gender:
Posts: 682
|
|
Re: More Material Analysis
« Reply #4 on: Apr 28th, 2006, 8:28am » |
Nice representation of the material balance. All I can say, is I hope the first zero is in my opponent's list.
|
|
IdahoEv
Forum Guru
Arimaa player #1753
Gender:
Posts: 405
|
|
Re: More Material Analysis
« Reply #5 on: Apr 28th, 2006, 1:01pm » |
Okay, I found my bug and re-ran my analysis of the game DB. Here are the criteria for the games I included:

1) The game was rated
2) The losing player had a rating of at least 1650
3) No takebacks (they confuse my game parser)

I ran through each matching game and recorded each "stable" material state lasting through 1 full ply, meaning the opponent did not immediately reply with a capture. I recorded the states in 112228-112228 notation and kept a record of how many times each state appeared in a game leading to a white win, and how many for a black win. The states are not, for now, player-commutative, i.e. 112228-112227 and 112227-112228 are different states, scored separately.

Some basic results:

* 13478 different material states appear in the DB.
* Of those, 7854 appear only once.
* 425 states appear 10 or more times; I will use these to develop my material algorithm.
* Initial rabbit captures lead to wins 62% of the time for either player (882 and 761 instances).
* An initial cat capture for gold (112228-112218) leads to a gold win in 61% of 623 instances.
* An initial cat capture by silver (112218-112228) is *more* common (739 instances) but leads to a silver win only 51% of the time!
* Initial dog captures lead to wins ~65% of the time for either player (334 and 300 instances).
* Someone enjoys tormenting bots: 170008-100000 appears 14 times.
* They can't be bothered as silver, though: 100000-170008 only appears twice.

Anyhow, given a large dataset mapping xxxxxx-xxxxxx -> win percentage, I need to use it to generate a material evaluator as a function of twelve variables. I think the best representation for this is T1...T6 and G1...G6, representing the Total pieces at each level and the Gold net advantage at each level. Any thoughts on the form of the function I should fit?
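As a rough illustration of the twelve-variable representation proposed above, the following Python sketch turns one state code into the T (total pieces per level) and G (gold net advantage per level) vectors. The function name `level_features` is made up for this sketch, and it assumes every level count is a single digit, as in the codes quoted in this thread.

```python
def level_features(state: str):
    """Split a 'gggggg-ssssss' code into (T, G) feature lists.

    T[i] is the total number of pieces at level i+1 (index 5 = rabbits);
    G[i] is gold's count minus silver's count at that level.
    """
    gold_code, silver_code = state.split("-")
    gold = [int(ch) for ch in gold_code]
    silver = [int(ch) for ch in silver_code]
    totals = [g + s for g, s in zip(gold, silver)]
    gold_adv = [g - s for g, s in zip(gold, silver)]
    return totals, gold_adv


# Gold has captured one silver cat.
print(level_features("112228-112218"))
# ([2, 2, 4, 4, 3, 16], [0, 0, 0, 0, 1, 0])
```

Any fitted function would then take these twelve numbers as input; what form that function should have is the open question of the post.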
|
|
IdahoEv
Forum Guru
Arimaa player #1753
Gender:
Posts: 405
|
|
Re: More Material Analysis
« Reply #6 on: Apr 28th, 2006, 1:23pm » |
* Initial capture of a horse is about a 70% win for either player. At this point, the state is becoming infrequent enough that the statistics aren't as strong.
* Giving up a horse to capture a camel is about as good as capturing a dog or rabbit; ~65% win for either player.
* Capturing two rabbits (112226-112228) is about a 70% win for silver or 80% win for gold.
* Capturing a camel outright (112228-102228) is only a 73% win.
* But capturing one each dog, cat, and rabbit (112228-112117) is an 84% win.
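To put a number on "the statistics aren't as strong" for rarer states, one common rough check is a normal-approximation confidence interval on the observed win rate. The Python sketch below is a generic illustration and not part of the analysis in this thread; the function name and the 70-out-of-100 example are made up.

```python
import math


def win_rate_interval(wins: int, games: int, z: float = 1.96):
    """Approximate 95% interval for a win rate (normal approximation).

    Only reasonable when the state has a fair number of games; for very
    small counts the interval is unreliable.
    """
    p = wins / games
    se = math.sqrt(p * (1 - p) / games)   # standard error of a proportion
    return p - z * se, p + z * se


# Illustrative numbers only: 70 wins in 100 games.
low, high = win_rate_interval(70, 100)
print(round(low, 3), round(high, 3))  # 0.61 0.79
```

With only 100 games the interval is about nine points wide on each side, so differences of a few percent between rare states should be read with care.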
|
|
IdahoEv
Forum Guru
Arimaa player #1753
Gender:
Posts: 405
|
|
Re: More Material Analysis
« Reply #7 on: Apr 28th, 2006, 2:13pm » |
Some fun results! When I combine the two players, one thing is quite clear: rabbits are worth more than cats in the initial game. Even the very first rabbit is statistically worth more than a cat, and almost as much as a dog. FAME does not value rabbits highly enough. Capturing RR is noticeably better than capturing H.

(In xxxxxx-yyyyyy, the x's are you and the y's are your opponent. I have combined the scores for gold and silver.)

State | FAME | Win %
112228-112227 | +1.00 | 62%
112228-112218 | +1.49 | 55%
112228-112128 | +1.99 | 66%
112228-111228 | +3.12 | 69%
112228-112217 | +2.54 | 74%
112228-112226 | +2.02 | 74%
112228-112225 | +3.07 | 82%
112218-112227 | -0.45 | 55%
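Combining the two players, as described in this post, amounts to re-reading every recorded state from each player's own point of view. The Python sketch below shows one way to do that; the function name, the record layout (state code mapped to gold wins and silver wins) and the toy counts are assumptions for illustration, not IdahoEv's data or code.

```python
from collections import defaultdict


def combine_players(records):
    """Merge gold and silver results into 'you-opponent' win rates.

    records maps a 'gggggg-ssssss' code to (gold_wins, silver_wins).
    Returns a dict mapping 'xxxxxx-yyyyyy' to the win rate of player x.
    """
    combined = defaultdict(lambda: [0, 0])    # state -> [wins, games]
    for state, (gold_wins, silver_wins) in records.items():
        gold_code, silver_code = state.split("-")
        games = gold_wins + silver_wins
        # From gold's point of view the code is unchanged.
        combined[state][0] += gold_wins
        combined[state][1] += games
        # From silver's point of view the two halves swap.
        mirrored = f"{silver_code}-{gold_code}"
        combined[mirrored][0] += silver_wins
        combined[mirrored][1] += games
    return {s: w / n for s, (w, n) in combined.items() if n}


# Toy counts, invented for the example.
toy = {"112228-112227": (6, 4), "112227-112228": (3, 7)}
rates = combine_players(toy)
print(rates["112228-112227"])  # 0.65
```

In the toy data the player who is a rabbit up wins 6 + 7 = 13 of 20 games, which is the 0.65 printed above.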
|
|
chessandgo
Forum Guru
Arimaa player #1889
Gender:
Posts: 1244
|
|
Re: More Material Analysis
« Reply #8 on: Apr 28th, 2006, 2:20pm » |
Yes, your notation provides a formal background to analyse material; and it doesn't lose any information, so if there is a truth somewhere it should lie in there. Trying to guess a function by looking at experiments looks even more difficult than trying to figure it out of nowhere, but if you manage to do it, it will ensure that you'll have a very good result... It would be a great breakthrough if you achieve it!!!

My idea was that the value of a piece should only depend on the values of the enemy pieces lying in strictly inferior layers; for instance, set the value of the lowest level to 1 (I don't take rabbits into account), and for each layer, the value of a piece in this layer is the sum over all strictly weaker pieces of their respective values (or rather of a linear function of these). But experimenting a bit with that, I didn't find any values of the 2 constants involved in the linear function which don't lead to some contradictory or unsatisfactory results... So I guess this approach is not enough, and one has to consider more interaction. Too bad, because this would have allowed a linear time computation of the material function, and thus wouldn't have slowed down a bot... At any rate, a good material evaluator would be worth the trouble of a little slowing of a bot.

How do you intend to interpolate a function from your data, Idaho? Good luck with this work!!!!!!

Jean
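The layered valuation described in this post could look roughly like the Python sketch below. It is a guess at the scheme, not Jean's code: the function name, the constants `a` and `b` of the linear function, and the choice to give value 1 to any piece with no weaker enemy officers are all assumptions made for the sketch. Rabbits are left out, as in the post.

```python
def layer_values(gold, silver, a=1.0, b=0.0):
    """Per-level piece values for both sides.

    gold, silver: officer counts for levels 1-5, strongest first.
    A piece is worth the sum of (a * value + b) over all enemy pieces
    in strictly weaker levels, or 1 if there are none (assumption).
    """
    n = len(gold)
    gold_val = [0.0] * n
    silver_val = [0.0] * n
    for i in reversed(range(n)):              # weakest level first
        weaker = range(i + 1, n)
        g = sum(silver[j] * (a * silver_val[j] + b) for j in weaker)
        s = sum(gold[j] * (a * gold_val[j] + b) for j in weaker)
        gold_val[i] = g if g > 0 else 1.0
        silver_val[i] = s if s > 0 else 1.0
    return gold_val, silver_val


# Starting officers for both sides (112228-112228 without the rabbits).
print(layer_values([1, 1, 2, 2, 2], [1, 1, 2, 2, 2])[0])
# [36.0, 18.0, 6.0, 2.0, 1.0]
```

With `a = 1` and `b = 0` the values grow quickly from level to level, which hints at why tuning the two constants to give sensible trades is hard.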
|
|
chessandgo
Forum Guru
Arimaa player #1889
Gender:
Posts: 1244
|
|
Re: More Material Analysis
« Reply #9 on: Apr 28th, 2006, 2:24pm » |
on Apr 28th, 2006, 2:13pm, IdahoEv wrote:
State | FAME | Win %
112228-112227 | +1.00 | 62%
112228-112218 | +1.49 | 55%
112228-112128 | +1.99 | 66%
112228-111228 | +3.12 | 69%
112228-112217 | +2.54 | 74%
112228-112226 | +2.02 | 74%
112228-112225 | +3.07 | 82%
112218-112227 | -0.45 | 55%

Whow!!! Very nice... I should reconsider the value of a rabbit... as well as FAME, it seems.
|
|
mouse
Forum Senior Member
Arimaa player #784
Gender:
Posts: 45
|
|
Re: More Material Analysis
« Reply #10 on: Apr 28th, 2006, 2:32pm » |
on Apr 28th, 2006, 2:13pm, IdahoEv wrote:
Some fun results! When I combine the two players, one thing is quite clear: rabbits are worth more than cats in the initial game. Even the very first rabbit is statistically worth more than a cat, and almost as much as a dog. FAME does not value rabbits highly enough. Capturing RR is noticeably better than capturing H.

I think the reason for this is that you are more likely to lose a horse by a blunder than 2 rabbits. So losing 2 rabbits will mean you are outplayed. Losing a horse can mean you are outplayed or you made a blunder. If you lost a horse by a blunder, you still have a good chance of a comeback against many bots.
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: More Material Analysis
« Reply #11 on: Apr 28th, 2006, 3:15pm » |
This is extremely interesting analysis, IdahoEv. I hope that the formula you generate trounces FAME in its accuracy. However, I think you might be slightly jumping the gun with this:

on Apr 28th, 2006, 2:13pm, IdahoEv wrote:
Even the very first rabbit is statistically worth more than a cat, and almost as much as a dog. FAME does not value rabbits highly enough. Capturing RR is noticeably better than capturing H.

I'll concede that FAME probably doesn't value rabbits highly enough, but your other two statements are highly suspect. Establishing a correlation between winning and capturing an initial rabbit doesn't prove that the rabbit caused the win. It might be that the player who is winning tends to capture a rabbit, i.e. that the causality is reversed.

Consider the statistic posted in another thread, that a player capturing an initial rabbit is 66% likely to make the second capture as well, whereas the player capturing an initial cat is only 59% likely to make the second capture as well. Should we conclude that having one more rabbit than the other guy helps you capture another piece more than having an extra cat helps you? I think that would be a stretch. Much more reasonable is to conclude that the player who gets an initial positional advantage of any type is more likely to realize that advantage through an initial rabbit capture than an initial cat or horse capture. In short, I suspect that in many cases winning an initial rabbit or two is an effect of winning more than a cause.

on Apr 28th, 2006, 2:13pm, IdahoEv wrote:
State | FAME | Win %
112218-112227 | -0.45 | 55%

Now this statistic is much more telling, I admit. It surprises me very much, and makes me hope you investigate further in case you can prove me wrong (not just prove FAME wrong) about the relative worth of a cat and a rabbit in the opening.

But a true test of which side is favored in an initial trade of cat for rabbit would be to start one side without the cat, and start the other side without the rabbit, from the very beginning of the game. Then there would be no question of the cat-for-rabbit trade being an effect of winning rather than a cause. A similar test could weigh the value of an initial horse versus two rabbits.

Maybe I (or someone) could set up a bot to play itself a long series of games with a cat-for-rabbit handicap (or horse-for-two-rabbit handicap) built into each game. If that series came out 55% in favor of the side with the extra rabbit(s), even a stubborn guy like me might have to give in. In the meantime, however, if you let me pick which side to play, I'll take the side with the extra cat or horse every time.

(Gee whiz, am I established enough to be the curmudgeonly defender of outdated ideas? I could end up like all those scientists who denounced the big bang theory as ludicrous...)
|
|
jdb
Forum Guru
Arimaa player #214
Gender:
Posts: 682
|
|
Re: More Material Analysis
« Reply #12 on: Apr 28th, 2006, 3:51pm » |
Quote:
Establishing a correlation between winning and capturing an initial rabbit doesn't prove that the rabbit caused the win. It might be that the player who is winning tends to capture a rabbit, i.e. that the causality is reversed.

Fritzlein, this may seem like a silly question, but why does it matter if "causality is reversed"? The side that captures the first rabbit wins 62% of the time. Let's assume, for the sake of discussion, that this number is accurate. So, if I understand correctly, this establishes a correlation. Why does it matter what the "true reason" is for the win? Maybe it might help to know how to make use of the extra rabbit?

In my opinion, there are two different things going on here: 1) knowing who is winning and 2) knowing why they are winning.
|
|
IdahoEv
Forum Guru
Arimaa player #1753
Gender:
Posts: 405
|
|
Re: More Material Analysis
« Reply #13 on: Apr 28th, 2006, 4:07pm » |
It's a fairly simple question of probability: given a certain amount of knowledge, and no other information, who do you predict will win? Regardless of the reason for a capture of a cat, rabbit, or whatever else, knowing only that tells you a certain amount about the likelihood of that player winning.

It may well be that the reason P(win|captureRR) is higher than P(win|captureH) is that someone who captures RR is likely playing more strongly than someone who captures H, because there is a certain probability that the other player blundered the H. But an analysis of nothing but the material states of the board cannot tell the difference between P(win|captureH,no blunder) and P(win|captureH,blunder). If anyone can define "blunder" strictly enough to make a mathematical measurement, I'd be happy to run an analysis and give you the posterior probabilities.

If the underlying reasons lie in good play strategy overall rather than material advantage, and people & bots start valuing rabbits higher than cats, we will see the probabilities right themselves as the actual play compensates for these underlying functions. That can only be discerned with time. We would probably need another 3k games matching my above conditions before we'd see any major shift in the probabilities.

As has been said in other threads - this is a pure material analysis, no other information, and is based only on past games. Given those assumptions, I stand by the numbers, unless I find another bug.
|
|
IdahoEv
Forum Guru
Arimaa player #1753
Gender:
Posts: 405
|
|
Re: More Material Analysis
« Reply #14 on: Apr 28th, 2006, 4:15pm » |
Another interesting point is that while I treat EHDDRemrr and EHDCRemrr the same (both are 101201-110002), an analysis of sufficient games would probably show a difference between the play styles and possibly outcomes of such equivalent states. The reason is that (1) humans probably can't completely separate the psychological value of dogs and cats from their play functionality, and (2) many of the bots use simple material evaluators that evaluate D and C differently even at times when they are functionally equivalent.

Unfortunately, the density of representation of small groups like these is so poor as yet that I cannot perform any statistics on them. (I have recorded, for example, only one case of 100004-140007. Silver won, of course.)
|
|