Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)

(Message started by: IdahoEv on Apr 27th, 2006, 11:49pm)

Title: More Material Analysis
Post by IdahoEv on Apr 27th, 2006, 11:49pm

I'm trying to develop a more universal material analysis algorithm by empiric study of the games database. Here's what I've got so far.

The material quantification I am using attempts to classify pieces in such a way that no information about their abilities is lost. Trivial points-based analyzers will count cats and dogs differently even when all the opponents' cats and dogs are dead. FAME misses non-matchup interactions between pieces, as when it assigns the same score to EHCemd and ECCemd. So, I've come up with this quantification to solve both problems at once:

Define a material state as up to six piece "levels" per player: five for officers, one for rabbits. Starting at the most powerful pieces, fill levels downward. Whenever you find a piece that can be pushed by any piece in the current level, drop to the next level and begin filling there. Note that since you can't push your own pieces, they remain in the same level if your opponent has no pieces that can push them.

Rabbits always go in their own level (level 6 or R), so if other levels collapse, levels 5, 4, etc. may be left empty.

Thus EHHRRemddccrr looks like this:

L1: E vs e
L2: - vs m
L3: HH vs -
L4: - vs ddcc
L5: (empty)
LR: RR vs rr

I write material classifications in this notation: six digits for white (representing the number of pieces in levels 1 - 5 and R), a hyphen, six digits for black.

So, the starting position is: 112228-112228
A single capture of a black cat: 112228-112218
And EHHRRemddccrr: 102002-110402
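
The level-filling rule can be sketched in a few lines of Python. This is only a literal reading of the description above, not IdahoEv's actual code; the function name `classify` and the convention of passing one string with gold in upper case and silver in lower case are assumptions made for illustration.

```python
OFFICERS = "EMHDC"  # officer types, strongest to weakest


def classify(pieces):
    """Return the 'xxxxxx-yyyyyy' material state for a piece string.

    Assumed input convention: gold pieces in upper case, silver in lower
    case, e.g. "EHHRRemddccrr".
    """
    gold = [0] * 6    # levels 1-5, then rabbits
    silver = [0] * 6
    level = 0
    gold_here = silver_here = False  # which sides occupy the current level

    for kind in OFFICERS:
        n_gold = pieces.count(kind)
        n_silver = pieces.count(kind.lower())
        if n_gold == 0 and n_silver == 0:
            continue
        # Everything already in the current level is stronger than `kind`,
        # so these pieces can be pushed if an enemy piece is sitting there.
        if (n_gold and silver_here) or (n_silver and gold_here):
            level += 1
            gold_here = silver_here = False
        gold[level] += n_gold
        silver[level] += n_silver
        gold_here = gold_here or n_gold > 0
        silver_here = silver_here or n_silver > 0

    gold[5] = pieces.count("R")
    silver[5] = pieces.count("r")
    return "".join(map(str, gold)) + "-" + "".join(map(str, silver))


print(classify("EHHRRemddccrr"))  # 102002-110402
```

With full armies on both sides the same function gives 112228-112228, and removing one silver cat gives 112228-112218, matching the examples above.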

I don't yet have a function assigned to these material evaluations, but I have done some analysis on the game database; I'll post it in the next message.

Title: Re: More Material Analysis
Post by IdahoEv on Apr 28th, 2006, 12:10am

Hmmm... database analysis and statistics coming in a bit. I think I just demonstrated a bug in my code.

The initial capture of a black rabbit (112228-112227) is the most common position in my results, but the converse (112227-112228) does not appear at all. That can't possibly be right.

Other mirror combinations, like cat captures (112218-112228 and 112228-112218), do appear; in fact those are the 2nd and 3rd most frequent positions recorded.

Title: Re: More Material Analysis
Post by 99of9 on Apr 28th, 2006, 1:40am

Very clever. I like how you think!

Title: Re: More Material Analysis
Post by Fritzlein on Apr 28th, 2006, 8:22am

Sounds a bit like what Don Dailey was doing with Occam:

on 09/07/03 at 10:31:43, gern wrote:

In Bot Occam, the values of the pieces vary depending what is on the board. When a piece type goes away for instance, I revalue every piece on the board as if that piece never was a part of the game.

For example, if the dogs go away, there should not be a huge gap between the value of the cats and the horses. It's as if the horses now become DOGS. In arimaa the values of the piece are relative to each other, not like in chess where the value is based more on the power of their moves.

Don

I'm very interested to see the numbers you generate, particularly for unbalanced trades. As you say, FAME has holes, so there is a lot of room for improvement.

Title: Re: More Material Analysis
Post by jdb on Apr 28th, 2006, 8:28am

Nice representation of the material balance.

All I can say is, I hope the first zero is in my opponent's list.

Title: Re: More Material Analysis
Post by IdahoEv on Apr 28th, 2006, 1:01pm

Okay, I found my bug and re-ran my analysis of the game DB.

Here are the criteria for the games I included:
1) The game was rated
2) The losing player had a rating of at least 1650
3) No takebacks (they confuse my game parser)

I ran through each matching game, and recorded each "stable" material state lasting through 1 full ply, meaning the opponent did not immediately reply with a capture. I recorded the states in 112228-112228 notation and kept a record of how many times each state appeared in a game leading to a white win, and how many for a black win.

The states are not, for now, player-commutative. i.e. 112228-112227 and 112227-112228 are different states, scored separately.
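
A rough sketch of that bookkeeping, for anyone who wants to reproduce it. It assumes each game has already been reduced to the list of material states after every turn plus the winner; the names `stable_states` and `tally` are made up here, and counting each stable state once per game is one interpretation of the description, not necessarily what the original parser did.

```python
from collections import defaultdict


def stable_states(states):
    """Return the states that survive the opponent's reply unchanged.

    `states` is the material state after each turn of one game, in order.
    """
    stable = []
    for current, following in zip(states, states[1:]):
        if current == following and current not in stable:
            stable.append(current)
    return stable


def tally(games):
    """Map each stable state to [gold wins, silver wins].

    `games` is an iterable of (states, winner) pairs, winner being
    "gold" or "silver".
    """
    counts = defaultdict(lambda: [0, 0])
    for states, winner in games:
        column = 0 if winner == "gold" else 1
        for state in stable_states(states):
            counts[state][column] += 1
    return counts
```

A state that is immediately answered by another capture never equals its successor, so it is skipped, which is the "stable" condition described above.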

Some basic results:
* 13478 different material states appear in the DB.
* Of those, 7854 appear only once.
* 425 states appear 10 or more times; I will use these to develop my material algorithm.

* Initial rabbit captures lead to wins 62% of the time for either player. (882 and 761 instances)

* An initial cat capture for gold (112228-112218) leads to a gold win in 61% of 623 instances.

* An initial cat capture by silver is *more* common (739 instances) (112218-112228) but leads to a silver win only 51% of the time!

* Initial dog captures lead to wins ~ 65% for either player (334 and 300 instances).

* Someone enjoys tormenting bots: 170008-100000 appears 14 times.

* They can't be bothered as silver, though: 100000-170008 only appears twice.

Anyhow, given a large dataset mapping xxxxxx-xxxxxx > win percentage, I need to use it to generate a material evaluator as a function of twelve variables.

I think the best representation for this is T1...T6 and G1....G6 representing the Total pieces at each level and the Gold net advantage at each level.
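
In code, that change of variables might look like the following sketch. The function name is an assumption; the only thing taken from the post is that T is the total and G the gold-minus-silver difference at each level.

```python
def totals_and_advantage(state):
    """Split an 'xxxxxx-yyyyyy' state into per-level totals T and gold advantage G."""
    gold, silver = state.split("-")
    gold = [int(c) for c in gold]
    silver = [int(c) for c in silver]
    T = [g + s for g, s in zip(gold, silver)]
    G = [g - s for g, s in zip(gold, silver)]
    return T, G


T, G = totals_and_advantage("112228-112218")
print(T)  # [2, 2, 4, 4, 3, 16]
print(G)  # [0, 0, 0, 0, 1, 0]
```

The twelve numbers carry exactly the same information as the original two six-digit halves, since gold = (T + G) / 2 and silver = (T - G) / 2 at every level.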

Any thoughts on the form of the function I should fit?

Title: Re: More Material Analysis
Post by IdahoEv on Apr 28th, 2006, 1:23pm

* Initial capture of a horse is about a 70% win for either player. At this point, the state is becoming infrequent enough that the statistics aren't as strong.

* Giving up a horse to capture a camel is about as good as capturing a dog or rabbit; ~65% win for either player.

* Capturing two rabbits (112226-112228) is about a 70% win for silver or 80% win for gold.

* Capturing a camel outright (112228-102228) is only a 73% win.

* But capturing one each dog, cat, and rabbit, (112228-112117) is an 84% win.
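
To put a number on "the statistics aren't as strong", one option is a confidence interval on each win rate. The sketch below uses the Wilson score interval, which is a standard choice for proportions but is not something used in the thread; the function name is an assumption.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate."""
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# About 61% of 623 games, i.e. roughly 380 wins.
low, high = wilson_interval(380, 623)
```

For a state seen 623 times the interval comes out at roughly 57% to 65%; for a state seen only a few dozen times it is several times wider, so differences of a few points between rare states should not be over-read.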

Title: Re: More Material Analysis
Post by IdahoEv on Apr 28th, 2006, 2:13pm

Some fun results!

When I combine the two players, one thing is quite clear: rabbits are worth more than cats in the initial game. Even the very first rabbit is statistically worth more than a cat, and almost as much as a dog. FAME does not value rabbits highly enough. Capturing RR is noticeably better than capturing H.

(In xxxxxx-yyyyyy, the x's are you and the y's are your opponent. I have combined the scores for gold and silver.)

State	FAME	Win %
112228-112227	+1.00	62%
112228-112218	+1.49	55%
112228-112128	+1.99	66%
112228-111228	+3.12	69%
112228-112217	+2.54	74%
112228-112226	+2.02	74%
112228-112225	+3.07	82%
112218-112227	-0.45	55% :-)
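
Combining the two players amounts to reading every recorded state twice, once from each side. A minimal sketch, assuming the per-state counts are stored as (gold wins, silver wins); the function names and the toy counts in the usage lines are invented for illustration and are not figures from the database.

```python
from collections import defaultdict


def mirror(state):
    """Swap the two halves of an 'xxxxxx-yyyyyy' state."""
    gold, silver = state.split("-")
    return silver + "-" + gold


def combine_players(counts):
    """Fold gold and silver results into one 'you vs opponent' table.

    `counts` maps a gold-silver state to (gold wins, silver wins).
    The result maps a you-opponent state to [your wins, games].
    """
    combined = defaultdict(lambda: [0, 0])
    for state, (gold_wins, silver_wins) in counts.items():
        games = gold_wins + silver_wins
        # Seen from gold's side the state reads as recorded.
        combined[state][0] += gold_wins
        combined[state][1] += games
        # Seen from silver's side the two halves are swapped.
        combined[mirror(state)][0] += silver_wins
        combined[mirror(state)][1] += games
    return combined


# Toy counts only, not real database figures.
toy = {"112228-112227": (6, 4), "112227-112228": (3, 7)}
table = combine_players(toy)
```

With these toy numbers the "you are a rabbit up" row collects 13 wins out of 20 games, and its mirror row the remaining 7 out of 20.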

Title: Re: More Material Analysis
Post by chessandgo on Apr 28th, 2006, 2:20pm

Yes, your notation provides a formal background to analyse material; and it doesn't lose any information, so if there is a truth somewhere it should lie in there.

Trying to guess a function by looking at experiments looks even more difficult than trying to figure it out out of nowhere, but if you manage to do it, it will ensure that you'll have a very good result... It would be a great breakthrough if you achieve it!!!

My idea was that the value of a piece should only depend on the values of the enemy pieces lying in strictly inferior layers; for instance, set the value of the lowest level to 1 (I don't take rabbits into account), and for each layer, the value of a piece in this layer is the sum over all strictly weaker pieces of their respective values (or rather of a linear function of these).

But experimenting a bit with that, I didn't find any values of the 2 constants involved in the linear function which don't lead to some contradictory or unsatisfactory results... So I guess this approach is not enough, and one has to consider more interaction. Too bad, because this would have allowed a linear time computation of the material function, and thus wouldn't have slowed down a bot...
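
One possible reading of that layered scheme, written out so it can be experimented with. The details are guesses: `a` and `b` stand for the two constants of the linear function, a piece with no weaker enemy officer is given the base value 1, and the function name is made up.

```python
def layer_values(gold, silver, a=1.0, b=0.0):
    """Value of one officer in each level, for each side.

    `gold` and `silver` are officer counts for levels 1-5 (rabbits left out).
    A piece with no weaker enemy officer gets the base value 1.
    """
    gold_value = [1.0] * 5
    silver_value = [1.0] * 5
    for level in range(4, -1, -1):  # weakest level first
        weaker = range(level + 1, 5)
        from_silver = sum(silver[k] * (a * silver_value[k] + b) for k in weaker)
        from_gold = sum(gold[k] * (a * gold_value[k] + b) for k in weaker)
        if from_silver > 0:
            gold_value[level] = from_silver
        if from_gold > 0:
            silver_value[level] = from_gold
    return gold_value, silver_value


gold_value, silver_value = layer_values([1, 1, 2, 2, 2], [1, 1, 2, 2, 2])
print(gold_value)  # [36.0, 18.0, 6.0, 2.0, 1.0]
```

With a = 1 and b = 0 the starting position gives an elephant worth 36 cats, which already hints at how quickly the values grow and why tuning the two constants is awkward.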

At any rate, a good material evaluator would be worth the trouble of a little slowing of a bot.

How do you intend to interpolate a function from your data, Idaho?

Good luck with this work!!!!!!

Jean

Title: Re: More Material Analysis
Post by chessandgo on Apr 28th, 2006, 2:24pm

on 04/28/06 at 14:13:29, IdahoEv wrote:

State	FAME	Win %
112228-112227	+1.00	62%
112228-112218	+1.49	55%
112228-112128	+1.99	66%
112228-111228	+3.12	69%
112228-112217	+2.54	74%
112228-112226	+2.02	74%
112228-112225	+3.07	82%
112218-112227	-0.45	55% :-)

Wow!!! Very nice... I should reconsider the value of a rabbit... as well as FAME, it seems ;)

Title: Re: More Material Analysis
Post by mouse on Apr 28th, 2006, 2:32pm

on 04/28/06 at 14:13:29, IdahoEv wrote:

I think the reason for this is you are more likely to lose a horse by a blunder than 2 rabbits. So losing 2 rabbits will mean you are outplayed. Losing a horse can mean you are outplayed or you made a blunder. If you lost a horse by a blunder you still have a good chance of a comeback against many bots.

Title: Re: More Material Analysis
Post by Fritzlein on Apr 28th, 2006, 3:15pm

This is extremely interesting analysis, IdahoEv. I hope that the formula you generate trounces FAME in its accuracy. However, I think you might be slightly jumping the gun with this:

on 04/28/06 at 14:13:29, IdahoEv wrote:

Even the very first rabbit is statistically worth more than a cat, and almost as much as a dog. FAME does not value rabbits highly enough. Capturing RR is noticeably better than capturing H.

I'll concede that FAME probably doesn't value rabbits highly enough, but your other two statements are highly suspect.

Establishing a correlation between winning and capturing an initial rabbit doesn't prove that the rabbit caused the win. It might be that the player who is winning tends to capture a rabbit, i.e. that the causality is reversed.

Consider the statistic posted in another thread, that a player capturing an initial rabbit is 66% likely to make the second capture as well, whereas the player capturing an initial cat is only 59% likely to make the second capture as well. Should we conclude that having one more rabbit than the other guy helps you capture another piece more than having an extra cat helps you? I think that would be a stretch. Much more reasonable is to conclude that the player who gets an initial positional advantage of any type is more likely to realize that advantage through an initial rabbit capture than an initial cat or horse capture. In short, I suspect that in many cases winning an initial rabbit or two is an effect of winning more than a cause.

on 04/28/06 at 14:13:29, IdahoEv wrote:

State	FAME	Win %
112218-112227	-0.45	55% :-)

Now this statistic is much more telling, I admit. It surprises me very much, and makes me hope you investigate further in case you can prove me wrong (not just prove FAME wrong) about the relative worth of a cat and a rabbit in the opening. But a true test of which side is favored in an initial trade of cat for rabbit would be to start one side without the cat, and start the other side without the rabbit, from the very beginning of the game. Then there would be no question of the cat-for-rabbit trade being an effect of winning rather than a cause. A similar test could weigh the value of an initial horse versus two rabbits.

Maybe I (or someone) could set up a bot to play itself a long series of games with a cat-for-rabbit handicap (or horse-for-two-rabbit handicap) built into each game. If that series came out 55% in favor of the side with the extra rabbit(s), even a stubborn guy like me might have to give in.

In the mean time, however, if you let me pick which side to play, I'll take the side with the extra cat or horse every time. (Gee whiz, am I established enough to be the curmudgeonly defender of outdated ideas? I could end up like all those scientists who denounced the big bang theory as ludicrous... :P)
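
For the handicap self-play series suggested above, a back-of-the-envelope question is how long the series must be before a 55% score stops looking like a coin flip. The sketch below uses the usual normal approximation to the binomial; the function name and the 95% level (z = 1.96) are assumptions, not anything specified in the thread.

```python
import math


def games_needed(observed=0.55, null=0.50, z=1.96):
    """Smallest series length at which `observed` differs from `null`
    by more than z standard errors (normal approximation)."""
    margin = abs(observed - null)
    return math.ceil(z * z * null * (1 - null) / (margin * margin))


print(games_needed())      # 385
print(games_needed(0.60))  # 97
```

So a 55% result needs a series of a few hundred games to be convincing, while a 60% result would show up in about a hundred.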

Title: Re: More Material Analysis
Post by jdb on Apr 28th, 2006, 3:51pm

Quote:

Establishing a correlation between winning and capturing an initial rabbit doesn't prove that the rabbit caused the win. It might be that the player who is winning tends to capture a rabbit, i.e. that the causality is reversed.

Fritzlein, this may seem like a silly question, but why does it matter if "causality is reversed"?

The side that captures the first rabbit wins 62% of the time. Let's assume, for the sake of discussion, this number is accurate. So, if I understand correctly, this establishes a correlation. Why does it matter what the "true reason" is for the win? Maybe it might help to know how to make use of the extra rabbit?

In my opinion, there are two different things going on here. 1) Knowing who is winning and 2) Knowing why they are winning.

Title: Re: More Material Analysis
Post by IdahoEv on Apr 28th, 2006, 4:07pm

It's a fairly simple question of probability: given a certain amount of knowledge, and no other information, who do you predict will win?

Regardless of the reason for a capture of a cat, rabbit, or whatever else, knowing only that tells you a certain amount about the likelihood of that player winning. It may well be that the reason that P(win|captureRR) is higher than P(win|captureH) is because someone who captures RR is likely playing more strongly than someone who captures H because of a certain probability that the other player blundered the H. But, an analysis of nothing but the material states of the board cannot tell the difference between P(win|captureH,no blunder) and P(win|captureH,blunder).

If anyone can define "blunder" strictly enough to make a mathematical measurement, I'd be happy to run an analysis and give you the posterior probabilities. :-)

If the underlying reasons are because of good play strategy overall rather than material advantage, and people & bots start valuing rabbits higher than cats, we will see the probabilities right themselves as the actual play compensates for these underlying functions. That can only be discerned with time. We would probably need another 3k games matching my above conditions before we'd see any major shift in the probabilities.

As has been said in other threads - this is a pure material analysis, no other information, and is based only on past games. Given those assumptions, I stand by the numbers, unless I find another bug. :-)

Title: Re: More Material Analysis
Post by IdahoEv on Apr 28th, 2006, 4:15pm

Another interesting point is that while I treat EHDDRemrr and EHDCRemrr the same (both are 101201-110002), an analysis of sufficient games would probably show a difference between the play styles and possibly outcomes of such equivalent states.

The reason is that (1) humans probably can't completely separate the psychological value of dogs and cats from their play functionality, and (2) many of the bots use simple material evaluators that evaluate D and C differently even at times when they are functionally equivalent.

Unfortunately, the density of representation of small groups like these is so poor as yet that I cannot perform any statistics on them.

(I have recorded, for example, only one case of 100004-140007. Silver won, of course.).

Title: Re: More Material Analysis
Post by IdahoEv on Apr 28th, 2006, 4:18pm

Perhaps another, simpler explanation for the rabbit upset: good players have focussed on pulling rabbits the last couple of years. Therefore rabbit captures indicate likely wins because the players are good, end of story.

I can re-run the analysis with different game selection criteria if anyone is interested. Only humans, only bots, both players > 1750, whatever.

Title: Re: More Material Analysis
Post by Fritzlein on Apr 28th, 2006, 5:58pm

on 04/28/06 at 15:51:47, jdb wrote:

Fritzlein, this may seem like a silly question, but why does it matter if "causality is reversed"?

It isn't a silly question at all; it's the very center of my argument about correlation not being the same as causality.

I'm not disputing IdahoEv's numbers at all. I accept that there is a greater correlation between winning and capturing an initial rabbit than between winning and capturing an initial cat. I am disputing his conclusion that therefore an initial rabbit capture is worth more. I assume he means that if one player starts a game without a rabbit, that incurs a greater chance of losing than starting without a cat. This conclusion may be true, but I think it is false, and it certainly doesn't follow from the correlation.

Suppose for a moment that cats really are more valuable than rabbits in the opening, and suppose that everyone correctly makes this evaluation. This precise fact could cause (at least in part) the higher correlation between winning and an initial rabbit capture.

Example 1: Consider any game where I get a camel hostage. That means I'm winning. The causal chain is now that I will try to capture something in my other home trap while my opponent is trying to free up his defending elephant. Naturally while he tries to engineer a defense, he will want to lose as little material as possible in the mean time. So he may (correctly!) use a rabbit to unfreeze and retreat any threatened piece, including a cat, so that I only win a rabbit in the mean time.

Note that my advantage caused me to win an initial rabbit, not an initial cat, in part because cats are more valuable than rabbits. Note also that after capturing an initial rabbit, I still have the camel hostage as well, so I have a greater chance of winning from that point than if I had been given a cat capture but nothing else.

Example 2: Suppose that I frame a rabbit in one of my home traps, for no compensation. This means I am winning. Next I try to create a second threat to force material gain. Suppose my second threat is to take a cat hostage, and threaten it with capture. My opponent (correctly!) realizing that a cat is worth more than a rabbit, abandons his framed rabbit to prevent the cat capture. If he waited too long to give up the rabbit, though, I can often frame the cat as well, or keep it as a hostage in a very favorable way.

Note that again my winning caused a rabbit capture (not a cat capture). Note further that, because I have a rabbit capture plus a cat hostage, I have a greater chance of winning than I would have had I captured a cat for nothing.

Example 3: Suppose I somehow manage to get a cat frame right off the bat. Having tied down the opposing elephant, I go hunting for a second threat. My opponent (correctly!) protects his pieces at the expense of letting a rabbit be pulled. When I'm about to take the rabbit, he considers abandoning his cat to save the rabbit, but correctly lets the rabbit go and maneuvers to break the frame of his cat instead. Therefore I win a rabbit.

For the third time, my winning has caused me to capture a rabbit, not a cat. For the third time the position I end up with may be worth more than it would be worth had I started the game with a free cat.

In summary, the higher correlation between winning and initial rabbit captures than between winning and initial cat captures may be due to the fact that cats really are worth more than rabbits, which in turn means that winning positionally causes more initial rabbit captures than it causes initial cat captures.

For the sake of fairness, I must also speculate about what causes a cat capture. It may well be that I can capture a cat because I have a second, bigger threat that must be defended. When this happens winning can also cause cat captures just like it causes rabbit captures. I contend, however, that these cases are less likely than the cases of winning causing a rabbit capture.

Perhaps it is much more common that an initial cat capture is totally isolated from advantage of any other kind. For example it may be that 50% of initial cat captures happen when the capturing player has no other advantage in addition to the cat, whereas only 30% of initial rabbit captures happen when the player has no other advantage in addition to the rabbit.
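
The arithmetic behind that possibility is a simple mixture. In the sketch below the 50% and 30% "isolated capture" shares are the example figures from the paragraph above, while the three win probabilities are invented purely to show the effect; none of them come from the database.

```python
def observed_win_rate(isolated_share, p_isolated, p_with_advantage):
    """Win rate seen in the data when captures are a mix of isolated
    captures and captures that come on top of a positional advantage."""
    return isolated_share * p_isolated + (1 - isolated_share) * p_with_advantage


# Hypothetical: an isolated cat is worth MORE than an isolated rabbit.
cat = observed_win_rate(0.50, p_isolated=0.60, p_with_advantage=0.75)
rabbit = observed_win_rate(0.30, p_isolated=0.55, p_with_advantage=0.75)
print(round(cat, 3), round(rabbit, 3))  # 0.675 0.69
```

Even though the isolated cat is assumed to be the better capture, the rabbit shows the higher observed win rate, simply because more rabbit captures come bundled with some other advantage.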

I hope I'm not being too pedantic in my efforts to be clear. The distinction between correlation and causality bedevils all applied statistics. For example, there's a known correlation between abstinence from alcohol and death. People who drink moderately are less likely to die at any given time than people who don't drink at all. However, you can't necessarily draw the conclusion that, if you currently abstain from alcohol, taking up moderate drinking will lower your chance of death. The medical studies must first eliminate the possibility of reverse causality. In particular, some people who are near death are absolutely forbidden to drink by their doctors. Even after they stop drinking, they're still likely to die. In this fashion, being likely to die can cause a higher rate of abstinence rather than abstinence causing a higher likelihood of dying. We have to be careful what the correlation means.

Check out http://www.eurekalert.org/pub_releases/2006-03/uoc--isq032706.php

Title: Re: More Material Analysis
Post by Swynndla on Apr 28th, 2006, 6:06pm

on 04/28/06 at 15:51:47, jdb wrote:

Fritzlein, this may seem like a silly question, but why does it matter if "causality is reversed"?

Quote:

In my opinion, there are two different things going on here. 1) Knowing who is winning and 2) Knowing why they are winning.

Hmmm, I'm not sure if I'm interpreting what you are saying correctly, but I thought I'd throw my 2c in anyways...

An extreme, made-up example:
Let's say it is shown that, in rated games of players rated 1700 and above where gold got a rabbit on the 7th rank, 90% of those games were won by gold.

If cause and effect were not taken into account, then it would be easy to fall into the trap of programming a bot to try really hard to get a rabbit on the 7th, or at least put its rabbit on the 7th given an opportunity. It may do this even at the expense of giving up its horse, as that has a lower % win according to the database.

This would be a serious weakness for the bot, and furthermore, the evaluation (as a result of the 7th rank analysis) would fail in 1) Knowing who is winning and 2) Knowing why they are winning.

Title: Re: More Material Analysis
Post by IdahoEv on Apr 28th, 2006, 6:43pm

Fritz's logic is sound, and those effects could account for the difference seen, if they occur with sufficient frequency to mask the underlying, hypothetically larger value of a cat. For Fritz's reasoning to be the only reason that cats are undervalued relative to rabbits in the DB (assuming their natural 'worth' is greater than the 1st rabbit), a very large fraction of initial rabbit captures would have to be as a result of sacrificing a rabbit to save a cat in one of Fritz's scenarios ... without any large fraction of rabbits being sacrificed to save dogs or horses.

(And no, you're not being pedantic in the slightest. :-)

However, parallel scenarios would exist for sacrificing rabbits in the rescue of dogs and horses. While they might not be sufficient to pull the piece's value (in the P(win|capture) sense) down below that of the rabbit, it should reduce it somewhat. Yet while P(win|captureC) is 55%, P(win|captureD) is 66%, above rabbits and actually very close to horses.

So there's an 11% gap in eventual win probability between the initial capture of cats and dogs, and that implies to me that something is going on beyond the analysis done so far.

I'll try to compile some other examples that will possibly help; RR vs CR and so forth.

Title: Re: More Material Analysis
Post by 99of9 on Apr 28th, 2006, 7:09pm

on 04/28/06 at 13:01:31, IdahoEv wrote:

* An initial cat capture by silver is *more* common (739 instances) (112218-112228) but leads to a silver win only 51% of the time!

This could be due to people baiting bomb.

Title: Re: More Material Analysis
Post by IdahoEv on Apr 28th, 2006, 7:13pm

on 04/28/06 at 14:20:54, chessandgo wrote:

How do you intend to interpolate a function from your data, Idaho?

I'm not sure yet, it's a lot of degrees of freedom.

A simple neural net/perceptron would have no trouble mapping the 12 variables to a %age, but because the data are sparse it would overfit badly.

I'd like to come up with a simple function in Tn, Gn for each row, using a single constant An, then sum them over all levels. It might include one or more of:
T(n+1 .. 6) (total pieces below this level)
G(n+1 .. 6) (gold advantage below this level)
T(all) total pieces alive
G(all) gold advantage in total pieces

But in any case, I'd like it to be the same for all non-rabbit rows, differing only by a constant An for each row. Then I only need to fit six total parameters, making it of similar complexity to FAME.

Trouble is, I can't think of an obvious form for the function that will be easy to fit and fast to compute. But I haven't spent much time on it yet, either.
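
Purely to illustrate the "one constant per level, summed over all levels" shape, here is about the simplest member of that family: a weighted sum of the per-level gold advantages G1..G6 pushed through a logistic curve. Neither the logistic link nor the weights below come from the thread; the weights are placeholders, not fitted values, and the function names are invented.

```python
import math


def gold_advantage(state):
    """Per-level gold-minus-silver counts for an 'xxxxxx-yyyyyy' state."""
    gold, silver = state.split("-")
    return [int(g) - int(s) for g, s in zip(gold, silver)]


def win_probability(state, weights):
    """Logistic of the weighted sum of per-level gold advantages."""
    score = sum(a * g for a, g in zip(weights, gold_advantage(state)))
    return 1.0 / (1.0 + math.exp(-score))


# Placeholder constants A1..A6, chosen arbitrarily for the demo.
A = [3.0, 2.0, 1.5, 1.0, 0.5, 0.4]
print(win_probability("112228-112228", A))  # 0.5
```

A form like this is symmetric by construction (mirroring the state gives one minus the probability), but it ignores the T terms and the "pieces below this level" terms listed above, which is exactly the interaction it would need to improve on a plain point count.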

Title: Re: More Material Analysis
Post by chessandgo on Apr 28th, 2006, 8:50pm

on 04/28/06 at 17:58:05, Fritzlein wrote:

abstinence causing a higher likelihood of dying. We have to be careful what the correlation means.

Another conclusion that might be drawn is: hey buddies, abstinence is bad for your health as well as your arimaa strength! Don't forget your sexual life!
::)

Title: Re: More Material Analysis
Post by IdahoEv on Apr 28th, 2006, 9:12pm

on 04/28/06 at 20:50:42, chessandgo wrote:

Don't forget your sexual life!

This is something one is capable of "forgetting"?

Title: Re: More Material Analysis
Post by clauchau on May 1st, 2006, 9:29am

Since some unwanted aspects are biasing statistics on materials in recorded games, how about sampling random board positions among all possible positions having given material states?

At first, positions would get a score like say +1000, -1000 or 0, according to whether they definitely are won, lost, or neither. The material states would average over them.

Then, positions would get the previous scores after having completed a 2-ply minmaxing search. The material states would get updated average scores.

And we repeat that step somehow.
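
The averaging step of that idea could be organised as below. This is only a skeleton: `sample_position` and `score_position` are hypothetical hooks that a real Arimaa engine would have to supply (random placement of the given material, and either the immediate won/lost/neither score or the result of the 2-ply search), and the function name is made up as well.

```python
import random


def average_score(material, sample_position, score_position, samples=1000, seed=0):
    """Average a position score over random positions with the given material.

    `sample_position(material, rng)` and `score_position(position)` are
    hypothetical hooks that a real Arimaa engine would have to supply.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        total += score_position(sample_position(material, rng))
    return total / samples
```

Repeating the step would mean calling this again with a `score_position` that searches two plies deeper and falls back on the previous round's material averages at the leaves.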