Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Bot Development >> (no) absolute score values for pieces?
(Message started by: clauchau on Aug 27th, 2003, 2:42pm)

Title: (no) absolute score values for pieces?
Post by clauchau on Aug 27th, 2003, 2:42pm
If we pursue the chess tradition and evaluate positions by summing up individual scores assigned to every piece on the board, weighted according to positional factors, should we look for absolute values like rabbits are worth 1, cats 1.8, ..., camels 7, elephants 10?

It can't be that simple, because trading the two camels is exactly the same as trading the two elephants, for example: the Gold and Silver players are left with the same power. Absolute static scores would mislead a bot into favoring one trade over the other and overlooking some other advantage.

I vote for a non-materialistic approach! Any taker?

Claude

Title: Re: (no) absolute score values for pieces?
Post by clauchau on Aug 28th, 2003, 4:26am
I wrote:

Quote:
trading the two camels is exactly the same as trading the two elephants


Hmm, well, this is actually not a valid argument against individual static scores for the animals, I'm sorry. Other things being equal, the difference of total scores would be exactly the same and wouldn't show any unwanted artifact as I first suggested.

Claude

Title: Re: (no) absolute score values for pieces?
Post by clauchau on Aug 28th, 2003, 4:56am

Quote:
rabbits are worth 1, cats 1.8, ..., camels 7, elephants 10?


Aha, a valid case against those values is when your opponent is missing two adjacent species, for example cats and dogs. Whether you are left with one or the other of them doesn't make any difference then. A cat and a dog are worth the same in the absence of opposing animals of the same rank.

A naive materialistic bot would think there is a difference.

Claude

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Aug 29th, 2003, 11:22am
I think rabbits are an interesting case in valuation.  When you have 8 rabbits, losing one is no big deal, but when you have only one, then it is often the most valuable piece on the board!

Even valuing the rabbit's position is extremely difficult, because it really depends on what's in front of it, not on how far up the board it has made it so far.

PS nice forum omar  ;)

Title: The 99 System
Post by 99of9 on Aug 29th, 2003, 12:36pm
This scoring system shall forever hereafter be referred to as The 99 System :)

Piece Value
Elephant 13
Camel 8
Horse 5
Dog 3
Cat 2
1st Rabbit 1
2nd Rabbit 2
3rd Rabbit 3
4th Rabbit 4
5th Rabbit 5
6th Rabbit 6
7th Rabbit 7
8th Rabbit 99


Each side in Arimaa starts the game with a total material value of 168. This includes 41 for all the noble animals, 28 for the first 7 rabbits, and 99 for the final rabbit.

This system is especially designed to value material at the start of the game.

For example (the first four of these examples, being equalities, also apply vice versa):

  • I would be happy to start a game without my elephant (13), if you gave up both your camel and a horse (8+5).
  • I would be happy to start a game without my camel (8), if you gave up both a horse and a dog (5+3).
  • I would be happy to start a game without one horse (5), if you gave up both a dog and a cat (3+2).
  • I would be happy to start a game without one dog (3), if you gave up both a cat and a rabbit (2+1).
  • I would be happy to start a game without my elephant (13), if you gave up 5 rabbits (1+2+3+4+5>13). Controversial!!!
  • I will play you in a game where one side has only one elephant plus one rabbit (13+99), and the other has only 3 rabbits (6+7+99). I don't mind who gets what, as long as I get to play gold ;) Even more controversial!!!


  • A brief treatise on the value of rabbits:

The value of the last rabbits to be taken away is higher than the value of the earlier rabbits to be taken away. I am only happy for you to take away my last rabbit if I can take away all of your pieces in return :), hence it is valued at 99, higher than the value of all other pieces put together. If you're still confused about rabbit values, and happen to be an absent-minded scientist, then think of removing electrons from an atom... the first are easy to remove and take little energy, whereas the last ones are very strongly bound and require a very large energy payment.
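The arithmetic behind the table can be checked with a small sketch. The army strings (E, M, H, D, C, R) and the helper name `material_99` are illustrative assumptions, not part of the original post; the values are the ones from the table.

```python
# Noble values from the 99 System table above.
NOBLE_VALUE = {"E": 13, "M": 8, "H": 5, "D": 3, "C": 2}
# RABBIT_VALUE[k - 1] is the value of the k-th rabbit to be lost (k = 1..8).
RABBIT_VALUE = [1, 2, 3, 4, 5, 6, 7, 99]

def material_99(army: str) -> int:
    """Total 99 System material for an army such as 'EMHHDDCCRRRRRRRR'.

    Assumes at most 8 rabbits, as in a normal game.
    """
    nobles = sum(NOBLE_VALUE[c] for c in army if c != "R")
    rabbits_left = army.count("R")
    # The rabbits still on the board are the ones that would be lost last,
    # so they take the values at the high end of the list.
    rabbits = sum(RABBIT_VALUE[8 - rabbits_left:])
    return nobles + rabbits

START = "EMHHDDCC" + "R" * 8
print(material_99(START))                     # 168
print(material_99("ER"), material_99("RRR"))  # 112 112
```

Under this reading, an army without its elephant scores 155, two more than an army that has given up five rabbits (153), which matches the "controversial" example.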

Title: Re: (no) absolute score values for pieces?
Post by leo on Aug 29th, 2003, 11:48pm
Having only two rabbits left, I'd tend to value both at least 30. Is it too much? If so, why?

Title: Re: (no) absolute score values for pieces?
Post by leo on Aug 30th, 2003, 12:00am
What's a camel worth when frozen or blocked or on trapping prevention? Sure, if we can discern values for the piece types, they're just one factor in the evaluation of the actions.

I like to picture the whole of the pieces of one player as a protoplasm whose shape and inner tensions determine its health and strength. But don't ask me to put that into code yet  :-/

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Aug 30th, 2003, 9:13am

Quote:
What's a camel worth when frozen or blocked or on trapping prevention?


That's like asking "What's a queen worth when the king's in checkmate?".

We're just talking about valuing material here.  Valuing position is another very important question, but it's certainly a different question.


Quote:
Having only two rabbits left, I'd tend to value both at least 30.


Well the last is clearly worth more than the second last, since it determines whether you can even possibly win the game.  I believe the second last is not worth 30.  For example, if you think it is worth more than an elephant plus a camel plus a horse plus a dog (13+8+5+3) ... I'll play you where I have all those pieces and one rabbit, and you just have two rabbits :).

Title: Re: (no) absolute score values for pieces?
Post by clauchau on Aug 30th, 2003, 6:06pm
As you suggested 99of9, those values may weigh little compared to other factors after a few moves have been played.

But they are real and interesting statistics to compile and may help design a handicap scheme.

I'll have fun making experiments about them when/if my non-materialistic Quantum Leapfrog bot is finished.

Claude

Title: Re: The 99 System
Post by clauchau on Aug 30th, 2003, 6:37pm

on 08/29/03 at 12:36:53, 99of9 wrote:
The 99 System: Elephant 13, Camel 8, Horse 5, Dog 3, Cat 2 [...]


The values of rabbits change according to how many are left and I think you also need similar variable values for the stronger animals. For example consider a game starting with

1 Elephant + 8 Rabbits vs 1 Camel + 1 Horse + 8 Rabbits

which is fair according to your system (140 vs 140). It would exactly be the same as a game starting with

1 Elephant + 8 Rabbits vs 1 Dog + 1 Cat + 8 Rabbits

but your system now scores this as unfair (140 vs 132).

Claude
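One way to see why the two starting setups are the same game is to relabel every species by its relative rank among the species actually present. The sketch below does that; the helper name `relative_form` and the army-string notation are assumptions made for illustration only.

```python
ORDER = "CDHME"  # noble species from weakest to strongest

def relative_form(gold: str, silver: str):
    """Describe both armies using relative ranks instead of species names."""
    # Species present on either side, weakest first.
    present = [sp for sp in ORDER if sp in gold + silver]
    # The weakest species present becomes rank 1, the next one rank 2, ...
    relabel = {sp: i + 1 for i, sp in enumerate(present)}

    def canon(army):
        nobles = sorted((relabel[c] for c in army if c != "R"), reverse=True)
        return tuple(nobles), army.count("R")

    return canon(gold), canon(silver)

game_1 = relative_form("E" + "R" * 8, "MH" + "R" * 8)
game_2 = relative_form("E" + "R" * 8, "DC" + "R" * 8)
print(game_1 == game_2)  # True
```

Both setups reduce to "one top-ranked piece plus 8 rabbits" against "one second-ranked and one third-ranked piece plus 8 rabbits", so any static table that scores them differently is picking up an artifact.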

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Aug 30th, 2003, 6:58pm

Quote:
... which is fair according to your system  ...
... but your system now scores this as unfair ...


Your quantum leapfrog will understand this, it is in a quantum state which is a mixture of fairness and unfairness.

But seriously, of course you are right, this is only a first approximation.

If you do want to correct it, you need to use the 99 shuffle up process... another exceedingly brilliant but as yet unpatented invention.  It's a little hard to explain though, so I might leave it for another post.  Oh, that and I'm also not quite sure yet whether it should be called shuffle up or shuffle down...

Title: Re: (no) absolute score values for pieces?
Post by leo on Aug 30th, 2003, 8:57pm

on 08/30/03 at 09:13:59, 99of9 wrote:
We're just talking about valuing material here. Valuing position is another very important question, but it's certainly a different question.


Sorry, I'm obsessed with the "action potential" of the pieces and I'm not familiar with traditional piece value.


Quote:
I believe the second last is not worth 30. For example, if you think it is worth more than an elephant plus a camel plus a horse plus a dog (13+8+5+3) ... I'll play you where I have all those pieces and one rabbit, and you just have two rabbits :).


I see what you mean. Thanks for your reply.

Title: Re: (no) absolute score values for pieces?
Post by fotland on Sep 1st, 2003, 4:24pm
Bomb has fixed values for all pieces except rabbits.  When there are fewer on the board, they are worth more.  Not just because when the last one is gone there is no way to win.  Rabbits are essential for blocking forward progress of other rabbits, so even an 8 to 4 excess of rabbits is a huge advantage.  At some point there just aren't enough pieces left to block the goal.

My values are almost the same as 99of9:

1, 2.5, 3, 5, 9, 13.

I agree that fixed values are not correct, but in actual games there is not so much opportunity for strange trades, so it doesn't make much difference.

In any case, just evaluating material will give a very weak player.

David

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Sep 2nd, 2003, 11:36am

Quote:
...  Not just because when the last one is gone there is no way to win.  Rabbits are essential for blocking forward progress of other rabbits, so even an 8 to 4 excess of rabbits is a huge advantage.  At some point there just aren't enough pieces left to block the goal. ...


This is true, but of course rabbits are no better at this than any other piece.  In fact they are often worse.  So really all pieces should increase in value with decreasing density.

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Sep 2nd, 2003, 11:38am

on 09/01/03 at 16:24:53, fotland wrote:
My values are almost the same as 99of9:

1, 2.5, 3, 5, 9, 13.


Can I sue?  (no claiming bankruptcy when bomb thrashes humanity come February) :)

99

Title: Re: (no) absolute score values for pieces?
Post by gern on Sep 7th, 2003, 10:31am
In Bot Occam, the values of the pieces vary depending on what is on the board. When a piece type goes away, for instance, I revalue every piece on the board as if that piece type had never been part of the game.

For example, if the dogs go away, there should not be a huge gap between the value of the cats and the horses. It's as if the horses now become DOGS. In Arimaa the values of the pieces are relative to each other, not like in chess, where the value is based more on the power of their moves.

Don
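The revaluation described for Bot Occam might look roughly like the sketch below. This is a guess at the idea, not Occam's actual code: the helper name `revalue` is hypothetical, "goes away" is assumed to mean that neither side has that species any more, and the value ladder simply borrows the 99 System noble values from earlier in the thread.

```python
ORDER = "CDHME"            # noble species from weakest to strongest
LADDER = [2, 3, 5, 8, 13]  # illustrative values only (99 System nobles)

def revalue(gold: str, silver: str) -> dict:
    """Assign ladder values by relative rank among the species still on the board."""
    present = [sp for sp in ORDER if sp in gold + silver]
    # The i-th weakest surviving species takes the i-th value on the ladder,
    # so a missing species leaves no gap behind.
    return {sp: LADDER[i] for i, sp in enumerate(present)}

print(revalue("EMHHDDCC", "EMHHDDCC"))  # full set: the plain ladder
print(revalue("EMHHCC", "EMHHCC"))      # dogs gone: horses take the dog value
```

Anchoring the ladder at the bottom follows the "horses now become DOGS" remark; whether the stronger pieces should shift down as well is a design choice the post leaves open.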

Title: Re: (no) absolute score values for pieces?
Post by clauchau on Oct 3rd, 2003, 5:50pm
By the way Camels are overrated  :)

In the games where I traded it, I hardly felt the trade as a loss. The truth is, if you rate it high, you have to defend it and use part of your elephant, or at least keep it far from your opponent's elephant, reducing the value you initially saw in it.

Funny paradox, isn't it - if they are worth a lot then they aren't worth that lot  ;)

Hence I feel better putting values elsewhere in the first place.

Claude

Title: Re: (no) absolute score values for pieces?
Post by fotland on Oct 5th, 2003, 1:16am
Would you trade a camel for two horses, or a horse and a dog?

The way I think of it is that if I trade camel for camel, then the horses become camels, so now I have two instead of one :)

David

Title: Re: (no) absolute score values for pieces?
Post by haizhi on Nov 22nd, 2003, 3:19pm
I have some thoughts about that. Maybe we can use TD learning to set piece values for every situation, since the situations are enumerable: say one side has lost 1 camel and the other has lost 1 horse, just run a game initially set up this way 1000 times.

Title: Re: (no) absolute score values for pieces?
Post by haizhi on Nov 22nd, 2003, 3:37pm
But it only works if you don't plan to put new features into your eval; otherwise there is too much recalculating.

Title: Re: (no) absolute score values for pieces?
Post by fotland on Nov 22nd, 2003, 11:10pm
I thought about trying to learn weights for pieces, but I found that positional values are much higher in Arimaa than in chess programs, so the exact values of the pieces are not very important.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Oct 7th, 2005, 10:59pm
After reading Haizhi's thesis about his bot, I was impressed with how much I don't know about game programming, and how useless my strategic understanding of Arimaa would be in helping to create a stronger bot.  I can sometimes specifically describe what I think the weaknesses of a given bot are, and the bot-bashing hall of fame is testament to how small flaws in bot evaluation can be parlayed into outrageous losses, but increasing human understanding of bot weakness doesn't necessarily suggest any quick fixes.

That said, if there is one tiny area where cheap gains in bot strength seem to be available, it would be in material evaluation.  Material advantage is (and should be) the bedrock of the evaluation function, but static piece values just aren't very good, and Occam's fix, while helpful, doesn't apply in many situations.  Let me add a couple of examples to the many already proposed:

(1) I believe that as the first pieces traded, a camel is equal to a horse and a dog, i.e. EHHDDCCRRRRRRRR = EMHDCCRRRRRRRR.  However, every even trade thereafter makes the camel worth relatively less.  If a pair of horses comes off, then the camel is palpably worse than the horse and dog although Occam can't collapse any piece categories and the 99of9 system acknowledges no difference.  EHDDCCRRRRRRRR > EMDCCRRRRRRRR.

(2) The value of rabbits increases, not just when rabbits disappear (as the 99of9 system has it) but also when pieces disappear.  How does a camel compare to four rabbits?  It is worth more in the opening, i.e.  EMHHDDCCRRRR > EHHDDCCRRRRRRRR, but less if we trade off HDC, weakening goal defense, i.e. EMHDCRRRR < EHDCRRRRRRRR.  Notice that neither Occam nor the 99of9 system can have a different inequality in the former case than in the latter case.

In general, I agree with Clauchau that camels are overvalued, at least by bots.  This is because bots are hand-tuned to value camels approximately correctly in the opening, but stick with this high value even as they get less and less valuable later in the game.


I'll post more ramblings in a bit.

Title: Re: (no) absolute score values for pieces?
Post by PMertens on Oct 8th, 2005, 12:54am
You need to be very careful how you tune this, because humans will learn very fast how to trigger suicidal moves.

That being said, I honestly doubt that a more complex weighting of just the piece values is worth the effort, simply because the effort is so enormous.

The value is not only dependent on the available pieces, but also on the position on the board.

What is the value of a blocked phant  ;)
(see bait and tackle)

What is the value of a camel hiding at home ?

Like I said in the beginning: I doubt it will bring too much advantage ...
but then I should try to come up with a better idea, shouldn't I ?  :-[

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Oct 8th, 2005, 1:49am
Well, certainly positional factors are going to weigh heavily, and be very tricky to evaluate.  Yes, an elephant blockade may be worth more or less than a dog depending on a number of factors.  Yes, a hiding camel is worth less than an active camel.  Etc. Etc.

But that doesn't mean we shouldn't try to evaluate material more accurately.  We are starting to pile up endgame examples where material misevaluation, quite apart from positional factors, has been fatal to Bomb.  For example it declined to trade its camel for a dog plus a rabbit of mine when trading would have given Bomb a clearer material advantage than not trading.

Of course, a better evaluation function is easier to speculate about than to formulate, so maybe you are right that it requires too much effort.  Here's one try that doesn't quite work, but may be of interest:

First line up the pieces (not including rabbits) from strongest to weakest on each side.  Winning the "strongest vs. strongest" is worth 128 points.  Winning "2nd strongest vs. 2nd strongest" is worth 64 points, and so on, with each contest worth half the previous.  After totalling up those points, let the rabbits each be worth 1024 divided by the opponent's goal defense, i.e. 1024/(R + 3P), where R is the number of opposing rabbits and P is the number of opposing pieces.

Here's how some opening trades are evaluated by this system:

M for HD is +21
M for HH is -11
M for HCR is -17
M for RRRR is -66
H for DC is -19
E for RRRRR is +16
E for RRRRRR is -31

Clearly I am over-valuing the rabbits and undervaluing horses.  Maybe the behavior I want not only can't be captured by a simple formula, it can't be captured by a moderately complex formula.  :-(
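For anyone who wants to experiment with this first proposal, here is a rough sketch of one reading of it. The function name, the army-string notation, and the rule that a piece with no opposing piece wins its contest are assumptions; under that reading the sketch reproduces several of the figures above (for example M for HD, M for RRRR, and E for RRRRR) after rounding.

```python
RANK = {"C": 1, "D": 2, "H": 3, "M": 4, "E": 5}

def nobles(army):
    """Ranks of the non-rabbit pieces, strongest first."""
    return sorted((RANK[c] for c in army if c != "R"), reverse=True)

def rabbit_points(rabbits, opp_rabbits, opp_pieces):
    """Each rabbit is worth 1024 / (R + 3P), the opponent's goal defense."""
    defense = opp_rabbits + 3 * opp_pieces
    return rabbits * 1024 / defense if defense else 0.0

def first_proposal(me: str, opp: str) -> float:
    """Score from my point of view; positive means I am ahead."""
    mine, theirs = nobles(me), nobles(opp)
    score = 0.0
    for i in range(max(len(mine), len(theirs))):
        a = mine[i] if i < len(mine) else 0      # 0 = no piece (assumption)
        b = theirs[i] if i < len(theirs) else 0
        weight = 128 / 2 ** i                    # 128, 64, 32, ...
        if a > b:
            score += weight
        elif a < b:
            score -= weight
    score += rabbit_points(me.count("R"), opp.count("R"), len(theirs))
    score -= rabbit_points(opp.count("R"), me.count("R"), len(mine))
    return score

R8 = "R" * 8
# "M for HD": I capture the camel and give up a horse and a dog.
print(round(first_proposal("EMHDCC" + R8, "EHHDDCC" + R8)))      # 21
# "E for RRRRR": I capture the elephant and give up five rabbits.
print(round(first_proposal("EMHHDDCC" + "RRR", "MHHDDCC" + R8)))  # 16
```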

Title: Re: (no) absolute score values for pieces?
Post by jdb on Oct 8th, 2005, 8:44am
In my opinion, this is a very important topic for anyone wanting to create a quality evaluation function.

As PMertens pointed out, if the eval places a large value on positional factors, humans will figure out how to exploit that and suicidal moves will result.

The 2005 version of clueless uses absolute piece values, but the rabbits follow something like the 99of9 system. This approach is not really good enough.

In order to manage the complexity, I was considering using two sets of piece values. One for the opening and one for the endgame. As Fritzlein mentioned, the piece values do depend on the game situation.

It is likely safer to select criteria that can't be undone. For example, changing material values if a horse/camel is hostage would be risky. Using something like "the enemy camel is captured" would be better, since it can't be undone. Anything that can be undone could be exploited by a human quite easily.

Clueless uses these values:

Cat           1600
Dog          1800
Horse       3500
Camel       5500
Elephant 22000

 // CUMULATIVE RABBIT VALUE
 // 1st rabbit is 3000
 // 2nd is 2000
 // 3rd-5th is 1500
 // 6-8th is 1200   Changed to 1400
 // 9th and up also 1200 for test position only, Changed to error condition
 public static final int rabbit_value[] = {
   0, 3000, 5000, 6500, 8000, 9500, 10900, 12300, 13700, 14900, 16100, -1,
   -1, -1,
   -1, -1, -1, -1, -1, -1, -1, -1
 };
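A cumulative table like this is presumably indexed by the number of rabbits a side still has. The sketch below shows that lookup in Python; the function name and army strings are hypothetical, only the table entries for 0 to 8 rabbits are copied, and nothing here is taken from the actual Clueless source.

```python
# Piece values and cumulative rabbit values as quoted in the post above.
PIECE_VALUE = {"C": 1600, "D": 1800, "H": 3500, "M": 5500, "E": 22000}
RABBIT_VALUE = [0, 3000, 5000, 6500, 8000, 9500, 10900, 12300, 13700]

def clueless_material(army: str) -> int:
    """Material for one side: fixed noble values plus a cumulative rabbit total."""
    rabbits = army.count("R")
    if rabbits >= len(RABBIT_VALUE):
        raise ValueError("more than 8 rabbits is not a legal position")
    nobles = sum(PIECE_VALUE[c] for c in army if c != "R")
    return nobles + RABBIT_VALUE[rabbits]

START = "EMHHDDCC" + "R" * 8
print(clueless_material(START))               # 55000
# Marginal cost of losing a rabbit: small with eight left, large for the last.
print(RABBIT_VALUE[8] - RABBIT_VALUE[7])      # 1400
print(RABBIT_VALUE[1] - RABBIT_VALUE[0])      # 3000
```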

Title: Re: (no) absolute score values for pieces?
Post by PMertens on Oct 8th, 2005, 11:07am
one little idea, that might be slightly off Topic:

Bots usually do not mind exchanging pieces.
Does any bot check whether he leads on material or not ?

If I am behind a dog, then it is usually not smart to trade horses.

That also means, that the values on both sides must be different. (I am not sure if anyone implemented that)

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Oct 8th, 2005, 1:49pm

on 10/08/05 at 11:07:34, PMertens wrote:
one little idea, that might be slightly off Topic:

Bots usually do not mind exchanging pieces.
Does any bot check whether he leads on material or not ?


That's not off-topic at all, and in fact is an essential feature of what I am trying to do with my evaluation function.  Generally if you are behind on material, trading off pieces will make it worse, but also there are material deficits that get better through trading.  For example two rabbits is worth less than a dog at first, but trading off cats and rabbits makes the disadvantage less, and may turn it into an advantage.

I've re-tweaked my proposal, and now it is starting to give semi-reasonable results, although there are still some problems.  I still give great weight to numerical superiority, but now there's a slightly better balance between pieces and rabbits.  (By piece I mean any non-rabbit piece.)   My new plan is:

(1) Line up the pieces from strongest to weakest on both sides.  If one side has fewer pieces, they must contribute rabbits until all the opposing pieces are matched.

(2) Winning the top matchup is worth 128, and each successive matchup is worth 2/3 of the previous.

(3) The leftover rabbits each score 768/(R+3P) where R and P are respectively the number of rabbits and pieces the opponent has left.

Here are some initial trade values:

R free = +30
C free = +51
D free = +68
H free = +116
M free = +201
E free = +329

C for R = +19
C for RR = -13
D for RR = +4
D for CR = -17
MD for MCR = -14
MHD for MHCR = -8
MHHD for MHHCR = +2
H for D = +38
H for DR = +4
H for DC = -24
M for HD = -2
MH for HHD = -30
M for HCR = -13
M for HH = -59
H for RRR = +9
H for RRRR = -24
M for RRRRR = +28
MHDC for HDCRRRRR = +22 :-(
E for MH = -16 :-(

and some endgames:
ERR vs. CCRR = -168
ERRR vs. CCRR = -21
EDRR vs. ECCRR = -94
ERRR vs. ECR = -6
EDRRR vs ECCRR = -2

Well, the formula returned reasonable values for a lot of trades, but the last two trades are wrong, I believe.   The endgame values are a bit dodgy, but not wholly unreasonable given that positional goal threat factors will often overwhelm them.  I have a slight preference for the former side in either of the last two endgames, but my system has them even.  Ah, well, back to the drawing board.

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Oct 9th, 2005, 6:58pm
I'm really glad to see this topic refreshed.  I'll have more to contribute later, but for now, I think I can identify one of Fritz's main problems:

on 10/08/05 at 13:49:22, Fritzlein wrote:
(2) Winning the top matchup is worth 128, and each successive matchup is worth 2/3 of the previous.

I think this outrageously undervalues the topmost matchup.  Losing global domination is devastating, it is worth much more than a 50% bonus on secondary domination.

I think this is why you're unhappy with the elephant sacrifice results.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Oct 9th, 2005, 10:56pm

on 10/09/05 at 18:58:22, 99of9 wrote:
I think I can identify one of Fritz's main problems:
I think this outrageously undervalues the topmost matchup.


You hit the nail on the head.  I'm going to try to run the numbers again with the values of the matchups being something more like

1st: 256
2nd: 256/3
3rd: 256/5
4th: 256/7
etc.

This will also help correct some of the dubious endgame values.  I'll have to run all the numbers again to be sure, but I think that with this improvement, I might prefer this dynamic system to anything that has been published about bots Haizhi, Clueless, Bomb, Occam, and Gnobot.  Nevertheless, I'll hold off naming it until I think it will work.  :-)

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Oct 9th, 2005, 11:45pm

on 10/09/05 at 22:56:40, Fritzlein wrote:
I might prefer this dynamic system to anything that has been published about bots Haizhi, Clueless, Bomb, Occam, and Gnobot.

I think you could well be right, it's certainly the best thinking I've seen published about this.

And thanks for your tipoff that Haizhi's thesis was online... I'm just reading it now.  Congratulations on submitting Haizhi!

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Oct 9th, 2005, 11:50pm

on 10/09/05 at 22:56:40, Fritzlein wrote:
Nevertheless, I'll hold off naming it until I think it will work.  :-)

Your patience shows great wisdom :-).

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Oct 13th, 2005, 9:06am
I looked back over the numbers my system produced, and I have to say I like them reasonably well, except for when one side is missing an elephant.  To fix that, I'll just arbitrarily give the top matchup 128 extra points, so the matchups are worth 256, 85, 57, 38, 25, 17, 11, and 7.
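The eight matchup values listed here follow from the earlier 128-times-2/3 series once the top matchup gets its extra 128 points. A short sketch of that derivation (the function and parameter names are illustrative):

```python
def matchup_weights(n=8, top=128.0, ratio=2 / 3, top_bonus=128.0):
    """Geometric series of matchup values with a bonus on the top matchup."""
    weights = [top * ratio ** i for i in range(n)]
    weights[0] += top_bonus  # the arbitrary extra points for the elephant matchup
    return [round(w) for w in weights]

print(matchup_weights())  # [256, 85, 57, 38, 25, 17, 11, 7]
```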

I still have some quibbles with my evaluations, for example a camel versus five rabbits, but I'm not too worried because (A) such lopsided comparisons rarely arise in practice, and (B) I'm not terribly confident in my intuition of which is better.  Anyway, I think I can be wrong in some peripheral ways and still be better than the other systems out there.

One critical feature I have that nobody else does is a bias for or against "equal" trades (like a horse for a horse) when there is a material imbalance.  This type of decision is important because it comes up all the time: should I capture a piece and allow a capture in return, or should I give up my attack in order to defend?  In fact, the broader issue of judging tradeoffs between attack and defense is one of the most important strategic considerations in Arimaa.

In my system a player who is behind in material will always be penalized for trading, i.e. whoever is behind will be evaluated as further behind after an "equal" trade.

Furthermore, when there is an imbalance where it isn't clear who is ahead or behind (e.g. M for HD) my system does roughly the right thing, rewarding quantity over quality as the board empties out.  I'm proud that a player who has HD for M in my system will be nearly as eager to trade a pair of horses as to win a rabbit outright.  On the other hand, my system also rewards promotion of pieces as higher-ranking pieces disappear, so that a player with H for DR will be eager to trade off camels or even a pair of horses, while averse to trading off cats and rabbits.

My system still has some issues with endgames, but at least avoids the blatant over-valuing of camels and horses to which Bomb is prone.  In my opinion Bomb over-values camels and horses slightly in the opening and heavily in the endgame, while undervaluing cats at all times.  When all else is equal, my system prefers a cat to a rabbit at any phase of the game.

That reminds me to say that one would have to independently heavily penalize the loss of the last rabbit, perhaps making it worth an additional -1000, or minus infinity in games where draws are not allowed.  This seems like a bit of a hack compared to other systems, but I think it is worth being a bit ungraceful to avoid the overvaluation of rabbits relative to pieces present in other systems.  For example the 99of9 system has ERRR way ahead of ECCR, and I think Bomb does too, but in my opinion ERRR is probably losing!  If there is no immediate goal for ERRR, odds are that ECCR will start winning rabbits.  (Is this controversial?  Maybe I'm wrong about this evaluation...)

Well, to summarize, I'm absolutely positive my proposal can be improved upon, but also somewhat optimistic that it is in itself an improvement on previous systems.

It would be interesting to compare the same Arimaa playing engine against itself with the same positional values, but with two different material evaluations.

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Oct 13th, 2005, 9:24am
Are you ready to name it then? :-)

I agree it's looking good.

ECCR vs ERRR ... I'm not sure actually, but I haven't played enough games with one rabbit left to have any experience about this.

Title: Re: (no) absolute score values for pieces?
Post by jdb on Oct 13th, 2005, 11:18am

Quote:
It would be interesting to compare the same Arimaa playing engine against itself with the same positional values, but with two different material evaluations.


Fritzlein, if you want to try out some tests, I can provide you with code to do that.


Quote:
ECCR vs ERRR


I think having one rabbit left is sort of a special case. The endgame ER vs e is drawn (assuming we ignore three fold repetition) if the e can pin the R on the edge of the board. So it might be worth considering holding on to at least two rabbits, so the defender has to worry about two things at once. This material imbalance seems interesting, so I'll run some tests and post the results later.



Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Oct 13th, 2005, 12:43pm

on 10/13/05 at 09:24:32, 99of9 wrote:
Are you ready to name it then? :-)


Let's call it the FAME system, for Fritz's Arimaa Material Evaluator.

But what will I call it when I tweak the constants again?  Maybe FAME can refer to my latest tweaks, and I'll only upgrade the name if I make major changes.


Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Oct 13th, 2005, 12:50pm

on 10/13/05 at 11:18:57, jdb wrote:
Fritzlein, if you want to try out some tests, I can provide you with code to do that.


You mean using Omar's offline match script? I'm all over that. I'll send you a separate e-mail to get the ball rolling.  It seems there is a serious issue in integrating material evaluation with positional factors, but maybe you were thinking of stripping down evaluation to only material?


Quote:
I think having one rabbit left is sort of a special case.


Sigh. Probably you are right, and I need a fudge factor both for zero rabbits and only one rabbit left. You engineers don't mind having a few arbitrary constants, but it bugs the heck out of us mathematicians. :)

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Oct 13th, 2005, 12:58pm

on 10/13/05 at 09:24:32, 99of9 wrote:
ECCR vs ERRR


I'm pretty sure that ECCR vs E is won.  In fact, if I recall correctly, Adanac found the win over the board in his game of record-setting length.  ECCR vs. ERRR is therefore probably very unstable.  Barring a quick goal, the weaker side will lose its rabbits and then the game.

On the other hand I seem to recall that ECR vs E is drawn, so the weak side might try to grab a cat while letting go of its own rabbits, as long as the enemy rabbit doesn't goal in the meantime.

Title: Re: (no) absolute score values for pieces?
Post by nbarriga on Oct 13th, 2005, 3:26pm

on 10/13/05 at 09:06:30, Fritzlein wrote:
It would be interesting to compare the same Arimaa playing engine against itself with the same positional values, but with two different material evaluations.


I just programmed your proposed eval function, but I encountered some problems. My evaluation function is composed of a material and a positional section. I changed the material section, and I'm positive that it is better than the old one, but it will be hard to re-balance the scores between the material and positional sections.

I'm running some games now at Blitz and Fast speeds, and I will post the results as soon as I have them.

Title: Re: (no) absolute score values for pieces?
Post by nbarriga on Oct 13th, 2005, 9:03pm
The balancing between positional and material is more difficult than I thought, so I will not be able to publish results yet. The results I have so far are very bad for the newly proposed eval function.

By the way, my current eval function is
R=100
C=200
D=300
H=500
M=800
E=2000

If the opponent has lost a complete category, the next category of my pieces is worth the average between that category and the one lost.

If I'm not making myself clear, it is because I'm not a native English speaker. An example: if the enemy has lost both his dogs, the values for my pieces are:
R=150
C=250
D=300
H=500
M=800
E=2000

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Oct 13th, 2005, 9:55pm

on 10/13/05 at 21:03:04, nbarriga wrote:
By the way, my current eval function is
R=100
C=200
D=300
H=500
M=800
E=2000


It's interesting how similar your creation is to the one I suggested a few years ago (at the start of this thread):


Quote:
Elephant      13
Camel      8
Horse      5
Dog      3
Cat      2
1st Rabbit      1


I think you value the elephant better.  But you might like to look back and see what David and I wrote about rabbits - I still think that's quite important.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Oct 14th, 2005, 5:59am

on 10/13/05 at 21:03:04, nbarriga wrote:
The balancing between positional and material is more difficult than I thought, so I will not be able to publish results yet. The results I have so far are very bad for the newly proposed eval function.


Thanks for testing it out.  I wonder if the FAME system is inaccurate, or if the problem is something else, like balancing it with positional factors.   I guess it wouldn't be too surprising to see a drop in performance if suddenly all material was undervalued (or overvalued) relative to positional factors.

And I can imagine it is even more complex than that.  As pieces are traded, the relative value of the camel goes down in FAME, so does that mean the value of a camel hostage should go down?  Or if the value of a dog goes up due to trades, should the value of a dog hostage go up too?

I am flattered that you considered FAME worth trying out, and it's too bad if it is of no benefit.  I do expect that positional factors are far more important than material evaluation, so I'm not too surprised that FAME doesn't help, but I would be disappointed if it couldn't be made to work at least as well as the fixed constants you are using.  Ah, well, so is life.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Dec 12th, 2005, 4:40pm
OK, I've done further tweaking on my material evaluation function. I realized that I was fairly valuing strong pieces versus weak pieces, but was overvaluing pieces (non-rabbits) relative to rabbits. Here's my improved system:

(1) Line up the pieces from strongest to weakest on both sides. If one side has fewer pieces, they must contribute rabbits until all the opposing pieces are matched. The number of matchups will thus be the number of pieces in the army with more pieces. (There will be eight matchups at first, and fewer as pieces are traded.) Any rabbits not involved in the matchups are left over.

(2) The values of winning the matchups are, from top to bottom, 256, 85, 57, 38, 25, 17, 11, and 7.

(3) The leftover rabbits on each side each score 600/(R+2P) where R and P are respectively the number of rabbits and pieces the opponent has left. (This formula is the bit that changed in order to value rabbits more relative to pieces.)
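
The three steps above can be sketched in Python as follows. This is only an illustration, not the code behind any bot: the army strings (one letter per remaining piece), the names `fame`, `split_army` and `line_up`, and the choice to let the leftover-rabbit count go negative when a side runs out of rabbits are all assumptions added here, since the steps above do not cover that case. Both armies are assumed to be non-empty.

```python
MATCHUP_VALUES = [256, 85, 57, 38, 25, 17, 11, 7]
RANK = {"R": 0, "C": 1, "D": 2, "H": 3, "M": 4, "E": 5}


def split_army(army):
    """Return (non-rabbit ranks sorted strongest first, rabbit count)."""
    pieces = sorted((RANK[ch] for ch in army if ch != "R"), reverse=True)
    return pieces, army.count("R")


def fame(gold, silver):
    """FAME material score from Gold's point of view (positive favours Gold)."""
    g_pieces, g_rabbits = split_army(gold)
    s_pieces, s_rabbits = split_army(silver)
    n = max(len(g_pieces), len(s_pieces))  # step (1): number of matchups

    def line_up(pieces, rabbits):
        # Fill empty slots with rabbits (rank 0); -1 marks a slot with nothing.
        need = n - len(pieces)
        used = min(need, rabbits)
        slots = pieces + [0] * used + [-1] * (need - used)
        return slots, rabbits - need  # leftover may go negative (assumption)

    g_slots, g_left = line_up(g_pieces, g_rabbits)
    s_slots, s_left = line_up(s_pieces, s_rabbits)

    score = 0.0
    # Step (2): the stronger piece in each matchup wins that matchup's value.
    for value, g, s in zip(MATCHUP_VALUES, g_slots, s_slots):
        if g > s:
            score += value
        elif s > g:
            score -= value
    # Step (3): each leftover rabbit scores 600 / (R + 2P) of the opponent.
    score += g_left * 600.0 / (s_rabbits + 2 * len(s_pieces))
    score -= s_left * 600.0 / (g_rabbits + 2 * len(g_pieces))
    return score


FULL = "EMHHDDCC" + "R" * 8
print(round(fame(FULL, "EMHHDDCC" + "R" * 7)))  # Gold is a rabbit up
print(round(fame(FULL, "EMHHDDC" + "R" * 8)))   # Gold is a cat up
```

Under these assumptions the sketch gives roughly 34 for a free rabbit and 50 for a free cat from the opening position.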

Here are some initial trade values:

R free = +34
C free = +50
D free = +67
H free = +105
M free = +190
E free = +446

(This might suggest static piece values of R=1, C=1.5, D=2, H=3.1, M=5.6, E=13.2, but those static values from the opening would somewhat overvalue the big pieces in the mid-game and hugely overvalue the big pieces in the endgame.)

C for R = +15
C for RR = -20
D for RR = -3
D for CR = -21
MD for MCR = -17
MHD for MHCR = -11
MHHD for MHHCR = 0
H for D = +38
H for DR = 0
H for DC = -22
M for HD = 0
MH for HHD = -27
M for HRR = +9
M for HH = -57
H for RRR = -2
M for RRRRR = +8
MHDC for HDCRRRRR = -6
E for MH = +114
E for MHH = -80

and some endgames:
ER vs. CCR = -29
ERR vs. CCR = +141
ERR vs. CCRR = -29
ERRR vs. ECR = +35
EDR vs. ECCR = -92
EDRR vs. ECCR = +8
EDRRR vs. ECCRR = -1

These endgame numbers are much less dodgy than the previous version. Two rabbits are now correctly valued at more than a cat at all times. (Nevertheless FAME values a cat higher than a single rabbit almost all the time, which is a position I maintain in defiance of popular opinion). The new endgame valuations may not be perfect, but now they are at least in the ballpark. Meanwhile the good features from before have been retained, including:

*If there has been a trade of M for HD, the side with the camel will be averse to trading horses, while the side with HD will be eager to trade horses. The value of the superficially equal horse trade is actually near the value of losing (winning) a rabbit outright.

*In general the side with more numerous pieces would like to trade while the side with stronger pieces would like to avoid trades.  However, the D for CR trade, which is initially poor, gets progressively better if M, H, and H are traded, which promotes the dog more than the cat.

*As the board empties out, the relative value of rabbits goes up.

*The value of a weak piece rises with every stronger opposing piece that disappears, so that a cat in the endgame may be worth what a horse was in the opening. Of course, this is offset by rabbits also becoming much more valuable, so the primary effect is that any remaining strong pieces go down in relative value.

JDB, the new trade values don't differ much from the old ones in the opening (only in the endgame), so if you drop the new constants into Clueless, you shouldn't have to retune all the positional factors to match.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Dec 12th, 2005, 5:20pm
Incidentally, after 26 moves in my WC game against Robinson, I had taken HDDC and he had taken MHD.  This latest version of FAME values that at +58 for Robinson, and I agree.  Demote his horse to a dog, however, and we are dead even, i.e. taking HHDC would have balanced MHD.

Title: Re: (no) absolute score values for pieces?
Post by Adanac on Dec 13th, 2005, 7:20pm
I was wondering whether any bots use rabbit composition percentages for the endgame?  That's the intuitive system that I use but I may be the only player that does.  For example:

1. EMHHDCR  (14% rabbits)
2. EHDRRRR  (57% rabbits)
3. ERRRRRR (86% rabbits)

Each army has 7 pieces but they range from Muscular -> Balanced -> Lots of Goal Threats

I happen to believe that army #2 is better than either #1 or #3 because it has the best Rabbit/Non-Rabbit ratio while possessing a bit of strength with the horse and dog (I wouldn't like it at all with ECCRRRR, though).  I adhere to this system more passionately in the endgame, but I also use it in the opening, to some degree.  For example, at the beginning of the game, if each side traps one rabbit and then, for the second exchange, the gold cat and a silver rabbit are trapped, I believe that gold has the much better army.  For starters it's more balanced (50% rabbits versus 43%) and secondly I'm a big, big fan of advanced rabbits and it doesn't require many piece exchanges before I value rabbits more highly than dogs, never mind cats.  However, I find that bots (and humans) have much different opinions of relative piece value than I do, so it wouldn't surprise me if no one else uses or agrees with this philosophy!  I once suggested a similar idea to Arimanator and he thought I was nuts (though I did suggest that rabbits were more valuable than cats on the FIRST trade, not the second as in the above example).
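
The rabbit percentages quoted above are simple ratios. A minimal sketch, assuming armies are written as strings with one letter per piece (the function name `rabbit_percentage` is made up for this illustration):

```python
def rabbit_percentage(army):
    """Share of an army (a string such as "EHDRRRR") that is rabbits, in percent."""
    return 100.0 * army.count("R") / len(army)


for army in ("EMHHDCR", "EHDRRRR", "ERRRRRR"):
    print(army, round(rabbit_percentage(army)))

# The opening example: Gold has lost a rabbit and a cat, Silver two rabbits.
gold = "EMHHDDC" + "R" * 7
silver = "EMHHDDCC" + "R" * 6
print(round(rabbit_percentage(gold)), round(rabbit_percentage(silver)))
```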

Title: Re: (no) absolute score values for pieces?
Post by Ryan_Cable on Dec 14th, 2005, 3:01am
FAME is by far the best material evaluator I have seen.  It is the first algorithm that comes anywhere close to being as good as HOTFLAME (Human On The FLy Arimaa Material Evaluation).  Thus, I will point out all of the bugs I see in hopes you can make it even better.

FAME ignores the non-matchup interactions between pieces:

EHCR vs. EMDR = ECCR vs. EMDR = -142

But the former is clearly better than the latter.

FAME has problems when one side has no Rs:

E vs. CR = E vs. RR = -44

But E vs. CR is usually an infinite move draw (E freezes R, then C must dance around to prevent immobilization), while E vs. RR is usually lost (E freezes R, then R goals).

ER vs. EC = -85
EHHDDCCR vs. EMHHDDCC = ERRRRRRR vs. EMHHDDCC = -223

But all situations of this type are >=0.

Strictly speaking, you have not defined the score for situations where one side has more pieces than the other has pieces plus Rs.  The obvious solution is to specify that piece vs. NULL counts as a winning matchup.  This would be fine when both sides have Rs, but it would give

EDR vs. EHC = ER vs. EHC = -142

Which is basically a combination of the first two problems.

Adanac, I agree with Arimanator, you are nuts!  If you really are passionate that 2 is better than 1, send me a postal invite.  I will even give you the first move after we finish making the necessary sacrifices.  FAME gives:

EMHHDCR vs. EHDRRRR = EMHHDCR vs. ERRRRRR = 235.8
EHDRRRR vs. ERRRRRR = 202

I think this is probably too high for the 1 vs. 2 case and probably too low for the 1 vs. 3 case.  But I think EMHHDCR vs. EHDRRRR is enough advantage for me to be able to beat you even if you are the true World Champion.  However, I would much rather have (in descending order of preference):

EMHHDRR vs. EHDRRRR = EMHHCRR vs. EHDRRRR = EMHDCRR vs. EHDRRRR = 225

And I would prefer EMHHRRR vs. EHDRRRR = 190.9 to at least some of those.  I think FAME undervalues Rs vs. pieces, when there are many pieces and few Rs.


on 12/13/05 at 19:20:24, Adanac wrote:
For example, at the beginning of the game, if each side traps one rabbit and then, for the second exchange, the gold cat and a silver rabbit are trapped, I believe that gold has the much better army.  For starters it's more balanced (50% rabbits versus 43%) and secondly I'm a big, big fan of advanced rabbits and it doesn't require many piece exchanges before I value rabbits more highly than dogs, never mind cats.

There are three places to attempt a goal threat: left flank, right flank, and center.  Goal threats in the center are usually weak, and it is rare for one player to have more than 2 goal threats at a time.  In goal defense, a C is usually worth >=2R.  Thus, I would always be materially happy to trade a R for a piece, when I have >=3R.  However, Rs are more affected by positional factors than any other piece.  A R that is presenting a latent goal threat can be worth >=C, and a R that is actually threatening goal is often worth >=D.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Dec 17th, 2005, 2:11pm

on 12/14/05 at 03:01:02, Ryan_Cable wrote:
HOTFLAME (Human On The FLy Arimaa Material Evaluation).

:-) Nice acronym :-)


Quote:
FAME ignores the non-matchup interactions between pieces:

EHCR vs. EMDR = ECCR vs. EMDR = -142

Excellent point.  The material eval should definitely differentiate between different situations.  One significant motivation for FAME was that equivalent situations be evaluated equivalently, e.g. EMR vs. ECCR = EDR vs. ECCR, but I appear to have flattened it too much.

Actually, my very first idea about dynamic piece valuation was to value a piece based on what proportion of enemy pieces it was better than, equal to, or inferior to.   I could never get it to work out in a reasonable fashion.  I eventually forgot about it when FAME started to give some reasonable numbers, but I think I may have to revive the idea in future.



Quote:
FAME has problems when one side has no Rs:

My thought was that having ridiculous evals in the case of zero rabbits was a minor glitch, since that case probably has to be handled separately anyway.  Also, whether having no rabbits is a possible draw or an automatic loss depends on the tournament rules, so that needs special handling as well.  I recognize this as a bug in FAME, but perhaps not of the highest priority.


Quote:
EHHDDCCR vs. EMHHDDCC = ERRRRRRR vs. EMHHDDCC = -223

Not quite correct, because in the former case the side with the rabbit ties on some of the matchups, but the point is taken that a side with no rabbits can never be considered to be winning.


Quote:
Strictly speaking, you have not defined the score for situations where one side has more pieces than the other has pieces plus Rs.

Sorry I didn't specify this in the post.  In chat discussion with Jeff, I suggested a fine workaround would be to have the weaker side have negative leftover rabbits.  Rather than simply getting a zero rabbit bonus, they get a negative rabbit bonus.  This isn't a supremely accurate way of measuring how much the stronger side is winning by, but it does retain an appropriate incentive towards capture when one side is far ahead in material.   For example EDDCCRR vs EMRRR < EDDCCRR vs EMR, because in the former case the weaker side has zero leftover rabbits, whereas in the latter case the weaker side has negative two leftover rabbits.
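
The leftover count in that example can be worked out with a small sketch. The function name and its arguments are invented for this illustration; it only counts rabbits and does not compute a full FAME score.

```python
def leftover_rabbits(own_pieces, own_rabbits, enemy_pieces):
    """Rabbits left after filling matchup slots; negative if the side runs out."""
    slots_to_fill = max(0, enemy_pieces - own_pieces)
    return own_rabbits - slots_to_fill


# Silver's EM plus rabbits lined up against Gold's five pieces EDDCC.
print(leftover_rabbits(2, 3, 5))  # EMRRR: 0 leftover rabbits
print(leftover_rabbits(2, 1, 5))  # EMR: -2 leftover rabbits
```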


Quote:
But I think EMHHDCR vs. EHDRRRR is enough advantage for me to be able to beat you even if you are the true World Champion.

I agree with Ryan, and I would like to see this played out postally between the two of you.  Are you game, Adanac?  I don't think you will prefer the latter army any more after you have had to try to play with it for a while.  Let's have all the pieces start on the back row on each side (following appropriate initial suicides) in the order RD*ERRHR for Adanac as Gold, opposite CM*HEDHR for Ryan as Silver.  I don't think those piece oppositions give either player an advantage, apart from the significant material advantage for Ryan with the stronger pieces.

Perhaps FAME does undervalue rabbits relative to pieces in some endgames, but I believe popular opinion rather overvalues rabbits relative to pieces, and I would like to see at least Adanac's extreme proposal put to the test.

Title: Re: (no) absolute score values for pieces?
Post by Adanac on Dec 18th, 2005, 11:12am

on 12/17/05 at 14:11:53, Fritzlein wrote:
I agree with Ryan, and I would like to see this played out postally between the two of you. Are you game, Adanac? I don't think you will prefer the latter army any more after you have had to try to play with it for a while. Let's have all the pieces start on the back row on each side (following appropriate initial suicides) in the order RD*ERRHR for Adanac as Gold, opposite CM*HEDHR for Ryan as Silver. I don't think those piece oppositions give either player an advantage, apart from the significant material advantage for Ryan with the stronger pieces.

Perhaps FAME does undervalue rabbits relative to pieces in some endgames, but I believe popular opinion rather overvalues rabbits relative to pieces, and I would like to see at least Adanac's extreme proposal put to the test.


I'd like to try it using Fritzlein's opening setup!  I've struggled mightily with the stronger pieces (but with only 1 rabbit remaining) in the World Championship against PMertens and the Postal Championship against Belbo - it's because of games like those that I've come to highly value rabbits,  but I rarely play endgames with the extra rabbits.  The only game I can think of was my Postal Championship against 99of9 where I sacrificed my Camel for a rabbit and an opportunity to advance rabbits on both wings, rather than save my camel and leave my elephant without mobility.  I won the game but I'm still not 100% convinced that it was necessarily a correct sacrifice.

If I'm wrong about my balanced rabbit theory, it's better to find out sooner rather than later!!

Title: Re: (no) absolute score values for pieces?
Post by Janzert on Jan 22nd, 2006, 4:02am
Since people have been wondering what the fame score was at certain times during the recent tournament games, I put together a page to calculate it.

http://www.janzert.com/fame.html

Janzert

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Jan 22nd, 2006, 6:11am

on 01/22/06 at 04:02:26, Janzert wrote:
Since people have been wondering what the fame score was at certain times during the recent tournament games, I put together a page to calculate it.

http://www.janzert.com/fame.html

Wow, that's very cool.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Jan 22nd, 2006, 11:46am
Thanks Janzert!  I'm going to have that window open from now on when I'm watching any game.

Title: Re: (no) absolute score values for pieces?
Post by omar on Jan 29th, 2006, 7:51pm
Thanks Brian. For now I've linked it on the Downloads page so that we can find it easily.

Title: Re: (no) absolute score values for pieces?
Post by Janzert on Jan 30th, 2006, 6:43pm
Feel free to take and use it in the site itself if you want. I added a notice to the top of the html file releasing it into the public domain. I also moved the css into the page itself so there's no external stylesheet to deal with.

Janzert

Title: Re: (no) absolute score values for pieces?
Post by Janzert on Jan 30th, 2006, 6:50pm
One other thing on this subject. Fritzlein had mentioned possibly making some changes to FAME in order to fix some problems in certain situations.

If changes are made it might be nice to "normalize" the scores so that an initial free rabbit is worth something like 1, 10 or 100.

Janzert

Title: Re: (no) absolute score values for pieces?
Post by Swynndla on Mar 7th, 2006, 10:07pm
Normalized values make a lot of sense to me ... not only are they easy for humans to understand, but if bots used a modular value lookup (and it makes sense to me that they should) then it would be easy to substitute that lookup for FAME, because the positional function (that deals with value increases and decreases based on a piece's position on the board and a piece's position relative to other pieces etc) wouldn't have to be changed.  Am I making any sense at all?

PS - I really like FAME - great stuff Fritzlein.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Mar 8th, 2006, 7:52am
I just realized that the results of the Adanac / Ryan Cable games never got posted to this thread.  Ryan won both games.

http://arimaa.com/arimaa/gameroom/comments.cgi?gid=23643
http://arimaa.com/arimaa/gameroom/comments.cgi?gid=24845

So it may be that FAME undervalues rabbits, but a lone rabbit can still be plenty to win if backed by the more powerful army.

Another point worth noting is that both of these games were very short.  If it ever seems that Arimaa is becoming too long and boring, it's good to know there is a quick fix in the form of starting with fewer pieces.  I'm guessing that if each side removed HDCRRR to begin with (i.e. if we played with only ten pieces), Arimaa would be much shorter and more tactical.  Also defensive play would become hopeless.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Mar 8th, 2006, 9:41am

on 03/07/06 at 22:07:46, Swynndla wrote:
Normalized values make a lot of sense to me ... not only are they easy for humans to understand, but if bots used a modular value lookup (and it makes sense to me that they should) then it would be easy to substitute that lookup for FAME, because the positional function (that deals with value increases and decreases based on a piece's position on the board and a piece's position relative to other pieces etc) wouldn't have to be changed.  Am I making any sense at all?

You are making sense to me.  My concern is that the rabbit may not be the best thing to normalize on if the positional values are supposed to remain unchanged when a new material scheme is implemented.

For example, suppose I program my bot so that a perfectly positioned camel hostage is worth a dog.  Taking the 99of9 material values R=1, C=2, D=3, H=5, M=8, E=13, I hard-code my camel hostage to also be worth 3.  Then later on I discover that I was undervaluing the first rabbit, and revamp my system to have R=1, C=1.5, D=2, H=3, M=5, E=10.  The first rabbit is still worth one, but by normalizing to the rabbit I just pulled the value of all the other pieces down, and now my hard-coded positional value for a camel is too high at 3, now the value of a horse.  In this case my positional values would have stayed better if I had normalized to the total value of the army rather than normalizing to the value of its weakest unit.
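
To make the comparison concrete, here is a rough sketch that rescales the second set of values so that the whole starting army keeps the same total as under the first set. The piece counts are the standard starting army; the function names are invented for this illustration.

```python
COUNTS = {"R": 8, "C": 2, "D": 2, "H": 2, "M": 1, "E": 1}
OLD = {"R": 1, "C": 2, "D": 3, "H": 5, "M": 8, "E": 13}
NEW = {"R": 1, "C": 1.5, "D": 2, "H": 3, "M": 5, "E": 10}


def army_total(values):
    """Total value of a full starting army under the given piece values."""
    return sum(values[p] * COUNTS[p] for p in COUNTS)


def rescale_to_total(values, target_total):
    """Scale all piece values so the full army sums to target_total."""
    factor = target_total / army_total(values)
    return {p: v * factor for p, v in values.items()}


rescaled = rescale_to_total(NEW, army_total(OLD))
print(army_total(OLD), army_total(NEW))
print({p: round(v, 2) for p, v in rescaled.items()})
```

With the army total held fixed, the hard-coded hostage value of 3 lands between the rescaled dog and horse instead of matching the horse.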

Well, anyway, probably whenever a new material evaluation scheme is implemented, all the positional values have to be re-tuned by hand anyway, so it isn't such a big deal.  Probably a more important factor is the convention in chess that a pawn is worth 1, which makes a computer chess evaluation of +2.36 immediately intelligible.  I guess for the sake of going easy on everyone's intuition in conversation, FAME should be scaled to make the first free rabbit worth 1.  Developers can scale things internally however it most makes sense to them.  Rabbit=1 is a convention that regular gamers will expect, even if they don't expect that rabbits will be worth a lot more than 1 in the endgame.  :-)

Title: Re: (no) absolute score values for pieces?
Post by Swynndla on Mar 8th, 2006, 5:38pm
As always, it's more complicated than I first realized!

But thinking about the bot that is programmed to think "that a perfectly positioned camel hostage is worth a dog", probably it would be better to program it to think that a perfectly positioned camel hostage is worth the value of a dog, and not just "3" (i.e. it would look up the value).  That way, changing the piece values won't matter.  But ... that value (of a dog) would change as the number of pieces decreased as the game goes on, so would it still work? ... I'm not so sure, so I'm at a loss as to what to do about that.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Mar 9th, 2006, 11:22am
Well, I don't know how I would do it, now that I think about it some more.  A camel hostage is worth way less late in the game than early in the game.  Bomb can be tricked in the endgame because it still thinks camels and camel hostages are hugely important, and they're not.  But the value of cats and dogs usually goes up between the opening and the endgame, so tying the value of the camel hostage to the value of another piece doesn't seem any cleverer than making it a fixed value.

I actually have no idea how to code dynamic positional values.  It's hard enough thinking how relative material values change in light of exchanges.

Title: Re: (no) absolute score values for pieces?
Post by Janzert on Mar 18th, 2006, 4:14pm
Updated the FAME calculator (http://www.janzert.com/fame.html) today.

* You can now specify zero rabbits left.
* Normalized the scores so an initial rabbit is worth a score of 1. The raw score is also reported in parentheses for now.
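
A sketch of that normalization, assuming it simply divides the raw score by the raw value of a free rabbit from the opening position (the calculator's actual code is not reproduced here, and the function names are made up):

```python
def initial_rabbit_value():
    """Raw FAME value of a free rabbit from the opening position."""
    # Full army vs. full army minus one rabbit: all eight matchups tie,
    # so only the leftover-rabbit terms 600 / (R + 2P) differ.
    return 8 * 600 / (7 + 2 * 8) - 7 * 600 / (8 + 2 * 8)


def normalize(raw_score):
    """Express a raw FAME score in units of one initial rabbit."""
    return raw_score / initial_rabbit_value()


print(round(initial_rabbit_value(), 2))
print(round(normalize(100.0), 2))
```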

But in testing this I found some discrepancies with other results in this thread. After rewriting the algorithm used, I still get the same results. Also, checking with a local script I wrote in Python when I originally implemented FAME, I still get the same score. So either I'm misunderstanding some aspect of FAME or a few of the results posted are wrong. Could someone please double check the following:


Code:
pieces = posted normalized score(raw posted), my normalized(my raw)
EDRR vs. ECCR = 0.24(8), 0.41(14)
EDRRR vs ECCRR = -0.02(-1), 0.20(7)
EMHHRRR vs. EHDRRRR = 5.67(190.9), 5.83(196)
EMHHDCR vs. ERRRRRR = 7(235.8), 7.44(251)
EHHDDCCR vs. EMHHDDCC = -6.62(-223), -4.36(-147)
ERRRRRRR vs. EMHHDDCC = -6.62(-223), -7.12(-240)


Janzert

Title: Re: (no) absolute score values for pieces?
Post by Janzert on Mar 18th, 2006, 4:49pm
Hmm, I must have something wrong or FAME's a little weirder than I thought.

Try EMHHDDCCR vs EM and a variable number of R's.

For 8-6 silver r's the score goes up for gold as expected. But then 5-0 r's the score actually decreases for gold with each additional r captured. :(

FYI, I'm using the modification (clarification?) by Fritzlein to allow negative rabbits left over.

Janzert

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Mar 18th, 2006, 7:45pm
EMHHDDCCR vs. EMRRRRRRRR = +134
EMHHDDCCR vs. EMRRRRRRR = +174
EMHHDDCCR vs. EMRRRRRR = +215

so far I agree with your web page

EMHHDDCCR vs. EMRRRRR = +257 , not +215 as the web page says.  To break it down, it is (for the traps) 57+38+25+17+11+7+ (for gold's 1 extra rabbit) 600/9 - (for silver's -1 extra rabbits) -1 *  600/17.

FAME is admittedly very weird, but at least it likes making extra captures in this situation; something is wrong with the calculation on your web page.

Here's a genuine weirdness with FAME:

EMHHDDCCR vs. ER = +652
EHHDDCCR vs. ER = +640

It only lowers Gold's evaluation by 12 points to throw away a camel.  Although it leaves Silver with only -5 rabbits instead of -6, it simultaneously weakens Gold's defense from 17 to 15, and against negative offense, FAME thinks a weaker defense is better!

To stop that silliness, each negative leftover rabbit should have a fixed value of, let's say, 40 points to the other team regardless of the size of the larger army.  Not that it matters much, but why not patch holes that are easy to patch?  There will still be enough unpatchable holes left. :-)
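
The effect of the patch on just the negative-rabbit term can be sketched as below. The function and parameter names are invented for this illustration, and 40 is only the example figure suggested above.

```python
def negative_rabbit_bonus(missing, stronger_rabbits, stronger_pieces, flat_value=None):
    """Points the stronger side gets for the weaker side's `missing` rabbits."""
    if flat_value is not None:
        return missing * flat_value  # proposed patch: fixed value per rabbit
    # Current rule: value depends on the stronger side's own R + 2P.
    return missing * 600.0 / (stronger_rabbits + 2 * stronger_pieces)


# Silver has ER. Against EMHHDDCCR it is 6 rabbits short; against EHHDDCCR, 5.
with_camel = negative_rabbit_bonus(6, 1, 8)
without_camel = negative_rabbit_bonus(5, 1, 7)
print(round(with_camel - without_camel, 1))  # gap under the current rule

patched_gap = negative_rabbit_bonus(6, 1, 8, 40) - negative_rabbit_bonus(5, 1, 7, 40)
print(patched_gap)  # gap with a flat 40 per missing rabbit
```

Under the current rule, giving up the camel costs only about 12 points in this term; with a flat value it costs the full 40.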



Title: Re: (no) absolute score values for pieces?
Post by Janzert on Mar 18th, 2006, 9:38pm

on 03/18/06 at 19:45:13, Fritzlein wrote:
EMHHDDCCR vs. EMRRRRR = +257 , not +215 as the web page says.  To break it down, it is (for the traps) 57+38+25+17+11+7+ (for gold's 1 extra rabbit) 600/9 - (for silver's -1 extra rabbits) -1 *  600/17.


Oops, I was breaking out of the matchups as soon as one side ran out of pieces.


Quote:
EHHDDCCR vs. ER = +640


Nooo, I thought I finally had it. ;) 640 not 633? Rabbits vs nothing count for matchup and leftover?

85 + 57 + 38 + 25 + 17 + 11 + 7 = 240

240 + ((600/(1+(2*1)))*1) - ((600/(1+(2*7)))*-5) = 640

Did you happen to check the other ones at all? In particular the first two are from your examples after making the last modification to the formula and I still get the differing results.

Janzert

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Mar 18th, 2006, 9:59pm

on 03/18/06 at 21:38:59, Janzert wrote:
Nooo, I thought I finally had it. ;) 640 not 633? Rabbits vs nothing count for matchup and leftover?

85 + 57 + 38 + 25 + 17 + 11 + 7 = 240

240 + ((600/(1+(2*1)))*1) - ((600/(1+(2*7)))*-5) = 640

Oh, whoops, it should be 633.  You are right.  The 7 point bonus for controlling the 8th trap goes to no one.


Quote:
Did you happen to check the other ones at all? In particular the first two are from your examples after making the last modification to the formula and I still get the differing results.

Sorry, I didn't see the other examples when I replied before.

EDRR vs. ECCR = 85 - 57 + 600/7 - 600/6 = 14
EDRRR vs ECCRR = 85 - 57 + 2*600/8 - 2*600/7 = 7

Looks like your web page is right and the numbers I published before are wrong.  Apparently I was mistakenly reducing the denominator by one, i.e. 85 - 57 + 600/6 - 600/5 = 8 != 14.  Thanks for checking so meticulously.  By now you know my system better than I do.

Title: Re: (no) absolute score values for pieces?
Post by Janzert on Mar 18th, 2006, 10:49pm

on 03/18/06 at 21:59:12, Fritzlein wrote:
Oh, whoops, it should be 633.


Whoo, I think I finally got it right then.


Quote:
By now you know my system better than I do.


Heh, not even close. Just finally got through enough trial and error to get to the right spot. At least I hope it's the right spot. ;)

Janzert

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on May 1st, 2008, 2:26am

on 01/22/06 at 04:02:26, Janzert wrote:
Since people have been wondering what the fame score was at certain times during the recent tournament games, I put together a page to calculate it.

http://www.janzert.com/fame.html


I've added a fair bit more code, so can now present a reasonably comprehensive all-in-one material eval page:
http://www.chem.usyd.edu.au/~hudson_t/arimaa/material_evals_new.html

Title: Re: (no) absolute score values for pieces?
Post by IdahoEv on May 1st, 2008, 3:12am

on 05/01/08 at 02:26:06, 99of9 wrote:
I've added a fair bit more code, so can now present a reasonably comprehensive all-in-one material eval page:
http://www.chem.usyd.edu.au/~hudson_t/arimaa/material_evals_new.html


Wow, that's pretty thorough!   I wouldn't bother with the RabbitCurve systems unless it entertains you to do so.   They were merely experiments to see if the curve could help the system but they quite clearly didn't so they've never seen any contemplation beyond a single experiment...

Title: Re: (no) absolute score values for pieces?
Post by aaaa on May 1st, 2008, 9:38am
Some systems out there would gladly trade a horse for a cat and a rabbit in direct contravention of classical ("Fritzleinian"?) Arimaa theory, but my analysis of game data does appear to bear that out.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on May 1st, 2008, 1:54pm
Sorry, what does your analysis of game data bear out?  That CR is better than H or the reverse?  What set of games/positions is that based on?

If my opinion is considered "classical" then we have to define when my opinion is/was measured.  I used to have a clear preference for H over DR as an opening trade, but I now rate it nearly equal.  I don't know if my change of heart is due to experience, or due to persuasion by reported statistics.  I still clearly prefer H to CR, though.

I remain intrigued by the fact that my intuition is contradicted by game data, as first pointed out by IdahoEv.  (And it isn't just me: ask chessandgo which side of a C for R trade he would prefer in the opening.)  I am therefore quite curious about the exact nature of the data that is contradicting me.

IdahoEv has suggested that material values may be different for bots than for humans.  If so, bot developers may wish to ignore the opinions of human players.  (In particular, Clueless and OpFor may want to stop using FAME.)

It should be much easier to verify material values for actual bots than material values for hypothetical perfect play.  One should be able to play a bot against itself with various material handicaps present from the start (e.g. H for CR) and see which side wins more often.  This would at the very least separate out the causality issue, i.e. prove that the material imbalance causes the difference in winning chances, rather than the causality being reversed, or both being effects of some third cause.

Title: Re: (no) absolute score values for pieces?
Post by aaaa on May 1st, 2008, 2:40pm
You've nailed it completely with pointing out that direction of causality is key here. For example, based on my data, I can say that given a game (biased towards one played between strong players) where at one point one side missed a horse, two dogs and a rabbit while the other missed a horse, a dog, a cat and a rabbit, the former is still more likely to have eventually won the game despite being strictly worse off.

Title: Re: (no) absolute score values for pieces?
Post by Janzert on May 1st, 2008, 5:45pm
Wow, very nice 99of9. I didn't realize there were anywhere near that many different methods proposed already.

Janzert

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on May 1st, 2008, 6:40pm

on 05/01/08 at 09:38:44, aaaa wrote:
Some systems out there would gladly trade a horse for a cat and a rabbit

Mostly those which were empirically optimized based on game data (by IdahoEv).

Quote:
my analysis of game data does appear to bear that out.

Which presumably means you were using a similar methodology to him!

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on May 1st, 2008, 6:47pm

on 10/13/05 at 21:03:04, nbarriga wrote:
By the way, my current eval function is
R=100
C=200
D=300
H=500
M=800
E=2000

If the opponent lost a complete category, the next category of my pieces is worth the average between the category and the one lost.

If I'm not making myself clear, it is because I'm not a native English speaker. An example: if the enemy lost both his dogs, the values for my pieces are:
R=150
C=250
D=300
H=500
M=800
E=2000

nbarriga, I see you on the forum right now, so I might as well ask you this.  If you would like me to implement this system on the summary page, I will need a bit more detail.

So when the enemy loses one type of piece, all values of pieces below it are promoted by half of one step?  I presume enemy pieces are *not* promoted?  What if the enemy loses two sets of pieces, or more?

Title: Re: (no) absolute score values for pieces?
Post by nbarriga on May 21st, 2008, 2:03pm
Sorry, I just saw this question. The answer is, I don't remember :( I haven't coded anything for Arimaa in a long while. I can paste the actual code here, but I don't even remember whether tests showed it was useful.

This is the code; I hope it is not too cryptic.

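//parameters.piece_value are the values you cited
// c is side to play; real_values receives that side's adjusted values
void piece_value(position *p,int *real_values,int c){
   int i,j=5;
   real_values[5]=2000; // the elephant always keeps its base value
   for(i=4;i>=0;i--){
       // j only steps down while the enemy still has the piece type checked here,
       // so a missing enemy type averages my weaker pieces with a higher base value
       if(bit_count(p->bd[c^1][i+2])!=0){//if enemy has given piece
           j--;
       }
       real_values[i]=(parameters.piece_value[j]+parameters.piece_value[i])/2;
   }

}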
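
For anyone who wants to try the rule without the rest of the bot, here is a rough Python transcription of the loop above. It assumes that the board index i+2 refers to the piece type one rank above type i, which is a guess from the example values and not something stated in the code; the list `enemy_has` stands in for the bitboard test.

```python
BASE = [100, 200, 300, 500, 800, 2000]  # R, C, D, H, M, E


def adjusted_values(enemy_has):
    """enemy_has[k] is True if the enemy still owns a piece of type k
    (0=R ... 5=E). Returns the adjusted values for the side to play."""
    values = [0] * 6
    values[5] = BASE[5]  # elephant keeps its base value
    j = 5
    for i in range(4, -1, -1):
        if enemy_has[i + 1]:  # enemy still has the type one rank above i (assumed)
            j -= 1
        values[i] = (BASE[j] + BASE[i]) // 2
    return values


# Enemy has lost both dogs.
print(adjusted_values([True, True, False, True, True, True]))
```

Under that assumption the output matches the R=150, C=250 example, and when the enemy is missing two adjacent types the pieces below them are averaged with a value two steps up.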

Title: Re: (no) absolute score values for pieces?
Post by aaaa on Jun 8th, 2009, 11:30am
Here is the material evaluation function used by my bot. It is the difference between the result of the following computation for one side and that of the other:

For every friendly non-rabbit with no stronger enemy piece add 2/Q.
For every other friendly non-rabbit add 1/(Q+number_of_stronger_enemy_pieces).
Finally, add G*ln(number_of_friendly_rabbits*number_of_total_friendly_pieces).

The chosen values for the parameters are:
Q=1.447530126
G=0.6314442034

This makes the material evaluation function completely indifferent towards a trade of a dog for two rabbits.
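
A minimal Python sketch of the formula as described above may help when checking sample values. The army representation (a list of six counts) and the function names are assumptions made for illustration; the rabbit normalisation in `in_rabbits` is likewise only one plausible way to express a score in units of an initial rabbit.

```python
import math

Q = 1.447530126
G = 0.6314442034

START = [8, 2, 2, 2, 1, 1]  # counts of R, C, D, H, M, E in a full army


def harlog_side(friendly, enemy):
    """Score one army against the other (both are [R, C, D, H, M, E] counts)."""
    score = 0.0
    for rank in range(1, 6):                  # non-rabbits only
        stronger = sum(enemy[rank + 1:])      # enemy pieces of strictly higher rank
        per_piece = 2.0 / Q if stronger == 0 else 1.0 / (Q + stronger)
        score += friendly[rank] * per_piece
    # math.log raises ValueError if a side has no rabbits; the post does not
    # say how that case should be handled, so it is left undefined here.
    score += G * math.log(friendly[0] * sum(friendly))
    return score


def harlog(gold, silver):
    """Positive values favour gold."""
    return harlog_side(gold, silver) - harlog_side(silver, gold)


def in_rabbits(gold, silver):
    """Assumed normalisation: divide by the value of winning one initial rabbit."""
    one_rabbit = harlog(START, [7, 2, 2, 2, 1, 1])
    return harlog(gold, silver) / one_rabbit


# Gold gives up a dog, silver gives up two rabbits.
print(harlog([8, 2, 1, 2, 1, 1], [6, 2, 2, 2, 1, 1]))
```

With the stated parameters the printed difference is zero to within rounding, which matches the claim that the function is indifferent to a dog-for-two-rabbits trade from the opening position.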

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Jun 8th, 2009, 6:34pm
Thanks for sharing, aaaa.  I like the simplicity of your formula.  Is there a theoretical justification, or did it simply seem to correspond to our intuition for many practical cases?  Does number_of_total_friendly_pieces include the rabbits?  I suppose it must or the function might be undefined.  Do you have a silly acronym for it like FAME or DAPE?  I hope Janzert adds it to his material calculator page so I can play around with it for minimal effort.

Title: Re: (no) absolute score values for pieces?
Post by Janzert on Jun 8th, 2009, 8:20pm
Added to my page (http://arimaa.janzert.com/fame.html); I think I've got it correct. aaaa, let me know if you see it producing any errors.

A few samples (all scores normalized to an initial rabbit):

E vs mhd = 0.16
M vs hcr = 0.74
H vs dc = 0.41
D vs rr = even
C vs rr = 0.73

Janzert

P.S. As long as 99 doesn't mind I really should reintegrate all of 99's work adding the different systems back into my page.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Jun 9th, 2009, 5:30am
Thanks Janzert.  One case where the aaaa evaluation impresses me is the camel for horse trade.  What is thrilling is not the exact number (1.90) by which the camel is worth more than the horse, but rather that the "even" trade of a pair of horses knocks down the advantage to 0.95.  This is strikingly close to my intuition that, after an initial M for H trade, an additional horse swap is worth about a rabbit.  Kudos to bot_quad for knowing which side will benefit from this "even" trade, and by how much.

Title: Re: (no) absolute score values for pieces?
Post by 99of9 on Jun 9th, 2009, 6:03pm

on 06/08/09 at 20:20:09, Janzert wrote:
P.S. As long as 99 doesn't mind I really should reintegrate all of 99's work adding the different systems back into my page.

Go ahead Janzert, I borrowed it all from you in the first place!
http://www.chem.usyd.edu.au/~hudson_t/arimaa/material_evals_new.html

Title: Re: (no) absolute score values for pieces?
Post by aaaa on Jun 15th, 2009, 10:49am
If one considers the formula distinguishable enough to merit its own name, I was thinking of calling it "HarLog", a reference to the harmonic and logarithmic components of the system. Of course, that would be a blend rather than an acronym.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Jun 15th, 2009, 12:54pm
A catchy name was all that HarLog was missing to take its place as the premier material evaluation function. Perhaps you could even call it HarmLog for the benefit of the word play (i.e. a journal of the damage suffered by each side;)).

Apparently FAME is now obsolete. Yes, there are a few corner cases where my intuition agrees more with FAME than with HarLog (for example an initial trade of E for DCCRRRRRRR), but the main meat-and-potatoes exchanges that happen all the time seem to be handled a little bit better across the board by HarLog.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Jun 15th, 2009, 1:16pm
By the way, aaaa, I recall our discussion in which I maintained that two rabbits might always be worth more than a cat, and you proposed EC6R vs E8R as a possible counter-example.  I notice now that all four functions on Janzert's page disagree with me and prefer the cat to the two rabbits.  I'm afraid that my endgame play is so weak that the disagreement reflects badly on my intuition rather than reflecting badly on the (unanimous) material functions.

Title: Re: (no) absolute score values for pieces?
Post by aaaa on Jun 16th, 2009, 3:41am
On the contrary, it was the very fact that you treated the question of which side would be better in this situation as being very much open to discussion that made me settle on the current incarnation of the function with its lack of lopsided evaluation for one side or the other (not much more than the advantage of an initial rabbit).

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Jul 3rd, 2009, 7:09am

on 07/02/09 at 10:39:44, Janzert wrote:
I've finally finished integrating 99of9's additional evaluation systems back into the page and also added piece images. In doing so I used a few css and javascript techniques I've not used before so there is certainly a possibility for browser compatibility problems.

The location was also changed to better reflect that it shows multiple evaluation systems instead of just FAME. The new url is http://arimaa.janzert.com/eval.html (there is a redirect setup for the old url so old links should continue to work for now).

Let me know if you see any errors or have another eval system you'd like to see added.

Janzert

Thanks, Janzert.  You inspired me to try out some more material states to see how the evaluators agree with my intuition.  I noticed something interesting about the M for HD trade.

I had seen before that HarmLog gives a significant edge to HD over M, whereas FAME and DAPE put it about even.  I think my intuition is somewhere in between, so it wasn't much worth commenting on.  But what I just now noticed is the effect of further removing CC from each side.  My intuition is that even trades are slightly disadvantageous to the M side, because the more the board thins out, the more important sheer numbers are relative to having the strongest pieces.  FAME and DAPE both agree with me, slightly preferring HD over M after a trade of CC.  HarmLog, in contrast, thinks the trade helps the M side and reduces the advantage of the HD side.

Of course, my intuition about this case could well be wrong, as it has been wrong about other cases in the past.  If quad enters the 2010 Computer Championship, it will be interesting to see it in action against FAME bots, because there will be some equal trades that both sides will be angling for.

Title: Re: (no) absolute score values for pieces?
Post by Hippo on Feb 17th, 2010, 2:09am
Fritzlein is much further along with his bot_Nimrod than I am with bot_Hippo... I like both the FAME and HarmLog evaluations. FAME is good in positions where you can pair up equally "ordered" pieces one against the other, but as said it suffers in EHx against mdx or ccx, giving equal results. I would probably go for something in between (considering not only equally ordered matches, but giving equally ordered matches more weight).

I am planning to add a "quiet" multiplication factor to the material evaluation so that the bot does not fixate too much on material in races. How to compute the factor is another question.

Last joke: HHCx against mccx is not as good as HDCx, at least theoretically; the difference is the 3-repetition rule. It would be difficult to find an example where it changes the game result, so leaving it out of the material evaluation would rarely cause any harm.

Hmm, it would need a lot of work to set the coefficients well.

So far I have:
1) the FAME rabbit evaluation
2) Let SGNi,j be the result of comparing the i-th strongest gold piece with the j-th strongest silver piece.
I have a matrix of coefficients Ci,j. The sum of the element-wise product of these two matrices is the second summand.
3) The last summand contains 1 for each piece type present and 10000 for a rabbit being present (added for gold and subtracted for silver).

C is symmetric: first try
(250  27  10   3   1   0   0   0)
( 27  90  19   7   2   0   0   0)
( 10  19  60  13   6   1   0   0)
(  3   7  13  40   9   2   0   0)
(  1   2   6   9  30   3   1   0)
(  0   0   1   2   3  10   3   1)
(  0   0   0   0   1   3  10   3)
(  0   0   0   0   0   1   3   7)

BTW: A diagonal C with the diagonal
256 85 57 38 25 17 11 7 gives the original FAME (first 2 summands). Maybe I should have started much nearer to this matrix.

So Hippo Adjusted Fritz Arimaa Material Evaluation (HAFAME) could be a good name for it.

I suppose the C matrix will change, either to improve the evaluation or, for speed reasons, to make the computation easier.
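
The second summand can be sketched in a few lines of Python. This is only an illustration under stated assumptions: armies are lists of six counts, each side's eight slots are filled strongest first (rabbits included, strength 1) and padded with 0, and the whole 8x8 matrix is summed. The helper names are invented for the example.

```python
def powers(counts):
    """Expand counts [R, C, D, H, M, E] into the strengths of the 8 strongest
    pieces, strongest first (rabbit = 1 ... elephant = 6, empty slot = 0)."""
    out = []
    for rank in range(5, -1, -1):
        out.extend([rank + 1] * counts[rank])
    return (out + [0] * 8)[:8]


def sign(x):
    return (x > 0) - (x < 0)


def matrix_term(gold, silver, C):
    """Sum over i, j of sgn(gold_i - silver_j) * C[i][j]; positive favours gold."""
    g, s = powers(gold), powers(silver)
    return sum(sign(g[i] - s[j]) * C[i][j] for i in range(8) for j in range(8))


# The diagonal matrix mentioned above: only same-index pairs contribute.
FAME_DIAGONAL = [256, 85, 57, 38, 25, 17, 11, 7]
C_DIAG = [[FAME_DIAGONAL[i] if i == j else 0 for j in range(8)] for i in range(8)]

START = [8, 2, 2, 2, 1, 1]
NO_CAMEL = [8, 2, 2, 2, 0, 1]
print(matrix_term(START, NO_CAMEL, C_DIAG))
```

With the diagonal matrix, each gold piece is compared only with the silver piece at the same position in the ordering. Swapping in a matrix with off-diagonal entries makes neighbouring mismatches count as well, which is the point of the proposal.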

Title: Re: (no) absolute score values for pieces?
Post by Hippo on Feb 22nd, 2010, 12:31pm

on 03/18/06 at 16:49:33, Janzert wrote:
Hmm, I must have something wrong or FAME's a little weirder than I thought.

Try EMHHDDCCR vs EM and a variable number of R's.

For 8-6 silver r's the score goes up for gold as expected. But then 5-0 r's the score actually decreases for gold with each additional r captured. :(

FYI, I'm using the modification (clarification?) by Fritzlein to allow negative rabbits left over.

Janzert


It seems to me that gold's score increases with each captured silver rabbit even in this case. So there is no need for a fixed 30 points for "negative rabbits" (the 33 FAME points (floored) for the initial captured rabbit are the minimal dynamic rabbit value).


on 03/18/06 at 19:45:13, Fritzlein wrote:
Here's a genuine weirdness with FAME:

EMHHDDCCR vs. ER = +652
EHHDDCCR vs. ER = +640

It only lowers Gold's evaluation by 12 points to throw away a camel. Although it leaves Silver with only -5 rabbits instead of -6, it simultaneously weakens Gold's defense from 17 to 15, and against negative offense, FAME thinks a weaker defense is better!

To stop that silliness, each negative leftover rabbit should have a fixed value of, let's say, 40 points to the other team regardless of the size of the larger army. Not that it matters much, but why not patch holes that are easy to patch? There will still be enough unpatchable holes left. :-)


Oh, so this was the reason ... I am thinking of the following "rabbits" scoring: take the difference in the number of pieces (including rabbits); the side with more pieces gets, say, difference * 600 / opponent_defense points, where opponent_defense = 2*nonrabbits + rabbits.

It seems to me HarmLog overvalues the rabbits ... the log part is much higher than the other part.

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Feb 22nd, 2010, 6:20pm
Ah, a matrix is a nice way to value non-direct comparisons, and it seems quite relevant.  I look forward to further iterations of HAFAME once the theory meets reality.

Title: Re: (no) absolute score values for pieces?
Post by Hippo on Feb 24th, 2010, 12:55pm
I used the matrix just as an easy way to describe the algorithm.
Of course it would be hard-coded. The important thing is that only the top-left corner, bounded by the maximal number of non-rabbit pieces, is used.
(Probably a jump table would be used for this matrix cut. The fastest implementation of score += sgn(a-b) * Ci,j would be score' += (b-a)&C'i,j, score' -= (a-b)&C'i,j, where the ' values are shifted 3 bits left ... no conditional jumps.)
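
The bit trick can be checked with a small Python sketch. It relies on two's-complement behaviour of `&` on negative integers (which Python integers emulate) and assumes the strength difference is smaller than 8 in absolute value and the coefficient is non-negative. The function names are made up for the example.

```python
def signed_add_reference(a, b, c):
    """sgn(a - b) * c, written with ordinary comparisons."""
    if a > b:
        return c
    if a < b:
        return -c
    return 0


def signed_add_branchless(a, b, c):
    """Same value without comparisons, assuming |a - b| < 8 and c >= 0.

    The coefficient is shifted left by 3 bits, so its low 3 bits are zero.
    A negative difference has every bit from bit 3 upwards set, so the AND
    keeps the whole shifted coefficient; a non-negative difference below 8
    has none of those bits set, so the AND yields 0.
    """
    c_shifted = c << 3
    scaled = ((b - a) & c_shifted) - ((a - b) & c_shifted)
    return scaled >> 3   # undo the shift only to compare with the reference


# Exhaustive check over all strengths 0..6 and a few coefficients.
ok = all(
    signed_add_branchless(a, b, c) == signed_add_reference(a, b, c)
    for a in range(7) for b in range(7) for c in (0, 1, 27, 250)
)
print(ok)
```

In a real evaluator the score would presumably stay in shifted form throughout and be unshifted once at the end, rather than per term as done here for the comparison.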

With the other rabbit evaluation ... I would probably end up with the acronym HAME. ... I am not sure it's OK, but in HAME an initial exchange of 2 horses for 5 rabbits is considered an advantage for the player with more pieces.

Actually I use a more FAME-like matrix:
(213,  21,   4,   0, ...)
( 21,  54,  13,   2,   0, ...)
(  4,  13,  40,   7,   1,   0, ...)
(  0,   2,   7,  27,   4,   1,   0, ...)
(  0,   0,   1,   4,  18,   3,   0, ...)
(  0,   0,   0,   1,   3,  12,   2,   0)
(  0,   0,   0,   0,   0,   2,   7,   1)
(  0,   0,   0,   0,   0,   0,   1,   6)

Title: Re: (no) absolute score values for pieces?
Post by aaaa on Feb 24th, 2010, 3:29pm
I was asked whether there is a theoretical justification for my material evaluation function and I can offer a partial one in the form of the fact that it contains two degrees of freedom; I consider this to be ideal on account of the fact that this number corresponds nicely with the three (main?) dimensions by which an army can be measured: the quantity of the pieces, their quality and their goal potential. I encourage anyone to suggest alternative parameter values for HarLog if the outputted values feel off in any of these respects. Rabbits overvalued, you say? Try lowering 'G' and see what values you get then. I'd personally be wary of adding several parameters on a whim though, because that runs the risk of overfitting.

Title: Re: (no) absolute score values for pieces?
Post by Hippo on Feb 25th, 2010, 7:38am
Sorry aaaa, I had problems with the log(0) term :), but that was my misinterpretation ... I multiplied by friendly non-rabbits instead of all friendly pieces.

Actually, when there is no log(0) problem, I value rabbits more :) ... (C6R/d3r), (C6R/dcr), (6R/c2r) or (2C7R/hdr) are advantages for gold in HAME and for silver in HAR(M)LOG (if I don't have a bug there).

It's good that there are several evaluation functions :). I am not sure about the speed of current "co"processors. Do the frequent divisions and the logarithm cause speed problems?

The following are stronger advantages for gold in HAME and stronger advantages for silver in HAR(M)LOG:
(2HD2C8R/m2hr) and (2H2D2C7R/m2hr).

A similar comparison of HAME against FAME (but with roughly 7 times smaller advantages):
(MD2C8R/eh2dcr) (M2DC8R/e2hdcr) (M2DC8R/e2h2dr) (MH2C8R/em2dcr) (MH2D2C6R/em2hdr)

HAR(M)LOG against FAME:
(M2HR/2hd2c8r) (M2HR/2h2d2c8r)

Both (2DC8R/em2hr) and (2H2D2CR/emhr) are considered a slight advantage for gold in HAME, but I am really not sure about that.
In the former case ... does silver have enough pieces to take control of both home traps and prevent goaling?
In the latter case ... it seems to me silver can go for elimination, and the large number of weak non-rabbit pieces does not help gold.

Title: Re: (no) absolute score values for pieces?
Post by jdb on Jul 28th, 2010, 2:13pm
I have been doing some tests with the various material evaluators using Janzert's roundrobin program. The games are 10 sec per move, so a game takes around 10 minutes.  I'll post the results when there are enough games.

Assume there is a H for d trade. Who benefits from equal trades? The side with the extra H benefits from equal trades of camels or horses. This gets them closer to having the strongest piece. What about equal trades of dogs, cats or rabbits?

Assume one side has an extra rabbit. Who benefits from equal trades? Trading rabbits eventually leads to 2 rabbits vs 1 rabbit. The extra rabbit becomes a huge advantage. What about trading cats, dogs, horses or camels? Eventually this leads to E vs e with an extra rabbit. This also looks like a big advantage for the extra rabbit.

Assume one side has an extra camel. Who benefits from equal trades? Fame/Harlog puts the initial advantage at 5.64/6.48. With everything but the rabbits traded off, leaving EM8R vs E8R, Fame/Harlog is 6.38/3.80. Finally EMR vs er, FAME/Harlog is 8.46/5.31. This looks like an area for improvement. What is correct in this case?

The reason I am asking about this is that I was hoping to come up with a set of basic tests a material evaluator needs to pass. Things like a camel is worth more than a horse, etc. It takes a lot of time to get enough test games and I wanted a way to filter the different evaluators.
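
One way such a filter could look is sketched below in Python. Everything here is hypothetical: the evaluator signature (two lists of six piece counts, positive scores favouring gold), the particular checks, and the constant values used for the demonstration, which are illustrative and not taken from any bot.

```python
START = [8, 2, 2, 2, 1, 1]     # counts of R, C, D, H, M, E in a full army
NAMES = "RCDHME"

# Illustrative constant piece values only (R, C, D, H, M, E).
CONSTANT = [1, 2, 3, 5, 8, 20]


def constant_eval(gold, silver):
    """Plain sum of fixed piece values, gold minus silver."""
    return sum(v * (g - s) for v, g, s in zip(CONSTANT, gold, silver))


def without(army, rank, n=1):
    """Copy of the army with n pieces of the given rank removed."""
    out = list(army)
    out[rank] -= n
    return out


def basic_checks(evaluate):
    """Return descriptions of the sanity checks that the evaluator fails."""
    failures = []
    if abs(evaluate(START, START)) > 1e-9:
        failures.append("equal armies should score 0")
    # From the opening position, losing a stronger piece should cost more.
    for rank in range(5):
        lose_weaker = evaluate(without(START, rank), START)
        lose_stronger = evaluate(without(START, rank + 1), START)
        if not lose_stronger < lose_weaker < 0:
            failures.append(f"{NAMES[rank + 1]} should be worth more than {NAMES[rank]}")
    # Swapping the two sides should flip the sign of the score.
    down_a_camel = without(START, 4)
    if abs(evaluate(START, down_a_camel) + evaluate(down_a_camel, START)) > 1e-9:
        failures.append("swapping sides should flip the sign")
    return failures


print(basic_checks(constant_eval))
```

An evaluator that returns an empty list here has only passed the most basic checks; questions such as who benefits from equal trades need checks of their own once there is agreement on the right answers.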


Title: Re: (no) absolute score values for pieces?
Post by rbarreira on Jul 28th, 2010, 2:33pm
It seems logical to me that the side with the advantage always benefits from trading equal material.

Isn't this the whole idea behind FAME?

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Jul 28th, 2010, 3:10pm

on 07/28/10 at 14:33:26, rbarreira wrote:
It seems logical to me that the side with the advantage always benefits from trading equal material.

Isn't this the whole idea behind FAME?

I wouldn't say it is the whole idea. :) A strong part of my motivation to get away from fixed values for the pieces was that a camel goes down in value on an emptier board, while rabbits go up in value. It was clear even then that Bomb vastly overvalued its camel in an endgame, to its own detriment.

A subtle point is that the side with the advantage might have fewer pieces, e.g. M for HR.  The side with the camel has an advantage, but equal trades can turn the advantage into a disadvantage in a big hurry.


on 07/28/10 at 14:13:59, jdb wrote:
I have been doing some tests with the various material evaluators using Janzert's roundrobin program. The games are 10 sec per move, so a game takes around 10 minutes. I'll post the results when there are enough games.

Fantastic. I am very curious to see your results. I wonder whether the results will be statistically significant, or the genuine differences in evaluation will be drowned in noise.


Quote:
Assume there is a H for d trade. Who benefits from equal trades? The side with the extra H benefits from equal trades of camels or horses. This gets them closer to having the strongest piece. What about equal trades of dogs, cats or rabbits?

I personally am indifferent to trade of weaker pieces. My reasoning is that emptying the board destabilizes the position, which benefits the player who is behind in material. On the other hand, the fewer pieces are on the board, the more likely it is for the mismatch to be relevant. I let the two opposing considerations cancel out in my mind, although I am sure sometimes one is more important than the other.  This is an area where I wouldn't feel confident to tell a material evaluator that it was wrong whichever way it was leaning.


Quote:
Assume one side has an extra rabbit. Who benefits from equal trades? Trading rabbits eventually leads to 2 rabbits vs 1 rabbit. The extra rabbit becomes a huge advantage. What about trading cats, dogs, horses or camels? Eventually this leads to E vs e with an extra rabbit. This also looks like a big advantage for the extra rabbit.

Yes, every equal trade should benefit a player with an extra rabbit. Use with caution, though, because an extra rabbit is a small advantage on a full board and a still small (albeit greater) advantage when it gets down to E8R vs. e7r. Against a computer opponent I might not want to trade down because I expect the endgame to be its forte despite my material advantage.


Quote:
Assume one side has an extra camel. Who benefits from equal trades? Fame/Harlog puts the initial advantage at 5.64/6.48. With everything but the rabbits traded off, leaving EM8R vs E8R, Fame/Harlog is 6.38/3.80. Finally EMR vs er, FAME/Harlog is 8.46/5.31. This looks like an area for improvement. What is correct in this case?

HarLog performs better than FAME on this one, although I don't entirely trust HarLog either because of how it treats rabbits. When I have an extra camel, I feel that any equal trade weakens my position, even if only slightly. My objective when I am up a camel is always to get a better-than-equal trade or (ideally) to get something for nothing. I feel sufficiently strongly about this that I think it would make a good litmus test of the kind you are seeking, i.e. any material evaluator that likes any equal trade (except elephants :D) when up a camel is just wrong and should be replaced by some modified form of itself.

But the effect is not strong.  If I am up a camel on a full board, trading dog for dog only hurts me a little, whereas winning a rabbit outright helps me significantly, so I would be happy to win DR for D.

Title: Re: (no) absolute score values for pieces?
Post by jdb on Jul 28th, 2010, 4:07pm

on 07/28/10 at 15:10:36, Fritzlein wrote:
A subtle point is that the side with the advantage might have fewer pieces, e.g. M for HR. The side with the camel has an advantage, but equal trades can turn the advantage into a disadvantage in a big hurry.


I agree the number of pieces is very important, both relative and absolute.


Quote:
Fantastic. I am very curious to see your results. I wonder whether the results will be statistically significant, or the genuine differences in evaluation will be drowned in noise.


Bayeselo is wonderful.


Quote:
I personally am indifferent to trade of weaker pieces. My reasoning is that emptying the board destabilizes the position, which benefits the player who is behind in material. On the other hand, the fewer pieces are on the board, the more likely it is for the mismatch to be relevant. I let the two opposing considerations cancel out in my mind, although I am sure sometimes one is more important than the other. This is an area where I wouldn't feel confident to tell a material evaluator that it was wrong whichever way it was leaning.

Yes, every equal trade should benefit a player with an extra rabbit. Use with caution, though, because an extra rabbit is a small advantage on a full board and a still small (albeit greater) advantage when it gets down to E8R vs. e7r. Against a computer opponent I might not want to trade down because I expect the endgame to be its forte despite my material advantage.


I'll run some test games with E8R vs e7r and see what happens.


Quote:
HarLog performs better than FAME on this one, although I don't entirely trust HarLog either because of how it treats rabbits. When I have an extra camel, I feel that any equal trade weakens my position, even if only slightly. My objective when I am up a camel is always to get a better-than-equal trade or (ideally) to get something for nothing. I feel sufficiently strongly about this that I think it would make a good litmus test of the kind you are seeking, i.e. any material evaluator that likes any equal trade (except elephants :D) when up a camel is just wrong and should be replaced by some modified form of itself.

But the effect is not strong. If I am up a camel on a full board, trading dog for dog only hurts me a little, whereas winning a rabbit outright helps me significantly, so I would be happy to win DR for D.


I'll have to think about this. If everything is traded off it comes down to EMR vs er, which is a big advantage. But as you said, there is a period during the trades where the position can be destabilized. Maybe it is necessary to define when the material evaluator can be applied. That is, it's only valid in a stable position without a lot of threats.

Title: Re: (no) absolute score values for pieces?
Post by jdb on Jul 29th, 2010, 4:24pm
Here are the results of some testing between FAME, HarLog, and Constant. Time control was 10sec per move.

Rank Name               Elo    +    - games score oppo. draws
   1 Clueless_FAME     2221   40   39   121   54%  2189    0%
   2 Clueless_HarLog   2210   40   39   120   52%  2195    0%
   3 Clueless_Constant 2169   39   40   121   45%  2215    0%

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Jul 29th, 2010, 7:07pm
Nice, thanks for sharing.  Is the reported +/- two standard deviations?  It appears that FAME and HarLog are statistically significantly better than constant piece values, but statistically indistinguishable from each other.

Did you tell me once that other positional factors in Clueless are tuned to work with FAME?   If so, would that put HarLog at a relative disadvantage?

Title: Re: (no) absolute score values for pieces?
Post by jdb on Jul 29th, 2010, 7:48pm
The table was generated by bayeselo. I don't know what the +/- means.

The value of an initial rabbit is normalized for all material evaluators in Clueless's eval. I don't think it would matter to HarLog.


Title: Re: (no) absolute score values for pieces?
Post by Janzert on Jul 29th, 2010, 8:17pm
The +/- columns are confidence intervals. I'm pretty sure they are 95% CIs, although I can't find anything explicitly stating it right now. I did find this post by Rémi Coulom on talkchess describing the four methods bayeselo has available for calculating the CI.


Quote:
Bayeselo offers 4 different algorithms for computing confidence intervals. This is the list of options, from the least accurate and fastest, to the most accurate and slowest:

   * Default: assume opponents ratings are their true ratings, and Gaussian distribution
   * "exactdist": assume opponents ratings are their true ratings, but does not assume Gaussian distribution. This will produce asymmetric intervals, especially for very high or very low winning rates. Cost is linear in the number of players.
   * "covariance": assume Gaussian distribution, but not that the rating of opponents are true. This may be very costly if you have thousands of players, but it is more accurate than the default. The cost is cubic in the number of players (it is a matrix inversion)
   * "jointdist": computes a numerical estimation of the whole distribution. It is the most accurate, but the cost is exponential in the number of players. May work for 3-4 players. You should reduce the resolution of the discretization for more players.


The output from the los (likelihood-of-superiority) command would also be interesting to see.

Janzert

Title: Re: (no) absolute score values for pieces?
Post by rbarreira on Jul 30th, 2010, 3:17am

on 07/29/10 at 19:07:40, Fritzlein wrote:
Nice, thanks for sharing. Is the reported +/- two standard deviations? It appears that FAME and HarLog are statistically significantly better than constant piece values, but statistically indistinguishable from each other.


Actually if you take the two extremes, static values may be as high as 2208 while FAME may be as low as 2182. Or am I misunderstanding something?

Unfortunately it is necessary to test a very high number of games to test most changes, at least search-related ones... I have more or less accepted that I won't be able to conclusively test many of the changes I do to my bot.

Not everyone has a big cluster like Dr. Robert Hyatt, and CPU time at Amazon EC2 isn't cheap enough for me.
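
The arithmetic behind the two extremes can be spelled out with a short Python sketch. It simply treats the + and - columns as the upper and lower half-widths of an interval around each rating; the helper names are invented for the example, and overlapping intervals are only a rough indication, not a formal test.

```python
def interval(elo, plus, minus):
    """Return (low, high) for a rating with asymmetric +/- margins."""
    return (elo - minus, elo + plus)


def overlaps(a, b):
    """True if the two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Rating, +, - as reported in the results table above.
fame = interval(2221, 40, 39)
harlog = interval(2210, 40, 39)
constant = interval(2169, 39, 40)

print(fame, constant, overlaps(fame, constant))
```

The top of the Constant interval (2208) lies above the bottom of the FAME interval (2182), so the two intervals overlap. The likelihood-of-superiority output mentioned earlier is a more direct way to compare two entries.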

Title: Re: (no) absolute score values for pieces?
Post by jdb on Jul 30th, 2010, 5:52am
The games are 10sec per move. All pieces are set up on the first two ranks.


Rank Name            Elo    +    - games score oppo. draws
   1 Clueless_EC6R  2292   34   33   203   65%  2156    0%
   2 Clueless_E8R   2262   27   26   303   65%  2130    0%
   3 Clueless_E7R   2046   28   29   300   25%  2272    0%


Output of LOS:
               Cl  Cl  Cl
Clueless_EC6R      86 100
Clueless_E8R   13     100
Clueless_E7R    0   0

Output of detail:
   1 Clueless_EC6R  2292 203.0 (131.0 :  72.0)
                         103.0 ( 52.0 :  51.0) Clueless_E8R   2262
                         100.0 ( 79.0 :  21.0) Clueless_E7R   2046
   2 Clueless_E8R   2262 303.0 (196.0 : 107.0)
                         103.0 ( 51.0 :  52.0) Clueless_EC6R  2292
                         200.0 (145.0 :  55.0) Clueless_E7R   2046
   3 Clueless_E7R   2046 300.0 ( 76.0 : 224.0)
                         100.0 ( 21.0 :  79.0) Clueless_EC6R  2292
                         200.0 ( 55.0 : 145.0) Clueless_E8R   2262


Now running tournament with:

E8R,EC6R,E7R,EC5R,ECC4R,ECC3R


Code:
Rank Name    .     .  Elo    +    - games score oppo. draws
  1 Clueless_ECC4R  2362   52   48   143   71%  2168    0%
  2 Clueless_EC6R   2307   32   31   348   65%  2164    0%
  3 Clueless_E8R    2263   28   27   448   63%  2149    0%
  4 Clueless_EC5R   2114   47   49   143   38%  2217    0%
  5 Clueless_ECC3R  2113   47   48   143   38%  2217    0%
  6 Clueless_E7R    2041   29   30   445   26%  2263    0%
               Cl Cl Cl Cl Cl Cl
Clueless_ECC4R     94 99 99 99100
Clueless_EC6R    5    97 99 99100
Clueless_E8R     0  2    99 99100
Clueless_EC5R    0  0  0    51 98
Clueless_ECC3R   0  0  0 48    98
Clueless_E7R     0  0  0  1  1  
  1 Clueless_ECC4R  2362 143.0 (102.0 :  41.0)
                          29.0 ( 15.0 :  14.0) Clueless_EC6R   2307
                          29.0 ( 20.0 :   9.0) Clueless_E8R    2263
                          28.0 ( 20.0 :   8.0) Clueless_EC5R   2114
                          28.0 ( 25.0 :   3.0) Clueless_ECC3R  2113
                          29.0 ( 22.0 :   7.0) Clueless_E7R    2041
  2 Clueless_EC6R   2307 348.0 (227.0 : 121.0)
                          29.0 ( 14.0 :  15.0) Clueless_ECC4R  2362
                         132.0 ( 69.0 :  63.0) Clueless_E8R    2263
                          29.0 ( 24.0 :   5.0) Clueless_EC5R   2114
                          29.0 ( 16.0 :  13.0) Clueless_ECC3R  2113
                         129.0 (104.0 :  25.0) Clueless_E7R    2041
  3 Clueless_E8R    2263 448.0 (280.0 : 168.0)
                          29.0 (  9.0 :  20.0) Clueless_ECC4R  2362
                         132.0 ( 63.0 :  69.0) Clueless_EC6R   2307
                          29.0 ( 18.0 :  11.0) Clueless_EC5R   2114
                          29.0 ( 21.0 :   8.0) Clueless_ECC3R  2113
                         229.0 (169.0 :  60.0) Clueless_E7R    2041
  4 Clueless_EC5R   2114 143.0 ( 55.0 :  88.0)
                          28.0 (  8.0 :  20.0) Clueless_ECC4R  2362
                          29.0 (  5.0 :  24.0) Clueless_EC6R   2307
                          29.0 ( 11.0 :  18.0) Clueless_E8R    2263
                          28.0 ( 16.0 :  12.0) Clueless_ECC3R  2113
                          29.0 ( 15.0 :  14.0) Clueless_E7R    2041
  5 Clueless_ECC3R  2113 143.0 ( 55.0 :  88.0)
                          28.0 (  3.0 :  25.0) Clueless_ECC4R  2362
                          29.0 ( 13.0 :  16.0) Clueless_EC6R   2307
                          29.0 (  8.0 :  21.0) Clueless_E8R    2263
                          28.0 ( 12.0 :  16.0) Clueless_EC5R   2114
                          29.0 ( 19.0 :  10.0) Clueless_E7R    2041
  6 Clueless_E7R    2041 445.0 (116.0 : 329.0)
                          29.0 (  7.0 :  22.0) Clueless_ECC4R  2362
                         129.0 ( 25.0 : 104.0) Clueless_EC6R   2307
                         229.0 ( 60.0 : 169.0) Clueless_E8R    2263
                          29.0 ( 14.0 :  15.0) Clueless_EC5R   2114
                          29.0 ( 10.0 :  19.0) Clueless_ECC3R  2113

Title: Re: (no) absolute score values for pieces?
Post by Hippo on Jul 31st, 2010, 1:49pm

on 07/29/10 at 16:24:11, jdb wrote:
Here are the results of some testing between FAME, HarLog, and Constant. Time control was 10sec per move.

Rank Name               Elo    +    - games score oppo. draws
   1 Clueless_FAME     2221   40   39   121   54%  2189    0%
   2 Clueless_HarLog   2210   40   39   120   52%  2195    0%
   3 Clueless_Constant 2169   39   40   121   45%  2215    0%


It would be interesting to test HA(FA)ME as well ...

Title: Re: (no) absolute score values for pieces?
Post by jdb on Jul 31st, 2010, 5:59pm

on 02/17/10 at 02:09:16, Hippo wrote:
So far I have:
1) the FAME rabbit evaluation
2) Let SGNi,j be the result of comparing the i-th strongest gold piece with the j-th strongest silver piece.
I have a matrix of coefficients Ci,j. The sum of the element-wise product of these two matrices is the second summand.
3) The last summand contains 1 for each piece type present and 10000 for a rabbit being present (added for gold and subtracted for silver).

C is symmetric: first try
(250  27  10   3   1   0   0   0)
( 27  90  19   7   2   0   0   0)
( 10  19  60  13   6   1   0   0)
(  3   7  13  40   9   2   0   0)
(  1   2   6   9  30   3   1   0)
(  0   0   1   2   3  10   3   1)
(  0   0   0   0   1   3  10   3)
(  0   0   0   0   0   1   3   7)

BTW: A diagonal C with the diagonal
256 85 57 38 25 17 11 7 gives the original FAME (first 2 summands). Maybe I should have started much nearer to this matrix.



I can test this too but I am unsure how to handle part 3.

Title: Re: (no) absolute score values for pieces?
Post by Hippo on Aug 1st, 2010, 12:27pm

on 07/31/10 at 17:59:16, jdb wrote:
I can test this too but I am unsure how to handle part 3.


The last summand is not important (the high value of the last rabbit is covered by the elimination rule tests, and the at most 5 points are too small to be noticeable) ... A player gains 1 point if he has a cat, 1 point if he has a dog, 1 point if he has a horse, 1 point if he has a camel and 1 point if he has an elephant.
In that case EDC8R/em8r is 1 point better for gold than EHH8R/em8r.

I was trying to make a puzzle where the advantage of piece diversity is important, but I don't think it would ever be important in a real Arimaa game, so you can ignore the third summand :)

Here is the code of the evaluation I was trying (I have not implemented a bot yet).


Code:
// Expand piece counts (index 0 = rabbits, higher index = stronger type) into
// the strengths of the 8 strongest pieces, strongest first; unused slots are 0.
static int[] f_powers(int[] pieces)
{
    int[] powers = new int[8]; int k = 0;
    for (int i = pieces.Length; i > 0; i--)
        for (int j = 0; j < pieces[i - 1] && k < 8; j++)
            powers[k++] = i;
    for (; k < 8; k++) powers[k] = 0;
    return powers;
}

// f_pieces is not shown here; it turns an army code into its piece counts.
static long f_HAME0eval(int g, int s)
{
    int[] gPieces = f_pieces(g), sPieces = f_pieces(s);
    int[] gPowers = f_powers(gPieces), sPowers = f_powers(sPieces);
    int[,] weights = new int[8, 8] {
        {2130,  210,   40,    0,    0,    0,    0,    0},
        { 210,  540,  130,   20,    0,    0,    0,    0},
        {  40,  130,  400,   70,   10,    0,    0,    0},
        {   0,   20,   70,  270,   40,   10,    0,    0},
        {   0,    0,   10,   40,  180,   30,    0,    0},
        {   0,    0,    0,   10,   30,  120,   20,    0},
        {   0,    0,    0,    0,    0,   20,   70,   10},
        {   0,    0,    0,    0,    0,    0,   10,   60} };
    int gLastNonRabbit = 0, sLastNonRabbit = 0, maxLastNonRabbit = 0;
    long score = 0;
    // count the non-rabbit pieces (strength > 1) of each side
    for (int i = 0; i < 8; i++)
    {
        if (gPowers[i] > 1) maxLastNonRabbit = gLastNonRabbit = i + 1;
        if (sPowers[i] > 1) maxLastNonRabbit = sLastNonRabbit = i + 1;
        if (i == maxLastNonRabbit) break;
    }
    // matrix summand: only the top-left corner of the weights is used
    for (int i = 0; i < maxLastNonRabbit; i++)
        for (int j = 0; j < maxLastNonRabbit; j++)
        {
            if (gPowers[i] > sPowers[j]) score += weights[i, j];
            if (gPowers[i] < sPowers[j]) score -= weights[i, j];
        }
    // piece-count summand
    int gNrPieces = gPieces[0] + gLastNonRabbit,
        sNrPieces = sPieces[0] + sLastNonRabbit;
    int gResist = gNrPieces + gPieces[0];
    int sResist = sNrPieces + sPieces[0];
    if ((gResist > 0) && (sResist > 0))
    {
        if (gNrPieces > sNrPieces)
            score += (gNrPieces - sNrPieces) * 1200 / sResist;
        else
            score -= (sNrPieces - gNrPieces) * 1200 / gResist;
    }
    // third summand: 1 point per non-rabbit piece type present
    for (int i = 1; i < 6; i++)
    {
        if (gPieces[i] > 0) score++;
        if (sPieces[i] > 0) score--;
    }
    if (gPieces[0] > 0) score += 10000 - 840 / gPieces[0];
    if (sPieces[0] > 0) score -= 10000 - 840 / sPieces[0];
    // 840/n for n = 1..8: 840, 420, 280, 210, 168, 140, 120, 105
    return score;
}


But as I read it now, resist should probably be 2*nrpieces-pieces[0]. The code was not optimised for speed. I have used it to precompute the evaluation table and then access the table rather than recomputing, so this need not be an issue.
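
To see what the rabbit term at the end of the evaluation implies, here is a small sketch (in Python rather than the C# above) that tabulates `10000 - 840/n` with integer division, as in the posted code, together with the score lost when one more rabbit is captured. The helper names `rabbit_term` and `capture_cost` are mine, not part of the original code.

```python
def rabbit_term(n):
    """Rabbit contribution for n rabbits, mirroring the posted formula
    (integer division; with no rabbits left nothing is added)."""
    return 10000 - 840 // n if n > 0 else 0

def capture_cost(n):
    """Score lost when going from n rabbits to n - 1."""
    return rabbit_term(n) - rabbit_term(n - 1)

for n in range(8, 0, -1):
    print(n, rabbit_term(n), capture_cost(n))
```

Under this formula the first rabbit lost costs 15 points, the seventh costs 420, and the last one costs 9160, so the rabbit value rises slowly at first and then sharply.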

Title: Re: (no) absolute score values for pieces?
Post by rbarreira on Aug 1st, 2010, 3:34pm
jdb, one thing that I have noticed while running tests with roundrobin:

The default time limit for a whole game is 10 minutes. If you see games ending due to reason "s" it's because this time was exceeded.

I changed it to 0 which should be unlimited, since I don't want results to get distorted due to this time limit. Or maybe I should use something bigger, in case there's an infinite loop or something.

No matter what, those results should be eliminated from the pgn if they are happening.

Title: Re: (no) absolute score values for pieces?
Post by Janzert on Aug 1st, 2010, 8:08pm
By default if it isn't specified in the timecontrol there shouldn't be any limit on the game length. There may very well be a bug there in the current version though.

Just FYI, generally for testing I've been using a move limit instead of a time limit. It seems a little easier than calculating a reasonable time limit for each time control I test at. To set a move limit just append 't' to the limit. So it would look something like 3s/15s/100/0/125t.

Janzert

Title: Re: (no) absolute score values for pieces?
Post by jdb on Aug 2nd, 2010, 3:39pm
Latest bunch of games.

A couple observations.

1) A cat is worth almost exactly 2 rabbits, as long as both sides have 4 or more rabbits.

2) The first 4 rabbits captured are worth about the same. After that their value goes up quickly.


Code:
Rank Name            Elo    +    - games score oppo. draws
  1 Clueless_ECC8R  3309   90   76   230   93%  2538    0%
  2 Clueless_ECC7R  3219   76   69   230   88%  2545    0%
  3 Clueless_ECC6R  3019   63   60   230   77%  2561    0%
  4 Clueless_EC8R   2987   59   56   243   75%  2550    0%
  5 Clueless_ECC5R  2848   58   57   230   64%  2574    0%
  6 Clueless_EC7R   2815   54   54   241   61%  2569    0%
  7 Clueless_ECC4R  2640   40   39   375   56%  2543    0%
  8 Clueless_EC6R   2612   31   31   579   58%  2511    0%
  9 Clueless_E8R    2551   27   27   702   54%  2501    0%
 10 Clueless_ECC3R  2422   40   41   375   35%  2570    0%
 11 Clueless_EC5R   2418   40   41   375   34%  2571    0%
 12 Clueless_E7R    2337   28   29   836   52%  2188    0%
 13 Clueless_EC4R   2267   53   51   368   78%  1756    0%
 14 Clueless_E6R    2148   49   48   381   69%  1786    0%
 15 Clueless_ECC2R  2113   49   48   366   69%  1763    0%
 16 Clueless_E5R    1996   47   48   381   59%  1797    0%
 17 Clueless_EC3R   1991   47   47   368   61%  1776    0%
 18 Clueless_E4R    1730   49   50   381   42%  1817    0%
 19 Clueless_EC2R   1653   51   53   368   39%  1801    0%
 20 Clueless_ECC1R  1553   54   56   366   34%  1804    0%
 21 Clueless_E3R    1494   56   58   381   29%  1834    0%
 22 Clueless_E2R    1110   73   79   381   13%  1862    0%
 23 Clueless_EC1R   1025   77   83   368   11%  1847    0%
 24 Clueless_E1R     542  306 -269   381    1%  1904    0%

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Aug 2nd, 2010, 7:28pm
Very interesting, JDB.  Thanks for sharing.  I recall that I once proposed that two rabbits would always be worth more than a cat.  Aaaa suggested EC6R vs E8R as a possible counter-example, and it seems that he was correct.  It doesn't flip to the two rabbits being more valuable until three more rabbits have been exchanged.
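
The flip can be read straight off the table. A minimal sketch, using only Elo values copied from jdb's Aug 2nd table; the helper name `cat_minus_two_rabbits` is made up for illustration:

```python
# Elo ratings copied from jdb's Aug 2nd table (elephant, cat, rabbits only).
ELO = {
    "EC6R": 2612, "EC5R": 2418, "EC4R": 2267,
    "EC3R": 1991, "EC2R": 1653, "EC1R": 1025,
    "E8R": 2551, "E7R": 2337, "E6R": 2148,
    "E5R": 1996, "E4R": 1730, "E3R": 1494,
}

def cat_minus_two_rabbits(n):
    """Elo of E + cat + n rabbits minus Elo of E + (n + 2) rabbits."""
    return ELO[f"EC{n}R"] - ELO[f"E{n + 2}R"]

for n in range(6, 0, -1):
    diff = cat_minus_two_rabbits(n)
    better = "cat" if diff > 0 else "two rabbits"
    print(f"EC{n}R vs E{n + 2}R: {diff:+d} Elo -> {better}")
```

Note that the EC3R vs E5R gap is only 5 Elo, far inside the roughly 47-point margins listed for those rows, so the exact point of the flip is only suggested by the data.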

Title: Re: (no) absolute score values for pieces?
Post by jdb on Aug 5th, 2010, 7:20pm
I added dogs to the handicap matches. It will take a couple weeks to get enough games to cover all the cases.

Title: Re: (no) absolute score values for pieces?
Post by jdb on Aug 9th, 2010, 3:40pm
Another round of testing. This time DC vs D vs CC.

The relative value of each pair depends greatly on the number of rabbits remaining.


Code:
Rank Name            Elo    +    - games score oppo. draws
  1 Clueless_EDC8R   794   44   41   774   91%     9    0%
  2 Clueless_ECC8R   730   34   32  1227   91%   -45    0%
  3 Clueless_EDC7R   635   38   36   758   83%    14    0%
  4 Clueless_EDC6R   597   37   36   750   81%    10    0%
  5 Clueless_ECC7R   542   28   28  1223   81%   -33    0%
  6 Clueless_ED8R    485   29   29  1171   80%   -95    0%
  7 Clueless_ECC6R   438   27   27  1199   74%   -26    0%
  8 Clueless_EDC5R   348   34   34   750   66%    24    0%
  9 Clueless_ECC5R   296   27   27  1162   65%   -16    0%
 10 Clueless_ED7R    286   27   27  1160   68%   -82    0%
 11 Clueless_EDC4R   171   35   35   734   54%    47    0%
 12 Clueless_ED6R    151   28   28  1156   60%   -75    0%
 13 Clueless_ECC4R   111   27   28  1162   53%    -4    0%
 14 Clueless_ED5R    -77   28   29  1156   45%   -60    0%
 15 Clueless_EDC3R   -92   37   38   733   38%    60    0%
 16 Clueless_ECC3R  -187   31   31  1159   36%    15    0%
 17 Clueless_ED4R   -230   30   31  1145   36%   -48    0%
 18 Clueless_EDC2R  -387   42   44   729   24%    72    0%
 19 Clueless_ECC2R  -400   34   35  1159   25%    28    0%
 20 Clueless_ED3R   -440   34   34  1145   26%   -34    0%
 21 Clueless_ED2R   -738   42   44  1129   14%   -10    0%
 22 Clueless_EDC1R  -887   60   64   729    7%    98    0%
 23 Clueless_ECC1R  -925   49   51  1159    7%    62    0%
 24 Clueless_ED1R  -1220   67   80  1113    2%    14    0%


Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Aug 9th, 2010, 4:50pm
Thanks for sharing, JDB.  This shows I don't know much about endgames.  I would have expected DR to be worth more than CC, but it isn't until the CC player is down to his last rabbit.  Also I would have expected that when dogs are still on the board, C is worth less than RR, but the C is worth more as long as both players still have at least 4 rabbits.
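
For a feel of how large these gaps are, the Elo differences can be turned into expected scores. The sketch below uses the standard logistic Elo curve, which is an approximation here: bayeselo fits extra parameters, and the ratings come from a whole pool rather than only from direct duels. The Elo values are copied from jdb's Aug 9th table.

```python
# Elo ratings copied from jdb's Aug 9th table.
ELO = {
    "ED8R": 485, "ED7R": 286, "ED6R": 151, "ED5R": -77,
    "ED4R": -230, "ED3R": -440, "ED2R": -738,
    "ECC7R": 542, "ECC6R": 438, "ECC5R": 296, "ECC4R": 111,
    "ECC3R": -187, "ECC2R": -400, "ECC1R": -925,
}

def expected_score(elo_diff):
    """Standard logistic Elo curve: expected score of the side rated
    elo_diff points higher (negative diff means the weaker side)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def dr_minus_cc(n):
    """Elo of E + D + (n + 1) rabbits minus Elo of E + CC + n rabbits."""
    return ELO[f"ED{n + 1}R"] - ELO[f"ECC{n}R"]

for n in range(7, 0, -1):
    diff = dr_minus_cc(n)
    print(f"ED{n + 1}R vs ECC{n}R: {diff:+d} Elo, "
          f"expected score {expected_score(diff):.2f}")
```

With these numbers the DR side is rated higher only in the last row (ED2R vs ECC1R), which matches the observation above.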

If I'm wrong about both of these things, I am probably at least correct that C is worth more than R as an initial trade, contrary to statistics from the game database suggesting otherwise.  Although who knows what results you would get from having clueless play itself head-to-head with C handicap versus R handicap?

A serious student of Arimaa (i.e. not me) would surely benefit from playing out some of these unbalanced endgames against a strong bot, both for general understanding of endgames, and in particular for understanding the value of material in endgames.

Title: Re: (no) absolute score values for pieces?
Post by jdb on Aug 9th, 2010, 7:09pm

on 08/09/10 at 16:50:51, Fritzlein wrote:
If I'm wrong about both of these things, I am probably at least correct that C is worth more than R as an initial trade, contrary to statistics from the game database suggesting otherwise. Although who knows what results you would get from having clueless play itself head-to-head with C handicap versus R handicap?


I could play some games with an initial C vs r handicap, but I am not sure how good the results would be. In these lower material situations the bots are ruthless in exploiting the advantage. That is, they know how to convert the win. With so much material remaining, the bot doesn't really know how to play either side. This leaves more room for gaps in knowledge to cloud the results.





Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Aug 9th, 2010, 11:58pm
Good point.  The results are only as significant as the player is strong, so endgames are the only realm in which computers can speak with authority.  I recall that bot random proved that an elephant is worth less than a rabbit as an initial trade.  :D

Title: Re: (no) absolute score values for pieces?
Post by jdb on Sep 1st, 2010, 3:24pm
Another round of testing.

This set includes every material combination using only DCR (and always with the E).

The results towards the top of the table are probably suspect. I'll rerun the tournament from the beginning when I'm done tuning with bot_nomhh. Towards the bottom of the table, the results should be a lot more reliable.




Code:
Rank Name              Elo    +    - games score oppo. draws
  1 Clueless_EDDCC8R  1168  102  102    98   88%   506    0%
  2 Clueless_EDDCC7R  1106   91   91   111   85%   526    0%
  3 Clueless_EDDC8R   1025   88   88   123   83%   487    0%
  4 Clueless_EDDCC6R  1013   80   80   128   80%   527    0%
  5 Clueless_EDCC8R    949   70   70   219   86%   318    0%
  6 Clueless_EDCC7R    823   61   61   224   77%   351    0%
  7 Clueless_EDDC7R    820   71   71   138   70%   509    0%
  8 Clueless_EDDCC5R   814   75   75   125   67%   519    0%
  9 Clueless_EDDC6R    802   71   71   148   70%   495    0%
 10 Clueless_EDD8R     746   69   69   148   67%   490    0%
 11 Clueless_EDCC6R    688   55   55   246   69%   363    0%
 12 Clueless_EDC8R     679   46   46   469   79%   145    0%
 13 Clueless_ECC8R     645   41   41   732   86%     7    0%
 14 Clueless_EDDCC4R   608   63   63   162   61%   433    0%
 15 Clueless_EDC7R     559   43   43   469   73%   134    0%
 16 Clueless_EDDC5R    547   63   63   159   61%   385    0%
 17 Clueless_EDD7R     512   64   64   158   56%   410    0%
 18 Clueless_EDCC5R    510   53   53   249   60%   320    0%
 19 Clueless_ECC7R     486   35   35   743   79%     5    0%
 20 Clueless_EDDCC3R   475   62   62   169   60%   322    0%
 21 Clueless_EDD6R     465   62   62   170   61%   321    0%
 22 Clueless_EDDC4R    455   61   61   163   58%   339    0%
 23 Clueless_EDC6R     431   42   42   470   66%   117    0%
 24 Clueless_ED8R      384   44   44   455   73%   -44    0%
 25 Clueless_EDCC4R    356   54   54   246   52%   277    0%
 26 Clueless_ECC6R     350   33   33   740   71%     0    0%
 27 Clueless_EC8R      328   40   40   436   66%    77    0%
 28 Clueless_EDC5R     258   41   41   469   58%    89    0%
 29 Clueless_EDD5R     219   59   59   177   53%   182    0%
 30 Clueless_ED7R      219   42   42   471   63%   -45    0%
 31 Clueless_ECC5R     192   31   31   762   59%     5    0%
 32 Clueless_EDDCC2R   179   62   62   169   58%    84    0%
 33 Clueless_EC7R      156   38   38   458   54%    61    0%
 34 Clueless_EDDC3R    147   59   59   173   54%   108    0%
 35 Clueless_EDCC3R    104   56   56   252   42%   201    0%
 36 Clueless_ED6R       72   42   42   471   56%   -74    0%
 37 Clueless_EDD4R      40   59   59   173   46%   104    0%
 38 Clueless_EDC4R      29   43   43   488   45%    75    0%
 39 Clueless_ECC4R       9   28   28   895   51%   -34    0%
 40 Clueless_EC6R       -4   27   27   779   53%   -48    0%
 41 Clueless_E8R       -63   24   24   957   50%   -73    0%
 42 Clueless_EDD3R    -138   61   61   161   46%   -66    0%
 43 Clueless_ED5R     -167   43   43   469   44%  -111    0%
 44 Clueless_EDC3R    -172   44   44   491   36%    43    0%
 45 Clueless_EC5R     -179   33   33   580   36%   -26    0%
 46 Clueless_EDDC2R   -180   59   59   168   44%   -86    0%
 47 Clueless_EDCC2R   -193   57   57   272   32%   110    0%
 48 Clueless_ECC3R    -217   30   30   892   35%   -42    0%
 49 Clueless_E7R      -248   26   26   927   48%  -305    0%
 50 Clueless_EDDCC1R  -262   62   62   157   41%  -127    0%
 51 Clueless_ED4R     -315   45   45   454   37%  -122    0%
 52 Clueless_EC4R     -366   46   46   394   64%  -608    0%
 53 Clueless_E6R      -421   42   42   449   59%  -597    0%
 54 Clueless_EDD2R    -439   66   66   143   34%  -210    0%
 55 Clueless_EDC2R    -464   50   50   468   23%    24    0%
 56 Clueless_ECC2R    -484   38   38   707   43%  -371    0%
 57 Clueless_EDDC1R   -532   70   70   136   30%  -240    0%
 58 Clueless_ED3R     -537   50   50   440   26%  -136    0%
 59 Clueless_EDCC1R   -545   68   68   251   18%    65    0%
 60 Clueless_EC3R     -573   46   46   362   56%  -702    0%
 61 Clueless_E5R      -605   44   44   384   54%  -702    0%
 62 Clueless_ED2R     -768   61   61   391   18%  -163    0%
 63 Clueless_E4R      -801   48   48   367   42%  -716    0%
 64 Clueless_EDD1R    -842   93   93   100   20%  -322    0%
 65 Clueless_EDC1R    -859   70   70   413    9%    21    0%
 66 Clueless_ECC1R    -963   48   48   635   21%  -388    0%
 67 Clueless_EC2R     -986   55   55   322   34%  -745    0%
 68 Clueless_E3R     -1021   56   56   339   32%  -734    0%
 69 Clueless_ED1R    -1280  111  111   357    3%  -120    0%
 70 Clueless_E2R     -1377   75   75   318   15%  -703    0%
 71 Clueless_EC1R    -1493   84   84   299   12%  -720    0%
 72 Clueless_E1R     -1875  141  141   304    1%  -652    0%

Title: Re: (no) absolute score values for pieces?
Post by pago on Sep 15th, 2010, 3:28pm

Quote:
The results towards the top of the table are probably suspect. I'll rerun the tournament from the beginning when I'm done tuning with bot_nomhh. Towards the bottom of the table, the results should be a lot more reliable.


Hello jdb,
I find your tests very interesting for comparing against evaluator behavior.
Indeed, some results in this last batch are suspect (EDD7R < EDC7R, for example), so I am waiting for your next tournament.

I am wondering if some of the inconsistency could be linked to a kind of non-linearity (or even intransitivity, although I am aware that it is controversial).
Imagine that setup1 < setup2 < setup3, yet setup3 has more difficulty beating setup1 than beating setup2.
In that case, I feel that you should perform all the duels a great number of times to get an accurate result.

I would also be interested in getting the results of the duels that you have performed. They are a very good reference for verifying the consistency of evaluators.
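
Once pairwise duel results are available, the stronger form of this worry (outright intransitivity) can be checked mechanically. The sketch below looks for three-setup cycles in a table of direct results. The win counts are made up purely for illustration, and `find_cycles` is a hypothetical helper, not something from jdb's tooling.

```python
from itertools import permutations

def beats(results, a, b):
    """True if a won more games than b in their direct duel."""
    return results.get((a, b), 0) > results.get((b, a), 0)

def find_cycles(results):
    """Return triples (a, b, c) with a > b, b > c and c > a.
    Each cycle is reported once, starting from its smallest name."""
    names = sorted({name for pair in results for name in pair})
    cycles = []
    for a, b, c in permutations(names, 3):
        if (a == min(a, b, c) and beats(results, a, b)
                and beats(results, b, c) and beats(results, c, a)):
            cycles.append((a, b, c))
    return cycles

# Made-up win counts: results[(x, y)] = games x won against y.
results = {
    ("setup1", "setup2"): 3, ("setup2", "setup1"): 9,
    ("setup2", "setup3"): 4, ("setup3", "setup2"): 8,
    ("setup3", "setup1"): 5, ("setup1", "setup3"): 7,
}
print(find_cycles(results))
```

With few games per duel, a cycle found this way may well be noise, which is another argument for playing each duel many times.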

Title: Re: (no) absolute score values for pieces?
Post by aaaa on Sep 20th, 2010, 7:53am
jdb, would you be willing to give the pairwise outcome matrices from now on as well? Thanks.

Title: Re: (no) absolute score values for pieces?
Post by jdb on Sep 21st, 2010, 8:43am

on 09/20/10 at 07:53:04, aaaa wrote:
jdb, would you be willing to give the pairwise outcome matrices from now on as well? Thanks.


Janzert kindly put the pgn file for the tournament on his website. It is compatible with bayeselo.

http://arimaa.janzert.com/jdb/reduced_material_result.zip

Title: Re: (no) absolute score values for pieces?
Post by pago on Sep 28th, 2010, 3:02pm

I would like to suggest one way to use jdb's work.

If we assume that jdb's results are as close as possible to the real results, we can use them to calculate error indicators for the results predicted by an evaluator.

1) Calculate all the results predicted by the evaluator for the tournament. For example, in the DCR tournament, calculate the evaluation of the 72*72 duels
2) Calculate the average evaluation of each setup
3) Get the rank estimated by the evaluator for each setup according to the average
4) Calculate error indicators, assuming that jdb's results are the real observations. I suggest the following ones (I do not have a clear idea which is the most pertinent):
- Root Mean Square Error (RMSE)
- Mean Absolute Error (MAE)
- Mean Absolute Percentage Error (MAPE)

This could be a means to perform some preliminary tests and get a preliminary "objective" performance measurement before implementing the evaluator in a bot.
Of course it would not be perfect because it depends on jdb's results accuracy (in particular some results of DCR tournament should be improved).
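
The four steps above could be sketched as follows. Everything about the evaluator here is a placeholder: the piece values in `VALUES` are hypothetical and only serve to have something to rank. The five observed Elo values are copied from jdb's DCR table. The indicators are computed on ranks, as in step 3; computing them on Elo instead would be an equally plausible reading.

```python
from math import sqrt

# Hypothetical piece values for a toy material evaluator (not from the thread).
VALUES = {"E": 10.0, "D": 4.0, "C": 2.5, "R": 1.0}

def material(setup):
    """Value of a setup string such as 'EDC4R' (officers, then rabbit count)."""
    officers, rabbits = setup[:-2], int(setup[-2])
    return sum(VALUES[p] for p in officers) + VALUES["R"] * rabbits

def evaluate(a, b):
    """Toy evaluator: predicted advantage of setup a over setup b."""
    return material(a) - material(b)

def ranks(scores):
    """Rank 1 goes to the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}

def error_indicators(predicted, observed):
    """Return (RMSE, MAE, MAPE in percent) of predicted vs observed ranks."""
    diffs = [predicted[k] - observed[k] for k in observed]
    n = len(diffs)
    rmse = sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    mape = 100 * sum(abs(predicted[k] - observed[k]) / observed[k]
                     for k in observed) / n
    return rmse, mae, mape

# Observed Elo for five setups, copied from jdb's DCR table.
observed_elo = {"ED6R": 72, "EDC4R": 29, "ECC4R": 9, "EC6R": -4, "E8R": -63}
setups = list(observed_elo)

# Steps 1-2: evaluate all duels and average per setup.
average = {a: sum(evaluate(a, b) for b in setups) / len(setups) for a in setups}
# Steps 3-4: rank both ways and compare.
rmse, mae, mape = error_indicators(ranks(average), ranks(observed_elo))
print(f"RMSE={rmse:.3f} MAE={mae:.3f} MAPE={mape:.1f}%")
```

In this toy case the evaluator only swaps ED6R and EDC4R relative to the observed order, so two of the five ranks are off by one.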

Title: Re: (no) absolute score values for pieces?
Post by pago on Sep 30th, 2010, 4:54am


Quote:
Janzert kindly put the pgn file for the tournament on his website. It is compatible with bayeselo.

http://arimaa.janzert.com/jdb/reduced_material_result.zip


I would like to thank you for sharing your results. Personally I find them very interesting and useful.


Quote:
The results towards the top of the table are probably suspect. I'll rerun the tournament from the beginning when I'm done tuning with bot_nomhh. Towards the bottom of the table, the results should be a lot more reliable.


I have found some weird results when I used your pgn file.
For example clueless got the following results :
ED8R / EDD6R : +2 /-2
ED8R / EDC6R : +0 / -12

The second result could suggest that EDC6R >> ED8R. However, the first one seems to show that ED8R and EDD6R are about equal (although it is not statistically significant).

Do you have an explanation for these results?
Could it be the result of an ineffective positional parameter? (For example, ED8R would not play properly against the cat.)
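
How significant each of these two results is can be estimated with an exact binomial test. The sketch below assumes that "+2 / -2" means 2 wins and 2 losses, that games are independent, and that there are no draws (the tables list 0% draws). The function name `two_sided_p` is mine.

```python
from math import comb

def two_sided_p(wins, games):
    """Exact two-sided binomial p-value for the hypothesis that both
    setups are equally likely to win each game."""
    probs = [comb(games, k) / 2 ** games for k in range(games + 1)]
    observed = probs[wins]
    # Add up every outcome that is at least as unlikely as the observed one.
    return sum(p for p in probs if p <= observed)

print(two_sided_p(2, 4))    # ED8R vs EDD6R: +2 / -2
print(two_sided_p(0, 12))   # ED8R vs EDC6R: +0 / -12
```

A 2-2 split carries no evidence either way, while 0 wins in 12 games would be very unlikely between equal setups, so only the second result says much on its own.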

Title: Re: (no) absolute score values for pieces?
Post by jdb on Sep 30th, 2010, 8:42am
Any of the games involving dogs are somewhat suspect. The eval did not handle them well. If there is a cycle involving only cats and rabbits, I would say it was a reliable result.

Title: Re: (no) absolute score values for pieces?
Post by Weirdo87 on Oct 10th, 2010, 3:38am

on 08/09/10 at 16:50:51, Fritzlein wrote:
A serious student of Arimaa (i.e. not me)

If Fritzlein isn't a serious student of Arimaa, who the hell is?

Title: Re: (no) absolute score values for pieces?
Post by Fritzlein on Oct 10th, 2010, 9:06am

on 10/10/10 at 03:38:16, Weirdo87 wrote:
If Fritzlein isn't a serious student of Arimaa, who the hell is?

Chessandgo. ;)

Truly, I spend a lot of time on Arimaa, but not in a disciplined way.


