Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> General Discussion >> Handicap Order - what beats what?
(Message started by: mistre on Apr 17th, 2008, 3:50pm)

Title: Handicap Order - what beats what?
Post by mistre on Apr 17th, 2008, 3:50pm
It took a while and was tedious, but I took the values from each of the 3 material evaluators at http://arimaa.janzert.com/fame.html and ranked each major type of handicap.

I say major, because I calculated there are actually 815 different handicap combinations (keeping at least 1 rabbit).  For this exercise, I did not include any combination that included both non-rabbit pieces and rabbit pieces.  This dropped the number to a more manageable 108 combinations.

My reasoning for not including the rabbit/non-rabbit combinations is that typically when handicap botbashing, you will start with the heavy pieces first and only add rabbits at the end.  So I don't think there would be any reason to compare something crazy like HDCR and CRRRRRR.  I did leave in rabbit-only handicaps to see how they compared.

After I ranked all 108 combinations for each evaluator, I summed the rankings to give a composite score.  I think this is more accurate than just using a 2/3 majority rule (which I used only as the tiebreaker).

Some interesting findings:

H = CC.  FAME and DAPE (opt) prefer CC, while DAPE prefers H.  Averaging the rankings of all 3 makes the two equivalent.

7 rabbits = MHC.  This would make a neat asymmetrical handicap match.

In Arimaa, E = MHC²

Here is the entire list from lowest handicap to highest.

R
C
D
RR
RRR
CC
H
DC
DD
RRRR
HC
DCC
HD
M
DDC
RRRRR
HCC
HH
HDC
MC
HDD
MD
DDCC
HHC
RRRRRR
HHD
MCC
HDCC
MH
HDDC
MDC
MDD
HHCC
HHDC
MHC
RRRRRRR
MDCC
MHD
HHDD
HDDCC
MDDC
E
MHCC
MHH
HHDCC
MHDC
EC
HHDDC
MHDD
MDDCC
ED
MHDCC
MHHD
ECC
MHDDC
EH
HHDDCC
EDC
EDD
MHHCC
MHHDC
EHC
EHD
MHHDD
MHDDCC
EDCC
EDDC
EHH
EM
EHCC
MHHDCC
EHDC
EHDD
MHHDDC
EMC
EMD
EHHD
EHDCC
EHDDC
EMCC
EMH
EHHCC
MHHDDCC
EMDC
EMDD
EHHDC
EHHDD
EMHC
EHDDCC
EMDCC
EMHD
EMDDC
EHHDCC
EMHCC
EHHDDC
EMHDC
EMHH
EMDDCC
EMHDD
EMHHC
EMHDCC
EMHDDC
EHHDDCC
EMHHD
EMHHCC
EMHDDCC
EMHHDC
EMHHDD
EMHHDCC
EMHHDDC
EMHHDDCC

Edit: I found 3 more, so there are 111 listed now.  3 more to find.

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 18th, 2008, 6:45am
You did a great job here!

Maybe you should put a link to this list in the handicap rules part of the botbasher page.

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 18th, 2008, 6:55am

on 04/17/08 at 15:50:51, mistre wrote:
...In Arimaa, E = MHC²...

Only in relative terms.  ;)

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 18th, 2008, 7:09am

on 04/17/08 at 15:50:51, mistre wrote:
...Oops, I think I missed a few - EMHDDC for one.  Please let me know if you see any others and I will re-do the analysis.

Actually, if you don't count the rabbits, there should be 2x2x3x3x3 = 108 combinations, or 107 if you exclude "nothing" from the list.  Plus 1 to 7 rabbits (since you need at least one rabbit to win), that gives 107 + 7 = 114 combinations.

However, your list has only 108 items, therefore there must still be 6 combinations missing.
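The count above can be checked mechanically. A minimal sketch in Python, assuming the standard army of one elephant, one camel and two each of horses, dogs and cats; the names are illustrative only. Under the same rule it also reproduces the 971 total used later in the thread (which allows removing all eight rabbits), and it gives 863 handicaps when at least one rabbit must stay on the board, so the 815 figure at the start of the thread presumably rests on a different counting convention.

```python
# Most of each non-rabbit piece type a handicap can remove
# (one elephant, one camel, two each of horses, dogs and cats).
NON_RABBIT_MAX = {"E": 1, "M": 1, "H": 2, "D": 2, "C": 2}


def count_non_rabbit_combos(include_empty: bool) -> int:
    """Ways to remove some selection of the non-rabbit pieces."""
    total = 1
    for max_count in NON_RABBIT_MAX.values():
        total *= max_count + 1  # remove 0..max_count of this type
    return total if include_empty else total - 1


# Arimabuff's count: non-rabbit-only handicaps plus 1 to 7 rabbits alone.
pure_handicaps = count_non_rabbit_combos(include_empty=False) + 7

# Every handicap, mixed ones included, removing 0..8 rabbits, minus "nothing".
all_handicaps = count_non_rabbit_combos(include_empty=True) * 9 - 1

# Same, but keeping at least one rabbit (removing 0..7 rabbits).
keep_one_rabbit = count_non_rabbit_combos(include_empty=True) * 8 - 1

print(pure_handicaps, all_handicaps, keep_one_rabbit)  # 114 971 863
```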

Title: Re: Handicap Order - what beats what?
Post by chessandgo on Apr 18th, 2008, 11:24am
Thanks for this interesting list, Mistre.

I went through the start of the list, and I disagree with a few things. For instance, it does not make sense, in my opinion, that H<CC while HH>HCC. Having one piece fewer makes the number of pieces even more important. I'm not sure what I'd prefer between H and CC, but I definitely prefer having HCC to HH.

Similarly, M and HD are probably roughly equal, but MD should be worth less than HDD, by the same principle that the fewer pieces you have, the more their number matters relative to their strength.

Or are there theoretical arguments for the converse?

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 18th, 2008, 11:35am

on 04/18/08 at 11:24:17, chessandgo wrote:
Thanks for this interesting list, Mistre.

I went through the start of the list, and I disagree with a few things. For instance, it does not make sense, in my opinion, that H<CC while HH>HCC. Having one piece fewer makes the number of pieces even more important. I'm not sure what I'd prefer between H and CC, but I definitely prefer having HCC to HH.

Similarly, M and HD are probably roughly equal, but MD should be worth less than HDD, by the same principle that the fewer pieces you have, the more their number matters relative to their strength.

Or are there theoretical arguments for the converse?

In my view, the NUMBER of pieces outweighs their relative strengths in endgames, when what's important is the ground you're covering. At the beginning of a game, it's pretty much the reverse. We all know how deadly it can be if you block someone's elephant, even if you commit half of your pieces to the task! Meanwhile, at the end of a game, an elephant can't even outrun two distant rabbits. Therefore, you might try to think of these pieces in the context of a cluttered board, and it might change your perspective a little.

Title: Re: Handicap Order - what beats what?
Post by mistre on Apr 18th, 2008, 11:59am
Chessandgo - Thanks for your comments.  

The results are only as good as the three current models.  What I think this list does for the first time though is combine all 3 model evaluators into one composite ranking.

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 18th, 2008, 1:36pm

on 04/18/08 at 11:59:34, mistre wrote:
Chessandgo - Thanks for your comments.  

The results are only as good as the three current models.  What I think this list does for the first time though is combine all 3 model evaluators into one composite ranking.

I think this is the right thing to do. If we try to make a list by taking each individual's opinion, it’ll take ten years to come up with an order that everyone will hate. So you might as well refer to an objective indicator like this one and stick to it.

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 18th, 2008, 2:09pm

on 04/18/08 at 11:59:34, mistre wrote:
The results are only as good as the three current models.

Yeah, and I freely admit that FAME stinks.  The best I can say for it is that FAME is better than a static piece evaluation, which is what I was competing against when I designed it.  Oh, and it is well-defined.  Chessandgo is absolutely right that after a trade of H for CC, the side with the cats benefits more from trading H for H.  I'm pretty sure H > CC but I don't know about HH for HCC.  It's painful to imagine having that kind of discussion a hundred times to get the handicap order right. :P

Title: Re: Handicap Order - what beats what?
Post by mistre on Apr 18th, 2008, 4:25pm

on 04/18/08 at 14:09:02, Fritzlein wrote:
Yeah, and I freely admit that FAME stinks.  


The worst thing about FAME is that once you get to multiple-piece handicaps, it rates many of them as equivalent.

For example, it ranks the 4-piece handicaps EHCC through EMHH as equal.  This first occurs with HDC and HDD.

Title: Re: Handicap Order - what beats what?
Post by aaaa on Apr 18th, 2008, 5:11pm
I'm planning to come up with my own material evaluator in the form of a multilayer perceptron based on game statistics. Here's hoping it will have better output than garbage.

Title: Re: Handicap Order - what beats what?
Post by 99of9 on Apr 18th, 2008, 7:06pm

on 04/18/08 at 11:24:17, chessandgo wrote:
I went through the start of the list, and I disagree with a few things. For instance, it does not make sense, in my opinion, that H<CC while HH>HCC. Having one piece fewer makes the number of pieces even more important. I'm not sure what I'd prefer between H and CC, but I definitely prefer having HCC to HH.

Or are there theoretical arguments for the converse?

Yes, I have a theoretical argument for the converse.

I agree with you that as both players' pieces are reduced, the number of pieces starts mattering more.

BUT, there is also something in the detail about which pieces you have.  It's good to have a piece equal to the pieces your opponent has an excess of.  One horse can partially neutralize an opponent with two horses, but zero horses cannot neutralize an opponent with one horse.

So after an H for CC trade, I would agree that it is beneficial for the CC player to trade rabbits, but it could well be beneficial for the H player to trade away one of his H's for the last of his opponent's.

(This is the reason for the "Equals" term in DAPE, which depreciates the value of a piece according to how many opponent's pieces are equal to it.)

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 18th, 2008, 9:17pm

on 04/18/08 at 19:06:51, 99of9 wrote:
So after an H for CC trade, I would agree that it is beneficial for the CC player to trade rabbits, but it could well be beneficial for the H player to trade away one of his H's for the last of his opponent's.

That's a good point, and one that would weigh even more heavily if the camels were missing.  Maybe I'll have to reconsider my intuition.

Title: Re: Handicap Order - what beats what?
Post by Janzert on Apr 18th, 2008, 10:39pm

on 04/17/08 at 15:50:51, mistre wrote:
After I ranked all 108 combinations for each evaluator, I summed the rankings to give a composite score.  I think this is more accurate than just doing a 2/3 majority rule (this was the tiebreaker).


While I don't want to step into the middle of the controversy over what order things should be ranked in, I do think this approach is flawed because of the way I currently present the material eval scores on that page.

The scores shown are normalized to the first rabbit captured (i.e. all the methods show 1.0 for it)1. But the three methods still operate on different scales (i.e. once there is only a rabbit left, FAME gives 162, DAPE 284 and DAPE(eo) 56). This means that for the high-end sacrifices, the above summation method essentially gives DAPE 1.75 times the weight of FAME and 5 times the weight of DAPE(eo). So for the current numbers, a simple majority count as 99of9 originally proposed is almost certainly better.

I have several times in the past thought about rescaling the numbers so they could be more directly compared. But the problem is that while it's very natural to say that the first rabbit should be 1.0, there isn't any similar self-evident spot to pin the top end. Or, looked at another way, the current numbers directly mean DAPE thinks one rabbit remaining is like 284 initial rabbits while DAPE(eo) thinks it's like 56 initial rabbits. Scaling the methods to all go 1 through 100 or some such makes the numbers a little more abstract but more directly comparable. Maybe I'll just modify it to show both the current (initial rabbit) number and a new (scaled range) number.

Janzert

1 The actual raw values for the initial rabbit are 33.69 for FAME, 4.28 for DAPE and 10.44 for DAPE(eo). So had the raw numbers been left on the page DAPE would have had a smaller weighting while FAME and DAPE(eo) would have had larger weightings in the above calculation.
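Janzert's weighting argument is simple arithmetic on the endpoint values he quotes. A minimal sketch in Python, assuming only those quoted "one rabbit left" values; the linear mapping onto 1 to 100 is one hypothetical reading of the rescaling he floats, and the helper names are illustrative.

```python
# "Only one rabbit left" scores quoted above, on the page's current scale,
# where losing the first rabbit scores 1.0.
TOP_END = {"FAME": 162.0, "DAPE": 284.0, "DAPE(eo)": 56.0}


def implied_weight(a: str, b: str) -> float:
    """How many times more weight evaluator a gets than b when raw scores are summed."""
    return TOP_END[a] / TOP_END[b]


def rescale(score: float, top: float, lo: float = 1.0, hi: float = 100.0) -> float:
    """Hypothetical linear map of a score from [1.0, top] onto [lo, hi]."""
    return lo + (score - 1.0) * (hi - lo) / (top - 1.0)


print(round(implied_weight("DAPE", "FAME"), 2))      # 1.75
print(round(implied_weight("DAPE", "DAPE(eo)"), 2))  # 5.07
print(rescale(284.0, TOP_END["DAPE"]))               # 100.0
```

A linear rescale only pins the two endpoints together; the evaluators can still disagree about the shape of the curve between them.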

Title: Re: Handicap Order - what beats what?
Post by mistre on Apr 18th, 2008, 11:00pm
Janzert,

Let me clarify my ranking method.  I did not use overall raw numbers but rankings instead.  I ranked all 108 (now 111) handicaps, 1 through 111, for each of the three measures.  I then summed the three ranks for each handicap.  The handicaps were then ranked from lowest total score to highest total score.

So overall raw numbers didn't matter, just what order each of the 3 evaluators placed each handicap.
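The rank-sum procedure described above (essentially a Borda count) fits in a few lines. A minimal sketch in Python, assuming each evaluator's output is a list of handicap names ordered from smallest to largest handicap; the function name and the exact tie-breaking rules are illustrative, not mistre's actual spreadsheet. With three evaluators a head-to-head majority always exists for two tied items, but among three or more tied items it can be cyclic, in which case the sorted order among them is arbitrary.

```python
from functools import cmp_to_key
from typing import List


def rank_sum(orderings: List[List[str]]) -> List[str]:
    """Combine evaluator orderings (smallest handicap first) by summing ranks.

    Every ordering must contain the same items. Ties in the rank sum are broken
    by a head-to-head majority of the evaluators, then by name so the output is
    deterministic.
    """
    positions = [{item: i + 1 for i, item in enumerate(order)} for order in orderings]
    totals = {item: sum(pos[item] for pos in positions) for item in orderings[0]}

    def compare(a: str, b: str) -> int:
        if a == b:
            return 0
        # Primary key: a lower rank sum means a smaller handicap.
        if totals[a] != totals[b]:
            return -1 if totals[a] < totals[b] else 1
        # Tiebreaker: how many evaluators place a before b.
        a_first = sum(1 for pos in positions if pos[a] < pos[b])
        b_first = len(positions) - a_first
        if a_first != b_first:
            return -1 if a_first > b_first else 1
        # Only reachable with an even number of evaluators.
        return -1 if a < b else 1

    return sorted(orderings[0], key=cmp_to_key(compare))


print(rank_sum([["Y", "X", "Z"], ["Y", "X", "Z"], ["X", "Z", "Y"]]))  # ['Y', 'X', 'Z']
```

In the toy example, X and Y both total 5, and the 2-to-1 majority puts Y first.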



Title: Re: Handicap Order - what beats what?
Post by 99of9 on Apr 18th, 2008, 11:22pm
Ahh, thanks for the clarification mistre, that sounds like a much better method than what I (and Janzert) assumed you were doing.  In fact I think I even agree with you that it is a better method than my "majority of evals" method (although yours is obviously much more time consuming).

Title: Re: Handicap Order - what beats what?
Post by chessandgo on Apr 19th, 2008, 3:38am

on 04/18/08 at 13:36:17, Arimabuff wrote:
I think this is the right thing to do. If we try to make a list by taking each individual's opinion, it’ll take ten years to come up with an order that everyone will hate. So you might as well refer to an objective indicator like this one and stick to it.



on 04/18/08 at 11:59:34, mistre wrote:
Chessandgo - Thanks for your comments.  

The results are only as good as the three current models.  What I think this list does for the first time though is combine all 3 model evaluators into one composite ranking.


Well, I wasn't suggesting in the least that the ranking should be any different; you did a great job, Mistre. I just wanted to take advantage of this thread to hear from other players how they approach unbalanced trades.


Title: Re: Handicap Order - what beats what?
Post by chessandgo on Apr 19th, 2008, 3:43am

on 04/18/08 at 19:06:51, 99of9 wrote:
Yes, I have a theoretical argument for the converse.

I agree with you that as both players' pieces are reduced, the number of pieces starts mattering more.

BUT, there is also something in the detail about which pieces you have.  It's good to have a piece equal to the pieces your opponent has an excess of.  One horse can partially neutralize an opponent with two horses, but zero horses cannot neutralize an opponent with one horse.

So after an H for CC trade, I would agree that it is beneficial for the CC player to trade rabbits, but it could well be beneficial for the H player to trade away one of his H's for the last of his opponent's.

(This is the reason for the "Equals" term in DAPE, which depreciates the value of a piece according to how many opponent's pieces are equal to it.)


Yes, thanks for this point, Toby. But as Karl says, I would consider this to be significant only if the camels were gone. Moreover, if camels get traded, the number of pieces decreases again, making the extra piece even more interesting ... in my opinion. But I see your point.

So do you think an early exchange of HH for HCC favors the side that is down HCC?

Title: Re: Handicap Order - what beats what?
Post by 99of9 on Apr 19th, 2008, 5:17am

on 04/19/08 at 03:43:40, chessandgo wrote:
So do you think an early exchange of HH for HCC favors the side that is down HCC?

I'm not sure.  It's close.  All I'm saying is that there's something in each player's favour.

I agree that the factor is much stronger when the camels are missing (all 3 evals also agree on this).  But even the fact that a camel trade is now favourable for the hcc player makes his position a little easier, because the other player has to watchfully prevent a camel trade.

Title: Re: Handicap Order - what beats what?
Post by Janzert on Apr 19th, 2008, 5:51am

on 04/18/08 at 23:00:25, mistre wrote:
Let me clarify my ranking method.  I did not use overall raw numbers but rankings instead.  I ranked all 108 (now 111) handicaps, 1 through 111, for each of the three measures.  I then summed the three ranks for each handicap.  The handicaps were then ranked from lowest total score to highest total score.


Ahh, yes that does sound like a better method.

Janzert

Title: Re: Handicap Order - what beats what?
Post by woh on Apr 19th, 2008, 6:50am
Great list, mistre!


on 04/17/08 at 15:50:51, mistre wrote:
3 more to find.

I believe those are
MHHC
EHHC
EDDCC

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 19th, 2008, 1:47pm

on 04/18/08 at 23:00:25, mistre wrote:
Let me clarify my ranking method.  I did not use overall raw numbers but rankings instead.  I ranked all 108 (now 111) handicaps, 1 through 111, for each of the three measures.  I then summed the three ranks for each handicap.  The handicaps were then ranked from lowest total score to highest total score.

So am I correct to state that we can't infer from your list whether EMHHDDCCRRRRR or EMHHDDCRRRRRRR is the greater handicap?  It looks like, in order to get a relative ranking between the two, you would have to rerun with 113 handicaps in the list.

Would it also be correct to say that adding alternatives could change the relative rankings of items that are already in your list?  For example, in the partial lists

ABC
ACB
ACB

your method would have C ahead of B overall, but if new alternatives came in like

ABDEFC
ADEFCB
ADEFCB

then the combined ranking would have B ahead of C, right?

I guess my point is that relative values of handicaps is an unstable way to do it, so to be fair one would presumably have to have _all_ handicaps in the list, or else raise the question of why one list and not another.
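Fritzlein's example can be checked directly. A small sketch in Python; each ordering is written as a string of one-letter items, as in the post, and the helper name is illustrative. It shows the rank-sum method failing what voting theorists call independence of irrelevant alternatives: adding D, E and F, which leave every evaluator's B-versus-C preference untouched, flips the combined order of B and C.

```python
from typing import Dict, List


def rank_totals(orderings: List[str]) -> Dict[str, int]:
    """Sum each item's 1-based position over all orderings (lower = earlier)."""
    totals: Dict[str, int] = {}
    for order in orderings:
        for position, item in enumerate(order, start=1):
            totals[item] = totals.get(item, 0) + position
    return totals


short = rank_totals(["ABC", "ACB", "ACB"])
extended = rank_totals(["ABDEFC", "ADEFCB", "ADEFCB"])

print(short["B"], short["C"])        # 8 7   -> C ahead of B
print(extended["B"], extended["C"])  # 14 16 -> B ahead of C
```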

Title: Re: Handicap Order - what beats what?
Post by aaaa on Apr 19th, 2008, 2:07pm
Take the ranking generated by a Condorcet method with the chosen material evaluators being the voters and the 971 possible handicaps being the candidates. Problem solved.

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 19th, 2008, 2:43pm

on 04/19/08 at 13:47:57, Fritzlein wrote:
So am I correct to state that we can't infer from your list whether EMHHDDCCRRRRR or EMHHDDCRRRRRRR is the greater handicap?  It looks like, in order to get a relative ranking between the two, you would have to rerun with 113 handicaps in the list.

Would it also be correct to say that adding alternatives could change the relative rankings of items that are already in your list?  For example, in the partial lists

ABC
ACB
ACB

your method would have C ahead of B overall, but if new alternatives came in like

ABDEFC
ADEFCB
ADEFCB

then the combined ranking would have B ahead of C, right?

I guess my point is that relative values of handicaps is an unstable way to do it, so to be fair one would presumably have to have _all_ handicaps in the list, or else raise the question of why one list and not another.

Karl, I believe that for handicaps where the pieces are all but depleted, we should look at it the other way around, that is, in this case, ask whether having only CR in your camp is better than RRR. I think it makes things clearer to see. If we have only either an army of three rabbits to fight with or a cat and a rabbit, we all know which one is best, don't we?

I think when we consider it from the side of the handicap, MHHDDCRRRRRRR versus MHHDDCCRRRRR, the program will suffer from side-effect miscalculations, due to the big accumulation of pieces that each add their own uncertainty.

When you look at it as RRR versus CR, it suddenly looks plain and simple, but maybe that’s not what we want?

I say it makes sense to look at things from their clearer and simpler perspective, that’s how we’ve been taught to organize our thoughts.

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 19th, 2008, 2:53pm
If my previous post is correct, the question of whether CR beats RRR was not decided at the time that Arimabuff and 99of9 set their respective records against Gnobot2005P1.  That means we now need to decide which is better, either directly, or by deciding on a methodology and accepting the result.

That is tricky enough, but there is an added complication that we are setting the finish line after the race has been run.  Arimabuff stopped running after reducing the handicap to CR, because he thought the race was over.  Unless Gnobot2005P1 can be beaten with only RR, the race is really over now, and the decision we make will determine who holds the all-time record.  We can't decide that RRR was the greater feat now without being unfair to Arimabuff.  On the other hand, we can't start the discussion knowing what we have to conclude, or it isn't a real discussion.

The more I think about whether CR or RRR is better, the more it seems to me that the two records are incommensurate, and the Hall of Fame should therefore reflect both.  More generally, one handicap should not replace another in the Hall of Fame unless one handicap includes all the pieces of the other, plus more, or the two handicaps are equal in number of pieces and one is strictly better in strength.

To illustrate my reasoning, consider the following examples.  In the first pair of boards (Silver to move), CR is clearly weaker than RRR:


 +---+---+---+---+---+---+---+---+
8 |   |   |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+
7 |   |   |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+
6 |   |   | * |   |   | * |   |   |
 +---+---+---+---+---+---+---+---+
5 | c |   |   | M | E |   |   | r |
 +---+---+---+---+---+---+---+---+
4 | R |   |   | H | H |   |   | R |
 +---+---+---+---+---+---+---+---+
3 |   |   | * | D | D | * |   |   |
 +---+---+---+---+---+---+---+---+
2 |   |   |   | C | C |   |   |   |
 +---+---+---+---+---+---+---+---+
1 |   | R | R | R | R | R | R |   |
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h

 +---+---+---+---+---+---+---+---+
8 |   |   |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+
7 |   |   |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+
6 |   |   | * |   |   | * |   |   |
 +---+---+---+---+---+---+---+---+
5 | r | r |   | M | E |   |   | r |
 +---+---+---+---+---+---+---+---+
4 | R |   |   | H | H |   |   | R |
 +---+---+---+---+---+---+---+---+
3 |   |   | * | D | D | * |   |   |
 +---+---+---+---+---+---+---+---+
2 |   |   |   | C | C |   |   |   |
 +---+---+---+---+---+---+---+---+
1 |   | R | R | R | R | R | R |   |
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h


In the second pair of boards (Silver to move), CR is clearly stronger than RRR:

 +---+---+---+---+---+---+---+---+
8 |   |   |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+
7 | r | c |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+
6 | R | R | * |   |   | * |   |   |
 +---+---+---+---+---+---+---+---+
5 |   |   |   | M | E |   |   |   |
 +---+---+---+---+---+---+---+---+
4 |   |   |   | H | H |   |   |   |
 +---+---+---+---+---+---+---+---+
3 |   |   | * | D | D | * |   |   |
 +---+---+---+---+---+---+---+---+
2 |   |   |   | C | C |   |   |   |
 +---+---+---+---+---+---+---+---+
1 |   |   | R | R | R | R | R | R |
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h


 +---+---+---+---+---+---+---+---+
8 |   |   |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+
7 | r | r | r |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+
6 | R | R | * |   |   | * |   |   |
 +---+---+---+---+---+---+---+---+
5 |   |   |   | M | E |   |   |   |
 +---+---+---+---+---+---+---+---+
4 |   |   |   | H | H |   |   |   |
 +---+---+---+---+---+---+---+---+
3 |   |   | * | D | D | * |   |   |
 +---+---+---+---+---+---+---+---+
2 |   |   |   | C | C |   |   |   |
 +---+---+---+---+---+---+---+---+
1 |   |   | R | R | R | R | R | R |
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h


We can't say in advance what combination of pieces is going to be most useful, because we don't know in advance what portion of the bot's army will be deployed, whether it will advance rabbits sooner or later, on one side of the board, or both, etc.

The material handicap formulas weren't designed for extreme situations such as a whole army against three little pieces.  A handicap win is only possible because the bot doesn't use its whole army.  If we use the formulas at all in this case, it might make more sense to ask whether RRR or CR is stronger against an opposing army of RRR.  But then for a different bot that used its pieces differently, the true difficulty of each handicap would be reflected by a different formula.

Why not use a partial order (as we mathematicians call it) and keep both records where there is no clear order between them?  One might say that we would have too many records per bot, but in practice I doubt it would get beyond two or three.  Under my proposal we can't say whether a handicap of M or RRRR is better, but if both are possible then probably so is MRRR, which trumps them both.  If we ended up with cases where MD was possible and also HH, but not MH, then why do we need to split hairs?  Why not recognize both?

My hunch is that this will result in basically two handicap records per bot: a strength handicap and a numbers handicap.  On the one side you'll be trying to win with just EMR, and on the other trying to win with just CRRRRRRR or something like that.  For the bots that get extremely bashed, the two may even converge.

What do you guys think?  Is it worth a try?
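The replacement rule proposed above defines a partial order on handicaps. A minimal sketch in Python; the numeric strengths and the strongest-to-strongest pairing used for the "equal in number, strictly better in strength" case are an assumed reading of the proposal, not something spelled out in the post, and the function names are hypothetical. It reproduces the examples given: M and RRRR are incomparable, MRRR trumps both, MD and HH are incomparable, and the CR-left versus RRR-left records stay separate.

```python
from collections import Counter

# Arimaa piece strengths: elephant > camel (M) > horse > dog > cat > rabbit.
STRENGTH = {"E": 6, "M": 5, "H": 4, "D": 3, "C": 2, "R": 1}


def dominates(x: str, y: str) -> bool:
    """True if handicap x is unambiguously larger than handicap y.

    Hypothetical formalization of the proposal:
    1. x removes every piece y removes, plus more; or
    2. x and y remove equally many pieces and, pairing them strongest to
       strongest, no piece of x is weaker than its partner in y while at
       least one is stronger.
    """
    if len(x) > len(y):
        have = Counter(x)
        return all(have[piece] >= n for piece, n in Counter(y).items())
    if len(x) == len(y):
        sx = sorted((STRENGTH[p] for p in x), reverse=True)
        sy = sorted((STRENGTH[p] for p in y), reverse=True)
        return sx != sy and all(a >= b for a, b in zip(sx, sy))
    return False


def comparable(x: str, y: str) -> bool:
    """Whether one handicap record could replace the other."""
    return dominates(x, y) or dominates(y, x)


print(comparable("M", "RRRR"))                             # False: keep both records
print(dominates("MRRR", "M"), dominates("MRRR", "RRRR"))   # True True
print(comparable("MD", "HH"))                              # False
# CR left vs RRR left, written as the pieces removed:
print(comparable("EMHHDDCRRRRRRR", "EMHHDDCCRRRRR"))       # False
```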

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 19th, 2008, 3:02pm

on 04/19/08 at 14:07:32, aaaa wrote:
Take the ranking generated by a Condorcet method with the chosen material evaluators being the voters and the 971 possible handicaps being the candidates. Problem solved.

Are you ranking each smaller army by how much it is less than a full army?  That provides clarity, but as Arimabuff points out, the greater the mismatch, the more likely the numbers are to be meaningless.  FAME, at least, wasn't designed for extreme situations.

Or do you mean ranking each army head to head?  As Arimabuff suggests, we could enter CR vs. RRR directly into each evaluator.  Unfortunately, FAME can have circular preferences, i.e. head-to-head preferences don't produce a strict ordering of all possible armies.  For example, CCR > MR > DRR > CCR.  So the formulas may be more accurate that way, but the overall ordering is less clear.
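The problem with head-to-head comparisons can be made concrete: a strict ranking of armies exists only if the "beats" relation has no cycles. A small sketch using Python's standard `graphlib` module (Python 3.9+); the three results are the FAME example quoted above, and the helper name `strict_order` is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def strict_order(beats: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Order armies strongest first, or return None if the results contain a cycle.

    beats maps each army to the set of armies it beats head to head.
    """
    # TopologicalSorter wants node -> predecessors; an army's predecessors
    # are the armies that beat it.
    graph: Dict[str, Set[str]] = {army: set() for army in beats}
    for winner, losers in beats.items():
        for loser in losers:
            graph.setdefault(loser, set()).add(winner)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


# The FAME example quoted above: CCR > MR > DRR > CCR.
print(strict_order({"CCR": {"MR"}, "MR": {"DRR"}, "DRR": {"CCR"}}))  # None
print(strict_order({"CCR": {"MR"}, "MR": {"DRR"}}))  # ['CCR', 'MR', 'DRR']
```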

Title: Re: Handicap Order - what beats what?
Post by aaaa on Apr 19th, 2008, 3:11pm

on 04/19/08 at 15:02:07, Fritzlein wrote:
Are you ranking each smaller army by how much it is less than a full army?  That provides clarity, but as Arimabuff points out, the greater the mismatch, the more likely the numbers are to be meaningless.  FAME, at least, wasn't designed for extreme situations.

You might want to let additional material evaluators weigh in then, like optimized FAME, RabbitCurveABC, LinearAB, etc.

Title: Re: Handicap Order - what beats what?
Post by mistre on Apr 19th, 2008, 3:16pm
Wow, a lot was said since I signed off yesterday...

After some more thought, I don't plan on re-working my ranking analysis to see if RRR or CR remaining is better.  As Karl said, I can't just add those two scenarios to the list without adding all other 700+ scenarios that mix rabbits/non-rabbit pieces.

My original list is far from perfect, but I stand by using it over just 2/3 majority of evals.  For 99% of the cases, it should work just fine.  For a case like RRR vs CR remaining, it will not and we need another solution.

aaaa mentions using the Condorcet method.  I don't know how to use that, so if anyone wants to give it a go with all 971 combinations, be my guest.


Title: Re: Handicap Order - what beats what?
Post by aaaa on Apr 19th, 2008, 3:20pm

on 04/19/08 at 15:02:07, Fritzlein wrote:
Or do you mean ranking each army head to head?

No. We call it a handicap because one is playing with less than a full army against the full one of the opponent, so determining how large the handicap is would logically entail quantifying the resulting disadvantage between the two armies.

Title: Re: Handicap Order - what beats what?
Post by aaaa on Apr 19th, 2008, 3:27pm

on 04/19/08 at 15:16:30, mistre wrote:
aaaa mentions using the Condorcet method.  I don't know how to use that, so if anyone wants to give it a go with all 971 combinations, be my guest.

Give me the evaluation functions you want to count (preferably a large and odd number, like, say, 5) and I'll try to give you the relative ranking as given by the Schulze(margins) method.
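For readers unfamiliar with it, here is a compact sketch of the Schulze method with margins in Python, assuming every evaluator supplies a complete ordering; the function name and the alphabetical tie-break are illustrative choices, not aaaa's actual program. Applied to Fritzlein's toy lists from earlier in the thread, in this example it keeps C ahead of B whether or not D, E and F are present, since two of the three lists prefer C to B either way.

```python
from typing import Dict, List


def schulze_ranking(ballots: List[List[str]]) -> List[str]:
    """Rank candidates by the Schulze method with margins as link strengths.

    Each ballot is one evaluator's complete ordering, best (smallest
    handicap) first. Candidates tied in the Schulze relation are listed
    alphabetically.
    """
    candidates = sorted(ballots[0])
    positions = [{c: i for i, c in enumerate(b)} for b in ballots]

    # d[a][b]: number of ballots ranking a above b.
    d: Dict[str, Dict[str, int]] = {
        a: {b: sum(r[a] < r[b] for r in positions) for b in candidates if b != a}
        for a in candidates
    }
    # Direct link strength: a's winning margin over b, or 0 if a does not win.
    p = {a: {b: max(d[a][b] - d[b][a], 0) for b in d[a]} for a in candidates}

    # Strongest (widest) paths, Floyd-Warshall style.
    for k in candidates:
        for i in candidates:
            for j in candidates:
                if len({i, j, k}) == 3:
                    p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))

    # a is ranked above b when p[a][b] > p[b][a]; count such wins.
    wins = {a: sum(p[a][b] > p[b][a] for b in p[a]) for a in candidates}
    return sorted(candidates, key=lambda c: (-wins[c], c))


print(schulze_ranking([list("ABC"), list("ACB"), list("ACB")]))
# ['A', 'C', 'B']
print(schulze_ranking([list("ABDEFC"), list("ADEFCB"), list("ADEFCB")]))
# ['A', 'D', 'E', 'F', 'C', 'B']
```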

Title: Re: Handicap Order - what beats what?
Post by mistre on Apr 19th, 2008, 3:31pm

on 04/19/08 at 06:50:54, woh wrote:
Great list, mistre!

I believe those are
MHHC
EHHC
EDDCC


Great eyes, Woh!  I appreciate it!  Next time I get a chance, I will add the missing handicaps.

Title: Re: Handicap Order - what beats what?
Post by aaaa on Apr 19th, 2008, 6:19pm
Unfortunately, using the 3 material evaluators on Janzert's page gives me a considerable number of ties. Nevertheless, here is the result I get:

001. No handicap
002. R
003. C
004. D
005. RR
006. CR
007. DR
008. RRR
009. H
010. CC
011. CRR
012. DC
013. DRR
014. CCR
""". DD
""". HR
""". RRRR
018. CRRR
019. DCR
020. HC
021. CCRR
""". DCC
""". DDR
""". DRRR
""". HRR
026. HD
027. CRRRR
""". DCRR
""". HCR
""". M
""". RRRRR
032. DDC
033. HCC
""". HDR
""". MR
036. DCCR
""". DDRR
""". HRRR
039. CCRRR
040. DRRRR
041. HCRR
""". HDC
043. DDCR
044. CRRRRR
""". DCRRR
""". HDD
""". HH
""". RRRRRR
049. DCCRR
""". DDRRR
""". HCCR
""". HDRR
""". MC
054. HRRRR
""". MD
""". MRR
057. DDCC
""". HHR
059. CCRRRR
""". MCR
061. DRRRRR
""". HDCR
063. DDCRR
""". HCRRR
065. RRRRRRR
066. HHC
""". MH
068. DCRRRR
""". MCC
""". MDR
071. HDDR
072. HHD
073. HCCRR
074. DCCRRR
""". HDRRR
""". MRRR
077. CRRRRRR
078. DDCCR
""". HDCC
""". HHRR
""". MDC
082. E
083. DDRRRR
""". HRRRRR
085. HDDC
086. MCRR
087. HDCRR
088. DDCRRR
089. DRRRRRR
""". HHCR
091. HCRRRR
092. MDD
093. RRRRRRRR
094. CCRRRRR
""". MHR
096. HCCRRR
097. MCCR
""". MDRR
099. HDDRR
100. DDCCRR
""". HHDR
""". HHRRR
103. HDCCR
""". MRRRR
105. HHCC
106. DCRRRRR
""". HDRRRR
""". MHC
109. CRRRRRRR
""". DCCRRRR
""". MDCR
112. ER
""". HRRRRRR
114. HDCRRR
115. HDDCR
116. HHDC
""". MCRRR
118. HHCRR
119. MDDR
""". MHD
121. DDRRRRR
122. DDCRRRR
123. DRRRRRRR
""". EC
125. MCCRR
""". MDCC
""". MDRRR
128. CRRRRRRRR
""". HHDD
""". MHRR
131. HDDRRR
132. CCRRRRRR
""". HDDCC
134. HDCCRR
135. HCRRRRR
""". HHDRR
137. DDCCRRR
""". HCCRRRR
139. CCRRRRRRR
""". HHCCR
141. MRRRRR
142. HHRRRR
143. ED
144. MHCR
145. MDCRR
""". MDDC
147. DCRRRRRR
148. DRRRRRRRR
149. HDDCRR
""". HDRRRRR
151. HHCRRR
152. DCRRRRRRR
""". HHDCR
""". MCRRRR
""". MHH
156. ERR
""". HDCRRRR
158. DCCRRRRR
""". DCCRRRRRR
""". HRRRRRRR
161. MDDRR
""". MHDR
163. MHCC
164. HHDDR
165. MCCRRR
166. DDRRRRRR
""". HDCCRRR
""". MDCCR
""". MHRRR
170. ECR
171. HDDRRRR
""". HHCCRR
""". MDRRRR
174. HHDRRR
175. HDDCCR
176. HHDCC
177. EH
178. MDDCR
179. DDCRRRRR
""". DDCRRRRRR
""". DDRRRRRRR
""". HCRRRRRR
183. MDCRRR
""". MHCRR
""". MRRRRRR
186. MHDC
187. DDCCRRRRR
""". HHDDC
189. DDCCRRRR
""". HDDCRRR
191. HRRRRRRRR
""". MHHR
193. EDR
194. HHDCRR
195. ECC
""". HHCRRRR
""". HHRRRRR
198. HCCRRRRR
199. HCCRRRRRR
""". HCRRRRRRR
201. MDDRRR
""". MHDD
""". MHDRR
204. MCRRRRR
205. HDRRRRRR
206. CCRRRRRRRR
""". ERRR
""". MDCCRR
""". MDDCC
""". MHCCR
211. EDC
""". HHDDRR
213. MHHC
214. HDCRRRRR
""". HDDCCRR
""". HHDRRRR
""". MCCRRRR
""". MHRRRR
219. HHCCRRR
""". MHHD
221. HDCCRRRR
222. HHDCCR
223. ECRR
224. MDRRRRR
225. MDDCRR
""". MHDCR
227. HDDRRRRR
228. DCRRRRRRRR
""". HDCRRRRRR
230. HDCCRRRRR
""". MHCRRR
232. EDD
""". MDCRRRR
234. HDDRRRRRR
""". HDRRRRRRR
236. HDDCRRRRR
237. EHR
""". HHDDCR
""". MHHRR
240. DCCRRRRRRR
""". HHDCRRR
242. HDDCRRRR
243. MDDRRRR
244. EDRR
245. MHCCRR
""". MRRRRRRR
247. ECCR
""". MDCCRRR
""". MHDDR
""". MHDRRR
251. MDDCCR
252. HHRRRRRR
253. HHCRRRRR
""". HHDDRRR
255. MHDCC
256. EHC
""". HDDCCRRRR
258. HHCCRRRR
""". MHHCR
260. DDRRRRRRRR
261. HHDCCRR
""". MDDCRRR
263. DDCRRRRRRR
""". MCCRRRRR
""". MHRRRRR
266. ERRRR
""". HDDCCRRR
""". MCRRRRRR
""". MHDCRR
270. HHDDCC
271. EDCR
""". HHDRRRRR
273. MHHDR
274. MHCRRRR
275. MHDDC
276. DDCCRRRRRR
277. HCRRRRRRRR
278. HCCRRRRRRR
""". HHDCRRRR
""". HHDDCRR
281. ECRRR
""". MDCRRRRR
""". MDRRRRRR
284. EHD
285. MHHRRR
286. MHCCRRR
""". MHHCC
288. HHCCRRRRR
""". MHDDRR
290. EDDR
291. HHCRRRRRR
""". MDCCRRRR
""". MDDCCRR
""". MHDRRRR
295. HHRRRRRRR
""". MRRRRRRRR
297. MDDRRRRR
""". MHDCCR
299. EHRR
300. EDCC
""". HHDCRRRRR
""". HHDRRRRRR
303. MCCRRRRRR
304. MHHDC
305. EDRRR
306. ECCRR
""". HHDCCRRR
""". HHDDRRRR
""". MCRRRRRRR
310. HDRRRRRRRR
311. HHDDRRRRR
""". MDDCRRRR
""". MHHCRR
314. MHDCRRR
""". MHHDD
316. HDCRRRRRRR
317. HHDCCRRRR
""". MDCCRRRRR
319. HDCCRRRRRR
""". HHDDCCR
""". MHDDCR
322. EHCR
""". HDDRRRRRRR
324. MHCRRRRR
325. EDDC
326. EDCRR
""". HDDCRRRRRR
""". MHHDRR
329. HHDDCRRR
""". MDCRRRRRR
""". MDDCCRRR
""". MHRRRRRR
333. MDRRRRRRR
""". MHCCRRRR
335. ERRRRR
""". MDDCRRRRR
""". MHDDRRR
338. MHHRRRR
339. MHDCCRR
340. EM
341. MDDRRRRRR
342. HHDDCRRRR
343. DCCRRRRRRRR
""". MHDRRRRR
345. MHHCCR
346. ECRRRR
347. EHH
348. EHDR
""". MHDCRRRR
350. HDDCCRRRRR
351. EHCC
""". MHDDCC
353. EDDRR
354. MDDCCRRRR
355. MHHCRRR
356. MHDDCRR
357. EDCCR
""". EHRRR
359. ECCRRR
360. HHDDCCRR
361. EDRRRR
""". MHHDCR
363. MHDCCRRR
364. DDCRRRRRRRR
365. HHDDCCRRR
366. HHRRRRRRRR
367. EDDCR
""". EHDC
""". MHDDRRRR
370. HCCRRRRRRRR
""". HHCRRRRRRR
""". MHCCRRRRR
""". MHHDDR
374. EHCRR
375. MHCRRRRRR
""". MHHDRRR
377. MCCRRRRRRR
378. DDCCRRRRRRR
""". EDCRRR
380. HHCCRRRRRR
""". HHDRRRRRRR
382. MHHCCRR
383. EMR
""". MHRRRRRRR
385. HHDCRRRRRR
""". MCRRRRRRRR
387. MHDRRRRRR
388. ERRRRRR
""". MHDCRRRRR
390. MHHRRRRR
391. MHDDCCR
""". MHHDCC
393. HHDDRRRRRR
""". MHDDCRRR
395. EHDRR
396. EDDRRR
""". EHHR
""". HHDCCRRRRR
""". MHDDRRRRR
400. EHCCR
""". EMC
""". HDCRRRRRRRR
403. ECRRRRR
""". EHDD
405. EDCCRR
""". MDRRRRRRRR
""". MHDCCRRRR
""". MHHCRRRR
409. MDCCRRRRRR
""". MHHDDC
411. HDDRRRRRRRR
""". MDCRRRRRRR
""". MHHDCRR
""". MHHRRRRRR
415. EDDCC
416. EHRRRR
""". MHHCRRRRR
418. HHDDCRRRRR
""". MDDCRRRRRR
420. ECCRRRR
""". MDDRRRRRRR
422. EMD
423. HDCCRRRRRRR
424. MHDDCRRRR
425. MHHDRRRRR
426. EDRRRRR
427. EHDCR
428. HDDCRRRRRRR
429. EDDCRR
""". MHHCCRRR
""". MHHDDRR
432. EHHC
433. MHDDCCRR
""". MHHDRRRR
435. EHCRRR
""". MDDCCRRRRR
""". MHHCCRRRR
438. EMRR
439. EDCRRRR
440. MHHDCRRRR
441. MHHDCCR
442. EDDRRRR
""". EHCCRR
""". EHDDR
""". EHDRRR
""". EHHD
""". HDDCCRRRRRR
448. EHHRR
""". MHCCRRRRRR
""". MHHDDRRRR
451. HHDDCCRRRR
""". MHDDCCRRR
453. EDCCRRR
""". MHCRRRRRRR
455. EMCR
""". ERRRRRRR
457. MHHDCRRR
458. EHDCC
459. EMH
460. EDDCCR
""". HHCRRRRRRRR
462. MHDCRRRRRR
463. HHCCRRRRRRR
464. MHDRRRRRRR
465. EMDR
466. ECRRRRRR
""". EHRRRRR
468. HHDRRRRRRRR
469. MCCRRRRRRRR
""". MHHDCCRRR
""". MHHDDCR
472. MHRRRRRRRR
473. ECCRRRRR
""". MHDDRRRRRR
475. EDDCRRR
""". HHDCRRRRRRR
""". MHDCCRRRRR
478. EHDCRR
479. EHHCR
480. EHCRRRR
""". MHHDDRRR
482. EMCC
483. MHHDDCRRR
484. MHHCRRRRRR
485. DDCCRRRRRRRR
""". EMRRR
""". MHHRRRRRRR
488. EDRRRRRR
""". EHDDC
""". HHDDRRRRRRR
491. MHDDCRRRRR
492. EDCRRRRR
""". HHDCCRRRRRR
""". MHHDCCRR
495. MHHDRRRRRR
496. EHCCRRR
""". MDCCRRRRRRR
498. EMDC
""". MDCRRRRRRRR
500. EHHDR
501. EHDDRR
""". EHDRRRR
503. EHHRRR
504. EDCCRRRR
""". HHDDCRRRRRR
506. MHHCCRRRRR
507. EDDCCRR
""". EHDCCR
""". MDDRRRRRRRR
510. EHHCC
""". MDDCRRRRRRR
""". MHHDDCC
513. EMCRR
514. EDDRRRRR
""". EMHR
516. EMDD
517. MHHDCRRRRR
518. MDDCCRRRRRR
519. MHHDDCRR
520. MHDDCCRRRR
521. EMDRR
522. ECCRRRRRR
523. ECRRRRRRR
524. EHDCRRR
525. MHHDDCCRR
526. EDDCRRRR
527. EHHCRR
""". EHHDC
""". MHHDDRRRRR
530. EHRRRRRR
""". ERRRRRRRR
532. HHDDCCRRRRR
533. EMCCR
534. EHDDCR
535. HDCCRRRRRRRR
536. EHCRRRRR
""". EMHC
538. EMRRRR
539. EDCRRRRRR
540. EDCCRRRRR
""". EHCCRRRR
542. MHHDCCRRRR
543. EHHRRRR
""". HDDCRRRRRRRR
545. EHHDRR
""". EMDCR
""". EMHD
548. EHDDRRR
549. EDDCCRRR
""". EDRRRRRRR
""". EHDCCRR
""". MHCCRRRRRRR
553. EHDRRRRR
""". MHCRRRRRRRR
555. EHHCCR
556. EDDRRRRRR
""". EMHRR
558. EDDCRRRRR
""". EHHDD
560. EMDDR
""". MHHDDCRRRR
562. MHDCCRRRRRR
""". MHDCRRRRRRR
564. EMCRRR
565. HDDCCRRRRRRR
566. EHDCRRRR
567. MHDRRRRRRRR
568. MHHDDCCR
569. EHDDCC
570. EHHCRRR
""". MHDDCRRRRRR
572. EHDDCRR
""". EMHH
574. MHDDRRRRRRR
575. EMDCC
""". EMDRRR
577. EDDCCRRRR
""". EHHDCR
579. MHHCRRRRRRR
580. EMHCR
581. EHDCCRRR
""". EMCCRR
""". MHHRRRRRRRR
584. HHCCRRRRRRRR
585. EHCCRRRRR
586. EMHDR
""". MHHCCRRRRRR
588. EHCRRRRRR
""". EHDDRRRR
""". MHHDRRRRRRR
591. EMDDC
592. EHHDRRR
593. MHDDCCRRRRR
594. EHHRRRRR
""". HHDCRRRRRRRR
596. EHHCCRR
597. EMDCRR
598. EMRRRRR
599. EHRRRRRRR
""". MHHDCRRRRRR
601. EHDCRRRRR
""". EHDRRRRRR
603. ECCRRRRRRR
604. HHDDRRRRRRRR
605. EMCRRRR
606. EMHCC
607. EHDDCCR
""". EMDDRR
609. EHDDCRRR
""". EHHDDR
""". EMHRRR
612. EHHCRRRR
613. EHHDCC
614. MDCCRRRRRRRR
615. ECRRRRRRRR
""". EHDCCRRRR
617. EHDDRRRRR
618. EMHDC
619. EMHHR
620. EDCCRRRRRR
""". EHHDCRR
622. EMDRRRR
""". HHDCCRRRRRRR
624. MHHDDRRRRRR
625. MHHDDCCRRR
626. EDCRRRRRRR
""". MDDCRRRRRRRR
628. EMHDD
629. EMDCCR
630. MHHDCCRRRRR
631. EDDCRRRRRR
""". EHHCCRRR
""". EHHDRRRR
634. EMHCRR
635. HHDDCRRRRRRR
636. EDRRRRRRRR
637. EMCCRRR
638. EHDDCRRRR
639. EHHCRRRRR
640. EDDCCRRRRR
""". EMHDRR
642. EHDDCCRR
643. EMDDCR
""". HHDDCCRRRRRR
645. EHHRRRRRR
646. EDDRRRRRRR
""". MDDCCRRRRRRR
648. EMRRRRRR
649. EHHDDC
""". EMDCRRR
651. EMHHC
652. EMHHD
653. EHHDDRR
""". EMHRRRR
655. EHCCRRRRRR
656. EHHDRRRRR
657. EHHCCRRRR
""". EHHDCCR
""". EMDDRRR
660. MHHDDCRRRRR
661. EHHDCRRR
662. EMCRRRRR
""". EMHCCR
664. EHDDCCRRR
665. EHDCCRRRRR
666. EHCRRRRRRR
667. MHCCRRRRRRRR
668. EHDCRRRRRR
669. EMHHRR
670. EMHDCR
671. EMDRRRRR
672. EMDCCRR
""". MHDCRRRRRRRR
674. EHHDCRRRR
""". EMDDCC
676. EMCCRRRR
""". EMHCRRR
678. EHDRRRRRRR
""". EMHDDR
680. EHRRRRRRRR
681. EHDDRRRRRR
""". EMCRRRRRR
683. EHHDCCRR
684. EMDDCRR
685. EMRRRRRRR
""". MHDCCRRRRRRR
687. ECCRRRRRRRR
""". EMCCRRRRR
""". EMHDRRR
""". MHDDRRRRRRRR
691. EHDDCRRRRR
""". EHHDDCR
""". EMDCRRRR
""". MHDDCCRRRRRR
695. EHHDDRRR
""". EMDCRRRRR
697. EMDRRRRRR
698. MHHCRRRRRRRR
699. EHHDCCRRR
""". EHHDDRRRR
701. EMHHCR
702. HDDCCRRRRRRRR
703. EDCCRRRRRRR
""". MHHDRRRRRRRR
705. EMHHDR
706. EMHDCC
""". MHDDCRRRRRRR
708. EDCRRRRRRRR
709. EMDDRRRR
""". EMDDRRRRR
711. EDDCCRRRRRR
712. EHHCCRRRRR
713. EHDDCCRRRR
714. EHHCRRRRRR
""". EMHRRRRR
716. EMHDDC
717. EMHRRRRRR
718. EMDCCRRRR
719. EDDCRRRRRRR
720. EMDCCRRR
""". EMHCCRR
722. EHHDDCRR
""". MHHCCRRRRRRR
""". MHHDDCCRRRR
725. EHHRRRRRRR
726. EMHCRRRRR
727. EDDRRRRRRRR
""". EMDDCCR
729. EMHHRRR
730. EHHDDCRRR
""". EHHDRRRRRR
732. EMDDCRRRR
733. EHHDCRRRRR
""". EHHDDCC
""". EMHDCRR
""". EMHDRRRRR
737. EHCCRRRRRRR
738. EMDDCRRR
""". MHHDCCRRRRRR
740. EMHCRRRR
""". MHHDCRRRRRRR
742. EHDCCRRRRRR
""". HHDCCRRRRRRRR
744. EMHHCC
745. EHCRRRRRRRR
""". EMHDDRR
747. EMHHDC
748. EMHHDD
749. EHDCRRRRRRR
750. EMHCCRRRR
""". EMHDRRRR
752. EHHDCCRRRR
753. EHDDCRRRRRR
""". EMDDCCRRR
""". MHHDDRRRRRRR
756. EMHHRRRRR
757. EHHDDRRRRR
""". EMHDCRRRR
759. EHDRRRRRRRR
760. MDDCCRRRRRRRR
761. EHDDCCRRRRR
762. EMDDCCRR
""". HHDDCRRRRRRRR
764. EHHDDCCR
765. EMCRRRRRRR
""". EMRRRRRRRR
767. EMCCRRRRRR
""". EMHDDRRRR
769. MHHDDCRRRRRR
770. EHDDRRRRRRR
771. HHDDCCRRRRRRR
772. EHHDDCCRR
""". EMDCRRRRRR
""". EMHHCRR
775. EMHCCRRR
""". EMHDCCR
777. EMDRRRRRRR
778. EMHHDRR
779. EHHCCRRRRRR
780. EDCCRRRRRRRR
781. EMDCCRRRRR
782. EMDDRRRRRR
783. EMHDCCRRR
784. EHHDDCRRRR
785. EHHCRRRRRRR
""". EMHDCRRR
787. EMHHCRRRR
788. EDDCCRRRRRRR
789. EMHDDCR
790. EMHHDRRRR
791. EMHDDCRRR
792. MHDCCRRRRRRRR
793. EDDCRRRRRRRR
""". EHHDCRRRRRR
795. EMDDCRRRRR
""". EMHHRRRR
797. MHHDDCCRRRRR
798. EHHRRRRRRRR
799. EMHCRRRRRR
800. EHHDCCRRRRR
801. EMHDDRRR
802. EHHDRRRRRRR
803. EMHRRRRRRR
804. EHCCRRRRRRRR
""". EMHDRRRRRR
""". MHDDCRRRRRRRR
807. EHDCCRRRRRRR
""". MHDDCCRRRRRRR
809. EMHDCCRR
""". EMHHCCRRR
811. EMDDCCRRRR
812. EMHHDCRRR
813. EMHCCRRRRR
814. EMHHDDRRR
815. EHDCRRRRRRRR
""". EHHDDCCRRR
""". EMHHCCR
818. EHHDDRRRRRR
819. EMHDDCC
820. EMHDDCRR
821. EMCCRRRRRRR
""". EMCRRRRRRRR
""". EMHHDCR
824. EHDDCCRRRRRR
""". EMHDCRRRRR
""". MHHCCRRRRRRRR
827. EHDDCRRRRRRR
""". EHHDDCRRRRR
""". EMHDDCCRR
830. EMHHDDR
831. EMHDDRRRRR
832. EMDCCRRRRRR
""". EMDRRRRRRRR
834. EMHHRRRRRR
""". MHHDCCRRRRRRR
836. EMDCRRRRRRR
""". EMHHCRRR
838. EHDDRRRRRRRR
""". MHHDCRRRRRRRR
840. EHHCCRRRRRRR
841. EMDDCRRRRRR
842. EMDDRRRRRRR
""". EMHDCCRRRR
844. EMHHDRRR
845. EHHCRRRRRRRR
846. EMDDCCRRRRR
847. EHHDDCCRRRR
848. EHHDCCRRRRRR
""". EMHDDCCR
850. EDDCCRRRRRRRR
""". EMHDDCRRRR
852. EMHHDCCRR
""". EMHRRRRRRRR
""". HHDDCCRRRRRRRR
855. EHHDCRRRRRRR
856. EMHHDCC
857. EMHHDDCRR
858. MHHDDCRRRRRRR
859. MHHDDRRRRRRRR
860. EMHHCRRRRR
861. EMHCCRRRRRR
""". EMHHDDC
863. EMHCRRRRRRR
""". EMHHCCRR
865. EHHDRRRRRRRR
""". EMHHDRRRRR
""". MHHDDCCRRRRRR
868. EHHDDCRRRRRR
""". EMHDCRRRRRR
870. EHHDDRRRRRRR
""". EMHDRRRRRRR
872. EMHHDCRR
873. EHDCCRRRRRRRR
874. EMCCRRRRRRRR
""". EMHDCCRRRRR
876. EHHDDCCRRRRR
""". EMHDDRRRRRR
""". EMHHDDRR
879. EMHDDCCRRR
""". EMHHCCRRRR
881. EMDCCRRRRRRR
882. EMDCRRRRRRRR
883. MHDDCCRRRRRRRR
884. EHDDCCRRRRRRR
""". EHDDCRRRRRRRR
""". EMDDCCRRRRRR
887. EMHHDCRRRR
888. EMDDCRRRRRRR
""". EMHDDCRRRRR
890. EMDDRRRRRRRR
""". EMHHRRRRRRR
892. EMHHDDRRRR
893. EHHCCRRRRRRRR
894. EMHHDCCR
895. EMHCRRRRRRRR
896. EHHDCCRRRRRRR
897. EMHHDDCCR
898. MHHDCCRRRRRRRR
899. EMHCCRRRRRRR
""". EMHDRRRRRRRR
""". EMHHDDCR
902. EHHDCRRRRRRRR
903. EMHHCRRRRRR
904. EMHDDCCRRRR
905. EMHHDRRRRRR
906. EMHDCCRRRRRR
907. EMHDCRRRRRRR
""". EMHHDCCRRR
909. EHHDDCRRRRRRR
910. EHHDDRRRRRRRR
911. EHHDDCCRRRRRR
""". EMHHCCRRRRR
913. EMHDDRRRRRRR
""". EMHHRRRRRRRR
""". MHHDDCRRRRRRRR
916. EMDCCRRRRRRRR
917. EMHDDCRRRRRR
""". EMHHDDCRRR
919. MHHDDCCRRRRRRR
920. EMDDCRRRRRRRR
""". EMHHDCRRRRR
922. EHDDCCRRRRRRRR
""". EMDDCCRRRRRRR
924. EMHHDDCC
925. EMHDDCCRRRRR
926. EMHCCRRRRRRRR
""". EMHHCRRRRRRR
928. EMHHDDRRRRR
929. EMHHDRRRRRRR
930. EMHDCRRRRRRRR
931. EHHDCCRRRRRRRR
""". EMHHDDCCRR
933. EMHHDCCRRRR
934. EMHDCCRRRRRRR
""". EMHHCCRRRRRR
936. EMHDDRRRRRRRR
937. EMHHDCRRRRRR
938. EHHDDCRRRRRRRR
939. EMHDDCRRRRRRR
940. EMHHDDCRRRR
941. EMHHCRRRRRRRR
942. EMHDDCCRRRRRR
943. EMDDCCRRRRRRRR
""". EMHHDDRRRRRR
945. EHHDDCCRRRRRRR
946. EMHHDCCRRRRR
947. EMHHDRRRRRRRR
948. EMHHCCRRRRRRR
""". MHHDDCCRRRRRRRR
950. EMHDCCRRRRRRRR
951. EMHHDCRRRRRRR
""". EMHHDDCRRRRR
953. EMHHDDCCRRR
954. EMHDDCRRRRRRRR
955. EMHHDCCRRRRRR
""". EMHHDDRRRRRRR
957. EMHDDCCRRRRRRR
958. EHHDDCCRRRRRRRR
959. EMHHCCRRRRRRRR
960. EMHHDCRRRRRRRR
961. EMHHDDCCRRRR
""". EMHHDDCRRRRRR
963. EMHHDCCRRRRRRR
964. EMHHDDRRRRRRRR
965. EMHDDCCRRRRRRRR
966. EMHHDDCRRRRRRR
967. EMHHDDCCRRRRR
968. EMHHDCCRRRRRRRR
969. EMHHDDCCRRRRRR
970. EMHHDDCRRRRRRRR
971. EMHHDDCCRRRRRRR

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 19th, 2008, 7:19pm

on 04/19/08 at 18:19:53, aaaa wrote:
Unfortunately, using the 3 material evaluators on Janzert's page gives me a considerable amount of ties. Nevertheless, here is the result I get:
...

With all due respect, that list doesn't make any sense in the extreme part of the handicaps, as is often the case with formulas applied too systematically. It says that ONE CAT is weaker than TWO rabbits, yet ONE CAT plus ONE RABBIT is STRONGER THAN three rabbits. How can anybody rely on such absurd results? Either the difference between two rabbits and a cat is so insignificant that it would explain the bizarre switch, or we are in the domain of the formula where the uncertainty outweighs the significant part of the number: something like .5, give or take 1. That happens all the time in mathematics. The data is so lacking in precision that you start getting results that don't make any sense.

Besides, another thing that makes me think the formula is based on a systematic application detached from reality is that you need at least one rabbit among your pieces, or you have lost the game BEFORE you even started. So when you stop to think about it, how can the program even judge that a lone cat is stronger than a lone rabbit, when the lone cat means that you have exactly ZERO chance of winning the game? That doesn't make any sense. I say that formula is unreliable when it comes to BIG handicaps and shouldn't be used to settle a dispute. In fact, there is also one detail that isn't taken into account in your formula, and that is that I WAS THE FIRST to come up with my result; that should count for something. Had 99 won the three-rabbit handicap first, we wouldn't even be having this discussion. I found a solution to a problem that 99 was UNABLE to resolve with less than FOUR rabbits plus a CAT, a solution that put it down to ONE rabbit and one CAT. And I'd have known how to solve the three-rabbit problem if I had been informed that it was a possibility. You keep forgetting that the only reason 99 got the three-rabbit record and not I is that I didn't know I needed to resolve it. This is a case that we have NEVER encountered in handicap situations.

Someone not getting a record, not because of lack of skill or unwillingness, but because of not being made aware that he was to get it. In my game all the problems are resolved; I would have found the solution even faster than 99 because I had already studied the case. I am the true holder of the record, be it CR or RRR; they both belong to me because I have been deceived into believing that I had won. Nobody, neither 99 nor anyone else, said anything about the record being beatable. In American law, there is something that can nullify a trial, called “unfair surprise”. Well, that's exactly what happened to me: I have been robbed of the result because of unfair surprise, not beaten on skill but on that stupid unfair surprise and nothing else. 99's merits are exactly ZERO in that affair. If you take a look at his solution, you can see that a child could have deduced it from mine. Before my attempt 99 didn't even DARE to try with less than a cat plus FOUR rabbits, and all of a sudden, THANKS TO ME, he tries three rabbits!!! What made him make that leap of faith from FOUR rabbits plus a cat to THREE rabbits without a CAT? I have been dispossessed of my victory by something that disgusts me just to think about it. Ever see Amadeus? I am Mozart, robbed of his final Requiem by a poisonous Salieri; that's what I am. Whatever way you look at it, the hall of fame belongs to me on that one. Salieri didn't even know that he could do it with less than FOUR rabbits and a CAT before I came along, and now, since I am hated by some here as Mozart himself was, I will be deprived of my due victory because of that travesty of a contest.

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 19th, 2008, 8:31pm
To sum up what I said in a few words: this case is pretty simple. I did come up with the solution to CR, and that solution, slightly adapted, also works for RRR. I didn't play the latter NOT because I couldn't or was unwilling to, but because I DIDN'T KNOW that I HAD TO. That's the only reason. There is no way in hell that 99 could have had this RRR game if not for me being kept in the dark; whether that was accidental or on purpose is not really the point. The point is that the reason 99 played this game and NOT ME has nothing whatsoever to do with merit and everything to do with someone being deprived of his due victory. The CR belongs to me, as does the RRR, which is only a variation of it. Thanks to me, 99 knew how Gnobot's rabbits would react, because in both cases, mine and his, they reacted in a similar fashion. I am sick of having to explain again and again something that is so obvious. It doesn't actually matter whether 99 did what he did on purpose or because he simply lacks a conscience that tells him some things cross a line. The result is that I have been deprived of my due victory in both cases, because I didn't have the information that it was even useful for me to play this game.

NOBODY here has ever been subjected to an injustice like that. Have a shred of honesty and ADMIT IT!!! The fact that you don't like me shouldn't be factored in, unless you consider Arimaa a popularity contest where merit takes a back seat. However, if you do, then you are far worse people than I'll ever be!!!!

Title: Re: Handicap Order - what beats what?
Post by IdahoEv on Apr 20th, 2008, 1:04pm

on 04/18/08 at 17:11:04, aaaa wrote:
I'm planning to come up with my own material evaluator in the form of a multilayer perceptron based on game statistics. Here's hoping it will have better output than garbage.


You'll need to keep the number of nodes in that network very low, otherwise you'll overtrain badly.    There just isn't enough data.

The number of possible material state combinations is extremely high, and the majority of them have never occurred in any game.  I discovered this when I did research in 2006 into optimizing the coefficients of material evaluators based on the game statistics.  Even when there were only a few coefficients, like DAPE which has 7 coefficients, we saw some bad overtraining and/or sensitivity to data selection, and had to start implementing interesting constraints on the selection of games we used in order to get results that made any sense.

At some point soon, I hope to re-run those numbers, by the way.  There have been two more years of games now, it would be nice to see if the results have changed significantly.

Title: Re: Handicap Order - what beats what?
Post by IdahoEv on Apr 20th, 2008, 1:17pm
Just a quick mention that I would personally disregard all the results in this thread that used the original formulations of DAPE and FAME.  When I dug into this in 2006 I found that those algorithms misrepresented the results from real games relative to algorithms with correctly tuned coefficients.

Using the ones on Janzert's calculator, the improvement (in terms of correctly predicting the outcome of games in the database) of optimized DAPE or LinearAB (my function) over DAPE or FAME was significantly greater than the improvement DAPE or FAME gives over simply counting all the pieces (i.e. considering all pieces worth 1.0 point regardless of size).


Title: Re: Handicap Order - what beats what?
Post by mistre on Apr 20th, 2008, 4:26pm
If you rank the 114 major handicaps for the LinearAB function, I can redo the analysis using only LinearAB and DAPE (optimized).

Title: Re: Handicap Order - what beats what?
Post by Janzert on Apr 20th, 2008, 4:42pm

on 04/20/08 at 13:17:13, IdahoEv wrote:
Using the ones on Janzert's calculator, the improvement (in terms of correctly predicting the outcome of games in the database) of optimized DAPE or LinearAB (my function) over DAPE or FAME was significantly greater than the improvement DAPE or FAME gives over simply counting all the pieces (i.e. considering all pieces worth 1.0 point regardless of size).


I've never been convinced that this wasn't simply because of overfitting. I think the number of games and range of games we have available is still just too small to try and train a general method from.

Janzert

Title: Re: Handicap Order - what beats what?
Post by IdahoEv on Apr 20th, 2008, 7:07pm

on 04/20/08 at 16:42:54, Janzert wrote:
I've never been convinced that this wasn't simply because of overfitting. I think the number of games and range of games we have available is still just too small to try and train a general method from.


Anything's possible.   But LinearAB only has two parameters, A (the cat-to-rabbit ratio) and B (the ratio used for all subsequent levels; i.e. D/C, H/D, M/H and E/M).   And it's pretty hard to overfit data with only two coefficients.  

Maybe when I get around to rerunning this I can use some part of the data for overfit testing during the optimization.   I doubt we'll see much difference.   When we looked at the particular cases that the optimized functions preferred over the guessed-at ones, they seemed pretty reasonable to me.

Rather than overfitting, I think you could make a better case that the games themselves don't actually represent the true value of the pieces very well; Karl made a good argument from causality in this vein.  (i.e. if getting ahead and being likely to win, or simply being a stronger player, makes certain captures more likely irrespective of winning, it would make those captures appear arbitrarily valuable in the analysis.)  

But that's a fundamental limitation of using statistics from the games to answer questions like these.  There isn't really anything you can do about it.

Title: Re: Handicap Order - what beats what?
Post by IdahoEv on Apr 20th, 2008, 7:21pm

on 04/20/08 at 16:26:40, mistre wrote:
If you rank the 114 major handicaps for the LinearAB function, I can redo the analysis using only LinearAB and DAPE (optimized).


If you want to do it, here's how LinearAB will rank the pieces.   For the case of beginning-game sacrifice, you can simply add the following values of the pieces to achieve the LinearAB score.   (A=1.241,  B=1.316).

rabbit    1.000
cat       1.241
dog       1.633
horse     2.149
camel     2.828
elephant  3.722

(Note that this won't give you the correct LinearAB score once the opponent starts losing pieces because it will ignore 'level collapse', i.e. the fact that EHHRerrr is functionally identical to ECCRerrr.  But for game-start sacrifice, there is no level collapse because the opposing team is full.)
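As a rough sketch of how those numbers fit together (Python; the function names are illustrative, not part of any existing tool): every value follows from the two LinearAB parameters, since the cat is A and each higher level is B times the level below it, and a game-start handicap is then just the sum of the removed pieces. Like the table, this assumes the opponent's army is full, so level collapse is ignored.

```python
# Sketch: LinearAB piece values and game-start handicap scores.
# Only meaningful for handicaps taken at the very start of the game,
# where the opposing army is complete and level collapse cannot occur.


def linear_ab_values(a: float = 1.241, b: float = 1.316) -> dict[str, float]:
    """Rabbit = 1, cat = A, and each later level (D/C, H/D, M/H, E/M) is B times the one below."""
    values = {"R": 1.0, "C": a}
    for lower, upper in zip("CDHM", "DHME"):
        values[upper] = values[lower] * b
    return values


def handicap_value(handicap: str, values: dict[str, float]) -> float:
    """Sum the values of the removed pieces, e.g. "MHC" = camel + horse + cat."""
    return sum(values[piece] for piece in handicap)


def rank_handicaps(handicaps: list[str], values: dict[str, float]) -> list[str]:
    """Order handicaps from smallest to largest material given up."""
    return sorted(handicaps, key=lambda h: handicap_value(h, values))


if __name__ == "__main__":
    vals = linear_ab_values()
    for piece in "RCDHME":
        print(piece, f"{vals[piece]:.3f}")  # reproduces the table above
    for h in rank_handicaps(["CC", "H", "E", "MHC", "RRRRRRR"], vals):
        print(h, f"{handicap_value(h, vals):.3f}")
```

Because the values are summed independently, ranking every handicap combination this way is a one-line sort, but the result inherits the fixed E/M = B assumption discussed next.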

I'm sure this system misvalues elephants because the E/M ratio is fixed to B.  When I made an alternate optimization that allowed E to vary independently it settled on E~7.5.  I doubt that's anything like accurate, though, because there are so few actual examples of E sacrifice to compare against.

Title: Re: Handicap Order - what beats what?
Post by 99of9 on Apr 20th, 2008, 8:10pm

on 04/20/08 at 19:07:57, IdahoEv wrote:
But that's a fundamental limitation of using statistics from the games to answer questions like these.  There isn't really anything you can do about it.

I agree, and because of this limitation, I think you should be slower to "disregard" the evals which were hand optimized by experts!  I'm looking forward to getting back to this conversation when you've run with more data.

Title: Re: Handicap Order - what beats what?
Post by mistre on Apr 20th, 2008, 8:59pm
Since there is disagreement on this one, I will take a back seat for now and let you guys sort out which evals are more reliable than others.  Is there any way to compare their predictions with real games and see which one was better?

Title: Re: Handicap Order - what beats what?
Post by IdahoEv on Apr 20th, 2008, 9:08pm

on 04/20/08 at 20:10:03, 99of9 wrote:
I agree, and because of this limitation, I think you should be slower to "disregard" the evals which were hand optimized by experts!


Fair enough, that.

Title: Re: Handicap Order - what beats what?
Post by IdahoEv on Apr 20th, 2008, 9:17pm

on 04/20/08 at 20:59:16, mistre wrote:
Is there any way to compare their predictions with real games and see which one was better?


Yes, it's possible, and that is in fact how the coefficients for LinearAB and Optimized DAPE were developed.   Existing evals were repeatedly modified (by tweaking the numbers) and measured as to how well they predicted the eventual winner in real games from the database.  That cycle was repeated until the eval functions stopped getting "better".
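A minimal sketch of that tweak-and-measure cycle, written as a simple coordinate hill-climb in Python. The post doesn't say which search procedure was actually used, so the step-halving scheme and all names here are assumptions; `error_fn` stands in for whatever measure of prediction error over the game database is being minimized.

```python
from typing import Callable, Sequence


def hill_climb(
    error_fn: Callable[[Sequence[float]], float],
    params: Sequence[float],
    step: float = 0.1,
    min_step: float = 1e-4,
) -> list[float]:
    """Nudge each coefficient up or down, keep any change that lowers the
    error, and halve the step size once no single nudge helps."""
    best = list(params)
    best_err = error_fn(best)
    while step > min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = best.copy()
                trial[i] += delta
                err = error_fn(trial)
                if err < best_err:
                    best, best_err = trial, err
                    improved = True
        if not improved:
            step /= 2  # no nudge helped at this scale: refine the search
    return best


if __name__ == "__main__":
    # Toy error surface with its minimum at (1.2, 1.3); a real run would
    # score an eval's predictions against actual game results instead.
    toy = lambda p: (p[0] - 1.2) ** 2 + (p[1] - 1.3) ** 2
    print(hill_climb(toy, [1.0, 1.0]))
```

The loop stops when the error no longer improves at the smallest step size, which matches the "until the eval functions stopped getting better" stopping rule described above.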

At the end of that process on several eval functions, Optimized DAPE (as implemented on Janzert's page) was the best overall at guessing the winner among games in the history database.   LinearAB was a bit behind that in 2nd place.  Both did much better than the hand-coded versions.

The debate is whether or not that process is valid in order to determine a material eval function.   Karl and Toby are less sure about that than I am.  I recognize their concerns that might make the process invalid, but I suspect it still is.  There's no way to prove it one way or the other.  :)

Title: Re: Handicap Order - what beats what?
Post by mistre on Apr 20th, 2008, 9:36pm
Were the games that were looked at human vs human games?  Bot vs bot games?  Human vs bot games?  Or all 3?

I can't wait to see what your results will bring when you re-run the analysis.  There are quite a few more E handicap games in the database, thanks to me.  ;D

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 20th, 2008, 10:14pm

on 04/20/08 at 21:36:15, mistre wrote:
I can't wait to see what your results will bring when you re-run the analysis.  There are quite a few more E handicap games in the database, thanks to me.  ;D

If your elephant handicap games are included, it could significantly mess up the value of the elephant.  You probably won a lot more than one would expect from starting down an elephant, therefore a function that is optimized to fit that data will do better if it values the elephant less than it should.

As for the superiority of LinearAB in predicting winners in games databases, let's just say that when Zombie wins the Computer Championship and I am defending the Arimaa Challenge, I will have my fingers crossed that Zombie still likes trading its camel for a horse and a rabbit, and I get to make that trade at the start of every game. :-)  For all their flaws, FAME and DAPE correctly prefer having the camel, while LinearAB and DAPE(eo) get it wrong.  But I'm willing to believe that as more pieces get traded, FAME gets progressively less accurate, while LinearAB gets better.  I mostly tuned FAME for opening trades, not midgames and endgames.
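The camel example is easy to check with the game-start LinearAB values IdahoEv posted above (Python; the helper names are illustrative). Since this is an opening trade, simple summation is fine and level collapse doesn't come into play.

```python
# Game-start LinearAB piece values as posted in this thread (rabbit = 1.0).
LINEAR_AB = {"R": 1.000, "C": 1.241, "D": 1.633, "H": 2.149, "M": 2.828, "E": 3.722}


def material(pieces: str) -> float:
    """Total LinearAB value of a group of pieces, e.g. "HR" = horse + rabbit."""
    return sum(LINEAR_AB[p] for p in pieces)


def trade_gain(give: str, get: str) -> float:
    """Positive if LinearAB rates receiving `get` in exchange for `give` as favourable."""
    return material(get) - material(give)


if __name__ == "__main__":
    # Camel for horse + rabbit: 3.149 - 2.828
    print(f"{trade_gain('M', 'HR'):+.3f}")  # prints +0.321
```

A positive result means LinearAB would happily give up its camel for a horse and a rabbit, which is the trade that FAME and DAPE, per the post above, correctly decline.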


Title: Re: Handicap Order - what beats what?
Post by IdahoEv on Apr 20th, 2008, 10:16pm

on 04/20/08 at 21:36:15, mistre wrote:
Were the games that were looked at human vs human games?  Bot vs bot games?  Human vs bot games?  Or all 3?


Most of the analysis was done with games where both players were rated over 1600, including HvH and BvB games but not HvB games, because there was so much botbashing strangeness in that category and I was trying to represent the true value of the pieces to players honestly trying to win a straight-up game.  :-)       Some of the analysis was done both ways: with and without HvB games.
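That selection rule can be sketched as a simple filter. The `Game` record and its fields are hypothetical (the real game database has its own format); the logic keeps games where both players are rated above 1600, always keeps HvH and BvB, and includes HvB only when asked to.

```python
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical record; field names are illustrative only.
    gold_rating: int
    silver_rating: int
    gold_is_bot: bool
    silver_is_bot: bool


def keep_for_analysis(game: Game, min_rating: int = 1600, allow_hvb: bool = False) -> bool:
    """Both players rated over `min_rating`; HvH and BvB kept, HvB only if allowed."""
    if game.gold_rating <= min_rating or game.silver_rating <= min_rating:
        return False
    human_vs_bot = game.gold_is_bot != game.silver_is_bot
    return allow_hvb or not human_vs_bot


def select_games(games: list[Game], allow_hvb: bool = False) -> list[Game]:
    return [g for g in games if keep_for_analysis(g, allow_hvb=allow_hvb)]
```

Running the same optimization on `select_games(db)` and `select_games(db, allow_hvb=True)` would correspond to the "both ways" comparison mentioned above.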

You can find the original discussion in these three threads:
thread one (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1146199776),  thread two (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1163044066), and thread three (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1163717031).

Title: Re: Handicap Order - what beats what?
Post by Janzert on Apr 20th, 2008, 10:41pm
The third link above should point to this thread (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1163717031), I believe.

If you do end up running the evaluators through new games again there are two things I would find interesting. First, some sort of cross validation where a subset of the games are used for training then the rest used for testing. An obvious application initially would be to test all the old constants on the games played since then. Second, in addition to looking at a win percentage prediction also check a straight side to win prediction.
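Both kinds of validation can be sketched in a few lines of Python: a chronological holdout (fit on older games, test the resulting constants on newer ones) and a plain k-fold split. The `Game` fields are hypothetical placeholders for whatever the database actually stores.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Game:
    played: date      # hypothetical field: when the game was played
    score: float      # material eval from gold's point of view, in rabbits
    gold_won: bool


def chronological_split(games: list[Game], cutoff: date) -> tuple[list[Game], list[Game]]:
    """Games before `cutoff` for training, games on or after it for testing."""
    train = [g for g in games if g.played < cutoff]
    test = [g for g in games if g.played >= cutoff]
    return train, test


def k_folds(items: list, k: int) -> list[tuple[list, list]]:
    """Plain k-fold split: every item is held out for testing exactly once."""
    folds = [items[i::k] for i in range(k)]
    return [
        ([x for j, fold in enumerate(folds) if j != i for x in fold], folds[i])
        for i in range(k)
    ]
```

Using a date cutoff at the end of the original optimization run would test the old constants on only the games played since then.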

Janzert

p.s. Another thing I just thought is instead of scoring evaluators per game, score them per material pattern.

Title: Re: Handicap Order - what beats what?
Post by IdahoEv on Apr 21st, 2008, 2:45am
Fixed the third link above, thanks.

Cross-validation I will definitely do.


on 04/20/08 at 22:41:59, Janzert wrote:
Second, in addition to looking at a win percentage prediction also check a straight side to win prediction.


I'm not quite certain what you mean by that.  

All the material functions simply output a number that represents who is ahead and by how many rabbits.  One could  simply check whether score > 0 and assume that means "gold win", then add a point of error if silver won, and vice versa.     Or you can use another function to convert the score into a probability estimate as to who is more likely to win, and score the difference, so if the eval predicts an 80% chance of a gold win, it accrues 0.2 error  for every gold win from that state and 0.8 error for every silver win.  
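The two scoring schemes can be sketched side by side in Python. The post doesn't specify the score-to-probability function, so the logistic curve with scale `k` below is an assumption; the error rules themselves follow the description (one point for a wrong side, or 1 - p / p for the probability version).

```python
import math


def win_probability(score: float, k: float) -> float:
    """ASSUMED logistic mapping from a material score (in rabbits,
    positive = gold ahead) to P(gold wins); k sets how fast it saturates."""
    return 1.0 / (1.0 + math.exp(-score / k))


def sign_error(score: float, gold_won: bool) -> float:
    """Straight side-to-win check: one point of error if the favoured side lost.
    A score of exactly 0 counts as a prediction for silver."""
    return 0.0 if (score > 0) == gold_won else 1.0


def probability_error(score: float, gold_won: bool, k: float) -> float:
    """Predicting 80% gold costs 0.2 on a gold win and 0.8 on a silver win."""
    p = win_probability(score, k)
    return 1.0 - p if gold_won else p


def total_errors(results: list[tuple[float, bool]], k: float) -> tuple[float, float]:
    """Sum both error measures over (score, gold_won) pairs from the database."""
    return (
        sum(sign_error(s, w) for s, w in results),
        sum(probability_error(s, w, k) for s, w in results),
    )
```

Computing both totals for each evaluator would also show whether one eval picks the right side more often while missing by a wider margin when it is wrong.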

In the limit of large numbers of sample cases, these amount to the same thing: the training of the coefficients will find the same solution.   When the number of test cases is finite, the probability estimate will find the solution a bit faster and more reliably.


Quote:
p.s. Another thing I just thought is instead of scoring evaluators per game, score them per material pattern.
 

You mean so that the error function is  Sum over states(error for state n),  instead of sum over states(error for state n)*(number of times n has appeared)?   I suppose it could.   This would cause the functions to attempt to match mid-game  and end-game states more strongly than early-game states (relative to the way I did it before), because later states are much less likely to be duplicated in the database.   It also leaves open the question of how to score the individual states, though, when we do have multiple examples.   If state N is won by gold 53% of the time and silver 47%, how much error do we accrue the training function for predicting a gold win?  0.47?  0.0?    

I don't intuitively see this as an improvement, though you're more than welcome to try to convince me.  :-)

Title: Re: Handicap Order - what beats what?
Post by Janzert on Apr 21st, 2008, 11:12am

on 04/21/08 at 02:45:36, IdahoEv wrote:
All the material functions simply output a number that represents who is ahead and by how many rabbits.  One could  simply check whether score > 0 and assume that means "gold win", then add a point of error if silver won, and vice versa.


Yes, this is what I mean. But I'm just interested in seeing the result, I didn't mean to use it for training. I'm simply wondering if some of the evaluators predict the correct side to win more frequently but when they get it wrong they get it wrong by a larger margin.


Quote:
You mean so that the error function is  Sum over states(error for state n),  instead of sum over states(error for state n)*(number of times n has appeared)?


Right, although states that have occurred fewer times than some cutoff should be excluded. Perhaps better would be sum over states(error for state n) * log(number of times n has appeared), or some other sub-linear weighting.

My reasoning for this is to try and see what the error is over the whole space of possible material states, rather than weighted towards how frequently those states appear in the game database. Basically I was motivated by this comment you made in that third thread.


Quote:
Since there are many many examples of a single rabbit loss in the DB - and since those equate to a 55/45 win/loss or so, the optimizer has to work very hard to generate a 0.55 output for that case in order to minimize the error.


In regards to,

Quote:
It also leaves open the question of how to score the individual states, though, when we do have multiple examples.   If state N is won by gold 53% of the time and silver 47%, how much error do we accrue the training function for predicting a gold win?


I meant to use your previous formula to turn an evaluator score into a predicted win percentage, then compare it directly with the actual percentage of wins in the database (e.g. since FAME has a K of 2.92, the prediction for a 1 rabbit loss is 0.58, which is a 0.03 error against the observed 0.55).
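A sketch of that per-material-pattern error in Python, with the weighting choices discussed above (by count, unweighted, or log of the count) and a minimum-occurrence cutoff. The logistic score-to-probability curve is an assumption on my part; with K = 2.92 it maps a one-rabbit lead to about 0.585, which matches the 0.58 figure. The cutoff of 5 games is an arbitrary placeholder.

```python
import math
from dataclasses import dataclass


def predicted_probability(score: float, k: float) -> float:
    # ASSUMPTION: logistic curve with scale k; k = 2.92 gives ~0.585 for a one-rabbit lead.
    return 1.0 / (1.0 + math.exp(-score / k))


@dataclass
class MaterialState:
    predicted: float  # evaluator's P(gold wins) for this material pattern
    gold_wins: int
    games: int

    @property
    def observed(self) -> float:
        return self.gold_wins / self.games


WEIGHTS = {
    "count": lambda n: float(n),  # weighted by how often the pattern appears
    "none": lambda n: 1.0,        # every pattern counts once
    "log": lambda n: math.log(n), # sub-linear compromise
}


def per_state_error(states: list[MaterialState], weighting: str = "log", min_games: int = 5) -> float:
    """Weighted sum of |predicted - observed| over patterns seen at least `min_games` times."""
    weight = WEIGHTS[weighting]
    return sum(
        weight(s.games) * abs(s.predicted - s.observed)
        for s in states
        if s.games >= min_games
    )
```

Switching `weighting` from "count" to "none" or "log" shifts emphasis away from the very common early-game states toward the rarer midgame and endgame patterns.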

Janzert

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 21st, 2008, 2:14pm

on 04/21/08 at 02:45:36, IdahoEv wrote:
Or you can use another function to convert the score into a probability estimate as to who is more likely to win, and score the difference, so if the eval predicts an 80% chance of a gold win, it accrues 0.2 error  for every gold win from that state and 0.8 error for every silver win.  

In the limit of large numbers of sample cases, these amount to the same thing: the training of the coefficients will find the same solution.

That's right, there is the same minimum in both cases.  For example, say that there have only been three material states ever, always a cat for a rabbit, and the side with the extra cat won two out of three.  If my penalty function for predicting percentage P on the side with the cat is (1-P) when I am right and P when I am wrong, then my total penalty function is 2*(1-P) + 1*P = 2-P.  I minimize my penalty by setting P=1, i.e. I should predict 100% for the side with the cat.

Something is wrong when you are optimizing a variable P in such a way that the "optimum" value doesn't match observation.  The root cause of this problem is that you are optimizing using the wrong penalty function.  You should penalize square error.  Then the total penalty function is 2*(1-P)^2 + 1*P^2 = 3*P^2 - 4*P + 2.  What is the value of P that minimizes this penalty?  Astonishingly, it is P=2/3, i.e. the exact fraction of the time that the cat won.  The least square error metric is a wonderful thing...
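Karl's example is easy to check numerically. This brute-force sketch (Python; names are illustrative) evaluates both penalties on a grid of P values for two wins and one loss: the absolute penalty bottoms out at P = 1, while the squared penalty lands on 2/3. Setting the derivative of wins*(1-P)^2 + losses*P^2 to zero gives the general minimizer P = wins / (wins + losses).

```python
def total_penalty(p: float, wins: int, losses: int, squared: bool) -> float:
    """Penalty for predicting probability p for the side with the extra cat."""
    if squared:
        return wins * (1 - p) ** 2 + losses * p ** 2
    return wins * (1 - p) + losses * p


def best_prediction(wins: int, losses: int, squared: bool, steps: int = 3000) -> float:
    """Brute-force the p in [0, 1] that minimizes the total penalty."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda p: total_penalty(p, wins, losses, squared))


if __name__ == "__main__":
    print(best_prediction(2, 1, squared=False))  # 1.0: absolute error pushes to the extreme
    print(best_prediction(2, 1, squared=True))   # ~0.667: squared error recovers 2/3
```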

Now that you have posted that detail, I see you were effectively only rewarding the function that was right the most often, and ignoring how much it was right by, even when you brought percentages into the mix.  So the optimization was all provided by the few cases in the middle that are in doubt, and you were only tweaking coefficients to be right in the maximum number of close cases.  I can't help but point out that we should not expect the optimized functions to do well in extreme cases (such as massive handicaps), if the function was optimized on the basis of only close cases.  Moreover, since only the close cases matter, you are effectively throwing away most of your optimization data, and optimizing over a much smaller set, which one can expect to make the results less reliable.

Suddenly I am extremely curious to have you re-run your optimization with percentages and the least square error function, so that the scaling actually does matter, i.e. it does matter how much one side is ahead.  The results might be essentially the same, or they might be substantially different.

 

Title: Re: Handicap Order - what beats what?
Post by IdahoEv on Apr 22nd, 2008, 1:47am
Okay Karl, I was with you right until you said this:


on 04/21/08 at 14:14:46, Fritzlein wrote:
Now that you have posted that detail, I see you were effectively only rewarding the function that was right the most often, and ignoring how much it was right by, even when you brought percentages into the mix.  So the optimization was all provided by the few cases in the middle that are in doubt, and you were only tweaking coefficients to be right in the maximum number of close cases.


Because what you are asking me to do here:


Quote:
Suddenly I am extremely curious to have you re-run your optimization with percentages and the least square error function, so that the scaling actually does matter, i.e. it does matter how much one side is ahead.


Is exactly what I did in 2006/2007, and I'm not following what led you to believe otherwise.     The functions were all optimized by minimizing the least-squared-error, computed as the difference between the %age confidence of gold win vs. the actual game result for every state in the database.  

They definitely weren't optimized against only the close cases, they were optimized against all cases.  (subject to the inclusion criteria described in those threads, i.e. ratings >= 1600, no HvB, no mid-exchange states, etc.).
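A minimal sketch of least-squares training of this kind, with loud simplifications: the records below are made up, the logistic link from score to probability is an assumption, and only a single scale parameter k is fitted by grid search instead of the full set of piece coefficients. The point is that every state contributes its squared error, whether it is close or lopsided.

```python
import math

# Made-up (material score for gold, result) pairs; result is 1 for a gold win
# and 0 for a silver win. Real data would come from the filtered game database.
RECORDS = [(1.0, 1), (1.0, 1), (1.0, 0), (-0.5, 0), (-0.5, 1),
           (2.0, 1), (3.0, 1), (0.0, 0), (0.0, 1), (-2.0, 0)]


def win_probability(score: float, k: float) -> float:
    """Assumed logistic link: converts an evaluator score into P(gold wins)."""
    return 1.0 / (1.0 + math.exp(-score / k))


def squared_error(k: float) -> float:
    """Sum over every state (not just close ones) of (prediction - result)^2."""
    return sum((win_probability(s, k) - r) ** 2 for s, r in RECORDS)


# Grid search over the single free parameter k (0.25, 0.50, ..., 10.0).
CANDIDATES = [0.25 * i for i in range(1, 41)]
best_k = min(CANDIDATES, key=squared_error)
print(f"best k = {best_k:.2f}, total squared error = {squared_error(best_k):.3f}")
```

A real optimizer would fit all piece coefficients at once (e.g. by gradient descent), but the error function being minimized has the same shape.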

It seems to me that Janzert was asking me to ignore the % confidence and just use a binary error function, and I was explaining why I doubt that would be an improvement.

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 22nd, 2008, 7:33am

on 04/22/08 at 01:47:07, IdahoEv wrote:
Because what you are asking me to do here [...] Is exactly what I did in 2006/2007, and I'm not following what led you to believe otherwise.     The functions were all optimized by minimizing the least-squared-error, computed as the difference between the %age confidence of gold win vs. the actual game result for every state in the database.

Oh, whoops.  I thought that was what you had done, but then when you started talking about errors of 0.2 and 0.8 for an 80% prediction, then I thought I had mis-remembered.  But I guess I was mistaken in my mistake.  My apologies.  I should have at least checked the original thread and given you the credit that I had been giving you!

Given that all the material states did in fact contribute to your optimized coefficients, I'll have to think harder to explain why I don't like the results.  I mean, harder than the obvious explanation that my own intuitive material evaluation is wrong.  :P

But, given how many things I've been wrong about in the past, why think too hard?  I used to think that 99of9 was overly bold to open with two rabbits in front, but now I have four in front every game.  I once said I knew I was winning because I had gotten a camel hostage by sacrificing only two cats, which I'm now confident means I was losing.  My mocking LinearAB for preferring HR to M in the opening may only be fodder for history to laugh at me some more.  :)

Title: Re: Handicap Order - what beats what?
Post by Janzert on Apr 22nd, 2008, 1:05pm
As the ranting seems to have died down over this, here are my thoughts on trying to rank handicaps by difficulty. These are more or less in the order they occurred to me. ;)

First, current evaluators were mostly developed to look at near even piece trades. So it's unsurprising to me that they give unreasonable results when used to compare large handicaps. Also all but the empirically optimized ones by IdahoEv are simply based on current intuition rather than any objective measure. Because of the way the empirically optimized ones are trained I wouldn't expect them to do any better on extreme handicaps either.

So what do we actually mean when trying to rank the piece handicaps? One thought I had was to try and relate it to the probability of a game theoretic proven loss. But I can easily give a fairly simple algorithm (involving no search, a p0 bot if you will) that would beat perfect play when playing against any handicap that leaves only cats and rabbits. So on some level you could say that all handicaps leaving only cats and rabbits are equivalent (trivially provably lost). Obviously any difficulty measure needs to be able to distinguish between any, or almost any, handicap.

Another aspect is that almost certainly different opponents are going to have different rankings in difficulty for various sacrifices. Almost certainly even to the point of handicap 1 being impossible and 2 being possible against opponent A but the reverse against opponent B. But of course we want to establish a general ranking that in some way represents an overall ordering of difficulty, i.e. free of 'opponent bias'.

My current idea for an objective, bias free, although theoretical, measure of difficulty would be to look at certain features of the game tree derived from a given handicap. The first thought I had was to use the percentage of leaves that are wins, lower percentages being more difficult of course. This has the potential problem though of a given tree having many early critical moves followed by a period where most moves lead to wins. This could lead to it having a percentage just as high or higher than another tree that has several viable choices at every stage throughout the game and is therefore easier to play. A better metric that avoids this problem would be one that looks at the interior nodes instead of just the leaves. Perhaps looking at the percentage of critical moves as the game progresses or maybe something involving the average length of critical move sequences. Another potential problem with any similar approach is that by definition 'good play' does not travel uniformly through the game tree. But I'm afraid any attempt to try and correct for that will probably lead to opponent bias in the results.

Of course it is beyond any current, or apparent near future, resources to construct and/or look at the whole game tree for even a single handicap. I wonder though if some sort of random sampling could produce useful results.
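To make the leaf-percentage idea and the random-sampling question concrete, here is a sketch on a tiny hypothetical tree (nothing like a real Arimaa tree). It counts winning leaves exactly, then estimates the same quantity with random root-to-leaf walks. Note the caveat in the comments: uniform random walks only sample leaves uniformly when the branching is uniform at each depth.

```python
import random

# Hypothetical toy game tree: an interior node is a list of children,
# a leaf is True (win for the handicapping player) or False (loss).
TREE = [
    [[True, False], [False, False]],
    [[True, True], [False, True]],
    [[False, False], [False, True]],
]


def leaf_win_fraction(node):
    """Exact (wins, leaves) count by visiting every leaf."""
    if isinstance(node, bool):
        return (1 if node else 0), 1
    wins = leaves = 0
    for child in node:
        w, n = leaf_win_fraction(child)
        wins += w
        leaves += n
    return wins, leaves


def sampled_win_fraction(node, samples, rng):
    """Estimate via random root-to-leaf walks with a uniform choice at each node.
    This matches the exact leaf fraction only when all nodes at a given depth
    have the same number of children, as in TREE above."""
    hits = 0
    for _ in range(samples):
        cur = node
        while not isinstance(cur, bool):
            cur = rng.choice(cur)
        hits += cur  # True counts as 1, False as 0
    return hits / samples


wins, leaves = leaf_win_fraction(TREE)
print(f"exact: {wins}/{leaves}")  # exact: 5/12
print(f"sampled: {sampled_win_fraction(TREE, 10_000, random.Random(0)):.3f}")
```

This only measures the leaf percentage that the post already flags as flawed; an interior-node metric such as the share of critical moves would need a different traversal.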

Anyone have further thoughts on this or other ideas?

Janzert

Title: Re: Handicap Order - what beats what?
Post by IdahoEv on Apr 22nd, 2008, 2:14pm

on 04/22/08 at 07:33:49, Fritzlein wrote:
Given that all the material states did in fact contribute to your optimized coefficients, I'll have to think harder to explain why I don't like the results.  I mean, harder than the obvious explanation that my own intuitive material evaluation is wrong.  :P


Cognitive dissonance and confirmation bias are, after all, the reason we have the scientific method.  :P  You can see evidence of my own biases in the newly-reconstituted Material Eval II thread, defending my evaluator against a big fault demonstrated by 99:  I'm convinced my own system is correct despite evidence to the contrary.  :-)

What I can tell you is that as of the last time I ran the data, the coefficients of LinearAB and optimized DAPE -- coefficients which lean more towards piece number and less towards piece size, relative to human expert intuition -- were definitely supported by the actual game history, both in the aggregate of all states, and in the specifics we examined like M vs. HR.   I was as surprised by these coefficients as you are.   One can argue that the game history doesn't actually represent the value of the pieces, and you might even be correct!  But it's definitely an uphill argument.

Your argument about causality is IMHO the strongest argument here.  We don't know if HR captures are causing wins more often, or if winning play is causing HR captures more often.    Put statistically, there's no way to tell whether the database results are measuring p(win|HR) or p(HR|win), and poor Thomas Bayes will spin pirouettes in the ground if we assume those are the same thing.
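To put numbers on the Bayes point, here is an illustration with made-up counts (not from the actual database). From the same set of games the two conditional probabilities come out very different, and neither one by itself says which way the causation runs.

```python
# Made-up counts, purely illustrative.
games_total = 1000
wins_total = 500      # games won by the side of interest
hr_captured = 200     # games in which that side captured an H and an R
wins_with_hr = 150    # games where it both captured H+R and won

p_win_given_hr = wins_with_hr / hr_captured   # 0.75
p_hr_given_win = wins_with_hr / wins_total    # 0.3

# Bayes' rule links them: P(win | HR) = P(HR | win) * P(win) / P(HR)
p_win = wins_total / games_total
p_hr = hr_captured / games_total
assert abs(p_win_given_hr - p_hr_given_win * p_win / p_hr) < 1e-12

print(p_win_given_hr, p_hr_given_win)
```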

But consider this:  being a winning player would tend to impose your understanding of the game on the database, because you would play in the way you believe is right, and the win (since you're a strong player) would demonstrate the "rightness" of your approach, even if it's not actually optimal.   So the database evidence should be skewed towards the human belief that bigger pieces are much more valuable, simply because the top players have been playing that way and winning.   So maybe even LinearAB is overvaluing the big pieces!

In any case, the database history is the only set of objective data we've got.  Despite possible (but unproven) flaws in it, my instinct as a scientist is to trust data over human intuition.  And if I'm the only bot developer taking that approach, believe me that's perfectly fine with me!  If there's any modest possibility it's the correct approach, I'd certainly prefer to be the only one doing it  :P  

Edited:  Move 'grammar'.  For great clarity.

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 22nd, 2008, 3:43pm

on 04/22/08 at 13:05:06, Janzert wrote:
Another aspect is that almost certainly different opponents are going to have different rankings in difficulty for various sacrifices. Almost certainly even to the point of handicap 1 being impossible and 2 being possible against opponent A but the reverse against opponent B.

Hey, that's a pretty strong argument in favor of my proposal to list both handicaps any time the two are incommensurate!


Quote:
But of course we want to establish a general ranking that in some way represents an overall ordering of difficulty, i.e. free of 'opponent bias'.

Do we really want this even if it is wrong?  If there is one bot that can be beaten more easily with RRR and another that can be beaten more easily with CR, do we want to lock ourselves into recognizing only one as the greater achievement in both cases?

It seems that trying to get this definitive relative ranking is like trying to get a definitive answer as to whether a camel is worth 4 rabbits or 5.  Why break your back tuning it to 4.38051 rabbits when you know it is only an average, and sometimes the camel will be worth more than that and sometimes less?  Instead we try to build dynamic evaluations that recognize that how much a camel is worth depends on the situation.

Instead of trying to say in advance what is best for a certain bot, we can instead keep all incommensurate handicap records, and see what all different handicaps can be achieved...

Title: Re: Handicap Order - what beats what?
Post by aaaa on Apr 22nd, 2008, 6:25pm
If we want to try to make any headway with the theory of material evaluation, we should work out the various relevant considerations that could be captured in intermediate variables. It may even be the case that four of those, namely those that stand for "army strength" and "goal threat" for both sides, may not even be enough.

Title: Re: Handicap Order - what beats what?
Post by Janzert on Apr 22nd, 2008, 6:28pm
Regarding handicaps having different difficulties against different opponents:

on 04/22/08 at 15:43:26, Fritzlein wrote:
Hey, that's a pretty strong argument in favor of my proposal to list both handicaps any time the two are incommensurate!


I'm not sure how the word "incommensurate"1 applies here, but I'll take a stab at interpreting your meaning to be "roughly equal" (i.e. commensurate).

I think either every handicap that is accomplished has to be listed2 or you have to define a full ordering. We could judge some handicaps to be equal but the more of those there are the less useful the ordering is.

But even disregarding that, I think the opponent bias is going to be larger than just roughly equal handicaps being possible against one opponent and not another. I wouldn't be surprised to see a bot against which an E handicap is possible but not an M handicap, while other bots would be more likely to have the reverse.

Regarding a general list of handicaps:

Quote:
Do we really want this even if it is wrong?  If there is one bot that can be beaten more easily with RRR and another that can be beaten more easily with CR, do we want to lock ourselves into recognizing only one as the greater achievement in both cases?


Actually I think we want to answer yes to both questions or at least yes with qualifications. ;)

Without a general list we have to define an ordering for each bot. I believe it is impossible to make a bot specific ordering that includes handicaps that have not yet been accomplished. Even just stating whether a handicap is possible or not would seem to be impossible, never mind creating a more fine grained ordering. This leads to the problem of defining the finish line after the race is over. Also even after various handicaps have been accomplished, I'm not sure how to simply define an objective measure of difficulty against the specific bot.

Having a general list allows predefined goals and reduces the burden on the community to making only one ordering instead of one for each bot. Even if, as more Arimaa knowledge is gained, it is decided that some specific handicap ordering needs to be changed, I think this is a better solution than trying to make separate orderings for all bots.

Janzert

1 (adj) incommensurate (not corresponding in size or degree or extent) "a reward incommensurate with his effort".

2 I seem to recall a proposal a long time ago to simply list the first instance of every handicap that was done and some very vocal resistance to the idea.

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 22nd, 2008, 7:31pm

on 04/22/08 at 18:28:18, Janzert wrote:
Regarding handicaps having different difficulties against different opponents:

I'm not sure how the word "incommensurate"1 applies here, but I'll take a stab at interpreting your meaning to be "roughly equal" (i.e. commensurate).

Read reply #24 in this thread to understand what I meant by incommensurate, and my proposal to deal with it.  But I used the wrong word.  I meant "incommensurable", meaning they can't be compared.

We can know for sure that CRR > CR > RR, but we can't compare CR to RRR or DRR to CCR.  My idea is not to have a separate, fully-ordered ranking for each bot, but to treat as records all handicaps for a bot that aren't clearly beaten by some better handicap.

Title: Re: Handicap Order - what beats what?
Post by 99of9 on Apr 22nd, 2008, 7:59pm
If we're willing to have more than one record per bot (I estimate 3-6 for most bots), then partial ordering is a great way to go because it removes the need to rely on any materials eval at all.

Even if we go down that track, I think it is worth having the conversation about which handicaps are harder in general, but it would certainly take the angst out of that conversation.

Regarding mistre's suggestion of one list per person.  I agree with Arimaabuff that this is not the best way to go on the main "records" page, because the competition does help spur us on.  However, I agree with mistre that this would be helpful for newer or less competitive botbashers, so I recommend setting this up as a subpage (where you can fit many more people).

Title: Re: Handicap Order - what beats what?
Post by Janzert on Apr 22nd, 2008, 8:28pm

on 04/22/08 at 19:31:13, Fritzlein wrote:
Read reply #24 in this thread to understand what I meant by incommensurate, and my proposal to deal with it.  But I used the wrong word.  I meant "incommensurable", meaning they can't be compared.

We can know for sure that CRR > CR > RR, but we can't compare CR to RRR or DRR to CCR.  My idea is not to have a separate, fully-ordered ranking for each bot, but to treat as records all handicaps for a bot that aren't clearly beaten by some better handicap.


Ahh, ok. First a minor nitpick, saying we can't compare two things is saying we can't tell whether green, water or 23 grams is better. I think all handicaps are comparable, whether we can figure out if a particular handicap is better, worse than or equal to another handicap is another matter. Having records for all handicaps that are equal, within the margin of error or plain undecidable on which is better is fine with me. In actual fact I care very little how the records are listed.

You say we know CRR > CR > RR and that we can't for CR-RRR and DRR-CCR. While I completely agree with you on the former and have no opinion at all on the latter, how do you decide this is the case? The method for deciding this is what interests me.

Janzert

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 22nd, 2008, 8:30pm

on 04/22/08 at 14:14:38, IdahoEv wrote:
One can argue that the game history doesn't actually represent the value of the pieces, and you might even be correct!  But it's definitely an uphill argument.

I think that playing more aggressively recently has given me an insight I didn't have before into unbalanced trades.  I have always been a control player rather than a race player.  That is to say, I don't want to capture more of your pieces than you capture of mine; I want to capture something of yours in return for nothing.  Race players, on the other hand, are willing to give in order to get.  They will slug it out as long as they are getting the best of it.  In the past I have been unwilling to race even when I would get the best of the race, because I didn't want to lose control.  This unwillingness to race often came back to haunt me, because it isn't always possible to keep everything under control, even with a positional or material advantage.  Sometimes you have to race or lose your control, but if you are willing to race, you can trade control for a race that favors you.  Being able to identify and accept those favorable races actually makes control more valuable to me.

In comparing M to HR as an opening trade, it is clear that the M is better for control, i.e. more likely to produce a free capture for nothing.  The HR on the other hand, is better for racing and slugging it out, because every equal trade of pieces favors the HR side, most particularly a trade of H for H.  The question of who the material balance favors overall can be translated into whether the M side can force the game into a control game, or the HR can force the game into a slugfest.

Naturally the answer is not absolute.  One can always force the game into a race if one is willing to pay a price.  So maybe a better way of putting it is in terms of tradeoffs: what will the HR player have to pay to force the game into a race, and/or what will the M player have to pay to keep the game a control game.

This language gives me a new way of expressing why I think M is superior to HR.  Racing is, in essence, having the elephants apart.  Each player tries to do damage with his own elephant rather than defend any damage the opposing elephant does.  But when I have M and my opponent doesn't, I can fearlessly use my elephant to track his elephant, and defend anything his elephant tries to get started.  I can become Mr. Tag-along.  It doesn't bother me if both our elephants get bound up in a defensive deadlock, because I will have the strongest free piece (M) if that happens.  Meanwhile my opponent's elephant can't afford to track my elephant and stop whatever I am up to.  He will always have to leave, complicate, and threaten to trade.

I submit that races and slugfests are easier to understand than the control game, and that the weaker the players are, the better HR fares compared to M.  Even a player as strategically strong as Omar has made the mistake of racing when he is ahead by M for H, even though he wouldn't have had to race.  (I specifically remember scolding him for this. :P)  In a race strength is not so important, and the advantage of extra strength evaporates.

I submit that the strongest players prefer M to HR, because that's what is more useful in games against each other.  The more one understands the control game, the more one understands when and how not to race.

You may say that I have an uphill argument because I am arguing against the data, but you also have rather an uphill argument against the intuitions of strong players.  If I pitted two random steppers against each other, I wouldn't be surprised if the game results favored R over E  when the two players start with those respective handicaps.  (Actually, that experiment wouldn't be too hard to run.  99of9 could tell us in an afternoon whether E or R is worth more to a random stepper.)  But if R happens to be favored over E in that set of games, who cares?  The strength of the players invalidates the data.

I argued in a different thread that an M handicap is only a moderate advantage between beginners, but is decisive among experts.  The value of the extra piece increases with playing strength.  I fully expect that the same is true of the value of M versus HR: the stronger the players are, the more the full force of the M will be felt.  Of course, it wouldn't have to be this way; in theory it could be that the stronger the players are, the more effectively they are able to use an advantage in numbers.  But I think in reality the control game is generally more subtle and difficult than the race game, and understood later in one's Arimaa career.

This is just my $0.02 that the "reverse causality" argument is not the only argument that undermines the game database.  The "data" that will really convince me is when chessandgo uses his HR to beat my M. :)

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 22nd, 2008, 8:39pm

on 04/22/08 at 20:28:47, Janzert wrote:
You say we know CRR > CR > RR and that we can't for CR-RRR and DRR-CCR. While I completely agree with you on the former and have no opinion at all on the latter, how do you decide this is the case? The method for deciding this is what interests me.

What I propose is that handicap A is bigger than handicap B if and only if the pieces in A can be set against the pieces in B so that every piece in B is "covered" by at least an equal piece in A, plus
1) A has at least one piece left over
or
2) A has the stronger piece in at least one pair.

DRR can't cover both cats in CCR, and CCR can't cover a D at all, so the two are incommensurable, and both would have to be listed in the Hall of Fame.  However, DCR can cover each of them while meeting criterion (2), so it would bump both of the previous handicaps out.
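Here is a sketch of this covering rule. Rank the pieces E > M > H > D > C > R, sort both handicaps strongest-first, and pair them off in order. Pairing in sorted order finds a valid covering whenever one exists. The function names and numeric ranks are my own illustration, not anything defined in the thread.

```python
from collections import Counter

# Arbitrary numeric ranks; only the order E > M > H > D > C > R matters.
STRENGTH = {"E": 6, "M": 5, "H": 4, "D": 3, "C": 2, "R": 1}


def covers(a: str, b: str) -> bool:
    """True if every piece in handicap b can be paired with an equal or
    stronger piece in handicap a (each piece of a used at most once)."""
    sa = sorted((STRENGTH[p] for p in a), reverse=True)
    sb = sorted((STRENGTH[p] for p in b), reverse=True)
    if len(sa) < len(sb):
        return False
    # Pairing the i-th strongest piece of b with the i-th strongest of a
    # succeeds whenever any valid pairing exists.
    return all(x >= y for x, y in zip(sa, sb))


def bigger(a: str, b: str) -> bool:
    """a is a bigger handicap than b: a covers b and has either a leftover
    piece or a strictly stronger piece in at least one pair."""
    return covers(a, b) and Counter(a) != Counter(b)


def compare(a: str, b: str) -> str:
    if bigger(a, b):
        return ">"
    if bigger(b, a):
        return "<"
    if Counter(a) == Counter(b):
        return "="
    return "incomparable"


for a, b in [("CRR", "CR"), ("CR", "RR"), ("CR", "RRR"), ("DRR", "CCR"),
             ("DCR", "DRR"), ("DCR", "CCR"), ("ER", "MRR")]:
    print(a, compare(a, b), b)
```

Once a covering exists, the two sets differ exactly when there is a leftover piece or a strictly stronger pair. That is why bigger() only needs the Counter comparison to express criteria (1) and (2).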

Title: Re: Handicap Order - what beats what?
Post by Janzert on Apr 22nd, 2008, 8:59pm
So ER=MRR or any handicap leaving 3+ pieces but without an E?

Janzert

Title: Re: Handicap Order - what beats what?
Post by Fritzlein on Apr 22nd, 2008, 10:09pm

on 04/22/08 at 20:59:48, Janzert wrote:
So ER=MRR or any handicap leaving 3+ pieces but without an E?

Yep.  But don't use the equal sign.  MCR beats one of those and not the other.  EC beats the other but not the one.  So they aren't equal, just unordered compared to each other.  It takes ERR to beat them both.

Title: Re: Handicap Order - what beats what?
Post by Janzert on Apr 22nd, 2008, 11:32pm
I find it hard to believe botbashers will be satisfied being unable to distinguish between ER on the board and MHHDDCCRRRRRRRR on the board for a handicap. If the botbashers are willing to agree to it though, I suppose I won't fight against it. What it does rank, I think it probably ranks correctly; I just think it leaves too much unranked.

[Edit: or maybe I'm being confused about which pieces are left on the board and which are taken off]

[Edit2: actually I think maybe it doesn't matter which way around you look at it, it works out the same]

Janzert

Title: Re: Handicap Order - what beats what?
Post by 99of9 on Apr 23rd, 2008, 12:44am

on 04/22/08 at 23:32:32, Janzert wrote:
I find it hard to believe botbashers will be satisfied being unable to distinguish between ER on the board and MHHDDCCRRRRRRRR on the board for a handicap... What it does rank, I think it probably ranks correctly; I just think it leaves too much unranked.

You're right that one of those is clearly harder than the other.  But that just means the easier one will be attacked first.  Sooner or later, those two records will be broken up into (on the board):

  • ER, and
  • MDRR, and
  • MHR, and
  • HHCR, and
  • HRRRR, and
  • DRRRRR, and
  • CCRRRRR, and
  • RRRRRRRR, etc


The big question for me is how many unranked records will there be per bot in the long term, and is this unacceptably high?
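One way to see how many records a bot would carry under the partial ordering is to keep every achieved handicap that no other achieved handicap beats. The sketch below uses Fritzlein's covering rule from earlier in the thread. It works with pieces removed rather than pieces left on the board as in the list above, which per Janzert's edit should come out the same. The sample input is hypothetical.

```python
from collections import Counter

STRENGTH = {"E": 6, "M": 5, "H": 4, "D": 3, "C": 2, "R": 1}  # only the order matters


def canonical(h: str) -> str:
    """Write a handicap strongest piece first, so 'RC' and 'CR' match."""
    return "".join(sorted(h, key=STRENGTH.__getitem__, reverse=True))


def bigger(a: str, b: str) -> bool:
    """Covering rule: a covers b piece for piece and is not the identical set."""
    sa = sorted((STRENGTH[p] for p in a), reverse=True)
    sb = sorted((STRENGTH[p] for p in b), reverse=True)
    covered = len(sa) >= len(sb) and all(x >= y for x, y in zip(sa, sb))
    return covered and Counter(a) != Counter(b)


def records(achieved):
    """Keep every achieved handicap that no other achieved handicap beats."""
    unique = list(dict.fromkeys(canonical(h) for h in achieved))
    return [h for h in unique if not any(bigger(o, h) for o in unique)]


front = records(["CR", "RRR", "CRR", "DRR", "CCR"])
print(front, len(front))  # ['DRR', 'CCR'] 2
```

The length of that list, tracked per bot over time, is the "how many unranked records" number in question.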

Title: Re: Handicap Order - what beats what?
Post by 99of9 on Apr 23rd, 2008, 2:42am

on 04/22/08 at 13:05:06, Janzert wrote:
Another aspect is that almost certainly different opponents are going to have different rankings in difficulty for various sacrifices. Almost certainly even to the point of handicap 1 being impossible and 2 being possible against opponent A but the reverse against opponent B.

I'm sure that this is possible in principle, but I have not seen it yet.


Quote:
Of course it is beyond any current, or apparent near future, resources to construct and/or look at the whole game tree for even a single handicap. I wonder though if some sort of random sampling could produce useful results.

No, Fritz is right that random will give us misguided results.  It will favor rabbits over anything.


Quote:
Anyone have further thoughts on this or other ideas?

I only see three viable choices:

  • Use well defined materials evals in some combination.
  • Resort to partial ordering only.
  • Assume opponent bias is not a big factor and construct an order of handicaps based on what handicaps are possible in other bots.

The last one needs clarification.  Say we are comparing handicap X with handicap Y.  If handicap X (or self-evidently better) has been achieved against more bots than handicap Y (or self-evidently better) has been, then handicap Y is the better handicap (for the moment).  In the case of a tie, both records remain.  This list would be slightly dynamic, and would require a fair bit of overhead in terms of keeping it up to date.  But it would (almost by definition) be a good way of ordering the handicaps fairly.
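This third option can be sketched as code. For each handicap, count the bots against which it, or a self-evidently bigger handicap (Fritzlein's covering rule), has been achieved. The handicap achieved against fewer bots counts as harder for now, and a tie keeps both records. The bot names and results are hypothetical placeholders.

```python
from collections import Counter

STRENGTH = {"E": 6, "M": 5, "H": 4, "D": 3, "C": 2, "R": 1}  # only the order matters


def bigger(a: str, b: str) -> bool:
    """Covering rule: a covers b piece for piece and is not the identical set."""
    sa = sorted((STRENGTH[p] for p in a), reverse=True)
    sb = sorted((STRENGTH[p] for p in b), reverse=True)
    covered = len(sa) >= len(sb) and all(x >= y for x, y in zip(sa, sb))
    return covered and Counter(a) != Counter(b)


def achieved_count(results, h):
    """Bots against which h, or a self-evidently bigger handicap, was achieved."""
    return sum(
        1 for handicaps in results.values()
        if any(Counter(x) == Counter(h) or bigger(x, h) for x in handicaps)
    )


def harder(results, x, y):
    """Return the harder handicap, or None on a tie (both records remain)."""
    cx, cy = achieved_count(results, x), achieved_count(results, y)
    if cx == cy:
        return None
    return x if cx < cy else y


# Hypothetical results: achieved handicaps per bot.
RESULTS = {"bot_A": ["CRR", "D"], "bot_B": ["CCR"], "bot_C": ["RRR"]}
print(achieved_count(RESULTS, "RRR"), achieved_count(RESULTS, "CR"))  # 3 2
print(harder(RESULTS, "RRR", "CR"))  # CR
```

As 99of9 notes, the answer can change as new results come in, so this would have to be recomputed whenever a record is added.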

Title: Re: Handicap Order - what beats what?
Post by mistre on Apr 23rd, 2008, 8:43am
After listening to the discussion for a while, I thought I would pop back in with my 2 cents.

Since no one has come up with anything better yet, I think we should continue to use the list we have of the 3 combined material evaluators.  Leaving many combinations unranked would just lead to a complete mess.

I really don't foresee too much of an issue; like I said, 99% of the cases for handicap game purposes shouldn't be a problem.  Someone trying for a handicap should almost always pick something obvious (for example, if the record handicap is EMHD, I will try EMHH instead of something like EMHCRR).

As for general ordering for the sake of knowing which handicaps are better, we should continue to try to develop a method that is better than what we have now.

What if we have a test bot play against itself with different handicaps?   Omar would have to set up a way to have piece exclusion in the set-up, but after that, it should be possible right?

Title: Re: Handicap Order - what beats what?
Post by 99of9 on Apr 23rd, 2008, 8:56am

on 04/23/08 at 08:43:20, mistre wrote:
What if we have a test bot play against itself with different handicaps?

This won't help much with extreme handicaps, because none of the bots know how to attempt an extreme handicap, and some of the bots don't even know how to stop one!

For the smaller handicaps, each bot might give you a different result.  E.g. bomb may do better against itself when handicapped with HC, but clueless may do better when handicapped with DD.  What would that tell us?

Title: Re: Handicap Order - what beats what?
Post by RonWeasley on Apr 23rd, 2008, 11:27am
I don't know if this has been said, but we might order these simply by the bot-bashing record result.  Is the metric still number of moves?

The order may not be the same for each bot, but the trend among bots should provide an indication.  Anyone who thinks a handicap is rated too hard can prove otherwise by bashing it better, or challenging the bashing specialists.

Such a ranking would not necessarily tell us what a certain handicap means 1) against a good muggle player or 2) in a position where there's been equal attrition.

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 23rd, 2008, 1:27pm
BTW, Aamira2006P1 has just entered the CRN family, that is, the bots that can be beaten with a C and one or several Rs. That family now contains Gnobot2005P1, which is a CR1, ShallowBlue a CR2, ArimaaScoreP1 a CR3, Arimaalon a CR4 and our latest addition Aamira2006P1 a CR5.  ;)

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 26th, 2008, 6:37am
A little off topic: I beat bot_ArimaaScoreP2 with both colors with only a dog and eight rabbits. I suppose that cat and 8 rabbits is within the realm of possibilities, but it seems so unlikely (instead of four pieces that can kill you, you have six and the enemy's cats become brick walls to you) that I don't expect my record to be beaten within this century (7 rabbits is out of the question with only a single dog to defend the fort!).

But of course you are welcome to try.

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 26th, 2008, 9:06am

on 04/26/08 at 06:37:36, Arimabuff wrote:
A little off topic: I beat bot_ArimaaScoreP2 with both colors with only a dog and eight rabbits. I suppose that cat and 8 rabbits is within the realm of possibilities, but it seems so unlikely (instead of four pieces that can kill you, you have six and the enemy's cats become brick walls to you) that I don't expect my record to be beaten within this century (7 rabbits is out of the question with only a single dog to defend the fort!).

But of course you are welcome to try.

Well, either I am a hundred years older or I was completely wrong on that one. I just cracked bot_ArimaaScoreP2 with a cat and eight rabbits with silver. It seemed even easier than with a dog. Now for gold, I am adamant that cat and eight rabbits is completely utopian! ;)

Title: Re: Handicap Order - what beats what?
Post by aaaa on Apr 27th, 2008, 5:57pm
Here is a proposal: If there are handicap games such that one does not unambiguously dominate the other (i.e. it is not the case that one can get from one handicap to the other by any combination of adding pieces and promoting a piece which is stronger than a rabbit), then the record which stands is the handicap whose fastest win against that particular bot is the slowest (the "slowest fastest result"). In case of a tie, the earliest one stands. So if you have multiple players achieving for the first time certain incomparable handicaps against a particular bot, then in order to get or keep the record it would be in their interest to try to improve on the speed of the handicaps picked by the other contenders.  That way you have an objective measurement of handicap achievement and, as a bonus, a nice way of keeping the contest going.
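A sketch of the "slowest fastest result" rule, under stated assumptions. Speed is measured in moves. The handicaps in the sample (DRR, CCR, RRRR) are mutually incomparable under the covering rule. The tie-break uses the date of each handicap's fastest win as its "earliest" date. The games themselves are hypothetical.

```python
from datetime import date

# Hypothetical wins against one bot: (handicap, moves to win, date of the game).
GAMES = [
    ("DRR", 48, date(2008, 4, 20)),
    ("DRR", 41, date(2008, 4, 23)),
    ("CCR", 55, date(2008, 4, 25)),
    ("RRRR", 60, date(2008, 4, 21)),
    ("RRRR", 55, date(2008, 4, 22)),
]


def fastest_per_handicap(games):
    """For each handicap keep its fastest win; on equal move counts keep the earlier game."""
    best = {}
    for handicap, moves, when in games:
        if handicap not in best or (moves, when) < best[handicap]:
            best[handicap] = (moves, when)
    return best


def standing_record(games):
    """The slowest fastest result stands; among ties, the earliest achievement stands."""
    best = fastest_per_handicap(games)
    return min(best.items(), key=lambda item: (-item[1][0], item[1][1]))[0]


print(standing_record(GAMES))  # RRRR
```

Here CCR and RRRR tie at 55 moves, and RRRR reached it first, so RRRR holds the record until someone slows down the other handicaps' best times.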

Title: Re: Handicap Order - what beats what?
Post by 99of9 on Apr 28th, 2008, 2:08am
I agree it's objective, but it's not really comparing apples with apples.  The material handicap section is about material handicaps, not about time or moves.  I think that we should be looking at all possible material methods before we resort to anything else.  Since a few material valuation methods are on the table, and seem reasonable, I think we should go for one of them.

Title: Re: Handicap Order - what beats what?
Post by Arimabuff on Apr 28th, 2008, 10:17am

on 04/26/08 at 09:06:30, Arimabuff wrote:
Well, either I am a hundred years older or I was completely wrong on that one. I just cracked bot_ArimaaScoreP2 with a cat and eight rabbits with silver. It seemed even easier than with a dog. Now for gold, I am adamant that cat and eight rabbits is completely utopian! ;)

Got bot_ArimaaScoreP2  with both colors with a cat and eight rabbits. I never thought it was possible but here it is.

Hear, hear!!! ;D



Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB © 2000-2003. All Rights Reserved.