Arimaa Forum « Empirically derived material evaluators, part 1 »


Fritzlein
Re: Empirically derived material evaluators, part
« Reply #15 on: Nov 13th, 2006, 12:57pm »

on Nov 13th, 2006, 12:05pm, IdahoEv wrote:
I'm switching to the sigmoid-probability-estimate approach for future attempts to optimize coefficients in any case.

Excellent.  Although using a sigmoid throws in another source of error (i.e. the appropriate curvature), it also opens up a much bigger source of useful data.
 
Quote:
I have all 1357 in a spreadsheet if you want 'em.  :)

I want 'em.  yangfuli@yahoo.com
 
Quote:
Functionally, it shouldn't matter, right?

Yes, it also doesn't matter to FAME (I think) which category of piece was missing, even though FAME is collapsing up instead of down.

IdahoEv
Re: Empirically derived material evaluators, part
« Reply #16 on: Nov 13th, 2006, 1:09pm »

on Nov 13th, 2006, 12:57pm, Fritzlein wrote:
Excellent.  Although using a sigmoid throws in another source of error (i.e. the appropriate curvature), it also opens up a much bigger source of useful data.

 
Almost the biggest annoyance is that the error function of the older system was so transparent.   An error of 7514 meant that 7514 states were classified "incorrectly".   Easy for humans to interpret.
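
To make the contrast between the two error measures concrete, here is a minimal sketch. It is not the code used in these experiments: the function names, the `scale` parameter (standing in for the sigmoid's curvature), and the sample data are all assumptions made for illustration.

```python
import math

def win_probability(score, scale=1.0):
    """Squash a material score into a win probability with a sigmoid.

    scale is an assumed curvature parameter; the thread gives no value for it.
    """
    x = score / scale
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # numerically safe branch for negative scores
    return e / (1.0 + e)

def misclassified(scores, outcomes):
    """Count states where the sign of the score disagrees with the result.

    scores:   material score from the first player's point of view
    outcomes: 1 if the first player won, 0 otherwise
    """
    return sum(1 for s, won in zip(scores, outcomes) if (s > 0) != bool(won))

def sigmoid_log_loss(scores, outcomes, scale=1.0):
    """Mean log loss of the sigmoid win-probability estimate."""
    total = 0.0
    for s, won in zip(scores, outcomes):
        p = win_probability(s, scale)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= math.log(p) if won else math.log(1.0 - p)
    return total / len(scores)

# Made-up scores and results, only to exercise the two functions.
scores = [2.0, 0.5, -0.3, -1.5]
outcomes = [1, 0, 1, 0]
print(misclassified(scores, outcomes))
print(round(sigmoid_log_loss(scores, outcomes), 3))
```

The first number reads directly as a count of wrong calls, which is the transparency described above. The second depends on the chosen curvature and has no such direct reading.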
 
 
Fritzlein
Re: Empirically derived material evaluators, part
« Reply #17 on: Nov 14th, 2006, 9:29pm »

on Nov 13th, 2006, 12:05pm, IdahoEv wrote:
I have all 1357 in a spreadsheet if you want 'em.  :)

OK, a look at the spreadsheet verifies what one would suspect from the coefficients.  FAME likes heavy pieces while LinearAB likes many pieces.
 
Of the 1357 cases of disagreement, 85 have the same number of pieces on each side.  For these, FAME wins 44 by preferring, for example, MR over HD.
 
Of the remaining disagreement cases only two had FAME favoring the side with more pieces while LinearAB favored the side with fewer pieces.  Those were both midgames where FAME barely preferred RRR over DC, and was wrong both times.
 
The other 1270 had FAME favoring the side with fewer pieces while LinearAB favored the side with more pieces.  Of these, FAME only wins 556, or 44% of the disputed positions.
 
We can further break this down by how many pieces fewer the side preferred by FAME had:
 
Deficit  Cases  Correct  Percent
1        770    366      48%
2        389    158      41%
3        100    29       29%
4+       11     3        27%

So the more pieces FAME is behind by when it claims to actually be winning, the less we should trust FAME.
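
As a quick arithmetic check of the breakdown, the sketch below recomputes the percentages and totals from the counts quoted in this post. The variable names are illustrative only.

```python
# (deficit, cases, correct) rows copied from the breakdown in this post.
rows = [("1", 770, 366), ("2", 389, 158), ("3", 100, 29), ("4+", 11, 3)]

total_cases = sum(cases for _, cases, _ in rows)
total_correct = sum(correct for _, _, correct in rows)

# Per-deficit share of disputed positions where FAME was right.
for deficit, cases, correct in rows:
    print(f"{deficit:>2}  {cases:>4}  {correct:>4}  {100 * correct / cases:.0f}%")

# Totals for the cases where FAME favored the side with fewer pieces.
print(total_cases, total_correct, f"{100 * total_correct / total_cases:.0f}%")

# Equal piece counts (85) + FAME favoring more pieces (2) + the rest.
print(85 + 2 + total_cases)
```

The rows sum to the 1270 cases and 556 FAME wins quoted above, and the three groups together account for all 1357 disagreements.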
 
It's hard to see this as an issue of LinearAB overfitting on borderline cases so much as an issue of FAME badly fitting on extreme cases.
 
One thing that gives me pause in jumping on the bandwagon to overhaul FAME is that the A coefficient in LinearAB had such an outlier result in one trial.
 
on Nov 11th, 2006, 6:56pm, IdahoEv wrote:

The three solutions were:
Error  A     B      C
9514   1.56  1.263  0.237
9514   3.12  1.261  0.203
9517   1.95  1.265  0.226


But now that I carefully re-read that post, I see that A and C are in fact working together to keep the value of the officers as a group in constant proportion to the value of rabbits as a group.  Indeed, if I am reading the results right, the curvature basically does not matter.  What does matter is the ratio of big officers to small officers, and the ratio of the officers collectively to the rabbits collectively.
 
So I guess I'm now very open to the notion that FAME needs to value quantity more and quality less.  This is quite amusing given how far FAME has already gone in that direction from what I used to think.  If I now have to think a dog is worth less than two rabbits and a horse is worth less than three rabbits, the only thing I have left to hold on to is that a cat is worth more than a rabbit.  Don't take that away from me, please!

IdahoEv
Re: Empirically derived material evaluators, part
« Reply #18 on: Nov 16th, 2006, 2:28pm »

Karl,
 
Fascinating analysis!   I certainly didn't have the patience to analyze all those conflict states so closely.    
 
What makes me the most glad is that we seem to be actually learning something from these experiments about what works in the real world.
 
Also, I'm continually surprised by the relative strength of LinearAB as an evaluator.  It was originally just a toss-off idea I used to test out my code.
 
on Nov 14th, 2006, 9:29pm, Fritzlein wrote:
One thing that gives me pause in jumping on the bandwagon to overhaul FAME is that the A coefficient in LinearAB had such an outlier result in one trial.

 
The one you're pointing out wasn't LinearAB (which simply evaluates all rabbits as worth 1 point); it was the extension which gave the rabbits values on a curve function (call it RabbitCurveABC for the moment).   And yes, in RabbitCurveABC, B is constant but A and C tend to co-vary to keep the value of the officers more-or-less constant relative to the value of the first rabbits lost.    I'm confident that it's the first rabbits lost, because if it were the rabbits collectively, A would find a constant solution.  In RabbitCurveABC (like LinearAB), the rabbits as a whole are worth 8 points; it's only the distribution of those points among the rabbits that changes (by C).   For a fixed B, A alone sets the collective value of the officers relative to rabbits.
 
I see evidence of overfitting in RabbitCurveABC and some other algorithms, but not in LinearAB.  What I mean by that is that the system very quickly settles into states that will go unchanged for thousands of iterations before making sudden jump-shifts to another nearby state that will gain them just a few more "correct" cases.  After the error gets down to 9600 or so, the behavior of the fitting system is decidedly non-smooth for most of these algorithms.
 
Quote:
If I now have to think a dog is worth less than two rabbits and a horse is worth less than three rabbits, the only thing I have left to hold on to is that a cat is worth more than a rabbit.  Don't take that away from me, please!

 
Well, my loyalty is always to the facts.   But, despite my early skewed result last summer I'm pretty confident that the facts bear out that a cat is worth more than an initial rabbit.

PMertens
Re: Empirically derived material evaluators, part
« Reply #19 on: Nov 16th, 2006, 4:13pm »

Fritzl did not use the word initial ... and I am quite positive that it is not the last rabbit which is the first to be worth more than a cat ;)

IdahoEv
Re: Empirically derived material evaluators, part
« Reply #20 on: Nov 16th, 2006, 5:00pm »

on Nov 16th, 2006, 4:13pm, PMertens wrote:
Fritzl did not use the word initial ... and I am quite positive that it is not the last rabbit which is the first to be worth more than a cat ;)

 
 
If you take my RabbitCurveABC results at face value (and using the median result of several runs) a cat is worth 1.50 points (where the average rabbit is worth 1.0 points).   The first rabbit lost is worth 0.39 points, and the last is worth 1.95 points.  As it turns out, the seventh rabbit is worth 1.55 points.
 
Meanwhile, the first three rabbits added together are worth 1.51 points.
 
Note that this system was outperformed by two others ... including the one that valued all rabbits at a flat 1.0 points and cats at 1.24.   But when allowed to vary the value of rabbits, rabbit #7 = one cat is what it settles on.
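
The exact form of the rabbit curve is not given in this thread. As a rough consistency check only, the sketch below assumes a geometric curve, where each successive rabbit lost is worth a fixed multiple of the previous one, and anchors it on the two quoted endpoints (0.39 and 1.95). The geometric form and every name here are assumptions; the real RabbitCurveABC function may differ.

```python
first, last, n = 0.39, 1.95, 8  # endpoints quoted above; 8 rabbits per side

# Assumed geometric curve: the k-th rabbit lost is worth first * ratio**(k - 1).
ratio = (last / first) ** (1.0 / (n - 1))
values = [first * ratio ** k for k in range(n)]

print(f"ratio between successive rabbits: {ratio:.3f}")
print(f"seventh rabbit lost:              {values[6]:.2f}")
print(f"first three rabbits together:     {sum(values[:3]):.2f}")
print(f"all eight rabbits together:       {sum(values):.2f}")
```

Under that assumption the seventh rabbit comes out near 1.55 and the eight rabbits total close to 8, in line with the figures quoted above. The first-three sum lands at about 1.50 rather than 1.51, which could be rounding in the quoted endpoints or a sign that the real curve is not exactly geometric.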

Microbe
Re: Empirically derived material evaluators, part
« Reply #21 on: Nov 16th, 2006, 5:03pm »

Indeed. A rabbit on the last rank wins, so in many situations it is probably worth a cat. Maybe more. I obviously do not have the experience or ability to know this; it is just something that seems to make sense to me.

IdahoEv
Re: Empirically derived material evaluators, part
« Reply #22 on: Nov 16th, 2006, 5:09pm »

I don't mean a rabbit on the 7th rank.  I mean the cost of the 7th rabbit lost: what does one rabbit cost when you are down to only two?

jdb
Re: Empirically derived material evaluators, part
« Reply #23 on: Nov 16th, 2006, 6:56pm »

Just an observation,
 
A material evaluation score is only valid if there are no positional features that take precedence. For example, when analyzing a game, if there is a goal race going on, the material situation is no longer really relevant. So the analysis gleaned from the database at that stage of the game should probably not be used.
 
Fritzlein
Re: Empirically derived material evaluators, part
« Reply #24 on: Nov 16th, 2006, 8:44pm »

on Nov 16th, 2006, 2:28pm, IdahoEv wrote:
And yes, in RabbitCurveABC, B is constant but A and C tend to co-vary to keep the value of the officers more-or-less constant relative to the value of the first rabbits lost.    I'm confident that it's the first rabbits lost, because if it were the rabbits collectively, A would find a constant solution.

Yes, you are quite right.  The value of the officers is kept constant relative to the value of the first few rabbits, not relative to the value of the rabbits as a whole, by A and C together.  
 
Even if there were some reason to keep the value of the officers constant relative to the value of all the rabbits, the system couldn't do it, because it is being rewarded or punished once per material state that occurs.  Since the states with most rabbits on the board occur much more often than the states with most rabbits off the board, the system will necessarily tune itself to the value of the first few rabbits rather than the last few.
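
A toy sketch of this effect, using squared error as a stand-in for the actual error function: if one coefficient has a different ideal value early and late in the game, and the fit is scored once per observed state, the fitted value lands close to the ideal for the common states. The counts and ideal values below are invented to show the effect, not measured from the game database.

```python
# Hypothetical toy data: phase -> (ideal coefficient value, observed states).
phases = {
    "most rabbits on the board": (1.0, 9000),
    "most rabbits off the board": (2.0, 500),
}

def best_single_value(phases):
    """Value minimising squared error summed once per observed state.

    For squared error this is the count-weighted mean of the ideal values.
    """
    total = sum(count for _, count in phases.values())
    return sum(value * count for value, count in phases.values()) / total

fit = best_single_value(phases)
print(f"fitted value: {fit:.3f}")
```

With these made-up counts the single fitted value sits far nearer the ideal for the common early states than the ideal for the rare late ones, which is the sense in which the tuning follows the first few rabbits.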
 
In that sense, whatever coefficients you come up with will probably be wildly inaccurate in some endgames, for much the same reason that FAME and DAPE are wildly inaccurate in some endgames: We all tune our systems to deal with familiar situations first, and merely hope they deal with unfamiliar situations by extension.

IdahoEv
Re: Empirically derived material evaluators, part
« Reply #25 on: Nov 16th, 2006, 10:13pm »

on Nov 16th, 2006, 8:44pm, Fritzlein wrote:

In that sense, whatever coefficients you come up with will probably be wildly inaccurate in some endgames, for much the same reason that FAME and DAPE are wildly inaccurate in some endgames: We all tune our systems to deal with familiar situations first, and merely hope they deal with unfamiliar situations by extension.

 
Absolutely.   The drawback of an empirical approach is that it is constrained to the data available, and there's no question these results are necessarily skewed by that.
 
I suspect that one of the reasons DAPE does so well with adjusted coefficients (in the other post) is that because every piece's value depends in some sense on the number of other pieces on the board, DAPE's coefficients can be adjusted in such a way that the terms respond more appropriately in endgame situations.