Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Bot Development >> Checking for repeated positions
(Message started by: leon_messerschmidt on Jul 5th, 2014, 10:12pm)

Title: Checking for repeated positions
Post by leon_messerschmidt on Jul 5th, 2014, 10:12pm
Up until now I simply ignored the rule for repeating positions.  However, in the last two games that my bot played (gameid 307055 and 307051) it made illegal moves and lost a game that wasn't necessarily over yet >:(

So my question is what is the best way to deal with repeat positions?

I could check and move generation time but that would involve hundreds of millions of additional board comparisons.

Or I could do some jiggery pokery in my transposition table but my gut feel tells me it is a hack which could cause me some pain later on.

The best option seems to check the position for repetition after search/evaluation.  But what do I replace it with?  Keep track of the second best?

These are just from the top of my head.  Anyone want to share some insight?

Title: Re: Checking for repeated positions
Post by JimmSlimm on Jul 6th, 2014, 4:08am
I have a hashset which contains all board positions that has occured. And in the search, just before calling the evaluation function, I check if the position is a repetition, if it is, I don't evaluate that position, or make its eval -infinity.

Title: Re: Checking for repeated positions
Post by rbarreira on Jul 6th, 2014, 4:25am
First of all - in an arimaa bot, you never want to compare positions, at least not in the performance-critical parts. Always compare the zobrist hash of the position.

As JimmSlimm said, you need to keep the history of all positions (or their hashes) that have occurred in the game, and how many times they occurred. An efficient data structure for searching (like a hash table) is best.

I want to add two things though:

1- This is not only about evaluation, you should check that the moves are legal before recursing on your search as well. What's the point of continuing a search branch where an illegal (repeating) move occured? You probably already have a check for illegal moves in your search (like moves that don't change the board). In that same place in the code you can add the repetition check.

2- It can also be good (if less important) to consider the positions in the search branch so far. For this purpose you shouldn't need any fancy data structure, as the searches are not very deep in Arimaa. A simple array of the hash for each position after a player's move in the current branch should work.

Title: Re: Checking for repeated positions
Post by leon_messerschmidt on Jul 6th, 2014, 6:32am
Thank you for the replies.  It helps a lot.

Aren't there enough hash collisions that I have to do a full comparison when the hashes are equal?  Perhaps my hash implementation isn't great but I almost always see a handful of collisions.

I already remove illegal and duplicate moves from my search.  But only for the current tree and not previous moves.  

It would be easy enough to add a additional hash table that cover previous moves.  The only reason I'm resisting it is the additional hash calculations and comparisons, which would run an awful lot.

Title: Re: Checking for repeated positions
Post by JimmSlimm on Jul 6th, 2014, 6:48am
As rbarrieira mentioned, check out "zobrist" hashing if you haven't already done so.

Collisions are practically almost non-existant, it's extremely fast to calculate the hash if you do it inside step generation, and it's just a 64bit integer, so the comparisons are kept to a minimum.

Here's a good link that describes this:
https://chessprogramming.wikispaces.com/Zobrist+Hashing

Title: Re: Checking for repeated positions
Post by leon_messerschmidt on Jul 6th, 2014, 5:40pm
I get the feeling I'm missing something really fundamental.

From the chessprogramming site ( https://chessprogramming.wikispaces.com/Zobrist+Hashing#Collisions ) it seems as if there is a collision expected for roughly every 2^32 (roughly 4 billion) entries.

Back of envelope calculation says that at an average branching factor of 15,000 I'd see ~800 collisions for a 12 step deep search.   Did I get that right?

Are you saying that you either don't see any collisions or that they're close enough to be treated as the same position?

Edit: at the moment I do seem some collisions in my hash function and I resolve them with a full comparison.

Title: Re: Checking for repeated positions
Post by dree12 on Jul 6th, 2014, 8:40pm

on 07/06/14 at 17:40:07, leon_messerschmidt wrote:
I get the feeling I'm missing something really fundamental.

From the chessprogramming site ( https://chessprogramming.wikispaces.com/Zobrist+Hashing#Collisions ) it seems as if there is a collision expected for roughly every 2^32 (roughly 4 billion) entries.

Back of envelope calculation says that at an average branching factor of 15,000 I'd see ~800 collisions for a 12 step deep search.   Did I get that right?

Are you saying that you either don't see any collisions or that they're close enough to be treated as the same position?

Edit: at the moment I do seem some collisions in my hash function and I resolve them with a full comparison.


With 64-bit keys, a collision in general between any two positions will result every roughly 4 billion positions searched. (This is the birthday problem.) This is the probability of a collision between any two positions, not the probability of a collision with a specific position, which is much smaller.

Because the number of positions that have occurred so far this game is small, the chance of collision with any of those is very small. The 0.0000001% chance of incorrectly rejecting a position, which is very likely to be inconsequential anyways, is not a concern worth sacrificing performance for.

In addition, no modern computer will search 4 billion positions rapidly, unless the evaluation function is a lot faster than it should be :P. Even a 3-ply deep search must be heavily pruned.

Title: Re: Checking for repeated positions
Post by JimmSlimm on Jul 7th, 2014, 2:49am

Quote:
Back of envelope calculation says that at an average branching factor of 15,000 I'd see ~800 collisions for a 12 step deep search.   Did I get that right?


I don't know how to calculate it, but looks like you are mistaking single steps for a full move here(That branching factor looks more likely to be for 4 steps, not 1)

Title: Re: Checking for repeated positions
Post by leon_messerschmidt on Jul 7th, 2014, 3:40am

on 07/07/14 at 02:49:21, JimmSlimm wrote:
I don't know how to calculate it, but looks like you are mistaking single steps for a full move here(That branching factor looks more likely to be for 4 steps, not 1)


Each move (4 steps) has a branching factor of roughly 15,000.  So for 3 moves or 12 steps deep I did 15,000x15,000x15,000

Title: Re: Checking for repeated positions
Post by JimmSlimm on Jul 7th, 2014, 4:52am
oh d**n you might be right. I underestimated how much 15'000 ^ 3 really is. Most of those states are not usually really considered though in alpha beta.

So, I think the more realistic number in a normal game of arimaa, for a bot, of "false collisions" is less than 1 on average. It shouldn't be hard to actually measure the number of comparisons made with some counter and calculate the true chance of collision.

double a = 1.0 / (2^32); //chance of collision in single comparison
double b = 1.0 - a; //chance of non collision
double c = b^X; //chance of no collisions in X comparisons

From those numbers and windows calc.exe, I get ~79% chance of there being no collisions after comparing 1billion boards.

I just get the feeling that it really doesn't matter enough to store the actual, full bitboards in addition to the zobrist hash keys

Title: Re: Checking for repeated positions
Post by rbarreira on Jul 7th, 2014, 12:37pm
You're better off worrying about a coconut falling on your head while programming than 64-bit hash collisions, provided your zobrist numbers are pretty random. Especially when it comes to the subject being discussed here - the number of positions in the game history is quite small.

Title: Re: Checking for repeated positions
Post by clyring on Jul 7th, 2014, 2:03pm
Even when considering type-1 errors in the entire transposition table (much bigger than the set of positions since the last capture and therefore much more likely to yield an error), this problem is still too small to care about when using a 64-bit hash.

--derivation follows--

There are three numbers to take into account here. Two have already been mentioned: The number of nodes searched (n) and the size of the hash-space. (H) However, we must also consider the number of entries in the transposition table. (T)

First let us consider the number of vacant entries (V) in the transposition table after n nodes have been searched. With each new node searched, the probability that the number of vacant entries will go down is V/T. Assuming both n and T are large enough for the continuum approximation to be reasonable, I take this as a differential equation dV/dn=-V/t, with the initial value V(0)=T. Solving this gives us V(n)=Te^(-n/T).

The probability that adding another node will cause a type-1 error is equal to the number of occupied entries in the transposition table divided by H, or (T-V)/H=[T-Te^(-n/T)]/H. We can again use the continuum approximation to estimate the number of type-1 errors with an integral. Doing so yields the number of errors E={Tn+T²[e^(-n/T)-1]}/H.

--end derivation--

The longest game in this year's CC lasted 15 116 seconds. Assuming all of that time was spent searching or pondering at 10Mnps, we have a reasonable upper bound on n. Plugging H=2^64 and T=2^32 into the formula, we get as our average number of type-1 errors for the whole game...

34. You can decide for yourself whether or not that is enough to warrant concern. :)



EDIT: Using the same idea with repetition checking, the hashes that can cause a false match are only as many as the number of plies since the last capture out of all 2^64. Realistically, this number will hardly ever be more than 100 and the probability of a false positive on any given node is less than one in 10^17. There might be occasional, very rare issues with hash comparison in the TT but it should essentially never happen with hash comparison for the purposes of repetition checking.

Title: Re: Checking for repeated positions
Post by lightvector on Jul 7th, 2014, 10:14pm
Computing 15000^3 is seriously misleading - almost all of those nodes will never even be visited by an alpha-beta search. What's really relevant is your hashtable size and the speed of your search in nodes per second.

A simple heuristic not requiring you to solve any equations is that you need on the order of 2^n pairwise comparisons to find collisions with a good n-bit hash. So if as in the birthday paradox you have 2^32 hashes with each one compared against each other one, than that's about 2^64 pairwise comparisons - so 2^32 is where you start getting significant probability to find at least one collision against a 64 bit hash. Zobrist hashing isn't perfect, but it's quite good and in practice sees to come close to the bound that a true "random" hash would give. So...

For repetition checking, you aren't comparing each position against 2^32 others. You're comparing each position against a few hundred others - less than 2^10, so you need close to 2^54 nodes searched to worry about collisions. Not going to happen.

For transposition tables, you're comparing each position against N others where N is the size of your table. Probably N is close to but not quite as large as 2^32 unless you have a lot of RAM. If N is, say, 2^27 ~= 128 million, then you expect a collision once you start getting close to 2^37 ~= 128 billion nodes searched. Based on how fast your own search is, you can determine how long that would take and how much you care.

Also consider this study. It turns out that a chess search can actually tolerate surprisingly high error rates. The probability that any single random error actually affects the root node is small. Similar should be true for Arimaa.

https://www.cis.uab.edu/hyatt/collisions.html

So use zobrist hashing with 64 bits and don't worry about it, unless you're really paranoid, in which case 96 bits should be plenty.

Title: Re: Checking for repeated positions
Post by leon_messerschmidt on Jul 8th, 2014, 3:30am
Thanks lightvector that is a really good article.  Form the article:


Quote:
The numbers are surprising, but they clearly show that errors do not influence the search nearly as much as has always been suspected. Even with programs searching at one million nodes per second and beyond, the errors do not wreck the search and cause substantial problems. For example, in the tactical positions, there are just a few key positions that influence the final result. In a tree search space of 120 million nodes, a signature collision on one of such a small set of critical nodes is very unlikely. One point this certainly underlines is that past claims that programs search a large number of useless chess positions is clearly right on target. One can almost change scores at will and still have little adverse effect on the quality of the moves produced.


I will probably stop worrying about collisions altogether.  The probability of a 64bit collision is outlandishly small.  And even then the effect on the quality of play is minimal.

Title: Re: Checking for repeated positions
Post by rbarreira on Jul 8th, 2014, 5:02am

on 07/07/14 at 22:14:40, lightvector wrote:
So use zobrist hashing with 64 bits and don't worry about it, unless you're really paranoid, in which case 96 bits should be plenty.


And then you'd probably be hurting yourself with the paranoia, given the extra overhead of maintaining and comparing 96 bit hashes, and the lost RAM/cache space from storing bigger hashes in the TT (both of which will have an easily measurable impact on your bot, unlike the impact of the collisions).

Title: Re: Checking for repeated positions
Post by leon_messerschmidt on Jul 8th, 2014, 6:50am

on 07/08/14 at 05:02:33, rbarreira wrote:
And then you'd probably be hurting yourself with the paranoia, given the extra overhead of maintaining and comparing 96 bit hashes, and the lost RAM/cache space from storing bigger hashes in the TT (both of which will have an easily measurable impact on your bot, unlike the impact of the collisions).


And yet... Terry Pratchett said one-in-a-million chances come up nine times out of ten ;D

Title: Re: Checking for repeated positions
Post by ddyer on Oct 3rd, 2014, 1:25pm
You only need to check for repeated positions rigorously when generating moves at the top level.

At deeper levels, the occasional occurrence of a duplicate position won't matter.

In the rare case that the position is locked into a loop that is
well motivated, the top level check against repetitions will remove
the illegal-but-desirable move, so the second best choice will be
made automatically.



Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB © 2000-2003. All Rights Reserved.