Welcome, Guest. Please Login or Register.
Sep 23rd, 2019, 6:27am

Home Home Help Help Search Search Members Members Login Login Register Register
Arimaa Forum « Multiple core and non-locking transposition tables »


   Arimaa Forum
   Arimaa
   Bot Development
(Moderator: supersamu)
   Multiple core and non-locking transposition tables
« Previous topic | Next topic »
Pages: 1 2  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print
   Author  Topic: Multiple core and non-locking transposition tables  (Read 3272 times)
Rednaxela
Forum Senior Member
****



Arimaa player #4674

   


Gender: male
Posts: 34
Multiple core and non-locking transposition tables
« on: Aug 29th, 2010, 12:14am »
Quote Quote Modify Modify

So, I saw a thread about using multi-core before but the scaling I heard of was not that great. Since they've been a while ago I won't bump that (if I should actually bump in the future, just let me know, sorry).
 
One thing I've wondered is if syncronization for the transposition table had much to do with the scaling issues, so I ended up finding some non-locking hashtables/maps that are threading safe via use of compare-and-swap instructions. See http://sourceforge.net/projects/high-scale-lib/ for Java and http://code.google.com/p/nbds/ for C. I'm thinking they could be modified to become a nice fixed-size transposition table pretty easily, and be very optimal for parallel use. The Java lock-free hashtable I linked apparently even scales near-linearly on a crazy 768 core beast that the it's author's employer makes.
 
I don't think we'll be running an Arimaa bot on anything quite that big any time soon, but I still think that a non-locking transposition table would be a good thing. Has anyone with a parallel bot measured how much of their scaling limits are due to overhead of locking for the transposition table? Any thoughts about the value or lack thereof in a non-locking one?
« Last Edit: Aug 29th, 2010, 12:18am by Rednaxela » IP Logged
tize
Forum Guru
*****



Arimaa player #3121

   


Gender: male
Posts: 118
Re: Multiple core and non-locking transposition ta
« Reply #1 on: Aug 29th, 2010, 3:34am »
Quote Quote Modify Modify

I would think that most bots that are parallel reads and writes to the hash table as "normal", and just ignores the problem with multiple threads reading and writing to the same hash. See http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=devTalk;action=display ;num=1205320786 for why the locking is ignored.
IP Logged
rbarreira
Forum Guru
*****



Arimaa player #1621

   


Gender: male
Posts: 605
Re: Multiple core and non-locking transposition ta
« Reply #2 on: Aug 29th, 2010, 6:21am »
Quote Quote Modify Modify

I had problems when I was using a plain non-locking hash table as soon as I moved from a dual core to a hexa core.
 
For me, the solution was to use Hyatt's lockless hashing:
 
http://www.cis.uab.edu/hyatt/hashing.html
 
It is very simple to implement (just a handful of lines of code changed for me), and it has almost zero overhead compared to the plain hashing.
 
I'm sure a more fancy and "correct" solution is possible, but with more overhead to ensure 0% data loss (which isn't anywhere near a must with a transposition table).
IP Logged
Rednaxela
Forum Senior Member
****



Arimaa player #4674

   


Gender: male
Posts: 34
Re: Multiple core and non-locking transposition ta
« Reply #3 on: Aug 29th, 2010, 6:56am »
Quote Quote Modify Modify

on Aug 29th, 2010, 6:21am, rbarreira wrote:

http://www.cis.uab.edu/hyatt/hashing.html
 
It is very simple to implement (just a handful of lines of code changed for me), and it has almost zero overhead compared to the plain hashing.
 
I'm sure a more fancy and "correct" solution is possible, but with more overhead to ensure 0% data loss (which isn't anywhere near a must with a transposition table).

Ahh very interesting link there. It's simplicity looks pretty elegant.
 
As far as overhead, while I haven't benchmarked  it yet, the lockless hash tables I linked are supposed to have no data loss and also not have any overhead. If it benchmarks favorably it's probably what I'll use as a basis for a transposition table I think. I imagine the difference is fairly slim indeed,  but as long as there's no overhead, why not.
IP Logged
doublep
Forum Guru
*****



Badger author

   


Gender: male
Posts: 82
Re: Multiple core and non-locking transposition ta
« Reply #4 on: Aug 29th, 2010, 10:15am »
Quote Quote Modify Modify

Plain non-locking hashing is really not an option, because it can yield not just a wrong move/score when two threads read/write the same entry, but something illegal and crash your program.  Badger currently uses spin-locks.  I need to look closer at that article, though it's very hard to read for some reason (or maybe I'm too stupid for it).
IP Logged
speek
Forum Guru
*****



Arimaa player #5441

   


Gender: male
Posts: 75
Re: Multiple core and non-locking transposition ta
« Reply #5 on: Aug 29th, 2010, 1:34pm »
Quote Quote Modify Modify

How much help are transposition tables anyway in Arimaa?
IP Logged
jdb
Forum Guru
*****



Arimaa player #214

   


Gender: male
Posts: 682
Re: Multiple core and non-locking transposition ta
« Reply #6 on: Aug 29th, 2010, 2:06pm »
Quote Quote Modify Modify

on Aug 29th, 2010, 1:34pm, speek wrote:
How much help are transposition tables anyway in Arimaa?

 
 
A very big help.   Consider Ra1n Rb1n or Rb1n Ra1n. The same board position is reached after two steps. This happens all the time in the search.
IP Logged
speek
Forum Guru
*****



Arimaa player #5441

   


Gender: male
Posts: 75
Re: Multiple core and non-locking transposition ta
« Reply #7 on: Aug 29th, 2010, 2:23pm »
Quote Quote Modify Modify

Well, that's different - that's something you have to do to find all the unique legal moves from a given position, but it only applies to that one ply.
 
I thought transpositional tables were about collapsing the search tree to only unique positions and evaluating each unique position only once.
 
However, given an engine that correctly produces unique legal moves per ply, how much advantage is there to transpositional tables? If I search all my moves, and all your responses to all my moves, how many duplicate positions occur and how much time does it take dealing with them vs managing a transpositional table that is probably hundreds of MB in size?  I'm already storing the game tree which is hundreds of MB in size, and I only get 500 to work with.
 
Currently, I am not keeping a transpositional table.  At the start of the game, searching a full 2-ply takes less than 3 seconds.  I know that when I am generating moves, managing the hashtable to prevent duplicate moves being generated is my biggest use of cpu time.  So, I have doubts on the value of the transpositional table outside that use.
IP Logged
Rednaxela
Forum Senior Member
****



Arimaa player #4674

   


Gender: male
Posts: 34
Re: Multiple core and non-locking transposition ta
« Reply #8 on: Aug 29th, 2010, 4:52pm »
Quote Quote Modify Modify

on Aug 29th, 2010, 10:15am, doublep wrote:
Plain non-locking hashing is really not an option, because it can yield not just a wrong move/score when two threads read/write the same entry, but something illegal and crash your program.  Badger currently uses spin-locks.  I need to look closer at that article, though it's very hard to read for some reason (or maybe I'm too stupid for it).

If I'm interpreting it correctly, the one in the article that  rbarreira links appears to get around the issue, by 1) assuming a write of the key is atomic and that a write of the data is atomic (i.e. fits in a 64 bit long or is a pointer), the 2) XORing the key over the data, such that if the key/data are mismatched the error will most likely be detected and subsequently ignored. If you have spare bits in a 64 bit data value, I imagine you could probably pack a checksum into  it as well to further increase the chance of detecting the bad writes. Essentially the idea is that instead of getting getting undetectably corrupt data, you on rare occasional lose data, which for a transposition table is tolerable.
 
The high-scale-lib that I linked gets around the issue by a different approach. It appears to make use of the "compare and set" instructions to manage to operate with neither locks nor occasional lost data. There might be limitations though that I'm not yet aware of. I'll have to look closer. I still don't fully understand exactly how it's achieving it.
 
Here's another possible approach for scalability that just came to mind here thanks to you doublep. I recently read that Java's ConcurrentHashMap class deals with concurrency by having 16 separate locks for different parts of the array backing it. Perhaps this partitioned-locking approach could be used with spinlocks as used in Badger, to have a simple way to reduce how often things run into a lock? Just a thought.
 
on Aug 29th, 2010, 2:23pm, speek wrote:
Well, that's different - that's something you have to do to find all the unique legal moves from a given position, but it only applies to that one ply.
 
I thought transpositional tables were about collapsing the search tree to only unique positions and evaluating each unique position only once.
 
However, given an engine that correctly produces unique legal moves per ply, how much advantage is there to transpositional tables? If I search all my moves, and all your responses to all my moves, how many duplicate positions occur and how much time does it take dealing with them vs managing a transpositional table that is probably hundreds of MB in size?  I'm already storing the game tree which is hundreds of MB in size, and I only get 500 to work with.
Well, what you're doing with your hash tables to check the uniqueness of moves is very similar to what the transposition tables do, but transposition tables do a little more at the same time. As I see it, the differences are:
1) They stay fixed size so they don't need to be re-allocated or cleared
2) They take the 'lazy' way out with regards to collisions, just storing one value or the other usually (some can try alternative indexes but they don't put try to remember everthing permanently). This is faster and perfect recall is not necessary for the transposition table's performance in the big picture.
3) They're remembered for future turns as well, used as appropriately without needing special handling
4) They usually remember the next principal variation step that was last found at that position. When doing deeper searches, it is faster to check the principal variation first, because this often leads to to alpha-beta pruning making a cutoff quicker.
« Last Edit: Aug 29th, 2010, 4:54pm by Rednaxela » IP Logged
speek
Forum Guru
*****



Arimaa player #5441

   


Gender: male
Posts: 75
Re: Multiple core and non-locking transposition ta
« Reply #9 on: Aug 29th, 2010, 5:33pm »
Quote Quote Modify Modify

on Aug 29th, 2010, 4:52pm, rednaxela wrote:

Well, what you're doing with your hash tables to check the uniqueness of moves is very similar to what the transposition tables do, but transposition tables do a little more at the same time. As I see it, the differences are:
1) They stay fixed size so they don't need to be re-allocated or cleared
2) They take the 'lazy' way out with regards to collisions, just storing one value or the other usually (some can try alternative indexes but they don't put try to remember everthing permanently). This is faster and perfect recall is not necessary for the transposition table's performance in the big picture.
3) They're remembered for future turns as well, used as appropriately without needing special handling
4) They usually remember the next principal variation step that was last found at that position. When doing deeper searches, it is faster to check the principal variation first, because this often leads to to alpha-beta pruning making a cutoff quicker.

 
I understand this, I'm just dubious of it's value for Arimaa.  Firstly, these differences mean you can't use it to generate a correct collection of unique moves for a given position - that can't be a fuzzy operation.  It also does have to be cleared (though not necessarily reallocated) each time.  Given that, you'd have to have two such datastores.  As I said, I already spend a plurality of cpu time doing hashes for discarding duplicate moves - I'm doubtful that storing duplicate  positions between plys for the sake of forgoing evaluation of the position - I'm doubting this is all that useful.  Primarily I doubt it because we're not going terribly deep into the Arimaa tree.  It's not like chess where 10 ply will get you a lot of transpositions, yet not so many that we're straining memory capacities.
 
I would like to see empirical evidence of improvement strictly from the use of such tables.
IP Logged
Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: Multiple core and non-locking transposition ta
« Reply #10 on: Aug 29th, 2010, 5:56pm »
Quote Quote Modify Modify

Speek, if you are throwing out duplicate moves and only achieving shallow searches, I rather doubt transposition tables will help you much.  At two ply the help should be almost zero; the only possible duplication is from dislodging opposing pieces.  At three ply, however, consider the worst-case scenario of your opponent making a fixed move while your two moves consisted of eight independent steps that don't interact with each other or the opponent's move.  In that case, you can reach the same position in 8!/(4!4!) = 70 different ways in spite of duplicate moves being thrown out.  Your evaluation may be cheap, but do you want to blow it up by a factor of 70?  Of course, eight independent steps will be very rare, and if you do haizhi-pruning of moves, your maximum different ways to reach each position is only 2*4*3/2 = 12.  Still significant, it seems to me.
 
I understand that you don't want to waste time on duplicate-position detection after having already spent so much time on duplicate-move detection, but why not get rid of the duplicate-move detection and let the duplicate-position detection do both jobs?
IP Logged

rbarreira
Forum Guru
*****



Arimaa player #1621

   


Gender: male
Posts: 605
Re: Multiple core and non-locking transposition ta
« Reply #11 on: Aug 29th, 2010, 6:16pm »
Quote Quote Modify Modify

on Aug 29th, 2010, 4:52pm, rednaxela wrote:
2) XORing the key over the data, such that if the key/data are mismatched the error will most likely be detected and subsequently ignored.

 
I'm not sure if that's what you mean, but I think the better way is to xor the key and the data to get the key which will be stored. This way the data is always readable, which is good since it needs to be readable for the corrupted entry to be replaced with the usual checks.
 
on Aug 29th, 2010, 2:23pm, speek wrote:
I know that when I am generating moves, managing the hashtable to prevent duplicate moves being generated is my biggest use of cpu time.  So, I have doubts on the value of the transpositional table outside that use.

 
If the hashing is such a big bottleneck, that could indicate that your approach to get non-unique moves is not so efficient. In my bot, accesses to the transposition table take up about 10% of the running time.
IP Logged
speek
Forum Guru
*****



Arimaa player #5441

   


Gender: male
Posts: 75
Re: Multiple core and non-locking transposition ta
« Reply #12 on: Aug 29th, 2010, 6:26pm »
Quote Quote Modify Modify

Well, I don't understand your example of an opponent making a "fixed" move (what is that?) and 8 independent steps.  The first 4 independent steps resulting in position A will be in my set of unique moves once.  Then, the opponent has a "fixed" move.  Then I do 4 more moves independent of each other and the one move I started with.  How am I duplicating positions?  I think without getting to the 4th ply, I'm not duplicating positions without dislodging pieces, but please help me understand better.
 
As for using the same facility for both, I think that'd be great, but I'm frankly unsure of how it would work.  The one is checking positions at every step, whereas the other only cares about end-of-ply positions.  Necessarily, all the positions I've recorded for making unique moves have to be discarded to allow for new sets of unique moves next ply.  If I save them, then I need a different storage for generating new moves, which means I have two storage locations anyway.
 
The thing is, I can see that saving position evaluations would save a lot of time - especially as sometimes an evaluation has come as a result of a cascade via min-max through multiple ply that have been checked.  So, some of those evaluations are pretty thorough.  However, doing this comes at the cost of a lot of uselessly stored positions that the engine will never care about anyway, a lot of memory.  I'm already having trouble getting below 500MB because my game tree is currently using 160MB by itself.
IP Logged
rbarreira
Forum Guru
*****



Arimaa player #1621

   


Gender: male
Posts: 605
Re: Multiple core and non-locking transposition ta
« Reply #13 on: Aug 29th, 2010, 6:30pm »
Quote Quote Modify Modify

Your bot can use more than 500 MB, the worst case is that if omar thinks it's too much he won't make your bot available for play after the computer tournament.
 
IMO you really shouldn't let a 500 MB limit interfere with what you want to do in your bot...
« Last Edit: Aug 29th, 2010, 6:30pm by rbarreira » IP Logged
speek
Forum Guru
*****



Arimaa player #5441

   


Gender: male
Posts: 75
Re: Multiple core and non-locking transposition ta
« Reply #14 on: Aug 29th, 2010, 6:30pm »
Quote Quote Modify Modify

on Aug 29th, 2010, 6:16pm, rbarreira wrote:

If the hashing is such a big bottleneck, that could indicate that your approach to get non-unique moves is not so efficient. In my bot, accesses to the transposition table take up about 10% of the running time.

 
Well, 10% is a hefty percentage for one low-level operation like doing a put to a hash table.  It would certainly be the kind of cost I'm talking about.
IP Logged
Pages: 1 2  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print

« Previous topic | Next topic »

Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB © 2000-2003. All Rights Reserved.