Welcome, Guest. Please Login or Register.
Apr 20th, 2024, 2:35am

Home Home Help Help Search Search Members Members Login Login Register Register
Arimaa Forum « Multi-Threading strategy »


   Arimaa Forum
   Arimaa
   Bot Development
(Moderator: supersamu)
   Multi-Threading strategy
« Previous topic | Next topic »
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print
   Author  Topic: Multi-Threading strategy  (Read 2958 times)
leon_messerschmidt
Forum Senior Member
****



Arimaa player #6344

   


Gender: male
Posts: 26
Multi-Threading strategy
« on: May 24th, 2014, 10:57pm »
Quote Quote Modify Modify

I'm wondering what other strategies other developers are following to make bots multi-threaded.  I think I've more or less reached the limit of what I can squeeze out of a single thread.  So, I tried two different approaches to make my search use more cores:
 
1.  I divided the direct descendants of the root node between 4 threads (quad core).  I tried to keep individual thread entirely isolated from the other threads so no sharing of alpha/beta values, etc.  In practice some threads finish much quicker than others depending on how efficient the pruning works with the descendants that are allocated to them.  Unfortunately, as a whole it performs worse than a single threaded search.
 
2.  I based by search on an aspirational alpha/beta search which means I divide the alpha/beta windows between threads.  This sort-of works but it is very sensitive to the allocation of alpha/beta values and can under some circumstances be slower than a single threaded implementation.
 
I'm wondering what approaches other bots are taking to run multi-threaded.  Anyone willing to share?
IP Logged
rbarreira
Forum Guru
*****



Arimaa player #1621

   


Gender: male
Posts: 605
Re: Multi-Threading strategy
« Reply #1 on: May 25th, 2014, 3:48am »
Quote Quote Modify Modify

I wrote a bit about different ways of searching in parallel for Arimaa in this thread:
 
http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=devTalk;action=display ;num=1385006477
 
It's easier to do efficient parallel search for Arimaa than chess due to the huge amount of moves to search.
 
The most important thing is to use YBW (young brothers wait) - that means don't start searching other moves before the first move has been searched. You need a score from the first move in order to do zero window (or aspiration) searches for the other moves.
« Last Edit: May 25th, 2014, 4:02am by rbarreira » IP Logged
leon_messerschmidt
Forum Senior Member
****



Arimaa player #6344

   


Gender: male
Posts: 26
Re: Multi-Threading strategy
« Reply #2 on: May 25th, 2014, 3:53pm »
Quote Quote Modify Modify

Thank you for the answer.  I read though your older thread and that has a lot lot of useful information.
 
I modified my multi-threaded implementation to use YBW, but it was still slower than the single threaded version.
 
So I made only one worker thread, which I reasoned should be identical to the single threaded implementation.  Unfortunately is wasn't, which means i probably did something stupid when I went from single threaded code to multi-threaded code  Embarassed
IP Logged
rbarreira
Forum Guru
*****



Arimaa player #1621

   


Gender: male
Posts: 605
Re: Multi-Threading strategy
« Reply #3 on: May 26th, 2014, 2:02am »
Quote Quote Modify Modify

That might be good news since it's easier to investigate single-threaded code.  
 
Do the chosen best moves still make sense? How much slower is it?
 
Some logging can make it much easier to diagnose problems.
IP Logged
leon_messerschmidt
Forum Senior Member
****



Arimaa player #6344

   


Gender: male
Posts: 26
Re: Multi-Threading strategy
« Reply #4 on: May 26th, 2014, 3:23am »
Quote Quote Modify Modify

Thank you for the advice.  I certainly hope it becomes easier to diagnose when I go to a single worker.  It is about 30% slower; which is a big slowdown.
 
The moves still makes sense.  I've got a reasonable number of test positions, including a few of the puzzles listed on the arimaa site, that run with every build.
 
My single threaded implementation has gone through a few rounds of optimisation so it isn't always that obvious what is going on.  I'll do some valgrind this evening.
IP Logged
leon_messerschmidt
Forum Senior Member
****



Arimaa player #6344

   


Gender: male
Posts: 26
Re: Multi-Threading strategy
« Reply #5 on: May 28th, 2014, 3:49am »
Quote Quote Modify Modify

The problem seems kinda strange.  If I go with smaller searches things seem okay but when I go deeper or more complex positions times fall apart for multi-threaded implementations.
 
Valgrind tells me there is an inordinate amount of time spent in malloc/free.  I'm a bit out of my depth at this stage.  Could only guess that I might be running into some sort of memory fragmentation problem?  Or contention on mutexes?  
 
Either way I may need to rethink the number of calls I do to malloc.
IP Logged
rbarreira
Forum Guru
*****



Arimaa player #1621

   


Gender: male
Posts: 605
Re: Multi-Threading strategy
« Reply #6 on: May 28th, 2014, 3:49pm »
Quote Quote Modify Modify

on May 28th, 2014, 3:49am, leon_messerschmidt wrote:
The problem seems kinda strange.  If I go with smaller searches things seem okay but when I go deeper or more complex positions times fall apart for multi-threaded implementations.
 
Valgrind tells me there is an inordinate amount of time spent in malloc/free.  I'm a bit out of my depth at this stage.  Could only guess that I might be running into some sort of memory fragmentation problem?  Or contention on mutexes?  
 
Either way I may need to rethink the number of calls I do to malloc.

 
You should definitely avoid doing malloc during searches. You should be able to get by with statically allocated buffers, local variables and memory allocated at the beginning of the execution.
« Last Edit: May 28th, 2014, 3:50pm by rbarreira » IP Logged
leon_messerschmidt
Forum Senior Member
****



Arimaa player #6344

   


Gender: male
Posts: 26
Re: Multi-Threading strategy
« Reply #7 on: May 28th, 2014, 3:57pm »
Quote Quote Modify Modify

on May 28th, 2014, 3:49pm, rbarreira wrote:

 
You should definitely avoid doing malloc during searches. You should be able to get by with statically allocated buffers, local variables and memory allocated at the beginning of the execution.

 
It sounds so simple when you say it like that  Cheesy
IP Logged
rbarreira
Forum Guru
*****



Arimaa player #1621

   


Gender: male
Posts: 605
Re: Multi-Threading strategy
« Reply #8 on: May 29th, 2014, 10:24am »
Quote Quote Modify Modify

on May 28th, 2014, 3:57pm, leon_messerschmidt wrote:

 
It sounds so simple when you say it like that  Cheesy

 
Heh Smiley
 
The good thing with an alphabeta search is that the number of threads is known, so you can pre-allocate all the memory needed. Feel free to ask if there's any specific thing that makes that difficult and maybe we can discuss some ideas to solve that.
IP Logged
leon_messerschmidt
Forum Senior Member
****



Arimaa player #6344

   


Gender: male
Posts: 26
Re: Multi-Threading strategy
« Reply #9 on: May 30th, 2014, 3:50am »
Quote Quote Modify Modify

on May 29th, 2014, 10:24am, rbarreira wrote:

 
Heh Smiley
 
The good thing with an alphabeta search is that the number of threads is known, so you can pre-allocate all the memory needed. Feel free to ask if there's any specific thing that makes that difficult and maybe we can discuss some ideas to solve that.

 
Thank you for the encouragement.  I've started refactoring some code to get rid of some the most obvious malloc/free calls.
 
One case that is a bit more interesting is my hash table implementation.  I use the hash to select a bucket where each bucket is a linked list.  This seem like a hard thing to allocate memory ahead of time for.
 
I tried to switch to a sorted array (with binary search) for which I could allocate a big block of memory.  But because I need to move memory around to insert items in the right positions, it is significantly slower than a hash table.
IP Logged
rbarreira
Forum Guru
*****



Arimaa player #1621

   


Gender: male
Posts: 605
Re: Multi-Threading strategy
« Reply #10 on: May 30th, 2014, 7:00am »
Quote Quote Modify Modify

on May 30th, 2014, 3:50am, leon_messerschmidt wrote:

 
Thank you for the encouragement.  I've started refactoring some code to get rid of some the most obvious malloc/free calls.
 
One case that is a bit more interesting is my hash table implementation.  I use the hash to select a bucket where each bucket is a linked list.  This seem like a hard thing to allocate memory ahead of time for.
 
I tried to switch to a sorted array (with binary search) for which I could allocate a big block of memory.  But because I need to move memory around to insert items in the right positions, it is significantly slower than a hash table.

 
I recommend using a fixed bucket size instead of using separately allocated buckets. I use buckets with a fixed size of 4 slots, and overwrite one of the old entries when the bucket is full.
« Last Edit: May 30th, 2014, 7:02am by rbarreira » IP Logged
leon_messerschmidt
Forum Senior Member
****



Arimaa player #6344

   


Gender: male
Posts: 26
Re: Multi-Threading strategy
« Reply #11 on: May 31st, 2014, 12:36am »
Quote Quote Modify Modify

on May 30th, 2014, 7:00am, rbarreira wrote:

 
I recommend using a fixed bucket size instead of using separately allocated buckets. I use buckets with a fixed size of 4 slots, and overwrite one of the old entries when the bucket is full.

 
Okay that might work.  Although at the moment some parts of my code make the assumption that the hash table contains a full set of steps so I'll need to rework some things.
 
I also did another Valgrind and there are some other things that will give me more bang for buck.  For example, I separated the concepts of a Move and a Step.  If I collapse those back into one struct I get to remove a large number of calls to malloc.
IP Logged
leon_messerschmidt
Forum Senior Member
****



Arimaa player #6344

   


Gender: male
Posts: 26
Re: Multi-Threading strategy
« Reply #12 on: May 31st, 2014, 7:43pm »
Quote Quote Modify Modify

I probably got rid of about a third of the calls to malloc.  I've gone from two threads being substantially slower than a single thread to being about 20% faster!  Yay  Grin
 
Not quite where I was hoping to get to but at least it a step in the right direction.  Now to tackle that hash table implementation...  
 
Thank you, rbarreira, for your insights.
IP Logged
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print

« Previous topic | Next topic »

Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB © 2000-2003. All Rights Reserved.