Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Bot Development >> Multi-Threading strategy
(Message started by: leon_messerschmidt on May 24th, 2014, 10:57pm)

Title: Multi-Threading strategy
Post by leon_messerschmidt on May 24th, 2014, 10:57pm
I'm wondering what other strategies other developers are following to make bots multi-threaded.  I think I've more or less reached the limit of what I can squeeze out of a single thread.  So, I tried two different approaches to make my search use more cores:

1.  I divided the direct descendants of the root node between 4 threads (quad core).  I tried to keep individual thread entirely isolated from the other threads so no sharing of alpha/beta values, etc.  In practice some threads finish much quicker than others depending on how efficient the pruning works with the descendants that are allocated to them.  Unfortunately, as a whole it performs worse than a single threaded search.

2.  I based by search on an aspirational alpha/beta search which means I divide the alpha/beta windows between threads.  This sort-of works but it is very sensitive to the allocation of alpha/beta values and can under some circumstances be slower than a single threaded implementation.

I'm wondering what approaches other bots are taking to run multi-threaded.  Anyone willing to share?

Title: Re: Multi-Threading strategy
Post by rbarreira on May 25th, 2014, 3:48am
I wrote a bit about different ways of searching in parallel for Arimaa in this thread:

http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=devTalk;action=display;num=1385006477

It's easier to do efficient parallel search for Arimaa than chess due to the huge amount of moves to search.

The most important thing is to use YBW (young brothers wait) - that means don't start searching other moves before the first move has been searched. You need a score from the first move in order to do zero window (or aspiration) searches for the other moves.

Title: Re: Multi-Threading strategy
Post by leon_messerschmidt on May 25th, 2014, 3:53pm
Thank you for the answer.  I read though your older thread and that has a lot lot of useful information.

I modified my multi-threaded implementation to use YBW, but it was still slower than the single threaded version.

So I made only one worker thread, which I reasoned should be identical to the single threaded implementation.  Unfortunately is wasn't, which means i probably did something stupid when I went from single threaded code to multi-threaded code  :-[

Title: Re: Multi-Threading strategy
Post by rbarreira on May 26th, 2014, 2:02am
That might be good news since it's easier to investigate single-threaded code.

Do the chosen best moves still make sense? How much slower is it?

Some logging can make it much easier to diagnose problems.

Title: Re: Multi-Threading strategy
Post by leon_messerschmidt on May 26th, 2014, 3:23am
Thank you for the advice.  I certainly hope it becomes easier to diagnose when I go to a single worker.  It is about 30% slower; which is a big slowdown.

The moves still makes sense.  I've got a reasonable number of test positions, including a few of the puzzles listed on the arimaa site, that run with every build.

My single threaded implementation has gone through a few rounds of optimisation so it isn't always that obvious what is going on.  I'll do some valgrind this evening.

Title: Re: Multi-Threading strategy
Post by leon_messerschmidt on May 28th, 2014, 3:49am
The problem seems kinda strange.  If I go with smaller searches things seem okay but when I go deeper or more complex positions times fall apart for multi-threaded implementations.

Valgrind tells me there is an inordinate amount of time spent in malloc/free.  I'm a bit out of my depth at this stage.  Could only guess that I might be running into some sort of memory fragmentation problem?  Or contention on mutexes?  

Either way I may need to rethink the number of calls I do to malloc.

Title: Re: Multi-Threading strategy
Post by rbarreira on May 28th, 2014, 3:49pm

on 05/28/14 at 03:49:10, leon_messerschmidt wrote:
The problem seems kinda strange. If I go with smaller searches things seem okay but when I go deeper or more complex positions times fall apart for multi-threaded implementations.

Valgrind tells me there is an inordinate amount of time spent in malloc/free. I'm a bit out of my depth at this stage. Could only guess that I might be running into some sort of memory fragmentation problem? Or contention on mutexes?

Either way I may need to rethink the number of calls I do to malloc.


You should definitely avoid doing malloc during searches. You should be able to get by with statically allocated buffers, local variables and memory allocated at the beginning of the execution.

Title: Re: Multi-Threading strategy
Post by leon_messerschmidt on May 28th, 2014, 3:57pm

on 05/28/14 at 15:49:39, rbarreira wrote:
You should definitely avoid doing malloc during searches. You should be able to get by with statically allocated buffers, local variables and memory allocated at the beginning of the execution.


It sounds so simple when you say it like that  :D

Title: Re: Multi-Threading strategy
Post by rbarreira on May 29th, 2014, 10:24am

on 05/28/14 at 15:57:58, leon_messerschmidt wrote:
It sounds so simple when you say it like that :D


Heh :)

The good thing with an alphabeta search is that the number of threads is known, so you can pre-allocate all the memory needed. Feel free to ask if there's any specific thing that makes that difficult and maybe we can discuss some ideas to solve that.

Title: Re: Multi-Threading strategy
Post by leon_messerschmidt on May 30th, 2014, 3:50am

on 05/29/14 at 10:24:18, rbarreira wrote:
Heh :)

The good thing with an alphabeta search is that the number of threads is known, so you can pre-allocate all the memory needed. Feel free to ask if there's any specific thing that makes that difficult and maybe we can discuss some ideas to solve that.


Thank you for the encouragement.  I've started refactoring some code to get rid of some the most obvious malloc/free calls.

One case that is a bit more interesting is my hash table implementation.  I use the hash to select a bucket where each bucket is a linked list.  This seem like a hard thing to allocate memory ahead of time for.

I tried to switch to a sorted array (with binary search) for which I could allocate a big block of memory.  But because I need to move memory around to insert items in the right positions, it is significantly slower than a hash table.

Title: Re: Multi-Threading strategy
Post by rbarreira on May 30th, 2014, 7:00am

on 05/30/14 at 03:50:34, leon_messerschmidt wrote:
Thank you for the encouragement. I've started refactoring some code to get rid of some the most obvious malloc/free calls.

One case that is a bit more interesting is my hash table implementation. I use the hash to select a bucket where each bucket is a linked list. This seem like a hard thing to allocate memory ahead of time for.

I tried to switch to a sorted array (with binary search) for which I could allocate a big block of memory. But because I need to move memory around to insert items in the right positions, it is significantly slower than a hash table.


I recommend using a fixed bucket size instead of using separately allocated buckets. I use buckets with a fixed size of 4 slots, and overwrite one of the old entries when the bucket is full.

Title: Re: Multi-Threading strategy
Post by leon_messerschmidt on May 31st, 2014, 12:36am

on 05/30/14 at 07:00:02, rbarreira wrote:
I recommend using a fixed bucket size instead of using separately allocated buckets. I use buckets with a fixed size of 4 slots, and overwrite one of the old entries when the bucket is full.


Okay that might work.  Although at the moment some parts of my code make the assumption that the hash table contains a full set of steps so I'll need to rework some things.

I also did another Valgrind and there are some other things that will give me more bang for buck.  For example, I separated the concepts of a Move and a Step.  If I collapse those back into one struct I get to remove a large number of calls to malloc.

Title: Re: Multi-Threading strategy
Post by leon_messerschmidt on May 31st, 2014, 7:43pm
I probably got rid of about a third of the calls to malloc.  I've gone from two threads being substantially slower than a single thread to being about 20% faster!  Yay  ;D

Not quite where I was hoping to get to but at least it a step in the right direction.  Now to tackle that hash table implementation...

Thank you, rbarreira, for your insights.  



Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB 2000-2003. All Rights Reserved.