Author |
Topic: Multi-Threading strategy (Read 3047 times) |
|
leon_messerschmidt
Forum Senior Member
Arimaa player #6344
Gender:
Posts: 26
|
|
Multi-Threading strategy
« on: May 24th, 2014, 10:57pm » |
Quote Modify
|
I'm wondering what other strategies other developers are following to make bots multi-threaded. I think I've more or less reached the limit of what I can squeeze out of a single thread. So, I tried two different approaches to make my search use more cores: 1. I divided the direct descendants of the root node between 4 threads (quad core). I tried to keep individual thread entirely isolated from the other threads so no sharing of alpha/beta values, etc. In practice some threads finish much quicker than others depending on how efficient the pruning works with the descendants that are allocated to them. Unfortunately, as a whole it performs worse than a single threaded search. 2. I based by search on an aspirational alpha/beta search which means I divide the alpha/beta windows between threads. This sort-of works but it is very sensitive to the allocation of alpha/beta values and can under some circumstances be slower than a single threaded implementation. I'm wondering what approaches other bots are taking to run multi-threaded. Anyone willing to share?
|
|
IP Logged |
|
|
|
rbarreira
Forum Guru
Arimaa player #1621
Gender:
Posts: 605
|
|
Re: Multi-Threading strategy
« Reply #1 on: May 25th, 2014, 3:48am » |
Quote Modify
|
I wrote a bit about different ways of searching in parallel for Arimaa in this thread: http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=devTalk;action=display ;num=1385006477 It's easier to do efficient parallel search for Arimaa than chess due to the huge amount of moves to search. The most important thing is to use YBW (young brothers wait) - that means don't start searching other moves before the first move has been searched. You need a score from the first move in order to do zero window (or aspiration) searches for the other moves.
|
« Last Edit: May 25th, 2014, 4:02am by rbarreira » |
IP Logged |
|
|
|
leon_messerschmidt
Forum Senior Member
Arimaa player #6344
Gender:
Posts: 26
|
|
Re: Multi-Threading strategy
« Reply #2 on: May 25th, 2014, 3:53pm » |
Quote Modify
|
Thank you for the answer. I read though your older thread and that has a lot lot of useful information. I modified my multi-threaded implementation to use YBW, but it was still slower than the single threaded version. So I made only one worker thread, which I reasoned should be identical to the single threaded implementation. Unfortunately is wasn't, which means i probably did something stupid when I went from single threaded code to multi-threaded code
|
|
IP Logged |
|
|
|
rbarreira
Forum Guru
Arimaa player #1621
Gender:
Posts: 605
|
|
Re: Multi-Threading strategy
« Reply #3 on: May 26th, 2014, 2:02am » |
Quote Modify
|
That might be good news since it's easier to investigate single-threaded code. Do the chosen best moves still make sense? How much slower is it? Some logging can make it much easier to diagnose problems.
|
|
IP Logged |
|
|
|
leon_messerschmidt
Forum Senior Member
Arimaa player #6344
Gender:
Posts: 26
|
|
Re: Multi-Threading strategy
« Reply #4 on: May 26th, 2014, 3:23am » |
Quote Modify
|
Thank you for the advice. I certainly hope it becomes easier to diagnose when I go to a single worker. It is about 30% slower; which is a big slowdown. The moves still makes sense. I've got a reasonable number of test positions, including a few of the puzzles listed on the arimaa site, that run with every build. My single threaded implementation has gone through a few rounds of optimisation so it isn't always that obvious what is going on. I'll do some valgrind this evening.
|
|
IP Logged |
|
|
|
leon_messerschmidt
Forum Senior Member
Arimaa player #6344
Gender:
Posts: 26
|
|
Re: Multi-Threading strategy
« Reply #5 on: May 28th, 2014, 3:49am » |
Quote Modify
|
The problem seems kinda strange. If I go with smaller searches things seem okay but when I go deeper or more complex positions times fall apart for multi-threaded implementations. Valgrind tells me there is an inordinate amount of time spent in malloc/free. I'm a bit out of my depth at this stage. Could only guess that I might be running into some sort of memory fragmentation problem? Or contention on mutexes? Either way I may need to rethink the number of calls I do to malloc.
|
|
IP Logged |
|
|
|
rbarreira
Forum Guru
Arimaa player #1621
Gender:
Posts: 605
|
|
Re: Multi-Threading strategy
« Reply #6 on: May 28th, 2014, 3:49pm » |
Quote Modify
|
on May 28th, 2014, 3:49am, leon_messerschmidt wrote:The problem seems kinda strange. If I go with smaller searches things seem okay but when I go deeper or more complex positions times fall apart for multi-threaded implementations. Valgrind tells me there is an inordinate amount of time spent in malloc/free. I'm a bit out of my depth at this stage. Could only guess that I might be running into some sort of memory fragmentation problem? Or contention on mutexes? Either way I may need to rethink the number of calls I do to malloc. |
| You should definitely avoid doing malloc during searches. You should be able to get by with statically allocated buffers, local variables and memory allocated at the beginning of the execution.
|
« Last Edit: May 28th, 2014, 3:50pm by rbarreira » |
IP Logged |
|
|
|
leon_messerschmidt
Forum Senior Member
Arimaa player #6344
Gender:
Posts: 26
|
|
Re: Multi-Threading strategy
« Reply #7 on: May 28th, 2014, 3:57pm » |
Quote Modify
|
on May 28th, 2014, 3:49pm, rbarreira wrote: You should definitely avoid doing malloc during searches. You should be able to get by with statically allocated buffers, local variables and memory allocated at the beginning of the execution. |
| It sounds so simple when you say it like that
|
|
IP Logged |
|
|
|
rbarreira
Forum Guru
Arimaa player #1621
Gender:
Posts: 605
|
|
Re: Multi-Threading strategy
« Reply #8 on: May 29th, 2014, 10:24am » |
Quote Modify
|
on May 28th, 2014, 3:57pm, leon_messerschmidt wrote: It sounds so simple when you say it like that |
| Heh The good thing with an alphabeta search is that the number of threads is known, so you can pre-allocate all the memory needed. Feel free to ask if there's any specific thing that makes that difficult and maybe we can discuss some ideas to solve that.
|
|
IP Logged |
|
|
|
leon_messerschmidt
Forum Senior Member
Arimaa player #6344
Gender:
Posts: 26
|
|
Re: Multi-Threading strategy
« Reply #9 on: May 30th, 2014, 3:50am » |
Quote Modify
|
on May 29th, 2014, 10:24am, rbarreira wrote: Heh The good thing with an alphabeta search is that the number of threads is known, so you can pre-allocate all the memory needed. Feel free to ask if there's any specific thing that makes that difficult and maybe we can discuss some ideas to solve that. |
| Thank you for the encouragement. I've started refactoring some code to get rid of some the most obvious malloc/free calls. One case that is a bit more interesting is my hash table implementation. I use the hash to select a bucket where each bucket is a linked list. This seem like a hard thing to allocate memory ahead of time for. I tried to switch to a sorted array (with binary search) for which I could allocate a big block of memory. But because I need to move memory around to insert items in the right positions, it is significantly slower than a hash table.
|
|
IP Logged |
|
|
|
rbarreira
Forum Guru
Arimaa player #1621
Gender:
Posts: 605
|
|
Re: Multi-Threading strategy
« Reply #10 on: May 30th, 2014, 7:00am » |
Quote Modify
|
on May 30th, 2014, 3:50am, leon_messerschmidt wrote: Thank you for the encouragement. I've started refactoring some code to get rid of some the most obvious malloc/free calls. One case that is a bit more interesting is my hash table implementation. I use the hash to select a bucket where each bucket is a linked list. This seem like a hard thing to allocate memory ahead of time for. I tried to switch to a sorted array (with binary search) for which I could allocate a big block of memory. But because I need to move memory around to insert items in the right positions, it is significantly slower than a hash table. |
| I recommend using a fixed bucket size instead of using separately allocated buckets. I use buckets with a fixed size of 4 slots, and overwrite one of the old entries when the bucket is full.
|
« Last Edit: May 30th, 2014, 7:02am by rbarreira » |
IP Logged |
|
|
|
leon_messerschmidt
Forum Senior Member
Arimaa player #6344
Gender:
Posts: 26
|
|
Re: Multi-Threading strategy
« Reply #11 on: May 31st, 2014, 12:36am » |
Quote Modify
|
on May 30th, 2014, 7:00am, rbarreira wrote: I recommend using a fixed bucket size instead of using separately allocated buckets. I use buckets with a fixed size of 4 slots, and overwrite one of the old entries when the bucket is full. |
| Okay that might work. Although at the moment some parts of my code make the assumption that the hash table contains a full set of steps so I'll need to rework some things. I also did another Valgrind and there are some other things that will give me more bang for buck. For example, I separated the concepts of a Move and a Step. If I collapse those back into one struct I get to remove a large number of calls to malloc.
|
|
IP Logged |
|
|
|
leon_messerschmidt
Forum Senior Member
Arimaa player #6344
Gender:
Posts: 26
|
|
Re: Multi-Threading strategy
« Reply #12 on: May 31st, 2014, 7:43pm » |
Quote Modify
|
I probably got rid of about a third of the calls to malloc. I've gone from two threads being substantially slower than a single thread to being about 20% faster! Yay Not quite where I was hoping to get to but at least it a step in the right direction. Now to tackle that hash table implementation... Thank you, rbarreira, for your insights.
|
|
IP Logged |
|
|
|
|