Topic: Go starting to show cracks as an AI challenge?

omar
« Reply #1 on: May 10th, 2011, 7:31pm »

Wow, it sounds like Remi Coulom, working with his PhD student Aja, has hit upon some more breakthroughs that significantly improved the performance of Crazy Stone. Everyone seems to want to know what they are.

Fritzlein
« Reply #2 on: May 11th, 2011, 12:02am »

I love the way strong Go players, including professionals, routinely put their reputations on the line in man vs. machine games, both formal and informal. The attitude of the Go world puts the professional shogi association to shame. Shogi may be only a few years from falling to computers, or not, but we can't get an accurate reading because professionals are forbidden from publicly playing against computers. With Go, we can get a much clearer sense of the state of the art, and incidentally it is somewhere around one dan for tournament Go, for all the buzz bonobot's rating engendered.

Janzert
« Reply #3 on: May 11th, 2011, 11:33am »

I think the computers are probably now very close to, if not already at, 2 dan. Besides the Crazy Stone rating on KGS, Pachi has had some nice wins against high-ranked players (links to sgf files):

H7 over 9p*
H7 over 9d
H6 over 8d
H6 over 7d
H6 over 7d

Also, along with Bonobot's recent 5d, ManyFaces (2d), pachi2 (3d) and Zen19 (4d) are all now above 1d on KGS.

On the contrary side, though: in the same event in which Pachi won against the 9p, it also lost an H6 game against a 5p, and most of the KGS rankings have been attained primarily through blitz time controls.

I also think John Tromp was either very prescient or got very lucky and picked just about the perfect time limit for his Go bet.

Janzert

* Short report on event with the win against the 9p

Fritzlein
« Reply #4 on: May 11th, 2011, 4:43pm »

on May 11th, 2011, 11:33am, Janzert wrote:
    On the contrary side, though: in the same event in which Pachi won against the 9p, it also lost an H6 game against a 5p, and most of the KGS rankings have been attained primarily through blitz time controls.

You make the case against yourself well. The embarrassing computer losses don't make the headlines, but there are plenty of such losses to offset the surprising wins and the high server rankings at blitz speed.

An additional argument against the most notable machine victories is that Go ranks are traditionally based on handicap stones, which exaggerates differences between beginners and conceals differences between experts. At a high level a six-stone handicap is overwhelming, indicative of more than six ranks if ranks were calculated from winning percentages, as Elo ratings are. Matches without handicap, like the Tromp challenge, seem more telling to me than handicap matches.

Handicaps in Go are similar to handicaps in Arimaa: they count for more near the top of the scale. If two Arimaa beginners split 50-50 at camel handicap, the weaker player might be only 100 Elo behind at even games. In contrast, someone who is 50-50 against chessandgo given a camel handicap is probably 600 Elo behind chessandgo at even games (a small worked example appears at the end of this post).

Quote:
    I also think John Tromp was either very prescient or got very lucky and picked just about the perfect time limit for his Go bet.

This particular even match was a blowout at 4-0. Do you put the computer close to 2 dan because the games themselves were close despite the lopsided final score? Or are you saying it was close temporally, because computers are advancing so fast that Tromp would lose in another couple of years even if top computers are presently only 1 dan? Or did Tromp "get lucky" because he was no better than the computer but won anyway, just as one might flip four heads in a row with a fair coin?

I guess if I say "around shodan" and you say "around two dan" we aren't disagreeing that much. Our difference in perception is probably less than the difference between European shodan, American shodan, Japanese shodan, Korean shodan, etc.
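
For concreteness, here is the standard Elo expected-score formula behind those guesses, as a minimal Python sketch (the 100 and 600 Elo gaps are my assumptions from above, not measured values):

def expected_score(elo_gap):
    # Standard Elo logistic model: expected score for the player
    # who is elo_gap points stronger.
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))

# The camel handicap equalizes both matchups to 50-50; these are the
# corresponding even-game predictions.
for gap in (100, 600):
    print(f"{gap} Elo gap -> {expected_score(gap):.0%} for the stronger player")
# 100 Elo gap -> 64% for the stronger player
# 600 Elo gap -> 97% for the stronger player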

Janzert
« Reply #5 on: May 11th, 2011, 10:07pm »

on May 11th, 2011, 4:43pm, Fritzlein wrote:
    You make the case against yourself well. The embarrassing computer losses don't make the headlines, but there are plenty of such losses to offset the surprising wins and the high server rankings at blitz speed.

Well, without the pessimistic evidence you'd have to say the computers are at least 3 if not 4 dan already, since there are two bots with a 4d or higher rating and four with 3d or higher (I forgot to put EricaBot, at 3d, in the list earlier). Also, these ratings are built up cumulatively over hundreds if not thousands of games, so it's not as if the bots are just using an unexpected trick to get these wins.

Quote:
    An additional argument against the most notable machine victories is that Go ranks are traditionally based on handicap stones, which exaggerates differences between beginners and conceals differences between experts.

Even worse, professional rank is a lifetime achievement, not a measure of current ability: unlike, say, a top chess player, a Go professional never drops in rank. So yep, I'm in complete agreement that the Go ranking system is much coarser and less accurate than something like the chess Elo system (not that the chess system is without flaws either).

Quote:
    Matches without handicap, like the Tromp challenge, seem more telling to me than handicap matches.

Yep. Even though Go has a much better handicapping system than most games, even matches are certainly the best way to measure strength.

Quote:
    This particular even match was a blowout at 4-0. Do you put the computer close to 2 dan because the games themselves were close despite the lopsided final score? Or are you saying it was close temporally, because computers are advancing so fast that Tromp would lose in another couple of years even if top computers are presently only 1 dan? Or did Tromp "get lucky" because he was no better than the computer but won anyway, just as one might flip four heads in a row with a fair coin?

I meant that I think Tromp got the last year in which he had a better-than-even chance of winning the bet. Despite the overwhelming final score, the sentiment during the games themselves was that things were much closer; in at least one of the games the spectators were sure the computer had a completely won position before it blundered the win away.

Since December ManyFaces has gained about half a stone in rank on KGS. It's also a bit unfortunate that Zen couldn't be used for the tournament, as it was (and is) pretty widely considered stronger than ManyFaces; in fact, since December it has won every monthly KGS bot tournament (although it didn't participate in the extra "slow game" tournament). Also of note regarding the pinnacle strength of current bots: the bet restricted the hardware that could be used, and most (all?) of the top bots in competition can now use clusters and extract a nice little boost in strength from them.

Quote:
    I guess if I say "around shodan" and you say "around two dan" we aren't disagreeing that much. Our difference in perception is probably less than the difference between European shodan, American shodan, Japanese shodan, Korean shodan, etc.

Yeah, the difference in actual strength meant by a given rank, depending on where it is measured, is certainly another problem in discussing Go ranks. My personal frame of reference is the KGS server, which seems to be where most of the computer Go activity takes place. According to Sensei's Library rank comparisons, KGS 1-2d falls somewhere between Korean 2k and Japanese or AGA 3d. So, given that I'm American, I'll now declare that computer Go is currently clearly at least 3 dan.

Janzert

Fritzlein
« Reply #6 on: May 12th, 2011, 2:18am »

on May 11th, 2011, 10:07pm, Janzert wrote:
    Well, without the pessimistic evidence you'd have to say the computers are at least 3 if not 4 dan already.

You are eloquent on both sides.

Quote:
    Since there are two bots with a 4d or higher rating and four with 3d or higher (I forgot to put EricaBot, at 3d, in the list earlier). Also, these ratings are built up cumulatively over hundreds if not thousands of games, so it's not as if the bots are just using an unexpected trick to get these wins.

What is the effect of the time control? I tend to guesstimate that Arimaa bots gain 150 Elo relative to humans when playing at 15s/move instead of 2m/move.

Also, my experience on both ICC and the Arimaa game room is that bots either have their ratings driven far lower than they should be by folks beating them repeatedly according to formula, or driven far higher than they should be by players who don't know how to win, losing over and over to a bot that never gets tired and never overlooks simple tactics. Does KGS suffer the same phenomenon of bots being significantly underrated or overrated depending on whether their recent opponents have been better or worse?

Finally, are the KGS games involving bots mostly played at handicap? I read up on the rating system at one point, including the attempt to quantify the tradeoff between handicap stones and winning percentages. My recollection is that it didn't give the higher-ranked player nearly enough winning chances at even games. This would mean, for example, that if a 4 dan and a 1 dan were breaking even at a three-stone handicap and then switched to playing even games, the 4 dan would win a higher percentage of games than the server predicts, and thus gain rating from the even games (see the toy example at the end of this post). To put it another way: if the KGS bots are getting their 4 dan ratings by winning 50% of the time against 1 kyu humans at a handicap of four stones, then I will have to give those bot ratings a lot more respect than I have been. That would be very impressive indeed.

Sharp2011Blitz attained a rating of 2342, but I'm wary of using that to say the Arimaa Challenge is about to fall, and that is with a rating system that is more reliable (less of a hack job) than Go rating systems tend to be. But perhaps I'm underestimating the power of the KGS ratings. I'm willing to be educated. (And thanks for all the information so far!)

Quote:
    I meant that I think Tromp got the last year in which he had a better-than-even chance of winning the bet. Despite the overwhelming final score, the sentiment during the games themselves was that things were much closer; in at least one of the games the spectators were sure the computer had a completely won position before it blundered the win away.

Interesting. The fact that the games were dramatic in itself indicates that the players were close in skill. On the other hand, blundering away a clearly won position cuts both ways: why should we think well of the computer for getting a lead and not think poorly of it for blowing that lead? The sum of its strength and its weakness was still a loss.

Quote:
    Since December ManyFaces has gained about half a stone in rank on KGS.

OK, so your comment about Tromp getting lucky does have to do with the current rate of progress in Go AI. Are computers gaining a rank every year? It used to be a much slower pace of progress, if I remember correctly.

Quote:
    Most (all?) of the top bots in competition can now use clusters and extract a nice little boost in strength from them.

What is the current conventional wisdom on the value of hardware to Go AIs? Is there a formula analogous to chess engines gaining 50-100 Elo per doubling of hardware?
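
Here is the toy example promised above. The k values and the scenario are assumptions for illustration, not KGS's actual parameters: if the server converts rank difference to an even-game win probability with too small a spread, the stronger player beats the prediction in every even game and the rating system rewards it.

import math

def p_win(rank_diff, k):
    # Logistic rank model: p = 1 / (1 + exp(-k * rank_diff)).
    return 1.0 / (1.0 + math.exp(-k * rank_diff))

d = 3.0                        # 4 dan vs 1 dan, playing even games
k_server, k_real = 0.85, 1.30  # hypothetical: the server understates the spread
print(f"server predicts {p_win(d, k_server):.0%}; actual results {p_win(d, k_real):.0%}")
# server predicts 93%; actual results 98%, so the 4 dan beats the
# prediction in even games and his rating climbs.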

lightvector
« Reply #7 on: May 12th, 2011, 12:13pm »

I don't know how well KGS maps its underlying numeric model to handicap-stone differences or Go rank differences, and I wouldn't be surprised if it isn't that great. I also don't know about winning chances between players of different ranks (Sensei's Library had a page with stats on this, but I can't find it). But my personal experience is that KGS is quite internally consistent above around 10k or so. If you play someone one rank higher, you are moderately more likely to lose; two ranks higher, very likely. For two players at given different ranks, a fair game tends to occur at the handicap equal to the rank difference (after accounting for the fact that Go's handicapping system is half a stone off), plus or minus half a stone if the rank difference is small, and plus or minus a stone if it is large.

In terms of the numeric model itself, I know that KGS uses some sort of maximum-likelihood or Bayesian method (similar to Bayeselo), with decaying weights for older games (a toy sketch of the general idea is at the end of this post). I generally trust such a model a little more than a typical Elo system, although it is certainly still open to ratings abuse and such.

Games against these bots typically draw a good mix of players, because no single player tends to play for hours on end against the same bot, and the bots are in demand, often remaining open for less than 30 seconds before the next person starts a game. Informally looking at the game records, the rank distribution of the opponents tends to center around the rank of the bot, with a very heavy tail on the weaker side and a very light tail on the stronger side.

on May 12th, 2011, 2:18am, Fritzlein wrote:
    Are computers gaining a rank every year? It used to be a much slower pace of progress, if I remember correctly.

My memory is a bit vague, but a rank every year sounds approximately right. After the massive initial jump in strength following the discovery of MCTS in 2006-2007, I recall bots around 4k-1k in 2008 on high-end hardware. In 2009 or so, I think they got as high as 1d-3d on massive clusters, but maybe only 2k-1d at best otherwise. And currently, I am highly confident that the best bots are at least 2d on high-end hardware (e.g. 8-32 core machines), probably even at time controls somewhat slower than blitz. These are all KGS ranks.

Prior to MCTS, strength was static. Programs were too fragile, and any improvement in one area would simply make them worse in others. Additional running time did almost nothing to help, because the algorithms didn't scale. Now, with MCTS, research and improvements are continuing at a good pace, and the algorithm shows clear gains from more running time.
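
And the toy sketch promised above: a decayed-weight maximum-likelihood (Bradley-Terry) fit in the same general family. This is my own invention for illustration, not the actual KGS algorithm; the half-life, prior, and step size are all made up.

import math

def fit_ratings(games, half_life_days=30.0, iters=500, step=10.0):
    # Toy weighted Bradley-Terry fit on an Elo-like scale.
    # games: list of (winner, loser, days_ago) tuples.
    players = {p for w, l, _ in games for p in (w, l)}
    rating = {p: 0.0 for p in players}
    for _ in range(iters):
        grad = {p: 0.0 for p in players}
        for w, l, days_ago in games:
            weight = 0.5 ** (days_ago / half_life_days)  # older games count less
            p_w = 1.0 / (1.0 + 10.0 ** ((rating[l] - rating[w]) / 400.0))
            grad[w] += weight * (1.0 - p_w)   # winner pulled up by surprise factor
            grad[l] -= weight * (1.0 - p_w)   # loser pulled down symmetrically
        for p in players:
            grad[p] -= 0.001 * rating[p]      # weak prior keeps ratings finite
            rating[p] += step * grad[p]       # plain gradient ascent
    return rating

games = [("bot", "alice", 1), ("bot", "bob", 3), ("bob", "bot", 40)]
print(fit_ratings(games))  # bot highest; bob's win counts less for being old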

Fritzlein
« Reply #8 on: May 12th, 2011, 5:12pm »

This conversation has piqued my interest. I found the description of KGS ratings on Sensei's Library. I applaud the fact that rank differences are now converted into win percentages differently depending on whether the players are beginners or experts (a quick numerical check of this table appears at the end of this post):

rank diff   expected win rate       expected win rate
            (k=0.85, for 30k-5k)    (k=1.30, for 2d+)
   0.0              50%                     50%
   0.5              60%                     66%
   1.0              70%                     79%
   1.5              78%                     88%
   2.0              85%                     93%
   2.5              89%                     96%

This is an improvement, although I expect the win percentages are still overstated for beginners and still understated for top players. At the upper end, where bonobot was getting its ranking, understated win percentages mean that in games played at a handicap unequal to the rank difference, the favored player will get an inflated rating relative to all games being played at the appropriate handicap.

I figured out how to see the list of games bonobot used to attain its 5d rank. The majority were played at an appropriate handicap, which makes me respect bonobot's rating more. It was doing as I said, giving five stones to 1k's and holding its own. Impressive. However, of the significant number of games not played at appropriate handicap, nearly all left bonobot at an advantage relative to its rank, which would tend to inflate its rating somewhat, as described above. So the effect I was afraid of is probably still there, albeit at a much reduced level compared to what I was talking about in my previous post.

I see also that the rating spanned about a hundred rated games against a variety of opponents; I didn't see evidence of a single weaker player inflating bonobot's rating by being unwilling to quit despite losing every game. Any such effect would be collective, not individual. I wonder if Go bots display the same general phenomenon that Arimaa bots do, namely that a bot loses more than expected against higher-rated players and wins more than expected against lower-rated players, as if AIs have a tighter performance distribution than humans do.

Also, I saw a player "zen" ranked at 1d. Surely this isn't the computer that is winning all the computer tournaments?

The remaining questions, which I am not sure how to research myself, are the relative effect of playing faster and the value in ranks of doubled computing power. Gradually I am becoming convinced that Go AI has advanced further than I thought. How intriguing!
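
And the promised check: the quoted percentages are consistent with a simple logistic curve, p = 1/(1 + exp(-k*d)) for rank difference d. The formula is my inference from the numbers, not anything KGS publishes:

import math

def win_rate(rank_diff, k):
    # Inferred logistic model for the table above.
    return 1.0 / (1.0 + math.exp(-k * rank_diff))

for k, label in ((0.85, "30k-5k"), (1.30, "2d+")):
    row = "  ".join(f"{d:.1f}:{win_rate(d, k):.0%}" for d in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5))
    print(f"k={k:.2f} ({label}): {row}")
# k=0.85 (30k-5k): 0.0:50%  0.5:60%  1.0:70%  1.5:78%  2.0:85%  2.5:89%
# k=1.30 (2d+): 0.0:50%  0.5:66%  1.0:79%  1.5:88%  2.0:93%  2.5:96%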

Janzert
« Reply #9 on: May 13th, 2011, 12:46am »

Sorry, I've run out of the time this conversation deserves, but I'll try to throw in a few quick things anyway.

on May 12th, 2011, 5:12pm, Fritzlein wrote:
    Also, I saw a player "zen" ranked at 1d. Surely this isn't the computer that is winning all the computer tournaments?

Zen19 is the username you want (similarly, it uses Zen13 and Zen9 for 13x13 and 9x9).

Quote:
    The remaining questions, which I am not sure how to research myself, are the relative effect of playing faster and the value in ranks of doubled computing power.

The best study I've seen on the effect of doubling thinking time for MCTS Go bots was done by Don Dailey a few years ago (it would be interesting to see another one with modern bots). He covered 12 doublings using two MCTS bots: a relatively weak one of his own, Lazarus, and one of the strongest at the time, MoGo. Unfortunately, I'm not sure he ever published any sort of permanent final report. All I've been able to find is an email thread discussing some intermediate results; if I'm remembering right, somewhere in that thread he also listed later if not final results. Most unfortunately, the actual results were given in a graph he made, and it seems to no longer be hosted where the email messages point. One post mentions 200 Elo per doubling, though I also saw a much later message, referring back to the study, that said around 90 Elo per doubling.

Janzert

Fritzlein
« Reply #10 on: May 13th, 2011, 1:35am »

Thanks for the link to Dailey's thread. It is good to know that performance is indeed linear in the log of computing power, but unfortunate that we don't know the slope.

The scale used by KGS implies that every rank above 2 dan is worth 226 Elo. If current bots are 7 ranks below the top humans (what are the top KGS ratings?), that translates to about 1580 Elo. At 90 Elo per hardware doubling, we need just 17.5 doublings, i.e. hardware that is 194,000 times faster. Projecting a continued gain of one rank per year is much more optimistic than my above estimate. Can the Go programming community continue to make smashing algorithmic improvements instead of waiting 35 years for Moore's law to bring home the victory?
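
For anyone who wants to poke at the arithmetic, here it is spelled out; every input is an assumption from the discussion above, not a measurement:

import math

elo_per_rank = 400 * 1.30 / math.log(10)  # KGS k = 1.30 per rank above 2d, ~226 Elo
ranks_behind = 7                          # assumed gap from top bots to top humans
elo_per_doubling = 90                     # low-end figure from Dailey's study
gap = ranks_behind * elo_per_rank
doublings = gap / elo_per_doubling
print(f"{gap:.0f} Elo gap = {doublings:.1f} doublings = "
      f"{2 ** doublings:,.0f}x hardware = ~{2 * doublings:.0f} years of Moore's law")
# prints roughly: 1581 Elo gap = 17.6 doublings = ~194,000x hardware
# = ~35 years of Moore's law (assuming a doubling every two years)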

aaaa
« Reply #11 on: May 25th, 2011, 4:50pm »

Although the values of the rating system used by the European Go Federation can't be perfectly converted to Elo equivalents (the EGF system doesn't follow the Bradley-Terry model), it might be interesting to know that, by its reckoning, the expected outcome of an even game between an amateur (European) 1 dan, who would apparently be about 2 dan on KGS, and a professional 9 dan corresponds to a rating difference of about 1459 Elo points.
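
For reference, inverting the standard Elo logistic formula shows what a 1459-point gap means in even-game terms (this is generic Elo math, not the EGF's own model):

import math

def elo_gap(p):
    # Elo difference implied by expected score p for the stronger player.
    return -400.0 * math.log10(1.0 / p - 1.0)

p = 1.0 / (1.0 + 10.0 ** (-1459 / 400.0))   # expected score at a 1459-point gap
print(f"expected score: {p:.4%}")           # expected score: 99.9775%
print(f"round trip: {elo_gap(p):.0f} Elo")  # round trip: 1459 Elo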

aaaa
« Reply #12 on: Jun 17th, 2011, 12:34pm »

For a few days now Zen, running as Zen19S, has been able to maintain a 4-dan rating in games played with an overtime of 30 seconds per move. The same incarnation also won the most recent KGS slow bot tournament, held a month ago. Hardware:

Quote:
    mini-cluster of 6 pcs (a 6-core Xeon W5680/4 GHz, two 4-core i7 920/3.2 GHz, and three 4-core Core2Quad/3 GHz) connected via a GbE LAN

Fritzlein
« Reply #13 on: Jun 17th, 2011, 2:08pm »

on May 13th, 2011, 1:35am, Fritzlein wrote:
    Thanks for the link to Dailey's thread. It is good to know that performance is indeed linear in the log of computing power, but unfortunate that we don't know the slope.

So, we now have confirmation both that performance is linear in the log of the number of playouts and that the slope is 100 Elo per doubling: http://dvandva.org/pipermail/computer-go/2011-June/003498.html

But, just as with the chess study we discussed in another thread, when you look at the data itself instead of reading the summary, it sure doesn't look linear: http://dvandva.org/pipermail/computer-go/attachments/20110617/491c8b58/attachment-0001.gif

And if it isn't linear, then the concept of "Elo points per doubling" shouldn't be used to extrapolate far out on the curve, e.g. to guess how many doublings are needed to reach 9 dan, as I did in my previous post.
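
Here is the kind of sanity check I mean, with invented numbers rather than Dailey's data: fit "Elo per doubling" separately to the early and late portions of a concave curve, and the single-number summary falls apart.

def slope(xs, ys):
    # Least-squares slope of ys against xs.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

doublings = list(range(9))
elo = [0, 180, 340, 470, 580, 670, 740, 790, 830]  # invented, deliberately concave

print(f"early: {slope(doublings[:4], elo[:4]):.0f} Elo/doubling")   # early: 157
print(f"late:  {slope(doublings[-4:], elo[-4:]):.0f} Elo/doubling") # late:  53
# A single global "Elo per doubling" averages over this curvature, so
# extrapolating seventeen doublings out is dicey.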