||||||||
Title: A Turing Test Post by rozencrantz on Sep 16th, 2010, 1:35pm I've read a lot about chess and Go players getting a read on the personality of their opponent; Yasunari Kawabata in particular talked a lot about how one's character is revealed in playing Go. I've even read some comments [citation needed] describing playing Chess at high levels as a rather intimate act. So: is the art of Arimaa developed enough for that kind of thing? Do bots play noticeably differently from equally strong (or weaker) humans? Is the skill gap at the extreme end of things the only difference between human and bot play? (I doubt it.) And ultimately, could you play two unmarked opponents of similar skill and discern the human from the bot? Is there a way of trying this, with at least some degree of blinding? Has this been tried? |
||||||||
Title: Re: A Turing Test Post by clojure on Sep 16th, 2010, 1:49pm This test has a fatal flaw as a Turing test in which the human tries to convince the observer that he is human: an easy way to do so is to stop playing for the win and instead encode messages in your moves. But if neither player is informed that they are being watched, it's a different matter. To my knowledge, current bots don't, for example, learn between games, so if many games are allowed, the one who keeps getting better at beating the other is the human. Also, as a beginner player, I'd like to think that top human players can easily distinguish sensible strategic moves from merely tactical ones, and that the play reveals an overall understanding of what to do, which bots struggle with. On the other hand, if a player makes a clearly bad tactical move that yields no long-term advantage, that player is probably human, assuming both are rated high. These are the first things that popped into my mind... Overall I agree that, in Go for example, a player's character might show up strongly, but I suspect that the best players can imitate somewhat weaker players quite well. edit: I have no idea whether this kind of experiment has been tried. |
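As an aside, the message-encoding trick clojure mentions is easy to picture in code. The sketch below is a minimal illustration, assuming a hypothetical rank_moves() function that returns an engine's candidate moves ordered best-first and a position object with an apply() method; none of this is a real Arimaa API.

    # Sketch: smuggle a bit string through a game by choosing among the engine's
    # near-equal top moves.  rank_moves() and position.apply() are hypothetical.
    def encode_bits_in_moves(position, bits, rank_moves, window=4):
        moves = []
        for i in range(0, len(bits), 2):
            candidates = rank_moves(position)[:window]     # near-equal top moves
            index = bits[i] * 2 + (bits[i + 1] if i + 1 < len(bits) else 0)
            chosen = candidates[index % len(candidates)]   # 2 message bits pick one
            moves.append(chosen)
            position = position.apply(chosen)
        return moves

A watcher who knew the scheme could decode the message; anyone else would just see slightly suboptimal play.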
||||||||
Title: Re: A Turing Test Post by Sconibulus on Sep 16th, 2010, 1:57pm You can tell the difference between humans and bots when playing them, usually fairly easily. There are only two entities I recall playing that felt like there could have been any question: Fritzlein, because he crushed me so thoroughly and controllingly that he could as easily have been a superior bot as a superior human, and bot_Marwin, who somehow feels fairly humanesque. I think it's because of the way he decides moves; I believe I was told after my games that he chooses weighted-random from among the moves he considers best. |
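For what it's worth, the weighted-random selection Sconibulus describes could look something like the sketch below. The margin and temperature values are made up, and this is only an illustration of the general technique, not Marwin's actual code.

    import math, random

    def pick_move_weighted(scored_moves, margin=50, temperature=25.0):
        # scored_moves: list of (move, score) pairs, higher score = better.
        # Keep moves within `margin` of the best score, then sample them with
        # weights that decay exponentially as the score drops.  Illustrative only.
        best = max(score for _, score in scored_moves)
        near_best = [(m, s) for m, s in scored_moves if best - s <= margin]
        weights = [math.exp((s - best) / temperature) for _, s in near_best]
        r = random.uniform(0, sum(weights))
        for (move, _), w in zip(near_best, weights):
            r -= w
            if r <= 0:
                return move
        return near_best[-1][0]

With a rule like that a bot still plays near its best but stops being perfectly predictable, which may be part of why it feels more human.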
||||||||
Title: Re: A Turing Test Post by Nombril on Sep 16th, 2010, 3:53pm Interesting question. I sometimes wonder if I approach playing a bot and a human differently; I think this must be related to my expectation that a bot and a human will play differently. I've noticed many bots (even good ones) will sometimes move a piece one step and then back to where it started (as 2 of their 4 steps) when they can't find anything else to do. If the bots aren't programmed to "disguise" themselves, this could be a big giveaway. So I think at this point it would be relatively easy to tell the difference, especially if the sample size were large enough. |
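The step-out-and-back tell Nombril describes would even be easy to flag mechanically. Here is a rough sketch that scans one turn's steps, assuming standard Arimaa step notation like "Ed4n" (piece letter, square, direction) as the record format; keying by piece letter is a simplification that ignores which of several identical pieces moved.

    # Detect turns where a piece steps away and comes straight back.
    # Steps are assumed to look like "Ed4n"; capture steps ending in "x" are skipped.
    DELTA = {"n": (0, 1), "s": (0, -1), "e": (1, 0), "w": (-1, 0)}

    def is_shuffle_turn(steps):
        origin = {}    # square each piece letter first moved from this turn
        for step in steps:
            if step.endswith("x"):
                continue                      # captured pieces are removed, not moved
            piece, file, rank, direction = step[0], step[1], int(step[2]), step[3]
            start = (ord(file) - ord("a"), rank)
            dx, dy = DELTA[direction]
            end = (start[0] + dx, start[1] + dy)
            origin.setdefault(piece, start)
            if end == origin[piece] and start != origin[piece]:
                return True
        return False

    # is_shuffle_turn(["Ed4n", "Ed5s"]) -> True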
||||||||
Title: Re: A Turing Test Post by Fritzlein on Sep 16th, 2010, 10:50pm Recall that in the Deep Blue vs. Kasparov match, the Turing Test was passed by Deep Blue. The evidence is that Kasparov accused IBM of cheating, saying that a computer could not have played such moves without human assistance. That incident makes me suspect that it is only easy to tell a human opponent from a computer opponent until such time as the opponent plays better than you. |
||||||||
Title: Re: A Turing Test Post by clojure on Sep 17th, 2010, 8:01am Fritzlein, I'm a bit confused why you use the technical term so freely. In the Turing test, the one who tries to distinguish the human from the computer is interacting with them, which is a crucial component. Here's the original paper: http://mind.oxfordjournals.org/content/LIX/236/433 But the question rozencrantz is asking is interesting nevertheless: even if bots could outplay humans, could they imitate human characteristics in an Arimaa game so that an observer cannot tell which one is the bot? |
||||||||
Title: Re: A Turing Test Post by Fritzlein on Sep 17th, 2010, 11:38am on 09/17/10 at 08:01:12, clojure wrote:
What I meant to say is that one can think of limiting the interaction to a chess game, rather than to a conversation, to get the Turing Test For Chess, and that Deep Blue passed the Turing Test For Chess. Is your objection to my post that I forgot to add the words "For Chess"? If so, then 99% of the disagreement between us was due to my being inattentive. On the other hand, if your objection was that people play games to play games and aren't really focusing on determining the identity of their opponent, then we have a more substantive difference in perspective. I appreciate your point about the possibility of encoding conversation in chess moves, but I think that is not in the spirit of rozencrantz's original post, nor what I am most curious about. The more immediate question for me is not whether a chess game could be used as a tool in some rarefied way to distinguish a human from a computer, but rather whether chess played for the sake of chess (e.g. trying to win every game) allows one to incidentally detect whether one's opponent is human or AI. You refer to the Turing Test as a technical term, but it is hardly as precisely defined as technical terms such as "renal cell carcinoma". For example, you might say that as long as there exists some pattern of questioning, however unrelated to normal human conversation, that allows an expert human interrogator to distinguish man from machine, the Turing Test has not been passed. I would tend to set a much lower bar, namely that if I treat my conversational opposite normally, as I would treat a human, and fail to detect that I am getting non-human responses, then the Turing Test is passed in my book. I mostly want to know whether conversation undertaken for the sake of conversation allows one to incidentally detect whether one's conversational partner is human or AI. The "naturalness" of the conversation in question seems to matter somehow. For example, if I were in a real conversation and someone asked me some bizarre question out of the blue for the purpose of determining whether or not I was human, a very human and natural response from me would be, "I don't have the time or energy to convince you of my humanity. Goodbye." In order to set up the Turing Test as he described it, though, we have to make this thoroughly human response against the rules! The standard that Turing proposed is more demanding than mine, in that the objective of the questioner is not to have a normal conversation, but explicitly to make a determination, in a highly artificial situation, whether or not the conversation partner is human. I have a hunch, though, that there are grey areas that would disturb even Turing, were he alive to see the details of the contest played out. If computers come close to succeeding, there will probably be a time period in which more and more specialized expertise is necessary to make a correct determination, and ordinary people with ordinary questions are less and less capable of doing it. Turing himself might consider machines intelligent even if one person somewhere still has the technique to discriminate but nobody else does. |
||||||||
Title: Re: A Turing Test Post by Hippo on Sep 17th, 2010, 12:31pm on 09/16/10 at 13:57:03, Sconibulus wrote:
The main difference between Marwin's play and that of other bots is that Marwin does not like passivity. It always chooses a plan for what to do, usually plans spanning several turns. These plans are not necessarily optimal, but they force the opponent to play reactively. Against other bots it is sufficient to gain a small advantage and slowly improve the position: since they don't attack, you can play according to a strategic plan and improve your position until they are lost. Marwin, on the contrary, interrupts such a slow decay with counterstrikes really often. You can gain something from the counterstrikes, as they are not safe, but if you miss some of them, you can lose a lot. This is why I have big problems with MarwinBlitz. ... Pondering is much more complicated in a reactive game than in a proactive one. |
||||||||
Title: Re: A Turing Test Post by Fritzlein on Sep 17th, 2010, 12:48pm on 09/16/10 at 13:35:12, rozencrantz wrote:
I recall from way back in the early days of the ICC, Garry Kasparov once logged on anonymously for some practice before a match, and was surprised to meet very stiff (albeit very human) resistance from one player. If I remember correctly, he was able to deduce precisely that his opponent was Peter Svidler. Sounds intimate to me... |
||||||||
Title: Re: A Turing Test Post by clojure on Sep 17th, 2010, 1:14pm Well, my objection was that chess is only a subset of the possibilities a human judge can use as a tool to identify which one is the computer. I think the spirit of the Turing test is that the human should be able to use every means possible short of physical detection. So the test's beauty is that it doesn't predefine how the judge can work out the problem. If Turing had done that, he would have made the test less appealing as a universal test. Consider a computer passing a Turing test with predefined rules, which probably don't even capture the essence of human intelligence: people would object that humans were handicapped and would keep believing that human intelligence is in a different class. Well, maybe they wouldn't. But I would. As I said, the question of a limited Turing test with participants who are not even aware of being tested is fascinating, but let's not call it a Turing test. Even if I misused the word "technical", the term is still quite well established, and its true spirit is different from this, even though they have the same kind of objective: to see whether we are similar to computers in the current age. So let's rather call it the rozencrantz test. (Maybe they will make Watchmen 2 with Rorschach replaced by a robot.) |
||||||||
Title: Re: A Turing Test Post by Fritzlein on Sep 17th, 2010, 1:31pm on 09/17/10 at 13:14:21, clojure wrote:
Hmmm, I'm not sure how different the "true spirit" of two tests can be if they have the same objective and similar means. We're both talking about whether a computer can convincingly simulate a human through the medium of conversation. Were the Kasparov vs. Deep Blue match and the Kramnik vs. Deep Fritz match in a different spirit because Kramnik had many advantages (commodity hardware, unchanging opponent, pre-match opportunity to study opponent, in-match view of opening book, etc.) that Kasparov didn't have? In my mind they were both man vs. machine contests with the bar for computers raised or lowered by the match conditions. Similarly, I think of what I propose as very much in the true spirit of the Turing Test, but with the bar lowered for the computer by the conversation conditions. I'm a bit suspicious that humans who want to avoid calling computers intelligent will instinctively insist on the highest possible bar, regardless of whether it is in line with a common-sense understanding of conversationally simulating a human. But maybe I am mistaken to view things so loosely that one can refer to a Turing Test rather than the Turing Test. Perhaps I converse with too many philosophers and not enough computer scientists. ;) As for restricting the medium of interaction to a game rather than conversation, I agree with you that it shouldn't be called the Turing Test. That was my mistake in my first post in this thread. I should use some term like chess-turing-test to distinguish it from the (real) conversation-turing-test. |
||||||||
Title: Re: A Turing Test Post by clojure on Sep 17th, 2010, 1:37pm I forgot to say this: if we were to limit it to a particular game, for instance, it would create some serious problems. First off, how do we know that the game is a good enough test? Clearly 3x3 tic-tac-toe would fail as one. So we would spend lots of time arguing whether that particular game captures intelligence... So we would have to just talk about similar playing style in a particular game... Also, how would you reliably set up this experiment? How is the bot selected, and how is the player selected who doesn't know he is participating in the contest? |
||||||||
Title: Re: A Turing Test Post by Fritzlein on Sep 17th, 2010, 1:45pm on 09/17/10 at 13:37:07, clojure wrote:
The argument about whether any particular test captures intelligence is unavoidable. There are people who argue that even if a machine could pass the Turing Test as you define it, that machine would not be intelligent. I engaged in an e-mail debate with such a person just last month; he insisted that human intelligence encompasses more than the ability to converse. In my previous posts I was arguing about what I think deserves the label of "Turing Test", not what I think deserves the label of "intelligence". My personal opinion is that computers are already intelligent, albeit not yet as intelligent as humans. The Turing Test is only one factor in my judgement of how intelligent machines have become. The tic-tac-toe-turing-test is obviously one that computers could pass, but it doesn't play a very big role in my understanding of machine intelligence. ;) |
||||||||
Title: Re: A Turing Test Post by clojure on Sep 17th, 2010, 1:55pm So let's focus on the Turing test as picking up human traits rather than intelligence. I think the difference in spirit shows up, for example, in how well one can change one's behavior depending on earlier conversations / games. With arbitrary conversation, you can quickly see whether the other one is following your thoughts. With a game, it's much harder to see whether the participant is merely reacting similarly or has a true understanding of what you mean. Also, I'm still wondering how you can set up the test fairly. If the bots know they are participating, it's a failed test. How do you know the bot wasn't designed with human-ish behaviour specifically in mind? |
||||||||
Title: Re: A Turing Test Post by clojure on Sep 17th, 2010, 1:58pm I kind of agree that "human intelligence encompasses more than the ability to converse." For example, some autistic people have incredible talent for music, but when they try to describe it, they cannot; so much so that brain imaging shows their brain disagreeing with what they are saying. I saw a video on YouTube where a guy could repeat on the piano the notes a whole small orchestra was playing; it was something like 15-20 notes at one point in time. If people like that can participate in improvising beautiful music in real time, I think it's a clear indication of intelligence. Well, it is conversation in a generalized sense, but... Also, some parts of the human brain activate only when in visual contact with another person. |
||||||||
Title: Re: A Turing Test Post by Fritzlein on Sep 17th, 2010, 3:57pm on 09/17/10 at 13:55:39, clojure wrote:
I guess in any form of Turing Test the goal of the bot is to deceive. In chess the bot would have to disguise its super-human tactics; in conversation it would have to disguise its super-human ability at arithmetic. It is the role of the human judge that varies in our different conceptions. Does the judge twist the interaction to maximize chances of correctly identifying a non-human? Or does the judge play/talk normally and then give an opinion as to the human-ness of the other party? |
||||||||
Title: Re: A Turing Test Post by rozencrantz on Sep 17th, 2010, 4:02pm This... is not what I expected. I was using a rather loose and humanistic sense of the phrase, and perhaps I should not have. Forgive me. I'm trying to construct a philosophy of games, and peel back some of the flowery metaphors that gamers love, to see what's under them. I'm also fascinated by the idea of non-human intelligence, and if games really can be expressive, game AI might be the place to look for that. Is Arimaa an expressive game? How expressive? If you played several 1900-2000 level players, some of whom were bots using sock-puppet accounts, would you notice? And most importantly, what would you notice? |
||||||||
Title: Re: A Turing Test Post by clojure on Sep 17th, 2010, 4:33pm on 09/17/10 at 15:57:06, Fritzlein wrote:
But this is one of the reasons why the "limited Turing test" is not a Turing test: the human player (not the judge) is not allowed to participate freely, but the bot is. Now, since you agreed that the bot might deceive, I have an argument I find compelling. The bot can analyze existing data on that particular game, whereas in a real Turing test the field is enormous. To say the least, this is a completely different class of problem for the programmer. The programmer can anticipate what the judge will do in only a very, very limited way, whereas the bot can be coded with specifically hand-tuned patterns for how to react to different situations. I feel this is so different from free conversation that I cannot even comprehend it. Quote:
Yeah. Interesting test but quite problematic IMO. I'm no mathematician/psychologist however... |
||||||||
Title: Re: A Turing Test Post by clojure on Sep 17th, 2010, 4:41pm on 09/17/10 at 16:02:05, rozencrantz wrote:
No, no! Forgive me. I was aware of your intention, but sometimes I cannot stop myself... Quote:
Yeah. It really is an interesting topic in both forms of the question. Quote:
Let me get back to you on that in 2020 :P Quote:
I would notice how the bot positions its pieces: whether it makes moves that can be seen as only improving the spatial relations of the pieces in anticipation of a move that is coming 30 moves later. I would notice if the player made very solid tactical play, especially when there is huge dynamism/tension in a local situation where it would be easy to blunder. I would notice if the opponent timed out on move 15 without saying anything (and was still online). |
||||||||
Title: Re: A Turing Test Post by Fritzlein on Sep 17th, 2010, 4:50pm on 09/17/10 at 16:02:05, rozencrantz wrote:
If the bots were as at present, i.e. trying to play their best rather than trying to win with highest probability, the tipoff I would notice first is strong tactics relative to weak strategy. Finding goal in two in seconds is a dead giveaway; missing a goal in two after thinking for a minute is a sure sign of a human. Strong tactics could be disguised by a developer sprinkling in blunders, but it is harder to disguise weak strategy. For example, accepting a slow, inevitable decline rather than mixing it up is a bot characteristic, although Marwin does the best at this by usually finding a way to keep the position dynamic. The surest sign of a computer, but one that would take longer to manifest, is a lack of learning, as several have mentioned. Humans try creative new things and bots don't. In particular, humans will pick up ideas you use against them and try to use those ideas right back at you. I don't know how a developer could hide that difference at present. In our small community, there is also the possibility of identifiable playing styles. For example, The_Jeh and hanzack are nearly the same playing strength, but I would have no trouble distinguishing between the two of them. ChrisB gives himself away by being the only player I know at that level who always sets up his camel behind a trap. But I couldn't easily pin down Nombril on the basis of his moves. Similarly Marwin and Bomb have quirks that are pretty specifically characteristic to them, while Clueless plays a more "vanilla" game that I would have trouble putting my finger on. Or maybe it's just a function of my familiarity with each opponent. |
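To make Fritzlein's "goal in two" point concrete: for a bot this is just a small brute-force search, along the lines of the sketch below. Real Arimaa move generation is complicated, so legal_moves, apply_move and reaches_goal are hypothetical placeholder functions here, not any engine's actual API.

    # Sketch of the exhaustive "goal in two" check a bot does in seconds.
    # legal_moves(pos, side), apply_move(pos, move) and reaches_goal(pos, side)
    # are hypothetical placeholders for a real engine's interfaces.
    # (Simplification: ignores the opponent goaling or eliminating us first.)
    def opponent_of(side):
        return "silver" if side == "gold" else "gold"

    def goal_in_two(pos, me, legal_moves, apply_move, reaches_goal):
        for first in legal_moves(pos, me):
            after_first = apply_move(pos, first)
            if reaches_goal(after_first, me):
                return first                              # goal in one
            forced = True
            for reply in legal_moves(after_first, opponent_of(me)):
                after_reply = apply_move(after_first, reply)
                if not any(reaches_goal(apply_move(after_reply, second), me)
                           for second in legal_moves(after_reply, me)):
                    forced = False                        # opponent has a defence
                    break
            if forced:
                return first                              # every reply still loses
        return None

A human has to spot the same thing by pattern recognition and calculation, which is exactly where the minute-long misses Fritzlein mentions come from.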
||||||||
Title: Re: A Turing Test Post by Fritzlein on Sep 17th, 2010, 4:56pm on 09/17/10 at 16:33:42, clojure wrote:
Dude, I have said twice that I think chess-turing-test is in a different class from conversation-turing-test. on 09/17/10 at 11:38:43, Fritzlein wrote:
on 09/17/10 at 13:31:17, Fritzlein wrote:
Let me repeat it a third time. The conversation-turing-test is a different animal from the chess-turing-test. We agree about that. Perhaps I was making too many distinctions at the same time. I thought that, having cleared up that a turing test involving a game is very different from a turing test involving conversation, I could also say that a turing test involving normal conversation is in the same spirit as a turing test involving conversation specifically targeted to ferret out machine impostors. |
||||||||
Title: Re: A Turing Test Post by clojure on Sep 17th, 2010, 5:13pm Haha, sorry. It is probably because I'm trying to argue that the "Turing test" in "chess-turing-test" should not be there -- at all. I must have misunderstood what you were trying to argue. I'm not sure why you want to stick with the "Turing test" label. Now that I think about it, it must be because there is no simple alternative name that would be as appealing. Yet again, I apologize. |
||||||||
Title: Re: A Turing Test Post by clojure on Sep 17th, 2010, 5:16pm Maybe you were already thinking that I'm a computer, since I was not learning anything! ;D |
||||||||
Title: Re: A Turing Test Post by Fritzlein on Sep 17th, 2010, 6:14pm on 09/17/10 at 17:13:58, clojure wrote:
OK, fair enough :) |
||||||||
Title: Re: A Turing Test Post by Rednaxela on Sep 17th, 2010, 6:38pm on 09/17/10 at 16:50:05, Fritzlein wrote:
While it's perhaps currently the surest sign, I don't think it has to remain the case. I can think of a variety of ways to make a bot adaptive, learning from its own games or from trends in the gameroom. I'd put them into two main categories: 1) opponent modeling, and 2) "creative" forward pruning. The idea of #1 is not that radical, relatively speaking, but the bot would be less deterministic. It could possibly seem slightly less bot-like if it starts to learn what sort of trades Fritz prefers, for instance. I don't expect #1 to have that big an effect on making it seem less bot-like, though, because the adapting wouldn't really be "creative" at all. For #2, the idea is to do a somewhat aggressively forward-pruned search, but set it up so that, firstly, it will at random choose branches not ranked so highly by the heuristic, and secondly, the heuristic is tweaked over time to tend towards the more successful branches, particularly when they are branches it only visited by random chance. This would perhaps give a "creative" appearance to its behavior. Of course, all that said, I suspect it would be extremely difficult to make this work without decreasing strength. Perhaps it would be good as a way to make a bot seem more like a novice human, though? So in summary, I think there are ways to make a bot learn or appear to learn, but they're not without complications and tradeoffs. |
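One decision step of Rednaxela's idea #2 might look roughly like the sketch below. The candidate format (move, feature dict, deeper-search score), the linear heuristic, and all the constants are assumptions made for illustration; no existing Arimaa bot is claimed to work this way.

    import random

    def choose_move_creative(candidates, weights, keep=4, explore=0.15, lr=0.01):
        # candidates: list of (move, features, deep_score) tuples, where `features`
        # is a dict scored by a linear heuristic and `deep_score` stands in for
        # the result of a deeper search on that branch.  All of this is invented.
        heuristic = lambda c: sum(weights.get(f, 0.0) * v for f, v in c[1].items())
        ranked = sorted(candidates, key=heuristic, reverse=True)
        kept = ranked[:keep]                               # aggressive forward pruning
        if len(ranked) > keep and random.random() < explore:
            kept.append(random.choice(ranked[keep:]))      # "creative" wildcard branch
        best = max(kept, key=lambda c: c[2])               # judge branches by deeper search
        for f, v in best[1].items():                       # nudge heuristic toward winners,
            weights[f] = weights.get(f, 0.0) + lr * v      # including ones kept only by chance
        return best[0]

Over many games the heuristic would drift toward whatever the wildcard branches keep proving out, which is about as close to "learning between games" as a sketch this small can get.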
||||||||
Title: Re: A Turing Test Post by Nombril on Sep 18th, 2010, 8:28am Is there was any interest in trying out an experiment? Some possible guidelines: 1. We would want to replay the games manually, in a way that didn't have a timestamp on the moves. 2. Would it be fair to filter the games? Removing the game from the data set or just truncating the game after the position is won. This would avoid obvious tip offs, and let us examine playing style/philosophy: a. Bots just shuffling pieces without changing the position by 2 of the moves b. humans suiciding pieces with extra steps after goal c. bot bashing exercises (going for elimination when goal is possible, etc.) d. One move blunders, leaving pieces hanging, etc. 3. Include some BB games and HH games? 4. Player strength should be close (50 points?) 5. Player strength should be over a certain level? (Or maybe not - I wonder if it is easier or harder to tell the difference at lower levels?) on 09/17/10 at 16:50:05, Fritzlein wrote:
on 09/17/10 at 16:50:05, Fritzlein wrote:
|
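To make the filtering step concrete, here is a rough sketch of the kind of pre-processing Nombril's guidelines 2, 4 and 5 describe. The record format (plain dicts with rating, result and move fields) is invented for illustration and is not the real Arimaa game-archive schema.

    def select_games(games, max_rating_gap=50, min_rating=None):
        # games: list of dicts with invented fields: "id", "gold_rating",
        # "silver_rating", "result_reason", "moves", and optionally "won_at_move".
        selected = []
        for g in games:
            if abs(g["gold_rating"] - g["silver_rating"]) > max_rating_gap:
                continue                          # keep only evenly matched players
            if min_rating is not None and min(g["gold_rating"], g["silver_rating"]) < min_rating:
                continue                          # optional strength floor (guideline 5)
            if g["result_reason"] == "timeout":
                continue                          # time-outs are an obvious tell
            moves = g["moves"]
            if "won_at_move" in g:                # truncate once the game is decided,
                moves = moves[:g["won_at_move"]]  # hiding post-goal suicides and bashing
            selected.append({"id": g["id"], "moves": moves})
        return selected

The surviving move lists could then be replayed to judges with the player names stripped, which is the blinding rozencrantz asked about.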
||||||||