I worked at SRI on the CALO project, and built prototypes of the system that was spun off into SIRI. The system uses a simple semantic task model to map language to actions. There is no deep parsing - the model does simple keyword matching and slot filling, and it turns out that with some clever engineering, this is enough to make a very compelling system. It is great to see it launch as a built-in feature on the iPhone.
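For the curious, keyword matching plus slot filling is easy to sketch. Here's a toy Python version of the general idea -- the task names, trigger words, and patterns are all invented for illustration, not anything from CALO or Siri:

```python
import re

# Toy illustration of keyword matching + slot filling. Not CALO/Siri code,
# just the general shape of a simple semantic task model.
TASKS = {
    "set_alarm": {
        "keywords": ["wake", "alarm"],
        "slots": {"time": re.compile(r"(\d{1,2}(?::\d{2})?\s*(?:am|pm))", re.I)},
        "action": lambda s: f"Setting an alarm for {s['time']}",
    },
    "send_text": {
        "keywords": ["text", "message"],
        "slots": {"recipient": re.compile(r"to (\w+)", re.I)},
        "action": lambda s: f"Starting a text to {s['recipient']}",
    },
}

def interpret(utterance):
    for task in TASKS.values():
        # Keyword matching: does any trigger word appear at all?
        if not any(kw in utterance.lower() for kw in task["keywords"]):
            continue
        # Slot filling: extract each slot's value with a pattern.
        slots = {name: m.group(1)
                 for name, pat in task["slots"].items()
                 if (m := pat.search(utterance))}
        if len(slots) == len(task["slots"]):
            return task["action"](slots)
    return "Sorry, I didn't catch that."

print(interpret("Wake me up at 6:30 am"))  # Setting an alarm for 6:30 am
print(interpret("Send a text to Andrea"))  # Starting a text to Andrea
```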
> Virtually everything we call “AI” today is either a theatrical display of essentially scripted behavior (that’s how most game AI works), a massive database (such as Google Suggestions and expert systems) or a vague and decidedly unintelligent jumble of neural networks and genetic algorithms.
Okay, I can understand the first two being less attractive, but evolutionary neural networks -- really? I mean, what more do you want than an artificial brain created through simulated evolution? What characteristics would satisfy people like this that something is "AI" [1]? It seems a bit like the old saying that once it's discovered, it's no longer AI.
[1] It's worth noting that I'm strongly biased since I work in the Neural Network Research Group at UT Austin.
I rather dislike ANNs. I recognize that ANNs perform well and find research to improve their performance interesting, but honestly they're completely opaque black boxes with structure that only plausibly mimics the absolute simplest of biological neural networks.
I want a "brain" which was built from principles that allow its workings to be intelligible and efficient based on a technique which reflects the structure of the problem of learning, generalization, and hypothesis search and illuminates it.
So I'll buy (and often use) your ANNs, but I can't help but feel a little bit worried about the state of affairs. It's probably a whole lot like that automatically enumerated proof of the 4-color theorem. It's definitely important, but there is something unsatisfying about it. We aren't learning as much as we felt entitled to because the destination was only a small part of the journey.
> they're completely opaque black boxes with structure that only plausibly mimics the absolute simplest of biological neural networks
I guess this depends on how advanced your toolkit is. If you're simply using feedforward nets with fixed weights, fixed topology, and sigmoid activation, then yes that's probably right.
However, there are a lot of advanced techniques in ANNs that most people never use. Lots of more biologically-inspired approaches have been explored, such as leaky integrator neurons, Hebbian learning, and indirect encodings.
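To give a flavor of one of those: the plain Hebb rule ("neurons that fire together, wire together") is only a few lines. A minimal numpy sketch, where the learning rate and the normalization step are my own arbitrary choices:

```python
import numpy as np

# Minimal sketch of the plain Hebb rule: the weight change is
# proportional to the product of pre- and post-synaptic activity.
eta = 0.1
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=3)   # synaptic weights
for _ in range(100):
    x = rng.random(3)               # pre-synaptic activity
    y = w @ x                       # post-synaptic activity
    w += eta * y * x                # Hebbian update: dw = eta * y * x
    w /= np.linalg.norm(w)          # without this, weights grow unboundedly
print(w)  # tends toward the dominant correlation direction of the inputs
```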
> I want a "brain" which was built from principles
See, but the problem is that we don't even have the knowledge of how real brains work to start forming such principles. In fact, we're starting to see computational models used to form principles of biological brains (for example, [1]).
> that allow its workings to be intelligible
And what does that mean? You want to be able to discern some rule base from the system that you can understand? Even biological brains do not have that property.
> based on a technique which reflects the structure of the problem of learning, generalization, and hypothesis search and illuminates it.
This is very vague. Happy to dive into it if you are willing to expound a bit.
> It's probably a whole lot like that automatically enumerated proof of the 4-color theorem. It's definitely important, but there is something unsatisfying about it. We aren't learning as much as we felt entitled to because the destination was only a small part of the journey.
Again, very vague. Do you think having children is unsatisfying because you can't understand their actions most of the time? Maybe I'm misunderstanding the issue here.
Seriously though, I want to know what will impress people here. It's not going to be a fully general, "human-level" AI, obviously, as that requires huge amounts of semantic knowledge about the world that was encoded through billions of years of evolution, but that to me is not really a necessary condition for AI.
Thank you for your solid reply. I think ANN is the real thing, and most of the other "AI" solutions that we have give the impression of intelligence, but are simply algorithms that solve problems we have in very well defined ways.
ANN is a general intelligence that gets shaped by experience and adapts to figure out its own solutions. This is the path that will lead to our concept of AI presented in sci-fi and what the general public actually thinks of. It isn't some coder banging on the keyboard trying to replicate what a human would do in a specific situation; it is something that truly learns.
We have a long way to go before this becomes a reality, but it will happen.
That's very romantic, but I think instead we're going to learn that intelligence is not such a well-defined idea as to be engineered into anything imagined by science fiction.
Further, there are plenty of techniques which aren't ANNs but still are "shaped by experience" and "figure out their own solutions".
I side pretty strongly with the idea of building tools to bring the powerful modes of computerized learning close to the powerful modes of human learning. I think we're inordinately far away from replicating one with the other.
> I think ANN is the real thing, and most of the other "AI" solutions that we have give the impression of intelligence, but are simply algorithms that solve problems we have in very well defined ways...ANN is a general intelligence that gets shaped by experience and adapts to figure out its own solutions...It isn't some coder banging on the keyboard trying to replicate what a human would do in a specific situation; it is something that truly learns.
It's not ANNs on the one hand and "some coder banging on the keyboard trying to replicate what a human would do" on the other.
Supervised learning is the most common use case for an ANN; it means learning from labelled training data during a training phase and then not changing afterwards, and it can be done with a number of techniques besides ANNs, including Support Vector Machines, Decision Trees, Bayesian methods, and so on.
Reinforcement learning, on the other hand, is continuous learning from trying things and making mistakes -- experience, in other words. It too can be done a number of ways, such as with Evolutionary Algorithms, Markov Decision Processes, Inductive Logic Programming, etc. Hell, PG's spam classifier learns from experience, and it's nothing more than a Naive Bayes classifier ( http://www.paulgraham.com/spam.html ).
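To make the Naive Bayes point concrete, here's a toy spam filter in the spirit of PG's article -- not his actual algorithm (his scoring differs), and the Laplace smoothing constants are my own choice:

```python
import math
from collections import Counter

# Toy Naive Bayes spam filter: it "learns from experience" simply by
# updating word counts as labeled mail arrives.
class SpamFilter:
    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_probability(self, text):
        # "Naive" assumption: words are independent given the class.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for w in text.lower().split():
            p_spam = (self.words["spam"][w] + 1) / (sum(self.words["spam"].values()) + 2)
            p_ham = (self.words["ham"][w] + 1) / (sum(self.words["ham"].values()) + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))

f = SpamFilter()
f.train("spam", "cheap pills click here")
f.train("ham", "meeting notes attached see you tomorrow")
print(f.spam_probability("click here for cheap pills"))  # close to 1
```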
Without going on for much longer, my point is that there really is no reason to exalt ANNs the way you have.
Tansey wrote a good, solid reply, but I dislike worshipping at the altar of mysteriousness. When a human-level or better GAI finally gets built, it won't be entirely hand-coded (although the seed for it might be). But the better we understand how our AI works, the better we can make sure it's doing what we want it to, and not, say, crashing the stock market with millions of erroneous transactions carried out too fast for humans to correct.
Apologies for the vagueness there. Also apologies for the lack of structure following. I don't have time to write an essay, so I hope you can settle for some meandering commentary.
While the more complex topologies, activation functions, and training algorithms are interesting, the current workhorse of ANNs is still the plain MLP (to my knowledge). So, yeah, I was referring mostly to MLPs.
The argument I'm doing a pale job of voicing is the one that exists between the statistics and ML communities. For those unfamiliar: stats people generally demand comprehensible, general statements about the performance and meaning of the various techniques, which has historically restricted the power of statistical methods. Advances in computing expanded the ML field because people aimed at simpler goals (maximize training and test predictive accuracy, ignoring why or how it works). This was a major force in the current renaissance in learning technologies, but it leaves a big divided camp.
One side demands theory, story, and proof. The other just demands results. Which is great. They complement each other.
---
ANNs (MLPs in particular) bother me because they're essentially general non-linear discrimination boundary learners. Once you have that decision boundary you can (with the usual enormous concerns toward overfitting) predict new data with arbitrary accuracy given enough compute power, large enough layers, and enough training data. But even with all of those things you won't learn much from your parameter space.
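To make that concrete, here's a minimal sketch using scikit-learn (the dataset and layer sizes are arbitrary choices of mine). The net learns a boundary no linear model can represent, but inspecting its weights teaches you almost nothing:

```python
from sklearn.datasets import make_circles
from sklearn.neural_network import MLPClassifier

# An MLP on concentric circles: a non-linear boundary a linear model
# can't express, learned by a model whose parameters are opaque.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))      # near-perfect accuracy on this toy problem...
print(clf.coefs_[0].shape)  # ...from a (2, 32) weight matrix that explains nothing
```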
Compare this to, for instance, LDA. It's hard to be certain that the usual high-level interpretation of LDA in data mining (that it finds mixtures of topics on documents) is terribly meaningful, but you still are able to learn a lot about the data space by examining the topic space. It's the kind of thing that allows for many interaction points between the algorithm and the users.
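For contrast with the MLP above, a tiny scikit-learn LDA sketch (the corpus and settings are toys of mine) -- here the learned parameters are word distributions you can actually read:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Unlike the MLP's weights, LDA's parameters are inspectable: each topic
# is a distribution over words.
docs = [
    "the goalkeeper saved the penalty in the final match",
    "the striker scored a goal late in the league match",
    "the senate passed the budget bill after a long debate",
    "the election campaign focused on the budget and taxes",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
vocab = vec.get_feature_names_out()
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
for k, topic in enumerate(lda.components_):
    top = [vocab[i] for i in topic.argsort()[-4:]]
    print(f"topic {k}: {top}")  # roughly a "sports" and a "politics" topic
```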
MLPs of course will also induce a latent representation on the hidden layer. There are really fascinating implementations of non-linear PCA that take this approach, but it's not clear to me what the properties of this latent representation are or how to influence them.
---
In some sense, the heart of my argument comes from how flexibility can hurt you in a learning algorithm: it makes your parameter space less meaningful. (Mutable topologies can probably help a lot here -- I have a little experience doing grid search over recurrent nodes and attempting to rank signals by their temporal information using the resulting best-performing topology, and I'm sure that rabbit hole goes far deeper.)
I'm not looking to be awed by a rule-based system or GAI (whatever the hell that even means). I want hypothesis spaces which allow you to flexibly describe a question about a large class of data and then learn things from the optimum. I also want these algorithms to solve real world problems. Bayesian modeling gives you a lot of the first one. MLPs give you a lot of the second.
> I guess this depends on how advanced your toolkit is.
I'm assuming you mean this to counter the "simplest of biological neural networks" part, rather than the "completely opaque black boxes" part. Nonetheless, I think it is fair to say that ANNs still fall far short of the complexity of even the simplest biological brains. For example, even the nervous system of a Hydra features neurotransmitters [1] (as opposed to the homogeneous signals of ANNs), and brains are affected by their own electric fields [2].
> You want to be able to discern some rule base from the system that you can understand? Even biological brains do not have that property.
More generally, transparency is important to be able to understand the inductive bias you've built up, and to try to alter or refine it in useful ways.
Also, the fact that our brains lack transparency doesn't justify leaving it out of an AI system, nor does it demonstrate the difficulty of building a transparent AI -- nature just had no drive towards transparency. Plus we (humans) can introspect and explain our reasoning.
> See, but the problem is that we don't even have the knowledge of how real brains work to start forming such principles
> based on a technique which reflects the structure of the problem of learning, generalization, and hypothesis search and illuminates it.
This is only the case if we're speaking about biological brains, and not using "brain" as a generic word for "intelligent system". In the latter case we do, in fact, have quite a bit of knowledge about such principles from reasoning about hypothesis spaces, which is where we get things like active learning and Solomonoff's work on Universal Induction [3].
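Active learning is a nice example of a principle derived from reasoning about hypothesis spaces: query the example the current hypothesis is least certain about. A minimal uncertainty-sampling sketch (the toy data and model are my own choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Uncertainty sampling: repeatedly ask for the label of the point the
# current model is least sure about, instead of labeling at random.
X, y = make_classification(n_samples=200, random_state=0)
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
unlabeled = [i for i in range(200) if i not in labeled]
for _ in range(20):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[unlabeled])[:, 1]
    pick = int(np.argmin(np.abs(probs - 0.5)))  # most uncertain example
    labeled.append(unlabeled.pop(pick))         # "query the oracle" for it
print(clf.score(X, y))  # typically beats labeling the same number at random
```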
> It's not going to be a fully general, "human-level" AI, obviously, as that requires huge amounts of semantic knowledge about the world that was encoded through billions of years of evolution
By "semantic knowledge" do you mean inductive bias? Because otherwise I'm at a loss. I don't believe that the picture for Artificial General Intelligence is as bleak as you make it sound though.
As an aside, you mention "Lots of more biologically-inspired approaches have been explored". Do you know of any projects looking to mine the structure of various parts of brains to figure out the sorts of inductive bias those structures correspond to (as opposed to just copying structure)? What I mean is that presumably, if a part of the brain heavily involved in recognizing faces has a unique structure/wiring, that structure is optimized such that it performs well on face recognition -- and correspondingly poorly on something else, per no free lunch -- and that optimization should tell us something about the nature of recognizing faces. Sort of in the same way that the use of a Naive Bayesian classifier rather than a Bayesian Net might tell you that the classifier is optimized for cases where the variables are independent.
That is the entire machine learning field. Very well designed black boxes which you train to classify or estimate your desired quantities based on your input data. It might not be true "AI" but it gets a lot of practical work done.
> a vague and decidedly unintelligent jumble of neural networks and genetic algorithms.
Not at all true. More like an unintelligible jumble of Bayesian learning, Markov fields, structured regression, support vector machines, and other statistical models over large volumes of data.
I don't think the writer was criticizing neural nets and genetic algorithms as an approach, just observing that what they have produced so far is "decidedly unintelligent".
I don't know, I find these the least appealing of any kind of test for intelligence. On-the-fly problem solving? Adaptation to new situations? To me, ravens using simple tools to pick up shiny things seem more intelligent than the kind of machine displayed on Jeopardy or at chess matches.
OK, so, just got me a shiny new 4S and have been playing with Siri.
It quickly became apparent that it is "limited" to this list - but that's OK. It does it very well - it recognised all of my family members and workmates with 95% (or more) accuracy, and without any training.
OK so it is a modest start - but a quality one, and the options are useful.
The thing that blows me away though is the speech-to-text. It's very accurate, as good as my two-year-old trained Dragon NaturallySpeaking install. Sadly this is texts only at the moment, but I imagine we will see it in email eventually (hope so anyway).
Dictating texts is great; I just sent a few instead of emails because it is so frickin quick :)
Quality product.
(as to how it knows about stuff like "my wife" - it asks you first time.)
EDIT: I've just found out I can dictate email with it... awesome! But the icon seems to be inconsistent about when it appears (if not going through Siri directly, clicking reply on an existing email doesn't seem to give me the option).
This is a good start, but it's all analysis from the outside looking in. The author is probably right about the overall structure, but there are already a few things that have come out since he posted this (it's dated last week, probably right before Steve Jobs' passing): relationships such as "my wife" are not actually something you type into new fields in the address book. You actually tell Siri yourself, in what appears to be a sort of natural-language @define statement.
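Something like the following, perhaps. This is a purely hypothetical Python sketch of the shape such a statement could take; the patterns and responses are guesses, not Siri's actual logic:

```python
import re

# Hypothetical natural-language "@define" for relationships --
# a guess at the shape, not how Siri actually works.
relationships = {}  # e.g. {"wife": "Kathy"}

def handle(utterance):
    # "Kathy is my wife" -> store the mapping
    m = re.match(r"(\w+) is my (\w+)", utterance, re.I)
    if m:
        relationships[m.group(2).lower()] = m.group(1)
        return f"OK, I'll remember that {m.group(1)} is your {m.group(2)}."
    # "call my wife" -> resolve through the stored mapping
    m = re.search(r"my (\w+)", utterance, re.I)
    if m and m.group(1).lower() in relationships:
        return f"Calling {relationships[m.group(1).lower()]}..."
    return "Who would you like to call?"

print(handle("Kathy is my wife"))  # OK, I'll remember that Kathy is your wife.
print(handle("call my wife"))      # Calling Kathy...
```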
Once a much larger audience has been playing with Siri for a few weeks, we should start to get a much clearer picture.
Every time someone analyzes an AI system, they invariably conclude that it isn't really AI, but rather just a complicated system of different, strung-together technologies.
At what point do systems of sophisticated text-to-speech and grammar analysis technologies actually become AI?
"In a step-by-step process, Minsky constructs a model of human intelligence which is built up from the interactions of simple parts called agents, which are themselves mindless. He describes the postulated interactions as constituting a "society of mind", hence the title."
I don't think that consciousness or intelligence is merely binary. There can be and are varying degrees of intelligence that show up all over nature. Dolphins and primates are easy to point to as consciousness that probably is most similar to our own. Dogs, cats, wolves; they all have varying degrees of what we consider intelligence.
To take this even further i'll pose this thought-adventure. Is a single ant conscious? Is a single bee conscious? Hive systems at least appear to have a kind of intelligence. There are termite colonies that have reasonably sophisticated air conditioning systems. Are the chemical and physical messages sent back and forth between insect agents all that different from the communication between subsystems in our brains or the individual components of siri's architecture?
All of this is very theoretical, but at least a fine thought adventure for a friday morning.
> but rather just a complicated system of different, strung-together technologies.
Sounds like the human brain to me. Ancient parts connected to more modern parts. Modules for specific tasks. Not any kind of monolithic entity, but just this mish-mash of elements that kinda sorta works, especially if we ignore the unpatched bugs of irrationality, superstition, mental illness, common logical fallacies, emotional reasoning, etc.
The one place where AI actually is a term of art -- gaming -- is also uniquely distinguished in that, for both human- and computer-controlled agents, the goals are definite and the choices of action limited.
Out here in the real world, things are a lot more fuzzy and complicated. Accordingly we are not willing to give a bot AI status until it can demonstrate competence at "real world" goals (presumably including having relationships with other sapient beings in the world).
I'm afraid I can't provide a definitive definition; I'm not even sure there is one that everybody would readily agree upon. But what I had in mind was an understanding of intentions or meaning from context, rather than having to explicitly be told what to do.
"You insist that there is something a machine cannot do. If you will tell me precisely what it is that a machine cannot do, then I can always make a machine which will do just that!"
John von Neumann, 1948
"In 2007, SRI [1] spun off Siri, Inc. Siri was born from SRI's work on the DARPA-funded CALO [2] project, described by SRI as the largest artificial intelligence project ever launched. Siri was acquired by Apple in 2010."
Bullshit. AI works every day and VERY well. AI does a fantastic job tuning SQL queries. A* does a fantastic job finding reasonable routes between destinations. AI finds optimizations to massively speed up software. AI lays out the very circuitry of most of the ICs in your computer. AI routes the packets between you and HN. It seems that as soon as something in AI works, it becomes not AI.
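A* is a good example of how mundane "working AI" looks up close -- a priority queue and a heuristic. A minimal grid version in Python (the grid and unit step costs are just for illustration):

```python
import heapq

# A* with Manhattan distance as the admissible heuristic; expands nodes
# in order of f(n) = g(n) + h(n).
def astar(grid, start, goal):
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start, [start])]  # (f, g, position, path)
    seen = set()
    while frontier:
        f, g, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        r, c = pos
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0):
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no route exists

grid = [[0, 0, 0],
        [1, 1, 0],  # 1 = obstacle
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))  # routes around the wall
```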
I guarantee you that when we find out exactly how neurons and neural networks work, the human brain will be equally unimpressive. Each person's brain is the product of an uninterrupted chain of evolution stretching back 4 billion years. The field of AI has been around for about 60 years; sorry it hasn't quite managed to best the human brain in every task imaginable.
Also, if one looks at "natural intelligence" one can see some pretty stupid things going on. Losing a Mars orbiter to a metric/imperial unit conversion error, wtf?
Ask the average person to name the country that lies between Iraq and Afghanistan.
Wolfram Alpha doesn't actually answer the question that was asked. Instead it answers two questions (1) what countries share borders with Iraq; (2) what countries share borders with Afghanistan. The final step (find the intersection of those sets) is a task that I would expect a computer to be better at than a human, but that is the step that Wolfram Alpha misses out.
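That final step really is trivial for a computer (border lists hard-coded here for illustration):

```python
# The missing step: intersect the two sets of neighbors.
iraq = {"Iran", "Jordan", "Kuwait", "Saudi Arabia", "Syria", "Turkey"}
afghanistan = {"China", "Iran", "Pakistan", "Tajikistan",
               "Turkmenistan", "Uzbekistan"}
print(iraq & afghanistan)  # {'Iran'}
```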
But how did Siri learn who Scott’s wife was? The demo didn’t show us, but I have a suspicion about how it works.
The Mac Address Book has long had an entry for setting up relationships between contacts. I can indicate who my spouse is in Address Book. I suspect that the iPhone Contacts app will gain similar new fields in iOS 5.
I haven't used Siri so I don't know whether it'll use those fields, but I do know that if it didn't know who Scott's wife was, it'd have asked him and then remembered his answer.
I also don't know (but would like to!) whether it does said remembering by updating the contact.
Edit: seen on Twitter, "Siri doesn't just remember your mother, father, and other significant people, it adds them to your contact details." So: it's not a separate store.
But none of this is how it works. I was expecting to read about the company they acquired and how they set about integrating and negotiating arrangements with service providers.
When Siri was first available, there was almost no talk of it being the next big thing or of how important it would be. Suddenly, after this general release on limited hardware, we find it's the next big thing.
When it was first released, a lot of people said that it was the next big thing. Take a look at Robert Scoble's initial coverage. The only problem was that, since it was an app, I don't think they got the level of attention that they were due. If you used it back then, it really did feel magical.
Now that Apple acquired them, they can use their platform to really promote it and get it on everyone's device by default, which will help a lot more people to find and use it.
I guess most of us here on Hacker News would have known Siri (from SRI) from its first public release. Many of us here would also have watched that YouTube video describing this amazing combination of technology. I guess the fact that I didn't feel any magic comes from my not living in the US (so the app means nothing to me).
It's because it's integrated with more things in a more useful and immediate way, taking it over the threshold of (just) being a geektoy and into being something useful. In standalone app form it wasn't considerably faster/easier/"lighter"/more useful to use for much of anything. Now, some of the most tedious tasks one uses (or would consider using, if the tasks were less tedious) a phone for aren't tedious anymore.
Makes sense to me. Most people aren't seeking out software that is a radical departure from what they're used to. They are only interested when it becomes popular. Apple does a great job popularizing technologies and bringing them to the masses.
>>> Just think how much fun it will be when I say, “Send a text to Andrea that says ‘I love you,’” and Siri hears, “Send a text to Andrew that says ‘I love you.’” I look forward to seeing how reliable it really is.
If some threshold of certainty isn't met, Siri will certainly ask "Did you mean Andrew or Andrea". Doesn't seem like a really hard problem.
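Here's a hypothetical sketch of the shape such a check could take, using string similarity as a crude stand-in for whatever acoustic confidence score Siri actually has -- the threshold is invented:

```python
from difflib import SequenceMatcher

# Hypothetical disambiguation sketch: string similarity stands in for
# acoustic confidence, and the 0.2 threshold is an invented value.
contacts = ["Andrea", "Andrew", "Bob"]

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def resolve(heard, threshold=0.2):
    ranked = sorted(contacts, key=lambda c: similarity(heard, c), reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if similarity(heard, best) - similarity(heard, runner_up) < threshold:
        return f"Did you mean {best} or {runner_up}?"  # too close to call
    return f"Sending text to {best}."

print(resolve("Andrea"))  # Andrea vs. Andrew are close -> asks
print(resolve("Bob"))     # clear winner -> sends
```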
I think this is all going to hinge on whether Apple can match the speech recognition bar that Google has set. Without that (like the article mentions) all the AI stuff doesn't matter. Voice Actions can't use arbitrary grammar, and it's not as pretty as Apple's solution, or as integrated, but for dictation, it's pretty good; honed by years of Google 411 and who knows what other masses of input. Is Apple using a datacenter with petabytes of voice training, or is it Dragon in the cloud? I've got an Android phone now, and as soon as I get my hands on a 4S I'm going to put them through their paces dictating text and compare. From the reviews, it looks like Apple has at least matched Google in dictation, which means that Siri has a huge lead over Voice Actions due to all the other improvements.
I've always been curious, regarding "speaking with a Southern drawl, with a stuffed-up nose from a bad cold" ... are there non-native English speakers here who have had particular difficulty using speech-to-text because of an accent? Have the British, Irish, Scottish, Australians, Canadians, Jamaicans, or South Africans had trouble because of their dialect?
Speech-to-text sounds like it will be a constantly tough problem. Even humans aren't 100% accurate, or even 99% accurate, depending on the circumstances.
If we created an AI that was actually intelligent enough to understand all of our queries and sort through our junk to find the answers we need, it probably wouldn't want to do it.
Siri is a simple application with constrained functionality that uses existing technology, all dressed up to make it seem much smarter than it really is.
Honestly, if you wrote Siri as a text line parser, would anyone be impressed by all the fancy features? It's more that current voice response systems are so incredibly primitive.
I think most people would be impressed if they could just type in what they wanted as an English sentence. Most users are familiar with GUIs, and may have heard of command line interfaces, but an English interface would be new. And maybe a plain English typed interface would even be useful. Some percentage of people type full English sentences into Google.
Ok, I know it has entered the popular culture and become a myth, and thus fighting it is kinda pointless, but the Newton was not a failure, either technologically or from a business perspective.
First off, technologically: it worked. I have terrible handwriting, and the Newton I had was the 100 -- the original MessagePad. It could recognize my handwriting just fine, well north of 90% of the time. When it got a word wrong, it was trivial to correct. I was able to write on it at the same speed I would write with pen and paper. I could have increased the recognition rate to 100% by altering the way I wrote, if I had wanted to, but it was more convenient to write my normal way.
Further, this was all done on an ARM 610 RISC CPU running at 20MHz with 640KB of RAM!
It was "technology from the future", and unfortunately it came at a time when, at least in America, "computer literacy" was a big thing. Many people weren't really computer literate, and having a computer in the home was not rare, but not exactly common yet.
Secondly, the Newton was not a failure in the marketplace. The device found wide adoption in industry, where its unique features provided exceptional value. In places where a Palm could compete (because the Newton's superior technology wasn't significantly more valuable) it didn't do as well... but that doesn't mean it was a failure as a product or was losing money. (Interestingly, NeXT, also widely regarded as a "failure", was profitable when Apple bought them.)
The Newton was building momentum and was about to be spun out in an IPO as Newton Inc. when Steve Jobs returned to Apple. I don't know the reasons Steve Jobs killed the IPO and the product. It seems to me that letting it spin out would have been the wiser move.
I understand why it was popular for Doonesbury to make fun of it, and The Simpsons, etc. It exhibited a kind of hubris. Handwriting recognition? At a time when people were barely grasping the value of a computer to begin with?
I think the lesson to learn with AI-like products is to make sure that the target market is receptive to the idea.
I hope people are now more receptive to Siri than they were to Newton.
I think Newton was both a failure (in its original goals) and a success (in the more realistic industry specific use). It did not light the world on fire, and the average consumer did not care for it and was not interested in buying it. So I wouldn't really agree with a blanket "it was not a failure", especially when part of the argument is "it was catching on right when people who knew the financials killed it."
Teas Willis, and the sticky tours
Did gym and Gibbs in the wake.
All mimes were the borrowers,
And the moderate Belgrade.
"Beware the tablespoon my son,
The teeth that bite, the Claus that catch.
Beware the Subjects bird, and shred
The serious Bandwidth!"
He took his Verbal sword in hand:
Long time the monitors fog he sought,
So rested he by the Tumbled tree,
And stood a while in thought.
And as in selfish thought he stood,
The tablespoon, with eyes of Flame,
Came stifling through the trigger wood,
And troubled as it came!
One, two! One, two! And through and though,
The Verbal blade went thicker shade.
He left it dead, and with its head,
He went gambling back.
"And host Thai slash the tablespoon?
Come to my arms my bearish boy.
Oh various day! Cartoon! Cathay!"
He charted in his joy.
Teas Willis, and the sticky tours
Did gym and Gibbs in the wake.
All mimes were the borrowers,
And the moderate Belgrade.
OK, I get your point, but transcribing Lewis Carroll's nonsense verse is kind of a corner case for voice recognition, no?
What Siri is going to test is whether the 80/20 rule applies to a voice recognition based personal assistant. By constraining it to assistant-type tasks, and with what seems to be the most intelligent design we've seen yet, it has a better shot of achieving it than anything else out there. Previous entrants have been tripped up by the lack of a decent UX, failure to integrate with other data sources, or any number of shortcomings in the long chain from microphone to software back to speaker. Apple is the first company to have such precise control over every component in that chain. (For instance, I wouldn't be surprised if part of the reason Siri is 4S-only is because hardware has been added for smarter noise or echo cancellation, or other subtle design changes.) And the fact that it can learn from the internet, and presumably submit its training data back to the cloud, means that maybe it will eventually be able to handle not just 80%, but 90% or 99% of total inputs.
This actually makes for good reading! (Although I do like the original better)
But this certainly shows Siri's got potential; it didn't miss any normal words now, did it?
The NLP approach is based on work at Dejima, an NLP startup: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.5...
A lot of the work is grounded in Adam Cheyer's (CTO of SIRI) work on the Open Agent Architecture: http://www.ai.sri.com/~oaa/
A more recent publication from Adam and Didier Guzzoni on the Active architecture, which is probably the closest you'll come to a public explanation of how SIRI works: https://www.aaai.org/Papers/Symposia/Spring/2007/SS-07-04/SS...