To say that the sequence is computable means that you must present a single Turing machine, call it T, that produces BB(n) in finitely many steps when given input n. Any specific Turing machine T has a fixed number of rules, call it N. Roughly speaking, a machine with N rules cannot produce an output larger than BB(N), so T(N+1) <= BB(N) = T(N); but BB(N+1) > BB(N), so T does not compute BB(N+1). The flaw in your inductive proof is that you have shown there is a Turing machine that can compute BB(N) for each N, but you haven't shown that the same Turing machine can compute all numbers in the sequence.
And just to be clear here, this isn't just because it can't recognize which numbers are the Busy Beavers as it arrives at them while summing ones indefinitely? If T(N) tries to compute BB(N) by summing ones, then, since BB(N) is finite, does it just fail to halt? But then there would be finite numbers that can't be described as a sum of ones from 0 up to BB(N)? It's just hard to reconcile the concepts of indefinite-but-finite running time, every number being expressible as a sum of ones, and the BB sequence still not being computable.
Edit: Ah wait, I get it now. My misunderstanding was in confusing "Turing machine" with "universal Turing machine." Any given T(N) can only complete a finite number of steps before halting, and since a universal Turing machine or equivalent (e.g. rule 110) can emulate any arbitrary Turing machine, I was modeling it as just letting an arbitrary UTM run forever. But the UTM is only emulating one specific T(N) at a time, and you'd have to keep moving up to emulate a different machine for each N to reach each next BB without running out of steps. So you can write programs to calculate arbitrary BBs indefinitely, but no single program to calculate them all. Hence the infinite sequence is noncomputable even though every finite prefix of it is computable. Thanks for the point in the right direction.
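To make the "computable for each N, but not uniformly" distinction concrete, here's a minimal Python sketch (my own illustration, not from the discussion above) that brute-forces a lower bound on BB(n) for tiny n. It enumerates every n-state, 2-symbol machine and runs each for a fixed number of steps; the answer is only correct if the step cutoff happens to exceed the longest halting time, and producing a valid cutoff for every n is exactly the uncomputable part.

    from itertools import product

    def run(machine, max_steps):
        """Run a 2-symbol Turing machine on a blank tape. Return the
        number of 1s written if it halts within max_steps, else None."""
        tape, pos, state = {}, 0, 0
        for _ in range(max_steps):
            write, move, state = machine[(state, tape.get(pos, 0))]
            tape[pos] = write
            pos += move
            if state is None:          # machine halted
                return sum(tape.values())
        return None                    # cutoff reached: undecided

    def bb_lower_bound(n, max_steps):
        """Enumerate every n-state, 2-symbol machine and take the best
        score among those seen to halt. This equals BB(n) only if
        max_steps exceeds the longest halting time for n states --
        and no algorithm can supply such a cutoff for every n."""
        actions = [(w, m, s) for w in (0, 1) for m in (-1, 1)
                   for s in list(range(n)) + [None]]   # None = halt
        keys = [(s, sym) for s in range(n) for sym in (0, 1)]
        best = 0
        for choice in product(actions, repeat=len(keys)):
            score = run(dict(zip(keys, choice)), max_steps)
            if score is not None:
                best = max(best, score)
        return best

    print(bb_lower_bound(1, 10))  # prints 1, and indeed BB(1) = 1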
Employers are probably unconvinced that younger applicants will be able to make them more profits. This is caused in part by the uselessness (for businesses) of the work students do in high school and college--this is a group of people whose lives have primarily consisted of consuming profits. Employers are also probably more risk-averse because the recession means that profit margins are tighter (if they exist at all).
Most commenters are focusing on relatively high-level features of decoding speech. It is important to also be aware that there is still great debate about what the acoustic correlates of linguistic events in speech are. It seems that our words are composed of subunits (usually taken to be phones out of the IPA--but there's work on alternatives), but exactly which acoustics correspond to those phones is still unsettled: lots of debate and mediocre recognition performance.
Undoubtedly there is much room for improvement on these higher-level features, but computers are still well behind humans in large-vocabulary isolated keyword spotting: this is a task where one word from a very large corpus of words is spoken and the human or computer has to guess what that word was. Computers do poorly relative to humans (particularly in noise), which suggests that many of the mistakes computers make come from not being able to interpret the acoustics correctly.
And to make this problem worse, there's a great deal of variation among different dialects of the same language, which means that there's no 1:1 mapping of IPA phones to written letters.
It's not actually clear at this point what we hear: there is evidence that we respond to something like frequencies, pitch, volume, and other features. But the jury is still out on the low-level signal processing that occurs in the cochlea and the primary auditory cortex. What's happening further downstream in the brain is even less clear.
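For what it's worth, here's a rough Python sketch (my own, purely illustrative) of the kind of frame-by-frame frequency/energy features an engineered front end extracts -- a stand-in for "frequencies, pitch, volume," not a claim about what the cochlea actually computes.

    import numpy as np

    def spectral_features(signal, sample_rate, frame_ms=25, hop_ms=10):
        """Per short frame: an energy estimate (~volume), the dominant
        frequency (a crude pitch proxy), and the magnitude spectrum."""
        frame = int(sample_rate * frame_ms / 1000)
        hop = int(sample_rate * hop_ms / 1000)
        window = np.hanning(frame)
        freqs = np.fft.rfftfreq(frame, d=1.0 / sample_rate)
        feats = []
        for start in range(0, len(signal) - frame, hop):
            chunk = signal[start:start + frame] * window
            spectrum = np.abs(np.fft.rfft(chunk))
            feats.append((float(np.sum(chunk ** 2)),          # energy
                          float(freqs[np.argmax(spectrum)]),  # dominant freq
                          spectrum))
        return feats

    # Toy check: a 440 Hz tone shows ~440 as the dominant frequency.
    sr = 16000
    t = np.arange(sr) / sr
    print(round(spectral_features(np.sin(2 * np.pi * 440 * t), sr)[0][1]))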
Actually, most research effort in speech goes into the language side rather than the signal processing of the speech signal, so I think many people share your intuition.
Bear in mind, though, that humans significantly outperform machines in tasks where isolated syllables or streams of nonsense syllables are spoken: e.g. "badagaka" is said, and humans can pick out the syllables whereas computers can have a lot of difficulty (in noise in particular).
Computers come closest to human performance when there is a lot of linguistic context to an utterance. So it appears that humans' advantage comes from something other than semantics.
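A toy sketch of how linguistic context gets used on the computer side: recognizers typically combine an acoustic score with a language-model score and pick the best-scoring hypothesis. All the scores and probabilities below are made-up numbers for illustration.

    import math

    hypotheses = {
        "recognize speech": -12.0,    # acoustic log-score
        "wreck a nice beach": -11.5,  # acoustically slightly *better*
    }
    bigram_logprob = {
        ("<s>", "recognize"): math.log(0.01),
        ("recognize", "speech"): math.log(0.2),
        ("<s>", "wreck"): math.log(0.001),
        ("wreck", "a"): math.log(0.05),
        ("a", "nice"): math.log(0.05),
        ("nice", "beach"): math.log(0.02),
    }

    def lm_score(sentence, floor=math.log(1e-6)):
        """Sum bigram log-probabilities over the word sequence."""
        words = ["<s>"] + sentence.split()
        return sum(bigram_logprob.get(pair, floor)
                   for pair in zip(words, words[1:]))

    LM_WEIGHT = 1.0  # how much linguistic context counts vs. acoustics
    best = max(hypotheses,
               key=lambda h: hypotheses[h] + LM_WEIGHT * lm_score(h))
    print(best)  # linguistic context rescues "recognize speech"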
Good points, but I think we underestimate how much situational context humans use when they interpret language. Sometimes we can communicate with very little language simply because we know what the purpose of the interaction is.
Another thing I keep wondering about is why so little emphasis is put on dialog. When humans don't understand something, they ask, or offer an interpretation and ask whether it's the right one.
Speech recognition systems don't seem to do that. They say "Sorry, I could not understand what you said. Please repeat." That's not very helpful for the computer, of course. It should say: "Huh, peas? Why would anyone rest in peas, for heaven's sake?" Then the human could sharpen his s's and say "PeaCCCEE!!! Not peas. I'm not talking about food, I'm talking about dying!"
Context is huge for human interpretation. If you've ever had someone address you in a different language than you were expecting, you know what I mean. It's almost like you can imagine the search going deeper and deeper without finding anything that makes sense, until it swaps in the other language and goes: Ah, you said "good morning"! :-)
It is true that humans do use situational context. In cases where semantics is important and complex for understanding an utterance, a computer will fail even harder, because it gets neither the semantics nor the speech signal.
On the topic of dialog, this is arguably the area where speech recognition has gained the most over the last nine years. Prior to 2001 there were not many usable dialog systems, and now (depending on your definition of "usable") there are many dialog systems deployed in call centers around the world.
Most call center dialog systems have a rudimentary mechanism for asking people to repeat things when the system doesn't understand, although if it asks more than once, callers tend to get very angry.
It shouldn't interrupt you once every 5 words of course. What it should try to do is to create a model of what you meant to say. At some point, if the system is unsure, it should ask you to confirm or correct what it has understood so far.
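To sketch that confirm-or-correct policy in code (a minimal toy; the threshold and confidence values are hypothetical -- real systems get hypothesis/confidence pairs from the decoder):

    CONFIRM_THRESHOLD = 0.6

    def dialog_turn(hypothesis, confidence, understood_so_far):
        """Decide what to do with a new recognition result: accept it
        silently, or echo it back and ask for confirmation."""
        if confidence >= CONFIRM_THRESHOLD:
            understood_so_far.append(hypothesis)
            return None  # no prompt needed, keep listening
        # Low confidence: offer our best guess instead of a bare "repeat".
        return (f"Did you mean '{hypothesis}'? "
                f"So far I have: {' '.join(understood_so_far)}")

    state = []
    print(dialog_turn("rest in", 0.9, state))  # None: accepted silently
    print(dialog_turn("peas", 0.3, state))     # asks: Did you mean 'peas'?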
He didn't make any advance that has made its way into a full word recognizer; he's merely recognizing phonemes (which are linguistic subunits of words), and several researchers in the field have criticized his methods. Additionally, none of the top five phoneme recognizers has ever been deployed as a word recognizer, and there is little chance that any will be in the next few years.
The concept of phonemes isn't undisputed either. When analyzing actual speech, it becomes clear that there are no real steady states, but a great deal of coarticulation between the "segments". Of course, part of this could be attributed to the fact that speech sounds are produced by articulatory gestures, which necessarily overlap in time. On the other hand, these coarticulation patterns are not language-independent, so a purely phonetic (articulatory/auditory) explanation of why these differences exist is rather unlikely.
I know this seems rather off-topic with regard to speech recognition, but the question of the basic building blocks of language is kind of at the heart of the problem.
I agree that it's at the heart of it (and I'm presently writing a paper where I use articulatory-phonetic features rather than phonemes). Unfortunately, there is no large-vocabulary speech recognizer that uses articulatory phonetics (yet!). Every large-scale speech recognizer, and most small-scale ones, use phonemes and are trained on speech that has been transcribed into phonemes. There is almost no data annotated with articulatory phonetics (a problem I'm working on right now).
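For readers unfamiliar with the idea, here's a minimal sketch of re-encoding a phoneme transcription as articulatory features. The feature inventory below is a simplified, hypothetical subset; real feature sets (voicing, place, manner, rounding, and so on) are richer and time-varying.

    ARTICULATORY_FEATURES = {
        # phoneme: (voiced, place, manner) -- toy inventory
        "b": (1, "bilabial", "stop"),
        "d": (1, "alveolar", "stop"),
        "g": (1, "velar",    "stop"),
        "a": (1, "open",     "vowel"),
    }

    def to_feature_track(phoneme_string):
        """Re-encode a phoneme transcription as a sequence of
        articulatory feature tuples -- the kind of annotation that
        barely exists in current corpora, as noted above."""
        return [ARTICULATORY_FEATURES[p] for p in phoneme_string]

    print(to_feature_track("badaga"))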
I guess that's in part because it's even more difficult to (manually) transcribe speech into articulatory-phonetic elements based on the acoustic signal (laryngeal gestures?? Clearly they are there in articulation, but their acoustic correlates are masked to some extent).
Automatic alignment methods are probably quite hard to implement, given the various coarticulation patterns in the signal depending on context/prosodic position etc.
Could you provide a link to papers or other materials dealing with articulatory features in speech recognition?
I guess I should take another look at Browman and Goldstein's Articulatory Phonology.
I'm not any sort of expert on the neural network literature, but these were some papers in the last three years that caught my eye. Yann LeCun also does work on neural nets, but I haven't been all that impressed by his results. One of the main advances has been ways of developing 'deep' architectures with multiple layers rather than the traditional shallow neural networks (arguably the SVM, for instance, is actually a very cleverly trained single-layer neural network).
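The "deep vs. shallow" distinction is purely structural: the same linear-map-plus-nonlinearity block, stacked. A minimal untrained sketch in numpy (my own illustration; random weights, no training, just the architecture):

    import numpy as np

    rng = np.random.default_rng(0)

    def relu(x):
        return np.maximum(0.0, x)

    def shallow(x, w, b):
        """A single-layer ('shallow') network: one linear map plus a
        nonlinearity, loosely the family the SVM remark refers to."""
        return relu(x @ w + b)

    def deep(x, layers):
        """A 'deep' network: the same block stacked, so later layers
        operate on features computed by earlier ones."""
        for w, b in layers:
            x = relu(x @ w + b)
        return x

    x = rng.normal(size=(4, 8))  # 4 examples, 8 input features
    layers = [(rng.normal(size=(8, 16)), np.zeros(16)),
              (rng.normal(size=(16, 16)), np.zeros(16)),
              (rng.normal(size=(16, 2)), np.zeros(2))]
    print(deep(x, layers).shape)  # (4, 2)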
And even the fanciest statistical methods are generally about proving that some mild generalization of linear regression is sufficient to solve a particular problem. Non-linear models often result in great increases in model complexity, which leads to overfitting, computational intractability, and instability.
There really isn't very much available on the practical side, so if you are looking to implement algorithms, I suggest you make use of the machine learning course at ocw.mit.edu
Alternatively, if you want a good dose of theory on the algorithms currently in use, I highly recommend "Pattern Recognition and Machine Learning" by Christopher Bishop. It is definitely the best machine learning (and statistics) textbook that I have ever come across.
I second the nomination of Bishop. It is the standard text. It is only two years old, and Bishop will teach you machine learning the way that the field practices it nowadays. In my lab of fourteen people, we must have six or so copies of Bishop.
I don't understand what is impractical about Bishop. If you are looking blindly to use an off-the-shelf machine learning implementation, that's one thing. Machine Learning has been described as the study of bias. If you want to understand when to pick certain techniques, and develop appropriate biases, then read Bishop.
"The Elements of Statistical Learning" by Hastie, Tibshirani and Friedman gives more of a statistician's approach. The treatment is simply less broad, and also more dated.
You can also look at Andrew Ng's video lectures: http://www.youtube.com/watch?v=UzxYlbK2c7E
He is very well-respected in the field. For certain students, watching a lecture may be preferable to reading a book.
A few other things (sorry, not to snipe too much! :) )
-I'm skeptical of the idea of a single "standard text" in such a fast-moving field. New machine learning techniques appear constantly and are often documented online years before they appear in books. Some computer scientists say they prefer conference proceedings over academic journals because the latter take so long.
-Further, I'm not sure the goal of any text should be to cover topics X, Y, and Z in any case; that doesn't seem possible for a book to do. What does seem feasible is to set up a framework for analyzing the performance of different techniques. So I'd like to hear a comparison of how Bishop does that vs. HTF.
-You're of course correct that HTF takes a statistician's POV on the field - the authors are all professors of statistics at Stanford. They are also accomplished - Friedman was a co-author on CART, for example. I would instead ask the question: what can you get out of the book and the framework it offers?
-I think that part of the framework in machine learning is to think about bias AND variance, and how to trade them off successfully. This is an important part of model selection, for example.
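A quick toy illustration of that trade-off (my own sketch, with made-up settings): fit polynomials of increasing degree to noisy samples of a smooth function and compare held-out error. Low degree underfits (bias), high degree overfits (variance).

    import numpy as np

    rng = np.random.default_rng(1)

    def true_fn(x):
        return np.sin(2 * np.pi * x)

    def heldout_error(degree, n_train=20, n_test=200, noise=0.2):
        """Train on noisy samples, evaluate squared error on a clean
        held-out grid -- the quantity model selection tries to minimize."""
        x_tr = rng.uniform(0, 1, n_train)
        y_tr = true_fn(x_tr) + rng.normal(0, noise, n_train)
        coefs = np.polyfit(x_tr, y_tr, degree)
        x_te = np.linspace(0, 1, n_test)
        return float(np.mean((np.polyval(coefs, x_te) - true_fn(x_te)) ** 2))

    for degree in (1, 3, 9):
        print(degree, round(heldout_error(degree), 3))
    # Typically degree 3 wins: 1 underfits (bias), 9 overfits (variance).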
In terms of "datedness," I would point out that the second edition is now out:
This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.
I'd be interested to hear what you mean by "simply less broad" as compared to Bishop's book, which (from having flipped through it) looks pretty comparable.