
In a previous job I looked into the viability of pen computing, specifically which companies had succeeded commercially with a pen-based interface.

A lot of folks are too young to know this, but there was a time when it was accepted wisdom that much of personal computing would eventually converge to a pen-based interface, with the primary enabling technology being handwriting recognition.

What I found is that computer handwriting recognition can never ever be good enough to satisfy people's expectations. Why? Because no one is good enough to satisfy most people's expectations for handwriting recognition, including other humans, and often even including themselves!

Frustration with reading handwriting is a given in human interaction. It will be a given in computer interaction too--only, people don't feel like they need to be polite to a computer. So they freely get mad about it.

The companies that had succeeded with pen computing were those that had figured out a way to work around interpreting natural handwriting.

One category was companies that specialized in pen computing as an art interface--Wacom being the primary example at the time. No attempt at handwriting recognition at all.

The other was companies that had tricked people into learning a new kind of handwriting. The primary example is the Palm Pilot's "Graffiti" system of characters. The neat thing about Graffiti is that it reversed customer expectations. Because it was a new way of writing, when recognition failed, customers often blamed themselves for not doing it right!

We all know what happened instead of pens: touchscreen keyboards. The pen interface continues to be a niche UI, focused more on art than text.

It's interesting to see the parallels with spoken word interfaces. I don't know what the answer is. I doubt many people would have predicted in 2001 that the answer to pen computing was a touchscreen keyboard without tactile keys. Heck, people laughed about it in 2007 when the iPhone came out.

But I won't be surprised if the eventual product that "solves" talking to a computer wins because it hacks its way around customer expectations in a novel way--rather than by delivering a "perfect" voice interface.




> The other was companies that had tricked people into learning a new kind of handwriting [...] It's interesting to see the parallels with spoken word interfaces. I don't know what the answer is.

Well, here's a possibility: we'll meet in the middle. Companies will train humans into learning a new kind of speech which is more easily machine-recognisable. These dialects will be very similar to English but more constrained in syntax and vocabulary; they will verge on being spoken programming languages. They'll bear a resemblance to, and be constructed in similar ways to, jargon and phrasing used in aviation, optimised for clarity and unambiguity over low-quality connections. Throw in optional macros and optimisations which increase expressive power but make it sound like gibberish to the uninitiated. And then kids who grow up with these dialects will be able to voice-activate their devices with an unreal degree of fluency, almost like musical instruments.
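
To make that concrete, here's a toy sketch (Python, with a vocabulary I just made up) of how constrained such a dialect might be--anything outside the grammar is simply rejected, which is exactly what would make it machine-recognisable:

  # Toy sketch of a constrained spoken dialect (all vocabulary invented):
  # utterances either fit the tiny grammar or are rejected outright.
  import re

  GRAMMAR = re.compile(
      r"^(?P<verb>set|add|read)\s+"
      r"(?P<object>timer|list|message)"
      r"(?:\s+to\s+(?P<value>[a-z ]+))?$"
  )

  def parse(utterance):
      m = GRAMMAR.match(utterance.strip().lower())
      return m.groupdict() if m else None

  parse("set timer to five minutes")
  # -> {'verb': 'set', 'object': 'timer', 'value': 'five minutes'}
  parse("could you maybe set a timer")   # -> None: outside the dialect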


Of course, voice assistants are already doing this to us a bit. Once I learn the phrasing that works reliably, I stick with it, even though repeating "Add ___ to shopping list" for every item feels like the equivalent of GOTO statements.

The challenge with learning anything new from our voicebots is the same as it already is: discoverability. I don't have the patience to listen to Siri's canned response when it's done what I've asked it to, and I'm probably not going to listen to it tell me "Next time, you can say xyz and I'll work better."

The easiest way to learn a new language is to be a kid and listen to native speakers converse. Unless Amazon produces a kids' show starring a family of Echos speaking their machine code, I don't see that happening.


I miss the times before voice assistants, when the user was expected to train the voice interface. That way I could, if needed, negotiate pronunciation with the machine.


Your comment reminded me of an interesting video I saw a while back. The video is a talk [0] by a programmer explaining and demonstrating how he developed and used a voice recognition system to program in emacs after developing significant RSI.

The commands and syntax he uses to interact with the system might be a decent approximation of the custom English syntax you suggest.

[0]: https://youtu.be/8SkdfdXWYaI?t=537


If that happened, but went the way of Palm Pilot's "Graffiti" system, what would be the analogue of touchscreen keyboards?


I'd just like to see a voice interface that makes any attempt at natural error correction.

We do this in a hundred ways in English--emphasis, meta-words, too many to fold into a system. But with a little user prodding, I think you could cover enough of this ground to still sound almost natural.

  Asst.: Would you like me to read that back to you?
  User: Yes
  A: The click brown ox...
  U: The _quick_ brown _fox_...
  A: Ok. The the quick brown fox jumped over the lazy log. Is that correct?
  U: No.
  A: Which part [is incorrect]?
  U: The the quick
  A: The the quick. What should it be?
  U: The quick.
  A: The quick brown fox jumped over the lazy log. Is that correct?
  U: No. Lazy dog.
  A: Lazy log?
  U: No. Lazy dog.
  A: Spell "log." <Or "Spell" + recording of user's voice saying ambiguous word>
  U: D-O-G.
  A: Ah. The quick brown fox jumped over the lazy dog. Do I have it now?
  U: Yes.
  A: Say 'send' to send message.
Agreed that voice systems are an incredibly hard problem, but I suspect rule-based natural language correction systems would go a long way.
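
Here's a minimal sketch of that correction loop, to show how little machinery the rule-based part needs (Python; the function names and the spell-out fallback rule are my own invention, not any real assistant's API). A real system would key off recognizer confidence scores rather than exact string matches, but the state machine itself stays small:

  import re

  def correct(transcript, ask):
      # ask(prompt) -> the user's (already speech-recognized) reply
      while True:
          if ask(transcript + ". Is that correct?").lower() == "yes":
              return transcript
          wrong = ask("Which part?")
          fix = ask(wrong + ". What should it be?")
          if fix.lower() == wrong.lower():
              # Heard the same thing again ("lazy log?" / "lazy dog"):
              # fall back to spelling, as in the dialogue above.
              fix = ask('Spell "' + wrong + '".').replace("-", "").lower()
          transcript = re.sub(re.escape(wrong), fix, transcript,
                              count=1, flags=re.IGNORECASE)

  # e.g. correct("The the quick brown fox jumped over the lazy log", input)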


I would absolutely love a system like this. Like having a pet that you teach to sit. It’s so rewarding when they get it right. :)

If people enjoyed this kind of interaction with their computer, it would become commonplace and they'd likely get much better at interacting.


I'm not so sure. Maybe I'm naive, but I could see imitation and reinforcement learning actually becoming the little bit of magic that is missing to bring the area forward.

From what we know, those techniques play an important role in how humans themselves go from baby gibberish to actual language fluency - and from what I understand there is already a lot of ongoing research on how reinforcement learning algorithms can be used to learn walking cycles and similar things.

So I'd say, if you find a way to read the non-/semiverbal cues humans use to communicate understanding or misunderstanding among themselves, and use those as penalties/rewards for computers, you might be on the way to learning some kind of language that humans and computers can communicate in.

The same goes in the other direction. There is a sector of software where human users not only frequently master highly intricate interfaces, they even do it out of their own motivation and pride themselves on achieving fluency: computer games. The secret here is a blazing-fast feedback loop - which gives penalty and reward cues in a way similar to reinforcement learning - and a carefully tuned learning curve which starts with simple, easy-to-learn concepts and gradually becomes more complex.

I would argue that if you combine those two techniques - using penalties and rewards both to train the computer and to communicate back to the user how well it understood you - you might be able to move to a middle-ground representation without it seeming like much effort on the human side.
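
As a sketch of what I mean (Python; everything here is invented for illustration): keep a score per candidate interpretation of an utterance and nudge it with each reward/penalty cue - basically a tiny multi-armed bandit:

  import random
  from collections import defaultdict

  scores = defaultdict(float)              # (utterance, action) -> learned value

  def choose(utterance, actions, eps=0.1):
      if random.random() < eps:            # occasionally explore a new guess
          return random.choice(actions)
      return max(actions, key=lambda a: scores[(utterance, a)])

  def feedback(utterance, action, reward):
      # reward: +1 for a pleased cue, -1 for a frustrated one
      old = scores[(utterance, action)]
      scores[(utterance, action)] = old + 0.5 * (reward - old)

  # e.g. the user says "lights", the computer tries something, the user reacts:
  action = choose("lights", ["toggle_lights", "dim_lights", "report_level"])
  feedback("lights", action, +1 if action == "toggle_lights" else -1)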


The first generation Newton was insanely good at recognizing really terrible cursive handwriting and terrible at recognizing tidy or printed handwriting. The second generation Newton was very good at recognizing tidy handwriting (and you could switch back to the old system if you were one of the freaks whose handwriting the old system grokked). The Newton also had a really nice correction system, e.g. gestures for changing the case of a single character or word. In my opinion the later Newton generations recognized handwriting as well as you could expect; when it failed you forgave it and could easily correct the errors, and it was decently fast on the 162 MHz StrongARM CPUs.

Speech is way, way harder.


I got a two-in-one Windows notebook and I was really, really impressed with the handwriting recognition. It can even read my cursive with like 95% accuracy, which I never imagined. That said, it's still slower than typing.


> ...That said, it's still slower than typing.

Finger-on-nose! It seems to be a question of utility and efficiency rather than primarily an emotional response to the effectiveness of a given UI.

I type emails because it's faster. I handwrite class notes because I often use freeform groupings and diagrams that would make using a word processor painful.


On the other hand, learning to type took some effort and I see many people who do not know how to properly type. The case for handwriting might have looked better if you assumed people would not all want to become typists.


Back in the early 2000s, the hybrid models (the ones with a keyboard) became the most popular Windows tablet products. A keyboard is a pretty convenient way to work around the shortcomings of handwriting recognition.

Most owners eventually used them like laptops unless illustrating, or operating in a sort of "personal presentation mode," where folding the screen flat was convenient. For most people, handwriting recognition was a cool trick that didn't get used much (as I suspect it was for you, eventually).

The utility of a handwriting recognition system is ultimately constrained by the method of correcting mistakes. Even 99% accuracy means that on average you'll be correcting about one word per paragraph, which is enough to feel like a common task. If the only method of correction is via the pen, then the options are basically to select from a set of guesses the system gives you (if it does that), or rewrite the word until it gets recognized.

But if a product has a keyboard, then you can just use the keyboard for corrections. And if you're going to use the keyboard for corrections, why not just use it for writing in the first place...


That's all true, but for a phone, handwriting input might be worth considering. That might be too optimistic though.


> it's still slower than typing

In alphabetic languages. What about Chinese, or Japanese?


There is software that allows you to draw the characters on a touch screen, but it's still slower than typing, and most people either type the words phonetically (e.g. in Pinyin) or use something like Cangjie, which is based on the shape of character components. Shape-based methods are harder to learn but can result in faster input.

For Japanese there are keyboard-based romaji and kana input methods.

The correct Hanzi/Kanji is either inferred from context or, as a back-up, selected by the user.
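
A toy sketch of the phonetic path (Python; the dictionary entries are just a hand-picked sample): the user types a reading, the IME offers candidates, and context or an explicit selection resolves the ambiguity:

  # Toy phonetic IME lookup: one reading maps to many characters,
  # and context or the user picks the right one.
  CANDIDATES = {
      "ma": ["妈", "马", "吗", "麻"],      # one pinyin syllable, many hanzi
      "kanji": ["漢字", "感じ", "幹事"],   # one kana reading, many kanji
  }

  def suggest(reading):
      return CANDIDATES.get(reading, [])

  print(suggest("ma"))   # the user (or a language model) selects one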


Almost nobody uses kana input in Japanese. There are specialized systems for newspaper editors, stenographers, etc., that are just as fast as English-language equivalents, but learning them is too difficult for most people to bother.


Huh? The standard Japanese keyboard layout is Hiragana-based, and every touchscreen Japanese keyboard I've used defaults to 9-key kana input. Do people really normally use the romaji IME instead?


Yes. Despite the fact that kana are printed on the keys, pretty much no Japanese person who is not very elderly uses kana input. They use the romaji input instead, and that's the default setting for keyboards on Japanese PCs.

You are right that the ten-key method is more common on cell phones (and older feature phones had kana assigned to number keys), although romaji input is also available.


Typing in Japanese is about half as fast as English if you're good. Still better than handwriting.


On the other hand, typing on a smartphone is faster in Japanese. (If you use the 10-key input method.)


Not sure about that. I guess it depends on how good the prediction is for your text, but in general the big problem is that the step where you select what character you actually meant takes time.


I've always thought it'd be nice to be able to code at the speed of thought - you can often envisage the code or classes that'll make up part of a solution; it's just so slow filtering that through your fingers and the keyboard. If the neural interface 'thing' could be solved, that'd be a winner I think, both for coding and for manipulating the 'UI'.

It seems within the bounds of possibility to me (speaking as someone with no experience in the field ;) if patterns could be reliably matched in real time, faster than it takes to whack the keyboard or move the mouse (can EEG or fMRI produce predictable patterns like that?).


I can hardly envision code or classes at all. Most of my thoughts are so nebulous it takes my full concentration just to write them down.

I doubt we're going to make much progress there anytime soon.


>I've always thought it'd be nice to be able to code at the speed of thought - you can often envisage the code or classes that'll make up part of a solution,

In my experience that 'solution' isn't a solution but rather a poorly guessed attempt at one single part of it. Whenever I've thought I had the architecture of a class all figured out, there is one corner case that destroys all the lovely logic, and you either start from scratch on yet another brittle monolith or accept that you are essentially writing spaghetti code. At that point, documentation that explains what does what and why is the only way to 'solve' the issue of coding.


Unfortunately, commercial-grade non-invasive EEG headsets can only reliably respond to things like intense emotional changes and the difference between sleep and wakefulness, not much more than that. A while ago I had some moderate success classifying the difference between a subject watching an emotional short film vs. stand-up comedy. Mostly, though, they are effective at picking up background noise and muscular movement and not much else.

This is based on my experience with the Muse [1] headset, which has 4 electrodes plus a reference electrode. I know there are similar devices with around a dozen or so electrodes, but I can't imagine they're a significant improvement.
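For context, that kind of experiment needs surprisingly little code - roughly band-power features per electrode plus a simple classifier. A minimal sketch (Python with numpy/scipy/scikit-learn; the shapes and labels are invented, not my actual experiment):

  import numpy as np
  from scipy.signal import welch
  from sklearn.linear_model import LogisticRegression

  def band_power(epoch, fs=256, lo=8, hi=12):
      # mean alpha-band power per channel; epoch shape: (channels, samples)
      f, psd = welch(epoch, fs=fs, axis=-1)
      band = (f >= lo) & (f <= hi)
      return psd[:, band].mean(axis=-1)

  # 40 fake epochs from 4 electrodes; y: 1 = emotional film, 0 = comedy
  epochs = np.random.randn(40, 4, 1024)
  X = np.array([band_power(e) for e in epochs])
  y = np.repeat([1, 0], 20)
  clf = LogisticRegression().fit(X, y)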

So I think we're still a long way off from what you're describing :(

[1] http://www.choosemuse.com


This comment is way too long to justify whatever point you are making. First of all, how long ago do you mean? Are you 100 years old, and people in the 1940s thought computers would be operated by pens?

But also, when my college had a demo Microsoft tablet in their store in 2003, I could sign my name as I usually do (typical poor male handwriting) and it recognized every letter. That seems pretty good to me.

I also had a Palm Pilot then and Graffiti sucked, but that's just one implementation.


This is my most down-voted comment ever and I have no idea why.


In addition to what novia said about your first sentence, your comment is so weirdly off-base. You have very strong opinions about pen computing, but where in reality are they coming from?

No, we are not talking about the 1940s, why would anyone be talking about the 1940s? How does that make any sense at all?

We aren't talking about a Microsoft demo that worked well for you once, either. We are talking about the fact that, since the 2000s, when it was tried repeatedly, people do not like to hand-write on computers. It is a strange thing not to be aware of when you're speaking so emphatically on the topic.


I don't have strong opinions about pen computing, but the comment I replied to was just giving a bad impression of the tech in the early 2000s. It worked quite well actually.

Since he didn't specify the timeframe, I gave him the benefit of the doubt and thought he was talking about a time before mine.


> It worked quite well actually.

Apart from the problems mentioned in the comment you replied to. You don't seem to have a point other than being contrarian (as a default state).


The commenter said that handwriting recognition didn't work well. But, having tried it myself, I know that's not true. My point is to share that data.


Not sure why you would ask for advice on a downvoted comment and then just argue against the response.


The first sentence is being seen as combative.



