Handwriting Generation with Recurrent Neural Networks (toronto.edu)
266 points by cjdulberger on July 22, 2015 | 80 comments



In both this demonstration and the demonstration by Google of image generation by neural networks (http://www.popsci.com/these-are-what-google-artificial-intel...) you can see hints of organic, naturalistic behavior emerging from artificial networks. Is there any doubt that such networks can simulate even more complex aspects of our cognition? Isn't the writing on the wall (no joke intended) with regard to the Strong AI hypothesis? Or at least a stronger weak hypothesis.


The "organic" behavior you see is replaying the natural statistics of actual handwriting trajectories. I suspect that generative models that "compress & replay" human data in order to mimic humans will lead to more complex AI systems, but capturing the statistics of "internal" processes like cognition and consciousness is still far away (and may require advancements in cognitive neuroscience first).


A lot of results point to consciousness having no input in the decision-making process at all; it merely receives data and builds a narrative to legitimize itself.

Edit:

Holy crap, I had a revelation (that a lot of people probably already had before). Consciousness lags behind the actual decision-making process (sometimes by up to 150ms according to some studies). What if consciousness is an additional layer of back propagation mechanism for our brain? Analyse the decision post facto, taking the bigger picture into account, and take corrective action if the outcome was bad. It's just a small process governing a bunch of primitive NNs.


I can't believe there's so much lag in conscious experience! (I guess that explains the "~100ms for perceived instant response" UI guideline?)

Oh! I see what you're saying. What a crazy neat way to think about consciousness. Are we all error correctors?


Definitely, to my mind. It's not exactly error correction, it's comparison of two results: expectation (from emulation circuitry) vs "reality" (as measured by sensory feedback).

Incidentally, these circuits are chaotic. They are extremely sensitive to initial conditions. These conditions are in part the output of other circuits, which are chaotic themselves. The circuits can be trained to perform better and better, by constantly comparing actual performance to emulated performance. However they will rapidly degrade. This is why even top athletes benefit from warming up before competition: training their circuits to reach maximum performance, because they will have degraded overnight. That degradation may be marginal, but at elite sports level those margins make all the difference. The best athletes are able to train their physical circuitry as well as their emotional and cognitive circuitry.


My personal suspicion:

The central nervous system contains emulators (already suspected by some researchers). Sensory feedback after completion of a goal-driven task is compared against output from an emulator that has been asked to complete the same task, in parallel. The emulator runs faster, partly because the circuitry will simply be shorter.

This is suspected in musculoskeletal control, and I believe it may happen outside the musculoskeletal domain too.

Consciousness may only lag behind the emulator, not the "real world" sensory feedback.

This may offer a counter, or adjunct, to the proposal that consciousness is simply the brain experiencing itself.

The lag may be back propagation. It certainly seems to be some sort of computational overhead.


How do you emulate liking someone? Or deciding on a number? Picking a color? This is the type of stuff consciousness lags behind. It doesn't drive the decision process; it is merely informed after the fact and tries to justify it to avoid any potential cognitive dissonance.

-of course I like her because I'm into big butts, and 3 is totally my lucky number!


http://www.technologyreview.com/view/518006/how-a-fly-brain-...

I'm not an expert but... Fly brains and human brains already have neural networks for common tasks, like sensing motion, baked into our genetics. Evolution has already trained these networks to an extent. Training a computer-simulated neural network seems to yield similar results to what nature has done.

My personal uninformed opinion is that we'll start to see how much human experience and expression is driven by our brain wiring. That we'll discover how similar we all are to each other based on our wiring. And that we'll find that approximating advanced cognition will be the result of putting that wiring into a computer model -- and having a computer powerful enough to run it.


I think what he means by that is the AI is only trained against the handwriting itself. But humans are trained against the quality of the paper, the sound the pen makes, the stress of an exam and the joy of writing a birthday card. Based on the current training parameters the AI will be oblivious to those, and hence will not be able to reflect those experiences in the writing itself. It could be perfectly acceptable for most scenarios (like the handwriting fonts we have today), but it is far from an actual human.


Well said. Furthermore, data on the joy of writing a birthday card (to borrow one of your examples) can be useful in other tasks (such as determining what to write).

Typical machine learning problems deal with isolated training sets and isolated problems. This approach seems strange to me; in the case of neural networks, this is somewhat analogous to a newborn child who is deprived of all senses except the limited training data to make up their world, and good/bad feedback from the loss gradient. How can one expect this hypothetical newborn to learn any meaningful representation of the world with which our machine learning problems are derived?

I think the first step towards realizing anything like "Hollywood General AI" will be a system that spends an early portion of its existence ingesting a universe of contextual data, before it is presented with a problem to solve (at which point it can make use of seemingly unrelated information to do something like handwriting). Andrew Ng's work on self-taught learning (built on transfer learning) is particularly relevant here, but I think those ideas could be taken a lot further.


General AI has always been very fascinating and exciting to me. However, recently I feel like it will serve us humans better to continue along the path of ML.

We want an AI trained with ML to keep focusing on flying the plane. We may not want an AGI pilot who can also get bored just like humans and can get distracted playing games.


Wasn't this the original aim of AI, and one of the reasons the AI Winter occurred?


> My personal uninformed opinion is that we'll start to see how much human experience and expression is driven by our brain wiring.

In other words, we will start thinking about people as if they are mere things. Complex things, but things nevertheless.


And, at the very least, thinking about the human mind as an object will lead to greater advances in artificial intelligence. IMO.

My mind will change when my brain does.

https://en.wikipedia.org/wiki/Neuroplasticity


Complex and important things. What else did you expect?


Are we sure what is being simulated here is cognitive? It seems to me merely mechanical. Not to deride the achievement of course.


I think we may be reaching a point where the line between mechanical and cognitive gets blurred slightly. While I feel that this falls into the mechanical end of the spectrum because I understand how the model was created, RNNs demonstrate some very fascinating emergent behaviors that will become more complex as computational power increases.

I wouldn't say that this is cognitive, but it definitely learns to perform a skill in a similar fashion to intelligent beings, which is remarkable. Extrapolating that behavior to self-awareness isn't likely, however (of course this could lead to a philosophical debate).


Couldn't this be seen as a kind of motor cognition (https://en.wikipedia.org/wiki/Motor_cognition)? Isn't this essentially what motor cognition does when we write? Perhaps this is more low-level than that, but it seems to me this neural network embodies the 'muscle memory' we develop when we practice handwriting.


Are we sure cognition isn't merely mechanical? Neural networks are pretty basic and yet they provide compelling products.


What happens in a neural network and what happens in a semiconductor device that we can choose to interpret as simulating the computational characteristics of a neural network (ignoring EM emissions, chemistry, etc.) are mechanically very different. If they both generate cognition (including the subjective aspects implied by that word), I would be very surprised. But that doesn't preclude the artificial neural network from mapping input-output in a way that correlates well with the input-output we observe in biological neural networks (again, ignoring a shit-ton of input and output that we don't consider part of the "computation").


Cognition is not "merely mechanical" by definition. If it was mechanical, there would be no special word to distinguish it from other reactions of the natural world.


That's certainly not true. The term "cognitive" could refer to a proper subset of the set of "mechanical" things. Terms for subsets are still useful.


So I guess you're fine with calling books "merely paper" because they are a "subset" of paper-based products? And you're fine with calling people "merely animals" because they're a "subset" of animals?

Human language is not based on set theory. There is a huge difference in connotations here.


Context is important, and the word "merely" can be interpreted in multiple ways. In the first case in this thread, I understood "merely mechanical" to mean "only mechanical," i.e., containing nothing that is not mechanical. Using that definition, humans are merely animals, because they're not something that's not an animal. Books are a bad example, because they do have things other than paper (like ink, glue, maybe a plastic or leather cover).

You seem to be using "merely" to mean "nothing more specific than," which is a very different meaning that would not be appropriate when talking about things which are subsets of other things.


If you care about context, I shall remind you that the original question was:

"Are we sure what is being simulated here is cognitive? It seems to me merely mechanical."

This is analogous to asking "are you sure this is done by humans rather than animals?" The question makes sense and is valid. To answer it with "humans are merely animals" would not address the subject at hand and, again, would have specific connotations.

I see that this kind of semantic acrobatics is extremely common in discussions of AI on HN.


Machine learning is reproducing the result of a process but not the process. It's easy to be confused about it, and many are. It's like saying that because "thoughts" are patterns of some sort, any sort of pattern is a thought. The logical error in this should punch you and any sentient being right in the stomach and leave you breathless for a few good seconds.


This is infinitely more impressive than Google's image generation. There are a zillion ways to generate weird-looking images. However, this paper claims to:

1. Have trained a network to generate handwriting. Handwriting can be objectively judged, unlike those images. The interactive demo does look pretty darn cool.

2. Have made it so the network can be "primed" by small samples of new handwriting. This is the most impressive part, because it is something that actually qualifies as semi-generic AI. You train it on one thing and then it does another thing.

Actually, the second part is a bit too impressive, compared to everything I've seen from NNs before. I've read the paper, but I don't understand how this priming works. How big is the primer? Is it just the sample shown on the website? Are there any constraints? Can I simply supply a sample of my writing with a textual "translation" for this network to pick up some aspects of my writing style?


> This is infinitely more impressive than Google's image generation. There are zillion ways to generate weird-looking images

Google wasn't training their NNs to 'generate weird-looking images'. The NNs were for image-classification (which is pretty impressive), the weird images were just a visualisation/artefact of the training process.


They developed and tuned a fairly elaborate process designed specifically for generating those images out of classification NNs.

The difference between the two demos is that the impressiveness of those images is mostly subjective, while handwriting replication has pretty well-established criteria to judge it by. Also, it sounds like priming is fast and does not require tons of samples.


I'm starting to think that consciousness emerges out of a neural net that has the appropriate training tests, scoring functions and has the right sensors to take in feedback.

For an easy example look at human babies - they're not conscious (maybe self-aware is a better word?), but we can see this development happen over time. Maybe sleeping has some role in recalculating the weights for the neural net? Eventually something like a feeling of self emerges as the evolved mechanism for working with social feedback. Maybe this is the general way intelligence works.

It does seem to make sense when looking at how it develops in animals.


Many chat bots display very human-like behaviour. Does that imply that the underlying technology will solve strong AI? Of course not. Such extrapolations are almost guaranteed to be incorrect. There have been some very eye opening articles recently about the limitations of deep neural networks which I highly recommend. AI is still an incredibly difficult problem and that is not going to change anytime soon. Deep neural networks are a decent step forward but we shouldn't get carried away.


> There have been some very eye opening articles recently about the limitations of deep neural networks which I highly recommend.

I'm not exactly a deep-learning fan, but could you post the links?



Thank you. It's quite interesting, actually, how the easy-to-compute discriminative models can be fooled so easily, while they hypothesize that generative models (which are much more general, and closer to our best models of human cognition), despite being much slower, ultimately provide better performance at the boundaries of the classification region.

I'd go on to venture that our minds probably use discriminative models for "intuitive" judgements, while reasoning that can explain the variance in percepts uses generative models, thus obtaining better performance where necessary despite using more energy. Or possibly, the generative models can be used to train the intuitive discriminative ones, slowly allowing the less intensive part of the mind's processing to adjust its class boundaries to suit what's really known.



That's cute. At last, the forgery app.

The killer app in this space will be when someone figures out how to extract a vocal model from existing recordings of singers. Vocaloid already synthesizes singing quite well, but a human singer has to go into a studio and sing a long list of standard phrases to build the singer model. The next step will be to feed existing singing into a system that extracts a model usable for synthesis.

The RIAA is so going to hate this.


I can imagine a near future whereby an RNN is trained to have an especially lovely reading voice. A large number of people might actually become convinced that the RNN is a person and develop empathy towards it.


That kind of thing happens already (with human experts rather than RNNs, but does that matter?) - you're reminding me of https://en.wikipedia.org/wiki/Aimi_Eguchi


haha combine that voice with some Markov chains, and you have the perfect friend replacement app!


This is starting to sound a lot like Her (http://www.imdb.com/title/tt1798709/). It'll likely be the case one day.


It also sounds similar to the Black Mirror episode "Be Right Back", although that episode provides a somewhat darker perspective.


It leads me to wonder why we haven't seen (or at least heard about the development of) an operating system with purely voice "UI" based on an RNN... Especially after Her came out.

I understand it's hard but it also sounds like a fun project for people with the relevant know-how.


Andrew Ng gave a keynote at GTC in which he talked about bringing Baidu's Deep Speech technology to phones (for accessibility). You betcha they're working on it!


Counter-point: The RIAA would so love to eliminate the human element from the process. As in, the most profitable artists are the dead ones. If you've read about hit-machine-in-human-form Max Martin, then you're aware the guy is a total obsessive with an approach (balanced lines, best vocal lines, etc) and if he could be in charge of a fake-singer by way of programming, then I think he'd be unstoppable.


This would be even more useful for audiobooks


Well, not so much singers at first, but voice actors and audio books.


Combine that with the extremely realistic 3D renderings available today and you'd be able to make whatever VR... fantasies you wanted. Fun and exciting.

Reminds me of this article from 5 years ago: http://www.dansdata.com/gz103.htm


Can you copyright your voice then? There are laws governing your "likeness" so I guess they would apply?


No, you can't copyright a voice. That's come up with "cover" bands. A cover band can sound like the original; they just can't claim to be the original. ("Compare to the ingredients in Elvis").

A cover band has to license the underlying composition, but not the recording they're covering. (This means ASCAP gets royalties, but the RIAA does not.) In the US, there's a compulsory license for compositions, and you can record and distribute any song by paying a relatively modest fee set by law.

This is just automating the cover band industry.

In ten years or less, this will be a common feature in DJ consoles, and we'll hear songs from musician A as if performed by musician B.

The RIAA is really going to hate that.


Exactly. No need for the mob to find and exploit the better art majors anymore. ;)


On the other hand, art majors will be even worse off than before. ;)


The smart ones will make money designing the ads for the forgery software. ;)


AB testing, and a corpus of previous advertisements, and a DNN could probably replace advertising creatives.


Nice. Mechanical turks might be a cheaper option at the creativity level I see in many Fortune 500 companies' advertisements (see IBM's boardroom ads).


Are they a business major or an art major? /s


Haha. They finally get the business degree with "life experience" credits from marketing forging tools.


It's interesting that the network will sometimes misspell words:

http://i.imgur.com/cFrlyy8.png

The input was copied from the instructions - "Type a message into the text box, and the network will try to write it out longhand". But you can see it skipped the "e" in "Type" and added an "h" after the "w" in "network", and pretty clearly spelled "to" as "du".

It also tried to cross the first vertical line of the "w" in "network" in lieu of adding an actual "t" beforehand (which is arguably an idiosyncrasy a human's handwriting might have, if a rather odd one); and stuck a big phantom stroke/letter between "T" and "y".


It has no idea what it's doing. :)


That or it's due to human error! Just like self-driving cars haha


Ha! What do you mean by human error though? In the self-driving cars analogy, one kind of human error would be the person inadvertently making a typo when inputting the text sample in the above link. Another would be errors in the dataset used in "teaching" this network to create handwritten text (e.g. the handwriting samples had "to" instead of "the" and many instances of "The" were missing the 'e').

Unless I misunderstood how this was developed...that could be the case ha!


It's interesting because handwritten notes are seen as one of the last symbols of human authenticity. The unique drawing of each letter seems to prove that a human invested time and thought in directly communicating with you. There was no copy-paste. There was no form letter. This was not a bot. And, based on the length of the note, there was provable effort involved. Both a Turing test and a proof of work problem. The world's oldest Captcha.

And so begins the devaluing of that proof. Just like when marketers started reproducing the "signature" on every sales letter with blue-colored toner, mimicking the authenticity of a hand signature.

I don't write handwritten letters, and I don't romanticize the past. But our dwindling ability to assess the authenticity of incoming communication is slightly unsettling.


Very strange results with rare Unicode characters (I used "𝕳𝖔𝖜 𝖜𝖊𝖑𝖑 𝖉𝖔𝖊𝖘 𝖙𝖍𝖎𝖘 𝖜𝖔𝖗𝖐")

https://imgur.com/a/Li8OZ



I implemented this system for the final project of my computer vision class. Couldn't get it to work by the deadline, but I'm very familiar with this paper. Happy to answer any questions.


I'm slightly alarmed by urls such as: http://www.cs.toronto.edu/~graves/handwriting.cgi?text=Hej&s...

Relative paths allowed from user input is usually a HUGE warning sign. Are you sure I can't make it open arbitrary files? What happens to your cgi script if it reads a file in the wrong format?
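For anyone wondering what a guard against this could look like, here is a minimal sketch of path containment checking. It is not the demo's actual CGI code (which isn't shown anywhere in this thread); `resolve_style_path` and the idea of a dedicated base directory for style files are assumptions made purely for illustration.

```python
from pathlib import Path


def resolve_style_path(user_value: str, base_dir: Path) -> Path:
    """Map a user-supplied relative path to a file strictly inside base_dir.

    Hypothetical helper: raises ValueError for anything that escapes
    base_dir (via "..", an absolute path, or a symlink) or is not a file.
    """
    base = base_dir.resolve()
    # Joining with an absolute user_value discards base, so the
    # containment check below is what actually protects us.
    candidate = (base / user_value).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes base directory: {user_value!r}")
    if not candidate.is_file():
        raise ValueError(f"not a regular file: {user_value!r}")
    return candidate
```

Resolving before checking matters: comparing the raw string against a prefix would miss `..` segments and symlinks. The second question in the comment (a file in the wrong format) would still need separate validation of the file's contents.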


I'm curious as to what happened here:

Text entered: this is a test of handwriting generation

Style sample #1 selected.

All other settings at default.

http://imgur.com/6b1G5Tj

Edit: I've tried a couple other styles and haven't duplicated this craziness.


It obviously gets bored sometimes and decides it'd rather draw pretty pictures of mountain ranges or something.

http://imgur.com/uz60af1


Humans doodle, so it does too, I guess :)


The world's first artificial neural network with a drinking problem :)


There's an element of randomness in the generated samples. It does what it predicts a human would do. If it thinks there is a 1% chance a human would move the cursor up, then it will have a 1% chance of doing that as well.

Since the net can't really be 100% sure of anything, it leaves itself a little uncertainty in case the human does something crazy. So sometimes the result will seem to be a little more random than real writing because of its uncertainty.

The bias parameter tries to fix this by biasing its output towards more probable sequences.

I don't know what data it's using - it's quite possible it's corrupted by people drawing randomly or something.
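To make the sampling and bias idea concrete, here is a rough sketch, simplified to a 1-D mixture of Gaussians (the real network predicts 2-D pen offsets). The form of the bias used here, scaling the mixture logits by `(1 + bias)` and shrinking each standard deviation by `exp(-bias)`, is my reading of the paper's biased-sampling description, so treat it as an assumption; the function names are made up for illustration.

```python
import math
import random


def mixture_weights(logits, bias=0.0):
    """Softmax over mixture logits, sharpened by the bias (assumed form)."""
    scaled = [l * (1.0 + bias) for l in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_offset(logits, means, log_stds, bias=0.0, rng=None):
    """Draw one value from the (optionally biased) 1-D Gaussian mixture."""
    if rng is None:
        rng = random.Random()
    weights = mixture_weights(logits, bias)
    # Pick a component with probability equal to its weight...
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # ...then sample from it; a higher bias shrinks the spread.
    std = math.exp(log_stds[k] - bias)
    return rng.gauss(means[k], std)


# An unlikely "move up" component next to a likely "carry on" component.
logits = [0.0, math.log(99.0)]
print(mixture_weights(logits, bias=0.0))  # roughly [0.01, 0.99]
print(mixture_weights(logits, bias=1.0))  # unlikely component nearly vanishes
```

With zero bias the 1% move really does get picked about 1% of the time, which is where the occasional wild stroke comes from; turning the bias up both suppresses unlikely components and tightens the noise around the likely ones.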


The network that picked up that task was just a little wacky. No big deal. Look the other way. Please ignore. whistle

lol


Does anyone know what happened here[0]? The input text was "This is a test. Reeeeeeeeeeeeee!", and I chose the third style.

[0] http://i.imgur.com/6B9JkjC.png


Symbols seem to do strange things as well. The input was:

  !@#$%^&*()_+ 
Here are 3 runs of it with bias all the way down:

http://paste.click/KNIJov

http://paste.click/PPjKXC

http://paste.click/RtZCtG

As I turned up the bias, it seems to trend to the letter g:

http://paste.click/hJrVlX

Some symbols seem to just fail; I'm guessing that's a bug in the serialization.


VERY impressive demo but I find it kind of funny the extra effort it takes to produce a "worse font".



You could use this in conjunction with something like Cleverbot to create a virtual pen-pal who sends you snail mail letters.


And combine it with holographic technology to have 3D Skype calls with them too?


This is really cool!

I could see it being used in games to generate hand written notes from data files.


For games you could get far more readable results by far simpler means. For example, add slight randomization of vectors to an existing font.
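A minimal sketch of that "simpler means", assuming a glyph is stored as a list of strokes and each stroke as a list of `(x, y)` points in font units. The representation, `jitter_glyph`, and the default `sigma` are all hypothetical, not taken from any particular font library or game.

```python
import random


def jitter_glyph(strokes, sigma=0.02, rng=None):
    """Return a copy of the glyph with Gaussian noise added to every point.

    strokes: list of strokes, each a list of (x, y) tuples.
    sigma:   noise standard deviation, in the same units as the points.
    """
    if rng is None:
        rng = random.Random()
    return [
        [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in stroke]
        for stroke in strokes
    ]


# A crude capital "T": one horizontal bar and one vertical stem.
letter_t = [
    [(0.0, 1.0), (0.5, 1.0), (1.0, 1.0)],
    [(0.5, 1.0), (0.5, 0.5), (0.5, 0.0)],
]
wobbly_t = jitter_glyph(letter_t, sigma=0.02, rng=random.Random(42))
```

Calling it once per rendered character gives every instance of a letter a slightly different shape, which is most of the "handwritten" effect for in-game notes, though it won't join letters or vary style the way the network does.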


Has that been used in a game already? I'm curious.



