Hacker News

Maybe I'm misunderstanding the code, but it looks like it's matching audio to video, not actually recognizing speech given a video. That is, it could answer "does this audio line up with this video?" but not "what is being said in this video?"



I didn't take a deep dive into the code, but in order to train, it's going to need to be fed audio files along with the actual video/mouth shapes/etc. Essentially, the audio tells it what reward to give back (i.e., whether its prediction was right). Once it "learns", it wouldn't need the audio file.
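The setup described above (audio-derived labels supervise training, video alone at inference) can be sketched with a toy example. Everything here is invented for illustration: the mouth-shape strings, the phoneme labels, and the frequency-counting "model" are stand-ins, not the project's actual architecture:

```python
from collections import Counter, defaultdict

# Hypothetical paired training data: each video frame's mouth shape,
# labeled with the phoneme extracted from the aligned audio track.
paired_data = [
    ("wide_open", "AA"), ("closed", "M"), ("rounded", "OO"),
    ("wide_open", "AA"), ("closed", "B"), ("closed", "M"),
]

def train(pairs):
    """Learn the most common phoneme for each mouth shape.
    The audio-derived phoneme is the supervision signal."""
    counts = defaultdict(Counter)
    for shape, phoneme in pairs:
        counts[shape][phoneme] += 1
    return {shape: c.most_common(1)[0][0] for shape, c in counts.items()}

def lipread(model, frames):
    """At inference time, only the video (mouth shapes) is needed."""
    return [model.get(shape, "?") for shape in frames]

model = train(paired_data)
print(lipread(model, ["closed", "wide_open", "rounded"]))  # ['M', 'AA', 'OO']
```

A real system would replace the lookup table with a neural network, but the data flow is the same: audio is only consumed at training time to produce labels, so the trained model can "lipread" from silent video.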


In order to train, doesn't it have to match audio output to a video of mouth movements?

Doesn't deep learning imply training on sample results?


Exactly. How is this "lipreading"? Clickbait.



