Hacker News

I get that this is meant to drive the avatar, but I'm curious why audio was chosen. Video carries stronger signals: I'm certain that even a very low-resolution image would convey movement better than audio does. So either the network is memorizing a lot (which is fine, but limited), or this is an iteration toward a 3D, high-sensitivity, audio-driven system for precise sound? Something else? I mean, the Quest has cameras in it, so why not use those? Computation? They aren't big models (the largest is 1.42 GB, the smallest 0.58 GB).



