How interesting would it be to have your own live emotionally expressive avatar for videoconferencing, when you don't want to worry about your hair, makeup, lighting, or general visual state at all?
David Foster Wallace ruminated on this in Infinite Jest.
"The proposed solution to what the telecommunications industry's psychological consultants termed Video-Physiognomic Dysphoria (or VPD) was, of course, the advent of High-Definition Masking. Mask-wise, the initial option of High-Definition Photographic Imaging, i.e. taking the most flattering elements of a variety of flattering multi-angle photos of a given phone-consumer and, thanks to existing image-configuration equipment already pioneered by the cosmetics and law-enforcement industries, combining them into a wildly attractive high-def broadcastable composite of a face wearing an earnest, slightly overintense expression of complete attention."
Always thought it was fascinating that he came up with this in 1996!
"You can look like a gorilla or a dragon or a giant talking penis in the Metaverse. Spend five minutes walking down the Street and you will see all of these.
Hiro's avatar just looks like Hiro, with the difference that no matter what Hiro is wearing in Reality, his avatar always wears a black leather kimono. Most hacker types don't go in for garish avatars, because they know that it takes a lot more sophistication to render a realistic human face than a talking penis. Kind of the way people who really know clothing can appreciate the fine details that separate a cheap gray wool suit from an expensive hand-tailored gray wool suit." Neal Stephenson, Snow Crash, 1992.
I was playing with openFrameworks and Kinect years ago, and you could pretty much do this, so I am curious why you would use TF.js instead of plain OpenCV or other libraries that don't need ML or DL. Or am I mistaken, and it is only using simple bits of TF.js?
In 2012 we had a large flip-disc (or flip-dot) wall at the site in Macau, purchased from an outside company: a wall of black and white discs that flipped to create a cool effect by tracking people in front of it with Kinect units in real time. It also made a cool clicky sound, like the old physical arrival/departure boards at train stations and airports.
I had a feature on my HTC phone or in early Skype over 10 years ago where a cartoon cat mimicked my mouth, eyes, and head movement live on camera. I can't find a reference to it, but I have a screen recording of it from when I was talking with my kids in the US from Macau.
I also remember using Animata, software that animated 2D puppet-like cut-outs to music, which I played with 7 or so years ago. That was really cool [1 YouTube].
Re the feature in HTC or Skype: it was Skype. There was a set of novelty faces that mimicked your movement (I can only remember the dog, but there were a few, some more effective than others).
Wow, an easy way to create animated cartoon characters would completely change how animation is done! It would launch a thousand South Park / Rick and Morty type shows. A team with 10k could launch a show.
That's because the animation is very much the least important part of a successful show. It's the writing and voice acting. South Park is a great example of this. It's successful because it is funny and edgy. The animation could have gone a lot of different ways and had similar success I think.
I think fundraising is the hardest part of a project, especially for creatives. If a group of writers can wrangle some amateur voice actors (who I think would be the lowest-paid actors), it should be easy to get a project up.
Heh, literally my first thought when I saw the sample gifs was that I could use this to make some explainer videos. And yes, they would probably end up looking poorly done.
This is fairly easy to wire up on Linux using v4l2loopback and pyfakewebcam.
I am currently using a little setup that uses OpenCV to acquire frames from the real camera, TensorFlow/BodyPix to compute an alpha mask for the foreground (me), and then OpenCV again to transform and composite myself behind news desks, into car infotainment screens, and the like. The result is written to a virtual webcam I can use from Zoom (compared with its own virtual background feature, this adds the layering and perspective transforms), Jitsi, Teams, etc.
The above looks like another fun thing to add. Time to go full Who Framed Roger Rabbit? ...
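The compositing step in a setup like the one described above can be sketched as a per-pixel alpha blend. This is only an illustration, not the commenter's code: the function name `composite`, the tiny hard-coded frames, and the mask values are all hypothetical, and plain Python lists stand in for the arrays a camera and a segmentation model would actually produce.

```python
# Minimal sketch: blend a camera frame over a background using a
# per-pixel alpha mask (1.0 = person, 0.0 = background).
# Frames are lists of rows of (r, g, b) tuples for illustration only;
# a real pipeline would do the same arithmetic on image arrays.

def composite(foreground, background, alpha):
    """Return alpha * foreground + (1 - alpha) * background, per pixel."""
    out = []
    for fg_row, bg_row, a_row in zip(foreground, background, alpha):
        row = []
        for fg_px, bg_px, a in zip(fg_row, bg_row, a_row):
            row.append(tuple(
                round(a * f + (1.0 - a) * b) for f, b in zip(fg_px, bg_px)
            ))
        out.append(row)
    return out


# Hypothetical 1x2 frames: the mask marks the left pixel as "person"
# and the right pixel as "background".
camera = [[(200, 100, 50), (200, 100, 50)]]
news_desk = [[(0, 0, 255), (0, 0, 255)]]
mask = [[1.0, 0.0]]

result = composite(camera, news_desk, mask)
```

Fractional mask values give soft edges around hair and shoulders, which is why a float mask is used here rather than a hard cut-out.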
It sounds like a cool idea. Perhaps a platform for acting out a play or script as the characters themselves? Like a social theatre kind of thing with pre-animated backgrounds and stuff, where you only do your character's part.
Aha, that's a completely different application than what I had in mind (just an alt webcam that's better than a static picture for communication while still protecting your visual identity), but it would be amazingly cool =).
The Snap Camera desktop app offers features somewhat similar to what you're seeking, as long as you combine it with the right filters. I imagine that the open source repo in the OP will eventually be used in an app like this.
This is a great idea! It should be great for bandwidth too - rather than sending full cam frames multiple times a second, it only needs to send deformation information describing movements.
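A rough back-of-the-envelope sketch of that bandwidth point, under loudly stated assumptions: 17 tracked keypoints per frame, each sent as two 32-bit floats, compared with one uncompressed 640x480 RGB frame. The helper names `pack_keypoints` and `unpack_keypoints` are hypothetical, and real video calls send compressed video, so the real-world gap would be smaller than the raw numbers suggest.

```python
import struct

# Assumed figures for illustration only (not taken from the project):
NUM_KEYPOINTS = 17            # hypothetical number of tracked points
FRAME_W, FRAME_H = 640, 480   # hypothetical uncompressed RGB frame size


def pack_keypoints(points):
    """Serialize (x, y) pairs as little-endian 32-bit floats."""
    flat = [coord for point in points for coord in point]
    return struct.pack(f"<{len(flat)}f", *flat)


def unpack_keypoints(payload):
    """Inverse of pack_keypoints: bytes back to a list of (x, y) pairs."""
    count = len(payload) // 4
    flat = struct.unpack(f"<{count}f", payload)
    return list(zip(flat[0::2], flat[1::2]))


points = [(float(i), float(i) * 2.0) for i in range(NUM_KEYPOINTS)]
payload = pack_keypoints(points)

raw_frame_bytes = FRAME_W * FRAME_H * 3  # 3 bytes per RGB pixel
print(len(payload), raw_frame_bytes)
```

The receiver would need its own copy of the avatar and rig so that it can redraw the character from the decoded points.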
More details, please: I cannot find any feature on the OBS website similar to what we see here. Do we need some plugin or extension? Please give us some URLs, thank you!
"Unless your project is an official Google product, you must state “This is not an officially supported Google product” in an appropriate location such as the project’s README file."
Why is this downvoted? I hate it when people do that and offer no explanation. Totally useless input, even harmful. HN should force a comment on each downvote.
One reason that guideline exists is that unfair downvotes frequently get canceled by users who come along, see the situation, and make a corrective upvote. Meanwhile complaints like this linger on in the thread, inaccurate and off-topic—they don't garbage-collect themselves. As an example, I noticed your other comment and upvoted it before I saw this comment here. Similarly, other users have upvoted the GP.
As with any stochastic process, there is a lot of error and spillage with downvotes. There's no way to perfect it; you have to ask whether the system is better off with it than without it. Forcing comments wouldn't help, and posting complaints certainly doesn't help.
Why not have a meta discussion area, like a Wikipedia Talk page, so that comments like these would have room and critique wouldn't be shut down?
Yesterday (you can look in my comment history) I ran into a situation where the person doing the down-voting turned out to be basing it on their opinion not fact (after they finally stated their opinion on the matter, which contradicts peer-reviewed research on the topic, I realized why they were down voting: insufficient depth of understanding of the topic) and no one came after them to correct the situation...
Either have people explain why they down-voted or have a Talk page where people can discuss their reasons, complain, etc., behind the scenes.
HN is a site for intellectual curiosity (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...) and a meta forum would be a step away from that. It would fill up with litigious bickering and nitpicking, and demands for bureaucratic administration—all things that intellectually curious discussion requires having the restraint to avoid.
Edit: and it's for similar reasons that we don't publish a moderation log.
My other comment on this thread followed others in congratulating the author. It specifically said "Bravo!" and guess what? It got down voted.
So the explanation may be that it is useless input but believe me it is not useless to the author to have people congratulate them for the fruit of their labor. It's a human desire to be recognized by others. How is it not a positive comment? Why down-vote something as benign and empathic as saying congratulations? So force a comment on each down-vote and things might get a little bit better because at least we'll know it's not just random acts of hostility and that the down-voter has a rational reason for it, at least in their mind.
This is great :) 10 years ago I'd been using Animata http://animata.kibu.hu + Kinect; this rolls it all into one web-based thing. Cool stuff. Would it be easy to move it to a local GPU-accelerated version for more FPS?
Hmm, TF.js does use the GPU, and there are plenty of small PoseNet models that run fast on mobile. Maybe this is an old version. (Or maybe the bottleneck is in the face or SVG stuff.)
But to answer your question, yeah it could be faster (everything could be faster given enough time! :)
IIRC there was a similar thing done for one of the VR systems recently, but for the hands instead of for the face. Has anyone seen any open models or software for tracking hand shape and location in this way? I would love to use it in connection with sign language processing.
If you find that bookmark, could you share it with me? One project I've looked at is sign puppet [1]. It has all of the animation basics needed; the tough thing is inputting the parameters to animate the puppet. Traditionally, capturing sign language data for the computer requires really sophisticated tracking equipment (gloves, etc.). Being able to do it with a webcam could be a game changer!
I studied linguistics and CS in school, and I learned a little JSL to speak with deaf friends in Japan. I think sign language processing is a really neat combination of computer vision/graphics and linguistics. Lately there have been so many great advances in speech processing, but there hasn't been a huge leap forward for sign language processing, though I feel there should be.
Deaf people are already really disadvantaged in many places, and getting left behind technologically doesn't help. I really resented looking for JSL books in the "disabled" section of the book stores in Japan, and when I spoke with some people about JSL, they didn't believe it was its own language. Even just linguistic work for sign languages is limited; I haven't seen a single reference grammar (i.e., comprehensive documentation) of any sign language. I think the difficulty of working with sign language data makes the field more daunting. (Paucity of speakers is certainly not a deterrent for linguists.)
Certainly, there are lots of apps leveraging the TrueDepth sensor on iPhones that reach an even higher level of fidelity as far as pose estimation is concerned.
The limiting factor for accuracy in a lot of these technologies is the actual rigging process of the characters, probably because that is very difficult to standardize or generalize across different geometries, art styles, animation drivers, 2D vs 3D, etc.