DensePose – Dense Human Pose Estimation in the Wild (github.com/facebookresearch)
298 points by calahad on June 19, 2018 | 86 comments



I find this highly interesting from a personal perspective, as I am an author of a Personalized Advertising patent[1] from the early 2000s. I was unable to capitalize on my work, and ultimately sold the patent. Interesting to note: the month the patents expired (due to the aggressive early filing dates of my original patents), I noticed Facebook's AR Kit was released, which touched multiple aspects of what was protected by the global patents.

This tech is key for Personalized Advertising, where consumers are inserted into still and video advertising in place of current spokespersons and side-by-side with celebrities. Advertising is about to get surreal and the fake news consumers are about to get exploited something unbelievable. "Deep Fakes" for porn is kid stuff compared to what this tech opens: Pandora's Box if you ask me.

[1] https://patents.justia.com/inventor/blake-senftner


Respect for your efforts - but I'm also amazed that apparently the tech was already there 18 years ago! Yet nothing was ever written about it.

I wonder what other technological breakthroughs are locked away behind patents.


>This tech is key for Personalized Advertising where consumers are inserted into still and video advertising in place of current spokespersons and side-by-side with celebrities.

What a boring dystopia.

Seriously though, if a company ever did this without my permission I would sue the pants off of them.


By creating a Facebook account you should assume they have your permission.


No, it's personal data, you need specific consent under the GDPR :)


The idea was to use them as fantasy fulfillment: take any A level film or music artist, and many consumers will pay for a video clip of them as Black Panther or you-name-it high grossing entertainment of the moment. You put yourself in, willingly, and for cost reduction allow product placements.


Even assuming that there isn't already some precedent that a checkbox allows them to use your uploaded likeness in whatever they want, is your premise here that you have more time and money to spend in a legal battle with Facebook or Google?


> Even assuming that there isn't already some precedent that a checkbox allows them to use your uploaded likeness in whatever they want, is your premise here that you have more time and money to spend in a legal battle with Facebook or Google?

There's currently a class-action lawsuit in progress against Facebook's use of facial tagging of Illinois residents: http://www.chicagotribune.com/business/ct-biz-facebook-taggi....

Regardless of how deep Facebook's pockets are, I could see another class-action lawsuit taking them on over recruiting its users into becoming uncompensated spokesmen in deepfake ads hawking products to their friends.

Also the legal situation in other jurisdictions may be less friendly to Facebook's usage of this technology, to the point where their deep pockets won't help them. I'm no international lawyer but I think that's definitely a possibility.


And there are _still_ people complaining that the GDPR is not a good idea...

If this becomes "a thing", I fully intend to use my UK citizenship and send GDPR boilerplate deletion requests to all the data brokers, social networks, and digital advertising services I can find.


> use my UK citizenship

Better hope it becomes a thing before March.


The issue is that Facebook, Google et al. already get a ton of consent for targeted/personalized advertisement and usage of your images and data. So you might not see this in a mall, but you will surely see it online.


Funny and scary that you describe it as a Pandora's box, as an author of a related patent! "Advertising is about to get surreal." Minority-Report-style ads, here they come..


I realized the potential, and wanted to steward the deployment in such a manner that civil society is not subjected to fake news, or privacy attacks such as public and private persons inserted into media against their will. The ultimate reason I was unsuccessful was my firm stance against pornography, which VCs, media partners and angels insisted on being able to exploit.


I'm sure it's little consolation, but kudos to you for sticking to your ethics. I sure wish more in the tech community would take this approach!


And what approach is that? Delaying the inevitable? :)


How long will it be before a resourceful individual will be able to meaningfully engineer a biological or synthetic pathogen that could wipe out a good portion of the human race? 1000 years? 200 years? 50 years?

There's nothing wrong with buying a little time while society catches up with the technology.


By around 15 years, not bad?


Much respect for making a moral stand! I suppose it's true that others will continue developing the "full potential" (including ethically questionable applications) of the technology, but at least one can make the choice to not contribute to undesired directions.


Personally, I'm waiting for AR UBlock. That would put the "augmented" in AR. :)


Would AR UBlock allow you to replace all the people in ads with people of your choice? So all spokesmen for me could be Gilbert Gottfried in a neon colored t-shirt?


I imagine it’d just cancel or black the ads out, I’d hope.


Someone here on HN had recently shared some hairstyles, facial accessories, and alterations that prevented common face detection implementations from recognizing your head, which seemed relevant and pretty cool from a tech punk perspective, too.


I wouldn't really blame a person for not realizing, back almost 20 years ago, the full extent to which this technology could be used, when coupled with similar tech for audio processing, cheap compute, and the world consuming information primarily from the Internet.


Interesting from a technical perspective, sure. Absolutely terrifying from any other perspective.


A few years ago, I would have thought this technical feat was amazing, and stopped there. Now I think it's creepy. Ah how times have changed...


Is there any other upside to this besides novelty and amusement? It's easy to imagine a dozen applications where this would threaten human privacy and safety.


> Is there any other upside to this besides novelty and amusement?

* Automated injury detection. You've got a warehouse, you've got cameras, now you've got an instant alert when one of your workers appears to be injured but out of sight of other workers. You've got street cameras, now you've got automatic detection of someone having a heart attack and lying down on a sidewalk. (Dystopian application: "homeless person detected, deploying zap-drones") Hospitals and old folks' homes could use this, too.

* Lifeguard Assist programs - automatic detection of drowning-like behaviors. (Of course, over-reliance on this would be bad...)

* Children separated from parents might be easier to detect in places like malls, etc. (I'm going to stop listing obvious parenthetical dystopian applications)


I see a huge possibility of replacing expensive mocap hardware and software using this tech, allowing for video games and VFX to become more accessible and use commodity hardware.

I’ve been eyeing things like the Kinect and iPhone X face tracking for this kind of task (for a fun side project I’m working on), but it would be great if I could track at least position and pose of multiple actors in a scene using just a standard webcam or camcorder.


Those are good examples. Basically surveying groups of people for body language that indicates a dangerous situation might be unfolding.


What's wrong with amusement? Seriously, I'd love to have this in videogames - to import myself and other people into the game. I'd also love to get my hands on a good, fully automated [series of photos] -> [textured 3D human model] pipeline, for anything from silly renders to 3D printing mementos.

Technology is inherently usable both for good and evil. You get both by default. It takes active countermeasures - usually non-tech, like regulations - to limit evil applications without sacrificing the good ones. As a society, we do that to some extent, but unfortunately we're not as successful as I'd like (e.g. if it were up to me, I'd seriously curtail the advertising industry).


I didn't say there was anything wrong with amusement. I just asked if there was any utility besides that. If its only value is amusement and the potential negatives are enormous, it's at least worth a discussion.

I don't think it's right to just shrug and say "everything can be used for good or bad". The details matter. If you're talking about a Yo-yo, yeah, you can hit people with it or just amuse yourself; nothing catastrophic is likely to happen. This tech though has greater implications.


The structure from motion stuff is good enough if you got a couple of pictures from around the human without the human moving (much), but I suspect getting enough pictures without the human moving too far might be difficult.


I imagine this could be developed to observe e.g. a budding powerlifter's form, pointing out any obvious errors and putting hordes of PTs out of work ;)


This is a key tech behind personalized advertising: replacing actors in ads with anyone. From an Advertising Industry perspective, this is Holy Grail ingredients.


Could you explain why this is the Holy Grail? As someone who's totally hostile to advertisement, my brain is not wired to actually envision that holy grail. I ask the question honestly: having my face on a picture with a celebrity rings zero bells for me (even if it was RMS :-))


Perhaps it is not meant for you, but for fans of a given media property whose popularity makes consumer insertion profitable. It is a different type of media, beyond product placements; consider the ability for fantasy fulfillment or impressionability. A lot of marketers would love to have such a technology. The idea is to offer it to consumers who willingly insert themselves into desirable scenarios for the media itself. A lot of consumers would do this, and the product placement capabilities are endless.


Hard to count that as a positive, but I see your point. Seems like it would benefit a very small group.


Driverless cars doing better at recognizing humans?


Or projecting human motion. That's a fair one.


There's nothing inherently creepy about it. I imagine it will be very useful for adding context to a scene - which is important for self-driving cars, AR, image classification, etc.

If you take a bunch of photos involving people doing things and extract pose information, I'd imagine it would be helpful in figuring out what's going on in other situations that are otherwise dissimilar.


CC-Non commercial is a disappointing license to find. The project is very cool otherwise.


I wonder how they arrived at Creative Commons' CC-BY-NC as the license. These licenses are not meant for code but for artwork; Creative Commons actually discourages the use of its licenses for code [1]. I recently noticed the same with the FastPhotoStyle code [2] by NVIDIA, so I'm wondering if there is something that draws their legal departments to this license?

[1]: https://creativecommons.org/faq/#can-i-apply-a-creative-comm...

[2]: https://github.com/NVIDIA/FastPhotoStyle


If the dataset it was trained on is CC-BY-NC, I'm pretty sure the model also has to be CC-BY-NC. However I think this is not respected, or even considered by most people.

I'd go with limiting how competitors can use it as the main deciding factor.


<cynical thought>Note that two of the three team members are from Facebook...


I'd never thought of this. It is a very interesting idea.


Hmmm... I guess it's been selected that way because it'd cover the model files and the datasets. It's not about the code as much as the datasets/models.


They want to be the only people who can sell this where the real money is: military and law enforcement.


discourage people that aren't just hobbyists messing around from using it?


yeah, that's what I came up with myself. But I thought a main point, if not the whole point of publishing code for these companies was to appeal to developer-types who are fond of real open source/science. And those should be able to tell the difference...

It's a bit like allowing your scientists to publish their research, but only in prohibitively expensive and thus exceedingly niche journals.


Have you tried contacting the author/copyright holder to find out whether you can negotiate a commercial licensing agreement for the software? I assume you're open to paying commercial licensing fees.


Cheaper than a patent.


Would it be possible to map the first detected pose to a 3D model, scale and deform it to match the pose, and then use each next pose to manipulate the 3D model (vs generating all the vertices again)? This should result in smooth animation, without artifacts, and joint limits might even help with position estimation.
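To make the idea above concrete, here is a minimal sketch of the per-frame update step: instead of regenerating the model, the previous pose is blended toward each new observation and then clamped to joint limits. Everything here is an assumption for illustration (the joint names, the limit values, and the `update_pose` helper are hypothetical, and nothing in it comes from DensePose itself).

```python
from typing import Dict, Tuple

# Hypothetical joint limits in degrees; real limits depend on the rig in use.
JOINT_LIMITS: Dict[str, Tuple[float, float]] = {
    "elbow": (0.0, 150.0),
    "knee": (0.0, 140.0),
}


def update_pose(
    previous: Dict[str, float],
    observed: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend the previous pose toward the new observation, then clamp to limits."""
    updated = {}
    for joint, prev_angle in previous.items():
        # Keep the old value if this joint was not detected in the new frame.
        target = observed.get(joint, prev_angle)
        blended = (1.0 - alpha) * prev_angle + alpha * target
        low, high = JOINT_LIMITS.get(joint, (-180.0, 180.0))
        updated[joint] = min(max(blended, low), high)
    return updated


if __name__ == "__main__":
    pose = {"elbow": 100.0, "knee": 20.0}
    # An implausible elbow reading gets pulled back inside the allowed range.
    print(update_pose(pose, {"elbow": 260.0}))
```

The blending is what would smooth out frame-to-frame jitter, and the clamp is where joint limits could reject impossible estimates; a real system would likely want something more principled than a fixed `alpha`.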


I'm too much of a newbie to figure this out but maybe someone here can tell me: Do they provide the final trained model? Or just a precursor model and code that trains a final model given bring-your-own data?


The site says "The dataset will soon be available on this website!", but apparently it's been saying that for at least four months.


Yes thanks. I was asking about the trained model though. I realize you can get the model by training with the data, but I don't believe for a minute that any data they release will be anything but a small fraction of what they trained their real model with.


Yes. From https://github.com/facebookresearch/DensePose/blob/master/GE...:

> DensePose should automatically download the model from the URL specified by the --wts argument


It can probably be used to identify potential criminals by the way they walk or by a threatening pose. Police can then screen them. Like the movie Minority Report.


Do you mean using gait analysis to identify humans and match them to a known criminal database or do you mean finding suspect criminals based on some “criminal” way they walk? If the latter, I don’t think that’s really based on anything more than current cultural profiling.


"Take that pep out of your step, citizen!"


I mean by tracking criminals' cell phone locations and watching how they walk. Checking the network effect, i.e. whether a person contacts a previous criminal in the same gang network. Matching photos from surveillance cameras against public social media pictures. By matching walking pose you can pretty much track down criminals.

The power is when you combine the different databases and build a profile of the person. It is very similar to how advertisement companies like Google build up profiles of customers (Gmail, search behavior, DNS name resolution tracking, cookie tracking), only in a different field.

It is probably even more powerful when you combine physical behavior with online behavior.


" I don’t think that’s really based on anything more than current cultural profiling."

Yes, but if those individuals were actually more likely to commit crime, the AI would learn those things anyhow, leaving us with the question: if a specific demographic is considerably more likely to commit crime, and the AI picks up on it, is the AI 'racist'? Because racism is a moral judgement, moreover, the intersectionalists would indicate that it also requires the notion of 'power'.

This is not some novel issue I think and will fast become a real ethical dilemma.


Society definitely needs more law-enforcers with a high false positive rate.


That product/service already exists in Japan. No idea how effective it is, but the company claims it uses deep learning to recognize suspicious behavior by retail customers and then alerts the staff to check.

Sorry I don't have a link. Saw it on a business news program.


Suddenly that movie's iris-scanning seems quaint.


Makes me wonder how much better they have in closed source to release this openly...


It is closed, the non-commercial license means almost no one can use it.


Previous thread (I hadn't seen it either): https://news.ycombinator.com/item?id=16289057


I _assume_ this would make it fairly easy to do bone-length estimation and comparison, leading to a way to uniquely identify someone from a video feed of them walking - even without any facial features...

Now I wonder how much of that tech is already deployed...
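As a rough illustration of the bone-length comparison described above, here is a minimal sketch. It assumes you already have named 2D keypoints per person; the joint names, the `BONES` list, and both helper functions are hypothetical and not part of any DensePose API.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical skeleton: pairs of joint names treated as bones.
BONES: List[Tuple[str, str]] = [
    ("hip", "knee"),
    ("knee", "ankle"),
    ("shoulder", "elbow"),
    ("elbow", "wrist"),
]


def bone_signature(keypoints: Dict[str, Point]) -> List[float]:
    """Return bone lengths divided by their sum, so image scale cancels out."""
    lengths = [math.dist(keypoints[a], keypoints[b]) for a, b in BONES]
    total = sum(lengths)
    return [length / total for length in lengths]


def signature_distance(sig_a: List[float], sig_b: List[float]) -> float:
    """Euclidean distance between two signatures; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sig_a, sig_b)))


if __name__ == "__main__":
    person = {
        "hip": (0.0, 0.0), "knee": (0.0, 4.0), "ankle": (0.0, 8.0),
        "shoulder": (1.0, -5.0), "elbow": (1.0, -2.0), "wrist": (1.0, 1.0),
    }
    # The same person seen twice as large in the frame.
    closer = {name: (2 * x, 2 * y) for name, (x, y) in person.items()}
    print(signature_distance(bone_signature(person), bone_signature(closer)))
```

Note that 2D lengths shrink when a limb points toward or away from the camera, so a real system would presumably need many frames or 3D estimates before such a signature could identify anyone.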


Gait analysis has been around for a while. I guess this technique just makes it even easier.

What's crazy to think about is Gait Analysis from orbit https://www.schneier.com/blog/archives/2008/09/gait_analysis...


I can imagine this tech being used in some pretty interesting/scary ways:

* Generating avatars in Facebook's VR land from photos you're tagged in

* Recognizing a person IRL from photos they're tagged in


No-one's said it yet so I will (but not my idea..) This is going to be super useful for a future Oculus AR headset.

Basically, imagine the current Oculus Go headset, but with cameras on the front, and instead of showing you the actual world, it shows you a game, based on the existing current world, but morphed to look like Starship Troopers or something.


Rainbows End delves deeply into this idea - everyone can choose their reality and share it with others.


The Diamond Age - Neal Stephenson, touched on this.


I think you mean Syndicate. ;)


You could achieve similar results for cheaper by taking LSD. There's the added benefit that your brain actually believes it's all real instead of merely perceiving it.


Since you mentioned LSD, I wonder if technology like this could one day replace hallucinogenic drugs? At least partially - I'm sure some people will always prefer the chemical experience. I've always been pretty interested in the effects of hallucinogens, but could never bring myself to try any of them due to the potential risks.

I wonder how much "your brain actually believes it's all real instead of merely perceiving it" matters... I know there have been a few times where I've gotten so sucked into a movie in a darkened theater that when it ends it's incredibly jarring to be brought back to the real world, and the times that's happened to me haven't even been 3D movies, let alone interactive like AR.


Speaking from personal experience with psychedelics, it's not really the visual hallucinations that are impactful, but the way your brain thinks about things in a completely different way. So I don't think you can replace that chemical experience with a visual experience.


What would be interesting is the effect of both together; a sort of Brave New World 2.0. Great fodder for the likes of Black Mirror, at least.


Seeing how we live in an age of decadence and self-indulgence - perhaps just driven by instinct to pursue pleasure/curiosity - I'm sure that this is a possible future development: chemically-enhanced augmented reality experiences. "Augmented" in more than one sense of the word.


There are a variety of technical things where I think "wow, that's pretty clever and/or innovative - hats off!", and then there are things like this where I'm like "OMG wizardry!". But that's one of the things I love about the field of "computer stuff": there's so much interesting stuff I don't know.


Is there any reason this would be restricted to human pose estimation, as opposed to, say, rangefinding of moving objects of a specific roughly-known size?


I can't tell - is this able to extrapolate pose information into three dimensions, or can it only project onto two dimensional scenes?


It's something in between the two possibilities you describe.

Each human pixel in the image is labeled with an index and two coordinates: x, y (u and v are the traditional names, but think of 2D x, y coordinates)

The index specifies which patch that pixel is on, and the x, y coordinates specify where on that patch the pixel lies. This is for a pre-specified set of patches that cover a human mesh. See https://github.com/facebookresearch/DensePose/blob/master/no... for more detail.

So, no, it does not extrapolate the full mesh, but also for all human pixels, you are getting 3D information.
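Here is a minimal sketch of what that per-pixel labeling looks like as a data structure. The layout is an assumption for illustration: each pixel is a `(patch_index, u, v)` tuple with patch index 0 meaning "not a human pixel", and the `lookup` and `pixels_per_patch` helpers are hypothetical rather than anything from the DensePose code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed per-pixel label: (patch_index, u, v); patch_index 0 means background.
Label = Tuple[int, float, float]


@dataclass
class SurfacePoint:
    patch: int  # which pre-specified patch of the body surface
    u: float    # position within that patch, first coordinate
    v: float    # position within that patch, second coordinate


def lookup(labels: List[List[Label]], row: int, col: int) -> Optional[SurfacePoint]:
    """Return the body-surface location for a pixel, or None for background."""
    patch, u, v = labels[row][col]
    if patch == 0:
        return None
    return SurfacePoint(patch, u, v)


def pixels_per_patch(labels: List[List[Label]]) -> Dict[int, int]:
    """Count how many human pixels fall on each patch."""
    counts: Dict[int, int] = {}
    for row in labels:
        for patch, _, _ in row:
            if patch != 0:
                counts[patch] = counts.get(patch, 0) + 1
    return counts


if __name__ == "__main__":
    # A tiny 2x2 "image": one background pixel and three human pixels.
    image = [
        [(0, 0.0, 0.0), (1, 0.25, 0.5)],
        [(1, 0.30, 0.6), (2, 0.90, 0.1)],
    ]
    print(lookup(image, 0, 1))
    print(pixels_per_patch(image))
```

The point is that each human pixel maps to a location on the body surface, which is more than a 2D keypoint but still short of a reconstructed mesh.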


So is this basically 3-D rotoscoping?


This just popped up in my github feed, looks interesting.


Leave it to FB to come up with the most creepy and invasive tracking and surveillance.



