An On-Device Deep Neural Network for Face Detection (machinelearning.apple.com)
187 points by olivercameron on Nov 16, 2017 | 50 comments



I really have to applaud Apple for doing these things on the device instead of on their cloud servers. They respect your privacy far more than most large tech companies (I would just say “they respect your privacy, others don’t”.)


how would they do it on the server? no internet connection = can't use the phone?


I don't have the time at the moment to check whether or not Apple uses this approach; however, Google went about solving offline functionality in a clever way. https://research.googleblog.com/2015/07/how-google-translate...


That’s exactly why I prefer on-device computation! The internet is not always available


there's such a thing as a pin. people keep forgetting.


What are the privacy implications of doing this vs. something like Google Photos? I mean iCloud also has my pictures.


They discuss photo and video storage on iCloud specifically:

> Apple’s iCloud Photo Library is a cloud-based solution for photo and video storage. However, due to Apple’s strong commitment to user privacy, we couldn’t use iCloud servers for computer vision computations. Every photo and video sent to iCloud Photo Library is encrypted on the device before it is sent to cloud storage, and can only be decrypted by devices that are registered with the iCloud account.


So if I throw out all my Apple devices and forget my current iCloud password, the photos are gone?

Or does Apple store the encryption key for me somewhere as a convenience, so that once I get new devices and reset my iCloud password, I regain access?


They’re gone


That's messed up and makes it certain that I will never use iCloud. Losing my family photos is a much worse outcome than having them accessed by a hacker.


That is what you get when you want real privacy protection: no one but you should be able to access that data. That being said, iCloud is not really a backup, just file storage with syncing features; you should use another service (local or external), like Backblaze or a NAS, to back up your data.


Apple allows you to reset your password if you forget it. This feature is why it can respond to subpoenas for iCloud data like iMessages.

It's an unfortunate tradeoff.


Don't put all your eggs in one basket


OK cool. That's what I was hoping.


What do you want to happen if you lose all your devices? Back up your damn data.


But that could technically be encrypted and still provide the same set of features.


This appears to be what powers the basic face detection in the Vision framework, not what powers Face ID: https://developer.apple.com/documentation/vision

It is not capable of identifying individual people, just recognizing faces and facial features in an image. Still quite useful for things like Snapchat's masks.
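For anyone curious, this is exposed as public API; a minimal sketch of asking Vision for face bounding boxes looks roughly like this (the image path is just a placeholder, and error handling is stripped down):

    import Foundation
    import Vision

    // Detect face bounding boxes in a still image.
    let imageURL = URL(fileURLWithPath: "photo.jpg")  // placeholder path

    let request = VNDetectFaceRectanglesRequest { request, error in
        guard let faces = request.results as? [VNFaceObservation] else { return }
        for face in faces {
            // boundingBox is normalized (0...1), origin at the bottom left.
            print("Face at \(face.boundingBox), confidence \(face.confidence)")
        }
    }

    let handler = VNImageRequestHandler(url: imageURL, options: [:])
    try? handler.perform([request])

There's also VNDetectFaceLandmarksRequest if you want eye/nose/mouth positions, which is the part a Snapchat-style mask would actually use.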


It was also used to assist blind users in their efforts to use the built-in camera app.

> VoiceOver will then use face detection to tell you how many people are in the frame. It will also say if a face is small, and where in the frame a face or faces are located in case you want to try and better center them. When you move, it will tell you the new framing, so you can figure out if you're getting closer to the shot you want.

    One face. Small face. Bottom left corner.
https://www.imore.com/making-iphone-camera-work-blind


Does this work better than Google's on-device face detection? I was experimenting with it yesterday and it seems to work pretty well, even in low light.

https://developers.google.com/android/reference/com/google/a...


In my limited trials it was not terribly performant, running at only 15 fps or so.


I would consider that pretty performant for a task such as this on a phone.


Compared to non-DNN techniques? I don’t have concrete numbers but I’m pretty sure “classical” techniques can do this with much higher throughput.


With equal accuracy?


If the non-NN techniques are faster, but the NN techniques are more accurate, couldn't they be combined to provide both speed and accuracy?

Use the NN to find the faces, and the non-NN to track faces once the NN has found them, and use the NN to periodically check to see that the non-NN is still locked on.


Does that work? A NN might detect half a face, but then how do you switch over to the "find two dots" technique when there's only one eye? This seems susceptible to a lot of flapping.


I'd expect that the non-NN part would be more of a "track movement of this arbitrary blob" thing rather than a "track movement of this face" thing.

Suppose the NN runs at only 25% of the speed you need to support the frame rate you want. Then every time you get a new face blob list from the NN, the non-NN tracker would only need to track the blobs for 3 frames. My guess is that in most common photography situations where you need face detection, faces won't move very far or change orientation or lighting very much in 3 frames.
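For what it's worth, Vision already ships both halves of that split: the face detector discussed in the article and a generic object tracker you can seed with the detected boxes. A rough sketch of the idea (the every-4th-frame interval is arbitrary, and real code would need some hysteresis for the flapping concern above):

    import Vision
    import CoreVideo

    // Run the DNN face detector only every Nth frame; track the last known
    // boxes with Vision's generic object tracker on the frames in between.
    final class HybridFaceTracker {
        private let sequenceHandler = VNSequenceRequestHandler()
        private var trackers: [VNTrackObjectRequest] = []
        private var frameCount = 0
        private let detectEvery = 4  // arbitrary re-detection interval

        func process(_ pixelBuffer: CVPixelBuffer) {
            defer { frameCount += 1 }

            if frameCount % detectEvery == 0 {
                // Heavy path: DNN detection re-seeds the trackers.
                let detect = VNDetectFaceRectanglesRequest()
                try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([detect])
                let faces = detect.results as? [VNFaceObservation] ?? []
                trackers = faces.map { VNTrackObjectRequest(detectedObjectObservation: $0) }
            } else {
                // Cheap path: frame-to-frame tracking of the last known boxes.
                try? sequenceHandler.perform(trackers, on: pixelBuffer)
            }

            for tracker in trackers {
                if let box = tracker.results?.first as? VNDetectedObjectObservation {
                    tracker.inputObservation = box  // continue tracking from the latest box
                    print("Box: \(box.boundingBox)")
                }
            }
        }
    }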


On what device? The A11 chip is far more performant than previous gens.


I understand this article is about face detection, but on a tangential note, face identification against known faces in the Photos app has been laughable for me on iOS 10 (haven’t checked if this has improved in iOS 11, since I don’t have many new photos with new faces). Even with some manual training, the face identification and name matching seemed quite primitive, getting most of its guesses wrong (for example, matching kids’ faces against adults’ faces).

That said, I do strongly admire and appreciate Apple’s stance on privacy and the software doing these kinds of work on the device and not on the cloud. I can wait for these things to improve.


For me the Faces feature in iOS Photos has been pretty amazing. Accurately detecting blurry background faces is surprising. I just checked, and it picked out a picture of me where just my nose and part of a cheek are showing (aviator sunglasses and a scarf covering my face)!


I feel like this article reads partly like an AI paper and partly like a marketing humble brag: "back in 2014 when we were doing X", which seems to be trying to say "we were doing DNN face detection before Google and Facebook made it popular". But if you read papers from Google, FB, MS, et al. on AI, they don't have these kinds of humble brags, and they contain far more citations of previous work in the field.


That seems like a particularly cynical reading :) I can't say I got the same impression.


I may be mistaken, but I read this article more as a technical blog post than as a paper. It seems to me it is more akin to the usual technical blog posts written by big tech companies than to a paper formally submitted to an academic journal.


Well, the way Apple publishes their results means that other people always have to refer to the "Apple Journal" when they cite those results, whereas Apple does not refer to articles published by Google and Facebook in that way (by naming them). There's not a single reference to Google or Facebook on that blog (or journal, as they like to call it).


Could that one refer to the Photos.app (was it in iPhoto already?) feature that matches faces?


Yep, plenty of it:

> However, due to Apple’s strong commitment to user privacy, we couldn’t use iCloud servers for computer vision computations. Every photo and video sent to iCloud Photo Library is encrypted on the device before it is sent to cloud storage, and can only be decrypted by devices that are registered with the iCloud account.


What do you want them to say?

Something like "We didn't use an already-existing, massive database of ours because [REDACTED]."

Despite my somewhat facetious alternative, I am genuinely interested in how you think they could have worded that 'better'.


They could have just matter-of-factly described their algorithm. How many papers do you read from the ACM that start off with a long, marketing-oriented set of paragraphs justifying why the paper's algorithm was written?

Like, if you're reading a paper on style transfer using DNNs, you don't get "Well, we wanted to paint replicas of these paintings, but due to corporate issues, our motto, or our lack of impressionists, we were forced to invent this algorithm..."

I just felt like there was an excess of Apple branding and marketing in what's supposed to be a scientific paper. Yes, Google does this too, but on its blogs, not in the actual CS papers it publishes.

The papers on MapReduce or Dremel aren't full of humble brags.


This isn't a research paper. Apple never claimed it to be one. It's not in some scientific journal. It's not peer reviewed. It doesn't contain any level of detail. It lacks any of the sort of norms you would expect from a paper. It's even hosted on Apple.com in a blog style format.

So why are you holding it to a research-paper standard?


Maybe because I expect research to be published not just as marketing fluff, but also as papers. Apple was for a long time known for blocking their AI researchers from participating in conferences or publishing. They published their first paper only recently, in 2016 (https://www.macrumors.com/2016/12/26/apples-ai-team-first-pa...), and it seems like squeezing water from a rock.

This behavior rubs me the wrong way. Science is a collaborative community endeavor. Look at how many papers Google has published (https://research.google.com/pubs/papers.html): 874 in machine intelligence alone, arguably Google's main secret sauce and something you could argue they should keep as a trade secret for competitive differentiation.

Apple's secrecy in product development is fine, but IMHO, if you're consuming the fruits of community academic and commercial research, and trumpeting your products' advancements based on it, it behooves you to publish the papers more openly, at least on preprint services like arXiv.


Back in 2014, teams at Apple weren't allowed to release papers. If they had been, you wouldn't be making this comment, since you would be including Apple in the list of companies that do this. Not that this is even a paper.

And who cares about Google and Facebook anyway? They didn't invent this technology, nor were they even the first to incorporate it into shipping products.


Pushing human-identification capable software and libraries everywhere is starting to get pretty creepy...


Which is why it's nice that Apple does this on-device, rather than the Google approach of hoovering up your data and letting them loose on it.


Not sure if this was your implication, but this is not capable of identifying people in the sense of putting a name to a face, just identifying where there is a face.


Photos.app is capable of finding faces and then categorizing them though. You put a name on it, and it can even be tied to a contact. Dunno if it’s related to this paper.


Right. And it's used for things like helping the camera make sure it keeps people in focus.


What it is used for is unrelated to what its capable of doing. The danger always lies in the capability, not the intent.


This is not face recognition.


"Every photo and video sent to iCloud Photo Library is encrypted on the device before it is sent to cloud storage"


No quantization? There's a lot of optimization being left on the table by doing float math at inference time.
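For context on what's being left out: the usual trick is to store weights as 8-bit integers plus a per-tensor scale and dequantize (or stay in integer math) at inference time. A toy sketch of symmetric 8-bit quantization, purely illustrative and not anything the article describes:

    // Toy symmetric 8-bit quantization: Int8 values plus one Float scale per tensor.
    func quantize(_ weights: [Float]) -> (values: [Int8], scale: Float) {
        let maxAbs = weights.map { abs($0) }.max() ?? 0
        let scale = maxAbs > 0 ? maxAbs / 127 : 1
        let values = weights.map { Int8(clamping: Int(($0 / scale).rounded())) }
        return (values, scale)
    }

    func dequantize(_ values: [Int8], scale: Float) -> [Float] {
        return values.map { Float($0) * scale }
    }

    // Roughly 4x smaller than Float32 storage, at the cost of some precision.
    let (q, s) = quantize([0.12, -0.80, 0.33])
    print(dequantize(q, scale: s))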


Isn't this part of every consumer camera and phone already?



