Swift Homomorphic Encryption

tedunangst · 2024-07-30T18:17:05.000000Z

I feel like phone number lookup is the textbook example of homomorphic encryption not actually working because there's so few keys you can simply enumerate them.

colmmacc · 2024-07-30T18:24:33.000000Z

I think here the query exposes who called who, which isn't as enumerable. By encrypting the query homomorphically on the client, the answering service has no knowledge of what number the lookup is for, and so Apple can't build a database of who calls you.

tedunangst · 2024-07-30T18:31:27.000000Z

It includes both numbers? That wasn't clear. It sounded like they're just looking up the calling number for fancy caller id. How does the recipient affect the query?

colmmacc · 2024-07-30T18:37:22.000000Z

The lookup has to come from your phone, which implicitly discloses you. Encrypting the caller-id symmetrically or asymmetrically wouldn't get privacy, because the receiver would have the decryption key. Hashing it, even with a seed, would be dumb because it's enumerable as you pointed out. But encrypting the entire query does work, because it becomes on a homomorphic search on uniform looking data. So the receiver has no idea what you queried.

That said, research on things like memory access side-channels is thin here. Like if I take a guess and try a query for my guess number, are there timings there that could be exploited because of cache effects? I have no idea, but a lot of PIR schemes have fallen to this.

tedunangst · 2024-07-30T18:42:18.000000Z

Okay, I figure if Apple wanted, they could simply query every number and see which disk blocks get read. But now maybe I'm confused. They read the whole database on every query?

colmmacc · 2024-07-30T18:54:45.000000Z

My understanding of encrypted search in FHE is that there can be multiple copies of the same search key, and that search keys aren't simply in-place encrypted versions of themselves - as encrypted fields in a database are - but are mappings embedded in a higher dimensional space that is encrypted.

That reads like sci-fi nonsense, but the "on the metal" result is that a search key will translate to a set of memory locations that are combined to determine the match, and a separate query for the same search key will translate to a different (but potentially overlapping) set of memory locations that produce the same result.

MBCook · 2024-07-30T20:25:23.000000Z

Do I have this right?

If the server could actually decode things it would’ve gotten something that could be decrypted into let’s say 15 phone numbers. A little bit like if they were hashed, to simplify.

So the answer the server returns isn’t who the phone number belongs to, it’s who those 15 phone numbers belong to.

And then the client can decrypt it and just get the one that it cares about.

But unlike if you were just doing hash buckets no one who saw the data going back-and-forth could actually understand any of it. Correct?

Is this only really good for data to look up cases? I had thought homomorphic encryption could be used to do actual calculations, at least certain kinds.

ElFitz · 2024-07-31T05:16:44.000000Z

> I had thought homomorphic encryption could be used to do actual calculations, at least certain kinds.

Well yes. There was this:

- https://news.ycombinator.com/item?id=21638639

- https://news.ycombinator.com/item?id=31933995

- https://azeemba.com/posts/homomorphic-encryption-with-images...

And quite a few more.

fragmede · 2024-07-30T20:49:09.000000Z

Theoretically it can, but the tech isn't quite there yet, so we don't know for sure.

GTP · 2024-07-31T08:43:08.000000Z

AFAIK (but my knowledge is 2/3 years out of date) the problem with general computation is that current schemes are still too slow for parctical use. But recent developments give hope that it could be doable in the future.

silasdavis · 2024-07-30T22:05:06.000000Z

I'm not sure what enumeration attack you have in mind, but if you were to encrypt the same value many times you would not get the same ciphertext under most schemes.

willseth · 2024-07-30T18:23:04.000000Z

The novelty is that the server processing the phone number can perform the lookup without actually knowing the phone number or whether it matched.

Dylan16807 · 2024-07-30T18:27:31.000000Z

Are you thinking of hashing?

As far as I'm aware homomorphic encryption can keep even a single bit safe, but maybe I missed something.

scosman · 2024-07-30T22:04:21.000000Z

add a seed.

golol · 2024-07-30T19:04:22.000000Z

I find homomorphic encryption fascinating as it can in some sense move a simulation into an inaccessible parallel universe.

Jerrrrrrry · 2024-07-30T23:23:53.000000Z

> move a simulation into an inaccessible parallel universe.

more like, "move a computation into an progressed, but still unknown, state"

tpurves · 2024-07-30T20:20:23.000000Z

Anyone interested in FHE should also be checking out https://www.zama.ai they've made a ton of progress recently in making FHE practical.

bluedevilzn · 2024-07-30T18:13:32.000000Z

This must be the first real world use case of HE. It has generally been considered too slow to do anything useful but this is an excellent use case.

osaariki · 2024-07-30T18:50:15.000000Z

Edge's Password Monitor feature uses homomorphic encryption to match passwords against a database of leaks without revealing anything about those passwords: https://www.microsoft.com/en-us/research/blog/password-monit... So not the first, but definitely cool to see more adoption!

cedws · 2024-07-30T20:38:23.000000Z

This is nicer than the k-anonymity algorithm that Have I Been Pwned uses, but probably an order of magnitude more expensive to run.

dagmx · 2024-07-30T19:05:12.000000Z

I believe Safari does the same as well, so not even technically the first at Apple if I’m correct?

Groxx · 2024-07-30T19:35:09.000000Z

After reading the technical details... I'm really not sure tbh: https://support.apple.com/guide/security/password-monitoring...

I'm sure someone here understands that and can answer conclusively, but that's not me today.

nightpool · 2024-07-31T01:11:36.000000Z

No, the Apple article says that they're using the much faster and easier to implement k-anonymity strategy

MBCook · 2024-07-30T18:47:26.000000Z

I tried to look homomorphic encryption up casually earlier this year. I saw references that it was being used, but I don’t think they said where.

This is one topic I have a very hard time with, I just don’t know enough math to really grok it.

It just seems crazy a system could operate on encrypted data (which is effectively random noise from the server’s point of view) and return a result that is correctly calculated and encrypted for the client, despite never understanding the data at any point.

I sort of understand the theory (at a very simple level) but my brain doesn’t want to agree.

oblvious-earth · 2024-07-30T19:16:49.000000Z

Maybe it’s the fact it can be done with multiple operators and strong encryption that is hard to grok, but at least here is a very simple example of a limited partially homomorphic encryption:

You have a 7-bit character representation (e.g. ASCII) and your encryption is to add 1 mod 128. E.g. 0 -> 1, 1 -> 2, ... 126 -> 127, 127 -> 0.

As it turns out, all your operations can be represented as adding or subtracting constants. You can now encrypt your data (+1), send it to a remote server, send all the adding and subtracting operations, pull back the processed data, decrypt the data (-1).

Of course, this example is neither useful encryption nor generally useful operation, but can be useful for grokking why it might be possible.

kmeisthax · 2024-07-30T19:40:25.000000Z

Let's say I want you to add two numbers, but I don't want you to know what those numbers are, nor what the result is. What I can do is multiply both numbers by some other number you don't know. I then give you the premultiplied numbers, you add them, and give back a premultiplied answer. I can then divide out the number to get the true result.

What we've done here is this:

(a * key) + (b * key) = (c * key)

The rules of elementary algebra allow us to divide out the key on both sides because of a few useful symmetries that addition and multiplication have. Namely, these two equations are always the same number:

(a + b) * key = (a * key) + (b * key)

This is known as the distributive property. Normally, we talk about it applying to numbers being added and multiplied, but there are plenty of other mathematical structures and pairs of operations that do this, too. In the language of abstract algebra, we call any number system and pair of operations that distribute like this a "field", of which addition and multiplication over real[0] numbers is just one of.

A simple example of a field that isn't the normal number system you're used to is a 'finite field'. To visualize these, imagine a number circle instead of a line. We get a finite field by chopping off the number line at some prime[1] number that we decide is the highest in the loop. But this is still a field: addition and multiplication keep distributing.

It turns out cryptography loves using finite fields, so a lot of these identities hold in various cryptosystems. If I encrypt some data with RSA, which is just a pair of finite field exponents, multiplying that encrypted data will multiply the result when I decrypt it later on. In normal crypto, this is an attack we have to defend against, but in homomorphic crypto we want to deliberately design systems that allow manipulation of encrypted data like this in ways we approve of.

[0] Also complex numbers.

[1] Yes, it has to be prime and I'm unable to find a compact explanation as to why, I assume all the symmetries of algebra we're used to stop working if it's not.

tjade273 · 2024-07-31T13:57:42.000000Z

A set of numbers with addition and multiplication is a _ring_ and is a _field_ only if every nonzero number has a multiplicative inverse.

Integers mod N for composite N (like in RSA) form a ring, not a field - if N = p*q then there is is no well-defined division by `p` or `q` (for example, there is no element corresponding to `2/p`).

When p is prime, every nonzero integer `x` has a multiplicative inverse `1/x` mod p. That's why the integers mod `p` form a field (denoted F_p).

In fact there is another kind of finite field that has `p^n` elements where `p` is any prime and `n` is any positive integer. These fields are not composed of the integers mod p^n, but are made of polynomials of degree `n` with coefficients in F_p.

innermatrix · 2024-07-31T03:21:21.000000Z

Finite fields have multiplicative inverse only if the base is a prime, and that's what makes the rest of your usual algebra work (or not).

For example, for N = 5, 0 * 2 = 2, 1 * 2 = 2, 2 * 2 = 4, 3 * 2 = 1, 4 * 2 = 3, so the inverse of "* 2" is uniquely defined. On the other hand, for N = 4, 0 * 2 = 0, 1 * 2 = 2, 2 * 2 = 4, 3 * 2 = 2, so the inverse of "* 2" is not uniquely defined.

rozap · 2024-07-30T23:37:59.000000Z

Thanks, this is a nice explanation. Appreciate the clarity.

nightpool · 2024-07-31T01:09:59.000000Z

Second: Google Recaptcha Enterprise can use Homomorphic Encryption to check whether your password has been compromised (searching the set of all breached passwords without disclosing which individual password you want to check)

Now, in practice, HaveIBeenPwned does the exact same thing with a k-anonymity scheme based off of MD5 collisions, which is wayyyy easier in practice and what most people actually deploy, but the Google thing is cool too.

7e · 2024-07-30T18:57:58.000000Z

A TEE would be a cheaper and more straightforward solution, though.

saagarjha · 2024-07-30T20:45:12.000000Z

They also mean if you break the TEE then your privacy guarantees are lost. This, of course, has happened many times.

glenngillen · 2024-07-31T03:08:44.000000Z

I believe Cipherstash is using HE to do what they do: https://cipherstash.com

kmdupree · 2024-07-31T17:59:17.000000Z

it says on their webpage that they aren't using HE

tiffanyh · 2024-07-30T19:07:47.000000Z

This is hugely significant (long-term), that won't be felt immediately.

This is a massive announcement for AI and use cases related to PII.

oulipo · 2024-07-31T08:21:27.000000Z

How does it compare to the FHE from https://zama.ai ?

rhindi · 2024-07-31T08:34:22.000000Z

They use BFV, which is an FHE scheme allowing a limited number of fast additions and multiplications (enough for their use case).

Zama uses TFHE, which allows any operation (eg comparisons) with unlimited depth.

So if you only need add/mul, BFV, BGV and CKKS are good options. For anything else, you better use TFHE

jayavanth · 2024-07-31T10:24:33.000000Z

I was curious about that choice as well. I guess they also just wanted to operate on integers and not floats

gumby · 2024-07-30T18:23:58.000000Z

The name is hilarious because HME is anything but speedy -- by many orders of magnitude.

I think the real fix is secure enclaves, and those have proven to be difficult as well.

karulont · 2024-07-31T00:16:52.000000Z

There was a recent paper that also uses Swift in the name:

“Cheddar: A Swift Fully Homomorphic Encryption Library for CUDA GPUs” - https://arxiv.org/pdf/2407.13055

We were a little worried, but quickly discovered that they used Swift as an adjective not as a programming language.

[Disclosure: I work on the team responsible for the feature]

Someone · 2024-07-30T18:35:09.000000Z

> I think the real fix is secure enclaves

FTA: “Live Caller ID Lookup uses homomorphic encryption to send an encrypted query to a server that can provide information about a phone number without the server knowing the specific phone number in the request”

So, this would require a distributed Secure Enclave or one of them on Apple’s server communicating with one on an Apple device (likely, certainly over time, with lots of different Apple devices fo lots of different iCloud accounts)

dllthomas · 2024-07-30T19:41:12.000000Z

I don't see why it would? IIUC, the promise of homomorphic encryption is that I can encrypt my database of contacts and send it to an untrusted server, later send the encrypted query to that untrusted server, and get back an encrypted response, without the server being able to tell anything that couldn't be told from the wire (some bounds on how much data, timing of communication, that sort of thing) or provide an incorrect answer.

tempay · 2024-07-30T20:04:46.000000Z

That's not the use case mentioned here. The example given is blocking known spam callers and displaying identity information on the incoming call screen. To do this without homomorphic encryption requires the entire DB to be sent to every client. Even if size wasn't an issue (which it is), it's hard to update it frequently.

Homomorphic encryption means you can ask Apple "who is calling me" without Apple knowing who is calling you.

MBCook · 2024-07-30T20:30:09.000000Z

Not really. You could do it the way the Have I Been Pwned database works.

You hash your query and then send only the first X number of bits.

The server returns all results that hash up to that same first X number of bits.

The server doesn’t know exactly what number you were looking for, and you don’t have to download the entire database.

But in this case the server WOULD be able to figure out the set of possible phone numbers you were asking about. Because of the complexity of passwords the search space would be a lot larger.

So privacy wise this does seem better.

tempay · 2024-07-30T21:10:49.000000Z

Indeed, I wasn't clear enough in my original message that it was under the assumption that you want to keep the caller 100% private from Apple.

Though there is a valid argument that you're still leaking information (e.g. "Person X received a call at 21:05:43"), but I'm not sure how you could possibly make an API that avoided that given the time sensitive nature of identifying callers.

fboemer · 2024-07-31T00:08:40.000000Z

https://developer.apple.com/documentation/sms_and_call_repor... has some more details on privacy in Live Caller ID Lookup.

[Disclosure: I work on the team responsible for the feature]

tempay · 2024-07-31T14:08:20.000000Z

Thanks, that was an interesting read. Seems like a nice solution with the pragmatic trade off with trusting there isn’t collusion between Apple and the third party.

Cyphase · 2024-07-31T08:58:20.000000Z

The client can constantly and at random intervals make lots of chaff queries to the API, so the service doesn't know which are real calls and which aren't. The client knows it's incoming calls history, so it can make sure it's chaff queries are statistically convincing.

For instance, if you often receive a call at the same time of day, that could be a detectable signal in the noise, unless the client then creates a lot of similar fake signals in the noise.

dllthomas · 2024-07-31T07:09:44.000000Z

Ah, interesting. Yeah, they seem to point at "Private Information Retrieval" but I don't see where they explain how they do it? I see some mention of "cleartext inputs" and I guess if you can do that you can pass the whole DB that way? That sounds expensive, but I think if you want to actually keep the number completely confidential it has to touch every record - otherwise you could look at the access pattern...

shortstuffsushi · 2024-07-30T18:58:33.000000Z

I think Swift in this case is just referring to the programming language, Swift, and not a characteristic of the encryption library itself

dllthomas · 2024-07-30T19:38:38.000000Z

Right, but that doesn't make it not funny.

ganyu · 2024-07-30T19:57:04.000000Z

At least 10^4 times slower than raw code, i think

That makes HE anything but Swift (

bawolff · 2024-07-30T20:50:36.000000Z

Its like high-temperature super conductors, its all relative.

layer8 · 2024-07-30T19:06:21.000000Z

I didn’t look at domain at first and ended up being quite disappointed. :)

ReptileMan · 2024-07-30T20:04:58.000000Z

What is the processing that the server does on the encrypted phone number? I am not sure I understand. I always thought that this type of encryption was (roughly and imprecisely) - you send some encrypted blob to the server, it does some side effect free number crunching on the blob and returns the output blob. You decrypt the blob and everyone is happy.

But to return information if some number is spam it has to be either plaintext or hashed condition somewhere outside of the phone?

fboemer · 2024-07-31T00:12:31.000000Z

https://news.ycombinator.com/item?id=41115179 give some intuition. The server database is stored in plaintext, but the server response will be encrypted under the client's key.

[Disclosure: I work on the team responsible for the feature]

dboreham · 2024-07-30T20:37:28.000000Z

The "side effect free number crunching" in this case is: is <encrypted_phone_number> in <set_of_encrypted_bad_numbers>

You're on the right track with the idea of hashing -- I find it helpful to explain any fancy encryption scheme beginning with "if it were just hashing", then extend to "well this is a very fancy kind of hash", and <poof> now I kind of understand what's going on. Or at least it's no longer magic.

saagarjha · 2024-07-30T20:48:18.000000Z

I don't think the set of bad numbers needs to be encrypted.

vlovich123 · 2024-07-30T23:29:02.000000Z

It does - otherwise you would know which numbers are queried to process the query, letting you narrow things down (ie huge side channel and thus not HE anymore).

saagarjha · 2024-07-30T23:44:42.000000Z

How so? You can just query all the numbers and discard results you don't want.

vlovich123 · 2024-07-31T18:00:56.000000Z

Sure, you can query the database all you want. The important property is that the server cannot observe the client querying the database - processing a query occurs in an encrypted space that it does not have the keys to. Similarly, one would expect that each query, even if it's for the same phone number, would be observed to be reading randomly from the database each time.

yalogin · 2024-07-31T00:35:45.000000Z

FHE is cool but I wonder how many use cases it actually fits. Don’t get me wrong, it gives better security guarantees for the end user but do they really care if the organization makes a promise about a secure execution environment in the cloud?

Also from an engineering point of view, using FHE requires a refactoring of flows and an inflexible commitment to all processing downstream. Without laws mandating it, do organizations have enough motivation to do that?

kybernetikos · 2024-07-31T07:13:53.000000Z

I think the main thing that throws it into question is when you get the software that sends the data to the service and the service from the same people (in this case apple). You're already trusting them with your data, and a fancy HE scheme doesn't change that. They can update their software and start sending everything in plain text and you wouldn't even realise they'd done it.

FHE is plausibly most useful when you trust the source of the client code but want to use the compute resource of an organisation you don't want to have to trust.

bobbylarrybobby · 2024-07-31T03:30:08.000000Z

I assume companies like it because it lets them compute on servers they don't trust. The corollary is they don't need to secure HE servers as much because any data the servers lose isn't valuable. And the corollary to that is that companies can have much more flexible compute infra, sending HE requests to arbitrary machines instead of only those that are known to be highly secure.

nightpool · 2024-07-31T01:01:09.000000Z

> but do they really care if the organization makes a promise about a secure execution environment in the cloud?

Uh... demonstrably yes? No "secure execution environment" is secure against a government wiretap order. FHE is.

chipsrafferty · 2024-07-31T17:59:51.000000Z

Unless the operating system for iPhones is open source and one can verify which version they have installed, users can't really be sure that Apple is doing this. They could just say they are doing things to protect user's privacy, and then not, and sell their data.

nightpool · 2024-08-02T22:59:12.000000Z

> Unless the operating system for iPhones is open source and one can verify which version they have installed

There are a lot of security engineers out there reverse engineering Apple's iOS versions and payloads, especially ones installed on the phones of activists and other dissidents who may be under government surveillance. While in theory Apple could build a compromised OS and serve it only to a single IP or whatever, the reputational risk if they were to be discovered would be enormous. Compared to when the processing is happening on Apple's servers, where it's impossible to tell for sure if you're being wiretapped, there's just too much of a risk of detection and tipping off the target.

nmadden · 2024-07-30T20:07:48.000000Z

The thing that I always want to know with FHE: the gold standard of modern encryption is IND-CCA security. FHE by definition cannot meet that standard (being able to change a ciphertext to have predictable effects on the plaintext is the definition of a chosen ciphertext attack). So how close do modern FHE schemes get? ie how much security am I sacrificing to get the FHE goodness?

GTP · 2024-07-30T21:22:46.000000Z

Is the used scheme fully homomorphic encryption or just homomorphic wrt a specific operation? Because they only mention "homomorphic" without the "fully".

fboemer · 2024-07-31T00:04:57.000000Z

Swift Homomorphic Encryption implements the Brakerski-Fan-Vercauteren (BFV) HE scheme (https://eprint.iacr.org/2012/078, https://eprint.iacr.org/2012/144) (without bootstrapping). This is a leveled HE scheme, which supports a limited number of encrypted adds and multiplies (among other operations).

[Disclosure: I work on the team responsible for the feature]

Jommi · 2024-07-31T06:22:15.000000Z

That’s awesome. I’m part of a cryptography group working on more applied uses of homomorphic encryption, is there a way to contact you?

nmadden · 2024-08-01T07:47:43.000000Z

With respect to IND-CCA, it doesn’t matter. Neither is compatible.

hansvm · 2024-07-30T21:38:34.000000Z

You can't attain IND-CCA2 (adaptively choosing cyphertexts based on previous decryptions). You can attain IND-CCA1 (after a decryption oracle, you're done fiddling with the system).

nmadden · 2024-07-30T21:55:11.000000Z

Right, but IND-CCA1 is kind of a toy security goal though. A sort theoretical consolation prize if you can’t achieve the real thing. And AFAICT, no actually implemented schemes do obtain even CCA1?

hansvm · 2024-07-31T00:31:41.000000Z

Sure, but that's "how much security you're sacrificing to get the FHE goodness," and, as always in crypto systems, implementations might not be that good.

> A sort theoretical consolation prize if you can’t achieve the real thing

The real thing exists largely because it makes proofs easier. For something like FHE you can bolt on some extra user-space features to build something like IND-vCCA (your decryption oracle refuses to operate if the result was not obtained by executing the right algorithm on the right ciphertext), which may or may not make FHE suitable for this or that target application. It's not a weak property though.

nmadden · 2024-07-31T06:06:47.000000Z

> The real thing exists largely because it makes proofs easier.

I would not say that. It exists because practical padding oracle attacks (which are adaptive CCA) have been known for decades. CCA2 very much captures real-world attacks. Is there any realistic attack that is captured by CCA1? (Or vCCA).

Padding oracle attacks also generalise to any kind of parsing after decryption. Padding tends to be studied because it is independent of any particular format/application and also part of several encryption scheme definitions. The definition of CCA2 captures very realistic scenarios - almost all applications do some parsing after decryption and so are quite likely to reveal an oracle. Would vCCA also capture such attacks?

MzxgckZtNqX5i · 2024-07-31T08:18:42.000000Z

While it might not provide a direct answer to your question, this paper could be an interesting read: https://eprint.iacr.org/2021/1624.

menkalinan · 2024-07-31T08:59:07.000000Z

I don't quite understand how the server can match the ciphertext with a value without knowing the key. How does the server determine that the ciphertext corresponds to the specific value? If the server constructs this ciphertext-value database, how does it know what algorithm to use to create ciphertext from a value and store on its side?

karulont · 2024-07-31T14:18:28.000000Z

Check my comment here for some intuition: https://news.ycombinator.com/item?id=41115179

Basically the server does not know, it just computes with every possible value. And the result turns out to be what the client was interested in.

motohagiography · 2024-07-30T23:40:11.000000Z

great to see this becoming part of mainstream tools. the question I have is, when a weakness is published in FHE, is it more like a hash function you can do some transformations on, but there is no 'decryption' to recover plaintext again- or is it more like a symmetric cipher, where all your old ciphertexts can be cracked, but now your FHE data sets are no longer considered secure or private and need to be re-generated from their plaintexts with the updated version?

what is the failure mode of FHE and how does it recover?

j2kun · 2024-07-31T03:31:48.000000Z

It is more like a symmetric cipher. Once you have a key you can decrypt everything encrypted with that key

motohagiography · 2024-07-31T14:52:14.000000Z

the risk in this is that FHE is proposed as a privacy protecting tech, and it will "squeeze a lot of toothpaste out of the tube" in private data sharing, where a weakness will be a rug pull under all the data subjects whose data was shared under the aegis of being "encrypted."

It's important to understand this failure mode, imo.

j2kun · 2024-07-31T17:55:36.000000Z

I don't see how a bad actor can "rug pull" when everyone has a different encryption key.

A cryptographic scheme weakness is a threat to all computing systems, it's not worse for FHE. It's just that FHE relies on newer encryption than existing widely deployed crypto.

motohagiography · 2024-08-01T00:43:57.000000Z

The main use case for FHE is to enable partial data sharing between parties who are forbidden from sharing data with one another, or for whom being in custody of it is too much of a liability.

the reason I think the threat is worse for FHE is because it's going to be used to share encrypted versions of private information that isn't shared today, and realistically, the only people exploiting cryptographic weaknesses right now are intelligence agencies, where in the case of a repo of FHE enciphered PII shared with a vendor is suddenly decryptable by that vendor. my point is it's not the same as other encryption for those reasons.

lsh123 · 2024-07-30T23:01:08.000000Z

If we assume that server is “evil” then the server can store both PIR encrypted and plain text phone number in the same row in the database and when this row is read, simply log plain text phone number. What do I miss here? We can send PIR request and trust server not to do the above; or we can send plain text phone number and trust server not to log it — what’s the difference?

karulont · 2024-07-30T23:41:03.000000Z

A very simple PIR scheme on top of homomorphic encryption that supports multiplying with a plaintext and homomorphic addition, would look like this:

The client one-hot-encodes the query: Enc(0), Enc(1), Enc(0). The server has 3 values: x, y, z. Now the server computes: Enc(0) * x + Enc(1) * y + Enc(0) * z == Enc(y). Client can decrypt Enc(y) and get the value y. Server received three ciphertexts, but does not know which one of them was encryption of zero or one, because the multiplications and additions that the server did, never leak the underlying value.

This gives some intuition on how PIR works, actual schemes are more efficient.

[Disclosure: I work on the team responsible for the feature]

lsh123 · 2024-07-30T23:47:54.000000Z

Does the server reads specific rows from spam numbers DB or the whole database?

karulont · 2024-07-31T00:02:05.000000Z

In this PIR model the server has to read the whole database, otherwise it would be easy on the server to see, that these rows were not accessed and therefore they are not the one the client queried.

In this PIR model the server runtime is O(n) where n is the number of rows.

To keep it practical, we do support sharding the database. Client leaks a few bits of hashed query to pick the right shard, where we process the entire shard. There is a inherent privacy-performance tradeoff: less shards = less leakage vs more shards = better performance & less privacy.

lsh123 · 2024-07-31T00:10:08.000000Z

Thanks for explanation. I will read the code to see how sharing works.

karulont · 2024-07-31T15:31:46.000000Z

https://github.com/apple/swift-homomorphic-encryption/blob/3...

Run SHA256 on the keyword, truncate the hash and take the modulus with number of shards.

jayd16 · 2024-07-30T23:19:13.000000Z

The server never gets the plaintext at all. It only ever receives encrypted data that it cannot read.

vlovich123 · 2024-07-30T23:24:06.000000Z

I think OP is talking about the set of “spam phone numbers” stored on the server and looking at side channels based on what data is looked up by processing the query.

lsh123 · 2024-07-30T23:25:06.000000Z

Exactly, plain text phone number in the same db row

jayd16 · 2024-07-30T23:33:10.000000Z

Not sure if its implemented exactly like this but one could imagine that the server must use the client's public key to encrypt the database into a blob it can't decrypt. The encrypted input is used to read the encrypted blob in a way the server can't understand to construct a result the server cannot understand. The result blob is sent to the client which can decrypt it.

vlovich123 · 2024-07-30T23:30:30.000000Z

Yeah but as I wrote elsewhere, the DB isn’t a KV store of plain text numbers and their encrypted representation. Instead the entire database would be encrypted and you’d do set containment operations in encrypted space which wouldn’t /couldn’t leak anything about your query (modulo unexpected side channels in the design).

I don’t know how they do this efficiently and at scale with lots of updates, but maybe this database is kinda small to begin with anyway and the updates are reasonably cheap to process relative to how many spam numbers are out there.

chipsrafferty · 2024-07-31T18:07:53.000000Z

That could only work if the server didn't have its own database copy. Not sure how a client would be able to provide the server with a database encrypted by the client.

If the server can decrypt it, it's not really safe if you're assuming server is evil

vlovich123 · 2024-07-31T20:20:20.000000Z

The database isn’t secret here. The server indeed has its own copy - it would have to otherwise what is the client query resolving against. What’s secret is which phone numbers are contacting the client. So instead of sending the phone number to the server, you send an encrypted version of the phone numbers. This encrypted version is then checked against the encrypted database. This prevents the evil server from discovering the phone number the client is checking.

If you read the docs, a perfectly valid implementation is an HTTP request that sends the unencrypted database to the client which then checks the numbers locally - it achieves equivalent security priorities. The advantage here is that the database can be large enough to make distribution less practical than just doing a lookup per number and that’s where the HE comes in.

Remember: evil in a security context means someone trying to actively circumvent your protection guarantees, but you’re making an assumption that the database needs to be secret when it may not as the privacy and security guarantees are about the client’s information. Apple isn’t necessarily saying the database is secret since it’s just “this phone number is likely spam”. Of course, it’s possible that the server itself can’t even generate a valid query. It’s possible Apple designed it such that the query has to be generated on a valid Apple device to begin with (since it has a chain of trust to each device manufactured).

j16sdiz · 2024-08-01T04:11:29.000000Z

The client don't have to do that.

That's the whole point of Homomorphic Encryption. There is a Wikipedia article for that.

https://en.wikipedia.org/wiki/Private_information_retrieval

lsh123 · 2024-07-30T23:49:42.000000Z

That’s not what I saw in the code but I didn’t spend much time so I might be wrong. I’ll check it more carefully later. But if this indeed is whole DB then it’s very limited use case.

vlovich123 · 2024-07-30T23:23:08.000000Z

It’s a lot more complicated because the phone numbers themselves are stored encrypted and there’s not a 1:1 mapping between encrypted representation and the mapping. So processing the query is actually blinding the evil server afaik.

lsh123 · 2024-07-30T23:25:37.000000Z

Evil server stores BOTH encrypted and plain text phone number in the same db row

vlovich123 · 2024-07-30T23:33:52.000000Z

There’s no single encrypted representation of a phone number. Rather the entire database is encrypted and the accesses performed by the HE algorithm would be randomly accessing the database in a way that wouldn’t leak anything. Now of course if you have billions of lookups a day maybe something does end up leaking because you’d be able to extract networks from data patterns (ie if the same number is contacting 100 numbers, the data access patterns might be the same) but it’s a lot more complicated than what you’re proposing and I wouldn’t be surprised if this is explicitly handled in the design of the feature.

attilakun · 2024-07-30T23:52:26.000000Z

Is there a good primer that explains the math basis of this?

j2kun · 2024-07-31T03:32:30.000000Z

https://www.jeremykun.com/2024/05/04/fhe-overview/

tombert · 2024-07-30T19:48:18.000000Z

I wrote some basic homomorphic encryption code for a hackathon like 8 years ago. When I interviewed for a BigTechCo [1] about a year later, the topic came up, and when I tried explaining what homomorphic encryption was to one of the interviewers, he told me that I misunderstood, because it was "impossible" to update encrypted data without decrypting it. I politely tried saying "actually no, that's what makes homomorphic encryption super cool", and we went back and forth; eventually I kind of gave up because I was trying to make a good impression.

I did actually get that job, but I found out that that interviewer actually said "no", I believe because he thought I was wrong about that.

[1] My usual disclaimer: It's not hard to find my work history, I don't hide it, but I politely ask that you do not post it here directly.

hansvm · 2024-07-30T21:23:57.000000Z

I had the same experience with Python's walrus operator [0] in a BigTechCo interview. After few times of the interviewer insisting I had no idea what I was talking about, I wrote it a different way. I can't imagine trying explaining something actually complicated in that environment.

It didn't hold me back from the job either. I like to believe the interviewer looked it up later, but I never poked into my hiring packet.

[0] It was useful at the time to have a prefix sum primitive. Ignoring annotations, something like this:

    def scan(f, items, x):
        return [x := f(x, item) for item in items]

tomlue · 2024-07-30T23:38:38.000000Z

This happened to me in a grant application. We had written a web application that did a homomorphic encryption based calculation of molecular weight to demonstrate that HE could be used to build federated learning models for chemical libraries.

Our reviewers told us that machine learning on encrypted data was impossible. We had the citations and the working model to refute them. Very frustrating.

hot_gril · 2024-07-31T17:55:49.000000Z

What was the end result? I was almost roped into a project like this, encrypted ML for biology applications. It was definitely possible, but it seemed too slow to be worthwhile. Other federated learning projects shut down because it was wayy more efficient on a single cluster, and that was without the HE tax. I also have no idea if you can practically do HE matrix operations on a TPU or GPU or CPU SIMD at least; presumably that's something the research would cover.

Then again I didn't test very much because they also wanted it to be the proof of work for a blockchain, a possibility that I didn't discount but also figured it'd be extremely hard and I wasn't the guy to do it.

hot_gril · 2024-07-30T22:31:30.000000Z

This is pretty bad. We learned in school how RSA works, which can be easily extended to show HME multiplication at least. I can't remember it off the top of my head, but I know it's possible.

hot_gril · 2024-08-01T19:15:48.000000Z

(And if I didn't learn RSA in school, I wouldn't take a strong stand on how it works)

refulgentis · 2024-07-30T23:02:52.000000Z

Something similar happened to me at my first(!) tech interview, with Apple's [REDACTED] team.

There was ~3 minutes left in the interview, and they asked me a difficult l33t code concurrency question that was trivially answerable if you knew a specific, but lesser known, function in Apple's concurrency library. [1]

I said as much, TL;DR: "hmm I could do full leetcode that requires X, Y, and Z, and I might not have enough time to finish it, but there is a one-liner via a new API y'all got that I could do quick"

They said go ahead and write it, I did, then they insisted I was making up the function -- slapping the table and getting loud the second time they said it. Paired interviewer put a hand on their arm.

Looking back, that was not only a stark warning about the arbitrariness of interviews, but also that going from dropout waiter => founder => sold, then to Google, wasn't going to be all sunshine and moonbeams just because people were smart and worked in tech too. People are people, everywhere. (fwiw, Apple rejected w/"not a college grad, no bigco experience, come back in 3 years if you can hack it somewhere else". Took Google, stayed 7 years)

[1] https://developer.apple.com/documentation/dispatch/3191903-d...

77pt77 · 2024-07-31T00:38:38.000000Z

> he told me that I misunderstood, because it was "impossible" to update encrypted data without decrypting it. I politely tried saying "actually no, that's what makes homomorphic encryption super cool", and we went back and forth; eventually I kind of gave up because I was trying to make a good impression.

The moment you have to explain yourself you've already lost.

No argument you make will change their mind.

They are just stupid and that will never change.

And never forget, these people have power over you.

hot_gril · 2024-07-31T17:51:41.000000Z

It's not stupid to intuitively doubt HME and ask for an explanation if you've never heard of it before, but to argue that it's impossible without knowing anything about it, yeah.

jancsika · 2024-07-30T20:18:30.000000Z

Digression-- this is a good example where the mumbo jumbo that anarchists buzz on about applies in a very obvious way.

You were literate in that domain. The interviewer wasn't. In a conversation among equals you'd just continue talking until the interviewer yielded (or revealed their narcissism). The other interviewers would then stand educated. You see this process happen all the time on (healthy) FOSS mailing lists.

Instead, you had to weigh the benefit of sharing your knowledge against the risk of getting in a pissing contest with someone who had some unspecified (but real!) amount of power over your hiring.

That's the problem with a power imbalance, and it generally makes humans feel shitty. It's also insidious-- in this case you still don't know if the interviewer said "no" because they misunderstood homomorphic encryption.

Plus it's a BigTechCo, so we know they understand why freely sharing knowledge is important-- hell, if we didn't do it, nearly none of them would have a business model!

ChadNauseam · 2024-07-30T20:27:46.000000Z

In my experience this comes up a lot less often when people are paid to be empirically right, and the most annoying arguments occur when no one has an interest in being right and instead wants to defend their status. e.g. try telling a guy with his date nearby that he's wrong about something irrelevant like how state alcohol minimum markups work. An even more common scenario is when someone is passionate about a political topic and they publicly say something incorrect, and now would look like a fool if they admitted they were wrong. Sometimes I worry that a post-money future would become entirely dominated by status considerations and there would be no domain where people are actually incentivized to be right. Do you know if there's any anarchist thought related to this topic?

bawolff · 2024-07-30T23:22:30.000000Z

That does kind of make sense though - if you are paid to be right but someone doesn't believe you, you are still getting paid, so what does it matter?

ChadNauseam · 2024-07-31T19:18:22.000000Z

I was referring to the situations where being right directly means you make money - right about price movements, right about what users want, right about whether oil is present in a particular digging location, etc. In those cases you only get paid if you actually are right.

lcnPylGDnU4H9OF · 2024-07-30T20:30:51.000000Z

> the mumbo jumbo that anarchists buzz on about

I enjoy exposing myself to new-to-me opinions. Do you know a decent anarchist blog/vlog to dip my toes into this area?

mhitza · 2024-07-30T21:28:55.000000Z

Not OP, nor do I understand what he's referring to, but https://theanarchistlibrary.org/special/index is a good starting point.

317070 · 2024-07-30T22:20:42.000000Z

"The Utopia of Rules: On Technology, Stupidity, and the Secret Joys of Bureaucracy", by David Graeber might be good for this one, though some of Graeber's other books also apply.

saagarjha · 2024-07-30T20:42:11.000000Z

> In a conversation among equals you'd just continue talking until the interviewer yielded (or revealed their narcissism). The other interviewers would then stand educated. You see this process happen all the time on (healthy) FOSS mailing lists.

Yeah, what actually happens is that both parties think they are right and keep yapping until someone "yields" by being so fed up that they don't want to argue anymore. Everyone else watching learns nothing.

bawolff · 2024-07-30T20:46:28.000000Z

> You see this process happen all the time on (healthy) FOSS mailing lists.

In a FOSS mailing list, someone would hopefully just link to wikipedia.

No amount of arguing is going to resolve a duspute about definitions of terms.