
Thanks for the enthusiasm and insights - and the toolkit of course! :-)

> fundamentally we can't test and branch to bail out early when we find something we want

I think that conclusion rests on assumptions that are not necessarily valid, given a suitable representation in memory.

In other words I think you can bail early without revealing too much, with a suitable data encoding.

But after writing out a sketch of a mechanism in an HN comment I realised the idea may be patentable, and FHE is probably an attractor for that sort of thing.

(I also realise I may be full of nonsense, but you've got to start somewhere :-)

I'm pretty open about ideas - an open source kind of person. But then I remembered: I posted something before, and someone else later built on the idea and patented it (citing my post), which stopped me from developing it further. Yes, that happened, albeit a long time ago :-/

Hmm, uncomfortable.




"In other words I think you can bail early without revealing too much, with a suitable data encoding."

Maybe, but whatever that is would not be encryption. An FHE scheme is still a public key encryption scheme, so it should not be possible to compute any predicate over the message without having access to the private key. The homomorphic property allows you to compute a new ciphertext that will decrypt to the result of a computation, but that new ciphertext will still completely hide the plaintext.

Basically the reason you cannot "bail out early" is that FHE does not allow you to see an indicator value to tell you when to do so -- any such value would itself be encrypted. If it were not, then it would reveal "something" about the inputs; coupled with the fact that the inputs can always be modified (by e.g. changing the function that is being computed), that would be a complete break.
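To make that concrete, here's a deliberately toy sketch (a one-time pad over Z_n, which is additively homomorphic -- not a real FHE scheme, those are lattice-based public-key constructions, but the key point carries over): the server can compute an equality "indicator" homomorphically, yet what it ends up holding is just another random-looking ciphertext, so there is nothing it can test against to bail out.

    # Toy illustration only: a one-time pad over Z_n is additively homomorphic.
    # Real FHE schemes (BGV, CKKS, TFHE, ...) are lattice-based and public-key,
    # but the key point carries over: the evaluator only ever sees ciphertexts.
    import secrets

    N = 2**64

    def enc(m, key):
        return (m + key) % N        # ciphertext looks uniformly random

    def dec(c, key):
        return (c - key) % N

    # Client encrypts a query value and a stored value under fresh keys.
    k1, k2 = secrets.randbelow(N), secrets.randbelow(N)
    c_query, c_stored = enc(42, k1), enc(42, k2)

    # Server homomorphically computes an "is equal" indicator: zero iff equal.
    c_indicator = (c_query - c_stored) % N

    # The server cannot test the indicator -- it is just another random-looking
    # number. Only the key holder learns that the values matched.
    print(c_indicator)                      # uninformative to the server
    print(dec(c_indicator, (k1 - k2) % N))  # the client sees 0, i.e. a match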

The only good technique we have right now for addressing this problem is to use some kind of interaction between multiple parties, which allows sublinear random access and can allow branching (and thus early loop terminations etc.). There are cases where that works well and there are many real-world use-cases of those techniques (I work on this for a large tech company). The tradeoff is that you need to do a lot of communication, which is typically more expensive than computation, and if too many parties are offline the computation will not proceed (in the two party case nobody can be offline). You also need to trust the parties not to collude with each other, which is sometimes a concern in practice.
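As a minimal sketch of why interaction changes the picture (my own toy illustration, not a production MPC protocol): if a "found" flag is additively secret-shared between two non-colluding parties, neither share reveals anything on its own, but one round of communication lets the parties reconstruct the flag and stop a loop early -- exactly the kind of branching that FHE alone cannot give you.

    # Minimal sketch, not a production MPC protocol: a "found" flag is
    # additively secret-shared between two non-colluding parties. Neither
    # share reveals the flag on its own, but one round of communication lets
    # the parties reconstruct it and bail out of the loop early.
    import secrets

    MOD = 2**64

    def share(bit):
        r = secrets.randbelow(MOD)
        return r, (bit - r) % MOD          # party A's share, party B's share

    def reconstruct(share_a, share_b):
        return (share_a + share_b) % MOD

    for step, found in enumerate([0, 0, 1, 0]):
        a, b = share(found)                # each party holds one share
        if reconstruct(a, b) == 1:         # the "interaction": exchange shares
            print("both parties agree to stop at step", step)
            break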


> Basically the reason you cannot "bail out early" is that FHE does not allow you to see an indicator value to tell you when to do so -- any such value would itself be encrypted. If it were not, then it would reveal "something" about the inputs

It depends what you're doing.

If the idea is to search an encrypted list for a value to see if it's there, or to look up a value in a key-value store, that's similar to a database query. Unencrypted, that can be performed in hard-bounded sublinear time; encrypted, the steps can also be sublinear without revealing content to the processor, if there is a way to calculate the addresses at each step homomorphically, i.e. without the processor knowing the meaning of those addresses, just calculating them from data it sees.

That's where the encoding comes in: storing the data in such a way that knowing individual encrypted addresses doesn't reveal much about the data, and, depending on which properties matter to you, also in such a way that access-pattern structure such as repetition is hidden, because the addresses used for encrypted access are not repetitive. This requires the encoding to do some level of data diffusion.
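A rough plaintext sketch of the kind of addressing I mean (analogy only -- whether the address derivation can actually be carried out homomorphically is of course the open question): a keyed PRF maps (inner key, access counter) to an outer slot, so repeated lookups of the same inner key land on unrelated-looking outer addresses.

    # Plaintext analogy only: a keyed PRF (HMAC here) maps an inner key plus
    # an access counter to an outer slot, so repeated lookups of the same
    # inner key touch different, unrelated-looking outer addresses.
    import hmac, hashlib

    SECRET = b"owner-only-key"    # hypothetical secret held by the data owner
    NUM_SLOTS = 1 << 20           # size of the outer memory

    def outer_address(inner_key: bytes, counter: int) -> int:
        tag = hmac.new(SECRET, inner_key + counter.to_bytes(8, "big"),
                       hashlib.sha256).digest()
        return int.from_bytes(tag[:8], "big") % NUM_SLOTS

    # Same inner key, successive accesses -> unrelated outer addresses.
    print(outer_address(b"alice", 0))
    print(outer_address(b"alice", 1))
    print(outer_address(b"alice", 2))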

Branching is also not ruled out if the processor can't determine whether a branch has occurred. That means although the processor cannot determine when a search is short-circuited, the result of the search may still be used "early" by later stages in the homomorphic calculations, provided this does not look like a branch to the processor. Think of predicated instructions.
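Concretely, predication looks like this sketch (plain integers stand in for ciphertexts; under FHE the same arithmetic would run on encrypted values): both arms are always evaluated and blended with a 0/1 flag, so the processor never sees which arm was "taken" -- though it also pays for both arms, which is the cost of hiding the branch.

    # Predication, the FHE-friendly substitute for a branch: evaluate both
    # arms and blend with a 0/1 flag. Plain integers stand in for ciphertexts
    # here; under FHE the same arithmetic runs on encrypted values.
    def select(flag, then_val, else_val):
        return flag * then_val + (1 - flag) * else_val

    result, found = 0, 0
    for value, matches in [(10, 0), (99, 1), (7, 1)]:
        take = matches * (1 - found)        # 1 only on the first match
        result = select(take, value, result)
        found = select(matches, 1, found)   # latch the flag, branch-free
    print(result)  # 99: the first match "short-circuits" later updates,
                   # but every iteration still runs, so no work is saved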

> The only good technique we have right now for addressing this problem is to use some kind of interaction between multiple parties,

That's not what I have in mind, but there's some similarity. I imagine each memory having some dedicated processing on board, just to reduce communication overhead during writes, which are heavy due to data diffusion.


> if there is a way to calculate the addresses in each step homomorphically

Calculating the addresses homomorphically means that you will have a ciphertext that decrypts to the address, not an actual address.

> storing the data in such a way that knowing individual encrypted addresses doesn't reveal much about the data

As I pointed out in my earlier reply, with FHE if you reveal anything then you are actually revealing everything, so what you really need to say there is that you do not reveal anything about the data.

Beyond that, what you are describing sounds like oblivious RAM (ORAM), which is a well-studied technique that cannot be applied to FHE. Basically an ORAM has two levels of addresses, the "inner" address for a flat array you are simulating, and the "outer" address that you are actually working with to traverse the data structure. The outer address reveals nothing about the inner address, but to ensure that this is true you have to update some kind of state after every access (otherwise you could use the outer address to determine if two consecutive read operations retrieved the same data). It is also a requirement that the outer addresses are available as plaintext, and as I said, FHE only allows you to compute more ciphertexts.
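To illustrate the two levels of addressing, here is a deliberately naive sketch (nothing like a practical construction such as Path ORAM): a secret position map turns an inner address into an outer one, and the mapping for the accessed item is re-randomized after every access, so two reads of the same inner address are unlinkable. Note that the outer address is an ordinary plaintext index -- which is exactly why this does not drop into an FHE evaluation.

    # Deliberately naive two-level addressing sketch (not Path ORAM or any
    # practical scheme): a secret position map sends an inner address to an
    # outer slot, and the accessed item moves to a fresh random slot after
    # every access, so two reads of the same inner address are unlinkable.
    # The outer address is an ordinary plaintext index.
    import secrets

    class ToyORAM:
        def __init__(self, n):
            self.mem = [None] * (2 * n)       # outer memory, with spare slots
            free = list(range(2 * n))
            secrets.SystemRandom().shuffle(free)
            self.pos = {i: free.pop() for i in range(n)}  # secret position map
            self.free = free

        def access(self, inner_addr, new_value=None):
            outer = self.pos[inner_addr]      # what the server observes
            value = self.mem[outer]
            self.mem[outer] = None
            fresh = self.free.pop(secrets.randbelow(len(self.free)))
            self.free.append(outer)
            self.pos[inner_addr] = fresh      # re-randomize the mapping
            self.mem[fresh] = value if new_value is None else new_value
            return value, outer

    oram = ToyORAM(4)
    oram.access(2, new_value="hello")
    print(oram.access(2))   # same inner address, different outer address...
    print(oram.access(2))   # ...on every access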

Fundamentally the challenge with FHE is that the processor is free to compute whatever, as many times as it wants, and so nothing can be revealed except for ciphertexts. Introducing interaction ensures that the computations are limited (by other parties that are not colluding with the attacker), which is why it is possible to have things like ORAM (because you can ensure that the ORAM state is properly updated etc.) and branching. Interaction also makes it possible to talk about "controlled leakage" since you can limit the number of operations that leak data. You are correct that an ORAM can reduce communication among the parties in an MPC protocol, but it is not a silver bullet -- ORAM overhead is non-trivial and asymptotically it will always be worse than computation on plaintexts.


I'm not thinking about ORAM - as you point out, it reveals access patterns.

Thus the reference to data diffusion: inner words are not stored at a single physical outer address; they are mixed with other data, and the outer addresses used to interact with memory state vary for the same inner address, without the RAM holding any secrets.

The outer addresses don't need to be plaintext, but they do need to satisfy relationships with each other that are algebraically useful yet not discernible without a secret.
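As a rough plaintext sketch of the diffusion part (this only shows the splitting; combining it with the per-access address variation above is a further step, and doing the reassembly homomorphically is precisely the hard part): an inner word is XOR-split across several outer slots derived from a secret, so no single outer address holds or reveals the word.

    # Plaintext sketch of the splitting: an inner word is XOR-split across
    # several outer slots derived from a secret, so no single outer address
    # holds the word (slot collisions ignored for brevity).
    import secrets, hmac, hashlib

    SECRET = b"owner-only-key"       # hypothetical owner secret
    NUM_SLOTS, SHARES = 1 << 16, 3
    memory = [0] * NUM_SLOTS

    def slots_for(inner_key: bytes):
        tag = hmac.new(SECRET, inner_key, hashlib.sha256).digest()
        return [int.from_bytes(tag[4 * i:4 * i + 4], "big") % NUM_SLOTS
                for i in range(SHARES)]

    def write(inner_key: bytes, word: int):
        slots = slots_for(inner_key)
        shares = [secrets.randbits(32) for _ in range(SHARES - 1)]
        last = word
        for s in shares:
            last ^= s                # make the shares XOR back to the word
        for slot, share in zip(slots, shares + [last]):
            memory[slot] = share

    def read(inner_key: bytes) -> int:
        word = 0
        for slot in slots_for(inner_key):
            word ^= memory[slot]
        return word

    write(b"alice", 0xCAFEBABE)
    print(hex(read(b"alice")))       # 0xcafebabe, reassembled from three slots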

> with FHE if you reveal anything then you are actually revealing everything

Is this not subject to cryptographic vs information-theoretic hardness arguments: If I reveal to you 2^-256 bits of data per operation (i.e. astronomically close to zero), we'll call that good enough because you can't do enough computation to matter?


"The outer addresses don't need to be plaintext"

Yes they do, because that is what you will actually be accessing on the physical system. Call it what you want, but at some point you need to have a physical address that you will read from for any computation to proceed.

"If I reveal to you 2^-256 bits of data per operation"

Sure, but that is not going to help you get around the fundamental need to simulate random access by performing linear scans when you are using FHE. Think of it this way: you need to read some physical addresses, somehow, regardless of how you encode your data. If you only revealed a vanishingly small amount of information about the physical addresses to access, then you would not have given the processor enough to do anything more efficient than a linear scan.

Now assume the processor could somehow access fewer locations than a linear scan does, which implies that any given query does not touch all physical addresses. The processor can try querying all logical addresses (since it must store that much data anyway, it can enumerate them) and choose two such that one touched a particular physical address and the other did not (such a pair must exist: across all queries every physical address gets touched, but no single query touches them all -- if no such pair could be found, then every query would be a linear scan). The two logical addresses must differ in some particular bit position. What the processor can do is take a given ciphertext, homomorphically compute its individual bits, and then insert each bit into the position where the two addresses differ -- and then use the access pattern to determine the value of each bit.
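Here is that argument as a small plaintext simulation (the "homomorphically inserted" bit is modeled in the clear purely to show the leak; in the real attack it would be a ciphertext bit steering which of the two logical addresses gets queried): once any query skips a physical address, the access trace acts as a one-bit oracle, and every bit of a ciphertext can be extracted through it.

    # Plaintext simulation of the attack: the "inserted" bit is modeled in
    # the clear only to show the leak; in the real attack it would be a
    # ciphertext bit steering which of the two logical addresses is queried.
    def lookup(logical_addr, memory):
        """A sublinear lookup: touches only a subset of physical addresses."""
        touched = [logical_addr % len(memory), (logical_addr * 7 + 3) % len(memory)]
        return [memory[p] for p in touched], touched

    memory = list(range(16))

    # Find two logical addresses whose access sets differ at a physical addr p.
    addr_one, addr_zero = 0, 1
    _, trace_one = lookup(addr_one, memory)
    _, trace_zero = lookup(addr_zero, memory)
    p = next(x for x in trace_one if x not in trace_zero)

    secret_bits = [1, 0, 1, 1]        # stands in for the bits of a ciphertext
    recovered = []
    for bit in secret_bits:
        addr = bit * addr_one + (1 - bit) * addr_zero  # predication-style select
        _, trace = lookup(addr, memory)
        recovered.append(1 if p in trace else 0)
    print(recovered == secret_bits)   # True: the access pattern leaked every bit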



