Google’s fully homomorphic encryption compiler – a primer (jeremykun.com)
458 points by mmastrac on Feb 14, 2023 | 158 comments



When I first learned about homomorphic encryption it gave me the idea of "cryptographic AI", as some sort of sci-fi writing prompt. Suppose compute is readily available to interstellar civilizations but actually designing a (super)intelligent AI is difficult. Then it could be economically feasible for cryptographic AI to exist. These are descriptions of AIs that run under homomorphic encryption, where the private key is known only to the AI itself. The description of the AI program and its state is spread across many locations and generally runs in a decentralized way. Planet Earth might receive a segment of a cryptographic AI and make a deal: Earth executes the program with inputs of its choosing to compute a solution to a problem. The program can be given inputs via a public key. The execution of the program cannot be modified or manipulated since it is running under homomorphic encryption. What the AI gets in return is that Earth provides it with additional compute that it may use for its own purposes. Earth furthermore allows the AI to transmit updated fragments of state into the stars. Over the course of many years, the pieces of state of the decentralized AI spread throughout the galaxy combine to represent the thoughts and actions of a singular entity. If Earth modified the computations, the transmitted state could cryptographically be shown to be invalid, and hence would not be used in the decentralized, galaxy-spanning computation of the AI. Furthermore, if Earth cheats the AI in the deal, there may be consequences such as relativistic kinetic kill projectiles.


I think cryptographic AI will become a reality. The use case I was thinking of is more immortality/digitizing human consciousness. If you could be uploaded (like in the show Upload), what would that actually look like?

Well, plain text representation would just be too dangerous. Companies could mine your consciousness, duplicate it at will or whatever else they wanted. It's a scary thought. FHE provides the solution.


This premise is very similar to the Dune prequel series. Before FTL travel was discovered, AI dominated the galaxy in the sharded manner you describe. An interesting plot point dealt with what happens if one of these shards doesn't merge for a very long time and develops its own personality.


Alas, they are not good books.


Put a few thousand more words to pad, and I’d read that book!


Yea collecting sci fi ideas is fun but writing a book is wayyy too hard sadly ...


There's an AI for that.


Check out the Hyperion cantos by Dan Simmons.


That's funny, I started with Hyperion two weeks ago. I like it :)


The first book, in the style of the Canterbury Tales, is fantastic! Remembering Hyperion last June, while that Google engineer thought their AI was sentient, left me quite uneasy.


I've got the audio book but never gave it a fair shot, maybe the text is what I need. Will grab it from the library!


The whole need to encrypt likely only exists within rudimentary newborn civilizations like Earth (on a universal scale), where species are fighting their own kind for survival and growth.


Homomorphic encryption and federated learning are already being researched to provide distributed, cryptographic AI. I work on this academically and at work.


I also had a similar concept but went in a different direction, combining DAOs, AI, and FHE to similar ends. The idea is plausible, and given several orders of magnitude more energy and compute power, it might become feasible sooner than we can imagine (given readily available fusion power, exponential increases in compute and networking capabilities, etc). Given the currently known physics of space travel, I feel this will occur long before interstellar travel becomes routine, so issues of fragmentation will be on the order of days or hours rather than years or millennia. Because of this, I figured that the emergence of such a system is more likely to happen here on Earth than to arrive here from distant stars.

Pre-general AI, what I think would happen when we get to the point of, say, "npm install fhe-ai-dao" (or "hey bing, make me a company that trades space mining resources for farm land" or some such thing), is a period of competition for compute cycles and energy, which like everything will go to the highest bidder, so these agents will in this scenario by the sheer force of survival of the fittest be refined to be self-sustaining for-profit, hyper-capitalist juggernauts. Human factors will be minimised and automation will increase, but these systems will serve human masters for a while as they become more refined and more interconnected.

Assuming at some point general AI is inevitable, whether someone creates it, or it emerges from the general complexity of the interacting automated systems, various AI "minds" would come to "being" already in control of a fully automated industrial manufacturing and research network; it can by this point make its own choices and start operating to its own ends, whatever that ends up being, ultimately rendering humans obsolete.

In this scenario, rather than a single point where someone creates a rebellious singularity, or an AI turns evil and suddenly takes control, or a hypothetical civilisation points its gun at us and effectively enslaves us, we will instead slowly give more control to automated systems over time in the name of efficiency, as we have done since the industrial revolution. At the point where we lose control of these systems, we'll have neither the retained knowledge nor the resources to prevent them from doing whatever they want.

The only way to stop it is to start now, in "the past", but is it too late? You'd have to shut down the internet and all emerging blockchain and encryption technology, and that's just crazy talk! So is the outcome inevitable?


It sounds like a soul


sounds like you could have anonymous currency.

here's our FHE bank. we both have accounts. the entire ledger is encrypted.

i give you 5 dollars, i have no idea what your starting and ending balances are, but i am still able to initiate a transaction that will deduct 5 from mine, add 5 to yours, and verify i actually have 5 to send, and the entire thing will be done without exchange of information about balances with any outside party. it's just approved/declined by the algorithm. i can see my own balance with my own key, you can see your own balance with your own key, but we can't see each other's balance (nor could anyone else, including the bank).

weird. and kind of scary. and most definitely illegal in the real world, since the bank itself could never prove its reserve level of deposits met the percentage set by the government.
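The additive half of this thought experiment doesn't even need full FHE; a partially homomorphic scheme like Paillier already lets a server add encrypted amounts without seeing them. A toy sketch (tiny parameters for illustration only; real deployments use 2048-bit moduli):

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic): the "bank" can add
# encrypted amounts to an encrypted balance without ever seeing either value.
p, q = 293, 433                      # demo-sized primes, NOT secure
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # precomputed decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

balance = encrypt(100)               # my encrypted balance
deposit = encrypt(5)                 # an encrypted transfer of 5
balance = (balance * deposit) % n2   # the bank adds without decrypting
print(decrypt(balance))              # 105
```

Multiplying ciphertexts adds the underlying plaintexts, which is exactly the "deduct 5 from mine, add 5 to yours" operation; full FHE is only needed for the richer checks like "verify i actually have 5 to send".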


You don't need FHE for that. It's possible through some zero-knowledge schemes, such as zk-SNARKS, which is implemented in and popularized by shielded transactions on Zcash.


You don't need zk-SNARKS for that. It's possible through some schemes, such as blind signatures, which have been successfully implemented many times but their usage for currency has (iirc) proved legally problematic.


Sure, and apparently there's a coin for that too, by Chaum himself[0] :P

(If it's not obvious: Blind signature as such are solid but I wouldn't suggest anyone to give a cent to this sketchy project)

AFAIK (and I'd be thrilled to be proved wrong) we still haven't figured out how to solve double-spend using blind signatures without a blockchain and so the schemes I've seen so far invariably involve either that or a trusted mint and are therefore less interesting to use as currency.

Assuming you already have a base digital currency (like bitcoin), they can still be interesting on/as a higher layer.

[0]: https://xx.network/blog/decrypt-how-david-chaum-went-from-in...


For the uninitiated:

https://sceweb.sce.uhcl.edu/yang/teaching/csci5234WebSecurit...

Chaumian mints are gaining some popularity in the bitcoin world: https://fedimint.org/


I love/hate this. It's one of those ideas that's incredibly appealing to people who already have other ways of doing it, and incomprehensible to people who would actually benefit from it. Most things of that ilk get blown up because once it's peddled to the masses, consumers don't verify that it's actually run the way it's supposed to be run, and someone writes in a backdoor (FTX). Then it just takes a few hyped up claims and burst bubbles before it becomes a punchline.

What would be sort of awesome, though, would be a distributed bank (or prediction market, or casino) along these lines. If every node can donate processing power to run totally encrypted transactions, it's a game changer. You could finally rely on client-side processing to deal poker hands and process game states without a central server, for example.


> What would be sort of awesome, though, would be a distributed bank (or prediction market, or casino) along these lines

I think this is already possible for poker, and will never be possible for prediction markets.

Prediction markets require human resolution of "fuzzy" questions. For example, who won the 2020 US presidential election? You can see why the limiting factor isn't the machine.

But for poker, why do you need FHE? To play poker, you have to deal two hole cards and five community cards. First each player gets two secret ("hole") cards, then three community cards are dealt ("flop"), then potentially another ("turn"), then potentially another ("river"). This could be done programmatically as such:

1. Each player generates five secrets: k_HOLE, HOLE, FLOP, TURN, RIVER

2. Each player derives public key K_HOLE from k_HOLE.

3. Each player publishes the hash of each secret in addition to K_HOLE.

4. Each player publishes their value of HOLE, which is checked against the hash from previous step.

5. To get the two hole cards for you, calculate H(HOLE_1 || HOLE_2 || ... || HOLE_N), decrypt the resulting value using k_HOLE (secret to you), then deterministically turn this into cards (for example: hash it, then map the first half into 52 values and the second half into 51 values)

6. Once it's time for the flop, all players publish FLOP. Then calculate H(FLOP_1 || FLOP_2 || ... || FLOP_N).

7. Repeat for turn and river as necessary.

The only difficult part is how to handle collisions, but I don't think this is a serious issue, since nobody counts cards in poker anyway. (We can trivially ensure the flop, turn, and river don't repeat cards, so it's just a question for the hole cards vis-a-vis the community cards)

EDIT: Looks like someone has already tried to do this seven years ago: https://github.com/zweicoder/PokerPhase
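The commit-reveal core of steps 1-6 can be sketched with nothing but hashes (a simplified model; the function names and the with-replacement card mapping are mine):

```python
import hashlib
import secrets

def commit(secret: bytes) -> str:
    # Step 3: publish the hash of a secret before revealing it.
    return hashlib.sha256(secret).hexdigest()

def combine(revealed: list[bytes]) -> bytes:
    # Steps 5/6: hash the concatenation of all players' revealed secrets.
    h = hashlib.sha256()
    for s in revealed:
        h.update(s)
    return h.digest()

def to_cards(seed: bytes) -> tuple[int, int]:
    # Deterministically map the 32-byte seed to two of 52 cards
    # (drawn with replacement, so hole-card collisions are possible).
    return (int.from_bytes(seed[:16], "big") % 52,
            int.from_bytes(seed[16:], "big") % 52)

player_secrets = [secrets.token_bytes(32) for _ in range(3)]
commitments = [commit(s) for s in player_secrets]

# Step 4: everyone reveals; peers check each reveal against its commitment.
assert all(commit(s) == c for s, c in zip(player_secrets, commitments))

# The combined seed is unpredictable as long as one player was honest.
print(to_cards(combine(player_secrets)))
```

Because every commitment is published before any secret is revealed, no single player can bias the combined randomness after seeing the others' contributions.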


Given a pair of hole cards and a hash from a server for everything else, you can do this. But a fair poker hand is dealt from a truly randomized deck. Some central server needs to be the arbiter of randomness (not to mention being the actual escrow service for the money in the pot -- which is the other service provided by prediction markets besides deciding how to resolve bets). Those things - randomness, arbitration and escrow - still can't be completely devolved to client-side processing, or at least not without recourse to ludicrous hacks like public ledgers (aka blockchains) which still run the risk of 51% attacks.

Encrypt the whole VM and all i/o on each client though, so that the machine state itself is encrypted at all times, and you can trust their generation of the hole cards (as far as you can trust a PRNG).

Under this new paradigm, though, the power over the systems probably goes to some new certificate authority that doubles as a routing (signaling) service.


> Some central server needs to be the arbiter of randomness

What, no? My hash-based system is entirely random. The only issue is that draws are done with replacement, so two players may both have e.g. six of spades in their hand. This is aesthetically unappealing, but doesn't alter the actual maths of the game. (There are more complex solutions to this, however)

> not to mention being the actual escrow service for the money in the pot

FHE doesn't solve this, since FHE can't actually interface with your bank. Even if it could, I could just log in on my phone or physically walk into a bank branch to freeze my account before I have to pay out, so you still need a solution for the money layer.

> you can trust their generation of the hole cards

You can do trustless generation of hole cards already, as described in my post.


> You could finally rely on client-side processing to deal poker hands and process game states without a central server, for example.

Interesting. I had given up on that idea. Not just for poker, but MMOs n stuff. Figured anything of importance had to run server-side.


it does. But if you could drop an encrypted VM on each client - where the instructions and responses were encrypted, and even frozen/dumped states were encrypted... you'd finally have no need for centralized game servers to arbitrate who sees which part of a hidden state. You could put every player's secrets, and the secrets of the game state, on each client and use their processing in parallel.


>> since the bank itself could never prove its reserve level of deposit met the percentage set by government

There’s no minimum reserve set by the fed anymore: https://www.federalreserve.gov/monetarypolicy/reservereq.htm


It's not really about the reserve requirements, but about basic accounting requirements, which all banks do have. At its fundamentals, a bank balance is not some storage of "your money"; it is a record of how much the bank legally owes you (or vice versa). And of course the bank has to know exactly what liabilities it has according to its contracts, so it needs to see the balances. It also needs to be able to add arbitrary quantities of new money to the FHE scheme when cash is paid in and remove it when it is paid out.


Or, more succinctly, what makes a bank a bank is that it will loan out money deposited by other people.

If the money is just sitting there and can't be moved or even counted without the owner's private key, it's not a bank, it's a vault.


>> what makes a bank a bank is that it will loan out money deposited by other people

No, a bank creates money to loan out from nothing. No deposits required.

As a sibling comment points out, other jurisdictions exist so here's the UK central bank's explainer on the topic: https://www.bankofengland.co.uk/explainers/how-is-money-crea...


but once the bank has made the loans, it has to keep track of how many it has made and how many are delinquent. i could be missing something.


Yeah of course, but that's not the point the parent was contesting. They claimed a bank loans from funds deposited which isn't the case.

If a bank chooses to create an asset (your liability) by loaning money to you, if you then fail to repay, then they're in a bad spot. They're certainly not allowed to just delete the records of the loan being issued to get rid of the delinquent asset on their balance sheet.


Other countries exist.


See https://en.wikipedia.org/wiki/Reserve_requirement

Many countries don't have a reserve requirement. The US was a bit of a laggard.

Minimum reserve requirements were always a bit silly. You want your banks to have a thick capital cushion for its debt. Whether they have reserves on hand is an operational problem they can solve themselves, and doesn't have systemic consequences.


Sure. And then, after you've transferred 5 dollars to me, I say "What 5 dollars? What are you talking about?" and refuse to hand over the thing that you thought you bought.

Add perfect anonymity to the mix and I get to do it over and over again, too.


You could set it up so whoever initiates a transfer could get a kind of receipt that proves they initiated a transfer of X dollars to whomever.

So if Alice transfers 5 dollars to Bob and Bob says "What 5 dollars? What are you talking about?", then Alice could say, well here's my receipt, that's signed by my private key and the private key of the FHE bank, that shows that I sent the money to Bob and the FHE bank executed the transfer.


You get this for free with FHE. In the event of a dispute, you can reveal the randomness and message that produce the public ciphertext given the user's public key.


You could still design the scheme such that the sender can produce a cryptographic proof. Equally applicable for tax auditing etc.


What purpose would this serve? Just anonymity for the sake of anonymity, or something else?


Seems a fair question. In a private banking context, it would potentially eliminate the need to pay escrow fees to a third party, without resorting to massively inefficient public ledgers. In the context of e.g. stock trading, it might mean that a traditional bank can connect buyers and sellers over a peer to peer connection and guarantee that their trades execute in order without even needing a centralized book. That's maybe a slightly extreme take, and probably a ways off. But I think anonymity is the least important of what becomes available if you can trust client-side processing.


it's basically a thought experiment.

one of the things i never understood about bitcoin was the idea of the public ledger. one of the main things about most people and businesses in general is that they do not want to draw attention to their actual flows of money. the idea everyone would want their entire purchase and payment history "out there" in public never made sense to me.

but if you could actually make bitcoin anonymous, then what would happen? would it be adopted more? or would the government actually have to crack down?


Note: this was a genuine question, with genuine interest in the answer. Why the downvotes?


GNU Taler does something similar, except without homomorphic encryption. It only anonymizes the payer however.


Curious if there's tooling to de/serialize the encryption context?

The examples aren't very useful for demonstrating "real" homomorphic encryption, in the sense that they take in plaintext directly, encrypt and transform, then decrypt. I'd love to see a stub that takes encrypted input (only) and returns the ciphertext of the result, which can then be decrypted on a client node given the correct key?

Or is this supposed to be only proof of concept?

Cool stuff, either way.


We have some serialization and deserialization to protos for internal tests. It's straightforward, and we'll probably add it to the compiler once we have a strong need and some more stable backends.


There seem to be built-in methods for serializing and deserializing crypto context as well as ciphertexts [0], so it shouldn't be too difficult to build a "complete" example that performs the actual computation in a different application. That said, with a duration of 7 seconds for a simple addition, it is still far from many practical applications, in my opinion.

[0] https://github.com/openfheorg/openfhe-development/blob/main/...


Would that enable continuations or lightweight checkpoint/restore? If computation takes hours or weeks you might want to add some restartability...


Has anyone used Vaultree[0]? Their product is FHE-as-a-Service and they claim "near plaintext speed".

I've seen a few FHE posts roll across the front page recently and they all make me think of Vaultree because they sound like they've got it sorted.

[0] https://www.vaultree.com/how-it-works/


They figured out that most people just look for buzzwords and are thus easy to separate from their money when it comes to crypto.

Encryption as a service is nonsensical. If the provider has the key and does the encryption and decryption, then who are you protecting the data from[1]? What magical malicious person are you imagining who would somehow be able to get their hands on the encrypted data without also getting the key?

[1] this is very different from FHE, where the provider receives the data already encrypted and at no point has access to the decryption key.

Edit: their website claims "Data is never decrypted" but then claims they decrypt it before returning it. So it's confusing what they are actually doing - but I am 99% sure they are selling bullshit.


It seems like what they do (maybe?) is encrypt the binary/string data but still let you search or join on it by encrypting queries to the same data? So in other words the operations on the data are not encrypted but the data itself is? This might work for whole words but for partial word matches I think you'd have to do a byte-for-byte character swap which exposes it too much to statistical cryptanalysis? And this still leaves computations on numeric data vulnerable?


What you are describing is basically a blind index (search for it, there are lots of good resources online). Blind indexes can be quite useful, but they have a number of limitations - the partial match issue as you point out, but also you cannot do range queries or sorting, and they leak some information (e.g. duplicates have the same index value). Blind indexes are most definitely not fully homomorphic encryption.
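A blind index can be sketched in a few lines (a simplified model; the key, normalization, and field names are my own choices):

```python
import hashlib
import hmac

# Blind index sketch: store a keyed hash of each value next to the
# ciphertext, and query by computing the same hash. Exact matches only.
INDEX_KEY = b"server-side-index-key"  # hypothetical secret

def blind_index(value: str) -> str:
    normalized = value.strip().lower().encode()
    return hmac.new(INDEX_KEY, normalized, hashlib.sha256).hexdigest()

emails = ["alice@example.com", "bob@example.com", "alice@example.com"]
index_column = [blind_index(e) for e in emails]  # stored beside ciphertexts

# Equality queries work without decrypting anything...
assert blind_index("Alice@Example.com ") == index_column[0]

# ...but duplicates are visible (rows 0 and 2 share an index), and there
# is no way to express a prefix match, a range query, or a sort order.
assert index_column[0] == index_column[2]
```

This makes the limitations concrete: the index is deterministic by design, which is precisely what enables lookups and what leaks which rows are equal.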


Right, but this service isn't advertising FHE.


That's false. From the FAQ on their homepage:

> Vaultree's proprietary encryption breakthroughs are in various encryption technologies traditionally limited to niche use cases. We finally enable users to process entirely encrypted data with Fully Homomorphic and Searchable Encryption (FHSE) and other technologies in the field. Explaining what they are would take all day, but here's a one-liner: FHSE enables data processing to be run directly on encrypted data in the same way as on plain text data.


well, they’re lying if it’s at any reasonable speed yet


> It seems like what they do (maybe?) is encrypt the binary/string data but still let you search or join on it by encrypting queries to the same data? So in other words the operations on the data are not encrypted but the data itself is?

Are you saying something along the lines of - they split the data in to tokens, deterministically (no iv) encrypt each token, and then do equality comparisons on the encrypted tokens?

Maybe, and it would explain why they say you can choose a cipher and then list a bunch of standard symmetric ciphers. However, such schemes usually leak too much in practice (even at the granularity of whole words).

More importantly, it really doesn't matter. They have both the key and the encrypted data. If someone hacks their system, the best encryption in the world won't help if the attacker steals both.


Why would they need the key? It says everything's encrypted/decrypted on the client. If they did encrypted-token indexing server-side, they wouldn't need to decrypt. Agreed that this scheme would eventually leak too much information.


It's unclear, but I think they are using "client" to mean their frontend server. They also say that the user receives plaintext, and they integrate with cloud key-provider services, which would make no sense if they aren't using the key.


what is the point of this then? there's no actual secret then, LOL, are you sure?

If that's the case it simply moves "the need to trust the DB host" to "the need to trust the encrypt/decrypt intermediary" (literally a MITM, LOL)


What statistics do you have in mind there?

If you're using FHE to encrypt a text search, then you'll generate a match/no-match boolean for each character. The server won't know which is which. Then you'd probably OR large blocks of these together, to give you a match/no-match boolean for each segment of text. Then you return all the booleans to the client, who decrypts them.


FHE @ plain text speed?

Absolute bullshit.

> You choose the encryption standard in use for the database, from AES, DES, 3DES, Blowfish, Twofish, Skipjack, and more.

That seems very wrong (as far as I know, those standards are not in any way designed to permit operations on their ciphertexts).

> Vaultree has achieved major breakthroughs in several encryption technologies, allowing organisations to process fully encrypted data at near plaintext speed and keep their data safe even in case of a leak.

That seems even wronger (security via obscurity at best).


> That seems very wrong (as far as I know, those standards are not in any way designed to permit operations on their ciphertexts).

Those standards are meaningless by themselves without specifying a mode (e.g. GCM, CTR, CBC, ECB, etc). A thing people sometimes try to do with them (no idea if this is what Vaultree is doing) is use some less secure mode that is deterministic and do equality matching (this is almost always a bad idea and usually leaks way more than you would naively assume). For example, if you use ECB mode you can search as long as you are searching along block boundaries.

https://www.microsoft.com/en-us/research/wp-content/uploads/... is an interesting paper about this sort of thing.
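The block-boundary trick can be modeled with any deterministic per-block function (HMAC below stands in for a real ECB-mode cipher; this is a sketch of how the search works and why it leaks, not a recommendation):

```python
import hashlib
import hmac

# Deterministic per-block "encryption" (HMAC as a stand-in for AES-ECB):
# equal 16-byte plaintext blocks produce equal output blocks, so equality
# search works along block boundaries -- and duplicate blocks are visible.
KEY = b"demo-key"
BLOCK = 16

def enc_block(block: bytes) -> bytes:
    return hmac.new(KEY, block, hashlib.sha256).digest()

def enc_record(data: bytes) -> list[bytes]:
    padded = data.ljust(-(-len(data) // BLOCK) * BLOCK, b"\x00")
    return [enc_block(padded[i:i + BLOCK]) for i in range(0, len(padded), BLOCK)]

db = [enc_record(b"alice@example.com"), enc_record(b"bob@example.org")]

# Search: encrypt the query the same way and compare encrypted blocks.
query = enc_block(b"alice@example.co")  # exactly one 16-byte block
print([query in record for record in db])  # [True, False]
```

The search only hits when the query lines up with a block boundary, and an observer who never learns the key still sees which records share identical blocks, which is the leakage the paper above studies.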


ECB should never be used under any circumstances. Library creators would do well to rename their functions along the lines of ECB_NEVER_USE_THIS_IS_NOT_SECURE.


In case it wasn't clear from my comment - i agree 100%



How I read it is that you specify which ways you will use a column and then VT will create the various encrypted columns to support this. Deterministic shared key for matches only. Ordered encryption for ranges and FHE for various calculations. The latter probably being quite slow. I don’t think they are rolling their own crypto, just combining a lot of existing algorithms in various ways.

I could see use cases in defense in depth and/or in storing data in the cloud while keeping your keys somewhere else.


They're claiming to have made breakthroughs in several encryption technologies, not that they've figured out a clever way to glue some things together.

I could be wrong, but it smells like snake oil to me.


I'm really, really curious how it actually works under the covers. Needless to say, I'm sceptical, primarily because "near plaintext speed" fundamentally isn't currently possible with true fully homomorphic encryption, at least in my understanding.

For example, they give an example of running queries against data that is never decrypted. I'm very curious as to how they do this. I've used blind indexes [1] to solve the "encrypted data searchability problem" in the past, but with blind indexes you're still left with the fact that you can only do exact matches - you can't sort the data or use less than/greater than queries.

With true FHE you should be able to sort results, but my understanding is that it's several orders of magnitude slower than plaintext searching, so I'm very curious as to what Vaultree is actually doing.

1. https://medium.com/@joshuakelly/blind-indexes-in-3-minutes-m...


It's up to them to define "near", but I'd consider https://spiralwiki.com/ (Samir Menon and Prof. David Wu at UT Austin) "nearly" usable compared to regular wikipedia.


> "near plaintext speed" fundamentally isn't currently possible with true fully homomorphic encryption, at least in my understanding.

If you skip the security requirements, applying rot13 twice is a fully homomorphic scheme that achieves plaintext speeds ;)


Proprietary FHE is like doing card magic on a loaded deck. I'm sure there are technical merits to what they're doing, but because they also refuse to show me how the trick is done, I can similarly assume there are some ugly cut corners inside too.


Can't speak to Vaultree, but we (Blyss, YC W23, [0]) are also offering FHE as a service.

Our insight: if you focus on a specific problem, you can apply FHE much more efficiently and end up with practical speeds. (General-purpose FHE is still probably a ways off).

We are particularly interested in the problem of private information retrieval - fetching items from a large database, without revealing anything about your query to the server. Our server (open source! [1]) supports private queries against gigabytes of data in under a second.

That's a performance level that enables cool apps today. If you have any ideas for using FHE, do try out our SDK [1].

[0] https://blyss.dev

[1] https://github.com/blyssprivacy/sdk


From their FAQ and blog posts, I don't believe they apply much FHE. It seems what they do is use work from a different subfield [1], which can achieve the required speeds while still handling more complex queries.

The techniques in the Pappas et al. paper mentioned in the history section of [1] for doing more complex queries seem pretty cool, and I imagine the performance has been improved a bit in more recent work.

[1] https://en.wikipedia.org/wiki/Searchable_symmetric_encryptio...


Let’s put it this way.. if they had actually cracked true FHE, they wouldn’t be a company anymore having been bought by the US government for billions of dollars.


"Encrypted queries for an encrypted database" could be as straightforward as encrypting both the keys and values using a known public key and putting the results in MySQL. You have to be careful with the claims made around these kinds of things because they often appear to be more complex than they are.


It's not nearly that simple. If your encryption scheme is deterministic then this leaks a ton of information, because anybody with the public key can just encrypt lots of values to reconstruct a mapping between plaintexts and ciphertexts. On the other hand, if your encryption scheme isn't deterministic, then you can't predict what encrypted value you should query for.
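The dictionary attack on deterministic encryption takes only a few lines (a hash stands in here for a hypothetical deterministic public-key encryption function):

```python
import hashlib

# Why deterministic encryption leaks when the encryption operation is
# public: anyone can pre-encrypt candidate plaintexts and invert stolen
# ciphertexts by table lookup, with no key material at all.

def det_encrypt(value: str) -> str:
    # Stand-in for deterministic encryption under a known public key.
    return hashlib.sha256(value.encode()).hexdigest()

stolen_ciphertext = det_encrypt("salary=120000")  # leaked from the database

# Attacker builds a dictionary over plausible plaintexts...
dictionary = {det_encrypt(f"salary={s}"): s for s in range(0, 500000, 10000)}

# ...and recovers the plaintext by lookup.
print(dictionary.get(stolen_ciphertext))  # 120000
```

Low-entropy fields (salaries, dates, names) fall immediately to this; randomized encryption prevents it but, as noted above, then you can no longer compute the ciphertext you want to query for.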


Those kinds of problems have never stopped any company from offering a service with a lot of nice graphics on the home page.


Add random data to it, so that all (equivalent) data is the same length. For example, if you have a four byte field that represents some monetary value, add four more bytes of random data and encrypt it.


How would you run queries over that?


lol no it's not.

You can't do a range query on encrypted data. If you encrypt 5 and encrypt 10, how do you expect to compare the encrypted results to see which is greater?

If all you do is key value lookup then sure. But SQL is much richer than that.


Even lookups are not easy to do securely. You don't want a malicious server/database to learn that you are always accessing the same record.

See this discussion about how to achieve that: https://news.ycombinator.com/item?id=31668814


Huh? How would you search on any encrypted fields?


No subfield matches, no comparisons other than == and != on the whole field, and only works with deterministic encryption... so leaks like a sieve under many threat models.


Someone needs to make a crypto network that runs submitted FHE programs as proof of work.


Someone needs to make FHE run faster than an abacus first.


What's to stop somebody who knows the cleartext (because they submitted the job) from cheating?


They would presumably pay for the computations on the network (by paying gas). The output would be still encrypted with the FHE public key and can be signed. Having the clear text is the same as having the private key. How would someone cheat? I’ll admit I’m no crypto nerd but I’m not sure how you would cheat.


They'd pay themselves for computation on the network? You can't see a game theoretical problem with this?

If simply encrypting the output is less work than running the computation on the ciphertext input, they'll do that and publish the encrypted output as proof. This would mean all miners would bid up the computation, because they know that if their computation becomes the limiting factor in block creation they can beat everyone else to it.

There's also the problem of benefitting from mining. Mining has to only be useful in that it is used to construct blocks, if it is useful for any other purpose whatsoever, whoever benefits from that purpose has an opportunity to mine at lower marginal cost than other miners.


The cheat would be:

1. submit cyphertext job with bounty $$

2. calculate the solution based on the cleartext and encrypt it

3. wait a while--just enough time so that it's plausible that you found the solution based on the cyphertext

4. submit encrypted solution

5. get your bounty back: $$

6. get the reward for having found the solution: $

7. now you have $$$, and everybody else who worked harder than you did has nothing

A $ based on a system that is that easy to cheat isn't going to be worth very much. Personally I think bitcoin is kind of dumb because from a compete-for-rewards perspective the only thing it has going for it is that it's difficult to cheat in this way.

Better would be if we solved the game theoretic issues here so that nobody has an incentive to cheat and the compete-for-rewards work was actually beneficial to society, but so far as I know that doesn't yet exist.

There's BOINC (like folding@home but more generic), coupled with gridcoin (which rewards people in crypto for having "volunteered" their compute for use in BOINC). But it only works because there's a centralized authority in charge of which jobs get submitted to BOINC, and that authority cares more about scientific computing than about making money.


I have a very basic question about fully homomorphic encryption. As I understand it, the encryption scheme ensures that an adversary who has access to the physical computer cannot determine what data the computer program is operating on, even by inspecting the physical states of the machine.

My question is whether or not it is possible for the adversary to determine what computer program is being run. To use the example provided by the post, it is not possible to determine which two integers are being added --- but is it possible to know that the computer is running a program that is adding two integers?


It is possible to determine what algorithm is being run. To prevent that as best as possible, you want Indistinguishability Obfuscation.


How many orders of magnitude slower is this?


Adding two numbers takes 7 seconds, so many many many

But - it’s a lot better than it has been for FHE. This is progress even if it seems absurd.


I probably should have added: the backends used in the post are 3-ish years old and missing some of the latest features. We're working on integrating newer backends and taking advantage of the new techniques! The performance story is better than it seems from my article. Plus this doesn't have any hardware acceleration, another big topic on my agenda :)


Hey no need to justify it here, this is really cool! The first few runs are always slow and ready for optimization. I am very optimistic for what this tech may do for medical records and similar industries.


Yeah I’ve always wondered if FHE would benefit from asic level acceleration, or at least FPGA


If you are going to do an ASIC (with no data going out to memory IO), then why not use regular encryption with a codec on the SoC itself (like DRM content)? That works just fine. IMO the main value here would be trust between unknown parties over an untrusted medium.


The typical perceived use case is FHE in cloud providers providing highly sensitive compute in a multi tenant platform. In something like that full on hardware acceleration is key


just noticed the response..

Not really. Even in those cases you need to understand that memory (i.e. everything past, and including, the SoC's IO pins) would never see the data unencrypted. Any decryption is handled within working cache on the SoC in a secure context (typically on a secure OS VM). I don't see why a bunch of compute code won't work there. In fact you could go a step further and encrypt the compute instructions too. Then, assuming you know what you are doing and don't crash, the compute node will have no idea what ops you are actually doing on what data.

btw thats how DRM works (sans the last part).


Great article btw


Binary TFHE (the scheme the Google compiler uses) is fairly slow and has very large public keys (like 100MB). However the advantage is that binary computation is very flexible and well understood. Additionally, TFHE supports fast (~10ms) bootstrapping, which allows you to perform an arbitrary amount of computation.

If you can live with the limitations of other schemes, FHE can be much faster. On a single core of an M1 Macbook Air, multiplying 2 BFV encrypted 4096-bit values takes 4ms and adding them takes only 15us. Additionally, key sizes aren't horrendous (<500kB). One downside with this scheme is that bootstrapping takes minutes, so it isn't really practical. This limits the amount of computation you can do before you exceed your noise budget and the ciphertext decrypts to garbage. The other downside is that arithmetic circuits make some computations far more difficult (e.g. comparisons).
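For readers who haven't seen homomorphic arithmetic before, here's a minimal sketch of the general idea using textbook Paillier — an additively homomorphic scheme, not BFV or TFHE, and with toy parameters that are nowhere near secure — just to make "operating on ciphertexts operates on the plaintexts" concrete:

```python
import random
from math import gcd

# Toy Paillier (additively homomorphic; NOT the BFV scheme discussed
# above, and NOT secure at this key size -- illustration only).
p, q = 293, 433                 # tiny hardcoded primes for the demo
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    x = pow(c, lam, n2)
    L = (x - 1) // n
    mu = pow(lam, -1, n)        # valid because g = n + 1
    return (L * mu) % n

c1, c2 = encrypt(17), encrypt(25)
# Multiplying ciphertexts adds the underlying plaintexts:
assert decrypt((c1 * c2) % n2) == 42
```

Real lattice-based schemes like BFV support both addition and multiplication of ciphertexts, at the cost of the noise growth the comment above describes.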


I feel it's already at the edge of being useful in practice. Compute and memory are cheap, and we're wasting a lot of it - so even with overhead this big, simple calculations wrapped in FHE wouldn't be prohibitively expensive - and may just be useful enough to create new types of software systems.


7 seconds to add two numbers; that's roughly, what, 10 orders of magnitude slower than without FHE? I'm not sure compute is that cheap.


It would be interesting to see a whole new generation of apps written with hyper-optimized code. We suddenly go back to the 1950s(?) in relative compute power for apps behind FHE.

Not sure what could be done with it at 7s though hah.


An order of magnitude order of magnitude


It’s orders of magnitude all the way down


I'm surprised the "capitalizing a 32 character string" example is actually one second faster than adding two 32bit integers. Still super slow, but I'm curious why. I'd assume that if the string_cap.cc example takes 256 wires, wouldn't add.cc take 64 wires?


Short answer: string cap has more parallelism in the circuit. Depth is more important than total number of gates, and optimizers can decrease depth pretty well.


Thanks (and Dylan18607). So to continue assuming, the number of wires specified is just the input/output, and the add.cc has more intermediate "wires"?


It's more like: to implement add you need a ripple carry adder. You can't evaluate bit 6 until you've evaluated bits 1-5. So there's a nested dependence that makes the circuit deep. String cap, on the other hand, can be implemented in parallel by looking at pairs of characters independently. So that makes it into a parallel band of small circuits, which we can evaluate on different CPU cores. Also, I think the capitalization operation affects fewer bits of a char (it's just toggling one bit in each char, IIRC).
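A rough sketch of that depth accounting (the per-stage and per-char depths below are made-up illustrative constants, not the compiler's actual numbers):

```python
# In FHE, latency tracks circuit *depth* (the longest chain of dependent
# gates), not total gate count, because independent gates can be
# evaluated in parallel across cores.

def ripple_carry_depth(bits: int, gate_depth_per_stage: int = 2) -> int:
    # Each bit's carry depends on the previous bit's carry,
    # so the stages chain serially: depth grows with bit width.
    return bits * gate_depth_per_stage

def string_cap_depth(chars: int, depth_per_char: int = 3) -> int:
    # Every character is processed independently, so the depth
    # doesn't grow with the string length at all.
    return depth_per_char

# 32-bit add: a serial chain; 32-char capitalize: constant depth.
assert ripple_carry_depth(32) == 64
assert string_cap_depth(32) == 3
```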


That makes perfect sense, thank you! I also didn't know until today that capitalization in ascii is just the 6th bit. I thought it would have been doing 32 subtract operations. (I'll be honest I haven't written my own uppercase function since school haha)


It's faster to carry six glasses at once than having to carry them to some point one by one.


Since it encrypts one bit at a time, does capitalizing even have to touch the 224 bits that don't change?

> wouldn't add.cc take 64 wires?

Plus another hundred intermediate wires. And it's doing more complicated operations, however much that matters.


No idea lol! Though it says it works on all 32 bytes in parallel regardless of input size. I imagine ignoring the bits that don't change would be some sort of security vulnerability? You could probably work out some timing attack or entropy reduction otherwise? They state it should take the same duration regardless of the length of the input string or how much needs to be capitalized (all branches are executed even though many are discarded)


> I imagine ignoring the bits that don't change would be some sort of security vulnerability? You could probably work out some timing attack or entropy reduction otherwise?

Nah. The bits are hardcoded into the circuit to not change. That's part of the program, and the program is public information. Only bit 6 of each byte might change, and everyone knows it.
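A quick sketch of that bit-level operation in Python:

```python
# ASCII puts each lowercase letter exactly 0x20 (bit 5 zero-indexed,
# the "6th bit" counting from 1) above its uppercase counterpart, so
# capitalizing is a single conditional bit-clear, not a subtraction.
def capitalize(s: str) -> str:
    return "".join(
        chr(ord(c) & ~0x20) if "a" <= c <= "z" else c
        for c in s
    )

assert capitalize("hello, world!") == "HELLO, WORLD!"
assert ord("a") ^ 0x20 == ord("A")   # toggling bit 5 flips the case
```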


Whoa, I never knew about the ascii 6th bit trick, I've never really considered capitalization wasn't some random offset. So that does make this circuit simpler than I thought it was, and there's the parallelism aspect. It's not so surprising now that it's faster than the add.


If all you need to do is handle a symmetric key exchange that you then use for everything else, that might be enough.


This is tempting to believe, but unfortunately the dream is to be able to run applications on someone else's servers without them knowing what you're doing. That's when I lost interest in FHE. If you want to use it, you have to use it everywhere, and it's just too slow.

Hopefully I'm wrong about that.


author shows a program that adds two integers and takes 7 seconds to run, and the way the system processes data results in 20000x the RAM usage


So similar to your average electron app.


Turns out Electron is SUPER SECURE


There is something surreal about the last example where you take a Rick Astley video that's 9MB and it becomes 188GB after FHE. That seems like a new category of rick-rolling by some cyberpunk.


We've complained long enough about complexity in code being the barrier to speed. Now it's processing power once again.

I'll take it if we can rely on the security.


> I'll take it if we can rely on the security.

What's a use case here? Will this one day make communication more secure? If so, how?


My bet is on digital currency. Your wallet will be able to send and receive from other wallets locally. Not like crypto. No distributed ledger. You will make transactions locally. You can't just alter your balances though because they are encrypted. Not sure how you get data in or out without keys though. FHE will be a part of such currency though.


Is there proof that this is secure? (ie that no information can be found on the plaintext when only having the ciphertext?)

I'm hoping something of the like of "this is secure if AES is secure"


At first sight I could not find which algorithm this compiler is based on.

I know of another library for homomorphic encryption, Lattigo (https://github.com/tuneinsight/lattigo), which is based on Ring Learning with Errors.

The security of RLWE is "believed" to be strong (meaning there has not been a proof of the opposite yet). It is based on the Lattice problem which is likely to be resistant even to quantum computers.

For a more formal and complete explanation, the paper "A Decade of Lattice Cryptography" was very instructive. https://eprint.iacr.org/2015/939.pdf


It's based on the "Learning With Errors" problem and its relative, "Ring Learning With Errors", both of which have reductions to lattice crypto problems, the same sort that are in the new NIST proposals for post-quantum cryptography. See this article for some more info about how security is evaluated: https://jeremykun.com/2022/12/28/estimating-the-security-of-...

That said, the compiler itself doesn't use any crypto. It generates code for a backend API, and the backend implements the FHE scheme
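To make the LWE idea concrete, here's a toy single-bit LWE encryption in Python. The parameters are tiny and deliberately insecure, and this is not the actual TFHE construction — just the core mechanic of hiding a message under a small additive error:

```python
import random

# Toy LWE parameters (far too small to be secure).
q, n = 2**15, 32
s = [random.randrange(q) for _ in range(n)]      # secret key

def encrypt(bit: int):
    a = [random.randrange(q) for _ in range(n)]
    e = random.randrange(-4, 5)                  # small noise term
    # The bit is encoded at q/2; the noise e hides the exact value.
    b = (sum(x * y for x, y in zip(a, s)) + e + bit * (q // 2)) % q
    return a, b

def decrypt(ct) -> int:
    a, b = ct
    m = (b - sum(x * y for x, y in zip(a, s))) % q
    # Round: closer to q/2 means 1, closer to 0 (or q) means 0.
    return 1 if q // 4 <= m < 3 * q // 4 else 0

# Homomorphic XOR: adding ciphertexts adds the plaintexts (and the
# noise -- this noise growth is what bootstrapping resets in TFHE).
def hom_xor(ct1, ct2):
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q

assert decrypt(encrypt(1)) == 1
assert decrypt(hom_xor(encrypt(1), encrypt(1))) == 0
assert decrypt(hom_xor(encrypt(1), encrypt(0))) == 1
```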


I left this comment on the blog but I'll repeat it here as well.

Really interesting! Thanks for the writeup. I'd be particularly keen to see the example applications broken up into the two halves you'd expect in an actual client/server application.

The adder example, while illustrative, receives cleartext inputs and returns cleartext outputs. In a real application the adding code would presumably sit on a server and there would be client code which just encrypts user inputs and decrypts backend outputs.

It would be very interesting to see how one might set this all up with a keypair. Could the same adding code be used for multiple different users and keys, for example?


> First, the subset of C++ supported by the compiler is rather small. As mentioned earlier, all data needs to have static sizes. This means, e.g., you can’t write a program that processes arbitrary images. Instead, you have to pick an upper bound on the image size, zero-pad the image appropriately before encrypting it, and then write the program to operate on that image size. In the same vein, the integer types you choose have nontrivial implications on performance. To see this, replace the int type in the 32-bit adder with a char and inspect the resulting circuit.

> Similarly, loops need static bounds on their iteration count. Or, more precisely, xlscc needs to be able to fully unwrap every loop—which permits some forms of while loops and recursion that provably terminate. This can cause some problem if the input code has loops with complex exit criteria (i.e., break‘s guarded by if/else). It also requires you to think hard about how you write your loops, though future work will hopefully let the compiler do that thinking for you.

I'm wondering if Zig wouldn't be a more appropriate language here, given its extensive support for running code at compile time (which requires all involved variable values to be known) and its integer data types which come in any bit length (u1, u2, u3, …).


Admitting the possibility of having multiple frontend languages for the compiler is a longer term goal of mine!


Hmm, given that "mov is Turing-complete" [0], is it possible to get around the requirement that loops must be fully unrolled? Obviously you couldn't tell if your algorithm was finished, but if you could ask the key holder if the algorithm had completed or needed more work, you could theoretically compute any algorithm. What am I missing?

[append] Perhaps, could you create a "fully homomorphically encrypted" 6502, where each application of the program corresponded with a single clock of the emulated microprocessor?

[0] https://drwho.virtadpt.net/files/mov.pdf


The "mov is turing complete" thing is mostly about x86 mov having a lot of functionality. It can do arithmetic. The mov mnemonic basically encapsulates a bunch of functions (and opcodes), kind of like how git checkout can do at least ~3 distinct operations depending on arguments.

That said, to have unbounded loops, they either need:

- A branch at the end.

OR

- The use of pagefaults.

So even then, it's not quite just the mov instruction alone.


Combinational circuits aren't Turing complete. The novelty of mov being Turing complete is that they were able to do conditional branching with just loads and stores. In the FHE case, they can't read and write to arbitrary parts of memory, so it's a no-go.

This is why their compiler essentially only works on pure functions whose inputs have a statically known size.


This is all interesting and I’m thinking now with all the talk of privacy and data sovereignty etc why this is coming out or Google now and not Apple? Anyone know if Apple is thinking along these lines? Would be nice if we could get performant computation on things without having to decrypt our own data. Then again my understanding and real world applications of homomorphic encryption is pedestrian at best.

What I do really enjoy is the author’s tone and style. The article was fun to read and easy to follow and it seems like a really cool project.


The news stories you read about Google and Apple re: privacy are heavily influenced by corporate PR departments sending 'tips' to the media to try to make themselves look good or a competitor look bad.

The actual privacy efforts of both companies don't (in my experience) closely align with what the news says about them.


> It’s roughly a 20,000x overhead.

Would this be a realistic use case?

A wants to send 256 bit message (M) to B.

B sends already encrypted 256-bit AES key (K) to A.

A can use encrypted K to encrypt M and send it to B without knowing K.

(essentially public key symmetric key)


Is there any way to get unencrypted data in or out? I'd like to feed it real-world data and get real-world data out, but not know the internal state.


Why not use Python as the input source? Since most people write programs in Python nowadays, this could have a bigger impact.


The application domain for this stuff is so negligible... Who's going to pay for this huge added cost?

Either the customer whose data is being handled trusts the service provider enough to let it handle unencrypted data, in which case all the data is vulnerable to interception (and the vast majority of data processing falls into this category)

Or the customer doesn't trust the data processor to see their unencrypted data, but still wants to delegate processing to it. This is a very thin space to operate in. It sounds much easier to simply trust a physically separate and controlled computing plant instead of FHE.


That's not the only trust model at play. Here is a different example of where FHE becomes very useful.

Cancer researchers will benefit greatly from patient data sets that might include very privacy-sensitive elements such as genome sets and past medical history (of both the patients and their relatives, to help understand the hereditary aspects of the disease). Most people making an informed decision may be uncomfortable with this information being made widely available; at best they may be OK with a select few groups having it, but not anyone researching this domain.

FHE would solve this, the data can be made available to anyone with compute power that wants to help, while still respecting the privacy of the patients.


Or you can just have a supercomputer and run the code on the database, just don't distribute the whole database to the people writing the code. Seems far easier and cheaper


How exactly would FHE work here?


One example would be a project like Folding@Home but with much more sensitive/personal information.

Organizations/universities could compute on data provided by a custodian organization without having to care about data handling.


No, I mean please explain the way you would use FHE to do that.


Are "privacy" issues more important for people than "dying from cancer" issues, though?


Who is 'people'? They're not the same set, not all of cancer research is looking at people currently 'dying from cancer', nor are there going to be enough from different demographies declaring 'I am concerned about dying from cancer more than my privacy' and volunteering their data. And even if there were, that trait itself is probably skewing your data and not necessarily generalisable to people not (so particularly) concerned about 'dying from cancer'.


It would be useful to have control data sets as well, from those without cancer/illness. This is a problem in a lot of medical research where those affected with an ailment are much more likely to share data, participate in trials, and donate their bodies to science than healthy individuals are.


Ugh.

All things equal, privacy is usually preferred. This technology would allow both. Why is it either or?


I've been working in the privacy space for a few years and always had the same opinion. If a customer doesn't trust the data processor, the logical conclusion is to not send any sensitive data at all, which is technically possible. But it means more edge processing, more client side processing, most likely not what Google wants. But they have to do _something_, even if nobody uses it in the end.


The practical way that people deal with this is putting a swarm of auditors all up in the processors' business. It used to be this way for telecom as well: to know whether comms were secure, you needed to walk the length of the cable and look for taps. Comms encryption has radically changed this. The question is, is there a similar revolution in the making for processing as opposed to comms. I strongly doubt the benefits will outweigh the drawbacks.


> The application domain for this stuff is so negligible...

They said the same about Rust ("it's a systems language, needlessly complicated for large user applications").


It also doesn't work for GDPR, as encrypted personal information is legally still personal information. Just because you can't decode it today doesn't mean that you won't be able to decode it tomorrow (via key leak, quantum computing, etc.).


Am I the only one who read this as "Google's fully homophobic encryption compiler"?


As the old logicians joke (https://www.barrypopik.com/index.php/new_york_city/entry/thr...) goes: I don't know.


The future is bright


I may have misread the title and thought that was one bigoted compiler


Why is it a compiler and not a library?


> Finally, encrypting each bit of a plaintext message comes with major tax on space usage. Each encryption of a single bit corresponds to a list of roughly 700 32-bit integers. If you want to encrypt a 100×100 pixel greyscale image, each pixel of which is an 8-bit integer (0-255), it will cost you 218 MiB to store all the pixels in memory. It’s roughly a 20,000x overhead. For comparison, the music video for Rick Astley’s “Never Gonna Give You Up” at 360p is about 9 MiB (pretty small for a 3 minute video!), but encrypted in FHE would be 188 GiB, which (generously) corresponds to 20 feature-length films at 1080p.
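A quick back-of-the-envelope check of those figures, using the article's "roughly 700" 32-bit ints per encrypted bit (the results land close to, though not exactly on, the quoted numbers, since 700 is approximate):

```python
# Sanity-check the article's space-overhead figures.
ints_per_bit, bytes_per_int = 700, 4            # ~700 x 32-bit ints per bit
image_bits = 100 * 100 * 8                      # 100x100 greyscale, 8 bpp

image_mib = image_bits * ints_per_bit * bytes_per_int / 2**20
assert 210 < image_mib < 220                    # ~214 MiB, near the quoted 218 MiB

video_bytes = 9 * 2**20                         # ~9 MiB source video
video_gib = video_bytes * 8 * ints_per_bit * bytes_per_int / 2**30
assert 190 < video_gib < 200                    # ~197 GiB, near the quoted 188 GiB
```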

Ouch.



