Hacker News new | past | comments | ask | show | jobs | submit login
Zero Knowledge Proofs: An illustrated primer (2014) (cryptographyengineering.com)
138 points by _qc3o on Dec 5, 2016 | hide | past | favorite | 44 comments



Thanks for posting this!

Are all ZKP probabilistic in nature?

Anyway, now I need to figure out a protocol that can be done manually and in a reasonable timeframe for a following problem: there is a group of good and evil people. There's no more evil people than good people in this group. Evil people know each other, and only one good person knows who everyone is; the rest of the good people don't know who is in which group. How can that one good person pass their knowledge to everyone without revealing their own identity? Also, the protocol must resist the evil people trying to disrupt it.

Yes, I'm trying to stop people playing Avalon in my office in a very over-the-top, ultimately nerdy way - by literally providing a winning strategy and calling it GG.

https://en.wikipedia.org/wiki/The_Resistance_(game)#Avalon_v...


> Are all ZKP probabilistic in nature?

Yes, but in some cases the probability is too small to consider.

For example, take digital signatures. These are in effect a proof that the person claiming to have signed it is in fact the holder of the private key behind them, without revealing the private key. But it is possible for someone to accidentally generate a valid signature without having the private key. Extremely unlikely. But possible.

-----

The problem you pose is interesting because most ZKPs involve a prover and a verifier. Here, there are many separate verifier entities each with different information that need to be convinced.

What kind of communication is allowed?


> Yes, but in some cases the probability is too small to consider.

Thanks!

> What kind of communication is allowed?

Any kind. People are free to say whatever, whenever - the question is, whether or not you believe them.

In the game, each person takes a turn to select a 'squad' for a 'mission'. Then, everyone votes whether or not to accept the squad, and if the squad is accepted, it goes on a mission. The mission consists of the entire squad voting in secret whether the mission is a "failure" or a "success". Good people can vote only "success", evil can vote either "success" or "failure". The votes are tallied (without revealing who voted which way), and if there's one or more failures, the mission fails. The good team wins the game if 3 missions end with success, the evil team wins if 3 missions end with failure (or if the general vote fails to accept the squad five times in a row). The twist is, if good team wins, evil team has one last chance to guess which of the good players was "Merlin", i.e. one who knows who is in which team. If they guess correctly, the evil team wins.

The game consists mostly of the good players trying to guess who is good (to avoid taking evil players on missions), and the evil players trying to confuse everyone. Merlin, who knows who is good and who is evil, can't be too obvious about revealing his/her knowledge, lest he/she gets sniped at the end.

So my question is - how can Merlin pass his knowledge about the composition of two teams to everyone without revealing his/her own identity? We can assume that whatever protocol there is, at least half of the players (i.e. the good team) will be willing to follow it, and the rest might want to disrupt it or maliciously insert false information.

I've been thinking about it for quite a while, but I don't have much experience or knowledge of the area, so any pointers would be appreciated.


You can upgrade the system to have private communication with a public-private key system. I thought about ways to make it work otherwise and I can't come up with any. I don't think that regular zkp techniques will work here. Interesting problem though.


Note that signatures do not constitute zero knowledge proofs, however, because I can transmit your signature to a third party and convince them of the same thing you convinced me of, namely the authenticity (or whatever) of a particular document. Of course, larger protocols built around signatures don't suffer, because at worst you just need to make sure you sign a sufficient collection of data (to avoid interpretation as a different message given a different context)... but it's not zero-knowledge.


Yeah, I was looking at it in the context of PKI -- where the public key is already tied to an identity.

(It is also a zkp in case of a scheme with a challenge nonce)


> It is also a zkp in case of a scheme with a challenge nonce

This is my point - it is not zk because as verifier you can show the transcript to a 3rd party and convince them that the thing to be proved is true, even if the thing to be proved can't let you successfully impersonate the prover thanks to a nonce. In a zkp scheme, the transcript proves nothing - it's as if you took the signing scheme transcript and made it so you couldn't determine which pubkey was being signed for: the 3rd party would have no reason to believe you hadn't faked the transcript.

In short, zkp transcripts must be fakeable, as counterintuitive as that sounds.


This isn't related to ZKPs but may help you figure out a strategy to your game:

https://en.wikipedia.org/wiki/Byzantine_fault_tolerance#The_...


Also, ideas from consensus protocols from distributed systems may be useful for similar reasons.

But yeah, Byzantine fault tolerance is key here if the good folks are in majority.


There's a lot of security applications for zero knowledge proofs that go beyond just cryptography. Nuclear Weapons verification using ZKP has been getting quite popular

http://lnsp.mit.edu/zero-knowledge-warhead-verification/


> By comparing the hash values for a real and test warhead, one can confirm if the test warhead is a match or not.

But if only the weapon owner knows what is in the "real" warhead, couldn't he simply submit a dummy as his "real" one, and then offer up a load of dummy warheads for test?


Yes, authenticating a first one is required. However, if this bothers you, consider what we do currently on verifying nuclear warheads -- nothing.


I am quite relieved to find out that there's a protocol that uses zero-knowledge proofs to do password authentication. In a cryptography class I took, the teacher asked the question "What can we use zero-knowledge proofs for?". My suggestion was "to communicate passwords over an insecure network", but he shot it down with "Ehm no, hashes are for that.". Kept me confused for a long time. I guess the protocol is just a little obscure.

In retrospect, maybe I should have known better, as the same teacher did not understand why some students would question the accurateness of the statement "this data is random because it satisfies Golomb's postulates".


Some of the new ZK PAKE[0] schemes are very cool and would prevent phishing attacks. You only share your password with a party which proves in ZK that it already knows your password. Kinda surprised someone hasn't built a startup around it: "Make your enterprise immune to phishing with our expensive auth appliance".

I really want Chrome to roll it out as some sort of browser mediated authentication scheme.

[0]: https://en.wikipedia.org/wiki/Password-authenticated_key_agr...


Unfortunately phishing is a social problem, not a technical one. Users will happily put their password into any random webpage which tells them they really really need to, regardless of how much you train them only to put it into a specific application.


It is trivial to make a phishing login where the login page that looks exactly like the legitimate login page. Of course under such conditions training will fail. The rules of the game are such that no amount of training is going users not make mistakes here.

Consider instead that passwords are only entered into some special and distinct OS controlled textbox* that webpages are prevented from mimicking (or even a physical device). This is a far easier training target (only enter passwords into boxes that look like X).

* The software behind this textbox ensures that the site knows the password before asking the user for the password.


> Consider instead that passwords are only entered into some special and distinct OS controlled textbox

No, I understood this already. This is the easy technological solution which doesn't actually solve the real problem: some users (or really all users some of the time) will always be willing to enter their password into some other box which looks nothing like the one that webpages are prevented from mimicking.

Hell, I did it myself today: I entered my work password into an intranet site which was showing a "certificate error", even though in past experience this site had valid certs. Could that have actually been someone who broke into the intranet and set up a honeypot? Absolutely. But I needed the resource that was behind that password box in order to do my job, so I entered my password anyways.


>some users (or really all users some of the time) will always be willing to enter their password into some other box which looks nothing like the one that webpages are prevented from mimicking.

Phishing sites mimick real user sites because that greatly increases the success rate. You can always find someone who will do something, the important question is how often.

I don't think we should just throw up our hands and say user problem are unsolvable with technology. Good UX solves user problems, compare an AppleII to a iPad.

We have two problems: 1. it is easy to mimic password prompts, 2. it is hard for computers to tell who is legitimate and should be sent the password. This solution solves both.


You could be right, I could be right, only way to know for sure is to build it.


The problem with randomness is that you can never really be sure.


You can never be 100% sure, but you can be as arbitrarily close to sure as you are willing to spend resources for.

If you have a random algorithm with 0% false positives and 50% false negatives, and repeated trials are independent, then technically combining a few dozen runs of the algorithm actually has the same chance of returning the wrong results as a deterministic algorithm failing due to cosmic radiation. The drop-off of failure probability is exponential in the number of trails, making it quite practical to achieve arbitrary certainty. As long as you're a few orders of magnitude more certain than the hardware, the algorithmic randomness ceases to matter.


For a mathematical background we are editing this soon to be published doc about zk-SNARKS: https://drive.google.com/file/d/0B57LWnAurZr_OGhVOENnbHRoaGM...


For a non-mathematical explanation of the core idea behind SNARK, see http://manishearth.github.io/blog/2016/03/05/exploring-zero-... (I've attempted to make it accessible to people with a minimal background in these things)

(I wrote this after being fascinated by the explanation of the core idea in https://people.xiph.org/~greg/simple_verifyable_execution.tx..., but finding that post too complicated to share with my non-crypto physics friends)


Does CoinFabrik (the startup you are at) use zero knowledge proofs in any form?


Some of our work is related to integrating apps with zcash.


I google bumped into this article a few months ago when I was trying to understand Zero Knowledge, so it was quite surprising (in a good way) to see this come up in HN today. Thanks.

Took me a while to understand the point of the time machine/simulation and verifier, which in a roundabout (to me!) way led to the conclusion that only the verifier could verify the ZKP and yet not leak information to others.

If anybody has an even-clearer explanation, I'm all ears!


I would argue that zero-knowledgeness is a property that (from an information theory perspective) can only be approximated. It is not possible to have a technique based on information theory that works like the example with the hats; you always give a tiny bit of information about the solution (this information is the hash in the commitment scheme), and this is kind of interesting.


Well, yes, but as long as you use a nonce and sufficiently large hash you can make the probability of brute forcing the hash arbitrarily close to zero. So it works well enough :)


The non-interactive version of this is still a mystery to me. I'd love to see a simplified explanation of that if it exists.


I wrote about the general ZKP in a post here: http://manishearth.github.io/blog/2016/03/05/exploring-zero-... . It covers the graph coloring ZKP without requiring physical presence.

(I'm not sure what you mean by non-interactive -- is it the one with the hashes? "interactive" in a ZKP is a process with a back-and-forth. I guess you mean the ZKP that doesn't need physical presence? Regardless, I explain that a bit in that blog post.)


For instance a zkSNARK as used in ZCash. It's just a piece of data that's put there on the blockchain. It's "non-interacrive" (a term I heard, not one I made up) in that no challenges are made on it by those verifying transactions. It just encapsulates the necessary evidence all on its own. Even though I roughly understand the interactive sort of zkp, this still blows my mind.

Thanks for the link, I'll check it out.



"In the next post I’ll talk about some of those — specifically, the efficient proofs that we use for various useful statements. I’ll give some examples (from real applications) where these things have been used. Also at reader request: I’ll also talk about why I dislike SRP so much."

Did he ever write this? I can't find it.


He should change "Next they fill in the graph with a valid solution, but using the new random ordering of the three colors" to "... with the same valid solution, but using the new random ordering of the three colors."


Suppose Prover has multiple (>1) valid solutions to the problem. What changes if the Prover not only picks representation at random, but also pick a solution at random each round?

To me it looks that at least in this case entirely different solution and different representations are permutations of the solution, therefore both are allowed?


Sure, it's allowed, and makes sense in this particular example. But someone reading this might think that the scheme depends on using a random coloring in the second round, which is precisely not the case.

In ZKP, usually the claim is "I have constructed this 3-colorable graph, and I wish to prove to you that it is 3-colorable without giving you any info about the coloring", and I only know the one colorability that I constructed the graph with (so the narrative given about Google calculating a coloring for you isn't the best analogy to begin with).

Or, "Here's a number that's a product of two big primes, a fact which I will prove to you without giving you a hint on what the primes are," in which there is obviously only one solution.


Nothing changes. If something did change, that could[1] mean that information about your solution was being leaked (since this means that different solutions behave differently under the ZKP process, which cannot be the case for it to be zero-knowledge).

ZKP is about proving you have a solution. If you have many, good for you.

[1]: Not always, I think.


Question about the hat example given:

The author says that the phone company wouldn't learn anything about google's solution by keeping notes because of the randomization of the coloring. that doesn't seem to be true to me.

Assuming the solution that Google colored in was correct, the phone company doesn't necessarily need to know the frequencies of the edges, merely that they are different: which is learned by recording the results after E^2 selections.

Am I wrong here?


These are always adjacent nodes being compared. Assuming the solution is correct, adjacent nodes will always have a different color.

So the most information that can leak is the correctness of the solution. Which is exactly what we need.


The proof of zero-knowledgeness using the time machine still doesn't make sense to me.

What if the Verifier says, "the information extracted will only be deemed 'useful' if I can be satisfied with a reasonable certainty that google has indeed found a solution."

In other words... yeah we could trick my Verifier with a time machine but forget all of the information extracted on a cheat run. Can we indeed not gather information only from the successful runs?


I think the time machine example is circular logic. Consider a scheme where you uncover three adjacent nodes instead of two. We know that this does leak information, since it will tell you whether or not two nonadjacent nodes share a color.

Applying the time machine to this, you can't fake having the solution with a time machine because after many runs the verifier will know which nodes share colors, and will find an inconsistency (unless you had the correct solution after all).

In effect, the time machine doesn't work because you cannot simulate a given run of the process without leaking the same information that a correct solution would leak unless you actually know a correct solution.

-----

It's pretty easy to determine that the regular two-node scheme doesn't leak data. In each run, the verifier is just given the information that the colors of two adjacent nodes differs. Assuming a correct solution, this isn't new information. Because of the permutation the actual colors don't matter and can't be correlated between runs. So the only information that "leaks" is the fact that the coloring is in fact valid, which is all we wanted.


"How to explain Zero Knowledge Protocols to your children": http://pages.cs.wisc.edu/~mkowalcz/628.pdf


Anyone have a link to part two?


Since there are no new posts in the [category 'fundamentals'][1], I assume that there is no part two yet.

[1]: https://blog.cryptographyengineering.com/category/fundamenta...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: