There's something about the way that this is written that rubs me the wrong way. There's a lot of emphasis on how unlikely the attack would be - that the collision would take "significant resources", that many conditions must be true at the same time, that Pulse Security's proof was unrealistically simple.
All of this can be true, but rhetorically, it sends the wrong message to put so much emphasis on this. The message I'd like to see would be more like this:
"We got notified of this security problem. We immediately worked on a mitigation and on finding out whether any customers were affected. We couldn't find any, and we have patched the problem. We have a patch for you to apply to fully prevent the problem.
The problem depended upon an identity collision. We think the probability of this is remote, but we always take this stuff very seriously."
In other words, I want to see platforms like this emphasize their response, rather than try to convince me that the problem is minor. The way these things are phrased matters a lot!
As a network admin I think my primary concern on first reading is actually the severity of the problem and how much I should worry about it. Then it’s good to see that the response was prompt and transparent. Probably can’t satisfy everybody no matter how you write it.
Yeah, that makes a lot of sense too, and you've convinced me that I'm probably being a bit too uncharitable. I totally agree that these things are very hard to write in a way that makes everyone happy.
I’m giving them the benefit of the doubt. Sounds like they had some audits / penetration testing done, the security firm found a real weakness, but it’s just unlikely to happen.
They disclosed the issue pretty well, but at the same time, they are afraid of the response; they decided to overcommunicate that the attack vector has likely never been exploited, and I can see why they did that.
I’m not sure I see a big difference between what you wrote and what the article said since you also minimize the likelihood of the attack and the article also talked about their response.
To me it's fair to say it's not a likely scenario, it's just that they continuously say "this would be difficult" at every step. I appreciate the breakdown, but it comes off as trying to convince you it was a near-complete impossibility. I understand it was unlikely, but it was possible, and that makes it severe either way.
Right - I wanted to try to phrase it in a way that replicates the content, but changes the emphasis. Definitely true that they cover all of the same things.
I certainly agree with you. I just posted a few days ago saying I would never trust this company.
In fact, I would never trust a company that didn't explicitly state that having a central server facilitate operations (Signal et al.) is a huge risk.
Every time someone specifically states "we take security very seriously" or "we take your privacy very seriously", or even "you're in control of your data" it makes me think otherwise.
Definitely. We've had formal design audits, informal audits, and open source audits (like the one that found this issue). Formal audits are in the works.
We're glad this was found and glad that this is the first significant vulnerability report we've received in many years of operation, but we don't want that to make us arrogant. Distributed systems are hard and cryptography is hard. Distributed systems with cryptography are REALLY hard. :)
We've talked to Pulse Security about hiring them in the future. The fact that they found this obscure issue is a pretty strong pitch.
Trail of Bits, and they did recommend additional scrutiny into dependencies on addresses alone for authentication. They didn't find this specific issue because as you say it is a bit of a design and implementation issue. It required that a weak point in the design be combined with a mistake in implementation somewhere else (roots).
We've fixed the mistake in the roots, and are also fixing the weak point in the design soon with a release. Fixing the roots blocks this specific attack but we want to fix the fundamental weakness to ensure that there are no similar issues in the future.
The public key is bound to the address. The problem is that this binding was not as strong as we thought it was in one edge case. Exploitability relied on another problem in the roots, which is now fixed, so the issue is no longer exploitable.
We are also going to do a release, though, because we thought of a way to ensure that a complete hash of the address and public key (rather than just the 40-bit address) is always checked in certificates of membership. In retrospect it always should have been this way, but then again all security issues seem silly and obvious in hindsight.
This will make an exploit of this nature impossible even if the roots are misbehaving, since the certificate of membership won't validate against a colliding identity at all.
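To make the difference concrete, here is a minimal sketch of the idea, not the actual implementation. Every name in it (`address_of`, `fingerprint_of`, `MembershipCert`, and so on) is hypothetical, SHA-256 is just a stand-in hash, the "keys" are arbitrary byte strings, and the address is cut to 2 bytes only so that a collision can be found instantly for the demo (the real address is 40 bits).

```python
import hashlib
import hmac
from dataclasses import dataclass

# Demo width only (assumption): 2 bytes so a collision turns up quickly.
# The address discussed in the thread is 40 bits (5 bytes).
DEMO_ADDRESS_BYTES = 2


def address_of(public_key: bytes) -> bytes:
    """Hypothetical address derivation: a truncated hash of the public key."""
    return hashlib.sha256(public_key).digest()[:DEMO_ADDRESS_BYTES]


def fingerprint_of(public_key: bytes) -> bytes:
    """Full-length hash covering both the address and the public key."""
    return hashlib.sha256(address_of(public_key) + public_key).digest()


@dataclass(frozen=True)
class MembershipCert:
    """Hypothetical certificate of membership."""
    address: bytes
    fingerprint: bytes


def issue_cert(public_key: bytes) -> MembershipCert:
    return MembershipCert(address_of(public_key), fingerprint_of(public_key))


def validates_by_address(cert: MembershipCert, presented_key: bytes) -> bool:
    """Weak check: only the short address has to match."""
    return cert.address == address_of(presented_key)


def validates_by_fingerprint(cert: MembershipCert, presented_key: bytes) -> bool:
    """Strong check: the full hash of address plus key has to match."""
    return hmac.compare_digest(cert.fingerprint, fingerprint_of(presented_key))


def find_colliding_key(target_address: bytes, avoid: bytes) -> bytes:
    """Brute-force a different key whose truncated address matches the target."""
    counter = 0
    while True:
        candidate = b"attacker-key-" + counter.to_bytes(8, "big")
        if candidate != avoid and address_of(candidate) == target_address:
            return candidate
        counter += 1


member_key = b"legitimate-member-key"
cert = issue_cert(member_key)
impostor_key = find_colliding_key(cert.address, avoid=member_key)

print(validates_by_address(cert, impostor_key))      # colliding key passes the weak check
print(validates_by_fingerprint(cert, impostor_key))  # but fails the full-hash check
```

The point of the sketch is that a colliding identity shares the short address but not the full hash, so a certificate that carries the full hash rejects it no matter what any other party says about the address.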
It would be nice if the address could be at least 256 bits long, but there's a major ergonomic problem with that. Would you rather join network abcd0123ab12345 or network 8c6e2a2647ee854f469a3bb798e02ba5a8b1812cab229ff129f073e7a80c1202?
If humans could remember and easily type very long strings a lot of information security would be way, way easier. :)
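For a rough sense of the tradeoff, here is a back-of-the-envelope sketch. The guess rate is a made-up number for illustration only, not a measurement of any real system, and the function names are hypothetical.

```python
def hex_digits(bits: int) -> int:
    """Number of hex characters a human has to type for an ID of this size."""
    return (bits + 3) // 4


def expected_guesses(bits: int) -> int:
    """Average number of random tries needed to hit one specific value
    in a space of 2**bits values (a targeted collision)."""
    return 2 ** bits


def days_to_search(bits: int, guesses_per_second: float) -> float:
    """Average search time in days at an assumed, illustrative guess rate."""
    return expected_guesses(bits) / guesses_per_second / 86_400


ASSUMED_RATE = 1_000_000  # guesses per second, purely illustrative

for bits in (40, 256):
    print(
        f"{bits:>3} bits: {hex_digits(bits):>2} hex chars to type, "
        f"~{days_to_search(bits, ASSUMED_RATE):.3g} days at the assumed rate"
    )
```

At that assumed rate a 40-bit target falls in a couple of weeks on average, while a 256-bit one is far out of reach. The cost is that the ID grows from 10 hex characters to 64.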
The issue was fixed entirely on the root side. No release was necessary. It was a private disclosure and we fixed it within a few hours.
No release is technically required now, but we have one coming that contains an endpoint-side mitigation that renders the attack and others like it impossible even if the root is misbehaving. A 1.6.6 patch release is being built as we speak, and the fix will also be in 1.8, which is delayed to fix some issues with the new UI.
The root fix renders this exact attack impossible with current nodes, but we think it's a good idea to close the gap in another way that renders this entire class of attacks impossible just in case someone figures out some other way to accomplish something similar. We found a way to do that so we are releasing it.
I don't think security should be done by just playing whack a mole. You want to try to get ahead of it. If there's an opportunity to harden something, do it.
From this response it's not quite clear to me: is this identity collision against their Curve25519 implementation? Does this mean the attacker has effectively found a new brute-force attack on that specific public/private key algorithm? That seems like it would be bigger news and would affect more than just ZeroTier. Or is there some proprietary crypto in place on which the collision was generated? Maybe I'm missing an important link with the details?
I believe in some areas there was a shortened, truncated form of the public key being used as an "address".
If a device went offline and was forgotten about (but still trusted), an impersonator spoofing the same (truncated) public key could gain access, as long as the server didn't reject this identity and say "that's not the public key you had before". I believe truncation was used to facilitate typing it into the UI.
So in short, it seems to me this aspect was based on truncation of a public key or hash, and the inevitable finding of collisions in this reduced address space.
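The "that's not the public key you had before" check could look something like the sketch below. This is my guess at the general shape (trust on first use, then pin), not how any real server does it; `IdentityRegistry` and its method are hypothetical names, and the address and keys are placeholder values.

```python
import hmac


class IdentityRegistry:
    """Hypothetical server-side table that pins the first full public key
    seen for each short address and rejects any different key afterwards."""

    def __init__(self) -> None:
        self._pinned: dict[str, bytes] = {}

    def accept(self, address: str, public_key: bytes) -> bool:
        # First sighting of this address: remember the full key.
        pinned = self._pinned.setdefault(address, public_key)
        # Later sightings must present exactly the same full key.
        return hmac.compare_digest(pinned, public_key)


registry = IdentityRegistry()

first_seen = registry.accept("a1b2c3d4e5", b"original-device-public-key")
same_device = registry.accept("a1b2c3d4e5", b"original-device-public-key")
impersonator = registry.accept("a1b2c3d4e5", b"colliding-but-different-key")

print(first_seen, same_device, impersonator)
```

With a check like this, finding a collision on the short address is not enough by itself, because the impersonator would also need the original full key. The attack only works if the table forgets the old key or never compares against it.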