Filippo describes how the source code evolved and how hidden assumptions caused a bug in a special-case implementation he kept for performance reasons. It's a great example of how the incompleteness of the old Weierstrass curve multiplication formulas still bites us, and of how he uses the opportunity to draw valuable lessons from the accident. It also shows, once again, how optimizing for performance can cost security and hurt maintainability.
The assembly is there for performance. Other curves are implemented in constant time pure Go. (It's actually more robust than constant time C, because the Go compiler is less creative in "optimizing" things into branches.)
Heh, I used to joke that in Go it's safer because I know where the compiler team sits, but I think there's also a fundamental design reason: the Go compiler has compilation speed as a primary goal. I don't know if that's still true, but there used to be a rule of thumb that a new optimization had to speed up the compiler enough to make up for its own cost to be accepted. It's unlikely that unwinding bit masks into branches would be worth adding.
Anyway, we're not moving all cryptography to assembly anytime soon, and I'm positive that would not be a net increase in security, even if it would mitigate the hypothetical risk of compiler-introduced side-channels.
> The second, more interesting lesson is that while assumptions might be valid now, they aren’t guaranteed to be valid in the future, after the code is refactored or reused over the years. It’s important to minimize assumptions and clearly document the remaining ones, as much as possible into the API or type system, and otherwise in high-level comments.
A lot of software engineering lessons can be reduced to “minimize assumptions and dependencies”. This includes the assumption that assumptions won’t change in the future.
It also means that formal verification needs to be part of the CI system, or at least "check that the code hasn't been changed in ways that violate the correctness guarantees" needs to be part of CI.
Doesn't do anything for this bug, which was an ASM issue. There's nothing you can do about this sort of bug except give people a better idea of the risks (e.g., make it clear that they can trade safety for performance).
Well, this issue was in the assembly, but it was not an assembly issue. It's interesting to think about how we could have encoded the assumptions in a machine-checkable way, because they are actually not unlike the assumptions of some formally verified components.
Here we had: (1) an incomplete addition that had to be called with different points; and consequently (2) a scalar multiplication that had to be called with a reduced scalar so the computation wouldn't wrap and add two equal points. I don't think we have the tools to encode (1), nor the fact that (2) satisfies (1). (2) could maybe have been encoded in the type system by accepting a ReducedScalar.
If you have to do things with type-based checking, presumably you can make two new types that are only ever output by one routine, something like ensureNotEqualPoints(A, B) -> (pointNotEqualTypeOne, pointNotEqualTypeTwo), err. Then you have one incomplete addition formula that takes in the two different point types. This doesn't protect you from calling it multiple times in a way that scrambles the types and breaks the invariant.
Formal verification with typechecking sucks and is very limited.
If you have a haskell style type system you can do such a thing and have the type system prevent you from calling things multiple times, eg:
data P1 a
data P2 a
data PointNotEqual where
  T :: P1 a -> P2 a -> PointNotEqual

getP1 :: P1 a -> Point
getP2 :: P2 a -> Point
checkNE :: Point -> Point -> Maybe PointNotEqual
(+) :: P1 a -> P2 a -> Point
Here, when you are given a PointNotEqual, you don't know what the type variable is, only that it's the same between the two points. So you (i.e., the compiler) cannot prove that a P1 or P2 from a different call to checkNE has the same type variable, so you can't add them.
But these types also mean you need to do the check before every addition which defeats the point of the whole constant time thing, so the typechecking is still limited. And I guess you can’t do it in Go either.
> Scalars, when applied to a specific curve, do have a “order” though,
Nitpick: the important point here is that points on the elliptic curve have an order (by definition, the smallest k such that kP = identity), and the group of such points has an order (by definition, the number of elements it contains). Here, "scalar" is just a way to say "not a point", or "integer value"; if you define scalars in the right group, sure, we could talk about their order as well, but what we actually care about is the order of the points.
From the definition of order, qP = identity; thus (q+30)P = qP + 30P = identity + 30P = 30P, hence the wrapping.
> Fundamentally, a scalar multiplication function was returning the wrong value for a very specific input because of a combination of the pre-existing complexity and unsafety of some optimized assembly, of undocumented assumptions, and of the neverending state of flux of open source code.
So, basically, it's everybody's fault except for the maintainer of the go crypto code. Which, coincidentally, is the guy writing this blog post.
Oh, pre-existing complexity, huh? I wonder who put that there. It probably fell from the heavens.
I would certainly feel much safer using code from someone who is able to say they made a mistake. This is not about blame. This is about being able to take responsibility.
I wouldn't have said anything if the whole tone of this blog post wasn't "here, children, let me tell you something you can learn from". How about you learn from it first, Obi Wan?
I have to admit I don’t understand your comment. I’ve looked at the Golang stdlib and its crypto parts (disclaimer, I found a CVE in the math parts), and found the assembly code pretty awful. This was before filippo was working at Google. It’s all about performance, similar to how openssl has fast but unauditable code for crypto. There’s no documentation and it’s hard to understand how that code was produced.
From some discussions I had with filippo years ago it sounded like he agreed and was trying to clean parts of these now that he was working there. It looks like progress is being made, and if he can share his learnings along the way let’s benefit from that instead of being mad at him no? Sharing is caring.
I mean even in a world where he would have written that code originally, what’s wrong with being transparent about it? Everybody writes bugs.
Even if we assume that all of the code was written by the author of the post, it's much more helpful to reason about how the mistake happened than to just bluntly say "I made a mistake and it will never happen again." Because it will, at least if you don't understand where that mistake came from, what underlying assumptions were made and not documented at the time, and how the code evolved so that the assumptions were no longer valid. All of the code involved in this bug probably seemed reasonable at the time; every decision had sound reasons. Making the non-constant-time code constant time? Reasonable security practice. Implementing the code in assembly for performance reasons? Sounds about right. Implementing the incomplete formulas? Makes sense, complete formulas were not available. And so on. Every step reasonable, and yet a bug happened. That's the valuable lesson, and I'm fairly confident that most of us can benefit from being reminded of it.
It's not about who wrote that code.
It's about responsibility. If you are the maintainer of shitty code, then at the very least communicate that publicly. Don't come out with how bad your code is after the fact!
After he was responsible for that code for years, he now goes public and matter-of-factly states that all the code was shit the whole time.
Well why didn't he do anything about it then?
What did he think the job description of being maintainer entailed?
Filippo wasted no time shitting on other crypto projects, like GnuPG, from what then looked like the high ground. And now he leaves Google and by the way the supposed high ground was an optical illusion (and that's the charitable phrasing).
My theory is that you people like this kind of story because it helps you cope with your own mediocrity. If Google and Golang have this kind of laissez-faire approach, why should I have higher standards? We'll just call it a life lesson as if it was handed down from heaven, as if things HAD to be this bad.
Sprinkle some "bugs happen to anyone" and "you can't have 100% security anyway" on top and your shit burger is finished.
Do you know for a fact that it was all written by the current maintainer? Though, even if it was. I'm not sure they are intentionally not admitting a mistake. When you work in a shop with blameless culture, you end up learning to always talk like that and you don't even notice.