Filippo describes how the source code evolved and how hidden assumptions caused a bug in a special-case implementation he kept for performance reasons. It's a great example of how the incompleteness of the old Weierstrass curve multiplication formulas still bites us, and of how he uses the opportunity to draw valuable lessons from the accident. It also shows, once again, how optimizing for performance can cost security and hurt maintainability.
The assembly is there for performance. Other curves are implemented in constant time pure Go. (It's actually more robust than constant time C, because the Go compiler is less creative in "optimizing" things into branches.)
Heh, I used to joke that in Go it's safer because I know where the compiler team sits, but I think there's also a fundamental design reason: the Go compiler has compilation speed as a primary goal. I don't know if that's still true, but there used to be a rule of thumb that a new optimization had to speed up the compiler enough to make up for its own cost to be accepted. It's unlikely that unwinding bit masks into branches would be worth adding.
Anyway, we're not moving all cryptography to assembly anytime soon, and I'm positive that would not be a net increase in security, even if it would mitigate the hypothetical risk of compiler-introduced side-channels.
> The second, more interesting lesson is that while assumptions might be valid now, they aren’t guaranteed to be valid in the future, after the code is refactored or reused over the years. It’s important to minimize assumptions and clearly document the remaining ones, as much as possible into the API or type system, and otherwise in high-level comments.
A lot of software engineering lessons can be reduced to “minimize assumptions and dependencies”. This includes the assumption that assumptions won’t change in the future.
It also means that formal verification needs to be part of the CI system, or at least "check that the code hasn't been changed in ways that violate the correctness guarantees" needs to be part of CI.
Doesn't do anything for this bug, which was an ASM issue. There's nothing you can do about this sort of bug except give people a better idea of the risks (e.g., make it clear that they can trade safety for performance).
Well, this issue was in the assembly, but it was not an assembly issue. It's interesting to think about how we could have encoded the assumptions in a machine-checkable way, because they are actually not unlike the assumptions of some formally verified components.
Here we had: (1) an incomplete addition that had to be called with different points; and consequently (2) a scalar multiplication that had to be called with a reduced scalar so the computation wouldn't wrap and add two equal points. I don't think we have the tools to encode (1), nor the fact that (2) satisfies (1). (2) could maybe have been encoded in the type system by accepting a ReducedScalar.
If you have to do things with type-based checking, presumably you can make two new types that are only ever output by one routine, something like ensureNotEqualPoints(A, B) -> (pointNotEqualTypeOne, pointNotEqualTypeTwo), err. Then you have one incomplete addition formula that takes in the two different point types. This doesn't protect you from calling it multiple times in a way that scrambles the types and breaks the invariant.
Formal verification with typechecking sucks and is very limited.
If you have a haskell style type system you can do such a thing and have the type system prevent you from calling things multiple times, eg:
data P1 a
data P2 a
data PointNotEqual where
  T :: P1 a -> P2 a -> PointNotEqual

getP1 :: P1 a -> Point
getP2 :: P2 a -> Point
checkNE :: Point -> Point -> Maybe PointNotEqual
(+) :: P1 a -> P2 a -> Point
Here, when you are given a PointNotEqual, you don't know what the type variable is, only that it's the same between the two points. So you (i.e., the compiler) cannot prove that a P1 or P2 from a different call to checkNE has the same type variable, so you can't add them.
But these types also mean you need to do the check before every addition which defeats the point of the whole constant time thing, so the typechecking is still limited. And I guess you can’t do it in Go either.
> Scalars, when applied to a specific curve, do have a “order” though,
Nitpick: the important point here is that points on the elliptic curve have an order (by definition, the smallest k such that kP = identity), and the group of such points has an order (by definition, the number of elements it contains). Here, "scalar" is just a way to say "not a point", or "integer value"; if you define scalars in the right group, sure, we could talk about their order as well, but what we actually care about is the order of the points.
From the definition of order, qP = identity; thus (q+30)P = qP + 30P = identity + 30P = 30P, hence the wrapping.
> Fundamentally, a scalar multiplication function was returning the wrong value for a very specific input because of a combination of the pre-existing complexity and unsafety of some optimized assembly, of undocumented assumptions, and of the neverending state of flux of open source code.
So, basically, it's everybody's fault except for the maintainer of the go crypto code. Which, coincidentally, is the guy writing this blog post.
Oh, pre-existing complexity, huh? I wonder who put that there. It probably fell from the heavens.
I would certainly feel much safer using code from someone who is able to say they made a mistake. This is not about blame. This is about being able to take responsibility.
I wouldn't have said anything if the whole tone of this blog post wasn't "here, children, let me tell you something you can learn from". How about you learn from it first, Obi Wan?
I have to admit I don’t understand your comment. I’ve looked at the Golang stdlib and its crypto parts (disclaimer, I found a CVE in the math parts), and found the assembly code pretty awful. This was before filippo was working at Google. It’s all about performance, similar to how openssl has fast but unauditable code for crypto. There’s no documentation and it’s hard to understand how that code was produced.
From some discussions I had with filippo years ago it sounded like he agreed and was trying to clean parts of these now that he was working there. It looks like progress is being made, and if he can share his learnings along the way let’s benefit from that instead of being mad at him no? Sharing is caring.
I mean even in a world where he would have written that code originally, what’s wrong with being transparent about it? Everybody writes bugs.
Even if we assume that all of the code was written by the author of the post, it's much more helpful to reason about how the mistake happened than to just bluntly say "I made a mistake and it will never happen again." Because it will, at least if you don't understand where that mistake came from, what underlying assumptions were made and not documented at the time, and how the code evolved so that the assumptions were no longer valid. All of the code involved in this bug probably seemed reasonable at the time; every decision had sound reasons. Making the non-constant-time code constant time? Reasonable security practice. Implementing the code in assembly for performance reasons? Sounds about right. Implementing the incomplete formulas? Makes sense, complete formulas were not available. And so on. Every step reasonable, and yet a bug happened. That's the valuable lesson, and I'm fairly confident that most of us can benefit from being reminded of it.
It's not about who wrote that code.
It's about responsibility. If you are the maintainer of shitty code, then at the very least communicate that publicly. Don't come out with how bad your code is after the fact!
After he was responsible for that code for years, he now goes public and matter-of-factly states that all the code was shit the whole time.
Well why didn't he do anything about it then?
What did he think the job description of being maintainer entailed?
Filippo wasted no time shitting on other crypto projects, like GnuPG, from what then looked like the high ground. And now he leaves Google and by the way the supposed high ground was an optical illusion (and that's the charitable phrasing).
My theory is that you people like this kind of story because it helps you cope with your own mediocrity. If Google and Golang have this kind of laissez-faire approach, why should I have higher standards? We'll just call it a life lesson as if it was handed down from heaven, as if things HAD to be this bad.
Sprinkle some "bugs happen to anyone" and "you can't have 100% security anyway" on top and your shit burger is finished.
Do you know for a fact that it was all written by the current maintainer? Though, even if it was. I'm not sure they are intentionally not admitting a mistake. When you work in a shop with blameless culture, you end up learning to always talk like that and you don't even notice.