Hacker News
Writing a Kubernetes CRD Controller in Rust (technosophos.com)
166 points by lukastyrychtr on Jan 9, 2021 | 74 comments



Go bills itself as simple, but Rust, despite its complexity (much of which is hidden from you by libraries like serde and others), is simpler to use at a higher level.

Rust has a near-best-in-class type system, better abstractions, and runs on everything from embedded devices to the browser. Go has captured the hearts of sysadmin types and those looking for something just a step up from Python, as evidenced by much of the devops code written in it -- for which I'm very grateful; I don't think I'd use Kubernetes if it were still in Java -- so I can see it growing in that field.

I don't see how Rust doesn't become the most important emerging systems language (C & C++ obviously aren't going anywhere just by virtue of being incumbents) of the next decade.


Because building things quickly is important and Rust makes you think a lot about memory management. This is a fine thing for performance critical systems, but for systems where Python’s performance could be described as adequate, the extra performance Rust offers is immaterial. Of course, Rust gives you a bit more type safety than Go as well, but still not enough to justify trading off so much development velocity (or rather, type safety and correctness simply aren’t nearly as important as velocity from an economic perspective).

Rust is getting better all the time—non-lexical lifetimes and rust-analyzer have provided a surprising improvement in development velocity, but Go is still in another class. Personally I think Go is the best language for developing quickly—even better than Python for non-toy projects (I say that as one with 15 years of Python experience and almost a decade of Go experience).


> type safety and correctness simply aren’t nearly as important as velocity from an economic perspective

In my experience type safety is one of the most important factors in being able to produce code quickly, at least once a project grows beyond a couple hundred lines.

I agree with your overall point, and I think the advantages of Go are especially relevant in a team setting: Go's categorical aversion to complexity makes it much more difficult to write code that someone else will not understand, where Rust projects can become relatively impenetrable when they wade too far into getting "clever" with the more esoteric aspects of the type system.

Also the learning curve is real: you can take a Javascript developer and get them writing Go code in a couple of weeks. If you hire someone who's not experienced with Rust for a Rust project, you may be paying them for months just to get up to speed.

Don't get me wrong, I'm a huge fan of Rust and use it for almost all of my personal projects currently. But I think the idea that Rust can become the main programming language in all settings comes from a place of heavy bias. Rust is definitely on one extreme when it comes to complexity, and it's a more relevant trade-off for some use-cases than others.

aside - I know it's a minority opinion, but I actually think that Swift is a sweet-spot language: it's simple to program in like Python (it aggressively abstracts away low-level details), but gives you many of the powerful tools for correctness that Rust has, like algebraic types. It's a shame that it's painful to use in all but a few "blessed" applications.


> In my experience type safety is one of the most important factors in being able to produce code quickly, at least once a project grows beyond a couple hundred lines.

I think it depends on your domain. If you are working in a mission-critical space where every bit of static analysis saves hours or days of testing, then yeah, the more static analysis the better. If you’re building web systems software (things like Kubernetes) or even ordinary application software, then you get benefit from a little static analysis, but returns quickly diminish beyond a certain point. I think Go hits the sweet spot and Rust is generally excessive (development is slowed down much more than necessary). Swift might also hit that same sweet spot; I haven’t used it much. I have heard that it has some maturing to do before it’s ready for more than Apple GUI apps, but that might be dated information.


Yeah I agree - Rust does have more static checking than is strictly needed for many domains. But I think the reason for that is strictly orthogonal to type safety - rather it's because Rust's approach to memory management is a lot more demanding on the programmer than GC.

Speaking only in terms of type systems, I think Rust's features - like algebraic types and explicit nullability - are strictly better than Go and make it easier to write correct code. It just happens to be that this benefit is eclipsed by all the ceremony required to satisfy the borrow checker.

> Swift ... has some maturing to do before it’s ready for more than Apple GUI apps

I actually think Swift as a language is plenty ready for general purpose programming. The problem is the tooling. It's supposedly being worked on, but last time I checked it's possible but not at all simple to get it running on anything other than Apple platforms and a few blessed Linux distros. Also I had the experience that every time I would upgrade the toolchain, it would pretty much break all my larger projects. Even though I like Swift as a language more than Rust in some ways, I've pretty much abandoned it for new projects, because with Rust I can just run `cargo run` on any system and be pretty confident it's going to work, and Swift is very far from this.

Also you always feel like the language is a bit at Apple's whim. Like when SwiftUI was released, they shoe-horned a number of features into the language which seemed pretty half-baked because of iOS's priorities. It makes one hesitant to invest too much in a language which is so heavily influenced by a single stakeholder who may have different interests than you.


> But I think the reason for that is strictly orthogonal to type safety - rather it's because Rust's approach to memory management is a lot more demanding on the programmer than GC.

Rust lacks a GC so it uses its type system (namely its borrow checker) to compensate. I don’t know that this can be considered “orthogonal to type safety” in a meaningful way.

> Speaking only in terms of type systems, I think Rust's features - like algebraic types and explicit nullability - are strictly better than Go and make it easier to write correct code. It just happens to be that this benefit is eclipsed by all the ceremony required to satisfy the borrow checker.

Yeah, I completely agree. Go with ADTs would be a significant improvement. I would really like to see someone write a language like this that is interoperable with Go (compiles to idiomatic Go to the extent possible).

> I actually think Swift as a language is plenty ready for general purpose programming. The problem is the tooling.

Hmm, well I hope it finds its feet. It certainly seems like it would be a welcome addition.


> Rust lacks a GC so it uses its type system (namely its borrow checker) to compensate.

Ok fair enough. I guess mentally I separate the ownership/lifetime system from the type system, but I suppose it's true that borrows, mutable borrows etc. are actually part of the type system.


Yeah, it’s a bit pedantic, but I wanted to be precise for clarity.


I really don't think Rust MAKES you think a lot about memory management.

You CAN think a lot about memory management if you care about performance.

If not, you can throw `clone()` around everywhere and still have a very fast program.


In my experience, 98% of the time, I don't have to think much about memory management in Rust. The other 2% of the time I am untangling some kind of tricky, arcane situation.


Just because you can doesn't mean you should.


You're missing his point. Rust is simpler, so it's easier to use. The language is designed in a way that you don't really think about memory management unless you want to, and if you want to, it'll be pretty straightforward.


Either Rust improved dramatically while I wasn't looking the last few days, or that's some of the most beautiful trolling I've ever seen. Someone needs to be congratulated, either way ;)

I love Rust. It's like a beautiful puzzle that keeps my mind sharp and lets me make things that are truly elegant and performant. But the borrow checker is omnipresent, inescapable, and complexity-inducing... far from straightforward.


Rust has a lot of great qualities but simplicity is certainly not one of them.


Try writing a GUI application or a game engine, and enjoy just how "straightforward" the whole experience is.


I believe OCaml is the best language for fast development. It's got a super-fast compiler (faster than Go's), a better type system than both Rust and Go (aside from lifetime analysis, of course), and its performance is quite impressive (described as within a small single-digit multiple of C's).


I really wanted to like OCaml for the reasons you mentioned, but the ecosystem seems to be a mess with several “standard libraries”, mediocre tooling, and a dearth of libraries. Moreover, the language is still working to get its parallelism story out the door and Windows support is either lacking or non-existent (I forget which). The documentation is also rather poor, to the extent that I would often try to infer the OCaml solution from F# documentation. Moreover, when I’ve asked (polite, good faith) questions about how to do something in any online OCaml community, I was met with defensiveness and hostility (apparently if you have problems that Jane Street doesn’t have, you’re building software wrong). Lastly I just can’t get my head around the syntax and style guidelines; I also tried Reason, but it introduced other problems (typically related to build tooling and integration with the rest of the ocaml ecosystem) which may or may not have been worked out in the intervening years.

OCaml has a lot of features that other languages lack, but they aren’t enough to make up for the table stakes features that it lacks.


Libraries, documentation, tooling--all fixable and all being worked on. Without mega-corp support, of course, so it will take more time. But even today it's absolutely usable for many scenarios.

Multicore--Node, Ruby, Python etc. all seem to be doing fine without it. Nevertheless, it's coming with OCaml 5--this year.

Windows--support has gotten better over the last few years but still being worked on.

Hostility--don't know what to tell you about that one--it's not my everyday experience. Sounds like a one-off.

Syntax and style guidelines--they are the way they are for a reason (no pun intended), and after a while when you understand the reason for each quirk, it just falls into place.

Table stakes features--there are many other languages out there, but IMHO the really table-stakes feature is being able to actually model your domain properly in a language, which IMHO very few languages outside of OCaml get right. E.g. here's someone proposing algebraic data types for Go without much luck: https://github.com/golang/go/issues/43123


> Multicore--Node, Ruby, Python etc. all seem to be doing fine without it.

Ruby 3.0, released last month, has experimental parallel support via limited sharing “Ractors”.


I don't think any ML languages are going to be able to qualify for best language for fast development. They are a paradigm shift for too many developers, which means you're going to pay a price getting up to speed.

Compilation time is generally not the bounding factor for development speed these days. Theoretically, compile time is zero for scripting languages, so that's at least one point against weighing compile time heavily. I think the key benefit of OCaml and other ML languages over Rust right now is actually higher-kinded types, dependent types, and things in that realm of abstraction -- which you're probably not going to be using unless they're built into some library you're making use of, so the benefit is kinda moot there. The best power-to-weight ratio for abstractions in codebases feels like the typeclass (or traits in Rust); most of the stuff above that is not great for fast development unless it's already in place.

And if we're talking about ML languages, I'm firmly in the Haskell camp because I think it has a better ecosystem (more than any research language should), an incredibly good runtime system (they've got the best native+green thread implementation I've ever seen), and fantastic abstractions for working with shared memory (when you need that). The whole "if it compiles, it works" is generally true. No need for a holy war, but if OCaml is in the race for fast development (which means that ML is in the race), then I'd pick Haskell over it to start on ecosystem/adoption, production readiness and features -- if you can manage to keep your developers from climbing the ladder of research-level abstraction.


I haven't tried it, but I've heard OCaml has a much smaller community (fewer libraries and less general knowledge) and a lack of tooling and corporate sponsorship -- I know Jane Street works with it, and they have their own version of a stdlib.


I encourage you to try it out to understand what the fuss is about.


> Because building things quickly is important and Rust makes you think a lot about memory management. This is a fine thing for performance critical systems, but for systems where Python’s performance could be described as adequate, the extra performance Rust offers is immaterial. Of course, Rust gives you a bit more type safety than Go as well, but still not enough to justify trading off so much development velocity (or rather, type safety and correctness simply aren’t nearly as important as velocity from an economic perspective).

I think this really depends on the higher-level abstractions you get to use in Rust -- case in point being the original post. I'm not sure if this is a result of the Rust code just being done that much later, but the Rust in there is much simpler to read and understand than getting started with a Go codebase (whether via kubebuilder or others), outside of scaffolding your code base -- there are just fewer moving parts and less noise.

Golang shines when it comes to developer fungibility -- it's simpler to onboard, train, and hire Go developers. It's also easier for those developers to make mistakes, but compile-time typechecking will constrain the mistake "space" (more so than Ruby/Python/vanilla JS), and performance gets a boost. If Java is the mousetrap built because C/C++ were too hard, Go is a better mousetrap on almost every axis (though I think generics in Go 2 are sorely needed).

> Rust is getting better all the time—non-lexical lifetimes and rust-analyzer have provided a surprising improvement in development velocity, but Go is still in another class. Personally I think Go is the best language for developing quickly—even better than Python for non-toy projects (I say that as one with 15 years of Python experience and almost a decade of Go experience).

I agree here, though I personally favor Node for building quickly (at least it is for me), once you add TypeScript to it. Performance is great because most simple things are not CPU-bound at the API layer, the ecosystem is enormous (though finding value in it can be difficult), and it's relatively easy to deploy (though Go is easier). I largely stopped keeping track of Go once I'd written some stuff in Rust, but I do still recognize it as an excellent choice for building systems quickly, with some (type) protection as opposed to other options, and with easy deploys.

I'd love to get your thoughts on recent "modern" Python and its shiny parts and warts, though. I used it last year for a client and found the modern stack still pretty disappointing compared to what I knew was possible with server-side JS -- typing is still sort of incomplete (mypy was a little awkward to use), async is still a bit awkward, and the GIL is still a thing.


> I think this really depends on the higher-level abstractions you get to use in Rust -- case in point being the original post. I'm not sure if this is a result of the Rust code just being done that much later, but the Rust in there is much simpler to read and understand than getting started with a Go codebase (whether via kubebuilder or others), outside of scaffolding your code base -- there are just fewer moving parts and less noise.

I think people over-index on abstraction, especially in a professional setting. I think humans have a harder time understanding abstractions than they do in dealing with concrete problems, and I think Rust (and many other languages) encourage us toward the most abstract code we can conceive of whether or not that abstraction is actually necessary. And not only are people bad at understanding abstractions, but making good abstractions is a skill, and I’ve seen way too many bad abstractions created when no abstractions are necessary at all. On the other hand, sometimes abstraction really is necessary or helpful (and contrary to Go’s critics, these cases are not to elide error handling boilerplate or to facilitate generic map functions or other “hyper-localized-abstractions”) and in these cases Go can be quite painful.

> I'd love to get your thoughts on recent "modern" Python and it's shiny parts and warts though. I used it last year for a client and found the modern stack to still be pretty disappointing compared to what I knew was possible with server-side JS -- typing use being sort of incomplete still (mypy was a little awkward to use), async still being a little bit awkward, GIL still being a thing.

My opinion is that using Python these days is painting yourself into a corner. Developers need more rails to keep them from writing shitty code and Mypy isn’t yet mature enough (and its pace of development seems glacial). Further the package management is still worst-in-class. I’ve never worked on a Python project that didn’t hit some major performance bottleneck that the Go/.Net/Java tier of languages wouldn’t have struggled with at all, and unlike that tier of languages, Python leaves you with no better options. Async makes a lot of applications faster, but it also allows for some pernicious bugs (“Guess Who Is Blocking The Event Loop And Bringing The Whole Application Down!” is not a fun game to play even with Python’s tooling). It’s particularly unpleasant if you don’t have a type checker because you’ll find yourself forgetting “await” a lot even in the kind of code that you would think is so simple that it doesn’t need tests. Mostly I think we have better options these days—my go-to is Go (pun unintended) but I’ve heard good things about TypeScript as well. If you really need Python for some data science stuff or something else, I would try to contain that bit as much as possible by making it its own tiny microservice or calling into it as a sub process call or similar. I would not write more than necessary in Python.


Rust's complexity is not encapsulated in libraries that parse JSON or YAML, or others.

Rust complexity is embedded in the following: in order to get something to work, you effectively need to solve a puzzle.

Solving the puzzle is fun and feels very good once completed, and the prize is definitely worth it when performance and safety are critical.

I highly doubt that "solving the puzzle" makes sense in the context of ad-hoc automation or boring sysadmin/devops stuff. Last time I checked, creating a static binary (à la Go) was not easy. It may sound stupid... but at the end of the day Go is the king of pragmatism.

I think exactly the opposite though about systems languages: Rust should take over C and C++.


Even with respect to dealing with JSON, good luck deserializing a structure with borrowed references. People will tell you not to use borrowed references, but sometimes you’re at the mercy of a library author and even if you’re not it kind of sucks to have to do a big refactor across your codebase to replace borrowed fields with owned fields, right after you got everything working just because you can’t figure out how to deserialize properly and no one else appears to know how either. This is one of many problems that make Rust more difficult and time consuming in practice than Go which doesn’t distinguish between owned and borrowed. Of course this borrow checker gives you a lot more thread safety than Go provides, but I write a lot less parallel code than code that must be deserialized (or any of the other areas where pernicious borrow-checker issues creep in).

Rust improves quickly so I don’t think we should discount it, but I wish Rust folks would address these concerns when talking about how Rust is going to take over $x domain.


> it kind of sucks to have to do a big refactor across your codebase to replace borrowed fields with owned fields

Some refactorings in Rust have brought back memories of refactoring struggles with highly coupled code bases, where one seemingly small change ends up requiring touching many, many areas of the code.

The omnipresent need to define ownership across a code base seems to make it easy to introduce high coupling, which is unavoidable unless you start using Rc/Arc (Rust's shared_ptr equivalents) all over the place.


> Last time I checked, creating a static binary (a-la go) was not easy

When was the last time you checked?


Today, Go produces static binaries by default; Rust does not.


By default, Go dynamically links to libc on Linux if you use networking. I believe it's because of a dependency on nss.

I recall it's truly static on macOS though.


And yet reading some giant lib.rs can be a bigger, less readable mess than duplicated Go code. I think people give Rust too much credit sometimes.

Give some Go and Rust code to a team that has never used either language, and I can tell you right away that the Rust code will never be properly understood.

Go is easy to read and to maintain; I can't say the same for some Rust code. Go itself has been stable for years: there are no new keywords, the std lib is the same, and there have been almost no changes (besides modules).

When you look at Rust, they're adding things much faster, and it feels like C++ at times; the mental overhead of keeping up with that is not trivial.


I will take a longer on-boarding in return for better long-term code any day of the week.

I’ve been down the “developer speed above all else” rabbit hole with Python, and I don’t think it ever ended up working out meaningfully better than the “think the problem through first” approach.

Personally I don’t find go easy to read: the noise of the “mechanics” of the implementation gets in the way of the intent behind the code in my opinion. Also the error handling is far inferior to the more principled approach taken by Rust/Haskell.

> ...I can tell you right away that the Rust code will never be properly understood.

I think this is an unfair characterisation: if I did this with my C#-writing coworkers, I think they'd certainly get a cursory understanding of Go quicker, but it wouldn't take that much longer to get a grip on Rust, and they'd write better and more correct code in Rust.


> the “developer speed above all else” rabbit hole with Python

I've become convinced that any language without a strong type system will eventually lead to a codebase which collapses under its own weight. Lately I've been working on a bit of data analysis in Jupyter, and even in that limited setting I find working with Python hilariously slow and error-prone. I would rate developer speed in terms of the time it takes to imagine and implement a correct solution. The correct part is key, because solving issues at runtime is so much slower than doing it at compile time. The only place where Python has an advantage is maybe in how long it takes to type the code.

> it wouldn’t take that much longer to get a grip on Rust

This just hasn't been my experience at all. I've been programming professionally for over a decade, I've dabbled with probably over a dozen programming languages, and I've done real work in several: C/C++, Java, Obj-C, JS, Ruby, Swift, Python and Rust. I would say that Rust is an extreme outlier in terms of learning curve. It took months to reach the point where I could just type Rust code with confidence that it would compile without looking a lot of things up, and even now I run into situations where I have to ask for help, and I feel like I am just scratching the surface of some of the more esoteric aspects of the language. In most languages it has been possible to reach that point in weeks.

I say this as someone who actually loves programming in Rust and currently reaches for it often as my "main language" of choice, but it is a complex language which is not easy to learn, and this is a real point to consider in terms of using Rust in a team setting.


The “developer speed above all else” mindset is driven by economics rather than technical merit. There is no reason why one should not produce a good design in any language. In the end it's a "people problem" most of the time.

> Personally I don’t find go easy to read: the noise of the “mechanics” of the implementation gets in the way of the intent behind the code in my opinion

I'd add another dimension: reading Go is slower than reading Rust, but it's easy. Rust may be faster to read, but you can hit some very rough patches and serious head-scratchers.

It's all a trade-off at the end of the day.


> There are no reasons why one should not produce a good design in a any language.

Certain languages give you better tools than others in terms of reaching a good design. Go for instance puts you in a bit of a "jail" in terms of complexity which prevents you from going too far off the rails. Rust uses a strict compiler to categorically prevent whole classes of issues. So yes in some sense bad code is a "people problem" but the right tools can go a long way to mitigate it.

> Rust may be faster to read, but you can hit some very rough patch and serious head scratchers.

I think this is where Rust's type system can often be a double-edged sword. I have run into more than a few libraries which try to make things "simple" to use by leveraging traits to solve problems in a somewhat magical way, but in my experience this often makes it very difficult to trace what's actually going on. I find that Rust code is not all that self-documenting, and it relies heavily on documentation in order to be understandable. And quite a few libraries are strong on per-item, per-function documentation but have a gap in high-level docs explaining the intended use of the library.


The library used in this post has gotten much better since the post was written (in 2019).

Recently the ability to do `exec` and `attach` was added. It should reach feature parity with the Go version soon once `port forward` is added.

https://github.com/clux/kube-rs


Apart from that, the runtime module has also been overhauled completely since this blog post. In particular, the new `Controller`[0] adds a bunch of stuff over `Informer` to make it usable in production (merging duplicate queued events, parallel execution, delayed retries, fixed a bunch of internal race conditions, etc). We really need to make a 2020 edition of this blog post, but we do at least have an updated example controller.[1]

(Not to dunk on TechnoSophos, none of that existed back when the blog post was written.)

Disclaimer: Co-maintainer of kube-rs

[0]: https://docs.rs/kube-runtime/0.47.0/kube_runtime/controller/...

[1]: https://github.com/clux/kube-rs/blob/8611e5cdd23d4ff487fdcea...


Hmm! Interesting. I'm wondering if adding a test framework like kubebuilder's is on the agenda. I'm not a great fan of kubebuilder, but its relative simplicity to use (i.e. as simple as it can get with Go) is a huge benefit: you don't have to deploy after every build to see what your changes look like.


We don't currently support anything like Kubebuilder's envtest helper, but it is on our radar (and I finally got around to making an issue for it[0]). The rest of Kubebuilder's testing page seems to be general for Go's test framework (in which case Rust should have native equivalents).

Disclaimer: Co-maintainer of kube-rs

[0]: https://github.com/clux/kube-rs/issues/382


I'm not sure about that, but the maintainers are very active and receptive so maybe make an issue? Better yet a PR.


How has Kubernetes being a giant reactive resource API panned out? My first reaction was to like it, but toward the end of my heavy investment in the space it started to feel like there were too many moving parts and a hectic mess. Stacks became more difficult to reproduce (controllers, their versions, and platform limitations), and people moved too many things into custom resources.

Did the custom resource controllers pan out well?


I'm not sure what you mean by "too many things ended up as custom resources" -- they're the recommended extension point, and you can do a lot with them that you can't do with core types.

Custom resources had/have some growing pains but I think worked out pretty well. It can be very very hard to test and distribute them though. As someone who maintained a controller with paid support, your test matrix gets pretty large pretty fast accounting for different k8s versions, different hosted versions (GKE, AKS, etc), different distros (openshift, rancher, etc). And that's before you even get into specific configurations like pod security policies, can the control plane communicate with the data plane, is there a service mesh.

Resource versioning is hard to get right. Once a resource type is v1, it becomes difficult to extend it. You can't add a beta field to it easily. Revising schemas can be hard, since anything more than a no-op conversion between versions requires a webhook, which requires a certificate chain, and while cert-manager is popular it is not ubiquitous and regularly has breaking changes. Webhook setup issues made up a large portion of our support requests.
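For context, this is roughly the shape of the webhook wiring in an `apiextensions.k8s.io/v1` CRD (the resource and service names here are invented): serving two versions with anything beyond a no-op conversion means filling in the `conversion` stanza, which is where the certificate-chain requirement comes from.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com          # hypothetical CRD
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1beta1
      served: true
      storage: false
      schema:
        openAPIV3Schema:
          type: object
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
  conversion:
    strategy: Webhook                # anything beyond a no-op conversion
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:                     # the conversion webhook service...
          namespace: widget-system
          name: widget-conversion
          path: /convert
        # ...which must be served over TLS with a caBundle the
        # apiserver trusts -- this is where cert-manager (or manual
        # cert rotation) enters the picture.
```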

As far as the general "reconciliation loop" architecture goes, you end up with something similar in most orchestration systems I've worked on, or you wish that you did. So overall I think that worked out well. Getting it right can be hard, but I think that's the nature of the beast.


I'm really interested in the challenges you faced with versioning, but I couldn't find the details. Do you happen to know a source for understanding it more?


Not OP but the feature you're looking for is OLM

https://olm.operatorframework.io/docs/


The “Kubernetes as a reconciler loop” pattern for CRDs is its most useful quality to us at Heroku.


Don’t know but it’s a shame that the k8s API doesn’t have link fields... it kind of breaks the whole concept of automatic interfacing


Can you elaborate on what you mean by link field? I've searched a bit but I'm not certain what you're referring to in this context.



Custom resources are really "operators" and there are some really great operators, and there are some not so great ones. Notably OpenShift 4 is implemented as operators on top of K8s, so there is a lot you can do with them.


Well, look at how popular Kubernetes is, and how many thousands of things are built on top of it because of its powerful and extensible API model.


> How has Kubernetes being a giant reactive resource API panned out?

It was never reactive, at least in the sense of the Reactive Manifesto. There is no system of backpressure.

> Did the custom resource controllers pan out well?

I think it will prove to be useful for a handful of uses, but that in general, it will wind up like Wordpress and Jenkins plugins. Powerful, popular, but a guaranteed mess.

Kubernetes was not originally designed to accommodate CRDs. There's no concept of tenancy, only things you can cobble together into tenancy-ish-alike-kinda shapes. There's RBAC, but it was designed before CRDs and means that poorly designed custom controllers can be attack vectors.

I think CRDs are useful and necessary[0], but that they are greatly overused and have expensive design flaws. Unfortunately, I don't have the wits to join into the multi-billion dollar market of "papering over things Kubernetes doesn't do very well but which can't now be changed".

[0] I mean, I wrote a book about Knative, which is a purely CRD-based architecture.


>"I think CRDs are useful and necessary[0], but that they are greatly overused and have expensive design flaws."

Could you elaborate on the "design flaws" part? Do you feel that the CRD model itself has design flaws or are you referring to specific project's CRDs?


tl;dr If you’re operating with CRDs at trivial scale, you probably have nothing to worry about. But operating with CRDs at scale is a different story and suggests careful testing with the specific applications involved.

——-

The usage patterns of native k8s types and the implications those patterns have on the scalability and reliability of etcd and the apiserver are relatively well-understood. CRDs can be a wild-card, though, and afaik testing efforts thus far have not investigated worst-case usage of CRD-based applications.

As commonly deployed, CRDs are served from the same apiservers and etcd cluster that serves the native types for a k8s cluster. That can result in contention between the CRDs supporting 3rd party additions to a cluster and the native types critical to the health of a cluster. This kind of contention has the potential to bring a cluster to its knees.

Efforts like priority and fairness seek to ensure that the apiserver can prioritize at the level of the API call. But that won’t prevent watch caches from OOM’ing the apiserver if excessive numbers of CRDs are present. The judicious use of quotas could head off the creation of an excessive number of objects, but it’s not just count that matters - the size of each resource is also a factor.
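As a sketch of the quota side: object-count quotas can cap how many custom resources exist per namespace, though as noted they say nothing about per-object size. The resource name here is hypothetical:

```python
# Sketch of an object-count ResourceQuota for a hypothetical CRD
# "widgets.example.com". The count/<plural>.<group> syntax caps how many
# objects may exist in the namespace -- but not how large each one is,
# so it only partially mitigates apiserver/etcd pressure.
quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "widget-count", "namespace": "team-a"},
    "spec": {"hard": {"count/widgets.example.com": "500"}},
}
```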

In theory, CRDs could be isolated from native types by serving them from an aggregated apiserver backed by a separate etcd cluster. afaik this is not a supported configuration today, and even if it were, the additional resources required to support it (especially the separate etcd cluster) may be prohibitive for many use cases.


I agree with all of this, with one nitpick intended to self-aggrandise.

You can actually nominate particular types be stored in particular etcd servers -- GKE does this to put Events into a separate etcd from everything else.

However, it still has problems. Firstly, you can only define it for inbuilt types. Secondly, it's common for different objects to cross reference each other through objectRefs and the like, which behave badly when you effectively perform a join in the API server over multiple etcds.


>"You can actually nominate particular types be stored in particular etcd servers -- GKE does this to put Events into a separate etcd from everything else."

Interesting. Is this documented anywhere?


The --etcd-servers-overrides flag can do it. I could swear I'd seen a proper writeup in the Kubernetes docs, but couldn't find it again.

What I said was slightly wrong. It's not that you nominate Kinds, it's that you nominate which etcd servers get which etcd key paths. You can essentially work it out because the path structure is consistent.
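For reference, the override is keyed by group/resource (which maps onto etcd key paths), not by Kind. A sketch of the flag, with hypothetical etcd endpoints (core-group resources have an empty group, hence the leading slash):

```
kube-apiserver \
  --etcd-servers=https://etcd-main:2379 \
  --etcd-servers-overrides=/events#https://etcd-events:2379
```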


Thanks for the insights. I was curious about this however:

>"But operating with CRDs at scale is a different story and suggests careful testing with the specific applications involved."

Do you mean the number of different CRDs deployed here or just the number of custom resources created? Or is it the same concern with either? I'd be curious what you are defining as "scale" as well?


The number of deployed CRDs is not likely to be an issue. The number and size of custom resources (CRs - instances of CRDs) is potentially an issue.

Scalability is relative, and depends on many factors including but not limited to:

- the resources available on the hosts running apiservers and etcd members

- the number and size of resources (custom and native) that controllers will maintain

Relatively speaking, a cluster of a given size might be perfectly capable of handling many thousands of resources. Push that up an order of magnitude and the overhead of serving LIST calls - marshaling json from etcd into golang structs for apimachinery and back again for sending over the wire - could exhaust an apiserver’s memory allocation. And since the impact of resources is cumulative, any one application relying on lots of CRDs might not destabilize a cluster on its own but might well contribute to an unhealthy cluster when running alongside similarly CRD-heavy applications.

The key takeaway is that the kube api is best thought of as a specialized operational store rather than a general-purpose database. Anyone wanting to rely on CRDs at non-trivial scale would be well-advised to test carefully.


Regarding multi-tenancy: The CRD schema is global but versioned, so it's annoying-but-manageable with multiple tenants.

Controllers can be scoped to a namespace without requiring cluster-level permissions, no?


> The CRD schema is global but versioned, so it's annoying-but-manageable with multiple tenants.

It means every controller for a CRD winds up installing another webhook. And having to test a variety of orderings. It's hard to get right.

> Controllers can be scoped to a namespace without requiring cluster-level permissions, no?

The difficulty is that Kubernetes RBAC is good at expressing rules "Role 'foo' can perform operation 'list' on kind 'Deployment'". But it's less capable of saying something like "Role 'foo' can perform operation 'list' on kind 'Deployment' which were created from kind 'CoolerDeployment'". It's also hard to delegate something, along the lines of "Role 'foo' can delegate ('create' over kind 'Deployment') within namespace 'bar'".
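For context, an RBAC rule bottoms out at (apiGroups, resources, verbs, optionally resourceNames) tuples. A sketch of the first rule above, as a plain dict, with a comment marking what has no corresponding field:

```python
# Sketch of the granularity Kubernetes RBAC can express: role "foo" may
# "list" Deployments in namespace "bar". There is no field for provenance
# ("...but only Deployments created from a CoolerDeployment") or for
# delegation -- that kind of rule needs an admission/policy layer instead.
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "foo", "namespace": "bar"},
    "rules": [
        {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["list"]},
    ],
}
```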

I think dissatisfaction with PodSecurityPolicy will cause Rego to worm its way into the core architecture over the coming years. It'll then eventually crowd out RBAC because you can impose (sort of, more or less) arbitrary rules. But none will dare call it ABAC.


Agreed, this is one of the biggest issues I see. Anywhere you install a custom controller essentially has access to ~all of your resources. What could go wrong?


This is (2019).


"The Go version was over 1700 lines long and was loaded with boilerplate and auto-generated code. The Rust version was only 127 lines long"

Right ...


I've written CRD controllers. This is an accurate description of what your code looks like.

Because of Golang's anemic type system, massive code generation is the only viable way to deal with the dynamic blackboard that the API Server presents to clients.


I also write CRDs, but I don't believe for one second that you have more than 10x less code, especially when counting generated code that you don't touch at all.

It's just a bad idea to write something for Kubernetes that is not in Go: you have zero support, and k8s releases a new version every 6 months - good luck keeping up with that.


It's a different language with a different type system. I've also interacted with the API Server using a Ruby library that relied on metaprogramming. I could do things in a dozen lines that would require thousands of lines of checked-in Go.

Replying to your edit:

> It's just a bad idea to write something for Kubernetes that is not in Go, you have 0 support and k8s releases a new version every 6month, good luck keeping up with that.

Well, a few things.

First, using client-go doesn't save you from version shifts. It is explicitly unsupported, so if they break your downstream system, stiff shit.

Second, client-go is derived from code generators. The same generators also produce other clients. Unfortunately they are terrible, by virtue of OpenAPI generator being a hellish maze and because they leak Go-isms like a spiteful sieve. I've used that code generator to develop a proxy. It sucked.

Third, this library is derived independently. It uses the same OpenAPI definitions as client-go, client-java and so forth. It's as easy to keep up to date as any of the others.

Fourth, the Kubernetes API isn't fully tested. Not even close. Many libraries can rightfully claim to be just as conformant as client-go, despite being broken on ~66% of endpoints: https://apisnoop.cncf.io/


As the maintainer of the Rust bindings that the library used in the article (kube) is backed by, I can confirm that Kubernetes' openapi spec requires a lot of Kubernetes-specific handling to generate a good client that generic openapi generators do not provide. Yes, that includes all the other Kubernetes clients in github.com/kubernetes-client like Python and .Net too.

See https://github.com/Arnavion/k8s-openapi/blob/master/README.m... for a full description.

I also confirm that I keep it up-to-date with Kubernetes releases and have been doing so for the ~3 years that it's been around. Not just the minor ones every few months, but even the point ones; these days the latter usually only involves updating the test cases instead of code changes and they're done within a few hours of the upstream release.


Truthfully, the fact that your bindings don't use OpenAPI generators has tempted me to learn Rust just so that I can escape their madness.


Not sure why you’re being downvoted. OpenAPI schema generation is one of the most challenging elements of developing with CRDs. Anyone who is not an apimachinery SME is likely to struggle at some point to get their schema generated as expected.


Thank you so much for your work on this and keeping it up to date.


I don’t think openapi is necessary. All endpoint paths are easy to construct knowing gvk tuples which can be easily extracted from k8s.io/api using some go package analysis (you can also do it at runtime by parsing openapi endpoint). The data types are all defined in generated.proto in the same repo.


It depends again on language. The ruby library I used exploited exactly this discoverability to create objects at runtime. It was awesome, except when time came to navigate my codebase. Where does the FooKind exist? How do I checkpoint it? Drat.

The generated codebases like client-go provide a great deal more navigability of code rather than APIs. It is the code that I actually interface with. I have my complaints about the necessity of code generation, but I do like having some kind of in-advance typing to help me figure things out.


It’s a perfectly fine idea. Kubernetes uses standard REST endpoints with defined semantics, and data types are strictly defined via protos.


I tend to agree! However, I think it has factored in the generated code as well. With something like Kubebuilder you won't have to write that many lines of code for a basic operator.



