Launch HN: Signadot (YC W20) – Lightweight Test Environments for Microservices
56 points by jaguar75 on Sept 16, 2022 | 20 comments
Hey everyone, we’re Arjun and Anirudh, founders of Signadot (https://www.signadot.com/). Signadot is a Kubernetes-based solution that enables lightweight environments, called Sandboxes, to test microservices early in the development lifecycle.

Before founding Signadot, I managed engineering teams building microservices at different companies. As the number of services and external dependencies (databases, message queues, third-party APIs, etc.) increased, testing became challenging. At a smaller scale we could stand up our “application in a box”, but as complexity grew we came to rely on a shared pre-production (staging) environment for most of our testing. Because that staging environment was shared by many teams, it became a bottleneck, and when issues were discovered on staging, root-cause analysis took a long time.

We talked to many companies about how they were testing their microservices, especially once they grew beyond ~20 engineers. We encountered solutions ranging from setting up multiple (expensive) staging environments to having each team take turns “locking” the shared staging environment. At larger companies like Uber and Lyft, we learned that they had built their own highly scalable (but bespoke) solutions for testing microservices based on dynamic traffic routing. With this kind of system, environments spin up quickly and are cost-effective at scale. We wanted to build a similar but more generalized system and make it available to everyone running on Kubernetes.

The intuition behind Sandboxes is that each test environment contains only the few microservices under test; all other dependencies are fulfilled from a shared pool of microservices running the latest stable versions, called the *baseline*. Starting with a staging Kubernetes cluster running up-to-date stable versions of every microservice, a sandbox environment is described in terms of what it modifies with respect to the baseline, similar to a copy-on-write model of resource management. Once a sandbox environment is set up, tests can be run against it, and requests get routed to the versions of microservices under test.
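
To make the copy-on-write idea concrete, here is a rough sketch in Go (the service names, versions, and data structures are purely illustrative, not how our operator represents sandboxes internally): a sandbox only records the services it forks, and everything else resolves to the baseline.

    package main

    import "fmt"

    // Baseline maps each service to the stable version running in the shared cluster.
    // A Sandbox only records the services it overrides ("forks"); everything else
    // falls through to the baseline, copy-on-write style.
    type Baseline map[string]string
    type Sandbox map[string]string

    // resolve returns the version of a service that requests carrying this
    // sandbox's routing context should reach.
    func resolve(baseline Baseline, sandbox Sandbox, service string) string {
        if v, ok := sandbox[service]; ok {
            return v // forked version under test
        }
        return baseline[service] // shared stable version
    }

    func main() {
        baseline := Baseline{"frontend": "v42", "route": "v17", "payments": "v9"}
        sandbox := Sandbox{"route": "pr-123"} // only the service under test is overridden

        for _, svc := range []string{"frontend", "route", "payments"} {
            fmt.Printf("%s -> %s\n", svc, resolve(baseline, sandbox, svc))
        }
    }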

An important consideration is isolation between different sandbox environments. For isolating test requests, we use traffic labeling and request routing. With traffic labeling, a tenant ID is set as an L7 header on each HTTP/gRPC request, and based on the value of this header, requests follow different paths through the microservices. Request routing is realized by working with a service mesh (like Istio) if one is already installed, or by using our own sidecar proxies. For tests that require data isolation, we built a pluggable framework called Resources (https://docs.signadot.com/docs/sandbox-resources) that can set up an ephemeral stateful resource (Kafka topic, database schema, etc.) on the fly and tie it to the sandbox lifecycle. Resources can also be used when testing asynchronous communication across services.
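
As a rough illustration of how the routing side works (the header name, addresses, and port below are assumptions made for this sketch, not our actual implementation), a sidecar-style proxy only needs to compare the traffic label on each request against a sandbox's routing key:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    // routingHeader is an illustrative name for the L7 header carrying the tenant ID.
    const routingHeader = "X-Routing-Key"

    // newRoutingProxy returns a sidecar-style reverse proxy: requests labeled with
    // the sandbox's routing key go to the forked workload, everything else goes to
    // the baseline workload.
    func newRoutingProxy(baselineURL, sandboxURL *url.URL, sandboxKey string) http.Handler {
        baseline := httputil.NewSingleHostReverseProxy(baselineURL)
        sandbox := httputil.NewSingleHostReverseProxy(sandboxURL)
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if r.Header.Get(routingHeader) == sandboxKey {
                sandbox.ServeHTTP(w, r) // version under test
                return
            }
            baseline.ServeHTTP(w, r) // shared stable version
        })
    }

    func main() {
        baselineURL, _ := url.Parse("http://route-baseline:8080")
        sandboxURL, _ := url.Parse("http://route-pr-123:8080")
        // Each service must also propagate routingHeader on its outbound calls
        // (e.g. via HTTP middleware or gRPC interceptors) so downstream hops can route.
        log.Fatal(http.ListenAndServe(":9090", newRoutingProxy(baselineURL, sandboxURL, "pr-123")))
    }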

We worked with a few enterprise companies for over a year to come up with an architecture that can support complex microservice environments. We built a Kubernetes Operator that installs into our users’ Kubernetes clusters and connects to a control plane that we host. Our control plane acts as both an API to create sandbox environments and as a proxy layer that can route traffic to sandboxes.

We are launching with support for integration & end-to-end testing by providing high-fidelity environments that can be set up via the CI pipeline. Next on our roadmap is making it easy to write custom resource plugins and enabling feature testing using sandboxes. With feature testing, developers working on different microservices can see how their changes interact with each other’s services before merging.

If you’d like to try it out, we have a free tier. Our pricing is based on the number of unique services in sandboxes.

Thanks for reading this post and we welcome your feedback and comments!




As someone who has built out a similar internal tool, one of the things I'm excited to see someone do is the route propagation technique. It's something I've evaluated adding to our internal solution as we rolled out a service mesh, but ultimately we manage routing in a different way under the hood.

Either way, this notion of slices of environments being deployed for testing, with "baseline" or fallback environments being used otherwise, is the future of software development. It's a real boon for developers when rolled out effectively, and I've seen it scale massively at GoodRx.

Congrats to the team. Wish you all tons of success!


Thank you! We're really excited about covering the entire dev lifecycle using this approach. Re: routing, do you make use of a library that is directly included in each service? We've seen an approach in the wild where such logic is embedded into gRPC interceptors and HTTP middleware. Curious to hear how you thought about the choice between that and using the mesh.


Right now, we just require intermediate services to be spun up if you need to connect with a custom downstream service.

So, as an example, if I have three services where A talks to B, B talks to C, and I have a custom version of A & C that require testing E2E, we have to spin up B in the middle.

It definitely is a point of confusion for engineers who have to understand what intermediate services are in play when working on a frontend and some distant backend microservice. Fortunately, that intermediate layer tends to just always be the same two services, so folks learn pretty quickly. At scale, especially hyper-scale in a microservices architecture, this becomes untenable, and either automatic dependency discovery OR routing (like you designed) is the path forward.

Needless to say, it's a "niche" issue I think folks don't run into for a while in their setups; however, once you run into it, it's a PIA for those involved.


Makes sense. I have seen a couple of instances of this pattern - with service dependencies being stitched together using configuration. Thanks for sharing!


Maybe this is just whooshing over my head, but how does persistence fit into this, at least in terms of best practices?

The way I understand this tool from the docs is that a request is duplicated to go to the baseline service A as well as service A' -- so service A on the `main` branch is actually serving the up-to-date code, while service A' has modifications that developers can quickly see either work or blow up. What happens if the change is a DB write change? Do both A and A' point to a primary prod DB, and if A' changes something that results in bugged data, wouldn't that screw up prod data? How do I go about accounting for that? Or am I just entirely misunderstanding the point of this tool?

EDIT: I think I just bumped into my own answer within the docs -- "Sandbox Resources." I see so far you have MySQL/MariaDB, SQS, and RabbitMQ plugins. What's next on the roadmap? Kafka/PgSQL soonish? :)


I was just about to post this :)

Depending on the request context (typically a set of headers), a request will end up going to one of baseline service A or A’, but never both. Depending on the scenario being tested, you can choose to isolate additional stateful components. In the case of testing a change to the database itself (DDL, etc), you’re right in that the database must be isolated so that A’ does not talk to the same shared database as A. So, as you pointed out, typically this would involve using a resource plugin (https://docs.signadot.com/docs/using-resources-with-sandboxe...) to set up that new database shard (or table, or schema) that is isolated as part of the test sandbox environment, and then linking A’ to talk to that isolated instance instead.

Kafka is on our roadmap next! We're actually working right now on making custom resource plugins easy to write, because invariably with databases there's a data-seeding step that's pretty custom for each user. Right now, building a custom plugin involves writing the logic in the provision/deprovision scripts (e.g., https://github.com/signadot/plugins/tree/main/src/mariadb), plus a bit of Docker image building and Helm packaging that we're hoping to simplify.
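
Purely as an illustration of what a provision step does (the real plugins are provision/deprovision scripts packaged as Docker images, and the environment variables, DSN, naming, and output convention below are assumptions for this sketch), creating an ephemeral database for a sandbox boils down to something like:

    package main

    import (
        "database/sql"
        "fmt"
        "log"
        "os"

        _ "github.com/go-sql-driver/mysql" // MySQL/MariaDB driver
    )

    // Sketch of a "provision" step: create a uniquely named database for the
    // sandbox, seed it, and emit its name so the forked service can be pointed
    // at it. Deprovision would DROP the same database when the sandbox is deleted.
    func main() {
        sandboxID := os.Getenv("SANDBOX_ID") // assumption: the plugin is told which sandbox it serves
        dsn := os.Getenv("MARIADB_DSN")      // e.g. "root:secret@tcp(mariadb:3306)/"

        db, err := sql.Open("mysql", dsn)
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        dbName := "sandbox_" + sandboxID
        if _, err := db.Exec("CREATE DATABASE IF NOT EXISTS " + dbName); err != nil {
            log.Fatal(err)
        }
        // Data seeding (schema migrations, fixtures) would go here.

        // Hand the database name back so the sandboxed workload can be configured
        // to use it; the output mechanism here is an assumption, not the plugin contract.
        fmt.Printf("dbname=%s\n", dbName)
    }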


> a request will end up going to one of baseline service A or A’, but never both

that's interesting, i totally didn't glean that. so X percentage of users are getting a potentially bugged experience? is this kind of like a more robust feature flagging tool? i.e. "roll out this thing to 1% of people for X time, and then roll it back to control"

and if so, if you have 200 engineers all testing all sorts of stuff on prod users/data, the chances of buggy experience for small amounts of people increases? am i understanding that correctly?


Ah, the primary use-case here is for developers to test microservices as changes are made to them - so one would typically be running this setup in pre-production / staging only and no prod user data would be involved. I would think of it as a way to run integration / e2e tests for each commit or pull request. Running these tests in the high fidelity test environment constructed this way reduces the bottlenecks on staging environments, and bugs are discovered / addressed sooner.


AH! derp. thank you that makes total sense :) -- seemingly awesome tool, will certainly look into this. thanks for the quick responses


How does it compare with https://www.vcluster.com/, which is also available as open source?


Thanks for the question. At a high level, both vcluster and sandboxes can be used to create environments within Kubernetes. The approach we take with sandboxes is different in that it optimizes these environments for high-fidelity testing without deploying a fully isolated copy of the entire application, which doesn't scale well. As I understand it, vcluster shines where isolation is important right down to the k8s control plane and networking, but one would still typically deploy every workload into a vcluster in order to realize a test environment. The scaling considerations that apply to deploying the entire application into a k8s namespace (or a real cluster) therefore carry over to the use of vcluster as well.


Amazing work Arjun & Anirudh! I know you have been working on this for a while! What has been your biggest lesson learned building such a sophisticated devtool?


Thanks! The biggest lesson for us has been finding the balance between being opinionated vs. flexible in the abstractions we build. It's easy to go too far in either direction, and we're learning as we build how to make these decisions better.

For example, we built two different versions of the sandbox creation APIs in the past: one optimized for Docker image-driven workflows, and another designed around immutable sandboxes. Our current APIs are document-style and mutable, which gives us additional flexibility to build more use cases on top, like feature testing.


This looks pretty dope. Is there documentation where one can understand the underlying concepts behind this kind of tech?


We've been working on some concept docs at https://docs.signadot.com/docs/concepts. Please let me know if it could use more detail on anything!


Let us know what could be improved. Thanks!


Congratulations Arjun & Anirudh!

Excited to see tools for testing microservices early in the development lifecycle.

Wish you best of luck.


Thanks for the kind words!


Congrats on the launch Anirudh! Can’t believe we met in a random Uber years ago in Menlo Park


Thank you! :)




