Launch HN: Runops (YC W21) – A better cloud shell for production apps
97 points by andriosr on March 8, 2021 | 23 comments
Hello HN! I'm Andrios, from Runops.io - we're building a proxy to commands you run in the terminal that adds Git, code reviews in Slack, and removes sensitive data from results. It's like the Cloud Shells from GCP/AWS, but with more features and using your local zsh/bash terminal.

You run an AWS CLI command in the terminal and it goes to Runops instead of AWS. Runops adds the command to Git and gets peer reviews (when required) in Slack before sending it to AWS. After it runs, we deliver the results back in the terminal, but with all sensitive data masked. It works for AWS, Kubernetes, databases, and others.
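To make that flow concrete, here is a minimal Python sketch. It is an illustration only, not Runops code: the function names, the in-memory list standing in for the Git history, and the two masking patterns are all assumptions.

```python
import re

# Hypothetical patterns for the sketch; a real setup would configure its own.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # SSN-like numbers
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email-like strings
]


def mask_sensitive(text, placeholder="****"):
    """Replace every match of a sensitive pattern with a placeholder."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def handle_command(command, needs_review, approved, execute):
    """Record the command, wait for review if required, run it, mask the output.

    `execute` is any callable that runs the command and returns its output.
    Returns (audit_log, masked_output); masked_output is None while the
    command is still waiting for approval.
    """
    audit_log = [command]  # stand-in for committing the command to Git
    if needs_review and not approved:
        return audit_log, None
    return audit_log, mask_sensitive(execute(command))
```

The point of the shape is that the caller never sees raw output: whatever `execute` returns passes through `mask_sensitive` before it reaches the terminal.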

I was leading the Infra team at a Fintech (pismo.io/en), and we wanted to give autonomy to all developers in production. But we couldn’t give them direct access due to compliance requirements. The solution was to have a small number of people (my team) with "full access" to production systems. Engineers would ask us when they needed to run one-off scripts in production. Our goal was to deliver automations so that other teams wouldn't need to ask us to do things. We would build a way for them to do it with compliance, security, and reliability.

It didn't work. We were spending 80% of the time processing the queue of requests, and 20% building automations. The backlog was always increasing, and the team was burning out. Engineers were not happy as their requests took a long time to process and clients were angry at them.

But some nice automations came out of that. For instance: we needed to review ad-hoc prod database reads to avoid bad queries. So we built a Jenkins pipeline that ran SQL queries from Git after code review using Flyway. Any engineer could run queries in prod, leaving traces on who did it, reviews, when it happened, and why, for every query.

When talking to friends at similar companies, I saw the problem was even worse. Some of them weren't trying to automate, they already had dedicated people for running these scripts, i.e., an ops team. I knew there was a better way, so I set out to build it. I quit this job mid last year, with about 8 months' worth of savings to make this work before I'd need to find a job again. It was tough in the beginning, as I’m an engineer and had to learn sales, marketing and product management on the job, but after getting the first few customers things started improving.

The goal for Runops is to let any engineer run anything in production as if they had full access, automating as much of security and compliance as possible. When human interaction is needed, we make it synchronous using Slack. Now, instead of having a single team as a bottleneck, you can have everyone do things in production. Centralizing teams with most of the access to AWS, Kubernetes, and databases is bad. It makes for slow Change Management processes using Jira or other tools with manual executions at the end. Runops lets you add quick reviews from experts (Infra, DBA, security, etc), and automates executions.

The primary interface is a CLI, where you run scripts that range from SQL queries to kubectl exec and AWS CLI commands. We don't create new abstractions: you use the same commands and docs available, and we just proxy them. A nice benefit is replacing VPNs and the 10 client tools/credentials you would need today. We also support templates for custom actions in a bunch of languages.

We built it using GitHub Actions for executing commands. We store configurations and credentials as Actions Secrets, and they get injected when a command requires them. It's nice because we can run anything that goes in a Docker container in <15 seconds. We have plans to improve it beyond Actions by creating a real-time proxy, which will enable a REPL-like experience. Runops doesn't have a web interface. This is on purpose: we don't want to be one more tool engineers have to learn. Most interactions happen with our CLI or Slack. We have a simple admin UI in Retool.
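As a rough analogy for "credentials get injected when a command requires them", the Python sketch below passes secrets to a child process through its environment only. This is not how GitHub Actions or Runops implements it; `run_with_secrets` and the `DEMO_DB_PASSWORD` name are hypothetical.

```python
import os
import subprocess
import sys


def run_with_secrets(argv, secrets):
    """Run argv with `secrets` added to the child's environment only.

    The parent's os.environ is left untouched, so the credentials exist
    only for the lifetime of the command that needs them.
    """
    env = {**os.environ, **secrets}
    result = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # The child process reads the injected value from its environment.
    child = [sys.executable, "-c", "import os; print(os.environ['DEMO_DB_PASSWORD'])"]
    print(run_with_secrets(child, {"DEMO_DB_PASSWORD": "s3cret"}))
```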

We do everything using Lisp. The CLI uses ClojureScript; the REST API uses Clojure. It's great to have the same language everywhere, and Lisp is also a fantastic advantage.

Today we have big Fintechs using Runops. They use it to let developers run commands inside Kubernetes pods, like Rails Runner and Elixir IEx, SQL queries, DynamoDB queries, and making internal API calls in private networks using cURL. One of the best parts of building this has been seeing developers doing more production work. Regulated companies that never considered giving this level of autonomy to all developers are changing their minds. It's great to see a tool impacting the culture, increasing trust.

We're really happy we get to show this to you all, thank you for reading about it! Please let us know your thoughts and questions.




What you've basically created is automated change control, but lacking some change control features. You might want to add a set of features specific to managing change control, because otherwise I'll have to build a change management system around this.

> You run an AWS CLI command in the terminal and it goes to Runops instead of AWS

Most enterprises are wary of handing over control to a vendor, especially if it's the underpinning of all operations in the company. I suggest a self-hosted/Enterprise release. After a few years of trying to make it work, the Enterprise will gladly pony up more money for a hosted cloud solution, but the self-hosted will get you in more doors.

> We do everything using Lisp. The CLI uses ClojureScript; the REST API uses Clojure.

Do you expect regular people to be able to contribute to or modify this? Do you find a lot of Lisp/Clojure devs out there for when you need to expand?

> Painless audit trails: No need for complex ETL to connect trails from Cloud Trail, Database Audit Logs, Kubernetes audit, etc.

You still have to audit those things. If a hacker gets into your infrastructure, you have to know what they did.


I agree with many of these points. Here are some thoughts on how we deal with them:

> Most enterprises are wary of handing over control to a vendor

Great point, we do have the Enterprise version, which is self-hosted.

> Do you find a lot of Lisp/Clojure devs out there for when you need to expand?

We won't hire engineers based on the language they know, but instead on general engineering skills, and they can learn Clojure here (it already worked for the first one :)

> You still have to audit those things

Yes we do, but mostly to trigger alerts if anything happens there and to show that the accesses are either from Runops or the applications during audits. This is way lighter than relying on these as the source of truth for trails.

I'm curious about the Change Management features you think are missing. We do have review workflows and other CM-related features I didn't add here, this demo shows some of it: https://see.runops.io/videos/demo


That's great, I didn't know about the Enterprise version. I'll look closer at this and see if it fits our use cases. (fyi, the more docs we can read [esp. operational docs, but also technical], the more likely we are to push for adopting this in our enterprise)

Here are some of the things a typical IT CM process handles: How do you properly surface and acknowledge the risk & impact of a change? What is the order of operations & how do you track it? How do you validate an operation? Roll back? Are there governance structures in place (X team can run Y task in Z environment)? With multiple stakeholders, how do you get approval from everyone, or handle change overrides? How do you run 'almost' the same thing in different environments? Do you use multiple communication/coordination methods, like e-mail, MS Teams, Slack w/multiple workspaces/orgs, Zoom, Jira, ServiceNow, etc? Is there a "change plan" which is drafted, edited, published, approved, and executed, in coordination with multiple teams?

All that may be overkill for your tool, but it's some of the stuff I would need to build around it to use it in my enterprise. Otherwise I (the ops guy) would need to build & run all those steps and coordinate with others when it's time for them to perform their steps (release, validation, debugging, rollback, etc).


This is great, thanks for sharing! We handle most of these in Runops itself, but you can also use your existing CM tool and leverage only the UX and automations from Runops. We have APIs and webhooks that enable you to extend Runops. One company is doing this and integrating Runops with ServiceNow.


I haven't personally been on an infra team but I've seen Infra / Dev tools teams being overwhelmed with requests. This seems like a really helpful and elegant solution!


Curiously I started in the dev team and migrated to infra in an attempt to fix things :)


The information on https://runops.io/ is light; it has no examples, workflow descriptions, etc.

What is the setup like (is it cloud hosted, or hosted by oneself in a cloud)? Is your code open source? How does authn/authz work if I want to use this?


Yes, we have a lot of work to do on our landing page to better explain these points. It's early days, but we will get there! Here is some detail on them:

It's cloud hosted, and we do support self-hosting for enterprises. The code is not open-source. We support Okta, Google, and other OAuth providers for Authentication. For Authorization we have the concept of Targets, which are abstractions of your cloud resources exposed to users/developers. Say you have a MySQL database: you can create a read-only Target and let everyone use it, and create a second Target for the same database with write access. In the second Target you require reviews from tech leads, or let selected groups run queries.
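A small Python sketch of how two Targets over one database could encode that policy. The `Target` fields and the `decide` function are hypothetical, chosen only to illustrate the idea; they are not the Runops data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str
    resource: str                             # the underlying cloud resource
    read_only: bool
    requires_review: bool = False
    allowed_groups: frozenset = frozenset()   # empty means everyone


def decide(target, user_groups, is_write):
    """Return "deny", "review", or "run" for a request against a Target."""
    if is_write and target.read_only:
        return "deny"
    if target.allowed_groups and not (target.allowed_groups & set(user_groups)):
        return "deny"
    return "review" if target.requires_review else "run"


# Two Targets pointing at the same (hypothetical) MySQL database.
mysql_read = Target("mysql-read", "mysql-demo-db", read_only=True)
mysql_write = Target("mysql-write", "mysql-demo-db", read_only=False,
                     requires_review=True)
mysql_dba = Target("mysql-dba", "mysql-demo-db", read_only=False,
                   allowed_groups=frozenset({"dba"}))
```

With these definitions, anyone can read through `mysql-read`, writes through `mysql-write` wait for a review, and `mysql-dba` runs writes directly but only for members of the `dba` group.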


Thanks!

How does your service compare to services such as Teleport?

https://github.com/gravitational/teleport


Teleport is a fantastic tool. The main differences are: 1) Runops doesn't require you to have tools (kubectl, psql, etc) installed locally and doesn't download temporary credentials to access resources; commands execute in the Cloud. 2) Runops has synchronous review workflows on the command/intent level, as opposed to getting an open session for a period of time. 3) We automatically remove sensitive data from the results of every command. 4) Runops uses Git as the source of truth for the audit trails.


I'm curious if it's visually obvious that commands are running in a non-dev environment. Saving people from the scenario where they walk away for a coffee, return, then accidentally start typing into the wrong terminal window.


I've done that and can relate to the problem! It's common for Kubernetes, where you never know which cluster kubectl is pointing to. The Target (what we call the place where you are running things) is one of the options in the CLI. So to run a command you have to provide at least the Target and the script. This way you always know where you are running things. It's something like this:

runops tasks create --target mysql-demo --script 'select * from dundermifflin.customers;'


Ah, great, thanks. Some of the wording made it sound like perhaps it was hooked transparently. This appears very clear.


We’re an Azure/M365 house, but some of the things this tool explicitly solves were mentioned as areas of improvement for us during PCI assessment recently. I’ll be keeping an eye on this. Great work so far!


Glad to hear we could help in the future, feel free to reach out any time. We would love to hear more about your use cases and the alternatives you guys have in mind to improve the PCI assessment results. We support Azure :)


How did you acquire your first customers?


It was a combination of multiple things. The first customer came from the newsletter I run called SRE Teams (https://sreteams.substack.com). Others came from intros from my network and from reaching out to people I thought we could help. When I was running the DevOps team at Pismo we used to organize meetups and knowledge sharing sessions with other companies having similar problems, this also helped.


So hosted PAM, but I don't see any compliance certifications on your website? How do you have fintech customers today using it (or really, anyone using it)? Why would anyone trust you guys to proxy access to their environments?


Yes, I like your definition. We don't have certifications yet, but the team has done the biggest ones before, and we are keeping everything ready to get them. We should start the processes in the next couple of months. That being said, not all certifications require all software you use to also have the certifications. I understand this is critical for PCI, where anything with access to the data is also in scope, but for SOC2 this is not the case. Most of our customers today are fintechs. We are very transparent with our customers about our architecture and how we do things; that is where the trust comes from. We use the best solutions available to deal with things like storing credentials and sensitive data. And you can always opt for the self-hosted enterprise version.


That looks promising. Makes me think of my favorite tool, Rundeck...


Yes, Rundeck is nice, but has its downsides. We have a lot of companies migrating from it. Runops is the perfect alternative :)


Could one refer to this as a so-called "API Gateway"?


This is an interesting way to put it. Yes, you could say it's an API Gateway for Cloud tools. I like it!



