Does anybody _actually_ use the HashiCorp stack, besides Vagrant, for serious work? I tried and honestly found their products sorely, sorely lacking. Very shiny documentation, very incomplete and un-battle-tested tools, no examples given, and little response from their devs other than the PR team.
Heck, I found the same to be true of Vagrant. Big promises, short on delivery. While mostly usable, it's definitely not as seamless across various guests, providers, and provisioners as their docs might have you think. Stray off the beaten path even the tiniest bit and expect to spend hours looking through GitHub issues for workarounds (submitted by other users, not the dev team).
My experience:
- Windows guests: buggy
- FreeBSD guests: buggy
- Ansible provisioning: buggy
Windows support was incorporated from a community plugin. Now it's in limbo, maintained neither by the original creators nor by the (seemingly) lone HashiCorp dev assigned to Vagrant.
They use DRM to protect their closed-source proprietary plugins, and it makes it a nightmare to build a version from source that works with those plugins. The installer build process is closed source for some reason.
After finally being able to run with modified source, I was able to monkey patch their paid VMware plugin to add a capability so I could use private_network with a static IP on Windows guests. I posted the monkey patch to GitHub issues, and it took two weeks for their paid plugin to be updated.
I wanted to love and contribute to Vagrant, but the somewhat dishonest docs, unresponsive dev team, and DRM have turned me into a grudging user, sticking around only because there are no alternatives.
Vagrant is definitely a wolf in sheep's clothing in terms of appearing open source and open to community collaboration.
Abstractions on top of abstractions that promise to reduce complexity by adding more levels of indirection, when it's the UX of the underlying tools that needs to be better.
(Having written a multi-hypervisor disk cloner in Ruby.)
Speaking for myself, I've put enough tooling around Packer to make it tolerable in my build pipeline and I haven't found a better tool that does what it does (suggestions welcome!). I would be using Vault, except it's just on the far edge of usable for me (particularly around the aws-ec2 provider) and I am repelled sufficiently by Go that I'm not driven to hack it to suit my needs until I exhaust my other alternatives, and I haven't yet.
Consul is...fine, I guess? It's not particularly useful for me, but that's because I have a Zookeeper background and am comfortable with its primitives. I'd use it if I had a need for it. OTOH, I've had extremely bad experiences with Terraform, both in terms of trying to build tooling around it and in having it hose states when resources fail to create correctly (I've mentioned previously around here that multiple clients independently invented the term "terrafucked" for the results of their Terraform state) and I don't feel like I can trust the tool with something as critical as my cloud infrastructural resources. I might bite the bullet and use it if I were working on Google Cloud or Azure, but in AWS, CloudFormation (with tooling on top) is sufficient.
A long, long time ago? It has been a while. But it also isn't what I'm looking for. Aminator is designed for pretty restrictive situations where you're deploying applications (in the form of single RPMs). And I do a lot of that in what Aminator considers a base AMI, and that's fine...but what do I use to build the base AMI? =)
Packer has a lot of user-friendliness problems (destroying build artifacts on failure, as the OP noted, being just one of them), but it does constrain the universe sufficiently that I can just run Chef and get on with my day.
If what you are after is generating AMIs for JVM, Node.js or Go apps, we at Boxfuse offer a great alternative to Packer, with very fast creation of very small and secure AMIs: https://boxfuse.com/blog/amis-in-30-seconds
So I spent some time studying your product, and it looks like you'd like to charge me $100/month and, on top of that, give me a worse experience with more constrained features than what Packer gives me. At a very fundamental level, it doesn't even look like it considers the notion that not everything is an app, that high-availability, fault-tolerant services exist and need to be run, and that even "immutable" servers need a consistently updated method of service discovery and credential retrieval to be effective.
If I'm wrong, please let me know how I can write, as I would using Chef and Packer, a set of directives to install and prep Zookeeper, then discover other Zookeeper nodes in my cluster at runtime, while remaining aware of nodes that are replaced due to system failure. (If the explanation exists and involves "our proprietary, closed-source agent," or "our open-source agent that talks to a proprietary backend", you don't have an answer.)
First of all, thank you for evaluating our work. As I clearly stated in my post above, Boxfuse isn't intended as a general-purpose replacement for Packer. If your usage falls within the use cases I highlighted, it can be a very good fit, though.
As for the notion of "immutable" servers, this industry term means servers that aren't updated in place, not servers without a read-write file system or read-write memory.
In the case of service discovery with client-side load balancing you can easily integrate client libs for services like Eureka or Zookeeper directly in your JVM application, or you can ship an agent (like Consul for example) and run that. You have a minimal Linux x64 system after all.
And no, you don't have to pay $100/month. The licensing is based on a freemium model and your first app is free forever. At the end of the day, all you need to do is decide whether the value you get outweighs the monetary cost. And if it doesn't, that's fine too. It simply means Boxfuse isn't the right fit for you.
You explicitly asked for suggestions for alternatives. All I did was provide one in case it may prove useful for you.
Boxfuse is not an alternative to Packer, though, except from a very great distance if one squints. It is a restricted platform play. This is why I asked how Boxfuse deploys Zookeeper, not a Zookeeper client. Which it can't do, which makes it even further not an alternative because anybody who's going to have even a moderately complex environment--which is to say "anybody who needs more than RDS"--is going to still need Packer (or a configuration management rig, Packer's just memoizing a lot of the initial stuff you'll re-run on convergence) to build out the rest of the infrastructure.
I'll be honest: I cannot envision a company that should use Boxfuse. Not one. They're better off with something like a Racker template that takes a few parameters and feeds them into Packer than your spooky-action-at-a-distance stuff.
Everyone has a different experience and perception I suppose.
I agree we have a long way to go, but I disagree that our tooling is "very incomplete" OR "un-battle-tested".
Ignoring Vagrant as you did: Consul is used at multi-thousand-node (per datacenter) scale by dozens of companies, and a couple of specific companies are using it at an even larger scale. And that's ignoring the thousands of hobbyists and smaller companies using it at dozen-to-hundred-node scale. I don't mention specific names only because I don't have explicit permission, so you'll just have to trust that I don't intend to lie here.
Vault as another example: if you interacted with a financial institution, your transaction at some point likely hit a Vault cluster. Did you visit some websites today? One of the world's largest CDNs has fully deployed Vault for internal TLS cert management.
Those are just a couple examples.
Or, ignoring tech usage completely, we just had our first seven-figure quarter after only three quarters of sales (and that was dozens of deals, not just a handful). You just can't get those sorts of numbers without real world usage.
On completeness, I think the adoption speaks for itself. Tools don't get adopted at the scale ours are being adopted without being complete enough to productively solve a real pain point. I believe we have a long way to go, but what we already have in most of our tools is relatively complete by the measure of being able to get real, meaningful, and productive work done.
It's unfortunate that we can't get to every open source issue and resolve every problem for everyone, but please try to understand that the issue inbound across our projects is massive, on top of trying to build enterprise solutions for customers and run a business. We'd love to hire, hire, hire to handle all the community inbound, but that'd be irresponsible of us financially. Our teams are slowly growing, and we're also promoting more and more community members to core committers who help out quite a bit, too. Packer has ups and downs since we don't have a full team around it at the moment, but it's on our radar of things to work on.
Ultimately, we can improve in every area and we'll strive to do so. In the process, we're motivated and encouraged by our community and also by the "serious work" we see our tools doing every day across various industries.
I'll follow up with you via email to see where we've fallen short for you. I find these criticisms educational and would like to see where we went wrong. I'm not doing that to hide anything from the public, but only because it's hard to have meaningful back-and-forth discourse in a few nesting levels of comments. :)
I, for one, think your tools are brilliant and have made my life as an engineer in a highly regulated environment much easier. I don't make many comments here, but I feel taking the time to thank you and your team for your hard work is needed.
To be sure, there is always room for improvement. HashiCorp does an incredible job at providing active support for the tools they create, and the active community that they've built up over the years is proof of their dedication to their ecosystem.
And if you're into distributed systems, talk to us seriously:
"From a values perspective, we're trying to understand the way the world works — that's what our business is — and so we're really interested in people that have a sort of deep curiosity, people that have the patience to understand deep and complex systems," Kreiter said. "Now, whether those are biological systems, or economic systems, or political systems, it doesn't really matter. Somebody who has an interest in and an ability to understand that deeply is interesting to us."
Mitchell, I appreciate what you do with Consul and Vault. Starting with Vagrant, you've made the world a little bit better and your sales growth is no surprise.
I've not had much luck with Terraform, I never really "got it" with Otto, it feels like Packer is unloved, and I'm also struggling to understand Nomad (vs. Kubernetes, Mesos, et al).
Hopefully constructive feedback -
There's some disparity between the level of "marketing polish" and "technical polish" with HashiCorp: almost all of the software launched with hype and a slick website, but launch maturity has varied, and several products have felt like they lacked follow-on investment. Sharp edges haven't always been identified.
I know the HashiCorp stack fairly well, as we run Packer and Terraform to provision our production environment, and we've considered rolling out Consul.
I'm so glad you're killing a product. My complaint has always been that hashicorp was doing so many different products it was unable to actually deliver any of them at high quality.
Packer is a reliable and trusty piece of tech for us, but Terraform (which has so much potential) is so rough around the edges that we've wrapped it in a mess of our own tooling to make it somewhat usable.
I hope you can find a way to focus on one or two things and do them well.
Hey Mitchell, sorry for sneak dissing. I wasn't trying to hate on your efforts, which are greatly appreciated, I was more just trying to poll the community if people actually use the stack in production. Will follow-up in email.
Vendor-published case studies are pretty worthless IMO. I've seen plenty where the vendor was absolutely despised by anybody working for the client who had to deal with them.
Random positive comment. Thanks for what Hashicorp is doing. You all do a great job with many of the crucial pain points in the SW Dev lifecycle. We're using several of your products to solve some deployment/delivery issues and I can't even imagine how we would go about it without your tools.
We have been using Vagrant, Terraform, Packer, Vault etc. for some time. While there are some issues (that's true for many tools), I find these tools helpful. Heck, I used Terraform to provision my QA cluster just now :)
You guys rock! HashiCorp's tools are fantastic and a pleasure to learn and use.
We're a small company and we're going to officially purchase Atlas in a few weeks - because it saves tons of time.
Also, every developer in our company (a rather small company, where people have great flexibility in what tools to use) was cheering and clapping when we demoed the CI/CD flow in Atlas. It's so easy to deploy and run services.
In my experience, all their tools have a pretty high barrier to entry, as in most cases they require you to learn a new language or at least alter your thinking. But once you master them, most of them are kicking ass.
For example, we use Packer daily and it's just fantastic. Yes, once you need to debug it, it's a pain in the ass. Also, it makes some basic assumptions for you (for instance it turns off project-wide SSH keys while provisioning [in GCE], so you have to manually add them any time you want to do something).
We also use Terraform daily and it works well. Couple of bugs now and then, but it does things so effectively that once it works, you save a ton of time.
I work for a healthcare start-up, and we use Consul and Vault extensively. We did find Terraform to be troublesome, though, and rather difficult to work with (error handling, corrupted state files, etc.). Vagrant is used sometimes for development and testing, but the developers prefer Docker (faster startup, lower footprint), and who am I to argue with progress.
I will continue using Consul (its DNS interface for service discovery is awesome) and Vault, but most everything else seems a bit too ambitious and ultimately lacking.
Based on regular use of Vagrant and irregular use of Packer (to build base boxes for vagrant) I would be reluctant to rely on their other tools.
They mostly work, but it's never as seamless as they make it out to be.
I do appreciate the work that's put into these tools but I also believe anything can be criticised. That doesn't mean the author has to fix what everyone says is wrong but an attitude of "well it's free, either PR or stfu" doesn't help either.
I really, really want to like Terraform. I tried it on a medium project by starting with and modifying the Segment stack. I ran into constant problems with it not being able to deal with errors and not being able to modify the stack in a way that would work. I ran into several situations where I would have to change names and labels just so it could work around itself while I was trying to iterate on the stack and bring it to a workable state.
To give you an idea of how many problems I had: I currently have 4 different tfstate files from 4 days of testing. I had to go into AWS and manually delete resources because it couldn't recover from the errors it created.
One example: I was using the ECS option and changed the container source for a service. Seems easy enough and something that should work. Terraform wedged itself after applying the change so badly that I had to blow away the entire setup to get it reset to where it could even run `plan` without erroring.
Otto looked nice but it had fundamental, basic issues and it seems like nobody was actually working on it. I +1'd a bug with the PHP implementation where it didn't give you the option to change the web root and never got an update. This is something that every single decent PHP framework out there REQUIRES and wasn't supported. Otto PHP seemed like it was designed simply to work with Wordpress.
Have you considered starting your own stack from the ground up?
I definitely find it a lot easier to manage and reason about if I mostly avoid third-party Terraform modules. Out of probably a dozen different Terraform projects, I've never run into a situation that I needed to manually resolve. This includes both projects I started myself and cases where I'm helping to improve/manage client deployments.
> I was using the ECS option and changed the container source for a service.
What do you mean by this? I've used ECS with Terraform extensively and never had a problem with updating the container/image which a service referenced.
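For reference, the shape I use is roughly this (the family/cluster names and var.app_image are placeholders, not anything Terraform requires); bumping the image variable and re-applying just creates a new task definition revision and points the service at it:

    # Illustrative only: names and var.app_image are placeholders.
    variable "app_image" {}   # e.g. "myorg/app:1.2.3"

    resource "aws_ecs_cluster" "main" {
      name = "main"
    }

    resource "aws_ecs_task_definition" "app" {
      family                = "app"
      container_definitions = <<EOF
    [
      {
        "name": "app",
        "image": "${var.app_image}",
        "memory": 256,
        "essential": true
      }
    ]
    EOF
    }

    resource "aws_ecs_service" "app" {
      name            = "app"
      cluster         = "${aws_ecs_cluster.main.id}"
      task_definition = "${aws_ecs_task_definition.app.arn}"
      desired_count   = 2
    }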
That being said, I never used Otto. It definitely seems like they tried to bite off more than they could chew and I wasn't really interested in such high-level solutions.
> Have you considered starting your own stack from the ground up?
Yes. I'm actually in the middle of trying that now.
I want to set up a VPC, a few web servers (1-10) with an autoscaling policy in the VPC along with a bastion server and a cron server, a CodeDeploy setup that works with autoscaling, CloudWatch logging and monitoring, a load balancer, an ElastiCache instance, and an RDS instance. I've been working on this off and on for months. If you or anyone else can point me in a direction to simplify this I'd be grateful.
The core of the problem I had with Terraform (outside of the ECS issue) is that there is one AWS service that gets soft-deleted. I can't remember which one it is right now, but it really threw TF for a loop. So I'd set up the stack, do some testing, decide to shut everything down for the day with a `terraform destroy`, and the next day I couldn't resume because TF thought the resource existed but AWS didn't.
What you are describing is pretty trivial with Terraform, so it definitely shouldn't take months. A week or two.
You can look at my github.com/RichardKnop/coreos-cluster as an example (that one sets up a CoreOS cluster, but you can take just the VPC, RDS, security groups, subnets and NAT bastion from there). I also have a couple more Terraform repos on my GitHub that deploy AWS infrastructures like the one you described.
Also look at the GitHub org of the Government Digital Service (GDS), I think it's alphagov. They have a lot of nice Terraform stuff there from their experimentation with different PaaS offerings.
What part are you having trouble with? You can use the AWS provider for the VPC and security groups, EC2 instances for the bastion host and cron server, and a load balancer (or an ELB). ElastiCache and RDS also require parameter groups and subnet groups. They don't support replication groups for ElastiCache yet, but there's a PR for it.
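Rough, untested sketch of the overall shape (CIDRs, AMI IDs and names are placeholders; the cron box, CodeDeploy, RDS and ElastiCache pieces follow the same pattern):

    provider "aws" {
      region = "us-east-1"   # placeholder region
    }

    resource "aws_vpc" "main" {
      cidr_block = "10.0.0.0/16"
    }

    resource "aws_subnet" "public" {
      vpc_id     = "${aws_vpc.main.id}"
      cidr_block = "10.0.1.0/24"
    }

    # Bastion: a plain EC2 instance in the public subnet
    resource "aws_instance" "bastion" {
      ami           = "ami-xxxxxxxx"   # placeholder
      instance_type = "t2.micro"
      subnet_id     = "${aws_subnet.public.id}"
    }

    # Web tier: launch configuration + autoscaling group behind an ELB
    resource "aws_launch_configuration" "web" {
      image_id      = "ami-xxxxxxxx"   # placeholder, e.g. an image built with Packer
      instance_type = "t2.small"
    }

    resource "aws_elb" "web" {
      subnets = ["${aws_subnet.public.id}"]

      listener {
        instance_port     = 80
        instance_protocol = "http"
        lb_port           = 80
        lb_protocol       = "http"
      }
    }

    resource "aws_autoscaling_group" "web" {
      min_size             = 1
      max_size             = 10
      launch_configuration = "${aws_launch_configuration.web.name}"
      vpc_zone_identifier  = ["${aws_subnet.public.id}"]
      load_balancers       = ["${aws_elb.web.name}"]
    }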
I've also seen tfstate get weird after a slow ElastiCache spin-up or termination. If it takes over 10 minutes, it times out. The main thing I don't like about Terraform is that they don't support conditionals, which can be annoying.
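The usual workaround for the missing conditionals is (ab)using count with a 0/1 variable, something like this (the resource name, parameter group and node type are placeholders):

    # Set to 1 to create the cluster, 0 to skip it entirely.
    variable "create_cache" {
      default = 0
    }

    resource "aws_elasticache_cluster" "cache" {
      count                = "${var.create_cache}"
      cluster_id           = "app-cache"            # placeholder
      engine               = "redis"
      node_type            = "cache.t2.micro"
      num_cache_nodes      = 1
      parameter_group_name = "default.redis2.8"
    }

    # Anything referencing it then needs the indexed form,
    # e.g. "${aws_elasticache_cluster.cache.0.id}".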
I'm going to throw one thing in here about Terraform over boto etc.: it's actually really nice to have a way to put together your own modules. I think they could do a better job with some common libs to make it easier, but overall it's super powerful.
We use Serf for infrastructure command propagation ("Chef, go converge on host X", "Docker, start running container Y on host Z") across our entire staging and production sets of hosts, and Consul for holding most of our network topology information across three different geographic regions. Works pretty well. We're bringing in Nomad (edit: and Vault) now, though they still have a few rough spots.
Used Vagrant a few years ago, but have almost completely moved to Docker; it doesn't make sense to have a massive VM image anymore.
Use Packer for all AMI building, probably one of the easiest tools to get started with.
Use Consul for service discovery, with consul-template and envconsul; they all work nicely together.
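For anyone wondering what that looks like, a consul-template config is basically just this (the paths, service name and reload command are placeholders; by default it talks to the local Consul agent):

    # consul-template config: render an haproxy config from the service
    # catalog and reload haproxy when membership changes.
    template {
      source      = "/etc/consul-template/haproxy.cfg.ctmpl"
      destination = "/etc/haproxy/haproxy.cfg"
      command     = "service haproxy reload"
    }

    # The .ctmpl file itself uses consul-template's functions, e.g.:
    #   {{ range service "web" }}
    #   server {{ .Node }} {{ .Address }}:{{ .Port }}
    #   {{ end }}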
Terraform. I've been using it for the past few months, but it has its problems. Module use can be very tricky to manage, with no real guidelines on how to use them or TF itself. The isolation can be more complex than, say, CloudFormation, which has its faults too. With CFN you have the concept of an infra stack, which shares common resources, with supplementary stacks providing additional resources. Unless you are happy to have a load of main.tf files in subdirectories, which seems to be the pattern, you end up with a giant load of TF as one "stack". The benefit is never hard-coding things, which you are probably going to do with CFN, plus you have much better scope for changing resources in TF; but if you don't need that, TF can offer little over CFN paired with a language to generate it, like cfndsl or troposphere.
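To make the "stack" point concrete, the layout ends up looking roughly like this per environment (module paths, variables and outputs here are just illustrative; they depend on what your modules expose):

    # environments/production/main.tf  (names are illustrative)
    # Each environment is its own small "stack" of main.tf files that
    # wires shared modules together; nothing is hard-coded except what
    # gets passed in here.
    module "vpc" {
      source     = "../../modules/vpc"
      cidr_block = "10.0.0.0/16"
    }

    module "app" {
      source    = "../../modules/app"
      vpc_id    = "${module.vpc.vpc_id}"   # an output the vpc module would expose
      app_image = "myorg/app:1.2.3"        # placeholder
    }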
We also regularly break through our AWS API limits with TF because it touches so many things; if you have TF in a single git repo with multiple hooks, your TF can take a long time to actually plan. Recently, I did a plan on 0.7 to see what the impact of upgrading would be, without realising that it would rewrite the state file to comply with 0.7, making it impossible to switch back to 0.6. I did roll back to a previous version of the state, but even that caused some issues that I had to manually resolve.
Finally, Atlas. I would say that it is far from complete. There are a lot of things the UI is missing. The API for Atlas is also non-existent, so you have to use the UI constantly unless you want to auto-deploy, which is probably fine for a dev environment, but not production. If you have a plan/apply that needs manual intervention, it will sit there while commits pile up above it, so you either let it apply and let Atlas hit your account with massive amounts of API calls, or you cancel all the plans currently being blocked. It obviously works very differently with, say, S3 and using TF from a CI system, but I'm not sure how much focus there is on Atlas vs. other products. Atlas also doesn't let you set up things like organisational access tokens, either for Atlas itself or for GitHub. Right now, if we removed someone from our GitHub or Atlas org, stuff would simply break.
We use Consul, Terraform, Packer and (sometimes) Vagrant.
Terraform is probably my favorite. You have to be a bit careful running it against live infrastructure, but it is so nice to be able to figure out who did what to your machines rather than having people click around in the AWS dashboard.
We use Serf for deploy triggering and lightweight messaging between our asynchronous processing nodes and it works perfectly day in and day out. Really like it, provides a great alternative to pub-sub for some use cases.
You mentioned Chef and Ansible. Using both? What are your thoughts on one vs. the other? Chef seems more powerful / flexible, Ansible maybe a bit simpler.
I'm not the parent, but I've used both (Chef more heavily - going on 6 years, and I'm certified as a Chef trainer).
Your assessment is roughly correct, Ansible is definitely easier to get started with, and if you just want to manage a small number of nodes that you have SSH access to, it's a good choice. Chef has a larger ecosystem, and a different communication model (VMs under management communicate out to a Chef server over HTTPS, no need for inbound ports to be opened). The tradeoff is a slightly longer learning cycle. I'm biased towards Chef because it's what I know, but having spent some time with Ansible it's a good choice for some use cases.
Thanks for explaining. We are using Chef heavily and we like it, but some people have been experimenting with Ansible, so we started to wonder how they compare in general. The consensus seems to be to stay with Chef.
Packer (qemu builder) is a huge step up from manually orchestrating image builds with qemu by hand.
Terraform seems to be a fully unique product. No, CloudFormation isn't really the same thing, nor are Azure ARM templates. Terraform has solved multiple problems for me that were not solvable any other way except for lots of manual orchestration done with bash scripts.
I haven't used Nomad yet, but I've heard good things and am happy Kubernetes/Mesos has some competition (no I don't consider Swarm Mode a real competitor yet).
- Packer is a really nice and simple image-building tool. It saved lots of headaches compared to tools like Oz, because Packer fails fast and saves debugging time.
- Terraform is comparable to OpenStack Heat or AWS CloudFormation. Each has its own issues, but Terraform being platform-agnostic is an advantage for me.
- Vagrant is not modern anymore, but it's ubiquitous.
It is unfortunate that all people can do is to complain.
One should probably think about sending a PR, a feature request, or a bug report if something really impacts them. An attempt, even if in the wrong direction, would probably push the priority of the underlying issue higher for the core devs working on the project.
If you can't do that, at least attempt to be constructive.
If you use vagrant heavily, you are almost forced to use packer at one point. There are certain things you simply have to put into a base image and packer is the best way to do that. However, I will say the support for vagrant seems fairly responsive, while packer seems basically unsupported.
They have enterprise versions of Terraform, Vault, Nomad, and maybe more. You can get them all in "Atlas". They'll also host it all for you, if you want (for a fee). They charge enterprise prices for all that.
At work we use Packer, Terraform and Consul across all of our apps, and little smatterings of other stuff in some places. A little on each:
- Packer: Not my favorite, honestly. I can't argue that it's doing the job, but it feels inflexible and hard to integrate into a coherent workflow. It seems that Hashicorp Atlas can smooth this over in principle; we don't use it, because it didn't seem to fit with our use of Terraform at the time we got started, and we have a semi-home-grown alternative now.
- Terraform: We're using Terraform not only for low-level infrastructure stuff (VPCs, subnets, etc) but also for application deployment. I'd say our success with Terraform was due to a couple things. First: we picked up Terraform at a time when we were in the process of a total infrastructure rework in our org anyway, so we were effectively starting from scratch. Second: I spent a few months using Terraform for toy things and learning what it was good at, what it was less good at, and building a "pattern library" of techniques that had worked out. Once we started applying it to real problems, we just cherry-picked suitable patterns from that library and used them. I expect that Terraform is tougher for someone who already has significant infrastructure deployed and is trying to manage it with Terraform with few changes, since there are definitely approaches that are harder to model in Terraform than others.
- Consul: I really enjoy the simplicity of Consul. Getting a cluster up and running is pretty straightforward. Once you have it running, you get a highly-available, datacenter-aware key/value store and a service registry. We honestly don't use the service registry very much, but we have made extensive use of the consul-template utility in conjunction with Terraform's consul_key_prefix resource to have applications/services announce where their endpoints are for consumption by their clients.
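Concretely, the announce side is roughly this (the key path and the aws_elb reference are placeholders), and consul-template on the consuming side just reads the key back out:

    # Terraform side: publish an endpoint into Consul's KV store.
    resource "consul_key_prefix" "app_config" {
      path_prefix = "services/app/"

      subkeys = {
        "endpoint" = "${aws_elb.app.dns_name}"
      }
    }

    # A consumer's consul-template file then reads it back, e.g.:
    #   upstream_endpoint = "{{ key "services/app/endpoint" }}"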
We actually decided against using Vagrant because it was "more bulky" than our app developers were willing to tolerate. Instead we continued with our previous solution (running the apps directly on the users' laptops, with a README in each app describing how to get it running), optimistic that the new Docker for Mac and Docker for Windows would be awesome enough to give us the good parts of Vagrant in a lighter package.
Vault showed up a bit late for our "architecture remix" so we solved our Vault-ish problems in other ways. I like its design in theory, and would probably give it a try if the opportunity arose.
Similar story with Nomad: too late for us, and we'd gone down an alternative path before it showed up. Can't really speak to it, since I only dabbled with it very briefly.
I'm sad but honestly not surprised to see Otto phased out. I was initially excited when it was announced last year but I could never really figure out how to get it to behave in the way I expected... I always felt like I was fighting it, and doing things in a way it didn't expect. I think there's room for the Hashicorp family of tools to "tessellate better", but Otto seemed like a very coarse, heavy solution -- essentially wrapping and templating the complex tools underneath -- where I was more hoping for the tools themselves to grow features to close the gaps.
This turned in to a bit of a rant, so I'll stop. :D
I advise anyone using Terraform in production to wrap it up in some sort of automation. Hashicorp would of course like you to use Atlas :D but you can get a long way with CI/automation tools like Jenkins, Rundeck, ...
We have a wrapper script which:
- configures the remote state in a predictable way (setting up remote state properly is one of the more fiddly parts of Terraform usage)
- takes a snapshot of the current state
- runs "terraform plan" to produce a plan file
- takes a snapshot of the current state, which has now been refreshed by Terraform
- pauses here and waits for human approval of the plan
- takes a snapshot of the current state one more time, even though it's usually just another copy of the last state we snapshotted
- runs "terraform apply" to apply the plan created earlier
- takes a snapshot of the final state
All that state-snapshotting is an insurance policy against Terraform getting itself confused. There are definitely some gotchas in this area[1] but honestly we've only actually made use of these zealous state snapshots on two separate occasions, and they were both on our pre-production staging environment (which we deploy to more carelessly, as a dry run for production) rather than our production environment.
I have thought about open sourcing that wrapper script but sadly it has some assumptions about our environment built into it (e.g. locking using a specific service in our world, so that two deploys can't run concurrently) and I've not had the time to scrub them out and generalize it.
I work for a mid-sized media company and we use Vagrant & Packer to build all our infra. AWS CloudFormation templates have been a big pain and we are evaluating Terraform as an alternative.
I found Packer to be quite useful for building VM images for my team. It's a little hacked together, but far better than the manual process it replaced.
They did a good job at trying something, it didn't work as expected and they were honest and decided to focus on something else. Those are all very good qualities. I think that's commendable.
It's too bad (and uncharacteristic of mitchellh) that this post is so light on specifics. Were the "previously unknown challenges" simply that not enough people adopted Otto? Or were there actual technical hurdles?
The premise of Otto isn't clearly flawed, so it would be interesting to see specific challenges - even if it's just "the problem space is way too big and not enough people wanted it"
I'm happy to answer myself. The previously unknown challenges are just the various facets of building and deploying an application. It's not so much that they were unknown problems as that the abstraction we designed for solving them proved challenging.
Ultimately, Otto was trying to be a masterless PaaS (Heroku, etc.). When you frame it that way and think about all the things you'll have to solve it becomes challenging. On top of that, we always wanted Otto itself to be fairly "thin" and delegate its difficult duties to the rest of the stack. This required us to build a bunch of features we weren't ready to build into our other products OR risk bloating Otto itself.
Well, in Heroku proper, you feed its git repos an app, it figures out what type of app it is, and applies the right build pack and hosting environment for it. Keeping build packs up to date, keeping all the scripts running, and making sure an app has associated dependencies, etc--I imagine that's the difference between an independent setup you can self-host quickly and easily and one that's very dependent on an ecosystem of Heroku maintainers, tooling and existing server infrastructure...
It's a very hard problem to solve, and one which will likely only catch on as devops tooling improves and becomes expected for apps, and as app runtimes standardize. Alternatively, you could look at the myriad ways operating systems package applications, and the ways applications allow themselves to be packaged, let alone store data in production, and ... basically give up on this ever happening in an easy, hands-free automated way.
> Keeping build packs up to date, keeping all the scripts running, and making sure an app has associated dependencies, etc--I imagine that's the difference between an independent setup you can self-host quickly and easily and one that's very dependent on an ecosystem of Heroku maintainers, tooling and existing server infrastructure...
I work for Pivotal on the Cloud Foundry buildpacks team. 4 of our buildpacks (Ruby, Python, Go, NodeJS) are downstream forks of Heroku's.
We merge from upstream approximately weekly, but the pace has definitely dropped.
We build all the runtime binaries we ship with our buildpacks. We also build the rootfs it all runs on. Some of these pipelines are now fully automated. For example, when a NodeJS tag lands, our pipeline will build the binary, add it to a buildpack and put it through our battery of test suites. Our product manager can make a release with a few keystrokes and a button press.
The difficulty of engineering really comes down to the nature of the ecosystem you're turning into a buildpack. We did an article on writing buildpacks[0], taking Rust as our example. It was a doddle, because of Cargo. Meanwhile our PHP buildpack performs incredible gymnastics to make a 12-factor cloud platform look like a shared host circa 1999.
The selling point was that you just do the bare minimum and Otto figures out what other specific things you need to get your project up and running. That's not scope, that's magic.
Lots of useful, successful things appear to be magic (especially in their selling points). Early Heroku is a great example in this space.
I didn't see anything in the initial premise of Otto that was technically untenable. We could speculate about the "challenges" - the scope was too wide/unbounded, it was open source, it was a distraction from other company goals, it didn't gain enough early traction - but that's simply speculation without details from the creators.
Kudos to HashiCorp for realizing that the complexity of the project was getting away from them and for having the guts to pull the plug in such a public way.
I work for Pivotal on the fringes of Cloud Foundry. Between us and other Cloud Foundry Foundation members there are around 200 full time engineers working on it.
Featuresome, robust, industrial-grade cloud platforms are hard.
Will "zero configuration" be a feature/goal in your next attempts at this abstraction? Was this one of the things that made building Otto so challenging?
A few months ago when I tried Otto I found the "zero configuration" idea off-putting. In fact, I couldn't even get a basic Python app to work because there was no way for me to install libmysqlclient[0]. There really was no way to configure anything in Otto.
I'm still a big fan of minimal configuration through sane defaults, but I'm not advocating "zero configuration" in the sense that you have no power to change those defaults.
One area where I think we've done really well with this in the past year is the "-dev" flag on Vault, Consul, and Nomad. It's a zero-configuration way to get a fully functional dev server up and running with one command, though you can still specify a config if you want. For non-dev use, I just don't want to set dozens of options to get going, so we'll continue to strive for defaults that work where we can.
All that being said, they are defaults, so you can always change them.
For a small example, I _still_ get +1 notifications on this critical issue nearly every day: https://github.com/mitchellh/packer/issues/409 - no response from the dev team whatsoever.