"Immutable infrastructure" what a laugh. In a large deployment, configuration somewhere is always changing - preferably without restarting tasks because they're constantly loaded. We have (most) configuration under source control, and during the west-coast work day it is practically impossible to commit a change without hitting conflicts and having to rebase. Then there are machines not running production workloads, such as development machines or employees' laptops, which still need to have their configuration managed. Are you going to "immutable infrastructure" everyone's laptops?
(Context: my team manages dozens of clusters, each with a score of services across thousands of physical hosts. Every minute of every day, multiple things are being scaled up or down, tuned, rearranged to deal with hardware faults or upgrades, new features rolled out, etc. Far from being immutable, this infrastructure is remarkably fluid because that's the only way to run things at such scale.)
Beware of Chesterton's Fence. Just because you haven't learned the reasons for something doesn't mean it's wrong, and the new shiny often re-introduces problems that were already solved (along with some of its own) because of that attitude.
Are you sure you two are talking about the same thing?
My understanding of immutable infrastructure is the same as immutable data structures: once you create something, you don't mess with it. If you need a different something, you create a new one and destroy the old one.
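To make the analogy concrete, here's a toy Python sketch of the same pattern (the ServerSpec type and field names are made up, just to show the shape of it):

```python
from dataclasses import dataclass, replace

# Hypothetical "spec" for a server, frozen like an immutable data structure.
@dataclass(frozen=True)
class ServerSpec:
    image: str
    instance_type: str

old = ServerSpec(image="base-2024-05-01", instance_type="m5.large")

# You don't mutate the existing spec; you derive its replacement,
# stand up the new thing, and destroy the old one.
new = replace(old, image="base-2024-06-01")
```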
That doesn't mean that the whole picture isn't changing all the time. Indeed, I think immutability makes systems overall more fluid, because it's easier to reason about changes. Mutability adds a lot of complexity, and when mutable things interact, the number of corner cases grows very quickly. In those circumstances, people can easily learn to fear change, which drastically reduces fluidity.
Yup. We do this. When our servers need a change, we change the AMI for example, and then re-deployment just replaces everything. Most servers survive a day, or a few hours.
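On AWS that flow can be as small as registering a new launch template version and kicking off an instance refresh, roughly like this (template/ASG names and the AMI id are placeholders, and the rollout knobs will vary by setup):

```python
import boto3

ec2 = boto3.client("ec2")
asg = boto3.client("autoscaling")

# Point a new launch template version at the freshly baked AMI (placeholder id).
version = ec2.create_launch_template_version(
    LaunchTemplateName="web-fleet",  # hypothetical template name
    SourceVersion="$Latest",
    LaunchTemplateData={"ImageId": "ami-0123456789abcdef0"},
)["LaunchTemplateVersion"]["VersionNumber"]

ec2.modify_launch_template(LaunchTemplateName="web-fleet", DefaultVersion=str(version))

# Let the auto scaling group roll every instance onto the new image.
asg.start_instance_refresh(
    AutoScalingGroupName="web-fleet-asg",  # hypothetical ASG name
    Preferences={"MinHealthyPercentage": 90},
)
```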
Makes sense to me. I was talking with a group of CTOs a couple years back. One of them mentioned that they had things set up so that any machine more than 30 days old was automatically murdered, and others chimed in with similar takes.
It seemed like a fine idea to me. The best way to be sure that everything can be rebuilt is to regularly rebuild everything. It also solves some security problems, simplifies maintenance, and allows people to be braver around updates.
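The 30-day reaper itself is only a few lines if you're on AWS. A sketch, not a drop-in tool - a real one would honor exclusion tags, drain hosts first, and spread the terminations out:

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2")
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

# Collect running instances launched before the cutoff.
stale = [
    inst["InstanceId"]
    for res in ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]
    for inst in res["Instances"]
    if inst["LaunchTime"] < cutoff
]

# Terminate them and let the autoscaler rebuild from the current image.
if stale:
    ec2.terminate_instances(InstanceIds=stale)
```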
Probably the most insightful comment in this entire thread. Thank you. In many cases, an "image" is just a snapshot of what configuration management (perhaps not called such but still) gives you. As with compiled programming languages, though, doing it at build time makes future change significantly slower and more expensive. Supposedly this is for the sake of consistency and reproducibility, but since those are achievable by other means it's a false tradeoff. In real deployments, this just turns configuration drift into container sprawl.
So, once you create a multi-thousand-node storage cluster, if you need to change some configuration, replace the whole thing? Even if you replace onto the same machines - because that's where the data is - that's an unacceptable loss of availability. Maybe that works for a "stateless" service, but for those who actually solve persistence instead of passing the buck it just won't fly.
Could you say more about why your particular service can't tolerate rolling replacement of nodes? You're going to have to rebuild nodes eventually, so it seems to me that you might as well get good at it.
And just to be clear, I'm very willing to believe that your particular legacy setup isn't a good match for cattle-not-pets practices. But I think that's different than saying it's impossible for anybody to bring an immutable approach to things like storage.
The person you're replying to didn't say "replace every node," they said "replace the whole thing."
To give a really silly example, adding a node to a cluster is a configuration change. It wouldn't make sense to destroy the cluster and recreate it to add a new node. There are lots of examples like this where if you took the idea of immutable infrastructure to the extreme it would result in really large wastes of effort.
Could you please point me at prominent advocates of immutable infrastructure who propose destroying whole clusters to add a node? Because from what I've seen, that's a total misunderstanding.
As I said, it's a silly example just to highlight an extreme. In between there are more fluid examples. I don't think it's that ridiculous to propose destroying and recreating the cluster in its entirety when you're deploying a new node image. However, as you say, I'm not sure anyone would advocate that except in specific circumstances.
On the other hand, while my suggestion of doing it to add a node sounds ridiculous, I'm sure there are circumstances in which it's not only understandable but necessary, due to some aspect of the system.
I'm saying it's not even an extreme, in that I don't believe what people are calling "immutable infrastructure" includes that.
If your biggest objection to an idea is that you can make up a silly thing that sounds like it might be related, I'm not understanding why we need to have this discussion. I'd like to focus on real issues, thanks.
I'm not objecting categorically to anything. I think that immutable infrastructure is a spectrum, and depending on your needs you may have just about everything immutably configured, or almost nothing. I just don't think it's so black and white as "you should always use immutable infrastructure."
I also think it's a cool idea to destroy the entire cluster just to add a node; it sounds ridiculous, but there are also some circumstances where it makes perfect sense.
Again, do you have a citation for the notion that it's a spectrum? The original post that coined the term doesn't talk about it that way, and neither do the other resources I found in a quick search. As I see it, it's binary: when you need to change something on a server, you either tinker with the existing server or you replace the server with a fresh-built one that conforms to the new desire.
Wow, look at those goalposts go! If you make enough exceptions to allow incremental change, then "immutable" gets watered down to total meaninglessness. That's not an interesting conversation. This conversation is about configuration management, which is still needed in a "weakly immutable" world.
Again, could you please point me at notable advocates of immutable infrastructure proposing the approach you take such exception to? And note that I'm not proposing any exceptions.
Interesting to say you've "solve[d] persistence" when you seem to be limited by it here. Is there a particular reason your services can't be architected in a less stateful, more 12-factor way?
Kick the persistence can down the road some more? Sure, why not? But sooner or later, somebody has to write something to disk (or flash or whatever that doesn't disappear when the power's off). A system that stores data is inherently stateful. Yes, you can restart services that provide access or auxiliary services (e.g. repair) but the entire purpose of the service as a whole is to retain state. It's the foundation on top of which all the slackers get to be stateless themselves.
The vast majority of people simply redefine the terms to fit whatever they are selling.
If your systems are immutable, they can run read-only. In the nineties, Tripwire, the integrity checker, popularized this - you could run it off a CD-ROM. Today, immutable infrastructure is VMs/containers that can be run off a SAN or a read-only pass-through file system. It means snapshots are completely and immediately replicable. When you need to deploy, you take a base image/container, install code onto it, run tests to ensure it is not broken, and replicate it as many times as you need, in a read-only state. This approach also has an interesting property: because the system is read-only (as in exported to the instance read-only/mounted by the instance read-only), it is extremely difficult to do nasty things to it after a break-in - if it is difficult to create files, it is difficult to stage exploits.
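With the Docker SDK, for example, running the tested artifact read-only looks roughly like this (the image tag is a placeholder):

```python
import docker

client = docker.from_env()

# Run the baked, pre-tested image with a read-only root filesystem;
# the only writable space is an explicit tmpfs, which makes it much
# harder to stage files on the box after a break-in.
container = client.containers.run(
    "myapp:2024-06-01",          # hypothetical image tag
    read_only=True,
    tmpfs={"/tmp": "size=64m"},
    detach=True,
)
```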
That's the only kind of infrastructure where configuration management on the instances themselves is not needed.
The hosts are managed via chef, the jobs/tasks running on those hosts by something roughly equivalent to k8s.
As for the conflicts, I have to say I loathe the way the more dynamic part of configuration works. It might be the most ill-conceived and poorly implemented system I've seen in 30+ years of working in the industry. Granted, it does basically work, but at the cost of wasting thousands of engineers' time every day. The conflicts occur because (a) it abuses source control as its underlying mechanism and (b) it generates the actual configs (what gets shipped to the affected machines) from the user-provided versions in a non-deterministic way, which causes spurious differences. All of its goals - auditability, validation, canaries, caching, etc. - could be achieved without such aggravation if the initial design hadn't been so mind-bogglingly stupid.
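To give a flavor of how cheap determinism is: just rendering the generated configs canonically (sorted keys, fixed formatting) would eliminate most of the spurious diffs. An illustrative sketch, not what our system actually does:

```python
import json

def render_config(settings: dict) -> str:
    # Canonical output: sorted keys, fixed separators, trailing newline.
    # Two runs over the same inputs are byte-identical, so the only diffs
    # that show up (and force rebases) are real changes.
    return json.dumps(settings, sort_keys=True, indent=2, separators=(",", ": ")) + "\n"
```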
But I digress. Sorry not sorry. ;) To answer your question, my personal solution is to take advantage of the fact that I'm on the US east coast and commit most of my changes before everybody else gets active.
Sometimes you have to work with what you're given in a brownfield env, and a config management tool is useful in that case, but it's possible that you are working with a less-than-ideal architecture and less-than-ideal time/money to make changes.
State is always the enemy in technology.
I can't even imagine managing hundreds of servers whose state is unpredictable at any moment and they can't be terminated and replaced with a fresh instance for fear of losing something.
> can't even imagine managing hundreds of servers whose state is unpredictable at any moment
Be careful not to conflate immutability with predictability. The state of these servers is predictable. All of the information necessary to reconstruct them is on a single continuous timeline in source control. But that doesn't mean they're immutable because the head of that timeline is moving very rapidly.
> can't be terminated and replaced with a fresh instance for fear of losing something.
No, there's (almost) no danger of losing any data because everything's erasure-coded at a level of redundancy that most people find surprising until they learn the reasons (e.g. large-scale electrical outages). But there's definitely a danger of losing availability. You can't just cold-restart a whole service that's running on thousands of hosts and being used continuously by even more thousands without a lot of screaming. Rolling changes are an absolute requirement. Some take minutes. Some take hours. Some take days. Many of these services have run continuously for years, barely resembling the code or config they had when they first started, and their users wouldn't have it any other way. It might be hard to imagine, but it's an every-day reality for my team.
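Schematically, every one of those changes goes through some variant of this loop. A bare-bones sketch; the real orchestration adds canaries, pacing, and abort logic:

```python
import time

def rolling_change(hosts, drain, apply_change, healthy, max_unavailable=1):
    """Apply a change across a fleet a few hosts at a time,
    never dropping below the availability floor."""
    for i in range(0, len(hosts), max_unavailable):
        batch = hosts[i:i + max_unavailable]
        for h in batch:
            drain(h)         # stop routing new work to the host
            apply_change(h)  # push the new config/binary and restart
        # Don't touch the next batch until this one is serving again.
        while not all(healthy(h) for h in batch):
            time.sleep(10)
```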
> Be careful not to conflate immutability with predictability.
I don't trust predictability. Drift is always a nightmare. Nothing is ever as predictable as you would like it to be.
> State is always the enemy in technology.
Except that state and its manipulation is usually the primary value in technology.
> I can't even imagine managing hundreds of servers whose state is unpredictable at any moment and they can't be terminated and replaced with a fresh instance for fear of losing something.
Yes, that sounds awful. That's why we have backups and, if necessary, redundancy and high availability.