While I agree that ALTER SYSTEM change is probably a good thing, I also see the pain and perspective of the people who were pushing back.
Anyone who has had to work in security knows about the endless deluge of low effort script kiddie “security reports” of non issues they have to deal with by security “researchers” fishing for a bounty. And you have to read all of them, because there’s no telling when one might be legitimate. Or you bring in some third party auditor for compliance or some other reason and they are just as clueless and snap up something like this to add to their list of “recommendations” that you have to respond to so they can feel like they are more justified collecting the paycheck.
So I think implementing the feature - but being extremely loud about what it is and isn’t - is the right move here.
I work on building security hardended systems, and often see the opposite problem. Engineers, even ones who should really know better, will see a feature like this and assume it creates an actual security boundary. Then they design something that relies on it for a security requirement. At that point, you need to hope someone catches the mistake, but even then, you can spend a week arguing with a non-technical manager that you need to start over, because that thing that sounds like it does exactly what you want actually provides 0 security guarantees.
I completely agree with this. If something seems at first blush to enforce a security control, but does not in reality, it should somehow be made very clear that it is just disabling a footgun rather than actually enforcing a security control. Either the setting in the config file needs to be named something silly like disable_alter_system_not_a_security_feature, or it needs to be overridable with something like "WITH FORCE", to make it really clear that all it is is a stumbling block rather than a control.
The reality is that your average engineer should not be considered an expert in security. You need dedicated security people with the authority to reject commits or code changes.
This exactly. It's hard to properly understand security, even for educated professionals. IMHO security features should be the default only if they are demonstrably correct or hard to misuse.
Otherwise, the often wiser thing to do is to make the danger of doing something as obvious and blatant as possible. It's like the debate around having a root account or not: IMHO it's better to have a superuser that everybody perceives as an obvious danger, that's easily recognizable and never really ok to use for mundane tasks. I've seen people misunderstanding Windows ACLs way too often because of this.
Speaking as somebody who has had to debug why servers were ignoring changes to the configs in our git repos, this is a welcome change.
It's a shame there are ways to bypass it, but at least it communicates the intent: you aren't supposed to modify this system without modifying the actual config
Not very happy that a group of people with vested interests (cloud companies) push through a feature that bears risk of misconfiguration for everyone. Mongo comes to mind with all the problems they had around a false feeling of security.
I think if the the config is unchangeable, all ways to change the config (e.g. don't reload config on demon restart etc.) should be considered. Or the config piped into PG on startup, not reading any files. But this takes time to do right.
This had to be in on short notice (trys to move this to another release where squashed) b/c vested interests.
Just because it's kubernetes doesn't mean it's cloud companies. This is a useful feature for anyone that wants repeatable deployments of pistresql. That can be cloud, or it can be k8s, or it can be an automated testing pipeline.
I think the top comment on the LWN article gets it right. Don't think of it as a SECURITY feature, think of it as a SAFETY feature. Imagine some subset of buttons on a physical control panel than can be locked behind a flimsy plastic door. It's not going to keep anyone determined out, but it will keep people from using them on accident.
"Just because it's kubernetes doesn't mean it's cloud companies. "
All of the people driving this work for companies that sell postgres hosting. This is why they are driving this, to make their own offering cheaper to operate, customers can't use "ALTER SYSTEM" and destroy the managed DB creating problems.
Their intent is not for Kubernetes users, this might be a side effect, but for their own companies. If your developers use ALTER SYSTEM and then deploy to production without changing prod conf, I'd say talk to the developers.
It's irrelevant on how it is meant to be, it's 100% relevant on how it is perceived.
The feature is driven by "We're power users and we know what to do and what this is" whereas the resistance is "Yes, but there are a million users, they are not power users, they will shoot themselves in the foot, and then PG will gain a bad rep for security".
Assume one guy in my team does ALTER SYSTEM on a cloud managed DB. Next month he leaves, later someone redeploys the DB, nothing works and nobody knows why. Declarative config and IaC has become standard operations to solve real world problems.
Besides that, if I wanted to operate the system internals of a DB myself I wouldn’t be using a cloud-hosted version in the first place. I pay them precisely to manage all these details so I don’t have to think about it. If I don’t want that there are a lot cheaper alternatives.
> All of the people driving this work for companies that sell postgres hosting.
Even if that's true (and I'm not sure it is, there seemed to be plenty of people in support of the feature), that doesn't mean it only benefits them. Plenty of features are championed by one group but benefit many others, so perhaps evaluating whether it's good based on who asked for it isn't a useful strategy.
> The feature is driven by "We're power users and we know what to do and what this is"
That's not how I perceived it. There are, as was noted in the article, multiple ways to do this, and even ways to enforce it currently. To me this seems very clearly a case of users, and people responsible for providing it to users, wanting a feature to make the quality of life of those users better (anyone that does an ALTER SYSTEM in the environments they are referring to is going off-track and going to cause themselves a problem) and attempting to work with upstream to provide a solution rather than using kludges (such as making the files immutable, but possibly not getting a useful error from postgresql when the command fails) or implementing local patches, so it's not necessarily exactly the same as the mainline postgresql.
> the resistance is "Yes, but there are a million users, they are not power users, they will shoot themselves in the foot, and then PG will gain a bad rep for security".
No, the resistance did not seem to be that the users will shoot themselves in the foot, from what I read. It was that they were afraid security researches would see this as a security mechanism, but since it's not necessarily hard to bypass they would get called out for poor security, affecting the reputation of the project. I'm not sure how you expect millions of users to shoot themselves in the foot by having to enable a flag that prevents them from doing this and then attempting to do it. It's not default behavior, it's specifically opt-in behavior. And if it's clearly not a security feature, but a safety feature to help people from doing things they shouldn't in environment that it will cause problems in, then hopefully security researchers won't waste their time with erroneous CVEs.
That's about as useful as saying "it's a chainsaw, if you want safety, don't use one". Sometimes powerful tools are necessary, but a bit of protection to avoid accidents when using them is not a bad thing.
This is meant to solve the problem of, "Developer tries to fix something in their local environment, gets it working, then finds it gets thrown away when they restart Kubernetes. Winds up convinced that PostgreSQL is broken."
That's a real use case. The right way to address it is an easy to place speed bump. It isn't some complex solution that achieves a random security ideal which has little to do with the use case. And would make configuration harder.
That's exactly why this is being described as a safety feature, not a security feature.
A similar thing happens in networking. A router gets updated. In a few months the router is rebooted and config is lost. Air circulators are covered in excreta because someone forgot to saveconfig.
Happens all the time in less disciplined workplaces.
"Developer tries to fix something in their local environment, gets it working, then finds it gets thrown away when they restart Kubernetes. Winds up convinced that PostgreSQL is broken."
No, it tries to fix the problem of Postgres managed hosting companies, where users can use ALTER SYSTEM but can't change *.conf - all of the people driving this, work for companies that offer managed hosting. The "Developer tries..." is just a fake argument, b/c "We, the Postgres hosting companies want this" would probably not work. So it's the "For the developer ..."
In 25 years I never used ALTER SYSTEM in Postgres for my local development, I always made changes to *.conf - if for the idiotic reason that whatever you Google, you get an answer to change some entry in *.conf not an answer to ALTER SYSTEM
Because when you're working with a containerized setup, it is standard for every developer to have the whole setup on their individual work computer.
A developer debugging problems on their own computer will naturally prefer to try to fix things themselves rather than depend on an external admin whose attention is hard to get. And you have every tool necessary to try to do so.
I can neither confirm nor deny that of my own knowledge.
I'll merely say that if it is possible, then I'd expect this detail to have been in the article. Because the existence of an acceptable solution for the use case would have been an important part of the discussion.
As evidence for my point of view, the article early on points out that Tom Lane thought that there was a way to do it. And even described how to do it. But it turned out that he was wrong.
Now I'd be shocked if either of us knew PostgreSQL nearly as well as https://en.wikipedia.org/wiki/Tom_Lane_(computer_scientist). The fact that he thought there was a way to do it makes it reasonable that you'd think the same. The fact that he couldn't come up with a way to do it without changing PostgreSQL suggests that, like he was initially, you're wrong.
But as I say, my evidence is circumstantial. And I'd be open to learning what it is in case I ever again wind up setting up PostgreSQL on an old version for a containerized system. Though in a new system I'd prefer to use the mechanism created specifically for this purpose.
That's how I see it too. Not security, but developer experience. You set the file as readonly, but provide a message to PostgreSQL superuser that this is as intended
This is discussed in LWN comments, in fact. But the problem there is that the protocol used to communicate the fact that the file is read-only to the application (errno==EACCESS or equivalent on other platforms) does not provide any useful way for the system administrator who makes the file read-only to add a notice explaining why permission is denied, in a way that it is also communicated to the app - so that it could display it to the user.
So the proper solution to this whole thing would be for the OS to provide such a facility: "permission X is denied to Y because Z". This seems like a useful facility in general, come to think of it. But it would have taken more time and effort, and would require buy-in from more parties, some of whom might be very hostile to this notion (e.g. I don't think it would be an easy thing on Linux). No wonder that this isn't an option that is even contemplated as realistic.
And so instead we got yet another easy-to-make crutch in the tower of crutches and duck tape that is modern software.
There seems to be major lack of communication in the threads as evidenced by Tom Lane's "I never objected to the idea of being able to disable ALTER SYSTEM"
I guess a technical understanding of Postgresql leads to different thoughts than your average working dev/devops/ops person.
Which, TBF characterizes a lot of tech discussions.
> will naturally prefer to try to fix things themselves
That's precisely my point. They aren't "fixing" anything. They're creating blind workarounds that simply let them continue developing.
> And you have every tool necessary to try to do so.
So containers are an entirely inappropriate mechanism to containerize things? Thus we have to patch software that runs in the container to be container aware to avoid problems /created/ by developers "fixing" things?
Very natural indeed. What benefit that brings just happens to be entirely beyond me.
Doesn't everyone working on Postgres have 'vested interests' on working on it?
Isn't this what we want - that people have their own various motivations for contributing to an open source project? A healthy process is one that allows for contributions, variously motivated, to be discussed and accepted or not.
I think the problem here is not only related to cloud environments. Is having an SQL command that writes the configuration file a good idea to start with? It's probably something dangerous, I prefer to SSH into the machine and edit the config file by hand, possibly adding comments describing why I've set a parameter to a specified value, rather than doing it with an SQL command.
I also think that is a feature that is rarely used, that probably shouldn't be on by default.
See I read that Postgres did that and thought “what a great idea!” I’ve seen so many stories of people who changed something important at runtime in their DB/router/managed switches/whatever and then forgot about it until after a restart “broke” things 2 years later and the change is lost forever.
Maybe contentious but I'd argue the ability to rewrite config files at runtime with SQL is the thing that makes more sense being an extension although it's existed for so long as a core feature, it's non trivial to change.
> This had to be in on short notice (trys to move this to another release where squashed) b/c vested interests.
From the article:
> Even then, the discussion was not quite done; Momjian questioned merging this change so late in the PostgreSQL development cycle. ""My point is that we are designing the user API in the last weeks of the commitfest, which usually ends badly for us"". Fennema-Nio pointed out that the API was essentially unchanged from its initial, September form, and that months had been spent discussing alternatives. Haas said that such a small patch would not improve by being held up for another release cycle: ""I think it has to be right to get this done while we're all thinking about it and the issue is fresh in everybody's mind.""
I wanted to clarify that I (Jelte, the author of the final patch) work at Microsoft. So, it's definitely not just one company seeing benefit in this feature. Arguably Microsoft is a cloud company too though, but honestly most of the active contributors to PostgreSQL work at a "cloud company".
> I think if the the config is unchangeable, all ways to change the config (e.g. don't reload config on demon restart etc.) should be considered.
What? The goal isn't to make the configuration unchangeable, it's to ensure that all configuration changes go through the configuration management system. You need to have a reload option that reloads the configuration from disk.
This is not something that's just useful for cloud companies btw, it's useful in any scenario where you're doing configuration/change management.
I wish they had a more declarative way for provisioning (and de provisioning) users and databases. I'd like to define these at startup and have them reconciled automatically without having to do a deployment+psql dance to manage them.
My initial reaction was in agreement, but you’re almost certainly going to need a mechanism to manage schema changes. It makes sense to use the same system for both
Schema changes happen inside a database on the server but usually you need the DB and user provisioned first (generally using a superuser and the existing/default DB)
You can use the same tool, however it needs to connect to 1 DB, do the DB creation and role setup, then reconnect to a new DB.
If you have a single app or DB in a server, it can make sense to usually schema management but it's more complicated if multiple apps have their own DBs sharing the same server.
The last few places I worked had scripts that ran before schema management as a superuser then used the application framework schema management (migrations) thereafter.
Yeah I have the same awkward process for provisioning databases. Another difficulty is configuration of passwords/auth which varies by server typically. Generally speaking I don’t want my migration process to have the ability to assume the super user role.
The variance in the script prevents using the typical migration frameworks built into the app, and it’s the wrong responsibility - the app migrations have a contract with whatever database is being connected to about which users and schemas exist.
I’m still not sure a declarative mechanism is desirable though because the database “setup” is still subject to change after initial provisioning: new schemas/roles/etc will be required and we need an imperative API to execute those changes transactionally with precision.
that's kind of an odd thing to say. all of that state is really held in Postgres tables. all of the various tools just manipulate those. you could quite trivially write a tool that would take a big yaml or something and splat that.
This would be a welcome change for folks running PG in k8s or with any external automated configuration management. Once you want to deploy Postgres databases via API to semi-trusted users you want to have configuration control outside of the database itself. We disabled this at bit.io via patches - would have been nice if upstream had this.
> I especially don't believe in kluges to the effect of 'superuser doesn't have all permissions anymore'
This comes from a postgres developer?
Postgres is the only software I know that has a hardcode that forbids running as root:
"root" execution of the PostgreSQL server is not permitted.
The server must be started under an unprivileged user ID to prevent possible system security compromise.
There is no option to disable that.
I think that's bad.
You: "But I want to run postgres isolated in a single user VM or container or unshare where only root exists, which has much stronger security guarantees that UNIX user separation!"
Postgres: "Forget about it, those things don't exist, this is the 70s."
I haven't looked this up in a few years, but is it still bad practice to run things as root in docker? There was always a fear in the community that running things as root would make the path easier if a hypothetical jailbreak were to occur. It always felt like "Buy our kevlar vest, but maybe pack some chainmail underneath. Just incase."
> It always felt like "Buy our kevlar vest, but maybe pack some chainmail underneath. Just incase.
My understanding of Docker has always been that you're not supposed to rely on it for security—it's less a kevlar vest and more a build-your-own-platemail tinker toy set. If you know what you're doing you can create a quite secure environment with it, but if you don't you're not that much better off than running stuff directly on an EC2 instance.
Some practices that help secure your image are running a stripped-down image (fewer binaries to call out to if an attacker achieves RCE) and being judicious in volumes and network settings. But yeah, I'd definitely also prefer to see the container running rootless.
It's all about reducing what an attacker can do if they get to RCE, and root still plays a role in that.
I wonder why they didn’t go with the file system permissions on the .auto.conf file, and gracefully handling that as a way to disable the feature. Seems like an obvious solution that doesn’t lead to surprising semantics. The container use case would work fine.
But in this scenario, wouldn't you want to break those tools precisely because they are going around the centralized config from which .conf is supposed to be generated?
But that file gets read at startup. So making changes there is a valid way of making changes.
What you really want is to prevent postgres from writing to that file.
That’s more complicated than just making it write only for everyone. Adding an option to stop postgres from doing what you don’t want it to do makes sense to me.
pg applying the change and failing to persist it (in case of a write failure) is the acceptable behaviour probably? Write failures leading to alter failures would count as a surprise.
But after disabling alter system, you can use permissions to avoid local overrides, so this is probably worth adding in the docs.
Makes sense to me. It seems a lot of features in Postgres optimize for managing a single instance but every setup I've worked on in the last 8 years has been HA. In that case you'd apply config changes to the replica and fail over versus restart the daemon.
Checkpointing/crash recovery/shared memory also seem to be optimized for daemon restarts.
I'm very glad to see this option added. If I could add one thing to it, it would be to display a custom message when ALTER SYSTEM is disabled. You know, one that would let the person setting it up tell the person trying the command where in the repo the configuration file that should be edited can be found instead.
Containerization was bad enough when it was just the result of not wanting to learn how to run services on Linux, but now we're actively modifying system software to make containerization "easier" (not really easier, just more in line with its ethos)? Let's hope this kind of thing doesn't become a trend.
Not to jump too far down this hole here, but the only difference between an Ansible playbook, a Dockerfile, and postgresql.conf is format (indeed you're probably pretty likely to somehow jam a postgresql.conf file into your configuration-as-code tool, diminishing the distinction even further). The whole impetus behind configuration-as-code and containerization was to avoid having to bother at all with the host OS; my argument here is that we would've been a lot better off just figuring out how to configure--for example--postgresql.conf on Alpine Linux rather than inventing Docker and Kubernetes.
But! I will admit Kubernetes is handy for stuff like specifying scaling properties... but I mostly feel like conflating the two things ("configuring services" and "configuring cloud providers") was an oops.
Virtualizing hardware and running extra copies of the kernel has a small overhead.
I'm not sure it matters for the majority of companies but it has a measurable impact for the gigantic ones.
Afaik containization was rooted in the ability to run multi tenant servers without semi trusted apps overstepping their bounds. See Google Borg and Facebook Tupperware (iirc they started as chroot automation and expanded)
Some stuff like Java have ran in a runtime level container long before OS containers so it seems a bit redundant there, but still facilities having conflicting versions of the runtime installed.
I don’t understand the purpose of this. If what’s being said is true, alter system being disabled wouldn’t prevent the super user from doing all the same things as alter system but in a less convenient way. It will just give people a false sense of security, because they can abuse COPY to overwrite the config.
It is a safety feature that can help avoid people getting confused when working in a containerized environment. It is not a security feature at all. Nor is it intended to be. It is meant so that people like me can set up a configuration that will work better for coworkers who know less about databases than I do.
Then surely the answer should be that there should be a config setting that disables it, with the option to do something like "WITH FORCE" which would force enable it in the environment even with that config setting enabled. This gives the best of both worlds, in that it stops people from copying blindly from Stack Overflow, but doesn't give the sense of security that it initially seems to imply.
Yes, lots of people had ideas about this should work. Exactly like you just did. Unfortunately they had different ideas. And that's why there was months of debate over how this feature should work before something was accepted.
Just to give one example of an obvious problem with your idea, saying "WITH FORCE" will suggest to some that this configuration setting is somehow forcibly maintained. This is likely to generate bug reports based on that misunderstanding. "I said WITH FORCE and it is easy to get around!" And, worse yet, questions about, "Why do we have a security feature that doesn't provide any security?"
Then it'd be very purposeful, the idea of this kind of patch would be that someone Googling random older PG answers to fix a random issue wouldn't be tempted to just issue an ALTER command, think the issue is fixed and then find out after a restart/rebuild that the fix is not there any more.
This not a security patch but more of a safeguard against hapless idiots manhandling themselves without resistance in an unsupported case.
Anyone who has had to work in security knows about the endless deluge of low effort script kiddie “security reports” of non issues they have to deal with by security “researchers” fishing for a bounty. And you have to read all of them, because there’s no telling when one might be legitimate. Or you bring in some third party auditor for compliance or some other reason and they are just as clueless and snap up something like this to add to their list of “recommendations” that you have to respond to so they can feel like they are more justified collecting the paycheck.
So I think implementing the feature - but being extremely loud about what it is and isn’t - is the right move here.