Why I don't like environment variables: 1. I worry about programs dumping all th...

OJFord · on April 1, 2021

I used to agree with (2), but now I think: meh, it's an implementation detail whether the program uses my environment variable 'directly' or via a child process; that distinction isn't meaningful.

What is meaningful (and is supported today) is setting them just for specific programs/invocations, rather than exporting them willy-nilly for a long-running interactive shell (and everything within it).

More innovation around making that easier would be interesting: for example, env vars specified per program, isolated from other programs. So `foobar` would actually get executed like `FOO_SECRET=hunter2 foobar`, without specifying it every time or having it exported in the shell, and in a generic way not specific to each program's config.
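A minimal sketch of that idea in plain shell, assuming a hypothetical helper named `with_env_file` and a hypothetical per-program file of `KEY=value` lines (neither comes from an existing tool). The file is sourced as shell code, so it has to be trusted, and it should be given as a path containing a slash:

```shell
# Hypothetical helper, not part of any existing tool.
# Usage: with_env_file FILE COMMAND [ARGS...]
# The body is a subshell, so nothing set here leaks into the calling shell.
with_env_file() (
    env_file=$1
    shift
    set -a              # mark every variable assigned from here on for export
    . "$env_file"       # trusted KEY=value lines, e.g. FOO_SECRET=hunter2
    set +a
    exec "$@"           # replace the subshell with the target program
)
```

A per-program wrapper such as `foobar() { with_env_file "$HOME/.config/env.d/foobar" foobar "$@"; }` (the path is made up) would then give `foobar` its variables on every call without exporting them in the interactive shell. Because `exec` only runs external programs, the wrapper function does not call itself.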

It's not really related but for some reason 'summon' is on my mind as a tool to mention. I haven't used it in anger yet, but it is interesting. It's not quite this though, or at least, it solves only the 'storage' part of the implementation of what I described, not the 'orchestration' or mapping of programs to vars/summon invocations.

jschwartzi · on April 1, 2021

This is pretty much how systemd works. You can specify secrets that are retrieved from somewhere else and provided to the process in the environment it is started with. So you could do exactly this with the right unit configurations.

OJFord · on April 1, 2021

Ha, funnily enough I mentioned systemd and then deleted it. I do run as much like that as possible, I just couldn't succinctly explain why I thought it was different or better than putting:

    VAR=whatever process

in .xinitrc or wherever.

Lvl999Noob · on April 1, 2021

IIUC, you are advocating for setting env vars at the call site, like `FOO_SECRET=hunter2 foobar`? In that case, why not just use command-line args and call it like `foobar --secret=hunter2`?

OJFord · on April 1, 2021

I wasn't advocating for it in preference to args, but there are circumstances where that's not possible, for example calling some CI/CD tool (say terraform, ansible, fabric, whatever) that doesn't consume the var itself but uses something that does.

It's also a more convenient/already generic interface for doing something consistent across multiple programs.

PeterisP · on April 1, 2021

Because in the latter case the commandline of the executed process (which may be exposed in various places, including a simple process list) is `foobar --secret=hunter2` and in the former case it's just `foobar`.

npsimons · on April 1, 2021

My first thought on the headline is a more specialized version of the concerns above: environment variables are an attack surface. If you use them for configuration, it's all too easy for an attacker to modify them without the victim knowing. Just look at issues with LD_PRELOAD: https://attack.mitre.org/techniques/T1574/006/

That said, I agree with GP that environment variables are super useful and super simple. But I've also been burned more than a couple of times by setting something in the past and then having it cause unexpected bugs that are hard to track down because they aren't in my working memory. They're a double-edged sword, to be sure.

jschwartzi · on April 1, 2021

There's actually a long list of variables that are unset when invoking sudo to prevent these kinds of attacks. Systemd will also start programs with a very minimal environment that isn't inherited from any shells. You then have to specify environment variables explicitly as part of the unit file. You can also specify environment variables in environment files.

HelloNurse · on April 2, 2021

1. If your program is chatty, it can be chatty in the same way regardless of where the improperly logged secrets come from; it's still your fault for being coarse and lazy. There's little difference between logging all environment variables (and/or all command line parameters) and logging the whole configuration object.

2. If your child processes shouldn't inherit environment variables, set them properly. The "ghosts of Unix past" have "foreseen the need" for execve(2) and execveat(2), which don't pass anything by "default".
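From a shell, `env -i` gives a comparable effect: the child starts with an empty environment plus only the variables listed explicitly. A small sketch, with `DEMO_SECRET` and `APP_MODE` as made-up names:

```shell
# The parent shell has a secret exported (placeholder value).
export DEMO_SECRET=hunter2

# env -i starts the child with an empty environment plus only what is listed.
# PATH is passed through so that `sh` can still be found.
child_view=$(env -i PATH="$PATH" APP_MODE=prod \
    sh -c 'printf "%s %s" "${DEMO_SECRET-unset}" "${APP_MODE-unset}"')
echo "$child_view"    # prints: unset prod
```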

rascul · on April 1, 2021

> I wish the ghosts of unix past had foreseen the need for a way to mark particular variables as, say, 'sensitive' and 'noexport', allowing them to opt out of the default behaviour.

The default behavior is a non-exported variable. If you want child processes to see it, you must export it.

yrro · on April 2, 2021

There is no such thing as an exported or non-exported environment variable. In fact, as far as the kernel is concerned, there is no such thing as an environment _variable_ at all, just a block of data.

See execve(2):

"envp is an array of pointers to strings, conventionally of the form key=value, which are passed as the environment of the new program. The envp array must be terminated by a NULL pointer."

You can confirm this by examining the environment block that was passed in to your current shell with:

    < /proc/$$/environ tr '\0' '\n'

What you're referring to are actually "shell parameters", some of which may be marked for export. When the shell starts up, it parses the environment block and sets parameters based on what it finds, marking them all for export. And the shell uses only the parameters marked for export when constructing the environment block for a child process (which is passed to execve(2)/execveat(2) in the envp argument).
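The distinction between a plain shell parameter and one marked for export can be seen with a short experiment. `DEMO_LOCAL` and `DEMO_SHARED` are made-up names:

```shell
# Start clean so an earlier export attribute does not skew the demo.
unset DEMO_LOCAL DEMO_SHARED

DEMO_LOCAL=one            # shell parameter, not marked for export
export DEMO_SHARED=two    # marked for export, so it is placed in the child's envp

# The child only sees what the shell put into its environment block.
child_view=$(sh -c 'printf "%s %s" "${DEMO_LOCAL-unset}" "${DEMO_SHARED-unset}"')
echo "$child_view"        # prints: unset two
```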

rascul · on April 2, 2021

I guess I assumed you were referring to environment variables in the context of the shell. Apparently I was incorrect.

intrepidhero · on April 1, 2021

Isn't storing credentials in environment variables bad practice to begin with?

emanlin · on April 1, 2021

What better place is there to store credentials?

falcolas · on April 1, 2021

A config store like Vault. Of course, that needs credentials too, which are typically a file on the file system.

IMO, people are overly sensitive about environment vars. They are really no worse than files on the file system - both can be accessed if you're a privileged user on that machine.

KptMarchewa · on April 1, 2021

Vault should be the source of those env variables, via some predefined init container or something like that, to which devs don't have access.

danielhlockard · on April 1, 2021

Or you could, you know, auth to vault and pull the creds from vault inside of your app?

tylerchr · on April 1, 2021

You could, but then you’ll have replaced a universal and standardized abstraction with a hard commitment to one very specific approach. That doesn’t come cheap.

falcolas · on April 1, 2021

One thing that I like, which this approach allows for, is live configuration. For things like databases and such which allow for the regular rolling of credentials.

It's not simple by itself, but it simplifies other things.

emanlin · on April 2, 2021

How do you auth to vault?

LanternLight83 · on April 1, 2021

Via either program config files, an inline subshell calling cat, or ssh-agent in that specific case, to keep credentials both out of the environment and off the command line, where they can be read by inspecting the resulting process's invocation.

emanlin · on April 2, 2021

All of those places can also be read.

SSH agent is a good example. It’s effectively an environment var which is why this works fine:

    sudo SSH_AUTH_SOCK=$SSH_AUTH_SOCK git clone ...

Edit:

The reason I think it’s silly to make a blanket statement that environment vars are bad is that too many containers have credentials baked into the image when they should be passed in another way.

emj · on April 1, 2021

You can disable the ability to read other processes' memory, and you can do the same for their environment variables. It's true, though, that storing access tokens in memory is more obscure than storing them in environment variables.

whatshisface · on April 1, 2021

Store a path to the top secret file in an environment variable, and have the program read the credentials out of the file. Put the file somewhere far away from the repo, on the deployed filesystem.
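A rough sketch of that pattern, assuming a hypothetical variable name `FOO_SECRET_FILE` that holds only the path. The throwaway file stands in for one that a real deployment would create out of band:

```shell
# FOO_SECRET_FILE is a hypothetical name; it holds a path, not the secret itself.
read_secret() {
    if [ -z "${FOO_SECRET_FILE:-}" ] || [ ! -r "$FOO_SECRET_FILE" ]; then
        echo "FOO_SECRET_FILE is unset or not readable" >&2
        return 1
    fi
    cat "$FOO_SECRET_FILE"
}

# Demo setup with a temporary file readable only by the current user.
FOO_SECRET_FILE=$(mktemp)
chmod 600 "$FOO_SECRET_FILE"
printf 'hunter2\n' > "$FOO_SECRET_FILE"

secret=$(read_secret)     # held in a plain shell variable, never exported
rm -f "$FOO_SECRET_FILE"  # clean up the demo file
```

An environment dump in this setup would reveal only the path, and whoever reads the dump still needs file permissions to get at the credential.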

yrro · on April 1, 2021

Yes, for those reasons

darksaints · on April 1, 2021

Compared to what?

jrwr · on April 1, 2021

depends on who you are talking to, since you run the risk of committing the creds to a git repo

bopbeepboop · on April 1, 2021

How do command line arguments or config files solve either problem?

yrro · on April 1, 2021

Command line arguments aren't inherited by child processes. Unfortunately they are visible to other users on the system, so they're no good for credentials.

Config files (or an abstraction of them such as reading config data from a socket), after parsing, result in some credentials sitting in the memory of the process that needs them. They aren't in the environment block, so a quick and dirty "dump all my environment variables to stdout" procedure won't risk exposing them. And for the same reason, a child process that does the same won't inherit them in order to expose them.

Note that I'm talking about preventing accidental exposure of credentials. Config files alone can't protect credentials from a malicious process that deliberately goes looking for them to leak them; for that, additional measures have to be taken... but environment variables aren't part of the solution!

hajile · on April 1, 2021

It's also easier to encrypt a config file and provide the decrypt key externally.

PeterisP · on April 1, 2021

So then the real secret is the key, which is provided "externally" ... how? Through command line parameters, some other config file, or environment variables? :P

IMTDb · on April 1, 2021

Another encrypted config file. Obviously.

snicker7 · on April 1, 2021

Encrypted files all the way down.

Someone · on April 1, 2021

A file that’s protected appropriately or standard input.
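The standard-input option could look roughly like this. `key_length_from_stdin` is a made-up consumer that reports only the length of what it read, so the key itself is never echoed:

```shell
# Hypothetical consumer: reads one line from standard input.
key_length_from_stdin() {
    # Also accept input that lacks a trailing newline.
    IFS= read -r key || [ -n "$key" ]
    printf '%s\n' "${#key}"
}

# printf is a builtin in common shells such as bash and dash, so the key is
# not passed as an argument to a separate process here.
key_len=$(printf '%s\n' 'hunter2' | key_length_from_stdin)
echo "$key_len"    # prints: 7
```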

bopbeepboop · on April 1, 2021

> Unfortunately they are visible to other users on the system, so they're no good for credentials.

If you’re concerned about your own software logging credentials, command line arguments are negative in two regards:

They’re highly visible when the process is running; they’re often automatically logged.

> They aren't in the environment block, so a quick and dirty "dump all my environment variables to stdout" procedure won't risk exposing them.

Okay — but the usual way that happens is “dump my config object in a log”, which parsed configs don’t help with.

You also now have a config file: how is it stored? ...is it in the repo? ...what are the permissions? ...how do we deploy it?

Environment variables don’t persist in repos and are designed to be integrated with hosting tools, like secrets managers.

I’m not seeing how a config file beats Kubernetes injecting from the secret store, which is why we use environment variables: so our tools (secret stores) can configure the environment our software uses.

yrro · on April 1, 2021

Good point about command line arguments being often automatically logged! So they're bad for both reasons :)

Now, if you're running in k8s then you can improve your setup by mounting your secret into your container, and have your code read the credentials from the file within the mount. This just looks like another kind of config file to me :)

PeterisP · on April 1, 2021

Embedding secrets (which should be changeable and with limited access) into container images (which should be reproducible and perhaps stored in accessible locations) doesn't sound like a good idea; IMHO you definitely need the capability to have the same container use different credentials so that, for example, you can run the same container in a development or testing environment as in production, but with different credentials.

yrro · on April 1, 2021

I'm talking about doing this: https://kubernetes.io/docs/concepts/configuration/secret/#us...

bopbeepboop · on April 1, 2021

I believe they meant mounting via kubectl as a file, from the secret manager.

So runtime file injection.

bopbeepboop · on April 1, 2021

> And for the same reason, a child process that does the same won't inherit them in order to expose them.

Wait, this is the crux of why you think it’s more secure — but actually I see the reverse problem:

Dropping environment variables is standard security practice, but dropping file access permissions is not. Most child processes read from the same set of files as the parents.

How would having files rather than ENVs make my container more secure, where we’re concerned with developers making mistakes (passing ENV vs passing file permissions)?

Similarly, the only proposed benefit of your idea is we don’t have them around post reading — but that’s true if you initialize a config object from ENV and then pass it around as well. (You ignored my point about how mistakes via logging happen.)

yrro · on April 1, 2021

I'm concerned with 'I have credentials in my environment block and just dumped the whole thing to a log file'. Avoiding storing the credentials in the environment block obviously avoids that.

To be fair, unsetting sensitive environment variables after consuming them would probably also avoid that eventuality. I can count on the fingers of no hands the number of times I've seen developers do that! :)
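For a shell script, consuming and then unsetting a variable might look like the sketch below. `FOO_SECRET` stands in for whatever the launcher provides:

```shell
unset secret                 # make sure the holder carries no export attribute
export FOO_SECRET=hunter2    # stands in for a variable inherited from the launcher

secret=$FOO_SECRET           # copy into a plain, non-exported shell variable
unset FOO_SECRET             # later children and `env` dumps no longer get it

child_view=$(sh -c 'printf "%s" "${FOO_SECRET-unset}"')
echo "$child_view"           # prints: unset
```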

Some other part of my process (or a child process I might launch) deliberately hunting for credentials in order to leak them is a different problem with other solutions.

In between these two cases we have mistakes like "dump config object (containing credentials) to a log file". That, too, can happen and should be avoided, what more can I say?

marcosdumay · on April 1, 2021

> You also now have a config file: how is it stored?

Hum... Your environment variables must be stored at some place too, so the server can be launched. You can store the files at the exact same place.

bopbeepboop · on April 1, 2021

Sure — you throw them in the Kube secret manager.

But now you have multiple config files (smart) or your entire config outside the repo (not smart). This isn’t always the wrong approach — SSH keys get loaded this way, for instance.

ENV variables naturally provide a way to layer content from different sources in a way that files don’t, so if you have a relatively simple config from multiple providers (eg, getting AWS session token from the host plus your environment config from the launch ENVs) it’s easier to use the K-V store nature of ENV variables versus multiple files.
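The layering described here can be sketched in a few lines. All names and values are hypothetical placeholders, with the exports playing the role of the host and the per-command assignment playing the role of the launch configuration:

```shell
# The body is a subshell, so the exports do not leak into the calling shell.
show_layers() (
    # Layer 1: what the host would provide, plus a default.
    export DEMO_SESSION_TOKEN=token-from-host
    export DEMO_APP_ENV=development
    # Layer 2: a launch-time setting overrides the inherited default
    # for this one command only.
    DEMO_APP_ENV=staging sh -c 'printf "%s %s\n" "$DEMO_SESSION_TOKEN" "$DEMO_APP_ENV"'
)

show_layers    # prints: token-from-host staging
```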

Again, using multiple config files isn’t always wrong, but using them to store single strings instead of ENV variables is a code smell, for sure.