Show HN: Dotenv, if it is a Unix utility (github.com/gyf304)
225 points by gyf304 8 months ago | 100 comments
I like the idea of using dotenv files, but I dislike having to use different language-specific libraries to read them.

To solve this, I created a small utility that lets you prefix any command with "dotenv" to load the ".env" file.

This is how I imagine dotenv would work if it had started as a UNIX utility rather than a Node.js library.




I think direnv already does a good job in this space, and it's already available in your package manager.

https://direnv.net/


I don't think direnv and dotenv are really the same — dotenv manages environment variables for a program, whereas direnv manages environment variables for an interactive shell.

As an example of the difference, dotenv is useful for running programs inside Docker containers — which do not inherit your interactive shell's environment variables — whereas direnv isn't particularly useful there. Ditto for programs run via init systems like systemd or even classic SysV init. On the other hand, direnv is convenient for end-user env var config, since it's aware of your shell's working directory and updates the env vars based on it without needing to run extra commands.


In my experience, the subcommand `direnv exec ...` works just fine in non-interactive scenarios like launchd jobs. I'm not sure if it even involves a shell of any kind in that mode.


I've used direnv, but I think a nice property of OP's dotenv is that it's explicit: if I want to pass env vars, I run my program under it. If I don't, then I don't. There's no "hidden behavior" for me to forget about and then get surprised by.


As far as I'm aware, direnv's behavior is not hidden at all. Whenever you cd into the directory, you get a message listing all the new env vars that were activated. And when you change the .envrc, you get another message saying that direnv has been deactivated. It has never happened to me that I thought "oh shoot!! I forgot this env var was activated because I'm in this dir".


I don't think "hidden" and "explicit" are true antonyms.

From your description, direnv is implicit and noisy, whereas dotenv seems to be (unless you embed it in a script) explicit and quiet.


Well basically, you cd in the shell, and then it shows: "direnv: loading ~/dev/foo/.envrc direnv: export +DATABASE_URL". Is this "implicit" to you? Because to me it's pretty explicit. But yeah, it's automatic; if you don't want this behavior, you don't install direnv. Just to be clear, implicit means "suggested but not communicated directly", and to me, this is communicated directly, so I don't see why it would be implicit...


What happens when, 30 commands later, you execute a command in your shell and didn't remember that message from a day or so ago?


If you use direnv you always know your environment is loaded (assuming you've formed the script properly and allowed direnv to run it). And that's what you want. It's a way to keep your workspaces distinct in the terminal environment. You set what you need to set to be able to do what you need to do from any particular directory.


If you're in the directory, don't you want the env to be loaded? For me, being able to forget it is one of the features.


Not always. I can think of a scenario where I have an env I need to load for some infrastructure stuff, and I may call some commands in a directory that has another .env that's part of what I'm trying to deploy. Generally this scenario is short lived as I'm quickly moving infra commands to automated ci/cd, but I've definitely been in this scenario more than once.


Got it! I think nested envs become confusing really quickly, so there it is indeed better to do it explicitly.

My assumption overall in this is that most people have just one .env per project (or perhaps in sub-folders per environment, e.g. prod, staging, local), but these don't nest. With nested .env files, the mental overhead they bring removes (IMO) most of the benefits, if not more.


It’s the same as forgetting you put something in your .profile or .bashrc, no? In any case, both forgetting the env config and using the same shell for days in a row seem like two things that probably don’t coincide too often.


I never close my shell, I never reboot my laptop unless necessary - an uptime of 6+ months is normal. So my experience may be different.


How do you deal with kernel security patches?


Oh yeah, that's the default. Everyone I know immediately disabled that, and I even forgot about it till now.


I know ohmyzsh and zsh have this covered for auto-loading the .env when you enter the directory.


…Just don’t commit your .envrc to a repository & add it to the .(git|hg)*ignore. Provide an example if you want, but don’t expect everyone to want to use your exact settings. This is for your personal environment.


I’m a happy user of direnv. Hard to imagine my life without it. The only problem is remembering to add it to .gitignore.


Please just see this

`env -S "$(cat .env)" <cmd>`

Believe it or not that’s all you need.

> -S, --split-string=S process and split S into separate arguments; used to pass multiple arguments on shebang lines

edit: forgot the quotes around shell substitution


Also, if we are going to involve the shell, we could also just make .env a shell fragment and do this:

   sh -c '. .env; <cmd>'
There is a way to pass commands to it which are reliably executed, like this:

   sh -c '. .env; "$@"' -- command arg1 arg2 arg3
The non-option arguments passed to the shell are available as `"$@"`. A command consisting of nothing but `"$@"` basically executes the arguments. We can use `exec`, speaking of which:

   sh -c '. .env; exec "$@"' -- command arg1 arg2 arg3
What I'm getting at is that this form is fairly easily exec-able:

   execl("/bin/sh", "/bin/sh", "-c", ". .env; exec \"$@\"", "--", "command",
         "arg1", "arg2", "arg3", (char *) 0);
The command and arguments can be arbitrary strings, not subject to any shell mangling.


I wouldn't follow this approach because if you run `. .env;` your .env gets evaluated as a bash script, not as a configuration file. This means that you can get runtime errors in the .env file, and nobody wants that.


Sourced environment scripts in the Unix environment are standard operating procedure. E.g. for toolchains.

The .env being evaluated as a shell script means that it's in a widely used language, with a widely known syntax. You can look at it and know what it's going to do.

The .env being a data format to some uncommon utility; that's anyone's guess.

For instance, suppose we want a newline character in an environment variable. Does the given "env file" format support that? How?

There is one de-facto standard format: the /proc/<pid>/environ kernel output format on Linux. The variables are null-terminated strings, so it is effectively a binary format. It represents any variable value without requiring quoting mechanisms.
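
A rough sketch of how such a NUL-terminated file could be consumed from a shell, for illustration only. The function name `run_with_environ` is made up here, and the sketch assumes an `xargs` that supports `-0` (GNU, BSD and BusyBox do; POSIX does not require it). The entries and the command words are sent down one NUL-separated stream, so `xargs` ends up running `env KEY=VALUE ... command args`:

```shell
# run_with_environ FILE COMMAND [ARG...]
# FILE holds NUL-terminated KEY=VALUE entries, like /proc/<pid>/environ.
# (Hypothetical helper; assumes xargs understands -0.)
run_with_environ() {
    _environ_file=$1
    shift
    {
        cat "$_environ_file"      # KEY=VALUE\0KEY=VALUE\0...
        printf '%s\0' "$@"        # command and its arguments, also NUL-terminated
    } | xargs -0 env
}
```

Because no quoting layer is involved, a value may contain newlines or quote characters. Two caveats with this particular sketch: the command does not inherit the caller's standard input, and a very large environment could make `xargs` split the work across more than one invocation.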


This is now syntax that requires processing by the shell.

The nice thing about utilities like env and dotenv is that they can be easily exec-ed:

  execl("/usr/bin/dotenv", "/usr/bin/dotenv", "command", "arg", (char *) 0);
-S is a fairly recently added option to the GNU Coreutils env (possibly inspired by BSD?). I have a window to an Ubuntu 18 VM where it's not available.

You want $(cat .env) quoted, as in "$(cat .env)" so that the content of the file is reliably passed as one argument.

-S will split on whitespace; but it respects quoting, so spaces can be protected. Basically .env has to be prepared with the features of -S in mind. Of which that thing has quite a few: escape sequences like \n, commenting, environment variable substitution.


This will fail with comments. Of course you can script around that as well (I have done so), but it's not bulletproof. It makes sense to have a dedicated tool for the job.


Isn’t the problem with dotenv that it’s not a formal specification? The closest to a specification is the “reference” nodejs implementation. Even across languages that aren’t shell the behaviors differ to some extent. I think also it’s not just comments, there are probably some other edge cases that can’t be parsed as legitimate shell code either.


ksh/bash can abbreviate $(cat x) to $(<x) although this syntax is not in POSIX (and it should be).


What is the difference with or without "-S"? Does 'env $(cat .env) <cmd>' still work?


edit 2: it seems that there are expectations around a complex, unspecified .env file format that I was totally unaware of. I was merely trying to share the simplest way I've ever found to store and reuse env vars.


Compare with dotenvx - https://github.com/dotenvx/dotenvx This is my current tool of choice.


I just remembered. Adding a -f <file> option to the GNU Coreutils env utility has previously been discussed:

https://lists.gnu.org/archive/html/coreutils/2021-10/msg0000...

It came up on the mailing list again this March. I saw the posting in my inbox and proposed that a null-terminated format be handled, which is exactly like /proc/<pid>/environ:

https://lists.gnu.org/archive/html/coreutils/2024-03/msg0014...

If that feature were available, this becomes

  env -f .env command arg ...


There is also shdotenv that allows you to load different .env file formats and convert between them, e.g. for UNIX shell.

https://github.com/ko1nksm/shdotenv


This is a very neat project that seems to accomplish the same goal and have some extra features.


I recently discovered shdotenv and I like it a lot!


dotenv started as a ruby library actually. The first implementation inspired the others such as the golang version of the library.


    export $(cat .env | xargs)
Agree with the premise, but this can be achieved with actual Unix concepts; no need for anything else.

The language runtime dotenv projects are banned in my engineering org.


Your example has the downside of making the environment variables sticky, however, so it's not achieving the same thing.


What about:

    source <(cat .env | xargs)
or:

    export $(cat .env | xargs)
And then:

    unset $(cat .env | cut -d= -f1)
?

The last one unsets the environment variables that were set by the first command, ensuring they are not persisted beyond the current shell session.
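
For what it's worth, a sketch of that load/unload pairing as two shell functions. The names `env_load` and `env_unload` are invented for this example, and it assumes the file consists of plain `KEY=VALUE` lines that are also valid shell. It uses `set -a` with `.` rather than `export $(... | xargs)` so that quoted values containing spaces survive:

```shell
# env_load FILE: source FILE with auto-export on, so every assignment is exported.
# Pass a path containing a slash (e.g. ./.env) so `.` does not search $PATH.
# Note: this turns allexport off afterwards even if it was on before.
env_load() {
    set -a
    . "$1"
    set +a
}

# env_unload FILE: unset every name that FILE assigns at the start of a line.
env_unload() {
    for _key in $(sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1"); do
        unset "$_key"
    done
}
```

Typical use would be `env_load ./.env`, run some commands, then `env_unload ./.env`. Note that unloading removes the variables outright; it does not restore whatever values they had before loading.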

If you are worried about forgetting to execute it, there are a couple of ways to work around it, depending on your case.


That is kinda the purpose of an environment.


I jump around between multiple projects every day. Sticky environment variables carry risk.


I would suggest using a tool like tmux to partition those projects entirely. Instead of tearing down env and building it back up to switch projects, just re-attach to that tmux session. I treat this stuff as though it’s immutable and try to consciously avoid cross pollination.


That’s reasonable, but my point stands: your original proposal is insufficient to be treated as equivalent.


    env $(cat .env) my-cmd-wanting-dotenv
would, though, wouldn't it?

ETA: the main difference between `env` and `dotenv` seems to be that `env` gets its arguments from the command line, whereas `dotenv` gets its arguments from a file. I think that's a fair difference, but I might also think that perhaps `env` should expand its offering to include some kind of `-f filename` option so that it can focus on the notion of "a configurable sub-environment for a command" and we can avoid subtle distinctions.


Further addition, I haven't investigated dotenv deeply, but I suppose it would be a command that specialises in making sure the contents of .env are just environmental variables that get defined. The `env` command as I wrote it is probably not the sort of thing you want to just trust on a file in a git repo shared with colleagues. Anyway, like my ETA above suggests I'm in two minds about whether env and dotenv should be the same thing with different arguments or not.


Several people, including you, are proposing using env rather than sourcing; is that somehow preferable to something like this?

    (. .env; my-cmd)


See my comment sibling to yours for some concerns with `env $(cat file)`; I would have these and then some with sourcing the file even in a subshell. You can do whatever you want in a shell script which can have effects outside of the subshell.

Another advantage of env is that you can type `man env` and learn something useful; sourcing and subshells via syntax is a little bit harder.

Finally, I think the major point of this branch of the discussion is to explicitly decorate a command with a special environment. Starting up a subshell isn't the same thing. It might have the same effect, but you can see that you're creating a subshell, running a builtin in the subshell, and then running a command in the subshell. It is something of a difference between declarative (dotenv/env) and imperative (sourcing in a subshell) approaches, and inherits all the pros and cons of the imperative approach.

If it works for you, I make no recommendation against it.


Not really, if we are just talking about the "run environment" of a single binary.


What about:

    env $(cat .env) [program]


Has whitespace handling issues... but valiant effort!


When your environment variable values have spaces (e.g. some connection strings) this doesn’t work iirc
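
To make the whitespace problem concrete, here is a small sketch. With an unquoted `$(cat .env)` the shell splits on every space, so a line like `DSN=host=db user=app` reaches `env` as two separate assignments. Reading the file line by line and passing each line as a single argument avoids that. `env_lines` is a name invented for this example, and it assumes one `KEY=VALUE` per line with no quoting syntax (any quote characters are kept literally):

```shell
# env_lines FILE COMMAND [ARG...]
# Runs COMMAND under env, with each non-comment line of FILE passed as one
# intact KEY=VALUE argument. (Hypothetical helper, not part of any tool.)
env_lines() {
    _env_file=$1
    shift
    while IFS= read -r _line || [ -n "$_line" ]; do
        case $_line in
            ''|'#'*) ;;                  # skip blank lines and comments
            *=*) set -- "$_line" "$@" ;; # prepend the whole line as one argument
        esac
    done < "$_env_file"
    env "$@"
}
```

Usage would be `env_lines ./.env my-cmd arg1 arg2`. Since lines are prepended, a key that appears twice in the file takes its first value rather than its last.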


I tend to agree, and we do this a lot actually. But it gets a little more complicated if you have several .env files.

Would love to hear more about why dotenv is banned at your org though.


Because I banned it haha. There should not be more than one .env file. Our projects have a .env.example that has any overrides a dev might want to override but this list is kept intentionally very short. Meanwhile .env is noted in gitignore. I absolutely hate seeing an entire application configured with environment variables. Some? Sure, where it makes sense. Most? No, those should be in version control, secrets aside.

I believe in convention over configuration. Most of our apps have hard-coded config, with a concise/short and finite number of things that can be overridden (like 3-4 parameters, tops). Secrets get injected.

I do subscribe to the idea of the 12 factor app, but there is a line that needs to be drawn between env config which is more dynamic and more persistent config that should be baked in to the release.


To add to that, SOME_SECRET env vars should be banned (or at least overridable) in favor of SOME_SECRET_FILE env vars. I usually just put an example of the env vars into the readme or link to the file in the source code handling that directly.
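
A minimal sketch of how an entrypoint script might honor that convention, assuming the `_FILE` variant should win when both are set. The function name `read_secret` and the precedence rule are assumptions made for illustration:

```shell
# read_secret NAME
# Prints the contents of the file named by $NAME_FILE if that variable is set
# and non-empty; otherwise prints the value of $NAME. (Hypothetical helper.)
read_secret() {
    case $1 in
        ''|*[!A-Za-z0-9_]*|[0-9]*)
            echo "invalid variable name: $1" >&2
            return 2 ;;
    esac
    # The name was validated above, so it is safe to splice into eval.
    eval "_secret_file=\${${1}_FILE-}"
    eval "_secret_value=\${$1-}"
    if [ -n "$_secret_file" ]; then
        cat "$_secret_file"
    else
        printf '%s\n' "$_secret_value"
    fi
}
```

A script could then do `DB_PASSWORD=$(read_secret DB_PASSWORD)` once at startup, so the rest of the code does not care which form was used.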


But then the problem is changing configs means building a new release, and needs code push access. Pretty much every config variable has env override in my apps - allows project owner to poke about in web UI without bothering me for changes.


Would not

sh -c '. .env; echo $MY_VAR'

do the same thing? (I am not in front of a shell at the moment.)


There are like a couple dozen different ways to do this...

I have this on my .bashrc:

    alias loadenv='export $(xargs <.env)'
source: [1]

--

1: https://stackoverflow.com/a/60406814/855105


That will break if there's comments in the file, or if any one of the variables' values contain spaces. You can use `set -a` to load .env into an existing shell instance instead:

    loadenv() {
        set -a
        source ./.env
        set +a
    }


Cool! This answers a question someone had in this thread.

... except I'm thinking this may `set +a` if the environment already had `set -a`, which maybe could cause problems? I wonder if it would make sense to record the existing status of "-a" (allexport) and set it / unset it as necessary.


You could do that, and it'd still be POSIX-shell compatible (as long as you use `.` rather than `source`):

    loadenv() {
        case "$-" in
            *a*) . ./.env ;;
            *) set -a; . ./.env; set +a ;;
        esac
    }
Although I have yet to see a long shellscript utilise `set -a` globally :)


Very nice! Thanks for the suggestion. Seems more Unix-esque. Are there any important drawbacks of this version compared to the dedicated tool? (dotenv or dotenvx)


I can't believe I've never thought of doing this until now.


You'd need to `set -a` or pass the `-a` as a flag to have them auto-exported though, so:

    sh -ac '. ./.env; ./prog'
Also if you use the `.` builtin it's a good idea to specify the path with a slash in it, so that `.` doesn't search $PATH first.
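
Putting those two points together with the `"$@"` forwarding shown earlier in the thread, a sketch of a reusable wrapper might look like this. `with_env` is an invented name; the env file path is handed to the inner shell as `$0`, so it never has to be spliced into the script text:

```shell
# with_env FILE COMMAND [ARG...]
# FILE should contain a slash (e.g. ./.env) so that `.` does not search $PATH.
with_env() {
    _env_file=$1
    shift
    # -a: export every variable assigned while sourcing FILE.
    # Inside the inner script, $0 is FILE and "$@" is the command plus arguments.
    sh -ac '. "$0"; exec "$@"' "$_env_file" "$@"
}
```

For example, `with_env ./.env ./prog --flag "some arg"` runs `./prog` with the variables from `./.env` exported, and the arguments arrive exactly as typed. Nothing is set in the calling shell.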


That would, but unless each line in your .env file is prefixed with "export", those env vars won't get passed into any subprocesses you run.


Now that is unixy!


Seems to work.


Thanks to OP and other posters - various ideas useful in different cases.

The xargs idea made me think of using bash as the parser :

  bash -c "exec -c bash -c 'source $CONFIG/main.bash; env'"
This test .bash file contains multiple source-s of other .bash files, which contain a mix of comments, functions, set and env vars - just the env vars are exported by env. This seems useful e.g. for collating & summarising an environment for docker run -e.

This outputs the env vars to stdout; for the OP's purpose, the output could be sourced :

  envFile=$(mktemp /tmp/env.XXXXXX);

  bash -c "exec -c bash -c 'source $CONFIG/main.bash; env'" > $envFile;

  env $(cat $envFile) sh -c 'echo $API_HOST'
# For Bourne shell, use env -i in place of exec -c :

sh -c "env -i sh -c '. $CONFIG/main.sh; env'" > $envFile


This looks good and neater than my solution in my .zshrc:

envup() {
  local file=$([ -z "$1" ] && echo ".env" || echo ".env.$1")

  if [ -f "$file" ]; then
    set -a
    source "$file"
    set +a
  else
    echo "No $file file found" 1>&2
    return 1
  fi
}

You can also specify `envup development` to load .env.development files should you want. Obviously this will pollute the current shell but for me it is fine.


This is interestingly similar to a little tool I wrote called sops-run [0], which manages encrypted secrets for cli tools using Mozilla’s sops [1]. Biggest upshot is that you can use it more confidently for secrets with encryption at rest. Built it when I was trying out CLI tools that wanted API keys, but I didn’t want to shove them into my profile and accidentally upload them into my dotfiles repository. Do need to finally get back to making this a package, being able to install it with pip(x) would be really nice.

[0] https://github.com/belthesar/sops-run

[1] https://github.com/getsops/sops


Doesn’t this already exist as https://www.npmjs.com/package/dotenv-cli ?



Since loading dotenv files happens together with executing code, I have decided to trust my .env files just like I trust the rest of my code not to delete my entire system, and therefore I source them.


I never understood why it had to be a dot file, except for naming it.


Does this accept the exact same format (including quotes and whitespace) as a Docker env file? That’s a key feature for me


I don't understand what more it does than sourcing a file in your shell would. Can anyone explain?


You can't source an .env file without some munging. All the keys would need `export` in front of them I believe.


That doesn't seem like a huge barrier compared to shipping a dotenv binary compiled specifically to all deployment arch.


It's not a huge barrier, but it's still a barrier. I have lots of infra using Helm/K8s, sometimes Docker. These .env files don't have `export` keywords in them.

So your suggestion is to munge these for local development. And you're okay with that barrier? That's terrible dx, and it adds surface area for bugs.


A sourced .env would have to be correct shell syntax for a sourced environment file, yes.


But a lot of env files don't do that iirc. `allexport` solves this though.


You can implement that in a simple shell function anyway.


This idea seems to be cloned everywhere now, so something is causing the popularity


Kubernetes / containerd / Docker apps are much more convenient to configure through env vars, as these easily pass through the sandbox layer (whatever that may be), whereas files are not so easy to make work. Because that's how prod works, devs want to be able to recreate prod locally, hence the Cambrian explosion of tools like this.


What a tragic state of affairs.

It's a shame that running modern software requires carefully packaging a virtual environment and then injecting a bunch of ugly global env vars.

I still think Docker shouldn't exist. Programs should simply bundle their dependencies. Running a program should be as simple as download, unzip, run. No complex hierarchical container management needed.

Alas I am not King.


Docker is "programs bundling their dependencies".


In an extremely heavyweight and needlessly convoluted way.


> Programs should simply bundle their dependencies.

awww. i don't think it's OK in any way to download libc6/msvcrt as many times as I download __any__ software. even more, is there a strong difference between dependency and runtime environment? if sensible people does not bundle the whole python distribution to a "stuff.py" then why bundle libopenssl.so to a webserver application?

IMO, a saner approach would be just not to confuse dependencies: appX depends on libY 1.9; appZ depends on libY 2.0; people are quick to declare that appX and appZ are incompatible as they can not run on the same system due to "conflicting dependencies". but who said you have to search libY in /usr/lib*/libY.so? if you need different versions of a lib, just install them in separate dirs and make your apps find the right one (eg. by setting RPATH or versioned .so filenames).


> is there a strong difference between dependency and runtime environment?

Programs should rely on the global runtime environment as little as possible

> if sensible people does not bundle the whole python distribution to a "stuff.py"

Unfortunately Python deployment is such an unmitigated disaster that it's a leading cause of Docker images.

Deploying a portable copy of Python is about 9 megabytes compressed. This is significantly preferable to multi-gigabyte Docker images.

> people are quick to declare that appX and appZ are incompatible as they can not run on the same system due to "conflicting dependencies". but who said you have to search libY in /usr/lib*/libY.so? if you need different versions of a lib, just install them in separate dirs and make your apps find the right one (eg. by setting RPATH or versioned .so filenames).

You make a strong and compelling argument as to why programs should bundle their dependencies and not rely on the system environment.

Users should not have to perform any witchcraft to launch a program. Download and run. No further steps should be necessary.


C? Y u no Rust?


The same thing already exists in Rust, it’s both a library for in-process loading and a binary, I use it daily and only for the binary: https://github.com/allan2/dotenvy


Once I actually wrote a version in Emacs Lisp, for purposes of being able to run stacks that depended on .env configuration in Emacs buffers.


[flagged]


what are the suggestions


GPT-4:

The code provided has a few potential issues, including security vulnerabilities:

* Buffer Overflow and Memory Allocation Errors: The malloc function in read_file does not check if the memory allocation fails (it checks if buffer is NULL instead of buffer). This can lead to a null pointer dereference if malloc fails and returns NULL. There's a possibility of buffer overflow or improper handling if the file size read by ftell is exactly MAX_FILE_SIZE, because an additional byte is added (buffer = malloc(file_size + 1);).

* Improper Input Validation: The skip_until_charset and skip_charset functions do not check if the end of the buffer is reached within their loops. If the charset does not appear in the buffer, this could lead to reading beyond the buffer's limit, causing undefined behavior.

* Use of Uninitialized Variables: In the main function, the value_end pointer is set conditionally inside the if (cur == '"' || cur == '\'') block but used unconditionally outside it. This can lead to the use of an uninitialized pointer if the if block is not executed.

* Command Line Argument Handling: The check if (argc < 1) is unnecessary because argc will always be at least 1 in any C program, as it includes the name of the program itself. Insufficient validation of command arguments, specifically not handling scenarios where argv[1] might not be provided correctly or safely.

* Potential Command Injection: The execvp call uses user-supplied input (argv[1] and argv[2...]) without sanitization. If the environment or the .env file can be manipulated, it could lead to command injection vulnerabilities.

* Error Handling: The program's error messages (e.g., "Missing equal after key") are clear but the handling of such errors (e.g., continuing to parse or halting operations) may need to be revisited based on desired robustness.

Claude:

Here are a few potential security vulnerabilities and bugs I noticed in the code:

* The read_file function does not properly handle the case where malloc fails and returns NULL. It should check if buffer is NULL after the malloc call and handle the error appropriately, likely by returning an error code. Currently it would dereference a NULL pointer.

* The code trusts the contents of the .env file. If an attacker can modify the .env file, they could potentially inject malicious values into environment variables that might be used in an unsafe way by the child process.

* There are no checks on the size of keys or values. A malicious .env file could contain extremely long keys or values that could potentially cause issues.

* The code uses strchr to look for meta characters like '=', '#', etc. If the .env file contains non-ASCII characters, this could potentially lead to unexpected behavior.

* The code uses execvp to execute the provided command with the parsed environment. If the command is provided by an untrusted source, this is a command injection vulnerability. The code should validate the command is on an allowlist before executing.

* The parsing allows for escape sequences in values, but doesn't have bounds checks. A maliciously crafted escape sequence could read or write out of bounds.

* The code will modify the process environment even if executing the final command fails. It may be better to parse into a separate data structure and only update the environment if the final execvp call will succeed.

* It's a good practice to set a proper process umask before writing files or creating directories accessible to other users on the system.

While the code looks generally well written, for a utility like this that parses untrusted input and executes commands, it requires very careful security review and hardening. I'd recommend a thorough security audit before using it on untrusted .env files or with untrusted commands. Adding additional input validation, using allowlists, and limiting the scope of what env vars and commands are allowed would help mitigate some risks.


I think if you're not willing to manually verify the output of these generative machine learned algorithms, then you probably shouldn't present them to somebody as if you've done them the service of a free code review.


I don't understand people who chime in just to regurgitate GPT slop that they haven't verified (and sometimes not even read!)

I find it hard to believe that they truly believe they're doing us a favor. Surely they're doing it just to feel smart or included?


I suggested he look at it since he is the expert on his code lol.


* Memory allocation NULL check: this is a bona-fide bug introduced by my refactoring

* MAX_FILE_SIZE: I don't think this is true.

* skip_until_charset, skip_charset bound check bug: I don't think this is true

* Uninitialized value_end: I don't think this is true - and if true should be caught by -Wall -Werror flags.

* argc < 1 check not being necessary: This is not true, you can make argc == 0 by using the exec family of libc functions.

* Error Handling: Currently all parsing errors should cause the program to exit, which I think is the desired behavior.

* Unsanitized input for .env: Intended behavior.

* Unsanitized input for execvp: Intended behavior.


GPT-4 is not a static code analyser


These LLMs are not a lot of things that people think they can use them for, apparently. Therapist, search engine, cheap coding labour, the list goes on.

These LLMs produce syntactically valid language, no more. Any information contained within is a by-product and not necessarily factual or correct.


Still incredibly helpful. Therapy? Helped me a couple of times on some dark days. It’s not a replacement for it, but it helped me. Search engine? Why not? As long as you verify claims, you’re good, and especially nowadays the answers are preferable to whatever Google thinks should be on the first 10 pages (hint: it’s all crap). Cheap coding labour? Again, as long as you verify. A senior programmer can get a lot more done in a day this way.

It’s not necessarily factual or correct, no. But it’s incredibly helpful and useful. As always, the real answer is somewhere in the middle.

Just don’t ask it about politics. It’s so blatantly obviously biased.


If you don't mind, I'm really interested in how you used an LLM for therapy. My Gmail username is the same as my HN username if you feel comfortable talking about it and helping others. Thank you!



