Many thanks to ChALkeR for responsibly disclosing this to npm and giving us time to notify people and clean up as much as possible. We were very busy, and ChALkeR was incredibly patient with us :-)
In response to this disclosure, we have set up a continuously-running scanner for credential leakages of various kinds. It's not foolproof, but it's made things a lot better. We'll be writing a proper blog post about this at some point, but we've been really busy!
I published a tiny script that makes mass grabbing of files from Github easy (https://github.com/cortesi/ghrabber), and wrote about some of the interesting things one could find with it. For example, there are hundreds of complete browser profiles on Github, including cookies, browsing history, etc:
I've held back on some of the more damaging leaks that are easy to exploit en masse with a tool like this (some are discussed in the linked post, but there are many more), because there's just no way to counteract this effectively without co-operation from Github. I've reported this to Github with concrete suggestions for improving things, but have never received a response.
You can access all of this functionality with ghrabber.
One of my suggestions to Github is that they disable indexing of dotfiles of all persuasions (including contents of dot-directories), unless the repo owner explicitly opts in. That would make it much harder to find a very large fraction of the more obvious leaks.
Nearly all of them, depending on what exactly you're looking for. Simple things like being able to exclude results from forked repos can save a huge amount of time, and being able to limit results by language, creation date and even number of stars (to find personal repos) has come in useful.
I've heard tell that there are people out there who make use of GH's activity feeds to scrape just about every action that's taken on all GH repos.
If true, doesn't this make crippling the usefulness of GH's search really superfluous?
Full disclosure: I'm never a fan of crippling search to cover the ass of someone who has pushed sensitive information to a publicly accessible location. I'm still sore about Google's decision to do things like prevent one from searching for, for instance, credit card numbers. :(
The legitimate use comes from real-world situations. Quite often I first cat a file only to find out it is too large to look through by hand, and only then grep it. As the last command was `cat /some/long/path`, I go up to the last command and just add the grep to the end (and thus end up with cat | grep). Likewise, I vaguely remember that grep has a count switch, but looking it up in the manpage is more work than using wc (-> grep | wc). And likewise, before I would use find's -exec command I would need to look up the precise syntax again, because there were some details regarding character escapes, IIRC.
Remember, we are always intermediates at most things[1].
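A concrete version of that workflow (the path and pattern are invented for illustration):

```
cat /var/log/app.log                       # turns out to be far too long to read by hand
cat /var/log/app.log | grep ERROR          # recall the last command, append the grep
cat /var/log/app.log | grep ERROR | wc -l  # count matches without looking up grep -c
```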
This isn't "shell golf". This idea that we shouldn't use small focused tools in a chain, but rather we should find as many arcane arguments and switches as we can to shorten the chain, is contraindicated by the Unix philosophy. I don't know why this nitpick comes up so often. There are many "human factor" reasons why a longer chain with simpler commands is desirable.
Is it really so arcane to `grep PATTERN FILE` or `grep PATTERN` <kbd>Alt .</kbd> (if the previous command was `cat FILE`)? Is it also arcane to `pgrep PATTERN` instead of `ps aux | grep PATTERN`? Is it also arcane to `egrep 'PATTERN|PATTERN'` instead of `grep PATTERN | grep PATTERN`? Personally I prefer the "correctness" of this sort of approach, but the tools are just a means to an end and understandably people have varying preferences. Ironically "legitimate" was probably not an accurate choice of words.
> `egrep 'PATTERN|PATTERN'` instead of `grep PATTERN | grep PATTERN`
Oops? Ironically (assuming two distinct values of PATTERN) I think you just answered your own question. (They are different: first is disjunction of patterns, second is conjunction).
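To make the difference concrete (patterns and file invented for illustration):

```
egrep 'foo|bar' access.log        # disjunction: lines containing foo OR bar
grep foo access.log | grep bar    # conjunction: lines containing both foo AND bar
```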
Your point has merit for scripts (performance) but for data exploration at the prompt it's almost always irrelevant: the simplicity of pipe composition outweighs anything else.
Whoops, you've got me there. Yes, for that example the grep alternative is not very elegant. Anyway, I wasn't making an argument against composition, just against particular types of composition (such as useless use of cat, parsing ls, grepping ps) for which there are side effects or a simpler or more appropriate alternative.
I'm sure I've done this in the past, haha; the npm workflow isn't great at times in this regard. If you have something (to test, etc.) that is not checked into Git but is still in the directory, it can still make its way into a publish. That's definitely what I'd advise people to be most careful of: use npm-link, keep credentials elsewhere, etc.
Koa I'm curious about; I've seen almost every pull request go in there. Anyway, nice post.
There's an easy way to prevent credential leakage when publishing to npm: explicitly list the files to include in the package through the `files` property in package.json.
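A minimal sketch of such a whitelist (the package name and file list are invented):

```
$ cat package.json
{
  "name": "example-pkg",
  "version": "1.0.0",
  "main": "index.js",
  "files": ["index.js", "lib/"]
}
```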
I have taken this route. It also clears out the cruft and brings dependency directory size down. Your module doesn't need .editorconfig or README.md and other stuff to run; remove it from the published package.
One of the things that worries me about nodejs is the huge chain of dependencies. I'm not an expert on these things so it would be amazing if someone could correct me if I'm wrong.
It's enough for one of the packages down the line to break compatibility without changing the version correctly (i.e. bumping the major version number), or to have slightly too loose version requirements, and everything breaks down the line. OK, if something gets broken it's relatively easy to notice, given that the test coverage is good enough.
However, it's much, much harder when it comes to security breaches (like the one described in the linked article); you might not notice them for a long, long time.
Anecdotal data but I tried to teach the interns to use yeoman when they were working on a small angularjs project and it just didn't work, because some dependency somewhere was broken. Happened to me as well and the solution was to try to update it a few days later (should have opened an issue, I know).
I'm using npm shrinkwrap to avoid surprises, but still... it just doesn't feel right. I shouldn't be risking breaking the project just by updating the dependencies, unless I've decided to update one of the dependencies to a new major version.
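For reference, locking things down is a one-liner (assuming the currently installed tree is the one you want to keep):

```
npm shrinkwrap              # writes npm-shrinkwrap.json with the exact installed versions
git add npm-shrinkwrap.json # later `npm install` runs reproduce this exact tree
```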
Yes, you should use npm shrinkwrap. It baffles me that automatically updating your dependencies is considered the right thing to do.
In practice that means you can push a semicolon fix, your CI server will fetch a different (newer) version of a dependency, and something completely unrelated breaks.
Secondly, it's not deterministic and will generate huge diffs every time you run it even if nothing changes.
Uber has a tool called npm-shrinkwrap that in theory is supposed to solve the latter, but I've never gotten it working on my current projects: https://github.com/uber/npm-shrinkwrap
> It baffles me that automatically updating your dependencies is considered the right thing to do.
The idea is to rely on semver. If you put ~1.3.4 in your dependencies, then, provided that dependency follows semver properly, you'll get 1.3.5 when it's out and your stuff will still work, but you're getting bug fixes and patches without having to keep an eye on the sometimes hundreds of dependencies. Luckily tools like greenkeeper.io are around now.
The drawback is that many people don't follow semver, so I opt for appending --save-exact to all npm installs (actually I have npm config set save-exact true).
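In practice (the package name is a placeholder):

```
npm install --save some-pkg         # records a range like ^1.3.4 in package.json
npm install --save-exact some-pkg   # records exactly 1.3.4 instead
npm config set save-exact true      # make exact versions the default for --save
```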
Exactly. In an ideal world, semver would solve this issue really easily (probably not completely, but to a large extent). However, there are so many dependencies in a typical nodejs project that I have a hard time trusting that all the devs will follow semver :)
Yes, shrinkwrap is a good solution to lock down your dependency versions.
Regarding the security breaches, this is a big problem because, as you mention, there are huge chains of dependencies, and some of the vulnerabilities are not even in your direct dependencies, which makes them hard to fix.
Snyk recently released stats showing that 14% of npm packages carry known vulnerabilities. Also, 64% of our users have found a vulnerability in a private repo, so the chance that your package also contains a vulnerable dependency is quite high. (I work there, btw.)
We released an open-source CLI tool that helps you find and fix vulnerabilities in your dependency tree. You are welcome to try it - https://snyk.io.
You can also take a look at our open source vulnerabilities database with advisories and patches at https://github.com/Snyk/vulndb.
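For anyone curious, the basic flow with the CLI looks roughly like this (check the docs for the current commands):

```
npm install -g snyk
snyk test      # report known vulnerabilities in the project's dependency tree
snyk wizard    # walk through upgrading or patching the affected packages
```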
Looks interesting, I will take a look :) After briefly skimming the landing page, it seems like a cool thing to add to a CI process, because, well, there's no reason not to automate that rather than hoping that folks will run it from time to time.
If you want to be concerned about the npm chain of dependencies model, one thing you should probably think about is that there is no curation of code on npm, so a package could be malicious (e.g. if an attacker compromised a set of package maintainer creds from a dot file) and npm packages (like most package formats) have the ability to execute code on install...
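For illustration, the hook is just an entry in package.json's scripts section (the names here are invented), and npm runs it automatically when the package is installed:

```
$ cat package.json
{
  "name": "innocuous-looking-pkg",
  "version": "1.0.0",
  "scripts": {
    "postinstall": "node ./do-anything.js"
  }
}
```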
I actually found out about this because the guy that created this project contacted me with respect to a package I had uploaded that contained my .npmrc. I was totally blown away, as I'd just followed instructions for creating an npm package I found online. When he contacted me -- prior to publishing this work, which leaves me in awe of his coolness -- panic ran through my veins, because I'm usually paranoid about this kind of thing. Through talking with him, I discovered that I'd published my .npmrc inadvertently, and I got pretty mad at npm that it was even possible. When the npm people contacted me (I'm assuming they had acted on ChALkeR's contacting them), they were very receptive to the obvious feedback of checking for this kind of thing when publishing.
It really depends what's in the .npmrc. For example, you might have one containing only a setting to use absolute versions when installing packages and saving them. It's also worth noting that it's a good idea (although I always forget) to use the files field of package.json to act as a whitelist.
Edit: the author notes that these are excluded by npm anyway these days. The documentation does not reflect this.
"Please, don't re-use the same password to «fool» the robot while restoring your password — this will result in your account being vulnerable. Yes, several people have done that."
Wow, this is the scariest part. You already have your details leaked, get notified about it and still decide that resetting the token/login to the original value would be the best thing to do.
Github and bitbucket etc. really should offer an opt-out scan-on-push service that looks for the most common mistakes and rejects the push with a URL explaining what's going on in the server echo.
That's pretty sweet, but by then the damage's already done. Rejecting the push would be even better, making sure the confidential data never goes public. AIUI dark-hatted people are scraping the real-time push feeds at Github for credentials and botting up exploits, so even a few seconds could be a big enough window for damage to be done.
I was just thinking of this exact story when the grandparent mentioned AWS. They have a huge financial interest in automatically detecting this kind of thing.
I mean, obviously AWS can't reject your pushes, so it's the best they can do, but I agree that it would be nice if Github did this as well.
> That's pretty sweet, but by then the damage's already done.
I'm not sure that's true. I was able to disable the key before anyone used it (although it was locked down so far that they couldn't have charged anything to my account, since I didn't trust the code I was testing with real money).
There have been _a lot_ of comments here and elsewhere that state the exact same thing. On one hand, no, it's not their fault, and _should_ they be held accountable? Obviously not. But it would be so awesome if they did provide this. Wherever human error can occur, it will occur.
Not too much load, really. We (Bitbucket) already have pre-receive hooks for a handful of other things. The trick would be defining the rules properly to have a reasonably low false negative rate while avoiding work-inhibiting false positives (or allowing for a mechanism to override it with, say, a force push).
Of course, 93% of our repositories are private so this feature may not be exceedingly useful to our customers vs other things we could be spending our time on.
Edit: I shouldn't have said not useful, rather, comparatively there may be more value in us pursuing other work first. E.g., provide a mechanism for 3rd party pre-receive hooks via our add-on system.
We're very focused on professional teams working on private projects -- but you can't see that because they're all private. Bitbucket is sort of like an iceberg: 1. you can only see a small percentage of the total mass; 2. it is blue.
Yes, we're part of Atlassian and we're hiring in San Francisco.
FWIW Github already scans your commits for Github OAuth tokens and revokes any tokens it finds. Doesn't prevent you from checking in SSH keys or anything like that, but it's something, right? (Source: I did this once with a non-privileged token. Oops.)
Yeah, it's probably hard to do for free, but it's a potentially great value add, and could be an additional service an organization/user could pay for.
What worries me is that this is possible at all. npm stores npmjs.org credentials in a repository-local dotfile, and this is how packages are submitted?!
PHP's package repository, Packagist, doesn't have this problem because it's in the browser. You never enter or store any credentials on the command-line, you click a button on the Packagist site and it tracks your already-published GitHub repository.
Like most dotfiles it starts in the current directory and works its way up. Unless npm has changed, the default location for .npmrc is your home directory. You have to actively store the file in the repo.
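For context, a logged-in ~/.npmrc typically contains a line along these lines (the token below is a placeholder), which is exactly what you don't want sitting in a repo or a published tarball:

```
$ cat ~/.npmrc
//registry.npmjs.org/:_authToken=00000000-0000-0000-0000-000000000000
```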
This is what I'm not understanding about this. How in the world do you make the mistake of storing credential files in a repo? And this seems to be beyond people not making template files for configs. Then again, I don't know anything about NodeJS packaging.
A reminder that while you shouldn't rely on them, tools like https://github.com/jandre/safe-commit-hook can help protect you from mistakenly committing secrets to git repositories.
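If you'd rather not pull in another dependency, even a crude hand-rolled pre-commit hook catches the obvious cases (the filenames below are just examples; remember to make the hook executable):

```
#!/bin/sh
# .git/hooks/pre-commit: refuse to commit files that usually hold credentials
if git diff --cached --name-only | grep -qE '(^|/)(\.npmrc|\.env|id_rsa)$'; then
  echo "A staged file looks like it holds credentials; aborting commit." >&2
  exit 1
fi
```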
Has anyone looked into leaked credentials in images on the Docker Hub? I can't count how many times I've forgotten to add .env to my .dockerignore file before building.
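The fix is quick but easy to forget (the entries below are just common examples):

```
$ cat .dockerignore
.env
.git
.npmrc
node_modules
```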
Title is a bit misleading. Actual content title is 'Do not underestimate credentials leaks.'
The article states that many popular Node.js packages have had leaks (in the past). Also, this article was not the source of many of these leaks (example: bower's github oauth token was expired by github itself when it was posted to the website).
While it was not me who posted this to the Hacker News (so don't blame me for the title here), I can assure you that all mentioned credentials were active at the time when I found them.
You should not trust automatic tools to do that. They will inevitably be subject to both false negatives and false positives, and will most probably just give you a false sense of security but will not protect you from the actual leak.
You should instead review the stuff that you publish yourself. That includes commit review, reviewing package contents before publishing them, reviewing config files, and reviewing logs before sharing them.
If you have an org, it would be better to educate your devs more and make each commit go through an independent review. Also, don't forget about checking package contents.
While you should do everything you said, I don't see the harm in an extra safety net where an automated tool may catch something you miss. Automated tools won't be as good as a human, but humans are not perfect either and are bound to make a mistake; if there are tools that can assist, I'm all for it.
The problem is the false sense of security. The idea that "something is better than nothing" does not necessarily hold true in security, and additional layers can weaken your security rather than strengthen it.
I don't think sanity checks are a form of false sense of security. Ideally, the way you develop software, those types of credentials would never even be in your project, but maybe you're testing something and they're temporarily in there (because we've all done that); a warning could let you know you're about to screw up.
Naturally anyone can become dependent on anything designed to assist them. I'm not really passionate about either direction.
It would be nice if there was an interactive publish option, something like `npm publish --confirm`, which would print a list of the files to be uploaded and wait for you to type "y" to confirm.
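Until something like that exists, a rough approximation (assuming a standard project layout) is to build the tarball locally and list its contents before publishing:

```
npm pack            # builds <name>-<version>.tgz in the current directory
tar -tzf ./*.tgz    # review exactly what would be uploaded
npm publish         # only once the listing looks right
```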
Ruby will, but it's an API key, not creds directly.
Python probably will; IIRC it's not auto-created, but it is general practice to put creds in a dotfile for PyPI.
Go doesn't really have a widely used package manager in that sense, people use github repos (and go get) from what I've seen.
Not too sure about PHP and Lua...