Quoting globalise83 from the other thread for summing it up perfectly:
>"we received a report to our security bug bounty program of a vulnerability that would allow an attacker to publish new versions of any npm package using an account without proper authorization."
>"This vulnerability existed in the npm registry beyond the timeframe for which we have telemetry to determine whether it has ever been exploited maliciously. However, we can say with high confidence that this vulnerability has not been exploited maliciously during the timeframe for which we have available telemetry, which goes back to September 2020."
>Any version of any npm package published before September 2020 could have been tampered with by anyone aware of that exploit and no-one would be any the wiser. That is pretty bad news.
No kidding. That’s bad enough news to warrant flagging any release by users not currently authorized for the package in question. At least internally for further analysis, but probably also for maintainers and users, so they have an opportunity to do their own threat assessment.
Because this is buried in the post and people don't seem to be grokking it:
> Second, on November 2 we received a report to our security bug bounty program of a vulnerability that would allow an attacker to publish new versions of any npm package using an account without proper authorization.
They correctly authenticated the attacker and checked they were authorised to upload a new version of their own package, but a malicious payload allowed the attacker to then upload a new version of a completely unrelated package that they weren't authorised for. Ouch!
> However, the service that performs underlying updates to the registry data determined which package to publish based on the contents of the uploaded package file
Yeah, this is what's going to keep me up tonight. Yikes.
I can't help but wonder if the root cause was HTTP request smuggling, or if changing package.json was enough.
How do we even mitigate against these types of supply-chain attacks, aside from disabling run-scripts, using lockfiles and carefully auditing the entire dependency tree on every module update?
I'm seriously considering moving to a workflow of installing dependencies in containers or VMs, auditing them there, and then perhaps committing known safe snapshots of node_modules into my repos (YUCK). Horrible developer experience, but at least it'll help me sleep at night.
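If you do go down that road, below is a minimal sketch of what the containerised, scripts-disabled install step could look like. `npm ci` and `--ignore-scripts` are real npm features and `node:16` is a real Docker image; the wrapper script itself is purely illustrative.

```typescript
// audit-install.ts - illustrative wrapper: install dependencies inside a
// throwaway container with lifecycle scripts disabled, so nothing executes
// on the host during install. Assumes Docker and the official node image.
import { execSync } from "node:child_process";

const image = "node:16"; // pin whichever base image you trust

// `npm ci` installs exactly what the lockfile specifies; `--ignore-scripts`
// stops pre/post-install hooks from running during the install.
const cmd = [
  "docker run --rm",
  `-v ${process.cwd()}:/app`, // mount the project so node_modules lands here
  "-w /app",
  image,
  "npm ci --ignore-scripts",
].join(" ");

execSync(cmd, { stdio: "inherit" });
console.log("Installed in a container; audit node_modules before trusting it.");
```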
> How do we even mitigate against these types of supply-chain attacks
Don’t import thousands of modules from third parties just to write a simple web app. If you have 10 stable dependencies it’s no problem to vendor them and vet changes. If you have 10k you’ve entirely given up on any pretence of security.
The Node 16 LTS cycle recently started. A month and a few days before the switch to LTS, a super controversial package titled `corepack` [0] was officially declared a core module, and it has been bundled with all official distributions since.
The NodeJS team refuses to discuss NPM because it's a separate 3rd party. And yet.... this NodeJS Core module comes pre-installed as a global NPM package.
We're just getting started.
This module installs or even reinstalls any supported package manager when you execute a script whose name matches one it recognises. It's opt-in only for a short period, and the intention is to expand beyond package manager installations.
Amidst all that's been going on, NPM (Nonstop Published Moments) is working on a feature that silently hijacks user commands and installs foreign software. The code found in those compromised packages operated in a similar manner and was labeled a critical severity vulnerability.
The following might actually make you cry.
Of these third party remote distributions it's downloading, the number of checksums, keys, or even build configurations being verified is 0.
The game that Microsoft is playing with their recent acquisitions here is quite clear, but there's too much collateral damage.
Not that I agree with the methodology `corepack enable` introduces, providing OS shims for the specific package manager commands that download them on demand...
corepack (or package manager manager) was transferred to be a Node.js foundation project and voted into the release by the Node.js Technical Steering Committee. The one member I'm aware of who is affiliated with GitHub/npm abstained from the vote. The specific utility of corepack is being championed by the package managers not distributed with Node so that (Microsoft's) `npm` is not the single default choice.
I'm interested to hear what parts of this you see as coming from Microsoft/NPM as I didn't get that vibe? In my view this was more likely reactionary to the Microsoft acquisitions (npm previously being a benign tumour, doctors are now suggesting it may grow :)
I think Corepack is a bad idea and have explicitly added feedback to say so. That said, I know you're misrepresenting the situation (whether intended or not) by suggesting this is a Microsoft initiative. It's not: Microsoft acquired npm, and if anything is even relevant to that acquisition, Corepack is meant to distance Node from it.
Whether this is entirely by design I don't know, but Microsoft's positioning in the ecosystem is just brilliant. They're like a force of nature now.
NPM's security issues prime the ecosystem for privacy and security topic marketing (ongoing, check their blog), which is leveraged to increase demand for Github's new cloud-based services.
In the meantime they will just carry on moving parts of NPM to Github until there's so little of the former left, that it'll be hard to justify sticking with it rather than just moving to Github's registry like everyone else.
Eventually NPM gets snuffed-out and people will either be glad it's finally gone, or perhaps not even notice.
To reiterate what sibling comments said, I'm the one who spawned the discussion and implementation of Corepack, and npm remained largely out of it; the push mostly came from pnpm and Yarn.
Additionally, unlike other approaches, Corepack ensures that package manager versions are pinned per project and you don't need to blindly install newest ones via `npm i -g npm` (which could potentially be hijacked via the type of vulnerability discussed here). It intends to make your projects more secure, not less.
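For reference, the per-project pin Corepack reads is the `packageManager` field in package.json (e.g. `"packageManager": "yarn@3.2.0"`). A small illustrative check of that pin is sketched below; the field is real, but the helper script is not part of Corepack.

```typescript
// check-package-manager.ts - illustrative only: compare the `packageManager`
// pin that Corepack reads against whatever is currently on PATH.
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

const pkg = JSON.parse(readFileSync("package.json", "utf8"));
const pinned: string | undefined = pkg.packageManager; // e.g. "yarn@3.2.0"

if (!pinned) {
  console.log("No packageManager pin in package.json");
} else {
  const [name, wanted] = pinned.split("@");
  const installed = execSync(`${name} --version`).toString().trim();
  if (installed !== wanted) {
    console.warn(`Pinned ${pinned}, but ${name} ${installed} is on PATH`);
  } else {
    console.log(`${name} ${installed} matches the pin`);
  }
}
```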
- No security checks are present in the package manager download and installation process so there are still no guarantees.
- Existing installations of package managers are automatically overwritten when the user calls their binary. What if this was a custom compilation or other customisations were made?
- This solution does a lot more behind the scenes than just run that yarn command that the user asked for but hadn't installed.
- Why not simply notify the user when their package manager isn't installed or only allow it with a forced flag? (As has been suggested uncountable times by numerous people anywhere this topic came up over the years.)
Disrespecting user autonomy, capacity to self-regulate, and ownership over their machine and code is not the way.
People don't directly import thousands of modules. It's actually a lot closer to your "10 stable dependencies". But those dependencies have dependencies that have dependencies. It's a little hard to point the finger at application developers here, IMO.
Some of the comments in this thread are wild. Huge dependency trees are a bad pattern, plain and simple.
The problem isn’t only ridiculous amounts of untrusted code, but the thousands of new developers from the last 10 years who think this is the way to write reliable code. They never acknowledged the risks of having everyone else write your code for you, and they overestimate how unique and interesting their apps are.
If you must participate in this madness, static analysis tools exist to scan your 10,000 dependencies; taking security seriously is the real issue.
> Huge dependency trees are a bad pattern, plain and simple.
And what's the alternative? Do you write your own libraries to store and check password hashes complete with hash and salt functions? Roll your own google oauth flow? Your own user session management library?
It's madness on either side; the difference is that `npm install` and pray actually lets you get things done.
A large standard library is a big part of the solution. Your project may pull in a crypto library that includes password hashing, and an oauth library, and a session management library, but all of those libraries will have few or no dependencies outside of the standard library.
Every time this discussion about the JavaScript ecosystem and its "problems" comes up, the solution everyone brings to the table is "have a large standard library".
You know JavaScript doesn't have one, don't you? That is why this "issue" exists. Putting the cat back in the bag is impossible.
When vetting a dependency, consider whether it depends on packages you already depend on. If the new dependency tree is too large, try breaking down the desired functionality: multiple lower-level, smaller direct dependencies may have a lighter overall footprint. Take them, some built-ins, and some of your own glue code, and you get the same thing with fewer holes.
More tangentially, use persistent lockfiles and do periodic upgrades when warranted (e.g. relevant advisories are out) and check new versions getting installed.
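One cheap way to do that kind of check is against the lockfile itself. A rough sketch, assuming an npm v7+ `package-lock.json` with its top-level `packages` map (the lockfile shape is real; the script is only illustrative):

```typescript
// count-deps.ts - illustrative: how many packages the resolved tree actually
// contains according to package-lock.json (lockfileVersion 2/3).
import { readFileSync } from "node:fs";

const lock = JSON.parse(readFileSync("package-lock.json", "utf8"));
// The "" key is the root project itself, so drop it.
const paths = Object.keys(lock.packages ?? {}).filter((p) => p !== "");

console.log(`${paths.length} packages in the resolved tree`);

// Rough view of how many distinct names (and therefore authors) you trust.
const names = new Set(paths.map((p) => p.replace(/^.*node_modules\//, "")));
console.log(`${names.size} unique package names`);
```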
You use a trusted standard library which has crypto functions (and lots of other helpers) and for small things you write your own.
Yes you can write your own things like session management, yes that is better than the entire web depending on a module for session management which depends on a module which depends on a module maintained by a bored teenager in Russia.
Please do check out other ecosystems, there is another way.
Using a small number of libraries, where each library provides a large amount of functionality. When I install Django, for instance, four packages are installed, and each package does a substantial amount of work. I don't have to install 1000 packages where each package is three lines of code.
When I'm writing a C program I can somehow depend on only one library for password hashing and one for oauth (maybe two if it also needs curl). In javascript land it's probably a couple dozen, probably from a couple dozen different people.
How many developers write C programs versus how many developers write JS apps?
Without accounting for that, your comparison makes no sense! Not to mention that you're comparing languages at two very different levels. A low-level language like C will never behave like a higher-level language.
Most of those dependencies have well-defined, stable APIs. They use, or at least try to follow, semver. And you're probably only hitting about 10% of your dependencies on the critical path you're using, meaning that a lot of potentially vulnerable code is never executed.
I get the supply chain attacks. I get that you have a tree of untrusted javascript code that you're executing in your app, on install, on build and in runtime. But there's also Snyk and Dependabot which issue you alerts when your dependency tree has published CVEs.
We can talk about alert fatigue, but to be honest, I feel more secure with my node_modules folder than I do with my operating system and plethora of DLLs it loads.
I don't wanna turn this into a whataboutism argument, but at some point you gotta get to work, write some code and depend on some libraries other people have written.
> And you're probably only hitting about 10% of your dependencies on the critical path you're using, meaning that a lot of potentially vulnerable code is never executed.
If a dependency has been compromised it doesn't matter if its code is actually used, since it can include a lifecycle script that's executed at install-time, which was apparently the mechanism for the recent ua-parser-js exploit.
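To get a feel for how exposed a given project is to that mechanism, you can scan for packages that declare install-time hooks. `preinstall`, `install`, and `postinstall` are npm's real lifecycle script names; the scan below is just a sketch and only looks one level deep into node_modules:

```typescript
// find-install-scripts.ts - illustrative: list installed packages that declare
// lifecycle scripts which npm would run automatically at install time.
import { existsSync, readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

const hooks = ["preinstall", "install", "postinstall"];

function scan(dir: string): void {
  for (const entry of readdirSync(dir)) {
    const pkgDir = join(dir, entry);
    if (entry.startsWith("@")) { scan(pkgDir); continue; } // descend into scopes
    const manifest = join(pkgDir, "package.json");
    if (!existsSync(manifest)) continue;
    const pkg = JSON.parse(readFileSync(manifest, "utf8"));
    const found = hooks.filter((h) => pkg.scripts && pkg.scripts[h]);
    if (found.length) console.log(`${pkg.name ?? entry}: ${found.join(", ")}`);
  }
}

scan("node_modules");
```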
This is the direct result of the culture of tiny dependencies in JS and some other languages, but not all ecosystems are like this. If you choose to use node, this is where you end up, but it was a choice.
Many languages have a decent standard library which covers most of the bases, so it’s possible to have a very restricted set of dependencies.
I mean, you say that, but the practice of pulling in so many dependencies is fairly recent. It wasn't even possible for most projects before everyone had fast internet.
> It's a little hard to point the finger at application developers here, IMO.
I disagree. Any application developer who seriously thinks that they only have 10 dependencies if they're only importing directly 10 dependencies should not be an application developer in the first place.
You sure about that? Even if you’re writing just for a vetted distribution of an OS, and you write code with zero explicit dependencies, you still have much more than zero dependencies. It’s turtles all the way down. The key is to have an entire ecosystem that you can, to some degree more or less, trust.
No. We've been shouting warnings for years. There have been dozens, if not hundreds, of threads on HN alone warning of supply-chain security threats.
At this point if you're not actively auditing your dependencies, and reducing all of them where you can, then you're on the wrong side of history and going down with the Titanic.
The frank truth is that including a dependency is, and always has been, giving a random person from the internet commit privileges to prod. The fact that "everyone else did it" doesn't make it less stupid.
> The frank truth is that including a dependency is, and always has been, giving a random person from the internet commit privileges to prod
I mean, no. This is hyperbole at best and just wrong at median. A system of relative trust has worked very well for a very long time - Linus doesn’t have root access to all our systems, even if we don’t have to read every line of code.
Linus doesn't have root access to our systems for several reasons. One of them is the fact that we get the actual source code, and not just a compiled blob doing "something". Another is the fact that they have at least some level of reviews wrt who can commit code, although this isn't perfect as the case with the University of Minnesota proved.
Npm on the other hand is much, much worse. Anyone can publish anything they want, and they can point to any random source code repository claiming that this is the source. If we look at how often vulnerable packages are discovered in eg. npm, I'd argue that the current level of trust and quality aren't sustainable, partly due to the potentially huge number of direct and transitive dependencies a project may have.
Unless you start to review the actual component you have no way to verify this, and unlike the Linux kernel there is no promise that anyone has ever reviewed the package you download. You can of course add free tools such as the OWASP Dependency Check, but these will typically lag a bit behind as they rely on published vulnerabilities. Other tools such as the Sonatype Nexus platform is more proactive, but can be expensive.
Maybe this is arguing semantics, but unless you run something like Gentoo you will most likely get the Linux kernel as a binary blob contained in a package your distribution provides. There isn't really any guarantee that this will actually contain untampered Linux kernel sources (and in the case of something like RHEL it most likely doesn't, because of backports) unless you audit it, which most people won't do (and maybe can't do). So, in principle at least, this isn't really that much better than the node_modules situation.
Security and trust are hard issues and piling on 100s of random js dependencies sure doesn't help but you either build everything yourself or you need to trust somebody at some point.
It depends on how you look at it. If I'm running Debian, I have decided to trust their sources and their process, regardless of how their software is being delivered. That process and the implementation of it is the basis for my trust. If I'm really paranoid, I can even attempt to reproduce the builds to verify that nothing has changed between the source and the binary blob.[1]
For npm, trust isn't really a concept. The repository is just a channel used to publish packages, they don't accept any responsibility for anything published, which is fair considering they allow anyone to publish for free. There are no mechanisms in npm that can help you verify the origin of a package and point to a publicly available source code repository or that ensures that the account owner was actually the person who published the package.
Security and trust is very hard, but my point here is that npm does nothing to facilitate either, making it very difficult for the average developer to be aware of any issues. The one tool you get with npm is...not really working the way it was supposed to.[2]
I 100% agree and I kind of wonder why this doesn't seem to be a problem with similar repositories like maven. That doesn't seem to hit HN every 1-2 weeks with a new security flaw/compromised package so they seem to be doing something right, whatever that may be.
It's likely to be a combination of several things. Npm is trendy and has a low threshold for getting started, plus the fact that adding eg. bitcoin miners to a website is a nice way to decentralize and ramp up mining capacity.
Maven on the other hand defines several requirements, such as all files in a package being signed and more metadata, and they also provide free tools the developers can use to improve the quality of a package.
You do need to trust somebody (such as your Linux distribution of choice) but with NPM you're trusting thousands of somebodies and your system's security depends directly on all of them being secure and trustworthy.
Linux has all sorts of controls and review policies that NPM doesn't have. It's a false equivalence to say "we trust Linux, so therefore trusting NPM is OK".
If <random maintainer> commits code to their repo, pushes it to npm, and you pull that in to your project (possibly as an indirect dependency), what controls are in place to ensure that that code is not malicious? As far as I can tell, there are none. So how is this not trusting that <random maintainer> with commit-to-prod privileges?
Yeah, this is what I meant, except it goes in all directions. It’s not stating a “false equivalence” because pointing out that you can draw a line between 0 and 100 isn’t stating an equivalence.
Different risk profiles exist. There’s a difference between installing whatever from wherever, installing a relatively well known project but with only one or two Actually Trusted maintainers, and installing a high profile well maintained project with corporate backing.
This is true in Linux land, and it’s true in npm land. You can’t just add whatever repo and apt get to your hearts content. Or, you know, you also can, depending on your tolerance for risk.
I agree with what you're saying, but I don't see any discussion of risk in any conversation about JS programming (and I'm only picking on JS because of the OP - Ruby and Python aren't any better, and even Rust is heading the same way).
For example (taking one of the top results for "javascript dependency management" at random): https://webdesign.tutsplus.com/tutorials/a-guide-to-dependen... talks about all the dependency management methods available. The word "risk" is not in that article. There is no paragraph saying "be aware that none of these package managers audit any of the packages they serve, and you are at risk of supply-chain attack if you import a dependency using any of them".
This doesn't get any better as you get more expert. I've had conversations with JS devs who've been professionally coding for years, and none of them are aware of it (or if they are, they don't treat it as a serious threat). You can see the same in the comments here.
If there's not even any discussion of risk, and no efforts to manage it, then it's not really a relevant factor. No-one is considering the risk of importing dependencies, so the 0-100 scale is permanently stuck on 100.
And should we all start rolling our own crypto now to avoid dependencies? In most cases a stable library is going to be much more secure than a custom implementation of `x`. Everything has trade-offs. What's stupid is dogma.
I know you're being hyperbolic and I also want to add that for crypto you should just use libsodium. The algos and the code are very good. And lots of very smart folk have given it a lot of review. And its API is very nice.
When you say this, do you mean actual C libsodium? Because surely you don’t mean that I, a js developer, should need to figure out how to wrap this .h file thingy to get it to work in js when there’s SIX third-party libsodium implementations/wrappers/projects sitting right there listed on the libsodium website? /s
> Even if you’re writing just for a vetted distribution of an OS, and you write code with zero explicit dependencies, you still have much more than zero dependencies.
Sure, the entire OS is a dependency. Nothing I said contradicts that. And yes, every application developer should be aware of what they are depending on when they write software for a particular OS.
> The key is to have an entire ecosystem that you can, to some degree more or less, trust.
You don't necessarily need to trust an entire ecosystem, but yes, every dependency you have is a matter of trust on your part; you are trusting the dependency to work the way you need it to work and not to introduce vulnerabilities that you aren't aware of and can't deal with. Which is why you need to be explicitly aware of every dependency you have, not just the ones you directly import.
I am actually not sure if this is possible, while also accepting security updates etc from my OS distributor? How do you literally personally vet every line of code that gets run directly AND indirectly by your application, and still have time to write an application?
I’m okay with saying, “I trust RHEL to be roughly ok, just understand the model and how to use it, and keep my ear to the ground for the experts in case something comes up.”
At the level of npm, I feel roughly the same about React. I don’t trust it quite as much, but I’m also not going to read every code change. I’ll read a CHANGELOG, sure, and spelunk through the code from time to time, but that’s not really the same. I’ll probably check out their direct dependencies the first time, but that’s it.
I actually don’t know how you could call yourself an application developer in most ecosystems and know every single dependency you actually have all the way down, soup to nuts. Heck, there are dependencies that I accept so that my code will run on machines that I have no special knowledge of, not just my own familiar architecture. I accept them because I want to work on the details of my application and have it be useful on more than just my own machine.
Edit for clarity: I agree with almost everything you’re suggesting as sensible. Just not with your conclusion: that you’re not a “real” application developer if you don’t know all of your dependencies
> I am actually not sure if this is possible, while also accepting security updates etc from my OS distributor?
Accepting the OS as a dependency includes the security updates from the OS, sure.
> How do you literally personally vet every line of code
Ah, I see, you think "understanding the dependency" requires vetting every line of code. That's not what I meant. What I meant is, if you use library A, and library A depends on libraries B, C, and D, and those libraries in turn depend on libraries E, F, G, H, I, etc. etc., then you don't just need to be aware that you depend on library A, because that's the only one you're directly importing. You need to be aware of all the dependencies, all the way down. You might not personally vet every line of code in every one of them, but you need to be aware that you're using them and you need to be aware of how trustworthy they are, so you can judge whether it's really worth having them and exposing your application to the risks of using them.
> I’ll probably check out their direct dependencies the first time, but that’s it.
So if they introduce a new dependency, you don't care? You should. That's the kind of thing I'm talking about. Again, you might not go and vet every line of code in the new dependency, but you need to be aware that it's there and how risky it is.
> I actually don’t know how you could call yourself an application developer in most ecosystems and know every single dependency you actually have all the way down, soup to nuts.
If you're developing using open source code, information about what dependencies a given library has is easily discoverable. If you're developing for a proprietary system, things might be different.
I really appreciate your stance, but just have to disagree. If it’s core React, I don’t check beyond what curiosity mandates. If it’s a smaller project with less eyes on it, yes absolutely I’ll work through the dependency chain. But that can also get pretty context dependent, based on where the code is deployed.
But I don’t know how you can make such a strong distinction between “a committed line of code” vs “a dependency”, because the only thing differentiating them is the relative strength of earned trust regarding commits to “stdlib,” commits to “core,” commits to “community adopted,” etc.
It’s too much. There’s a long road of grey between “manually checks every line running on all possible systems where code runs and verifies code against compiled binary” and “just run npm install and yer done!”
I only imported 10 dependencies, but those 10 dependencies each had 10 dependencies which each had 10 dependencies which each had 10 dependencies and all of the sudden I'm at 10k dependencies again...
The transitive dependency chain should be part of your evaluation of a library. Frameworks are special cases, for sure. But if you’re adding a dependency and it adds 10,000 new entries to your lock file, that should be taken into consideration during your library selection process. Likewise, when upgrading dependencies, you should watch how much of the world gets pulled in.
That said, I don’t know what the answer is for JS. There are too many dependency cycles that make auditing upgrades intractable. If you’re not constantly upgrading libraries, you’ll be unable to add a new one because it probably relies on a newer version of something you already had. In most other ecosystems, upgrading can be a more deliberate activity. I tried to audit NPM module upgrades and it’s next to impossible if using something like Create React App. The last time I tried Create React App, yarn-audit reported ~5,000 security issues on a freshly created app. Many were duplicates due to the same module being depended on multiple times, but it’s still problematic.
That's going to be incompatible with writing interesting software on the web, unless we want to just hand the problem over to a handful of big players who can afford to hand-vet 10,000 dependencies.
The reason packages are so big is that the complexity of an interesting app is irreducible. People don't import thousands of modules for fun; they do it because simple software tends towards requiring complex underpinnings. Consider the amount of operating system that underlies a simple "Hello, world!" GUI app. And since the browser-provided abstractions are trash for writing a web app, people swap them out with frameworks.
I'm working on a React app right now where I've imported about a dozen dependencies explicitly (half of which are TypeScript @type files, so closer to a half-dozen). The total size of my `node_modules` directory is closer to a couple hundred packages. It's 35MB of files. And no, I couldn't really leave any of them out to do the thing I want to do, unfortunately.
People oftentimes do this, with suspicious reasoning. Classic examples:
1) "We have is-array as a dependency" Why? Well, pre Array.isArray, there wasn't anything built-in. Why not just write a little utility function which does what is-array does? See #3
2) "We have both joi and io-ts. Don't they do roughly the same thing?" They do; io object validation. New code uses io-ts, but a bunch of old code relies on joi. Should we update it? Eh we'll get around to it (we never do).
3) "is-array is ten lines of code. why don't we just copy-paste it?" Multiple arguments against this, most bad. Maybe the license doesn't support it. More usually; fear that something will change and you'll have to maintain the code you've pasted without the skills to do so. Better to outsource it (then, naturally, discount the cost of outsourcing).
4) "JSON.parse is built-in, but we want to use YAML for this". So, you use YAML. And need a dependency. Just use JSON! This is all-over, not just in serialization, but in UI especially; the cost analysis between building some UI component (reasonably understood cost) versus finding a library for it (poorly understood cost, always underestimated).
Not all dependency usage is irreducible. Most is. But some of it is born, fundamentally, out of a cost discount on dependency maintenance and a corporate deprioritization of security (in action; usually not in words).
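For what it's worth, here is roughly what "just write the little utility" from #1/#3 amounts to. A vendored sketch like this is the code you would own instead of the dependency; the fallback branch only exists to show what the pre-Array.isArray version looked like:

```typescript
// isArray.ts - the kind of tiny utility that often arrives as a package.
// On modern runtimes it is a one-liner around the built-in.
export function isArray(value: unknown): value is unknown[] {
  return typeof Array.isArray === "function"
    ? Array.isArray(value)
    : Object.prototype.toString.call(value) === "[object Array]";
}
```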
The counterpoint is all the security issues generated when dev teams re-implement the already-well-implemented. Your points are valid, but as with anything, it is not cut and dry.
If your software is ultimately dependent on thousands of other modules from various developers all over the Internet, you have no idea whether what you're depending on is actually well implemented or not.
No. First, Linux is an entire operating system, not a single application. Second, when people pull software from their Linux distribution that ultimately comes from developers all over the Internet, they do it to use the software themselves, not to develop applications that others are going to have to deal with. Third, Linux distributions put an extra layer of vetting in between their upstream developers and their users. And for a fourth if we need it, I am not aware of any major Linux distribution that has pulled anything like the bonehead mistakes that were admitted to in this article.
> No. First, Linux is an entire operating system, not a single application.
Sorry, to clarify: when I say "Linux distro" here, I mean the distribution package sets, like Debian or Ubuntu.
> Second, when people pull software from their Linux distribution that ultimately comes from developers all over the Internet, they do it to use the software themselves, not to develop applications that others are going to have to deal with.
The distros are chock full of intermediary code libraries that people use all the time to build novel applications depending on those libraries, which they then distribute via the distro package managers. I'm not quite sure what you mean here... I've never downloaded libfftw3-bin for its own sake; 100% of the time I've done that because someone developed an application using it that I now have to deal with.
Conversely, I've also used NodeJS and npm to build applications I intend to use myself. It's a great framework for making a standalone localhost-only server that talks to a Chrome plugin to augment the behavior of some site (like synchronizing between GitHub and a local code repo by allowing me to kick off a push or PR from both the command line and the browser with the same service).
> Third, Linux distributions put an extra layer of vetting in between their upstream developers and their users.
This is a good point. It's a centralization where npm tries to solve this problem via a distributed solution, but I'm personally leaning in the direction that the solution the distros use is the right way to go.
When I'm writing desktop software, I don't have to worry about whether yaml adds a dependency that I can't afford to maintain.
People who develop web apps want that level of convenience. And if we can't solve the security problem in a distributed fashion, web development will end up owned by big players who can pay the money to solve the problem in a centralized fashion.
> When I'm writing desktop software, I don't have to worry about whether yaml adds a dependency that I can't afford to maintain.
Why not? Because some big, centralized player has put the time, effort, and money into making yaml part of a complete library that gives you everything you need to write desktop software. Nobody writes desktop software by importing thousands of tiny libraries from all over the Internet.
> That's going to be incompatible with writing interesting software on the web, unless we want to just hand the problem over to a handful of big players who can afford to hand-vet 10,000 dependencies.
Consolidating into a distro-management-style solution would be one option.
> why don't we just copy-paste it? ... Maybe license doesn't support it.
You did say the argument was bad, but a license that prevents you from making a copy manually yet allows you to make a copy through the package manager isn't a thing, is it? In either case the output of your build process is a derived work that needs to comply with the license.
Unless, perhaps, you have a LGPL dependency that you include by dynamic linking (or the equivalent in JS – inclusion as a separate script rather than bundling it?) in a non-GPL application and make sure the end user is given the opportunity to replace with their own version as required by the license.
> The reason packages are so big is that the complexity of an interesting app is irreducible
These kinds of claims demand data, not just bare assertions of their truthiness.
Firefox, as an app with an Electron-style architecture (before Electron even existed), was doing some pretty interesting stuff circa 2011 (including stuff that it can't do now, like give you a menu item and a toolbar button that takes you to a page's RSS feed), with a bunch of its application logic embodied in well under 250k LOC of JS.
The last time I measured it, a Hello World created by following create-react-app's README required about half a _gigabyte_ of disk space between just before the first `npm install` and "done".
That NPM programmers don't know _how_ to write code without the kind of complexity that we see today is one matter. The claim that the complexity is irreducible is an entirely different matter.
Firefox's 250k LOC are riding on the millions of lines of code of the underlying operating system and GUI | TCP | audio toolkits that it used. To compare it to npm development, you would need to factor in the total footprint of every package that you had to install to compile Firefox in 2011.
... And I think it's an interesting question to ask why we can trust the security of, say, Debian packages and not npm, given how many packages I have to pull down to compile Firefox that I haven't personally vetted.
> Firefox's 250k LOC are riding on the millions of lines of code of the underlying operating system and GUI | TCP | audio toolkits that it used.
Right, just like every other Electron-style app that exists. The comparison I made was a fair one.
> To compare it to npm development, you would need to factor in the total footprint of every package that you had to install to compile Firefox in 2011.
No, you wouldn't. That's a completely off-the-wall comparison.
How many lines of application code (business logic written in JS including transitive NPM dependencies before minification) go into a typical Electron app in 2021? Into a medium sized web app? Is the heft-to-strength ratio (smaller is better) less than that of Firefox 4, about the same, or ⋙?
After I compile my Rust or C app (and pull all attendant libraries to make that possible, spread all over my system) I’ve downloaded about 500MB of code. The resultant binary is 10MB.
If I do the same thing with my JS app, I still download a bunch of libraries, but it puts them all in node_modules. That’s also about 500MB. The resulting compiled/built code is around 2MB.
> The reason packages are so big is that the complexity of an interesting app is irreducible.
This is absolutely, demonstrably false. Can you really claim that you use 100% of the features provided by all of the dependencies you pull in? If not, you are introducing unnecessary complexity to your code.
That doesn't mean that this is necessarily a bad thing, or that we should never ever introduce incidental complexity—we'd never get anything done if that was the case. My point is simply that there exists a spectrum that goes from "write everything from scratch" on one end all the way to "always use third-party code wherever possible" on the other. It's up to you to make the tradeoff of which libraries are worth pulling in for a given project, but when you use third-party code, you inevitably introduce some amount of complexity that has nothing to do with your app and doesn't need to be there.
I don't use 100% of the features I pull in. But I also don't use 100% of the features of libc or gtk if I'm building a GUI app in C.
I have 35 MB of node_modules, but after webpack walks the module hierarchy and tree-shakes out all module exports that aren't reachable, I'm left with a couple hundred kilobytes of code in the final product.
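The thing doing the work there is that the bundler can only drop what static imports let it see. A small illustrative contrast (lodash-es is a real ESM build of lodash; the surrounding code is just an example):

```typescript
// Named ESM imports give the bundler a static module graph, so exports that
// are never referenced can be dropped ("tree-shaken") from the final bundle.
import { debounce } from "lodash-es"; // only this export has to survive

// By contrast, pulling in the whole CommonJS build tends to defeat tree
// shaking, because the bundler has to assume everything might be used:
// import _ from "lodash";

export const onResize = debounce(() => {
  console.log("window resized");
}, 250);
```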
> But I also don't use 100% of the features of libc or gtk if I'm building a GUI app in C.
That’s exactly my point. This is a tradeoff that’s inherent to software development and has nothing to do with the web or Node or NPM. You could just as well decide to write your desktop app with a much smaller GUI library, or even write your own minimal one, if the tradeoff is worth it to reduce complexity. (Example: you’re writing an app for an embedded device with very limited resources that won’t be able to handle GTK.)
> browser-provided abstractions are trash for writing a web app
This is the key.
If browsers would improve here we wouldn't need half of the dependencies that we use now. It took nearly a decade to get from moment.js to some proper usable native functions for example.
Besides that we _really_ need to solve the issue of outdated browsers. Because even when those native APIs exist we'll need fallbacks and polyfills and lots of devs will opt for a non-standard option (for various reasons).
The web is still a document platform with some interactivity bolted on top, I love it but it's a fucking mess.
Without more information, this mindset is stuck where the web platform was maybe a decade or more ago. Roughly a dog or cat lifetime. Consider the list of APIs at https://developer.mozilla.org/en-US/docs/Web/API. I'd be curious to know if anyone active on HN could actually say they have proficiency with the entire list. Professionally speaking I wouldn't call that a mess. I'd call it a largely unused and unexplored opportunity.
Somehow people managed to develop useful software before NPM and node and so on, without having thousands of very small dependencies. Maybe it's because the stuff built in to Javascript is nearly useless? And the older languages had a standard library that included most of the useful stuff you'd need to build something?
Ruby, Python, Go, Rust, etc all have this exact same problem; it's not unique to NPM.
JS has a culture of using lots of small, composable modules that do one thing well rather than large, monolithic frameworks, but that's only an aggravating factor; it's not the root of the problem.
They do not; they have capable and trusted standard libraries, and it’s quite possible to build a web app in those other languages without any external dependencies whatsoever.
JS and its culture of small dependencies that do one thing but import 100 other things to do that thing is the root of the problem here.
The GNU software ecosystem can be described as "culture of small dependencies that do one thing but import 100 other things to do that thing..." Installing, say, GIMP for the first time using `apt-get install` pulls in about 50 packages and many, many megabytes in total.
So the issue is probably something other than using bazaar-style code design. I think as other people in the thread have noted, the distros have centralized, managed, and curated package libraries that get periodically version "check-pointed" and this is not how npm works.
I may have my answer to the original thought I floated: the way this problem has been solved successfully is to centralize responsibility for oversight instead of distributing it.
Part of that was that we didn't make major changes to how we did things every other project back then. If we needed to do X and that wasn't built in to the language or standard library we were using we would either write our own X library or we could take the time to carefully evaluate the available third party X libraries and pick a high quality one to use. We could justify spending the time on that because we knew we'd be taking care of not just our immediate X needs but also the X needs for our next few years worth of projects.
> That's going to be incompatible with writing interesting software on the web
Lots of people are writing interesting web software without these problems - the website you’re currently posting on is one example. So I completely disagree with this statement and think you need to examine your assumptions.
"Interesting" was a bad choice for specificity here on my part. By the definition I mean, HN isn't interesting... It's got interesting content, but the UI is a dirt-simple server-side-generated web form.
OpenStreetMap is "interesting." Docs and Sheets are "interesting." Autodesk Fusion 360 is "interesting." Facebook is "interesting." Cloud service monitoring graph builders are "interesting." The Scratch in-browser graphical coding tool is "interesting." Sites that are pushing the edge of what the browser technology is capable of are "interesting."
None of the sites you mention above would require npm to build.
At some stage after you've seen enough 'interesting' dependencies changing the world around your app as you write it you'll realise that boring is good for most of the tech you depend on - the more boring the better, and the fewer dependencies the better.
I have to think there's a lot of YAGNI going on, dependencies that are included to be a better version of native functionality. A faster JSON parser, say, with I dunno, 20 dependencies (a count which may further extend within those deps) for something where slow JSON parsing has not yet become an issue. I think there's a lot of "academic" inclusions out there like this.
My experience working on tens of front end projects is the complete opposite. Nobody is adding dependencies just for the fun of it, or because you might need it in a year. You add a dependency because you need some functionality and there is no time/budget to re-do it in house - not to mention that if it's a well-supported library with, for example, hundreds of thousands of users, it's unlikely you could even make it better.
What are the actual time cost savings when you take the total costs into consideration?[1][2] What would it look like if you didn't implement an app by stringing together dozens/hundreds/thousands of third-party modules implemented bottom-up, but instead took control of the whole thing top-down?[3]
I agree that using node to write browser client code requires more configuration of the compilation environment than I would like (especially since I have to configure both node and some kind of packer to convert all of my es6 module dependencies into one flat pack JavaScript file).
That's a small up-front one-time cost relative to writing Redux from scratch. And before anyone asks... Yes, our use case is complex enough to justify a local state storage solution based on immutable state curated via actions and reducers. Just as our rendering use case is complex enough to justify React.
Well, that's what I'm wondering. GNU/Linux distros like Debian and Ubuntu don't seem to suffer supply chain attacks, but it's not entirely clear to me why. Is it because the distros are more carefully curated, and the infrastructure for extending them older so it has had more time to wrestle security concerns to the ground?
Or is it, disquietingly, the possibility that they are completely vulnerable to this sort of attack, and either nobody has noticed they're compromised or attackers haven't decided that compromising a major desktop Linux distro is worth the time?
Distributions like Debian are _highly_ aware of supply chain attacks. That's one of the key reasons for projects like Reproducible Builds [0] and rekor [1] existing.
So yes, distributions are carefully curated, with a large team of experts vetting the system in a huge number of ways, and are always looking to improve upon them. Because attackers are actively attempting to compromise major distributions.
Unfortunately most modern JavaScript tooling has made this very difficult. Before you even have a "hello world" app running create-react-app et al. will install literally a thousand random packages. It's already over.
What’s the alternative? Writing everything in house? I think a better solution would be a better dependency installer/resolver that is as secure as possible.
Don't use the popular hype garbage. Yes, I realize that may not be an option for a lot of people professionally. But I believe if you actually spend some time on due diligence for any dependency you consider adding, you can significantly reduce the number of untrusted deps you pull in.
One of the problems of course is that javascript exacerbates this problem somewhat by not having a comprehensive standard library. But whenever I look for go libraries, go.sum is usually one of the first files I click to check how much garbage it pulls in.
Standard library is a dependency too and can have bugs in it. What's better - having stdlib tied to the runtime release schedule or having a lot of micro libraries on their own rolling release schedule which can quickly release security patches?
I agree, having those dependencies authored by the Node.js Foundation itself would yield a higher level of trust. But we're all human, and one can argue earnest open source developers have better aligned incentives than a randomly selected Node.js Foundation employee.
I honestly am not sure I fully agree with what I've just written above either. But one thing I would want to pinpoint: those things are NOT black and white. The specific set of trade-offs the Node.js ecosystem has fallen into might look accidental and inadequate. But I think it's fairly reasonable.
You’re not wrong, I’ll admit, but if we judge everything by the most extreme examples we’d still be writing assembly and only mathematicians would be programmers. I’m sure there’s a universe where that’s the case, and I’m sure there’s a percentage of people here who wish that were the case, but I’d say the world is better off with separation of skill sets and I’d rather leave the writing of libraries to people who enjoy writing them and can do it well.
How about we just go back to writing all the trivial stuff in house?
Nobody is suggesting we each write our own charting library, but we should each be capable of writing that function that picks a random integer between 10 and 15. Because the npm version of that function will have the four thousand dependencies that everybody likes to mock whenever npm is discussed.
Other People’s Javascript is generally pretty terrible. My policy is to only use it when absolutely necessary.
Frameworks and library authors could stand to do more in-house. It's also on devs to vet a library for maintenance concerns like sprawling dependencies.
Or a very large dep like Apache Commons in Java that you can trust, rather than one dependency for zip compression, one dependency for padding, one dependency for HTTP error codes, and so on?
How do you police what your imports import? Serious question. Let's say I'm building a Discord app (as I want to do.) Well, either NPM or Python PIP to get one module - the discord module. But who knows how safe what it imports is. That's the point.
Are there stable dependencies from reputable companies that do the things I want without me vetting 10k submodule imports?
That's the crux of the matter. Server-side you can, and should, choose a different platform than Node.js but for the browser we're all stuck with JS. A more capable standard library, where vetting everything would be much more feasible, would do much to improve the situation.
Just a week or two ago, a malicious NPM package was published which, for the hour or so that it was up, would be pulled in by any installation of create-react-app, since somewhere in the dependency tree it was specified with “^” to allow for minor updates.
Any machine that ran `npm i` with CRA or who knows how many other projects during that hour may have compromised credentials.
1 hour to find and unpublish the malicious package is a fast turnaround time, so someone was watching and that’s great. But any NPM tree that includes anything other than fully-specified and locked versions all the way down the tree is just waiting for the next shoe to drop.
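For anyone unsure what "^" actually permits, a tiny illustration using the `semver` package (the package and its `satisfies` function are real; the version numbers are made up):

```typescript
// caret-ranges.ts - what "^" allows: any version that keeps the left-most
// non-zero component the same.
import semver from "semver";

console.log(semver.satisfies("1.2.4", "^1.2.3")); // true  - patch bumps are pulled in automatically
console.log(semver.satisfies("1.9.0", "^1.2.3")); // true  - so are minor bumps
console.log(semver.satisfies("2.0.0", "^1.2.3")); // false - majors are excluded
console.log(semver.satisfies("1.2.4", "1.2.3"));  // false - an exact pin only ever matches itself
```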
This requires that you're pulling in only exactly the same versions of those dependencies as those that Facebook and Google have vetted. Is there a way to do that?
3. Restricting build scripts from touching anything outside of the build directory.
4. Pressuring organizations like npm to step up their security game.
It would be really nice if package repositories:
1. Produced a signed audit log
2. Supported signing keys for said audit log
3. Supported strong 2FA methods
4. Created tooling that didn't run build scripts with full system access
etc etc etc
I started working on a crates.io mirror and a `cargo sandbox [build|check|etc]` command that would allow crates to specify a permissions manifest for their build scripts, store the policy in a lockfile, and then warn you if a locked policy increased in scope. I'm too busy to finish it but it isn't very hard to do.
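Not the actual cargo tooling described above, just a quick TypeScript sketch of the "warn when a locked policy grows in scope" idea, with a made-up policy shape:

```typescript
// policy-diff.ts - illustrative only: compare a build script's requested
// permissions against the policy recorded in a lockfile and flag growth.
interface Policy {
  network: boolean;     // may the build script open network connections?
  writePaths: string[]; // directories it may write to
}

function increasedScope(locked: Policy, requested: Policy): string[] {
  const warnings: string[] = [];
  if (requested.network && !locked.network) {
    warnings.push("now requests network access");
  }
  for (const path of requested.writePaths) {
    if (!locked.writePaths.includes(path)) {
      warnings.push(`now requests write access to ${path}`);
    }
  }
  return warnings;
}

// Example: the policy locked at the previous build vs. the new manifest.
const locked: Policy = { network: false, writePaths: ["./target"] };
const requested: Policy = { network: true, writePaths: ["./target", "/home"] };
console.log(increasedScope(locked, requested));
// -> [ "now requests network access", "now requests write access to /home" ]
```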
Thanks. I was thinking of a CI step that checked the SHA-256 of yarn.lock against a "last known good" value committed by an authorized committer and enforced by a branch policy.
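A minimal sketch of that CI step using Node's built-in crypto. The hashing is standard; the file names and where the "last known good" value lives are placeholders:

```typescript
// check-lockfile.ts - illustrative CI gate: fail the build if yarn.lock no
// longer matches the hash an authorized committer signed off on.
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hash committed separately and protected by a branch policy (placeholder path).
const knownGood = readFileSync("yarn.lock.sha256", "utf8").trim();
const actual = createHash("sha256").update(readFileSync("yarn.lock")).digest("hex");

if (actual !== knownGood) {
  console.error(`yarn.lock hash mismatch: expected ${knownGood}, got ${actual}`);
  process.exit(1);
}
console.log("yarn.lock matches the last known good hash");
```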
Signed audit logs seem like a good idea.
Now...how to get developers to avoid using NPM and Yarn altogether on sensitive projects...
>How do we even mitigate against these types of supply-chain attacks
I know HN is usually skeptical of anything cryptocurrency/blockchain related, and I am too. But as weird as it sounds, I think blockchain might actually be the solution here.
The problem with dependency auditing is it's a lot of work. And it's also duplicate work. What you'd really like to know is whether the dependency you're considering has already been audited by someone you can trust.
Ideally someone with skin in the game. Someone who stands to lose something if their audit is incorrect.
Imagine a DeFi app that lets people buy and sell insurance for any commit hash of any open source library. The insurance pays out if a vulnerability in that commit hash is found.
* As a library user, you want to buy insurance for every library you use. If you experience a security breach, the money you get from the insurance will help you deal with the aftermath.
* As an independent hacker, you can make passive income by auditing libraries and selling insurance for the ones that seem solid. If you identify a security flaw, buy up insurance for that library, then publicize the flaw for a big payday.
* A distributed, anonymous marketplace is actually valuable here, because it encourages "insider trading" on the part of people who work for offensive cybersecurity orgs. Suppose Jane Hacker is working with a criminal org that's successfully penetrated a particular library. Suppose Jane wants to leave her life of crime behind. All she has to do is buy up insurance for the library that was penetrated and then anonymously disclose the vulnerability.
* Even if you never trade on the insurance marketplace yourself, you can get a general idea of how risky a library is by checking how much its insurance costs. (Insurance might be subject to price manipulation by offensive cybersecurity orgs, but independent hackers would be incentivized to identify and correct such price manipulation.)
The fact that there is actual value here should give the creator a huge advantage over other "Web 3.0" crypto junk.
This is a pretty clever application of DeFi, thanks. DeSec? Can't help but wonder if there still would be incentive for lone wolves to slip backdoors and vulnerabilities into libraries though[0].
> I can't help but wonder if the root cause was HTTP request smuggling, or if changing package.json was enough.
Maybe I'm just incredibly cynical from my experiences with the intersection of the JS ecosystem and security, but...
...I'd bet dimes to dollars it's the latter (just changing the package.json). My guess is they authenticate but don't actually scope the authentication properly, and no one noticed because no one thought to look.
Of course, as we've seen in the past decade, there's so much inertia behind the JavaScript ecosystem that none of this is going to fundamentally change. It'll just take another decade or so for the ecosystem to reinvent all of the wheels and catch up to the rest of the space.
And at that point it will probably be considered stuffy and "enterprise" and the new hotness unburdened from such concerns will repeat the cycle again.
The 'wheels' might simply be having a standard library and fewer packages instead of a micropackage mess.
For example, look at Django: it provides more functionality than React (though they're not directly comparable), but installation is quick and there is a small number of packages from trusted authors.
The ecosystem is orthogonal to how good the package manager is.
It’s a requirement for the central repo if I recall.
And the best part is the signature handling is a part of Java, not the package manager, so nothing needs to be re-invented. The default class loader checks the signatures at runtime as well.
Typically you need 1-2 repositories, but often just 1. But if you’re an organization, you can set up your own repository very easily and use it to store private deps and to cache deps (which also allows you to lock binaries and work offline). Repo mirroring is super easy to set up. If you have an internal repo, you can just have your internal project use your own repo, and your computer never has to reach out to the Internet directly for a package.
Unlike other languages, the “central repo” and the package manager tooling are independent and package resolution is distributed. When you start a project, you choose your repos. I don’t know how quickly Sonatype would react personally but they are only default by de facto. Many packages are published on several repos and mirroring is a default feature of a lot of repo software. If Sonatype started screwing up, everyone could abandon them instantly, which forces them to be better.
> I'm seriously considering moving to a workflow of installing dependencies in containers or VMs, auditing them there, and then perhaps committing known safe snapshots of node_modules into my repos (YUCK). Horrible developer experience, but at least it'll help me sleep at night.
I have had people tell me in discussions online, also entirely seriously, that running a package manager to install a dependency while developing is inherently dangerous and anyone who does it outside of a disposable sandboxed VM deserves everything they get. If the packages are inexplicably allowed to do arbitrary things with privileged access to the local system without warning at installation time then clearly the first part is correct, but victim-blaming hardly seems like a useful reaction to that danger.
>How do we even mitigate against these types of supply-chain attacks, aside from disabling run-scripts, using lockfiles and carefully auditing the entire dependency tree on every module update?
Don't trust the package distribution system - use public key crypto.
Public key crypto doesn't help much if your private keys get stolen, which was essentially what happened with some of the recent hacked packages and which is why they're now starting to enforce 2FA.
The longer term solution to this is public key signatures with an ephemeral key, rooted to some trusted identity source (e.g., a GitHub account with strong 2FA). There’s lots of work on that front coming out of the Open Source Security Foundation.
It's very easy: add a dev signature in the repo that can never be changed, and force the devs to sign their stuff before allowing a change of binary or a download.
That way, anything can try to upload, but it will fail the signature check.
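A rough sketch of the verification side of that idea, using Node's built-in ed25519 support. The file names and key handling are hand-waved placeholders; a real registry would also need a key distribution and rotation story:

```typescript
// verify-package.ts - illustrative: refuse a package tarball unless a detached
// ed25519 signature from the pinned dev key verifies.
import { createPublicKey, verify } from "node:crypto";
import { readFileSync } from "node:fs";

const publicKey = createPublicKey(readFileSync("dev-key.pub.pem")); // the pinned, never-changing dev key
const tarball = readFileSync("package-1.2.3.tgz");
const signature = readFileSync("package-1.2.3.tgz.sig");

// For ed25519 the algorithm argument is null; the signature covers the raw bytes.
if (!verify(null, tarball, publicKey, signature)) {
  throw new Error("Signature check failed: rejecting this upload/download");
}
console.log("Signature OK");
```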
Also: "This vulnerability existed in the npm registry beyond the timeframe for which we have telemetry to determine whether it has ever been exploited maliciously."
The part that made sure the user could update the package could at least have checked whether the payload actually referred to that package before passing it along to the service that trusted it.
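Something as simple as this, at the boundary between the two services, would have closed the hole. A hypothetical sketch, with invented names, of the consistency check described above:

    // Hypothetical sketch: refuse to forward an upload whose embedded package name
    // differs from the package the caller was actually authorized for via the URL path.
    interface UploadedPackage {
      name: string;      // taken from the package.json inside the uploaded tarball
      version: string;
      tarball: Buffer;
    }

    function assertConsistent(authorizedName: string, upload: UploadedPackage): void {
      if (upload.name !== authorizedName) {
        throw new Error(
          `payload claims to be "${upload.name}" but the request was authorized for "${authorizedName}"`
        );
      }
    }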
That one, combined with the other issue (the ability to read the names of private packages), makes for the possibility of a really, really sneaky attack. I wonder how many orgs treat their private npm packages with significantly less scrutiny than the public ones they rely on?
CVEs alert end users that they need to take action to apply updates. That's relevant when a specific npm package contained a known vulnerability. It's not relevant when the npm server contained a known vulnerability. There's nothing a user of npm can do to update the npm server.
CVEs don't just mean "this is a big security problem".
Isn't this the biggest security flaw in the package ecosystem ever?
They don't even know if, when, or by whom this was exploited, but maybe I didn't pay close enough attention to the few paragraphs devoted to the real problem.
So shouldn't we assume all NPM packages published prior to the 2nd of November are compromised?
> Transparency is key in maintaining the trust of our community.
and yet a security incident where it was possible to publish new versions of any npm package without proper authorization is nine paragraphs down, and isn't alluded to at all in the page or section titles. I'm not sure that's entirely in the best spirit of transparency.
NPM keeps me up at night. We have a CRA with over 300k node_modules files and over 1700 dependencies. Just one compromised dep and suddenly someone else is driving your AWS/Heroku CLI, stealing your credentials, etc. There was a malicious version of a user-agent string parser just a few weeks ago.
Even if you don't do that yourself, the culture is such that lots of NPM authors would rather add a dependency than write a few lines themselves, thinking nothing of the additional security risk due to package takeover, nor of the cost to downstream users who might actually want to audit their dependencies.
The result is that instead of a dependency tree consisting of a few packages or a few dozen, you end up with an unmanageable number like 1700, coming from who knows how many authors.
I would not completely blame it on culture. There are practical upsides to having a large number of packages - using a dependency instead of duplicating code means the resulting code is smaller, and splitting large packages up allows you to only include specific functions you need. This is important for the web, which is a lot more size sensitive.
Nowadays tree shaking means that having a large package with lots of smaller functions should work better, especially since adding an import incurs its own overhead, but a lot of older packages are stuck on the small-packages model.
I work in an org with a few tens of "microservices". Some are written in Python, some in Go, and some in JS or TypeScript. Each yarn.lock file, individually, is bigger than the whole JS application it serves, and bigger than most of our Python or Go microservices in their entirety (including their lock files, which are like 40 to 100 lines). These JS dependency trees are completely un-reviewable and absolutely absurd.
The theory about tiny libraries enabling tiny programs might work if developers were targeting microcontrollers, but developers have comparatively infinite CPU, memory, and disk; they've lost perspective, and the result is truly ABSURD. 10 big fat libraries are smaller than 2000 micro-libraries.
This is what I've always thought the reason was as well. The JS ecosystem (at least since NPM has been around) has been about giving you the ability to not have to pull in a bloated, unfocused library. The complaints people have of libraries like that aren't even exclusive to the web. People working with C++ have long taken issue with massive dependencies. If the library is shipped as a DLL, then you end up bringing along all the code inside whether you use it or not.
But methods of removing unused code do a lot to take care of the problems that people have with larger libraries. In C++, header-only libraries help with the deployment size since it will only compile in what you're using. In Javascript, tree shaking will have the same effect.
It seems that if the Javascript ecosystem isn't moving towards larger libraries, it probably should. It would be much nicer to have one large library from a trusted source instead of a thousand small ones from who knows who.
Even if that's the case, you're still shipping the whole library. It's not an uncommon complaint that the file size of applications is massive (here on HN anyway, outside of tech I don't know if people really take notice).
If I have a library that needs to handle a bunch of general cases, but I only need 1 or 2 of them - it's probably less code to just write out those cases myself.
As a trite example, look at the source code for `is-even`. It imports the is-odd package, and the is-odd package has a bunch of error checking (and imports a library "is-number" to check errors too!) before it returns `n % 2 === 1` to is-even just to be negated.
Now blow this insanity up to all your packages of various sizes and you have a tonne of useless code that nobody needs.
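For reference, the entire useful behaviour of that chain, minus is-odd's input validation, is one line:

    // What is-even -> is-odd -> is-number boils down to for ordinary numbers:
    const isEven = (n: number): boolean => n % 2 === 0;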
I see "tree shaking" as a common defense (too often IMO) from people arguing over bloated code and dependencies.
But tree shaking doesn't apply at all in this scenario: these npm modules are installed on the backend, and tree shaking simply doesn't happen on the backend.
On top of that, have you had to go GitHub fishing to find a fork of an abandoned package? Try chasing down a version of an ffi module whose maintainers haven't ghosted.
The best answer I can provide is to nuke your entire software supply chain from orbit and start over with something new that doesn't require you to depend on potentially hundreds of arbitrary 3rd parties. Factor into this all of your tooling and infrastructure vendors as well.
You might not like our particular brand of medicine, but we are finding massive success in these supply chain matters with a pure Microsoft stack: .NET core/5/6, GitHub, Azure, VS2022, Server 2019, et al. We also technically use SQLite, but no one has ever attempted to probe us on that vendor, and it is incorporated well enough into the Microsoft death star (Microsoft.Data.Sqlite) to pass as yet another defensive armament at this point. We avoided shitty javascript web stacks by using technologies like Blazor, or just hand-rolling a little vanilla javascript when required (the horror, I know).
The reason we like this path is that we now require virtually zero additional third party dependencies for building our B2B products. .NET6 covers nearly 100% of the functions we require. On top of this we have 1-2 convenience nugets like Dapper (StackOverflow, et al.), but everything else is System.* or Microsoft.*. The only other 3rd party items we consume into the codebase come from vendors of our customers as part of our integration stack - typically in the form of WCF contracts or other codegen items. Now, we do hedge for the inevitable Microsoft framework churn by not getting too deep into certain pools like AspNetCore. For example, we have rolled all of our own authentication, logging & tracing middleware, since this is the area they seem most hellbent on changing over time.
Certainly, Microsoft has, can, and will drop the ball, but they also have a very long track record available to build [dis]trust upon. For us, we went with the trust path. If our customers run us through the due diligence gauntlet (and they will - we're in the banking industry), we can produce a snarky ~1 item list of vendors that makes life much easier for everyone involved. No one has ever given us a hard time for doing business with Microsoft. Typically, everyone we work with is also to some degree. Is this bad? Maybe. I am ambivalent about the whole thing because of how much energy they have apparently put into the open source side of things. I have actually been able to contribute to their process on GitHub and watch it come out the other side into a final release I could use to correct a problem we were having.
This is almost exactly the direction my team has gone. We use the MS stack (.NET Core, EF Core, MVC, SQL Server 2019, VS2019), we have rolled our own auth and logging, and we have written our own JS and CSS libs to handle browser interaction, adornment of basic input/select controls, and styling.
While it seems like a lot of work, IMHO, the tradeoff of using an ecosystem with such a massive attack surface (NPM) is simply a pill we couldn't swallow. For those of us building systems that actually NEED to be secured, the "convenience" of using NPM isn't worth being kept up at night thinking about all the ways your app could be fubar'd.
1) How much JS/CSS are we talking about for heavily interactive pages? Do you not even use lightweight libraries like knockoutjs or backbonejs?
2) Have you gone down the Blazor route yet?
3) What kind of system are y'all working on that requires this much security? I've worked for a very information-sensitive department of one of the top international banks where it was all sorts of npm galore.
1) We built our own event-driven system for JS interactions. It does exactly what we want, when we want, and comes in under a few thousand lines of code. The data transfer happens via a wrapper we wrote around the standard Fetch API called FetchWithTimeout (thank you David Walsh [0]); a rough sketch of the idea appears after point 3 below.
We also built our own animation library to handle things like graceful entry and exit transitions (e.g. when an item is deleted, it asynchronously swipes or fades out of view). All of this was done in vanilla JS/CSS. The only external library we used was a sub-1000 line library for toasting messages in the UI. We heavily extended this library and, hopefully, improved upon the original design. We also wrote our own CSS utility library. It's bare bones but it is exactly what we need.
2) We considered Blazor but went with MVC instead. Better suited to our skill set and definitely a bit more optimized when compared to Blazor (at least it was when we started our project).
3) We are building an internal-facing financial management system. We are heavy handed with our security approach but we have the time and the budget to be. We are in a unique situation where we have a lot of time in which to complete our project, so we can be really careful about building what we need. Also, since our application is internal, we completely bypass common user issues like browser compatibility (everyone uses the same browser) and complicated server infrastructure (we have sub 200 employees). It's a pretty fun project tbh.
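(For anyone curious about the FetchWithTimeout idea in point 1: here is a rough, generic sketch using AbortController. It is not the commenter's actual code; the names and the 8-second default are illustrative.)

    // Abort a fetch if it takes longer than `ms` milliseconds.
    async function fetchWithTimeout(url: string, init: RequestInit = {}, ms = 8000): Promise<Response> {
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), ms);
      try {
        return await fetch(url, { ...init, signal: controller.signal });
      } finally {
        clearTimeout(timer);
      }
    }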
I realize it looks like an odd contradiction, but doing highly-effective business on top of Microsoft products is sometimes a game of compromises and grey areas.
Even though Microsoft is ultimately responsible for NPM on the org chart, I still dont mind the entire ordeal from my current perspective. No one ever said we had to use 100% of Microsoft's product offerings. The nuance is in selective adoption and careful negotiation of roadmaps.
I will signal my displeasure with Microsoft's acquisition of the NPM ecosystem by simply disregarding its existence and never selecting the Node workload at VS install time. That's all it really takes to entirely opt out, for us. I don't hold any principled grudges against an organization larger than most municipalities. There are a lot of stakeholders involved here.
I’ve worked with .NET for 7 years, and the “Microsoft” tag can really give you a false sense of security. I worked in the public sector, so we also took these things rather seriously.
We’ve had far more security issues with the .NET toolset than we have had with Python, which is far more open. Most of them have been developer mistakes, because the update process for .NET is far less intuitive than it is for Python. So my developers haven’t always been on point with updates, getting caught out when our network team disabled old TLS versions or similar.
But the biggest issues have been with libraries abandoned by Microsoft. Like when they wanted to move the world onto Azure runbooks and thus no longer needed their library for Windows Server Orchestration runbooks. Or the half-finished libraries, like everything involving on-prem AD.
By comparison we’ve had absolutely no issues with Python. So I think this is more of a NPM issue than anything.
I spent the first 15 years of my career writing .NET programs. It really is a great stack, and some of the best tooling I’ve ever used. You do avoid the crazy 3rd party dependency hell that seems to engulf most other stacks. Go is also quite nice in that way, but I prefer C# and F# for sure.
I've been saying this since the early days of Node.js and npm... But nobody would listen; instead, they opted for the Linux philosophy of encouraging many tiny composable dependencies... and went overboard with it. It doesn't make sense to use a dependency that is less than 100 lines of code. Not worth the risk.
"Tiny composable dependencies" only works in a high-trust environment. Modern internet-based computing hasn't been in that environment for at least 10 years--probably more.
We have achieved total replacement of our former Angular & RiotJS applications. We are also using Blazor in server-side mode, which is turning out to be a fantastic approach for all of our sensitive admin dashboards. Everything loads ~instantly and the latency is negligible for most of our use cases. VS tooling around razor components could be better, but seems to be improving over time.
Being able to directly inject C# services into the razor components is one of the most productive transitions we experienced. All of our JSON APIs got sent to the trashcan and we now have way more time to focus on more important stuff. Also, little things like subscribing to CLR events from components really make you feel empowered to move mountains. Some of our Blazor dashboards have absolutely incredible UX and it took almost nothing to make it happen because of how close you can get the backend to the HTML. You definitely have to change how you think about certain problems, but once you find a few patterns for handling edge cases (e.g. large file download/upload), you are set forever.
Our custom Blazor javascript interop source file is still under 200 lines of code. Every other bit of js is provided by the framework. I would say we certainly got the benefit we were going for wrt minimizing javascript source in the codebase.
You are reducing what is essentially a general software issue to an ecosystem or vendor issue. It's possible that the team responsible for this issue now works for Microsoft on, let's say, .NET. How secure do you feel now?
The wake up call for me was Heartbleed -- a serious flaw is potentially everywhere, and people who think they are protected because of their tech stack choice are living in the biggest bubble of them all.
Other ecosystems are not like JS. That is not to say they don't suffer from vulnerabilities, but their vulnerabilities are not endemic and near-impossible to eradicate the way they are in the JS ecosystem.
Right, and they don't even confirm who might have seen that list, except that "the data on this service is consumed by third-parties who may have replicated the data elsewhere". So basically consider all private package names as of that date forever public.
That second issue is the kind that scares me. Be it Rust, Python, Node... public package managers have always seemed like a huge risk to me with how we just assume nothing nefarious will be installed b/c hey, npm repo said 11,000,000 downloads per week, so it can't possibly be dangerous?
I'm guilty of this: my latest Nuxt project has 47,000 dependencies. yarn audit helps, but can I even trust that, given it is retroactive?
Does that count duplicates, i.e. if a thousand different packages depend on exactly the same version of some package X, you get a thousand copies and count it a thousand times?
Otherwise I can't fathom how it's possible for a project to have 47000 dependencies. I mean, my main Linux machine has all kinds of old garbage installed and still the package manager only lists 2000 packages.
Yes, it does include dupes, that's why I use yarn instead of npm. That being said, sometimes it is multiple versions of the same package, so yes-and-no.
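If you want to see how much of such a count is duplication, here is a rough sketch against an npm v7+ lockfile (lockfile version 2/3, which keys every installed copy under "packages"); it is an assumption-laden approximation, not a real audit:

    // Count installed copies vs. unique name@version pairs in package-lock.json.
    import { readFileSync } from "node:fs";

    const lock = JSON.parse(readFileSync("package-lock.json", "utf8"));
    const entries = Object.entries<any>(lock.packages ?? {}).filter(([path]) => path !== "");
    const unique = new Set(
      entries.map(([path, meta]) => {
        const name = path.split("node_modules/").pop(); // last segment is the package name
        return `${name}@${meta.version ?? "local"}`;
      })
    );
    console.log(`installed copies: ${entries.length}, unique name@version: ${unique.size}`);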
It's not really conceptually different from relying on third party libraries in any context.
I haven't touched JavaScript since the late 90s, so I dunno what the hell's going on there, but in my C++ projects I typically have 10-20 dependencies (counting modularized Boost as one). They're either built by a custom script which includes the SHA256 of the tarballs it expects, or by a particular pinned commit of vcpkg which likewise uses SHA512 to verify its downloads.
I generally only update these when I need a new feature or bug fix, which means I'm unlikely to get bitten by any temporary security compromise.
If the "particular checkout of vcpkg" type of approach is impossible with other package managers, that's unfortunate.
> I haven't touched JavaScript since the late 90s, so I dunno what the hell's going on there,
Well, you're in for a surprise: the entire web is built on JavaScript, for one thing. And that is built on frameworks, which are built on... other frameworks, which are built on a ginormous repository typically accessed by npm/yarn.
npm modules aren't the same as boost. Boost is written and scrutinized by some of the best C++ minds on the planet.
npm modules are written by anyone. They are all open source, but so many are in use that I doubt they get the scrutiny they deserve. At one point there was a package that did nothing but left-pad strings, and its sudden removal broke thousands of builds.
but that's the landscape the modern web is built on, for better or for worse.
Fortunately this is the default in the JavaScript world, with both Yarn and NPM supporting lockfiles that have hashes and pinned versions. The problem is the sheer volume of dependencies and transitive dependencies, which makes them hard to reliably audit, as updating one thing can cause a lot of work.
This is probably the worst security problem ever in the JS ecosystem. Any npm package could be corrupted, and we wouldn't even know it if the original maintainers don't pay attention to new releases anymore.
Still, some people argue about whether this even deserves its own CVE.
The other one that gives me the screaming heebie-jeebies is the wave of maintainers who are going to get bored of being abused for no pay and sell their maintainer rights to malware authors. Or to seemingly nice people who will then sell them to malware authors.
Though this does give attackers a shortcut. No need to bribe some aging, disenchanted nerd to sell their soul when you can just impersonate them.
Linux distros and Mobile Apps rarely see these issues for one simple reason: packages must all be signed by a vetted maintainer before they are ever submitted and every client has the ability to verify signatures came from approved maintainers who hold the signing keys.
Phishing, bad 2FA, and vulnerabilities of the central repo upload path itself all go away with this simple tactic used by all sane package managers.
Someone PRed this exact same effective strategy to NPM in 2013, and it was refused even as -optional-.
NPM team members have ignorantly maintained that hashing packages is good enough. They insist on being a central authority for all packages with no method to strongly authenticate authors and this negligence has repeatedly endangered millions.
Meanwhile Debian and other community Linux distros maintain, sign, and distribute hundreds of popular NodeJS packages themselves now because they realize it would be negligent to risk having NPM in their supply chain.
Maybe it's just me, but the whole handling of this has been much more Microsoft-ey than usual. Is this a sign the MS culture is slowly seeping into GH?
They didn't even have telemetry data before Microsoft, and this flaw has probably been sitting there since day one. If you look closely, MS might be improving things, but I don't think MS deserves credit nor blame here.
Hm. I assumed that the "telemetry" they were talking about was on their own servers (measuring uploads, downloads, etc.) but given that it was introduced after the Microsoft acquisition, is this telemetry actually client-side, like the telemetry[1] they put in the .NET Core dev tools?
Could we not move to a strategy where authors have to cryptographically sign packages with their own package-specific private key when publishing them?
You then have to manually add the public key for a given package to your package.json so it can verify a tarball came from the author/source you expect.
This won't solve problems where the author is malicious, but it helps other cases.
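The author-side half of that scheme is not exotic. A hedged sketch using Node's built-in crypto; the filenames and the choice of Ed25519 are assumptions for illustration, not an existing npm feature:

    // Generate a package-specific keypair, sign the tarball, and ship the detached
    // signature next to it. Consumers would pin the printed public key (e.g. in their
    // own package.json, as suggested above) and refuse tarballs that don't verify.
    import { generateKeyPairSync, sign } from "node:crypto";
    import { readFileSync, writeFileSync } from "node:fs";

    const { publicKey, privateKey } = generateKeyPairSync("ed25519");

    const tarball = readFileSync("my-package-1.2.3.tgz");   // illustrative filename
    const signature = sign(null, tarball, privateKey);       // null algorithm => Ed25519

    writeFileSync("my-package-1.2.3.tgz.sig", signature);
    console.log(publicKey.export({ type: "spki", format: "pem" }));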
> Could we not move to a strategy where authors have to cryptographically sign packages with their own package-specific private key when publishing them?
I'm sorry, the NPM ecosystem doesn't do this already? Good god!
In NPM's minor defence, I don't know of any contemporary registry that does.
If I had to guess, the registry operator probably either sees this as friction to onboarding, or if they do support signatures, they'd probably rather sign it themselves.
These are both stupid. The author should be responsible for signing, the registry should never see the key, and the registry should require 2FA to log in and set the public key for a package for users to discover.
> we can say with high confidence that this vulnerability has not been exploited maliciously during the timeframe for which we have available telemetry, which goes back to September 2020.
The preceding sentence says they have no idea if it was exploited before the start of their logfiles:
"This vulnerability existed in the npm registry beyond the timeframe for which we have telemetry to determine whether it has ever been exploited maliciously."
So packages uploaded after September 2020 are probably fine.
Before that: ¯\_(ツ)_/¯
If NPM/Github were being responsible here, they would make package owners re-upload clean copies of anything which hasn't been touched since before the start of their audit logs.
> If NPM/Github were being responsible here, they would make package owners re-upload clean copies of anything which hasn't been touched since before the start of their audit logs.
I’m surprised more isn’t being said about this part. Any stale dependency is now untrustworthy and they all need a version bump to prove provenance. This is potentially something GitHub could protect against server-side for everybody or build into NPM. They know if a version was published before this date and can stop people from using them.
It should be possible to generate a list of package versions uploaded before September 2020, sorted by number of weekly downloads, to empower users of those packages to upgrade to a newer version (if there is one) or file an issue against that package asking for a release of a new patch version (if there isn't).
Unfortunately, the number of weekly downloads wouldn't give much indication of how many people were affected, since some of the downloads will be by bots or eager CI systems, and some organisations cache packages locally after the first download.
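A rough sketch of how such a check could work, using nothing but public registry metadata (each document at https://registry.npmjs.org/<name> carries a "time" map of version -> publish date). It only walks the direct dependencies of the local package.json and assumes simple, near-exact ranges:

    // Run with a recent Node as an ES module (top-level await, global fetch).
    import { readFileSync } from "node:fs";

    const CUTOFF = new Date("2020-09-01T00:00:00Z");   // start of npm's stated telemetry window
    const pkg = JSON.parse(readFileSync("package.json", "utf8"));
    const deps: Record<string, string> = { ...pkg.dependencies, ...pkg.devDependencies };

    for (const [name, range] of Object.entries(deps)) {
      const meta = await (await fetch(`https://registry.npmjs.org/${name.replace("/", "%2f")}`)).json();
      const version = range.replace(/^[~^]/, "");       // crude: assumes "1.2.3", "^1.2.3", "~1.2.3"
      const published = meta.time?.[version];
      if (published && new Date(published) < CUTOFF) {
        console.log(`${name}@${version} was published ${published} (before the telemetry window)`);
      }
    }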
That's great, thank you! Does it check recursive dependencies, and could you make it work against a package you haven't installed yet, by specifying the package name as an argument, rather than it only looking at what's already installed?
If that's possible, it would be really good to then run it against a list of popular packages, like [1] or [2], and report back which packages are the highest priority for getting version bumps (or at least for having someone manually check that the code in the package matches the code in its repo, which we assume an attacker didn't have control over).
It's cynical, I know, but I can't keep from wondering whether your npm package has already been compromised by another hidden flaw, to prevent us from finding out which older ones are at risk?
It is still a good indicator, as you can assume that if some bug was exploited a long time ago, it’s very likely to continue being exploited in the present / until it is fixed.
Also, if an attacker exploited this bug to upload a patched version N+1 after a legitimate version N was published, there's a good chance that the legitimate developer would eventually also try to release version N+1, and NPM would report the clash to them.
An attacker would have to get very lucky, exploiting this bug just up until the point when the logs started (which they had no way to predict), and to target only packages which have either never been updated since, or which were followed by a minor/major package update (not a patch).
I'd say npm itself probably uses npm packages internally? What if those were already compromised through this flaw, to keep these red flags from popping up?
The attacker might be monitoring their logs, selectively silencing version clashes. Heck, it's even possible they now have backdoor access to do whatever they want to any package.
I know it's cynical thinking, but this vulnerability was unbelievable, and the way they're handling it is definitely not reassuring, from my personal standpoint.
This was common knowledge among Perl devs. Every place I've worked that used CPAN did this. No one was pulling down random versions of random packages off the interwebs like a lunatic. I was amazed NPM didn't even have checksums a few short years ago. Every security incident or fiasco (remember unpublished packages??) I've simply nodded and said: yup. That was obviously going to happen.
Got to be honest, I tended to use the distribution packages for Perl back in the day. That would have been Debian or FreeBSD ports back then. If the module was missing I would shrug and make do. This cultural approach came from a place I worked which was air-gapped, so we had a local package mirror server which was loaded from Debian CDs.
Also no distracting internet or Google and you had only the man pages to work off.
I really don’t like the culture of “download any old shit off the internet, ram it in a container and throw it into production”. It keeps me awake. One day the whole thing will come crashing down and instantly spawn a magic enterprise solution which will cost a fortune, won't actually mitigate the risk at all, and will just allow the box to be ticked on a compliance form.
Wow. Really bad news. Is there a way to automate looking at packages published pre September 2020 and compare the contents to publicly available repositories? It wouldn’t cover all possible malicious packages but it seems like it would be a start.
Package content can't be compared to a repository.
1. npm doesn't record the repo, branch, or commit so it doesn't know what to compare with.
2. Published content is usually a transformation of the repo content - compiled, minified, bundled. You would have to run the same transformation on the repo source, and it would have to be a deterministic build.
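For the subset of packages that are published straight from committed source with no build step, a crude comparison is at least imaginable. A sketch only, under that assumption; left-pad is just an example, and as noted above this falls apart as soon as the publish involved any transformation:

    // Compare the registry's recorded tarball hash with one rebuilt from a local
    // checkout of the (assumed) corresponding tag.
    import { execSync } from "node:child_process";
    import { createHash } from "node:crypto";
    import { readFileSync } from "node:fs";

    const name = "left-pad";
    const version = "1.3.0";
    const checkout = "./left-pad-checkout";   // repo already checked out at the matching tag

    const registryShasum = execSync(`npm view ${name}@${version} dist.shasum`).toString().trim();

    const tarballName = execSync("npm pack --silent", { cwd: checkout }).toString().trim();
    const localShasum = createHash("sha1").update(readFileSync(`${checkout}/${tarballName}`)).digest("hex");

    console.log(localShasum === registryShasum ? "matches the registry tarball" : "differs from the registry tarball");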
npm could require that popular packages are published via GitHub actions, then it could strongly associate the published version with a source commit and the build that produced the artifact.
This has some downsides like tie in to the GitHub ecosystem. Maybe that could be offset with sponsored builds?
A research paper "Investigating The Reproducibility of NPM Packages"[0] found that "Among the 3,390 versions of the 226 packages, only 2,087 versions are reproducible". (Coincidentally that paper was published in September 2020 too).
There was an earlier project called "Trust But Verify"[1] which tried to detect discrepancies between published NPM packages and the corresponding (inferred) tag/commit in the source repository, but sadly it doesn't seem to have gained traction.
Interestingly, the "Gold" standard of the Core Infrastructure Initiative (CII) Best Practices guide[2] is to have a reproducible build, but it is only "SUGGESTED that projects identify each release within their version control system", so presumably packages are free to indicate which commit they are built from in an ad hoc out-of-band way, which may not be amenable to automated third-party checking.
How is everyone dealing with this? I have no idea how to begin auditing our dependencies for an issue like this.
GitHub really comes across looking like total garbage here with this blog post. Security issues shouldn’t be hidden like this. This is dishonest and irresponsible.
By getting off NPM entirely and banning new projects internally from using it from day one. Luckily all the backend stuff isn't in JavaScript, which is 80% of the battle.
NPM modules can run arbitrary code at install time using the privileges of the user doing the install. So, whatever malicious actions such a user could take, an NPM module could take.
Fortunately there is an option (--ignore-scripts) that prevents all code from running at install time, and there are solutions if specific scripts do need to be run. Such examples are so rare, though, that there is an active proposal to make this option the default.
If you don't trust the scripts, you don't trust the code. Although this limits one attack vector, the issue is just kicked down the road to `import`/`require` time.
It does reduce the attack surface a little, though. For example, if you install a package A which depends on B for some obscure feature, and B gets compromised, but you never use A in a way that imports/requires the code in B, then you can potentially dodge that landmine.
Similarly, if you are downloading npm packages that provide frontend-only code, that is only run in the context of the browser's sandbox, then you don't have to worry about arbitrary code execution (although a malicious frontend package could still exfiltrate user passwords, among other things).
Yeah it's definitely an improvement, but there needs to be something more.
The way dependencies move depending on when you run a yarn/npm install has never been useful. Both for projects initialising a lock, and projects upgrading from a previous locked position.
The greatest programming fallacy is blindly trusting someone else to have correctly invented "the wheel" before. Shockingly few people really know what they're doing.
> Shockingly few people really know what they're doing.
Unfortunately that includes the people who think their NIH wheel will be much better than the existing one. In fact, the Dunning-Kruger effect would suggest that people who don't know what they're doing are disproportionately likely to be in the latter group.
> In this architecture, the authorization service was properly validating user authorization to packages based on data passed in request URL paths. However, the service that performs underlying updates to the registry data determined which package to publish based on the contents of the uploaded package file
I have a similar story. At my old job, we had a web socket gateway that authenticated using JWTs, then hit an internal service to request REST resources. The issue was that it didn’t actually validate the requested REST resource URL; a malicious user could authenticate as themselves but request a resource for any other account.
I found it as I was getting up to speed on the code base, having recently switched teams. Funnily enough, nobody on the team really understood the vulnerability - the EM marked it low priority and wanted the team to work on other things. I had to essentially go directly to the security team and convince them it was a sev 1. I sometimes wonder if it’s easier to just report security issues as an outsider through the bug bounty program; internal reports don’t seem to get taken as seriously.
This is a problem. But comparatively not many high-profile break-ins have been due to npm (in spite of its massive popularity).
If you picked Node, common best practices need to be followed for production:
1) Thoroughly vet dependencies you bring in. Pin to exact patch versions in package.json (a small checker for this is sketched after this list). npm-shrinkwrap.
2) Add code in tree when possible (prefer StackOverflow snippets over npm).
3) Prefer light modules with fewer features. Always check their dependencies.
There will still be some vulnerabilities from time to time. But if you've been somewhat careful, most of them tend to happen among devDependencies, which is usually safe. To go further, cut out devDependencies too; for example use a shell script along with esbuild instead of bringing in a heavy-duty bundler.
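For point 1, the pinning part is easy to enforce mechanically. A small sketch (not a published tool) that flags any range-style version in package.json:

    // Warn about any dependency that isn't pinned to an exact version.
    import { readFileSync } from "node:fs";

    const pkg = JSON.parse(readFileSync("package.json", "utf8"));
    const exact = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;   // e.g. "1.2.3" or "1.2.3-beta.1"

    for (const field of ["dependencies", "devDependencies"] as const) {
      for (const [name, range] of Object.entries<string>(pkg[field] ?? {})) {
        if (!exact.test(range)) {
          console.warn(`${field}: ${name} is not pinned exactly (found "${range}")`);
        }
      }
    }

Pair that with a committed lockfile or npm-shrinkwrap so transitive dependencies are pinned too.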
I would say keep an eye on dev dependencies which are a part of the build process. Malicious code there could easily append malicious code to the production build, even if you have a deployment server setup. This is just as dangerous as a malicious production dependency.
Be careful with stackoverflow though. Some highly upvoted answers are sometimes broken or even dangerous. But reading the tiny comments and other answers is usually enough to identify a better solution.
> We determined that this vulnerability was due to inconsistent authorization checks and validation of data across several microservices that handle requests to the npm registry. In this architecture, the authorization service was properly validating user authorization to packages based on data passed in request URL paths
Would 2FA have prevented that? If not then them loudly promoting 2FA before even mentioning this issue seems odd.
Well, it may not be directly related to this issue, but I think things are going to get harder for individual developers. More and more has to be considered.
Maybe at some point, only large enterprises will be able to afford to do proper development.
I worry about these second-order effects too. Is the eventual aim that 2FA will tie packages to people's government IDs (either via their phone number, or some sort of biometric system)?
Before we start gatekeeping Free Software development with a "trusted developer" system, we need to make sure that this system supports pseudonymous identities, so that people don't have to dox themselves (even to their own government) to be able to make contributions.
Does the somewhat flippant nature of package management in the JS ecosystem originate in the idea that JS use cases were "not so important"? And when JS expanded to the backend, was that package management approach simply inherited?
There was no package manager for JS before JS began to become popular on the backend with the release of node. NPM stands for node package manager after all.
There's more:
> vulnerability that would allow an attacker to publish new versions of any npm package using an account without proper authorization.
> We determined that this vulnerability was due to inconsistent authorization checks and validation of data across several microservices that handle requests to the npm registry. In this architecture, the authorization service was properly validating user authorization to packages based on data passed in request URL paths. However, the service that performs underlying updates to the registry data determined which package to publish based on the contents of the uploaded package file.
This seems like a much bigger deal. Disclosing private names is not ideal but I think you have to assume your namespace and package names will leak at some point. In my opinion you should prepare for this well ahead of time by ensuring your organization uses a unique namespace/org that matches your internal/private namespace/org and squat on it. This will prevent a supply chain attack where they take a leaked namespace/package, register it and publish packages with higher versions.
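A rough sketch of a recurring check for the squatting advice above: for each internal name, ask the public registry whether someone else has already published it (the scope and names below are placeholders for your own private packages):

    // 200 = someone owns the name publicly; 404 = still unclaimed.
    const internalPackages = ["@acme/billing", "@acme/feature-flags"];

    for (const name of internalPackages) {
      const res = await fetch(`https://registry.npmjs.org/${name.replace("/", "%2f")}`);
      if (res.status === 200) {
        console.warn(`${name} exists on the public registry - verify your org owns it`);
      } else if (res.status === 404) {
        console.log(`${name} is unclaimed publicly - consider registering a placeholder`);
      }
    }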
Ok, we've replaced the title with the relevant subheading from the article. Thanks!
The main article title is such corpspeak that it would go against the site guidelines to use it—that sort of corporate press release title is typically misleading, linkbait, or both.
Commitment to security = security was broken. It's similar to "we value your privacy" or "announcements about the future of the project". Finest newspeak.
It's not necessarily more or less secure. You load libraries by simply linking ES Modules from your local files or the internet; there's no central repo, you link the exact URL. So it's as secure or insecure as your source.
Some people say it's insecure because, well, if you get your modules from a compromised site it is.
However recent events have shown there's also plenty of danger with centralized repos as well.
The advantage to Deno's approach is it encourages using CDNs, which allow that code to load quicker especially if you're using edge functions or something like CloudFlare workers. It also requires you to be super explicit about where the code's coming from. The negative is that many people will download code from dodgy websites then complain later.
Yes. While Deno Deploy runs on the edge and they seem to encourage a very modern "serverless" workflow, you can also just put it on a server just like Node, self-host, whatever.
1. Deno comes with a standard library that has been audited by the Deno maintainers
2. The Deno runtime runs with limited permissions by default, so you have to allow access to file system locations or network hosts explicitly
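To make point 2 (and the URL-import model described a few comments up) concrete, here is a minimal Deno script; std@0.119.0 is simply a version that was current around the time of this thread:

    // registry_check.ts - the dependency is an explicit, version-pinned URL import, and
    // the script cannot reach the network unless you grant it explicitly, e.g.:
    //   deno run --allow-net=registry.npmjs.org registry_check.ts --pkg=left-pad
    import { parse } from "https://deno.land/std@0.119.0/flags/mod.ts";

    const args = parse(Deno.args);
    const res = await fetch(`https://registry.npmjs.org/${args.pkg ?? "left-pad"}`);
    console.log((await res.json())["dist-tags"]);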
Now, the repository part is arguably not as big of an improvement (if at all). There’s a central repository for third party modules. One nice aspect is it just mirrors code, so you can see what you’re getting before downloading (and, in NPM’s case, running install scripts, if you forgot --ignore-scripts).
Deno also suggests using CDNs, but they often just host NPM modules that have been transformed into ES modules. That seems worse than NPM, in my opinion.
This is dumb. Node and npm had years to solve this issue. Really disrespectful towards the entire ecosystem. I get it: open source, bla bla, but get your horses together and fix this already
It came with the actual title, although arguably not the correct one.