
Skip the nonsense and just check your dependencies in directly to your repo. The separation has no real-world gains for developers and doesn't serve anyone except the host of your source repo. As it turns out, most people's repo host is also the operator of the package registry they're using, so there aren't even theoretical gains for them, either.

Doing it this way doesn't preclude the ability to upgrade your dependencies. It _completely_ sidesteps the intentional or unintentional desync between a dependency's source and its releases, it means people have to go out of their way to get a deployment that isn't reproducible, and in 4 years, when your project has rotted and someone tries to stand it up again (even if just temporarily, to effect some long-term migration), they aren't going to run into problems because the packages and package manager changed out from beneath them. I run into this crap all the time, to the point that I know people who claim it isn't a problem have to be lying.




> I run into this crap all the time, to the point that I know people who claim it isn't a problem have to be lying.

I don't think that's right.

Just because someone denies a problem exists—a problem that you know for a fact, with 100% certainty, exists—doesn't mean they're lying.

It may mean you know they are wrong, but wrong != lying, and it's a good thing to keep in mind.

If you have external reasons to believe that the person you're talking to should or does know better, then it's fair to say they are lying.

But, in general, if you accuse someone who is simply wrong of lying, you're going to immediately shut down any productive conversation that you could otherwise have.


People don't do this because `node_modules` can be absolutely massive (hundreds of megabytes or more), and a lot of people don't like (for various reasons) such large repositories.

There is a deprecated project at my work that committed the entire yarn offline cache to the repo. At least those were gzipped, but the repo still had a copy of every version of every dependency.

It isn't a good long-term solution unless you really don't care at all about disk space or bandwidth (which you may or may not).


A middle ground that I've seen deployed is corporate node mirrors with whitelisted modules. Then individual repos can just point to the corporate repo. Same thing for jars, python packages, etc.
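
In case it helps anyone picture it, the per-project side of that setup is usually just a one-line .npmrc pointing at the internal registry (the URL here is made up):

  # .npmrc at the project root -- registry URL is hypothetical
  registry=https://npm.corp.example.com/

  # or configure it per developer machine instead
  npm config set registry https://npm.corp.example.com/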


And build pipelines that fail due to the size of the repo.


Committing node_modules and reproducibility are somewhat orthogonal, though.

You can get reasonable degrees of reproducibility by choosing reasonable tools: Yarn lets you commit their binary and run that in the specified repo regardless of which version you have installed globally. Rush also allows you to enforce package manager versions. Bazel/rules_nodejs goes a step further and lets you pin node version per repo in addition to the package manager. Bazel+Bazelisk for version management of Bazel itself provides a very hermetic setup.
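
For anyone who hasn't seen it, the Yarn 1 flavor of this is a single command that vendors the release into the repo and points .yarnrc at it (the version number is just an example):

  # downloads the release into .yarn/releases/ and updates .yarnrc
  yarn policies set-version 1.22.19

  # .yarnrc then contains something like:
  #   yarn-path ".yarn/releases/yarn-1.22.19.js"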

Packages themselves are immutable as long as you don't blow away your lockfile. I used to occasionally run into very nasty non-reproducibility issues with ancient packages using npm shrinkwrap (or worse, nothing at all), but since npm/yarn got lockfiles, these problems largely went away.
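
To be concrete, the lockfile-respecting install commands I have in mind (assuming npm 6+ / yarn 1) are:

  # install exactly what the lockfile says; fail if it disagrees with package.json
  npm ci

  # yarn 1 equivalent
  yarn install --frozen-lockfile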

These days, the non-hermeticity stuff that really grinds my gears is the very low-level plumbing stuff. On Mac, node-gyp uses Xcode tooling to compile C++ modules, so stuff breaks with macOS upgrades. I'm hoping someone can come up with some Zig-based sanity here.

As for committing node_modules, there are pros and cons. Google famously does this at scale and my understanding is that they had to invest in custom tooling because upgrades and auditing were a nightmare otherwise. We briefly considered it at some point at work too but the version control noise was too much. At work, we've looked into committing tarballs (we're using yarn 3 now) but that also poses some challenges (our setup isn't quite able to deal w/ a large number of large blobs, and there are legal/auditing vs git performance trade-off concerns surrounding deletion of files from version control history).


Ill-advised tool adoption is exactly the problem I'm aiming to get people to wake up and say "no" to. You need only one version control system, not one reliable one plus one flaky one. Use the reliable one, and stop with the buzzword bandwagon, which is going to be a completely different landscape in 4 years.

> Packages themselves are immutable as long as you don't blow away your lockfile

Lockfiles mean nothing if it's not my project. "I just cloned a 4 year old repo and `npm install` is failing" is a ridiculous problem to have to deal with (which, to repeat, is something that happens all the time whether people are willing to acknowledge it or not). This has to be addressed by making it part of the culture, which is where me telling you to commit your dependencies comes from.
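
And to be clear about what I mean by that, it's nothing more exotic than something along these lines (a sketch; adjust to taste):

  # remove the node_modules line from .gitignore, then:
  npm install
  git add .gitignore node_modules
  git commit -m "Vendor dependencies"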


This problem does happen, but committing node_modules won't fix it. Assuming the npm registry doesn't disappear, npm will download the exact same files you would have committed to your repo. Wherever those files came from, 4 years later you've upgraded your OS, and now the install step will fail (in my experience usually because of node-gyp).

Unless you were talking about committing compiled binaries as well, in which case every contributor must be running the same arch/OS/C++ stdlib version/etc. M1 laptops didn't exist 4 years ago. If I'm using an M1 today, how can I stand up this 4 year old app?

The real problem is reproducible builds, and that's not something git can solve.


> Assuming the npm registry doesn't disappear, npm will download the exact same files you would have committed to your repo

I think that’s where you missed what the gp is saying.

If everything is checked in to source control, npm will have nothing to download. You won’t need to call npm install at all, and if you do it will just return immediately saying everything is good to go already.

The workflow for devs grabbing a 10 year old project is to check it out then npm start it.


You are probably not running the same OS/node/python version as you were 10 years ago. If you were to try this in real life, you'd get an error like this one. https://stackoverflow.com/questions/68479416/upgrading-node-....

The error helpfully suggests you:

> Run `npm rebuild node-sass` to download the binding for your current environment.

Download bindings? Now you're right back where you started. https://cdn140.picsart.com/300640394082201.jpg

Of course if you keep a 10 year old laptop in a closet running Ubuntu 10.04 and Node 0.4, and never update it through the years, then your suggestion will work. But that workflow isn't for me.


> which to repeat, is something that happens all the time

Are you sure this isn't just a problem in your organization? As I qualified, the issue you're describing was a real pain maybe two or three years ago, but not anymore IME. For context, my day job currently involves project migrations into a monorepo (we're talking several hundred packages here) and non-reproducibility due to missing lockfiles is just not an issue these days for me.

As the other commenter mentioned, node-gyp is the main culprit of non-reproducibility nowadays, and committing deps doesn't really solve that, precisely because you often cannot commit arch-specific binaries, lest your CI blow up trying to run Mac binaries.


> Are you sure this isn't just a problem in your organization?

I'm really struggling to understand the kind of confusion that would be necessary in order for this question to make sense.

Why do you suspect that this might be a problem "in [my] organization"? How could it even be? When I do a random walk through projects on the weekend, and my sights land on one where `npm install` ends up failing because GitHub is returning 404 for a dependency, what does how things are done in my organization have to do with that?

I get the dreadful feeling that despite my saying "[That] means nothing if it's not my project", you're unable to understand the scope of the discussion. When people caution their loved ones about the risk of being the victim of a drunk driving accident on New Year's Eve, it doesn't suffice to say, "I won't drink and drive, so that means I won't be involved in a drunk driving accident." The way we interact with the whole rest of the world and the way it interacts with us is what's important. I'm not concerned about projects under my control failing.

> non-reproducibility due to missing lockfiles is just not an issue

Why do you think that's what we're talking about? That's not what we're talking about. (I didn't even say anything about lockfiles until you brought it up.) You're not seeing the problem, because you're insisting on trying to understand it through a peephole.


I mean, of course I'm going to see this through the lens of my personal experience, which is that nasty non-reproducibility issues usually only happen when someone takes over some internal project that has been sitting in a closet for years and the original owner is no longer at the company. Stumbling upon reproducibility issues in 4 year old projects on Github is just not something that happens to me (and I have contributed to projects where, say, Travis CI had been broken in the master branch for node 0.10 or whatever). Getting 404s on dependencies is not something I can say I've experienced either, unless we're talking about very narrow cases like consuming hijacked versions of packages that were since unpublished, or possibly a different stack that uses git commits for package management (say, C) - and even then, that's not something I've run into (I've messed around w/ C, zig and go projects, if it matters). I don't think it's a matter of me having a narrow perspective, but maybe you could enlighten me.

As I mentioned, my experience involves seeing literally hundreds of packages, many of which were in a context where code rot is more likely to happen (because people typically don't maintain stuff after they leave a company and big tech attrition rate is high, and my company specifically had a huge NIH era). My negative OSS experience has mostly been that package owners abandon projects and don't even respond to github issues in the first place. I wouldn't be in a position to dictate that they should commit node_modules in that case.

Maybe you could give me an example of the types of projects you're talking about? I'm legitimately curious.


> I don't think it's a matter of me having a narrow perspective, but maybe you could enlighten me.

I have to think it is, because the "shape" of your responses here has been advice about things that I/we can do to keep my/our own projects (e.g. mission critical or corporate funded stuff being used in the line of business) rolling, and has completely neglected the injured-in-collision-with-other-intoxicated-driver aspect.

Again, I _have_ to think that your lack of contact with these problems has something to do with the particulars of your situation and the narrow patterns that your experience falls within. Of the projects that match the criteria, easily 40% of them are the type I described. (And, please, no pawning it off with a response like "must be a bad configuration on your machine"; these are consistent observations over the bigger part of a decade across many different systems on different machines. It's endemic, not something that can be explained away with the must-be-your-org/must-be-your-installation handwaving.)

> code rot is more likely [...] My negative OSS experience has mostly been that package owners abandon projects and don't even respond to github issues

Sure, but the existence of other problems doesn't negate the existence of this class of problems.

> I wouldn't be in a position to dictate that they should commit node_modules in that case.

Which is why I mentioned that this needs to be baked into the culture. As it stands, the culture is to discourage simple, sensible solutions and prefers throwing even more overengineered tooling at the problem, tooling that ends up creating new problems of its own while only halfway solving a fraction of the original ones. (Why? Probably because NodeJS/NPM programmers seem to associate solutions that look simple with being too easy, too amateurish—probably because of how often other communities shit on JS. So NPMers always favor the option that looks like Real Serious Business, because it either approaches the problem by heaping more LOC somewhere or—even better—it involves tooling that looks like it must have taken a Grown Up to come up with.)

> Maybe you could give me an example of the types of projects you're talking about? I'm legitimately curious.

Sure, I don't even have to reach since as I said this stuff happens constantly. In this case, at the time of writing the comments in question, it was literally the last project I attempted: check out the Web X-Ray repo <https://github.com/mozilla/goggles.mozilla.org/>.

This is conceptually and architecturally a very simple project, with even simpler engineering requirements, and yet trying to enter the time capsule and resurrect it will bring you to your knees on the zeroth step of just trying to fetch the requisite packages. That's to say nothing of the (very legitimate) issue that even with a successful completion of `npm install`, I'd still generally expect one or more packages for any given project to be broken and in need of being hacked in order to get working again, owing to platform changes. (Several other commenters have brought this up, but, bizarrely, they do so as if it's a retort to the point that I'm making and not as if they're actually helping make my case for me... The messy reasoning involved there is pretty odd.)


> check out the Web X-Ray repo <https://github.com/mozilla/goggles.mozilla.org/>.

Thanks for the example! Peeking a bit under the hood, it appears to be due to transitive dependencies referencing github urls (and transient ones at that) instead of semver, which admittedly is neither standard nor good practice...

FWIW, simply removing `"grunt-contrib-jshint": "~0.4.3",` from package.json and the related jshint code from the Gruntfile was sufficient to get `npm install` to complete successfully. The debugging just took me a few minutes grepping package-lock.json for the 404 URL in question (https://github.com/ariya/esprima/tarball/master) and tracing that back to a top-level dependency via recursively grepping for dependent packages. I imagine that upgrading relevant dependencies might also do the trick, seeing as jshint no longer depends on esprima[0]. A yarn resolution might also work.
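
Roughly, the grep loop I mean (URL taken from the actual 404 in that repo):

  # which lockfile entry resolves to the dead URL?
  grep -n "github.com/ariya/esprima/tarball" package-lock.json

  # which packages declare a dependency on it? repeat upward until you
  # reach something listed in package.json (here: grunt-contrib-jshint)
  grep -n '"esprima"' package-lock.json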

I'm not sure how representative this particular case is to the sort of issues you run into, but I'll tell you that reproducibility issues can get a lot worse in ways that committing deps doesn't help (for example, issues like this one[1] are downright nasty).

But assuming that the installation in your link just happens to have a simple fix and that others are not as forgiving, can you clarify how committing node_modules is supposed to help if you're saying you can't even get it to a working state in the first place? Do you own the repo in order to be able to make the change? Or are you mostly just saying that hindsight is 20/20?

[0] https://github.com/jshint/jshint/blob/master/package.json#L4...

[1] https://github.com/node-ffi-napi/node-ffi-napi/issues/97


I don't understand your questions.

My message is that for a very large number of real world scenarios the value proposition of doing things the NPM way does not result in a Pareto improvement, despite conventional wisdom suggesting otherwise.

I also don't understand your motivation for posting an explanation of the grunt-related problem in the Web X-Ray repo. It reminds me of running into a bug and then going out of my way to track down the relevant issue and attach a test case crafted to demonstrate the bug in isolation, only to have people chime in with recommendations about what changes to make to the test case in order to not trigger the bug. (Gee, thanks...)

And to reiterate the message at the end of my last comment: the rationale of trying to point at how badly mainstream development manages to screw up other stuff, too, is without merit.


Personally, I see code as a fluid entity. If me spending a few minutes to find a way to unblock something that you claimed would "bring you to your knees on the zeroth step" is a waste of time, then I guess you and I just have different ideas of what software is. For me, I don't see much value in simply putting up with some barely working codebase out of some sense of historical puritanism; if I'm diving into a codebase, the goal is to improve it, and if jshint or some other subsystem is hopelessly broken for whatever reason, it may well get thrown out in favor of something that actually works.

You may disagree that the way NPM does things works well enough to be widely adopted (and dare I say, liked by many), or that true reproducibility is a harder problem than merely committing files, but by and large, the ecosystem does hum along reasonably well. Personally, I only deal with NPM woes because others find value in it, not because I inherently think it's the best thing since sliced bread (in fact, there are plenty of aspects of NPM that I find absolutely atrocious). My actual personal preference is arguably even less popular than yours: to avoid dependencies wherever possible in the first place, and relentless leveraging of specifications/standards.


> My actual personal preference is arguably even less popular than yours: to avoid dependencies wherever possible in the first place

You lack the relevant facts to even be able to speculate about this. You haven't even done an adequate job grappling with the details presented here without the need for assumptions.

> If me spending a few minutes to find a way to unblock something

Imagine you have a coworker who frequently leaves unwashed dishes in the sink. You wash and put them away, but it happens enough that you decide to bring it up. Now imagine that when you do bring it up, your coworker lectures you, condescendingly and at length, about the steps you can take to "unblock" the dirty dishes problem (by washing them), as if there's some missing piece involving not knowing how and that that's the point of the discussion, instead of the fact that this (entirely avoidable) problem is occurring in the first place.

You're not unblocking anything, nor would this be the place to do so even if you were. Under discussion is the phenomenon that, for all the attention and energy that has gone into NPM, frequently old projects fail on some very basic preliminary step: just being able to complete the retrieval of the dependencies, i.e. the one thing that `npm install` is supposed to do. You voiced your skepticism and then with every response to this thread moved the goalposts further and further away, starting out generally dismissive that the problem exists, then wrapping up by offering your magnanimity to what you choose to see as someone struggling.

There is no struggle here, nor are any of the necessary conditions that would make your most recent comments relevant satisfied. Like the earlier anecdote about the annoyance of dealing with the type of person who jumps in to offer pointers about how to change a failing test case so that it no longer reveals the defect it was created to isolate, these are responses that fail at a fundamental level to understand the purpose and greater context that they're ostensibly trying to contribute to.

If `npm install` is broken on a Thursday and annoys the person who ends up forced to work through that problem, and then you show up on a Saturday with an explanation after the fact about what to do in situations like the one that happened on Thursday, what possible purpose does that serve? At best, it makes for an unfocused discussion, threatening to confuse onlookers and participants alike about what exactly the subject is (whether the phenomenon exists vs. morphing this into a StackOverflow question where there's an opportunity to come by with the winning answer and subtly assert pecking order via experience and expertise). At worst, it does that and also comes across as incredibly insulting. And frankly, that is the case here.

> You may disagree that the way NPM does things works well enough to be widely adopted (and dare I say, liked by many)

By your count, how many times have you moved the goalposts during this, and how many more times do you plan on moving them further?

> I don't see much value in simply putting up with some barely working codebase out of some sense of historical puritanism

Major irony, in relation to the comment above and given the circumstances where this discussion started. Shocking advice, apparently: let your source control system manage your source code, and examine the facts about whether late-fetching dependencies the NPM way is worth the cost of putting up with the downsides I brought up and the recurring security downsides that lead to writeups like the article that was originally posted here.


> At work, we've looked into committing tarballs (we're using yarn 3 now) but that also poses some challenges (our setup isn't quite able to deal w/ a large number of large blobs, and there are legal/auditing vs git performance trade-off concerns surrounding deletion of files from version control history

With Git LFS the performance hit should be relatively minimal (if you delete the file it won't redownload it on each new clone anyways, stuff like this).
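
A sketch of what that might look like, assuming the yarn 3 cache layout of zip files under .yarn/cache:

  git lfs install
  git lfs track ".yarn/cache/*.zip"    # writes the pattern into .gitattributes
  git add .gitattributes .yarn/cache
  git commit -m "Move dependency cache to LFS"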


And what happens when you need to update those dependencies?

Software is a living beast; you can't keep it alive on 4-year-old dependencies. In fact, you've cursed it with unpatched bugs and security issues.

Yes, keep a separate repo, but also keep it updated. The best approach is to maintain a lag between your packages and upstream so issues like these are hopefully detected & corrected before you update.


> And what happens when you need to update those dependencies?

Then you update them just like you do otherwise, like I already said is possible.

> you can't keep it alive on 4yr-old dependencies. In fact, you've cursed it with unpatched bugs and security issues

This is misdirection. No one is arguing for the bad thing you're trying to bring up.

Commit your dependencies.


Let's say that the day you update your dependencies is after this malware was injected but before it was noticed.

Now you have malware in your local repo :(

Having a local repo does not prevent malware. Your exposure to risk is less because you update your dependencies less frequently, but the risk still exists and needs to be managed. There's no silver bullet.


This is more misdirection. By no means am I arguing that if you're doing a thousand stupid things and then start checking in a copy of your dependencies, that you're magically good. _Yes_ you're still gonna need to sort yourself out re the 999 other dumb things.


Sounds like a great way to end up with what $lastco called "super builds" - massive repos with ridiculous amounts of cmake code to get the thing to compile somewhat reliably. It was a rite of passage for new hires to have them spend a week just trying to get it to compile.

All this does is concentrate what would be occasional pruning and tending of dependencies into periodic massive deprecation "parties" when stuff flat-out no longer works and deadlines are looming.


That’s the whole deal with yarn 2, isn’t it? With their plug’n’play it becomes feasible to actually vendor your npm deps, since instead of thousands upon thousands (upon thousands) of files you only check in a hundred or so zip files, which git handles much more gracefully.

I was skeptical at first, as it all seemed like way too many hoops to jump through, but the more I think about it the more it feels like it’s worth it.
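
From memory, the zero-install setup amounts to roughly this (details may have shifted between yarn 2 and 3, where the loader moved from .pnp.js to .pnp.cjs):

  # .yarnrc.yml
  nodeLinker: pnp
  enableGlobalCache: false   # keep the zips in .yarn/cache so they can be committed

  # then commit the cache and the PnP loader along with the lockfile
  git add .yarnrc.yml yarn.lock .pnp.cjs .yarn/cache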


> Skip the nonsense and just check your dependencies in directly to your repo.

Haha, no.

That would increase the size of the repository greatly. Ideally, you would want a local proxy where the dependencies are downloaded and managed, or to tarball node_modules and save it in some artifacts manager, server, or S3 bucket.
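
If it's useful, the tarball variant can be as simple as something like this (the bucket name is hypothetical):

  # snapshot the installed tree, keyed by commit
  tar czf deps-$(git rev-parse --short HEAD).tar.gz node_modules
  aws s3 cp deps-*.tar.gz s3://example-build-artifacts/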


What's the problem with a big repository? The files still need to be downloaded from somewhere. It's mostly just text anyway, so there are no big blobs, which is usually what causes git to choke.

For that one-off occasion when you are on 3G, have a new computer without an older clone, and need to edit files without compiling the project (which would have required npm install anyway), there is git partial clone.
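
For reference, a blobless partial clone is just (the URL is a placeholder):

  # fetch commits and trees up front, download file contents lazily
  git clone --filter=blob:none https://example.com/big-repo.git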

Does npm have a shared cache if you have several projects using the same dependencies?


>Does npm have a shared cache if you have several projects using the same dependencies?

pnpm does, that's why I'm using it for everything. It's saving me many gigabytes of precious SSD space.

https://github.com/pnpm/pnpm
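
For what it's worth, the shared store is visible and configurable like this (the store path is whatever you choose):

  pnpm store path                            # show where the global content-addressable store lives
  pnpm config set store-dir ~/.pnpm-store    # optionally pin it to a specific location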


Now anything with native bindings is broken if you so much as sneeze.



