Disappointed to see so many knee-jerk reactions to this. Vendoring dependencies is a simple way to ensure consistent build inputs, and has the bonus effect of decreasing build times.
To respond to the two major criticisms:
1) “It takes a lot of space”
Don’t be so sure. Text diffs and compresses well. I have a 9-year-old Node repo that I’ve been vendoring from the beginning and it’s only grown 200MB over that time. (Granted, I’m fairly restrained in my use of dependencies. But I do update them regularly.)
But even if it does take a lot of space… so what? If your dependencies are genuinely so huge that this is a problem, then vendoring may not be right for you. But you could also use one of the many techniques for managing the size of your repo. Or just acknowledge that practices are contextual, and there’s no such thing as “best practice”—just a bunch of trade-offs.
2) “It doesn’t work well with platform-specific code”
This can cause some pain if you’re in a multi-platform environment. The way I deal with it (in Node) is by installing modules with --ignore-scripts, committing the files, running “npm rebuild”, and then adding whatever shows up to .gitignore. I have a little shell script that makes this easier.
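A sketch of what such a script might look like (simplified and hypothetical; the exact files that show up after the rebuild depend on which native modules you use):

    #!/bin/sh
    # Vendor package sources only, then ignore whatever the native builds produce.
    set -e
    npm install --ignore-scripts          # fetch sources, skip platform-specific builds
    git add -A node_modules
    git commit -m "Vendor dependencies (no build artifacts)"
    npm rebuild                           # now run the native builds locally
    git status --porcelain node_modules | awk '{print $2}' >> .gitignore
    git add .gitignore
    git commit -m "Ignore platform-specific build output"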
This is only an issue for modules that have a platform-specific build, which I try to avoid anyway. But when it comes up, it can be a pain in the butt. I find that pain less frequent and more predictable than the pain that comes from not vendoring modules, though, so I put up with it.
Bonus) “It’s not best practice”
Sez who? Dogma is for juniors. “Best practices” are all situational, and the only way to know if a practice is a good idea is to examine its tradeoffs in the context of your situation.
npm has long had a problem respecting lock files. The concept is easy: have a fixed lock file, get a reproducible build. But no: npm will change your lock file (I believe it's framed as "optimizing") without notice.
(Perhaps they've solved this in the last couple of years. I've been staying away from that ecosystem... too much growing in it...)
Yes, you are right. But there are people who trust npm (now and esp. in the future) and other free infrastructure, and there are those who prefer to be a bit more self-reliant after getting burned once.
I use vendoring in Go because my team's builds happen within a huge, complicated corporate network that has been known to break arbitrarily in new and interesting ways (or rather when something gets changed unexpectedly and then it takes days/weeks to navigate outsourced IT and change it back). Vendoring deps doesn't save me all the time, but I've generally found it helps. Plus builds are a bit quicker because I can download everything in one go (via git clone) rather than pulling everything in at build time. It also helps when the linter decides it wants all the dependencies downloaded before doing anything and then we find it has a relatively short timeout when the network gains a lot of latency without notice.
On reflection, it seems more like I'm papering over network issues. Perks of working in an enterprise company I guess.
In a discussion about Skub, no one need to explain why the Skub-powered approach isn't good enough. It is the duty of anyone pushing Skub to explain exactly what makes Skub so special to the point that we need to have Skub in our lives.
There are only 2 problems I see with the existing solution in npm.
- "npm add package" puts in a "^ver", which is bad practice
- there is no good infrastructure to pull hash based blobs out of the ether in case npmjs is offline
npm-shrinkwrap has solved repeatability forever, people just didn't always use it. Auto-upgrading dependencies is the big problem, which should never have existed because it is not principled. I'd go further and say that dependencies and devDependencies should only support exact versions, and peerDependencies are the only thing that should support non-exact versions.
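On the "^ver" point, npm can at least be told to stop saving ranges by default; a small mitigation, not a fix (a sketch):

    # Make `npm install <pkg>` record exact versions instead of "^ver" ranges
    npm config set save-exact true
    # or per project, via .npmrc:
    echo "save-exact=true" >> .npmrc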
I don't check in my dependencies in my current project because I don't need to; but in earlier projects, I or we did, for various good reasons; and it worked perfectly well, and was extremely convenient for new developers.
Dumb question: what does it mean to “vendor” your dependencies?
Best guess is something like “ship required source or binaries along with your end product.” Like static linking but extended to dynamic languages and source control.
It's to have them come from some place you control. That can be your source control, but a very common way to vendor dependencies in other languages is to just save them on a server somewhere and pull them from there when you install.
Upsides from storing node_modules in repo are outweighed by the downsides. Unless of course you're Google-scale and can afford to contribute filesize fixes upstream, write fancy tooling to enforce commit-time workarounds, etc. Nobody working finger-to-feature has time for this.
For your average npm shop which doesn't have infinite internet oil money, here is why the article recommendations won't work for you.
Your CI will pay the time penalty during git clone instead of npm ci. In fact, the node_modules folder will be bigger than your source folder almost immediately. And over time you won't just be cloning the files at head; you'll also be cloning every npm package binary ever committed. You can't undo this without investing in smarter git tooling, which is time spent not writing features.
NPM packages which install arch-specific binaries will constantly flip-flop between commits by devs on different OSes.
Nobody is safe from left-pad, not even Google, and committing your node_modules folder doesn't change that. Eventually someone is going to have to run npm i.
Running npm ci on everyone's machine is reproducible, I don't know what OP is warning about. The package lock pins all the versions.
If you have a large enough team to invest in dev experience, there are way better ways to get the advantages of the article without the downsides. You can cache the npm ci result in a container layer for your CI/CD or use middleware like Artifactory.
Maybe, but everyone's CI situation is so variable that it may not be that easy. For instance if you are using a monorepo then even a shallow clone can be overkill. And if you rely on git history for conditional CI then a shallow clone will ruin the output of many git commands. So you could end up in an either/or optimization situation depending on the order your CI/CD organically grew and other architectural decisions you made.
Counterpoint, and ignoring download size as your typical CI probably doesn't download the entire history: when are we supposed to pretend we've reviewed our dependencies?
I'll admit I don't believe everyone always needs to check every dep, but we're skating close to nobody ever checking them.
My team guards up front when introducing a new dependency. You fill out a little template with security assessment as well as some other stuff, just to do a dirt simple build vs 'buy' analysis. left-pad for example would fail because the build time cost savings are not worth the ongoing maintenance cost. (In fact doing this assessment at all rarely makes sense for microlibs, by design.)
Once something's in package.json I don't believe anyone who says they can vouch for the security of that over time. We're all doing security theater with npm audit, dependabot, etc. Don't use npm at all if anyone's life depends on your code.
I think formalising an assessment like that makes some sense, but the question was more around what the assurances are. So it probably works like this:
#1 You look at the dependency and do an assessment on whether it's worth including. Check.
#2 You probably require some automated checks. SAST, Dependency Scanning / SCA, maybe some DAST, etc. Check.
The outstanding question though...
#1 Did anyone actually read the code of the dependency?
#2 Did anyone actually look at what the dependency itself pulls in?
#3 Are these checks re-done when you update the lock files?
#4 If nobody is doing it, who's updating the lists and rules we use to scan from?
#5 Where possible do you have the monitoring to check when an app is doing something weird? i.e. network ACLs that when they fail, cause an event, that alerts a person to investigate?
I think we're mostly agreeing here, but the wider question is why is it that folks writing the app and including the dependency don't feel responsible for these things?
I think you mean they're automatically doing SCA and maybe SAST. I don't think there's a human working at Microsoft reading the code for you though, is there?
> Your CI will pay the time penalty during git clone instead of npm ci.
Things like GitLab's CI runners will do a single clone then do `fetch`, `checkout`, and `clean` to checkout your repo. Git repo size isn't a huge bottleneck in CI performance.
Only if you have long-living runners: if you use a dynamic fleet to save money, then you clone the repo almost every time. However, this is why you can do sparse checkouts and limit the git depth.
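For example (a sketch; the repo URL and paths are placeholders):

    # Shallow, blob-less, sparse clone: recent history only, no old file contents
    git clone --depth 1 --filter=blob:none --sparse https://example.com/our-monorepo.git
    cd our-monorepo
    git sparse-checkout set apps/frontend packages/shared   # materialize only what this job needs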
Yarn offers "Plug'n'play" mode since v2, which basically promotes what the author says. It takes the idea further: dependencies are stored as zip archives instead of thousands of small files, which reduces the "git noise" and actually makes this viable as a performant workflow.
Yarn’s offline support is not directly tied to PNP. Yarn v1’s support for “offline” installs was a day one requirement, and as I understand it one of the primary drivers for Facebook engineers (at the time) to drive the creation of Yarn.
If offline installs is what you want, I don’t see any advantage of node_modules compared to this feature - only disadvantages (size, noise, and cross-platform incompatibilities).
I might be misremembering but at runtime, the Node process loads .pnp.js, which is a kind of monolithic “compiled” modules file containing all the modules installed. No reading of the .zip files occurs at runtime. (Again, I might be misremembering or have misunderstood. Please confirm/deny this if you know.)
I’d be curious how “read file from a zip that I know the exact path to” performs compared to “recursively walk the node_modules directory and dynamically lookup the location of the file”.
Exactly. Reading the article, for a second I felt like the author was either not fully aware of the tooling, or doesn't care about the noise (potentially complex conflicts) that checking in node_modules will cause.
Also if you use the yarn deployment plugin (forgot the name), it will create a deployment folder for you that does not contain dev dependencies. This way we found a package that was installed as a dev dependency but in reality was a regular runtime dependency.
Yarn pnp was in a poor state tooling-wise last time I checked it a year or so ago. Too many tools depend on ./node_modules/ (or actually an entire module.paths thing) to exist on a filesystem. Was it resolved somehow?
By and large it has been solved with editor extensions and PnP plugins for various tools; I've used Yarn 2/3 at enterprise scale for a few years now and am happy with it.
Wanted to suggest the same thing. Used Yarn 2 PnP with zero-installs at a past company and it worked great. Whenever I switched to projects not using it, installs seemed to take ages. Used that only on the backend, YMMV.
But there is nuance (there always is...): the README file in node_modules is here: https://github.com/ChromeDevTools/devtools-frontend/blob/mai... - and it makes it clear that only NPM dependencies used by the build system or infrastructure are meant to be checked in. Other NPM packages should not be.
----------
In conclusion: the linked blog-article is clickbait that misrepresents how the Chrome team manages their dependencies.
In our industry (games) we often do that - check in prebuilt code in the depots (typically "p4"). I'm not saying it's wrong/right, it's just what we do (not 100% fully, but almost, though people in IT/infra tend to do otherwise).
Does your team/company store art and asset blobs (even rendered FMVs?) in Perforce too, or do you have a separate asset-management system for that? If you do have two separate systems, how do you keep the asset-store and source-control in-sync?
It's been a long time (easily 20 years now) since I dabbled in any high-end creative software (like 3ds, etc), but I remember they generally all had large binary singleton project files (like Flash .fla and Photoshop .psd files) that couldn't be deconstructed and effectively diffed by any source-control system (though Flash eventually supported external ActionScript source-files), I'm curious how that affects your org's asset storage needs.
We ... commit them. Probably some really exceptional cases stay behind some "ftp"-like service (not really ftp, something like this).
It may be wasteful, but it's the established practice (it seems). People coming from other game companies pretty much use it too (with some rare exceptions). Also the automotive/chip design industries use it - and yes, mainly for the big blobby non-diffable/mergeable assets.
Locks are terrible, and yet you gotta do it sometimes, as how else would you prevent people from working on the same asset?
I'm (still) terrible with git, often screw up commits, and have to google search/stack overflow to get it right (I use it mainly for simple home projects). I can only imagine the pain and suffering a non-tech person would have with git. Also the metadata is quite a lot for WFH conditions. Working remotely does not always mean working from a dumb terminal (I wish).
> Locks are terrible, and yet you gotta do it sometimes, as how else would you prevent people from working on the same asset?
I'm unfamiliar with P4's locking semantics; when files are "locked" does that prohibit other users from even getting a copy of the centralized file, or merely prevent users from overwriting the centralized file on push/upload? How does branching work?
If I were designing a centralized asset management system then I'd definitely add support for git-style (i.e. "many-worlds") branches instead of SVN/TFS-style "spatial" branches - but I'd also add support for some kind of "mini-branch" or deferred-conflict-resolution, whereby a file, or entire directory, can still be pushed to central storage but have multiple different representations that can be resolved/merged later, rather than immediately. So if two artists are working on the same "texture123.psd" file without realizing it then the system would let them both push (so the first artist to push would get to overwrite the file, and the second artist's push would see their file saved as "texture123.psd.v2").
There are good business reasons for having the ability to disallow changes to files in central storage, but that doesn't mean locks need to be used: it could be done by instead directing all updates/pushes to separate mini-branches, thus allowing users to push-and-forget and allowing them to defer conflict resolution while still protecting files from unwanted changes.
I've only ever used Perforce through a command-line interface. What's the UX like when using Perforce for blobs/assets? Does everyone have to use the command-line or is there a GUI experience (with fast-rendering thumbnails?)?
Forgive the questions, I'm just curious about the minutia and peculiarities of the gaming-biz because I've never worked in the field.
I've seen this trip people up in the past. In one case a CI/CD system running Linux was used to produce a project deployed to a mostly Windows environment; this didn't cause any issues until the day a developer added a binary module. Honestly, I'm surprised it worked as long as it did, it took a little over a year before anyone hit that issue.
No, and that detail is dangerously missing from the advice. lol
Anyone that does this and has teams on Mac, Windows and Linux will find out how crappy this is very quickly.
It’s even worse than that. If you support all LTS/current Node versions—as is, and should be, very common for libraries—it'll break even on a single platform. I’m sure it’s solvable, but the solution would be so complex it’s tantamount to building a new package manager.
> Would this still be an issue if the whole team uses docker to run the code?
It could be if it's a CPU architecture difference. For example an M1 Mac (ARM64) vs just about every other system (x86-64).
I know we had to switch out MySQL with MariaDB locally because the official MySQL Docker image doesn't support ARM64 devices but MariaDB does. That's just another example where even if you're using Docker there could be differences.
We've also had issues where developers aren't used to case sensitivity at the file system level and things work on their Mac but fail on Linux in CI because Docker's bind mounts (often used in dev) will use file system properties from the host OS which means even if your app runs in Linux within a container it may run differently on a macOS host vs Linux.
The moral of the story here is Docker is good but it isn't a 100% foolproof abstraction that spans across Linux, Windows and macOS on every combination of hardware.
> Once you check your node_modules in, there's no need to run an install step before you can get up and running on the codebase. This isn't just useful for developers locally, but a big boost for any bots you might have running on a Continuous Integration platform (e.g. CircleCI, GitHub Actions, and so on). That's now a step that the bots can miss out entirely. I've seen projects easily need at least 1-2 minutes to run a complete npm install from scratch [...]
Couldn't this issue be solved by using caching? If I remember correctly, Travis CI has the option to cache certain folders between builds, meaning an npm install doesn't have to start from scratch, and can just incrementally update the cached node_modules folder (any changes are then copied to the next CI build).
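For example, a generic sketch of the idea, assuming the runner exposes a shared cache directory keyed on the lockfile:

    # Restore node_modules if the lockfile hasn't changed, otherwise rebuild and save it
    KEY=$(sha256sum package-lock.json | cut -c1-16)
    CACHE="/ci-cache/node_modules-$KEY"
    if [ -d "$CACHE" ]; then
      cp -a "$CACHE" node_modules     # cache hit: skip the install entirely
    else
      npm ci                          # clean install, exactly what the lockfile records
      cp -a node_modules "$CACHE"     # save for subsequent builds
    fi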
I understand what he's getting at, but if you are trying to have an exact copy of code in order to guarantee an exact behaviour, then you would want to extend that to the operating system used to run the software.
One example was having code where the test suite ran fine on my local MacBook, but would fail on CI (Linux). It turned out that on Linux finding files by name is case sensitive, whereas on Mac OS it isn't, and a require statement in Node.js was referencing a file path with casing that was different to the file name's spelling.
This entire comment thread is just torture for anyone who uses real dep management tools like Nix and Guix. Sorry, but it's just exhausting watching the same asinine conversations play out over and over. I guarantee there are multiple people who have spent more time pontificating over pointless ways to microscopically improve the node tooling situation who could've picked up Nix in 1/10th the amount of time.
But then again, I know there's a bunch of Nixers that pick their head up slightly, shake it, and then just go back to work on actually interesting problems instead of a millionth discussion about dealing with npm. Christ, the stuff people put up with.
EDIT: Ironic, this being here along with the CISA/log4j post where everyone is yammering about SBOM (software bill of materials). Again, I just glance over at Nix and go, "sure, what do you want to know, I can tell you instantly if log4j is anywhere and if it's a vulnerable version (excluding non-source-built packages in nixpkgs)".
This really only works if you avoid native modules. I'm not sure why the author didn't mention native modules as it's a pretty big deal in this context.
> There are times where this doesn't work; updating TypeScript may require us to update some code to fix errors that the new version of TypeScript is now detecting. In that case we have the ability to override the rule. As with anything in software engineering, most "rules" are guidelines, and we're able to side-step them when required.
Ah, I glossed over that. But that just makes me more confused, not less. On a long lived project most changes to node_modules will require code changes... just not sure what the point is.
If your libraries have breaking changes on every update, sure. Most of the libs I use haven’t had one in decades, but I don’t write js so maybe my experiences aren’t that relevant.
Can’t you like, cache the node_modules folder on CI builds? I dunno, seems gross, unless you’re on a project with minimal deps or very meticulous about which deps you leverage. I am just one of those people who are constantly trying to upgrade dependencies anyway (cautiously of course) so as to avoid vulnerabilities. That said, I see the point, it’s interesting.
it’s possible, kind of a moot point as doing nothing induces a similar level of risk in my experience. Don’t touch your code for two weeks, I guarantee you that ‘npm audit’ will complain about some new issue.
Also worth noting the longer you wait to upgrade, the harder it can be to do so when you finally need to. If someone discovers a critical fault in the version you're running but you're several years out of date, upgrading can be a huge pain.
This very week I was dealing with my artifacts exploding in size because AWS got 429s from GitHub. Then Composer pulled from source and there were so many extra files and SCM folders we exceeded the max artifact size.
Another idea is to host your own package cache. That would be my preference where SCM size prohibits checking in dependencies themselves.
In the past for a large Python project I've handled this using a separate repository for all of the dependencies - that way you can still get work done even if PyPI is unavailable for some reason, but you don't bloat your main repository with an extra few hundred MBs of stuff.
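Concretely, that dependency repository can be little more than a directory of downloaded wheels (a sketch; directory names are up to you):

    # Populate the dependency repo once, or whenever requirements change
    pip download -r requirements.txt -d ./wheelhouse
    # Later, install entirely from the local copy without touching PyPI
    pip install --no-index --find-links ./wheelhouse -r requirements.txt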
In my experience, it's about the inconsistency between Node.js versions (each version could produce different package-lock.json).
So, for example, you install the dependencies with Node 12, but have to run locally the system with Node 14.
So, I think just checking the node_modules folder into git is not a complete solution. You're only avoiding the need to commit an inconsistent package-lock.json, which is not that hard to solve anyway.
Nobody should be running different versions in a team. Unless you’re selling a product where this is a likely scenario you should be running the same environment either through some kind of virtualization or at worst nvm.
When you install Node.js, you get npm installed along with it. So in this case there's not much you can do. It's not just an npm issue, it's a Node.js issue, too.
Not necessarily. On Arch, npm isn't bundled with the node package.
Also, according to the official npm documentation, npm should be installed through nvm, thus having the ability to specify its version separately from node.
Note the other child comments of this; NPM has failed to make the lockfile reliable across systems and versions. Here's an example from March that isn't fixed: https://github.com/npm/cli/issues/2846
The fact that you have so many people believing "nuke node_modules and delete package-lock.json" is a reasonable step in diagnosing an error is damning to NPM.
We don't check in our node_modules, but "use the lockfile" is not a valid counter to this article's points.
I honestly don't understand how people get this impression of lockfiles as being perfectly reliable. How are they not occasionally bitten by these bugs? Maybe I'm just unlucky, but I'm a little jealous of these developers who apparently are good enough managing/updating their dependencies and keeping their count low enough that they've just never run into problems like this before.
Lockfile v1 literally ignores pinned versions of dependencies if the package.json specifies a fuzzy version number[0], and the advice of the npm team was, "it's fine, everyone will just bump a major version number of npm." And to this day, I still don't know what the expected behavior is, there really isn't a list anywhere about when the lockfile is and isn't supposed to be respected. So it's not really surprising to me that people distrust version pinning, and I always feel like I'm kind of living in a different world when people say that lockfiles just solve everything.
npm went through a rough few years (lockfiles, leftpad) and obviously the hivemind of the JS ecosystem is not the most careful one (hence all the advice of nuke it and npm i).
but those who care use yarn, those who even want to be correct use yarn2, and so on.
Indeed. What's probably needed here is a way to review a diff of the contents of the updated packages. Checking them in is just a brute-force way to do that.
not to mention that unless someone is very familiar with the code of dependencies it's very hard to review hundreds of small near meaningless changes unrelated to your actual functional/business requirements.
something like cargo-crev for npm might be a long term solution
I’ve heard lots of people claim that yarn gives no advantage over npm anymore, as of a year or two ago. But in 5 or so years of using yarn every day, on numerous projects, I’ve probably nuked node modules a couple of times, and never even considered deleting yarn.lock. Maybe yarn is still superior in this regard?
Done. Then I delete it, and `npm install`. Then commit. The majority of people I have worked with do that. On Friday some dude was saying "shrinkwrap v2 is not shrinkwrap v1" and the advice was "delete it and npm install" then commit. (Payment company software manager).
In my experience this is probably not viable. A single, valid update to a direct dependency can bring dozens or even hundreds of updated sub-dependencies. And an audit fix will often update just the lockfile. Maybe I lack imagination, but I can’t think of a workable heuristic for determining how the lockfile was changed.
I think, instead, it would be good to move in a different direction for sub-dependencies generally. A rough sketch:
1. Packages state their dependencies as they do currently.
2. When released, they’re built and bundled by the package manager host.
3. Included in the bundle is a manifest of the dependencies used, specific imports used, and a hash for each (recursively until exhausted).
4. On install, identical code (same hash/same bundled result) is deduplicated.
5. A human readable record of dependencies and imports used is produced. It’s important that it’s human readable, because:
6. This should not be filtered out in diffs. It should be subject to review just like any other change.
All of this is pretty complex, and there are probably ways to reduce that complexity. But it has some obvious advantages:
- Only your direct dependencies are installed. Tons of bloat can be stripped out.
- Even deduplication can be performed on the package manager’s servers. And hashes aren’t a particularly expensive lookup.
- It would go a long way towards addressing audit fatigue: if your dependencies’ bundles don’t include affected code, the audit doesn’t apply; if they do, you can be reasonably confident the audit is valid.
- A (wild guess) huge amount of the time, sub-dependency changes will require little to no review. Their stable parts will seldom change, and the parts shared among several dependencies could be reviewed as one unit.
- Lock files themselves just need to track direct dependencies (and even then, only to support semver ranges).
What you're describing is more in spirit with the intent of a lockfile and worth exploring by package managers. But I do think a heuristic could be conceived for the lockfile-was-rebuilt situation: If a single top-level dependency declared with ^ or ~ was present in the lockfile when the existing resolution was still valid.
Ideally the onus is on the package manager to provide metadata in the lockfile for the strategies it took when generating.
Problem with yarn is that it doesn't actually use dependencies' lock files... So once you publish your library to npm your lock file doesn't do anything whatsoever.
That makes a lot of sense for file sizes—otherwise, common dependencies a patch version apart would be duplicated many times, and it would block you from upgrading a library's dependency for a security fix. You still get the important part of reproducible builds for your program. Rust's Cargo behaves the same way.
I like the benefits but our node_modules is 1.9GB, def not checking that in.
A Git bot that parses changes to yarn.lock and comments with size/LOC/files of all the deps, etc., kinda like how coverage bots work. Get all the observability benefits without the check-in cost; also wouldn’t have to split npm changes and code changes.
Tbh I haven’t gone deep into it; now you ask, maybe I should. It compresses OK, zstd to 240MB. React + React Native, webpack, etc... tons of stuff. My guess/hope is it's mostly dev dependencies.
Mostly dev dependencies, I guess. You really pay the price for not having a compressed intermediate format like a JAR:
- storybook is 420MB (! will look into this one!)
- babel is 110MB
- sentry sdk is 92MB (?)
- aws-sdk is 63MB
- typescript 60MB
- react-native 51MB
...on and on, 40MB, 30MB, 20MB... hundreds of them, and it just adds up! 147 dependencies over 1MB. Pretty incredible/wasteful really when you look at it.
Maven, Gradle and the like always download dependencies as part of the build, and if a server is down (or flaky like jitpack) good lord is it annoying.
What I don't understand is why dependency download isn't a separate task you do before compilation. FIRST you download all the dependencies to stabilize that, then you build the code. The reproducibility is the big one.
It drives me nuts that the dependencies are hidden away in Maven and Gradle. I have to look up obscure "download dependencies to a lib" task configuration. Then there's the obsession with massive jars/wars/whatever when all you should have to update in a deploy is the difference in the libs and the main code jar.
The reason for this obscurement is pretty much "uh, it saves disk space?" which is a laughable consideration given the bloat in war files, docker images, and the like.
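For the record, the maven-dependency-plugin does expose this, it's just not the default workflow (a sketch using its standard goals):

    # Resolve everything up front, then materialize the jars into a local lib/ folder
    mvn dependency:go-offline
    mvn dependency:copy-dependencies -DoutputDirectory=lib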
> What I don't understand is why dependency download isn't a separate task you do before compilation. FIRST you download all the dependencies to stabilize that, then you build the code.
This is what bazel does. It also offers a `bazel fetch` to pre-download things before going offline (e.g. for a flight).
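Roughly, a minimal sketch: fetch once, then refuse to touch the network during the build:

    bazel fetch //...              # pre-download every external dependency
    bazel build --nofetch //...    # fail instead of fetching during the build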
I don't agree that it should be checked into source control. I do believe it should be cached somewhere. How much bandwidth of popular sites is used by redundant actions? A single request for a 2GB archive is much better than 1,000,000 small requests that are all a few KBs or MBs in size.
Nodejs has a node-gyp problem. Every node_module that somewhere down the dependency tree requires a "native" module will require recompilation on the target machine (or in worst case: the user machine).
I really would have hoped that the NaN module related problems will be fixed over time, but here we are in 2021 and nothing's been fixed.
As long as npm doesn't use binaries and headers, those things will stay broken. The argument they make for "always source" is kinda ridiculous considering that probably most npm packages are run through webpack or another bundler before being pushed to npm - because npm itself has become impossible to use as a package manager alone.
I mean, a couple MB of libraries with the wrong dependencies can lead to multiple phantomjs installations - a project that has been inactive, deprecated, and insecure for years already... just because of some unit tests that have no place in a production npm package.
My hopes are that more sane developers come together, switch to ESM and implement better policies for evaluating their dependencies (e.g. blocking sources from people that have more than 1000 npm packages and brag about it).
Pikapkg was a great idea in my opinion, and I was using it before they moved the project to building astro as a platform :-/
This is basically a way of saying: to hell with those "package managers".
It's a sentiment that I'm actually in agreement with. I've been coding mostly in Java for the past 22 years. Somewhere around 2010 Maven became the prevalent build tool quickly displacing the venerable Ant. With Ant we had builds that used checked in jar file dependencies. It was obvious what your builds consisted of and they were very fast once you cloned the repo.
Then came Maven and the conflict resolution hell quickly followed (esp with unpinned dependencies). Now every time I type mvn install it feels like a small adventure in its own right. I'm absolutely flabbergasted. We sacrificed simplicity and reliability for a bit of instant gratification. Bad tradeoff.
You can save yourself some headaches by not using transitive dependencies in your Maven builds. This can be enforced with Maven Enforcer. The trade-off is that you will have to clean unused dependencies. This can be tool-assisted with something like: https://github.com/castor-software/depclean/ . You can also enforce dependency version convergence with Maven Enforcer. None of this saves you from "jar hell" directly, but it helps prevent you from unknowingly creating a disaster-in-waiting.
It’s been a long while since I saw a POM file with unpinned dependencies… ca 12 years of Java experience. Had to deal with an Ant project recently that didn’t check in the JARs into Git, I thought my hair would become gray by the time I figure out how to pull the right dependency versions transitively to get the Ant build to work.
But I share the overall sentiment. What if Maven Central goes down one day? It’s just a web server like any other.
Most Java shops run their own artefact repository acting as a pull-through repository to Maven Central and other 3rd party repositories. In addition to acting as insurance against losing access to critical dependencies, it also can provide a performance boost when downloading dependencies, and helps to offload your organisation's traffic from Maven Central.
I don’t check in node_modules but do a backup once in a while.
With frontend nowadays it’s sad but after six months it’s highly unlikely that my project will compile, not to mention the tooling like Vue, Vite, etc that has breaking changes.
I mean it’s scary. You write a program and it WON'T run if you just give it enough time. Locking versions is not really a solution since oftentimes the tooling itself and IDE extensions require newer versions of packages.
Maybe you wanted to fix a typo a year down the line, but oh no, now you need to figure out why Vite won’t start, why eslint dropped support for xyz, spend hours figuring out what you need to change in your configs, etc.
I don't see how this would work in practice. You could use module-alias[1] to actually switch to the backup copy during dev or a local build, but that will only work when the backup is on the same machine you're building on, and then everyone on the team will need to have the same backup (or use a network drive for it I guess). If you don't check in your backup then nothing will get through CI or make it to production. Why not just check in node_modules and let git handle the 'backup' process?
This is all horrible advice and other commenters have rightfully pointed this out already so I won't repeat it more… (okay, once more: This is all horrible advice, don't do that)
But, there is one thing I like from this, which is git diffs showing the actual final code diff when you upgrade dependencies.
Of course, this being horrible advice, it ignores how many JS packages ship minified which would make the diff as useful as binary noise. But I like it in theory. This could be a good opportunity to write a tool that replicates this specifically (and what's more, for other languages as well)
Most node modules (and I’m pretty sure this is best practice) do not minify published node module code. The user should decide if they want to/how to minify. I often go in and read node module code and although it might have been transpiled for compatibility, it is not minified.
This is fairly standard "conservative configuration management" stuff.
The company I worked for used to do that with everything. In fact, one of the ways that they archived versions was to create a bootable external hard disk clone of the entire development machine, and store that.
If you want to be absolutely sure that you have the complete building blocks, then you don't trust your package manager. Make local dupes of the packages, and integrate them into your own version control.
Does the Chrome DevTools team use Google's big monorepo and all the tooling around it? If so, that puts the author in a different situation than the vast majority of devs.
Totally agreed with every step, that's what we do too -- except it's composer and composer managed packages here not npm but the reasoning is similar. On top, we too often need to actually patch composer managed packages and rolling patches without the code being version controlled is a PITA. It's git diff --relative if it's already in git otherwise it's .... I dunno, check out the package somewhere else, hope you get a close enough version (because what's released can differ a bit from version control), copy over the files , roll a patch, clean up the patch... what an unnecessary nightmare.
And composer patches make life quite easy compared to maintaining a fork. If I were to fork something I would need to handle merging every time they have a new release, run the build, etc. With a composer patch, a newly released version is installed and the patch applied on top. Sure, if there's a conflict it needs to be manually resolved, but that's usually minimal effort since most patches are absolutely tiny, a few kilobytes at most.
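For anyone unfamiliar with the flow, it's roughly this (a sketch assuming the cweagans/composer-patches plugin; the package and patch names are made up):

    # Generate a patch against the installed (and, here, git-tracked) package
    cd vendor/acme/widget
    git diff --relative > ../../../patches/acme-widget-fix-null-check.patch
    cd ../../..
    # Reference the patch in composer.json under "extra" -> "patches", then reinstall:
    # the upstream release is installed and the patch applied on top.
    composer install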
I never even understood the arguments for keeping the packages out of git. Trying to save disk space these days is pointless. Maybe npm is different but composer handles about 160MB of code here. Maybe I missed the memo but these days that's nothing. My laptop shipped with a 500 000MB SSD so it's like, what, half a percent? The speed advantage, on the other hand, is absolutely undeniable: git won on speed in the first place, and these script-language tools can't possibly compete with a git pull on speed. git diff, as the author notes, is not at all a problem, just separate the vendor commits from your commits. And as I noted: the diffs are useful for vendored packages.
So instead of downloading the current set of dependencies after cloning, you download every dependency ever used while cloning? And you do this to "save bandwidth"? Doesn't make much sense. (Yes I know about shallow clones, but it's often nice to have the full history around)
Any reasonable CI tool will have a way to cache generated assets based on file contents, that's the way to go here IMO.
Consider if you had written, "So instead of downloading the current project source tree, you download every version..?"
That is what happens in a DVCS, after all—in fact, it's sort of the whole point. If you're so uncomfortable with this, it might be worth asking yourself whether it was ever really the case that you agreed that DVCSes were the right approach. (Even then, still no reason to embrace package managers like some kind of paramilitary force that's subject to its own rules—better to just improve your version control system to handle things the right way, right?)
There are cost-benefit tradeoffs for sure. It'd be nice to be able to do a big bisect without needing to flash the dependencies every iteration, but if it comes at the cost of making everything else slower (more bandwidth, more disk I/O, larger diffs to parse, etc.), it just isn't worth it.
Sure you can say "well why not just make git better then it will handle any operation in any size repo imperceptibly quickly", but I think you and I both know that isn't anywhere near as easy to implement as it is to type.
> you can say "well why not just make git better then it will handle any operation in any size repo imperceptibly quickly", but I think you and I both know that isn't anywhere near as easy to implement as it is to type
Good thing I didn't type that, then. That's not what the shape of my argument looks like at all.
There's a massive leap between, "we don't want every clone of our repo per se to carry all the baggage of our dependencies' histories, so it would be nice to have some scheme for handling lightweight, shallow copies" and "... so we decided the right way to do that, rather than making that a first-class feature of the version control system we're using, is to create a hack in the form of a new set of unrelated tools meant to circumvent our VCS's fundamentals completely—so from its point of view, these controlled objects and the scheme we use for managing them are invisible and might as well not even exist."
The listed reasons are insufficient, and we could achieve many of these goals by just pinning our dependency versions. If we checked node_modules in instead, our git repo would tend towards a gazillion GBs after a few commits. Costs outweigh the benefits, if there are any at all.
Horrible advice. Don't break the industry practice and check-in your node_modules
Whether he's right or wrong can be debated, but reducing the argument to "follow industry practice" is a perfect example of cargo culting. Industry practice needs to be based on something today, not something from ten years ago that might or might not be valid anymore. He offers a long list of arguments and even though I don't check in node_modules, some of the arguments are compelling - e.g. we already want reproducible builds and use package-lock, but why not skip this step altogether? Why not skip setting up cache on your CI if you don't need to? What if knowing the details of your package manager and CI is useless because there are simpler ways of doing things. I'm not that convinced that a few commits would create a gazillion GB repo, so his arguments seem stronger.
> He offers a long list of arguments and even though I don't check in node_modules, some of the arguments are compelling (...)
Are they, really?
The less debatable point is the argument that it makes CI/CD pipelines slightly faster, but this feels like an appeal to micro-optimization. Any free-tier CI/CD system out there lets you do a single npm install and move these dependencies as far as you'd like down the pipeline as artifacts. Is a git checkout really faster than an npm install?
For example, is adding a npm dependency really invisible if you already track package.json and even package-lock.json? Those files show up in diffs, and it's hard to miss them.
Also, if the goal is to get replicated builds, isn't this handled by pinning versions and tracking package-lock?
The left_pad example is particularly ridiculous as I highly doubt that a company like Google, like any company that cares about auditing and vendoring dependencies, does not run its own npm proxy with cherry-picked packages.
Here's a better idea, can't you have an npm cache/clone that keeps all the artifacts you use in your code? So you pull from it, it pulls from npm and caches?
> Here's a better idea, can't you have an npm cache/clone that keeps all the artifacts you use in your code? So you pull from it, it pulls from npm and caches?
Not only is that possible, that's also expected to be mandatory in any company that is required to monitor and control dependencies. I know for a fact that some FANGs do manage and enforce the use of internal npm repositories, mainly because of infosec audits, and I doubt Google is not one of them.
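Setting one up doesn't take FANG-scale infrastructure either; a rough sketch using Verdaccio (an open-source npm proxy/cache, which defaults to port 4873):

    npx verdaccio &                                  # start a local caching registry
    npm config set registry http://localhost:4873/   # point npm (or a project-level .npmrc) at it
    npm install                                      # first install pulls through and caches; later installs hit the cache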
The fact that there’s no mention of the `npm ci` command (or yarn or pnpm) makes me wonder how deeply this problem was investigated before using it as a justification for this hacky workaround.
One wonders why they aren’t checking in the binaries for their database and language runtimes. Surely this would save crucial seconds in project setup.
There's a tremendous irony here, which is that these package managers are little more than a hack to let people manage parts of their project tree (and the accompanying shame) outside the harsh and knowing gaze of their version control system.
So I imagine that if you ship an app on Linux you check in the source of every system library and utility it depends on, right? Anything less is little more than a hack to escape the harsh and knowing gaze of your version control system.
> I imagine that if you ship an app on Linux you check in the source of every system library and utility it depends on, right?
Dumb false equivalence, since that ("every system library and utility it depends on") is not the argument of the side you're trying to appear to offer a response to. Please refer to the HN guidelines.
A less dishonest retort would be to ask if one should check in the dependencies that are analogous to what ends up in node_modules, and the response would be, "welp, that's exactly how many app developers have been known to approach things, so 'yes'."
Despite the sanctimony, I'm sure there's at least a slim chance of you understanding my point: the distinction between "what ends up in node_modules" and any other application dependency is arbitrary and purely conventional. There are legitimate technical reasons to check dependencies into source control, but neither the reasons cited in the article nor any pompous ascriptions of moral judgment to software tools are among them.
As a point of fact, the sanctimony of referring to something as a "hacky workaround" began here, where _you_ were the one to (unironically) introduce the phrase: <https://news.ycombinator.com/item?id=29528285> Pointing out the logical inconsistency of a strong claim is not provocation, no matter how much you feel like you are the one who is being attacked.
I don't recognize your claim that the distinction is arbitrary. Is the NPM world's distinction between package.json's "dependencies" vs "devDependencies" arbitrary? (Answer: no.)
This is a fun shift in culture because for a long time checking in node_modules was the official advice of the early Node documentation unless you were building a library for npm.
It's already been long enough since that time that people seem to have forgotten that it even existed. Vendoring dependencies is one of those things where every once and a while a language will reintroduce the concept, and it always seems to catch people off guard. Go is a good example, although it seems to have varying advice about whether vendored dependencies should be checked in to version control. That might not be surprising considering that Go is also coming out of Google, just like this article.
It feels a little bit weird to say that this is just "industry practice" when you have the Chrome DevTools team telling you they don't do it, but :shrug:. Google does tend to be a bit of a rarity in how it treats monorepos. I'm just always interested to see how opinions on this have evolved; it's rare for me to see analysis that says, "we used to do this, and here's why we found out that it didn't work." Usually the opinions end up seeming more universalist, like the very idea of vendoring dependencies is somehow weird and unexpected, and not something that the industry was largely on board with for a decent amount of time.
It copies all untracked stuff (including node_modules) into a leaf tag. It is fairly easy to manage them, or find the latest one. And because they are leaves, they can be pruned and completely garbage collected when they aren't useful anymore.
I have been burnt many times by npm, and I use this script to guarantee that I have a stash of my node_modules, while also keeping my project small.
And I have diffed different snapshot tags to see which module changed that broke something.
And by leaving everything in unaltered text, it exposes it to git, which does a great job at compressing stuff, especially the nearly identical revisions of my node_modules.
A 500M node_modules from one of my projects only weighed about 100M extra, even with several snapshots. And I can just delete them anyway.
I need to work on it a lot more, it was just a quick and dirty solution when I had to work with React Native a few years ago.
It doesn't handle submodules at all, and there's plenty more I'd like to do with it.
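For anyone who wants to replicate the idea with plain git plumbing, the core trick is roughly this (a simplified sketch, not the script itself):

    #!/bin/sh
    # Snapshot the working tree (tracked, untracked and ignored files alike)
    # into a commit reachable only from a tag, without touching the real index.
    set -e
    export GIT_INDEX_FILE=.git/snapshot-index
    git add -A -f .                                   # -f also picks up ignored dirs like node_modules
    TREE=$(git write-tree)
    COMMIT=$(git commit-tree "$TREE" -p HEAD -m "snapshot incl. node_modules")
    git tag "snapshot/$(date +%Y%m%d-%H%M%S)" "$COMMIT"
    rm -f "$GIT_INDEX_FILE"                           # throw the temporary index away

Deleting such a tag later leaves the snapshot unreachable, so git gc can reclaim it, which matches the pruning behaviour described above.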
Funny but I've also done this.
Thanks for sharing!
Depending on the context, if you don't want this in git history, and want to handle git submodules, there's also git-archive-all https://github.com/roehling/git-archive-all (if you like shell scripts, it is using bats for testing - it was the first time I heard of it)
I remember an article a few years ago about Google's 86TB google3 source code repo. Someone on HN asked "I know Google is big but how on earth do they have 86TB?". Someone then quipped "someone accidentally checked in node_modules"
Basically came here to say this. The whole article felt very silly until I was like “oh wait OP stated early on he works at google… yeah seems about right ~closes tab~”
Insomnia led me to go dig into the repo… the average age of 98% of the files in node_modules is 10 months, attached to the commit when most of these files were added to the repo in the first place… so the entire argument is predicated on the changes to 2% of the dependencies
You're drastically overestimating the storage requirements. Time is far, far more valuable than storage savings. The point about CI builds running faster is enough to sell the idea all by itself.
Oddly not mentioned in the article is that it allows you to build when npmjs.org is down or unreachable, which happens often enough to be frustrating, and if it happens when you're trying to deal with an emergency, it's downright infuriating.
Git is good at storing text diffs, but any binary file in your node_modules (images, natives, etc.) is permanently stored in your repo, including old versions or deleted files. I've seen times where this had a meaningful impact on both disk use and the speed of Git itself.
Reread what you just wrote for a moment and reflect on that.
Also, you clearly have not written software using Node.js on a long enough time horizon. Pinned versions don't mean anything when sub-dependencies can have transitive version resolution occur.
The reality is that unless you can fully byte-for-byte assure what you have deployed today is what you can retrieve from an old tag, let's say weeks, months, or years from now, you don't have a replicable build.
Most people will never need to do this, sure, but serious operations who will choose Node.js to build some software and then plan to walk away from it later should not only commit their node_modules directory, but also keep a copy of the designated Node.js engine version as well. It's not likely you'll need a backup of an LTS version when you can just go retrieve it, but that's not the point:
You will encounter a scenario in your professional career where retrieval is not an option for some piece of software.
Edit: There are industries where committing prebuilts is normal and has absolute strengths, and having experienced it myself, it certainly is desirable sometimes.
Until very recently, the official LTS release of Node shipped with an npm version that would ignore lockfiles during certain situations when running `npm install`.
If the package.json listed a fuzzy dependency and the lockfile was pinned to an outdated version, it would just be updated anyway. This was fixed in later versions with the release of the lockfile v2 format, but the fix was never backported to older versions of npm, even though those versions of npm were the recommended, default versions that shipped with LTS Node installs if you went to the main website or installed from a software repo.
I think that for a non-trivial number of people, they may not have a lot of trust for lockfiles because they tried using them and they just straight-up didn't work.
I guarantee they are not aware of the ramifications of what they wrote. They live in the now, following the contemporary paradigm.
I live in the five years future where the propeller head rock star programmer has moved to greener pastures (the ones where he doesn’t have to write project planning documents of any kind).
Hey it's me, 10-15yrs in the future guy. Pass me a stack of punch cards and let me babble a little bit about legacy code. I found some dusty bourbon in rockstar guy's old desk.
Don't let it get to you, I'll happily cash checks to work on whatever legacy spaghetti tech is in play. Hours are hours, dollars are dollars, and as long as I'm maintaining a happy ratio of those two, I don't mind what code I'm working on. I'll sharpen pencils and sweep floors for 30 hours a week, I'll even listen to your life's struggles if that's what you want me to do for that money.
Don't let me give the impression that this is because all I think about is money or that I don't care, it's quite the contrary.
I think of myself as a developer, and I like to think I do a good job; when I get the opportunity to straighten the edges of a sagging beam or create a structure from scratch I take pride in that, as it is my purpose. There's no point in getting worked up about the practices of my peers because that does not serve me. This is my craft, and the person whose creations I am now steward of was also a craftsman, who had different experiences, motives, and contexts that led them to expressing their intent through the code now entrusted to me.
Imagine a television show, or movie franchise; it may have different writers over time commanding the dialogue of the main character, developing their mannerisms, polishing their pearl. These businesses and legacy products we work on are just like those characters who get passed on to new writers. Think of yourself as one of that team, carrying on a legacy, adding your flair and support. Never stop working on your pearl and use every project as an opportunity to fulfill your own desires and express yourself, while honing your skill.
I wish I had the time to sit down for a bourbon and a chat with you but in order to earn my $26k salary I have to repeatedly unearth these ancient systems, divine how they were supposed to work, repair them to a state where they don't break as much anymore, and then move to the next emergency.
I'd love to join the smoking-jacket crowd but I have student debts to pay, and my other job as a janitor actually sweeping floors to get to.
The world isn't the same place it was when punched cards, smoking jackets and bourbon in the library were a thing.
I've just checked a largish repo, and the node_modules folder is just under 500mb - I could check that in, and be done with it. Updates aren't constant, so it would be every now and then.
That's really not a lot for knowing the code that is being used in your codebase hasn't changed, and the bonus of having everything available should the registry go down, or something.
So I don't think it's 'horrible advice' - it's do what suits your needs best. Some people want to have everything they need to build their application in their control, on the off chance everything hits the fan.
(Also, this: "Don't break the industry practice and check-in your node_modules" - does not necessarily mean it is the best way, it just happens to be the advice from the start.)
I’d say that the weak link in that reasoning is the claim that “Git can’t handle this without making the repo a gazillion GBs”, which can, of course, be solved by not using Git in the first place. Certain other SCMs, like Perforce, allow you to trim history and don’t require you to clone the whole history in the first place.
With git you can specify the clone depth and only get the latest X versions. And there are ways to trim history with external plugins (git filter-repo).
You have to remake every commit of the repository which basically means you have a new repository, and new commits. In order to do the latter thing you said, you cannot have done the former thing you said.
The author thinks that Git is so ubiquitous that they just use “Git” as a stand-in for “VCS” throughout after the first paragraph. So the author definitely thought that Git would be suitable for this.
At my previous company we committed a zipped node modules to git LFS so we could have easy reproducible offline builds without hosting an internal npm instance. Seemed to work well enough.
If you are going to be tracking binaries, you should take a look at git-annex. It is so much more flexible and powerful. The thing that I don't like about git-lfs is how limiting it is with backend serving, and how you essentially can't remove something from your repo history after it has been checked in.
Building an internal repository of any kind is not hard at all for you today.
For me, five years after you left the company it’s a pain in the arse because all your code refers to this repository that doesn’t exist, the one with the custom packages with no source control, the dependencies which are no longer even in LTS versions of any extant OS distribution, and the Vagrantfile won’t work because it used undocumented parameters for both Vagrant and that homebuilt hypervisor that you and your team built as a lark (that doesn’t exist outside your personal laptop).
So for me, it’s all the dependencies get checked into the repository, all the tests run before we merge to master, and we do not use any custom in-house infrastructure of any kind.
Just another service that needs to be maintained but isn’t on the books as something that needs to be maintained leading to a wonderful day a couple of years in the future when a license cull of abandoned VMs means all your code stops building successfully on the same day.
My complaint isn't about build times, it's about dark repositories which aren't just mirrors of official repositories but also contain home-grown packages such as "Company X custom VirtualBox Ubuntu box for VMWare", which contains an Ubuntu machine with up-to-date guest tools for the version of VMWare we use, along with the versions of Puppet, NTPd, Samba, and so forth that we use for all our Vagrant-ified infrastructure. Thus we save time over building the guest VM from scratch (about 20 minutes for each box we spin up), but someone has to maintain that repository.
Adding my voice to say this is not a bad idea at all in practice. Before anyone tries to apply the same approach to other ecosystems, however, be aware that (frontend) JavaScript is kind of unique in this. In most languages you’d likely have at least one package somewhere in the dependency chain that ships in binary form, and those need a bit more vendoring machinery to handle correctly, unless everyone on your team has exactly the same setup.
I don't get why we are concerned about space. Space at this scale (never reached GB level) is so cheap and easy, to me at least, yet reproducibility is not.
With npm shrinkwrap you lock down the versions of the packages you installed and their deps. That way you can use the same packages in all envs.
Helps with testing and debugging, as you're removing a variable (bad deps, outdated deps, newer deps, etc.)
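For anyone who hasn't used it, a minimal sketch of that flow (nothing project-specific assumed):

    # Record the exact resolved versions of the whole dependency tree
    npm install
    npm shrinkwrap        # writes npm-shrinkwrap.json with pinned versions

    # Commit the shrinkwrap file; later installs reproduce the same tree
    git add npm-shrinkwrap.json
    npm ci                # installs strictly from the committed lockfile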
Can't speak for elsewhere, but at $WORK, the development and CI environments sit behind a package resolver (don't know if that's the right word) which transparently caches and proxies package downloads. So dependencies are implicitly persisted permanently by virtue of usage, even if only once.
People here complaining about the size of the node_modules folder - this is one of the reasons people use other solutions like Perforce. Checking a 2GB folder into p4 is an absolute no-brainer, and for all the flexibility people talk about with git, its inability to handle this is pretty damning after so many years.
As the other commenter replied, p4 doesn't download the entire history locally, so storage is only a concern on the server. Depending on what you're storing in p4, you can use p4 archive and/or p4 obliterate to manage the actual storage used on the server. E.g. you might use p4 obliterate to only keep the latest version of your node_modules folder in p4 if you're trying to optimize for CI performance, or if you're using p4 for build artifacts, you might use p4 archive and store the older revisions on slower bulk storage.
P4 does not store the entire history locally, so your only limit is the amount of space on the server.
Git will pack objects after a while (initially they’re stored as-is, just compressed), but large files or enormous amounts of changes can make the repository grow to an unwieldy size, and while git is able to perform “shallow” clones (clones which only store part of the history), not everything handles them well.
Git also has "partial" clones, which can avoid some of the downsides of shallow clones. You can, for example, filter out all historical objects (--filter=blob:none). Historical objects are fetched lazily as needed, such as checking out an old revision. The same can be done with unreachable trees.
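A quick sketch of what that looks like (the repository URL is a placeholder):

    # Partial clone: full history, but file contents are fetched lazily
    git clone --filter=blob:none https://example.com/big-repo.git

    # Filtering trees as well shrinks the initial download even further
    git clone --filter=tree:0 https://example.com/big-repo.git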
This article has excellent diagrams that depict the different types of clones:
The article contains some false assumptions and false statements:
> Having your node_modules checked in guarantees that two developers running the code are running the exact same code with the exact same set of dependencies.
No, it doesn't; your system environment is important. Any code executed can check for environment parameters and branch accordingly. For example, a simple if (macOSVersion === "10.10") { ... } else { ... } would run different code branches and possibly produce different results, even when executing the same code.
The reason you don't check in node_modules is that differences in the system environment at build time produce different build results - checking in node_modules fixes that, but it does not handle system differences at runtime.
Subtrees, instead of submodules, are meant to be super-cool - and I'd love to try them, but I'm still far too wedded to existing git tooling (namely GitKraken) where there's still no support for subtrees.
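For reference, the basic subtree workflow from the plain git CLI looks roughly like this (the repository URL, prefix, and tags are placeholders):

    # Vendor a dependency into the repo as a squashed subtree
    git subtree add --prefix vendor/left-pad https://github.com/example/left-pad.git v1.3.0 --squash

    # Later, pull in a newer upstream version
    git subtree pull --prefix vendor/left-pad https://github.com/example/left-pad.git v1.4.0 --squash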
A suggestion for how to manage node_modules: when testing different modules use npm/yarn, but once you have decided which module to use, fork the module on GitHub (most node modules are on GitHub), then link directly to your fork in package.json like this: https://github.com/{user}/{module}/tarball/master
Now you can use git/GitHub instead of npm to manage package updates.
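A sketch of what that looks like in practice (the org and module names are placeholders):

    # Point the dependency at your fork's tarball instead of the npm registry
    npm install --save https://github.com/your-org/left-pad/tarball/master

    # package.json then records the URL, roughly:
    #   "left-pad": "https://github.com/your-org/left-pad/tarball/master"

    # To update: merge upstream into your fork, then re-run the install above

The trade-off is that picking up upstream fixes now means updating your fork first.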
Only if those packages’ post-install scripts only mutate their own node_modules directory contents - and I can see this quickly falling apart as soon as the team’s dev boxes become a heterogeneous environment (e.g. someone using Ubuntu-on-WSL onboarding onto an all-M1 Mac team).
Anyway, it’s a given that I disagree with the article’s specific point (i.e. to commit node_modules to source-control), however I am sympathetic to arguments about avoiding another left-pad incident, but there are better solutions to that than simplistically committing node_modules:
1. package-lock (though this is an incomplete solution: it helps protect you from vague dependency version numbers (as it uses cryptographic hashes), but it doesn’t store a copy of the npm package, and you need to make sure everyone is using the exact same npm/node tooling versions, otherwise your package-lock file will be clobbered by different users checking in wildly different `lockfileVersion` versions).
2. git LFS: every so often (once a month or so?) in a separate directory off-to-the-side, add a heavily compressed 7z LZMA archive of a snapshot of your node_modules directory (ideally in a known-good-state). This allows you to keep a repo-local copy of your important dependencies without it cluttering up your commits. While these would be monthly updates - and your actual package.json/package-lock.json dependencies may change daily or weekly - in the event of catastrophe it won't be too much work to track-down any missing dependencies or to revert the deps back to the last known-good LFS file.
3. Use tools like `offline-npm` and Verdaccio, which are NPM caching proxies. If this was 2019 and everyone was working in a central office then you’d run Verdaccio on a single box in your LAN and have everyone configure their NPM clients to route through that box, which then stores every package ever requested - you could presumably run a cron job to ensure that package cache is backed up somewhere safe, maybe even with git-LFS as discussed above. (A minimal setup is sketched just below.)
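If it helps, a minimal Verdaccio setup looks roughly like this (default port; the storage location varies by version and config):

    # One-time setup on a shared box (or even your own machine)
    npm install -g verdaccio
    verdaccio &                        # listens on http://localhost:4873 by default

    # Point clients at the proxy; it caches every package it serves
    npm config set registry http://localhost:4873/

    # Back up Verdaccio's storage directory to keep the cached tarballs safe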
I've only ever had issues with differing major versions. That is, sharing lock-files between any node v14.x should work, but expect things to break if you go to v16.
tbf Google is so far ahead that it’s kinda not a real company. Aka you don’t have to adopt every best practice google does since you aren’t that big, mature, secure or rich.
Only anecdotes from several people working there at different divisions. I think I've seen it mentioned here at times as well.
One of the lead devs - years of tenure - on one of the Android backend teams had barely heard of the name and wasn't sure what it was when I brought it up 2 years ago.
Linux containers including Docker are absolutely widespread as I understand it, though.
Whilst I fully sympathise with the points, none of them are convincing nor helpful to me. I'm all for a balanced helping of "just do whatever works", but that's my only takeaway from this article, namely: compromising on best practices can have its place.
Or is there actually a fundamental design problem with lock files in general? My sinking feeling is that it's rather just the Node ecosystem's implementation of them :/
Why are people still writing server-side code in JavaScript? It was a cute idea a decade ago, but now you have to twist yourself into knots to prevent a ton of problems. Nobody will use C today because "memory safety", but everybody will use Node.js despite the dependencies being both a security and usability nightmare. Imagine if I suggested using Bash for a web application backend.
> One of our dependencies that we check in is TypeScript, and every time we update that, the git diff is huge and frankly not worth looking at (beyond the CHANGELOG)
Which I assume is the official attitude towards any dependency of the same magnitude. How are you more aware of the code you are shipping? Okay you managed to give yourself a visual on how much LOC your dependencies are but is that a relevant awareness? Do I not get the same thing with a `du -h node_modules`, with a matching pretty GUI on top?
The one thing I haven’t seen addressed so far is: doesn’t this make them susceptible to poisoned dependencies? Say they have a dependency on a large well-known library - what’s stopping a malicious contributor from adding an HTTP call, thinly disguised to evade grep, to some server in MiddleOfFucking, Nowhere? Even if they manage to flag this from blackbox testing, they now have a problem that only they have.
I can try to answer my own question: they're Google, they can afford a team scanning for vulnerabilities like this, a team dedicated to analyzing codebases that are found to be compromised, a legal and PR team to handle the fallout if this kind of vulnerability makes it to the public.
In short: horrible advice to follow if you are not Google.
This is idiotic and could only be written by someone who has no idea about Node package management.
1. Most CI systems have better options for caching node modules[1][2]. When you check in node_modules, you add a cost that grows with every commit. When you use CI caching, you add a fixed cost to only a small number of builds.
2. Once you start upgrading packages, the repo size will keep growing. At some point you will be forced to filter node_modules out of the git history, and then you lose the ability to run older commits locally.
3. You will need to pin the version of npm/yarn because the structure of node_modules depends on the hoisting algorithm. Every upgrade of node will be extra painful because you also potentially need to upgrade all your packages.
4. Platform-dependent modules like fsevents and node-sass can break if you use a different OS. You will be forced to support only a single platform (Linux).
5. It is impossible to resolve node_modules conflicts by hand. Modern package managers have git conflict resolution for lockfiles built in, but if two people update the same module to the same version they can still create a merge conflict when node_modules is checked in.
6. Currently, you have plenty of good options that achieve the same thing with less effort. You can use yarn 2 with the node_modules linker and a local cache; this creates a .yarn folder in the repo that holds all modules as zip files, and during install it uses these to hydrate node_modules (see the sketch after this list). Alternatively, you can use PnP and have zero-installs with proper support[3].
7. You lose automatic auditing and dependency management. The current best practice is to use something like Dependabot or Renovate bot. Once you commit your node_modules, you will no longer be able to use these effectively.
8. Most people commenting on left-pad are maybe not aware, but today npm is immutable, and you simply cannot unpublish a public package[4]. Because npm.com is such vital infrastructure, it is unlikely that it would ever stop working.
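Regarding point 6, a minimal sketch of that setup with yarn 2+ (exact settings can vary between Yarn versions):

    # Yarn 2+ ("berry") with the node_modules linker and an in-repo cache
    yarn set version berry
    yarn config set nodeLinker node-modules
    yarn config set enableGlobalCache false   # keep the cache in .yarn/cache inside the repo
    yarn install

    # Commit the zipped packages instead of node_modules itself
    git add .yarnrc.yml yarn.lock .yarn/cache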
> This is idiotic and could only be written by someone who has no idea about Node package management.
Some of your points are valid, but the first sentence seems really hand-wavy and rude.
> 1. Most CI systems have better options for caching node modules[1][2]. When you check in node_modules, you add a cost that grows with every commit. When you use CI caching, you add a fixed cost to only a small number of builds.
I was wondering about this too[0], mostly because it's the setup I tended to gravitate towards for my projects. Gitlab CI and Travis can do this AFAIK, although I'm not sure how long the cache folders would be kept (especially on the Free tier).
> Some of your points are valid, but the first sentence seems really hand-wavy and rude.
Because people here are discussing this and entertaining this idea. This creates a level of legitimacy. After this article, there might now be countless teams transitioning to this crazy idea. Then, two years later, people will continue to complain about the node ecosystem because they were burned badly by projects maintained using this approach.
The problem with today's world is that everyone tries to be politically correct. Everyone wants to discuss things in a civilized way based on merit and logic. In many cases, we could avoid wasting time and energy by declaring things as they are. For example, if the mainstream would call anti-vaxxers stupid, we would have more people vaccinated now.
There are plenty of things that people spend large amounts of time discussing when there is nothing to discuss. This thread should not have 195 comments.
"Because npm.com is such vital infrastructure, it unlikely that it would ever stop working."
Except that is has? It's a service, like everything else on the internet, it can and will go down. The choice here is whether you can carry on, or wait until it comes back up. THAT is the main benefit I see in this. (Not everybody wants to manage a local copy/proxy/internal NPM registry etc)
We have a limited amount of time, and adopting this idea would impact the team's productivity long term. Making your application multi-region with backups and security is more important than protecting against a black swan event that would have limited impact. Even if npm goes down, you might have at most a few hours of disruption for deploying in CI. That is relatively minor compared to the recent us-east outage.
see also: yarn offline caching[0]. Also great for dealing with CI environments where you can only cache stuff in certain folders (you can control where the cache is stored)
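For the classic (v1) yarn, the offline mirror boils down to a couple of config keys (the mirror folder name is just a convention):

    # Keep a copy of every downloaded tarball in a folder you control
    yarn config set yarn-offline-mirror ./npm-packages-offline-cache
    yarn config set yarn-offline-mirror-pruning true

    # The first install populates the mirror; commit or cache that folder
    yarn install

    # Later installs (e.g. in CI) can then run without touching the network
    yarn install --offline

Note that yarn config set writes to ~/.yarnrc; copy the settings into the project's .yarnrc if you want the whole team to share them.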
I know pulling in lots of dependencies from various anonymous authors is a security risk. Can you be sure that all of the code has been vetted? This seems to exacerbate that: you’re allowing developers to check in anything without oversight, and it will be ignored just because it’s in this particular folder.
That’s already exactly the same risk almost all web developers take currently. Yes it is a real threat, but it’s too hard to deal with and not often exploited.
We all hope and rely on the fact that popular OSS projects have enough eyeballs on them to make sure nothing malicious slips through. What is proposed here allows a load of changes to be made that completely bypass the normal review process
This doesn't bypass the review process, because no one's review process includes auditing the code of all the things in the package lock files. This is no more or less secure than the current way of doing things.
I currently have a more concrete problem than speed with leaving node_modules out of version control: buggy libraries that, as downloaded, need small corrections. Doing the right thing and fixing libraries for everyone on npm requires significantly more effort and it could be impossible.
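One common workaround for exactly this - not something the parent mentioned, just an option - is patch-package, which keeps your local fixes as patch files in the repo (the module name below is hypothetical):

    # Keep local fixes to dependencies as patch files in the repo
    npm install --save-dev patch-package

    # 1. Edit the buggy file directly under node_modules/some-broken-lib/
    # 2. Record the change as patches/some-broken-lib+<version>.patch
    npx patch-package some-broken-lib

    # 3. Re-apply the patches after every install by adding to package.json:
    #      "scripts": { "postinstall": "patch-package" }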
Possibly a dumb question, but does this prevent developers from working on different platforms? My understanding was that the specific files installed could depend on the architecture of the system they're being installed on.
I mean, come on now. You have internet? Good. Then you will be able to run both npm and git commands. Plus npm has a cache. Why burn bandwidth for no reason?
Source code is source code, it’s the input into a build process, it doesn’t determine the output: the build process itself must be designed for reproducible builds. Committing all of your dependencies doesn’t give you reproducible builds.
I don’t think committing ‘node_modules’ is egregious, and it has benefits as the post describes, but unfortunately it’s far from _the_ solution to the problems described. Something like Google’s own Bazel is a better option.
This is insane. The only pro I'll even entertain is the potential to cut a few minutes off your CI/CD build time, but for the vast majority, even that brings no actual benefits.
I agree, and to go a step further, I don’t even understand why everyone is so obsessed with package management systems when submodules do the same thing, only better.