NPM registry in numbers

basicallydan · on Sept 18, 2014

This is a really interesting analysis.

    From these graphs we could come up with some reasonable boundaries. Packages that:
      - have more than one version: 71853
      - are over two weeks old: 42106
      - had a new release in the last 360 days: 71277

    There are 31888 packages satisfying all three conditions above. Though it’s only one third of all packages there, the amount is still enormous.
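The three boundaries quoted above can be written as a filter over registry metadata documents. This is a minimal sketch. It assumes each document has the shape served at `http://registry.npmjs.org/<name>`: a `versions` object, plus a `time` object holding `created`, `modified` and one timestamp per published version. The helper name `isActivePackage` is hypothetical, and the article may have computed its cut-offs differently.

```javascript
// Thresholds taken from the three boundaries quoted above.
const DAY_MS = 24 * 60 * 60 * 1000;
const MIN_AGE_DAYS = 14;        // "over two weeks old"
const MAX_SILENCE_DAYS = 360;   // "a new release in the last 360 days"

// Hypothetical helper: true if a registry metadata document passes all three checks.
function isActivePackage(doc, now = Date.now()) {
  const versionCount = Object.keys(doc.versions || {}).length;
  const time = doc.time || {};
  const created = Date.parse(time.created);

  // Every key in `time` other than created/modified is a version's publish date.
  const releaseTimes = Object.entries(time)
    .filter(([key]) => key !== 'created' && key !== 'modified')
    .map(([, iso]) => Date.parse(iso));
  const lastRelease = releaseTimes.length > 0 ? Math.max(...releaseTimes) : NaN;

  // Comparisons against NaN are false, so missing dates fail the filter.
  return versionCount > 1 &&
    now - created > MIN_AGE_DAYS * DAY_MS &&
    now - lastRelease <= MAX_SILENCE_DAYS * DAY_MS;
}
```

Given an array `docs` of such documents, `docs.filter((d) => isActivePackage(d, snapshotTime)).length` would count the packages meeting all three conditions. Passing a fixed `now` keeps the count reproducible for a given snapshot.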

Now, if only I could search just those 31,888 packages on NPM.

EDIT: By the way, for anybody reading this article who isn't familiar with the JavaScript ecosystem, this is interesting:

    Mocha is clearly the most popular test framework, while nodeunit is way behind. And no-one seems to use Jasmine.

Jasmine appears to be unpopular within the JavaScript community, but in reality these numbers only cover the NodeJS community and, more recently, the subset of the front-end JS community that uses NodeJS tools in its projects.

Mocha, the most popular test framework, has been a Node module since late 2011 [0], whereas Jasmine has only been one since mid-2013 [1]. I have a hunch that the majority of Jasmine users aren't on board with the merging of the front-end JS and NodeJS ecosystems, and since Jasmine [2] doesn't really aim itself at NodeJS projects, it's far less likely to be used in any project that uses NPM for dependency management.

Not really that important, just interesting :)

[0]: https://github.com/visionmedia/mocha/commits/master/package....

[1]: https://github.com/pivotal/jasmine/commits/master/package.js...

[2]: https://github.com/pivotal/jasmine

phadej · on Sept 19, 2014

I corrected the jasmine stats. The right package is `jasmine-node`, not `jasmine`. Sorry for my mistake.

Though there are now many more users, there are still fewer than for nodeunit or mocha.
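The mistake described here is easy to make when tallying test frameworks. Usage has to be counted under the package name people actually install. Below is a minimal sketch. It assumes `manifests` is an array of `package.json` objects, for example the latest version of each package. `countDependents` is a hypothetical helper, not the script used for the post.

```javascript
// Count how many manifests list each candidate package in either
// `dependencies` or `devDependencies`; each manifest counts at most once.
function countDependents(manifests, names) {
  const counts = Object.fromEntries(names.map((name) => [name, 0]));
  for (const pkg of manifests) {
    // Spreading undefined yields {}, so missing fields are harmless.
    const deps = { ...pkg.dependencies, ...pkg.devDependencies };
    for (const name of names) {
      if (Object.prototype.hasOwnProperty.call(deps, name)) counts[name] += 1;
    }
  }
  return counts;
}
```

Calling it with `['mocha', 'nodeunit', 'jasmine', 'jasmine-node']` keeps `jasmine` and `jasmine-node` as separate entries, so counting the wrong one shows up immediately.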

HNJohnC · on Sept 18, 2014

Jasmine is hopeless for serious node apps, that's why. It lacks critical features whose absence is long-standing and bitterly complained about by node developers, a situation made worse by the maintainers' hostile and half-assed responses to those issues.

aikah · on Sept 18, 2014

> It lacks critical features

Like?

seldo · on Sept 18, 2014

We are working on including more accurate and detailed health metrics into the upcoming re-launch of the npm site. We're definitely going to be including stuff like GitHub stars (but this disadvantages non-GitHub projects), downloads, number and recency of versions, dependencies (and dev-dependencies) and are considering allowing people to filter by license and passing tests. We're also adding the Collections feature which will allow people to curate groups of packages they use or think work well together, and that will feed into rankings as well.

We've barely scratched the surface of the ways we can improve npm search, and we'd value any ideas the community has. If you have specific suggestions, I've created an issue over at https://github.com/npm/newww/issues/131 for you to throw them into :-)

ecaron · on Sept 18, 2014

> It seems that CoffeeScript isn’t anymore popular for new projects.

I'm happy to see some research behind what my gut has been sensing for a while. The distinction between "popular" and "popular for new projects" is very important too. Is anyone else talking about whether or not this is an active downward trend?

lowboy · on Sept 18, 2014

That's not to say that packages aren't being written in CoffeeScript as source and then being compiled to Javascript for distribution. That wouldn't require CoffeeScript to be in the deps at all if a dev doesn't use build tools (gulp-coffee, browserify+coffeeify, etc).
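One way to check this point is to look at where a package declares the compiler, not just whether it declares it. Below is a sketch. It assumes the input is a parsed `package.json` and that the compiler is published as `coffee-script`, the package name used in the article. `coffeeScriptRole` is a hypothetical name. A result of `'none'` says nothing about whether the source was written in CoffeeScript and compiled before publishing.

```javascript
// Hypothetical helper: classify how a manifest depends on a compiler package.
function coffeeScriptRole(pkg, compiler = 'coffee-script') {
  const has = (field) =>
    Boolean(pkg[field]) && Object.prototype.hasOwnProperty.call(pkg[field], compiler);
  if (has('dependencies')) return 'runtime';  // installed for every consumer
  if (has('devDependencies')) return 'dev';   // only needed to build or test
  return 'none';                              // not declared at all
}
```

Tallying these three outcomes across all manifests separates "ships a compiler at run time" from "compiles before publishing", which a single dependency count mixes together.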

joshfinnie · on Sept 18, 2014

I agree, I would be interested to see how many packages are written in CoffeeScript but then compiled for NPM or the open source community.

It is interesting that some people will not support your project if it's written in CoffeeScript... I would think this is the driving force, not so much that CoffeeScript is unpopular.

lowboy · on Sept 18, 2014

Agreed, which is too bad. It's one thing to not contribute to a project if the source isn't to your liking, but it's another to flat out not use the compiled version.

Besides, I've found that writing CoffeeScript is pretty much like writing Javascript. The syntax is different, but the semantics are almost entirely 1:1, leaving out the class stuff (which I don't often use). I've introduced CS to many JS devs and while not all of them loved it, they were able to get up to speed very quickly. But I understand people's objections to it - I think they're wrong, but I understand them!

phadej · on Sept 19, 2014

Good comment!

I made too quick a judgement there. It turns out you can't easily tell how the popularity of CoffeeScript has changed.

    It is surprising to see CoffeeScript in this list as a language compiler
    should be mainly a dev-dependency. You compile the CoffeeScript source to
    JavaScript for distribution, so you don’t need coffee-script to be a
    dependency. The packages that depend upon coffee-script include among others:
    grunt, jasmine-node, jscoverage, cucumber and hubot. They all allow you to use
    CoffeeScript sources.

While there isn't anything wrong with supporting CoffeeScript out of the box, I like karma's approach more. It has a separate "karma-coffee-preprocessor" plugin, which should be self-explanatory.

yaph · on Sept 18, 2014

It is really impressive how fast NPM is growing. In July last year I created two network graphs to visualize NPM dependencies. At that time, a year and two months ago, there were about 35,000 packages; that number has almost tripled since. The top 3 packages (underscore, async, request) in terms of number of dependent packages kept their positions.

You can find the graphs and some more info at the following URLs:

- http://exploringdata.github.io/vis/npm-packages-dependencies... (takes a while to load; use Chrome or another WebKit browser)

- http://exploringdata.github.io/vis/npm-top-packages-dependen... (includes only packages with at least 10 dependent packages)

- http://exploringdata.github.io/info/npm-packages-dependencie... (info post)
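Ranking packages by how many others depend on them means inverting every package's dependency list. Below is a minimal sketch. It assumes `manifests` is an array of `package.json` objects and counts only runtime `dependencies`; the graphs above may have been built with different rules. `topDependedUpon` is a hypothetical name.

```javascript
// Return the `n` package names with the most dependents, as [name, count] pairs.
function topDependedUpon(manifests, n = 3) {
  const dependents = new Map();
  for (const pkg of manifests) {
    for (const dep of Object.keys(pkg.dependencies || {})) {
      dependents.set(dep, (dependents.get(dep) || 0) + 1);
    }
  }
  // Highest count first; ties broken alphabetically for stable output.
  return [...dependents.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}
```

Running this on two registry snapshots taken a year apart would show whether the top three kept their positions.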

rattray · on Sept 18, 2014

> It looks like underscore is more popular than lodash. I think this table is flawed.

I think this way of going about data analysis is flawed... "I don't like this conclusion, so I'm going to massage the data until it supports my preference".

I do agree that it'd be nice if the npm maintainers went about clearing out all the dead projects though.

untog · on Sept 18, 2014

Yeah, I don't understand that. Why would lodash automatically be more popular than Underscore?

lowboy · on Sept 18, 2014

As tbassetto said, I can't see any reason to use Underscore over Lo-Dash. Performance is on par or better, you get AMD/CommonJS modularity out of the gate, and its CLI can be used to build a minimal version of the library based on the functions you need.

I wrote an article on how you can analyze source code and produce a minimal Lo-Dash build in only 73 characters: http://jjt.io/2014/07/18/analyzing-source-files-to-automatic...
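A minimal build starts from the list of Lo-Dash functions a codebase actually calls. The linked article takes its own 73-character approach; the sketch below is a different and more verbose illustration. It assumes the library is always referenced as `_` and that a regular expression is good enough. It ignores aliases, destructuring and dynamic access such as `_[name]`. `usedLodashMethods` is a hypothetical name. The resulting list is what you would hand to the Lo-Dash CLI.

```javascript
// Collect the distinct `_.method` names referenced in some source text.
function usedLodashMethods(source) {
  // \b keeps identifiers such as `foo_.bar` from matching.
  const pattern = /\b_\.([A-Za-z_$][\w$]*)/g;
  const names = new Set();
  for (const match of source.matchAll(pattern)) {
    names.add(match[1]);
  }
  return [...names].sort();
}
```

Joining the result with commas gives a compact function list to feed into a custom build.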

tbassetto · on Sept 18, 2014

From what I've read, lodash does everything underscore does, faster (and it also does more). I may be missing something but there are virtually no reasons to still use underscore unless you just don't want to update your project's dependencies (which is fine, no harm done).

ChiperSoft · on Sept 18, 2014

I think it was supposed to be a joke about the rivalry between underscore and lodash.

nawitus · on Sept 18, 2014

>Always remember to check the licenses of transitive dependencies. There are packages which say they are licensed under MIT, yet they depend on an (A)GPL package! That might or might not to be an issue for you.

I recommend using a tool like license-checker to create a list of all the licenses. It also shows the unknown ones, so you can start digging for the licenses. Like the article states, there's a large number of npm packages without licenses. I've usually made pull requests whenever I stumble upon a package with missing package.json license information, and I hope the situation is slowly improving.
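At its core, such a check walks the transitive dependency graph. Below is a minimal sketch. It assumes the graph has already been resolved into a hypothetical `{ name: { license, dependencies: [names] } }` map; license-checker, for example, inspects the installed packages instead. It accepts both the string form of `license` and the older `{ type }` object form, but it does not parse SPDX expressions such as `(MIT OR GPL-3.0)`.

```javascript
// Normalise the string form ("MIT") and the older object form ({ type: "MIT" }).
function licenseOf(meta) {
  const lic = meta && meta.license;
  if (typeof lic === 'string') return lic;
  if (lic && typeof lic.type === 'string') return lic.type;
  return null;
}

// Breadth-first walk from `root`, reporting (A)GPL and missing licenses among
// its transitive dependencies (the root itself is not reported).
function auditLicenses(graph, root) {
  const copyleft = [];
  const unknown = [];
  const seen = new Set([root]);
  const queue = [...((graph[root] && graph[root].dependencies) || [])];
  while (queue.length > 0) {
    const name = queue.shift();
    if (seen.has(name)) continue;  // also protects against dependency cycles
    seen.add(name);
    const meta = graph[name];
    const license = licenseOf(meta);
    if (license === null) unknown.push(name);
    else if (/^A?GPL/i.test(license)) copyleft.push(name);  // LGPL is not matched
    queue.push(...((meta && meta.dependencies) || []));
  }
  return { copyleft, unknown };
}
```

The `unknown` list corresponds to the packages worth digging into, or sending a pull request to.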

orf · on Sept 18, 2014

I'm really impressed. PyPI only has about 37,000 packages[1] and it has been around for much longer. It really shows how far Node has come.

It would also be interesting to see the average number of lines each package has.

1. http://tomforb.es/how-much-code-is-there-in-the-python-packa...

efdee · on Sept 18, 2014

It's not as impressive if you consider that people make modules for everything, no matter how trivial. Case in point: https://github.com/blakeembrey/upper-case/blob/master/upper-...

untog · on Sept 18, 2014

I've found myself infuriated by this recently. I do understand the desire to compartmentalise everything, but the last time I installed Express (a web framework) I also had to install modules to read POST bodies and serve static files. Quickly my JS file becomes more require() statements than anything else - it just doesn't seem worth the hassle to me.

morganherlocker · on Sept 18, 2014

You can have both. Frameworks like turf (a geospatial analysis library that I authored) have one high-level convenience module[1]. You can require('turf'), then run any function from a single object (i.e. turf.buffer(), turf.smooth(), etc.).

Finer control is often required, however, for all sorts of reasons. For these cases, turf is made up of a collection of sub-modules [2][3], and these can be required individually.

I like this approach, since it provides a nice curated framework for users simply wanting to get stuff done quickly, but it also allows for more tuned usage in production environments. It has been a bit more work to maintain (~70 repos, instead of 1), but I have been pleased with the approach so far.

[1] https://github.com/Turfjs/turf

[2] https://github.com/Turfjs/turf-buffer

[3] https://github.com/Turfjs/turf-smooth

olalonde · on Sept 18, 2014

There are higher level frameworks, Express is meant to be very minimalist. It is also usually a good idea to serve static files directly through Nginx or a CDN. If you find yourself always requiring the same modules in most of your files, a common pattern is to have a `common.js` file where you export all your modules and you then only need to have one `require('./common')` at the top of your files. I personally don't really like this pattern as it makes refactoring harder and is usually a sign of bad architecture.

aikah · on Sept 18, 2014

> There are higher level frameworks

Like?

lowboy · on Sept 18, 2014

Modularity is good for restructuring, among other things. I'd rather type out a buttload of require() statements than have to try and de-couple parts of a monolithic system if I wanted to replace one part.

Also, static file serving is still part of Express:

>Express 4 no longer depends on Connect, and removes all the built-in middleware from its core, except express.static

phadej · on Sept 18, 2014

I like the minimalistic nature of npm packages. It also encourages people to publish their own packages. If you create a useful test assertion library, you don't need to turn it into a full-blown framework with all possible bells and whistles!

vlunkr · on Sept 18, 2014

Oh man I hope he wasn't being serious when he wrote that. But you are right, number of packages != quality of packages

kmike84 · on Sept 18, 2014

npm is impressive, but PyPI is also growing nicely: it currently has 48k+ packages (the article is from Dec 2013).

phadej · on Sept 18, 2014

It was enough to fetch only the package metadata for the analysis in the post.

One could fetch the actual packages too (e.g. the smaller ~20k package set). But there should be more motivation than just counting lines of code :)

lukasm · on Sept 18, 2014

That impressive number is caused by the tiny standard library in JS. Node packages tend to be smaller and depend on others.

The average Django app has fewer dependencies than a similar Express app.

joemaller1 · on Sept 18, 2014

What tools were used to discover and digest the data? I'd be very interested in the process behind this post.

phadej · on Sept 18, 2014

Short version: node.js of course!

A bit longer one:

One can fetch the `jsverify` package metadata from http://registry.npmjs.org/jsverify, and all current packages are listed at http://registry.npmjs.org/-/all (this one is special: its size is around 50 MiB). Please cache your results; let's not DDoS the registry.

There is around one gigabyte of nice JSON data. After the initial fetch you can traverse it using any tools you want. I naturally used node.js for that too.
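Following the "please cache" advice, a fetch helper can check a local copy before touching the registry. Below is a minimal sketch. It assumes Node 18+ for the global `fetch` and a hypothetical `cacheDir` holding one JSON file per package. The base URL is the one given above. `fetchImpl` is a hypothetical injection point that lets the helper run without network access.

```javascript
const fs = require('fs');
const path = require('path');

// Base URL as given in the comment above.
const REGISTRY = 'http://registry.npmjs.org';

// Return a package's metadata document, fetching it only on a cache miss.
async function fetchMetadata(name, cacheDir, fetchImpl = globalThis.fetch) {
  const cacheFile = path.join(cacheDir, encodeURIComponent(name) + '.json');
  if (fs.existsSync(cacheFile)) {
    return JSON.parse(fs.readFileSync(cacheFile, 'utf8'));
  }
  // Scoped names (@scope/name) are requested with the slash escaped.
  const res = await fetchImpl(`${REGISTRY}/${name.replace('/', '%2f')}`);
  if (!res.ok) {
    throw new Error(`registry answered ${res.status} for ${name}`);
  }
  const doc = await res.json();
  fs.mkdirSync(cacheDir, { recursive: true });
  fs.writeFileSync(cacheFile, JSON.stringify(doc));
  return doc;
}
```

A second call for the same name is served from disk. The `/-/all` listing would need its own URL, but it could be cached the same way, and at its size it benefits most.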

redidas · on Sept 18, 2014

If you'd rather work with a SQL database, here's a module that attempts to put NPM into a Postgres database: https://www.npmjs.org/package/npm-postgres-mashup

phadej · on Sept 18, 2014

Wow, someone had time and motivation to write all of that boilerplate there. Yet e.g. the license parsing is very naive: compare https://github.com/npm/npm-www/blob/99020b5b3e21607dab24cd69... and https://github.com/rickbergfalk/npm-postgres-mashup/blob/56d...

terryf · on Sept 18, 2014

From the version distribution graph it is evident that npm hosts 0 packages with 25k+ versions, one package that has almost 20k versions and 40 packages with 0 versions.

That seems to be at odds with the numbers quoted in the rest of the article. Strange.

(Hint: please do label the axes on your graphs. Thanks.)

pluma · on Sept 18, 2014

They are labeled (now).

kyberias · on Sept 18, 2014

Note to the author: when presenting numbers in a table, one should place them in a separate column and align them to the right. It makes them easier to compare.
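As a small illustration of that advice, here is a plain-text formatter that left-aligns labels and right-aligns counts. It uses the figures quoted at the top of the thread. `formatCountTable` is a hypothetical name.

```javascript
// Render [label, count] rows with labels padded right and counts padded left,
// so the final digits of every count line up in one column.
function formatCountTable(rows) {
  const labelWidth = Math.max(...rows.map(([label]) => label.length));
  const numbers = rows.map(([, n]) => n.toLocaleString('en-US'));
  const numberWidth = Math.max(...numbers.map((s) => s.length));
  return rows
    .map(([label], i) => `${label.padEnd(labelWidth)}  ${numbers[i].padStart(numberWidth)}`)
    .join('\n');
}

console.log(formatCountTable([
  ['more than one version', 71853],
  ['over two weeks old', 42106],
  ['new release in last 360 days', 71277],
  ['all three conditions', 31888],
]));
```

Every row comes out the same width, so 31,888 sits directly under 71,277 and the comparison the article makes is readable at a glance.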