Trouble in Node.js paradise: The mess that is npm

autarch · on Dec 20, 2011

This is pretty funny for us Perl folks. People have been making this complaint about CPAN for many, many years.

It's a valid complaint, but all of the alternatives seem to be much worse.

I think the idea of using various metrics to pick the best options is a good one, and something we've been pursuing in Perl-land for many years (kwalitee, CPAN Ra(n)tings, metacpan.org, cpantesters, and more).

peteretep · on Dec 20, 2011

Oh God Yes.

When I first started to use JS, I loved it. Having recently done some hardcore development in it, I realized everything I loved about it was the freedom it gave you ... which is what I loved about Perl.

Then the reality of: weird type coercions; the cesspit that is npm; weird OO (rich, coming from a Perl programmer, I know); no Moose (although I hear Joose is coming along); no DBIx::Class; wtf is up with the scoping; etc etc etc... became too much.

I use JS now if I need to write a client-side app. For any other purpose ... why bother?

mcantelon · on Dec 20, 2011

Evolution > central planning doesn't. CPAN was a great stride in open source culture.

gizzlon · on Dec 20, 2011

And it's still being improved upon. metacpan.org, for example, is a nice improvement.

substack · on Dec 20, 2011

Perhaps you could just be more narrow in your searches with `npm search`? Perhaps instead of searching for "asset management" you could just search for a tool that does browser javascript bundling and then search for another tool that does css bundling. There are fewer of each of those and you can refine your search further by looking for specific things like bundlers that do node-style require()s versus AMD-style requires.

Plus, on http://search.npmjs.org you can see how many people have starred a project. You can star a project yourself with `npm star`. For a lot of projects you can click the home page or github links from the project page too. These are really useful to get a quick glance of what the API is like. If a project doesn't have either of these then it probably isn't worth using.

mikl · on Dec 20, 2011

Thanks for the tips – I do look forward to the stuff that Isaac and friends are cooking up with more metrics, but I'd still like to see more collaboration and less duplication :)

malandrew · on Dec 20, 2011

You may also want to check out the project pages for the most recent Node.js Knockout. Those pages often list all the modules used by the teams.

To find modules worth knowing about, I started by looking at the modules used by the winning teams.

In fact, the best way to get into any community and know what projects matter is to find out who matters and then follow them to the projects that do matter.

murz · on Dec 20, 2011

https://www.ruby-toolbox.com/ is one way the Ruby community solves this. I'm not a huge fan of their font choices, but the interface provides an easy way to see which gems in a given category are the most popular or most active.

For example, if we were looking for an asset management package like the OP, we would quickly see that Jammit is the most widely used: https://www.ruby-toolbox.com/categories/Asset_Management

davej · on Dec 20, 2011

By the way, it's not as fully featured but..: http://toolbox.no.de/

minimax · on Dec 20, 2011

With respect to the OP's specific problem, namely serving up compressed static assets, why use Node.js at all? They're static files. Host 'em on a CDN or let lighttpd take care of it. There's no need to add the complexity of Node to a relatively simple problem.

jashkenas · on Dec 20, 2011

Check out:

http://toolbox.no.de/

http://search.npmjs.org/

secoif · on Dec 20, 2011

The problem is that neither of these tools gives you useful metrics up front: I want to compare the number of watchers/forks/last update/dependencies, but instead I have to wade through each result item to make a decision.

It is open source though, https://github.com/activesphere/nodetoolbox so I guess I could go fix it.

AdrianRossouw · on Dec 20, 2011

I don't believe it's an issue, as node modules tend to be far smaller in scope and have far fewer side-effects than Drupal modules. I personally find projects mostly via github. I try to avoid any project that tries to do too much. The moment it registers a route or renders a page, I'm gone.

There is no 'node.js way' to do things, so you gain the freedom to find tools that work more closely to how you would like them to, instead of just being forced to integrate with the existing stuff "because it's there".

There are no holy cows, or 'core that shall not be modified', because everything is small enough that they are easily forkable/hackable and you can easily maintain your own forks if you really had to. Npm can even use packages directly from github.

NPM is a dream compared to drupal.org infra and the (albeit useful) hack that is drush-make.

itay · on Dec 20, 2011

I understand the author's frustration, but I don't think this is an issue with npm - rather, it is an issue with the culture.

Most node packages are small - rarely more than a couple hundred lines - and in terms of functionality they rarely do more than one thing (and do it well). Given that, it's easy to understand why people tend to reinvent the wheel - it's fun, exciting, and many times you think you can do it better.

mikl · on Dec 20, 2011

Indeed, the cultural problem is one of the main points of the last section of the post :)

malandrew · on Dec 20, 2011

I agree with modeless that duplication of effort is a non-problem. AFAIK, the Node community is taking a different, more scalable approach to community management that is painful short-term (especially for those looking for a Rails, Drupal or Django experience) and more valuable and scalable long-term.

Curation creates bottlenecks, single points of failure and is very subjective.

The fact that members of the community are thinking about objective ways to measure modules is better for the community long-term. Quality of the code in packages is important because it suggests that the module will be more maintainable and extensible long-term. But that is just one factor.

I hope the community leverages weighted social proof as a way to suggest which modules are fittest and should survive and prosper. Those developers who are most active in the community and contribute the most are also people that ship code and rely on the code of others in the community. One of the best ways to determine which modules to use would be weighted popularity, where the usage of a module by someone of importance in the community carries more weight than usage by a non-contributor.
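A minimal sketch of that weighted-popularity idea, assuming we already have (user, module) usage pairs plus some per-user weight reflecting how much that user contributes to the community. How those weights would be computed is left open; `weighted_popularity` and its inputs are hypothetical, not an existing npm feature:

```python
from collections import defaultdict


def weighted_popularity(usages, contributor_weight, default_weight=1.0):
    """Score modules by who uses them, not just how many use them.

    usages: iterable of (user, module) pairs, e.g. "user depends on module".
    contributor_weight: hypothetical dict of user -> weight; users missing
    from it count as default_weight (an ordinary, non-contributing user).
    """
    scores = defaultdict(float)
    # Deduplicate so one user using a module twice is only counted once
    for user, module in set(usages):
        scores[module] += contributor_weight.get(user, default_weight)
    # Highest-scoring modules first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


usages = [("alice", "optimist"), ("bob", "optimist"), ("carol", "nopt")]
print(weighted_popularity(usages, {"carol": 5.0}))
```

Here a single use by a heavily weighted contributor ("carol") outranks two uses by ordinary users, which is the "not all watches and forks are created equal" point in miniature.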

Basically, the idea of number of "watches" and "forks" in github needs to be taken to its logical conclusion because not all watches and forks are created equal.

I would imagine that these are the kinds of issues the node community is considering as they try to come up with a scalable, objective way to manage what succeeds.

Personally I think this is a much better solution than the approach in other communities where certain libraries/technologies are foisted upon you under the auspices of convention over configuration. Some communities are no longer just defining conventions that are widely adopted, but are picking winners and losers among newer technologies instead of allowing time for the community to decide.

The community and open source projects around Node.js are growing too fast to subject it to curation and expect the truly great projects to emerge naturally.

As an endnote, the Drupal community and node.js community are fundamentally different in their approach. Drupal is an entire platform and framework, where many of the decisions about which module to use are made for you, and you either accept them or have to hack away to change them. This works in the Drupal community because, by and large, the problem space Drupal addresses (content management) is much smaller than that of Node.js, which is any problem that is better solved with asynchronous, non-blocking I/O.

substack · on Dec 20, 2011

I discussed this with isaacs in one of the nodeup episodes and we concluded that a pagerank implementation for authors and projects that reads in the npm stars and dependency graph might work. It would also be useful to consider the github watchers, elapsed time since the last update, and the presence and status of tests, perhaps by ingesting data from travis-ci.

Edit: turns out what I have in mind nearly already exists at http://eirikb.github.com/nipster
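A rough sketch of the PageRank half of that idea, using only a dependency graph: each dependency is treated as a "vote" from the dependent package to the package it depends on. Stars, GitHub watchers, update times and test status are left out, and the function name and input format are illustrative assumptions rather than anything npm or nipster actually exposes:

```python
def pagerank(deps, damping=0.85, iterations=50):
    """Rank packages with a PageRank-style power iteration.

    deps: dict mapping package -> list of packages it depends on.
    Returns a dict of package -> rank; ranks sum to 1.
    """
    # Collect every package that appears anywhere in the graph
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Packages with no dependencies have nowhere to send their rank,
        # so their rank is spread evenly over all packages
        dangling = sum(rank[node] for node in nodes if not deps.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Each package splits its rank among the packages it depends on
        for src, targets in deps.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


deps = {"app": ["express", "optimist"], "express": ["connect"], "connect": [], "optimist": []}
for name, score in sorted(pagerank(deps).items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {score:.3f}")
```

Note how "connect" ends up ranked above "optimist" even though each has exactly one dependent: it is depended on by a package ("express") that is itself depended on, which is the property that makes PageRank more interesting than a raw dependent count.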

malandrew · on Dec 20, 2011

That's basically what I had in mind too -> edgerank for libraries. I reckon you'd need to have some sort of decay function that is relative to the amount of activity in that problem domain.

In inactive problem domains (such as linting, which these days sees few commits and has almost no competition), you wouldn't need as strong a decay coefficient. Lack of activity suggests a solved problem or something that is no longer a problem.

For highly active areas such as asset management (ender.js, browserify, require.js, etc.), you'd probably need some sort of coefficient for that problem domain. Tagging could be used to strictly or loosely assign packages to a problem domain.
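One way to express such a domain-relative decay, as a sketch: discount a package's popularity exponentially by the time since its last update, with the decay rate scaled by how active its problem domain is. The `domain_activity` input and the `base_decay` constant are hypothetical knobs, not established values:

```python
import math


def freshness_score(popularity, days_since_update, domain_activity, base_decay=0.01):
    """Discount popularity by staleness, more aggressively in busy domains.

    domain_activity: hypothetical 0..1 measure of how active the package's
    problem domain is (e.g. derived from recent commits across packages
    tagged with that domain). 0 means no decay at all.
    """
    decay = base_decay * domain_activity
    return popularity * math.exp(-decay * days_since_update)


# A year-old linter in a quiet domain vs. a year-old asset bundler in a busy one
print(freshness_score(100, 365, domain_activity=0.05))
print(freshness_score(100, 365, domain_activity=1.0))
```

With the same staleness, the package in the quiet domain keeps most of its score, matching the intuition that a lack of commits there suggests a solved problem rather than an abandoned one.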

Acceleration is another issue worth considering. How quickly is a project being adopted among those that matter?

Anyways, it's not a trivial problem to solve by any means once you get around to measuring social aspects surrounding a module, but it certainly is a step in the right direction. It's also a problem whose solution can be continually refined.

TBH, I reckon that any refined system is going to look increasingly like a financial market where certain behaviors are analogous to actions like put and call options.

It'd be awesome to have a smidgen of transparency into private repositories in the form of aggregated grepping of the require() statements in active private projects. You could go even farther and look at the number of method invocations of a particular library.

It would be cool to be able to view such data in the same way Google Trends works. For example, it'd be interesting to compare optimist, nopt, commander and nomnom.

(going to stop now because I'm just rambling now. hehe)

dkubb · on Dec 20, 2011

Another idea I haven't seen mentioned: a project that builds properly on travis-ci.org or some other CI system shows that it is actively maintained and passes an automated build. Perhaps other metrics like code coverage could be used as well.

modeless · on Dec 20, 2011

Duplication of effort is a non-problem, and any attempt to "fix" it would reduce contributions and discourage innovation. With good metrics the best solutions will rise to the top over time.

astrodust · on Dec 20, 2011

The problem is metrics. How is this problem addressed?

Ruby and Python seem to suffer the same problem with a lack of feedback on the quality of offerings.

wycats · on Dec 20, 2011

In Ruby, the Ruby Toolbox does a good job of providing reasonable, automatable metrics. https://www.ruby-toolbox.com/

mikl · on Dec 20, 2011

I'd say every minute wasted reinventing the wheel is a minute that was not spent solving interesting problems.

I'm not sure what you're referring to when you talk about fixes that would discourage innovation or contribution. I'd say I suggest the opposite, in fact :)

dextorious · on Dec 20, 2011

"""Duplication of effort is a non-problem,"""

Really? Because I've been following OSS communities for, like, 15 years, and it has been a huge problem in most of them.

malandrew · on Dec 20, 2011

Duplication of effort leaves more room for experimentation, so it's not all bad.

If there was only one canonical version of all modules, it would be very limiting.

For problems that are deemed solved, or that are community-wide concerns, having one or two canonical modules is great. A good example of this is JSLint and JSHint.

For problems that are TIMTOWTDI, duplication of effort allows exploration of the problem space. Once the right location for the new hive has been found we'll just headbutt the other worker bees into submission :P

dextorious · on Dec 20, 2011

I agree -- I was basically answering the parent that it's a "non problem". Sometimes it IS a problem.

shapeshed · on Dec 20, 2011

Isn't this just that node is a newish framework? Discoverability and metrics of quality are issues but I think many devs are excited by node because it is green field. More libraries are good in the long term as the best ones will rise to the top. npm itself solves dependency management really well, but I agree currently a feature like npm star doesn't really work. I'm not sure npm should be solving this problem though. The most valuable data comes from GitHub on modules.

JoeAltmaier · on Dec 20, 2011

Hey, the beauty of great support libraries is, it's a joy to build on top of them. So lots of people do.

Collaborate? Makes sense if you're doing it for a job. But for fun? A hobby? To learn? No, then you have to do it yourself, that's the whole point.

Maybe npm could organize, some kind of 'social score' for how often projects are used or how many '+'s they get or something.

But let's not discourage innovation, or hacking, or re-inventing something already solved just for the joy of it!

krmmalik · on Dec 20, 2011

While I think the author has some valid points, I wonder if the concerns are a little premature. I'm still learning Node, so I'm not qualified to comment on many things, but I do feel that the approach the core contributors have taken with Node has been a pretty mature one so far.

I'm sure they'll figure out a decent solution soon enough.

amix · on Dec 20, 2011

I hope that the node.js community will embrace even more democracy and even more choice when handling the "standard library" and the "external library". Here is what I wrote 2 years ago about the ideas for a future language that embraces democracy instead of tyranny.

What I can see is that npm relies heavily on GitHub and the tools that GitHub provides (such as wikis and documentcloud). Personally, I think this kind of integration could lead to a revolution, since it makes collaboration much easier (it also makes it easier to evaluate the reputation of developers, dependencies etc.)

I am a Python developer and I think npm is already a MUCH better platform than PyPi given the tight integration with GitHub and I can only imagine what the future brings.

===

The idea of the social platform is to create a platform where developers can collaborate and a platform that promotes quality software. CPAN, Ruby Gems and Python Package Index are early versions of this vision and they need to become more social to become more useful. The social platform should be applied to the core libraries as well and not just be used for the external libraries. A language's standard library should be a democracy where the best and most used libraries win - and not like now, where libraries win by being selected by a few dictators!

Essentials of this platform are:

* Easy distribution: It should be easy to push out modules to the platform.

* Easy forking: It should be easy to fork modules, to apply patches and to send patches back. Using something like git or mercurial is a must.

* Reviews and a reputation system: The platform should have reviews, but also reputation, something like Stack Overflow's excellent reputation system. I.e. a system where helpful developers are rewarded for their hard work.

* Search and discovery: It should be easy to find modules you are looking for and to compare modules. If I am looking for a template library, then I might want to sort template libraries by how many others are using them or by a developer's reputation.

* Fully integrated into the language: This platform should be fully included in the language and ship with the language. The signup process to get on the platform should be very easy.

* Trac like features included: This platform should include tickets, timeline of changes and a basic wiki.

Most of these pieces are partially implemented, especially in products like GitHub and BitBucket. What is needed is a much better integration with the languages. The bottom line is that languages should embrace "social programming" and implement a platform similar to GitHub/BitBucket that enables easy collaboration between developers.

from http://amix.dk/blog/post/19475

masklinn · on Dec 20, 2011

> Essentials of this platform are:

I think most of your ideas are wrong, and are in fact what hamper pypi: the desire to have everything built in.

You mention CPAN. The CPAN ecosystem has most of the features you describe, but virtually none of these come from CPAN itself. Instead, they come from what people have built on top of and alongside CPAN; CPAN mostly hosts package data and metadata and provides the core on which everybody else can build useful services (search engines, bug trackers, comments systems, testing/CI systems, CPAN.pm/CPANPLUS, etc...)

Pypi has made huge progress in the "search & discovery" category not from improvements to pypi but from `pip` being separately created. Pypi tried to add reviews and reputations, it stank and was removed.

> A language's standard library should be a democracy where the best and most used libraries win

No, a standard library is a core of broadly useful batteries, just because "Rails" or "LXML" are very popular does not mean they belong anywhere near a standard library.

tmcw · on Dec 20, 2011

Nah. Making a nicer, more informative npmjs.org is totally doable and likely an eventuality.

Making a package manager that only occasionally triggers the rage of picky programmers is one heck of an accomplishment.

maxogden · on Dec 20, 2011

In practice (for developers that write javascript well) it takes about 10 seconds of glancing through node package code on github to determine quality.

zgohr · on Dec 20, 2011

Talk about an excellent problem to be running into.

mikl · on Dec 20, 2011

Yes, this is truly a good kind of problem to have – but a problem nonetheless :)

mcantelon · on Dec 20, 2011

#nodejs irc.freenode.net

If the Github watch count for a Node module can't help you pick, just ask the hundreds of people who are always in the IRC channel.

In the Drupal world someone who proposes a module that duplicates an existing module that does things badly can be blocked. That's a bad policy because one size fits all leads to wearing burlap sacks.

The fact that npm is not a planned economy is brilliant. If you want to add ratings, etc., npm is backed by CouchDB and therefore ridiculously easy to build upon. In addition to being able to freely read npm repo info, you can also extend package.json info with your own fields. I remember when drupalmodules.com came out, providing a solution for rating Drupal modules (something drupal.org still doesn't have, AFAIK). The guy who made it got grief for not making something blessed by the mothership.

As an example of how CouchDB helps make npm awesome, wanna find every field used in npm package.json files? Boom (thanks to Isaac S for this tidbit)!

http://registry.npmjs.org/-/fields?group=true
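That URL queries a CouchDB view with `group=true`, which returns one aggregated row per key. Presumably the view's map step emits each top-level field name of a package.json and its reduce step counts them; the exact view definition isn't shown here, so treat that as an assumption. A local equivalent over package.json documents you have already fetched might look like this (the sample documents are made up):

```python
import json
from collections import Counter


def count_fields(package_docs):
    """Count how many package.json documents use each top-level field.

    Mimics a CouchDB map/reduce queried with group=true: the "map" emits each
    key once per document, the "reduce" sums those emissions per key.
    """
    counts = Counter()
    for doc in package_docs:
        counts.update(doc.keys())  # dict keys are unique, so one count per doc
    return counts


raw = [
    '{"name": "a", "version": "1.0.0", "main": "index.js"}',
    '{"name": "b", "version": "0.1.0", "engines": {"node": ">=0.4"}}',
]
docs = [json.loads(text) for text in raw]
for field, n in count_fields(docs).most_common():
    print(field, n)
```

Custom fields added to package.json would show up in this count like any other key, which is what makes the "extend package.json with your own fields" approach discoverable.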

krisroadruck · on Dec 20, 2011

This is the same type of thing that soured me on python. Choice overload.