Devs unknowingly use “malicious” modules snuck into official Python repository (arstechnica.com)
244 points by zymhan on Sept 18, 2017 | 110 comments



The casual culture of pulling in hundreds of dependencies, and the mushrooming of language-specific package managers, is ridiculously insecure and has to go.

There is no way for anyone to know what all this code is doing, there is little way to verify updates, and it's simply untenable.

If some developers like this sort of unsafe practice, it should be strictly limited to their machines and should in no way make it into a deployment artifact.

There are already distribution package managers with the necessary secure infrastructure; you are not special, use those. Ruby is already paying a price for imposing dependency hell on users and wasting millions of man-hours. Many have suffered and do not even bother with Ruby apps anymore. Node and others who think this is a good model will be next. Users should simply boycott such user-hostile developers and languages that encourage this kind of insecurity.


Everything we do in life relies on a trust system, more or less. Reproducible builds only guarantee integrity, not the trustworthiness of the code.

> There is no way for anyone to know what all this code is doing, there is little way to verify updates, and it's simply untenable.

Because few people have time to read source code. Take Django as an example: big community, lots of contributors. Can we say we should trust the code because of the number of eyes? Probably, but not always.

Why? We can overlook things and assume the code is legitimate. Take the Linux kernel or some of the crypto projects out there: a lot of unreadable old tricks make a backdoor really easy to hide. [1] is interesting because someone apparently made an authorized changeset to BitKeeper. Are you going to read the JRE code to make sure there's no backdoor? Nah.

Is there a solution? Nope, and there never will be. The best one can do is use a DVCS to prevent unauthorized changesets (provided every contributor does a sanity check when pulling in changesets) and trust the community. No machine can distinguish good code from bad code. Humans often can't even tell unless something raises a red flag.

Sometimes change of project ownership can also introduce some uncertainty, but that is a rare case for major projects.

I am pretty confident there is code in the Linux kernel that no one has touched or read for years.

[1]: https://freedom-to-tinker.com/2013/10/09/the-linux-backdoor-...


> someone apparently made an authorized changeset to BitKeeper

I suggest you read the link you mentioned. It was not an authorized change in BK, it was not an authorized change at all. It was only in a third-party CVS mirror, and it was found exactly when someone asked why there was a changeset in the CVS mirror that wasn't in BK.

And while I don't doubt that there is enough unreadable code to hide a backdoor in, certainly anywhere in the massive driver tree, this backdoor is a fairly obvious one. I don't think you could hide an assignment disguised as a comparison through code review. It's such a common error to make that it really stands out.

> Are you going to read the JRE code to make sure there's no backdoor?

That is a highly misleading question.

While I will trust the JRE developers without checking every single change, they are a diverse enough bunch that the fact that they are checking each other's work goes a long way.

It's very much the same situation as with the kernel and the compiler. The difference is only one of magnitude.

The difference between that and nodejs or ruby, where anyone can upload anything, anonymously and completely unchecked, is enormous.


I intended to say unauthorized change but spell check fucked me up, although you could probably tell my actual intention from the rest of my comment. Thanks for pointing it out.

I acknowledge that when it comes to size my comparison is not fair, but I am trying to say that just because there is a large community, we sometimes still overlook things. Usually big projects have some "module owners" who approve merges. That limits review to a small number of people. If the project is active enough, there will be dozens of commits or more per day. Sometimes people really do overlook things and let them pass. Until someone spots something wrong, the code could have been in the wild for days or years.


Distributions - let's take Debian as a concrete example - provide an audit trail to individual identified developers as a mitigation for users relying on trust. Just because we must necessarily rely on some level of trust does not mean that we must blindly trust, which is what happens when anyone can upload to a repository such as PyPI.


As far as I know, once the project name is taken, that ownership belongs to the account that first created the package on PyPI.


It's not only about reading source code - how many people disassemble their compiled code and check for unwanted side effects the compiler may have generated? For any non-trivial code base this is not an option; hence the implicit trust in the tools, and also the potential for exploitation.


Only the developers know why they need 150 deps; there is no way for end users or deployment to verify this chain or stay on top of it.

They simply need to deploy the app and keep it updated safely. If developers and languages have not thought of this basic step, then please defer to the distribution package manager, which has.

These kinds of frauds won't pass the average distribution package manager's scrutiny, and CVEs get quick updates that are tested to work.

Contrast this with users scrambling to update affected apps and their individual local libraries, which in turn may have their own specific deps that may not have been updated and thus will fail because of version inconsistencies. End result: millions of man-hours wasted because of clearly bad engineering practices.


The problems you bring up are of a different order from the "grab stuff from wherever" culture that seems to have sprung up around a lot of languages lately.


NPM is my primary reason for not using NodeJS. I installed a specific package in an empty project and got over 690 dependencies. For running `npm install {package-name}`.

What's worse is that I skimmed the tree to check for anything particularly heinous, but there was nothing that stood out as unneeded.

With such a tiny stdlib, especially out of the browser environment, there's not really a better alternative than to make it easy to include dependencies for your dependencies. Without NPM, the Node community would be tiny if not already dead.

I don't have a better idea either. For my purposes, it just means I use a different language, but that's not really a solution.


A package manager must at a minimum address code security, dependency hell and deployment. A JSON parser that wgets URLs into a folder is not a package manager.

These things must be thought of upfront and cannot be left vague and open to ad hoc practices that leak to end users and deployment, creating insecurity.

For instance, a package should be a package; it should not be a simple class or function that in turn pulls in a hundred dependencies. Ten packages like this can pull in 600 packages, all with conflicting versions. This is exactly the dependency hell that creates insecure practice, as no one can verify the chain or the sheer number of packages.

Language developers cannot just sit back and let this happen. Eventually you will pay the price for this kind of shoddy engineering. This should be the minimum required from any responsible language.

And if you don't use the system package manager, at least do not mix up both and create a Frankenstein setup that multiplies complexity for everyone.

The biggest thing is to stop being faddish. Recognize upfront that there are always developers and groups jockeying for influence and creating dependencies on their apps and platforms. Discourage them from foisting their self-serving bad ideas on the ecosystem and multiplying complexity for everyone. Ultimately the language and ecosystem have to develop consensus on robust engineering practices, or become a wild west that will eventually lose users.


Who says? npm is proof that package managers need to do none of these things to be successful, and indeed the traditional design of a package manager as a constraint solver is self-defeating, because it puts a perverse pressure on libraries to have as few dependencies as possible and to avoid using the package manager to resolve their dependencies.

The reason npm has teething problems is that it's one of the first package managers to actually reliably address the problem of libraries depending on other libraries. Almost incredibly, nobody bothered to solve this problem before npm. I.e., npm addresses the case of library X depending on Z v2.0 and library Y depending on Z v3.0 without falling flat on its face.


How does npm solve the transitive dependencies problem?


If by "transitive dependencies problem" you mean a case where package A depends on package B which depends on package C and you want to use A in your project, then it solves that by simply pulling in all three of those packages.

Or did you mean the case where you want package D as well, and it depends on a conflicting version of package C? In that case, it solves the problem by pulling in both versions of package C and running both side by side.
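
A hypothetical on-disk layout (package names reused from above, versions invented) showing how npm keeps both versions around:

    my-project/
      node_modules/
        A/
          node_modules/
            C/        <- C v2.0, seen only by A
        D/
          node_modules/
            C/        <- C v3.0, seen only by D

Each package's require() resolves against the nearest node_modules directory, so A and D each load the version they asked for.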


I do mean the latter - A and D both using different versions of C.

Running both C side-by-side cannot work if C is incompatible with itself (that is, even if both versions are API-compatible, each may assume it's the sole C being loaded, and therefore do some static/singleton crap that might get clobbered when it's loaded again).


The nature of the JavaScript language makes it possible to sandbox an entire library and ensure that two versions of the same library can run side by side without conflicts (because each is in a different sandbox).

The way sandboxing works is actually not specified by either the package manager (npm) or the language itself (JavaScript); each consumer of npm packages can roll its own sandboxing mechanism (webpack, browserify, nodejs, etc.).

There isn't even a common specification for the way packages should export public symbols. You have a choice of CommonJS, AMD, Ecmascript 2015, etc.


> Running both C side-by-side cannot work if C is incompatible

I've honestly never encountered an NPM package which couldn't be run side-by-side with another version of itself. This is due to the nature of how Node's module system (CommonJS) works; packages are isolated from each other and only share resources with each other via explicit exports and imports.

I suppose a conflict might be possible if the package was using native extensions or connecting to some external service or something, but for the most part NPM's module system makes conflicts very unlikely.


npm today is about 15 steps behind where CPAN was during the height of Perl. Has anyone actually tried CPAN on something new, say an aarch64 or even a good old armv7 CPU, or an exotic new uclibc/busybox-based Linux distro like Alpine?

How confident is anybody that a random CPAN package's original maintainer is still actively maintaining anything, given the average Perl hacker's age? And how confident would you be that the npm repos are going to suffer less bitrot than CPAN in, say, 10 years from now?

How confident are you that all major versions of your own code will be either removed from any active package manager or production install or patched for all known security flaws?

And last, the big one: how confident are you that all of the fixed-version dependencies you add won't be a big open security hole sitting there waiting for anyone still stupid enough to just npm install your code 3 years from now?

NPM benefits from still being just before/around peak hype, where most of the people who committed to npm's repos are still early in their careers. At some point, most original package submitters will have abandoned the task of maintaining them, even if Node itself remains around to a greater extent than Perl still does today.

In the Linux package management world it means something when a project is packaged and put into a core repo, as most Linux distributors promise to help fix abandoned code included with the core distribution. Nobody in the world of gem, yarn, pip or npm makes any such guarantee, and nobody screens new package maintainers before granting them a "name" with commit access to even the degree that Debian or Fedora do, and those are both fairly open communities.


Who says cancer cells are bad? Look how efficiently they proliferate!


My favorite NPM moment was the time I realized a project's dependencies included all three of https://www.npmjs.com/package/array-unique, https://www.npmjs.com/package/array-uniq, and https://www.npmjs.com/package/uniq.


You specifically chose a large dependency with many sub-dependencies, so yes, that will happen. There's also the risk of installing an outdated, unmaintained dependency.

Do a little research — check the package's npm page, assess whether it's too light or too heavy for your use-case. Check its github page to assess whether it's currently maintained (and how important that is for your use-case). If you're unsure, look at similar packages and/or peruse the source code.

It only takes a few minutes and you'll have much greater confidence because you know you picked the correct multi-byte-string-length-calculating dependency for your use case, not the naive implementation which is 100x slower (for example).


>You specifically chose a large dependency with many sub-dependencies, so yes, that will happen.

You're too kind.

JavaScript is outright unusable without pulling in hundreds of dependencies. NPM's ecosystem is 80% band-aids over terrible language design, which in turn leads to things like this: https://github.com/stevemao/left-pad/issues/4

NPM is the symptom. JS is the problem.


> You specifically chose a large dependency with many sub-dependencies, so yes, that will happen.

To clarify, I did NOT choose that package. Because it brought in 690 dependencies...

JavaScript didn't even have a left pad in the stdlib until the kik fiasco. Pulling in dependencies isn't really optional unless you want to start from first principles. Am I saying padding a string is difficult? No. But I am saying it's an incredibly common operation, as evidenced by how much broke when it was pulled from NPM.

As this talk from 2016 shows, the versions that used to be available on NPM don't even pass a reasonable set of tests for a left pad: https://youtu.be/FyCYva9DhsI?t=605 Not even being in the spotlight was enough to catch the bugs there; if you reimplement the world from scratch you're bound to make some errors yourself.


You mean that every time you write a new project you write your own request parser, your own server, your own web framework, your own database access libraries, etc. from scratch?

Packages taken over include misspellings of urllib. What exactly do you propose as an alternative here?

I'm all for limiting bloat, but your rant here seems completely inappropriate for the issue being described.


> You mean that every time you write a new project you write your own request parser, your own server, your own web framework, your own database access libraries, etc. from scratch?

No, but downloading random code is just insane.

As a comparison, Debian has ~3000 packages, and every single package has an identified maintainer with their own GPG key, validated in a face-to-face meeting with an ID card by three people, an identified upstream, etc. Each maintainer is physically identified, has passed a number of technical validation steps, has explained their motivation, etc.

There is also a dedicated security team that can be contacted 24/7.

The system is not perfect, but it provides a good level of security. And this is a project only made by volunteers.


Debian has ~3000 packages, and every single package has an identified maintainer...

As of yesterday npmjs had 516,132 packages, which was an increase of 373 since the day before. Debian's 3000 packages is a couple of weeks of npmjs activity. Even if you stripped out the unnecessary, abandoned, or duplicate packages you're still looking at something significantly different from Debian.

For what it's worth I think a well-maintained, verified, and known secure subset of npmjs would be a great idea, but the logistics of providing that would be effectively impossible without some serious cash behind it.


The culture around what should be in a package is vastly different. Node has left-pad. Debian has stuff like Apache. Sure, those are radical examples, but the barrier to entry for Debian is pretty high (as is the standard for quality) vs npm where anyone can put whatever out there.

How do things like left-pad even come to be widespread dependencies? Does the node development process involve a lot of "gee I wonder if someone made a package for (simple thing I need to do with a few lines of code)"?


JavaScript has things like left-pad because it doesn't have a standard library of modules for common tasks like other languages do, and because some of its built-in types don't have the same rich set of operations as other languages.


http://www.haneycodes.net/npm-left-pad-have-we-forgotten-how...

is a nice article on the subject of left-pad. If you can't implement a left-pad in less than 5 minutes yourself, you don't know how to code.
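
For instance, a minimal left-pad in Python (a sketch only; the npm package itself is JavaScript, but the logic is identical, and Python's built-in str.rjust already does this):

    def left_pad(value, width, fill=' '):
        """Pad `value` on the left with `fill` until it is `width` characters long."""
        s = str(value)
        return fill * max(0, width - len(s)) + s

    assert left_pad(5, 3, '0') == '005'
    assert left_pad('foobar', 3) == 'foobar'  # already wide enough, unchanged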


Just because you _can_ spend 5 minutes re-inventing the wheel in every project you write, doesn't mean you _should_.


But the large number of packages available, even for seemingly trivial features, is what makes NPM popular and successful. You jest at left-pad, but there's real value in simple libraries that might only be "a few lines of code", because an app is made up of a large number of "simple features" strung together.

Not to mention the code reuse factor.

If npm took the route of debian's verification/certification, you might end up with no libs at all!


> You jest at left-pad, but there's real value in simple libraries that might only be "a few lines of code"

Nicely said, but you omit the elephant in the room: that the dependencies have their cost, which is quite high and is rarely matched, much less offset, by the value of these "simple libraries".


FWIW I have always thought this would become a problem for nodejs. Encouraging one-liner packages [1][2] is a recipe for an enormous dependency tree, and a totally impossible-to-manage security situation.

Python projects I have recently worked on might max out at 50 packages. The last project I worked on with nodejs as front end had 3000+ packages just for the UI.

[1] https://github.com/sindresorhus/ama/issues/10#issuecomment-1... [2] https://github.com/kevva/is-positive


>As of yesterday npmjs had 516,132 packages, which was an increase of 373 since the day before. Debian's 3000 packages is a couple of weeks of npmjs activity.

Only in volume. In code quality and man-hours they're an order of magnitude ahead of all of npm.

(Just consider something like the Linux kernel, GCC, Apache, OpenJDK, coreutils, Python, Postgres, etc., compared to the tripe that even the best npm packages are.)


Debian packages on average have more functionality than npm packages. Such a subset as you describe would be one (or a few) Debian package(s).


> ... Debian has ~3000 packages ...

It's irrelevant to your point but Debian 9 included >51,000 packages [0].

[0]: https://www.debian.org/News/2017/20170617


Debian security are very good at their job indeed; however, so many of the packages are so old they're not usable. Furthermore, Debian doesn't enable SELinux by default and has a number of policies missing, which significantly weakens the average deployment.


Those comments are all irrelevant to the OP's point. Debian software is "old" by intention; it is part of the spec that it shouldn't be a moving target.

As for Debian making some policy decisions that you disagree with, it's very different for Debian to make decisions than for NPM to make decisions. There's just no comparison.


I agree that Debian is so different from npm as to make any comparison irrelevant, but you should direct that at the comment which introduced the comparisons, not one that pointed out ways it doesn't work to compare them.


Indeed, sorry if I didn’t make it clear - I was commenting on the previous person who talked about Debian - not the OP.


>You mean that every time you write a new project you write your own request parser, your own server, your own web framework, your own database access libraries, etc. from scratch?

No, they mean that we need a better method than trusting some random popular GitHub / PyPI / npm whatever storage and delivery mechanism.

Strong core language libraries ("batteries included") that come with your distribution of the language/compiler and are well maintained, used by millions, and signed would be a good start to having at least 80% of dependencies be actually dependable.


> Strong core language libraries ("batteries included") that come with your distribution

There's a running joke in Python that the stdlib is where modules go to die. The stdlib by definition needs to be more dependable and stable than other places, but that reduces innovation. There's a reason urllib is terrible to use and we're on urllib3 now, yet the nicest solution (requests) is not included.

See http://www.leancrew.com/all-this/2012/04/where-modules-go-to... This applies to some extent to all languages / runtimes.
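
To make that urllib/requests contrast concrete (the endpoint URL here is hypothetical):

    import json
    import urllib.request

    # stdlib urllib: workable, but verbose and easy to get subtly wrong
    with urllib.request.urlopen("https://api.example.com/items") as resp:
        items = json.loads(resp.read().decode("utf-8"))

    # third-party requests: the interface most people actually want
    import requests
    items = requests.get("https://api.example.com/items").json()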


> Strong core language libraries ("batteries included") that come with your distribution of the language/compiler and are well maintained, used by millions, and signed would be a good start to having at least 80% of dependencies be actually dependable.

The trick is avoiding another urllib: shipped in the stdlib, then passed over for libraries with better interfaces.

perl has the idea of core and dual-life modules: when a new version is released, it ships with some additional modules. Some of these modules live in the perl repository & are called 'core modules'. Upgrading them requires a new version of perl to be released. Some of these modules are forked & published to CPAN & are called 'dual-life' modules. Upgrading them requires installing them from CPAN, and new versions of perl can provide a well-tested newer version.

All the modules used to be core-only, and over the years any which weren't heavily tied to the internals of the perl interpreter (like opcode deparsers etc.) have moved to be dual-life instead. This avoids problems like interface issues in the standard library by allowing a new version to be immediately released (with some overhead for developers who need to upgrade it for deployments), be tested in the real world, updated, and then included in the next language release.


The Java ecosystem has just as many libraries, and yet you don't see this problem. Why is that? Maven Central is just as available to add to (last I checked, all you need is a public GPG key registered to the MIT public GPG server).


It's not just as available to add to; you can only add to Central through one of three approved hosts unless your project has a special exception. And those hosts all involve an actual review of your artifact, its signature, and your POM.

I've never done it, but it at least sounds like there's a process where you need to convince 3 or more people that adding your package is a good idea and will not hurt security. That's very different from PyPI or npm.


There is only one person, and you are not convincing them much about security, nor do you have to go through any kind of difficult vetting. Just formal stuff about package naming, a properly filled pom.xml and such.

However, you have to sign everything with PGP, including updates, and that is verified. You also have to own the domain with the same path as your packages, meaning the namespace is larger and name clashes are less likely. They actually check this and won't release unless you actually control it. Which explains why Java open source tends to use packages like com.github.my_account.my_project
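
For illustration, the coordinates section of a hypothetical pom.xml; the groupId has to map onto a domain (or GitHub account) the publisher demonstrably controls:

    <!-- hypothetical coordinates; Sonatype checks you control the groupId's namespace -->
    <groupId>com.github.my_account</groupId>
    <artifactId>my_project</artifactId>
    <version>1.0.0</version>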


You don't need to convince anyone other than the central host - provide an issue in the tracker with the information required (see the myriad existing projects: https://issues.sonatype.org/projects/OSSRH/issues/OSSRH-3462...).

It's not a free-for-all like npm, but maybe npm can learn a thing or two.


> Packages taken over include misspellings of urllib. What exactly do you propose as an alternative here?

1. A way of identifying who is responsible for a package, similar to a Debian-style repo; this would allow those affected to identify a named person if malicious code is added.

2. Fuzzy matching of package names, so taking the library name urllib also takes urilib, urlib, urllib2 and other names within a short edit distance (a sketch follows this list).

3. A built-in mechanism to allow third-party reviews within the repository's infrastructure.
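
A minimal sketch of point 2 using Python's stdlib difflib (the list of protected names here is invented):

    import difflib

    # Hypothetical registry of established, protected package names
    ESTABLISHED = ["urllib3", "requests", "setuptools", "django", "numpy"]

    def suspicious(name, cutoff=0.85):
        """Flag a new name that is distinct from, but very close to, an established one."""
        if name in ESTABLISHED:
            return False  # exact match: it's the real package
        return bool(difflib.get_close_matches(name, ESTABLISHED, n=1, cutoff=cutoff))

    print(suspicious("urlib3"))       # True  (one character away from urllib3)
    print(suspicious("setup-tools"))  # True  (close to setuptools)
    print(suspicious("flask"))        # False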


> You mean that every time you write a new project you write your own request parser, your own server, your own web framework, your own database access libraries, etc. from scratch?

They made it pretty clear that they get them from a distro repository. Most are much better about managing breaking changes and offering years of security patches (at least to core packages).

> Packages taken over include misspelling of urllib. What exactly do you propose as an alternative here?

Stricter guidelines and auditing should have caught this.


You are fighting a strawman. Either use the system package manager, which has already thought about and solved these issues, or, if you must use a 'home brew' package manager, design a secure one. And if you can't do that, limit the circus to developer machines.


I've not seen a non-trivial project which doesn't require at least one package outside of the repo, or an updated version. And then you either have to run your own repo and packaging process (been there, done that, not fun), or mix sources (noooooooope), or just switch to PyPI completely.

Even if you can work with that, you need to sort out differences between your destination system, test system, and all of your developers machines (who most likely run Mac, not your server Linux flavour). Sure, there's still docker, vagrant, etc. But this means slapping more and more layers just to make the same packages from the same source available.

And unless you want a "works on my machine" environment, you can't just leave developers to do whatever and use a different system in production. A home-brew, limited system will just result in shadow IT, which will use PyPI unless your solution is much better and easier to use (and allows fresh versions to be imported).

I've been doing this stuff for years in big projects. It's not easy.


On top of this, updating dependencies is becoming just as important as patching the OS itself, except without the culture of providing non-breaking updates. Often you're left with the choice of either upgrading to the latest version and dealing with breaking changes, or leaving the insecure versions in place.
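
For what it's worth, on the Python side PEP 440's compatible-release operator is one middle ground, though it only helps when fixes actually land as patch releases (package names here are just examples):

    # requirements.txt using "~=" (compatible release) pinning
    requests~=2.18.0   # accepts 2.18.x patch releases, refuses 2.19+
    django~=1.11.0     # accepts 1.11.x security fixes only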

This compounds the dislike companies already have for updating software; updating packages isn't something that's ever planned and budgeted for. A standard enterprise app will be using tonnes of outdated and potentially insecure packages; I've come across some that are a decade out of date with no pain-free update path.

And now it seems that even "systems languages" are heading down this path.


> Users should simply boycott such user-hostile developers and languages

I await with interest your newsletter to "the internet" on how to boycott JavaScript.


There are a lot of constructive ways to begin to solve this, but I guess you prefer the head-in-the-sand approach.

Every single Ruby post has commentators complaining about dependency hell and steering clear of Ruby apps. This was not the case even a couple of years ago.

This is effectively a user boycott which Ruby may not deserve but has brought on itself by letting the 'break everything crowd' run amok. They have moved on to Node and will move again to the next big thing but it's Ruby left dealing with the fallout.


What has changed in the past few years in the Ruby world that has changed dep mgmt? Bundler has been the thing to use for quite a while now, and it uses a version lock for all dependencies. It took Node a little while to figure that one out, and shrinkwrap seemingly led to yarn and then back into NPM improvements. Ruby is comparatively not that bad, excepting libs with e.g. C dependencies, which Node has as well.


> Every single Ruby post has commentators complaining about dependency hell and steering clear of Ruby apps. This was not the case even a couple of years ago.

Strangely, the problems with rubygems and related package managers were apparent five years ago (at my first contact with earnest gem development). It seems the major change is that major gems get abandoned; take for example bcrypt. I wonder if the rise in distrust and the rise of abandoned but critical gems are related.


That's developers boycotting the language, not users boycotting developers of the language.


Server-side JavaScript (which has the above-mentioned dependency problems) can be boycotted just as easily as any other language; the JS monopoly effect applies only to the front end.


It's also worth noting that it's pretty easy to do JavaScript development and just not use NPM, so no-one needs to boycott JavaScript, just that particular installation approach.


Sure, and I actually like Java, but you know, a blanket "don't use JavaScript" because it happens to have a package manager is dumb. I mean, Java has Maven. It just so happens that package managers are useful things and help to avoid a massive mess in anything larger than hello world. Especially if we are speaking about open source projects.


I agree. This type of casual 'language manager install X', or even worse 'use this container or vagrant image as your base', is the root of all evil. It is the political core of the devops revolution, which allows developers to pollute operations with shortcuts and ballooning code to avoid the hard work of maintaining/documenting a system and performing due diligence.


> Many have suffered and do not even bother with Ruby apps anymore.

These sorts of grandiose statements have of course existed on the Internet for decades, but with the rise of demagogues like Trump, we see that such statements can be readily believed en masse without a second thought.

Could you provide data or numbers explaining that Ruby's package management system is a factor in new apps not being built in that language anymore?

I don't mean to necessarily add politics in here, but hastily throwing out intimidating messages--"Ruby's package manager is broken and now no one is using it; Node and others will of course head down the same path"--helps no one. First, it harms the morale of new programmers learning these languages, who may be led to believe they are wasting their time. Second, it frankly is rude to the groups of individuals who work on the package managers themselves; if you have a better idea, build or sponsor one.

The burden of proof is on you to provide these statistics, so until then, there's also the third result: it makes you look like a crank, not a professional who has considered the pros and cons of different package management systems.


Don't you think bringing Trump into this is an example of the same things you admonish GP for?


How so? My point is that stating something as a fact without proof can have serious consequences.


I wasn't aware that anybody on Hacker News had an implicit moral obligation to "not harm the morale of new programmers", especially in cases where it might be an open question whether said programmers are "wasting their time."

Also, there's a third choice if you have a better idea about how package managers could work: you can use an existing alternative that works better. If said existing alternative happens to be in some other language, and you don't have a strong reason to stay on your current language, then switching languages is perfectly sensible.



Bullshit. The benefits of this "casual culture of pulling in hundreds of dependencies" vastly outweigh the harm. I'll be assembling another $6000 job that will take me maybe 12 hours to complete while you write your compiler, from scratch, in your own assembler, made for your own CPU, that you're gonna cook up from a bucket of sand you collected yourself, from a sandpit you trust.


Damn straight. While you're assembling that $6k job, I'm selling your client's data out from under them. Everyone gets paid.


I agree that the dependencies are probably unavoidable and probably a net good, but... well... just don't forget the system and network security best practices while you quickly put together that project, especially if it's in a public cloud.

BTW, you should be making $6000 + $1000 or so a month for at least a couple of years doing that, not $6000 a single time. The total lifetime value of setting up an application should be above $30k, hopefully way above, but at the entry level where you are, that's a nice healthy number to shoot for.

We bill our total customer base over $6000 a day for this kind of work, and it all starts with 'assembling a job' but certainly doesn't end there, or ever (hopefully). Good luck, buddy!


The problem is that your $6000 one-off will leave your client facing either trivially exploitable production systems or exponentially growing operating costs, because let's face it, you are going to provide zero hours of post-deployment support for that task.


Of course, support is $300/h, what am I, a chump?


Sooo, you read the OpenSSL source code and avoided Heartbleed?


The IP address it phones home to, 121.42.217.44, is located in China and visiting it with HTTP just displays this interesting message:

    Hi bro :)

    Welcome Here!

    Leave Messages via HTTP Log Please :)


    On 2017-09-16:

    Happy to see somebody find it ! :)

    Just curious about how long it would take for people to find those 'bad' packages

    As you see, that's just a toy script, no harm, hope you enjoy it !
It looks like someone (security researcher?) just set up a PoC and didn't intend to actually "weaponise" it.


It may be https://pytosquatting.org/. In that case it was indeed a harmless and quite useful PoC.


No. (Co-operator of pytosquatting here)

While our code was similar in nature, it was non-obfuscated and we always threw an exception telling the user that he installed something he shouldn't.
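
Something in that spirit might look like the following (a hypothetical sketch, not our actual code):

    # setup.py for a defensively squatted typo name (hypothetical sketch)
    from setuptools import setup

    # Fail the install loudly before setup() runs, so the user immediately
    # learns they typed the wrong package name.
    raise RuntimeError(
        "You installed 'urlib3' but almost certainly meant 'urllib3'. "
        "This name is squatted defensively; nothing has been installed."
    )

    setup(name="urlib3", version="0.0.0")  # never reached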


But you still call a URL first, e.g. https://www.pytosquatting.org/pingback/pypi/urllib2/ for "urllib2", before raising the exception. Though it is not an IP in China, and it only returns the text "Pingback from package urllib2 in repository pypi".


What would you expect the page to say if they were malicious?


One nice thing about languages like C is that a lot of programmers just avoid dependencies because dealing with them kind of sucks. That's one solution to this problem.


It is good for security research, since that introduces so many security holes of such wonderful diversity that there is a much larger ecosystem to study.


Yeah, this way you reinvent a less unit-tested, battle-tested and documented wheel.

It's like saying the good thing about not having a cellphone is that you avoid a lot of fights with your girlfriend because you can't talk as much.


> It's like saying the good thing about not having a cellphone is that you avoid a lot of fights with your girlfriend because you can't talk as much.

I can't quite see what's wrong with that...


Do you have a cellphone? If so, throw it into the trash right now.


I would if I had a girlfriend and she was ok with it. I feel like it's hard to date without one which really sucks.


I love Python for this. Big standard library. I hate JavaScript for this... So many weird legacy issues that generally get resolved with libraries. Though ES6 went a long long way.


ES6 does not fix the left-pad issue; there is a propensity to use a package to get one small function vs using a library of common helper functions (which is somewhat what e.g. ES6 helps with...less need for underscore or similar). You pretty much don't need jquery now, but so many things depend on it for convenience / backward compatibility / because it's so insanely battle-tested that you know that it will work or at least have a StackOverflow explaining why a particular thing does a thing in a weird way.


> ES6 does not fix the left-pad issue

Actually, for that particular problem... https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


> there is a propensity to use a package to get one small function vs using a library of common helper functions

Lodash functions are available in both forms.


The node.js community sees that very differently, and that's one aspect I don't understand: the blind trust in dependencies. A midsize project will quickly need 1000+ dependencies, and we have to trust them all.


You're not wrong. Once I wrote a whole blog post about how to properly include Boost into C++ code using a particular IDE.


I put Boost up there with PyQt in terms of my annoyance with it and how hard it has been to get it working correctly for the things which have required it. The XQuartz stuff with macOS is also pretty annoying. Bash and OpenSSL versions would be another.


Haha, I was just about to come say that.


FYI: "Malicious software libraries found in PyPI posing as well known libraries"

https://news.ycombinator.com/item?id=15256121

(2 days ago, 465 points, 245 comments)


This is a rehash of: https://hackernoon.com/building-a-botnet-on-pypi-be1ad280b8d...

Interestingly, my fake system packages have been downloaded about 480 000 times so far this year


Ouch, and thanks for being proactive in getting those names squatted


Well, pip has the same problem as NPM: no namespaces by default.

But NPM is worse, with all its dependencies of dependencies. Composer (PHP) got both namespaces and dependencies right: flat dependencies; it's up to the developer to resolve conflicts, not the package manager creating insane dependency trees.

It leads to more stable packages and makes spotting fakes easier.


I'd say pip is definitely worse. Things like local dependencies, requirements.txt, and virtualenv feel like hacked-on additions to make pip more like npm.


Original advisory (including fake packages): http://www.nbu.gov.sk/skcsirt-sa-20170909-pypi/


The other day, I was using rmarkdown and noticed that every time it ran it tried to source this:

https://mathjax.rstudio.com/2.7.2/MathJax.js?config=TeX-AMS-...

This was really disconcerting since I did not expect to be downloading anything, let alone running javascript from there just to make a chart.


I think there is value in upgrading the cheeseshop to use "verified maintainers" - it should be simple to do a first pass of "the person who signed the hash of this module has also verified they own the domain requests.kreitz.org by publishing that public key at that domain's root"
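
Roughly, that first pass could be (a sketch only; the key location is made up):

    import urllib.request

    def published_key(domain):
        """Fetch the public key the maintainer publishes at their domain
        (the exact path here is hypothetical; any agreed location would do)."""
        url = f"https://{domain}/.well-known/package-signing-key.asc"
        with urllib.request.urlopen(url) as resp:
            return resp.read()

    def maintainer_verified(signing_key, claimed_domain):
        # The key that signed the module's hash must match the key
        # served from the domain the maintainer claims to own.
        return signing_key == published_key(claimed_domain)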

Even more useful might be domain/keys/kreitz/publickeylisting

The key signing party is much harder to arrange but is easier to be confident about

Personally I am surprised the Post Office does not do key signing

Edit: I thought pip did do cryptographic checks?!


> I think there is value in upgrading the cheeseshop to use "verified maintainers" - it should be simple to do a first pass of "the person who signed the hash of this module has also verified they own the domain requests.kreitz.org by publishing that public key at that domain's root"

That sounds useless. An attacker could verify that they own evilattacker.com & publish their malicious packages there.

I don't think a web of trust is the answer here, because it doesn't really matter if the attack is anonymous or not, and if only trusted people can publish packages, trust will be given more readily to encourage new programmers to contribute.

I think a reputation/review system is better, like docker search's star count, or metacpan's ++ rating.


pip allows you to specify, along with package name and version, the expected hash of the downloaded package. If you do that, then pip will download the package, calculate the hash, and check against what you specified. In case of mismatch, pip will refuse to install the package.
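
For example (the digest below is a placeholder, not a real hash):

    # requirements.txt, hash-checking mode (digest shown is a placeholder)
    requests==2.18.4 \
        --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000

    $ pip install --require-hashes -r requirements.txt

pip also turns hash-checking on automatically as soon as any requirement in the file carries a --hash.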

PyPI also supports adding GPG signatures alongside packages, but with no trust/verification process to assert "this key really is the key of the person who should be releasing this package", the signature is literally worthless; anyone who could put up a fake package could also generate a signature for it, and you'd have no way of knowing that the key which signed that package shouldn't be trusted for that package.

It is a very very hard problem, and people need to appreciate that.


How do you check if you are affected?


The list is in the advisory. Check if you have installed any of these:

  – acqusition (uploaded 2017-06-03 01:58:01, impersonates acquisition)
  – apidev-coop (uploaded 2017-06-03 05:16:08, impersonates apidev-coop_cms)
  – bzip (uploaded 2017-06-04 07:08:05, impersonates bz2file)
  – crypt (uploaded 2017-06-03 08:03:14, impersonates crypto)
  – django-server (uploaded 2017-06-02 08:22:23, impersonates django-server-guardian-api)
  – pwd (uploaded 2017-06-02 13:12:33, impersonates pwdhash)
  – setup-tools (uploaded 2017-06-02 08:54:44, impersonates setuptools)
  – telnet (uploaded 2017-06-02 15:35:05, impersonates telnetsrvlib)
  – urlib3 (uploaded 2017-06-02 07:09:29, impersonates urllib3)
  – urllib (uploaded 2017-06-02 07:03:37, impersonates urllib3)


Really close call (I have the correct version of setuptools installed which is what had me worried).


The advisory has a regex, but it's not formatted well for copy-paste (non-ASCII quotes!). Here's a version that works:

    pip list --format=legacy | egrep ‘^(acqusition|apidev-coop|bzip|crypt|django-server|pwd|setup-tools|telnet|urlib3|urllib) ‘


It appears that your version has non-ascii quotes as well.


It's the ',

    pip list --format=legacy | egrep '^(acqusition|apidev-coop|bzip|crypt|django-server|pwd|setup-tools|telnet|urlib3|urllib) '

works for me


Haha - I copied the wrong version and couldn't tell the difference! Rookie mistake.


There are many tools that can scan your dependencies and tell you if anything is off. Here's one for reference (cannot vouch for it though as I have not used it):

https://github.com/jeremylong/DependencyCheck/blob/master/RE...


I was wondering the same thing... and why isn't there a list of package names included in this article?



