Git: Malicious repositories can execute remote code while cloning (openwall.com)
634 points by todsacerdoti on March 9, 2021 | 211 comments



The commit that fixes this issue:

https://github.com/gitster/git/commit/684dd4c2b414bcf648505e...

(Surprise, the root cause is a cache)


Another cachelty


well played. I think that just got added to my standard vocabulary. Caching has caused more errors and bugs that I've had to deal with than I can recall. My favorite was an off by one error where we returned nicely cached info -- just for the previous user who came through our system! :facepalm: That was a bad one.


That's because essentially, "state" and "caching" are the same thing on some level.

And the problem with state is that you have to make sure all your state transitions don't cause bugs. What we know as a "cache" is essentially creating new state representing existing state, with all new transitions...


I like to look at caching as a form of denormalization - introducing redundancy to improve performance. And whenever we have redundancy, we have to make sure all our copies are synchronized, which can be tricky, especially in a concurrent environment.

On the other hand, the whole point of normalization in databases is to avoid redundancy and have "single source of truth".

I find the concepts of normalization and denormalization applicable and helpful outside databases as well, though a different terminology is often used.
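A minimal sketch of the redundancy point, assuming a toy in-memory "source of truth" and a separate cache dict (all names here are hypothetical, made up for illustration). It only shows how the second copy goes stale when an update forgets to synchronize it:

```python
# Hypothetical single source of truth and a redundant copy (the cache).
source = {"user:1": "alice"}
cache = {}

def cached_get(key):
    # Populate the redundant copy on first read, then serve from it.
    if key not in cache:
        cache[key] = source[key]
    return cache[key]

def update_without_invalidation(key, value):
    # Only the source changes; the cached copy is now out of sync.
    source[key] = value

def update_with_invalidation(key, value):
    # Change the source and drop the redundant copy so it gets rebuilt.
    source[key] = value
    cache.pop(key, None)

first = cached_get("user:1")
update_without_invalidation("user:1", "bob")
stale = cached_get("user:1")      # still the old copy
update_with_invalidation("user:1", "carol")
fresh = cached_get("user:1")      # rebuilt from the source
```

Every write path has to remember the invalidation step, which is exactly the synchronization burden that normalization avoids.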


In that light, a cache is a form of partition, so it can be available xor consistent with its source.

https://en.wikipedia.org/wiki/CAP_theorem


Cache is a type of state but not necessarily the other way around


In the efficient implementation of a pure functional language (say Haskell without MVars), what really is the difference between state and cache?

I know this is overly philosophical, and in practical scenarios we readily (although not always unambiguously) differentiate between "cache" and "state", but the point about transitions and that being a major source of bugs still stands.


> In the efficient implementation of a pure functional language (say Haskell without MVars), what really is the difference between state and cache?

If you want to unify state and cache, you might want to go down a different route:

Think of log based filesystems (or a log based data base).

Instead of defining your operations in terms of state, you define them as pure functions of the log.

So your log is full of operations. Writing just means appending a symbolic operation like `write(key, value)` to your log.

And you define the result of `read(key)`: scan backwards through the log until you hit the last instance of `write(key, value)` and then return that `value`.

Now state means: compact your log by replacing a swath of `write` log entries with one big `snapshot` operation that encompasses many key/value pairs.

Alternatively, you can also define state to mean caching your `read` operations.

In this approach, it's no coincidence that the log is a data structure that has a linear shape: the evolution of state over time is also linear.

(With some cleverness you can replace the linear structure with eg a DAG; and then also think about how you merge divergent states.)
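Roughly, the log idea above could look like the following. This is only a sketch: the `LogStore` class and the tuple layout of the log entries are my own assumptions, not taken from any real log-structured filesystem or database.

```python
from dataclasses import dataclass, field

@dataclass
class LogStore:
    """Append-only log; reads are computed from the log contents."""
    log: list = field(default_factory=list)

    def write(self, key, value):
        # Writing never touches earlier entries; it only appends.
        self.log.append(("write", key, value))

    def read(self, key):
        # Scan backwards until the most recent entry that mentions the key.
        for entry in reversed(self.log):
            if entry[0] == "write" and entry[1] == key:
                return entry[2]
            if entry[0] == "snapshot" and key in entry[1]:
                return entry[1][key]
        raise KeyError(key)

    def compact(self):
        # "State": fold the whole log into a single snapshot entry.
        merged = {}
        for entry in self.log:
            if entry[0] == "write":
                merged[entry[1]] = entry[2]
            else:
                merged.update(entry[1])
        self.log = [("snapshot", merged)]
```

Compaction changes how long a `read` takes but not what it returns, which is the sense in which the snapshot is "just" state or cache layered over the log.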


Here goes the obligatory

> There are only two hard things in Computer Science...


I can never remember what they are, though. To avoid this problem, I think I wrote them down on a post-it, but I had too many post-its on my desk so I got rid of them all, and now I can't remember.


The two most difficult ones are naming things, cache invalidation, and off-by-one errors. HTH. ;)


I prefer the ordered version.

three most difficult things in CS:

2) Naming Things

1) Cache Invalidation

4) off by one errors

3) Concurrency


Another version, about distributed systems:

There are only two hard problems in distributed systems:

2. Exactly-once delivery

1. Guaranteed order of messages

2. Exactly-once delivery


two most difficult things in CS:

0) naming things

1) cache invalidation

42) asynchronous callbacks

2) off by one errors


They are likely making a joke about caching the answer


What does HTH mean?


"Hope this helps" (sometimes used sarcastically)


Hope That Helps, HTH.


"Hope this helps"

I see it with HAND "have a nice day" too


I've always read it as "Happy To Help".

I see that's wrong, but ignorance has made the internet seem just that little bit warmer all these years!


I think it could probably mean "Happy to help" if said in response to a thanks of some sort. Saying it before someone has said thanks is a bit presumptuous. :-)


this reminds me I was dumpster diving at a place with lots of post-it notes and there was one that said 2HCS => whatchamacallit, CI!

what that you?

on edit: I'm going to let that 'what that you' stand because one of the hardest things about HN posts is grammatical correctitude.


> wrote them down on a post-it

You wrote it on local media and kept it on-premises?

Cloud is the new thing, I hear.


The Post-it was but a cache.


I would consider the Post-it as persistent storage or the backup. Your memory would be the cache :D


Yeah but the speed.


Oh man...


It's amazing how often exploits come down to optimizations. The general form being "the domain logic is X, and is secure, but we faked around it in this one case to make it faster, and it turns out we made a bad assumption while doing so". Meltdown fits this description too.


Optimizations come from making assumptions, and bugs come from mistaken assumptions.


Same with bugs. Hate to be the mantra guy but:

1. Make it work

2. Make it right

3. Make it fast (make sure you need to) a.k.a. optimize

4. Make it scale


I am really fascinated by the responses to this comment. So many people exclaiming how many issues are caused by caches. In ten years as a fulltime programmer the only cache issues I've seen are cache misses. It probably has to do with one's field. I'm a game developer mainly dealing with graphics programming.


The key problem (as I understand it) is that updating a cache properly requires knowing the exact graph of relations that an entry in the cache has to other entries. So that when that entry changes, you can propagate that change throughout the cache to other concerned entries which need to be recomputed. But knowing that exact graph is too complex a task to be trivial, it seems in this case. Basically it sounds like the non-visual version of rerendering UI when a state changes, which is hard enough even with visual feedback.


A lot of threading issues are also cache related. Forget to properly mark access to shared variables and suddenly every thread/CPU core ends up with its own locally cached version of it.


Yes, but this is such a well understood danger that I've never really been bitten by it in practice.

Along the same lines a lot of GPU programming tutorials warn of inconsistencies between threads and it has never been a problem since I just assume I cannot rely on consistency or order of execution, seeing each thread as separate and independent.


I wish folks who announce security issues would link to the patches for the issues they are announcing. This should become standard practice.


Difficult problems in programming:

(1) cache invalidation

(2) off-by-one errors


The classical three problems.


I thought the two hardest problems were:

1) naming

2) cache invalidation

...

3) off-by-one errors


I'm pretty sure it's:

1) naming

4) concurr2) cache invalidationency

...

3) off-by-one errors


Love this.


You forgot

0) Race consegmentation fault (core dumped)

(I know I was ninja’d but didn’t see until after)


FWIW, your version has the nice touch of also introducing a 0th item.


I feel like we should have solved the naming problem as an industry by now.

Alas.


It’s actually the hardest one of the three, being outside the grasp of formal methods.


To solve this problem we would need to first understand the human mind, how it stores data, how it does computation, and how it interacts with names. So we would need the same set of information that we would need for creating AGI. A solution is probably only a couple of months/decades away.


I really have to disagree, why can't you devise a formal method?

- a good name should be descriptive

- avoid being overly clever, call a spade a spade

- don't optimise for generalisation, naming things is a time to be specific

- aim for short but not at the expense of losing context

- avoid redundancy in naming of things nearby, leverage spatial context

- avoid qualifiers or type information where possible - type should be obvious from context and use; if it's not, qualify or refactor

Anything else?


Those aren’t formal definitions. “Formal” means, at the very least, that the specification is done in a formal language, and usually that conformance to the specification can be checked mechanically, that is, by a computer.

https://en.m.wikipedia.org/wiki/Formal_methods


shouldn't your list start from 0?


There are three kinds of programmers:

  1) Those who number lists starting at 1.
  1) Those who number lists starting at 0.
  2.5) Stan Kelly-Bootle, who proposed a compromise.


That’s the joke


Fortunately, many off-by-one errors can be caught with more ergonomic tooling.

For the simplest example: compare the old C-style for loop vs a Python style for-each loop.


I don't find I ever make off-by-one errors with simple collection iteration; at some point "i < len" becomes tattooed on your brain stem. The off-by-one errors I tend to make are related to implementation details of certain data structures or algs. Really, I would describe them more as "thinking at the margins can be challenging." Correctly handling doubly linked lists, that sort of thing.

Oh, and slicing. I will never get Python slicing right the first time. The fact that the range is [begin, end) is just never the way I expect it to work.


But slicing 0..len and for (i=0; i< len) are literally the same thing.

In [0, len), the ')' means strictly less than, as in 0 ≤ x < len.
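A small sketch of that equivalence, using a made-up list. The index loop, the slice, and the for-each form all cover the same half-open interval:

```python
items = ["a", "b", "c", "d"]
n = len(items)

# C-style index loop: i takes the values 0 <= i < n.
by_index = []
i = 0
while i < n:
    by_index.append(items[i])
    i += 1

# Slicing uses the same half-open interval [0, n).
by_slice = items[0:n]

# A for-each loop avoids the index bookkeeping entirely.
by_each = [item for item in items]

# Half-open ranges compose without overlap or gap:
# [0, k) followed by [k, n) is exactly [0, n).
k = 2
rejoined = items[0:k] + items[k:n]
```

The last line is the usual argument for [begin, end): the same `k` ends one piece and starts the next, so there is no +1 or -1 to get wrong.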


(3) remembering the joke


Per your downvotes - I used to hate jokes on Hacker News and downvote them when I saw them, but I've become more ambivalent. They're a way of amicably sharing culture and experiences with other engineers that transcend any differences in age, gender, race, background, etc.

The formulation of this joke I tend to see is,

The two hardest problems in programming:

(1) cache invalidation

(2) appropriately naming things

(3) off-by-one errors


It's barely even a joke to me anymore -- it's just too real for me to laugh.

(Cache invalidation is essentially the same problem as managing mutable state -- "Out of the Tar Pit" frames mutable state as either essential or incidental, the latter being rederivable in principle from essential state. Incidental mutable state is no more and no less than a cache, and usually one with an informal and undocumented invalidation policy.)

(And naming things has a very real technical counterpart in addressing, which comes up obviously in networking, but you can also see its shadows in quite a lot of concerns around architecture and modularity.)


Humor is often the most efficient way to communicate/accept the truth.


To make the truth seem like an acceptable parallel universe, and then join it.


The two hardest problems in programming:

(1) cache invalidation

(3) off-by-one errors

(2) appropriately naming things

(4) parallel execution [leading to race conditions / ordering bugs]


I think you win a bad in-joke award - the first annual Turing-Dad joke award.


Minor quibble, it should be the three hardest problems.


It should be, but the increments were run in parallel on nonvolatile memory.


The two hardest problems in computer science are:

1) Naming

3) Cache Invalidation

2) Off-by-one errors

3) In-order once-only delivery of distributed messages

And an almost fanatical devotion to the Pope


I love it told like this: https://news.ycombinator.com/item?id=26406351

1) naming

4) concurr2) cache invalidation

ency

3) off-by-one errors


(5) feature creep


They had this bug before in other code unrelated to caching. To me, that suggests a deeper root.


Strange. The guy who fixed the issue works at Microsoft, but uses his gmx email for Github.


And the guy who announced the new Git release works for Google, but uses his pobox.com email for Git development.


Yes, actually, Googlers are encouraged to use their personal Github accounts.


That is true but the git release has nothing to do with GitHub accounts.


But probably their work email when doing things on company time?


Nope. This is an example of someone working on company time using their personal email.


Right, which is (AFAIK) not usually recommended except for side projects or ones where there is already an existing relationship under a personal email address.


I think you just got a glimpse of a vast sea of internal policy and compliance issues.


Or that the open source hobby is much more long-lived than something as temporary as an employer.


Wait, there are people who use real, in-use email addresses in publicly hosted git repos? I mean, it's likely his spamcatcher address, no?


> (Surprise, the root cause is a cache)

Couldn’t it just as well be attributed to improper file path normalization? If we had only lower case ASCII file systems it would not have caused a problem.


Things can have more than one cause that act together.


That could be any Git repository.

Have you seen the mayhem that some of mine cause when you clone them and then type ./configure && make, like you have been socially engineered into doing?


It doesn't even have to be there... The main reasons to clone a repo are because you're about to compile and run the code there, or you already have and need to fix something.

I don't personally audit all the code I run, but I hope someone is doing it. That being said, source code being public is much better than the alternative of just downloading binaries from who knows where.

I don't trust anything absolutely, and I don't see a way past it.


There is a huge difference between “clone a repo” and “clone a repo and run code from it”.


In spite of my tongue-in-cheek statement, I get it.

It's huge in the context of non-programming uses of Git. If some people are just sharing some text documents with Git, then it's a big deal.

This is likely on the rise.

E.g. if you look at a site like Github, there is a lot of non-code content in it. Some people stash that content, and other people believe that content to just be harmless files that will never perpetrate an exploit just from being cloned.


It’s a big deal regardless of whether documents or code are being stored. Cloning a repo should not open you up to RCE.


Agreed, I frequently clone repos so I can look at the code in a terminal with grep, with no intention of ever building it.


Technically yes, but I can’t think of the last time I cloned a repo without then running code from it...


Well, I clone repos to inspect code all the time, and when I run code, it’s usually not with the same permissions as the corresponding `git clone`. Maybe I should be better about sandboxing Git…


Depends on how you define "running code".

  1. Download container description (Dockerfile)
  2. Upon image build it "compiles things" (e.g. processes/assembles javascript)
  3. Build fails, because it pulls architecture incompatible library (or does not pull architecture mandated library)
  4. Fix build scripts, rebuild container image
  5. Verify container
  6. Pull repo
  7. Reproduce changes, commit
  8. Push
Nothing apart from clone-edit-push happens on the repo. The code can be executed on a remote, hardened, isolated system. With the proliferation of containers I guess this scenario will become more and more common among ops people.


Any html/js web frontend project that runs in a browser?


Sure I'll just `npm install`.... damnit! Hacked again.


For a while I tried to only run untrusted builds in Docker containers, like doing `docker run -v $PWD:/src node npm install`, but IDEs are not really configured to deal with this. Even my Vim has ALE and would just run node_modules/.bin/tsserver on my machine, which could be anything. Why aren't our tools concerned with this at all?


Because at a certain point you shrug and blame the user for downloading sketchy code and executing it.


I get that you are not completely serious, but before the cmake/meson/... people jump on this:

If ./configure is checked in as part of the official repository of a moderately well known project, I doubt any committer would be stupid enough to insert a backdoor into ./configure or the Makefiles.

What can happen if an apostate project is not on GitHub: Some (usually several) faithful persons decide to correct the situation and put multiple unofficial mirrors on GitHub, and other faithful people clone from a random one of these.

In that case however, they get what they deserve.


Reminds me of another git vulnerability from 2014 on case-insensitive filesystems: https://github.blog/2014-12-18-vulnerability-announced-updat...


Considering how much I `pip install garbage`, this is perhaps not so critical for me.


Isn't the bug really on case-insensitive file-systems that allow symlinks?


No. Why would it be?


> if Git is configured globally to apply delay-capable clean/smudge filters (such as Git LFS)

What is the simple test for whether this is the case or not?

Is this a default-on scenario?


> What is the simple test for whether this is the case or not?

As suggested in GitHub's announcement post[1], you can test this with the following:

`git config --show-scope --get-regexp 'filter\..*\.process'` (replace the single quotes by double quotes on Windows Command Prompt)

> Is this a default-on scenario?

On Windows yes, because Git-for-Windows configures Git LFS by default.

[1]: https://github.blog/2021-03-09-git-clone-vulnerability-annou...
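If you want to script that check, here is a rough sketch that runs the same command and looks at the scope column. Two assumptions of mine, not from the announcement: that each output line starts with the scope followed by whitespace and the config key, and that "configured globally" means the `global` or `system` scope. The function names are hypothetical.

```python
import subprocess

# Assumption: these two scopes count as "configured globally".
GLOBAL_SCOPES = {"global", "system"}

def parse_filter_scopes(output):
    """Return (scope, key) pairs, assuming lines of '<scope> <key> <value>'."""
    pairs = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2:
            pairs.append((parts[0], parts[1]))
    return pairs

def globally_configured(pairs):
    """Keys of process filters that are set outside any single repository."""
    return [key for scope, key in pairs if scope in GLOBAL_SCOPES]

def query_git():
    """Run the command quoted above; no match simply yields empty output."""
    result = subprocess.run(
        ["git", "config", "--show-scope", "--get-regexp", r"filter\..*\.process"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `globally_configured(parse_filter_scopes(query_git()))` and getting a non-empty list would suggest that a delay-capable filter such as Git LFS is configured for every clone.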


Doesn't Git-for-Windows default configure symbolic link support off, though? Or does this exploit work even in that case as long as the underlying file system supports symlinks?


Git-for-Windows may turn symlink support on by default under some specific circumstances. As the repo's wiki [1] says:

Short version: there is no exact equivalent for POSIX symlinks on Windows, and the closest thing is unavailable for non-admins by default unless Developer Mode is enabled and a relatively recent Windows 10 version is used. Therefore, symlink emulation support is only turned on by default when that scenario is detected.

[1]: https://github.com/git-for-windows/git/wiki/Symbolic-Links


Is this a default-on scenario?

No, LFS is something you would have to explicitly enable, however it is pretty common to do so if you want to store binary blobs in Git.


Git for windows has LFS on by default.


Tangential.. are there bug bounties available for vulns in open source projects?


A lot of open source projects owned by large corporations do.

More independent projects sometimes do it too, for example curl.


> This vulnerability affects platforms with case-insensitive filesystems...

What kind of platforms use case-insensitive filesystems?


Linux these days actually. You can make ext4 case insensitive.

https://www.collabora.com/news-and-blog/blog/2020/08/27/usin...

Note: I also learned this today. Had no clue.


Why the heck would anyone want that? That’s arguably worse than adding spaces in file names


MacOS and Windows


In Windows, the underlying NTFS is still case sensitive, and that gets made use of with the WSL 1.0 stuff.


Before Windows XP, any application could open a file with a case-sensitive flag to request the operating system to not do any case folding. Starting with XP, the same feature exists but requires a registry key set (and a reboot) to instruct the kernel to allow case-sensitive operations.

Starting with Windows 10, the aforementioned key still works, but there's also a per-directory case-sensitive flag that forces all DOS and Windows programs to have case-sensitive operations unconditionally. This is used to great effect in both WSL1 and Cygwin.


I guess it shows that I haven't really used either of those in a long time.


macOS has defaulted to be case insensitive largely due to historical and perhaps usability reasons. You can opt to make it case sensitive (and I do, which broke Steam for several years but that also freed my time).


When necessary you can make an auto-expanding volume that's case-sensitive and leave your host FS alone. I have not found that I really want to have differently-cased but otherwise identical filenames in the real world at any point though.


Ages ago, I heard from co-workers at a company that I had left that there was an issue because of some file-naming in a PHP application that they were trying to run locally on a Mac. There was foo.php which was the interface and Foo.php which had a class definition in it.

What idiot would name files like that? I said.

You, they answered.

I don't do that sort of thing anymore.


A bug that affects my teams once every few years: a developer will create a file named "A.txt", check it into Git, realize it should be "a.txt" and rename it, and then basically everything will shit the bed and you waste a day figuring out why nothing is working.

A related bug is a developer will make a webpage called /a/ but link to /A/ and then the link will be broken in production. At this point, I have seen this same bug enough times to be able to fix it reasonably quickly, but it definitely wastes time for the team.


Yeah, my current company encourages development in a case-sensitive volume and I assume this is why. But this is more an issue of your development environment working differently than prod than a problem with the notion of case insensitivity per se.


We added a commit hook to block case conflicts.
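The comment doesn't say how that hook works, so the following is only a guess at the core check: group every path (and every parent directory) by its case-folded form and report groups with more than one spelling. The function name is hypothetical, and `str.casefold()` is just an approximation of what a real case-insensitive filesystem does.

```python
from collections import defaultdict

def case_conflicts(paths):
    """Return groups of paths or directories that differ only by letter case."""
    groups = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        # Check each directory prefix too, so Debian/ vs debian/ is caught.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            groups[prefix.casefold()].add(prefix)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

A hook could feed this the list of tracked files plus the files being added, and reject the commit when the result is non-empty.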


But git is largely used by programmers.


macOS, to name one. It appears NTFS is also vulnerable according to the posting.


Ashamed to admit (as an OSX user) that I didn't even realize the FS was case-insensitive (having migrated from years of Linux usage to a non-Linux desktop). It does a good job of hiding this from the user (filenames are still listed with cases, and bash autocompletion completes to the correct case as well)


MacOS by default uses a "case-preserving case-insensitive" filesystem, so you can create files with mixed case, but you can't create two files with the same name and different case. It's one of MacOS's more-egregious crimes against Unix. Fortunately it doesn't manifest that often, but it rears its head often enough to be a problem.
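For anyone unsure what "case-preserving case-insensitive" means in practice, here is a toy model. It is not how any real filesystem is implemented; the class and its methods are invented for illustration, and `casefold()` stands in for whatever folding rules the filesystem really applies.

```python
class CaseInsensitiveDir:
    """Toy directory: lookups ignore case, listings keep the original case."""

    def __init__(self):
        # Folded name -> (name as first created, content).
        self._entries = {}

    def create(self, name, content=""):
        key = name.casefold()
        if key in self._entries:
            # A differently-cased name refers to the existing file.
            raise FileExistsError(self._entries[key][0])
        self._entries[key] = (name, content)

    def lookup(self, name):
        return self._entries[name.casefold()]

    def listing(self):
        return sorted(stored for stored, _ in self._entries.values())
```

Creating `Makefile` and then looking up `makefile` finds the same entry, while the listing still shows `Makefile`, which is why the behavior is easy to miss.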


It may be a crime, but it is the result of a set of compromises in the design of the OSX filesystem, which had to work with a BSD variant while also being compatible with pre-OSX days. I think it’s one thing they actually did an elegant job with.

EDIT: This document describes some of the challenges: https://www.usenix.org/legacy/publications/library/proceedin...


This is the first article I've found that does a decent job of explaining what a resource fork really is.


> It's one of MacOS's more-egregious crimes against Unix.

Nah. Using a file system means putting up with its semantics. HFS+ was case-insensitive; they were deploying an upgrade to millions of existing filesystems.

If you mount, say, an NFS volume, MacOS does the expected thing.


The fact that Linux is case-sensitive is the egregious crime. It's a nasty holdover from circa-1970 Unix when case-folding was an expensive operation.


Case-sensitivity is not a "nasty holdover", it is a good design decision that continues to be proven correct (case in point, this bugfix for case-insensitive filesystems).

Why would you introduce complexity into the filesystem to try to normalize file names when you can simply, not? I mean, have you _seen_ the mess that is Unicode normalization? Hundreds of different glyphs or whatever that are all considered equivalent, but are actually composed of different bytes. The filesystem should try to make sense of all that, and consider them equivalent paths?

Even if you say "well, just capitalization, not Unicode normalization," there's the whole German letter ẞ => ss (or is it ß?) and similar friends like the Turkish dotted I that have popped up as articles on HN. Absolutely glad Linux filesystems by and large do not attempt to take that on, and treat paths as a bucket of bytes instead.
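To make the ß and dotted-I examples concrete, here is what Python's built-in, locale-independent case mappings do with them. This shows Python's `str` methods only; a filesystem may well use different folding tables.

```python
sharp_s = "\u00df"            # ß LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"    # ẞ LATIN CAPITAL LETTER SHARP S
dotted_capital_i = "\u0130"   # İ LATIN CAPITAL LETTER I WITH DOT ABOVE

# Uppercasing one character yields two.
upper_of_sharp_s = sharp_s.upper()

# Case folding, the mapping meant for caseless comparison, also expands it.
fold_of_sharp_s = sharp_s.casefold()

# The capital form lowercases to the ordinary sharp s.
lower_of_capital = capital_sharp_s.lower()

# Lowercasing the dotted capital I gives "i" plus a combining dot above.
lower_of_dotted_i = dotted_capital_i.lower()

# Upper followed by lower does not round-trip.
round_trip = sharp_s.upper().lower()
```

So even "just capitalization" changes string lengths and is not reversible, and a filesystem that folds case has to pick and then freeze one set of rules.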

All for what benefit - so you can type File.txt in the terminal and have the OS find file.txt? That is much more appropriate for the Application layer to resolve, rather than the filesystem.

Bugs like this come from over-engineering. Filesystems should be simple, and follow the principle of least surprise.


It bugs me to see foo.c and Foo.c as separate files in a directory listing. I like the fact that MacOS doesn't allow this situation to ever happen. Not taking on that problem means it's left to the user to figure out what's going on when similar glyphs occur.


> Fortunately it doesn't manifest that often, but it rears its head often enough to be a problem.

IIRC, one place where it does rear its head is when a file is renamed in a git commit to a value that downcases to the same value as the prior name. For example `Foo.txt`->`foo.txt`.

I have `core.ignorecase = true` in my `.gitconfig` for this very reason.


The extraordinarily frustrating case is where you're working on a repository that has multiple files that differ only by case. Git will check out one of them, then overwrite it with the other.

The Linux kernel is one of them: several of the headers that get installed to /usr/include/linux/netfilter have conflicts on a case-insensitive filesystem. https://github.com/torvalds/linux/tree/master/include/uapi/l...

Debhelper used to be one of those until I convinced them to change it: they had a Debian/ directory for the Perl module Debian::Debhelper as well as a debian/ directory for the packaging metadata. https://bugs.debian.org/873043

(I suspect I'm a little unusual in wanting to have checkouts of Linux and Debhelper on my Mac homedir.)


That’s the same as Windows, but Windows enables making directories case sensitive on a directory by directory basis.


wasn't this fixed with their new Apple FS?

Wait I'll try

...nope, after `touch makefile` and `touch Makefile`, I still see just `makefile`. OK


OSX has an even more annoying problem in that it decomposes Unicode: https://stackoverflow.com/questions/5581857/git-and-the-umla...

Many fun times trying to copy/move/remove a file and not being able to do so because the input and the name stored on the FS are actually different bytewise...

Seems like linux has the only sane filesystems not trying to mangle paths at all.


If linux doesn't normalize unicode at all, can you have two different files that look like they are named `josé`, depending on if the é is decomposed or not?


Yes, for Linux filenames are just bytes. Apart from / and NUL characters it doesn't care what you give it, nor does it mangle them in any way; it's the only sane thing to do.
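A quick way to see the two `josé` spellings, using Python's standard `unicodedata` module. The strings are built from explicit code points so the difference doesn't depend on how your editor saves the file:

```python
import unicodedata

composed = "jos\u00e9"      # é as a single code point (NFC form)
decomposed = "jose\u0301"   # e followed by COMBINING ACUTE ACCENT (NFD form)

# The two names render the same but are different byte sequences,
# so a bytes-only filesystem treats them as two distinct files.
same_bytes = composed.encode("utf-8") == decomposed.encode("utf-8")

# Normalizing either way makes them compare equal again.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_after_nfd = unicodedata.normalize("NFD", composed) == decomposed
```

A tool that wants to treat both spellings as one name can normalize before comparing, without the filesystem itself having to do it.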


The only sane thing to do if you don’t care about how humans (as opposed to nerds) think.

In the end, the file system doesn’t exist in isolation, it is there to support users, and most of them won’t care how many bytes “é” takes to store.

Unix, by not even defining the way to interpret the bytes of file names (one can’t even assume that names consisting of only bytes that correspond to ASCII letters and digits should be interpreted as ASCII) makes it impossible to show file names to users. That’s insane.


> The only sane thing to do if you don’t care about how humans (as opposed to nerds) think.

At the FS layer, I think that's better. Makes things simpler for programs. For non-techie humans, unicode can be normalized at upper levels, like the GUI file manager or toolkit library that does save dialogs, etc.

That's if humans being confused because of lack of normalization of unicode is a real practical issue and not just something that can happen but never does.


If you use a file system that doesn't normalize lookups, yes you can.


I'm in the same boat - used MacOS for the past 6 years, including the terminal nearly everyday! :d


FYI, on MacOS, it is a property of the partition, so you can reformat and have a case-sensitive filesystem. Applications may subtly break if they weren't tested on such a filesystem, but I had used one for several years without too many issues.


Since the introduction of APFS I've taken to creating a new APFS volume formatted as case-sensitive, and put my git repositories there.

This has mostly been useful for working on shared repositories where, say, a Linux user (or other user on a case-sensitive filesystem) pushes two branches, say `feature/foo` and `feature/Foo` which works fine for them, but on a case-insensitive filesystem, git gets very upset.


Once spent way longer than I would have liked trying to debug an iOS app issue that I couldn’t reproduce in the emulator because iOS devices have a case-sensitive FS, macOS devices typically don’t, and the emulator was subject to the macOS file system’s conventions.

Leaky abstractions all the way down.


Windows.

It's a notable problem with git + Windows that has gotten better over time but still leads to a lot of WTF moments. For many this event is the first time they hear that Windows' filesystem is case insensitive.


It is strange I haven't noticed earlier. Maybe it is just so unnatural to name files the same with different cases that I haven't tried.


Sometimes it feels like corporate IT creates more security problems than it solves: Windows as development machines, SolarWinds, Fucking McAfee malware on everything.


ZFS has a case-insensitive option.


I guess I'll have to stop running

$ sudo git clone ...


I don't think that smugly not running as root saves normal users; while malware running as your user can't trash your laptop, it can get your Google cookie and read and send emails as you, spend your money, view your private photos, etc.


And it can run sudo as your user after you warm it up. Or use any number of frequently disclosed OS vulnerabilities for local privilege escalation.


> And it can run sudo as your user after you warm it up.

How is it getting my root password?


Sudo persists authorization for a short period of time so that you don’t need to re-authorize for back-to-back commands.

Also, once they get ACE they can modify your bashrc to make sudo an alias for “sudo rm -rf / ;” or all sorts of other evil trickery.


it could also alias sudo to some other command


Like "doas".


I set sudo to NOPASSWD, so it doesn’t need one for my workstation


Relevant XKCD: https://xkcd.com/1200/


I mean, there are... not-totally-unreasonable workflows that do clones as root.

Edit: although I am struggling to think of one that clones from an untrusted source.


> not-totally-unreasonable workflows that do clones as root

Uh... really? Like what?


etckeeper and friends (I have a git checkout in /etc/nixos on nixos machines), portage sync on funtoo, pulling ports tree or even system source on a BSD, grabbing setup scripts during install of Arch before a non-root user exists


So, basically workflows where the cloned code gets run as root without further inspection anyways.


Is nothing sacred anymore


What's an easy way to fix the default git installation on OSX?


Wait for a macOS security update that includes it. If you don’t want to wait, macports and homebrew will both be patched much faster.


Isn't git distributed with the Xcode command line tools?


Yes, so you’ll probably get a new version when 12.5 drops (maybe later this month?)


I did a `brew install git` and then deleted /Library/Developer/CommandLineTools/usr/bin/git. You can't delete /usr/bin/git even with sudo (system integrity policy).

After installing git via brew and removing the one in CommandLineTools, /usr/bin/git is showing the latest version.

    me@local % git --version
    git version 2.30.2

I don't know if this is recommended or if it will have negative consequences that I don't know about, but it seemed like the way I could accomplish it. Given that /usr/bin/git is working with the homebrew installed git, I'm hopeful that everything will be good.


I don't use Mac, so I can't speak to how the changes you made will affect your system. For future reference however, you should know that binaries are searched on your $PATH in order. Instead of deleting anything, you could have edited your $PATH variable so that the directory that brew stores binaries in is searched before other locations.
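
The in-order lookup can be sketched in Python. This is only an illustration under stated assumptions, not a macOS-specific recipe: the two directories are temporary stand-ins (hypothetical) for a package manager's bin directory and a system directory, `faketool` is a made-up command name, and `make_fake_tool` and `first_match` are helper names invented here. The sketch relies on `shutil.which`, which walks the given search path in order and returns the first executable match.

```python
import os
import shutil
import stat
import tempfile


def make_fake_tool(directory, name):
    """Create a tiny executable file called `name` inside `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    # Add the execute bits so that shutil.which treats the file as a command.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


def first_match(command, search_dirs):
    """Return the first executable named `command` found in `search_dirs`, in order."""
    return shutil.which(command, path=os.pathsep.join(search_dirs))


def demo():
    """Show that reordering the search path changes which copy wins."""
    with tempfile.TemporaryDirectory() as brew_bin, \
            tempfile.TemporaryDirectory() as system_bin:
        brew_tool = make_fake_tool(brew_bin, "faketool")
        system_tool = make_fake_tool(system_bin, "faketool")
        brew_wins = first_match("faketool", [brew_bin, system_bin]) == brew_tool
        system_wins = first_match("faketool", [system_bin, brew_bin]) == system_tool
        return brew_wins, system_wins


if __name__ == "__main__":
    print(demo())
```

Nothing is deleted in either ordering; the earlier directory simply shadows the later one, which is why prepending a directory to $PATH is usually enough.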


brew install git


Git: Malicious repositories can execute remote code while cloning on a case-insensitive filesystem with symlinks

FTFY


> This vulnerability affects platforms with case-insensitive filesystems with support for symbolic links, when certain clean/smudge filters are configured globally (e.g. Git LFS).

Can we get the title changed to "on macOS and Windows?"

I was worried for a second, but this is meaningless.


Why is it meaningless? Lots of people use Git on MacOS and Windows. I'd even be willing to bet that there are more people using Git on MacOS and Windows than Linux.


And use git LFS and cloned a malicious repo? This bug has probably not affected a single user.


Isn't the whole point of announcing security patches so that people can update before they're exploited?


Sure, I'm just saying this isn't a big deal and it's likely no one was hit.


Just because there is not an active threat doesn’t make it any less of a vulnerability to be exploited.


It does if nobody uses it. You can't exploit Apache 2.4.2 proxy bugs if nobody runs Apache 2.4.2 in proxy mode.

Of course, you should still update because you're a config change away from being vulnerable, but GP's point of it not being a big deal if (and only if, don't know if that's correct) nobody uses it stands.


The GitHub desktop app configures lfs in your gitconfig automatically, so that adds a lot of users to the vulnerable pool.


Many Git distributions come with git-lfs installed by default


I assume the primary user base of git-lfs is folks doing things like video game development (so that they can check in image/audio assets to a repo without massively bloating it), which probably has a much higher fraction of Mac/Windows users than folks writing server-side apps or whatever.


That's not why we do security research.


I would assume that most people developing on macOS have configured case sensitive filesystems. And does Windows do symlinks now?

Seems like a weird edge case to me. I guess Apple and Microsoft should push out OS updates to cover it.


Windows has done symlinks (known as "junctions") since Windows 2000, so I guess it's a more recent feature you might not have learned about.


To be honest I was a heavy Windows user until Windows 7 yet only recently learned that it has symlink support. It's not something you (used to?) really come across in the ecosystem.

That said, I did snicker at the comment :) I had no idea it was that old.


That's a dishonest statement and misrepresents the actual support.

The "junctions" are unusable as "windows symlinks".

For one, you can't create any as a normal non-admin user without specific authorization by default.


Only the most sadistic company on earth is going to ask you to do development on a Windows machine without admin privileges.


I had exactly that done to me. By a huge 200k+ employees corporate monster... You can taste the humiliation of having to justify every sudo through a ticket system. They censored the internet for employees too, in an of course absurdly broken way. Made me learn to read ASN.1 printouts & detect tampering with TLS certs. Add to that an iconic "Office Space"-ey workplace atmosphere, absolutely toxic... made me _request a headset_ from the company (employees are not allowed to bring their own); 2 weeks of ticketing again, and they deliver: an rj45-plug phone headset, with three obscure boxes on the wire (I can only presume, for surveillance). I've been testing my limits for 3 months with them, and left without saying a word. A lesson is a lesson.


> most people developing on macOS have configured case sensitive filesystems

I don't think this is true at all. Too many things will break if you turn this on.


There are many options for case-insensitivity on Linux. The common one would be FAT, which can't handle symbolic links, so that is moot. There are also ext4 and ZFS, which can have case-insensitive modes enabled (they aren't by default) and which do support symbolic links. ntfs-3g also has an option to mount as case-insensitive (though said option can actually subtly break access to an NTFS volume, since NTFS itself is always case-sensitive and it's just the OS's VFS layer that pretends otherwise).


The exploit can also be done with (case-sensitive) Unicode file names. All it requires is that git thinks two paths are distinct, while the file system thinks they're equivalent
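
A rough way to picture "git thinks two paths are distinct, while the file system thinks they're equivalent" is to group repository paths by a comparison key. The sketch below is an approximation under stated assumptions: `fs_key` and `find_collisions` are names invented here, and NFC normalization plus `str.casefold` only stands in for whatever rules a real filesystem applies (each filesystem has its own tables). It is not how git's fix works.

```python
import unicodedata
from collections import defaultdict


def fs_key(name):
    """Approximate how a case- and normalization-insensitive filesystem compares one name.

    Assumption: NFC plus full case folding; real filesystems use their own rules.
    """
    return unicodedata.normalize("NFC", name).casefold()


def find_collisions(paths):
    """Return groups of byte-distinct path prefixes that share the same approximate key."""
    groups = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        # Check every prefix, so a symlink "a" collides with the directory "A".
        for depth in range(1, len(parts) + 1):
            prefix = "/".join(parts[:depth])
            key = "/".join(fs_key(part) for part in parts[:depth])
            groups[key].add(prefix)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


if __name__ == "__main__":
    # Hypothetical repository listing: "a" and "A/..." differ only by case,
    # and the two "café" spellings differ only by Unicode normalization.
    listing = ["a", "A/post-checkout", "docs/readme", "caf\u00e9", "cafe\u0301"]
    for group in find_collisions(listing):
        print(group)
```

A checker like this only flags suspicious listings; deciding what is actually equivalent is still up to the filesystem the checkout lands on.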


Yes, but no sensible people use case-insensitivity on Linux, and the number of other people that do in a relevant context can probably be measured with four digits.


I have case-insensitivity enabled for DOSBox and Wine file systems.

I've actually thought about converting my whole $HOME to that way, but I do have a few files that would conflict if I did that. I honestly don't think it's that bad of an idea.


Do you store code in $HOME? If so, I wouldn't recommend it. I have a case-sensitive partition on my macOS machine because I was bitten one too many times by code that worked fine on my development machine (case-insensitive file system) only to fail in production (case-sensitive file system).


That would indeed be one reason (aside from sheer time) I've avoided doing it fully.


May I ask why?

Case sensitivity is one of the things that really bothers me on Linux, it causes me to make mistakes for no reason. If I ever really switched to Linux full-time, I’d probably want to change that.


Because case insensitivity causes ambiguity and complexity for no meaningful benefit, and more often than not causes problems like in the post. This isn't a "Linux" thing for me; every UNIX and POSIX system that has been well-designed with the exception of Snow Leopard has had case sensitivity.


The benefit seems pretty clear to me. Users do not generally consider uppercase and lowercase versions of the letter completely distinct. The use cases for identical file names with different cases would seem quite limited.


> every UNIX and POSIX system that has been well-designed

Case sensitivity wasn’t designed; the first Unix couldn’t spare the CPU cycles to do case insensitive matching, which was the norm at the time.


And yet nearly every UNIX system that was well-designed isn't case insensitive. I was pointing out correlation, not causation.


How many case-insensitive UNIX systems are there? I’m not sure you have enough data points here. :P


> mistakes for no reason

Don't you think that 65 ≠ 97 is sufficient of a reason?..

I mean, 'A' ≠ 'a', in ASCII, Unicode and even EBCDIC. In computers, those are two distinct characters. This fact won't change no matter how you rationalize your expectations.

Thus, pretending that "y.txt" is the same as "Y.txt" is an elaborate lie. Even acknowledging that it's a "white", well-intentioned lie (designed to preserve the mistaken expectation that "y.txt" is the same as "Y.txt") — I don't like when computers lie to me; do you?

Like every lie, this one has weird consequences. One of them is today's RCE in the OP. Another was CVE-2014-9390. There are myriad others.

Linux rejects the whole notion of filename case-insensitivity, and demonstrates how computers actually work. It becomes easier on developers and more secure on users.

Lastly, don't feel that I'm attacking you; I'm opposing an idea. So, here's a tip: you can set up case-insensitive filename completion in bash, so that TAB will correct your casing mistakes for you. It's a simple one-line change involving putting `set completion-ignore-case on` into an inputrc.


> Don't you think that 65 ≠ 97 is sufficient of a reason?.. [...] In computers, those are two distinct characters

In computers yes, but I am a human and to me as a human 'A' and 'a' are the same letter.


Fair enough. But notice: systems tend to expect that humans interacting with them observe basic rules. "The capital/lowercase variants of western alphabet letters are each represented as a distinct character" is one such generic, basic rule with computer systems. Especially if we zoom out of FS's into a broader context (http, json, programming languages, etc), you can't deny it; it's a fact.

We do have the option to ignore the fact and say "What bytes? I don't care. Guess what I mean, and lie to me as well as you can so I can stay happy in my ignorance", but, see, coordinating good support for that isn't easy. Minor wrinkles in it continue causing burns, sometimes RCEs. Maybe "doing in Rome as the Romans do" isn't such bad advice after all?


Why is this an unacceptable lie but the notion of letters instead of code points is acceptable? Especially once you get into multi-byte characters?


I didn't say it's unacceptable, nor did I mean that. In many contexts, it'd be tough without case-insensitive regex matching, for example. Reinforcing my point, //i gains issues once applied to the entirety of Unicode.

It's almost comical: people continue insisting on "letters not code points" knowing very well how computers are bad with guesswork and under-defined notions. Issues stemming from that keep coming up. What if, instead, the norm accepted that 'A' ≠ 'a' and stopped creating problems which computers are known to deal poorly with?
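
One concrete instance of //i getting awkward beyond ASCII, sketched in Python: regex case-insensitivity maps one character to one character, while full Unicode case folding can change the length of a string, so the two disagree on the German sharp s. The helper names `regex_ignorecase_equal` and `casefold_equal` are invented for this illustration, and the behaviour shown is that of Python's `re` module and `str.casefold`; other regex engines may differ.

```python
import re


def regex_ignorecase_equal(a, b):
    """Compare using the regex engine's case-insensitive matching (//i style)."""
    return re.fullmatch(re.escape(a), b, flags=re.IGNORECASE) is not None


def casefold_equal(a, b):
    """Compare using full Unicode case folding, which may expand characters."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # 'A' and 'a' are distinct code points: 65 and 97.
    print(ord("A"), ord("a"))
    for left, right in [("readme", "README"), ("straße", "STRASSE")]:
        print(left, right,
              regex_ignorecase_equal(left, right),
              casefold_equal(left, right))
```

Both helpers agree on plain ASCII names; they part ways on "straße" versus "STRASSE", which is the kind of under-defined corner the comment is pointing at.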


Well, then I think computers would be less useful than they are. The machines serve us, not the other way around.


Why does unix show paths as strings? It's a lie.

65 ≠ 'A'.


I've had git repositories on vfat formatted usb drives before. This isn't something I do with much frequency but it's not that exotic of a use case.

There's also the possibility of a git repo on a SMB share. That's not a use case I have, but it's not too difficult to imagine in a corporate environment.


Yep, I sensed a similar relaxation when reading this. But whatever, don't be silly, the title is long enough as it is.

Anyway, what I'm actually thinking when something like this is disclosed is how many more similar things must be known to a team of malicious professionals at Unit 8200 or whatever. I don't think I would reasonably suspect "git clone" being capable of something like that. How many more things I don't suspect to be dangerous actually are? It feels almost pointless to worry about it.


You'd be hard-pressed to get this on Windows/NTFS because of the way symbolic links work and if they're implemented as a link or a junction. This is really a macOS problem.


> Can we get the title changed to "on macOS and Windows?"

That isn't correct; it's a bug that manifests on case-insensitive[1] filesystems.

My colleagues who run a linux VM (our product targets Linux only) tend to git-clone onto a mounted NTFS partition so they can access the source from both host and guest. This bug will affect them even though they are running on Linux.

My other colleagues who run an actual Linux box tend to use a fast removable drive to git-clone (so they can work on it from home), and said drives tend to be FAT, which will also be susceptible to this bug.

If, on the third hand, you're running Windows and using ext4 as a filesystem (removable drive, mounted partition, whatever), then this bug should not affect you.

TL;DR: the OS doesn't matter, the filesystem does.

[1] They aren't, not really; NTFS is case-sensitive! It preserves the case when writing filenames and ignores it when reading filenames.


This should be fixed especially for those who want to inspect the code in a repository before running it. But anyone should keep in mind that malicious repositories can do a lot of bad things after cloning, even without this bug.


If you can't clone and then verify the code, a lot of things get much harder.



