Git: Malicious repositories can execute remote code while cloning

stefan_ · on March 9, 2021

The commit that fixes this issue:

https://github.com/gitster/git/commit/684dd4c2b414bcf648505e...

(Surprise, the root cause is a cache)

softwaredoug · on March 9, 2021

Another cachelty

sethammons · on March 10, 2021

well played. I think that just got added to my standard vocabulary. Caching has caused more errors and bugs that I've had to deal with than I can recall. My favorite was an off by one error where we returned nicely cached info -- just for the previous user who came through our system! :facepalm: That was a bad one.

anyfoo · on March 10, 2021

That's because essentially, "state" and "caching" are the same thing on some level.

And the problem with state is that you have to make sure all your state transitions don't cause bugs. What we know as a "cache" is essentially creating new state representing existing state, with all new transitions...

branko_d · on March 10, 2021

I like to look at caching as a form of denormalization - introducing redundancy to improve performance. And whenever we have redundancy, we have to make sure all our copies are synchronized, which can be tricky, especially in a concurrent environment.

On the other hand, the whole point of normalization in databases is to avoid redundancy and have "single source of truth".

I find the concepts of normalization and denormalization applicable and helpful outside databases as well, though a different terminology is often used.

adolph · on March 10, 2021

In that light, a cache is a form of partition, so it can be available xor consistent with its source.

https://en.wikipedia.org/wiki/CAP_theorem

bin_bash · on March 10, 2021

Cache is a type of state but not necessarily the other way around

anyfoo · on March 10, 2021

In the efficient implementation of a pure functional language (say Haskell without MVars), what really is the difference between state and cache?

I know this is overly philosophical, and in practical scenarios we readily (although not always unambiguously) differentiate between "cache" and "state", but the point about transitions and that being a major source of bugs still stands.

eru · on March 10, 2021

> In the efficient implementation of a pure functional language (say Haskell without MVars), what really is the difference between state and cache?

If you want to unify state and cache, you might want to go down a different route:

Think of log-based filesystems (or a log-based database).

Instead of defining your operations in terms of state, you define them as pure functions of the log.

So your log is full of operations. Writing just means appending a symbolic operation like `write(key, value)` to your log.

And you define the result of `read(key)`: scan backwards through the log until you hit the last instance of `write(key, value)` and then return that `value`.

Now state means: compact your log by replacing a swath of `write` log entries with one big `snapshot` operation that encompasses many key/value pairs.

Alternatively, you can also define state to mean caching your `read` operations.

In this approach, it's no coincidence that the log is a data structure that has a linear shape: the evolution of state over time is also linear.

(With some cleverness you can replace the linear structure with eg a DAG; and then also think about how you merge divergent states.)
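For illustration, here is a minimal Python sketch of the log-based view described above. Only `write`, `read` and the idea of a `snapshot` entry come from the comment; the class name `LogStore`, the method name `compact` and the tuple layout of log entries are assumptions made up for this example.

```python
class LogStore:
    """Append-only log; reads are computed purely from the log."""

    def __init__(self):
        # Entries are ("write", key, value) or ("snapshot", dict).
        self.log = []

    def write(self, key, value):
        # Writing just appends a symbolic operation to the log.
        self.log.append(("write", key, value))

    def read(self, key):
        # Scan backwards until the most recent entry that mentions the key.
        for entry in reversed(self.log):
            if entry[0] == "write" and entry[1] == key:
                return entry[2]
            if entry[0] == "snapshot" and key in entry[1]:
                return entry[1][key]
        raise KeyError(key)

    def compact(self):
        # "State": replace the whole log with one snapshot entry.
        snapshot = {}
        for entry in self.log:  # replay oldest to newest
            if entry[0] == "write":
                snapshot[entry[1]] = entry[2]
            else:
                snapshot.update(entry[1])
        self.log = [("snapshot", snapshot)]


store = LogStore()
store.write("a", 1)
store.write("a", 2)
store.write("b", 3)
before = store.read("a")   # answered by scanning three write entries
store.compact()
after = store.read("a")    # answered from the single snapshot entry
```

Compaction changes how much of the log a read has to scan, but not what the read returns, which is the sense in which the snapshot behaves like a cache of the log.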

saganus · on March 10, 2021

Here goes the obligatory

> There are only two hard things in Computer Science...

jholman · on March 10, 2021

I can never remember what they are, though. To avoid this problem, I think I wrote them down on a post-it, but I had too many post-its on my desk so I got rid of them all, and now I can't remember.

llarsson · on March 10, 2021

The two most difficult ones are naming things, cache invalidation, and off-by-one errors. HTH. ;)

Xelbair · on March 10, 2021

I prefer the ordered version.

three most difficult things in CS:

2) Naming Things

1) Cache Invalidation

4) off by one errors

3) Concurrency

csunbird · on March 10, 2021

Another version, about distributed systems:

There are only two hard problems in distributed systems:

2. Exactly-once delivery

1. Guaranteed order of messages

2. Exactly-once delivery

Viliam1234 · on March 10, 2021

two most difficult things in CS:

0) naming things

1) cache invalidation

42) asynchronous callbacks

2) off by one errors

zaphirplane · on March 10, 2021

They are likely making a joke about caching the answer

samb1729 · on March 10, 2021

What does HTH mean?

mattacular · on March 10, 2021

"Hope this helps" (sometimes used sarcastically)

sgn · on March 10, 2021

Hope That Helps, HTH.

kzrdude · on March 10, 2021

"Hope this helps"

I see it with HAND "have a nice day" too

_puk · on March 10, 2021

I've always read it as "Happy To Help".

I see that's wrong, but ignorance has made the internet seem just that little bit warmer all these years!

gwd · on March 10, 2021

I think it could probably mean "Happy to help" if said in response to a thanks of some sort. Saying it before someone has said thanks is a bit presumptuous. :-)

bryanrasmussen · on March 10, 2021

this reminds me I was dumpster diving at a place with lots of post-it notes and there was one that said 2HCS => whatchamacallit, CI!

what that you?

on edit: I'm going to let that 'what that you' stand because one of the hardest things about HN posts is grammatical correctitude.

_zhqs · on March 10, 2021

> wrote them down on a post-it

You wrote it on local media and kept it on-premises?

Cloud is the new thing, I hear.

ampdepolymerase · on March 10, 2021

The Post-it was but a cache.

mrathys · on March 10, 2021

I would consider the Post-it as persistent storage or the backup. Your memory would be the cache :D

taf2 · on March 10, 2021

Yeah but the speed.

AbraKdabra · on March 10, 2021

Oh man...

brundolf · on March 10, 2021

It's amazing how often exploits come down to optimizations. The general form being "the domain logic is X, and is secure, but we faked around it in this one case to make it faster, and it turns out we made a bad assumption while doing so". Meltdown fits this description too.

saagarjha · on March 10, 2021

Optimizations come from making assumptions, and bugs come from mistaken assumptions.

gfiorav · on March 11, 2021

Same with bugs. Hate to be the mantra guy but:

1. Make it work

2. Make it right

3. Make it fast (make sure you need to) a.k.a. optimize

4. Make it scale

Agentlien · on March 10, 2021

I am really fascinated by the responses to this comment. So many people exclaiming how many issues are caused by caches. In ten years as a fulltime programmer the only cache issues I've seen are cache misses. It probably has to do with one's field. I'm a game developer mainly dealing with graphics programming.

ironmagma · on March 10, 2021

The key problem (as I understand it) is that updating a cache properly requires knowing the exact graph of relations that an entry in the cache has to other entries. So that when that entry changes, you can propagate that change throughout the cache to other concerned entries which need to be recomputed. But knowing that exact graph is too complex a task to be trivial, it seems in this case. Basically it sounds like the non-visual version of rerendering UI when a state changes, which is hard enough even with visual feedback.
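A rough Python sketch of that propagation, assuming the dependency graph is known and recorded explicitly (which is exactly the hard part in practice). The class `DependencyCache` and the keys are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


class DependencyCache:
    def __init__(self):
        self.values = {}
        # Maps a key to the set of keys that were derived from it.
        self.dependents = defaultdict(set)

    def put(self, key, value, depends_on=()):
        self.values[key] = value
        for source in depends_on:
            self.dependents[source].add(key)

    def invalidate(self, key):
        # Drop the entry and, transitively, everything derived from it.
        stack = [key]
        seen = set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            self.values.pop(current, None)
            stack.extend(self.dependents.get(current, ()))
        return seen


cache = DependencyCache()
cache.put("user", {"name": "ann"})
cache.put("profile_html", "<p>ann</p>", depends_on=["user"])
cache.put("page", "<html>...</html>", depends_on=["profile_html"])
cache.put("unrelated", 42)
dropped = cache.invalidate("user")
```

If any edge is missing from `dependents`, the corresponding entry silently goes stale, which is the failure mode the comment describes.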

josefx · on March 10, 2021

A lot of threading issues are also cache related. Forget to properly mark access to shared variables and suddenly every thread/CPU core ends up with its own locally cached version of it.

Agentlien · on March 10, 2021

Yes, but this is such a well understood danger that I've never really been bitten by it in practice.

Along the same lines a lot of GPU programming tutorials warn of inconsistencies between threads and it has never been a problem since I just assume I cannot rely on consistency or order of execution, seeing each thread as separate and independent.

pabs3 · on March 10, 2021

I wish folks who announce security issues would link to the patches for the issues they are announcing. This should become standard practice.

segfaultbuserr · on March 9, 2021

Difficult problems in programming:

(1) cache invalidation

(2) off-by-one errors

mattiasfestin · on March 9, 2021

The classical three problems.

xarope · on March 9, 2021

I thought the two hardest problems were:

1) naming

2) cache invalidation

...

3) off-by-one errors

Ancapistani · on March 10, 2021

I'm pretty sure it's:

1) naming

4) concurr2) cache invalidationency

...

3) off-by-one errors

hesdeadjim · on March 10, 2021

Love this.

edgyquant · on March 10, 2021

You forgot

0) Race consegmentation fault (core dumped)

(I know I was ninja’d but didn’t see until after)

account42 · on March 10, 2021

FWIW, your version has the nice touch of also introducing a 0th item.

waheoo · on March 9, 2021

I feel like we should have solved the naming problem as an industry by now.

Alas.

layer8 · on March 10, 2021

It’s actually the hardest one of the three, being outside the grasp of formal methods.

kreeben · on March 10, 2021

To solve this problem we would need to first understand the human mind, how it stores data, how it does computation, and how it interacts with names. So we would need the same set of information that we would need for creating AGI. A solution is probably only a couple of months/decades away.

waheoo · on March 10, 2021

I really have to disagree, why can't you devise a formal method?

- a good name should be descriptive

- avoid being overly clever, call a spade a spade

- don't optimise for generalisation, naming things is a time to be specific

- aim for short but not at the expense of losing context

- avoid redundancy in naming of things nearby, leverage spatial context

- avoid qualifiers or type information where possible - type should be obvious from context and use, if it's not qualify or refactor

Anything else?

layer8 · on March 11, 2021

Those aren’t formal definitions. “Formal” means, at the very least, that the specification is done in a formal language, and usually that conformance to the specification can be checked mechanically, that is, by a computer.

https://en.m.wikipedia.org/wiki/Formal_methods

bouncycastle · on March 10, 2021

shouldn't your list start from 0?

gdavisson · on March 10, 2021

There are three kinds of programmers:

  1) Those who number lists starting at 1.
  1) Those who number lists starting at 0.
  2.5) Stan Kelly-Bootle, who proposed a compromise.

edgyquant · on March 10, 2021

That’s the joke

eru · on March 10, 2021

Fortunately, many off-by-one errors can be caught with more ergonomic tooling.

For the simplest example: compare the old C-style for loop vs a Python style for-each loop.

Gene_Parmesan · on March 10, 2021

I don't find I ever make off-by-one errors with simple collection iteration; at some point "i < len" becomes tattooed on your brain stem. The off-by-one errors I tend to make are related to implementation details of certain data structures or algs. Really, I would describe them more as "thinking at the margins can be challenging." Correctly handling doubly linked lists, that sort of thing.

Oh, and slicing. I will never get Python slicing right the first time. The fact that the range is [begin, end) is just never the way I expect it to work.

Ygg2 · on March 10, 2021

But slicing 0..len and for (i=0; i< len) are literally the same thing.

In [0, len) ')' means less than. As in 0≤ x< len.
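The equivalence is easy to check in Python. This is only a small illustration with a made-up list `xs`; the variable names are not from the thread.

```python
xs = ["a", "b", "c", "d", "e"]
n = len(xs)

# The C-style loop `for (i = 0; i < n; i++)` visits exactly range(0, n).
loop_indices = [i for i in range(0, n)]

# Slicing uses the same half-open interval [begin, end).
whole = xs[0:n]
middle = xs[1:3]  # indices 1 and 2; index 3 is excluded

# Two properties that fall out of [begin, end):
length_is_difference = len(xs[1:4]) == 4 - 1
adjacent_slices_tile = xs[0:2] + xs[2:n] == xs
```

With a half-open range the length is simply `end - begin`, and adjacent slices share a boundary index without overlapping.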

user-the-name · on March 10, 2021

(3) remembering the joke

echelon · on March 9, 2021

Per your downvotes - I used to hate jokes on Hacker News and downvote them when I saw them, but I've become more ambivalent. They're a way of amicably sharing culture and experiences with other engineers that transcend any differences in age, gender, race, background, etc.

The formulation of this joke I tend to see is,

The two hardest problems in programming:

(1) cache invalidation

(2) appropriately naming things

(3) off-by-one errors

Twisol · on March 9, 2021

It's barely even a joke to me anymore -- it's just too real for me to laugh.

(Cache invalidation is essentially the same problem as managing mutable state -- "Out of the Tar Pit" frames mutable state as either essential or incidental, the latter being rederivable in principle from essential state. Incidental mutable state is no more and no less than a cache, and usually one with an informal and undocumented invalidation policy.)

(And naming things has a very real technical counterpart in addressing, which comes up obviously in networking, but you can also see its shadows in quite a lot of concerns around architecture and modularity.)

imoverclocked · on March 9, 2021

Humor is often the most efficient way to communicate/accept the truth.

pablius · on March 10, 2021

To make the truth seem like an acceptable parallel universe, and then join it.

JBiserkov · on March 10, 2021

The two hardest problems in programming:

(1) cache invalidation

(3) off-by-one errors

(2) appropriately naming things

(4) parallel execution [leading to race conditions / ordering bugs]

lifeisstillgood · on March 10, 2021

I think you win a bad in-joke award - the first annual Turing-Dad joke award.

rileymat2 · on March 10, 2021

Minor quibble, it should be the three hardest problems.

bluejekyll · on March 10, 2021

It should be, but the increments were run in parallel on nonvolatile memory.

scubbo · on March 10, 2021

The two hardest problems in computer science are:

1) Naming

3) Cache Invalidation

2) Off-by-one errors

3) In-order once-only delivery of distributed messages

And an almost fanatical devotion to the Pope

froh · on March 10, 2021

I love it told like this: https://news.ycombinator.com/item?id=26406351

1) naming

4) concurr2) cache invalidation

ency

3) off-by-one errors

nicklaf · on March 10, 2021

(5) feature creep

slavik81 · on March 10, 2021

They had this bug before in other code unrelated to caching. To me, that suggests a deeper root.

de6u99er · on March 10, 2021

Strange. The guy who fixed the issue works at Microsoft, but uses his gmx email for Github.

enneff · on March 10, 2021

And the guy who announced the new Git release works for Google, but uses his pobox.com email for Git development.

skzv · on March 10, 2021

Yes, actually, Googlers are encouraged to use their personal Github accounts.

enneff · on March 10, 2021

That is true but the git release has nothing to do with GitHub accounts.

saagarjha · on March 10, 2021

But probably their work email when doing things on company time?

enneff · on March 10, 2021

Nope. This is an example of someone working on company time using their personal email.

saagarjha · on March 11, 2021

Right, which is (AFAIK) not usually recommended except for side projects or ones where there is already an existing relationship under a personal email address.

eru · on March 10, 2021

I think you just got a glimpse of a vast sea of internal policy and compliance issues.

balp · on March 10, 2021

Or that the opensource hobby is much more longlived than something as temporary as an employer.

skrebbel · on March 10, 2021

Wait, there are people who use real, in-use email addresses in publicly hosted git repos? I mean, it's likely his spamcatcher address, no?

lixtra · on March 10, 2021

> (Surprise, the root cause is a cache)

Couldn’t it just as well be attributed to improper file path normalization? If we had only lower case ASCII file systems it would not have caused a problem.

yxhuvud · on March 10, 2021

Things can have more than one cause that act together.

kazinator · on March 10, 2021

That could be any Git repository.

Have you seen the mayhem that some of mine cause when you clone them and then type ./configure && make, like you have been socially engineered into doing?

pontifier · on March 10, 2021

It doesn't even have to be there... The main reasons to clone a repo are because you're about to compile and run the code there, or you already have and need to fix something.

I don't personally audit all the code I run, but I hope someone is doing it. That being said, source code being public is much better than the alternative of just downloading binaries from who knows where.

I don't trust anything absolutely, and I don't see a way past it.

minitech · on March 10, 2021

There is a huge difference between “clone a repo” and “clone a repo and run code from it”.

kazinator · on March 10, 2021

In spite of my tongue-in-cheek statement, I get it.

It's huge in the context of non-programming uses of Git. If some people are just sharing some text documents with Git, then it's a big deal.

This is likely on the rise.

E.g. if you look at a site like Github, there is a lot of non-code content in it. Some people stash that content, and other people believe that content to just be harmless files that will never perpetrate an exploit just from being cloned.

minitech · on March 10, 2021

It’s a big deal regardless of whether documents or code are being stored. Cloning a repo should not open you up to RCE.

doctor_eval · on March 10, 2021

Agreed, I frequently clone repos so I can look at the code in a terminal with grep, with no intention of ever building it.

jtsiskin · on March 10, 2021

Technically yes, but I can’t think of the last time I cloned a repo without then running code from it...

minitech · on March 10, 2021

Well, I clone repos to inspect code all the time, and when I run code, it’s usually not with the same permissions as the corresponding `git clone`. Maybe I should be better about sandboxing Git…

friendzis · on March 10, 2021

Depends on how you define "running code".

  1. Download container description (Dockerfile)
  2. Upon image build it "compiles things" (e.g. processes/assembles javascript)
  3. Build fails, because it pulls architecture incompatible library (or does not pull architecture mandated library)
  4. Fix build scripts, rebuild container image
  5. Verify container
  6. Pull repo
  7. Reproduce changes, commit
  8. Push

Nothing apart from clone-edit-push happens on the repo. The code can be executed on a remote, hardened, isolated system. With the proliferation of containers I guess this scenario will become more and more common among ops people.

0x0 · on March 10, 2021

Any html/js web frontend project that runs in a browser?

IshKebab · on March 10, 2021

Sure I'll just `npm install`.... damnit! Hacked again.

remram · on March 10, 2021

For a while I tried to only run untrusted builds in Docker containers, like doing `docker run -v $PWD:/src node npm install`, but IDEs are not really configured to deal with this. Even my Vim has ALE and would just run node_modules/.bin/tsserver on my machine, which could be anything. Why aren't our tools concerned with this at all?

hctaw · on March 10, 2021

Because at a certain point you shrug and blame the user for downloading sketchy code and executing it.

bvendor · on March 10, 2021

I get that you are not completely serious, but before the cmake/meson/... people jump on this:

If ./configure is checked in as part of the official repository of a moderately well known project, I doubt any committer would be stupid enough to insert a backdoor into ./configure or the Makefiles.

What can happen if an apostate project is not on GitHub: Some (usually several) faithful persons decide to correct the situation and put multiple unofficial mirrors on GitHub, and other faithful people clone from a random one of these.

In that case however, they get what they deserve.

vesinisa · on March 9, 2021

Reminds me of another git vulnerability from 2014 on case-insensitive filesystems: https://github.blog/2014-12-18-vulnerability-announced-updat...

mbar84 · on March 10, 2021

Considering how much I `pip install garbage`, this is perhaps not so critical for me.

fortran77 · on March 10, 2021

Isn't the bug really on case-insensitive file-systems that allow symlinks?

jesboat · on March 10, 2021

No. Why would it be?

floatingatoll · on March 9, 2021

> if Git is configured globally to apply delay-capable clean/smudge filters (such as Git LFS)

What is the simple test for whether this is the case or not?

Is this a default-on scenario?

matheust · on March 10, 2021

> What is the simple test for whether this is the case or not?

As suggested in GitHub's announcement post[1], you can test this with the following:

`git config --show-scope --get-regexp 'filter\..*\.process'` (replace the single quotes by double quotes on Windows Command Prompt)

> Is this a default-on scenario?

On Windows yes, because Git-for-Windows configures Git LFS by default.

[1]: https://github.blog/2021-03-09-git-clone-vulnerability-annou...
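If you want to script that check, a minimal Python sketch could wrap the same command. The helper names are hypothetical, the sample line is made-up output of the kind a Git LFS install might produce, and the parser assumes each output line has the shape `scope key value` separated by whitespace.

```python
import subprocess

# The command quoted in the comment above, split into arguments.
CHECK_CMD = ["git", "config", "--show-scope", "--get-regexp", r"filter\..*\.process"]


def parse_filter_lines(output):
    """Turn `scope key value` lines into (scope, key, value) tuples."""
    entries = []
    for line in output.splitlines():
        parts = line.split(None, 2)  # scope, key, rest of the line as value
        if len(parts) == 3:
            entries.append(tuple(parts))
    return entries


def configured_process_filters():
    # git exits non-zero when nothing matches, so check=True is not used.
    result = subprocess.run(CHECK_CMD, capture_output=True, text=True)
    return parse_filter_lines(result.stdout)


# Hypothetical sample output, used here instead of calling git.
sample = "global\tfilter.lfs.process git-lfs filter-process\n"
parsed = parse_filter_lines(sample)
```

An empty result from `configured_process_filters()` would mean no `filter.*.process` entry was found in any scope that git reads from the current directory.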

dragonwriter · on March 10, 2021

Doesn't Git-for-Windows default configure symbolic link support off, though? Or does this exploit work even in that case as long as the underlying file system supports symlinks?

matheust · on March 10, 2021

Git-for-Windows may turn symlink support on by default under some specific circumstances. As the repo's wiki [1] says:

Short version: there is no exact equivalent for POSIX symlinks on Windows, and the closest thing is unavailable for non-admins by default unless Developer Mode is enabled and a relatively recent Windows 10 version is used. Therefore, symlink emulation support is only turned on by default when that scenario is detected.

[1]: https://github.com/git-for-windows/git/wiki/Symbolic-Links

goatinaboat · on March 9, 2021

Is this a default-on scenario?

No, LFS is something you would have to explicitly enable, however it is pretty common to do so if you want to store binary blobs in Git.

ExtraE · on March 10, 2021

Git for windows has LFS on by default.

akdor1154 · on March 10, 2021

Tangential.. are there bug bounties available for vulns in open source projects?

SuchAnonMuchWow · on March 10, 2021

A lot of open source projects owned by large corporations do.

More independent projects sometimes do it too, for example curl.

jolmg · on March 9, 2021

> This vulnerability affects platforms with case-insensitive filesystems...

What kind of platforms use case-insensitive filesystems?

Foxboron · on March 9, 2021

Linux these days actually. You can make ext4 case insensitive.

https://www.collabora.com/news-and-blog/blog/2020/08/27/usin...

Note: I also learned this today. Had no clue.

suifbwish · on March 11, 2021

Why the heck would anyone want that? That’s arguably worse than adding spaces in file names

rhinoceraptor · on March 9, 2021

MacOS and Windows

cma · on March 9, 2021

In windows, the underlying ntfs is still case sensitive, and that gets made use of with the WSL 1.0 stuff.

chungy · on March 9, 2021

Before Windows XP, any application could open a file with a case-sensitive flag to request the operating system to not do any case folding. Starting with XP, the same feature exists but requires a registry key set (and a reboot) to instruct the kernel to allow case-sensitive operations.

Starting with Windows 10, the aforementioned key still works, but there's also a per-directory case-sensitive flag that forces all DOS and Windows programs to have case-sensitive operations unconditionally. This is used to great effect in both WSL1 and Cygwin.

jolmg · on March 9, 2021

I guess it shows that I haven't really used either of those in a long time.

Jtsummers · on March 9, 2021

macOS has defaulted to be case insensitive largely due to historical and perhaps usability reasons. You can opt to make it case sensitive (and I do, which broke Steam for several years but that also freed my time).

emodendroket · on March 10, 2021

When necessary you can make an auto-expanding volume that's case-sensitive and leave your host FS alone. I have not found that I really want to have differently-cased but otherwise identical filenames in the real world at any point though.

dhosek · on March 10, 2021

Ages ago, I heard from co-workers at a company that I had left that there was an issue because of some file-naming in a PHP application that they were trying to run locally on a Mac. There was foo.php which was the interface and Foo.php which had a class definition in it.

What idiot would name files like that? I said.

You, they answered.

I don't do that sort of thing anymore.

earthboundkid · on March 10, 2021

A bug that affects my teams once every few years is a developer will create a file named "A.txt" check it into Git, realize it should be "a.txt" and rename it, and then basically everything will shit the bed and you waste a day figuring out why nothing is working.

A related bug is a developer will make a webpage called /a/ but link to /A/ and then the link will be broken in production. At this point, I have seen this same bug enough times to be able to fix it reasonably quickly, but it definitely wastes time for the team.

emodendroket · on March 10, 2021

Yeah, my current company encourages development in a case-sensitive volume and I assume this is why. But this is more an issue of your development environment working differently than prod than a problem with the notion of case insensitivity per se.

jesboat · on March 10, 2021

We added a commit hook to block case conflicts.
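The core of such a check can be sketched in a few lines of Python. This is not the hook mentioned above, just an assumed approach: fold every path prefix with `str.casefold()` and report prefixes that collide. Real filesystems apply their own folding rules, so this is an approximation, and the file list is hypothetical.

```python
from collections import defaultdict


def find_case_conflicts(paths):
    """Return groups of path prefixes that differ only by case."""
    spellings = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        # Check directories as well as files by folding every prefix.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            spellings[prefix.casefold()].add(prefix)
    return sorted(sorted(s) for s in spellings.values() if len(s) > 1)


# Hypothetical file list; in a hook it could come from `git ls-files`.
tracked = [
    "README.md",
    "src/foo.php",
    "src/Foo.php",
    "debian/rules",
    "Debian/Debhelper.pm",
]
conflicts = find_case_conflicts(tracked)
```

A hook built on this would reject the commit whenever `conflicts` is non-empty.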

trollian · on March 10, 2021

But git is largely used by programmers.

Operyl · on March 9, 2021

macOS, to name one. It appears NTFS is also vulnerable according to the posting.

pcthrowaway · on March 9, 2021

Ashamed to admit (as an OSX user) that I didn't even realize the FS was case-insensitive (having migrated from years of Linux usage to a non-Linux desktop). It does a good job of hiding this from the user (filenames are still listed with cases, and bash autocompletion completes to the correct case as well)

caymanjim · on March 9, 2021

MacOS by default uses a "case-preserving case-insensitive" filesystem, so you can create files with mixed case, but you can't create two files with the same name and different case. It's one of MacOS's more-egregious crimes against Unix. Fortunately it doesn't manifest that often, but it rears its head often enough to be a problem.

leephillips · on March 9, 2021

It may be a crime, but is the result of a set of compromises in the design of the OSX filesystem, which had to work with a BSD variant while also being compatible with pre-OSX days. I think it’s one thing they actually did an elegant job with.

EDIT: This document describes some of the challenges: https://www.usenix.org/legacy/publications/library/proceedin...

EamonnMR · on March 10, 2021

This is the first article I've found that does a decent job of explaining what a resource fork really is.

_lqaf · on March 9, 2021

> It's one of MacOS's more-egregious crimes against Unix.

Nah. Using a file system means putting up with its semantics. HFS+ was case-insensitive; they were deploying an upgrade to millions of existing filesystems.

If you mount, say, an NFS volume, MacOS does the expected thing.

dreamcompiler · on March 10, 2021

The fact that Linux is case-sensitive is the egregious crime. It's a nasty holdover from circa-1970 Unix when case-folding was an expensive operation.

akersten · on March 10, 2021

Case-sensitivity is not a "nasty holdover", it is a good design decision that continues to be proven correct (case in point, this bugfix for case-insensitive filesystems).

Why would you introduce complexity into the filesystem to try to normalize file names when you can simply, not? I mean, have you _seen_ the mess that is Unicode normalization? Hundreds of different glyphs or whatever that are all considered equivalent, but are actually composed of different bytes. The filesystem should try to make sense of all that, and consider them equivalent paths?

Even if you say "well, just capitalization, not Unicode normalization," there's the whole German letter ẞ => ss (or is it ß?) and similar friends like the Turkish dotted I that have popped up as articles on HN. Absolutely glad Linux filesystems by and large do not attempt to take that on, and treat paths as a bucket of bytes instead.

All for what benefit - so you can type File.txt in the terminal and have the OS find file.txt? That is much more appropriate for the Application layer to resolve, rather than the filesystem.

Bugs like this come from over-engineering. Filesystems should be simple, and follow the principle of least surprise.
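The case-mapping oddities mentioned above can be seen with Python's built-in string methods, which follow the Unicode default (locale-independent) mappings. This is only an illustration; a given filesystem may fold case by different rules.

```python
sharp_s = "\u00df"           # ß, LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"   # ẞ, LATIN CAPITAL LETTER SHARP S
dotted_capital_i = "\u0130"  # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE

upper_of_sharp_s = sharp_s.upper()            # one character becomes two
lower_of_capital = capital_sharp_s.lower()    # maps to the small sharp s
round_trip = sharp_s.upper().lower()          # does not return to the original
lower_of_dotted_i = dotted_capital_i.lower()  # "i" plus a combining dot above
```

Upper-casing and then lower-casing is not a round trip, and the length of a name can change under case mapping, so "just compare case-insensitively" already requires picking one of several possible rules.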

dreamcompiler · on March 10, 2021

It bugs me to see foo.c and Foo.c as separate files in a directory listing. I like the fact that MacOS doesn't allow this situation to ever happen. Not taking on that problem means it's left to the user to figure out what's going on when similar glyphs occur.

Ancapistani · on March 10, 2021

> Fortunately it doesn't manifest that often, but it rears its head often enough to be a problem.

IIRC, one place where it does rear its head is when a file is renamed in a git commit to a value that downcases to the same value as the prior name. For example `Foo.txt`->`foo.txt`.

I have `core.ignorecase = true` in my `.gitconfig` for this very reason.

geofft · on March 10, 2021

The extraordinarily frustrating case is where you're working on a repository that has multiple files that differ only by case. Git will check out one of them, then overwrite it with the other.

The Linux kernel is one of them: several of the headers that get installed to /usr/include/linux/netfilter have conflicts on a case-insensitive filesystem. https://github.com/torvalds/linux/tree/master/include/uapi/l...

Debhelper used to be one of those until I convinced them to change it: they had a Debian/ directory for the Perl module Debian::Debhelper as well as a debian/ directory for the packaging metadata. https://bugs.debian.org/873043

(I suspect I'm a little unusual in wanting to have checkouts of Linux and Debhelper on my Mac homedir.)

ComputerGuru · on March 9, 2021

That’s the same as Windows, but Windows enables making directories case sensitive on a directory by directory basis.

shp0ngle · on March 11, 2021

wasn't this fixed with their new Apple FS?

Wait I'll try

...nope, after `touch makefile` and `touch Makefile`, I still see just `makefile`. OK
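The same experiment can be scripted. A small Python sketch, assuming it is enough to create a lowercase-named file in a scratch directory and look it up by the uppercase name; the function name is made up for this example.

```python
import os
import tempfile


def is_case_insensitive(directory):
    """Create a lowercase-named file and look it up by its uppercase name."""
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        lower = os.path.join(scratch, "probe")
        upper = os.path.join(scratch, "PROBE")
        with open(lower, "w"):
            pass  # create an empty file
        return os.path.exists(upper)


result = is_case_insensitive(tempfile.gettempdir())
```

The answer depends on the volume that holds `directory`, so it can differ between two directories on the same machine.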

Cloudef · on March 9, 2021

OSX has an even more annoying problem: it decomposes unicode: https://stackoverflow.com/questions/5581857/git-and-the-umla...

Many fun times trying to copy/move/remove a file and not being able to do so because the input and name stored on fs is actually different bytewise...

Seems like linux has the only sane filesystems not trying to mangle paths at all.

jrochkind1 · on March 9, 2021

If linux doesn't normalize unicode at all, can you have two different files that look like they are named `josé`, depending on if the é is decomposed or not?

Cloudef · on March 9, 2021

Yes, for Linux filenames are just bytes. Apart from / and NUL characters it doesn't care what you give it, nor does it mangle them in any way; it's the only sane thing to do.
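To make the `josé` case concrete, here is a short Python illustration using the standard `unicodedata` module. It only shows that the two spellings are different byte sequences; it does not touch a filesystem.

```python
import unicodedata

name = "jos\u00e9"  # é as a single code point (U+00E9)
composed = unicodedata.normalize("NFC", name)
decomposed = unicodedata.normalize("NFD", name)  # "e" followed by U+0301

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# The strings render the same but compare unequal.
same_text = composed == decomposed
# Normalizing both sides to the same form makes them comparable again.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

A filesystem that treats names as raw bytes sees two distinct names here, while one that normalizes on lookup sees one.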

Someone · on March 10, 2021

The only sane thing to do if you don’t care about how humans (as opposed to nerds) think.

In the end, the file system doesn’t exist in isolation, it is there to support users, and most of them won’t care how many bytes “é” takes to store.

Unix, by not even defining the way to interpret the bytes of file names (one can’t even assume that names consisting of only bytes that correspond to ASCII letters and digits should be interpreted as ASCII) makes it impossible to show file names to users. That’s insane.

jolmg · on March 16, 2021

> The only sane thing to do if you don’t care about how humans (as opposed to nerds) think.

At the FS layer, I think that's better. Makes things simpler for programs. For non-techie humans, unicode can be normalized at upper levels, like the GUI file manager or toolkit library that does save dialogs, etc.

That's if humans being confused because of lack of normalization of unicode is a real practical issue and not just something that can happen but never does.

chungy · on March 9, 2021

If you use a file system that doesn't normalize lookups, yes you can.

oars · on March 9, 2021

I'm in the same boat - used MacOS for the past 6 years, including the terminal nearly every day! :d

jnwatson · on March 9, 2021

FYI, on MacOS, it is a property of the partition, so you can reformat and have a case-sensitive filesystem. Applications may subtly break if they weren't tested on such a filesystem, but I had used one for several years without too many issues.

jamesog · on March 10, 2021

Since the introduction of APFS I've taken to creating a new APFS volume formatted as case-sensitive, and put my git repositories there.

This has mostly been useful for working on shared repositories where, say, a Linux user (or other user on a case-sensitive filesystem) pushes two branches, say `feature/foo` and `feature/Foo` which works fine for them, but on a case-insensitive filesystem, git gets very upset.

jameshart · on March 9, 2021

Once spent way longer than I would have liked trying to debug an iOS app issue that I couldn’t reproduce in the emulator, because iOS devices have a case-sensitive FS, macOS devices typically don’t, and the emulator was subject to the macOS file system’s conventions.

Leaky abstractions all the way down.

zo1 · on March 9, 2021

Windows.

It's a notable problem with git + Windows that has gotten better over time but still leads to a lot of WTF moments. For many, this is the first time they hear that Windows' filesystem is case insensitive.

rightbyte · on March 9, 2021

It is strange I haven't noticed earlier. Maybe it is just so unnatural to name files the same with different cases that I haven't tried.

swiley · on March 9, 2021

Sometimes it feels like corporate IT creates more security problems than it solves: Windows as development machines, SolarWinds, Fucking McAfee malware on everything.

chungy · on March 9, 2021

ZFS has a case-insensitive option.

mateo411 · on March 10, 2021

I guess I'll have to stop running

$ sudo git clone ...

jrockway · on March 10, 2021

I don't think that smugly not running as root saves normal users; while malware running as your user can't trash your laptop, it can get your Google cookie and read and send emails as you, spend your money, view your private photos, etc.

fulafel · on March 10, 2021

And it can run sudo as your user after you warm it up. Or use any number of frequently disclosed OS vulnerabilities for local privilege escalation.

Wowfunhappy · on March 10, 2021

> And it can run sudo as your user after you warm it up.

How is it getting my root password?

thelopa · on March 10, 2021

Sudo persists authorization for a short period of time so that you don’t need to re-authorize for back-to-back commands.

Also, once they get ACE they can modify your bashrc to make sudo an alias for “sudo rm -rf / ;” or all sorts of other evil trickery.

VMG · on March 10, 2021

it could also alias sudo to some other command

JetSpiegel · on March 11, 2021

Like "doas".

edgyquant · on March 10, 2021

I set sudo to NOPASSWD, so it doesn’t need one for my workstation

selfhoster11 · on March 10, 2021

Relevant XKCD: https://xkcd.com/1200/

yjftsjthsd-h · on March 10, 2021

I mean, there are... not-totally-unreasonable workflows that do clones as root.

Edit: although I am struggling to think of one that clones from an untrusted source.

krick · on March 10, 2021

> not-totally-unreasonable workflows that do clones as root

Uh... really? Like what?

yjftsjthsd-h · on March 10, 2021

etckeeper and friends (I have a git checkout in /etc/nixos on nixos machines), portage sync on funtoo, pulling ports tree or even system source on a BSD, grabbing setup scripts during install of Arch before a non-root user exists

t0astbread · on March 10, 2021

So, basically workflows where the cloned code gets run as root without further inspection anyways.

jordanmorgan10 · on March 10, 2021

Is nothing sacred anymore

toomim · on March 10, 2021

What's an easy way to fix the default git installation on OSX?

jhugo · on March 10, 2021

Wait for a macOS security update that includes it. If you don’t want to wait, macports and homebrew will both be patched much faster.

btgeekboy · on March 10, 2021

Isn't git distributed with the Xcode command line tools?

saagarjha · on March 10, 2021

Yes, so you’ll probably get a new version when 12.5 drops (maybe later this month?)

mdasen · on March 10, 2021

I did a `brew install git` and then deleted /Library/Developer/CommandLineTools/usr/bin/git. You can't delete /usr/bin/git even with sudo (System Integrity Protection).

After installing git via brew and removing the one in CommandLineTools, /usr/bin/git is showing the latest version.

    me@local % git --version
    git version 2.30.2

I don't know if this is recommended or if it will have negative consequences that I don't know about, but it seemed like the way I could accomplish it. Given that /usr/bin/git is working with the homebrew installed git, I'm hopeful that everything will be good.

Zambyte · on March 11, 2021

I don't use Mac, so I can't speak to how the changes you made will affect your system. For future reference however, you should know that binaries are searched on your $PATH in order. Instead of deleting anything, you could have edited your $PATH variable so that the directory that brew stores binaries in is searched before other locations.
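A rough Python sketch of that lookup rule, in case it helps: the helper `find_on_path` is hypothetical and only approximates what a shell does (it ignores aliases, shell functions, and the shell's command hash). The directory `/opt/homebrew/bin` in the usage below is an assumption about where brew puts binaries; on Intel Macs it is usually `/usr/local/bin`.

```python
import os


def find_on_path(command, path):
    """Return the first executable called `command` in `path`, or None.

    `path` is a string of directories joined by os.pathsep, like $PATH.
    Directories are tried left to right and the first match wins.
    """
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    # Assumed brew location; putting it first makes its git shadow the others.
    preferred = "/opt/homebrew/bin" + os.pathsep + current
    print(find_on_path("git", current))
    print(find_on_path("git", preferred))
```

The shell equivalent is to prepend the brew directory to `PATH` in your shell profile, which leaves the system copy untouched and is easy to undo.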

jonahx · on March 10, 2021

brew install git

jfrunyon · on March 9, 2021

Git: Malicious repositories can execute remote code while cloning on a case-insensitive filesystem with symlinks

FTFY

caslon · on March 9, 2021

> This vulnerability affects platforms with case-insensitive filesystems with support for symbolic links, when certain clean/smudge filters are configured globally (e.g. Git LFS).

Can we get the title changed to "on macOS and Windows?"

I was worried for a second, but this is meaningless.

jonas21 · on March 9, 2021

Why is it meaningless? Lots of people use Git on MacOS and Windows. I'd even be willing to bet that there are more people using Git on MacOS and Windows than Linux.

SilverRed · on March 9, 2021

And use git LFS and cloned a malicious repo? This bug has probably not affected a single user.

jonas21 · on March 9, 2021

Isn't the whole point of announcing security patches so that people can update before they're exploited?

SilverRed · on March 10, 2021

Sure, I'm just saying this isn't a big deal and it's likely no one was hit.

aiisjustanif · on March 10, 2021

Just because there is not an active threat doesn’t make it any less of a vulnerability to be exploited.

lucb1e · on March 10, 2021

It does if nobody uses it. You can't exploit Apache 2.4.2 proxy bugs if nobody runs Apache 2.4.2 in proxy mode.

Of course, you should still update because you're a config change away from being vulnerable, but GP's point of it not being a big deal if (and only if, don't know if that's correct) nobody uses it stands.

CamJN · on March 10, 2021

The GitHub desktop app configures lfs in your gitconfig automatically, so that adds a lot of users to the vulnerable pool.

anaisbetts · on March 10, 2021

Many Git distributions come with git-lfs installed by default

geofft · on March 10, 2021

I assume the primary user base of git-lfs is folks doing things like video game development (so that they can check in image/audio assets to a repo without massively bloating it), which probably has a much higher fraction of Mac/Windows users than folks writing server-side apps or whatever.

junon · on March 10, 2021

That's not why we do security research.

trollian · on March 10, 2021

I would assume that most people developing on macOS have configured case sensitive filesystems. And does Windows do symlinks now?

Seems like a weird edge case to me. I guess Apple and Microsoft should push out OS updates to cover it.

emodendroket · on March 10, 2021

Windows has done symlinks (known as "junctions") since Windows 2000, so I guess it's a more recent feature you might not have learned about.

lucb1e · on March 10, 2021

To be honest I was a heavy Windows user until Windows 7 yet only recently learned that it has symlink support. It's not something you (used to?) really come across in the ecosystem.

That said, I did snicker at the comment :) I had no idea it was that old.

memorysafety · on March 10, 2021

That's a dishonest statement and misrepresents the actual support.

The "junctions" are unusable as "windows symlinks".

For one, you can't create any as a normal non-admin user without specific authorization by default.

emodendroket · on March 10, 2021

Only the most sadistic company on earth is going to ask you to do development on a Windows machine without admin privileges.

memorysafety · on March 12, 2021

I had exactly that done to me. By a huge 200k+ employees corporate monster... You can taste the humiliation of having to justify every sudo through a ticket system. They censored the internet for employees too, in an of course absurdly broken way. Made me learn to read ASN.1 printouts & detect tampering with TLS certs. Add to that an iconic "Office Space"-ey workplace atmosphere, absolutely toxic... made me _request a headset_ from the company (employees are not allowed to bring their own); 2 weeks of ticketing again, and they deliver: an rj45-plug phone headset, with three obscure boxes on the wire (I can only presume, for surveillance). I've been testing my limits for 3 months with them, and left without saying a word. A lesson is a lesson.

fortran77 · on March 10, 2021

> most people developing on macOS have configured case sensitive filesystems

I don't think this is true at all. Too many things will break if you turn this on.

chungy · on March 9, 2021

There are many options for case-insensitivity on Linux. The common one would be FAT, which can't handle symbolic links, so that is moot. There are also ext4 and ZFS, which can have case-insensitive modes enabled (they aren't by default) and which do support symbolic links. ntfs-3g also has an option to mount as case-insensitive (though said option can actually subtly break access to an NTFS volume, since NTFS itself is always case-sensitive and it's just the OS's VFS layer that pretends otherwise).

brundolf · on March 10, 2021

The exploit can also be done with (case-sensitive) Unicode file names. All it requires is that git thinks two paths are distinct, while the file system thinks they're equivalent
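Here is a small Python sketch of that condition, i.e. finding paths that are distinct as strings but that a folding filesystem could treat as the same entry. It is not what git does internally: `fold` and `find_collisions` are made-up helpers, and Unicode NFD plus `str.casefold()` is only an approximation of what a real filesystem considers equivalent.

```python
import unicodedata
from collections import defaultdict


def fold(path):
    """Approximate a case- and normalization-insensitive lookup key."""
    decomposed = unicodedata.normalize("NFD", path)
    # Casefolding can change the normalization form, so normalize again.
    return unicodedata.normalize("NFD", decomposed.casefold())


def find_collisions(paths):
    """Return groups of distinct spellings that share one folded key.

    Every leading directory prefix is checked too, so a directory "A"
    and a file or symlink "a" are reported as colliding.
    """
    spellings = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            spellings[fold(prefix)].add(prefix)
    return sorted(sorted(group) for group in spellings.values() if len(group) > 1)


if __name__ == "__main__":
    print(find_collisions(["A/post-checkout", "a", "README"]))
    print(find_collisions(["jos\u00e9.txt", "jose\u0301.txt"]))
```

The first call reports `A` and `a` as one group, and the second reports the two `josé.txt` spellings, even though no two of those strings compare equal byte for byte.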

caslon · on March 9, 2021

Yes, but no sensible people use case-insensitivity on Linux, and the number of other people that do in a relevant context can probably be measured with four digits.

chungy · on March 9, 2021

I have case-insensitivity enabled for DOSBox and Wine file systems.

I've actually thought about converting my whole $HOME to that way, but I do have a few files that would conflict if I did that. I honestly don't think it's that bad of an idea.

nicoburns · on March 9, 2021

Do you store code in $HOME? If so, I wouldn't recommend it. I have a case-sensitive partition on my macOS machine because I was bitten one too many times by code that worked fine on my development machine (case-insensitive file system) only to fail in production (case-sensitive file system).

chungy · on March 9, 2021

That would indeed be one reason (aside from sheer time) I've avoided doing it fully.

Wowfunhappy · on March 10, 2021

May I ask why?

Case sensitivity is one of the things that really bothers me on Linux, it causes me to make mistakes for no reason. If I ever really switched to Linux full-time, I’d probably want to change that.

caslon · on March 10, 2021

Because case insensitivity causes ambiguity and complexity for no meaningful benefit, and more often than not causes problems like in the post. This isn't a "Linux" thing for me; every UNIX and POSIX system that has been well-designed with the exception of Snow Leopard has had case sensitivity.

emodendroket · on March 10, 2021

The benefit seems pretty clear to me. Users do not generally consider uppercase and lowercase versions of the letter completely distinct. The use cases for identical file names with different cases would seem quite limited.

goatinaboat · on March 10, 2021

> every UNIX and POSIX system that has been well-designed

Case sensitivity wasn’t designed; the first Unix couldn’t spare the CPU cycles to do case insensitive matching, which was the norm at the time.

caslon · on March 10, 2021

And yet nearly every UNIX system that was well-designed isn't case insensitive. I was pointing out correlation, not causation.

Wowfunhappy · on March 10, 2021

How many case-insensitive UNIX systems are there? I’m not sure you have enough data points here. :P

memorysafety · on March 10, 2021

> mistakes for no reason

Don't you think that 65 ≠ 97 is sufficient of a reason?..

I mean, 'A' ≠ 'a', in ASCII, Unicode and even EBCDIC. In computers, those are two distinct characters. This fact won't change no matter how you rationalize your expectations.

Thus, pretending that "y.txt" is the same as "Y.txt" is an elaborate lie. Even acknowledging that it's a "white", well-intentioned lie (designed to preserve the mistaken expectation that "y.txt" is the same as "Y.txt") — I don't like when computers lie to me; do you?

Like every lie, this one has weird consequences. One of them is today's RCE in the OP. Another one was CVE-2014-9390. There are myriad others.

Linux rejects the whole notion of filename case-insensitivity, and demonstrates how computers actually work. It becomes easier for developers and more secure for users.

Lastly, don't feel that I'm attacking you; I'm opposing an idea. So, here's a tip: you can set up case-insensitive filename completion in bash, so that TAB will correct your casing mistakes for you. It's a simple one-line change: put `set completion-ignore-case on` into an inputrc.

badsectoracula · on March 10, 2021

> Don't you think that 65 ≠ 97 is sufficient of a reason?.. [...] In computers, those are two distinct characters

In computers yes, but I am a human, and to me as a human 'A' and 'a' are the same letter.