
The commit that fixes this issue:

https://github.com/gitster/git/commit/684dd4c2b414bcf648505e...

(Surprise, the root cause is a cache)




Another cachelty


Well played. I think that just got added to my standard vocabulary. Caching has caused more errors and bugs than I can recall. My favorite was an off-by-one error where we returned nicely cached info -- just for the previous user who came through our system! :facepalm: That was a bad one.


That's because essentially, "state" and "caching" are the same thing on some level.

And the problem with state is that you have to make sure all your state transitions don't cause bugs. What we know as a "cache" is essentially creating new state representing existing state, with all new transitions...


I like to look at caching as a form of denormalization - introducing redundancy to improve performance. And whenever we have redundancy, we have to make sure all our copies are synchronized, which can be tricky, especially in a concurrent environment.

On the other hand, the whole point of normalization in databases is to avoid redundancy and have a "single source of truth".

I find the concepts of normalization and denormalization applicable and helpful outside databases as well, though a different terminology is often used.
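A toy sketch of that trade-off (hypothetical names, not any particular database): the denormalized copy is fast to read but is only as correct as the code that keeps it in sync.

  # Hypothetical sketch: `order_totals` is redundant (denormalized),
  # derivable from `order_lines`, and exists only to make reads fast.
  order_lines = []     # normalized source of truth: (order_id, amount) rows
  order_totals = {}    # denormalized copy: order_id -> running total

  def add_line(order_id, amount):
      order_lines.append((order_id, amount))
      # The redundant copy must be updated in the same step,
      # or readers will see a stale total.
      order_totals[order_id] = order_totals.get(order_id, 0) + amount

  def total(order_id):
      return order_totals.get(order_id, 0)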


In that light, a cache is a form of partition, so it can be available xor consistent with its source.

https://en.wikipedia.org/wiki/CAP_theorem


A cache is a type of state, but not necessarily the other way around.


In the efficient implementation of a pure functional language (say Haskell without MVars), what really is the difference between state and cache?

I know this is overly philosophical, and in practical scenarios we readily (although not always unambiguously) differentiate between "cache" and "state", but the point about transitions and that being a major source of bugs still stands.


> In the efficient implementation of a pure functional language (say Haskell without MVars), what really is the difference between state and cache?

If you want to unify state and cache, you might want to go down a different route:

Think of log-based filesystems (or a log-based database).

Instead of defining your operations in terms of state, you define them as pure functions of the log.

So your log is full of operations. Writing just means appending a symbolic operation like `write(key, value)` to your log.

And you define the result of `read(key)` as: scan backwards through the log until you hit the last `write(key, value)` for that key, and return that `value`.

Now state means: compact your log by replacing a swath of `write` log entries with one big `snapshot` operation that encompasses many key/value pairs.

Alternatively, you can also define state to mean caching your `read` operations.

In this approach, it's no coincidence that the log is a data structure that has a linear shape: the evolution of state over time is also linear.

(With some cleverness you can replace the linear structure with eg a DAG; and then also think about how you merge divergent states.)
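A minimal sketch of that idea in Python (hypothetical names, nothing to do with Git's internals): reads are pure functions of the log, and "state" only appears when you compact.

  log = []  # entries are ("write", key, value) or ("snapshot", {key: value, ...})

  def write(key, value):
      log.append(("write", key, value))

  def read(key):
      # A pure function of the log: scan backwards to the most recent
      # write (or snapshot) that covers this key.
      for entry in reversed(log):
          if entry[0] == "write" and entry[1] == key:
              return entry[2]
          if entry[0] == "snapshot" and key in entry[1]:
              return entry[1][key]
      raise KeyError(key)

  def compact():
      # "State" shows up only here: fold the whole log into one snapshot entry.
      state = {}
      for entry in log:
          if entry[0] == "write":
              state[entry[1]] = entry[2]
          else:
              state.update(entry[1])
      log[:] = [("snapshot", state)]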


Here goes the obligatory

> There are only two hard things in Computer Science...


I can never remember what they are, though. To avoid this problem, I think I wrote them down on a post-it, but I had too many post-its on my desk so I got rid of them all, and now I can't remember.


The two most difficult ones are naming things, cache invalidation, and off-by-one errors. HTH. ;)


I prefer the ordered version.

three most difficult things in CS:

2) Naming Things

1) Cache Invalidation

4) off by one errors

3) Concurrency


Another version, about distributed systems:

There are only two hard problems in distributed systems:

2. Exactly-once delivery

1. Guaranteed order of messages

2. Exactly-once delivery


two most difficult things in CS:

0) naming things

1) cache invalidation

42) asynchronous callbacks

2) off by one errors


They are likely making a joke about caching the answer


What does HTH mean?


"Hope this helps" (sometimes used sarcastically)


Hope That Helps, HTH.


"Hope this helps"

I see it with HAND ("have a nice day") too.


I've always read it as "Happy To Help".

I see that's wrong, but ignorance has made the internet seem just that little bit warmer all these years!


I think it could probably mean "Happy to help" if said in response to a thanks of some sort. Saying it before someone has said thanks is a bit presumptuous. :-)


this reminds me I was dumpster diving at a place with lots of post-it notes and there was one that said 2HCS => whatchamacallit, CI!

what that you?

on edit: I'm going to let that 'what that you' stand because one of the hardest things about HN posts is grammatical correctitude.


> wrote them down on a post-it

You wrote it on local media and kept it on-premises?

Cloud is the new thing, I hear.


The Post-it was but a cache.


I would consider the Post-it as persistent storage or the backup. Your memory would be the cache :D


Yeah but the speed.


Oh man...


It's amazing how often exploits come down to optimizations. The general form being "the domain logic is X, and is secure, but we faked around it in this one case to make it faster, and it turns out we made a bad assumption while doing so". Meltdown fits this description too.


Optimizations come from making assumptions, and bugs come from mistaken assumptions.


Same with bugs. Hate to be the mantra guy but:

1. Make it work

2. Make it right

3. Make it fast (but first make sure you need to), a.k.a. optimize

4. Make it scale


I am really fascinated by the responses to this comment. So many people exclaiming how many issues are caused by caches. In ten years as a full-time programmer, the only cache issues I've seen are cache misses. It probably has to do with one's field; I'm a game developer mainly dealing with graphics programming.


The key problem (as I understand it) is that updating a cache properly requires knowing the exact graph of relations between an entry in the cache and the other entries, so that when that entry changes, you can propagate the change to every dependent entry that needs to be recomputed. But knowing that exact graph is far from trivial, as this case seems to show. Basically it sounds like the non-visual version of re-rendering UI when state changes, which is hard enough even with visual feedback.
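A toy sketch of what "knowing the graph" means in code (hypothetical names): each cache entry records which entries derive from it, and a change walks that graph. The hard part in real systems is making sure the recorded graph is actually complete.

  cache = {}
  derived_from = {}   # key -> set of cache keys computed from it

  def put(key, value, inputs=()):
      cache[key] = value
      for src in inputs:
          derived_from.setdefault(src, set()).add(key)

  def invalidate(key):
      # Drop the entry and everything transitively computed from it.
      cache.pop(key, None)
      for dependent in derived_from.pop(key, set()):
          invalidate(dependent)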


A lot of threading issues are also cache related. Forget to properly mark access to shared variables, and suddenly every thread / CPU core ends up with its own locally cached version of them.


Yes, but this is such a well-understood danger that I've never really been bitten by it in practice.

Along the same lines, a lot of GPU programming tutorials warn of inconsistencies between threads, but it has never been a problem for me, since I just assume I cannot rely on consistency or order of execution and treat each thread as separate and independent.


I wish folks who announce security issues would link to the patches for the issues they are announcing. This should become standard practice.


Difficult problems in programming:

(1) cache invalidation

(2) off-by-one errors


The classical three problems.


I thought the two hardest problems were:

1) naming

2) cache invalidation

...

3) off-by-one errors


I'm pretty sure it's:

1) naming

4) concurr2) cache invalidationency

...

3) off-by-one errors


Love this.


You forgot

0) Race consegmentation fault (core dumped)

(I know I was ninja'd but didn't see it until after)


FWIW, your version has the nice touch of also introducing a 0th item.


I feel like we should have solved the naming problem as an industry by now.

Alas.


It’s actually the hardest one of the three, being outside the grasp of formal methods.


To solve this problem we would need to first understand the human mind, how it stores data, how it does computation, and how it interacts with names. So we would need the same set of information that we would need for creating AGI. A solution is probably only a couple of months/decades away.


I really have to disagree; why can't you devise a formal method?

- a good name should be descriptive

- avoid being overly clever, call a spade a spade

- don't optimise for generalisation, naming things is a time to be specific

- aim for short but not at the expense of losing context

- avoid redundancy in naming of things nearby, leverage spatial context

- avoid qualifiers or type information where possible - type should be obvious from context and use, if it's not qualify or refactor

Anything else?


Those aren’t formal definitions. “Formal” means, at the very least, that the specification is done in a formal language, and usually that conformance to the specification can be checked mechanically, that is, by a computer.

https://en.m.wikipedia.org/wiki/Formal_methods


shouldn't your list start from 0?


There are three kinds of programmers:

  1) Those who number lists starting at 1.
  1) Those who number lists starting at 0.
  2.5) Stan Kelly-Bootle, who proposed a compromise.


That’s the joke


Fortunately, many off-by-one errors can be caught with more ergonomic tooling.

For the simplest example: compare the old C-style for loop vs a Python style for-each loop.
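For instance (hypothetical Python, loosely mirroring the C idiom):

  items = ["a", "b", "c"]

  # Index-based loop: the bound is yours to get wrong
  # (writing `len(items) - 1`, or `<=` in C, is the classic slip).
  for i in range(len(items)):
      print(items[i])

  # For-each loop: no index, no bound, no off-by-one to make.
  for item in items:
      print(item)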


I don't find I ever make off-by-one errors with simple collection iteration; at some point "i < len" becomes tattooed on your brain stem. The off-by-one errors I tend to make are related to implementation details of certain data structures or algs. Really, I would describe them more as "thinking at the margins can be challenging." Correctly handling doubly linked lists, that sort of thing.

Oh, and slicing. I will never get Python slicing right the first time. The fact that the range is [begin, end) is just never the way I expect it to work.


But slicing 0..len and `for (i = 0; i < len)` are literally the same thing.

In [0, len), the ')' means "less than", as in 0 ≤ x < len.


(3) remembering the joke


Per your downvotes - I used to hate jokes on Hacker News and downvote them when I saw them, but I've become more ambivalent. They're a way of amicably sharing culture and experiences with other engineers that transcend any differences in age, gender, race, background, etc.

The formulation of this joke I tend to see is,

The two hardest problems in programming:

(1) cache invalidation

(2) appropriately naming things

(3) off-by-one errors


It's barely even a joke to me anymore -- it's just too real for me to laugh.

(Cache invalidation is essentially the same problem as managing mutable state -- "Out of the Tar Pit" frames mutable state as either essential or incidental, the latter being rederivable in principle from essential state. Incidental mutable state is no more and no less than a cache, and usually one with an informal and undocumented invalidation policy.)

(And naming things has a very real technical counterpart in addressing, which comes up obviously in networking, but you can also see its shadows in quite a lot of concerns around architecture and modularity.)
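To make the incidental-state point concrete, a memo table is the textbook case (hypothetical example): it is rederivable from the pure computation, so it is "only" a cache, and it becomes a bug source the moment something it depends on can change without it noticing.

  # Hypothetical example: `_memo` is incidental state, rederivable from
  # the essential computation below. Its implicit invalidation policy is
  # "inputs never change" -- fine here, a bug the moment that stops being true.
  _memo = {}

  def shipping_cost(weight_kg, rate_per_kg):
      key = (weight_kg, rate_per_kg)
      if key not in _memo:
          _memo[key] = weight_kg * rate_per_kg
      return _memo[key]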


Humor is often the most efficient way to communicate/accept the truth.


To make the truth seem like an acceptable parallel universe, and then join it.


The two hardest problems in programming:

(1) cache invalidation

(3) off-by-one errors

(2) appropriately naming things

(4) parallel execution [leading to race conditions / ordering bugs]


I think you win a bad in-joke award - the first annual Turing-Dad joke award.


Minor quibble: it should be the three hardest problems.


It should be, but the increments were run in parallel on nonvolatile memory.


The two hardest problems in computer science are:

1) Naming 3) Cache Invalidation 2) Off-by-one errors 3) In-order once-only delivery of distributed messages

And an almost fanatical devotion to the Pope


I love it told like this: https://news.ycombinator.com/item?id=26406351

1) naming

4) concurr2) cache invalidation

ency

3) off-by-one errors


(5) feature creep


They had this bug before in other code unrelated to caching. To me, that suggests a deeper root.


Strange. The guy who fixed the issue works at Microsoft, but uses his gmx email for GitHub.


And the guy who announced the new Git release works for Google, but uses his pobox.com email for Git development.


Yes, actually, Googlers are encouraged to use their personal GitHub accounts.


That is true but the git release has nothing to do with GitHub accounts.


But probably their work email when doing things on company time?


Nope. This is an example of someone working on company time using their personal email.


Right, which is (AFAIK) not usually recommended except for side projects or ones where there is already an existing relationship under a personal email address.


I think you just got a glimpse of a vast sea of internal policy and compliance issues.


Or that the open-source hobby is much more long-lived than something as temporary as an employer.


Wait, there are people who use real, in-use email addresses in publicly hosted git repos? I mean, it's likely his spamcatcher address, no?


> (Surprise, the root cause is a cache)

Couldn't it just as well be attributed to improper file path normalization? If we only had lower-case ASCII filesystems, it would not have caused a problem.
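Roughly the shape of that problem (a hypothetical illustration, not the actual Git code): once paths are compared case-insensitively, names the logic treats as distinct collapse into one.

  # Hypothetical illustration: case-insensitive normalization makes
  # logically distinct paths collide on lookup.
  def normalize(path):
      return path.lower()      # roughly what a case-insensitive filesystem does

  seen = {}
  for p in [".git/config", ".GIT/config"]:
      seen[normalize(p)] = p   # the second entry silently overwrites the first

  print(seen)                  # {'.git/config': '.GIT/config'}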


Things can have more than one cause acting together.



