
The commit that fixes this issue:

https://github.com/gitster/git/commit/684dd4c2b414bcf648505e...

(Surprise, the root cause is a cache)




Another cachelty


Well played. I think that just got added to my standard vocabulary. Caching has caused more errors and bugs than I can recall. My favorite was an off-by-one error where we returned nicely cached info -- just for the previous user who came through our system! :facepalm: That was a bad one.


That's because essentially, "state" and "caching" are the same thing on some level.

And the problem with state is that you have to make sure all your state transitions don't cause bugs. What we know as a "cache" is essentially creating new state representing existing state, with all new transitions...


I like to look at caching as a form of denormalization - introducing redundancy to improve performance. And whenever we have redundancy, we have to make sure all our copies are synchronized, which can be tricky, especially in a concurrent environment.

On the other hand, the whole point of normalization in databases is to avoid redundancy and have a "single source of truth".

I find the concepts of normalization and denormalization applicable and helpful outside databases as well, though a different terminology is often used.
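A toy sketch of that trade-off (hypothetical names, not any particular database): the denormalized copy is fast to read but is only as correct as the code that keeps it in sync.

  # Hypothetical sketch: `order_totals` is redundant (denormalized),
  # derivable from `order_lines`, and exists only to make reads fast.
  order_lines = []     # normalized source of truth: (order_id, amount) rows
  order_totals = {}    # denormalized copy: order_id -> running total

  def add_line(order_id, amount):
      order_lines.append((order_id, amount))
      # The redundant copy must be updated in the same step,
      # or readers will see a stale total.
      order_totals[order_id] = order_totals.get(order_id, 0) + amount

  def total(order_id):
      return order_totals.get(order_id, 0)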


In that light, a cache is a form of partition, so it can be available xor consistent with its source.

https://en.wikipedia.org/wiki/CAP_theorem


A cache is a type of state, but not necessarily the other way around.


In the efficient implementation of a pure functional language (say Haskell without MVars), what really is the difference between state and cache?

I know this is overly philosophical, and in practical scenarios we readily (although not always unambiguously) differentiate between "cache" and "state", but the point about transitions and that being a major source of bugs still stands.


> In the efficient implementation of a pure functional language (say Haskell without MVars), what really is the difference between state and cache?

If you want to unify state and cache, you might want to go down a different route:

Think of log-based filesystems (or a log-based database).

Instead of defining your operations in terms of state, you define them as pure functions of the log.

So your log is full of operations. Writing just means appending a symbolic operation like `write(key, value)` to your log.

And you define the result of `read(key)` as: scan backwards through the log until you hit the last `write(key, value)` for that key, and return that `value`.

Now state means: compact your log by replacing a swath of `write` log entries with one big `snapshot` operation that encompasses many key/value pairs.

Alternatively, you can also define state to mean caching your `read` operations.

In this approach, it's no coincidence that the log is a data structure that has a linear shape: the evolution of state over time is also linear.

(With some cleverness you can replace the linear structure with eg a DAG; and then also think about how you merge divergent states.)
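A minimal sketch of that idea in Python (hypothetical names, nothing to do with Git's internals): reads are pure functions of the log, and "state" only appears when you compact.

  log = []  # entries are ("write", key, value) or ("snapshot", {key: value, ...})

  def write(key, value):
      log.append(("write", key, value))

  def read(key):
      # A pure function of the log: scan backwards to the most recent
      # write (or snapshot) that covers this key.
      for entry in reversed(log):
          if entry[0] == "write" and entry[1] == key:
              return entry[2]
          if entry[0] == "snapshot" and key in entry[1]:
              return entry[1][key]
      raise KeyError(key)

  def compact():
      # "State" shows up only here: fold the whole log into one snapshot entry.
      state = {}
      for entry in log:
          if entry[0] == "write":
              state[entry[1]] = entry[2]
          else:
              state.update(entry[1])
      log[:] = [("snapshot", state)]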


Here goes the obligatory

> There are only two hard things in Computer Science...


I can never remember what they are, though. To avoid this problem, I think I wrote them down on a post-it, but I had too many post-its on my desk so I got rid of them all, and now I can't remember.


The two most difficult ones are naming things, cache invalidation, and off-by-one errors. HTH. ;)


I prefer the ordered version.

three most difficult things in CS:

2) Naming Things

1) Cache Invalidation

4) off by one errors

3) Concurrency


Another version, about distributed systems:

There are only two hard problems in distributed systems:

2. Exactly-once delivery

1. Guaranteed order of messages

2. Exactly-once delivery


two most difficult things in CS:

0) naming things

1) cache invalidation

42) asynchronous callbacks

2) off by one errors


They are likely making a joke about caching the answer


What does HTH mean?


"Hope this helps" (sometimes used sarcastically)


Hope That Helps, HTH.


"Hope this helps"

I see it with HAND ("have a nice day") too.


I've always read it as "Happy To Help".

I see that's wrong, but ignorance has made the internet seem just that little bit warmer all these years!


I think it could probably mean "Happy to help" if said in response to a thanks of some sort. Saying it before someone has said thanks is a bit presumptuous. :-)


this reminds me I was dumpster diving at a place with lots of post-it notes and there was one that said 2HCS => whatchamacallit, CI!

what that you?

on edit: I'm going to let that 'what that you' stand because one of the hardest things about HN posts is grammatical correctitude.


> wrote them down on a post-it

You wrote it on local media and kept it on-premises?

Cloud is the new thing, I hear.


The Post-it was but a cache.


I would consider the Post-it as persistent storage or the backup. Your memory would be the cache :D


Yeah but the speed.


Oh man...


It's amazing how often exploits come down to optimizations. The general form being "the domain logic is X, and is secure, but we faked around it in this one case to make it faster, and it turns out we made a bad assumption while doing so". Meltdown fits this description too.


Optimizations come from making assumptions, and bugs come from mistaken assumptions.


Same with bugs. Hate to be the mantra guy but:

1. Make it work

2. Make it right

3. Make it fast (but first make sure you need to), a.k.a. optimize

4. Make it scale


I am really fascinated by the responses to this comment. So many people exclaiming how many issues are caused by caches. In ten years as a full-time programmer, the only cache issues I've seen are cache misses. It probably has to do with one's field; I'm a game developer mainly dealing with graphics programming.


The key problem (as I understand it) is that updating a cache properly requires knowing the exact graph of relations between an entry in the cache and the other entries, so that when that entry changes, you can propagate the change to every dependent entry that needs to be recomputed. But knowing that exact graph is far from trivial, as this case seems to show. Basically it sounds like the non-visual version of re-rendering UI when state changes, which is hard enough even with visual feedback.
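A toy sketch of what "knowing the graph" means in code (hypothetical names): each cache entry records which entries derive from it, and a change walks that graph. The hard part in real systems is making sure the recorded graph is actually complete.

  cache = {}
  derived_from = {}   # key -> set of cache keys computed from it

  def put(key, value, inputs=()):
      cache[key] = value
      for src in inputs:
          derived_from.setdefault(src, set()).add(key)

  def invalidate(key):
      # Drop the entry and everything transitively computed from it.
      cache.pop(key, None)
      for dependent in derived_from.pop(key, set()):
          invalidate(dependent)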


A lot of threading issues are also cache related. Forget to properly mark access to shared variables, and suddenly every thread / CPU core ends up with its own locally cached version of them.


Yes, but this is such a well-understood danger that I've never really been bitten by it in practice.

Along the same lines, a lot of GPU programming tutorials warn of inconsistencies between threads, but it has never been a problem for me, since I just assume I cannot rely on consistency or order of execution and treat each thread as separate and independent.


I wish folks who announce security issues would link to the patches for the issues they are announcing. This should become standard practice.


Difficult problems in programming:

(1) cache invalidation

(2) off-by-one errors


The classical three problems.


I thought the two hardest problems were:

1) naming

2) cache invalidation

...

3) off-by-one errors


I'm pretty sure it's:

1) naming

4) concurr2) cache invalidationency

...

3) off-by-one errors


Love this.


You forgot

0) Race consegmentation fault (core dumped)

(I know I was ninja'd but didn't see it until after)


FWIW, your version has the nice touch of also introducing a 0th item.


I feel like we should have solved the naming problem as an industry by now.

Alas.


It’s actually the hardest one of the three, being outside the grasp of formal methods.


To solve this problem we would need to first understand the human mind, how it stores data, how it does computation, and how it interacts with names. So we would need the same set of information that we would need for creating AGI. A solution is probably only a couple of months/decades away.


I really have to disagree; why can't you devise a formal method?

- a good name should be descriptive

- avoid being overly clever, call a spade a spade

- don't optimise for generalisation, naming things is a time to be specific

- aim for short but not at the expense of losing context

- avoid redundancy in naming of things nearby, leverage spatial context

- avoid qualifiers or type information where possible - type should be obvious from context and use, if it's not qualify or refactor

Anything else?


Those aren’t formal definitions. “Formal” means, at the very least, that the specification is done in a formal language, and usually that conformance to the specification can be checked mechanically, that is, by a computer.

https://en.m.wikipedia.org/wiki/Formal_methods


shouldn't your list start from 0?


There are three kinds of programmers:

  1) Those who number lists starting at 1.
  1) Those who number lists starting at 0.
  2.5) Stan Kelly-Bootle, who proposed a compromise.


That’s the joke


Fortunately, many off-by-one errors can be caught with more ergonomic tooling.

For the simplest example: compare the old C-style for loop vs a Python style for-each loop.
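For instance (hypothetical Python, loosely mirroring the C idiom):

  items = ["a", "b", "c"]

  # Index-based loop: the bound is yours to get wrong
  # (writing `len(items) - 1`, or `<=` in C, is the classic slip).
  for i in range(len(items)):
      print(items[i])

  # For-each loop: no index, no bound, no off-by-one to make.
  for item in items:
      print(item)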


I don't find I ever make off-by-one errors with simple collection iteration; at some point "i < len" becomes tattooed on your brain stem. The off-by-one errors I tend to make are related to implementation details of certain data structures or algs. Really, I would describe them more as "thinking at the margins can be challenging." Correctly handling doubly linked lists, that sort of thing.

Oh, and slicing. I will never get Python slicing right the first time. The fact that the range is [begin, end) is just never the way I expect it to work.


But slicing 0..len and `for (i = 0; i < len)` are literally the same thing.

In [0, len), the ')' means "less than", as in 0 ≤ x < len.


(3) remembering the joke


Per your downvotes - I used to hate jokes on Hacker News and downvote them when I saw them, but I've become more ambivalent. They're a way of amicably sharing culture and experiences with other engineers that transcend any differences in age, gender, race, background, etc.

The formulation of this joke I tend to see is,

The two hardest problems in programming:

(1) cache invalidation

(2) appropriately naming things

(3) off-by-one errors


It's barely even a joke to me anymore -- it's just too real for me to laugh.

(Cache invalidation is essentially the same problem as managing mutable state -- "Out of the Tar Pit" frames mutable state as either essential or incidental, the latter being rederivable in principle from essential state. Incidental mutable state is no more and no less than a cache, and usually one with an informal and undocumented invalidation policy.)

(And naming things has a very real technical counterpart in addressing, which comes up obviously in networking, but you can also see its shadows in quite a lot of concerns around architecture and modularity.)
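To make the incidental-state point concrete, a memo table is the textbook case (hypothetical example): it is rederivable from the pure computation, so it is "only" a cache, and it becomes a bug source the moment something it depends on can change without it noticing.

  # Hypothetical example: `_memo` is incidental state, rederivable from
  # the essential computation below. Its implicit invalidation policy is
  # "inputs never change" -- fine here, a bug the moment that stops being true.
  _memo = {}

  def shipping_cost(weight_kg, rate_per_kg):
      key = (weight_kg, rate_per_kg)
      if key not in _memo:
          _memo[key] = weight_kg * rate_per_kg
      return _memo[key]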


Humor is often the most efficient way to communicate/accept the truth.


To make the truth seem like an acceptable parallel universe, and then join it.


The two hardest problems in programming:

(1) cache invalidation

(3) off-by-one errors

(2) appropriately naming things

(4) parallel execution [leading to race conditions / ordering bugs]


I think you win a bad in-joke award - the first annual Turing-Dad joke award.


Minor quibble: it should be the three hardest problems.


It should be, but the increments were run in parallel on nonvolatile memory.


The two hardest problems in computer science are:

1) Naming 3) Cache Invalidation 2) Off-by-one errors 3) In-order once-only delivery of distributed messages

And an almost fanatical devotion to the Pope


I love it told like this: https://news.ycombinator.com/item?id=26406351

1) naming

4) concurr2) cache invalidation

ency

3) off-by-one errors


(5) feature creep


They had this bug before in other code unrelated to caching. To me, that suggests a deeper root.


Strange. The guy who fixed the issue works at Microsoft, but uses his gmx email for GitHub.


And the guy who announced the new Git release works for Google, but uses his pobox.com email for Git development.


Yes, actually, Googlers are encouraged to use their personal GitHub accounts.


That is true but the git release has nothing to do with GitHub accounts.


But probably their work email when doing things on company time?


Nope. This is an example of someone working on company time using their personal email.


Right, which is (AFAIK) not usually recommended except for side projects or ones where there is already an existing relationship under a personal email address.


I think you just got a glimpse of a vast sea of internal policy and compliance issues.


Or that the open-source hobby is much more long-lived than something as temporary as an employer.


Wait, there are people who use real, in-use email addresses in publicly hosted git repos? I mean, it's likely his spamcatcher address, no?


> (Surprise, the root cause is a cache)

Couldn't it just as well be attributed to improper file path normalization? If we only had lower-case ASCII filesystems, it would not have caused a problem.
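Roughly the shape of that problem (a hypothetical illustration, not the actual Git code): once paths are compared case-insensitively, names the logic treats as distinct collapse into one.

  # Hypothetical illustration: case-insensitive normalization makes
  # logically distinct paths collide on lookup.
  def normalize(path):
      return path.lower()      # roughly what a case-insensitive filesystem does

  seen = {}
  for p in [".git/config", ".GIT/config"]:
      seen[normalize(p)] = p   # the second entry silently overwrites the first

  print(seen)                  # {'.git/config': '.GIT/config'}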


Things can have more than one cause acting together.



