Well played. I think that just got added to my standard vocabulary. Caching has caused more errors and bugs than I can recall having to deal with. My favorite was an off-by-one error where we returned nicely cached info -- just for the previous user who came through our system! :facepalm: That was a bad one.
That's because, at some level, "state" and "caching" are essentially the same thing.
And the problem with state is that you have to make sure all your state transitions don't cause bugs. What we know as a "cache" is essentially creating new state representing existing state, with all new transitions...
I like to look at caching as a form of denormalization - introducing redundancy to improve performance. And whenever we have redundancy, we have to make sure all our copies are synchronized, which can be tricky, especially in a concurrent environment.
On the other hand, the whole point of normalization in databases is to avoid redundancy and have a "single source of truth".
I find the concepts of normalization and denormalization applicable and helpful outside databases as well, though different terminology is often used.
In the efficient implementation of a pure functional language (say Haskell without MVars), what really is the difference between state and cache?
I know this is overly philosophical, and in practical scenarios we readily (although not always unambiguously) differentiate between "cache" and "state", but the point that transitions are a major source of bugs still stands.
> In the efficient implementation of a pure functional language (say Haskell without MVars), what really is the difference between state and cache?
If you want to unify state and cache, you might want to go down a different route:
Think of log-based filesystems (or a log-based database).
Instead of defining your operations in terms of state, you define them as pure functions of the log.
So your log is full of operations. Writing just means appending a symbolic operation like `write(key, value)` to your log.
And you define the result of `read(key)`: scan backwards through the log until you hit the last instance of `write(key, value)`, and then return that `value`.
Now state means: compact your log by replacing a swath of `write` log entries with one big `snapshot` operation that encompasses many key/value pairs.
Alternatively, you can also define state to mean caching your `read` operations.
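Something like this toy Python sketch, just to make the idea concrete (names are mine and purely illustrative, not from any real log-structured system):

```python
# A toy model: the log is the only real state; read() is a pure
# function of the log, and "state" appears only via compaction.
log = []  # entries: ("write", key, value) or ("snapshot", {key: value})

def write(key, value):
    log.append(("write", key, value))

def read(key):
    # Scan backwards until we hit the most recent write (or a snapshot).
    for entry in reversed(log):
        if entry[0] == "write" and entry[1] == key:
            return entry[2]
        if entry[0] == "snapshot" and key in entry[1]:
            return entry[1][key]
    raise KeyError(key)

def compact():
    # "State" in the sense above: replace a swath of write entries
    # with one big snapshot that encompasses many key/value pairs.
    snapshot = {}
    for entry in log:
        if entry[0] == "snapshot":
            snapshot.update(entry[1])
        else:
            snapshot[entry[1]] = entry[2]
    log[:] = [("snapshot", snapshot)]
```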
In this approach, it's no coincidence that the log is a data structure that has a linear shape: the evolution of state over time is also linear.
(With some cleverness you can replace the linear structure with eg a DAG; and then also think about how you merge divergent states.)
I can never remember what they are, though. To avoid this problem, I think I wrote them down on a post-it, but I had too many post-its on my desk so I got rid of them all, and now I can't remember.
I think it could probably mean "Happy to help" if said in response to a thanks of some sort. Saying it before someone has said thanks is a bit presumptuous. :-)
It's amazing how often exploits come down to optimizations. The general form being "the domain logic is X, and is secure, but we cheated around it in this one case to make it faster, and it turns out we made a bad assumption while doing so". Meltdown fits this description too.
I am really fascinated by the responses to this comment. So many people exclaiming how many issues are caused by caches. In ten years as a full-time programmer, the only cache issues I've seen are cache misses. It probably has to do with one's field. I'm a game developer, mainly dealing with graphics programming.
The key problem (as I understand it) is that updating a cache properly requires knowing the exact graph of relations that an entry in the cache has to other entries, so that when an entry changes, you can propagate that change to the other entries that need to be recomputed. But it seems that, in this case, knowing that exact graph is anything but trivial. Basically it sounds like the non-visual version of rerendering UI when state changes, which is hard enough even with visual feedback.
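A minimal sketch of that propagation, assuming you do know the graph (hypothetical names, not the actual system under discussion):

```python
from collections import defaultdict

cache = {}                     # key -> cached value
dependents = defaultdict(set)  # key -> keys derived from it

def put(key, value, depends_on=()):
    cache[key] = value
    for parent in depends_on:
        dependents[parent].add(key)

def invalidate(key):
    # Evict the entry and everything transitively computed from it.
    cache.pop(key, None)
    for child in dependents.pop(key, set()):
        invalidate(child)
```

The hard part, of course, is knowing `depends_on` in the first place -- which is exactly the graph that's non-trivial to know.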
A lot of threading issues are also cache related. Forget to properly mark access to shared variables, and suddenly every thread/CPU core ends up with its own locally cached version of them.
Yes, but this is such a well understood danger that I've never really been bitten by it in practice.
Along the same lines, a lot of GPU programming tutorials warn of inconsistencies between threads, but it has never been a problem for me: I just assume I cannot rely on consistency or order of execution, treating each thread as separate and independent.
To solve this problem we would need to first understand the human mind, how it stores data, how it does computation, and how it interacts with names. So we would need the same set of information that we would need for creating AGI. A solution is probably only a couple of months/decades away.
Those aren’t formal definitions. “Formal” means, at the very least, that the specification is done in a formal language, and usually that conformance to the specification can be checked mechanically, that is, by a computer.
I don't find I ever make off-by-one errors with simple collection iteration; at some point "i < len" becomes tattooed on your brain stem. The off-by-one errors I tend to make are related to implementation details of certain data structures or algs. Really, I would describe them more as "thinking at the margins can be challenging." Correctly handling doubly linked lists, that sort of thing.
Oh, and slicing. I will never get Python slicing right the first time. The fact that the range is [begin, end) is just never the way I expect it to work.
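For anyone else who trips over it, the half-open convention in one sketch:

```python
xs = [10, 20, 30, 40, 50]
xs[1:3]       # [20, 30] -- index 3 itself is excluded
xs[1:1]       # []       -- empty whenever begin == end
xs[:len(xs)]  # the whole list; end is one past the last index
# One upside of [begin, end): adjacent slices tile with no overlap,
# so xs[:k] + xs[k:] == xs for any k.
```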
Per your downvotes - I used to hate jokes on Hacker News and downvote them when I saw them, but I've become more ambivalent. They're a way of amicably sharing culture and experiences with other engineers that transcend any differences in age, gender, race, background, etc.
It's barely even a joke to me anymore -- it's just too real for me to laugh.
(Cache invalidation is essentially the same problem as managing mutable state -- "Out of the Tar Pit" frames mutable state as either essential or incidental, the latter being rederivable in principle from essential state. Incidental mutable state is no more and no less than a cache, and usually one with an informal and undocumented invalidation policy.)
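(To make that concrete, a sketch of my own with hypothetical names like `db.load_profile`: the memo table below is incidental state, rederivable in principle from the essential state in the database, and its invalidation policy is exactly the informal, undocumented kind.)

```python
_profiles = {}  # incidental state: in principle rederivable from the DB

def get_profile(db, user_id):
    if user_id not in _profiles:
        _profiles[user_id] = db.load_profile(user_id)  # essential state
    return _profiles[user_id]

# The unwritten invalidation policy: nothing ever calls this, so a
# profile edit in the DB is invisible here until the process restarts.
def invalidate_profile(user_id):
    _profiles.pop(user_id, None)
```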
(And naming things has a very real technical counterpart in addressing, which comes up obviously in networking, but you can also see its shadows in quite a lot of concerns around architecture and modularity.)
Right, which is (AFAIK) not usually recommended except for side projects or ones where there is already an existing relationship under a personal email address.
Couldn’t it just as well be attributed to improper file path normalization? If we had only lowercase ASCII filesystems, it would not have caused a problem.
https://github.com/gitster/git/commit/684dd4c2b414bcf648505e...
(Surprise, the root cause is a cache)
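Sketching the general shape of that class of bug (hypothetical code, not git's actual implementation): a cache keyed on exact bytes sitting in front of a case-insensitive filesystem.

```python
seen = set()  # cache of paths already checked

def first_time_seeing(path):
    if path in seen:  # exact-match lookup: ".GIT" != ".git"
        return False
    seen.add(path)
    return True

# On a case-insensitive filesystem both names resolve to the same
# directory, but the cache treats them as distinct entries:
first_time_seeing(".git")  # True
first_time_seeing(".GIT")  # True -- cache miss, so the check is bypassed
```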