Let me try explaining it a few ways:

1. ---

You don't need to assert absence; the abstraction inherently ignores whatever is left out, and the reader remains ignorant of it (that's the point, in fact). The abstraction asserts that the information it captures is the most useful information, and arguably that it is the only relevant information. This may be correct, but it may also be wrong. If it's wrong, any bugs that result will be hard to solve, because the information necessary to understand how A links to B has been deliberately removed from the path between A and B.

2. ---

An abstraction is a conceptual reformulation of the problem, and each layer of abstraction reformulates it again. It's lossy compression: each layer of abstraction is a lossy compression of a lossy compression. You want to minimise the layers, because running the problem through multiple compressors loses a lot of information and obscures the constraints of the fundamental problem.

3. ---

You don't know a priori whether the information your abstraction leaves out is important.

I would go further and argue: leaving out the wrong information is usually a disaster, and very hard to reverse. One way to avoid this is to avoid abstractions (not that I'd recommend it, but it's part of the tradeoff).

4. ---

Abstractions misrepresent by simplifying. For example, the fundamental problem you're solving is moving electrons through wires. There are problems specific to that level of detail (bit instability, for example) which you stop worrying about once you introduce the abstraction of the CPU's ISA.

Do those problems disappear at the level of the ISA? No, you've just introduced an abstraction which hides them, and hopefully they don't bubble up. The introduction of that abstraction also added overhead, partly in order to ensure the lower-level problems don't bubble up.

Ok, let's go up a few levels. You're now using a programming language. One of your fundamental problems here is cache locality. Does your code trigger cache misses? Well, it's not always clear, and it becomes less clear the more layers of abstraction you add.

"But cache locality rarely matters," ok, but sometimes it does, and if you have 10 layers of abstraction, good luck solving that. Can you properly manage cache locality in Clojure? Not a chance. It's too abstract. What happens when your Clojure code is eventually too slow? You're fucked. The abstraction not only makes the problem hard to identify, it makes it impossible to solve.




Abstractions are about carving up the problem space into conceptual units to aid comprehension. But these abstractions do not suggest that lower-level details don't exist. What they do is provide signposts from which one can navigate to the low-level concern of interest. If I need to edit the code that reads from a file, ideally the way the problem space is carved up lets me zero in on the right code by eliminating irrelevant code from my search. It's a semantic B-tree search. Without this tower of abstractions, you have to read the entire codebase linearly to find the points you need to edit. There's no way that's more efficient.

Of course, not all problems are suited to this kind of conceptual division. Cross-cutting concerns are inherently the sort that cannot be isolated in a codebase. Your example of cache locality is a case in point: you simply have to scan the entire codebase to find the places where your code violates cache locality. Abstractions inherently can't help there, and they do hurt somewhat in the sense that there's more code to read. But overall the benefits are worth it in most contexts.


I feel like you didn't really engage with most of what I said. It sounds like you're repeating what you were taught as an undergraduate (I hope that doesn't come across as crass).

I understand the standard justifications for abstraction - I'm saying: I have found that those justifications do not take into account or accurately describe the problems that result, and they definitely underestimate the severity. Repeatedly changing the shape of a problem until it is unrecognisable results in a monster, and it's not as easy to tame as our CS professors make out.

To reiterate: Twitter, with a development budget of billions, was crashing people's entire browsers for multiple years. That's not even server-side, where the real complexity is - that's the client. That kind of issue simply should not exist, and it wouldn't if Twitter were running on a (much) shallower stack.

This is a side note, but you keep referencing the necessity of the tower. Bear in mind what happens when you increase the branching factor of a tree: you don't need a tower to segment the problem effectively. 100-item units let you segment one million items with three layers, and 10 billion with five. Larger units mean far fewer layers.
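Spelling out that arithmetic as a throwaway sketch (the helper name is mine): with branching factor b, d layers address b^d items, so the layers needed for n items is the smallest d with b^d >= n:

    #include <stdio.h>

    /* Smallest d such that b^d >= n: the depth of a uniform
       tree with branching factor b that can hold n items. */
    static int layers_needed(long long n, long long b) {
        int d = 0;
        long long capacity = 1;
        while (capacity < n) { capacity *= b; d++; }
        return d;
    }

    int main(void) {
        printf("b=100, 1e6 items  -> %d layers\n", layers_needed(1000000LL, 100));     /* 3 */
        printf("b=100, 1e10 items -> %d layers\n", layers_needed(10000000000LL, 100)); /* 5 */
        printf("b=10,  1e6 items  -> %d layers\n", layers_needed(1000000LL, 10));      /* 6 */
        return 0;
    }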


>I feel like you didn't really engage with most of what I said.

I didn't engage point-by-point because I strongly disagree with how you characterize abstractions, and going point-by-point seemed like overkill. They don't misrepresent - they carve up. If you take the carving at a given layer as all there is to know, the mistake is yours. And this isn't something I was taught in school; rather, I converged on this style of programming independently. My CS program taught CS concepts; we were responsible for discovering how to construct programs on our own. Most of the students struggled to complete moderately large assignments. I found them trivial, and I attribute this to being able to find the right set of abstractions for the problem. Find the right abstractions, and the mental load of the problem is never bigger than one moderately sized functional unit. This style of development has served me very well in my career. You will be hard-pressed to talk me out of it.

>Repeatedly changing the shape of a problem until it is unrecognisable results in a monster

I can accept some truth to this in low-level/embedded contexts, where the "shape" of the physical machine is a relevant factor, so hiding it behind a domain-specific abstraction can cause problems. But most software projects can ignore the physical machine and program to a generic Turing machine.

>You don't need a tower to segment the problem effectively

Agreed. Finding the right size for the functional units is critical. 100 interacting units is usually way too many. The right size for a functional unit is one where you can easily inspect it for correctness and be confident there are no bugs. As the functional unit gets larger, your ability even to be confident (let alone correct) falls off a cliff. A good set of abstractions is one where (1) the state being manipulated is made obvious at all times, (2) each functional unit is sized such that it can easily be inspected for correctness, and (3) each layer provides a non-trivial increase in the resolution of the solution. I am as much against useless abstractions and endless indirection as anyone.
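A toy C sketch of what I mean by (1) and (2) - the names and shapes are mine, purely illustrative. The state is passed explicitly rather than hidden in a global, and each unit is small enough to verify at a glance:

    #include <stdio.h>

    /* All the state the units manipulate, visible in the signatures. */
    typedef struct { long total; long count; } RunningMean;

    /* Small enough to inspect for correctness in one look:
       no hidden globals, no mutation outside the value passed in. */
    static RunningMean observe(RunningMean m, long sample) {
        m.total += sample;
        m.count += 1;
        return m;
    }

    static double mean(RunningMean m) {
        return m.count ? (double)m.total / (double)m.count : 0.0;
    }

    int main(void) {
        RunningMean m = { 0, 0 };
        m = observe(m, 10);
        m = observe(m, 20);
        printf("%f\n", mean(m));   /* 15.000000 */
        return 0;
    }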


I don't think we're going to agree on this, so I'll just say that I do grok the approach you're advocating: I used to think like you, and I've deliberately migrated away from it. I used to chunk everything into 5-ish-line functions that were very clean and very carefully named, encapsulating everything in clean objects with clearly defined boundaries, etc. I moved away from that consciously.

I don't work in low-level or embedded (although I descend when necessary). My current project is a desktop accessibility application.

Like, I can boil a lot of our disagreement down to this:

> 100 interacting units is usually way too many.

I don't think this is true. It's dogma.

First, they aren't all interacting. Lines in a function don't interact with every other line (although you do want to bear in mind the potential combinatorial complexity for the reader). But more specifically: 100-line functions are absolutely readable most of the time, provided they were written by someone talented. The idea that they aren't is, in my opinion, wrong. And they give you far more implementation flexibility, because they don't force you into a structure defined by clean barriers; they let you write the most natural operation for the underlying data structure.

Granted, you often won't be able to unit-test that function as easily, but unit tests are not the panacea everyone makes them out to be, in my opinion. Functional/integration tests are usually significantly more informative and they target relevant bugs a lot more effectively - partly because the surface you need to cover is much smaller with larger units, so you can focus your attacks.


> 100-line functions are absolutely readable most of the time, provided they were written by someone talented.

Readable, sure. Easily inspected for correctness? Not in most cases. The 100 lines won't all interact, but you don't know that until you look. So much mental effort is spent navigating the 100 lines: matching braces; finding where variables are defined, where they are in scope, and whether they are mutated elsewhere within the function; comprehending how state changes as the lines progress; finding where errors can occur and ensuring they are handled within the right block and that control flow continues or exits appropriately; and so on. So little of this is actually about understanding the code's function; it's about comprehending the incidental complexity due to its linear representation. This is bad. All of this incidental complexity makes it harder to reason about the code's correctness. Most of these incidental concerns can be eliminated through the proper use of abstractions.

The fact is, code is neither written linearly nor executed linearly. Why should it be read linearly? There is a strong conceptual mismatch between code's representation as linear files and its intrinsic structure as a DAG. Well-structured abstractions help move the representation towards that intrinsic DAG structure. This is a win for comprehension.

>Functional/integration tests are usually significantly more informative and they target relevant bugs a lot more effectively - partly because the surface you need to cover is much smaller with larger units, so you can focus your attacks.

We do agree on something!


Honestly, this characterisation doesn't ring true to me at all. I find long functions much easier to read, inspect and think about than dutifully decomposed lasagne that forces me to jump around the codebase. But also, like... scanning for matching braces? Who is writing your code? Indentation makes that extremely clear. And your IDE should have a number of tools for quickly establishing a name's uses and scope.

The older I get, the more I think the vast majority of issues along the lines of "long code is hard to reason about" come down to incompetent programmers being let loose on the codebase. Comment rot is another one: who on earth edits code without checking and modifying the surrounding comments? That's not an inherent feature of programming to me; it's crazy. However, I absolutely do see comment rot in lasagne code - because the comments aren't proximate to the algorithm.

With regard to the idea that abstractions inherently misrepresent, I'll defer to Joel Spolsky for another point:

https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...



