Abstraction: Not what you think it is (pathsensitive.com)
163 points by tempodox on March 30, 2022 | 127 comments



Basically functions are not abstractions, the data structures that functions act on are the real abstractions. Because the data structures have been planned out in a particular way, functions can manipulate them as if they were manipulating concrete entities in the business domain. Here's a bunch of relevant quotes talking about that idea (credits to https://medium.com/webdevops/data-structures-548cbea9c520)

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

- Linus Torvalds

"Fold knowledge into data, so program logic can be stupid and robust."

- Eric S. Raymond

"Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming."

- Rob Pike

"Representation is the Essence of Programming"

- Fred Brooks

"One of the key attributes of a "Good" representation, is the ease in which it is transformed into other valid information the computer can process."

- https://www.cs.utah.edu/~germain/PPS/Topics/information_repr...


It seems like you missed the point the article was trying to make: an abstraction is a mapping from a complex world to a simpler world, which preserves the ability to perform many useful operations.

Data structures can be used as part of an abstraction. So can functions. But neither one is the abstraction.

In the case of data structures, you might use a particular data-structure as the domain of the simpler world, but it's the mapping itself that is the abstraction, and that mapping is not part of the code, it's a conceptual thing that must be documented.

All of those quotes say how important data and its representation is to a programmer, but that doesn't equate to saying data structures are abstractions, because abstractions aren't the only important thing. I'm not sure they're even the most important thing.


You could say that an abstraction is a decision about which details of the complex world are important, or even expressible. Problems are sometimes intractable just because the wrong decision's been made.


My feeling is that abstractions can also be (potentially) more. E.g. good abstractions are themselves composable in expressive and new ways, which gives programmers the ability to reach goals faster, easier and with less risk and maintenance cost.

Building abstractions can be very much like creating a new language.


Still, I bet Laplace's demon could get by just fine without them, and maybe even spot clever optimizations that a human never would. After all, there's no such thing as a free lunch.


> After all, there's no such thing as a free lunch.

I think a lot depends on where your cost goes. E.g. you could create hyper-optimized code that performs incredibly well, but it is also hard to maintain, next to impossible to comprehend and cannot adapt to change very well.

I believe it is possible to create code that does well on all fronts (readable, maintainable, adaptable, performant); it is just a much harder search task (you pay for it with developer time/experience).


Hm, every abstraction comes with costs and benefits. For information theoretic reasons ("no free lunch folklore"), the costs can never be completely eliminated.

The usual way to make abstractions useful is to choose a restricted problem space, compressible enough to offset the abstraction's overhead.


And then a lisp programmer parachutes in and points out that functions are absolutely a kind of data structure, other functions can act on them, and whether they're part of your business domain depends on how you've defined your business domain.

And then a data scientist with expertise in differentiable programming zooms in on a motorcycle to back them up.

And then a mathematician with expertise in linear algebra time travels in from the 19th century to join the party, although they're admittedly a bit lost and confused and not much help in this particular conversation because they're not familiar with modern jargon.

Which isn't to say that those quotes aren't valid or don't express concepts worth knowing and internalizing. But they're practical statements that apply to a certain way of looking at things. A very useful one, to be sure, but not the only one. The distinction between functions and data has a tendency to disappear upon close examination.

(Almost as if the concepts of "function" and "data structure" were themselves abstractions.)
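
To make the lisp programmer's point concrete, a minimal C++ sketch (all names illustrative): functions stored in a data structure and acted on as ordinary values.

    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        // Functions as data: stored in a structure, keyed by domain terms.
        std::map<std::string, std::function<double(double)>> tax_rules{
            {"flat",        [](double x) { return x * 0.20; }},
            {"progressive", [](double x) { return x > 1e5 ? x * 0.40 : x * 0.25; }},
        };
        // Other code acts on the functions exactly as it would on data.
        for (const auto& [name, rule] : tax_rules)
            std::cout << name << ": " << rule(200000.0) << '\n';
    }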


Functions are abstractions in time; data structures are abstractions in space. For example, the same abstraction looks like binary search when described as a function and like a binary tree when described as a data structure. What binary search looks like by itself is unthinkable to us, because anything we can think about has to be projected onto space or time.

Edit. If the vertical axis is time and the horizontal plane is space, then the floating something in the middle is the 'abstraction', its shadow on the plane below is its data structure and its projection onto the vertical axis is its function. The need to compress everything into the 1-dimensional time is what creates those for-loops and ifs. The art of finding the right data structure is the art of finding the right angle at which the 'abstraction' drops a good descriptive shadow.
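
A small sketch of that projection, using only the C++ standard library: the same ordered-lookup idea, once as a function over a sorted array and once as a tree-shaped data structure.

    #include <algorithm>
    #include <cassert>
    #include <set>
    #include <vector>

    int main() {
        // Projected onto time: binary search as a function over a sorted array.
        std::vector<int> sorted{1, 3, 5, 7, 9};
        bool found_fn = std::binary_search(sorted.begin(), sorted.end(), 7);

        // Projected onto space: the same idea as a balanced binary tree.
        std::set<int> tree{1, 3, 5, 7, 9};
        bool found_ds = tree.count(7) == 1;

        assert(found_fn && found_ds);
    }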


Computers indeed look more and more like black holes. It is not unexpected, then, that one feels the need now to talk about structurefunctions that live in spacetime.


We might call a structurefunction an object. Maybe if we made this programming paradigm object oriented we could solve some real-world problems.

[edit] spelling [/edit]


"What is "data" without an interpreter" - Alan Kay https://news.ycombinator.com/item?id=11946764

You've failed to account for the interpreter and the hardware it runs on, which provide the operational properties necessary for human use, i.e., performance.


But this is more about shifting focus. You still need both data and functions, even if you make data richer and functions more general and simple. The latter still carry or encode an essential part of how you model information. Especially if we consider this quote here:

"Syntax without representation is tyranny!"

- Gerald Jay Sussman


Functions are abstractions of the business logic. If they weren't then you couldn't use them over and over again. It's abstracting by exposing a simple interface that then does more under the hood, and this is repeated layer after layer, from functions to libraries.

The decision of what a function will do over and over again abstracts the layer below, which, if it weren't for the function, would have repetition of some kind. It's taking those repeated bits and factoring out the differing elements, reducing them to a function call.

That said, I understand where this perspective is coming from, and perhaps "refinement" is more descriptive of a more practical approach.


"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." -- Fred Brooks, The Mythical Man Month (1975)


> Basically functions are not abstractions, the data structures that functions act on are the real abstractions

That distinction sounds mildly problematic when one can turn a data structure into a function.
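
For instance (a minimal C++ sketch, all names hypothetical), wrap a table in a closure and callers can no longer tell whether the mapping is stored or computed:

    #include <functional>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    // A data structure repackaged as a function.
    std::function<int(const std::string&)> as_function(
            std::unordered_map<std::string, int> table, int fallback) {
        return [table = std::move(table), fallback](const std::string& key) {
            auto it = table.find(key);
            return it == table.end() ? fallback : it->second;
        };
    }

    int main() {
        auto port_of = as_function({{"http", 80}, {"https", 443}}, -1);
        std::cout << port_of("https") << '\n';  // prints 443
    }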


> - Linus Torvalds

> "Fold knowledge into data, so program logic can be stupid and robust."

> - Eric S. Raymond

> "Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming."

> - Rob Pike

> "Representation is the Essence of Programming"

> - Fred Brooks

> "One of the key attributes of a "Good" representation, is the ease in which it is transformed into other valid information the computer can process."

Ah, they're advocates for strongly-typed programming and they didn't even know it!


> Basically functions are not abstractions, the data structures that functions act on are the real abstractions

Maybe, but this is not at all what the article is arguing.


Why don’t they define data structures if that’s so important? Lol. I get what they mean, but interpretation isn’t really a value in programming. Why not just explain it all the way?


"It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." - Alan Perlis


Cannot agree with that. Here you get the OO way of thinking, combining data and code logic. However, it's totally legitimate to say that functions wrap up procedures or transformations.


But if you're writing a compiler, the program becomes the data ...


> the data structures that functions act on are the real abstractions

I think type level functions, like Functor or Monad, also qualify as abstractions.


Religious devotion to abstraction is a common failing in younger coders. Abstraction performs a valuable service when it reliably masks details irrelevant to your task. It is actively harmful when it conceals from you details important to your task.

A common example of a harmful abstraction, in C and C++, is a typedef of a pointer to an opaque name. The typedef conceals, when you read code using it, that the type is really a pointer, and so subject to all the excess operations provided for pointers in those languages. But one thing every C and C++ programmer always needs to know about a type they are working with is whether it is a pointer.
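
A sketch of the pattern being criticized, with hypothetical names:

    /* widget.h (hypothetical) */
    typedef struct Widget* WidgetRef;   /* conceals that WidgetRef is a pointer */

    WidgetRef widget_create(void);
    void widget_destroy(WidgetRef w);

    /* A reader seeing 'WidgetRef w' can still copy it freely, compare it,
       set it to NULL, or keep using it after widget_destroy(w) -- all the
       pointer behaviors the typedef hides. */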

Every abstraction has an unavoidable cost. The value it provides has to exceed that cost for it not to be a liability. Usually the main cost is that it is not as comprehensively documented as its creator imagines (if at all), and the user cannot trust what it will do without tracing through and understanding what details and pitfalls it hides. The more comprehensive its documentation, the more time it takes to read and understand, and the more likely it is to be, or later become, inaccurate.

Thus, part of the job of every programmer is to distrust abstractions. An abstraction works on cases it has been seen to work on, but often cannot be trusted otherwise. This is what makes Standard Library abstractions valuable: you have confidence the implementation has been verified to match the specification, and the specification (1) has been carefully designed to avoid pitfalls and (2) documents them, where it cannot. J. Random Abstraction in your local library rarely gets that much attention.

Make your abstractions earn their keep.


>A common example of a harmful abstraction, in C and C++, is a typedef of a pointer to an opaque name. The typedef conceals, when you read code using it, that the type is really a pointer, and so subject to all the excess operations provided for pointers in those languages. But one thing every C and C++ programmer always needs to know about a type they are working with is whether it is a pointer.

Nah, I don't agree at all. Renamed generic pointers are actually an extremely useful signal. They say "you should not concern yourself with what this pointer points to. You should only be using it through the library's functions. Do anything else with it at your own peril". Also, no, C and C++ programmers don't really care whether a value of some novel type is actually a pointer or not (C++ programmers much less so). They only care about what size it is to know how to pass it around and whether it needs to be cleaned up in some specific way. If I tell you that when you're done with a fluor_t you should pass it to release_fluor(), do you need to know whether fluor_t is a pointer or a struct, or whether release_fluor() is a function or a macro?
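
For concreteness, a sketch of such an API using the commenter's hypothetical names; whether fluor_t is a pointer, a struct, or an integer is deliberately invisible to callers:

    /* fluor.h -- hypothetical. Callers never learn what fluor_t really is. */
    typedef struct fluor_impl* fluor_t;   /* could equally be a struct or an int */

    fluor_t make_fluor(void);
    double  get_fluor_mass(fluor_t f);
    void    release_fluor(fluor_t f);     /* the one cleanup rule callers must know */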


Anywhere it would have been useful to "rename a pointer", it would be more useful to wrap it in a type that defines exactly the operations intended, and none that are not.

If users have to call release_fluor(), your abstraction has failed to earn its keep.
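
One way to do that in C++, assuming the thread's hypothetical fluor API, is a minimal owning wrapper; release_fluor() then disappears from user code entirely:

    #include <memory>

    struct fluor_impl;                    // opaque, defined in the library
    fluor_impl* make_fluor(void);
    double get_fluor_mass(fluor_impl* f);
    void release_fluor(fluor_impl* f);

    class Fluor {
    public:
        Fluor() : p_(make_fluor(), &release_fluor) {}
        double mass() const { return get_fluor_mass(p_.get()); }
        // No arithmetic, no manual release: only the operations intended,
        // with cleanup handled by the destructor.
    private:
        std::unique_ptr<fluor_impl, void (*)(fluor_impl*)> p_;
    };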


Are you talking about a C or C++ API? If C++ then yes, I agree that would not be an idiomatic API. If C, then no, that's a perfectly typical API for the language.


If you will need to release it, then you also need to know it is a pointer that would afterward refer to freed storage.

Programmers understand that referents of pointers have lifetime that must be managed. Hiding that quality of your special-snowflake type does nobody any good.


>If you will need to release it, then you also need to know it is a pointer that would afterward refer to freed storage.

You really don't. If the documentation says that after calling release_fluor(), the fluor_t that was passed to it can't be passed to get_fluor_mass(), it doesn't matter whether fluor_t is a pointer or not.

I'm sorry, but you're just plainly wrong.


If you are relying on everybody who reads code that uses your type to have thoroughly scoured all the documentation, you are Doing It Wrong.

Programmers understand pointers and pointer semantics: when they see a pointer, they know what is required of them. An opaque type that has pointer semantics you can't tell without reading up on documentation is, exactly, an abstraction that costs more than it delivers.


> If you are relying on everybody who reads code that uses your type to have thoroughly scoured all the documentation, you are Doing It Wrong.

As fluoridation indicated, this is the norm in C programming. By means of example, in the OpenCL C library, bitfields all use the same type. You have to consult a function's documentation to know what you're allowed to pass. [0]

I do agree that the 'Ada-style' approach to types is far superior to the 'C-style', but Doing It Wrong seems a bit strong, unless you want to condemn all C code.

With the right library you can use the 'Ada-style' in C++. Specifically, using a 'strong typedef' library which prevents implicit conversions, such as [1], but not Boost's strong_typedef which permits implicit conversion back to the underlying type. Unfortunately this kind of thing isn't possible in C.

> Programmers understand pointers and pointer semantics: when they see a pointer, they know what is required of them

Not really. Should you call free on the pointer once you're done with it? The type alone doesn't tell you. You need to consult the documentation to know how you are meant to use the value, whether or not a pointer type is used.

[0] https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenC...

[1] https://github.com/foonathan/type_safe/
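
For illustration, a hand-rolled approximation of the idea (a sketch, not the API of [1]): wrapping values in distinct structs blocks the implicit conversions that Boost's strong_typedef would permit.

    struct Meters  { double value; };   // distinct types, same representation
    struct Seconds { double value; };

    double speed(Meters d, Seconds t) { return d.value / t.value; }

    int main() {
        speed(Meters{100.0}, Seconds{9.58});      // OK
        // speed(Seconds{9.58}, Meters{100.0});   // error: no implicit conversion
    }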


People know they need to find out whether and when they can or must free a pointer. If it looks enough like a file descriptor, they know they might need to call close() on it. For other types, there are no standard conventions.


Okay. Have fun using C APIs without reading documentation and wondering why your programs keep crashing, I guess. What else can I say?


I know why programs other people write keep crashing. They are coding them in C (or bad C++; but I repeat myself), and are relying on abstractions that conceal exactly that which they should instead expose.


Maybe the data type isn't a pointer, maybe it's an integer handle to a kernel object. And you may still need to free/release it even though it's not a pointer at all.

There are plenty of APIs that have similar semantics (connect/disconnect) that must be followed. And even if something is a pointer doesn't mean you can just free() it -- you might need to use whatever method the API provides to get proper operation.

This is the main abstraction mechanism in C programs -- the abstract data type. A piece of data you pass around but never know the innards of.


File descriptors are an example of such a type. In good code they are carefully kept equal to -1 when not open, and are carefully identified, and not leaked.

When something has reference semantics but isn't an actual pointer, that is an extra fact that the reader of code needs to know in order to reason correctly about it. That is, thus, a burden above what would be needed for an actual pointer. It is a burden that is harmful to impose entirely unnecessarily.


Pointers aren't special, and memory is not the only resource that requires managing.


Pointers are special. They have implicit conversions, and a wide variety of pre-defined operations, typically only a few of which are correct for any particular use.

Managing any resource is trivial with destructors. If you need to manage a resource, a pointer is a poor way to do it.

In C, you have a poverty of choices, but people are anyway used to being careful around pointers.


I feel that a large fraction of the complexity of C++ comes from the imperative to abstract away the 'pointerliness' of pointers. If I have understood the article correctly, it is an example of abstraction by anti-unification (over pointer and value-semantic variables), and it works very well in the Standard Library.

In this view, there is something ironic about how many of the complexities of the language came about to enable the abstractions of the Standard Library, which greatly simplify writing applications in the language!


This is a novel approach to designing a programming language that, I believe, came along with C. Older languages, like, for example, PL/I, tended to put much of the standard functionality in the language itself; C, C++, Haskell, etc., on the other hand, decided to externalize everything except for the "core" facilities; the benefit of this is of course that a developer can now, theoretically, use the language to implement anything, but that comes at the expense of the language being conceptually more complex. (Even a simple language like C is considered complex by many due to its excessive reliance on the use of pointers and the manual memory management.)


The disadvantages being that it will never be as clean, as easily optimizable, or as debuggable as one implemented in the language itself.

UFCS is better than C++ ranges; built-in sum types are better than std::variant, etc.


Every user-level feature that you cannot implement about equally well in a library as in the core language betrays a weakness in your core language.

All languages have weaknesses, and so end up with core features. Sometimes the language evolves to the point where the core feature is not needed anymore because a library does equally well, or sometimes better. A good example is the C++ Standard Library std::array, which is better than a built-in C-style array in every way except being a little longer to type.
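
A minimal illustration of the claim (a sketch, nothing beyond the standard library): std::array is pure library code, yet it adds value semantics that a built-in array lacks.

    #include <array>
    #include <cassert>

    // Pass and return by value: impossible with a built-in int[3].
    std::array<int, 3> doubled(std::array<int, 3> a) {
        for (int& v : a) v *= 2;
        return a;
    }

    int main() {
        std::array<int, 3> xs{1, 2, 3};
        assert(doubled(xs)[2] == 6);   // bounds-checked access also exists: xs.at(2)
    }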


Something that showed up in 1972 cannot still be called "novel" 50 years later.


There is nothing ironic about a language that provides unlimited power, including the ability to restrict the power of things built with it. That is the whole point of language design. At least, languages meant for professionals.

Many languages have strictly limited power, in order to keep non-professionals from getting into trouble.


> A common example of a harmful abstraction, in C and C++, is a typedef of a pointer to an opaque name. The typedef conceals, when you read code using it, that the type is really a pointer, and so subject to all the excess operations provided for pointers in those languages. But one thing every C and C++ programmer always needs to know about a type they are working with is whether it is a pointer.

C (and C++) programmers only need to know if a type is a pointer if they are meant to know it is a pointer. Otherwise very often a foo_t is meant to be used via manipulation functions and it being a pointer or an integer or whatever else is a detail that the programmers do not need to care about. This is a very common approach for libraries when trying to make a shared library with a stable ABI, though it is also common when you want to isolate accessing the underlying data of whatever a "foo_t" represents to a single known module.

Often you see "foo_t*" instead of "foo_t" (with foo_t being an undefined struct), but since "foo_t" is unusable by itself, it is always stored and passed around as a pointer, and thus the same (incredibly few, and irrelevant in practice) pitfalls would apply anyway. So in reality it all boils down to personal stylistic preference whether you want to litter all those stars across a project or not. Personal preferences like that have nothing to do with youthfulness or even experience.


I.e., you lack the experience to understand that concealing pointer semantics of a type is harmful.


> I.e., you lack the experience to understand that concealing pointer semantics of a type is harmful.

I have been working with C codebases for decades and have more than enough experience thank you.

You didn't really reply to what I wrote, though.


Suffice to say, there is way too much bad code out there, and we all pay.


Are HANDLEs as used by Win32 an example of what you are talking about? They may be implemented as a pointer to something, but that's not how you use it.


Yes, HANDLEs are the worst kind of abstraction: they hide everything you need to know, and provide no corresponding benefit.


Believe me, you do not want to know what is hidden behind a HANDLE!


The nice thing about a HANDLE is they can be a pointer to kernel space, a pointer to something in user space, or an opaque index into an internal table of managed resources. If it was a pure pointer, then 0 has to be avoided as it can be mistaken for "no value" by users. Of course, you replace that with knowing to check for INVALID_HANDLE_VALUE.

IMO, HANDLEs are probably a bit too opaque for my taste, as you have to know what operations you should be using with them. I'd prefer opaque structs if I had to choose.


That is not, in fact, a nice thing. It is exactly what you would better have different types for, that implicitly do the right thing, and only ever do the right thing.

HANDLE is the worst of all possible worlds.


On further reflection I agree.

An opaque struct confers the ability to know the set of valid operations involving said type, while not giving up any advantages of a HANDLE. Issues of indeterminate size can be worked around with a single void*/uintptr_t field.
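
A minimal sketch of that design, with hypothetical names: one opaque struct per resource kind, each a single pointer-sized field, so mixing them up becomes a compile-time error instead of a runtime surprise.

    /* Distinct opaque handle types instead of one HANDLE. */
    typedef struct { void* impl; } Event;
    typedef struct { void* impl; } Mutex;

    void signal_event(Event e);
    void lock_mutex(Mutex m);

    /* Passing a Mutex to signal_event() now fails to compile; with a
       shared HANDLE type it would compile and fail at runtime. */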


You are not wrong; the problem is, "different types" are language-specific, whereas HANDLE is a primitive (machine-level) type that is not and thus can be easily used from any language.


I don't even want a HANDLE.


Yeah, for example passing such an opaque typedef as const is a common trap and very often a source of surprise (even for the author of the API), something like:

    struct Opaque {
        int data;
    };

    typedef Opaque *OpaqueRef;

    void bar(const OpaqueRef ref) {
        ref->data = 42; // compiles fine, ref is 'Opaque *const'
    }

    void baz(const Opaque *ref) {
        ref->data = 42; // does not compile, ref is 'const Opaque *'
    }


In my experience, beginners tend to underabstract, and it's the intermediately-experienced ones that go too far in the other direction. The most experienced ones are somewhere in the middle and closer to using abstractions where they make sense. Unfortunately, the majority of the community and industry is in the middle group.


"the majority of the community and industry is in the middle group"

I.e., "intermediately-experienced" and younger than the "most experienced". As I said. You are unlikely to find yourself relying on a library written by a beginner.


Yep, my experience as well. I'd say (in keeping with the article's terms) that the intermediately-experienced tend to overuse indirection, but underuse abstraction.


There is something about pointers. Pointers are special. But there isn't, and they aren't. They are types (or values) just like any others. In C++, you can say either auto p{...}; or auto* p{...}; and I always use the former in order to make code as much "abstract" as possible.


Every abstraction has a cost. If it is not delivering more value than that cost, it is a net liability. Saying "auto p = " when you could say "auto* p = " saves you practically nothing, but costs the reader, potentially, a lot. Somebody reading the code who doesn't instantly understand without a bunch of poking around that p has reference semantics is at a marked disadvantage. You have stolen that advantage from that reader.

There is a reason that the language permits "auto& r = " and "auto* p = ". That reason is, specifically, to help prevent errors. Failing to spend a single keystroke to help prevent errors is false economy.


Except auto& does not prevent errors, and it is a source of innumerable problems, because '&' is too easy to forget.

auto* does not add anything to making code more reliable, either; it just makes code tied to a particular implementation detail (e.g. I could redefine things using my own pointer-like object, and the * would simply stand in the way for no good reason whatsoever.)


I think ncmncm is a total scrub, but I will say two things:

First, both adding an ampersand after auto when you shouldn't have and not adding it when you should have can lead to logic errors of different kinds. Sometimes either one is fine, sometimes you have to use the correct one.

Second, auto * vs auto by itself has no effect on the generated code, but it can lead to a more robust source. Imagine that you get a pointer from some function and you perform some operations with it, assuming it actually is a pointer in a way that would be invalid if it wasn't. If the function signature later changes to return a pointer-like object, you'd want the compiler to give you an error to know to fix that code. In other words, you should use auto if you really don't care at all if the value is a pointer or not, and you should use auto * if you're assuming it's a pointer.
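
A minimal sketch of that second point (lookup and Widget are hypothetical names):

    struct Widget { int id; };
    Widget* lookup(int id);      // today returns a raw pointer

    void use() {
        auto* p = lookup(1);     // documents and checks "this is a pointer"
        auto  q = lookup(2);     // accepts whatever lookup() returns
        // If lookup() is later changed to return std::shared_ptr<Widget>,
        // the auto* line becomes a compile error (the assumption is caught);
        // the plain auto line keeps compiling silently.
        (void)p; (void)q;
    }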


It has always been permitted to not make your code as clear as it could be, and always will be. Each of us may choose to help the person reading our code, but not all of us care to.


The discussion about abstraction in this article is interesting and thought provoking. It's useful to think about the different things we call abstractions. Very much worth a read.

But I feel like it tries a bit too hard to re-define what abstraction means ("only this subset of abstractions are real abstractions"), instead of just putting "abstract interpretation" next to it. This idea can stand on its own without trying to re-define the world.

Secondly I understand the discussed notion of abstraction as a domain model, which is information (data and functions) _about_ the world. I like the term model better here than "idealized mapping" because the former implies a bit more that there are assumptions about the data and usage context of said model (often implicit, unstated) and that it is incomplete.


From the article:

> But we've had a pretty good definition of abstraction since 1977, originally in the context of program analysis, and — I claim — it actually translates quite well into a precise definition of “abstraction” in engineering.

So I don't think the author is saying that all these other people are "wrong", but rather that this one particular definition seems to capture the underlying essence of what we intuitively think of when we say "abstraction", and the rest of the "hodgepodge of different concepts" break down if you analyze them a bit deeper (as the author did in the "Not Abstraction" section).


Yes and I don't agree that this is particularly fruitful. I might have completely misunderstood it but here we go:

The common theme in the Not Abstraction section is: these things are used to build abstractions and are not necessarily abstractions themselves. This is not only nitpicking, it's also false. Programming language features like functions and interfaces are already abstractions, even in the narrow sense that the article describes. They are an idealized mapping onto something concrete (register machines, VMs, etc.) and provide precision and soundness, and are based on a specification.

Again, I very much like the core idea presented here, but I think it is completely unnecessary to make the claims about what is and isn't "true" abstraction.

Take this quote:

"Abstractions are separate from the code, and even from the abstract domain. It does not make sense to say that the bookTable function or anything else in this file “is” the abstraction, (..)"

The function _is_ not the abstraction because abstractions are entirely conceptual? Is it a way to communicate an abstraction then? What about the actual machine code the compiler produces from that function, is there some abstraction?

None of this helps me. The interesting part is that code is used to model information, it's not the world itself or even tries to be.

Then this:

"Feel free to call numbers an abstraction of the hardware, but be prepared to switch to this precise terminology when there's tension on the horizon."

Simple abstractions that are not discussed are names and signs. These are abstractions in the very real sense and they are very simple on their own. But they are entirely contextual and are allowed to mean different things. For example the term "abstraction".


I think you've misunderstood it.

Let's design a system that allows us to run virtual computers on a single hardware platform. In some part of this system we need to map real hardware addresses to virtual ones in order to isolate the different virtual computers.

Now you might need a way to take a real, logical address to a memory location on this machine and get a virtual address. You might have to run this system on many different hardware platforms and you only know the platforms you will run on today (or the next few months). Almost without thinking about it you might choose to use a virtual method on a class to define a function signature for this operation. That way folks implementing this system for different hardware platforms can create a sub-class and fill in a concrete definition of the virtual method. Now users of the system can translate an address and don't have to think about the underlying hardware platform they're running on! Abstraction!

Only it's not an abstraction.

If you attempt to formalize what it means to map a hardware address to the virtual address you will find it quite difficult because your interface provides no information about hardware addresses. In other words the virtual method does not put in all of the details of the concrete implementation and so we cannot maintain an invariant in our specifications that would ensure that we only translate valid addresses, or that translation is deterministic -- properties that would be important for users of the abstraction to truly be able to ignore the underlying complex domain. If we could write such a specification for this virtual method though, it would be very complicated, and that is a good sign this is not a good abstraction.

Abstractions are often separate from the code because they're much more general than a single program. If we came up with a better abstraction in a formal logic like separation logic or using Hoare logic -- we might be able to prove that the properties we care about hold and therefore any program that implements this specification will have the same properties. And there might be several such programs!

So what industry programmers are often saying when they are talking about "abstraction" is "indirection and vagueness." What this article is trying to show is that we can give precise definitions to our abstractions so that we can build new _semantic_ layers that have precise meaning: when I translate a hardware address I get a valid, deterministic virtual address and nothing else.


These are very useful properties to have. And I'd rather use an abstraction where these properties hold than one that just takes care of some plumbing. There is a qualitative difference here.


Ya. My brain is all thumbs when trying to handle the philosophy parts of software.

Both this OC and the wiki article on 'abstraction', w.r.t. software design (e.g. OOAD), make me think of generalization and specialization.

--

My own working definition of 'abstraction' vs 'mental models', because I lack more proper terms:

- abstractions hide details, aka black box

- mental models explain how something works, even if over simplified

An example of a good (quite useful) abstraction is the initial Java Virtual Machine. Its 'interface' (per this OC) is quite good, meaning comparatively small impedance mismatch, and acceptable 'leakiness'. As we've seen over time, driven by optimization, that abstraction is necessarily bypassed or mitigated with JMM, FFI, unsafe stuff, intrinsics, direct memory, etc.

An example of a mental model from my own prior work (because I'm precaffeinated, and I can't think of a better example than Newtonian mechanics, already mentioned):

I used to work in print production, like books and magazines. For bookwork, I created an algorithm for the folding of big sheets of paper into 'signatures' containing printed pages. My app accounted for folds, binding, and cutting. It revealed the manufacturing process, so there could be no mistake. It reduced an Illustrator-like application (ScenicSoft's Preps imposition app) to a simple form. This simplification was possible because I had invented a better mental model.

--

My working definition of abstraction doesn't jibe with anyone else's. I'm open to alternate suggestions. https://en.wikipedia.org/wiki/Abstraction


I think of things in terms of abstractions and models. Abstractions generalize commonalities into a vocabulary that can be used on any entity that is covered by the abstraction. A sequence is an abstraction, but data structures themselves aren't as far as software is concerned. Data structures can be sequences, but you can't create a sequence.

A (logical) model is a simplified representation of the real thing that is still complete enough that it can be used to successfully reason about and accomplish goals with a system. A linked list datastructure is a real thing with a number of possible implementation details, but it still has a straightforward, concrete model of what it is. It's not an abstraction.

A model does not have to be abstract at all; it can be very concrete, even if it in practice hides some underlying details.

It seems common in software to either get lost in abstraction so that you forget the model, or consider the model so rigid that you miss useful, simplifying abstractions.


I really like your description of how models enable reasoning about the problem.

Ya, abstractions vs models. I'll drop the "mental" qualifier, thanks.

> get lost in...

Totally. Too many times, the implementation comes to be treated as the model, completely obfuscating the client's actual problem.


It reminds me of a concept I've been hearing a lot recently: Dictionaries do not define words. They tell you what a word is being used for, but nothing about a dictionary gives any moral or philosophical credence to any particular definition.

In other words, the definition of a word depends on its real-world usage; dictionaries attempt to capture common usages, but no definition of a word can be considered the "true" definition more than any other used and understood application of the word.

So in this case, calling these definitions "imposters" or "not-abstractions" is absolutely wrong. That said, I like the attempt to introduce names for different flavours of abstraction to make discussion easier. That's definitely useful. Just wish it wasn't off the back of incorrectly telling people that their use of the language is incorrect.


There are actually two different schools of linguistics: prescriptive ("language is this") and descriptive ("language is used like this"). A dictionary can be written in one or the other, or some mixture of the two. To say that all dictionaries are purely descriptive is incorrect.


There are - though without going into too much detail about the art of lexicography (of which I'm no expert), natural languages tend to be handled descriptively. (Esperanto, not being a natural language, is exempt from this; French is one of a few natural languages that are notable exceptions, as it has an institute that does prescribe the language - I believe for France French it's the Institut français - in an attempt to fight the influence of English on the language. I believe that French-speaking Canada has a separate institute too.)

[edit: this information might be out of date - looks like there isn't a central forum for this in French any more, however forums do spring up periodically]

But yes, English is - these days - predominantly handled within a descriptive framework, and dictionaries such as Merriam-Webster [1] and the Oxford English Dictionary [2] point out that they only exist to describe usage.

There may be an argument to be made about definitions within scientific fields; but without the dictates of a cohesive authority, it's very hard to claim this as truly prescriptive. It is up to groups of expert practitioners to come to consensus on the definitions. And you can argue whether that is the very definition of prescriptive (because they prescribe the word's usage on the rest of the world) or descriptive (because they as the predominant users of those words are describing the academic usage of the words)

But I have yet to see an English dictionary that claims to be fully prescriptive. If it did, it would have dubious authority to do so. Similarly, I feel like we're so many miles away from a consensus on the Computer Science definition of "an Abstraction" that my original point stands.

[1]: https://www.merriam-webster.com/words-at-play/descriptive-vs...

[2]: https://public.oed.com/blog/quiz-myth-busting-the-oed/


Even descriptive dictionaries imply a certain amount of prescription, as: "prefer using words as they are herein defined, if you care about others being able to understand you".


I would tend to agree. I've heard the "dictionaries only record usage" argument before, and I've always found it unconvincing. The same people who argue for purely descriptive linguistics when you challenge a word usage of theirs for being ambiguous will turn around and correct you when you misspeak.


Fully agreed! And this:

> That said, I like the attempt to introduce names for different flavours of abstraction to make discussion easier. That's definitely useful.

Is the real value of this article.


This is really interesting!

Encapsulation might be a more appropriate word in most cases, since abstraction doesn't tell you what specifically you're abstracting away from.

Coming from OOP, I've always thought of abstraction as just "A thing that hides the real nature of something and presents it as a standard and simplified model, and lets you ignore more things than the non-abstract truth allows".

drawImage(file, x, y) is an abstraction, because I don't need to know how it works or how it knows what decoder to use.

One might even say a toilet is an abstraction, because what goes into it is (presumably) dealt with in a sanitary manner by professionals.

Things like monads and composition are abstractions in the colloquial sense, objects far from some concrete thing like a car or a pen, but using them effectively requires understanding the logic. So in a sense they are leaky, because the act of using them "leaks" the mathematical logic; subjectively, nothing like that comes to mind when I think of abstractions in programming.

They're definitely abstractions, but they are used to reveal and illuminate a program's structure, not to hide it in a black box.


> They're definitely abstractions, but they are used to reveal and illuminate a program's structure, not to hide it in a black box.

In my mind, abstractions are used to reveal the program's interface and intent more than its structure. The functions, data structures, variables, and ultimately the source code reveal the program's true structure.


This seems like an ultimately doomed attempt to redefine the suitcase term "abstraction" to be something more mathematical. But you aren't going to change how people use the word "abstraction" by telling them they're using it wrong. If you need a more precise terminology, it would be better to come up with some different words or phrases and popularize them.


> something more mathematical

It has been done. Category Theory (a.k.a. "abstract nonsense") has been pretty successful describing abstraction using the language of functors.


I've come to think of an abstraction as a potentially lossy transformation of a model into one better suited to the task at hand.


... to one somebody hoped was better suited to the task. That hope is not necessarily realized.

If it is not lossy, it is likely not to be providing net value.


That's not true. Consider the problem of detecting null pointer dereferences. Bucketing pointers into "null" and "not null" is lossless w.r.t. this problem domain. This can be very useful. It's when you stick Top and Bottom into the possible options that you get lossy behavior. That can be useful too, but it isn't strictly necessary.


Binning pointers into "null" and "not null" is, exactly, lossy.


With respect to the domain it isn't.


The domain is, exactly, lossy.


If you aren't considering an abstraction with respect to a specific domain, then every single abstraction is lossy, not just the useful ones. You cannot transform state in any way without loss, because, given enough effort, all properties of state are observable. This makes "lossy" a worthless descriptor under your approach.

Instead, I propose that we consider abstractions with respect to some problem domain. If you have states {null, notnull} then this is lossless but {null, notnull, top, bottom} is lossy.
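
A minimal sketch of the two domains under discussion, in the style of an abstract-interpretation nullness lattice (names illustrative):

    #include <cassert>

    // {Null, NotNull} answers the nullness question exactly; precision is
    // only lost once Top ("could be either") enters the domain.
    enum class Nullness { Bottom, Null, NotNull, Top };

    Nullness join(Nullness a, Nullness b) {   // least upper bound
        if (a == b) return a;
        if (a == Nullness::Bottom) return b;
        if (b == Nullness::Bottom) return a;
        return Nullness::Top;                 // joining Null with NotNull loses information
    }

    int main() {
        assert(join(Nullness::Null, Nullness::Null) == Nullness::Null);
        assert(join(Nullness::Null, Nullness::NotNull) == Nullness::Top);
    }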


Where you put your lossiness is a matter of choice and taste.


You're not wrong.


Interesting. You've made me realize that models (in the scientific sense) are lossy transformations of reality.


Moreover, the point of a model is to be lossy: https://en.wikipedia.org/wiki/On_Exactitude_in_Science


Spherical cows being the obvious example - in some circumstances that might be a perfectly valid modelling assumption.


Yeah, this is how I think about Newtonian Mechanics -- a useful abstraction.


Starting with “point particle.” (By the way, theoretical mechanics has long been considered part of mathematics.)


If asked, I'd've said: "Applied ontology[1]".

When we code, we're philosophizing about information in some language (itself an applied ontology), for the decreasingly worse.

Occasionally one arrives at a coherent, stable product.

[1] https://en.wikipedia.org/wiki/Ontology


What the article describes as the true meaning of "abstraction" is very close to (if not the same) as "interface contract", and it’s correct that the important quality is to enable precise reasoning (of which “abstract interpretation” is one possible form) about the program behavior along that abstraction/interface boundary. Functions have an interface contract, classes have an interface contract, data types have an interface contract, things like REST APIs have an interface contract, and so on.

The fact that some languages, as mentioned in the article, have a language feature called “interface” is somewhat unfortunate, because it makes unclear when someone says “interface” whether they refer to the language feature or to the more general meaning. (For example, every Java class has an interface-in-the-general-sense, just like every class has an implementation, regardless of whether the class happens to implement an interface-the-language-feature or not.) Prior to that, it was always understood that “interface” implies a certain contract, usually one that imposes conditions beyond the mere type signature of the respective entity.

Incidentally, that’s also one benefit of nominal typing (as opposed to structural typing), because it expresses the fact that the interface is usually more than its type signature, and therefore two entities having compatible type signatures are not necessarily compatible in terms of their interface contract, and thus shouldn’t be implicitly convertible/assignable to each other.


A tangent about writing conventions: It pisses me off when a headline tells me that I'm understanding / doing something wrong.

I realize this is subjective, but I hear it as the author arrogantly assuming that their insight or approach is objectively better than those of their entire audience. Without even knowing who their audience members are. They just know they're more correct than you, and they're happy to advertise that.

I'm sure it drives engagement, but it definitely reduces the chance of me reading the article.


He’s talking about what this article calls “anti-unification”, but this Bret Victor post really changed the way I think about abstraction: http://worrydream.com/LadderOfAbstraction/


Looks more like parameterization.


I think of an abstraction as a map from a model space into the problem domain. (A map that is not necessarily injective, and sure enough not surjective.)

I am sure category theory can say much about this.


The whole point of the article is providing a rigorous definition rather than the wishy-washy stuff people usually use. This is independent of category theory. Cousot was working in a different space.


No, I thoroughly appreciate that, it's just that the functorial language of the "abstract nonsense" seems perfect for discussing modeling and abstraction.


The concept of abstraction is simple and clear when you think about programming as primarily the act of teaching other people (future maintainers, other contributors, and my forgetful self) rather than writing instructions for a computer. An abstraction is simply a grouping of human-level concepts. Good abstractions are generally easy to teach and learn.

Some abstractions are more useful/helpful/powerful than others. The usefulness of an abstraction often depends on the domain. Much of the time, the best way to find out the right abstractions is to first create multiple things with no new abstractions, then decide that everyone will save time if you merge some of the concepts you created. It's good to use well-known, tested abstractions early on, but it's risky to create new abstractions prematurely.

Sometimes a function, data structure, interface, or API happens to embody an abstraction very cleanly. When that works out, the abstraction is easy to teach and tends to be more useful. Most abstractions are a bit messier than that, though, which is why abstraction is really a human-level teaching/learning construct rather than any element of programming languages.


I don’t mind the basic interpretation of abstraction that the article mentions, but I generally extend it to also mean that code (not just the coder) doesn’t have to deal with the underlying complexity.

For example, I consider most SDKs to be abstractions.

It’s really just semantics, as I guess we could argue that the “classic” definition applies to code, as well as to coders.


Reading this makes me think of "stratified"[0] design, which I think is a good example of abstraction done right.

[0] - https://medium.com/clean-code-development/stratified-design-...


> "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

That's catchy. But:

The practice of programming is about manipulating a special kind of data (code). And so: "Good programmers worry about code structures and their relationships."

(Even more so in the age of devops, infrastructure as code, etc.)


The lambda calculus definition of abstraction as lambda, I think, matches the idea that abstractions are not found in code. Performance constraints IRL are the key operational gap that the mathematical lambda does not account for - lambda cannot effectively account for hardware execution and the network - and the sole reason (AFAICT) that the implementing code of an IRL software system running on hardware cannot contain true abstraction. Thus, as software practitioners, we should not seek them. Instead, we should seek to minimize LOC, which has a direct, tangible impact on costs and outcomes.


I've come to understand abstraction as a reduction of something complex into something symbolic that captures the bare minimum / absolute essence of the reduced thing, and not necessarily in an obvious way.

In programming, it could be the boundaries of a concept independent of implementation. In visual art, it could be a few lines on canvas that evoke the spirit of the thing referenced. In jazz, it could be strange notes on the peripheries and askew of rhythm that by being what is not, circumscribe what is.


Non-developer here that likes to try to understand CS. I have latched onto this definition from Stanford:

“But fundamentally, computer science is a science of abstraction — creating the right model for thinking about a problem and devising the appropriate mechanizable techniques to solve it.”

http://infolab.stanford.edu/~ullman/focs/ch01.pdf


You could fill pages on the subject of abstraction. While on the subject, it would've been apt for the author to quote or give a nod to one of the pioneers in this field: Barbara Liskov, and the Liskov substitution principle. Not that I have anything against Joel Spolsky, but I'd read Liskov any day over Joel.

So easy to get wrapped up in all the syntactic sugar.


This is a fairly common-sense notion of abstraction (abstraction as abstracting away the details of something), but I don't think it's the whole picture. I'm studying abstraction, and the best formulation of this aspect I've found is abstraction as the degree of complexity encapsulation.


Yea, but this complexity is not inherent to code.


What do you mean? Of course it is, all code is leaky abstractions.


> Abstraction in general is usually said to be something which helps readers understand code without delving into the details.

I agree with this. And to me, this means a good abstraction necessarily allows ambiguity. For example, "number" is an abstraction, but a type that is int32 is not an abstraction. In fact, our not using a meta-layer that allows ambiguity is the reason most abstractions force readers to be concerned with details in order to understand code.

Math abstractions are good and rigorous. That is because math is never concerned with reality. As soon as you deal with reality, math is ambiguous from start to end. For example, what is one? Math throws away all the physical details; that is what allows it to be useful -- one can understand math without ever being concerned with physical details.

The currently popular programming languages are not suitable for abstractions. Our natural languages are good for abstractions. But I think we can probably have something in between, like what we came up with for math.


What? How is int32 not an abstraction? There's no such thing as a "real" int32; it's just a bunch of metal in a box with electricity in it.


Abstraction is about the core idea without distracting details. Your core idea is likely either to have a finite bound (if the core idea includes a bound) or to have no bound (if your core idea is just about number). `int32` has the details of signedness, minimum and maximum bounds, and underflow and overflow behaviors. Those are distracting details, and they are precise details. Thus, it is not an abstraction at all. It's like a specific 3-lb hammer: is that an abstraction of "hammer"?


But an int32 is an abstraction over even lower-level details: alignment, endianness, and all of the compiler and hardware implementations of signedness, boundedness, and overflow behaviors.

There are multiple layers of abstraction. int32 is just one layer lower than you're used to.


That's not the right abstraction to be making. I assume here we are discussing abstractions above the base programming language level, such as structured vs. spaghetti code.


I don't think that's a good assumption. The article above is talking about all abstractions, those built into the programming language, those not part of the programming language, and even those that span multiple programming languages.


These are all very abstract statements. (And you are not wrong.)



For me, to abstract is to elevate a concept to a parent category so that operations are interchangeable with other siblings.

Whether something is an abstraction could be tested by asking: is this thing useful for operating with different child categories in just one place?

When you have an abstraction with just one child category, rather than useful, it is just confusing.

For example, if in your universe there existed just one type of Vehicle, called Car, but you used the Vehicle abstraction anyway.


> Its opposite is unification, which is comparatively never used in programming.

Isn't that just what people usually refer to as "inlining"?


Abstraction is not what you think it is if you think it is not what I think it is.



