Religious devotion to abstraction is a common failing in younger coders. Abstraction performs a valuable service when it reliably masks details irrelevant to your task. It is actively harmful when it conceals from you details important to your task.

A common example of a harmful abstraction, in C and C++, is a typedef of a pointer to an opaque name. The typedef conceals, when you read code using it, that the type is really a pointer, and so subject to all the excess operations provided for pointers in those languages. But one thing every C and C++ programmer always needs to know about a type they are working with is whether it is a pointer.
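
For concreteness, a minimal sketch of the pattern in question (the names here are invented):

    /* The typedef hides that widget_t is really a pointer. */
    typedef struct widget *widget_t;

    widget_t make_widget(void);
    void destroy_widget(widget_t w);

    /* At a call site, nothing signals that a widget_t has pointer
       semantics: it can be null, it can dangle after
       destroy_widget(), and copies of it alias one another. */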

Every abstraction has an unavoidable cost. The value it provides has to exceed that cost for it not to be a liability. Usually the main cost is that it is not as comprehensively documented as its creator imagines (if at all), and the user cannot trust what it will do without tracing through and understanding what details and pitfalls it hides. The more comprehensive its documentation, the more time it takes to read and understand, and the more likely it is to be, or later become, inaccurate.

Thus, part of the job of every programmer is to distrust abstractions. An abstraction works on cases it has been seen to work on, but often cannot be trusted otherwise. This is what makes Standard Library abstractions valuable: you have confidence the implementation has been verified to match the specification, and the specification (1) has been carefully designed to avoid pitfalls and (2) documents them, where it cannot. J. Random Abstraction in your local library rarely gets that much attention.

Make your abstractions earn their keep.




>A common example of a harmful abstraction, in C and C++, is a typedef of a pointer to an opaque name. The typedef conceals, when you read code using it, that the type is really a pointer, and so subject to all the excess operations provided for pointers in those languages. But one thing every C and C++ programmer always needs to know about a type they are working with is whether it is a pointer.

Nah, I don't agree at all. Renamed generic pointers are actually an extremely useful signal. They say "you should not concern yourself with what this pointer points to. You should only be using it through the library's functions. Do anything else with it at your own peril". Also, no, C and C++ programmers don't really care whether a value of some novel type is actually a pointer or not (C++ programmers much less so). They only care about what size it is to know how to pass it around and whether it needs to be cleaned up in some specific way. If I tell you that when you're done with a fluor_t you should pass it to release_fluor(), do you need to know whether fluor_t is a pointer or a struct, or whether release_fluor() is a function or a macro?
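
For concreteness, a sketch of the kind of C API being described, using this comment's hypothetical fluor names:

    /* fluor.h -- the definition of struct fluor lives only in
       fluor.c; callers are expected to treat fluor_t as fully
       opaque and use it only through these functions. */
    typedef struct fluor *fluor_t;

    fluor_t acquire_fluor(void);
    double  get_fluor_mass(fluor_t f);
    void    release_fluor(fluor_t f);  /* the only cleanup callers need */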


Anywhere it would have been useful to "rename a pointer", it would be more useful to wrap it in a type that defines exactly the operations intended, and none that are not.

If users have to call release_fluor(), your abstraction has failed to earn its keep.
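
A sketch of what wrapping it in such a type might look like in C++, reusing the hypothetical fluor API from above; cleanup becomes automatic and no pointer operations leak through:

    #include <memory>

    // Hypothetical C API from upthread:
    typedef struct fluor *fluor_t;
    extern "C" fluor_t acquire_fluor(void);
    extern "C" double  get_fluor_mass(fluor_t);
    extern "C" void    release_fluor(fluor_t);

    // Exactly the intended operations, and nothing else; users
    // never call release_fluor() themselves.
    class Fluor {
    public:
        Fluor() : handle_(acquire_fluor(), &release_fluor) {}
        double mass() const { return get_fluor_mass(handle_.get()); }
    private:
        std::unique_ptr<fluor, void (*)(fluor_t)> handle_;
    };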


Are you talking about a C or C++ API? If C++ then yes, I agree that would not be an idiomatic API. If C, then no, that's a perfectly typical API for the language.


If you will need to release it, then you also need to know it is a pointer that would afterward refer to freed storage.

Programmers understand that referents of pointers have lifetime that must be managed. Hiding that quality of your special-snowflake type does nobody any good.


>If you will need to release it, then you also need to know it is a pointer that would afterward refer to freed storage.

You really don't. If the documentation says that after calling release_fluor(), the fluor_t that was passed to it can't be passed to get_fluor_mass(), it doesn't matter whether fluor_t is a pointer or not.

I'm sorry, but you're just plainly wrong.


If you are relying on everybody who reads code that uses your type to have thoroughly scoured all the documentation, you are Doing It Wrong.

Programmers understand pointers and pointer semantics: when they see a pointer, they know what is required of them. An opaque type whose pointer semantics you cannot discern without reading the documentation is, exactly, an abstraction that costs more than it delivers.


> If you are relying on everybody who reads code that uses your type to have thoroughly scoured all the documentation, you are Doing It Wrong.

As fluoridation indicated, this is the norm in C programming. By way of example, in the OpenCL C library, bitfields all use the same type. You have to consult a function's documentation to know what you're allowed to pass. [0]

I do agree that the 'Ada-style' approach to types is far superior to the 'C-style', but Doing It Wrong seems a bit strong, unless you want to condemn all C code.

With the right library you can use the 'Ada-style' in C++. Specifically, using a 'strong typedef' library which prevents implicit conversions, such as [1], but not Boost's strong_typedef which permits implicit conversion back to the underlying type. Unfortunately this kind of thing isn't possible in C.

> Programmers understand pointers and pointer semantics: when they see a pointer, they know what is required of them

Not really. Should you call free on the pointer once you're done with it? The type alone doesn't tell you. You need to consult the documentation to know how you are meant to use the value, whether or not a pointer type is used.

[0] https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenC...

[1] https://github.com/foonathan/type_safe/
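
A minimal hand-rolled version of the 'strong typedef' idea (this is a sketch, not the type_safe library's actual API):

    #include <cstdint>

    // Distinct single-field wrapper types: no implicit conversion
    // between them, or back to the raw integer, unlike a plain typedef.
    struct DeviceTypeFlags { std::uint64_t bits; };
    struct QueuePropFlags  { std::uint64_t bits; };

    void create_queue(QueuePropFlags) { /* ... */ }

    void demo() {
        DeviceTypeFlags dev{1};
        // create_queue(dev);            // error: wrong kind of bitfield
        create_queue(QueuePropFlags{2}); // OK
        (void)dev;
    }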


People know they need to find out whether and when they can or must free a pointer. If it looks enough like a file descriptor, they know they might need to call close() on it. For other types, there are no standard conventions.


Okay. Have fun using C APIs without reading documentation and wondering why your programs keep crashing, I guess. What else can I say?


I know why programs other people write keep crashing. They are coding them in C (or bad C++; but I repeat myself), and are relying on abstractions that conceal exactly that which they should instead expose.


Maybe the data type isn't a pointer, maybe it's an integer handle to a kernel object. And you may still need to free/release it even though it's not a pointer at all.

There are plenty of APIs that have similar semantics (connect/disconnect) that must be followed. And even if something is a pointer doesn't mean you can just free() it -- you might need to use whatever method the API provides to get proper operation.

This is the main abstraction mechanism in C programs -- the abstract data type. A piece of data you pass around but never know the innards of.


File descriptors are an example of such a type. In good code they are carefully kept equal to -1 when not open, and are carefully identified, and not leaked.

When something has reference semantics but isn't an actual pointer, that is an extra fact that the reader of code needs to know in order to reason correctly about it. That is, thus, a burden above what would be needed for an actual pointer. It is a burden that is harmful to impose entirely unnecessarily.


Pointers aren't special, and memory is not the only resource that requires managing.


Pointers are special. They have implicit conversions, and a wide variety of pre-defined operations, typically only a few of which are correct for any particular use.

Managing any resource is trivial with destructors. If you need to manage a resource, a pointer is a poor way to do it.

In C, you have a poverty of choices, but people are anyway used to being careful around pointers.
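
A minimal illustration of the destructor point: a std::unique_ptr with a custom deleter manages a non-memory resource (here a FILE*) with no manual cleanup:

    #include <cstdio>
    #include <memory>

    // The deleter closes the file; the destructor runs it on every
    // path out of the scope.
    using File = std::unique_ptr<std::FILE, int (*)(std::FILE *)>;

    void demo() {
        File f(std::fopen("data.txt", "r"), &std::fclose);
        if (f) {
            // ... read via f.get() ...
        }
    }   // fclose runs here automatically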


I feel that a large fraction of the complexity of C++ comes from the imperative to abstract away the 'pointerliness' of pointers. If I have understood the article correctly, it is an example of abstraction by anti-unification (over pointer and value-semantic variables), and it works very well in the Standard Library.

In this view, there is something ironic about how many of the complexities of the language came about to enable the abstractions of the Standard Library, which greatly simplify writing applications in the language!


This is a novel approach to designing a programming language that, I believe, came along with C. Older languages, such as PL/I, tended to put much of the standard functionality in the language itself; C, C++, Haskell, etc., on the other hand, decided to externalize everything except the "core" facilities. The benefit is, of course, that a developer can now, theoretically, use the language to implement anything, but that comes at the expense of the language being conceptually more complex. (Even a simple language like C is considered complex by many, due to its heavy reliance on pointers and manual memory management.)


And the disadvantage is that it will never be as clean, as easily optimizable, or as debuggable as the same feature implemented in the language itself.

UFCS is better than C++ ranges, built-in sum types are better than std::variant, etc.


Every user-level feature that you cannot implement about equally well in a library as in the core language betrays a weakness in your core language.

All languages have weaknesses, and so end up with core features. Sometimes the language evolves to the point where the core feature is not needed anymore because a library does equally well, or sometimes better. A good example is the C++ Standard Library std::array, which is better than a built-in C-style array in every way except being a little longer to type.
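
A minimal illustration:

    #include <array>

    void demo() {
        int c[3] = {1, 2, 3};           // decays to int* when passed around
        std::array<int, 3> a{1, 2, 3};  // its size travels with the type

        std::array<int, 3> b = a;       // whole-array copy just works
        bool same = (a == b);           // element-wise comparison
        (void)c; (void)same;
    }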


Something that showed up in 1972 cannot still be called "novel" 50 years later.


There is nothing ironic about a language that provides unlimited power, including the ability to restrict the power of things built with it. That is the whole point of language design. At least, of languages meant for professionals.

Many languages have strictly limited power, in order to keep non-professionals from getting into trouble.


> A common example of a harmful abstraction, in C and C++, is a typedef of a pointer to an opaque name. The typedef conceals, when you read code using it, that the type is really a pointer, and so subject to all the excess operations provided for pointers in those languages. But one thing every C and C++ programmer always needs to know about a type they are working with is whether it is a pointer.

C (and C++) programmers only need to know if a type is a pointer if they are meant to know it is a pointer. Otherwise, very often a foo_t is meant to be used via manipulation functions, and its being a pointer or an integer or whatever else is a detail that programmers do not need to care about. This is a very common approach when trying to make a shared library with a stable ABI, though it is also common when you want to isolate access to the underlying data of whatever a "foo_t" represents to a single known module.

Often you see "foo_t *" instead of "foo_t" (with foo_t being an undefined struct), but since "foo_t" is unusable by itself, it is always stored and passed around as a pointer, so the same pitfalls (incredibly few and irrelevant in practice) would apply anyway. In reality it all boils down to personal stylistic preference, whether you want to litter all those stars around a project or not. Personal preferences like that have nothing to do with youthfulness or even experience.
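
For reference, a sketch of the stable-ABI pattern being described (names invented):

    /* foo.h -- only a forward declaration is public, so the layout
       of struct foo can change without breaking the ABI. */
    struct foo;
    typedef struct foo foo_t;

    foo_t *foo_create(void);
    int    foo_get_value(const foo_t *f);
    void   foo_destroy(foo_t *f);

    /* foo.c is the single module that defines struct foo and
       touches its members. */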


I.e., you lack the experience to understand that concealing pointer semantics of a type is harmful.


> I.e., you lack the experience to understand that concealing pointer semantics of a type is harmful.

I have been working with C codebases for decades and have more than enough experience, thank you.

You didn't really reply to what I wrote, though.


Suffice to say, there is way too much bad code out there, and we all pay.


Are HANDLEs as used by Win32 an example of what you are talking about? They may be implemented as pointers to something, but that's not how you use them.


Yes, HANDLEs are the worst kind of abstraction: they hide everything you need to know, and provide no corresponding benefit.


Believe me, you do not want to know what is hidden behind a HANDLE!


The nice thing about a HANDLE is that it can be a pointer into kernel space, a pointer to something in user space, or an opaque index into an internal table of managed resources. If it were a pure pointer, then 0 would have to be avoided, as it can be mistaken for "no value" by users. Of course, you replace that with knowing to check for INVALID_HANDLE_VALUE.

HANDLEs are probably a bit too opaque for my taste, though, as you have to know what operations you should be using with them. I'd prefer opaque structs if I had to choose.


That is not, in fact, a nice thing. It is exactly what you would better have different types for, that implicitly do the right thing, and only ever do the right thing.

HANDLE is the worst of all possible worlds.


On further reflection I agree.

An opaque struct confers the ability to know the set of valid operations involving said type, while not giving up any advantages of a HANDLE. Issues of indeterminate size can be worked around with a single void*/uintptr_t field.
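
A sketch of that workaround: one single-field struct per resource, the same size as a HANDLE, but with mix-ups rejected at compile time (names invented):

    #include <stdint.h>

    typedef struct { uintptr_t value; } EventHandle;
    typedef struct { uintptr_t value; } MutexHandle;

    void signal_event(EventHandle e);
    void lock_mutex(MutexHandle m);

    void demo(EventHandle e, MutexHandle m) {
        signal_event(e);       /* OK */
        /* signal_event(m); */ /* error: wrong handle type */
        lock_mutex(m);
    }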


You are not wrong; the problem is, "different types" are language-specific, whereas HANDLE is a primitive (machine-level) type that is not and thus can be easily used from any language.


I don't even want a HANDLE.


Yeah, for example passing such an opaque typedef as const is a common trap and very often a source of surprise (even for the author of the API), something like:

    struct Opaque {
        int data;
    };

    typedef Opaque *OpaqueRef;

    void bar(const OpaqueRef ref) {
        ref->data = 42; // compiles fine, ref is 'Opaque *const'
    }

    void baz(const Opaque *ref) {
        ref->data = 42; // does not compile, ref is 'const Opaque *'
    }


In my experience, beginners tend to underabstract, and it's the intermediately-experienced ones that go too far in the other direction. The most experienced ones are somewhere in the middle and closer to using abstractions where they make sense. Unfortunately, the majority of the community and industry is in the middle group.


"the majority of the community and industry is in the middle group"

I.e., "intermediately-experienced" and younger than the "most experienced". As I said. You are unlikely to find yourself relying on a library written by a beginner.


Yep, my experience as well. I'd say (in keeping with the article's terms) that the intermediately-experienced tend to overuse indirection, but underuse abstraction.


There is something about pointers. Pointers are special. But there isn't, and they aren't. They are types (or values) just like any others. In C++, you can say either auto p{...}; or auto* p{...}; and I always use the former, in order to keep the code as abstract as possible.


Every abstraction has a cost. If it is not delivering more value than that cost, it is a net liability. Saying "auto p = " when you could say "auto* p = " saves you practically nothing, but costs the reader, potentially, a lot. Somebody reading the code who doesn't instantly understand without a bunch of poking around that p has reference semantics is at a marked disadvantage. You have stolen that advantage from that reader.

There is a reason that the language permits "auto& r = " and "auto* p = ". That reason is, specifically, to help prevent errors. Failing to spend a single keystroke to help prevent errors is false economy.


Except auto& does not prevent errors, and it is a source of innumerable problems, because '&' is too easy to forget.

auto* does not add anything to making code more reliable, either; it just makes code tied to a particular implementation detail (e.g. I could redefine things using my own pointer-like object, and the * would simply stand in the way for no good reason whatsoever.)
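
The classic trap with the forgotten '&', for reference:

    #include <vector>

    void demo(std::vector<int>& v) {  // assume v is non-empty
        auto  copy = v;   // '&' forgotten: silently copies the vector
        auto& ref  = v;   // what was probably intended
        copy[0] = 1;      // modifies the copy, not v
        ref[0]  = 2;      // modifies v
    }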


I think ncmncm is a total scrub, but I will say two things:

First, both adding an ampersand after auto when you shouldn't have and not adding it when you should have can lead to logic errors of different kinds. Sometimes either one is fine, sometimes you have to use the correct one.

Second, auto * vs auto by itself has no effect on the generated code, but it can lead to a more robust source. Imagine that you get a pointer from some function and you perform some operations with it, assuming it actually is a pointer in a way that would be invalid if it wasn't. If the function signature later changes to return a pointer-like object, you'd want the compiler to give you an error to know to fix that code. In other words, you should use auto if you really don't care at all if the value is a pointer or not, and you should use auto * if you're assuming it's a pointer.
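
A sketch of that second point, with an invented get_widget() whose return type changes:

    #include <memory>

    struct Widget { int id; };

    // Suppose this originally returned Widget*, then was changed to:
    std::shared_ptr<Widget> get_widget() { return std::make_shared<Widget>(); }

    void demo() {
        auto w1 = get_widget();     // compiles before and after the change
        // auto* w2 = get_widget(); // now a compile error, flagging every
                                    // call site that assumed a raw pointer
        (void)w1;
    }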


It has always been permitted to not make your code as clear as it could be, and always will be. Each of us may choose to help the person reading our code, but not all of us care to.



