Let the callers of your API control allocations (cheney.net)
130 points by robfig on Sept 5, 2019 | 73 comments



I read the title as "don't force callers to allocate" not "don't do allocations behind the call" which is what the post is about. Totally agree on that point, but with nuance. Any call that allocates (or blocks) should be obvious in name and API.

For example a lot of C libs will have an API along these lines:

    typedef void* my_api_t;
    int create_api (my_api_t* api); // almost certainly allocates

    typedef struct {
        // yada yada yada
    } api_result_t;

    int get_result(my_api_t api, api_result_t* result); // does this allocate? I have no idea

I think every function that might block/allocate should be stupid obvious in the documentation and function naming convention, especially if you're distributing closed source libs with opaque types.


In your `get_result()` example, the caller won't be able to obtain the address of the new allocation if the callee allocates. The pointer itself is passed by value, and any modifications the callee makes to it are made on a local copy. If the function were allocating, it would require a double pointer like so:

    int get_result(my_api_t api, api_result_t **result);
    ...
    api_result_t *result;
    get_result(api, &result);
A lot of confusion comes when API and library authors start typedef-ing pointer types into opaque types like that:

    typedef api_result_t *API_RESULT;
    ...
    API_RESULT res;
    &res; // is this a single or a double pointer?
    res; // is this a pointer at all?
If you meant that the function could possibly allocate fields of `result`, the API should always have a `free_result()`.


Good point, but the call could result in an allocation through the modification of a field of result

  typedef struct {
    void* some_data_pointer;
  } api_result_t;


  int get_result(my_api_t api, api_result_t* result) {
    result->some_data_pointer = realloc(result->some_data_pointer, foo_size + MAGIC_VALUE);
    // ...
  }


> If you meant that the function could possibly allocate fields of `result`, the API should always have a `free_result()`.

Yea that's what I meant. But this sort of gets into the weirdness of docs, if there is a `free_result` somewhere then authors should make sure to put a "see also free_result()" in their doc comments, especially when you don't have the niceties of constructors/destructors.
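A minimal sketch of that documentation convention, with the allocation spelled out in both the name and the doc comment. The names `get_result_alloc`, the `message` field, and the "ok" payload are made up for illustration and are not from any real library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type that owns one heap-allocated field. */
typedef struct {
    char  *message;      /* allocated by get_result_alloc(), or NULL */
    size_t message_len;  /* length of message, excluding the NUL */
} api_result_t;

/*
 * Fills *result. ALLOCATES: result->message points to heap memory.
 * Returns 0 on success, -1 if the allocation fails.
 * See also: free_result().
 */
int get_result_alloc(api_result_t *result)
{
    static const char text[] = "ok";  /* placeholder payload */

    assert(result != NULL);
    result->message = malloc(sizeof text);
    if (result->message == NULL) {
        result->message_len = 0;
        return -1;
    }
    memcpy(result->message, text, sizeof text);
    result->message_len = sizeof text - 1;
    return 0;
}

/*
 * Releases the fields allocated by get_result_alloc().
 * Safe to call more than once on the same result.
 */
void free_result(api_result_t *result)
{
    assert(result != NULL);
    free(result->message);
    result->message = NULL;
    result->message_len = 0;
}
```

With the `_alloc` suffix and the "See also" line, a reader of the header alone can tell that the call allocates and which function undoes it.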


All depends on what "api_result_t" is, right? If the API's result is a bucket of scalars, then passing the pointer could be perfectly fine. Same if api_result_t contains a pointer to a buffer, and the call just makes the buffer pointer point to a previously allocated block.


If there is no void free_result(api_result_t *) in the API, then it probably doesn't allocate.


Be careful: free(3) could be the deallocation function.


Ideally memory handling and I/O should be overridable (and often is by creating some kind of "context"), but when it isn't then, yeah it should be obvious from the naming.
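As a rough sketch of what such a "context" can look like in C: a struct of allocator hooks that the library routes every allocation through. All names here (`lib_context_t`, `arena_t`, `lib_make_counter`, and so on) are hypothetical, and the arena assumes its backing storage is suitably aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hooks the caller can override. */
typedef struct {
    void *(*alloc)(void *user, size_t size);
    void  (*release)(void *user, void *ptr);
    void  *user;  /* passed back to both hooks */
} lib_context_t;

static void *default_alloc(void *user, size_t size) { (void)user; return malloc(size); }
static void default_release(void *user, void *ptr) { (void)user; free(ptr); }

/* Context that simply falls back to malloc/free. */
lib_context_t lib_default_context(void)
{
    lib_context_t ctx = { default_alloc, default_release, NULL };
    return ctx;
}

/* Example override: a bump allocator over a caller-owned arena.
 * Assumes base is aligned to at least 16 bytes. */
typedef struct {
    unsigned char *base;
    size_t used;
    size_t cap;
} arena_t;

static void *arena_alloc(void *user, size_t size)
{
    arena_t *a = user;
    size_t aligned = (size + 15u) & ~(size_t)15u;  /* round up to 16 bytes */
    if (aligned < size || aligned > a->cap - a->used)
        return NULL;  /* overflow or arena exhausted */
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Individual frees are no-ops; the caller discards the whole arena at once. */
static void arena_release(void *user, void *ptr) { (void)user; (void)ptr; }

lib_context_t lib_arena_context(arena_t *arena)
{
    lib_context_t ctx = { arena_alloc, arena_release, arena };
    return ctx;
}

/* Every allocation the library makes goes through the context. */
int *lib_make_counter(const lib_context_t *ctx)
{
    assert(ctx != NULL);
    int *counter = ctx->alloc(ctx->user, sizeof *counter);
    if (counter != NULL)
        *counter = 0;
    return counter;
}

void lib_destroy_counter(const lib_context_t *ctx, int *counter)
{
    assert(ctx != NULL);
    ctx->release(ctx->user, counter);
}
```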

If it isn't - and especially if you're dealing with a closed source library - you have to assume the worst and even then you're often unpleasantly surprised :/


I read the title that way too, and the article almost seems like it's reinforcing that point until you get to the end. Kind of a weird almost unintentional form of clickbait.


I did the same thing as you.

Effectively what we have here is code allocating memory and handing it to you for care and feeding.

It's not unlike an unwanted gift. For some people this is an important and aggravating dynamic in their lives. I'm sure I've heard people complain about some aunt who "forces" plants or god forbid pets onto hapless relatives.


Are there tools that tell you where all your allocations come from? I imagine Valgrind but is there anything more lightweight?

I almost always write pool based memory allocators for my projects (CG, games), so I typically log that info myself. What would other folks use to profile that info?


On Apple's platforms, Instruments has a memory allocation logging tool that's very nice to use. Just attach to a program, no need to recompile:

https://www.agnosticdev.com/blog-entry/ios/profiling-memory-...

As a nice bonus, it can also look for memory leaks by running what is essentially a conservative garbage collector on your heap and reporting unreachable allocations.


Go's pprof comes with a great memory (and cpu + more) profiler: https://blog.golang.org/profiling-go-programs


jemalloc has a lot of profiling features you can turn on.


This falls into your "functions that might block" stipulation, but it should also be made obvious whether or not a function acquires a lock.


The HN title appears to be different from the original.

"Let the callers of your API control allocations"

vs

"Don’t force allocations on the callers of your API"


I, too, read the title as meaning the opposite of what he's actually saying. He's saying that if you force your callers to do the allocation like Golang's io.Reader.Read does, rather than doing the allocation yourself and forcing the new allocation on them (the way Python's fileobj.read call does), then they can reuse and pool allocations, improving efficiency substantially under some circumstances.

There are another couple of benefits, though:

1. A call which allocates can necessarily fail, although in garbage-collected languages like Golang, memory allocation failures are normally handled implicitly with something like panic(). (That's because the main benefit of garbage collection is not that you don't have to debug segfaults, but that you can treat arbitrary-sized data structures as if they were atomic integers — with immutability, even when they share structure. This allows you to program at a higher level. You lose this benefit if you have to know which calls allocate and which do not.)

1½. By the same token, dynamic allocation almost never takes a deterministic or bounded amount of runtime, though see below about per-frame heaps.

2. In many cases, you can do the allocation on the stack rather than the heap, with potentially substantial implications for garbage-collection efficiency. (You could see the MLKit's region allocation as an automatic version of this pattern.)

3. In environments like C++, it may make sense to use a different custom allocator, like the per-frame heap commonly used in game engines, or the pool allocator provided by the APR, or an allocator that allocates from a memory-mapped object database like ObjectStore. The alternative is something like the rarely-used allocator template parameter that makes all our STL compile errors so terrible. Even in ordinary C programs, it might make sense to use a static allocation for these things some of the time.

On the other hand, putting the allocation on the caller ossifies the amount of space needed as part of your API, and prevents that space from being dynamically determined at runtime by the algorithm. So there are times when letting the caller allocate all the space is undesirable or even impossible. And, as Cheney points out, it does make your API more of a hassle to invoke — but that's easily fixed with a wrapper function that does the allocation.
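Here is one way the "non-allocating core plus allocating wrapper" split might look in C. The hex-encoding task and the names `hex_encode_into` and `hex_encode_alloc` are only an illustration, not anything from the article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Core API: never allocates. Writes 2*n hex digits plus a NUL into dst.
 * Returns 0 on success, -1 if dst_cap is too small.
 */
int hex_encode_into(char *dst, size_t dst_cap, const unsigned char *src, size_t n)
{
    static const char digits[] = "0123456789abcdef";

    assert(dst != NULL && (src != NULL || n == 0));
    if (n > (SIZE_MAX - 1) / 2 || dst_cap < 2 * n + 1)
        return -1;
    for (size_t i = 0; i < n; i++) {
        dst[2 * i]     = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0f];
    }
    dst[2 * n] = '\0';
    return 0;
}

/*
 * Convenience wrapper: allocates exactly what the core needs.
 * Returns NULL on failure; the caller releases the result with free().
 */
char *hex_encode_alloc(const unsigned char *src, size_t n)
{
    if (n > (SIZE_MAX - 1) / 2)
        return NULL;
    size_t cap = 2 * n + 1;
    char *dst = malloc(cap);
    if (dst != NULL && hex_encode_into(dst, cap, src, n) != 0) {
        free(dst);
        dst = NULL;
    }
    return dst;
}
```

Callers who care about allocation use `hex_encode_into` with a stack or pooled buffer; everyone else calls the wrapper and pays for one `malloc`.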

A third alternative, used for example by Numpy and realloc(), is the one mentioned in https://news.ycombinator.com/item?id=20888920 (though it erroneously implies that C has optional parameters) — have an optional buffer argument telling where to store the result, but if it's not passed in, dynamically allocate a buffer and return it. Taking advantage of this parameter to avoid extra allocations and cache misses commonly doubles the speed of Numpy code, in my experience, but it really hurts readability. As a general API design technique, this seems like the most constraining to future implementations, and I think it makes more sense to separate the underlying stable non-allocating API from the allocating helper wrapper, as described above.


> though it erroneously implies that C has optional parameters

I believe the (poorly conveyed) implication was that the parameter was nullable.


Yeah, the author said that in a subthread.


Since multiple commenters have found the title misleading, we've taken a crack at replacing it with something that expresses the article's point. If anyone suggests a better title—i.e. more accurate and neutral, and preferably using a representative phrase from the article (I couldn't find one in this case), we can change it again.


Thank you!


The ultimate of this is libraries like BearSSL (https://bearssl.org/) that do no dynamic memory allocation at all. This is particularly nice for moderately big embedded systems where you don't want to deal with the impossible to contain complexity that comes from having a heap.


Almost all modern crypto libraries avoid allocation. NaCl (https://nacl.cr.yp.to/), TweetNaCl (https://tweetnacl.cr.yp.to/), Libsodium (https://libsodium.org) (except for Argon2), Monocypher (https://monocypher.org) (even for Argon2, though users will never allocate hundreds of megabytes on the stack)…

From a systems point of view, the fact that modern crypto doesn't require heap allocations makes it one of the easiest domains to work with. (The maths are exacting, but the good APIs are a joy to use).


It's also nice for small embedded systems because it is more memory efficient than e.g. OpenSSL. I once ported it to the ESP32 for that reason.


What do libraries like this usually do in lieu of heap allocation? A large buffer reservation at program startup? Calls to alloca?
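One common shape, sketched under assumptions and not taken from any of the libraries named above: all state lives in a fixed-size context struct that the caller declares on the stack, statically, or inside a larger object. The Fletcher-16 checksum and the `fletcher16_*` names below are just a stand-in for a real primitive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical streaming checksum (Fletcher-16) whose entire state lives
 * in a struct the caller places wherever it likes. No heap involved. */
typedef struct {
    uint16_t sum1;
    uint16_t sum2;
} fletcher16_ctx;

void fletcher16_init(fletcher16_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->sum1 = 0;
    ctx->sum2 = 0;
}

/* May be called repeatedly with successive chunks of input. */
void fletcher16_update(fletcher16_ctx *ctx, const unsigned char *data, size_t len)
{
    assert(ctx != NULL && (data != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        ctx->sum1 = (uint16_t)((ctx->sum1 + data[i]) % 255);
        ctx->sum2 = (uint16_t)((ctx->sum2 + ctx->sum1) % 255);
    }
}

uint16_t fletcher16_final(const fletcher16_ctx *ctx)
{
    assert(ctx != NULL);
    return (uint16_t)((ctx->sum2 << 8) | ctx->sum1);
}
```

Because `sizeof(fletcher16_ctx)` is known at compile time, the caller decides where the state lives and the library never needs a heap.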


Ultimately this is why Halpern, Lakos, et al. pushed for polymorphic allocators to be added to C++. We make heavy use of them because they give the caller full control over the allocator used in nearly all situations when the code is written properly to take and propagate allocators according to the rules.

(A little example posted by Bartlomiej Filipek showing a vector neatly using a stack buffer)

  #include <algorithm>        // std::fill_n
  #include <iostream>
  #include <iterator>         // std::begin, std::size, std::data
  #include <memory_resource>
  #include <vector>

  int main() {
      char buffer[64] = {};
      std::fill_n(std::begin(buffer), std::size(buffer)-1, '_');
      std::cout << buffer << '\n';
  
      std::pmr::monotonic_buffer_resource pool{
          std::data(buffer), std::size(buffer)
      };
  
      std::pmr::vector<char> vec{&pool};    
      for (char ch='a'; ch <= 'z'; ++ch)
          vec.push_back(ch);
  
      std::cout << buffer << '\n';
  }


I’m sorry, but I’m not seeing the point you’re trying to make from your example: std::vector takes a custom allocator as a template parameter already (also, the order of characters is implementation-specific in C++, FWIW).


There is a certain elegance to time-tested C APIs that show exactly how data structures are managed. One of the things you can lose in OO implementations is the ability to efficiently store data, or even understand where your memory is going. (And I feel that the varying semantics between languages don’t help, e.g. people that were used to doing "new X()" for everything in Java move to C++ and keep trying to "new" things that don’t need to be sent to "new".)

It is important to have the option to deviate from “objects” in projects that can require significant memory. I would not allow objects to wrap absolutely everything, being passed all around the system, to the point that it’s impossible to control the biggest users of memory.

As a simple example, imagine an object that strictly encapsulates storage of “64-bit int plus bool”. At a small scale, the wasted space is an implementation detail but for a few million objects you have a pretty ridiculous memory footprint. A fancy encapsulated object probably gives no option to avoid this, such as “put your array of million 64-bit values over here, and your buckets for bits over there”. When this happens enough times, you have holes all over the place and it’s almost too late (where do you start to fix it?). That is also the type of design that can consume shockingly more memory when moving from 32-bit to 64-bit, e.g. some class that had moderate holes with 32-bit pointers has bigger holes with 64-bit pointers.
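To put rough numbers on that example, here is a sketch in C comparing the two layouts. The type and function names are invented for illustration; on typical 64-bit ABIs `sizeof(item_t)` comes out as 16 because of padding after the bool, but that is implementation-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Array-of-structs layout: each element carries padding after the bool. */
typedef struct {
    int64_t value;
    bool    flag;
} item_t;

/* Struct-of-arrays layout: values and flags live in separate caller-owned
 * arrays, with the flags packed one bit each. */
typedef struct {
    int64_t *values;     /* count entries */
    uint8_t *flag_bits;  /* (count + 7) / 8 bytes */
    size_t   count;
} item_table_t;

/* Bytes needed to store count items in each layout. */
size_t aos_bytes(size_t count) { return count * sizeof(item_t); }
size_t soa_bytes(size_t count) { return count * sizeof(int64_t) + (count + 7) / 8; }

bool table_flag(const item_table_t *t, size_t i)
{
    assert(t != NULL && i < t->count);
    return (t->flag_bits[i / 8] >> (i % 8)) & 1u;
}

void table_set_flag(item_table_t *t, size_t i, bool on)
{
    assert(t != NULL && i < t->count);
    uint8_t mask = (uint8_t)(1u << (i % 8));
    if (on)
        t->flag_bits[i / 8] |= mask;
    else
        t->flag_bits[i / 8] &= (uint8_t)~mask;
}
```

For a million items the packed layout needs 8,125,000 bytes, while the array of structs needs a million times `sizeof(item_t)`, which is roughly double wherever the struct pads out to 16 bytes.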


Isn't this dependency injection though? As in, the caller must manage the buf() dependency on its own - promoting performance and other qualities like code reuse. For example, the caller might have a library that manages creating these buffers for large bytes or maybe over network or what not. Then such dependency injection will promote code reuse of the library as well.


It's analogous to dependency injection, and you could do it via dependency injection by passing in an allocator object (or class, in the STL), but normally I think of dependency injection as injecting polymorphic callable objects, not monomorphic memory buffers. So I think of passing in memory buffers as a lighter-weight alternative to dependency injection for cases like this, and one that can provide some additional guarantees (see my top-level comment for details).


To be honest, it's more like separating the concern of resource allocation from the concern of resource utilization. And when your resources are things like memory or devices then that promotes separating the code you can unit test from the code that you have to integration test.


That's a good point. I hadn't thought of that --- if you want to test how your code handles allocation failures, allocating in more places means you have a lot more to test.


It's at least inversion of control.


From experience, I would generalise this advice to, "let the callers of the API control side effects".

The justification is largely the same, i.e. that the caller generally has more context and thus better ability to decide when and how side-effects should happen.


Lisp n00b on StackOverflow: "why the append function didn't change my list?"

Because it's letting the callers control side-effects! :)


I don't totally get this.. isn't it in the nature of reading via read(2) that it is possible it reads some data and then errors? If this is indeed reflected in the io.Reader.Read interface, then - regardless of who is allocating the buffer - the caller must evaluate both the result and the error.

It would indeed be possible in that scenario to return the buffer, and to return the error in a subsequent call, but this seems to change its behaviour into one that (to me) is surprisingly different.

The same is true for writing; write(2) can write some, but not all of the passed in data, and then err. What kind of API would one want to see here?


>> The same is true for writing; write(2) can write some, but not all of the passed in data, and then err. What kind of API would one want to see here?

If the OS "writes some bytes and then errors", then the current write(2) call will return the number of bytes written, and the next call will return the error.


It depends on your definition: in C a partial read or write isn't considered an error. You can't have both 8 bytes read and also get an EIO; it has to be one or the other. The device and the reason the read/write stopped determine whether you get an error like EIO or just a report of fewer bytes than you asked for.


I don't think this is correct, the return value is either -1 (an error) or a length is returned.


>What kind of API would one want to see here?

In C it's almost always Length | error_code
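For the write side, the usual consequence of "length or error, never both" is a retry loop in the caller. A minimal POSIX sketch follows; `write_all` is a hypothetical helper name, and the extra `written` out-parameter is one possible way to report partial progress alongside an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Keeps calling write(2) until all len bytes are written or an error occurs.
 * Returns 0 on success, -1 on error with errno set by write(2).
 * *written reports how many bytes went out in either case.
 */
int write_all(int fd, const void *buf, size_t len, size_t *written)
{
    const unsigned char *p = buf;
    size_t done = 0;

    assert(written != NULL && (buf != NULL || len == 0));
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted before writing anything: retry */
            *written = done;
            return -1;
        }
        done += (size_t)n;  /* partial write: loop for the rest */
    }
    *written = done;
    return 0;
}
```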


In Common Lisp the way I usually handle this is with an optional arg. If the caller passes in a buffer[0] that's where I put the result; otherwise I allocate one and return it. The more general approach is to accept a continuation argument. One can even do both at the same time by dispatching a generic function on the type of the arg (but in that case the arg cannot be optional unless you write a wrapper function around the generic function.)

[0] Passing buffers around is much less necessary in Lisp than in C-like languages. This is about the times when it still makes sense to do so.


I'm not that great in golang but I don't understand why you "must" not consult the error first in this situation:

>First they must record the number of bytes read into the buffer, reslice the buffer, process that data, and only then, consult the error.


I'm not that familiar with io.Reader but it sounds like you might get back both some bytes and also a read error. If you handle the read error before handling the bytes by, say, "return err", then you lose the bytes you got before the error. Losing those bytes could be bad.


>Because of the high cost of retrofitting a change to an API’s signature to address performance concerns, it’s worthwhile considering the performance implications of your API’s design on its caller.

ObjC is very verbose partly because it went with the "signature encoding" approach, which has its own tradeoffs, but I do appreciate the API consistency.

https://developer.apple.com/library/archive/documentation/Co...


This is mostly a CoreFoundation thing, FWIW. Objective-C generally has enough expressivity to indicate ownership without this convention.


This is why for C/C++, I always like to have something like:

foo * bar(foo * useme=0);

where one can optionally pass memory to be used, but if nothing is passed, it will be allocated by the function (placement new can be helpful here!).


That can't be "C/C++" because C doesn't have optional parameters.

In C++, a better thing would be to take advantage of overloading directly:

  foo *bar(foo *); // doesn't allocate; non-null required
  foo *bar();      // allocates
Then there is no run-time check of the pointer value for null. bar() is just a wrapper like:

  foo *bar() { return bar(new foo); }
Moreover, the value provided by bar() is pretty low; it's like a syntactic sugar that rewrites bar() to bar(new foo). I'd consider not adding that to the API.


I don't think OP meant that as an optional parameter, but more like:

  // Definition
  foo * bar(foo * useme);
And then use as

  // Allocate foo for me pls.
  foo * result = bar(NULL);
or

  // I'll allocate the buffer myself kthxbye.
  foo * result = bar(malloc(…));


That's what you'd do in C. Since the malloc return has to be checked, and the object may have to be initialized, the API could provide enough convenience to justify it.
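A sketch of that C version, with the malloc check and initialization folded into the function. The fields of `foo` and the zero-initialization are placeholders, since the thread never says what `foo` holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Placeholder type; the real fields are unknown. */
typedef struct {
    int x;
    int y;
} foo;

/* Puts a foo into a known initial state. */
static void foo_init(foo *f)
{
    assert(f != NULL);
    f->x = 0;
    f->y = 0;
}

/*
 * If useme is non-NULL, initializes and returns it without allocating.
 * If useme is NULL, allocates a foo with malloc(), initializes it, and
 * returns it; the caller must then free() it. Returns NULL only when
 * that allocation fails.
 */
foo *bar(foo *useme)
{
    foo *f = (useme != NULL) ? useme : malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    foo_init(f);
    return f;
}
```

One cost of collapsing both cases into one function is that the caller has to remember which path was taken in order to know whether `free()` is owed.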


> C doesn't have optional parameters

You are over-nitpicking. It's obvious that the GP meant that you can pass NULL by default if needed.


But when would that be needed? The argument is optional, and null is its default value.

But the way the overloads are collapsed into a single function, there has to be a run-time check for that null.

Using a null to indicate "argument is not there" is a C API trick. That parameter won't be optional in C.


If you only have a K&R-style declaration visible, you can still call it with the wrong number of arguments, but (contrary to what you'd expect from doing the same thing in JS or Lua) most C implementations will instead pass some garbage data as the additional parameter, rather than NULL. Also, most C implementations will allow you to cast function pointers in a way that results in calling the function with the wrong number of arguments.

If you're especially unlucky, that garbage data will happen to be NULL at first, but then become non-NULL when you recompile with debugging turned off, resulting in truly fascinating ways to spend your Friday and Saturday nights the month before shipping.

(I know you know this, Kaz. I'm mentioning this for the benefit of those who haven't yet spent their Friday and Saturday nights this way.)


If your compiler is allowing K&R declarations, or no declarations at all, without warnings, it's best to review the diagnostic options. If you have that nailed down early in your project, you won't have that frustrating afternoon later on.


That's true.


I'd agree if GP hadn't also used C++-specific optional parameters. Given the minefield that is vendor specific C++ish extensions to C, it seems worth a mention, even if I'm unaware of a compiler vendor that allows default parameters in your C code.


It was just an illustration in C++ (the C version would just document it, obviously). There is no way to do this in a vendor specific way without some sort of name mangling that will hopelessly break the ABI...


Yes there is:

  void foo(void* p=0); // has exactly the same ABI as
  void bar(void* p);
  bar(); // error: missing parameter
  foo(); // shorthand for foo(0);
  // void(*)(void*) is a different type than void(*)(void*=0)
  // but they can be implicitly converted the same as
  // eg void* and char*


hmm I guess so!


Yes, I know C has no function overloading (or optional parameters) of course. I didn't feel like writing it out twice though.

The pattern is more useful when there are more arguments to foo and you don't want to have bar(int a, int b, double c, foo * useme) and bar(int a, int b, double c) in the API.


There's another common C API pattern that I was surprised to not see mentioned. You pass a buffer pointer and the size of the buffer, and the function returns the number of items written to the buffer. If the pointer is null or the size is zero, the return value is the size that you need to allocate to ensure proper function completion.
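A small sketch of that size-query pattern. The function `get_squares` and its payload are invented for illustration; `squares_alloc` shows the usual two-call dance on the caller's side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical producer of `count` items (here, the squares 0, 1, 4, ...).
 * If out is NULL or cap is 0, writes nothing and returns the capacity
 * the caller needs. Otherwise writes at most cap items and returns how
 * many were written.
 */
size_t get_squares(size_t count, unsigned long *out, size_t cap)
{
    if (out == NULL || cap == 0)
        return count;
    size_t n = count < cap ? count : cap;
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned long)i * (unsigned long)i;
    return n;
}

/*
 * Caller side: ask for the size, allocate, then call again to fill.
 * Returns NULL if count is 0 or the allocation fails; free() the result.
 */
unsigned long *squares_alloc(size_t count, size_t *out_len)
{
    assert(out_len != NULL);
    size_t need = get_squares(count, NULL, 0);
    *out_len = 0;
    if (need == 0 || need > (size_t)-1 / sizeof(unsigned long))
        return NULL;
    unsigned long *buf = malloc(need * sizeof *buf);
    if (buf != NULL)
        *out_len = get_squares(count, buf, need);
    return buf;
}
```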


I'm curious about what kind of situations you're using this pattern in C++. Not objecting to the practice in any way, it's just that I haven't encountered many situations where I wouldn't just have created another function for the allocating case that called the non-allocating one.


Something like...

    double * doMathOperation(unsigned N, const double * in, double aux1, double aux2, double * out = 0); 

(obviously, in C, you would leave the = 0 out and document it instead).


I've always been puzzled by this awkward API and couldn't understand why it was this way. It had the side-effect of making me unsure as to whether or not I should be using it and why, which is never a nice feeling when you're trying to write the best code possible. Now I understand! Code anxiety just decreased by a lot.


Don't do allocations in the call until the API becomes annoying and unpleasant to use. That also has to be weighed against the domain and the performance implications of the particular API.

Being a buffer, it can probably be reused in subsequent calls. You don't need to read everything in one go.


I sense that modern CS education has lost a lot of its roots in computing fundamentals...

Findings like this were pretty much the daily routine of engineers a decade or more ago...


This is useful even for those of us who have been around for "a decade or longer." We page out of memory the things we don't use very often.

Moreover, modern CS education teaches a lot of things that weren't taught in the past, and it likely only gives a hand-wave to things like memory-management etiquette in low-level C (or Go) API design. This is likely because (1) it's not that difficult to read up on the specifics if you're otherwise a competent engineer and (2) other concepts were valued higher. Perhaps the course focused on writing code with borrow-checking in mind instead.


My CS education is going on two decades old, I think it was decent, and the ideas in this article aren't something that was taught back then in any class I had.

For the most part, it was the other way around, we mostly learned abstraction, so the lessons in class focused on how to encapsulate and hide state. At a higher level than that, education really focused (and still does, I think) on engineering for functionality over engineering for performance.

In some sense, the problem with APIs that allocate is larger now than it ever was before, and growing larger. Data sizes are bigger now, and memory pressure relative to what the processor can do is going up.


While I agree that the IO interface is pretty standard everywhere, that doesn't mean the standard couldn't/shouldn't be better for higher level languages.

At the application layer, I really don't care to utilize a partially filled buffer, I want a result set that has what was read. Of course, nobody is being stopped from creating such things as abstractions. StreamReader for C# as an example.


It's great that they are sharing their (re)discoveries, the real shame is that we software people don't have a giant, open and free repository of all these blog posts, Software Engineering articles, and interviews from our practitioners of all ages.


Well, arXiv, Library Genesis, Wikipedia, GitLab, GitHub, and Google Scholar together are a pretty good resource for this. They could be a lot better — for example, being fully legal — but they're pretty amazingly good right now.


For low level APIs, I would say anything that could be configured on a per call basis should be parameterized: memory allocations, database & http connections etc.


FYI, in the recent versions of .NET you could do this by accepting Span<T> instances instead of the usual array + index + length.


This is also valid for any library, gem, module or SDK.



