I personally use a 'stride' instead of doubling. On small sizes doubling works OK. But when you get past about 8k-16k the empty memory starts to stack up.



Except that on non-embedded platforms, large blocks of allocated memory oftentimes aren't occupying physical memory until you write to them. There's not much reason to avoid exponential buffer growth on a system with a robust virtual memory implementation.
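
You can see this directly on Linux (a rough sketch; it's Linux-only since it reads /proc, and the 1 GiB mapping and 64 MiB write are arbitrary numbers): reserve a large anonymous mapping, and resident memory only grows as pages are actually touched.

  import mmap, re

  def resident_kib():
      # VmRSS from /proc/self/status, reported in kB
      with open("/proc/self/status") as f:
          return int(re.search(r"VmRSS:\s+(\d+) kB", f.read()).group(1))

  buf = mmap.mmap(-1, 1 << 30)                            # ask for 1 GiB, anonymous
  print("after mmap:  ", resident_kib(), "KiB resident")  # barely moves

  buf[:64 * 1024 * 1024] = b"\x00" * (64 * 1024 * 1024)   # touch only 64 MiB of it
  print("after writes:", resident_kib(), "KiB resident")  # grows by roughly 64 MiB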


I would like to add one personal example where this isn't very true and is kind of painful: for a while I was running a Windows system with quite a lot of RAM but not much disk space, enough that having swap space equal to my RAM significantly reduced the space available on an already nearly full disk. On Linux this wasn't a problem, because memory that was allocated but never touched didn't count at all. Windows, however, does not overcommit: if you request memory, there must be somewhere (swap or RAM) to put it. Without swap space this meant getting memory allocation errors with less than half of physical memory used, which was extremely annoying. I either had to give up a huge chunk of my disk space (in practice I often needed 2x my RAM in swap to avoid memory exhaustion), or not be able to use all of my actual RAM.


Hmm, I was under the impression that Windows would also overcommit but it seems you're right. I've been in Linux land for too long I guess. If Windows doesn't have a physical place that can back the allocation, it will fail :(

Seems crazy to me given how useful it is.


Never do this - you're introducing a hidden O(n^2).

Folks, this is why you take the algorithms class in CS.


I know it's a hidden one; I should have been clearer in my statement. I found a stride to be better in my experiments for many of my use cases. It's a matter of finding the correct stride for your code (sometimes I make it self-tuning). If you pick a stride of x2 you will waste memory in some (many) cases. Some library calls touch the whole thing and make sure it is zeroed and paged in. Usually you want to find a balance of 'not a lot of waste and not a lot of speed loss'. In some cases, yes, x2 on every alloc where you bubble over is the perfect choice. Other times the O(n^2) is the right choice. I found a stride to be a nice compromise, as many times you know up front what the block size is. Finding a good compromise on slack is sometimes tough, especially if you have buffers that are discarded and re-used over and over.

On one project I used a compromise stride algorithm: basically, if it ever had to allocate two stride blocks in a row to fulfill a request, it doubled the stride from then on. So it would start 'cold' at allocating 4 bytes (you get 4, 8 or 16 at the bottom usually anyway). Then, when you have to allocate 2 blocks in a row to fulfill a request, you double the stride for the next time (which is coming in on the next tick of the for loop). It was not perfect, but it also did not balloon the memory wildly out, because of the way the incoming data was structured, and it gave a decent average block size. It ended up very nicely between O(n^2) and O(n), both speed- and size-wise.
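
A minimal sketch of one reading of that scheme (hypothetical names; it only tracks sizes rather than managing real storage): grow by a fixed stride, and whenever a single request needs more than one stride block, double the stride for the next time.

  class StrideBuffer:
      def __init__(self, stride=4):             # start 'cold' at 4 bytes
          self.stride = stride
          self.capacity = stride
          self.size = 0

      def append(self, nbytes):
          blocks = 0
          while self.size + nbytes > self.capacity:
              self.capacity += self.stride      # grow by the stride, not 2x
              blocks += 1                       # (a real buffer would realloc+copy here)
          if blocks >= 2:                       # had to allocate 2+ blocks in a row,
              self.stride *= 2                  # so widen the stride for next time
          self.size += nbytes

  buf = StrideBuffer()
  for chunk in (3, 10, 50, 200):
      buf.append(chunk)
      print(buf.size, buf.capacity, buf.stride) # stride climbs 4 -> 8 -> 16 -> 32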

This is why I have the fancy CS degree: to know the trade-off between runtime and space usage. If you blindly pick algorithms you will have a 'bad time'. Sometimes space is a consideration (you have 256k total in your box); sometimes speed is a consideration (I need this to finish in 3ms). Pick one algorithm over the other and you can have a bad time in some cases.


If you multiply by two, you're wasting anywhere from 0 to 50% at any given time, for an average of 25%, right? So if you don't like that, multiply by a smaller factor. Multiply by 1.02 if you can't abide more than about 1% empty on average.
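
A quick simulation backs those ballpark figures up (a sketch; the size range and trial count are arbitrary):

  import random

  def avg_waste(factor, trials=50_000, start=16, max_size=10_000_000):
      total = 0.0
      for _ in range(trials):
          n = random.randint(start, max_size)   # a random final data size
          cap = start
          while cap < n:
              cap = int(cap * factor) + 1       # grow until the data fits
          total += (cap - n) / cap              # fraction of the buffer left empty
      return total / trials

  print(f"x2.00: ~{avg_waste(2.00):.0%} empty on average")   # roughly 25%
  print(f"x1.02: ~{avg_waste(1.02):.1%} empty on average")   # roughly 1%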


Could you elaborate? This seems very interesting.


Asymptotically, there's no difference between growing to an n+1 buffer and an n+k buffer before copying your old data in. You'll still get O(n^2).

In reality, it depends on the data you're handling. You may never end up handling sizes where the O(n^2) asymptote is significant and end up wasting memory in a potentially memory constrained situation. At the end of the day, it all depends on the actual application instead of blind adherence to textbook answers. Program both, benchmark, and use what the data tells you about your system.

If I've got a 500 MB buffer that I append some data to once every 5 minutes, I might want to reconsider before I spike my memory usage to fit a 1 GB buffer just to close the program 15 minutes later.


The O(n²) here is the time spent copying the data; it's not about the size of the buffer, or that you'll temporarily use 2× the space. The program would die by becoming unusably slow.

Take your 500 MiB example. Let's say we start with a 4 KiB buffer (well in excess of what most vector types would start you with). If we grow it by a constant 4 KiB each time it runs out of space, by the time the buffer is 500 MiB we've copied ~30 TiB of data. If instead we grow the buffer by doubling it, we will have had to copy on the order of 1 GiB by the time it hits 500 MiB, a difference of more than 30,000×. (Which is why the program would slow to a crawl.)
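
For concreteness, here is that arithmetic as a quick sketch (it counts the bytes copied at each resize; exactly which copies you count moves the doubling figure between roughly 0.5 and 1 GiB, but not the conclusion):

  TARGET = 500 * 1024 * 1024                  # 500 MiB

  # constant 4 KiB stride: every resize copies the whole current buffer
  stride_total = sum(range(0, TARGET, 4096))

  # doubling from 4 KiB: each resize copies the current capacity
  double_total, cap = 0, 4096
  while cap < TARGET:
      double_total += cap
      cap *= 2

  print(stride_total // 2**40, "TiB copied with a 4 KiB stride")  # ~30 TiB
  print(double_total // 2**20, "MiB copied with doubling")        # ~512 MiB
  print(stride_total // double_total, "x more copying")           # tens of thousands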

I'm ignoring the time aspect of your supposition; at that rate, you'd never really get to 500MiB using the starting sizes & strides suggested upthread. That's a highly exceptional case; for nearly everything, the general recommendation from CS is going to be the right one. Yes, profile & optimize where and when needed, but the thread is clearly about defaults, and this is one of those defaults that holds pretty tightly.

Further, at that size, if you don't use the memory, most OSes will simply never allocate the page. (They'll assign a dummy zero-filled page that gets swapped for a real page of memory once the first write happens. So even at a large size, you'll end up using 1000 MiB of virtual memory, but not real RAM.)

O(n²) is playing with fire: it grows fast, fast enough relative to the data that it is a common source of bugs.

Edit: Gah, my arithmetic here was off at first; with a 4 KiB stride the total really is in the tens of TiB, so the gap is more like 30,000×. (The point stands, though; the next doubling is ~2 GiB vs. ~128 TiB, and the gap widens fast.)


Yes, I'm aware of how the algorithm works. I also know that if I allocated 500 MiB at the beginning of the program expecting my memory usage to be roughly that size, and my prediction was off by 50 MiB, maybe I don't want to go hunting for another 500 MiB of space before my program ends or I stop using the buffer and free it.

But your point about the virtual memory makes that moot anyways. Thank god for modern OSes. I've clearly been spending too much time around microcontrollers.


The case here: we have a buffer, and it grows as we stick bytes into it. A "vector" in many languages.

If you double the underlying memory each time you need more space, you'll end up copying a number of bytes that is linearly proportional to the final size. The final copy will be n bytes, the second to last n/2, then n/4, n/8, etc. The sum of those copies is 2n; that is, we'll copy about twice the final amount of data during all the resizes of the buffer. 2n is called "O(n)" in order notation, which describes how a function (in software, usually time or memory use) grows in response to its input; the leading constant factor of 2 is dropped ([1]).
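
A tiny check of that sum (a sketch, using a 1 MiB final size):

  n = 1 << 20                       # final buffer size: 1 MiB
  copied, size = 0, n
  while size >= 1:
      copied += size                # n + n/2 + n/4 + ... + 1
      size //= 2
  print(copied / n)                 # ~2.0, i.e. about 2n bytes copied in total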

If you instead grow the buffer by some constant amount (it can be a single byte or a whole page, it doesn't matter), you'll end up doing O(n²) work copying. One way to see it is to list the copies out. To simplify the sum, pretend there's an imaginary copy of 0 bytes at the front (which is free), and call the amount we grow the buffer by "k". The copies, on the way up to size n, will be:

  0
  k
  2k
  3k
  ...
  n-3k
  n-2k
  n-k
  n
To sum this sequence, we can pair the terms up: n + 0 is n; (n-k) + k is n; (n-2k) + 2k is n, and so on. You get roughly n/(2k) pairs, each summing to n, so the total copied is about n²/(2k). Again, the leading constant is unimportant, so it's O(n²).

This is a lot more work. Like, a lot a lot. Say you expand the buffer until you have 500 MiB of data. If you grow a constant 4KiB at a time, you end up doing:

  In [6]: sum(range(0, 500 * 1024 * 1024, 4096))
  Out[6]: 33554169856000
~30 TiB of copying; if instead you grow by doubling, you'll copy on the order of 1 GiB, upwards of 30,000× less.

The larger point is that O(n²) grows viciously fast compared to O(n). To quote Bruce Dawson, whose blog is filled with amazing feats of debugging slow programs,

> Roughly speaking it’s due to an observation which I’m going to call Dawson’s first law of computing: O(n^2) is the sweet spot of badly scaling algorithms: fast enough to make it into production, but slow enough to make things fall down once it gets there.

https://randomascii.wordpress.com/2019/12/08/on2-again-now-i...

There's also this blog: https://accidentallyquadratic.tumblr.com/

There are worse things than O(n²), but they are usually so bad they get caught in testing; O(n²) can look fine in testing. There are also things better than O(n²) but worse than O(n), like O(n lg n), and O(n lg n) is usually fast enough even on a full production data set. (Sorting, in the general case, is O(n lg n).) This is what Dawson means by "sweet spot".

The even more meta conversation is that HN discourse often wonders what the value of a CS degree is. This is one of the times where there is value, and it's not that uncommon: I use my degree fairly often. I also had a fairly good school, IMO, that balanced the theory and the practical aspects.

E.g., my database professor taught us B-trees, relational algebra, etc., but also had us work with a real database, learn SQL, and joked that a B-tree lookup is O(1). (It's not, it's O(log n), but in a modern database the base of that log is huge, so it generally takes no more than a half dozen levels of the tree to find your item; above that, you're talking about databases with billions or trillions of rows. In theory the number of levels keeps growing, but in practice datasets that large get exponentially harder to come by, and soon you run out of hard disks on Earth, so for practical purposes O(1) suffices. And/or, you'll know when it doesn't.)
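
To put rough numbers on that "huge base" (a sketch; the fanout of 500 keys per node is an assumed ballpark, not a measurement of any particular database):

  import math

  FANOUT = 500                                  # assume ~500 keys per B-tree node
  for rows in (10**6, 10**9, 10**12):
      levels = math.ceil(math.log(rows, FANOUT))
      print(f"{rows:>16,} rows -> about {levels} levels")
  # 1e6 rows -> 3 levels, 1e9 -> 4, 1e12 -> 5: it is log(n), but it barely moves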

[1]: https://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachm... ; specifically, note the |f(n)| ≤ k·g(n); that leading "k" is an arbitrary (in the sense that we're saying there exists some k) constant, g(n) is the thing inside the O(…). So O(2n) and O(n) are the same, because that leading constant of k is going to "eat" the 2; the constant we find for one will just be twice what we'd find for the other, so we remove it from g(n) when we discuss these things. The shape of the curve — how it responds to input size changes — is the point.


Thank you very much!



