The article has been updated at this point to have a long header talking directly to Hacker News commenters, and explicitly asks us to challenge HN's censoring of political content. This is quite strange, as that experiment has already ended and HN is no longer censoring political stories.
I am not sure I would call this a bug. Is "call calloc, free, and calloc, without ever touching the memory" really a use case one should optimize for? One could argue that improving performance for this case only is worthwhile if it can be done in a way that barely impacts anything else.
Edit: apologies for being somewhat ambiguous. The "bug" I refer to is the Radar issue the writer of the blog created, not the performance issue with the Python code.
The bug report wasn't caused by calloc, free, calloc being expensive. It was caused by making a call that did "malloc + memset" when only malloc was needed. The code in question wasn't even invoking calloc at all, it just ended up being an excuse to investigate zeroing memory behavior.
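To make that concrete, here's a rough sketch of the difference (not the blog's code): it drives libc from Python with ctypes and assumes a POSIX libc reachable via `ctypes.CDLL(None)`, so portability and timings will vary.

```python
# Rough sketch only: calling libc directly via ctypes, assuming a POSIX libc
# reachable through CDLL(None) (Linux/macOS); timings are illustrative.
import ctypes
import time

libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.calloc.restype = ctypes.c_void_p
libc.calloc.argtypes = [ctypes.c_size_t, ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]
libc.memset.restype = ctypes.c_void_p
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]

N = 100 * 1024 * 1024  # 100 MB, large enough to come straight from the kernel

t0 = time.perf_counter()
p = libc.calloc(1, N)        # calloc alone: typically fresh zero pages, no memset
t1 = time.perf_counter()
libc.free(p)

t2 = time.perf_counter()
q = libc.malloc(N)
libc.memset(q, 0, N)         # malloc + explicit memset: every page is written now
t3 = time.perf_counter()
libc.free(q)

print(f"calloc:          {t1 - t0:.4f}s")
print(f"malloc + memset: {t3 - t2:.4f}s")
```

On a typical allocator I'd expect the large calloc to return almost instantly, while malloc + memset pays for writing every page up front, which is exactly the cost the bug report was about.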
Incidentally, the blog post asks "why was the memory being actively zeroed at all?" but never actually answers it.
This may be over-optimization. Allocating all-zero pages as copy-on-write doesn't reduce zeroing overhead. It just postpones it until the page is used, at which time zeroing takes place. Possibly at a higher cost, because you now have to take a page fault, allocate a page, mess with the MMU, and only then zero the page. Only for programs which allocate vast amounts of memory which they then never write is this a win.
If you zero upfront, you have to do all of that save the page fault anyway, but for large allocations you also thrash the cache zeroing stuff that's immediately evicted to memory. It really is faster in general to zero a page when the program actually touches it, since that's when the program wants it in cache (for certain thresholds of "large allocation").
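Here's a sketch of that deferral, with the same caveat that it's ctypes against a POSIX libc and only illustrative: the calloc call itself should be nearly free, and the zeroing cost shows up when the pages are first touched.

```python
# Sketch: time calloc itself vs. the first touch of each page, assuming a
# POSIX libc reachable via CDLL(None); exact numbers depend on the allocator.
import ctypes
import time

libc = ctypes.CDLL(None)
libc.calloc.restype = ctypes.c_void_p
libc.calloc.argtypes = [ctypes.c_size_t, ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

N = 256 * 1024 * 1024        # 256 MB
PAGE = 4096

t0 = time.perf_counter()
p = libc.calloc(1, N)        # usually just maps copy-on-write zero pages
t1 = time.perf_counter()
assert p, "calloc failed"

buf = (ctypes.c_char * N).from_address(p)
t2 = time.perf_counter()
for off in range(0, N, PAGE):
    buf[off] = b"\x01"       # one write per page: fault + zeroing happen here
t3 = time.perf_counter()

libc.free(p)
print(f"calloc itself: {t1 - t0:.4f}s")
print(f"first touch:   {t3 - t2:.4f}s")
```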
The operating system can perform page zeroing during low load times and give out zero pages "for free" during higher load times. (So this can serve to smooth out CPU usage in spiky loads.) Or the OS may use special hardware that is inaccessible to userspace (e.g., Intel I/OAT) to zero pages in a way that doesn't consume CPU cycles.
Wouldn't it also help that the program would almost certainly not write the entire large chunk of memory all at once? The cost of copy-on-write zeroing is then postponed and spread across the individual write operations the program performs, instead of all of it happening up front.
The article mentioned testing the macOS and Linux implementations. I was curious, so I tried the author's demo program on Windows Subsystem for Linux; it performs quite well, running almost instantly on a low-end Surface 3.
It's definitely interesting to see these very low-level implementation details bubble up to the surface from time to time.
I find the optimizations tend to derive from real use cases.
In this case, both Windows and Linux are used for huge server farms; macOS, not so much. Apple probably doesn't spend the energy optimizing this sort of thing since it's less likely to cause trouble for them.
Zeroing a 100MB buffer is pegging the CPU? It takes ~30ms to zero such a buffer on my machine, so for it to peg the CPU, `iter_content` must be allocating a fresh buffer on each iteration of the loop. Then again, if `iter_content` just returns a new bytes object each time, that would make complete sense.
One could perhaps pass in a bytearray object, or even have it return the same bytearray object each loop, with a very large warning on the tin about how you will want to take a copy if you want to keep it outside the loop's body. (i.e., the returned bytearray is owned by the iterator, and you're merely borrowing it for the loop; this is a potentially confusing API but would reduce the memory allocation churn significantly: you'd only alloc your buffer once, instead of O(download size) times.)
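Something like this sketch of the "borrowed buffer" idea; this is not requests' actual API: `iter_into` is a made-up helper and `io.BytesIO` stands in for the HTTP response body, but it shows the reuse pattern.

```python
# Sketch: one bytearray allocated up front, refilled in place for every chunk,
# so no per-chunk bytes objects are created. Not requests' API.
import io

def iter_into(stream, buf):
    """Yield a memoryview over `buf` for each chunk read from `stream`.

    The view is only valid until the next iteration; call bytes(view)
    if you need to keep the data.
    """
    view = memoryview(buf)
    while True:
        n = stream.readinto(buf)   # refills the same buffer each time
        if not n:
            break
        yield view[:n]

source = io.BytesIO(b"x" * (1024 * 1024))   # stand-in for the response body
chunk_buf = bytearray(64 * 1024)            # allocated once, reused for every chunk
total = 0
for chunk in iter_into(source, chunk_buf):
    total += len(chunk)                     # process it here; don't keep the view
print(total)
```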
At gigabit Ethernet speeds (in packets per second), zeroing big chunks of memory per packet gets expensive. At 10GbE you may actually need specialised hardware.
I agree. It's interesting that two seemingly independent sources make the same unfounded claim about this obscure C function so close in time. Since I replied in the first case, I was sensitive to the second case. Perhaps someone should make a third attempt so that we can conclude that it's a conspiracy to rewrite the history of calloc.
I am pretty positive they aren't unrelated. Both mentioned Python's requests library and PyOpenSSL if memory serves.
If I had to hazard a guess, it felt like the first one was potentially written by the reporter of the initial bug, the second by the actual developer who investigated it. This is just a guess, but I bet if you reread both you'll likely agree with me.
Comments on the post earlier in the week traced the history through many historic implementations of calloc and found that there was no evidence to support that claim.
At the time calloc was invented the class of "safety"-related integer overflow problems wasn't on the map at all. So a case of someone thinking about it at the time would be quite interesting.
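(For anyone unfamiliar with the overflow class being discussed, it's roughly this, sketched in Python with an explicit modulus standing in for C's 64-bit size_t; the element count and size are made up.)

```python
# The overflow class in question: the product of two size_t values wraps,
# so a naive malloc(nmemb * size) under-allocates. Numbers are illustrative.
SIZE_T_MOD = 1 << 64          # assuming a 64-bit size_t

nmemb = (1 << 63) + 1         # hypothetical element count
size = 2                      # hypothetical element size
wrapped = (nmemb * size) % SIZE_T_MOD

print(wrapped)                # 2: malloc(nmemb * size) would return a 2-byte buffer
# A checking calloc(nmemb, size) can notice that the product overflows and
# return NULL instead of a dangerously undersized allocation.
```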
I specifically asked for "historical evidence" because I expected that the first thing on your (or someone else's) mind would be to make this inference from prototype to intent. This expectation was met, and unfortunately you disregarded this subtlety and moved on to reply. You didn't go look for historical evidence (that does not exist), a path which could have led to the "enlightenment" you sought. Historical claims should be based on historical evidence, not made up.
Just because something is so does not mean it was intended to be so for reason X. As you'll find with experience, rationale cannot be deduced from code. Code is merely an indicator.
Indeed I disregarded this subtlety in my reply. Why? Because I asked you a question.
I don't mind putting in work to answer questions, but I won't put in busy work. If there's someone in front of me that knows the answers, I'd rather ask that person than try to find those answers again.
>a path which could have led to the "enlightenment" you sought.
But I wasn't seeking "enlightenment," or anything of the sort. I was seeking a piece of knowledge that you possessed. You essentially told me to go find it myself. I'm not doing that, because my time is a finite resource, and the person I'm talking to already knows and could tell me.
We merged comments here from https://news.ycombinator.com/item?id=13110615 because of an annoying limitation of our software. Feel free to check that thread if you want the gory details, but let's keep this thread on topic.