
You use mmap whether you want to or not: the system executes your program by mmapping your executable and jumping into it! You can take a hard fault at any time, because the kernel is allowed to evict your code pages on demand even if you studiously avoid mmap for your data files. It can do this even with swap turned off, since file-backed code pages can simply be dropped and re-read from the executable later.

If you want to guarantee that your program doesn't block, you need to use mlockall.
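
A minimal sketch of what that call looks like, written in Python via ctypes for brevity (in C it is just `mlockall(MCL_CURRENT | MCL_FUTURE)`). The flag values below are the Linux ones and are an assumption for other platforms; the helper names `lock_all_memory` and `unlock_all_memory` are made up for illustration.

```python
import ctypes
import errno

# Assumption: Linux values from <sys/mman.h>. Other platforms may differ.
MCL_CURRENT = 1  # lock every page that is mapped right now
MCL_FUTURE = 2   # also lock pages mapped later (heap growth, new mmaps)


def lock_all_memory(include_future=True):
    """Try to pin the whole process in RAM. Returns 0 on success, else an errno."""
    flags = MCL_CURRENT | (MCL_FUTURE if include_future else 0)
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        rc = libc.mlockall(flags)
    except (OSError, AttributeError, TypeError):
        # No usable libc or no mlockall symbol on this platform.
        return errno.ENOSYS
    return 0 if rc == 0 else ctypes.get_errno()


def unlock_all_memory():
    """Undo lock_all_memory(). Returns 0 on success, else an errno."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        rc = libc.munlockall()
    except (OSError, AttributeError, TypeError):
        return errno.ENOSYS
    return 0 if rc == 0 else ctypes.get_errno()
```

Expect this to fail with ENOMEM or EPERM for an unprivileged process whose RLIMIT_MEMLOCK is small, so a real program should check the return value rather than assume the lock took effect.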




You're not wrong. Applications and libraries that want to be non-blocking should mlock their pages and avoid mmap for further data access. ntpd does this, for example.

After application startup, you can avoid additional mmap.


This is technically true, but the use case we're talking about is programs that are much smaller than their data. Postgres, for instance, is under 50 MB, but is often used to handle databases in the gigabytes or terabytes range. You can mlockall() the binary if you want, but you probably can't actually fit the entire database into RAM even if you wanted to.

Also, when processing a large data file (say you're walking a B-tree or even just doing a search on an unindexed field), the code you're running tends to be a small loop within the same few pages, so it might not even leave the CPU's cache, let alone get swapped out of RAM. The data is a different story: you need to access a very large amount of it, so it's much more likely that the data you want has been paged out. If you know some things about the data structure (e.g., there's an index or lookup table somewhere you care about, but you're traversing each node only once), you can use that to optimize which things are flushed from your cache and which aren't.
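
As a rough illustration of that kind of hint, here is a sketch using Python's `mmap.madvise` (Python 3.8+, on platforms that expose it). The file layout is an assumption made up for the example: a "hot" index prefix of `hot_prefix_len` bytes followed by data that is scanned once. The function name `count_matches_with_hints` is hypothetical, and the hints are advisory, so the kernel is free to ignore them.

```python
import mmap
import os


def count_matches_with_hints(path, needle, hot_prefix_len=0):
    """Count (possibly overlapping) occurrences of `needle` in the file at `path`.

    Assumed layout: the first `hot_prefix_len` bytes are an index we want to
    keep resident; everything else is read once, front to back.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0  # mmap refuses to map an empty file
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
            # Whole mapping: sequential scan, so read ahead aggressively and
            # feel free to drop pages behind us.
            mm.madvise(mmap.MADV_SEQUENTIAL)
            hot = min(hot_prefix_len, size)
            if hot and hasattr(mmap, "MADV_WILLNEED"):
                # Index prefix: ask for it to be faulted in ahead of use.
                mm.madvise(mmap.MADV_WILLNEED, 0, hot)
        count = 0
        pos = mm.find(needle)
        while pos != -1:
            count += 1
            pos = mm.find(needle, pos + 1)
        return count
```

The same idea applies to plain file descriptors through `posix_fadvise`; either way you are only nudging the page cache, not controlling it.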


Indeed. It's a question of scale: I write programs that can't afford to get blocked behind IO, ever, and at that level I need to pay attention to things like code paging, and even more esoteric things like synchronous reclaim.

If you're just optimizing stuff generally instead of trying to guarantee invariants, sure, ignore code paging and use direct IO for your own data.
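
For the direct IO route, a sketch of what a read might look like, assuming Linux-style `O_DIRECT` semantics (buffer, offset, and length aligned to the device block size). The name `read_file_direct` and the fallback to an ordinary buffered read are choices made for this example; some filesystems reject `O_DIRECT` entirely, and some platforms don't define it.

```python
import mmap
import os


def _read_direct(path, block_size):
    # O_DIRECT is missing on some platforms; getattr yields 0 (plain read) there.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_DIRECT", 0))
    # An anonymous mmap is page-aligned, which satisfies O_DIRECT's buffer
    # alignment requirement on typical systems.
    buf = mmap.mmap(-1, block_size)
    chunks = []
    try:
        while True:
            n = os.readv(fd, [buf])
            if n == 0:
                break
            chunks.append(buf[:n])
            if n < block_size:
                # Short read on a regular file means EOF; stop here so we
                # never issue a read from an unaligned offset.
                break
    finally:
        buf.close()
        os.close(fd)
    return b"".join(chunks)


def read_file_direct(path, block_size=1 << 20):
    """Read a whole file, bypassing the page cache where the OS allows it."""
    try:
        return _read_direct(path, block_size)
    except (OSError, AttributeError):
        # Filesystem refused O_DIRECT, or os.readv is unavailable:
        # fall back to a normal buffered read.
        with open(path, "rb") as f:
            return f.read()
```

`block_size` should be a multiple of the underlying block size (4096 is a common safe choice). Note that this only keeps your data out of the page cache; it does nothing about code pages being evicted.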


But that's a different order of magnitude problem: control plane vs data plane.

At some point, we could also say that the line fill buffer blocks our programs (more often than we realize).

All of this is accurate, but at different scales.


Also, many systems in 2021 have a lot of RAM and hardly ever swap.



