> To have a different sub-tree, you must have a different root. However, you can...

Veserv · 2024-08-04T03:14:37 1722741277

As I explained, you are describing processes, in the Linux parlance, with a shared mapping. Threads with private mapping (shared root, different sub-tree) versus processes with shared mapping (different root, shared sub-tree) are equivalent in what can be expressed, but only the latter conforms to modern MMU hardware. The details of your scheme, and how they differ from existing code as far as I am aware, are in how to keep the various processes synchronized with respect to the shared component (synchronized copies vs references to a single shared sub-tree, keeping the TLBs synchronized, etc.).

As to software-managed TLBs, very few processors these days support such functionality, instead opting for hardware mapping table walkers in their MMUs, which is the context for my comment. I am literally the author of memory mapping code for multiple architectures in a commercial operating system and even I do not need to consider such hardware.

wahern · 2024-08-04T04:12:10 1722744730

> As I explained, you are describing processes, in the Linux parlance, with a shared mapping.

I'm not describing processes because I'm not describing anything that currently exists, at least not in Linux or any other popular OS.

I may not be a maintainer for any current VM subsystem, but I've been around long enough to know it's ridiculous to get into semantic arguments about processes, threads, lightweight processes, or other similar labels. The meanings behind such terms continually evolve and are dependent on context--particular operating system, hardware, etc.

If it's not possible to implement the OP's notion, please explain why it wouldn't be possible with current hardware; why one process'/thread's/LWP's/whatever's mapping table couldn't have a subtree shared (not copied) with another mapping table which could be manipulated with the appropriate semantics. I don't know if it's possible or not; that was the gist of my question. I don't need a lesson in how "process" and "thread" are currently defined and modeled in typical systems. Obviously they don't model what the OP was suggesting. You've hinted there might be hurdles with keeping the TLB coherent, but you haven't come straight out and said that yet. It would be genuinely interesting if you asserted the claim squarely, perhaps with some context about the relationship between TLB management, root page mapping address, and scheduling contexts.

EDIT: And just to be clear, I understand that in Linux processes can share memory, but the page table entries to the shared physical pages are copies, not references to shared entries, which is why both the protections and virtual addresses can be different. In the OP's proposed scheme having to maintain N copies of those mappings would obviously defeat the purpose; why even bother with threads if every time pages in the shared address space are mapped you'd have to update every thread's page table--dozens if not hundreds or even thousands of separate tables? That's facially preposterous and it didn't even occur to me it needed to be stated explicitly, at least not in a forum like this. The whole idea clearly poses some difficult dilemmas. But because only allocations in the shared space need be globally atomic and simple--as they are now--it's not obvious that a scheme with private mappings on the side is impossible; at least, not obvious without recourse to specific knowledge of the details of how typical MMUs, TLBs, etc. operate and are managed. And I wouldn't be surprised in the least if it's not possible to achieve what the OP suggested. In fact, I've been skeptical the entire time. I just don't have enough knowledge myself to explain clearly why it's not possible, and was hoping somebody would do that.
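
To make the copies-versus-references distinction concrete, here is a toy sketch in Python. It is not a model of any real MMU or of Linux's page table code: the two-level layout, the `AddressSpace` class, and the `SHARED_SLOT`/`PRIVATE_SLOT` names are all assumptions made up for illustration. It only shows that with a shared sub-tree one write is visible through every root, while with per-root copies the same mapping costs one write per root.

```python
# Toy two-level "page table": a root maps a top-level slot to a sub-tree,
# and a sub-tree maps a page index to a physical frame number.
# Everything here is hypothetical and purely illustrative.

SHARED_SLOT = 0   # assumed top-level slot holding the shared region
PRIVATE_SLOT = 1  # assumed top-level slot holding the private region


class AddressSpace:
    def __init__(self, name):
        self.name = name
        self.root = {}  # top-level slot -> sub-tree (dict of page -> frame)

    def translate(self, slot, page):
        """Walk root -> sub-tree -> frame; None stands in for a page fault."""
        subtree = self.root.get(slot)
        return None if subtree is None else subtree.get(page)


def make_spaces_by_reference(n):
    """n roots whose shared slot points at the SAME sub-tree object."""
    shared = {}
    spaces = []
    for i in range(n):
        space = AddressSpace(f"t{i}")
        space.root[SHARED_SLOT] = shared   # reference, not a copy
        space.root[PRIVATE_SLOT] = {}      # private sub-tree per root
        spaces.append(space)
    return shared, spaces


def make_spaces_by_copy(n):
    """n roots that each own a separate copy of the shared sub-tree."""
    spaces = []
    for i in range(n):
        space = AddressSpace(f"t{i}")
        space.root[SHARED_SLOT] = {}
        space.root[PRIVATE_SLOT] = {}
        spaces.append(space)
    return spaces


def map_shared_by_reference(shared, page, frame):
    """One write, visible through every root that references the sub-tree."""
    shared[page] = frame
    return 1  # number of table writes


def map_shared_by_copy(spaces, page, frame):
    """One write per root, because every copy must be updated."""
    for space in spaces:
        space.root[SHARED_SLOT][page] = frame
    return len(spaces)  # number of table writes
```

With four roots, `map_shared_by_reference` performs a single write and `map_shared_by_copy` performs four, while a private mapping added to one root stays invisible to the others in both variants.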

Veserv · 2024-08-04T06:30:19 1722753019

I took deliberate care to specify the terms and the concepts they correspond to, so I am confused by your statements that I was using ambiguous or loaded terminology. I was merely pointing out that the specific tactic proposed is untenable and instead demands a different approach and primitives.

As to your actual question, the underlying concept is fairly easy to do and fully supported by the hardware if done appropriately. There are no difficulties with TLB coherence because the problems are a subset of multithreaded TLB coherence, so any kernel supporting multithreading should already have those mechanisms readily available for repurposing.

The only difficulty is if the software abstractions of the memory manager you are working with disagree with the concept. I am unfamiliar with the Linux kernel, but if the abstractions are wrong for the concept then you probably need to go look and modify code related to processes rather than threads.
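
As a rough illustration of the TLB coherence point, the following Python sketch models per-CPU TLBs whose entries are tagged by a root identifier (loosely like an ASID). The `Cpu` class, `unmap_shared`, and the `sharers` list are hypothetical names invented for this example, and the model ignores everything a real kernel must handle (inter-processor interrupts, ordering, lazy flushing). It only shows that removing a mapping from a shared sub-tree requires the same kind of cross-CPU invalidation a multithreaded kernel already performs, extended to every root that references the sub-tree.

```python
# Toy model of TLB invalidation for a sub-tree shared by several roots.
# All names are assumptions for illustration; this is not real kernel logic.


class Cpu:
    def __init__(self):
        # Cached translations: (root_id, virtual page) -> frame.
        self.tlb = {}

    def lookup(self, root_id, shared, vpage):
        """Return the frame, filling the TLB from the table on a miss.

        A KeyError on the table walk stands in for a page fault.
        """
        key = (root_id, vpage)
        if key not in self.tlb:
            self.tlb[key] = shared[vpage]
        return self.tlb[key]

    def invalidate(self, vpage, root_ids):
        """Drop any cached translation of vpage for the given roots."""
        for root_id in root_ids:
            self.tlb.pop((root_id, vpage), None)


def unmap_shared(shared, vpage, cpus, sharers):
    """Remove a mapping from the shared sub-tree, then flush stale entries.

    `sharers` lists the ids of every root that references the sub-tree.
    With ordinary threads it would hold a single root id; with several
    roots sharing one sub-tree it simply holds more of them.
    """
    del shared[vpage]
    for cpu in cpus:
        cpu.invalidate(vpage, sharers)
```

After `unmap_shared`, a lookup of the removed page on any CPU misses the TLB and faults on the table walk instead of returning the stale frame.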