Well, the problem with that is that "this is a curve that roughly fits the data" is a bad way to go about constructing a model. It's a neat and useful observation, but that doesn't make it a good model. It might model random accesses to memory reasonably well (such as traversing a linked list, the example used there), but it doesn't model a scan over an array well, which makes it a poor fit as a general cost model.

In contrast, the external memory model assumes a fast internal memory of size M (e.g., cache), which can be accessed in constant time, and a slow external memory of infinite size (e.g., RAM), which can only be accessed in blocks of size B (e.g., a cache line). You then count how many blocks the algorithm reads or writes. Now scanning an array of size N takes O(N/B) I/Os, whereas scanning a linked list of the same size takes O(N) I/Os, and sorting takes O((N/B) log(N/B)/log(M/B)) I/Os.

This models the same behaviour in a much cleaner way, applies equally to all levels of the memory hierarchy (you can also view RAM as the internal memory and a hard disk/SSD as the external memory), and is widely used in what you called "serious academic work" :) See also https://en.wikipedia.org/wiki/External_memory_algorithm
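To make the scan-vs-linked-list difference concrete, here's a toy Python sketch. It is not a real cache simulator: the block size B, the array size N, and the trivial "remember only the last block" cache are made-up simplifications, chosen just to show why sequential access costs ~N/B block transfers while scattered pointer-chasing costs ~N.

```python
import random

B = 8          # assumed block size: elements per "cache line"
N = 10_000     # assumed number of elements

def count_ios(access_sequence, block_size):
    """Charge one I/O whenever an access falls in a different
    block than the immediately preceding access (a one-block
    'cache' -- the crudest possible external-memory accounting)."""
    ios = 0
    current_block = None
    for addr in access_sequence:
        block = addr // block_size
        if block != current_block:
            ios += 1
            current_block = block
    return ios

# Array scan: addresses are sequential, so each block is paid for
# once and then reused for the next B-1 accesses -> about N/B I/Os.
print(count_ios(range(N), B))

# Linked list: nodes scattered over memory, so consecutive accesses
# almost never share a block -> about N I/Os.
node_addrs = list(range(N))
random.shuffle(node_addrs)
print(count_ios(node_addrs, B))
```

Even this crude accounting reproduces the O(N/B) vs. O(N) gap the model predicts; a real simulator with an M-sized cache would only widen it for access patterns with reuse.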
Furthermore, the introduction in the article you linked misunderstands Big-O notation so incredibly fundamentally that I don't think the author has done their background reading on machine models and Big-O notation.
It is not obvious that a two-level model is a cleaner way to think about today's memory access, which has 4-5 levels of caches before you even hit possibly-NUMA RAM, then an SSD, then a HDD, and then maybe big datasets that can only be accessed over the network.
But then, I am a physicist, not an engineer, so to me starting from empirical observations is actually a very good way to construct a model.
Well, you can apply it to any pair of (adjacent) levels of the memory hierarchy. But the main problem with the square root model is that it only models random access time, not when those accesses are actually incurred or when the data is already in cache. (Also, there are only 2-3 levels of caches; no architecture that I'm aware of has more than 3, maybe 4 if you count the CPU registers, but their allocation is usually fixed at compile time.)