> Even when something isn't a galactic algorithm "how well it maps to being done efficiently in real hardware with hardware sized data sizes" is more and more often the important bit over "slightly better bounds at infinity"
You're just repeating the same claim. Why is that more and more often the important bit? Why is it more important now? Why are these factors not captured in complexity analysis?
For instance, some complexity analysis assumes random access memory of arbitrary size, but memory above a certain size is better modelled with logarithmic access time. But this too can be captured in complexity analysis, so it's not really evidence of any divergence.
And then you have cache-oblivious data structures that scale uniformly across all cache sizes, which is a product of complexity analysis.
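To make that concrete, here's a minimal sketch of the cache-oblivious idea (in C++, purely for illustration, not from any particular library): a recursive matrix transpose that keeps subdividing until a sub-block fits in *some* cache level, without ever being told the cache sizes.

```cpp
// Minimal sketch of a cache-oblivious algorithm: recursive matrix transpose.
// The recursion subdivides until a sub-block fits in some level of cache,
// without ever being told the cache size -- that's the "oblivious" part.
#include <cstddef>
#include <vector>

// Transpose the block src[r0..r0+rows)[c0..c0+cols) of an n x n row-major
// matrix into dst, recursing on the longer dimension.
void transpose_rec(const std::vector<double>& src, std::vector<double>& dst,
                   std::size_t n, std::size_t r0, std::size_t c0,
                   std::size_t rows, std::size_t cols) {
    const std::size_t kBase = 32;  // base-case tile; any small constant works
    if (rows <= kBase && cols <= kBase) {
        for (std::size_t r = r0; r < r0 + rows; ++r)
            for (std::size_t c = c0; c < c0 + cols; ++c)
                dst[c * n + r] = src[r * n + c];
    } else if (rows >= cols) {
        transpose_rec(src, dst, n, r0, c0, rows / 2, cols);
        transpose_rec(src, dst, n, r0 + rows / 2, c0, rows - rows / 2, cols);
    } else {
        transpose_rec(src, dst, n, r0, c0, rows, cols / 2);
        transpose_rec(src, dst, n, r0, c0 + cols / 2, rows, cols - cols / 2);
    }
}

std::vector<double> transpose(const std::vector<double>& src, std::size_t n) {
    std::vector<double> dst(n * n);
    transpose_rec(src, dst, n, 0, 0, n, n);
    return dst;
}
```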
So I'm asking what exactly is meant, with a justification of why you think this matters now more than it did before.
Setting aside the galactic algorithm bit (it's in the quote, but I think you see why it would cause a divide here): just because you could try to do something in a field does not mean that's what is actually being done in that field. This is what I mean by probing the edges of mathematics vs. practical construction; using the same toolset to probe isn't enough to unite those two approaches on its own.
Look at this paper as an example. Does its worst-case big-O analysis attempt to model memory hierarchies for constructible systems, or does it diverge further from any hardware considerations in favor of asymptotic behavior toward infinity for a generic construction?
This is a good summary, but getting down to exactly what I meant: there aren't any hash table authors or maintainers in the business who have been stopped from trying something because of Yao's conjecture. So this result is not going to unleash a wave of practical innovation.
>Why you think this matters now more than it did before.
More of the low-hanging fruit has been picked over time. The motivation behind most of the original algorithms for a lot of computer science problems was practical. Once all (or most) of the solutions that are optimal for practical purposes have been found, you are necessarily (or almost necessarily) left with only theoretical solutions.
I think it's because computers are more complicated than they used to be. They have more features that need to be accounted for than they used to. This in turn happened because CPU frequencies stopped increasing, so we had to find other ways to make things go faster: SIMD, threads, speculative execution, caches.
If the latencies to main memory and the L1/L2/L3 caches have only been increasing over the years, is picking the right data memory layout more or less important than before?
>If the latencies to main memory and the L1/L2/L3 caches have only been increasing over the years
That's actually not the case at all: both L1 and L2 (and to a lesser extent L3) have improved dramatically in the past years. L1/L2 effectively run at a clock very similar to the CPU's; e.g. L1 nowadays could be 2 CPU cycles. As for L3, it depends on the architecture and where cache coherency occurs, yet generally the latency has not increased.
The main memory (DRAM) latency has been more or less similar, though.
>is picking the right data memory layout more or less important than before?
It has always been important, it's not about the latencies per se but the locality, which has been the case for more than two decades. The general rules are: 1) if your data set is small, i.e. N is tiny, it's effectively O(1) (as N is a constant); 2) pick the big-O that works best (usually O(1) or O(log N)); 3) pick the data structure with the best locality. A sketch of rules 1 and 3 follows below.
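As a rough illustration of rules 1 and 3 (the containers here are just illustrative, not a benchmark claim): for a tiny, fixed N, a flat array with a linear scan is effectively O(1), sits in one or two cache lines, and will usually beat a node-based O(log N) structure.

```cpp
// Sketch: for tiny N, a contiguous array scanned linearly is effectively O(1)
// and has perfect locality, so it usually beats a node-based std::map lookup
// despite the map's better-looking asymptotic story for large N.
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

constexpr std::size_t kN = 16;  // "N is tiny" -- a compile-time constant here

// Contiguous key/value pairs: one or two cache lines, no pointer chasing.
std::array<std::pair<std::uint32_t, std::uint32_t>, kN> flat{};

std::uint32_t lookup_flat(std::uint32_t key) {
    for (const auto& kv : flat)          // linear scan over contiguous memory
        if (kv.first == key) return kv.second;
    return 0;
}

// Same data in a red-black tree: every probe may touch a different cache line.
std::map<std::uint32_t, std::uint32_t> tree;

std::uint32_t lookup_tree(std::uint32_t key) {
    auto it = tree.find(key);            // O(log N), but poor locality
    return it != tree.end() ? it->second : 0;
}
```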
I don't know where you get the data, but the latencies are surely not decreasing, and 2 cycles for L1 is totally wrong. I should perhaps have been more specific: by latency I actually meant latency in terms of CPU cycles, not the actual figure in ns, which obviously depends on the CPU frequency.
Examining the past few generations of both Intel and AMD microarchitectures, this is what I concluded: L1 is between 4 and 6 cycles. L2 on Intel is between 12 and 16 cycles, whereas AMD is roughly around that but 1-2 cycles faster. L3 on AMD is ~40 cycles, whereas L3 on Intel goes anywhere from 60 up to even 120 cycles.
It is an engineering feat to increase the capacity of a cache without sacrificing a little bit of latency. The latency increase is sometimes less visible due to other effects, but generally speaking this is what happens.
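For what it's worth, one common way to arrive at per-level numbers like these is a dependent pointer-chase over working sets sized to each cache level. A rough sketch of that kind of measurement (assumed methodology for illustration, not the exact one behind the figures above; the working-set sizes and step count are arbitrary):

```cpp
// Rough sketch of a dependent pointer-chase micro-benchmark: each load's
// address depends on the previous load, so the time per step approximates
// the load-to-use latency of whichever cache level the working set fits in.
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

int main() {
    const std::size_t working_sets[] = {16u << 10, 256u << 10, 8u << 20, 256u << 20};
    for (std::size_t bytes : working_sets) {
        const std::size_t n = bytes / sizeof(std::size_t);

        // Build a single-cycle permutation (Sattolo's algorithm) so the chase
        // walks the whole working set instead of a short sub-cycle.
        std::vector<std::size_t> next(n);
        std::iota(next.begin(), next.end(), std::size_t{0});
        std::mt19937_64 rng{42};
        for (std::size_t i = n - 1; i > 0; --i) {
            std::uniform_int_distribution<std::size_t> pick(0, i - 1);
            std::swap(next[i], next[pick(rng)]);
        }

        const std::size_t steps = 20'000'000;
        std::size_t idx = 0;
        const auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < steps; ++i) idx = next[idx];
        const auto t1 = std::chrono::steady_clock::now();

        const double ns_per_load =
            std::chrono::duration<double, std::nano>(t1 - t0).count() / steps;
        std::printf("%9zu KiB working set: ~%.1f ns per dependent load (checksum %zu)\n",
                    bytes >> 10, ns_per_load, idx);  // printing idx keeps the loop alive
    }
}
```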
> The main memory (DRAM) latency has been more or less similar, though.
~70 ns of DRAM latency looks different to a CPU core running at 3 GHz than to one running at 5 GHz: the former is ~210 cycles while the latter is ~350 cycles. So the higher the core clock, the higher the latency in terms of CPU cycles stalled.
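The arithmetic, as a tiny sketch (the 70 ns figure is just the assumption above):

```cpp
// The same fixed DRAM latency costs more core cycles as the clock rises:
// stall cycles ~= latency_ns * frequency_GHz.
#include <cstdio>

int main() {
    const double dram_latency_ns = 70.0;  // assumed round-trip latency
    for (double ghz : {3.0, 5.0})
        std::printf("%.0f GHz core: ~%.0f cycles stalled per DRAM access\n",
                    ghz, dram_latency_ns * ghz);  // ~210 @ 3 GHz, ~350 @ 5 GHz
}
```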
> It has always been important
Yes, I am aware of that; I am only trying to make the case for the parent comment, since it was asking for clarification on why this would be more important today than it had been (at all) in the past.
> it's not about the latencies per se but the locality
I think it's the other way around: the actual issue we are trying to mitigate is the high DRAM latency, and data locality just happens to be the means we use to minimize its effects.
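As a concrete sketch of that point (the struct and field names are made up for illustration): if a hot loop only touches one field, a structure-of-arrays layout packs the useful bytes densely, so each cache line pulled over the high-latency DRAM path carries many values instead of a couple.

```cpp
// Sketch of how layout buys locality: if a loop only touches `health`,
// a structure-of-arrays layout packs the useful bytes densely, so each
// 64-byte cache line fetched from DRAM carries 16 values instead of 2.
#include <cstdint>
#include <vector>

// Array of structures: each entity is 32 bytes, of which health is 4.
struct EntityAoS {
    float x, y, z;
    float health;
    std::uint32_t id, flags, padding[2];
};

std::uint64_t sum_health_aos(const std::vector<EntityAoS>& es) {
    std::uint64_t total = 0;
    for (const auto& e : es) total += static_cast<std::uint64_t>(e.health);
    return total;  // touches every 32-byte struct for 4 useful bytes each
}

// Structure of arrays: the health values are contiguous on their own.
struct EntitiesSoA {
    std::vector<float> x, y, z, health;
    std::vector<std::uint32_t> id, flags;
};

std::uint64_t sum_health_soa(const EntitiesSoA& es) {
    std::uint64_t total = 0;
    for (float h : es.health) total += static_cast<std::uint64_t>(h);
    return total;  // 16 health values per cache line, easy to prefetch/SIMD
}
```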
The answer is obvious, but I don't really consider that evidence of systems programming and complexity analysis diverging. When approaching any given problem, you would still start with a basic complexity analysis to get into the right order of magnitude, and then select among cache-friendly approaches that fall in that range. You almost never go the other way around.