Tensor libraries are high-level, so anything below them can be hyper-optimized. That includes the application model (do we still need processes for ML serving and training tasks?), the operating system (how can Linux be improved or bypassed?), and the hardware (general-purpose computing carries a ton of cruft: instruction decoding, caches and cache coherency, compute/memory separation, compute/GPU separation, virtual memory; how many of these things can be elided, with the extra transistors put to better use?). There's so much money in generative AI that we're going to see a bunch of well-funded startups doing this work. It's very exciting to be back in a "Cambrian explosion" like the one of the early mainframe/PC era.
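
To make that boundary concrete, here's a minimal sketch (using JAX as one example of a tensor library; the choice is mine, not something the original names). The point it illustrates: at the library level, a model is just a pure function over arrays that gets lowered to a portable dataflow IR, so everything beneath that line, including runtime, OS, and hardware, is an implementation detail a compiler is free to specialize or replace.

```python
# A minimal sketch, assuming JAX as the tensor library. The model is a
# pure function over arrays; the library hands the compiler (XLA here)
# a portable IR, and everything below that boundary is swappable.
import jax
import jax.numpy as jnp

@jax.jit                       # traced once, compiled per backend by XLA
def layer(w, x):
    return jax.nn.relu(w @ x)  # the "program" is just tensor ops

w = jnp.ones((4, 4))
x = jnp.ones((4,))
print(layer(w, x))             # same source runs on CPU, GPU, or TPU

# The entire contract with the layers below is this dataflow graph:
print(jax.make_jaxpr(lambda w, x: jax.nn.relu(w @ x))(w, x))
```

Nothing in that function mentions processes, virtual memory, or a particular chip, which is exactly why each of those layers is fair game for a specialized rebuild.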