That's indeed done all the time in electronics: for example, RF CMOS usually trails the bleeding edge by three or four process generations.
However, all-optical/photonic computing is just intrinsically so much worse than electronics. On top of the issues that I touched on, there are also other fundamental problems, e.g. distribution of power: photons like to get absorbed by nearby electrons. How do you then supply all the active devices (switches/lasers/etc.) with power while maintaining some semblance of signal integrity and dense integration?
There is a special case: optically pumped amplification of the signal in undersea fiber-optic cables (e.g. erbium-doped fiber amplifiers). That's all-optical for the signal path, as far as I know.
Could nonlinear wave interactions be applied in near vacuum, isolated from the lasers, amplifiers and counters? Think 100000×100000 imprecise, lossy tensor/matrix multiplications.
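To put rough numbers on that scale, here is a back-of-envelope sketch in Python. The function names, the 10% loss and the 1% relative noise are all made-up assumptions for illustration, not properties of any real optical device. It counts the multiply-accumulates for dense 100000×100000 operands and simulates a small "imprecise, lossy" matrix-vector product.

```python
import random


def mac_counts(n):
    """Multiply-accumulate counts for dense n x n operands."""
    return {"matrix_vector": n * n, "matrix_matrix": n ** 3}


def noisy_matvec(matrix, vector, rel_noise=0.01, loss=0.1, rng=None):
    """Matrix-vector product with a crude analog error model.

    Every partial product is attenuated by `loss` and perturbed by
    Gaussian relative noise of standard deviation `rel_noise`.
    Both parameters are hypothetical, chosen only for illustration.
    """
    rng = rng or random.Random(0)
    out = []
    for row in matrix:
        acc = 0.0
        for a, x in zip(row, vector):
            acc += a * x * (1.0 - loss) * (1.0 + rng.gauss(0.0, rel_noise))
        out.append(acc)
    return out


if __name__ == "__main__":
    # 10**10 MACs per matrix-vector product, 10**15 per matrix-matrix product.
    print(mac_counts(100_000))
    # Tiny example: the exact answer would be [3.0, 7.0].
    print(noisy_matvec([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))
```

With `loss=0.0` and `rel_noise=0.0` the model reduces to an exact product, which makes it easy to see how much error a given noise level introduces.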
Exactly. While this speed-vs-space trade-off makes less sense in mobile devices, it might make perfect sense in industrial settings. Imagine 3D computers the size of a room (like a Cray-2) but 1000 times faster than any TPU-only cluster.
Yes. IIRC, AMD chips have been beating Intel chips on transistor size for a while now, but Intel, even with larger transistors, still has greater density on a chip (maybe that's changed in the latest gen).