Optical Circuit Switching for ML Systems (acm.org)
62 points by tim_sw 12 months ago | 11 comments



To add to the MEMS cross-connect discussion for HPC: there would be novelty and utility in modifying InfiniBand to provision an extra OCX/OCS link when the fabric sees localized network congestion. This could be done effectively by the subnet manager, and the current logical paths would likely adapt accordingly. https://x.com/VHPCworkshop/status/1708303406278791238?s=20
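
A rough sketch of what that control loop could look like (every name here is hypothetical; a real subnet manager and OCS controller expose different interfaces, so treat this as a thought experiment, not an implementation):

    import time

    CONGESTION_THRESHOLD = 0.8   # fraction of time a port spends waiting to transmit
    POLL_INTERVAL_S = 5

    def congestion_ratio(port) -> float:
        # Stand-in for reading per-port congestion counters (e.g. transmit-wait
        # vs. transmit-data deltas) from the subnet manager's performance agent.
        return port.xmit_wait_delta / max(port.xmit_data_delta, 1)

    def rebalance_once(fabric, ocs) -> None:
        # One pass: add a parallel optical circuit next to any congested link.
        for link in fabric.links():
            if congestion_ratio(link.src_port) > CONGESTION_THRESHOLD:
                # Ask the optical cross-connect for a spare circuit between the
                # two switches; assume it returns None if no free ports remain.
                if ocs.connect(link.src_switch, link.dst_switch) is not None:
                    # Re-run the routing sweep so paths spread onto the new link.
                    fabric.subnet_manager.resweep()

    def run(fabric, ocs) -> None:
        while True:
            rebalance_once(fabric, ocs)
            time.sleep(POLL_INTERVAL_S)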


If your organization did not start its optical switch platform efforts ten years ago, don't despair: you can now buy this technology off the shelf. https://www.hubersuhner.com/en/products-en/fiber-optics/opti...


“Now”? This technology is multiple decades old at this point, but because it is still a niche, low-volume product, it remains fiendishly expensive. The previous (192x192-port) version of that Huber-Suhner switch ran nearly $205k the last time I priced it out, and that was with a discount, approximately four years ago.

Individual MEMS mirrors are multiple hundreds of dollars each; fiber collimators are approximately a hundred bucks apiece in quantity.


That really does not seem that expensive; pretty sure 800 Gb/s switches with that many ports are in the same ballpark.


Minus the ML hype workload, there is no real novelty in the approach. https://x.com/VHPCworkshop/status/1708257890039910497?s=20


This is not really accurate. OCSes like this have been used for a long time in telecom, and we've proposed their use in datacenters in various ways as research, but this is the first large-scale datacenter deployment that I'm aware of. It's quite nice to see it being published.

(* I was at Google when this was happening but have no relationship to it; I was publishing competing/collaborating work on OCSes with the UCSD folk a decade ago)


Traditional corporate network architectures were electrical packet switching networks with a hierarchical topology. This design worked well for smaller networks, but when scaled to large datacenter networks, it resulted in unpredictable packet blocking and dropping. This made it difficult to build disaggregated storage-compute architectures for latency-critical distributed SQL/NoSQL databases or near-realtime streaming big-data workloads.

The advent of faster and cheaper network speeds and feeds made it economically feasible to recreate decades-old circuit switching network topologies like Clos/Fat-tree in packet switched networks. This solved the problem of packet blocking and dropping for any-to-any traffic patterns, even at line speed. This simplification of the network topology made it easier to place workloads within the datacenter and to build larger workloads.
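
For a sense of scale, here is the usual back-of-the-envelope math for a k-ary fat-tree built from k-port switches (a minimal sketch; the radix of 48 is just an example, not a claim about any particular deployment):

    def fat_tree(k: int) -> dict:
        # Classic k-ary fat-tree (folded Clos): k pods, each with k/2 edge and
        # k/2 aggregation switches, plus (k/2)^2 core switches, giving full
        # bisection bandwidth for k^3/4 hosts.
        assert k % 2 == 0, "k must be even"
        return {
            "pods": k,
            "edge_switches": k * (k // 2),
            "agg_switches": k * (k // 2),
            "core_switches": (k // 2) ** 2,
            "hosts": (k ** 3) // 4,
        }

    print(fat_tree(48))
    # {'pods': 48, 'edge_switches': 1152, 'agg_switches': 1152,
    #  'core_switches': 576, 'hosts': 27648}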

However, there was still a problem of head-of-line blocking due to flow routing imbalances. This could cause packet drops and latency spikes. This problem was solved with the development of end-to-end traffic aware routing techniques using software defined networks (SDNs).
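
A toy illustration of that head-of-line problem and the traffic-aware fix (the flow sizes and the two-uplink setup are made up; real SDN traffic engineering is far more involved):

    import hashlib

    UPLINKS = 2
    flows = {"elephant-1": 400, "elephant-2": 400, "mouse-1": 1, "mouse-2": 1}  # Gbit each

    def ecmp_pick(flow_id: str) -> int:
        # Static hash of the flow ID, oblivious to flow size: both elephants
        # can land on the same uplink and stall everything queued behind them.
        return int(hashlib.md5(flow_id.encode()).hexdigest(), 16) % UPLINKS

    def traffic_aware_pick(load: list, size: int) -> int:
        # SDN-style placement: put each flow on the currently least-loaded uplink.
        link = load.index(min(load))
        load[link] += size
        return link

    ecmp_load, aware_load = [0] * UPLINKS, [0] * UPLINKS
    for name, size in flows.items():
        ecmp_load[ecmp_pick(name)] += size
        traffic_aware_pick(aware_load, size)

    print("ECMP:", ecmp_load)            # hash-dependent; can end up badly skewed
    print("Traffic-aware:", aware_load)  # [401, 401]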

As companies built more datacenters, they needed to interconnect them with high-speed campus/metro networks. These networks could be fully meshed, but there was a limit to how much bandwidth they could support. This led to the need for software defined controls for managing traffic priority and quality of service (QoS) across these high-demand links. This was especially important when campus/metro links failed and degraded capacity.

However, implementing all of these network functions in routers would have made them very complex, expensive, and difficult to manage. This led to a deeper dive into SDN.

As companies gained more experience with SDN, they realized that the traffic patterns between large pods of hardware (cells) within a single datacenter were fairly stable. This meant that the full-Clos topology between these cells was underutilized. Additionally, the next generation of network speeds and feeds was so fast that it would have required a complete rip and replace of the existing infrastructure.

To avoid this rip and replace, companies looked for ways to make the existing layers feed/speed agnostic and to improve their utilization. This led to an interest in optical switching, but with much faster switching speeds and software defined control (unlike traditional telecom optical circuit switching).
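
A minimal sketch of that reconfiguration idea, assuming a slowly changing inter-pod traffic matrix and a fixed budget of optical ports per pod (the numbers are invented, and real topology-engineering systems solve this as an optimization problem rather than a greedy pass):

    pods = ["A", "B", "C", "D"]
    # Measured inter-pod demand in Gb/s (invented values); in practice this
    # comes from the SDN controller's traffic telemetry.
    demand = {("A", "B"): 90, ("B", "C"): 70, ("C", "D"): 60,
              ("B", "D"): 15, ("A", "C"): 10, ("A", "D"): 5}

    def assign_circuits(demand: dict, ports_per_pod: int) -> list:
        remaining = {p: ports_per_pod for p in pods}
        circuits = []
        # Heaviest pairs first: give stable elephant demands direct optical paths.
        for (a, b), gbps in sorted(demand.items(), key=lambda kv: -kv[1]):
            if remaining[a] > 0 and remaining[b] > 0:
                circuits.append((a, b, gbps))
                remaining[a] -= 1
                remaining[b] -= 1
        return circuits

    print(assign_circuits(demand, ports_per_pod=2))
    # [('A', 'B', 90), ('B', 'C', 70), ('C', 'D', 60), ('A', 'D', 5)]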

Today, the hardware and software stack needed to implement all of this is still very bespoke and not as mature as electrical packet switching networks, so it is not yet a viable solution for smaller players. But the amount of investment AI/ML workloads are attracting, and how big a difference these OCS interconnects can make between large pods, will likely lead to renewed interest in commercializing OCS solutions for smaller players. So, in that way, this publication is very timely.


I have high hopes for optical computing applications in ML; we need 100x efficiency to run LLMs on the edge.


Optical switching is about training at scale and replaces classical physical networking systems (people will say things like "electrical", "copper", etc.). Running models on edge devices is, almost by definition, done without networking.


Not my area, just curious.

If I am getting this (still reading the paper... on a phone :/): stupid-fast latency due to no processing overhead, but slow switching speeds due to how it physically switches. So this is basically good for any type of reconfigurable cluster of systems that needs low latency without a fixed topology, not strictly ML-specific. Or have I misunderstood that?


Correct; Google also has a paper on this at the DC level:

https://arxiv.org/pdf/2208.10041.pdf




