Yeah, I know people prefer microservices for their horizontal scalability, but never for reducing latency. Maybe something got lost in translation on the way to the author.
They were likely running an old Tibco/RV system distributed across the network (common for 1990s to early-2000s trading systems), then replaced the system (and hardware) with multi-core boxes that use shared memory for message passing. That reduces internal latency from milliseconds to sub-microsecond.
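For anyone who hasn't seen shared-memory message passing, here is a rough sketch of the idea: a single-producer, single-consumer ring buffer living in a shared memory segment, so a message is a memory copy rather than a trip through the network stack. The class name `ShmRing`, the slot layout, and the sizes are all made up for illustration. This is not how any particular trading system does it. Real implementations are typically written in C or C++ with atomic operations and memory barriers, which this Python version does not have, so treat it as a picture of the mechanism and not something that would hit sub-microsecond numbers.

```python
import struct
from multiprocessing import shared_memory

SLOT_SIZE = 64                  # bytes per slot (assumed; 2-byte length prefix + payload)
NUM_SLOTS = 8                   # ring capacity (assumed)
HEADER = struct.Struct("<QQ")   # head = messages written, tail = messages read


class ShmRing:
    """Hypothetical single-producer/single-consumer ring over shared memory."""

    def __init__(self, name=None, create=False):
        size = HEADER.size + SLOT_SIZE * NUM_SLOTS
        self.shm = shared_memory.SharedMemory(name=name, create=create, size=size)
        if create:
            HEADER.pack_into(self.shm.buf, 0, 0, 0)  # empty ring

    def _slot_offset(self, index):
        return HEADER.size + (index % NUM_SLOTS) * SLOT_SIZE

    def send(self, payload: bytes) -> bool:
        """Copy payload into the next free slot; return False if the ring is full."""
        if len(payload) > SLOT_SIZE - 2:
            raise ValueError("payload too large for one slot")
        head, tail = HEADER.unpack_from(self.shm.buf, 0)
        if head - tail == NUM_SLOTS:
            return False
        off = self._slot_offset(head)
        struct.pack_into("<H", self.shm.buf, off, len(payload))
        self.shm.buf[off + 2:off + 2 + len(payload)] = payload
        # Publish only after the payload is written. A real implementation
        # would need a release barrier here; plain Python has none.
        struct.pack_into("<Q", self.shm.buf, 0, head + 1)
        return True

    def recv(self):
        """Return the oldest unread message as bytes, or None if the ring is empty."""
        head, tail = HEADER.unpack_from(self.shm.buf, 0)
        if head == tail:
            return None
        off = self._slot_offset(tail)
        (length,) = struct.unpack_from("<H", self.shm.buf, off)
        data = bytes(self.shm.buf[off + 2:off + 2 + length])
        struct.pack_into("<Q", self.shm.buf, 8, tail + 1)  # advance tail
        return data

    def close(self, unlink=False):
        self.shm.close()
        if unlink:
            self.shm.unlink()  # only the creating side should unlink
```

One process would create the ring with `ShmRing(create=True)` and pass `ring.shm.name` to the other, which attaches with `ShmRing(name=...)`. Neither `send` nor `recv` makes a system call once the segment is mapped, which is where the latency win over a network hop comes from.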
The only thing I can think of is that it would make bottlenecks easier to detect. Other than that, and certainly in high-frequency trading, I only see it adding latency.