Of historical note: Opteron servers with more than 8 sockets were implemented in a similar way. There was a chip (IIRC called Horus, made by Newisys) that contained two coherent HyperTransport controllers connected back to back.
It's a standard approach to building bigger systems, because most built-in cache coherency protocols are optimized for configurations that are too small. Horus made it easy to build 8x8 servers. SGI carried their Altix work forward into UltraViolet (SGI has since been acquired by HPE), which placed custom cache coherency chips between the processors and implemented a custom directory-based coherence protocol over NUMAlink connections.
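To illustrate why a directory helps, here is a toy sketch of a directory-based protocol (simple MSI states, one home directory). It is not SGI's actual protocol. All class and message names are hypothetical. The point it shows is that the directory records which nodes hold each cache line, so a write sends invalidations only to the recorded sharers instead of probing every node.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class State(Enum):
    UNCACHED = "U"
    SHARED = "S"
    MODIFIED = "M"


@dataclass
class DirEntry:
    state: State = State.UNCACHED
    sharers: Set[int] = field(default_factory=set)  # nodes holding a clean copy
    owner: Optional[int] = None  # node holding the dirty copy when MODIFIED


class Directory:
    """Toy home-node directory (hypothetical): one entry per cache-line address."""

    def __init__(self) -> None:
        self.entries: Dict[int, DirEntry] = {}
        # Point-to-point coherence messages that would cross the interconnect.
        self.messages: List[Tuple[str, int, int]] = []

    def _entry(self, addr: int) -> DirEntry:
        return self.entries.setdefault(addr, DirEntry())

    def read(self, node: int, addr: int) -> None:
        e = self._entry(addr)
        if e.state is State.MODIFIED:
            if e.owner == node:
                return  # requester already holds the line exclusively
            # Owner writes back its dirty copy and keeps a shared copy.
            self.messages.append(("fetch", e.owner, addr))
            e.sharers = {e.owner}
            e.owner = None
        e.sharers.add(node)
        e.state = State.SHARED

    def write(self, node: int, addr: int) -> None:
        e = self._entry(addr)
        if e.state is State.MODIFIED and e.owner != node:
            # Take the line from the current owner, which writes back and drops it.
            self.messages.append(("fetch_invalidate", e.owner, addr))
        elif e.state is State.SHARED:
            # Invalidate only the recorded sharers; no broadcast to all nodes.
            for s in sorted(e.sharers - {node}):
                self.messages.append(("invalidate", s, addr))
        e.state = State.MODIFIED
        e.owner = node
        e.sharers = set()
```

For example, after nodes 0, 1 and 2 read line `0x40`, a write by node 1 produces invalidations for nodes 0 and 2 only. A later read by node 3 triggers a single fetch from node 1.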
Also, the split between a "router" and a "local area switch" is used in InfiniBand. Endpoints within a subnet are addressed by 16-bit local identifiers (LIDs), while inter-subnet routing uses 128-bit global identifiers (GIDs), which are formatted as IPv6 addresses.
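A minimal sketch of the two-level addressing decision, assuming the GID layout of a 64-bit subnet prefix followed by a 64-bit interface ID (port GUID). The `lid_table` and `router_lid` arguments are hypothetical stand-ins for what a real host would learn from the subnet manager. If the destination shares the local subnet prefix, the packet goes straight to the destination LID. Otherwise it is addressed to a router's LID and carries a global route header (GRH) with the destination GID.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Tuple

MASK64 = (1 << 64) - 1


@dataclass(frozen=True)
class Gid:
    subnet_prefix: int  # upper 64 bits: identifies the subnet
    interface_id: int   # lower 64 bits: identifies the port (GUID)

    @classmethod
    def from_int(cls, value: int) -> "Gid":
        return cls(value >> 64, value & MASK64)

    def to_ipv6(self) -> str:
        # A GID uses the same 128-bit layout as an IPv6 address.
        return str(ipaddress.IPv6Address((self.subnet_prefix << 64) | self.interface_id))


def next_hop_lid(local_prefix: int, dest: Gid,
                 lid_table: Dict[int, int], router_lid: int) -> Tuple[int, bool]:
    """Return (destination LID for this hop, whether a GRH is needed).

    lid_table (GUID -> LID) and router_lid are hypothetical inputs.
    """
    if dest.subnet_prefix == local_prefix:
        lid = lid_table[dest.interface_id]  # same subnet: switch by 16-bit LID
        needs_grh = False
    else:
        lid = router_lid                    # other subnet: hand off to a router
        needs_grh = True
    if not 0 < lid <= 0xFFFF:
        raise ValueError("LIDs are 16-bit values")
    return lid, needs_grh
```

The design mirrors the comment's point: switches inside a subnet only ever look at the small LID, and the wide, IPv6-shaped GID matters only when a router has to pick the next subnet.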