Data Center TCP: TCP Congestion Control for Data Centers (ietf.org)
87 points by okket on Oct 18, 2017 | 15 comments



Is this the same as the SIGCOMM '10 DCTCP paper [0]? The author lists of both draw heavily from Microsoft, but otherwise they don't overlap. If so, what's changed in the past 7 years? Why publish the RFC now, especially since it's not on track to become an Internet standard?

Something called DCTCP has been in the Linux kernel since 2014 [1]; that commit even cites what looks like an earlier draft of this RFC, dating back to 2014, which was on the standards track at the time [2]. Why was that effort seemingly abandoned?

[0] https://people.csail.mit.edu/alizadeh/papers/dctcp-sigcomm10...

[1] https://git.kernel.org/linus/e3118e8359bb7c59555aca60c725106...

[2] https://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00


The RFC clearly states:

> This document describes DCTCP as implemented in Microsoft Windows Server 2012 [WINDOWS]. The Linux [LINUX] and FreeBSD [FREEBSD] operating systems have also implemented support for DCTCP in a way that is believed to follow this document. Deployment experiences with DCTCP have been documented in [MORGANSTANLEY].

> Why publish the RFC now, especially since it's not on track to become an Internet standard?

Presumably it's to guide future implementors who wish to attain compatibility with the existing implementations.


> Why publish the RFC now, especially since it's not on track to become an Internet standard?

One might ask why we ever publish RFCs that are not on the Standards Track, but that seems a bit silly. We have Experimental, Informational, BCP, and Standards tracks for a reason.

As to why publish now? Most likely because the authors only recently found the energy to reach the finish line. Some Internet-Drafts do take years to reach the RFC Editor queue; taking a long time is not fatal.



Entirely different use cases.


Really? Google appears to claim to use BBR for intra-data-center communication.


DCTCP is designed for... data center networks. You have first-party control over the equipment and can ensure ECN works end to end.

BBR is designed for the "hostile internet", where you can't rely on ECN marking and basically tons of people are plotting against you, willingly or unwillingly: middleboxes that do policing, shaping, and just plain bizarre things; routers that clear options; other worse/unfair congestion controls; extreme variation in buffer sizes; etc.


The BBR whitepaper gave examples of improvements on both high-capacity backhaul links (e.g. a 10 Gbps WAN) and lower-speed, last-mile connections to end users.


How is this practical, given the known incompatibilities with traditional ECN, when most enterprise environments include interconnectivity with systems outside the “controlled environment”?

The demand for this type of congestion control seems to be driven by the top-of-rack topology used in cloud architectures.

It would then stand to reason that hardware manufacturers could better solve variable queuing requirements in the switch than by developing protocol support that has a known major incompatibility and could introduce interoperability problems between vendors in mixed networks.


First, DCTCP degrades gracefully if ECN isn't working. OK, so you won't get the early warning about congestion, but eventually congestion will cause packet loss, and DCTCP will react to it appropriately.

Second, Linux allows setting the congestion control algorithm per route. So you could use DCTCP for communicating with the IPs in the same data center and the default CC algorithm for everything else. And what if you can't, or don't want to, use per-route settings? Well, you'll generally have two classes of machines anyway: frontends that can communicate with the outside world, and backends that can't. So you could set the congestion control based on the role of the machine.
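
For illustration, here's a minimal, hedged Python sketch of that destination-based selection, choosing the algorithm per socket via TCP_CONGESTION (Linux-only, Python 3.6+). The 10.0.0.0/8 internal prefix is an assumption, and the kernel needs the dctcp module available:

    import ipaddress
    import socket

    # Assumed internal prefix; substitute your data center's range.
    DATACENTER_NET = ipaddress.ip_network("10.0.0.0/8")

    def connect_with_cc(host, port):
        # Resolve the peer and pick DCTCP only for in-DC destinations.
        addr = ipaddress.ip_address(socket.gethostbyname(host))
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        if addr in DATACENTER_NET:
            # Needs the tcp_dctcp module loaded; unprivileged processes
            # also need "dctcp" in net.ipv4.tcp_allowed_congestion_control.
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"dctcp")
        s.connect((str(addr), port))
        return s

(The per-route variant is just iproute2: something like "ip route add 10.0.0.0/8 dev eth0 congctl dctcp" on Linux 4.0+.)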

Solving this in the switches seems tricky. Sure, per-flow rather than global or per-port queues could be used to solve the mice vs. elephants problem. But it does not help with TCP incast unless you also add huge buffers. You want switches to be simple, fast and cheap. A switch with per-flow queueing and huge buffers seems like the opposite.


> The demand for this type of congestion control is seemingly driven by top of rack topology used in cloud architecture.

That's one heck of a use case.


It's more expensive to solve things in the switch (although I know how to make it cheaper...) so if you can handle the limitations of DCTCP then it's a good solution. I can definitely imagine that deep-buffer VOQ switches would be less expensive than DCTCP in heterogeneous enterprise datacenters.


How is this different from Explicit Congestion Notification (ECN)? ECN has been in the kernel for many years and is half enabled by default (accepted on incoming connections that request it, but not requested on outgoing ones).


DCTCP improves congestion control by using ECN markings to estimate the extent of congestion rather than just its presence. Given congestion markings on all of the packets in a window, DCTCP will halve the window, just as traditional TCP does when it detects packet loss and as TCP+ECN does when it sees a single ECN marking. If DCTCP sees fewer markings, it backs off proportionally less.

Link: https://www.soe.ucsc.edu/sites/default/files/technical-repor...
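
To make the proportional backoff concrete, here is a minimal Python sketch of the update rule as described in the SIGCOMM '10 paper (parameter names follow the paper; g = 1/16 is its suggested gain), not any particular implementation:

    G = 1.0 / 16  # EWMA gain suggested in the paper

    def update_alpha(alpha, marked, total):
        # Once per window of data: fold the fraction of ECN-marked
        # packets seen in the last window into the running estimate.
        f = marked / total
        return (1 - G) * alpha + G * f

    def shrink_cwnd(cwnd, alpha):
        # On congestion: alpha == 1 (every packet marked) halves the
        # window like classic TCP+ECN; alpha near 0 barely cuts it.
        return cwnd * (1 - alpha / 2)

With alpha = 1 this reduces to the standard multiplicative decrease of 1/2; with alpha = 0.1 the window shrinks by only 5%.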


Yes, that is the idea. DCTCP just uses the ECN markings to adapt the congestion window in proportion to the amount of congestion.



