Serving 6.8M requests per second at 9 Gbps from a single Azure VM (ageofascent.com)
127 points by profquail on Jan 20, 2016 | 55 comments



They had me until the moment where I read "While our game servers are built more like HFT Quant trading servers" yet they are in the cloud... They clearly have no idea what they're talking about regarding finance, where latency matters a whole lot more than throughput.

"HFT Quant Trading Servers" optimize things down to the micro and even nanosecond level from the hardware to the bios all the way up to the networking interconnects and applications + c libraries. Optimization to this level is simply impossible from the cloud, where even with VT and SRIOV, the performance penalty is simply far too high (and measurably so!).

Disclaimer: I'm a technologist who has worked in the "HFT Quant / Electronic Trading" industry for the past 8 or so years. I literally build infrastructure like this for a living.


Yes, the analogy to HFT trading is a bit clumsy - sorry - what we're saying is that we've borrowed techniques from HFT (especially at the decisioning end) rather than saying our system is suitable for HFT (which it absolutely isn't).

Disclaimer: Author of article


It was a good article, thanks for writing it. But when you know you're addressing an ultra technical crowd, please leave out stuff like that, it distracts from very interesting content.


It was a simile to express the critical importance of latency of calculation and dispatch in the system, in a graspable form for the reader - when I don't really know their technical ability - as it's a blog post about a game rather than a technical paper in a journal. I did say "is like" ;-)

For example, in the interest management: it is working out what custom messages to send to which players about each other, and with 50k players in the same combat it needs to do this without ballooning to 2.5Bn messages for a round of position updates - of which there are many per second per player; then produce the individual compressed output streams that are unique to each player, and move on to the next set. Even then, as stated in the article, we are still at 267M messages a second.
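To make the shape of that concrete: the usual approach is a spatial index, so each player only receives updates about nearby ships rather than all 50k. A minimal C# sketch of that general idea - the Player type, grid cell size and 3x3 neighbourhood are assumptions for illustration, not our actual implementation:

    // Minimal sketch of spatial interest management: instead of broadcasting
    // every player's position to every other player (50k x 50k ~= 2.5Bn
    // messages per round), bucket players into grid cells and only build each
    // update from nearby cells. Types and cell size are illustrative only.
    using System;
    using System.Collections.Generic;
    using System.Numerics;

    class Player
    {
        public int Id;
        public Vector3 Position;
    }

    class InterestGrid
    {
        const float CellSize = 1000f;   // assumed interest radius per cell
        readonly Dictionary<(int, int), List<Player>> _cells = new Dictionary<(int, int), List<Player>>();

        public void Rebuild(IEnumerable<Player> players)
        {
            _cells.Clear();
            foreach (var p in players)
            {
                var key = Key(p.Position);
                if (!_cells.TryGetValue(key, out var list))
                    _cells[key] = list = new List<Player>();
                list.Add(p);
            }
        }

        // Only players in the 3x3 block of cells around the observer are
        // included in its update, keeping per-player message counts bounded.
        public IEnumerable<Player> VisibleTo(Player observer)
        {
            var (cx, cz) = Key(observer.Position);
            for (int dx = -1; dx <= 1; dx++)
                for (int dz = -1; dz <= 1; dz++)
                    if (_cells.TryGetValue((cx + dx, cz + dz), out var nearby))
                        foreach (var p in nearby)
                            if (p.Id != observer.Id)
                                yield return p;
        }

        static (int, int) Key(Vector3 pos) =>
            ((int)MathF.Floor(pos.X / CellSize), (int)MathF.Floor(pos.Z / CellSize));
    }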


It's probably fair.

gamers "flip their shit" when ping is not steady, and are only happy at ~50ms response times. a 200ms response in video games is usually considered "unplayable".


In the meantime, a ping of 50ms is an eternity for most financial applications. For normal algorithmic/quant trading you usually want to stay under 10ms roundtrip. For HFT, anything above 100µs is pretty slow. Gives you some perspective.


It was more a reference to the required latency from message received on wire, to messages processed, routed and dispatched on wire - for the scale, rather than physical location and speed of light issues.


In any split-second game (and this article literally advertises "real-time twitch combat"), 200 ms is unplayable. With two players of equal skill, it comes down to "who can react quicker", and the 30 ping player will win every time.


Totally depends on the type of game and how the design is structured to deal with latency.

See my reference to SubSpace further down; it's a highly skilled game that can easily be played at ~250ms latency. The large majority of the game is predicting where the other player will be in ~0.75s and setting things in motion ahead of time to intercept them.

Any game based on prediction and designed for smooth re-integration of the game state can be done on latent connections. Heck, there's even been some really impressive stuff in the fighting-game space involving rewinding game state to resolve hits ~0.25s in the past (the original Counter-Strike does this too, though less gracefully, which is why you'd sometimes see people rubber-band back around a corner when hit).
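For anyone curious what that rewind trick looks like in practice, here's a minimal sketch - the snapshot window, types and hit test are assumptions for illustration, not taken from any particular engine:

    // Minimal sketch of server-side lag compensation ("rewind"): keep a short
    // history of position snapshots and resolve a hit against the world as it
    // looked when the shooter actually fired.
    using System;
    using System.Collections.Generic;
    using System.Numerics;

    class Snapshot
    {
        public double Time;
        public Dictionary<int, Vector3> Positions = new Dictionary<int, Vector3>();
    }

    class LagCompensator
    {
        const double HistorySeconds = 0.25;            // how far back rewinds are allowed
        readonly LinkedList<Snapshot> _history = new LinkedList<Snapshot>();

        public void Record(Snapshot snap)
        {
            _history.AddLast(snap);
            while (_history.First.Value.Time < snap.Time - HistorySeconds)
                _history.RemoveFirst();                // drop snapshots outside the window
        }

        // Find the stored snapshot closest to the shooter's fire time and test
        // the hit there, i.e. where the target was from the shooter's point of view.
        public bool ResolveHit(double fireTime, int targetId, Vector3 shotPoint, float hitRadius)
        {
            Snapshot best = null;
            foreach (var snap in _history)
                if (best == null || Math.Abs(snap.Time - fireTime) < Math.Abs(best.Time - fireTime))
                    best = snap;

            return best != null
                && best.Positions.TryGetValue(targetId, out var pos)
                && Vector3.Distance(pos, shotPoint) <= hitRadius;
        }
    }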


Yep. Loved it when we moved from ISDN to an E1, Quake became a lot more intense for my opponents :)


50ms == milliseconds. In HFT, 50 milliseconds might as well be 5 years. Now 50 microseconds is quite fast, and 50 nanoseconds is where you want to be.


Very few HFT firms are routinely operating in the nanosecond range. Under 50 microseconds is where most lie, and under 10 microseconds is where you'll find the fastest algos. Fact is, at that point even if you're colo-ing as a market maker the length of cable makes a difference. FPGAs and even ASICs are pretty much necessary, since even a CPU with off-die memory is going to be too slow.

HFT is getting harder to make profits in: the real growth is in algos operating in the low-milliseconds range, capturing real market activity and non-simple arbitrage.


I'll counter that with: very few so-called HFT firms are actually HFT. I've worked for two of them. My previous employer was Virtu Financial (look them up, they IPO'd).


> They had me until the moment where I read "While our game servers are built more like HFT Quant trading servers"

I read that and was hoping they were going to start talking about ring buffers and event sourcing, but no such luck.


That's the secret sauce - more of a post-launch, "great success, here's how we did it" article. Too early for that :)


[deleted]


Everyone is welcome to their own opinion :)

I was at Ticketmaster for some time in the early 2000s, helping build very large load-balanced web services which maxed dual 10G links on a regular basis. I never said I'm any better than anyone, or that $industry is harder than another industry, so get the chip off of your own shoulder, dude. Seriously, take the ego down a notch or five. I said that the statement made regarding HFT was factually incorrect, something you have done nothing to disprove.

Trying to push millions of packets over some exotic networking interconnect in as little time as possible is a very hard technical problem; I'm not sure on what planet it isn't. Working with very bleeding-edge new hardware, and being in a place where you can build the perfect solutions even if they aren't always inexpensive, is quite amazing. Finance is super boring, but the technology behind it is absolutely fascinating. I'm in it for the tech and the seemingly impossible problems; the money is only a small benefit.

TL;DR: Quit being a dick, totally uncalled for. I'm a normal person, just like you and everyone else here, simply trying to better ourselves.


[deleted]


As an innocent bystander and without commenting about the parent post, yeah. You need to lay off the ego.


Fellow innocent bystander here: I agree.


Seconded. Where is the Solarflare card bypassing the OS for network ops? (Does that even work in Windows? Doubtful.)


It uses the Registered Input/Output (RIO) API extensions in Windows for user-mode TCP/IP kernel bypass.

PacketDirect is the next step in Windows Server 2016, which moves closer to the NIC: https://www.youtube.com/watch?v=KaXfDjIhn0U


A follow-up post with even more detail on this would be incredible if you're able to / have the time. I'm a Linux jockey who simply had no idea modern Windows was capable of using hardware this well. Nicely done.


To be fair, it seems that even MS are a bit surprised about this throughput:

https://www.youtube.com/watch?v=CJeWIWkhVow&feature=youtu.be...


Solarflare isn't the only option for OS bypass. In fact, no major cloud vendor uses Solarflare NICs even for applications where OS bypass is critical because Solarflare is way too expensive for what you get (the retail price is substantially higher than Mellanox, Annapurna back when you could buy those publicly, etc., without discounts, and the volume discount you get with other vendors is more than an order of magnitude larger than the discount Solarflare will give you).

The advantage of Solarflare is that it's easy to use, but at the scale large cloud vendors operate at (millions of machines), it's much cheaper to just have a networking staff that can properly operate another vendor's NICs.


It was kind of a rhetorical question... because if you actually care about latency, you don't use Windows or the public cloud.

I saw a few hundred Solarflare cards purchased, and the prices were not that absurd - only 10-20% of the total cost of the server, for a pretty amazing perf gain. Mellanox etc. tested similarly but had less helpful sales engineers, IIRC.


Sure, you won't get much of a discount from anyone for quantities of a few hundred. You might be looking at a 2x price difference, which is pocket change for a few hundred machines. In quantities of hundreds of thousands or millions, the price difference is well over an order of magnitude because Solarflare won't play ball the same way other vendors will.

That adds up to a significant number, even if it's "only" 10%-20% per server, and you're probably overestimating the cost of the rest of the server since large companies are able to get steep discounts on almost everything in the quantities they buy in.

> Mellanox etc. tested similar but had less helpful sales engineers

We've found Mellanox engineers to be more helpful than Solarflare folks. I don't think that's because one company has inherently more helpful engineers. We're just not in a market that Solarflare cares about and you're not in a market that Mellanox cares about.

> if you actually care about latency, you do not use windows nor use the public cloud

You might be surprised by how many HPC shops have moved from fancy in-house infiniband networks and custom fabric to cloud hosted HPC clusters. HFT isn't moving to the public cloud anytime soon, but a number of large customers who care about sub-microsecond network latency have found real value in moving to the public cloud.


> In quantities of hundreds of thousands or millions

Sorry, I do not know anyone who buys millions of high performance NIC cards. Maybe 1 or 2 supercomputer labs in the world? And with that in mind, I am not sure the price break a company will provide to someone buying a million $500 network cards is super important to me?

> That adds up to a significant number, even if it's "only" 10%-20% per server, and you're probably overestimating the cost of the rest of the server since large companies are able to get steep discounts on almost everything in the quantities they buy in.

Nope, I definitely knew what the whole server cost.


So you're saying that you don't know anyone who buys in the quantities that we do, but definitely know what we pay?

I think you're responding to something that's not what I'm saying here. I never claimed that our pricing is relevant to you. Just that your pricing isn't relevant to us and that you're making generalizations that aren't valid outside of your niche.


Someone has to be a buzzkill :(


Buzzkill or a realist? Personally I prefer when someone more knowledgeable on the topic corrects me about a statement I've made.

If the OP is making analogies that aren't comparable, I'd rather know than not so I can dig in with a healthy bit of skepticism.


SEjeff is seriously over-reacting by rejecting the entire article on the basis of a bad analogy that isn't even central to the material. If the article was pushing this system for use in HFT, that would be bad. But this is just someone who doesn't work in the field that SEjeff is very, very deeply involved in mentioning that field offhand, in a clumsy way, once.


bingo~


TCP for a twitch game? Ugh.

Cool that you're getting a ton of throughput, but you don't use UDP for throughput; you use it so that one dropped packet doesn't back up all of your time-sensitive data. You want to drop packets that are out of order, since you're using dead reckoning to keep the game state pseudo-in-sync.

Seriously, there's a reason everyone has used UDP (or IPX!) since the days of Doom.
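Concretely, dead reckoning just means each client keeps extrapolating remote entities from the last state it received and throws away anything stale. A minimal sketch (the entity type, sequence handling and smoothing factor are assumptions for illustration):

    // Minimal sketch of dead reckoning over an unreliable transport: extrapolate
    // from the last known position/velocity every frame, apply only updates that
    // are newer than what we've already seen, and blend corrections in smoothly.
    using System.Numerics;

    class RemoteEntity
    {
        Vector3 _position;      // last authoritative position (extrapolated forward)
        Vector3 _velocity;      // last authoritative velocity
        uint _lastSequence;     // newest update applied so far

        public Vector3 Rendered { get; private set; }   // what we actually draw

        // Called every frame, even when no packet arrived: keep the entity moving.
        public void Extrapolate(float dt)
        {
            _position += _velocity * dt;
            // Pull the rendered position toward the extrapolated one so a late
            // correction nudges the entity rather than snapping it.
            Rendered = Vector3.Lerp(Rendered, _position, 0.2f);
        }

        // Called when an update arrives. Out-of-order or duplicate packets are
        // simply dropped - the property TCP's in-order delivery takes away.
        public void OnUpdate(uint sequence, Vector3 position, Vector3 velocity)
        {
            if (sequence <= _lastSequence)
                return;
            _lastSequence = sequence;
            _position = position;
            _velocity = velocity;
        }
    }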


The whole article is about doing a test to see if they can get away with using TCP instead of UDP. They end up pretty pleased with the results. Though, they don't spend nearly as much time talking about latency as they do about throughput.

> For our high throughput needs, prevailing wisdom would suggest you need to write your server in C++, use UDP rather than TCP and run on a very high spec bare metal box running linux. We are already running the client in javascript; so if we ran over TCP (websockets), wrote the server code in managed C#, ran on Windows on a VM in the cloud – is this just madness?


It's a constraint of the browser technology stack; not only do we have to use WebSockets for real-time (TCP/IP), but we also have to go over SSL/TLS, as intermediary hardware/software (routers, proxies, etc.) still routinely breaks WebSocket connections otherwise.

(Disclaimer: Author of article)


I really hope your game designers know how to mask latency. If you're going over TCP/IP, then expect to need a game model that can sustain at least ~750ms of latency.

I totally understand the browser constraint part but you're going to hit some serious issues once you start having real-world clients that have >3% packet drop.

It gets even worse when you start adding different discrete layers on a single TCP connection (chat, larger game state not related to the twitch parts, etc.). Each one of those channels has the chance to drop a single packet, and then your whole world comes to a stop until you can round-trip that single packet that's unrelated to the realtime info you care about.

FWIW, most stacks end up implementing channels over UDP with unreliable+unordered, reliable+unordered and reliable+ordered modes for the different latency needs of each specific channel.
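Something like this, to sketch what those per-channel modes look like (the enum and channel IDs are illustrative, not any specific library's API):

    // Minimal sketch of delivery modes multiplexed over one UDP socket. A dropped
    // chat packet only stalls the chat channel's own resend logic; the movement
    // channel keeps flowing, unlike head-of-line blocking on a shared TCP stream.
    enum Delivery
    {
        UnreliableUnordered,   // position updates: newest wins, drops are fine
        ReliableUnordered,     // chat: must arrive, order doesn't matter
        ReliableOrdered        // game-state deltas: must arrive, and in order
    }

    readonly struct Channel
    {
        public readonly byte Id;
        public readonly Delivery Mode;
        public Channel(byte id, Delivery mode) { Id = id; Mode = mode; }
    }

    static class Channels
    {
        public static readonly Channel Movement = new Channel(0, Delivery.UnreliableUnordered);
        public static readonly Channel Chat     = new Channel(1, Delivery.ReliableUnordered);
        public static readonly Channel State    = new Channel(2, Delivery.ReliableOrdered);
    }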

I've done a fair bit of this stuff in a production environment and have to echo my original comment. TCP is great for lockstep but if you're doing dead-reckoning it will fall apart on anything other than a LAN.

Edit: I also don't mean this as a dig, just that I've seen many teams go through this and it's usually at a point where it's too late to make any changes.


Not saying it's easy, and we have discovered the reason most MMOs have long wind-up animations for actions...

The other layers (chat, non-space game state) are handled by discrete microservices on other connections to other hosts - to keep the space-flight data slim and its connection dedicated to that use.

You can always give it a try; our next public playtest is on Saturday 6 Feb, 8pm GMT/UK | 12pm Pacific | 3pm Eastern US, at http://www.ageofascent.com/

Just don't try it on a phone; give us a chance with the latency ;-)


I wish you the best of luck but I really think you've got a poor technology fit for what you're trying to do.

Games like SubSpace/Continuum were doing 300+ player twitch combat over UDP-based connections on 56k dialup back in '97. Any sort of packet loss (and the subsequent TCP exponential backoff) is going to wreak havoc on your latency.


"Windows/DirectX/UDP is too well-supported and mainstream, let's build a twitch MMO using Javascript/WebGL/WebSockets!"

I wonder what the intersection of "people that want to play MMOs" and "people that don't want to install anything" is. There are good game streaming technologies (e.g. Guild Wars) that minimize the patch time, so it feels like a bit of a straw-man.


Listening to this podcast atm; interesting stuff.

http://hanselminutes.com/509/inside-age-of-ascent-with-ben-a...


Holy cow, talk about a flashback. I played the crap out of that game for about a decade.


Add me to the HN/Subspace club: I played quite a bit of it starting from the end of beta (~'97) and also got pretty involved in the bot/modding/development scene that rose up after VIE left the picture. Some of the best gaming experiences I've ever had, and I learned a lot on the development side too.


Yup, many fond memories.

I was looking up the release date since my mind was a bit fuzzy on the year, and it turns out it was released on Steam as freeware last year: http://store.steampowered.com/app/352700/

It's taking all of my self-control to not install it and waste the rest of the day.


One thing has changed though: Backbone ISPs are throttling UDP traffic.


You can probably work around this issue by using hundreds of parallel TCP connections in round-robin fashion.

Assuming you are sending a packet per frame, this setup means that you are only sending a packet every few seconds per TCP connection, which means retransmissions should have already happened by the time you send another packet.
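A minimal sketch of that idea, purely for illustration (host, port and pool size are made up; at 60 packets/s spread over 300 connections, each connection carries a packet only every ~5 seconds):

    // Minimal sketch of spreading per-frame packets across a pool of TCP
    // connections round-robin, so each individual connection only carries a
    // packet every few seconds and has time to recover from any retransmission.
    using System.Net.Sockets;

    class RoundRobinSender
    {
        readonly TcpClient[] _pool;
        int _next;

        public RoundRobinSender(string host, int port, int poolSize = 300)
        {
            _pool = new TcpClient[poolSize];
            for (int i = 0; i < poolSize; i++)
            {
                _pool[i] = new TcpClient(host, port);
                _pool[i].NoDelay = true;            // don't let Nagle batch tiny frames
            }
        }

        // Each call uses the next connection in the pool, so a retransmission on
        // one connection never delays the packets sent on the others.
        public void Send(byte[] payload)
        {
            var client = _pool[_next];
            _next = (_next + 1) % _pool.Length;
            client.GetStream().Write(payload, 0, payload.Length);
        }
    }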


WebRTC can give you UDP semantics.


The article has small bits of useful info, but the article and site reek of submarine (Microsoft Azure) advertising to me.

Their homepage has a testimonial/review blurb of the game from a Microsoft "white paper" as the very first thing.

Below that there are 2 videos, one of which is an "Interview Rob Fraser - Microsoft".

And of course, the article itself focuses on Azure, with another 12 mentions of Azure and how well it runs Windows, .NET and C# (I'd hope so).

Also, the 7 million requests/second aren't real-world numbers, just a software benchmark.

Oh, and the benchmark was run on an instance that has 16 cores, 3TB of SSD storage, and 224 GB of RAM. That's more like a dedicated server than a single cloud server instance (and way more expensive than dedicated, at $3,061 per month per instance).


Microsoft may have given them free or heavily discounted access in exchange for being able to use them to advertise. That was apparently the case with Titanfall.



If possible, they should try and fix the C#/Windows TechEmpower implementations, which are nowhere near their results.


(Author of article) Working on it - I'm currently the #1 contributor to ASP.NET Core's new web server, and it is doing about 2.5M rps on equivalent TechEmpower hardware:

https://github.com/aspnet/KestrelHttpServer/graphs/contribut...


It looks like they are freezing TechEmpower's next round in a couple of weeks. And congrats on the awesome work!


Very interesting post.

I love that they had the resources and time to explore using safer languages for their use case.


So wait, was this all just serving Hello World?


That's basically an echo server, which is a standard networking test. Leaving out the overhead of actually processing the payload means you're able to see how well the networking stack (and the parts of your program which interface with and manage it) can perform. Getting a clean number from a benchmark like this also makes it easier to estimate how much performance overhead is induced by the other parts of your program.
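For reference, the kind of server such a benchmark hammers is roughly this - a minimal sketch, with an arbitrary port and buffer size, not the article's actual code:

    // Minimal sketch of an echo benchmark server: accept connections and write
    // back whatever arrives, with no application logic, so the measured number
    // reflects the networking stack and I/O loop rather than payload processing.
    using System.Net;
    using System.Net.Sockets;
    using System.Threading.Tasks;

    class EchoBenchmarkServer
    {
        static async Task Main()
        {
            var listener = new TcpListener(IPAddress.Any, 5000);
            listener.Start();
            while (true)
            {
                var client = await listener.AcceptTcpClientAsync();
                _ = Task.Run(() => HandleAsync(client));   // one lightweight task per connection
            }
        }

        static async Task HandleAsync(TcpClient client)
        {
            using (client)
            {
                var stream = client.GetStream();
                var buffer = new byte[4096];
                int read;
                // Deliberately no parsing or business logic: just echo the bytes.
                while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                    await stream.WriteAsync(buffer, 0, read);
            }
        }
    }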


I think it is just a fake number. Everything has running code, and what if the VM you spin up is in a different part of the data center, or next to another heavy network app? To me this seems like a waste of time to look at, because depending on the logic it won't matter if the server can do twice as many Hello World requests when it has to wait 30ms for business logic anyway. I want to see a real running system, then change out the web server as well as the hosting VM and see who comes out ahead with real systems. This just shows you should use this web server for making a CDN, nothing more.



