5M Bid request/s, 2ms max response time – The Road to Damascus (github.com/eloraiby)
157 points by sanjayts on May 24, 2019 | 67 comments



Interested to know what Rust was missing. I built an ad exchange last year and it has been great. I have been using nightly builds, mostly for access to async/await, and it has been very fast and stable.

I have had to submit a few pull requests to various projects along the way, but didn't find the ecosystem prohibitively lacking.


Would you mind sharing what libraries you are using?


actix, cdrs, rusoto, cadence, diesel, futures/tokio, serde, chashmap, slog, postgres, criterion, r2d2, chrono

That isn't all of them but those are the main ones.


Thanks, that helps!


TechEmpower's Plaintext scenario is currently limited at 7M RPS due to network limits, though it uses a 10Gb NIC. Knowing that the Plaintext scenario is a very simple HTTP request (standard headers) that returns "Hello World!", how close to network saturation are you with 5M in this case with only "2 Gigabit Ethernet cards"?
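A rough back-of-envelope calculation makes the question concrete. The sketch below assumes "2 Gigabit Ethernet cards" means two 1 Gbit/s NICs (2 Gbit/s aggregate in each direction) and ignores Ethernet/IP/TCP framing overhead. Both are assumptions, not facts from the post, and the function name is made up for illustration.

```python
def bytes_per_request(link_gbit_per_s: float, requests_per_s: float) -> float:
    """Average bytes available per request, in one direction, at full line rate."""
    bytes_per_s = link_gbit_per_s * 1e9 / 8  # bits/s -> bytes/s
    return bytes_per_s / requests_per_s


# Assumed: two 1 Gbit/s NICs aggregated (not confirmed by the article).
bid_budget = bytes_per_request(2, 5_000_000)

# TechEmpower Plaintext figures as quoted in the comment above.
plaintext_budget = bytes_per_request(10, 7_000_000)
```

Under those assumptions the budget is 50 bytes per request per direction, against roughly 179 bytes for Plaintext on a 10 Gbit/s link. If the NICs are faster than assumed, the budget scales linearly.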


Did you consider Vert.X? It's built on Netty and has its own Linux epoll driver, async, and fiber support. It's impossible to know if it would be faster, but it's likely comparable and way less work than rolling your own.

In techempower benchmarks it exceeds 2 million http requests/second and it's a full REST framework.

And if you use the fiber support through Quasar you can pretend most things are normal blocking code.

Have you tried it, or is this a case of NIH?


I spent a few cycles in media buying and later in sell-side ad tech. Please do say what you will about advertising and its effects on the web, but I will say this: it is a world of fascinating tech. As a buyer I experienced janky pacing all of the time across various platforms, because this is a HARD problem. We had to manually adjust campaigns on a daily basis to ensure pacing worked properly. It was common to stop a campaign and overspend by hundreds of dollars while all of the caching spun down.

I'm fascinated to see they are running that all on a single node. It's a massive amount of state aggregated from billions of events that needs to be served at extremely low latency, but couldn't it be partitioned somehow??? Google Fi/Spanner and BigTable have certainly been developed to support these issues. I've been trying to dig up what infrastructure powers Google AdX, but I haven't found anything. AdWords seems to be tied to Spanner, but AdX is/was an entirely different beast. In any case I'm quite certain that it isn't running pacing on a single, gigantic node.


As an anecdotal data point, about two years ago I configured a test campaign on Doubleclick Bid Manager (now Google DV360) that I needed some quick exposure on. I set a budget cap of $100 just for safety and didn't do any targeting, so I was effectively bidding on half the world's ad inventory. What I didn't check or notice was that pacing wasn't set to even, but to Flight ASAP.

Suffice to say, I spent $730 within _seconds_, so fast that Google's systems couldn't even switch off quickly enough to prevent a 7.3x overspend, and the only thing that saved stupid me from a five-digit spend was probably choosing an unusual ad size.

Fascinating stuff indeed :)
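To illustrate why a budget cap alone does not stop this, here is a toy sketch of the two pacing modes and of overspend caused by delayed spend reporting. It is not how DV360 or any real bidder works; the function names, the fixed cost per win, and the lag of 63 wins are hypothetical values picked only to mirror the numbers in the anecdote.

```python
def spend_allowance(budget: float, elapsed_fraction: float, pacing: str) -> float:
    """Cumulative spend a pacer would permit at a given point in the flight."""
    if not 0.0 <= elapsed_fraction <= 1.0:
        raise ValueError("elapsed_fraction must be in [0, 1]")
    if pacing == "even":
        # Even pacing releases the budget in proportion to elapsed flight time.
        return budget * elapsed_fraction
    if pacing == "asap":
        # ASAP pacing makes the whole budget available immediately.
        return budget
    raise ValueError(f"unknown pacing mode: {pacing}")


def simulate_spend(budget: float, win_costs: list, report_lag: int) -> float:
    """Keep winning auctions until the *reported* spend reaches the budget.

    Reported spend trails actual spend by `report_lag` wins, which stands in
    for caches and counters that have not caught up yet.
    """
    actual = 0.0
    for i, cost in enumerate(win_costs):
        reported = sum(win_costs[: max(0, i - report_lag)])
        if reported >= budget:
            break
        actual += cost
    return actual
```

With a $100 budget, $10 per win, and no lag, `simulate_spend` stops at exactly $100. With a lag of 63 wins it stops at $730, because 63 extra wins are already in flight by the time the cap becomes visible.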


> It's a massive amount of state aggregated from billions of events that needs to be served at extremely low latency, but couldn't it be partitioned somehow???

The bidder/pacer state is not necessarily massive, and certainly it does not consist of all the gazillions of past events. Depending on the strategy/bidding model, it can range from a few MB to several GBs, something that can fit in a beefy node.

> Google Fi/Spanner and BigTable have certainly been developed to support these issues.

I doubt any external store can be used with such tight latency constraints (2-10ms) and such high throughput (millions of RPS). Perhaps Aerospike, but even that is a stretch to put in the hot path. At this scale you're pretty much limited to holding the state in memory and refreshing it asynchronously every few minutes or hours.

Source: I also work in ad tech.
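The "state in memory, refreshed asynchronously" pattern described above can be sketched as a snapshot that a background thread rebuilds and swaps in. This is a minimal illustration, not anyone's production design: `PacerState`, `loader`, and the campaign-id keys are hypothetical, and it relies on attribute assignment being atomic in CPython.

```python
import threading


class PacerState:
    """Read-mostly bidding state served from memory and refreshed off the hot path."""

    def __init__(self, loader):
        self._loader = loader      # hypothetical callable returning a dict snapshot
        self._snapshot = loader()  # initial load before serving any traffic
        self._stop = threading.Event()
        self._thread = None

    def lookup(self, campaign_id, default=None):
        # Hot path: a plain dict read on the current snapshot, no I/O and no lock.
        return self._snapshot.get(campaign_id, default)

    def refresh(self):
        # Build the new snapshot completely, then swap the reference in one assignment.
        self._snapshot = self._loader()

    def start(self, interval_s: float):
        def run():
            # Event.wait returns False on timeout and True once stop() is called.
            while not self._stop.wait(interval_s):
                self.refresh()

        self._thread = threading.Thread(target=run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Readers never wait on the external store: a lookup sees either the old snapshot or the new one, and the slow load happens entirely on the refresher thread. The trade-off is staleness of up to one refresh interval.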


> Google Fi/Spanner

For anyone else confused it's probably Google F1 and Spanner.


Glad to see this kind of stuff written with a strongly typed language instead of Python.


Why? They wasted months evaluating, with an obsession for statically typed languages that have so far not produced anything more quickly or markedly better than what others are producing with less pedantic languages.


Static typing should be viewed as a tool for the long-term health of a codebase rather than as something that competes in the short term with Python and the like. Modern statically typed languages can compete, but static typing isn't always obviously useful until it is.


Is this the complete note? You didn't mention what you ended up using.

Is it Golang or Pony or F$? The CoreFX mention at the end confused me more.


I work with Wael. Development is still ongoing. One implementation uses Golang, the other uses F# with a library that wraps libuv for faster network performance. Pony was used to write the stress-testing client for both implementations.


I mentioned this elsewhere in the thread, but since you'll see the reply here: look into Vert.X if you haven't already.

It already does most of what you want and has support for native epoll transport.

I'm not sure what led Vert.X to be discarded, maybe not a Java shop? But we've used it extremely successfully for high performance REST and I know of several high profile tech companies that swear by it.

There's nothing I know of that compares with Vert.X in performance, stability, and popular adoption


Thanks for the recommendation! However, there are a few reasons why Vert.X wasn't considered, the biggest ones being that we're not a Java shop and the service in this blog post isn't HTTP/REST.

While the bandwidth benchmarks are fun to see and write blog posts about, at this point we care a lot more about keeping tail latency at or below 2ms than about getting more bandwidth.
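For readers unfamiliar with the term, tail latency is usually checked with a high percentile (or the maximum) of measured response times rather than the average. The sketch below uses the nearest-rank percentile definition; the function names and the default of checking the maximum against a 2 ms budget are illustrative assumptions, not the authors' measurement method.

```python
import math


def percentile(samples, p: float):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def meets_tail_budget(latencies_ms, budget_ms: float = 2.0, p: float = 100.0) -> bool:
    """True if the p-th percentile latency is within the budget (p=100 is the max)."""
    return percentile(latencies_ms, p) <= budget_ms
```

A service with 99 responses at 1 ms and one at 5 ms has a mean of about 1.04 ms and passes a p99 check, yet fails a 2 ms maximum, which is why the choice of percentile matters more than the headline throughput.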


Ah yes. The GC latency could be a problem. Java may still be viable with the new ZGC garbage collector. And Vert.X uses Netty underneath which is mostly protocol agnostic.

Still, those issues and not being a Java shop make Vert.X/Netty likely a bad fit.

Thanks for replying with a well thought out response!


Have you considered Nim? You can achieve some really high performance with it[1]. Since you've considered Rust, Go, C and even Pony, Nim should really be on your list.

1 - See httpbeast in the latest round of the Techempower benchmarks: https://www.techempower.com/benchmarks/#section=data-r17&hw=...


Does Nim's compiler still crash all the time?

Also, why is every immature native language considered, but the two speed demon languages without garbage collection - ISPC and C++ - are nowhere to be found?


Was pony considered for the implementation itself? Other than the immature ecosystem it seems like a perfect fit here. Awesome write up and sounds like a fun job.


I'm really glad I came here and found this comment. It squares so many circles for me. Thanks rkallos.


Thanks for helping me answer all my questions about Pony! :)


You're welcome.


Do you use libuv on the Go side as well?


You don't want to use a C library in Go, for performance reasons.


Nope. The Go implementation uses net.Conn from the standard library.


> "I didn't want to rewrite everything from scratch, and definitely, I didn't want to handle all edge cases for epoll. My choice was to use libuv. The architecture I opt for: use 16 cores out of 40 for networking, having 16 'uv_loop' each running on its own thread. Callbacks will be passed from F# to each 'uv_loop' instance. The event loop will call them after parsing the bid request in C11."

Looks like libuv directly in C11? (not F# as before edit).
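The quoted architecture (one event loop per thread, with callbacks handed to each loop) can be sketched by analogy in Python, with asyncio loops standing in for `uv_loop` instances. This is not the authors' F#/C11 code: `LoopThread`, `dispatch`, and `handle_bid` are hypothetical names, and the modulo routing is an assumption made for the example.

```python
import asyncio
import threading


class LoopThread:
    """One event loop running on its own thread, loosely like one uv_loop per thread."""

    def __init__(self, index: int):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(
            target=self._run, name=f"loop-{index}", daemon=True
        )
        self.thread.start()

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, callback, *args):
        # Run the callback on this loop's thread; returns a concurrent.futures.Future.
        async def call():
            return callback(*args)

        return asyncio.run_coroutine_threadsafe(call(), self.loop)

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


def dispatch(loops, conn_id: int, callback, *args):
    """Route work for a connection to a fixed loop (simple modulo, an assumption)."""
    return loops[conn_id % len(loops)].submit(callback, *args)


def handle_bid(request_id: int):
    # Placeholder for the real work (parsing the bid request, invoking the bidder).
    return request_id, threading.current_thread().name
```

Pinning each connection to one loop keeps its callbacks on a single thread, so per-connection state needs no locking. In the post the loop count is 16 out of 40 cores; the sketch leaves that as the length of the `loops` list.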


> The solution was to Marshal calls from F# to libuv and achieve 5 Millions (at least) bid requests/s on 16 threads (this solution scales with cores/NICs).


The language is called F# (pronounced F-sharp), not F$.


F$???? Is that a typo?


Could be a play on "M$"? A typo is probably more likely since the keys are next to each other.


Curious if there are numbers for other languages in high performance applications.

I am learning Clojure, so I would like to know if anyone knows of the highest-performing applications written in it.


> (scala? That's another story for another day)

This makes me curious - was it the language or the runtime characteristics?


It’s kind of sad that all this engineering effort was spent to essentially make the internet a worse place for everyone and waste users’ time and attention.

Imagine if a crime syndicate would brag about their efforts to make their worldwide criminal activities more efficient.


I can totally see where you're coming from. But major engineering achievements require the efforts of many skilled people, who often like to be paid really well for their work. And the way the world works today is that a lot of big money is in fields that are of questionable value to society: advertising, finance, military, etc. And even in fields that seem at first glance to be socially valuable, like health care, most of the money comes not from healing people but from playing the game of "rip off public or private coverage providers".

Therefore I think the best we can hope for is that engineering breakthroughs achieved in profit driven fields will gradually leak into other fields where they can actually be used to improve people's lives.


This is why federal research and development agencies should pay higher salaries to attract said engineers.

Breakthroughs achieved would be funded by taxpayers and customers of these agencies, and the patents for technology produced would be more likely to enter the public domain.


It is very hard to get a large group of people to work efficiently towards some goal; very few organizations manage to do that. As far as I know such success stories are even more rare in the public sector than in the private sector. This means diverting resources away from private enterprise towards government agencies is not guaranteed to improve the situation.


Yeah I'd gladly switch to government work hours, retirement benefits, etc if they paid anywhere near what I make in finance.


... So when it's Google or FB blogging about technology originally developed to serve ads, it's hype and cool... But when the authors are more honest about the motivation behind developing a certain piece of tech, it's "kind of sad"?


I’ve never said anything Facebook or Google have done is ever cool, however I do understand your point of view as my opinion is not representative of what the majority of HN feels. But I’d like to set the record straight, I never think anything FB or Google is doing is cool.


Your opinion is exactly what the majority of HN feels.

As I read this article I was thinking "I bet the first comment on the article will just be complaining about the effort going into ads", and I wasn't wrong.

Honestly, it's becoming tiresome. Low latency computing is a great area to talk about, and instead it's derailed by this essentially off-topic discussion.


Do you legitimately think the world would be a better place if gmail, youtube, flickr, reddit, EVERY search engine, and basically every web content site disappeared?

Because that's what happens if you don't have web advertising. Free things disappear without revenue.

Or maybe you'd prefer to go back to the days of randomly-targeted or "PUNCH THE MONKEY" ads. Because THAT'S what happens without ad auctions and targeting.

The reality is: advertisers and ad-supported sites WANT to show you a relevant ad that you're likely to click (modulo obvious bad actors). That's how they get paid. Anything else is, by definition, "[wasting] users' time and attention."


A decade ago Amazon ads were for books and other products related to the page I was on. Today I get Amazon ads for whatever I had open on Amazon. I far prefer the old content-targeted ads to modern user-targeted ads.


No idea what the GP wants, but I want to go forward to a world where Westinghouse runs tasteful washing machine ads with no javascript on articles about buying a new home, instead of being able to microtarget me based on my income range and my regular visits to the laundromat to run washing machine ads next to articles about Syria.


Ultimately I ended up being convinced by Sam Harris' argument: that advertising convoluted our relationship with reality and we started expecting everything for free.

I run some services that are paid for by advertising and I see this myself: the second I start trying to charge money, I'm now the one service that people see as costing them money when all my competitors are "free." And the associated entitlement is toxic.

To answer your first question, imagine a world where advertising doesn't exist and we simply pay for those things.

I was reluctant to admit this since ads paid my way through university. But as ads make less and less money, it's harder and harder to ignore the truth. And while ads are viable, they impede the necessary cultural transformation we'll need to form a healthier relationship with the goods and services we want.


I decry this a lot. Ad blocker use is rampant. People don't want to pay for subscriptions. Etc.

It actually boils down to an expectation of slave labor for a lot of things, which people don't want to hear.


> Do you legitimately think the world would be a better place if gmail, youtube, flickr, reddit, EVERY search engine, and basically every web content site disappeared?

They might disappear, but not without replacements. Hosted email, videos, photos, discussion boards, and search are all services that would exist with or without a business model built on ad money. I don't care if they are free or paid. But the fact is that the privacy-depraved model wins because it's provided for free to the eyeballs.

And yes, I'd rather have punch-the-monkey ads. I'd even go so far as to say that I'd prefer my search results and most content-provider's offerings not ordered and filtered based on my previous behaviors.


I miss punch the monkey ads. Life seemed more carefree.

The web doesn't work without free services. For everyone who has extra money they would gladly use to pay for every site they visit, the majority won't, and you will end up buying overpriced bundles from your phone company, and your privacy will be worse. Free allows users a chance not to be tracked. Once you start involving money you can easily be traced.


Everything you describe could exist as a paid service, and I suspect would be much higher quality that way.


A paid youtube? Not sure they would be worth it without the endless amounts of free content by free users.

Facebook? There are plenty of sites like Facebook without users. Charge for an account and you reduce the user base; the platform loses value without the free users.


> A paid youtube? Not sure they would be worth it without the endless amounts of free content by free users.

Even if it were possible to convert these to paid services now, there's no way they could have started out paid with no existing content/user base.


A paid Facebook means it would be designed to actually serve you (instead of wasting your time) and people would use it to connect with friends instead of liking endless shitposts.


All your friends or anyone you want to contact would need to be a paying customer.

If it is just friends, why waste money on Facebook just to talk? Most would rather spend that money on a more focused community/hobby like gaming and connect in-game.


I’d much rather go back to randomly targeted “punch the monkey” ads. Far better than the current crop of ads which are usually one of fraudulent, malware, or borderline pornographic.

Was I supposed to respond “oh no, current ads are so much better”?


> current crop of ads which are usually one of fraudulent, malware, or borderline pornographic.

That doesn't match my experience at all, but maybe you can chalk it up to targeting?


My own experience is that, from running adblock all the time, using aggressive blocking measures, and disabling targeting/personalisation, I get the absolute worst ads/ad networks when I do get them (for example when Firefox broke all addons a bit ago). Borderline malware, redirecting popup drive-by Flash Player installers. It just feels like without enough of an ad profile built up you get whatever trash is left, with no major brands or companies.


I'd love to read what a crime syndicate does to improve their activities. Doesn't mean I agree with them... but no doubt it's really interesting and I might learn something from it.


Drug smuggling submarines. There was the court case of El Chapo and his IT guy testified about how he requested everyone's phones to be bugged. Also a secure comm network built for his organization, that then ended up bringing him down, because the IT guy folded (the FBI made a deal with him), etc.

Also you could learn a lot about tunnel building, and tunnel detection.


The crime syndicate scenario would be actually better because unlike advertisers, they actually provide (illegal) products such as drugs that have demand. I can give you examples of people happily buying drugs, but I’m not sure I could ever find someone who’d be willing to look at ads/spam voluntarily, let alone pay to do so.


Ads aren’t crime and ads aren’t universally net negative.


Depends. Google's real-time bidding is being investigated under GDPR.

https://www.cnbc.com/2019/05/22/irish-data-privacy-watchdog-...


The Best Minds of My Generation Are Thinking About How To Make People Click Ads (or serve them efficiently)


Take a look at finance / trading. Things are even worse over there, but they don’t blog about it.


Used to be a semi-professional poker player, which seems to have a high crossover with finance/trading types for obvious reasons. I enjoyed the debates, but sometimes it was a bit overwhelming how everyone had convinced themselves that they were magically providing a ton of value to the world and that's why they were filthy rich. Also the massive intelligence that seemed to only work when applied in one domain. Didn't mind taking their money, but never pretended I was providing society a useful function while doing so. :)


At least finance/trading doesn’t try to stalk me and waste my time day to day, unlike advertisers.


I kind of like that some of the money from this industry is resulting in learnings and improvements to open source.

Found this article great; there are not many places to see 5M req/s, let alone on a single node.

I'm really interested in hearing more about those databases!



