When I launched my former bitcoin casino in 2011 (it's gone, but it was a casino where all games, even roulette tables, were multiplayer, on a platform built from scratch starting in '08), I handled all web requests through a server in Costa Rica that cost about $6/mo, where I also had a shell corporation for $250/year. Once the front end -- the bullet containing the entire casino code, about 250kb -- loaded from Costa Rica, and once a user logged in, they were socketed to a server in the Isle of Man that handled the gaming action. Graphics and sounds were still sent from the Costa Rica server.
I didn't have a gaming license in the IoM, though - that was around $400k to acquire legally. So I found a former IoM MP who was a lawyer, who wrote a letter to the IoM gov't stating that we didn't perform random calculations on their server, and thus weren't a gambling site under IoM law. Technically that meant that no dice roll or card shuffle leading to a gambling win or loss took place on that server. So the IoM server handled the socketed user interactions, chat, hand rotation and tournament stuff, as well as the bitcoin daemon and deposits/withdrawals.
But to avoid casino licensing, I then set up a VPS in Switzerland that did only one thing: return random numbers or shuffled decks of cards, with its own RNG. It was a quick backend cURL call that would return a fresh deck or a dice roll for any given random interaction on the casino servers. The IoM server would call the Swiss server every time a hand was dealt or a wheel was spun; the user was still looking at a site served from a cheap web host in Costa Rica. And thus... yeah, I guess I handled millions of requests a day over a $6 webserver, if you want to count it that way.
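For readers curious what a single-purpose randomness box like the Swiss one might look like, here is a minimal sketch in Python. It is not the original code (which isn't shown anywhere in this thread); the function names, the 52-card encoding, and the 37-pocket wheel are all illustrative assumptions, and the HTTP layer that the cURL call would hit is left out.

```python
import secrets

# SystemRandom draws from the operating system's CSPRNG rather than
# the default Mersenne Twister generator.
_rng = secrets.SystemRandom()

RANKS = "23456789TJQKA"
SUITS = "cdhs"


def fresh_deck():
    """Return a newly shuffled 52-card deck, e.g. ['Ah', '7c', ...]."""
    deck = [rank + suit for suit in SUITS for rank in RANKS]
    _rng.shuffle(deck)  # in-place Fisher-Yates shuffle
    return deck


def roll_die(sides=6):
    """Return one roll of a fair die with the given number of sides."""
    return _rng.randint(1, sides)


def spin_wheel(pockets=37):
    """Return a pocket number in 0..pockets-1 (37 assumes a single-zero wheel)."""
    return _rng.randrange(pockets)
```

The game server would ask for one of these values per hand, roll or spin and do everything else itself (bets, payouts, chat), which matches the split described above.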
Focusing on a small but important detail that some have already mentioned but with a more aggressive tone... was your "loophole" system tested in an actual litigation at any point?
What I mean is that this:
> The IoM server would call the Swiss server every time a hand was dealt
might seem like a clever loophole around the laws in IoM, but in reality it sounds to me like the kind of technicality that wouldn't really survive the reasoning of a human judge. In their duty of interpreting the law and its intended spirit, a judge would probably consider this an invalid trick and conclude that the RNG of the system still resided in IoM, even if technically it didn't.
But of course, none of this matters if the casino never had any legal battle to fight where this idea could be tested in court, which is the equivalent of not being "caught".
It was never legally tested. It was what I felt I had to do such that the randomness didn't take place on the island. And no randomness ever did. I was in touch with a lot of officers of the large casinos operating out of there at the time, who were curious but skeptical about Bitcoin. I think by the time they realized it was a potentially valuable thing, I had already shut down operations, because I wasn't willing to chase the market into legally gray areas.
Yeah, that's like setting up a casino in one location with a permanent phone line open to Switzerland to ask where the ball landed on the roulette wheels. Doesn't seem like it would hold up under scrutiny.
That's exactly what it was... and it really depends on how a locale defines gambling. In Costa Rica, for instance, you can run the game of chance as long as the money isn't landed onshore, because you're just generating random numbers. IoM was slightly different in that they didn't mind you landing cash, they just wouldn't allow you to generate the numbers. So it seemed natural to co-locate, and then Switzerland was a better backstop than either.
In any case, I must admit that the trick, which we agree (probably) wouldn't hold water in court, might actually have helped (we'll never know) to keep that casino off some law enforcement watch list... especially if the officers were too overburdened with more important issues, too lazy, or simply not looking into that kind of activity at the time.
You’d be amazed at how cheap it is. You can usually get an MP to do your bidding or say your piece in Parliament for a donation of £10k or less. I’ve never heard of anyone paying more.
It would be a contempt of Parliament to either request or agree to such an arrangement [1]. If you have definitive evidence of this having happened in the recent past, you really should pass it on to the Parliamentary Commissioner for Standards.
This is why it’s a donation to the party - not to them or their office - although they can want a campaign contribution, which again doesn’t count as going to them.
I’ve had an MP literally solicit this from me - he emailed a bunch of local businesses basically stating his price list. He’s no longer an MP, or in politics - but you see this stuff everywhere, all the time. Any member’s bill you see has almost certainly been sponsored.
MPs typically move in packs ("areas" or "currents", that may or may not be officially organized around a thinktank or club), so buying the right MP you can actually move several of them. And then your representative goes and trades favors with the colleagues in government - "I'll vote for your controversial legislation if you pull that other lever over there". Representative democracy is a big sausage factory: in go blood and guts, out comes "edible" rules that society will live on.
I don’t think it’d hold up in court. This is the same as saying: “I bet you ten bucks when I mail my friend in Switzerland, he will mail back a ten.” The random part is being done in another country, but the betting is still being done wherever you and your friend are.
Isn't it like betting on games happening in a different country? I'm pretty sure that is still gambling, even if entropy is generated in a different place.
Maybe the laws are written that way in IoM just so you don’t have to be registered to bet on Kentucky horse races or sports or whatever? Could be that way on purpose. Interesting loop-hole.
Good question. The original plan was to take normal payment methods (Visa/MasterCard), but it became apparent after Bush passed the ban on online poker that Costa Rica was going to follow suit (or that Visa/MC would soon start holding up payments from CR casinos... in which case we might be stuck with debt we couldn't use to pay winnings). Setting up a CR bank acct as a shell requires you to hand over power of attorney to a CR citizen, and given how shady the entire corporate structure there was, along with the legal outfit we hired (who thought we could take $100 deposits as payments through their fake real estate portal), I evaluated other routes. These included landing funds in Cyprus and processing through Israeli banks at a 10% markup, and other shady-sounding things. I had begun to give up on my little side code project when Bitcoin showed up. The benefit of the Isle of Man was that all funds could be landed there - in a Bitcoin wallet, on a caged dedicated server - without triggering any other financial issues. The only trouble was randomness and gambling.
I spent 3 years writing the software in my spare time, and launched it on the $20k I had in savings... personally bankrolling the games. When I began the alpha phase, you could still get an "information processing" license to run a casino in Costa Rica that would allow you to host gambling and which the credit card companies tacitly accepted and would do business with. (Those gambling payments to shady jurisdictions are how Paypal got big, and how Elon Musk made his first nine figures). But by 2010/11, all payment processors had abandoned Costa Rica, and there was nowhere in the world offering a license I could come close to affording.
I looked for investment, put together thick books of plans every few months, built 24 games... No one would get behind it, and I couldn't just shelve it, so this was how I launched it without breaking any laws.
A few months in, one existing online casino network offered me $100k to just hand it over to them and then come work for them, but I considered the offer extremely insulting.
Thanks. I ended up shutting it down around 2013, for a few reasons. One was that large casino software had moved into Bitcoin space and I couldn't compete against companies that were licensed in a way that allowed them to advertise. I wasn't making enough money from it to be worth waking up all hours to handle bug reports and manage the community, while also worrying about my bank account and keeping an eye on Bitcoin fluctuations. And another reason was that the platform / "gaming OS" was built around opening resizable/dockable windows for multiple games inside a single browser tab, meant to run full screen, and written completely in Flash AS3. By 2013 it was apparent that the front end platform was going to have to be completely rewritten for mobile, either native or JS. The site actually had run on iphone and droid before the flash plugin was removed, although it wasn't optimized for touch. But canvas drawing tools to do the kinds of things it did were in their infancy; things like a variable speed embedded video with a 3d ball physics simulation overlayed for the roulette table would probably be hard to accomplish even now. The headaches associated with running tech support, mostly alone on a daily basis, while also holding down a regular job and caring for a sick partner, made the idea of porting half a million lines of code daunting to say the least. And at the end of the day, without an injection of capital to license and white label the software, it wasn't really worth it.
So I went down a rabbit hole of trying to license the software for in-room gaming on cruise ships and Vegas hotel casinos. But Bally and Caesars pretty much dominated that space... if you even want to get a new game certified by the Nevada GCB, you have to put $100k down, non-refundable, for them to review the software (per game), and then they might start a review in a year or two. Bally gets to cut the line. I also had trouble trying to patent my original games. And of course, the world was coming closer to a consensus that Bitcoin would have to be regulated. So one day, I refunded all my players their balances and turned off the lights.
Uh, excuse me, but this whole story demonstrates that I went to extreme lengths to never break any laws, no matter how large or small the jurisdiction. Had I been willing to bend laws, I'd surely be a lot richer today.
Come on, cURL'ing to a foreign server to get a random number and not just reading /dev/urandom is logically identical. It's a hack, just like calling into GPL'ed code over HTTP is a hack to avoid "linking" the GPL'ed code. It doesn't really suddenly turn a site from gambling site into a non-gambling site.
I mean, I have mad respect for the hustle with the former MP etc. I agree with what you say in that you did not actually break the law - because you found a loophole (made a loophole? hustled it? again, I'm impressed). You ran a gambling site from IoM though :-)
What's so funny is - I was in a peer group of early btc founders who were starting random Bitcoin hustles and didn't care at all about laws. No one at that time had even approached the IoM about whether BTC was currency - nor would they have, because trying to find legal haven was the furthest thing from their minds. And so not surprisingly, the IoM didn't have a ready answer when I asked them whether gambling with Bitcoin was actually gambling. But the letter of the law was that gambling occurs _in the jurisdiction where_ the random chance takes place. So it really was different from hitting a local RNG, legally.
It added a nice little feature too, which was that every spin and deck could be stored on a separate server that would show them all at the end of the day. This was a little before "proven randomness" took off in btc casinos, but I made the RNG reports available daily for analysis (without explaining the whole infrastructure, obviously).
[edit] I just want to say that yes, you're obviously right, and yeah, I ran a casino from the IoM... without anyone knowing if that was okay or not... and it was just a moment in my life, of which I'm proud, I guess. I was living illegally in a small apartment in Alhama de Granada after violating my EU visa. Hah. It was a great, great piece of software and I don't know if I'll ever write anything that good again. But it didn't really change my life or anything.
This was my main concern, and it was exactly what I needed a lawyer to sign off on before I set up a rig there. I was told that the gambling laws applied to where the chance took place, not where the money is distributed... after all, the whole thing with the IoM and the reason it's allowed to be a tax haven is that lots of people need to move money around without a lot of questions. But they defined gambling in this specific way and if only the money moved but the dice roll didn't take place on their shores, then it wasn't gambling under their jurisdiction. What you bring up was the conversation I had before locating there.
> @kapep don't know why I can't respond directly..
HN has a silly but effective piece of anti-flamewar UX, which is that it hides the reply link in certain cases (some function of thread depth + number of comments by you, I think). However, you can still reply by opening the comment in question (click on the timestamp, i.e. the "1 hour ago" link). Maybe you hit that.
Fwiw I think this is a great response and I'm happy you did not feel attacked by my comment because that was not my intent. Thanks for sharing & all the best!
> But the letter of the law was that gambling occurs _in the jurisdiction where_ the random chance takes place
"where the random chance takes place" could easily be interpreted as where the random number is _used_ and not where it was created. Creating random numbers is not "chance" per se (in this context). Using random numbers to, e.g., determine a winner would be the chance, in my opinion.
I used to be. I got a web design job in San Francisco when I was 18 out of high school but I burned out and quit when I was 21. Drove a taxi so I could write and play music. Did it for a couple years. It was a good education. I feel old. Taxis don't even exist anymore.
> Come on, cURL'ing to a foreign server to get a random number and not just reading /dev/urandom is logically identical. It's a hack, just like calling into GPL'ed code over HTTP is a hack to avoid "linking" the GPL'ed code. It doesn't really suddenly turn a site from gambling site into a non-gambling site.
Around here (and probably elsewhere) bars aren't allowed to make wine stronger by adding spirit.
So if you mix a drink from wine (or similar) and spirit in that order you might lose your license.
Put the spirit in the glass first and all is ok.
I guess at this point it is just a shibboleth that inspectors use to see if the bar has read the rules at all, kind of like the "no brown M&Ms" clause.
Point is though: rules matter, you can lose your license over it.
> (...) but this whole story demonstrates that I went to extreme lengths to never break any laws,
No, not really. It shows that you went to great lengths to find ways to exploit a loophole where, even though you are clearly breaking the spirit of the law, you argue that it doesn't break the letter of the law.
I get that you have a vested interest in keeping up the plausible deniability thing, but you know, and everyone knows, that you went to great lengths to put up a tech infrastructure which meets absolutely no requirement other than exploiting a loophole.
I mean, you explicitly expressed your personal concerns in this very discussion regarding what you personally chose to describe as testing "legally grey areas". Who do you expect to fool?
I just really wanted to launch my software. Which when I began coding it, seemed completely legal and possible in Costa Rica. As the laws started to change - and even before Bitcoin came on the scene - I looked for how to do it without running afoul of anything. So it's not like I set out with a plan to exploit all the legal loopholes in the world, I just adapted my code and split it apart as necessary. I never even meant to take Bitcoin, let alone make it the only currency in the casino. It was just the only option if I wanted to launch. I had very little money and had written a giant gaming platform. I wanted it to see the light of day.
> Which when I began coding it, seemed completely legal and possible in Costa Rica. As the laws started to change - and even before Bitcoin came on the scene - I looked for how to do it without running afoul of anything. So it's not like I set out with a plan to exploit all the legal loopholes in the world, I just adapted my code and split it apart as necessary.
You are clearly and unequivocally stating that you set out to exploit all the legal loopholes once your "grey area" was made black and white in Costa Rica.
Please, spare the thread from all that nonsense. You're not fooling anyone.
What you say I stated "unequivocally" is precisely the opposite of what I stated. I was quite clear - what I said was that I didn't want to pursue the business into a legal gray area. Therefore, I did what I had to do to keep it legal, including turning away 95% of the hits, retaining lawyers, and not violating any local laws. Moreover, it would have been perfectly legal to run the whole thing from Costa Rican servers at any point in time. I just didn't see a future in it, because they didn't offer full licensing, and the credit card companies pulled out while the software was in alpha. The IoM paper was intended as a step on the road to licensing.
I didn't exploit anything. I worked within the legal options that were available. In any case, I don't understand the accusation.
My only regret is that I didn't have the capital to buy a full license in the IoM or Malta outright. But the truth is, I wrote the whole thing from scratch and I was determined to launch it. You're free to your opinions, but you ought to avoid judging people's intentions while misreading their words.
Sure, you're not breaking any laws, just intentionally violating the spirit of the law for personal gain.
This isn't a court of law. I can understand trying to avoid language that makes it sound like you may have been in violation of the law to a judge or jury, but you literally described that what you did was intended to keep operating a service that had been banned by using an absurd technicality in the definition of the ban.
I'm honestly surprised it worked (though having a former MP of the tiny nation you were operating in as a lawyer might have helped) considering that your service still facilitated online gambling directly and was advertised as such, despite the randomness source being on a remote server rather than local.
In other words, using a non-local randomness source (like a remote server you cURL into or a webcam pointed at a bunch of lava lamps) is functionally indistinguishable from a local dice roll or other source of entropy. This "hack" is so flimsy it likely wouldn't hold up in court in a nation that is actually interested in pursuing such violations that has a population larger than a small city.
I suspect your anger derives from the fact that this would not be possible to get away with now, in any way, shape or form, only ten years later. And truly the world is way more locked down now than it was then, when people were like, "Bitcoin? What's that? You want to pay me to write a legal brief?" So, yeah. I feel sorry for kids ten years younger than me.
When I did it, the only thing I was really afraid of was getting arrested if/when I stepped back on American soil. There was redundancy so I could run the whole thing in Costa Rica if I had to cold shutdown the IoM servers. And the coin was in private wallets, mostly on my laptop. But I was very concerned about breaking any laws, anywhere. I was the only one to implement ID verification and fully block American players.
Call it a hack or whatever, they wanted my business and I needed their servers, and I split up my code so it would be legal according to their laws. Not too different from what a lot of companies do.
Hah! This made me laugh. OK, so not FOMO... (trust me, wasn't worth it except for the thrills)... why hostile? I was born in a Vegas family. My uncles all worked as blackjack dealers and pit bosses. When I was 7 they used to leave me in a corner of the casino for hours and tell me to stand there while my parents went and gambled. I taught myself to code there on a TRS-80 Model 100 in BASIC, and practically the first thing I wrote was a slot machine. My view is that adults want to go gamble and that's their decision. I never took a dime from anyone I saw with a gambling problem... I would ban them from my site if they seemed addicted. I like to gamble myself. I count cards. Like everyone on my site... because my decks were single shuffle. So don't be so judgmental. I didn't do it for the money. I did it because I love the games.
I'm not angry, I'm hostile. That you think the only possible position from which to take issue with what you did is FOMO speaks volumes.
I'm not arguing that you violated any laws. You made it very clear that you went to great lengths to avoid doing anything that could have resulted in consequences to yourself.
EDIT: Since HN's rate limiting won't let me reply for a few hours, I'll just address the replies inline:
I'm not jealous. I'm sure noduerme made a sizable chunk of money with the whole operation at the time but their profile says they're now working as a taxi driver and sold all their bitcoin before the peak. They probably have a lot of other interesting stories to tell and that's nice. But dismissing any hostility or criticism as jealousy is thought terminating and frankly below even HN's standards.
Based on their backstory in the replies, I can see where their attitude comes from, but they severely underestimate how big of a problem gambling addiction is and how much of the profit of the gambling industry relies on it.
It's nice if the casino their parents worked at turned away obvious addicts but the word "obvious" is doing a lot of work here and there are also clear business reasons you don't want obvious addicts in your establishment the same way bars will be happy to have repeat customers buying drinks for five hours every day but will turn them away if they get blackout drunk or unsightly. "Not doing it for the money" may give you a clean conscience but it doesn't change the consequences of your actions.
It's also important to point out that online gambling is by its nature functionally anonymous for the gambler (even if you record IDs for legal reasons). The online casino isn't going to turn away the addict until they can no longer pay or have to resort to fraud to keep up the habit. And even if the casino implements limits, the proliferation of online casinos makes it considerably easier to go hopping than if you have to physically drive somewhere.
Gambling addiction not only ruins the lives of the addict but also impacts their friends and family, not just financially. It's true that not every person who gambles is an addict but the line between an expensive hobby and a managed addiction is hard to draw until you undeniably cross it.
But if you need a comment on HN to explain to you why gambling and especially online gambling is bad, a comment on HN isn't going to be enough to convince you.
Ok, I respectfully understand where you're coming from, and I've struggled with gambling addiction myself. I personally do not think it's immoral to offer games of chance to people, as long as they understand the odds and you're not cheating them. And believe me, running a small online casino mostly by myself with my own bankroll was literally setting alarms all night waking me up when someone was killing the tables and potentially going to bankrupt me. I got to know my big players (most of whom went on to become Bitcoin millionaires, since the early adopters were the only ones gambling on Bitcoin casinos in 2011)... but beyond that, I really don't think offering gambling is immoral as long as everyone knows what they're signing up for. I've dealt cards for a living, too. I don't drive a taxi anymore. But I've seen all sides of life. I don't think you can judge so easily. Yeah, big corporations suck and they screw people into debt, and gambling addiction does ruin peoples' lives, but I knew my players, and I don't think what I did personally hurt anybody. They came together to enjoy games, and yeah it was for real money, but it was also a community and they were there for fun - they could have gambled for a lot more money on other sites. One player built a puzzle in his escape room in Amsterdam in honor of / based on a game I designed. Like a bar, this is something people do for enjoyment, and you don't need to stand out there with a sign saying we're all going to hell. But I do get it and I'm not a fan of the companies that take advantage of human weakness and shake the dimes out of people's pockets. I was just trying to have a good time and give other people a good time.
Total profit from 2 years running the site? About $50k. It was a hobby. I never quit my job. I also turned away 95% of the hits because they were coming from America.
[edit] I should add that I strongly advised other BTC site owners, especially casino owners, to follow certain guidelines, and watched one of them who I had told to be careful launch, make about $1M with one game on a crappy website, and get jailed within a year. That wasn't the trajectory I was interested in.
Also I want to add... I had a feature on the site from day one that would let players set their own deposit limits through any date they chose. Once set, the limits could not be raised or revoked through that date, and any coin they sent beyond the limit would automatically be sent back. This was prominently displayed on the website along with an entire section of problem gambling resources. Some users did use it. Others I would put into the system involuntarily. The average rake on my non-poker games was about 2.5%, and some of the puzzle games I designed had a theoretical >100% payout if you could, e.g., solve a randomly spun Rubik's cube consistently in under 60 seconds. (No one ever hit over 100% on that one over time, though. If I had ever come across anything like that I would have written a bot to solve it... and I was waiting for a player to do so, so I could make them a partner).
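The self-imposed deposit limit described above can be sketched roughly like this in Python. This is an illustration, not the site's code: the class and method names are hypothetical, and it assumes the limit is a cumulative cap on deposits up to the chosen date and that lowering the cap during the lock is allowed (the comment only says it could not be raised or revoked).

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class DepositLimit:
    cap: Decimal                      # max total deposits while the lock is active
    locked_through: date              # last day on which the cap is binding
    deposited: Decimal = Decimal("0")

    def change_cap(self, new_cap, today):
        """Refuse to raise the cap while locked; lowering is allowed (assumption)."""
        if today <= self.locked_through and new_cap > self.cap:
            raise PermissionError("limit cannot be raised before the lock date passes")
        self.cap = new_cap

    def accept(self, amount, today):
        """Return (credited, refunded) for an incoming deposit."""
        if today > self.locked_through:
            # Lock has expired: credit everything.
            self.deposited += amount
            return amount, Decimal("0")
        room = max(self.cap - self.deposited, Decimal("0"))
        credited = min(amount, room)
        self.deposited += credited
        # Anything beyond the remaining room is sent straight back.
        return credited, amount - credited
```

For example, with a cap of 1 BTC, a 0.7 BTC deposit is credited in full, and a following 0.5 BTC deposit is split into 0.3 credited and 0.2 returned.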
The same jealousy that's called "opinionated" in software development, but really means "Doing things differently than how I do them threatens me, because my sense of superiority is rooted in how I do things."
Corporations do not so dissimilar things with teams of accountants and lawyers. Via tax planning, international subsidiaries and all other sorts of loopholes.
I'm equally hostile to corporations doing that. I don't recall the HN comment threads about Google doing the kind of things you describe being full of replies congratulating their ingenuity.
Big corporation doing bad thing is bad doesn't mean much smaller corporation doing bad thing is okay, it means we should work on preventing that bad thing and if we seemingly can't we should reconsider the underlying systemic conditions that enable it.
I think the lawyer who convinced the gov't that the operation was legal being a former MP and this taking place in a nation of less than 100,000 people is a crucial element to the story.
Incidentally both gambling AND fintech (including crypto) are on the list of industries I refuse to do work with. So I guess BTC gambling would have been off the table for two reasons.
I respect both the legality and the morality of it.
First, he did comply with all applicable law. No laws were broken.
Second, he did not break the spirit of the law. The law clearly allows gambling from the Isle of Man.
Third, he did not conflate the law with morality. What is the morality of a 400,000 GBP 'licensing' fee? Laws around licensing are weird. Another poster mentioned that pouring wine then liquor into a glass is illegal, but liquor then wine is fine. Not much moral sense in that reg.
Thank you. And yes, it's incredible the morally bankrupt things you see under color of law if you try getting into that business. Side story, I was once invited to be a guest of the now deposed dictator of Ghana when I casually floated to his lawyer who I met in a casino in Prague the idea of making them the next offshore gambling capital of the world. I did a little research on the country and then respectfully declined. In my view, I never broke a law or did anything immoral (since I was extremely transparent about the odds of every game and I made sure to only take adults from countries where gaming online was legal... who wanted to play, and knew the rules). And not conflating morality with legality -- while respectfully considering both -- is just a precondition to being an individual in a complicated world.
There is no "The law", just a bunch of different jurisdictions with different laws. He didn't break Swiss law. Does Switzerland's law not matter for some reason?
People tend to severely underestimate how fast modern machines are and overestimate how much you need to spend on hardware.
Back in my last startup, I was doing a crypto market intelligence website that subscribed to full trade & order book feeds from the top 10 exchanges. It handled about 3K incoming messages/second (~260M per day), including all of the message parsing, order book updates, processing, streaming to websocket connections on any connected client, and archival to Postgres for historical processing. Total hardware required was 1 m4.large + 1 r5.large AWS instance, for a bit under $200/month, and the boxes would regularly run at about 50% CPU.
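As a quick sanity check on the figures above, the per-day number follows directly from the per-second rate. The variable names here are just for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

msgs_per_second = 3_000                 # "about 3K incoming messages/second"
msgs_per_day = msgs_per_second * SECONDS_PER_DAY

print(f"{msgs_per_day:,} messages/day")  # 259,200,000, i.e. roughly 260M
```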
I'm more than a little annoyed that so much data engineering is still done in Scala Spark or PySpark. Both suffer from pretty high memory overhead, which leads to suboptimal resource utilization. I've worked with a few different systems that compile their queries into C/C++ (which is transparent to the developer). Those tend to be significantly faster or can use fewer nodes to process.
I get that quick & dirty scripts for exploration don't need to be super optimized, and that throwing more hardware at the problem _can_ be cheaper than engineering time, but in my experience, the latter ends up costing my org tens of millions of dollars annually -- just write some code and allocate a ton of resources to make it work in a reasonable amount of time.
I'm hopeful that Ballista[1], for example, will see uptake and improve this.
I get a kick out of stuff like this - I’m mostly an exec these days, but I recently prototyped a small database system to feed a business process in SQLite on my laptop.
To my amusement, my little SQLite prototype smoked the “enterprise” database. Turns out that a MacBook Pro SSD performs better than the SAN, and the query planner needs more TLC. We ended up running the queries off my laptop for a few days while the DBAs did their thing.
Right. Local storage is much more performant and cost-effective than network storage. I tried to run an IOPS-sensitive workload in the cloud. It turns out I'd need to pay several thousand dollars per month for the performance I can get from a $100 NVMe SSD.
I'm working on a web app right now that does a lot of heavy/live/realtime processing with workers. The original thought was to run those workers on the node servers and stream the results over a socket, charging by the CPU/minute. But it surprised me that the performance looks a lot better up to about 16 workers if the user just turns over a few cores to local web workers and runs it on their own box. As long as they don't need 64 cores or something, it's faster to run locally, even on an older machine. Thread transport is slow but sockets are slower at a distance; the bottlenecks are in the main thread anyway. So I've been making sure parts are interchangeable between web workers and "remote socket web workers" assigned to each instance of the app remotely.
Isn't that what's happening if you use any managed database product? They have probably colocated everything as much as possible and used various proprietary tricks to cut latency, but still.
What reminded me of this the other day is how MacOS will grow your cursor if you “shake” it to help you find it on a big screen.
I was thinking about how they must have a routine that’s constantly taking mouse input, buffering history, and running some algorithm to determine when user input is a mouse “shake”.
And how many features like this add up to eat up a nontrivial amount of resources.
That particular example seems like something that's probably a lot cheaper than you'd initially think. The OS has to constantly take mouse input anyway to move the pointer and dispatch events to userspace. It also needs to record the current and new position of the mouse pointer to dispatch the events. Detecting whether the mouse is being "shaken" can be done with a ring buffer of mouse velocities over the last second or two of ticks. At 60 fps, that's about 120 ints = 480 bytes. Since you don't need to be precise, you can take Manhattan distance (x + y) rather than Euclidean distance (sqrt(x^2 + y^2)), which is a basically negligible computation. Add up the running total of the ring buffer - and you don't even need to visit each element, just keep a running total in a variable, add the new velocity, subtract the velocity that's about to be overwritten - and if this passes a threshold that's say 1-2 screen widths, the mouse is being "shaken" and the pointer should enlarge. In total you're looking at < 500 bytes and a few dozen CPU cycles per tick for this feature.
Or, alternatively, the engineer that worked on this at Apple has just read the above as another way of solving the problem and is throwing this on their backlog for tomorrow..
Thanks for the thoughtful analysis and napkin math. You may very well be right. I wonder if this is true in practice or if they suffer from any interface abstractions and whatnot.
On every modern (past few decades) platform, the mouse cursor is a hardware sprite with a dedicated, optimized, *fast* path thru every layer of the stack, just to shave off a few ms of user-perceived latency. Grab a window and shake it violently, you'll notice it lags a few pixels behind the cursor - that's the magic in action.
In some places there's no room left for unnecessary abstractions, I can imagine most of the code touching mouse / cursor handling is in that category.
The running-sum-difference approach suggested above is a box filter, which has the best possible noise suppression for a given step-function delay, although in the frequency domain it looks appalling. It uses more RAM, but not that much. The single-pole RC filter you're suggesting is much nicer in the frequency domain, but in the time domain it's far worse.
Not really? Sort of? I don't really have a good answer here. It depends on what you mean by "due to". It's certainly due to the impulse response, since in the sense I meant "far worse" the impulse response is the only thing that matters.
Truncating the impulse response after five time constants wouldn't really change its output noticeably, and even if you truncated it after two or three time constants it would still be inferior to the box filter for this application, though less bad. So in that sense the problem isn't that it's infinite.
Likewise, you could certainly design a direct-form IIR filter that did a perfectly adequate job of approximating a box filter for this sort of application, and that might actually be a reasonable thing to do if you wanted to do something like this with a bunch of op-amps or microwave passives instead of code.
So the fact that the impulse response is infinite is neither necessary nor sufficient for the problem.
The problem with the simple single-pole filter is that by putting so much weight on very recent samples, you sort of throw away some information about samples that aren't quite so recent and become more vulnerable to false triggering from a single rapid mouse movement, so you have to set the threshold higher to compensate.
Reading all of you sounding super smart and saying stuff I don’t recognize (but perhaps utilize without knowing the terms) used to make me feel anxious about being an impostor. Now it makes me excited that there’s so many more secrets to discover in my discipline.
It turns out that pretty much any time you have code that interacts with the world outside computers, you end up doing DSP. Graphics processing algorithms are DSP; software-defined radio is DSP; music synthesis is DSP; Kalman filters for position estimation are DSP; PID controllers for thermostats or motor control are DSP; converting sonar echoes into images is DSP; electrocardiogram analysis is DSP; high-frequency trading is DSP (though most of the linear theory is not useful there). So if you're interested in programming and also interested in graphics, sound, communication, or other things outside of computers, you will appreciate having studied DSP.
Don't worry, this is a domain of magic matlab functions and excel data analysis and multiply-named (separately invented about four times on average in different fields) terms for the same thing, with incomprehensible jargon and no simple explanation untainted by very specific industry application.
For both alternatives we begin by computing how far the mouse has gone:
int m = abs(dx) + abs(dy); // Manhattan distance
For the single-pole RC exponential filter as WanderPanda suggested:
c -= c >> 5; // exponential decay without a multiply (not actually faster on most modern CPUs)
c += m;
For the box filter with the running-sum table as nostrademons suggested:
s += m; // update running sum
size_t j = (i + 1) % n; // calculate index in prefix sum table to overwrite
int d = s - t[j]; // calculate sum of last n mouse movement Manhattan distances
t[j] = s;
i = j;
Here c, i, s, and t are all presumed to persist from one event to the next, so maybe they're part of some context struct, while in old-fashioned C they'd be static variables. If n is a compile-time constant, this will be more efficient, especially if it's a power of 2. You don't really need a separate persistent s; that's an optimization nostrademons suggested, but you could instead use a local s at the cost of an extra array-indexing operation:
int s = t[i] + m;
Depending on context this might not actually cost any extra time.
Once you've computed your smoothed mouse velocity in c or d, you compare it against some kind of predetermined threshold, or maybe apply a smoothstep to it to get the mouse pointer size.
Roughly I think WanderPanda's approach is about 12 RISCish CPU instructions, and nostrademons's approach is about 18 but works a lot better. Either way you're probably looking at about 4-8 clock cycles on one core per mouse movement, considerably less than actually drawing the mouse pointer (if you're doing it on the CPU, anyway).
> they must have a routine that’s constantly taking mouse input
Possible but unlikely. Well-written desktop software is never constantly taking input; it's sleeping on OS kernel primitives like poll/epoll/IOCP/etc., waiting for these inputs.
Operating systems don't generate mouse events at 1kHz unless you actually move the mouse.
“Constantly taking” is not the same thing as “constantly polling”. The ring buffer approach works identically in the event-driven approach, you just need to calculate the number of “skipped” ticks and zero them out in the ring buffer.
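To make the "zero out the skipped ticks" idea concrete, here is a minimal sketch in Python. It assumes events carry a millisecond timestamp; the class name, window size, tick length, and threshold are all made up for illustration and are not taken from any real OS.

```python
from collections import deque


class ShakeDetector:
    """Box-filter shake detector driven by mouse events, not a fixed tick."""

    def __init__(self, window_ticks=120, tick_ms=16, threshold=4000):
        # All three defaults are illustrative guesses, not real OS values.
        self.tick_ms = tick_ms
        self.threshold = threshold
        self.ring = deque([0] * window_ticks, maxlen=window_ticks)
        self.total = 0         # running sum of everything in the ring
        self.last_tick = None  # tick index of the most recent event

    def _push(self, value):
        # Appending to a full deque drops ring[0], so subtract it from the sum.
        self.total += value - self.ring[0]
        self.ring.append(value)

    def on_move(self, dx, dy, time_ms):
        """Feed one mouse event; return True if the pointer is being shaken."""
        tick = time_ms // self.tick_ms
        if self.last_tick is None:
            skipped = 0
        else:
            skipped = max(0, tick - self.last_tick - 1)
        # Ticks without an event mean the mouse was still: record zeros.
        # More skipped ticks than slots just clears the whole window.
        for _ in range(min(skipped, self.ring.maxlen)):
            self._push(0)
        distance = abs(dx) + abs(dy)  # Manhattan distance
        if self.last_tick is not None and tick == self.last_tick:
            # Second event within the same tick: fold it into the newest slot.
            self.ring[-1] += distance
            self.total += distance
        else:
            self._push(distance)
        self.last_tick = tick
        return self.total > self.threshold
```

The process only wakes up when an event arrives, and the cost of a long idle gap is capped at one pass over the window.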
Yeah, it's a high quality mouse. But the only excuse for this is it's slightly cheaper to make everything USB. PS/2 worked much better. It was limited to 200Hz but needed no polling. Motherboards just stopped providing the port.
If the computer has to do anything at all, it adds to complexity and it isn't doing other things. One could do something a bit like blue screening and add the mouse pointer to the video signal in the monitor. For basic functionality the computer only needs to know the x and y of clicks. (It could, for laughs, also report the colors in the area.) Hover and other effects could be activated when [really] needed. As a bonus one could replace the hideous monitor configuration menus with a point and click interface.
This polling is not done by the CPU, this is a common misconception. In a typical modern system the only polling that happens with USB is done by the USB host controller and only when there is actual data the host controller generates interrupts for the CPU to process it. Obviously, when you configure the mouse at higher frequency you will get more interrupts and hence higher CPU usage but that has nothing to do with the polling.
And yet MacOS doesn't let you change the cursor color. On my Windows 10 desktop I set the cursor to a slightly larger size and a yellow color. So much easier to work with.
Abstractions almost always end up leaky. Spark SQL, for example, does whole-stage codegen which collapses multiple project/filter stages into a single compiled stage, but your underlying data format still needs to be memory friendly (i.e. linear accesses, low branching, etc.). The codegen is very naive and the JVM JIT can only do so much.
What I've seen is that you need people who deeply understand the system (e.g. Spark) to be able to tune for these edge cases (e.g. see [1] for examples of some of the tradeoffs between different processing schemes). Those people are expensive (think $500k+ annual salaries) and are really only cost effective when your compute spend is in the tens of millions or higher annually. Everyone else is using open source and throwing more compute at the problem or relying on their data scientists/data engineers to figure out what magic knob to turn.
Spark is very very odd to tune. Like, it seems (from my limited experience) to have the problems common to distributed data processing (skew, it's almost always skew) but because it's lazy, people end up really confused as to what actually drives the performance problems.
That being said, Spark is literally the only (relatively) easy way to run distributed ML that's open source. The competitors are GPU's (if you have a GPU friendly problem) and running multiple Python processes across the network.
(I'm really hoping that people will now school me, and I'll discover a much better way in the comments).
Data engineers should be building pipelines and delivering business value, not fidgeting with some JVM or Spark parameter that saves them runtime on a join (or for that matter, from what I've seen at a certain bigco, building their own custom join algorithms). That's why I said it's only economical for big companies to run efficient abstractions and everyone else just throws more compute at the issue.
I work in the Analytics space and have been mostly on Java, and I am so glad that other people feel the same.
At this point, people have become afraid of suggesting something other than Spark.
I see something written in Rust to be much better at problems like this.
I love the JVM, but it works well with transactional workloads and starts showing its age when it's dealing with analytical loads.
The worst thing is when people then start using weak references and weird off-heap processing. It's usually done by a senior engineer, but it really defeats the purpose of the JVM.
I guess your company is running on Java and running something else would cost a lot in training, recruiting, understanding, etc. But down the line, defeating the JVM will be understood only by the guy who did it... Then that guy will leave... Then the newcomers will rewrite the thing in Spark 'cos it will feel safer. Rinse-repeat...
(I'm totally speculating, but your story seems so true that it inspired me :-)
Some of it is what you mentioned about training and hiring costs, but mostly it's the creation of a narrative that it will scale someday in the future. This is usually done by those engineer(s), and they are good at selling, so a different opinion is ignored or frowned upon by leadership.
I have now seen this anti-pattern in multiple places.
Analytical loads deal with very large datasets, on the order of terabytes even after you compress them. These workloads don't change much, so keeping them in the heap eventually results in long processing pauses because the JVM tries to recover this memory.
However, in most cases, this data is not meant to be garbage collected. For transactions, once you have persisted the data, it should be garbage collected so the pattern works.
There are a lot of other aspects that I can probably think of but the above one is the most important in my mind.
Yes, the whole idea of sending “agents” to do processing is poor performing and things like snowflake and Trino, where queries go to already deployed code, run rings around it.
Furthermore, pyspark is by far the most popular and used spark, and it’s also got the absolute world-worst atrocious mechanical sympathy. Why?
Developer velocity trumps compute velocity any day?
(I want the niceness of python and the performance of eg firebolt. Why must I pick?)
(There is a general thing to get spark “off heap” and use generic query compute on the spark sql space, but it is miles behind those who start off there)
U-SQL on Azure Data Lake Analytics[0] and ECL from HPCC Systems[1] are two I have fairly extensive experience with. There may be other such platforms, but those and Ballista are the three on my radar.
We had a system management backend at my last company. Loading the users list was unbearably slow; 10+ seconds on a warm cache. Not too terrible, except that most user management tasks required a page reload, so it was just wildly infuriating.
Eventually I took a look at the code for the page, which queried LDAP for user data and the database for permissions data. It did:
    get list of users
    foreach user:
        get list of all permissions
        filter down to the ones assigned directly to the user
    foreach user:
        get list of all groups
        foreach group:
            get list of all permissions
            filter down to the ones assigned to the group
        filter down to the ones the user has
I'm no algorithm genius, but I'm pretty sure O(n^2+n^3) is not an efficient one.
I replaced it with
get list of all users
get list of all groups
get list of all permissions
<filter accordingly>
Suffice to say, it was a lot more responsive.
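For anyone wondering what the "filter accordingly" step might look like, here is a minimal Python sketch. The data shapes (name pairs for group membership and permission grants) are an assumption for illustration, not the actual LDAP/database schema from that system, and it assumes user and group names don't collide.

```python
from collections import defaultdict


def effective_permissions(users, groups, permissions):
    """Resolve each user's permissions from three lists fetched once each.

    users:       iterable of user names
    groups:      iterable of (group_name, member_name) pairs
    permissions: iterable of (holder_name, permission) pairs, where the holder
                 is either a user or a group (a made-up schema for this sketch)
    """
    # Index permissions by holder so each lookup is O(1) instead of a rescan.
    perms_by_holder = defaultdict(set)
    for holder, perm in permissions:
        perms_by_holder[holder].add(perm)

    # Index group membership by user for the same reason.
    groups_by_user = defaultdict(set)
    for group, member in groups:
        groups_by_user[member].add(group)

    result = {}
    for user in users:
        perms = set(perms_by_holder.get(user, ()))   # direct grants
        for group in groups_by_user.get(user, ()):   # grants via groups
            perms |= perms_by_holder.get(group, set())
        result[user] = perms
    return result
```

Each list is walked once to build an index, so the work grows with the total number of rows rather than with users times groups times permissions.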
Also worth noting was that fetching the user list required shelling out to a command (a python script) which shelled out to a command (ldapsearch), and the whole system was a nightmare. There were also dozens of pages where almost no processing was done in the view, but a bunch of objects with lazy-loaded properties were passed into the template and always used, so when benchmarking you'd get 0.01 seconds for the entire function and then 233 seconds for "return render(...)" because for every single row in the database (dozens or hundreds) the template would access a property that would trigger another SQL call to the backend, rather than just doing one giant "SELECT ALL THE THINGS" and hammering it out that way.
Note that we also weren't using Django's foreign keys support, so we couldn't even tell Django to "fetch everything non-lazily" because it had no idea.
If that app were written right it could have run on a Raspberry Pi 2, but instead there was no amount of cores that could have sped it up.
Yeah, I see this a lot. I think it's especially easy to introduce this kind of "accidentally quadratic" behaviour using magical ORMs like Django's, where an innocent-looking attribute access like user.groups can trigger a database query ... access user.groups inside a loop and things get bad quickly.
In the case of groups and permissions there's probably only a few of each, so fetching all of them is probably fine. But depending on your data -- say you're fetching comments written by a subset of users, you can tweak the above to use IN filtering, something like this Python-ish code:
    users = select('SELECT id, name FROM users WHERE id IN $1', user_ids)
    comments = select('SELECT user_id, text FROM comments WHERE user_id IN $1', user_ids)
    comments_by_user_id = defaultdict(list)
    for c in comments:
        comments_by_user_id[c.user_id].append(c)
    for u in users:
        u.comments = comments_by_user_id[u.id]
Only two queries, and O(users + comments).
For development, we had a ?queries=1 query parameter you could add to the URL to show the number of SQL queries and their total time at the bottom of the page. Very helpful when trying to optimize this stuff. "Why is this page doing 350 queries totalling 5 seconds? Oops, I must have an N+1 query issue!"
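A rough idea of what could sit behind a ?queries=1 style debug footer, as a minimal sketch using Python's built-in sqlite3. The CountingConnection wrapper, the demo schema, and the demo rows are all hypothetical; a real app would hook its own database layer instead.

```python
import sqlite3
import time


class CountingConnection:
    """Thin wrapper that counts and times every query sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0
        self.seconds = 0.0

    def execute(self, sql, params=()):
        start = time.perf_counter()
        try:
            return self.conn.execute(sql, params).fetchall()
        finally:
            # Count the query even if it raised, so failures show up too.
            self.count += 1
            self.seconds += time.perf_counter() - start


def n_plus_one_demo():
    """Run a deliberately bad N+1 page load and report (queries, seconds)."""
    raw = sqlite3.connect(":memory:")
    raw.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comments (user_id INTEGER, text TEXT);
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO comments VALUES (1, 'hi'), (1, 'yo'), (3, 'hey');
    """)
    db = CountingConnection(raw)
    users = db.execute("SELECT id, name FROM users")
    for user_id, _name in users:
        # One extra round trip per user: this is the N in N+1.
        db.execute("SELECT text FROM comments WHERE user_id = ?", (user_id,))
    return db.count, db.seconds
```

With three users the demo issues four queries; printing those two numbers at the bottom of a page is usually enough to make an N+1 loop jump out.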
Thanks. Yeah, I think I used that years ago when I first ran into this problem, and it worked well. Whether one uses an ORM or not, one needs to know how to use one's tools. My problem (not just with Django, but with ORMs in general) is how they make bad code look good. Like the following (I don't know Django well anymore, but something like this):
    users = User.objects.all()
    for u in users:
        print(u.name, len(u.comments))
To someone who doesn't know the ORM (or Python) well, u.comments looks cheap ("good"), but it's actually doing a db query under the hood each time around the loop ("bad"). Not to mention it's fetching all the comments when we're only using their count. Whereas if you did that in a more direct-SQL way:
    users = select('SELECT id, name FROM users WHERE id IN $1', user_ids)
    for u in users:
        num_comments = get('SELECT COUNT(*) FROM comments WHERE user_id = $1', u.id)
        print(u.name, num_comments)
This makes the bad pattern look bad. "Oops, I'm doing an SQL query every loop!"
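One possible way to collapse that loop into a single round trip is to let the database do the counting. A minimal sketch with Python's sqlite3 follows; the schema and rows are invented for the example, and only the join and GROUP BY are the point.

```python
import sqlite3


def make_demo_db():
    # Made-up schema and rows, just enough to run the query below.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comments (user_id INTEGER, text TEXT);
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO comments VALUES (1, 'hi'), (1, 'yo'), (3, 'hey');
    """)
    return conn


def comment_counts(conn):
    """Return [(name, num_comments), ...] with one query, not one per user."""
    return conn.execute(
        """
        SELECT u.name, COUNT(c.user_id)
        FROM users AS u
        LEFT JOIN comments AS c ON c.user_id = u.id
        GROUP BY u.id, u.name
        ORDER BY u.id
        """
    ).fetchall()
```

The LEFT JOIN keeps users with no comments, and COUNT(c.user_id) skips the NULLs the join produces for them, so they come back with a count of 0 instead of disappearing.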
The other thing I don't like about (most) ORMs is they fetch all the columns by default, even if the code only uses one or two of them. I know most ORMs provide a way to explicitly specify the columns you want, but the easy/simple default is to fetch them all.
I get the value ORMs provide: save a lot of boilerplate and give you nice classes with methods for your tables. I wonder if there's a middle ground where you couldn't do obviously bad things with the ORM without explicitly opting into them. Or even just a heuristic mode for development where it yelled loudly if it detected what looked like an N+1 issue or other query inside a loop.
Django's ".only()" method lets you specify just the columns you want to retrieve - with the downside that any additional property access can trigger another SQL query. I thought I'd seen code somewhere that can turn those into errors but I'm failing to dig it up again now.
I've used the assertNumQueries() assertion in tests to guard against future changes that accidentally increase the number of queries being made without me intending that.
The points you raise are valid, but there are various levels of mitigations for them. Always room for improvement though!
> This is fine if you are working with a small data set. It is inefficient, but if it's quick enough, readability trumps efficiency IMHO.
And this is how you end up with the problems the parent is describing. During testing and when you setup the system you always have a small dataset so it appears to work fine. But when it’s real work the system collapses.
A good habit when writing unit tests is to use assertNumQueries especially with views. It's very easy even for an experienced developer to inadvertently add an extra query per row: for example if you have a query using only() to restrict the columns you return, and in a template you refer to another column not covered by only(), Django will do another query to fetch that column value (not 100% if that's still the case, but that was the behavior last time I looked). The developer might be just fixing what they think is a template issue and won't know they've just stepped on a performance landmine.
Sounds like a good habit, yeah. But the fact that an attribute access in a template (which might be quite far removed from the view code) can kick off a db query is an indication there's something a bit too magical about the design. Often front-end developers without much knowledge of the data model or the SQL schema are writing templates, so this is a bit of a foot-gun.
I've wanted a Django mechanism in the past that says "any SQL query triggered from a template is a hard error" - that way you could at least prevent accidental template edits from adding more SQL load.
I’m coming around to considering Django-seal to be mandatory for non-trivial applications. It will “seal” a query set so that you can’t make accidental DB lookups after the initial fetch. That way you can be more confident that you are doing the right joins in the initial query, and you are safe from the dreaded O(N) ORM issue.
SqlAlchemy has this as part of the ORM, it should really be part of Django IMO.
When I was doing a lot of DBIx::Class work, I contemplated writing a plugin to allow me to lock and unlock automatic relation fetches and die (throw) if one was attempted when locked. I would carefully prefetch my queries (auto-fill relations from a larger joined query) and if something changed or there was an edge case I missed that caused fetching deep in a loop, it might kill performance, but be hard to track down if not given the best directions on what caused it other than "this is slow".
It's one of those things that in the long run would have been much more time effective to write, but the debugging never quite took long enough each time to make me take the time.
> For development, we had a ?queries=1 query parameter you could add to the URL to show the number of SQL queries and their total time at the bottom of the page. Very helpful when trying to optimize this stuff. "Why is this page doing 350 queries totalling 5 seconds? Oops, I must have an N+1 query issue!"
I'm so glad Symfony (PHP framework) has a built-in profiler and analysis tooling there...
Your pattern is quite powerful: get data from several sources and do the rearranging on the client (which might be a web server), instead of multiple interactions for each data item.
For SQL you can also do a stored procedure. Sometimes that works well if you are good at your DBMS's procedure language and the schema is good.
Sending multiple queries from your web server will likely put more load on the database server than using a stored procedure.
With either technique you are still pulling all the data you need from the DB but with multiple queries instead of a stored procedure you are usually pulling more data than you need with each query and then dropping any rows or fields you’re not interested in. Together with multiple calls over the network to the DB server and (often) multiple SQL connection setups this is much worse for performance on both the web and database servers
>Sending multiple queries from your web server will likely put more load on the database server than using a stored procedure.
Lots of people seem to not realize that db roundtrips are expensive, and should be avoided whenever possible.
One of the best illustrations of this I've found is in the Transaction Processing book by Jim Gray and Andreas Reuter, where they illustrate the relative cost of getting data from CPU vs CPU cache vs RAM vs cross-host query.
I had to fix a similar thing in our internal password reset email sender last year. The code was doing something like:
    for each user in (get_freeipa_users | grep_attribute uid):
        email = (get_freeipa_users | client_side_find user | grep_attribute email)
        last_change = (get_freeipa_users | client_side_find user | grep_attribute krblastpwdchange)
        expiration = (get_freeipa_users | client_side_find user | grep_attribute krbpasswordexpiration)
        # Some slightly incorrect date math...
        send_email
I changed it to a single LDAP query for every user that requests only the needed attributes. It cut that Jenkins job's runtime from 45 minutes to 0.2 seconds.
Given the context, it probably took the poster very little time to implement that fix, without digging into ldapsearch. With massive speedup, for their likely smallish ldap install. Seems like not a bad call at all.
I did some work to improve performance on a dashboard several years ago. The way the statistics were queried was generally terrible, so I spent some time setting up aggregations and cleaning that up, but then... the performance was still terrible.
It turned out that the dashboard had been built on top of Wordpress. The way that it checked if the user had permission to access the dashboard was to query all users, join the meta table which held the permission as a serialized object, run a full text search to check which users had permission to access this page, and return the list of all users with permission to access the page. Then, it checked if the current user was in that list.
I switched it to only check permissions for the current user, and the page loaded instantaneously.
If I look at traces of all the service calls at my company within our microservices environment, the "meat" of each service is a fraction of the latency -- the part that's actually fetching the data from a database, or running an intense calculation. Oftentimes it's between 20 and 40 ms.
Everything else are network hops and what I call "distributed clutter", including authorizing via a third party like Auth0 multiple times for machine-to-machine token (because "zero trust"!), multiple parameter store calls, hitting a dcache, if interacting with a serverless function, cold starts, API gateway latency, etc...
So for the meat of a 20-40 ms call, we get about a 400ms-2s backend response time.
Then if you are loading a front end SPA with javascript... fugetaboutit
But DevOps will say "but my managed services and infinite scalability!"
Not sure what the exact use case was (i.e. the output of the filtering) but—from reading the first algo—seems to be something to do with determining group membership and permissions for a user.
In that case, was there a reason joins couldn't be used? As it still seems pretty wasteful (and less performant) to load all of this data in memory and post-process; whereas a well-indexed database could possibly do it faster and with less-memory usage.
In defense of whoever wrote the original code: it probably would have been reasonably fast if it had been a database query with proper indexes. The filters would have whittled the selection down to only the relevant data, whereas returning basically three entire tables of data to then throw away most of it would have been extremely inefficient.
The mistake of course was not thinking about why this approach is faster in a database query and that it doesn't work that way when you already need to get all the data out of LDAP to do anything with it.
Yeah - you likely want to do this as a single simple query, which you can optimize if necessary. An O(N+1) query is bad. An O(N^2) query is something I have rarely seen. Congrats!
I'm working at a company which processes "real" exchanges, like NASDAQ, LSE, and, especially, the OPRA feed.
We've added 20+ crypto exchanges in our portfolio this year, and all of them are processed on one old server which is unable to process NASDAQ Total View in real-time anymore.
On the other hand, whole OPRA feed (more than 5Gbit/s or 65B/day, yes, it is billions, messages of very optimized binary protocol, not this crappy JSON) is processed by our code on one modern server. Nothing special, two sockets of Intel Xeons (not even Platinums).
I've read your few posts a few times and I'm still not sure why you made your post. You're telling the person that you handle more data than them and thus need more resources than them. Was your goal to smugly belittle them? It's not like they said any problem can be solved on their specific resources.
Nope, I want to say, that even much more data could be processed on very limited hardware, and that it is additional confirmation that current hardware is immensely powerful and very under-estimated.
And that go-to solutions / golden hammers like REST / JSON are very much suboptimal.
I mean I'm still going to use it for client/server communication and the like because I don't have serious performance constraints enough to warrant something that will be more difficult to develop for etc, but still.
I'd initially responded, but I found that my response had an element of a pissing match over whose data is bigger, so I deleted it.
The thing is - when two engineers get smug, oftentimes lots of fairly interesting technical details get exchanged, so such discussions aren't really useless to bystanders.
It depends on what your server does with each request though; '65B a day' means little. If all it does is write it to a log then I'm surprised you're not using a rPI.
Could you share some more about that very optimized binary protocol? I know there are ways to be more efficient than JSON, but since you call it crappy, your solution must be much, much better. Honestly interested to read up more.
It is not "our" protocol, it is protocol designed by exchange and we need to support it, as we can not change it :).
Simple binary messages, with binary encoded numbers, etc. No string parsing, no syntax, nothing like this, only bytes and offsets. Think about TCP header, for example.
JSON is very inefficient both in bytes (a 32-bit price is 4 bytes in binary and could be 7+ bytes as a string, think "1299.99" for example) and in CPU: to parse "1299.99" you need to burn a lot of cycles, whereas if it is a number of cents stored as a native 4-byte number you need 3 shifts and 4 binary ORs at most if you need to change endianness, and in most cases it is a simple memory copy of 4 bytes, 1-2 CPU cycles.
When you have a binary protocol, you can skip fields you are not interested in as simply as "offset = offset + <field-size>" (where <field-size> is a compile-time constant!), while in JSON you need to parse the whole thing anyway.
Difference between converting binary packet to internal data structure and parsing JSON with same data to same structure could be ten-fold easily, and you need to be very creative to parse JSON without additional memory allocations (it is possible, but code becomes very dirty and fragile), and memory allocation and/or deallocation costs a lot, both in GC languages and languages with manual memory management.
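To illustrate the "only bytes and offsets" point, here is a minimal Python sketch. The message layout below is invented for the example and is not any real exchange's format; the only library calls used are the standard struct and json modules.

```python
import json
import struct

# Hypothetical fixed layout, big-endian, no padding:
#   uint32 sequence number | 8-byte symbol | uint32 price in cents | uint32 size
QUOTE = struct.Struct(">I8sII")
PRICE_OFFSET = 4 + 8           # skip sequence and symbol; a constant
PRICE = struct.Struct(">I")


def price_cents_binary(packet):
    # Jump straight to the field; nothing before or after it is touched.
    return PRICE.unpack_from(packet, PRICE_OFFSET)[0]


def price_cents_json(text):
    # The whole document is parsed and allocated just to read one field,
    # and the float has to be rounded back to an exact number of cents.
    return round(json.loads(text)["price"] * 100)


packet = QUOTE.pack(7, b"ACME", 129999, 500)
text = json.dumps({"seq": 7, "symbol": "ACME", "price": 1299.99, "size": 500})
```

Here the binary message is 20 bytes against roughly three times that for the JSON text, and the binary reader never looks at the fields it doesn't need.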
Curious if the binary protocol uses floating point or a fixed point representation? Or is floating point with its rounding issues sufficient for the protocol's needs?
Not GP but familiar with these protocols.
They use fixed point extensively; I can't even think of an exchange protocol which would use floating point, since the rounding issues would cause unnecessary and expensive problems.
Prices are integer fields. When converted to a decimal format, prices are in fixed point format with 6 whole number places followed by 4 decimal digits. The maximum price in OUCH 4.2 is $199,999.9900 (decimal, 7735939C hex). When entering market orders for a cross, use the special price of $214,748.3647 (decimal, 7FFFFFFF hex).
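The arithmetic in that quote can be checked with a few lines of Python. This is only a sketch of the implied-four-decimals convention described above, using the standard decimal module; the function names are made up and it is not an implementation of the OUCH protocol itself.

```python
from decimal import Decimal

PRICE_SCALE = Decimal(10000)  # four implied decimal places, per the quoted text


def decode_price(raw):
    """Convert an integer price field to dollars without touching floats."""
    return Decimal(raw) / PRICE_SCALE


def encode_price(dollars):
    """Convert a decimal string such as '1299.99' to the integer wire value."""
    scaled = Decimal(dollars) * PRICE_SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError("more than four decimal places")
    return int(scaled)


max_price = decode_price(0x7735939C)     # 1,999,999,900 -> 199999.99
market_price = decode_price(0x7FFFFFFF)  # 2,147,483,647 -> 214748.3647
```

Both hex constants from the quote decode to the dollar amounts it states, and the special market price is simply the largest signed 32-bit integer.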
For NASDAQ it seems to have been something around 430k / share... Buffett's BRK shares threatened to hit that limit a couple months ago: https://news.ycombinator.com/item?id=27044044
Most of them use decimal fixed point. Sometimes exponent (decimal, not binary one!) is fixed per-protocol, sometimes per-instrument and sometimes per-message, it depends on exchange.
If you're optimizing for latency, JSON is pretty terrible, but most people who use it are optimizing for interoperability and ease of development. It works just fine for that, and you can recover decent bandwidth just by compressing it.
"Old skool" exchanges use either FIX (old and really verbose), FAST (binary encoding for FIX) or custom fixed-layout protocols.
Most big USA exchanges use custom fixed-layout protocols, where each message is described in documentation, but not in a machine-readable way. European ones still use FAST.
I haven't seen FIX in the wild for data feeds, but it is used by brokers to submit orders to the exchange (our company didn't do this part, we only consume feeds).
I don't know why, but all Crypto Exchanges use JSON, not protobufs or something like this, and didn't publish any formal schemes.
Fun fact: one crypto exchange put GZIP'ed and base64'ed JSON data into JSON which pushed to websocket, to save bandwidth. IMHO, it is peak of bad design.
And msgpack if you want an order of magnitude faster serialization/deserialisation and can put up with worse compression (I think mainly due to schema overhead since protobuf files don't store the schema?)
Have a look at NASDAQ ITCH, OUCH, and RASH
(these are the real names, the story I heard is the original author of them didn't like the usual corporate style brand names and wanted certain people to squirm when talking about them).
In 2001 we started a GSM operator on a Compaq server (it was before they were bought by HP) with a whole 1 GB(!) of RAM and 2x10 GB SCSI disks.
It served up to 70K subscribers, a call center with 30-40 employees, payment systems integration, everything.
Next was an 8-socket Intel server. We were never able to saturate its CPUs - the 300 MHz (or was it 400?) bus was the bottleneck. It served 350-400K subscribers.
And next: we changed the architecture and used 2 servers with 2-socket Intel CPUs again, but that was the time when GHz frequencies appeared on the market. We dreamed about a 4xAMD server. We came to ~1 mln active subscribers.
Nowadays: every phone has more power than those servers had.
A typical React application consumes more resources than that billing system.
Gigabyte here, gigabyte there - nobody counts them.
People may underestimate how fast modern machines are, but that is probably in part because, at least in my fairly relevant experience, I have literally never seen a CPU bottleneck under normal circumstances. Memory pressure is nearly always the driving issue.
The CPU is rarely used up to 100% because most code fails to utilize several cores efficiently.
OTOH a service loading the single core with the main thread is a frequent sight :( Interpreted languages like Python can easily spend 30% of time just on the deserialization overhead, converting the data from a DB into a result set, and then into ORM instances.
It's also amazing how much you can fit in RAM if you're careful. I remember ~2007 people were aghast at Facebook's 4T memcached deployment that stored basically everyone's social network posts; now you can get single servers for ~$4K with 4T of RAM.
The trick is basically that you have to eschew the last 15 years of "productivity" enhancements. Pretty much any dynamic language is out; if you must use the JVM or .NET, store as much as possible in flat buffers of primitive types. I ended up converting order books from the obvious representation (hashtable mapping prices to a list of Order structs) to a pair of SortedMaps from FastUtils, which provides an unboxed float representation with no pointers. That change ended up reducing memory usage by about 4x.
You can fit a lot of ints and floats in today's 100G+ machines, way more than needed to represent the entire cryptocurrency market. You just can't do that when you're chasing 3 pointers, each with their associated object headers, to store 4 bytes.
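A toy illustration of the flat-buffer idea, in Python only because it is short to read (the point above is precisely that a dynamic language is the wrong tool for the real thing). This is not the FastUtil structure mentioned above; it is a hypothetical side of an order book kept in two parallel typed arrays, with prices stored as integer ticks, which is my assumption.

```python
from array import array
from bisect import bisect_left


class FlatBookSide:
    """One side of an order book as two parallel arrays of 64-bit integers."""

    def __init__(self):
        self.prices = array("q")  # sorted ascending, 8 bytes per entry
        self.sizes = array("q")   # sizes[i] is the resting size at prices[i]

    def set_level(self, price, size):
        """Insert, update, or (when size is 0) remove a price level."""
        i = bisect_left(self.prices, price)
        found = i < len(self.prices) and self.prices[i] == price
        if size == 0:
            if found:
                del self.prices[i]
                del self.sizes[i]
        elif found:
            self.sizes[i] = size
        else:
            self.prices.insert(i, price)
            self.sizes.insert(i, size)


book = FlatBookSide()
book.set_level(1000500, 3)
book.set_level(1000400, 5)
book.set_level(1000500, 0)  # level removed
```

Each level costs 16 bytes of contiguous memory here, with no per-order object header or pointer to chase.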
The more I read comments on subjects I am intimately familiar with, the more I realize most people who comment on HN don't really know what they're talking about and mostly make things up.
To answer your question, you can't find these servers because they don't exist. A server with 4T of RAM will cost you at a minimum $20,000 and that will be for some really crappy low-grade RAM. Realistically for an actual server that one would use in an actual semi-production setting, you're looking at a minimum of $35,000 for 4TB of RAM and that's just for the RAM alone, although to be fair that 35k ends up dominating the cost of the entire system.
> the more I realize most people who comment on HN don't really know what they're talking about and mostly make things up.
I don't think they're typically making things up. It's what I prefer to call Reddit knowledge. They saw someone else claim it somewhere, they believed it, and so they're repeating it so they can be part of the conversation (people like to belong, like to be part of, like to join). It's an extremely common process on Reddit and most forums, and HN isn't immune to it. Most people don't read much and don't acquire most of their knowledge from high quality sources, their (thought to be correct) wider knowledge - on diverse topics they have no specialization on - is frequently acquired from what other people say and that they believe. So they flip around on Reddit or Twitter for a bit, digest a few nuggets of questionable 'knowledge' and then regurgitate it at some later point, in a process of wanting to participate and belong socially. It's how political talking points function for example, passed down to mimic distributors that spread the gospel to other mimic followers (usually without questioning). It's how religion functions. And it's how most teachers / teaching functions, the teachers are distribution mimics (mimics with a bullhorn, granted authority by other mimics to keep the system going, to clone).
It's because some very high percentage of all of humans are mimics. It's not something Reddit caused of course, it's biology, it's a behavior that has always been part of humanity. It's an increased odds of success method of optimizing for survival of the species, successful outcomes, meets the Internet age. It's why most people are inherent followers, and can never be (nor desire to be) leaders. It's why few people create anything original or even attempt to across a lifetime. It's why such a small fraction of the population are very artistic, particularly drawn to that level of creative expression. If you're a mimic biologically it's very difficult to be the opposite. This seems to be viewed by most people as an insult (understandably, as mimics are the vast majority of the population and control the vote), however it's not, it's simply how most living things function, system wise, by mimicry (or even more direct forms of copying). Humans aren't that special, we're not entirely distinct from all the other systems of animal behavior.
That saying, safety in numbers? That's what that is all about. Mimicry. Don't stand out.
The reason most Wall Street money managers can't beat the S&P 500? It's because they're particularly aggressive mimics, they intentionally copy each other toward safe, very prosperous, gentle mediocrity. They play a game of follow, with popular trends (each decade or era on Wall Street has popular trends/fads). Don't drift too far below the other mimics and it's a golden ticket.
Nobody got fired for buying IBM? Same thing. Mimic what has worked well for many others is biologically typically a high success outcome pattern (although amusingly not always, it can also in rare occasions lead off a cliff).
The Taliban? The Soviet Union? Nazism? Genocide? Multi generational patterns of mistake repetition passed down from parental units? That's how you get that. People mimic (perhaps especially parental units; biology very much in action), even in cases where it's an unsuccessful/negative pattern. All bad (and good) ideologies have mimic distributors and mimic followers, the followers do what they're told and implement as they're told. And usually there are only a very small number of originators, which is where the mimic distributors get their material.
The concept of positive role models? It's about mimicry toward successful outcomes.
Being a mimic and being a leader/creator are not exclusive; you can do both in various fields. One can even mimic and eventually start creating their own things on the back of that: “fake it until you make it” is a real thing.
> The more I read comments on subjects I am intimately familiar with, the more I realize most people who comment on HN don't really know what they're talking about and mostly make things up.
HN doesn't look exactly like SlashDot, but it's absolutely just like SlashDot.
I just Googled [servers with 4T of RAM], but apparently no, it does not include the RAM itself. Came up with this (about $4K base, another $44K to max it out with RAM):
Yeah, I've been eyeing the EPYC servers, but RAM pricing seems to very roughly be in the ballpark of $1,000 for 128GB, so $4k for 4T sounded very attractive and I wanted to make sure that I wasn't missing anything. As cvwright pointed out, you can get old DDR3 RAM for less, but I haven't found servers that can fit that many DIMMs. Thanks for your response.
I'm fairly sure I know where I can buy used/refurb Dell R820 at good prices with 512 or 1024GB of RAM, but I don't think I could accomplish the same for 4TB of RAM and $4000. Certainly not in one machine. And not with two $2000 machines each with 2048GB. We're close, but it's not that cheap yet.
Looking on ebay I can find some pretty decent R820 with 512GB each for right around $1500 a piece. Not counting any storage, even if they come used with some spinning hard drives, would end up replacing with SSDs. So more like three servers, 1.5TB of RAM, for $4500.
Yeah I run mirrors for certain sites in different regions of the world and a trick I often do is scrape the target site and then hold the entire site in memory. Even the top 500 posts of HN and all of the associated comments can fit in < 16 MB of RAM. If you want to serve that up over a socket, it's really fast to grab a pointer to the memory and just write it to the socket. You can get response times which are dwarfed by network latency that way.
In my experience disk i/o is the biggest bottleneck. It used to be sync()ing writes to disk for strict consistency but that's been pushed down to the DB now. I just looked at my DB systems and CPU is low but disk is nearly pegged.
My data sets are far too big to fit into memory/cache. Disk pressure can be alleviated by optimizing queries but it's a game of whack-a-mole.
I have exhausted EBS i/o and been forced to resort to dirty tricks. With RDS you can just pay more but that only scales to a point – normally the budget.
Does anyone know if any of the current crop of standard databases make use of things like Direct IO (where available on Linux), I’ve seen some write-ups that indicate you can get eye-wateringly fast performance when combined with an nvme drive.
Sure, but as far as software is concerned, optimizing for memory bandwidth (the typical bottleneck in modern systems) is not so different from optimizing for CPU.
In my experience it is almost always the database holding things up. If your app does not use a database or it makes very simple use of it, then I'm not surprised it is blazing fast. As soon as you need to start joining tables and applying permissions to queries, it all gets slow.
I'm running a crypto trading platform I'm developing for $30 on DigitalOcean. I coded exclusively in Rust and recently added a dynamic interface to Python. Today during the BTC crash it spiked at 20k events/s, and that's only incoming data.
This reminds me of back in 2003, a friend of mine worked for an online casino vendor; basically, if you wanted to run an online casino, you'd buy the software from a company and customize it to fit your theme.
They were often written in Java, ASP.NET, and so on. They were extremely heavyweight. They'd need 8-10 servers for 10k users. They hogged huge amounts of RAM.
My friend wrote the one this company was selling in C. Not even C++, mind you, just C. The game modules were chosen at compile time, so unwanted games didn't exist. The entire binary (as in, 100% of the code) compiled to just over 3 MB when stripped. He could handle 10k concurrent users on one single-core server.
I'm never gonna stop writing things in Python, but it still amazes me what can happen when you get down close to the metal.
OpenResty [1] is a good mix of these concepts. It serves requests through nginx (which is at its core just a lightweight event loop) and then serves pages through LuaJIT. If you need more speed you could always write an nginx module in C (or in some other language viz the C ABI).
Yeah currently for me there's no tool like Rust for performance, the borrow-checker isn't that complicated.
However, last month I started adding a python wrapper around the public api so I'm slowly going your way ;)
I use terraform and ansible, one droplet has monitoring (prom and grafana), 2 run services that currently run without any other infrastructure
The rest of the cost is a 250 gb space on DO
I'm running two different projects on a single instance of a cheap VM too. Both of them run on a not-so-memory-efficient programming runtime, yet the VM handles the load just fine.
Of course, a lot of it depends on what your app does for each request but most apps are simple enough and can live with being a monolith / single fat binary running on a single instance.
The problem with today's DevOps culture is that it presents K8s as the answer for everything, instead of defining a clear line on when to use it and when not to.
Sure, startup is defunct now and I think arbitrage & data on centralized exchanges is a dead market now. Wall Street HFTs got into the arbitrage game, and the data sites laypeople actually visit are the ones started in 2014.
Codebase was pure server-side Kotlin running on the JVM. Jackson for JSON parsing, when the exchange didn't provide their own client library (I used the native client libraries when they did). Think I used Undertow for exchange websockets, and Jetty for webserving & client websockets. Postgres for DB.
The threading model was actually the biggest bottleneck, and took a few tries to get right. I did JSON parsing and conversion to a common representation on the incoming IO thread. Then everything would get dumped into a big producer/consumer queue, and picked up by a per-CPU threadpool. Main thread handled price normalization (many crypto assets don't trade in USD, so you have to convert through BTC/ETH/USDT to get dollar prices), order book update, volume computations, opportunity detection, and other business logic. It also compared timestamps on incoming messages, and each new second, it'd aggregate the messages for that second (I only cared about historical data on a 1s basis) and hand them off to a separate DB thread. DB would do a big bulk insert every second; this is how I kept database writes below Postgres's QPS limit. Client websocket connections were handled internally within Jetty, which I think uses a threadpool and NIO.
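The "aggregate per second, then hand off to the DB thread" step could look roughly like this. It is a Python sketch rather than the original Kotlin, the class name is invented, and it assumes timestamps arrive in order (a real feed would need to handle late messages).

```python
class SecondAggregator:
    """Groups messages by whole second and hands each finished second to `flush`."""

    def __init__(self, flush):
        self._flush = flush   # e.g. puts the batch on a queue read by a DB thread
        self._second = None
        self._batch = []

    def add(self, timestamp, message):
        second = int(timestamp)
        if self._second is not None and second != self._second:
            # A new second started: emit the finished one as a single bulk unit.
            self._flush(self._second, self._batch)
            self._batch = []
        self._second = second
        self._batch.append(message)


batches = []
agg = SecondAggregator(lambda second, batch: batches.append((second, batch)))
for ts, msg in [(10.1, "a"), (10.9, "b"), (11.2, "c"), (12.0, "d")]:
    agg.add(ts, msg)
# batches now holds the completed seconds 10 and 11; second 12 is still open
```

With one flush per second, the database sees one bulk insert per second no matter how many messages arrived.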
Key architectural principles were 1) do everything in RAM - the RDS machine was the only one that touched disk, and writes to it were strictly throttled 2) throw away data as soon as you're done with it - I had a bunch of OOM issues by trying to put unparsed messages in the main producer/consumer queue rather than parsing and discarding them 3) aggregate & compute early - keep final requirements in mind and don't save raw data you don't need 4) separate blocking and non-blocking activities on different threads, preferring non-blocking whenever possible and 5) limit threads to only those activities that are actively doing work.
Would you use Kotlin again for the back end? Having not yet used it for that purpose, it seems like you’d get the benefit of the JVM ecosystem along with a nice language (but perhaps too many power-features).
For that particular use-case (or related financial ones) I'd consider Rust, which was a bit too immature when I was working on this but would give you some extra speed. HFT is winner-take-all and the bar has risen significantly even in the last couple years, so if I were putting my own money at risk now I'd want the absolute fastest processing possible.
I tried to do something similar recently. Your architecture sounds a lot like mine. I did all the thread pool management and JSON parsing using C++ libraries/frameworks.
The real-time data was visualized on a website. Here is an example.
https://algot.io/
True. I can completely relate with this. I developed an open source crypto exchange connector [0] and created a fun twitter bot [1] on top of that. Currently twitter bot processes all the USDT market trades from Binance (around 260 markets with average 30,000 trades per minute) and calculates OHLC metrics every 15 minutes using InfluxDB. All these installations and calculations are done in a free tier 1 VCPU / 1 GB RAM AWS server (less than 10% CPU and less than 40% RAM usage always).
[0] : https://github.com/milkywaybrain/cryptogalaxy
[1] : https://twitter.com/moon_or_earth
They really do, and because of that they reach for over-engineered infrastructure solutions. I mean I get that you'd like to have some redundancy for your webserver and database, maybe some DDOS mitigation, off-site backup, etc, but you don't need an overblown microservices architecture for a low-traffic CRUD application. That just creates problems, instead of solving problems you WISH you had, and it's slower than starting simple.
I totally agree! I run a data science department at a corporation and it's amazing how much of our job is done on our laptops. I have a Dell Precision. When I need more power (a very rare situation), I can spin up a GPU cloud server and complete my big analysis for under $5.
may be OT, but how do you subscribe to these trade feeds, is there a unified service or do you need to do it individually for each source, and how much does it cost approximately ?
I'm guessing if you put all this data into Kinesis or message queues it would end up costing quite a bit more.
There are probably unified services that let you do it - I was kinda competing in this area but didn't want to deal with enterprise sales, and it's a bit of a hard sell anyway.
If you do it individually, there are public developer docs for each exchange that explain how their API works. It's generally free as long as you're not making a large number of active trades.
Never heard of a crypto exchange that charges for data feeds; the norm is free and fast. One of the positives of the industry compared to old school finance.
They're rent seeking in other ways though, no worries.
How do I get in touch with you? Definitely using more resources than this to process fewer integrations. I’m curious what trade offs you made to enable this.
What's the point of this post? OP is serving a file at 50 req/sec. There is not even a mention of a DB query. How does that relate to any kind of normal app?
I guess that the post was written as an answer to the mangadex post [1]. Mangadex was handling 3k req/sec involving DB queries. It was not just a cached HTML page.
50 req/sec for an HTML file is super low, which shows that a $4/month server can't do much actually. So yes, this is enough for a blog, but a lot of websites are not blogs.
> How is that able to relate to any kind of normal app?
There's too much competition involved in writing normal apps, which often attract significant investment that bootstrapped startups struggle to compete with.
It's interesting to see what kind of performance is possible for next to no money, when you throw out basic assumptions like using a database, and then start thinking about what you could build out of it.
My recent submission of HNBadges was made like this. It's just 3 files (html, css, JS) which I hosted for free on Netlify, but could have been hosted on a setup like OP. I used other services for XHR requests. I imagine it got a tonne of traffic from being on the first page, but I wasn't taking metrics.
Another example of clever use of resources is the https://haveibeenpwned.com/ website. Using a bloom filter (I think) to turn what could have been a back-end lookup into a "front end lookup" by requesting a small file from the server based on the password hash.
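For what it's worth, as far as I know the Pwned Passwords lookup is a hash-prefix range query rather than a bloom filter: the client sends only the first 5 hex characters of the password's SHA-1 and compares the returned suffixes locally. A sketch of just the client-side split (no network call; the function name is mine):

```python
import hashlib


def split_hash(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest, split after 5 chars."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


prefix, suffix = split_hash("password")
# Only `prefix` would be sent to the server; `suffix` is matched client-side.
```

Since there are only 16^5 possible prefixes, every response is a small file that can be cached like static content.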
The only issue I have with the OP is his assumption that you'd get a nice smooth 60 requests/second throughout the day! Most likely traffic will be lumpy, and at the top of the lumpy periods (when most of your visitors visit) performance will be bad.
My $5/mo server can handle several thousand requests per second. It’s mostly a question of what server software you use. If you use some node, python, ruby thing, it’s going to be slow as shit and need a reverse proxy in front of it. If you use a fast compiled language with a good framework, you can rip through requests no problem.
I tried a bunch of different stuff and ended up using Haskell - all of its popular web libraries are fast as hell. Go was fast but its standard library leaked sockets or I was not cleaning up connections properly or something, and it would tank whenever something went viral. All the popular interpreted language backends I tried were absurdly slow, like tens of RPS.
Source for my current thing is at http://yager.io/Server.hs. It also does all my RSS stuff, image processing for my photo gallery, etc.
> These benchmarks show that a very cheap server can easily handle 50 requests a minute to a "full stack" website.
I did, and all I see is someone spinning some numbers idly, like, hey, if I can lay 1 brick every second, then with 20000 people we can build a house in one second! So good!
a) entirely and totally lacking in experience running a heavy load website.
b) 50 requests a minute is so atrociously bad, it’s not even worth talking about.
Sure maybe a db exists, but it’s not relevant when you compare this to the complexity of doing write operations.
Ie. this is some hiiiigh level arm chair commentary right here.
Sure, they’re just talking about their website, but anyone going “oh yeah, look at this, those mangadex guys should learn a thing or two and run it on django”. …has no idea what they’re talking about.
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."
One of my proudest moments in my career was when I lowered our app's processing time from ~8 hrs to 17 minutes. When I deployed my first update, it reduced it to 2 hours. The sysadmin immediately contacted me to say that there was something unusual. I confirmed the results but he was skeptical.
Then with my second update, he told me that the app must be broken or that the script must be dying. There is no way it could complete this fast.
What was the issue? We processed terabytes of data. Each and every single line processed created a new connection to the database and left it hanging. A try catch was added when the connections failed and restarted the process. Removing the connection from the for loop and properly handling it reduced the time drastically.
And... why would you loop through millions of records when you can use batches? Also this was a phperlbashton* script. I turned it into a single PHP script and called it a day.
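The shape of that fix, sketched in Python with the standard-library sqlite3 module standing in for the real database (the original was a PHP script; the table and function names here are invented): open one connection, keep it out of the loop, and write in batches.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of up to `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_rows(conn, rows, batch_size=1000):
    """Insert rows over a single connection, one transaction per batch."""
    total = 0
    for chunk in batched(rows, batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany("INSERT INTO records (value) VALUES (?)", chunk)
        total += len(chunk)
    return total


conn = sqlite3.connect(":memory:")  # opened once, outside the loop
conn.execute("CREATE TABLE records (value INTEGER)")
inserted = load_rows(conn, ((i,) for i in range(2500)))
```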
As a consequence, backup time was reduced to 2 hours as opposed to 12 hours (no one was allowed on the website until the back up was done).
I have a similar story where I reduced the memory used by a script from 1TB (yes, TB) to a few megabytes. The runtime was massively reduced too, from something like 1 day to a few minutes.
This was for a genomics project and they ran it on a supercomputer. When I looked into it, they were reading the entire input into a giant array before doing one pass and dumping the result out to disk. I made a tiny change (it was a Perl script) to make it stream the I/O instead.
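The change amounts to something like the following, shown in Python rather than Perl and with a placeholder transform since the actual processing isn't described: iterate over the input one line at a time and write each result immediately, so memory use stays constant regardless of input size.

```python
import io


def process_stream(src, dst, transform):
    """Apply `transform` line by line; only one line is held in memory at a time."""
    count = 0
    for line in src:
        dst.write(transform(line))
        count += 1
    return count


# Stand-ins for the real input and output files.
src = io.StringIO("acgt\nttga\n")
dst = io.StringIO()
lines = process_stream(src, dst, str.upper)
```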
This is the most extreme example I've come across of people using computing power just because it's there. Nobody questioned why the script took so long to run because the data really was in the TBs and other stuff also took that long to run. Waiting a day for the results was considered normal. I see the same thing on desktop apps etc., on a much smaller scale, of course. When I run an Electron app it takes several hundred milliseconds to do anything at all. But nobody questions whether it should because everything takes several hundred milliseconds.
Mine was when I rewrote some application code to be processed within postgres to send a few thousand sms's (the logic to decide who should get them was the slow part + amount of data involved). It went from about 45 minutes to less than 1 minute. Was an amazing feeling to see the sms table filled up with the correct data - I also thought something was broken since we just accepted that those runs were normally supposed to be slow.
I had a similar experience as an intern years ago! Part of my daily responsibility was to manually enable/disable API integrators who had increased levels of traffic as it would take the API down for all partners. Pretty bad. Thousands of requests at peak would bring the server to its knees.
Until I worked out during some minor maintenance task that every request was logged to a flat file. Appended. Every request. The file was probably 100gb by the time I found it and every request log would lock the logging file. The server had been running for a couple of years by that time.
Normally benchmarks for things like this are measured in how many concurrent requests can be handled, i.e the C10K problem, not by how many requests you are able to serve in a day. It's also well known that you can serve a large amount of requests on limited hardware.
"By the early 2010s millions of connections on a single commodity 1U rackmount server became possible: over 2 million connections (WhatsApp, 24 cores, using Erlang on FreeBSD),[6][7] 10–12 million connections (MigratoryData, 12 cores, using Java on Linux).[5][8]"
Although I do understand the boxes listed above have more resources than the VPS you are using. I am also not criticizing your write-up or results; benchmarking is in general interesting to do. I just wanted to provide some additional information.
Yes, handling 50 requests per second doesn't mean the server can handle 4.2 million a day. That can only happen if they are uniformly distributed throughout the day, which isn't the case for most website traffic.
Right. I calculated what 5m/day converts into, and it's about 60 req/sec. Considering uneven distribution and spikes, I would assume the peak is more like 200 req/sec.
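The conversion is simple enough to check. The daily and per-second figures come from the thread; nothing here says what the real peak is, only what the average works out to.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day):
    """Average requests per second if traffic were perfectly uniform."""
    return requests_per_day / SECONDS_PER_DAY


def daily_capacity(requests_per_second):
    """Requests per day if the server ran at this rate around the clock."""
    return requests_per_second * SECONDS_PER_DAY


capacity_at_50 = daily_capacity(50)      # 4,320,000
avg_for_5m = average_rps(5_000_000)      # about 57.9
avg_for_4_2m = average_rps(4_200_000)    # about 48.6
```

So 50 req/sec only equals roughly 4.2 to 4.3 million a day when every second of the day is fully used, which is the uniform-traffic assumption being questioned.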
I would guess they may be muxed over fewer sockets, by their LBs, but that's not strictly necessary.
I'm not sure exactly what you mean by "run out of TCP sockets", but theoretically speaking, the only limitation is how much memory is available to store the necessary info about the socket (like address/protocol info and process info).
In practice, OS's do have a "max socket" or "max FD" limit, but that's usually configurable and (with enough RAM) could easily be set to "millions".
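On Linux and other Unix-likes you can inspect the per-process descriptor limit from Python with the standard `resource` module (Unix only; this just reads the limit, raising it is a separate `setrlimit` call or an OS config change):

```python
import resource


def fd_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = fd_limits()
# Every open socket consumes one descriptor, so `soft` caps concurrent connections
# for this process until it is raised.
```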
> I'm not sure exactly what you mean by "run out of TCP sockets", but theoretically speaking, the only limitation is how much memory is available to store the necessary info about the socket (like address/protocol info and process info).
Probably the 65k port limit since each connection will get assigned a remote port, which can be solved by binding to multiple local ports and using a load balancer in front or using multiple network interfaces.
The 65k limit is per client address and port. Each client can have 65,535 connections to a single port on your server from each and every 65,535 of their own ports. (65,536? I dunno what would actually happen if you tried to use port 0.)
I understand your excitement for being able to handle a decent amount of requests on such a small server, but just like many other websites that get on the frontpage of HN, your site is taking multiple seconds to load for me, depending on when I refresh.
As you said in your post, adding caching to your site increased your throughput by ~20% (or +10 req/sec). What you and other sites seem to lack is more distributed caching, a la CloudFlare, S3 CloudFront, Azure CDN, etc. Those last two only really work well for a static site; however, as mentioned in your post, that's essentially what you're serving.
While I'm all for having a free-as-in-freedom hosting solution and keeping things lean, the internet is a fickle beast, and nothing looks worse for a company who posts on HN when their technology-oriented site can't handle a few thousand requests per minute. (Or in this case, when a blog claims to handle 4.2M requests a day -- 2.9k req/min)
Looks fine over here, and he doesn't have to route through a fucking Internet gatekeeper like Cloudflare or Amazon... let's enjoy this golden era before Chrome starts flagging any site which isn't fronted by a "reputable" cache like Cloudflare, Amazon, or whatever Google decides to introduce.
It would also be possible for OP to spin up their own Redis cache, and have multiple POPs near their target audience, and handle DoS type attacks against their site if need be, and easily be able to brush aside bot traffic, and...
Not all the above apply to a hobby-blog style site, but I wasn't referring only to OP's site in my original comment.
I understand that not everyone needs to feed into "fucking Internet gatekeeper"s as you described, but the fact that they provide valuable services is undeniable. They make a complex operation -- one that could mean the difference between a company being able to sell their product or not -- simple.
For sure, OP's site is handling this much better than most. And like I said, it's not every time that it takes multiple seconds. Some websites featured on HN/Reddit don't load at all when under load. However I was able to get it to take ~30s to load multiple times, over a period of around 10 minutes.
The important message there is that if you can change your problem from serving slow dynamic content to serving static content you can gain enormous performance benefits.
Whether that means actually using static sites for stuff that can be static or just properly caching expensive things. Even dynamic content doesn't have to be slow, but many CMS are seriously inefficient without a cache. I'm not really blaming the CMSes entirely here, part of that is because they need to be extremely flexible, but once you need dozens of DB queries per page it'll fall over quickly on small hardware.
We're using Next.js at my current company with a custom MongoDB based CMS.
Next has a thing called Incremental Static Regeneration[0] which allows us to grab the top ~100 pages from the CMS at build time, generate the pages, then cache them for however long we want. The rest of the pages are grabbed when requested, then are cached for the same amount of time. After the time, they're re-grabbed from the DB, then re-cached. Overall I think we're down to around 5-10% of the way things were done before, which was -- you guessed it -- hit the DB on every page load _just in case_.
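The general "render once, reuse until it expires, then re-fetch" pattern can be sketched as a small time-based cache. This is a plain TTL cache in Python, not how Next.js implements ISR, and the loader and clock parameters are placeholders for the CMS query and the system clock.

```python
import time


class TTLCache:
    """Caches loader(key) results and reloads them once they are older than ttl."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # e.g. a function that renders a page from the DB
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (stored_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: no DB hit
        value = self._loader(key)  # missing or expired: hit the DB once
        self._entries[key] = (now, value)
        return value


# Demo with a fake clock and a loader that counts how often it is called.
fake_now = [0.0]
calls = []
cache = TTLCache(lambda k: calls.append(k) or f"page:{k}", 60, clock=lambda: fake_now[0])
first = cache.get("/home")
second = cache.get("/home")   # served from cache
fake_now[0] = 61.0
third = cache.get("/home")    # expired, loaded again
```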
Sit the Next.js site behind CloudFlare, and then we also don't really pay data transfer costs. Our servers are just low-tier GKE nodes, and we run around 3k visitors at any given time, sometimes spiking up to 8k concurrent.
Even database queries aren't that slow on reasonable hardware, as long as the queries are simple. The problem appears once you have dozens of DB queries per page. It's really not a fair comparison to the site this topic is about, but for trivial queries you can easily get a few thousand requests per second out of Postgres on desktop hardware without any real tuning as long as the DB fits into memory.
But static content is of course still much faster and also much simpler.
Are you sure that's related to the server itself? The page loads instantly for me and the DNS is still resolving to an OVH IP address.
Timing info from Firefox:
Blocked: 0ms
DNS resolution: 8ms
Connecting: 9ms
TLS setup: 12ms
Sending: 0ms
Waiting: 30ms
The very last resource (favicon.ico) loaded after 466ms and that's mostly because of the other files being requested only after the CSS has come in (after about 195ms). All in all the entire site (without the Matomo tracking JS) loaded in half a second.
Maybe the website has switched hosts in the last ten minutes, I guess, but I doubt it. I think this is more likely to be a problem related to distance to origin and saturation of the underlying connection.
Working in the space, that's one of the more frustrating things to see on HN/Reddit/etc. It's not a complex or niche thing, and especially for sites that only make profit when people can actually visit them, it's kind of a necessity to stay up as much as possible.
I don't have a "Do all These for a Fast Website" list handy, but here are some key points I've found can be applied to most sites:
- Make sites that are fast by default: Small bytes sent over the wire, beyond just initial page load, too. Yes, that does mean that your giant Google Tag Manager/Analytics/3rd party script is bloated. Reach out to 3rd parties about reducing their payload size, it's saved me several MB over the years. Also, not writing efficient CSS is a huge killer when it comes to byte size. Devs shouldn't "leave it just in case" when it comes to code, you have version control for a reason. And when a new feature comes out, clear out the old cruft.
- Avoid unnecessary DB calls: Obviously you need to get the data onto the page somehow, but if you can server-side render, then cache that result, you're reducing the overall calls to the DB. Also, optimizing queries to return only-what-you-need responses helps reduce total bytes over the wire.
- Balance between Server and Client side: Not only are servers getting more powerful, so are client devices. Some logic can be offloaded to clients in most cases, but there needs to be a balance. Business-critical logic should probably be done server side, but things like pagination & sorting -- so long as the client will likely see or use all the data -- are fine in my book. Having 2000 rows of JSON in memory is totally OK, but rendering 2000 at once might cause some issues. Again, balance.
- Hopping on the latest-and-greatest bandwagon isn't the best: Devs hate re-writing the site every 6 months, and really the newest framework might not be the best for your use case. Keep up to date with new technology, but saying "not for me" is fine.
- Don't let (non-technical) managers make technology decisions: See above. More often than not, C-levels want to use shiny new things they read an article about on LinkedIn once, no matter if it fits the needs of the company or not. Thankfully I've only been at one place that was like that, but while I was there it was hell. Current VP was an original developer on the site back in the early 00's, so he knows how to deflect BS for us. That VP also knows that he's outdated in his knowledge by now, so he trusts the Devs to make technical decisions that are best for the company.
Working extremely fast for me right now. Your post was about 20 minutes ago. I don't know how HN traffic fluctuates, but it seems really solid compared to most sites.
> Parts of the blog posts are cached using memcached for 10 mins
That means Django needs to accept the request, route it, pull the data from memcached, render the template.
For such a site I'd just set the `Cache-Control` headers and stick Varnish in front of it acting as a reverse proxy. That'd likely reduce the page load times significantly and make the backend simpler: no worrying about manually caching in memcached, just setting the correct `Cache-Control` HTTP header.
As it's budget hosting I'd probably not even bother with Varnish and outsource that to Cloudflare's generous free tier. It's cheating, as your server (origin) isn't doing 4.2m requests, but the practicality is really convenient.
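As a rough illustration of "just set the `Cache-Control` header", here is a minimal WSGI sketch using only the standard library. `blog_app` and `call_app` are hypothetical names, not the site's actual Django code, and the `max-age=60` value is an arbitrary assumption; the `s-maxage=600` mirrors the 10 minute memcached lifetime quoted above. Whether a given proxy or CDN actually stores HTML responses also depends on how it is configured.

```python
from wsgiref.util import setup_testing_defaults


def blog_app(environ, start_response):
    """Minimal WSGI app that marks its response as cacheable by shared caches."""
    body = b"<h1>Hello from the origin</h1>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # public: proxies/CDNs may store the response.
        # max-age: lifetime for any cache; s-maxage: override for shared caches only.
        ("Cache-Control", "public, max-age=60, s-maxage=600"),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app in-process and return (status, headers dict, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

The point of the design is that the application only declares how long a response stays fresh; the reverse proxy in front of it does the storing and serving.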
Hey OP, in case it wasn't clear from my original comment, I am impressed with how stable your site is! "A few seconds" is wonderful for a #1 post, and far better than what usually happens to lightly-hosted sites that get to this point. Those are usually timeouts or outright failures to connect.
I just figured I'd start some conversation on the post, since there weren't any comments when I initially looked. For better or for worse it seems like I got people talking.
All these "X requests per unit time" posts are starting to make me want to break out some of my experimental code... I have some services that can process several million events per second. This includes: compressing the event batch, persisting to disk, validation of business logic, execution of all view updates (state tracked server-side), aggregation and distribution of client update events, etc. These implementations are easily capable of saturating NVMe flash.
If you want to see where the theoretical limits lie, check out some of the fringe work around the LMAX Disruptor and .NET/C#:
You will find the upper bound of serialized processing to be somewhere around 500 million events per second.
Personally, I have not pushed much beyond 7 million per second, but I also use reference types, non-ideal allocation strategies, etc.
For making this a web-friendly thing: The trick I have found is to establish a websocket with your clients, and then pipe all of their events down with DOM updates coming up the other way. These 2 streams are entirely decoupled by way of the ringbuffer and a novel update/event strategy. This is how you can chew through insane numbers of events per unit time. All client events get thrown into a gigantic bucket which gets dumped into the CPU furnace in perfectly-sized chunks. The latency added by this approach is measured in hundreds of microseconds to maybe a millisecond. The more complex the client interactions (i.e. more events per unit time), the better this works. Blazor was the original inspiration for this. I may share my implementation at some point in the near future.
> The trick I have found is to establish a websocket with your clients, and then pipe all of their events down with DOM updates coming up the other way. These 2 streams are entirely decoupled by way of the ringbuffer and a novel update/event strategy.
Could you detail this, please? I don't get it. What is the flow?
1. Browser is sending events to web server via web socket, instantly as the event is occurring (?)
You got 1 correct. Everything that happens gets sent immediately as an event to the server (e.g. KeyDownEvent). These are pushed without blocking for a response to each - The websocket guarantees delivery and ordering.
Upon receiving an event from the client socket, it is immediately inserted into the LMAX ring buffer for processing.
Updates to the client are triggered by events+state determining when a redraw is required and issuing a special "ClientRedraw" event into the same queue. These events are grouped by client so that we can aggregate multiple potential updates in a single actual redraw. These result in view updates being pushed back down to the relevant clients. One performance trick here is that the client redraw is dispatched asynchronously from the server, so there is no blocking on processing the subsequent batches each time.
You can think of an E2E client view update as always requiring 2 events - the client event that triggered the change to domain state, and the actual redraw event(s) that result. For applications where the client should update at a fixed interval (e.g. game), a high performance timer implementation injects periodic redraw events. Because the upper bound of the ring buffer latency is around a millisecond, this allows for incredibly low jitter on real time events. Scheduling client draws as simple domain events is feasible.
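For anyone else trying to picture the flow, here is a single-threaded Python sketch of it. It is not the author's implementation and not the LMAX Disruptor: a plain `deque` stands in for the ring buffer, `Event` and `BatchProcessor` are hypothetical names, and redraws are recorded in a list instead of being dispatched asynchronously over a websocket.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple


@dataclass
class Event:
    client_id: str
    kind: str          # e.g. "KeyDown" or "ClientRedraw"
    payload: str = ""


class BatchProcessor:
    """Single-threaded stand-in for the ring buffer consumer described above."""

    def __init__(self, batch_size: int = 4) -> None:
        self.queue: Deque[Event] = deque()
        self.batch_size = batch_size
        self.state: Dict[str, str] = {}          # per-client domain state
        self.sent: List[Tuple[str, str]] = []    # (client_id, view) pushes

    def publish(self, event: Event) -> None:
        """Client events are enqueued without waiting for a response."""
        self.queue.append(event)

    def process_batch(self) -> None:
        """Drain up to batch_size events, then emit at most one redraw per client."""
        dirty = set()
        for _ in range(min(self.batch_size, len(self.queue))):
            event = self.queue.popleft()
            if event.kind == "ClientRedraw":
                dirty.add(event.client_id)
            else:
                old = self.state.get(event.client_id, "")
                self.state[event.client_id] = old + event.payload
                # A state change requests a redraw through the same queue.
                self.queue.append(Event(event.client_id, "ClientRedraw"))
        # Redraw events in the batch are grouped by client and coalesced.
        for client_id in sorted(dirty):
            self.sent.append((client_id, self.state.get(client_id, "")))
```

Publishing three key events for client "a" and one for client "b", then processing two batches of size 8, produces four redraw events in the queue but only two view pushes, one per client.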
This is a good, simple way to show how much can be done with modest resources.
Sometimes we see people fetishizing bigger and faster, then gatekeeping when people want to do the same work with modest means, whether it is a four quid a month hosting service or a first generation Raspberry Pi. Not everyone has the money or desire for bigger & faster, and it's nice to see that here.
I'm not sure it's so much about fetishizing, and more about realistic expectations around software development in larger teams.
If you are the sole developer working on your own site - be it a side project/hobby/labour of love or your source of income - you have complete control up and down the stack and have the leeway to tweak performance wherever needed - whether that's indexing and optimizing queries in the backend, reducing the size of your static assets, caching, whatever. You can even yank whole features if you feel their inherent complexity and load outweighs their usefulness.
In anything including and above a medium sized company, a single developer will rarely have the leeway to do anything beyond tinker with their small slice of the stack. They might spend some hours carefully optimizing a query, but it's for naught because the frontend team have screwed up the webpack settings and the JS load runs into many MB. Or you have both done your jobs but the PM wants a ton of analytics on every page. And the CEO's pet feature is a maintenance and performance nightmare but nobody has the clout to have it removed or even simplified. Nobody wants to waste sprints on paying down tech debt in a feature factory, so it becomes progressively harder to fix performance issues.
At that point, the cheaper and politically easier option is to just fire the money cannon at expensive cloud services and hope the extra spend squeezes out some performance gains.
Absolutely. Even a terrible Wordpress instance can be beautifully (and transparently!) cached behind either nginx or varnish with ease, in which case you’re just serving static html pages and can probably handle any traffic you are likely to ever get.
I'm hosting my static blog site on a physical Raspberry Pi 3B+ powered by Solar [1].
That blog post got hugged by HN but it didn't even raise the CPU above 10% on a single core.
And a Raspberry Pi 3B+ is dog slow. And severely limited by bandwidth, unlike the Raspberry Pi 4B. (But it uses less power so that's why I use a 3B+).
However I have another point to make. Professional rack-mount servers from HP and Dell can be had second hand for dirt cheap and you get a ton of CPU (20+ cores) and an ocean of RAM for next to nothing.
For many applications, an old Gen8 or similar Dell server will perform more than adequately. Even more so if you have a little bit more to spend on Gen9.
They are so cheap that you can like buy four to eight, sprinkle them across two different datacenters and even if one breaks, you won't be in any hurry.
That estimate can be verified with load testing systems like Artillery. My theory is that things would break far sooner than estimated along the following lines:
- Too many WSGI connections if the timeouts aren’t tweaked
- Too many database connections, especially without caching and tuning
- On the Apache side, if MaxRequestWorkers isn’t set there will be memory issues with 1GB RAM
- the disk could easily hit IOPS limits, especially if there is a noisy neighbor
It’s not likely all or any of these things will hit IRL, but that all depends on traffic and usage patterns. It matters not, if you were getting 4.2 M requests each day you’d be in the Alexa Top 1000 and could probably shell out for the $8 server :)
All these examples show even more why software engineering (not development) is a discipline where a 10x salary difference can be seen between the best and the worst. For the 10x you are paying, you are getting a (n^2 - logN) times performance gain, especially when you are dealing with problems with large amounts of data.
However, relying on people themselves is often not the most stable solution. I am wondering if all these N^2 mistakes people make can be prevented by innovative means like language features, framework improvements, tooling, etc.
And I'm talking about prevention, not the post mortem perf measure and fix kind
Very true. It also makes a difference as to which resource is being pulled, whether it is cached, what transport is being requested (SSL, compression, etc).
I really suspect the website would fall long before it hits anything close to 4.2 million requests (which the author also seems to expect).
The comments in this thread surprise me quite a lot. I suppose they shouldn't, but they do. This post + the responses bring to the surface how badly basic system operations knowledge is needed in the industry and how much of it is missing from the toolkit of most developers.
I don't have any one complete book that I can recommend, and I don't even really have a great reading list for this. But I'll make an attempt to share what I think is useful as a starting point.
1. Systems Operations is first and foremost about understanding systems, in all of their complexity, which means understanding the internals of your OS primarily.
2. Performance and networking, in particular, are super important areas to focus on understanding when it comes to learning the topic to help with software development.
3. A lot of it is about understanding concepts in abstract and being able to extrapolate to other situations and apply these concepts, so there's actually quite a lot of useful information that can be learned on one OS and still applied to another OS (or on one game engine and applied to another, et al).
Here's a few books I think are worth reading, not in any particular order of prevalence, but loosely categorized
Part 2: https://www.amazon.com/Windows-Internals-Part-2-7th/dp/01354... (I had the pleasure of being taught from this book by Mark Russinovich and David Solomon at a previous employer, was an amazing class and these books are incredible resources even applied outside of Windows, we used 5th edition, I linked 7th, which has the 2nd part pending publication).
The Art of Computer Systems Performance Analysis: https://www.cse.wustl.edu/~jain/books/perfbook.htm (no longer available from Amazon, but is available direct from publisher. This is basically the one book you should read about creating and structuring benchmarks or performance tests)
I guess that's a "reading list", but this is just a small part of what you need to know to excel in systems operations.
I would say for the typical software developer writing web applications, the most important thing to know is how databases work and how networking works, since these are going to be the primary items affecting your application performance. But there's obviously topics not included in this list that are also worth understanding, such as browser/DOM internals, how caching and CDNs work, and web-specific optimizations that can be achievable with HTTP/2 or QUIC.
For the average software developer writing desktop applications, I'd say make sure you /really/ understand OS internals... at the base everything you do on a computer system is based on what the OS provides to you. Even though you are abstracted (possibly many layers) away from this, being able to peel back the layers and understand what's /really/ happening is essential to writing high-quality application code that is performant and secure, as well as making you a champ at debugging issues.
If you're trying to get into systems operations as a field, this is just a brush over the top surface and there's a lot deeper diving required.
I'm not sure there was ever an argument saying otherwise. The ease of processing X million requests is heavily dependent on what those requests actually do. Trivial use cases shouldn't be a surprise to have high throughput.
And here I am building a simple API using Lumen (Laravel's stripped down, hopefully faster cousin) getting response times that are just abysmal.
The raw queries themselves are fast enough, but for some reason running them in a framework, transforming them into a Resource and dumping it as JSON takes so long that I'm scared to find out what this super popular framework is even doing under the hood.
Once I learn enough Python I'd like to compare its performance to something like FastAPI. But even that probably won't come near what these recent posts are describing.
(Disclaimer - it's just a side project and I haven't really looked into making it faster)
Working on other companies' mobile apps, about half the performance problems I've discovered have been down to some accidental crazy, like initialising something you only need once, in a loop, in 3 different places (because the code has become so unnecessarily complicated that no one really knows what it's doing). The rest are due to some piece of code accidentally blocking the UI thread.
A well written mobile app doesn't really have any need to be sluggish at all, including smooth animations and fast scrolling lists, it was doable 10 years ago, it's doable now. (*I don't know about games).
But unlike on the server side, the accepted wisdom in most places I've worked at is that the answer to the performance problems is: a new framework.
(I feel like this is a lie that developers tell the business side, and maybe themselves. It avoids having to explain that software is hard, sometimes you don't get it right the first time, and if you don't spend time and effort tending to it, it can turn into an ungodly and expensive mess - and that's got nothing to do with the hardware or the framework)
Re: benchmarking, sometimes the bottleneck is the machine or server that issues the requests, not the receiver that you are testing. To figure out your actual capacity, you sometimes need multiple request servers or a more powerful request server. This was the case for a project I did a few years ago. Not a critique of the blog post, just remembering something out loud
His site, https://peepopoll.com/, took about 10s to load for me. It’s also good to chart other metrics like response times while you benchmark. Requests per second isn’t the same as a low response time
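To show why requests per second alone can hide a slow site, here is a small Python sketch that summarizes a benchmark run with both throughput and latency percentiles. `summarize` is a hypothetical helper, and the sample numbers in the note below are made up for illustration, not measurements of any real site.

```python
from typing import Dict, List


def summarize(latencies_ms: List[float], duration_s: float) -> Dict[str, float]:
    """Summarize a benchmark run: throughput plus nearest-rank latency percentiles."""
    if not latencies_ms or duration_s <= 0:
        raise ValueError("need at least one sample and a positive duration")
    ordered = sorted(latencies_ms)

    def percentile(p: int) -> float:
        # Nearest-rank method, in integer arithmetic: rank = ceil(p * n / 100).
        rank = max(1, (p * len(ordered) + 99) // 100)
        return ordered[rank - 1]

    return {
        "rps": len(ordered) / duration_s,
        "p50_ms": percentile(50),
        "p95_ms": percentile(95),
        "max_ms": ordered[-1],
    }
```

For example, 100 requests completed in 2 seconds is 50 rps, but if 10 of them took 10 seconds each the p95 is 10,000 ms even though the median is 20 ms. Charting the percentiles next to the throughput makes that visible.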
Indeed. Recently a client needed to bench raw req/s processing power of their application server and I had to ask for a powerful server running on the same DC in order to discard any potential routing issues.
Combined with a CDN you can do a lot more; I like to think in terms of origin requests rather than raw req/s. The problem I have is getting people to really understand how caching works at the CDN and browser layers and design their frontends and backend responses around that. There is also a lot you can do with edge compute to clean the incoming requests before the CDN evaluates them to increase cache hits. Even if you are trying to give a "real time" view of data, caching it for even a second and allowing stale data to be served while it is updating can reduce origin requests significantly. I've seen people hammer sites hundreds of times a second looking for changes that only happen once every fifteen seconds or once per minute - the best thing you can do for yourself is handle all those requests at the CDN level (eventually you'll do log analysis and see the activity and can take other measures, but in the meantime don't let all of those requests go to the origin). Your CDN is probably giving you better rates for network egress than Amazon or Google anyway - the latter are more focused on incentivizing you to use their ecosystem exclusively by penalizing you for sending data "outside". Cheap VPS hosts discourage you exceeding your bandwidth allocation because they are overselling their capacity and heavy usage upsets that - so again you want to shift as much as you can to your CDN.
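A quick way to see how much even a one second cache helps is to count origin hits for a stream of polling requests. The Python sketch below models a plain TTL cache in front of the origin; it does not model background revalidation, `origin_hits` is a hypothetical helper, and the 200 requests per second figure in the note is an assumed stand-in for "hundreds of times a second".

```python
from typing import Iterable


def origin_hits(request_times_ms: Iterable[int], ttl_ms: int) -> int:
    """Count requests that reach the origin when responses are cached for ttl_ms."""
    hits = 0
    cached_at = None
    for t in sorted(request_times_ms):
        if cached_at is None or t - cached_at >= ttl_ms:
            hits += 1          # cache miss or expired entry: go to the origin
            cached_at = t
    return hits
```

One minute of polling at 200 requests per second is `range(0, 60_000, 5)`, or 12,000 requests. With `ttl_ms=1_000` the origin sees 60 of them, and with `ttl_ms=15_000` (matching data that changes every fifteen seconds) it sees 4.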
It’s the database part that gets expensive for web applications. Serving up static web pages is absolutely trivial for modern servers.
The database is also the part that doesn’t easily scale, unless you pick a highly scalable database from the outset, and those have their own complexity and tradeoffs as well.
That’s why I believe every project should start with a bulletproof model of how the database will work first, then fill in the other details from there.
It’s not always as easy as picking Postgres and calling it a day, unfortunately.
50 rps is not that much, though of course easily sufficient for many situations. This is also Django which certainly isn't the fastest choice. I played around with it a long time ago and liked it quite a bit, but you don't choose Django for performance but for the other benefits.
I'm really more surprised that static serving is so slow at 180 rps. This should be able to easily saturate the network, statically serving files is very, very fast. From what I see in the blog I doubt that the files are very large, so there is probably some other bottleneck or I'm missing something here.
I run a FOSS Pi-Hole esque public DoH resolver in the most expensive way possible (over Cloudflare Workers) and it costs $2 for 4M requests. Granted the CPU slice is limited (50ms) but the IO slice (30s) is plenty for our workloads (stub dns-resolution).
The reason this is cheaper in a sense is because Workers deploys globally and needs zero devops. Per our estimates, this setup (for our workload) gets expensive once the request range goes beyond 1.5 billion a month after which deploying to colos worldwide becomes cheaper even with associated cost of devops.
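For a sense of scale, the figures above can be turned into a back-of-the-envelope monthly cost. This Python sketch assumes the price stays linear at the rate implied by "$2 for 4M requests" (0.50 USD per million); actual billing tiers and free allowances are not modeled, and `workers_monthly_cost` is a hypothetical helper.

```python
def workers_monthly_cost(requests_per_month: int,
                         usd_per_million: float = 0.50) -> float:
    """Linear per-request cost; 0.50 USD per million is derived from "$2 for 4M"."""
    return requests_per_month / 1_000_000 * usd_per_million
```

Under that assumption, 4 million requests cost 2.00 USD and the 1.5 billion requests per month break-even point corresponds to about 750 USD per month.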
When my stuff has been shared on HN and Reddit, I’ve seen peaks of around 500 rps (according to google analytics), so if you can hit 1k rps you’re almost certainly ready to weather any kind of viral sharing that a blog might experience. Several krps on cheap VPS hardware is easy with a good compiled language backend (Haskell’s Warp is a good one). If you use a node/python/ruby/other interpreted backend, you will need aggressive caching through a reverse proxy.
This story reminds me of the dot-com bubble. Dot-com companies bought servers from Sun Microsystems because they needed to handle large amounts of traffic that "PC" servers couldn't handle.
You have clearly stated in your footnote that if anything goes wrong then the given numbers can go down. To be honest, that's what happens all the time. Developers do something wrong and the number of requests just drops to the floor.
Only static websites can handle a large number of requests at low cost. Web hosting providers don't make money out of those clients, so they run shared plans.
Most of the times the more interesting question is not really how to make a server that can handle 4.2M requests a day, but how to make something so useful that it gets more than 100 pageviews a day.
I was skeptical that it would fail under real situations (the calculation says it's 50 requests/second). But it looks like reaching the top of HN didn't crash it.
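The conversion between the daily figure and a per-second rate is simple enough to check by hand. A minimal Python sketch follows, with hypothetical helper names; note that this is an average over the whole day, and real traffic arrives in peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second if the daily total were spread evenly."""
    return requests_per_day / SECONDS_PER_DAY


def requests_per_day(rps: float) -> float:
    """Daily total if a server sustained rps for a full 24 hours."""
    return rps * SECONDS_PER_DAY
```

`average_rps(4_200_000)` is about 48.6, and sustaining 50 requests per second for a day would be 4,320,000 requests, so "4.2M a day" and "50 requests/second" are the same claim to within rounding.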
The new metric for server performance is "unique buzzwords per paragraph". If you don't write a 4 page blog post with at least 4-5 ubpp then it is shit.
I used to be on the top 500 of Alexa with a shared hosting server running PHP. Never had any issue. Mind you, I wasn't doing anything complicated, but still.
You're running a 20-player MC server with that hardware? How? In my experience Minecraft servers are insanely resource hungry, especially whilst generating new parts of the map or running larger redstone contraptions. Back when I played around with hosting servers I used the "most optimized" MC server fork called Tuinity, and even with that I had to allocate way more CPU and RAM to the VMs than you use. Would love to hear about your setup.
There are a few comments in here that predictably suggest that simple static sites can handle large request rates easily.
Sure, that's true - but to try to progress the conversation: how would you measure the complexity of serving web requests, in order to perform more advanced cost comparisons?
(bandwidth wouldn't be quite right.. or at least not sufficient - maybe something like I/O, memory and compute resource used?)
apache + wsgi isn't even close to being the most performant webapp server software, either. Bet he'd get 5x the performance out of nginx + lua on the same virtual hardware.
Even WordPress would work fine if we use a plugin like WP Super Cache (no idea why they don't cache things by default). It wouldn't beat a simple static page, but WordPress + Cache plugin + cheap VPS can easily handle #1 on HN.
It feels to me like most websites out there could run on way less hardware, if only people would embrace a few things.
#1 Minimalism. You don't need 400 KB of JS to display some mostly text content to your users with some interactivity sprinkled in.
You don't need to reinvent office software, or very rich text editors in browsers, stop using the web as a universal delivery platform/mechanism, because that's not what it was meant for. When browsers will ship integrated dependencies so that even CDNs don't need to be hit (like versions of jQuery, Bootstrap and numerous JS frameworks as well as WASM code like Blazor which contains a .NET runtime), then you'll be able to do that, but arguably that will never happen.
Use the web as a platform for displaying primarily text content with the occasional images, forms and a little bit of interactivity sprinkled in. Most sites out there simply aren't and shouldn't be like this (that said, when you have exceptional reasons for throwing aside that suggestion, do so): https://geargenerator.com
#2 Static content. You don't need to use Wordpress, Drupal, Joomla or many of the other CMSes out there, since they can get really heavyweight with numerous plugins and are not only a security challenge, but are also problematic from a performance perspective.
Consider using static site generators instead. When reading an article of yours, the DB shouldn't even be hit, since most of the article contents are unlikely to change often, so you should be able to pre-render each of the article versions as a set of static HTML and use the common JS/CSS that you already have for the rest of the articles. Furthermore, it's easy to just jump into CMSes and introduce ungodly amounts of complexity, all of which cause your back end to process bunches of code for each request. Static files don't have that drawback.
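As a rough picture of what "pre-render each article as static HTML" means, here is a minimal Python sketch of a build step. It is a toy, not a replacement for a real static site generator; the `PAGE` template, the article dictionary keys, and `build_site` are all hypothetical.

```python
import html
from pathlib import Path
from string import Template
from typing import Dict, Iterable, List

# Hypothetical page template; a real generator would use a full template engine.
PAGE = Template("<!doctype html>\n<html><head><title>$title</title></head>"
                "<body><h1>$title</h1><p>$body</p></body></html>\n")


def build_site(articles: Iterable[Dict[str, str]], out_dir: Path) -> List[Path]:
    """Render every article once, at build time, into out_dir/<slug>/index.html."""
    written = []
    for article in articles:
        target = out_dir / article["slug"] / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        page = PAGE.substitute(title=html.escape(article["title"]),
                               body=html.escape(article["body"]))
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

After the build, the web server only has to hand out files from `out_dir`; no application code or database query runs when someone reads an article.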
#3 Caching. Know when and what to cache, and how. Images, JS files, CSS files and even entire HTML pages should be cache friendly. Know which ones aren't, make exceptions for those and cache everything else.
Not only is it not necessary to hit the DB for many of the pages in your site at all, but also sometimes you shouldn't even hit the back end either. The most popular pages of your site should just live in a cache somewhere, be it within your web servers or a separate solution, so that they can be returned instantly. HTML is good for this, use it.
Furthermore, know what cache policies to use. Sometimes even the cache resources shouldn't be redownloaded, if the user already has these resources loaded from a different page. Use bundle splitting responsibly, extract common functionality in easily cacheable bundles and set the appropriate headers.
I don't claim to know it all, but working towards the goal of efficiently using pages should definitely be viewed as an important one: be it because you want to pay less for your infrastructure, or care about the environment, or even just want to manage fewer nodes.
Instead, nowadays far too many orgs just try to be the first to market and ignore the engineering based approach to ensuring that the solutions are not only functional but also sustainable. That saddens me.
You can also handle 50 requests per second on a 66MHz 486DX2 with 16MB of RAM and a 10Mbit/s network card. Not with modern "I have infinite resources" software, but we used to handle more than that traffic regularly in the early 90s.
Github/Gitlab pages is my choice for being free, managing https and just generally being pretty good. You can even use the CI to compile the site for you.
Ah, kids these days. So I guess by "webserver" grandparent meant "A server you actually control", and by "not webserver" people mean "It's on the cloud somewhere".
GH/GL-pages still respond to HTTP/HTTPS requests, in my dictionary that's a webserver, but I guess for the millennials (I say with snark) it means "I don't have to think about what happens, it just serves my content for me."
I read "do you need a webserver" like "do you need one yourself" and freeloading off someone else's server like with GL pages counts as not having a webserver. Otherwise how else would you serve content?
Broadly speaking people on HN have no clue how to set up a performant httpd/app server and are impressed by abysmal performance/cost metrics like this or the MangaDex post. Everything these days is obscured through multiple layers of SaaS offerings and unnecessary bloat like Kubernetes.
~10k rps (it was concurrent connections but close enough) was state of the art in 1999. Now 22 years later ~50 rps is somehow impressive.
In my opinion it isn’t so much of a lost art or lamentable accretion of useless abstractions, but an increase in the scope of what web apps do these days. Most of us aren’t working on static sites or simple CMS publishing; those are a solved problem. Instead we’re building mobile banks, diagnostic systems, software tooling, 3D games, and shopping malls. The complexity is inherent in the maturity of the web and its many uses, as well as its global scale. Hugs of death are rare these days thanks to better architecture and infrastructure, even though the scale of users has grown 100X.
Yes there are wildly unnecessary abstractions that are used for small sites/apps, but I would contend they are artifacts of someone who is trying to learn something new, and/or get promoted. I have no problem with the former.
> ~10k rps (it was concurrent connections but close enough) was state of the art in 1999. Now 22 years later ~50 rps is somehow impressive.
I honestly don't understand how that can be true. I'm not suggesting you're lying of course, but when you put it this way it's almost like people are actively trying to slow their programs down. I have a few ideas on why that might be the case (switch to slow interpreted languages, switch to bigger web frameworks, bigger payload) but even that wouldn't explain all of it. Do you have any idea why things are this way?
This is just basic use the right tools for the right job 101. You've got what is basically a static website. You want to serve static files. To do that, you use a fast language and/or servers written in those languages.
It's something anyone who has done this for any length of time knows; that HN is impressed by this is confusing to some of us. If you were trying to get as little out of your server as possible you'd serve cached content using this framework in this language.
I thought you meant from 10k to 50 rps doing the same work, not that most of the work could be avoided in the first place.
> Is this stuff not being learned?
I don't know if it is. I recently finished my studies, and most people had no curiosity at all. As in, they learned a framework early, used it everywhere, and got a job using it. I do remember reading a few times in tutorials that you should put Nginx as a reverse proxy in front of your Django/Flask/Express server to serve static files, so I think most people know/do that but I'm not sure.
On the other hand, having the wisdom of knowing what can be static in the first place? I don't think that it's something taught. In fact this kind of wisdom can be hard to find outside of reading lots of sources frequently in hope of finding little nuggets like that. I don't think I was ever taught explicitly "You should first try to find out if the work you're trying to do is necessary in the first place". In a way it's encoded in YAGNI, but YAGNI isn't universal, and is usually understood at the code level and not the tooling level.
> On the other hand, having the wisdom of knowing what can be static in the first place? I don't think that it's something taught.
All traffic is static by definition. You are not modifying bytes when they are in transit to user. And you don't have to serve different bytes each microsecond just because users want to be "up-to-date". The network latency is usually around 40ms or so. If your website serves 1000s of requests per second, you should be able to cache each response for 10ms, and no one will ever notice (today this is called "micro-caching").
Of course, most webpages can't be cached as a whole: they have multiple "dynamic" parts and have to be put together before serving to the user. But you can cache each of those parts! This is even simpler if you do client-side rendering (which is why MangaDex's abysmal performance is pathetic).
Then there are ETags: arbitrary strings that can be used as keys for HTTP caching. By encoding information about each "part" into a substring of the ETag you can perform server-side rendering and still cache 100% of your site within a static web server, such as Nginx. The backend can be written in an absolute hogwash of a language such as JS or Python, but the site will run fast because most requests will hit Nginx instead of the slow backend. ETags are very powerful; there is literally no webpage that can't be handled by a well-made ETag.
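A rough sketch of what "encoding each part into a substring of the ETag" could look like. The part names, the `name-version` layout, and both function names are my own assumptions; the exact-match check also ignores weak validators and comma-separated `If-None-Match` lists that a real server would have to handle.

```python
from typing import Callable, Dict, Optional, Tuple


def make_etag(part_versions: Dict[str, int]) -> str:
    """Build a quoted ETag whose substrings are the versions of each page part.

    The "name-version" layout joined by dots is an assumed convention.
    """
    encoded = ".".join(f"{name}-{version}"
                       for name, version in sorted(part_versions.items()))
    return f'"{encoded}"'


def respond(if_none_match: Optional[str],
            part_versions: Dict[str, int],
            render: Callable[[], bytes]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer 304 without rendering when the client's ETag is still current."""
    etag = make_etag(part_versions)
    if if_none_match == etag:          # simplified: exact match only
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, render()
```

Bumping the version of any one part changes the ETag, so only clients holding a stale copy trigger a fresh render.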
Even pages that need to be tailored to the user's IP can be cached. It is tricky, but possible with Nginx alone.
Instead of "static" you are better off thinking in terms of "content-addressable".
> On the other hand, having the wisdom of knowing what can be static in the first place? I don't think that it's something teached.
I think the trick is realising that reaching for a "programming language" is just one of the tools we have to solve a certain problem, and probably the last one we should reach for! For a stable system, you want fewer moving parts. A good programmer fights for it.
Can you solve a problem just by storing a JSON file somewhere? Can you solve a problem without a backend? Can you solve a frontend problem with just CSS or just HTML? Can you solve a problem without Javascript? Can you solve a data storage problem with just a database instead of database+Redis? Do you really need a full-fledged web framework where a micro-framework would suffice? Do you need micro services, Kubernetes, containers and whatnot for your site before it gets its first visitor?
I find that a lot of people go for the "more powerful" tool just to cover their asses. They don't want surprises in the future, so they just go for something that will cover all bases. But what you actually want is the things with the least power [1].
Another issue is that intelligent people have an anti-superpower called "rationalisation". They can justify every single decision they make, as misguided as it is. So it doesn't matter if a website could be done with a single HTML file: it is always possible to find a reasonable explanation for why it needed k8s, micro-services and four languages.
Software has become slower and less efficient because we have faster hardware to run it on and nobody's asking for the software to get more efficient. Back in the day you had inefficient, expensive hardware with limited software. There was demand to make software as performant as possible so it didn't cost you ten grand to serve a popular website. But now you can load up on hardware for pennies on what it used to cost. 16 cores? Sure! 32 gigs of ram? No sweat. Software can now be bloated and slow and it won't break the bank. On top of that, we now have free CDNs, free TLS certs, even free cloud hosting, and domains are a couple bucks. You can serve traffic for free today that would have cost $50K 15 years ago.
Sometimes there's also a misunderstanding of metrics that leads devs to not think about performance tuning. Like, "4.2M requests a day" is clearly incorrect at this 50rps benchmark. Traffic is not linear, it's bursty. You will never serve 50rps of human traffic steadily for 24 hours. If you're serving 4.2M requests per day, 90% of it will be in a 12 hour window, peaking at whenever people have lunch or get off work, with a short steep climb leading to a longer tail. So to not crash your site at peak visitorship, you realistically need to handle 300+ rps in order to achieve 4.2M requests per day. (But also that's requests per second... if under load it takes 5 seconds to load your site, you can still serve a larger amount of traffic, it's just slower... so a different benchmark is also "how many requests before the server literally falls over")
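The arithmetic is easy to check. This sketch only uses the numbers from the comment above (50 rps, 4.2M requests a day, 90% of traffic in a 12 hour window, a 300 rps peak); the function names are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def flat_daily_capacity(rps: float) -> float:
    """Requests per day if the server ran at a constant rate for 24 hours."""
    return rps * SECONDS_PER_DAY


def busy_window_average_rps(requests_per_day: float,
                            share_in_window: float = 0.90,
                            window_hours: float = 12) -> float:
    """Average rate inside the busy window, before accounting for any peak."""
    return requests_per_day * share_in_window / (window_hours * 3600)


flat = flat_daily_capacity(50)                  # 50 rps held flat all day
busy_avg = busy_window_average_rps(4_200_000)   # average inside the 12 h window
implied_peak_factor = 300 / busy_avg            # what the 300 rps estimate implies
```

So a flat 50 rps does add up to a bit over 4.3M a day on paper, but the busy window alone already averages 87.5 rps, and the 300+ rps figure corresponds to a peak roughly 3.4 times that average.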
The best/fastest web servers today manage >600k rps on a single server [0].
Django is on that list, near the bottom, at 15k. Clearly his request response is more complicated than a simple fortune.
There are a few factors I've experienced here. Interpreted languages have encouraged development patterns that are slow. Ease of allocating memory has tended to promote its overuse. Coding emphasis has been heavily weighted on developer productivity and correctness of code over lean and fast.
I find that poor web server configurations are pretty common. Smaller shops tend to use off the shelf frameworks rather than roll their own systems. Various framework "production" setups often don't include any caching at all. Static files are compressed and then sent on every request instead of preserving a pool of pre-compressed common pages/assets/responses. It's like the framework creators just assume there's going to be a CDN in front of the system, and so they don't even try to make the system fast.
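For what it's worth, a "pool of pre-compressed assets" can be as small as this. It is a sketch under my own assumptions: `PrecompressedAssets` is a made-up class, only gzip is handled, and the `Accept-Encoding` parsing ignores q-values.

```python
import gzip
from typing import Dict, Tuple


class PrecompressedAssets:
    """Compress each static asset once, then reuse the bytes for every request.

    Hypothetical helper for illustration; not part of any framework.
    """

    def __init__(self, assets: Dict[str, bytes]):
        self._raw = dict(assets)
        # The expensive step happens here, once, not on every request.
        self._gzipped = {path: gzip.compress(body)
                         for path, body in assets.items()}

    def get(self, path: str,
            accept_encoding: str = "") -> Tuple[Dict[str, str], bytes]:
        # Simplified header parsing: a plain token check, no q-values.
        encodings = {token.split(";")[0].strip().lower()
                     for token in accept_encoding.split(",")}
        if "gzip" in encodings:
            return {"Content-Encoding": "gzip"}, self._gzipped[path]
        return {}, self._raw[path]
```

Every request for the same path gets the same already-compressed bytes object, so the per-request cost drops to a dictionary lookup.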
The latest crop of web devs have very little experience setting up production systems correctly. Companies seem more interested in AWS skills than profiling. Then you have architectural pits like microservices. Since there is so little emphasis on individual system performance it seems that it has become or is becoming a lost skill.
Then there is so much money being thrown at successful SaaS that it just doesn't matter that their infrastructure costs are potentially 50x what they actually need. It seems that the only people squeezing performance out of software are the poor blokes who are scrimping by on shoestrings with no VC money in sight.
It's death by a thousand paper cuts. Lots of things that aren't really that slow in isolation, but in aggregate (or under pressure) they slow down the system and become impossible to measure.
Let's do web development. Since you mentioned payloads: today they're bigger, and often come with redundant fields, or sometimes they're not even paginated! This slows down the database I/O, requires more cache space, slows down the serialisation, slows down compression, requires more memory and bandwidth...
And then you also have the number of requests per page. Ten years ago you'd make one request that would serve you all the data in one go, but today each page calls a bunch of endpoints. Each endpoint has to potentially authenticate/authorise, go to the cache, go to the database, and each payload is probably wasteful too, as in the previous paragraph.
About authentication and authorisation: One specific product I worked on had to perform about 20 database queries for each request just for checking the permissions of the user. We changed the authentication to use a JWT-like token and moved the authorisation part to inside each query (adding "where creator_id = ?" to objects). We no longer needed 20 database queries before the real work.
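Roughly what "moving the authorisation inside each query" means, sketched with SQLite so it runs anywhere. The `documents` table, its columns, and `fetch_document` are invented for the example; `user_id` is assumed to come from an already verified token, which is not shown.

```python
import sqlite3
from typing import Optional, Tuple


def open_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, creator_id INTEGER NOT NULL, title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO documents (id, creator_id, title) VALUES (?, ?, ?)",
        [(1, 10, "alice's notes"), (2, 20, "bob's notes")],
    )
    return conn


def fetch_document(conn: sqlite3.Connection, document_id: int,
                   user_id: int) -> Optional[Tuple[int, str]]:
    """Fetch a document only if the caller created it.

    user_id is assumed to come from a verified token. The ownership check is
    part of the query itself, so no separate permission lookups run first.
    """
    return conn.execute(
        "SELECT id, title FROM documents WHERE id = ? AND creator_id = ?",
        (document_id, user_id),
    ).fetchone()
```

A user asking for someone else's document simply gets no row back, the same as if it did not exist.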
About 15 years ago I would have done "the optimised way" simply because it was much easier. I would have used SQL Views for complex queries. With ORMs it gets a bit harder, and it takes time to convince the team that SQL views are not just a stupid relic of the past.
Libraries are often an issue that goes unnoticed too. I mentioned serialisation above: this was a bottleneck in a Rails app I worked on. Some responses were taking 600ms or more to serialise. We changed to fast_jsonapi and it went to sub-20ms times for the same payload that was 600ms. This app already had responses tailored to each request, but imagine if we were dumping the entire records in the payload...
Another common one is also related to SQL: when I was a beginner dev, our on-premises product was very slow for one customer: some things on the interface were taking upwards of 30 seconds. That wasn't happening in tests or for smaller customers. A veteran sat down by my side and explained query plans, and we brought that number down to milliseconds after improving indexing and removing useless joins.
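If you've never looked at a query plan, here is a tiny way to see one. I'm using SQLite's `EXPLAIN QUERY PLAN` purely as an assumption because it ships with Python; the product in the story almost certainly used a different database, and the `orders` table and index name are made up.

```python
import sqlite3
from typing import List, Sequence


def query_plan(conn: sqlite3.Connection, sql: str,
               params: Sequence = ()) -> List[str]:
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

sql = "SELECT id, total FROM orders WHERE customer_id = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filtered column it can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a scan of `orders` and the second names `idx_orders_customer`.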
A few weeks ago an intern tried to put a JavaScript .sort() inside a .filter() and I caught it. Accidentally quadratic (actually it was more like O(n^4)). He tried to defend himself with a "benchmark" to show it wasn't a problem. A co-worker then ran anonymised production data through it and it choked immediately. Now imagine this happening in hundreds of libraries maintained by volunteers on GitHub: https://accidentallyquadratic.tumblr.com
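I don't have the intern's code, so this is only a Python analogue of the general shape of the bug (a sort hidden inside a filter predicate), with made-up function names. The slow version re-sorts the whole list for every element it tests; the fix is to hoist the sort out of the loop.

```python
from typing import List


def top_half_slow(values: List[int]) -> List[int]:
    """Keep values at or above the median, re-sorting on every test.

    sorted() runs once per element, so the total cost is about
    n * (n log n) instead of a single n log n sort.
    """
    return [v for v in values if v >= sorted(values)[len(values) // 2]]


def top_half_fast(values: List[int]) -> List[int]:
    """Same result, but the sort is done once before filtering."""
    if not values:
        return []
    median = sorted(values)[len(values) // 2]
    return [v for v in values if v >= median]
```

On a ten-element "benchmark" the two are indistinguishable, which is exactly why this kind of thing only shows up on production-sized data.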
All those things are very simple, and you certainly know all of them. They're the bread and butter of our profession, but honestly somewhere along the way it became difficult to measure and change those things. Why that happened is left as an exercise.
> All those things are very simple, and you certainly know all of them.
I wonder about that. Most of the people I graduated with didn't know about complexity. Some had never touched a relational database, most probably didn't know about views. I doubt most of them knew what serialization means.
I wonder if it’s a matter of background. I never really had tutorials when starting out. I never had good documentation, even.
(Sorry in advance for the rant)
I also remember piecing together my first programs from other people’s code. Whenever I needed an internet forum I’d build one. Actually, all my internet friends, even the ones who didn’t go into programming, were doing web forums and blogs from scratch!
Today people consider that a heresy. “How dare you not use Wordpress”.
My generation just didn’t care, we built everything from scratch because it was a badge of honor to have something made by us. We didn’t care about money, but we ended up with abilities that pay a lot of cash. People who started programming post the 2000s just didn’t do it...
I think it is visible that I sorta resent the folks (both the younger, and the older who arrived late at the scene) constantly telling me I shouldn’t bother “re-inventing the wheel”. Well, guess what: programming is my passion, fuck the people telling me to use Unity to make my game, or Wordpress to do my blog.
I would guess this is close to a default Apache + mod_wsgi setup, and this is one of the easiest ways of hosting Python web apps, so basically achievable by anyone on HN.
I assume (based on 180 req/s for a static page) that he is using mpm_prefork, where each Apache child handles a single connection. If he switched to mpm_event, which uses an event loop like nginx, ~10k rps should easily be achievable, but I don't think WSGI would work with that.
Yeah it's all default apache + mod_wsgi. This is also my first Django setup and I made it over-complicated as a learning exercise.
mpm_event is something I have not heard of before, thanks for bringing that to my attention.
I generally agree with your comment. But the point here is not that this is cutting-edge performance or anything, but rather that people on HN know how to set up this type of website.
If you have any tips on how to get ~10k rps (or even a more reasonable improvement) on a £4 a month server, I at least would be very interested in hearing about them.
An awful lot of professional programmers work in such heavyweight contexts that they don't have a good idea of how fast modern hardware can be.
I was talking with an architect at a bank whose team was having trouble getting under a 2-second maximum for page views. They blamed it on having to make TCP requests to other services, and said something like "at a couple hundred milliseconds per request, it adds up quickly!" My head nearly exploded at that. I spun up some quick tests in AWS to show exactly how many requests one could make in 2000 ms. I don't have the numbers handy, but the number is very large.
This junky slice of a server handling full page requests in 20 ms is a fine example to counter thinking that's endemic in enterprise spaces.
I had a discussion with a coworker about supposedly slow memcpys that went roughly the same way, except... you know, DDR4 RAM has a speed of roughly 40GBps.
Also, that "awful" 1MB memcpy is likely all in L3 cache these days. But even if it weren't in cache, we're talking about an operation that takes 50 microseconds (1MB read + 1MB written == 25microseconds + 25 microseconds).
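The back-of-the-envelope math, spelled out. It only uses the figures from the comment (1MB taken as a million bytes, roughly 40GBps for DDR4) and assumes the copy is limited purely by memory bandwidth, with no help from cache; the function name is mine.

```python
def transfer_time_us(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time in microseconds to move num_bytes at the given bandwidth."""
    return num_bytes / bandwidth_bytes_per_s * 1e6


ONE_MB = 1_000_000          # "a million bytes", as in the comment
DDR4_BANDWIDTH = 40e9       # roughly 40 GB/s, the figure quoted above

read_us = transfer_time_us(ONE_MB, DDR4_BANDWIDTH)    # about 25 microseconds
write_us = transfer_time_us(ONE_MB, DDR4_BANDWIDTH)   # about 25 microseconds
memcpy_us = read_us + write_us                        # about 50 microseconds
```

That is the worst case under these assumptions; if the buffer is already sitting in L3, the real number is smaller still.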
Given that modern CPUs have like 16+ MBs of L3 cache (and more), and some mainstream desktop CPUs have 1MB of L2 cache... it's very possible that this memcpy is far faster in practice than you can imagine.
1MB is big, it's a million bytes. But CPUs are on the scale of billions, so 1MB is actually rather small by modern standards. It's surprisingly difficult to get intuition correct these days...
My point isn't that it's impressive in some ultra-tuned performance sense. It's that doing pretty mundane things on pretty basic servers is still very fast compared with a) the past, or b) what a lot of developers are used to professionally. That's why it is interesting to the crowd here.
I mean this is just basic stuff, for me it just sounds like a developer putting a site in production, no auto-scaling, patching ... might as well outsource this to a CDN since there is no database/redis/varnish ...
You don't need £4. As long as you can structure your site as a set of static files with interactivity done client side, even if some of the files change every minute or two, you can serve everything for $0 with Cloudflare in front of any free host. I've served 1M pageviews a day for $0 with Cloudflare + App Engine free tier and there's no reason it wouldn't scale to 100M or beyond.
The nice thing about using Cloudflare in front of a real host is that you can still do dynamic pages. You can instantly purge any file from the Cloudflare cache, so all you have to do is purge anything that changes and your users see the update instantly, while your backend only sees a couple of extra requests.
My site updates some data every 10 minutes or so, I don't think that would work with GitHub Pages. Maybe you could do something with Cloudflare Pages combined with Workers, but Workers have a limited free tier. The normal Cloudflare CDN scales to infinity for free.