My £4 a month server can handle 4.2M requests a day

noduerme · on Sept 8, 2021

When I launched my former bitcoin casino in 2011 (it's gone, but it was a casino where all games, even roulette tables were multiplayer, on a platform built from scratch starting in '08), I handled all web requests through a server in Costa Rica that cost about $6/mo. Where I had a shell corporation for $250/year. Once the front end -- the bullet containing the entire casino code, about 250kb -- loaded, from Costa Rica, and once a user logged in, they were socketed to a server that handled the gaming action in the Isle of Man. Graphics and sounds were still sent from the Costa Rica server. I didn't have a gaming license in the IoM, though - that was around $400k to acquire legally. So I found a former IoM MP who was a lawyer, who wrote a letter to the IoM gov't stating that we didn't perform random calculations on their server, thus weren't a gambling site under IoM law. Technically that meant that no dice roll or card shuffle leading to a gambling win or loss took place on that server. So the IoM server handled the socketed user interactions, chat, hand rotation and tournament stuff. Also the bitcoin daemon and deposits/withdrawals. But to avoid casino licensing, I then set up a VPS in Switzerland that did only one thing: Return random numbers or shuffled decks of cards, with its own RNG. It was a quick backend cUrl call that would return a fresh deck or a dice roll, for any given random interaction on the casino servers. The IoM server would call the Swiss server every time a hand was dealt or a wheel was spun; the user was still looking at a site served from a cheap web host in Costa Rica. And thus... yeah, I guess I handled millions of requests a day over a $6 webserver, if you want to count it that way.

j1elo · on Sept 8, 2021

Focusing on a small but important detail that some have already mentioned but with a more aggressive tone... was your "loophole" system tested in an actual litigation at any point?

What I mean is that this:

> The IoM server would call the Swiss server every time a hand was dealt

might seem like a clever loophole around the laws in IoM, but in reality it sounds to me like the kind of technicalities that wouldn't really pass the reasoning of a human judge, who in their duty of interpreting the law and its intended spirit, would probably consider this an invalid trick and thus that the RNG of the system still resided in IoM, even if technically it didn't.

But of course, none of this matters if the casino never had any legal battle to fight where this idea could be tested in court, which is the equivalent of not being "caught".

noduerme · on Sept 8, 2021

It was never legally tested. It was what I felt I had to do such that the randomness didn't take place on the island. And no randomness ever did. I was in touch with a lot of officers of the large casinos operating out of there at the time, who were curious but skeptical about Bitcoin. I think by the time they realized it was a potentially valuable thing, I had already shut down operations, because I wasn't willing to chase the market into legally gray areas.

carlmr · on Sept 8, 2021

If you go by technicalities you could have used the argument that you're not generating random numbers, only pseudorandom numbers.

nephanth · on Sept 9, 2021

Hum if you're running a casino, I hope you are using at least a bit of [entropy](https://en.m.wikipedia.org/wiki/Entropy_$computing$) . you wouldn't want your shuffles to be predictable...

seanalltogether · on Sept 8, 2021

Yeah, that's like setting up a casino in one location with a permanent phone line open to switzerland to ask where the ball landed on the roulette wheels. Doesn't seem like it would hold up under scrutiny.

noduerme · on Sept 8, 2021

That's exactly what it was... and it really depends on how a locale defines gambling. In Costa Rica, for instance, you can run the game of chance as long as the money isn't landed onshore, because you're just generating random numbers. IoM was slightly different in that they didn't mind you landing cash, they just wouldn't allow you to generate the numbers. So it seemed natural to co-locate, and then Switzerland was a better backstop than either.

j1elo · on Sept 8, 2021

In any case I must admit that the trick, while we agree wouldn't (probably) hold water in court, it might have actually helped (we'll never know) to keep that casino out of some law enforcement watch list... especially if the officers were too overburdened with more important issues / lazy enough / not looking into that kind of activities at the time.

amelius · on Sept 8, 2021

> in reality it sounds to me like the kind of technicalities that wouldn't really pass the reasoning of a human judge

Yes, only big companies can successfully "hack" the law based on its letter, see e.g. tax evasion.

mvc · on Sept 8, 2021

He got an MP on his side which seems to be all it takes these days in the UK.

madaxe_again · on Sept 8, 2021

You’d be amazed at how cheap it is. You can usually get an MP to do your bidding or say your piece in Parliament for a donation of £10k or less. I’ve never heard of anyone paying more.

gnfargbl · on Sept 8, 2021

It would be a contempt of Parliament to either request or agree to such an arrangement [1]. If you have definitive evidence of this having happened in the recent past, you really should pass it on to the Parliamentary Commissioner for Standards.

[1] https://erskinemay.parliament.uk/section/5023/corruption-or-...

madaxe_again · on Sept 8, 2021

This is why it’s a donation to the party - not to them or their office - although they can want a campaign contribution, which again doesn’t count as going to them.

I’ve had an MP literally solicit this from me - he emailed a bunch of local businesses basically stating his price list. He’s no longer an MP, or in politics - but you see this stuff everywhere, all the time. Any member’s bill you see has almost certainly been sponsored.

Grustaf · on Sept 8, 2021

Does that include the lobbyist's cut, or would you go directly to the MP? But how much do you really gain from buying a single MP?

toyg · on Sept 8, 2021

MPs typically move in packs ("areas" or "currents", that may or may not be officially organized around a thinktank or club), so buying the right MP you can actually move several of them. And then your representative goes and trades favors with the colleagues in government - "I'll vote for your controversial legislation if you pull that other lever over there". Representative democracy is a big sausage factory: in go blood and guts, out comes "edible" rules that society will live on.

silverscania · on Sept 8, 2021

The IoM isn't in the UK and nor does it have MPs. Probably just semantics but I have to point that out.

MagnumOpus · on Sept 8, 2021

IoM has MPs, just not UK MPs. The OP almost certainly meant an MP from the Manx Tynwald parliament.

silverscania · on Sept 8, 2021

It doesn't though, that's my point. The Isle of Man has MHKs and ministers, not "MPs": https://en.wikipedia.org/wiki/House_of_Keys.

It's like saying a politician in the UK is a "senator" and expecting people think you are an authority on the subject.

amelius · on Sept 8, 2021

It would be even cooler if you derived the random numbers from a lava lamp visible through a window of the Swiss embassy.

hdjjhhvvhga · on Sept 8, 2021

> we didn't perform random calculations on their server, thus weren't a gambling site under IoM law.

Does it really matter if you get your random number from /dev/urandom or a server in Switzerland?

withinboredom · on Sept 8, 2021

I don’t think it’d hold up in court. This is the same as saying: “I bet you ten bucks when I mail my friend in Switzerland, he will mail back a ten.” The random part is being done in another country, but the betting is still being done wherever you and your friend are.

ahoka · on Sept 8, 2021

Isn't it like betting on games happening in a different country? I'm pretty sure that is still gambling, even if entropy is generated in a different place.

withinboredom · on Sept 8, 2021

Maybe the laws are written that way in IoM just so you don’t have to be registered to bet on Kentucky horse races or sports or whatever? Could be that way on purpose. Interesting loop-hole.

the_absurdist · on Sept 8, 2021

Why doesn't it matter?

bcrosby95 · on Sept 8, 2021

Why go through IoM at all? Why not Costa Rica -> Switzerland, rather than Costa Rica -> IoM -> Switzerland?

noduerme · on Sept 8, 2021

Good question. The original plan was to take normal payment methods (Visa/MasterCard) but it became apparent after Bush passed the ban on online poker that Costa Rica was going to follow suit (or that visa/mc would soon start holding up payments from CR casinos... in which case we might be stuck with debt we couldn't use to pay winnings). Setting up a CR bank acct as a shell requires you to hand over power of attorney to a CR citizen, and also given how shady the entire corporate structure there was and the legal outfit we hired (who thought we could take $100 deposits as payments through their fake real estate portal) I evaluated other routes. These included landing funds in Cyprus and processing through Israeli banks at a 10% markup, and other shady sounding things. I had begun to give up on my little side code project when Bitcoin showed up. The benefit of the Isle of Man was that all funds could be landed there - in a Bitcoin wallet, on a caged dedicated server - without triggering any other financial issues. the only trouble was randomness and gambling.

ricardobeat · on Sept 8, 2021

With all this trouble, why not just set up shop in one of the many places where online gambling is legal?

noduerme · on Sept 8, 2021

I spent 3 years writing the software in my spare time, and launched it on the $20k I had in savings... personally bankrolling the games. When I began the alpha phase, you could still get an "information processing" license to run a casino in Costa Rica that would allow you to host gambling and which the credit card companies tacitly accepted and would do business with. (Those gambling payments to shady jurisdictions are how Paypal got big, and how Elon Musk made his first nine figures). But by 2010/11, all payment processors had abandoned Costa Rica, and there was nowhere in the world offering a license I could come close to affording.

I looked for investment, put together thick books of plans every few months, built 24 games... No one would get behind it, and I couldn't just shelf it, so this was how I launched it without breaking any laws.

A few months in, one existing online casino network offered me $100k to just hand it over to them and then come work for them, but I considered the offer extremely insulting.

zschuessler · on Sept 8, 2021

This story is fascinating. What ended up happening with the project, did you at least break even?

I would read a book about this, fwiw.

noduerme · on Sept 9, 2021

Thanks. I ended up shutting it down around 2013, for a few reasons. One was that large casino software had moved into Bitcoin space and I couldn't compete against companies that were licensed in a way that allowed them to advertise. I wasn't making enough money from it to be worth waking up all hours to handle bug reports and manage the community, while also worrying about my bank account and keeping an eye on Bitcoin fluctuations. And another reason was that the platform / "gaming OS" was built around opening resizable/dockable windows for multiple games inside a single browser tab, meant to run full screen, and written completely in Flash AS3. By 2013 it was apparent that the front end platform was going to have to be completely rewritten for mobile, either native or JS. The site actually had run on iphone and droid before the flash plugin was removed, although it wasn't optimized for touch. But canvas drawing tools to do the kinds of things it did were in their infancy; things like a variable speed embedded video with a 3d ball physics simulation overlayed for the roulette table would probably be hard to accomplish even now. The headaches associated with running tech support, mostly alone on a daily basis, while also holding down a regular job and caring for a sick partner, made the idea of porting half a million lines of code daunting to say the least. And at the end of the day, without an injection of capital to license and white label the software, it wasn't really worth it.

So I went down a rabbit hole of trying to license the software for in-room gaming on cruise ships and Vegas hotel casinos. But Bally and Caesars pretty much dominated that space... if you even want to get a new game certified by the Nevada Gcb you have to put $100k down, non-refundable, for them to review the software (per game) and then they might start a review in a year or two. Bally gets to cut the line. I also had trouble trying to patent my original games. And of course, the world was coming closer to a consensus that Bitcoin would have to be regulated. So one day, I refunded all my players their balances and turned off the lights.

oliv__ · on Sept 9, 2021

I would love to read a blog post about this. Hell I'd read a book: sounds like there might've been much more to this venture than you're letting on

arcticbull · on Sept 8, 2021

Very crafty. Nice. I'm legitimately impressed by every part of that.

wiz21c · on Sept 8, 2021

Does this article count as "Cost Rica leaks ?"

noduerme · on Sept 8, 2021

yeah, I'm blowing my HN cover.

ajnin · on Sept 8, 2021

Thanks for sharing that, I hope it does not land you into trouble. So, how many billions did you make with that site ?

Gepsens · on Sept 8, 2021

You're a fricking genius.

elcomet · on Sept 8, 2021

He's a frickin genius because he found ways to break the law with impunity for his own profit ?

I'm surprised by HN sometimes.

noduerme · on Sept 8, 2021

Uh, excuse me, but this whole story demonstrates that I went to extreme lengths to never break any laws, no matter how large or small the jurisdiction. Had I been willing to bend laws, I'd surely be a lot richer today.

skrebbel · on Sept 8, 2021

Come on, cURL'ing to a foreign server to get a random number and not just reading /dev/urandom is logically identical. It's a hack, just like calling into GPL'ed code over HTTP is a hack to avoid "linking" the GPL'ed code. It doesn't really suddenly turn a site from gambling site into a non-gambling site.

I mean, I have mad respect for the hustle with the former MP etc. I agree with what you say in that you did not actually break the law - because you found a loophole (made a loophole? hustled it? again, I'm impressed). You ran a gambling site from IoM though :-)

noduerme · on Sept 8, 2021

What's so funny is - I was in a peer group of early btc founders who were starting random Bitcoin hustles and didn't care at all about laws. No one at that time had even approached the IoM about whether BTC was currency - nor would they have, because trying to find legal haven was the furthest thing from their minds. And so not surprisingly, the IoM didn't have a ready answer when I asked them whether gambling with Bitcoin was actually gambling. But the letter of the law was that gambling occurs _in the jurisdiction where_ the random chance takes place. So it really was different from hitting a local RNG, legally.

It added a nice little feature too, which was that every spin and deck could be stored on a separate server that would show them all at the end of the day. This was a little before "proven randomness" took off in btc casinos, but I made the RNG reports available daily for analysis (without explaining the whole infrastructure, obviously).

[edit] I just want to say that yes, you're obviously right, and yeah, I ran a casino from the IoM... without anyone knowing if that was okay or not... and it was just a moment in my life. of which I'm proud, I guess. I was living illegally in a small apartment in Alhama de Granada after violating my EU visa. hah. It was a great, great piece of software and I don't know if I'll ever write anything that good again. But it didn't really change my life or anything.

noduerme · on Sept 8, 2021

@kapep don't know why I can't respond directly..

>>> where the random number is _used_

This was my main concern, and it was exactly what I needed a lawyer to sign off on before I set up a rig there. I was told that the gambling laws applied to where the chance took place, not where the money is distributed... after all, the whole thing with the IoM and the reason it's allowed to be a tax haven is that lots of people need to move money around without a lot of questions. But they defined gambling in this specific way and if only the money moved but the dice roll didn't take place on their shores, then it wasn't gambling under their jurisdiction. What you bring up was the conversation I had before locating there.

skrebbel · on Sept 8, 2021

> @kapep don't know why I can't respond directly..

HN has a silly but effective piece of anti-flamewar UX which is that it hides the reply link in certain cases (some function of thread depth + amount of comments by you i think). However you can still reply by opening the comment on question (click on the timestamp, ie the "1 hour ago" link). Maybe you hit that.

skrebbel · on Sept 8, 2021

Fwiw I think this is a great response and I'm happy you did not feel attacked by my comment because that was not my intent. Thanks for sharing & all the best!

kapep · on Sept 8, 2021

> But the letter of the law was that gambling occurs _in the jurisdiction where_ the random chance takes place

"where the random chance takes place" could easily be interpreted as where the random number is _used_ and not where it has been created. Creating random numbers is not "chance" per se (in this context). Using random numbers to e.g. determining a winner would be the chance in my opinion.

dcposch · on Sept 8, 2021

Are you actually an LA taxi driver, like it says in your bio?

What a story.

noduerme · on Sept 8, 2021

I used to be. I got a web design job in San Francisco when I was 18 out of high school but I burned out and quit when I was 21. Drove a taxi so I could write and play music. Did it for a couple years. It was a good education. I feel old. Taxis don't even exist anymore.

krzrak · on Sept 8, 2021

> Please write a book. Or an article, at least.

skinkestek · on Sept 8, 2021

> Come on, cURL'ing to a foreign server to get a random number and not just reading /dev/urandom is logically identical. It's a hack, just like calling into GPL'ed code over HTTP is a hack to avoid "linking" the GPL'ed code. It doesn't really suddenly turn a site from gambling site into a non-gambling site.

Around here (and probably elsewhere) bars aren't allowed to make wine stronger by adding spirit.

So if you mix a drink from wine (or similar) and spirit in that order you might lose your license.

Put the spirit in the glass first and all is ok.

I guess at this point it is just a shibboleth that inspectors use to see if the bar has read the rules at all, kind of like the no brown m&ms.

Point is though: rules matter, you can lose your license over it.

the_absurdist · on Sept 8, 2021

Logically identical is not legally identical.

rualca · on Sept 8, 2021

> (...) but this whole story demonstrates that I went to extreme lengths to never break any laws,

No, not really. It shows that you went to great lengths to find ways to exploit a loophole where, even though you are clearly breaking the spirit of the law, you argue that it doesn't break the letter of the law.

I get it that you have a vested interest in keeping up the plausible deniability thing, but you know it and everyone knows it that you went through great lengths to put up a tech infrastructure which meets absolutely no requirement other than exploiting a loophole.

I mean, you explicitly expressed your personal concerns in this very discussion regarding what you personally chose to describe as testing "legally grey areas". Who do you expect to fool?

noduerme · on Sept 8, 2021

I just really wanted to launch my software. Which when I began coding it, seemed completely legal and possible in Costa Rica. As the laws started to change - and even before Bitcoin came on the scene - I looked for how to do it without running afoul of anything. So it's not like I set out with a plan to exploit all the legal loopholes in the world, I just adapted my code and split it apart as necessary. I never even meant to take Bitcoin, let alone make it the only currency in the casino. It was just the only option if I wanted to launch. I had very little money and had written a giant gaming platform. I wanted it to see the light of day.

rualca · on Sept 8, 2021

> I just really wanted to launch my software.

Come on. Cut the bullshit.

> Which when I began coding it, seemed completely legal and possible in Costa Rica. As the laws started to change - and even before Bitcoin came on the scene - I looked for how to do it without running afoul of anything. So it's not like I set out with a plan to exploit all the legal loopholes in the world, I just adapted my code and split it apart as necessary.

You are clearly and unequivocally stating that you set to exploit all the legal loopholes once your "grey area" was made black and white in Costa Rica.

Please, spare the thread from all that nonsense. You're not fooling anyone.

noduerme · on Sept 8, 2021

What you say I stated "unequivocally" is precisely the opposite of what I stated. I was quite clear - what I said was that I didn't want to pursue the business into a legal gray area. Therefore, I did what I had to do to keep it legal, including turning away 95% of the hits, retaining lawyers, and not violating any local laws. Moreover, it would have been perfectly legal to run the whole thing from Costa Rican servers at any point in time. I just didn't see a future in it, because they didn't offer full licensing, and the credit card companies pulled out while the software was in alpha. The IoM paper was intended as a step on the road to licensing.

I didn't exploit anything. I worked within the legal options that were available. In any case, I don't understand the accusation.

My only regret is that I didn't have the capital to buy a full license in the IoM or Malta outright. But the truth is, I wrote the whole thing from scratch and I was determined to launch it. You're free to your opinions, but you ought to avoid judging people's intentions while misreading their words.

_6pvr · on Sept 8, 2021

Stop speaking for other people.

hnbad · on Sept 8, 2021

Sure, you're not breaking any laws, just intentionally violating the spirit of the law for personal gain.

This isn't a court of law. I can understand trying to avoid language that makes it sound like you may have been in violation of the law to a judge or jury, but you literally described that what you did was intended to keep operating a service that had been banned by using an absurd technicality in the definition of the ban.

I'm honestly surprised it worked (though having a former MP of the tiny nation you were operating in as a lawyer might have helped) considering that your service still facilitated online gambling directly and was advertised as such, despite the randomness source being on a remote server rather than local.

In other words, using a non-local randomness source (like a remote server you cURL into or a webcam pointed at a bunch of lava lamps) is functionally indistinguishable from a local dice roll or other source of entropy. This "hack" is so flimsy it likely wouldn't hold up in court in a nation that is actually interested in pursuing such violations that has a population larger than a small city.

noduerme · on Sept 8, 2021

I suspect your anger derives from the fact that this would not be possible to get away with now, in any way shape or form, only ten years later. And truly the world is way more locked down now than it was then when people were like, "Bitcoin? What's that? you want to pay me to write a legal brief?" so, yeah. I feel sorry for kids ten years younger than me.

When I did it, the only thing I was really afraid of was getting arrested if/when I stepped back on American soil. There was redundancy so I could run the whole thing in Costa Rica if I had to cold shutdown the IoM servers. And the coin was in private wallets, mostly on my laptop. But I was very concerned about breaking any laws, anywhere. I was the only one to implement ID verification and fully block American players.

Call it a hack or whatever, they wanted my business and I needed their servers, and I split up my code so it would be legal according to their laws. Not too different from what a lot of companies do.

noduerme · on Sept 8, 2021

>>> I'm not angry, I'm hostile.

Hah! This made me laugh. Ok so not FOMO..(trust me, wasn't worth it except for the thrills).. why hostile? I was born in a Vegas family. My uncles all worked as blackjack dealers and pit bosses. When I was 7 they used to leave me in a corner of the casino for hours and tell me to stand there while my parents went and gambled. I taught myself to code there on a TRS-80 Model 100 in basic and practically the first thing I wrote was a slot machine. My view is that adults want to go gamble and that's their decision. I never took a dime from anyone I saw with a gambling problem... I would ban them from my site if they seemed addicted. I like to gamble myself. I count cards. Like everyone on my site... because my decks were single shuffle. So don't be so judgmental. I didn't do it for the money. I did it because I love the games.

hnbad · on Sept 8, 2021

I'm not angry, I'm hostile. That you think the only possible position from which to take issue with what you did is FOMO speaks volumes.

I'm not arguing that you violated any laws. You made it very clear that you went to great lengths to avoid doing anything that could have resulted in consequences to yourself.

EDIT: Since HN's rate limiting won't let me reply for a few hours, I'll just address the replies inline:

I'm not jealous. I'm sure noduerme made a sizable chunk of money with the whole operation at the time but their profile says they're now working as a taxi driver and sold all their bitcoin before the peak. They probably have a lot of other interesting stories to tell and that's nice. But dismissing any hostility or criticism as jealousy is thought terminating and frankly below even HN's standards.

Based on their backstory in the replies, I can see where their attitude comes from, but they severely underestimate how big of a problem gambling addiction is and how much of the profit of the gambling industry relies on it.

It's nice if the casino their parents worked at turned away obvious addicts but the word "obvious" is doing a lot of work here and there are also clear business reasons you don't want obvious addicts in your establishment the same way bars will be happy to have repeat customers buying drinks for five hours every day but will turn them away if they get blackout drunk or unsightly. "Not doing it for the money" may give you a clean conscience but it doesn't change the consequences of your actions.

It's also important to point out that online gambling is by its nature functionally anonymous for the gambler (even if you record IDs for legal reasons). The online casino isn't going to turn away the addict until they can no longer pay or have to resort to fraud to keep up the habit. And even if the casino implements limits, the proliferation of online casinos makes it considerably easier to go hopping than if you have to physically drive somewhere.

Gambling addiction not only ruins the lives of the addict but also impacts their friends and family, not just financially. It's true that not every person who gambles is an addict but the line between an expensive hobby and a managed addiction is hard to draw until you undeniably cross it.

But if you need a comment on HN to explain to you why gambling and especially online gambling is bad, a comment on HN isn't going to be enough to convince you.

noduerme · on Sept 8, 2021

Ok, I respectfully understand where you're coming from, and I've struggled with gambling addiction myself. I personally do not think it's immoral to offer games of chance to people, as long as they understand the odds and you're not cheating them. And believe me, running a small online casino mostly by myself with my own bankroll was literally setting alarms all night waking me up when someone was killing the tables and potentially going to bankrupt me. I got to know my big players (most of whom went on to become Bitcoin millionaires, since the early adopters were the only ones gambling on Bitcoin casinos in 2011)... but beyond that, I really don't think offering gambling is immoral as long as everyone knows what they're signing up for. I've dealt cards for a living, too. I don't drive a taxi anymore. But I've seen all sides of life. I don't think you can judge so easily. Yeah, big corporations suck and they screw people into debt, and gambling addiction does ruin peoples' lives, but I knew my players, and I don't think what I did personally hurt anybody. They came together to enjoy games, and yeah it was for real money, but it was also a community and they were there for fun - they could have gambled for a lot more money on other sites. One player built a puzzle in his escape room in Amsterdam in honor of / based on a game I designed. Like a bar, this is something people do for enjoyment, and you don't need to stand out there with a sign saying we're all going to hell. But I do get it and I'm not a fan of the companies that take advantage of human weakness and shake the dimes out of people's pockets. I was just trying to have a good time and give other people a good time.

Total profit from 2 years running the site? About $50k. It was a hobby. I never quit my job. I also turned away 95% of the hits because they were coming from America.

[edit] I should add that I strongly advised other BTC site owners, especially casino owners, to follow certain guidelines, and watched one of them who I had told to be careful launch, make about $1M with one game on a crappy website, and get jailed within a year. That wasn't the trajectory I was interested in.

noduerme · on Sept 9, 2021

Also I want to add... I had a feature on the site from day one that would let players set their own deposit limits through any date they chose. Once set, the limits could not be raised or revoked through that date, and any coin they sent beyond the limit would automatically be sent back. This was prominently displayed on the website along with an entire section of problem gambling resources. Some users did use it. Others I would put into the system involuntarily. The average rake on my non-poker games was about 2.5%, and some of the puzzle games I designed had a theoretical >100% payout if you could, e.g., solve a randomly spun Rubik's cube consistently in under 60 seconds. (No one ever hit over 100% on that one over time, though. If I had ever come across anything like that I would have written a bot to solve it... and I was waiting for a player to do so, so I could make them a partner).

the_absurdist · on Sept 8, 2021

It's not anger. It's jealousy.

The same jealousy that's called "opinionated" in software development, but really means "Doing things differently than how I do them threatens me, because my sense of superiority is rooted in how I do things."

Jamieee · on Sept 8, 2021

Corporations do not so dissimilar things with teams of accountants and lawyers. Via tax planning, international subsidiaries and all other sorts of loopholes.

hnbad · on Sept 8, 2021

Are you trying to make an argument?

I'm equally hostile to corporations doing that. I don't recall the HN comment threats about Google doing the kind of things you describe being full of replies congratulating their ingenuity.

Big corporation doing bad thing is bad doesn't mean much smaller corporation doing bad thing is okay, it means we should work on preventing that bad thing and if we seemingly can't we should reconsider the underlying systemic conditions that enable it.

noir_lord · on Sept 8, 2021

You can respect the ingenunity if not the morality of it.

His solution was a clever hack because he worked around the law without apparently breaking it directly - clever.

That said gambling systems are specifically one class of software I refuse to work on.

hnbad · on Sept 8, 2021

I think the lawyer who convinced the gov't that the operation was legal being a former MP and this taking place in a nation of less than 100,000 people is a crucial element to the story.

Incidentally both gambling AND fintech (including crypto) are on the list of industries I refuse to do work with. So I guess BTC gambling would have been off the table for two reasons.

Cthulhu_ · on Sept 8, 2021

No laws were broken, else they'd be in jail. It's how the big companies that HN worships get away with not paying taxes.

Morally dubious? For sure.

maccard · on Sept 8, 2021

> No laws were broken, else they'd be in jail.

Just because OP isn't in jail doesn't mean they didn't break the law. Laws are broken all the time.

noduerme · on Sept 8, 2021

No. I'm not in jail because I did everything by the book. Everyone else thought it was the wild west.

_6pvr · on Sept 8, 2021

> No laws were broken, else they'd be in jail.

Generous to assume that all that break laws go to jail.

mellavora · on Sept 8, 2021

I respect both the legality and the morality of it.

First, he did comply with all applicable law. No laws were broken.

Second, he did not break the spirit of the law. The law clearly allows gambling from the Isle of Man.

Third, he did not conflate the law with morality. What is the morality of a 400,000 GBP 'licensing' fee? Laws around licensing are weird. Another poster mentioned that pouring wine than liquor into a glass is illegal, but liquor then wine is fine. Not much moral sense in that reg.

noduerme · on Sept 8, 2021

Thank you. And yes, it's incredible the morally bankrupt things you see under color of law if you try getting into that business. Side story, I was once invited to be a guest of the now deposed dictator of Ghana when I casually floated to his lawyer who I met in a casino in Prague the idea of making them the next offshore gambling capital of the world. I did a little research on the country and then respectfully declined. In my view, I never broke a law or did anything immoral (since I was extremely transparent about the odds of every game and I made sure to only take adults from countries where gaming online was legal... who wanted to play, and knew the rules). And not conflating morality with legality -- while respectfully considering both -- is just a precondition to being an individual in a complicated world.

concordDance · on Sept 8, 2021

"the law"?

There is no "The law", just a bunch of different jurisdictions with different laws. He didn't break Switzerland law. Does Switzerland's law not matter for some reason?

dna_polymerase · on Sept 8, 2021

The breaking the law consists entirely of not acquiring a license. Otherwise everything they did was perfectly legal.

noduerme · on Sept 8, 2021

finally, someone understands :)

_6pvr · on Sept 8, 2021

Is your assertion that, because you deem his actions immoral, they did not require intelligence?

nix23 · on Sept 8, 2021

>He's a frickin genius because he found ways to break the law with impunity for his own profit ?

Obviously he did not break the law, finding holes like that is the number one job of a tax-lawyer.

_8j7i · on Sept 8, 2021

Yes. Gambling should be legal anyway.

noduerme · on Sept 8, 2021

Sarcasm much? I'm a full blown idiot. I was just solving one problem at a time.

Gepsens · on Sept 8, 2021

Erm no, I really liked your post

DrJones1098 · on Sept 8, 2021

Same. It was very interesting.

noduerme · on Sept 8, 2021

thanks.

nostrademons · on Sept 7, 2021

People tend to severely underestimate how fast modern machines are and overestimate how much you need to spend on hardware.

Back in my last startup, I was doing a crypto market intelligence website that subscribed to full trade & order book feeds from the top 10 exchanges. It handled about 3K incoming messages/second (~260M per day), including all of the message parsing, order book update, processing, streaming to websocket connections on any connected client, and archival to PostGres for historical processing. Total hardware required was 1 m4.large + 1 r5.large AWS instances, for a bit under $200/month, and the boxes would regularly run at about 50% CPU.

mynameisash · on Sept 7, 2021

What Andy giveth, Bill taketh away.[0]

I'm more than a little annoyed that so much data engineering is still done in Scala Spark or PySpark. Both suffer from pretty high memory overhead, which leads to suboptimal resource utilization. I've worked with a few different systems that compile their queries into C/C++ (which is transparent to the developer). Those tend to be significantly faster or can use fewer nodes to process.

I get that quick & dirty scripts for exploration don't need to be super optimized, and that throwing more hardware at the problem _can_ be cheaper than engineering time, but in my experience, the latter ends up costing my org tens of millions of dollars annually -- just write some code and allocate a ton of resources to make it work in a reasonable amount of time.

I'm hopeful that Ballista[1], for example, will see uptake and improve this.

[0] https://en.wikipedia.org/wiki/Andy_and_Bill%27s_law

[1] https://github.com/apache/arrow-datafusion/tree/master/balli...

Spooky23 · on Sept 8, 2021

I get a kick out of stuff like this - I’m mostly an exec these days, but I recently prototyped a small database system to feed a business process in SQLite on my laptop.

To my amusement, my little SQLite prototype smoked the “enterprise” database. Turns out that a MacBook Pro SSD performs better than the SAN, and the query planner needs more tlc. We ended up running the queries off my laptop for a few days while the DBAs did their thing.

PhilipTai · on Sept 8, 2021

Right. Local storage is much more performant and cost effective than network storage. I tried to run some iops sensitive workload on cloud. It turns out I need to pay several thousands dollar per month for the performance I can get on a $100 nvme ssd.

noduerme · on Sept 8, 2021

I'm working on a web app right now that does a lot of heavy/live/realtime processing with workers. The original thought was to run those workers on the node servers and stream the results over a socket, charging by the CPU/minute. But it surprised me that the performance looks a lot better up to about 16 workers if the user just turns over a few cores to local web workers and runs it on their own box. As long as they don't need 64 cores or something, it's faster to run locally, even on an older machine. Thread transport is slow but sockets are slower at a distance; the bottlenecks are in the main thread anyway. So I've been making sure parts are interchangeable between web workers and "remote socket web workers" assigned to each instance of the app remotely.

KptMarchewa · on Sept 8, 2021

Who the fuck thinks it's a good idea to run database on networked storage?

Spooky23 · on Sept 8, 2021

You might have heard of this little company in Seattle called “Amazon”.

nexuist · on Sept 8, 2021

Like the river? They're gonna need to fight hard for that SEO.

skrtskrt · on Sept 8, 2021

Isn't that what's happening if you use any managed database product? They have probably colocated everything as much as possible and used various proprietary tricks to cut latency, but still.

panzagl · on Sept 8, 2021

The same people who will run a clustered database on VMs.

Waterluvian · on Sept 8, 2021

What reminded me of this the other day is how MacOS will grow your cursor if you “shake” it to help you find it on a big screen.

I was thinking about how they must have a routine that’s constantly taking mouse input, buffering history, and running some algorithm to determine when user input is a mouse “shake”.

And how many features like this add up to eat up a nontrivial amount of resources.

nostrademons · on Sept 8, 2021

That particular example seems like something that's probably a lot cheaper than you'd initially think. The OS has to constantly take mouse input anyway to move the pointer and dispatch events to userspace. It also needs to record the current and new position of the mouse pointer to dispatch the events. Detecting whether the mouse is being "shaken" can be done with a ring buffer of mouse velocities over the last second or two of ticks. At 60 fps, that's about 120 ints = 480 bytes. Since you don't need to be precise, you can take Manhattan distance (x + y) rather than Euclidean distance (sqrt(x^2 + y^2)), which is a basically negligible computation. Add up the running total of the ring buffer - and you don't even need to visit each element, just keep a running total in a variable, add the new velocity, subtract the velocity that's about to be overwritten - and if this passes a threshold that's say 1-2 screen widths, the mouse is being "shaken" and the pointer should enlarge. In total you're looking at < 500 bytes and a few dozen CPU cycles per tick for this feature.

headmelted · on Sept 8, 2021

Or, alternatively, the engineer that worked on this at Apple has just read the above as another way of solving the problem and is throwing this on their backlog for tomorrow..

Waterluvian · on Sept 8, 2021

Thanks for the thoughtful analysis and napkin math. You may very well be right. I wonder if this is true in practice or if they suffer from any interface abstractions and whatnot.

rollcat · on Sept 8, 2021

On every modern (past few decades) platform, the mouse cursor is a hardware sprite with a dedicated, optimized, *fast* path thru every layer of the stack, just to shave off a few ms of user-perceived latency. Grab a window and shake it violently, you'll notice it lags a few pixels behind the cursor - that's the magic in action.

In some places there's no room left for unnecessary abstractions, I can imagine most of the code touching mouse / cursor handling is in that category.

Waterluvian · on Sept 9, 2021

I wish this was true with Ubuntu but every time I boot up is a dice roll on if my mouse will be sluggish and low -FPS or not.

metafunctor · on Sept 8, 2021

If you shake the cursor up and down, it doesn't grow.

Looks like it's measuring something like number of direction changes in relation to distance traveled; ignoring the y axis completely.

spfzero · on Sept 8, 2021

That's some actual engineering there, understanding the problem you're trying to solve, instead of solving problems you don't even have. Nice.

WanderPanda · on Sept 8, 2021

Why not use an exponential filter that only uses a single variable?

kragen · on Sept 8, 2021

The running-sum-difference approach suggested above is a box filter, which has the best possible noise suppression for a given step-function delay, although in the frequency domain it looks appalling. It uses more RAM, but not that much. The single-pole RC filter you're suggesting is much nicer in the frequency domain, but in the time domain it's far worse.

WanderPanda · on Sept 8, 2021

Is it „far worse in the time domain“ due to its infinite impulse response?

kragen · on Sept 8, 2021

Not really? Sort of? I don't really have a good answer here. It depends on what you mean by "due to". It's certainly due to the impulse response, since in the sense I meant "far worse" the impulse response is the only thing that matters.

Truncating the impulse response after five time constants wouldn't really change its output noticeably, and even if you truncated it after two or three time constants it would still be inferior to the box filter for this application, though less bad. So in that sense the problem isn't that it's infinite.

Likewise, you could certainly design a direct-form IIR filter that did a perfectly adequate job of approximating a box filter for this sort of application, and that might actually be a reasonable thing to do if you wanted to do something like this with a bunch of op-amps or microwave passives instead of code.

So the fact that the impulse response is infinite is neither necessary nor sufficient for the problem.

The problem with the simple single-pole filter is that by putting so much weight on very recent samples, you sort of throw away some information about samples that aren't quite so recent and become more vulnerable to false triggering from a single rapid mouse movement, so you have to set the threshold higher to compensate.

Waterluvian · on Sept 9, 2021

Reading all of you sounding super smart and saying stuff I don’t recognize (but perhaps utilize without knowing the terms) used to make me feel anxious about being an impostor. Now it makes me excited that there’s so many more secrets to discover in my discipline.

kragen · on Sept 9, 2021

Yeah, DSP has a lot of really awesome stuff in it! I recommend https://www.dspguide.com/.

It turns out that pretty much any time you have code that interacts with the world outside computers, you end up doing DSP. Graphics processing algorithms are DSP; software-defined radio is DSP; music synthesis is DSP; Kalman filters for position estimation is DSP; PID controllers for thermostats or motor control are DSP; converting sonar echoes into images is DSP; electrocardiogram analysis is DSP; high-frequency trading is DSP (though most of the linear theory is not useful there). So if you're interested in programming and also interested in graphics, sound, communication, or other things outside of computers, you will appreciate having studied DSP.

hoseja · on Sept 9, 2021

Don't worry, this is a domain of magic matlab functions and excel data analysis and multiply-named (separately invented about four times on average in different fields) terms for the same thing, with incomprehensible jargon and no simple explanation untainted by very specific industry application.

Karrot_Kream · on Sept 8, 2021

Agreed, a single-pole IIR filter would be cheaper and have a better frequency domain.

hoseja · on Sept 8, 2021

Please elaborate.

kragen · on Sept 9, 2021

For both alternatives we begin by computing how far the mouse has gone:

    int m = abs(dx) + abs(dy);   // Manhattan distance

For the single-pole RC exponential filter as WanderPanda suggested:

    c -= c >> 5;                 // exponential decay without a multiply (not actually faster on most modern CPUs)
    c += m;

For the box filter with the running-sum table as nostrademons suggested:

    s += m;                      // update running sum
    size_t j = (i + 1) % n;      // calculate index in prefix sum table to overwrite
    int d = s - t[j];            // calculate sum of last n mouse movement Manhattan distances
    t[j] = s;
    i = j;

Here c, i, s, and t are all presumed to persist from one event to the next, so maybe they're part of some context struct, while in old-fashioned C they'd be static variables. If n is a compile-time constant, this will be more efficient, especially if it's a power of 2. You don't really need a separate persistent s; that's an optimization nostrademons suggested, but you could instead use a local s at the cost of an extra array-indexing operation:

    int s = t[i] + m;

Depending on context this might not actually cost any extra time.

Once you've computed your smoothed mouse velocity in c or d, you compare it against some kind of predetermined threshold, or maybe apply a smoothstep to it to get the mouse pointer size.

Roughly I think WanderPanda's approach is about 12 RISCish CPU instructions, and nostrademons's approach is about 18 but works a lot better. Either way you're probably looking at about 4-8 clock cycles on one core per mouse movement, considerably less than actually drawing the mouse pointer (if you're doing it on the CPU, anyway).

Does that help?

Const-me · on Sept 8, 2021

> they must have a routine that’s constantly taking mouse input

Possible but unlikely. Well-written desktop software never constantly taking input, it's sleeping on OS kernel primitives like poll/epoll/IOCP/etc waiting for these inputs.

Operating systems don't generate mouse events at 1kHz unless you actually move the mouse.

lilyball · on Sept 8, 2021

“Constantly taking” is not the same thing as “constantly polling”. The ring buffer approach works identically in the event-driven approach, you just need to calculate the number of “skipped” ticks and zero them out in the ring buffer.

hypertele-Xii · on Sept 8, 2021

Simply moving my USB mouse consumes 10% of my CPU. Computers these days...

Cthulhu_ · on Sept 8, 2021

Back in the day, moving your mouse made things look like they were processing faster.

codesections · on Sept 8, 2021

Not just _look like_ — on Windows 95, moving the mouse _actually_ made things process faster, for fairly bizarre reasons.

Source: https://retrocomputing.stackexchange.com/questions/11533/why...

trisiak · on Sept 8, 2021

What's behind the mouse cursor while you're doing it? Could it be the UX/UI layer keeping state up-to-date?

Other possibility, do you have a gaming mouse with 1000Hz polling rate configured?

hypertele-Xii · on Sept 8, 2021

Yeah, it's a high quality mouse. But the only excuse for this is it's slightly cheaper to make everything USB. PS/2 worked much better. It was limited to 200Hz but needed no polling. Motherboards just stopped providing the port.

6510 · on Sept 8, 2021

If the computer has to do anything at all it ads to complexity and it isn't doing other things. One could do something a bit like blue screening and add the mouse pointer to the video signal in the monitor. For basic functionality the computer only needs to know the x and y of clicks. (it could for laughs also report the colors in the area) Hover and other effects could be activated when [really] needed. As a bonus one could replace the hideous monitor configuration menus with a point and click interface.

serendipitous · on Sept 11, 2021

This polling is not done by the CPU, this is a common misconception. In a typical modern system the only polling that happens with USB is done by the USB host controller and only when there is actual data the host controller generates interrupts for the CPU to process it. Obviously, when you configure the mouse at higher frequency you will get more interrupts and hence higher CPU usage but that has nothing to do with the polling.

doublepg23 · on Sept 8, 2021

Gaming or off the shelf prosumer mobos still have PS/2 on them occasionally. Although, it's probably just using a converter to USB anyway.

voltagex_ · on Sept 8, 2021

Your report rate may be too high.

hypertele-Xii · on Sept 8, 2021

What does that mean?

voltagex_ · on Sept 9, 2021

Sorry - also known as polling rate: https://blog.codinghorror.com/mouse-dpi-and-usb-polling-rate...

softveda · on Sept 8, 2021

And yet MacOS doesn't allow to change the cursor color. On my Windows 10 desktop I set the cursor color to a slightly larger size and yellow color. So much easier to work with.

Grustaf · on Sept 8, 2021

That seems like an incredibly cheap feature. The mouse pointer png probably costs a lot more than the shake detector.

b9a2cab5 · on Sept 7, 2021

Abstractions almost always end up leaky. Spark SQL, for example, does whole-stage codegen which collapses multiple project/filter stages into a single compiled stage, but your underlying data format still needs to be memory friendly (i.e. linear accesses, low branching, etc.). The codegen is very naive and the JVM JIT can only do so much.

What I've seen is that you need people who deeply understand the system (e.g. Spark) to be able to tune for these edge cases (e.g. see [1] for examples of some of the tradeoffs between different processing schemes). Those people are expensive (think $500k+ annual salaries) and are really only cost effective when your compute spend is in the tens of millions or higher annually. Everyone else is using open source and throwing more compute at the problem or relying on their data scientists/data engineers to figure out what magic knob to turn.

[1]: https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf

paulluuk · on Sept 8, 2021

What's wrong with relying on data engineers for data engineering?

disgruntledphd2 · on Sept 8, 2021

Spark is very very odd to tune. Like, it seems (from my limited experience) to have the problems common to distributed data processing (skew, it's almost always skew) but because it's lazy, people end up really confused as to what actually drives the performance problems.

That being said, Spark is literally the only (relatively) easy way to run distributed ML that's open source. The competitors are GPU's (if you have a GPU friendly problem) and running multiple Python processes across the network.

(I'm really hoping that people will now school me, and I'll discover a much better way in the comments).

b9a2cab5 · on Sept 10, 2021

Data engineers should be building pipelines and delivering business value, not fidgeting with some JVM or Spark parameter that saves them runtime on a join (or for that matter, from what I've seen at a certain bigco, building their own custom join algorithms). That's why I said it's only economical for big companies to run efficient abstractions and everyone else just throws more compute at the issue.

SNosTrAnDbLe · on Sept 8, 2021

I work in the Analytics space and been mostly on Java and I am so glad that other people feel the same. At this point, people have become afraid of suggesting something other than Spark. I see something written in Rust to be much better at problems like this. I love the JVM but it works well with transactional workloads and starts showing its age when its dealing with analytical loads. The worst thing is then people start doing weak references and weird off the heap processing usually by a senior engineer but really defeats the purpose of the JVM

wiz21c · on Sept 8, 2021

I guess your company is running on Java and running something else would cost a lot in training, recruiting, understanding, etc. But down the line, defeating the JVM will be understood only by the guy who did it... Then that guy will leave... Then the newcomers will rewrite the thing in Spark 'cos it will feel safer. Rinse-repeat...

(I'm totally speculating, but your story seems so true that it inspired me :-)

SNosTrAnDbLe · on Sept 8, 2021

Some of it is what you mentioned about training and hiring costs that but mostly its this creation of the narrative that it will scale someday in the future. This is usually done by that engineer(s) and they are good at selling so a different opinion is ignored or frowned upon by the leadership.

I have now seen this anti pattern in multiple places now

robertlagrant · on Sept 8, 2021

> I love the JVM but it works well with transactional workloads and starts showing its age when its dealing with analytical loads.

This is interesting. Can you elaborate a bit?

SNosTrAnDbLe · on Sept 8, 2021

Analytical loads deal with very large datasets in the order of terabytes even after you compress them. These workloads dont change much so keeping them in the heap eventually results in long processing pauses because JVM tries to recover this memory. However, in most cases, this data is not meant to be garbage collected. For transactions, once you have persisted the data, it should be garbage collected so the pattern works. There are a lot of other aspects that I can probably think of but the above one is the most important in my mind.

willvarfar · on Sept 8, 2021

Yes, the whole idea of sending “agents” to do processing is poor performing and things like snowflake and Trino, where queries go to already deployed code, run rings around it.

Furthermore, pyspark is by far the most popular and used spark, and it’s also got the absolute world-worst atrocious mechanical sympathy. Why?

Developer velocity trumps compute velocity any day?

(I want the niceness of python and the performance of eg firebolt. Why must I pick?)

(There is a general thing to get spark “off heap” and use generic query compute on the spark sql space, but it is miles behind those who start off there)

iaw · on Sept 8, 2021

Could you elaborate on other systems besides Ballista? (which looks great btw, thank you for sharing)

mynameisash · on Sept 8, 2021

U-SQL on Azure Data Lake Analytics[0] and ECL from HPCC Systems[1] are two I have fairly extensive experience with. There may be other such platforms, but those and Ballista are the three on my radar.

[0] https://github.com/MicrosoftDocs/azure-docs/blob/master/arti...

[1] https://wiki.hpccsystems.com/pages/viewpage.action?pageId=28...

disgruntledphd2 · on Sept 8, 2021

ECL is the devil. There's so little documentation on it, that you basically need to work for LNRS to actually get any experience with the damn thing.

If it had an SQL layer, I'd spend time evangelising it, but it's not worth learning another query language for.

There exists a world where it got open-sourced before Hadoop was built, and in that world it's probably everywhere.

danudey · on Sept 7, 2021

A lot of that is due to absolutely lousy code.

We had a system management backend at my last company. Loading the users list was unbearably slow; 10+ seconds on a warm cache. Not too terrible, except that most user management tasks required a page reload, so it was just wildly infuriating.

Eventually I took a look at the code for the page, which queried LDAP for user data and the database for permissions data. It did:

    get list of users
    
    foreach user:
        get list of all permissions
        filter down to the ones assigned directly to the user
    
    foreach user:
        get list of all groups
        foreach group:
            get list of all permissions
            filter down to the ones assigned to the group
        filter down to the ones the user has

I'm no algorithm genius, but I'm pretty sure O(n^2+n^3) is not an efficient one.

I replaced it with

    get list of all users
    get list of all groups
    get list of all permissions

    <filter accordingly>

Suffice to say, it was a lot more responsive.

Also worth noting was that fetching the user list required shelling out to a command (a python script) which shelled out to a command (ldapsearch), and the whole system was a nightmare. There were also dozens of pages where almost no processing was done in the view, but a bunch of objects with lazy-loaded properties were passed into the template and always used, so when benchmarking you'd get 0.01 seconds for the entire function and then 233 seconds for "return render(...)' because for every single row in the database (dozens or hundreds) the template would access a property that would trigger another SQL call to the backend, rather than just doing one giant "SELECT ALL THE THINGS" and hammering it out that way.

Note that we also weren't using Django's foreign keys support, so we couldn't even tell Django to "fetch everything non-lazily" because it had no idea.

If that app were written right it could have run on a Raspberry Pi 2, but instead there was no amount of cores that could have sped it up.

benhoyt · on Sept 7, 2021

Yeah, I see this a lot. I think it's especially easy to introduce this kind of "accidentally quadratic" behaviour using magical ORMs like Django's, where an innocent-looking attribute access like user.groups can trigger a database query ... access user.groups inside a loop and things get bad quickly.

In the case of groups and permissions there's probably only a few of each, so fetching all of them is probably fine. But depending on your data -- say you're fetching comments written by a subset of users, you can tweak the above to use IN filtering, something like this Python-ish code:

  users = select('SELECT id, name FROM users WHERE id IN $1', user_ids)
  comments = select('SELECT user_id, text FROM comments WHERE user_id IN $1', user_ids)
  comments_by_user_id = defaultdict(list)
  for c in comments:
    comments_by_user_id[c.user_id].append(c)
  for u in users:
    u.comments = comments_by_user_id[u.id]

Only two queries, and O(users + comments).

For development, we had a ?queries=1 query parameter you could add to the URL to show the number of SQL queries and their total time at the bottom of the page. Very helpful when trying to optimize this stuff. "Why is this page doing 350 queries totalling 5 seconds? Oops, I must have an N+1 query issue!"

simonw · on Sept 7, 2021

Django's prefetch_related mechanism essentially implements the pattern you describe here for you: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#...

benhoyt · on Sept 8, 2021

Thanks. Yeah, I think I used that years ago when I first ran into this problem, and it worked well. Whether one uses an ORM or not, one needs to know how to use one's tools. My problem (not just with Django, but with ORMs in general) is how they make bad code look good. Like the following (I don't know Django well anymore, but something like this):

  users = User.objects.all()
  for u in user:
    print(u.name, len(u.comments))

To someone who doesn't know the ORM (or Python) well, u.comments looks cheap ("good"), but it's actually doing a db query under the hood each time around the loop ("bad"). Not to mention it's fetching all the comments when we're only using their count. Whereas if you did that in a more direct-SQL way:

  users = select('SELECT id, name FROM users WHERE id IN $1', user_ids)
  for u in user:
    num_comments = get('SELECT COUNT(*) FROM comments WHERE user_id = $1', u.id)
    print(u.name, num_comments)

This makes the bad pattern look bad. "Oops, I'm doing an SQL query every loop!"

The other thing I don't like about (most) ORMs is they fetch all the columns by default, even if the code only uses one or two of them. I know most ORMs provide a way to explicitly specify the columns you want, but the easy/simple default is to fetch them all.

I get the value ORMs provide: save a lot of boilerplate and give you nice classes with methods for your tables. I wonder if there's a middle ground where you couldn't do obviously bad things with the ORM without explicitly opting into them. Or even just a heuristic mode for development where it yelled loudly if it detected what looked like an N+1 issue or other query inside a loop.

simonw · on Sept 8, 2021

I agree that this stuff can definitely be handled better.

https://github.com/django-query-profiler/django-query-profil... has a neat option for detecting likely N+1 queries. I usually use the Django Debug Toolbar for this.

Django's ".only()" method lets you specify just the columns you want to retrieve - with the downside that any additional property access can trigger another SQL query. I thought I'd seen code somewhere that can turn those into errors but I'm failing to dig it up again now.

I've used the assertNumQueries() assertion in tests to guard against future changes that accidentally increase the number of queries being made without me intending that.

The points you raise are valid, but there are various levels of mitigations for them. Always room for improvement though!

benhoyt · on Sept 8, 2021

Thanks for the thoughtful responses and the link -- Django Query Profiler looks nice!

cuu508 · on Sept 8, 2021

    users = User.objects.all()
    for u in user:
        print(u.name, len(u.comments))

This is fine if you are working with a small data set. It is inefficient, but if it's quick enough, readability trumps efficiency IMHO.

Django ORM has a succinct way of doing the "SELECT COUNT(*)" pattern:

    users = User.objects.all()
    for u in user:
        print(u.name, u.comments.count())

And you can use query annotations to get rid of the N+1 query issue altogether:

    users = User.objects.annotate(n_comments=Count("comments"))
    for u in user:
        print(u.name, u.n_comments)

tinus_hn · on Sept 8, 2021

> This is fine if you are working with a small data set. It is inefficient, but if it's quick enough, readability trumps efficiency IMHO.

And this is how you end up with the problems the parent is describing. During testing and when you setup the system you always have a small dataset so it appears to work fine. But when it’s real work the system collapses.

danjac · on Sept 8, 2021

A good habit when writing unit tests is to use assertNumQueries especially with views. It's very easy even for an experienced developer to inadvertently add an extra query per row: for example if you have a query using only() to restrict the columns you return, and in a template you refer to another column not covered by only(), Django will do another query to fetch that column value (not 100% if that's still the case, but that was the behavior last time I looked). The developer might be just fixing what they think is a template issue and won't know they've just stepped on a performance landmine.

benhoyt · on Sept 8, 2021

Sounds like a good habit, yeah. But the fact that an attribute access in a template (which might be quite far removed from the view code) can kick off a db query is an indication there's something a bit too magical about the design. Often front-end developers without much knowledge of the data model or the SQL schema are writing templates, so this is a bit of a foot-gun.

simonw · on Sept 8, 2021

I've wanted a Django mechanism in the past that says "any SQL query triggered from a template is a hard error" - that way you could at least prevent accidental template edits from adding more SQL load.

jbboehr · on Sept 8, 2021

My SQL is a bit rusty, but, to reduce the queries even further, I think you can do:

> SELECT user_id, COUNT(*) FROM comments WHERE user_id IN $1 GROUP BY user_id

bradstewart · on Sept 8, 2021

> Or even just a heuristic mode for development where it yelled loudly if it detected what looked like an N+1 issue or other query inside a loop.

Fwiw, ActiveRecord with the Bullet gem does exactly that. I'd guess there's an equivalent for Django.

petters · on Sept 8, 2021

select_related and prefetch_related are absolutely essential when coding Django.

theptip · on Sept 8, 2021

I’m coming around to considering Django-seal to be mandatory for non-trivial applications. It will “seal” a query set so that you can’t make accidental DB lookups after the initial fetch. That way you can be more confident that you are doing the right joins in the initial query, and you are safe from the dreaded O(N) ORM issue.

SqlAlchemy has this as part of the ORM, it should really be part of Django IMO.

c17r · on Sept 8, 2021

I've never heard of django-seal before, thanks!

kbenson · on Sept 8, 2021

When I was doing a lot of DBIx::Class work, I contemplated writing a plugin to allow me to lock and and unlock automatic relation fetches and die (throw) if one was attempted when locked. I would carefully prefetch my queries (auto-fill relations from a larger joined query) and if something changed or there was an edge case I missed that caused fetching deep in a loop, it might kill performance, but be hard to track down if not given the best directions on what caused it other than "this is slow".

It's one of those things that in the long run would have been much more time effective to write, but the debugging never quite took long enough each time to make me take the time.

mschuster91 · on Sept 8, 2021

> For development, we had a ?queries=1 query parameter you could add to the URL to show the number of SQL queries and their total time at the bottom of the page. Very helpful when trying to optimize this stuff. "Why is this page doing 350 queries totalling 5 seconds? Oops, I must have an N+1 query issue!"

I'm so glad Symfony (PHP framework) has a built-in profiler and analysis tooling there...

tra3 · on Sept 7, 2021

This is an example of N+1 problem [0]. It should be a FizzBuzz for anyone doing any CRUD apps.

[0]: https://stackoverflow.com/questions/97197/what-is-the-n1-sel...

JJMcJ · on Sept 7, 2021

Your pattern is quite powerful: get data from several sources and do the rearranging on the client (which might be a web server), instead of multiple interactions for each data item.

For SQL you can also do a stored procedure. Sometimes that works well if you are good at your DBMS's procedure language and the schema is good.

spfzero · on Sept 8, 2021

Though stored procs will always execute on the server that is probably already the system bottleneck.

funnybeam · on Sept 8, 2021

Sending multiple queries from your web server will likely put more load on the database server than using a stored procedure.

With either technique you are still pulling all the data you need from the DB but with multiple queries instead of a stored procedure you are usually pulling more data than you need with each query and then dropping any rows or fields you’re not interested in. Together with multiple calls over the network to the DB server and (often) multiple SQL connection setups this is much worse for performance on both the web and database servers

bigbizisverywyz · on Sept 8, 2021

>Sending multiple queries from your web server will likely put more load on the database server than using a stored procedure.

Lots of people seem to not realize that db roundtrips are expensive, and should be avoided whenever possible.

One of the best illustrations of this I've found is in Transaction Processing book by Jim Gray and Andreas Reuters where they illustrate the relative cost of getting data from CPU vs CPU cache vs RAM vs cross host query.

JJMcJ · on Sept 8, 2021

Except for open and writing to files, it's about the heaviest action on a backend server.

chenxiaolong · on Sept 8, 2021

I had to fix a similar thing in our internal password reset email sender last year. The code was doing something like:

    for each user in (get_freeipa_users | grep_attribute uid):
        email = (get_freeipa_users | client_side_find user | grep_attribute email)
        last_change = (get_freeipa_users | client_side_find user | grep_attribute krblastpwdchange)
        expiration = (get_freeipa_users | client_side_find user | grep_attribute krbpasswordexpiration)

        # Some slightly incorrect date math...

        send_email

I changed it to a single LDAP query for every user that requests only the needed attributes. It cut that Jenkins job's runtime from 45 minutes to 0.2 seconds.

Thaxll · on Sept 7, 2021

Filter in app is rarely the right solution, your data should be organized in a way that you can get what you need in a single query. Reasons:

- it's memory efficient

- it's atomic

- it's faster

Also doesn't LDAP support filtering in query?

zetsurin · on Sept 7, 2021

Given the context, it probably took the poster very little time to implement that fix, without digging into ldapsearch. With massive speedup, for their likely smallish ldap install. Seems like not a bad call at all.

VintageCool · on Sept 7, 2021

I did some work to improve performance on a dashboard several years ago. The way the statistics were queried was generally terrible, so I spent some time setting up aggregations and cleaning that up, but then... the performance was still terrible.

It turned out that the dashboard had been built on top of Wordpress. The way that it checked if the user had permission to access the dashboard was to query all users, join the meta table which held the permission as a serialized object, run a full text search to check which users had permission to access this page, and return the list of all users with permission to access the page. Then, it checked if the current user was in that list.

I switched it to only check permissions for the current user, and the page loaded instantaneously.

dimitrios1 · on Sept 7, 2021

If I look at traces of all the service calls at my company within our microservices environment, the "meat" of each service is a fraction of the latency -- the part that's actually fetching the data from a database, or running an intense calculation. Often times its between 20-40ms

Everything else are network hops and what I call "distributed clutter", including authorizing via a third party like Auth0 multiple times for machine-to-machine token (because "zero trust"!), multiple parameter store calls, hitting a dcache, if interacting with a serverless function, cold starts, API gateway latency, etc...

So for the meat of a 20-40 ms call, we get about a 400ms-2s backend response time.

Then if you are loading a front end SPA with javascript...fugetaboutit it

But DevOps will say "but my managed services and infinite scalability!"

scns · on Sept 8, 2021

Like this? https://www.youtube.com/watch?v=y8OnoxKotPQ

dimitrios1 · on Sept 8, 2021

Exactly like that.

unclebucknasty · on Sept 7, 2021

Not sure what the exact use case was (i.e. the output of the filtering) but—from reading the first algo—seems to be something to do with determining group membership and permissions for a user.

In that case, was there a reason joins couldn't be used? As it still seems pretty wasteful (and less performant) to load all of this data in memory and post-process; whereas a well-indexed database could possibly do it faster and with less-memory usage.

hnbad · on Sept 8, 2021

In defense of whoever wrote the original code: it probably would have been reasonably fast if it had been a database query with proper indexes. The filters would have whittled the selection down to only the relevant data, whereas returning basically three entire tables of data to then throw away most of it would have been extremely inefficient.

The mistake of course was not thinking about why this approach is faster in a database query and that it doesn't work that way when you already need to get all the data out of LDAP to do anything with it.

onlyrealcuzzo · on Sept 7, 2021

Yeah - you likely want to do this a single simple query - which you can optimize if necessary. an O(N+1) query is bad. An O(N^2) query is something I have rarely seen. Congrats!

codebolt · on Sept 8, 2021

I'd also add that groups and permissions are probably constant and can be cached with a long timeout.

afrodc_ · on Sept 8, 2021

Is there a reason you shelled out with a subprocess versus using a library like ldap3? Just curious

IfOnlyYouKnew · on Sept 7, 2021

I believe the parent's point was that code tends to be faster than what people expect, not slower.

namenotrequired · on Sept 7, 2021

I think the child's point is that people expect code to be slower than it is because they have seen code be slow far more than necessary.

blacklion · on Sept 7, 2021

Crypto markets are very small :-)

I'm working and company which process "real" exchanges, like NASDAQ, LSE, and, especially, OPRA feed.

We've added 20+ crypto exchanges in our portfolio this year, and all of them are processed on one old server which is unable to process NASDAQ Total View in real-time anymore.

On the other hand, whole OPRA feed (more than 5Gbit/s or 65B/day, yes, it is billions, messages of very optimized binary protocol, not this crappy JSON) is processed by our code on one modern server. Nothing special, two sockets of Intel Xeons (not even Platinums).

Foomf · on Sept 7, 2021

I've read your few posts a few times and I'm still not sure why you made your post. You're telling the person that you handle more data than them and thus need more resources than them. Was your goal to smugly belittle them? It's not like they said any problem can be solved on their specific resources.

blacklion · on Sept 7, 2021

Nope, I want to say, that even much more data could be processed on very limited hardware, and that it is additional confirmation that current hardware is immensely powerful and very under-estimated.

Cthulhu_ · on Sept 8, 2021

And that go-to solutions / golden hammers like REST / JSON are very much suboptimal.

I mean I'm still going to use it for client/server communication and the like because I don't have serious performance constraints enough to warrant something that will be more difficult to develop for etc, but still.

Aeolun · on Sept 7, 2021

I feel like he’s saying two things.

One, that he’s surprised by how small crypto markets are.

Two, that this one server (or very few server processing) thing scales quite well to billions of messages a day.

I didn’t find any element of smugness here, but maybe I misread the tone.

lost-found · on Sept 7, 2021

They are just pointing out the size of crypto market data vs normal market data. I found it interesting, not their fault you didn’t.

nostrademons · on Sept 8, 2021

I'd initially responded, but I found that my response had an element of a pissing match over whose data is bigger, so I deleted it.

The thing is - when two engineers get smug, oftentimes lots of fairly interesting technical details get exchanged, so such discussions aren't really useless to bystanders.

secondcoming · on Sept 8, 2021

It depends on what your server does with each request though; '65B a day' means little. If all it does is write it to a log then I'm surprised you're not using a rPI.

joering2 · on Sept 7, 2021

Could you share some more about that very optimized binary protocol? I know there are ways to be more efficient than JSON but since you call it crappy, your solution must be much much better. Honestly interested to readup more.

blacklion · on Sept 7, 2021

It is not "our" protocol, it is protocol designed by exchange and we need to support it, as we can not change it :). Simple binary messages, with binary encoded numbers, etc. No string parsing, no syntax, nothing like this, only bytes and offsets. Think about TCP header, for example.

JSON is very inefficient both in bytes (32 bit price is 4 bytes in binary and could be 7+ bytes as string, think "1299.99" for example) and CPU: to parse "1299.99" you need burn a lot of cycles, and if it is number of cents stored as native 4-byte number you need 3 shifts and 4 binary ors at most, if you need to change endianness, and in most cases it is simple memory copy of 4 bytes, 1-2 CPU cycle.

When you have binary protocol, you could skip fields which you are not interested in as simple as "offset = offset + <filed-size>" (where <filed-size> is compile-time constant!) and in JSON you need to parse whole thing anyway.

Difference between converting binary packet to internal data structure and parsing JSON with same data to same structure could be ten-fold easily, and you need to be very creative to parse JSON without additional memory allocations (it is possible, but code becomes very dirty and fragile), and memory allocation and/or deallocation costs a lot, both in GC languages and languages with manual memory management.

localhost · on Sept 8, 2021

Curious if the binary protocol uses floating point or a fixed point representation? Or is floating point with its rounding issues sufficient for the protocol's needs?

andylynch · on Sept 8, 2021

No GP but familiar with these protocols. They use fixed point extensively; I can't even thing of an exchange protocol which would use floating point since the rounding issues would cause unnecessary and expensive problems.

This is typical (from NASDAQ http://www.nasdaqtrader.com/content/technicalsupport/specifi... ):

Prices are integer fields. When converted to a decimal format, prices are in fixed point format with 6 whole number places followed by 4 decimal digits. The maximum price in OUCH 4.2 is $199,999.9900 (decimal, 7735939C hex). When entering market orders for a cross, use the special price of $214,748.3647 (decimal, 7FFFFFFF hex).

mschuster91 · on Sept 8, 2021

> The maximum price in OUCH 4.2 is $199,999.9900

For NASDAQ it seems to have been something around 430k / share... Buffett's BRK shares threatened to hit that limit a couple months ago: https://news.ycombinator.com/item?id=27044044

nly · on Sept 8, 2021

Many exchanges use floating point, even I the Nasdaq technology stack.

X-Stream feeds do for example

blacklion · on Sept 9, 2021

Most of them use decimal fixed point. Sometimes exponent (decimal, not binary one!) is fixed per-protocol, sometimes per-instrument and sometimes per-message, it depends on exchange.

joering2 · on Sept 8, 2021

Thanks for the writeup!

nostrademons · on Sept 7, 2021

Some Googling turned up this protocol descriptor:

https://uploads-ssl.webflow.com/5ba40927ac854d8c97bc92d7/5bf...

If you're optimizing for latency JSON is pretty terrible, but most people who use it are optimizing for interoperability and ease of development. It works just fine for that, and you can recover decent bandwidth just by compressing it.

paraph1n · on Sept 7, 2021

There are many binary encoding protocols. A popular one is protobufs[1], which is used by gRPC.

[1]: https://developers.google.com/protocol-buffers

blacklion · on Sept 7, 2021

"old skool" exchanges uses either FIX (old and really vernose), FAST (binary encoding for FIX) or custom fixed-layout protocols.

Most big USA exchanges uses custom fixed-layout protocols, where each message is described in documentation, but not in machine-readable way. European ones still use FAST.

I didn't seen FIX in the wild for data feeds, but it is used for brokers, to submit orders to exchange (our company didn't do this part, we only consume feeds).

I don't know why, but all Crypto Exchanges use JSON, not protobufs or something like this, and didn't publish any formal schemes.

Fun fact: one crypto exchange put GZIP'ed and base64'ed JSON data into JSON which pushed to websocket, to save bandwidth. IMHO, it is peak of bad design.

blibble · on Sept 7, 2021

there's still ascii FIX floating around on the market data side for a few esoteric venues

FAST is not particularly common in Europe

the large European venues use fixed-width binary encoding (LSE group, Euronext, CBOE Europe)

blacklion · on Sept 7, 2021

Eurex uses FAST for sure, but I can be wrong about "common".

gpderetta · on Sept 8, 2021

Eurex/XETRA EOBI no longer uses FAST. It is about 8 year old t this point, but IIRC some products are still on the older protocol.

nly · on Sept 8, 2021

Euronext uses SBE specifically.

rewq4321 · on Sept 7, 2021

And msgpack if you want an order of magnitude faster serialization/deserialisation and can put up with worse compression (I think mainly due to schema overhead since protobuf files don't store the schema?)

https://msgpack.org/index.html

Good protobuf vs msgpack comparison: https://medium.com/@hugovs/the-need-for-speed-experimenting-...

wffurr · on Sept 7, 2021

https://www.opraplan.com/datafeeds