Meta quickly detects silent data corruptions at scale (fb.com)
167 points by tekkertje on April 4, 2022 | 95 comments



The scale at which Meta operates really boggles my mind. I work with an ex-Facebook guy who was on the infra side of things, and the numbers he told me... I couldn't even imagine. I'm working on the order of magnitude of 100m/h myself, but it's still a completely different set of challenges.


Same. I remember asking one guy at FB about the process for requesting a new server. He said he couldn't even open a request for anything less than a thousand boxes. The largest fleet I'd worked on at that point was 12... different worlds.


I mean, that's not true in the general case. That'd be incredibly wasteful.

(Work at Meta, mostly on capacity)


Yeah, I guess there's probably some threshold above which a formal request is required? Resources under that number might be automatically provisioned via a per-team "freebie quota". I don't work for Meta, but I believe most of the big tech companies work in a similar way.


Curious - do you know what he could have been referring to? SRE position, 6-8 years ago.


A bit before my time, but maybe a physical capacity order to a non-fungible pool. The infra is quite different now (mostly to avoid this kind of inefficiency).


[flagged]


This is a technical post, stay on topic and stop posting flamewar bait.


It's not flamewar bait unless you turn it into that. It can be a fruitful discussion.


It's not flamewar content yet, but it definitely is flamewar bait.


I once read that Facebook was opening 2 or 3 massive new data-centers in the US for the purpose of hosting stale content.

You may have posted a photo 7 years ago, and statistics show that basically nobody ever revisits it. However, in case you do, it needs to be there. So these enormous buildings do basically nothing, but still need to be there.

It makes me wonder how it can go on like this. Users only keep adding content and never remove it. The income per user cannot grow forever, storage cannot get infinitely cheap, the model has to break one day?


There's no meaningful benefit to dedicating any amount of DC equipment just to stale content. Those are spindles (and networks) that could be taking meaningful hot reads and writes, and colocating stale and hot data is generally a better use of capacity than concentrating hot data in fewer locations.


What you say totally makes sense, but still: even if new media takes more storage space, the accumulated stale data will win in the long run.

Or maybe I'm underestimating how much space newer material needs?


Exactly. And that doesn't even take into account higher-res photos and 4K video. I remember the staggering statistic that Instagram alone sees 100 million photos added per day, every day. And that was years ago.


That is just Instagram, with photos. Imagine YouTube.

And again, I have been saying this since ~2015/16: we don't have any meaningful roadmap for cost reduction on storage, whether that is hot as in NAND, bulk as in HDD, or cold as in optical disc. I don't see a 2TB SSD dropping below $100 in the next 5 years, or a 10TB HDD below $120.

Remember when Google promised infinite Gmail storage?


> we dont have any meaningful roadmap for cost reduction on storage

Then maybe we could:

- stop encouraging users to post shit just so we can track them

- stop tracking them which requires many data points and a lot of processing power (for 0 benefit for the user or society at large)

- stop the copyright nonsense and actually use hyperlinks instead of reuploading the same content 500 times across platforms? Maybe even do content-addressed storage (BitTorrent/IPFS), who knows?


Instagram does not guarantee photo quality. They can resize photos anytime they want. Eventually they turn 10 year old photos that have not been viewed in 9 years into thumbnails or just delete them.


I looked back at some photos from Christmas in Facebook Messenger and they looked noticeably degraded in quality.


The amount of ephemeral content (stories, expiring content) being created/shared has a larger impact on capacity and provisioning than you might expect.


This isn't how scaling works though. Across all applications the hot data growth outpaces the cold.

So if you're designing capacity for exponential growth, the future point at which you stop experiencing exponential growth and only have to worry about roughly linear growth is a much easier problem to solve.


In a fleet of 100,000 machines, there will always be some clear failures... When a machine has 2x the number of segfaults of any other machine in the fleet, you send it for repairs and someone replaces the motherboard, RAM and CPU... easy!

But the painful ones are the 'subtle' failures. Why does machine PABL12 sometimes give NaN as a result while the other 99,999 machines return sensible numbers? But all the burn-in hardware tests pass...

The solution was to simply exclude any machines that were outliers: anything in the top or bottom 0.01% for any metric got excluded from future workloads.

Sure, in most cases there was nothing wrong with the hardware, but when you're spending hours debugging some fault caused by a sometimes-bad floating point unit on one core of one machine out of 100,000, you're just wasting your time. By auto-banning outliers, the machine will end up doing some other task where data consistency matters less.
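
Roughly, the banning logic amounts to something like the following (a minimal sketch, not the actual tooling; the metric names, percentile cutoffs, and numpy dependency are illustrative assumptions):

```python
import numpy as np

def outlier_hosts(metrics, low_pct=0.01, high_pct=99.99):
    """metrics: {metric_name: {host: value}} collected fleet-wide.
    Returns hosts falling in the extreme tails for ANY metric."""
    banned = set()
    for name, per_host in metrics.items():
        hosts = list(per_host)
        values = np.array([per_host[h] for h in hosts], dtype=float)
        lo, hi = np.percentile(values, [low_pct, high_pct])
        for host, value in zip(hosts, values):
            if value < lo or value > hi:
                banned.add(host)  # keep off consistency-sensitive workloads
    return banned

# banned = outlier_hosts(fleet_metrics)  # fleet_metrics gathered across ~100k hosts
# scheduler.exclude(banned)              # hypothetical scheduler hook
```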


Was pabl12 an actual bad machine? Sounds somehow plausible, as if I'd heard of it before.

It was an annoying struggle trying to raise the visibility of broken CPUs during my years at Google SRE. The SRE org and the rest of the software side of Tech Infra resisted the whole concept, even though it was well-known among platforms hardware eng. The process for taking a known-bad machine out of service involved 1) the machine being reported independently by three different teams; 2) the machine continuing to be in service for days or weeks, at the leisure of some very asynchronous automation; and 3) the machine being returned immediately to service because it passed all of the cursory checks during reinstall. Really irritating. Consequently every major service had to maintain their own private blacklist.

It's nice to see that some influential people on the software side are starting to come around, with papers like "Cores That Don't Count" etc, but man they could have been on this boat a decade ago.


Reminds me of the typical story of someone with a complete-damage protection plan and a flaky device. Take it in for repairs, it passes all the tests, but they know it's funky, so they snap it in half or otherwise completely wreck it right in front of the tech and demand the repair.


Usually teams would consider a machine "bad" if that node had elevated errors compared to the rest of the cluster it was running in. Unfortunately this doesn't tell hardware teams what actually went wrong.

If one could show that the CPU said 2+2=9, I'm sure they would yank it out right away, but "it returns 500 errors a lot" isn't very debuggable. The only thing they can do is run the diag and return it to service if nothing comes up.


Well that's one of the reasons this is difficult to handle as an organization. The novice says "the machine is broken" and is mistaken. But the expert says the same thing, and is correct. Same with compiler bugs: novices believe the compiler is full of bugs, journeymen believe the compiler is infallible, but the wise return to the knowledge that the compiler is full of bugs. Maybe that company just needs "bad machine readability" or something.

And your last statement is definitely not true. I can recall multiple instances of demonstrable logic errors in which the machine repeatedly returned to service. This includes all of the machines of a certain generation of a certain vendor's CPUs that were found to have latent ALU bugs, 8 years after going into service.


> When the machine has 2x the number of segfaults of any other machine in the fleet, you send it for repairs

At that scale, it's quite likely sent to repair automatically and whoever's on call just gets a notification.


Some might enjoy this old Cloudflare debugging story about random crashes in production.

https://blog.cloudflare.com/however-improbable-the-story-of-...


Add to that a bunch of "rare" / "unlikely" / "silent" CPU bugs (compute errors) that Google and Facebook see with regularity: https://muratbuffalo.blogspot.com/2021/06/cores-that-dont-co...

> So Google found fail-silent Corruption Execution Errors (CEEs) at CPU/cores. This is interesting because we thought tested CPUs do not have logic errors, and if they had an error it would be a fail-stop or at least fail-noisy hardware errors triggering machine checks. Previously we had known about fail-silent storage and network errors due to bit flips, but the CEEs are new because they are computation errors. While it is easy to detect data corruption due to bit flips, it is hard to detect CEEs because they are rare and require expensive methods to detect/correct in real-time.

https://muratbuffalo.blogspot.com/2021/06/silent-data-corrup...

> The paper claims that silent data corruptions can occur due to device characteristics and are repeatable at scale. They observed that these failures are reproducible and not transient. Then, how come did these CPUs pass the quality control tests by the chip producers? In soft-error based fault injection studies by chip producers, CPU CEEs are evaluated to be a one in a million occurrence, not 1 in 1000 observed at deployment at Facebook and Google... The paper also says that increased density, technology scaling, and wider datapaths increase the probability of silent errors.
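
For intuition, the usual way to catch these CEEs is some form of known-answer test: run a fixed, deterministic workload over and over and flag any core whose result ever differs. A rough sketch of the idea (not Meta's or Google's actual screening tooling; the matrix workload, sizes and trial count are illustrative):

```python
import hashlib
import numpy as np

def known_answer_test(seed=1234, size=256):
    """Deterministic integer matrix multiply; the digest must be identical
    on every healthy core, every time."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 1000, size=(size, size), dtype=np.int64)
    b = rng.integers(0, 1000, size=(size, size), dtype=np.int64)
    c = a @ b  # exact integer math, bit-for-bit reproducible
    return hashlib.sha256(c.tobytes()).hexdigest()

reference = known_answer_test()
for trial in range(100):  # in production this runs opportunistically, pinned per core
    if known_answer_test() != reference:
        print(f"trial {trial}: digest mismatch, possible silent compute error")
```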


Thanks for sharing.


To be clear, this is about corruption in the CPU/GPU/memory complex. There's a whole separate set of techniques (some of which I worked on) to detect and correct data corruption on disk.


I'm in the same boat, and my takeaway is that the vast majority of "silent" on-disk corruption actually happens on the way to the storage, i.e. the data gets corrupted in some RAM it passes through and then just ends up being written out in a corrupted state. This is because virtually all modern drives implement per-sector FEC coding, so if a bit does flip on the disk, you will either get back the original data (now FEC-corrected) or you will get a read error.

That is, the so-called "bitrot" phenomenon is largely mis-attributed. Bitrot doesn't happen at rest. It happens in transit.


I can state categorically that bitrot on disk does exist, because that's one of the parts I worked on. It's pretty rare - unfortunately I don't think I can give you the numbers - but across enough exabytes it does happen enough to justify slow scans to detect it.


How did you know it was a change at rest?

The only correct way to test for bitrot is to read the data back immediately after it was written and the cache flushed. If it's the same as the original, we know it made it to the disk undamaged. Then re-read it again after some time. If it doesn't match, re-read immediately again, ideally using a different physical memory block. Compare again. If it still doesn't match, take the disk to another machine and re-read again. If it still doesn't match, only then is it actual at-rest bitrot... OR it's a drive firmware bug, because corrupted data must be corrected or it must not be returned at all.
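
In code, that procedure looks roughly like this (a simplified sketch; the probe path, block size and wait are placeholders, and a serious test would also use O_DIRECT or equivalent so the re-reads actually hit the media rather than the page cache):

```python
import hashlib
import os
import time

PROBE = "/mnt/testdisk/bitrot_probe.bin"   # hypothetical path on the disk under test
SIZE = 1 << 20                             # 1 MiB probe block (illustrative)

def sha256_file(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

data = os.urandom(SIZE)
expected = hashlib.sha256(data).hexdigest()

with open(PROBE, "wb") as f:
    f.write(data)
    f.flush()
    os.fsync(f.fileno())                   # push through the OS cache to the device

# Step 1: immediate read-back; a mismatch here is in-transit corruption, not bitrot.
assert sha256_file(PROBE) == expected, "corrupted on the way in"

# Step 2: wait, then re-read. In practice this is a scheduled scrub, not a sleep.
time.sleep(3600)
if sha256_file(PROBE) != expected:
    # Step 3: re-read again (ideally into different physical memory), then re-check
    # the disk on another machine before calling it at-rest bitrot or a firmware bug.
    if sha256_file(PROBE) != expected:
        print("persistent mismatch: candidate at-rest bitrot")
```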


> How did you know it was a change at rest?

Because we had the checks for it in flight. Also, more often than not these same blocks had been checked before, and found to be fine.

> The only correct way to test for bitrot is to read the data back immediately

No, the only correct way is to read it back after some time has passed. Mis-written data is not the same as bitrot.

> must be corrected or it must not be returned

Every error-correction technique has a limit to how many simultaneous errors it can correct. Beyond that, bits can be flipped in a way that seems valid but in fact is not (detectable by cross-checking with other erasure-coded fragments of the same block on other machines). Just because you haven't seen it doesn't mean it doesn't happen. As I said, and as others have said many times, with sufficient scale and time even the most unlikely scenarios become almost inevitable. Why do you persist in telling me I didn't see what I saw with my own eyes? Are you assuming that my thirty years in storage gave me less understanding or insight regarding these issues than whatever experience (if any) you have?


>> The only correct way to test for bitrot is to read the data back immediately

> No, the only correct way is to read it back after some time has passed. Mis-written data is not the same as bitrot.

Well, no. If you want to check for at-rest bitrot, you need to make sure that you've written out the correct thing in the first place. Otherwise it's not possible to tell at-rest corruption from corruption that happened on the way in.

> Every error-correction technique has a limit to how many simultaneous errors it can correct.

But it can detect the case when it can't recover, which is why it will either produce correct output or an error.

> As I said, and as others have said many times, with sufficient scale and time even the most unlikely scenarios become almost inevitable.

This is not an argument if it goes against how things actually work.

> Why do you persist in telling me I didn't see what I saw with my own eyes?

I am merely curious about your exact testing technique, because at-rest bitrot is vanishingly improbable, even at exabyte scale. For it to happen, the data and its ECC (7-11% of the data size) both need to be corrupted in a coordinated way. That is exceedingly unlikely. Especially in the context of academic papers that found that on-disk corruption is nearly always clustered and is either small-scale or a full-sector failure.

So when you say you ran into a lot of these cases, it's only natural to ask for details. And "scale" is not a detail.

> Are you assuming that my thirty years in storage gave me less understanding or insight regarding these issues than whatever experience (if any) you have?

I have no way to tell. But given your experience, can you explain how at-rest bitrot, should it occur, can slip through the on-disk error correction? I am not talking about RAID-style setups, just the banal ECC record in a disk sector [1].

[1] https://en.wikipedia.org/wiki/Advanced_Format#Overview (linking to Advanced Format, because it has a diagram)


> But it can detect the case when it can't recover.

That is simply not true. For any parity/ECC/FEC/erasure-code scheme carrying M data bits in N (greater than M but less than 2M) total bits, there must be multiple data patterns that will match the same error checks. That's just mathematics. Also, bear in mind that ECC bits can be corrupted too. This opens up the distinct possibility of something that looks like a correctable error, but where the "correction" leads to a wrong result. I've seen such issues in many kinds of storage systems, from low level to high. Anyone who has actually worked in this area, instead of deriving their "expertise" from a quick scan of Wikipedia, would be utterly unsurprised by the idea that disk firmware might do such a thing, or have bugs in its ECC implementation, or not follow the spec.

Whatever the causes, whatever the merely-theoretical probabilities, the fact remains that I've seen these. I've been paged for them. I've done the analyses of possible causes. A bit pattern was written and repeatedly verified over a quite long period of time (ruling out data path issues), then at some point a different bit pattern was read and would persistently be read thereafter. How is that not real bitrot? How does it matter, beyond ruling out everything above the disk level, what the precise causes are? If you can't answer those questions, you're just posting noise.
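
To make that concrete with a toy code (Hamming(7,4), used here only for illustration; real drives use far stronger Reed-Solomon/LDPC codes, but the same failure mode exists at their limits): feed the decoder a two-bit error and it "corrects" a third, innocent bit and silently returns wrong data.

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword (1-based positions, parity at 1, 2, 4)."""
    c = [0] * 8                      # index 0 unused, for 1-based positions
    c[3], c[5], c[6], c[7] = d
    for p in (1, 2, 4):
        c[p] = sum(c[i] for i in range(1, 8) if i & p) % 2
    return c[1:]

def hamming74_decode(word):
    """Return (data bits, syndrome). A nonzero syndrome means a bit was 'corrected'."""
    c = [0] + list(word)
    syndrome = sum(p for p in (1, 2, 4)
                   if sum(c[i] for i in range(1, 8) if i & p) % 2)
    if syndrome:
        c[syndrome] ^= 1             # flip the bit the syndrome points at
    return [c[3], c[5], c[6], c[7]], syndrome

data = [1, 0, 1, 1]
word = hamming74_encode(data)
word[1] ^= 1                         # error at position 2
word[4] ^= 1                         # second error at position 5
decoded, syndrome = hamming74_decode(word)
print(decoded, syndrome)             # [1, 1, 1, 0] 7: "corrected", but wrong data
```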


> That is simply not true.

It indeed is not. I had to reread the theory, and I stand corrected: RS-style ECC can't reliably detect errors in excess of the redundancy count.

> How is that not real bitrot?

It is and I can see how it can happen.

> How does it matter, beyond ruling out everything above the disk level, what the precise causes are?

It would've mattered if a drive could detect on-disk bitrot reliably, which was what the stats I worked with (also in exabytes, funnily enough) and the IEEE papers I read led me to believe.

For what it's worth, you won. Hats off.


Thank you for an interesting (despite being contentious) conversation.


Didn't you answer your own question above? It's firmware bugs. The disk reported a successful write at block X but actually wrote block Y. Later you read block Y and you get data X. The block-level ECC codes are consistent. There's also a low but nonzero probability that you requested a read at block X and were served up some other block, again with matching checksums. And of course there's always the possibility that your firmware simply has a bug in the code checker.

The paper "Parity Lost and Parity Regained" assigns a probability of 1.88e−5 to misdirected writes bugs among disks, so if you have a warehouse full of disks you now have this nightmare.


Fun question: what if a relocation table gets corrupted? And what protection is there against that possibility? You can bet it's not the same ECC as on data blocks. The rest is left as an exercise for the reader. ;)


uh oh, I recognize this one. Love to have a file corrupt after months at rest with no access logged and no mtime changes because a file on a neighboring track needed rewriting (SMR, of course).


ZFS and other checksumming file systems can detect bit rot in data at rest. When data is read back, the block is checksummed and compared against the checksum recorded when it was written, before the read is returned.

You can periodically scrub the entire pool to find and even fix these issues (in a pool with redundancy).


The bit rot could still have happened during writing, though, if it's found in the first scrub.


Sure. However, the main purpose of scrubbing is to flush out deteriorating media and to prompt the drive to relocate salvageable sectors and report completely dead ones.


Slightly off-topic digression: this article discusses "enterprise"-grade silent data corruptions.

What are some recommendations for detecting silent data corruption at "personal data storage" scale?

"personal data storage" for my case is < 1TB, text / binary files (jpg, mp4).

I am looking at the Wikipedia list below and then searching through HN comments.

<https://en.wikipedia.org/wiki/Comparison_of_file_systems#Blo...> column: Data checksum/ ECC

I found many comments on ZFS, and not so many on dm-integrity, BlueStore/CephFS, and others. So I am thinking of looking into ZFS, but if there are other recommendations, I would like to hear them.

I am experimenting with Git LFS and git-annex, but I like the filesystem UI better, so I am looking for filesystem-like solutions.
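
If a checksumming filesystem feels like overkill, one userland option at that scale is a checksum manifest you build once and re-verify periodically. A rough sketch (the manifest name and the two-command CLI are made up; this detects corruption but, unlike ZFS with redundancy, it can't repair anything or distinguish rot from a legitimate edit):

```python
import hashlib
import json
import sys
from pathlib import Path

MANIFEST = "checksums.json"  # hypothetical manifest file, kept next to your data

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build(root):
    """Record a digest for every file under root."""
    manifest = {str(p): sha256(p) for p in Path(root).rglob("*") if p.is_file()}
    Path(MANIFEST).write_text(json.dumps(manifest, indent=2))

def verify(root):
    """Re-hash everything in the manifest and report mismatches/missing/new files."""
    manifest = json.loads(Path(MANIFEST).read_text())
    for path, digest in manifest.items():
        if not Path(path).is_file():
            print(f"MISSING  {path}")
        elif sha256(path) != digest:
            print(f"MISMATCH {path}")
    for p in Path(root).rglob("*"):
        if p.is_file() and str(p) not in manifest:
            print(f"NEW      {p}")

if __name__ == "__main__":
    {"build": build, "verify": verify}[sys.argv[1]](sys.argv[2])
```

You'd run `build` once after ingesting data and `verify` on a schedule or before backups; if you later adopt ZFS, the same manifest still works as an end-to-end cross-check.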


Interestingly, this site fails ungracefully (HTTP error code 500) when I try to visit from NordVPN, even after cycling through a few IP addresses. I'm noticing more and more sites block all VPN traffic. I get why, but it's not good.


I've been on the other side of this (threat analysis, not Facebook).

Known VPN-associated IP addresses were far more likely to be associated with abuse than average. Not just a little bit, but approaching 2 orders of magnitude worse in our case. It's not even close.

It's too bad for the people who need to use public VPN services for whatever reason, but until we have perfect bot/abuse detection, banning VPN, Tor, and proxy services is far and away the most effective tool for cutting down on abuse.


One of the VPN services I've used had their own IP addresses blacklisted - I was unable to view their list of servers while using their VPN.

When asked, they cited possible abuse as the reason, but whitelisted them again after a while.


But it's hard to 'abuse' reading a blog post...


There's denial of service, which wastes server resources, reducing the accessibility to humans interested in the content.


I'd think that your average VPN block could just be rate limited if that's a big worry.


Depending on why you're using a VPN, you can just pay for a tiny VPS from OVH/Hetzner, set up WireGuard and use that as your VPN. Obviously don't do anything illegal, since everything is going through a server that is directly tied to your credit card. But for privacy/security, it's good enough (for me anyway). I'm guessing it's luck of the draw whether your IP has been blacklisted or not, but I've not had any issues in the last 6 months whilst I travelled around.


Hetzner commonly triggers Cloudflare's bot detection, and there are some things that just refuse to talk to its IP space.


I've noticed this is quite popular among the kings of cargo-cult security: banking websites. I can only hope the proliferation of VPN-gating is more contained compared to the recent (banking-led) upswing in Android root-checks.

This type of security theater can be easily bypassed by any determined attacker and thus only serves to deter honest users.


> This type of security theater can be easily bypassed by any determined attacker and thus only serves to deter honest users.

To play devil's advocate, the large majority of attackers aren't really determined. They're just fishing for easy targets. If you check the logs on a brand-new VPS that has nothing but an HTML landing page, you'll see an endless stream of people trying to exploit things like WordPress 24/7.

With banks, I imagine they have a compliance checklist they have to tick off to make sure that, if and when a successful attack happens, their insurance will pay out. If they haven't taken simple steps like blocking VPNs, it could lead to the insurance company claiming negligence.


> To play devil's advocate, the large amount of attackers aren't really determined. They're just fishing for easy targets.

The rise of the copy-paste attacker: https://en.wikipedia.org/wiki/Script_kiddie


Do you know how they detect VPN traffic?

When I'm traveling I'll often pipe my traffic through a VPN on my home network. I have had some weird failures but I've usually assumed that it was due to an unreliable hotspot I'm using. Now I'm wondering if using a VPN is the real problem...


It'll be due to the exit nodes for the VPN having been put on a blocklist.

If you're tunneling through your home network it's unlikely to cause problems, unless you've been doing nefarious things from your home IP and that has also ended up on a blocklist.


The first thing I noticed about this article is that like all Facebook pages, it silently corrupted my back button.


Did you open it in Firefox? If so, that's the fault of Facebook Container, not the page.


I work on the physical side, building hyperscale datacenters. You guys should try your hand at managing errors in that system. You've got it all: memory leaks, thermal overloads, misallocated heaps, pipes with strong type requirements, dropped packets... you name it.


I would probably be overwhelmed just managing the infrastructure for your monitoring systems, and infrastructure is my main thing :-D



Completely off-topic digression: I still think the name change to "Meta" is a big mistake. Subjectively, for some reason I just really dislike the name. More objectively, the branding is very muddled, e.g. serving an "Engineering at Meta" blog post on fb.com.

Often with these things it’s just about time; it feels wrong because you’re just not used to the change yet. Maybe that will happen, but it’s been months now. Usually with these changes I change my mind quicker than that.


> the name change to "Meta" is a big mistake

I think it's too soon to tell. Facebook has really negative brand recognition (from my POV), and who knows, maybe "metaverse" style online interaction is the future. (For the record I'm anti-web3 and indifferent on metaverse communities)


I will always say VR, I will never say "metaverse".

Their branding move was bold, yet unconvincing.


VR is a subset of the "metaverse". The metaverse isn't really something new; it's just a rebranding of the portion of our lives that is contained within the digital realm. On top of that, there are obviously ideas for how to adapt and grow that space, which all remains to be seen.


Based


“It’s just a rebranding of the portion of our lives that are contained within the digital realm.”

none of my life is “contained” within shitbook


The Metaverse, at least in theory, is the connection of all aspects of your digital existence into a seamless whole. It wouldn't be limited to Facebook, it would give you a digital identity you can freely carry between websites, VR environments and devices.

In reality of course none of that exists and Meta has so far not shown how they plan to accomplish that. Worse yet, Facebook is directly responsible for making things not seamless on the Internet and in VR. So I don't have much hope (or fear) of them actually being successful in building that. But their vision is a lot broader than just Facebook in VR.


Lol, that's fine. The metaverse isn't specific to shitbook. You're in the metaverse via HN. At the end of the day, I think it's dumb; I'm just steelmanning the justification for the rebrand.


No, you and I and parent commenter are on the world wide web on HN.

I see no 3d Second Life models interacting.


Meta's metaverse is centered on VR, but generally, metaverse and VR are orthogonal


The name Meta dilutes the brand significantly. I bet if you ask people what Meta is, most people outside tech can't tell. But if you ask what Facebook is, 100% of them can. They took a really good brand name and trashed it to the point they needed to rebrand.


I thought that was the point, though. The brand Facebook is well known, but had developed negative associations. So they wanted to start fresh.


Two sides of the same coin.


But they are still keeping the name Facebook for that specific consumer product.


Meta still redirects meta.com to https://about.facebook.com/meta

I don't think it's too soon.


It's too soon to tell if it's a mistake*


It isn't too soon to see that they don't believe in this rebranding themselves. Ergo it was a mistake.


Can we please keep this type of rant and off-topic criticism out of technical threads? Lately, even reading technical threads has become difficult because of thread-hijacking off-topic rants.


I feel like, given the negative connotations of "Facebook", that's by design.


A muddled brand is better than the maligned harbinger of misery, disinformation, and insurrection that Facebook has become. Recruiters at Meta probably appreciate the distance.


How many candidates wouldn't know that Meta == Facebook, at least within the tech spheres?


The point of the rebrand is not to hide the association, but to make people think Meta > Facebook.

I suspect it does help because a lot of hiring goes on for Quest, Instagram, Portal, WhatsApp, Workplace — you can talk to candidates about those specific products rather than make them think of grandpa posting right-wing memes on the blue app.


Yea this is exactly it. I would never want to work for Facebook but I might want to work for Oculus. Even though I know Meta is just Facebook in new clothing it might (for some) psychologically distance it enough.

That being said, with all of the data Facebook has from Facebook itself, I would imagine Suckerberg must know that the long-term success of his different businesses requires distancing the smaller ones a bit more than they are now. Personally, I am hoping we see a major fall in Facebook's value and influence, and that things like Oculus are spun off to be as autonomous as possible.


Yeah, this techie is not convinced that this whole Meta nonsense is meaningful. It's extremely disingenuous if recruiters actually expect people not to realize they are exactly who they are: a high-paying corporation with major negative externalities due to its bad behavior.


Computational proofs of integrity (STARKs, SNARKs) could detect silent data corruptions (at the cost of a ~1000x slowdown)

I wonder if we’ll see them used for large scale applications whose correctness is critical.


It would be better if Meta would focus on detecting spam at scale.

I put a desk chair on Marketplace last Friday and got 8 messages that were actually scams. These were trying to "schedule" a FedEx/DHL pickup, and would redirect me to fake branded websites requesting my personal details and bank account. This was so obviously fake that it baffles me Meta can't detect these automatically.

I am also getting multiple message requests per week asking for hookups. These are obviously fake [1].

---

[1] https://imgur.com/a/yZDPh3C


It’s a different team, with a different skillset, that would be responsible for that. Big companies can focus on more than one thing at a time.


They ban something like 1.7B accounts per quarter, not counting those blocked at registration. Isn't that focus?

Subjectively, I also see much less bot activity on Facebook than I do on any other social media.


I think it's great that large tech companies give a snapshot of the cool or interesting things they do, but if there are bigger problems that don't seem to get that kind of focus, it just feels like a marketing/recruiting post (which isn't that bad). The catch-22 is that they can't make their anti-spam systems public without also handing that information to spammers. Also, if the attackers have humans in the loop to evade a spam system, it's basically impossible to stop them.


At the scale of FB, handling fraud is a non-trivial effort. At any given time there are probably thousands of somewhat well funded fraud teams looking to bypass whatever shiny new countermeasure FB adds to their site.

There is a lot of money to be made from defrauding FB users. This monetary incentive results in criminals investing tons of effort into bypassing anti-fraud measures. It is a non-stop exchange of incremental moves by both parties that will carry on for as long as FB users remain a juicy target.


Somehow I knew it was going to be a bit.ly link before opening the image


The content seems interesting, but the generic corporate image at the top, the crap font, and the off-black low-contrast text colour are getting on my nerves.


Reader mode works great on mobile Safari.



