So this guy is now S3. All of S3 (chaos.social)
595 points by aendruk on May 4, 2023 | 358 comments



Last night I opened this, saw the HTTP 429 and figured "ah, too many requests, I'll check the comments and try again in the morning". The comments were all people expressing shock about why some non-specific "they" (S3? Amazon? Someone else?) didn't use ".well-known", and others complaining about Mastodon and/or the fediverse. I had to read multiple comments to piece together the story; I swear it was like Elden Ring[0].

What this is actually about: BlueSky is Jack Dorsey's new Twitter clone, it is eventually intended to be some sort of fediverse thing but it's not there yet and it's not the source of the fediverse gripes here. You can authenticate your BlueSky user as the owner of a given domain or subdomain by placing a certain file with a given content somewhere under that domain/subdomain. However that "somewhere" was just a location one of the devs at BlueSky chose, rather than somewhere relatively standardised, like under the ".well-known" path (which you might recognise from things like OpenID Connect where the configuration doc is located @ example.com/.well-known/openid-configuration). So one user exploited this and became the "owner" of that Amazon S3 domain by setting up a storage account on Amazon S3 and following BlueSky's setup instructions. That is the main story here - some non-Amazon rando is now officially the Amazon S3 guy on Bluesky.
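For reference, the ".well-known" convention (RFC 8615) reserves a single site-wide prefix for exactly this kind of metadata, so ordinary users normally can't claim paths under it. A minimal sketch of the OpenID Connect discovery lookup mentioned above, in Python with the third-party "requests" library (the issuer URL is just a placeholder):

  import requests

  issuer = "https://example.com"  # hypothetical issuer/domain being checked
  # Site-wide metadata lives under the reserved /.well-known/ prefix,
  # which shared-hosting users generally cannot write to.
  resp = requests.get(f"{issuer}/.well-known/openid-configuration", timeout=10)
  resp.raise_for_status()
  print(resp.json().get("authorization_endpoint"))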

The next part is that someone posted about it on this https://chaos.social Mastodon instance, which got overwhelmed, the owners decided to save their server by electing to return a 429 response for that specific post if users don't belong to chaos.social, and that is why people are upset about Mastodon.

Interesting story, but I'm not interested in Dorsey's version of Twitter 2.0 unless it actually allows you to sign up[1] and brings something compelling that Twitter didn't and Mastodon doesn't.

[0] - game with an intricate story that does its damnedest to not actually tell you. If you want to know the story you have to piece it together yourself by picking up dozens of items scattered throughout the game and reading all their descriptions. Or you can do what I did - watch a video on YouTube.

[1] - they're doing an open beta and letting a little trickle of users on, who post about it on their Twitter/Mastodon/whatever. Feels a bit deliberate, like they're trying to build anticipation and frankly I detest little manipulative things like that so I'm out


> [1] - they're doing an open beta and letting a little trickle of users on, who post about it on their Twitter/Mastodon/whatever. Feels a bit deliberate, like they're trying to build anticipation and frankly I detest little manipulative things like that so I'm out

Frankly this cynicism feels unwarranted. Bluesky is not a finished product — it is still being built and, even with the small number of invited users so far, there have been problems that have needed attention. The moderation story is still being developed, the feeds are still being tweaked, the app still has bugs, federation still doesn't work yet. Having some users makes for a valuable feedback loop but the team would rapidly become inundated and burnt out (and the platform would possibly turn into a wild-west hellscape with irreversible reputational damage) if they were to open the floodgates entirely at this stage.


To clarify - it felt like this was an attempt to replicate the mid-00s play of building interest by restricting who can join and making it exclusive, and therefore desirable (Facebook did this by rolling it out uni by uni; Gmail was invite-only for a while and invites were highly valued).

Maybe that's in my head, but layering this feeling on top of BlueSky being yet another microblogging service, plus a few other things I don't love, contributes to my impression of Bluesky being simply "meh". If it becomes the next thing that everyone uses I'll inevitably have to check it out, but I'm not going to be an early adopter.


Your feelings resonate with me too. My attitude these days is that if a platform wants to make me feel excluded (in order to induce FOMO), then I accept being excluded. They win, I guess?


> The next part is that someone posted about it on this https://chaos.social Mastodon instance, which got overwhelmed, the owners decided to save their server by electing to return a 429 response for that specific post if users don't belong to chaos.social, and that is why people are upset about Mastodon.

It's like all these newfangled webapps don't understand the concept of caching static pages for anonymous users. There is absolutely no reason that something like this should result in more than one request (plus a handful more for static resources) handled entirely by the frontend webserver's in-memory cache for each person linked from other sites. But instead it's all dynamic, and the page shoots off more API requests before being able to show anything.
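To make that concrete, here's a rough sketch (not Mastodon's actual code) of caching rendered pages for anonymous visitors in a Python WSGI app; the "no cookies means anonymous" heuristic and the TTL are illustrative assumptions:

  import time

  class AnonymousPageCache:
      """Serve cached HTML to visitors who present no cookies at all."""

      def __init__(self, app, ttl=60):
          self.app = app
          self.ttl = ttl
          self.cache = {}  # path -> (expiry, status, headers, body)

      def __call__(self, environ, start_response):
          anonymous = not environ.get("HTTP_COOKIE")
          path = environ.get("PATH_INFO", "/")
          cacheable = anonymous and environ["REQUEST_METHOD"] == "GET"
          if cacheable:
              hit = self.cache.get(path)
              if hit and hit[0] > time.time():
                  _, status, headers, body = hit
                  start_response(status, headers)
                  return [body]
          # Miss: call the real app once and remember the response for next time.
          captured = {}
          def capture(status, headers, exc_info=None):
              captured["status"], captured["headers"] = status, headers
              return start_response(status, headers, exc_info)
          body = b"".join(self.app(environ, capture))
          if cacheable:
              self.cache[path] = (time.time() + self.ttl,
                                  captured["status"], captured["headers"], body)
          return [body]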


So the thing is that in one respect they actually do get caching, almost to a fault. One of the complaints I've seen among some Mastodon instance operators is that they end up storing some pretty hefty amounts of data locally as their instance caches remote posts, images and profiles from other instances that its members follow. One source of problems, which may have been resolved, was that even though there's a job that cleans out this cache, the banner images from external profiles stick around. I saw this a while back and it seems like an easy fix, so I imagine it's been addressed.

I don't think I am equipped to diagnose what the root cause was here. It's even possible that this instance wasn't intended to have viral posts (i.e. high-profile posts that would get shared to many external users) and they didn't want to invest in hardware/services to facilitate this.


I think the GP was referring to caching on the other end: caching static html that can be served to all anonymous users.

The question is whether the server was having issues with a flood of new posts being sent in and stored, or a flood of anonymous users clicking a link and bogging it down as the same html was getting rendered over and over.

Knowing Mastodon, I have a hunch a bunch of it was the latter, with the server choking on all the new data it was trying to store locally


Archived version of the original Mastodon post: https://archive.is/fM06z


You should still follow Jonty (the poster at that Mastodon instance) wherever your socials are, because he's awesome, and posts about awesome things. He's also the organiser of EMFCamp (https://www.emfcamp.org/), which is a nerd/hacker camping festival.


Thanks, will do! I'm always on the lookout for interesting people on Mastodon!


> However that "somewhere" was just a location one of the devs at BlueSky chose, rather than somewhere relatively standardised, like under the ".well-known" path

I've not looked into BlueSky's domain-based identity thing in any detail so I might be missing a point somewhere, but… if someone can manipulate its special location, what would stop the same someone from being able to manipulate content under .well-known?

Are we just relying on .well-known having some extra protection (in this case by Amazon having created a bucket called .well-known so no one else could)? If so then .well-known is little safer than any other arbitrary location in this respect (because you are relying on every domain owner who might be spoofed to take an action to protect against this, rather than the protocol failing safe if nothing is done by the domain owner) and perhaps using DNS would be better.


> Are we just relying on .well-known having some extra protection [...] ? If so then .well-known is little safer than any other arbitrary location in this respect.

If .well-known had just been invented, that would be true. It's fairly well established at this point, though. For example, if someone can create arbitrary files in .well-known, they are also able to pass http-01 ACME challenges and thus issue TLS certs for your domain (modulo CAA) and MITM you. At this point, allowing users to modify .well-known is about as good an idea as allowing them to receive mail for postmaster@ or accepting incoming packets from 192.168.0.0/16 into your LAN.

Amazon S3 specifically would not be vulnerable because bucket names can’t start with a dot; same for every other service that doesn’t allow that. Neither would services that prefix usernames with ~ or @ or similar, nor services that already use http-01 ACME challenges to get certs and are thus already serving that path.

I’d be much happier if proving domain control were only done through DNS challenges, but that ship has sailed.
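For what it's worth, here is a hedged sketch of what a DNS-based check could look like, in Python with the dnspython package; the "_atproto" record name and the "did=" token format are assumptions for illustration, not necessarily what Bluesky actually uses:

  import dns.resolver  # pip install dnspython

  def domain_has_token(domain: str, expected: str) -> bool:
      """True if a TXT record under the domain carries the expected token."""
      try:
          answers = dns.resolver.resolve(f"_atproto.{domain}", "TXT")
      except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
          return False
      return any(expected in rdata.to_text() for rdata in answers)

  print(domain_has_token("example.com", "did=did:plc:abc123"))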


Good point with other common services like certificates via ACME.

Though MitM that way requires more steps than faking identity this way, as you need to somehow get in the middle or redirect traffic towards you.

> I’d be much happier if proving domain control were only done through DNS challenges, but that ship has sailed.

Agreed.


Aaaah! Thanks! Great write up!!


Just wanna share that I found your comment a lot of fun to read, even if I had already pieced the story together from other comments. Thanks!


this is everything I wanted to know and more, all in one comment—thank you


[flagged]


> They should've picked a proper language that's able to perform and handle those requests like Go or Rust. Building a social network, especially distributed, and using a low performance language... what were they thinking?

Even if a Go or Rust variant of exactly the same implementation would have been two or three times faster, it would not have survived the same onslaught. The way to survive that sort of accidental DDoS is not to change language but to improve algorithm choices where possible (where that makes a difference in order, not just a small difference in scale), make sure you are scaling efficiently over the CPU cores (possible in all those languages in various ways) available per node without busting memory limits, and by scaling out once that becomes a limiting factor.

This was effectively a DDoS situation (not an attack, but from the server's PoV there is no difference between a malicious DDoS and a hug-of-death from a mass of interested parties) and using Go or Rust instead of Ruby would likely have made no difference whatsoever.

(the rest of your post is not discussing technical matters pertaining to the rest of the thread, so I'll not encourage the off-topic mess by responding in any detail, other than to say that sort of complaint w.r.t. handling potentially contentious issues is common on social networks and not Mastodon-specific)


Or they should have just configured caching of static content. This has nothing to do with Ruby, nor even with the architecture of Mastodon (the software), which is not great, but with a server not being set up properly - any static cache or, even better, fronting it with a CDN, will trivially beat the most optimised compiled dynamic-content-generating framework.


Someone on one server trying to get a user on another server is maybe a bit much, but I don't know the content of the posts in question. For me it'd have to be something pretty extreme to take those actions, otherwise if I saw something that irritated me I'd just consider ignoring or blocking that person depending on the severity.

Remember though that the war is a pretty delicate subject and that a position that from your perspective seems perfectly peaceful could be seen differently by others. So to a German the message "no weapons to Ukraine!" could be seen as de-escalatory, but to a Ukrainian it could be seen as a betrayal or it may remind them of bad faith pro-Z/Putin trolling even if that doesn't describe you or your intent whatsoever.


Barring context, I'm going to guess that this was simply regular users on chaos.social ticking the box in the reporting interface that forwards the report to the originating server - an option that's vital in a federated system when reporting anything that you see as bad enough rather than just a violation of your own instance's rules, but which will of course be used for things not against either instance's rules all the time on any larger instance.


> Remember though that the war is a pretty delicate subject

That’s good advice for a dinner party, not a valid defence of censorship on a social network.


If you want your message to be heard, you're going to think about how someone will respond to what you have to say. This isn't unique to the fediverse, to social networking or even to the internet - this is just how humans interact.

If you want to just run your mouth and then complain about censorship when you get booted for violating someone's TOS or pissing off a mod, go nuts. No skin off my nose.


Here's the original email where I proposed .well-known:

https://mailarchive.ietf.org/arch/msg/apps-discuss/1_a06NU8z...

> 1) I feel that /host-meta is too casual of a name and prone to collisions. It matches /^[\w\-]+$/, which I think is a subset of a fair number of sites' usernames."

...

> i.e. put something ugly and weird in there, like a semicolon, to minimize the chance that it interferes with people's existing URL structure.


And later, how the semi became the dot: https://mailarchive.ietf.org/arch/msg/apps-discuss/j6KWTSTVC...

Fun bit of history!


This proposal made complete sense and the counter-argument was of such a tiring kind. I see that from time to time in these old standards mail threads. I can't put my finger on it, but perhaps it is that they only saw the "now", not the future - where the "now" was a time when a webmaster was in charge of every website and had complete personal control.


Thank you!


Things to learn about the FediVerse from the 429 error:

* The FediVerse is lots of WWW sites. Some are WWW-hosting companies showing off, with all of the accoutrements of high-end WWW sites, including CloudFlare protection and lots of tweaking of the back end stuff. Others are one-person sites where someone has just set up the vanilla Mastodon/Pleroma/Pixelfed/Friendica/whatever software on a cheap hosted VM somewhere. There are lots of in-betweens. I have accounts on two sites, at each of the aforementioned extremes: one with well over 20,000 users and the other with around 40.

* It's really easy to deny service to the one-person sites, and many of the low-end ones.

* Chaos.Social's about page explains that it's a couple of people running a WWW site in their spare time on spare hardware. That's a little misleading, as they've upgraded the hardware a bit. But it's still 2 people, with ~5800 users. For more, start at https://meta.chaos.social/resources .

* There's nothing global in the FediVerse. Nothing gets sent everywhere. Some commenters here can see the post cached by their local WWW sites where they have accounts. But I'm in the opposite situation: None of the places where I have accounts have cached that post, and since the Chaos.Social sysop put the 429 error in place to combat the server overloading, they actually cannot pull that post with just its URL entered directly, although simple tricks like searching for @jonty@Chaos.Social instead and reading the user timeline work just fine.

* There's nothing global in the FediVerse. Using the aforementioned trick, I see a different view of the thread from Mastodon.Scot to what I see from Toot.Wales, and both of those are different to what's seen from other places.


Most of us are just browsing for interesting light reading anyway, so blocking us if they can’t serve a Hacker News' worth of users seems basically… like an appropriate amount of robustness, for light reading.

Maybe the FediVerse is just not friendly to the idea of a “global top among thousands of users” rating.


Just FYI, that 429 was explicitly placed on the specific URL HN links to. The remainder of chaos.social is up and running perfectly fine.

[Edit: already mentioned above, I just misread it.]


The aforementioned trick wouldn't work if it wasn't. But before the 429 error was put into place the entire site was affected across the board by Hacker News. See https://chaos.social/@ordnung/110312020977014678 .


Ah, yes, sorry, I misunderstood your comment. My bad.


Thank you for the explanation, that actually makes sense. But I still think that serving a 429 is some kind of backwards, old-school sysop kind of response. "It's the right HTTP code! hohoho!". It's obviously running nginx, and serving a static copy of that URL or setting up caching would take the same amount of time as serving the 429. It's 2023; it's possible to serve a few thousand requests for static content on pretty much anything now.


So server software is not written with high performance in mind?

I've read somewhere that federation is done via regular HTTP requests, which ends up really bogging down servers if someone has a lot of followers.


Mastodon is written in Ruby on Rails and there are some inherent performance issues with that; it generates a huge number of Sidekiq jobs that can bog down a server quite easily. There are other, non-Ruby implementations aiming for compatibility with the Mastodon API though, so I’m curious to see how it will all shake out.


People can make a case for the developer productivity benefits of rails to a startup or business, but it’s hard to see it as worth the cost to the Mastodon community as a whole. But maybe it’s won the Fediverse market because of the depth of features which is a benefit of that productivity.


My read is it's mostly social, there's a lot of people accustomed to Mastodon and not much interest in exploring other options. There are implementations in Elixir (Pleroma, Akkoma) and work being done in Rust (Calckey, currently node but moving towards a Rust implementation). Mastodon dev team are not particularly open to criticism so my general sense is that admins should choose another project.


Yes, I’ve explored all this but I think Mastodon has won mainly because of its admin functionality, a huge fraction of dev effort is focused on tools to manage content moderation, spam policies and so on. Pleroma has a bad reputation because it was behind on that stuff at one time - maybe still - and a lot of instances had to be blocked because they couldn’t be managed effectively enough. I think it’s easy to underestimate how wide that moat is.


That's fair, I've heard complaints that mastodon's moderation tools also leave something to be desired but it might still be the best. I haven't been an admin so I can't comment firsthand.


FWIW, there are some - maybe one or two - alternative implementations of the Mastodon server brewing out there. Not of a generic ActivityPub server, but specifically a server with the intention of being fully compatible with Mastodon.

So in the future, there may be more, hopefully more efficient choices.


While Rails doesn't help, it also isn't really the problem here. The problem is a mix of deployment instructions that are complex and don't emphasise the need for robust caching enough (this should be behind a properly configured Nginx cache, and the entire site also ought to be behind a CDN), combined with a Mastodon-specific architecture that, as you say, really aggressively generates async jobs. Mastodon is really unnecessarily heavy to run.


For those not getting the context (like me), this seems to be about Bluesky Social (https://bsky.app/), a Twitter alternative.


Further context: Bluesky lets you use a domain name you own as a user handle.

The official method is to set a TXT record, but apparently their "AT protocol" also lets you confirm a domain by serving `GET your.domainname.com/xrpc/com.atproto.identity.resolveHandle`

and `xrpc` was available as an S3 bucket name :)
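Sketched in Python with the "requests" library, the check is roughly the following; the exact request and response shape are guesses based on this thread, not the documented API:

  import requests

  handle = "s3.amazonaws.com"
  # Bluesky fetches this path on the claimed domain. With S3's path-style
  # addressing, the same URL is answered by a bucket named "xrpc" containing an
  # object named "com.atproto.identity.resolveHandle" -- which anyone could create.
  url = f"https://{handle}/xrpc/com.atproto.identity.resolveHandle"
  resp = requests.get(url, params={"handle": handle}, timeout=10)
  print(resp.json())  # expected to contain the claimant's DID, e.g. {"did": "did:plc:..."}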


Yikes, why didn't they use a /.well-known/ address instead of inventing a new directory? This is entirely on Bluesky, not AWS.


Because tech bros always believe they have a better solution than battle tested standards.


The less edgy, pithy, probably correct explanation is that the over-worked developer wasn't aware of the standard.


Stunning that there are (were) any 4-char bucket names left.


I guess I'm not too surprised in that, unlike domain names, these aren't obviously exposed to end users, so terseness doesn't particularly matter. Verbose and descriptive is honestly better for most names.


And given that bucket names are a giant shared namespace, there's absolutely an incentive toward lots of prefixing to help ensure you get the ones you want.


A while back I made one with a name like "postgresbackups" and was floored to realise later it was a global name.


To this day I don't know why it's a global name. For R2 we looked at this, saw the massive annoyance picking bucket names, and made it scoped to your account. CNAME records are orthogonal and can be set up to point to your bucket with a few button clicks.


Oh yeah, we're also more secure by default. Granted, S3 was built a long time ago, maybe when security was an afterthought, and such mistakes are harder to correct now.

Other things I think we do better on:

* The account is the top-level thing we publish a cert for. Without knowing the bucket name you can't really do anything. With S3's global namespace, each bucket has a cert published, which makes all buckets discoverable as soon as they're created.

* Not default open to the world

* The R2-managed public bucket cname is shared and the URL for the bucket is random (i.e. just a UUID). Additionally, if you delete and recreate the bucket with the same name IIRC that random UUID is changed.

* We have a lot of sensible extensions like automatically creating a bucket on upload (granted not possible for S3 since buckets are global), setting per-object TTLs, handling unicode more gracefully (I think normalizing the key name is a saner choice with fewer foot guns even if there's some potential compatibility issues when you try to synchronize from a filesystem where you have two files with different forms but same normalized), etc etc etc.


> ensure you get the ones you want

Also to try to avoid having to special-case any logic in terraform etc.

Say you're working on a family of sites for tradespeople like plumber.io, electrician.io, carpenter.io, etc. A fair number of people from India have "occupational surnames" like Miller, Contractor, Builder, Sheriff, etc. Suddenly one Mr. Dev Contractor registers a bucket "contractor-dev" and you have to special-case your bucket names in your terraform.


Yanks too. What do you think Smith, Miller, Farmer, etc. are?


Yep, when writing IaC I always just give it a prefix like "$project-web" and terraform adds a long string of numbers at the end. It's going through CloudFront anyways, so no one should be referencing the bucket name directly unless they're writing to it (and writers can just do `aws s3 ls` to find the name).


N=1, but it appears users often create long and overly verbose bucket names.


Path based bucket addressing isn't supported anymore, so this must be a legacy bucket: https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-...


No, they indefinitely delayed that deprecation. It's still delayed. I bet[1] it never happens. They haven't figured out what to do with S3 VPC endpoints and buckets with dots in the name, which both to this day require path-based addressing and are both completely legitimate uses. They just stopped talking about this plan entirely and it's been years; I think it's dead.

[1] If they ever actually turn off path-style addressing, come find me and I'll PayPal you a dollar. I don't think it'll ever happen.


The person who did it is in this thread, and apparently you are not correct. It was created yesterday: https://news.ycombinator.com/item?id=35821113

(I don't know anything about this personally, but since a lot of people are indicating an interest in this detail of the story, figured I'd try and surface that link better!)


Thanks!


Any time! I was curious about this too.


Path style access is supported for new buckets, at least for now

https://docs.aws.amazon.com/AmazonS3/latest/userguide/access...


Sometimes it feels like companies fund weak competitors to discourage / drown out competition.


There's enough imperfection in the world that no conspiracy is required.


"Never attribute to malice that which can be explained by incompetence"?


Microsoft funded Apple to keep another OS vendor alive, so it's not about discouragement. It's probably a lot cheaper to fund a competitor than paying the gov't and getting tagged as a recognized monopoly



Quick summary: code from Apple’s QuickTime for Windows found its way into Microsoft Video for Windows. The Microsoft investment was the result of Apple winning a lawsuit.


How does Bluesky compare to Mastodon? (Other than letting you register S3 as your user handle)


Here's how I think about it:

* ActivityPub -> AT Protocol (https://atproto.com/)

* Mastadon -> Bluesky (https://blueskyweb.xyz/)

Right now, federation is not turned on for the Bluesky instance.

There are differences in both, however. I'm not going to speak about my impressions of the Mastadon vs Bluesky teams because frankly, Mastadon never really caught on with me, so they're probably biased. ('they' being my impressions, that is, I just realized that may be ambiguous.)

At the protocol level, I haven't implemented ActivityPub in a decade, so I'm a bit behind developments there personally, but the mental model for AT Protocol is best analogized as git, honestly. Users have a PDS, a personal data server, that is identified by a domain, and signed. The location of the PDS does not have to match the domain, enabling you to do what you see here: a user with a domain as their handle, yet all the PDS data is stored on bluesky's servers. You can make a backup of your data at any time, and move your PDS somewhere else with ease (again, once federation is actually implemented, the path there is straightforward though). This is analogous to how you have a git repository locally, and on GitHub, and you point people at the GitHub, but say you decide you hate GitHub, and move to GitLab: you just upload your git repo there, and you're good. Same thing, except since identity is on your own domain, you don't even need to do a redirect, everything Just Works.

This analogy is also fruitful for understanding current limitations: "delete a post" is kind of like "git revert" currently: that is, it's a logical deletion, not an actual deletion. Enabling that ("git rebase") is currently underway. Private messaging does not yet exist.

Anyway if you want to know more the high-level aspects of the docs are very good. Like shockingly so. https://atproto.com/guides/overview They fall down a bit once you get into the details, but stuff is still changing and the team has 10,000 things to do, so it's understandable.


Steve, it's "Mastodon" like the animal and like the band. It hurts to read 4 paragraphs of good relevant text and cringe every time you misspell the name. :(


Ah yeah. I struggle spelling certain words. This is one of them. Thank you and sorry. (I spell the animal and the band this way too. Working on it.)


At least you're not alone in this one - it's so common that e.g. anyone registering anything (domains etc.) with mastodon really ought to keep it in mind and register the equivalent with mastadon.


Hey thank you for the reply, I'm also sorry. I usually have better grace than correcting random strangers on their spelling, but I've shown weakness on this day. :)


It’s all good.

I think the funniest one I struggle with all the time is “parallel.” I always think it should be “paralell.” I put the two parallel lines in the wrong spot in the word!


Thank you for the overview!


You’re welcome. :)


A solution is also in the works (like using /.well-known/), so this is more funny than a big problem.

The key to the trick was to have a bucket named "xrpc" and store a file there: https://s3.amazonaws.com/xrpc/com.atproto.identity.resolveHa...

There is also another funny thing in the image: the user posting about it is posting from "retr0-id.translate.goog", which is odd. Somehow he has got https://retr0-id.translate.goog/xrpc/com.atproto.identity.re... to redirect to his page, and gotten that handle as well.


Eh, it’s worse than just funny; it’s concerning, because they should have known about and easily avoided this kind of vulnerability, it’s standard stuff you have to think about. So what else have they missed?


This is a private beta. Nobody is suggesting that any of this be used for anything serious just yet. Development happens out in the open, you can go find out what else they've missed by doing the work, or by waiting until others you trust have done so.

I myself have had an account for like a month now, but only started really using it a week ago, because that calculus changed for me, personally.

Like, it's not even possible to truly delete posts at the moment. This all needs to be treated as a playground until things mature.

This isn't even the first "scandal" related to this feature already!!!! There is another hole in what currently exists that allowed someone to temporarily impersonate a Japanese magazine a few weeks back.


Dunno. That’s such a fundamental piece of thinking that you just have to come across it in the design phase; I don’t know how you would build a beta that didn’t avoid the issue unless you had a flawed take on security in the first place.


It is surely easy to cast stones at a single bug, but I don't think that's the right way to look at things.


I wouldn’t have made my remark if this would just be a bug, though. We’re looking at a bespoke domain ownership verification mechanism that doesn’t handle its primary usecase well, failing at something solved in lots of different ways over the past decades.

I have written atrocious bugs over the years, so I’m definitely not in the stone casting business here. However, I can’t see this as simply a bug, rather than a fundamental design flaw. And if an entity is both becoming infamous for reinventing the wheel, and attempting to fill a sensitive niche, I feel it has somewhat of an obligation to accept criticism such as that.


> We’re looking at a bespoke domain ownership verification mechanism that doesn’t handle its primary usecase well

Okay this is exactly what I mean. How well do you know the AT Protocol? Because this comment seems to indicate you just learned about it from this link, yet you're still making grand claims like this.

This method of validating your identity isn't the primary one. It's not even documented! It was added two weeks ago, as an API endpoint to help serve moderation and administrative needs. Turns out the URL structure of the rest of the API is a bad call for this endpoint.

> and attempting to fill a sensitive niche,

If you want to criticize AT Protocol on privacy issues, there are far more important things that are closer to the fundamental aspect of the design to criticize.


"We'll build our own validation instead of using one of the existing standards that make perfect sense." is not just "a single bug". It's a flaw in architecture.

A PR of "Change external domain validation to use .well-known (or DNS01, etc)" is not a "bugfix"


okay so clearly you don't know what you're talking about because they do use existing standards/DNS as the primary way to validate domain ownership. It's free to not say anything and read the comments first before going off about something!


>okay so clearly you don't know what you're talking about because they do use existing standards/DNS as the primary way to validate domain ownership.

I'm not going to speak for the commenter you're replying to, but I don't think anyone here is talking about the standards-compliant, DNS-based domain verification system. I think we're all talking about the non-standards-compliant, /xrpc/-path verification.


With any kind of authentication, when you have an insecure method it does not matter whether you also have a more secure method - your authentication is only as good as the weakest alternative.


Okay, yes, but this indicates that they didn't read the ActivityPub spec before developing their own new shiny protocol.


Paul has lots of experience designing protocols. He designed SSB. ActivityPub does a lot of things wrong from first principals.

The whole point was to start from scratch.


> ActivityPub does a lot of things wrong from first principals

I'd be curious to learn about those.


There is a lot more information here: https://twitter.com/bluesky/status/1511811083954102273?lang=...

From my own understanding, the biggest useful differences for me personally are: account portability, domains as usernames, and being content-addressable from the ground up.

- Account portability - Useful if/when you want to move between servers

- Domains as usernames - Ties into the same value as account portability. I've owned my own domain for decades, it never changes and probably won't, until years after I die

- Content-addressable - Caching and syncing becomes so much easier, which is a huge issue Mastodon currently suffers from.


Since you seem to default to sending me to RTFM :D, I'll give you a similarly short reply:

ActivityPub can identify users based on their domain too. Probably better than BlueSky does, because it uses better standardized mechanisms - the URI needs to dereference to a valid ActivityPub actor and the community has converged to using webfinger for discovery. The fact that web-finger is generally used for user discovery makes it easier to use the identical mechanism that BlueSky uses - where the identity (which in ActivityPub is a URL) is not tied directly to a domain. (Eg, if you do a webfinger query for marius.federated.id you will get a response where it tells you that one of the URLs for the ActivityPub identity associated with that is https://metalhead.club/@mariusor, you can check it out right now with curl https://marius.federated.id/.well-known/webfinger?resource=h...).
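As an aside, that webfinger lookup is easy to reproduce; here's a minimal sketch in Python with the "requests" library, using a made-up account rather than the one above:

  import requests

  domain = "example.social"                 # hypothetical home instance
  resource = "acct:alice@example.social"    # hypothetical account
  resp = requests.get(f"https://{domain}/.well-known/webfinger",
                      params={"resource": resource}, timeout=10)
  resp.raise_for_status()
  for link in resp.json().get("links", []):
      if link.get("rel") == "self":
          print(link["href"])  # the ActivityPub actor URL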

Account portability can exist in ActivityPub because the verbs for signaling to the network that an object/actor has moved to a different URL are in the vanilla vocabulary. The fact that nobody has implemented this so far does not make it impossible. (It's not like anyone so far needed to move from BlueSky to ... I don't know... BlueSky. So it being capable of moving identities is still equally theoretical in my view).

Regarding your last point (or the one made about it in the twitter thread), I don't really understand how identifying content by its cryptographic signature is conducive to better caching and "syncing" (how in the world a hash would make it easier to sync content than a URL, I don't know). HTTP clients, servers and proxies have very good caching and syncing mechanisms for anything that uses URLs to identify resources. Whatever BlueSky wants to do, it must invent its own intermediary layers before anyone will be able to say "it's easier" with any certainty.

In my opinion nothing you mentioned can be called a "doing things wrong from first principals(sic)" - and I'm still hoping that linuxdude314 can make a better argument.

ActivityPub is fine for what it was designed to be: an exchange mechanism for "low impact" social activity. It's not meant to interact with cryptocurrencies, it's not meant to shelter dissidents from corrupt governments, it's not meant to help you interact with your drug dealer, nor whistle-blow on your employer. There are already options for those things. It is meant to allow your grandma to like your cat pictures in a more distributed manner than facebook offers. The people that imagine BlueSky will be doing something more than that, are - in my opinion - vastly overevaluating it.

(PS. Apparently this was not "similarly short", apologies.)


I don't work on the AT protocol, and don't have any deeper insights into it, I just started reading about it a week or two ago and still putting all the pieces together myself. I linked the twitter thread not as a "Go read this you fucker" but more like "there is no point in me repeating what has already been written elsewhere". I'm just trying to help understanding, not convince you of something, I have zero horses in this race :)

But something I can answer directly to as I have deeper expertise with it, is this:

> how in the world a hash would make it easier to sync content than a URL I don't know

URLs are pointing to a location while content-hashes point to specific pieces of content. Moving from URLs to hashes as URIs gives you the benefit of being able to fetch the content from anywhere, and cache it indefinitely.

Basically any large distributed system out there, no matter if it deals with caching or not is built on top of content-addressable blobs, as it reduces the complexity by magnitudes.

Suddenly, you can tell any of your peers "Give me content X" and you don't really care where it comes from, as long as it is verifiably X. Contrast that to URLs, which point to a specific location somewhere, and someone has to serve it. If the URL is unresponsive, you cannot really fetch the content anymore.

Content-addressing used in this manner is not new or invented by Bluesky, but an old concept that has been used for multiple use cases; caching is maybe the most common one, but definitely not the only one. Probably the first time I came across it was in Plan 9 (Venti) around ~2000 sometime. First time I actually used it in production was with Tahoe-LAFS, which must have been around ~2010 sometime I think.
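A tiny sketch of the idea in Python: content is named by its hash, so it can be fetched from any peer and verified locally (the peer objects and their get() method are placeholders, not a real API):

  import hashlib

  def content_id(blob: bytes) -> str:
      """Name content by its SHA-256 digest instead of by where it lives."""
      return hashlib.sha256(blob).hexdigest()

  def fetch(cid: str, peers) -> bytes:
      """Ask any peer for the blob and verify it is really the content we asked for."""
      for peer in peers:
          blob = peer.get(cid)  # placeholder transport call
          if blob is not None and content_id(blob) == cid:
              return blob
      raise LookupError(f"no peer could serve {cid}")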


You can treat a URL as a hash into a content-addressable store just fine. Mastodon does just that. Yet that URL also tells it where to retrieve the content if it's not available locally in a way that doesn't require tools to have any additional knowledge. If they do have additional knowledge, say of another caching layer or storage mechanism, they can use that just fine.

That is, I can just paste the URL for this article into my Mastodon instance, and if it has it, it'll fetch it from the local storage, if it doesn't it'll try to fetch it from the source, but there's nothing preventing a hierarchy of caches here, nor is there anything preventing peer to peer.

But while ActivityPub says that object id's "should" be https URL's for public objects, the basic requirement of ActivityStream is just that it's a unique URI, and there's nothing stopping an evolution of ActivityPub allowing URI's pointing to, say, IPFS or similar by content hash instead of a https URL.


I don't personally believe that one mistake indicates ignorance of an entire topic.


In general - no, but this kind of fundamental mistake might.


I hope I never work on software you folks use. The grand claims about something that is not even hard to fix is just wild to me.


Being easy to fix is completely irrelevant. The thing is that it's easy to avoid. The only way to end up there is to not put any thought into the domain verification scheme before deploying it. Any kind of review would catch it. That's what makes it look really bad.


If your response to a fundamental design flaw in identity verification is "something that is not even hard to fix", then that hope is mutual.


What about the next 500 easy-to-fix bugs?

Is there a public test suite?


> Is there a public test suite?

The entire specification (which is admittedly incomplete) and implementation are open source.

I am not aware of a dedicated test suite for alternative implementations. It's too early, IMHO. I personally would much prefer the team to focus their time elsewhere for the time being.


My instincts may be way off-base, but if I was developing a protocol at the core of my product vision, even if there was only one implementation, I would a want an authoritative test suite. I wouldn't trust myself not to integrate load-bearing idiosyncrasies (and bugs, honestly) otherwise.


I mean, I don't think you're incorrect, but I do think that, like, semantics matter here. There is a test suite, of their implementation. Just because they haven't extracted it and made it easily reusable for alternative implementations doesn't mean that they don't have checks to ensure regressions don't appear, you know?

They already build off of many related specifications, which have independent implementations, and a lot of the core protocol is RPC style, with schemas that they do publish. So there's already a lot of rigidity here for alternative implementations to use in a way that is extremely likely to be compliant.

I guess another way of putting it is "I don't exactly disagree with you but doing that takes work, and we're at the stage of this stuff where that work is being done, so expecting it right now feels premature to me." The spec isn't "released" or "done," it's in development.


The pull of NIH is a strong one.


But this isn't an implementation issue. This is a fundamental design issue. If their design philosophy is to throw stuff against the wall and see what sticks, then I don't see this ending up better than the existing fediverse.


Are there any Rust implementations of the protocol yet :vv:


Multiple, in varying degrees of maturity. And I'm also writing one from scratch, don't know if I'll bother to share it with anyone though, I just want to learn more deeply, and implementation is the best way to do that.

I have my eyes on https://github.com/sugyan/atrium as a foundational library in this space, and expect folks to coalesce on it. But we'll see.


For me, the worst thing about it is that they didn't just use webfinger. So webfinger isn't perfect, but it's there and in use. When they choose to invent new mechanisms for things there are perfectly serviceable options for, it makes me instantly sceptical of the rest.


It wouldn't be funny if it was a public beta that they want people to use for serious stuff. But it's neither serious, a beta, nor public - it's basically a private alpha for playing around, so I'd be a bit lenient on screwups.


Google Translate recently moved translated web pages to domains like this. If you plug a webpage into GT it will put the translated content under <domain>-<tld>.translate.goog. This user's actual domain is https://retr0.id


Oof. This will not be the last time that decision causes a problem.


Reminds me of people taking the username “admin” or “hostmaster” at a free email service and being able to get domain verification emails.


Wait - nobody had ever created a bucket named xrpc before, ever? I would have imagined that short s3 buckets were squatted similar to domain names. (Or maybe they were, and it's this person who did so!)


There's an account bucket limit, so you'd need to create a huge number of AWS accounts with no immediate benefit.


> 429 Too Many Requests

Aight, level with me: Is every mastodon server running on a Raspberry Pi?


From the instance admins:

> We just started serving a http 429 error on the exact url of the post. So everything should go back to normal now.


So to answer GP's question, yes.


If you go to the user's profile and then to the post, it seems to be okay. So perhaps it's also looking at the Referer header.


Doing that never sends a request to the page that is linked, so it makes sense that nothing is blocking it


for the time being, archive.org has a snapshot of it: https://web.archive.org/web/20230504185520/https://chaos.soc...


The server itself seems to work fine. It only seems to be this specific post that's being 429'd. I'm guessing it's some kind of anti-DDoS setup kicking in.

Mastodon is also quite heavy to host; my single-user instance will easily gobble up several gigabytes of memory if you let it. There are more efficient ActivityPub servers, but Mastodon specifically seems to be written to run efficiently on huge servers.


Or running efficiently never maybe?


It will definitely "never" (barring fairly significant changes) run efficiently, you're right. It's extremely, unnecessarily heavy in all kinds of ways, though the tradeoffs are made in ways that make it run better overall on a large setup. All the instructions are also there to front it with proper caching, but setting it up in a properly resilient way is more effort.



unfortunately this is exactly why mastodon won't work

I already don't trust mastodon links because 9 times out of 10 they simply don't work. Everyone's tiny hobby server falls over when one post gets big, and obviously not everyone is going to scale their servers to support the load of a viral post that might happen once every 6 months and will be 100x their base load


That is the benefit of centralization: the experience for the end user can be controlled completely. Maybe a Mastodon-friendly web cache that anyone running a semi-serious instance could easily opt into (for a fee) is needed, as a hedge to keep your Raspberry Pi instance online if something goes viral.

As a community effort where no one is expecting to get rich it might work.


When someone finds an annoyance, often even an anecdotal one, that is no evidence that "Mastodon (or the fediverse) won't work".

It's an annoyance, often anecdotal at most. Not the foundation of why a platform cannot ever "work".


Mastodon is by design about small niche communities rather than centralised twitter alternative.


That is not the point.

If someone shares any link or post from that Mastodon instance and it goes viral, the entire instance will be knocked down and out for hours, making the post unavailable to view.

The worst part is that journalists and the media have to be told that posting a link from a 'small niche community' on Mastodon will send a flood of traffic that will knock it offline, also giving the impression to others that it is not ready for the mainstream at all, or even ready to onboard the tens or hundreds of millions of users that Twitter handles daily.


It's a fair criticism but I don't feel so fatalistic. This would look a lot different if everyone was opening this post on their own home server instead of chaos.social.

Unfortunately there's no way to construct a link that references the post but opens it where it belongs for you. I think there needs to be a fediverse URL protocol to solve this, i.e. this HN post would link to `fedi://@jonty@chaos.social/110307532009155432`; then when people clicked it they wouldn't have to talk to chaos.social, because it would be opened at their home server.
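A rough sketch of how a client could rewrite such a link to the reader's own server, in Python; the fedi:// scheme and the search-based resolution are hypothetical, not an existing standard:

  from urllib.parse import quote

  def rewrite_fedi_link(link: str, home_server: str) -> str:
      """Turn fedi://@user@instance/post-id into a lookup on the reader's home server."""
      assert link.startswith("fedi://")
      handle, _, post_id = link[len("fedi://"):].partition("/")
      _, user, instance = handle.split("@")          # e.g. "@jonty@chaos.social"
      original_url = f"https://{instance}/@{user}/{post_id}"
      # Many servers can resolve a remote post when you search for its URL.
      return f"https://{home_server}/search?q={quote(original_url)}"

  print(rewrite_fedi_link("fedi://@jonty@chaos.social/110307532009155432",
                          "example.social"))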

Another option could be a 'copy link for public access' that generates a static page for the purpose of sharing widely.

Journalists and media could also run their own server which is scaled for the traffic they expect, and mirror the post there. The main problem is linking to the source of the post instead of linking to a federated representation of it.


I think this is an absolutely absurd take. A post is the same post no matter who views it - it belongs on the instance where it was posted. Sure it might show up in your timeline or comments somewhere else but for the post itself there should only be one canonical link. If mastodon can't manage to show a simple text post with a small image to anonymous visitors without falling over then it's mastodon that needs to change and not how people interact with it. Most people don't even have a fediverse account ffs.


Where is the canonical location to access an email, or read an XMPP message? It's not just that it 'might show up' in my timeline, seeing it on my home server is where I want it to be - that's where I can take actions on it like replying, starring, or boosting. The post belongs in my client because that representation is the one that's relevant to me. I agree that the mastodon software could do better to optimize for public anonymous read, but it's not the most important functionality for the server to do.

> Most people don't even have a fediverse account ffs.

This is why you won't see a Bluesky post linked on HN, no one can open it. Imagine if you could sign up on your choice of thousands of servers and get the same access to the content rather than a central site, that's fediverse, it's not that complex.


Seems like a correct impression, then.


You can also see the same "toot" as a "tweet": https://twitter.com/jonty/status/1653915932677271552


And the original tooter is apparently Google Translate?


This post is on the front of HN. Many a larger website has succumbed to HN's warm embrace.


Isn't HN pretty small? This post has <400 upvotes over 3 hours. There can't be 1000x that amount of lurkers can there?


I can provide some statistics myself.

One of my blog posts was submitted to HN that had 194 points and 149 comments[1]. All dates are in UTC.

  1 - Unique visitors per day - Including spiders
  Hits       h%  Vis.     v%  Tx. Amount Data
  ------ ------ ----- ------ ----------- ----
   14439  1.49%  1148  1.19%  106.42 MiB 21/Jan/2023
   17043  1.75%  1754  1.81%  184.69 MiB 20/Jan/2023
   33560  3.45%  3267  3.37%  491.32 MiB 19/Jan/2023
   46568  4.79%  5816  6.01%  637.54 MiB 18/Jan/2023
  323797 33.32% 28928 29.88%    4.06 GiB 17/Jan/2023  <- Resubmitted on HN and websites started copy-pasting the article from the big website with the same mistakes, never checking my post which had a note about these mistakes :)
   24330  2.50%  3341  3.45%  360.48 MiB 16/Jan/2023  <- Put in a second-chance pool by a moderator and an article with a lot of mistakes published by some big website
   17074  1.76%  3348  3.46%  243.44 MiB 15/Jan/2023  <- Published on HN
    1041  0.11%   120  0.12%    3.70 MiB 14/Jan/2023
    1666  0.17%   171  0.18%    8.40 MiB 13/Jan/2023  <- Post published
     991  0.10%   123  0.13%  374.78 KiB 12/Jan/2023

  2 - Requested Files (URLs)
  Hits      h%  Vis.     v%  Tx. Amount Mtd      Proto    Data
  ----- ------ ----- ------ ----------- -------- -------- ----
  57604  5.93% 31427 32.46%  260.97 MiB GET      HTTP/2   /en/2023/01/13/msi-insecure-boot/
  31179  3.21% 11263 11.63%  245.20 MiB GET      HTTP/1.1 /en/2023/01/13/msi-insecure-boot/

  11 - Referring Sites (depends on Referer header, not very accurate for reasons)
  Hits       h%  Vis.     v% Tx. Amount Data
  ------ ------ ----- ------ ---------- ----
  446781 45.97% 29686 30.66%   5.95 GiB dawidpotocki.com
   14834  1.53%  9485  9.80%  79.85 MiB news.ycombinator.com
  (news sites with very low hundreds or even under, nobody checks sources)
[1]: https://news.ycombinator.com/item?id=34388533


HN has millions of page views per day (maybe @dang can give a more accurate and updated number), and things frequently get reposted elsewhere. It happens many times that things on the frontpage get brought to their knees; this wouldn't be the first nor the last.


Source on the millions?

This person says they got 12k visitors over a day:

https://nicklafferty.com/blog/what-happens-when-you-re-on-th...

The websites hugged to death by this forum are usually tiny hobby projects.


Dang from 6 months ago https://news.ycombinator.com/item?id=33454140

> There's no stats page but last I checked it was around 5M monthly unique users (depending on how you count them), perhaps 10M page views a day (including a guess at API traffic), and something like 1300 submissions (stories) and 13k comments a day.


Forgot to mention, but "I was on the HN frontpage for X hours and got X views" doesn't always translate to the same happening for everything. Some topics are more interesting to the people just browsing HN than others. I'd expect an article titled "I spent $6 Million On Google Ads Last Year" to be significantly less interesting than "MSFT is forcing Outlook and Teams to open links in Edge and IT admins are angry", for example, where the latter would surely gather an order of magnitude more visits than the former, even if they spent the same amount of hours on the frontpage.

Some content is simply more interesting for a broader audience.


Hacker News has 3.4 million users per month and 350,000 users per day, with 4 million pageviews a day. There are just under 1 million registered accounts, with several hundred added each day. Users post around 1,000 articles and 6,000 comments to the site per day. https://blog.samaltman.com/2017-yc-annual-letter


Not the OP but they were referring to the whole site. So definitely not millions, but the number is probably higher than you think.

From the blog you linked, the number of interest is 18k. 12k are only those with HN referrer headers. In reality, many setups strip that header, so you can't track it exactly. The author did mention they averaged 50 views before.

A big part of it is reposts. From my own submissions, posting to HN resulted in tons of different origins - public ones like Reddit and Twitter, and private ones like newsletters, dashboards and chat messages. You'll also be surprised by the wide variety of clients people use to access HN.

They also used Google Analytics to track the numbers. Most people on HN block it, either through the browser or an extension [0]. In reality it's probably double the traffic.

Don't forget to account for scraping & crawling bots. That's another big source of traffic that the author didn't track.

[0] https://plausible.io/blog/google-analytics-adblockers-missin...


I’ve seen other people that posted about the HN embrace talk about 50k extra visitors. I guess this is a single page, so 50k pageviews?


Indeed there are. Tens of us!

Maybe you underestimate how many people want to keep up on things but not interact?


More than tens I'd say. I suspect for each person that interacts, there are dozens that don't. If I have to bet, I'd put the ratio at 1:100. So 100 lurkers for every 1 active user.


Just look at it failing under the HN hug of death. If this one can't survive techies on an orange site rushing to the site, then it cannot possibly survive a flood of users from Twitter or TikTok rushing into any post on Mastodon, bringing it flat to the floor.

I can only see Mastodon centralizing to cope with the load. But a server going down under this load from HN tells us it is nowhere near ready to handle a massive number of users, or even begin to challenge Twitter, which hosts 220M+ users every single day.


Mastodon's user count has mostly been steady growth. So far it hasn't really failed at a high level. We aren't in Bitcoin territory; the network isn't really slower than it was a year ago even if the number of users is much higher. It's mostly distributed among many instances.


Twitter is pretty sluggish, to be fair. 7 seconds to load and render a single tweet on mobile.

https://pagespeed.web.dev/analysis/https-twitter-com-realDon...

Mastodon.social is actually much faster on this particular benchmark. So maybe there is hope.


No trouble viewing it from another Mastodon server:

https://hachyderm.io/@jonty@chaos.social/110307532115312279

EDIT: Ah I guess if you're not logged into a hachyderm.io account, you get forwarded. So probably don't use the above link.


That just redirected me to too many requests.


Maybe, but the admin commented it was intentional for that specific post, it was slowing down the entire site.


> slowing down the entire site

This is mind-blowing. Last I checked, the front page of HN sends tens of requests per second to each link. There are humans who can pack envelopes faster than the typical mastodon server can answer GETs. I'd love to see someone benchmark the top servers for a few seconds to see what it takes to break a reasonable latency SLA.


It has been 20+ years since slashdotting with the requisite hardware and connection upgrades and still things fall over.


Clearly they're not microservicing hard enough.


I suggest making the current team explain to 4 new teams how to port it to something fast, like elixir and rust (both!)

Probably messaging with pulsar and the build system from python 4, too.

I read this in a whitepaper. Let’s do this, guys! ;)


one alt-social crumbling from throwing rocks at another alt-social

This is peak Web 5.0 right here.


Aren’t federated services grand?


[flagged]


> It looks like these links auto-redirect when I access them from here, but when you access them from the homeserver they are served without redirect

"Works on my machine" isn't going to cut it for running a popular social network

Nobody's going to go around searching for mirrors, they'll just leave and go back to twitter


[flagged]


I can't see it today. I won't remember to look at your post tomorrow, but I will remember that your website didn't work for a few hours yesterday, so I probably will not visit again.

I don't really care how you implement your fediverse gimmick - I couldn't see it. Seems like lots of people will not see it either.


If fediverse was going to succeed anywhere it would be on places like HN. The average user doesn't even know what "federation" means.


The fediverse has a problem with discovery, I agree on that piece. Coming from outside the network and trying to access a particular post or user profile is not smooth, mostly because it takes you to the wrong server (I don't use chaos.social, links to there are useless to me).

I've explained in other comments how a URL scheme would help with this.

There are many non-technical users on fediverse and it's working just fine for them, I see their posts all the time saying that they're having a good time despite your scare quotes. The problem in this case is HN users who aren't on the network anywhere, and there's not much I can do about that. I think if you were on it and had a home server this would not be that confusing, you'd just search for the post. Ideally you could skip the search step as well which is why I keep coming back to a URL scheme solution.


They will just miss out on low-effort posts from sites with semi-technical populations, like this one. It isn’t a big loss really.


Found a similar situation, but I think the key is mostly if you're on one of those servers and seek out the content or if you're logged into one of those servers, it won't forward you (even if you click it from here, assuming same browser/container).


This works from a Calckey instance, just confirmed. https://calckey.social/notes/9ebxxsy83i


yah that's fine, but the replication to other domains does no good for the typical user on the web who only has a link to the original URL, clicks it, sees an error page, and then clicks away


I understand. In my view it will be important to have a fediverse URL protocol so that these links are not aimed at the original domain, and instead open at your home server, or in a mobile app connected to the home server.


Lol this is great.


Try making a link to your email address without knowing which email provider your audience is using and you'll understand the utility.


Okay, I guess I'm not getting it either, but how is it federated/decentralized if they all redirect to the original server which throws a 429?


The federation occurred when the post was made, it was sent out to other servers and stored there. Then it got served to those users directly, without further contacting chaos.social.

To understand how it's decentralized, search for @jonty@chaos.social on your own fediverse home server and the post will pop up.

There's a particular user story here which is a bunch of people who don't have accounts on any server, wanting to see the content from a central location (chaos.social). I do think it's worth talking about this story and ways to fix it but it's not really accurate to blame it on the federation behavior.


They redirect you if you aren't logged in so you can't use them as an anonymous proxy. If you're logged in on your homeserver, you'll get that server's view of the post.


Could you expand on why being an anonymous proxy would be an issue in this case? I can't think of anything interesting off the top of my head.

You can't post (because you're not logged in), so there's no issues with moderation. The toot is already federated publically, so there's no issues with unintentional read access. It doesn't need to contact the original server, so there shouldn't be any load/DDoS issues. I must be missing something…


I produced those links by finding the post on each server so I was also surprised that they redirect when accessed directly. I mean it showed me the post without being logged in, and then I copied what was in the URL bar. Given how much browsing is available without being logged in I agree this should be fine if it loaded normally.


ugh I wish I had tried to generate these through a non-mastodon instance, could have saved a lot of confusion. It works fine through Calckey:

https://calckey.social/notes/9ebxxsy83i


All of these redirect to web.archive.org. Are they using it to offload traffic? That doesn't seem very nice.


No, all of these redirect to the original server when viewed by not-logged-in users. The original server has chosen to redirect to web.archive.


None of these links work for me. 429's on all of them right now.


You're being redirected to chaos.social because you don't have an account on those other servers.


To be fair, each of those is returning 429 at the time of this post.


Translating: I take it that's only the experience for non-logged-in users. Mastodon isn't meant to handle load from the anonymous public; it's meant for a federation of servers.


Ironically, I've seen pages make it to the front of HN that were run on RPis or weaker.

There was a guy running a site off of a single core 32bit ARM SoC that was able to handle the HN frontpage.


chaos.social is run by the Chaos Computer Club; you can assume that they configured it that way on purpose.

My profile, on the same server, loads fine.


This was both funny and ironic at the same time.

Upon clicking reply, I got the HN 429:

"Sorry, we're not able to serve your requests this quickly."

Wow.


Looks like we hugged it to death


Sites like HN and that other one should track this. It could work like flagging - if enough people mark it as hugged to death (HTD), it would say so next to the link. Maybe it could even redirect to an archive if it’s currently HTD.


lightly blew in its general direction to death


chaos.social is run on four dedicated servers

https://leah.is/posts/scaling-the-mastodon/


Because it is webscale.


This is why Mastodon, WebFinger and ACME use the .well-known URI prefix. .well-known is reserved, so you can't e.g. make a bucket named .well-known.

It's funny that the Bluesky devs say they implemented "something like WebFinger" but left out the only important part of WebFinger that protects against these attacks in the first place. Weird oversight, and something something don't come up with your own standards.


> This is why Mastodon, WebFinger and ACME use the .well-known URI prefix

This is not how Mastodon does verification (at least not the main method). Mastodon doesn't just link users -> domain. It can link user -> webpage, for example to link social profiles between sites.

If you have a website with user generated content, and a user can set an arbitrary html attribute (rel="me") in a link pointing back to their profile, they can claim ownership of the page on Mastodon. Likewise, if they can set a link tag in the head element of the page for some reason.

Presumably this is somewhat harder to exploit than a (new, poorly thought out) dependency on a static file under /xrpc, but Mastodon does introduce more authentication footguns for sites than just .well-known! https://docs.joinmastodon.org/user/profile/#verification

Edit: authentication -> verification, since Mastodon distinguishes between the two (see below)
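For illustration, here is a minimal sketch of what that rel="me" check amounts to from the verifier's side. The function names and URLs are made up, and this is not Mastodon's actual implementation - just the general idea that a page "verifies" a profile if it links back to it with rel="me":

    # Sketch of rel="me" link verification (illustrative only). A page
    # "verifies" a profile if it contains an <a rel="me" href="..."> or
    # <link rel="me" href="..."> pointing back at that profile URL.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class RelMeParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.rel_me_targets = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            rel = (attrs.get("rel") or "").split()
            if tag in ("a", "link") and "me" in rel:
                self.rel_me_targets.append(attrs.get("href"))

    def page_verifies_profile(page_url: str, profile_url: str) -> bool:
        parser = RelMeParser()
        with urlopen(page_url) as resp:  # assumes a public HTML page
            parser.feed(resp.read().decode("utf-8", errors="replace"))
        return profile_url in parser.rel_me_targets

    # Hypothetical usage:
    # page_verifies_profile("https://example.com/", "https://mastodon.example/@alice")

The point being: any page where a user can inject that link tag becomes "theirs" as far as a check like this is concerned.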


Neither of these are 'authentication'

You're thinking of how Mastodon does verified links. You could do something similar - provide a verified link on your profile to a file in an S3 bucket - but there's very little utility (or risk) in that.

Mastodon also allows you to be discoverable via a custom domain, using .well-known as parent mentioned https://docs.joinmastodon.org/spec/webfinger/ https://www.hanselman.com/blog/use-your-own-user-domain-for-...


I'm thinking, mainly, of how these features are exposed in the UI and how users experience it. What matters is that users take (rightly or wrongly) a verified profile link to mean "I control this webpage". So e.g. if you could verify a Twitter handle on Mastodon, it would mean "if you trust the identity of this Twitter handle, you should also trust the validity of this Mastodon user". That's extremely important to get right no matter what you call it.

I'm not sure what Bluesky was attempting to do here but what they achieved in practice was allowing a user to claim control of a domain by claiming control of a page. But if you allow user generated content on the home page of your site, there's not a distinction (from a Mastodon user point of view) between the two. It's effectively the same problem if I can "verify" yourdomain.com on Mastodon - and my point is that you can do that without using .well-known.


> But if you allow user generated content on the home page of your site, there's not a distinction (from a Mastodon user point of view) between the two.

If you allow UGC with *arbitrary HTML*, or explicitly support generating rel=me. Both are cases of you explicitly giving someone control of the site (or at least letting them claim they have it).


What about serving the challenge file from the root or a near-root of the fully qualified url? Like www.domain.com/mastodon.txt or abc.freehost.com/mastodon.txt?

Maybe I'm old but what are some popular use cases for webfinger? (I'm just learning about it now)


The /.well-known/ path prefix is the standard name to use (https://www.rfc-editor.org/rfc/rfc8615) so that any sort of “we’ll host user content from our domain” thing can block it. (Hosting user content from the user’s domain is fine and doesn’t need this restriction.)

A few things are effectively grandfathered in due to their vintage: /favicon.ico, /sitemap.xml and /robots.txt are the three that occur to me—so if you’re running something vaguely like S3, you’ll want to make sure users can’t create files at the top level of your domain matching at least those names.

But nothing new should use anything other than /.well-known/ for domain-scoped stuff, or else you run into exactly this problem.
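For illustration, here's roughly the guard a hypothetical "host user content on our domain" service would need. The reserved set below is just the names mentioned above plus .well-known, not an exhaustive list:

    # Sketch of a reserved-name check for a hypothetical file-hosting service.
    # The exact set is illustrative; the point is that /.well-known/ (RFC 8615)
    # and the grandfathered root files should never be claimable by users.
    RESERVED_TOP_LEVEL_NAMES = {
        ".well-known",      # RFC 8615: new domain-scoped metadata goes here
        "favicon.ico",      # legacy names that predate .well-known
        "robots.txt",
        "sitemap.xml",
        "crossdomain.xml",  # Flash-era, but see the sibling comments
    }

    def is_allowed_user_name(first_path_segment: str) -> bool:
        """Reject bucket/user names that would shadow domain-scoped files."""
        return first_path_segment.lower() not in RESERVED_TOP_LEVEL_NAMES

    assert not is_allowed_user_name(".well-known")
    assert is_allowed_user_name("my-photos")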


> A few things are effectively grandfathered in due to their vintage: /favicon.ico, /sitemap.xml and /robots.txt are the three that occur to me—so if you’re running something vaguely like S3, you’ll want to make sure users can’t create files at the top level of your domain matching at least those names.

I also recall /crossdomain.xml as an important one; allowing users to create an arbitrary file matching that name could allow certain kinds of cross-site attacks against your site.


I think crossdomain.xml died with Flash but I could be wrong, does anyone know?


None of the standardized web technologies use crossdomain.xml, but I think Acrobat Reader still uses it for... stuff. And acrobat still has a browser plugin, so I guess it's still a potential vector for abuse.


Ah! Reader. That's a fun one. I once encountered an "Acrobat Reader-only" PDF where, after filling it out and selecting any applicable attachments from your filesystem, you then... literally put your credentials for the website into the PDF so that it could... submit itself. I lost some brain cells seeing that.


Oh man, then you really don’t want to know about a product I once created.

Reader could have an optional Flash plugin, and better yet, you could configure the PDF interactive plugin to dynamically download the swf file to run.

I built an entire Flex based rich UI that was dynamically loaded by the 1kb PDF you’d receive in email, the Flex app retrieved and posted data via HTTP APIs.

Because reasons.

That product was live for years. I think we shut it down as recently as 2 years ago.

To be 100% clear, wasn’t my idea.

But it was my mistake to joke about the absurd possibility to build such a thing in front of some biz folks.


oh looooooooooooord. O_O


https://twitter.com/subtee/status/1654858616065732609?s=12

in an interesting coincidence, I found this today!


impressive, but still haha


But no browsers support 3rd-party plugins anymore. (I think the Chromium PDF viewer might be a plugin internally though?)


I learned something new today. I guess .well-known's purpose isn't well known!


The most important people to know about this stuff are the people for whom it's effectively part of how to do their job correctly. I know what it means if there's a flashing single amber light on a railway signal in my country, but it's not important that you know, and wouldn't be important if I'm wrong, however it's very important that the train driver knows what it means.

You'd hope that people doing job X would seek at least some insight into whether there are best practices for doing X, even if it's not a regulated job where you're required by law to have proper training. Not so much unfortunately.

Example: Many years ago now, early CA/B Forum rules allowed CAs to issue certificates for DNS names under TLDs which don't exist on the Internet. So e.g. back then you could buy a cert for some.random.nonsense and that was somehow OK, and people actually paid for that. It's worthless obviously, nobody owns these names, but until it was outlawed they found customers. But, even though the list of TLDs is obviously public information, some CAs actually didn't know which ones existed. As a result some companies were able to tell a real public CA, "Oh we use .int for our internal services, so just give us a certificate for like www.corp-name.int" and that worked. The CAs somehow didn't realise .int exists, it's for International Organisations, like ISO or the UN, and so they issued these garbage certificates.

[Today the rules require that publicly trusted CAs issue only for names which do exist on the public Internet, or which if they did exist would be yours, and only after seeing suitable Proof of Control over the name(s) on the certificate.]


That is basically the idea of .well-known

Webfinger is when you want to multiplex multiple identities on a single domain

E.g. https://example.com/.well-known/webfinger?resource=nick@exam...

Will serve the challenge proving your handle is @nick@example.com
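A small sketch of what such a lookup looks like from the client side. The domain and account are hypothetical; real servers answer at /.well-known/webfinger, and the resource is commonly given as an acct: URI:

    # Sketch of a WebFinger lookup (RFC 7033) using only the standard library.
    import json
    from urllib.parse import urlencode
    from urllib.request import urlopen

    def webfinger(domain: str, account: str) -> dict:
        query = urlencode({"resource": f"acct:{account}"})
        with urlopen(f"https://{domain}/.well-known/webfinger?{query}") as resp:
            return json.load(resp)

    # Hypothetical usage: find the actor URL behind @nick@example.com
    # doc = webfinger("example.com", "nick@example.com")
    # actor = next(l["href"] for l in doc["links"] if l.get("rel") == "self")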


.well-known is basically the same idea https://datatracker.ietf.org/doc/html/rfc5785


Or why not just serve it from www.domain.com/.well-known so we only have one thing to block. :p


S3 supports my-bucket.s3-us-west-2.amazonaws.com style URLs as well


In fact the path method is deprecated, but I don't know if support will ever be removed, because some (old) buckets have periods in their names, and therefore don't work with the subdomain format.


I use S3 for a simple app landing page and I use .well-known for deeplinks... Unless I'm misunderstanding your comment which I probably am.


Yes, but you don't have access to s3.amazonaws.com/.well-known, only to yourdomain.s3.amazonaws.com/.well-known.


Ah yep. The article is down so I didn't understand the context, ignore me!


Nostr too


.well-known seems unintuitive

Also the penalty isn't very high here. Someone impersonated a domain on a burgeoning protocol for a short while. So what?


> .well-known seems unintuitive

We're talking about folks setting up a custom domain for a personal social media presence. If they can handle nameservers and DNS records, they can handle a folder with a dot in the name.


but it disappears when you add the dot.


They can and probably should but what if they decide not to?

That's the problem with expecting people to agree with and follow standards.


If they decide not to, then they get all the capabilities, responsibilities, and level of participation that come with not following a standard that others are expecting.

You've effectively described what happens when people don't agree.


There's already a strong precedent for something like .well-known being disregarded — the ~/.config directory. It's the same idea, a special directory starting with a dot, and the objection seems to be similar, that it's awkward. In the case of the config directory it's that the storage for an app is spread between multiple directories like ~/.local/share and ~/.cache instead of one directory like ~/.vim

https://wiki.archlinux.org/title/XDG_Base_Directory

I support both well-known and XDG because I think the benefit outweighs that perhaps they could have been designed better. But I don't think that those who opt out of it could only be doing so out of ignorance.


~/.config is an interesting contrast. The difference is .well-known has different producers and consumers, webmasters and web clients, respectively. Whereas the thing that uses an application's config files is the same as the thing that created it.


With .well-known they're sometimes different components of the same tool, like with letsencrypt. That's a good observation though. I hadn't noticed that.


This is the second time one of my posts has caused issues for the chaos.social admins. I am so, so sorry.

The hacker news DDOS is real.

Previously: https://news.ycombinator.com/item?id=34691489


The interesting thing to consider is that someone who is popular to read may well be as much of a headache for the local sysop as someone who is a frequent target for attacks. How long until they have a 'bot that just 429s all of your posts on sight, do you think? (-:


On a more serious note: Given that many user communities in the FediVerse are people conscious of their privacy and of suffering pile-ons by large majorities of outsiders, I wonder how many more examples of this there will be before people start asking the developers for out-of-the-box defaults that simply blacklist all requests that come in with referrer headers containing news.ycombinator.com and other similar "Slashdotting" sites.

It's not beyond the bounds of possibility. Compare https://github.com/mastodon/mastodon/issues/15431 .


Everything has happened before and will happen again: https://www.wired.com/2015/11/how-instagram-solved-its-justi...


Sorry for what, offering a free lesson in cache optimization? :)


This is a terrible implementation of domain verification. DNS-01 and HTTP-01 are more or less standardized at this point. Use them, and don't roll your own. Reference: https://letsencrypt.org/docs/challenge-types/.


They definitely should have used HTTP-01 if they’re doing verification on the web, but since this is about using a domain as identity this really belongs in DNS.

The issue with DNS-01 (and HTTP-01 to a lesser extent) as someone else mentioned is that the user friction is really high.

I’ve been working on a solution to this that I’ve been meaning to post to HN and this seems like as good an opportunity as any so here it is: [1]

It’s a method of storing a hashed (and optionally salted) verifiable identifier (think email or mobile) at a subdomain to prove authority for a domain.

1. https://www.domainverification.org
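If I'm reading the description right, the shape of it is roughly the following. The hash choice, truncation and label naming here are my own guesses for illustration, not the actual domainverification.org scheme:

    # Rough sketch only: hash a verifiable identifier (email/mobile), optionally
    # salted, and publish the digest as a DNS label under the domain. The hash,
    # truncation and "_dv" label are assumptions, not the real spec.
    import hashlib

    def verification_label(identifier: str, salt: str = "") -> str:
        digest = hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()
        return digest[:32]  # DNS labels are limited to 63 characters

    # Hypothetical record a domain owner (or registrar) could publish:
    #   <verification_label("owner@example.com")>._dv.example.com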


> this really belongs in DNS.

And the primary way of identifying yourself is in fact DNS.

> I’ve been working on a solution to this

Your solution is almost identical to the BlueSky one: put a TXT record at _atproto.<domain> that resolves to a DID. The difference is that they mandate the DID spec and you do not. Which is totally fine! Just figured I'd let you know :)
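For reference, a sketch of resolving that record with dnspython (a third-party package). The "did=..." value format matches Bluesky's published tutorial, but treat the parsing here as illustrative rather than a reference implementation:

    # Sketch: look up the _atproto TXT record for a handle and pull out the DID.
    # Requires dnspython (pip install dnspython).
    import dns.resolver

    def resolve_handle_did(handle: str):
        answers = dns.resolver.resolve(f"_atproto.{handle}", "TXT")
        for rdata in answers:
            text = b"".join(rdata.strings).decode("utf-8")
            if text.startswith("did="):
                return text[len("did="):]
        return None

    # Hypothetical usage:
    # resolve_handle_did("example.com")  ->  "did:plc:..." if the record exists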


Thanks for taking a look and for your comment.

Another key difference is that the _atproto TXT record is discoverable since it’s always at _atproto. Whereas the “verifiable identifier” I use isn’t discoverable because it’s hashed and used as a dns label.

The ultimate goal here would be for these records to be populated by domain registrars upon a domain being registered (with registrant’s permission obviously).

This could create a kind of fast lane for domain verification across providers like Google Ads, Facebook, Office365 and everyone else that requests DNS verification.

The worst thing is that hundreds of providers request domain verification TXTs at the zone apex:

dig target.com TXT


I don't get http-based verification in general. If you want to really prove someone owns a domain, make them change an authoritative DNS record. Everything else feels like it is begging for edge cases to crop up. Why should my social media or SSL certificate vendor care about my web servers?


I worked on a product that required DNS changes to set up. Especially for business accounts, the level of friction was STUNNING. We had it take months to get set up because the contact had to submit a ticket to IT, write up the business justification, get director level approval, get security approval, and so on before it could get done. We had customers who couldn't even figure out which group in their company managed DNS. Yeah, you can argue that those companies are broken, but as an outsider I have no influence over that. The result was just that they couldn't use our product. On the flip side, we had consumer and small business customers who had purchased domains through simple webhosting things that didn't give them the required level of access to create a record (and/or they couldn't figure out how to do it). We eventually added an HTTP option and the success rate and time to success both improved hugely.


Especially for business accounts, the level of friction was STUNNING.

Honestly, that's a feature, not a bug.


Not in an economy that demands rapid, scalable, and infinite growth.


[flagged]


Doesn't capitalism demand infinite growth? If you don't have >= 2% YoY, the market thinks you're dead


Doesn't capitalism demand infinite growth? If you don't have >= 2% YoY, the market thinks you're dead

No, it doesn't. There are plenty of businesses, and even entire industries that have operated for centuries on less than 2% growth. And yet somehow the world kept turning.


Fair enough.


[flagged]


Anything touching the DNS records for the root of your entire web presence is not simple and needs substantial review.


Adding a new DNS record for a new, specific purpose is simple and low-impact, technically.


…which gets promptly forgotten about after its initial use case, and years later your user database gets sold on the internet.

How many “low-impact” things have been compromised over the years, I wonder?


Unless you have anyone competent running the DNS config, or a ticketing workflow of any kind. And I can't figure out what you think a DNS record with a one-time validation token could do if left unmanaged, beyond some adversary discovering it.


That's exactly why DNS verification is overkill.


For representing and verifying identity, it should need director level approval.


Yes.


That's an overly technical way of looking at things. This issue is a whoopsie, not a catastrophic failure at AWS. It doesn't actually represent identity that much because anything critical has humans in the loop. The bank won't accept this as proof of identity. NYT won't accept this as proof of identity: if this bluesky account confessed AWS murders puppies NYT would call somebody they know at Amazon to check.

A company blog is a much bigger vulnerability when it comes to representing and verifying identity. Rather than let somebody fake identify to a computer system it allows faking identity to humans reading it. Yet I don't think most places require director signoff to post.


Both http and dns verification are stupid. Neither of them prove you own the domain.

http verification proves you temporarily control IP space relative to a viewer. dns verification proves you temporarily control name resolution relative to a viewer.

Both are trivially hacked, multiple ways. By the time someone finds out you did it (if they closely monitor CT logs, which nobody does) you've already had hours, days, weeks to run a MITM on any domain you want. The attack only has to work once, on any of 130+ CAs.

The solution is registrar-level proof. Cert request signed by the private key of the domain owner, sent to the registrar to verify, the registrar signs it if it's true, it's sent to the CA who can see the registrar signed it. Now you know for a fact the domain owner asked for the cert. The only possible attack is to steal all three of the owner's private key, the registrar's private key, and a CA's private key.

I have been shouting about this for 10 years, none of the industry incumbents care. The internet is run by morons.


I can get behind registrar-level proof. And I can see why it won't happen, and it isn't because it's a bad idea.

One problem I see is the extra overhead for the registrars. Now they have one more thing to do: verify (sign) certificate requests. That extra work is probably enough to get registrars to push back against such a system.

The registrar would be assuming some of the functions of a CA. This would make it easier for a single entity to be both registrar and CA. That would threaten the business model of CAs, and thus they'd push back against such a system.

If the CA were responsible for getting the registrar's verification for a certificate request then that'd add extra work for CAs, and thus the CAs would push back against it. If the domain owner was responsible for getting the registrar's verification for a certificate before submitting it to a CA, then the domain owners would be against it.

And this is all assuming that people could agree on a common set of protocols or data formats for this new system.


> extra overhead for the registrars. Now they have one more thing to do:

I suppose I only take issue with "more" - as it stands don't registrars do effectively nothing today besides print money? It seems like the kind of business that doesn't require much that isn't already automated, and where the only reason I don't have a successful registrar business is that the contracts with whoever owns the actual TLDs are difficult to get. Perhaps they need to look out for DMCA letters? Idk maybe I'm way off, feel free to correct me if anyone knows it's a difficult job.


Instead of certificates, could you not use published tokens, using the same mechanism that registrars already use for publishing DNS NS "glue" records?


> ...dns verification proves you temporarily control name resolution relative to a viewer.

> Both are trivially hacked, multiple ways.

I'm genuinely curious how it is trivial to "control [authoritative] name resolution relative to a viewer".


Find out what the CA uses for its DNS resolver. Attack it with cache poisoning, or BGP spoofing, or compromise the account controlling the target domain's nameserver records, or trick some other system into making a record you want.

The BGP attack requires knowledge of internet routing and the DNS attack requires knowledge of DNS server exploits, but either of them can be executed with very minimal network access that any consumer can get. Target the nameserver account admin with a phishing attack, account reset attack, lateral password bruteforce, etc.

You'd be surprised how incredibly stupid the admins of some of the largest computer networks are. It's really not hard to get access to some accounts. It should require more than just a username and password to hijack a domain, but usually it doesn't.

In any case, if all you want is a valid cert, you can do it a number of ways that nobody will notice. Again, this only has to work once, on any one of 130+ different organizations. Not all of them have stellar security.

And I'm not even talking about social engineering either the CA, Nameserver, or Registrar's support people, which I consider cheating because it's so much easier.


It's not as much that it's trivial (but it seems like it always is because social engineering never stops working) but that once the attacker has authed they can generally delete whatever extra file or record they made and stay authed, potentially hiding the attack.

Whereas, if that required a signature from a private key, with a counter or other log of use in the TPM, it'd be caught by an audit without having to notice the symptoms.

I know that in security design that I've been involved with there's a lot more scrutiny given to each use of a privileged key than there is to making sure that all website logging lists each file in the directory at each request, or logging the full public state of your DNS every minute. Requiring a signed request makes the attacker come in through the front door.


> I have been shouting about this for 10 years, none of the industry incumbents care. The internet is run by morons.

Or maybe, just maybe, hear me out on this... maybe your proposal is not as smart as you think it is.

For one thing:

> Cert request signed by the private key of the domain owner, sent to the registrar to verify, the registrar signs it if its true

What exactly does the registrar verify, and how?


The person who owns the domain creates a private key and uploads the public key to the registrar when they buy the domain. Literally a 68 byte string. Not exactly hard to store. The domain name itself may be longer.

The domain owner creates a CSR and signs it using their private key. Sends it to the registrar. The registrar uses the public key the user uploaded to validate the signature. This happens millions of times a day on shitty computers, this is completely old boring technology.

Now the registrar sends the Registrar-Signed-CSR back to the user. The user sends the RS-CSR to a CA. The CA uses the Registrar's public key to validate the Registrar's signature (exact same process as before). Now the CA can see the Registrar signed it, so it's legit.

Easy to automate. Boring old technology. Same flow millions of computers use every day, just with one extra party in the middle.
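As a toy illustration of that chain (Ed25519 via the `cryptography` package; real CSR formats, key distribution and revocation are all glossed over, so this only shows the verify-then-countersign flow):

    # Toy sketch of the registrar-countersigned request flow described above.
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    owner_key = Ed25519PrivateKey.generate()      # held by the domain owner
    registrar_key = Ed25519PrivateKey.generate()  # held by the registrar

    csr = b"stand-in for a real CSR for example.com"

    # 1. Owner signs the CSR and sends it to the registrar.
    owner_sig = owner_key.sign(csr)

    # 2. Registrar validates against the public key uploaded at registration
    #    time (verify() raises InvalidSignature on failure), then countersigns.
    owner_key.public_key().verify(owner_sig, csr)
    registrar_sig = registrar_key.sign(csr + owner_sig)

    # 3. The CA checks the registrar's signature before issuing.
    registrar_key.public_key().verify(registrar_sig, csr + owner_sig)
    print("CA sees a registrar-endorsed request")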


How does the CA get the registrar's public key in a way that cannot be spoofed or hacked like you say DNS and HTTP verification can? If your threat model already includes hacking a CA's network infrastructure, getting them to accept the wrong key as valid doesn't seem any more difficult than the others.


Doesn't mandatory DNSSEC also fix this?


DNSSEC adoption is pretty low outside of customers of large nameserver providers (that support DNSSEC out of the box), and also requires the registrar and registry support DNSSEC too. Unfortunately there are still some out there that don't.


> If you want to really prove someone owns a domain, make them change an authoritative DNS record.

You're not wrong (ignoring how easy it is to hack DNS), but at the same time it's hard enough to get people to buy their own domain name, nevermind understand the system well enough to add a TXT record.

It's a strategy that's fine to implement when your target audience is server admins. It's a terrible strategy when your target audience is everyday users who you hope own their own domain. Doubly so in a world where owning your own domain is so rare for individuals.


Convenience. DNS is routinely not automatable by API, or inconvenient to automate. HTTP, however, is normally easy to work with.


It's not even that it's not automatable, it's just that it follows a completely different control scheme and path than HTTP does.

for 99.99% of cases when a domain is pointed at me and I want to serve an SSL certificate for it, I can answer an HTTP-01 challenge. Needing to orchestrate a DNS challenge will always be a more complicated external thing.

HTTP challenge (and TLS-ALPN) are in-band, DNS is out-of-band.


Sure, adding a TXT record to verify domain ownership is fairly common and lots of tools still use it. But you either have to self host DNS (yet another container to maintain) or use your provider's API (yet another credential, yet another mailing list to subscribe to for inevitable breaking changes to the API).

In contrast, HTTP based verification often has built-in support with your webserver (Caddy) or only requires copy-pasting a few lines to your docker compose file.

There are edge cases, but they're also widely exploited so you won't run into them if you follow best practices.


If you control a domain's DNS entry but I can serve arbitrary content to users from its servers, who really owns the domain?


I think people don't want to put DNS admin credentials in places where they might get leaked. Would be cool if a DNS server or provider offered credentials that could only do ACME challenges and not change any other records.


> Would be cool if a DNS server or provider offered credentials that could only do ACME challenges

There's nothing preventing you from making the DNS record a CNAME to something under a zone that you're allowed to modify.

This is how one of my setups works; _acme-challenge.someservice.example.net is a CNAME to someservice.acme.example.net, and acme.example.net is served by a bind9 that allows dynamic zone updates based on TSIG-signed DNS update requests over WireGuard.

So the machine that hosts someservice has a DDNS key that signs DNS update requests for someservice.acme.example.net, and bind9 is configured to allow that key to change that record.
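A sketch of that update path using dnspython's dynamic update support. The zone names, key name/secret and server address are all placeholders, and the TSIG key only needs rights over the delegated acme zone:

    # Sketch: push the ACME TXT token into the delegated zone via a TSIG-signed
    # dynamic update (dnspython). All names and the secret are placeholders.
    import dns.query
    import dns.tsigkeyring
    import dns.update

    keyring = dns.tsigkeyring.from_text({
        "acme-key.": "bWFkZS11cC1iYXNlNjQtc2VjcmV0Cg==",  # placeholder secret
    })

    update = dns.update.Update("acme.example.net", keyring=keyring)
    # This is the record the CA actually queries, reached via the CNAME from
    # _acme-challenge.someservice.example.net.
    update.replace("someservice", 60, "TXT", "acme-challenge-token-here")

    # dns.query.tcp(update, "192.0.2.1")  # the bind9 server that allows this key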


acme-dns[1] is probably what you might want if you are up for running your own bit of infra. Implements a simple rest api for changing the txt records for acme verifications and nothing more. It works nicely as a delegated nameserver.

[1] https://github.com/joohoi/acme-dns


DNS challenge is required for wildcards on LE at the very least.

But the reason for HTTP is pretty simple - it's extremely easy to implement. You only need to tell your ops to redirect a subdomain to your app and you're done; you don't need a DNS API with permissions narrow enough to let that one team in the whole company generate ACME stuff. Most providers' ACLs on DNS end at "this client has access to that domain via the API".


It's not about proving ownership; if it were about proving ownership, we would do this via something at the registrar level.

It's about proving /control/. If a domain name is pointed to me (my IP/CNAME) I control it and it is reasonable to allow that person to issue an SSL certificate for a domain (or subdomain) under their control. If you, as the domain owner, want to restrict that, CAA exists as your tool to do so.


HTTP-01 works really well for when you host a custom domain on a SaaS application. The domain ownership stays with the customer, and all they have to do is CNAME/ANAME to your server. No messing with DNS TXT or NS records, 3rd party DNS provider API keys, etc.


Seems like the crypto crowd is moving on to creating flaky decentralized twitter clones now


Bluesky was started within Twitter as a federated social protocol in 2019. Spun out in 2021 as a standalone company and it was also being used in the Crypto group at Twitter until Elon came in.

I'm sure just because of its age and principals involved it's been heavily influenced by the crypto crowd.

But suddenly, without anything else really stepping in to fill the void, it's the Twitter alternative of the day. Anyway not surprised to find some gotcha's like this all things considered.


> I'm sure just because of its age and principals involved it's been heavily influenced by the crypto crowd.

It builds off of several specifications that came from the crypto crowd. It does not use a proof of stake or proof of work blockchain, though, so depending on how you use the words "crypto" and "blockchain," it either is or is not those things.


> > I'm sure just because of its age and principals involved it's been heavily influenced by the crypto crowd.

> It builds off of several specifications that came from the crypto crowd. It does not use a proof of stake or proof of work blockchain, though, so depending on how you use the words "crypto" and "blockchain," it either is or is not those things.

From the protocol's FAQ docs itself:

> Is ATP a blockchain?

> No. ATP is a federated protocol. It's not a blockchain nor does it use a blockchain.

https://atproto.com/guides/faq#is-atp-a-blockchain

Architecturally, it's an attempt at improving ActivityPub in terms of account transfers & portability between federated instances, which ActivityPub doesn't inherently support. Mastodon, by comparison, requires one of those steps to be the explicit export into a locally-saved file, rather than communications between the federated instances themselves.

https://docs.joinmastodon.org/user/moving/


Yes. I agree with the FAQ. But I have heard decent arguments (that I disagree with) regarding what "blockchain" means by people I respect, and so I want to be respectful of that.


[flagged]


It got hugged to death too. Icing on the cake.


The link is to Mastodon, not Bluesky.


Talking about a flaky twitter clone in another flaky twitter clone. It’s inception!


I have no idea what this means. Can someone explain?


If you can upload a custom file to a domain/subdomain, Bluesky Social (Jack Dorsey's new Twitter) uses it to verify you are the owner of the domain. Chaz uploaded his custom file to their Amazon s3 bucket and, since he was the first one to do it, his account is now associated with Amazon S3.


It's ridiculous that this is not in the title.


Hacker News discourages "editorializing" the title, which means there's incentive to repeat what's being linked to exactly.

Most of the time, it's a good thing, but in cases like this is where this falls over.

(You can also see this in the other direction in the parent comment, for what it's worth: "Jack Dorsey's new Twitter" isn't really accurate, as far as I'm concerned. It is more informative overall, though.)


Describing or at least providing context is not editorializing. I don't know how this "discouragement" is phrased, but it should instead encourage (if not require) that titles mean something to a general audience (at least as represented by HN's users).

I am routinely down-modded and even banned for merely asking for more-descriptive titles. It's anti-user, anti-community, anti-usefulness, and douchey.

All we needed here was, at least, "Bluesky Social allows domain hijacking" or whatever it's actually doing (which I don't have a grasp of, even after following the cryptic link).

Or even just "This guy is now all of S3 on Bluesky Social." But that wouldn't be as click-baity, would it?


> Describing or at least providing context is not editorializing.

Absolutely. I'm not saying that I think that the title here is good. Just that I understand why it ended up as the title.

> I don't know how this "discouragement" is phrased,

You can find the guidelines here: https://news.ycombinator.com/newsguidelines.html

To quote the relevant part:

> Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize.

That's it.

> (which I don't have a grasp of, even after following the cryptic link)

I described it over here, if you're still curious: https://news.ycombinator.com/item?id=35820670


Thanks for the info! I'll check it out.


In this case, I agree something more descriptive would have been helpful. Even the comments have been mysterious, given the linked web site only returns "429 Too Many Requests".


FWIW it seemed obvious to me. I think a minority of people who play in this space can’t conceptualize others’ understandable ignorance of the norms and axioms.

https://xkcd.com/2501/


The title doesn't even mention bluesky, the all-important context here.

*Edit: typo


You got all that from a "429 Too Many Requests". That's an impressive level of deduction Holmes!


Hahahahah. Try https://media.discordapp.net/attachments/1043284184698994700... to see what the conversation is about!


Nah. The responses to your comment saved us the time.


It's unfortunate that such a bright man has written such a comment.


>Chaz uploaded his custom file to their Amazon s3 bucket

"their" is who, exactly? And why does bsky associate it with the s3 domain if it's just a file in a random bucket?


That's the whole point. Chaz uploaded the file to his own s3 bucket. He is one of thousands (millions?) of people who could have done the same thing with their own s3 bucket. He was the first.


I would argue this is worse (and more hilarious) than Musk buying and giving out checkmarks for people.



Any domain on the public suffix list should just be ignored I suppose. https://publicsuffix.org/list/


It'd be nice as an extra precaution, but please don't build things that rely on the Public Suffix List for security (this list by its nature is only a laggy incomplete approximation of the actual use of domains).


bluesky dev here. whoops.

as others mentioned, not a hard fix.


What is the easy fix? Use the .well-known/ standard instead of the current mechanisms, and roll back verification for anyone who’s already been verified with the flawed approach?


Hilarious though! I'm guessing this is the kind of stuff the Beta was supposed to find. Any other cool/funny bugs y'all have found?


I created a rich text system for posts to handle things like links and mentions with the eventual goal of it being the basis for all kinds of rich text (bolding, italics, spoiler tags, etc)

the flexibility bit us on the butt. people started faking mentions via the APIs and one user figured out he could pack 1000 mentions into one "@everyone" and cause us all to get notified. pretty predictable in hindsight but I dropped the ball there


Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

http://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule


Do you have a public test suite so we can see how solid your protocol is?


Can I ask what's your tech stack?


[flagged]


(not paul)

Given that the AT Protocol is not compatible with ActivityPub, I don't see how the first step, let alone the second or third, is an accurate description of the dynamics at play here.

> And also how long until Twitter 2.0 or X app uses AT

Ironically, pre elon-buys-twitter, it did in fact look like Twitter was going to end up implementing AT, if all went to plan. But then post acquisition, if anything, that looks less likely.


There is something really endearing about this. It takes me back to those days where we were building something new and bugs just popped up and we had fun with them, doing silly things and then fixing them. Morale was high, team was fun, we were all learning and building something new.


If you want to prove domain ownership, you have to do it at the domain level.

The ability to serve a file at “www.example.com” in no way demonstrates ownership of “example.com”; it demonstrates that you control www.example.com.

If you want to prove ownership of a second level domain you must do it through a record in DNS, or through demonstrating control of something that is publicly known to control the domain such as the administrative contact emails.

This really is a solved problem in the PKI space; they should have borrowed that rather than invent their own.


As said multiple times in this thread, the primary way of identifying yourself in this protocol is a TXT record in DNS.


The "primary" way doesn't really matter if a user checks their app and sees that it was verified.

Unless the UI makes it clear it was verified with "non-primary" methods so users can be cautious, any method of verification is essentially "primary" from the user POV.


Yes. It is a problem that this method has issues. They’ll be fixed. My point is, they did not ignore that case, they focused on it! This is just a bug in an additional method.


Unless it's been recently updated their help article only lists TXT record for verification https://blueskyweb.xyz/blog/4-28-2023-domain-handle-tutorial


Everything is moving very quickly, this is just generally the case for anything related to at protocol/bluesky (which is fine, they've been quite busy).


is blueskyweb.xyz an official site for them? I've only seen bsky.social which now redirects to.. bsky.app? (if so couldn't the web stuff live on.. bsky.app?)

edit: bsky.app links to blueskyweb.xyz so I guess so. weird.


Mastodon has no chance if, every time something goes a little bit viral, its instance dies under the traffic.


Mastodon is a bunch of individual instances of varying power, but maybe they should build in something that detects load, archives itself, and redirects to the archive.


Let’s be honest… Mastodon was never going to make it outside niche technical circles, regardless of scaling. It’s a fun project though.


OK, this is a very cool method of verification for a social network but they goofed. Everyone goofs. In this thread folks being like "this is a bunch of jokers", yeah, I think they learned their lesson about rolling their own domain verification and everyone had some fun.


It's not just this incident. Like not reading your own ToS to know what's in there - they seem to be sloppy.


Good thing nobody uses bluesky, this would be really awkward otherwise.


429 Too Many Requests - hackernews hug of death?



Here is a link that (currently) doesn't break: https://mastodon.gamedev.place/@jonty@chaos.social/110307532...


Original has a 429, alternative link for this post thanks to the Fediverse!

https://mastodon.social/@jonty@chaos.social/1103075321453803...


That doesn't work either, it just redirects back to the 429'd page on chaos.social. It's because the admins of the site put this up temporarily to deal with HN load.


Isn't this how, over a decade ago, people "hacked" Google by hosting a similar file on Google sites/blogs when that was a thing? How is anyone still using this method for verification in this day and age :/


The same mastodon post was also posted to twitter:

https://twitter.com/jonty/status/1653915932677271552


It looks a bit comical how Twitter rivals are doing, based on this:

- Post points out a mistake in BlueSky's tech kind of comparable to "rolling one's own crypto" in concept (not as dangerous, but the kind of mistake one wouldn't expect an outfit with this profile to make)

- Post is inaccessible because it's on Mastodon and that specific Mastodon instance seems borked for whatever reason

All after lots of rage in the sphere about Twitter's tech being fragile and probably going down any time soon since they fired most of their staff - but it's chugging along fine


Does anyone have a screenshot? I'm getting a 429.



Thx. Also. Yikes!



Yet another link to a mastodon instance that can't handle more than 10 concurrent users...


They'll just commandeer it later...


This is absolutely hilarious.


Am getting a 429 error. Can someone please fill me in on what was here?


Kiss of death


They switched it to make the page redirect to the Web Archive for now.


I see just error 429 can someone please explain?


They switched it to make the page redirect to the Web Archive for now.


429 too many requests


[flagged]


It's not that hard to modify code.. and I'd argue it's new enough that people should expect growing pains if they want to be early adopters...

"Death" of a product for one bug? ....


This was a simple thing, the fact they got it so wrong only makes me wonder what other bad choices have been made.


Have you... ever worked at a software startup?


I mean I think you could also do this on Mastodon but it would just show a "verified" highlight for amazon s3 in your profile, not that it would get verified as your username.


I don’t know what specifically you’re speaking of, but for the stuff I know of, Mastodon uses WebFinger, which puts the important stuff inside /.well-known/, and .well-known should be blacklisted as a “username” in any of these sorts of systems, for this very reason. (https://www.rfc-editor.org/rfc/rfc8615 specifies the /.well-known/ path prefix.)


No, you can't. To own s3.amazonaws.com, the HTML on that exact domain - not a sub-path - needs to have a specific link back to your profile on it.


Without understanding all the context, why not just serve files directly from your smartphone? You can approve your own contact list and share keys.

Yes, of course bandwidth is a concern, but then again there should be a way to monetize, and that's entirely possible using direct payments, which already exist.

A little bit of caching, a CDN, and your phone as the initial file server. Tag a photo or video as shared.

Really it's just an RSS feed and CDN.


I think this is not about using s3 to serve files, but someone having verified owning s3 on bsky by putting some challenge file in his bucket. My guess, also missing context.


That is correct.

1. Bluesky allows you to use a domain as a handle by creating a TXT record on an _atproto subdomain of the domain you wish to use (see https://mxtoolbox.com/SuperTool.aspx?action=txt%3a_atproto.s... for mine)

2. You can also serve up your DID by having the URL "https://<handle>/xrpc/com.atproto.identity.resolveHandle" return the DID.

3. AWS buckets have the URL structure http://s3.amazonaws.com/[bucket_name]/

4. register "xrpc" as an S3 bucket, drop a file named "com.atproto.identity.resolveHandle" with the correct JSON in it

5. boom! your username can now be s3.amazonaws.com

Hope that helps.
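To make step 2 concrete, here's a sketch of the HTTP side of that resolution (illustrative, not Bluesky's actual client code; the "did" key in the response is an assumption based on "the correct JSON" above). Because S3 also serves buckets path-style under the bare domain, a bucket literally named "xrpc" ends up answering this URL for s3.amazonaws.com itself:

    # Sketch of HTTP handle resolution as described in steps 2-5.
    import json
    from urllib.request import urlopen

    def resolve_handle_over_http(handle: str) -> str:
        url = f"https://{handle}/xrpc/com.atproto.identity.resolveHandle"
        with urlopen(url) as resp:
            return json.load(resp)["did"]  # assumes a {"did": "..."} payload

    # Hypothetically, resolve_handle_over_http("s3.amazonaws.com") would have
    # returned the DID stored in the file inside the "xrpc" bucket.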


Thanks for the explanation. Kinda surprised xrpc hadn't been registered as a bucket name long ago. Or maybe it was.


Just created it yesterday. I don't think there's as much incentive to squat on the S3 namespace like there is for domain names.


Yeah, a bucket name isn't the face of your company.

At a previous company I worked at, every bucket had the company name prefixed. Never had a problem with squatters.

I wonder if Amazon actually has any policies to prevent squatting.


Sounds like Bluesky screwed up by not implementing the https://publicsuffix.org/ list


The root cause here IMHO is more subtle than that, but I do agree that implementing that at some point is probably a good idea.


Like Pied Piper? If yes, there is an entire show on it.



