Amazon S3 Path Deprecation Plan – The Rest of the Story (amazon.com)
604 points by jeffbarr on May 9, 2019 | 135 comments



Thank you for listening! The original plan was insane; the new one is sane. As I pointed out here (https://twitter.com/dvassallo/status/1125549694778691584), thousands of printed books have references to V1 S3 URLs. Breaking them would have been a huge loss. Thank you!


If we're talking textbooks, well then. This is a textbook case for the 301 HTTP response code.


The old REST-style S3 URLs are specifically excluded from being able to redirect:

https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-...

You can create a new bucket or switch your existing one to "Static Website Hosting" mode to enable 301 redirects for your content going forward. But the URL for the "website" version isn't the same as the REST URL. And again, there's no way to redirect from the old naming scheme to the new one.

If you have content that you've ever linked with one of those URLs, it's stuck there forever.
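
For new content, the website-endpoint redirects look roughly like this (a minimal boto3 sketch; the bucket name and key prefixes are made up, and again this only applies to the website URL, never the old REST URL):

    import boto3

    s3 = boto3.client("s3")

    # Serve 301s from the *website* endpoint for keys under a given prefix.
    # This does nothing for the legacy REST-style URLs discussed above.
    s3.put_bucket_website(
        Bucket="example-bucket",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "RoutingRules": [
                {
                    "Condition": {"KeyPrefixEquals": "old-assets/"},
                    "Redirect": {
                        "HttpRedirectCode": "301",
                        "ReplaceKeyPrefixWith": "new-assets/",
                    },
                }
            ],
        },
    )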


> And again, there's no way to redirect from the old naming scheme to the new one.

For customers, no. For Amazon itself, yes. And I think that is what parent commenter meant. That Amazon should 301 all requests that are using old paths.


It's not that simple, unfortunately. It won't work for the old dotted addresses, and S3 is not HTTP.


This isn't a restriction if you're AWS and looking to give more customers a soft landing over an extended deprecation timeframe.

The certificate concern some are raising is also a furphy.


Except for all the dotted bucket names which can't be redirected because the result will always trigger a certificate error.


That's not inherently accurate.

You can do inline generation of LetsEncrypt certificates with bucket-name-specific CN/SAN.

The fact that bucket names could contain characters which are wholly invalid as DNS labels is a bigger issue.


There's a book out there that points to my website. I wish they had just mirrored it. My website has been dead for a few months, and some O'Reilly book has a dead link.


I just now realised that if the web required two-way links there would be no way to put them into books!

Some context:

Back when the www was first released, the most common criticism was the lack of back links. This was such a stupid and obvious deficiency that it really wasn't worth even looking into a system that was an obvious stillbirth. It wasn't just the "experts" saying this, but so many of them did because it was just so dumb. So you've probably never heard of this "world wide web" thing -- not only were there only one-way links, but it had its own homegrown markup language dialect, and instead of using an ordinary protocol like FTP or even gopher it pointlessly used its own http protocol.

(Also back then there was this research protocol called TCP/IP, which was another waste of time given that the OSI protocol stack was poised to dominate the networks just as soon as a working one was written. I wonder what the modern equivalents are).


Pfff, TCP/IP will never succeed. It doesn't have enough layers! /s

https://archive.org/details/elementsofnetwor00padl

"The Book": The Elements of Networking Style: And Other Essays & Animadversions of the Art of Intercomputer Networking, by M. A. Padlipsky (1985)

The World's Only Known Constructively Snotty Computer Science Book: historically, its polemics for TCP/IP and against the international standardsmongers' "OSI" helped the Internet happen; currently, its principles of technoaesthetic criticism are still eminently applicable to the States of most (probably all) technical Arts -- all this and Cover Cartoons, too, but it's not for those who can't deal with real sentences.

Standards: Threat or Menace, p. 193

A final preliminary: Because ISORM is more widely touted than TCP/IP, and hence the clearer present danger, it seems only fair that it should be the target of the nastier of the questions. This is in the spirit of our title, for in my humble but dogmatic opinion even a good proposed Standard is a prima facie threat to further advance in the state of the art, but a sufficiently flawed standard is a menace even to maintaining the art in its present state, so if the ISORM school is wrong and isn't exposed the consequences could be extremely unfortunate. At least, the threat / menace paradigm applies, I submit in all seriousness, to protocol standards; that is, I wouldn't think of being gratuitously snotty to the developers of physical standards -- I like to be able to use the same cap to reclose sodapop bottles and beer bottles (though I suspect somebody as it were screwed up when it came to those damn "twist off" caps) -- but I find it difficult to be civil to advocates of "final," "ultimate" standards when they're dealing with logical constructs rather than physical ones. After all, as I understand it, a fundamental property of the stored program computer is its ability to be reprogrammed. Yes, I understand that to do so costs money and yes, I've heard of ROM, and no I'm not saying that I insist on some idealistic notion of optimality, but definitely I don't think it makes much sense to keep trudging to an outhouse if I can get indoor plumbing . . . even if the moon in the door is exactly like the one in my neighbor's.

Appendix 3, The Self-Framed Slogans Suitable for Mounting

https://donhopkins.com/home/Layers.png

    IF YOU KNOW WHAT YOU'RE DOING,
    THREE LAYERS IS ENOUGH; 
    IF YOU DON'T,
    EVEN SEVENTEEN LEVELS WON'T HELP
https://en.wikipedia.org/wiki/Michael_A._Padlipsky

On the occasion of The Book's reissuance, Peter Salus wrote a review in Cisco's Internet Protocol Journal which included the following observations:

Padlipsky brought together several strands that managed to result in the perfect chord for me over 15 years ago. I reread this slim volume (made up of a Foreword, 11 chapters (each a separate arrow from Padlipsky's quiver) and three appendixes (made up of half a dozen darts of various lengths and a sheaf of cartoons and slogans) several months ago, and have concluded that it is as acerbic and as important now as it was 15 years ago. [Emphasis added] The instruments Padlipsky employs are a sharp wit (and a deep admiration for François Marie Arouet), a sincere detestation for the ISO Reference Model, a deep knowledge of the Advanced Research Projects Agency Network (ARPANET)/Internet, and wide reading in classic science fiction.

In a lighter vein, The Book has been called "... beyond doubt the funniest technical book ever written."


You might want to read my comment in the other subthread over here:

https://news.ycombinator.com/item?id=19867467

Also, thanks a lot for the reference; strangely, this is the first time I've heard of this book! However, I have to most strongly disagree with the claim that "three layers is enough", and you'll hopefully come to understand why after checking the above comment. Which, by the way, DOESN'T speak in favor of OSI AT ALL - nor in favor of IP, for that matter. Two brief quotes from over there:

"[…]

This does not mean that we should be doing OSI. Good grief, no. …

[…]

————

[…]

[22] Someone will ask, What about IPv6? It does nothing for these problems but make them worse and the problem it does solve is not a problem.

[…]"


Your timeline's all wrong. TCP/IP and the Internet built on it were already well established by the time the web was born.

There was nothing particularly special about HTTP or HTML, or even the concept of the web. What made it a success was the availability of a server reference architecture, and, more importantly, a browser. It was easy to try it out, see the value, and get up and running with your own server if you had something to publish.

Discoverability was a problem in the early days. There were printed catalogs of websites! Backlinks might have helped, but they clearly were not a fundamental requirement for the web's success.


The fact that TCP (and its own institutional infrastructure) was already established is what made the whole OSI network effort even more enjoyably absurd. It was the last gasp of Big IT trying to take over the crazies. Most amusingly to me, it seemed only to be discussed in enterprise contexts and Very Important IT Journals. Such people were officially committed to deployment, while their own people were busy getting stuff done. IIRC the first nail in the coffin was the US military ignoring the naked emperor and officially deciding to stick with TCP. But by that time most people with real work to be done had ignored the whole OSI effort.

> Discoverability was a problem in the early days.

Yeah, I remember being at a conference at which a smart person (actually a smart person, no snark) said that discoverability would be over, as indexing the web would require keeping a copy of everything, which, of course, is completely impossible. And we all nodded, because indeed, that did make sense. And about six months later, when AltaVista launched, it seemed only to confirm this belief.


You both get so much of this story so utterly… /not even/ __quite__ wrong, but more importantly, leave so much detail out that, if I didn't presume better (which I do! It would seem rather paranoid if I didn't.), I'd suspect lying by omission. All of this, which includes the story to follow, makes me—and I don't say this for exaggeration purposes, it really does have an emotional impact—very sad, although it doesn't surprise me; barely anyone realizes the true technological horror lurking deep in the history of the 'Internet'.

Please, consider looking A BIT more at the history of TCP/IP:

http://rina.tssg.org/docs/DublinLostLayer140109.pdf (Slides!)

http://rina.tssg.org/docs/How_in_the_Heck_do_you_lose_a_laye... Day, John - How in the Heck Do You Lose a Layer!? (2012)

Abstract:

"Recently, Alex McKenzie published an anecdote in the IEEE Annals of the History of Computing on the creation of INWG 96, a proposal by IFIP WG6.1 for an international transport protocol. McKenzie concentrates on the differences between the proposals that lead to INWG 96. However, it is the similarities that are much more interesting. This has lead to some rather surprising insights into not only the subsequent course of events, but also the origins of many current problems, and where the solutions must be found. The results are more than a little surprising."

And here, a rather lengthy excerpt from later in the paper, as I suspect a lot of people might presume that the paper would go for some points it definitely DOESN'T go for:

"[…]

This does not mean that we should be doing OSI. Good grief, no. This only implies that the data OSI had to work with brought them to the same structure INWG had come to.[10] OSI would have brought along a different can of worms. OSI was the state of understanding in the early 80s. We have learned a lot more since.[11] There was much unnecessary complexity in OSI and recent insights allow considerable simplification over even current practice.

OSI also split the addresses from the error and flow control protocol. This creates other problems. But the Internet’s course is definitely curious. Everyone else came up with an internet architecture for an internet, except them. These were the people who were continually stressing that they were building an Internet.

Even more ironic is that the Internet ceased to be an Internet on the day most people would mark as the birth of the Internet, i.e. on the flag day January 1, 1983 when NCP was turned off and it became one large network.

It was well understood at the time that two levels of addressing were required. This had been realized in the ARPANET when the first host with redundant network connections was deployed in 1972. The INWG structure provided the perfect solution. It is clear why the Internet kept the name, but less clear why they dropped the Network Layer. Or perhaps more precisely, why they renamed the Network Layer, the Internet Layer. Did they think, just calling it something different made it different?

[…]

————

[…]

[10] There was little or no overlap between SC6/WG2 and INWG.

[11] And if the politics had not been so intense and research had continued to develop better understanding, we would have learned it a bit sooner.

[…]

[22] Someone will ask, What about IPv6? It does nothing for these problems but make them worse and the problem it does solve is not a problem.

[…]"

http://csr.bu.edu/rina/KoreaNamingFund100218.pdf more slides!

And much more, here:

http://rina.tssg.org/ (I find it rather strange that the RINA folks and the GNUnet folks each seem to pull their own thing instead of working together; it very much seems like a—hopefully NOT inevitable—repeat of the very thing John Day describes in the slides & articles above…)

——

Addendum #1:

See also, for a network security perspective:

http://www.toad.com/gnu/netcrypt.html

http://bitsavers.informatik.uni-stuttgart.de/pdf/bbn/imp/BBN... see also Appendix H here, starting on PDF page 180

———

Addendum #2:

Something in the back of my mind & the depths of my guts tells me I should link the following here, albeit I remain completely clueless as to why, or how it could seem relevant to—& topical for—any of the above, so, I'll just drop it here without explanation:

https://en.wikipedia.org/wiki/Managed_Trusted_Internet_Proto...

(Interesting standards compliance section there, by the way.)


> You both get so much of this story so utterly… /not even/ __quite__ wrong

Sorry, but no. It is perfectly factual.

> leave so much detail

I was on mobile (hence the typo). I included the level of detail necessary to make my point.

Speaking of points – did you have one?


There is this preoccupation in these documents with "applications" and "services" being an important part of the design of a network. I find it wrong to the point of being troubling. Although, granted, I say this now only with the power of hindsight.

With the power of hindsight, imagine an internet where Comcast, AT&T, et al. have such granular control over access to your infrastructure. À la carte billing based on each addressable service you happen to run, for example. DNS hijacking on steroids, as another. The development of new protocols for all but the biggest "participating" organizations would be stillborn. Capitalism would have strangled the Internet baby in the crib long ago if we had "done it right."

The road to hell is paved with good intentions and these are some of the best intentions.


>Imagine an internet where Comcast, AT&T, et al, have such granular control over access to your infrastructure

They already do, though; we call it port filtering & deep packet inspection, and when they do that, people get mighty angry. ;)

Also, you seem to implicitly assume that they'd have come to gain as much power as they have now, but that seems doubtful, given how different this would work.

Also, I think you (unintentionally!) attack a strawman there - the point here consists more of matters like congestion control.

Besides:

I said what I said about hoping for cooperation between RINA & GNUnet for a reason:

RINA lacks the anti-censorship mechanisms of GNUnet, while GNUnet lacks some of the insights from RINA research.

And those anti-censorship mechanisms would make your point entirely moot.


You're probably right about the straw man, but I wasn't trying to make an argument against better models. Just rationalizing why what we got ain't so bad after all.

Yes, they do all that now but it's kludgy, easy to detect, and a much more obvious overreach of their presumed authority. See Comcast/Sandvine fiasco.

As to the implicit assumption of the Telcos' powers under such a system: history is my biggest "proof". It's a reasonable assumption given economics and game theory. At best it wouldn't have been the Telcos directly that ended up with the control. Someone would have, and the result would still be the same.

How about this: how many terrible government regulations about filtering and censorship that are technologically infeasible with the current internet would have been not only technically possible, but fully fledged features of an objectively better design?

Again I'm not arguing against research and better designs, just rationalizing what we got.


It is a shame that OSI didn't happen, though; instead we have an ad hoc mess of a layer cake without the clear boundaries demonstrated by the OSI model.


I don't agree: the OSI protocols were the classic camel-is-a-horse-designed-by-committee: heavyweight and looked like a pain to use. Looked like, as I never saw a working stack.

The IETF/RFC/Working Code/Interop (RIP) approach has given IPv4 incredibly long legs. At least the OSI model itself kinda survived.


ISODE [1] worked, and it was usable before there were commercial ISPs offering TCP/IP services.

[1] https://en.wikipedia.org/wiki/ISO_Development_Environment


If you want low performance, that’s one way to get it.


A book (on Libgdx) uses one of my repos as a starting point, telling the readers to clone my repo and make certain changes. I've left the repo alone, bugs and all, as I think it's cool that people use my code. But the authors never reached out or anything; I only discovered it by chance. I could easily have invalidated their whole chapter by accident.


This announcement comes as a relief. I was already drafting the email I'd need to send to current/former clients, letting them know that they'd have to hire me (or someone else) to write/run a migration to update asset paths hardcoded in static HTML pages or risk broken assets going forward. IIRC, an older version of the Froala WYSIWYG editor didn't support uploads using "nested" object paths (e.g. bucket/post-1/photo.png) or virtual-hosted (VH) paths, which is why I leaned on the path-style feature for a few projects. So, not only would the fix (for the projects using Froala) involve migrating the S3 objects, I'd also have to change application code and/or upgrade the Froala WYSIWYG editor.
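
The migration itself would have been mostly mechanical, something like the sketch below (the bucket name and directory are hypothetical, and dotted bucket names would need separate handling):

    import re
    from pathlib import Path

    # Rewrite hardcoded path-style S3 URLs in static HTML to virtual-hosted style.
    PATH_STYLE = re.compile(r"https://s3\.amazonaws\.com/example-bucket/([\w\-./%]+)")

    for page in Path("static-html").rglob("*.html"):
        html = page.read_text()
        fixed = PATH_STYLE.sub(r"https://example-bucket.s3.amazonaws.com/\1", html)
        if fixed != html:
            page.write_text(fixed)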

Hindsight is 20/20, of course, but I'm (unpleasantly) surprised they didn't take into account the thousands of ways a hard cut-over would break the web before drafting and (softly) announcing the initial plan.


Stupid question but can’t amazon simply do a redirection?


It'd break working paths, because bucket names can contain dots, which work fine in "old style" paths but break using vhost style (a cert can only have a single wildcard, and the wildcard does not match dots, so "*.s3.amazonaws.com" will not match "foo.bar.s3.amazonaws.com", and it's not possible to create a "*.*.s3.amazonaws.com" cert).
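
A toy illustration of the wildcard rule (not a real TLS hostname check, just the leftmost-label logic):

    # "*" is only allowed as the entire leftmost label and never matches a dot,
    # so one wildcard covers exactly one extra label.
    def wildcard_matches(pattern: str, hostname: str) -> bool:
        p_first, _, p_rest = pattern.partition(".")
        h_first, _, h_rest = hostname.partition(".")
        if p_rest != h_rest:   # everything after the first label must match exactly
            return False
        return p_first == "*" or p_first == h_first

    print(wildcard_matches("*.s3.amazonaws.com", "plain-bucket.s3.amazonaws.com"))   # True
    print(wildcard_matches("*.s3.amazonaws.com", "dotted.bucket.s3.amazonaws.com"))  # False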


But if it is an auto-redirect, can't AWS redirect to an alias of the bucket that replaces dots with another character?


How do you handle conflicts when another bucket name legitimately uses whichever replacement character you picked? Historically, bucket names are a superset of DNS names. In fact, in early 2018 Amazon modified the bucket naming rules in one of the older regions, as the names that region allowed were completely inaccessible in vhost style (not just a cert mismatch, the names were literally not expressible):

> The legacy rules for bucket names in the US East (N. Virginia) Region allowed bucket names to be as long as 255 characters, and bucket names could contain any combination of uppercase letters, lowercase letters, numbers, periods (.), hyphens (-), and underscores (_).
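
To make that concrete, here's a toy check of which names can even appear as a DNS label (the legacy-style bucket name below is made up):

    import re

    # DNS labels: letters, digits, and hyphens, 1-63 chars, no leading/trailing hyphen.
    DNS_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")

    def expressible_as_vhost(bucket: str) -> bool:
        return all(DNS_LABEL.match(label) for label in bucket.split("."))

    print(expressible_as_vhost("jbarr-public"))            # True
    print(expressible_as_vhost("My_Archive.2015_backup"))  # False -- underscores, uppercase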


Or use an MD5 of the bucket name with a prefix, and make bucket names with that prefix followed by a bunch of hex chars illegal. It doesn't need to be pretty; it's only an alias that will never be typed or viewed by a human.

[edit]: I assume that new bucket names that break subdomains have been made illegal now, so they'd be working with a finite and static list of names. I am sure they can come up with a substitution that doesn't create collisions, and with a reserved prefix you can avoid future collisions.


You don't allow them to register whatever replacement character you pick?


> You don't allow them to register whatever replacement character you pick?

1. it's a few years too late for that.

2. it makes very little sense and is not really workable; the set of valid characters in a DNS name is limited and quite finite. You're suggesting Amazon just bans e.g. z from bucket names (without explaining what happens to all existing bucket names with a z in them).


  s3.amazonaws.com/abc-xyz -> abc-xyz.s3.amazonaws.com
  s3.amazonaws.com/abc-x.y.z -> only punycodeof(abc-x.y.z).encoded.s3.amazonaws.com


Punycode encodes ASCII characters literally, so punycodeof(abc-x.y.z) is abc-x.y.z, which fixes nothing whatsoever.
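
You can see that with Python's built-in IDNA codec, which applies punycode per label and passes all-ASCII labels through untouched:

    # All-ASCII labels survive punycode unchanged, dots and all, so the
    # certificate problem is exactly where it started.
    print("abc-x.y.z".encode("idna"))        # b'abc-x.y.z'
    print("bücher.example".encode("idna"))   # b'xn--bcher-kva.example'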


well how about somethingLikePunyCodeButNotActuallyPunyCodeSoThatEveryoneUnderstandsHisPointAndDoesntFocusOnIrrelevantTechnicalities(abc-x.y.z) ?


Still doesn't help with domain censorship. This was discussed in-depth in the other thread from yesterday, but TLDR, it's a lot harder to block https://s3.amazonaws.com/tiananmen-square-facts than https://tiananmen-square-facts.s3.amazonaws.com because DNS lookups are made before HTTPS kicks in.


And even if you use encrypted DNS, the domain is still in the clear via SNI. There's an IETF draft for encrypting SNI, but that's not here yet.


Encrypted SNI seems to be tailor-made for this situation. The public portion of the request SNI would just be s3.amazonaws.com, but the ESNI extension would carry the full subdomain name. That (plus encrypted DNS) solves the privacy issue, while still enabling all the new features and improvements they have planned.


Does encrypted SNI solve the issues that AWS is trying to solve by removing path-style S3 requests? Namely, removing the complexity and single point of failure of having one domain, and making it possible for subdomains to have varying allowed cipher settings? I don't know how encrypted SNI will work, but surely it must choose the cipher before it sends the domain, so it ought to rule out one of the reasons for the change?


According to https://tools.ietf.org/html/draft-ietf-tls-esni-03, ESNI parameters (e.g. the public key) are queried via DNS. In the above case, the browser queries a TXT record for _esni.tiananmen-square-facts.s3.amazonaws.com, in addition to A and/or AAAA records for tiananmen-square-facts.s3.amazonaws.com. It then uses the public key and other parameters to encrypt the SNI.

So it simply shifts the problem to DNS. To keep requests confidential from sniffers on your LAN or somewhere along the path, you're expected to use something like DoH.
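
A sketch of that lookup with dnspython (the record almost certainly doesn't exist today; the point is that the full name still travels over DNS):

    import dns.resolver  # dnspython

    # The draft puts the ESNI keys in a TXT record under _esni.<name>,
    # so anyone watching unencrypted DNS still sees the bucket name.
    name = "_esni.tiananmen-square-facts.s3.amazonaws.com"
    try:
        for record in dns.resolver.resolve(name, "TXT"):
            print(record)
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        print("no ESNI record published for", name)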


Aren’t TLS certificates also sent in plaintext?


Not anymore, as of TLS 1.3.


However, encrypted SNI and DNS-over-HTTPS should provide a better solution to that problem once they're ready, and there's no reason to think Amazon won't adopt encrypted SNI.


It actually helps tremendously, since at the very least now there can be a "black market" for legacy (pre-Sept-2020) buckets, especially those on dedicated accounts that can be provided to organizations spreading facts like this.


> It actually helps tremendously, since at the very least now there can be a "black market" for legacy (pre-Sept-2020) buckets

Err, no, countries will just block the legacy bucket URL style and say that only the bad guys would still be using it.


If they went to that extreme maybe they’d block AWS altogether. Or all SSL traffic!


People didn't seem to remember that the exact same thing happened to Google: Picasa, YouTube, GCloud, GSuite, and finally Gmail.

Who's next, domain fronting on Microsoft Azure?


That would mean they are blocking all S3 buckets indiscriminately.


Only old S3 buckets that are accessed the old way.


Couldn't they just middle-man the traffic and block specific URLs?


ssl prevents that.


It explicitly does not. It means there are additional barriers to doing it - people would need to accept a bad cert (we already know the overwhelming majority will), or they would need to slip in their own CA that allows them to generate their own valid certs for MITM, but that is eminently doable for the Chinese government inside of China. They can then block all traffic for people that do not use the cert that allows them to decrypt said traffic. It functionally is the exact same thing, and would still allow "legitimate" traffic without problem.


That's not what "explicitly" means. SSL explicitly does prevent MITM attacks from intercepting the URLs of requests.

The fact that you can get around it by ignoring the cert is a bit irrelevant. It's like saying locks don't work because people can break your window.


As noted, you don't have to ignore the cert, and we're talking about state level actors.

And it's not the window. It's like saying locks don't work if the state has a master key, which they do.


They already have their own CA in browsers, so they can easily MITM. That’s why mobile apps will use certificate pinning to verify their server


I thought countries who did this already issued their own certs to be able to analyze traffic. Like China. Maybe I misunderstood.


I don't understand, why would restricting access to an anti-censorship service help fight censorship?


I mean that relative to the previous proposal, this is better. But other commenters make good points that legacy buckets could be blocked.


Why is having a black market good? Is it that there's inherent value in making a market for something people find valuable (even if the seller is some tech person and the buyer is a dissident in a country that probably can't legally send money to San Francisco in the first place), or is it something else?


What could Amazon do that would "help"?


Still let people use the old path system, even on new servers. This is an often-used trick to get around government censorship that will be destroyed with this change.


TBH, though, I'm not sure that would really help that much over time. It's clear there are real benefits to having bucket names in subdomains, such that the majority of new users would pick that format anyway. In that situation, governments may end up just blocking s3.amazonaws.com because "only old bad guys" continue to use it.


Why must S3 carry the burden of getting around censorship?

If you are talking about China, yeah, Google used to carry that burden. Now GAE, GCloud, YouTube, and Gmail are all gone. The whole IP range was blacklisted.

Now what?

Just because something accidentally works does not mean it will last forever.


There's no "must", but it would be very nice of them.


China isn't the only repressive country in the world, there's plenty of people using domain fronting right now in other parts of the world.


I believe their point was that a central authority would have finer grained censorship control with the v2 bucket scheme instead of the old path based scheme.


I don't think they really want to. It's literally the biggest reason for the change (they can work around the load balancing issues) and they do not address it at all in the article.

The Internet is not what it used to be, and it's going to become more and more difficult to deal with censorship.

The Internet treats censorship as a malfunction and routes around it. - John Perry Barlow


Why is it harder? Just block s3.amazonaws.com and that's it. I don't think countries that block domains care about s3.amazonaws.com.


This is interesting for a few reasons. IMHO, the original deprecation plan was reasonable. Not generous, but reasonable. Especially compared to what other cloud providers (eg. Google Cloud) have done. It did seem like a diversion from their normal practice of obsessively supporting old stuff for as long as possible, but it really wasn't too bad.

Responding to feedback, publicly, and explaining what they were trying to do and why they needed to do it, is incredibly refreshing.

This seems like a big PR win for AWS. I'm left trusting and liking them more, not less.


How was the original plan “reasonable”? S3’s FAQ talks about durability in terms of tens of thousands of years https://aws.amazon.com/s3/faqs/. To honor that claim, AWS has the burden to support v1 URLs until the end of the internet.


Storage durability has nothing to do with this. Changing how you access data saved in storage is a reasonable change to keep up with the evolution of networking technologies.

The change here was deprecating an access pattern, not destroying data or anything remotely similar.


Tell that to the author of “The Peace Corps and Latin America”, who used S3 v1 URLs dozens of times in the book, with the assumption that they’d be available forever: https://books.google.com/books?id=Q312DwAAQBAJ&pg=PA135&dq=h...

And thousands of other books like that.


We're arguing over semantics. Here's the exact quote from the page you linked:

> designed to provide 99.999999999% durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001% of objects. For example, if you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years

They are referring to durability as corruption in the underlying data store due to known storage-technology risks. If they were to take into account all other risks, they would also have to include the risks of nuclear war, cyber warfare, the US government classifying your organization as a terrorist threat, etc etc.
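
For what it's worth, the arithmetic behind the quoted example checks out:

    objects = 10_000_000
    annual_loss_rate = 1 - 0.99999999999                   # 1e-11 per object per year
    expected_losses_per_year = objects * annual_loss_rate  # ~1e-4 objects per year
    print(1 / expected_losses_per_year)                    # ~10,000 years per lost object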


Testing some of the other, non-s3 links in that bibliography, they're also all dead.

I'm not supporting the old deprecation policy; in fact, I thought it was insane. But if anyone publishes a URL assuming that it won't disappear, they've not been paying attention. If the Peace Corps just migrated to Azure instead, the links would also die.


Why would they think/assume that URLs have infinite durability? If you buy a floppy disk with some data, it's very likely that you can still get the data it holds if you manage to get hold of a floppy disk drive. But you cannot expect the evolving conditions that allow you to get a floppy disk drive to never change.

It's very likely that there would be something that replaces URLs in the future.

Amazon's promise seems to be that they will always keep your data's integrity and keep it accessible. The way I see it, they will be moving my data to newer or better storage types and will keep my data unchanged regardless of any technology change.

To keep with the analogy, Amazon's promise here is that they will always keep the data that you originally stored on your floppy disk, but they cannot promise to give it back to you on a floppy disk. Next year they might give it back to you on a CD, and the year after on a cartridge.

The data has been always there, intact. They are just giving it to you in a different medium.


Did they discuss that with Amazon prior to publishing? That seems like a completely silly assumption to make for a product that’s been around for only 13 years.


Also every AWS book out there that has instructions on how to install the AWS CLI uses the v1 URL! You know why? Because that’s the official URL! https://twitter.com/dvassallo/status/1125502432924975104?s=2...


The distinction is academic if you don't have the ability to change the URL.


It's pedantic, but durability has little to do with how the object is routed. The object is still there.


That's not being pedantic.

A move operation is representationally an atomic copy and delete.

The original object is clearly no longer there.

It's clear-cut breaking of backwards compat for no reason at all.

I don't even know why they aren't supporting both API versions indefinitely; it doesn't really make any sense, since it's literally a URL rewrite/301 for anything hitting the old domain. Want to avoid our bottlenecked legacy LBs for better performance? Hit the new LBs. Hell, even the SDK doing this upfront would alleviate a crap tonne of legacy requests.

People need to stop allowing corps to break backwards compat so nonchalantly; it's unprofessional. And on top of that, AWS has a really good track record of maintaining backwards compat; allowing them to get away with this is just asking for more down the line.


Could you please clarify what Google Cloud did in comparison? I'm not arguing, just want to know more about Google Cloud.


It is good to see them modifying their plan. I get the need to have stable APIs, but also the reciprocal challenge of updating APIs to handle new use cases and scale better.

The thing that I wish companies would do in this case is the following: set a hard date for when the change will happen (hopefully giving at least 18 months). Then, send a weekly (or monthly) report detailing the number of things a customer has that aren't compliant. For example, every week AWS could send a report summary to the account holder listing any URL accessed via the path structure. They could then log in to see more details. A lot of systems are sprawling and people are busy putting out fires, so a constant reminder and a hard end date keep it top of mind, and you don't end up working through the weekend trying to get something back up and running. I wish Apple would do this for deprecated APIs, e.g. email me regularly that I have an app in the store that is using deprecated APIs and that they will stop working on date X.


> Bucket Names with Dots – It is important to note that bucket names with “.” characters are perfectly valid for website hosting and other use cases. However, there are some known issues with TLS and with SSL certificates. We are hard at work on a plan to support virtual-host requests to these buckets, and will share the details well ahead of September 30, 2020.

I’m mystified how they’re planning on doing this. Anybody care to speculate?


They're already a CA, could they reasonably just issue a certificate for every bucket? I have no idea how many buckets there are in total.

~They probably couldn't take the Cloudflare approach of jamming 100 customer domains onto each certificate, since that would leak bucket names too easily.~


A change like this would already end up exposing every bucket name - at least if you wanted them to actually work in browsers. CT logging is required by all browsers now (subject to https://chromium.googlesource.com/chromium/src/+/master/net/... ).


There is progress being made on allowing redaction of subdomains in CT logs.

https://tools.ietf.org/id/draft-strad-trans-redaction-01.htm...


> They probably couldn't take the Cloudflare approach of jamming 100 customer domains onto each certificate, since that would leak bucket names too easily.

Issuing one certificate at a time wouldn't make a difference since they're all submitted to public CT logs. Bucket names shouldn't contain sensitive information and security through obscurity is a bad idea.


> security through obscurity is a bad idea

Obscurity is a good and sensible layer for defense in depth. Given systems A and A', where the only difference in A' is added obscurity, A' will be more difficult to attack.


Yeah, good point. It makes me doubt that they will issue bucket-specific certificates at all. Perpetually exposing every single bucket name seems like a bad trade-off just to satisfy certificate verification.

Maybe there will be a name translation scheme for bucket names with periods (kinda like punycode for IDNs).


> security through obscurity is a bad idea

Only if that's your only defense, which it was for a lot of old crypto schemes, and why the crypto community consensus was to publish algorithms and assume the attacker has the implementation. That mindset isn't universally applicable.


With an Internet that doesn't push packet rejection back to the bad hosts, rather than to the victims being flooded, being able to individually address buckets sounds like an increased risk.


Isn't it technically valid to issue a cert with *.domain.com, *.*.domain.com, *.*.*.domain.com, etc.?


No, you can only have a single wildcard per domain listed in a cert.


And only as the leftmost component; that is, *.example.com is valid, foo.*.example.com is not.


You can technically create them, but IIRC browsers don't trust them.


And creating/signing them is a violation of the CAB Forum Baseline requirements.


It would probably be easier to work with customers who can't migrate and make a wildcard for their use case; e.g. for xyz.evilcorp bucket names, you could just make one *.evilcorp.s3.amazonaws.com cert.


They could probably even generate them in a lazy-loading style


Given that the certificates don't really need unique keys, this would actually be feasible, yes (since then generating the cert only requires an RSA signature).


I'm guessing that the vast majority of buckets with dots in their names are actual domain names that are CNAME'd to S3, not just random strings of characters that happen to contain dots.

In that case, you can create a CloudFront endpoint with your bucket as the origin, and point your domain name at the CloudFront endpoint instead. CloudFront can already handle TLS with arbitrary domain names, so I wouldn't be surprised if this becomes the official recommendation for buckets with dots in their names. You already need to do this anyway if you want to use TLS with your S3-hosted static website, because you're probably not accessing it at a ((sub)sub)subdomain of s3.amazonaws.com.


Use some kind of encoding? Maybe a type of Punycode?


For anyone still confused as to why AWS dominates the cloud market, it's because they're willing to grandfather features with a reasonable sunset horizon.


I'd say the momentum from having no competition in the space for nearly a decade is far more impactful.


I'd say it's a combination of both.

Not having competition means they got a lot of users, but having long-term feature support means people didn't run away.


Malloc for the internet: "We launched S3 in early 2006. Jeff Bezos’ original spec for S3 was very succinct – he wanted malloc (a key memory allocation function for C programs) for the Internet. From that starting point, S3 has grown to the point where it now stores many trillions of objects and processes millions of requests per second for them. Over the intervening 13 years, we have added many new storage options, features, and security controls to S3."


It's nice to see that, instead of deprecation, support for the old paths will continue for all buckets created on or before the cut-off date of Sept 30, 2020.

So if you don't want to change, you can continue using the old paths. It just might limit access to some new features coming later that depend on the virtual-hosted subdomains.


This is a great step forward. Particularly changing the rules a little so that old buckets won’t break after a certain date.

Thank you for taking the time to write this up Jeff.


You are welcome. And now, back to my stay-cation.


Okay, probably a dumb question, but why can't they just have an automatic redirect from the path style to the virtual-hosted one under the hood? People would get both options up front and could work with the one they like.


"In this example, jbarr-public and jeffbarr-public are bucket names; /images/ritchie_and_thompson_pdp11.jpeg and /jeffbarr-public/classic_amazon_door_desk.png are object keys."

I think this should be:

"In this example, jbarr-public and jeffbarr-public are bucket names; /images/ritchie_and_thompson_pdp11.jpeg and /classic_amazon_door_desk.png are object keys."


You are correct; thanks for spotting this! All fixed.


Props to Amazon for listening to feedback and altering course.


Grandfathering is a good idea. GJ AWS.


"GJ"?


Good job


Kind of tangential, but is Bezos a programmer type? I thought he came from banking or the big 4. I’m curious if the “malloc for the internet” bit is verbatim.


Great quote.

I had thought the same about his background; from https://en.wikipedia.org/wiki/Jeff_Bezos it looks like he was technical in college and for the first couple of years (or less) of his career.

> He graduated from Princeton University in 1986 with degrees in electrical engineering and computer science.

and

> After Bezos graduated from Princeton in 1986,... He first worked at Fitel, a fintech telecommunications start-up, where he was tasked with building a network for international trade.[28] Bezos was promoted to head of development and director of customer service thereafter.[29] He transitioned into the banking industry when he became a product manager at Bankers Trust; he worked there from 1988 ...


I knew I should have wiki’d it! I suppose a Princeton compsci grad would know a bit about compsci ...


He got degrees in computer science and electrical engineering from Princeton. I have no idea how competent he is, but I think you can assume at least equal to the average college grad SDE.


In 2006 absolutely. Today, almost certainly not.


He said once in a "fireside chat" with his brother that if Amazon went all to shit and never took off, he would probably be a perfectly happy software engineer somewhere.


He was at D.E. Shaw, a company not known for hiring dead wood.


I don't think Jeff is a programmer, but I think he is very smart and his success shows that he can understand and create businesses around lots of different types of concepts and ideas. I imagine in reality there is some amount of collaboration that happens before Bezos spouts off something like "create a malloc for the internet", but in any case, it's very strong for his brand of leadership that the lore states it came straight from him.


1.) He graduated from CS at Princeton.

2.) Only an actual programmer would know what malloc is and how it works (he knows exactly what it is, since he had the idea of "malloc for the web")


Point taken, I didn’t know Bezos had an engineering background! Amazing.


He's literally an EE/CS Major. He's a programmer at heart. His technical depth is key to a lot of Amazon's biggest successes.


Wow! News to me. Thanks for letting me know.


It seems to me that adding a 301 redirect from the old URL to the new one would not unreasonably stress the resources of AWS? It seems perfectly reasonable to update the library access, but breaking old URLs seems unnecessary. They could even add a second of latency to incentivise people who can update their links.


Some HTTP clients don’t support redirects, or at least require an explicit configuration to enable them. So this would still be a breaking change for some applications.
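
For illustration, a client with redirects disabled would just see the 301 and stop (the URL and the redirect itself are hypothetical; S3 doesn't actually issue one here today):

    import requests

    # A hypothetical 301 off the old path-style URL: with redirects disabled,
    # the caller gets the status and a Location header, not the object.
    r = requests.get(
        "https://s3.amazonaws.com/example-bucket/report.pdf",
        allow_redirects=False,
    )
    print(r.status_code)               # would be 301 if S3 redirected
    print(r.headers.get("Location"))   # left for the application to follow -- or not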


Yeah, pushing the new way is fine, but not removing the logic to resolve the old way is better.


Agreed! I think Amazon definitely made the right call here.


Pre-signed URLs still come back from the S3 SDK in the V1 path style. I'm assuming this either changes at some point, or they will continue to work?
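
For what it's worth, the addressing style is already configurable per client in boto3, so you can opt presigned URLs into the virtual-hosted form today (bucket and key below are made up):

    import boto3
    from botocore.client import Config

    # Ask botocore to build virtual-hosted URLs instead of path-style ones.
    s3 = boto3.client("s3", config=Config(s3={"addressing_style": "virtual"}))
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "example-bucket", "Key": "report.pdf"},
        ExpiresIn=3600,
    )
    print(url)  # https://example-bucket.s3.amazonaws.com/report.pdf?X-Amz-...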


This was DOA when I first read it. S3 or AWS wouldn't break a single customer before changing anything.


Does this change affect S3 access via the various AWS SDKs, or just the format of URLs?


I still don't get why there was such an uproar about this: Amazon should just issue a "301 Moved Permanently" and be done with it.

If your app for some arcane reason doesn't understand an HTTP status code that's been around for 20 years... your code is bad and you should feel bad.



