The Sunset HTTP Header Field (ietf.org)
159 points by okket on May 16, 2019 | 82 comments



I'm aware that the following is typical HN middlebrow matter, but I'm asking anyway. Does anyone know why RFCs are still formatted as if they were written on a typewriter in the seventies? I mean, here's a sentence quoted verbatim from this document:

    Sunset header fields will be served as soon as the sunset date is

    Wilde                           Informational                     [Page 9]
    --------------------------------------------------------------------------
    RFC 8594                        Sunset  Header                    May 2019

    less than some given period of time.
What problem does this solve? How is this more useful than it is cumbersome? Also, how do people write this? Do they manually space space space space align the footer and then copy it every 30 or so lines?


Don't worry, there's an RFC for that!

RFC 7990 -- RFC Format Framework: https://tools.ietf.org/html/rfc7990

There are various tools to automatically format things as necessary, just like any other kind of text wrapping.

As far as the overall "philosophy" behind keeping it this way, the honest answer is that the IETF is just a particularly unlikely group to change things without a clear need, and there are likely all sorts of tools small and large that expect RFCs to follow these conventions at this point.


As an example of this (though for a non-RFC document):

Here's the "source" XML that is authored: https://openid.net/specs/openid-connect-core-1_0.xml

That can be compiled into this HTML: https://openid.net/specs/openid-connect-core-1_0.html

Or to this RFC-like plaintext: view-source:https://openid.net/specs/openid-connect-core-1_0.txt

Most new RFCs are authored this way.


First line in the Abstract for that RFC:

> In order to improve the readability of RFCs while supporting their archivability, the canonical format of the RFC Series will be transitioning from plain-text ASCII to XML using the xml2rfc version 3 vocabulary;

Is it readable? Yeah

Is it archivable? Yeah, XML is (AFAIK) one of the most closely followed standards I can think of.


The legal world works the same way too. They're amazingly low tech. A lawyer in my family mentioned that it's basically a combination of:

- everyone can do it with any software (or even a typewriter)

- consistency with legacy documents. The format doesn't just change on you as you're reading through legal history

- it works fine, why change it?

I'd also add a guess:

- There's no room for implementation detail to affect formatting. Last thing you want is a whole bunch of formats that are similar but not identical, just because someone's software is a bit different.

- could you imagine trying to get everyone to change? We should be so lucky that everyone's already this consistent


> could you imagine trying to get everyone to change?

Yes, this is exactly what I do for a living, consolidating policy and legal documents and their related business workflows into modern applications. My requirements for how the text editors work are far more meticulous than your average app exactly for the reasons you stated. Concerns with formatting that most products would blow off as trivial are deal-breakers in this industry.


Is there a reason to not store the document in an abstract format that is more easily handled by systems useful for legal analysts (e.g. giving you the ability to diff text), and just “renders” to the accepted format? (I’m picturing storing the docs as LaTeX, but anything like that would work. Maybe there could be a legal “theme” for a markdown processor, for example.)

Because, in such cases, it wouldn’t really matter if the editor renders the source to text incorrectly, as long as the proofer renders it correctly. Just like with WYSIWYG desktop-publishing software.


Storage formats aren't the issue. We diff and merge documents just fine, and do render them in different formats in some use cases for specific audiences. Nor is it about a final rendered document. It is the details of the workflows and collaborations that happen before a document is ever finalized where the editing and reading experiences must match.


Very likely: because they are managed by software that has existed for decades, built when expectations and needs were very different from today's, but which works well and correctly. Why invest time and effort into changing a system that is working well enough? Whether you need to view it on a screen or print it out, this format works.

Also, I guarantee there are any number of downstream consumers of RFCs which take this sort of format as a given, and which will break on even a minor change. And why break those downstream systems if you don't have to?

Basically, any changes will break something. So the benefits of the changes need to be bigger than the costs of the changes. Not to mention the cost in wasted time of all the humans bikeshedding how to change it to make it "better".

Dealing with the ongoing cost of humans having to read across artificial page breaks is a pretty minor concern compared to the costs of all that.


You can read it with any software you like, now and in 30 years when Microsoft Word is a quaint relic in a museum.


How does that not hold for a plain .txt document without page-marker-ascii-art?


I presume there exists a documented way to print them so that everything lines up properly.


The files contain ASCII form-feeds between pages. If you send that directly to a printer, it will cause it to start a new page.
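If you ever want to re-split one programmatically, a minimal sketch in Python (assuming a local copy of the RFC text file) is just to split on that character:

  # Split an RFC plain-text file into its pages using the ASCII
  # form-feed character (0x0C) that separates them.
  with open("rfc8594.txt", encoding="ascii") as f:
      pages = f.read().split("\f")
  print(len(pages), "pages")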


As I first encountered MS Word running on a Xenix system 31 years ago, I suspect it will be alive and well in 30 years' time!


To be fair, I recall having conversations in the 1990s about how MS Word was such an evil proprietary format and that one day we wouldn’t be able to read it. And here we are nearly 30 years later and Word docs (in a much evolved file format) are still here and still widely supported and easy to read using many tools, including open source ones.

Not saying that proprietary formats aren’t still a bad idea for other reasons, but predictions of unreadability don’t seem to have panned out for any common file formats.


How do you open Word documents from the '90s?!? Do you have a Windows 95 VM or something?

Even with modern Microsoft Word, the formatting of old documents is often mangled.

To this day, up-to-date PowerPoint can’t reliably display presentations made with up-to-date PowerPoint on a different machine, let alone OS!


> How do you open word documents from the 90’s?

LibreOffice


I open them in Word 2016.


> still widely supported and easy to read using many tools, including open source ones

Only someone who has not tried could possibly say that.

The numerous doc file formats are a constant headache for anyone doing document processing. Not even Word itself can read its own older formats reliably. Sometimes you have better luck with LibreOffice, sometimes not.

And that's the most widely used document file format. Anything else from the same era is completely dead in the water. Manually viewing them can be done in emulators with a bit of work, but any automatic processing is a huge undertaking.


Good luck editing these WordPerfect and CorelDraw files!


I put my vaccination history into a ClarisWorks document. At least, I assume I did from the file name…

Could be worse. My dad used a video tape format even more obscure than Betamax.


Yikes. At least laserdiscs weren't homemade so I could replace what little I had.


I'm not sure about Word docs, but FWIW 90s era Excel files have become progressively harder to open.


Unlike with Word, I actually spent a few years of my life working on this.

At the surface layer, this era of Excel ("BIFF" documents) isn't too bad: getting, say, a table of small integers representing people's annual salaries out of an XLS file is very doable, and many programs today will get that right.

As you start to dig down it gets nastier pretty quickly. Formulae require implementations that match not just what Microsoft's published documents (I have loads of these on a shelf I rarely look at now) say, but what Excel actually did, bug for bug, back in the 1990s. Maybe the document says this implements a US Federal tax rule, but alas Excel got the year 1988 wrong, so actually it's "US Federal tax rule except in 1988".

You also run into show stoppers that prevent the oft-imagined "Just transform it to some neutral format" because Excel isn't a typed system. What is 4? Did you think it's the number 4? Because the sheet you're trying to parse assumes it's actually the fourth day of the Apple Macintosh epoch in one place, but in another place uses it to index into an array. Smile!
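To make that concrete, here's a small Python illustration of the same raw value meaning different things (hypothetical cell value; the 1904 system is the classic Mac default, the 1900 system the Windows one, and Excel's famous 1900 leap-year quirk only matters for later dates):

  from datetime import date, timedelta

  serial = 4  # the raw cell value; the file itself carries no type information for it
  # As a date under the 1904 (classic Mac) date system:
  print(date(1904, 1, 1) + timedelta(days=serial))    # 1904-01-05
  # As a date under the 1900 (Windows) date system:
  print(date(1899, 12, 31) + timedelta(days=serial))  # 1900-01-04
  # Or it might simply be an index into a range somewhere else in the sheet.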

Finally in complicated sheets (often "business critical") there's a full-blown Turing complete programming language, complete with machine layer access to the OS. Good luck "translating" that into anything except an apologetic error message.


> Good luck "translating" that into anything except an apologetic error message.

I'm going to have to steal that line. :)


> Does anyone know why RFCs are still formatted as if they were written on a typewriter in the seventies?

They are formatted in plain text with fixed page sizes because that's what they've always done, it works fine, and there's no compelling reason to change.

> also, how do people write this?

The thing about keeping the same format for a few decades rather than changing it with each shift in popular fashion is that there is plenty of supporting tooling.

https://www.rfc-editor.org/pubprocess/tools/

https://tools.ietf.org/


It's super readable with any plain text editor or browser. And it's really nice to read something not filled with images, crazy fonts, colours and other junk. It's just pleasant.


It sucks on e-readers. It would be much nicer if there were no headers and footers, and no newlines except between paragraphs...



It could help applications using capability URLs, like in password reset links. Maybe the exact date isn't important, but the problem with capability URLs is that they contain a (temporary) secret, yet browsers and servers happily log all requests, even if that part of the URL is protected through TLS (edit: to outsiders).

Maybe having such a field will help treat those URLs differently from "normal" ones, so that the secret is better protected.

edit: I failed to read the question correctly.



Here's the Sunset RFC draft converted from XML to JSON.

https://www.dropbox.com/sh/duhmxzaehy0dwuc/AADyKPN5UVU1HKT9M...

Looking at the JSON, the structure is pretty basic. You could see it being rendered in any format/style pretty easily.



Yes please? What tool is this? Are you suggesting that ASCII art is required for paging text?


Lynx[0], a text-based browser (now primarily) used for bash scripting.

[0] https://en.wikipedia.org/wiki/Lynx_(web_browser)


I can use Lynx for bash scripting?


It can be used in non-interactive mode from scripts, e.g.

  lynx -dump $URL > $FILE
This is a simple way to extract the content of a web page as plain text.


Interesting. Even as an experienced web scraper, it would have never occurred to me to use Lynx.


Wow, it sure looks pretty in lynx.


Evidently. Why does the text exist twice? Once in blue and green, and once in a smaller dark shadow behind it?


I believe it's a CLI tool to display RFCs, on a translucent terminal, with a browser window behind it. That still leaves other questions...


Google could use this whenever they launch a new service!


This RFC is really indicative of what I hate about certain standards.

This is all very abstract. No user agent will use this in the same way. You'll end up writing code per application anyway. All this is is a convention that people will read their own interpretations into. This is not going to be interoperable.

The epitome of this is ActivityPub.


What does it matter what the UA does with the information? The point is that the information is supplied; the UA can derive whatever benefit from it that the user wants.

For example, I was just thinking that forum software could have their backends scrape image links you attempt to embed; and, if they find that the link has a sunset policy (e.g. it was from an anonymous image upload service that expires posts after a week), then they could rehost the image instead of hotlinking to it. (Or, cache the image for now, and then rewrite the link from a hotlink to a cached link when the deadline hits.)
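A rough sketch of that check (hypothetical helper; assumes the Python requests library, and the real rehosting logic would obviously be application-specific):

  import email.utils
  import requests

  def should_rehost(url):
      # Rehost instead of hotlinking if the resource announces a Sunset date.
      resp = requests.head(url, allow_redirects=True, timeout=5)
      sunset = resp.headers.get("Sunset")
      if sunset is None:
          return False
      # The value is an HTTP-date, e.g. "Sat, 31 Dec 2019 23:59:59 GMT".
      goes_away = email.utils.parsedate_to_datetime(sunset)
      print(url, "goes away at", goes_away, "- cache a local copy")
      return True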

That’s not something you would standardize behaviour on. But, once you have servers providing the information—not to a particular client, but just spewing it out as an objective fact out into the world—clients can do all sorts of things.


Standardisation does nothing for this. The software could've always just used an X-Sunset header or just another field in its DTO.

My point was that there will never be a client that will blindly read that value and understand what to do with it. Which is why the only difference from "X-Sunset" is that there's an RFC with suggestions on how it might be interpreted, along with a syntax restriction.

ActivityPub is the epitome of this because they supply a lot of vocabulary and syntax variation to express it (including very complex ways to normalise that syntax; JSON-LD is not fun to parse) but don't actually make implementations commit to anything. It's setting up a bunch of arbitrary rules for nothing. Every single AP implementation still orients itself around compatibility with Mastodon.


> Standardisation does nothing for this

Sure it does. If the information is standardized then the community as a whole can innovate around it. Browser plugins and the like can be made that make use of the information, etc.


Or HTTP-based APIs could check for it and write out log entries or send out alerts to notify ops or the devs that they'll need to investigate further as to how this affects them.
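For instance, a thin client wrapper could do that check on every call (just a sketch, assuming the Python requests library and the standard logging module):

  import logging
  import requests

  log = logging.getLogger("api_client")

  def api_get(url, **kwargs):
      resp = requests.get(url, **kwargs)
      sunset = resp.headers.get("Sunset")
      if sunset:
          # Surface the announced retirement date so ops/devs can plan a migration.
          log.warning("Endpoint %s is scheduled for retirement on %s", url, sunset)
      return resp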


It's not surprising: almost anyone can publish an RFC by going through the appropriate bureaucracy, so there are plenty of RFCs that no one really cares about.


ActivityPub is more about living in a small part of a very big linked data graph and letting very different kinds of apps federate amongst themselves disjointly, encouraging more diverse apps that can actually interoperate with these disjoint groups and provide niche value without the networks being a walled garden.

Very different than an HTTP header.


ActivityPub is too complicated and loosey-goosey. There are a couple of dozen partial implementations out there, but most of them don't federate properly or at all (which I thought was the entire point of the spec!).

Ideally such a standard should be really easy to implement, so we get true diversity, and democratisation, but implementing ActivityPub is a nightmare. Of course it doesn't help that the test suite has been down for months. https://test.activitypub.rocks


(You should note you're telling the author of go-fed how hard it is to implement AS)


Ha! Do you agree?


ActivityPub is not easy to implement for two reasons. One: most applications are not linked-data aware, which, depending on the tools, could require different ways of retrofitting an existing application. Two: the tooling around the loosey-goosey behaviors isn't as well defined as, say, something like Swagger. Since AP is machine readable and RDF based, it shouldn't be a stretch to evolve an ecosystem that helps developers.

I chose to go down a very difficult route with go-fed, something I don't think anyone else has done with JSON-LD. So my perspectives on the challenges in the area are very warped. For example, over the course of a mere 8 hours of work I've used go-fed to get my personal blog federated. But I stand very biased.


Hey CJ.

I don't think that graph will ever be readable without vendor-specific hacks applied for every piece of software participating. AS will meet the same fate as XMPP. Some very core features might work, but the diversity in user agents and servers and their quirky implementations will ultimately be to its detriment. Maybe that's leftover pessimism from the time I participated in the XMPP WG. I'd love to be wrong.

I have a lot of respect for implementing AS in Go. I tried and didn't even finish my relay implementation. It was just too counter to the Go core competency and mindset. Or I'm just not a fan of wrangling interface{}s around.


Hey! I totally understand the pessimism. I am untainted by XMPP for better or worse.

I started typing out a reply on my phone but I think I need to sit with a keyboard and just make a blog post on it -- I hope you don't mind a wait. I hope I'll do a fair job characterizing the problems you identify and give them my alternate outlook.


In this particular case, this RFC isn't Standards Track; it is merely Informational, presumably partly because it isn't normative (it's not about behavior but about conventions/hints).


This RFC reserves an HTTP header name. It is not just informational.


Many capital-I Informational RFCs reserve things like HTTP header names. Some of the easiest examples are April Fools RFCs that reserve all kinds of things, and while some people do like to use HTTP 418 I'm a Teapot (https://http.cat/418), that's not a capital-S Standard.

The IETF has workflows for Informational things versus Standard things. It's still useful to "reserve" things in Informational RFCs in case the RFC later moves to the Standards Track. Also, as some of the April Fools RFCs seem to indicate, the internet takes its jokes pretty seriously, and you never know when an Informational thing may be adopted just because it was interesting to some developer for some project.


I'm kind of torn here, because I could see this being really useful for web archivists -- e.g. the people who jumped to try to preserve as much of Tumblr as possible as that seems to be "sunsetting" -- but the sites that are actually likely to configure this properly are the very ones I'd expect to have some of the best already-existing archives of. If someone cares enough to set this header, they probably also care enough to try to preserve things reasonably well, or at least have users who do.

The other use cases listed in the RFC don't seem incredibly compelling. Anyone have one that comes to mind?


The most compelling use-case for me is web API deprecation. It very clearly tells you when a service will no longer be available. Currently there is no standard mechanism for this; it's mostly driven by documentation. If an API used this, then a consumer could look for it and raise alarm bells.


There are also plenty of services that host user content, but only for a short period of time. Pastebin entries with a set expiration could for example set this header to inform clients about the expiration date. That way software can detect links to content that will expire and act accordingly.
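On the serving side it's a one-liner to emit. A minimal sketch using Flask (hypothetical in-memory paste store, with each paste's expiry stored as a UTC datetime):

  from email.utils import format_datetime
  from flask import Flask, abort

  app = Flask(__name__)
  PASTES = {}  # paste_id -> (text, expires_at); hypothetical store, expiry as UTC datetime

  @app.route("/paste/<paste_id>")
  def get_paste(paste_id):
      if paste_id not in PASTES:
          abort(404)
      text, expires_at = PASTES[paste_id]
      resp = app.make_response(text)
      # Tell clients when this paste will disappear (RFC 8594).
      resp.headers["Sunset"] = format_datetime(expires_at, usegmt=True)
      return resp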


Yes, and I think the smart thing about this is that they clearly came from a stance of deprecation and realized they could make it more general. Good work, imo.


That is the first thing I thought of. We had several cases of partner APIs being deprecated where we were not notified in a timely manner. This could be used to trigger monitoring alerts.


I am not sure if Sunset is the best term here. Sunsetting in such a context would indicate retirement:

https://en.wikipedia.org/wiki/Application_retirement

However, the example given is indicative of a session duration, not a sunset period.

> For example, a pending shopping order represented by a resource may already list all order details, but it may only exist for a limited time unless it is confirmed and only then becomes an acknowledged shopping order.

Admittedly, naming things is hard; perhaps this could have been called Expiry:, Unavailable-After:, or Valid-Until:


I hate Sunset too, albeit for a different reason. It's businessese, like Legacy, Utilize, and for that matter, Retire (in this sense).

Choose the word that is old, short, and plain. Sunset is a little better than others because at least it is poetic. But for my HTTP headers I would prefer something a little more straightforward, rather than euphemistic --- like Expires, but that's already taken.


Well, we already have Expires for marking responses as stale in cache, and it also takes just a date string, so things could get confusing there.
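Both take an HTTP-date, so a response could end up carrying the two side by side (illustrative values only):

  Expires: Thu, 23 May 2019 08:00:00 GMT
  Sunset: Sat, 31 Dec 2019 23:59:59 GMT

Expires only says when a cached copy goes stale; Sunset says the resource itself is expected to stop being served.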


Perhaps we could Sunset the Expires header first...


My impression was the shopping example would be more aligned with something like delivery info -- it's not there just for a session, but a month after the package is marked "delivered" it's probably at least marked for deletion.

(To be fair, I'm not sure announcing that kind of mid-term expiry is a real need-to-be-filled in today's web.)


I'm afraid this won't be useful in the real world. API consumers who continue to use legacy endpoints after being told not to are unlikely to update their code to listen to a new HTTP header. This strikes me as a great idea in theory, but unlikely to be useful to websites actually looking to deprecate endpoints.


I don’t think this is intended to be about endpoints, but rather about the resources under an endpoint. E.g., “this image will be deleted from this image-hosting service in 30 days” or “this is a link to an item in your Dropbox Trash folder, and the Trash is emptied every 48 hours” or even “this is a temporary share link and will work for 24 hours.”

Given these use-cases, I could also see the use of a version of the header that doesn’t specify when the retirement of the resource will happen, but just specifies that the resource is, by design, not going to stick around forever. E.g. the URL of a “previous version” of something, where the system only keeps around N previous versions (so as soon as someone adds enough newer versions, the version you’ve linked to will disappear.)

Another fun use-case is sticking this header on everything on a given domain (by e.g. reconfiguring your web load-balancer to emit the header.) Archive.org could then use this as a signal to automatically prioritize archiving content from the domain, before any human being realizes the service is being retired and prioritizes that archiving.


At WS-REST 2015 (I think), I talked about this with Erik Wilde (@dret), and we discussed the fact that there is no good way to deprecate a bookmarked URL or a URL stored in an uncommitted transaction. The idea was to add information so clients and user agents could keep track of resources: you can check a resource before its sunset date, and the resource could then change the sunset date or redirect you to the new URL.


I'm happy to see this is finally entering the RFC phase. I've already seen this used in the wild for an API (Imgur [0]), and I was surprised when they linked to the proposal from 2017 [1].

[0] https://apidocs.imgur.com/?version=latest#api-deprecation

[1] https://tools.ietf.org/id/draft-wilde-sunset-header-03.html


Terribly vague naming. As a non-native speaker I would never have guessed the purpose, and I've been in the English-speaking tech sphere for many years. Does anybody actually say "this website will sunset in 5 years"?


I've heard the term used for deprecation before, but not in a long time. I agree it's not great. At least they spelled it right, though!


Tech is never really deprecated; instead, like the setting sun, it simply departs to rise again another morning (when a web developer independently re-discovers it).


Some big companies do, but I've never heard a startup use the term.


I really like this idea. I used to work in advertising/marketing and we were launching a lot of purpose-built, short-lived landing pages, e.g. to collect entries in a contest, then announce the winners. These websites were live for a period specified in the contest terms, then usually archived. I could see myself routinely implementing this header in such cases.

Also, from the resource consumer's point of view, this could help guide efforts like archive.org to make snapshots before it's too late.


Was looking for a way to tell API clients about endpoint deprecation and stumbled upon another RFC from Feb 2019: https://tools.ietf.org/html/draft-dalal-deprecation-header-0...

Actually, "E. Wilde" is on of the authors there too.


That is an early-stage draft, not an RFC.


tools.ietf.org (and only that site) seems to be down for me at present, failing in the TLS handshake, PR_CONNECT_RESET_ERROR.

Here’s another URL for the content: https://www.rfc-editor.org/rfc/rfc8594.txt


FWIW tools.ietf.org is working for me.


The header that inspired Evan Spiegel



