Cool URIs Don't Change (1998) (w3.org)
387 points by tarikozket on July 17, 2020 | 154 comments



While the main concept (Don't change your URIs!) is good, I can't agree at all with their advice on picking names, in particular the 'what to leave out' section. No subject or topic? The justification for this is flimsy at best: 'the meaning of the words might change'. So what? People cope with this all the time in other media, e.g. old books. It's not too confusing. What's more confusing is a URI that has all the meaning removed; after all, this whole URI discussion is about the human appearance of URIs. Take out the topics and you are just left with dates, numbers and unspecific cruft. If I were designing a company's website, I'd sure as hell put the product pages under '/products'.

FWIW, the document's own URI is terrible: 'https://www.w3.org/Provider/Style/URI' - who could have any idea what the page is about from that? And what if the meaning of the word 'Provider' or 'Style' changes x years from now? :) You could argue that the meaning/usage of 'URI' has already changed, because practically no-one uses that term any more. Everyone knows about URLs, not URIs. Not many people could tell you what the difference is. So the article's URI has already failed by its own rules.


IMO that's a pretty good URL. For example, if you drop it often in conversations, you can remember it, since it's short enough and has no numbers or awkward characters. I would have preferred lowercase (and if you try it with lowercased letters, it doesn't work), but other than that ...

No, a URL doesn't necessarily have to give you the title of the article, even if having some related words in it might be good for SEO. If you paste it in plain text or similar, add a description to it. Here's how:

Cool URIs Don't Change: https://www.w3.org/Provider/Style/URI

There, now the reader will know what this is about.


I don't think being able to remember URIs is particularly useful. In 99% of cases they're clicked on or shared, not recited from memory.

I'd still get far more value out of this:

https://www.w3.org/article/1998/cool-uris-dont-change/


I think the short gist of it is: naming things is a hard problem.

I think you both stumbled upon a fundamental part of the discussion: the tension between finding a way to identify resources (or concepts, or physical things) in a unique and unambiguous fashion, and affordances provided by natural language that allow human minds to easily associate concepts and labels with the things they refer to.

The merit of UUIDs, hashes or any other random string of symbols which falls outside the domain of existing natural languages is that it doesn't carry any prior meaning until an authority within a bounded context associates that string with a resource by way of accepted convention. In a way, you're constructing a new conceptual reference framework of (a part of) the world.

The downside is that random strings of symbols don't map to widely understood concepts in natural language, making URLs that rely on them utterly incomprehensible unless you dereference them and match your observation of the dereferenced resource with what you know about the world (e.g. "Oh! http://0x5235.org/5aH55d actually points to a review of 'Citizen Kane'").

By using natural language when you construct a URL, you're inevitably incorporating prior meaning and significance into the URI. The problem is that you then inherit the murkiness of linguistics and semantics, and end up with all kinds of weird wordplay if you let your mind roam entirely free about the labels in the URI proper.

For instance, there's the famous painting by René Magritte, "The Treachery of Images", which correctly points out that the image is, in fact, not a pipe: it's a representation of a pipe. [1] By the same token, an alternate URI to this one [2] might read http://collections.lacma.org/ceci-nest-pas-une-pipe, which is incidentally correct as well: it's not a pipe, it's a URI pointing to a painting that represents a physical object - a pipe - with the phrase "this is not a pipe."

Another example would be that a generic machine doesn't know if http://www.imdb.com/titanic references the movie Titanic or the actual cruise ship, unless it dereferences the URI, whereas we humans understand that it's the movie because we have a shared understanding that IMDB is a database about movies, not historic cruise ships. Of course, when you build a client that dereferences URIs from IMDB, you basically base your implementation on that assumption: that you're working with information about movies.

Incidentally, if you work with hashes and random strings, such as http://0x5235.org/5aH55d, your client still has to be founded on the fundamental assumption that you're dereferencing URIs minted by a movie review database. Without context, a generic machine would perceive it as a random string of characters which happens to be formatted as a URI, and dereferencing it just gives a random stream of characters that can't possibly be understood.

[1] https://en.wikipedia.org/wiki/The_Treachery_of_Images [2] https://collections.lacma.org/node/239578


Good comment, thanks for sharing.

It's an interesting topic. I agree with you that identifiers can be intended for humans or machines, and there are often different features to optimize for depending on which one you're targeting. URIs are the strange middle ground where they include the pitfalls of having to account for both humans and machines.

In an interesting way, each individual website has to come up with its own system for communication. It may be a simple slug (/my-new-blog/), or it may be an ID system (?post=3). It could be something else completely.

There is some value in offering that creativity, but a system where URIs are derived from content also makes a lot of sense to me. You mentioned a hash which I think is the right idea.
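A minimal sketch of what "derived from content" could look like, assuming SHA-256 over the document body (the base prefix and the truncation length are made-up choices, not any standard):

    import hashlib

    def content_uri(body: bytes, base: str = "https://example.org/c/") -> str:
        # Hash the document body: identical content always yields the same
        # URI, and any edit to the content yields a new one.
        digest = hashlib.sha256(body).hexdigest()
        return base + digest[:16]

    print(content_uri(b"<html><body>Cool URIs don't change</body></html>"))

The trade-off is that every edit mints a new identifier, which is where the versioning and diffing ideas below come in.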

It seems reasonable enough that URIs could take inspiration from other technologies like git, or even (dare I say) blockchain. This leads naturally to built-in support for archiving of older versions, as content is diffed between versions.

There are some fun problems to think about, like how to optimize the payload for faster connections, then generate reverse diffs for visiting previous versions. Or whether browsers should assume you always want the newest version of the page, and automatically fetch that instead.

This solves some problems, and creates many others. Interesting thought experiment anyway.


This is one of those classic, foundational documents about the Web. But it's rarely followed. Tool use has come to dominate the form that URIs take; tools are used both for delegation and to absolve humans from crafting URIs by hand. Switching tools frequently ruins past URIs.

Additionally, widespread use of web search engines has made URI stability less relevant for humans. Bookmarks are not the only way to find a leaf page by topic again. A dedicated person might find that archive sites have preserved content at the old URIs.

Some of this is allowed to happen because the content is ultimately disposable, expires, or has little relevance outside a limited audience. Some company websites are little more than brochures. Documents and applications that are relevant within organizations can be communicated out of band. Ordinary people and ordinary companies don't want to be consciously running identifier authorities forever.


The web has evolved well beyond what it was envisioned to be at the time this was written - a collection of hyperlinked documents.

The reason for the eventual demise of the URL will simply be the fact that the concept of "resource" will just not be sufficient enough to describe every future class of application or abstract behavior that the web will enable.


I disagree.

It depends on how you define a "resource" and which value you attribute to that resource. And this is exactly the crux: this is out of the scope of the specification. It's entirely left to those who implement URIs within a specific knowledge domain or problem domain to define what a resource is.

Far more important than "resource" is the "identifier" part. URIs are above all a convention which allows for minting globally unique identifiers that can be used to reference and dereference "resources", whatever those might be.

It's perfectly valid to use URIs that reference perishable resources that only have a limited use. The big difficulty is in appraising resources and deciding how much to focus on persistence and longevity. Cool URIs are excellent for referencing research (papers, articles, ...), identifying core concepts in domain-specific taxonomies, or natural/cultural objects, or endorsing information as an authority, ...

The fallacy, then, is reducing URIs to the general understanding of how the Web works: the simple URL you type in the address bar which allows you to retrieve and display that particular page. If Google et al. end up stripping URLs from user interfaces, and making people believe that you don't need URIs, inevitably a different identifier scheme and a new conceptual framework will need to be developed just to be able to do what the Web is all about today: naming and referencing discrete pieces of information.

Ironically, you will find that such a framework and naming scheme will bear a big resemblance to the Web, and solve the same basic problems the Web has been solving for the past 30 years. And down the line, you will discover the same basic problem Cool URIs are solving today: that names and identifiers can change or become deprecated as our understanding and appreciation of information changes.


I don't think it has evolved. I feel that it became more like a hack, on top of a hack, on top of another hack, and so on.

In the late 90s and early 2000s, HTML started being pushed into fields that, in my opinion, were unrelated (remember Active Desktop?). Before you had time to react, HTML was being used to pass data between applications. At the time I was already doing embedded stuff, and I remember being astonished to learn that I had to code an HTML parser/server/stack in my small 16-bit micro because some jerk thought it was a good idea to pass an integer using HTML (SOAP, for example).

In the meantime, HTML was being dynamically generated, and then dynamically modified in the browser, and then modified back in the server using the same thing you use to modify it in the browser. It's a snowball that will implode, sooner or later.


"a hack, on top of a hack, on top of another hack, and so on" is evolution.

My HN username may be a case in point, drawing from a selection of twice five[0] digits due to legacy code of Hox genes: https://pubmed.ncbi.nlm.nih.gov/1363084/

[0] "This new art is called the algorismus, in which / out of these twice five figures / 0 9 8 7 6 5 4 3 2 1, / of the Indians we derive such benefit"

https://upload.wikimedia.org/wikipedia/commons/thumb/3/35/Ca...


You might see the Homer Simpson Car[0] and call it evolution too. But what I see is a mess, described as a sequence of hacks and bad decisions, just like HTML (and web stuff) today.

[0] https://www.wired.com/2014/07/homer-simpson-car/


Homer Simpson's car did not evolve at all. It was designed all in a single iteration.


SOAP is XML not HTML, unless I'm missing something.

I'm happy that the world moved on to the point that json/yaml-like formats are strongly preferred.


Correct. I wanted to say "pass an integer over HTTP (SOAP, for example)". XML to pass some value, all over HTTP, ~20 years ago.


The web has evolved because:

(1) some operators only care about a handful of the URLs under their domain;

(2) hardly anyone uses link relations, so most links are devoid of semantic metadata and are essentially context-free, requiring a human to read the page and try to guess the purpose of the link;

(3) so many 'resources' are now entire applications, and the operators of these applications sometimes find it undesirable to encode application state into the URI, so for these you can only get to the entry point -- everything else is ephemeral state inside the browser's script context.

But I disagree with the statement that "the reason for the eventual demise of the URL will simply be the fact that the concept of 'resource' will just not be sufficient enough to describe every future class of application or abstract behavior that the web will enable."

URIs are a sufficient abstraction to accommodate any future use-case. It's a string where the part before the first colon tells you how to interpret the rest of it. It'd be hard to get more generic, yet more expressive.

The demise of URLs, if it ever comes to pass, will be due to politics or fashion: e.g. browser vendors not implementing support for certain schemes, lack of interoperability around length limits, concerns about readability and gleanability, and vertical integration around content discovery.


The web has evolved well below what it was envisioned to be 20 years ago. I can't think of a single Web-based activity I do that is not a significantly worse experience now than it was in the past.


Rhetorical question: Why must we charge annually to control domains? Should we stop doing this in the name of greater URL stability?

The article states early on, “Except insolvency, nothing prevents the domain name owner from keeping the name.” As it turns out, insolvency is a pretty significant source of URL rot, but also so is non renewal of domains by choice or by apathy, whether for financial or mere personal energy reasons (“who is my registrar again? Where do I go to renew?”) especially by individuals. You start a project and ten years later your interest has waned.

Domains are an increasingly abundant resource as TLDs proliferate. Why not default to a model where you pay once up front for the domain, and thereafter continued control is contingent on maintaining a certain percentage of previously published resources, and if you fail at that some revocable mechanism kicks in that serves mirrored versions of your old urls. Funding of these mirrors comes from the up front domain fees. Design of the mechanism is left as an exercise for the reader :-)


There's one additional way people can lose domains: if they no longer meet the policies that allow them to keep the domain.

- The UK leaving the EU means British companies can't keep their .eu domains, unless they have a subsidiary in the EU.

- A trademark dispute can mean someone loses a domain.


The registration process is more or less automated for almost all TLDs, if they were free then it'd be a bot race to register absolutely everything.

If limited per customer it'd still be a similar situation, probably involving lots of 'fake' accounts and registrant details.

Years ago .info domains were being sold very cheaply. Their registrations skyrocketed and the quality of the average .info domain clearly went down.


It's true that domains shouldn't be free, but it's a pity the money ends up piling up at ICANN. If I understand correctly, they have hundreds of millions of dollars just sitting around, on account of their monopoly.


Domain renewal is definitely the lesser cost of maintaining a website. If you can afford the server, the domain is basically free already.


Why is that? I can point a domain at free hosting. GitHub Pages and SDF.org come to mind.


For most websites maybe, but you can change host to cut costs if needed. Not so with domains if you want to keep the URLs working.


Blogger and Tumblr will map a domain to a blog for free.


And that will definitely not change in 100 years


Blogger has been serving URLs for something like 17 years. I'd wager its sites have something like 2x or more the average URL lifespan of the typical site at this point. What we want right now is more URL stability, not perfect assurance of a 100-year URL lifespan. Don't let the perfect be the enemy of the good.


> Why must we charge annually to control domains?

Namespace pollution. What if my great-great grandson wants my user name on Google? I took it. Similarly, I took the .net domain with my last name.


> Why must we charge annually to control domains?

Spam, squatting, maybe.


Charging a small annual fee to me seems to be a much more elegant solution than any sort of domain monitoring system. It is a very simple way to make defunct domains available again and provide some resistance against one person registering massive amounts of domains.


It works OK to recover unused domains (but definitely not perfectly), but it does mean nearly all URLs get broken eventually, even if the content is still archived somewhere.


If curious see also

2016: https://news.ycombinator.com/item?id=11712449

2012: https://news.ycombinator.com/item?id=4154927

2011: https://news.ycombinator.com/item?id=2492566

2008 ("I just noticed that this classic piece of advice has never been directly posted to HN."): https://news.ycombinator.com/item?id=175199

also one comment from 7 months ago: https://news.ycombinator.com/item?id=21720496


I think this is just unrealistic. Let's look at this example:

    http://www.pathfinder.com/money/moneydaily/1998/981212.moneyonline.html
This consists of:

0. Access protocol

1. Hostname/DNS name

2. Arbitrarily chosen path hierarchy

3. File extension

This is really a description of where to find a document ("locator", not "identifier"). So, if you are:

- re-organizing / cleaning up your file structure

- changing or hiding the file extension

- enabling HTTPS

- migrating files to a different domain name

This WILL change the URL. What are you going to do? Not clean up your space anymore? Stick to HTTP? So URLs DO change. That's just the reality.

If you want something that does not change, don't link to a location but link to content directly, e.g.:

- git hashes do not change

- torrent/magnet Links don't change

- IPFS links do not change.

Or use a central authority, that stewards the identifier:

- DOI numbers don't change

- ISBN numbers don't change


> What are you going to do?

The article addresses this by reminding you that though URIs often look like paths, they can be arbitrarily mapped.

By all means move the resource, but put a redirect under the old URI. This means old links continue to work, which is the key point of the article.


Yes. Have you tried to do that even for moderately complex sites?

I have tried to do it a few times, and eventually just gave up. Carrying forward bad naming decisions from the past is a tremendous effort. When cleaning up the house, I also don't leave around sticky notes at the places where I removed documents from.

On top of this:

- When using static site generators, it's not even possible to do 301 redirects (you would have to use an ugly, slow JS version).

- It does not help if you don't own the old DNS name anymore.


> When using static site generators, it's not even possible to do 301 redirects (you would have to ugly slow JS version).

That isn't always true, depending on your choice of web server. You can use mod_rewrite rules in Apache's .htaccess files, so if your generator is aware of previous URLs for given content, it could generate these to 30x-redirect visitors and search bots to the right new place.
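For illustration, the kind of rules such a generator might emit into an .htaccess file (the paths here are invented, and this assumes mod_rewrite and mod_alias are enabled):

    # Hypothetical generator output: send old URLs to their new homes with a
    # permanent (301) redirect so old links, bookmarks and crawlers follow along.
    RewriteEngine On
    RewriteRule ^blog/2019/old-post-slug/?$ /posts/new-post-slug/ [R=301,L]

    # Simpler mod_alias form for a single moved page:
    Redirect 301 /about-us.html /about/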

Off the top of my head I'm not aware of a tool that does this, but it is certainly possible. It would need to track the old content/layout so you'd need the content in a DB with history tracking (or a source control system) or the tool could leave information about previous generations on each run for later reading. Or it could simply rely on the user defining redirect links for it to include in the generated site.

Of course, if you are using a static site generator for maximum efficiency you probably aren't using Apache with .htaccess processing enabled! I suppose a generator could also generate a config fragment for nginx or others similarly, though that would not be useful if you are publishing via a web server where you do not have sufficiently privileged access to make such changes.


I have done this for a moderately complex site, it was a bit of work, but not the end of the world. I'm sure some things went missing, but we got 99% of it, which I consider successful enough.

You can do 301s statically, by generating whatever your particular server's equivalent of an .htaccess file is. Or, you can generate the HTML files with a meta-redirect tag in place.

The DNS is obviously an issue, but that's not really relevant. The article is advocating for URLs not changing. It's not saying that they mustn't change, just that it's really cool for everyone if they don't.


>When using static site generators, it's not even possible to do 301 redirects (you would have to ugly slow JS version).

I know it's 2020 and all that, but sometimes you don't need 20 MB of minified JS to achieve something: https://en.wikipedia.org/wiki/Meta_refresh
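A stub page left behind at the old URL only needs a line like this (target URL invented):

    <!-- The browser loads the new location after 0 seconds. -->
    <meta http-equiv="refresh" content="0; url=https://example.com/new-location/">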


Using an SSG does not mean you can't have an intelligent server that does redirects. That's a limitation of certain web hosts (GitHub Pages, for example).

Netlify allows dead simple redirects, and so do most other static hosting platforms.


Even GitHub Pages behind Cloudflare is capable of issuing a 301.


A classic way to do these redirects is on the front web server itself: .htaccess, nginx config, etc.

When you change the structure of your urls, you can generally generate redirect rules to translate old urls to the new structure. Or run a script to individually map each old url to its new one.

Note: I've never done the latter for more than a few hundred URLs; I don't know if it scales well for a very large site.


> When cleaning up the house, I also don't leave around sticky notes at the places where I removed documents from.

This is a poor analogy. Perhaps “I’m a librarian for a library with thousands or millions of users, and when I rearrange the books, I don’t leave sticky notes pointing to the new locations”


I don't know about this specific website (or if it even exists), but the 981212 part of the "link" looks like the identifier to me. The way many sites are set up, most of the link is "locating", but it also contains a unique "identifying" component (page/post/item id). You can remove almost all of the locating parts and the identifier still works so the link can be resistant to everything from just a title change to a complete restructuring (as long as the IDs are kept).


The text below that example says that the .html ought not to be there. That's clearly not intended to be part of what that example is demonstrating, but I guess it's just there because they were going for real world examples.

The arbitrary path hierarchy is not so bad. Better than every URI just being https://domainname.com/meaninglesshash. You can also stick a short prefix in front, like https://domainname.com/v1/money/1998/etc, so that all documents created after a reorg can use a different prefix. If your reorg is so severe that there's no way to keep access to old documents under their old URI, even if it has its own prefix, it seems unlikely they'll be made available in any other location. In that context you can imagine the article is imploring you: "please don't delete access to old documents".

Your remaining objections, for host name and access, boil down to "don't use URIs at all, and don't bother to avoid changing them". As I type this comment I'm starting to realise that was your whole point, but it was a bit buried alongside minor objections to this particular example. It's also perhaps a bit of an extreme point of view. Referencing a git hash alongside a URI is sensible, but on its own it's pretty useless, and many web pages won't have anything analogous.


I'd say the most excusable part is the protocol, but of course that generally ends up being a 301, although the URI has indeed changed.

Hostname, well perhaps if a company has been merged/sold.

Path/query is really down to information architecture, and planning that early on can go a long way, e.g. contact and faq belonging in a /site subdirectory.

File extension doesn't really matter nowadays

The main thing is that there's no technical reason for the change. I recently saw someone wanting to change the URLs of their entire site because they now use PHP instead of ASP. They could configure their web server to handle those pages with PHP and save the outside world a redirect and twice as many URLs to think about.


> - enable HTTPS

I really wish HTTPS hadn't changed the URL scheme so you could host both HTTPS and fallback HTTP under the same URL. However most HTTPS sites will redirect http://domain/(.*) to https://domain/$1 (or at least they should) so this doesn't need to break URLs.
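For what it's worth, that fallback is usually just a blanket permanent redirect; a minimal Apache sketch (assuming mod_rewrite):

    # Keep answering plain HTTP, but 301 everything to the HTTPS URL,
    # preserving host, path and query string.
    RewriteEngine On
    RewriteCond %{HTTPS} off
    RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]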


> This is really a description where to find a document ("locator" not "identifier").

This is excellent. I wish more people would make your distinction between URL and URI. URIs really are supposed to be IDs. When put in that parlance, it's hard to say that IDs should change willy-nilly on the web. That said, I think that does deprioritize a global hierarchy / taxonomy for a fundamentally graph-like data structure.

> If you want something that does not change, don't link to a location but link to content directly

I see the motivation for this, but I've personally found it to be equally as problematic as blurring the distinction between URIs and URLs. Most "depth" and hierarchy that's in URLs is stuff that ideally would be in the domain part of the URL. For instance:

http://company.com/blog/2019/02/10-cool-tips-you-wouldnt-bel...

would really map to:

http://blog.company.com/2019/02/10-cool-tips-you-wouldnt-bel...

and the "blog" subdomain would be owned by a team. You could imagine "payments", "orders", or whatever combo of relevant subdomains (or sub-subdomains). In my experience this hierarchical federation within an organization is not only natural, it's inevitable: Conway's Law.

So I do very much believe that the hierarchy of content and data is possible without needing a flat keyspace of ids. Just off the top of my head, issues with the flat keyspace are things like ownership of namespaces, authorization, resource assignment, different types of formats/content for the same underlying resources etc. Hierarchies really do scale and there's reason for them.

That said, most sites (the effective 'www' part of the domain) are really materialized _views_ of the underlying structure of the site/org. The web is fundamentally built to do this mashup of different views. Having your "location" be considered a reference "view" to the underlying "identity" "data" would go a long way to fixing stuff like this.


ISBNs are notoriously double-allocated and re-allocated.

DOIs and ISBNs are as much locations as URLs.

Content-based URNs are the only option.


In the very footer of this page:

> Historical note: At the end of the 20th century when this was written, "cool" was an epithet of approval particularly among young, indicating trendiness, quality, or appropriateness. In the rush to stake our DNS territory involved the choice of domain name and URI path were sometimes directed more toward apparent "coolness" than toward usefulness or longevity. This note is an attempt to redirect the energy behind the quest for coolness.

It's 2020 and "cool" still has that same meaning, as an informal positive epithet. I believe "cool" is the longest surviving informal positive epithet in the English language.

"Cool" has been cool since the 1920s, and it's still cool today. "Cool" has outlived "hip," "happening," "groovy," "fresh," "dope," "swell," "funky," "bad," "clutch," "epic," "fat," "primo," "radical," "bodacious," "sweet," "ace," "bitchin'," "smooth," and "fly."

My daughter says things are "cool." I predict that her children will say "cool," too.

Isn't that cool?


"Smooth" is definitely still current slang, with a meaning similar to "cool." And "smooth" came first:

> Slang meaning "superior, classy, clever" is attested from 1893. Sense of "stylish" is from 1922.

> A 1599 dictionary has smoothboots "a flatterer, a faire spoken man, a cunning tongued fellow."

It may be time to bring that one back. "Did you see Keith chatting up that girl at the bar? Total smoothboots."

https://www.etymonline.com/word/smooth


I would say the 1599 sense more accurately reflects the current sense of "smooth" than the 1893/1922 citations do.


Sophisticated used to mean false, as in sophistry: with intent to deceive. So a sophisticated wine was an adulterated wine that had something other than fermented grape juice in it.


TIL. I was quite surprised that sophistication used to mean deceptive/misleading behavior.

https://en.wikipedia.org/wiki/Sophistication


"Sophistry" still does.

"Silly" is the standard example of semantic shift over what people generally perceive to be a pretty extreme distance: https://www.etymonline.com/word/silly


Or water, presumably?

To the Romans, drinking wine merum (pure) was a sign of barbarity. They drank their wine diluted.

I don't see a mention of "sophisticated" above - was this just a fun fact?


Roman wine was very strong, around 15-20% ABV. No wonder they added water to it (and sometimes lead as a sweetener).


I've never heard anyone use smooth the way everyone uses cool.


I second this. Linguistics and "slang" are pretty interesting in their own right. I believe that "cool" is more universally used. Usually "smooth" is used to describe an action that someone did; I haven't really heard it used in place of "cool".

(under 30 male, west coast USA perspective)


I tend to agree, smooth is IME also usually (but definitely not entirely) used sarcastically, when someone does something accidentally silly, like bumping into a glass door or similar.

(over 30 male, NZ)


"cool" appears in Shakespeare and in the Coventry Christmas Play with the same meaning it's used for today.


Minor correction: "fat" in your list of positive epithets is actually spelled "phat".

I enjoyed reading your list, it was like a trip down memory lane.


I knew that, because of Meet the Parents. https://www.youtube.com/watch?v=jkClkOqgt8k


It is also very international: it can be used in Europe and East Asia with the same meaning, probably globally in fact. Heck, the Japanese government started a "Cool Japan" program a few years ago, and it has been borrowed as 酷 (kù) in Chinese. That's cool.


The W3C is just betting on that URI outlasting "cool". It's a bold move.


Pretty sweet, but I'd disagree that all of those epithets have been "outlived".


I've always liked how "fool" is at home everywhere from the King James Bible to gangsta rap.


Gooseberry Fool and Raspberry Fool are pretty delicious. As is Eton "Mess".


'Cool's cool.'

I don't have my hard copy here and Google is failing me but this is addressed by Terry Pratchett in (I think) Only You Can Save Mankind.

The context is some teen-agers talking about how it's not cool to say Yo, or Crucial, or Well Wicked, but Cool is always cool.

Would appreciate the full quote if somebody can find!


'Yo, Wobbler,' said Johnny.

'It's not cool to say Yo any more,' said Wobbler.

'Is it rad to say cool?' said Johnny.

'Cool's always cool. And no-one says rad any more, either.'

Wobbler looked around conspiratorially and then fished a package from his bag.

'This is cool. Have a go at this.'

'What is it?' said Johnny.

...

'Yes. We call him Yo-less because he's not cool.'

'Anti-cool's quite cool too.'

'Is it? I didn't know that. Is it still cool to say "well wicked"?'

'Johnny! It was never cool to say "well wicked".'

'How about "vode"?'

'Vode's cool.'

'I just made it up.'

The capsule drifted onwards.

'No reason why it can't be cool, though.'


Thank you - much better to have the text than my paraphrasing.



Schroeder, Fractals, Chaos, Power Laws, points out there are probability distributions which follow the Lindy effect, and suggests they be used for project planning. ("the longer an engineering task remains unfinished, the longer it will probably take to finish")


That's fetch


The expression fetch, popularized by the 2004 American cultural meta-documentary and seminal work "Mean Girls", is in fact an abbreviation of the term "fetching", i.e. attractive in the British vernacular.

I have spoken.


"Fetch" has become a perfectly cromulent word in its own right.


fetch is perfectly cromulent, depending on your CORS settings.


"Sweet" is probably more common than "cool" in New Zealand, although it's usually tied into "sweet as", as in "that's a sweet as car" or "the weather is sweet as today".


"dandy" might have had a longer lifetime ~1800-1950 RIP.


Maybe we can resurrect it.

That would be just dandy!


Dandy please don’t smooth that fly


I'd say 'dope', 'fresh', 'sweet' are still alive.


"fresh" in the vein of "cool" was definitely already in the list of "dad phrases" when I was in school. I think the usage as shorthand for "breath of fresh air" such as used in product reviews is distinct. "Sweet" is getting there faster than "cool" is. "Dope" as cool never took off here due to a local meaning of "dope" as "idiot".


I agree. And when I'm driving my Camaro, 'bitchin' still works.


Lindy


proper URIs are all the go


glitchin'


That was so totally W3C validated URI yo.


One thing I have been wondering about - speaking of changing URIs, did they (W3C) change/merge the domain name from w3c.org to w3.org at some point? Some old documents seem to point to w3c.org instead of w3.org. (e.g. http://www.w3c.org/2001/XMLSchema) Not that it hugely matters, the old (?) w3c.org links still work, since they are redirected anyway.

Example from a book: https://books.google.com/books?id=yLj8m3K0kNoC&pg=PA224&dq=h...


According to WHOIS, w3c.org is from 1997 while w3.org is from 1994.

A message from a W3C staff member on a W3C mailing list on 1999-06-21 mentions [1] that w3c.org should redirect to the corresponding page at w3.org, and the latter is considered the 'correct' domain.

[1] https://lists.w3.org/Archives/Public/www-rdf-comments/1999Ap...


This is a great link and I think I’ll share it to people. I find that I struggle trying to explain why URIs shouldn’t change because it’s so ingrained in me.

One of my pet peeves with OneDrive is that if I move a file it changes the URI. So any time someone moves a file, it breaks all the links that point to it. Or if they change the name from foo-v1 to foo-v2. I wish they'd adopt Google Docs' approach.


I wish operating systems managed files in a similar way. Ideally filesystems would be tag-based [1] rather than hierarchy-based. This would make hyperlinks between my own personal documents much easier and time-resistant as my preferences for file organization change.

[1] https://www.nayuki.io/page/designing-better-file-organizatio...


macOS does this. Native Mac apps can somehow preserve file references even after the source file has been moved or renamed. The unfortunate part, however, is that many cross-platform apps aren't written using the Mac APIs, which leaves an inconsistent experience.

I think it's for reasons like this that many mac users strongly prefer native apps over Electron or web apps.


>I think it's for reasons like this that many mac users strongly prefer native apps over Electron or web apps.

Users on every OS do.


Could have fooled me with regard to Windows. I'm unfortunately not sure what a "native" Windows app is at this point. They've gone through so many frameworks over the years, everything is a mish-mosh.

And this isn't just a result of legacy compatibility. If you are a developer today, and you want to make a really good Windows app, what approach do you take? Is it obvious?


On Windows it's just a resource hog. On Linux and Mac they stick out like a pimple on a pumpkin. The number one annoyance for me: because they are based on Chromium, which doesn't have Wayland support, all Electron apps fail to DPI-scale properly with multiple monitors.


IMO a fully native Windows app would probably just be Win32, but really WPF/UWP stuff is just about native as well.

WPF is still my favorite GUI framework/toolkit by far, if we're talking standard business app development.


Do you know if this is something that any Mac app can use, or is it limited to Apple's in-house applications?


Any Mac app could do it in system 7, and they can now too.


I've encountered this disconnect between web documents and file systems a few times - in Windows at least, moving or renaming a file also changes the URI (unless you check the journal, how can you know that the C:\test.txt that was just moved to D:\test.txt is the same document?), so it's hard to argue why doing it over HTTP should be any different...


Content-addressing helps a ton. I wish links on the web, especially, had been automatically content-addressed from the start. Would have helped a bunch with caching infrastructure and fighting link-rot. Oh well.

Does make updating more awkward, and you still need some system of mapping the addresses to friendly names.


The problem boils down to this question: what is a URI actually referencing? Does it identify a discrete piece of information (e.g. a text) in the abstract sense? Or does it identify a specific physical and/or digital representation of that information?

Within the context of digital preservation and online archives, where longevity and the ephemeral nature of digital resources are at odds, this problem is addressed through the OAI-ORE standard [1]. This standard models resources as "web aggregations", which are represented as "resource maps", which are in turn identified through Cool URIs.

It doesn't solve the issue entirely if you're not the publisher of the URIs you're trying to curate. That's where PURLs (Persistent URLs) [2] come into play. The idea is that an intermediate 'resolver' proxies requests to Cool URIs on to destinations around the Web. The 'resolver' stores a key-value map which requires continual maintenance (yes, at its core, it's not solving the problem, it's moving the problem into a place where it becomes manageable). An example of a resolver system is the Handle System [3].

Finally, when it comes to caching and adding a 'time dimension' to documents identified through Cool URIs, the Memento protocol [4] reuses existing HTTP headers and defines one extra one.
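Roughly, and only as a sketch of the idea (host, path and date invented): a Memento-aware client asks for a resource as it existed at a given moment by adding that one request header, and the server, or a TimeGate acting on its behalf, answers with the closest archived version labelled with a Memento-Datetime header.

    GET /some/old/page HTTP/1.1
    Host: archive.example.org
    Accept-Datetime: Thu, 01 Jan 2009 00:00:00 GMT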

Finding what you need via a Cool URI then becomes a matter of content negotiation. Of course, that doesn't solve everything. For one, context matters, and it's not possible to figure out a priori the intentions of a user when they dereference a discrete URI. It's up to specific implementations to provide mechanisms that capture that context in order to return a relevant result.

[1] https://en.wikipedia.org/wiki/Object_Reuse_and_Exchange [2] https://en.wikipedia.org/wiki/Persistent_uniform_resource_lo... [3] https://en.wikipedia.org/wiki/Handle_System [4] https://en.wikipedia.org/wiki/Memento_Project


If you have sequential pages, I don't like dates in the URIs. For example if you have something spread over 5-pages (e.g. a 5-part blog post), I should be able to guess the URIs for all 5 parts just given one. Dates mean that I cannot do that.


In the early days, before the spam, a post would create pingbacks at some well-known-url, so post #2 would create a pingback link at post #1 if you referenced it.


Cursors can be used to solve this issue sometimes.

https://en.wikipedia.org/wiki/Cursor_(databases)


How would that help with being able to predict the page URLs?


There is a pretty cool bet [1] on longbets.org about exactly this.

[1] http://longbets.org/601/


I wouldn't have accepted a 30x as the URI not changing.

The migration to TLS for the majority of sites would have won him the bet, but I see this one is still serving non-TLS.


Looks like it’s pretty likely to be lost, which I think is pretty cool.


The author made sure he lost when he added the 301 clause


> I didn't think URLs have to be persistent - that was URNs. This is the probably one of the worst side-effects of the URN discussions. Some seem to think that because there is research about namespaces which will be more persistent, that they can be as lax about dangling links as they like as "URNs will fix all that". If you are one of these folks, then allow me to disillusion you.

> Most URN schemes I have seen look something like an authority ID followed by either a date and a string you choose, or just a string you choose. This looks very like an HTTP URI. In other words, if you think your organization will be capable of creating URNs which will last, then prove it by doing it now and using them for your HTTP URIs. There is nothing about HTTP which makes your URIs unstable. It is your organization. Make a database which maps document URN to current filename, and let the web server use that to actually retrieve files.

Did this fail as a concept? Are there any active live examples of URNs?


URN namespace registrations are maintained by IANA [1].

One well-known example is the ISBN namespace [2], where the namespace-specific string is an ISBN [3].

The term 'URI' emerged as somewhat of an abstraction over URLs and URNs [4]. People were also catching onto the fact that URNs are conceptually useful, but you can't click on them in a mainstream browser, making their out-of-the-box usability poor.

DOI is an example of a newer scheme that considered these factors extensively [5] and ultimately chose locatable URIs (=URLs) as their identifiers.

[1] https://www.iana.org/assignments/urn-namespaces/urn-namespac... [2] https://www.iana.org/assignments/urn-formal/isbn [3] https://en.wikipedia.org/wiki/International_Standard_Book_Nu... [4] https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Hi... [5] https://www.doi.org/factsheets/DOIIdentifierSpecs.html


The most common place I have seen them is in XML namespaces, eg in Jabber like xmlns='urn:ietf:params:xml:ns:xmpp-streams'

When a protocol ID is a URI it is common to use a URL rather than a URN so that the ID can serve as a link to its own documentation.

There is a bonkers DNS record called NAPTR https://en.wikipedia.org/wiki/NAPTR_record which was designed to be used to make the URN mapping database mentioned towards the end of your quote, using a combination of regex rewriting and chasing around the DNS. I get the impression NAPTR was never really used for resolving URNs but it has a second life for mapping phone numbers to network services.


GS1 (the supermarket item barcode people) integrated URN URIs into their global-trade system that's used by literally everyone - I worked with them when doing an RFID project a few years ago. In practice it meant just prefixing "urn:" to everything - it felt silly.


That is nice in theory, but in practice things like archive.org are vital. If you see a document you want to refer to later, you need to archive it, either in a personal archive or via archive.org.

There are too many moving parts to trust that even domain names will stay the same. See GeoCities and Tumblr for recent examples. If you want a document, you should have archived it.


The article isn't arguing that URIs don't change; it's arguing that they shouldn't. (The part involving judgement is elsewhere in the title—the word 'Cool'—so it can certainly seem like an assertion of fact rather than of value at a glance.) It thus seems to me that the response "in practice, URIs do change" doesn't undermine that point; your discussion of the need for some solution to the problem rather supports their point—if URIs didn't change, then there wouldn't be a problem to be solved.

(Or maybe your point was deeper, that one not only can't trust that the resource location won't change but even that the resource itself will still be available somewhere? That is true, too! But saying that archive.org is the solution is just making one massively centralised point of failure. That doesn't mean that we shouldn't have or use archive.org, but that we should regard it as just the best solution we have now rather than the best solution, full stop.)


The problem with URIs is that they weren't foreseen as the gateway to a whole slew of web applications, whose URIs can have a lifetime no longer than the single request they serve. There is a continuum here from long-lived useful URIs all the way to ephemeral ones.

And then there are the URIs that aren't even made for human consumption: ridiculously long, impossible to parse or pass around. Another class is those that get destroyed on purpose. Your favorite search engine should just link to the content. Instead they link to a script that then forwards you to the content. This has all kinds of privacy implications, as well as making it impossible to pass on, for instance, the link to a PDF document that you have found to a colleague, because the link is unusable before you click it, and after you click it you end up in a viewer.


Thankfully, you can fix Google Search not linking directly to search result URLs by installing a browser extension.

For Firefox, I recommend the extension https://addons.mozilla.org/en-US/firefox/addon/google-direct.... The extension’s source code: https://github.com/chocolateboy/google-direct.


> Your favorite search engine should just link to the content. Instead they link to a script that then forwards you to the content. This has all kinds of privacy implications as well as making it impossible to pass on for instance the link to a pdf document that you have found to a colleague because the link is unusable before you click it and after you click it you end up in a viewer.

I can copy Google link just fine.


Good for you. Now try it a number of times instead of just once and you'll see they insert their 'click count' script in there a very large fraction of the time.

Here is a sample:

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&c...

Obtained by right clicking the link to the pdf and then 'copy link location'. What you see is not what is sent to your clipboard.


Whenever I see a person or API use URI instead of URL I feel like I'm in an alternate universe. Turns out the distinction is that URIs can include things like ISBN numbers, but everything with a protocol string is a URL so really URL is probably the right term for most modern uses.


To be clear, the difference is that a URI generally only allows you to refer to a resource ("Identifier"), whereas a URL also tells you where to find and access it ("Locator").

For instance, `https://example.com/foo` tells you that the resource can be accessed via the HTTPS protocol, at the server with the hostname example.com (on port 443), by asking it for the path `/foo`. It is hence a URL. On the other hand, `isbn:123456789012` precisely identifies a specific book, but gives you no information about how to locate it. Thus, it is just a URI, not a URL. (Every URL is also a URI, though.)


A URI that cannot be used as a URL (ie as a locator for the resource) is a URN (a name).


I agree, but the confusion will continue. I read maybe a decade ago that URLs are just for network content and came away with the understanding that while URIs could include any "protocol" (like file: or smb: or about:) URLs were more specific. And thus if you wanted to talk about protocol agnostic locations, you should use URI. But that was totally wrong!!

At the end of the day, there is no clarity, so just use the term that will be best understood by the person you are talking to. URL is a good default, probably even for "about:".


It would be good if more care were taken when designing URL schemes. It is not accidental that URL shorteners are used everywhere.

Look for example at this link:

    https://www.amazon.com/Fundamentals-Software-Architecture-Engineering-Approach-ebook/dp/B0849MPK73/ref=sr_1_1?dchild=1&keywords=software+architecture&qid=1594966348&sr=8-1
Maybe each part has a solid reason to exist, but the result is a monster.

I would prefer something like this:

    https://amazon.com/dp/B0849MPK73
And guess what, the above short link actually works! But Amazon doesn't use this kind of link as the standard.


The second link is completely undescriptive and just like with bit.ly and other shorteners you don't know where you end up after clicking it.


Fair point. A compromise could be a somewhat shorter version of the original link:

    https://amazon.com/Fundamentals-Software-Architecture/dp/B0849MPK73/
This includes the main title of the book + ID (this variant also works).


There is also an argument for your original version with just the ID, regarding unchanging URLs.

The Amazon URL that includes the title should be fairly stable, but if you look at e.g. a Discourse forum URL you see it contains the topic title, which can change at any time and then the URL changes with it. The old URL still works, because Discourse redirects, but this can't be taken for granted.

So Discourse then has these URLs referring to the same topic:

    - https://forum.example.com/t/my-title/12345
    - https://forum.example.com/t/my-new-title/12345
    - https://forum.example.com/t/12345
And the last version may be the best one to use when linking to the topic from somewhere else.


> https://amazon.com/Fundamentals-Software-Architecture/dp/B08...

But... this link works. Everything after /B0849MPK73/ is because you reached that product page through search, and it stores the search term in the URL. You can remove it and the site works just fine.


If you're interested in taking this to a new level, you should check out initiatives like:

handle.net (technically it's like a URL shortener, but there's an escrow agreement you need to sign first to make sure that the URLs stay available), PURL and w3id.org (which allow for easy moving of whole sites to a new domain name), and of course https://robustlinks.mementoweb.org/spec/


TL;DR (from [1]). Guidelines for the "best" URIs:

* Simplicity: Short, mnemonic URIs will not break as easily when sent in emails and are in general easier to remember.

* Stability: Once you set up a URI to identify a certain resource, it should remain this way as long as possible ("the next 10/20 years"). Keep implementation-specific bits and pieces such as .php out, you may want to change technologies later.

* Manageability: Issue your URIs in a way that you can manage. One good practice is to include the current year in the URI path, so that you can change the URI scheme each year without breaking older URIs.

1: https://www.w3.org/TR/cooluris/#cooluris


I'm in the midst of moving a website from MediaWiki to a bespoke solution for hosting the data which will enforce structure on what's being presented. In the process, URLs will change, but part of the migration is setting things up so that, for example, if someone goes to http://www.rejectionwiki.com/index.php?title=Acumen they will be redirected automatically to http://www.rejectionwiki.com/j/acumen, so old links will always work. This seems a minimal level of backwards compatibility (although I wonder if there is any specific protocol for how to implement this that will keep search engine mojo - but not a lot, because the site gets most of its traffic from word of mouth between users).
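Purely as an illustration of the shape of that mapping (a sketch assuming a Flask-style app; the route, the lowercasing rule and the helper name are mine, not the site's actual code):

    from flask import Flask, redirect, request

    app = Flask(__name__)

    @app.route("/index.php")
    def legacy_mediawiki_url():
        # Old MediaWiki-style links arrive as /index.php?title=Acumen.
        # Send them to the new scheme with a permanent (301) redirect so
        # existing links keep working and search engines follow along.
        title = request.args.get("title", "")
        return redirect("/j/" + title.lower(), code=301)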


The point of the article is that someone visiting the old URL should see the old resource, as opposed to a 404, an error, or some different content. If you can't keep the old URL, the second best thing to do is a redirect. (EDIT: I guess, being pedantic, the point is to design the URLs so you don't need to change them later, but "get it perfect the first time" is kinda useless advice :-)

This is what 301 HTTP status (permanent redirect) should be for... [1] So it seems to me if you use 301 you should be good to go.

Also from a quick search it seems the recommended thing to do is remove the old URLs from your sitemap.

1: https://en.wikipedia.org/wiki/URL_redirection#HTTP_status_co...


Yes, and to add a note: doing the 301 will preserve the search engine mojo.


It's kind of fun to see that this has been posted several times on hn before, but never took off.

e.g.: https://news.ycombinator.com/item?id=8454570 https://news.ycombinator.com/item?id=10086156 https://news.ycombinator.com/item?id=803901

In this one https://news.ycombinator.com/item?id=1472611 the URI is actually broken - not sure if it changed or if it just was a mistake of OP back then.


A comment I didn't post 7 hours ago (was busy):

True. Yet this submission will have dramatically greater visibility than it otherwise would have because the HN facebook bot linked it 5 minutes ago[1]. As a web archivist, I've dealt a lot with the erosion of URI stability at the hands of platform-centric traffic behavior and I don't see it letting up any time soon.

Sidenote: The fb botpage with a far larger audience, @hnbot[2], stopped posting some months ago.

[1]: https://facebook.com/hn.hiren.news/posts/2716971055212806

[2]: https://facebook.com/hnbot


Does this go against REST, where a URL is a specific resource and HTTP transforms it?


Fielding's thesis [1] talks about this.

Here's some selected quotes:

6.2.1 "(...) The definition of resource in REST is based on a simple premise: identifiers should change as infrequently as possible. Because the Web uses embedded identifiers rather than link servers, authors need an identifier that closely matches the semantics they intend by a hypermedia reference, allowing the reference to remain static even though the result of accessing that reference may change over time. REST accomplishes this by defining a resource to be the semantics of what the author intends to identify, rather than the value corresponding to those semantics at the time the reference is created. It is then left to the author to ensure that the identifier chosen for a reference does indeed identify the intended semantics."

6.2.2 "Defining resource such that a URI identifies a concept rather than a document leaves us with another question: how does a user access, manipulate, or transfer a concept such that they can get something useful when a hypertext link is selected? REST answers that question by defining the things that are manipulated to be representations of the identified resource, rather than the resource itself. An origin server maintains a mapping from resource identifiers to the set of representations corresponding to each resource. A resource is therefore manipulated by transferring representations through the generic interface defined by the resource identifier."

[1] https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding...


No. /posts/18 will always refer to the same post. Post 18 will never show up at another URL. And no other post will show up at 18. You may delete it, but intentionally deleting something because it needs to be gone is not what this post is talking about.


SEO has caused many companies to adopt unsustainable naming schemes. A URL that references an ID is not going to have to change if a word in the title of an article is changed.


The number one worst offender here is Microsoft OneDrive. Document name or location changed? Well, you'll need to reshare the file/folder with everyone.


> When someone follows a link and it breaks, they generally lose confidence in the owner of the server.

Is it a bias I've developed, or has anyone else noticed just how many dangling links there are on microsoft.com? Redistributables, small tools, patches, support pages, documentation pages. I've recently found that when a link's domain is microsoft.com, I subconsciously expect it to be a 404 with about 50% probability.


I've noticed that the fashion industry is just rife with linkrot, and they spoil very quickly. If you're looking at a forum post from longer than 3 months ago chances are links to specific products will instead redirect to the store's front page or a 404.

Is there a benefit to this? I am mostly just frustrated.


Redirecting to the front-page is SEO-BS. It's supposed to help your domain reputation, but I find it honestly obnoxious compared to a standard 404.


It's really interesting to see the perils identified in old findings become relevant once they turn into an actual pain for practitioners. The recent hype around functional programming languages and immutable data was already out there among academics in the 90s, but wasn't really used in practice until now.



There is a new reason that probably didn't exist back then: the application/CMS powering the old pages has been replaced, and it would be a massive effort to get the old pages working at the same URLs they did before.

I think archive.org is the better long-term plan. Not only does it preserve URLs forever, it also preserves the content at them.


Side topic, sorry in advance, but am I the only one frustrated by how this page is rendered in a mobile browser? I know, probably this wasn't an issue back in 1998, but I would have expected something more resilient to devices from the W3C. Of course, I might be overlooking issues.


The site is perfectly responsive (even if the margins are a bit large). The problem is that makers of mobile phone browsers decided to assume pages are not responsive and need a large width unless you include a specific meta tag - which is an absolutely stupid assumption and not something anyone could have foreseen in 1998.


I have a lot of bookmarks with nice URLs that just don't exist anymore.


"An URI is for life, not just for Christmas."


That brings us to the story of IPFS and NDN.



“Dope” URIs Don't Change, that's gas.


urn?


That's the fifth "reason" listed in the article.



