Trailing slashes on URLs: Contentious or settled? (zachleat.com)
95 points by josephscott on Jan 26, 2022 | 71 comments



This is one of those "made up problems" as it only exists because you insist on doing magic, i.e. mapping /foo to either /foo/index.html or /foo.html.

I am guilty myself, because we need to do things that match the expectations of users, but this problem simply disappears if you stop doing magic.

So, the real question is, what "magic" do you prefer?

My implementation was, when given a trailing slash, check if the resource is a directory (as this is what the trailing slash signifies) and use the convention of loading the index.html file at that directory if it is. If it's not a directory, then it's a 404.

If there's no trailing slash, try to load the file with the equivalent extension to the Accept content-type header. So, if the client wants JSON, you try /foo.json for the path /foo. If client requests HTML, try /foo.html.
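
A minimal sketch of the lookup described above, in Python. The function name `resolve`, the `EXTENSIONS` table, and the `root` argument are hypothetical, the Accept handling is deliberately naive (no q-values), and there is no protection against `..` in the path:

```python
import os

# Hypothetical mapping from Accept media types to file extensions.
EXTENSIONS = {"text/html": ".html", "application/json": ".json"}


def resolve(root, url_path, accept="text/html"):
    """Return the file to serve for url_path, or None for a 404."""
    relative = url_path.lstrip("/")
    if url_path.endswith("/"):
        # Trailing slash: only a directory containing index.html is served.
        index = os.path.join(root, relative, "index.html")
        return index if os.path.isfile(index) else None
    # No trailing slash: use the extension of the first Accept type we know.
    for media_type in accept.split(","):
        ext = EXTENSIONS.get(media_type.split(";")[0].strip())
        if ext:
            candidate = os.path.join(root, relative + ext)
            return candidate if os.path.isfile(candidate) else None
    return None
```

With this, `/foo/` only ever maps to `foo/index.html`, and `/foo` only ever maps to `foo.html` or `foo.json`, so the two URLs never compete for the same file.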


> i.e. mapping /foo to either /foo/index.html or /foo.html

Urls lost their 1:1 mapping to physical files a long time ago. Nowadays they are mostly just arbitrary strings, although well-crafted ones may reflect the hierarchy of content in their structure.


> Urls lost their 1:1 mapping to physical files a long time ago.

Says who?

Nearly every web server in existence supports static assets which map 1-to-1 to the file system.


Sure it's supported, but that's not how the majority of websites work today.


Being pedantic, I'd say by volume of HTTP requests it's still true (i.e., a single dynamic page load will still bring in multiple static assets).


  > This is one of those "made up problems" as it only exists because you insist in doing magic, i.e. mapping /foo to either /foo/index.html or /foo.html.
If the underlying resource (web server) is Linux, then there is no magic. In Linux, everything is a file, including directories (maybe in Unix too, I don't know, I only work with Linux). So e.g. `/var/www/foo` is the same resource as `/var/www/foo/`. But without the context of the underlying filesystem being available to query that (e.g. with the `file` command) there is no way to know that `/foo` refers to a directory.


When `index.html` is a file, does `ls index.html` show the same result for you as `ls index.html/`?


I see, yes, if the server decides to send index.html when `index.html/` is requested, that is "magic" and your point is very valid.


This assumes that you are serving webpages from a directory with HTML and php files. With many modern services (for example, anything written in django) this isn't how it works.


Very much required if you want to use relative URL calculations.

Stumbled across this in Go yesterday

Compare a relativeUrl ref from 'http://localhost/1.0' vs. 'http://localhost/1.0/'

e.g. https://play.golang.com/p/AsUSx6bRWn6
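
The comment above is about Go; as a rough equivalent, the same comparison can be sketched with Python's standard `urllib.parse.urljoin`. The relative reference `docs` is a made-up example:

```python
from urllib.parse import urljoin

# Without a trailing slash, "1.0" is the last segment of the base path
# and is dropped before the reference is appended.
without_slash = urljoin("http://localhost/1.0", "docs")

# With a trailing slash, the last segment is empty, so "1.0/" is kept.
with_slash = urljoin("http://localhost/1.0/", "docs")

print(without_slash)  # http://localhost/docs
print(with_slash)     # http://localhost/1.0/docs
```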


This is the correct answer. Relative paths are incorrect without a trailing slash.

Use trailing slashes please.


Either could be desired behavior and neither is incorrect. If anything, just be consistent within a single site. IMO you're most likely good as long as you consciously pick one, you know why you picked it, and stick with it.


I literally just closed a ticket about this issue yesterday.. 'why are these relative URLs downloading from the parent folder'.

"Because your CMS is rewriting out trailing slashes. Make it stop that, and then you'll be in the relative path you think you're in".

The internet uncertainty principle: It's not DNS, it's trailing slashes. (apologies to Herr Heisenberg :)


Maybe stop "thinking you're in a path" and actually explicitly define which path you wanna be on [1]. The <base> element has existed expressly for this purpose since 1991, although it was called <address> in the first draft.

[1] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ba...


It depends on your expectations. For example, you could construct a page on https://blog.example.com/my-summer-vacation-2020 and load all resources in the form of static/style.css, static/comments.js, media/header.webp and so on. As long as you're aware of the differences and the way your web server resolves static content, you can pick either, whatever you prefer.

I have no idea what Vercel, Render, and Azure Static Web Apps even are, but you should obviously be aware of their limitations when using them to host HTML. I'm sure there are plenty of other behaviours that differ between them and there's hardly any universal truth to be found in the decisions some big corporations once made.

This article seems more about the default slug-to-content resolution algorithm from a bunch of cloud providers than it is about trailing slashes.


I don't get it; that's just how URLs work? This is hardly unexpected behaviour in my opinion. They're not always required, it's a matter of expectation and understanding of how relative URLs work.

You can see /1.0 as a file and /1.0/ as a directory, so the relative URLs make perfect sense either way.


Why does .Parse trim the 1.0 when there isn't a trailing slash?


Because the term "relative" is relative to the current directory, not the current file. It's subtle, but if a directory is being referenced in a form that looks like a file (without the trailing slash) with no additional context, then the parent directory is considered.


Is that per standard, or the result of an implementation detail in Go that might be different in e.g. Python?


Part of the URI standard:

      return a string consisting of the reference's path component
      appended to all but the last segment of the base URI's path (i.e.,
      excluding any characters after the right-most "/" in the base URI
      path, or excluding the entire base URI path if it does not contain
      any "/" characters).
https://datatracker.ietf.org/doc/html/rfc3986#section-5.2.3
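
The quoted rule is small enough to sketch directly. This is a simplified illustration, not a full resolver: the name `merge_paths` is made up, it assumes the base URI has a non-empty path, and it skips the dot-segment removal that the RFC applies afterwards:

```python
def merge_paths(base_path, reference_path):
    """Merge a relative-path reference with a base path (RFC 3986, 5.2.3)."""
    # Keep everything up to and including the right-most "/" of the base
    # path. If there is no "/", rfind returns -1 and the prefix is empty.
    prefix = base_path[: base_path.rfind("/") + 1]
    return prefix + reference_path


print(merge_paths("/1.0", "docs"))   # /docs
print(merge_paths("/1.0/", "docs"))  # /1.0/docs
```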


It's a client-side thing, not a server-side thing. That is, if you have a resource (like an href to another page) in your html, and that link is a relative link, the full URL that your browser ends up requesting depends on whether there is a trailing slash or not, because "relative" is relative to the current URL's directory, and what is considered the current directory differs depending on the presence of a trailing slash.


Trailing slashes are terrible in right-to-left contexts. The trailing slash suddenly teleports to the beginning of the URL visually. Take this example:

  <div dir="rtl">
    http://example.com/example/
  </div>
This displays as if it were the string:

  /http://example.com/example
It's awful. Much better if everything was configured to leave off the trailing slash.


I know extremely little about RTL rendering, but it looks to me like if URLs are always meant to be interpreted as LTR, then they should be wrapped in <span dir="ltr"> any time the context is RTL. Perhaps this could be added heuristically (like how automatic linking happens heuristically) by wysiwyg inputs.


Seems like a bug in the URL detection heuristic, no?

I would not have expected that URLs are detected inside rtl context in the first place.


It's not about URL detection (which would wrap this URL in an <a> object). It's just about the way forward slashes are treated in a right-to-left context.
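
A small sketch of why the slash moves, as far as I understand the Unicode bidirectional algorithm: letters carry a strong direction, but "/" belongs to a weak class with no direction of its own. A trailing slash has no strong character after it, so it picks up the direction of the surrounding right-to-left paragraph. Python's standard `unicodedata` module exposes those classes:

```python
import unicodedata

# "L" is strong left-to-right; "CS" (common separator) is a weak type
# that takes its direction from context.
for ch in "h/:.":
    print(repr(ch), unicodedata.bidirectional(ch))
```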


Machines will occasionally add trailing slashes, and they will rarely remove them.

Users will occasionally add trailing slashes, and will occasionally remove trailing slashes.

So as long as $PATH and $PATH/ map to the same outcome, and redirection from the ‘wrong’ one to the ‘right’ one uses a 307/308 to allow non-GET methods to redirect, then everything will always work out okay.

Varying from that recipe in any regard is a source of pain and trauma in every complicated ingress scenario I’ve worked with for twenty years. (Dispatch methods, regular expressions, exact string matches, all of them.)
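
A rough sketch of the redirect half of that recipe, assuming the site has picked one canonical form. The function name and the `canonical_has_slash` flag are hypothetical, and only the path is considered (no query string):

```python
def canonical_redirect(path, canonical_has_slash=True):
    """Return (status, location) for a non-canonical path, or None to serve it.

    308 (Permanent Redirect) asks the client to repeat the request with the
    same method and body, so a POST stays a POST.
    """
    if path == "/":
        return None
    if path.endswith("/") == canonical_has_slash:
        return None
    if canonical_has_slash:
        return 308, path + "/"
    return 308, path.rstrip("/") or "/"
```

Swapping 308 for 307 gives the temporary variant with the same method-preserving behaviour.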


> Machines will occasionally add trailing slashes, and they will rarely remove them.

This doesn’t match my experience at all.

In the context of Linux and Windows file system stuff, it’s quite the opposite: all normalisation techniques that I can recall having encountered have eliminated trailing slashes, and it’s not unknown to treat a trailing slash specially in some way (e.g. rsync, or more tenuously Vim’s 'directory' option), and historically a lot of Windows stuff would choke on a trailing backslash (or on forward slashes at all), though it’s exceedingly rare now.

Of URLs, hmm… I can’t think of ever having encountered the addition or subtraction of a slash, except for the normalisation of URLs with no path (https://example.com → https://example.com/). Query string parameters, sure, but the path, never.

—⁂—

> and redirection from the ‘wrong’ one to the ‘right’ one to the other uses a 307/308 to allow non-GET methods to redirect

I’m not sold on 307/308; I think it’s probably encouraging the wrong thing, and that you want non-safe submissions to the wrong URL to fail—in fact, I’d go so far as to say that it’s preferable for these redirects to only be done for GET and HEAD requests, and return 405 on POST. These redirects are for humans that have mistyped, copied a slightly incomplete URL, or where a plain-text link detector has misperformed—all situations where HEAD and GET are the only possibilities; these redirects are not for machines except in those situations, and not for POST. POST targets are usually hard-coded or generated, so if someone gets it wrong, they can fix it because it’s obviously not working. But if you use 307/308, then you’re just slowing things down for every single user by adding a pointless redirect. (In fact, I mildly wonder if doing slash redirects is a bad idea even on GET, since the wrong URLs invariably end up in the source of web pages, slowing down the link rather than just having it obviously broken and must-fix. I think the casual cases outweigh that very mild harm, but I’m not certainly decided.) No, better not to encourage an incorrect target for any non-safe methods.

Another thing to think about here is that this kind of flexibility has a habit of leading to a lack of robustness, especially of security (Postel was wrong: <https://en.wikipedia.org/wiki/Robustness_principle#Criticism>). I can imagine a situation (inordinately convoluted and exceedingly improbable, but realistic) where the use of 307/308 allows you to smuggle something malicious in.


You can, of course, deviate from my recommendations as you see fit. As you note, you’ll have a higher incidence of breakages (due to inverted-Postel), and that means a higher incidence of debugging/support issues as a result. As long as you make that choice consciously, I see no reason to object.


The common redirects that most webservers do are:

/path to /path/ *if and only if /path is a directory in the filesystem*

/path/ to /path/index.htm[l] (usually the default of some indexing module, often configurable so you or some other module can add index.php etc.)

Redirects can be internal or external. Nginx, for instance, does step 1 via an external redirect and step 2 via an internal redirect, so the final url displayed in the browser when the input was /path would end up as /path/ but not /path/index.html even if that's what's being served. You can, however, combine both steps into one and make it an internal redirect by doing something like "try_files $uri/index.html ..."

It's not standard to rewrite:

/path to /path.html

But almost any webserver can be configured to do so, and various websites and web apps may have reasons for doing that. Then it's merely a matter of understanding the webserver's internal configuration rule order to determine whether it'll redirect /path preferentially to /path/index.html or to /path.html

There's no universal right or wrong. Every one of those paths is different, and webservers can choose which to rewrite to which other.


> SEO: if your content exists at two (or more!) distinct URL endpoints, it is a SEO no-no. […] You need redirects.

Or you could just only serve it from one of the two URLs and let the other 404. (Yeah, static file servers don’t tend to be fond of working this way, but it’s not an unreasonable option if you’re forming a matrix of fundamental possibilities. And indeed, in the analysis, it’s what half the servers do at least some of the time; and it’s what all standard static file serving software that I know does out of the box.)

> Vercel, Render, and Azure Static Web Apps: slashless /resource returns content [from resource/index.html] but without redirects, resulting in multiple endpoints for the same content.

This is obviously Wrong with a capital W because of how it breaks relative URLs in the file. Surely it should just be considered a bug and fixed (probably to redirect to /resource/)?

> Almost everyone agrees that /resource should return content from resource.html

This is a biased view because you’re only considering mildly-opinionated static file servers and their configuration. If you’re serving it yourself with things like nginx or Apache httpd, you won’t get this out of the box and must opt into it. (No idea about others like Caddy.)

> (When both resource.html and resource/index.html are present and /resource/ is requested.) Netlify redirects to /resource instead.

I say of this too that it is obviously a bug. Netlify isn’t just taking an opinionated stance, it’s doing what is fairly unequivocally the Wrong Thing.


>> SEO: if your content exists at two (or more!) distinct URL endpoints, it is a SEO no-no. […] You need redirects.

> Or you could just only serve it from one of the two URLs and let the other 404.

That's not a good idea. The URL you are choosing to remove may have built up some link equity, which you will lose if you 404 it. Instead of 404ing it, 301 redirect to the surviving URL. This allows search engines to consolidate equity from both URLs into one.


I’m not saying remove a URL; that would of course be wrong. I’m saying not to have used it in the first place.


This + better user experience if external sites are linking into both page versions. You don't want one group getting a 404 when you could quickly get them to the correct page.

As a rule, if a page has traffic it should be redirected to the next best page when being removed, even if it's an old promo page that's not relevant etc.


> SEO: if your content exists at two (or more!) distinct URL endpoints, it is a SEO no-no. […] You need redirects.

No you don't.

<link rel='canonical' href='whatever form you want your URL to have' />


(You’re responding to a point in the original article, not a point I made. I contemplated mentioning <link rel=canonical> myself, but others have also already mentioned it in the thread so I didn’t bother.)


> with things like nginx or Apache httpd, you won’t get this out of the box and must opt into it

For anyone wondering, in Apache this can be done with MultiViews [0], which is kind of nifty, but also a can of worms.

I find it interesting that TFA says "when resource.html and resource/index.html both exist ... Everyone agrees that /resource should return content from resource.html" but it seems like [1] MultiViews works the opposite way: "if the server receives a request for /some/dir/foo, if /some/dir has MultiViews enabled, and /some/dir/foo does not exist, then the server reads the directory looking for files named foo.*" [0].

[0] https://httpd.apache.org/docs/2.4/content-negotiation.html#m...

[1] I could be misinterpreting the doc. I haven't tested the actual behavior of httpd to know exactly what happens in a no-trailing-slash request of this nature.


To me, conceptually, trailing slashes are as valuable as trailing commas in English. They're separators, and on the end there's nothing left to separate.


When dealing with hierarchical filesystems, a trailing slash means "is a directory" -- see CLI tab completion. A static site is a filesystem. From this perspective, if there might be child pages, a trailing slash makes sense.

But URLs don't need to correlate with filesystems -- see "clean URLs" in various php-based routers (all non-asset URLs are rewritten to load a single script). From this perspective, I agree with you.


When dealing with hierarchical file systems, very few places include that trailing slash when referring to the directory. In most places it’s optional and ignored. In a few places, including the trailing slash will change behaviour (e.g. rsync, mv, and where symlinks to directories are involved where it’s just all completely unpredictable and varies widely).

This analogy would be an argument for allowing a trailing slash and redirecting to sans-slash, but not for requiring a slash.


I agree with regard to the trailing slash, but in my opinion URLs should correlate with filesystems, since they share many properties.


URLs are not filesystem paths. You can map a filesystem to a URL and (hopefully) vice versa, if you want to. There are endless other use cases for URLs, however, so no, they do not correlate.


What they (the path part of a URL, and a filesystem path) do share is that they're each paths. A description of getting from here to there. Or there and back again, perhaps. The substrate could be anything or nothing.


Do you put a full-stop (period) at the end of a sentence?


I would argue it's not needed for the last sentence. In fact, when people are freed from convention in contexts like chat programs, they often drop unneeded bits like trailing periods


No, it is needed, because that’s how you know the sentence is complete. The period does not separate sentences, it signals that a sentence is finished. Without the period you don’t know whether the author has completed their thought or not.

For example, as far as I can know, your last sentence might have been intended to be: “In fact, when people are freed from convention in contexts like chat programs, they often drop unneeded bits like trailing periods and any semblance of grammatical decency.”


By that argument, there would be no sentences in spoken language, since speakers generally do not say "period" or "full stop" at the end of their thoughts. I find your argument unconvincing


We do have a verbal “period” - your voice inflects downward. If you stop talking without the inflection it sounds like you intend to continue your sentence


I noticed my inner voice reading yours (and the previous period-lacking comments upthread) in just this way. It makes me slightly uneasy (which is interesting, because while I notice when friends do this in SMS/chat, I don't get that feeling - I even occasionally do it myself).



This is why we use intonation in spoken language.


Note that placing the period inside the quotation marks creates uncertainty about whether the quoted sentence terminated with a period or not. I understand that classic formal English demands it, but I enjoy the irony that adhering to ‘grammatical decency’ led to reintroducing the same uncertainty about whether that sentence-terminating quote is a fully-quoted sentence, or merely a phrase.


> Without the period you don’t know whether the author has completed their thought or not.

Why does an author publish an unfinished thought? Do you regularly chat with people who send in the middle of their sentences?

I absolutely fucking hate

chatting with people who

do this sort of

stupid thing.


> No, it is needed, because that’s how you know the sentence is complete.

In Ancient Greek, Latin, Hebrew, among other languages, people didn’t end sentences with periods. They didn’t even put spaces in between words. You can tell where one word ends and the next begins because you know the words of the language, and in most languages words contain certain patterns; for example, in English, certain letters almost never occur at the end of a word, which can help you guess word boundaries even if you don’t know a particular word. Likewise, the ends of sentences can be inferred from grammatical structures - although ancient Latin style tends to involve massive run-on sentences, in part because when the ends of sentences are not marked, people care less about how long sentences are. Alphabets were around for many centuries before whitespace and punctuation were invented and standardised.


Why have you dropped the full-stop in this comment but none of your other comments? You’re just putting it on.


well no, that was a demonstration of a whole other approach to writing

if you care to browse my comments it looks nothing like this, at all, catch me on twitter dot com to see this all day erry day

I do know the local social norms but there is more than one way to illustrate them ¯\_(ツ)_/¯


It's not "unneeded bits". In fact, chatters aren't freed from convention, but rather have established a new convention where adding a period at the end of chat messages can now be viewed as "aggressive".

https://www.nytimes.com/2021/06/29/crosswords/texting-punctu...


If you automatically serve an index.html when the URL is /resource/, I presume you would also serve the same page at /resource/index.html, which in practice means you are again in same-content-at-two-different-URLs land. I lean more toward the "be permissive in what you accept" principle here and would serve the same content for /resource, /resource/ and /resource/index.html without doing a redirect, but standardize on just one of those in all my links and documentation. In practice that means most crawlers will only see one effective URL for the content, while users who happen to type a trailing slash in the browser still get an experience that isn't annoying.


The canonical link meta tag solves the SEO issue. Google uses this hint to help it understand which is the URL that you want to have indexed and avoids duplicate content. In this case you would add this meta tag:

<link rel="canonical" href="https://example.com/resource/" />

https://developers.google.com/search/docs/advanced/crawling/...


> without doing a redirect

Do you never use relative URLs?


I've used Django for a long time. Django defaults to adding trailing slashes to make relative URLs easier to implement correctly. I've always found that sensible and useful. I've recently been putting some APIs behind GCP's API Gateway and discovered that their OpenAPI implementation strips trailing slashes: https://cloud.google.com/endpoints/docs/openapi/openapi-limi...

So... I guess no more trailing slashes for me.


Trailing slashes should be optional and the server response should be the same regardless of their presence. At least, that's how I expect my applications to behave. It's the principle of the least amount of surprise. You might be pleasantly surprised that it works if you add the slash as opposed to being mildly annoyed when it gives you a 404. Requiring the slash would be very surprising. The least surprising is that they both work the same way. I see no technical reason for them to behave differently.

Some web servers do this and others require fiddling to get them to behave that way. But ultimately, the slash is just a separator and not semantically relevant. It's like many languages now allowing trailing commas in lists. It's convenient. The comma does not add an extra element.

One place where this comes up in practice is with specifying base URIs in e.g. configuration files. Somewhere else, this base URI is consumed to construct a full URI using a suffix that may or may not have a leading slash.

If your base URL is http://foo(/) and you want to append (/)bar, you might end up with http://foobar, http://foo//bar, http://foo/bar depending on what people do on both sides. Or those three with a trailing slash. There is no right answer.

The only sane behavior that follows the principle of the least amount of surprise is to make sure that base uri and suffix will be separated by exactly one slash and assume nothing about the presence of leading or trailing slashes. That way, nothing will break or behave unexpectedly if people add or omit a trailing or leading slash.
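
A minimal sketch of that "exactly one slash" rule. The helper name `join_uri` is made up, and it assumes the base has no query string or fragment:

```python
def join_uri(base, suffix):
    """Join base and suffix with exactly one slash between them."""
    return base.rstrip("/") + "/" + suffix.lstrip("/")


# All four combinations of trailing/leading slash give the same result.
for base in ("http://foo", "http://foo/"):
    for suffix in ("bar", "/bar"):
        print(join_uri(base, suffix))  # http://foo/bar
```

A trailing slash on the suffix itself is left untouched, so that choice stays with whoever supplies the suffix.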


The lack of canonicalization has caused us tons of trouble on an Angular app. Check out https://symflower.com/en/company/blog/2021/path-independent-... - you'll be laughing (or crying)


> The NearlyFreeSpeech.NET member site you are attempting to reach appears to be temporarily unavailable.

https://archive.is/XvE2X


/path is different from /path/. They can serve the same content depending on web server config, though. For an example of how they can be different, check https://news.ycombinator.com/newest vs https://news.ycombinator.com/newest/.


You know what’s better than the “resource” URL leading to resource/index.html? Having the “resource” URL lead to plain resource.html. Then you can still have a directory named “resource” and any relative links in resource.html to “resource/thing” (to reach resource/thing.html) will feel natural, as “thing” is indeed in a subdirectory.

(Apache supports this using “MultiviewsMatch”.)


I like to use /resource/ trailing slashes only if the page is a mere index of pages in that directory. If the page has its own content I use /resource without a trailing slash, even if there is a matching directory, because I think it looks prettier - but that is of course purely a matter of taste.


When in doubt, check the spec.

In this case it is in RFC 2396 "Uniform Resource Identifiers (URI): Generic Syntax" [1].

In Section 3 you find this. The forward stroke ("slash") is a separator:

    URI that are hierarchical in nature use the slash "/" character for separating hierarchical components.

[1] http://www.ietf.org/rfc/rfc2396.txt


… and?

(Your answer doesn’t enlighten anything. You could possibly be saying “it says use slash as a separator, not a terminator”, but it can be quite reasonable to declare a null component at the end as a means of accessing the default resource for that “folder”. Pages often have other resources associated with them. The fact of the matter is that the folder/file divide just isn’t present in URLs, and so the answer in a situation like this is ambiguous.)


I initially understood the prose in Section 3 like you interpret it (separator not terminator). That was a mistake. The RFC's BNF grammar is quite clearly stating that path segments may be empty, so it may end in "/".

Therefore, in relation to the article, the URI RFC does indeed not offer guidance about what would be the better approach.
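
One way to see the empty final segment, as a sketch using Python's standard `urllib.parse` (the helper `path_segments` and the example URLs are made up):

```python
from urllib.parse import urlsplit


def path_segments(url):
    # Drop the leading "/" and split on the separator; empty strings are kept.
    return urlsplit(url).path[1:].split("/")


print(path_segments("http://example.com/resource"))   # ['resource']
print(path_segments("http://example.com/resource/"))  # ['resource', '']
```

The trailing slash shows up as one extra, empty segment, which is syntactically valid but carries no meaning that the grammar itself defines.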


> SEO: if your content exists at two (or more!) distinct URL endpoints, it is a SEO no-no. SE-no-no. SEO-apolo-graphql-anton-ohno (I apologize for nothing). Ahem. You need redirects.

Probably not.


Any reason not?

I would say from SEO and UX perspectives this is a must.



