I'm Daniel Kehoe, and I participated in the brief www-talk "Space after Periods" discussion thirty years ago (as Daniel Miles Kehoe).
I wrote a blog post three years ago, "Personal History: Punctuating the Web (1993)," that offers more context and perspective on the discussion [0].
We discussed this on Hacker News three years ago when Microsoft Word began flagging double spaces after a period as errors [1].
That brief discussion thirty years ago still seems to arouse strong feelings for people.
Today, looking back, I'm mostly amused that Guido van Rossum (the developer of the Python programming language) wanted web browsers to collapse multiple spaces to a single space after periods. Python, notably among programming languages, treats whitespace as significant.
I heard about this issue back in the 80's talking to a friend doing an early WYSIWYG word processor development project. Typeset, proportional spaced, yes, same spacing between sentences and words, and if you eyeball typeset books from the precomputer era, that's what you will see.
I also looked it up in Chicago Manual of Style, and one thing I remember is, you want to get your hands on an earlier edition (for geek fun) because they have much more detail about typesetting than they do in later editions. (I don't know what the edition sweetspot is but I found a 1930's copy and I liked it)
>mostly amused that Guido van Rossum...wanted web browsers to collapse multiple spaces to a single space...Python...treats whitespace as significant.
I learned to type on typewriters in the late 70s. I was thrown in a typing class in high-school and I thought to myself "this is a useless class, I'm NEVER going to ever need this". Turns out it was one of the best things I learned in high-school.
Anyway, I never used double spaces after periods. My typing teacher would always give me grief for it, but for whatever reason, I just never did it. As you can see, I still don't.
> I was thrown in a typing class in high-school and I thought to myself "this is a useless class, I'm NEVER going to ever need this".
Interesting, given that I specifically sought out typing class in high school in the late '70s because I was confident that there was going to be a lot of keyboard usage in my future, might as well learn to use it.
And I don't want to hear whinging about subtle mechanical keyboard differences. When you learn at an under-funded rural school with manual typewriters, you also learn to appreciate anything that isn't making your fingers do mechanical printing. :-)
> I think it's mostly propaganda by Knuth and Kernighan (TeX and troff) that makes people want this.
> Let's keep HTML simple!
was unexpected. Knuth as propaganda!
The discussion noted that some O’Reilly books were sentence-spaced, and that layout software had different levels of support, with TeX notably supporting it. And that’s the key: it’s a typographical layout choice. For document input, for writers, it’s a matter of markup for the output typographical layout, not double-spacing per se. It’s an aesthetic choice only, similar to the spacing we add between letters, around punctuation, etc.
So the question is not, should we support double spacing, but: should we support this sentence spacing typographical technique?
Because double spaces are a hack used to achieve layout, they should not be necessary, but word processors not supporting it leads to people implementing it themselves. (If you’re a product manager, this is the sort of thing you spot to indicate a missing feature.)
It gets worse on the web: because most white space is collapsed together the double-space trick doesn’t work, and it’s _very_ hard to do in CSS.
Marc Andreessen’s 1993 comment ‘how is a text widget supposed to know where a sentence ends and where it doesn't? That sort of syntactic analysis we can't do...’: yes, it’s hard. English parsing is hard. Other languages are hard. I don’t believe that’s a reason not to do it any more.
My website has sentence spacing [1] in CSS as a purely aesthetic choice based on a technique by Tom Fine [2, 3]. And it’s automated: I run the site text, single-spaced, through a Python script that understands sentences and marks up sentence spacing.
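For anyone wondering what such a pass might look like, here is a minimal sketch. It is not the script used on that site: the boundary rule (terminal punctuation, one space, then a capital letter) and the `sentence-gap` class name are assumptions made up for illustration, and a real sentence detector has to cope with abbreviations, quotes, and other languages.

```python
import re

# Hypothetical rule: a sentence boundary is a single space that follows
# terminal punctuation and precedes a capital letter. This is deliberately
# naive and will misfire on abbreviations such as "Dr. Smith".
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?]) (?=[A-Z])")

# Hypothetical class name; a stylesheet would decide how wide the gap is.
GAP_MARKUP = '<span class="sentence-gap"> </span>'


def mark_sentence_gaps(text: str) -> str:
    """Wrap each inter-sentence space in a span so CSS can style it."""
    return SENTENCE_BOUNDARY.sub(GAP_MARKUP, text)


print(mark_sentence_gaps("It works. Mostly."))
```

The source text stays single-spaced; only the generated markup carries the extra hook for the stylesheet.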
“Some” O’Reilly books aside, any professionally designed book published for the last hundred years on the shelves of anyone reading this, as well as every magazine and newspaper, uses single-spaced sentences.
That HTML collapses multiple spaces is a feature, not a bug. It means you can, if you want, double-space sentences while authoring/editing your source files and the result will still come out “correct” (according to settled practice).
> any professionally designed book published for the last hundred years on the shelves of anyone reading this
With respect, that's an incorrect assertion. I have books on my shelf published up through the eighties with wider spacing between sentences. A more accurate comment might be that in the modern design era, and I'd guess this is approximately post-WW2 onwards, sentence spacing became less common and it is now common or standard not to. That clarification is true.
How do I know? I like wider sentence spacing (see my comment above on this thread) and have kept an eye out for it in books when I buy them.
I suspect it was computers that killed wide sentence spacing: the web, as seen in the linked discussion thread, combined with word processors that weren't as capable as TeX and didn't support it. Here we are thirty years later and it's become the absolute norm, so much we can barely imagine an alternative. But I regret that technical decisions and limitations have left us in this position.
I’d love to, but most of my books are in storage. The two I bought recently as examples of typography are an old copy of H G Wells’s Veronica, which does display sentence spacing but is too old to use as an example for you, and an ancient book in Estonian printed in German-style fraktur.
If it helps I do think the eighties one, and I wish I could remember what it was, was indeed an outlier. I’d actually love to track down the O’Reilly books mentioned in the linked thread.
In addition to that it only ever has been an English typographic tradition. The option to turn it off in LaTeX is called \frenchspacing for a reason. It could be \worldspacing as well.
Thanks! I agree re the style -- I've attempted to make a 'spirit of the old web' (no Javascript, no tracking cookies, etc) and text-based, CSS-based site. Essentially: communicate what I want to, without the overhead that's normal in many sites today.
Yes, both bold and italic are being applied by the browser, and I do need to fix that. Italic is a temporary choice because I don't like the look of EB Garamond (the current text font) in its italic face, and need to find an alternative. Bold, though, is not intentional. I've noticed the macOS and iOS stacks mimic bold differently, which is an interesting thing to have found.
Excellent point re allcaps for acronyms. Should be pretty easy to add to the site generator. Thanks for the suggestion :)
It's funny that such a flippant exchange is actually hugely significant for the English language. As most Roman text across the World is now consumed via HTML this brief discussion put the final nail in the coffin for the double-space-after-sentence jig.
The fact that the exchange is recorded is magnificent to me. We don't have a written dialogue of the guys who decided to start putting spaces in English words in the first place...
It doesn't treat all whitespace as insignificant. It does, by default (for contexts where `white-space: normal` is in play) collapse how sequences are displayed—namely, that they appear just as if a single space had been used.
so to clarify then HTML in older versions treats all whitespace as insignificant but can be overridden in combinations of newer versions HTML and CSS when interpreted by a browser that understands the styling decisions overriding the default behavior?
The <pre> tag has existed since HTML 2 for displaying preformatted/whitespace-sensitive text, and HTML 1 had the mostly similar <LISTING> tag (plus <PLAINTEXT> which is a little different).
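As a small illustration of the `<pre>` route, here is a sketch that wraps whitespace-sensitive text so that a browser keeps its indentation. The helper name `as_preformatted` is made up for this example; only `html.escape` comes from the standard library.

```python
import html


def as_preformatted(code: str) -> str:
    """Escape markup characters and wrap the text in <pre><code>."""
    return "<pre><code>" + html.escape(code) + "</code></pre>"


snippet = 'def foo(flag):\n    if flag:\n        print("this is indented")\n'
markup = as_preformatted(snippet)
print(markup)
```

The leading spaces survive untouched in the generated markup; it is the `<pre>` element that tells the browser not to collapse them when displaying.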
ok well I forgot the PRE tag as CaptainNegative pointed out but when you say
>No. It works the way I described.
what you described made reference to white-space: normal which is a CSS property that I don't believe is available as part of the HTML standard itself (although I don't really keep up anymore so I could be wrong) but certainly wasn't part of older versions of the spec.
You are putting undue focus on a parenthetical (that I only even put in as a hedge[1] in the first place).
Copy and paste my comment somewhere, delete the parenthetical, and then read the result to yourself.
"HTML [...] treats all whitespace as insignificant" is simply inaccurate, no matter how you want to constrain it (e.g. "in older versions" or not). Whitespace is not insignificant.
Let me be clear about what I meant by whitespace insignificance.
When you put plain text into an element, that is equivalent to a string in typical programming terms. No, whitespace is not entirely insignificant within a text node. But almost. If we leave out <pre> and other special cases here, HTML specifies to ignore any extraneous whitespace and simply collapse it into a single space. So it is “extraneous whitespace insignificant” in a sense. It doesn’t ignore whitespace entirely, but no one would expect that in the context of a string in any language, even a whitespace insignificant one.
In a text node HTML goes out of its way to minimize the meaning of whitespace, but it does do the minimum of respecting that words have spaces between them. You can put spaces some places and have it break or change stuff, like in the middle of an attribute name or value, in the middle of an element name, etc. But you would expect that to happen in any whitespace insignificant language. Outside of that and a few special cases, the default behavior is to ignore whitespace (for example whitespace between the beginning or ending tag of an element and the text node it contains), and as such HTML is very much whitespace insignificant in my opinion.
The reason why I commented that this design was absolutely the right call is basically cases like building a website in PHP, where you mix the two languages together. Here you end up adding a lot of whitespace from indenting your code, etc., and it would be a nightmare if HTML didn’t treat whitespace as it does.
> HTML specifies to ignore any extraneous whitespace and simply collapse it into a single space[...] Outside of that and a few special cases, the default behavior is to ignore whitespace
No it doesn't, and it's not. What you're describing is how the browser displays the content. (And a few other things—like interactions when you select text to drag and drop or copy it to the clipboard.)
> building a website in PHP[...] you end up adding a lot of whitespace from indenting your code, etc., and it would be a nightmare if HTML didn’t treat whitespace as it does
You keep saying "HTML" when you mean something else. In almost every instance if you just said "the browser" (broadly) instead, then you'd be good, but you keep saying "HTML".
There are absolutely parts of the browser that don't care whether they're seeing one space or a thousand varied whitespace characters (tabs, carriage returns, linefeeds, etc), because based on what style properties are in effect at that place the browser will be presenting that content to the user as if there's one space character when laying it out and putting it on screen. But the only whitespace that gets ignored in HTML, really, is the whitespace inside angle brackets around attributes and element names.
Your string metaphor is a good one. Content marked up with HTML is like one big string, and as you say, no one would expect whitespace in a string to be insignificant. It's not insignificant in HTML, either; it does, by default, get painted as if sequences of multiple whitespace characters were a single space, in most contexts. But again, that's a separate thing entirely.
I don’t understand your distinction between “the browser” and “HTML” in this context. The browser is merely the interpreter of the language, but the HTML specification lays out how the language should be interpreted.
Also, this is an example of whitespace that is ignored:
<p>[whitespace here]I’m a text node[more whitespace here]</p>
I don’t believe that is what you referred to when you said “inside angle brackets around attributes and element names”.
Here the whitespace or sequence of spacelike characters is not collapsed into a single space. It is simply ignored, and the text node (string) begins at the first non-whitespace character.
That is actually what I referred to when I said that you end up adding a lot of extra whitespace when building a website in, say, PHP. Because that is where it typically ends up in the generated output.
$ dump ./scratch/p.html
3c 70 3e 20 20 0a 20 20 49 27 6d 20 61 20 74 65
<  p  >        .        I  '  m     a     t  e
78 74 20 6e 6f 64 65 20 5b 20 20 20 20 5d 20 20
x  t     n  o  d  e     [              ]
20 20 3c 2f 70 3e 0a
      <  /  p  >  .
(I replaced your first square bracket sequence with two spaces followed by a newline (U+000A) followed by two more spaces, and I replaced the second square bracket sequence with a space followed by a literal left square bracket, followed by four spaces characters, followed by a literal right square bracket, followed by four more spaces.)
The text node's value is exactly the sequence of characters between the closing angle bracket in `<p>` and the opening angle bracket in `</p>`:
"  \n  I'm a text node [    ]    "
> The browser is merely the interpreter of the language, but the HTML specification lays out how the language should be interpreted.
You're right about the second half, but you're wrong in thinking that it says extra whitespace should be ignored. It doesn't. The bigger problem, though, is in the first half.
I think you have an oversimplified understanding of what's going on in a browser and of the relationship that HTML has to what you see when the browser paints the content on the screen and lets you interact with it; a fundamental misunderstanding seems to exist on your part regarding the pipeline that you do or don't think of as existing between the markup and what you actually get when you open the page in a browser—there's a lot more to it than the browser being "merely the interpreter" for HTML.
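One way to poke at this without a browser is a minimal sketch using Python's standard `html.parser`. That module is not a browser's HTML parser, so treat it only as an analogy; the `TextCollector` class is a name made up for the example. It shows the character data between the tags arriving with every space and newline intact.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Record every run of character data exactly as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Same bytes as the dump above, minus the trailing newline after </p>.
source = "<p>  \n  I'm a text node [    ]    </p>"

collector = TextCollector()
collector.feed(source)
collector.close()

text_inside_p = "".join(collector.chunks)
print(repr(text_inside_p))
```

Nothing is trimmed or collapsed at this stage; whatever the reader eventually sees is decided later.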
I see. Don’t know if you’re still checking for replies on this thread. Livin’ up to my name. Thanks for taking the time to explain, though.
I’m going to have to look further into this to get a better understanding, but I suppose the rules for collapsing whitespace in a text node exist somewhere in the HTML specification, but not at the “interpretation” stage as I assumed.
To be clear what I imagined was that at the interpretation stage a text node would be marked to begin at the first non-whitespace character and end at the last non-whitespace character. And then within the text node there might be additional whitespace that would need to be collapsed into a single space.
Since the first type is not rendered at all and the second type is collapsed to a single space I assumed the rules could exist at two different points in the process/pipeline.
So what I gather here is that both types exist at a later stage than “interpretation” (basically what you see when you open Developer Tools and inspect individual nodes).
But I guess the subtlety here is that at whichever stage the whitespace collapsing/removal happens, the rules for it would still have to be defined by the HTML specification somehow.
And another subtlety to counteract that is that HTML is a markup language and not a programming language. One is executed, one is rendered. So any comparison between say Python and HTML needs to take that into account.
So even though there is some whitespace ignoring going on at some point from:
<p>[whitespace]This textnode has extraneous whitespace[whitespace]</p>
To the point where [whitespace] is not rendered in the viewport, the fact that the ignoring does not happen at the “interpretation” stage is important because that’s as far as the comparison between say Python and HTML can go before the two veer off in different directions.
I’m mainly typing this out for my own understanding, but again, will have to look into it myself to validate or correct my current framework of thinking about this. Thanks for an interesting discussion
Pretty much. HTML parsing produces a content model, where the model's whitespace matches pretty faithfully what's in the source document. At some later point, that model is massaged into the thing that you see and interact with, but the model itself retains everything; this is like a filter, if it helps to think of it that way, or a projection of a complex object (e.g. a 3D object) onto a lesser substrate (e.g. a 2D plane).
Offhand, and after a few glasses of wine, there are a couple points where the whitespace collapse will occur:
- at the display level—when it's time for the browser to actually put the thing on the screen—for CSS contexts where the white-space property is "normal" or something similar, at least, or
- at the interaction level, when something like text selection happens, and the browser computes essentially the equivalent of node.innerText (versus node.textContent; alternatively: node.nodeValue, in cases where the node in question is a text node)
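The "filter" idea can be pictured with a rough sketch. This is only an approximation of what `white-space: normal` does (the real rules live in the CSS specifications and are more detailed), and `collapse_for_display` is a hypothetical helper, not a browser API.

```python
import re

# Space, tab, line feed, form feed, carriage return.
WHITESPACE_RUN = re.compile(r"[ \t\n\f\r]+")


def collapse_for_display(text: str) -> str:
    """Approximate the displayed form: runs become one space, edges are dropped."""
    collapsed = WHITESPACE_RUN.sub(" ", text)
    return collapsed.strip(" ")


stored = "  \n  I'm a text node [    ]    "  # what the model keeps
shown = collapse_for_display(stored)         # roughly what gets painted

print(repr(stored))
print(repr(shown))
```

The stored string is never modified; the collapsed version is a separate value computed from it, which is the distinction between the model and what ends up on screen.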
> looking back, I'm mostly amused that Guido van Rossum (the developer of the Python programming language) wanted web browsers to collapse multiple spaces to a single space after periods. Python, notably among programming languages, treats whitespace as significant.
It does so by collapsing any amount of space into single INDENT and DEDENT tokens, so I'm not sure what the irony is supposed to be. If the last line started with 8 spaces, Python will not distinguish between the next line starting with 10 spaces and the next line starting with 50.
> If the last line started with 8 spaces, Python will not distinguish between the next line starting with 10 spaces and the next line starting with 50.
Perhaps I misunderstood your comment, but this is quite wrong. If I have a block that's indented at 8 spaces, and the next line is 10 spaces, but should be in the same block, it will definitely throw an IndentationError.
Yes, you've misunderstood. If the last line started with 8 spaces, Python will see a difference between (the next line starting with 8 spaces), (the next line starting with 6 spaces), and (the next line starting with 10 spaces). Option 1 does nothing. Option 2 produces a DEDENT token. And option 3 produces an INDENT token.
In the same scenario, Python does not distinguish between (the next line starting with 10 spaces), which produces an INDENT token, and (the next line starting with 50 spaces), which produces the same INDENT token. The amount of whitespace isn't relevant. All that matters is the comparison between one line and the previous line.
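This is easy to check with the standard `tokenize` module. The sketch below (the helper name `token_kinds` is made up) tokenizes two snippets that differ only in whether the innermost line starts with 10 or 50 spaces after a line that starts with 8, and compares the kinds of tokens produced.

```python
import io
import tokenize


def token_kinds(source: str) -> list:
    """Return the token type names the tokenizer emits for the source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


def nested_block(inner_indent: int) -> str:
    """Build a snippet whose second line has 8 spaces and third has inner_indent."""
    return "if a:\n" + " " * 8 + "if b:\n" + " " * inner_indent + "c = 1\n"


ten = token_kinds(nested_block(10))
fifty = token_kinds(nested_block(50))

print(ten)
print(ten == fifty)
```

Both snippets yield the same sequence of token kinds, with one INDENT per deeper level. The INDENT token's text does differ between the two, but the parser only cares that an INDENT occurred.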
It's a bit funny because it's actually hard to represent Python code in regular HTML, exactly because of this collapsing.
For example, here is some Python code typed correctly in the comment box but rendered without the code formatting support (using a | at beginning of line to prevent the code formatting to kick in):
|def foo(flag):
| if flag:
| print("this is indented")
| print("is this?")
If you check the HTML, it looks like this:
<p>|def foo(flag):</p>
<p>| if flag:</p>
<p>| print("this is indented")</p>
<p>| print("is this?")</p>
One thing that doesn't get brought up is that there is a difference between a typewriter and a word processor. One space works best in proportional fonts with typeset work. Two spaces work best in monospace fonts, as in a text editor. The reason is that in typeset text, the space after a period is wider than the space between words, while in monospace you need to add an extra space to distinguish sentences. I think that some of the conflict comes from when and where people learned the rule.
Before computers, people used typewriters and were taught to use two spaces after period. Typesetters knew to use the right space. Early computers were mostly monospace. Word processors introduced variable fonts and were smart enough to collapse two spaces. Email and text editors were monospace. The web made everything proportional and collapsed two spaces to one hiding any difference in source. One space has won enough that people use one space in monospace text.
Historically English-language typography, in contrast to French typography, used a larger space between sentences than between words. Typists imitated this traditional style. The single-space style gained typographic ground through the twentieth century, becoming the norm in the 1950s.
Preferring a single space between sentences is a reasonable choice. But flagging or "correcting" the two-space style is simply a mistake.
This is covered in "The PC Is Not a Typewriter"[1], by Robin Williams. (No, not that Robin Williams.) I adjusted from two spaces to one back in 1999 or so, when I read this book, and the adjustment wasn't as painful as I thought it'd be. I learned on a typewriter in the late '80s.
For some reason, Stephen King uses two spaces between each word in his Tweets.[2] Not sure if this is some statement he's making or a technical glitch.
> Did you switch from two spaces to one space while writing in a text editor using monospace font too?
Yes, I switched to one space no matter what, but my use of monospaced fonts was rare back then. Even today I use only one space after periods, even when using monospace fonts. HN is an interesting case: The input field in comments is (I think?) monospace, but the output of posts is Verdana, which is not monospace.
Except it was brought up in the 3rd message[0] of the linked thread.
> Typists follow the rule of adding an extra space between sentences within a paragraph. The rule for books (and by extension, anything that does not use a fixed-pitch, monospaced font) is to use the same space between sentences as between words.
Yes, I wrote that message 30 years ago. I actually had copies of both "Words into Type" and the "Chicago Manual of Style" on my desk when I wrote that message because I had worked in the publishing industry as a copy editor before getting involved in the tech industry. Some publishing houses had their own style guides or were inconsistent from book to book (like O'Reilly), but WIT and Chicago were the accepted authorities. Whether the conventions of the book industry should have guided technologies like web browsers is a matter for debate, but at the time I wanted to make the point that two spaces after a period were typists' habit and not a publishing convention. I'm amazed that this point is still debated so passionately when most people have never touched a typewriter.
This is a continuation of a discussion on a different page about spaces and tabs in HTML documents [0]. It's actually a much more interesting discussion than this title and the tail end of this conversation would indicate, hinging on a disagreement over whether white space has semantic meaning or is purely presentational.
Terry Allen from O'Reilly argues that white space is significant to the meaning of the text and therefore needs first-class support in HTML, even outside of PRE. Others emphatically consider white space to be a formatting issue that has no place in HTML, which they feel should be entirely presentation-agnostic.
Interestingly, a couple of times people mention the possibility of building in support for TeX and other markup languages to fill the need for precisely-formatted documents. Specifying the presentation of HTML in a separate file (what we know as CSS) doesn't appear to be on anyone's radar yet. I wonder what the web would look like today if that had been the route we took.
I actually added a DVI (TeX) renderer to Mosaic in 1993 (while working at US CS&E). NCSA/Marc wasn't interested in the patch. If the server said the document was MIME typed <the correct type for DVI>, the regular page renderer was bypassed and code that I ripped from ... not sure where ... drew into the browser instead.
Being harder to parse is not a trivial detail, it's the whole ballgame. Written language has a lot of redundancy. That's one of the things that makes it work. The redundancy makes it an error-correcting code so that the information doesn't get dstryed by miner errers. That is what allows you to glean the meaning of a message even without whitespace. Y cn d th sm trck by lmntng vwls (u o eiiai ooa).
Just because you don't completely destroy the message by eliminating whitespace doesn't mean that the whitespace wasn't semantically significant.
[UPDATE] Whitespace is actually a pretty modern innovation, and it is not universal. In old manuscripts the text is often allframmedtogetherlikethis. Also, even in modern German it is common to compose verylongcompoundwords.
Back around 2006, there was an internal collection of quotes from senior Google engineers. There was one from a guy working on a particularly thorny issue with Russian search. He had just finished a project on Thai segmentation, and his response to how things were going with Russian was something like "Great! At least they have words!" (Obviously, he was being deliberately imprecise and understood that obviously Thai has words, just not whitespace-delimited.)
On a side note, the Korean alphabet is well-designed. In particular, teaching children syllabification rules is trivial. Syllables are pre-arranged into squares. Unfortunately, the lack of distinction between r-l, and p-b-f, and restrictions on consonant clusters make it unworkable as a replacement alphabet for English.
I have to say, that is not a great example. Your earlier point still holds good that there is enough redundancy in English to distinguish these things.
Here "Isit?" means "Is it?" and that is obvious from the context.
Yes, that's true, but it's still ambiguous out of context, and the fact that it is also a direct response to what I said adds some unexpected humor so I give it bonus points for that.
The other day I discovered a font called Elstob that has an "Old-style punctuation spacing" mode that (among other things) adds significantly more space between sentences:
When Spacing is set to 1, the spacing between words and sentences and around punctuation marks is a good match for most books printed in the late eighteenth century.
Spaces are significant, but now we have unicode, we can stop thinking about the number of spaces and start using the correct unicode character for the type of space we want. Unicode has at least 16 different types of space character.
As an experiment, here is some text with differing spaces. I wonder if HN can handle it:
En space between sentences: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent eu ullamcorper mi, id dictum nibh. Pellentesque fermentum efficitur viverra. Ut tincidunt ut nunc non viverra. Nunc accumsan ultrices libero ut efficitur. Vestibulum a eros a urna vestibulum tempor. Ut vel sodales tellus. Aliquam enim velit, varius eget mi ut, blandit dictum nulla. Quisque non malesuada felis. Duis sit amet sapien at risus efficitur fermentum. Aenean posuere tempus elit, nec eleifend tellus bibendum a. Suspendisse potenti.
Em space between sentences: Pellentesque semper sed enim a rutrum. Suspendisse iaculis laoreet leo, a tristique lorem tincidunt id. Ut lacinia, sem a sodales fermentum, leo diam elementum elit, vitae egestas risus velit non magna. Curabitur nec ligula quis sem imperdiet rhoncus sit amet a lorem. Donec ut risus sapien. Aenean et nisl quam. Aenean nec interdum metus, quis vulputate enim. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nunc pulvinar molestie imperdiet. Etiam non nisi id leo blandit placerat vel ac ligula. Duis in urna quis erat dignissim pellentesque hendrerit sed sapien. Pellentesque tempor magna eu lacus laoreet tempor. Fusce nec tortor sed urna placerat bibendum eu id mi.
[edit: no, it can't. They work in plain HTML, so I guess the HN form is converting them to plain spaces.]
[edit, the second: try html entities [removed] — doesn't work either.]
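For the curious, the space characters in question can be listed with a short sketch using Python's standard `unicodedata` module. It filters on the Unicode general category `Zs` (space separator); choosing that category as the definition of "space character" is an assumption of this example, and the exact count depends on the Unicode version bundled with your Python.

```python
import sys
import unicodedata


def space_separators():
    """Return (code point label, name) for every character in category Zs."""
    found = []
    for code_point in range(sys.maxunicode + 1):
        char = chr(code_point)
        if unicodedata.category(char) == "Zs":
            found.append((f"U+{code_point:04X}", unicodedata.name(char)))
    return found


SPACES = space_separators()
for label, name in SPACES:
    print(label, name)
print(len(SPACES), "space separators")
```

The list includes the ordinary space (U+0020) along with EN SPACE (U+2002) and EM SPACE (U+2003), the two tried in the experiment above.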
We could use [space], option+[space], and option+shift+[space] to give a regular space, an en space, and an em space respectively. This would be similar to hyphen-minus/en-dash/em-dash, so easy to remember. And it could be done in software.
I was going to say you need 5 space buttons to chord all possible spaces, but I guess you can use 3 space buttons in combination with ctrl, (alt|option), and (win|command).
This made me curious whether HN collapses dashes as well. These all render the same in the text entry box:
- hyphen
– n dash [option -] or [Alt + 0150]
— m dash [option-shift -] or [Alt + 0151]
Also, these render as 3 spaces or single space wide in the text entry box:
... three periods
… ellipsis [opt ;] or [Alt + 0133]
EDIT: So the - – — work, and can visually suggest what should be happening in one's typography if you do not double space after sentence ending punctuation.
One slight annoyance I have on my Mac(s) is that if you have an ellipsis followed by a period, you can tell the difference between the two. I would think it's more proper (?) that all the dots transparently flow together: ….
The good news is, you don't have to think that hard. A number of them are language specific, or very specific to a particular context.
The bad news is, with European languages having the longest tradition of printed word, they also still have quite a few varieties.
And in a 2023 where even today I can get oohs and aahs from multi-decade white-collar professionals when I show them that you can automatically generate a table of contents for a document if you just use the "Header 1" and "Header 2" formatting stuff correctly, I can't imagine a world where people are routinely entering multiple types of spaces typographically correctly.
(Which type of space is the best to use when holding the space bar down to center something, anyhow?)
> but now we have unicode, we can stop thinking about the number of spaces and start using the correct unicode character for the type of space we want.
my keyboards only have one spacebar, man. that means things I type get one type of space, though there can be multiple spacebar presses in a row.
GenX and older. I get the impression that you read ellipses with a very different tone than I do. Ellipses indicate trailing off your sentence, especially with suggestive connotations [1]. As you can imagine, this is uncomfortable to read just as it is to listen to.
That seems like an unfortunate tool to have discarded. Right now, I can use vertical separation to structure text, but horizontal separation.... Well, that's far more limited.
Within sentences, there's the dash—either em or space-en-space, depending on where you're from—and there are plenty of other structural options. Between sentences, you've got the ellipsis, and that's it. The way they're used in the Declaration, in place of paragraph breaks, is helpful even when you're not just trying to save space—for example, to structure a list of short words or phrases without creating the visual separation from the text that comes with using separate lines.
No, that’s the whole point I’m making. A semicolon is no better than terminal punctuation in creating visual horizontal space, which is a tool for communicating semantic structure, but one that we never use. I am not speaking to what is grammatically proper, but rather to what is possible outside of current accepted usage.
I have no issues with people exercising their personal choices when it comes to the number of spaces after a period. However, I found it to be quite annoying when they began complaining about my preference for a single space and asserting that their method of using two spaces is more correct than mine. Ultimately, I've come to the realization that this is more akin to a matter of personal belief rather than a technicality. It seems nearly impossible to bridge the gap and find a consensus on this issue.
My issue with single-space is that it turns programmatically parsing out sentences from "pretty simple regex on terminating punctuation + (two spaces OR newline)" to "tangled mess of heuristics and corner-cases".
Obviously humans can figure this out easily, so I don't really notice or care if I'm reading something with single-spaced sentences, and it's pretty rare that I need to parse sentences... but the moment that need does arise, I'm gonna end up judging the hell out of that "personal choice" - to say the least.
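To make the contrast concrete, here is a minimal Python sketch of the two approaches. It assumes plain text and ignores closing quotes or brackets after the punctuation. The function names are hypothetical, and the single-space version is deliberately the naive rule, not what a real segmenter would ship.

```python
import re

# Terminating punctuation, then either two-or-more spaces or a newline.
DOUBLE_SPACED_END = re.compile(r"(?<=[.!?])(?: {2,}|\n+)")

# Naive single-space rule, for comparison: any space after . ! ? is a boundary.
SINGLE_SPACED_END = re.compile(r"(?<=[.!?]) +")

def split_double_spaced(text: str) -> list[str]:
    """Split text that marks sentence ends with two spaces or a newline."""
    return [s for s in DOUBLE_SPACED_END.split(text.strip()) if s]

def split_single_spaced(text: str) -> list[str]:
    """Split on any space after terminating punctuation (over-splits)."""
    return [s for s in SINGLE_SPACED_END.split(text.strip()) if s]

print(split_double_spaced("I saw Panic! at the Disco.  Dr. Smith did too."))
print(split_single_spaced("I saw Panic! at the Disco. Dr. Smith did too."))
```

The first call yields two sentences. The second breaks after "Panic!" and "Dr.", which is where the heuristics and corner cases start piling up.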
While I cannot comment on the specific use cases mentioned by the previous person, one example where parsing out sentences can be useful is in the context of Computer Aided Translation (CAT) tools, also known as translation memory. As a software engineer in the field of localization, I encounter this requirement on a daily basis.
In CAT tools, there are well-established heuristics that handle the majority of cases, making sentence parsing relatively problem-free. For the remaining cases, reputable CAT solutions provide features that allow translators to merge or split segments, thus accommodating various sentence structures. As a result, this issue is generally not a significant concern.
It is unreasonable to expect or demand that individuals discard their writing styles solely for the purpose of text processing, especially when both styles are equally prevalent. The blame, if any, lies with the limitations of the language itself for not providing clearer guidelines in this regard.
Most common for me would be keyboard navigation in text editors. Emacs, for example, assumes double-spaced sentences by default when using M-a / M-e / M-k / C-x DEL for sentence-based navigation/editing: https://www.gnu.org/software/emacs/manual/html_node/emacs/Se...
Being able to count sentences per paragraph, words per sentence, etc. is also handy for maintaining good writing style. Excessively-long sentences or paragraphs can be trickier for folks to read; having solid metrics around sentence/paragraph length helps identify candidates for simplification.
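As a rough illustration of those metrics, here is a Python sketch that counts sentences per paragraph and words per sentence. It assumes paragraphs are separated by blank lines and sentences by two spaces or a newline. `paragraph_stats` and the `max_words` threshold of 30 are hypothetical choices, not an established standard.

```python
import re

# Sentence boundary: terminating punctuation, then 2+ spaces or a newline.
SENTENCE_END = re.compile(r"(?<=[.!?])(?: {2,}|\n)")

def paragraph_stats(text: str, max_words: int = 30) -> list[dict]:
    """Return per-paragraph sentence counts, word counts, and long sentences."""
    stats = []
    for para in re.split(r"\n\s*\n", text.strip()):
        sentences = [s for s in SENTENCE_END.split(para) if s.strip()]
        word_counts = [len(s.split()) for s in sentences]
        stats.append({
            "sentences": len(sentences),
            "words_per_sentence": word_counts,
            # Candidates for simplification: sentences over the threshold.
            "too_long": [s for s, n in zip(sentences, word_counts) if n > max_words],
        })
    return stats
```

With single-spaced input, every paragraph would be reported as one long sentence, so the numbers are only as good as the boundary rule.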
> But seriously, how is a text widget supposed to know where a sentence ends and where it doesn't?
By scanning for terminating punctuation followed by either two spaces or a newline. What's that? You don't use double-spaces between each sentence? Sucks to be you! The Man™ can pry my double-spaces from my cold dead fingers.
And I say terminating punctuation in general (not just periods) because there's a growing number of cases of other such punctuation occurring within certain proper nouns - for example, "Panic! at the Disco".
I thought I was specifically taught that this is why you do two spaces. It isn't that you need two spaces after a sentence, per se; but the system has an easier time knowing to do inter-sentence spacing instead of title spacing if you do that.
That is, Dr. Smith should have less spacing. I'm guessing there are so few things that that applies to, that most systems that care can just hardcode it nowadays?
Further, I thought any typesetting system would play with the spaces after words to make things pleasant. That is, a space isn't a set width in any typeset thing. That really only became a thing with typewriters.
Since the majority of the text I produce these days is rendered as markdown before being shown to a reader, I've taken to putting newlines after periods. It makes prose much easier to edit with a linewise-thinking code editor, and the reader doesn't notice.
After developing this habit, I now personally find it also easier to skim text with the newlines shown because I know that the beginning of the line is also the beginning of the sentence.
Same for me, I write either markdown or latex and both have this behavior. In latex it's outright recommended, for ease of commenting out or moving sentences.
Am I the only one who thought this was about low-orbit adventure following menstruation?
Because seriously I did. Thankfully, by the 1990s NASA had a bit more experience with female astronauts. A popular story of NASA’s initial accommodation is entertaining and seemed like legit fodder for HN thread:
Until today, I never put it together that Guido van Rossum (python) and Just van Rossum (LettError) were related, though it makes total sense in retrospect. mind blown
There has never been consistency here. Vote with your fingers, space stuff however you want.
And don't consider web tech to be a typographical decision, it's just the only technically feasible option if you want to make something whitespace-amount-agnostic so people can indent their html.
I used to put two spaces after sentences because people told me it is the correct thing to do. Now I put one space after a sentence because different people told me it is the correct thing to do. When in Rome, do as the Romans do.
For years, I kept my wifi with the passphrase “This is my passphrase. There are many like it, but this one is mine.” Without the quotes of course. With two spaces after the period.
At this point I am not going to overcome forty years of muscle memory. I assume whatever tool is doing the rendering will make things right according to the current convention.
I'm beginning to doubt if there was any humor there in the first place.
> When in Rome, do as the Romans do.
I mean if it had been a French city I could say that a pun about French Spacing is there in the joke. But Rome isn't a French city. So still very confused if and what the joke is.
I’m told it’s because the period is considered passive-aggressive. I assume it’s too formal, too final; maybe The Kids These Days hold out hope that a sentence might not really be over? At any rate whoever got the McDonald’s account embraces it wholeheartedly.
Only when sending a one-sentence text, where the beginning and end of the sentence are clearly the bounds of the text itself. You see periods still between sentences of multi-sentence texts. Sometimes other markers are used like ellipsis or emoji.
I really genuinely like this change for orthography. The feel & sound of spoken language is highly sensitive to the context and formality: you talk differently giving a speech than you do at dinner with your family etc. It's cool that we're adapting the written forms to more completely express the full range of formality that we actually produce written language in now.
The details of orthography are all just convention and tradition anyway. As much as it pains prescriptivists and peaked-in-high-school well-actually pedants, the true language is the spoken one, and writing is merely a tool we use to represent it. These additions make writing a more complete & capable representation.
And from a more CS view it's cool too. We've hijacked sometimes-redundant punctuation to convey nuances of tone and intent. Essentially increasing the "bandwidth" of writing.
I've seen house programming styles that mandate a tab is 3 spaces, as well as 5. Personally, I have no dog in this fight at all. I'll follow whatever the house style is.
I used to be firmly in the one-space camp because my first intro to the subject was from Robin Williams _The Mac Is Not A Typewriter_. However, at some point - probably during one of the regular resurrections of this debate on HN - I came across a convincing argument that the line of reasoning in that book is wrong; that it was actually common for typeset books to use larger spaces between sentences than between words and that when the typewriter was invented, the two space rule came about because that's what people were already used to seeing.
I got so sick of all the arguments from ignorance that I decided to get empirical and have taken to examining the type in old books to see if there's any kind of consistency as you work back in time from the word-processor era to the typewriter era to the pre-typewriter-era. The problem is that I don't have access to enough old books to come to much of a conclusion.
I have _The Works of Shakespeare_ from 1919 (MacMillan and Co.) and it definitely uses wider spaces between sentences than between words. This is true both in the fully justified text of the preface as well as in the ragged right of the plays.
I also have a Webster's Dictionary from 1937 and it too uses wider spaces between sentences than between words. All the text is fully justified.
However, virtually all modern books use the same width space between words as between sentences. Notice that I'm _not_ saying "one space" or "two spaces" because books are typically fully-justified and so it can easily be the case that all the spaces on a line are much wider than a single space character; it's just that the space between sentences is no wider than the space between words.
Which brings me to what I feel like is usually ignored in these debates: justification. Professionally typeset works that get printed are almost universally fully justified. Everything else - typical word-processor output, the entirety of the web, etc. - is left justified. If you're going to write an algorithm for full justification, you're going to have to mess with spacing, and it doesn't seem entirely unreasonable to put a little more space between sentences than between words. Although presumably it's algorithmically simpler to treat all spaces the same way, especially if there aren't great rules for even knowing when a sentence ends, e.g. "No, I expect you to die Mr. Bond".
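A toy version of that idea, sketched in Python for a monospaced line: pad the line to a target width, but hand the first extra space to gaps that follow a sentence. This is only an illustration under simplifying assumptions (fixed-width characters, non-empty words, and a naive "ends in . ! ?" sentence test). `justify_line` is a hypothetical name, and real typesetters work with stretchable glue rather than whole spaces.

```python
def justify_line(words: list[str], width: int) -> str:
    """Pad one line to `width` columns, widening sentence gaps first."""
    if len(words) < 2:
        return "".join(words)
    gaps = [1] * (len(words) - 1)  # one space between each pair of words
    extra = width - (sum(len(w) for w in words) + len(gaps))
    if extra < 0:
        raise ValueError("words do not fit in width")
    # Naive assumption: a word ending in . ! ? ends a sentence ("Mr." fools it).
    sentence_gaps = [i for i, w in enumerate(words[:-1]) if w[-1] in ".!?"]
    for i in sentence_gaps:  # first, one bonus space per sentence gap
        if extra == 0:
            break
        gaps[i] += 1
        extra -= 1
    i = 0
    while extra > 0:  # then spread whatever is left, left to right
        gaps[i % len(gaps)] += 1
        extra -= 1
        i += 1
    return "".join(w + " " * g for w, g in zip(words, gaps)) + words[-1]

print(repr(justify_line(["It", "rained.", "We", "left."], 20)))
print(repr(justify_line(["No,", "Mr.", "Bond."], 14)))
```

The second call widens the gap after "Mr." as if it ended a sentence, which is exactly the ambiguity that makes treating all spaces alike the simpler choice.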
> I used to be firmly in the one-space camp because my first intro to the subject was from Robin Williams _The Mac Is Not A Typewriter_.
So which camp are you in now? I did a 'view source' on your comment. I see you still type with a single space after period. So are you still in the one-space camp? Can you share a little bit more about which camp you are in and why?
despite what many pedants will have you believe, unlike many other languages there is no central authority on how english should be spoken or written. there isn't even a correct spelling of words.
I learned something interesting about myself here...
I was always in the "two spaces after the period" camp, because that's how I was taught as a young sprout. So I assumed that's how I still write.
I've been writing and publishing fiction and nonfiction for decades and took a quick tour through my work. My early work does indeed have two spaces after each period.
But, at some point, I started to shift. I went through a couple of years where I was inconsistent about using two spaces or one, and after that, all of my writing uses one space exclusively.
I didn't even notice that change in my own writing. It boggled my brain a little bit.
This is interesting. When I learned to type, I also learned to use two spaces after a sentence. The habit stuck. I don’t think about it at all, it just happens. And, yes, paragraphs sometimes look weird to me if sentences are not separated by two spaces.
Two semesters of touch-typing in the 1980's has cemented two-spaces into my muscle memory. I can't shake it; it looks/feels wrong to use one space even though I respect the efficiency and aesthetics of that.
I will write two spaces as long as Emacs has by-sentence navigation. I wouldn't care what anyone thinks about it.
I studied printing and typography in particular, as a student. I worked in printing and publishing, mostly in a large newspaper before I pivoted to programming. I haven't done any work in that area since the time Adobe Pagemaker overtook QuarkXPress (around the time of G5 and MacOS X). But, I think the principles are still valid to this day as nothing really changed in the printed text.
So, when paginating a newspaper, the goal is to achieve a "pleasant look". This includes a bunch of rules that... sort of have some justification sometimes, and less so other times... well, here are some examples:
* Don't allow for four consecutive lines to end with hyphens because it breaks the visual image of the body of text as a rectangle (or whatever other shape that the text was intended to flow in). Obviously, four here is a heuristic, and in reality will depend on the space between lines, the size of the font etc. But, the rule says "four".
* Similarly, don't allow for four consecutive lines to have spaces between words align vertically because it creates an artifact in the otherwise uniform rectangle filled by text.
* Don't allow for a single line from the previous paragraph to start a page / a column because readers want a paragraph to be close together as it's supposed to convey a single thought.
And there are many, many more, including the size of dashes that should be used in specifying year or time spans, used in math formulas as minus, used in direct speech, used as a way to separate sub-sentences and so on. Unfortunately, these also vary by language and country. Sometimes even the prose vs verse will affect the preferred length of dashes.
So, when it comes to periods, they can happen in acronyms, in initials or to signal the end of the sentence. When they are used in acronyms, and especially in initials or titles, it's very desirable to keep the word following the period and the space on the same line as whatever preceded the period. I.e. "V. I. Lenin" should be treated as a single word, from a typographical perspective. In practice, in typography, there are "unbreakable" symbols, mostly dashes and spaces, and these would be used for titles and initials to prevent an accidental line break from cutting through someone's name. In contrast to this, it's highly desirable that the line break occurs where the sentence ends.
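The "unbreakable" space idea can be sketched in Python: replace the ordinary space after an initial or a title with U+00A0 (NO-BREAK SPACE) so a line break cannot fall there. This is a heuristic sketch only. The `TITLES` list is a hypothetical, deliberately short sample, `bind_initials` is a made-up name, and a sentence that genuinely ends in a single capital letter would be bound by mistake.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE
TITLES = ("Dr", "Mr", "Mrs", "Ms", "Prof")  # hypothetical, deliberately short list

# A single capital letter or a known title, a period, one space,
# and then another capital letter (checked but not consumed).
_PATTERN = re.compile(r"\b((?:[A-Z]|" + "|".join(TITLES) + r")\.) (?=[A-Z])")

def bind_initials(text: str) -> str:
    """Swap the space after initials/titles for a no-break space."""
    return _PATTERN.sub(lambda m: m.group(1) + NBSP, text)

print(repr(bind_initials("V. I. Lenin met Dr. Smith.  They talked.")))
```

The double space after "Smith." is left alone, so a reflow step could still prefer it as a break point.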
Since keyboards offer limited input capabilities when it comes to text formatting, and in order to help editors to present text in a more easily digestible way, it seems like putting two spaces after a period, if it's meant to terminate a sentence, is a small price to pay.
My typical way of working with prose in Emacs is that I repeatedly ask it to re-flow the paragraph (because of the edits I make to it). So, on top of being able to skip a number of whole sentences in any direction, I would also prefer that the editor doesn't break initials off the name, while trying to break lines at sentence ends.
Most computer users today use crippled text editors such as VSCode or MS Word, which don't offer similar functionality, and so to their users a double space after a period appears to be a meaningless chore. I understand that the needs of the majority will win out at the end of the day, but, personally, I will never accept this kind of approach.
Is 10 guaranteed to solve all problems? 99% confidence it will? How much does it cost in weight budget to add some 9s to that? Oh wow tampons are light. 100 it is then.
Everybody around me who would find utility with them has non-normal usage patterns (birth control, PCOS, hysterectomy) so there is a pretty high variance in my observations (across individuals and time) and I don't need to actively seek more samples. I just make sure there is a box of tampons around and borrow one to use for wound care when I need one and replace the empty box when requested.
It's not this forum, it's the default behavior of HTML unless you specifically opt in to significant white space, which few sites do because it has a tendency to create spurious newlines unless the code is maintained very carefully.
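For illustration, the default collapsing can be approximated in a couple of lines of Python. This is a simplification of what browsers do under the default `white-space: normal` (the real rules also cover line wrapping and whitespace at element edges), and `collapse_html_whitespace` is a hypothetical name.

```python
import re

def collapse_html_whitespace(text: str) -> str:
    """Approximate default HTML rendering: runs of whitespace become one space."""
    # Only ASCII whitespace is collapsed; U+00A0 (no-break space) is untouched.
    return re.sub(r"[ \t\n\r\f]+", " ", text)

print(collapse_html_whitespace("First sentence.  Second one.\n  Indented third."))
```

Opting in to significant whitespace is done in CSS, for example with `white-space: pre-wrap`, which is where the spurious-newline problem mentioned above comes from.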
I'd be very curious to know if you have any examples of a website that does preserve multiple spaces after periods—I would suspect that in most cases where you think that is happening, it's actually because the font handles the extra space.
Whether it's a bummer or not, the point is this is how every website has worked for the last 30 years.
I find that if you look closely at old books, the spacing is pretty variable. I have one book by my desk from 1895 and the spacing after a period is more like 2½ normal spaces, but I have another book from 1978 and the spacing after a period is basically identical to the spacing after a word. There's not much consistency from book to book.
Yeah, it makes sense for HTML -- there is no easy way to differentiate sentence-ending periods from other periods. This is just one of those things that gets worse as technology improves. Another example is phone connections -- phone connections used to be fantastic, and they worked even when the electricity went out, but now we have cell phones, which commonly have terrible connections and are dependent on electricity. Just another example of how advancing technology degrades the user experience in order to achieve other efficiencies.
In general, technology makes things worse but cheaper. :-)
A book printed with letterpress is nicer than a laser printed book, and an illuminated manuscript is even nicer! There are some technology shifts where the new thing is strictly better, like DVD to Blu-Ray, but the majority of the time you have to give something up to move forward, like losing the ability to record when we left VHS behind.
AI is going to really accelerate the trend by being really bad at a lot of tasks, but good enough to make do with.
You want non-breaking spaces for this. They're the same width as a normal space, but don't collapse together in HTML.
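A minimal sketch of that trick in Python, assuming the input is already HTML-escaped and that sentence ends are marked with exactly two spaces. `preserve_sentence_spacing` is a hypothetical helper, not part of any library.

```python
import re

def preserve_sentence_spacing(text: str) -> str:
    """Turn 'period + two spaces' into 'period + &nbsp; + space' for HTML."""
    # The trailing ordinary space still lets the browser wrap the line there.
    return re.sub(r"(?<=[.!?]) {2}", "&nbsp; ", text)

print(preserve_sentence_spacing("End of one.  Start of two. And three."))
```

Putting the no-break space first keeps the wider gap attached to the end of the sentence, so a wrapped line does not start with a stray space.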
I once encountered a bizarre bug caused by a user whose keyboard was somehow configured to automatically insert these if they added two or more sequential spaces.
And web browsers muddied the water a lot, since rendering two spaces is generally done by rendering two spaces, unless you're on the web and then have to do something special (some people do! &nbsp; is pretty common). Personally I think this alone is the biggest influence that trained generations that one space is the majority and therefore the most correct.
---
I wish they called the zero-width spaces "non-space", so we could have both &nbsp; (non-breaking space) and &zwsp; (breaking non-space).
[0] https://danielkehoe.com/posts/personal-history-punctuating-t...
[1] https://news.ycombinator.com/item?id=22975299