Hyphens, minus, and dashes in Debian man pages (lwn.net)
147 points by signa11 on Nov 2, 2023 | 88 comments



Classic "everyone is using the software wrong, but it's the fault of everyone, and not the software".

Some distros like Void seem to patch this out.[2]

From mandoc/mdocml's mandoc_char(7) [1]:

In roff(7) documents, the minus sign is normally written as ‘\-’. In manual pages, some style guides recommend to also use ‘\-’ if an ASCII 0x2d “hyphen-minus” output glyph that can be copied and pasted is desired in output modes supporting it, for example in -T utf8 and -T html. But currently, no practically relevant manual page formatter requires that subtlety, so in manual pages, it is sufficient to write plain ‘-’ to represent hyphen, minus, and hyphen-minus.

Which is the common-sense thing to do.

Meanwhile, GNU projects become increasingly less relevant due to obnoxiousness like this.

In general the amount of wankery of "the correct hyphen" is staggering.

[1]: https://man.openbsd.org/mandoc_char

[2]: https://github.com/void-linux/void-packages/blob/20c66829134...


> everyone is using the software wrong, but it's the fault of everyone, and not the software

> GNU projects

There's a way more popular thing that breaks commandline snippets: auto-replace from `--` (double hyphen-minus) to `—` (emdash), in many chat applications and, particularly, on Mac OS.

The argument for having such a problem is the same: "everyone is using the software wrong", but alas, some people just forget to backtick their snippets (when that's even supported).

So it's not just GNU projects; it's GNU projects where there's a good chance to actually fix such problems despite the opinion of some code authors.


On iOS it’s really almost impossible to type -- in an input that isn’t explicitly defined to allow it. In this comment box for instance, I had to type `- -` (dash space dash), and then delete the space in the middle. I’m not sure if it’s possible to disable this behavior on a web text box.

Virtually any other method of inputting -- automatically, invariably converts it to a single em dash.

The best part about this is that it happens silently and automatically, unlike every other autocorrect on the platform. “Smart punctuation” can be disabled globally for the entire device, but I don’t think that’s particularly reasonable, either.


Gotta love smart punctuation.

Just yesterday, I had to fix a broken page because someone dared store a link tag in a Word document, which helpfully and silently converted its quotes into smart quotes for the user.

I have a Ruby class whose whole purpose is to undo this kind of help in user-provided text, but I missed a spot.

Thankfully, I've seen this happen a thousand times, so I knew what was happening. The user didn't though. And trying to explain why her quotes are the wrong kind of quotes would be impossible.

I cannot fathom the arrogance that drives the writer of any text processing application to silently replace what the user inputs, especially what they paste, with what the writer prefers instead. In the case of Apple, they'll even delete other parts of the text if you disagree with their choice.

It's why I use notepad for almost everything and I avoid excel like the plague.
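
For anyone who wants the same kind of cleanup without Ruby, here is a minimal Python sketch of the idea. The `unsmarten` name and the exact character list are my own assumptions, not what the class mentioned above does, and mapping an em dash back to `--` is a guess that suits command-line snippets more than prose.

```python
# Hypothetical helper: map common "smart" punctuation back to plain ASCII.
# The table is illustrative, not exhaustive.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash, assumed to have replaced a double hyphen
    "\u2026": "...",  # horizontal ellipsis
})


def unsmarten(text: str) -> str:
    """Return text with smart quotes, dashes and ellipses replaced by ASCII."""
    return text.translate(SMART_TO_ASCII)


if __name__ == "__main__":
    broken = "<link rel=\u201cstylesheet\u201d href=\u201cstyle.css\u201d>"
    print(unsmarten(broken))
```

A translation table only catches the characters you thought of in advance, which is presumably how a spot gets missed.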


> I had to type `- -` (dash space dash), and then delete the space in the middle

Which itself seems like a bug, if the intended behaviour of adjacent dashes is to turn them into a long dash.

There is a similar annoying bug in Confluence (it's some weirdo corporate sort-of-wiki software), where backticks normally result in monospaced text, but if you type a closing backtick and then move to the start of the line and type the opening backtick, it just doesn't work. You get non-monospaced text, with literal backticks in it. To fix it you have to delete both of the backticks, then re-type the opening one, and then re-type the closing one.


Turn off "Smart Punctuation" in Keyboard settings.


Ok, but I want that on for the 90% of time I am talking to people.


What frustrates me is that this is a solved problem: In Word, for example, automatic substitutions can be overridden by undoing (via Ctrl Z or backspace) and repeating the same sequence, which will disable substitution.

Preferably I should be able to type - - and then have it be replaced by —, and backspace to turn it into --.


I agree with your first paragraph but not the second. What frustrates me is any sort of override of my keyboard keys – 100% of the time when I press backspace, I intend to delete one (1) character behind the blinking cursor. Not a whole word, not perform undo, not anything else. One character.

(Alternatively, if text is selected, then of course I’d like to delete the selection.)


-- ?

I had no trouble at all. iPhone 11, iOS 17.

Edit... as another commenter points out: it's the "smart punctuation" keyboard setting that handles this. I have it turned off...


It's pretty much mandatory for developers and tech people using a Mac to turn off the global "Use smart quotes and dashes" and most of the text replacement gunk.

It's quite counterproductive to have someone paste a JSON blob in Slack, mail, etc. and end up with all the wrong quotes, as in “string”


It's a bit ironic because Apple did not do this for a very long time and only implemented it in recent years after being pressured by Windows converts who were used to it happening there.


It doesn’t happen on Windows, other than in specific applications like Word.


Sounds mostly just like Mac OS then, no? Where in specific applications like Notes and Slack it is being replaced. But not in other specific applications like Finder, Terminal, Music, or Firefox.

I don't like it either, but one seems modelled after the other.


I responded to your “Apple […] implemented it in recent years after being pressured by Windows”. I have no idea what Apple actually implemented. In any case, there’s no such application-independent functionality in Windows proper. Word for Mac probably also had it all along. And it’s as wide-spread in platform-independent web-tech-based apps (e.g. Electron apps).


> I have no idea what Apple actually implemented.

The things the person I responded to mentioned:

> the global "Use smart quotes and dashes" and most of the text replacement gunk.


MacOS has (had?) a fairly reasonable keyboard layout that let you type such things directly.


This reminds me of a bug in a website my school uses for homework. It only accepts negative numbers using hyphen-minus as found on US layout keyboards. Some students would enter minus because of their layout or IME and come to me as a TA for help wondering what they were doing wrong. It really was confusing to figure out what was going wrong the first time.


Isn't this pretty common? str to int conversions don't commonly support this (quick test: Go, JavaScript, and Python – all throw an error on "str to int −10").

Are there keyboard layouts that have a "minus sign"? I can't find any (standard) keyboard layout that doesn't have - and has − instead (you will need - anyway for lots of contexts).
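
A small Python sketch of that quick test, plus one possible workaround. `parse_int` is a hypothetical helper of my own, and it only handles U+2212 MINUS SIGN, not the many other dash-like characters.

```python
def parse_int(text: str) -> int:
    """Parse an integer, treating U+2212 MINUS SIGN like ASCII hyphen-minus."""
    return int(text.strip().replace("\u2212", "-"))


if __name__ == "__main__":
    # "-10" uses hyphen-minus (U+002D); "\u221210" uses the minus sign (U+2212).
    for sample in ["-10", "\u221210"]:
        try:
            print(repr(sample), "->", int(sample))
        except ValueError as exc:
            print(repr(sample), "-> ValueError:", exc)

    # The tolerant version accepts both spellings.
    print(parse_int("\u221210"))
```

Whether a form should silently accept the minus sign or reject it with a clearer error message is a design choice; the sketch only shows that accepting it is cheap.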


It’s probably almost entirely IME stuff (I’ve had similar issues with some other dashes in MacOS, with one kind of dash being replaced by a “better” dash, but not the one I was typing, with a Japanese layout)


If an IME doesn't enter a standard hyphen-minus by default, no matter what the language is, that is basically engineering malpractice. How could anyone not know that would fail hard more often than not?


I type Japanese frequently, and hitting the dash/minus key results in "ー" instead of "-". The reason is justified too: Japanese is almost always typed monospaced, so a normal "-" screws up the formatting. The same goes for numbers, like "3" instead of "3".

I always switch to English input if I need to type in lots of numbers, since converting to half-width every time gets annoying and feeding full-width numbers into text fields usually leads to the program on the backend crapping the water closet.
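
For the backend side of this, a minimal sketch, assuming the input is Python text and that NFKC normalization is acceptable for the field in question (it also folds ligatures and other compatibility characters, so it is not a safe default for everything). `to_halfwidth` is a name I made up.

```python
import unicodedata


def to_halfwidth(text: str) -> str:
    """Fold full-width forms to their ASCII equivalents using NFKC."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # Full-width digits and full-width hyphen-minus fold to ASCII.
    print(to_halfwidth("\uff13\uff10\uff10"))
    print(to_halfwidth("\uff0d\uff15"))
    # The long vowel mark U+30FC that the dash key produces is a real
    # Japanese character and is left alone by NFKC.
    print(to_halfwidth("\u30fc") == "\u30fc")
```

Note that this does not help with "ー" typed in place of a minus: that one has to be handled explicitly, if at all.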


A dash is definitely not the same as a minus. A hyphen however is so close as to cause a serious ambiguity, so much so that they call the standard character a hyphen minus. So if someone comes along and invents a minus or a hyphen that is distinct from a hyphen minus, and can't even be visually distinguished most of the time, that is not particularly helpful for general use.

People want to use Unicode for everything, when some things are really typesetting abstractions that should be avoided in general purpose computing. Groff and troff have their peculiar exceptions for historical reasons, but surely in retrospect it would have been better for a hyphen minus to render as a hyphen minus rather than as a hyphen.

As far as Japanese goes, the keyboard mapping there is unfortunate. Perhaps they should have considered using a shift key or something to shift between hyphen minus and dash. And in my view those ought to be two characters with typesetting variations, not six or seven.


Maybe they were pasting from a man page?


This is a good joke, but I do remember a course I took that required homework assignments in Mathematica. I definitely copied some of those from the help files. This was down to unimaginative assignments and fairly comprehensive examples in the docs. It even worked in exams, e.g. we were asked to compute and 3d plot a particular Lorenz attractor, this exact problem was in the (available in exam) help files.


It’s a difficult issue. If you write a calculator app as a web page, you might want to use the minus sign for the rendering to look typographically nice. But someone copying the output then might have trouble getting it working when pasting into some number input. A native app of course can convert to hyphen-minus on copying. But really, maybe it should become more common to accept the minus sign for minus in UI input.


Close enough: the new AZERTY has 4 of them on the same key?

https://norme-azerty.fr/en/

Technically, if not practically standard. (Yet ?)


It's funny we're having this conversation when the real, _actual_ bug is this:

> The specified behavior of groff is that an ASCII "-" (Hyphen-Minus) in the input becomes a Hyphen in the output.

This makes no sense and virtually any program that does similar things (ligatures, auto-emoji, formatting...) on output that will realistically be copy-pasted should burn in hell.


This sounds like the same issue that caused me to switch from Google Docs to Markdown for my notes. One day I spent way too much time trying to figure out why I was getting errors when copy/pasting a command back to the command line. IIRC I finally copied the command to a `vim` window and found that GD was replacing consecutive spaces with some other character that the shell didn't interpret as a space.
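
If it helps anyone chasing the same thing, here is a minimal Python sketch for spotting such characters without opening an editor. The `suspicious_chars` name is hypothetical, and treating every non-ASCII character as suspicious is an assumption that only makes sense for shell commands.

```python
import unicodedata


def suspicious_chars(command: str):
    """Yield (index, char, Unicode name) for every non-ASCII character."""
    for index, char in enumerate(command):
        if ord(char) > 127:
            yield index, char, unicodedata.name(char, "UNKNOWN")


if __name__ == "__main__":
    # A pasted command containing a no-break space and an em dash.
    pasted = "ls\u00a0\u2014all"
    for index, char, name in suspicious_chars(pasted):
        print(index, repr(char), name)
```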


Does turning off "Automatic Substitutions" (under Tools > Preferences > Substitutions) fix the problem? I have that turned off, as well as turning off "Smart quotes" (under the General tab) which drove me nuts with code fragments.

They may also add new options every now and then. I don't remember seeing the "Insert emojis using the colon character" option. Turned that damn thing off.


I'm not sure if you're referring to G. Docs or not. I still use it for stuff that I need to share (and won't need to copy/paste to the CLI.) It did not occur to me at the time that there might be options to defeat Docs reformatting.

IAC Markdown and MkDocs have been a solid platform for notes, and with a private Gitea instance, all of my notes are local and private. Since MkDocs can serve the pages it produces, I can use it away from my home LAN (homelab) too, even w/out network access (though I suppose G. Docs has some way to cache documents locally. But I'd probably have to plan in advance.)


I hear you. I had the same damn good experience and it was only after pasting into vim that I realised what was going on.


It makes sense for typeset output. It doesn't make sense for man pages and the tool should make that distinction easy to select.



That's what you get when you don't think of WYSIWYG as a necessity. Troff is not really unique: there are many other markup languages with hyphen-minus-dash distinctions, which may be detected contextually, distinguished from various states, or have to be manually marked as such. Any such distinction is a potential mistake waiting for users, but WYSIWYG provides immediate feedback so that many mistakes can be readily corrected. Even a semi-WYSIWYG interface like a side-by-side preview would work great, to a slightly lesser extent---it is rare to see a Markdown editor not paired with a preview. To my knowledge this has never been the norm in troff. I'm not even sure a preview editor for troff exists (unlike for TeX and friends!), and unless such an editor can be made the norm, troff would be better shelved for good.


WYSIWYG editors make this problem worse, not better.

It is extremely common, if not expected, for a WYSIWYG editor to change the character that has been typed by the user to a different character which the editor thinks better corresponds to the intention of the user. That very principle is essentially wrong. And in a WYSIWYG editor, the user is not necessarily aware that this "helpful" change has taken place. In particular, the user will not be aware if the characters are visually identical.

Typesetting concerns only belong to typesetting tools. And even then, almost all these typesetting rules relating to quotes and dashes are obscure, actively debated, dependent on language and most important of all, completely useless.


Hmm. "ronn" doesn't get mentioned, I should make sure it does the sensible thing (because it's 2023 not 1983 and I don't expect anyone to use an obscure typesetting-ish notation to write very stylized man pages that are almost exclusively consumed in monospace terminals...)


I recently experienced this firsthand searching some manpages. It led to extreme confusion.


This comment from the package maintainer caught my eye. It is in response to the upstream author’s proposal to turn the hyphens problem — caused by man page authors but felt by anyone who uses man — into everyone’s problem. The suggestion was that doing so is the right way to persuade man page authors to fix their “bugs”.

“Externalizing costs onto relatively powerless consumers in the hope of using them to relay complaints to the people in power offends my sense of natural justice.”

It reminds me of the “Knee Defender” — plastic blocks designed to help airline passengers disable the recline feature not on their own seat, but on the seat in front of them. Part of the marketing copy from the manufacturer was that if your fellow passengers complain about you using these blocks then you should encourage them to complain to the airline with you, to appeal together for more space for everyone on board.

What a selfish and asinine suggestion that was. The upstream groff maintainer’s position is not dissimilar.


> What a selfish and asinine suggestion that was

Why? Isn't it essentially just a stronger way of turning to the person and asking them not to recline their chair? Are you saying the person in front is entitled to, since the chair has that function?


Fixing the man pages to have the right dashes seems like something an LLM could do


The existence of multiple dashes, hyphens, etc. in English typography should be considered a bug, not a feature. Modern typesetting should standardize on a single glyph instead of perpetuating this nonsense, and it should be hyphen-minus because that's the one that's on keyboards.

The same goes for curly quotes.


Disagree. The bug is that typewriters didn't include more characters, and that we've replicated that behaviour with computers even though we can now do better.

The different dashes have different uses and meanings, so it is appropriate that they have different glyphs. It is not nonsense.

Typography will continue to evolve over time. People seem to want more richness and complexity, not less. We've already moved beyond fixed-width fonts, even though that was all typewriters and early computers could do.


> Disagree. The bug is that typewriters didn't include more characters, and that we've replicated that behaviour with computers even though we can now do better.

Typewriters didn't need multiple dash characters, because you can extend a hyphen into an en-dash by typing hyphen, backspace, half-space, hyphen. You can make an em-dash with another backspace, half-space, hyphen. Of course, computers lost half spaces and overstriking, for the most part.

Maaaaaaybe you need a separate minus from hyphen, but that could be something you swap in, if needed (my typewriter has two keys that you can swap out, if you sent away for the keys anyway; not sure if Smith-Corona still stocks those)


> computers lost half spaces and overstriking

And gained option and option+shift, which is even better (and many programs will auto-convert multiple hyphens to an en- or em-dash anyway).

How common were half spaces and overstriking anyway? I don't recall seeing them much, but it's been years since I've seen a typewritten letter.


Half spaces were pretty useful for centering text, or squeezing in corrections. And overstriking was how you underlined things. Filling forms with a typewriter is kind of overstriking, and was pretty common (typewriters were great for filling out forms with carbon or carbonless duplicates).


It depends on the users and the typewriter but it could be quite frequent: I remember an old typewriter which didn't have é so you had to use e and ' to make it, same for ê, etc.


Probably only used by the same sort of people that now use en- and em-dashes!


Some early typewriters mapped 0 and o and 1 and i to the same keys even.


Using 5 technically different extremely similar glyphs for 4 extremely niche corner cases of typography is nonsense.

Text is not written only to be read visually by a human on printed paper. Arguably, text today is almost never used that way. What actually matters to humans is that text can be copied and pasted and rendered unambiguously. The distinction between minus-dash, dash, en dash, em dash and minus is pointless. It is literally impossible to confuse one for the other in actual usage. Same for all the 14 technically different quotes in the English language. Literally pointless to anyone but typography geeks.

These typographic rules are vestigial. We have inherited them from a time when it made no sense to limit the number of different characters used in script and it made no sense either to make it easy for a reader to figure out which typesetting blocks the typesetter put on the machine. Both of these things are now very important.


> Using 5 technically different extremely similar glyphs for 4 extremely niche corner cases of typography is nonsense.

They're not extremely niche — they are common and mostly obvious.

> What actually matters to humans is that text can be copied and pasted and rendered unambiguously

Yes. What is ambiguous is the hyphen-minus. It literally combines two meanings in its name.

> These typographic rules are vestigial.

They are still widely used, and still function as intended, so they are not vestigial.

> We have inherited them from a time when it made no sense to limit the number of different characters used in script

I would argue the opposite: they declined in use only because of a (temporary) limit to the number of characters available on some technologies. Thanks to Unicode the limit is removed and their use is increasing again.


I would also point out that in addition to the different dashes, there are (and with good reason) different hyphens. The hyphen-minus is too overloaded.


> minus-dash, dash, en dash, em dash and minus is pointless

Hyphens and em-dashes have literally opposite functions. Hyphens join words together, as in compound adjectives. Em-dashes separate clauses in a sentence. (And en-dashes are used for ranges.)


Variable width fonts are older than fixed-width fonts. Typesetting long predates typewriters and computers.


Of course. The argument here though is should we return to richness of the past, or stick with the limitations of typewriters and early computer systems?

Most people have already decided they prefer variable-width fonts (except when writing code) and multiple typefaces to setting everything in 12pt Courier. Using different dashes instead of just the keyboard hyphen-minus is the same issue.


I too miss the time when computers were meant to be symbolic, 7-bit codes were more than enough to explain ideas, and the small minority of people who actually needed to do typography had dedicated software. As far as I know it worked pretty well, in that nobody had to wonder what kind of hyphen or what kind of space a simple keystroke would translate into...


I miss the time when only writers had to type, and everyone else had secretaries to do it for them. /s

Now we do our own typing, and increasingly, our own typesetting.


I don't think I made my point clear.

Less than 5% of what I'm doing is related to typography (namely: only when displaying text for the end user). 95% of the time when I manipulate text, the text is intended for a computer to process further. I don't want typographic issues such as the proper length of spaces when delimiting keywords on the command line, or proper collation order to slow down my `git grep`, any more than I want to be bothered by variable width fonts when I'm writing code.

UTF-8 to compose user visible strings in GUI is a godsend, though.


Aesthetics argues against that, as well as tradition. (and you failed to note the use of primes and double primes)

It's also a meaningful differentiation to be able to communicate a sentence such as:

“’struth,” he said, “she’s 5′2″ with eyes of blue.” or, “Usually, 35–40 m.p.h. is a workable and safe speed for travel—unless pedestrians are involved, in which case adjust by −20 m.p.h.”

which are far less intelligible when set with unidirectional/same width characters.

It's also useful to have the uni-directional versions ' and " for special usages in computer science contexts, while the curly forms can be used within say a .csv without the need to be escaped.


To be honest this seems alright (or rather, no more awkward than the original):

"'struth," he said, "she's 5'2" with eyes of blue." or, "Usually, 35-40 m.p.h. is a workable and safe speed for travel - unless pedestrians are involved, in which case adjust by -20 m.p.h."

The only one that's kind of meh is the em dash in "travel—unless", which I had to replace with " - " here. But that's kind of a display issue: there's nothing stopping a font from rendering "<space>-<space>" as "—", just as there's nothing stopping a font from rendering "st" as "st". That these ligatures are in Unicode is a historical mistake, and I kind of see these dashes and quotation marks as roughly similar: they should be done mostly on the rendering end, whenever possible.

Note I prefer using en dashes – like this – in a lot of my writing, which looks better to my eyes than regular dashes - like this - but having to use separate dashes for an aesthetic issue like this is kind of silly, IMHO.

What people did in hand-writing varied wildly too. I betcha that in the handwriting for lots of people the "angled" quotes look pretty straight, and things like prime marks and quotes look pretty much identical, and distinguishing between "minus sign" and "dash" is even harder.


> "she's 5'2" with eyes of blue."

you don't easily know where the quote ends in the more complicated cases (e.g. quotes at the end of a paragraph) if you use the same marks, especially since in typography quotes aren't repeated (though maybe they should)


Yeah, it was late, and I failed to make the latter portion of that an internal/quoted quote which would have made my point more clear.


To offer a different perspective, the parent's version looks better (i.e. readable) and more pleasing to me.


There should have been a base simple version, and variations of it to keep the keyboard functional. The variations would need to be specifically called out to be used over the simple case. The way it looks now is like a bucket with so many dupes it becomes a mess.


I really appreciate the use of proper em-dashes aesthetically— I use them for parentheticals, interruptions, and to join related thoughts.

iOS and MacOS have native means of inputting the character and on Windows I have a small utility that replaces "--=" with an en-dash and "==-" with an en-dash.


On Windows, I use http://wincompose.info/ for all my special-character needs (and use the system compose key on Linux).


You can just use Windows Key + . in newer Windows

It's sold as the emoji keyboard but there's a tab for symbols that includes all the dashes. This interface also has built in clipboard history too.


Same. Once you learn the appropriate uses for each and how to input them, it becomes easy to just use the correct dash.


> Just use the correct dash

Except that spaced vs unspaced and en vs em dashes are issues of British vs US English. Differences in usage as well as appearance.


Sure. Use whatever is standard for your variety of English. Either is better than just using hyphen-minus everywhere.


While the other typography looks great, in Safari on iOS your “minus twenty” doesn’t actually look better than plain old -20, to me.


While I agree with most of what you've said, I'm not sure I see a use-case for the distinction between Hyphen, Minus and Hyphen-Minus. Specifically, with having all three. If I were designing Unicode, I'd remove the Minus sign specifically. That way, if you absolutely need to e.g. use an old font that used long minuses (in old texts they could even be an em-dash), it would still be _encoded_ as the parseable character expected by all software to date.


I would remove Hyphen-Minus, whose only reason for existence was that the mechanical typewriters saved a key by using the same character for both purposes.

Distinct hyphens and minuses are necessary both due to their different meanings and due to their different traditional appearance in any typeface that does not have fixed width.

At most the en dashes could be unified with minuses, because they normally have identical appearance, even if they have different meanings.


The minus symbol aligns with the + symbol and numbers, en- and em-dashes align with text (and there should be different forms for use with lowercase, and all caps settings).


> which are far less intelligible

they seem fine to me

maybe you're just used to it, and i'm used to the other it, and mine requires fewer distinct glyphs


Personally I find the "adjust by −20 m.p.h." a little odd looking.


There’s a reason that killer apps in the 80s were rich text editors: aesthetics and layout matter.

Having said that, there’s definitely environments (shells and the like) where you want different behavior. Though even there it’s tricky: emoji being Unicode and a lack of random rules means that emoji “just work” everywhere. Other normalization rules might wreak havoc in certain languages people don’t know about.

ASCII only for talking to the computer is a bit of an outdated concept at this point


> ASCII only for talking to the computer is a bit of an outdated concept at this point

It's not. The mapping between ASCII and keyboard keys is clear. Wherever unambiguous English text is required (for example programming and documentation), ASCII is still the best.

Each person may want to use renderers that will do fancy things, and that's fine because it's on the render side. The issue with Unicode is that it's a messy mix of presentation and data. ASCII isn't. So whenever you create a Unicode document, you are unintentionally embedding presentation information into it, and it's not the case with ASCII.


If you find a bug tracker for English, we can think about reporting this, but I’m sure they have a lot of more important fixes to make.

Anyway, enough people seem to like all the extra dashes, and they don’t cause much harm, so I think it will be an uphill battle.

And who knows, maybe image recognition will free us from keyboards yet, and we can return to more sensible input paradigms, like pencils.


But they're used for different things. We could probably get it down to two though (not using their real names, but what I suggest they could be called for simplicity):

- The hyphen. Use it for minus and concatenation, and anywhere you need something that looks like a dash ;)

— The dash. Used to break up a sentence — mid flow no less! — and that's all. In some ways the opposite of the hyphen

And that's it, you can throw the rest away. Nobody uses them outside of fussy books, and even then the people reading them wouldn't notice their absence.


> Use it for minus and concatenation

Hyphen and minus usually have a different vertical position, because hyphens should be positioned relative to the x height, whereas minus should be consistent with plus and positioned relative to the digit height.

In addition, minus signs are substantially wider than hyphens (again, like the plus sign). Minus width is often similar to en-dash width.

You want minus signs to be like the plus sign without its vertical bar, but you don’t want the plus sign and hyphen to have the same width, or same vertical position. That would just make at least one of them look awkward in context.


I understand there are subtle differences, but for me at least they aren't noticeable enough. i.e. that they matter less than the benefit that would be reaped from having just these two symbols.

Sounds like you disagree, and the differences would really stick out for you?


> Nobody uses them outside of fussy books

En-dashes are used all the time (correctly) to indicate number ranges. The hyphen is too small, and just looks wrong for this use-case. e.g. 0–100 vs 0-100.


but hyphen-minus can't be used as a minus since its width and height don't match numbers/+

Also any modern typesetting can autoreplace what's on the keyboard, so it shouldn't stick to the bastardized version just because it's on the keyboard


> but hyphen-minus can't be used as a minus

That depends on fonts. In some it is represented more like minus (matching plus and center of numbers) and not like hyphen.


Which fonts have identical glyphs for these two symbols?


See the comment above "hyphen-minus [is] found on US layout keyboards", but not on all keyboards.


The Académie anglaise awaits your formal bug report with interest.



