This, like many things, is a case where many different problems get complected, because no one is able to step back and tweak every level of the stack to cleanly separate out the relevant models/abstractions.
Ideally, a computer keyboard would be able to directly send both an arbitrary number of named control functions, and arbitrary unicode text (either as full strings or as code units one by one). Instead though, keyboards (in every existing keyboard protocol) send a very limited number of scan codes, and what to do with those is left entirely up to the operating system. Thus the operating system can’t just get a symbol from a foreign-language keyboard and know what to do with it, but needs to be put into a special mode depending on what keyboard is plugged in. If multiple keyboards are plugged in with different language layouts, too bad: at least one of them will not behave as expected.
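To make the distinction concrete, here is a rough sketch - purely hypothetical, not any existing protocol - of the only two kinds of events such a keyboard would need to emit:

```python
# Hypothetical event model for an "ideal" keyboard: it sends either a named
# control function or finished Unicode text, never raw scan codes.
from dataclasses import dataclass
from typing import Union

@dataclass
class ControlEvent:
    name: str    # a named control function, e.g. "move-word-left"

@dataclass
class TextEvent:
    text: str    # literal Unicode text, e.g. "é", "안", "€", or a whole string

KeyboardEvent = Union[ControlEvent, TextEvent]

# A keyboard speaking this imaginary protocol might send:
incoming = [TextEvent("héllo"), ControlEvent("move-word-left"), TextEvent("€")]

for event in incoming:
    if isinstance(event, TextEvent):
        print("insert text:", event.text)
    else:
        print("perform control function:", event.name)
```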
Then at every level of the software stack, from the low-level device drivers, up through operating system services, to end user applications (e.g. browsers or terminals) and then on to custom behavior running on those applications/platforms (like a webapp or whatever), everyone gets to take a whack at the meaning of the keyboard code. At each level, there’s logic which intercepts the keyboard signal coming in, digests it, and then excretes something different to the next layer.
As a result, it’s almost impossible for application authors (much less web app / terminal app authors) to know precisely what the user intended by their keystrokes. And it’s almost impossible for users to fully customize keyboard behavior, because at several of the levels user access is impossible or difficult (e.g. in proprietary operating systems or in locked-down keyboard firmware), and even where users do have access, it’s very easy to make a minor change that totally screws something up at another level, because none of the relevant abstractions are clean.
Furthermore, custom user changes at any of these levels are almost never portable across applications, operating systems, or hardware devices. Every change is tied to the specific hacks developed in a particular little habitat.
Overall, a very disempowering and wasteful part of the computing stack.
Why would you want to put smart features into a keyboard when your PC can do that with around 0.0001% of its CPU capacity? Smart keyboards would need special hardware and firmware updates, and that would make them more expensive.
Nobody is forced to use cryptic key combinations like the Emacs defaults. I have used Emacs for about 30 years, and I quickly figured out how to define my own keyboard mappings, even for function keys and other special keys. Emacs has always been more convenient for me than any other editor. The same goes for terminals. Keyboard macros or shell scripts are your friend if your desktop supports them.
If you know how to use xmodmap, you can remap even your whole keyboard, which also affects every terminal. For instance, I remapped the "/" key with xmodmap so that I don't have to use the shift key anymore, in any terminal. However, I don't know whether xmodmap can handle multi-key sequences; if not, that would be a nice feature for a future release.
In my opinion, terminals are still one of the most productive features of a computer. We usually don't use several different terminals at the same time; we have one favorite terminal, which is why custom keyboard macros and mappings are (or should be) sufficient.
I think you actually don't want a smart keyboard. You want an intermediate layer that translates your special key mapping into control sequences of arbitrary terminals, don't you?
> I think you actually don't want a smart keyboard. You want an intermediate layer that translates your special key mapping into control sequences of arbitrary terminals, don't you?
I'm not sure about him. But for my part, I would love not having to fiddle with X configuration, or with Windows keyboard layout dialogues, to track down the right layout whenever I plug a French AZERTY USB keyboard into my laptop with its built-in Canadian multilingual QWERTY keyboard.
And it would be best if I could use both keyboards at the same time, because unless I manually mess around with input device selectors every time I plug in a new keyboard, my machine has no way of knowing its layout.
And then I plug in a French Mac AZERTY keyboard, notice that some keys aren't in the same place, and once again need to find the appropriate layout.
Really, sending typed characters as UTF8 strings would be far preferable, and it wouldn't require processing power on the part of the keyboard (the keyboard would just send the strings, not parse them).
> Really, sending typed characters as UTF8 strings would be far preferable, and it wouldn't require processing power on the part of the keyboard (the keyboard would just send the strings, not parse them).
This is not a keyboard problem but simply a driver problem. USB devices normally report a USB ID to the PC so that the PC can handle them appropriately. So if we simply had a customizable layer between USB keyboards and the application layer, your problem would be solved: if the layer knows the USB ID of your keyboard, it can choose the correct driver automatically, so you wouldn't have to worry about your French keyboard.
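For illustration, such a layer could be little more than a lookup table keyed on the USB vendor/product ID. A rough sketch in Python; the IDs, layout names, and fallback are all invented for the example:

```python
# Hypothetical sketch: pick a keyboard layout from the USB (vendor, product) ID.
# The IDs and layout names below are made up for illustration.
LAYOUT_BY_USB_ID = {
    (0x1234, 0x0001): "fr-azerty",         # the plugged-in French keyboard
    (0x1234, 0x0002): "ca-multilingual",   # the laptop's built-in keyboard
}

def layout_for_device(vendor_id: int, product_id: int) -> str:
    """Choose a layout automatically when a keyboard is plugged in."""
    return LAYOUT_BY_USB_ID.get((vendor_id, product_id), "us-qwerty")

print(layout_for_device(0x1234, 0x0001))  # -> fr-azerty
```

Any keyboard the table doesn't know about would fall back to a default, or prompt the user once and remember the answer.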
But why would we need such a layer when the keyboard could easily handle it itself? Would each USB keyboard need its own special driver installed on the system? Or would you instead have a large list of USB IDs and associated layouts, with some separate mechanism to allow custom layouts, or something?
What's the problem with the keyboard knowing its own keys and simply reporting the symbol of whichever key is pressed? Sure, it would still need some sort of escape sequence, or a way to mark whether a key's string is literal like Ù or symbolic like AltGr, but that doesn't sound too difficult.
It would also make it less difficult to add a new symbol like € to keyboards.
> Ideally, a computer keyboard would be able to directly send both an arbitrary number of named control functions, and arbitrary unicode text (either as full strings or as code units one by one)
You're right about how the input stack is tangled up, but wrong about the ideal state. The keyboard is absolutely not the right place for this sort of intelligence.
No keyboard has 100k+ keys, so Unicode input is fundamentally a UI problem. Look at the enormous number of Chinese input methods, all of which need to cooperate closely with the GUI to work. Or heck, how do you type é? On OS X, you can either press option-e followed by e, or press and hold e, and select from a popup menu. These both require integration with the OS and GUI, and cannot be handled by the keyboard itself.
Even setting aside Unicode input, we still often need to know which keys were pressed. I'm programming an FPS game - what happens when the user presses the 2 key? If it's on the number row, it should select weapon #2; but if it's on the numpad, it should move the character backwards. So it's not enough to know the key's character; I need to know which physical key was pressed!
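(Game libraries expose exactly this. With pygame/SDL, for instance, the two keys arrive as different key constants, so the game can tell them apart; a minimal sketch, with the game actions stubbed out:)

```python
import pygame

def select_weapon(n: int) -> None:   # stand-in for the real game action
    print(f"selected weapon {n}")

def move_backward() -> None:         # stand-in for the real game action
    print("moving backward")

def handle_keydown(event: pygame.event.Event) -> None:
    # pygame (via SDL) reports which key was pressed, not just the character
    # it produces, so the number-row 2 and the keypad 2 are distinguishable.
    if event.key == pygame.K_2:        # "2" on the number row
        select_weapon(2)
    elif event.key == pygame.K_KP2:    # "2" on the numeric keypad
        move_backward()
```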
The layered approach you describe is confusing and error-prone, but it's necessary, because software needs to act at different levels. Some software wants very high-level Unicode text input, while other software needs to know very fine-grained keyboard layout details. All of the data must be bubbled up through all the layers.
I said this was my ideal, not that it was a practical way forward for our current software stack.
In my ideal world, your app registers that it needs a “pick weapon #2” button, and a “walk backward” button. Then I can configure my keyboard firmware and/or the low levels of my operating system keyboard handling code to map whatever button I want to those semantic actions.
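Concretely, I'm imagining something like the sketch below. It is purely hypothetical - no existing OS or firmware exposes this - but it shows the shape of it: the app declares named actions, and a user-controlled layer owns the mapping from keys to actions.

```python
# Purely hypothetical sketch of "apps declare actions, the user binds keys".
# Nothing here corresponds to a real OS or firmware API.

# 1. The application registers the semantic actions it supports.
app_actions = {
    "pick-weapon-2":  lambda: print("weapon 2 selected"),
    "walk-backward":  lambda: print("walking backward"),
    "move-word-left": lambda: print("caret moved left one word"),
}

# 2. The user (not the app) binds physical keys to those actions, in
#    keyboard firmware, in the OS, or wherever they prefer.
user_bindings = {
    "KEY_2":    "pick-weapon-2",
    "KEY_S":    "walk-backward",
    "ALT+LEFT": "move-word-left",
}

def on_key(physical_key: str) -> None:
    action_name = user_bindings.get(physical_key)
    action = app_actions.get(action_name)
    if action:
        action()   # unambiguous: one binding, one semantic meaning

on_key("ALT+LEFT")   # -> caret moved left one word
```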
The problem with the scheme where the application is programmed to directly look for the “2” key and then pick which semantic meaning to assign based on context is that there is then no way to intercept and disambiguate “put a 2 character in the text box” from “pick weapon #2”.
This isn’t the biggest problem for games, but it’s a huge pain in the ass when, for example, all kinds of desktop applications intercept my standard text-editing shortcuts (either system defaults or ones I have defined myself) and clobber them with their own new commands (many applications override command+arrows or option+arrows or similar to mean “switch tab”, but in a text box context I am instead trying to say “move to the beginning of the line” or “move left by one word”), often leaving me no way to separate the semantic intentions into separate keystrokes.
The problem is that there is no level at which I can direct a particular button on my keyboard to always mean “move left by one word”... the way things are set up now, I have literally no way to firmly bind a key to that precise semantic meaning, but instead I need to bind the key to some ambiguous keystroke which only sometimes has that meaning, but sometimes might mean something else instead.
I can touch-type on a QWERTY keyboard. For the Latin alphabet, I want a QWERTY layout even when I'm typing French or German; even when the keyboard is physically labelled AZERTY or QWERTZ. I type Korean on a 2-Set Hangul layout, but have never used a keyboard that was labelled with that layout. (Aside: typing in this layout does not generate a Unicode codepoint per keypress, but combines two or three letters into a single codepoint).
If the keyboard was responsible for deciding all this, I'd need three keyboards and I'd need to carry them around with me whenever I wanted to use somebody else's computer.
> In my ideal world, your app registers that it needs a “pick weapon #2” button, and a “walk backward” button. Then I can configure my keyboard firmware and/or the low levels of my operating system keyboard handling code to map whatever button I want to those semantic actions.
Then the OS / firmware is dealing with questions like, "What button fires the secondary dorsal thrusters?" Does it make sense to handle that kind of question so far from the site where the semantic knowledge is present? No.
Likewise, a web server doesn't provide application-level semantic information in its replies, only protocol-level semantic information. One application might think 301 means "update the bookmark" and 502 means "try again later", but another application might think 502 means "try another server" or that 404 might mean "delete a local file" or "display an error message to the user".
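(For instance, two different HTTP clients could quite reasonably handle the same status codes like this; a sketch using the requests library, with the policies invented for the example:)

```python
import requests

def fetch(url: str) -> None:
    # The status code is protocol-level information; what to *do* about it
    # is application-level policy, and another client could choose differently.
    resp = requests.get(url, allow_redirects=False)
    if resp.status_code == 301:
        print("update the bookmark to", resp.headers.get("Location"))
    elif resp.status_code == 502:
        print("try again later")          # another client: try another server
    elif resp.status_code == 404:
        print("display an error message") # another client: delete a local file
    else:
        print("got", resp.status_code)
```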
Likewise, a game is prepared to deal with buttons, not semantics. The number and layout of buttons is closely tied with design decisions. An FPS gives you WASD, plus QERF for common actions, ZXC for less common actions, 1234 for menus / weapon selections. The design of the game from top to bottom is affected by this. You swap in a controller for a keyboard, and you'll decide to change how weapon selection is presented: maybe spokes around a center so you can use a joystick instead of items in a row corresponding to numeric buttons. You add auto-aim to compensate for the inaccuracy inherent in gamepad joysticks, but the vehicle sections become easier. You might even redesign minigames (Mass Effect has a completely different hacking minigame for console and PC versions).
Or look at web browsers. As soon as you hook a touch interface up to the web browser, you might want to pop up an on-screen keyboard in response to touch events, but then you need to move the viewport so that you can see what you're typing.
Input is inherently messy, and you can't pull semantics out of applications because you'll just make the user experience worse.
ON THE OTHER HAND...
> The problem is that there is no level at which I can direct a particular button on my keyboard to always mean “move left by one word”...
This is available through use of common UI toolkits. I believe on OS X you can bind a button to mean "move left by one word" in all applications which use the Cocoa toolkit (which is the vast majority of all applications on OS X). The way this works is that there are global preferences which specify a key binding for "moveWordLeft:". The key event, when it is not handled by other handlers, then gets translated into a "moveWordLeft:" method call by NSResponder. The method for configuring these key bindings is relatively obscure; suffice it to say that you can press option+left arrow in almost any application to move left one word, and you can configure the key binding (i.e., choose a different key) across applications on a per-user basis.
"Likewise, a game is prepared to deal with buttons, not semantics. The number and layout of buttons is closely tied with design decisions. A FPS gives you WASD, plus QERF for common actions, ZXC for less common actions, 1234 for menus / weapon selections."
Except on my keyboard layout an FPS should be giving me QSDZ, for the same pattern of movement keys, because I'm French and use AZERTY. Or AOE, because I use Dvorak; ARSW because Colemak...
Not really, but you take my point...
And of course, most games let you redefine your keys anyway, probably largely for this reason. I'm not sure how much this undermines your other points.
I was simplifying; I don't use QWERTY either. Most operating systems provide two ways to identify key presses; let's call them "key codes" and "char codes". So the char codes at those positions on a French layout are ZQSD instead of WASD, but the key codes are the same, and the keys are in the same physical location, so it doesn't matter. The only difficult part is figuring out how to present key codes back to the user.
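(As a concrete illustration: SDL-based libraries like pygame expose both on the same key event. A small sketch, assuming pygame 2 for the scancode constants:)

```python
import pygame

# Binding movement by scancode ("key code") gives the WASD *positions*
# on any layout - QWERTY, AZERTY, Dvorak, Colemak alike.
MOVE_BY_SCANCODE = {
    pygame.KSCAN_W: "forward",
    pygame.KSCAN_A: "left",
    pygame.KSCAN_S: "back",
    pygame.KSCAN_D: "right",
}

def describe(event: pygame.event.Event) -> None:
    if event.type == pygame.KEYDOWN:
        print("physical key (scancode):", event.scancode)
        print("layout-dependent keysym:", pygame.key.name(event.key))
        print("character produced (char code):", repr(event.unicode))
        if event.scancode in MOVE_BY_SCANCODE:
            print("move:", MOVE_BY_SCANCODE[event.scancode])
```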
What if you want to switch keyboard layout dynamically?
I normally use a Dvorak layout, but I toggle between that and a UK layout when pairing.
I'd hate to have to bring along my keyboard every time I visit a co-worker's PC, just so I could type in Dvorak on their machine too.
Your way seems like it would also require me to buy an expensive specific Dvorak keyboard just to be able to type in Dvorak. Whereas I get by with a moderately expensive keyboard - with custom unmarked keycaps.
That's a bleak outlook to have. Like the author said, you can always just go for graphical toolkits and ignore terminals. Whereas, embracing unix and terminals, I personally find it an empowering and frugal part of the computing stack: predictable, works everywhere, light and simple. Limiting factors can be a positive thing.
I think you are misunderstanding me. I am not calling terminals disempowering.
I am criticizing the keyboard handling (and general input device handling) at all levels of the computing stack from device firmware and drivers up through web or curses applications. As a user, it is effectively impossible to get the keyboard to behave the way I want (or even in a way that I can anticipate with some kind of mental model) in every context.
As a concrete example: the spacebar has its behavior overloaded all over the place.
Some people make their keyboard firmware turn a held-down spacebar into a modifier key. At the operating system level, modifier + spacebar often has a special meaning: for instance, in OS X command-space means "switch to the next available keyboard layout", but that shortcut was later also adopted by the Spotlight search feature (recent versions of OS X change the keyboard layout shortcut to command-shift-space, but for a few versions the shortcuts collided and the results were unpredictable). System-wide utility software (for instance Quicksilver) uses, or at one point used, command-space as a default shortcut to pop up its UI regardless of the current application.
Then various applications want to interpret the spacebar in several ways: for activating a selected button or other UI widget, for scrolling, or for entering a literal space character. Photoshop famously uses a held-down spacebar to switch the active tool to the grabber hand, with command-space switching to the zoom-in tool and option-space switching to the zoom-out tool; other design/image software followed Photoshop’s lead, but sometimes implements things a bit inconsistently, for example in Illustrator command-option-space is required for zoom out instead.
In the browser, things really get gnarly, because the browser UI itself uses the spacebar for all three of scrolling (space = page down, shift-space = page up), activating UI widgets, and entering space characters in text fields, depending on the current context; and then certain in-browser applications/widgets try to reinterpret the spacebar, for example a video widget will interpret it as play/pause, or a game will interpret it as shoot or jump.
All of these layers are clobbering each other, so that existing software is already incompatible with defaults set at other levels of the stack. But as a user, if I try to make any changes to how one part works, I’m almost guaranteed to break something at another level.
Perhaps worse, the application keyboard context often changes without making the change obvious to the user, so that moment by moment I often can’t predict precisely what will happen if I press the spacebar.
(And this is not limited to the spacebar: enter, tab, arrow keys, and most other keyboard commands change their meaning based on the context in inconsistent and confusing ways. Once you get to the terminal, as explored in the original linked post, you get all kinds of other inconsistencies with certain keystrokes which only work in some contexts but not others, and various duplicate keystrokes which cannot be separately assigned, and so on. But these problems are not unique to terminals.)