This, like many things, is a case where many different problems get complected, because no one is able to step back and tweak every level of the stack to cleanly separate out the relevant models/abstractions.
Ideally, a computer keyboard would be able to directly send both an arbitrary number of named control functions, and arbitrary unicode text (either as full strings or as code units one by one). Instead though, keyboards (in every existing keyboard protocol) send a very limited number of scan codes, and what to do with those is left entirely up to the operating system. Thus the operating system can’t just get a symbol from a foreign-language keyboard and know what to do with it, but needs to be put into a special mode depending on what keyboard is plugged in. If multiple keyboards are plugged in with different language layouts, too bad: at least one of them will not behave as expected.
Then at every level of the software stack, from the low-level device drivers, up through operating system services, to end user applications (e.g. browsers or terminals) and then on to custom behavior running on those applications/platforms (like a webapp or whatever), everyone gets to take a whack at the meaning of the keyboard code. At each level, there’s logic which intercepts the keyboard signal coming in, digests it, and then excretes something different to the next layer.
As a result, it’s almost impossible for application authors (much less web app / terminal app authors) to know precisely what the user intended by their keystrokes. And it’s almost impossible for users to fully customize the keyboard behavior, because at several of the levels user access is impossible or difficult (especially in proprietary operating systems, or in locked-down keyboard firmware e.g.), and even where users do have access, it’s very easy to make a minor change that totally screws something up at another level, because none of the relevant abstractions are clean.
Furthermore, custom user changes at any of these levels are almost never portable across applications, operating systems, or hardware devices. Every change is tied to the specific hacks developed in a particular little habitat.
Overall, a very disempowering and wasteful part of the computing stack.
Why do you want to put smart features into a keyboard if your PC is able to do that with around 0.0001% of CPU capacity? Smart keyboards would need special hardware and firmware updates, and this would make them more expensive.
Nobody is forced to use cryptic key combinations like the Emacs defaults. I have used Emacs for about 30 years, and I quickly figured out how to define my own keyboard mapping, even for function keys and other special keys. Emacs has always been more convenient to me than any other editor. The same goes for terminals. Keyboard macros or shell scripts are your friend if your desktop supports them.
If you know how to handle xmodmap then you can redefine even your whole keyboard which also affects every terminal. For instance I remapped the "/" key with xmodmap so that I don't have to use the shift key anymore, in any terminal. However I don't know if xmodmap is able to handle multiple key strokes. If not then this would be a nice to have feature for a coming release.
In my opinion terminals are still one of the most productive features of a computer. Usually we don't use multiple different terminals at the same time. Usually we have one favorite terminal, and that's why custom keyboard macros and mappings are (or should be) sufficient.
I think you actually don't want a smart keyboard. You want an intermediate layer that translates your special key mapping into control sequences of arbitrary terminals, don't you?
> I think you actually don't want a smart keyboard. You want an intermediate layer that translates your special key mapping into control sequences of arbitrary terminals, don't you?
I'm not sure about him. But for my part, I would love not having to fidget with X configuration, or Windows keyboard layout dialogues to track down which is the right layout whenever I plug a USB French AZERTY keyboard into my laptop with a Canadian multilingual QWERTY builtin keyboard.
And it would be best if I could use both keyboards at the same time, because unless I mess around manually with input device selectors every time I plug in a new keyboard, my hardware has no way to know its layout.
And that's when I plug a French Mac AZERTY keyboard, and notice that some keys aren't at the same place and I once again need to find the appropriate layout.
Really, sending typed characters as UTF8 strings would be far preferable, and it wouldn't require processing power on the part of the keyboard (the keyboard would just send the strings, not parse them).
> Really, sending typed characters as UTF8 strings would be far preferable, and it wouldn't require processing power on the part of the keyboard (the keyboard would just send the strings, not parse them).
This is not a keyboard problem but simply a driver problem. Usually USB devices transmit a USB ID to the PC so that the PC can handle them appropriately. So if we simply had a customizable layer between USB keyboards and the application layer then your problem would be solved. If the layer knows the USB ID of your keyboard then it could choose the correct driver automatically so that you wouldn't have to worry about your French keyboard.
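For what it's worth, the layer described here could be as small as a lookup table keyed by USB ID, with a per-user override on top. A minimal sketch, assuming made-up vendor/product IDs and layout names (none of these correspond to real devices or to any existing OS facility):

```python
# Hypothetical table: the (vendor_id, product_id) pairs and layout names
# below are invented for illustration, not real device IDs.
KNOWN_LAYOUTS = {
    (0x1234, 0x0001): "fr-azerty",
    (0x1234, 0x0002): "fr-azerty-mac",
    (0x5678, 0x0010): "ca-multilingual",
}


def pick_layout(vendor_id, product_id, user_overrides=None, default="us-qwerty"):
    """Return a layout name for a keyboard, preferring the user's own choice."""
    key = (vendor_id, product_id)
    if user_overrides and key in user_overrides:
        return user_overrides[key]
    return KNOWN_LAYOUTS.get(key, default)


print(pick_layout(0x1234, 0x0001))
print(pick_layout(0x1234, 0x0001, user_overrides={(0x1234, 0x0001): "dvorak"}))
```

The override argument matters because the ID only identifies what is printed on the keycaps, not what the person typing actually wants.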
But why need such a layer when the keyboard could easily handle it itself? Each USB keyboard would need its own special driver to be installed on the system? Or would you have instead a large list of USB IDs and associated layouts? With some separate mechanism to allow custom layouts, or something.
What's the problem with the keyboard knowing its own keys and just telling the symbol of the key that is pressed? Sure, it would still need some sort of escape sequence, or a way to mark whether a key's string is literal like Ù or symbolic like AltGr, but that doesn't sound too difficult.
It would also make it less difficult to add a new symbol like € to keyboards.
> Ideally, a computer keyboard would be able to directly send both an arbitrary number of named control functions, and arbitrary unicode text (either as full strings or as code units one by one)
You're right about how the input stack is tangled up, but wrong about the ideal state. The keyboard is absolutely not the right place for this sort of intelligence.
No keyboard has 100k+ keys, so Unicode input is fundamentally a UI problem. Look at the enormous number of Chinese input methods, all of which need to cooperate closely with the GUI to work. Or heck, how do you type é? On OS X, you can either press option-e followed by e, or press and hold e, and select from a popup menu. These both require integration with the OS and GUI, and cannot be handled by the keyboard itself.
Even setting aside Unicode input, we still often need to know which keys were pressed. I'm programming a FPS game - what happens when the user presses the 2 key? If it's on the number row, it should select weapon #2; but if it's on the numpad, it should move the character backwards. So it's not enough to know the key's character; I need to know which physical key was pressed!
The layered approach you describe is confusing and error-prone, but it's necessary, because software needs to act at different levels. Some software wants very high-level Unicode text input, while others need to know very fine-grained keyboard layout details. All of the data must be bubbled up through all layers.
I said this was my ideal, not that it was a practical way forward for our current software stack.
In my ideal world, your app registers that it needs a “pick weapon #2” button, and a “walk backward” button. Then I can configure my keyboard firmware and/or the low levels of my operating system keyboard handling code to map whatever button I want to those semantic actions.
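Roughly what I have in mind, as a sketch only: the application declares named actions, and the binding from physical keys to those actions lives somewhere the user controls. The class, the action names and the key names below are all hypothetical.

```python
class ActionRegistry:
    """Maps user-chosen keys to actions that the application has declared."""

    def __init__(self):
        self._handlers = {}  # action name -> callable supplied by the app
        self._bindings = {}  # physical key name -> action name, set by the user

    def register(self, action, handler):
        """Called by the application: 'I need a button for this action'."""
        self._handlers[action] = handler

    def bind(self, key, action):
        """Called by the user's configuration, not by the application."""
        if action not in self._handlers:
            raise KeyError(f"unknown action: {action}")
        self._bindings[key] = action

    def press(self, key):
        """Run the action bound to a key, or return None if it is unbound."""
        action = self._bindings.get(key)
        if action is None:
            return None
        return self._handlers[action]()


registry = ActionRegistry()
registry.register("pick-weapon-2", lambda: "weapon 2 selected")
registry.register("walk-backward", lambda: "walking backward")
registry.bind("Digit2", "pick-weapon-2")
registry.bind("Numpad2", "walk-backward")
print(registry.press("Numpad2"))
```

The application never looks at "2" at all; it only ever sees the action name.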
The problem with the scheme where the application is programmed to directly look for the “2” key and then pick which semantic meaning to assign based on context, is that there is at that point no way to intercept and disambiguate “put a 2 character in the text box” from “pick weapon #2”.
This isn’t the biggest problem for games, but it’s a huge pain in the ass when, for example, all kinds of desktop applications intercept my standard text-editing shortcuts (either system defaults or ones I have defined myself) and clobber them with their own new commands (for example many applications overwrite command+arrows or option+arrows or similar to mean “switch tab”, but in a text box context I am instead trying to say “move to the beginning of the line” or “move left by one word”), often leaving me no way to separate the semantic intentions into separate keystrokes.
The problem is that there is no level at which I can direct a particular button on my keyboard to always mean “move left by one word”... the way things are set up now, I have literally no way to firmly bind a key to that precise semantic meaning, but instead I need to bind the key to some ambiguous keystroke which only sometimes has that meaning, but sometimes might mean something else instead.
I can touch-type on a QWERTY keyboard. For the Latin alphabet, I want a QWERTY layout even when I'm typing French or German; even when the keyboard is physically labelled AZERTY or QWERTZ. I type Korean on a 2-Set Hangul layout, but have never used a keyboard that was labelled with that layout. (Aside: typing in this layout does not generate a Unicode codepoint per keypress, but combines two or three letters into a single codepoint).
If the keyboard was responsible for deciding all this, I'd need three keyboards and I'd need to carry them around with me whenever I wanted to use somebody else's computer.
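To make the aside about Hangul concrete: Unicode lays out the precomposed syllables arithmetically, so an input method can combine two or three typed letters into one codepoint. A minimal sketch of just that arithmetic (a real 2-Set input method also has to decide where one syllable ends and the next begins, which this ignores; the function name is mine):

```python
# Letters in the order Unicode uses for composing Hangul syllables.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28, "" = no final


def compose(lead, vowel, tail=""):
    """Combine two or three typed letters into one precomposed syllable."""
    index = (LEADS.index(lead) * 21 + VOWELS.index(vowel)) * 28 + TAILS.index(tail)
    return chr(0xAC00 + index)


# Three keypresses, one codepoint.
print(compose("ㅎ", "ㅏ", "ㄴ"), compose("ㄱ", "ㅡ", "ㄹ"))
```

None of this can sensibly live in the keyboard, because the syllable is only known after the following keypress.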
> In my ideal world, your app registers that it needs a “pick weapon #2” button, and a “walk backward” button. Then I can configure my keyboard firmware and/or the low levels of my operating system keyboard handling code to map whatever button I want to those semantic actions.
Then the OS / Firmware is dealing with questions like, "What button fires the secondary dorsal thrusters". Does it make sense to handle that kind of question so far from the site where the semantic knowledge is present? No.
Likewise, a web server doesn't provide application-level semantic information in its replies, only protocol-level semantic information. One application might think 301 means "update the bookmark" and 502 means "try again later", but another application might think 502 means "try another server" or that 404 might mean "delete a local file" or "display an error message to the user".
Likewise, a game is prepared to deal with buttons, not semantics. The number and layout of buttons is closely tied with design decisions. A FPS gives you WASD, plus QERF for common actions, ZXC for less common actions, 1234 for menus / weapon selections. The design of the game from top to bottom is affected by this. You swap in a controller for a keyboard, and you'll decide to change how weapon selection is presented: maybe spokes around a center so you can use a joystick instead of items in a row corresponding to numeric buttons. You add auto-aim to compensate for the inaccuracy inherent in gamepad joysticks, but the vehicle sections become easier. You might even redesign minigames (Mass Effect has a completely different hacking minigame for console and PC versions).
Or look at web browsers. As soon as you hook a touch interface to the web browser you might want to pop up an on-screen keyboard in response to touch events, but you need to move the viewport so that you can see what you're typing.
Input is inherently messy, and you can't pull semantics out of applications because you'll just make the user experience worse.
ON THE OTHER HAND...
> The problem is that there is no level at which I can direct a particular button on my keyboard to always mean “move left by one word”...
This is available through use of common UI toolkits. I believe on OS X, you can bind a button to mean "move left by one word" in all applications which use the Cocoa toolkit (which is the vast majority of all applications on OS X). The way this works is that there are some global preferences which specify a key binding for "moveWordLeft:". The key event, when it is not handled by other handlers, then gets translated to a "moveWordLeft:" method call by NSResponder. The method for configuring these key bindings is relatively obscure, but suffice it to say that you can press option+left arrow in almost any application to move left one word, and you can configure the key binding (i.e., choose a different key) across applications on a per-user basis.
"Likewise, a game is prepared to deal with buttons, not semantics. The number and layout of buttons is closely tied with design decisions. A FPS gives you WASD, plus QERF for common actions, ZXC for less common actions, 1234 for menus / weapon selections."
Except on my keyboard layout an FPS should be giving me QSDZ, for the same pattern of movement keys, because I'm French and use AZERTY. Or AOE, because I use Dvorak; ARSW because Colemak...
Not really, but you take my point...
And of course, most games let you redefine your keys anyway, probably largely for this reason. I'm not sure how much this undermines your other points.
I was simplifying; I don't use QWERTY either. Most operating systems provide two ways to identify key presses, let's call them "key codes" and "char codes". So the char codes on a French layout are QDSZ instead of WASD but the key codes are the same, and the keys are in the same physical location so it doesn't matter. The only difficult part is figuring out how to present key codes back to the user.
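A rough sketch of the distinction, with made-up names: the "key code" names a physical position, the "char code" is what the active layout prints there. Game logic keys off the first; only the label shown to the user needs the second. The position names and the two tiny layout tables are illustrative assumptions, not any particular OS API.

```python
# Position-based key codes, named after what a US keyboard prints there.
LAYOUTS = {
    "us-qwerty": {"KeyW": "w", "KeyA": "a", "KeyS": "s", "KeyD": "d"},
    "fr-azerty": {"KeyW": "z", "KeyA": "q", "KeyS": "s", "KeyD": "d"},
}

# The game binds movement to positions, so it is the same on every layout.
MOVEMENT = {"KeyW": "forward", "KeyA": "left", "KeyS": "back", "KeyD": "right"}


def movement_for(key_code):
    """What the game does when the key at this position is pressed."""
    return MOVEMENT.get(key_code)


def label_for(key_code, layout):
    """What to show in the 'controls' screen for this position."""
    return LAYOUTS[layout][key_code].upper()


print(movement_for("KeyW"), label_for("KeyW", "fr-azerty"))
```

The label lookup is the hard part in practice, since the application has to find out which layout is active.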
What if you want to switch keyboard layout dynamically?
I normally use a Dvorak layout, but I toggle between that and a UK layout when pairing.
I'd hate to have to bring along my keyboard every time I visit a co-worker's PC, just so I could type in Dvorak on their machine too.
Your way seems like it would also require me to buy an expensive specific Dvorak keyboard just to be able to type in Dvorak. Whereas I get by with a moderately expensive keyboard - with custom unmarked keycaps.
That's a bleak outlook to have. Like the author said, you can always just go for graphical toolkits and ignore terminals. Whereas, embracing unix and terminals, I personally find it an empowering and frugal part of the computing stack: predictable, works everywhere, light and simple. Limiting factors can be a positive thing.
I think you are misunderstanding me. I am not calling terminals disempowering.
I am criticizing the keyboard handling (and general input device handling) at all levels of the computing stack from device firmware and drivers up through web or curses applications. As a user, it is effectively impossible to get the keyboard to behave the way I want (or even in a way that I can anticipate with some kind of mental model) in every context.
As a concrete example: the spacebar key has its behavior overloaded all over the place. Some people make their keyboard firmware turn holding down the spacebar into a modifier key. At the operating system level, modifier + spacebar often has a special meaning: for instance in OS X command + spacebar indicates "switch to the next available keyboard layout", but then later that shortcut was also adopted by the Spotlight search feature (recent versions of OS X change the keyboard layout shortcut to command-shift-space but for a few versions the shortcuts collided and the results were unpredictable). System-wide utility software (for instance Quicksilver) uses or at one point used command-space as a default shortcut to pop up its UI regardless of the current application. Then various applications want to interpret the spacebar in several ways, for example for activating a selected button or other UI widget, or for scrolling, or for entering a physical space character. Photoshop famously uses a held-down spacebar to switch the active tool to the grabber hand, with command-space switching to the zoom in tool and option-space switching to the zoom out tool; other design/image software followed Photoshop’s lead, but sometimes implements things a bit inconsistently, for example in Illustrator command-option-space is required for zoom out instead. In the browser, things really get gnarly, because the browser UI itself uses the spacebar for all three of scrolling (space = page down, shift-space = page up), activating UI widgets, and entering space characters in text fields, depending on the current context, and then certain in-browser applications/widgets try to reinterpret the spacebar, for example a video widget will interpret the spacebar as play/pause, or a game will interpret the spacebar as shoot or jump.
All of these layers are clobbering each other, so that existing software is already incompatible with defaults set at other levels of the stack. But as a user, if I try to make any changes to how one part works, I’m almost guaranteed to break something at another level.
Perhaps worse, the application keyboard context often changes without making the change obvious to the user, so that moment by moment I often can’t predict precisely what will happen if I press the spacebar.
(And this is not limited to the spacebar: enter, tab, arrow keys, and most other keyboard commands change their meaning based on the context in inconsistent and confusing ways. Once you get to the terminal, as explored in the original linked post, you get all kinds of other inconsistencies with certain keystrokes which only work in some contexts but not others, and various duplicate keystrokes which cannot be separately assigned, and so on. But these problems are not unique to terminals.)
Terminals aren't weird. It's just that they work by sending and receiving strictly characters. If you look into ASCII or Unicode, there is no Ctrl+Shift+I character, so such a thing cannot be transmitted.
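To illustrate: the traditional convention is that Ctrl plus a key sends that key's ASCII code with the top bits masked off, which leaves no room for Shift. A small sketch (the helper name is mine):

```python
def ctrl(char):
    """Return the ASCII control character traditionally sent for Ctrl+char."""
    return chr(ord(char) & 0x1F)


# Ctrl+I and Ctrl+Shift+I both collapse to the same byte, which is also Tab.
print(repr(ctrl("i")), repr(ctrl("I")), repr("\t"))
# Ctrl+[ is the ESC character itself.
print(repr(ctrl("[")))
```

So an application reading the character stream cannot tell Tab, Ctrl+I and Ctrl+Shift+I apart.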
That's not quite the end of the story because terminals can certainly generate special escape sequences for certain keys, depending on the terminal type. For example, although there is also no ASCII character corresponding to the arrow keys, VT-100 type terminals can transmit the arrow keys somehow, so you can use them in text editors and shells. This is because they send a special sequence instead of a single character. For instance, left arrow is actually the three characters ESC[D. You can easily see this at the Bash prompt in your xterm or Gnome terminal or whatever VT100-type console you're using. Type Ctrl-V, and then hit your left arrow key. You will see ^[[D.
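On the receiving side, an application has to turn those multi-character sequences back into keys. A minimal sketch that only knows the four arrow keys in their common ESC [ form; a real program would consult terminfo instead of hard-coding a table, and the function name is just for illustration:

```python
# Common VT100-style sequences for the arrow keys (normal cursor mode).
KEY_SEQUENCES = {
    b"\x1b[A": "up",
    b"\x1b[B": "down",
    b"\x1b[C": "right",
    b"\x1b[D": "left",
}


def decode_keys(data):
    """Split raw terminal input into key names and plain characters."""
    keys = []
    i = 0
    while i < len(data):
        for seq, name in KEY_SEQUENCES.items():
            if data.startswith(seq, i):
                keys.append(name)
                i += len(seq)
                break
        else:
            # Not a known sequence: pass the byte through as a character.
            keys.append(chr(data[i]))
            i += 1
    return keys


print(decode_keys(b"a\x1b[Db"))
```

Note that a lone ESC byte comes through as an ordinary character here, which is exactly the ambiguity real terminal applications have to deal with.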
In principle, your terminal could also turn Ctrl-Shift-I into some special escape sequence which an application could parse. Such a sequence just doesn't exist in the terminal protocol you are using, that is all.
Moreover, your terminal emulator application probably steals some of these combinations for itself. Shift+PgUp is commonly used for scroll back these days, and so won't be sent into the terminal session even if there exists a code for it.
The function keys have VT100 escape sequences, but some function keys are already taken by the terminal emulator. In Gnome Terminal, F1 brings up help, but F2 sends the escape sequence ESC O Q (displayed as ^[OQ). If we go into Gnome Terminal "Edit/Keyboard Shortcuts" and remap help to some other key, we can then use F1 in the terminal: it sends the escape sequence ESC O P (^[OP).
Terminals are weird, but it's not because of the data stream.
The real mind-bending weirdness is in the kernel tty layer, for historical performance reasons. Userspace apps weren't able to keep up with typing in the early days, so the kernel is expected to handle stuff like simple editing (^H) and line buffering. And it has to handle baud rate and uart settings, of course. And it has to trap "special" keys like ^C so that hung applications can be reliably terminated via a signal. And it has to detect dropped lines to free up the terminal for other users.
And it still does all this nonsense even in a world where those use cases are all forgotten.
Terminals are weird.
(But no, there aren't any other good alternatives, so we just deal with it.)
The tty layer handles line editing so that programmers do not have to replicate this functionality.
If you write a simple program that obtains commands using "fgets", you automatically get simple editing, and that functionality goes away when the input is redirected from some other device or file.
The simple editing is consistent from program to program and maintains the user's preferences.
The tty input editing could be done in user space, like in the C library (and in POSIX implementations on non-Unix kernels like Cygwin and whatnot, that seems to be where it is).
The mind-boggling complexity is in sessions, controlling terminals, POSIX job control, foreground and background process groups and such.
Plus the quirks: like Ctrl-D just means "return now from the system call", so it only signals EOF when typed on an empty line due to read returning zero. And on TTY's it's a recoverable condition:
cat /dev/tty /dev/tty
a
b
c
[Ctrl-D] # first "EOF"
d
e
f
[Ctrl-D] # second "EOF"
Well, sure. But in a sane modern architecture that simple editor would be implemented in a userspace app (the terminal emulator would be an obvious candidate) and speak a sane protocol instead of the ioctl madness we have with termios.
But it doesn't and it won't, so we all deal with it. But it remains crazy.
Bill Joy's vi (1978) was implemented in user space; that counts as ancient, just a couple of years after the first Unix release. The ioctl madness just has to be applied twice: to change to raw mode input when starting the editor, and then to restore the original mode when exiting or jumping to the shell. The tcsetattr and tcgetattr functions are very easy to use: get the structure, tweak a few things, set the structure.
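The get/tweak/set pattern looks roughly like this through Python's termios module, which wraps the same two calls. This is a sketch of a reduced, cbreak-like mode rather than full raw mode: it switches off the kernel's line editing and echo but deliberately leaves ISIG alone so Ctrl-C still works. The helper names are mine.

```python
import contextlib
import copy
import termios

LFLAG, CC = 3, 6  # indexes into the list returned by termios.tcgetattr()


def without_line_editing(attrs):
    """Return a copy of tcgetattr() output with line editing and echo off."""
    new = copy.deepcopy(attrs)
    new[LFLAG] &= ~(termios.ICANON | termios.ECHO)
    new[CC][termios.VMIN] = 1   # read() returns as soon as one byte arrives
    new[CC][termios.VTIME] = 0  # and never times out
    return new


@contextlib.contextmanager
def char_at_a_time(fd):
    """Get the structure, tweak it, set it, and restore it on the way out."""
    saved = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSADRAIN, without_line_editing(saved))
    try:
        yield
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

Used as `with char_at_a_time(sys.stdin.fileno()): ...` on a real terminal; on a pipe or file tcgetattr raises termios.error, since there is nothing to configure.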
Another thing is that SIGINT interruption (via Ctrl-C or whatever character is configured) and SIGHUP signals wouldn't be reliable if faked outside of the kernel.
Firstly, the tty driver can read and respond to a Ctrl-C even if the application is single-threaded and spinning in a loop, or blocked on some device other than the TTY. If Ctrl-C were handled in user space, then there would have to be a thread through which the terminal I/O goes. Secondly, that thread would have to be completely reliable, and never block or hang for any reason other than reading from the TTY.
Also, a security feature known as a SAK (secure attention key) needs to be in the kernel. A TTY which implements a SAK cannot be entirely transparent.
> In principle, your terminal could also turn Ctrl-Shift-I into some special escape sequence which an application could parse. Such a sequence just doesn't exist in the terminal protocol you are using, that is all.
Not only why, but also how they work in some more detail. Only after reading this I understood, for example, why C-u works in the terminal even if right/left arrows do not in this particular moment (like after using read builtin or cat without arguments). Very good read.
"If you do still want or need to make a terminal application that is interactive rather than just being a command-line tool, what is the best way to go about it? You should write it inside Emacs, using Emacs Lisp, and run it as an application by invoking Emacs to run the function that is the entry point for your application. This way you can have legacy terminal support to make use of ssh and tmux, by running Emacs in a terminal, and modern graphical support to display fancy graphics and use more keybindings, by running Emacs in a graphical environment."
Though the latter point is interesting, I find the suggestion unexpected. Aren't curses-based applications a possibility for the author? Why would you limit your users to those that use Emacs, unless the application is just for you?
I'm working on a ClojureScript on Node.js (i.e. instant boot) functional UI library[1] and I find it pretty empowering. I also kept the Node.js touchpoints very separate so I can soon provide a JS Canvas implementation so the same UI code runs on terminal and browser for free. I was surprised how easy it was to implement vi/Emacs-like sequence keybinds, with prefixes and all that. I'm really liking the ability to easily whip up terminal user interfaces for various simple and complex applications.
If you're going to do something new, you better stop worrying about backward compatibility. ssh and tmux won't work? big effin deal. Those projects would need to grow up in order to keep up with the times. You can't be conservative like that if the goal is to right the wrongs and learn from past mistakes. The road is going to be bumpy but the light at the end of the tunnel would be worth the travel.
I say if the limited scancode thing is in the keyboard hardware then we need to fix the hardware itself as well. New keyboard standard that sends Ctrl/Shift/Alt as separate scancodes. Hell a completely programmable keyboard firmware sounds even better. Throw in some n-key rollover and now we're talking!
I would like to see some convergence of terminal and windowing-system happening. I initially dreamt of a graphical terminal but then I thought why not go one step further and make it the main interface (so something in the middle of a graphical terminal and a tiling window manager). However I still need to carve out the details.
So when I want to ssh into a remote machine, I don't use your new program, unless you've also written an ssh replacement. If I want virtual screens, I don't use your new program, unless you've also written a tmux (or screen) replacement. Et cetera.
Your new terminal program is useless till you've replaced or updated "all" the programs that depend on traditional terminal behavior. And "all", in this case, is reasonably close to being literal. If you get 80% of what I need, it's not enough. And the last 20% will likely differ from person to person.
And I'll need to replace my keyboard, too?
I totally don't buy that you're not being tongue-in-cheek.
"I would like to see some convergence of terminal and windowing-system happening. I initially dreamt of a graphical terminal but then I thought why not go one step further and make it the main interface (so something in the middle of a graphical terminal and a tiling window manager). However I still need to carve out the details."
While I am skeptical about discarding (some amount of) backwards compatibility, I'm very interested in this. Feel free to shoot me an email.
Interesting that there was no mention anywhere of termcap/terminfo or curses. In making a new terminal standard, you need some way of doing what the old ones did (or refusing to), but if you support some basic set of primitives and provide a terminfo file, a lot of the tooling should fall into place.
If you do still want or need to make a terminal application
that is interactive rather than just being a command-line
tool, what is the best way to go about it? You should write
it inside Emacs, using Emacs Lisp...
>Wasn't there a time when actual, serious apps were written using XULRunner?
Not really. At least nothing to write home about.
But in general what you mean I guess is some kind of runtime environment. Sure there are lots, but not that many for the Terminal and Emacs is not really a lightweight and proper option.
Weird is a matter of perspective. I started programming, on terminals (mostly, for class work - some Apple/Commodore twiddling aside), in the early 80s. Learning about scan codes on PCs in the later 80s seemed complicated to me, because it wasn't what I was used to at first.
If you are used to a remote device that sends and displays characters, then switch to local integrated devices which exchange much lower level signals to be mapped in software, the new stuff seems strange. Whether it's better or worse depends, I guess.
"Whether Alt+char is succesfully picked up by your terminal application is dependent on the quality of your connection."
This! The most annoying behavior in my daily terminal usage. Whenever I encounter this problem, I feel like "gosh, someone must reinvent the entire stack."
But I didn't know that mosh is able to process the sequence correctly. What a good boy. But doesn't this mean the official OpenSSH client is also capable of fixing the problem?
To clarify somewhat: html is a presentation language.
That's not appropriate for either control or data planes by default.
example-
let's define the data plane as an associated list of byte-strings of u8 characters. This covers current common unix pipe usage on the terminal. If we include some lisp syntax for routing & choose to complect some control and data on the command line we then can compose:
But, fundamentally, the control plane needs to be asynchronous, so there needs to be a signal handler that a program can listen on that takes a stream of associated-lists with the rough form of:
the program listening needs to be able to register with its parent process that it gets a key stream.
anyway. we can work out this system in some level of detail with different representations. The key point is that this needs to be just data addressable easily, without significant difficulty parsing. It also needs to be lossless by default and not dependent upon hacks like timeouts as part of its interface to its consumers. (n.b., it should allow adding as many bucky bits as a keyboard developer wants).
now, if the terminal wants to formally specify as its interface that it takes all output streams keyed by "html-presentation" and render them as html, that seems like a very featureful possibility. I would not write that, but I could see others doing so.
It’s not like we’re doing any better with the client-side web development stack (to take one example).
Building simple composable abstractions and systems is just really hard, and takes a lot of practice and refinement. Unfortunately, programmers don’t necessarily get much practice before their designs become the foundation on which everyone else needs to work (for example, there is very little emphasis placed on designing effective software abstractions in academic computer science programs), and there’s often no easy way to go fix the defects afterward.