Reading the readme, I find myself wondering what problems this solves. The example of wrapping nano strikes me as particularly odd, since editing files is already fairly easy to do programmatically, either via direct file operations or with tools like sed. Aside from editors, most tools that offer a TUI also expose the same functionality for programmatic access (typically via some additional flags to the same binary). So it strikes me as the wrong abstraction for interacting with most things.
The other potential goal I could imagine is automation. If that is the case, I would recommend that the author make it clearer in the examples and also describe it in relation to `expect`, which would probably be my go-to tool for such use cases.
Whatever the case, it does look like a fun project, even if I don't see any cases where it would be the right tool for the job.
Hey, project lead here. I had a very specific use case in mind: I'm playing with using LLM agent frameworks for software engineering - like MemGPT, swe-agent, Langchain and my own hobby project called headlong (https://github.com/andyk/headlong). Headlong is focused on making it easy for a human to edit the thought history of an agent via a webapp. The longer term goal of headlong is collecting large-ish human-curated datasets that intermix actions/observations/inner-thoughts and then using those data to fine-tune models to see if we can improve their reasoning.
While working on headlong I tried out and implemented a variety of ‘tools’ (i.e., functions) like editFile(), findFile(), sendText(), checkTime(), searchWeb(), etc., which the agents call using LLM function calling.
A bunch of these ended up being functions that interacted with an underlying terminal. This is similar to how swe-agent works actually.
But I figured instead of writing a bunch of functions that sit between the LLM and the terminal, maybe let the LLM use a terminal more like a human does, i.e., by “typing” input into it and looking at snapshots of the current state of it. Needed a way to get those stateful text snapshots though.
I first tried using tmux and also looked to see if any existing libs provide the same functionality. Couldn’t find anything so teamed up with Marcin to design and make ht.
Playing with the agent using the terminal directly has evolved into a hypothesis that I've been exploring: the terminal may be the "one tool to rule them all" - i.e., if an agent learns to use a terminal well, it can do most of what humans do on our computers. Or maybe terminal + browser are the "two tools to rule them all"?
Not sure how useful ht will be for other use cases, but maybe!
This makes a lot of sense. I would call that out, because it's really surprising out of context. Hopefully you can see how unusual it would be to drive human interfaces from code when, in the majority of cases, programmatic interfaces for each task already exist and would be much less bug-prone/finicky. I guess the analogy would be choosing to use Webdriver to interact with a service for which there is already an API.
I had this issue of needing to control Docker containers in a VPS, without sharing access to the server itself. It seems like it will be easy to create a simple web service that can communicate with the ht API, list my containers, show me the stats, and restart containers if I want to. I can manage all security in the web service.
You could even do something like a reverse proxy to very limited paths, although I tend to think that would ultimately be a bad idea and making your own HTTP calls is probably better.
You probably don’t want to expose that service to the internet…
I see this as something like the console management for a VPS. Back in the day, I remember reading about how prgmr.com had set up a console that you'd directly SSH into. That's now this interface [1] (and a company name change), but I could see how programmatically working with this would be helpful.
The comment you're replying to mounted the Docker daemon as a local socket, accessible only on the machine. (It exposes an HTTP server still.)
I don't see why one would be any more comfortable exposing a shell to the internet than the Docker daemon. It grants _more_ capabilities. Either should likely be protected by authentication.
My understanding was that there was a server running Docker containers where the admin wanted to allow others to control/start containers without giving them access to the machine through a local login. The idea proposed was to make the docker port accessible to the outside world (authenticated, somehow).
I'm not sure I'd want to expose the Docker port to the outside world (or outside of a strictly firewalled subnet). Even if it is wrapped, this seems too dangerous to me.
The service I talked about is not a shell. It's a command line program that operates as the shell when you login via SSH. Instead of bash/zsh/etc, this program runs instead. The purpose is to give the VM admin access for out-of-band management (serial console, reinstalling the OS, etc). I'm a big fan of this approach, where you don't necessarily get full access to the host, but you do have enough access to do the work that's needed (and still SSH encrypted). No more, no less. To me, this seems like a great approach for something like restricted VPS or container admin.
I ended up doing something similar for an SSH jump box a few jobs back, where you could set up some basic admin things (like uploading SSH keys) using a CLI program that was used as an SSH shell.
To bring it back to the original post -- like others, I had a hard time seeing what the OP could be used for. Until I thought about this OOB CLI. It would be great for scripting access to something like this.
I think I was very unclear. I didn't mount anything, the docker daemon by default is accessible over http (over a unix domain socket rather than the typical TCP).
I was proposing that this person's web app not do any sort of subprocess automation via something like ht, and instead take in requests and talk to the docker daemon on clients' behalf, since that allows any sort of authentication or filtering that needs to happen.
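To illustrate, the "talk to the daemon directly" part is just a couple of HTTP calls over the local socket (these are Docker Engine API endpoints; the container ID is a placeholder):

    # list containers via the local Docker daemon socket
    curl --unix-socket /var/run/docker.sock http://localhost/containers/json
    # restart a specific container (placeholder ID)
    curl --unix-socket /var/run/docker.sock -X POST http://localhost/containers/<id>/restart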
I wasn't really seriously proposing the straight reverse proxy setup. That's one of those layer violations, like PostgREST, that is either genius or lunacy. I haven't figured out which one.
to me this seems essentially like 'screen/tmux without the multiplexing features' which is useful because most of us do 'terminal multiplexing' via our window manager and we're really just using screen because we want to detach the process from the terminal session (e.g. as a glorified nohup wrapper). another similar tool is `nq`.
I wrote an integration testing framework which I wanted to integrate with a tool exactly like this, so it could be used to, e.g., test a command line app like vim.
Expect is what I tried to integrate with first. It falls over quite quickly with any kind of app that does anything mildly complicated with the terminal.
Interesting. When we decided to build ht we didn't compare it to expect (which I hadn't heard of or used) but I'm comparing the two now as they seem related.
How exactly did `expect` fall over?
From what I can tell, expect does not provide the functionality of a stateful terminal server/client under the hood for you so it isn't as easy to grab "text" screenshots of a Terminal User Interface, which is one of the main motivations behind ht (will update the readme to make this main use-case more clear)
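Very roughly, the shape of the interaction we had in mind is something like the sketch below. The "getView" name is real; the other JSON field names here are just illustrative, so check the readme for the actual message format:

    # rough sketch only: drive a program through ht and ask for a text screenshot
    # (JSON field names are illustrative; see the readme for the real ones)
    {
      printf '%s\n' '{"type": "input", "payload": "echo hello\r"}'
      sleep 1
      printf '%s\n' '{"type": "getView"}'
    } | ht bash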
Screenshots were one thing I needed that didn't work, but I think lots of control characters used by command line apps also messed up pexpect.
I built my own probably not very good equivalent of your thing called icommandlib. I'm going to investigate ripping it out and replacing it with your tool.
none of the Linux greeters meet all my needs, so i fall back to `login`. but i still need a graphical program for actually entering in my password -- particularly because some of my devices don't have a physical keyboard (i.e. my phone). so i take the output of a framebuffer-capable on-screen-keyboard [1] and pipe that into `login`. but try actually doing that. try `cat mypassword.txt | login MobiusHorizons`. it doesn't work: `login` does some things on its stdin which only work on vtty. so instead i run login on /dev/tty1, and pipe the password into /dev/tty1 for the auth.
yes, this solution is terrible. a lot of things would make it less terrible. i could fix one of the greeters to work the way i need it (tried that). i could patch `login` (where it probably won't ever be upstreamed). i could integrate the OSK into the same input system the ttys use... or i could reach for `ht`. everything except the last one is a day or more of work.
As others have kinda alluded to, it could be useful for testing TUI applications. I develop a logfile viewer for the terminal (https://lnav.org) and have a similar application[1] for testing, but it's a bit flaky. It produces/checks snapshots like [2]. I think the problems I run into are more around different versions of ncurses producing slightly different outputs.
The immediate thought I had upon reading the description was "this would be great for Minecraft servers".
Most of us running Minecraft servers on Linux have it wrapped in screen or tmux because the CLI is the only way to issue certain commands including stopping it properly.
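For reference, the usual screen workaround looks roughly like this (session name, memory flag, and jar path are just examples):

    # start the server detached in a named screen session
    screen -dmS mc java -Xmx4G -jar server.jar nogui
    # inject a console command into the running session
    screen -S mc -X stuff $'save-all\n'
    # attach when you need the real console
    screen -r mc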
What I typically do is create a systemd service for game servers and attach a TTY. That way it starts with the rest of my web services, and Linux already handles "I/O to processes" via files that other processes can access (e.g. /run/minecraft/{stdin,stdout,stderr}).
Would you mind expanding on that, or can you point me at some relevant documentation?
I've never seen a systemd service example for Minecraft which allowed for sending commands to the server CLI and seeing the result without involving screen/tmux/etc. The top result on Google just doesn't allow command input at all, running the service "headless", the one on the official MC wiki uses screen, and the only other options I've seen use RCON which is neither secure nor does it show the responses you'd get on the MC console.
If there's a way to run just the straight Minecraft JAR as a background service and still be able to interact with it in the occasional cases where I need to I'm very interested.
Oh yeah, one more comment: this stdin redirection isn't really necessary in Minecraft from the last decade.
The Minecraft server has a built-in RCON server running on a separate port that can be enabled (https://wiki.vg/RCON), and once enabled it can be interacted with using an RCON client (like https://github.com/Tiiffi/mcrcon).
So instead of redirecting stdin to a systemd process, you can also just leave stdin disconnected and use the built-in RCON server to run commands every so often.
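For illustration, after enabling it in server.properties (enable-rcon=true, rcon.port, rcon.password), issuing commands is a one-liner with mcrcon; the host, port, and password below are just example values:

    mcrcon -H 127.0.0.1 -P 25575 -p hunter2 "say hello" "save-all"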
Basically, you set up the standard minecraft service and then create a "socket" in systemd to use as stdin for the process (relevant documentation in systemd.socket and systemd.exec).
What this will do is add a systemd dependency on minecraft.service to start minecraft.socket first (which creates the FIFO `/run/minecraft.stdin`), then set up minecraft.service to listen to this socket for its StandardInput (while leaving stdout and stderr pointing towards the journal).
The service can then be started and set to start automatically on boot (`systemctl daemon-reload && systemctl enable --now minecraft`). While running, data can be written to the socket file with `echo` and redirection (e.g. `echo "help" > /run/minecraft.stdin`), and the output will be visible in the journal (`journalctl -xef --unit minecraft.service`).
If you set stderr/stdout to go over the socket as well, then you can attach something like `screen` to it and use it like a typical TTY (or `telnet`).
This uses the file `/run/minecraft.stdin` as the socket, but the documentation for systemd.socket shows that this can also be a TCP port to listen for connections (and systemd.service shows using regular files, but then you have to set them up manually).
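Roughly, the two units described above could look like the sketch below (paths, memory flags, and names are illustrative, not a drop-in config):

    # minimal sketch of the socket + service pair described above
    cat > /etc/systemd/system/minecraft.socket <<'EOF'
    [Unit]
    Description=stdin FIFO for the Minecraft server

    [Socket]
    ListenFIFO=/run/minecraft.stdin
    SocketMode=0660
    EOF

    cat > /etc/systemd/system/minecraft.service <<'EOF'
    [Unit]
    Description=Minecraft server
    Requires=minecraft.socket
    After=minecraft.socket

    [Service]
    WorkingDirectory=/srv/minecraft
    ExecStart=/usr/bin/java -Xmx4G -jar server.jar nogui
    Sockets=minecraft.socket
    StandardInput=socket
    StandardOutput=journal
    StandardError=journal

    [Install]
    WantedBy=multi-user.target
    EOF

    systemctl daemon-reload && systemctl enable --now minecraft
    echo "help" > /run/minecraft.stdin           # send a console command
    journalctl -xef --unit minecraft.service     # watch the output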
andyk here. It's clear our readme is lacking use cases! Adding some now. When we introduced ht on twitter I gave a little more context -- https://x.com/andykonwinski/status/1796589953205584234 -- but that should have been in the project readme.
Also a few people comparing to `expect`. I haven't used `expect` before, but it looks very cool. Their docs/readme seem only slightly more fleshed out than ours :-D
Looks like the main way to use expect is via:
    spawn ...
    expect ...
    send ...
    expect ...

etc.
So the expect syntax seems targeted more towards testing, where you simultaneously get the output from the underlying binary and then check if it's what you expect (thus the name, I guess). I can't tell if there is a way to just get the current terminal "view" (aka text screenshot) via an expect command?
ht is more geared towards scripting (or otherwise programmatically accessing) the terminal as a UI (aka Terminal UI). So ht always runs a terminal for you and gives you access to the current terminal state. Need to try out expect myself, but from what I can tell, it doesn't seem to always transparently run a Terminal for you.
There might already be some other existing tool that overlaps with the ht functionality, but we couldn't find it when we looked around a bunch before building ht.
Sorry, my wording wasn't very clear. I wasn't trying to imply that ht is more geared towards scripting than `expect` (in fact I'd say `expect` is more scripting-oriented being an extension of a scripting language) but rather that ht is more geared towards scripting the terminal as a UI than `expect`.
Am I wrong about that? (I may very well be since I haven't used `expect` before)
Based on my understanding/recollection of `expect`, the concept is that you're scripting a command/process (or command sequence) via a "terminal connection" (or basic stdin/stdout), based on the (either complete or partial) "expected" dialogue response.
I guess it might in theory be possible to script a TUI with, but I suspect it'd get pretty convoluted over an extended period of time.
(BTW I mentioned this in a comment elsewhere in this thread but check out https://crates.io/crates/termwiz to avoid re-inventing the wheel for a bunch of terminal-related functionality.)
I see what you're saying. When I was writing scripts in `expect`, I didn't really ever try to automate TUI programs. So, this could absolutely be a better way to script the terminal as a UI, as you said.
I think it is a really good idea to separate the vt100 emulation "backend" from the UI. Then all terminal emulators could use a common implementation and just focus on displaying the text, instead of emulating quirks of 50-year-old devices.
Using JSON as RPC also seems like a good idea, especially when SSH-ing in, as the language server protocol has shown.
That being said, I don't see this project doing anything (yet?) about escape sequences (for colors, clickable links, mouse and clipboard integrations, setting title and cwd, etc) and handling shortcuts (e.g. translating ctrl-c to sigint). There are very few terminal emulators that get all of that right (I think Kitty, iTerm, WezTerm), so it would be great if this project could lead towards more of them being written.
Most sequences used by CLI apps that are supported by popular (widely used) terminal emulators are supported here, i.e. there's good compatibility with VT100/VT220/VT520/etc.
At the moment there's no support for mouse and clipboard - ht uses asciinema's avt for terminal emulation, where this was never needed (although it may be added to avt if ht needs it).
Regarding the colors: internally there's full support for standard indexed colors (palette) and RGB. The "getView" call currently returns a plain text version of the screen buffer, stripped of all color information (plain text is what andyk's headlong project, which uses ht, needs), but we can easily make it return color attributes for each cell or segment of text (either by modifying "getView", or adding a complementary "getRichView" or something like that).
Is there an actually convincing simple protocol for a 2D character buffer? (As opposed to a jumped-up typescript like the DEC terminals.) I’ve tried looking at the IBM tradition, and while it’s certainly different, I wouldn’t say it’s better on this particular point.
(There is also the part where a VT100-style display is a bit more than just a character buffer - there's rewrapping on resize, for example, which IIUC can be on for some lines and off for others - but let's assume we're willing to postpone updating window contents until the resize is concluded, so we can afford to do that on the application/VT100-processor side. Even though that feels like 1995.)
(Projects that build on this crate include "ratatui" among others...)
In particular, I note at least these specific feature areas:
* "support functions for applications interested in either displaying data to a terminal or in building a terminal emulator": https://lib.rs/crates/termwiz
ht uses asciinema's avt as the underlying terminal emulator, so there wasn't much re-invented here really. It was mostly gluing avt, PTY and JSON RPC together.
Combined with nohup, this could probably be useful for detaching/reattaching to long running processes across user sessions?
I'm back to using tmux again, but for a while I was using another program, dtach, to start vim sessions that I could disconnect from and reattach to. Inside neovim I'd have a bunch of terminals & buffers & what not, so it felt redundant having tmux also hosting an even higher level environment.
Dtach is super super lightweight. Tmux is keeping copies of each screen in memory and doing some reprocessing. Dtach is basically a pipe wrapping a program's input and output, data in data out.
I even wrote a little shell script to let me very quickly 'dta my-proje', which will autocomplete the project name and either open the existing dtach session in that project or create a dtach session with vim in it. https://github.com/jauntywunderkind/dtachment/blob/master/dt...
It would be interesting to see something like dtach used for automation or scripting, as it seems targeted for. The idea of being able to relay around the input and output feels like it should have some neat uses. There isn't really a protocol, afaik, and there's definitely no retained state for a getView like ht. But it should in many ways function similarly?
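To sketch what that might look like (the -p "push stdin into the session" flag is from dtach's man page; double-check it exists on your version, and the socket path is arbitrary):

    dtach -n /tmp/vim.sock vim notes.txt            # start vim detached, no attach
    printf 'ihello\033' | dtach -p /tmp/vim.sock    # push keystrokes into the session
    dtach -a /tmp/vim.sock                          # attach later to inspect the result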
It's pretty interesting. I actually stumbled over this yesterday, because I needed to figure out a way to automate a very specialised and ancient TUI application (think car rental software that is still used worldwide). Meanwhile I've written a crude parser for ANSI escape sequences and managed a virtualised screen representation, but I'm also having a fair bit of trouble sending commands.
ht seems to be almost "there" for me, and would allow me to easily build a sequence of actions. However, it's kinda missing the color representation, which is also a problem for me: I need to read the color of a specific row/column to know that a specific menu item is selected correctly, and then proceed from this.
I tried to contrast to `expect` in a couple of my other responses, but yeah this is my sense too after looking briefly at `expect` - that ht always transparently sets up a terminal for you under the hood and you interact with that so you can always grab a screenshot of any terminal UI.
I don’t think `expect` is targeted at this use case (though I am only learning about `expect` right now so could be wrong)
Indeed, maybe musl will finally be the long-term compatible Linux native binary format to one day replace win64+wine.
glibc compatibility breakage is the bane of my existence.
The situation is made worse by GitHub pushing people to build against the default extremely recent glibc in "ubuntu-latest" for binary artifact builds rather than what should be done: building against the oldest glibc possible (you can still do that within a "ubuntu-latest" container).
glibc breaks "version compatibility" at times for (IMO) the most ridiculous reasons, but it also has support for running binaries linked against older glibc versions on newer glibc, which just gets completely ignored.
It depends on one's definition of "feasible", I guess.
Building against the "most recent glibc pushed by GitHub" doesn't make it feasible but building against an older glibc (which you can still do on a container which is otherwise using a recent glibc) is at least somewhat feasible.
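If you want to see what a given build actually requires, something like this works (binary name is just an example):

    # highest glibc symbol version the binary depends on
    objdump -T ./myapp | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -1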
The primary benefit from building against musl is that it "coincidentally" means that even the people who don't care at all about supporting older systems end up doing so "accidentally" due to their desire to use musl (for, presumably, some other reason).
You're welcome :) The asset names got slightly weird in this release (and GH doesn't let me change them, because there's a dot in the name and it prevents removal of the file extension), but with a future release the GH action should name them better. But you would probably rename the downloaded file anyway.
Oh this is cool. I looked at using tmux before we built ht because I’ve used screen and tmux forever. I didn’t find libtmux though. Will def check it out.
I’m guessing it’s mostly useful for writing a terminal emulator in the browser. In theory I think all input and output on the terminal is done via text, which may contain control codes, so you’d still have to write code to render the text, support mouse input and selection, etc.
My first thought was using it as a compatibility layer for VT100-style CLI programs. Hypothetically, if we wanted to finally replace VT100 emulation and move to some new legacy-free protocol for terminals, we would need some sort of shim for running legacy VT100 programs and displaying them in our shiny new VT-next terminal. Similar to the `vt` program from plan 9.
It looks like the use case is if you want to script the usage of a tui program. In most cases it would be best to script the operation yourself with sed rather than ht/nano. I could see this being useful for scripting internal tui tools without access to the source code.
I shared the motivating use case for why Marcin and I built this (LLM agents using terminals) in a different comment, but I'll also expand the readme to give examples of use cases.
This is an awesome project. It aligns with what I was hoping to get out of expect for certain interactive commands but had a hard time doing. Like selenium or cypress for terminals. I could see this being valuable for end to end testing developer tooling workflows. Maybe it was me not knowing enough about expect. Looking forward to seeing what comes next!
To expand on what andyk wrote in the sibling comment:
Programs running in a terminal don't get individual components of composite key presses such as ctrl+a or shift+b, so they don't see "a with ctrl modifier" or "b with shift modifier". The modifier keys are handled by the terminal emulator before sending the key's ascii value to the program, modifying the regular ascii letters appropriately. So when "a" (ascii value 0x61) is pressed while holding shift, its ascii value is ... shifted (down) by a constant 0x20, making it ascii 0x41, which represents "A". Similar with the ctrl key, which shifts down the ascii value by 0x60, turning "a" into 0x01. So to send "ctrl+d" you send input with a single byte of value 0x04 ("d" ascii 0x64 minus 0x60). ht uses a PTY under the hood, and this is how you send keyboard input into a program via a PTY. This is kinda low level though, and there's definitely a possibility of implementing a high level input method in ht, which would parse a string such as "<ctrl+d>" and automatically turn it into 0x04 before sending it to the process.
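To make the arithmetic concrete, a throwaway shell helper (purely illustrative, not part of ht) could look like:

    # compute the control byte for a letter, per the 0x60 offset described above
    ctrl_byte() {
      printf "\\$(printf '%03o' $(( $(printf '%d' "'$1") - 0x60 )))"
    }
    ctrl_byte d | xxd    # 00000000: 04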
In other words, the way input in ht works right now was the easiest, simplest way of implementing this to get it out the door.