Entr: Rerun your build when files change (jvns.ca)
202 points by zdw on July 1, 2020 | hide | past | favorite | 81 comments



This post got me thinking that it'd be interesting if instead of being given an explicit list of files to watch for changes to, the list of files were inferred with an LD_PRELOAD that listened for `open` / `stat` / etc. system calls that a process ran.

For example, in the example on the blog post, `git ls-files` will almost certainly ignore autogenerated build files, but it's possible for one of those files to change without the output of `git ls-files` changing. Similarly for things like third party packages that are installed system-wide.

With an LD_PRELOAD, all you'd need to do is

    my-watcher ruby foo.rb
and the watcher would figure out which other Ruby files were opened, be they git-versioned Ruby files in the current folder, Ruby gems / Ruby VM dependencies in $HOME/.rbenv/, system packages in /usr/local, config files in /etc, ...

I guess I wouldn't actually be surprised to hear if someone has built this already.
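For what it's worth, something close to this can be prototyped with strace instead of LD_PRELOAD. A rough sketch, with made-up names (`discover_files`, `my_watcher`) and deliberately simplistic log parsing:

```shell
# Extract successfully opened paths from an strace log, where lines
# look like:  openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3
discover_files() {
    grep 'openat(' "$1" | grep -v '= -1' |
        sed -n 's/.*"\([^"]*\)".*/\1/p' | sort -u
}

# Hypothetical my-watcher: run the command once under strace to learn
# which files it opens, then hand that list to entr to rerun on change.
my_watcher() {
    trace=$(mktemp)
    strace -f -e trace=openat -o "$trace" "$@"
    discover_files "$trace" | entr "$@"
}
```

An LD_PRELOAD shim would also catch files that are only opened on later runs, which this one-shot trace misses.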


Some build systems work that way; Tup is one, I think. They use strace to intercept file I/O and figure out what has been updated, and thus can figure out an optimal way to rebuild.


I think Tup uses FUSE rather than strace (which is why it doesn't track external dependencies, and requires relative paths for all internal files), but I might be wrong.


I built a dependency checker that worked similarly to this once.

By hooking the filesystem calls, you can make a list of all files that a given process touches. When that process finishes, serialize a dependency file containing the hashes, timestamps, and sizes of all those files. Next time you run that same command line, read the dependency file from the last run and compare to the current filesystem state. If it's the same, and you know your command is idempotent, you can skip execution entirely.

Now, if you put that logic in a dll, you can inject it into arbitrary third-party processes to which you don't have source code and it will still work. Name the dependency file after the hash of the command line.
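A minimal shell sketch of that scheme (hashes only, no timestamps or sizes, and with the file-list discovery stubbed out to paths on the command line where a real tool would use the hooked filesystem calls; `memoize` is a made-up name):

```shell
# memoize CMD ARGS...: skip CMD if the files recorded from the last run
# are unchanged; the dependency file is named after the command-line hash
memoize() {
    dep=".deps-$(printf '%s ' "$@" | sha256sum | cut -d' ' -f1)"
    if [ -f "$dep" ] && sha256sum --status -c "$dep" 2>/dev/null; then
        return 0            # inputs unchanged: skip the idempotent command
    fi
    "$@" || return 1        # run the command
    # a real implementation would record every file the hooked process
    # touched; as a stand-in, hash any paths found on the command line
    for f in "$@"; do
        [ -f "$f" ] && sha256sum "$f"
    done > "$dep"
    return 0
}
```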


The ClearCase SCM does something similar - it has the notion of "derived objects" and "audited builds".

You can run an audited build command under a special ClearCase wrapper and it will look at the versions of the elements used in the build - if that build has already occurred with the same input elements before, even if in a different view, it can "wink in" the derived object result - that is, it can cause that previous build to be visible in your current view. This can save a lot of time when you're building a large codebase.


A less intrusive approach could use kernel queues[0] on most Unix-like systems. That way LD_PRELOAD wouldn't be needed, and the process responding to the disk I/O of interest stays independent of the processes performing it.

0 - https://www.freebsd.org/cgi/man.cgi?query=kqueue&apropos=0&s...


fabricate.py (https://github.com/brushtechnology/fabricate), based off the now ancient memoize.py.

In my experience with C/C++, it is faster to combine Make and ccache: just have every C file depend on every header file, and let ccache decide whether each object really needs to be rebuilt.
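A sketch of that setup, with illustrative names; `ccache` fronts the compiler, so the deliberately over-broad header dependency mostly costs cache lookups rather than real recompiles:

```make
CC      := ccache cc
SRCS    := $(wildcard *.c)
OBJS    := $(SRCS:.c=.o)
HEADERS := $(wildcard *.h)

prog: $(OBJS)
	$(CC) -o $@ $^

# every C file depends on every header: coarse, but cheap with ccache
%.o: %.c $(HEADERS)
	$(CC) -c -o $@ $<
```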


I really disagree there, especially as the project gets complex enough. ccache won't speed up anything related to linking, for example. But it will bump the mtime of all your object files as it writes them, even when they come from the cache. So at some point even archive-file creation dominates your "incremental" build time.

I suggest you don't wait until it's too complex to fix the mistake, and do proper dependency tracking from the very beginning. It's not that hard in C/C++.


My project is ~3.6 million lines of code with a 300 ms incremental null build and a 2-10s touch-a-header-file build. I only generate a few hundred exes, and most of the code lives in a single dylib.

I know that's not large, but I've got one Makefile that's only about 200 lines long. It's a pretty good trade-off.


Yes, it's not large at all, but I'm still surprised by the claim that you can link ~4 million LOC in 300 ms. For comparison, around 10x that many lines of C++ takes 2 minutes here with gold, on Xeon machines. Even writing out the main executable takes a good chunk of time (stripped, it already measures almost a quarter of a gigabyte).


Most of the code is tuned C code — tuned in the sense of being fast to compile, with a nice C++ wrapper to make it nice to use. I'm seeing compilation speeds in the 100 kloc/s range on an older laptop.


Cool idea! One minor tweak, though: I'd rather have the watcher run separately, but I think that would still be easy enough if you resolved file paths relative to the project root.


If you're using Bazel, bazel-watcher [1] takes care of this, and does it in a much smarter way than just looking at a predetermined list of files.

[1] - https://github.com/bazelbuild/bazel-watcher


Entr and many other similar alternatives do not satisfy my needs. The main fault is that they take a list of files instead of a root directory plus optional include/exclude patterns. So when I create or rename files, these tools either don't register the new files, or they just fail on the missing ones.

I'd rather reuse existing stuff, but seeing how every scriptable filesystem watcher misses the point, I'm inclined to write my own inotifywait/inotifywatch wrapper.


That's literally in the article; `entr -d`. The manpage also has an entry. http://eradman.com/entrproject/entr.1.html#d


Terminal 1:

    mkdir /tmp/project
    cd /tmp/project
    touch file{1,2}
    ls | entr -d echo "a change"
Terminal 2:

    rm file2
Terminal 1:

    entr: cannot open 'file2': No child processes
Regarding `-d`:

> Track the directories of regular files provided as input and exit if a new file is added. This option also enables directories to be specified explicitly. Files with names beginning with ‘.’ are ignored.

So first, it doesn't track NEW directories, and second, it exits when a new file is added. Exactly how is this useful?

EDIT: All I want from filesystem watcher is to track files by pattern and just re-run the command, however complex that may be to implement via OS interfaces. When I'm working on a Python project that has packages (directories), I may also refactor (rename files), and this simplistic behavior does not catch that.


I think the interface for `-d` is clunky, but here is how it's intended to be used:

    $ while true; do
    > ls -d src/*.py | entr -d ./setup.py
    > done
That way, when a file or directory is added or removed, it restarts.


This is in fact the documented recommendation, from the manpage:

  Rebuild project if a source file is modified or added to the src/ directory:
    $ while true; do ls src/*.rb | entr -d make; done


> All I want from filesystem watcher is to track files by pattern and just re-run the command, however complex that may be to implement via OS interfaces.

I've had good luck with modd: https://github.com/cortesi/modd/

I just tested exactly what you described, did not see any errors, and it ran my commands exactly the way I wanted.


I can attest to modd too, it uses a (per project) configuration file to define include/exclude (recursive) globs, along with respective blocks of commands.

I would also like to extol the virtues of its sister project devd[0], from the same author, which makes web development palatable to me. It's a little websocket server that injects a tiny script into the HTML to reload the page in the browser: modd detects the file change, rebuilds the front-end, and sends devd a SIGHUP.

[0] https://github.com/cortesi/devd


> All I want from filesystem watcher is to track files by pattern and just re-run the command, however complex that may be to implement via OS interfaces.

I assume I'm misunderstanding something, so this is probably a naive question and maybe you can explain further: if that is all you need, why not just write it yourself? It sounds like a <1 hour project to write something cross-platform in Qt that monitors the filesystem recursively with QFileSystemWatcher (https://doc.qt.io/qt-5/qfilesystemwatcher.html), checks for the configured regexen, and runs the configured command.


I have implemented stuff like this. It is not even remotely a <1 hour project. Not if you want it to work reasonably well and not be broken all the time. It’s a full workday for someone who already understands the problem. There are a number of cases you often want to handle:

- Multiple files may change in quick succession, for example, if you hit “save all” in an editor. You might want to delay triggers to see if more events arrive.

- You will sometimes see incomplete / corrupted versions of files, because you ran a command while another program was writing the file. The other program probably should atomically rewrite the file, but you’re stuck fixing the problem. The user doesn’t want to see errors from this and you generally want to rerun the command.

- You need to execute commands in the correct order.

- A command may change state from being queued&ready to having outdated inputs.

If it’s just one command, sure, you could probably do it in an hour. But if there’s more than one command (usually the case!) then it gets far more complicated.

Just looking at the last time I implemented something like this--it was watchman + about 500 lines of code figuring out which actions to run and when. And that’s not even with any parallel execution.
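On the incomplete-file point above: writers can avoid exposing partial files entirely with the usual write-to-temp-then-rename trick. A sketch (`atomic_write` is a made-up helper name):

```shell
# atomic_write DEST: replace DEST with stdin's data; readers and file
# watchers see either the old contents or the new, never a torn write
atomic_write() {
    dest=$1
    tmp=$(mktemp "$dest.XXXXXX") || return 1  # same dir, same filesystem
    cat > "$tmp" && mv -f "$tmp" "$dest"      # rename(2) is atomic
}
```

Watchers still have to cope with writers that don't do this, which is why the retry behavior described above matters.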


What's wrong with watchman? Seems to be made for that very use case.

https://facebook.github.io/watchman/


I've run into this issue myself, and it's not filesystem watchers missing the point; it's something a number of operating systems simply don't support. So you then either have to resort to nasty user-space hacks (which will be comparatively slow and resource-hungry) or accept that you can't watch nested directories.

A more technical description is mentioned here: https://github.com/fsnotify/fsnotify/issues/18


inotifywait can do this just fine. Here's an emulation of `entr` with an `include pattern`:

    inotifywait -rq -m -e close_write --format %f . | grep '\.rules$'
Sure, there may be races due to limitations of the underlying OS interfaces, but no such races really matter when you edit at mammal speed, so it's perfectly sufficient, and it is perfectly possible to make a good filesystem watcher without any power-hungry user-space hacks.
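To complete the emulation (the snippet above only prints matching names), the event stream can drive a command; `watch_pattern` here is a hypothetical helper:

```shell
# watch_pattern REGEX CMD...: read file names from stdin and run CMD
# once per name matching REGEX; pair it with an inotifywait -m stream
watch_pattern() {
    pattern=$1; shift
    grep --line-buffered -E "$pattern" |
        while IFS= read -r _; do "$@"; done
}

# usage:
#   inotifywait -rq -m -e close_write --format %f . |
#       watch_pattern '\.rules$' make
```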


inotifywait is a user-space program that borrows its name from inotify (because it uses inotify), and when using the -r (recursive) flag it sets up multiple inotify watches to work around the very problem I just described.

In fact, its man page comes with a big fat warning about using -r:

> Warning: If you use this option while watching the root directory of a large tree, it may take quite a while until all inotify watches are established, and events will not be received in this time. Also, since one inotify watch will be established per subdirectory, it is possible that the maximum amount of inotify watches per user will be reached. The default maximum is 8192; it can be increased by writing to /proc/sys/fs/inotify/max_user_watches.

(ref: https://linux.die.net/man/1/inotifywait)

Really what would be nice would be for Linux to support this natively so we don't have to resort to dirty hacks.


> inotifywait is a user-space program that borrows its name from inotify (because it uses inotify) and when using the -r (recursive) flag it sets up multiple inotify watches to work around the very problem I just described.

What problem?

What you quoted is totally irrelevant for my workflow. On my crappy laptop and all projects in my work directory, the watches are established almost instantly, without any spike in CPU.

There are 4677 directories in Linux kernel and you can just put a knob change in /etc/sysctl.d/ if your code base is bigger than kernel.

You're just inventing OS limitations to justify limitations of lazy user-space programming.

EDIT: Setting up watches on kernel tree including directories in .git:

    ~/src/linux % inotifywait -r -m -e close_write --format %f . |& ts
    Jul 01 16:03:10 Setting up watches.  Beware: since -r was given, this may take a while!
    Jul 01 16:03:10 Watches established.
Less than 1 second. Big deal!


Your workflow might be ok, others might not. But blaming developers for being "lazy user-space [programmers]" when Linux lacks a feature that Windows and macOS support, and which could cause complications when worked around, is a really unproductive way to hold a discussion on HN.


You still didn't specify which feature Linux lacks that Windows and macOS support. I have shown you that there's no need for any workaround and that the OS feature works as intended, but it seems you're keen on believing FUD scraped from the web pages of lazy user-space programmers instead of considering my arguments and evidence.

FYI: What OS interface do you think that `entr` uses on Linux? That's right, inotify.


Not the original poster, but Windows allows you to get all changes in an NTFS volume, effectively allowing you to monitor notifications for an unlimited number of files and directories (I don't remember what the feature was called; a quick googling led me to https://docs.microsoft.com/en-us/windows/win32/fileio/change... ).

This allows software like Voidtools' Everything to index the entire filesystem and update that index in real time. Using inotify for that is not possible, since you'd have to recursively register hundreds of thousands of directories, which far exceeds the limit of allowed registrations (and either way, it would be quite wasteful to hold 100,000 registration handles). As a result, the Linux equivalent of Everything is unable to find recently created files, erroneously finds deleted files, and has outdated metadata for recently changed files. Alternatively, you can attempt to reindex right before searching (e.g. rerun updatedb if you are using locate), but then your searches are much slower and the benefit of indexing for a quick search is reduced significantly.

A nicer API would let one registration handle receive notifications for an entire directory tree, or at least offer a separate API for an entire filesystem (like the NTFS option mentioned above).

At least, that was the conclusion I came to when I last researched the topic. If you know of a supported way of doing this on Linux I'd really love to know! It will allow me to finally make the program I wanted to make!


> As a result, the Linux equivalent of Everything is unable to find recently created files, erroneously finds deleted files, and has outdated metadata for recently changed files.

Have you heard of https://www.man7.org/linux/man-pages/man7/fanotify.7.html ? This doesn't require watches on individual directories. Though:

> Calling fanotify_init() requires the CAP_SYS_ADMIN capability.


You're being quite aggressive about your disagreement here. I don't think it's "FUD" to rightfully point out that using inotify on a large directory tree takes extra userspace code that is explicitly called out in the inotifywait docs as performing poorly enough to worry about race conditions. Not to mention that it requires a limits tweak. This isn't "lazy"; it's wishing for the mechanism to be more capable. Would you have this same attitude if the default limit were 1024? 512?


Why wonder what my attitude would be with a lower limit? The default limit is 8192, it's OK for the use case exemplified by entr, and you can easily change it.


I agree with your frustrations around this limitation and think file watchers should do their best to support arbitrary file addition/removal in directories. That being said, the races do matter when you edit at "mammal speed" but use source control to merge/rebase commits that affect your watched directories. Especially in a team setting where multiple coworkers work in the same directories, git can blast rapidly through a lot of file moves, deletions, etc., so it's not really an edge case.


Git is a good point. However, the only time I want commands triggered by changes in the source tree is during actual code writing, to instantly test my changes. On the other hand, stopping it during Git operations could harm my workflow. This could easily be solved by a simple 1 s debounce, with a hypothetical debounce filter:

    inotifywait -rq -m -e close_write --format %f --exclude '\.git' . | grep '\.rules$' | debounce 1 | xargs ...
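The hypothetical `debounce` filter could itself be a few lines of bash (assuming bash for `read -t`): after a line arrives, it swallows further lines until the quiet period elapses, then emits once for the whole burst.

```shell
# debounce SECS: collapse bursts of input lines into a single line,
# emitted once no new input has arrived for SECS seconds
debounce() {
    local secs=$1 line _
    while IFS= read -r line; do
        # drain the burst: keep reading until SECS pass with no input
        while IFS= read -r -t "$secs" _; do :; done
        printf '%s\n' "$line"
    done
}
```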


Here you go: it watches everything in a specific directory.

    while inotifywait -q -q -r -e modify -e move -e create -e delete --exclude '[*~#]' $(mapi $wd $@); do
wd is the working directory.

and mapi, a tiny Perl helper that prefixes each argument with a directory:

    my $prefix=shift;
    foreach my $s (@ARGV) {
           print "$prefix/$s ";
    }

Adjust to taste.

It is _hax_, as it were, but it has worked well for me.


Have you tried Tup[1]?

[1] - http://gittup.org/tup/


Nodemon[1] is also good, normally Node centric but all parameters can be configured.

[1] https://nodemon.io/


> To install, get node.js, then from your terminal run:

There's no way I'm going to invoke node.js and the associated `node_modules`, with its 1:10 code-to-junk ratio, for any CLI scripting.


This seems to fill a similar role as watchman[1] (specifically with watchman-make[2])

I've heard good things about watchman because it will make a best effort to let filesystem changes "settle" before running the command specified. If there's a comparison of these two written up or if someone can give their testimonial I'd love to hear it.

[1] https://facebook.github.io/watchman/

[2] https://facebook.github.io/watchman/docs/watchman-make


Dependency-free, solid design, perfect with tmux, great online reference. Does not try to do magic: http://eradman.com/entrproject/


You do realise that's the tool TFA is about, right?


Similar tool that I currently use: https://github.com/watchexec/watchexec


Original author here, glad you like it!

watchexec was born out of a few frustrations with entr, mostly around how it handles new files being created. However, from a pure design standpoint, entr is just better than most anything out there due to how closely it hews to the UNIX philosophy, and it gets remarkably far on just that.

The only real improvement that can be made to these tools (currently) would be to have perfect information about what caused a file to change. Currently, most tools require you to tell it files/patterns to ignore to avoid triggering loops where the file watcher changes files and ends up triggering itself over and over. watchexec did good work here by ingesting your .gitignore and using that.

Unfortunately OSes don't provide great info when it comes to file modifications. On Linux, ptrace/LD_PRELOAD would enable us to know the set of all files changed as a result of running the file watcher (and thus ignoring them automatically). DYLD_INSERT_LIBRARIES is a thing on macOS, though it is subject to SIP restrictions with some binaries. I'm unsure what mechanism exists on Windows. The highly platform-dependent nature of this is one reason why I haven't really pursued this line of work in watchexec.


Thank you for watchexec. It makes my life better every day.


Commands like this are super-useful, and I use them all the time when editing LaTeX papers.

I used atchange for a long time:

http://jeffreycopeland.com/work/PDF/1996-03-atchange.pdf

http://users.fred.net/tds/lab/atchange.html

And now, out of habit, my own wrapper around inotify. But more polished solutions like entr do seem the place to start now.


Cargo Watch is something similar in the Rust ecosystem - https://github.com/passcod/cargo-watch


Entr is amazing. I'm using it in my Go projects to hot-reload when developing!

https://smalldata.tech/blog/2019/03/15/hot-reload-go-applica...


Very good piece of software.

Tip: emulate VS Code's live Markdown preview with entr:

    ls file.md | entr script.sh
script.sh

    #!/bin/sh
    pandoc file.md -o file.pdf
    mupdf file.pdf


I’m sorry: why not just use make? This is a serious, not sarcastic question.

When developing I use make to invoke my program; that ensures everything is rebuilt.


Does make have watch functionality built in? Sometimes I use make and entr together. entr detects filesystem changes and make efficiently rebuilds. But nowadays a lot of languages have their own build tools besides make, and entr works directly with any of them.


Here is another way to look at it:

    $ while sleep 2; do make -s build; done
Which files are interesting is already encoded in your Makefile, and make is pretty good at figuring out whether a given file changed or not. So instead of duplicating effort, run make every second or two and let it figure out what needs to be done.

It won't eat your memory or your pool of file descriptors, and your CPU will barely feel it.


That's a solid approach if your dependencies are already managed by make, but languages like Rust and Go do their own dependency management, and Rust in particular is pretty slow to compile. I wouldn't want it running every 2 seconds.


Well, it knows from the filesystem which files have changed.

If you mean "invoke as soon as a file has changed": I certainly don't want that behavior, as many code changes require multiple files to be edited, and there's no automatic way to know that while the change hasn't been fully written yet.


Because make doesn't do this?


Guard [1] is a great tool in the Ruby world for rerunning tests, reloading your browser, or running any arbitrary task when files change in your filesystem. It's built on top of Listen [2] for the filesystem piece, which has been pretty well supported across at least Linux and macOS over the years.

[1] - https://github.com/guard/guard [2] - https://github.com/guard/listen


See also https://linux.die.net/man/8/incrond

> The inotify cron daemon (incrond) is a daemon which monitors filesystem events and executes commands defined in system and user tables. Its use is generally similar to cron(8).


Thanks for sharing this. I'm continuously surprised by how many tools exist in the *nix ecosystem that I've never come across. It makes me question if there's a discoverability problem here. Is there any website that broadly categorizes what utilities are available given a desired goal?


I've been meaning to 1. start a blog for the 'command of the day' and 2. categorise commands by use/intent.

But until then, no, I don't know of any.


Nice tool - I'll be putting that in my toolkit.

I love the way she straces the tool to find out how it works - a true hacker at work :-)


Entr requires one watch per file. One tool that I haven't seen mentioned here yet is fswatch, which uses directory-based patterns and works more like find: it outputs changed file names to stdout.

Interesting thing to look at and see if it maybe suits you better.



I use a shell script named "onsave" that wraps the following incantation:

    inotifywait --event modify --recursive . --quiet
If anyone wants the full script, I can post it here. One thing many of these tools miss is that I typically want the build command run once at the start as well.


I use this kind of thing regularly with web stuff like Django, Flask and frontend/JS stuff using parcel, but I wonder what other things this is appropriate for? I can't imagine it being very useful for C/C++ code. I don't always want to rebuild every time I save a file.


One of the issues I've seen with entr, fswatch, etc. is that they don't respond to ATTR changes (e.g. via touch).

When developing in a VM, sometimes you have to "forward" filesystem events to the guest. Usually that happens via TCP/UDP and touch, but that won't trigger the reload.


I used to use `inotifywait`. It met all my requirements so I just stuck to it. Allowed me to have LaTeX rendered to PDF in Evince on one side and write in Vim on the other side.

Since Evince reloads file if it changes on disk, that meant I had immediate feedback.


Another great tool that I use everyday is watchexec - https://github.com/watchexec/watchexec.


There are better tools, which use the ptrace API to check whether the files the build actually depends on have changed.

One downside is that ptrace is not recursive/reentrant, so some tools won't work under it.



This should go nicely with that new js bundler and that new package manager.


entr is a great tool; I use it all the time to automate various little things when I'm developing. The coolest part is that it works in every environment I care about, which is FreeBSD, Linux, and macOS.


Mmh, interesting. I have been using watch + redo to do this.


any equivalent for Windows?


I created https://crates.io/crates/runwhen a while back, and it's cross-platform. Unfortunately you have to build it yourself :-p I never got around to creating an automated build pipeline for it.


I will say, inotify is one of the best-designed kernel facilities: it avoids so many of the pitfalls of other facilities used for the same task, and is generally easier to use. I wonder if any BSDs plan to implement it, outside of Linux ABI layers.


This tool seems like a slight improvement on inotify.

Otherwise, it doesn't seem especially unusual or unique to me, and yet the author is presenting it as a revolution.

Am I missing something?


You're missing the fact that another person's knowledge or experiences could be different from your own.


Yeah, it's https://xkcd.com/1053/ happening yet again.


entr is cross-platform and utilizes the best of the platforms it runs on.

Also, the docs explain the problems with inotify: http://eradman.com/entrproject/


A slight improvement in usability can make all the difference in the world for a user's experience.


You're not. It's glorified without any substance.



