Ncdu 2: Less hungry and more Ziggy (yorhel.nl)
236 points by signa11 on July 25, 2021 | 42 comments



This is how you do a migration to another language: you lay out the benefits, fully accept the drawbacks, and try your best to be reasonably supportive of people who, for various reasons, cannot use the new version. It’s refreshing to see, considering that the usual maintainer response to “I’m sorry, but I need to use the C version” I see in these kinds of situations is a combination of gaslighting (“your usecase is not really valid or is too niche for me to care”, “are you telling me that you want security vulnerabilities”) and outright refusal to accept that changes can mean that some people will be adversely affected. The author wanted to use Zig, they did it, and they are polite enough to continue to provide basic maintenance for the C code for those that really need it.


>> and try your best to be reasonably supportive of people who, for various reasons, cannot use the new version.

Keep in mind that "maintenance" according to them means "pretty much what I've been doing for the past few years: not particularly active in terms of development, just occasional improvements and fixes here and there." If you're trying to apply this to situations such as "py-cryptography", they are not very comparable.

Pretty much every license in existence has a line like the following: "THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED". This is especially true when the upstream never declared "support" for a given use case to begin with, such as different Libc implementations, big-endian platforms, etc...

Don't complain about free beer.


Why is py-cryptography not comparable?

Had py-cryptography done the same thing (i.e. made the same announcement and promises, and followed through on them), I'd think we could say the exact same thing about py-cryptography as above too.


I’m actually not too familiar with the py-cryptography situation, but perhaps it is similar; I’m not sure. But what I’m talking about is projects that move to a completely different language and in the process cause problems for a lot of their users, who understandably voice their concerns. It is always nice to see projects willing to consider and make a good-faith effort to accommodate these needs. The reason I felt the need to leave a comment, though, is that I sometimes see the opposite: complete apathy, sometimes even willful shaming of those who explain why the change is problematic for them. They’re treated like luddites, or entitled, for merely mentioning that the change has affected them in ways that may not have been obvious when the decision was taken. That’s just being a jerk.

I could write (and have written) quite a bit about this, but I’m not impressed by arguments where people quote the terms of a license to justify being a jerk. I maintain several open source projects, and I can tell you that the license tells you nothing but what you must do to avoid being sued. It is a legal contract, not a guideline for how you should behave, unless you’re one of those people who go around providing the bare minimum courtesy to others as is necessary to not get in legal trouble (we have a word for those people: jerks).

When I work on software other people use, I am a considerate person who understands that my code will often be used in ways I did not expect it to be. I think people who do that are really awesome! As a software maintainer, I believe it is my duty as a nice person to extend some amount of courtesy to them. Of course, there are limits to everything, but that limit is absolutely not at the point where my legal liability ends.

So, coming back to this topic: I have seen projects that take the stance you have in response to people reporting things like “I can’t compile this software for my platform anymore :(”. I think that is a legal but jerk move. In this case the maintainer not only understood the concerns that people may have about the changes, but they also committed to providing basic support for the old version even though they obviously didn’t have to do anything. A good maintainer recognizes that a minimal bit of effort can pay off handsomely in goodwill, and I just wanted to call attention to that fact.


Just started learning Zig, but I'm already amazed by the language. Feels so intuitive, simple and productive, really a good choice for quick prototyping. I'm glad to see Zig being used as a "successor" to C in projects like this.


Very cool. In my mind, ncdu is "the" TUI tool for disk usage analysis. I copied its UI in making btdu, a similar tool specifically for btrfs filesystems. Written in D!

https://github.com/CyberShadow/btdu/blob/master/ARCHITECTURE...


I can vouch for ncdu...does one thing and does it well. Also see:

https://github.com/jarun/nnn/

https://github.com/bootandy/dust

> There is, after all, nothing more annoying than having to get re-acquainted with every piece of software you've installed every time you decide to update...

Looking at you $EVERY_MODERN_GUI_APP


nnn switched the keybindings around 3+ times while I was using it, and I don't tend to use this kind of stuff more than a few times a month, so it takes a long time to memorize them. It's a nice tool; I wish its keybindings didn't forcibly evolve so fast.


I love Ncdu; it's one of the first things I install on computers I need to manage remotely. Good to know that version 2 is underway. Thanks!


tmux, ncdu, htop .. there's a bunch of utils of similar quality


Nice to know that the C version will still be maintained as well, there are systems where setting up Zig would be a hassle.

> Ncdu 2.0 doesn't work well with non-UTF-8 locales anymore, but I don't expect this to be a problem nowadays.

I can believe this, but has there been any investigation into how common non-UTF-8 locales are nowadays? GNOME has been broken with non-UTF-8 locales for years (I can find a 6-year-old bug report that still has not been resolved) so that may have pushed users that previously were using non-UTF-8 locales to UTF-8, but that is only GNOME and does not apply to users of any other environment, who may have continued using non-UTF-8 locales without issues.


ncdu is one of my favourite tools, and Zig one of my favourite languages.

For those who, like me, are curious:

- C version source: https://code.blicky.net/yorhel/ncdu/src/branch/master/

- Zig version source: https://code.blicky.net/yorhel/ncdu/src/branch/zig

The code is nice and very well commented.


> - C version source: https://code.blicky.net/yorhel/ncdu/src/branch/master/

> The code is nice and very well commented.

I opened the first file in src/: browser.c. Let's take the function browse_draw_mtime() and start picking nits :-)

It has a buffer of 26 chars, which will (for a regular C string) mean 25 characters + 1 NUL terminator.

  char mbuf[26];

But in the end, it prints with the printf()-like function:

  printw("%26s", mbuf);

The count there is supposed to exclude the NUL terminator. So it should be 25 and not 26. Note that it cannot cause a problem, but it may indicate that the author didn't carefully grasp the exact definition of the format.

Before that, in a branch the mbuf buffer is filled with the strftime() function. This function is stupidly defined by the C standard and POSIX didn't make it better. It is not the author's fault, but one has to account for its flaws.

  strftime(mbuf, sizeof(mbuf), "%Y-%m-%d %H:%M:%S %z", localtime(&t));

One would assume that the result string is truncated if the result turns out to be too long, so that code would be fine. Well, not quite: the size argument only protects from writing past the end of the buffer. The standards say that when the buffer is not long enough, the content of the result string is indeterminate :-/ Actually, it might not even be a string (that is, not properly terminated).

So one should check the value returned by strftime(), check if it is 0, and act accordingly.

Again, it is not dangerous, since the printw("%26s", mbuf) won't read after the buffer. But it may write garbage, for example after year 9999, when the expected result string is too long for the buffer.
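To make the "check the return value" advice concrete, here is a minimal sketch. The helper name format_mtime() and the "?" placeholder are made up for illustration and are not taken from ncdu; the sketch takes a struct tm so it can be exercised without depending on the local time zone.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not from ncdu): formats *tm into buf.
 * Returns 1 on success. On failure it stores a short placeholder and
 * returns 0, so the caller never looks at whatever strftime() left
 * behind in the buffer. */
static int format_mtime(char *buf, size_t size, const struct tm *tm)
{
    if (size == 0)
        return 0;
    /* localtime() can return NULL, so tolerate that here as well. */
    if (tm != NULL && strftime(buf, size, "%Y-%m-%d %H:%M:%S %z", tm) != 0)
        return 1;
    /* strftime() returned 0: the result did not fit in the buffer. */
    snprintf(buf, size, "%s", "?");
    return 0;
}
```

A caller could then write something like format_mtime(mbuf, sizeof(mbuf), localtime(&t)) and print mbuf either way.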

-------------

Then in the last file, util.c, there are those macros:

   #define oom_msg "\nOut of memory, press enter to try again or Ctrl-C to give up.\n"

and

   #define wrap_oom(f) \

which at some point does:

    write(2, oom_msg, sizeof(oom_msg));

This is going to emit a NUL character after the newline, because sizeof() of a string literal accounts for it. So, to stop after the newline is emitted, it should be sizeof(oom_msg)-1.
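A small sketch of the sizeof()-1 fix, for illustration only. The names OOM_MSG, LITERAL_LEN and write_oom_msg() are invented here; only the message text comes from the snippet above.

```c
#include <string.h>
#include <unistd.h>

#define OOM_MSG "\nOut of memory, press enter to try again or Ctrl-C to give up.\n"

/* sizeof on a string literal counts the terminating NUL, so subtract 1 to
 * get the number of visible bytes. This only works for literals and
 * arrays, not for char pointers. */
#define LITERAL_LEN(s) (sizeof(s) - 1)

/* Hypothetical helper: writes the message without the trailing NUL.
 * Returns whatever write() returns. */
static ssize_t write_oom_msg(int fd)
{
    return write(fd, OOM_MSG, LITERAL_LEN(OOM_MSG));
}
```

With this, LITERAL_LEN(OOM_MSG) equals strlen(OOM_MSG), while sizeof(OOM_MSG) is one byte larger.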


> Let's take the function browse_draw_mtime() and start picking nits :-)

Let’s…not? I don’t see why you would read “this code is nice and well commented” and immediately comb the code to try to find dubious “bugs” with it? It might have been mildly relevant if you responded with “no, I don’t think the code is actually that readable, for example look at this part” but to jump in when nobody made any assertions of correctness or safety and then bring up minor problems is just strange. If someone says “I think she is very pretty” do you respond with “let’s look at her parallel parking skills, shall we? Eh, I’ve seen better”?


When I see “this code is nice and well commented”, I open the first file that comes up to see how good it is, and how good an example it can be. Then what I see there is functions without any comments, with single-letter variables, with little error checking, with hardcoded yet repeated literal values, and with a few mistakes (with or without consequences, but assessing that requires some investigation). Then I open a second file and I find a mistake there too. That's in less than 3 minutes, in a review that's just a quick skim, picking and peeking at random excerpts.

So should I have said, "baaaah, this sucks!"? As it is nevertheless far from being the worst code ever, I preferred to warn that what I was about to do amounts to nit-picking, and to add a smiley to show that I don't want to disparage the work presented to me.


> Then what I see there is functions without any comment, with single-letter variables, with little error checking, with hardcoded yet repeated literal values

Well, you didn't really talk about those things the first time, you tried to find bugs instead…


In the end, downvotes indicate how many people had a strong negative reaction towards a post. That leaves out the majority of the aspects worth considering. I found your post insightful, so thank you. I wonder if the downvoters press the dislike emoji more often during code review.


Well thanks :-) (as I discover it now, there seems to be only a balance of -1, I guess it was more severe when you saw it). What's funny is that the day before, I got like 10 upvotes for saying that, when I occasionally and quickly browsed a few short patches for Linux kernel drivers, I found rather obvious mistakes which should have been caught by reviewers, if they did their work properly. Oh well... :-)

To sum up, I'd say I quickly looked at 2 of the most common sources of mistakes:

- off-by-one errors: here, in the sub-species which concern how the NUL terminator is counted (or not counted), whether it is by the code itself, or by calling standard functions which behave differently in that respect (sometimes for good reasons, sometimes not, but in all cases one should be careful and check their spec twice);

- error management: what happens, what could we get as a result when an input is not in a typical range, how can we deal with it; standard functions are not especially regular in that respect, one should check that aspect of their spec too, for it can be unintuitive (as in the case of strftime() here).

Then I explained how the consequences were not serious in that case. In well-commented code, I'd expect a small comment to explain why this check is not done, why one can emit that without much care, and so on. Otherwise, one cannot know if it works by design or by chance; also, if those pieces of code are modified in the future, problems could arise.

Emitting a NUL character to a (pseudo) terminal is not serious. In the worst case it could mess the display for 1 or 2 lines before it recovers, but generally it won't even have a visible effect, the terminal will simply ignore it.

However, if that gets written or redirected to a file (here, a log file for example), then the NUL character is quite present. Since a civilised text file oughtn't to contain a NUL character, a program which expects well-formed text files may not be ready to handle it properly. For example, reading such a file with the standard C function fgets() or similar will land you in a world of trouble. fgets() only stops when it encounters a newline character, so the NUL character will be included inside the string returned by fgets() (here, coming right after a newline, it would be at the start of the next line); but when you ask for the string length, it will be 0 (NUL being the first character), and printing the string will be the same as printing an empty string. Basically, you have lost the line content.
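The fgets() effect is easy to reproduce. The following is a sketch with made-up file contents: it writes a line, a stray NUL, and another line to a temporary file, then reads them back.

```c
#include <stdio.h>
#include <string.h>

/* Writes "first\n", a stray NUL byte, then "second\n" to a temporary file,
 * reads two lines back with fgets(), and returns strlen() of the second
 * line as read. Returns -1 on I/O failure. */
static long second_line_strlen(void)
{
    /* sizeof data - 1 drops only the implicit terminator and keeps the
     * embedded NUL that sits between the two lines. */
    static const char data[] = "first\n\0second\n";
    char line[64];
    long len = -1;
    FILE *f = tmpfile();

    if (f == NULL)
        return -1;
    if (fwrite(data, 1, sizeof data - 1, f) == sizeof data - 1) {
        rewind(f);
        /* The first call reads "first\n". The second call reads the NUL,
         * "second" and the newline into the same buffer. */
        if (fgets(line, sizeof line, f) != NULL &&
            fgets(line, sizeof line, f) != NULL)
            len = (long)strlen(line);
    }
    fclose(f);
    return len;
}
```

The second fgets() call succeeds and the bytes of "second" are in the buffer, but strlen() stops at the leading NUL, so the line looks empty to any code that treats it as a C string.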

So it is better to be as correct as possible from the start. It is also often easier than thinking exhaustively of all possible implications.


One thing I really appreciate about ncdu, that seems to be missing from its work-alikes (gdu, pdu, etc.), is the option to export scan results to disk & import them later. Not a feature that helps every day, but extremely handy sometimes.


Seems like a known limitation that may be addressed at some point:

> Exporting an in-memory tree to a file. Ncdu already has export/import options, but exporting requires a separate command invocation - it's not currently possible to write an export while you're in the directory browser. The new data model could support this feature, but I'm still unsure how to make it available in the UI.


Note that you can import _any_ data that's structured like a nested orderable hierarchy, it doesn't have to be a file that `ncdu` itself created. I've used this functionality to scan and then navigate S3 buckets, Artifactory repositories, and even git branches (where the "size" is the number of commits).
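As an illustration of feeding ncdu data it did not produce itself, here is a rough sketch that emits a tiny hand-made tree. It assumes the layout described in ncdu's JSON export documentation as I understand it: a top-level array of major version, minor version, a metadata object, and the root directory, where each directory is an array whose first element describes the directory itself. The names, sizes, and metadata values below are invented, so check the format documentation before relying on any detail.

```c
#include <stdio.h>

/* Hypothetical example: writes a two-level tree ("my-bucket" containing a
 * file and a "logs" subdirectory) in the assumed ncdu export layout.
 * Returns 0 on success, -1 if writing failed. */
static int write_fake_export(FILE *out)
{
    int n = fprintf(out,
        "[1,0,{\"progname\":\"example\",\"progver\":\"0\",\"timestamp\":0},\n"
        " [{\"name\":\"my-bucket\"},\n"
        "  {\"name\":\"backup.tar\",\"asize\":1048576,\"dsize\":1048576},\n"
        "  [{\"name\":\"logs\"},\n"
        "   {\"name\":\"app.log\",\"asize\":2048,\"dsize\":4096}]]]\n");
    return n < 0 ? -1 : 0;
}
```

Written to a file, the result should be loadable with ncdu -f, and the "size" fields can carry whatever quantity you want to browse by.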


gdu added that feature just a couple of releases ago.


Wow! You weren't kidding, the feature appears in release 5.3.0 (5 days ago!).


For another cool Zig project also on HN at the moment, see River, a dynamic tiling Wayland compositor: https://news.ycombinator.com/item?id=27948650


Hi, thanks for working on an ncdu rewrite! In case you're looking for feature requests, there are a few things I wish were supported OOTB:

1. breadth-first search? Sometimes most of the directories are tiny, and when one with a lot of files is found, I'd like it to be measured last. I think that right now the directory tree is explored depth-first.

2. add a way to display the currently calculated directory tree state. Not sure about ncdu2, but ncdu doesn't show much apart from the current item, number of files and total size. I'd like to be able to use the approximate information it has gathered so far.

And once again, thanks for the great work! I really appreciate it.


Slightly off-topic, but as much as I love ncdu and install it everywhere, I wish it didn't need to exist!

By this I mean: why are both Windows Explorer and all major Linux file managers incapable of showing total recursive directory size in bytes the same way OS X's Finder can? ncdu traverses home folders very quickly, why can't this be built in to a GUI file manager as an option, as Finder's Calculate All Sizes is?

I cannot be the only user for whom this is an absolutely indispensable feature --- one that actually keeps me from migrating entirely to Linux.


That would be nice, but ncdu is still needed when there is no gui.


Thanks so much for ncdu. I install it on all servers, because it’s so helpful when you need to find out what takes all the storage.



I've recently been using https://github.com/bastengao/gncdu, which is written in Go. Easy to deploy.


I didn’t realize ncdu (1 or 2!) was still maintained. I have some fixes for WSL compatibility I need to upstream (but first, let me see if 2 already fixes them).


If WSL is not sufficiently compatible with Linux to run Linux applications, is that not something to be fixed in WSL instead?


WSL runs it just fine. ncdu has a bug in its hard link handling that causes it to recursively add the size of the same directory to the total, telling me my 120 GiB spare disk is consuming exabytes of space.
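For readers wondering how a disk usage tool usually avoids counting the same data twice, a common approach is to remember each (device, inode) pair it has already counted. The sketch below shows only that general idea; it is not ncdu's code and not the fix mentioned above, and the fixed-size array stands in for the hash table a real tool would use.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical fixed-capacity set of (device, inode) pairs. */
#define MAX_SEEN 1024

struct seen_set {
    dev_t dev[MAX_SEEN];
    ino_t ino[MAX_SEEN];
    size_t count;
};

/* Returns 1 if (dev, ino) was recorded earlier, 0 otherwise. A new pair
 * is recorded if there is room. Once the set is full, new pairs are no
 * longer remembered, so later duplicates of them would be counted again. */
static int seen_before(struct seen_set *s, dev_t dev, ino_t ino)
{
    size_t i;
    for (i = 0; i < s->count; i++)
        if (s->dev[i] == dev && s->ino[i] == ino)
            return 1;
    if (s->count < MAX_SEEN) {
        s->dev[s->count] = dev;
        s->ino[s->count] = ino;
        s->count++;
    }
    return 0;
}

/* Returns the size to add to the running total: the full size the first
 * time an inode is met, 0 for every later link to the same inode. */
static off_t size_to_add(struct seen_set *s, dev_t dev, ino_t ino, off_t size)
{
    return seen_before(s, dev, ino) ? 0 : size;
}
```

In a scanner, dev, ino and size would come from the st_dev, st_ino and st_size (or block count) fields that stat() or lstat() fills in for each entry.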


Ah, okay, so it's a fix for handling hard links rather than a fix for WSL compatibility; it's a problem that could show up just the same on non-WSL systems.


D vs Zig vs Go for practical, pragmatic development of possibly-cross-platform tools? Fight.


I don't like any of those languages very much, but Zig seems perfectly fine for this sort of stuff. From the very beginning, I've seen people say that Zig should be used to offer easy, human-readable rewrites of basic C utilities without sacrificing execution speed. Meanwhile, I think Go would be much too heavy for a program like this, and D seems to be a little out of its depth in this conversation.


I'm not sure if it's really a problem for ncdu's use case, but in Golang all the file I/O operations consume threads (in contrast with network ones), so it is not easy to parallelize them: https://github.com/golang/go/issues/6817

Btw, you forgot Rust :)


In Rust, it's easy: https://github.com/bootandy/dust/blob/master/src/dir_walker....

(Dust is "du + rust").


CLIs: For now, go. In 3 years? Zig.


Nim


Nim


Nim!



