WizTree is famously almost 50x faster than WinDirStat (on normal Windows NTFS drives) by reading the Master File Table (MFT) instead of walking the tree to measure each file.
WizTree isn't open-source like WinDirStat but "free as in beer" with optional donations.
What's the downside of just reading the MFT? Why doesn't Microsoft do it in file explorer, and why wouldn't every tool use it instead of walking through the file system? Maybe there's no downside but it's such a huge speed boost that it would be weird to not use it otherwise, right?
>What's the downside of just reading the MFT? Why doesn't Microsoft do it in file explorer, and why wouldn't every tool use it instead of walking through the file system?
One disadvantage is that you can't read the MFT of network shares or device emulators presenting "virtual drive letters" to the OS.
The typical (and slower) Win32 API functions FindFirstFile()/FindNextFile() used to iterate through the file system work at a higher level of abstraction, so they work on more targets that don't have an NTFS MFT. Indeed, if you point WizTree at an SMB network share, it will be a lot slower because it can't directly read the MFT.
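To make that concrete, the classic walk looks roughly like this (a minimal sketch; error handling and reparse-point checks elided — the point is the one kernel round-trip per directory that makes it slow on big trees):

    // Minimal sketch of the classic Win32 directory walk (the slow path
    // WizTree avoids on NTFS). Reparse points and errors ignored for brevity.
    #include <windows.h>
    #include <cstdint>
    #include <string>

    uint64_t DirSize(const std::wstring& dir) {   // e.g. DirSize(L"C:\\Users")
        WIN32_FIND_DATAW fd;
        HANDLE h = FindFirstFileW((dir + L"\\*").c_str(), &fd);
        if (h == INVALID_HANDLE_VALUE) return 0;
        uint64_t total = 0;
        do {
            std::wstring name = fd.cFileName;
            if (name == L"." || name == L"..") continue;
            if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
                total += DirSize(dir + L"\\" + name);  // one handle per directory
            else
                total += (uint64_t(fd.nFileSizeHigh) << 32) | fd.nFileSizeLow;
        } while (FindNextFileW(h, &fd));
        FindClose(h);
        return total;
    }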
It's conceivable that Microsoft developers could have programmed Windows Explorer differently to have an optimized code path of reading MFT for local disks and then fall back to slower FindFirstFile()/FindNextFile() for non-MFT disks. Maybe that adds too much complexity and weird bugs. I notice that most of the 3rd-party "Win Explorer replacement" utilities also don't read MFT.
> It's conceivable that Microsoft developers could have programmed Windows Explorer differently to have an optimized code path of reading MFT for local disks and then fall back to slower FindFirstFile()/FindNextFile() for non-MFT disks
Surely this would have been worth doing, even if it meant flushing out bugs elsewhere.
Along with the reasons others have mentioned, it would also bypass any filter driver in the file system stack (Windows has the concept of a stack of filter drivers that can sit in front of the file system or hardware) and would also ignore any permissions (ACLs) on who can see those files. There’s no way they can credibly use this technique outside of, say, something from SysInternals: it violates the security and layering of the operating system and its APIs.
Is there a Linux equivalent for those "filters"? I'm a bit clueless about win32 and NT sadly enough...
Would that mean that there's no way to "scope" the MFTs?
Edit:
That also makes sense, since if I got it right they aren't necessarily supposed to be consumed by userspace programs?
I guess that's why those tools always ask for admin access and basically all perms to the FS.
It's a bit sad that the user gets exposed to a much slower search and FS experience even if the system underneath has the potential to be as fast as it gets. And I don't think ReFS is intended to replace NTFS (not that it's necessarily more performant anyways)
There is no equivalent on Linux. That's why Linux has no on-access antivirus scanners (scanners that scan a file as it's opened), while this is a basic feature of every antivirus program on Windows.
Linux has device mappers (dm-crypt, dm-raid and friends). But those sit below the file system, emulating a device. Windows' file system filter drivers sit above the file system, intercepting API calls to and from the file system. That's super useful if you want to check file contents on access, track where files are going, keep an audit log of who accessed a file, transparently encrypt single files instead of whole volumes, etc. But you pay the price for all that flexibility in performance.
> That's super useful if you want to check file contents on access, track where files are going, keep an audit log of who accessed a file, transparently encrypt single files instead of whole volumes
Or if you just want to generally make the filesystem so slow that everyone has to invent their own pack files just to avoid file system api calls as much as possible.
Filters are vaguely similar to things like mountpoints overlaying portions of the filesystem. E.g. in Linux you might have files in /d1/d2/{f1,f2,f3} in the root filesystem but you also have a mountpoint of a 2nd filesystem on /d1/d2 that completely changes the visibility / contents of d2. Filter drivers can do similar things (although they are not actually independent mountpoints).
You need admin permissions to read the MFT on Windows. The traditional security model of both Windows and Linux assumes that the kernel is a security barrier between system and unprivileged user, and between different unprivileged users. An admin being able to bypass security restrictions isn't traditionally seen as a problem.
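For the curious, the privileged first step looks something like this (a sketch; both calls fail with access denied for a non-elevated process):

    #include <windows.h>
    #include <winioctl.h>

    // Open the raw volume and ask NTFS where the MFT lives. Elevation required.
    HANDLE vol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                             FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                             OPEN_EXISTING, 0, nullptr);
    NTFS_VOLUME_DATA_BUFFER nvdb;
    DWORD bytes = 0;
    if (vol != INVALID_HANDLE_VALUE &&
        DeviceIoControl(vol, FSCTL_GET_NTFS_VOLUME_DATA, nullptr, 0,
                        &nvdb, sizeof(nvdb), &bytes, nullptr)) {
        // nvdb.MftStartLcn and nvdb.BytesPerFileRecordSegment tell a scanner
        // where the raw $MFT records start and how large each record is.
    }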
Indeed, only in very recent history has the admin/root user/owner been seen as a threat to the system and the system employs defenses against them. I'm hoping that trend reverses because I really hate the direction things are going.
There are pretty good reasons to do that. We've been really lax in what is allowed to run as root/admin when in reality, those permissions should only be used when doing things like reading the MFT or snooping on all the network traffic with Wireshark. It should not be required to run as root/admin in order to install most software because installing software is a very common thing to do.
Even if you want more control over your system, I still think technically capable people would be better served by having a separate administrator account from your normal day-to-day account which you have to explicitly log into (so no UAC prompts, you need to go onto that other account and then you get the UAC prompt). Unfortunately, I think most Desktop OSes are still too unusable with this sort of workflow due to how much software insists on admin for installation.
I largely agree. I think what makes the "the user is a threat" model so difficult to me is that there is a lot of truth to it. Users often don't know enough to make good decisions.
I really like your idea of logging in separately, such that it isn't something you're going to do cavalierly. That seems like a great compromise to me! I fully agree that we way overuse admin and really don't need it for the majority of things.
> it would also bypass any filter driver in the file system stack
The main use case for filter drivers is antivirus, and that is primarily about file contents not file metadata - so if MFT access bypassed filter drivers, that might not be a major issue. I think most non-antivirus use cases are also primarily about data not metadata.
If necessary, one could even devise a design in which MFT access is combined with filter drivers - MFT scanning to find matching files, then for each matched file access its metadata via standard APIs (to ensure filter drivers are invoked) before returning to the client. That would be slower than a pure MFT scan but still faster than a scan done purely with standard APIs. A registry key could turn this on/off so sites can decide for themselves where to place the performance versus security tradeoff.
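A sketch of that hybrid flow (EnumerateMftEntries, MatchesQuery and EmitResult are hypothetical placeholders for the raw MFT walker and the caller's query logic; GetFileAttributesExW is the real Win32 call that routes the access back through filters and the security check):

    for (const auto& entry : EnumerateMftEntries(L"C:")) {  // hypothetical MFT walker
        if (!MatchesQuery(entry.path)) continue;            // e.g. matches L"*.exe"
        WIN32_FILE_ATTRIBUTE_DATA meta;
        // Goes through the normal I/O stack, so filter drivers run and the
        // access check only happens on entries that already matched the query.
        if (GetFileAttributesExW(entry.path.c_str(),
                                 GetFileExInfoStandard, &meta)) {
            EmitResult(entry.path, meta);                   // hypothetical
        }
    }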
> and would also ignore any permissions (ACLs) on who can see those files
They could expose an API which enables MFT scanning with some degree of ACL checking added.
If you do the ACL check as late as possible in processing the query, it would give much better performance than standard APIs that evaluate ACLs on every access. For example, suppose I want to scan a volume for all files with the extension ‘*.exe’. The API would only have to do an ACL check on each matching entry, not on every entry it considers.
There also might be reasonable situations in which ACL checking could be bypassed. For example, if I am requesting a search for files of which I am the owner, just assume the owner should have the right to read the file’s metadata. Or, if I have read permission on a directory, assume I am allowed aggregate information on the count and total size of files in that directory and its recursive subdirectories. These “bypasses” could be controlled by system settings (registry entries / group policy), so customers with higher security needs could disable them at the cost of reduced performance.
Rather than putting this in the OS kernel, it could be a privileged system service which exports an API over LPC/COM/etc. Actually with that design it isn’t even necessary to wait for Microsoft to implement this, it could always be implemented as an open source project, if someone felt sufficiently motivated to do so. (Or even as a proprietary product, although I suspect that would limit its adoption, and the risk is if it takes off, Microsoft would just implement the same thing as a standard part of Windows.)
Reading the MFT directly requires Administrator permissions, and doing it correctly means reimplementing support for every nook and cranny of NTFS including things like hard links, junction/reparse/mount points, sparse files, etc.
You call that a workaround but it’s basically the best possible situation security-wise. If this didn’t work securely then it wouldn’t be possible to implement disk defragmenter or even explorer. It’s so core to Windows NT’s security model that I wouldn’t call it a workaround.
You do similar things even with more modern stacks - assign a permission to an application and grant permissions to the application to the user.
The only real concern is that Windows NT permissions are not as granular as they could be.
> Windows NT permissions are not as granular as they could be.
For objects, Windows NT permissions are ridiculously granular; e.g. GENERIC_WRITE can be mapped to a half-dozen separately settable type-specific flags, depending on the object type (file, named pipe, etc.). It’s too granular for even an administrator to make sense of, arguably, and the documentation is somewhere between bad and nonexistent. (The UI varies from decent, like the ACL editor you can access from e.g. Explorer, to “you can’t make this shit up”, like SDDL[1].)
For subjects, the situation is not good, like on every other conventional OS. You could deal with that by introducing a “user” for each app, as on Android. But I’m not aware of any attempts to do that (that would expose this mechanism in a user-visible way).
(Then there’s the UWP sandbox, which as far as I can tell is built with complete disregard for the fundamental concepts above. I don’t think it’s worth taking seriously at this time.)
I have no idea if there’s a granular object permission that could give access to the MBR of a disk. I’ve thankfully never had to dig that deep into Windows internals.
I’ve had to work with SDDL before to set up granular permissions for WMI monitoring on a whole lot of computers and my god, did it make me love the Cloud and Linux. I can’t emphasize enough how unintuitive setting these permissions is, and how it creates systemic over-privileging.
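For a taste of why SDDL feels that way, here's a hedged illustration (this example string grants Authenticated Users generic read/write and Administrators full control, which you'd normally only discover by decoding each ACE by hand):

    #include <windows.h>
    #include <sddl.h>
    #pragma comment(lib, "advapi32.lib")

    // SDDL packs owner, group and ACEs into one terse string:
    // O: owner = Builtin Administrators, G: group = Local System,
    // D: DACL = (Allow;;GenericRead|GenericWrite;;;Authenticated Users)
    //           (Allow;;GenericAll;;;Builtin Administrators)
    PSECURITY_DESCRIPTOR sd = nullptr;
    ConvertStringSecurityDescriptorToSecurityDescriptorW(
        L"O:BAG:SYD:(A;;GRGW;;;AU)(A;;GA;;;BA)",
        SDDL_REVISION_1, &sd, nullptr);
    // ... use sd, then LocalFree(sd);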
Been using the portable version of 1.4 for decades after first coming across it in some PC magazine or something like that many years ago. Not terribly pretty, but it does what I need and it still works.
One possible reason is that it isn't a published part of the filesystem's external interface, and the format is not guaranteed to be static between versions or even point releases (though in reality, while the behaviours may be officially undefined that are unlikely to change significantly).
Also, it requires admin elevation to access. Anything running elevated is a potential security concern as it can access much else too.
> Why doesn't Microsoft do it in file explorer
Not sure, but it could be because that would be seen as an unfair advantage so to avoid anti-trust allegations they would have to publish the format and make stability guarantees for it, so others could use it as easily/safely. That, and the reasons above & below too.
> and why wouldn't every tool use it instead of walking through the file system?
Largely because walking the filesystem works for all filesystems, local and remote, so you cover everything with one tree walk implementation. Implementing a tree-walk over the MFT data where available is extra work to implement and support for one filesystem, and not many care enough, or are not aware of the potential speed benefit at all, for it to be a huge selling point such that all toolmakers feel compelled to bother.
> One possible reason is that it isn't a published part of the filesystem's external interface, and the format is not guaranteed to be static between versions or even point releases (though in reality, while the behaviours may be officially undefined that are unlikely to change significantly).
I am not going to pull every document, but the MFT structure is documented and published. I am uncertain what you mean by "external interface".
Though all the sub-pages of that state things like “[This structure is valid only for version 3 of NTFS volumes; it may be altered in future versions.]” — while it is true that any API could see breaking changes in future, this suggests that you should expect them, so I'd not call it supported in the same sense of the main file/directory access APIs which I would not expect to see breaking changes in (additional properties & functionality yes, but not existing things changing behaviour).
A lot of people talking about the details does not constitute official documentation, though.
You can find a lot of articles talking about SQL Server's DBCC IND and DBCC PAGE, but that isn't official documentation – they are essentially internal functions, not supported, and could change or go away entirely despite having been around for many versions (as they have in Azure). Similarly, there are articles talking about sys.dm_db_database_page_allocations, which sort-of does the job of DBCC IND, but again this is not officially documented & supported.
> I am uncertain what you mean by "external interface".
I meant the published interface. Maybe "supported API" would have been a better phrase to use?
Though as pointed out below, there is at least some official documentation on the MFT structure.
It's probably also racy to access the raw MFT while there are concurrent programs creating new files (or deleting files). That complication can be avoided by using the ordinary OS directory iteration primitives.
Yep, but then the performance gains are completely discarded. The easiest solution is to take a snapshot with VSS, which is both fast and makes a quiesced copy of $MFT. From there, one could monitor FS changes if they wanted to have live updates.
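For the live-update part, something like ReadDirectoryChangesW can keep the scanned tree current without another full pass (a sketch using the simple synchronous form; a real tool would use overlapped I/O and handle buffer overflows):

    #include <windows.h>

    HANDLE dir = CreateFileW(L"C:\\", FILE_LIST_DIRECTORY,
                             FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                             nullptr, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, nullptr);
    BYTE buf[64 * 1024];
    DWORD bytes = 0;
    while (ReadDirectoryChangesW(dir, buf, sizeof(buf), TRUE,
            FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_SIZE,
            &bytes, nullptr, nullptr)) {
        for (auto* fni = (FILE_NOTIFY_INFORMATION*)buf;;
             fni = (FILE_NOTIFY_INFORMATION*)((BYTE*)fni + fni->NextEntryOffset)) {
            // fni->Action says added/removed/modified; apply it to the in-memory tree.
            if (fni->NextEntryOffset == 0) break;
        }
    }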
With RAM sizes now, it's curious why any OS wouldn't just cache some or all of the metadata for local volumes on a block basis, rather than incur the greater resource usage of transforming the on-disk structures into different ones and then caching and tracking individual entries.
I am building an advanced file manager (FileNinja) for Windows with fully integrated Everything search & query. You have the option of saving bookmarks to virtual folders that consist of Everything searches. Instant directory sizes, tags, custom file descriptions for NTFS. Anyone interested?
https://youtu.be/JREufgkf5pk?si=sP05UCOrskpX8OTq
Try Everything 1.5a - an "alpha" version with many improvements, in development for years but inexplicably hidden away on their website. Never experienced any instability.
You should not be starting it when you want to search. You should open it when you log in, and leave it in the tray. It will do a full index on launch then subscribe to filesystem notifications to keep itself up to date for as long as it’s open.
Do that and it’s alarmingly fast and responsive except for the minute or two right after launch.
Contrasting seemingly all the other responses to this, I use it the same way you do (only opening it when needed) and I'm fine with the delay: even at its slowest rebuilding the index and searching is faster than the in-built windows Search.
WizTree also understands things like OneDrive and Dropbox, and knows that files "stored in the cloud" aren't taking up any disk space -- WinDirStat thinks my drive is 140% full.
Wiztree and WinDirStat will both double count hard links. I have a 12TB hard drive holding "17TB" because of sparse files and hard links. Windows file manager properties agree with Wiztree and WinDirStat as far as space used. I think the file manager looks for free space and calculates that separately, while Wiztree and WinDirStat are just adding up used space.
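A scanner that wanted to count hard-linked bytes only once could check each file like this (a sketch; `path` stands for the file being scanned, and the volume-unique file index plus link count is enough to dedupe):

    #include <windows.h>
    #include <cstdint>

    BY_HANDLE_FILE_INFORMATION info;
    HANDLE h = CreateFileW(path, FILE_READ_ATTRIBUTES,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                           OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, nullptr);
    if (h != INVALID_HANDLE_VALUE && GetFileInformationByHandle(h, &info)) {
        if (info.nNumberOfLinks > 1) {
            uint64_t id = (uint64_t(info.nFileIndexHigh) << 32) | info.nFileIndexLow;
            // Remember 'id' in a set; only add the size the first time it's seen.
        }
        CloseHandle(h);
    }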
Wiztree takes like 3 seconds where WDS takes 30. In really big analysis and cleanup scenarios with rescans, it's enough to let you do your job faster. In everyday scenarios, it removes any hesitation to visualize a system. It's basically free and near instant.
Fact is, the WDS community must be kind of abandoned, or else it would be doing the same trick. It's SO much faster that it becomes a genuine quality of life improvement. I need it, and don't mind using a non-free tool until the OSS solution has the capability.
Didn't try AltWinDirStat, but did try FastWinDirStat.
The thing is, FastWinDirStat uses a licensed proprietary component. No problem for me, but the author did have some back and forth with another user on GitHub.
Seems the FastWinDirStat license doesn't match with using a closed-source library, or something...
As for its actual functioning, it does as it says. Works much faster than WinDirStat.
Looks like a pretty clear violation of the WinDirStat license. They took WinDirStat which is GPL, linked it with some other proprietary code and distributed the result.
(They could have been clear-ish (with caveats) by distributing only the source code and let the users do the compiling and linking, similarly to how you could download ZFS and build it into Linux. But you mustn't distribute the result further.)
You’ve got me interested but I’m finding it quite annoying that WizTree doesn’t actually have pictures of the software UI on the website. At least not under any of the obvious links I’ve checked.
SpaceSniffer's UI is less clunky, but Wiztree's scan is an order of magnitude faster. That kind of speed difference can affect when you're willing to use the tool.
I find myself much more willing to pop open WizTree to get a quick view of my system or a particular storage folder.
They are getting very close to releasing windirstat-next [0] and already have some betas out, you can learn more about it in the subreddit. [1]
>WinDirStat fans,
>As a new pet project, I recently started some substantial revisions to WinDirStat in my GitHub branch and will work with the current maintainer (Oliver Schneider) to eventually publish a new release hopefully in the next few months. The current changes on deck speed up performance drastically (seconds compared to minutes in some cases). It uses less memory compared to recent alternatives (WizTree), is faster at scanning network paths, and obviously isn't pushy about donating (although I certainly would not discourage folks from donating to their favorite opensource projects).
>Oliver recently opened up the GitHub Issues trackers, and I would love to hear suggestions or known bugs for the existing version:
>For the nerds interested in the changes I have queued up, you can visit the GitHub page
It really frustrates me when a project has a GUI and there are no screenshots of it on their github.
I know exactly what WinDirStat looks like; I have no idea what this will/does look like, and shouldn't have to install it to find out. I don't like WinDirStat's UI.
I have done this before, and it works, but the client downloads every single file locally in the background before it's able to do the comparison. I suspect this is because WinDirStat requests metadata about the file itself, and the sync client needs to download the files in order to serve that metadata and/or file contents information to the application.
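If that's the cause, a scanner could avoid triggering downloads by checking placeholder attributes already present in the enumeration data instead of opening files (a hedged sketch; these attribute flags exist in recent Windows 10+ SDKs):

    // Cloud-sync placeholders advertise themselves via attribute bits that
    // FindFirstFile/FindNextFile already return, so no open (and no download)
    // is needed just to notice them.
    bool IsCloudPlaceholder(const WIN32_FIND_DATAW& fd) {
        return (fd.dwFileAttributes &
                (FILE_ATTRIBUTE_RECALL_ON_OPEN |
                 FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS)) != 0;
    }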
About a decade ago when I worked in IT I used Treesize Pro to scan SMB volumes and delete the MOVIES that employees had downloaded on their computers, which meant they were downloading MOVIES to our SAN because we used roaming profiles. I'll never get over that these people downloaded literal pirated movies to our servers. It was constant. I ended up putting file size limits on their directories, or a file extension block, I forget.
The free version can't scan network drives if the server's on a domain, IIRC.
edit: Oh I thought I read SMB drive not google drive for some reason. I'm not sure if this does that or not. It's my favorite GUI so worth checking out.
I like "ncdu", a TUI equivalent for Unixy systems.
Although I learned the hard way that if you run it on a Mac home folder, and have iCloud's "optimize Mac storage" turned on, macOS will suddenly try to download literally everything in your iCloud storage to try to count the size of it, probably filling your disk. Oops.
There's also "diskonaut", a TUI which displays the output like the treemap of WinDirStat. Bonus is that the display is incremental and updates as it scans everything, so you don't need to wait for the complete scan to see how everything looks like.
Written in Rust, and it's a `cargo install diskonaut` away if you have the rust toolchain installed.
I find Space Sniffer http://www.uderzo.it/main_products/space_sniffer/ to be a much better visualisation. It updates in near real-time. If you have lots of copy/move ops going in the background, you will see those dirs as blinking rectangles growing/shrinking in Space Sniffer.
I used Space Sniffer for a very long time, but looked for alternatives because it crashes somewhat regularly and is generally a bit of a resource hog.
Since trying WizTree I don't think I'll ever go back, it's so much faster, hasn't crashed once on me, and the visualizations are completely adequate for me to be able to see where space is being used.
Also, SpaceSniffer is the only tool of all the popular ones that correctly accounts for the size of files with NTFS compression, accurately accounts for directory junctions and links (by not counting them), and for NTFS alternate data streams.
I really like the visualization, the functionality to watch live file access is so freaking cool. It's a bit slow though, no? After using something like WizTree, it's hard to go to something that needs minutes to finish scanning. I do wonder what kind of a performance impact it has.
Not open source (freeware), but much faster than WinDirStat for NTFS - WizTree https://diskanalyzer.com/. In short - it scans the actual filesystem metadata directly instead of enumerating files through the OS APIs, which makes it extremely fast.
Your first statement is already refuted by other commenters.
For the second, the reason is official software support, internationalization and accessibility. Microsoft provides certain guarantees for its officially released software. They don't want to provide those for PowerToys.
I have used WinDirStat for years. It's not perfect, but it solves my use case very well.
My use case is just: my disk is full, I don't know why. This happens on one of my computers like once a year, so the fact that it's slow is fine. It usually helps me spot some folder set that is taking up a lot of space that I don't need on that PC, or something large that is duplicated.
My personal favorite example is wedding photos and videos. Turns out: those are huge, I am not going to delete them, but they don't need to be backed up on every computer I own.
Treemaps are generally a really cool way to visualise hierarchical data. See also the Observatory of Economic Complexity, which has treemaps of international trade and economic statistics.
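The layout idea is simple enough to sketch: split a rectangle among children in proportion to their sizes, alternating the split axis per level ("slice and dice"; real tools usually use squarified variants for nicer aspect ratios):

    #include <vector>

    struct Node { double size; std::vector<Node> children; };
    struct Rect { double x, y, w, h; };

    // Recursively subdivide 'r' among n's children, alternating axis per level.
    void Layout(const Node& n, Rect r, bool horiz, std::vector<Rect>& out) {
        out.push_back(r);
        double total = 0;
        for (const auto& c : n.children) total += c.size;
        if (total <= 0) return;
        double offset = 0;
        for (const auto& c : n.children) {
            double frac = c.size / total;
            Rect cr = horiz
                ? Rect{r.x + offset * r.w, r.y, frac * r.w, r.h}
                : Rect{r.x, r.y + offset * r.h, r.w, frac * r.h};
            Layout(c, cr, !horiz, out);
            offset += frac;
        }
    }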
There are quite many apps for visualizing disk folder structure for almost any OS. Any flavor you like: lists (diskwave, omnidisk), treemaps (GrandPerspective), pie charts (DiskSavvy), sunburst (baobab), icicle, etc.
The two winning visualization types are sunburst and treemaps.
Both have their own pros and cons, but in our tests sunburst performed slightly better for regular users.
My personal bet is that no disk space analyzer's developer took it seriously or tried to actually advance the algorithms. Most of the apps I know use quite straightforward implementations and haven't been touched for years. I guess a little bit of filtering, grouping and changing coloring algorithms could significantly improve the treemap's perception, but someone has to do the job.
disclaimer: I'm the original designer of DaisyDisk.
Same for me. Before using it I was using WinDirStat and before that SequoiaView, but I've stuck to using only TreeSize Free for years now. It's good enough and the user interface is very clean.
It still tickles me that many Linux distros come with a tool like this preinstalled (https://wiki.gnome.org/action/show/Apps/DiskUsageAnalyzer). Ah, the good ol' Windows days of having to hunt down tools that were actually useful. Far in the rearview mirror for me.
Note: There’s a gotcha when using it in multi-user environments (like a server).
Users with Administrator access do not have permission to enumerate directories/files inside other Admin users' home directories. So any per-user files are not counted in this scenario.
Source: ‘The mysterious case of the Windows server with a full disk but WinDirStat shows it as only half-full’ :-)
I use WinDirStat. At one point a number of years ago, I became curious about the parent program KDirStat. So I actually got that installed on some Ubuntu or something. It was interesting. Like a prototype for WinDirStat or something.
Wow. A system utility that reads the Windows MFT is about the last thing (after system drivers maybe) that I would expect to work under an emulator on Android.
It's a longtime friend of mine who has kept my computer from wasting space.
But it's pretty outdated, and I think there are better programs out there now.
I like Diskitude by Evan Wallace the most for this kind of quick and easy inspection of drive content sizes.
It does a full scan, but is pretty fast and easy to use. And it's super tiny.
I was about to share this; I just told someone about it the other night. I've been using it well over a decade now without any issue, the same exe I downloaded in 2011 or so.
- https://github.com/bootandy/dust (command line, extremely fast)
- mate-disk-usage-analyzer (gui/gtk, a bit more intuitive and allows operating on files too).
I'm a fan of WinDirStat. Yes it can be slow, but it runs on darn near every Win OS, hasn't changed in years, is a small executable, and the site always seems available. I used it many times to solve disk space issues on the job.
It's likely the product is now abandonware or no longer developed. I'm sure this is the same version (or just a point release off) that I had on my PC about a decade or so ago. That said, that version worked OK.
Still used in Windows IT environments as it's portable, even if it is a bit slow and there is no console version for generating tree-maps for remote viewing.
It is known to run on Windows 95 (IE5), Windows 98 SE, Windows ME, Windows NT4 (SP5), Windows 2000, Windows XP, Windows Vista, Windows 7, 8 and 8.1.
...it is also known to run on Windows 10 and 11, and likely any newer versions too. IMHO this is a great example of how software should be. One tiny binary that is very widely compatible, doesn't have any user-hostile "features", and remains stable and bug-free. It's a contrast from an industry that largely can't produce such achievements, is becoming increasingly hostile, and quite telling when there are already comments here complaining about its age.
There's also a fork of WinDirStat patched to read the MFT but I don't know anyone who's tried it: https://github.com/ariccio/altWinDirStat