NFS > FUSE: Why We Built Our Own NFS Server in Rust (xethub.com)
141 points by ylow on Sept 19, 2023 | 91 comments



For people who are interested in doing something similar in Go, some time ago I implemented a generic VFS that can be exposed both via FUSE and NFSv4.

It’s part of Buildbarn, a distributed build cluster for Bazel, but it can also easily be used outside that context.

Details: https://github.com/buildbarn/bb-adrs/blob/master/0009-nfsv4....

My recommendation to the authors would be to use NFSv4 instead of NFSv3. No need to mess around with that separate MOUNT protocol. Its semantics are also a lot closer to POSIX.


Very nice. The mount protocol is pretty minimal, though different OS clients seem to use it differently. The NFSv3 chattiness is definitely a thing, but since it's all localhost it's not "too bad". v4 will be nice and is something I would love to try to implement. It's quite a bit more complex though.


Another possibility I am interested in is SMB3. (SMB4 is rather nasty). Similarly, supported by most operating systems, and does not look too horrifying to implement (The protocol documentation is really really hard to read though). And has the advantage that it does not require Windows Pro edition.


It appears that Windows still doesn't have an NFSv4 client: https://learn.microsoft.com/en-us/windows-server/storage/nfs...


This is awesome. NFS is very much undervalued as a cross-platform interface for talking to a (not necessarily remote) filesystem.

~20 years ago I was using SFS (Self-certifying file system), which was an SSH-like, Internet-usable TOFU filesystem, using NFS as the "backend" for talking to the host OS. The site has since disappeared, but is archived at https://web.archive.org/web/20080330201843/http://www.fs.net....


My only experience with NFS is it being excruciatingly slow. Is that not a universal issue?


If anything, it's probably one of the fastest ways to access files remotely that I have come across.


When I shared the same content over NFS and CIFS from TrueNAS SCALE, NFS was faster by about 15%.


> XetHub has the world’s first natively cross-platform, user-mode filesystem implementation, allowing you to mount arbitrarily large datasets on your machine.

Not really world's first. CERN has developed EOS (https://eos-web.web.cern.ch/) for many years, and even though it's not available natively on Windows, it is available on Linux and macOS. EOS uses FUSE, though, not NFS.

> This enables you to, in just a few seconds, locally mount ~660 GB of Llama 2 models or write DuckDB queries to analyze large parquet files and scan just the data you need.

If you mount all instances of EOS at CERN on your machine with the FUSE client, that in principle mounts hundreds of PB of data from LHC experiments, although much of this data requires special permissions to be accessed. However, there's also a lot of open data. See https://opendata.cern.ch/.


If it's not available on the most popular OS, it's not cross-platform


The client is available on Windows via a samba gateway. There's also a sync client for CERNBox for Windows, and even Android and iOS (https://cernbox.web.cern.ch/cernbox/downloads/). The server is not available on Windows, though. However, I'd say that among physicists Windows is definitely the least popular OS.


Gluster did this about a decade ago, though not in Rust. Actually twice - home-grown NFSv3 and Ganesha-based NFSv4. At Facebook we used the NFSv3 version to run dozens of clusters serving tens or perhaps hundreds of thousands of machines. My own involvement with that piece was minimal (I was a maintainer of Gluster overall for a while) but it worked pretty well for those users who needed a real mountable filesystem with real files instead of vaguely file-like objects that let you down every time you try to step beyond whole-file read and write.


Nice! I did look at Ganesha when I was exploring other NFS server implementations I could perhaps borrow and simplify. It looked great, but pretty much all implementations are too "real-filesystem" for me to convert. I needed something I could put a very simple virtual filesystem interface over, and so I wrote my own.


(Love the Gluster project btw. Thanks for maintaining it!)


Thanks! I'm no longer a maintainer, or even working in tech, but still glad to share my experiences if folks might find them useful. I'm Obdurodon most places, @hachyderm.io on Mastodon, if you want to hit me up any time.



Anyway, interesting idea. As another comment pointed out, I wish there was an easy way to mount NFS and 9p shares as a user.

I've been toying with 9p+Wireguard lately, as an alternative to SMB/Webdav/sftp/etc for sharing files with friends over the Internet. Wireguard+NFS works well enough, but it's relatively hard to integrate in userspace. Toying with rust-9p[1], I obtained acceptable performance. I still have to experiment with putting Wireguard in userspace as well (probably with Tailscale's implementation), but I come back to the difficulty of mounting 9p and NFS shares as a user. I also wish it was simpler to export directories over both protocols, on an unprivileged port, with access rights limited by those of the user.

Regarding the article content, I find 9p to be even simpler and probably more ubiquitous than NFS, though I don't know if MacOS and Windows natively support it.

[1]: https://github.com/pfpacket/rust-9p


If you haven't seen it, Cloudflare also has an implementation of Wireguard in userspace, called Boringtun [0], written in Rust and successfully deployed to millions of iOS and Android devices running the 1.1.1.1 app.

For reference, another Rust project that depends on Boringtun is Onetun [1], which uses it to encrypt packets sent over a virtual smoltcp interface. I imagine you could follow a similar approach to integrating rust-9p with Boringtun, and you wouldn't need to leave the Rust ecosystem (whereas you might face more obstacles integrating it with Tailscale's wgengine, which is written in Go).

[0] https://github.com/cloudflare/boringtun

[1] https://github.com/aramperes/onetun


> Anyway, interesting idea. As another comment pointed out, I wish there was an easy way to mount NFS and 9p shares as a user.

Add something like foo.bar.com:/path /net/foo.bar.com nfs user 0 0 to /etc/fstab? The "user" mount option lets an unprivileged user mount it.

> Regarding the article content, I find 9p to be even simpler and probably more ubiquitous than NFS, though I don't know if MacOS and Windows natively support it.

Windows certainly does not. MacOS I don't know, but if you mean "natively" as "stock install without third-party software", then probably not.


> Windows certainly does not.

Actually, scratch that, comments below suggest that WSL2 uses 9P for file sharing with the host, so it's possible that a version of Windows with the WSL2 bits installed will support 9P.


Hi! Author here. Happy to answer any questions!


Have you run any POSIX compliance checks against it? In particular, how are you handling assignment of inode numbers (fileid in NFS speak) to "virtual" files and making sure those stay reasonably consistent? (I vaguely remember this being a bit of a tripwire)

Also did you look at NFSv4 / what made you decide to go with NFSv3? My (superficial) impression was that NFSv4 did a lot of simplifying and throwing out legacy stuff (e.g. rpcbind), but I don't know too many details on this…

[edit: NFSv4 answered on parent / https://news.ycombinator.com/item?id=37575304 ]

FWIW I think NFSv4 would be a rewrite rather than an extension :D


I tested by uh... building a "mirrorfs" and compiling this project in it :-) (cargo does stress it pretty hard). I have not looked for POSIX compliance testers. Is there one you can point me to?

NFSv4 did throw out a lot of the legacy stuff, but also added a bunch of other state information like locks and delegation which is quite a bit more annoying to implement. Technically cos this is meant for "localhost-mounting-only", delegation should be easy to build (since we are expecting only exactly 1 mount). But v3 had far fewer APIs to implement, and so is a good starting point.


Note that there’s absolutely no requirement to support more advanced features such as locking and delegations. Even if a client offers to accept a delegation, a server may just deny/ignore it as part of OPEN.


There's https://github.com/google/file-system-stress-testing but in this case I believe it would mostly stress-test the NFS client rather than your code… and I have never tried to use it, no idea if it even does anything useful.

I know from lore that there are compliance-test suites for the POSIX file system API, but I believe those are commercial products :(


xfstests is very thorough and reusable beyond xfs.


If I may ask, is this mirrorfs available somewhere? The demofs from nfsserve seems to create and serve a pure in-memory fs tree, if I read the sources correctly.


NFSv4 is a completely different protocol and basically doesn't share anything with v2/v3. While technically it's still implemented using Sun RPC, it only has two procedures: NULL and COMPOUND. Version 3 is actually an RPC protocol, with a different procedure for each filesystem operation (READ, WRITE, CREATE, ...). One of the genius moves Sun made with NFS was to distribute the protocol definition files, from which rpcgen autogenerates stubs (in C), making it easy to implement servers and clients. Nothing like that exists for v4.
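For anyone curious what the Sun RPC layer actually looks like on the wire, here's a rough sketch (mine, not from the thread): per RFC 5531 the call header is a sequence of big-endian 32-bit words, and with AUTH_NONE credentials the whole thing for an NFSv4 COMPOUND fits in ten of them:

```rust
// Encode a minimal Sun RPC (RFC 5531) call header: big-endian u32 fields,
// AUTH_NONE credential and verifier. Over TCP this would additionally be
// preceded by a 4-byte record-marking header.
fn rpc_call_header(xid: u32, prog: u32, vers: u32, proc_num: u32) -> Vec<u8> {
    let words = [
        xid,      // transaction id, echoed back in the reply
        0,        // msg_type: CALL
        2,        // RPC protocol version, always 2
        prog,     // program number (NFS is 100003)
        vers,     // program version (3 for NFSv3, 4 for NFSv4)
        proc_num, // procedure number (for NFSv4, almost always 1: COMPOUND)
        0, 0,     // credential: flavor AUTH_NONE, zero-length body
        0, 0,     // verifier: flavor AUTH_NONE, zero-length body
    ];
    words.iter().flat_map(|w| w.to_be_bytes()).collect()
}

fn main() {
    // An NFSv4 COMPOUND call: program 100003, version 4, procedure 1.
    let hdr = rpc_call_header(0xdeadbeef, 100003, 4, 1);
    assert_eq!(hdr.len(), 40); // ten 4-byte words
    println!("{:02x?}", hdr);
}
```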


Look at RFC 7531. XDR definitions for NFSv4 exists, and they can also be used to generate C bindings if you want.


Sure, running it through rpcgen generates bindings for both RPCs, NULL and COMPOUND:

    /*
     * Remote file service routines
     */
    program NFS4_PROGRAM {
            version NFS_V4 {
                    void
                            NFSPROC4_NULL(void) = 0;

                    COMPOUND4res
                            NFSPROC4_COMPOUND(COMPOUND4args) = 1;
            } = 4;
    } = 100003;


Which is also what you want, right? The issue is that with compound calls there is some state that’s carried over between operations (current/saved file handle), so you’d need to implement that yourself anyway.


Well yeah that's the difference, it doesn't generate the state machine for you so you "just" need to implement it yourself. For v3, rpcgen spits out a working function that you link into your program and you're done (on the client side anyway). Much easier.


> You simply need to implement the vfs::NFSFileSystem trait.

Wait, what......

.. So not only is it an easy-to-run Rust NFS server and client (I gave up on NFS some years ago because I couldn't figure out how to run Samba) but it's also meant to use as a _substitute_ for FUSE in stacks like SSHFS or an app wanting to serve debug data as files?

Awesome!


Exactly. You build the server-side (as a FUSE alternative), and just use the OS's NFS client to mount it.
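As a flavor of the approach, here is a deliberately simplified stand-in (mine, not nfsserve's actual API; the real vfs::NFSFileSystem trait is async and has more methods, per the article and comments below): one trait, one in-memory impl, and the OS's NFS client does the rest.

```rust
use std::collections::HashMap;

// A simplified stand-in for a virtual-filesystem trait: the real one in
// nfsserve is async and covers getattr/readdir and friends, but the idea
// is the same -- map NFS operations onto any virtual namespace you like.
trait VirtualFs {
    fn lookup(&self, parent: u64, name: &str) -> Option<u64>;
    fn read(&self, fileid: u64, offset: u64, count: usize) -> Option<Vec<u8>>;
}

// An in-memory "filesystem" with a single root directory (fileid 1).
struct MemFs {
    entries: HashMap<&'static str, u64>,
    contents: HashMap<u64, Vec<u8>>,
}

impl VirtualFs for MemFs {
    fn lookup(&self, parent: u64, name: &str) -> Option<u64> {
        (parent == 1).then(|| self.entries.get(name).copied()).flatten()
    }
    fn read(&self, fileid: u64, offset: u64, count: usize) -> Option<Vec<u8>> {
        let data = self.contents.get(&fileid)?;
        let start = (offset as usize).min(data.len());
        Some(data[start..(start + count).min(data.len())].to_vec())
    }
}

fn main() {
    let fs = MemFs {
        entries: HashMap::from([("hello.txt", 2)]),
        contents: HashMap::from([(2, b"hello, nfs".to_vec())]),
    };
    let id = fs.lookup(1, "hello.txt").unwrap();
    println!("{}", String::from_utf8(fs.read(id, 0, 5).unwrap()).unwrap());
}
```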


Sweet! I've been monkeying around with writing an HFS interpreter (in rust ofc). Unfortunately 'fuser' is less than ideal on a Mac. This could be an interesting alternative.

Edit: Oh. It's async. The other bit that I liked from fuser: the default implementations in the trait cut down on the boilerplate. Setting the RO mount option got you defaults that DTRT.


You can interoperate async code and threads, but yes, you have to tolerate having Tokio or similar as a dependency.


I don't think this includes a client, in fact the post argues that not having to implement the client because it already exists in the kernel was a major benefit of the approach…


This library is amazing. Also very happy to see such a permissive license.

Btw, why did you decide to use NFS instead of WebDAV? WebDAV should be easier to implement and is also supported on all platforms.


I have not looked at WebDAV closely. I was looking for something that is already supported in all the major operating systems without additional libraries. I don't believe WebDAV is commonly supported out of the box?


WebDAV is supported by Windows, macOS and Linux. On all platforms you can mount it without admin/sudo (`net use` on Windows, `mount_webdav` on macOS and `gio mount` on Linux). There is an official Go package[1] with which you can implement your own WebDAV server (you only need to implement its FS interface).

[1] https://pkg.go.dev/golang.org/x/net/webdav


Interesting. I do not know much about webdav. Great to learn something new! Will take a look.


WebDAV on Linux works over (drumroll) FUSE :D


I believe you can install an NFS client on windows home if you just go to programs and features > add a feature > NFS client, but I don't have a way to check.

The fact you just need to impl one trait is amazing. I will definitely be testing this out


You're right, there is no way to do it in windows home edition


Another option is Plan 9's 9P; Windows uses the 9P protocol to mount host/guest directories and drives for I/O in WSL.


Wow I didn't know that! https://superuser.com/questions/1749690/what-is-this-weird-p... that is really interesting.


This is clever, considering the constraints. We use a fork of catfs right now to have a read-through cache over NFS, but we could, in theory, use a similar flow to do it, and this will be a good example. Nice. Part of the problem, though, is that we currently use extended attributes, and I think you need NFSv4 plus a modern Linux to get those across NFS.


Doing some quick googling, I see https://lwn.net/Articles/353831/ which sounds like an xattr protocol extension to nfsv3. I do not know the extent to which it is supported though...


Couldn't make it work. But works with v4 on Linux 6.


NFSv3? Wouldn’t NFSv4 be more appropriate?


NFSv4 is quite a bit more complicated and is also stateful which makes it a bit more challenging to implement. NFSv3 is a good starting point, and is still supported on all the major platforms. Certainly, extending with a v4 implementation would be great future work.


Also keep in mind that NFSv4 mostly adds necessary complexity, not accidental complexity.

For example, NFSv4 supports accessing files that have been unlinked but still have one or more open file descriptors. NFSv3 does not; it has to resort to a technique named ‘silly renames’.

http://linux-nfs.org/wiki/index.php/Server-side_silly_rename


You will have to throw away almost all of your code for NFSv4, it's a completely different protocol. One of the advantages of 4.x is that you don't need the portmapper or mount protocols, you just need to know the hostname/port of the server.


You can run NFSv3 without the portmapper, and for most servers (like unfs3) the mount protocol is served by the same process as the filesystem, so it isn't two daemons -- just one.


Version 3 will be supported until the end of time. Version 4 was an evolutionary dead end that never took off or saw much adoption.


Funnily enough, "the end of time" for NFSv3 is in about 15-83 years, depending on whether the implementation uses signed or unsigned 32-bit integers for timestamps: https://lwn.net/Articles/717076/
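The arithmetic checks out; a quick sketch (mine, with the year length approximated by the average Gregorian year):

```rust
// Sanity-check the "15-83 years" claim (as of 2023): NFSv3 timestamps are
// 32-bit seconds since the Unix epoch (1970), so the rollover year depends
// on whether an implementation treats them as signed or unsigned.
fn rollover_year(max_seconds: u64) -> u64 {
    1970 + max_seconds / 31_556_952 // average Gregorian year in seconds
}

fn main() {
    let signed = rollover_year(i32::MAX as u64); // 2^31 - 1 seconds -> 2038
    let unsigned = rollover_year(u32::MAX as u64); // 2^32 - 1 seconds -> 2106
    println!("signed: {signed}, unsigned: {unsigned}");
}
```

2038 and 2106 are 15 and 83 years out from the comment's date, respectively.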


True, but I define end of time as a 32-bit integer number of seconds from 1970.


The only sensible interpretation frfr


Version 4 has been implemented in and widely used in Linux, FreeBSD, Solaris, Microsoft Windows...

I'm not sure what value of "much adoption" you're using, but I think it's had great adoption.


It's widely implemented but not adopted. Most vendors implemented v4 to tick a box for some RFQ. I don't think any of the commercial fileserver vendors publish numbers but I used to work in that space and internally customer adoption of v4 was always a single digit percent. When I was on the customer side in HPC none of the supercomputer clusters were using v4 (I'm sure there are exceptions that I wasn't aware of).


I use it all the time. It's a huge improvement over NFSv3 for many reasons (performance, network ports, locking, ACLs, ...).

All the operating systems I named have fully implemented and mature NFSv4 stacks. Definitely not an MVP to tick a box.


That’s completely incorrect. NFSv4 was a quantum leap in performance; greatly reducing network traffic via COMPOUND and dramatically improving write performance via delegation.

All my file access is over NFSv4 (in a mix of 10 GbE and 100 GbE).

Having GP’s crate expand to support NFSv4 would be welcome, but it's a nontrivial bit of work.


NFS4 is a huge ugly bloated mess of a protocol, and they keep adding more stuff to it.

NFS3 is small (I wouldn't say "elegant" but by comparison...), works, and doesn't have anything wrong with it.


NFSv4 has sane, good locking out of the box. Implementing lockd on an NFSv3 server is a big world of hurt.


Is there some way to configure NFSv4 not to silently corrupt data or surface transient network errors to userspace?

(Correct NFSv3 implementations block forever, which is the best of three bad alternatives.)

https://access.redhat.com/solutions/1179643


NFSv4 should never do that, and I've never seen it happen.


Then again, if you're using file locking on a distributed filesystem at non-trivial scale you're in for a big world of hurt anyway. Source: I was a developer and maintainer of exactly such a beast for over a decade.


Page is completely unreadable in Firefox for Android.


No, it's that Dark Reader doesn't play nice with the background texture on that site. Perfectly readable with Dark Reader disabled.


Thank you kind soul!


NFS seems to be a good fit for their use cases; however, I don't believe NFS supports an inotify-like interface, which makes it less useful for use cases like virtual filesystems for developer environments.

Maybe newer NFS revisions allow the NFS server to notify clients of changes like this, which would really open things up here.


NFSv4.1 and later do, via a callback operation named CB_NOTIFY.


That's great to hear. Thanks!


Very good point. Is there anything out there that's like NFS but also supports inotify? By "like NFS" I mean a remote filesystem protocol (not a block-level protocol) that works across platforms and can support nearly all filesystem features.


I guess these are used as a soft mount, considering that if you terminate the application or it crashes a hard mount would just hang indefinitely. That's probably fine for a read-only snapshot, most problems with soft mounts are around writes.


I wouldn't recommend "hard" mounts to anyone.


Well you trade system lockups with sometimes getting IO errors under load, which one is better ... shrug


Nice. A similar project, but more license-constrained, is fuse-t for macos: https://github.com/macos-fuse-t/fuse-t


Did I read correctly, NFSv3?

IIRC the statelessness of nfs was largely considered a design mistake, and there usually are a large number of hacks to overcome the consequences.

So much so that nfsv4 moved to a stateful protocol…


I don't know that it's considered a "design mistake". It is not without flaws, but it does make it extremely simple to implement. It's a very acceptable tradeoff between simplicity and performance. I will certainly love to implement NFSv4, or even SMB3, at some point though.


So glad there is at least one other human being out there who understands what a huge step backwards NFSv4 was.

NFSv3+wireguard is approximately the perfect network filesystem protocol.


Care to elaborate on why NFSv4 was a huge step backwards compared to NFSv3? I'm not familiar with the details of either protocol.


So it’s a local file system driver accessed via NFS? Is that right? I get that NFS is on one side, but what’s on the other?


FUSE is used to present a file system interface of something that isn’t a file system. E.g. present an LDAP directory as a file system.

Writing FUSE drivers is pretty frustrating, though, and not well supported by all OSs.

My takeaway is that this project discards FUSE in favour of making an NFS server that can be easily extended to do the same thing: present non-file system systems as mountable nfs remote file systems.

It’s a neat idea, and does avoid any extra kernel drivers, etc. A safer abstraction at the cost of adding a network stack and NFS’s sometimes wonky effects (especially with hard mounts).


This NFS server is on one side, and the OS's built-in NFS client is on the other.


It appears as a network drive on all major operating systems and can be accessed as any other local filesystem (through kernel syscalls).


This is interesting, is it possible to mount NFS as a client in an unprivileged user namespace on linux?


On Gnome you can use `gio mount` otherwise afaik no.


"I love files"

Who doesn't?


People waiting on a backup process to restore over a billion tiny files.


Uncompressed DPX video files. For those, a SAN exporting a block device (iSCSI, AoE) is preferable.



