Freqfs: In-memory filesystem cache for Rust (docs.rs)
142 points by haydnv on Oct 1, 2021 | 76 comments



Musing: sometimes I wish file systems & databases were unified. I'm imagining just a single fast DB engine sitting on my storage, with your traditional file system structure just being tables in there. I kinda treat SQLite like that already, but it's not as transparently optimized as it could be for large files. Why? I don't want to mentally jump around between technologies. I want to query my FS like a DB, and I want to store files in my DB like a FS. The reality, though, is that no one-size-fits-all DB exists.

And more on topic: tokio-uring is really fast [1], and I'm really loving tokio in general.

[1] https://gist.github.com/munro/14219f9a671484a8fe820eb35d26bb...


We've been there. Before we had filesystems as we know them today, there were many different ways of persistent data storage. Roughly these could be grouped into two camps: The files camp and the records camp.

The record based approach had many properties we know from modern databases. It was a first class citizen on the mainframe and IBM was its champion.

In my opinion hierarchical filesystems won as everyday data storage because of their simplicity and not despite it. I think the idea of a file being just a series of bytes and leaving the interpretation to the application is ingenious. That doesn't mean there is no room for standardized OS-level database-like storage. In fact I'd love to see that.


I've always been annoyed how searching a few hundred thousand NTFS records for a filename containing some arbitrary text takes a relatively long time - even using specialized tools like FileLocator Pro which I believe directly scan low-level structures like the MFT - while I can do an equivalent search in a SQL database in milliseconds. I wish filesystems like that one had vastly more performant indexing structures for the metadata (without relying on add-on layers that defer indexing and - at least in my experience - tend to break down or be out of date or obfuscate files they think you don't care about - I'm looking at you, Windows Search).


I've been using a free tool by voidtools called "Everything" for years; it provides almost-instant search on Windows NTFS volumes: https://www.voidtools.com/support/everything/


Thanks. I used it years ago but it didn't suit my needs. I haven't found an indexing tool that does.

It's the difference between synchronous indexing that's baked into the system (as in file system metadata structures and database indexes, which update at the same time your data is changed) vs. fragile add-ons that index asynchronously (which in general I find tend to be too slow to update, missing results, and prone to breaking).
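To make the synchronous-vs-asynchronous distinction concrete, here is a minimal Rust sketch of a filename table whose substring index is a trigram index updated inside the same call that adds or removes a name, so a search can never see stale results. All names are hypothetical and this is not how NTFS or any real filesystem stores its metadata; it is in-memory only, and an on-disk version would also have to make the index update crash-consistent with the metadata change.

```rust
use std::collections::{HashMap, HashSet};

/// Lower-cased 3-character windows of a name ("report" -> "rep", "epo", "por", "ort").
fn trigrams_of(s: &str) -> Vec<[char; 3]> {
    let chars: Vec<char> = s.to_lowercase().chars().collect();
    chars.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

/// Hypothetical metadata table (file id -> name) with a trigram index that is
/// maintained in the same call that changes the table, so it is never stale.
#[derive(Default)]
pub struct NameIndex {
    names: HashMap<u64, String>,
    trigrams: HashMap<[char; 3], HashSet<u64>>,
    next_id: u64,
}

impl NameIndex {
    /// Add a name; it is searchable as soon as this returns.
    pub fn insert(&mut self, name: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        for t in trigrams_of(name) {
            self.trigrams.entry(t).or_default().insert(id);
        }
        self.names.insert(id, name.to_string());
        id
    }

    /// Remove a name and its index entries in the same step.
    pub fn remove(&mut self, id: u64) {
        if let Some(name) = self.names.remove(&id) {
            for t in trigrams_of(&name) {
                if let Some(ids) = self.trigrams.get_mut(&t) {
                    ids.remove(&id);
                }
            }
        }
    }

    /// Case-insensitive substring search, results sorted by name.
    pub fn search(&self, needle: &str) -> Vec<&str> {
        let needle = needle.to_lowercase();
        let grams = trigrams_of(&needle);
        let candidates: Vec<u64> = if grams.is_empty() {
            // Needles shorter than 3 characters: fall back to scanning every name.
            self.names.keys().copied().collect()
        } else {
            // Intersect the posting lists of all of the needle's trigrams.
            let mut acc: Option<HashSet<u64>> = None;
            for g in &grams {
                let ids = self.trigrams.get(g).cloned().unwrap_or_default();
                acc = Some(match acc {
                    None => ids,
                    Some(prev) => prev.intersection(&ids).copied().collect(),
                });
            }
            acc.unwrap_or_default().into_iter().collect()
        };
        // Trigram matches can be false positives, so verify each candidate.
        let mut hits: Vec<&str> = candidates
            .iter()
            .filter_map(|id| self.names.get(id))
            .filter(|name| name.to_lowercase().contains(&needle))
            .map(|name| name.as_str())
            .collect();
        hits.sort();
        hits
    }
}
```

A search only touches the posting lists for the needle's trigrams instead of every record, which is the kind of structure a plain directory scan lacks.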


The lack of structure is ultimately why writes to filesystems are comparatively fast and reliable, though.


Well, there is a good reason they can't do that: they don't know what the files are.

Perhaps filesystems should be extensible in a way that supports indexing intelligently.

(I know you mentioned an aversion to add-ons.)


Modern filesystems in a way combine both approaches - they store the data unstructured but give the ability to also store metadata (attributes) in a structured way.


That functionality is mostly an afterthought in ext and NTFS, though. It's more of a big deal for Apple with HFS, but still not something you'd build a database on.


> I think the idea of a file being just a series of bytes and leaving the interpretation to the application is ingenious.

The file as an opaque box in which applications store a real data structure is poisonously anti-file. It's totally what files are, and how we think of them, but IMO systems like 9P, or Linux's procfs or sysfs, are The True Way for files: small discrete pieces of data that are part of a system of directories that expresses a larger, complicated, hierarchical system of data.

Files won, but only the stupidest, wrongest version: easy to copy and manage but utterly useless on their own, unscriptable, pointless without the complex applications there to use them.

I don't think DBs/records are that interesting either. I think we just need to really try files. Fine-grained files, as opposed to these big ole blobs the OS can't really interact with.


Don't you need mandatory locking of files and directories, or rather powerful transactional semantics for the filesystem then?


A lot of filesystems have snapshots. NTFS & others have transactional capabilities. I don't regard locking as necessary or helpful when the OS can provide these capabilities.


There aren't really any hurdles to implementing an SQL database on top of a plain block device, are there? So I wonder why no-one has gone there. This would allow the database server to do caching in a way that makes sense for the database and not have to hope that the filesystem cache does the right thing.


Commercial databases routinely do exactly that. O_DIRECT basically exists because Oracle needed it


Are you thinking of something like WinFS?

https://en.wikipedia.org/wiki/WinFS

Or more like BeOS BFS with its extended attributes, indexing and querying?

https://en.wikipedia.org/wiki/Be_File_System

Also I think a lot of the old mainframe filesystems had the concept of records and indexes built in since they were primarily used for business operations.


Your filesystem is a database. It's just a document-oriented database, rather than relational SQL.


It even has a lot of the same features as a full fledged DB.

For example, most file systems today are journaling, which is exactly how most databases provide atomicity, consistency, and durability in ACID.
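As a rough file-level analogy for how a journal provides atomicity and durability (real filesystem journals work on metadata and blocks, not whole files), this Rust sketch writes a checksummed redo record, syncs it, applies it, and replays it on recovery. The function names and the `[len][checksum][data]` record layout are assumptions made for illustration; a production journal would also fsync the containing directory and handle many records.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

// FNV-1a (64-bit): a simple checksum used here to detect a torn journal record.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

fn read_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(buf)
}

/// Step 1: make the intended change durable as one record: [len][checksum][data].
/// The write counts as committed once this returns.
pub fn write_journal(journal: &Path, data: &[u8]) -> io::Result<()> {
    let mut j = File::create(journal)?;
    j.write_all(&(data.len() as u64).to_le_bytes())?;
    j.write_all(&fnv1a(data).to_le_bytes())?;
    j.write_all(data)?;
    j.sync_all()
}

/// Step 2: apply the change to the real file and flush it.
fn apply(target: &Path, data: &[u8]) -> io::Result<()> {
    let mut t = File::create(target)?; // creates or truncates
    t.write_all(data)?;
    t.sync_all()
}

/// Journal, apply, then drop the journal record.
pub fn journaled_write(target: &Path, journal: &Path, data: &[u8]) -> io::Result<()> {
    write_journal(journal, data)?;
    apply(target, data)?;
    fs::remove_file(journal)
}

/// Run at startup. A complete record means the change was committed but maybe not
/// applied, so it is replayed (redo). A torn record is discarded: the target is never
/// touched before the journal is synced. Returns true if a record was replayed.
pub fn recover(target: &Path, journal: &Path) -> io::Result<bool> {
    let raw = match fs::read(journal) {
        Ok(raw) => raw,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(e) => return Err(e),
    };
    let complete = raw.len() >= 16
        && raw.len() - 16 == read_u64(&raw[0..8]) as usize
        && fnv1a(&raw[16..]) == read_u64(&raw[8..16]);
    if complete {
        apply(target, &raw[16..])?;
    }
    fs::remove_file(journal)?;
    Ok(complete)
}
```

Either the whole new content ends up in the target or none of it does, which is the "A" and "D" of ACID in miniature.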

About the only thing it's missing is automatic document locking (though most file systems support explicit locks).

That said, there are often some pretty hard limits on the number of objects in a table (directory). Depending on the file system you can be looking at anywhere from 10k to 1 billion files per directory.

There are also some unfortunate storage characteristics. Most file systems have a minimum file size (allocation unit) of around 4 KB, mostly to optimize for disk access. DBs often pack things together much more tightly.

But hey, if you can spin using the FS as a DB... do it. Particularly for a read-heavy application, the FS is nearly perfect for such operations.


The biggest problem is the lack of good transactional facilities.


You can lock directories, you can atomically swap directories (on Linux), and CoW filesystems make cloning kind of cheap. That could be used to implement transactions and commits. Getting the consistency checks/conflict detection during the commit right would be the most difficult part. Change notifications could be used to do some of that proactively. It's a terrible idea, but it could be done.
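The directory swap mentioned here relies on a Linux-specific exchange-rename; a more portable cousin of the same idea is to stage the new version beside the old one and `rename` it into place. Below is a minimal Rust sketch of that pattern using only `std` (the `commit_file` name and the `.tmp` suffix are made up for illustration). It gives atomic replacement of a single file, not multi-file transactions or conflict detection.

```rust
use std::ffi::OsString;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Replace `path` with `contents` so readers see either the old version or the
/// new one, never a half-written file: write a sibling temp file, flush it, rename it over.
pub fn commit_file(path: &Path, contents: &[u8]) -> io::Result<()> {
    // "<name>.tmp" sits in the same directory, so it is on the same filesystem,
    // which rename requires.
    let mut tmp: OsString = path.as_os_str().to_owned();
    tmp.push(".tmp");
    let tmp = PathBuf::from(tmp);

    let mut f = File::create(&tmp)?;
    f.write_all(contents)?;
    f.sync_all()?; // the new data must be durable before it becomes visible
    drop(f);

    // On POSIX systems, rename(2) atomically replaces an existing destination.
    fs::rename(&tmp, path)?;
    // Making the rename itself survive a crash also requires fsyncing the parent
    // directory; that platform-specific step is omitted in this sketch.
    Ok(())
}
```

This is the "commit" half only; detecting that two writers raced on the same file is the hard part the comment points at, and it is not handled here.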


There is a transactional API for NTFS in Windows [1]. It allows transactional operations not just within a file but also across files or across multiple computers (to make sure something is applied to your whole fleet atomically).

1: https://en.wikipedia.org/wiki/Transactional_NTFS


Yup, the I in ACID is a bitch :)


You mean it's always been NoSQL?

Astronaut with gun: always has been.


It doesn't really support transactions very well, though.


True, but that's not at all a requirement for a database, and MVCC can be built on top where needed.
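For a sense of what "MVCC on top" involves, here is a minimal in-memory Rust sketch of the versioning logic only. The `MvccStore` type is hypothetical, and persisting versions as files, garbage-collecting old versions, and write-write conflict detection are all left out. Every commit appends new versions under a fresh timestamp, and a reader holding an older snapshot timestamp keeps seeing the data as of that snapshot.

```rust
use std::collections::BTreeMap;

/// Each key maps to its versions in commit order: (commit_ts, value); None = deleted.
#[derive(Default)]
pub struct MvccStore {
    versions: BTreeMap<String, Vec<(u64, Option<String>)>>,
    clock: u64,
}

impl MvccStore {
    /// A reader's snapshot is just the latest commit timestamp.
    pub fn snapshot(&self) -> u64 {
        self.clock
    }

    /// Commit a batch of writes atomically under one new timestamp.
    pub fn commit(&mut self, writes: &[(&str, Option<&str>)]) -> u64 {
        self.clock += 1;
        for (key, value) in writes {
            self.versions
                .entry(key.to_string())
                .or_default()
                .push((self.clock, value.map(str::to_string)));
        }
        self.clock
    }

    /// Read `key` as of snapshot `ts`: the newest version committed at or before `ts`.
    pub fn get(&self, key: &str, ts: u64) -> Option<&str> {
        self.versions
            .get(key)?
            .iter()
            .rev()
            .find(|(commit_ts, _)| *commit_ts <= ts)
            .and_then(|(_, v)| v.as_deref())
    }
}
```

Readers never block writers here: an old snapshot simply ignores versions with newer timestamps.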


I have the exact same wish. On top of that, I'd wish for application data to be stored in the system database by default, neatly namespaced and permissioned, so that you could allow greater interoperability if desired and manually query and combine data across different applications.

There was some research being done on the concept of a db as a filesystem: https://youtu.be/wN6IwNriwHc


We actually did work on this a few years ago but did not get enough takers for it. We created a one size fits all database, that leverages the full capability of the file system.

Try it here: https://github.com/blobcity/db

PS: I am the chief architect of the DB, and the project is no longer being actively maintained by us. But if you make a contribution, we will gladly review and merge your PR.

Bottom line: nothing you do can make your database faster than the filesystem. So why not make a database that just uses the filesystem to the fullest, rather than creating a filesystem on top of a filesystem? BlobCity DB does not create a secondary filesystem. It dumps all data directly to the filesystem, thereby giving peak filesystem performance. This is scientifically really the best it gets from a performance standpoint, though not necessarily the most efficient from a data-storage / data-compression standpoint.

This means, we gain speed, while compromising on data-compression. We produce a larger storage footprint, but are insanely fast. Storage is cheap, compute isn't. So that should be okay I suppose.


Wasn't this what Microsoft was working on with WinFS in Longhorn which later became Vista but without the WinFS part?

And I think ReiserFS was also working towards this but got abandoned for obvious reasons.


I remember watching a talk about that: https://www.youtube.com/watch?v=wN6IwNriwHc

Previous HN discussion: https://news.ycombinator.com/item?id=20394088


Yeah I just learned about tokio-uring and I'm planning to get it into the next major release of freqfs


Until an underlying change in technology happens, and then you wish they were no longer unified (rust to SSD to NVMe, for example).

I would prefer more pluggable interfaces personally.

(hi Ryan, long time no see!)


Helloo Jerry!!! Great to hear from you!!


> freqfs automatically caches the most frequently-used files and backs up the others to disk. This allows the developer to create and update large collections of data purely in-memory without explicitly sync’ing to disk, while still retaining the flexibility to run on a host with extremely limited memory.

Why not let the OS take care of this?


One advantage is consistency across host platforms, but the main advantage is that the file data can be accessed (and mutated) in memory in a deserialized format. If you let the OS take care of it, you would still have the overhead of serializing & deserializing a file every time it's accessed.


That's what mmap is for.


It might be possible to replace freqfs with mmap on a POSIX OS, but a) you would still have to implement your own read-write lock, and b) you would (I think probably?) lose some consistency in behavior across different host operating systems.


Which OSes does this run on that doesn’t have some kind of mmap operation?


It should work on Windows (because tokio::fs works on Windows) although I have not personally tested this


You can do mmap on Windows, eg. https://github.com/danburkert/memmap-rs


mmap for reads, an explicit API for writes, à la LMDB. Buggy readers can read inconsistent data but cannot corrupt the OS.
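The split described here (shared read-only views for readers, one explicit write path) can be sketched in Rust without an actual memory map. In this hypothetical `Store`, an `Arc<Vec<u8>>` stands in for the mapped file: readers get an immutable snapshot, and the writer copies, modifies, and atomically publishes a new version. This is a copy-on-write approximation of the idea, not how LMDB itself lays out pages.

```rust
use std::sync::{Arc, Mutex, RwLock};

pub struct Store {
    current: RwLock<Arc<Vec<u8>>>, // latest committed version
    writer: Mutex<()>,             // serializes writers (single-writer model)
}

impl Store {
    pub fn new(initial: Vec<u8>) -> Self {
        Store { current: RwLock::new(Arc::new(initial)), writer: Mutex::new(()) }
    }

    /// Readers get a shared, read-only snapshot; nothing can be mutated through it.
    pub fn read(&self) -> Arc<Vec<u8>> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// The only mutation path: copy the current version, modify the copy,
    /// then publish it. Readers holding the old snapshot are unaffected.
    pub fn write<F: FnOnce(&mut Vec<u8>)>(&self, f: F) {
        let _guard = self.writer.lock().unwrap();
        let mut next = (*self.read()).clone();
        f(&mut next);
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

Because readers only ever see `Arc<Vec<u8>>` (immutable shared data), a buggy reader can misinterpret bytes but has no way to write them back.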


Corrupt the OS? How might that happen?


Sorry, I meant the DB!


Personally I don’t see a scenario for myself, but I can imagine that there are some where this might be useful. But isn’t there an extremely high risk of data loss and inconsistency when adding an extra layer on top of OS file system handling?


Freqfs seems like a shim you'd add to an existing project for a quick optimization. Whereas mmap et al. are "better" the same way any specific, built-to-purpose code will be "better" than just bolting a framework on. Sometimes it's the right call to do the extra work; sometimes it's 100% more effort (both development and maintenance) for an extra 10% gain.


If there is any concurrent access to cached files not through freqfs, there is a risk of inconsistency and crashes.


You can pick and choose.

Maybe your caching strat of your OS isn't best for your use case. Also, you may use a network file system, or several types of FS, and want your cache warm up to be tuned up and consistent.


> Maybe your caching strat of your OS isn't best for your use case.

On the other hand the OS does know about memory pressure from IO and from heap memory for the whole system. This crate will only know about cache pressure within a single process.

> Also, you may use a network file system

Which can also be set to do aggressive caching, at the expense of consistency.

> and want your cache warm up to be tuned up and consistent.

The description doesn't say that it's doing cache warmup any more eagerly than regular reads would.


The benefit of bringing something in process is as always more control, and usually at the expense of having to make decisions with less data about the rest of the system than an OS level service would have.

Sometimes you need very explicit control over when things are read from cache and when they aren't. This can be hard with network file systems. Especially when you have two different use cases on the same filesystem, which isn't that odd, even within a single application.


Presumably for a similar use case as SQLite [0]: performance. You can beat the OS, and by a noticeable margin, by doing things in memory and avoiding the I/O bottleneck.

[0] https://www.sqlite.org/fasterthanfs.html


I think GP's point is that the OS usually has a file system cache in RAM.


I think the P's point, supported by evidence, is that the OS cache is not optimal for all use cases.


No cache is optimal for all use cases. That's an impossible goal.


Thus why things like Freqfs exist and we don't always "let the OS take care of this."


Yea friend, we're in violent agreement :)


the actual question is whether the person making the choice is up to the task of measuring/proving that their manual caching is optimal for their use case. A lot of the time the answer is simply "no". For example, the people who say "just buy enough RAM and turn virtual memory off" for the most part do not understand the implications of what they are talking about.


My thought exactly.


Upon seeing this, I can’t help but think of “So What’s Wrong with 1975 Programming” from the author of Varnish [1].

[1] https://varnish-cache.org/docs/trunk/phk/notes.html


I realized based on several of the comments here that I should have included a comparison with OS filesystem caching in the documentation for freqfs. I will update this in the next release.

The major advantage of freqfs over just letting the OS handle file caching is that with freqfs you can read and mutate the data that your file represents purely in memory. For example, if you implement a BTree node as a struct, you can just borrow the struct mutably and update it, and it will only be synchronized with the filesystem in the event that it's evicted (or you explicitly call `sync`). This avoids a lot of (de)serialization overhead and defensive coding against an out-of-memory error.
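A minimal single-threaded Rust sketch of the idea described above: values are kept in memory in decoded form, mutated directly, and encoded back to their file only on eviction or an explicit `sync`. The `Persist` trait, the `WriteBackCache` type, and the LRU eviction policy are assumptions for illustration, not the freqfs API (which is async and built on tokio).

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical trait: how a cached value maps to and from its file bytes.
pub trait Persist: Sized {
    fn decode(bytes: &[u8]) -> Option<Self>;
    fn encode(&self) -> Vec<u8>;
}

/// Example encoding: a Vec<u64> stored as little-endian 8-byte words.
impl Persist for Vec<u64> {
    fn decode(bytes: &[u8]) -> Option<Self> {
        if bytes.len() % 8 != 0 {
            return None;
        }
        let words = bytes.chunks_exact(8).map(|c| {
            let mut w = [0u8; 8];
            w.copy_from_slice(c);
            u64::from_le_bytes(w)
        });
        Some(words.collect())
    }
    fn encode(&self) -> Vec<u8> {
        self.iter().flat_map(|x| x.to_le_bytes()).collect()
    }
}

struct Entry<T> {
    value: T,       // the decoded value, mutated in place
    dirty: bool,    // changed since it was last written to disk
    last_used: u64, // logical clock for LRU eviction
}

/// Holds at most `capacity` decoded files from `dir` in memory.
pub struct WriteBackCache<T: Persist> {
    dir: PathBuf,
    capacity: usize,
    entries: HashMap<String, Entry<T>>,
    tick: u64,
}

impl<T: Persist> WriteBackCache<T> {
    pub fn new(dir: PathBuf, capacity: usize) -> Self {
        WriteBackCache { dir, capacity: capacity.max(1), entries: HashMap::new(), tick: 0 }
    }

    /// Add a new value; it reaches disk only when evicted or on `sync`.
    pub fn insert(&mut self, name: &str, value: T) -> io::Result<()> {
        if !self.entries.contains_key(name) && self.entries.len() >= self.capacity {
            self.evict_lru()?;
        }
        self.tick += 1;
        let entry = Entry { value, dirty: true, last_used: self.tick };
        self.entries.insert(name.to_string(), entry);
        Ok(())
    }

    /// Borrow the decoded value mutably, reading and decoding its file only on a miss.
    pub fn get_mut(&mut self, name: &str) -> io::Result<&mut T> {
        if !self.entries.contains_key(name) {
            if self.entries.len() >= self.capacity {
                self.evict_lru()?;
            }
            let bytes = fs::read(self.dir.join(name))?;
            let value = T::decode(&bytes)
                .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "cannot decode"))?;
            self.entries.insert(name.to_string(), Entry { value, dirty: false, last_used: 0 });
        }
        self.tick += 1;
        let entry = self.entries.get_mut(name).expect("entry was just loaded");
        entry.last_used = self.tick;
        entry.dirty = true; // assume the caller mutates it
        Ok(&mut entry.value)
    }

    /// Encode and write every dirty entry back to its file.
    pub fn sync(&mut self) -> io::Result<()> {
        for (name, entry) in self.entries.iter_mut() {
            if entry.dirty {
                fs::write(self.dir.join(name), entry.value.encode())?;
                entry.dirty = false;
            }
        }
        Ok(())
    }

    /// Evict the least recently used entry, writing it back first if dirty.
    fn evict_lru(&mut self) -> io::Result<()> {
        let victim = self.entries.iter().min_by_key(|(_, e)| e.last_used).map(|(k, _)| k.clone());
        if let Some(name) = victim {
            if let Some(entry) = self.entries.get(&name) {
                if entry.dirty {
                    fs::write(self.dir.join(&name), entry.value.encode())?;
                }
            }
            self.entries.remove(&name);
        }
        Ok(())
    }
}
```

Repeated `get_mut` calls on a hot entry cost a hash lookup, with no read or decode; serialization happens once per eviction or `sync`.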

Again, I will update the documentation to clarify.


This is called mmap?


mmap does map file contents into memory but does not provide read-write locking. The main limitation is that a file is not necessarily the same thing as the Rust data structure that it encodes. For example, if you store a Rust Vec in a file, loading the file with mmap won't allow you to `push` a new item onto the Vec.
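To illustrate the distinction: bytes in a file (which is all `mmap` exposes) can be read in place, but growing the encoded sequence means reallocating. This Rust sketch assumes a made-up layout of little-endian `u64` words and uses a plain byte slice in place of a mapped region; the function names are hypothetical.

```rust
/// Read the i-th little-endian u64 straight out of a byte slice, the way code
/// could read from a memory-mapped file without deserializing it first.
pub fn get_u64(bytes: &[u8], i: usize) -> Option<u64> {
    let start = i.checked_mul(8)?;
    let chunk = bytes.get(start..start.checked_add(8)?)?;
    let mut word = [0u8; 8];
    word.copy_from_slice(chunk);
    Some(u64::from_le_bytes(word))
}

/// "Pushing" cannot happen in place: a mapped region has a fixed length, so the
/// encoded words are copied into a bigger buffer (an mmap-based design would
/// instead extend the file and remap it).
pub fn push_u64(bytes: &[u8], value: u64) -> Vec<u8> {
    let mut grown = Vec::with_capacity(bytes.len() + 8);
    grown.extend_from_slice(bytes);
    grown.extend_from_slice(&value.to_le_bytes());
    grown
}
```

Reads are cheap either way; it is structural changes like `push` where a decoded in-memory `Vec` has the advantage.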


I honestly don't think I like this at the application level. You're removing a degree of freedom from operators and users. I have a ton of memory on all of my home devices, and usually just take the working directories for frequently used applications and mount them as tmpfs. I do the same thing for the working directories of applications I deploy at work, where we have complete freedom to deploy memory-optimized servers with lots of RAM. Putting an extra in-memory cache on top of an OS filesystem that is already in memory is an unnecessary extra step that doubles the memory use of each file, and it can't be turned off without patching and recompiling your application. The OS is already smart enough not to add a cache on top of tmpfs.


I don't know that it's fair to say it's "doubling" the memory use of each file, because the OS cache memory is still "free" from the perspective of an application. Where it comes in handy is in applications like databases or training an ML model, where there are hot spots that get accessed/updated extremely frequently: then the application doesn't have to incur serialization overhead in order to read/write the data that the file encodes (although, as another poster pointed out, it might also be possible to do this with mmap).


Is this some kind of exercise or what? I mean, my OS already has a filesystem cache, so why would I need another one? More buffer bloat?


> This crate assumes that file paths are valid Unicode and may panic if it encounters a file path which is not valid Unicode.

Love it when a program could simply work, but chooses to fail because it doesn't like my life choices.
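A library does not have to require Unicode paths: `Path`/`PathBuf` wrap `OsStr`/`OsString`, which can hold any path the OS accepts. This hypothetical Rust sketch keys a cache index by `PathBuf` instead of `String`, so no conversion (and no panic) is needed, and uses a lossy conversion only for display. It says nothing about how freqfs handles paths internally.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Illustrative index keyed by `PathBuf`; non-UTF-8 paths are valid keys.
#[derive(Default)]
pub struct PathIndex {
    sizes: HashMap<PathBuf, u64>,
}

impl PathIndex {
    pub fn insert(&mut self, path: &Path, size: u64) {
        self.sizes.insert(path.to_path_buf(), size);
    }

    pub fn get(&self, path: &Path) -> Option<u64> {
        self.sizes.get(path).copied()
    }

    /// For logs and messages only: invalid sequences become U+FFFD instead of panicking.
    pub fn describe(path: &Path) -> String {
        path.display().to_string()
    }
}
```

The strict alternative, `path.to_str().expect(...)`, is exactly what turns a non-UTF-8 filename into a panic.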


This will hardly increase performance, if at all. Note that the OS caches frequently used files in memory. When you use this, you are basically competing with the OS for in-memory file cache.

This library might have other uses that I am not aware of.


are there embedded systems, including potentially realtime, that are running OSes or on CPUs that don't provide caches or page fault support?


In those cases, the filesystem provides its own block cache.


Unfortunately this doesn't meet my use case (which they list as an intended use case): serving static assets over HTTP. I currently use an in-memory cache without eviction. It doesn't meet my requirements because I store the in-memory content precompressed.

https://github.com/serprex/openEtG/blob/master/src/rs/server...

edit: seems it can. Nice


I think if you call your precompression function in FileLoad::load it should do what you need--please file an issue if this is not the case: https://github.com/haydnv/freqfs/issues


I think I don't understand the problem. Precompressed files are still files and can be cached.


I think what they mean is that they want the file on disk to be uncompressed but the file in memory to be compressed.


Don't filesystems already do this? Like, really really good?


Why did you link to docs.rs instead of to the source repository?


requires tokio / async. ugh. I'm out.


Why hasn't anyone told me about this?!?!? I love you so much for posting, I needed something like this for a personal project I'm fiddling around with in my spare time.


That's great! Please let me know how it goes! Feel free to file any bug reports or feature requests here: https://github.com/haydnv/freqfs/issues


They just did.



