Even if it's far from production ready for important data, I can see immediate uses for certain kinds of software where the disk is used as a large scratch pad, for example. Lots of random writes are common in photogrammetry on large datasets, where I imagine BetrFS could be used during compute and the final output stored on ZFS.
I'd be very cautious about the benchmarks. For example, betrfs was measured performing 1000 4-byte writes into a 1GB file. It isn't clear whether there were any sync operations - there certainly wasn't a sync after each write, although there might have been a sync after the whole set of 1000. That speedup is a simple characteristic of a filesystem that is log-structured (so it writes those 1000 updates as a single sequential disc access) and doesn't store data in 4kB blocks (so it doesn't have to load the other 4092 bytes of the block before writing it). The filesystem I wrote in 1999 for my undergrad project would have done the same thing. One of the benchmarks I wrote for my system showed exactly the same amazing performance benefit. (My benchmark had me generate a tree of a thousand small files in a ridiculously short time - ext2 thrashed all over the disc doing the same thing.) Unfortunately it is unrealistically optimistic, because that isn't a write pattern that is going to happen very often. Usually each small write will have an fsync after it. Unless you actually have a thousand writes without a separating sync, this speedup isn't going to be realised.
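For concreteness, a microbenchmark of that shape looks roughly like the sketch below. This is my own Python reconstruction, not the paper's harness; the file name, the offsets and the single trailing sync are assumptions. The FSYNC_EACH toggle is where the claimed speedup lives: with it off, a log-structured filesystem can batch all 1000 writes into one sequential commit; with it on, every write has to hit the disc individually.

```python
import os
import random

# Hypothetical reconstruction of the microbenchmark discussed above:
# 1000 random 4-byte writes into a 1 GB file. Not the paper's actual
# harness; the file name and access pattern are assumptions.

FILE = "scratch.dat"     # hypothetical path
SIZE = 1 << 30           # 1 GiB
WRITES = 1000
FSYNC_EACH = False       # set True to force a sync after every write

fd = os.open(FILE, os.O_RDWR | os.O_CREAT, 0o644)
os.ftruncate(fd, SIZE)   # a sparse 1 GB file is enough for this access pattern

for _ in range(WRITES):
    os.lseek(fd, random.randrange(SIZE - 4), os.SEEK_SET)
    os.write(fd, b"\xde\xad\xbe\xef")
    if FSYNC_EACH:
        os.fsync(fd)     # commits each tiny update on its own

os.fsync(fd)             # one sync at the end, as the benchmark may have done
os.close(fd)
```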
I'm struggling to see how the find/grep benchmark could possibly have such a fantastic performance benefit for betrfs, given the fact that all those filesystems are effectively reading a tree or known-location structure. The only conclusion I can reach is that maybe the betrfs test had a hot cache and the others didn't. I could possibly be persuaded if betrfs keeps all its metadata in a small easily-cached part of the disc, but there are disadvantages to that too. I don't think this test is valid.
It seems like you may be jumping to conclusions a bit prematurely. The paper (https://www.cs.unc.edu/~porter/pubs/fast15-final.pdf) is very explicit that they start with a cold cache. They also go into detail for why they do well on grep. As I understand it (but I'm not an expert), betrfs's advantage here comes from the fact that it stores files lexicographically by their full names (and metadata), meaning that related files are stored nearby each other on disk. This gives better locality than what you would get with a standard inode structure.
Based on that, it seems like the outcomes of the tests are pretty reasonable.
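To make the locality point concrete, here's a toy sketch (my own illustration, not BetrFS code, with made-up paths): keying everything by full path and sorting lexicographically puts a directory's files, and its subdirectories' files, into one contiguous key range, which is exactly the range a recursive grep or find walks.

```python
# Toy illustration of full-path keying: sorting paths lexicographically
# groups each directory subtree together, so a depth-first scan such as
# a recursive grep reads one contiguous key range instead of chasing
# inode pointers scattered across the disk. Paths are invented.

paths = [
    "/src/main.c",
    "/src/util/log.c",
    "/docs/readme.md",
    "/src/util/log.h",
    "/docs/img/logo.png",
]

for key in sorted(paths):
    print(key)

# Prints the /docs/... entries together, then the /src/... entries together,
# mirroring the on-disk adjacency a full-path-keyed layout would give.
```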
I'll concede on the hot cache suggestion. Storing the files lexicographically is an interesting approach - it means that grep/find (or anything else that reads through the files/directories in order) would perform well. But it also makes the test somewhat contrived, tuned to run fast on this particular system.
I do agree that this kind of filesystem mechanism should give good performance benefits. But in the general case they won't be quite as fantastic as these benchmarks make out.
It's also currently stacked on top of ext4, and the tree data has to sit on some other filesystem as well. So promising design, but quite a long way from ready for production.
> Our design minimizes changes to the kernel. The current code requires a few kernel patches, such as enabling direct I/O from one file system to another. We expect to eliminate most of these patches in future versions.
I prefer half-baked projects that are honest about their status over overpromised vaporware, personally.
How large are those random writes?
For video editing, something like XFS performs very well.
The reason is that you usually read a whole image at once and each image is very big.