According to Ted Ts'o (maintainer of ext4):
"the main reason why we did the ext3 -> ext4 fork was that adding 64-bit numbers required major surgery, and we didn't want to break a lot of production users who were using ext3. But from a file system format perspective, ext2, ext3, and ext4 are the same logical file system."
(http://article.gmane.org/gmane.linux.file-systems/97986)
> "The ext4 driver has been able to register itself as ext[23] for quite some time now, so it's transparent."
Doesn't that itself imply that ext4 merely builds on ext2/3? That is, there shouldn't be too much compatibility code, since 3/4 are logical extensions of ext2.
The need for a split was explained in the article and more by Linus: it wasn't incompatibility but the scary possibility for instability in a filesystem that most people used as the canonical linux filesystem.
edit: think of ext3/4 as a fully backwards compatible 'branch' that is only now being 'merged' back in.
IIRC once you mount an ext2/3 volume as ext4, you can't go back.
ext4 has made a lot of improvements, but has also had a lot of corruption issues, even if corner cases. If you don't need extremely large files or extremely cluttered directory structures, you don't _need_ ext4, though I haven't compared directly in a few years, and I suspect a lot of these stability issues have shaken out.
The worst I experienced was kvm guests with ext4 filesystems stored in files on ext4 filesystems causing not only the guest, but often the _host_ filesystem to go corrupt. Obviously, using LVM volumes or basically anything but filesystem images for production VMs would have avoided this, but the infrastructure I supported at the time already had a lot of this in place and too many other problems to solve to focus on it.
If you actually mount it as ext4 and enable the new features, yes. However, if you mount ext2 or ext3 using the ext2/3 implementation in the ext4 driver, that doesn't break backward compatibility with ext2/3.
It's actually really interesting, some of the backwards compatibility features. For example, directory hashes that allow fast name lookup are hidden inside of "empty" elements of the previous list format so that older drivers can still find files in the newer versions of the file system.
Many structures also include their size, so that drivers can tell how much functionality any given element supports by which fields it contains or omits.
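To make the size-field idea concrete, here is a minimal sketch in Python. The record layout, the field names, and `parse_record` are all hypothetical and invented for illustration; this is not the real ext2/3/4 on-disk format, only the general technique of letting a structure announce how many of its fields exist.

```python
import struct

# Hypothetical layout (NOT the real ext4 format), little-endian:
#   uint16 size        - total record length in bytes, including this field
#   uint32 mtime       - present in every version of the record
#   uint32 mtime_extra - only present when size >= 10 (a later addition)
V1_SIZE = 6
V2_SIZE = 10


def parse_record(buf: bytes) -> dict:
    """Parse one record, reading only the fields its size field says exist."""
    (size,) = struct.unpack_from("<H", buf, 0)
    if size < V1_SIZE or size > len(buf):
        raise ValueError(f"bad record size {size}")
    (mtime,) = struct.unpack_from("<I", buf, 2)
    record = {"size": size, "mtime": mtime, "mtime_extra": None}
    if size >= V2_SIZE:
        # Newer writers append fields; older records simply stop earlier.
        (record["mtime_extra"],) = struct.unpack_from("<I", buf, 6)
    return record


old = struct.pack("<HI", V1_SIZE, 1000)        # written by an "old" driver
new = struct.pack("<HII", V2_SIZE, 1000, 42)   # written by a "new" driver
```

An old reader that only knows the first two fields can still step over a newer record, because `size` tells it where the next record starts.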
Reminds me of FAT, where a clever scheme was used to keep the long file names in between the short ones, using several records, all marked with an attribute combination (0x0F) that older drivers ignore.
I remember that as well. I took a while to upgrade to ext3, and a much longer while to upgrade to ext4 (long after ext4dev became ext4). For most types of software, I like novelty. For filesystems, I like boring; boring is good.
I remember choosing between Xia and Ext2 when I first installed Linux. Both had just come out that year if memory serves. Friend tried to push me to use Xia, but I went the other way. Had more headaches initially, but time shows which one won out in the end.
That's funny, I was just remembering installing one of my first Linux distros where I had to choose between 2.2 or the brand new 2.4. I remember booting between them to see which one worked best with what I had.
I suppose I had to decide to deal with ext3 not long after that.
Yes, me too. I remember living on the edge with 2.4; I used Slackware. I recompiled the kernel once or twice a day :/
Always patching the nvidia driver (in cli) because I couldn't access X11 without the driver :D
> For a while, some thought that might be a filesystem called reiser4, but that story failed to work out well even before that filesystem's primary developer left the development community
This was a nice tongue-in-cheek reference to Hans Reiser going to prison
Reading the comments, I'm guessing I was the only fan of SGI who saw supercomputers used XFS (or variants), figured it would handle my lowly workloads, put it on my machines, worked around its few weaknesses, and just never thought of filesystem choice again aside from boot partition.
Still my favorite filesystem for what was my favorite supercomputing & NUMA company.
But honestly, over the last years i have a strong feeling for the old, fast, stable (and most importantly simple!) filesystems like ext* or xfs.
Just yesterday i had btrfs telling me that my disk is full (when in fact it is half full with nearly 400GB free!).
Also have seen some major, hard to resolve problems with ZFS years ago.
It led me to believe, why the hell we need those ultracomplicated can-do-everything filesystems?! The only thing i actually care for is fast access without some complicated compression, deduplication and what-not steps in between and data safety. And as far as i can tell, ext4 and XFS are pretty safe. Never seen those destroy files. Ever.
On the other hand finding problems and debugging filesystems in kernel can be a major headache. I want my filesystem to be as simple as possible, nowadays. ZFS and btrfs are all nice and so on, but next time i look for a filesystem, i'll go back to the established and simpler ones.
I've had the opposite impression of ZFS - it's saved my data on a number of occasions when ext* would have failed completely. I personally think Sun's engineers did an amazing job with ZFS. Plus while the file system is hugely complex, management of it is remarkably straightforward - which leads to less user / sysadmin error when running it.
Sadly though, I do agree with your judgement of Btrfs. I trialled it for about 6 months and found it to be conceptually similar to ZFS but very much its practical polar opposite.
However, like with any software decision, the right file system choice depends as much on the platform's intended purpose and the administrator, as it does on age and features. For file servers, ZFS is an excellent choice; but for low footprint appliances, you'd be better off with ext4 or xfs. And desktops is just a question of personal preference.
I'm not an expert (but, so most people are no experts in this area), but i have a feeling that ZFS needs huge amounts of memory (compared to ext). It does nifty stuff, for sure. But do i need all of that? Hardly.
For example, i am wondering why i would want to put compression into the file system. Or deduplication. Or the possibility to put some data on SSD and other on HDD. If i have a server and user data, it should be up to the application to do this stuff. That should be more efficient, because the app actually knows how to handle the data correctly and efficiently. It would also be largely independent of the file storage.
I've seen some cases where we had a storage system, pretending to do fancy stuff and fail on it. And debugging such things is a nightmare (talking about storage vendors here, though).
But for example, a few years ago we had major performance problems because a ZFS mount was at about 90% of space. It was not easy to pinpoint that. After the fact it's clear, and nowadays googling it would probably give enough hints.
But in the end I would very much like that my filesystem does not just slow down depending on how much filled it is. Or how much memory i have.
edit: Also, just to clarify. I think that Sun had one of the best engineers of the whole industry. Everything they did has been great. Honestly, i have huge respect for them and also for ZFS. I still think that ZFS is great, but in the end, i am wondering if it is just too much. For example, nowadays you have a lot of those stateless redundant S3 compatible storage backends. Or use Cassandra, etc. Those already copy your data multiple times. Even if they run on ZFS, you don't gain much. If you run ext4 and it actually loses data, the software cares about that. That's just one case, and of course it depends on your requirements. Just saying, those cases are increasing where the software already cares for keeping the important data safe.
> I'm not an expert (but, so most people are no experts in this area), but i have a feeling that ZFS needs huge amounts of memory (compared to ext). It does nifty stuff, for sure. But do i need all of that? Hardly.
ZFS needs more memory than ext4, but reports of just how much memory ZFS needs are grossly overestimated. At least for desktop usage - file servers are a different matter and that's where those figures come from.
To use a practical example, I've run ZFS + 5 virtual machines on 4GB RAM and not had any issues whatsoever.
> For example, i am wondering why i would want to put compression into the file system.
A better question would be, why wouldn't you? It happens transparently and causes next to no additional overhead. But you can have that disabled in ZFS (or any other file system) if you really want to.
> Or deduplication.
Deduplication is disabled on ZFS by default. It's actually a pretty niche feature despite its wider reporting.
> Or the possibility to put some data on SSD and other on HDD. If i have a server and user data, it should be up to the application to do this stuff. That should be more efficient, because the app actually knows how to handle the data correctly and efficiently. It would also be largely independent of the file storage.
I don't get your point here. ZFS doesn't behave any differently to ext in that regard. Unless you're talking about SSD cache disks, in which case that's something you have to explicitly set up.
> I've seen some cases where we had a storage system, pretending to do fancy stuff and fail on it. And debugging such things is a nightmare (talking about storage vendors here, though). But for example, a few years ago we had major performance problems because a ZFS mount was at about 90% of space. It was not easy to pinpoint that. After the fact it's clear, and nowadays googling it would probably give enough hints. But in the end I would very much like that my filesystem does not just slow down depending on how much filled it is. Or how much memory i have.
ZFS doesn't slow down if the storage pools are full; the problem you described there sounds more like fragmentation, and that affects all file systems. Also, the performance of all file systems is memory driven (and obviously driven by storage access times). OSes cache files in RAM (this is why some memory reporting tools say Windows or Linux is using GBs of RAM even when there are few or no open applications - because they don't exclude cached memory from used memory). This happens with ext4, ZFS, NTFS, xfs and even FAT32. Granted ZFS has a slightly different caching model to the Linux kernel's, but file caching is something that is free memory driven and applies to every file system. This is why file servers are usually specced with lots of RAM - even when running non-ZFS storage pools.
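To show the cached-memory point with numbers, here is a rough Python sketch that parses `/proc/meminfo`-style text and compares "used" memory with and without the file cache. The `SAMPLE` figures are made up for illustration, `parse_meminfo` and `used_excluding_cache` are hypothetical helper names, and the subtraction is a simplification (it assumes the classic `MemTotal`, `MemFree`, `Buffers` and `Cached` fields and ignores finer points such as shared memory).

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def used_excluding_cache(info: dict) -> int:
    """Memory in kB used by programs, not counting reclaimable file cache."""
    return info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]


# Invented sample values, not measurements from a real machine.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
Buffers:          200000 kB
Cached:          5300000 kB
"""

info = parse_meminfo(SAMPLE)
naive_used = info["MemTotal"] - info["MemFree"]  # counts file cache as "used"
real_used = used_excluding_cache(info)           # leaves the file cache out
```

On a Linux box the same functions could be fed `open("/proc/meminfo").read()` instead of `SAMPLE`.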
I appreciate that you said none of us are experts on file systems, but it sounds to me like you've based your judgement on a number of anecdotal assumptions; in that the problems you raised are either not part of ZFS's default config, or are limitations present in any and all file systems out there but you just happened upon them in ZFS.
> i am wondering if it is just too much. For example, nowadays you have a lot of those stateless redundant S3 compatible storage backends. Or use Cassandra, etc. Those already copy your data multiple times. Even if they run on ZFS, you don't gain much.
While that's true, you are now comparing apples to oranges. But in any case, it's not best practice to be running a high performance database on top of ZFS (nor any CoW file system). So in those instances ext4 or xfs would definitely be a better choice.
FYI, I also wouldn't recommend ZFS for small capacity / portable storage devices nor many real time appliances. But if file storage is your primary concern, then ZFS definitely has a number of advantages over ext4 and xfs which aren't over-complicated nor surplus toys (eg snapshots, CoW journalling, online scrubbing, checksums, datasets, etc).
> why the hell we need those ultracomplicated
> can-do-everything filesystems?!
We don't but some people do. Or think they do, but that's the same. It's not like I'm losing something because I can always use the simpler filesystems myself. I went with ReiserFS for over a decade: never really had to think about my file systems which is what I want, after all. Once you find a good file system, you stop thinking about file systems.
For me, this means two things.
I want my file system to be safe and transactional: either my change gets in or it doesn't, but I don't want to find my file system in some degraded in-between state. Ever. I'm willing to pay for that with cpu time or i/o speed, that's like the first 90% of my requirements.
The second criterion is that it's generally lean and doesn't do anything stupid algorithmically. It should support big files, provide relatively fast directory lookups (so that 'find' will run fast), and have some decent way of packing files onto the disk that doesn't fragment the allocations too badly, and ideally do some book-keeping during idle i/o so that I don't really have to run a defragmenter, ever. But these are kind of secondary requirements that aren't worth anything unless the file system keeps my files uncorrupted and accessible first and foremost.
Very reasonable requirements. Same here. I just want it to store files, retrieve files, have decent performance, and never screw up in a way that prevents recovery. I'd hope the baseline Is that so much to ask in 2015? ;)
As someone who has lost data on _EVERY_ single linux filesystem listed in this thread, I can say what I want out of a filesystem is code that hasn't changed for years. Once the "filesystem experts" move on to the latest code base, then I start to feel confident about the stability of the ones they left behind. As others said, what I want first out of a filesystem is "boring". I would much rather be restricted to small volumes/files/slow lookup times/etc, than discover a sector's worth of data missing in the middle of my file because the power was lost at the wrong moment 6 months ago.
Making data smaller, slower, etc. doesn't solve the problem. Good design and implementation are what it takes. Wirth's work shows simplifying the interfaces, implementation, and so on can certainly help. Personally, I think the best approach is simple, object-based storage at the lower layer with complicated functionality executing at a higher layer through an interface to it. Further, for reliability, several copies on different disks with regular integrity checks to detect and mitigate issues that build up over time. There are more complex, clustered filesystems that do a lot more than that to protect data. They can be built similarly.
The trick is making the data change, problem detection, and recovery mechanisms simple. Then, each feature is a function that leverages that in a way that's easier to analyze. The features themselves can be implemented in a way that makes their own analysis easier. So on and so forth. Standard practice in rigorous, software engineering. Not quite applied to filesystems...
I'm with you on that. Obviously lol. There's two paths to getting the complex functionality without the problems: filesystem over object model; application layer over filesystem model.
In the 80's-90's, many systems aiming for better robustness or management realized filesystems were too complex. So, they instead implemented storage as objects that were written to disks. Many aspects important for security or reliability were done here on this simple representation. The filesystem was a less privileged component that translated the complexities of file access into procedure calls on the simpler, object storage. Apps that didn't need files per se might also call the object storage directly. Some designs put the object storage management directly on the disk with on-disk, cheap CPU. Supported integrated crypto, defrag, etc. NASD's [1] and IBM System/38 [2] are sophisticated examples in this category.
The other model was building complex filesystems on simpler ones. The clustered filesystems in supercomputing and distributed stores in cloud + academia are good examples of this. The underlying filesystem can be something simple, proven, and highly performing. Then, the more complex one is layered over one or more nodes to provide more while mostly focusing on high-level concerns. Google File System [3] and Sector [4] are good examples.
So, we can have the benefits of simple storage and complex filesystems with few of their problems. That there's many deployed in production should reduce skepticism that it sounds too good to be true. Now, we just need more efforts in these two categories to make things even better. Nice as ZFS and Btrfs look, I'd rather they had just improved XFS in the direction of these categories instead. The duplicated effort would've gone into innovation instead, on top of any innovations they produced.
I did the same thing for the same reason and I'm still finding songs in my mp3 collection with long strings of nulls in them. They only spent a year on XFS but that, plus a flaky display driver, did quite some damage. Luckily I kept my less bulky files on an ext2 partition and they were fine.
Turns out supercomputer software might be written to a different set of requirements... Who knew!
I try not to rely on one storage device: assume it will have issues and implement accordingly. Yet, I've had audio issues with some files on XFS partitions. I didn't investigate further to see what it was but it is interesting that you said that. Might be an issue there.
EDIT to add: check your hard disks and OS's you plug them into, too. I lost a ton of data once over a bug in a driver that looked like a filesystem bug but had nothing to do with it. Several layers where issues can pop up but aren't obvious from their results.
Not relying on storage is too general a piece of advice. I've never seen in practice a backup system that would survive your filesystem silently corrupting your data at random.
(Yeah, I know they exist. I've just never been in some place as big as to actually use them.)
You don't need anything big; it's just a matter of saving a hash of the file, then occasionally rehashing the files and comparing. If it's different, fetch the file from another location and update the local copy.
I use git annex, which has this built-in; just run "git annex repair" and it'll verify & fix any damaged files.
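For anyone who wants to roll the hash-and-compare step by hand, a minimal sketch in Python might look like the following. The function names (`file_hash`, `build_manifest`, `find_damaged`) are hypothetical, SHA-256 is just one reasonable choice of hash, and this is not how git annex does it internally. Fetching a good copy from another location is left out.

```python
import hashlib
import os


def file_hash(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict:
    """Map each file's path (relative to root) to its current hash."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = file_hash(path)
    return manifest


def find_damaged(root: str, manifest: dict) -> list:
    """Rehash every file in the manifest; return those missing or changed."""
    damaged = []
    for rel, expected in manifest.items():
        path = os.path.join(root, rel)
        if not os.path.isfile(path) or file_hash(path) != expected:
            damaged.append(rel)
    return sorted(damaged)
```

Run `build_manifest` once and store the result somewhere other than the disk being checked, then run `find_damaged` occasionally. Note that it can't tell a deliberate edit from corruption, so it suits data that isn't supposed to change, like a music collection.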
Forgot to add something relevant to other commenter's claim about needing to be "big" or whatever. I was one of last holdouts of 1 logical function = 1 physical hardware as I liked to customize software for function (esp security TCB). On a related note, having lots of redundant systems for protecting files takes all kinds of servers and is expensive, right? Especially with all that hashing and crypto?
Stumbled onto them looking for accelerated crypto & cheaper x86 boxes. So, 6x Artigo at $300 (back then), 3x UPS at $50 each, 2x network switches at $50 each, and 12x cables at $5 each = a very high availability solution for 100GB+ data for $2,110 plus tax. Breaking 1TB just took an extra grand or so. At that time, getting a similar level of storage with only two-way redundancy from "bargain" server vendors was more like $10,000-20,000.
Just gotta invest wisely and your problem turns from "cost too much" to "wow I'm sinking a lot of time into this project." The latter is more fun, at least. :)
You nailed it. And there were typically utilities to do it on every 'NIX box I encountered. Along with scripts to mostly automate the process. :)
Then there was my evolved scheme of using diversity where I had different OS's or filesystems with stuff handled at application layer. More complex to set up but decreases chance any one set of code is going to mess you up. Safety or security through diversity is a powerful technique. Harder to do today with so much code reuse: things might only be different on the surface with same bugs lurking underneath. All these different web security schemes using the same OpenSSL library is a good example.
My application-layer file management avoided the issues. To be honest, I didn't even know what they were other than a lack of FSCK. Problems happened, they were corrected, and so on. It's an attitude I learned from research into cluster storage and safety-critical systems.
That said, this does seem to be a big problem for anyone depending on XFS by itself. Sounds like filesystem developers should dig in and fix that zeros problem, along with any others that are known. Just seems wasteful to ditch such a good, battle-hardened filesystem over a few bugs. Better to just fix them.
XFS was designed for use by the company that made these [1] bad boys. They did it and a clustered variant. Their systems worked on both large files for media and applications where lots of small data had to be processed in parallel (or just faster). So, it was general purpose albeit tilted for the big stuff a bit. I never benchmarked it, though, so I won't speculate about what's better for small files.
"With the advance in SSD, HDD will be mainly used for big files, so XFS may yet end up dominating."
Good point. We're already seeing that with all the videos and games. So, there might be a resurgence of XFS use. I doubt it will dominate, though, unless it stays the default in enough distros.
I've always used XFS when I expect lots of small files. Maybe 10 or 15 years ago, I was benchmarking filesystems to store maildirs; XFS was the fastest by far when it came to directory scans. I'm not sure if that's still the case, but I haven't had a reason to re-examine.
I mostly used lowest common denominator of it with backups, clusters, and so on. Any other issues I countered with storage design and recovery procedures. I often did recovery around once a week or month to prevent silent buildup of issues. So, there might be issues I'm not aware of but it worked fine almost every time.
Another commenter mentioned audio skips which happened to maybe a handful of my own files out of GB of music. Could be one of those issues or a quirk with that commenter's setup.
I feel like, a few years from now, a new kernel developer will wonder why there are ext2 and ext4 modules but not an ext3 module, and this will be the answer.
The Extended File System is not a hypothesis, it was a real filesystem. But it was replaced by ext2 (from the same author) very early, and by the time the distributions we know today were being put together it had already become the default choice. I don't know when exactly ext was removed from the kernel tree.
It's the "fourth extended" filesystem, not the "thirty-fourth extended" filesystem. Hopefully we'll have something else to replace it before we get that far!
I'm glad the author chose to leave out the details and just say he left the community. We don't need to read the lurid but irrelevant details in every article about filesystems.
Your comment was enough to cause me to Google it. I'm glad I did (that isn't to say happy about the circumstance), this kind of drama doesn't come around these parts often...
As far as I can tell most of those comments predate his conviction.
During the trial, I too saw plenty of reasons not to want to assume guilt prematurely - there were plenty of weird circumstances, and the evidence did not necessarily seem overwhelming at a distance (quite possibly reading the full transcript might have reduced that doubt) though it certainly pointed in his direction. The question really was not whether or not he looked guilty, though, but whether or not the evidence proved it beyond reasonable doubt.
But of course he afterwards ended up leading police to her grave, causing whatever doubt was justified before to disappear (and the pre-conviction doubt also steadily dropped largely thanks to his own behaviour during the trial).
Heh, that's funny. Just realized we called it file system wars while in reality it was a healthy debate. Compare it to init system wars to get a perspective :o
It's not clear why they don't remove both ext2 and ext3. From the article it sounds like they're equally redundant but harmless other than taking up space in a directory listing. Does ext2 have "features" (read: potentially useful bugs) that are no longer present in ext4?
Edit: Ahh. Looks like I missed "good filesystem for developers to experiment with."
I don't doubt that at all but I'm still confused. That's what SCM is for... Not like removing it from the repo makes it disappear off the face of the earth...
Yeah, I think it's still in use on embedded systems that need read-write filesystems. (Read-only filesystems often end up being squashfs instead for space reasons.)
ext2 doesn't have a journal. I'm not sure if you can remove the journal from ext4 and use the ext4 driver, but running without a journal could make sense on usb drives.
You can run ext4 without journaling. You can also run ext2 with the ext4 driver. The reason for keeping ext2 around is more for educational purposes rather than compatibility with removable devices.
This is the second LWN subscriber link to be posted in as many days. I suspect they might have been a little peeved as posting those for the entire world to use isn't exactly the intended use.
Jonathan Corbet has explicitly said he likes seeing LWN subscriber links posted to services like HN, as long as nobody is systematically posting every paid article.
> For a while, some thought that might be a filesystem called reiser4, but that story failed to work out well even before that filesystem's primary developer left the development community.
"no fstab changes required. The ext4 driver has been able to register itself as ext[23] for quite some time now, so it's transparent.
Many/most distros have been using the ext4.ko driver for ext3 & ext2 for years. You may already be using that on some boxes, and not even know it. ;)
-Eric (Sandeen)"
Some more technical information here: http://article.gmane.org/gmane.linux.file-systems/97986 (thanks madars for the correct url)