
There is nothing I hate more than an app that secretly modifies files when I open them. Then I have to get all defensive and copy files before opening them to keep them intact. You may not see the problem with changing the checksum or hash of a file, but silently tampering with files is a nightmare in many domains, even when the change is trivial (some apps like to store presentation state, e.g. window positions, last page viewed, zoom level, ...).

For example in many regulated domains such as human subjects research files must be approved and only approved files may be used. "Is this version of the consent document the version that the IRB approved?" Well let's see... (1) file modification date is after the approval date and (2) checksums do not match.
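Check (2) is just a content-hash comparison against the digest recorded at approval time. A minimal sketch in Python (the helper names are mine, not any particular compliance tool's API):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_approved(path: str, approved_digest: str) -> bool:
    """True only if the file on disk is byte-identical to the approved version."""
    return sha256_of(path) == approved_digest
```

Even a one-byte "trivial" write by a viewer app makes `matches_approved` fail, which is exactly the audit problem described above.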

Not to mention that writing a single byte of content to a file marks the entire blob as needing backup.

The fact is the filesystem is the user's database, save is commit, and it should be under the user's control, because application developers do not have the faintest idea about user context.




You are right, this is a use case I absolutely did not take into account, but I want to separate user-defined actions from app-defined actions. A zoom level is something a user sets to read a document, and I wouldn't consider that data to be persisted automatically, unlike characters typed or a font chosen. I value the idea of persisting it, but that would be an explicit user action ("Save view" or something like that).

In the case of checksums in a database, that is why read-only modes should be used and I don't see what automatically saving would change. If anything, when the user zooms on a document in read-only mode, either it shouldn't be stored or storing it should trigger the same flow as modifying the document


I see what you are saying, but in other use cases the presentation state needs to be considered part of the document. This is one of the reasons zip/jar containers work somewhat well: you can audit different chunks of data separately and cryptographically sign them. sqlite actually has an archive format[1] that is interesting to think about, and I have pondered using it for some applications (store the files and also store tables of metadata/analysis).

[1] https://sqlite.org/sqlar.html
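The sqlar table layout documented at that link is simple enough to sketch directly; roughly (helper names are mine, and per the docs the blob is zlib-compressed only when that is actually smaller than the original):

```python
import os, sqlite3, zlib

# Schema as documented at https://sqlite.org/sqlar.html
SQLAR_SCHEMA = ("CREATE TABLE IF NOT EXISTS sqlar("
                "name TEXT PRIMARY KEY, mode INT, mtime INT, sz INT, data BLOB)")

def sqlar_add(db_path: str, file_path: str, name: str) -> None:
    """Store one file in an sqlar-style archive."""
    raw = open(file_path, "rb").read()
    packed = zlib.compress(raw, 9)
    # sqlar stores the compressed form only if it is smaller than the original
    data = packed if len(packed) < len(raw) else raw
    st = os.stat(file_path)
    con = sqlite3.connect(db_path)
    con.execute(SQLAR_SCHEMA)
    con.execute("REPLACE INTO sqlar(name, mode, mtime, sz, data) VALUES (?,?,?,?,?)",
                (name, st.st_mode, int(st.st_mtime), len(raw), data))
    con.commit()
    con.close()

def sqlar_read(db_path: str, name: str) -> bytes:
    """Fetch one member; data shorter than sz means it was compressed."""
    con = sqlite3.connect(db_path)
    sz, data = con.execute("SELECT sz, data FROM sqlar WHERE name = ?",
                           (name,)).fetchone()
    con.close()
    return zlib.decompress(data) if len(data) < sz else bytes(data)
```

The nice part for the auditing use case is that extra metadata/analysis tables can live in the same database file alongside the sqlar table.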


Why not do it like Blender: autosave into an application-owned directory, offer to restore on crash, allow restoring any of the last n autosaves from disk, and add a setting for how many to keep in the options.
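A minimal sketch of that rotation scheme (Python; the names and layout are my own illustration, not Blender's actual implementation):

```python
import os, shutil, time

def autosave(work_path: str, autosave_dir: str, keep: int = 5) -> str:
    """Snapshot the working file into an app-owned autosave directory,
    keeping only the most recent `keep` snapshots. The user's original
    file on disk is never touched."""
    os.makedirs(autosave_dir, exist_ok=True)
    base = os.path.basename(work_path)
    # nanosecond timestamp keeps names unique and lexicographically ordered
    dest = os.path.join(autosave_dir, f"{base}.{time.time_ns()}.autosave")
    shutil.copy2(work_path, dest)
    # prune the oldest snapshots beyond the configured limit
    snaps = sorted(p for p in os.listdir(autosave_dir) if p.endswith(".autosave"))
    for old in snaps[:-keep]:
        os.remove(os.path.join(autosave_dir, old))
    return dest
```

On crash, the app just offers the newest snapshot back; the user's explicit "save" remains the only thing that writes to their file.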


Word has a fairly simple solution to this: there's a big slider labeled "Autosave" in the title bar, right next to the save button, allowing you to turn this behavior on and off at any time.

95% of the time I want changes persisted immediately, but it's nice to be able to turn it off when I don't.


Depends a little on the type of file. A prose document, sure, probably want autosave by default. A vector graphics file? I want autosave when I'm creating it, but I do NOT want autosave when I'm copying out a piece buried several groups in and behind some things I need to delete/move out of the way. I also don't want to have to think about whether I need it or not.

But generally the way autosave works is to save a copy that can be recovered on a crash, and only overwrite the original if directed by the user. That works for both use cases. (Haven't used Word in years, so I'm not sure if they have a different behavior now.)


For an application working with reasonably sized sqlite files it would be reasonable to

1. on opening a file clone it to a temporary folder

2. edit the temporary file there on disk

3. on save mv/cp the temporary file over the destination

I am probably missing a lot of use cases, but it might be a good idea for a game like Factorio, where you are expected to have multiple on-disk saves of the same run at different times.
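The three steps above can be sketched like this (the `Document` class is illustrative, not any particular app's API); doing the final step as a write-then-rename makes the replacement atomic on POSIX filesystems:

```python
import os, shutil, tempfile

class Document:
    """Edit a copy in a temp folder; only an explicit save touches the original."""

    def __init__(self, path: str):
        self.path = path
        self.workdir = tempfile.mkdtemp(prefix="edit-")
        self.work_path = os.path.join(self.workdir, os.path.basename(path))
        shutil.copy2(path, self.work_path)   # 1. clone to a temporary folder
        # 2. the app now edits self.work_path freely on disk

    def save(self) -> None:
        # 3. copy the temp file over the destination: write to a sibling
        # name first, then rename, so readers never see a half-written file
        tmp = self.path + ".tmp"
        shutil.copy2(self.work_path, tmp)
        os.replace(tmp, self.path)
```

A crash at any point before `save()` leaves the original byte-for-byte intact, which is exactly the property the regulated-domain comment above cares about.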


In the sqlite case, it actually can keep uncommitted edits in a separate journal file until they are committed. At least, one of the systems I am familiar with that uses sqlite as a container format (an MRI scanner) seems to do this, so I suspect sqlite supports that mode natively.
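That matches sqlite's default rollback-journal behavior: pending writes are invisible to other connections until commit, with the original page content preserved in a `-journal` side file so the transaction can roll back. A small demonstration (my own example, using Python's stdlib sqlite3 bindings):

```python
import os, sqlite3, tempfile

def visible_rows(path: str) -> int:
    """Count rows as seen by a fresh, independent connection."""
    con = sqlite3.connect(path)
    n = con.execute("SELECT count(*) FROM notes").fetchone()[0]
    con.close()
    return n

db = os.path.join(tempfile.mkdtemp(), "doc.db")
writer = sqlite3.connect(db)
writer.execute("CREATE TABLE notes(body TEXT)")
writer.commit()

# The INSERT opens a write transaction; until commit, the change stays in
# the writer's page cache (with rollback state in doc.db-journal) and
# other connections cannot see it.
writer.execute("INSERT INTO notes VALUES ('draft edit')")
uncommitted = visible_rows(db)

writer.commit()
committed = visible_rows(db)
```

So a document app built on sqlite gets "unsaved edits don't touch the saved state" semantics more or less for free.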

I'm just pushing back against the idea that it's a good or helpful idea to "help the users" by taking the deliberate "save" action away from them.

As an aside, one of the things that has been learned from this class of MRI scanners is that users need to feel "in control" of the machines they're using. The "look how smart this machine is by doing all these magical things you used to do yourself!" attitude works well in sales but really does not go over well in the field because users encounter the fuckups and are held responsible for them. So they quickly start to distrust the machines.


> Not to mention that writing a single byte of content to a file marks the entire blob as needing backup.

If the size, mtime, and inode number stay the same (i.e. the app writes into the file directly instead of replacing it), then most backup software will skip it. AFAIK to do otherwise you either need to read the whole file every time, or live-monitor audit events to see which files have been opened for write, or be the filesystem (e.g. ZFS snapshots, which can be maximally efficient since the filesystem knows exactly which blocks have been modified).
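That skip heuristic looks roughly like this (an rsync-style quick check; the field and function names are mine):

```python
import os

def record(path: str) -> dict:
    """Snapshot the stat fields a typical incremental backup remembers."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "inode": st.st_ino}

def looks_unchanged(path: str, recorded: dict) -> bool:
    """If size, mtime, and inode all match the previous run, assume the
    file is unchanged and skip re-reading it. An in-place write that
    preserved all three would slip past this check."""
    st = os.stat(path)
    return (st.st_size == recorded["size"]
            and st.st_mtime_ns == recorded["mtime_ns"]
            and st.st_ino == recorded["inode"])
```

Tools that need stronger guarantees (rsync's `--checksum`, for example) fall back to reading every byte, which is exactly the expensive path this heuristic exists to avoid.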

Of course this has its own downsides. While those writes may have been "unimportant", the fact is that your backups are now flawed. And if the application has had the foresight to distinguish unimportant writes, and preserve the mtime, I'd rather they just not make those writes in the first place


I am under the impression that modifying a file's content updates the modification time. Is this incorrect? Modifying a file without updating the mtime, or allowing mtimes to be edited in userspace, sounds like a security nightmare.


On Linux, the mtime ("modification time") is indeed automatically updated when the content is changed, but it can also be set arbitrarily from userspace (e.g. using `touch`) without special permissions. This is very useful, e.g. when you mirror a directory you'll typically want to preserve the mtimes, to help identify changes later.

Linux also has a ctime ("status change time"), which is automatically updated when the content or metadata (inode) are changed. It is not possible to change from userspace (you have to use tricks such as changing the computer clock or directly modifying the disk). This gives you the security benefits, but it is not commonly used e.g. in backup tools, precisely because they want to be able to set "fake" timestamps.
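Both behaviors are easy to see from Python on Linux (this assumes a filesystem with standard POSIX timestamp semantics):

```python
import os, tempfile, time

p = os.path.join(tempfile.mkdtemp(), "mirror.txt")
with open(p, "w") as f:
    f.write("data")

# mtime can be set arbitrarily from userspace by the file's owner,
# no special privileges needed (this is what `touch -d` does)
past = time.time() - 86400          # pretend the file is a day old
os.utime(p, (past, past))           # (atime, mtime)
st = os.stat(p)
backdated_mtime = st.st_mtime

# ctime cannot be backdated this way: the utime() call is itself a
# metadata change, so the kernel bumps ctime to "now"
ctime_after = st.st_ctime
```

After this runs, `backdated_mtime` is a day in the past while `ctime_after` is the current time, which is the mirror-friendly vs. tamper-evident split described above.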


Yes it does. mtimes can definitely be edited, if you have permission, but it is rare. I have a photo script that pulls the taken time from Exif and writes it to the file mtime.


> mtimes can definitely be edited, if you have permission

It just requires you to be the owner of the file (https://serverfault.com/a/337810), no special permissions. On the other hand, editing the ctime is not easy -- it requires tricks (e.g. change the computer clock, which typically requires root privileges).


There are also SSD wear issues. There shouldn't be, because SSDs are durable, but some applications find dumb reasons to write multiple GB in a minute.

And by some applications I pretty much just mean browsers, but still.




