
The scenario they wanted to handle is not a failing deletion request, but multiple requests caused by deleting one e-mail. This can happen if the index DB thinks the request failed and retries it, while it actually went through on the file DB side.

This would cause the file to be removed too early, as the counter would be decremented more than it should be.
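A minimal sketch of that failure mode, assuming a plain per-file reference counter and leaving the magic number out. The names `refcount`, `file_present` and `delete_request` are made up for illustration and are not taken from the system being discussed:

```shell
#!/bin/sh
# Hypothetical model: two e-mails share one attachment, so the counter starts at 2.
refcount=2
file_present=yes

# Naive, non-idempotent handler: every delete request decrements the counter.
delete_request() {
    refcount=$((refcount - 1))
    if [ "$refcount" -le 0 ]; then
        file_present=no   # the file DB would remove the physical file here
    fi
}

delete_request   # e-mail 1 is deleted; the request goes through, but the reply is lost
delete_request   # the index DB believes it failed and retries the same request

echo "refcount=$refcount file_present=$file_present"
```

After the retry the counter is at zero and the file is gone, even though e-mail 2 still references it.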

That said, this magic number solution sounds pretty fidgety - preventing the index DB from retrying deletion requests would be enough to achieve the same effect (some unnecessary files kept, but no erroneous deletions).




There could be other necessary cleanups in the delete function that need to be completed, aside from the deletion of the physical file. Other than that I generally agree, but I also know never to assume too much about code written by other people.

I guess another way of solving this would be to put each file in its own folder on the file system, giving the folder the same name as the original file and renaming the file to something fixed, "file" or whatever. Then you create one symlink per email pointing to the file, and the email links to that symlink. When deleting the email, you delete the symlink. After deleting the symlink, you check whether any symlinks remain in the folder, and safely delete the folder if none are present.

On receiving the first email with filehash X:

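  mv ./original_unique_name.file ./newfilename                # move aside so the name is free for the folder
  mkdir ./original_unique_name.file
  mv ./newfilename ./original_unique_name.file/newfilename
  ln -s newfilename ./original_unique_name.file/symlink_for_email_1   # target is relative to the symlink's folder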
On receiving subsequent emails with filehash X:

  ln -s newfilename ./original_unique_name.file/symlink_for_email_2
  ln -s newfilename ./original_unique_name.file/symlink_for_email_3
  ln -s newfilename ./original_unique_name.file/symlink_for_email_4
 ... etc
On delete email Y:

  unlink ./original_unique_name.file/symlink_for_email_$Y
  [ -z "$(find ./original_unique_name.file/ -type l)" ] && rm -rf ./original_unique_name.file/   # no symlinks left
This avoids the need for a counter, as each email links to its own symlink.

Will cost a few bytes per email, but seems like they have some to spare!
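Put together as a runnable sketch, under the assumption of a single local file system. The helper names `add_ref` and `del_ref`, the `attachment.file` folder name and the temp directory are hypothetical, chosen only for this example:

```shell
#!/bin/sh
set -eu
store=$(mktemp -d)              # hypothetical storage root
blob="$store/attachment.file"   # folder named after the original file

add_ref() {   # $1 = e-mail id
    # The link target is relative to the folder that holds the symlink.
    ln -s newfilename "$blob/symlink_for_email_$1"
}

del_ref() {   # $1 = e-mail id; repeating the call is harmless
    rm -f "$blob/symlink_for_email_$1"
    # Remove the folder only when no symlinks are left in it.
    if [ -z "$(find "$blob" -type l)" ]; then
        rm -rf "$blob"
    fi
}

mkdir "$blob"
echo "attachment body" > "$blob/newfilename"
add_ref 1
add_ref 2

del_ref 1
del_ref 1   # retried request: the symlink is already gone, nothing else changes
[ -f "$blob/newfilename" ] && echo "file still present after the retry"

del_ref 2
[ -d "$blob" ] || echo "folder removed after the last reference"
```

Because each delete names a specific symlink, a retried request cannot remove a second reference. The check-then-delete step in `del_ref` is not atomic, though, so a real implementation would presumably need some form of locking against a concurrent `add_ref`.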


1. This has nothing to do with distributed storage

2. Symlinks are not required to point to an existing file

3. Symlinks are not atomic

4. You still need to maintain the file DB (or how do you resolve which storage contains a given file?)
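Point 2 is easy to see for yourself. A small sketch, using a throwaway temp directory and made-up names (`does_not_exist`, `link`):

```shell
#!/bin/sh
dir=$(mktemp -d)

# ln -s does not check the target, so this succeeds although nothing is there.
ln -s does_not_exist "$dir/link"

[ -L "$dir/link" ] && echo "the symlink itself exists"
[ -e "$dir/link" ] || echo "but its target does not"
```

`test -L` inspects the link itself, while `test -e` follows it, which is why the two checks disagree for a dangling symlink.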


1. Depends on your distributed storage. If you use a clustered file system, your argument is void.

2. While true, I fail to see how that is relevant. What part of my described flow would be broken?

3. While true, I again fail to see the relevance. Are you just listing characteristics of symlinks? A field in a database could just as well hold a file path pointing to a non-existent file.

4. Sure. I was describing how to avoid the counter and magic number, not the database.



