
I've been using Time Machine backups for years (company policy) via an external USB drive, and now that it's full and I only tend to remember it once every couple of months, a backup takes a full workday or more to run.

At the same time I've got Arq Backup running to back up my code folder (not everything in there is on accessible git remotes for me), but it's very heavy as well given the number of small files (code + git files). But at least it doesn't end up months out of date I guess.

Does anyone have a good backup solution for one's code folder? A large number of small files (probably tens or hundreds of thousands; it's got a load of node_modules folders as well, I'm sure).




I have an hourly cron job that rsyncs my code folder to another one with venv, node_modules, Rust target directory etc. excluded. I only back up that folder. That cuts out most pointless small files and saves a lot of space too. I haven’t had much problem with .git folders because (1) I can only work so fast so the delta is usually fairly small; (2) periodic rsync cuts out a lot of intermediate changes. But if you still need to optimize, maybe you could repack .git too.
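Roughly like this, if it helps anyone; the script name, paths and exclude list below are just placeholders for my setup:

    # ~/bin/backup-code.sh -- mirror the code tree, skipping dependency/build dirs
    rsync -a --delete \
        --exclude 'node_modules/' --exclude 'venv/' --exclude '.venv/' \
        --exclude 'target/' \
        ~/code/ ~/code-mirror/

    # crontab -e: run it at the top of every hour
    0 * * * * ~/bin/backup-code.sh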


My code folder is a symlink to a folder in my iCloud Drive. The beauty is that all my code gets synced to the cloud automatically, and the bonus is that I can see this folder on every machine that syncs with the same Apple account. I believe this approach works with any cloud drive like Dropbox, Google Drive, OneDrive, etc.
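For anyone curious, the setup is just a move plus a symlink; the iCloud Drive path below is the standard macOS location, and the folder names are whatever you use:

    # move the real folder into iCloud Drive, then link it back to the old path
    mv ~/code ~/Library/Mobile\ Documents/com~apple~CloudDocs/code
    ln -s ~/Library/Mobile\ Documents/com~apple~CloudDocs/code ~/code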


It also means that you can destroy your ‘backup’ easily, and get your erroneous change synced to every machine that syncs with the same Apple account, destroying most copies of the affected file(s).

Unlike Dropbox, iCloud Sync doesn’t version files, so that’s not easily corrected, even if you spot the problem early.


> Does anyone have a good backup solution for one's code folder? A large number of small files (probably tens or hundreds of thousands; it's got a load of node_modules folders as well, I'm sure).

I've been using Duplicacy (along with Arq and Time Machine) which is amazing at speedy backups and deduplication. However, I've found that restores require quite a bit of CLI-fu [1].

Considering a move to Restic, because they have a killer feature which allows one to mount a snapshot as a FUSE filesystem [2].

[1] https://forum.duplicacy.com/t/restore-command-details/1102

[2] https://restic.readthedocs.io/en/v0.3.2/Manual/#mount-a-repo...
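For reference, the mount workflow looks roughly like this (repo path and mountpoint are just placeholders):

    # browse snapshots as a read-only filesystem instead of scripting a restore
    restic -r /mnt/backup/restic-repo mount /mnt/restic
    ls /mnt/restic/snapshots/latest/
    # copy out what you need with cp/rsync, then Ctrl-C to unmount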


Borg Backup is quite good. It doesn't take advantage of FSlogger, but otherwise is well designed.


Arq is great, but like all third-party backup solutions, it relies on scanning the file system. See my comment: https://news.ycombinator.com/item?id=22311053.


If the backup disk is full, I'm not surprised doing a backup takes ages. It's got to purge old stuff on the fly alongside backing up new stuff and constantly rebuild indexes, which will grow and trigger even more purging, requiring more re-indexing. Horrible. Your backup disk should always have some free working space so it can run efficiently.

Can't you reduce the retention period so you always have some working space free, or just get an appropriately sized backup disk? Time Machine backups should run automatically every hour and finish in just a few minutes on a well-configured system.


Use git and gitignore the node_modules folder. You shouldn't need to back up that directory; the package.json file should have all the required package info.
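e.g. a .gitignore along these lines, plus whatever else your projects generate:

    # .gitignore
    node_modules/
    dist/
    *.log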


Or use yarn 2, where if you check in the local yarn cache folder, it's much, much smaller because it stores compressed blobs for each dependency.


Probably best to ignore both git and node_modules folders with arq. Arq goes back and validates old backups sometimes (weirdly, this makes my fans spin up much more than actual backups do), and both of those folders are going to be scanned and revalidated after the initial backup.

Fun fact, rumor on the webs is that a new Arq version is in the works! (Look at their twitter feed for screenshots they sent somebody recently)


> Probably best to ignore both git and node_modules folders with arq.

I agree with this. I also have an (is) `Photos Library.photoslibrary` rule and a (contains) `cache` rule as my exclusions. The exclusions have made my Arq backups less painful.


ValentineC, this is the best Hacker News response I've gotten in a long time! Excited to mark those cache folders and stop syncing 200 MB every hour.


It's not a Mac-centric solution, but all of my computers' crucial folders are synced to each other via Syncthing.

Desktop computers also back them up to another disk via Back In Time (an advanced snapshot backup tool built on rsync).

This gives me realtime replication and two co-located backups. It's a fire & forget kind of setup.


It’s definitely good to add more to the exclusion list for Time Machine (using System Preferences), since it might pull in very large things that you just don’t care about. For example, take a look at some things in ~/Library.
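If you'd rather script it, tmutil can manage the same exclusion list from the terminal; the paths here are just examples of large ~/Library items:

    # path-based exclusions, the same list System Preferences shows
    sudo tmutil addexclusion -p ~/Library/Caches
    sudo tmutil addexclusion -p ~/Library/Developer/Xcode/DerivedData
    # check whether a given path is excluded
    tmutil isexcluded ~/Library/Caches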


Can’t you just keep the external drive connected so that it does small incremental backups all the time?

I have a local external drive on a USB hub. I also have Backblaze for offsite backup. Neither one requires active management.


If you use git, even without remotes, then perhaps use rsync or rclone to sync those repos to one or more storage areas? Could be sent to an SSH server, Google Drive, Backblaze B2 or whatever you like.
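A minimal rclone sketch, assuming you've already configured a remote named "b2" and want to skip dependency folders:

    # one-way sync of the code tree to a bucket on the configured remote
    rclone sync ~/code b2:my-code-backup \
        --exclude 'node_modules/**' --exclude '.venv/**'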


I use FreeFileSync periodically.

Relying on a cloud-based system always seems dodgy to me, since they sometimes get "confused" over which is the master.

I also put my code into Fossil for an easy-to-copy and move .db file.


I back up my entire home directory, including all gitted code projects, twice daily using restic. It seems to work for now.
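In case the exact invocation is useful, it's roughly this; repo location, password handling and the exclude/retention choices are all per-taste:

    restic -r /mnt/backup/restic-repo backup ~ \
        --exclude node_modules --exclude .cache
    # trim old snapshots occasionally
    restic -r /mnt/backup/restic-repo forget --keep-daily 7 --keep-weekly 4 --prune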


Hook up a Time Machine NAS or similar to do the backups wirelessly?


You publish it on the Internet.



