Hacker News

Do you mean backing up an entire domain? Like example.com/*

If so that's starting to roll out in v0.8.5rc50, check out the archivebox/crawls/ folder.

If you mean archiving a single page more thoroughly, what do you find is missing in ArchiveBox? Are you able to get singlefile/chrome/wget HTML when archiving?




Edit: The first option. (previous stuff removed)

Lemme check my current version (edit: 0.7.2 -- ty, I will update and test soon :D)


Ah ok. One caveat: it's only available via the 'archivebox shell' / Python API currently; the CLI & web UIs for full-depth crawling will come later.
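For reference, the shell-only workflow looks roughly like this. This is an illustrative sketch, not the confirmed API: the `crawls.models` import path and the `Crawl` model name are assumptions based on the archivebox/crawls/ folder mentioned above, so check the actual module layout in your install.

```
$ archivebox shell        # opens an interactive Python shell in your data dir
>>> # hypothetical: inspect the new crawl models (names are assumptions)
>>> from crawls.models import Crawl
>>> Crawl.objects.all()
```

Since it's still stabilizing, treat any model fields you find as subject to change between releases.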

You can play around with the models and tasks, but I would wait a few weeks for it to stabilize and check again; it's still under heavy active development.

Check the archivebox/archivebox:dev image periodically.


No worries. I can do that.

You guys probably hear it all the time, but you are doing the Lord's work. If I thought I could be of use in that project, I would be trying to contribute myself (in fact, let me see if there's a way I can participate in a useful manner).


Thanks! I love working on archiving so far, and it's been very motivating to see more and more people getting into archiving lately.




