
Hi! Is there any one place that would be easiest for folks to grab these snapshots from? Would love to try my hand at finding documents that moved/documents that were removed.





Hmm, I can put them here for now: https://source.coop/harvard-lil/data-gov-metadata

Unfortunately it's a bit messy because we weren't initially thinking about tracking deletions. data_20241119.jsonl.zip (301k rows) and data_20250130.jsonl.zip (305k rows) are simple captures of the API on those dates. data_db_dump_20250130.jsonl.zip (311k rows) is a SQLite dump of all the entries we saw at some point between those dates. My hunch is there's something like 4,000 false positives and 2,000 deletions between the 311k and 305k sets, but that could be way off.
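
For anyone wanting to try this, here is a minimal sketch of one way to shortlist deletion candidates: take every ID in the 311k dump and subtract the IDs present in the 305k capture. It assumes each archive holds a single JSONL file and that every row has an "id" field; that field name is a guess, not something confirmed in this thread, so check the actual schema first. The function names are made up for illustration.

```python
import io
import json
import zipfile


def read_ids(lines, key="id"):
    """Collect the value of `key` from each non-empty JSONL line.

    `key` defaults to "id", which is an assumed field name.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            ids.add(json.loads(line)[key])
    return ids


def read_ids_from_zip(path, key="id"):
    """Read IDs from the first member of a .jsonl.zip archive.

    Assumes the archive contains exactly one JSONL file.
    """
    with zipfile.ZipFile(path) as zf:
        member = zf.namelist()[0]
        with zf.open(member) as fh:
            return read_ids(io.TextIOWrapper(fh, encoding="utf-8"), key)


def candidate_deletions(all_seen_ids, latest_ids):
    """Return IDs seen at some point but absent from the latest capture."""
    return all_seen_ids - latest_ids


if __name__ == "__main__":
    seen = read_ids_from_zip("data_db_dump_20250130.jsonl.zip")
    latest = read_ids_from_zip("data_20250130.jsonl.zip")
    missing = candidate_deletions(seen, latest)
    print(f"{len(missing)} candidate deletions")
```

The output is only a starting point: it will still contain the false positives mentioned above, so each candidate needs a second check before being counted as a real deletion.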


Very cool! I'll take a look :)


