PSA for anyone playing around with that haveibeenpwned file: the text list is 30 GB after un7zipping but every line is padded with many trailing spaces (up to 61 chars I think, making all lines equal in length) and the line endings are the Windows ones (\r\n, instead of just a \n). With trailing spaces and \rs stripped the file is 'just' 21 GB and fits onto a RAM disk on a 32 GB machine with no swapping.
This has bit me because I was playing with that file and to save space and fit it onto a RAM disk (to save time) I've converted it to binary and then when converting it back it had a different hash despite being 'exactly the same', only later I've realized it's using Windows line endings and trailing spaces padding (and that I didn't even have to bother with binary, since 21 GB comfortably fits in a RAM disk on a machine with 32 GB RAM while 30 GB didn't with browser, Windows, etc. running).
Troy said it's a quirk of the export, might be fixed in next version of the file but doesn't matter for the compressed 7z size (which is true, I've checked personally with 7z going at it for hours and it really did compress those line endings and spaces down to nothing).
Edit: Also see my other comment about how very overly eager at rejecting passwords this filter might be (unless I've made an error in my reasoning that someone has since pointed out): https://news.ycombinator.com/item?id=17323068
This has bit me because I was playing with that file and to save space and fit it onto a RAM disk (to save time) I've converted it to binary and then when converting it back it had a different hash despite being 'exactly the same', only later I've realized it's using Windows line endings and trailing spaces padding (and that I didn't even have to bother with binary, since 21 GB comfortably fits in a RAM disk on a machine with 32 GB RAM while 30 GB didn't with browser, Windows, etc. running).
Troy said it's a quirk of the export, might be fixed in next version of the file but doesn't matter for the compressed 7z size (which is true, I've checked personally with 7z going at it for hours and it really did compress those line endings and spaces down to nothing).
Edit: Also see my other comment about how very overly eager at rejecting passwords this filter might be (unless I've made an error in my reasoning that someone has since pointed out): https://news.ycombinator.com/item?id=17323068