Hacker News | _ose7's comments

They currently have 150B posts (https://www.tumblr.com/about) so I expect that to be quite challenging. If you seriously make any progress though, it'd be cool to see.


Maybe 1:100 of those (being incredibly generous) are actually unique posts and not someone else reposting something from someone else, so maybe it's not such a hard task...


It's fairly common for commentary to be added to a reblog via tags rather than after the quoted post, so even reblogs without commentary may need saving.


This assumes there is commentary on Tumblr worth saving.


I'm guessing you've bought into the meme that Tumblr is literally nothing but left-wing politics.


No, the comment section on an average Tumblr page:

X has shared this Y has shared this Z has shared this ....

Actual commentary? Not so much.

On a side note, I tried going back there to see if the situation has improved, and somehow the interface is even worse now. The blog I was trying to read would only appear as a slide-in on the side and would disappear at the drop of a hat. I don't even understand how you are supposed to use it now.


Are you saying that meme is inaccurate?


Are you implying it's not?


Yes. The meme aligns with my own experience of Tumblr, so I'm inclined to believe it over contrary anecdotes (of course, a more rigorous study would be a different story).


That and porn, and Steven Universe fans.


Step 1: Hack into a supercomputing center with at least a 1Gbps line.

Step 2: Massively-parallel downloading of all the sites using clustered nodes, compression of it, and resulting data stored into high-performance, clustered filesystem.

Step 3: Move it off of there when traffic is low or overnight if system doesn't go offline overnight.


Step 0: Acquire at least ~1PB of storage to store all the data.
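For scale, here is a rough sketch of how long ~1PB would take to move over the 1Gbps line from Step 1. It assumes decimal units (1PB = 10^15 bytes) and a fully saturated link with no protocol overhead, both optimistic assumptions; `transfer_days` is just an illustrative name.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(total_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move total_bytes over a link running flat out."""
    seconds = total_bytes * 8 / link_bits_per_second  # bytes -> bits, then divide by rate
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1PB over 1Gbps, ignoring overhead and contention
    print(f"{transfer_days(1e15, 1e9):.1f} days")
```

Under those assumptions that is roughly three months of sustained transfer, so the line speed matters about as much as the storage.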


I have over 1PB of storage at my disposal.


Business or personal? It's doable but it's a lot of money to buy all those drives.


Personal.


$6k for a fully assembled 600TB Backblaze storage pod: https://www.backuppods.com (no affiliation, just pointing out that ~$12k for two isn't a massive amount).


Note that hard drives are not included.

1000TB / 8TB = 125 HDDs

125 * $200 = $25k
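The same arithmetic as a small sketch, in case anyone wants to plug in other drive sizes or prices. The 8TB and $200 figures are the ones above; `drive_cost` is a made-up helper name, and it rounds up to whole drives without accounting for redundancy or spares.

```python
import math


def drive_cost(total_tb: float, drive_tb: float, price_per_drive: float):
    """Return (number of drives, total price) for a target raw capacity."""
    drives = math.ceil(total_tb / drive_tb)  # can't buy a fraction of a drive
    return drives, drives * price_per_drive


if __name__ == "__main__":
    drives, cost = drive_cost(1000, 8, 200)
    print(f"{drives} HDDs, ${cost:,}")
```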


Aww! Damn my quick posting (and too good to be true!). Thanks.


It's in the supercomputing center. How about I modify it so that a filtering step is run, deleting everything that doesn't match the desired image features?


> Step 3: Move it off of there when traffic is low or overnight if system doesn't go offline overnight.

Where are you moving it to? You think that even if you manage to hack into a "supercomputing center" that nobody's going to notice 1PB of storage filled with GIFs?


Having worked in a supercomputing center for a detector at the LHC, yeah, you could probably get away with storing a few hundred TBs for a short period of time without anyone noticing or caring. A whole petabyte might be pushing it.


And I learned about it from you people. All of them said security was lax. Most of them said they personally were using the supercomputer for their own stuff at some point.


What's this "you people," bub? :) In this case, it's not a security issue, it's a resource accounting issue.


People working in and around HPC centers. Obviously. :)

And no, it's both an accounting and a security issue. One guy I know, who does security in ASICs and stole HPC time in the past, did it by modifying the accounting system to not show his jobs. It was easy, as it wasn't designed to stop accounting fraud by hackers.

