They're also forcing all adult-oriented content on Tumblr (which isn't limited to porn, although porn makes up the majority of their userbase) behind a logged-in-users-only wall, blocking it from non-Tumblr users and, in turn, from external search engine results.
They currently have 150B posts (https://www.tumblr.com/about) so I expect that to be quite challenging. If you seriously make any progress though, it'd be cool to see.
Maybe 1 in 100 of those (being incredibly generous) are actually unique posts rather than someone reblogging someone else's post, so maybe it's not such a hard task...
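To put rough numbers on that guess, here is a back-of-the-envelope sketch. The 150B total is from the about page linked above. The 1-in-100 ratio is just the guess from this comment, and the 100 KB average post size is a placeholder I made up for illustration, not a Tumblr statistic.

```python
TOTAL_POSTS = 150_000_000_000   # "150B posts" from the about page
UNIQUE_RATIO = 1 / 100          # the guess above, not a measured value


def unique_posts(total: int = TOTAL_POSTS, ratio: float = UNIQUE_RATIO) -> int:
    """Estimated number of original (non-reblog) posts."""
    return round(total * ratio)


def storage_tb(posts: int, avg_bytes_per_post: int) -> float:
    """Storage in terabytes (10**12 bytes) for an assumed average post size."""
    return posts * avg_bytes_per_post / 10**12


if __name__ == "__main__":
    n = unique_posts()
    print(f"{n:,} unique posts")
    # 100 KB per post is a hypothetical placeholder, not real data.
    print(f"{storage_tb(n, 100_000):,.0f} TB at 100 KB per post")
```

Under those assumptions it comes out to 1.5 billion posts, which is still a lot, but a very different problem from 150 billion.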
It's fairly common for commentary to be added to a reblog via tags rather than after the quoted post, so even reblogs without commentary may need saving.
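In scraper terms, that suggests a filter along these lines. This is only a sketch: the `Post` class and its field names are hypothetical and do not reflect Tumblr's actual API schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Post:
    # Hypothetical fields for illustration only.
    post_id: int
    reblogged_from: Optional[int] = None   # None means an original post
    added_comment: str = ""
    tags: List[str] = field(default_factory=list)


def worth_saving(post: Post) -> bool:
    """Keep originals, plus any reblog that adds a comment or tags."""
    if post.reblogged_from is None:
        return True
    return bool(post.added_comment.strip()) or bool(post.tags)


if __name__ == "__main__":
    posts = [
        Post(1),                                        # original
        Post(2, reblogged_from=1),                      # bare reblog
        Post(3, reblogged_from=1, tags=["so true"]),    # commentary in tags
    ]
    print([p.post_id for p in posts if worth_saving(p)])
```

Only the bare reblog gets dropped here. The reblog whose commentary lives in its tags is kept.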
No, the comment section on an average tumblr page:
X has shared this
Y has shared this
Z has shared this
....
Actual commentary? Not so much.
On a side note, I tried going back there to see if the situation has improved, and somehow the interface is even worse now. The blog I was trying to read would only appear as a slide-in on the side and would disappear at the drop of a hat. I don't even understand how you are supposed to use it now.
Yes. The meme aligns with my own experience of tumblr, so I'm inclined to believe it over contrary anecdotes (of course a more rigorous study would be a different story)
Step 1: Hack into a supercomputing center with at least a 1Gbps line.
Step 2: Massively parallel downloading of all the sites using clustered nodes, compressing it, and storing the resulting data in a high-performance, clustered filesystem.
Step 3: Move it off of there when traffic is low or overnight if system doesn't go offline overnight.
It's in the supercomputing center. How about I modify it so that a filtering step is run, deleting everything that doesn't match the desired image features?
> Step 3: Move it off of there when traffic is low or overnight if system doesn't go offline overnight.
Where are you moving it to? You think that even if you manage to hack into a "supercomputing center" that nobody's going to notice 1PB of storage filled with GIFs?
Having worked in a supercomputing center for a detector at the LHC, yeah, you could probably get away with storing a few hundred TBs for a short period of time without anyone noticing or caring. A whole petabyte might be pushing it.
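For what it's worth, the move-it-off step is also slow. A quick sketch using the two figures from this subthread (1 PB of data, a 1Gbps line), and assuming decimal units and a fully saturated link unless you pass a lower `utilization`:

```python
def transfer_days(size_bytes: float, link_bits_per_s: float,
                  utilization: float = 1.0) -> float:
    """Days needed to move size_bytes over a link at the given utilization."""
    seconds = size_bytes * 8 / (link_bits_per_s * utilization)
    return seconds / 86_400


PB = 10**15    # decimal petabyte, an assumption about what "1PB" means here
GBPS = 10**9   # 1 Gbps in bits per second

if __name__ == "__main__":
    print(f"{transfer_days(PB, GBPS):.1f} days at full line rate")
    print(f"{transfer_days(PB, GBPS, utilization=0.5):.1f} days at half rate")
```

At full line rate that is roughly three months of continuous transfer, so "overnight" would have to mean a lot of nights.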
And I learned about it from you people. All of them said security was lax. Most of them said they personally were using the supercomputer for their own stuff at some point.
People working in and around HPC centers. Obviously. :)
And no, it's both an accounting and a security issue. One guy I know, who does security for ASICs and stole HPC time in the past, did it by modifying the accounting system to not show his jobs. It was easy, as it wasn't designed to stop accounting fraud by hackers.
I've had similar problems. At this point, I basically run a botnet, albeit one I pay for. I have a rolling swarm of DigitalOcean and Vultr VMs that act as RPC clients for a custom system I wrote.
I've been meaning to integrate wpull into my scraper structure, so I can also ship what I scrape off to the Internet Archive, but I've not had the motivation.
Tumblr has a huge porn community, and always has. I don't have actual statistics, but it's huge. In general, people on tumblr might have a regular, personal blog, and then a side blog where they reblog and/or comment on/add captions to pornographic images, gifs and videos. In that sense, they can sort of curate porn that interests and appeals to them. The addition of this personal touch and intellectual component appeals to a lot of people, including a lot of women. But there are also various sex advice blogs, etc., which will be affected by this change as well, artists who do non-pornographic nude artwork, and potentially even various LGBT communities where sex can at times be a topic.
There are various levels of NSFW-flagging on Tumblr: users flagging their own site NSFW, Tumblr marking a blog NSFW and the user having no way to change that, or enough individual posts being flagged NSFW by users and AI that Tumblr decides to call the entire blog NSFW.
Maybe your vision of tumblr's userbase is biased by the kind of people you follow. Personally, I'm on the "computer science/glitch art/study/science" part of tumblr and I don't see much NSFW content (apart from some bots that follow me, which I block automatically).
The areas are kept quite separate, but that doesn't mean that there isn't still a huge porn userbase. Many people on tumblr have multiple blogs on different subjects, and deliberately keep their porn blog(s) separate from the rest.
Sounds a lot like what happened with LiveJournal. About 10-15 years ago LJ was where all the LGBT and niche erotic interests were, like fanfic. It's been bought by a Russian company and they've been turning the screws on the LGBT users for years.
Probably not; all the stats I've seen show that it's something like 20-25% of total consumption, so certainly a lot, but not a majority.
There are reports that upwards of 80% of Tumblr users have been "exposed to porn" occasionally or even accidentally, but that doesn't mean that the majority of Tumblr is porn.
This is a direction they headed in shortly after Yahoo acquired them; it's not a new thing with the Verizon acquisition. Blogs marked "adult" and NSFW posts already won't show up in search engines [edit: I stand corrected], by default in searches on the website, or in searches on the mobile app (even when logged in).
That is not true at all. All Tumblr blogs, including NSFW blogs, have always been viewable and indexable by external search engines through their public URL (blogname.tumblr.com) unless flagged as private/hidden (a choice of the blog owner). They have only been hidden from Tumblr's search engine if the user has their account set to not see adult content. You can verify this by searching for "tumblr porn" on Google. Five of the results on the first page are porn blogs on Tumblr.
>All Tumblr blogs, including NSFW blogs, have always been viewable and indexable by external search engines through their public URL (blogname.tumblr.com) unless flagged as private/hidden (a choice of the blog owner).
There is definitely something in account settings somewhere that removes you from search results; I may have mixed that up with the NSFW flag.
>They have only been hidden from Tumblr's search engine if the user has their account set to not see adult content
As I mentioned, this is the default. And "adult" blogs do not show up at all in search results on mobile.
My point stands, though. Yahoo already started making adult content harder to find (accidentally or intentionally); the momentum is already there.
https://techcrunch.com/2017/06/20/tumblr-rolls-out-new-conte...