I ran into the same problem. I started a pet project that involved scraping reddit (though for a different purpose than AMAs). Their robots.txt and an admin writeup from somewhere on their site made me realize that I'd probably have to take it down and/or my scraper would get blacklisted. It's a bummer because there are 1001 great ideas out there for filtering, categorizing, and viewing reddit's data in different ways. They seem to encourage 3rd party interaction to some extent with their API and all, yet in most cases you still end up needing to scrape.
I do like the format for sure. The only thing I would consider is maybe nesting the Q/A divs (.qitem) for threads, because a lot of times the Q/A content is contextual to past Q/As. You already order them that way and that helps a lot, but in one of the ones I was reading it got confusing whether they were speaking in the context of a thread or whether it was a fresh Q/A. Maybe make it a view option to toggle or something (maybe a carousel where each frame contains all the Q/A divs in a thread starting with the root level, while keeping the default display flat like it is now).
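For what it's worth, here is a rough sketch of the grouping step behind that idea, in Python. I don't know how the site stores its Q/As, so the `id` and `parent_id` fields are purely my assumption, and `group_into_threads` is a made-up helper name. It takes the flat, already-ordered list and buckets each item under its root Q/A, so each bucket could become one carousel frame.

```python
def group_into_threads(items):
    """Group a flat, ordered list of Q/A items into threads keyed by root item.

    Each item is assumed (hypothetically) to be a dict with an "id" and a
    "parent_id" that is None for a fresh, root-level Q/A.
    """
    by_id = {item["id"]: item for item in items}
    threads = {}  # root id -> items of that thread, in their original order

    def root_of(item):
        # Walk up the parent chain until we reach an item with no known parent.
        while item["parent_id"] is not None and item["parent_id"] in by_id:
            item = by_id[item["parent_id"]]
        return item["id"]

    for item in items:
        threads.setdefault(root_of(item), []).append(item)

    # Dicts keep insertion order, so threads come out in order of first appearance.
    return list(threads.values())
```

The flat view stays trivial (just render `items` as-is), and the toggle only decides whether to render the flat list or the grouped one.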
I made sure to be nice to reddit. The scraper is set to crawl reddit once every 12 hours for new "top monthly iamas".
Very good suggestion for nested threads. A good example of a reply to a question is on the westboro-baptist-church thread. I think it's possible to implement this suggestion. Will fool around with it on localhost and see what I come up with.
Yeah, I had a limiter put in mine as well so that it only made a request every 6 or 8 seconds.
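Not my actual code, but a limiter like that can be sketched in a few lines of Python. The class name `MinIntervalLimiter` and the injectable `clock`/`sleep` parameters are just illustrative choices (they make it easy to test without really sleeping); the only idea taken from above is "at most one request every N seconds".

```python
import time


class MinIntervalLimiter:
    """Blocks so that consecutive calls to wait() are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request; None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last request: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

You would create one `MinIntervalLimiter(6)` for the whole scraper and call `limiter.wait()` right before every HTTP request, whatever library you use to make it.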
No worries on the suggestion. Those threaded comments can be tricky sometimes.
Hey, if you do hear back from them about their stance on this sort of thing, I'd really appreciate it if you could let me know what they say. I sort of halted my project after a certain point because I was afraid I'd just have to take it down as soon as I completed it.
User comments, but going by users rather than threads. That way you could get a profile where someone posts, or turn it around and see what prolific posters existed in a given subreddit.
The thing is it wouldn't sweep everything. Instead a user would only get scraped if a request was made to my app, and I had a tool that would go through a request queue (storing to my own DB) in a metered way so that reddit only experienced a handful of requests from me per minute.
Nonetheless it still breaks robots.txt and, if I could dig it up, admins have said in the past that they don't want automated/batched requests hitting their site.
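To make the metered queue idea concrete, here is a minimal Python sketch of that kind of tool. Everything named here is hypothetical: `MeteredQueue`, the `fetch` callable (whatever actually pulls a user's comments), and the `store` callable (whatever writes to your own DB). It only shows the shape: requests pile up in a queue, duplicates are skipped, and the drain loop spaces fetches so that only a handful go out per minute.

```python
import time
from collections import deque


class MeteredQueue:
    """Queue of usernames that is drained at no more than `max_per_minute` fetches."""

    def __init__(self, max_per_minute, fetch, store,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_gap = 60.0 / max_per_minute  # seconds between two fetches
        self.fetch = fetch    # hypothetical: username -> scraped data
        self.store = store    # hypothetical: (username, data) -> saved to own DB
        self._clock = clock
        self._sleep = sleep
        self._pending = deque()
        self._queued = set()  # usernames currently waiting, to skip duplicates
        self._next_allowed = float("-inf")  # the first fetch may go out immediately

    def request(self, username):
        """Called when someone asks the app for a user; does not fetch anything itself."""
        if username not in self._queued:
            self._queued.add(username)
            self._pending.append(username)

    def drain(self):
        """Work through the queue, sleeping as needed to respect the rate."""
        while self._pending:
            delay = self._next_allowed - self._clock()
            if delay > 0:
                self._sleep(delay)
            username = self._pending.popleft()
            self._queued.discard(username)
            self.store(username, self.fetch(username))
            self._next_allowed = self._clock() + self.min_gap
```

With `max_per_minute=6`, the remote site sees at most one request every 10 seconds from the drain loop, no matter how many people hit the app at once.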