Thank you for the tip. I wasn't aware of that, but it was no problem to update the rules to account for the full AWS range based on the new information. I greatly appreciate your feedback. I'm not sure what DO is, though; would you be so kind as to deacronymize that for me? Thank you.
We also run crawlers on our home laptops, on university servers, on every cheapo hosting service we can find (especially if they offer decent or "unlimited" bandwidth), and so on. Tools like wget and wpull can randomize the timing between requests, use regex to avoid pitfalls, change the user-agent string, work in tandem with phantomjs and/or youtube-dl to grab embedded video content...
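For anyone curious what that looks like in practice, here is a minimal sketch of the general techniques described above (randomized inter-request delays, a browser-like user-agent, and regex filters to avoid crawler traps). It's a toy illustration, not anyone's actual setup; the seed URL, trap pattern, and user-agent string are hypothetical placeholders.

    # Sketch of a polite-looking crawler loop: randomized timing, spoofed UA,
    # regex guard against pitfalls. All names/values below are placeholders.
    import random
    import re
    import time
    import urllib.request

    START_URLS = ["https://example.com/"]                       # hypothetical seed
    TRAP_PATTERN = re.compile(r"/calendar/|/logout")            # skip infinite calendars, logout links
    USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"    # blend in with ordinary browsers

    def fetch(url: str) -> bytes:
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read()

    for url in START_URLS:
        if TRAP_PATTERN.search(url):
            continue                        # regex guard against known pitfalls
        try:
            body = fetch(url)
            # ... parse links, hand embedded video URLs off to youtube-dl, etc.
        except Exception:
            pass                            # a real crawler would log and retry here
        time.sleep(random.uniform(1, 10))   # randomized delay between requests

Tools like wget and wpull do all of this out of the box with command-line flags, so the point stands: the traffic is trivially easy to make look like ordinary visitors spread across many networks.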
Good luck playing whack-a-mole against the crawlers. I admit to being very curious: what are you openly hosting online that you really don't want saved for posterity?