
You'd want to whitelist both, based on the user-agent string and the IP / CIDR origin.
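A minimal sketch of that combined check in Python (the user-agent token and CIDR range are only placeholders here; real ranges would have to come from the crawler operators' published lists):

    import ipaddress

    # Illustrative allow-list: UA substring -> CIDR ranges claimed for that crawler.
    ALLOWED = {
        "Googlebot": ["66.249.64.0/19"],  # example range only; check Google's published list
    }

    def is_whitelisted(user_agent: str, remote_ip: str) -> bool:
        ip = ipaddress.ip_address(remote_ip)
        for ua_token, cidrs in ALLOWED.items():
            if ua_token in user_agent:
                # UA matches a known crawler: accept only if the IP is in its ranges.
                return any(ip in ipaddress.ip_network(c) for c in cidrs)
        return False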

Keeping those lists maintained would take fairly constant effort, unless you could come up with a self-training or self-validating mechanism.
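One self-validating mechanism, sketched below in Python, is forward-confirmed reverse DNS, which is also how Google suggests verifying Googlebot: reverse-resolve the connecting IP, check that the hostname falls under the crawler's domain, then forward-resolve that hostname and confirm it maps back to the same IP. No hand-maintained CIDR list needed.

    import socket

    def verify_crawler_ip(remote_ip: str,
                          allowed_suffixes=(".googlebot.com", ".google.com")) -> bool:
        # Reverse lookup: IP -> hostname.
        try:
            host, _, _ = socket.gethostbyaddr(remote_ip)
        except OSError:
            return False
        # Hostname must belong to the crawler's domain.
        if not host.endswith(allowed_suffixes):
            return False
        # Forward lookup: hostname must resolve back to the same IP.
        try:
            return remote_ip in socket.gethostbyname_ex(host)[2]
        except OSError:
            return False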

Killing Google's ability to crawl your site is pretty much as bad as, or worse than, the bot problem you're trying to solve. You should still be able to ID the bad guys, though, by noting which IPs the spoofed user-agents come from when they hit, say, some honey-pot links specifically disallowed for that user-agent in your robots.txt file.
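A rough sketch of that honey-pot idea, assuming a hypothetical /trap-for-googlebot/ path: disallow it for the crawler in robots.txt, link to it invisibly, and treat any request for it that claims that crawler's user-agent as a spoofer.

    # robots.txt served to crawlers:
    #   User-agent: Googlebot
    #   Disallow: /trap-for-googlebot/
    #
    # A compliant Googlebot never requests that path, so any hit claiming to be
    # Googlebot is presumed spoofed and its IP gets flagged.

    HONEYPOT_PREFIX = "/trap-for-googlebot/"  # hypothetical trap path
    flagged_ips: set[str] = set()

    def check_request(path: str, user_agent: str, remote_ip: str) -> None:
        if path.startswith(HONEYPOT_PREFIX) and "Googlebot" in user_agent:
            flagged_ips.add(remote_ip)  # candidate for blocking or further review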

In the general case it's important to let Google crawl your site, but you could probably get by fine without Facebook's crawler.
