Not OP, but crawling is easy if you don't try to scan 5+ pages a second. Almost all rate-limiting or heuristic-based "keep server costs low" engines, including Cloudflare, don't care if you request every page. They will take action if you do something like burst through every page at once and use as many server resources as a hundred concurrent users.
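In practice, "don't burst" usually means enforcing a minimum gap between requests to the same host. Here is a minimal sketch of a per-host limiter. The class name `PerHostRateLimiter` is hypothetical, and the one-second default interval is an assumption, chosen only to stay well below the 5 pages a second mentioned above. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from urllib.parse import urlsplit


class PerHostRateLimiter:
    """Hypothetical helper: keeps at least `min_interval_s` seconds between
    requests to the same host, while different hosts don't wait on each other."""

    def __init__(self, min_interval_s=1.0, clock=time.monotonic, sleep=time.sleep):
        # Assumption: 1 request/second per host; tune this for your crawl.
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time its most recent request was allowed

    def wait(self, url):
        """Block until it is polite to fetch `url`; return the seconds slept."""
        host = (urlsplit(url).hostname or "").lower()
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            # Only sleep for whatever part of the interval hasn't elapsed yet.
            delay = max(0.0, last + self.min_interval_s - now)
        if delay > 0:
            self._sleep(delay)
        self._last_request[host] = now + delay
        return delay
```

In a crawl loop you would call `limiter.wait(url)` right before each fetch. Keying on the host spreads the load across sites instead of hammering one, which is the behavior the comment describes. This sketch is single-threaded; concurrent workers sharing one limiter would need a lock around `wait`.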
That assumes you aren't crawling from some VPS provider's IP range. If you're going to crawl, you'll have the best chance with your own IPs on your own ASN, with DNS and reverse DNS set up correctly. That way, IP reputation systems can identify you as a crawler, just not one that hammers every site it visits.
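One common way a site checks that a self-identified crawler is genuine is forward-confirmed reverse DNS (FCrDNS): look up the PTR record for the client IP, check that the hostname is in the crawler's domain, then resolve that hostname and confirm it maps back to the same IP. Google, for example, documents this procedure for verifying Googlebot. The sketch below assumes IPv4 (`socket.gethostbyname_ex` only returns IPv4 addresses). The function name and the `crawler.example` domain are hypothetical, and real reputation systems may weigh other signals too. The resolvers are injectable so the check can be exercised offline.

```python
import socket


def _in_domain(hostname, domain):
    # Match the domain itself or a true subdomain, so "evilcrawler.example"
    # does not pass as "crawler.example".
    hostname = hostname.rstrip(".").lower()
    domain = domain.rstrip(".").lower()
    return hostname == domain or hostname.endswith("." + domain)


def forward_confirmed_rdns(ip, allowed_domains,
                           reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return the verified hostname for `ip`, or None if FCrDNS fails."""
    try:
        hostname, _aliases, _addrs = reverse(ip)  # PTR lookup
    except (socket.herror, socket.gaierror):
        return None
    if not any(_in_domain(hostname, d) for d in allowed_domains):
        return None
    try:
        _name, _aliases, addresses = forward(hostname)  # A lookup
    except (socket.herror, socket.gaierror):
        return None
    # The forward lookup must point back at the original IP.
    return hostname if ip in addresses else None
```

A PTR record alone proves little, since whoever controls the IP's reverse zone can set it to any name. The forward lookup is what ties the name back to a domain the crawler operator controls, which is why having both set up correctly matters.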
Also, I imagine a search engine like this doesn't expect content to change much anyway, so it can take its time and crawl each site only once every month or two. Search engines like Google, by contrast, have to recrawl several times a week (or a day) to keep up with all the constantly updated content being churned out.