Back when I worked for a very large tech company, building their web crawler, I had good success with Golang. On four servers, with 10 GigE interconnect and SSDs, and a very fast pipe to the Internet, I was able to push about 10K pages/second sustained. At any given time, there were probably several million connections open concurrently.
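
For anyone curious what the Go side of that looks like, here's a minimal sketch of a bounded-concurrency fetch loop; the worker count, timeout, and URL list are placeholders I made up, not the original crawler's settings:

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "sync"
        "time"
    )

    func main() {
        urls := make(chan string)
        var wg sync.WaitGroup

        client := &http.Client{Timeout: 10 * time.Second}

        // A few thousand goroutines per box is cheap in Go; the real
        // limits are file descriptors, DNS, and bandwidth.
        for i := 0; i < 1000; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for url := range urls {
                    resp, err := client.Get(url)
                    if err != nil {
                        continue
                    }
                    // Drain and close the body so the underlying
                    // connection can be reused by keep-alive.
                    io.Copy(io.Discard, resp.Body)
                    resp.Body.Close()
                    fmt.Println(url, resp.StatusCode)
                }
            }()
        }

        for _, u := range []string{"https://example.com"} {
            urls <- u
        }
        close(urls)
        wg.Wait()
    }

At that scale the sketch above is only the skeleton; the real work is in per-host politeness limits, DNS caching, and tuning http.Transport (e.g. MaxIdleConnsPerHost) so connections actually get reused.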

I've played with Elixir as well, and it's also great for this type of thing.

proxycrawl.com looks very cool; I'm actually looking for a proxy service for my current scraping project. Are they also a good choice at lower volumes (like thousands of requests a day)?

Golang is a good choice too, but in my experience it's nothing compared to what you can do with an Erlang queue and Elixir. Regarding your question about proxycrawl, I honestly don't know; I only tested the service for a few days at a few million requests per day, and it was great at that too. I would say they are good for very high volume. We are still using it, so that should be a good signal to try them.