The only complication is if you want to use Meeseks (https://github.com/mischov/meeseeks) which requires the Rust compiler and runtime be installed because it has native bindings. Meeseks is useful because it's a bit faster than the default Floki (https://github.com/philss/floki) and because it can handle very malformed HTML.
As for Elixir itself, here's a quick example:
```
# Assume this contains 1000 URLs
urls = [....]
# This will utilize 100 threads; if the second parameter is omitted, it will use threads equal to CPU cores. For I/O bound tasks however it's pretty safe to use much more.
Meeseeks's speed difference with Floki is not that significant, and my initial findings are they've leveled out even more with OTP 21, sometimes even swinging in favor of Floki.
The better handling of malformed HTML by default is the much bigger deal.
Thank you man (I know you are the author of Meeseks), I didn't know that. Always knew that the current info was the Meeseks was faster than Floki but it seems that OTP 21 largely eliminated that as you said.
It was pretty interesting to see Floki get a lot faster and Meeseeks actually get a little slower with OTP 21. I'll enjoy figuring out why. I hope to get a chance to work on the OTP 21 performance of Meeseeks before too long.
On the plus side there were some nice memory improvements for Meeseeks in OTP 21.
Don't let this sound patronizing because it's not -- but have you looked at how many times is the boundary between the BEAM and the Rust code crossed? I haven't inspected Meeseks' code so can't talk, just wildly guessing.
My ancient experience with Java <-> C++ bridges has taught me that if your higher-level language calls the lower-level language very often then the gains of using the lower-level language almost disappear due to the high overhead of constantly serializing data back and forth.
Anyhow, we should probably take this discussion to ElixirForum and not here. :)
(I am @dimitarvp there and almost everywhere else on the net, HN is one of the very few exceptions of inconsistent username for me).
As for Elixir itself, here's a quick example:
```
# Assume this contains 1000 URLs
urls = [....]
# This will utilize 100 threads; if the second parameter is omitted, it will use threads equal to CPU cores. For I/O bound tasks however it's pretty safe to use much more.
results = Task.async_stream(&YourScrapingModule.your_scraping_function/1, max_concurrency: 100)
```
It's honestly that simple in Elixir. For finer grained control the line count is little bigger -- but little. Not hundreds of lines for sure.