Namely, there is no guarantee that the bytes between `<page>` and `</page>` will be valid UTF-8. It may be the case that you only run this program with UTF-8 input, in which case, UB is never triggered. But it's worth pointing out here since there is nothing actually stopping your program from hitting UB.
Also, as long as you're bringing in the twoway crate, you might as well use it on lines 43 and 48 since you're just searching for a single needle.
Ah gotya. Yeah, I haven't added reverse searching to aho-corasick yet. Ran out of steam.
Either way, my point here is to be a counter-balance. To be fair, you did say, "But with Rust I managed to safely use." But the code you posted is technically unsound. It's not a huge deal if you know you'll always be feeding the program valid UTF-8. But it is worth mentioning here in this HN thread that is specifically comparing the safety properties of competing programming languages. :-)
I believe your use of `unsafe` on this line is unsound: https://gist.github.com/daaku/58557e2545612df8f40b13b66b7d3b...
Namely, there is no guarantee that the bytes between `<page>` and `</page>` will be valid UTF-8. It may be the case that you only run this program with UTF-8 input, in which case, UB is never triggered. But it's worth pointing out here since there is nothing actually stopping your program from hitting UB.
Also, as long as you're bringing in the twoway crate, you might as well use it on lines 43 and 48 since you're just searching for a single needle.