I don't see any way the ALG could work reliably without being stateful. After all, the string it's looking for might end up split across multiple packets for any number of reasons outside the control of the network hardware.
Perhaps this is the reason why all of these kinds of protocols always seem unreliable...
It might, but in a very high proportion of cases the networking stack takes the happy path and a single logical protocol unit translates directly to a single packet on the wire. That being the practical reality, it's quite possible to write a stateless protocol parser and have it apparently work. Except, as you point out, when it doesn't.
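To make that failure mode concrete, here's a rough Python sketch. It is not how any particular ALG is implemented: `stateless_match`, `StreamMatcher`, and the choice of an FTP-style `PORT ` command as the pattern are all made up for illustration, and it assumes segments arrive in order for a single flow. The stateless version inspects each segment on its own; the stateful one keeps the last few bytes of the previous segment so a pattern that straddles a boundary is still seen.

```python
def stateless_match(pattern: bytes, segment: bytes) -> bool:
    """Look for the pattern in one segment only, with no memory of earlier ones."""
    return pattern in segment


class StreamMatcher:
    """Hypothetical per-flow matcher that remembers a short tail between segments."""

    def __init__(self, pattern: bytes):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.tail = b""  # at most len(pattern) - 1 bytes carried over

    def feed(self, segment: bytes) -> bool:
        # Prepend the carried-over bytes so a split pattern is rejoined.
        window = self.tail + segment
        found = self.pattern in window
        # Keep only enough bytes to complete a match that starts in this segment.
        keep = len(self.pattern) - 1
        self.tail = window[-keep:] if keep else b""
        return found


# The command is split across two segments: "PO" + "RT ...".
segments = [b"USER bob\r\nPO", b"RT 10,0,0,1,4,1\r\n"]

stateless_hits = [stateless_match(b"PORT ", s) for s in segments]

matcher = StreamMatcher(b"PORT ")
stateful_hits = [matcher.feed(s) for s in segments]
```

Here the stateless check misses the command in both segments, while the stateful matcher reports it once the second segment arrives. Even this is only the easy part: a real device would also need one such state object per connection and some answer for reordered or retransmitted segments.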