There's definitely a conversion cost. For strings, Python apparently caches the ...

burntsushi · on Jan 29, 2021

Be sure to use auto configuration to get it to go even faster, depending on your use case: https://docs.rs/aho-corasick/0.7.15/aho_corasick/struct.AhoC...

Or just be sure to enable the DFA option if you can afford it. It looks like the Python library is just the standard NFA algorithm.

itamarst · on Jan 29, 2021

Yeah, I was using DFA.

Next step is trying an alternative approach, but if that alternative doesn't work I'm going to see about wrapping your package for Python.

Thanks for all your work on it!

burntsushi · on Jan 29, 2021

Nice! Reach out if there are any problems or if you need something exposed in the API. Looking at the pyahocorasick issue tracker, there are a number of features/bugs that your wrapper package would resolve. :)

liuliu · on Jan 29, 2021

NumPy also supports conversions without copying. One thing I haven't found a good way to bridge from Python is the pandas.DataFrame; it seems to be quite a Python-focused object, and iterating through a DataFrame is particularly slow.
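
A minimal sketch of what "without copying" can look like, assuming the data sits behind Python's buffer protocol. The `bytearray` here is only a stand-in for memory owned by some other library; it is not taken from the thread.

```python
import numpy as np

# A mutable Python buffer standing in for data owned by some other library.
buf = bytearray([1, 2, 3, 4])

# frombuffer wraps the existing memory; no bytes are copied.
arr = np.frombuffer(buf, dtype=np.uint8)

# A write through the original buffer is visible in the array,
# which shows that both objects share the same memory.
buf[0] = 42
first = int(arr[0])

# In the other direction, memoryview exposes the array's memory
# through the same buffer protocol.
view = memoryview(arr)
```

Native extensions can usually read a buffer like this directly, which is what makes the hand-off cheap.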

itamarst · on Jan 29, 2021

Internally Pandas often uses NumPy arrays, especially for numeric data, so you might be able to pass things that way in some cases?

E.g. `df["column_name"].values` will get you a NumPy array.
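
A small sketch of that, with a made-up one-column DataFrame (the data is hypothetical). Whether you get a view or a copy depends on the column's dtype and the pandas version, so treat zero-copy as something to check rather than assume; object or string columns in particular still hold Python objects.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column; float64 data is stored as a NumPy array internally.
df = pd.DataFrame({"column_name": [1.0, 2.0, 3.0]})

# .values returns the underlying data as a NumPy array.
arr = df["column_name"].values

# .to_numpy() is the spelling the pandas docs recommend for the same thing.
same = df["column_name"].to_numpy()

# From here on it is plain NumPy, with no per-row Python iteration.
total = float(arr.sum())
```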