Given the number of articles I guess you're processing each day, I think you should probably rewrite your parser in C. I used to run a service which basically consisted of a feed reader where every article was preprocessed by an algorithm similar to Readability. I wrote the parser using lxml and it seemed fast enough, but once I got into the 400K-500K pages per day territory I started having performance problems. Since parsing the pages is easily parallelizable across multiple machines, I could have just rented some more servers. But where's the fun in that?

So I sat down in front of the computer and 4 hours later I had a C implementation that passed the whole test suite and, according to Valgrind, had no memory leaks. As soon as I deployed it to production, CPU and memory usage dropped by something like 10x (I don't remember the exact number) and I was able to remove some servers and bring costs down. Sadly, I had to close that project because I was spending too much time on it compared to the revenue it was generating, but it was a lot of fun while it lasted.
Another anecdote: I was writing an HTML-to-text converter. The prototype used lxml and some custom DOM-traversal and formatting logic in Python. I got about a 17x speedup from porting the thing to use C and libxml2 (the parser that lxml uses). The port to C took most of an afternoon, and it's currently chewing through a lot of HTML without a problem.
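In case it helps anyone attempting the same port, here's roughly what the libxml2 side looks like in C. This is only a minimal sketch (the function names and the naive text dump are my own illustration, not the converter above), but it shows the htmlReadMemory entry point and the DOM walk that takes the place of lxml's tree traversal:

    /* Minimal HTML-to-text sketch using libxml2's HTML parser.
     * Build with: gcc html2text.c $(xml2-config --cflags --libs)
     */
    #include <stdio.h>
    #include <string.h>
    #include <libxml/HTMLparser.h>

    /* Recursively print the text content of every node. */
    static void dump_text(xmlNode *node)
    {
        for (xmlNode *cur = node; cur != NULL; cur = cur->next) {
            if (cur->type == XML_TEXT_NODE && cur->content != NULL)
                printf("%s", (const char *)cur->content);
            dump_text(cur->children);
        }
    }

    int main(void)
    {
        const char *html = "<html><body><p>Hello, <b>world</b>!</p></body></html>";

        /* htmlReadMemory is lenient about broken markup, much like lxml.html. */
        htmlDocPtr doc = htmlReadMemory(html, (int)strlen(html), NULL, NULL,
                                        HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING);
        if (doc == NULL)
            return 1;

        dump_text(xmlDocGetRootElement(doc));
        printf("\n");

        xmlFreeDoc(doc);
        xmlCleanupParser();
        return 0;
    }

The real converter obviously needs formatting logic on top (block vs. inline elements, whitespace collapsing, skipping script/style), but the parsing and traversal code is about this simple, and it's where most of the speedup over Python comes from.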