Nice! I wrote a .NET wrapper myself, but never got around to a Python extension. One question: did you experience any memory leak issues with CLD? My .NET wrapper DLL seems to leak, and I never really checked whether it was the C++/CLI layer I added on top or the actual native CLD C++ code. I doubt the latter since (according to my basic understanding) nothing is created in the original code that needs to be cleaned up manually. Before I start debugging mixed-mode .NET applications, I just wanted to be sure.
Encoding, but obviously not language, should be provided explicitly as metadata (e.g. the Content-Type HTTP header). Also, most of the content available on the web is already UTF-8 (65.9% according to a recent survey[1]).
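For what it's worth, reading the declared charset out of a Content-Type value takes only a few lines with the Python standard library. This is just a minimal sketch: the helper name charset_from_content_type is made up for illustration, and it assumes you already have the raw header string in hand.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the charset parameter of a Content-Type value, lowercased.

    Hypothetical helper for illustration; returns `default` when the
    header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=UTF-8"))
    print(charset_from_content_type("text/html"))
```

Only when this comes back empty would you need to fall back to guessing from the bytes.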
In my experience chardet very often misclassifies iso-8859-1 as iso-8859-2. I have seen the misclassification even on small Spanish pages that used only the typical Spanish characters.
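A small sketch of why the two are easy to confuse (this does not call chardet, it only compares the two codecs): the accented vowels sit at the same byte values in both encodings, so the bytes decode cleanly either way, and among the common Spanish letters only something like ñ comes out differently. The sample strings are my own, not taken from any real page.

```python
# Accented vowels map to the same bytes in ISO-8859-1 and ISO-8859-2.
vowels = "áéíóú"
raw_vowels = vowels.encode("iso-8859-1")
same_in_both = raw_vowels.decode("iso-8859-2") == vowels

# The letter ñ (byte 0xF1) is one of the few that differs:
# ISO-8859-2 assigns that byte to ń instead.
word = "año"
raw_word = word.encode("iso-8859-1")
as_latin1 = raw_word.decode("iso-8859-1")
as_latin2 = raw_word.decode("iso-8859-2")

if __name__ == "__main__":
    print(same_in_both)
    print(as_latin1, as_latin2)
```

Since neither decode raises an error, a detector has to rely on character statistics alone, which is shaky on short pages.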