Nice work! Thank you for taking my feature requests to heart :)

dredmorbius · on July 10, 2023

This is the fun of posting my work publicly. Someone will pick it apart, and then it's a challenge to see if I can address the concerns.

I knew my first attempt was quick-and-dirty. It took a few minutes to improve it a lot, and a few hours to track down numerous other issues. Most of those involved title-casing exceptions: adding more terms to the script that either should not be title-cased (e.g., "DNA"), are mixed-case (e.g., "iPhone"), or are ambiguous and should be treated differently in different contexts (e.g., "us", which might be the first-person plural pronoun or an abbreviation for "United States").
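To make the exception-table idea concrete, here is a minimal Python sketch. It is not the actual script, which is sed. The function name `simple_titlecase`, the contents of both word lists, and the whitespace-only tokenization are all assumptions chosen for illustration:

```python
# Hypothetical exception table: words whose spelling title-casing must not
# alter (acronyms and mixed-case names). Keyed by lowercase form.
FIXED_CASE = {w.lower(): w for w in ["DNA", "iPhone", "NASA", "eBay"]}

# Hypothetical set of short words kept lowercase except as the first word.
SMALL_WORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to"}


def simple_titlecase(title):
    """Title-case a whitespace-separated title, honoring the exception table."""
    out = []
    for i, word in enumerate(title.split()):
        key = word.lower()
        if key in FIXED_CASE:
            out.append(FIXED_CASE[key])            # exact spelling from the table
        elif i > 0 and key in SMALL_WORDS:
            out.append(key)                        # minor word mid-title
        else:
            out.append(key[:1].upper() + key[1:])  # ordinary word: capitalize first letter
    return " ".join(out)


print(simple_titlecase("the new iphone uses dna testing"))
# The New iPhone Uses DNA Testing
```

Without the table, the generic rule would produce "Iphone" and "Dna". That is why the exceptions list keeps growing as new cases turn up.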

For the latter, I determined that "US" appearing either immediately after "the" or at the start of a title was virtually always the United States sense of the string. I've enshrined that in `titlecase`, though there are probably other words that tend to occur just before or after it which could further disambiguate it, e.g., "US Congress", "US Senate", or "US Law". The additional gain from those is small.
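The "US" rule above can be sketched in Python as follows. The author's implementation is a sed script, so this is only an illustration. The names `fix_us` and `case_us` and the whitespace tokenization are hypothetical. The look-ahead at "Congress", "Senate", and "Law" is the further refinement the paragraph suggests, and including it here is an assumption:

```python
# Words that, when they follow "us", suggest the United States reading.
# Using them as extra evidence is an assumption beyond the stated rule.
US_FOLLOWERS = {"congress", "senate", "law"}


def case_us(words, i):
    """Choose the casing for an 'us' token at index i of a tokenized title."""
    prev = words[i - 1].lower() if i > 0 else None
    nxt = words[i + 1].lower() if i + 1 < len(words) else None
    # Stated rule: start of title, or directly after "the".
    if i == 0 or prev == "the" or nxt in US_FOLLOWERS:
        return "US"   # United States abbreviation
    return "Us"       # first-person plural pronoun, title-cased


def fix_us(title):
    """Recase every standalone 'us' token; other words pass through unchanged."""
    words = title.split()
    return " ".join(
        case_us(words, i) if w.lower() == "us" else w
        for i, w in enumerate(words)
    )


print(fix_us("Why the us Matters"))   # Why the US Matters
print(fix_us("Join us Tonight"))      # Join Us Tonight
```

Tokens with attached punctuation, such as "us," or "U.S.", are not matched by this sketch. A real script would need to handle them separately.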

If I were writing an AI then it might incorporate those weights, but this is just a simple sed script...