I clicked on count_1w.txt, scrolled to the bottom, and found a lot of what seem like misspellings of Google. Then I clicked on count_2w.txt, did the same, and regretted doing so.
I have been building a word game and came across these datasets. I find the range of words delightfully quirky, and it may be useful in a game.
One thing I noticed about count_1w.txt is that there are brand names like Starbucks in there.
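If you want to pull a word-game list out of count_1w.txt, a rough sketch is below. It assumes each line has the form `word<TAB>count` and uses a hypothetical `min_count` cutoff to drop the rare tail, where the misspellings seem to collect. It does nothing about brand names; those would need a separate blocklist. The sample lines in the usage are made-up toy values, not real counts from the file.

```python
from collections.abc import Iterable


def load_counts(lines: Iterable[str]) -> dict[str, int]:
    """Parse lines assumed to look like 'word<TAB>count' into a dict."""
    counts: dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, count = line.split("\t")
        counts[word] = int(count)
    return counts


def game_words(counts: dict[str, int], min_count: int,
               min_len: int = 3, max_len: int = 8) -> list[str]:
    """Keep alphabetic words of playable length seen at least min_count times.

    min_count, min_len and max_len are hypothetical tuning knobs; raising
    min_count trims the low-frequency tail. Results are most frequent first.
    """
    words = [
        w for w, c in counts.items()
        if c >= min_count and w.isalpha() and min_len <= len(w) <= max_len
    ]
    return sorted(words, key=lambda w: counts[w], reverse=True)


# Toy input (invented numbers) in the assumed file format.
sample = ["the\t1000", "game\t500", "googel\t3", "x\t900", "words\t400"]
print(game_words(load_counts(sample), min_count=100))
```

With the real file you would pass an open handle instead, e.g. `with open("count_1w.txt", encoding="utf-8") as f: counts = load_counts(f)`, and then experiment with `min_count` until the bottom of the list stops looking like typos.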
Natural Language Corpus Data: Beautiful Data - https://news.ycombinator.com/item?id=13197612 - Dec 2016 (13 comments)
Natural Language Corpus Data (2009) [pdf] - https://news.ycombinator.com/item?id=6411711 - Sept 2013 (3 comments)
Natural Language Corpus Data: Beautiful Data - https://news.ycombinator.com/item?id=1483187 - July 2010 (1 comment)