Natural Language Corpus Data: Beautiful Data (2009)

dang · 2023-12-29T22:45:59 1703889959

Natural Language Corpus Data (2009) [pdf] - https://news.ycombinator.com/item?id=6411711 - Sept 2013 (3 comments)

Natural Language Corpus Data: Beautiful Data - https://news.ycombinator.com/item?id=1483187 - July 2010 (1 comment)

a2128 · 2023-12-29T22:52:16 1703890336

I clicked on count_1w.txt, scrolled to the bottom, and found a lot of what seem like misspellings of Google. Then I clicked on count_2w.txt, did the same, and regretted doing so.

zerojames · 2023-12-29T23:24:43 1703892283

I have been building a word game and came across these datasets. I find the range of words delightfully quirky, and it may be useful in a game.

One thing I noticed about count_1w.txt is that it contains brand names like Starbucks.