Yeah, I wasn't sure how I wanted to deal with duplicates so I mostly ignored them. I track letter positions directly (just a bunch of tuples), but don't actually do anything with this other than restricting candidate words.
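For concreteness, the candidate-restriction step can be sketched roughly like this (function and parameter names are my own, not from the actual solver): a positional regexp plus sets of required and excluded letters.

```python
import re

def filter_candidates(words, pattern, required, excluded):
    """Keep words that match the positional pattern (e.g. '..ve.'),
    contain every required letter, and contain no excluded letter."""
    rx = re.compile(pattern)
    return [w for w in words
            if rx.fullmatch(w)
            and all(c in w for c in required)
            and not any(c in w for c in excluded)]
```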
I think if I work on this some more I'd try to factor in letter positioning when deciding what to guess. My hunch is that it won't make too much of a difference though.
So I tried an experiment using 15,918 five-letter English words. I used a basic scoring strategy: score a word by summing up the frequency of the candidate letters in the candidate words, as determined by a regexp of included and excluded letters. (e.g. `.aves` would score `waves` 1, but `saves` 0, since `s` is already included)
Variations included adding in the frequency of the letter at a particular position, and adding in the frequency of two letter combinations.
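A minimal sketch of the base scoring plus the positional-frequency variation (names and structure are my reconstruction, not the actual code): letters already known contribute nothing, matching the `.aves` example above.

```python
from collections import Counter

def best_guess(candidates, known_letters, posfreq=True):
    """Pick the candidate whose *new* letters are most frequent across
    the candidate pool, optionally adding per-position frequencies."""
    letter_freq = Counter()
    pos_freq = [Counter() for _ in range(5)]
    for w in candidates:
        for i, c in enumerate(w):
            pos_freq[i][c] += 1
        for c in set(w):              # count each distinct letter once per word
            letter_freq[c] += 1

    def score(w):
        s = sum(letter_freq[c] for c in set(w) - known_letters)
        if posfreq:
            s += sum(pos_freq[i][c] for i, c in enumerate(w))
        return s

    return max(candidates, key=score)
```

With candidates `["waves", "saves"]` and `a`, `v`, `e`, `s` already included, only the `w` in `waves` scores, so `waves` wins, as in the example.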
Interestingly enough, the winning strategy used single letters while factoring in position. Second best was two-letter combinations with position.
```
ngram=1 posfreq=True  mean attempts: 4.34 WinPct 91.280%
ngram=2 posfreq=True  mean attempts: 4.35 WinPct 91.186%
ngram=2 posfreq=False mean attempts: 4.37 WinPct 90.074%
ngram=1 posfreq=False mean attempts: 4.38 WinPct 90.445%
```
Since my base dictionary is way bigger than the Wordle one, I also mixed in a smaller 1,382-word dictionary (google-10000-english.txt) and combined them either by just sorting by score, or by normalizing the scores first and then sorting. Normalizing the scores was strictly worse.
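The merge step, roughly (again a hypothetical sketch; I'm assuming words appearing in both dictionaries take the smaller dictionary's score):

```python
def combined_ranking(big_scores, small_scores, normalize=False):
    """Merge two {word: score} dicts and rank words by score, descending.
    With normalize=True, each dict's scores are scaled to [0, 1] before
    merging (the variant that performed strictly worse)."""
    def norm(scores):
        hi = max(scores.values()) or 1
        return {w: s / hi for w, s in scores.items()}
    if normalize:
        big_scores, small_scores = norm(big_scores), norm(small_scores)
    merged = {**big_scores, **small_scores}   # small dict overrides overlaps
    return sorted(merged, key=merged.get, reverse=True)
```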
```
normalize=False ngram=1 posfreq=True mean attempts: 4.34 WinPct 91.280%
normalize=True  ngram=1 posfreq=True mean attempts: 4.43 WinPct 90.281%
```
FWIW, the absolute worst one was:
```
normalize=True ngram=1 posfreq=False mean attempts: 4.43 WinPct 89.835%
```