I unfortunately can’t imagine having time to test this, but I imagine there may be a way to accomplish this with embeddings.
The game itself is sort of an embeddings clustering problem, with the added difficulty that each group needs to only be alike in 1 way (versus a full vector distance which measures how alike they are in every way).
Maybe there is some way to search for a vector of weights which, when multiplied by all members of a group of 4, produces weighted vectors with the least distance from their center? Then it’s an optimization problem to find the 4 groups that minimize the total distance from each group’s center.
It may be possible to find a weight vector that selects for a particular slice of a word’s meaning.
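A minimal sketch of that weight-vector idea, under stated assumptions: the weighted distance is taken to be the sum over dimensions of `w_d * (x_d - mean_d)**2`, the weights are constrained to be non-negative and sum to 1 (otherwise all-zero weights trivially win), and an entropy penalty with a made-up `temperature` parameter keeps them from collapsing onto one dimension. Under those assumptions the best weights have a closed form, a softmax over negative per-dimension variance. The function names and the random toy data are hypothetical; real embeddings would replace `group`.

```python
import numpy as np

def slice_weights(group, temperature=0.1):
    """Soft weights over dimensions, favouring dimensions where the group agrees.

    Minimises sum_d w_d * var_d + temperature * sum_d w_d * log(w_d)
    subject to w >= 0 and sum(w) == 1, which gives w ∝ exp(-var / temperature).
    """
    var = group.var(axis=0)        # per-dimension spread within the group
    logits = -var / temperature
    logits -= logits.max()         # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def weighted_spread(group, w):
    """Mean weighted squared distance of the group members from their center."""
    centered = group - group.mean(axis=0)
    return float((w * centered**2).sum(axis=1).mean())

# Toy data (not real embeddings): four "words" that agree only on dimension 2.
rng = np.random.default_rng(0)
group = rng.normal(size=(4, 8))
group[:, 2] = 1.0

w = slice_weights(group)
uniform = np.full(group.shape[1], 1.0 / group.shape[1])
print("most weighted dimension:", int(w.argmax()))
print("spread with learned weights:", weighted_spread(group, w))
print("spread with uniform weights:", weighted_spread(group, uniform))
```

One caveat with this sketch: a dimension that barely varies across the whole vocabulary will look like "agreement" for any group, so in practice each dimension would probably need to be standardised across the full board first.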
That approach works well for a game like [Codenames](https://en.wikipedia.org/wiki/Codenames_(board_game)) where you're trying to find a single-word common hint between many of your words (that doesn't hit any of the other words).
My feeling is that it'll struggle with wordplay in OnlyConnect/Connections (like missing letters, added letters, words-within-words, homophones, etc.) as well as two-step references (such as {Venice, Dream, Night, Nothing} => "last words of Shakespeare plays").
I thought it would. But I've spent a fair bit of effort both using embeddings and also using prompts to GPT4, as well as combinations of the two approaches, to try to make a good spymaster for Codenames with essentially zero success.
I was playing a bit with embeddings in 2021. I'd played Codenames online with friends in lockdown and we often had interesting boards we'd talk about, so when I saw papers like this (https://arxiv.org/abs/2105.05885) I looked into the topic. I found the suggested clues were very good, and there were some 'clue scoring' functions which correlated with the actual best spymasters. It wasn't as scientifically rigorous as OP's post, but I would say it was good.
I tried clustering similar embeddings but it did extremely poorly (~0%), since the groupings are often deceptive: words in a group are connected in only one small way, and there are lots of spurious fake groups to throw you off. Maybe looking for groups with high similarity on only a subset of embedding dimensions might help, but I didn't have much time to play either :) A notebook to get you going if you do want to play: https://colab.research.google.com/drive/1KJeSB9Q5XzSeT9ONUJ_...
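For anyone who wants to try the "subset of dimensions" variant, here is a rough sketch under stated assumptions, not something taken from the notebook above. It scores a candidate group by the mean of its `k` smallest per-dimension variances (after standardising each dimension across the board) and then exhaustively searches for the partition with the lowest total score. The function names, the choice of `k`, and the random toy board are all hypothetical.

```python
from itertools import combinations
import numpy as np

def subset_score(z, idx, k):
    """Mean of the k smallest per-dimension variances within one group."""
    var = z[list(idx)].var(axis=0)
    return float(np.sort(var)[:k].mean())

def best_partition(emb, group_size=4, k=2):
    """Split all rows into equal-sized groups, minimising the summed score."""
    # Standardise each dimension across the board so variances are comparable.
    z = (emb - emb.mean(axis=0)) / (emb.std(axis=0) + 1e-9)
    best = (float("inf"), None)

    def search(remaining, groups, total):
        nonlocal best
        if total >= best[0]:      # scores are non-negative, so this branch cannot win
            return
        if not remaining:
            best = (total, groups)
            return
        # Always place the lowest remaining index first to avoid duplicate partitions.
        first, rest = remaining[0], remaining[1:]
        for others in combinations(rest, group_size - 1):
            group = (first, *others)
            left = tuple(i for i in rest if i not in others)
            search(left, groups + [group], total + subset_score(z, group, k))

    search(tuple(range(len(emb))), [], 0.0)
    return best[1]

# Toy board (not real embeddings): 8 "words", 6 dimensions, two hidden groups.
rng = np.random.default_rng(1)
board = rng.normal(size=(8, 6))
board[[0, 2, 4, 6], 0] = 3.0    # even rows agree only on dimension 0
board[[1, 3, 5, 7], 1] = -3.0   # odd rows agree only on dimension 1

groups = best_partition(board, group_size=4, k=1)
print(groups)
```

A real 16-word board has roughly 2.6 million ways to split into four groups of four, so this brute-force search would be slow in pure Python even with the pruning. It also does nothing for wordplay or two-step categories, which never show up in the embedding dimensions at all.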