
Possible I missed it, but I don’t see any reference to how they address chat thread continuity (aka memory) issues, which are a major problem for AI chat bots. Did they address this, and if so, how?



This was mentioned in the "still room for improvement" section.

> It is important to recognize that CICERO also sometimes generates inconsistent dialogue that can undermine its objectives. In the example below where CICERO was playing as Austria, the agent contradicts its first message asking Italy to move to Venice. While our suite of filters aims to detect these sorts of mistakes, it is not perfect.


I think this might be a strength of Diplomacy for current AI models - making contradictory plans with two different players is a perfectly normal human move, as is saying one thing and doing another, as is cooperating on round N and defecting on round N+1.


Agree that detecting it might be hard, and that as-is it might actually be beneficial. That said, in my opinion there’s a huge difference between knowingly breaking promises based on a heuristic and having no memory of past promises at all, simply basing current tactics on the present. Imagine if a real-world actor literally had no memory of the past: it would lose out to opposing players as soon as they realized it had no memory, since you could make any promise you wanted and it would literally forget it. Might be wrong though; I just know it’s a major issue with current chat bots and an easy way to tell whether you’re chatting with a bot or not.


Indeed, consistency (with the lengthy dialogue histories, but also the game state as well as game plans) was a huge challenge for us. We spent a lot of time working on techniques for detecting and filtering these kinds of low quality messages. You can see the Supplementary Materials in the paper for full details, but TL;DR: we built a suite of classifiers for detecting common mistakes (training classifiers to discriminate between human messages and counterfactuals), and used these classifiers as an ensemble which acted as a filter on top of message generation.
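To make the filtering idea concrete, here is a minimal sketch of what an ensemble of mistake classifiers acting as a filter on top of message generation could look like. All function names, the toy classifiers, and the threshold are hypothetical illustrations, not CICERO's actual code.

```python
# Hypothetical sketch: filter candidate messages through an ensemble
# of classifiers, each scoring how likely a message is a low-quality
# "counterfactual" (e.g. inconsistent with the plan or game state).

def passes_filters(message, context, classifiers, threshold=0.5):
    """Reject a candidate message if any classifier flags it."""
    for clf in classifiers:
        if clf(message, context) > threshold:
            return False
    return True

def generate_message(candidates, context, classifiers):
    """Return the first candidate that survives every filter."""
    for msg in candidates:
        if passes_filters(msg, context, classifiers):
            return msg
    return None  # fall back: send nothing rather than a bad message

# Toy classifiers for illustration only: flag messages that ignore the
# agreed plan, or that mention a province not on the board.
classifiers = [
    lambda m, ctx: 1.0 if ctx.get("plan") and ctx["plan"] not in m else 0.0,
    lambda m, ctx: 1.0 if "Atlantis" in m else 0.0,
]

ctx = {"plan": "Venice"}
print(generate_message(
    ["Let's both move to Venice.", "I will attack Atlantis."],
    ctx, classifiers))
# → Let's both move to Venice.
```

The key design point is that the ensemble only vetoes; generation quality comes from the language model, and the filters just keep its worst inconsistencies from reaching the other players.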


Did you capture conversational "transactions" as structured data in the game state, or was the chat history itself the only storage for that aspect of the game?

I would think you could avoid much of this issue by creating a more sophisticated structured game model and using the language model only for converting between structured and unstructured data.
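As an illustration of that idea: agreements could live as structured records that the planner treats as the source of truth, with the language model only rendering them to text (and parsing replies back). The `Agreement` type and `to_text` helper below are hypothetical, not anything from the paper.

```python
# Sketch of the "structured transactions" idea: store each agreement
# as a record, and translate to/from free text at the boundary.
# All names here are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Agreement:
    proposer: str   # power making the proposal, e.g. "AUSTRIA"
    recipient: str  # power receiving it, e.g. "ITALY"
    action: str     # e.g. "MOVE", "SUPPORT", "DMZ"
    province: str   # e.g. "VEN" (Venice)
    phase: str      # e.g. "S1901M" (Spring 1901 Movement)

def to_text(a: Agreement) -> str:
    """Template rendering; a real system might use the LM here."""
    return (f"{a.proposer} proposes that {a.recipient} "
            f"{a.action.lower()} {a.province} in {a.phase}")

# The record log, not the raw chat history, is what later messages
# get checked against for consistency.
log = [Agreement("AUSTRIA", "ITALY", "MOVE", "VEN", "S1901M")]
print(to_text(log[0]))
# → AUSTRIA proposes that ITALY move VEN in S1901M
```

The hard part this sketch glosses over is the reverse direction: reliably parsing free-form human messages into such records, which is itself a consistency problem.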


They do have a structured game model, although it doesn't capture everything in the chat. The language model still had lots of consistency problems even with the structured game state as input.


Congrats on getting the related research published.

Feels like a hack would have been to force dialogue into an extractable form that stores a state model relevant to the game, with additional hacks like asking the opposing player to restate their understanding of prior agreements. Disclosure: I have no idea how the game Diplomacy works, so this might be irrelevant.

Beyond that, I have no idea how Facebook manages its AI research, but a quick Google confirms my memory that Meta/Facebook has done prior research on AI memory capabilities (recall, forgetting, etc.), which I mention just in case you were not aware.


Do you think it is an easier challenge than just "converse with a human", since there is a purpose/game state driving the interactions?


It is a closed domain, so yes, certainly easier than an open-domain chatbot.


Mainly by limiting it to "blitz" Diplomacy, which is a very limited version of the actual game, with only a few minutes per phase and, IIRC, no press on build phases. Human players under this limitation rely on a lot of acronyms to save time (they have to talk to 6 other players in those few minutes!).

Still natural language, so impressive, but there's no way it would hold up in a 10-minute conversation.




