
Which part of "surprises and mistakes" is covered in this announcement text?

https://blogs.microsoft.com/blog/2023/02/07/reinventing-sear...

Also, user feedback is not going to improve the LM's behavior.




Q: Which part of "surprises and mistakes" is covered in this announcement text?

A: "Our teams are working to address issues such as misinformation"

Q: Also, user feedback is not going to improve the LM's behavior.

A: finalModel = prospectiveModels.sort((a, b) => b.averageUserFeedback - a.averageUserFeedback)[0]

(Not the exact implementation, obviously. But if you think data can't improve AI models... I don't know what to say)
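Spelled out a bit more, something like this (all names and numbers here are invented; a real selection pipeline is obviously far more involved):

  // Hypothetical sketch: pick the candidate whose responses got the
  // best average user feedback. Data and names are made up.
  const prospectiveModels = [
    { name: 'model-a', feedbackScores: [1, 0, 1, 1] }, // 1 = thumbs up
    { name: 'model-b', feedbackScores: [0, 1, 0, 0] },
  ];

  const averageUserFeedback = (m) =>
    m.feedbackScores.reduce((sum, s) => sum + s, 0) / m.feedbackScores.length;

  // Sort descending by average feedback; the best candidate wins.
  const finalModel = [...prospectiveModels]
    .sort((a, b) => averageUserFeedback(b) - averageUserFeedback(a))[0];

  console.log(finalModel.name); // 'model-a'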


Fair enough, I missed the line about work on misinformation.

However, AFAIK, there is no field-wide consensus in AI about sensible ways to address this, and no one is confident that any reliable techniques exist. So it's nice that they have "teams working" on it, but IMO that doesn't justify deploying clearly flawed technology for this purpose.

Data from users ("It was wrong") is hard to incorporate. A scheme like the one you propose basically implies using users as literal testers, which wouldn't matter so much if this were a UI/UX question. Instead, users will be given garbage, a few of them will find out, and a few of those will leave feedback. This is not a sane model for improving the behavior of a language model.
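To put rough numbers on that funnel (these rates are invented for illustration, not measured):

  // Made-up illustrative rates, not measurements.
  const wrongAnswerRate = 0.10; // fraction of responses that are garbage
  const noticeRate = 0.20;      // fraction of users who spot the error
  const reportRate = 0.05;      // fraction of those who leave feedback

  // Fraction of all responses that yield a correction signal:
  console.log(wrongAnswerRate * noticeRate * reportRate); // 0.001, i.e. 0.1%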


> there is no AI-field wide consensus about sensible ways to address this, and no conviction from anyone that there are any reliable techniques

Oh no, an open problem! Better just give up. Certainly don't allow anyone in the public to be involved in finding a solution. Much better to have an internal team stumbling around in the dark for years, then force the product out when a different company comes along that was willing to develop in the open and has moved much faster accordingly.

> A scheme like the one you propose basically implies using users as literal testers

Users are literal testers, that's why the product is out now. Some users are being given garbage, some of them are finding out, and some of them are leaving feedback. That user flow is occurring at a measurable rate.

In the future, different models will be made. They will be given access to different "oracles" (in the computability-theory sense), and these oracles will change their behavior. They will be able to do things like query {the web, wolfram alpha, python, prolog, etc.} and provide cited sources in responses. However, it's not enough to add the oracle. You must also verify that the oracle improves the user's experience. This is done by comparing the measured feedback rate with/without the oracle(s).
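In code, that verification step might look something like this (a minimal sketch with invented session data; the real measurement would be a proper A/B test over live traffic):

  // Toy A/B comparison: negative-feedback rate with vs. without
  // oracle access. Sessions and feedback values are made up.
  const withOracle = [
    { feedback: 'good' }, { feedback: 'good' },
    { feedback: 'bad' }, { feedback: null }, // null = no feedback left
  ];
  const withoutOracle = [
    { feedback: 'good' }, { feedback: 'bad' },
    { feedback: 'bad' }, { feedback: null },
  ];

  function negativeFeedbackRate(sessions) {
    const rated = sessions.filter((s) => s.feedback !== null);
    return rated.filter((s) => s.feedback === 'bad').length / rated.length;
  }

  // Keep the oracle only if it measurably reduces bad responses.
  const keepOracle =
    negativeFeedbackRate(withOracle) < negativeFeedbackRate(withoutOracle);
  console.log(keepOracle); // true for this toy data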


> Better just give up.

Certainly not. But don't release a modification to a major public-facing service on the basis of "we'll probably figure it out one day".

> Users are literal testers, that's why the product is out now

I regard this as immoral (not for the UI/UX case, as I mentioned above). I don't expect or require you or anyone else to agree with me.



