
I'm going to echo other people's skepticism and give a concrete example that's easy to reproduce and has virtually no dependence on real experience in the physical world. Try asking it about public transit wayfinding trivia: pure text matching, well-defined single-letter/digit service names, a closed system of semantic content. All there is are services and stations; each service is wholly defined by the list of stations it stops at, and each station is wholly defined by the list of services that stop at it. This should be a language model's bread and butter. No complexity, no outside context, just matching lists of text together.

I talked to it about the NYC subway. Every time I nudged it with a prompt to fix a factual error or omission, it would revise something I didn't ask about and introduce new errors. It was inconsistent in astounding ways. Ask it twice what stations the F and A have in common and you'll get two different wrong answers. Ask it to put services into categories, and it will put the same service into more than one contradictory category. Point this out, and it will remake the list and forget to include that service entirely. And that's when it isn't confidently bullshitting about which trains share track and which direction they travel.
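To be concrete about how mechanical the question is, here's a minimal sketch of that closed system in Python. The station names below are made-up placeholders, not an accurate map of the A and F lines:

    # Toy version of the closed system described above. The station names
    # are illustrative placeholders, not real MTA data.
    services = {
        "A": {"Station 1", "Station 2", "Station 3"},
        "F": {"Station 2", "Station 3", "Station 4"},
    }

    # Dually, each station is wholly defined by the services that stop at it.
    stations = {}
    for svc, stops in services.items():
        for stop in stops:
            stations.setdefault(stop, set()).add(svc)

    # "Which stations do the F and A have in common?" is just a set intersection.
    print(sorted(services["A"] & services["F"]))  # ['Station 2', 'Station 3']

The point is just that the whole domain fits in a couple of dictionaries and the query is a set intersection, deterministic and trivially checkable.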

Bullshit is worse than a lie. For a lie is the opposite of the truth and is thus always uncovered. But bullshit is uncorrelated with the truth, and may thus turn out to be right, and may thus cause you to trust the word of the bullshitter far more than they deserve.




I've been spending some time trying to get a sense of how it works by exploring where it fails. When it makes a mistake, you can ask Socratic questions until it states the true counterpart to its mistake. It doesn't comment on noticing a discrepancy, even if you try to get it to reconcile its previous answer with the corrected version you guided it to. If you ask specifically about the discrepancy, it will usually deny the discrepancy entirely or double down on the mistake. And in the cases where it does eventually state the truth through this process, asking the original question you started with will cause it to state the false version again, despite that obviously contradicting what it said in the immediately previous answer.

ChatGPT is immune to the Socratic method. It's as if it has a model of the world that was developed by processing its training data, but it's unable to improve that conceptual model over the course of a conversation.

These are not the kinds of logical failures a human would make. It may be the most naturalistic computing system we've ever seen, but when pushed to its limits it does not "think" like a human at all.


> If you ask specifically about the discrepancy, it will usually deny the discrepancy entirely or double down on the mistake.

I have had the exact opposite experience. I pasted error messages from code it generated, I corrected its Latin grammar, and I pointed out contradictions in its factual statements in a variety of ways. Every time, it responded with a correction and (the same) apology.

This makes me wonder if we got different paths in an A/B test.


How the hell does one A/B test a language model that even the designers don’t fully understand?

Of course, I’m sure that once you start plugging engagement metrics into the model and the model itself conducts A/B tests on its output… hoo boy….


I pasted error messages from code it generated, and it eventually kept producing the same compiler error. When I applied the "Socratic method" and explained the answer to it based on Stack Overflow answers, it would at first pretend to understand by paraphrasing the relevant documentation I pasted in, but once I asked the original question again, it basically ignored all that progress and kept producing the same code with the same compiler errors.



