
All these LLMs make up too much stuff, I don't see how that can be fixed.





> All these LLMs make up too much stuff, I don't see how that can be fixed.

All these humans make up too much stuff, I don't see how that can be fixed.


I know you're trying to be edgy here, but if I'm deciding between searching online to find a source vs taking a shortcut with GPT, and GPT decides to hallucinate and make something up - that's the deceiving part.

The biggest issue is how confidently wrong GPT enjoys being. You can push GPT in either the right or the wrong direction and it will concede with minimal effort, which is also an issue. It's just a really bad game of Russian roulette nerdsniping until someone gets tired.


I wouldn't call it deceiving. In order to be motivated to deceive someone, you'd need agency and some benefit out of it

1. Deception describes a result, not a motivation. If someone has been led to believe something that isn't true, they have been deceived, and this doesn't require any other agents

2. While I agree that it's a stretch to call ChatGPT agentic, it's nonetheless "motivated" in the sense that it's learned based on an objective function, which we can model as a causal factor behind its behavior, which might improve our understanding of that behavior. I think it's relatively intuitive and not deeply incorrect to say that a learned objective of generating plausible prose can be a causal factor which has led to a tendency to generate prose which often deceives people, and I see little value in getting nitpicky about agentic assumptions in colloquial language when a vast swath of the lexicon and grammar of human languages writ large does so essentially by default. "The rain got me wet!" doesn't assume that the rain has agency


Well the definition of deception, according to Google and how I understand it, is:

> deliberately cause (someone) to believe something that is not true, especially for personal gain.

Emphasis on the personal gain part. It seems like you have a different definition.

There's no point in arguing about definitions, but I'm a big believer that if you can identify a difference in the definitions people are using early in a conversation, you can settle the argument right there.


I both agree that it's pointless to argue about definitions and think you've presented a definition that fails to capture a lot of common usage of the word. I don't think it matters what the dictionary says when we are talking about how a word is used. Like we use "deceptive" to describe inanimate objects pretty frequently. I responded to someone who thought describing the outputs of a machine learning model as deceiving people implied it had agency, which is nonsense

Isn't that GPT Plus? Tricking you into thinking you've found your new friend and they understand everything? Surely OpenAI would like people to use their GPT over a Google search.

How do you think leadership at OpenAI would respond to that?


The problems of epistemology and informational quality control are complicated, but humanity has developed a decent amount of social and procedural technology to do these, some of which has defined the organization of various institutions. The mere presence of LLMs doesn't fundamentally change how we should calibrate our beliefs or verify information. However, the mythology/marketing that LLMs are "outperforming humans" combined with the fact that the most popular ones are black boxes to the overwhelming majority of their users means that a lot of people aren't applying those tools to their outputs. As a technology, they're much more useful if you treat them with what is roughly the appropriate level of skepticism for a human stranger you're talking to on the street

I wonder what ChatGPT would have to say if I ran this text through with a specialized prompt. Your choice of words is interesting, almost like you are optimizing for persuasion, but simultaneously I get a strong vibe of intention of optimizing for truth.

I think you'll find I'm quite horseshit at optimizing for persuasion, as you can easily verify by checking any other post I've ever made and the response it generally elicits. I find myself less motivated by what people think of me every year I'm alive, and less interested in what GPT would say about my replies each of the many times someone replies just to ponder that instead of just satisfying their curiosity immediately via copy-paste. Also, in general it seems unlikely humans function as optimizers natively, because optimization tends to require drastically narrowing and quantifying your objectives. I would guess that if they're describable and consistent, most human utility functions look more like noisy prioritized sets of satisfaction criteria than the kind of objectives we can train a neural network against

This on the other hand I like, very much!

Particularly:

> Also, in general it seems unlikely humans function as optimizers natively, because optimization tends to require drastically narrowing and quantifying your objectives. I would guess that if they're describable and consistent, most human utility functions look more like noisy prioritized sets of satisfaction criteria than the kind of objectives we can train a neural network against

Considering this, what do you think us humans are actually up to, here on HN and in general? It seems clear that we are up to something, but what might it be?


On HN? Killing time, reading articles, and getting nerdsniped by the feedback loop of getting insipid replies that unfortunately so many of us are constantly stuck in

In general? Slowly dying mostly. Talking. Eating. Fucking. Staring at microbes under a microscope. Feeding cats. Planting trees. Doing cartwheels. Really depends on the human


I would tend to agree!!

> Talking.

Have you ever noticed any talking that ~"projects seriousness &/or authority about important matters" around here?


I think most people do that all the time. Projecting authority is one of the most important skills in a world dominated by human institutions, because it's an effective means of manipulating most humans. Sad but true

Do you know any single person who can stop the process, at will? Maybe not always, but at least sometimes, on demand (either internally or externally invoked)?

What, like not project authority? Admit that they are lost, confused, powerless, don't know something, aren't in control? Break the empire society kayfabe?

Yes, absolutely. I view this as one of the criteria by which I assess emotional maturity, and despite societal pressures to never do so, many manage to, even though most don't

I'm not a sociologist, but I think the degree to which people can't turn it off maps fairly well onto the "low-high trust society" continuum, with lower trust implying less willingness or even sometimes ability to stop trying to do this on average, though of course variation will exist within societies as well

I have this intuition because I think the question of whether to present vulnerability and openness versus authority and strength is essentially shaped like a prisoner's dilemma, with all that that implies


> I'm not a sociologist, but I think the degree to which people can't turn it off maps fairly well onto the "low-high trust society" continuum

We're not fully aligned here....I'm thinking more like: stop (or ~isolate/manage) non-intentional cognition, simulated truth formation, etc.....not perfectly in a constant, never ending state of course, but for short periods of time, near flawlessly.


Sure. There are people who can do that. I think it's a hard skill to master but definitely one that can be performed and improved somewhat reliably for people who manage to get the hang of it initially and care to work at it, and which I have seen a decent number of examples of, including a few who seem better at it than me

Could you name any such (famous) people?

I think we're not talking about exactly the same thing though, which I'd say is my fault. I would like to modify this:

> stop (or ~isolate/manage) non-intentional cognition, simulated truth formation, etc.....not perfectly in a constant, never ending state of course, but for short periods of time, near flawlessly.

...to this (only change is what I appended to the end):

> stop (or ~isolate/manage) non-intentional cognition, simulated truth formation, etc.....not perfectly in a constant, never ending state of course, but for short periods of time, near flawlessly, without stopping cognition altogether (such as during "no mind" meditation or "ego death" using psychedelics). Think more like a highly optimized piece of engineering, where we have ~full (comparable to standard engineering or programming) access to the code, stack, state, etc.


I'm not close enough to anyone you'd probably consider famous to claim to know the inner workings of their mind, and you keep adding more weirdly circuitously specified conditions. At this point I'm not sure what point, if any, you're trying to get at, and it's hard not to form the impression that you're being deliberately obtuse here, though it also could just be the brainrot that comes of overabstraction

> and you keep adding more weirdly circuitously specified conditions.

1. I explicitly acknowledged I misspoke and wanted to clarify: "I think we're not talking about exactly the same thing though, which I'd say is my fault. I would like to modify this:"

2. What is circuitous about my question? Is my refined question non-valid?

> At this point I'm not sure what point, if any, you're trying to get at

I encourage you to interpret my question literally, or ask for clarification.

> ...and it's hard not to form the impression that you're being deliberately obtuse here...

obtuse: "a: lacking sharpness or quickness of sensibility or intellect : insensitive, stupid ('He is too obtuse to take a hint.') b: difficult to comprehend : not clear or precise in thought or expression".

I'd like to see you make the case for that accusation, considering the text of our conversation is persisted above.

Rhetoric is popular, and it will work on most people here, but it will not work on me. I will simply call it out explicitly, and then observe what technique you try next. You do realize that you people can be observed, and studied, don't you?

> ...though it also could just be the brainrot that comes of overabstraction

Perhaps. Alternatively, my question could be valid, challenging to your beliefs (which I suspect are perceived as knowledge), and you lack the self-confidence to defend those beliefs.

You are welcome to:

1. genuinely address my words

2. engage in more rhetoric

3. stay silent (which may be interpreted as you not seeing this message, regardless of whether that is true)

4. something else of your choosing


FWIW I don't understand a lot of what either of you mean, but I'm very interested. Quick run-through below; excuse the editorial tone, I don't know how to give feedback on writing without it.

# Post 1

> The problems of epistemology and informational quality control are complicated, but humanity has developed a decent amount of social and procedural technology to do these, some of which has defined the organization of various institutions.

Very fluffy, which creates very uncertain parsing for the reader.

Should cut down, then could add specificity:

ex. "Dealing with misinformation is complicated. But we have things like dictionaries and the internet, there's even specialization in fact-checking, like Snopes.com"

(I assume the specifics I added aren't what you meant, just wanted to give an example)

> The mere presence of LLMs doesn't fundamentally change how we should calibrate our beliefs or verify information. However, the mythology/marketing that LLMs are "outperforming humans"

They do, or are clearly on par, at many tasks.

Where is the quote from?

Is bringing this up relevant to the discussion?

Would us quibbling over that be relevant to this discussion?

> combined with the fact that the most popular ones are black boxes to the overwhelming majority of their users means that a lot of people aren't applying those tools to their outputs.

Are there unpopular ones that aren't black boxes?

What tools? (this may just indicate the benefit of a clearer intro)

> As a technology, they're much more useful if you treat them with what is roughly the appropriate level of skepticism for a human stranger you're talking to on the street

This is a sort of obvious conclusion compared to the complicated language leading into it, and doesn't add to the posts before it. Is there a stronger claim here?

# Post 2

> I wonder what ChatGPT would have to say if I ran this text through with a specialized prompt.

Why do you wonder that?

What does "specialized" mean in this context?

My guess is there's a prompt you have in mind, which then would clarify A) what you're wondering about B) what you meant by specialized prompt. But a prompt is a question, so it may be better to just ask the question?

> Your choice of words is interesting, almost like you are optimizing for persuasion,

What language optimizes for persuasion? I'm guessing the fluffy advanced verbiage indicates that?

Does this boil down to "Your word choice creates persuasive writing"?

> but simultaneously, I get a strong vibe of intention of optimizing for truth.

Is there a distinction here? What would "optimizing for truth" vs. "optimizing for persuasion" look like?

Do people usually write not-truthful things, to the point that it's worth noting when you think people are writing with the intention of truth?


As long as we're doing unsolicited advice, this revision seems predicated on the assumption that we are writing for a general audience, which ill suits the context in which the posts were made. This is especially bizarre because you then interject to defend the benchmarking claim I've called "marketing", and having an opinion on that subject at all makes it clear that you also at the very least understand the shared context somewhat, despite being unable to parse the fairly obvious implication that treating models with undue credulity is a direct result of the outsized and ill-defined claims about their capabilities to which I refer. I agree that I could stand to be more concise, but if you find it difficult to parse my writing, perhaps this is simply because you are not its target audience

Let's go ahead and say the LLM stuff is all marketing and it's all clearly worse than all humans. It's plainly unrelated to anything else in the post; we don't need to focus on it.

Like I said, I'm very interested!

Maybe it doesn't mean anything other than what it says on the tin? You think people should treat an LLM like a stranger making claims? Makes sense!

It's just unclear what a lot of it means and the word choice makes it seem like there's something grander going on, coughs as our compatriots in this intricately weaved thread on the international network known as the world wide web have also explicated, and imparted via the written word, as their scrivening also remarks on the lexicographical phenomenae. coughs

My only other guess is you are doing some form of performance art to teach us a broader lesson?

There's something very "off" here, and I'm not the only one to note it. Like, my instinct is it's iterated writing using an LLM asked to make it more graduate-school level.


Your post and the one I originally responded to are good evidence against something I said earlier. The mere existence of LLMs does clearly change the landscape of epistemology, because whether or not they're even involved in a conversation people will constantly invoke them when they think your prose is stilted (which is, by the way, exactly the wrong instinct), or to try to posture that they occupy some sort of elevated remove from the conversation (which I'd say they demonstrate false by replying at all). I guess dehumanizing people by accusing them of being "robots" is probably as old as the usage of that word if not older, but recently interest in talking robots has dramatically increased and so here we are

I can't tell you exactly what you find "off" about my prose, because while you have advocated precision your objection is impossibly vague. I talk funny. Okay. Cool. Thanks.

Anyway, most benchmarks are garbage, and even if we take the validity of these benchmarks for granted, these AI companies don't release their datasets or even weights, so we have no idea what's out of distribution. To be clear, this means the claims can't be verified even by the standards of ML benchmarks, and thus should be taken as marketing, because companies lying about their tech has both a clearly defined motivation and a constant stream of unrelenting precedent


> There's something very "off" here

You mean on this planet?

If not, what do you think of that idea? Does something not seem....weird?


In reality, humans are often blunt and rude pessimists who say things can't be done. But "helpful chatbot" LLMs are specifically trained not to do that for anything but crude swaths of political/social/safety alignment.

When it comes to technical details, current LLMs have a bias towards sycophancy and bullshitting that humans only show when especially desperate to impress or totally fearful.

Humans make mistakes too, but the distribution of those mistakes is wildly different and generally much easier to calibrate for and work around.


Exactly, you can't even fix the problem at the root, because the problem is already with the humans making up stuff.

Believe it or not, there are websites that have real things posted. Honestly, my biggest shock is that OpenAI thought Reddit of all places was a trustworthy source of knowledge.

Reddit has been the most trustworthy source for me in the last ~5 years, especially when I want to buy something.

Reddit is so much better than the average SEO-optimized site that adding "reddit" to your search is a common trick for using Google.

While Reddit is often helpful for me (Google site:reddit.com), it's nice to be able to toggle between reddit and non-reddit results.

I hope LLMs will offer a "-reddit" model to switch to when needed.


The websites with content authored by people are full of bullshit, intentional and unintentional.

It's genuinely concerning to me how many people replied thinking reddit is the gospel for factual information.

Reddit, while it has some niche communities with tribal info and knowledge, is FULL of spam, bots, companies masquerading as users, etc etc etc. If people are truly relying on reddit as a source of truth (which OpenAI is now being influenced by), then the world is just going to amplify all the spam that already exists


If I am going to trust a machine then it should perform at the level of a very competent human, not a general human.

Why would I want to ask your average person a physics question? Of course, their answer will probably be wrong and partly made up. Why should that be the bar?

I want it to answer at the level of a physics expert. And a physics expert is far less likely to make basic mistakes.


advael's answer was fine, but since people seem to be hung up on the wording, a more direct response:

We have human institutions dedicated at least nominally to finding and publishing truth (I hate having to qualify this, but Hacker News is so cynical and post-modernist at this point that I don't know what else to do).

These include, for instance, court systems, which have a notion of evidentiary standards. Eyewitnesses are treated as more reliable than hearsay. Written or taped recordings are more reliable than both. Multiple witnesses who agree are more reliable than one.

Another example is science. Science utilizes peer review, along with its own hierarchy of evidence, similar to but separate from the court's. Interventional trials are better evidence than observational studies. Randomization and statistical testing are used to try to tease out effects from noise. Results that replicate are more reliable than a single study.

Journalism is yet another example. This is probably the arena in which Hacker News is most cynical and will declare all of it useless trash, but nonetheless reputable news organizations do have methods they use to try to be correct more often than they are not. They employ their own fact checkers. They seek out multiple expert sources. They send journalists directly to a scene to bear witness themselves to events as they unfold.

You're free to think this isn't sufficient, but this is how we deal with humans making up stuff and it's gotten us modern civilization at least, full of warts but also full of wonders, seemingly because we're actually right about a lot of stuff.

At some point, something analogous will presumably be the answer for how LLMs deal with this, too. The training will have to be changed to make the system aware of the quality of evidence. Place greater trust in direct sensor output versus reading something online. Place greater trust in what you read from a reputable academic journal versus a Tweet. Etc. As it stands now, unlike human learners, the objective function of an LLM is just to produce a string in which each piece is in some reasonably high-density region of the probability distribution of possible next pieces as observed from historical recorded text. Luckily, producing strings in this way happens to generate a whole lot of true statements, but it does not have truth as an explicit goal and, until it does, we shouldn't forget that. Give it the treatment it deserves, as if it were some human savant with perfect recall who had never left a dark room to experience the outside world, but had read everything ever written, unfortunately without any understanding of the difference between reading a textbook and reading 4chan.
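To make that concrete, here's a toy Python sketch of what "each piece in some high-density region of the distribution of possible next pieces" means. The vocabulary and probabilities are invented purely for illustration, not taken from any real model:

    import random

    # Toy illustration only: a fake next-token distribution for the prefix
    # "The capital of France is". A real model computes these probabilities
    # from learned weights; the numbers here are made up for the example.
    next_token_probs = {
        "Paris": 0.87,    # high-density region: plausible and happens to be true
        "Lyon": 0.06,     # plausible-sounding but false
        "located": 0.04,  # grammatical continuation, neither true nor false
        "banana": 0.001,  # low-density region: effectively never sampled
    }

    def sample_next_token(probs, temperature=1.0):
        """Sample one next token; higher temperature flattens the distribution."""
        tokens = list(probs)
        weights = [p ** (1.0 / temperature) for p in probs.values()]
        return random.choices(tokens, weights=weights, k=1)[0]

    # Most samples come out "Paris", but nothing in the objective separates
    # "true" from merely "probable" -- "Lyon" shows up some fraction of the time.
    print([sample_next_token(next_token_probs) for _ in range(10)])

Truth only enters the picture insofar as true statements happen to be common in the training text.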


I keep hearing about people using these for coding. Seems like it would be extremely easy to miss something and then spend more time debugging than it would have taken to do it yourself.

I recently tried to have ChatGPT write an .htaccess RewriteCond/RewriteRule for me and it was extremely confident you couldn't do something I needed to do. When I told it that it just needed to add a flag to the end of the rule (I was curious and was purposely non-specific about which flag it needed), it suddenly knew exactly what to do. Thankfully I knew what it needed, but otherwise I might have walked away thinking it couldn't be accomplished.


My experience is that it will simply make up methods, properties and fields that do NOT exist in well-documented APIs. If something isn't possible, that's fine, just tell me it's not possible. I spent an hour trying to get ChatGPT (4/4o and 3.5) to write some code to do one specific thing (dump/log detailed memory allocation data from the current .NET application process) for diagnosing an intermittent out of memory exception in a production application. The answer as far as I can tell is that it's not possible in-process. Maybe it's possible out of process using the profiling API, but that doesn't help me in a locked-down k8s pod/container in AWS.

I think once you understand that they're prone to do that, it's less of a problem in practice. You just don't ask it questions that require detailed knowledge of an API unless it's _extremely_ popular. Like in kubernetes terms, it's safe to ask it about a pod spec, less safe to ask it details about istio configuration, and even less safe to ask it about some random operator with 50 stars on github.

Mostly it's good at structure and syntax, so I'll often find the library/spec I want, paste in the relevant documentation and ask it to write my function for me.

This may seem like a waste of time because once you've got the documentation you can just write the code yourself, but A: that takes 5 times as long and B: I think people underestimate how much general domain knowledge is buried in chatgpt so it's pretty good at inferring the details of what you're looking for or what you should have asked about.

In general, I think the more your interaction with chatgpt is framed as a dialogue and less as a 'fill in the blanks' exercise, the more you'll get out of it.


From within the process it might be difficult*, but please do give this a read (https://learn.microsoft.com/en-us/dotnet/core/diagnostics/du...) and give dotnet-dump + dotnet-trace a try.

If you are still seeing the issue with memory and GC, you can submit it to https://github.com/dotnet/runtime/issues especially if you are doing something that is expected to just work(tm).

* difficult as in retrieving data detailed enough to trace individual allocations; otherwise `GC.GetGCMemoryInfo()` and adjacent methods can give you a high-level overview. There are more advanced tools, but I always had the option of either remote debugging back in the Windows Server days or dotnet-dump and dotnet-trace for containerized applications to diagnose the issues, so I haven't really explored what is needed for the more locked-down environments.


If I ever let AI write code, I'd write serious tests for it.

Just like I do with my own code.

Both AI and I "hallucinate" sometimes, but with good tests you make things work.
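As a minimal sketch of what I mean (the slugify() helper and its spec are hypothetical stand-ins for whatever the AI generated, not anything from a real project), pin the behavior down before trusting the code:

    # test_slugify.py -- run with `pytest test_slugify.py`
    # slugify() stands in for a hypothetical AI-generated helper; the tests
    # pin down the behavior I actually care about before I trust the code.
    import re
    import pytest

    def slugify(text: str) -> str:
        """(Hypothetically AI-generated) lowercase, strip punctuation, hyphenate."""
        if not isinstance(text, str):
            raise TypeError("slugify expects a string")
        words = re.findall(r"[a-z0-9]+", text.lower())
        return "-".join(words)

    def test_basic():
        assert slugify("Hello, World!") == "hello-world"

    def test_collapses_whitespace():
        assert slugify("  a   b  ") == "a-b"

    def test_empty_input():
        assert slugify("") == ""

    @pytest.mark.parametrize("bad", [None, 42])
    def test_rejects_non_strings(bad):
        with pytest.raises(TypeError):
            slugify(bad)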


This problem applies almost universally as far as I can tell.

If you are knowledgeable about the subject matter you're asking for help with, the LLM can be guided toward value. This means you do have to throw out bad or flat-out wrong output regularly.

This becomes a problem when you have no prior experience in a domain, for example reviewing legal contracts for a real estate transaction. If you aren't familiar enough with the workflow and the details of its steps, you can't provide critique and follow-on guidance.

However, the response still stands before you, and it can be tempting to glom onto it.

This is not all that different from the current experience with search engines, though, where if you're trying to get an answer to a question you may wade through, and even initially accept, answers from websites that are completely wrong.

For example, products to apply to the foundation of an old basement: some sites will recommend products that are not good at all, but do so because the content owners get associate compensation for it.

The difference is that LLM responses appear less biased (no associate links, no SEO keyword targeting), but are still wrong.

All that said, sometimes LLMs just crush it when details don't matter, for example building a simple cross-platform pyqt-based application. Search engine results cannot do this. Whereas, at least for rapid prototyping, GPT is very, very good.
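For what it's worth, the skeleton of that kind of throwaway tool really is tiny. A minimal PyQt5 sketch (the widgets and labels are just placeholders, not any particular app):

    # Minimal PyQt5 skeleton -- the kind of boilerplate GPT reliably gets right.
    import sys
    from PyQt5.QtWidgets import (
        QApplication, QWidget, QVBoxLayout, QLabel, QPushButton
    )

    def main():
        app = QApplication(sys.argv)
        window = QWidget()
        window.setWindowTitle("Prototype")

        layout = QVBoxLayout(window)
        label = QLabel("Hello from a quick prototype")
        button = QPushButton("Click me")
        button.clicked.connect(lambda: label.setText("Clicked"))
        layout.addWidget(label)
        layout.addWidget(button)

        window.show()
        sys.exit(app.exec_())

    if __name__ == "__main__":
        main()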


Mixture of agents prevents a lot of fact fabrication.
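Roughly this pattern, if I understand the term the same way. A hand-wavy Python sketch; ask_model() is a hypothetical stand-in for whatever LLM API you actually call, and the prompts are illustrative only:

    # Rough sketch of a mixture-of-agents style cross-check. ask_model() is a
    # hypothetical stand-in for your real LLM API call, not a library function.
    def ask_model(model: str, prompt: str) -> str:
        """Hypothetical: send `prompt` to `model` and return its reply."""
        raise NotImplementedError("wire this up to your provider of choice")

    def mixture_answer(question: str, models: list[str]) -> str:
        # 1. Ask several different models independently.
        drafts = [ask_model(m, question) for m in models]

        # 2. Have each model critique the whole set of drafts for factual errors.
        critiques = [
            ask_model(m, f"Question: {question}\nAnswers: {drafts}\n"
                         "Point out any factual errors or contradictions.")
            for m in models
        ]

        # 3. Ask one model to synthesize a final answer, keeping only claims
        #    that the drafts agree on or that survive the critiques.
        return ask_model(models[0],
                         f"Question: {question}\nDrafts: {drafts}\n"
                         f"Critiques: {critiques}\n"
                         "Write one final answer using only well-supported claims.")

The idea is that a detail only one model fabricates is unlikely to survive the cross-check.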


