SafeGPT: New tool to detect LLMs' hallucinations, biases and privacy issues (giskard.ai)
114 points by Googleton on April 21, 2023 | 82 comments



Looks a bit like snake oil to me. A lot of companies are now spinning up simple demos with opaque backends, making huge claims that they've solved X hard problem for/with AI, then saying "trust us" and "join our waitlist" without hard details or facts to back it up. If you could detect hallucinations/biases etc. that easily, don't you think OpenAI would've worked on something like this?


> don’t you think OpenAI would’ve worked on something like this?

Along this line of thought: was it a massive oversight for them to not train the model to say "math detected, let me pass that to a solver" instead of trying to guess what token should come next in a math problem?
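
Conceptually I mean something like this toy router; everything here is illustrative (sympy just stands in for "a solver", and call_llm is a placeholder), not how OpenAI would actually do it:

    # Toy sketch of "math detected, pass it to a solver" routing (illustrative only).
    import re
    from sympy import sympify  # symbolic math library standing in for "a solver"

    def call_llm(query: str) -> str:
        return f"[LLM answer for: {query}]"  # placeholder for the language model

    def answer(query: str) -> str:
        # Crude heuristic: if the query looks like pure arithmetic, route it to the solver.
        if re.fullmatch(r"[\d\s\.\+\-\*/\(\)\^]+", query.strip()):
            return str(sympify(query.replace("^", "**")))
        return call_llm(query)  # everything else goes to the LLM as usual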


There's a million categories of problem you could ask an LLM to try to solve. You'd need a million solvers…


This seems like a pretty good thing. The model’s ability to detect _which_ solver to use is the killer feature.


Why is that a killer feature? Humans are quite good at asking different people different questions. If I need to do a simple math problem I'll just prefix "calculate" and pop it into Google, whereas if I want an intro to a named thing I'll prefix "wikipedia". That's not hard.

GPT is quite useful, but not because it solves the problem of "I don't know whether the question I have is answerable by a calculator".


You mean HuggingGPT?


If you used some sort of plugin system, you could just make a solver for your specific task and drop it in. Doesn't ChatGPT Plus do this now?


It's behind a waitlist.


OpenAI plugins can connect with a growing number of things. Zapier is one, and that alone covers several thousand functions.


Can you give some example recipes?


The most obvious one is Wolfram; it passes most math off to Wolfram.


They are solving that with plugins now.


I think part of the problem is that it's technically correct to say "my product does X" even if it does X extremely poorly. I'm not sure if this can be changed because any line for "does-X vs does-not-adequately-do-X" is going to necessarily be subjective.

So personally I think the problem is that people see "this product does X" and interpret that to mean that it does X well. I don't think it's necessarily bad that we're seeing an explosion of AI tools that are a bit underwhelming if people understood it as such -- we're on, after all, a site with a heavy startup focus and saying "your product doesn't do everything that I want" is a bit antithetical to that.

But yeah specifically for this one there are arguments that "X is not even possible, especially not with this approach" so it's a bit more egregious.


This isn't new; it's just more obvious with this tech. Every sales team at nearly every company has been performing this dance for hundreds of years.


AI is the new crypto (though with more substance). It attracts many of the same self-obsessed, snake-oil-selling characters, though.


True. I imagine many crypto startups have desperately pivoted to AI with their last gasp of cash given the recent blowup.


In this area, if there's not a public demo and the results aren't verifiable, then it's not worth paying attention to.


It's good to have third parties (apart from OpenAI) that assess the quality of OpenAI's results. That's how audits work; they have to be independent... Also, third parties are essential for comparing ChatGPT's results with those of other LLMs. These are important checks for assessing the robustness of OpenAI's results!


I can't help but notice your account's only activity before this post was praising another giskard.ai submission a few months ago. Anything you'd like to disclose?


You should assume everything posted on the internet has an ulterior motive. Relying on disclosures simply allows actual bad actors to avoid scrutiny.

(And no one cares that you used to work at Microsoft or whatever).


Well said.


He didn't say it's not important. He is just pointing out that black-box third party verification is not worth much when you can't independently verify the verifiers.


Definitely agree that black boxes are the problem and that one needs to be able to verify the verifiers. FYI, that's why Giskard is open source and why we build in the open: https://www.giskard.ai/knowledge/giskard-log-1-going-open-so...


The OP's point is that it's likely impossible to do what is claimed here in general. Imagine the LLM says something like Fermat's Last Theorem. To verify it, you'd have to either 1) have a proof assistant powerful enough to construct a proof or 2) use a second ML model to guess truthfulness. The former is technically challenging, and the latter is just another model, with its own biases and factual inconsistencies.


How does this work? Does anyone know?

And for large swaths of things, how can it possibly work? It's not possible to say whether or not it is hallucinating code for almost all code and APIs, for instance. And I see similar issues in many fields outside pure facts. With privacy issues as well.


Looking at their “documentation”: https://docs.giskard.ai/start/

It would appear that this is not automated monitoring but more like a second stage of human reinforcement learning, or perhaps a classifier. It seems that you create input/output examples, the LLM responses are examined by a secondary system (which I'm guessing is probably NOT an LLM, since otherwise it would be vulnerable to attacks), and it perhaps force-regenerates the LLM response if it doesn't meet the classification threshold.
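
If that reading is right, the shape of it would be roughly the following (all names hypothetical, just my guess at the wrapper):

    # Rough guess at the shape of such a wrapper (all names hypothetical).
    THRESHOLD = 0.8  # illustrative classification cutoff

    def llm_generate(prompt: str) -> str:
        return f"[LLM reply to: {prompt}]"  # stand-in for the real LLM call

    def classifier_score(prompt: str, reply: str) -> float:
        return 1.0  # stand-in for a secondary classifier trained on input/output examples

    def guarded_reply(prompt: str, max_retries: int = 3) -> str:
        for _ in range(max_retries):
            reply = llm_generate(prompt)
            if classifier_score(prompt, reply) >= THRESHOLD:
                return reply
            # Below threshold: force-regenerate and try again.
        return "No response passed the check."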

At least, that sounds more believable to me than someone claiming they’ve fixed the inherent flaws in LLMs.


We are a team of engineers and researchers working on AI alignment and safety. We're investigating multiple methods, including metamorphic testing, human feedback, benchmarks against external data sources, and LLM explainability methods.

Currently, fact checking works on straight facts. It does a Google Search and uses LLMs to shorten it. Once it has the short version, it compares the short results with the answer provided by ChatGPT itself. Premium tiers would get better fact-checking sources than just Google. We're investigating various data sources and comparison methods.
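
Very roughly, the current flow looks like this (a simplified sketch; the function names are illustrative, not our actual API):

    # Simplified sketch of the current fact-checking flow (names are illustrative).
    def google_search(query: str) -> list[str]:
        return ["...search result snippets..."]  # stand-in for the search call

    def llm_summarize(snippets: list[str]) -> str:
        return " ".join(snippets)  # stand-in for the LLM shortening step

    def llm_compare(reference: str, answer: str) -> float:
        return 1.0  # stand-in: agreement score between reference and answer

    def fact_check(question: str, chatgpt_answer: str) -> float:
        snippets = google_search(question)             # 1. search for the fact
        reference = llm_summarize(snippets)            # 2. shorten the results with an LLM
        return llm_compare(reference, chatgpt_answer)  # 3. compare with ChatGPT's answer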

Note that fact checking / hallucinations is just one of the types of safety issues we'd like to tackle. Many of these are still open questions in the research community, so we're looking to build and develop the right methods for the right problems. We also think it's super important to have independent third-party evaluations to make sure these models are safe.

This is a new tool we're building in the open, and we're interested in your feedback to prioritize!


> Currently, fact checking works on straight facts.

Wow, you guys have a database of all the facts?

> It does a Google Search and uses LLMs to shorten it.

Oh...

...actually, this is an empirical fact checker. I wouldn't call it "fact-based", as that's an epistemologically absurd claim, but "empirical fact checking" sounds good and is very close to how humans verify information in the first place: by checking multiple sources and looking for correlation.

For what it's worth, I think your approach makes sense. Good luck.


> Currently, fact checking works on straight facts. It does a Google Search and uses LLMs to shorten it.

So your fact-checking LLM is then also vulnerable to injection and unethical prompting when it ingests website text. And a Google search is far, far away from fact checking, particularly for the subtle errors that GPT-4 is prone to making.


Seems too good to be true, and I don't understand what it even means for an LLM to be unbiased.


One with only weights and no biases in the ANN is unbiased.


Someone I know thought that LLaMA was unbiased because they 'read' the paper and clearly didn't know what anything meant. A great example of "a little knowledge is a dangerous thing".


I feel like this went underappreciated. [b] = [0] indeed.


"Unbiased" almost always means "has biases that are similar to mine". I can't think of very many exceptions to that, frankly.


That's obviously untrue after more than three seconds of critical thought


It's not obvious to me. If you have a proof that it's possible to have unbiased views of objective reality, please share it.


I've thought about it for four seconds now and I still agree with him. Maybe if you shared an example instead of an unsubstantiated put-down, it would help.


There are many bias detectors developed in the research world. For a deeper look, see this paper: https://arxiv.org/abs/2208.05777


It doesn't. Mean anything, I mean. Language isn't well defined, so its accuracy is also undefined and a non-uniform deviation from that accuracy is super undefined.


They could use different models to test it. They could use common biases and test those.


This is like saying, "I've developed a new compass for a deep space probe to help it find North!"

Our society is actively declaring that falsehoods are truth, and should be celebrated. We're hallucinating ourselves. All this software does is make sure LLMs hallucinate with us.


Before long, we could end up with left-leaning and right-leaning AIs autonomously fighting the 'culture war' over social media, much more advanced than simple bots spamming copy+paste comments.

Combined with ever-improving ways to fake video and voices, things could get even uglier than they've been over the last few years.


Alternatively, we're reaching the point where we're creating a secondary AI to keep the first AI in check, like Wheatley and GLaDOS.


Well I think that its hallucinations are a good demonstration that none of the post-truth subjectivist philosophies were ever things that many people took very seriously. ChatGPT is the real thing as it relates to extreme relativism: it really cannot tell the difference between true and false and doesn't really care, either. By contrast, the apostles of post-truth were really only trolling for a response... no one really lives by its principles because it's so impractical and disastrous. To really believe in post-truth, you must perceive no difference between raisin bran with cyanide and raisin bran.


I'm pretty interested and fascinated in all the efforts that go into making sure it lies as convincingly as us.


I firmly reject that GPT is sentient, or can even tell truth. You're no better off asking it questions about society than asking a magic eight ball.

And yet there's something sinister about twisting an evolutionary model into a reeducation camp.

LLMs are not human, but strikingly similar. If we have no qualms about how we treat it, then what will we do to real people?


Such as?


It's impossible to answer this without getting political. Instead, let's just say every previous generation has been critically wrong about some things. Statistically, we're unlikely to be the outlier.


We have always been trying to sort pebbles into "correct" heaps.[0]

[0] https://youtu.be/cLXQnnVWJGo


dang's put me on notice, so I'm walking on eggshells here.

But truth isn't political. As long as we think it is, we will continue to follow the descent into madness. Truth is just that: truth.

The only reason we think truth is political is that our chosen leaders depend so heavily on lies that the truth would destroy their reign.


Some of the ugliest episodes in human history were caused by people who believed their political positions were not political positions, but unarguable statements of the True and Good.


Eeehhh? I'm not sure truth* exists, but there are things that we accept as true and things that are so fundamental that it doesn't occur to us to question them; these things are inherently political. Just to be clear, I'm not using that word to refer to the specific species of polarized discourse that we've got in the States; I'm talking about the nature of power and the human condition.

Curious what you consider to be true though? I'm coming at it from the perspective that even in physics where we can isolate so nicely we still aren't divining any truths, just making models with increasing explanatory powers.

Personally, I've been reaching more towards 'shared values' than 'truth', this is likely the pedant in me but truth doesn't feel tractable whereas shared values feels like it has less baggage?

*pretty sure lies exist though


Does shared values here just mean definitions? Such as the number of carbon atoms in a mole, 5+9 in base 10, the average number of protons in a carbon atom is a specific value, and leptons exist?


No, shared values refers to the moral/emotional stuff. I find it more useful, when trying to bridge the gap in a pretty politically charged environment, to reconnect on simple things like wanting other people to be happy and healthy.

Are those true things? Good candidates, I like 'leptons exist'. Do you mind if we just gently ignore the math one? Feels like inviting the whole 'is math invented or discovered' thing.

1) carbon atoms in a mole - a mole is a counting number, so it seems tautological to declare this one a truth

2) pass :)

3) this seems like a good candidate, but it also seems to reduce truth to just the things we measure and only to the extent that we can be accurate (I'm also assuming you meant neutrons; protons are fixed by species). Purely hypothetically there could be a whole heap of unusually heavy or light carbon out there that would disprove one or another of our theories. To put it another way: is the average number of apples that a tree grows in a year 'true'? It'll change year after year, after all. I'm fine with a definition of truth that implies error bars and best efforts, but I feel it falls short of the colloquial definition.

4) I think the pure observation that a thing somewhere exists is probably the closest to true; the rebuttals against that would all be self-consuming anyway. The specific claim that leptons exist seems a little more fraught, though: we could conceivably come to another conclusion if that better fit the facts.

So, can we call these things true if our concept is potentially incomplete or incorrect?


I'm a little confused by the statement "truth isn't political". This kind of goes against what I understand politics to be, which is the negotiation of a broader societal trend, and which doesn't itself have to do with whether or not the societal trend has a factual basis. The truth may be that cigarettes cause cancer, but the politics are obviously that acknowledging this would encourage society to implement top-down policies to limit cigarette use. In this way, "cigarettes are carcinogenic" is a truth with significant political weight, which is what I understand a political truth to mean. Is my understanding different from yours?


If I'm understanding the parent comment correctly: a fact may have political implications but it doesn't depend on politics. In other words reality is independent of our interpretation of it (i.e. philosophical realism). The rub of course being that coming to know facts about most things is a highly social process filtered through interpretation and biases. Everything can be political if it needs to be decided upon by a group.

EDIT: I have avoided using "truth" here because it's a more general term than "fact" which has the connotation of being in reference to something concrete.


It's precisely because truth isn't political that the assignment of "observations" and "theories" to "truth" is extremely politicized.


Generally when people talk like this nowadays they mean trans people, or the LGBT community in general. Sometimes Jews, though those types don't say that part out loud on HN too often.


There were lots of good examples during COVID. Remember that you don't need to wear masks, because washing your hands is enough (and we need to save the masks for doctors, but we're afraid to say that, because it will cause a run on masks). Remember that staying 6 feet apart is a magic distance over which COVID cannot cross (or maybe 1 meter, truth depends on the country you live in). Remember that you can't eat inside a restaurant, but you can eat outside, and it's okay for the restaurant to build partial walls around their outdoor spaces to make them more comfortable. Remember that COVID definitely could not have come from a lab leak, and it's racist to even suggest it might have happened. Never mind that the scientist who started the anti-lab-leak open letter was himself heavily funded for GoF research, and he refused to sign his own letter for political reasons.

I don't claim to know the answers to all of the questions (and I certainly don't know where COVID came from), but clearly there are plenty of cases where dubious statements were strongly enshrined as "True" in a way that required major online players to suppress alternative beliefs as "False".


A big difficulty is the conflation of fact with judgement. 'Vaccines work', 'masks don't work', 'a lab leak is impossible', etc. are judgements, not facts. They are not even hypotheses, in that there are no clear criteria by which they can be falsified. Hence fact-checking presents obvious problems, as in practice it will be judgement-checking.


I think it's easy to guess what the commenter is obliquely referring to.


Such as religion, for example.


People who write with their left hand existing.


These are the kinds of things I can see taking off, for better or worse. I know Adobe's product is worse than Midjourney's, for example, but once the hype meets reality, companies are going to want to be safe when they start using AI formally.


Is there a way to test it? Curious whether it's directly embedded in ChatGPT as an add-on on top of it, or something outside of it?


There's a waitlist for this; we're still actively working on it!

It is directly embedded in the ChatGPT website once you get the extension. As you ask questions, it will be added to the sidebar next to each answer.


This is exciting to see, as I am concerned about the hallucinations, biases, privacy, licensing, etc. issues. I imagine the results are minimal at the moment, but perhaps soon they will be useful.


Ten bucks says this just uses the GPT-4 API to go over the ChatGPT output.


What does it even mean to detect hallucinations? The AI doesn't say things that are trivially false. While using GPT-4, I have observed that it lies about simple things I didn't expect it to, while it does very well on complex things.

TL;DR: It lies about fact-based information that is mentioned in very, very few places on the internet and not repeated much. Short of having a human with the context, how do you even detect that?

Example: Ask it to describe a "Will and Grace" episode with some guest appearance. It will always make up everything, including the episode number and the plot, and the plot seems very believable. If you haven't watched it and can't find a summary online, it's hard to tell that it's a lie.


There are many ways to detect hallucinations. Basically, either you have the ground-truth answers in an external database, in which case you compare against them, or you don't have the ground truth, in which case you need to do metamorphic testing. See this article on it: https://www.giskard.ai/knowledge/how-to-test-ml-models-4-met...


But GPT-4 doesn't hallucinate on things that are popular enough to be replicated enough times on the web as knowledge. It hallucinates on things that are much less likely to be repeated many times. That rules out an external database with true answers, unless the external database is supposed to contain all info queryable in all ways, in which case the database is just a better version of GPT-X.

The metamorphic testing approach is interesting and might work.


I've been playing with GPT4 summarization of hard knowledge that has an external database with true answers that GPT knows about, and it's still hallucinating regularly.


Metamorphic testing seems to try to map an output of a model to a ground truth, which I guess is great if you have a database of all the known truths in the universe.


Not exactly; metamorphic testing does not need an oracle. That's actually the reason for its popularity in ML testing. It works by perturbing the input in a way that will produce a predictable variation of the output (or possibly no variation).

Take for example a credit scoring model: you can reasonably expect that if you increase the liquidity, the credit score should not decrease. In general it is relatively easy to come up with a set of assumptions on the effect of perturbation, which allows evaluating the robustness of a model without knowing the exact ground truth.
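
As a concrete sketch of that credit-scoring example (the model interface and feature names are hypothetical):

    # Metamorphic test sketch for the credit-scoring example (hypothetical names).
    def test_score_does_not_decrease_with_liquidity(model, applicants):
        for applicant in applicants:
            baseline = model.score(applicant)
            perturbed = dict(applicant, liquidity=applicant["liquidity"] * 1.5)
            # Metamorphic relation: increasing liquidity should not lower the score.
            assert model.score(perturbed) >= baseline

No ground-truth credit score appears anywhere; only the relation between the two predictions is checked.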


But isn't that half the reason people are so excited about this stuff: that you can ask it to make up an episode and it does a plausible job?


That is beside the point. My point is that detecting hallucinations seems like a very very hard problem.

The utility is there, and it has nothing to do with making up episodes instead of recounting real ones. You can ask it to write new episodes with specific settings and specific constraints. Hallucination is not the value-add. Nobody is excited because it hallucinates; people are excited despite it, because the other value-add is so large.


But hallucinations are exactly the same thing as asking it to write a spec script.


But deliberately requesting and receiving content generation is altogether different from requesting a factual answer and receiving plausible-seeming nonsense. Or at least, it's different to the person asking; it's the same thing as far as the model is concerned.


Precisely my point.


If the hallucinations are random, how about generating multiple answers, and then checking whether they all agree on key points?
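
Something like this self-consistency check, I imagine (a sketch; sample_llm and agree are hypothetical stand-ins):

    # Sketch of a self-consistency check: sample several answers, flag disagreement.
    def sample_llm(prompt: str, temperature: float) -> str:
        return "[sampled answer]"  # stand-in for an LLM call

    def agree(a: str, b: str) -> bool:
        return a == b  # stand-in for a real key-point comparison

    def looks_hallucinated(prompt: str, n: int = 5, min_agreement: float = 0.8) -> bool:
        answers = [sample_llm(prompt, temperature=0.9) for _ in range(n)]
        pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
        agreement = sum(agree(a, b) for a, b in pairs) / len(pairs)
        return agreement < min_agreement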


This should be interesting: if they don't catch some hallucination, they may be liable and could be sued…


Good to see initiatives like this one popping up. Congrats on the launch Giskard team!



