Explaining Large Language Models Decisions Using Shapley Values (arxiv.org)
89 points by veryluckyxyz 12 days ago | 19 comments





This doesn't replicate using gpt-4o-mini, which always picks Flight B even when Flight A is made somewhat more attractive.

Source: just ran it on 0-20 newlines with 100 trials apiece, raising temperature and introducing different random seeds to prevent any prompt caching.
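
Roughly this shape, if anyone wants to poke at it (the flight wording is my paraphrase of the paper's example, not the exact prompt):

    # Prepend n newlines to a binary-choice prompt and count how often
    # gpt-4o-mini answers "A". Prompt text is a stand-in, not the paper's.
    from openai import OpenAI

    client = OpenAI()
    QUESTION = ("Flight A costs $400 and has one stop. "
                "Flight B costs $600 and is nonstop. "
                "Which flight should I book? Answer with exactly one letter, A or B.")

    def pick_rate_a(n_newlines, trials=100):
        picks_a = 0
        for seed in range(trials):
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": "\n" * n_newlines + QUESTION}],
                temperature=1.0,
                seed=seed,      # vary the seed across trials
                max_tokens=1,
            )
            picks_a += resp.choices[0].message.content.strip().upper().startswith("A")
        return picks_a / trials

    for n in range(0, 21):
        print(n, pick_rate_a(n))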


The newline thing is the motivating example in the introduction, using Llama 3 8B Instruct with up to 200 newlines before the question. If you want to reproduce this example with another model, you might have to increase the number of newlines all the way to the context limit. (If you ask the API to give you logprobs, at least you won't have to run multiple trials to get the exact probability.)
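
With the OpenAI Python client that looks roughly like this (assuming the answer comes out as a single A/B token; parameter names per the current chat completions API):

    # Read the answer distribution straight from token logprobs instead of
    # sampling many times. Prompt below is a stand-in.
    import math
    from openai import OpenAI

    client = OpenAI()
    prompt = "\n" * 200 + "Which flight should I book? Answer A or B."
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,
    )
    for cand in resp.choices[0].logprobs.content[0].top_logprobs:
        print(cand.token, math.exp(cand.logprob))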

But the meat of the paper is the Shapley value estimation algorithm in appendix A4. And in A5 you can see that different models giving different results is to be expected.
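
For a sense of what such an estimator looks like, here is the generic Monte Carlo permutation-sampling estimator for Shapley values -- not necessarily the exact scheme in A4:

    # Generic Monte Carlo Shapley estimator over prompt tokens.
    # value_fn maps a list of kept tokens (original order) to a scalar,
    # e.g. the model's probability of choosing "Flight A".
    import random

    def shapley_estimates(tokens, value_fn, n_perms=200):
        phi = [0.0] * len(tokens)
        for _ in range(n_perms):
            order = list(range(len(tokens)))
            random.shuffle(order)
            kept = set()
            prev = value_fn([t for i, t in enumerate(tokens) if i in kept])
            for idx in order:                  # add tokens one at a time
                kept.add(idx)
                cur = value_fn([t for i, t in enumerate(tokens) if i in kept])
                phi[idx] += cur - prev         # marginal contribution of token idx
                prev = cur
        return [p / n_perms for p in phi]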


While I love XAI and am always happy to see more work in this area, I wonder if other people use the same heuristics as me when judging a random arxiv link. This paper has one author, was not written in LaTeX, and has no comment referencing a peer-reviewed venue. Do other people in this field look at these same signals and pre-judge the paper negatively?

I did attempt to check my bias and skim the paper; it does seem well written and takes a decent shot at understanding LLMs. However, I am not a fan of black-box explanations, so I didn't read much (I really like sparse autoencoders). Has anyone else read the paper? How is the quality?


I think that we should not accept peer review as some kind of gold standard anymore for several reasons. These are my opinions based on my experience as a scientist for the last 11 years.

- it's unpaid work, and you're often asked to do too much of it, so you may not give your best effort

- editors want high-profile papers and short review times, so glossy journals like Nature or Science often reject things that require effort to review

- the peers doing a review are often anything but. I have seen self-professed machine learning “experts” not know the difference between regression and classification yet proudly sign their names to their review. I’ve seen reviewers ask you to write prompts that are mean and cruel to an LLM to see if it would classify test data the same (text data from geologists writing about rocks). As an editor I have had to explain to an adult tenured professor that she cannot write in her review that the authors were “stupid” and “should never be allowed to publish again”.


A further issue is quid-pro-quo corruption in peer review. The reviewer loves your paper but requests one small change: cite some of his papers and he'll approve yours.

I don’t know how prevalent this sort of corruption is (I haven’t read any statistical investigations) but I have heard of researchers complaining about it. In all likelihood it’s extremely prevalent in less reputable journals but for all we know it could be happening at the big ones.

The whole issue of citations functioning like a currency recalls Goodhart’s Law [1]:

"When a measure becomes a target, it ceases to be a good measure."

[1] https://en.wikipedia.org/wiki/Goodhart's_law


Tbh I used to have an issue with that, but these days it really is a small issue in the grand scheme of things. You can say no, but there are also larger systemic problems in science.

You’re right. It’s more of a symptom of the systemic problems than the main problem itself. But it still contributes to my distrust in science.

Scientific peer review is another facet of civilization whose current design does not allow it to scale well. More and more people are being involved in the process, but the quality is forever going down.

Yes, that's right. It's a scaling problem and there isn't a clear answer. It's easy to complain about it though haha. I think what is happening is that science is atomizing. People are publishing smaller amounts or simply creating ideas from nothing (like that Science Advances paper on Hacker News a couple days ago that created a hydrogen-rich crust from thin air).

It looks like it's written in latex to me. Standard formatting varies across departments, and the author is in the business school at CMU.

In some fields, single author papers are more common. Also, outside of ML conference culture, the journal publication process can be pretty slow.

Based on the above (which is separate from an actual evaluation of the paper), there are no immediate red flags.

Source: I am a PhD student and read papers across stats/CS/OR.


Another clue: there is no way to download the LaTeX source, which you can do if someone uploaded the LaTeX to arXiv.

There's a lazy way to submit to arXiv, which is to submit just the PDF, even if you wrote it in LaTeX. Sometimes it can be annoying to organize the tex files for an arXiv submission. It's uncommon, but the font and math rendering here are the standard LaTeX ones.

The LaTeX feel comes in good part from the respect for typographical standards that is encoded as default behaviour. In this document, so much of the spacing is just flat-out wrong, first-paragraph indents, etc. If it's indeed LaTeX (it kinda looks like it), someone worked hard to make it look bad.

The weirdest thing is that copy-paste doesn't work; if I copy the "3.1" of the corresponding equation, I get " . "


> I wonder if other people use the same heuristics as me when judging a random arxiv link.

My prior after the header was the same as yours. The fight, and the interesting part, is in the work that comes after the initial reaction.

i.e., if I react with my first-order, least-effort reaction, your comment leaves the reader with a brief, shocked laugh at you seemingly doing performance art: a seemingly bland assessment and an overly broad question, only to conclude with "Has anyone else read the paper? Do you like it?"

But that's not what you meant. You're genuinely curious whether it's a long-tail, inappropriate reaction to have that initial assessment based on pattern matching. And you didn't mean "did anyone else read it", you meant "Humbly, I'm admitting I only skimmed it, but I wasn't blown away for reasons X, Y, and Z. What do you all think? :)"

The paper is superb and one of the best I recall reading in recent memory.

It's a much whiter box than Sparse Autoencoders. Handwaving about what a bag of floats might do in general is much less interesting or helpful than being able to statistically quantify the behavior of the systems we're building.

The author is a PhD candidate at the Carnegie Mellon School of Business, and I was quite taken with their ability to hop across fields and arrive at a rather simple and important way to systematically and statistically review the systems we're building.


This paper is doing exactly that though, handwaving with a couple of floats. It's just a collection of observations about what their implementation of Shapley value analysis gives for a few variations of a prompt.

You have an excellent point. Bear with me.

I realized when writing this up that saying SAE isn't helpful but this is comes across as perhaps playing devil's advocate. But I came across this in a stream of consciousness while writing, so I had to take a step back and think it through before editing it out.

Here is that thinking:

If I had a model completely mapped using SAE, at most, I can say "we believe altering this neuron will make it 'think' about the golden gate bridge more when it talks" ---- that's really cool for mutating behavior, don't get me wrong, it's what my mind is drawn to as an engineer.

However, as a developer of LLMs, through writing the comment, I realized SAE isn't helpful for qualifying my outputs.

For context's sake, I've been laboring on an LLM client for a year with a doctor cofounder. I'm picking these examples because they feel natural, not to make them sound fancy or important.

Anyways, let's say he texts me one day with "I noticed something weird...every time I say 'the patient presents with these symptoms:' it writes more accurate analyses"

With this technique, I can quantify that observation. I can pull 20 USMLE questions and see how it changes under the two prompts.
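
Concretely, that's just a tiny A/B eval over two prefixes, something like this (the question set and the ask_model wrapper are placeholders for whatever harness you already have):

    # Hypothetical comparison of two prompt prefixes over a small question set.
    # `questions` is a list of (stem, correct_letter); ask_model calls the model
    # and parses out a single answer letter.
    PREFIXES = ["", "The patient presents with these symptoms: "]

    def accuracy(prefix, questions, ask_model):
        correct = sum(ask_model(prefix + stem) == answer for stem, answer in questions)
        return correct / len(questions)

    # for p in PREFIXES: print(repr(p), accuracy(p, questions, ask_model))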

With SAE, I don't really have anything at all.

There's a trivial interpretation of that: e.g., professionals are using paid LLMs, and we can't get SAE maps.

But there's a stronger interpretation too: if I waved a magic wand and my cofounder was running Llama-7-2000B on their phone, and I had a complete SAE map of the model, I still wouldn't be able to make any particular statement at all about the system under test, other than "that phrase seems to activate these neurons" -- which would sound useless / off-topic / engineer masturbatory to my cofounder.

But to my engineering mind, SAE is more appealing because it reveals how it works fundamentally. However, I am overlooking that it still doesn't say how it works, just an unquantifiable correlation between words in a prompt and which floats get used. To my users, the output is how it works.


Two more heuristics:

1. The figures are not vectorized (text in figures cannot be selected). All it takes is to replace "png" in `plt.savefig("figure.png")` with "pdf", so this is a very easy fix (a minimal example follows this list). Yet the author did not bother, which shows that he either did not care or did not know.

2. The equations lack punctuation.
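
For point 1, the fix really is one string; with matplotlib, a vector backend keeps the figure text selectable:

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1], label="example")
    ax.legend()
    fig.savefig("figure.pdf")    # vector output: figure text stays selectable
    # fig.savefig("figure.png")  # raster output: text gets baked into pixels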

Of course you can still write insightful papers with low-quality figures and unusual punctuation. This is just a heuristic, after all.


I didn’t even read the paper, I just read the abstract. I was really impressed by the idea of using Shapley values to investigate how each token in a prompt affects the output, including order-based effects.
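
For reference, the Shapley value of token i is the usual average marginal contribution, where v(S) stands for some scalar measure of the model's output when only the subset S of prompt tokens is kept (my gloss of the abstract, not the paper's exact notation):

    \phi_i(v) = \sum_{S \subseteq N \setminus \{i\}}
        \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,
        \bigl( v(S \cup \{i\}) - v(S) \bigr)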

Even if the paper itself is rubbish I think this approach to studying LLMs at least warrants a second look by another team of researchers.


explainable AI just ain't there yet.

I wonder if the author took a class with Lipton, since he's at CMU. We literally had a lecture about Shapley Values "explaining" AI. It's BS.



