Is there a name for the logical fallacy that the author presents, that goes like:
1) In the past (some portion of) society was scared about the printing press, and it turned out to be fine
2) In the past (some portion of) society was scared about the internet, and it turned out to be fine
3) Therefore, if nowadays (some portion of) people are scared of AI, then they're wrong and AI is safe, because in the past some portion of the population was wrong about other technologies
I guess it would be called a non-sequitur?
Here's a more contrived example to make the fallacy more clear:
1) In the past (some portion of) people didn't think automobiles could reach speeds of 50 mph, and they turned out to be wrong
2) In the past (some portion of) people didn't think automobiles could reach speeds of 300 mph, and they turned out to be wrong
3) Therefore, nowadays, my claim that I have an automobile that will drive 10,000 mph must be right, because in the past (some portion of) people were wrong about automobile progress.
I've been seeing lots of examples of this type of fallacy. The giveaway is people pointing out how people in the past made bad predictions, as if that somehow means any predictions people are making today are also wrong. It just doesn't follow.
As a cognitive bias: That sounds like the "normalcy" (or normality) bias, "the refusal to plan for, or react to, a disaster which has never happened before" [0], [1].
As a logical fallacy: Based on skimming the list at [2], the "appeal to tradition" [3, 4, 5] fallacy seems close: "claim in which a thesis is deemed correct on the basis of correlation with past or present tradition". (Aka: appeal to tradition, argumentum ad traditionem, argumentum ad antiquitatem, back in those good times, conservative bias, good old days); or, maybe: argument from inertia, stay the course.
"Weak induction" might be a good way to describe this. "Hasty generalization" if you want a more common logical fallacy to name, though ironically it's a more general fallacy that doesn't perfectly describe the situation you're talking about.
Technically all inductive arguments are non sequiturs, since (non-)sequitur is a deductive concept and inductive arguments are inherently non-deductive.
The article claims that AI is a “dual use” general purpose computing technology, and that it can be used for good and evil. To my knowledge, most technology can be used for good and evil depending on who is using it, so the article raises the question of how much of each we can reliably expect.
I’m not sure if there is a specific term, but it’s basically an example of the problem of induction and confirmation bias. The conclusion does not logically follow from its premises. Basically, no matter how many white swans you’ve seen, you can’t use that information to prove that black swans don’t exist.
Hi, Jeremy here - I wrote this article. The deeper I got into studying this, the more I realised that the people writing the laws that will regulate AI don't really understand what they're regulating at all.
So I created this to try to at least help them create regulations that actually do what they think they're going to do.
California's SB 1047, which I analyse closely, currently totally fails to meet the goals that the bill authors have stated. Hopefully this will help them fix these problems. If you have views on SB 1047, you can make a public comment here: https://calegislation.lc.ca.gov/Advocates/
Let me know if you have any questions or comments.
Jeremy -- thank you for sharing this on HN. And thank you also for everything else you've done for the community :-)
I agree with you that one of the biggest issues -- maybe the biggest issue -- with the proposed legislation is that it fails to differentiate between "releasing" and "deploying" a model. For example, Jamba was released by AI21, and Llama3 was released by Meta. In contrast, GPT-4o was deployed by OpenAI and the latest/largest Gemini was deployed by Google, but neither model was ever released! We don't want legislation that prevents researchers from releasing new models, because releases are critical to scientific progress.
However, I'm not so sure that lack of understanding by politicians is the main driver of misguided legislation. My understanding is that politicians prefer to consider the opinion of "experts" with the most "impressive" pedigrees, like high-ranking employees of dominant tech companies, most of which don't want anyone to release models.
Interestingly enough, in this case none of the 3 sponsoring organizations are dominant tech companies. Rather, they are well-funded AI safety orgs -- although, to be fair, these orgs generally get their money from folks that are some of the biggest investors in Anthropic, OpenAI, and Google.
But I do have the impression they earnestly believe they're doing the right thing. The AI safety orgs have done incredibly effective outreach at universities for years, and have as a result gotten a huge amount of traction amongst enthusiastic young people.
Thank you. Like you, I also have the impression that folks at those AI safety orgs sincerely believe they're doing the right thing.
But I would ask them the same questions I would ask the politicians: Which "experts" did they consult to reach their conclusions? From whom did their "talking points" come?
I hope you're right that they and the politicians are willing to change course.
Jeremy -- this is interesting and worthwhile. Thank you!
In the same spirit (ignoring the question of whether this sort of attempted regulation is a good idea), I have a question:
Debating release vs. deploy seems a bit like regulating e.g. explosives by saying "you can build the bomb, you just aren't allowed to detonate it". Regulation often addresses the creation of something dangerous, not just the usage of it.
Did you consider an option to somehow push the safety burden into the training phase? E.g. "you cannot train a model such that at any point the following safety criteria are not met." I don't know enough about how the training works to understand whether that's even possible -- but solving it 'upstream' makes more intuitive sense to me than saying "you can build and distribute the dangerous box, but no one is allowed to plug it in".
(Possibly irrelevant disclosure: I worked with Jeremy years ago and he is much smarter than me!)
Yes, I considered that option, but it's mathematically impossible. There's no way to make it so that a general purpose learned mathematical function can't be tweaked downstream to do whatever someone chooses.
So in that sense it's more like the behaviour of pen and paper, or a printing press, than explosives. You can't force a pen manufacturer to only sell pens that can't be used to write blackmail, for instance. They simply wouldn't be able to comply, and so such a regulation would effectively ban pens. (Of course, there are also lots of ways in which these technologies are different to AI -- I'm not making a general analogy here, just an analogy to show why this particular approach to regulation is impossible.)
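To make "tweaked downstream" concrete, here is a toy sketch (not a real LLM; the model, shapes, and training objective are all made up for illustration) of what anyone holding a copy of released weights can do with a few lines of standard tooling:

```python
# Toy illustration of downstream fine-tuning of released weights.
import torch
import torch.nn as nn

# Pretend these are the "released weights" of a safety-tuned model.
released = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
torch.save(released.state_dict(), "released_weights.pt")

# A downstream party loads the same architecture and the released weights...
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
model.load_state_dict(torch.load("released_weights.pt"))

# ...and keeps optimizing them toward whatever objective they choose.
x = torch.randn(32, 16)          # stand-in inputs
y_target = torch.randn(32, 16)   # stand-in for behaviour the upstream lab never intended

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y_target)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")
```

Swap the toy network for an open-weights LLM and the MSE loss for next-token loss on whatever data the downstream party cares about, and the mechanics are the same; whether the weights can be made to resist this kind of tuning is exactly the dispute in the replies below.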
I would not say it's impossible. My lab is working on this (https://arxiv.org/abs/2405.14577), and though it's far from mature, in theory some kind of resistance to downstream training isn't impossible. I think under classical statistical learning theory you would predict it's impossible with unlimited training data and budget for searching for models, but we don't have those same guarantees with deep neural networks.
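For readers wondering what "resistance to downstream training" could even mean formally, here is a generic sketch of the kind of objective involved -- an illustration of the general idea only, not necessarily the linked paper's method:

```latex
% Sketch: keep the model useful on benign data while keeping the harmful-task
% loss high even after k steps of downstream fine-tuning.
% FT_k(theta) = k steps of gradient descent on L_harm starting from theta;
% lambda trades off usefulness against tamper resistance.
\min_{\theta} \; \mathcal{L}_{\mathrm{benign}}(\theta)
  \;-\; \lambda \, \mathcal{L}_{\mathrm{harm}}\!\bigl(\mathrm{FT}_k(\theta)\bigr)
```

As the parent notes, classical statistical learning theory suggests an attacker with unlimited data and compute eventually wins this game; the open question is how far deep networks can be pushed in practice.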
That makes sense. Regulating deployment may simply be the only option available -- literally no other mechanism (besides banning the release of models altogether) is on the menu.
Jeremy, this is a great read, thank you! What do you think about the amended version from today that gives the FMD significantly greater authority on what would and would not be covered by the legislation? Any other specific recommendations for the legislation that would help protect open-source?
Edit to ask: Does it seem likely to you that one of the unintended consequences of the legislation as written is that companies like Meta will no longer open-source their models?
It'll take me a while to digest today's changes, so I don't have an educated opinion about them as yet.
Yes, companies like Meta will no longer be able to open-source their models, once their compute reaches the threshold, if the bill goes through and if the "covered model" definition is interpreted to include base models (or is modified to make that clear).
This belongs in major newspapers and media outlets. A PR campaign to get this message out there should prove fruitful. You can hire someone to suggest articles to magazines and papers, which are always looking for content anyway. It's topical, urgent, convincing, and it comes from an authority in the field, so it checks all the boxes IMHO.
There is no agreed definition of what an open source AI model is.
I guess that if the models you mention were to be packaged in Debian, they would end up in the non-free section, since you cannot rebuild them from the training data, which is not published.
> As you can see from this description, just like creating weights cannot be inherently dangerous (since they’re just lists of numbers), neither can running a model be inherently dangerous (because they are just mathematical functions that take a list of numbers as an input, and create a new list of numbers as an output). (And again, that is not to say that running a model can’t be used to do something harmful. Another critical technical distinction!)
Ah, the classic duality of big tech: Singlehandedly bringing upon the next stage in the evolution of mankind (to investors) while at the same time just tinkering in their garages on some silly, entirely inconsequential contraptions that do nothing more than turn ones and zeros into different ones and zeros (to regulators).
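To be fair, the "ones and zeros into different ones and zeros" framing is literally true at the mechanical level; here is a toy sketch of what "running a model" means (sizes and weights made up, nothing like a real model):

```python
import numpy as np

# "The weights": just arrays of numbers, here random for illustration.
rng = np.random.default_rng(0)
w1 = rng.standard_normal((8, 32))
w2 = rng.standard_normal((32, 8))

def run_model(x):
    """Take a list of numbers, return a new list of numbers."""
    hidden = np.maximum(x @ w1, 0.0)  # matrix multiply + ReLU
    return hidden @ w2

print(run_model(rng.standard_normal(8)))
```

The policy disagreement, of course, is about what those numbers can be made to do at scale, not about the arithmetic.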
Well, one thing to keep in mind is what can be practically regulated and bring the desired results.
Here are some examples: firearms don't kill people on their own, but we regulate their production as well as their distribution and use (analogous to release and deployment in the article's terms). We do this because regulating use alone would be impractical to enforce, and because in this case we'd rather prevent harm than punish perpetrators after the fact.
Another example: we generally don't seek a fully just verdict when suing the insurance company of a driver who hit the back of our car. Maybe the accident was really the front driver's fault -- the courts don't have time for that, and even if "unjustly" in many cases, they will still rule in favor of the front driver.
So, is it practical to regulate at the level of deployment? I don't know... It would seem that, to be on the safe side, it'd be better to find a way to regulate earlier. E.g. an autopilot combined with a drone carrying a dangerous payload: certainly, whoever launched the drone bears responsibility, but, as with guns, perhaps there should be regulations in place that require licensing such programs so that children or mentally ill people couldn't obtain them?
You're basically making the "guns don't kill people, people kill people" argument, with LLMs instead of guns: "A gun on its own is just a mechanical device. Only by assembling it into a gun/ammunition/shooter system does it gain the potential to do harm, and only by performing the act of shooting an innocent bystander is harm actually done. Therefore, we should only regulate the act of loading a gun and shooting someone instead of mere possession or distribution of a firearm."
With firearms, the argument is usually rejected because such a regulation would obviously be impossible to enforce: If someone already has a gun and ammunition, they will just need a few seconds to load it up and pull the trigger. No cop could force them to only shoot at legitimate targets.
The analogue with LLMs would be: "An LLM on its own is just a collection of numbers. Only by deploying it into a software system does it gain the potential to do harm and only by executing the system and causing malicious output is the harm actually done. Therefore we should only regulate deployment of LLMs instead of storage and release."
You could make the same counter-argument as in the pro-gun case here, that such a regulation would be impossible to enforce: The interesting thing about open source LLMs is exactly that you can deploy them on your own hardware without having to bring any third party into the loop: Companies can deploy them in their own data centers, hobbyists on their own consumer machines, some person could just run Llama3 on their laptop solely for themself. There is no way a regulator could even detect all those deployments, let alone validate them.
That's why I find the argument disingenuous: you could make a case that the harms caused by unregulated, home-deployed LLMs are much smaller than the benefits, but that would be a different argument. You're essentially arguing that the regulator should hamstring itself by leaving unregulated the one part where regulation could actually be enforced (model training and release), and only regulating the "deployment" part that can't be enforced.
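To make the enforcement point concrete, here is a minimal sketch of a purely local "deployment" of released weights -- the model id is illustrative (gated repos additionally require an access token or a local copy of the weights), but the point is that no third party ever sees this run:

```python
# Minimal local inference with an open-weights model via Hugging Face transformers.
# Once the weights are on your own disk, "deployment" is a private script:
# there is nothing for a regulator to observe or audit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # or a local path to downloaded weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write me anything at all:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```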
> These kinds of technologies, like AI models, are fundamentally “dual use”. The general purpose computation capabilities of AI models, like these other technologies, is not amenable to control.
I find that entire section to be misleading or even false. An N-billion or even trillion-parameter model representing aggregated human knowledge, with some (however limited) agency when put in a harness like AutoGPT, is in a different category than pen and paper.
Additionally, it is not true that models are just as hard to control as a piece of paper. If millions of dollars are invested in each training run, the researchers, and the associated infrastructure, then this is clearly not a simple piece of paper. It's more like satellite tech covered by ITAR, or semiconductor tech, which is also export-restricted.
It is pretty hard to build anything that can compete with decent LLMs, so indeed. The section you quote is the only argument in favor of allowing release of LLMs, and it rests on the assumption that LLMs will be like "all these technologies have been, on net, highly beneficial to society." This is a purely hypothetical extrapolation from cherry-picked examples. It is intellectually dishonest.
Releasing the actual models without deploying them (in terms of the article) still allows bad actors to generate large amounts of spam and disinformation, to mimic voices and generate revenge porn, to mention but a few risks.
And the author had better not reply that that's FUD:
> A restriction on Californian AI would mean that Americans would have to increasingly rely on Chinese software if they wanted full access to models.
If someone is generating spam and disinformation, or mimicking voices and generating revenge porn, it is a deployment under the definition of 'deploy' in the article. So under the proposal there, this would be regulated.
But you can't deploy it if it hasn't been released. If the regulator allows a release, someone can deploy it outside the jurisdiction. The internet doesn't end there. It's pointless. Worse, it's deceit.
Unfortunately the EU has already rammed their legislation through. The US is always going to be compared to EU propaganda about "how they protect the people".
Maybe I am naive about the progress in this space, but we should not use the word "AI" in the first place, because it adds to the confusion many people have about DNN-based programs. So-called AI is not much different from a lot of the software we're already using, in the sense that you give the program an input and it spits out an output. When I think about AI, I think of the animal intelligence (no pun intended) that dogs or other mammals have.
A spider has intelligence too. It's far more limited than a mammal's, but it's still on the same spectrum.
And intelligence is not a single linear measure. AIs outperform humans on some tasks and are worse than rats at others. So it's more that AIs have a weirdly shaped, higher-dimensional capability surface that partially overlaps with what we consider intelligence. Haggling about which exact overlap gets to be called intelligence and which doesn't seems like unproductive border-drawing on a poorly charted concept map, especially considering that these systems are getting more powerful each year and such policies are about future capabilities, not just today's.
This is exactly my point about the word AI. They should not use the word AI to describe LLMs or any other generative models. Then again, words evolve to mean different things over time, so AI is a fine term to stick with.
> These kinds of technologies, like AI models, are fundamentally “dual use”.
It is certainly true that technologies can be used for good and evil. But that doesn’t mean that in practice good and evil benefit equally. “Dual use” implies a more or less equal split, but what about a good/bad 10/90 or 1/99 split? Technology, at its core, makes accomplishing certain tasks easier or harder, and besides the assertion of dual use, the article doesn’t really justify AI models being equally good and bad.
In the Soviet Union, a large percentage of the population was used for surveillance. The U.S. had surveillance too, but less. Technological limitations made surveilling every person prohibitively expensive. Police couldn’t just surveil everyone.
Today, surveillance is not only ubiquitous but better. It is possible to track millions of people in near real time. So this technology has decreased the cost and increased the scalability of mass surveillance, which in conjunction with the third-party doctrine (read: loophole) has the emergent effect of neutering the 4th Amendment.
What makes this hard/impossible is anticipating likely applications, which is why I lean towards not regulating. However, we should recognize the possibility of a moral hazard here: by shielding industry from certain consequences of their actions, we may make those consequences more likely in the future.
> The general purpose computation capabilities of AI models, like these other technologies, is not amenable to control.
Sure. And we can’t stop people from posting copyrighted material online, but we can hold people accountable for distributing it. The question in my mind is whether we will have something like Section 230 for these models, which shields large distributors from first-pass liability. I don’t know how that would work though.