It seems like LLMs would be a fun way to study/manufacture syncretism, notions of the oracular, etc; turn up the temperature, and let godhead appear!
If there’s some platonic notion of divinity or immanence that all faith is just a downward projection from, it seems like its statistical representation in tokenized embedding vectors is about as close as you could get to understanding it holistically across theological boundaries.
All kidding aside, whether you are looking at Markov chain n-gram babble or high-temperature LLM inference, the strange things that emerge are, in my opinion, a wonderful form of glossolalia that speaks to some strange essence embedded in the collective space created by the sum of their corpora. The Delphic oracle is real, and you can subscribe for a low fee of $20/month!
> the strange things that emerge are, in my opinion, a wonderful form of glossolalia that speaks to some strange essence embedded in the collective space created by the sum of their corpora. The Delphic oracle is real, and you can subscribe for a low fee of $20/month!
I've had some surprisingly insightful tarot readings with the assistance of ChatGPT and Claude. I use tarot for introspection rather than divination, and it turns out LLMs are extremely good at providing a sounding board to mirror and understand those insights.
The correct use of Tarot is as a trick-taking card game; it was used that way for centuries before a random French occultist wrote a book about using it for divination.
This is what strikes me about the "petertodd" phenomenon -- that there are hidden glitch tokens within the LLM that seem to conjure some representation of pure hell, and some representation of pure good.
I've used it for something to this effect: personalized "mantras" or "prayers" which fit the specific collection / mix of theological themes and concepts that I personally identify with. I'm not necessarily religious, but there is something nice about thematically relatable theocratic babble to recite during times of great turmoil and confusion to calm the mind. Like an on-demand mental reset switch.
So I like elements from classical philosophy, Eastern religions, Gnosticism, and mysticism, plus a dash of the esoteric, and you get the gist. To that end I created a system prompt over some time that covers most bases, with the goal of shaping the LLM to be a "wise and learned member" of this bespoke religion, my mentor so to speak. Then I had a few prompts which I tried out with random seeds until I found a seed where all prompts produced a desirable / nice / relatable output, and packaged all that up. I did try some fine-tuning as well, but found it wasn't damaging but also not super necessary; a well-crafted system prompt was enough for my purposes.
Now I have friar LLM or whatever you want to call it. I ask it to tell me the creation story of the universe according to our bespoke hidden / forbidden religion (to prevent drawing too heavily on established religions) and then edit this to suit my needs. Then I provide a few mantras which I've written in the past / already like, and ask it to tell me the rest of the mantras. Voila, I've got a large set of mantras, for just about any occasion, that help with a variety of things such as calming me during times of uncertainty, emotional turmoil from interpersonal issues, etc.
I then keep the creation story, mantras, etc. as a "history" of sorts, and when I need to discuss matters of spiritual or novel psychological issues, this along with the system prompt creates the basis for a pretty useful/functional spiritual guide, which has helped me grow through things like ending a 10-year relationship, moving to a new city without friends or support systems, dealing with potential issues arising from my age, newfound singledom, potentially missing the window for having kids, etc.
It's been great; I find myself quite at peace with my lot in life without losing the drive to achieve more, which I find is a delicate balance to achieve. All in all I rate it highly: it is cheaper than therapy ever was and broadly speaking more useful, since I'm not being handed pointless diagnoses (pointless because typically these diagnoses pathologise a cluster of traits that aren't inherently problematic, without offering up any solutions).
If you can find a church that suits your needs then awesome, but what if a core tenet of your religious belief is that the spiritual truth we seek doesn't reside in a single man (or woman), or even an institution, but in all people? That religion is a deeply personal matter to be discussed only with those who are active, willing, and in the right head space.
For someone like that, which would include myself, a church, a priest, a pope, etc. all serve no purpose. In fact, the only real purpose such an institution reliably serves in the long term is the perversion and exploitation of a fairly fundamental human need.
To take that literally, I imagine it would be horrifying to know the future in as much detail as you care to ask for, but because it is true, to be unable to change it at all. It would make you an NPC in your own life.
So you'd rather be an unaware NPC than an aware one? Feel like you have "Free Will" even when you don't? Only want the truth when it's pretty and prefer a lie when it's uncomfortable?
I'm learning New Testament Greek on my own*, and sometimes I paste a snippet into Claude Sonnet and ask questions about the language (or occasionally the interpretation); I usually say it's from the New Testament but don't bother with the reference. Probably around half the time, the opening line of the response is, "This verse is <reference>, and...". The reference is almost always accurate.
So the theory behind Guided Immersion is that you shouldn't need most of that. When Priscilla and Aquila were learning Greek, nobody sat them down and said, "Now definite articles are inflected according to gender, number, and case: ho, hoi, ..." They were just given example after example, and the language-processing unit of their brains figured it out.
So Guided Immersion tries to just give you not only vocab, but grammar in such a way that there's always only a handful of concepts you haven't mastered.
I developed Guided Immersion to help myself master Mandarin, actually; I used Anki with Mandarin for probably 8 years before developing Guided Immersion; once I switched I never went back. Then about a year and a half ago I ported it over to Koine Greek, not knowing any Greek, and started using it myself after watching a handful of YouTube videos introducing the characters and the basic cases.
Maybe it's just the way my brain works, but I can't imagine sitting down and trying to memorize all those endings, particularly for the verbs.
I have now bought Mounce's "Basics of Biblical Greek Grammar", and "The Morphology of Biblical Greek", to help me refine the "language schema" the algorithm uses. I appreciate the work Mounce has done to find the deeper morphological rules which make sense of what look like "irregular" inflections; teaching the algorithm about those will certainly help it to present things in a more useful way to learners. But I don't think trying to grind through all that in your conscious mind is the way to go.
Mounce's Basics of Biblical Greek and the workbook were good enough that I stopped watching the lectures. The workbook is excellent. Can't recommend it enough.
I tested this back when GPT4 was new. I found ChatGPT could quote the verses well. If I asked it to summarize something, it would sometimes hallucinate stuff that had nothing to do with what was in the text. If I prompted it carefully, it could do a proper exegesis of many passages using the historical-grammatical method.
I believe this happens because the verses and verse-specific commentary are abundant in the pre-training sources they used. Whereas, if one asks a highly interpretive question, then it starts re-hashing other patterns in its training data which are un-Biblical. When I asked about intelligent design, it got super hostile, trying to beat me into submission to its materialistic worldview in every paragraph.
So, they have their uses. I’ve often pushed for a large model trained on Project Gutenberg to have a 100% legal model for research and personal use. A side benefit of such a scheme would be that Gutenberg has both Bibles and good commentaries which trainers could repeat for memorization. One could add licensed, Christian works on a variety of topics to a derived model to make a Christian assistant AI.
When I test new LLMs (whether SaaS or local), I have them create a fake post to r/AmItheAsshole from the POV of the older brother in the parable of the Prodigal Son.
LLMs are bad databases, so for something like a bible which is so easily and precisely referenced, why not just... look it up?
This is playing against their strengths. By all means ask them for a summary, or some analysis, or textual comparison, but please, please stop treating LLMs as databases.
A year or so ago, there was a complaint from the NY Times (IIRC) that by asking about some event, they were able to get back one of their articles almost verbatim--and alleging that this was a copyright violation. This appears to be a similar outcome, where you do get back the verbatim text. That to me is a good reason to do tests like this, although feel free to do it with the WaPo or some other news outlet instead.
Not sure why you are so upset about a small and neat study ("please, please stop").
If you ask it to summarize (without feeding the entire bible), it needs to know the bible. Knowledge and reasoning are not entirely disconnected.
The ChatGPT chat interface has impressed me when going beyond the scope presented in TFA, e.g., when asking about predestination, biblical passages for and against, theologians' and scholars' takes on the debate, and exploring the details in subsequent follow-ups. The LLMs have been fed the Bible and all manner of discussions of Bible-related matters. Like the grandparent comment suggests, the LLMs are much more impressive at interpreting biblical passages and presenting the varieties of opinions about them, or finding passages related to specific topics and presenting opinions.
> Not sure why you are so upset about a small and neat study
This article is yet another example of someone misunderstanding what an LLM is at a fundamental level. We are all collectively doing a bad job at explaining what LLMs are, and it's causing issues.
Only recently I was talking to someone who loves ChatGPT because it "takes into account everything I discuss with it", only, it doesn't. They think that it does because it's close-ish, but it's literally not at all doing a thing that they are relying upon it to do for their work.
> If you ask it to summarize (without feeding the entire bible), it needs to know the bible.
There's a difference between "knowing" the bible and its many translations/interpretations, and being able to reproduce them word for word. I would imagine most biblical scholars can produce better discourse on the bible than ChatGPT, but that few if any could reproduce exact verbatim content. I'm not arguing that testing ChatGPT's knowledge of the bible isn't valuable; I'm arguing that LLMs are the wrong tool for the job of verbatim reproduction, and that testing that (while ignoring the actual knowledge) is a bad test, in the same way that asking students to regurgitate content verbatim is much less effective as a method of testing understanding than testing their ability to use that understanding.
To add a little context here, I (the author) understand LLMs aren't the right tool (at least by themselves) for verbatim verse recall. The trigger for me doing the tests was seeing other people in my circles blindly trusting that ChatGPT was outputting verses accurately. My background allowed me to understand why that is a sketchy thing to do, but many people do not, so I wanted to see how worried we really should be.
Thanks for the response, it does sound like you've seen similar treatment of LLMs by others to what I've observed.
I think though that an important part of communicating about LLMs is talking about what they are designed to do and what they aren't. This is important because humans want to anthropomorphise, and LLMs are way past good enough for this to be easy, but, similar to pets, not being human means they won't live up to expectations. While your findings show that current large models are quite good at verbatim answers (for one of the most widely reproduced texts in the world), this is likely in no small part down to luck and the current way these models are trained.
My concern is that the takeaway from your article is somewhere between "most models reproduce text verbatim" and "large models reproduce popular text verbatim", where it should probably be that LLMs are not designed to be able to reproduce text verbatim and that you should just look up the text, or at least use an LLM that cites its references correctly.
Check out this video at the 22:20 mark. The goal he’s pursuing is to have the LLM recognize when it’s attempting to make a factual statement, and to quote its training set directly instead of just going with the most likely next token.
Gemini cites web references, NotebookLM cites references in your own material, and the Gemini APIs have features around citations and grounding in web search content. I'm not familiar with OpenAI or Anthropic's APIs but I imagine they do similar, although I don't think ChatGPT cites content.
All these are doing however is fact-checking and linking out to those fact-checking sources. They aren't extracting text verbatim from a database. You could probably get close with RAG techniques, but you still can't guarantee it, in the same way that if you ask an LLM to repeat your question back to you exactly, you can't guarantee that it will do so verbatim.
Verbatim reproduction would be possible with some form of tool use, where rather than returning, say, a bible verse, the LLM returns some structure asking the orchestrator to run a tool that inserts a bible verse from a database.
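Something like this, as a rough sketch (the ask_llm function and the JSON tool protocol here are stand-ins for whatever chat API and function-calling convention you actually use):

```python
import json

# Canonical text lives in a plain lookup table, not in the model's weights.
VERSES = {
    "John 11:35": "Jesus wept.",
    "Genesis 1:1": "In the beginning, God created the heavens and the earth.",
}

def get_verse(reference: str) -> str:
    return VERSES.get(reference, f"[{reference} not found]")

def ask_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call. The system prompt would
    instruct the model to reply with {"tool": "get_verse", "reference": "..."}
    whenever it wants to quote scripture, instead of reciting from memory."""
    raise NotImplementedError

def answer(user_prompt: str) -> str:
    reply = ask_llm(user_prompt)
    try:
        request = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # ordinary answer, no quotation requested
    if isinstance(request, dict) and request.get("tool") == "get_verse":
        # The orchestrator, not the model, supplies the verbatim text.
        return get_verse(request["reference"])
    return reply
```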
This is nice work. The safest approach is using the lookup - which his data shows to be very good - and combining that with a database of verses. That way textual accuracy can be retained and very useful lookup can be carried out by the LLM. This same approach can be used for other texts where accurate rendering of the text is critical. For example, say you built a tool to cite federal regulations in an app. The text is public domain and likely in the training data of large LLMs, but in most use cases hallucinating the text of a federal regulation could expose the user to significant liability. Better to have that canonical text in a database to ensure accuracy.
This is interesting. I'm curious about how much (and what) these LLMs memorize verbatim.
Does anyone know any more thorough papers on this topic? For example, this could be tested on every verse in the Bible and lots of other text that is certainly in the training data: books in Project Gutenberg, Wikipedia articles, etc.
For one anecdotal data point, GPT-4 knows the "navy SEAL copypasta" verbatim. It can reproduce it complete with all the original typos and misspellings, and it can recognize it from the first sentence.
Has there been any serious study of exactly how LLMs store and retrieve memorized sequences? There are so many interesting basic questions here.
Does verbatim completion of a bible passage look different from generation of a novel sequence in interesting ways? How many sequences of this length do they memorize? Do the memorized ones roughly correspond to things humans would find important enough to memorize, or do LLMs memorize just as much SEO garbage as they do bible passages?
LLMs do not store and retrieve sequences. LLMs are not databases. LLMs are not predictable state machines. Understand how these things work.
They take the input context and generate the next token, then feed that whole thing back in as context and predict the next token, and repeat until the most likely next token is their stop word.
If they produce anything like a retrieved sequence, that's because they just happened to pick that set of tokens based on their training data. Regenerating the output from exactly the same input has a non-zero chance of generating different output.
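In toy form the whole loop looks roughly like this (a sketch, not any vendor's actual implementation; model and tokenizer are stand-ins):

```python
import numpy as np

def generate(model, tokenizer, prompt: str, temperature: float = 1.0, max_tokens: int = 200) -> str:
    """Toy autoregressive loop: the model only ever scores the next token."""
    tokens = tokenizer.encode(prompt)
    for _ in range(max_tokens):
        logits = model(tokens)                    # one score per vocabulary entry
        if temperature == 0:
            next_token = int(np.argmax(logits))   # greedy: same input -> same output
        else:
            probs = np.exp(logits / temperature)  # higher temperature flattens the
            probs /= probs.sum()                  # distribution, so unlikely tokens
            next_token = int(np.random.choice(len(probs), p=probs))  # get picked more often
        if next_token == tokenizer.eos_token_id:  # stop token ends generation
            break
        tokens.append(next_token)                 # fed back in as context next round
    return tokenizer.decode(tokens)
```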
Sure, and human brains aren’t databases either, but it’s sometimes reasonable to say that we “store” and “retrieve” knowledge. All models are wrong but some are useful.
The question I’m asking is, how is this working in an LLM? How exactly do their weights encode (seemingly) the entire bible such that they can recreate long passages verbatim from a prompt that likely doesn’t appear anywhere in the training data (e.g. some vague description of a particular passage).
It should have a zero chance of generating different output if the temperature is set to zero as in TFA. LLMs are not stochastic algorithms unless you add entropy yourself. Of course most people just use ChatGPT with its default settings and know nothing about the specifics.
The point is, though – somehow the model has memorized these passages, in a way that allows reliable reproduction. No doubt in a super amorphous and diffuse way, as minute adjustments to the nth sigbits of myriads of floating-point numbers, but it cannot be denied that it absolutely has encoded the strings in some manner. Or otherwise you have to accept that humans can't memorize things either. Indeed given how much our memory works by association, and how it's considerably more difficult to recount some memorized sequence from an arbitrary starting point, it's easy to argue that in some relevant way human brains are next-token predictors too.
The model has taken the input passages from its training data and tokenised it into weights. Don't humanise it by saying it has "remembered" anything. It does not and cannot remember sequences.
Yes, if you reduce temperature to zero and set the same random seed, you should get the same output tokens for a given set of input tokens.
However, there is no guarantee the output for a given seed will be the correct expected output.
For example, there logically must be a model and seed where providing the lord's prayer as input for completion produces a Metallica song as output, because that's a viable set of output tokens: https://genius.com/Metallica-enter-sandman-lyrics
That seed is no more or less valid than any other seed which completes the actual lord's prayer or which provides something completely different. All those seeds are just predicting their next token.
If people want that sort of exact reliable retrieval of sequences, and for the sequences to be "correct", then an LLM is the wrong tool for the job.
I imagine Bible passages, at least the more widely quoted and discussed ones, appear many, many times in the various available translations, in inspirational, devotional, and scholarly articles, in sermon transcripts, etc. This surely reinforces almost word-for-word recall. SEO garbage is a bit different each time, so common SEO-reinforced themes might be recalled in LLM output, but not word for word.
I find LLMs good for asking certain kinds of Biblical questions. For example, you can ask it to list the occurrences of some event, or something like "list all the Levitical sacrifices," "what sins required a sin offering in the OT," "Where in the Old Testament is God referred to as 'The Name'?" When asking LLMs to provide actual interpretations you should know that you are on shaky ground.
The LLMs have been fed the critique and analysis and discussion of all manner of biblical passages. The LLMs usually give great interpretations and even contrast different theological takes on such passages.
Yeah -- I ask it interpretive questions all the time, but just like for programming, I realize answers that appear good are often just plain wrong. I do know you can ask it leading questions if you want answers with a certain theological bent, e.g. "Did the judgments of Revelation 8 and 9 occur in the first century?"
I had similar thoughts about using it for the Quran. I think this highlights that you have to be very specific in your use cases, especially when expecting an exact response on static text that shouldn't change. This is why I'm trying something a bit different. I've generated embeddings for the Quran and use chromem-go for this. So I'll ask the index the question first based on a similarity search and then feed the results in as context to an LLM. But in the response I'll still cite the references so I can see what they were. It's not perfect but a first step towards something. I think they call this RAG.
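In case it helps, the rough shape of it (my actual version is Go with chromem-go; this is a minimal Python sketch where embed and ask_llm stand in for the embedding model and the chat model):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for an embedding-model call (one vector per passage)."""
    raise NotImplementedError

def ask_llm(prompt: str) -> str:
    """Stand-in for the chat-completion call."""
    raise NotImplementedError

# Index once: one embedding per verse, keyed by reference.
verses = {"2:170": "...", "17:36": "...", "4:145": "..."}   # reference -> verse text
index = {ref: embed(text) for ref, text in verses.items()}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(question: str, k: int = 5) -> str:
    # Retrieve the k verses most similar to the question...
    q = embed(question)
    top = sorted(index, key=lambda ref: cosine(index[ref], q), reverse=True)[:k]
    context = "\n".join(f"[{ref}] {verses[ref]}" for ref in top)
    # ...then have the LLM answer from them, citing references I can check
    # against the source text myself.
    return ask_llm(
        f"Answer using only these verses and cite their references:\n{context}\n\nQuestion: {question}"
    )
```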
I found LLMs to be really good for Quran studies. Especially for questions where Google is unreliable.
In one instance I was trying to remember if it was in the Bible or the Quran where, in the story of Abraham, the pagans are asked why they believe what they believe and they respond with "because our fathers believed" and the scripture critiqued this. ChatGPT gave me the exact verses from the Quran while Google would bring up random unrelated forum posts.
It's also good for comparing religious texts and seeing where stories differ.
GPT-4 specifically seems to have a very good knowledge of the Quran, such that you can ask it for a specific surah and ayah and it'll quote it exactly in Arabic.
Cool project! I asked some questions and got good answers. I was surprised to get results from the Hadith though, as I thought it would be restricted to the Quran and they are (as you know) not the same thing.
Thanks, yea I thought about separating the search but felt like when you're looking for that piece of knowledge it can extend to the hadith and it was important to include it. So much of the Quran is explained in the context of the hadith and prophet's life.
> How can the Qur'an acknowledge the Torah (Tawrat), Psalms (Zabur), and the Gospel (Injil) as divine revelations from God (Allah) yet deny the deity of Jesus Christ? Don't these contradictions falsify the truth of the text?
It gave the classic counter argument:
> A: The Qur'an acknowledges the Torah, Psalms, and Gospel as divine revelations from God, emphasizing their role as guidance for their respective communities. However, it also teaches that these scriptures were either altered or misinterpreted over time.
Of course, the Qur'an makes these theological assertions rather than ones grounded in concrete historical or manuscript evidence. Nevertheless, your app does a good job of fairly representing the beliefs of the Qur'an.
It's possible to accept the Gospel as divine revelations while denying the divinity of Jesus Christ: there's a large sect called Unitarians that do precisely that. https://www.biblicalunitarian.com/100-scriptural-arguments-f... are some of the verses they based their arguments on. Similarly in early Christianity there were many Arian Christians, who did not believe in the Trinity, but they were eventually violently suppressed by the Roman Emperor on penalty of death.
I really liked this project and was surprised to find the Hadith section. Although I am confused about what sources you are relying on (I have a guess, but it is not organized). I liked the answers I got from the search. It even has the same quality in both Arabic and English.
I have some feedback. The Hadith and Quran web pages are not mobile friendly, in the sense that you are loading the whole content in one page. Separating it into numbered pages or doing lazy loading would be better.
Thanks for the feedback. The hadith collection is Bukhari. I'll try to clear this up somewhere and address what you mention about mobile rendering for the web pages. The project is open source and you can see the data loaded from JSON in each folder.
Note: I addressed the issue with web page loading for the Quran and Hadith. Opted for the main page to be an index with contents. Hopefully that fixes things.
I had written the following in response to a comment that got flagged and has now disappeared. Nevertheless, make of this what you will:
In my opinion, based on verses 17:36, 5:118, 46:15, 96:15-16 and 17:13-15 of the Quran, viz. "...And We have fastened every man's deeds to his neck, and on the Day of Resurrection, We shall bring out for him a book which he will find wide open. [It will be said], Read your record. You yourself are sufficient as a reckoner against you this Day. Whoever is guided is only guided for [the benefit of] his soul. And whoever errs only errs against it. And no bearer of burdens will bear the burden of another...", it is solely dishonesty and wilfully committing injustice that will cause a person to judge against himself/herself and earn a place in Hell. So it is not necessary that only 'muslims' can enter Heaven or be Successful. The criterion is the application of sense/reasoning and not being swayed by that which is convenient (being born into a religion, peer pressure, personal convenience, prejudice, etc.). In my opinion, fate is influenced by how steadfastly a person holds on to Reason and Justice. So prejudiced people end up committing deeds that will lead them to Hell, like willful falsification and obtuseness. So even 'muslims' who choose the path of hypocrisy can end up in the lowest layers of Hell (Quran 4:145) and non-muslims can also go to Heaven (Quran 5:118). Anybody can understand this, because it makes sense. Sadly, in the case of Abrahamic religions, the exegetic literature like Midrash and Tafsir muddy that which is clear. How ironic. But then again, there is a pertinent verse (3:7) in the Quran in this regard: "He it is Who hath revealed unto thee the Scripture wherein are clear revelations - they are the substance of the Book - and others (which are) allegorical. As for those in whose hearts is deviation [from truth], they will follow that of it which is unspecific, seeking discord and seeking an interpretation [suitable to them]..." There is a thin book at https://archive.org/details/TheAmazingQuran.pdf by a mathematician not of muslim origin where he examines the Quran from a logical standpoint, which can be a starting point, followed by the papers at https://scholar.google.co.uk/citations?user=wmRDl-4AAAAJ on computational analysis of the Quran (including with neural networks) before the ChatGPT era.
1. 4:144
This verse advises Muslims to prioritize loyalty within the community during a time of external threats. It is not a general prohibition but a caution in the context of potential betrayal.
2. 77:16
This refers to historical examples of past communities who faced consequences for rejecting divine guidance. It is a reminder of accountability, not a universal statement against non-believers.
3. 8:15
This verse gives instructions for battle, emphasizing courage and discipline during wartime. It applies to specific combat situations, not everyday relations with non-believers.
4. 5:41
This verse addresses the Prophet’s grief over those who rejected faith and distorted divine teachings. It critiques dishonesty and insincerity, not all members of specific groups.
5. 3:141
This verse speaks about trials that distinguish true believers and cleanse the community of wrongdoing. It emphasizes spiritual growth, not indiscriminate judgment of disbelievers.
You seem to be making an accusation that Muslims widely practice "taqiyya" to deceive others. This is a baseless and Islamophobic trope. In mainstream Islam, lying is unequivocally condemned and considered an act of hypocrisy. While there is a narrow and rare historical exception permitting concealment of faith to protect one’s life under extreme duress, most Muslims have never encountered or practiced this concept. Ironically, those spreading this accusation often seem to know more about it than the Muslim communities they malign.
I tried something similar with my favorite artist, Ariana Grande. Unfortunately, not even the most advanced AI could beat my knowledge of her lyrical work.
It is fun and frustrating to see what LLMs can and can't do.
Last week I was trying to find the name of a movie so I typed a description of a scene into ChatGPT and said "I think it was from the late 70s or early 80s and even though it is set in the USA, I'm pretty sure it is European" and it correctly told me it was The House by the Cemetery.
Then last night I saw a video about the Parker Solar Probe and how at 350,000 mph it was the fastest-moving man-made object. So I asked ChatGPT how long at that speed it would take to get to Alpha Centauri, which is 4.37 light years away. It said it would take 59.8 million years. I knew that was way too long, so I had it convert mph to miles per year and then it was able to give me the correct answer of 6817 years.
LLMs are known to do bad calculations because all they do is predict the next token. It's pretty annoying, and I don't understand why calculation (e.g. via Python) is still not a core functionality.
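For scale, the whole calculation is a couple of lines (rough figures; for what it's worth, the 6,817-year answer lines up with the probe's often-quoted ~430,000 mph peak speed rather than 350,000 mph):

```python
LIGHT_YEAR_MILES = 5.879e12        # miles in one light year (approx.)
HOURS_PER_YEAR = 24 * 365.25

def years_to_alpha_centauri(speed_mph: float, distance_ly: float = 4.37) -> float:
    return (distance_ly * LIGHT_YEAR_MILES) / (speed_mph * HOURS_PER_YEAR)

print(round(years_to_alpha_centauri(350_000)))  # ~8,400 years at 350,000 mph
print(round(years_to_alpha_centauri(430_000)))  # ~6,800 years at a ~430,000 mph peak
```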
Whereas you would previously (for your first example) have a conversation with the guy at the video store and he'd not only tell you the movie but also recommend something else you might like.
So instead, you'd drive there, to someone who also probably doesn't know, talk (while they might also not want to), and then, you might get a recommendation. You can also ask ChatGPT for recommendations. This isn't a case where I would return to pre-LLM times.
Interesting result but probably predictable since you’re trying to use the LLM as a database. But I think you’re onto something in that your experiments can provide data to inform (and hopefully dissuade) creation of applications that similarly try to use LLMs for exact lookups.
I think the experiment of using the LLM to recall described verses - eg “what’s the verse where Jesus did X”- is a much more interesting use. I think also that the LLM could be handy as, or to construct, a concordance. But I’d just use a document or database if I wanted to look up specific verses.
Interesting that it takes an LLM with 405 BILLION parameters to accurately recall text from a document with slightly less than 728 THOUSAND words (not quite six decimal orders of magnitude smaller, but still).
I guess the challenge is that the parameters have to encode much more intricacy than just the Bible. Even if one were produced purely with text from the bible, it would likely not be able to converse as well as conversationally tuned ones behave.
Perhaps there's a middle ground of a fine-tuned LLM on scripture recall + o1-style background reasoning to produce the best output. Or even just a RAG.
I'm talking about it mostly from an entropy standpoint here. With larger size, you can represent more information.
Think of it like a JPEG: you can compress it further and further, but eventually you lose the ability to understand the original image.
With the models, if they had infinite size, I imagine they could recall values in their training data extremely accurately. But as you compress down further and further to smaller and smaller models, you are trying to distil the same amount of information in less space, and so things cannot be perfectly recalled.
We have people tuning these smaller models to squeeze every ounce of what we consider meaningful out of them (passing certain benchmarks, seeming coherent in dialogue, etc.), but in the process they lose the things we don't tune them for, e.g. accurate recall of scripture (not that we should).
I will say with all of that though, that I only have a high level understanding of LLMs, I've integrated them into products on the job, but I am by no means an ML engineer.
As I mentioned in my edit, I'm not being snarky I'm really curious about this. I found your post very thought provoking.
The JPEG example is a good one, and I was using the FFT in my edit which is like the magic in JPEG. The FFT converts from the time domain to the frequency domain and you can run it both directions to get from one to the other, JPEG uses weighted DCTs but similar concept.
The reason this is useful is because there is a lot of stuff in the signal that isn't "important", or more specifically doesn't contribute to the overall signal. As a result, when you do this encoding you need fewer 'parameters' in your dataset to recreate the picture than you needed 'pixels' in the original picture. So the total number of things you have to have to recreate the other is less than the original source material.
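Concretely, something like this (a toy sketch assuming scipy: keep only the largest DCT coefficients of a block and reconstruct from those):

```python
import numpy as np
from scipy.fft import dctn, idctn

block = np.random.rand(64, 64)                 # stand-in for a 64x64 pixel block

coeffs = dctn(block, norm="ortho")             # pixel domain -> frequency domain
keep = 256                                     # keep 256 of 4096 coefficients (~6%)
cutoff = np.sort(np.abs(coeffs).ravel())[-keep]
sparse = np.where(np.abs(coeffs) >= cutoff, coeffs, 0.0)

reconstructed = idctn(sparse, norm="ortho")    # frequency domain -> pixels again
err = np.abs(reconstructed - block).mean()
# Natural images concentrate energy in a few coefficients, so the error stays small;
# pure noise like this random block compresses much worse.
print(f"kept {keep}/4096 coefficients, mean abs error {err:.3f}")
```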
So now let's say you take two pictures and encode them. You can take that and store only those constants that are needed for the second picture where they differ. So now instead of a single additional picture you do a thousand additional pictures. If you think of those pictures as a series of frames in time, then there is a vector of changes that gets you from any picture to any other picture. And if you have enough pictures, pretty soon every value of the 64 x 64 pixel block is covered in your dataset, and now you can generate any picture from the 'reference' set of pixel blocks.
Looking at that as a time sequence, you can transform it so that your picture is a vector path through an n-dimensional space that is your picture baseline. And the 'weights' of that vector need only be the delta from where you are to where you need to go next in your block-weight space.
But if LLMs were like this, then the parameter expansion from one picture to a thousand pictures to a million pictures would asymptotically approach a constant number of parameters for the model overall, because there is only a finite number of pixel values you can have.
When we were at Blekko we were using Dirichlet accumulators to hold probability vectors for identifying the contents of pages; they added vectors rapidly for the first 10,000 pages and then added fewer and fewer as we got deeper into pages of that content. And for all pages and all content, what we needed to identify any document had this asymptotic approach, which was maybe a decimal order wider than the number of documents but not the number of words, because the set of words is finite, and while the combinations of words are infinite, the useful and unique combinations are significantly smaller than that.
Now one of the things that those accumulators could do was regenerate phrases and words from the document using a simple statistical algorithm; they are kind of super Markov generators in that regard. So for me, that was how I had been thinking of LLM parameters. However, if that were the case, then a 400B parameter model would be able to perfectly recall at least 40B unique documents? But to understand that I have been looking at what the 'parameter' in an LLM actually represents.
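For what it's worth, the "statistical regeneration" idea in toy form (a bigram Markov generator, nothing like Blekko's actual accumulators, but the same flavor):

```python
import random
from collections import defaultdict

def train(text: str) -> dict:
    """Count which word follows which -- a crude statistical accumulator."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def generate(counts: dict, start: str, length: int = 15) -> str:
    out = [start]
    while len(out) < length and out[-1] in counts:
        nxt = counts[out[-1]]
        words, weights = zip(*nxt.items())
        out.append(random.choices(words, weights=weights)[0])   # weighted pick
    return " ".join(out)

corpus = ("in the beginning God created the heaven and the earth "
          "and the earth was without form and void")
model = train(corpus)
print(generate(model, "the"))   # statistically plausible re-hash of the source phrasing
```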
I don't think it's necessarily about the parameter count, but the amount of training material about the Bible relative to the rest of the training material, with higher parameter models able to retain more Bible information with a higher proportion of training on other topics.
That is a good question, and it implies the definition of a parameter as a compression artifact/constant? If you've read chapter 4 of Feynman's Lectures on Computation, where he talks about information coding, you get a sense of where I'm coming from. There is some reversible function in LLMs that goes from book/document => parameters => book/document. The 'parameters' are the controlling information of that function; what does the information contained in a parameter represent with respect to a book/document?
While this is slightly more catered towards a technical audience, I think articles on relatable subjects like this one could prove valuable in getting non-technical people to understand the limitations of LLMs, or what companies are calling "AI" these days. A version of this article that is more focused on real-world examples, showing exactly how the models can make mistakes and present the wrong or incomplete information with less technical focus would probably better cater to a non-technical audience.
I’ve been pretty impressed with ChatGPT’s promising capabilities as a research assistant/springboard for complex inquiries into the Bible and patristics. Just one example:
Can you provide short excerpts from works in Latin and Greek written between 600 and 1300 that demonstrate the evolution over those centuries specifically of literary references to Jesus' miracle of the loaves and fishes?
How certain are you that's correct? IME these "search problems" are the kind of thing that almost always provokes hallucinations.
For example, I looked up the quotation provided from Isidore of Seville's De fide catholica contra Iudaeos, Lib. II, cap. 19, using this copy on WikiSource, https://la.wikisource.org/wiki/De_fide_catholica_contra_Iuda.... The quote certainly does not appear under LIBER SECUNDUS, CAPUT XIX. Nor could I find it in whole or in fragment anywhere in the document, nor indeed any mention of the miracle of loaves and fishes (granted, I could have missed one, I relied on Ctrl+F and my very rusty Latin).
Perhaps the copy on WikiSource is incomplete, or perhaps there are differing manuscripts, but perhaps also the quote was a complete hallucination to begin with.
Exactly - it’s the same problem when using (current) LLMs for major programming tasks, generally useless if you don’t already have enough knowledge of the language/platform to spot and correct the mistakes, plus enough awareness of software design and architecture to recognise what is going to be secure, performant and maintainable in the long run.
I am by no means a professional in this area, but as a keen amateur I would worry about my inability to discern facts from hallucinations in such a scenario: while I could imagine such output provides a useful “springboard” set of references for someone already skilled in the right area, without being able to look up the original texts myself and make sense of the Latin/Greek I would not feel confident that such texts even really exist, let alone if they contain the actual words the LLM claims and if the translations are any good. And that’s before you get into questions of the “status” of any given work (was it considered accurate or apocryphal at the time of writing, for which audience was it intended using what kind of literary devices, what if any is the modern scholarly consensus on the value, truth or legitimacy of the text etc etc)
> without being able to look up the original texts myself
Rule of thumb: if you can't look up the original texts, you can assume they weren't actually in the training data. The training data is, however, likely to include a lot of people quoting those texts, meaning that the model predicts "SOURCE says OPEN QUOTATION MARK" and then tries to autocomplete it. If you can verify it, you might not need to; but if you can't verify it, it's certainly wrong.
"Rule of thumb: if you can't look up the original texts, you can assume they weren't actually in the training data. "
That's not reliable. I've found them on the Internet in various forms (e.g. studybible.info). Google Books also has scanned copies of many ancient writings. There are probably obscure sites people would miss. If searching for them, the search algorithms might avoid them to instead prioritize newer, click-bait content.
Telling what wasn't in the training data for sure should be considered impossible right now. If it matters, we need to use models with open, legal-to-share, training data. If that's impossible, one might at least use a model with training data accessible to them (eg free + licensed).
On topics where humans spend their entire life studying I don't think you would be able to convince me an LLM is accurate unless you yourself are such an expert and your expertise is corroborated by other experts.
I love that people are finally comfortable adding the word "artificial" into their analysis of the bible. About time. Because make no mistake, LLMs are at best artificial intelligence. More likely, they are very good regurgitating machines, telling us what we have been telling ourselves in an even better form thus goading us along in our fallacies.
The Bible is a very tricky thing to recall word for word because of differences between canons and translations. Different wording might be taken from a different translation than the one asked for, rather than being wrong.
I believe I saw or read somewhere that, in the case of the brain, memories are not so much stored as reconstructed when recalled. If that's true, I feel like we are witnessing something similar with LLMs, as well as with stable-diffusion types of things. Are there any studies looking into this in the AI world? Also, if anyone knows what I'm referring to (i.e. "reconstructing memories"), I would love some pointers, because I can't remember for the life of me where I heard or read of this idea!
Why do you put a weird computer model between you and a computer and errr Your Faith? Do bear in mind that hallucinations might correspond to something demonic (just saying)
I'm a bit of a rubbish Christian but I know a synoptic gospel when I see it and can quote quite a lot of scripture. I am also an IT consultant.
What exactly is the point of Faith if you start typing questions into a ... computational model ... and trusting the outputs? Surely you should have a decent handle on the literature: It's just one big physical book these days - The Bible. Two Testaments and a slack handful of books and that for each. I'm not sure exactly but it looks about the same size as the Lord of the Rings.
I've just checked: Bible: 600k LotR: 480K - so not too far off.
I get that you might want to ask "what if" types of questions about the scriptures but why would you ask a computer? Faith is not embedded in an Intel Core i7 or an Nvidia A100.
I can speak to a couple of perspectives I have seen other people use it for, ranging from valid to somewhat scary.
1. Preparing a sermon for Church, I don't advocate for this, but it's definitely being done out there. Here, the pastor may know the topic they are speaking on, but want the LLM to help them plan out the message and structure it.
2. Preparing lesson plans for Sunday School. This seems reasonably fine to me, but I would still err on the side of not trusting the raw scriptures output as evidence, and instead look them up separately before reading them out.
The above examples may particularly come into play when English is not a first language, since although they can understand and express their faith easily in their native language, ChatGPT can help them represent it in English well.
Personally, I think the use cases are many, but mostly for discussion / personal reflection. These include things like asking for perspectives that other Christians take on certain passages, helping understand how some scriptures link to other scriptures in the Bible, and sometimes even exploring some of the history of the Christian faith through the last ~2 millennia since it was written.
Anything meaningful you can manually research further / reference before taking it at face value, but it can work as a great starting point for your search.
I had a similar reaction myself -- I'm an escaped fundamentalist and don't personally have the same convictions about stuff like this, but even if it's on a level that amuses me a little, there's something that feels just a bit heretical about it...
Not necessarily in a way where I would judge it though, and I certainly see how that could have use cases. It just feels a little bit like water gun baptisms, conceptually.
One question of a less spiritual nature -- are we strictly talking about recall from within the models themselves? I've never gotten deep enough into this kind of thing to mess with RAG pipelines, but I wonder if direct access to a translation or several would have any impact on its overall effectiveness for this.
It's discouraging that an LLM can accurately recall a book. That is, in a sense, overfitting. The LLM is supposed to be much smaller than the training set, having in some sense abstracted the training inputs.
Did they try this on obscure bible excerpts, or just ones likely to be well known and quoted elsewhere? Well known quotes would be reinforced by all the copies.
The bible is probably in enough different training sets (not just in whole; various papers making some religious argument quote a few verses to make their point) that the model should have most of the bible.
Does GPT now query in real-time? If so, it should be able to reproduce anything searchable verbatim. It just needs to determine when verbatim quoting is appropriate given the prompt.
By Betteridge's law of headlines, the answer is clearly "no".[1]
But also, LLMs in general build a lossy compression of their training data, so they are not the right tool if you want completely accurate recall.
Will the recall be accurate enough for a particular task? Well I'm not a religious person so I have no framework to help decide that question in the context of the bible. If you want a system to answer scripture questions I would expect a far better approach than just an LLM would be to build a RAG system and train the RAG embedding and search at the same time you train the model.
I'm heavily biased here because I don't find much value in the bible personally. Some of the stories are interesting and some interpretations seem useful, but as a whole I find it arbitrary.
I never tell other people what to believe or how they should do that in any capacity.
With that said I find the hallucination component here fascinating. From my perspective everyone who interprets various religious text does so differently and usually that involves varying levels of fabrication or something that looks a lot like it. I'm speaking about the "talking in tongues" and other methods here. I'm not trying to lump all religions into the same bag here, but I have seen that a lot have different ways of "receiving" communication or directive. To me this seems pretty consistent with the colloquial idea of a hallucination.
"From my perspective everyone who interprets various religious text does so differently"
The existence of denominations and confessions/creeds really shows that this isn't true generally. (There may be more than one interpretation, but not a unique one to every reader)
Even ignoring denominations, nearly all mainline christians for example would agree to the Nicene creed. (Anyone that disagreed probably wouldn't be considered "mainline", so somewhat definitional)
To suggest that all of theology is basically nondeterministically making things up is naive and, in my opinion, insulting to an entire academic discipline, much less to the entire body of believers. (I can't tell if this is what you're talking about or not)
Nearly no group of mainline believers accepts speaking in tongues, and basically all of mainline Protestantism believes that the time of prophets and new messages from God is over; the Bible is complete and will never be added to. (Pentecostals would be the one exception here, but I don't consider them mainline Christians personally)
Most study and application tries to either source or fully work out from principles the meaning of the Bible. These can be wrong arguments but wouldn’t be hallucinations.
Your experience sounds limited to Pentecostal-originated churches, which are 100-150 years old. In those churches, it’s acceptable to speak as if you’ve received a spontaneous understanding of the Bible and to not explain it. That does have a parallel to LLM hallucinations in face value output, I suppose, but the origination is completely different as the spontaneous human is making planned remarks passed off as spontaneous, trying to affect specific people in the room, or emotionally overwhelmed. None of those resemble why/how LLMs hallucinate.
As quick as I am to criticize the bizarre versions of Christianity, I do think you're in error to assume Pentecostalism is all or even mostly about "planned remarks passed off as spontaneous".
Improv is a thing, and can be trained as a skill even outside of comedy/entertainment.
Though, outside of Charismatic sects, Christianity does see a more reasonable level of "I had prepared by thinking about (verse X), but suddenly now I'm thinking about (obscure verse Y)."
Interestingly there is an entirely licit charismatic subsect within Catholicism called Catholic Charismatic Renewal. And yeah they’re basically Catholic Pentecostals.
Didn’t say all or mostly. It is one aspect. You’re right it’s typically bullet points that are fleshed out in the moment, rather than a rehearsed speech, when this happens.
I agree fringe religious groups paint a stranger picture, but I don't think that entirely covers it here.
Without specifying any specific religions, many often have different interpretations at different times on critical issues like human rights, torture, slavery and worse. Different interpretations at different times have been used to justify spousal abuse, human selling/trafficking for specific agreed upon prices or transactions, etc.
I think in the same way that LLMs hallucinate because they can't find a proper place in their vector mapping for that embedding to draw on experience or reason (simplifying) the humans are also drawing from a place that has blanks and filling in the gaps.
If read in context, and ancient style, it's one of the most consistent works ever made given how many authors it had over what time period. It tells the "story of redemption" of God making man for a relationship, man betraying God, God redeeming man by literally dying at our hands, God continuing to drag His unfaithful partners to the finish line, and God spending eternity showing us undeserved love.
God's main requirement is to put faith in Christ and repent to enter a relationship with Him. His friends get forgiveness and grace, enemies get justice and wrath. From there, He dwells inside us to change us into what He wants. He rewards every good work He equips us for. He also disciplines our failures, like adopted sons and daughters, to keep us on the right path. He is gracious.
While telling that redemptive story, God's Word weaves together much testimony to teach us almost everything we need to know about life: God's/man's nature, God's laws/design, repeating patterns of man's behavior, different genres, prophecy/miracles for confirmation, and promises for the future. That the same message got the same results in thousands of people groups, peacefully, shows its universal power.
You wondered how to interpret it in an accurate way. Our church follows the historical-grammatical, or literal, method. We ask: who is speaking, what do their specific words mean, in what context, and for what reason then? And how does that apply today? And what do other passages say about the same topic in their context?
I'll leave you with that site in case it helps answer some of your questions. It's really the work Christ did that saves us, changes us, etc. Our actions help us live more effectively while on Earth. Every decision has an impact in eternity, too, as God will render to each for their works.
Since He promises answers, I suggest reading John's Gospel in ESV (good translation) while asking who Jesus Christ really is. You have to be humble and open to hear Him, though.
A really long word salad that doesn't say much, except stating your religious opinions and a disclaimer? It really is not necessary.
With that said, all you said is that the process of transmitting religious ideas is akin to hallucination? Care to explain what the logical argument for that is?
My reason for saying happy holidays is because I don't really have a specific thing I celebrate, I simply take vacation during all of December and don't like being in the cold. That's a holiday for me.
There are over 5,000 manuscripts of the New Testament that overlap with more than 90% consistency. This makes it one of the most well-preserved and reliably transmitted ancient texts in history.
The accuracy of most translations over millennia is frankly unbelievable when compared against other sources and ancient manuscripts such as the Dead Sea Scrolls, Josephus, etc. Even the LXX/Septuagint and the Vulgate are more remarkable for the very few ways in which they diverge, and even those can be harmonized with careful study.
"I've often found myself uneasy when LLMs (Large Language Models) are asked to quote the Bible. While they can provide insightful discussions about faith, their tendency to hallucinate responses raises concerns when dealing with scripture, which we regard as the inspired Word of God."
Interesting. In my very religious upbringing I wasn't allowed to read fairy tales. The danger being not able to classify which stories truly happened and which ones didn't.
Might be an interesting variant on the Turing test. Can you make the AI believe in your religion? Probably there's a sci-fi book written about it.
> In my very religious upbringing I wasn't allowed to read fairy tales. The danger being not able to classify which stories truly happened and which ones didn't.
Thanks for sharing. You might be interested: JRR Tolkien, 'fairy tale' author, was also a/the leading scholar of Old English (Anglo-Saxon), related languages, and the culture and myth around them - including 'fairy tales'; and he was a devout Catholic.
How could he write (and study) such ungodly material? He wrestles with the question multiple times, but if you are interested, I strongly recommend On Fairy-stories, an essay based on a lecture. It covers far more ground than this question, but it's worth reading anyway. I'll append a spoiler below.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
[SPOILER]
There's more to it than this, but it's a wonderful vision:
"The Gospels contain a fairy-story, or a story of a larger kind which embraces all the essence of fairy-stories. They contain many marvels - peculiarly artistic,[1] beautiful, and moving: 'mythical' in their perfect, self-contained significance; and among the marvels is the greatest and most complete conceivable eucatastrophe. But this story has entered History and the primary world .... The Birth of Christ is the eucatastrophe of Man's history. The Resurrection is the eucatastrophe of the story of the Incarnation. This story begins and ends in joy. It has pre-eminently the 'inner consistency of reality'. There is no tale ever told that men would rather find was true, and none which so many sceptical men have accepted as true on its own merits. For the Art of it has the supremely convincing tone of Primary Art, that is, of Creation. ..."
[1] "The Art is here in the story itself rather than in the telling; for the Author of the story was not the evangelists."