There are many things they can’t do. Even a simple rule like “ensure that numbers from one to ten are written as words and numbers greater ten as digits in the given text” fails for me on so many examples, even though it works on many others; few-shot, chain of thought, many versions of the prompt, it doesn’t matter. Sometimes LLMs will even change the number to something else entirely, even with temp set to 0. And then there’s the non-determinism (again with temp=0): you run the same prompt several times, and that one time it’ll respond with something different.
As amazing as they are, they still have many limitations.
I’ve been working with ChatGPT and Gemini to apply simple rules like the one above and I got so frustrated.
The reason it can't do that is that, for example, "twenty" and "20" are nearly identical in the embedding space, so the model can't really distinguish them in most contexts. That's generally true for any task that depends on how the words look rather than what the words mean. Any kind of meta request like this is going to be very difficult for an LLM, though a multi-modal GPT model should be able to handle it.
Much better, but still missing "than" after "greater", which seems kind of critical.
"Using" is important as a number greater than ten can't be written as a digit, but can be written using digits ("with" would be just as good). Repeating "written" makes it clearer that there are two instructions.
It's funny, I didn't notice the missing "than" until much later. After I learned the intended meaning of the original sentence, my mind just seemed to insert the missing "than" automatically.
Mine as well. After understanding the meaning thanks to the other posters, the sentence magically looked fine. But before knowing the meaning, it was gibberish. I’ve become aware of this before, and it makes me wonder just how often I’m interpreting grammatical nonsense on a daily basis without realizing it.
First of all, the texts the rule has to be applied to are written in English. Second, I believe English is by far (by far) the most prevalent language in the training data for these models, so I’d expect them to do better at exactly this kind of task.
And third, I’m not the only one working on this problem; there are others who are native speakers, and, as my initial message stated, there have been many variations of the prompt. None work for all cases.
And lastly, how would you rewrite my sample prompt? Which, BTW, had a typo (unrelated to my English skills) that I’ve now fixed.
To be frank, the response itself indicates that you don't really get what was being asked, or maybe how to parse English conversational conventions.
I.e., it doesn't seem to answer the actual question.
They seem to be half responding to the second sentence, which was a personal opinion, so I wasn't soliciting any answers about it, and half going off on a tangent that leads away from a direct answer.
Run these comments through a translation tool if you're still not 100% sure after reading this.
Your surname surely seems to indicate that some of your ancestors weren't native English speakers. I hope they didn't get lectured or made fun of for their poor English skills by people like you when they first landed in whichever country you were born.
Your English is absolutely fine and your answers in this thread clearly addressed the points brought up by other commenters. I have no idea what that guy is on about.
It's a simple prescriptive rule in English. If you are writing about a small number, like less than ten, spell it out. For example: "According to a survey, nine out of ten people agree."
But if you are writing about a large number, particularly one with a lot of different digits, prefer writing the digits: "A mile is 5,280 feet." Compare that to: "A mile is five thousand, two hundred, and eighty feet."
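For what it's worth, the digits-to-words half of this prescriptive rule is mechanical enough that it doesn't need an LLM at all. A minimal Python sketch (the helper names and the lookaround pattern are my own illustration, not anyone's published code):

```python
import re

# Words for one..ten; anything larger stays as digits.
WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
         6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}

# Match a standalone integer, skipping digits that belong to grouped or
# decimal numbers like "5,280" or "3.14" (lookarounds reject a digit that
# sits next to a comma- or dot-separated digit).
NUM = re.compile(r"(?<!\d,)(?<![\d.])\b\d+\b(?![.,]\d)")

def spell_small_numbers(text: str) -> str:
    """Replace standalone digits 1-10 with spelled-out words; leave 11+ alone."""
    def repl(m: re.Match) -> str:
        n = int(m.group())
        return WORDS.get(n, m.group())  # fall through for 11 and above
    return NUM.sub(repl, text)
```

For example, `spell_small_numbers("9 out of 10 people agree")` gives `"nine out of ten people agree"`, while `"A mile is 5,280 feet"` passes through untouched. It is the opposite direction, words to digits, that gets hard.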
To me the main problem is that it should read "numbers greater than ten." I asked Gemini to rephrase it, and Gemini produced correct English with the intended meaning:
> Change all numbers between one and ten to words, and write numbers eleven and above as digits in the text.
It even used eleven rather than ten, which sounds like counting.
All of these issues are entirely due to the tokenization scheme. Literally all of them.
You could get this behavior implemented perfectly with constrained text generation techniques like grammars, or with any of the various libraries implementing constrained generation (e.g. Guidance).
I had briefly looked into Guidance and others (LMQL, Outlines) but I couldn't figure out how to use them for this problem.
I could see how to use them to prevent the LLM from generating digits for numbers from one to ten, by using a regex or grammar constraint that forbids those digit forms, but the main problem is the other part of the rule, i.e. numbers above ten should never be spelled out and should be written as digits instead. For that you presumably need to identify the spelled-out numbers first, which seems to require the LLM itself, so you're back to LLM fallibility.
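That said, part of the words-to-digits direction is also mechanical: single-word cardinals above ten form a closed set, so a lookup table covers them without any LLM. A minimal sketch (the names and the deliberately tiny coverage are my own illustration, not a real solution):

```python
import re

# Hypothetical lookup covering only eleven..twenty. Compound forms like
# "twenty-one" or "one hundred and five" would need a real cardinal parser,
# so the pattern refuses to touch a word followed by a hyphen.
ABOVE_TEN = {
    "eleven": "11", "twelve": "12", "thirteen": "13", "fourteen": "14",
    "fifteen": "15", "sixteen": "16", "seventeen": "17", "eighteen": "18",
    "nineteen": "19", "twenty": "20",
}

PATTERN = re.compile(r"\b(" + "|".join(ABOVE_TEN) + r")\b(?!-)", re.IGNORECASE)

def digits_for_large(text: str) -> str:
    """Replace single-word spelled-out numbers above ten with digits."""
    return PATTERN.sub(lambda m: ABOVE_TEN[m.group().lower()], text)
```

So `digits_for_large("twenty people but two cats")` yields `"20 people but two cats"`, while `"twenty-one"` is left alone. Anything beyond single words genuinely needs a parser, which is presumably where the temptation to reach for the LLM comes from.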
You constructed a task that no one understands, and then you even admit that, despite that, it actually succeeds most of the time. Sounds like a massive win for the LLMs to me.