
There are many things they can’t do. Even a simple rule like “ensure that numbers from one to ten are written as words and numbers greater ten as digits in the given text” fails for me on many examples even though it works on many others; few-shot, chain of thought, many versions of the prompt, it doesn’t matter. Sometimes LLMs will even change the number to something else, even with temp set to 0. And then there’s the non-determinism (again with temp=0): you run the same prompt several times, and that one time it’ll respond with something different.

As amazing as they are, they still have many limitations.

I’ve been working with ChatGPT and Gemini to apply simple rules like the one above and I got so frustrated.




The reason it can't do that is that, for example, "twenty" and "20" are nearly identical in the vector embedding space, and it can't really distinguish them that well in most contexts. That's generally true for any task that relies on "how the words look" rather than "what the words mean". Any kind of meta request is going to be very difficult for an LLM, but a multi-modal GPT model should be able to handle it.


Thanks, I’ll try the multimodal one.


Tried it, did not perform better than the non-multimodal one.


> ensure that numbers from one to ten as written as words and numbers greater ten as digits in the given text

I can’t fault llms for not knowing what to do here because I, a human, have no idea what on earth this means.


Given the text "1,2,3,4,5,6,7,8,9,10,11,12" it should result in "one, two, three, four, five, six, seven, eight, nine, ten, 11, 12"

or at least that's my understanding of the prompt
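
For reference, here is a minimal sketch of that reading as plain deterministic code rather than an LLM prompt. It only covers the "one to ten as words" half of the rule. The names `SMALL_WORDS`, `NUMBER_TOKEN`, and `spell_small_numbers` are made up for illustration, and treating a comma followed by exactly three digits as a thousands separator is an assumption that will misfire on some lists.

```python
import re

# Words for one through ten; anything larger stays as digits.
SMALL_WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}

# A number token: digits, optional ",ddd" thousands groups, optional decimals.
# The lookarounds skip digits glued to letters (e.g. "x86", "3rd").
NUMBER_TOKEN = re.compile(r"(?<![\w.])\d+(?:,\d{3})*(?:\.\d+)?(?!\w)")


def spell_small_numbers(text: str) -> str:
    """Replace standalone integers 1..10 with words; leave everything else."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        # Only plain integers qualify; "5,280" and "3.5" are left alone.
        if token.isdigit() and 1 <= int(token) <= 10:
            return SMALL_WORDS[int(token)]
        return token

    return NUMBER_TOKEN.sub(replace, text)


print(spell_small_numbers("1,2,3,4,5,6,7,8,9,10,11,12"))
print(spell_small_numbers("A mile is 5,280 feet and I walked 3 of them."))
```

This does not add spaces after the commas, and it knows nothing about context (dates, versions, phone numbers), so it is a starting point rather than a full solution.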


I think you may be thrown off because the first "as" is meant to be "are".


Thanks, that was def a typo that I’ve fixed now.


“Ten” is a word, “10” are digits.

I’m not a native English speaker, how would you write it?

FWIW the LLMs get it right many times, but fail other times.


I couldn't understand the original wording either, but after reading one of the sibling comments that explains it, it suddenly made sense.

I think you left out a few words that most English writers would include. So instead of:

> "ensure that numbers from one to ten as written as words and numbers greater ten as digits in the given text",

something like the following might be better for most people:

> "ensure that the numbers from one to ten are written as words, and the numbers greater ten are written using numerical digits in the given text"

There are multiple ways to write this, so other people may have better versions.

I'm not an English grammar expert, so I cannot explain to you why the addition of those extra words helps with the clarity of that sentence.


Much better, but still missing "than" after "greater", which seems kind of critical.

"Using" is important as a number greater than ten can't be written as a digit, but can be written using digits ("with" would be just as good). Repeating "written" makes it clearer that there are two instructions.


It's funny, I didn't notice the missing "than" until much later. After I learned the intended meaning of the original sentence, my mind just seemed to insert the missing "than" automatically.


Mine as well. After understanding the meaning thanks to the other posters, the sentence magically looked fine. But before knowing the meaning, it was gibberish. I’ve become aware of this before, and it makes me wonder just how often I’m interpreting grammatical nonsense on a daily basis without realizing it.


Hilariously, you can ask GPT 4 to explain the “why” of arbitrary grammar fixes.


It’s a common style guide in newspapers.


If your not a native English speaker, why are you even expecting the LLM to understand even 80% of the time?

Just ask it in your own native language.


First of all, the texts the rule has to be applied to are written in English. Second, I believe English is by far (by far) the most prevalent language in the training dataset for those models, so I’d expect it to work better at this kind of task.

And third, I’m not the only one working on this problem, there are others that are native speakers, and as my initial message stated, there have been many variations of the prompt. None work for all cases.

And lastly, how would you rewrite my sample prompt? Which BTW had a typo (unrelated to my English skills) that I’ve now fixed.


To be frank the response itself indicates that you don't really get what was being asked, or maybe how to parse English conversation conventions?

I.e. It doesn't seem to answer the actual question.

They seem to be half responding to the second sentence which was a personal opinion, so I wasn't soliciting any answers about it. And half going on a tangent that seems to lead away from forming a direct answer.

Run these comment through a translation tool if your still not 100% sure after reading this.


Alright man. So was it a quip when you said “if _your_ not a native English speaker”? Ok then. Very funny, I get it now.


I really recommend to use a translator, instead of relying purely on your English comprehension skills.


Your surname surely seems to indicate that some of your ancestors weren't native English speakers. I hope they didn't get lectured or made fun of by people like you for their poor English skills when they first landed in whichever country you were born in.


Your English is absolutely fine and your answers in this thread clearly addressed the points brought up by other commenters. I have no idea what that guy is on about.


I've read this three times and it still doesn't make a lick of sense. How does this relate to the parent comments?


It's a simple prescriptive rule in English. If you are writing about a small number, like less than ten, spell it out. For example: "According to a survey, nine out of ten people agree."

But if you are writing about a large number, particularly one with a lot of different digits, prefer writing the digits: "A mile is 5,280 feet." Compare that to: "A mile is five thousand, two hundred, and eighty feet."


I think he means that numbers less than or equal to ten are written as words, and others are written as digits.

Given the many responses, it would be fun to see if an LLM beats humans at understanding the sentence, haha.


To me the main problem is that it should read "numbers greater than ten." I asked Gemini to rephrase it and Gemini produced correct English with the intended meaning:

> Change all numbers between one and ten to words, and write numbers eleven and above as digits in the text.

It even used eleven rather than ten which sounds like counting.


> > ensure that numbers from one to ten as written as words and numbers greater ten as digits in the given text

There are two blue, one red, and 15 green m&ms in this bag.


All of these issues are entirely due to the tokenization scheme. Literally all of them.

You could get this behavior implemented perfectly with constrained text generation techniques like grammars, or with any of the various libraries implementing constrained text generation (e.g. Guidance).


I had briefly looked into Guidance and others (LMQL, Outlines) but I couldn't figure out how to use them for this problem.

I could think of how to use them to prevent the LLM from generating digits for numbers up to ten, by using a regex plus a constraint that forbids those digits, but the main problem is the other part of the rule, i.e. numbers above ten should never be spelled out and should be written as digits instead. For that I presume you need to identify the spelled-out numbers first, for which you would presumably need the LLM, so you're back to LLM fallibility.

Any pointers would be greatly appreciated.
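
One possible angle, offered as a rough sketch and not as something Guidance, LMQL, or Outlines provide: spelled-out numbers in a limited range can be found with a fixed vocabulary and a regex, without asking the LLM. The example below handles only eleven through ninety-nine. The names `TEENS`, `TENS`, `UNITS`, `SPELLED`, and `digits_for_large_numbers` are hypothetical, and anything with "hundred" or "thousand" would need a fuller parser.

```python
import re

TEENS = {"eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}

# Either a tens word with an optional unit ("twenty", "twenty-one",
# "twenty one") or a teen word ("seventeen"), matched on word boundaries.
SPELLED = re.compile(
    r"\b(?:(?P<tens>" + "|".join(TENS) + r")"
    r"(?:[- ](?P<unit>" + "|".join(UNITS) + r"))?"
    r"|(?P<teen>" + "|".join(TEENS) + r"))\b",
    re.IGNORECASE,
)


def digits_for_large_numbers(text: str) -> str:
    """Rewrite spelled-out numbers from eleven to ninety-nine as digits."""

    def replace(match: re.Match) -> str:
        if match.group("teen"):
            return str(TEENS[match.group("teen").lower()])
        value = TENS[match.group("tens").lower()]
        if match.group("unit"):
            value += UNITS[match.group("unit").lower()]
        return str(value)

    return SPELLED.sub(replace, text)


print(digits_for_large_numbers("two blue, one red, and fifteen green"))
print(digits_for_large_numbers("twenty-one, Forty two, seventeen, ten"))
```

A pass like this could run as a post-processing step on the model's output, or as a checker that flags violations. It is purely lexical, so it will also rewrite cases like "fifty-fifty", and it cannot tell when "twenty one" is really "twenty" followed by an unrelated "one".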


You constructed a task that no one understands, and then you even admit that, despite that, it actually succeeds most of the time. Sounds like a massive win for the LLMs to me.



