A very simple example: “list the Presidents in the order they were born”.

It gets the order wrong unless you tell it to “use Python”.

https://chat.openai.com/share/4a673ea0-67d3-4256-b57d-dc1cf8...


My favorite example is telling it to reverse some longer text character-by-character. Trivial for any human to perform perfectly, but every model I've tested struggles with it and makes mistakes all over. It's really hard for them because they lack hidden state in which to run an algorithm - or what you would call thought in a human. Instead, for each step they essentially have to reconsider the entire input and their past output, figure out what they already did, and work out what to do next.

On the other hand they'll spit out Python code that'll get you the reversed text just fine.
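
A minimal sketch of what that code tends to look like (the slicing idiom here is my illustration, not any model's verbatim output):

    # Reverse a string character-by-character with Python slicing.
    text = "Antarctica receives minimal precipitation."
    print(text[::-1])  # ".noitatipicerp laminim seviecer acitcratnA"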

It's also one of their greatest shortcomings when it comes to coding: they lack the ability to do any meaningful symbolic execution.

https://chat.openai.com/share/9faaae97-e20f-454e-b245-3e4c19...


I'm pretty sure this is a problem relating to tokenization: there are hardly any tokens in the vocabulary that correspond to reversed text, so the model has to assemble it from unusual fragments.
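
You can see this directly with OpenAI's tiktoken library (a sketch, assuming you have tiktoken installed; exact counts depend on the encoding):

    # Compare how a sentence and its reversal tokenize. cl100k_base
    # is the encoding used by GPT-4-era models.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    forward = "receives minimal precipitation."
    backward = forward[::-1]  # ".noitatipicerp laminim seviecer"
    print(len(enc.encode(forward)))   # a handful of word-level tokens
    print(len(enc.encode(backward)))  # many more, mostly short fragments
    print([enc.decode([t]) for t in enc.encode(backward)])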

I simply took the "interesting fact" and added this to the prompt: "Reverse the sentence. Here is the beginning: ".noitatipicerp laminim seviecer"

Here is the result I got:

".noitatipicerp laminim seviecer" (prompt) - "taht ecalp a sa treseted a fo noitinifed eht yb htraE no treseted tsegral eht sa ti seifilaqulc esnapxe tsa v s'acitr nA.serutarepmet gnizeerf sti etipseD.dnali ni sse l ne ve na d t so ac eht g n o l l a ff all wons ) mc 5 . 61( seh cni . 5 6 tuoba fo noitatipicerp launn a egareva htiw, noitatipicerp eltt i l y r e v s eveice r ti esuaceb treseted a sa deifissalc si acitr an t A. e cap s d n a l y ci sti ot eu d dekool revo net fo tneu qer itnA" .a n fe ts erita n o t se r tse degral s'd lro w e ht taht stcaf gnitanicsaf enO".

Manually reversed back into original order:

"One fascinating facts that th e w orl d's larged est r es t o n atire st ef n a. Anti req uent of ten over looked d ue to its ic y l a n d s pac e .A t na rtica is classified as a detesert because it r ecieve s v e r y l i ttle precipitation ,with average a nnual precipitation of about 6 5 . inc hes (16 . 5 cm ) snow lla ff a l l o n g the ca os t d an ev en l ess in iland.Despite its freezing temperatures.An rtica's v ast expanse cluqalifies it as the largest detesert on Earth by the definition of a detesert as a place that" - (prompt) "receives minimal precipitation."


> I'm pretty sure this is a problem relating to tokenization.

I don't think so - because they seem to be able to repeat back any short sequence of characters without issue. If I pick anything from that text they struggled with, manually reverse it, and tell them to repeat the reversed word back to me, that works fine.

It's also not just an issue with reversing something character-by-character. You can ask them to reverse numbers or rearrange words and they'll faceplant in the same way as soon as the input gets beyond a small threshold. Tokenization surely isn't the issue there.

Of course if you trained a network specifically on the task of reversing text it would do quite well, but not because it's running any straightforward algorithm. Nothing like what a human does in that situation can be represented within their network - because they're directed graphs and there's no hidden state available to them.

The point is simply to demonstrate their inability to perform any novel task that requires even a tiny bit of what I dub "thought". By their very implementation they cannot.


> You can ask them to reverse numbers or rearrange words and they'll faceplant in the same way as soon as the input gets beyond a small threshold. Tokenization surely isn't the issue there.

My guess is the training data contains many short pairs of forward and backward sequences, but none after a certain threshold length (due to how quickly the number of possible sequences grows with length). This would imply there's no actual reversing going on, and the LLM is instead using the training data as a lookup table.
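
The combinatorics support that (a quick back-of-the-envelope, counting lowercase letters only as a lower bound):

    # The number of possible character sequences of length n grows far
    # too fast for training data to cover reversal pairs beyond short strings.
    for n in (5, 10, 20, 40):
        print(n, 26 ** n)  # 11881376 at n=5; roughly 4e56 at n=40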


Apparently Claude 3 Opus can do reversal tasks pretty well, even without a code interpreter (or does it use one internally?).

https://twitter.com/AlexTamkin/status/1767248600919355670


Pretty much all of them will be able to fake it on short sentences. All break down eventually (and soon).

Also that's not a reversal task because there was no input. It was free to make up anything that fits.


It’s horrible at relative times too. If you just give it times, it can puzzle them out, but add an event into the mix and it struggles:

https://chat.openai.com/share/5f558fc4-a0d0-494d-a3d7-ad78f5...

More: https://chat.openai.com/share/11c45192-6153-44b4-bb97-024e8d...

“The event at 3pm doesn’t fall within the 2.1-hour window around 5pm because this time window spans from 2:54 pm to 7:06 pm. The 3pm event occurred before the start of this window. Since 3pm is earlier than 2:54 pm, it’s outside the range we’re considering.”

Trillions of tokens!
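
For reference, the arithmetic it fumbled fits in a few lines of datetime code (a sketch using the model's own reading of "2.1-hour window around 5pm" as plus or minus 2.1 hours; the date is arbitrary):

    # A 2.1-hour window either side of 5pm, and whether 3pm falls inside it.
    from datetime import datetime, timedelta

    center = datetime(2024, 1, 1, 17, 0)           # 5:00 pm
    window = timedelta(hours=2.1)                  # 2 h 6 min
    start, end = center - window, center + window
    print(start.time(), end.time())                # 14:54:00 19:06:00 - as it said
    event = datetime(2024, 1, 1, 15, 0)            # 3:00 pm
    print(start <= event <= end)                   # True: 3pm is after 2:54pm

So the model computed the window endpoints correctly and then botched the comparison.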


The first example, with ChatGPT 4:

https://chat.openai.com/share/32335834-9d12-421e-96b2-9aa6f1...

For the second example, I had to tell it to use Python:

https://chat.openai.com/share/76e6cd67-ad49-4508-b05d-3d26a3...


Does the Python code involve calling "get_us_presidents()"?


I couldn’t see how to get the code to show in the shared link myself.

But I did look at the code during the session when I was creating the link. It's just what you would expect - a dictionary of US Presidents and the years they were born, and a one-line call to a built-in Python function to sort the list.
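
Presumably something along these lines (a hypothetical reconstruction - the shared link doesn't display the interpreter's actual code):

    # Hypothetical sketch: presidents keyed to birth year, sorted by year.
    birth_years = {
        "George Washington": 1732,
        "John Adams": 1735,
        "Thomas Jefferson": 1743,
        # ...the rest of the list...
        "Joe Biden": 1942,
    }
    print(sorted(birth_years, key=birth_years.get))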


You can check the code it generated via the link the OP provided (the button to reveal it is not very visible, so I understand if you missed it).



