I'm really glad to see someone else doing something similar. I had the epiphany a while ago that if LLMs can interpret textual instructions to draw a picture and output the design in another textual format, that's a strong indicator that they're more than just stochastic parrots.

My personal test has been "A horse eating apples next to a tree," but the deliberate absurdity of your example is a much more useful test.
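
For anyone who wants to try the same test, here's a minimal sketch of what I mean, assuming the openai Python package (v1+) and an OPENAI_API_KEY in the environment; the model name is just an example:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Ask for the drawing in a textual format (SVG) instead of pixels.
    response = client.chat.completions.create(
        model="gpt-4",  # example choice; substitute whatever you have access to
        messages=[{
            "role": "user",
            "content": "Draw 'a horse eating apples next to a tree' as an "
                       "SVG image. Reply with only the SVG markup.",
        }],
    )

    svg_markup = response.choices[0].message.content
    with open("horse.svg", "w") as f:
        f.write(svg_markup)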

Do you know if this is a recognized technique that people use to study LLMs?




I've seen people using "draw a unicorn using tikz" https://adamkdean.co.uk/posts/gpt-unicorn-a-daily-exploratio...


I did some experiments of my own after this paper, letting GPT-4 run wild and pick its own scene. It wanted to draw a boat on a lake, and I also asked it to throw in some JS animations, so it made the sun set:

https://int19h.org/chatgpt/lakeside/index.html

One interesting thing I found out while doing this is that if you ask GPT-4 to produce SVG suitable for use in HTML, it will often just generate base64-encoded data: URIs directly, which do contain valid SVG inside as requested.
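
You can verify that mechanically; here's a quick standard-library sketch that pulls the base64 payload out of such a data: URI and checks that it parses as SVG (the URI below is a made-up minimal example, not actual GPT-4 output):

    import base64
    import re
    import xml.etree.ElementTree as ET

    # Made-up minimal example of the kind of data: URI GPT-4 emits.
    data_uri = ("data:image/svg+xml;base64,"
                "PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==")

    payload = re.match(r"data:image/svg\+xml;base64,(.+)", data_uri).group(1)
    svg_bytes = base64.b64decode(payload)

    root = ET.fromstring(svg_bytes)  # raises ParseError if not well-formed XML
    print(root.tag)  # '{http://www.w3.org/2000/svg}svg' confirms valid SVG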


That came, IIRC, from one of the Microsoft researchers (Sébastien Bubeck); it was recounted in the This American Life episode "Greetings, People of Earth":

https://www.thisamericanlife.org/803/transcript


It's in this presentation: https://www.youtube.com/watch?v=qbIk7-JPB2c

The most significant part I took away is that when safety "alignment" was applied, the drawing ability plummeted. That really makes me wonder how much better these models would be if they weren't lobotomized to prevent them from saying bad words.


But how would that prove it's more than a stochastic parrot? Honestly curious.

Isn't it just like any other kind of conversion or translation? I.e., a relationship mapping between different domains, and just as much parroting "known" paths between parts of those domains?

If "sun" is associated with "round", "up high", "yellow","heat" in english that will map to those things in SVG or in whatever bizarre format you throw at with relatively isomorphic paths existing there just knitted together as a different metamorphosis or cluster of nodes.

On a tangent, it's interesting to ask what constitutes the heaviest nodes in the data: how shared is "yellow" or "up high" between different domains, and what sits above and below them hierarchically, weight-wise? Is there a single heaviest "thing" in the entire dataset?

If you dumped a heatmap of a description of the sun and of an SVG of a sun - of the neuron/axon-like cloud of data in some model - would they look similar in some way?
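
A crude version of that probe is possible with off-the-shelf text embeddings. A sketch assuming the sentence-transformers package (the model choice is mine, and this compares an embedding model's vectors rather than a heatmap of an LLM's internal activations, so it's only in the spirit of the question):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

    description = "A bright yellow sun, round, high up in the sky."
    svg_sun = ('<svg xmlns="http://www.w3.org/2000/svg">'
               '<circle cx="50" cy="20" r="15" fill="yellow"/></svg>')

    # Cosine similarity between the two representations of "a sun".
    emb = model.encode([description, svg_sun])
    print(util.cos_sim(emb[0], emb[1]).item())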


That's a huge stretch for parroting.


Not sure if this counts. I recently went from a screenshot of a graph to a description, then had the model generate pandas code to reproduce the plot from that description. Conceptually it was accurate.

I don't think it reflects any understanding. But going from a screenshot to conceptually accurate, working code was impressive.
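
For flavour, the round trip produces code along these lines; this is a toy reconstruction with made-up data, not the actual generated output:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Made-up data standing in for whatever the screenshot showed.
    df = pd.DataFrame({
        "month": ["Jan", "Feb", "Mar", "Apr"],
        "sales": [120, 135, 128, 150],
    })

    # The kind of plotting code the description round-trip yields.
    ax = df.plot(x="month", y="sales", kind="bar", legend=False)
    ax.set_ylabel("sales")
    plt.tight_layout()
    plt.show()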



