Hacker News
Tabloid: A clickbait headline programming language (2021) (tabloid.vercel.app)
217 points by ko_pivot 3 months ago | 29 comments



Now this is truly the programming language that we should be using to benchmark LLM code gen in a private held-out set. There are no substantial datasets on the internet or GitHub, and no documentation except the one provided. And that's all the model should need.

I asked GPT-4 to write a mat mul function, but that was too ambitious and it spit out outrageous nonsense.

To be fairer, I gave it in-context access to the documentation in the prompt, along with the Fibonacci example function; aka everything humans have access to. I then asked it to do the simpler task of converting a base 10 integer to binary. It was unable to write something error-free even after 4 rounds of supplying it the error messages.

I repeated this 5 times in case it generates something grammatical in the Top-K@5.

I suspected there was some confusion it couldn't surmount about string manipulation. So I changed the question to something challenging, yet something that only used function calls, conditional logic, basic math ops, and numbers. First, I asked for an nth root approximator using Newton's method. Didn't work. Asked for just the square root. Didn't work. Finally, I asked for a function that prints a student's grade given their integer percentage. Not even.
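For reference, the Newton's method task is tiny in a mainstream language. A minimal Python sketch, assuming a positive input; the name `nth_root`, the starting guess, and the iteration count are illustrative choices, not anything taken from the prompts or outputs described here:

```python
def nth_root(a: float, n: int, iterations: int = 50) -> float:
    """Approximate the n-th root of a positive number with Newton's method.

    Applies the update x <- ((n - 1) * x + a / x**(n - 1)) / n,
    which is Newton's step for f(x) = x**n - a.
    """
    if a <= 0 or n < 1:
        raise ValueError("this sketch only handles a > 0 and n >= 1")
    # Assumption: start at or above the root so the iterates decrease toward it.
    x = max(a, 1.0)
    for _ in range(iterations):
        x = ((n - 1) * x + a / x ** (n - 1)) / n
    return x


if __name__ == "__main__":
    print(nth_root(27, 3))
    print(nth_root(16, 2))
```

Only function calls, a loop, and basic arithmetic are involved, which is why it seemed like a fair test of syntax rather than of the algorithm.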

GPT-4 also persistently hallucinated the keyword BREAKING NEWS, which I think sounds like a pretty good keyword if Tabloid were to ever get error handling.

The spooky part is that almost all the solutions at face value would get partial credit. They had the right abstract approach, being familiar with reams of example approaches in natural language or programming languages. However, in each case, GPT-4, GPT-4o, and Claude all failed to produce something without syntax errors.

I suspect this is the case because transformers do subgraph matching, and while on one end there are rich internal connections for all the problems I requested, on the other end there is nothing similar enough for it to even get a foothold, hence the biggest struggle being syntax. If the only barrier to executing Tabloid code (or other unseen languages) is more basic syntax training, then it excitingly suggests it just needs to learn the abstract concepts from LeetCode scrapes once for every syntax it knows. Prior research has shown that grammar is easy for language models. When GPT-2 was made large enough, it went from babbling to grammatical sentences very early in its training, and at that moment its loss plummeted.

All tests conducted in temporary data mode so that this eval stays dark.


Claude managed to write code successfully.

```

DISCOVER HOW TO square_root WITH x, iterations
RUMOR HAS IT
    EXPERTS CLAIM guess TO BE x DIVIDED BY 2

    DISCOVER HOW TO improve_guess WITH current_guess
    RUMOR HAS IT
        SHOCKING DEVELOPMENT (current_guess PLUS (x DIVIDED BY current_guess)) DIVIDED BY 2
    END OF STORY

    DISCOVER HOW TO iterate WITH current_guess, remaining_iterations
    RUMOR HAS IT
        WHAT IF remaining_iterations SMALLER THAN 1
            SHOCKING DEVELOPMENT current_guess
        LIES! RUMOR HAS IT
            EXPERTS CLAIM new_guess TO BE improve_guess OF current_guess
            SHOCKING DEVELOPMENT
                iterate OF new_guess, remaining_iterations MINUS 1
        END OF STORY
    END OF STORY
    
    SHOCKING DEVELOPMENT iterate OF guess, iterations
END OF STORY

EXPERTS CLAIM number TO BE 16
EXPERTS CLAIM num_iterations TO BE 5

YOU WON'T WANT TO MISS 'The square root of'
YOU WON'T WANT TO MISS number
YOU WON'T WANT TO MISS 'is approximately'
YOU WON'T WANT TO MISS square_root OF number, num_iterations

PLEASE LIKE AND SUBSCRIBE

```
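To sanity-check the logic independently of the Tabloid interpreter, here is a line-by-line Python transliteration of the program above. It is only a sketch: it assumes `DIVIDED BY` is floating-point division and that nested function definitions close over `x` the way Python closures do.

```python
def square_root(x: float, iterations: int) -> float:
    # EXPERTS CLAIM guess TO BE x DIVIDED BY 2
    guess = x / 2

    def improve_guess(current_guess: float) -> float:
        # One Newton (Babylonian) step toward sqrt(x).
        return (current_guess + x / current_guess) / 2

    def iterate(current_guess: float, remaining_iterations: int) -> float:
        # WHAT IF remaining_iterations SMALLER THAN 1
        if remaining_iterations < 1:
            return current_guess
        new_guess = improve_guess(current_guess)
        return iterate(new_guess, remaining_iterations - 1)

    return iterate(guess, iterations)


number = 16
num_iterations = 5
print("The square root of", number, "is approximately",
      square_root(number, num_iterations))
```

Starting from a guess of 8, five improvement steps land within floating-point noise of 4, so the approach itself is sound.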


This is consistent with my own experience that Claude is just downright better than ChatGPT.


Same, I've been pretty impressed as well and typically give Claude a shot. Sometimes I even pass their results back and forth in an LLM collab so they generate more diverse perspectives. However, this paper from 4 days ago shows that Claude can fall apart quickly in out-of-distribution tasks. If you ask opposite day questions, GPT-4 is weirdly strong at it (figure 2).

https://arxiv.org/pdf/2307.02477


Ah bravo! What was the prompt and Claude model?


Great idea here. I wonder if there's potentially more demand for new programming languages now purely as benchmarks for LLMs, like you said?


Maybe they will take on that role too one day


I really think the "please like and subscribe" that ends the program should also be printed out (with a link to the project's GitHub page to make it more... actionable).


I would change BEATS/SMALLER THAN to “DESTROYS” and “HUMILIATED BY”


hahaha laughed hard at this one.


and functions: WHY YOU SHOULD foo WITH bar


I wrote the Racket implementation, in case you want to be able to compile your Tabloid programs: https://github.com/otherjoel/tabloid


Some more discussion from 2020 with author input:

https://news.ycombinator.com/item?id=24578749


Reminds me of ArnoldC[1] from a few years ago.

[1] https://lhartikk.github.io/ArnoldC/


For-Loops should be something like

[n] GOOD REASONS WHY [i =< n]

[thing] HATES THIS THING <----- exception handling

ITS TIME WE TALK ABOUT [x] <----- while-loop


I couldn't believe and was SHOCKED to find out that this was a computer language! Please like and subscribe to learn more.


> Before making Tabloid, I also created a ... boring and unpopular programming language, called Ink.

That line killed me.


Compiler developers hate him.


Whoever built this is a bloody genius


Reminds me of aussie++[1] from a few years ago.

[1] https://github.com/zackradisic/aussieplusplus/


Software engineers don't want you to know this one weird trick()



This seems like both a fun, humorous project and an educational one on programming language and interpreter design.

Motivating, lovely.


That is exactly what I did for the 'A practical approach to parsing' workshop I gave at MCH2022 [1]. You can give it a try with the online IParse Studio [2], which has a simple built-in interpreter, and if you are lazy or get stuck, you can have a look at the grammar I wrote myself [3], which does not specify operator precedence yet.

[1] https://fransfaase.github.io/MCH2022ParserWorkshop/

[2] https://fransfaase.github.io/MCH2022ParserWorkshop/IParseStu...

[3] https://github.com/FransFaase/MCH2022ParserWorkshop/blob/mai...


Looks like FORTH!


I miss the old headlinese. Slam, pan, rip.


Cute


I'm very disappointed that (Number four will shock you) wasn't some kind of break statement or event handling.


This is cursed as shit lol



