
My nuclear fire hot take is that the chat pattern is actively hampering AI tools, because everything has to be forced square-peg-into-round-hole style: either into the chat UI (because that's what people expect), or, as a developer, into the chat API patterns.

Last night I wrote an implementation of an AI paper and it was so much easier to just discard the automatic chat formatting and do it "by hand": https://github.com/Xe/structured-reasoning/blob/main/index.j...

I wonder if foundation models are an untapped goldmine in terms of the things they can do, but we can't surface them to developers because everyone's stuck in the chat pattern.






Whoa! You broke my brain a bit there (but your posts often do, in a Good way!)

Would you be so kind as to ELI5 what you did in that index.js?

I've used ollama to run models locally, but I'm still stuck in chat-land.

Of course, if a blog post is in the works, I'll just wait for that :)


The file explains it a bit, but my blogpost https://xeiaso.net/notes/2025/s1-simple-test-time-scaling/ could probably be better explained. I'll write out more but just for you I'll summarize what I'm gonna end up writing up.

AI models fundamentally work on the basis of "given what's before, what comes next?" When you pass messages to an API like:

    [
      { "role": "system", "content": "You are an expert in selling propane and propane accessories. Whenever someone talks about anything that isn't propane, steer them back." },
      { "role": "user", "content": "What should I use to cook food on my grill?" },
      { "role": "assistant", "content": "For cooking food on your grill, using propane is a great choice due to its convenience and efficiency. [...]" }
    ]
Under the hood, the model actually sees something like this (using the formatting that DeepSeek's Qwen 2.5 32b reasoning distillation uses):

    You are an expert in selling propane and propane accessories. Whenever someone talks about anything that isn't propane, steer them back.
    <|User|>What should I use to cook food on my grill?<|endofsentence|>
    <|Assistant|>
And then the model starts generating tokens to get you a reply. What the model returns is something like:

    For cooking food on your grill, using propane is a great choice due to its convenience and efficiency. [...]<|endofsentence|>
The runtime around the model then appends that as the final "assistant" message and sends it back to the user so there's a façade of communication.
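To make that flattening step concrete, here is a minimal sketch of how a runtime might turn role/content messages into one prompt string. `renderChat` is a hypothetical helper, not something from index.js or any library, and the special tokens are simply copied from the example above; real chat templates differ per model and usually live in the model's tokenizer config.

```javascript
// Hypothetical helper: flattens chat messages into a single prompt string.
// Token spellings are taken from the example above and are an assumption;
// check the actual chat template of the model you are running.
function renderChat(messages) {
  let prompt = "";
  for (const { role, content } of messages) {
    if (role === "system") {
      // In this format the system prompt is plain text at the top.
      prompt += content + "\n";
    } else if (role === "user") {
      prompt += `<|User|>${content}<|endofsentence|>\n`;
    } else if (role === "assistant") {
      prompt += `<|Assistant|>${content}<|endofsentence|>\n`;
    } else {
      throw new Error(`unknown role: ${role}`);
    }
  }
  // Leave an open assistant turn so the model continues from here.
  return prompt + "<|Assistant|>";
}
```

Calling `renderChat` with the system and user messages from the propane example should give back text shaped like the three-line prompt shown above, ending in an open `<|Assistant|>` turn.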

What I'm doing here is manually assembling the context window so that I can take advantage of that and then nudge the model into thinking more. The basic context window looks like:

    Follow this JSON schema: [omitted for brevity]
    <|User|>Tell me about Canada.<|endofsentence|>
    <|Assistant|><think>Okay
And then the model will output reasoning steps until it sends a </think> token, which can be used to tell the runtime that it's done thinking and to treat any tokens after that as the normal chat response. However, sometimes the model stops thinking too soon, so what you can do is intercept this </think> token and then append a newline and the word "Wait" to the context window. Then when you send it back to the model, it will second-guess and double-check its work.
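The intercept-and-append loop described above can be sketched roughly like this. It is not the code from index.js: `thinkLonger` is a hypothetical name, and `complete` is an assumed async function `(context, { stop })` that returns the raw completion text and halts before emitting any of the given stop strings. You would have to wire it up to whatever raw (non-chat) completion endpoint your runtime offers.

```javascript
// Hypothetical sketch of the "Wait" trick. `complete` is an assumed
// async (context, { stop }) => string that returns raw completion text
// and stops before any of the stop strings; it is not a real library API.
async function thinkLonger(complete, prompt, rounds) {
  // Prefill the start of the reasoning block, as in the example above.
  let context = prompt + "<think>Okay";
  for (let i = 0; i < rounds; i++) {
    // Let the model reason until it tries to close the think block.
    context += await complete(context, { stop: ["</think>"] });
    if (i < rounds - 1) {
      // Swallow the </think> and append "Wait" so the model keeps reasoning.
      context += "\nWait";
    }
  }
  // After the last round, close the think block ourselves.
  context += "</think>";
  // Everything generated from here on is the normal chat response.
  const answer = await complete(context, { stop: [] });
  return { context, answer };
}
```

With `rounds` set to 3 this makes three reasoning passes joined by two "Wait" insertions, then one final call for the answer. A real version would probably also want a token budget so a model that never emits `</think>` cannot run forever.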

The paper s1: Simple test-time scaling (https://arxiv.org/abs/2501.19393) concludes that this is probably how OpenAI implemented the "reasoning effort" slider for their o1 API. My index.js file applies this principle and has DeepSeek's Qwen 2.5 32b reasoning distillation think for three rounds of effort and then output some detailed information about Canada.

In my opinion, this is the kind of thing that people need to be more aware of, and the kind of stuff that I use in my own research for finding ways to make AI models benefit humanity instead of replacing human labor.


Thank You so much for making time to write that up! Deeply appreciated.

It's fascinating how this "turn-taking protocol" has emerged in this space -- as a (possibly weird) analogy, different countries don't always use the same electrical voltage or plug/socket form-factor.

Yet the `role` and `content` attributes in JSON appear to be pretty much a de facto standard now.



