
You may already be aware of it, but in case not - it sounds like tree-sitter-graph could be something you'd be interested in: https://docs.rs/tree-sitter-graph/latest/tree_sitter_graph/r...

I haven't gotten into it yet but it looks pretty neat, and it's an official tool.


Or, by the fact that the ratio between consecutive Fibonacci numbers approaches phi, just multiply by 1.618? Though at that point you might as well just use the real conversion ratio.

In other news, π² ≈ g.
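
For fun, a quick numeric check of both coincidences (a throwaway Python snippet; I'm assuming the conversion in question is miles to kilometers, where the real factor is ~1.609):

    import math

    # Consecutive Fibonacci ratios converge on phi (~1.618),
    # which happens to sit close to the miles-to-km factor (~1.609).
    a, b = 1, 1
    for _ in range(20):
        a, b = b, a + b
    print(b / a)                    # 1.6180339...
    print((1 + math.sqrt(5)) / 2)   # phi: 1.6180339887...
    print(1.609344)                 # actual miles -> km factor

    # And the other one: pi^2 vs standard gravity.
    print(math.pi ** 2)             # 9.8696...
    print(9.80665)                  # g in m/s^2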


+1 on feeling there's a lot of UX possibilities left on the table. Most seem to have accepted chat as the only means of using LLMs. In particular, I don't think most people realize that LLMs can be used in very powerful ways that just aren't possible with black-box API services as they currently exist. Google kind of has an edge on this area with recent context caching support for Gemini, but that's just one thing. Some things that feel like they could enable new modes of interaction aren't possible at all, like grammar constrained generation and rapid LLM-tool interactions (think a repl or shell rather than function calls; currently you have to pay for the input tokens all over again if you want to use the results of that function call as context and it adds up quickly).

On Copilot, I've been using it since it was public, and have always found it useful, but it hasn't really changed much. There's a chat window now (groundbreaking, I know) and it shows a "processing steps" thing that says it's doing some distinct agentic tasks like collecting context and test run results and what have you, but it doesn't feel like it knows my codebase any better than the cursory description I'd give an LLM without context. I use the jetbrains plugin though, and I understand the vscode extension has some different features, so ymmv.


It does view RAW when compiled with the right flags. JXL too, interestingly. I managed to save a bunch of space on old photos by converting them with cjxl, which I wouldn't have done if I weren't able to view them somehow.


Here's an idea: recursively mount code files/projects. Use something like tree-sitter to extract class and function definitions and make each into a "file" within the directory representing the actual file. Need to get an idea for how a codebase is structured? Just `tree` it :)

Getting deeper into the rabbit hole, maybe imports could be resolved into symlinks and such. Plenty of interesting possibilities!
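
As a very rough sketch of the listing part, here's the idea using the stdlib `ast` module instead of tree-sitter, just to keep it self-contained (the actual FUSE/mount plumbing is left out):

    import ast
    from pathlib import Path

    def virtual_entries(py_file: str) -> list[str]:
        """Map a Python file to the 'directory entries' it would expose:
        one path per class/function, with nested defs as subpaths."""
        tree = ast.parse(Path(py_file).read_text())
        entries: list[str] = []

        def walk(node: ast.AST, prefix: str = "") -> None:
            for child in ast.iter_child_nodes(node):
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                    path = prefix + child.name
                    entries.append(path)
                    walk(child, prefix=path + "/")

        walk(tree)
        return entries

    if __name__ == "__main__":
        # `tree`-ish view of this very file: prints virtual_entries and virtual_entries/walk
        for entry in virtual_entries(__file__):
            print(entry)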


Have you tried asking it for a specific concrete length, like a number of words? I was also frustrated with concise answers when asking for long ones, but I found that the outputs improved significantly if I asked for e.g. 4000 words specifically. Further than that, have it break it down into sections and write X words per section.


Yes, all the possible length-extending custom instructions you can think of, plus multi-shot example prompts using multiple USER and GPT exchanges to define the format. I can get some reasonable-length responses out of it, but I've never seen them go over about a page's worth. It seems like GPT-4 has a hard limit on how much it will output when you click "continue", and Claude Opus never goes over a page either. Another user pointed out using the API, which I have done in the past, but it's been a long while, and I can't really justify the cost of using the advanced models via API for my general use.


Everyone's coalescing at a max of 4096 tokens, roughly 12 "pages", via API (a "page" being 250 words, i.e. one double-spaced 8.5"x11" sheet; 4096 tokens is about 3,000 words, which works out to ~12 pages).

To your point, it doesn't matter anyway: it's nigh impossible to get over 2K tokens of output with every trick and bit of guidance you can think of. I got desperate trying to "make it work" when the 16K context (about 48 pages) came out, and even completely deforming tricks, like making it number each line and write a reminder on each line that it should write 1000 lines, don't work.


My intuition is that a significant part of LLMs' trouble with arithmetic comes down to tokenization. For instance, `1654+73225`, as per the OpenAI tokenizer tool, breaks down into `165•4•+•732•25`, meaning the LLM is incapable of considering digits individually; that is, "165" is a single "word", and its relationship to "4" (and in fact to every other token representing a numerical value) has to be learned. It can't do simple carry operations (or other arithmetic abstractions humans have access to) in the vast majority of cases because its internal representation of text is not designed for this. Arithmetic is easy to do in base 10 or 2 or 16, but it's a whole lot harder in base ~100k, where 99% of the "digits" are words like "cat" or "///////////".
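
For what it's worth, the split is easy to inspect with the tiktoken package; a quick check, assuming the cl100k_base encoding (exact splits vary by encoding/model):

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("1654+73225")
    print([enc.decode([t]) for t in tokens])
    # -> something like ['165', '4', '+', '732', '25'], so the model never
    #    sees aligned digit columns it could carry over.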

Compare that to understanding arbitrary base64-encoded strings; that's much harder for humans to do without tools. Tokenization still isn't _the_ greatest fit for it, but it's a lot more tractable, and LLMs can do it no problem. Even understanding ASCII art is impressive, given they have no innate idea of what any letter looks like, and they "see" fragments of each letter on each line.

So I'm not sure if I agree or disagree with you here. I'd say LLMs in fact have very impressive capabilities to learn logical structures. Whether grammar is the problem isn't clear to me, but their internal representation format obviously and enormously influences how much harder seemingly trivial tasks become. Perhaps some efforts in hand-tuning vocabularies could improve performance in some tasks, perhaps something different altogether is necessary, but I don't think it's an impossible hurdle to overcome.


I don't think that's really how it works - sure, this is true at the first layer of a neural network, but in deep networks, after the first few layers, the LLM shouldn't be 'thinking' in tokens anymore.

The tokens are just the input - the internal representation can be totally different (and that format isn't tokens).


Please don't act like you "know how it works" when you obviously don't.

The issue is not the fact that the model "thinks or doesn't think in tokens". The model is forced at the final sampling/decoding step to convert its latent back into tokens, one token at a time.

The models are fully capable of understanding the premise that they should "output a 5-7-5 syllable haiku", but from the perspective of a model trying to count its own syllables, this is not possible, as its vocabulary is tokenized in such a way that not only does the model lack direct phonetic information within the dataset, it also has no analogue for how humans count syllables (measuring mouth drops). Models can't reason about the number of characters or even tokens used in a reply for the same exact reason.

The person you're replying to is broadly right, and you are broadly wrong. The internal format does not matter when the final decoding step forces a return to tokens. Please actually use these systems rather than pontificating about them online.


Thank god we aren’t talking about a model counting syllables then.


That requires converting from a weird, unhelpful form into a more helpful form first, so yes, but the tokenisation makes things harder as it adds an extra step: they need to learn how these things relate while having significant amounts of the structure hidden from them.


This conversion is inherent in the problem of language and maths though - "Two", "too" (misspelt), 2, duo, dos, $0.02, one apple next to another apple, 0b10 and 二 can all represent the (fairly abstract) concept of two.

The conversion to a helpful form is required anyway (also, let's remember that computers don't work in base 10, and there isn't really a reason to believe that base 10 is inherently great for LLMs either).


It is, but there's a reason I teach my son addition like this:

    hundreds | tens | ones

        1        2      3
    +   2        1      5
    -----------------------
        3        3      8
Rather than

unoDOOOOS(third) {}{}{} [512354]_ = three"ate

* replace {}{}{} with addition; {}{} is subtraction unless followed by three spaces, in which case it's also addition
* translate and correct any misspellings
* [512354]: look up in your tables
* _ is 15
* dotted lines indicate repeated numbers

Technically they're doing the same thing. One we would assume is harder to learn the fundamental concepts from.
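
For what it's worth, the whole algorithm the positional form buys you fits in a few lines; a sketch in Python, digit by digit with a carry, same as the hundreds/tens/ones table:

    def add_by_columns(a: str, b: str) -> str:
        """Add two non-negative integers given as decimal strings,
        column by column with a carry."""
        a, b = a.zfill(len(b)), b.zfill(len(a))  # pad to equal width
        carry, digits = 0, []
        for da, db in zip(reversed(a), reversed(b)):  # ones column first
            carry, d = divmod(int(da) + int(db) + carry, 10)
            digits.append(str(d))
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    print(add_by_columns("123", "215"))  # 338, matching the table above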


Right, which is why testing arithmetic is a good way to gauge how well LLMs generalize their capabilities to non-text tasks. LLMs could in theory be excellent at it, but they aren't, due to how they are trained.


The tokens are the structure over which the attention mechanism is permutation equivariant. That structure permeates the forward pass; it's important at every layer, and will be until we find something better than attention.


I thought of something similar recently, but with a different approach - rather than settrace, it would use a subclass of bdb.Bdb (the standard library base debugger, on top of which Pdb is built) to have the LLM run a real debugging session. It would place breakpoints (or open postmortem sessions after an uncaught exception) to drop into a repl which allows going up/down the frame stack at a given execution point, listing local state for frames, running code in the repl to try out hypotheses or understand the cause of an exception, looking at the methods available for the objects in scope, etc. This is similar to what you get by running the `%debug` magic in IPython after an uncaught exception in a cell (try it out).
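
Roughly the shape I have in mind, as a sketch; `ask_llm` is a placeholder for whatever model backend you'd wire in, and everything else sticks to the stdlib `bdb` API:

    import bdb
    import linecache

    def ask_llm(transcript: str) -> str:
        """Placeholder: send the session transcript to your model and get back
        its next command, e.g. an expression to evaluate or 'continue'."""
        raise NotImplementedError

    class LLMDebugger(bdb.Bdb):
        def __init__(self, breakpoints):
            super().__init__()
            for filename, lineno in breakpoints:
                self.set_break(filename, lineno)

        def user_line(self, frame):
            if not self.break_here(frame):
                # Not one of our breakpoints (e.g. the initial step); keep going.
                self.set_continue()
                return
            code = frame.f_code
            transcript = (
                f"Paused at {code.co_filename}:{frame.f_lineno} in {code.co_name}\n"
                f"Line: {linecache.getline(code.co_filename, frame.f_lineno).strip()}\n"
                f"Locals: {list(frame.f_locals)}\n"
            )
            # Let the model poke around by evaluating expressions in the frame
            # until it decides to resume execution.
            while True:
                command = ask_llm(transcript)
                if command.strip() == "continue":
                    break
                try:
                    result = eval(command, frame.f_globals, frame.f_locals)
                except Exception as exc:
                    result = f"<error: {exc!r}>"
                transcript += f"\n>>> {command}\n{result}"
            self.set_continue()

    # Usage sketch:
    # dbg = LLMDebugger([("myscript.py", 42)])
    # dbg.run("import myscript; myscript.main()")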

The quick LLM-input/repl-output loop is more suitable for local models though, where you can control the hidden state cache, get lower latency, and enforce a grammar to keep it from going off the rails (i.e. restricting it to the commands implemented for interacting with the debugger) - which afaik you can't do with services like OpenAI's. This is something I'd like to see more of - having low-level control of a model gives qualitatively different ways of using it which I haven't seen people explore much.


So, interestingly enough, we first tried letting GPT interact with pdb through just a set of directed prompts, but we found that it kept hallucinating commands, not responding with the correct syntax, and really struggling with line numbers. That's why we pivoted to just getting all the relevant data GPT could need upfront and letting it synthesize that data into a single root cause.

I think we're going to explore the local model approach though - you raise some really great points about having more granular control over the state of the model.


Interesting! Did you try the function calling API? I feel you on the line number troubles; it's hard to get something consistent there. Using diffs with GPT-4 isn't much better in my experience; I didn't extensively test that, but from what I did, it rarely produced syntactically valid diffs that could just be sent to `patch`. One approach I started playing with was using tree-sitter to add markers to the code and let the LLM specify marker ranges for deletion/insertion/replacement, but alas, I got distracted before fully going through with it.

In any case, I'll keep an eye on the project, good luck! Let me know if you ever need an extra set of hands, I find this stuff pretty interesting to think about :)


I actually coded something very close to this and it worked surprisingly well: https://github.com/janpf/debuggAIr


Ooh, interesting - starred and going to dig into this later today!


I've done a manual version of this with chatgpt.

I had ipdb open and told it to request any variables I should look at, suggest what to do next, and say what it would expect. It was quite good, but it took a lot of persuading; just having an LLM that was more tuned to this would be better.


For all the brilliance in the AI and infra departments of OpenAI, their official Python library (which is the flagship one as I understand) feels pretty unidiomatic, designed without much thought for common patterns in the language.

2012 JavaScript called, it wants its callbacks wrapped in objects back. Why do we have a context manager named "stream" for which you call `.until_done()`? This could've been an iterator, or better - an asynchronous iterator, since this is streaming over the network. We could be destructuring instances of named tuples with pattern matching, or even just doing `"".join(delta.text for delta in prompt(...))`. But no, subclass this instead, says the wrapper around a web API.
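
To make that concrete, here's the rough shape I'd have expected, with a fake event stream standing in for the network bit; the event types and names here are entirely made up for illustration, not the actual openai SDK:

    import asyncio
    from typing import NamedTuple

    # Hypothetical event types; the real SDK's models would differ.
    class TextDelta(NamedTuple):
        text: str

    class RunCompleted(NamedTuple):
        run_id: str

    async def fake_event_stream():
        # Stand-in for what the client would yield as the response streams in.
        for chunk in ("Hello", ", ", "world"):
            yield TextDelta(chunk)
        yield RunCompleted("run_123")

    async def main():
        async for event in fake_event_stream():
            match event:
                case TextDelta(text=text):
                    print(text, end="", flush=True)
                case RunCompleted(run_id=run_id):
                    print(f"\n[done: {run_id}]")

    asyncio.run(main())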


Hey there, I helped design the Python library.

The `stream` context manager actually does expose an async iterator (in the async client), so you could instead do this for the simple case:

    async with client.beta.threads.runs.create_and_stream(…) as stream:
      async for text in stream.text_deltas:
        print(text, end="", flush=True)
which I think is roughly what you want.

Perhaps the docs should be updated to highlight this simple case earlier.

We are also considering expanding this design, and perhaps replacing the callbacks, like so:

    async with client.beta.threads.runs.create_and_stream(…) as stream:
      async for event in stream.all_events:
        if event.type == 'text_delta':
          print(event.delta.value, end='')
        elif event.type == 'run_step_delta':
          event.snapshot.id
          event.delta.step_details...
which I think is also more in line with what you expect. (You could also do `match event: case TextDelta(): …`.)

Note that the context manager is required because otherwise there's no way to tell if you `break` out of the loop (or otherwise stop listening to the stream), which means we can't close the request (and you'd both keep burning tokens and leak resources in your app).


Context managers are a great abstraction.


Everything feels unidiomatic. The API design is bad, the frontends they build are horrific, reliability and availability are shocking.

And yet the AI is so good I put up with them every day.

If they ever grow into a proper product org they'll be unstoppable.


Hi there, I help design the OpenAI APIs. Would you be able to share more?

You can reply here or email me at atty@openai.com.

(Please don't hold back; we would love to hear the pain points so we can fix them.)


does your team do usability tests on the apis before launching them?

if you got 3-5 developers to try and use one of the sdks to build something, i bet you'd see common trends.

e.g. we recently had to update an assistant with new data every day and get one response, and this is what the engineer came up with. it could probably be improved, but this is really ugly:

```
  const file = await openai.files.create({
   file: fs.createReadStream(fileName),
   purpose: 'assistants',
  })

  await openai.beta.assistants.update(assistantId, {
   file_ids: [file.id],
  })

  const { id: threadId } = await openai.beta.threads.create({
   messages: [
    {
     role: 'user',
     content:
      'Create PostSuggestions from the file. Remember to keep the style fun and engaging, not just regurgitating the headlines. Read the WHOLE article.',
    },
   ],
  })
  const getSuggestions = async (runIdArg: string) => {
   return new Promise<PostSuggestions>(resolve => {
    const checkStatus = async () => {
     const { status, last_error, required_action } = await openai.beta.threads.runs.retrieve(threadId, runIdArg)

     console.log({ status })
     if (status === 'requires_action') {
      if (required_action?.type === 'submit_tool_outputs') {
       required_action?.submit_tool_outputs?.tool_calls?.forEach(async toolOutput => {
        const parsed = PostSuggestions.safeParse(JSON.parse(toolOutput.function.arguments))
        if (parsed.success) {
         await openai.beta.threads.runs.cancel(threadId, runIdArg)
         resolve(parsed.data)
        } else {
         console.error(`failed to parse args from openai to my type (errors=${parsed.error.errors}`)
        }
       })
      } else {
       console.error(`requires_action, but not submit_tool_outputs (type=${required_action?.type})`)
      }
     } else if (status === 'completed') {
      throw new Error(`status is completed, but no data. supposed to go to requires_action`)
     } else if (status === 'failed') {
      throw new Error(`message=${last_error?.message}, code=${last_error?.code}`)
     } else {
      setTimeout(checkStatus, 500)
     }
    }

    checkStatus()
   })
  }
  const { id: runId } = await openai.beta.threads.runs.create(threadId, {
   assistant_id: assistantId,
  })
  console.time('openai create thread')
  const newsSuggestions = await getSuggestions(runId)
  console.timeEnd('openai create thread')
```


just to add to this, it's not helped by the docs. either they don't exist, or the seo isn't working right.

e.g. search term for me: "openai assistant service function call node". The first 2 results are community forums, not what I'm looking for. The 3rd is seemingly the official one but doesn't actually answer the question (how to use the assistant service with node and function calling) with an example. The 4th is in python.

https://community.openai.com/t/how-does-function-calling-act...

https://community.openai.com/t/how-assistant-api-function-ca...

https://platform.openai.com/docs/guides/function-calling

https://learn.microsoft.com/en-us/azure/ai-services/openai/h...


I'm sorry for your experience, and thanks very much for sharing the code snippet - that's helpful!

We did indeed code up some sample apps and highlighted this exact concern. We have some helpers planned to make it smoother, which we hope to launch before Assistants GA. For streaming beta, we were focused just on the streaming part of these helpers.


Hey, random question.

Is there a technical reason why log probs aren't available when using function calling? It's not a problem, I've already found a workaround. I was just curious haha.

In general I feel like the function calling/tool use is a bit cumbersome and restrictive so I prefer to write the typescript in the functions namespace myself and just use json_mode.


Have you seen/tried the `.runTools()` helper?

Docs: https://github.com/openai/openai-node?tab=readme-ov-file#aut...

Example: https://github.com/openai/openai-node/blob/bb4bce30ff1bfb06d...

(if what you're fundamentally trying to do is really just get JSON out, then I can see how json_mode is still easier).


Who can I reach out to for feedback on the web UI? Specifically, the chat.openai.com interface.

Web developer/designer for 24 years so I have a lot of ideas


...except for all the others.

Use Claude in Safari and the browser completely locks up after a single response.


My experience is their official Python library was easy to use, no surprises, everything is typed and generated from the OpenAPI spec in a thoughtful way.

The tools are great because they don't invent their own DSL, they "just" use JSON schemas.

Maybe they ought to contribute changes to OpenAPI to support streaming APIs better.

In contrast, so many startups make their own annotation-driven DSLs for Python with their branding slapped over everything. It gives desperate-for-lock-in vibes. The last place OpenAI should be taking API design advice from is this forum.


How is suggesting the use of iterators and named tuples related to creating domain-specific languages? If anything, I'd say they're a much more generic and universally recognizable approach than having users subclass `AssistantEventHandler` to be passed to `client.beta.threads.runs.create_and_stream`, the context manager. That is already a long way past just using JSON schemas, but that part is fine: there's a REST API, and there's a library. If you're keen on the simplicity of JSON schemas, then by all means use the API with `requests` or your preferred HTTP client library. Since that's always an option, it stands to reason that the point of having a dedicated library is to provide thoughtful abstractions that make it easier to use the service.

What I'm arguing is precisely that the abstractions in the library (such as the `AssistantEventHandler` shown in the article) are ineffective at making things simpler. They force you to over-engineer solutions, distribute state unnecessarily, and be aware of that specific class interface, when it could've just been something you use in a `for x in y` loop, like everyone would know to do without spending an afternoon looking over the docs and figuring out how the underlying implicit FSM works.


Probably written by GPT4


It’s not the case. The SDK is a collaboration between OpenAI and Stainless.

https://www.stainlessapi.com/

As a Stainless contributor I can guarantee you a lot of thought has been put into the design, and it definitely wasn't written by an ML model.


Meta seems to actually be taking all the right steps in how they're contributing to open source AI research. Is this a "commoditize your complement" kind of situation?

