But there's really nothing about chess that makes reasoning a prerequisite; a win is a win, however it's achieved. This is partly a semantics game: the question is whether the degree of skill people observe in an LLM playing chess is actually a different quantity from its chance of winning.
I mean at some level you're saying that no matter how close to 1 the win probability (1 - epsilon) gets, both of the following are true:
A. you should expect the computation you can do via conscious reasoning alone to be sufficient, at least in principle, to asymptotically reach a higher win probability than the model, no matter what the model's win probability was to begin with
B. no matter how close to 1 the model's win rate p = (1 - epsilon) gets, because logical inference is so non-smooth, the win rate on yet-unseen data is fundamentally algorithmically random, totally uncorrelated with in-distribution performance, so it's never appropriate to say that the model can understand or reason
To me it seems that people are subject to both of these criteria too, though. They tend to plateau at some skill cap unless a challenge nudges them to a higher level, and likewise possessing logical reasoning doesn't let us say much at all about situations their reasoning is unfamiliar with.
I also think that if you want to say that what LLMs do has nothing to do with understanding or ability, then you also need an alternative explanation for the phenomenon of AlphaGo defeating Lee Sedol being a catalyst for top Go players rapidly raising their own rankings shortly afterward.
Perceptual closure under repeated iterations is then just a stronger form of perceptual losslessness, after k generations instead of the usual k=1. What you’re describing is called generation loss, and there are in fact perceptually lossy image codecs that have essentially no generation loss; JPEG XL is one: https://m.youtube.com/watch?v=FtSWpw7zNkI
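If you want to see generation loss concretely, here's a minimal sketch (assuming Pillow and numpy; the filename, quality setting, and iteration count are arbitrary choices of mine) that re-encodes an image repeatedly and tracks PSNR against the original:

    import io
    import numpy as np
    from PIL import Image

    def psnr(a, b):
        # peak signal-to-noise ratio between two 8-bit images
        mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

    original = Image.open("photo.png").convert("RGB")   # hypothetical input file
    ref = np.asarray(original)

    current = original
    for k in range(1, 11):
        buf = io.BytesIO()
        current.save(buf, format="JPEG", quality=85)    # re-encode with a lossy codec each generation
        buf.seek(0)
        current = Image.open(buf).convert("RGB")
        print(f"generation {k}: PSNR vs original = {psnr(ref, np.asarray(current)):.2f} dB")

A codec with low generation loss is one whose PSNR curve flattens out almost immediately instead of steadily declining.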
Technically it also had a fourth :^) [0], but that was spun out into a separate project of its own, jpegli [1]: JPEG, but using some tricks from JPEG XL. These include spatially adaptive quantization, quantization matrices that better preserve psychovisual detail, more efficient color spaces, and HDR (10+ bit depth) support [2].
Pretty good news! I imagine it'll take a while before libjxl and jpegli stop both supplying a cjpegli binary, so that'll be mildly annoying at the start, but hopefully this way it'll be adopted more quickly, accept more input formats, and image software will switch over to native jpegli export instead of going through the libjpeg-compatible controls.
It's really excellent software. Its default output quality is good enough for long-term storage, while the file size is acceptable for mobile data and cloud storage of pictures in most countries. Producing progressive pictures by default still helps when quickly swiping through a whole album of vacation pictures stored on cloud storage, and its progressive output actually reduces size rather than adding to it. And it's compatible with everything, so now I just run everything lossy I produce through its default settings until JXL becomes natively supported in Chrome and Windows.
I think you might be thinking of applying a kind of low-rank decomposition to the vocabulary embeddings. A quick search on Google Scholar suggests that this might be useful in the context of multilingual tokenization.
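If it helps make that concrete, here's a toy numpy sketch of what a low-rank factorization of the embedding table would look like. The sizes are made up, and in practice the two factors would be learned directly rather than obtained by SVD; the SVD is just the quickest way to show the shapes and the parameter savings.

    import numpy as np

    V, d, r = 5_000, 256, 32             # vocab size, embedding dim, chosen rank (made up)
    E = np.random.randn(V, d)            # stand-in for a trained embedding matrix

    # truncated SVD purely to illustrate; real systems would train A and B directly
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    A = U[:, :r] * S[:r]                 # (V, r) per-token factors
    B = Vt[:r, :]                        # (r, d) shared projection

    print("full params:", V * d)                  # 1,280,000
    print("low-rank params:", V * r + r * d)      # 168,192

    token_id = 123
    embedding = A[token_id] @ B          # reconstructed d-dimensional embedding

The shared projection B is what makes this interesting for multilingual vocabularies: the per-token factors stay cheap even as V grows.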
What I would realllly like is AI autocomplete that is somehow sensitive to my mental map of the code I'm writing. When writing something brand new, there's too often a temptation to get to a semi-working proof of concept as fast as possible with an LLM, and then it's suddenly like 1000 lines of code that I really don't understand.
I used to follow a rule that if I ever use LLM-generated code, I have to type it in manually, so that I'm familiar with every symbol in the code. This helped reduce that problem considerably. But I stopped doing it out of laziness.
4 trustworthy pdfs of very high quality that I've used as resources to fill my gaps:
- The 3 Kevin Murphy textbooks on Probabilistic Machine Learning
- Deep Learning by Goodfellow et al
I think they do contain some of the nuggets you're looking for, and have explanations too. I typically just use the index to find what I'm looking for, but you could definitely pass them through a PDF LLM tool to get natural-language search and answer extraction too, e.g. PocketLLM [0]
Here's a practical example in this vein, but much simpler: if you're trying to answer a question with an LLM and have it answer in JSON format within the same prompt, for many models the accuracy is worse than just having it answer in plaintext. The reason is that you're now placing a bet that the distribution of JSON strings it's seen before meshes nicely with the distribution of answers to that question.
So one remedy is to have it just answer in plaintext, and then use a second, more specialized model that's specifically trained to turn plaintext into JSON. Whether this chain of models works better than just having one model depends entirely on the distribution-match penalties accrued along the chain.
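A rough sketch of that two-stage pattern, with a hypothetical complete() wrapper standing in for whatever model API you're using (the model names are placeholders, not real ones):

    import json

    def complete(prompt: str, model: str) -> str:
        # hypothetical wrapper: call your LLM provider of choice here
        raise NotImplementedError

    def answer_then_structure(question: str) -> dict:
        # stage 1: let the model answer in plain prose, no format constraints
        prose = complete(
            f"Answer the question concisely.\n\nQ: {question}\nA:",
            model="big-general-model",          # placeholder name
        )
        # stage 2: a separate, narrower call whose only job is reformatting
        raw = complete(
            'Convert the following answer into JSON with keys "answer" and "confidence" (0-1).\n\n'
            + prose,
            model="small-structuring-model",    # placeholder name
        )
        return json.loads(raw)

The second stage could be a much smaller model, or no model at all, since it never has to know anything about the question.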
I wrap the plaintext in quotes, and perhaps add a period, so that it knows when to start and when to stop. You can add logit biases for the syntax and pass the period as a stop marker to the ChatGPT APIs.
Also, you don't need to use a model to build JSON from plaintext answers lol, just use a programming language.
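Something like this, assuming the convention from the comment above (quoted plaintext terminated by a period):

    import json

    def to_json(raw_answer: str) -> str:
        # e.g. raw_answer == '"Paris".'  ->  {"answer": "Paris"}
        text = raw_answer.strip().rstrip(".").strip('"')
        return json.dumps({"answer": text})

    print(to_json('"Paris".'))   # {"answer": "Paris"}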
a combination of lots of things, with the general theme being circuits of focused prompts, each good at an individual, specific subtask… off the top of my head:
- prompts that try to extract the factual / knowledge content and update the rest of the system (e.g. if the user tells you in chat not to send notifications after 9pm, ideally you’d like the whole system to reflect that. if they say they like the color gold, you’d like the recommendation system to know that.)
- detect the emotional valence of the user’s chat and raise an alert if they seem, say, angry
- speculative “given this new information, go over old outputs and see if any of our assumptions were wrong or apt and adjust accordingly”
- learning/feedback systems that run evals after every k runs to update and optimize the prompt
- systems where there is a large state space but any particular user has a very sparse representation (pick the top k adjectives for this user out of a list of 100, where each adjective is evaluated in its own prompt)
- llm circuits with detailed QA rules (and/or many rounds of generation and self-reflection to ensure generation/result quality)
- speculative execution in order to trade increased cost / computation for lower latency. (cf. graph of thoughts prompting)
- out of band knowledge generation, prompt generation, etc.
- alerting / backstopping / monitoring systems
- running multiple independent systems in parallel and then picking the best one, or even merging the best results across all of them (rough sketch below)
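a rough sketch of that last pattern, where run_variant and score are made-up stand-ins for real prompt pipelines and a real eval:

    from concurrent.futures import ThreadPoolExecutor

    def run_variant(name: str, task: str) -> str:
        # placeholder: each variant would be its own prompt/model/pipeline
        return f"[{name}] draft answer for: {task}"

    def score(output: str) -> float:
        # placeholder: an eval prompt, a reward model, or simple heuristics
        return float(len(output))

    def best_of(task: str, variants: list[str]) -> str:
        # the real calls are IO-bound, so a thread pool is enough
        with ThreadPoolExecutor(max_workers=len(variants)) as pool:
            outputs = list(pool.map(lambda v: run_variant(v, task), variants))
        return max(outputs, key=score)

    print(best_of("summarize the incident report", ["terse", "detailed", "cautious"]))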
the more, smaller prompts you have, the easier eval and testing become, and the more parallelizable the system is. also, you get stronger, deeper signals that communicate real domain understanding to the user, s.t. they think you know what you’re doing.
but the point is every bit as much that you are embodying your own human cognition within the llm: the reason to do that in the first place is that it becomes virtually infinitely scalable once it makes it out of your brain and onto/into silicon. even if each marginal prompt you add only has a 0.1% chance of “hitting”, you can just trigger 1000 prompts and voila: in expectation, one more eureka moment for your system.
sure, there are diminishing returns, in the same way that the CIA wants 100 Iraq analysts but not 10000. but unlike the CIA, you don’t need to pay salaries, healthcare, managers, etc. it all scales basically linearly, and, besides, is extremely cheap as long as your prompting is even halfway decent.