> Additionally, while this wasn’t an issue for GPT, the Llama chat models would often output hundreds of miscellaneous tokens that were unnecessary for the task, further slowing down their inference time (e.g. “Sure! Happy to help…”).
That's the problem I've been facing with Llama 2 as well. It's almost impossible to have it just output the desired text. It will always add something before and after its response. Does anyone know if there's any prompt technique to fix this problem?
It's not useful for code, but you can see the difference in approach with NovelAI's homegrown Kayra model, which is set up to handle a mix of text completion and instruct functionality. It never includes extraneous prefix/suffix text and will smoothly follow instructions embedded in text without interrupting the text.
I wonder if LLMs will have less reasoning power if they simply return the output. AFAIK, they think by writing their thoughts. So forcing an LLM to just return the goddamn code might limit its reasoning skills, leading to poor code. Is that true?
Potentially it could have an impact if it omits a high-level description before writing the code, although obviously things like "Sure! Happy to help" do not help.
In practice I haven't seen it make too much of a difference with GPT. The model can still use comments to express itself.
For non-coding tasks, adding "Think step by step" makes a huge difference (versus YOLOing a single-word reply).
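For illustration, here's roughly what the two styles look like; `ask` is just a hypothetical stand-in for whatever completion call you actually use, and the toy question is made up:

```python
# Toy illustration of "think step by step" vs. a bare one-word answer.
# `ask` is a hypothetical stand-in for whatever completion client you use.
question = (
    "A bat and a ball cost $1.10 together, and the bat costs $1.00 "
    "more than the ball. How much does the ball cost?"
)

yolo_prompt = question + "\nReply with only the final number."
cot_prompt = question + "\nThink step by step, then give the final number on the last line."

# ask(yolo_prompt)  # models often blurt out the tempting-but-wrong 0.10
# ask(cot_prompt)   # with the working written out, 0.05 is much more likely
```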
> although obviously things like "Sure! Happy to help" do not help.
Yes you're right. I'm mostly concerned with the text that actually "computes" something before the actual code begins. Niceties like "sure! happy to help" don't compute anything.
CoT indeed works. Now I've seen people take it to the extreme by having tree of thoughts, forest of thoughts, etc., but I'm not sure how much "reasoning" we can extract from a model that is obviously limited in terms of knowledge and intelligence. CoT already gets us 80% of the way. With some tweaks it can get even better.
I've also seen simulation methods where GPT "agents" talk to each other to form better ideas about a subject. But then again, it's like trying to achieve perpetual motion in physics. One can't get more intelligence from a system than one puts in the system.
> But then again, it's like trying to achieve perpetual motion in physics. One can't get more intelligence from a system than one puts in the system.
Not necessarily the same thing, as you're still putting in more processing power/checking more possible paths. It's kinda like simulated annealing: sure, the system is dumb, but as long as checking whether you have a correct answer is cheap, it still narrows down the search space a lot.
Yeah I get that. We assume there's X amount of intelligence in the LLM and try different paths to tap into that potential. The more paths are simulated, the closer we get to the LLM's intelligence asymptote. But then that's it—we can't go any further.
You can also just parse the text for all valid code blocks and combine them. I have a script which automatically checks the clipboard for this (rough sketch below).
There's no reason to handle this on the LLM side of things, unless you want to try to optimize how many of the tokens are code vs. comments vs. explanations and such. (Though you could also just start a new context window with only your code.)
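A minimal sketch of the code-block extraction, assuming the pyperclip package for clipboard access (swap in whatever text source you like):

```python
# Minimal sketch of the "parse out the code blocks" approach described above.
# Assumes the pyperclip package for clipboard access.
import re
import pyperclip

FENCE_RE = re.compile(r"```[^\n]*\n(.*?)```", re.DOTALL)

def extract_code_blocks(text: str) -> list[str]:
    """Return the contents of every ```fenced``` block, in order."""
    return [m.group(1) for m in FENCE_RE.finditer(text)]

if __name__ == "__main__":
    reply = pyperclip.paste()           # grab the LLM reply from the clipboard
    blocks = extract_code_blocks(reply)
    print("\n\n".join(blocks))          # everything outside the fences is dropped
```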
The model card also has prompt formats for context-aware document Q&A and multi-CoT; using those correctly improves performance on such tasks significantly.
Llama-2-chat models have been fine-tuned so heavily that they behave like this. You can give few-shot prompting a try, but it still doesn't guarantee the desired output. The most reliable way is to fine-tune on a small (~1k examples) dataset and go from there.
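For what it's worth, a few-shot attempt might look something like the sketch below. The [INST]/<<SYS>> markup follows Meta's published Llama-2-chat template; the example question/answer pairs are made up, and as said, this still isn't a guarantee.

```python
# Hedged sketch: few-shot prompting Llama-2-chat to answer with code only.
# The [INST]/<<SYS>> tags follow Meta's chat template; the example pairs
# are invented purely for illustration.
SYSTEM = "Reply with code only. No explanations, no greetings."

EXAMPLES = [
    ("Write a Python function that reverses a string.",
     "def reverse(s):\n    return s[::-1]"),
    ("Write a Python function that squares a number.",
     "def square(x):\n    return x * x"),
]

def build_prompt(question: str) -> str:
    first_q, first_a = EXAMPLES[0]
    parts = [f"<s>[INST] <<SYS>>\n{SYSTEM}\n<</SYS>>\n\n{first_q} [/INST] {first_a} </s>"]
    for q, a in EXAMPLES[1:]:
        parts.append(f"<s>[INST] {q} [/INST] {a} </s>")
    parts.append(f"<s>[INST] {question} [/INST]")
    return "".join(parts)

print(build_prompt("Write a Python function that checks whether a number is prime."))
```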
It depends on what your goal is, but I've had success reproducing specific output formatting by fine-tuning the base LLaMA2 models instead of the RLHF'd models. My use cases were simpler—information extraction/synthesis from text rather than creative writing—so the base models might not be a good fit for your task.
Prompt the model to always output answers/code within ```content``` fences or as JSON. If it's JSON, you can identify where it starts and ends and strip everything outside it.
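Something like this works as a rough post-processing step, assuming you prompted the model to wrap its answer in a single JSON object:

```python
# Rough sketch of the "strip everything outside the json" idea, assuming the
# model was asked to wrap its answer in a single JSON object.
import json

def extract_json(reply: str) -> dict:
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])  # niceties before/after are discarded

# extract_json('Sure! Happy to help. {"code": "print(1)"} Let me know!')
# -> {'code': 'print(1)'}
```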