Just to add to this, I ran through a lot of these topics around fine-tuning Llama 2 on your own dataset (for me it's my own code :P) in a coding live stream a couple of weeks ago, all on a single Colab GPU.
Fine-tuning Llama stream: https://www.youtube.com/watch?v=TYgtG2Th6fI&t=2282s
I have a couple more, including one where I do a QLoRA fine-tuning session and explain the concepts as a self-taught engineer (a software engineer of 8 years recently moving into ML).
QLoRA fine-tuning stream: https://www.youtube.com/watch?v=LitybCiLhSc&t=4584s
Overall I'm trying to break down how I'm approaching a lot of my personal projects and my current AI-driven startup. I want to make this information as accessible as possible. I also have a series where I'm fine-tuning a model to be the smallest web dev LLM possible, which people seem to be liking. I've only been streaming for about a month, and there's plenty more to come.
Ask me any questions about the streams and fine-tuning Llama!
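For anyone who just wants the shape of it before watching, the single-GPU QLoRA setup roughly looks like the sketch below. It's only a sketch: the model name, dataset file, and hyperparameters are placeholders, not the exact values from the stream.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model (gated, needs access approval)

# Load the base weights in 4-bit; this is what lets a 7B model fit on one Colab GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # float16 keeps it compatible with Colab's T4
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)

# LoRA adapters: only these small low-rank matrices get trained, not the full model.
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Placeholder dataset: one JSON record per example with a single "text" field.
dataset = load_dataset("json", data_files="my_dataset.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=1024,
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="llama2-qlora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,  # effective batch of 8 without extra memory
        learning_rate=2e-4,
        num_train_epochs=1,
        logging_steps=10,
    ),
)
trainer.train()
```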
I've read that SFT is good for "leveraging existing knowledge" gained during initial pretraining, and helpful in changing the way that the model responds, but not useful for teaching it new knowledge. In your experience is that true?
For example, changing the way in which it responds could be:
- debate me
- brainstorm
- be sarcastic
That also seems like something that could be accomplished with a system prompt or few-shot examples, so I'm not sure when SFT is the more appropriate approach or what the tradeoffs are.
Alternatively, gaining new knowledge would be training it on a dataset of e.g. sports trivia to make it highly effective at answering those types of questions.
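To make the contrast concrete, here's roughly what I'm picturing for the style case (the examples are made up, just to illustrate the difference):

```python
# Option 1: steer style at inference time with a system prompt plus few-shot examples.
few_shot_prompt = (
    "You are a relentlessly sarcastic assistant.\n"
    "User: What's the weather like?\n"
    "Assistant: Oh, just look out the window, that cutting-edge technology.\n"
    "User: How do I boil an egg?\n"
    "Assistant:"
)

# Option 2: bake the style in with SFT; each record pairs a prompt with a response
# in the desired style, and fine-tuning pushes that behavior into the weights.
sft_records = [
    {"prompt": "What's the weather like?",
     "response": "Oh, just look out the window, that cutting-edge technology."},
    {"prompt": "How do I boil an egg?",
     "response": "Water. Heat. Egg. Truly one of history's great mysteries."},
]
```

The prompt version spends context tokens on every request, while the SFT version costs a training run but keeps the context window free, which is part of what I'm trying to weigh.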
P.S. nice username... Irving Fisher would approve.
I have a RAG video (my "make a ChatGPT with podcasts" video) you might be interested in. Semantic search is incredible, and you might be surprised how good a Q/A solution can be just by extracting passages that answer the question.
Overall it depends on whether you can turn your data into a fine-tuning dataset, and whether you can find a low-enough-parameter model that can take your retrieved contexts as input, either hosting it yourself or using inference endpoints. Hosting an LLM is actually not easy, and working in the field at an information retrieval business, I'm finding OpenAI isn't terrible compared to the cost of running GPUs for your users across the world.
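If it helps, the core of the semantic search approach is only a few lines with sentence-transformers. A minimal sketch (the model name and passages here are assumptions, not what the video actually uses):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast embedding model

passages = [
    "Llama 2 was released by Meta in July 2023 with 7B, 13B and 70B variants.",
    "QLoRA fine-tunes a 4-bit quantized base model with low-rank adapters.",
    "Colab's free tier typically provides a single T4 GPU.",
]
passage_embeddings = model.encode(passages, convert_to_tensor=True)

question = "What GPU do you get on free Colab?"
question_embedding = model.encode(question, convert_to_tensor=True)

# Rank passages by cosine similarity and keep the best match as the answer context.
hits = util.semantic_search(question_embedding, passage_embeddings, top_k=1)[0]
print(passages[hits[0]["corpus_id"]])
```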
Sorry - I don't have much experience myself yet, I'm still at the research phase, but from what I've read it makes sense to fine-tune the model to better understand the format used for calling external tools, including a search engine.
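As a rough illustration of what that kind of fine-tuning data could look like (the <search> tag syntax here is invented purely for the example, not any standard):

```python
# Hypothetical training records teaching the model a search-tool call format:
# first emit a tool call, then answer once the tool result is fed back in.
tool_use_examples = [
    {
        "prompt": "Who won the 2022 World Cup?",
        "response": "<search>2022 World Cup winner</search>",
    },
    {
        "prompt": "Summarize: <result>Argentina won the 2022 World Cup, "
                  "beating France on penalties.</result>",
        "response": "Argentina won the 2022 World Cup, defeating France in a penalty shootout.",
    },
]
```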
I wrote a simple implementation to do this in ChatGPT via a local plugin [0]. Obviously it doesn't hit the "fully private" requirement, but I imagine it would be relatively straightforward to integrate into a local LLM. The question is whether a local LLM would be as good at grabbing enough context and nuance from the project to answer meaningfully as GPT-4 is able to do with plugins.
In one of my streams I essentially build this from scratch: https://www.youtube.com/watch?v=kBB1A2ot-Bw&t=236s. It's a retriever-reader model; let me know if you want the code. I think I linked the Colab in the comments, but let me know if you need more.
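If you just want the shape of the reader half: an extractive QA model takes the passage the retriever found and pulls out the answer span. A small sketch (the model name and passage are assumptions, not the stream's actual code):

```python
from transformers import pipeline

# Assumed extractive QA model; any SQuAD-style reader works the same way.
reader = pipeline("question-answering", model="deepset/roberta-base-squad2")

retrieved_passage = (
    "In the episode, the host recommends starting with a 7B model "
    "because it fits on a single Colab GPU."
)
question = "What model size does the host recommend starting with?"

# The reader returns the answer span plus a confidence score.
answer = reader(question=question, context=retrieved_passage)
print(answer["answer"], answer["score"])
```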
This is brilliant.
Could you do a series about how to prepare custom datasets for fine-tuning? That's the part that a lot of other tutorials skip.
Especially for different goals - like safety, accuracy, etc.
Well, not so much the raw data acquisition (scraping and stuff), but really the data prep for fine-tuning.
I'm hearing that each model needs it in a different format - chat fine-tuning data is different from instruct data, etc.
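For example, my understanding is that the same raw record ends up looking quite different depending on the target format. A sketch of the two templates I keep running into (the record itself is made up):

```python
raw = {"instruction": "Explain what LoRA does in one sentence.",
       "output": "LoRA trains small low-rank adapter matrices instead of the full weights."}

# Llama 2 chat format: the [INST] / <<SYS>> template the chat models were trained on.
llama2_chat = (
    "<s>[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n"
    f"{raw['instruction']} [/INST] {raw['output']} </s>"
)

# Alpaca-style instruct format, used by many instruct fine-tunes.
alpaca_instruct = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{raw['instruction']}\n\n### Response:\n{raw['output']}"
)
```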
It would be good to see a rigorous analysis of how these PEFT methods affect quality. There still seems to be debate on whether they sacrifice quality or not.