Hacker News

Just to add to this, I ran through a lot of these topics around fine-tuning Llama 2 on your own dataset (for me it's my own code :P) in a coding live stream a couple of weeks ago. All on a single Colab GPU.

Fine-tuning Llama stream: https://www.youtube.com/watch?v=TYgtG2Th6fI&t=2282s

I have a couple more, including one where I do a QLoRA fine-tuning session and explain the concepts as a self-taught engineer (software engineer of 8 years, recently moving into ML).

QLoRA fine-tuning stream: https://www.youtube.com/watch?v=LitybCiLhSc&t=4584s

Overall I'm trying to break down how I'm approaching a lot of my personal projects and my current AI-driven startup. I want to make this information as accessible as possible. I also have a series where I'm fine-tuning a model to be the smallest webdev LLM possible, which people seem to be liking. I've only been streaming for about a month and there's plenty more to come.

Ask me any questions about the streams and fine-tuning Llama!




What is the general thought process on when it makes sense to use RAG vs. fine-tuning?

How does segmenting fine-tuned models make sense? Do I need a Terraform LLM, a SQL LLM, and a Python LLM, or can I just use a "code" LLM?


Fine-tuning is for training the model to perform a new task; RAG is for adding knowledge.

In your example, you would fine-tune the model to teach it to code in a language it hasn't seen before; RAG won't really help with that.


I've read that SFT is good for "leveraging existing knowledge" gained during initial pretraining, and helpful in changing the way that the model responds, but not useful for teaching it new knowledge. In your experience is that true?

For example, changing the way in which it responds could be:

  - debate me
  - brainstorm
  - be sarcastic
Which also seems like something that could be accomplished with a system prompt or few-shot examples, so I'm not sure when SFT is the more appropriate approach or what the tradeoffs are.
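To make the prompt-engineering alternative concrete, here's a toy sketch of steering style with a system line plus few-shot examples, no fine-tuning involved (the example Q/A pairs and the function name are made up for illustration):

```python
# Hypothetical few-shot prompt that steers tone without any SFT.
# The examples and wording are placeholders; the point is the prompt shape.
def build_sarcastic_prompt(question: str) -> str:
    examples = [
        ("What's 2+2?", "Oh wow, a real stumper. It's 4."),
        ("Is water wet?", "Groundbreaking question. Yes."),
    ]
    lines = ["You are a sarcastic assistant."]  # system-style instruction
    for q, a in examples:                       # few-shot demonstrations
        lines.append(f"User: {q}\nAssistant: {a}")
    lines.append(f"User: {question}\nAssistant:")  # the actual query
    return "\n\n".join(lines)

print(build_sarcastic_prompt("Why is the sky blue?"))
```

The rough tradeoff as I understand it: few-shot prompting costs context tokens on every call and can drift, while SFT bakes the behavior in but requires building a dataset and training.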

Alternatively, gaining new knowledge would be training it on a dataset of e.g. sports trivia to make it highly effective at answering those types of questions.

P.S. nice username... Irving Fisher would approve.


I have a RAG video (my "make a ChatGPT with podcasts" video) you might be interested in. Semantic search is incredible, and you might be surprised how good a Q/A solution can be just by extracting passages that answer the question.
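The core of that extract-the-passage approach is just ranking passages by embedding similarity to the query. A toy sketch (the vectors here are made up; in practice they'd come from an embedding model such as sentence-transformers):

```python
import math

# Made-up 3-d "embeddings" standing in for real model output.
passages = {
    "Llama 2 can be fine-tuned on a single GPU with QLoRA.": [0.9, 0.1, 0.0],
    "The podcast covered economics and inflation.": [0.1, 0.8, 0.2],
    "Semantic search ranks passages by similarity.": [0.7, 0.2, 0.6],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embedding of the query "how do I fine-tune llama?"
query_vec = [0.85, 0.15, 0.1]
best = max(passages, key=lambda p: cosine(passages[p], query_vec))
print(best)  # the fine-tuning passage ranks highest
```

The retrieved passage(s) then get stuffed into the LLM's prompt as context, which is the whole "R" in RAG.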

Overall it depends on whether you can turn your data into a fine-tuning dataset, and whether you can find a low-enough-parameter model that can take your retrieved contexts as input, either self-hosted or via inference endpoints. Hosting an LLM is actually not easy, and working in the information retrieval business I'm finding OpenAI isn't terrible compared to the cost of running GPUs for your users across the world.


There is an article at the original site about that: https://www.anyscale.com/blog/fine-tuning-is-for-form-not-fa...

Everybody new to this field thinks they need fine-tuning to teach the LLM new facts. I made the same mistake initially; later I published a slightly ranty post on it: https://zzbbyy.substack.com/p/why-you-need-rag-not-finetunin...


Quick question: the Gorilla paper talks about fine-tuning for RAG. Do you see this in practice? Can you do fine-tuning that specifically improves RAG?


Sorry, I don't have much experience myself yet; I'm still at the research phase. But from what I've read, it makes sense to fine-tune the model to better understand the format used for calling external tools, including a search engine.


We really need a simple "put your source files in this directory, press this button, then chat with your content" type of app/module/library.

The amount of implementation detail required makes it inaccessible for all but serious use cases. I imagine privateGPT will get there slowly.


I wrote a simple implementation to do this in ChatGPT via local plugin [0]. Obviously it doesn’t hit the “fully private” requirement but I imagine it would be relatively straightforward to integrate into a local LLM. The question is whether a local LLM would be as good at grabbing enough context and nuance from the project to answer meaningfully as GPT-4 is able to do with plugins.

[0] https://github.com/samrawal/chatgpt-localfiles


In one of my streams I essentially build this from scratch: https://www.youtube.com/watch?v=kBB1A2ot-Bw&t=236s. It's a retriever-reader model. Let me know if you want the code; I think I linked the Colab in the comments, but let me know if you need more.


At this stage of AI, the implementation details matter a lot for the chat to actually be meaningful… RAG is over-hyped.


This is brilliant. Could you do a series about how to prepare custom datasets for fine-tuning? That's the part a lot of other tutorials skip, especially for different goals, like safety, accuracy, etc.


Of course! I have a few streams where I web-scrape and build a dataset for myself with prefix tokens. I can break that down more in a dedicated stream about it.


Well, not so much the raw data acquisition (scraping and such), but really the data prep for fine-tuning. I'm hearing that each model needs it in a different format: chat fine-tuning data is different from instruct data, etc.
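Right, the per-model formatting is mostly about the prompt template each model was trained with. A sketch of two widely published conventions, Alpaca-style instruct vs. Llama 2 chat (treat the exact special tokens as per-model details you should verify against the model card):

```python
def alpaca_format(instruction: str, response: str) -> str:
    # Instruct-style (Alpaca convention): plain-text headers, single turn.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n{response}"
    )

def llama2_chat_format(system: str, user: str, assistant: str) -> str:
    # Chat-style (Llama 2 convention): [INST] / <<SYS>> tokens wrap each turn.
    return (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"{user} [/INST] {assistant} </s>"
    )

print(alpaca_format("Summarize RAG in one line.", "Retrieve, then generate."))
print(llama2_chat_format("You are helpful.", "What is QLoRA?",
                         "4-bit quantized LoRA fine-tuning."))
```

Same underlying (instruction, response) pairs, two different serializations, which is exactly why a dataset prepped for one model family often needs reformatting for another.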


One GPU? Feasible with one 3060?


Absolutely. For QLoRA / 4-bit / GPTQ fine-tuning, you can easily train a 7B model on an RTX 3060 (12 GB VRAM).

If you have a 24 GB VRAM GPU like an RTX 3090/4090, you can QLoRA-finetune a 13B or even a 30B model (in a few hours).
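A back-of-envelope check on why those numbers work out. This is my own rough model, not a measurement: 4-bit base weights at ~0.5 bytes/param, a small fp16 LoRA adapter (parameter count and overhead figures below are assumptions), plus a few GB for activations and CUDA overhead:

```python
def qlora_vram_gb(n_params_b: float, lora_params_m: float = 40.0,
                  overhead_gb: float = 3.0) -> float:
    """Rough QLoRA VRAM estimate in GB; all constants are assumptions."""
    base_gb = n_params_b * 1e9 * 0.5 / 1e9       # 4-bit quantized base weights
    # Adapter weights + gradients + optimizer state, ~2 bytes each.
    lora_gb = lora_params_m * 1e6 * 2 * 3 / 1e9
    return base_gb + lora_gb + overhead_gb

print(f"7B:  ~{qlora_vram_gb(7):.1f} GB")   # comfortably under a 12 GB 3060
print(f"13B: ~{qlora_vram_gb(13):.1f} GB")  # fits a 24 GB 3090/4090
```

Real usage also depends on sequence length, batch size, and gradient checkpointing, so treat this as a sanity check rather than a guarantee.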


It would be good to see a rigorous quality analysis of these PEFT methods. There still seems to be debate about whether they sacrifice quality or not.


+1 this



