Hacker News

Just to add to this, I ran through a lot of these topics around fine-tuning Llama 2 on your own dataset (for me it's my own code :P) in a coding live stream a couple of weeks ago. All on a single Colab GPU.

Fine-tuning Llama stream: https://www.youtube.com/watch?v=TYgtG2Th6fI&t=2282s

I have a couple more, including one where I do a QLoRA fine-tuning session and explain the concepts as a self-taught engineer (software engineer of 8 years, recently moving into ML).

QLoRA fine-tuning stream: https://www.youtube.com/watch?v=LitybCiLhSc&t=4584s

Overall I'm trying to break down how I'm approaching a lot of my personal projects and my current AI-driven startup. I want to make this information as accessible as possible. I also have a series where I'm fine-tuning a model to be the smallest webdev LLM possible, which people seem to be liking. I've only been streaming for about a month and there's plenty more to come.

Ask me any questions about the streams and fine-tuning Llama!




What is the general thought process on when it makes sense to use RAG vs. fine-tuning?

How does segmenting fine-tuned models make sense? Do I need a Terraform LLM, a SQL LLM, and a Python LLM, or can I just use a "code" LLM?


Fine-tuning is for training the model to perform a new task; RAG is for adding knowledge.

In your example, you would fine-tune the model to teach it to code in a language it hasn't seen before; RAG won't really help with that.


I've read that SFT is good for "leveraging existing knowledge" gained during initial pretraining, and helpful in changing the way that the model responds, but not useful for teaching it new knowledge. In your experience is that true?

For example, changing the way in which it responds could be:

  - debate me
  - brainstorm
  - be sarcastic
Which also seems like something that could be accomplished with a system prompt or few-shot examples, so I'm not sure when SFT is the more appropriate approach or what the tradeoffs are.
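To make the prompt-engineering alternative concrete, here's a toy sketch of steering style with a system line plus few-shot examples, no fine-tuning involved (the example Q/A pairs and the function name are made up for illustration):

```python
# Hypothetical few-shot prompt that steers tone without any SFT.
# The examples and wording are placeholders; the point is the prompt shape.
def build_sarcastic_prompt(question: str) -> str:
    examples = [
        ("What's 2+2?", "Oh wow, a real stumper. It's 4."),
        ("Is water wet?", "Groundbreaking question. Yes."),
    ]
    lines = ["You are a sarcastic assistant."]  # system-style instruction
    for q, a in examples:                       # few-shot demonstrations
        lines.append(f"User: {q}\nAssistant: {a}")
    lines.append(f"User: {question}\nAssistant:")  # the actual query
    return "\n\n".join(lines)

print(build_sarcastic_prompt("Why is the sky blue?"))
```

The rough tradeoff as I understand it: few-shot prompting costs context tokens on every call and can drift, while SFT bakes the behavior in but requires building a dataset and training.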

Alternatively, gaining new knowledge would be training it on a dataset of e.g. sports trivia to make it highly effective at answering those types of questions.

P.S. nice username... Irving Fisher would approve.


I have a RAG video (my "make a ChatGPT with podcasts" video) you might be interested in. Semantic search is incredible, and you might be surprised how good a Q/A solution can be just by extracting passages that answer the question.
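The core of that extract-the-passage approach is just ranking passages by embedding similarity to the query. A toy sketch (the vectors here are made up; in practice they'd come from an embedding model such as sentence-transformers):

```python
import math

# Made-up 3-d "embeddings" standing in for real model output.
passages = {
    "Llama 2 can be fine-tuned on a single GPU with QLoRA.": [0.9, 0.1, 0.0],
    "The podcast covered economics and inflation.": [0.1, 0.8, 0.2],
    "Semantic search ranks passages by similarity.": [0.7, 0.2, 0.6],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embedding of the query "how do I fine-tune llama?"
query_vec = [0.85, 0.15, 0.1]
best = max(passages, key=lambda p: cosine(passages[p], query_vec))
print(best)  # the fine-tuning passage ranks highest
```

The retrieved passage(s) then get stuffed into the LLM's prompt as context, which is the whole "R" in RAG.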

Overall it depends on whether you can turn your data into a fine-tuning dataset, and whether you can find a low-enough-parameter model that can take your retrieved contexts as input, either self-hosted or via inference endpoints. Hosting an LLM is actually not easy, and working in the information retrieval business I'm finding OpenAI isn't terrible compared to the cost of running GPUs for your users across the world.


There is an article at the original site about that: https://www.anyscale.com/blog/fine-tuning-is-for-form-not-fa...

Everybody new to this field thinks they need fine-tuning to teach the LLM new facts. I made the same mistake initially; later I published a slightly ranty post on it: https://zzbbyy.substack.com/p/why-you-need-rag-not-finetunin...


Quick question: the Gorilla paper talks about fine-tuning for RAG. Do you see this in practice? Can you do fine-tuning that specifically improves RAG?


Sorry, I don't have much experience myself yet; I'm still at the research phase. But from what I've read, it makes sense to fine-tune the model to better understand the format used for calling external tools, including a search engine.


We really need a simple "put your source files in this directory, press this button, then chat with your content" type of app/module/library.

The amount of implementation detail required makes it inaccessible for all but serious use cases. I imagine privateGPT will get there slowly.


I wrote a simple implementation to do this in ChatGPT via local plugin [0]. Obviously it doesn’t hit the “fully private” requirement but I imagine it would be relatively straightforward to integrate into a local LLM. The question is whether a local LLM would be as good at grabbing enough context and nuance from the project to answer meaningfully as GPT-4 is able to do with plugins.

[0] https://github.com/samrawal/chatgpt-localfiles


In one of my streams I essentially build this from scratch: https://www.youtube.com/watch?v=kBB1A2ot-Bw&t=236s. It's a retriever-reader model. Let me know if you want the code; I think I linked the Colab in the comments, but let me know if you need more.


At this stage of AI, the implementation details matter a lot for the chat to actually be meaningful… RAG is over-hyped.


This is brilliant. Could you do a series about how to prepare custom datasets for fine-tuning? That's the part a lot of other tutorials skip, especially for different goals, like safety, accuracy, etc.


Of course! I have a few streams where I web-scrape and build a dataset for myself with prefix tokens. I can break that down more in a dedicated stream about it.


Well, not so much the raw data acquisition (scraping and such), but really the data prep for fine-tuning. I'm hearing that each model needs it in a different format: chat fine-tuning data is different from instruct data, etc.
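Right, the per-model formatting is mostly about the prompt template each model was trained with. A sketch of two widely published conventions, Alpaca-style instruct vs. Llama 2 chat (treat the exact special tokens as per-model details you should verify against the model card):

```python
def alpaca_format(instruction: str, response: str) -> str:
    # Instruct-style (Alpaca convention): plain-text headers, single turn.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n{response}"
    )

def llama2_chat_format(system: str, user: str, assistant: str) -> str:
    # Chat-style (Llama 2 convention): [INST] / <<SYS>> tokens wrap each turn.
    return (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"{user} [/INST] {assistant} </s>"
    )

print(alpaca_format("Summarize RAG in one line.", "Retrieve, then generate."))
print(llama2_chat_format("You are helpful.", "What is QLoRA?",
                         "4-bit quantized LoRA fine-tuning."))
```

Same underlying (instruction, response) pairs, two different serializations, which is exactly why a dataset prepped for one model family often needs reformatting for another.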


One GPU? Feasible with one 3060?


Absolutely. For QLoRA / 4-bit / GPTQ fine-tuning, you can easily train a 7B model on an RTX 3060 (12 GB VRAM).

If you have a 24 GB VRAM GPU like an RTX 3090/4090, you can QLoRA-finetune a 13B or even a 30B model (in a few hours).
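A back-of-envelope check on why those numbers work out. This is my own rough model, not a measurement: 4-bit base weights at ~0.5 bytes/param, a small fp16 LoRA adapter (parameter count and overhead figures below are assumptions), plus a few GB for activations and CUDA overhead:

```python
def qlora_vram_gb(n_params_b: float, lora_params_m: float = 40.0,
                  overhead_gb: float = 3.0) -> float:
    """Rough QLoRA VRAM estimate in GB; all constants are assumptions."""
    base_gb = n_params_b * 1e9 * 0.5 / 1e9       # 4-bit quantized base weights
    # Adapter weights + gradients + optimizer state, ~2 bytes each.
    lora_gb = lora_params_m * 1e6 * 2 * 3 / 1e9
    return base_gb + lora_gb + overhead_gb

print(f"7B:  ~{qlora_vram_gb(7):.1f} GB")   # comfortably under a 12 GB 3060
print(f"13B: ~{qlora_vram_gb(13):.1f} GB")  # fits a 24 GB 3090/4090
```

Real usage also depends on sequence length, batch size, and gradient checkpointing, so treat this as a sanity check rather than a guarantee.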


It would be good to see a rigorous quality analysis of these PEFT methods. There still seems to be debate about whether they sacrifice quality or not.


+1 this



