Hacker News

You can use this command to apply the delta weights. (https://github.com/lm-sys/FastChat#vicuna-13b) The delta weights are hosted on Hugging Face and will be downloaded automatically.
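Conceptually, a delta-weight release stores `target - base` for every tensor, so applying it is just element-wise addition of the delta onto the base LLaMA weights. A minimal sketch of that idea, with plain Python lists standing in for tensors (the layer names and values are illustrative, not FastChat's actual code):

```python
# Delta weights store (target - base); applying them is element-wise addition.
# Plain lists stand in for model tensors; the keys are made-up layer names.
base = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.5]}
delta = {"layer0.weight": [0.5, -0.5], "layer0.bias": [-0.25]}

# Reconstruct the fine-tuned weights: target = base + delta, tensor by tensor.
target = {name: [b + d for b, d in zip(base[name], delta[name])]
          for name in base}
print(target["layer0.weight"])  # → [1.5, 1.5]
```

FastChat's `apply_delta` script does this at scale, which is why the full-precision route needs enough RAM to hold both the base and delta checkpoints.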



Thanks! https://huggingface.co/lmsys/vicuna-13b-delta-v0

Edit, later: I found some instructive pages on how to use the Vicuna weights with llama.cpp (https://lmsysvicuna.miraheze.org/wiki/How_to_use_Vicuna#Use_...) and pre-made, ggml-compatible 4-bit quantized Vicuna weights: https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/tree/ma... (8 GB and ready to go; no 60+ GB RAM steps needed)
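For context on what "4-bit quantized" means here: llama.cpp's ggml Q4 formats store one float scale per small block of weights plus a signed 4-bit integer per weight, which is what shrinks the 13B model to roughly 8 GB. A rough, hypothetical sketch of the idea (not ggml's actual bit layout or block size):

```python
# Rough sketch of block-wise 4-bit quantization (Q4-like idea, not ggml's real format).
def quantize_block(values):
    # One shared scale per block; each value becomes a signed 4-bit int in [-8, 7].
    scale = max(abs(v) for v in values) / 7 or 1.0  # guard against all-zero blocks
    quants = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, quants

def dequantize_block(scale, quants):
    # Recover approximate weights; error is at most half a quantization step.
    return [scale * q for q in quants]

block = [0.6, -0.3, 0.0, 0.15]
scale, quants = quantize_block(block)
restored = dequantize_block(scale, quants)
```

Each weight costs 4 bits plus its share of one scale per block, versus 16 or 32 bits at full precision, at the price of the small rounding error shown above.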


I did try, but got:

```
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
```


> Unfortunately there's a mismatch between the model generated by the delta patcher and the tokenizer (32001 vs 32000 tokens). There's a tool to fix this at llama-tools (https://github.com/Ronsor/llama-tools). Add one token (e.g. a control token), and then run the conversion script.
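The mismatch is that the patched model has 32001 embedding rows while the tokenizer only knows 32000 tokens, so appending one special token to the tokenizer's vocabulary makes the counts agree. A toy illustration of that fix (not llama-tools' actual interface; the token names are placeholders):

```python
# Toy illustration of the 32001-vs-32000 mismatch: the model has one more
# embedding row than the tokenizer has tokens, so append one filler token.
MODEL_VOCAB_SIZE = 32001
tokenizer_vocab = [f"<tok_{i}>" for i in range(32000)]  # placeholder token names

missing = MODEL_VOCAB_SIZE - len(tokenizer_vocab)
if missing > 0:
    # Any unused control token works; the name here is arbitrary.
    tokenizer_vocab.extend(f"<extra_{i}>" for i in range(missing))

print(len(tokenizer_vocab))  # → 32001
```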


Just rename it in tokenizer_config.json
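The rename amounts to editing the `tokenizer_class` field in `tokenizer_config.json` so it matches the class name transformers actually exports (`LlamaTokenizer`). A sketch of that edit, with an in-memory string standing in for the file (the other fields are illustrative):

```python
import json

# tokenizer_config.json as shipped with the broken checkpoint (reproduced in
# memory here rather than read from disk; fields other than the class are made up).
raw = '{"tokenizer_class": "LLaMATokenizer", "model_max_length": 2048}'
cfg = json.loads(raw)

# Rename the class so transformers can resolve it.
cfg["tokenizer_class"] = "LlamaTokenizer"
fixed = json.dumps(cfg)
```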


Thanks, that indeed worked!

That, and using conda in WSL2 instead of on bare Windows.



