Hacker News

You can use this command to apply the delta weights. (https://github.com/lm-sys/FastChat#vicuna-13b) The delta weights are hosted on Hugging Face and will be downloaded automatically.
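Conceptually, a delta-weight release stores `target - base` for every tensor, so applying it is just element-wise addition of the delta onto the base LLaMA weights. A minimal sketch of that idea, with plain Python lists standing in for tensors (the layer names and values are illustrative, not FastChat's actual code):

```python
# Delta weights store (target - base); applying them is element-wise addition.
# Plain lists stand in for model tensors; the keys are made-up layer names.
base = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.5]}
delta = {"layer0.weight": [0.5, -0.5], "layer0.bias": [-0.25]}

# Reconstruct the fine-tuned weights: target = base + delta, tensor by tensor.
target = {name: [b + d for b, d in zip(base[name], delta[name])]
          for name in base}
print(target["layer0.weight"])  # → [1.5, 1.5]
```

FastChat's `apply_delta` script does this at scale, which is why the full-precision route needs enough RAM to hold both the base and delta checkpoints.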



Thanks! https://huggingface.co/lmsys/vicuna-13b-delta-v0

Edit, later: I found some instructive pages on how to use the Vicuna weights with llama.cpp (https://lmsysvicuna.miraheze.org/wiki/How_to_use_Vicuna#Use_...) and pre-made, ggml-compatible 4-bit quantized Vicuna weights: https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/tree/ma... (8 GB and ready to go; no 60+ GB RAM steps needed)
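For context on what "4-bit quantized" means here: llama.cpp's ggml Q4 formats store one float scale per small block of weights plus a signed 4-bit integer per weight, which is what shrinks the 13B model to roughly 8 GB. A rough, hypothetical sketch of the idea (not ggml's actual bit layout or block size):

```python
# Rough sketch of block-wise 4-bit quantization (Q4-like idea, not ggml's real format).
def quantize_block(values):
    # One shared scale per block; each value becomes a signed 4-bit int in [-8, 7].
    scale = max(abs(v) for v in values) / 7 or 1.0  # guard against all-zero blocks
    quants = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, quants

def dequantize_block(scale, quants):
    # Recover approximate weights; error is at most half a quantization step.
    return [scale * q for q in quants]

block = [0.6, -0.3, 0.0, 0.15]
scale, quants = quantize_block(block)
restored = dequantize_block(scale, quants)
```

Each weight costs 4 bits plus its share of one scale per block, versus 16 or 32 bits at full precision, at the price of the small rounding error shown above.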


I did try, but got:

```
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
```


> Unfortunately there's a mismatch between the model generated by the delta patcher and the tokenizer (32001 vs 32000 tokens). There's a tool to fix this at llama-tools (https://github.com/Ronsor/llama-tools). Add one token (e.g. a control token), and then run the conversion script.
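The mismatch is that the patched model has 32001 embedding rows while the tokenizer only knows 32000 tokens, so appending one special token to the tokenizer's vocabulary makes the counts agree. A toy illustration of that fix (not llama-tools' actual interface; the token names are placeholders):

```python
# Toy illustration of the 32001-vs-32000 mismatch: the model has one more
# embedding row than the tokenizer has tokens, so append one filler token.
MODEL_VOCAB_SIZE = 32001
tokenizer_vocab = [f"<tok_{i}>" for i in range(32000)]  # placeholder token names

missing = MODEL_VOCAB_SIZE - len(tokenizer_vocab)
if missing > 0:
    # Any unused control token works; the name here is arbitrary.
    tokenizer_vocab.extend(f"<extra_{i}>" for i in range(missing))

print(len(tokenizer_vocab))  # → 32001
```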


Just rename it in tokenizer_config.json
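The rename amounts to editing the `tokenizer_class` field in `tokenizer_config.json` so it matches the class name transformers actually exports (`LlamaTokenizer`). A sketch of that edit, with an in-memory string standing in for the file (the other fields are illustrative):

```python
import json

# tokenizer_config.json as shipped with the broken checkpoint (reproduced in
# memory here rather than read from disk; fields other than the class are made up).
raw = '{"tokenizer_class": "LLaMATokenizer", "model_max_length": 2048}'
cfg = json.loads(raw)

# Rename the class so transformers can resolve it.
cfg["tokenizer_class"] = "LlamaTokenizer"
fixed = json.dumps(cfg)
```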


Thanks, that indeed worked!

That, and using conda in WSL2 instead of on bare Windows.



