> Llama 13B > Llama.cpp 30B > LLaMA-65B the "number B" stands for "number of bil... | Hacker News


MuffinFlavored on April 1, 2023 | on: Llama.cpp 30B runs with only 6GB of RAM now

> Llama 13B

> Llama.cpp 30B

> LLaMA-65B

the "number B" stands for "number of billions" of parameters... trained on?

like you take 65 billion words (from paragraphs / sentences from like, Wikipedia pages or whatever) and "train" the LLM. is that the metric?

why aren't "more parameters" (higher B) always better? aka return better results

how many "B" parameters is ChatGPT on GPT3.5 vs GPT4?

GPT3: 175b

GPT3.5: ?

GPT4: ?

https://blog.accubits.com/gpt-3-vs-gpt-3-5-whats-new-in-open...

how is Llama with 13B parameters able to compete with GPT3 with 175B parameters? That's 10x+ fewer. How much RAM does it take to run "a single node" of GPT3 / GPT3.5 / GPT4?

turmeric_root on April 2, 2023

> the "number B" stands for "number of billions" of parameters... trained on?

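No, it's just the size of the network (i.e. the number of learnable parameters). Training data is measured separately, in tokens: per the LLaMA paper, the 13B model was trained on ~1.0 trillion tokens and the 30B/65B models on ~1.4 trillion (a token is a word fragment, around three-quarters of an English word on average).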
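
Since the "B" is a parameter count, it also gives a rough lower bound on the RAM question above: the weights alone take about (parameters × bits per weight / 8) bytes. A minimal sketch of that arithmetic, assuming nominal parameter counts and ignoring everything else a runtime needs (activations, KV cache, any overhead); the function name and the model table are made up for illustration:

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts taken from the model names in this thread (assumption:
# the real counts differ slightly from the round numbers).
MODELS = {
    "LLaMA-13B": 13e9,
    "LLaMA-30B": 30e9,
    "LLaMA-65B": 65e9,
    "GPT-3 (175B)": 175e9,
}

for name, n_params in MODELS.items():
    fp16 = weight_memory_gb(n_params, 16)  # 16-bit floats
    q4 = weight_memory_gb(n_params, 4)     # 4-bit quantized weights
    print(f"{name:>13}: {fp16:6.1f} GB at 16-bit, {q4:5.1f} GB at 4-bit")
```

So a 13B model is about 26 GB of weights at 16 bits and about 6.5 GB at 4 bits, while 175B parameters at 16 bits is about 350 GB, which is why the latter doesn't fit on a single consumer machine. Parameter counts for GPT3.5 / GPT4 aren't stated in this thread, so they're left out.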
