GLM-4-9B: open-source model with superior performance to Llama-3-8B (github.com/thudm)
66 points by marcelsalathe 7 months ago | 17 comments



Looks like terrific technology. However, the translated license says it's an "irrevocable, revocable" non-commercial license, with a form to apply for commercial use.


That's weird, because the repository states Apache 2.0. https://github.com/THUDM/GLM-4/blob/main/LICENSE

Oops, you are right. The code is Apache 2.0; the license for the model weights is separate.


"non-exclusive, worldwide, irrevocable, non-sublicensable, revocable, photo-free copyright license."

Translation error? Output from GPT: "non-exclusive, global, non-transferable, non-sublicensable, revocable, royalty-free license."


I’m excited to hear work is being done on models that support function calling natively.

Does anybody know if performance could be greatly increased if only a single language were supported?

I suspect there’s high demand for smaller models that run faster, if the tradeoff is support for only English.

Is this available in Ollama?


Are there any other models that support function calling?


I ran some tests on phi-3 and mistral-7b, and it's not very hard to teach them to use tools, even though they were not designed for this. It turns out these models obey their instructions quite well: when you explain to them that if they need to look up data on the net or perform a calculation they must formulate that request with a specific syntax, they do a pretty good job. You just have to enable reverse prompting so that evaluation stops right after their request, your tools do the job (or you simulate it manually), and their task continues.
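A minimal sketch of that pattern, assuming a hypothetical generate(prompt, stop=...) wrapper around whatever inference backend you use (llama.cpp, Ollama, etc.); the CALC:/RESULT: syntax is invented for illustration:

    # Hedged sketch: teach an instruction-following model a tool-call
    # syntax, stop generation at the tool marker, run the tool, and
    # resume. 'generate' is a hypothetical stand-in for your backend.
    SYSTEM = (
        "If you need to perform a calculation, write a line "
        "'CALC: <expression>' and wait. The answer will appear on a "
        "line 'RESULT: <value>'; then continue your reply."
    )

    def generate(prompt: str, stop: list[str]) -> str:
        raise NotImplementedError  # plug in llama.cpp, Ollama, etc.

    def answer(question: str) -> str:
        prompt = f"{SYSTEM}\n\nUser: {question}\nAssistant:"
        while True:
            # Reverse prompting: halt as soon as the model tries to
            # write "RESULT:" itself, so the real tool can answer.
            out = generate(prompt, stop=["RESULT:"])
            prompt += out
            if "CALC:" in out:
                expr = out.split("CALC:", 1)[1].strip()
                value = eval(expr)  # demo only; use a safe evaluator
                prompt += f"\nRESULT: {value}\n"
            else:
                return out.strip()

Stopping at the would-be "RESULT:" token is the same trick ReAct-style prompting uses with "Observation:" as the stop sequence.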


> GLM-4V-9B possesses dialogue capabilities in both Chinese and English at a high resolution of 1120*1120. In various multimodal evaluations, including comprehensive abilities in Chinese and English, perception & reasoning, text recognition, and chart understanding, GLM-4V-9B demonstrates superior performance compared to GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.

But according to their own evaluation further down, gpt-4o-2024-05-13 outperforms GLM-4V-9B on every task except OCRBench.


Based on size (parameter count), they are in different categories: GLM-4V-9B competes in the lightweight class, while gpt-4o-2024-05-13 is in the medium or heavyweight class.


Isn't Llama-3-70B so good that Reddit llamaers are saying people should buy hardware to run it?

Llama-3-8B was garbage for me, but damn, 70B is good enough.


The unquantized Llama 70B requires 142 GB of VRAM. Some of the quantized versions are quite decent, but quality does tend to degrade once you over-quantize below around 26.5 GB of VRAM (~3 bits per weight).

So at minimum you’d be looking at dual 3090s with NVLink for about $4,000. Or, for the highest-performing non-quantized model, you’d be spending about $40,000 for two A100s.
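For context, a back-of-the-envelope check of those numbers (weights only; KV cache and runtime overhead push real usage higher):

    # VRAM estimate: bytes ~= params * bits_per_weight / 8.
    PARAMS = 70e9  # Llama-3-70B

    for bits in (16, 8, 4, 3):
        gb = PARAMS * bits / 8 / 1e9
        print(f"{bits:>2} bpw: ~{gb:.1f} GB")

    # 16 bpw: ~140.0 GB  (close to the 142 GB fp16 figure)
    #  8 bpw:  ~70.0 GB
    #  4 bpw:  ~35.0 GB
    #  3 bpw:  ~26.2 GB  (the ~26.5 GB threshold mentioned above)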


So an M-series MacBook is a decent buy.


No need for NVLink just for inference, not even with tensor parallelism. And you can get used 3090s much cheaper than that.


True, you can buy 2x 4090 FE brand new for $4,000.


If those numbers are true, it's very impressive. Hoping for llama.cpp support.


1M context, but does it really hold up? I've been hit before with 32K models that crap out after 10K...


Model-available, not open source.


Ehhh, man, this is frustrating: 7B was a real sweet spot for hobbyists. 8B... doable. I've been joking to myself, and simultaneously worried, that Llama 3 8B and Phi-3 "3B" (3.8B) would start an "ehhh, +1, might as well be a rounding error" trend. It's a big deal! I measure a 33% decrease in speed just going from 3B to 3.8B when inferencing on CPU.
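For the curious, a rough model of why size hurts so much on CPU: token generation is typically memory-bandwidth bound, so every token streams all the weights once. The bandwidth and quantization figures below are assumptions for illustration, not measurements:

    # Rough decode-speed model: tok/s ~= bandwidth / model_bytes.
    BANDWIDTH_GBS = 50.0    # assumed dual-channel DDR5 bandwidth
    BYTES_PER_PARAM = 0.5   # assumed ~4-bit quantization

    for name, params in (("3B", 3.0e9), ("3.8B", 3.8e9), ("8B", 8.0e9)):
        tps = BANDWIDTH_GBS / (params * BYTES_PER_PARAM / 1e9)
        print(f"{name}: ~{tps:.0f} tok/s")

    # 3B -> 3.8B is a ~21% drop under this model; a measured 33%
    # suggests extra overhead beyond raw weight streaming.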



