GLM-4-9B: open-source model with superior performance to Llama-3-8B (github.com/thudm)
66 points by marcelsalathe 7 months ago | 17 comments



Looks like terrific technology. However, the translated license says it's an "irrevocable, revocable" non-commercial license, with a form to apply for commercial use.


That's weird, because the repository states Apache 2.0. https://github.com/THUDM/GLM-4/blob/main/LICENSE

Oops, you are right. The code is Apache 2.0; the license for the model weights is separate.


"non-exclusive, worldwide, irrevocable, non-sublicensable, revocable, photo-free copyright license."

Translation error? Output from GPT: "non-exclusive, global, non-transferable, non-sublicensable, revocable, royalty-free license."


I’m excited to hear work is being done on models that support function calling natively.

Does anybody know if performance could be greatly increased if only a single language were supported?

I suspect there’s high demand for smaller models that run faster, if the tradeoff is support for only English.

Is this available in Ollama?


Are there any other models that support function calling?


I ran some tests on phi-3 and mistral-7b, and it's not very hard to teach them to use tools, even though they were not designed for this. It turns out these models obey their instructions quite well: when you explain to them that if they need to look up data on the net or perform a calculation they must formulate that request with a specific syntax, they do a pretty good job. You just have to enable reverse prompting so that evaluation stops right after their request, your tools do the job (or you simulate it manually), and their task continues.
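A minimal sketch of that pattern, assuming a hypothetical generate(prompt, stop=...) wrapper around whatever inference backend you use (llama.cpp, Ollama, etc.); the CALC:/RESULT: syntax is invented for illustration:

    # Hedged sketch: teach an instruction-following model a tool-call
    # syntax, stop generation at the tool marker, run the tool, and
    # resume. 'generate' is a hypothetical stand-in for your backend.
    SYSTEM = (
        "If you need to perform a calculation, write a line "
        "'CALC: <expression>' and wait. The answer will appear on a "
        "line 'RESULT: <value>'; then continue your reply."
    )

    def generate(prompt: str, stop: list[str]) -> str:
        raise NotImplementedError  # plug in llama.cpp, Ollama, etc.

    def answer(question: str) -> str:
        prompt = f"{SYSTEM}\n\nUser: {question}\nAssistant:"
        while True:
            # Reverse prompting: halt as soon as the model tries to
            # write "RESULT:" itself, so the real tool can answer.
            out = generate(prompt, stop=["RESULT:"])
            prompt += out
            if "CALC:" in out:
                expr = out.split("CALC:", 1)[1].strip()
                value = eval(expr)  # demo only; use a safe evaluator
                prompt += f"\nRESULT: {value}\n"
            else:
                return out.strip()

Stopping at the would-be "RESULT:" token is the same trick ReAct-style prompting uses with "Observation:" as the stop sequence.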


> GLM-4V-9B possesses dialogue capabilities in both Chinese and English at a high resolution of 1120*1120. In various multimodal evaluations, including comprehensive abilities in Chinese and English, perception & reasoning, text recognition, and chart understanding, GLM-4V-9B demonstrates superior performance compared to GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.

But according to their own evaluation further down, gpt-4o-2024-05-13 outperforms GLM-4V-9B on every task except OCRBench.


Based on size (parameter count), they are in different categories: GLM-4V-9B competes in the lightweight class, while gpt-4o-2024-05-13 is in the medium or heavyweight class.


Isn't Llama-3-70B so good that Reddit llamaers are saying people should buy hardware to run it?

Llama-3-8B was garbage for me, but damn, 70B is good enough.


The unquantized Llama 70B requires 142 GB of VRAM. Some of the quantized versions are quite decent, but quality does tend to degrade once you over-quantize below around 26.5 GB of VRAM (~3 bits per weight).

So at minimum you’d be looking at dual 3090s with NVLink for about $4,000. Or, for the highest-performing non-quantized model, you’d be spending about $40,000 for two A100s.
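For context, a back-of-the-envelope check of those numbers (weights only; KV cache and runtime overhead push real usage higher):

    # VRAM estimate: bytes ~= params * bits_per_weight / 8.
    PARAMS = 70e9  # Llama-3-70B

    for bits in (16, 8, 4, 3):
        gb = PARAMS * bits / 8 / 1e9
        print(f"{bits:>2} bpw: ~{gb:.1f} GB")

    # 16 bpw: ~140.0 GB  (close to the 142 GB fp16 figure)
    #  8 bpw:  ~70.0 GB
    #  4 bpw:  ~35.0 GB
    #  3 bpw:  ~26.2 GB  (the ~26.5 GB threshold mentioned above)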


So an M-series MacBook is a decent buy.


No need for NVLink just for inference, not even with tensor parallelism. And you can get used 3090s much cheaper than that.


True, you can buy 2x 4090 FE brand new for $4,000.


If those numbers are true, it's very impressive. Hoping for llama.cpp support.


1M context, but does it really hold up? I've been hit before with 32K models that crap out after 10K...


Model-available, not open source.


Ehhh, man, this is frustrating: 7B was a real sweet spot for hobbyists. 8B... doable. I've been joking to myself, and simultaneously worried, that Llama 3 8B and Phi-3 "3B" (3.8B) would start an "ehhh, +1, might as well be a rounding error" trend. It's a big deal! I measure a 33% decrease in speed just going from 3B to 3.8B when inferencing on CPU.
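For the curious, a rough model of why size hurts so much on CPU: token generation is typically memory-bandwidth bound, so every token streams all the weights once. The bandwidth and quantization figures below are assumptions for illustration, not measurements:

    # Rough decode-speed model: tok/s ~= bandwidth / model_bytes.
    BANDWIDTH_GBS = 50.0    # assumed dual-channel DDR5 bandwidth
    BYTES_PER_PARAM = 0.5   # assumed ~4-bit quantization

    for name, params in (("3B", 3.0e9), ("3.8B", 3.8e9), ("8B", 8.0e9)):
        tps = BANDWIDTH_GBS / (params * BYTES_PER_PARAM / 1e9)
        print(f"{name}: ~{tps:.0f} tok/s")

    # 3B -> 3.8B is a ~21% drop under this model; a measured 33%
    # suggests extra overhead beyond raw weight streaming.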



