Looks like terrific technology. However, the translation says that it's an "irrevocable revocable" non-commercial license with a form to apply for commercial use.
I ran some tests on phi-3 and mistral-7b and it's not very hard to teach them to use tools, even though they were not designed for this. It turns out these models follow instructions quite well: when you explain to them that if they need to look up data on the net or perform a calculation, they must formulate that request with a specific syntax, they do a pretty good job. You just have to enable a reverse prompt so that evaluation stops right after their request, your tools do the job (or you simulate it manually), and then their task continues.
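Here's a minimal sketch of that loop using llama-cpp-python. The `[[CALC: ...]]` tag, the prompt wording, and the model path are all my own placeholders (nothing the models ship with), and a bare `eval` stands in for whatever real tool you'd wire up:

```python
import re
from llama_cpp import Llama  # pip install llama-cpp-python

# Any local GGUF works here; the path is just a placeholder.
llm = Llama(model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf", n_ctx=4096)

# The [[CALC: ...]] syntax is purely a convention taught via the prompt.
prompt = (
    "You can use a calculator. When you need one, write exactly "
    "[[CALC: <expression>]] and stop; the result will be inserted, "
    "then continue your answer.\n\n"
    "User: What is 37 * 91 plus 12?\nAssistant:"
)

for _ in range(4):  # cap the number of tool round-trips
    # stop=["]]"] plays the role of the reverse prompt: generation halts
    # as soon as the model finishes formulating its request.
    out = llm(prompt, max_tokens=256, stop=["]]"])
    text = out["choices"][0]["text"]
    prompt += text

    m = re.search(r"\[\[CALC:\s*(.+)\s*$", text)
    if not m:
        break  # no tool request left, the answer is complete
    # A plain eval stands in for the real tool (calculator, web search, ...).
    result = eval(m.group(1), {"__builtins__": {}})
    prompt += f"]] Result: {result}\n"  # feed the result back and resume

print(prompt)
```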
> GLM-4V-9B possesses dialogue capabilities in both Chinese and English at a high resolution of 1120*1120. In various multimodal evaluations, including comprehensive abilities in Chinese and English, perception & reasoning, text recognition, and chart understanding, GLM-4V-9B demonstrates superior performance compared to GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
But according to their own evaluation further down, gpt-4o-2024-05-13 outperforms GLM-4V-9B on every task except OCRBench.
By size (parameter count) they're in different weight classes: GLM-4V-9B competes as a lightweight, while gpt-4o-2024-05-13 is in the middleweight or heavyweight class.
The unquantized Llama 70B requires about 142 GB of VRAM. Some of the quantized versions are quite decent, but they tend to get over-quantized below around 26.5 GB of VRAM (~3 bits per weight).
So at minimum you'd be looking at dual 3090s with NVLink for about $4,000. Or, for the highest-performing non-quantized model, you'd be spending about $40,000 for two A100s.
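For anyone who wants to sanity-check those figures, here's a rough back-of-the-envelope sketch. It only counts the weights themselves; KV cache and activation memory are extra:

```python
# Back-of-the-envelope VRAM estimate for holding the weights alone.
def weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
    """Gigabytes needed just to store the weights."""
    return n_params * bits_per_weight / 8 / 1e9

params = 70e9  # Llama 70B
print(f"fp16 (16 bpw): {weight_vram_gb(params, 16):.0f} GB")  # ~140 GB -> two 80 GB A100s
print(f"~3 bpw quant : {weight_vram_gb(params, 3):.1f} GB")   # ~26 GB  -> fits across 2x 24 GB 3090s
```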
Ehhh man this is frustrating, 7B was a real sweet spot for hobbyists. 8B... doable. I've been joking to myself / simultaneously worrying that Llama 3 8B and Phi-3 "3B" (really 3.8B) would start an "ehhh, +1, might as well be a rounding error" trend. It's a big deal! I measured a roughly 33% drop in inference speed just going from 3B to 3.8B on CPU.
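For context, here's the naive scaling estimate, under my own simplifying assumption (not a measurement) that CPU inference speed scales inversely with parameter count:

```python
# Naive estimate: CPU inference is roughly memory-bandwidth bound, so
# tokens/s scale ~1/parameter-count (ignores quantization, threads, cache).
small, big = 3.0e9, 3.8e9
slowdown = 1 - small / big
print(f"expected slowdown: {slowdown:.0%}")  # ~21% from parameter count alone
```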