InternLM2 (arxiv.org)
136 points by milliondreams 6 months ago | 24 comments



We really need better long context benchmarks than needle-in-a-haystack. There is LV-Eval (https://arxiv.org/abs/2402.05136) with multi-hop QA that's better but still pretty basic.
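The needle-in-a-haystack setup being criticized here is easy to sketch: plant a "needle" fact at varying depths inside filler text of varying lengths, then check whether the model can retrieve it. A minimal sketch (the model call is stubbed out with a string search; a real harness would call the LLM under test):

```python
def build_haystack(filler: str, needle: str, total_chars: int, depth: float) -> str:
    """Repeat filler text to ~total_chars and bury the needle at a
    relative depth (0.0 = start of context, 1.0 = end)."""
    body = (filler * (total_chars // len(filler) + 1))[:total_chars]
    pos = int(len(body) * depth)
    return body[:pos] + " " + needle + " " + body[pos:]

def score_retrieval(answer: str, secret: str) -> bool:
    """Pass/fail: did the model's answer contain the planted secret?"""
    return secret in answer

# Sweep context length x needle depth, as in the standard benchmark grid.
secret = "7431"
needle = f"The magic number is {secret}."
for total in (1_000, 10_000):
    for depth in (0.0, 0.5, 1.0):
        prompt = build_haystack("The sky was grey over the harbor. ", needle, total, depth)
        # answer = llm(prompt + "\nWhat is the magic number?")  # real model call
        answer = secret if needle in prompt else ""  # stub standing in for the model
        assert score_retrieval(answer, secret)
```

The criticism in the comments above is that this only tests exact lookup, which is why multi-hop variants like LV-Eval are a step up: they require combining facts from several positions rather than copying one span.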



Yes, I don't understand why they use a search benchmark for these... it would be much better to give it a story up to the context length (from a book? how do you find one that's not in the training data?) and have it write a new chapter/ending that is consistent with all prior text and introduces zero inconsistencies.

But how can you automatically evaluate whether it did this?


Because that's how people use LLMs. You go to ChatGPT to ask a question and get an answer, rather than searching on Google, revising your search because you didn't know a term, and then looking at 3-5 different links to find the answer to what you were searching for.


It seems you might be mixing up different types of "context" in LLM benchmarking. In this case, it refers to the input text directly provided to the model during evaluation by the user (as in in-context learning). This is separate from the text an LLM is trained on or can access via RAG methods.
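The distinction can be made concrete with two toy prompt builders (hypothetical helper names; the retriever here is a deliberately naive word-overlap ranker, not any particular RAG library):

```python
# In-context: the full document goes straight into the prompt, so the
# model's context window must hold all of it. This is what long-context
# benchmarks measure.
def in_context_prompt(document: str, question: str) -> str:
    return f"{document}\n\nQuestion: {question}"

# RAG: only retrieved chunks reach the model, so the prompt stays short
# regardless of corpus size. The context window is not being stress-tested.
def rag_prompt(chunks: list[str], question: str, top_k: int = 2) -> str:
    # toy retriever: rank chunks by word overlap with the question
    q = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return "\n".join(ranked[:top_k]) + f"\n\nQuestion: {question}"
```

The point of the comment above: long-context evaluation exercises the first pattern, which is independent of both the training corpus and any retrieval layer.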


That's how people use LLMs because they (the LLMs) don't seem to be good at the more sophisticated thing.


But also because search engines have gotten even worse at answering questions.


Pretty amazing to see training data being discussed more openly


Indeed. I think part of the reason they are not discussed openly may be that much of the data used is copyrighted, which introduces some legal ambiguity.


IANAL, but hiding something doesn't make someone legally immune. Any company could sue the LLM companies, and they couldn't hide it during the case. There is already a similar case against OpenAI.


Yes, but it at the very least delays any findings while you rake in the cash and try to create a favorable environment. OpenAI has even stated that they think using copyrighted texts is necessary and should be covered by fair use.


TL;DR:
1. InternLM2 is an open-source Large Language Model that has shown improvements over previous models, particularly in long-context modeling.
2. The model uses a unique approach, combining traditional training with Supervised Fine-Tuning and Conditional Online Reinforcement Learning from Human Feedback.
3. It offers a variety of model sizes and training stages to the community, demonstrating significant advancements in AI research and application.


Excited to see how it will perform on the lmsys leaderboard


Is it normal for papers to have that many authors?


It's not abnormal in many fields. A lot of biology or physics papers have more authors than that.

In academic labs, you might be a postgrad working the overnight shift watching some petri dish to make sure the bacteria don't die, etc. It's super boring grunt work, but you do it to get on the paper's author list.


Only in fields like foundation models and high-energy physics, where immense resources are required. Look at GPT-4's credits: https://openai.com/contributions/gpt-4


Particle physics papers usually have more pages for the names than for the work.


Anyone who even sniffed it can get in on the action.


Does anyone know how the free commercial license works? Do they usually grant it? There's an application form at https://wj.qq.com/s2/12727483/5dba/.

Apache 2 code, free commercial license with application form for weights.


The name suggests this is interns posing as a chatbot, especially considering today’s date.


I experimented with this model and vLLM around a month ago. The long context length is attractive, but it was incredibly slow on a g5.12xlarge (4 NVIDIA A10G GPUs). I actually couldn't get it to respond to single examples longer than 50K tokens.



How good is the base (non-instruction-tuned) model? Everyone is trying to make chat bots, but for my use cases, I find base models more suitable.


Interesting. What are some of those use cases?




