
Any chance you can share more details on your measurement setup and eval protocols? You're likely seeing some config snafus, which we're trying to track down.



I just loaded it in vLLM with default settings.

I can't share the eval, but it's pretty simple: it asks a question about some data, and the answer is restricted to yes/no (enforced via the output logits, and also suggested in the prompt). It's called with temperature 0 and only 1 output token, so sampling shouldn't be an issue.
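
To make that concrete, here is a rough sketch of what a logit-restricted yes/no readout could look like. It is not the actual eval: the function name, the token spellings, and the example numbers are all made up for illustration, and it assumes you already have the log-probabilities of the first output token as a plain dict (for example from whatever logprobs option your inference server exposes).

```python
import math

# Hypothetical token spellings; a real tokenizer may emit other variants.
YES_TOKENS = ("yes", "Yes", " yes", " Yes")
NO_TOKENS = ("no", "No", " no", " No")


def restricted_yes_no(first_token_logprobs):
    """Pick 'yes' or 'no' from the log-probabilities of the first output token.

    first_token_logprobs: dict mapping token string -> log-probability.
    Tokens other than the yes/no variants are ignored, which is what
    restricts the answer to the two options.
    """
    def total_prob(variants):
        # Sum probability mass over every spelling that was returned.
        return sum(
            math.exp(first_token_logprobs[t])
            for t in variants
            if t in first_token_logprobs
        )

    p_yes = total_prob(YES_TOKENS)
    p_no = total_prob(NO_TOKENS)
    if p_yes == 0.0 and p_no == 0.0:
        return None  # neither option appeared among the returned logprobs
    return "yes" if p_yes >= p_no else "no"


# Made-up logprobs for a single output token, purely for illustration.
example = {"Yes": -0.4, " yes": -2.5, "No": -1.6, "The": -3.0}
answer = restricted_yes_no(example)
print(answer)
```

Because the decision only compares the yes and no mass on a single token, it does not depend on the sampler at all, which is why temperature and sampling settings shouldn't matter here.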



