Any chance you can share more details on your measurement setup and eval protoco...

brucethemoose2 · 2024-02-24T04:42:59 1708749779

I just loaded it in vLLM with default settings.

I can't share the eval, but it's pretty simple: it asks a question about some data, and the model is restricted to answering yes or no. The prompt asks for a yes/no answer, and the answer itself is read from the output logits. It's called with temperature 0 and only 1 output token, so sampling shouldn't be an issue.
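
To make the scoring step concrete, here is a rough sketch of how a yes/no answer could be read from the log-probabilities of the single output token. This is not the actual eval: the function name `pick_yes_no`, the token spellings, and the example numbers are all assumptions for illustration, and the step that gets the log-probabilities out of the inference engine is left out.

```python
import math

# Hypothetical token spellings; a real tokenizer may split or space these differently.
YES_TOKENS = ("yes", "Yes", " yes", " Yes")
NO_TOKENS = ("no", "No", " no", " No")


def pick_yes_no(logprobs, yes_tokens=YES_TOKENS, no_tokens=NO_TOKENS):
    """Pick "yes" or "no" from a token -> logprob mapping for the first output position.

    Returns the chosen answer and the probability of "yes", renormalized
    over the yes/no tokens only (all other tokens are ignored).
    """
    def mass(tokens):
        # Sum the probability of every listed spelling that is present.
        return sum(math.exp(logprobs[t]) for t in tokens if t in logprobs)

    p_yes = mass(yes_tokens)
    p_no = mass(no_tokens)
    total = p_yes + p_no
    if total == 0.0:
        raise ValueError("no yes/no token found in the provided logprobs")
    answer = "yes" if p_yes >= p_no else "no"
    return answer, p_yes / total


# Made-up numbers, only to show the call shape.
example = {"Yes": math.log(0.6), "No": math.log(0.3), "The": math.log(0.1)}
answer, p_yes = pick_yes_no(example)
```

Because the answer comes from comparing probabilities rather than from sampled text, the result does not depend on the sampler, which matches the temperature 0, single-token setup described above.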