I don’t have quantitative data, but my employer recently deployed an internal Llama 2 at 70B in FP16. The internal front end lets us switch between different LLMs, and for the tasks I use it for, it’s on par with, and sometimes better than, GPT-3.5. You’ll get a lot of different answers here because not many people run 70B at FP16.