Hacker News | ai_biden's comments

I'm not too excited by the Phi-4 benchmark results: it's #BenchmarkInflation.

Microsoft Research just dropped Phi-4 14B, an open-source model that’s turning heads. It claims to rival Llama 3.3 70B with a fraction of the parameters — 5x fewer, to be exact.

What’s the secret? Synthetic data: higher quality, less misinformation, more diversity.

The Phi models always post great benchmark scores, but they consistently disappoint me in real-world use cases.

The Phi series is famous for appearing to be trained on the benchmarks themselves.

I tried #phi4 again through Ollama, but the results weren't satisfactory.
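For anyone who wants to reproduce this, here is a minimal sketch of querying a local Ollama instance over its REST API. It assumes Ollama is running on the default port 11434 and the phi4 model has already been pulled; `build_generate_payload` and `ask` are illustrative helper names, not part of Ollama itself:

```python
import json
import urllib.request

# Default local Ollama endpoint for one-shot (non-chat) generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_payload(model: str, prompt: str) -> dict:
    # Request body for Ollama's /api/generate endpoint;
    # stream=False returns the whole completion in a single JSON object.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    # Requires a running Ollama server with the model available,
    # e.g. after `ollama pull phi4`.
    data = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Usage would be something like `ask("phi4", "Summarize IFEval in one sentence.")`, which makes it easy to eyeball real-world answers against the benchmark claims.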

To me, at the moment, IFEval is the most important LLM benchmark.
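What makes IFEval hard to game is that each prompt embeds a constraint that can be verified programmatically rather than judged by another model. A minimal sketch of that idea (these checker functions are illustrative, not IFEval's actual code):

```python
import re

def check_all_lowercase(response: str) -> bool:
    # Verifies an instruction like "answer in all lowercase".
    return response == response.lower()

def check_word_count_at_most(response: str, limit: int) -> bool:
    # Verifies "answer in at most N words".
    return len(response.split()) <= limit

def check_num_bullets(response: str, n: int) -> bool:
    # Verifies "answer with exactly N bullet points"
    # (lines starting with '- ' or '* ').
    bullets = re.findall(r"^\s*[-*] ", response, flags=re.MULTILINE)
    return len(bullets) == n

def ifeval_style_score(response: str, checks) -> float:
    # Fraction of constraints satisfied; IFEval itself reports
    # strict and loose accuracy per instruction.
    results = [check(response) for check in checks]
    return sum(results) / len(results)
```

Because the checks are deterministic, a model can't score well by sounding plausible; it has to actually follow the instruction.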

But look at Microsoft's smart business strategy:

- have unlimited access to GPT-4
- prompt it to generate 30B tokens
- train a 1B-parameter model
- call it phi-1
- show benchmarks beating models 10x the size
- never release the data
- never detail how to generate the data (this time they described it, though only at a very high level)
- claim victory over small models


The industry is changing; let's see.


Detailed post with so many frameworks!


Twitter is coming back with H100s; that's exciting for the space.



